Pandas DataFrame DataFrame.dropna() Function

Minahil Noor Jan 30, 2023
  1. Syntax of pandas.DataFrame.dropna()
  2. Example Codes: DataFrame.dropna() to Drop Row
  3. Example Codes: DataFrame.dropna() to Drop Column
  4. Example Codes: DataFrame.dropna() With how=all
  5. Example Codes: DataFrame.dropna() With a Specified Subset or Thresh
  6. Example Codes: DataFrame.dropna() With inplace=True

The pandas.DataFrame.dropna() function removes null (missing) values from a DataFrame by dropping the rows or columns that contain them.

None, NaN (Not a Number), and NaT (Not a Time) all count as null values. DataFrame.dropna() detects these values and filters the DataFrame accordingly.
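As a minimal sketch of which values count as null, the snippet below builds a small hypothetical frame (the column names score and when are illustrative only, not part of the examples in this article) and checks it with DataFrame.isna() before calling dropna().

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one numeric column and one datetime column.
df = pd.DataFrame(
    {
        "score": [1.0, np.nan, None],
        "when": [pd.Timestamp("2023-01-30"), pd.NaT, pd.Timestamp("2023-01-31")],
    }
)

# isna() marks None, NaN, and NaT as missing.
mask = df.isna()
print(mask)

# With default arguments, only the row without any missing value (row 0) is kept.
cleaned = df.dropna()
print(cleaned)
```

Checking isna() first is a quick way to preview what dropna() will treat as missing before any rows are removed.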

Syntax of pandas.DataFrame.dropna()

DataFrame.dropna(axis, how, thresh, subset, inplace)

Parameters

axis It selects whether rows or columns are dropped.
If it is 0 or 'index', the function drops the rows containing missing values.
If it is 1 or 'columns', the function drops the columns containing missing values. By default, its value is 0.
how This parameter determines when a row or column is dropped. It accepts only two strings, 'any' or 'all'. By default, it is set to 'any'.
any drops the row or column if it contains at least one null value.
all drops the row or column only if all of its values are null.
thresh It is an integer giving the minimum number of non-missing values a row or column must contain to be kept. Recent pandas versions do not allow thresh and how in the same call.
subset It is a list-like of labels along the other axis that restricts which values are checked. For example, when dropping rows, subset lists the columns to inspect for missing values.
inplace It is a Boolean value. If set to True, the function modifies the caller DataFrame and returns None. By default, its value is False.

Return

It returns a new, filtered DataFrame with rows or columns dropped according to the passed parameters. If inplace is True, it returns None instead.
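The return behaviour can be checked with a short sketch. The frame below is a cut-down, hypothetical version of the data used later in this article, and the name filtered is illustrative.

```python
import pandas as pd

# Small illustrative frame with one missing Attendance value.
dataframe = pd.DataFrame(
    {
        "Attendance": [60, None, 80],
        "Name": ["Olivia", "John", "Laura"],
    }
)

# dropna() returns a new object; the caller is left untouched by default.
filtered = dataframe.dropna()

print(dataframe.shape)  # the original still has all three rows
print(filtered)         # the row with the missing Attendance is gone
```

Because the original is unchanged, the result must be assigned to a variable (or inplace=True must be used) for the filtering to have any lasting effect.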

Example Codes: DataFrame.dropna() to Drop Row

By default, axis is 0 (rows), so the examples in this section drop rows.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)
print(dataframe)

The example DataFrame is as follows.

   Attendance    Name  Obtained Marks
0        60.0  Olivia             NaN
1         NaN    John            75.0
2        80.0   Laura            82.0
3         NaN     Ben            64.0
4        95.0   Kevin             NaN

All the parameters of this function are optional. If we pass no parameters, the function drops every row that contains at least one null value.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)
dataframe1 = dataframe.dropna()
print(dataframe1)

Output:

   Attendance   Name  Obtained Marks
2        80.0  Laura            82.0

It has dropped every row that contained at least one missing value. Only Laura's row has no missing values, so it is the only row left.

Example Codes: DataFrame.dropna() to Drop Column

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)
dataframe1 = dataframe.dropna(axis=1)

print(dataframe1)

Output:

     Name
0  Olivia
1    John
2   Laura
3     Ben
4   Kevin

It has dropped every column that contained at least one missing value because we set axis=1 in the DataFrame.dropna() method. Only the Name column has no missing values.

Example Codes: DataFrame.dropna() With how=all

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

dataframe1 = dataframe.dropna(axis=1, how="all")
print(dataframe1)

Output:

   Attendance    Name  Obtained Marks
0        60.0  Olivia             NaN
1         NaN    John            75.0
2        80.0   Laura            82.0
3         NaN     Ben            64.0
4        95.0   Kevin             NaN

No column is dropped here. Because axis=1 and how is set to all, a column is dropped only if every value in it is null, and each column in this DataFrame has at least one non-null value.

If every value in a column is missing, DataFrame.dropna() with how set to all drops that column.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: None, 2: None, 3: None, 4: None},
    }
)

print(dataframe)
print("--------")
dataframe1 = dataframe.dropna(axis=1, how="all")
print(dataframe1)

Output:

   Attendance    Name Obtained Marks
0        60.0  Olivia           None
1         NaN    John           None
2        80.0   Laura           None
3         NaN     Ben           None
4        95.0   Kevin           None
--------
   Attendance    Name
0        60.0  Olivia
1         NaN    John
2        80.0   Laura
3         NaN     Ben
4        95.0   Kevin
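The same rule applies along rows when axis is left at its default. The sketch below uses a small hypothetical frame (not the example data above) in which only one row is entirely empty.

```python
import pandas as pd

# Illustrative frame: row 1 has no values at all, rows 0 and 2 are partly filled.
dataframe = pd.DataFrame(
    {
        "Attendance": [60, None, None],
        "Obtained Marks": [None, None, 82],
    }
)

# how="all" with the default axis=0 removes only rows where every value is null.
rows_kept = dataframe.dropna(how="all")
print(rows_kept)  # rows 0 and 2 remain
```

This is a common way to discard completely blank records while keeping partially filled ones for later cleaning.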

Example Codes: DataFrame.dropna() With a Specified Subset or Thresh

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

dataframe1 = dataframe.dropna(thresh=3)
print(dataframe1)

Output:

   Attendance   Name  Obtained Marks
2        80.0  Laura            82.0

The value of thresh is 3, which means a row must have at least 3 non-null values to be kept. Only row 2 has values in all three columns, so it is the only row that remains.

We can also restrict the check to a subset of columns.
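To see how the threshold changes the result, here is a brief sketch on the same data that compares two thresh values along rows and one along columns. The result variable names are illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# Every row has at least 2 non-null values, so thresh=2 keeps all five rows.
at_least_two = dataframe.dropna(thresh=2)

# Only row 2 has 3 non-null values.
at_least_three = dataframe.dropna(thresh=3)

# Along columns, only Name has 4 or more non-null values.
columns_kept = dataframe.dropna(axis=1, thresh=4)

print(at_least_two)
print(at_least_three)
print(columns_kept)
```

A lower threshold is more permissive: it tolerates more missing values per row or column before dropping.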

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

dataframe1 = dataframe.dropna(subset=["Attendance", "Name"])
print(dataframe1)

Output:

   Attendance    Name  Obtained Marks
0        60.0  Olivia             NaN
2        80.0   Laura            82.0
4        95.0   Kevin             NaN

It drops a row only if it has a missing value in the Attendance or Name column. Missing values in other columns, Obtained Marks here, do not cause a row to be dropped.
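As a complementary sketch on the same data, the subset can name a single column, so that only that column decides which rows are dropped. The variable name marks_present is illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# Only Obtained Marks is inspected; missing Attendance values are ignored.
marks_present = dataframe.dropna(subset=["Obtained Marks"])
print(marks_present)  # rows 1, 2, and 3 remain
```

Rows 1 and 3 survive even though their Attendance is missing, because Attendance is not part of the subset.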

Example Codes: DataFrame.dropna() With inplace=True

DataFrame.dropna() changes the caller DataFrame in-place if inplace is set to True.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)
dataframe1 = dataframe.dropna(inplace=True)
print(dataframe1)

Output:

None

With inplace=True, the method modifies the caller DataFrame directly and returns None. That is why dataframe1 is None here; the filtered data is in dataframe itself.
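A minimal sketch of the more typical usage is to call the method without assigning its result and then inspect the caller. The name result is illustrative and is kept only to show the None return value.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# The call mutates dataframe and returns None.
result = dataframe.dropna(inplace=True)

print(result)     # None
print(dataframe)  # only row 2 (Laura) is left in the original object
```

Since the return value is None, chaining further methods onto a dropna(inplace=True) call does not work; use the default inplace=False when chaining is needed.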
