Pandas Drop Rows With NaN

  1. Pandas Drop Rows With NaN Using the DataFrame.notna() Method
  2. Pandas Drop Rows Only With NaN Values for All Columns Using DataFrame.dropna() Method
  3. Pandas Drop Rows Only With NaN Values for a Particular Column Using DataFrame.dropna() Method
  4. Pandas Drop Rows With NaN Values for Any Column Using DataFrame.dropna() Method

This tutorial explains how to drop rows that contain NaN values from a Pandas DataFrame using the DataFrame.notna() and DataFrame.dropna() methods.

We will use the DataFrame in the example code below.

import pandas as pd

# None entries in the numeric columns are stored as NaN
data = pd.DataFrame({
    'Name': ['Alice', 'Steven', 'Neesham', 'Chris', 'Alice'],
    'Age':  [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None]
})

print(data)

Output:

      Name   Age  Income($)  Expense($)
0    Alice  19.0     4000.0      3000.0
1   Steven   NaN     5000.0      2000.0
2  Neesham  18.0        NaN      2500.0
3    Chris  21.0     3500.0     25000.0
4    Alice   NaN        NaN         NaN

Pandas Drop Rows With NaN Using the DataFrame.notna() Method

The DataFrame.notna() method returns a boolean DataFrame with the same shape (rows and columns) as the caller DataFrame. Each element that is not NaN is mapped to True, and each NaN element is mapped to False. The same method is available on a single column (a Series), where it returns a boolean Series.
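To make the mask concrete, here is a minimal sketch on a small made-up DataFrame. The name sample and its values are illustrative assumptions, not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical toy data: one missing Age and one missing Income($)
sample = pd.DataFrame({
    "Age": [19, None, 18],
    "Income($)": [4000, 5000, None],
})

# Boolean DataFrame with the same shape as sample:
# True where a value is present, False where it is NaN
mask = sample.notna()
print(mask)
```

Row 1 is False under Age and row 2 is False under Income($); every other cell is True. The full example that follows applies the same idea to a single column.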

import pandas as pd

data = pd.DataFrame({
    'Name': ['Alice', 'Steven', 'Neesham', 'Chris', 'Alice'],
    'Age':  [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None]
})
print("Initial DataFrame:")
print(data)

print("")

# Keep only the rows whose Income($) value is not NaN
data = data[data['Income($)'].notna()]
print("DataFrame after removing rows with NaN value in Income Field:")
print(data)

Output:

Initial DataFrame:
      Name   Age  Income($)  Expense($)
0    Alice  19.0     4000.0      3000.0
1   Steven   NaN     5000.0      2000.0
2  Neesham  18.0        NaN      2500.0
3    Chris  21.0     3500.0     25000.0
4    Alice   NaN        NaN         NaN

DataFrame after removing rows with NaN value in Income Field:
     Name   Age  Income($)  Expense($)
0   Alice  19.0     4000.0      3000.0
1  Steven   NaN     5000.0      2000.0
3   Chris  21.0     3500.0     25000.0

Here, we apply the notna() method to the Income($) column, which returns a boolean Series that is True where the value is present and False where it is NaN. When we use that Series to index the original DataFrame, we get only the rows without NaN in the Income($) column. NaN values in other columns are not considered, which is why row 1 (missing Age) is kept.
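The same boolean-indexing idea can be extended from one column to the whole DataFrame. The sketch below is one possible way to do it, assuming we want to keep only rows with no NaN at all; the names complete_rows and cleaned are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
    "Age": [19, None, 18, 21, None],
    "Income($)": [4000, 5000, None, 3500, None],
    "Expense($)": [3000, 2000, 2500, 25000, None],
})

# notna() gives a boolean DataFrame; all(axis=1) reduces each row
# to a single True/False: True only if every column is non-NaN
complete_rows = data.notna().all(axis=1)

# Boolean indexing keeps the rows marked True
cleaned = data[complete_rows]
print(cleaned)
```

Only rows 0 and 3 survive, since every other row has at least one missing value.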

Pandas Drop Rows Only With NaN Values for All Columns Using DataFrame.dropna() Method

import pandas as pd

data = pd.DataFrame({
    'Id': [621, 645, 210, 345, None],
    'Age':  [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None]
})
print("Initial DataFrame:")
print(data)

print("")

# Drop a row only when every column in that row is NaN
data = data.dropna(how='all')
print("DataFrame after removing rows with NaN value in All Columns:")
print(data)

Output:

Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in All Columns:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0

The method removes only the rows in which every field is NaN. Setting how='all' in dropna() tells it to drop a row only if all of the row's column values are NaN, so rows 1 and 2, which are missing a single value each, are kept.
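A quick way to see what how='all' changes is to run both settings on the same data. This is a minimal sketch on a made-up DataFrame; the name sample and the result variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data: row 1 is partly missing, row 2 is entirely missing
sample = pd.DataFrame({
    "Id": [621, 645, None],
    "Age": [19, None, None],
})

# how="all": drop a row only if every value in it is NaN
all_nan_dropped = sample.dropna(how="all")

# how="any": drop a row if at least one value in it is NaN
any_nan_dropped = sample.dropna(how="any")

print(all_nan_dropped)
print(any_nan_dropped)
```

With how="all" only row 2 is removed, while how="any" also removes row 1.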

Pandas Drop Rows Only With NaN Values for a Particular Column Using DataFrame.dropna() Method

import pandas as pd

data = pd.DataFrame({
    'Id': [621, 645, 210, 345, None],
    'Age':  [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None]
})
print("Initial DataFrame:")
print(data)

print("")

# Only the Id column is checked for NaN when deciding which rows to drop
data = data.dropna(subset=["Id"])
print("DataFrame after removing rows with NaN value in Id Column:")
print(data)

Output:

Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in Id Column:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0

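It drops only the rows that have a NaN value in the Id column. NaN values in the other columns are ignored, which is why rows 1 and 2 are kept. In this example the result matches the how='all' case because the only row with a missing Id is also missing every other value.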
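Because the Id example gives the same result as how='all', it may help to see subset on a column where the two differ. The sketch below reuses the tutorial's data; the result variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "Id": [621, 645, 210, 345, None],
    "Age": [19, None, 18, 21, None],
    "Income($)": [4000, 5000, None, 3500, None],
    "Expense($)": [3000, 2000, 2500, 25000, None],
})

# Check only Age: rows 1 and 4 have a missing Age and are dropped
no_missing_age = data.dropna(subset=["Age"])

# Check Age and Income($): a NaN in either column drops the row
no_missing_age_income = data.dropna(subset=["Age", "Income($)"])

print(no_missing_age)
print(no_missing_age_income)
```

Row 2 survives the first call even though its Income($) is NaN, because only Age is inspected. Listing both columns removes it as well.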

Pandas Drop Rows With NaN Values for Any Column Using DataFrame.dropna() Method

import pandas as pd

data = pd.DataFrame({
    'Id': [621, 645, 210, 345, None],
    'Age':  [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None]
})
print("Initial DataFrame:")
print(data)

print("")

# With no arguments, a row is dropped if any of its columns is NaN
data = data.dropna()
print("DataFrame after removing rows with NaN value in any column:")
print(data)

Output:

Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in any column:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
3  345.0  21.0     3500.0     25000.0

By default (how='any'), the dropna() method removes every row that has at least one NaN value. Note that the surviving rows keep their original index labels (0 and 3 here).
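If the gaps left in the index are unwanted, one common follow-up is to renumber the rows after dropping. This is a minimal sketch, not part of the original example; chaining reset_index(drop=True) is one option among several.

```python
import pandas as pd

data = pd.DataFrame({
    "Id": [621, 645, 210, 345, None],
    "Age": [19, None, 18, 21, None],
    "Income($)": [4000, 5000, None, 3500, None],
    "Expense($)": [3000, 2000, 2500, 25000, None],
})

# Drop rows with any NaN, then renumber the remaining rows from 0.
# drop=True discards the old index instead of adding it as a column.
cleaned = data.dropna().reset_index(drop=True)
print(cleaned)
```

The two remaining rows (originally labelled 0 and 3) are now labelled 0 and 1.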
