How to Filter Dataframe Rows Based on Column Values in Pandas

Suraj Joshi Feb 02, 2024
  1. Select Pandas Rows Based on Specific Column Value
  2. Select Pandas Rows Which Do Not Contain Specific Column Value
  3. Select Pandas Rows With Column Values Greater Than or Smaller Than Specific Value
  4. Select Pandas Rows Based on Multiple Column Values
  5. Select DataFrame Rows With Multiple Conditions
How to Filter Dataframe Rows Based on Column Values in Pandas

We can select rows of DataFrame based on single or multiple column values. We can also get rows from DataFrame satisfying or not satisfying one or more conditions. This can be accomplished using boolean indexing, positional indexing, label indexing, and query() method.

Select Pandas Rows Based on Specific Column Value

We can select pandas rows from a DataFrame that contains or does not contain the specific value for a column. It is widely used in filtering the DataFrame based on column value.

Select Pandas Rows Which Contain Specific Column Value

Filter Using Boolean Indexing

In Boolean indexing, we at first generate a mask which is just a series of boolean values representing whether the column contains the specific element or not.

df_mask = df["col_name"] == "specific_value"

We then apply this mask to our original DataFrame to filter the required values.

filtered_df = df[df_mask]

This returns the filtered DataFrame containing only rows that have the specific_value for column col_name.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

df_mask = df["Sales"] == 300
filtered_df = df[df_mask]
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
4  April-14    300      3
5  April-16    300      2

This gives all the rows in the df whose Sales values is 300.

Filter Using Positional Indexing

It is similar to boolean indexing but involves one extra step. In this method, we first create a boolean mask and then find positions at which boolean mask has True value. Then we use pass all the positions with True value in the mask to the iloc() method so that all the desired rows are only selected.

import pandas as pd
import numpy as np

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

df_mask = df["Sales"] == 300
positions = np.flatnonzero(df_mask)
filtered_df = df.iloc[positions]
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
4  April-14    300      3
5  April-16    300      2

This also gives all the rows in the df whose Sales values is 300.

Pandas Chaining

We also can use Pandas Chaining to filter pandas dataframe filter by column value. In this method, we use pandas.DataFrame.eq() method for the DataFrame column whose values are to be checked to compare element-wise equality in DataFrame.

import pandas as pd
import numpy as np

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

filtered_df = df[df.Sales.eq(300)]
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
4  April-14    300      3
5  April-16    300      2

pandas.DataFrame.query()

We could use pandas.DataFrame.query() to select rows by column value in Pandas.

import pandas as pd
import numpy as np

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

filtered_df = df.query("Sales == 300")
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
4  April-14    300      3
5  April-16    300      2

If we wish to update the existing DataFrame, we can set inplace=True in query method.

Select Pandas Rows Which Do Not Contain Specific Column Value

The method to select Pandas rows that don’t contain specific column value is similar to that in selecting Pandas rows with specific column value. The only thing we need to change is the condition that the column does not contain specific value by just replacing == with != when creating masks or queries.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

df_mask = df["Sales"] != 300
filtered_df = df[df_mask]
print(filtered_df)

Output:

       Date  Sales  Price
0  April-10    200      3
2  April-12    400      2
3  April-13    200      4

This selects all the rows of df whose Sales values are not 300.

Select Pandas Rows With Column Values Greater Than or Smaller Than Specific Value

To select Pandas rows with column values greater than or smaller than specific value, we use operators like >, <=, >= while creating masks or queries.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

df_mask = df["Sales"] >= 300
filtered_df = df[df_mask]
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
2  April-12    400      2
4  April-14    300      3
5  April-16    300      2

This results in DataFrame with values of Sales greater than or equal to 300.

Select Pandas Rows Based on Multiple Column Values

We have introduced methods of selecting rows based on specific values of column in DataFrame. In this section, we will discuss methods to select Pandas rows based on multiple column values.

Select Pandas Rows Which Contain Any One of Multiple Column Values

To select Pandas rows that contain any one of multiple column values, we use pandas.DataFrame.isin(values) which returns DataFrame of booleans showing whether each element in the DataFrame is contained in values or not. The DataFrame of booleans thus obtained can be used to select rows.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

values = [200, 400]
filtered_df = df[df.Sales.isin(values)]
print(filtered_df)

Output:

       Date  Sales  Price
0  April-10    200      3
2  April-12    400      2
3  April-13    200      4

It filters all the rows from DataFrame whose Sales value is either 200 or 400.

Select Pandas Rows Which Does Not Contain Any One of Multiple Specified Column Values

To select the rows of a DataFrame which does not contain any one of multiple specified column values, we will negate the DataFrame of booleans returned from pandas.DataFrame.isin(values) by placing ~ sign at the front.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

values = [200, 400]
filtered_df = df[~df.Sales.isin(values)]
print(filtered_df)

Output:

       Date  Sales  Price
1  April-11    300      1
4  April-14    300      3
5  April-16    300      2

It filters all the rows from DataFrame whose Sales value is neither 200 nor 400.

Select DataFrame Rows With Multiple Conditions

If we want to filter rows considering row values of multiple columns, we make multiple conditions and combine them with & operators. Now, the row is only selected when it satisfies conditions for all the columns.

import pandas as pd

dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]

df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})

values_1 = [200, 400]
values_2 = [2, 3]
filtered_df = df[df.Sales.isin(values_1) & ~df.Price.isin(values_2)]
print(filtered_df)

Output:

       Date  Sales  Price
3  April-13    200      4

It filters all the rows from DataFrame whose Sales value is either 200 or 400 and Price value is neither 2 nor 3. The row in the output only satisfies both conditions in entire DataFrame.

Author: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn

Related Article - Pandas DataFrame

Related Article - Pandas DataFrame Row