How to Filter DataFrame Rows Based on the Date in Pandas

Suraj Joshi Feb 02, 2024
  1. Select Rows Between Two Dates With Boolean Mask
  2. pandas.DataFrame.query() to Select DataFrame Rows Between Two Dates
  3. pandas.DataFrame.isin() to Select DataFrame Rows Between Two Dates
  4. pandas.Series.between() to Select DataFrame Rows Between Two Dates

We can filter DataFrame rows by date in Pandas using a boolean mask with the loc indexer, or by slicing a date index. We can also use the query(), isin(), and between() methods to select rows based on the date.

Select Rows Between Two Dates With Boolean Mask

To filter DataFrame rows based on the date using a boolean mask, we first create the mask using the syntax:

mask = (df["col"] > start_date) & (df["col"] <= end_date)

Here, start_date and end_date are datetime values (or date strings that Pandas can parse), and they mark the start and end of the range to filter. With this mask, a row that falls exactly on start_date is excluded, while a row on end_date is included. We then select the rows of the DataFrame that lie within the range using the df.loc indexer.

import pandas as pd

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
df = pd.DataFrame({"Joined date": pd.to_datetime(list_of_dates)}, index=employees)

# Keep rows after the start date and up to (and including) the end date
mask = (df["Joined date"] > "2019-06-01") & (df["Joined date"] <= "2020-02-05")
filtered_df = df.loc[mask]
print(filtered_df)

Output:

        Joined date
Hisila   2019-11-20
Shristi  2020-01-02
Zeppy    2020-02-05
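
The choice between > and >= decides whether a row that falls exactly on a boundary date is kept. The following is a minimal sketch of that difference on a shortened copy of the sample data; the names start, end, exclusive_start, and inclusive_start are chosen only for this illustration.

```python
import pandas as pd

# Shortened copy of the sample data used in this article
df = pd.DataFrame(
    {
        "Joined date": pd.to_datetime(
            ["2019-11-20", "2020-01-02", "2020-02-05", "2020-03-10"]
        )
    },
    index=["Hisila", "Shristi", "Zeppy", "Alina"],
)

# Illustrative boundaries; start matches Hisila's joining date exactly
start = pd.Timestamp("2019-11-20")
end = pd.Timestamp("2020-02-05")

# Strict ">" drops the row that falls exactly on the start date
exclusive_start = df.loc[(df["Joined date"] > start) & (df["Joined date"] <= end)]

# ">=" keeps that row
inclusive_start = df.loc[(df["Joined date"] >= start) & (df["Joined date"] <= end)]

print(exclusive_start.index.tolist())  # ['Shristi', 'Zeppy']
print(inclusive_start.index.tolist())  # ['Hisila', 'Shristi', 'Zeppy']
```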

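We can simplify the above process by setting the date column as the index and slicing it with df.loc[start_date:end_date]. Unlike the mask above, this label-based slice includes both start_date and end_date. Slicing with dates that do not appear in the index works most reliably when the index is sorted.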

import pandas as pd

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
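# Use the date column as the index so rows can be sliced by date
df = df.set_index("Joined date")

# Label-based slice; both endpoints are included
filtered_df = df.loc["2019-06-01":"2020-02-05"]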
print(filtered_df)

Output:

                Name  Salary
Joined date                 
2019-11-20    Hisila     200
2020-01-02   Shristi     400
2020-02-05     Zeppy     300
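
When the rows are not already in date order, sorting the index first keeps date slicing predictable. The sketch below assumes a shuffled copy of the sample data and also shows partial string indexing, where a string such as "2020" selects every row in that year; the names by_date, in_range, and in_2020 are illustrative only.

```python
import pandas as pd

# Sample rows deliberately listed out of date order
df = pd.DataFrame(
    {
        "Name": ["Zeppy", "Hisila", "Alina", "Shristi"],
        "Joined date": pd.to_datetime(
            ["2020-02-05", "2019-11-20", "2020-03-10", "2020-01-02"]
        ),
    }
)

# Sort the DatetimeIndex so slices with dates absent from the index still work
by_date = df.set_index("Joined date").sort_index()

# Slice between two dates; both endpoints are included
in_range = by_date.loc["2019-06-01":"2020-02-05"]

# Partial string indexing: every row whose date falls in 2020
in_2020 = by_date.loc["2020"]

print(in_range["Name"].tolist())  # ['Hisila', 'Shristi', 'Zeppy']
print(in_2020["Name"].tolist())   # ['Shristi', 'Zeppy', 'Alina']
```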

pandas.DataFrame.query() to Select DataFrame Rows Between Two Dates

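We can also filter DataFrame rows based on the date using the pandas.DataFrame.query() method. The method returns the rows of the DataFrame for which the provided query expression is true. The column is named Joined_date here because a column name containing a space would have to be wrapped in backticks inside the expression.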

import pandas as pd

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined_date": pd.to_datetime(list_of_dates), "Salary": salary}
)

# Both comparisons are inclusive, so rows on either boundary date are kept
filtered_df = df.query("Joined_date >= '2019-06-01' and Joined_date <= '2020-02-05'")
print(filtered_df)

Output:

      Name Joined_date  Salary
0   Hisila  2019-11-20     200
1  Shristi  2020-01-02     400
2    Zeppy  2020-02-05     300
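
Hard-coding dates inside the query string becomes awkward when the range is held in variables. As a sketch, query() can reference Python variables by prefixing them with @; start_date and end_date below are illustrative names, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Hisila", "Shristi", "Zeppy", "Alina"],
        "Joined_date": pd.to_datetime(
            ["2019-11-20", "2020-01-02", "2020-02-05", "2020-03-10"]
        ),
    }
)

# Illustrative range boundaries held in ordinary Python variables
start_date = pd.Timestamp("2019-06-01")
end_date = pd.Timestamp("2020-02-05")

# "@name" inside the expression refers to a variable in the calling scope
filtered_df = df.query("Joined_date >= @start_date and Joined_date <= @end_date")

print(filtered_df["Name"].tolist())  # ['Hisila', 'Shristi', 'Zeppy']
```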

pandas.DataFrame.isin() to Select DataFrame Rows Between Two Dates

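The isin() method returns booleans that indicate whether each element is contained in a given collection of values. Here we call it on the Joined_date column (pandas.Series.isin()) and pass it every date in the range, generated with pandas.date_range(). We then use the resulting boolean Series to filter the DataFrame rows.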

import pandas as pd

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined_date": pd.to_datetime(list_of_dates), "Salary": salary}
)

# Keep rows whose date matches one of the daily timestamps in the range
filtered_df = df[df["Joined_date"].isin(pd.date_range("2019-06-01", "2020-02-05"))]
print(filtered_df)

Output:

      Name Joined_date  Salary
0   Hisila  2019-11-20     200
1  Shristi  2020-01-02     400
2    Zeppy  2020-02-05     300

pandas.date_range() returns a fixed-frequency DatetimeIndex, with one entry per day by default. Its first parameter is the starting date, and the second parameter is the ending date; both are included in the result.
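
Because isin() tests for exact matches, a timestamp that carries a time of day will not match the midnight values that date_range() produces. The sketch below uses made-up timestamps to show the effect and one possible workaround, stripping the time with dt.normalize() before the membership test; the names days, exact, and by_day are illustrative only.

```python
import pandas as pd

# Made-up data: the first timestamp includes a time of day
df = pd.DataFrame(
    {
        "Name": ["Hisila", "Shristi"],
        "Joined_date": pd.to_datetime(["2019-11-20 09:30", "2020-01-02 00:00"]),
    }
)

# One Timestamp per day, each at midnight
days = pd.date_range("2019-06-01", "2020-02-05")

# Exact membership: 09:30 is not a midnight value, so Hisila is dropped
exact = df[df["Joined_date"].isin(days)]

# Normalizing resets each time to midnight before the comparison
by_day = df[df["Joined_date"].dt.normalize().isin(days)]

print(exact["Name"].tolist())   # ['Shristi']
print(by_day["Name"].tolist())  # ['Hisila', 'Shristi']
```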

pandas.Series.between() to Select DataFrame Rows Between Two Dates

We can also use pandas.Series.between() to filter a DataFrame based on the date. The method returns a boolean Series that indicates whether each element lies within the specified range; by default, both boundaries are included. We then pass this boolean Series to the loc indexer to extract the matching rows.

import pandas as pd

list_of_dates = [
    "2019-11-20",
    "2020-01-02",
    "2020-02-05",
    "2020-03-10",
    "2020-04-16",
    "2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
    {"Name": employees, "Joined_date": pd.to_datetime(list_of_dates), "Salary": salary}
)

# between() includes both boundary dates by default
filtered_df = df.loc[df["Joined_date"].between("2019-06-01", "2020-02-05")]
print(filtered_df)

Output:

      Name Joined_date  Salary
0   Hisila  2019-11-20     200
1  Shristi  2020-01-02     400
2    Zeppy  2020-02-05     300
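
Whether the boundary dates themselves are kept can be controlled with the inclusive parameter of between(). This sketch assumes pandas 1.3 or later, where inclusive accepts the strings "both", "neither", "left", and "right"; the variable names both and neither are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Hisila", "Shristi", "Zeppy", "Alina"],
        "Joined_date": pd.to_datetime(
            ["2019-11-20", "2020-01-02", "2020-02-05", "2020-03-10"]
        ),
    }
)

# Default behavior: rows on either boundary date are kept
both = df.loc[df["Joined_date"].between("2019-11-20", "2020-02-05")]

# Assumes pandas >= 1.3: exclude rows that fall exactly on a boundary
neither = df.loc[
    df["Joined_date"].between("2019-11-20", "2020-02-05", inclusive="neither")
]

print(both["Name"].tolist())     # ['Hisila', 'Shristi', 'Zeppy']
print(neither["Name"].tolist())  # ['Shristi']
```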
Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
