Pandas DataFrame isin Function
- DataFrame.isin() Method
- Filter Rows of Pandas DataFrame Using the DataFrame.isin() Method
- Filter Rows of Pandas DataFrame With Specified Values for a Column Using the DataFrame.isin() Method
- Filter Rows of Pandas DataFrame Based on Values of Multiple Columns Using the DataFrame.isin() Method
The Pandas DataFrame.isin() method checks whether each element of the DataFrame is contained in the given values.
import pandas as pd
students_df = pd.DataFrame({
'Name': ["Jonathan", "Will", "Michael", "Liva", "Sia", "Alice"],
'Age': [10, 11, 9, 10, 10, 11],
'Group': ["A", "B", "A", "A", "B", "B"],
'GPA': [3.2, 3.5, 4.0, 2.9, 4.0, 3.6]
})
print(students_df)
Output:
       Name  Age Group  GPA
0  Jonathan   10     A  3.2
1      Will   11     B  3.5
2   Michael    9     A  4.0
3      Liva   10     A  2.9
4       Sia   10     B  4.0
5     Alice   11     B  3.6
We will use this DataFrame as a running example to explain how to filter rows of a Pandas DataFrame in Python using the DataFrame.isin() method.
DataFrame.isin() Method
Syntax
DataFrame.isin(values)
Parameters
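values: iterable (list, tuple, set, etc.), dict, Series, or DataFrame. The values to look for in the DataFrame. If values is a dict, its keys must be column names. If values is a Series or DataFrame, a location is True only if its labels also match (the index for a Series; both the index and the column labels for a DataFrame).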
Return
It returns a DataFrame of Booleans with the same shape as the caller DataFrame, indicating whether each element is contained in values.
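To make the return value concrete, the sketch below calls isin() on the whole students_df DataFrame instead of on a single column. The list ["A", 10] is an arbitrary choice for illustration: every cell equal to "A" or 10 becomes True, and every other cell becomes False, so the result has exactly the same rows and columns as students_df.

```python
import pandas as pd

# Same sample data as in the rest of this article.
students_df = pd.DataFrame({
    'Name': ["Jonathan", "Will", "Michael", "Liva", "Sia", "Alice"],
    'Age': [10, 11, 9, 10, 10, 11],
    'Group': ["A", "B", "A", "A", "B", "B"],
    'GPA': [3.2, 3.5, 4.0, 2.9, 4.0, 3.6]
})

# Check every cell of the DataFrame against the list of values.
mask = students_df.isin(["A", 10])

print(mask)
print(mask.shape == students_df.shape)  # prints True: same shape as the caller
```

In this mask, only the Age cells equal to 10 and the Group cells equal to "A" are True; the Name and GPA columns are all False.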
Filter Rows of Pandas DataFrame Using the DataFrame.isin() Method
import pandas as pd
students_df = pd.DataFrame({
'Name': ["Jonathan", "Will", "Michael", "Liva", "Sia", "Alice"],
'Age': [10, 11, 9, 10, 10, 11],
'Group': ["A", "B", "A", "A", "B", "B"],
'GPA': [3.2, 3.5, 4.0, 2.9, 4.0, 3.6]
})
print("The initial DataFrame is:")
print(students_df, "\n")
boolean_indicies = students_df["Group"].isin(["A"])
filtered_df = students_df[boolean_indicies]
print("The DataFrame of students from Group A is:")
print(filtered_df, "\n")
Output:
The initial DataFrame is:
       Name  Age Group  GPA
0  Jonathan   10     A  3.2
1      Will   11     B  3.5
2   Michael    9     A  4.0
3      Liva   10     A  2.9
4       Sia   10     B  4.0
5     Alice   11     B  3.6
The DataFrame of students from Group A is:
       Name  Age Group  GPA
0  Jonathan   10     A  3.2
2   Michael    9     A  4.0
3      Liva   10     A  2.9
It applies the isin() method to the Group column of the students_df DataFrame, and the method returns a Series of Boolean values. Each value in the Series is True if the Group column of that row is A, and False otherwise.
We then use the Series boolean_indicies to filter the rows of the students_df DataFrame: only the rows whose value in boolean_indicies is True are selected.
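The following sketch, assuming the same students_df data, prints the Boolean mask itself and shows that indexing with [] and with .loc keeps the same rows. The name is_group_a is our own and plays the role of boolean_indicies above. The ~ operator inverts the mask, so it keeps exactly the rows that the original mask drops.

```python
import pandas as pd

students_df = pd.DataFrame({
    'Name': ["Jonathan", "Will", "Michael", "Liva", "Sia", "Alice"],
    'Age': [10, 11, 9, 10, 10, 11],
    'Group': ["A", "B", "A", "A", "B", "B"],
    'GPA': [3.2, 3.5, 4.0, 2.9, 4.0, 3.6]
})

# Boolean Series aligned with students_df's index (hypothetical name).
is_group_a = students_df["Group"].isin(["A"])
print(is_group_a.tolist())

# [] and .loc both keep the rows where the mask is True.
filtered_df = students_df[is_group_a]
filtered_loc = students_df.loc[is_group_a]
print(filtered_df.equals(filtered_loc))

# ~ flips True and False, keeping the students NOT in Group A.
not_group_a = students_df[~is_group_a]
print(not_group_a["Name"].tolist())
```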
Filter Rows of Pandas DataFrame With Specified Values for a Column Using the DataFrame.isin() Method
import pandas as pd
students_df = pd.DataFrame({
'Name': ["Jonathan", "Will", "Michael", "Liva", "Sia", "Alice"],
'Age': [10, 11, 9, 10, 10, 11],
'Group': ["A", "B", "A", "A", "B", "B"],
'GPA': [3.2, 3.5, 4.0, 2.9, 4.0, 3.6]
})
print("The initial DataFrame is:")
print(students_df, "\n")
boolean_indicies = students_df["Age"].isin([10, 11])
filtered_df = students_df[boolean_indicies]
print("The DataFrame of students aged 10 or 11 is:")
print(filtered_df, "\n")
Output:
The initial DataFrame is:
       Name  Age Group  GPA
0  Jonathan   10     A  3.2
1      Will   11     B  3.5
2   Michael    9     A  4.0
3      Liva   10     A  2.9
4       Sia   10     B  4.0
5     Alice   11     B  3.6
The DataFrame of students aged 10 or 11 is:
       Name  Age Group  GPA
0  Jonathan   10     A  3.2
1      Will   11     B  3.5
3      Liva   10     A  2.9
4       Sia   10     B  4.0
5     Alice   11     B  3.6
It selects all rows of the students_df DataFrame whose Age column has the value 10 or 11. Note that isin() checks for exact matches against the listed values; it is not a range check.
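Because values can be any iterable, a set works just as well as a list. The sketch below, assuming the same data, also contrasts isin() with a comparison: isin([10, 11]) matches the exact values 10 and 11, while students_df["Age"] > 10 is a range condition that excludes students aged 10.

```python
import pandas as pd

students_df = pd.DataFrame({
    'Name': ["Jonathan", "Will", "Michael", "Liva", "Sia", "Alice"],
    'Age': [10, 11, 9, 10, 10, 11],
    'Group': ["A", "B", "A", "A", "B", "B"],
    'GPA': [3.2, 3.5, 4.0, 2.9, 4.0, 3.6]
})

# A list and a set holding the same values select the same rows.
by_list = students_df[students_df["Age"].isin([10, 11])]
by_set = students_df[students_df["Age"].isin({10, 11})]
print(by_list.equals(by_set))
print(by_list["Name"].tolist())

# A comparison is a range condition: age 10 is NOT greater than 10.
older_than_10 = students_df[students_df["Age"] > 10]
print(older_than_10["Name"].tolist())
```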
Filter Rows of Pandas DataFrame Based on Values of Multiple Columns Using the DataFrame.isin() Method
import pandas as pd
students_df = pd.DataFrame({
'Name': ["Jonathan", "Will", "Michael", "Liva", "Sia", "Alice"],
'Age': [10, 11, 9, 10, 10, 11],
'Group': ["A", "B", "A", "A", "B", "B"],
'GPA': [3.2, 3.5, 4.0, 2.9, 4.0, 3.6]
})
print("The initial DataFrame is:")
print(students_df, "\n")
boolean_indicies_group = students_df["Group"].isin(["B"])
boolean_indicies_gpa = students_df["GPA"].isin([4.0])
filtered_df = students_df[boolean_indicies_group & boolean_indicies_gpa]
print("The DataFrame of students in Group B with GPA 4.0:")
print(filtered_df, "\n")
Output:
The initial DataFrame is:
       Name  Age Group  GPA
0  Jonathan   10     A  3.2
1      Will   11     B  3.5
2   Michael    9     A  4.0
3      Liva   10     A  2.9
4       Sia   10     B  4.0
5     Alice   11     B  3.6
The DataFrame of students in Group B with GPA 4.0:
  Name  Age Group  GPA
4  Sia   10     B  4.0
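It selects all rows of the students_df DataFrame that have the value B in the Group column and the value 4.0 in the GPA column. The & operator combines the two Boolean Series element-wise, so a row is kept only when both conditions are True.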
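The same filter can also be written with a single isin() call by passing a dictionary that maps column names to their allowed values, then requiring every checked column to be True with all(axis=1). This is a sketch of an alternative, not the approach used above, and the name criteria is hypothetical. We select only the dictionary's columns first because, with a dict argument, isin() marks columns missing from the dict as False, which would make all(axis=1) False for every row.

```python
import pandas as pd

students_df = pd.DataFrame({
    'Name': ["Jonathan", "Will", "Michael", "Liva", "Sia", "Alice"],
    'Age': [10, 11, 9, 10, 10, 11],
    'Group': ["A", "B", "A", "A", "B", "B"],
    'GPA': [3.2, 3.5, 4.0, 2.9, 4.0, 3.6]
})

# Hypothetical mapping: column name -> allowed values for that column.
criteria = {"Group": ["B"], "GPA": [4.0]}

# Check only the columns named in criteria, then keep rows where all are True.
mask = students_df[list(criteria)].isin(criteria).all(axis=1)

print(students_df[mask])
```

This style scales to more columns by adding entries to the dictionary, while the & approach above keeps each condition explicit.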