How to Filter Rows That Contain a Specific String in Pandas

Fariba Laiq Feb 15, 2024
  1. Install Prerequisite Libraries
  2. Create a Pandas DataFrame
  3. Use str.contains() to Filter Rows That Contain a Specific String
  4. Use str.contains() to Filter Rows That Contain a String in a List

The Pandas library handles text data as well as numbers. Text columns come up in many data analysis tasks and in machine learning exploration and pre-processing, where you will often want to keep or drop rows based on the text they contain.

The DataFrame is the primary data structure in the Pandas module. It stores and processes data in tabular form.

One common operation on tabular data is filtering the DataFrame by a substring so that only the relevant rows are extracted. This article walks through that operation step by step.

Install Prerequisite Libraries

To begin filtering a Pandas dataframe, we first need to install the Pandas library. We can do this by running the following command in a terminal:

pip install pandas

It also helps to know which Python version is in use. In this article, we are using version 3.10.4.

We can check the currently installed Python version by running the following command in the terminal:

python --version
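The installed Pandas version can be checked in a similar way from Python itself. This is a minimal sketch; the version string it prints depends on your environment.

```python
import pandas as pd

# Print the installed pandas version string (the value varies by environment).
print(pd.__version__)
```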

Create a Pandas DataFrame

To demonstrate filtering, we need an example dataframe, which the code below creates. It holds the names of five students and their marks out of 100 in two subjects, Biology and Chemistry.

Example Code:

import pandas as pd

data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data)
print(data_frame)

The code above is straightforward. We begin by importing the Pandas library and then initialize the data variable as a dictionary whose keys become the column names and whose lists become the column values.

We then pass the data dictionary to the pd.DataFrame() constructor, which builds the dataframe.

The following dataframe is generated when we run the code.

Output:

[Image: Pandas DataFrame Example]
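A quick way to confirm that the dataframe was built as intended is to inspect its shape and column names. The sketch below reuses the same data dictionary as above.

```python
import pandas as pd

data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data)

# shape is (number of rows, number of columns).
print(data_frame.shape)  # (5, 3)

# Column names follow the order of the dictionary keys.
print(list(data_frame.columns))  # ['Student_Name', 'Biology', 'Chemistry']
```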

Use str.contains() to Filter Rows That Contain a Specific String

Now that we’ve created our dataframe, we can move on to the filtering step. Let’s suppose we want to select the data for the student Suharwardy; the result should be every row stored against Suharwardy.

We can perform this operation using the str.contains() method. In the snippet below, we access the Student_Name column and call str.contains() on it, which returns True for each row whose name contains Suharwardy. Indexing the dataframe with that result keeps only those rows.

Example Code:

import pandas as pd

data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data=data)
df = data_frame[data_frame["Student_Name"].str.contains("Suharwardy")]
print(df)

Output:

[Image: Pandas Filter Rows Containing a String Using str.contains]
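To see what happens in the filtering line, it can help to split it into two steps. The sketch below uses the same student data; the variable names mask, filtered, and partial are only illustrative.

```python
import pandas as pd

data_frame = pd.DataFrame(
    {
        "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
        "Biology": [68, 73, 87, 58, 78],
        "Chemistry": [78, 98, 89, 73, 87],
    }
)

# str.contains() returns a boolean Series: True where the substring is found.
mask = data_frame["Student_Name"].str.contains("Suharwardy")
print(mask.tolist())  # [False, True, False, False, False]

# Indexing with the mask keeps only the rows marked True.
filtered = data_frame[mask]
print(filtered)

# A shorter substring also matches, because the test is "contains", not "equals".
partial = data_frame[data_frame["Student_Name"].str.contains("ar")]
print(partial["Student_Name"].tolist())  # ['Suharwardy', 'Karen']
```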

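Another way to write this is to access the Student_Name column with the dot operator, which gives the same results. Dot access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method; bracket access works for any column name.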

Example Code:

import pandas as pd

data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data=data)
df = data_frame[data_frame.Student_Name.str.contains("Suharwardy")]
print(df)

Output:

[Image: Pandas Filter Rows Containing a String Using str.contains and Dot Operator]
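Dot access has a limitation worth knowing about. The sketch below uses a hypothetical dataframe, df_clash, whose column label count collides with the DataFrame.count() method; it is an illustration, not part of the original example data.

```python
import pandas as pd

# Hypothetical frame: the column label "count" clashes with DataFrame.count().
df_clash = pd.DataFrame({"Student_Name": ["Anil", "Karen"], "count": [3, 5]})

# Dot access resolves to the DataFrame method, not the column.
print(callable(df_clash.count))  # True

# Bracket access always returns the column.
print(df_clash["count"].tolist())  # [3, 5]

# Dot access is fine for a label that is a valid identifier and shadows nothing.
matched = df_clash[df_clash.Student_Name.str.contains("Kar")]
print(matched)
```

A label that contains a space, such as "Student Name", cannot be written with the dot operator at all, so bracket access is the safer default.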

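The str.contains() method also has a regex parameter, which defaults to True. Setting it to False makes Pandas treat the pattern as a literal string rather than a regular expression, which avoids surprises with special characters and can be faster.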

Example Code:

import pandas as pd

data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data=data)
df = data_frame[data_frame.Student_Name.str.contains("Suharwardy", regex=False)]
print(df)

Output:

[Image: Pandas Filter Rows Containing a String Using str.contains and regex]
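The effect of regex=False is easiest to see with strings that contain regular expression metacharacters. The sketch below uses made-up values (courses and versions are hypothetical) rather than the student data.

```python
import pandas as pd

# Hypothetical data: values that contain regex metacharacters.
courses = pd.Series(["C++ basics", "C basics", "Intro (part 1)", "Intro part 1"])

# regex=False treats the pattern as plain text, so "+" and "(" need no escaping.
plus_matches = courses[courses.str.contains("C++", regex=False)].tolist()
print(plus_matches)  # ['C++ basics']

paren_matches = courses[courses.str.contains("(part", regex=False)].tolist()
print(paren_matches)  # ['Intro (part 1)']

# With the default regex=True, "." is a wildcard that matches any character.
versions = pd.Series(["v1.0", "v1x0"])
as_regex = versions.str.contains("1.0").tolist()
as_literal = versions.str.contains("1.0", regex=False).tolist()
print(as_regex)    # [True, True]
print(as_literal)  # [True, False]
```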

This is how we can filter a Pandas dataframe using the str.contains() method and specify the particulars of information we want to extract.
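Two other parameters of str.contains() are often useful when the data is less tidy than in this article: case and na. The sketch below assumes a hypothetical column with mixed letter case and a missing value.

```python
import pandas as pd

# Hypothetical column with mixed case and a missing value.
names = pd.DataFrame(
    {"Student_Name": ["Anil", "SUHARWARDY", None, "suharwardy jr"]}
)

# case=False ignores letter case; na=False treats missing values as "no match",
# so the mask holds only True/False and is safe to use for indexing.
mask = names["Student_Name"].str.contains("suharwardy", case=False, na=False)
print(mask.tolist())  # [False, True, False, True]

print(names[mask])
```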

Use str.contains() to Filter Rows That Contain a String in a List

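The code below shows how to filter for dataframe rows that contain ID1 or ID2 in the ID column. Because str.contains() interprets the pattern as a regular expression by default, the | character acts as an OR between the two strings.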

Example Code:

import pandas as pd

d1 = {
    "ID": [
        "ID1",
        "ID1",
        "ID2",
        "ID2",
        "ID3",
        "ID3",
    ],
    "Names": ["Harry", "Petter", "Daniel", "Ron", "Sofia", "Kelvin"],
    "marks": [70, 80, 90, 70, 60, 90],
}
df = pd.DataFrame(d1)
print(df)
s = df[df["ID"].str.contains("ID1|ID2")]
print("use of str.contains() : ")
print(s)

Output:

[Image: Pandas Filter Rows Containing a String in a List Using str.contains]
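When the strings to search for are held in an actual Python list, the pattern can be built from it instead of being typed by hand. The sketch below assumes a hypothetical list named wanted and escapes each item with re.escape() so that any special characters are matched literally.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "ID": ["ID1", "ID1", "ID2", "ID2", "ID3", "ID3"],
        "Names": ["Harry", "Petter", "Daniel", "Ron", "Sofia", "Kelvin"],
        "marks": [70, 80, 90, 70, 60, 90],
    }
)

# Hypothetical list of substrings to look for.
wanted = ["ID1", "ID2"]

# Escape each item, then join the items with the regex OR operator.
pattern = "|".join(re.escape(item) for item in wanted)
print(pattern)  # ID1|ID2

by_substring = df[df["ID"].str.contains(pattern)]
print(by_substring)

# When whole-value matches are enough, isin() avoids regular expressions entirely.
by_exact = df[df["ID"].isin(wanted)]
print(by_exact.equals(by_substring))  # True
```

The two approaches agree here only because every ID is matched in full; isin() would not find ID1 inside a longer value such as "ID10", while str.contains() would.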

Author: Fariba Laiq