How to Get Substring in Pandas

Fariba Laiq Feb 02, 2024
  1. Get Substring From Pandas DataFrame Column Values
  2. Extract the First N Characters From a String
  3. Extract the Last N Characters From a String
  4. Extract Any Substring From the Middle of a String

Pandas is an open-source data analysis library in Python. It provides many built-in methods to perform operations on numerical and text data.

In this guide, we will get a substring (part of a string) from the values of a pandas data frame column through different approaches. It could be helpful when we want to extract some meaningful substring from a string.

Get Substring From Pandas DataFrame Column Values

We will use string slicing methods to achieve this task. The str.slice() method returns a portion of a string without modifying the actual string.

Syntax:

# Python 3.x
df.column_name.str.slice(start_index, end_index)

We can also slice strings by using the str accessor with square brackets ([]).

# Python 3.x
df.column_name.str[start_index:end_index]
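As a minimal sketch of the two forms above, the following snippet applies both to a small Series and prints the results side by side. The Series name processors and its sample values are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
processors = pd.Series(["Intel Core i7", "AMD Ryzen 5"])

# Method form: start index 0 is included, end index 5 is excluded
with_slice = processors.str.slice(0, 5)

# Bracket form: the same slice written in Python's slice notation
with_brackets = processors.str[0:5]

print(with_slice.tolist())
print(with_brackets.tolist())

# Both forms return a new Series; the original values are left as they were
print(processors.tolist())
```

Both calls should print the same list, so the choice between them is mostly a matter of taste.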

Extract the First N Characters From a String

In the following example, we have a Pandas data frame whose Processor column holds complete processor names. To get the substring Intel (the first five characters), we specify 0 and 5 as the start and end indexes, respectively.

With the square bracket method, we can also leave out the start index and give only the end index, because str[:5] and str[0:5] have the same meaning.

Example Code:

# Python 3.x
import pandas as pd

# Build a one-column DataFrame of full processor names
data = {"Processor": ["Intel Core i7", "Intel Core i3", "Intel Core i5", "Intel Core i9"]}
df = pd.DataFrame(data)
print(df)

# Positions 0 to 4 hold the brand name; the end index 5 itself is excluded
df["Brand Name"] = df["Processor"].str.slice(0, 5)
print(df)

Output:

[Figure: Extract First N Characters From a String]
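The variant with the start index left out can be sketched as follows. This is an illustrative rewrite of the example above, not a different technique; the keyword form str.slice(stop=5) is included only to show that the method accepts the same omission.

```python
import pandas as pd

df = pd.DataFrame(
    {"Processor": ["Intel Core i7", "Intel Core i3", "Intel Core i5", "Intel Core i9"]}
)

# Leaving out the start index means "start from position 0"
df["Brand Name"] = df["Processor"].str[:5]

# The method form allows the same omission through the stop keyword
same_result = df["Processor"].str.slice(stop=5)

print(df)
print(same_result.tolist())
```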

Extract the Last N Characters From a String

If we want to extract the brand modifier (the last two characters) from the string, we use negative indexing in the string slicing. We pass -2 as the start index (the index of the second-to-last character) and leave the end index empty.

The slice then runs from that position to the end of the string, which gives us the last two characters.

Example Code:

# Python 3.x
import pandas as pd

# Build a one-column DataFrame of full processor names
data = {"Processor": ["Intel Core i7", "Intel Core i3", "Intel Core i5", "Intel Core i9"]}
df = pd.DataFrame(data)
print(df)

# A start index of -2 with no end index keeps the last two characters
df["Brand Modifier"] = df["Processor"].str.slice(-2)
print(df)

Output:

[Figure: Extract Last N Characters From a String]
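The same negative-index slice can also be written with square brackets. The sketch below uses a made-up Series named names, and it adds one value shorter than two characters to show that such a string is returned whole instead of raising an error.

```python
import pandas as pd

# Hypothetical values; "X" is deliberately shorter than two characters
names = pd.Series(["Intel Core i7", "Intel Core i9", "X"])

# Negative start index, no end index: from the second-to-last character to the end
last_two = names.str[-2:]

print(last_two.tolist())  # ['i7', 'i9', 'X']
```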

Extract Any Substring From the Middle of a String

To get a substring from the middle of a string, we have to specify the start and end index in string slicing. Here, if we want to get the word Core, we will mention 6 and 10 as start and end indexes, respectively.

It will get the substring that begins at the start index and stops just before the end index. The character at position 6 is included, and the character at position 10 is not.

Example Code:

# Python 3.x
import pandas as pd

# Build a one-column DataFrame of full processor names
data = {"Processor": ["Intel Core i7", "Intel Core i3", "Intel Core i5", "Intel Core i9"]}
df = pd.DataFrame(data)
print(df)

# Positions 6 to 9 spell "Core"; the end index 10 itself is excluded
df["Series"] = df["Processor"].str[6:10]
print(df)

Output:

[Figure: Extract Any Substring From Middle of a String]
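To see how the start and end indexes behave, here is a small sketch on a shortened copy of the data. The variable names start, end, and series_name are chosen for illustration. It shows that the length of the result equals end minus start, and that the character at the end index is not part of the result.

```python
import pandas as pd

processors = pd.Series(["Intel Core i7", "Intel Core i3"])

start, end = 6, 10
series_name = processors.str[start:end]

print(series_name.tolist())            # ['Core', 'Core']
print(series_name.str.len().tolist())  # [4, 4], which is end - start

# The character sitting at position `end` is a space and is left out of the slice
print(processors.str[end].tolist())    # [' ', ' ']
```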
