How to Get Substring in Pandas
- Get Substring From Pandas DataFrame Column Values
- Extract the First N Characters From a String
- Extract the Last N Characters From a String
- Extract Any Substring From the Middle of a String
Pandas is an open-source data analysis library for Python. It provides many built-in methods for working with numerical and text data.
In this guide, we will get a substring (part of a string) from the values of a Pandas DataFrame column using two approaches: the str.slice() method and square-bracket slicing on the str accessor. This is helpful when we want to extract a meaningful part of a longer string.
Get Substring From Pandas DataFrame Column Values
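We will use string slicing methods to achieve this task. The str.slice() method returns a portion of each string in the column without modifying the original values. As with ordinary Python slicing, the start index is included and the end index is excluded.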
Syntax:
# Python 3.x
df.column_name.str.slice(start_index, end_index)
We can also slice strings using the str accessor with square brackets ([]).
# Python 3.x
df.column_name.str[start_index:end_index]
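As a quick check of the two forms above, the following sketch slices a small, made-up Series both ways and compares the results. The name `codes` and its values are illustrative assumptions, not part of the original example; the missing value is included only to show that it is passed through rather than raising an error.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
codes = pd.Series(["abcdef", "ghijkl", None])

# Both forms take the characters at positions 1, 2, and 3.
with_method = codes.str.slice(1, 4)
with_brackets = codes.str[1:4]

print(with_method.tolist())
# equals() treats missing values in the same position as equal.
print(with_method.equals(with_brackets))
```

Neither call changes `codes`; each returns a new Series.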
Extract the First N Characters From a String
In the following example, we have a Pandas DataFrame whose Processor column holds complete processor names. To get the substring Intel (the first five characters), we specify 0 and 5 as the start and end indexes, respectively.
With the square-bracket form, we can omit the start index and give only the end index, because str[:5] means the same as str.slice(0, 5).
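To illustrate the shorthand, here is a minimal sketch that takes the first five characters in three ways. It assumes a two-row version of the Processor column; the variable names are chosen for this illustration only.

```python
import pandas as pd

# Shortened version of the article's data, for illustration.
df = pd.DataFrame({"Processor": ["Intel Core i7", "Intel Core i3"]})

explicit = df["Processor"].str.slice(0, 5)     # start and end given
shorthand = df["Processor"].str[:5]            # start omitted, defaults to 0
keyword = df["Processor"].str.slice(stop=5)    # only the end index, by keyword

print(shorthand.tolist())
```

All three variables hold the same values, so the choice between them is a matter of readability.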
Example Code:
# Python 3.x
import pandas as pd
# Build a DataFrame with the complete processor names
data = {"Processor": ["Intel Core i7", "Intel Core i3", "Intel Core i5", "Intel Core i9"]}
df = pd.DataFrame.from_dict(data)
print(df)
# Keep the first five characters (positions 0 through 4)
df["Brand Name"] = df.Processor.str.slice(0, 5)
print(df)
Output: the second table has a new Brand Name column whose value is Intel in every row.

Extract the Last N Characters From a String
If we want to extract the brand modifier (the last two characters) from the string, we can use negative indexing in the string slice. We pass -2 as the start index (the index of the second-to-last character) and omit the end index.
Because the end index is omitted, the slice runs to the end of the string and returns the last two characters.
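The difference between indexing with -2 and slicing from -2 is easy to miss, so here is a small sketch contrasting the two. It assumes a two-row Series of processor names; the variable names are illustrative only.

```python
import pandas as pd

# Shortened version of the article's data, for illustration.
processors = pd.Series(["Intel Core i7", "Intel Core i3"])

# A single index returns one character per row.
second_to_last_char = processors.str[-2]

# A slice with an open end returns everything from position -2 onward.
last_two_chars = processors.str[-2:]

print(second_to_last_char.tolist())
print(last_two_chars.tolist())
```

Only the slice form yields the two-character modifier; the single index yields just the letter i.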
Example Code:
# Python 3.x
import pandas as pd
# Build a DataFrame with the complete processor names
data = {"Processor": ["Intel Core i7", "Intel Core i3", "Intel Core i5", "Intel Core i9"]}
df = pd.DataFrame.from_dict(data)
print(df)
# Start at the second-to-last character and run to the end of the string
df["Brand Modifier"] = df.Processor.str.slice(-2)
print(df)
Output: the second table has a new Brand Modifier column containing i7, i3, i5, and i9.

Extract Any Substring From the Middle of a String
To get a substring from the middle of a string, we have to specify both the start and end index in the string slice. Here, to get the word Core, we specify 6 and 10 as the start and end indexes, respectively.
The slice includes the character at the start index and stops just before the end index, so it returns the characters at positions 6 through 9.
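A short sketch can confirm where the slice begins and ends. It assumes a two-row Series of processor names; `start`, `stop`, and the other variable names are illustrative only.

```python
import pandas as pd

# Shortened version of the article's data, for illustration.
processors = pd.Series(["Intel Core i7", "Intel Core i9"])

start, stop = 6, 10
series_name = processors.str[start:stop]

# The slice length equals stop - start, since the end index is excluded.
print(series_name.tolist())
print((series_name.str.len() == stop - start).all())

# Moving the end index to 11 also picks up the space after "Core".
with_trailing_space = processors.str[start:11]
print(with_trailing_space.tolist())
```

Counting the slice length as the end index minus the start index is a handy way to check that the indexes are right.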
Example Code:
# Python 3.x
import pandas as pd
# Build a DataFrame with the complete processor names
data = {"Processor": ["Intel Core i7", "Intel Core i3", "Intel Core i5", "Intel Core i9"]}
df = pd.DataFrame.from_dict(data)
print(df)
# Take positions 6 through 9; the end index 10 is excluded
df["Series"] = df.Processor.str[6:10]
print(df)
Output: the second table has a new Series column whose value is Core in every row.

I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.