How to Get the Substring of a Column in Pandas

Preet Sanghavi Feb 02, 2024
  1. Get the Substring of a Column in Pandas
  2. Use the str.slice() Function to Get the Substring of a Column in Pandas
  3. Use Square Brackets to Get the Substring of a Column in Pandas
  4. Use the str.extract() Function to Get the Substring of a Column in Pandas

In this tutorial, we will learn how to extract a substring from each value of a column in Pandas.

Get the Substring of a Column in Pandas

Extracting a substring is helpful in many scenarios when working with data. For instance, consider a case where we want to create a username from the user's first name.

We will use multiple approaches to perform this.

To begin with, let us create a Pandas data frame on which we will work throughout our tutorial. We will include a name column in our data frame and will aim to extract a username from that column.

Code:

import pandas as pd

# Use a name other than dict so that Python's built-in dict type is not shadowed
data = {"Name": ["Shivesh Jha", "Sanay Shah", "Rutwik Sonawane"]}
df = pd.DataFrame.from_dict(data)

Let us have a look at our data frame.

print(df)

Output:

              Name
0      Shivesh Jha
1       Sanay Shah
2  Rutwik Sonawane

Let us now go through the various ways we can obtain a substring from the column.

Use the str.slice() Function to Get the Substring of a Column in Pandas

In this approach, we will use the str.slice() function to obtain the first three characters from the name column and use them as the username for each user. The slice() function takes the start and stop positions of the substring we want to extract. The start position is included and the stop position is excluded, so slice(0, 3) returns the characters at positions 0, 1, and 2.

We will use the code below to do this.

df["UserName"] = df["Name"].str.slice(0, 3)
print(df)

Let us now look at our updated data frame where we have a new username column containing the first three characters of the name column.

Output:

              Name UserName
0      Shivesh Jha      Shi
1       Sanay Shah      San
2  Rutwik Sonawane      Rut

We can see in the output that we have successfully extracted the first three characters from our name column and used them in the new username column.
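
As a small extension, str.slice() also accepts negative positions and a step, and a missing value stays missing instead of raising an error. The sketch below illustrates this on a hypothetical names Series (not the data frame used in this tutorial) that includes a very short name and a missing entry; the variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: a normal name, a short name, and a missing value.
names = pd.Series(["Shivesh Jha", "Al", None])

# start is inclusive and stop is exclusive: positions 0, 1 and 2.
# A string shorter than the requested range is returned as-is.
first_three = names.str.slice(0, 3)

# A negative start counts from the end of each string.
last_three = names.str.slice(start=-3)

# step=2 keeps every second character of each string.
every_second = names.str.slice(step=2)

print(first_three)
print(last_three)
print(every_second)
```

Note that the short name "Al" comes back unchanged from the first two calls, so usernames built this way may be shorter than three characters.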

Use Square Brackets to Get the Substring of a Column in Pandas

In this approach, we apply square brackets to the column's str accessor, using the same slice notation as for an ordinary Python string. The code below extracts the first three characters.

df["UserName"] = df["Name"].str[:3]
print(df)

Output:

              Name UserName
0      Shivesh Jha      Shi
1       Sanay Shah      San
2  Rutwik Sonawane      Rut

We can see in the output that the new column holds the first three characters of the existing column.
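
The square-bracket form accepts any Python-style index or slice. The following sketch rebuilds the tutorial's data frame and shows a single index, a negative slice, and chaining with str.lower(); the Initial and LastThree column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Shivesh Jha", "Sanay Shah", "Rutwik Sonawane"]})

# A single index returns one character per row.
df["Initial"] = df["Name"].str[0]

# A negative slice counts from the end of each string.
df["LastThree"] = df["Name"].str[-3:]

# String methods can be chained, here to lowercase the username.
df["UserName"] = df["Name"].str[:3].str.lower()

print(df)
```

Chaining through the str accessor keeps the whole operation vectorized, which is usually preferable to looping over the rows.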

Use the str.extract() Function to Get the Substring of a Column in Pandas

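In this approach, we will extract the user's surname from the name with the str.extract() function, which returns the text matched by the capture groups of a regular expression. The pattern \b(\w+)$ captures the last run of word characters at the end of each string.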

Code:

df["LastName"] = df.Name.str.extract(r"\b(\w+)$", expand=True)

Now, let us check the updated data frame.

print(df)

Output:

              Name UserName  LastName
0      Shivesh Jha      Shi       Jha
1       Sanay Shah      San      Shah
2  Rutwik Sonawane      Rut  Sonawane
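
str.extract() can also return several pieces at once. The sketch below, which assumes a hypothetical names Series and hypothetical group names First and Last, uses named capture groups to split each name into two columns, and shows that a row that does not match the pattern yields missing values rather than an error.

```python
import pandas as pd

# Hypothetical sample data; "Cher" has no surname, so the two-word pattern fails.
names = pd.Series(["Shivesh Jha", "Sanay Shah", "Cher"])

# Named groups become the column names of the resulting data frame.
parts = names.str.extract(r"^(?P<First>\w+)\s+(?P<Last>\w+)$")

# With a single group, expand=False returns a Series instead of a data frame.
last_word = names.str.extract(r"\b(\w+)$", expand=False)

print(parts)
print(last_word)
```

Checking the result for missing values is a reasonable precaution whenever the input may contain names that do not follow the expected pattern.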

As seen above, the new column contains the surname of each user. We can therefore get the substring of a column in Pandas with str.slice(), square brackets, or str.extract().


Preet writes his thoughts about programming in a simplified manner to help others learn better. With thorough research, his articles offer descriptive and easy to understand solutions.
