How to Split a Pandas DataFrame Column by a Delimiter

Mehvish Ashiq Feb 02, 2024
How to Split a Pandas DataFrame Column by a Delimiter

The ability to easily parse the provided data and create new columns containing more specific information is valuable. One such scenario is when we split a Pandas dataframe column by a delimiter and make multiple columns from it.

For example, we have a column containing values separated by a delimiter like A, D, G, and we want to split these values into multiple columns holding A, D, and G separately.

This tutorial will teach us how to use the str.split() method to split a Pandas dataframe column by a delimiter.

Split a Pandas DataFrame Column by a Delimiter

Before moving towards various examples demonstrating the use of the str.split() method, it is important to understand its syntax and possible parameters.

Syntax:

Series.str.split(pat=None, n=-1, expand=False, regex=None)

Following is a brief explanation of the parameters.

Parameter Explanation
pat It must be a string-type value or a regular expression that will be used as a separator to split the string. Whitespace will be used as a delimiter/separator if it is not specified.
n It denotes the maximum number of splits in the output. The n=-1, n=0, and n=None means return all splits; by default, n is set to -1.
expand It takes a Boolean value, and if True, it results in a dataframe holding different values in different columns; otherwise, a series with a list of strings. By default, expand is set to False.
regex (New in version 1.4.0) It accepts a Boolean value that tells if the specified pattern is a regex or not; by default, regex is set to None.
True means the specified pattern is a regular expression, while False means a string literal. It treats pat as a string literal if set to None and pat length is 1. On the other hand, if regex=None and pat length is not 1, it treats pat as a regular expression.

We will get a ValueError if pat is the compiled regular expression and regex is set to False. You may read more on that here.

Now, let’s jump into code examples to practice it.

Split the full_name Column Into Multiple Columns (first_name & last_name)

Example Code:

import pandas as pd

df = pd.DataFrame(
    {
        "full_name": [
            "Mehvish Ashiq",
            "Hania Arif",
            "Zobia Shakeel",
            "Tahir Mukhtar",
            "Mazhar Ali",
        ]
    }
)

print("Before Splitting:\n")
print(df)

df[["first_name", "last_name"]] = df["full_name"].str.split(" ", expand=True)
print("\nAfter Splitting:\n")
print(df)

OUTPUT:

Before Splitting:

       full_name
0  Mehvish Ashiq
1     Hania Arif
2  Zobia Shakeel
3  Tahir Mukhtar
4     Mazhar Ali

After Splitting:

       full_name first_name last_name
0  Mehvish Ashiq    Mehvish     Ashiq
1     Hania Arif      Hania      Arif
2  Zobia Shakeel      Zobia   Shakeel
3  Tahir Mukhtar      Tahir   Mukhtar
4     Mazhar Ali     Mazhar       Ali

We can see that a dataframe in the above code fence contains a column (full_name) with random first names and last names separated by a single space.

We split full_name into two columns (first_name and last_name) containing the first name and last name separately. To do this, we used vectorized str.split() method with expand=True.

Split the full_name Column Into a New Column Having a List of Split Values

Example Code:

import pandas as pd

df = pd.DataFrame(
    {
        "full_name": [
            "Mehvish Ashiq",
            "Hania Arif",
            "Zobia Shakeel",
            "Tahir Mukhtar",
            "Mazhar Ali",
        ]
    }
)

print("Before Splitting:\n")
print(df)

df["[first_name,last_name]"] = df["full_name"].str.split(" ", expand=False)
print("\nAfter Splitting:\n")
print(df)

OUTPUT:

Before Splitting:

       full_name
0  Mehvish Ashiq
1     Hania Arif
2  Zobia Shakeel
3  Tahir Mukhtar
4     Mazhar Ali

After Splitting:

       full_name [first_name,last_name]
0  Mehvish Ashiq       [Mehvish, Ashiq]
1     Hania Arif          [Hania, Arif]
2  Zobia Shakeel       [Zobia, Shakeel]
3  Tahir Mukhtar       [Tahir, Mukhtar]
4     Mazhar Ali          [Mazhar, Ali]

This code fence is similar to the previous one except for one difference. Here, we set the expand to False to create a column containing a list of split values.

Split the full_name Column by Specifying the Maximum Number of Splits

Example Code:

import pandas as pd

df = pd.DataFrame(
    {
        "full_name": [
            "Mehvish Ashiq Hussain",
            "Hania Arif Ali",
            "Zobia Shakeel Amjad",
            "Tahir Mukhtar Ahmed",
            "Mazhar Ali Azhar",
        ]
    }
)

print("Before Splitting:\n")
print(df)


print("\nAfter Splitting:\n")
print(df["full_name"].str.split(" ", n=1, expand=True))

OUTPUT:

Before Splitting:

               full_name
0  Mehvish Ashiq Hussain
1         Hania Arif Ali
2    Zobia Shakeel Amjad
3    Tahir Mukhtar Ahmed
4       Mazhar Ali Azhar

After Splitting:

         0              1
0  Mehvish  Ashiq Hussain
1    Hania       Arif Ali
2    Zobia  Shakeel Amjad
3    Tahir  Mukhtar Ahmed
4   Mazhar      Ali Azhar

Here, we set n to 1 to get a maximum of one split. In this example, we are focused on getting the first names only.

Mehvish Ashiq avatar Mehvish Ashiq avatar

Mehvish Ashiq is a former Java Programmer and a Data Science enthusiast who leverages her expertise to help others to learn and grow by creating interesting, useful, and reader-friendly content in Computer Programming, Data Science, and Technology.

LinkedIn GitHub Facebook

Related Article - Pandas DataFrame Column