Split a Single Column Into Multiple Columns in Pandas DataFrame Column

Split a Single Column Into Multiple Columns in Pandas DataFrame Column

Luqman Khan Apr-14, 2022 Jan-03, 2022 Pandas Pandas DataFrame

Pandas has a well-known method for splitting a string column or text column by dashes, whitespace, and return column (Series) of lists; if we talk about pandas, the term Series is called the Dataframe column.

We can use the pandas Series.str.split() function to break up strings in multiple columns around a given separator or delimiter. It’s similar to the Python string split() method but applies to the entire Dataframe column. We have the simplest way to separate the column below the following.

This method separates the Series string from the initial index.

Series.str.split(pat=None, n=-1, expand=False)

Let’s try out to understand the working of this method

# import Pandas as pd
import pandas as pd

# innitilize Dataframe
df = pd.DataFrame({'Email': ['Alex.jhon@gmail.com', 'Hamza.Azeez@gmail.com', 'Harry.barton@hotmail.com'],
				   'Number':['+44-3844556210', '+44-2245551219', 
                              '+44-1049956215'],
				   'Location':['Alameda,California','Sanford,Florida',
                		'Columbus,Georgia']})				
print("Dataframe series:\n",df)

We have created a Dataframe df with three columns, Email, Number and Location. Note that the strings in the Email column have a specific pattern. However, if you take a closer look, this column can be split into two columns. We will solve the required problem very well.

Output:

Dataframe series :
                       Email          Number            Location
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia

We will use the Series.str.split() function to separate the Number column and pass the - in split() method . Make sure you pass True to the expand keyword.

Example 1:

print("\n\nSplit 'Number' column by '-' into two individual columns :\n",
df.Number.str.split(pat='-',expand=True))

This example will split every value of series (Number) by -.

Output:

Split 'Number' column into two individual columns :
      0           1
0  +44  3844556210
1  +44  2245551219
2  +44  1049956215

If we use only expand parameter Series.str.split(expand=True)this will allow splitting whitespace but not feasible for separating with - and , or any regular expression if exists in the string, you have to pass pat parameter.

Let’s rename these split columns.

df[['Dialling Code','Cell-Number']]=df.Number.str.split('-',expand=True)
print(df)

We created two new series Dialling code and Cell-Number and assigned the values using Number series.

Output:

                      Email          Number            Location Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California           +44   
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida           +44   
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia           +44   

  Cell-Number  
0  3844556210  
1  2245551219  
2  1049956215  

Example 2:

In this example we are going to split Location series by ,.

df[['City','State']]=df.Location.str.split(',',expand=True)
print(df)

Split Location Series and store its values in individual series City and State.

Output:

                      Email          Number            Location      City  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California   Alameda   
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida   Sanford   
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia  Columbus   

        State  
0  California  
1     Florida  
2     Georgia 

Let’s see the last example. We will separate the full name in the Email series.

full_name=df.Email.str.split(pat='@',expand=True)
print(full_name)

Output:

              0            1
0     Alex.jhon    gmail.com
1   Hamza.Azeez    gmail.com
2  Harry.barton  hotmail.com

Now we separate first and last names by ..

df[['First Name','Last Name']]=full_name[0].str.split('.',expand=True)
print(df)

Output:

                      Email          Number            Location First Name  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California       Alex   
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida      Hamza   
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia      Harry   

  Last Name  
0      jhon  
1     Azeez  
2    barton 

n=-1 parameter will not work if expand=True is passed in the .split() method.

print(df['Email'].str.split('@',n=-1, expand=True))

Output:

        0           1
0  George  Washington
1   Hamza       Azeez
2   Harry      Walker

The whole example code is below.

# import Pandas as pd
import pandas as pd

# create a new Dataframe
df = pd.DataFrame({'Email': ['Alex.jhon@gmail.com', 'Hamza.Azeez@gmail.com', 'Harry.barton@hotmail.com'],
				'Number':['+44-3844556210', '+44-2245551219', '+44-1049956215'],
				'Location':['Alameda,California','Sanford,Florida','Columbus,Georgia']})

print("Dataframe series :\n",df)

print("\n\nSplit 'Number' column by '-' into two individual columns :\n",
								df.Number.str.split(pat='-',expand=True))

df[['Dialling Code','Cell-Number']]=df.Number.str.split('-',expand=True)
print(df)

df[['City','State']]=df.Location.str.split(',',expand=True)
print(df)

full_name=df.Email.str.split(pat='@',expand=True)
print(full_name)

df[['First Name','Last Name']]=full_name[0].str.split('.',expand=True)
print(df)

Output:

Dataframe series :
                       Email          Number            Location
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia


Split 'Number' column by '-' into two individual columns :
      0           1
0  +44  3844556210
1  +44  2245551219
2  +44  1049956215
                      Email          Number            Location Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California           +44   
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida           +44   
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia           +44   

  Cell-Number  
0  3844556210  
1  2245551219  
2  1049956215  
                      Email          Number            Location Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California           +44   
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida           +44   
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia           +44   

  Cell-Number      City       State  
0  3844556210   Alameda  California  
1  2245551219   Sanford     Florida  
2  1049956215  Columbus     Georgia  
              0            1
0     Alex.jhon    gmail.com
1   Hamza.Azeez    gmail.com
2  Harry.barton  hotmail.com
                      Email          Number            Location Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California           +44   
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida           +44   
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia           +44   

  Cell-Number      City       State First Name Last Name  
0  3844556210   Alameda  California       Alex      jhon  
1  2245551219   Sanford     Florida      Hamza     Azeez  
2  1049956215  Columbus     Georgia      Harry    barton 

Related Article - Pandas DataFrame

  • Get Pandas DataFrame Column Headers as a List
  • Delete Pandas DataFrame Column
  • Convert Pandas Column to Datetime
  • Convert a Float to an Integer in Pandas DataFrame
  • Sort Pandas DataFrame by One Column's Values
  • Get the Aggregate of Pandas Group-By and Sum