Get Dummies in Pandas

  1. pandas.get_dummies() Method
  2. Create DataFrame With Dummy Variable Columns Using pandas.get_dummies() Method
  3. Set columns to Create Dummy Variables for Specified Columns Only
  4. Set prefix to Change the Default Name of Dummy Columns

This tutorial explains how to generate a DataFrame with dummy (indicator) variables from a DataFrame that contains categorical columns.

pandas.get_dummies() Method

pandas.get_dummies(data, 
                   prefix=None, 
                   prefix_sep='_', 
                   dummy_na=False, 
                   columns=None, 
                   sparse=False, 
                   drop_first=False, 
                   dtype=None)
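
As a rough illustration of two parameters from the signature above, the following sketch applies dummy_na and drop_first to a small hypothetical Series. The colors data is an assumption made for this example and is not part of the tutorial's dataset. dtype=int is passed so the indicators display as 0 and 1 regardless of the installed pandas version.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
colors = pd.Series(["red", "green", None, "red"])

# Default behaviour: one indicator column per distinct non-missing value.
default_dummies = pd.get_dummies(colors, dtype=int)

# dummy_na=True adds an extra indicator column that flags missing values.
with_na = pd.get_dummies(colors, dummy_na=True, dtype=int)

# drop_first=True drops the first category's column,
# leaving k-1 columns for k categories.
dropped = pd.get_dummies(colors, drop_first=True, dtype=int)

print(default_dummies)
print(with_na)
print(dropped)
```

With the default settings, the row holding the missing value gets 0 in every dummy column; with dummy_na=True it is marked in its own column instead.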

Create DataFrame With Dummy Variable Columns Using pandas.get_dummies() Method

import pandas as pd

students_df = pd.DataFrame({
    'Id': [302, 504, 708, 103, 303],
    'Name': ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
    'Sex': ['Male', 'Female', 'Male', 'Male', 'Female'],
})

students_df_dummies = pd.get_dummies(students_df)

print("The original DataFrame is:")
print(students_df, "\n")

print("DataFrame with Dummies:")
print(students_df_dummies)

Output:

The original DataFrame is:
    Id       Name     Sex
0  302       Mike    Male
1  504  Christine  Female
2  708        Rob    Male
3  103     Daniel    Male
4  303   Jennifer  Female 

DataFrame with Dummies:
    Id  Name_Christine  Name_Daniel  Name_Jennifer  Name_Mike  Name_Rob  Sex_Female  Sex_Male
0  302               0            0              0          1         0           0         1
1  504               1            0              0          0         0           1         0
2  708               0            0              0          0         1           0         1
3  103               0            1              0          0         0           0         1
4  303               0            0              1          0         0           1         0

The result is a DataFrame whose dummy column names are formed by joining the original column name, an underscore, and each unique value in that column.

The Name column has five unique values, so it is split into five columns: Name_ followed by each unique name in the DataFrame. Each dummy column holds 1 where the original row has that value and 0 otherwise. In pandas 2.0 and later, the dummy columns default to the bool dtype, so the same values are displayed as True and False unless dtype=int is passed.

For example, the row whose Name is Daniel in students_df has the value 1 in the Name_Daniel column of students_df_dummies, while every other row has 0 in that column.
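
The relationship described above can be checked with a short sketch. It reuses the tutorial's students_df; passing dtype=int is an assumption added here so that the comparison with 1 reads the same on every pandas version.

```python
import pandas as pd

students_df = pd.DataFrame({
    'Id': [302, 504, 708, 103, 303],
    'Name': ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
    'Sex': ['Male', 'Female', 'Male', 'Male', 'Female'],
})

# dtype=int (an assumption for this sketch) forces 0/1 integer indicators.
students_df_dummies = pd.get_dummies(students_df, dtype=int)

# Select the rows flagged in the Name_Daniel column and show their Ids.
daniel_rows = students_df_dummies[students_df_dummies["Name_Daniel"] == 1]
daniel_ids = daniel_rows["Id"].tolist()
print(daniel_ids)

# Each row should be flagged in exactly one of the Name_* columns.
name_cols = [c for c in students_df_dummies.columns if c.startswith("Name_")]
flags_per_row = students_df_dummies[name_cols].sum(axis=1).tolist()
print(flags_per_row)
```

Only the row with Id 103 is selected, and every row sums to 1 across the Name_ columns, which is what one-hot encoding of a single column should produce.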

Set columns to Create Dummy Variables for Specified Columns Only

By default, get_dummies() creates dummy columns for every column whose dtype is object or category. To convert only particular columns, pass a list of their names as the columns argument.

import pandas as pd

students_df = pd.DataFrame({
    'Id': [302, 504, 708, 103, 303],
    'Name': ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
    'Sex': ['Male', 'Female', 'Male', 'Male', 'Female'],
})

students_df_dummies = pd.get_dummies(students_df, columns=["Sex"])

print("The original DataFrame is:")
print(students_df, "\n")

print("DataFrame with Dummies:")
print(students_df_dummies)

Output:

The original DataFrame is:
    Id       Name     Sex
0  302       Mike    Male
1  504  Christine  Female
2  708        Rob    Male
3  103     Daniel    Male
4  303   Jennifer  Female 

DataFrame with Dummies:
    Id       Name  Sex_Female  Sex_Male
0  302       Mike           0         1
1  504  Christine           1         0
2  708        Rob           0         1
3  103     Daniel           0         1
4  303   Jennifer           1         0

Only the Sex column is expanded into dummy variables; the Id and Name columns are left unchanged.
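
The columns argument can also name a numeric column, which the default dtype-based selection would skip. The sketch below assumes a hypothetical Year column that is not in the original example, and passes dtype=int so the indicators display as 0 and 1.

```python
import pandas as pd

students_df = pd.DataFrame({
    'Id': [302, 504, 708, 103, 303],
    'Name': ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
    'Sex': ['Male', 'Female', 'Male', 'Male', 'Female'],
    # Hypothetical integer column, added only for this illustration.
    'Year': [1, 2, 1, 3, 2],
})

# Year is numeric, so it is encoded only because it is listed explicitly.
students_df_dummies = pd.get_dummies(
    students_df, columns=["Sex", "Year"], dtype=int)

print(students_df_dummies.columns.tolist())
```

Id and Name stay as they are, while Sex and Year are replaced by one indicator column per distinct value.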

Set prefix to Change the Default Name of Dummy Columns

import pandas as pd

students_df = pd.DataFrame({
    'Id': [302, 504, 708, 103, 303],
    'Name': ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
    'Sex': ['Male', 'Female', 'Male', 'Male', 'Female'],
})

students_df_dummies = pd.get_dummies(
    students_df, columns=["Sex"], prefix="Column")

print("The original DataFrame is:")
print(students_df, "\n")

print("DataFrame with Dummies:")
print(students_df_dummies)

Output:

The original DataFrame is:
    Id       Name     Sex
0  302       Mike    Male
1  504  Christine  Female
2  708        Rob    Male
3  103     Daniel    Male
4  303   Jennifer  Female 

DataFrame with Dummies:
    Id       Name  Column_Female  Column_Male
0  302       Mike              0            1
1  504  Christine              1            0
2  708        Rob              0            1
3  103     Daniel              0            1
4  303   Jennifer              1            0

This sets the prefix of the dummy columns generated from the Sex column to Column, so the dummy column names become Column_Female and Column_Male.
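
When several columns are encoded at once, prefix can also be given as a dictionary mapping each column name to its prefix, and prefix_sep changes the separator. The following sketch reuses the tutorial's students_df; the prefixes N and S and the "-" separator are arbitrary choices made for this illustration.

```python
import pandas as pd

students_df = pd.DataFrame({
    'Id': [302, 504, 708, 103, 303],
    'Name': ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
    'Sex': ['Male', 'Female', 'Male', 'Male', 'Female'],
})

# Map each encoded column to its own prefix and join with "-" instead of "_".
students_df_dummies = pd.get_dummies(
    students_df,
    columns=["Name", "Sex"],
    prefix={"Name": "N", "Sex": "S"},
    prefix_sep="-",
    dtype=int,  # assumption: keeps 0/1 output on any pandas version
)

print(students_df_dummies.columns.tolist())
```

The dummy columns derived from Name then start with N- and those derived from Sex start with S-.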
