How to Convert Categorical Variable to Numeric in Pandas

Preet Sanghavi Feb 02, 2024
  1. Convert Categorical Variable to Numeric Variable in Pandas
  2. Use the apply Function to Convert Categorical Variable to Numeric Variable in Pandas

This tutorial explores the concept of converting categorical variables to numeric variables in Pandas.

Convert Categorical Variable to Numeric Variable in Pandas

This tutorial explains how and why to convert a variable from one data type to another, in particular from a categorical data type to a numeric one.

Such a conversion is needed when a column's data type does not suit the analysis at hand, for example when a method expects numbers but the column holds labels. In that situation, Pandas lets us convert the column to another data type.

Before we begin, we create a dummy data frame named df to work with.

We add a few columns of sample data to it using the following code.

import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5], "col2": list("abcab"), "col3": list("ababb")}
)

The above code creates a data frame along with a few entries. To view the entries in the data, we use the following code.

print(df)

The above code gives the following output.

   col1 col2 col3
0     1    a    a
1     2    b    b
2     3    c    a
3     4    a    b
4     5    b    b

As we can see, we have three columns and five rows indexed from 0 to 4. The column col1 holds numeric values, while col2 and col3 hold letters.

Our job is now to convert these letters into numeric values.

Use the apply Function to Convert Categorical Variable to Numeric Variable in Pandas

Now that our data is set up, let us get straight to the task. The first step is to convert the letter columns to the category data type and then check the data type (dtype) of each column.

We use the following code to convert col2 and col3 and to view the data types associated with each column.

df["col2"] = df["col2"].astype("category")
df["col3"] = df["col3"].astype("category")
print(df.dtypes)

The output of the code can be illustrated below.

col1       int64
col2    category
col3    category
dtype: object

As we can see, the output above lists the data type of each column: col1 is int64, while col2 and col3 are both category.
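To make the category data type more concrete, here is a minimal sketch that inspects the labels a categorical column stores. It assumes the same letters as col2; the variable names default_order, custom_dtype, and custom_order are only illustrative. By default the categories are the sorted unique values, and pd.CategoricalDtype can be used when a different order is wanted.

```python
import pandas as pd

letters = pd.Series(list("abcab"))

# Default conversion: categories are the sorted unique values.
default_order = letters.astype("category")
print(list(default_order.cat.categories))  # ['a', 'b', 'c']

# Explicit conversion: we choose the category order ourselves.
custom_dtype = pd.CategoricalDtype(categories=["c", "b", "a"])
custom_order = letters.astype(custom_dtype)
print(list(custom_order.cat.categories))  # ['c', 'b', 'a']
```

The values in the column are unchanged in both cases; only the list of categories attached to the column differs.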

Now that we know the data types for each column, we can move on to the next step.

The next step is to find the categorical columns and collect their names. This step is not difficult, but it is important because it tells us exactly which columns are to be converted to numeric variables.

cat_columns = df.select_dtypes(["category"]).columns

As shown in the code, we fetch all the columns with dtypes equal to category. Similarly, we can fetch any dtype as per our requirement.
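As a rough illustration of fetching other dtypes, the sketch below rebuilds the same data frame and selects columns with the include and exclude parameters of select_dtypes. The variable names are illustrative, and astype with a dictionary is used here only as a compact way to set up the categorical columns.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5], "col2": list("abcab"), "col3": list("ababb")}
)
df = df.astype({"col2": "category", "col3": "category"})

# Columns whose dtype is numeric.
numeric_columns = df.select_dtypes(include="number").columns
print(list(numeric_columns))  # ['col1']

# Every column except the categorical ones.
non_categorical = df.select_dtypes(exclude="category").columns
print(list(non_categorical))  # ['col1']
```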

Now that we have found all our categorical columns, let us print them. We can do this using the following code.

print(cat_columns)

The code fetches the following output.

Index(['col2', 'col3'], dtype='object')

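The output lists the names of the categorical columns, col2 and col3. Note that the dtype shown here describes the Index object that holds the column names, not the columns themselves.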

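The last step is to convert these categorical variables to numeric variables. Every categorical column exposes a cat.codes attribute, which holds the integer position of each value within the column's categories, and apply runs the lambda on each selected column. We can perform this operation using the following code.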

df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)

The assignment itself prints nothing. Calling print(df) afterward gives the following output.

   col1  col2  col3
0     1     0     0
1     2     1     1
2     3     2     0
3     4     0     1
4     5     1     1

As shown in the output above, each letter has been replaced by its integer code (a becomes 0, b becomes 1, and c becomes 2), so the categorical variables are now numeric.
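Because the codes replace the original letters, it can help to keep a lookup from code to label before overwriting a column. The sketch below is one way to do that on a standalone Series; the names labels, code_to_label, and restored are illustrative, and a missing value is added (an assumption, since the tutorial data has none) to show that cat.codes marks missing entries with -1.

```python
import pandas as pd

labels = pd.Series(["a", "b", None, "c", "a"], dtype="category")

# Integer codes; the missing value is not a category and gets -1.
codes = labels.cat.codes
print(codes.tolist())  # [0, 1, -1, 2, 0]

# Lookup from code back to label, built from the category list.
code_to_label = dict(enumerate(labels.cat.categories))
print(code_to_label)  # {0: 'a', 1: 'b', 2: 'c'}

# Mapping the codes through the lookup recovers the labels;
# -1 has no entry, so it becomes a missing value again.
restored = codes.map(code_to_label)
```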

Thus, using the apply function and fetching the categorical columns, we have converted variables from categorical to numeric in our data frame.
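The steps in this tutorial can also be gathered into a small helper. The function below is a hypothetical sketch, not part of Pandas: encode_categoricals and its parameters are names chosen for illustration. It works on a copy so the original data frame is left untouched, and it returns the code-to-label mapping for each converted column.

```python
import pandas as pd


def encode_categoricals(frame, columns):
    """Return a copy of frame with the given columns replaced by
    integer codes, plus a code-to-label mapping per column."""
    encoded = frame.copy()
    mappings = {}
    for name in columns:
        as_category = encoded[name].astype("category")
        mappings[name] = dict(enumerate(as_category.cat.categories))
        encoded[name] = as_category.cat.codes
    return encoded, mappings


df = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5], "col2": list("abcab"), "col3": list("ababb")}
)
encoded, mappings = encode_categoricals(df, ["col2", "col3"])
print(encoded)
print(mappings)
```

Passing the column names explicitly keeps the helper independent of how the columns' dtypes are reported; the names could equally come from select_dtypes as shown earlier.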


Preet writes his thoughts about programming in a simplified manner to help others learn better. With thorough research, his articles offer descriptive and easy to understand solutions.
