How to Drop Duplicate Columns in Pandas

Preet Sanghavi Feb 02, 2024
  1. Drop Duplicate Columns in Pandas
  2. Use the drop_duplicates() Function to Drop Duplicate Columns in Pandas

This tutorial shows how to drop duplicate columns from a Pandas data frame.

Drop Duplicate Columns in Pandas

In this tutorial, let us understand how and why to remove columns that hold the same values as another column in a Pandas data frame. A duplicate column adds no new information, so it is usually dropped before the data is analyzed.

Moreover, duplicate columns clutter the data and take up memory and storage space. Lastly, they can affect certain statistical or machine learning models, because the same information is counted more than once.

Let us see how this works in practice.

Before we begin, we create two small data frames to work with, dat1 and dat2, each with a few entries. We start with dat1.

import pandas as pd

dat1 = pd.DataFrame({"dat1": [9, 5]})

The above code creates a data frame with a single column named dat1 that holds two entries, 9 and 5. To view the entries in the data, we use the following code.

print(dat1)

The above code gives the following output.

   dat1
0     9
1     5

As shown, the data frame has one column, dat1, and two rows. The unlabeled values on the left (0 and 1) are the index, not a column. Now, let us create another data frame named dat2 using the following code.

dat2 = pd.DataFrame({"dat2": [9, 5]})

As we did for dat1, we can view the dat2 data frame using the following code.

print(dat2)

The code gives the following data frame.

   dat2
0     9
1     5

As with dat1, dat2 has two rows and a single column of values, shown next to the index.
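As a quick check, the shape attribute reports rows and columns without counting the index. The snippet below is a minimal sketch that rebuilds the two data frames from this tutorial and inspects them.

```python
import pandas as pd

dat1 = pd.DataFrame({"dat1": [9, 5]})
dat2 = pd.DataFrame({"dat2": [9, 5]})

# shape is (number of rows, number of columns); the index is not counted
print(dat1.shape)  # (2, 1)
print(dat2.shape)  # (2, 1)

# The row labels live in the index, separate from the columns
print(list(dat1.index))    # [0, 1]
print(list(dat1.columns))  # ['dat1']
```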

Now, let us merge the column of the dat2 data frame to the dat1 data frame. We can do this using the following code.

val = pd.concat([dat1, dat2], axis=1)

As shown, we use the concat function in Pandas. This function concatenates multiple data frames into one. Its first argument is a list of the data frames to combine.

The axis parameter sets the direction of the concatenation, that is, whether the data frames are combined along rows or along columns.

Here we pass axis=1, which places the data frames side by side, so the column of dat2 is added next to the column of dat1.

The output of the code is below.

   dat1  dat2
0     9     9
1     5     5

As shown, the resulting data frame val contains the dat1 column plus an additional dat2 column, added along axis 1. Note that concat returns a new data frame, so dat1 itself is left unchanged.

Again, this output is displayed using the print(val) code. We have a data frame with two columns named dat1 and dat2 that hold the same values.

In short, we have added a new column next to the dat1 column using the concat function in Pandas.
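To make the role of the axis parameter concrete, here is a small sketch that concatenates the same two data frames along both axes and compares the resulting shapes. The names side_by_side and stacked are illustrative and are not used elsewhere in this tutorial.

```python
import pandas as pd

dat1 = pd.DataFrame({"dat1": [9, 5]})
dat2 = pd.DataFrame({"dat2": [9, 5]})

# axis=1 places the frames side by side (adds columns)
side_by_side = pd.concat([dat1, dat2], axis=1)
print(side_by_side.shape)  # (2, 2)

# axis=0 (the default) stacks the frames on top of each other (adds rows);
# cells with no matching column are filled with NaN
stacked = pd.concat([dat1, dat2], axis=0)
print(stacked.shape)  # (4, 2)

# concat returns a new object; the input frame is left as it was
print(dat1.shape)  # (2, 1)
```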

Use the drop_duplicates() Function to Drop Duplicate Columns in Pandas

Now let us eliminate the duplicate columns from the data frame. We can do this operation using the following code.

print(val.reset_index().T.drop_duplicates().T)

This code resets the index, transposes the data frame so that columns become rows, drops the duplicate rows with drop_duplicates(), and transposes the result back. The output of the code is below.

   index  dat1
0      0     9
1      1     5

As shown, we have successfully eliminated the duplicate column named dat2 from our data frame. The drop_duplicates() function compares values rather than names and keeps the first of each set of identical columns, which is why dat1 remains. It is also important to note that reset_index() has turned the old index into a regular column named index. This step is not required for removing the duplicates.
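If the extra index column is not wanted, the reset_index() call can be left out. The following is a minimal sketch of that variant, not the code used above. The names deduped and deduped_last are illustrative. The keep parameter of drop_duplicates() controls which of the identical columns survives.

```python
import pandas as pd

dat1 = pd.DataFrame({"dat1": [9, 5]})
dat2 = pd.DataFrame({"dat2": [9, 5]})
val = pd.concat([dat1, dat2], axis=1)

# Transpose so columns become rows, drop repeated rows, transpose back
deduped = val.T.drop_duplicates().T
print(deduped)

# keep="last" retains the last of the identical columns instead of the first
deduped_last = val.T.drop_duplicates(keep="last").T
print(list(deduped_last.columns))  # ['dat2']
```

Transposing works well for small frames like this one. On frames with mixed column types, the transpose may convert values to a common object type, so it is worth checking the dtypes of the result.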

Thus, we have built a data frame with a duplicate column using the concat function and removed that column by transposing the data frame and applying the drop_duplicates() function.
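A related but different case is a data frame in which two columns share the same name. The sketch below uses a different technique from the one above: it filters on the column labels with columns.duplicated() instead of comparing values. The data frame df and its labels are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical frame in which two columns share the label "a"
df = pd.DataFrame([[9, 9, 1], [5, 5, 2]], columns=["a", "a", "b"])

# duplicated() marks every repeat of a label after its first occurrence
mask = df.columns.duplicated()

# Keep only the columns whose label has not been seen before
unique_cols = df.loc[:, ~mask]
print(list(unique_cols.columns))  # ['a', 'b']
```

This approach looks only at the labels, so two columns with different names but identical values are both kept.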

To better understand this concept, you can learn about the following topics.

  1. Concat function in Pandas.
  2. Drop Duplicates function in Pandas.