Check if Column Exists in Pandas

  1. Use the IN Operator to Check if Column Exists in Pandas
  2. Use the NOT IN Operator to Check if Column Exists in Pandas

This tutorial demonstrates how to check whether a column exists in a Pandas DataFrame. We will use Python's in and not in operators to do this.

Use the IN Operator to Check if Column Exists in Pandas

A DataFrame is a two-dimensional data structure that holds data together with its row and column labels. We can get the column labels through the DataFrame.columns attribute.
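To see what the in operator actually searches, here is a quick look at DataFrame.columns on a small, illustrative DataFrame. The labels come back as a pandas Index, which can be turned into a plain Python list.

```python
import pandas as pd

# A small illustrative DataFrame built from a dict of columns
df = pd.DataFrame({'Name': ['John', 'Doe', 'Bill'],
                   'Marks': [82, 38, 63]})

# DataFrame.columns holds the column labels as a pandas Index
print(df.columns)

# Convert the Index to a list for a plain view of the labels
column_labels = list(df.columns)
print(column_labels)   # ['Name', 'Marks']
```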

To check whether a column exists, we use the in operator. Before we begin, we need a sample DataFrame to try these techniques on.

Here we create a DataFrame of student results with the columns Name, Promoted, and Marks.

import pandas as pd
# Create an empty DataFrame
df = pd.DataFrame()
# Add columns to the DataFrame
df['Name'] = ['John', 'Doe', 'Bill']
df['Promoted'] = [True, False, True]
df['Marks'] = [82, 38, 63]
# Print the DataFrame
print(df)

The code gives the following output.

   Name  Promoted  Marks
0  John      True     82
1   Doe     False     38
2  Bill      True     63

Once the DataFrame is ready, we can check whether it contains any data or is empty. There are two common ways to do this.

We can either use the df.empty attribute that Pandas provides, or check the number of rows with len(df.index).
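A minimal sketch comparing the two checks on illustrative DataFrames. Note that df.empty is True when either axis has length 0, so the two checks can disagree for a DataFrame that has rows but no columns.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Doe', 'Bill'],
                   'Marks': [82, 38, 63]})

# Option 1: the DataFrame.empty attribute
print(df.empty)               # False

# Option 2: count the rows through the index
print(len(df.index) == 0)     # False

# Column labels but no rows: both checks report "empty"
no_rows = pd.DataFrame(columns=['Name', 'Marks'])
print(no_rows.empty)          # True
print(len(no_rows.index))     # 0

# Rows but no columns: df.empty is True, yet the index still has 3 entries
no_columns = pd.DataFrame(index=[0, 1, 2])
print(no_columns.empty)       # True
print(len(no_columns.index))  # 3
```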

We have used the Pandas attribute df.empty in the example below.

if df.empty:
    print('DataFrame is empty!')
else:
    print('Not empty!')

Since we have inserted data into the columns, the output is Not empty!.

Not empty!

Now, let’s check whether a column exists in the Pandas DataFrame using the in operator. The code below shows it in action.

if 'Promoted' in df:
    print("Yes, it does exist.")
else:
    print("No, it does not exist.")

The code gives the following output.

Yes, it does exist.

For more clarity, you can also write if 'Promoted' in df.columns: instead of just df. Both forms give the same result, because the in operator on a DataFrame checks its column labels.
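A short sketch showing that in df and in df.columns are equivalent, and that the check matches column labels exactly rather than searching the cell values.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Doe', 'Bill'],
                   'Promoted': [True, False, True]})

# 'in df' looks at column labels, just like 'in df.columns'
in_df = 'Promoted' in df
in_columns = 'Promoted' in df.columns
print(in_df, in_columns)   # True True

# Cell values are not searched
value_found = 'John' in df
print(value_found)         # False

# Label matching is exact, so case matters
lowercase_found = 'promoted' in df
print(lowercase_found)     # False
```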

Use the NOT IN Operator to Check if Column Exists in Pandas

Let’s see how to use the not in operator to perform the same check. It works the other way around: the added negation inverts the result, so the condition is True only when the column is missing.

Here is an example of the not in operator.

if 'Promoted' not in df.columns:
    print("Yes, it does not exist.")
else:
    print("No, it does exist.")

The code gives the following output.

No, it does exist.
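A common use of not in is guarding a step that creates a column, so that existing data is not overwritten. This sketch assumes, purely for illustration, that a student is promoted with 40 marks or more.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Doe', 'Bill'],
                   'Marks': [82, 38, 63]})

# Create 'Promoted' only if the DataFrame does not already have it.
# The pass mark of 40 is an illustrative assumption.
if 'Promoted' not in df.columns:
    df['Promoted'] = df['Marks'] >= 40

print(df)
```

Running the same if block a second time leaves the column untouched, because 'Promoted' now exists.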

We have seen how to check a single column in a DataFrame. We can also check several columns at once.

This is useful when a task needs a whole group of columns to be present before it can run.

Below is a code snippet that checks multiple columns in a Pandas DataFrame using a Python set and its issubset method.

if set(['Name','Promoted']).issubset(df.columns):
    print("Yes, all of them exist.")
else:
    print("No")

The code gives the following output.

Yes, all of them exist.
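The same multi-column test can also be written with Python's built-in all() and any() over a generator expression, without building a set. The column name 'Grade' below is hypothetical and is chosen so that it is missing.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Doe', 'Bill'],
                   'Promoted': [True, False, True],
                   'Marks': [82, 38, 63]})

required = ['Name', 'Promoted']
optional = ['Grade', 'Marks']   # 'Grade' is a hypothetical, absent column

# all(): True only if every listed column exists
has_all = all(col in df.columns for col in required)
print(has_all)   # True

# any(): True if at least one listed column exists
has_any = any(col in df.columns for col in optional)
print(has_any)   # True
```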

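The set can also be written as a literal with curly braces. The example below also negates the check with not, so it prints Yes only when at least one of the columns is missing.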

if not {'Name', 'Promoted'}.issubset(df.columns):
    print("Yes")
else:
    print("No")

The output is:

No
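When a check like this fails, it is often useful to know which columns are missing. A minimal sketch with a hypothetical list of expected columns ('Grade' and 'Age' are invented for the example):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Doe', 'Bill'],
                   'Promoted': [True, False, True],
                   'Marks': [82, 38, 63]})

# Hypothetical columns that some later step expects
expected = ['Name', 'Marks', 'Grade', 'Age']

# List comprehension: keeps the order of 'expected'
missing = [col for col in expected if col not in df.columns]
print(missing)   # ['Grade', 'Age']

# Set difference: the same labels, but in no guaranteed order
missing_set = set(expected) - set(df.columns)
print(missing_set)
```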

These are the ways to check for one or more columns in a DataFrame. The same checks work on real data loaded from a file, not just on a sample DataFrame.

We only need to load the CSV file with the Pandas read_csv function. If you use Google Colab, import the files module from google.colab to upload a data file from your own machine at runtime.
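A sketch of the same checks on data loaded with pd.read_csv. To keep it self-contained, the CSV text is held in a string and wrapped in io.StringIO; with a real file you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# String with the contents of a small CSV file (illustrative data)
csv_text = """Name,Promoted,Marks
John,True,82
Doe,False,38
Bill,True,63
"""

df = pd.read_csv(io.StringIO(csv_text))

# The membership checks work the same way on a DataFrame read from CSV
has_marks = 'Marks' in df.columns
has_both = {'Name', 'Promoted'}.issubset(df.columns)
lacks_grade = 'Grade' not in df.columns
print(has_marks, has_both, lacks_grade)   # True True True
```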

Preet Sanghavi

Preet writes his thoughts about programming in a simplified manner to help others learn better. With thorough research, his articles offer descriptive and easy to understand solutions.

