Compare Pandas DataFrame Object

This tutorial explains how we can compare Pandas DataFrame objects in Python. We can compare DataFrames using the == operator, which checks them element by element.

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

print("df_1:")
print(df_1)

print("")

print("df_2:")
print(df_2)

Output:

df_1:
        Player  Goals
0  Lewandowski     10
1       Haland      8
2      Ronaldo      6
3        Messi      5
4       Mbappe      4

df_2:
        Player  Goals
0  Lewandowski      7
1       Haland      8
2      Ronaldo      6
3        Messi      7
4       Mbappe      4

We will use the DataFrames df_1 and df_2 to demonstrate the comparison of DataFrames in this article.

Compare Pandas DataFrame Object Using the == Operator

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

print(df_1 == df_2)

Output:

   Player  Goals
0    True  False
1    True   True
2    True   True
3    True  False
4    True   True

It compares the corresponding elements of df_1 and df_2 and returns True at each position where the two elements are the same; otherwise, it returns False.
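When a single yes/no answer is enough, the element-wise result can be reduced to one boolean, or pandas.DataFrame.equals() can be called directly. The following is a minimal sketch that reuses the data from above; the names same_everywhere and same_frames are chosen only for illustration.

```python
import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

# Reduce the element-wise result: first per column, then across columns
same_everywhere = bool((df_1 == df_2).all().all())

# equals() answers the same question in one call
same_frames = df_1.equals(df_2)

print(same_everywhere)
print(same_frames)
```

Both values are False here because the Goals column differs in two rows. Note that equals() also requires matching dtypes and treats NaN values in the same position as equal, whereas == reports NaN as not equal to NaN.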

We can use the pandas.DataFrame.all() method to find out which rows are the same in both df_1 and df_2.

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

print((df_1 == df_2).all(axis=1))

Output:

0    False
1     True
2     True
3    False
4     True
dtype: bool

The rows marked True in the output have identical values in df_1 and df_2, while the rows marked False differ in at least one column.
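The same reduction can be applied along the other axis to see which columns match completely. This is a small sketch under the same setup; columns_match and differing_columns are illustrative names, not part of pandas.

```python
import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

# axis=0 collapses the rows, leaving one boolean per column
columns_match = (df_1 == df_2).all(axis=0)

# Keep only the names of columns that contain at least one difference
differing_columns = columns_match[~columns_match].index.tolist()

print(columns_match)
print(differing_columns)
```

Here the Player column matches in every row, so only Goals is reported as differing.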

We can use boolean indexing to list all the rows whose values differ in df_1 and df_2.

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

print(df_1[~(df_1 == df_2).all(axis=1)])

Output:

        Player  Goals
0  Lewandowski     10
3        Messi      5

It lists all the rows of df_1 whose values differ from the corresponding rows in df_2.
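The output above shows only the values from df_1. To see the differing values from both DataFrames side by side, one option is pandas.DataFrame.compare(). The sketch below assumes pandas 1.1 or later, where this method is available; the name diff is illustrative.

```python
import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

# Assumption: DataFrame.compare() exists (pandas >= 1.1).
# It keeps only the rows and columns that contain differences and
# labels the values from df_1 as "self" and those from df_2 as "other".
diff = df_1.compare(df_2)

print(diff)
```

The result contains only the Goals column for rows 0 and 3, with one sub-column for each DataFrame. Like the == operator, compare() expects both DataFrames to carry the same labels.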

If df_1 and df_2 have different index labels, the == operator raises ValueError: Can only compare identically-labeled DataFrame objects (the exact wording of the message varies between pandas versions).

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2, index=['a', 'b', 'c', 'd', 'e'])

print(df_1 == df_2)

Output:

Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled DataFrame objects

We can use the pandas.DataFrame.reset_index() method to reset the indices to overcome the above issue.

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2, index=['a', 'b', 'c', 'd', 'e'])
df_2.reset_index(drop=True, inplace=True)

print(df_1 == df_2)

Output:

   Player  Goals
0    True  False
1    True   True
2    True   True
3    True  False
4    True   True

It resets the index of df_2 before comparing df_1 and df_2, so that the two DataFrames have the same index and the comparison becomes possible.
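If we would rather leave df_2 untouched, another option is to compare the underlying arrays by position, which ignores the index labels entirely. This is only a sketch of that alternative, not the approach used above; it assumes both DataFrames have the same shape and column order, and position_match is an illustrative name.

```python
import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2, index=["a", "b", "c", "d", "e"])

# to_numpy() drops the labels, so the comparison is purely positional.
# Assumption: both frames have the same shape and column order.
position_match = pd.DataFrame(
    df_1.to_numpy() == df_2.to_numpy(),
    index=df_1.index,
    columns=df_1.columns,
)

print(position_match)
print(df_2.index.tolist())  # df_2 keeps its original labels
```

The trade-off is that a positional comparison will silently pair up unrelated rows if the two DataFrames are sorted differently, so it fits best when the row order is known to match.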

We must also make sure that both DataFrames have the same number of rows before comparing them; otherwise, the == operator raises the same ValueError.
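One way to guard against this is to check the shape attribute before comparing. The helper below, compare_if_same_shape, is a hypothetical function written for this example, not part of pandas; it assumes both DataFrames use the same column names.

```python
import pandas as pd


def compare_if_same_shape(left, right):
    """Return the element-wise comparison, or None when the shapes differ.

    Hypothetical helper; assumes both frames use the same column names.
    """
    if left.shape != right.shape:
        return None
    # Reset both indexes so that rows are matched by position
    return left.reset_index(drop=True) == right.reset_index(drop=True)


data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

full_result = compare_if_same_shape(df_1, df_2)
short_result = compare_if_same_shape(df_1, df_2.head(3))

print(full_result)
print(short_result)
```

The second call returns None because df_2.head(3) has only three rows, so the comparison is skipped instead of raising an error.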
