Difference Between Shallow Copy vs Deep Copy in Pandas Dataframes

Luqman Khan Feb 03, 2022
This tutorial article will introduce the difference between shallow and deep copy in Pandas Dataframe.

When we want to add, delete, or update data without modifying the original DataFrame, we can make a copy and perform the manipulation on the copy instead.

Differences Between Shallow Copy and Deep Copy in Pandas Dataframes

There are many differences between shallow copy and deep copy in Pandas Dataframes. This article will provide two of those differences.

Below is the syntax of the Pandas DataFrame.copy() method.

DataFrame.copy(deep=True)

The deep parameter is a bool (True or False) and defaults to True. A Pandas data structure can therefore be copied in two ways: as a shallow copy or as a deep copy. We discuss the shallow copy first.

Creating a Shallow Copy Is Faster Than Creating a Deep Copy in Pandas Dataframes

With deep=False, neither the indices nor the data of the original object are copied; the new object only holds references to them. Use the df.copy(deep=False) method to make a shallow copy of a Pandas DataFrame.

In general Python terms, a shallow copy creates a new collection object and then populates it with references to the original's child objects. Because the copying operation does not recurse, no copies of the child objects are created.

Compared with a deep copy, a shallow copy is faster to create.
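One reason a shallow copy is cheaper is that it does not duplicate the underlying data buffer. The following is a minimal sketch of how this could be checked, assuming NumPy is importable (it is installed alongside pandas) and using a hypothetical single-column DataFrame with a column named "a". It relies on np.shares_memory to test whether two arrays are backed by the same memory.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, not taken from the article.
df = pd.DataFrame({"a": [1, 2, 3]})

shallow = df.copy(deep=False)  # references the original data
deep = df.copy(deep=True)      # duplicates the original data

original_values = df["a"].to_numpy()

# The shallow copy is expected to be backed by the same buffer.
shares_with_shallow = np.shares_memory(original_values, shallow["a"].to_numpy())
# The deep copy is expected to own a separate buffer.
shares_with_deep = np.shares_memory(original_values, deep["a"].to_numpy())

print("shallow shares memory:", shares_with_shallow)
print("deep shares memory:", shares_with_deep)
```

Because the deep copy has to allocate and fill a new buffer, its cost grows with the size of the DataFrame, while the shallow copy only creates a new wrapper object.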

pandas.DataFrame.copy(deep=False)

Import Python Pandas library for this purpose.

import pandas as pd

After importing the Pandas library, create a DataFrame and assign it to df1.

df1 = pd.DataFrame([5, 6, 7, 8, 9])
print(df1)

Output:

   0
0  5
1  6
2  7
3  8
4  9

Now use the built-in id() function to see which object the name df1 refers to.

>>> id(df1)

Output:

140509987701904

Assign df1 to a new variable df2 and check the id of df2.

>>> df2 = df1
>>> id(df2)

Output:

140509987701904

The id is the same for both df2 and df1, because plain assignment only binds a second name to the same object. Now use the copy function to see whether the id changes.

>>> df3 = df1.copy()
>>> id(df3)

Look at the output below to see the change.

Output:

140509924069968
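The same comparison can be made without reading raw id values, since those differ from run to run. A minimal sketch using Python's is operator for object identity and DataFrame.equals for value equality:

```python
import pandas as pd

df1 = pd.DataFrame([5, 6, 7, 8, 9])

df2 = df1         # plain assignment: a second name for the same object
df3 = df1.copy()  # copy(): a new DataFrame object with the same values

print(df2 is df1)       # True, both names refer to one object
print(df3 is df1)       # False, copy() returned a different object
print(df3.equals(df1))  # True, the values are still identical
```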

Shallow copy:

>>> df4 = df1.copy(deep=False)
>>> print(df4)
>>> id(df4)

Output:

   0
0  5
1  6
2  7
3  8
4  9
140509923248976

Deep copy:

With deep=True (the default), a new object is created with a copy of the calling object's data and indices. Changes to the copy's data or indices are not reflected in the original object.

Use the df.copy(deep=True) method to make a deep copy of a Pandas DataFrame. In a deep copy, the object's data is copied into a new object.

This means that modifications made to the copy are not reflected in the original. A deep copy takes longer to create than a shallow copy.

>>> df5 = df1.copy(deep=True)
>>> print(df5)
>>> id(df5)

Output:

   0
0  5
1  6
2  7
3  8
4  9
140509923248720

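The ids of the shallow copy and the deep copy differ from each other and from the id of df1, because every call to copy() returns a new DataFrame object. Let's take another example to see the difference between shallow and deep copy.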

Shallow Copy Is Dependent on the Original

import pandas as pd

df = pd.DataFrame({"in": [1, 2, 3, 4], "Maria": ["Man", "kon", "nerti", "Ba"]})
copydf = df.copy(deep=False)
print("\nBefore Operation:\n", copydf == df)
copydf["in"] = [0, 0, 0, 0]
print("\nAfter Operation:\n", copydf == df)
print("\nAfter operation original dataframe:\n", df)

Output:

Before Operation:
      in  Maria
0  True   True
1  True   True
2  True   True
3  True   True

After Operation:
      in  Maria
0  True   True
1  True   True
2  True   True
3  True   True

After operation original dataframe:
    in  Maria
0   0    Man
1   0    kon
2   0  nerti
3   0     Ba

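In the run shown above, the modifications made to the shallow copy were also applied to the original DataFrame. Note that this behavior depends on the pandas version: according to the pandas documentation, once Copy-on-Write is enabled (the default from pandas 3.0), changes to a shallow copy are no longer reflected in the original, or vice versa. Now run the same code with deep=True to make a deep copy.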
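Whether a write to a shallow copy reaches the original can be checked directly on your own installation. The following is a minimal sketch, assuming an in-place write of a single value through .loc; the name original_changed is only for illustration. On versions without Copy-on-Write it should print True, and with Copy-on-Write enabled it should print False, so no fixed output is shown here.

```python
import pandas as pd

df = pd.DataFrame({"in": [1, 2, 3, 4]})
shallow = df.copy(deep=False)

# Write one value into the shallow copy in place.
shallow.loc[0, "in"] = 100

# Check whether the original DataFrame saw the write.
# The result depends on the pandas version and its Copy-on-Write setting.
original_changed = bool(df.loc[0, "in"] == 100)

print("pandas version:", pd.__version__)
print("original changed:", original_changed)
```

If code must behave the same across versions, calling df.copy() with the default deep=True avoids relying on this sharing behavior.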

Deep Copy Is Independent of the Original, Except for Nested Python Objects

import pandas as pd

df = pd.DataFrame({"in": [1, 2, 3, 4], "Maria": ["Man", "kon", "nerti", "Ba"]})
copydf = df.copy(deep=True)
print("\nBefore Operation:\n", copydf == df)
copydf["in"] = [0, 0, 0, 0]
print("\nAfter Operation:\n", copydf == df)
print("\nAfter operation original dataframe:\n", df)

Output:

Before Operation:
      in  Maria
0  True   True
1  True   True
2  True   True
3  True   True

After Operation:
       in  Maria
0  False   True
1  False   True
2  False   True
3  False   True

After operation original dataframe:
    in  Maria
0   1    Man
1   2    kon
2   3  nerti
3   4     Ba

As the output shows, changes to the deep copy do not affect the original. There is one caveat: the Python objects stored inside the data are not recursively duplicated. When a column holds Python objects, the deep copy contains new references that still point to the same objects in memory.

For example, if a Series holds mutable objects such as lists, those objects are shared between the Series and its deep copy, and an in-place change to one of them is visible through both.
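This caveat can be reproduced with a Series of lists. The following is a minimal sketch; the final step, which maps copy.deepcopy over the elements, is one possible workaround added here for illustration and is not part of the original article.

```python
import copy

import pandas as pd

# An object-dtype Series whose elements are mutable Python lists.
s = pd.Series([[1, 2], [3, 4]])
deep = s.copy(deep=True)

# The deep copy holds references to the very same list objects.
print(deep.iloc[0] is s.iloc[0])  # True

# Mutating a list in place is therefore visible through both Series.
s.iloc[0][0] = 10
print(deep.iloc[0])  # [10, 2]

# Possible workaround (an assumption): deep-copy every element explicitly.
independent = s.map(copy.deepcopy)
s.iloc[0][1] = 20
print(independent.iloc[0])  # [10, 2], unaffected by the later change
```

For columns with numeric dtypes this issue does not arise, because the values themselves are copied into a new array.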
