Create Nested Dataframes in Pandas

  1. Pandas Nested Dataframes
  2. Create Nested Dataframes in Pandas

This article discusses how to create a nested DataFrame in Pandas, that is, a DataFrame whose cells hold other DataFrame instances. It also demonstrates how to print a nested DataFrame so that its contents stay readable.

Pandas Nested Dataframes

A Pandas DataFrame is a two-dimensional data structure whose rows and columns both carry labels: the row labels form the index, and the column labels are the column names. DataFrames are widely used in many data-intensive fields, including data science, machine learning, and scientific computing.
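The two labeled dimensions are easy to inspect on a small example. This is a minimal sketch; the column names and the row labels 'x' and 'y' are arbitrary choices for illustration.

```python
import pandas as pd

# Two columns ('a', 'b') and two rows labeled 'x' and 'y'
df = pd.DataFrame({'a': [1, 10], 'b': [2, 20]}, index=['x', 'y'])

print(df.shape)          # (2, 2): rows x columns
print(list(df.index))    # row labels: ['x', 'y']
print(list(df.columns))  # column labels: ['a', 'b']

# A single value is addressed by its row label and its column label
print(df.loc['y', 'b'])  # 20
```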

DataFrames are comparable to SQL tables and spreadsheets that can be manipulated in applications such as Excel and Calc.

Because they are built into the Python and NumPy ecosystems, DataFrames are, for many applications, faster, easier to use, and more powerful than ordinary tables and spreadsheets.

When handling large amounts of data, you may need to create a DataFrame instance that contains other DataFrame instances.

Consider the following code:

import pandas as pd

data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': 20, 'c': 30},
        {'a': 40, 'b': 50, 'c': 60},
        {'a': 70, 'b': 80, 'c': 90}]

data2 = [{'d': 1, 'e': 2, 'f': 3},
        {'d': 10, 'e': 20, 'f': 30},
        {'d': 40, 'e': 50, 'f': 60},
        {'d': 70, 'e': 80, 'f': 90}]

data3 = [{'g': 1, 'h': 2, 'i': 3},
        {'g': 10, 'h': 20, 'i': 30},
        {'g': 40, 'h': 50, 'i': 60},
        {'g': 70, 'h': 80, 'i': 90}]

  
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)

print("Dataframe 1: \n" + str(df) + "\n\nDataframe 2:\n" + str(df2) + "\n\nDataframe 3:\n" + str(df3))

Output:

Dataframe 1: 
    a   b   c
0   1   2   3
1  10  20  30
2  40  50  60
3  70  80  90

Dataframe 2:
    d   e   f
0   1   2   3
1  10  20  30
2  40  50  60
3  70  80  90

Dataframe 3:
    g   h   i
0   1   2   3
1  10  20  30
2  40  50  60
3  70  80  90

In the code above, three DataFrame instances are created and stored in the variables df, df2, and df3, respectively. If the three DataFrames are closely related, accessing them separately can become tedious, especially when they hold large amounts of data.

To overcome this issue, it can be sensible to gather the data in a single place for easier access. A nested DataFrame is one potential solution, since all the related DataFrame instances can be collected in a single new DataFrame instance.

On the other hand, nested DataFrames are not always the best choice; they are suitable only for specific scenarios and use cases.

Create Nested Dataframes in Pandas

Just as a DataFrame column can hold ordinary values such as numbers or strings, it can also hold DataFrame instances. A DataFrame built this way, whose cells contain other user-defined DataFrame instances, is called a nested DataFrame.
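One way to see what happens under the hood is to check the dtype of the column that stores the DataFrames. This is a minimal sketch with two small, made-up frames; the names `inner`, `outer`, `name`, and `frame` are illustrative only.

```python
import pandas as pd

inner = pd.DataFrame({'a': [1, 10], 'b': [2, 20]})

# Each cell of the 'frame' column stores a whole DataFrame object
outer = pd.DataFrame({'name': ['first', 'second'], 'frame': [inner, inner * 2]})

print(outer['frame'].dtype)                 # object
print(type(outer.at[1, 'frame']).__name__)  # DataFrame
print(outer.at[1, 'frame'].loc[1, 'b'])     # 40 (20 doubled)
```

The column holding the frames has dtype object, meaning pandas treats each nested DataFrame as an opaque Python object rather than as numeric data.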

Consider the following code:

import pandas as pd


data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': 20, 'c': 30},
        {'a': 40, 'b': 50, 'c': 60},
        {'a': 70, 'b': 80, 'c': 90}]

data2 = [{'d': 1, 'e': 2, 'f': 3},
        {'d': 10, 'e': 20, 'f': 30},
        {'d': 40, 'e': 50, 'f': 60},
        {'d': 70, 'e': 80, 'f': 90}]

data3 = [{'g': 1, 'h': 2, 'i': 3},
        {'g': 10, 'h': 20, 'i': 30},
        {'g': 40, 'h': 50, 'i': 60},
        {'g': 70, 'h': 80, 'i': 90}]
  
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)


df4 = pd.DataFrame({'idx':[1,2,3], 'dfs':[df, df2, df3]})

print(df4)

This gives the following output:

   idx                                                dfs
0    1      a   b   c
0   1   2   3
1  10  20  30
2  4...
1    2      d   e   f
0   1   2   3
1  10  20  30
2  4...
2    3      g   h   i
0   1   2   3
1  10  20  30
2  4...

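As the output shows, printing the outer DataFrame does not display the nested DataFrames readably: the text of each cell is cut off at the width set by the pandas option display.max_colwidth (50 characters by default), and the line breaks inside each nested DataFrame break up the table layout. To make the output understandable, we have to access the elements of the 'dfs' column individually; in our case, each of these elements is a DataFrame instance.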

To access the elements, consider the following line:

print("Dataframe 1: \n" + str(df4['dfs'].iloc[0]) + "\n\nDataframe 2:\n" + str(df4['dfs'].iloc[1]) + "\n\nDataframe 3:\n" + str(df4['dfs'].iloc[2]))

This gives the output:

Dataframe 1: 
    a   b   c
0   1   2   3
1  10  20  30
2  40  50  60
3  70  80  90

Dataframe 2:
    d   e   f
0   1   2   3
1  10  20  30
2  40  50  60
3  70  80  90

Dataframe 3:
    g   h   i
0   1   2   3
1  10  20  30
2  40  50  60
3  70  80  90
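Hard-coding one iloc position per nested DataFrame does not scale well when there are many of them. A minimal sketch of an alternative, assuming the same idx/dfs layout as df4 but with shortened data for brevity, iterates over both columns together:

```python
import pandas as pd

# Shortened stand-ins for the article's df and df2
df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 10, 'b': 20, 'c': 30}])
df2 = pd.DataFrame([{'d': 1, 'e': 2, 'f': 3}, {'d': 10, 'e': 20, 'f': 30}])
df4 = pd.DataFrame({'idx': [1, 2], 'dfs': [df, df2]})

# zip pairs each label in 'idx' with the DataFrame in the same row of 'dfs'
for idx, nested in zip(df4['idx'], df4['dfs']):
    print(f"Dataframe {idx}:")
    print(nested)
    print()
```

The loop prints one labeled block per row, however many nested DataFrames df4 holds.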

To nest DataFrame instances inside another DataFrame, we create a new DataFrame instance and pass the previously created DataFrame instances to it as column values.

This works exactly like assigning ordinary data to a DataFrame: pandas stores each DataFrame as a single value in a column of dtype object.

In our case, a dictionary was passed to the DataFrame constructor. Its 'idx' key maps to the list [1, 2, 3], and its 'dfs' key maps to a list containing df, df2, and df3. Each list becomes a column of df4, so every row of df4 holds one of the original DataFrames.
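Because each cell of the dfs column is an ordinary DataFrame, indexers can be chained to reach a single value inside a nested frame. This sketch rebuilds the article's three DataFrames column-wise for brevity; the variable name `value` is illustrative.

```python
import pandas as pd

# The article's three DataFrames, written column-wise for brevity
df = pd.DataFrame({'a': [1, 10, 40, 70], 'b': [2, 20, 50, 80], 'c': [3, 30, 60, 90]})
df2 = pd.DataFrame({'d': [1, 10, 40, 70], 'e': [2, 20, 50, 80], 'f': [3, 30, 60, 90]})
df3 = pd.DataFrame({'g': [1, 10, 40, 70], 'h': [2, 20, 50, 80], 'i': [3, 30, 60, 90]})

# Each list becomes one column; the 'dfs' column holds the DataFrames themselves
df4 = pd.DataFrame({'idx': [1, 2, 3], 'dfs': [df, df2, df3]})

# .at picks the cell at row label 1, column 'dfs' (that is, df2);
# .loc then selects row 2, column 'e' inside that nested DataFrame
value = df4.at[1, 'dfs'].loc[2, 'e']
print(value)  # 50
```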

As with most problems, there are several ways to keep related DataFrames together, and a nested DataFrame is suited only to specific scenarios and use cases.
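Two common alternatives are a plain Python dictionary of DataFrames and pandas.concat with keys. This is a minimal sketch with hypothetical data; note that it assumes the frames share the same columns, unlike the article's example.

```python
import pandas as pd

# Two related frames that share the same columns (hypothetical data)
df = pd.DataFrame({'a': [1, 10], 'b': [2, 20]})
df2 = pd.DataFrame({'a': [3, 30], 'b': [4, 40]})

# Option 1: a plain dict keeps the frames together under descriptive keys
frames = {'first': df, 'second': df2}
print(frames['second'])

# Option 2: pd.concat stacks the frames into one flat DataFrame;
# the dict keys become the outer level of a two-level row index
combined = pd.concat(frames)
print(combined.loc['second'])
```

When the frames have different columns, as df, df2, and df3 do in this article, pd.concat fills the missing positions with NaN by default, so a dict or a nested DataFrame may be the more natural fit there.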

Generally, it is advisable to first determine how the data needs to be structured and which operations will be performed on it. Based on these requirements, you can decide whether a nested DataFrame is a viable option.
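One operation-related consideration is that work on a nested column runs element by element through Python callables rather than through pandas' vectorized column methods. As a minimal sketch (the data and the column names n_rows and total are hypothetical), Series.map can derive a per-row summary of each nested DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 10], 'b': [2, 20]})
df2 = pd.DataFrame({'d': [1, 10, 40], 'e': [2, 20, 50]})
df4 = pd.DataFrame({'idx': [1, 2], 'dfs': [df, df2]})

# map calls the function once per cell, passing each nested DataFrame
df4['n_rows'] = df4['dfs'].map(len)
df4['total'] = df4['dfs'].map(lambda d: int(d.to_numpy().sum()))

print(df4[['idx', 'n_rows', 'total']])
```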

Salman Mehmood

Hello! I am Salman Bin Mehmood (Baum), a software developer who helps organizations address complex problems. My expertise lies in back-end development, data science, and machine learning. I am a lifelong learner, currently working on the metaverse, and enrolled in a course on building an AI application with Python. I love solving problems and developing bug-free software for people. I write content related to Python and hot technologies.
