Flatten a Hierarchical Index in Columns in Pandas

Flatten a Hierarchical Index in Columns in Pandas

  1. Use rest_index() to Flatten a Hierarchical Index in Columns in Pandas
  2. Use as_index to Flatten a Hierarchical Index in Columns in Pandas

This article will discuss how to flatten a hierarchical index in Pandas Dataframe columns.

Groupby aggregation functions are commonly used to create hierarchical indexes. The aggregated function used will be visible in the hierarchical index of the resulting dataframe.

We will use different functions to explain how to flatten a hierarchical index in columns.

Use rest_index() to Flatten a Hierarchical Index in Columns in Pandas

The reset_index() function in Pandas flattens the hierarchical index created by the groupby aggregation function.

Syntax:

pandas.DataFrame.reset_index(level, drop, inplace)

Where:

  • level: Only the levels indicated are deleted from the index.
  • drop: The index is reset to the default integer index.
  • inplace: Without making a copy, modifies the dataframe object permanently.

We use the Pandas groupby() function to group bus sales data by quarters and the reset_index() pandas function to flatten the grouped dataframe’s hierarchical indexed columns.

First, import the Python Pandas library and then create a simple dataframe. The dataframe is stored in a data_bus variable.

import pandas as pd
data_bus = pd.DataFrame({"bus": ["2x", "3Tr", "4x", "5x"],
                    "bus_sale_q1": [21, 23, 25, 27],
                    'bus_sale_q2': [12, 14, 16, 18]}
                    columns=["bus", "bus_sale_q1",
                            'bus_sale_q2'])
data_bus

Output:

     bus    bus_sale_q1    bus_sale_q2
0    2x     21             12
1    3Tr    23             14
2    4x     25             16
3    5x     27             18

The above output shows the simple dataframe created. After that, use the groupby() function to group the bus column on the sum of sales q1 and q2.

grouped_data = data_bus.groupby(by="bus").agg("sum")
grouped_data

Output:

bus    bus_sale_q1    bus_sale_q2
2x     21             12
3Tr    23             14
4x     25             16
5x     27             18

We will use the reset_index() function to flatten the hierarchical index column.

flat_data = grouped_data.reset_index()
flat_data

Output:

     bus    bus_sale_q1    bus_sale_q2
0    2x     21             12
1    3Tr    23             14
2    4x     25             16
3    5x     27             18

Use as_index to Flatten a Hierarchical Index in Columns in Pandas

The pandas groupby() function will be used to group bus sales data by quarters, and as_index will flatten the hierarchical indexed columns of the grouped dataframe.

Syntax:

pandas.DataFrame.groupby(by, level, axis, as_index)

Where:

  • level: Columns on which the groupby operation must be performed.
  • by: Columns on which the groupby operation must be performed.
  • axis: Whether to split along rows (0) or columns (1).
  • as_index: For aggregated output, returns an object with the index group labels.

We’ll use the Pandas groupby() function to group bus sales data by quarters and set the as_index parameter to False. This ensures that the grouped dataframe’s hierarchical index is flattened.

The same dataframe will be used as in the previous example.

Example:

import pandas as pd
data_bus = pd.DataFrame({"bus": ["2x", "3Tr", "4x", "5x"],
                    "bus_sale_q1": [21, 23, 25, 27],
                    'bus_sale_q2': [12, 14, 16, 18]},

                    columns=["bus", "bus_sale_q1",
                            'bus_sale_q2'])
data_bus
grouped_data = data_bus.groupby(by="bus", as_index=False).agg("sum")
print(grouped_data)

Output:

     bus    bus_sale_q1    bus_sale_q2
0    2x     21             12
1    3Tr    23             14
2    4x     25             16
3    5x     27             18

Related Article - Pandas Column

  • Explode Multiple Columns in Pandas
  • Pandas Fillna Multiple Columns
  • Drop Last Row and Column in Pandas
  • Check if Column Exists in Pandas
  • Drop Duplicated Column in Pandas