How to GroupBy and Aggregate Multiple Columns in Pandas

Fariba Laiq Feb 02, 2024
How to GroupBy and Aggregate Multiple Columns in Pandas

The Pandas library is a powerful data analysis library in Python. We can perform many different types of manipulation on a dataframe using Pandas in Python.

groupby() is a method that splits the data into multiple groups based on specific criteria. After that, we can perform certain operations on the grouped data.

Apply the groupby() and the aggregate() Functions on Multiple Columns in Pandas Python

Sometimes we need to group the data from multiple columns and apply some aggregate() methods. The aggregate() methods are those methods that combine the values from multiple rows and return a single value, for example, count(), size(), mean(), sum(), mean(), etc.

In the following code, we have the students’ data that contains redundant values for some columns. If you want to group the data based on the student’s Name and Section to get their total marks, we will group the data according to the name and section and then calculate the total marks using the aggregate() method.

We have stored the returned result and displayed it.

Example Code:

# Python 3.x
import pandas as pd

student = {
    "Name": ["Jhon", "Alia", "Jhon", "Alia", "Sam", "Smith", "Sam", "Smith"],
    "Section": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Course": ["Java", "Java", "Dart", "Dart", "Java", "Java", "Dart", "Dart"],
    "Marks": [70, 80, 90, 60, 80, 90, 50, 80],
}
df = pd.DataFrame(student)
display(df)
result = df.groupby(["Name", "Section"]).aggregate("sum")
display(result)

Output:

DataFrame

GroupBy Aggregate Multiple Columns

We can also perform multiple aggregate operations at a time. We will pass the list of operation names to the aggregate() method.

Here, we have calculated students’ mean and total marks at once using the aggregate() method by passing the list of operation names.

Example Code:

# Python 3.x
import pandas as pd

student = {
    "Name": ["Jhon", "Alia", "Jhon", "Alia", "Sam", "Smith", "Sam", "Smith"],
    "Section": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Course": ["Java", "Java", "Dart", "Dart", "Java", "Java", "Dart", "Dart"],
    "Marks": [70, 80, 90, 60, 80, 90, 50, 80],
}
df = pd.DataFrame(student)
display(df)
result = df.groupby(["Name", "Section"]).aggregate(["mean", "sum"])
display(result)

Output:

DataFrame

GroupBy Aggregate Multiple Columns Multiple Methods

Author: Fariba Laiq
Fariba Laiq avatar Fariba Laiq avatar

I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.

LinkedIn

Related Article - Pandas GroupBy