How to Apply Transform With Groupby in Pandas

Fariba Laiq Feb 02, 2024
  1. Difference Between the apply() and transform() in Python
  2. Use the apply() Method in Python Pandas
  3. Use the transform() Method in Python Pandas

groupby() is a powerful Pandas method that splits a data frame into separate groups according to some criterion, such as the value in a column. We can then run calculations on each group independently, which makes per-group analysis much easier.

Difference Between the apply() and transform() in Python

apply() and transform() are two methods commonly chained onto a groupby() call. They differ in what is passed to the function we supply and in the shape of the value that comes back.

The apply() method passes each group to our function as a DataFrame, and the function may return a scalar, a Series, or a DataFrame. It therefore lets us work with individual columns, with rows, or with the whole group at once.

The transform() method works on one column of each group at a time, passing that column to our function as a Series, and it returns a result of the same length as the input Series. We can therefore only operate on a single column inside each group at once, but the result lines up row for row with the original data.
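The difference in output shape can be seen in a minimal sketch. The data below is made up for illustration (it is not the Student.csv file used later), although the column names Department and Marks follow the later examples.

```python
import pandas as pd

# Hypothetical sample data; only the column names match the article's examples.
df = pd.DataFrame(
    {
        "Department": ["CS", "CS", "EE", "EE", "EE"],
        "Marks": [80, 90, 70, 60, 95],
    }
)

# apply(): each group arrives as a DataFrame and is reduced to one value,
# so the result has one entry per department.
reduced = df.groupby("Department")[["Marks"]].apply(lambda g: g["Marks"].max())

# transform(): the result has one entry per original row,
# aligned with the index of df.
broadcast = df.groupby("Department")["Marks"].transform("max")

print(reduced)
print(broadcast)
```

Here reduced has as many entries as there are departments, while broadcast has as many entries as df has rows.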

Use the apply() Method in Python Pandas

In the following code, we load a CSV file that contains student records and use apply() to find the highest score in each department.

First, we group the rows by department using the groupby() method. Then, for each group, we find the maximum score by calling max() on the Marks column.

The result is returned as a Series with one entry per department. Because apply() receives the whole group, we could also operate on several columns or on the entire data frame.

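# Python 3.x
import pandas as pd
from IPython.display import display  # already available inside Jupyter notebooks

# Load the student records
df = pd.read_csv("Student.csv")
display(df)


def f(my_df):
    # my_df is the sub-DataFrame holding the rows of one department
    return my_df["Marks"].max()


# Call f once per department; the result is a Series indexed by Department
df.groupby("Department").apply(f)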

Output:

[Image: output of groupby() with apply() in Pandas]
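Because apply() hands the function a whole group, the function can combine several columns. The sketch below illustrates this with made-up data; the Name column and all values are assumptions for the example and are not taken from Student.csv.

```python
import pandas as pd

# Hypothetical sample data; the Name column is assumed for illustration.
df = pd.DataFrame(
    {
        "Department": ["CS", "CS", "EE", "EE", "EE"],
        "Name": ["Ali", "Bilal", "Sara", "Hina", "Omar"],
        "Marks": [80, 90, 70, 60, 95],
    }
)


def top_student(group):
    # group is a DataFrame with the Name and Marks columns of one department.
    # idxmax() gives the index label of the row with the highest Marks.
    return group.loc[group["Marks"].idxmax(), "Name"]


# Select the columns the function needs, then call it once per department.
best = df.groupby("Department")[["Name", "Marks"]].apply(top_student)
print(best)
```

Selecting the needed columns before calling apply() keeps the grouping column out of the data passed to the function, which is one way to keep the behavior consistent across Pandas versions.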

Use the transform() Method in Python Pandas

In the next example, we add a new column, Mean_Marks, to the data frame. We group the rows by department using the groupby() method and then compute each department's mean score by passing the string "mean" to transform().

The output shows the mean score of each department, repeated on every row that belongs to that department.

Here, the transform() method has operated on a single column, in our case Marks.

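# Python 3.x
import pandas as pd
from IPython.display import display  # already available inside Jupyter notebooks

# Load the student records
df = pd.read_csv("Student.csv")
display(df)

# Compute the mean of Marks per department and broadcast it back to every row
df["Mean_Marks"] = df.groupby("Department")["Marks"].transform("mean")
display(df)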

Output:

[Image: output of groupby() with transform() in Pandas]
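Since the result of transform() lines up with the original rows, it can be compared directly against the source column. The following sketch, again using made-up data rather than Student.csv, computes each student's difference from the department mean and keeps the students who scored above it. The Diff_From_Mean column name is an assumption chosen for this example.

```python
import pandas as pd

# Hypothetical sample data; only the column names match the article's examples.
df = pd.DataFrame(
    {
        "Department": ["CS", "CS", "EE", "EE", "EE"],
        "Marks": [80, 90, 70, 60, 95],
    }
)

# One mean per row, aligned with df's index.
dept_mean = df.groupby("Department")["Marks"].transform("mean")

# Row-wise arithmetic works because both Series share the same index.
df["Diff_From_Mean"] = df["Marks"] - dept_mean

# Boolean filtering against the broadcast mean.
above_average = df[df["Marks"] > dept_mean]
print(above_average)
```

The same filter would need an extra merge step if the per-department means came from a reducing call, because that result has only one row per department.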
