Pandas groupby() and diff()

Fariba Laiq Nov 17, 2022
  1. Data Grouping in Python
  2. Use groupby() With diff() in Pandas

Pandas is a full-featured library for working with text as well as numeric data. In many data analysis and machine learning pre-processing tasks, you will want either to exclude text columns or to extract information from them.

To do this, you can add, remove, and change text columns in your DataFrames using the built-in methods that Pandas provides. This article briefly discusses how to group data and find the differences between consecutive values within each group.

Data Grouping in Python

Data analysis frequently calls for grouping records by one or more columns. Examples of such scenarios include:

  • Counting the number of employees in each business department.
  • Figuring out the average salaries of men and women in each department.
  • Figuring out the average salaries of employees of various ages.
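The first two scenarios above can be sketched as follows. The employees DataFrame, its column names, and its values are invented purely for illustration; they do not come from a real dataset.

```python
import pandas as pd

# Hypothetical employee data, made up for this illustration only.
employees = pd.DataFrame(
    {
        "Department": ["Sales", "Sales", "IT", "IT", "IT"],
        "Gender": ["F", "M", "F", "M", "M"],
        "Salary": [50000, 52000, 60000, 58000, 62000],
    }
)

# Count the number of employees in each department.
headcount = employees.groupby("Department").size()

# Average salary of men and women in each department.
avg_salary = employees.groupby(["Department", "Gender"])["Salary"].mean()

print(headcount)
print(avg_salary)
```

Grouping by a list of columns, as in the second call, produces a result indexed by every combination of those columns that appears in the data.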

Pandas offers a groupby() function that handles most grouping tasks. However, some jobs need groupby() combined with other methods, and this article looks at one such combination.

groupby() is one of the most important Pandas functions. It groups and summarizes records using the split-apply-combine strategy: the data is split into groups, a function is applied to each group, and the results are combined into a single object.
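To make the split, apply, and combine steps concrete, here is a minimal sketch that performs them by hand and compares the result with the built-in aggregation. The scores DataFrame and its column names are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical data, made up for this illustration only.
scores = pd.DataFrame(
    {"Team": ["A", "B", "A", "B"], "Points": [10, 20, 30, 40]}
)

# Split: iterating over a groupby object yields (key, sub-DataFrame) pairs.
# Apply: sum the Points column of each sub-DataFrame.
manual = {}
for team, group in scores.groupby("Team"):
    manual[team] = group["Points"].sum()

# Combine: gather the per-group results into one Series.
manual_result = pd.Series(manual, name="Points")

# The same three steps in a single call.
builtin_result = scores.groupby("Team")["Points"].sum()

print(manual_result)
print(builtin_result)
```

In practice the single-call form is preferable; the loop is shown only to expose what happens at each step.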

Use groupby() With diff() in Pandas

The example below creates a DataFrame with the ID_Number, Stu_Names, and Marks of different students. It then adds a new column, Marks_diff, that holds the difference in marks between consecutive rows within each ID_Number group.

For the first row of each group, diff() has no previous row to subtract from, so it returns NaN there. We chain fillna(0) to replace those NaN values with zero.

As the output shows, the difference between the marks of Harry and Petter is 6.0, and the difference between Daniel and Ron is 10.0.

Example code:

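import pandas as pd

# Build a small DataFrame of students, two per ID_Number group.
d1 = pd.DataFrame(
    {
        "ID_Number": ["ID1", "ID1", "ID2", "ID2"],
        "Stu_Names": ["Harry", "Petter", "Daniel", "Ron"],
        "Marks": [72, 78, 80, 90],
    }
)
print(d1)

# Sort by group key; a stable sort keeps the original row order inside each group.
d1 = d1.sort_values(by=["ID_Number"], kind="stable")

# Row-to-row difference within each group; the first row of a group is NaN, so fill it with 0.
d1["Marks_diff"] = d1.groupby("ID_Number")["Marks"].diff().fillna(0)
print(d1)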

Output:

  ID_Number Stu_Names  Marks
0       ID1     Harry     72
1       ID1    Petter     78
2       ID2    Daniel     80
3       ID2       Ron     90
  ID_Number Stu_Names  Marks  Marks_diff
0       ID1     Harry     72         0.0
1       ID1    Petter     78         6.0
2       ID2    Daniel     80         0.0
3       ID2       Ron     90        10.0
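
To see what fillna(0) changes, the following sketch computes the grouped difference on the same marks with and without it. The variable names raw_diff and filled_diff are chosen here for illustration.

```python
import pandas as pd

# Same group keys and marks as in the example above.
marks = pd.DataFrame(
    {
        "ID_Number": ["ID1", "ID1", "ID2", "ID2"],
        "Marks": [72, 78, 80, 90],
    }
)

# Without fillna: the first row of each group has no predecessor, so it is NaN.
raw_diff = marks.groupby("ID_Number")["Marks"].diff()

# With fillna(0): those NaN entries become 0.0.
filled_diff = raw_diff.fillna(0)

print(raw_diff)
print(filled_diff)
```

Whether to fill with zero depends on the analysis: keeping NaN makes it explicit that no difference exists for the first row of a group, while zero is convenient when the column must be fully numeric.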