Pandas Groupby Count

Suraj Joshi Jan 30, 2023
  1. Count Value of Unique Row Values Using Series.value_counts() Method
  2. Count Values of DataFrame Groups Using DataFrame.groupby() Function
  3. Get Multiple Statistics Values of Each Group Using pandas.DataFrame.agg() Method
Pandas Groupby Count

This tutorial explains how we can get statistics like count, sum, max and much more for groups derived using the DataFrame.groupby() method.

We will use the automobile_data_df shown in the above example to explain the concepts. The DataFrame consists of employees, and the car and bike brands used by them.

import pandas as pd

automobile_data = [
    ["Chevrolet", "Yamaha"],
    ["Toyota", "Ducati"],
    ["Volkswagen", "BMW"],
    ["Cherlovet", "Kawasaki"],
    ["Toyota", "Ducati"],
    ["Toyota", "BMW"],
    ["Volkswagen", "BMW"],
]

column_names = ["Car Brand", "Motorbike Brand"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
    automobile_data, index=index_names, columns=column_names
)
print(automobile_data_df)

Output:

          Car Brand Motorbike Brand
Steven    Chevrolet          Yamaha
Robert       Toyota          Ducati
Jimmy    Volkswagen             BMW
Mike      Cherlovet        Kawasaki
Phoenix      Toyota          Ducati
Scott        Toyota             BMW
John     Volkswagen             BMW

Count Value of Unique Row Values Using Series.value_counts() Method

If we want counts of each unique value of a single column in a DataFrame, we can use the Series.value_counts() method.

import pandas as pd

automobile_data = [
    ["Chevrolet", "Yamaha"],
    ["Toyota", "Ducati"],
    ["Volkswagen", "BMW"],
    ["Cherlovet", "Kawasaki"],
    ["Toyota", "Ducati"],
    ["Toyota", "BMW"],
    ["Volkswagen", "BMW"],
]

column_names = ["Car Brand", "Motorbike Brand"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
    automobile_data, index=index_names, columns=column_names
)

print(automobile_data_df["Car Brand"].value_counts())

Output:

Toyota        3
Volkswagen    2
Cherlovet     1
Chevrolet     1
Name: Car Brand, dtype: int64

It will print all the unique values in the Car Brand column of automobile_data_df along with their counts. The method is only applicable if we want to see value counts of only one column at a time.

Count Values of DataFrame Groups Using DataFrame.groupby() Function

We use the DataFrame.groupby() function to split a DataFrame into multiple groups and then use the size() method to count each created group’s values.

import pandas as pd

automobile_data = [
    ["Chevrolet", "Yamaha"],
    ["Toyota", "Ducati"],
    ["Volkswagen", "BMW"],
    ["Cherlovet", "Kawasaki"],
    ["Toyota", "Ducati"],
    ["Toyota", "BMW"],
    ["Volkswagen", "BMW"],
]

column_names = ["Car Brand", "Motorbike Brand"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
    automobile_data, index=index_names, columns=column_names
)

group = automobile_data_df.groupby(["Car Brand", "Motorbike Brand"])
print(group.size().reset_index(name="counts"))

Output:

    Car Brand Motorbike Brand  counts
0   Cherlovet        Kawasaki       1
1   Chevrolet          Yamaha       1
2      Toyota             BMW       1
3      Toyota          Ducati       2
4  Volkswagen             BMW       2

It prints a DatFrame with all unique combinations of values of Car Brand and Motorbike Brand columns from the automobile_data_df along with the count of occurrences of these combinations in the automobile_data_df DataFrame.

The groupby() method splits the automobile_data_df into groups. The rows with the same values of Car Brand and Motorbike Brand columns will be placed in the same group. The size() method will give the count of values in each group and finally we generate DataFrame from the count of values in each group.

Get Multiple Statistics Values of Each Group Using pandas.DataFrame.agg() Method

import pandas as pd

automobile_data = [
    ["Chevrolet", "Yamaha", 4000],
    ["Toyota", "Ducati", 5000],
    ["Volkswagen", "BMW", 4500],
    ["Cherlovet", "Kawasaki", 3800],
    ["Toyota", "Ducati", 4500],
    ["Toyota", "BMW", 5000],
    ["Volkswagen", "BMW", 6000],
]

column_names = ["Car Brand", "Motorbike Brand", "Monthly Salary"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
    automobile_data, index=index_names, columns=column_names
)

group = automobile_data_df.groupby(["Car Brand", "Motorbike Brand"])

print(group.agg({"Monthly Salary": [min, max, sum]}))

Output:

                           Monthly Salary             
                                      min   max    sum
Car Brand  Motorbike Brand                            
Cherlovet  Kawasaki                  3800  3800   3800
Chevrolet  Yamaha                    4000  4000   4000
Toyota     BMW                       5000  5000   5000
           Ducati                    4500  5000   9500
Volkswagen BMW                       4500  6000  10500

It displays the minimum, maximum, and sum of Monthly Salary values for each generated group.

Author: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn

Related Article - Pandas Groupby