How to Groupby Two Columns in Pandas

Suraj Joshi Feb 02, 2024
  1. Pandas Groupby Multiple Columns
  2. Count Number of Rows in Each Group Pandas
How to Groupby Two Columns in Pandas

This tutorial explains how we can use the DataFrame.groupby() method in Pandas for two columns to separate the DataFrame into groups. We can also gain much more information from the created groups.

We will use the below DataFrame in this article.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)

print(data)

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Pandas Groupby Multiple Columns

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)

print(data)
print("")
print("Groups in DataFrame:")
groups = data.groupby(["Gender", "Employed"])
for group_key, group_value in groups:
    group = groups.get_group(group_key)
    print(group)
    print("")

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Groups in DataFrame:
   Name  Gender Employed  Age
3  Emma  Female       No   24

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
4      Luna  Female      Yes   28

     Name Gender Employed  Age
1  Travis   Male       No   28
5   Anish   Male       No   25

  Name Gender Employed  Age
2  Bob   Male      Yes   27

It creates 4 groups from the DataFrame. All the rows with the same value of Gender and Employed column are placed in the same group.

Count Number of Rows in Each Group Pandas

To count the number of rows in each created group using the DataFrame.groupby() method, we can use the size() method.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)

print(data)
print("")
print("Count of Each group:")
grouped_df = data.groupby(["Gender", "Employed"]).size().reset_index(name="Count")
print(grouped_df)

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Count of Each group:
   Gender Employed  Count
0  Female       No      1
1  Female      Yes      2
2    Male       No      2
3    Male      Yes      1

It displays the DataFrame, groups created from the DataFrame, and the number of entries in each group.

If we want the largest count value for each value in the Employed column, we can form another group from the created group above and count values and then get the maximum value of count using the max() method.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)

print(data)
print("")

groups = data.groupby(["Gender", "Employed"]).size().groupby(level=1)
print(groups.max())

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Employed
No     2
Yes    2
dtype: int64

It shows the maximum count of values of the Employed column among created groups from Gender and Employed columns.

Author: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn

Related Article - Pandas DataFrame Column

Related Article - Pandas Groupby