Pandas Groupby Two Columns

  1. Pandas Groupby Multiple Columns
  2. Count Number of Rows in Each Group Pandas

This tutorial explains how we can use the DataFrame.groupby() method in Pandas for two columns to separate the DataFrame into groups. We can also gain much more information from the created groups.

We will use the below DataFrame in this article.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Pandas Groupby Multiple Columns

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")
print("Groups in DataFrame:")
groups = data.groupby(['Gender', 'Employed'])
for group_key, group_value in groups:
    group = groups.get_group(group_key)
    print(group)
    print("")

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Groups in DataFrame:
   Name  Gender Employed  Age
3  Emma  Female       No   24

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
4      Luna  Female      Yes   28

     Name Gender Employed  Age
1  Travis   Male       No   28
5   Anish   Male       No   25

  Name Gender Employed  Age
2  Bob   Male      Yes   27

It creates 4 groups from the DataFrame. All the rows with the same value of Gender and Employed column are placed in the same group.

Count Number of Rows in Each Group Pandas

To count the number of rows in each created group using the DataFrame.groupby() method, we can use the size() method.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")
print("Count of Each group:")
grouped_df = data.groupby(['Gender', 'Employed']
                          ).size().reset_index(name="Count")
print(grouped_df)

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Count of Each group:
   Gender Employed  Count
0  Female       No      1
1  Female      Yes      2
2    Male       No      2
3    Male      Yes      1

It displays the DataFrame, groups created from the DataFrame, and the number of entries in each group.

If we want the largest count value for each value in the Employed column, we can form another group from the created group above and count values and then get the maximum value of count using the max() method.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")

groups = data.groupby(['Gender', 'Employed']).size().groupby(level=1)
print(groups.max())

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Employed
No     2
Yes    2
dtype: int64

It shows the maximum count of values of the Employed column among created groups from Gender and Employed columns.

Contribute
DelftStack is a collective effort contributed by software geeks like you. If you like the article and would like to contribute to DelftStack by writing paid articles, you can check the write for us page.

Related Article - Pandas DataFrame Column

  • Pandas Axis Meaning
  • Add a New Column to Existing DataFrame With Default Value in Pandas
  • Related Article - Pandas Groupby

  • Get Pandas DataFrame Column Headers as a List
  • Get Index of All Rows Whose Particular Column Satisfies Given Condition in Pandas