Pandas DataFrame.groupby() Function

Suraj Joshi Jan 30, 2023
  1. Syntax of pandas.DataFrame.groupby():
  2. Example Codes: Group Two DataFrames With pandas.DataFrame.groupby() Based on Values of Single Column
  3. Example Codes: Group Two DataFrames With pandas.DataFrame.groupby() Based on Multiple Conditions
  4. Example Codes: Set as_index=False in pandas.DataFrame.groupby()

pandas.DataFrame.groupby() splits a DataFrame into groups based on the given criteria. Each group can then be aggregated, transformed, or inspected separately, which makes it practical to summarize large datasets.
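As a minimal sketch of what grouping is typically used for, the snippet below splits a small DataFrame by one column and computes a per-group mean. The fruit data is hypothetical and only serves as an illustration.

```python
import pandas as pd

# Hypothetical fruit data, used only for illustration.
df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple", "Pineapple", "Kiwi"],
        "Price": [34, 24, 14, 44, 64, 84],
        "In_Stock": ["Yes", "No", "No", "Yes", "No", "Yes"],
    }
)

# Split the rows by In_Stock, then compute the mean Price of each group.
mean_price = df.groupby("In_Stock")["Price"].mean()
print(mean_price)
```

The result is a Series with one entry per distinct In_Stock value.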

Syntax of pandas.DataFrame.groupby():

DataFrame.groupby(
    by=None,
    axis=0,
    level=None,
    as_index=True,
    sort=True,
    group_keys=True,
    squeeze: bool=False,
    observed: bool=False)

Parameters

by          Mapping, function, label, or list of labels used to determine the groups
axis        Split along rows (axis=0) or columns (axis=1). Deprecated since pandas 2.1
level       Integer, level name, or a sequence of these. If the axis is a MultiIndex, group by a particular level or levels
as_index    Boolean. If True, the returned object uses the group labels as the index
sort        Boolean. If True, sort the group keys
group_keys  Boolean. When calling apply(), add the group keys to the index to identify the pieces
squeeze     Boolean. Reduce the dimensionality of the return type when possible. Deprecated in pandas 1.1 and removed in pandas 2.0
observed    Boolean. Applies only if any of the groupers are Categorical. If True, show only observed values for categorical groupers; if False, show all values
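The following sketch illustrates two of the parameters listed above, sort and level, on a small hypothetical DataFrame. It is meant as an illustration under those assumptions, not as a complete tour of the parameters.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {
        "Price": [34, 24, 14, 44, 64, 84],
        "In_Stock": ["Yes", "No", "No", "Yes", "No", "Yes"],
    }
)

# sort=True (the default): group keys are sorted, so "No" comes before "Yes".
sorted_sum = df.groupby("In_Stock", sort=True)["Price"].sum()

# sort=False: group keys keep the order in which they first appear in df.
unsorted_sum = df.groupby("In_Stock", sort=False)["Price"].sum()

# level=0: group by the values of the index instead of a column.
by_level = df.set_index("In_Stock").groupby(level=0)["Price"].sum()

print(list(sorted_sum.index))
print(list(unsorted_sum.index))
print(by_level)
```

Passing sort=False skips the sorting of group keys, which can be useful when the original order of appearance matters.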

Return

It returns a DataFrameGroupBy object containing the grouped information.

Example Codes: Group Two DataFrames With pandas.DataFrame.groupby() Based on Values of Single Column

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
             ('Mango', 24, 'No' ) ,
             ('banana', 14, 'No' ) ,
             ('Apple', 44, 'Yes' ) ,
             ('Pineapple', 64, 'No') ,
             ('Kiwi', 84, 'Yes')  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock']) 
grouped_df = df.groupby('In_Stock')
print(grouped_df)
print(type(grouped_df))

Output:

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f73cc992d30>
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>

It groups the DataFrame into groups based on the values in the In_Stock column and returns a DataFrameGroupBy object.
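Because printing the DataFrameGroupBy object shows only its type and address, it can help to inspect it in other ways. The sketch below, which reuses the same fruit data, shows one possible approach using the ngroups and groups attributes and plain iteration.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# Number of groups found in the In_Stock column.
print(grouped_df.ngroups)

# Mapping from each group label to the row labels that belong to it.
print(grouped_df.groups)

# Iterating yields (group label, sub-DataFrame) pairs.
keys = []
for key, group in grouped_df:
    keys.append(key)
    print(key, list(group["Name"]))
```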

To get details about the DataFrameGroupBy object returned by groupby(), we can use the first() method of the DataFrameGroupBy object, which returns the first non-null entry of each column within each group.

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
             ('Mango', 24, 'No' ) ,
             ('banana', 14, 'No' ) ,
             ('Apple', 44, 'Yes' ) ,
             ('Pineapple', 64, 'No') ,
             ('Kiwi', 84, 'Yes')  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock']) 
grouped_df = df.groupby('In_Stock')
print(grouped_df.first())

Output:

            Name  Price
In_Stock               
No         Mango     24
Yes       Orange     34

It prints a DataFrame formed by the first elements of the two groups split from df, with the group labels as its index.
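One detail worth knowing is that first() picks the first non-null value of each column in a group, which is not always the same as the first row. The sketch below uses a hypothetical DataFrame with one missing price to contrast first() with head(1).

```python
import pandas as pd

# Hypothetical data: the first "No" row has a missing price.
df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana"],
        "Price": [34, float("nan"), 14],
        "In_Stock": ["Yes", "No", "No"],
    }
)
grouped_df = df.groupby("In_Stock")

# first(): first non-null value per column, so Price for "No" comes from banana.
first_values = grouped_df.first()
print(first_values)

# head(1): the first row of each group as-is, missing values included.
first_rows = grouped_df.head(1)
print(first_rows)
```

When every value is present, as in the fruit data above, both approaches select the same rows.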

We can also print an entire group using the get_group() method.

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ) ,
             ('Mango', 24, 'No' ) ,
             ('banana', 14, 'No' ) ,
             ('Apple', 44, 'Yes' ) ,
             ('Pineapple', 64, 'No') ,
             ('Kiwi', 84, 'Yes')  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock']) 
grouped_df = df.groupby('In_Stock')
print(grouped_df.get_group('Yes'))

Output:

     Name  Price In_Stock
0  Orange     34      Yes
3   Apple     44      Yes
5    Kiwi     84      Yes

It prints all the rows in df whose value in the In_Stock column is Yes. We first split the rows into separate groups by their In_Stock value using the groupby() method, and then access a particular group using the get_group() method.
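For a single group, get_group() selects the same rows as a boolean filter on the grouping column. The sketch below checks this on the fruit data and also shows that asking for a label that does not exist raises a KeyError; the label "Maybe" is a made-up example.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# Rows of the "Yes" group, keeping their original row labels.
yes_group = grouped_df.get_group("Yes")

# The same rows selected with a boolean mask.
mask_rows = df[df["In_Stock"] == "Yes"]
same_rows = yes_group[["Name", "Price"]].equals(mask_rows[["Name", "Price"]])
print(same_rows)

# A label that is not among the group keys raises KeyError.
try:
    grouped_df.get_group("Maybe")
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
print(missing_label_raised)
```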

Example Codes: Group Two DataFrames With pandas.DataFrame.groupby() Based on Multiple Conditions

import pandas as pd
fruit_list = [ ('Orange', 34, 'Yes' ,'ABC') ,
             ('Mango', 24, 'No','ABC' ) ,
             ('banana', 14, 'No','ABC' ) ,
             ('Apple', 44, 'Yes',"XYZ" ) ,
             ('Pineapple', 64, 'No',"XYZ") ,
             ('Kiwi', 84, 'Yes',"XYZ")  ]

df = pd.DataFrame(fruit_list, columns = ['Name' , 'Price', 'In_Stock',"Supplier"]) 
grouped_df = df.groupby(['In_Stock', 'Supplier']) 
  
print(grouped_df.first())

Output:

                        Name  Price
In_Stock Supplier                  
No       ABC           Mango     24
         XYZ       Pineapple     64
Yes      ABC          Orange     34
         XYZ           Apple     44

It groups the df into groups based on their values in the In_Stock and Supplier columns and returns a DataFrameGroupBy object.

We use the first() method to get the first element of each group. It returns a DataFrame formed by the combination of the first elements of the following four groups:

  • Group with values of In_Stock column No and Supplier column ABC.
  • Group with values of In_Stock column No and Supplier column XYZ.
  • Group with values of In_Stock column Yes and Supplier column ABC.
  • Group with values of In_Stock column Yes and Supplier column XYZ.
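The four groups listed above can be confirmed by counting the rows in each one. The following sketch does this with size() on the same fruit and supplier data.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
grouped_df = df.groupby(["In_Stock", "Supplier"])

# One group per distinct (In_Stock, Supplier) combination present in df.
print(grouped_df.ngroups)

# Number of rows in each group, keyed by (In_Stock, Supplier).
group_sizes = grouped_df.size()
print(group_sizes)
```

Only combinations that actually occur in the data produce a group here, since neither grouping column is categorical.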

When we pass multiple labels to the groupby() function, the DataFrame returned by the methods of the GroupBy object has a MultiIndex.

print(grouped_df.first().index)

Output:

MultiIndex(levels=[['No', 'Yes'], ['ABC', 'XYZ']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['In_Stock', 'Supplier'])
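A MultiIndex result can be queried with a tuple of labels, or flattened back into ordinary columns. The sketch below shows both on the same fruit and supplier data, as one possible way to work with such a result.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
first_rows = df.groupby(["In_Stock", "Supplier"]).first()

# Select one entry by passing a tuple with one label per index level.
row = first_rows.loc[("Yes", "XYZ")]
print(row)

# reset_index() moves both index levels back into regular columns.
flat = first_rows.reset_index()
print(flat)
```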

Example Codes: Set as_index=False in pandas.DataFrame.groupby()

The as_index parameter of the DataFrame.groupby() method is True by default. In that case, the group labels become the index of the DataFrame returned by GroupBy methods like first().

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# as_index=True (the default): group labels become the index.
grouped_df = df.groupby("In_Stock", as_index=True)

first_group = grouped_df.first()
print(first_group)
print(first_group.index)

print("---------")

# as_index=False: group labels stay in a regular column.
grouped_df = df.groupby("In_Stock", as_index=False)

first_group = grouped_df.first()
print(first_group)
print(first_group.index)

Output:

            Name  Price
In_Stock               
No         Mango     24
Yes       Orange     34
Index(['No', 'Yes'], dtype='object', name='In_Stock')
---------
  In_Stock    Name  Price
0       No   Mango     24
1      Yes  Orange     34
Int64Index([0, 1], dtype='int64')

As you can see, with as_index=True (the default) the group labels form the index of the generated DataFrame.

When we set as_index=False, the group labels stay in a regular column and the index becomes an automatically generated integer index.
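For a result like this, setting as_index=False gives the same table as grouping with the default and then calling reset_index(). The sketch below compares the two on the same fruit data.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# Group labels kept as a regular column from the start.
flat_direct = df.groupby("In_Stock", as_index=False).first()

# Group labels used as the index first, then moved back into a column.
flat_reset = df.groupby("In_Stock").first().reset_index()

print(flat_direct)
print(flat_reset)
```

Which form to use is mostly a matter of convenience: as_index=False avoids the extra call when the group labels are needed as a column.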

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
