Pandas DataFrame.groupby() Function

Suraj Joshi Jan 30, 2023 Pandas Pandas DataFrame

Syntax of pandas.DataFrame.groupby():
Example Codes: Group a DataFrame With pandas.DataFrame.groupby() Based on Values of a Single Column
Example Codes: Group a DataFrame With pandas.DataFrame.groupby() Based on Multiple Columns
Example Codes: Set as_index=False in pandas.DataFrame.groupby()

pandas.DataFrame.groupby() splits a DataFrame into groups based on the given criteria. Each group can then be aggregated, transformed, or inspected on its own, which makes it convenient to summarize large datasets.
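As a quick illustration of why splitting into groups is useful, the following minimal sketch computes the average price per group. The sample data mirrors the fruit table used in the examples of this article, and the variable name `mean_price` is chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple", "Pineapple", "Kiwi"],
        "Price": [34, 24, 14, 44, 64, 84],
        "In_Stock": ["Yes", "No", "No", "Yes", "No", "Yes"],
    }
)

# Split by In_Stock, then average the Price column within each group
mean_price = df.groupby("In_Stock")["Price"].mean()
print(mean_price)
```

Here the `No` group averages 34.0 and the `Yes` group averages 54.0.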

Syntax of `pandas.DataFrame.groupby()`:

DataFrame.groupby(
    by=None,
    axis=0,
    level=None,
    as_index=True,
    sort=True,
    group_keys=True,
    squeeze: bool=False,
    observed: bool=False)

Parameters


`by`	Mapping, function, label, or list of labels that determines the groups
`axis`	Split along rows (`axis=0`, the default) or columns (`axis=1`). Deprecated since pandas 2.1
`level`	Integer, level name, or sequence of these. If the axis is a MultiIndex, group by a particular level or levels
`as_index`	Boolean. For aggregated output, it returns an object with group labels as the index
`sort`	Boolean. It sorts the group keys
`group_keys`	Boolean. When calling `apply()`, it adds group keys to the index to identify pieces
`squeeze`	Boolean. It reduces the dimensionality of the return type when possible. Deprecated in pandas 1.1 and removed in pandas 2.0
`observed`	Boolean. It applies only if any of the groupers are Categorical. If `True`, only observed values of categorical groupers are shown; if `False`, all values are shown
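To make the effect of the `sort` parameter concrete, here is a small sketch that compares the order of the group keys with and without sorting. The data and the variable names `sorted_keys` and `unsorted_keys` are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Price": [34, 24, 14, 44, 64, 84],
        "In_Stock": ["Yes", "No", "No", "Yes", "No", "Yes"],
    }
)

# sort=True (default): group keys appear in sorted order
sorted_keys = df.groupby("In_Stock", sort=True)["Price"].sum().index.tolist()

# sort=False: group keys appear in order of first appearance in the data
unsorted_keys = df.groupby("In_Stock", sort=False)["Price"].sum().index.tolist()

print(sorted_keys)    # ['No', 'Yes']
print(unsorted_keys)  # ['Yes', 'No']
```

Skipping the sort can be worthwhile when the original order of the keys carries meaning.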

Return

It returns a DataFrameGroupBy object containing the grouped information.

Example Codes: Group a DataFrame With `pandas.DataFrame.groupby()` Based on Values of a Single Column

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# Split df into groups keyed by the values in the In_Stock column
grouped_df = df.groupby("In_Stock")
print(grouped_df)
print(type(grouped_df))

Output:

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f73cc992d30>
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>

groupby() splits the DataFrame into groups based on the values in the In_Stock column and returns a DataFrameGroupBy object. Printing that object shows only its type and memory address, not the grouped rows.
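One way to look inside a DataFrameGroupBy object is to iterate over it or to ask for the size of each group. The sketch below assumes the same fruit data as above; `group_sizes` is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple", "Pineapple", "Kiwi"],
        "Price": [34, 24, 14, 44, 64, 84],
        "In_Stock": ["Yes", "No", "No", "Yes", "No", "Yes"],
    }
)

grouped_df = df.groupby("In_Stock")

# Iterating yields one (key, sub-DataFrame) pair per group
for key, group in grouped_df:
    print(key, group["Name"].tolist())

# size() reports how many rows fall into each group
group_sizes = grouped_df.size()
print(group_sizes)
```

Each group in this data holds three rows, one group for `No` and one for `Yes`.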

To inspect the groups held by the DataFrameGroupBy object, we can use its first() method, which returns the first non-null entry of each column within each group.

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# first() returns the first entry of each column within each group
print(grouped_df.first())

Output:

            Name  Price
In_Stock               
No         Mango     24
Yes       Orange     34

It prints a DataFrame with one row per group, formed by the first elements of the two groups split from df.
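One detail worth knowing is that first() picks the first non-null entry of each column, so its result is not always the first physical row of a group. The following sketch uses a small, made-up variant of the fruit data with a missing price, and compares first() with head(1), which keeps the actual first row of every group.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of the "No" group has a missing price
df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple"],
        "Price": [34.0, np.nan, 14.0, 44.0],
        "In_Stock": ["Yes", "No", "No", "Yes"],
    }
)

grouped = df.groupby("In_Stock")

# first(): first non-null value per column, so Price skips the NaN
first_values = grouped.first()
print(first_values)

# head(1): the actual first row of each group, NaN included
first_rows = grouped.head(1)
print(first_rows)
```

For the `No` group, first() combines the name `Mango` with the price `14.0` taken from the next row, while head(1) returns the `Mango` row with its missing price.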

We can also print an entire group using the get_group() method.

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# get_group() returns all rows that belong to the group with the given key
print(grouped_df.get_group("Yes"))

Output:

     Name  Price In_Stock
0  Orange     34      Yes
3   Apple     44      Yes
5    Kiwi     84      Yes

It prints all the rows in df whose value in the In_Stock column is Yes. We first split the rows into separate groups by their In_Stock value using the groupby() method, and then access a particular group using the get_group() method.
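To see which keys are available before calling get_group(), the `groups` attribute can be consulted. The sketch below assumes the same fruit data; the key `"Maybe"` is a deliberately missing key used only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple", "Pineapple", "Kiwi"],
        "Price": [34, 24, 14, 44, 64, 84],
        "In_Stock": ["Yes", "No", "No", "Yes", "No", "Yes"],
    }
)

grouped_df = df.groupby("In_Stock")

# groups maps each group key to the row labels that belong to it
for key, row_labels in grouped_df.groups.items():
    print(key, list(row_labels))

# Check that a key exists first; get_group() raises KeyError for unknown keys
key = "Maybe"
if key in grouped_df.groups:
    print(grouped_df.get_group(key))
else:
    print(f"No group named {key!r}")
```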

Example Codes: Group a DataFrame With `pandas.DataFrame.groupby()` Based on Multiple Columns

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

# A list of labels groups by each combination of values that occurs
grouped_df = df.groupby(["In_Stock", "Supplier"])

print(grouped_df.first())

Output:

                        Name  Price
In_Stock Supplier                  
No       ABC           Mango     24
         XYZ       Pineapple     64
Yes      ABC          Orange     34
         XYZ           Apple     44

groupby() splits df into groups based on the combinations of values in the In_Stock and Supplier columns and returns a DataFrameGroupBy object.

We use the first() method to get the first element of each group. It returns a DataFrame formed by the combination of the first elements of the following four groups:

Group with values of In_Stock column No and Supplier column ABC.
Group with values of In_Stock column No and Supplier column XYZ.
Group with values of In_Stock column Yes and Supplier column ABC.
Group with values of In_Stock column Yes and Supplier column XYZ.

When we pass multiple labels to the groupby() function, the DataFrame returned by GroupBy methods such as first() has a MultiIndex.

print(grouped_df.first().index)

Output:

MultiIndex(levels=[['No', 'Yes'], ['ABC', 'XYZ']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['In_Stock', 'Supplier'])
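A MultiIndex result can be queried by key tuple or flattened back into ordinary columns. The following sketch assumes the same fruit and supplier data as above; `first_rows` and `flat` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple", "Pineapple", "Kiwi"],
        "Price": [34, 24, 14, 44, 64, 84],
        "In_Stock": ["Yes", "No", "No", "Yes", "No", "Yes"],
        "Supplier": ["ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"],
    }
)

first_rows = df.groupby(["In_Stock", "Supplier"]).first()

# Select one group's row by passing the full key as a tuple
print(first_rows.loc[("Yes", "XYZ")])

# Select all rows under one value of the outer level
print(first_rows.loc["No"])

# reset_index() turns the index levels back into ordinary columns
flat = first_rows.reset_index()
print(flat)
```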

Example Codes: Set `as_index=False` in `pandas.DataFrame.groupby()`

The as_index parameter of the DataFrame.groupby() method is True by default. In that case, the group labels become the index of the DataFrame returned by GroupBy methods like first().

import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]

df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# as_index=True (the default): group labels become the index
grouped_df = df.groupby("In_Stock", as_index=True)

first_group = grouped_df.first()
print(first_group)
print(first_group.index)

print("---------")

# as_index=False: group labels stay in a regular column
grouped_df = df.groupby("In_Stock", as_index=False)

first_group = grouped_df.first()
print(first_group)
print(first_group.index)

Output:

            Name  Price
In_Stock               
No         Mango     24
Yes       Orange     34
Index(['No', 'Yes'], dtype='object', name='In_Stock')
---------
  In_Stock    Name  Price
0       No   Mango     24
1      Yes  Orange     34
Int64Index([0, 1], dtype='int64')

As you can see, the index of the generated DataFrame consists of the group labels when as_index=True, which is the default.

When we set as_index=False, the group labels stay in a regular column, and the index becomes an automatically generated integer index.
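For this example, setting as_index=False gives the same columns and values as grouping with the default and then calling reset_index(). The sketch below checks this on the fruit data; `flat` and `via_reset` are names assumed here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple", "Pineapple", "Kiwi"],
        "Price": [34, 24, 14, 44, 64, 84],
        "In_Stock": ["Yes", "No", "No", "Yes", "No", "Yes"],
    }
)

# Group labels kept as a regular column
flat = df.groupby("In_Stock", as_index=False).first()

# Group labels as index first, then moved back into a column
via_reset = df.groupby("In_Stock").first().reset_index()

print(flat.columns.tolist())
print(flat.values.tolist() == via_reset.values.tolist())
```

The as_index=False form saves a step when the result is going to be merged or exported as a flat table.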

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
