Pandas DataFrame.groupby() Function
pandas.DataFrame.groupby() splits the DataFrame into groups based on the given criteria. Each group can then be inspected, aggregated, or transformed separately, which makes the groupby() method convenient for summarizing large datasets.
Syntax of pandas.DataFrame.groupby():
DataFrame.groupby(
    by=None,
    axis=0,
    level=None,
    as_index=True,
    sort=True,
    group_keys=True,
    squeeze: bool = False,
    observed: bool = False,
)
Parameters
by | Mapping, function, string, label, or iterable of labels that determines the groups
axis | Groups along the rows (axis=0) or the columns (axis=1)
level | Integer or level name (or a sequence of them). Groups by a particular index level or levels
as_index | Boolean. If True, the group labels become the index of the returned object
sort | Boolean. If True, sorts the group keys
group_keys | Boolean. If True, adds the group keys to the index when calling apply() to identify the pieces
squeeze | Boolean. Reduces the dimensionality of the return type when possible (removed in pandas 2.0)
observed | Boolean. Applies only if any of the groupers is Categorical. If True, shows only the observed values of the categorical groupers
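As a small illustration of the sort parameter, the sketch below groups a toy DataFrame with sort=True and with sort=False and compares the order of the resulting group keys. The column names key and value are made up for this example. With sort=False, the keys should appear in order of first occurrence rather than in sorted order.

```python
import pandas as pd

# Hypothetical toy data; "key" and "value" are illustrative column names
df = pd.DataFrame({"key": ["b", "a", "b", "c"], "value": [1, 2, 3, 4]})

# Default behaviour: the group keys are sorted
sorted_keys = list(df.groupby("key", sort=True)["value"].sum().index)

# sort=False keeps the keys in the order in which they first appear
unsorted_keys = list(df.groupby("key", sort=False)["value"].sum().index)

print(sorted_keys)
print(unsorted_keys)
```

Skipping the sort can save some work on large inputs when the order of the groups does not matter.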
Return
It returns a DataFrameGroupBy object containing the grouped information.
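One way to see what the returned object holds is to iterate over it. The sketch below assumes a shortened version of the fruit data used in this article and collects the size of each group. Iterating over a DataFrameGroupBy object yields pairs of a group key and the sub-DataFrame for that key.

```python
import pandas as pd

# Shortened, illustrative version of the fruit data
df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple"],
        "Price": [34, 24, 14, 44],
        "In_Stock": ["Yes", "No", "No", "Yes"],
    }
)

grouped = df.groupby("In_Stock")

# Each iteration gives (group key, DataFrame containing that group's rows)
group_sizes = {}
for key, sub_df in grouped:
    group_sizes[key] = len(sub_df)

print(group_sizes)
print(grouped.ngroups)  # number of distinct groups
```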
Example Codes: Group a DataFrame With pandas.DataFrame.groupby() Based on Values of a Single Column
import pandas as pd

# Each tuple is one row: (Name, Price, In_Stock)
fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# Split the rows into groups by the value in the In_Stock column
grouped_df = df.groupby("In_Stock")
print(grouped_df)
print(type(grouped_df))
Output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f73cc992d30>
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>
It groups the DataFrame into groups based on the values in the In_Stock column and returns a DataFrameGroupBy object.
To inspect the DataFrameGroupBy object returned by groupby(), we can use its first() method, which returns the first non-null value of each column within each group.
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# First entry of every group, indexed by the group label
print(grouped_df.first())
Output:
            Name  Price
In_Stock
No         Mango     24
Yes       Orange     34
It prints the DataFrame formed by the first elements of both groups split from df.
We can also print an entire group using the get_group() method.
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# All rows that belong to the group labelled "Yes"
print(grouped_df.get_group("Yes"))
Output:
     Name  Price In_Stock
0  Orange     34      Yes
3   Apple     44      Yes
5    Kiwi     84      Yes
It prints all the rows in df whose value in the In_Stock column is Yes. We first group the rows with different values of the In_Stock column into separate groups using the groupby() method and then access a particular group using the get_group() method.
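To check which labels are available before calling get_group(), one option is the groups attribute, which maps each group label to the row labels in that group. The sketch below reuses the fruit data from above. The label "Maybe" is a made-up value that does not occur in the data and is included only to show that get_group() raises a KeyError for an unknown label.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "banana", "Apple", "Pineapple", "Kiwi"],
        "Price": [34, 24, 14, 44, 64, 84],
        "In_Stock": ["Yes", "No", "No", "Yes", "No", "Yes"],
    }
)
grouped_df = df.groupby("In_Stock")

# .groups maps each group label to the row labels that belong to it
yes_rows = list(grouped_df.groups["Yes"])
print(yes_rows)

# Requesting a label that does not exist raises KeyError
# ("Maybe" is a hypothetical label used only for this illustration)
try:
    grouped_df.get_group("Maybe")
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
print(missing_label_raised)
```

The row labels listed for Yes match the index values shown in the get_group('Yes') output above.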
Example Codes: Group a DataFrame With pandas.DataFrame.groupby() Based on Multiple Conditions
import pandas as pd

# Each tuple is one row: (Name, Price, In_Stock, Supplier)
fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])

# Group by every combination of In_Stock and Supplier
grouped_df = df.groupby(["In_Stock", "Supplier"])
print(grouped_df.first())
Output:
                        Name  Price
In_Stock Supplier
No       ABC           Mango     24
         XYZ       Pineapple     64
Yes      ABC          Orange     34
         XYZ           Apple     44
It groups the df into groups based on their values in the In_Stock and Supplier columns and returns a DataFrameGroupBy object.
We use the first() method to get the first element of each group. It returns a DataFrame formed by the combination of the first elements of the following four groups:
- Group with In_Stock value No and Supplier value ABC.
- Group with In_Stock value No and Supplier value XYZ.
- Group with In_Stock value Yes and Supplier value ABC.
- Group with In_Stock value Yes and Supplier value XYZ.
When we pass multiple labels to the groupby() function, the DataFrame returned by the methods of the GroupBy object has a MultiIndex.
print(grouped_df.first().index)
Output:
MultiIndex(levels=[['No', 'Yes'], ['ABC', 'XYZ']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['In_Stock', 'Supplier'])
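The exact text printed for a MultiIndex varies between pandas versions, but rows can be selected from it in the same way. As a minimal sketch using the same fruit data, the code below selects one row with a tuple of (In_Stock, Supplier) labels and all rows under one outer label using loc. The variable names first_rows, name, and yes_rows are chosen for this illustration.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
first_rows = df.groupby(["In_Stock", "Supplier"]).first()

# A tuple of labels, one per index level, selects a single row
name = first_rows.loc[("No", "XYZ"), "Name"]
print(name)

# A single outer label selects every row under it;
# the result is indexed by the remaining level (Supplier)
yes_rows = first_rows.loc["Yes"]
print(yes_rows)
```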
Example Codes: Set as_index=False in pandas.DataFrame.groupby()
The as_index parameter of the DataFrame.groupby() method is True by default. With this setting, the group labels become the index of the DataFrame returned by GroupBy methods like first().
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# as_index=True (the default): group labels become the index
grouped_df = df.groupby("In_Stock", as_index=True)
first_group = grouped_df.first()
print(first_group)
print(first_group.index)
print("---------")

# as_index=False: group labels stay in a regular column
grouped_df = df.groupby("In_Stock", as_index=False)
first_group = grouped_df.first()
print(first_group)
print(first_group.index)
Output:
            Name  Price
In_Stock
No         Mango     24
Yes       Orange     34
Index(['No', 'Yes'], dtype='object', name='In_Stock')
---------
  In_Stock    Name  Price
0       No   Mango     24
1      Yes  Orange     34
Int64Index([0, 1], dtype='int64')
As you can see, the group labels form the index of the generated DataFrame because as_index is True by default.
When we set as_index=False, the group labels are kept as a regular column and the index becomes an automatically generated integer index.
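A related way to get the same flat layout is to keep the default as_index=True and call reset_index() on the result. The sketch below compares the two approaches on the fruit data. The variable names with_flag and via_reset are chosen for this illustration.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# Option 1: ask groupby() to keep the group labels as a column
with_flag = df.groupby("In_Stock", as_index=False).first()

# Option 2: group with the default index, then move it back into a column
via_reset = df.groupby("In_Stock").first().reset_index()

print(with_flag)
print(via_reset)
```

Both results should contain the columns In_Stock, Name, and Price with a plain integer index.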
Suraj Joshi is a backend software engineer at Matrice.ai.