GroupBy Apply in Pandas

  1. Pandas GroupBy-Apply Behaviour
  2. Using the groupby() Function in Pandas
  3. Join groupby() and apply() Function in Pandas

This tutorial explores the GroupBy Apply concept in Pandas, a Python package for data analysis and manipulation.

Pandas is a good fit for tabular data, such as data from an SQL table, a spreadsheet, or columns of heterogeneous types. The data can be ordered or unordered, and time-series data is also supported.

Pandas GroupBy-Apply Behaviour

Let us try to understand how to group data and then apply a function that aggregates or calculates values for each group. GroupBy brings together the rows that share the same value in one or more columns.

This lets us keep track of the different categories in our data. Let us see this method in action.

We'll create a dummy data frame to work with. Here we create a data frame called dframe with a single column and a few rows.

import pandas as pd

our_data = {"mylabel": pd.Series(['P','R','E','E','T','S','A','P','R','E','T'])}
dframe = pd.DataFrame(our_data)

print(dframe)  # print the data frame

Output:

   mylabel
0         P
1         R
2         E
3         E
4         T
5         S
6         A
7         P
8         R
9         E
10        T

We now have a data frame whose mylabel column holds one letter per row, and each row has its own integer index.

These letters are the labels we will group by before applying aggregation functions.
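To see what grouping does before any aggregation is applied, the sketch below iterates over the GroupBy object. Each iteration yields a group key and the sub-frame of rows sharing that key. The helper names grouped and group_indices are only for illustration, and the dummy data is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Same dummy data as above: one column of single-letter labels.
dframe = pd.DataFrame({"mylabel": ['P', 'R', 'E', 'E', 'T', 'S', 'A', 'P', 'R', 'E', 'T']})

# groupby() returns a GroupBy object; no aggregation has happened yet.
grouped = dframe.groupby("mylabel")

# Iterating yields (key, sub-DataFrame) pairs; keys are sorted by default.
group_indices = {}
for key, group in grouped:
    # group.index holds the original row labels that share this letter.
    group_indices[key] = group.index.tolist()
    print(key, group_indices[key])
```

Each letter becomes one group, and the printed indices show which rows of dframe were collected into it.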

Using the groupby() Function in Pandas

The following code groups the rows by letter and counts how often each letter occurs.

import pandas as pd

our_data = {"mylabel": pd.Series(['P','R','E','E','T','S','A','P','R','E','T'])}
dframe = pd.DataFrame(our_data)

def gcou(values):
    # Number of entries in the group passed in by agg()
    return len(values)

# Group by the letter, select the same column, and count the entries per group
grpd_count = dframe.groupby("mylabel").mylabel.agg(gcou)

print(grpd_count)  # prints output

Output:

mylabel
A    1
E    3
P    2
R    2
S    1
T    2
Name: mylabel, dtype: int64

The result, grpd_count, is a Series (not a data frame) indexed by letter that holds the count of each letter. We can now apply further calculations to it.
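The custom gcou helper simply re-implements len. As a sketch of more idiomatic alternatives (standard pandas methods, not the approach used in this tutorial), GroupBy.size() counts rows per group and Series.value_counts() counts distinct values directly. Both should give the same numbers as grpd_count; the variable names are illustrative.

```python
import pandas as pd

dframe = pd.DataFrame({"mylabel": ['P', 'R', 'E', 'E', 'T', 'S', 'A', 'P', 'R', 'E', 'T']})

# size() counts the rows in each group; the result is indexed by the sorted labels.
by_size = dframe.groupby("mylabel").size()

# value_counts() counts each distinct value; sort_index() puts the labels in
# A-Z order so the result lines up with the groupby output.
by_value_counts = dframe["mylabel"].value_counts().sort_index()

print(by_size)
print(by_value_counts)
```

The counts match; only the name attached to the resulting Series may differ between the two approaches.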

Join groupby() and apply() Function in Pandas

Let us transform grpd_count by dividing each letter's count by the total number of entries. This gives the relative frequency, or weight, of each letter on a scale from 0 to 1.

Values closer to 1 indicate a larger weight, while values closer to 0 indicate a smaller one, meaning that the letter occurs less often than the others.

Code Sample:

import pandas as pd

our_data = {"mylabel": pd.Series(['P','R','E','E','T','S','A','P','R','E','T'])}
dframe = pd.DataFrame(our_data)

def perc(value, total):
    # Share of one count in the total
    return value / total

def gcou(values):
    # Number of entries in each group
    return len(values)

grpd_count = dframe.groupby("mylabel").mylabel.agg(gcou)
# Series.apply() passes the keyword argument total on to perc for every count
mydata = grpd_count.apply(perc, total=dframe.mylabel.count())

print(mydata)  # prints output

Output:

mylabel
A    0.090909
E    0.272727
P    0.181818
R    0.181818
S    0.090909
T    0.181818
Name: mylabel, dtype: float64
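Because grpd_count is a Series, the same shares can also be computed without apply() by dividing the whole Series at once, and value_counts(normalize=True) is another standard shortcut. A minimal sketch, assuming the same dummy data (the names shares and shares_vc are illustrative):

```python
import pandas as pd

dframe = pd.DataFrame({"mylabel": ['P', 'R', 'E', 'E', 'T', 'S', 'A', 'P', 'R', 'E', 'T']})

grpd_count = dframe.groupby("mylabel").size()

# Vectorized division: every count is divided by the total number of rows (11).
shares = grpd_count / grpd_count.sum()

# value_counts(normalize=True) returns relative frequencies directly.
shares_vc = dframe["mylabel"].value_counts(normalize=True).sort_index()

print(shares)
```

Vectorized arithmetic on the whole Series avoids calling a Python function once per element, which generally matters more as the number of groups grows.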

We have successfully performed an operation after grouping data in Pandas.

In summary, GroupBy lets us split data into groups based on the values of one or more columns and then apply a function or aggregation to each group.
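Grouping on more than one column works the same way: pass a list of column names. The sketch below uses a hypothetical table (the team, level and score columns and the score_range function are invented for illustration) and calls apply() on each group's score values.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "team":  ["x", "x", "x", "y", "y", "y"],
    "level": ["a", "a", "b", "a", "b", "b"],
    "score": [10, 14, 7, 3, 8, 20],
})

def score_range(scores):
    # Receives one group's scores as a Series and returns a single number.
    return scores.max() - scores.min()

# Group on two keys, select the score column, then apply the custom function
# to each (team, level) group. The result is a Series with a MultiIndex.
ranges = df.groupby(["team", "level"])["score"].apply(score_range)
print(ranges)
```

Selecting the score column before calling apply() means the function only ever sees the values it needs, rather than the whole sub-frame including the grouping columns.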

Preet Sanghavi

Preet writes his thoughts about programming in a simplified manner to help others learn better. With thorough research, his articles offer descriptive and easy to understand solutions.

