Pandas Groupby Weighted Average

Pandas Groupby Weighted Average

  1. Calculate the Weighted Average of Pandas DataFrame
  2. Use Groupby Function to Group the Weighted Average in Pandas

In this article, we’ll learn how to calculate a weighted average of Pandas DataFrame. We also discuss how to group the weighted average of Pandas DataFrame.

Calculate the Weighted Average of Pandas DataFrame

After importing pandas as pd, we will create a simple DataFrame. Let us imagine you are a teacher and evaluating your students’ scores.

Overall, there are three different assessments: Quiz_1, Quiz_2 and Quiz_3.

Code Example:

import pandas as pd
import numpy as np

Student_DF = pd.DataFrame(
    {
        "Student_Score": [30, 60, 90],
        "Std_score_Weight": [1, 2, 3],
        "Student_Assessment": ["Quiz_1", "Quiz_2", "Quiz_3"],
    }
)
Student_DF

Output:

   Student_Score  Std_score_Weight Student_Assessment
0             30                 1             Quiz_1
1             60                 2             Quiz_2
2             90                 3             Quiz_3

Code Example:

Student_Average = Student_DF["Student_Score"].mean()
Student_Average

Output:

60.0

Those assessments should influence the overall score differently depending on their weight. So, we want to calculate the weighted average instead of the sample mean.

First, we multiply the Student_Score by the values, then we need to divide the result by the total sum of the weights, and this is also how we could implement it in Pandas.

As the Pandas library allows us to do vectorized calculations, we can multiply the Student_Score by the weight and calculate the sum. Then we need to divide the result by the sum of the weights.

Code Example:

Std_weighted_avg = (Student_DF["Student_Score"] * Student_DF["Std_score_Weight"]).sum() / Student_DF["Std_score_Weight"].sum()
Std_weighted_avg

For this DataFrame, the weighted average would be the following.

70.0

Use Groupby Function to Group the Weighted Average in Pandas

For the next example, we added another column for the different students. So, we have got students John & Jack here.

Student_DF = pd.DataFrame(
    {
        "Student_Score": [30, 50, 90, 40, 50, 20],
        "Std_score_Weight": [1, 2, 3, 1, 2, 3],
        "Two_Students": ["John", "John", "John", "Jack", "Jack", "Jack"],
        "Students_Assessment": ["Quiz_1", "Quiz_2", "Quiz_3", "Quiz_1", "Quiz_2", "Quiz_3"],
    }
)
Student_DF

Output:

   Student_Score  Std_score_Weight Two_Students Students_Assessment
0             30                 1         John              Quiz_1
1             50                 2         John              Quiz_2
2             90                 3         John              Quiz_3
3             40                 1         Jack              Quiz_1
4             50                 2         Jack              Quiz_2
5             20                 3         Jack              Quiz_3

Let’s assume we want to calculate the weighted average only for student Jack. In that case, we could filter our data, as we did here with the query() method.

The filtered DataFrame will look like this.

Filtered_by_Jack = Student_DF.query("Two_Students == 'Jack'")
Filtered_by_Jack

Output:

   Student_Score  Std_score_Weight Two_Students Students_Assessment
3             40                 1         Jack              Quiz_1
4             50                 2         Jack              Quiz_2
5             20                 3         Jack              Quiz_3

With that in place, we can apply the same calculation as before, but this time on the filtered DataFrame.

Std_weighted_avg = (Filtered_by_Jack["Student_Score"] * Filtered_by_Jack["Std_score_Weight"]).sum() / Filtered_by_Jack["Std_score_Weight"].sum()
Std_weighted_avg

Output:

33.333333333333336

However, this method can get tedious, especially when dealing with a larger dataset. For example, 100 students, and you want to calculate the weighted average for each student.

In our case, we have just added one more student to the dataset.

Student_DF = pd.DataFrame(
    {
        "Student_Score": [20, 40, 90, 80, 60, 10, 5, 60, 90],
        "Std_score_Weight": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "Three_Student": ["John", "John", "John", "Jack", "Jack", "Jack", "Harry", "Harry", "Harry"],
        "Students_Assessment": ["Quiz_1", "Quiz_2", "Quiz_3", "Quiz_1", "Quiz_2", "Quiz_3", "Quiz_1", "Quiz_2", "Quiz_3"],
    }
)
Student_DF

Output:

   Student_Score  Std_score_Weight Three_Student Students_Assessment
0             20                 1          John              Quiz_1
1             40                 2          John              Quiz_2
2             90                 3          John              Quiz_3
3             80                 1          Jack              Quiz_1
4             60                 2          Jack              Quiz_2
5             10                 3          Jack              Quiz_3
6              5                 1         Harry              Quiz_1
7             60                 2         Harry              Quiz_2
8             90                 3         Harry              Quiz_3

But, the following method will also work regardless of many students the dataset might contain. This time, we will write a small helper function called Groupby_weighted_avg().

The function takes three parameters: the values, weighted_value and the column name called Group_Cols, which we want to group the weighted average. The calculation method is very similar to before; the only difference is that we combine it with the groupby() method.

def Groupby_weighted_avg(values, weighted_value, Group_Cols):
    return (values * weighted_value).groupby(Group_Cols).sum() / weighted_value.groupby(Group_Cols).sum()

With our Groupby_weighted_avg() function in place, we can now pass in the values, which are the Student_Score in our example, and the Std_score_Weight. We want to group the result by Three_Student.

Groupby_weighted_avg(Student_DF["Student_Score"], Student_DF["Std_score_Weight"], Student_DF["Three_Student"])

After executing the above line, we will have a new DataFrame containing the weighted average for each student.

Three_Student
Harry    65.833333
Jack     38.333333
John     61.666667
dtype: float64

Complete Source Code:

# In[1]:
import pandas as pd
import numpy as np

Student_DF = pd.DataFrame(
    {
        "Student_Score": [30, 60, 90],
        "Std_score_Weight": [1, 2, 3],
        "Student_Assessment": ["Quiz_1", "Quiz_2", "Quiz_3"],
    }
)
Student_DF

# In[2]:
Student_Average = Student_DF["Student_Score"].mean()
Student_Average

# In[3]:
Std_weighted_avg = (Student_DF["Student_Score"] * Student_DF["Std_score_Weight"]).sum() / Student_DF["Std_score_Weight"].sum()
Std_weighted_avg

# In[4]:
# groupby
Student_DF = pd.DataFrame(
    {
        "Student_Score": [30, 50, 90, 40, 50, 20],
        "Std_score_Weight": [1, 2, 3, 1, 2, 3],
        "Two_Students": ["John", "John", "John", "Jack", "Jack", "Jack"],
        "Students_Assessment": ["Quiz_1", "Quiz_2", "Quiz_3", "Quiz_1", "Quiz_2", "Quiz_3"],
    }
)
Student_DF

# In[5]:
Filtered_by_Jack = Student_DF.query("Two_Students == 'Jack'")
Filtered_by_Jack

# In[6]:
Std_weighted_avg = (Filtered_by_Jack["Student_Score"] * Filtered_by_Jack["Std_score_Weight"]).sum() / Filtered_by_Jack["Std_score_Weight"].sum()
Std_weighted_avg

# In[7]:
Student_DF = pd.DataFrame(
    {
        "Student_Score": [20, 40, 90, 80, 60, 10, 5, 60, 90],
        "Std_score_Weight": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "Three_Student": ["John", "John", "John", "Jack", "Jack", "Jack", "Harry", "Harry", "Harry"],
        "Students_Assessment": ["Quiz_1", "Quiz_2", "Quiz_3", "Quiz_1", "Quiz_2", "Quiz_3", "Quiz_1", "Quiz_2", "Quiz_3"],
    }
)
Student_DF

# In[8]:
def Groupby_weighted_avg(values, weighted_value, Group_Cols):
    return (values * weighted_value).groupby(Group_Cols).sum() / weighted_value.groupby(Group_Cols).sum()

# In[9]:
Groupby_weighted_avg(Student_DF["Student_Score"], Student_DF["Std_score_Weight"], Student_DF["Three_Student"])
Salman Mehmood avatar Salman Mehmood avatar

Hello! I am Salman Bin Mehmood(Baum), a software developer and I help organizations, address complex problems. My expertise lies within back-end, data science and machine learning. I am a lifelong learner, currently working on metaverse, and enrolled in a course building an AI application with python. I love solving problems and developing bug-free software for people. I write content related to python and hot Technologies.

LinkedIn

Related Article - Pandas Groupby

  • Pandas Groupby Describe
  • Pandas Apply Transform With Groupby
  • Group by and Sort in Pandas
  • Groupby Index Columns in Pandas
  • Pandas Groupby Count