How to get average of a column of a pandas DataFrame

  1. df.mean() method to calculate the average of a Pandas DataFrame column
  2. df.describe() method

When working with large data sets, we often need the average (mean) of a column. For example, you may have a list of student grades and want to know the average grade, or the average of some other column. The following methods achieve this.

  1. df.mean()
  2. df.describe()

We will use the following DataFrame throughout the next sections.

import pandas as pd
data = {'name': ['Oliver', 'Harry', 'George', 'Noah'],
        'percentage': [90, 99, 50, 65],
        'grade': [88, 76, 95, 79]}
df = pd.DataFrame(data)

Below is the example DataFrame.

     name  percentage  grade
0  Oliver          90     88
1   Harry          99     76
2  George          50     95
3    Noah          65     79

df.mean() method to calculate the average of a Pandas DataFrame column

Let’s take the mean of the grade column in our dataset.

import pandas as pd
data = {'name': ['Oliver', 'Harry', 'George', 'Noah'],
        'percentage': [90, 99, 50, 65],
        'grade': [88, 76, 95, 79]}
df = pd.DataFrame(data)
mean_df = df['grade'].mean()
print(mean_df)

The following is the output.

84.5

Let’s take another example and apply the mean() method to the entire DataFrame.

import pandas as pd
data = {'name': ['Oliver', 'Harry', 'George', 'Noah'],
        'percentage': [90, 99, 50, 65],
        'grade': [88, 76, 95, 79]}
df = pd.DataFrame(data)
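# numeric_only=True skips the non-numeric 'name' column
mean_df = df.mean(numeric_only=True)
print(mean_df)

We don’t specify a column name in the above example. Because the name column contains strings, we pass numeric_only=True so that mean() averages only the numeric columns. Since pandas 2.0, calling df.mean() without this argument on a DataFrame that has string columns raises a TypeError; older versions skipped such columns.

The following is the output.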

percentage    76.0
grade         84.5
dtype: float64
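
As an alternative, the columns to average can be chosen explicitly before calling mean(). The sketch below is only an illustration: the variable names numeric_cols, col_means, and auto_means are made up for this example, and select_dtypes(include='number') is used as one possible way to pick numeric columns without naming them.

```python
import pandas as pd

data = {'name': ['Oliver', 'Harry', 'George', 'Noah'],
        'percentage': [90, 99, 50, 65],
        'grade': [88, 76, 95, 79]}
df = pd.DataFrame(data)

# Name the columns to average by hand (illustrative choice).
numeric_cols = ['percentage', 'grade']
col_means = df[numeric_cols].mean()
print(col_means)

# Or let pandas pick every numeric column by dtype.
auto_means = df.select_dtypes(include='number').mean()
print(auto_means)
```

Both calls return a Series indexed by column name, so a single average can be read with, for example, col_means['grade'].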

df.describe() method

This method produces summary statistics for the numeric columns of the dataset, including the mean. Let’s take a look at how to use it.

import pandas as pd
data = {'name': ['Oliver', 'Harry', 'George', 'Noah'],
        'percentage': [90, 99, 50, 65],
        'grade': [88, 76, 95, 79]}
df = pd.DataFrame(data)
print(df.describe())

Output:

       percentage      grade
count    4.000000   4.000000
mean    76.000000  84.500000
std     22.524061   8.660254
min     50.000000  76.000000
25%     61.250000  78.250000
50%     77.500000  83.500000
75%     92.250000  89.750000
max     99.000000  95.000000

The result of the df.describe() method is a DataFrame, so you can get the average of percentage and grade by referring to the column name and then the row name.

df.describe()['grade']['mean']
df.describe()['percentage']['mean']
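
The same lookup can also be written with .loc, which takes the row label and the column label in a single step. This is a minimal sketch; the names summary, grade_mean, and percentage_mean are chosen here for illustration only.

```python
import pandas as pd

data = {'name': ['Oliver', 'Harry', 'George', 'Noah'],
        'percentage': [90, 99, 50, 65],
        'grade': [88, 76, 95, 79]}
df = pd.DataFrame(data)

# Compute the summary once and reuse it for several lookups.
summary = df.describe()

# .loc[row_label, column_label] reads one cell of the summary table.
grade_mean = summary.loc['mean', 'grade']
percentage_mean = summary.loc['mean', 'percentage']
print(grade_mean, percentage_mean)
```

Storing the summary in a variable avoids recomputing describe() for each value that is read from it.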

df.describe() also works on a single column. Let’s apply it to the grade column.

import pandas as pd
data = {'name': ['Oliver', 'Harry', 'George', 'Noah'],
        'percentage': [90, 99, 50, 65],
        'grade': [88, 76, 95, 79]}
df = pd.DataFrame(data)
print(df['grade'].describe())

The following is the output.

count     4.000000
mean     84.500000
std       8.660254
min      76.000000
25%      78.250000
50%      83.500000
75%      89.750000
max      95.000000
Name: grade, dtype: float64

The result is a Series when a single column is selected, so we can get the average value by indexing it with mean directly.

df['grade'].describe()['mean']
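
To check that the two approaches agree, the short sketch below computes the grade average both ways and compares the results. The names direct and via_describe are illustrative only.

```python
import pandas as pd

data = {'name': ['Oliver', 'Harry', 'George', 'Noah'],
        'percentage': [90, 99, 50, 65],
        'grade': [88, 76, 95, 79]}
df = pd.DataFrame(data)

# Average computed directly on the column.
direct = df['grade'].mean()

# Average read from the describe() summary of the same column.
via_describe = df['grade'].describe()['mean']

print(direct, via_describe)
```

When only the average is needed, mean() is the more direct choice, because describe() also computes the count, standard deviation, and quantiles.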