Pandas DataFrame.median() Function

Jinku Hu Mar-30, 2021 Jun-01, 2020
  1. Syntax of pandas.DataFrame.median():
  2. Example Codes: DataFrame.median() Method to Find Median Along Column Axis
  3. Example Codes: DataFrame.median() Method to Find Median Along Row Axis
  4. Example Codes: DataFrame.median() Method to Find Median Ignoring NaN Values

The Pandas DataFrame.median() function calculates the median of the elements of a DataFrame along the specified axis.

The median is not the mean; it is the middle value of a sorted list of numbers. When the list has an even number of values, the median is the average of the two middle values.
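As a small illustration of the difference between the median and the mean, the sketch below uses two made-up Series (the names odd and even are only for this example). The outlier 100 pulls the mean upward but leaves the median unchanged.

```python
import pandas as pd

# Hypothetical sample data, chosen so that mean and median differ.
odd = pd.Series([1, 2, 3, 4, 100])   # odd number of values
even = pd.Series([1, 2, 3, 4])       # even number of values

odd_median = odd.median()    # middle value of the sorted data: 3.0
odd_mean = odd.mean()        # sum 110 divided by 5 values: 22.0
even_median = even.median()  # average of the two middle values 2 and 3: 2.5

print(odd_median, odd_mean, even_median)
```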

Syntax of pandas.DataFrame.median():

DataFrame.median(axis=None,
                 skipna=None,
                 level=None,
                 numeric_only=None,
                 **kwargs)

Parameters

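axis Axis to compute the median along. axis=0 returns the median of each column; axis=1 returns the median of each row. If axis is not given, the median of each column is returned
skipna Boolean. Exclude NaN values (skipna=True, the default behavior) or include NaN values (skipna=False)
level If the axis is a MultiIndex, compute the median along a particular level. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0
numeric_only Boolean. For numeric_only=True, include only float, int, and boolean columns
**kwargs Additional keyword arguments to the function.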
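The numeric_only parameter matters once a DataFrame mixes numeric and non-numeric columns. The following is a minimal sketch with a made-up name column (not part of the examples in this article), showing that only the numeric columns appear in the result.

```python
import pandas as pd

# Hypothetical frame: 'name' holds strings, 'X' and 'Y' hold numbers.
df = pd.DataFrame({'name': ['a', 'b', 'c'],
                   'X': [1, 2, 7],
                   'Y': [4.0, 3.0, 8.0]})

# Restrict the calculation to the numeric columns; 'name' is left out.
medians = df.median(numeric_only=True)
print(medians)
```

Passing numeric_only=True explicitly is a cautious choice here, since how non-numeric columns are handled when the argument is omitted has differed between pandas versions.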

Return

If level is not specified, it returns a Series holding the median of the values along the requested axis; otherwise, it returns a DataFrame of median values.
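For pandas versions that do not accept the level argument, grouping by an index level is one way to get a comparable per-level result. The sketch below assumes a made-up two-level index with the level names group and item; it contrasts the Series returned by a plain median() with the DataFrame returned by the grouped call.

```python
import pandas as pd

# Hypothetical MultiIndex; the level names 'group' and 'item' are assumptions.
index = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('b', 1), ('b', 2)],
    names=['group', 'item'],
)
df = pd.DataFrame({'X': [1, 3, 10, 20],
                   'Y': [4, 8, 2, 6]}, index=index)

# Plain median: a Series with one value per column.
overall = df.median()

# Median within each value of the 'group' level: a DataFrame
# with one row per group and one column per original column.
per_group = df.groupby(level='group').median()

print(overall)
print(per_group)
```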

Example Codes: DataFrame.median() Method to Find Median Along Column Axis

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 7, 5, 10],
                   'Y': [4, 3, 8, 2, 9]})
print("DataFrame:")
print(df)

medians = df.median()
print("medians of Each Column:")
print(medians)

Output:

DataFrame:
    X  Y
0   1  4
1   2  3
2   7  8
3   5  2
4  10  9
medians of Each Column:
X    5.0
Y    4.0
dtype: float64

It calculates the median of columns X and Y and returns a Series object with the median of each column.

To find the median of a particular column of a DataFrame, we select that column and call the median() function on it.

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 7, 5, 10],
                   'Y': [4, 3, 8, 2, 9]})
print("DataFrame:")
print(df)

median = df["X"].median()
print("Median of Column X:")
print(median)

Output:

DataFrame:
    X  Y
0   1  4
1   2  3
2   7  8
3   5  2
4  10  9
Median of Column X:
5.0

It gives only the median of the values in column X, as a single number rather than a Series.
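The same idea extends to several columns at once: selecting with a list of column names keeps a DataFrame, so median() returns a Series again. This is a short sketch; the extra Z column and the name subset_medians are assumptions made for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 7, 5, 10],
                   'Y': [4, 3, 8, 2, 9],
                   'Z': [2, 7, 6, 10, 5]})

# A list of labels selects a sub-DataFrame, so the result is a Series
# with one median per selected column.
subset_medians = df[['X', 'Z']].median()
print(subset_medians)
```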

Example Codes: DataFrame.median() Method to Find Median Along Row Axis

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 7, 5, 10],
                   'Y': [4, 3, 8, 2, 9],
                   'Z': [2, 7, 6, 10, 5]})
print("DataFrame:")
print(df)

medians = df.median(axis=1)
print("medians of Each Row:")
print(medians)

Output:

DataFrame:
    X  Y   Z
0   1  4   2
1   2  3   7
2   7  8   6
3   5  2  10
4  10  9   5
medians of Each Row:
0    2.0
1    3.0
2    7.0
3    5.0
4    9.0
dtype: float64

It calculates the median of every row and returns a Series object with the median of each row.
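The axis can also be given by name instead of by number. The sketch below, using the same data, suggests that axis="columns" behaves like axis=1; the variable names by_number and by_name are only for this example.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 7, 5, 10],
                   'Y': [4, 3, 8, 2, 9],
                   'Z': [2, 7, 6, 10, 5]})

by_number = df.median(axis=1)          # one median per row
by_name = df.median(axis='columns')    # same reduction, axis given by name

# Both calls should produce identical Series.
print(by_number.equals(by_name))
```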

To find the median of a particular row of a DataFrame, we select that row and call the median() function on it.

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 7, 5, 10],
                   'Y': [4, 3, 8, 2, 9],
                   'Z': [2, 7, 6, 10, 5]})

print("DataFrame:")
print(df)

median = df.iloc[[0]].median(axis=1)
print("median of 1st Row:")
print(median)

Output:

DataFrame:
    X  Y   Z
0   1  4   2
1   2  3   7
2   7  8   6
3   5  2  10
4  10  9   5
median of 1st Row:
0    2.0
dtype: float64

It gives only the median of the values in the first row of the DataFrame.

We use the iloc indexer to select rows by integer position.
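The shape of the result depends on how the row is selected. As a minimal sketch with the same data: a single integer passed to iloc yields a Series, whose median is a scalar, while a list of integers yields a one-row DataFrame, whose row-wise median is a one-element Series. The variable names are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 7, 5, 10],
                   'Y': [4, 3, 8, 2, 9],
                   'Z': [2, 7, 6, 10, 5]})

# Single integer: the row comes back as a Series indexed by X, Y, Z.
row_as_series = df.iloc[0]
scalar_median = row_as_series.median()        # a single number

# List of integers: the row comes back as a one-row DataFrame.
row_as_frame = df.iloc[[0]]
series_median = row_as_frame.median(axis=1)   # Series with one element

print(scalar_median)
print(series_median)
```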

Example Codes: DataFrame.median() Method to Find Median Ignoring NaN Values

With skipna=True, which is the default behavior, median() ignores NaN values when it calculates the median along the specified axis.

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 7, None, 10, 8],
                   'Y': [None, 3, 8, 2, 9, 6],
                   'Z': [2, 7, 6, 10, None, 5]})

print("DataFrame:")
print(df)

median = df.median(skipna=True)
print("Medians of Each Column:")
print(median)

Output:

DataFrame:
      X    Y     Z
0   1.0  NaN   2.0
1   2.0  3.0   7.0
2   7.0  8.0   6.0
3   NaN  2.0  10.0
4  10.0  9.0   NaN
5   8.0  6.0   5.0
Medians of Each Column:
X    7.0
Y    6.0
Z    6.0
dtype: float64

With skipna=True, the NaN values in the DataFrame are skipped, so the median of each column is calculated from its remaining values. With skipna=False, the NaN values are included in the calculation.

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 7, None, 10],
                   'Y': [5, 3, 8, 2, 9],
                   'Z': [2, 7, 6, 10, 4]})

print("DataFrame:")
print(df)

median = df.median(skipna=False)
print("Medians of Each Column:")
print(median)

Output:

DataFrame:
      X  Y   Z
0   1.0  5   2
1   2.0  3   7
2   7.0  8   6
3   NaN  2  10
4  10.0  9   4
Medians of Each Column:
X    NaN
Y    5.0
Z    6.0
dtype: float64

Here, the median of column X is NaN because column X contains a NaN value.
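The skipna parameter works the same way along the row axis. The sketch below reuses the six-row data with missing values from the earlier example in this section; the names row_medians and strict_medians are assumptions for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 7, None, 10, 8],
                   'Y': [None, 3, 8, 2, 9, 6],
                   'Z': [2, 7, 6, 10, None, 5]})

# NaN values are skipped: rows with a NaN use their two remaining values.
row_medians = df.median(axis=1, skipna=True)

# NaN values are kept: every row that contains a NaN gets a NaN median.
strict_medians = df.median(axis=1, skipna=False)

print(row_medians)
print(strict_medians)
```

Rows 0, 3, and 4 each contain one NaN, so they are the rows where the two results differ.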

Author: Jinku Hu