Python Pandas Percentile

Python Pandas Percentile

In Python Pandas, there are some inbuilt functions such as mean(), mode(), median(), and quantile() function.

Developers consider this function a crucial step in the data cleaning process.

Calculate Pandas Percentile in Python

A luxurious car company wants to develop its branches, for that it is trying to locate the best location thru getting the people’s income.

import pandas as pd
import numpy as np

df = pd.read_excel("C:\\Users\\banga\\Downloads\\Record.xlsx")
df

Output:

# A Dataframe with `Name` and `Monthly Income (in INR)` as columns present in the Record.xlsx file. These columns determine the income of people present in the respective locality.
index	Name		Monthly Income (in INR)
0	John		30000
1	Joel		35000
2	Albert		45000
3	Ali		50000
4	Jerin		70000
5	Robert		90000
6	Sampath		1000000

We have imported the dataset into a dataframe using pandas. Let us filter the data frame with our requirements.

The company needs the average income of that locality, affecting its growth.

df.describe()

Output:

index	Monthly Income (in INR)
count	7.000000
mean	188571.428571
std	358407.522774
min	30000.000000
25%	40000.000000
50%	50000.000000
75%	80000.000000
max	1000000.000000

We can infer that the mean is higher than the average income of the people. We use the percentile concept to deal with these types of real-time situations.

percentile = df["Monthly Income (in INR)"].quantile(0.99)
percentile

Output:

945399.9999999995
new_df = df[df["Monthly Income (in INR)"] <= percentile]
new_df

Output:

index	Name		Monthly Income (in INR)
0	John		30000
1	Joel		35000
2	Albert		45000
3	Ali		50000
4	Jerin		70000
5	Robert		90000

We have found one solution, and now we have to fill the null values in a dataframe, mostly average values.

df

Output:

index	Name		Monthly Income (in INR)
0	John		30000.0
1	Joel		35000.0
2	Albert		45000.0
3	Ali		NaN
4	Jerin		70000.0
5	Robert		90000.0
6	Sampath		1000000.0
avg = df["Monthly Income (in INR)"].mean()
df.fillna(avg)

Output:

index	Name		Monthly Income (in INR)
0	John		30000.000000
1	Joel		35000.000000
2	Albert		45000.000000
3	Ali		211666.666667
4	Jerin		70000.000000
5	Robert		90000.000000
6	Sampath		1000000.000000

We can see that the null value is filled with the average incomes, but it is not ideal. To overcome this problem, we use another method.

med = df["Monthly Income (in INR)"].median()
df.fillna(med)

Output:

index	Name		Monthly Income (in INR)
0	John		30000.0
1	Joel		35000.0
2	Albert		45000.0
3	Ali		57500.0
4	Jerin		70000.0
5	Robert		90000.0
6	Sampath		1000000.0

In this way, we can determine the proper value for the company based on its growth.

Related Article - Python Pandas