Pandas DataFrame.describe() Function

Minahil Noor Jan 30, 2023
  1. Syntax of pandas.DataFrame.describe():
  2. Example Codes: DataFrame.describe() Method to Find the Statistics of a Data Frame
  3. Example Codes: DataFrame.describe() Method to Find the Statistics of Each Column
  4. Example Codes: DataFrame.describe() Method to Find the Statistics of Numeric Columns

The Python Pandas DataFrame.describe() function computes summary statistics, such as the count, mean, standard deviation, minimum, maximum, and percentiles, for the columns of a DataFrame.

Syntax of pandas.DataFrame.describe():

DataFrame.describe(
    percentiles=None, include=None, exclude=None, datetime_is_numeric=False
)

Parameters

percentiles: The percentiles to include in the output, given as a list-like of numbers between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
include: The data types to include in the output. It has three options.
    'all': every column of the input is included in the output.
    A list-like of data types: limits the result to the provided data types.
    None (default): the result includes only the numeric columns.
exclude: The data types to exclude from the output. It has two options.
    A list-like of data types: excludes the provided data types from the result.
    None (default): the result excludes nothing.
datetime_is_numeric: A boolean parameter that tells whether to treat datetime data types as numeric. It exists only in older pandas releases; it was removed in pandas 2.0, where datetime columns are always treated as numeric.
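
The percentiles parameter is not used in the examples of this article, so here is a minimal sketch of it. The DataFrame below reuses the numeric columns from the examples, and the choice of the 10th and 90th percentiles is only an illustration.

```python
import pandas as pd

# Numeric columns taken from the examples in this article.
df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 78, 95],
        "Obtained Marks": [90, 75, 82, 64, 45],
    }
)

# Ask for the 10th and 90th percentiles instead of the default quartiles.
summary = df.describe(percentiles=[0.1, 0.9])
print(summary)
```

The requested percentiles appear as rows labelled 10% and 90%, and the default 25% and 75% rows are no longer present. Whether a 50% row is added automatically may depend on the pandas version.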

Return

It returns a Series or DataFrame that holds the summary statistics of the Series or DataFrame it is called on.
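
As a small sketch of what this means in practice, the result is an ordinary pandas object, so individual statistics can be looked up by label. The variable names below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 78, 95],
        "Obtained Marks": [90, 75, 82, 64, 45],
    }
)

# Calling describe() on a DataFrame gives a DataFrame of statistics.
frame_summary = df.describe()

# Calling describe() on a single column (a Series) gives a Series.
series_summary = df["Attendance"].describe()

# Statistics are row labels, so a single value can be read with .loc.
mean_attendance = frame_summary.loc["mean", "Attendance"]
print(type(frame_summary).__name__, type(series_summary).__name__)
print(mean_attendance)
```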

Example Codes: DataFrame.describe() Method to Find the Statistics of a Data Frame

import pandas as pd

dataframe = pd.DataFrame({'Attendance': {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
                          'Name': {0: 'Olivia', 1: 'John', 2: 'Laura', 3: 'Ben', 4: 'Kevin'},
                          'Obtained Marks': {0: 90, 1: 75, 2: 82, 3: 64, 4: 45}})

print("The Original Data frame is: \n")
print(dataframe)

dataframe1 = dataframe.describe()
print("Statistics are: \n")
print(dataframe1)

Output:

The Original Data frame is: 

   Attendance    Name  Obtained Marks
0          60  Olivia              90
1         100    John              75
2          80   Laura              82
3          78     Ben              64
4          95   Kevin              45
Statistics are: 

       Attendance  Obtained Marks
count    5.000000        5.000000
mean    82.600000       71.200000
std     15.773395       17.484279
min     60.000000       45.000000
25%     78.000000       64.000000
50%     80.000000       75.000000
75%     95.000000       82.000000
max    100.000000       90.000000

The function returned summary statistics for the numeric columns only. We passed no arguments, so it used all the default values, and the non-numeric Name column was left out because include defaults to None.

Example Codes: DataFrame.describe() Method to Find the Statistics of Each Column

We will find the statistics of all columns using the include parameter.

import pandas as pd
dataframe = pd.DataFrame({'Attendance': {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
                          'Name': {0: 'Olivia', 1: 'John', 2: 'Laura', 3: 'Ben', 4: 'Kevin'},
                          'Obtained Marks': {0: 90, 1: 75, 2: 82, 3: 64, 4: 45}})
print("The Original Data frame is: \n")
print(dataframe)

dataframe1 = dataframe.describe(include='all')
print("Statistics are: \n")
print(dataframe1)

Output:

The Original Data frame is: 

   Attendance    Name  Obtained Marks
0          60  Olivia              90
1         100    John              75
2          80   Laura              82
3          78     Ben              64
4          95   Kevin              45
Statistics are: 

        Attendance   Name  Obtained Marks
count     5.000000      5        5.000000
unique         NaN      5             NaN
top            NaN  Kevin             NaN
freq           NaN      1             NaN
mean     82.600000    NaN       71.200000
std      15.773395    NaN       17.484279
min      60.000000    NaN       45.000000
25%      78.000000    NaN       64.000000
50%      80.000000    NaN       75.000000
75%      95.000000    NaN       82.000000
max     100.000000    NaN       90.000000

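The function returned summary statistics for all columns of the DataFrame. For the non-numeric Name column it reports count, unique, top, and freq instead of the numeric statistics, and any statistic that does not apply to a column's type is shown as NaN.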
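
The include parameter also accepts a list of data types. The sketch below restricts the summary to object columns. It makes two assumptions that are not part of the example above: the Name column is created with dtype=object explicitly, because the default string dtype can differ between pandas versions, and one name is repeated so that top and freq are meaningful.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 78, 95],
        # Assumption: force the object dtype and repeat "Olivia" on purpose.
        "Name": pd.Series(["Olivia", "John", "Olivia", "Ben", "Kevin"], dtype=object),
    }
)

# Summarise only the columns whose dtype is object.
text_summary = df.describe(include=[object])
print(text_summary)
```

Only the Name column is summarised, with the rows count, unique, top, and freq.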

Example Codes: DataFrame.describe() Method to Find the Statistics of Numeric Columns

Now we will find the statistics of numeric columns only using the exclude parameter.

import pandas as pd

dataframe = pd.DataFrame({'Attendance': {0: 60, 1: 100, 2: 80, 3: 78, 4: 95},
                          'Name': {0: 'Olivia', 1: 'John', 2: 'Laura', 3: 'Ben', 4: 'Kevin'},
                          'Obtained Marks': {0: 90, 1: 75, 2: 82, 3: 64, 4: 45}})
print("The Original Data frame is: \n")
print(dataframe)

dataframe1 = dataframe.describe(exclude=[object])
print("Statistics are: \n")
print(dataframe1)

Output:

The Original Data frame is: 

   Attendance    Name  Obtained Marks
0          60  Olivia              90
1         100    John              75
2          80   Laura              82
3          78     Ben              64
4          95   Kevin              45
Statistics are: 

       Attendance  Obtained Marks
count    5.000000        5.000000
mean    82.600000       71.200000
std     15.773395       17.484279
min     60.000000       45.000000
25%     78.000000       64.000000
50%     80.000000       75.000000
75%     95.000000       82.000000
max    100.000000       90.000000

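We excluded the object data type, so the Name column, which is stored as object here, does not appear in the result and only the numeric columns are summarised.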
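
The exclude parameter can drop the numeric columns in the same way. The following is a minimal sketch, assuming the "number" dtype selector that pandas accepts for numeric types; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 78, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [90, 75, 82, 64, 45],
    }
)

# Exclude every numeric dtype, leaving only the non-numeric column.
non_numeric = df.describe(exclude=["number"])
print(non_numeric)
```

Here only the Name column remains, and the numeric statistics such as mean and std are absent from the result.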
