NumPy Confidence Interval

Vaibhhav Khetarpal Jun 30, 2022
  1. Confidence Interval
  2. Use T-Distribution to Calculate Confidence Intervals in Python
  3. Use Normal Distribution to Calculate Confidence Intervals in Python

A Confidence Interval for a mean is a basic statistical tool that is widely used in Data Analytics. Python, one of the most popular programming languages in that field, can compute Confidence Intervals on arrays of data with the help of NumPy and SciPy.

This tutorial explains what a Confidence Interval is and demonstrates two approaches to calculating one in Python.

Confidence Interval

A Confidence Interval for a mean is a range of values, computed from sample data, that is expected to contain the true population mean with a stated level of confidence (for example, 90% or 95%).

The formula for calculating the Confidence Interval can be seen below.

Confidence Interval =  x̄  +/-  t*(s/√n)

The parameters of this formula are explained below.

  1. x̄ - The mean of the sample data.
  2. t - The corresponding t-value for the confidence level.
  3. s - Standard deviation for the sample data.
  4. n - The size of the sample data.
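
The formula can also be applied directly. The following is a minimal sketch that uses NumPy for the sample statistics and `scipy.stats.t.ppf()` for the t-value. The sample values and variable names are made up for illustration, and a two-sided interval is assumed.

```python
import numpy as np
import scipy.stats as st

# hypothetical sample, used only for illustration
sample = np.array([12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 10.0, 13.0])
confidence = 0.90

n = sample.size
x_bar = sample.mean()       # sample mean
s = sample.std(ddof=1)      # sample standard deviation (n - 1 in the denominator)

# two-sided t-value: leaves (1 - confidence) / 2 in each tail
t_value = st.t.ppf((1 + confidence) / 2, df=n - 1)

margin = t_value * s / np.sqrt(n)
ci_manual = (x_bar - margin, x_bar + margin)
print(ci_manual)
```

The result should agree with what `scipy.stats.t.interval()` returns for the same sample and confidence level.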

Let’s now move on to the approaches that can be used to calculate Confidence Intervals in Python. There are two main methods, and both rely on functions from the SciPy library.

SciPy, short for Scientific Python, provides functions for technical and scientific computing. Its scipy.stats submodule offers a wide variety of functions that deal with statistics in Python.

Use T-Distribution to Calculate Confidence Intervals in Python

The scipy.stats submodule provides a t.interval() function that can be used to calculate Confidence Intervals with the t-distribution approach.

The t-distribution approach is typically used for smaller datasets, usually those with fewer than 30 elements (n<30), where the population standard deviation has to be estimated from the sample.

The syntax and the parameter description for the t.interval() function are described below.

scipy.stats.t.interval(confidence, df, loc, scale)
  1. confidence - The confidence level, that is, the probability that a value drawn from the distribution falls inside the returned range. This parameter was named alpha before it was renamed in SciPy 1.9.
  2. df - The degrees of freedom, which is the sample size minus one (n-1) for a Confidence Interval of a mean.
  3. loc - The location parameter; here, the sample mean.
  4. scale - The scale parameter; here, the standard error of the mean.
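
For a Confidence Interval of a mean, `loc` is set to the sample mean and `scale` to the standard error of the mean, s/√n. The sketch below uses a made-up sample to check that `scipy.stats.sem()` matches that expression; `sem()` uses `ddof=1` by default.

```python
import numpy as np
import scipy.stats as st

sample = np.array([4.0, 7.0, 6.0, 5.0, 8.0, 6.0])  # hypothetical values

# standard error of the mean, computed by hand: s / sqrt(n)
se_manual = sample.std(ddof=1) / np.sqrt(sample.size)
# the same quantity from SciPy (ddof=1 is its default)
se_scipy = st.sem(sample)

print(se_manual, se_scipy)
```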

The following code takes the goals scored by 20 footballers in one calendar year and calculates the 90% Confidence Interval for the mean with the t-distribution approach.

import numpy as np
import scipy.stats as st

# data of goals scored by 20 footballers in a calendar year
fb_data = [10, 11, 10, 14, 16, 24, 10, 6, 8, 10, 11, 27, 28, 21, 13, 10, 6, 7, 8, 10]
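# create 90% confidence interval; the confidence level is passed positionally
# because its keyword was renamed from alpha to confidence in newer SciPy
print(
    st.t.interval(
        0.90, df=len(fb_data) - 1, loc=np.mean(fb_data), scale=st.sem(fb_data)
    )
)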

The above code provides the following output.

(10.395704943723088, 15.60429505627691)
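
As a small extension of this example, the same call can be wrapped in a helper to compare several confidence levels. The function name `mean_ci` is hypothetical and not part of SciPy, and the confidence level is passed positionally because its keyword name differs between SciPy versions.

```python
import numpy as np
import scipy.stats as st


def mean_ci(data, confidence):
    """Two-sided t-based confidence interval for the mean of data."""
    data = np.asarray(data, dtype=float)
    return st.t.interval(
        confidence, df=data.size - 1, loc=data.mean(), scale=st.sem(data)
    )


fb_data = [10, 11, 10, 14, 16, 24, 10, 6, 8, 10, 11, 27, 28, 21, 13, 10, 6, 7, 8, 10]

for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(fb_data, level)
    print(f"{level:.0%}: ({low:.3f}, {high:.3f})")
```

A higher confidence level gives a wider interval around the same sample mean.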

Use Normal Distribution to Calculate Confidence Intervals in Python

The same scipy.stats submodule also provides a norm.interval() function that can be used to calculate Confidence Intervals with the normal distribution approach.

This approach is generally used when the dataset is comparatively larger; that is, when it has 30 or more elements (n≥30).
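
The threshold of roughly 30 is a rule of thumb rather than a hard limit. The sketch below compares the two-sided 90% critical values of the two distributions for a few sample sizes, chosen here only for illustration, to show how the t-value approaches the normal value as n grows.

```python
import scipy.stats as st

confidence = 0.90
upper_tail = (1 + confidence) / 2   # 0.95 for a two-sided 90% interval

z_value = st.norm.ppf(upper_tail)
print(f"normal: {z_value:.4f}")

for n in (5, 10, 30, 100, 1000):    # illustrative sample sizes
    t_value = st.t.ppf(upper_tail, df=n - 1)
    print(f"n={n:>4}: t={t_value:.4f}  difference={t_value - z_value:.4f}")
```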

The syntax and parameter description for the norm.interval() function are described below.

scipy.stats.norm.interval(confidence, loc, scale)
  1. confidence - The confidence level, that is, the probability that a value drawn from the distribution falls inside the returned range. This parameter was named alpha before it was renamed in SciPy 1.9.
  2. loc - The location parameter; here, the sample mean.
  3. scale - The scale parameter; here, the standard error of the mean.

The following code generates an example dataset of 80 random elements and calculates the 90% Confidence Interval for its mean with the normal distribution approach.

import numpy as np
import scipy.stats as st

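# 80 random integers from 15 to 19 (the upper bound 20 is excluded)
fb_data = np.random.randint(15, 20, 80)
# create 90% confidence interval; the confidence level is passed positionally
# because its keyword was renamed from alpha to confidence in newer SciPy
print(st.norm.interval(0.90, loc=np.mean(fb_data), scale=st.sem(fb_data)))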

The above code provides output similar to the following. The exact values differ on each run because the data is generated randomly.

(16.763325839308074, 17.286674160691923)
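
For repeatable results, the random data can be drawn from a seeded generator. The following is a minimal sketch, assuming NumPy's `default_rng()` generator with an arbitrarily chosen seed of 0; the variable names are illustrative.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)          # arbitrary seed, fixed for repeatability
data = rng.integers(15, 20, size=80)    # 80 integers from 15 to 19 inclusive

ci = st.norm.interval(0.90, loc=np.mean(data), scale=st.sem(data))
print(ci)
```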