How to Calculate the Cumulative Distribution Function in Python

Najwa Riyaz Feb 02, 2024
  1. Use numpy.arange() to Calculate the CDF in Python
  2. Use numpy.linspace() to Calculate the CDF in Python

The cumulative distribution function, or CDF, is a function y=f(x), where y is the probability that a value selected at random from a distribution is less than or equal to x.
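As a small illustration of this definition, the CDF of a data sample at a point x can be estimated as the fraction of samples that are less than or equal to x. The helper name empirical_cdf_at and the sample values below are made up for this sketch.

```python
import numpy as np


def empirical_cdf_at(samples, x):
    """Return the fraction of samples that are less than or equal to x."""
    samples = np.asarray(samples)
    # The comparison yields a boolean array; its mean is the fraction of True values
    return np.mean(samples <= x)


samples = [1, 2, 2, 3, 5]  # hypothetical data
print(empirical_cdf_at(samples, 2))  # 3 of the 5 values are <= 2, so 0.6
print(empirical_cdf_at(samples, 0))  # no value is <= 0, so 0.0
print(empirical_cdf_at(samples, 5))  # every value is <= 5, so 1.0
```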

For a sample of data, it can be calculated in Python by using the following functions from the NumPy library.

  1. The numpy.arange() function, which returns an ndarray of evenly spaced values separated by a given step.
  2. The numpy.linspace() function, which returns an ndarray holding a given number of evenly spaced values within a given interval.
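The difference between the two functions is easiest to see side by side. The sketch below uses arbitrary arguments chosen only for illustration.

```python
import numpy as np

steps = np.arange(0, 5)      # start, stop (excluded), default step of 1
grid = np.linspace(0, 1, 5)  # start, stop (included by default), number of points

print(steps)  # [0 1 2 3 4]
print(grid)   # [0.   0.25 0.5  0.75 1.  ]
```

arange() is driven by a step size, while linspace() is driven by the number of points wanted.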

Use numpy.arange() to Calculate the CDF in Python

NumPy's arange() function, which returns an ndarray of evenly spaced values, can be used to determine the CDF of a data sample in Python.

NumPy is a third-party library rather than part of the Python standard library, so install it if needed and import it first.

Below is an example that demonstrates the implementation of the CDF function using the numpy.arange() function in Python.

import matplotlib.pyplot as plt
import numpy

# Draw 5 samples from the standard normal distribution
data = numpy.random.randn(5)
print("The data is-", data)

# Sort the samples, then pair each one with an evenly spaced probability in [0, 1]
sorted_random_data = numpy.sort(data)
n = len(sorted_random_data)
p = numpy.arange(n) / (n - 1)
print("The CDF result is-", p)

# Plot the sorted samples against their cumulative probabilities
fig = plt.figure()
fig.suptitle("CDF of data points")
ax2 = fig.add_subplot(111)
ax2.plot(sorted_random_data, p)
ax2.set_xlabel("sorted_random_data")
ax2.set_ylabel("p")
plt.show()

Here, the randn() function returns samples drawn from the standard normal distribution. Because randn(5) is called, a 1-D array of 5 random values is built.

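Next, the data is sorted using the sort() function. The arange() function then generates the indices 0 to n-1, where n is the number of samples, and dividing them by n-1 gives evenly spaced probabilities from 0 to 1, one for each sorted value.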

Output:

The data is- [ 0.14213322 -1.28760908  0.94533922  0.82004319  1.08232731]
The CDF result is- [0.   0.25 0.5  0.75 1.  ]

The resulting graph plots p against the sorted data.

[Figure: CDF of data points]
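Dividing the indices by n-1 assigns a probability of 0 to the smallest sample. A common alternative convention, shown here as a sketch and not as this article's method, assigns k/n to the k-th smallest value, so each probability equals the fraction of samples less than or equal to that value. The sample values and variable names below are made up for illustration.

```python
import numpy as np

data = np.array([0.3, -1.2, 0.9, 0.8, 1.1])  # hypothetical sample values
sorted_data = np.sort(data)
n = len(sorted_data)

p_article = np.arange(n) / (n - 1)    # 0, 1/(n-1), ..., 1
p_fraction = np.arange(1, n + 1) / n  # 1/n, 2/n, ..., 1

print(p_article)   # [0.   0.25 0.5  0.75 1.  ]
print(p_fraction)  # [0.2 0.4 0.6 0.8 1. ]
```

Both arrays end at 1; they differ only in where they start, and the gap between them shrinks as the sample grows.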

Use numpy.linspace() to Calculate the CDF in Python

NumPy's linspace() function can also be used to determine the CDF of a data sample in Python. As before, import the NumPy library first.

The linspace() function returns an ndarray of evenly spaced numbers over a specified interval.
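Whether the interval's upper bound is included depends on the endpoint parameter, which defaults to True. The short sketch below, with arbitrary arguments, shows how the result changes.

```python
import numpy as np

with_end = np.linspace(0, 1, 5)                     # endpoint=True is the default
without_end = np.linspace(0, 1, 5, endpoint=False)  # the stop value 1 is left out

print(with_end)     # [0.   0.25 0.5  0.75 1.  ]
print(without_end)  # [0.  0.2 0.4 0.6 0.8]
```

For a CDF that should reach 1 at the largest sample, the default endpoint=True is the setting to use.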

Here is an example that demonstrates the implementation of the CDF function using numpy.linspace() in Python.

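import matplotlib.pyplot as plt
import numpy as np

# Draw 5 samples from the standard normal distribution
data = np.random.randn(5)
print("The data is-", data)
sorted_random_data = np.sort(data)

# Store the result in p; the default endpoint=True makes the values run from 0 to 1
p = np.linspace(0, 1, len(sorted_random_data))

print("The CDF result using linspace =\n", p)

# Plot the sorted samples against their cumulative probabilities
fig = plt.figure()
fig.suptitle("CDF of data points")
ax2 = fig.add_subplot(111)
ax2.plot(sorted_random_data, p)
ax2.set_xlabel("sorted_random_data")
ax2.set_ylabel("p")
plt.show()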

Here, the randn() function returns samples drawn from the standard normal distribution. Next, the data is sorted using the sort() function, after which the linspace() function is used to generate evenly spaced probabilities from 0 to 1, one for each sorted value.

Output:

The data is- [-0.92106668 -0.05998132  0.02102705 -0.84778184  0.90815869]
The CDF result using linspace =
 [0.   0.25 0.5  0.75 1.  ]

The resulting graph plots p against the sorted data.

[Figure: CDF of data points]
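With the default endpoint=True, linspace(0, 1, n) produces the same probabilities as dividing arange(n) by n-1. The quick check below is a sketch using a few arbitrary sample sizes.

```python
import numpy as np

for n in (2, 5, 100):
    p_arange = np.arange(n) / (n - 1)  # indices scaled to the range [0, 1]
    p_linspace = np.linspace(0, 1, n)  # n evenly spaced points, both ends included
    print(n, np.allclose(p_arange, p_linspace))  # prints True for each n
```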
