How to Plot a CDF Using Matplotlib in Python

Suraj Joshi Feb 02, 2024

This tutorial explains how to generate a CDF plot using Matplotlib in Python. The CDF (cumulative distribution function) is the function whose y-value at each x is the probability that a random variable takes a value less than or equal to x.

Plot CDF Using Matplotlib in Python

A CDF is defined for both continuous and discrete probability distributions. In a continuous distribution, the random variable can take any value in a specified range, whereas in a discrete distribution it can take only values from a specified set.
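In symbols, the definition can be written out for both cases. The notation here (F_X for the CDF, p_X for the PMF of a discrete variable, f_X for the PDF of a continuous one) is introduced only for illustration and does not appear in the original text.

```latex
F_X(x) = P(X \le x) =
\begin{cases}
  \displaystyle \sum_{x_i \le x} p_X(x_i) & \text{(discrete)} \\[2ex]
  \displaystyle \int_{-\infty}^{x} f_X(t)\, dt & \text{(continuous)}
\end{cases}
```

In both cases F_X is non-decreasing and approaches 1 as x grows, which is why the CDF curves in the plots below end at 1.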

Plot CDF for Discrete Distribution Using Matplotlib in Python

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 7)
y = [0.2, 0.1, 0.1, 0.2, 0.1, 0.3]
cdf = np.cumsum(y)

plt.plot(x, y, marker="o", label="PMF")
plt.plot(x, cdf, marker="o", label="CDF")
plt.xlim(0, 7)
plt.ylim(0, 1.5)
plt.xlabel("X")
plt.ylabel("Probability Values")
plt.title("CDF for discrete distribution")
plt.legend()
plt.show()

Output:

Plot CDF for discrete distribution using Matplotlib in Python

It plots the PMF (probability mass function) and CDF for the given distribution. To calculate the y-values for the CDF, we use the numpy.cumsum() function, which returns the cumulative sum of an array.
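Strictly speaking, the CDF of a discrete distribution is a step function: it jumps at each possible value and stays flat in between. The following is a minimal sketch of one way to draw it that way, using the same probabilities as above. The helper name discrete_cdf and the sum-to-1 check are illustrative additions, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt


def discrete_cdf(pmf):
    """Return the running total of the PMF values (the CDF at each x)."""
    pmf = np.asarray(pmf, dtype=float)
    # Illustrative sanity check: a valid PMF must sum to 1.
    if not np.isclose(pmf.sum(), 1.0):
        raise ValueError("PMF values must sum to 1")
    return np.cumsum(pmf)


def main():
    x = np.arange(1, 7)
    cdf = discrete_cdf([0.2, 0.1, 0.1, 0.2, 0.1, 0.3])

    fig, ax = plt.subplots()
    # where="post": the curve jumps at each x and stays flat until the next x.
    ax.step(x, cdf, where="post", label="CDF")
    ax.set_xlabel("X")
    ax.set_ylabel("Cumulative probability")
    ax.set_title("Step plot of a discrete CDF")
    ax.legend()
    plt.show()


if __name__ == "__main__":
    main()
```

Compared with joining the points with straight lines, the step plot avoids suggesting that the probability grows between the allowed values.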

If we are given frequency counts instead of probabilities, we must first normalize them so that they represent the PMF.

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 7)
frequency = np.array([3, 8, 4, 5, 3, 6])
# Normalize the counts so that they sum to 1 (a PMF, since the data is discrete).
pmf = frequency / np.sum(frequency)
cdf = np.cumsum(pmf)

plt.plot(x, pmf, marker="o", label="PMF")
plt.plot(x, cdf, marker="o", label="CDF")
plt.xlim(0, 7)
plt.ylim(0, 1.5)
plt.xlabel("X")
plt.ylabel("Probability Values")
plt.title("CDF for discrete distribution")
plt.legend()
plt.show()

Output:

Plot CDF for discrete distribution using Matplotlib in Python after normalization

Here, we are given the frequency of each X value. We convert the frequencies into PMF values by dividing each element of the frequency array by the sum of the frequencies. We then take the cumulative sum of those probabilities to obtain the CDF values to plot.
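When only raw observations are available rather than a frequency table, the counts can be derived first. The sketch below assumes a small hypothetical sample (the samples array is invented for illustration) and uses numpy.unique() with return_counts=True to build the frequency table before normalizing.

```python
import numpy as np

# Hypothetical raw sample, used only for illustration.
samples = np.array([2, 5, 2, 6, 1, 2, 4, 6, 6, 3])

# np.unique returns the sorted distinct values and how often each one occurs.
values, counts = np.unique(samples, return_counts=True)

# Normalize the counts into a PMF, then accumulate them into the CDF.
pmf = counts / counts.sum()
cdf = np.cumsum(pmf)

for v, p, c in zip(values, pmf, cdf):
    print(f"x={v}: PMF={p:.2f}, CDF={c:.2f}")
```

The resulting values, pmf, and cdf arrays can be passed to plt.plot() in the same way as x, pdf, and cdf in the example above.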

We can also plot the distribution and its CDF directly from raw data using histograms, which is often more intuitive for discrete data.

import numpy as np
import matplotlib.pyplot as plt

data = [3, 4, 2, 3, 4, 5, 4, 7, 8, 5, 4, 6, 2, 1, 0, 9, 7, 6, 6, 5, 4]
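# One unit-wide bin centered on each integer from 0 to 9, so that
# 8 and 9 do not end up sharing the last bin.
bins = np.arange(-0.5, 10.5, 1)
plt.hist(data, bins=bins, density=True, label="PMF")
plt.hist(data, bins=bins, density=True, cumulative=True, label="CDF", histtype="step")
plt.xlabel("X")
plt.ylabel("Probability")
plt.xticks(np.arange(0, 10))
plt.title("CDF using Histogram Plot")
plt.legend()
plt.show()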

Output:

Plot CDF for discrete distribution using Histogram Plots

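It plots the normalized histogram and the CDF of the given data using the hist() method. Setting density=True normalizes the histogram so that its total area is 1 (with unit-width bins, the bar heights themselves sum to 1), and setting cumulative=True accumulates those values into the CDF.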
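A histogram-based CDF depends on the choice of bins. As an alternative technique, not used in the original example, the empirical CDF can be computed without any binning by sorting the data. The sketch below assumes the same data list as above; the helper name ecdf is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def ecdf(data):
    """Return the sorted observations and the cumulative fraction at each one."""
    x = np.sort(np.asarray(data))
    # The k-th smallest observation (1-based) gets cumulative fraction k / n.
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def main():
    data = [3, 4, 2, 3, 4, 5, 4, 7, 8, 5, 4, 6, 2, 1, 0, 9, 7, 6, 6, 5, 4]
    x, y = ecdf(data)

    fig, ax = plt.subplots()
    # Repeated values produce stacked jumps, so ties are handled naturally.
    ax.step(x, y, where="post", label="Empirical CDF")
    ax.set_xlabel("X")
    ax.set_ylabel("Cumulative probability")
    ax.set_title("Empirical CDF without binning")
    ax.legend()
    plt.show()


if __name__ == "__main__":
    main()
```

Because no bins are involved, this approach works the same way for continuous measurements as for integer-valued data.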

Plot CDF for Continuous Distribution Using Matplotlib in Python

import numpy as np
import matplotlib.pyplot as plt

dx = 0.005
x = np.arange(-10, 10, dx)
y = 0.25 * np.exp((-(x ** 2)) / 8)

y = y / (np.sum(dx * y))
cdf = np.cumsum(y * dx)

plt.plot(x, y, label="pdf")
plt.plot(x, cdf, label="cdf")
plt.xlabel("X")
plt.ylabel("Probability Values")
plt.title("CDF for continuous distribution")
plt.legend()
plt.show()

Output:

Plot CDF for continuous distribution using Matplotlib in Python

It plots the PDF and CDF for the given continuous distribution. To calculate the y-values for the CDF, we multiply the PDF values by the step dx and accumulate them with numpy.cumsum(), which approximates the integral of the PDF.

We divide y by the sum of the array y multiplied by dx, which approximates the area under the curve, so that the PDF integrates to 1 and the CDF values range from 0 to 1.
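One way to check the numerical CDF is to compare it with a closed form. The curve used above, 0.25 * exp(-x**2 / 8), is proportional to a normal PDF with mean 0 and standard deviation 2, whose CDF can be written with the error function. The sketch below makes that comparison using math.erf from the standard library; the variable names cdf_numeric, cdf_exact, and max_error are illustrative.

```python
import math

import numpy as np

dx = 0.005
x = np.arange(-10, 10, dx)

# Same unnormalized curve as in the example above.
y = 0.25 * np.exp(-(x ** 2) / 8)
y = y / np.sum(y * dx)           # normalize so the area under the curve is 1
cdf_numeric = np.cumsum(y * dx)  # running Riemann sum of the PDF

# Closed-form CDF of a normal distribution with mean 0 and sigma = 2.
sigma = 2.0
cdf_exact = np.array(
    [0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0)))) for v in x]
)

max_error = np.max(np.abs(cdf_numeric - cdf_exact))
print(f"largest difference between the two CDFs: {max_error:.2e}")
```

The difference should be small and should shrink as dx is reduced, since the cumulative sum is a Riemann-sum approximation of the integral.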

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
