SciPy scipy.stats.multivariate_normal

Suraj Joshi Dec 15, 2021

The scipy.stats.multivariate_normal object in SciPy represents the multivariate normal distribution. Its methods evaluate the probability density function (pdf), evaluate the cumulative distribution function (cdf), and draw random samples (rvs) from the distribution.
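As a minimal sketch of how these methods fit together, the object can also be called with a mean and a covariance to fix ("freeze") the parameters, after which pdf, cdf, and rvs no longer need them passed in. The values of mean, cov, and point below are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (assumed for this sketch).
mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])

# Calling the object returns a "frozen" distribution with fixed parameters.
rv = multivariate_normal(mean=mean, cov=cov)

point = np.array([0.5, 0.5])
frozen_pdf = rv.pdf(point)
direct_pdf = multivariate_normal.pdf(point, mean=mean, cov=cov)

print(frozen_pdf, direct_pdf)  # the two values agree
print(rv.cdf(point))           # cumulative probability at the point
frozen_samples = rv.rvs(size=3, random_state=0)
print(frozen_samples.shape)    # three samples with two components each
```

Freezing is mostly a convenience when the same mean and cov are reused across several calls.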

Syntax to Generate Probability Density Function Using scipy.stats.multivariate_normal Object

scipy.stats.multivariate_normal.pdf(x, mean=None, cov=1, allow_singular=False)

Parameters:

x Points at which the pdf is evaluated. The last axis of x indexes the components of each point.
mean Array-like mean of the distribution, with one entry per component. The default value None is treated as a zero vector.
cov Covariance matrix of the distribution, which must be symmetric and positive semidefinite. A scalar is interpreted as that value times the identity matrix, and a one-dimensional array as the diagonal of the covariance matrix. The default value is 1.
allow_singular If set to True, a singular covariance matrix is accepted. The default value is False.
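
The cov parameter also accepts two shorthand forms described in the SciPy documentation: a scalar stands for that value times the identity matrix, and a one-dimensional array gives the diagonal entries. The sketch below checks both against the explicit matrices; the point, mean, and variances are assumed values chosen for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative inputs (assumed for this sketch).
point = np.array([0.2, 0.5])
mean = np.array([0.4, 0.8])

# Scalar cov: treated as 2.0 times the 2x2 identity matrix.
pdf_scalar = multivariate_normal.pdf(point, mean=mean, cov=2.0)
pdf_identity = multivariate_normal.pdf(point, mean=mean, cov=2.0 * np.eye(2))

# One-dimensional cov: treated as the diagonal of the covariance matrix.
pdf_diag = multivariate_normal.pdf(point, mean=mean, cov=[0.5, 2.0])
pdf_full = multivariate_normal.pdf(point, mean=mean, cov=np.diag([0.5, 2.0]))

print(np.isclose(pdf_scalar, pdf_identity))
print(np.isclose(pdf_diag, pdf_full))
```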

Return:

An array containing the probability density at each point in x. For a single point, a scalar is returned.
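
As a small sketch of the return shapes, the following compares a single point with a batch of points. The parameter values and the names single and batch are assumptions used only for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])

# One point with two components: the result is a scalar.
single = multivariate_normal.pdf(np.array([0.5, 0.5]), mean=mean, cov=cov)

# Five points with two components each: the result has one density per point.
batch = multivariate_normal.pdf(np.full((5, 2), 0.5), mean=mean, cov=cov)

print(np.ndim(single))  # 0
print(batch.shape)      # (5,)
```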

Example: Generate Probability Density Function Using scipy.stats.multivariate_normal.pdf Method

import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])
x = np.random.uniform(size=(5, 2))
y = multivariate_normal.pdf(x, mean=mean, cov=cov)

print("The data and corresponding pdfs are:")
print("Data-------PDF value")
# Print each point next to its density value.
for point, density in zip(x, y):
    print(point, "------->", density)

Output:

The data and corresponding pdfs are:
Data-------PDF value
[0.60156002 0.53917659] -------> 0.030687330659191728
[0.60307471 0.25205368] -------> 0.0016016741361277501
[0.27254519 0.06817383] -------> 0.7968146411119688
[0.33630808 0.21039553] -------> 0.7048988855032084
[0.0009666  0.52414497] -------> 0.010307396714783708

In the above example, x is the array of points at which the pdf is evaluated. Each row is one point, and each column is one component of that point. Because x is drawn with np.random.uniform and no seed is set, the points and their pdf values differ from run to run.

Here, each point has two components, so it is a vector of length 2. The mean is a vector whose length equals the number of components. Similarly, if d is the number of components, cov is a symmetric square matrix of size d × d.

The scipy.stats.multivariate_normal.pdf method takes x, mean, and the covariance matrix cov, and returns a vector with one entry per row of x, where each entry is the pdf value for that row.
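
To make the computation concrete, the following sketch evaluates the standard multivariate normal density, exp(-(x - mean)^T cov^(-1) (x - mean) / 2) / sqrt((2π)^d det(cov)), by hand and compares it with the pdf method. The helper name manual_pdf and the fixed seed are assumptions introduced for illustration; they are not part of SciPy.

```python
import numpy as np
from scipy.stats import multivariate_normal

def manual_pdf(x, mean, cov):
    """Evaluate the multivariate normal density row by row (hypothetical helper)."""
    d = mean.shape[0]
    diff = x - mean  # shape (n, d)
    # Squared Mahalanobis distance of each row: diff_i^T cov^{-1} diff_i.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm_const

mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])
rng = np.random.default_rng(0)  # fixed seed, assumed for reproducibility
x = rng.uniform(size=(5, 2))

manual = manual_pdf(x, mean, cov)
library = multivariate_normal.pdf(x, mean=mean, cov=cov)
print(np.allclose(manual, library))
```

Solving a linear system with np.linalg.solve avoids forming the inverse of cov explicitly, which is a common choice for numerical stability.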

Example: Draw Random Samples From a Multivariate Normal Distribution Using scipy.stats.multivariate_normal.rvs Method

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])
x = multivariate_normal.rvs(mean=mean, cov=cov, size=100)  # 100 samples, 2 features each
plt.scatter(x[:, 0], x[:, 1])
plt.show()

Output:

Scatter Plot of random samples drawn from multivariate normal distribution

The above plot is a scatter plot of 100 random samples drawn from a multivariate normal distribution with two features. The distribution has a mean of [0.4, 0.8], where 0.4 is the mean of the first feature and 0.8 is the mean of the second feature. The scatter plot shows the first feature along the X-axis and the second feature along the Y-axis.

From the plot, most of the sample points cluster around [0.4, 0.8], the mean of the distribution. Because the off-diagonal covariance of 0.3 is positive, the points also spread along an upward-sloping direction.
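
The same point can be checked numerically rather than visually. The sketch below draws a larger sample and compares its mean and covariance with the parameters; the sample size of 20000 and the random_state value are assumptions chosen to keep the check stable and reproducible.

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])

# A larger sample and a fixed seed (both assumed here) make the check stable.
samples = multivariate_normal.rvs(mean=mean, cov=cov, size=20000, random_state=0)

sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)  # rows are observations, columns are features

print(samples.shape)
print(sample_mean)  # should be close to mean
print(sample_cov)   # should be close to cov
```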

Example: Get Cumulative Distribution Function Using scipy.stats.multivariate_normal.cdf Method

The cumulative distribution function (CDF) is the integral of the pdf. For a point x, the CDF gives the probability that every component of a random vector drawn from the distribution is less than or equal to the corresponding component of x. We can calculate the CDF at points of a multivariate normal distribution using the scipy.stats.multivariate_normal.cdf method.

import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])
x = np.random.uniform(size=(5, 2))
y = multivariate_normal.cdf(x, mean=mean, cov=cov)

print("The data and corresponding cdfs are:")
print("Data-------CDF value")
# Print each point next to its cumulative probability.
for point, probability in zip(x, y):
    print(point, "------->", probability)

Output:

The data and corresponding cdfs are:
Data-------CDF value
[0.89027577 0.06036432] -------> 0.22976054289355996
[0.78164237 0.09611703] -------> 0.24075282906929418
[0.53051197 0.63041372] -------> 0.4309184323329717
[0.15571201 0.97173575] -------> 0.21985053519541042
[0.72988545 0.22477096] -------> 0.28256819625802715

In the above example, x is the array of points at which the CDF is evaluated. Each row is one point, and each column is one component of that point. As in the pdf example, each point is a vector of length 2, mean has one entry per component, and cov is a symmetric square matrix of size d × d, where d is the number of components.

The scipy.stats.multivariate_normal.cdf method takes x, mean, and the covariance matrix cov, and returns a vector with one entry per row of x, where each entry is the CDF value for that row.
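
One way to see what a CDF value means is to estimate it from samples: the fraction of draws whose components are all less than or equal to the query point should approach the value returned by cdf. The sketch below does this; the query point, the sample size, and the random_state value are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])
point = np.array([0.5, 0.6])  # illustrative query point (assumed)

cdf_value = multivariate_normal.cdf(point, mean=mean, cov=cov)

# Monte Carlo estimate: fraction of samples with every component <= point.
samples = multivariate_normal.rvs(mean=mean, cov=cov, size=200000, random_state=0)
empirical = np.mean(np.all(samples <= point, axis=1))

print(cdf_value, empirical)  # the two values should be close
```

The agreement is only approximate, since the empirical fraction carries sampling error and cdf itself is computed by numerical integration.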

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
