How to Calculate Mahalanobis Distance in Python

This tutorial will introduce the methods to find the Mahalanobis distance between two NumPy arrays in Python.
Calculate Mahalanobis Distance With cdist() Function in the scipy.spatial.distance Library in Python
Mahalanobis distance is a measure of the distance between a point and a distribution. It scales each difference by the inverse of the distribution's covariance matrix, so differences along high-variance directions count for less. If we want to find the Mahalanobis distance between two arrays, we can use the cdist() function inside the scipy.spatial.distance library in Python. The cdist() function computes the distance between every pair of rows from two collections of observations. We can pass "mahalanobis" as the metric argument to find the Mahalanobis distance. See the following code example.
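import numpy as np
from scipy.spatial.distance import cdist
x = np.array([[[1, 2, 3], [3, 4, 5], [5, 6, 7]], [[5, 6, 7], [7, 8, 9], [9, 0, 1]]])
i, j, k = x.shape
# Flatten each 3x3 block and transpose: 9 observations with 2 features each
xx = x.reshape(i, j * k).T
y = np.array([[[8, 7, 6], [6, 5, 4], [4, 3, 2]], [[4, 3, 2], [2, 1, 0], [0, 1, 2]]])
yy = y.reshape(i, j * k).T
# Pairwise distances between every row of xx and every row of yy (9x9 matrix)
results = cdist(xx, yy, "mahalanobis")
# Keep only the diagonal: distance between row n of xx and row n of yy
results = np.diag(results)
print(results)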
Output:
[3.63263583 2.59094773 1.97370848 1.97370848 2.177978 3.04256456
3.04256456 1.54080605 2.58298363]
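In the above code, we calculated the Mahalanobis distance between the arrays x and y with the cdist() function. We first created both arrays with the np.array() function. We then reshaped each array into two rows of nine values and stored the transpose in the new arrays xx and yy, so each has nine rows (observations) and two columns (features). We passed these new arrays to the cdist() function and specified the metric with cdist(xx, yy, "mahalanobis"), which returns a 9x9 matrix of distances between every row of xx and every row of yy. Finally, np.diag() extracted the diagonal of that matrix, which holds the distance between each row of xx and the matching row of yy.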
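The definition at the start of this section, the distance between a point and a distribution, can also be sketched directly. The example below is a minimal illustration on assumed data: the sample array data and the query point are made up for this sketch and are not taken from the arrays above. It uses the mahalanobis() function from scipy.spatial.distance, which takes two vectors and an inverse covariance matrix, and checks the result against the formula written out by hand.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Hypothetical sample: 6 observations with 2 features each (illustrative data)
data = np.array(
    [[2.0, 3.0], [4.0, 7.0], [6.0, 5.0], [8.0, 9.0], [5.0, 4.0], [7.0, 8.0]]
)
point = np.array([6.0, 3.0])  # hypothetical query point

mean = data.mean(axis=0)          # centre of the distribution
cov = np.cov(data, rowvar=False)  # 2x2 covariance of the two features
VI = np.linalg.inv(cov)           # inverse covariance matrix

# Distance from the point to the centre of the distribution
dist_scipy = mahalanobis(point, mean, VI)

# Same quantity from the definition: sqrt(delta . VI . delta)
delta = point - mean
dist_manual = np.sqrt(delta @ VI @ delta)

print(dist_scipy, dist_manual)
```

Both printed values should agree, since mahalanobis() evaluates the same quadratic form for a single pair of vectors.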
Calculate Mahalanobis Distance With numpy.einsum() Method in Python
We can also calculate the Mahalanobis distance between two arrays using the numpy.einsum() method. The numpy.einsum() method evaluates the Einstein summation convention on its operands, which lets us express the products and sums in the Mahalanobis formula as a single subscript string.
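For reference, the quantity being computed can be written out. The notation here is introduced for illustration only: delta_n stands for the difference between row n of the first array and row n of the second, and V^{-1} for the inverse of the covariance matrix. Under those assumptions, the Mahalanobis distance for row n is:

```latex
d_n = \sqrt{\delta_n^{\top} V^{-1} \delta_n}
    = \sqrt{\sum_{j}\sum_{k} \delta_{nj}\, (V^{-1})_{jk}\, \delta_{nk}}
```

The double sum over j and k is the part that an Einstein summation expression can evaluate in one call.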
import numpy as np
x = np.array([[[1, 2, 3], [3, 4, 5], [5, 6, 7]], [[5, 6, 7], [7, 8, 9], [9, 0, 1]]])
i, j, k = x.shape
xx = x.reshape(i, j * k).T
y = np.array([[[8, 7, 6], [6, 5, 4], [4, 3, 2]], [[4, 3, 2], [2, 1, 0], [0, 1, 2]]])
yy = y.reshape(i, j * k).T
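# Pool both arrays to estimate the covariance, then invert it
X = np.vstack([xx, yy])
V = np.cov(X.T)
VI = np.linalg.inv(V)
# Row-wise difference between the two arrays
delta = xx - yy
# For each row n, sum delta[n, j] * VI[j, k] * delta[n, k] over j and k
results = np.sqrt(np.einsum("nj,jk,nk->n", delta, VI, delta))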
print(results)
Output:
[3.63263583 2.59094773 1.97370848 1.97370848 2.177978 3.04256456
3.04256456 1.54080605 2.58298363]
We passed both arrays to the np.vstack() function and stored the stacked result in X. After that, we passed the transpose of X to the np.cov() function and stored the resulting covariance matrix in V. We then calculated the inverse of the matrix V using the numpy.linalg.inv() method and stored the result in VI. We calculated the difference between xx and yy and stored it in delta. In the end, we calculated the Mahalanobis distance between x and y with results = np.sqrt(np.einsum("nj,jk,nk->n", delta, VI, delta)). The output matches the result of the cdist() approach.
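To see what the subscript string "nj,jk,nk->n" does, the sketch below compares it with two more explicit formulations on small made-up inputs. The arrays delta and VI here are hypothetical stand-ins chosen for illustration, not the values computed above.

```python
import numpy as np

# Hypothetical inputs: 4 difference vectors with 2 features each,
# and a symmetric 2x2 matrix standing in for an inverse covariance
delta = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.5]])
VI = np.array([[2.0, -0.5], [-0.5, 1.0]])

# einsum: for each row n, sum delta[n, j] * VI[j, k] * delta[n, k] over j and k
sq_einsum = np.einsum("nj,jk,nk->n", delta, VI, delta)

# Equivalent explicit loop over the rows
sq_loop = np.array([row @ VI @ row for row in delta])

# Equivalent matrix product, keeping only the diagonal of the n-by-n result
sq_diag = np.diag(delta @ VI @ delta.T)

# prints: True True
print(np.allclose(sq_einsum, sq_loop), np.allclose(sq_einsum, sq_diag))
```

The einsum form avoids building the full n-by-n matrix that the diagonal version creates, which matters more as the number of rows grows.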
Maisam is a highly skilled and motivated Data Scientist. He has over 4 years of experience with the Python programming language. He loves solving complex problems and sharing his results on the internet.