How to Calculate Mahalanobis Distance in Python

Muhammad Maisam Abbas Feb 02, 2024
  1. Calculate Mahalanobis Distance With cdist() Function in the scipy.spatial.distance Library in Python
  2. Calculate Mahalanobis Distance With numpy.einsum() Method in Python

This tutorial introduces two methods for finding the Mahalanobis distance between two NumPy arrays in Python.
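For reference, both methods below compute the same quantity. For two vectors u and v and a covariance matrix V estimated from the data, the Mahalanobis distance is:

```latex
d(u, v) = \sqrt{(u - v)^{\top} V^{-1} (u - v)}
```

When v is the mean of a distribution with covariance V, this is the distance of the point u from that distribution. When V is the identity matrix, it reduces to the ordinary Euclidean distance.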

Calculate Mahalanobis Distance With cdist() Function in the scipy.spatial.distance Library in Python

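The Mahalanobis distance measures the distance between a point and a distribution, taking the covariance of the data into account. To compute it between the rows of two arrays, we can use the cdist() function from the scipy.spatial.distance library. The cdist() function calculates the distance between each pair of points taken from two collections, and passing "mahalanobis" as the metric argument selects the Mahalanobis distance. If no inverse covariance matrix is supplied through the VI keyword argument, cdist() estimates the covariance from the rows of both inputs stacked together and inverts it. See the following code example.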

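import numpy as np
from scipy.spatial.distance import cdist

# Two arrays of shape (2, 3, 3)
x = np.array([[[1, 2, 3], [3, 4, 5], [5, 6, 7]], [[5, 6, 7], [7, 8, 9], [9, 0, 1]]])
i, j, k = x.shape

# Reshape to (2, 9) and transpose: 9 observations (rows) of 2 variables (columns)
xx = x.reshape(i, j * k).T

y = np.array([[[8, 7, 6], [6, 5, 4], [4, 3, 2]], [[4, 3, 2], [2, 1, 0], [0, 1, 2]]])
yy = y.reshape(i, j * k).T

# 9x9 matrix of Mahalanobis distances between every row of xx and every row of yy.
# Without a VI argument, cdist() inverts the covariance of the stacked rows of xx and yy.
results = cdist(xx, yy, "mahalanobis")

# Keep the diagonal: the distance between row n of xx and row n of yy
results = np.diag(results)
print(results)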

Output:

[3.63263583 2.59094773 1.97370848 1.97370848 2.177978   3.04256456
 3.04256456 1.54080605 2.58298363]

In the above code, we first created the arrays x and y with the np.array() function. Each array has shape (2, 3, 3), so reshaping it to (2, 9) and transposing the result gives the arrays xx and yy, each holding 9 observations (rows) of 2 variables (columns). We then called cdist(xx, yy, 'mahalanobis'), which returns a 9-by-9 matrix containing the distance between every row of xx and every row of yy. Finally, np.diag() keeps only the diagonal of that matrix, that is, the distance between row n of xx and row n of yy for each n.
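Because only the diagonal is kept, most of the 81 distances computed by cdist() are thrown away. The following is a minimal sketch, reusing the data from the example above, that first passes the inverse covariance explicitly through the VI keyword argument to confirm it matches what cdist() does by default (as described in the SciPy documentation), and then computes only the 9 paired distances with scipy.spatial.distance.mahalanobis(). The variable names full and paired are illustrative, not part of any API.

```python
import numpy as np
from scipy.spatial.distance import cdist, mahalanobis

# Same input data as the cdist() example above
x = np.array([[[1, 2, 3], [3, 4, 5], [5, 6, 7]], [[5, 6, 7], [7, 8, 9], [9, 0, 1]]])
y = np.array([[[8, 7, 6], [6, 5, 4], [4, 3, 2]], [[4, 3, 2], [2, 1, 0], [0, 1, 2]]])
i, j, k = x.shape
xx = x.reshape(i, j * k).T  # 9 observations of 2 variables
yy = y.reshape(i, j * k).T

# Inverse of the covariance of the stacked observations, mirroring the
# default that SciPy documents for the VI argument of cdist()
VI = np.linalg.inv(np.cov(np.vstack([xx, yy]).T)).T

# Passing VI explicitly should reproduce the default result
full = cdist(xx, yy, "mahalanobis", VI=VI)
assert np.allclose(full, cdist(xx, yy, "mahalanobis"))

# Only the 9 paired distances, without building the full 9x9 matrix
paired = np.array([mahalanobis(u, v, VI) for u, v in zip(xx, yy)])
assert np.allclose(paired, np.diag(full))
print(paired)
```

This should print the same nine values as the cdist() example. For large inputs, the per-pair loop avoids computing and storing n² distances, at the cost of a Python-level loop.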

Calculate Mahalanobis Distance With numpy.einsum() Method in Python

We can also compute the same distances directly in NumPy with the numpy.einsum() method, which evaluates the Einstein summation convention on its operands.

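import numpy as np

# Same input data as in the cdist() example
x = np.array([[[1, 2, 3], [3, 4, 5], [5, 6, 7]], [[5, 6, 7], [7, 8, 9], [9, 0, 1]]])
i, j, k = x.shape
xx = x.reshape(i, j * k).T  # shape (9, 2)

y = np.array([[[8, 7, 6], [6, 5, 4], [4, 3, 2]], [[4, 3, 2], [2, 1, 0], [0, 1, 2]]])
yy = y.reshape(i, j * k).T  # shape (9, 2)

# Estimate the 2x2 covariance matrix from all 18 observations;
# np.cov() treats rows as variables by default, hence the transpose
X = np.vstack([xx, yy])
V = np.cov(X.T)
VI = np.linalg.inv(V)  # inverse covariance matrix

# Row-wise differences between corresponding observations
delta = xx - yy

# For each row n: sqrt(delta[n] @ VI @ delta[n])
results = np.sqrt(np.einsum("nj,jk,nk->n", delta, VI, delta))
print(results)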

Output:

[3.63263583 2.59094773 1.97370848 1.97370848 2.177978   3.04256456
 3.04256456 1.54080605 2.58298363]

We stacked xx and yy with the np.vstack() function and stored the result, an array of 18 rows, in X. Because np.cov() treats each row as a variable by default, we passed the transpose of X to get the 2-by-2 covariance matrix V of the two variables. We then computed the inverse of V with the numpy.linalg.inv() method and stored it in VI, and stored the row-wise differences xx - yy in delta. Finally, np.einsum('nj,jk,nk->n', delta, VI, delta) computes the quadratic form delta[n] @ VI @ delta[n] for every row n, and np.sqrt() turns those values into the Mahalanobis distances. This is the same covariance estimate that cdist() uses when no VI argument is given, which is why both methods print identical results.
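To make the subscript string "nj,jk,nk->n" less opaque, here is a small sketch that computes the same per-row quadratic forms three ways: with np.einsum(), with an explicit loop over rows, and with a vectorized matrix product followed by a row-wise sum. It reuses the data from the example above; the names via_einsum, via_loop, and via_matmul are only illustrative.

```python
import numpy as np

# Same input data as the examples above: 9 observations of 2 variables
x = np.array([[[1, 2, 3], [3, 4, 5], [5, 6, 7]], [[5, 6, 7], [7, 8, 9], [9, 0, 1]]])
y = np.array([[[8, 7, 6], [6, 5, 4], [4, 3, 2]], [[4, 3, 2], [2, 1, 0], [0, 1, 2]]])
i, j, k = x.shape
xx = x.reshape(i, j * k).T
yy = y.reshape(i, j * k).T

VI = np.linalg.inv(np.cov(np.vstack([xx, yy]).T))
delta = xx - yy

# "nj,jk,nk->n": for each row n, sum over j and k of delta[n, j] * VI[j, k] * delta[n, k]
via_einsum = np.einsum("nj,jk,nk->n", delta, VI, delta)

# The same quadratic forms as an explicit loop over rows
via_loop = np.array([d @ VI @ d for d in delta])

# Vectorized: multiply (delta @ VI) elementwise by delta, then sum each row
via_matmul = np.sum((delta @ VI) * delta, axis=1)

assert np.allclose(via_einsum, via_loop)
assert np.allclose(via_einsum, via_matmul)
print(np.sqrt(via_einsum))
```

The einsum form avoids both the Python loop and the intermediate (9, 2) array of the matmul version, which is why it is a common idiom for batched quadratic forms.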


Maisam is a highly skilled and motivated Data Scientist. He has over 4 years of experience with the Python programming language. He loves solving complex problems and sharing his results on the internet.
