# How to Calculate Mahalanobis Distance in Python

Muhammad Maisam Abbas Feb 02, 2024

This tutorial introduces two methods for finding the Mahalanobis distance between two NumPy arrays in Python.

## Calculate Mahalanobis Distance With the `cdist()` Function in the `scipy.spatial.distance` Module in Python

The Mahalanobis distance measures how far a point lies from a distribution, taking the distribution's covariance into account. To find the Mahalanobis distance between two arrays of points, we can use the `cdist()` function from the `scipy.spatial.distance` module. `cdist()` computes the distance between every pair of points drawn from two collections, and passing `"mahalanobis"` as the metric selects the Mahalanobis distance. If no inverse covariance matrix is supplied through the `VI` argument, `cdist()` estimates it from the combined input points. See the following code example.
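For reference, the distance that `cdist()` computes for a pair of points $\mathbf{u}$ and $\mathbf{v}$, given a covariance matrix $V$, follows the definition in the SciPy documentation. Replacing $\mathbf{v}$ with the mean $\boldsymbol{\mu}$ of a distribution gives the point-to-distribution form mentioned above.

$$
d(\mathbf{u}, \mathbf{v}) = \sqrt{(\mathbf{u} - \mathbf{v})^{\top} V^{-1} (\mathbf{u} - \mathbf{v})}
$$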

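```python
import numpy as np
from scipy.spatial.distance import cdist

# Two arrays of shape (2, 3, 3)
x = np.array([[[1, 2, 3], [3, 4, 5], [5, 6, 7]], [[5, 6, 7], [7, 8, 9], [9, 0, 1]]])
y = np.array([[[8, 7, 6], [6, 5, 4], [4, 3, 2]], [[4, 3, 2], [2, 1, 0], [0, 1, 2]]])

i, j, k = x.shape

# Flatten to (2, 9) and transpose to (9, 2): 9 points with 2 coordinates each
xx = x.reshape(i, j * k).T
yy = y.reshape(i, j * k).T

# Distances between every row of xx and every row of yy -> (9, 9) matrix;
# without a VI argument, cdist estimates the inverse covariance from both inputs
results = cdist(xx, yy, "mahalanobis")

# Keep only the matched pairs (row n of xx with row n of yy)
results = np.diag(results)
print(results)
```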

Output:

```
[3.63263583 2.59094773 1.97370848 1.97370848 2.177978   3.04256456
 3.04256456 1.54080605 2.58298363]
```

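In the code above, we calculated the Mahalanobis distance between the arrays `x` and `y` with the `cdist()` function. We first created both arrays with the `np.array()` function; each has shape `(2, 3, 3)`. We then reshaped each array to `(2, 9)` and saved its transpose in `xx` and `yy`, so each new array holds nine points with two coordinates. Next, we passed these arrays to `cdist()` with `"mahalanobis"` as the metric: `cdist(xx, yy, "mahalanobis")`. Because `cdist()` returns the distance between every row of `xx` and every row of `yy` (a 9 × 9 matrix), we used `np.diag()` to keep only the distances between matched rows, `xx[n]` and `yy[n]`.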
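To see what `cdist()` does by default, the sketch below builds the inverse covariance matrix by hand as the inverse of the covariance of the stacked inputs (the default described in the SciPy documentation), passes it through the `VI` keyword, and compares the result with `scipy.spatial.distance.mahalanobis()` applied to each matched pair of rows. The names `VI`, `explicit`, and `paired` are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy.spatial.distance import cdist, mahalanobis

# Same data as above: 9 points with 2 coordinates in each array
x = np.array([[[1, 2, 3], [3, 4, 5], [5, 6, 7]], [[5, 6, 7], [7, 8, 9], [9, 0, 1]]])
y = np.array([[[8, 7, 6], [6, 5, 4], [4, 3, 2]], [[4, 3, 2], [2, 1, 0], [0, 1, 2]]])
i, j, k = x.shape
xx = x.reshape(i, j * k).T
yy = y.reshape(i, j * k).T

# Inverse covariance of all 18 stacked points (np.cov wants variables in rows)
VI = np.linalg.inv(np.cov(np.vstack([xx, yy]).T)).T

# Passing VI explicitly should reproduce cdist's default behaviour
explicit = np.diag(cdist(xx, yy, "mahalanobis", VI=VI))

# Matched pairs only, one mahalanobis() call per row pair (no 9 x 9 matrix)
paired = np.array([mahalanobis(u, v, VI) for u, v in zip(xx, yy)])

print(np.allclose(explicit, paired))
```

The per-pair loop computes only the 9 distances we keep, whereas `cdist()` computes all 81 and `np.diag()` discards the off-diagonal ones.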

## Calculate Mahalanobis Distance With the `numpy.einsum()` Method in Python

We can also calculate the Mahalanobis distance between two arrays with the `numpy.einsum()` method, which evaluates the Einstein summation convention on its operands. This lets us compute the quadratic form for every matched pair of rows in a single call, without building the full pairwise distance matrix.
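The subscript string used below, `"nj,jk,nk->n"`, can look cryptic. This small sketch, using made-up `delta` and `VI` values chosen only for illustration, shows that it computes `delta[n] @ VI @ delta[n]` for each row `n`.

```python
import numpy as np

# Hypothetical inputs: 3 difference vectors in 2 dimensions and a symmetric matrix
delta = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
VI = np.array([[2.0, 0.5], [0.5, 1.0]])

# "nj,jk,nk->n": for each n, sum over j and k of delta[n, j] * VI[j, k] * delta[n, k]
quad_einsum = np.einsum("nj,jk,nk->n", delta, VI, delta)

# The same quadratic forms computed with an explicit loop over rows
quad_loop = np.array([row @ VI @ row for row in delta])

print(quad_einsum)  # quadratic forms 8, 1 and 18
print(np.allclose(quad_einsum, quad_loop))
```

The repeated index `n` on both `delta` operands, kept in the output, is what makes the result one value per row instead of a full matrix.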

```python
import numpy as np

x = np.array([[[1, 2, 3], [3, 4, 5], [5, 6, 7]], [[5, 6, 7], [7, 8, 9], [9, 0, 1]]])
y = np.array([[[8, 7, 6], [6, 5, 4], [4, 3, 2]], [[4, 3, 2], [2, 1, 0], [0, 1, 2]]])
i, j, k = x.shape

# Same reshaping as before: 9 points with 2 coordinates each
xx = x.reshape(i, j * k).T
yy = y.reshape(i, j * k).T

# Covariance of all 18 points (np.cov expects variables in rows, hence .T)
X = np.vstack([xx, yy])
V = np.cov(X.T)
VI = np.linalg.inv(V)

# Row-wise differences, then delta[n] @ VI @ delta[n] for every row n
delta = xx - yy
results = np.sqrt(np.einsum("nj,jk,nk->n", delta, VI, delta))
print(results)
```

Output:

```
[3.63263583 2.59094773 1.97370848 1.97370848 2.177978   3.04256456
 3.04256456 1.54080605 2.58298363]
```

We stacked `xx` and `yy` vertically with the `np.vstack()` function and stored the 18 combined points in `X`. We then passed the transpose of `X` to `np.cov()`, because `np.cov()` treats each row as a variable, and stored the resulting 2 × 2 covariance matrix in `V`. Next, we computed the inverse of `V` with the `numpy.linalg.inv()` method and stored it in `VI`. We calculated the row-wise difference between `xx` and `yy` and stored it in `delta`. Finally, `np.einsum('nj,jk,nk->n', delta, VI, delta)` evaluates `delta[n] @ VI @ delta[n]` for every row `n`, and taking `np.sqrt()` of that gives the Mahalanobis distances between `x` and `y`. The output matches the `cdist()` result because both approaches use the same inverse covariance matrix, estimated from the combined points.
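As a quick sanity check, the sketch below runs both methods on the same data and confirms they agree. It also adds a variant that is not part of the original article: instead of forming `inv(V)` explicitly, it solves the linear system `V z = delta[n]` with `np.linalg.solve()`, which is a common way to avoid an explicit matrix inverse. The names `d_cdist`, `d_einsum`, `d_solve`, and `z` are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[[1, 2, 3], [3, 4, 5], [5, 6, 7]], [[5, 6, 7], [7, 8, 9], [9, 0, 1]]])
y = np.array([[[8, 7, 6], [6, 5, 4], [4, 3, 2]], [[4, 3, 2], [2, 1, 0], [0, 1, 2]]])
i, j, k = x.shape
xx = x.reshape(i, j * k).T
yy = y.reshape(i, j * k).T

# Method 1: full (9, 9) cdist matrix, keep only matched pairs on the diagonal
d_cdist = np.diag(cdist(xx, yy, "mahalanobis"))

# Method 2: einsum with the inverse covariance of the stacked points
V = np.cov(np.vstack([xx, yy]).T)
delta = xx - yy
d_einsum = np.sqrt(np.einsum("nj,jk,nk->n", delta, np.linalg.inv(V), delta))

# Variant (assumption, not from the article): each column of the solution is V^-1 delta[n]
z = np.linalg.solve(V, delta.T).T
d_solve = np.sqrt(np.einsum("nj,nj->n", delta, z))

print(np.allclose(d_cdist, d_einsum), np.allclose(d_einsum, d_solve))
```

For nine 2-D points the difference is negligible, but the einsum and solve versions only ever compute the matched pairs, while the `cdist()` version computes every pair and discards most of them.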
