Cosine Similarity in Python

Shivam Arora Nov-26, 2021 Jun-13, 2021 Python Python Math
  1. Use the scipy Module to Calculate the Cosine Similarity Between Two Lists in Python
  2. Use the NumPy Module to Calculate the Cosine Similarity Between Two Lists in Python
  3. Use the sklearn Module to Calculate the Cosine Similarity Between Two Lists in Python
  4. Use the torch Module to Calculate the Cosine Similarity Between Two Lists in Python

Cosine similarity measures how similar two vectors are by calculating the cosine of the angle between them. The cosine is 1 at 0 degrees and -1 at 180 degrees, so the similarity reaches its maximum (1) for two vectors that point in the same direction and its minimum (-1) for two vectors that point in exactly opposite directions.

In this article, we will calculate the cosine similarity between two lists of equal sizes.
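Before turning to the libraries, the definition can be sketched in plain Python: the dot product of the two lists divided by the product of their lengths (norms). The function name `cosine_similarity` and the error handling here are illustrative assumptions, not part of any library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length sequences of numbers."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

same = cosine_similarity([1, 2, 3], [2, 4, 6])          # same direction
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])   # opposite direction
orthogonal = cosine_similarity([1, 0], [0, 1])          # perpendicular
print(same, opposite, orthogonal)
```

The three calls should give values close to 1, -1, and 0 respectively, matching the behavior of the cosine function described above.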

Use the scipy Module to Calculate the Cosine Similarity Between Two Lists in Python

The spatial.distance.cosine() function from the scipy module calculates the cosine distance rather than the cosine similarity. To get the similarity, we subtract the distance from 1.

For example,

from scipy import spatial
List1 = [4, 47, 8, 3]
List2 = [3, 52, 12, 16]
result = 1 - spatial.distance.cosine(List1, List2)
print(result)

Output:

0.9720951480078084

Use the NumPy Module to Calculate the Cosine Similarity Between Two Lists in Python

The numpy.dot() function calculates the dot product of the two vectors passed as parameters. The numpy.linalg.norm() function returns the vector norm.

We can combine these functions to calculate the cosine similarity: divide the dot product of the two vectors by the product of their norms.

For example,

from numpy import dot
from numpy.linalg import norm
List1 = [4, 47, 8, 3]
List2 = [3, 52, 12, 16]
result = dot(List1, List2)/(norm(List1)*norm(List2))
print(result)

Output:

0.9720951480078084

To calculate the cosine similarity between a query vector and each vector in a list of vectors, we can use the following code.

import numpy as np

List1 = np.array([[4, 45, 8, 4],
                  [2, 23, 6, 4]])
List2 = np.array([2, 54, 13, 15])

similarity_scores = List1.dot(List2) / (np.linalg.norm(List1, axis=1) * np.linalg.norm(List2))

print(similarity_scores)

Output:

[0.98143311 0.99398975]
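The formula above divides by the norms, so it breaks down if any vector is all zeros. The following is a minimal sketch of one way to guard against that; the helper name `cosine_similarities` and the choice to return `nan` for zero vectors are assumptions made for illustration.

```python
import numpy as np

def cosine_similarities(matrix, query):
    """Cosine similarity between each row of `matrix` and the 1-D `query`."""
    matrix = np.asarray(matrix, dtype=float)
    query = np.asarray(query, dtype=float)
    # One norm per row, multiplied by the norm of the query vector.
    denom = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    # Start with nan everywhere; only rows with a non-zero denominator are filled.
    scores = np.full(matrix.shape[0], np.nan)
    np.divide(matrix @ query, denom, out=scores, where=denom > 0)
    return scores

vectors = [[4, 45, 8, 4],
           [2, 23, 6, 4],
           [0, 0, 0, 0]]   # zero vector: similarity is undefined
query = [2, 54, 13, 15]

scores = cosine_similarities(vectors, query)
print(scores)
```

The first two scores should match the output shown above, while the zero row yields `nan` instead of triggering a division-by-zero warning.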

Use the sklearn Module to Calculate the Cosine Similarity Between Two Lists in Python

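The sklearn module has a built-in function called cosine_similarity() to calculate the cosine similarity. It expects two-dimensional arrays in which each row is one vector, so a single vector must first be reshaped into a single row.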

See the code below.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

A = np.array([10, 3])
B = np.array([8, 7])
# cosine_similarity() expects 2-D inputs, so each vector becomes a single row
result = cosine_similarity(A.reshape(1, -1), B.reshape(1, -1))
print(result)

Output:

[[0.91005765]]
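Because cosine_similarity() works on rows, it can also compare several vectors at once. As a small sketch, passing a single two-dimensional array returns the matrix of similarities between every pair of rows; the third vector below is an added illustration, not part of the example above.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

vectors = np.array([[10, 3],
                    [8, 7],
                    [-10, -3]])  # points exactly opposite to the first row

# With one argument, every row is compared against every other row.
pairwise = cosine_similarity(vectors)
print(pairwise.shape)
print(pairwise)
```

Entry `pairwise[i, j]` holds the similarity between rows `i` and `j`, so the diagonal is 1 and the matrix is symmetric.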

Use the torch Module to Calculate the Cosine Similarity Between Two Lists in Python

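When we deal with tensors, including batched tensors of shape (m, n), we can use the cosine_similarity() function from the torch.nn.functional module to find the cosine similarity. Its dim argument selects the dimension along which the similarity is computed.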

For example,

import torch
import torch.nn.functional as F
t1 = [3,45,6,8]
a = torch.FloatTensor(t1)

t2 = [4,54,3,7]
b = torch.FloatTensor(t2)
result = F.cosine_similarity(a, b, dim=0)

print(result)

Output:

tensor(0.9960)

The lists are converted into tensors using the torch.FloatTensor() constructor. Because these tensors are one-dimensional, the similarity is computed along dim=0.
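For two-dimensional tensors of shape (m, n), the same function can compare the tensors row by row. The sketch below assumes each row is one vector and uses dim=1; the second pair of rows is added only to show a perpendicular case.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[3.0, 45.0, 6.0, 8.0],
                  [1.0, 0.0, 0.0, 0.0]])
b = torch.tensor([[4.0, 54.0, 3.0, 7.0],
                  [0.0, 1.0, 0.0, 0.0]])

# dim=1 compares a[i] with b[i] for every row i, giving one score per row.
row_scores = F.cosine_similarity(a, b, dim=1)
print(row_scores)
```

The result has one entry per row: roughly 0.9960 for the first pair, as in the example above, and 0 for the perpendicular pair.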
