How to Calculate Edit Distance in Python

Zeeshan Afridi Feb 02, 2024
How to Calculate Edit Distance in Python

Today, we will learn about the edit distance in Python. We will also explore the character string’s insertion, deletion, substitution, and recursive implementation.

Edit Distance in Python

Edit distance is the amount required for the transposition of a String to another String. This transposition is done by recursion, substitution, insertion and deletion of the characters of a string.

The number of character deletions, insertions, and replacements required to turn one word into another depends on how closely the two words were edited.

It is also addressed as the Levenshtein distance in Python because it does not need that two strings to be of identical length to be compared; the Levenshtein distance has a significant influence.

The Levenshtein distance is straightforward to comprehend intuitively. The total number of single-character modifications determines the distance between the two.

The output distance indicates how many modifications were required to make the two words identical. The smaller the output distance, the fewer changes will be required.

Let’s understand the edit distance using the following example code for insertion, deletion, substitution and recursion implementation of the character string.

Insertion of the Character String

In the following code, t is inserted between i and n in the word machine.

Example Code:

w = "Machine"
w = w[:5] + "t" + w[5:]
print(w)

Output:

Machitne

Deletion of the Character String

In the following code, c is deleted between a and h in the word machine.

Example Code:

w = "Machine"
w = w[:2] + w[3:]
print(w)

Output:

Mahine

Substitution of the Character String

In the following code, c is replaced by a between a and h in the word machine.

Example Code:

w = "Machine"
w = w[:2] + "a" + w[3:]
print(w)

Output:

Maahine

Recursive Implementation of the Character String

The edit distance between machine and learning is five, as shown in the output below.

We can take any string input from the user, and the Levenshtein distance is implemented recursively in the following Python method.

This recursive technique is inefficient since the Levenshtein distances of the identical substrings are computed repeatedly.

Example Code:

w1 = input("Type here for word 1: ")
w2 = input("Type here for word 2: ")

len1 = len(w1)
len2 = len(w2)

a = [[0] * (len2 + 1) for _ in range(len1 + 1)]

for m in range(0, len1 + 1):
    a[m][0] = m

for n in range(0, len2 + 1):
    a[0][n] = n

for m in range(1, len1 + 1):
    for n in range(1, len2 + 1):
        if w1[m - 1] == w2[n - 1]:
            a[m][n] = a[m - 1][n - 1]
        else:
            a[m][n] = min(a[m][n - 1], a[m - 1][n], a[m - 1][n - 1]) + 1
print(a[m][n])

Output:

Type here for word 1: machine
Type here for word 2: learning
5
Zeeshan Afridi avatar Zeeshan Afridi avatar

Zeeshan is a detail oriented software engineer that helps companies and individuals make their lives and easier with software solutions.

LinkedIn