Longest Common Substring in Python

Longest Common Substring in Python

  1. Longest Common Substring in Python
  2. Longest Common Substring Using Loops in Python
  3. Use Recursion to Find the Longest Common Substring
  4. Use Dynamic Programming to Find the Longest Common Substring in Python
  5. Conclusion

String in Python stores a sequence of characters in it to perform various operations on them. Substrings are a part of these strings.

In this article, we need to find the longest common substring between two given strings and will discuss various solutions for the same.

Longest Common Substring in Python

A substring is a contiguous sequence of characters within a given string. It can be of any length.

The main problem is that we have been provided with two strings, and we need to find a substring that is common between the given strings and should be the longest among all possible common substrings.

Input: str1 = "HaveAAppleRegularly"
       str2 = "HaveAnAnApple"
Output: "Apple"

In the above example, we have two common substrings between the given strings, "Have" and "Apple". But since the length of the "Apple" substring is the longest among the rest of the substrings; therefore, it will be displayed as our result.

This problem can be solved using various concepts like recursion and dynamic programming.

Longest Common Substring Using Loops in Python

Loops can be used to iterate through a string. We can iterate through the string here using the loops and find our longest common substring between the two given strings.

We will use both for and while loops in this approach. The following are the steps that need to be followed to find the longest common substring in Python.

  • We will be finding all the substrings of the first string.
  • We will check the current substring of the first string is also a substring of the second string.
  • If both substring match, then we will store their lengths in a certain variable and will keep on updating this variable.
  • Lastly, the variable storing the length of the substring will contain our desired result and be printed.

Code Example:

str1 = "PokemonGo"
str2 = "StopPokemonLetsGo"
result=0
for i in range(len(str1)):
    for j in range(len(str2)):
        k=0
        while ((i + k) < len(str1) and (j + k) < len(str2) and str1[i + k] == str2[j + k]):
            k = k+1
        result = max(result, k)

print("Length of the longest common substring is:", result)

Output:

Length of the longest common substring is: 7

All the substrings of the string can be calculated in O(n^2) time, whereas checking that if the current substring matches with a substring of the second string would take O(m) time. The time complexity for the above approach would be O(n^2 * m) where n and m are lengths of the two given strings, respectively.

However, as we have not taken any extra space to achieve the solution, the space complexity for the above solution would be O(1).

Use Recursion to Find the Longest Common Substring

Recursion refers to a function calling itself. We need a base case and the choice diagram for the problem at hand in recursion.

We will follow the steps below to find the longest common substring in Python.

  • In the given problem of finding the longest common substring, the smallest possible input can be a string of length 0. Therefore, the base case would be checking if any of the lengths of the inputs are 0, then the function should return a 0.
  • Now, we will compare the last characters of both the strings, then two cases arise that either both the characters would match or they will not match with each other.
  • If the last character of each of the given strings matches, then we should recursively call the rest of the strings. Therefore, we reduce the lengths of both the strings by 1 and add one to it to count the length of the string.
  • However, if the characters do not match, a recursive call is made to the first string decrementing its length by 1 and then to the second string to decrement its length by 1.
  • The length, the maximum between the two calls, is taken for our result.

Code Example:

def LCS(s1, s2, n, m):
    if n == 0 or m == 0:
        return 0

    if s1[n-1] == s2[m-1]:
        return 1+lcs(s1, s2, n-1, m-1)

    else:
        return max(lcs(s1, s2, n, m-1),lcs(s1, s2, n-1, m))
s1 = "pokemonGo"
s2 = "watchPokemon"
n = len(s1)
m = len(s2)
res = lcs(s1,s2,n,m)
print("Length of the longest common substring is:", res)

Output:

Length of the longest common substring is: 7

The time complexity of the above solution would be O(3^(m+n)), and the space complexity would be O(m+n).

Use Dynamic Programming to Find the Longest Common Substring in Python

The basic idea of dynamic programming is to find the lengths for all the substrings of both the strings and store their respective lengths in a table. Since using recursion, there is a possibility of getting a stack overflow error because, for large inputs, the recursion stack will keep on increasing.

We introduce the concept of dynamic programming in which we form a table and will keep on storing the results for the respective substrings. The same algorithm we used in recursion is also used with some changes.

Now, we will discuss the steps for finding the longest common substring in Python.

  • First, initialize the first column and first row of the table. These cells will be initialized with a value of 0 as we have seen in the base condition in recursion.
  • We use loops instead of recursive calls for our logic.
  • Inside the loops, if the last character of both strings matches, we increase the particular cell’s length by 1.
  • Otherwise, we store the maximum length until the adjacent row and column in the particular cell.
  • Lastly, our result will be stored in the last position of the table; therefore, we return dp[m][n].

Code Example:

def LCS(X, Y, m, n):

    dp = [[0 for x in range(m + 1)] for y in range(n + 1)]

    for i in range(m + 1):
        for j in range(n + 1):
            if (X[i-1] == Y[j-1]):
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j],dp[i][j-1])
    return dp[m][n]
s1 = "playbatball"
s2 = "batballwicket"
n = len(s1)
m = len(s2)
res = lcs(s1,s2,n,m)
print("Length of the longest common substring is:", res)

Output:

Length of the longest common substring is: 7

Since there are only two loops in our solution, the time complexity for the above solution would be O(m*n). We are using extra space by forming a table to store the results.

The space complexity of the above solution would be O(m*n).

Conclusion

We have learned three different approaches for finding the longest common substring between two given strings. The first approach, a naive approach, uses three loops and finds all the substrings of the given string and keeps a check on the longest among all the substrings.

The second approach uses recursion, whereas the third approach uses dynamic programming to find the longest common substring in Python. The third approach would be the fastest among all the solutions discussed in this article to solve the problem and should be preferred.

Related Article - Python String

  • Remove Commas From String in Python
  • Check a String Is Empty in a Pythonic Way
  • Convert a String to Variable Name in Python
  • Remove Whitespace From a String in Python
  • Extract Numbers From a String in Python
  • Convert String to Datetime in Python