We will introduce the Rabin-Karp algorithm in Python and discuss how we can use it in our Python programs.
Rabin-Karp Algorithm in Python
The Rabin-Karp algorithm is a string-searching algorithm: given a text and a pattern, it finds every position at which the pattern occurs in the text. The pattern can be any sequence of characters, such as letters or digits.
Instead of comparing the pattern with every window of the text character by character, the algorithm compares hash values and checks the actual characters only when the hashes agree. The hash comparison cheaply filters out most non-matching windows, and the character check removes the false positives caused by hash collisions.
The Rabin-Karp algorithm is most useful when a text must be searched for one or many patterns, because the hash of each text window can be compared against several pattern hashes at once. A typical application is plagiarism detection.
It was designed by Richard M. Karp and Michael O. Rabin and published in 1987.
Complexity of the Rabin-Karp Algorithm in Python
The running time of the Rabin-Karp algorithm depends on the length of the text, n, and the length of the pattern, m. On average, it runs in O(n + m) time, because the rolling hash lets the hash of each window be derived from the previous one in constant time, and hash matches that require a character check are rare.
However, the Rabin-Karp algorithm can be much slower than this average case. Its worst-case complexity is O(nm), where n is the length of the text and m is the length of the pattern.
We have this complexity because, when the hash of every window matches the hash of the pattern, the algorithm must compare each window with the pattern character by character.
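To make the worst case concrete, here is a minimal sketch that counts the character comparisons spent on verifying hash matches. The function name count_verification_comparisons and the default base and modulus are assumptions chosen for illustration, not part of the tutorial code below. When the text and the pattern both consist of one repeated character, every window's hash matches, so every window is verified in full.

```python
def count_verification_comparisons(pattern, text, base=256, modulus=101):
    """Count character comparisons made while verifying hash matches."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return 0
    high = pow(base, m - 1, modulus)  # weight of a window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (base * p_hash + ord(pattern[i])) % modulus
        t_hash = (base * t_hash + ord(text[i])) % modulus
    comparisons = 0
    for start in range(n - m + 1):
        if p_hash == t_hash:
            # Hashes agree, so the window is checked character by character
            for offset in range(m):
                comparisons += 1
                if text[start + offset] != pattern[offset]:
                    break
        if start < n - m:
            # Roll the hash forward by one position
            t_hash = (base * (t_hash - ord(text[start]) * high)
                      + ord(text[start + m])) % modulus
    return comparisons


# 16 windows of length 5, each fully verified: (n - m + 1) * m comparisons
worst = count_verification_comparisons("a" * 5, "a" * 20)
```

For this input the count equals (n - m + 1) * m, which is the O(nm) behaviour described above.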
Implement the Rabin-Karp Algorithm in Python
Now, let us understand how to implement the Rabin-Karp algorithm in Python.
We will take a pattern and a text and check whether the pattern occurs anywhere in the text. Each time the pattern is found, we will print the index at which it starts.
First, we will set numOfChar, the number of characters in the input alphabet, which the hash function uses as its base. In our case, we will assign 15, as shown below.
# python
numOfChar = 15
We will define a function named searchPattern that takes three arguments. The first argument is the pattern we want to find using the Rabin-Karp algorithm.
The second argument is the text in which we will look for the pattern, and the last argument is a prime number, which is used as the modulus of the hash function.
We will store the lengths of the pattern and the text in variables so we can use them later. We will also initialize the hash values of the pattern and the text to 0 and h to 1.
We will also define the variables a and b, which serve as counters in the for loops.
# python
def searchPattern(pattern, text, primeNum):
    patLen = len(pattern)
    txtLen = len(text)
    a = 0
    b = 0
    p = 0  # hash value for pattern
    t = 0  # hash value for txt
    h = 1
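From the Rabin-Karp algorithm, we will first find the value of h using the formula pow(numOfChar, patLen-1) % primeNum, as shown below. This value is the weight of the leading character of a window, and we need it later to remove that character from the hash.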
# python
    for a in range(patLen - 1):
        h = (h * numOfChar) % primeNum
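The loop above is equivalent to Python's built-in three-argument pow, which performs modular exponentiation directly. The short check below uses numOfChar = 15 and primeNum = 101 with a pattern length of 3 as sample values; the name h_builtin is introduced here only for the comparison.

```python
numOfChar = 15
primeNum = 101
patLen = 3

# Loop form, as in the function above
h = 1
for _ in range(patLen - 1):
    h = (h * numOfChar) % primeNum

# Built-in modular exponentiation: (15 ** 2) % 101
h_builtin = pow(numOfChar, patLen - 1, primeNum)
```

Both forms give 23 for these values, since 15 * 15 = 225 and 225 % 101 = 23.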
Now, we will find the hash value of the pattern and the first window of the text, as shown below.
# python
    for a in range(patLen):
        p = (numOfChar * p + ord(pattern[a])) % primeNum
        t = (numOfChar * t + ord(text[a])) % primeNum
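As a worked example of this hash, the sketch below evaluates it for the string "ABB" with a base of 15 and a modulus of 101. The helper name window_hash is hypothetical and is used only to isolate the hashing step.

```python
def window_hash(s, base, modulus):
    """Polynomial hash of s, evaluated one character at a time."""
    value = 0
    for ch in s:
        value = (base * value + ord(ch)) % modulus
    return value


# ord("A") = 65, ord("B") = 66
# step 1: (15 * 0  + 65) % 101 = 65
# step 2: (15 * 65 + 66) % 101 = 31
# step 3: (15 * 31 + 66) % 101 = 26
pattern_hash = window_hash("ABB", 15, 101)
```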
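We will create another for loop to slide the pattern over the text one position at a time. Inside this for loop, we will compare the hash value of the current window of the text with the hash value of the pattern.
If the hash values match, we will check the characters one by one. We will then compute the hash of the next window from the current one by removing the leading character and adding the next character of the text, as shown below.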
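# python
    for a in range(txtLen - patLen + 1):
        if p == t:
            # hashes match: confirm the match character by character
            for b in range(patLen):
                if text[a + b] != pattern[b]:
                    break
            else:
                # the loop finished without a mismatch, so this is a real match
                print("Pattern found at index " + str(a))
        if a < txtLen - patLen:
            # roll the hash: drop text[a] and append text[a + patLen]
            t = (numOfChar * (t - ord(text[a]) * h) + ord(text[a + patLen])) % primeNum
            if t < 0:
                t = t + primeNum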
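The rolling update is what keeps each step cheap, so it is worth checking that it produces the same value as hashing each window from scratch. The sketch below does this for the sample text used in this article; the helper names window_hash and rolling_hashes are hypothetical.

```python
def window_hash(s, base, modulus):
    """Polynomial hash of s, computed from scratch."""
    value = 0
    for ch in s:
        value = (base * value + ord(ch)) % modulus
    return value


def rolling_hashes(text, m, base, modulus):
    """Hashes of every length-m window, each derived from the previous one."""
    high = pow(base, m - 1, modulus)  # weight of the leading character
    current = window_hash(text[:m], base, modulus)
    hashes = [current]
    for start in range(len(text) - m):
        # Drop text[start], shift the rest, append text[start + m]
        current = (base * (current - ord(text[start]) * high)
                   + ord(text[start + m])) % modulus
        hashes.append(current)
    return hashes


text = "ABBAABCDEAABBDCAABB"
rolled = rolling_hashes(text, 3, 15, 101)
direct = [window_hash(text[i:i + 3], 15, 101) for i in range(len(text) - 2)]
```

The two lists are identical, which shows that the constant-time update gives the same hash as recomputing it for each of the 17 windows.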
Now, let’s assign values to the parameters and call the function to check how it works, as shown below.
# python
text = "ABBAABCDEAABBDCAABB"
pattern = "ABB"
primeNum = 101
searchPattern(pattern, text, primeNum)
The pattern occurs in the text at indices 0, 10, and 16, so it is found at three different locations. Using the Rabin-Karp algorithm, we can find patterns in a given text at multiple locations.
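Putting the steps above together, here is a consolidated sketch that returns the matching indices as a list instead of printing them, which makes the result easier to reuse and test. The function name rabin_karp_search and the default values of base and modulus are assumptions for illustration; it also compares a window with the pattern by slicing rather than with an explicit inner loop.

```python
def rabin_karp_search(pattern, text, base=15, modulus=101):
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, modulus)  # weight of a window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (base * p_hash + ord(pattern[i])) % modulus
        t_hash = (base * t_hash + ord(text[i])) % modulus
    matches = []
    for start in range(n - m + 1):
        # Compare the characters only when the hashes agree
        if p_hash == t_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the hash forward by one position
            t_hash = (base * (t_hash - ord(text[start]) * high)
                      + ord(text[start + m])) % modulus
    return matches


result = rabin_karp_search("ABB", "ABBAABCDEAABBDCAABB")
```

For the sample text and pattern used in this article, result is [0, 10, 16].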