How to Get the N-th Occurrence of a Substring in a String in Python

Namita Chaudhary Feb 02, 2024
  1. Find the NTH Occurrence of a Substring in a String in Python
  2. Calculate the NTH Occurrence of a Substring Using the split() Method in Python
  3. Find the NTH Occurrence of a Substring Using the find() Method in Python
  4. Find the NTH Occurrence of a Substring Using a Regular Expression in Python
  5. Conclusion

Strings in Python store a sequence of characters on which we can perform various operations. A substring is a contiguous sequence of characters contained within another string.

In this article, we find the index at which a substring occurs for the nth time in a string and discuss several approaches to doing so in Python.

Find the NTH Occurrence of a Substring in a String in Python

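In this problem, we are given a string, a substring, and a value n, and we need to find the index at which the substring appears in the original string for the nth time. Suppose we are given the string s, the substring str, and the value of n below. (The name str follows the examples in this article; it shadows Python's built-in str type, so a different name is preferable in real code.)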

Example Code:

s = "batpollbatsitbat"
str = "bat"
n = 2

Output:

7

We need to return the index at which the substring "bat" appears for the second time in the original string. With 0-based indexing, that index is 7.

Calculate the NTH Occurrence of a Substring Using the split() Method in Python

Python's split() method splits a string into a list of strings at a specified separator. If no separator is given, the string is split on whitespace, but we can explicitly pass the separator at which the string should be broken.

The split() method also takes a second parameter, maxsplit, which specifies the maximum number of splits to perform. With maxsplit set to n, the string is broken at the first n occurrences of the separator only.

In the following example, we split the string at most n times, where n is the occurrence number we are looking for.
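
To make the effect of maxsplit concrete, the short sketch below splits the sample string from the introduction with and without a limit. The variable names all_parts and limited are illustrative only.

```python
s = "batpollbatsitbat"

# No limit: split at every occurrence of the separator.
all_parts = s.split("bat")
print(all_parts)   # ['', 'poll', 'sit', '']

# maxsplit=2: split at the first two occurrences only;
# everything after the second "bat" stays in the last element.
limited = s.split("bat", 2)
print(limited)     # ['', 'poll', 'sitbat']
```

Note that the last element of the limited result still contains an unsplit "bat", which is what the split-based approach relies on.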

Example Code:

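def solve(s, str, n):
    # Split at the first n occurrences of the substring only.
    sep = s.split(str, n)
    # n or fewer pieces means the substring occurs fewer than n times.
    if len(sep) <= n:
        return -1
    # The text before the last piece ends with the nth occurrence.
    return len(s) - len(sep[-1]) - len(str)


print(solve("foobarfobar akfjfoobar afskjdffoobarruythfoobar", "foobar", 2))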

Output:

16

The main logic is implemented in the solve() function above. Its first line calls split() with the substring str as the separator and n as the value of maxsplit.

After this line, the string s has been broken into a list of strings, which is stored in the sep variable. For the sample input above, the list looks like this.

["", "fobar akfj", " afskjdffoobarruythfoobar"]

The original string is separated at the indexes where we find our substring str. However, this split happens only two times because of the value of n.

The last string in sep still contains the substring, but it is not split further because the limit of n splits has been reached.

After the split() call, we check whether the length of sep is less than or equal to n. If the substring occurs fewer than n times, split() produces at most n pieces, so in that case we return -1.

Now comes the main logic, where the index of the nth occurrence is calculated. Because we split the string only n times, whatever follows the nth occurrence of the substring is stored as the last element of sep.

Therefore, we subtract the length of that last element, accessed as sep[-1], from the length of the original string s.

This gives the index just past the end of the nth occurrence, but since we need the starting index, we subtract the substring length as well.

In this way, we can calculate the index of our nth occurrence of a substring.
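
The arithmetic described above can be traced step by step on the sample input. This is only an illustrative sketch: the names sub, parts, tail, end_of_match, and start_of_match are not part of the original example, and sub is used instead of str to avoid shadowing the built-in type.

```python
s = "foobarfobar akfjfoobar afskjdffoobarruythfoobar"
sub = "foobar"
n = 2

parts = s.split(sub, n)
tail = parts[-1]                          # text left after the nth occurrence
end_of_match = len(s) - len(tail)         # index just past the nth occurrence
start_of_match = end_of_match - len(sub)  # index where the nth occurrence starts

print(end_of_match, start_of_match)       # 22 16

# Sanity check: slicing at the computed index gives back the substring.
print(s[start_of_match:end_of_match] == sub)  # True
```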

Find the NTH Occurrence of a Substring Using the find() Method in Python

The find() method returns the index of the first occurrence of the specified value, or -1 if the value is not found. We can also pass a starting and an ending index to find().

These starting and ending indexes limit the search to the specified range.
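
As a quick illustration of the optional start and end arguments, the sketch below runs find() on the same sample string used in the example that follows. The result variables are named for illustration only.

```python
s = "xyxyxyxybvxy"

first = s.find("xy")             # search the whole string
from_one = s.find("xy", 1)       # search starts at index 1
from_seven = s.find("xy", 7)     # search starts at index 7
bounded = s.find("xy", 7, 10)    # search only within s[7:10], i.e. "ybv"

print(first, from_one, from_seven, bounded)  # 0 2 10 -1
```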

Example Code:

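s = "xyxyxyxybvxy"
str = "xy"
n = 4
x = -1
for i in range(0, n):
    # Search from one position past the previous match.
    x = s.find(str, x + 1)
    if x == -1:
        # Fewer than n occurrences; stop so the search does not restart at 0.
        break
print("Nth occurrence is at", x)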

Output:

Nth occurrence is at 6

We apply find() to the string s; in each iteration, it returns the first occurrence of the substring at or after the given starting index.

In the first iteration, the original string is searched from index 0 to the end, because x is initially -1 and find() is called with x + 1 = -1 + 1 = 0 as the starting index.

This iteration gives us the first occurrence of the substring in the original string. The second iteration searches the string from index 1 to the end, because x became 0 in the previous iteration, so the starting index is x + 1 = 0 + 1 = 1.

This iteration gives us the second occurrence of the substring. After n such iterations, x holds the index of the nth occurrence. Because the search advances by only one character each time, overlapping occurrences are counted as well.
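
The same loop can be wrapped in a reusable function. The sketch below is one possible way to do it, not part of the original example: the function name find_nth and the overlapping flag are hypothetical, and returning -1 for a missing occurrence simply mirrors what find() itself does.

```python
def find_nth(s, sub, n, overlapping=True):
    """Return the index of the nth occurrence of sub in s, or -1 if there is none."""
    if n < 1 or not sub:
        return -1
    # Advance by one character to count overlapping matches,
    # or by the full substring length to skip past each match.
    step = 1 if overlapping else len(sub)
    index = s.find(sub)
    while index != -1 and n > 1:
        index = s.find(sub, index + step)
        n -= 1
    return index


print(find_nth("xyxyxyxybvxy", "xy", 4))             # 6
print(find_nth("xyxyxyxybvxy", "xy", 6))             # -1
print(find_nth("aaaa", "aa", 2))                     # 1
print(find_nth("aaaa", "aa", 2, overlapping=False))  # 2
```

The loop stops as soon as find() returns -1, so asking for more occurrences than exist yields -1 instead of restarting the search from the beginning of the string.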

Find the NTH Occurrence of a Substring Using a Regular Expression in Python

Regular expressions are sequences of characters that form a search pattern and are used to find patterns in a string. Python provides them through its built-in re module.

We will use the re module to find the nth occurrence of the substring.

Example Code:

import re

s = "yoofpofbof"
n = 3
# Start index of every non-overlapping match of "of" in s.
result = [m.start() for m in re.finditer(r"of", s)]
# Only index into the list if at least n matches were found.
if n <= len(result):
    print(result[n - 1])

Output:

8

We import the re module in the first line to use regular expressions in the above code, and then define our input.

We use the finditer() function from the re module, which returns an iterator of match objects for all non-overlapping matches in the original string. Each match object holds the starting and ending indexes of a match, but we only need the starting index to find the nth occurrence.

Therefore, we use the m.start() method, which gives us only the starting index of each matched substring.

We use a list comprehension to collect all the starting indexes in the result variable. If the user provides a value of n larger than the number of occurrences, nothing is printed, because we compare n with the length of the result list before indexing into it.

Lastly, we print the index of the nth occurrence of the substring.
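
A function-based variant of the regular expression approach is sketched below. It is an illustration under a few assumptions rather than the article's original code: the name find_nth_regex is hypothetical, re.escape() is added so that characters such as "." in the substring are matched literally, and itertools.islice is used to stop at the nth match instead of building a full list.

```python
import re
from itertools import islice


def find_nth_regex(s, sub, n):
    """Return the start index of the nth non-overlapping match of sub in s, or -1."""
    if n < 1:
        return -1
    # Escape the substring so regex metacharacters are treated literally.
    matches = re.finditer(re.escape(sub), s)
    # Skip the first n - 1 matches and take the next one, if any.
    match = next(islice(matches, n - 1, None), None)
    return match.start() if match is not None else -1


print(find_nth_regex("yoofpofbof", "of", 3))  # 8
print(find_nth_regex("yoofpofbof", "of", 4))  # -1
print(find_nth_regex("a.b.c", ".", 2))        # 3
```

Unlike the find() loop shown earlier, finditer() reports only non-overlapping matches, so the two approaches can give different answers for inputs such as "aaaa" and "aa".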

Conclusion

In this tutorial, we discussed three methods for finding the nth occurrence of a substring in a string: the split() method, the find() method, and regular expressions. The split() and regular expression approaches count non-overlapping occurrences, while the find() loop shown here also counts overlapping ones.
