How to Extract Substring From a String in Python

Vaibhav Vaibhav Feb 02, 2024
  1. Extract Substring Using String Slicing in Python
  2. Extract Substring Using the slice() Constructor in Python
  3. Extract Substring Using Regular Expression in Python

A string is a sequence of characters. We work with strings all the time, whether we are doing software development or competitive programming. Sometimes, while writing programs, we need to access parts of a string. These parts are known as substrings. A substring is a contiguous sequence of characters within a string.

In Python, we can do this easily using string slicing or regular expressions (regex).

Extract Substring Using String Slicing in Python

There are a few ways to slice a string in Python. The most basic and most commonly used is bracket notation with indexes. Refer to the following code.

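myString = "Mississippi"
print(myString[:])  # the whole string
print(myString[4:])  # from index 4 to the end
print(myString[:8])  # from the start up to, but not including, index 8
print(myString[2:7])  # from index 2 up to, but not including, index 7
print(myString[4:-1])  # from index 4 up to, but not including, the last character
print(myString[-6:-1])  # from the sixth character from the end up to, but not including, the last character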

Output:

Mississippi
issippi
Mississi
ssiss
issipp
ssipp

In the above code, we append square brackets [] to the variable that stores the string. This notation is used for indexing and slicing. Inside the brackets, we place integer values that represent indexes.

The general format inside the brackets is [start:stop:step], with the values separated by colons (:).

By default, start is 0 (the first index), stop is the length of the string (so the slice runs through the last character), and step is 1. start is the index where the substring begins, stop is the index where it ends, and step is the amount added to the index after each character is taken.

The returned substring includes the character at start but excludes the one at stop, so it covers indexes start through stop - 1. So, if we wish to retrieve Miss from Mississippi, we should use [0:4], which covers indexes 0 to 3.
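Because stop is excluded, a slice [start:stop] holds exactly stop - start characters when both indexes fall inside the string, and splitting a string at one index neither loses nor duplicates a character. A small check, assuming the same myString value:

```python
myString = "Mississippi"

first_four = myString[0:4]          # indexes 0, 1, 2, 3
print(first_four)                   # Miss
print(len(myString[2:7]) == 7 - 2)  # True: a slice holds stop - start characters

# Index 4 belongs only to the second half, so joining the halves rebuilds the string
head, tail = myString[:4], myString[4:]
print(head + tail == myString)      # True
```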

The brackets can’t be empty; myString[] is a syntax error. To use a default value, leave its position blank but keep the colons so that Python knows which parameter you mean. Spaces around the colons are optional. Refer to the following list for a better understanding.

  • [:] -> Returns the whole string.
  • [4 : ] -> Returns a substring starting from index 4 through the last index.
  • [ : 8] -> Returns a substring starting from index 0 through index 7.
  • [2 : 7] -> Returns a substring starting from index 2 through index 6.
  • [4 : -1] -> Returns a substring starting from index 4 through the second-to-last index. In Python, -1 refers to the last index, and since stop is excluded, the last character is left out.
  • [-6 : -1] -> Returns a substring starting from the sixth character from the end (index 5 in Mississippi) through the second-to-last index.
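The list above can be checked directly. This sketch prints each slice and shows that a negative index -k is just shorthand for len(myString) - k; the variable names are only for illustration.

```python
myString = "Mississippi"
n = len(myString)  # 11 characters, indexes 0 to 10

# Each entry from the list above, paired with the substring it produces
examples = {
    "[:]": myString[:],
    "[4:]": myString[4:],
    "[:8]": myString[:8],
    "[2:7]": myString[2:7],
    "[4:-1]": myString[4:-1],
    "[-6:-1]": myString[-6:-1],
}
for notation, substring in examples.items():
    print(f"{notation:>8} -> {substring}")

# A negative index -k refers to index n - k, so each pair below is identical
print(myString[-6:-1] == myString[n - 6:n - 1])  # True, both are [5:10]
print(myString[4:-1] == myString[4:n - 1])       # True, both are [4:10]

# Spaces around the colons do not change the result
print(myString[ : 8] == myString[:8])            # True
```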

Extract Substring Using the slice() Constructor in Python

Instead of writing the indexes inside the brackets, we can use the built-in slice() constructor to create a slice object and use it to slice a string or any other sequence, such as a list or tuple.

The slice(start, stop, step) constructor accepts up to three parameters, namely start, stop, and step. They mean the same as explained above.

The only difference from bracket notation is how the indexes are supplied: the slice object itself goes inside the brackets, like this: myString[<slice object>].
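A minimal sketch of that equivalence, and of the claim that one slice object works on any sequence. The name middle is only an illustrative choice.

```python
myString = "Mississippi"
middle = slice(2, 7)  # equivalent to writing [2:7]

print(myString[middle] == myString[2:7])  # True

# The same slice object can be reused on a list or a tuple
letters = list(myString)
letters_tuple = tuple(myString)
print(letters[middle])        # ['s', 's', 'i', 's', 's']
print(letters_tuple[middle])  # ('s', 's', 'i', 's', 's')
```

Naming a slice object this way can make repeated slicing easier to read, since the indexes are defined in one place.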

If a single integer value, say x, is passed to the slice() constructor, it is used as stop, so slicing with that object retrieves the substring from index 0 through index x - 1. Refer to the following code.

myString = "Mississippi"
slice1 = slice(3)
slice2 = slice(4)
slice3 = slice(0, 8)
slice4 = slice(2, 7)
slice5 = slice(4, -1)
slice6 = slice(-6, -1)
print(myString[slice1])
print(myString[slice2])
print(myString[slice3])
print(myString[slice4])
print(myString[slice5])
print(myString[slice6])

Output:

Mis
Miss
Mississi
ssiss
issipp
ssipp

A slice object follows the same index rules as bracket notation, so slice(2, 7), for example, gives the same result as [2:7].
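To see what a slice object stores, you can inspect its start, stop, and step attributes; omitted values are stored as None. The indices() method, called with the length of the sequence, resolves None and negative values into the concrete (start, stop, step) triple used for slicing, which shows why slice(-6, -1) behaves like slice(5, 10) on Mississippi.

```python
myString = "Mississippi"

single = slice(3)  # only stop is given
print(single.start, single.stop, single.step)  # None 3 None

# indices() turns defaults and negative values into concrete positions for a given length
print(single.indices(len(myString)))         # (0, 3, 1)
print(slice(-6, -1).indices(len(myString)))  # (5, 10, 1)
print(slice(4, -1).indices(len(myString)))   # (4, 10, 1)
```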

Extract Substring Using Regular Expression in Python

For regular expressions, we’ll use Python’s built-in re module.

import re

string = "123AAAMississippiZZZ123"

try:
    found = re.search("AAA(.+?)ZZZ", string).group(1)
    print(found)
except AttributeError:
    pass

Output:

Mississippi

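In the above code, re.search() scans the string for the first location where the pattern matches and returns a Match object, or None if nothing matches (calling group() on None raises the AttributeError caught above). In the pattern AAA(.+?)ZZZ, the parentheses form a capturing group and .+? lazily matches one or more characters, so group(1) returns only the text between AAA and ZZZ. A Match object also has methods such as span(), start(), and end(), which give the starting and ending indexes of the match.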
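A minimal sketch that wraps the same pattern in a reusable function. The helper name extract_between is hypothetical and is not part of the re module. It uses re.escape() so the markers are matched literally, returns None instead of relying on a caught exception, and uses span(1) to connect the regex result back to string slicing. The last call shows the effect of the lazy .+?, which stops at the first ZZZ after AAA.

```python
import re
from typing import Optional


# extract_between is a hypothetical helper written for this illustration
def extract_between(text: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between start_marker and the next end_marker, or None."""
    # re.escape() keeps characters such as "." or "*" in the markers from acting as regex syntax
    pattern = re.escape(start_marker) + "(.+?)" + re.escape(end_marker)
    match = re.search(pattern, text)
    if match is None:  # re.search() returns None when nothing matches
        return None
    return match.group(1)


string = "123AAAMississippiZZZ123"
print(extract_between(string, "AAA", "ZZZ"))  # Mississippi
print(extract_between(string, "XXX", "ZZZ"))  # None

# span(1) gives the start and end indexes of the captured group,
# which can be passed straight back into slicing
match = re.search("AAA(.+?)ZZZ", string)
start, end = match.span(1)
print(start, end, string[start:end])  # 6 17 Mississippi

# The lazy .+? stops at the first ZZZ instead of the last one
print(extract_between("AAAxZZZyZZZ", "AAA", "ZZZ"))  # x
```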

Running print(dir(re.search('AAA(.+?)ZZZ', string))) lists the attributes of the Match object. Note that dir() builds its list by calling the object’s __dir__() method, which a class can override, so the result is not guaranteed to include every attribute.
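The raw dir() output mixes in many double-underscore names. This sketch filters them out to show only the public attributes, then prints a few documented Match members; the exact list may vary between Python versions.

```python
import re

string = "123AAAMississippiZZZ123"
match = re.search("AAA(.+?)ZZZ", string)

# Keep only names that do not start with an underscore
public_attrs = [name for name in dir(match) if not name.startswith("_")]
print(public_attrs)

print(match.group(0))  # AAAMississippiZZZ (the whole match)
print(match.groups())  # ('Mississippi',) (all captured groups)
print(match.string)    # 123AAAMississippiZZZ123 (the string that was searched)
```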


Vaibhav is an artificial intelligence and cloud computing stan. He likes to build end-to-end full-stack web and mobile applications. Besides computer science and technology, he loves playing cricket and badminton, going on bike rides, and doodling.
