- Extract Substring Using String Slicing in Python
- Extract Substring Using the slice() Constructor in Python
- Extract Substring Using Regular Expression in Python
A string is a sequence of characters. We deal with strings all the time, whether we are doing software development or competitive programming. Sometimes, while writing programs, we have to access sub-parts of a string. These sub-parts are more commonly known as substrings. A substring is a contiguous sequence of characters within a string.
In Python, we can easily do this using string slicing or regular expressions (regex).
Extract Substring Using String Slicing in Python
There are a few ways to do string slicing in Python. Indexing is the most basic and the most commonly used method. Refer to the following code.
myString = "Mississippi"
print(myString[:])        # Line 1
print(myString[4 : ])     # Line 2
print(myString[ : 8])     # Line 3
print(myString[2 : 7])    # Line 4
print(myString[4 : -1])   # Line 5
print(myString[-6 : -1])  # Line 6
Output:
Mississippi
issippi
Mississi
ssiss
issipp
ssipp
In the above code, we add [] brackets at the end of the variable storing the string. We use this notation for indexing. Inside these brackets, we add some integer values that represent indexes.
This is the format for the brackets: [start : stop : step], with the values separated by colons (:).
By default, the value of start is 0 or the first index, the value of stop is the length of the string (so the slice runs through the last index), and the value of step is 1.
start represents the starting index of the substring, stop represents the index at which the substring ends (this index itself is not included), and step represents the value to use for incrementing after each index.
The substring returned actually runs from the start index to the stop - 1 index, because the stop index is exclusive and indexing starts from 0 in Python. So, if we wish to retrieve Miss from Mississippi, we should use [0 : 4].
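The examples so far never set step, so here is a minimal sketch of how the exclusive stop and the step value behave. It reuses the myString variable from the code above; the other variable names are only illustrative.

```python
myString = "Mississippi"

# stop is exclusive: indexes 0, 1, 2 and 3 are returned, index 4 is not.
first_four = myString[0:4]
print(first_four)  # Miss

# A step of 2 takes every second character, starting from index 0.
every_second = myString[::2]
print(every_second)  # Msispi

# A negative step walks backwards; with start and stop omitted,
# the whole string is returned in reverse order.
reversed_string = myString[::-1]
print(reversed_string)  # ippississiM
```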
The brackets can't be empty. If you wish to use a default value, leave that position blank but keep the colon (:) that marks it, so that it is clear which parameter is omitted. The spaces around the colons are optional. Refer to the following list for better understanding.
[:] -> Returns the whole string considering the default values.
[4 : ] -> Returns a substring starting from index 4 till the last index.
[ : 8] -> Returns a substring starting from index 0 till index 7.
[2 : 7] -> Returns a substring starting from index 2 till index 6.
[4 : -1] -> Returns a substring starting from index 4 till the second last index. -1 can be used to refer to the last index in Python. Indexing from the end starts from -1.
[-6 : -1] -> Returns a substring starting from the sixth index from the end till the second last index.
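As a quick check of the negative-index rules in the list above, the following sketch compares a negative slice with its positive equivalent. The names length, negative_slice, positive_slice and through_end are introduced only for illustration.

```python
myString = "Mississippi"
length = len(myString)  # 11 characters

# A negative index i refers to the same position as length + i.
negative_slice = myString[-6:-1]
positive_slice = myString[length - 6:length - 1]
print(negative_slice)  # ssipp
print(positive_slice)  # ssipp

# Because stop is exclusive, -1 as stop leaves out the last character.
# Omit stop to slice through the end of the string.
through_end = myString[-6:]
print(through_end)  # ssippi
```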
Extract Substring Using the slice() Constructor in Python
Instead of mentioning the indexes inside the brackets, we can use the slice() constructor to create a slice object to slice a string or any other sequence such as a list or tuple.
The slice(start, stop, step) constructor accepts three parameters, namely, start, stop, and step. They mean exactly the same as explained above.
Using a slice object is a bit different from the bracket notation. The slice object itself is put inside the brackets after the string variable, like this: myString[sliceObject].
If a single integer value, say x, is provided to the slice() constructor and is further used for index slicing, a substring starting from index 0 till index x - 1 will be retrieved. Refer to the following code.
myString = "Mississippi"
slice1 = slice(3)
slice2 = slice(4)
slice3 = slice(0, 8)
slice4 = slice(2, 7)
slice5 = slice(4, -1)
slice6 = slice(-6, -1)
print(myString[slice1])
print(myString[slice2])
print(myString[slice3])
print(myString[slice4])
print(myString[slice5])
print(myString[slice6])
Output:
Mis
Miss
Mississi
ssiss
issipp
ssipp
The outputs received are self-explanatory. The indexes follow the same rules as defined for brackets notation.
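Since a slice object is an ordinary value, one object can be reused across different sequences. The sketch below illustrates this with a string, a list and a tuple; the name middle and the sample data are made up for the example.

```python
# One slice object, reused on three different sequence types.
middle = slice(2, 7)

from_string = "Mississippi"[middle]
from_list = [10, 20, 30, 40, 50, 60, 70, 80][middle]
from_tuple = ("a", "b", "c", "d")[middle]

print(from_string)  # ssiss
print(from_list)    # [30, 40, 50, 60, 70]
print(from_tuple)   # ('c', 'd'), since stop may run past the end of the sequence

# The values passed to the constructor are available as attributes.
# step was not provided, so it is None.
print(middle.start, middle.stop, middle.step)  # 2 7 None
```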
Extract Substring Using Regular Expression in Python
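For regular expressions, we'll use Python's built-in package re.
import re

string = "123AAAMississippiZZZ123"
try:
    # group(1) is the text captured by the parentheses between AAA and ZZZ
    found = re.search('AAA(.+?)ZZZ', string).group(1)
    print(found)
except AttributeError:
    # search() returned None because the pattern was not found
    pass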
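In the above code, the search() function searches for the first location of the pattern provided as an argument in the passed string. It returns a Match object, or None if the pattern is not found, which is why the code catches AttributeError. Here, group(1) returns the text captured by the parentheses, so the code prints Mississippi. A Match object has many attributes which describe the result, such as the span of the substring, that is, the starting and the ending indexes of the substring.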
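To make the span and index attributes concrete, here is a small sketch that checks for None explicitly instead of catching AttributeError. It uses the same pattern and input as above; the variable names text, match, captured and same_text are chosen only for this example.

```python
import re

text = "123AAAMississippiZZZ123"
match = re.search(r"AAA(.+?)ZZZ", text)

if match is not None:
    # group(0) is the whole match, group(1) is the first captured group.
    print(match.group(0))  # AAAMississippiZZZ
    captured = match.group(1)
    print(captured)        # Mississippi

    # span(1) gives the (start, end) indexes of group 1; end is exclusive.
    print(match.span(1))   # (6, 17)

    # The same substring can be recovered with ordinary slicing.
    same_text = text[match.start(1):match.end(1)]
    print(same_text)       # Mississippi
else:
    print("pattern not found")
```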
print(dir(re.search('AAA(.+?)ZZZ', string))) will output all the attributes of the Match object. Note that some attributes might be missing because when dir() is used, the object's __dir__() method is called, and this method returns a list of the attributes. Since this method can be overridden, the list is not guaranteed to be complete.
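The full dir() listing includes many double-underscore names. As a rough sketch, the list can be narrowed to the public names with a simple filter; public_names is a name invented for this example, and the exact contents may vary between Python versions.

```python
import re

match = re.search(r"AAA(.+?)ZZZ", "123AAAMississippiZZZ123")

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(match) if not name.startswith("_")]
print(public_names)
```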