Regex Wildcards Using the Re Module in Python

Jay Shaw Oct 10, 2023
  1. Use the re.sub() Function for Regex Operations Using Wildcards in Python
  2. Replace Matches in Regular Expression Using re.sub() Module in Python
  3. Understand How to Use Wildcards With re.sub() Submodule
  4. Use Two or More Regex Wildcards Together in Python
  5. Perform Operations on Strings Using the Regex Pattern and re.sub() Function by Adding a Wildcard in Python
  6. Conclusion

A wildcard is a symbol used in regular expressions to stand in for one or more characters. Wildcards are mostly used to simplify search criteria.

This article explains in detail how to use re.sub() with a wildcard in Python to match character strings with regex.

Use the re.sub() Function for Regex Operations Using Wildcards in Python

The re module in Python provides operations on regular expressions (regex). A regular expression is a sequence of characters that defines a search pattern for a string or group of strings.

Comparing a text against a pattern determines whether the pattern is present or absent.

A pattern can also be divided into one or more sub-patterns.

The module's main purpose is to search for a pattern inside a string.
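
As a minimal sketch of these ideas, the snippet below checks whether a pattern is present with re.search() and uses parentheses to divide a pattern into two sub-patterns (groups). The sample text and the variable names are assumptions chosen for illustration.

```python
import re

text = "ID - 54321, Pay - 586.32"  # illustrative sample text

# re.search() returns a match object if the pattern occurs anywhere, else None
has_digits = re.search(r"[0-9]+", text) is not None
has_hash = re.search(r"#", text) is not None
print(has_digits, has_hash)  # True False

# Parentheses divide the pattern into two sub-patterns (groups)
match = re.search(r"([0-9]+)\.([0-9]+)", text)
whole, fraction = match.groups()
print(whole, fraction)  # 586 32
```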

Before we understand how to use re.sub() with a wildcard in Python, let’s learn the implementation of the re.sub() function on normal string statements.

Replace Matches in Regular Expression Using re.sub() Module in Python

The re.sub() function replaces one or more matches in the given text with a string.

re.sub(pattern, repl, string, count=0, flags=0)

It returns the string created by substituting the replacement repl for the pattern’s leftmost non-overlapping occurrences in the string.

In the absence of a match, the string is returned in its original form. If repl is a string, any backslash escapes are processed. The repl can be a function as well.

Let’s understand the code example below.

import re

rex = "[0-9]+"
string_reg = "ID - 54321, Pay - 586.32"
repl = "NN"

print("Original string")
print(string_reg)

result = re.sub(rex, repl, string_reg)

print("After replacement")
print(result)

What the code does:

  1. The first line of code imports the re module.
  2. The pattern to search for is stored inside the variable rex. The pattern [0-9]+ matches one or more consecutive digits in the range 0-9.
  3. The string on which the substitution will be performed is stored inside the variable string_reg.
  4. The string to replace the pattern is stored inside the variable repl.
  5. The re.sub() operation looks up the pattern rex inside the string variable string_reg and replaces it with repl. The returned string is stored inside the variable result.
result = re.sub(rex, repl, string_reg)

Output: Each run of consecutive digits is replaced with 'NN', while all other characters are left untouched.

Original string
ID - 54321, Pay - 586.32
After replacement
ID - NN, Pay - NN.NN
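
The count argument and the callable form of repl from the signature above can be sketched as follows. The helper name mask_digits is hypothetical and chosen only for this illustration.

```python
import re

string_reg = "ID - 54321, Pay - 586.32"

# count=1 limits the substitution to the leftmost match only
first_only = re.sub(r"[0-9]+", "NN", string_reg, count=1)

def mask_digits(match):
    # Hypothetical helper: swap each matched digit run for '#' of the same length
    return "#" * len(match.group(0))

# When repl is a function, it is called once per match with the match object
masked = re.sub(r"[0-9]+", mask_digits, string_reg)

print(first_only)  # ID - NN, Pay - 586.32
print(masked)      # ID - #####, Pay - ###.##
```

A callable repl is useful when the replacement depends on what was matched, which a fixed replacement string cannot express.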

Understand How to Use Wildcards With re.sub() Submodule

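This article mainly focuses on four symbols, loosely called wildcards here: . (Dot), *, ?, and +. Strictly speaking, the dot is the wildcard, and the other three are quantifiers that repeat the element before them. Learning what each of them does is important in learning how to use re.sub() with a wildcard in Python.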

  1. . (The Dot) - Use re.sub() with the . wildcard in Python to match any single character except a newline. The re module is imported in the program below, and a string containing three sample words is stored inside the variable string_reg.

    The string_reg variable is then overwritten with the result returned from the re.sub() function. Because the dot matches exactly one character, the pattern ad. matches 'ad' followed by any one character.

    In the output, it can be seen that every time the program finds a match for ad. (here, each 'add'), it replaces it with 'REMOVED '.

    import re
    
    string_reg = "a23kaddhh234 ... add2asdf675 ... xxxadd2axxx"
    
    string_reg = re.sub(r"ad.", "REMOVED ", string_reg)
    print(string_reg)
    

    Output:

    a23kREMOVED hh234 ... REMOVED 2asdf675 ... xxxREMOVED 2axxx
    
  2. The asterisk (*) - Use re.sub() with this quantifier in Python to match 0 or more repetitions of the preceding RE, taking as many repetitions as possible.

    For example, ad* matches 'a', 'ad', or 'a' followed by any number of d.

    It can be seen in the output here that every 'a', together with any d's that directly follow it, is replaced with the keyword 'PATCH'.

    import re
    
    string_reg = "a23kaddhh234 ... add2asdf675 ... xxxadd2axxx"
    
    string_reg = re.sub(r"ad*", "PATCH", string_reg)
    print(string_reg)
    

    Output:

    PATCH23kPATCHhh234 ... PATCH2PATCHsdf675 ... xxxPATCH2PATCHxxx
    
  3. The + - Use re.sub() with this quantifier in Python to match 1 or more repetitions of the preceding RE. The pattern ad+ will not match a bare 'a'; instead, it matches 'a' followed by one or more d.

    The function searches for 'a' followed by a run of one or more 'd' characters and replaces each such match with 'POP'.

    import re
    
    string_reg = "a23kaddhh234 ... add2asdf675 ... xxxadd2axxx"
    
    string_reg = re.sub(r"ad+", "POP", string_reg)
    print(string_reg)
    

    Output:

    a23kPOPhh234 ... POP2asdf675 ... xxxPOP2axxx
    
  4. The ? - makes the preceding RE match 0 or 1 times. The pattern ad? matches either 'a' or 'ad'.

    The program finds the instances of 'a' or 'ad' and replaces them with the replacement string 'POP'.

    import re
    
    string_reg = "a23kaddhh234 ... add2asdf675 ... xxxadd2axxx"
    
    string_reg = re.sub(r"ad?", "POP", string_reg)
    print(string_reg)
    

    Output:

    POP23kPOPdhh234 ... POPd2POPsdf675 ... xxxPOPd2POPxxx
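
To compare the four patterns side by side, the sketch below uses re.findall(), another function from the re module, to list what each pattern actually matches. The short sample string is an assumption made for illustration. The last lines also show that the dot skips a newline unless the re.DOTALL flag is passed.

```python
import re

sample = "a ad add"  # illustrative sample string

# Collect what each pattern matches, scanning left to right
results = {p: re.findall(p, sample) for p in (r"ad.", r"ad*", r"ad+", r"ad?")}
for pattern, matches in results.items():
    print(pattern, matches)
# ad. ['ad ', 'add']
# ad* ['a', 'ad', 'add']
# ad+ ['ad', 'add']
# ad? ['a', 'ad', 'ad']

# The dot does not match a newline unless re.DOTALL is passed
no_newline = re.findall(r"ad.", "ad\n")
with_dotall = re.findall(r"ad.", "ad\n", flags=re.DOTALL)
print(len(no_newline), len(with_dotall))  # 0 1
```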
    

Use Two or More Regex Wildcards Together in Python

Sometimes using re.sub() with a wildcard in Python with just a single quantifier is not enough to get the desired result. Combining quantifiers makes it possible to express more complex patterns.

Let’s understand some of them.

  1. The *?, +?, ?? - In the previous examples, we have learned about the '*', '+', and '?' quantifiers. All of them are greedy, implying that they match as much text as possible.

    For example, if the RE <.*> is matched against <a> b <c>, it will match the full string rather than just <a>, which is often not the desired behavior.

    Adding ? after the quantifier solves the issue. It instructs the quantifier to match in a minimal or non-greedy manner, implying that the fewest characters get matched.

    Only <a> will match when the RE <.*?> is used.

    import re
    
    string_reg = "as56ad5 ... dhgasd55df ... xxxadd2axxx"
    
    string_reg = re.sub(r"ad*?", "SUGAR", string_reg)
    print(string_reg)
    

    Output: The pattern ad*? matches just 'a', because the lazy *? takes as few d's as possible, which is zero.

    SUGARs56SUGARd5 ... dhgSUGARsd55df ... xxxSUGARdd2SUGARxxx
    

    For ad+?: It matches just 'ad', because the lazy +? takes exactly one d.

    as56SUGAR5 ... dhgasd55df ... xxxSUGARd2axxx
    

    For ad??: It also matches just 'a'.

    SUGARs56SUGARd5 ... dhgSUGARsd55df ... xxxSUGARdd2SUGARxxx
    
  2. The *+, ++, ?+ (also known as possessive quantifiers) - Like the '*', '+', and '?' quantifiers, those followed by '+' match as many times as possible.

    Unlike the greedy quantifiers, however, they do not allow backtracking when the expression after them fails to match.

    For instance, a*a will match "aaaa": the a* first matches all four a's, which leaves nothing for the final a in the pattern, so the expression backtracks. The a* then matches only three a's, and the final a matches the fourth.

    But when the expression a*+a is used to match "aaaa", the a*+ will match all four a's and cannot backtrack, so the final a finds no more characters to match and the overall match fails.

    The equivalents of x*+, x++, and x?+ are the atomic groups (?>x*), (?>x+), and (?>x?) respectively. Let’s look at the program to understand the concept better.

    import regex
    
    string_reg = "as56ad5 ... dhgasd55df ... xxxadd2axxx"
    
    string_reg = regex.sub(r"ad*+", "SUGAR", string_reg)
    print(string_reg)
    

    Note: Before Python 3.11, the re module does not support possessive quantifiers. On older versions, use the third-party regex module instead, as done here.

    Output: The pattern ad*+ matches 'a' followed by any number of d, which gives the same result as the greedy ad* here.

    SUGARs56SUGAR5 ... dhgSUGARsd55df ... xxxSUGAR2SUGARxxx
    

    For ad++: Matches 'a' followed by one or more d.

    as56SUGAR5 ... dhgasd55df ... xxxSUGAR2axxx
    

    For ad?+: Matches 'a' or 'ad', which gives the same result as the greedy ad? here.

    SUGARs56SUGAR5 ... dhgSUGARsd55df ... xxxSUGARd2SUGARxxx
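
The difference between greedy, lazy, and possessive matching can be checked with a short sketch that reuses the <a> b <c> and "aaaa" examples from the text. It assumes the standard re module; the possessive and atomic forms are only compiled when the interpreter is Python 3.11 or later, which is the version where re gained support for them.

```python
import re
import sys

tags = "<a> b <c>"

greedy = re.findall(r"<.*>", tags)   # one match spanning the whole string
lazy = re.findall(r"<.*?>", tags)    # stops at the first '>' each time
print(greedy)  # ['<a> b <c>']
print(lazy)    # ['<a>', '<c>']

# Greedy a* gives back one 'a' so that the trailing 'a' can match
backtracking = re.fullmatch(r"a*a", "aaaa") is not None
print(backtracking)  # True

# Possessive quantifiers and atomic groups need Python 3.11+ in re
if sys.version_info >= (3, 11):
    possessive = re.fullmatch(r"a*+a", "aaaa") is not None
    atomic = re.fullmatch(r"(?>a*)a", "aaaa") is not None
    print(possessive, atomic)  # False False
```

The version check keeps the sketch runnable on older interpreters, where the possessive pattern would raise an error at compile time.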
    

Perform Operations on Strings Using the Regex Pattern and re.sub() Function by Adding a Wildcard in Python

We have learned how to use re.sub() with a wildcard in Python. Now we will use the concepts together to search for a pattern in a string and replace the whole word instead of just the matched characters.

The problem statement presents us with a string and a pattern. The pattern needs to be searched inside the given string.

Once found, the re.sub() function will replace the whole word.

Example: Replace the Whole Word When the Pattern Is Found in the Beginning

  1. Import the re module.

  2. Create a variable string_reg and store any string value. Here, a multi-line string is stored, so the re.sub() function is applied to every word inside the string.

    string_reg = """\
        23khadddddh234 > REMOVED23khh234
        add2asdf675 > REMOVED2asdf675"""
    
  3. The function needs to find a pattern inside the string and replace everything from the match to the end of the word. The text to find is 'add', so it is combined with quantifiers to achieve the desired result.

    The combination should match 'add' followed by the remaining characters of the word, however many there are, and stop at the space that ends the word.

    One way to do it is to use add.+? followed by a space. The lazy .+? consumes at least one character and then as few more as possible until the first space.

    string_reg = re.sub(r"add.+? ", "REMOVED ", string_reg)
    

Code:

import re

string_reg = """\
23khadddddh234 > REMOVED23khh234
add2asdf675 > REMOVED2asdf675"""

string_reg = re.sub(r"add.+? ", "REMOVED ", string_reg)
print(string_reg)

Output: The program searches for 'add' followed by further characters up to the next space and replaces that stretch with 'REMOVED '. If 'add' is at the beginning of a word, the whole word is replaced; otherwise, the characters before 'add' are kept.

23khREMOVED > REMOVED23khh234
REMOVED > REMOVED2asdf675
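
If the goal is to replace the entire word wherever 'add' appears in it, and not only the part from 'add' onward, one alternative is to let \w* absorb the word characters on both sides. This is a sketch of a different pattern from the one used above, applied to the same two sample lines; the variable name whole_word is an assumption for illustration.

```python
import re

string_reg = (
    "23khadddddh234 > REMOVED23khh234\n"
    "add2asdf675 > REMOVED2asdf675"
)

# \w* on both sides extends the match to the whole word that contains 'add'
whole_word = re.sub(r"\w*add\w*", "REMOVED", string_reg)
print(whole_word)
# REMOVED > REMOVED23khh234
# REMOVED > REMOVED2asdf675
```

Because \w* stops at non-word characters, this version does not depend on a trailing space, so it also works for a word at the very end of the string.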

Conclusion

This article described how to use re.sub() with a wildcard in Python. The first section focuses on using the Python function re.sub() with a simple regex.

Then the concept of using wildcards and quantifiers with re.sub() is explained in detail.

After going through the article, the reader should be able to use re.sub() with a wildcard in Python and write programs that search for patterns in strings.
