Pattern Matching in Bash

Olorunfemi Akinlua Jan 30, 2023
  1. Use the =~ Operator for Pattern Matching
  2. Use the * Operator for Pattern Matching
  3. Use subpatterns for Pattern Matching

Pattern matching is a powerful feature in Bash that allows you to compare strings against patterns to find matches or perform actions based on the comparison result. This can be useful in situations like checking the format of a string or extracting substrings from a larger string.

This article will discuss how to do pattern matching in Bash and cover some common operators and techniques used in pattern matching.

Use the =~ Operator for Pattern Matching

To understand how pattern matching works in Bash, let’s first look at the =~ operator, which is used inside the [[ ]] conditional construct. This operator takes two operands: the string to be matched on the left and the pattern, a POSIX extended regular expression, on the right.

For example, let’s say we have a string called my_string that contains a URL, and we want to check if it starts with "http" or "https". We can use the =~ operator to perform this comparison.

my_string="https://www.example.com"

if [[ $my_string =~ ^https?:// ]]; then
  echo "The string starts with a valid URL"
fi

Output:

The string starts with a valid URL

In the code above, we use the =~ operator to compare the my_string variable against the pattern ^https?://. The ^ character indicates that the pattern must match at the string’s start, while the ? character indicates that the preceding character (in this case, the s in https) is optional.

This means that the pattern will match either "http://" or "https://" at the start of the string.

If the match succeeds, the body of the if statement runs, and the message "The string starts with a valid URL" is printed.
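One detail worth seeing in practice is how quoting affects the right-hand side of =~. The sketch below is only an illustration: the helper name is_http_url and the sample URLs are assumptions, not part of the original example. It stores the regular expression in a variable, which is a common way to sidestep shell quoting problems, and then shows that a quoted pattern is treated as a literal string (this is the behavior in Bash 3.2 and later).

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when its argument starts with http:// or https://.
is_http_url() {
  local re='^https?://'   # single quotes keep the shell from touching the regex
  [[ $1 =~ $re ]]         # $re is left unquoted so it is interpreted as a regex
}

for candidate in "https://www.example.com" "http://example.org" "ftp://example.com"; do
  if is_http_url "$candidate"; then
    echo "$candidate: starts with http:// or https://"
  else
    echo "$candidate: no match"
  fi
done

# Quoting the pattern makes Bash match it as a literal string,
# so the ^ and ? lose their special meaning and this test fails.
if [[ "https://www.example.com" =~ "^https?://" ]]; then
  echo "quoted pattern matched"
else
  echo "quoted pattern did not match"
fi
```

Keeping the pattern in a variable also makes it easier to reuse the same expression in several tests.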

Use the * Operator for Pattern Matching

Another common operator in pattern matching is the * (asterisk) character, which in a regular expression indicates that the preceding character or group can be matched zero or more times. It belongs to a family of quantifiers that also includes + (one or more times) and {m,n} (between m and n times). For example, let’s say we have a string that contains a number, and we want to check if it is a valid decimal number with at most two decimal places.

We can use the + and {0,2} quantifiers to perform this comparison.

my_string="3.14"

if [[ $my_string =~ ^[0-9]+\.[0-9]{0,2}$ ]]; then
  echo "The string is a valid decimal number"
else
  echo "The string is not a valid decimal number"
fi

Output:

The string is a valid decimal number

In the code above, we use the =~ operator to compare the my_string variable against the pattern ^[0-9]+\.[0-9]{0,2}$. The ^ character indicates that the pattern must match at the beginning of the string, while the $ character indicates that the pattern must match at the end.

The [0-9] character class matches any digit from 0-9, and the + character indicates that the preceding character class must be matched one or more times.

We use the \ (backslash) character to escape the special meaning of the . (dot) character, which would otherwise match any single character. The {0,2} quantifier indicates that the preceding item (in this case, the [0-9] character class) must be matched zero to two times.

This means that the pattern will only match numbers with at most two decimal places, such as "3.14" or "42.00". Because the dot itself is not optional, a string without a decimal point, such as "42", does not match.
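To see where the boundaries of this pattern lie, it can help to run it against a handful of inputs. The following is a minimal sketch; the helper name check_decimal and the sample values are illustrative assumptions rather than part of the original example.

```shell
#!/usr/bin/env bash

# Hypothetical helper wrapping the decimal pattern discussed above.
check_decimal() {
  local re='^[0-9]+\.[0-9]{0,2}$'
  [[ $1 =~ $re ]]
}

# Try the pattern against a few sample strings and report each result.
for sample in "3.14" "42.00" "3." "42" "3.141" ".5"; do
  if check_decimal "$sample"; then
    echo "'$sample' matches"
  else
    echo "'$sample' does not match"
  fi
done
```

Here "3.14", "42.00", and "3." match, since {0,2} also allows zero digits after the dot, while "42", "3.141", and ".5" do not. If a trailing dot should be rejected, or the fractional part should be optional, the pattern would need to be adjusted accordingly.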

For my_string="3.14" the match succeeds, so the first branch of the if statement runs and the message "The string is a valid decimal number" is printed. For a string that does not match, the else branch prints the other message.
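Since the decimal example relies on + and {0,2}, a separate small sketch may help show the * quantifier itself. The helper name matches_ab_star_c, the pattern ^ab*c$, and the sample strings are assumptions chosen purely for illustration. The last few lines contrast this with the == operator inside [[ ]], where the right-hand side is a glob pattern and * has a different meaning.

```shell
#!/usr/bin/env bash

# Hypothetical helper: in a regex, b* means "zero or more b characters".
matches_ab_star_c() {
  local re='^ab*c$'
  [[ $1 =~ $re ]]
}

for sample in "ac" "abc" "abbbc" "adc"; do
  if matches_ab_star_c "$sample"; then
    echo "regex: '$sample' matches"
  else
    echo "regex: '$sample' does not match"
  fi
done

# With == inside [[ ]], the right-hand side is a glob, not a regex.
# There, * on its own stands for any sequence of characters.
file_name="report.txt"
if [[ $file_name == *.txt ]]; then
  echo "glob: '$file_name' ends in .txt"
fi
```

The strings "ac", "abc", and "abbbc" match the regular expression, while "adc" does not, because * only repeats the b that precedes it.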

Use subpatterns for Pattern Matching

Another common technique used in pattern matching is the use of subpatterns. A subpattern is a part of a pattern enclosed in parentheses and can be used to group characters or refer to a matched substring in the input string.

For example, let’s say we have a string that contains a date in the format "YYYY-MM-DD", and we want to extract the year, month, and day from the string. We can use subpatterns to perform this extraction.

my_string="2022-11-20"

if [[ $my_string =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ ]]; then
  year=${BASH_REMATCH[1]}
  month=${BASH_REMATCH[2]}
  day=${BASH_REMATCH[3]}

  echo "The year is: $year"
  echo "The month is: $month"
  echo "The day is: $day"
fi

Output:

The year is: 2022
The month is: 11
The day is: 20

In the code above, we use the =~ operator to compare the my_string variable against the ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ pattern. The ^ and $ characters indicate that the pattern must match the entire string, while the ([0-9]{4}), ([0-9]{2}), and ([0-9]{2}) subpatterns match the year, month, and day parts of the date, respectively.

If the match succeeds, the body of the if statement runs, and the year, month, and day are extracted from the input string. The matched substrings are stored in the BASH_REMATCH array: index 0 holds the entire match, and indexes 1, 2, and 3 correspond to the first, second, and third subpatterns, respectively.
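The same extraction can be wrapped in a reusable function. The sketch below is one possible approach, not a definitive implementation: the function name parse_date, its space-separated output format, and the simple range checks are all assumptions added for illustration. The range checks are there because the regular expression only verifies that the parts are digits, so a string like "2022-13-40" would otherwise be accepted.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints "YYYY MM DD" for a well-formed date string,
# or returns 1 if the string does not have that shape.
parse_date() {
  local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
  [[ $1 =~ $re ]] || return 1

  # BASH_REMATCH[0] is the whole match; 1..3 are the parenthesized groups.
  local year=${BASH_REMATCH[1]} month=${BASH_REMATCH[2]} day=${BASH_REMATCH[3]}

  # The regex only checks for digits, so reject impossible values here.
  # The 10# prefix forces base 10 so "08" and "09" are not read as octal.
  (( 10#$month >= 1 && 10#$month <= 12 )) || return 1
  (( 10#$day >= 1 && 10#$day <= 31 )) || return 1

  echo "$year $month $day"
}

if parts=$(parse_date "2022-11-20"); then
  read -r year month day <<< "$parts"
  echo "year=$year month=$month day=$day"
fi

parse_date "2022-13-40" || echo "2022-13-40 is rejected"
```

Note that this sketch does not check the day against the length of the month, so a string such as "2022-02-31" still passes.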

In conclusion, pattern matching is a powerful feature in Bash that allows you to compare strings against patterns to find matches or perform actions based on the result of the comparison. This can be done using the =~ operator inside [[ ]], which takes a string and a regular expression as operands and succeeds if the string matches the pattern.

Common operators and techniques used in pattern matching include the * (asterisk) operator, which matches the preceding character zero or more times, and subpatterns, which allow you to group characters or extract matched substrings from the input string.

Olorunfemi is a lover of technology and computers who writes technology and coding content for developers and hobbyists and, when not working, learns to design, among other things.