How to Work With Regex in Scala

Suraj P Feb 16, 2024
  1. Regex Class in Scala
  2. Find Matches in Text in Scala
  3. Use Regex to Replace a Text in Scala
  4. Use Regex to Extract Values in Scala
  5. Conclusion
A regular expression defines a pattern that is matched against input data. Regular expressions are widely used for pattern matching, text processing, and parsing.

In this article, we’ll learn how to work with Regex (regular expressions) in Scala.

Regex Class in Scala

Regex is a class in the Scala standard library, imported as scala.util.matching.Regex. It is built on top of the Java package java.util.regex and is used extensively for pattern matching and text parsing. Regex objects can be created in two ways.

The first method is to explicitly create the Regex class object.

val x = new Regex("you")

The second method is to call the r method on a string.

val x = "you".r
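
Patterns often contain backslashes, which must be doubled in a regular string literal. As a minimal sketch (the object and value names here are illustrative, not part of the original example), a triple-quoted string lets us write each backslash once, and both forms produce the same pattern:

```scala
import scala.util.matching.Regex

object RegexCreation {
  // Regular string literal: every backslash must be doubled.
  val escaped: Regex = new Regex("\\d{2}-\\d{3}")

  // Triple-quoted literal: the backslash is written once.
  val raw: Regex = """\d{2}-\d{3}""".r

  def main(args: Array[String]): Unit = {
    // Both objects wrap the same pattern text.
    println(escaped.regex == raw.regex)       // true
    println(raw.findFirstIn("Warsaw 01-011")) // Some(01-011)
  }
}
```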

Let’s look at different use cases for regular expressions with Regex.

Find Matches in Text in Scala

Finding matches in text is one of the most common use cases of Regex.

Example Code:

import scala.util.matching.Regex

object myClass {
    def main(args: Array[String]): Unit = {
        val x = new Regex("Tony")
        val text = "Iron man is also known as Tony Stark. Tony is an Avenger"

        // findFirstIn returns the first match wrapped in an Option
        println(x.findFirstIn(text))
    }
}

Output:

Some(Tony)

In the above code, we used the findFirstIn method to find the first match of the regular expression. The method returns an Option[String]: Some(match) if a match is found, and None otherwise.
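
Since the result is an Option, the caller has to deal with the case where nothing matches. The following is a small sketch of one way to do that with pattern matching; the object name and the describe helper are hypothetical and only serve the illustration:

```scala
import scala.util.matching.Regex

object FirstMatchDemo {
  val name: Regex = "Tony".r

  // Describe the result of findFirstIn without calling .get on the Option.
  def describe(text: String): String =
    name.findFirstIn(text) match {
      case Some(found) => s"Found: $found"
      case None        => "No match"
    }

  def main(args: Array[String]): Unit = {
    println(describe("Tony is an Avenger"))  // Found: Tony
    println(describe("Steve is an Avenger")) // No match
  }
}
```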

Example Code:

import scala.util.matching.Regex

object myClass {
    def main(args: Array[String]): Unit = {
        val reg = new Regex("([0-9]{2})\\-([0-9]{3})")
        val text = "He lives in Warsaw 01-011 and she lives in Cracow 30-059"

        // findAllIn returns an iterator over every match in the text
        println(reg.findAllIn(text).mkString(","))
    }
}

Output:

01-011,30-059

In the above example, we used the findAllIn method, which finds all the matches and returns a MatchIterator. We then used the mkString method to join the matches into a single string separated by a comma.
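
A MatchIterator, like any iterator, is meant to be traversed once. If the matches are needed more than once, one option is to copy them into a collection first. A minimal sketch, with a hypothetical allCodes helper:

```scala
import scala.util.matching.Regex

object FindAllDemo {
  val postcode: Regex = "[0-9]{2}-[0-9]{3}".r

  // Materialize the iterator so the matches can be reused.
  def allCodes(text: String): List[String] =
    postcode.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    val codes = allCodes("He lives in Warsaw 01-011 and she lives in Cracow 30-059")
    println(codes.size)          // 2
    println(codes.mkString(",")) // 01-011,30-059
  }
}
```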

We also have the findFirstMatchIn method. It works like the findFirstIn method but returns Option[Match].

Example Code:

import scala.util.matching.Regex

object myClass {
    def main(args: Array[String]): Unit = {
        val reg = new Regex("([0-9]{2})\\-([0-9]{3})")
        val text = "He lives in Warsaw 01-011 and she lives in Cracow 30-059"

        val result = reg.findFirstMatchIn(text)
        // group(2) is the second capture group of the first match
        println(result.map(_.group(2)))
    }
}

Output:

Some(011)
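
A Match carries more than the matched text: it also exposes the capture groups and the position of the match. The sketch below uses findAllMatchIn, the counterpart of findFirstMatchIn that returns an iterator of Match objects; the details helper and the sample sentence are assumptions made for this illustration:

```scala
import scala.util.matching.Regex

object MatchDetailsDemo {
  val postcode: Regex = "([0-9]{2})-([0-9]{3})".r

  // For every match, collect its start offset and the two captured groups.
  def details(text: String): List[(Int, String, String)] =
    postcode.findAllMatchIn(text)
      .map(m => (m.start, m.group(1), m.group(2)))
      .toList

  def main(args: Array[String]): Unit = {
    println(details("Warsaw 01-011, Cracow 30-059"))
    // List((7,01,011), (22,30,059))
  }
}
```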

Use Regex to Replace a Text in Scala

Replacing text is another common use case of Regex. While parsing text, we may need to replace some part of it with something else.

Example Code:

import scala.util.matching.Regex

object myClass {
    def main(args: Array[String]): Unit = {
        val reg = new Regex("([0-9]{2})\\-([0-9]{3})")
        val text = "He lives in Warsaw 01-011 and she lives in Cracow 30-059"

        // Only the first match is replaced
        println(reg.replaceFirstIn(text, "1234"))
    }
}

Output:

He lives in Warsaw 1234 and she lives in Cracow 30-059

In the above code, we’ve used the replaceFirstIn method to replace the first match found in the text with the string "1234".

Example Code:

import scala.util.matching.Regex

object myClass {
    def main(args: Array[String]): Unit = {
        val reg = new Regex("([0-9]{2})\\-([0-9]{3})")
        val text = "He lives in Warsaw 01-011 and she lives in Cracow 30-059"

        // Every match is replaced
        println(reg.replaceAllIn(text, "1234"))
    }
}

Output:

He lives in Warsaw 1234 and she lives in Cracow 1234

In the above code, we used the replaceAllIn method, which replaces all the matches found in text with "1234".
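
The replacement does not have to be a fixed string. As a sketch (the swap and mask helpers are hypothetical), the replacement string can refer to capture groups with $1, $2, and so on, and replaceAllIn also accepts a function from Match to String that computes each replacement:

```scala
import scala.util.matching.Regex

object ReplaceDemo {
  val postcode: Regex = "([0-9]{2})-([0-9]{3})".r

  // "$1" and "$2" in the replacement string refer to the captured groups.
  def swap(text: String): String =
    postcode.replaceAllIn(text, "$2-$1")

  // A function from Match to String computes each replacement individually.
  def mask(text: String): String =
    postcode.replaceAllIn(text, m => m.group(1) + "-***")

  def main(args: Array[String]): Unit = {
    println(swap("Warsaw 01-011")) // Warsaw 011-01
    println(mask("Warsaw 01-011")) // Warsaw 01-***
  }
}
```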

Use Regex to Extract Values in Scala

When we find a match with regular expressions, we can use Regex to extract values using pattern matching.

Example code:

import scala.util.matching.Regex

object myClass {
    def main(args: Array[String]): Unit = {
        // The dot before the milliseconds is escaped so it matches a literal "."
        val timestamp = "([0-9]{2}):([0-9]{2}):([0-9]{2})\\.([0-9]{3})".r
        "12:20:01.411" match {
            case timestamp(hour, minutes, _, _) => println(s"It is $minutes minutes after $hour")
            case _ => println("Not a timestamp")
        }
    }
}

Output:

It is 20 minutes after 12
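
When the input might not match, the extractor can be wrapped in a function that returns an Option instead of printing. This is only a sketch; the hourAndMinutes helper is a hypothetical name, not something provided by the library:

```scala
import scala.util.matching.Regex

object TimestampDemo {
  val timestamp: Regex = """([0-9]{2}):([0-9]{2}):([0-9]{2})\.([0-9]{3})""".r

  // Returns the hour and minutes when the whole input is a timestamp.
  def hourAndMinutes(input: String): Option[(String, String)] =
    input match {
      case timestamp(hour, minutes, _, _) => Some((hour, minutes))
      case _                              => None
    }

  def main(args: Array[String]): Unit = {
    println(hourAndMinutes("12:20:01.411")) // Some((12,20))
    println(hourAndMinutes("noon"))         // None
  }
}
```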

In Scala, a Regex used in pattern matching is anchored by default: the pattern must match the entire input, as if it were written ^pattern$. We can lift this restriction with the unanchored method, which returns an UnanchoredRegex.

With an unanchored pattern, the string may contain additional text around the part we need.

Example code:

import scala.util.matching.Regex

object myClass {
    def main(args: Array[String]): Unit = {
        val timestamp = "([0-9]{2}):([0-9]{2}):([0-9]{2})\\.([0-9]{3})".r
        // The unanchored version may match anywhere inside the input
        val temp = timestamp.unanchored
        "It is 12:20:01.411 in New York" match {
            case temp(hour, minutes, _, _) => println(s"It is $minutes minutes after $hour")
            case _ => println("No timestamp found")
        }
    }
}

Output:

It is 20 minutes after 12
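
To make the difference concrete, the sketch below runs the same pattern in both modes against the same input. The matchesWhole and matchesInside helpers are hypothetical names introduced for this comparison:

```scala
import scala.util.matching.Regex

object UnanchoredDemo {
  val timestamp: Regex = """([0-9]{2}):([0-9]{2}):([0-9]{2})\.([0-9]{3})""".r
  val anywhere = timestamp.unanchored

  // Anchored: succeeds only when the entire string is a timestamp.
  def matchesWhole(s: String): Boolean = s match {
    case timestamp(_*) => true
    case _             => false
  }

  // Unanchored: succeeds when a timestamp occurs anywhere in the string.
  def matchesInside(s: String): Boolean = s match {
    case anywhere(_*) => true
    case _            => false
  }

  def main(args: Array[String]): Unit = {
    val sentence = "It is 12:20:01.411 in New York"
    println(matchesWhole(sentence))       // false
    println(matchesInside(sentence))      // true
    println(matchesWhole("12:20:01.411")) // true
  }
}
```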

Java's regular expression syntax is largely modeled on that of the Perl programming language, and Scala takes its regular expression syntax from Java.

Let’s look at some commonly used regular expression constructs that Scala takes from Java.

Subexpression    Matches
^                The beginning of the input (of each line in multiline mode).
$                The end of the input (of each line in multiline mode).
[...]            Any single character listed in the brackets.
[^...]           Any single character not listed in the brackets.
\\w              A word character (a letter, a digit, or an underscore).
\\d              A digit.

The last two are shown with a doubled backslash, as they are written in a regular Scala string literal; the pattern itself is \w or \d.
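
As a quick illustration of these constructs, here is a short sketch; the object name, value names, and sample strings are chosen only for this example:

```scala
import scala.util.matching.Regex

object SyntaxDemo {
  val startsWithDigit: Regex = """^\d""".r  // a digit at the beginning
  val endsWithWordChar: Regex = """\w$""".r // a word character at the end
  val vowel: Regex = "[aeiou]".r            // any one of the listed characters
  val nonVowel: Regex = "[^aeiou]".r        // any character not listed

  def main(args: Array[String]): Unit = {
    println(startsWithDigit.findFirstIn("42 apples")) // Some(4)
    println(endsWithWordChar.findFirstIn("done!"))    // None
    println(vowel.findAllIn("scala").mkString)        // aa
    println(nonVowel.findAllIn("scala").mkString)     // scl
  }
}
```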

Conclusion

In this article, we have learned about the Regex class in the Scala standard library. We have also seen the methods it provides for the common use cases of regular expressions: finding matches, replacing text, and extracting values.

Author: Suraj P
A technophile and a Big Data developer by passion. Loves developing advanced C++ and Java applications in free time, and works as an SME at Chegg, helping students with their doubts and assignments in the field of Computer Science.
