Regex Whitespace in Java

Rupam Yadav Dec 22, 2023
  1. Use the matches() Method to Find Whitespace Using Regular Expressions in Java
  2. Use the Pattern and Matcher Classes to Find Whitespace Using Regular Expressions in Java
  3. Use the String.replaceAll Method to Find Whitespace Using Regular Expressions in Java
  4. Conclusion

Handling and manipulating strings is a common task in Java programming, and sometimes you need to identify and work with whitespace characters within a string.

Regular expressions provide a powerful and flexible way to do this. In this article, we will explore several ways to find whitespace in a string using regular expressions in Java: the matches() method, the Pattern and Matcher classes, and the String.replaceAll method.

Use the matches() Method to Find Whitespace Using Regular Expressions in Java

The matches() method is a static method of the Pattern class in Java. It takes two parameters: the first being the regular expression pattern to match and the second being the string to be tested against the pattern.

Its syntax is as follows:

public static boolean matches(String regex, CharSequence input)

The method returns a boolean value, indicating whether the entire string matches the specified regular expression.
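Because matches() tests the entire input, a pattern such as \s returns false for a string that merely contains a space somewhere. The following minimal sketch contrasts the two cases; the class and helper names are illustrative, and the (?s) flag is added so that .* can also cross line breaks. It also shows that String.matches(regex) behaves the same way, since it delegates to Pattern.matches.

```java
import java.util.regex.Pattern;

public class MatchesWholeInput {
  // True only if the whole input is exactly one whitespace character.
  static boolean isSingleWhitespace(String input) {
    return Pattern.matches("\\s", input);
  }

  // True if the input contains at least one whitespace character anywhere.
  // The surrounding .* let matches() consume the rest of the string;
  // (?s) makes . match line terminators as well.
  static boolean containsWhitespace(String input) {
    return Pattern.matches("(?s).*\\s.*", input);
  }

  public static void main(String[] args) {
    System.out.println(isSingleWhitespace(" "));            // true
    System.out.println(isSingleWhitespace("hello world"));  // false: the letters are not whitespace
    System.out.println(containsWhitespace("hello world"));  // true
    // String.matches(regex) is equivalent to Pattern.matches(regex, this string).
    System.out.println("hello world".matches("\\s"));       // false
  }
}
```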

Let’s explore the code example below, which uses the matches() method to test for whitespace with several different regular expressions:

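import java.util.regex.Pattern;

public class RegWhiteSpace {
  public static void main(String[] args) {
    // \s+ : one or more whitespace characters (the input is three spaces)
    boolean whitespaceMatcher1 = Pattern.matches("\\s+", "   ");

    // \s : exactly one whitespace character, [ \t\n\x0B\f\r] by default
    boolean whitespaceMatcher2 = Pattern.matches("\\s", " ");

    // [\t\p{Zs}] : a tab or any Unicode space separator character
    boolean whitespaceMatcher3 = Pattern.matches("[\\t\\p{Zs}]", " ");

    // Regex escape for code point 0020: only the SPACE character
    boolean whitespaceMatcher4 = Pattern.matches("\\u0020", " ");

    // \p{Zs} : any character in the Unicode "space separator" category
    boolean whitespaceMatcher5 = Pattern.matches("\\p{Zs}", " ");

    // Print each regex next to its result
    System.out.println("\\s+ ---------- " + whitespaceMatcher1);
    System.out.println("\\s ----------- " + whitespaceMatcher2);
    System.out.println("[\\t\\p{Zs}] --- " + whitespaceMatcher3);
    System.out.println("\\u0020 ------- " + whitespaceMatcher4);
    System.out.println("\\p{Zs} ------- " + whitespaceMatcher5);
  }
}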

In this example, the first case uses the regex \s+ to match one or more whitespace characters. The input string "   " (three spaces) matches the pattern, so the boolean variable whitespaceMatcher1 is set to true.

boolean whitespaceMatcher1 = Pattern.matches("\\s+", "   ");

In the second case, the regex \s matches exactly one whitespace character. Applied to the input string " " (a single space), it matches, so whitespaceMatcher2 is set to true.

boolean whitespaceMatcher2 = Pattern.matches("\\s", " ");

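The third case uses the character class [\t\p{Zs}], which matches a single tab or any Unicode space separator character (general category Zs). It is not equivalent to \s: by default, \s matches only [ \t\n\x0B\f\r], so it also covers line breaks, form feeds, and vertical tabs, whereas [\t\p{Zs}] excludes those but also matches non-ASCII spaces such as the no-break space (U+00A0). For an ordinary space, both patterns agree.

The input string is again " " (a single space), and whitespaceMatcher3 is set to true.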

boolean whitespaceMatcher3 = Pattern.matches("[\\t\\p{Zs}]", " ");
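To make the difference between \s and [\t\p{Zs}] concrete, the sketch below tests a few single characters against both patterns. The class and helper names are illustrative. With Java's default settings, newline and carriage return match only \s, while the no-break space (U+00A0) and the em space (U+2003) match only [\t\p{Zs}]; a space and a tab match both.

```java
import java.util.regex.Pattern;

public class WhitespaceClassComparison {
  static final Pattern BACKSLASH_S = Pattern.compile("\\s");
  static final Pattern TAB_OR_ZS = Pattern.compile("[\\t\\p{Zs}]");

  // True if the whole string is one character matched by \s.
  static boolean matchesBackslashS(String c) {
    return BACKSLASH_S.matcher(c).matches();
  }

  // True if the whole string is one character matched by [\t\p{Zs}].
  static boolean matchesTabOrZs(String c) {
    return TAB_OR_ZS.matcher(c).matches();
  }

  public static void main(String[] args) {
    // Each sample is a single character; the labels are only for display.
    String[] samples = {" ", "\t", "\n", "\r", "\u00A0", "\u2003"};
    String[] labels = {"space", "tab", "newline", "carriage return", "no-break space", "em space"};
    for (int i = 0; i < samples.length; i++) {
      System.out.printf("%-16s \\s=%-5b [\\t\\p{Zs}]=%b%n",
          labels[i], matchesBackslashS(samples[i]), matchesTabOrZs(samples[i]));
    }
  }
}
```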

In the fourth case, the regex \u0020 (written as "\\u0020" in Java source) matches exactly one SPACE character, U+0020. It does not match tabs or any other whitespace. The input string " " (a single space) matches, so whitespaceMatcher4 is set to true.

boolean whitespaceMatcher4 = Pattern.matches("\\u0020", " ");
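Since \u0020 names a single code point, it behaves like a literal space in the pattern rather than like a whitespace class. This minimal sketch (the class and method names are illustrative) shows that a tab and a no-break space are both rejected.

```java
import java.util.regex.Pattern;

public class SpaceOnly {
  // The Java literal "\\u0020" hands the regex engine the escape for code
  // point 0020, so only the SPACE character matches.
  static boolean isSpace(String input) {
    return Pattern.matches("\\u0020", input);
  }

  public static void main(String[] args) {
    System.out.println(isSpace(" "));       // true
    System.out.println(isSpace("\t"));      // false: a tab is whitespace, but not U+0020
    System.out.println(isSpace("\u00A0"));  // false: the no-break space is a different code point
    // A literal space in the pattern behaves the same way.
    System.out.println(Pattern.matches(" ", " "));  // true
  }
}
```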

Finally, the fifth case uses \p{Zs}, which matches any single character in the Unicode "space separator" category, such as the ordinary space (U+0020) and the no-break space (U+00A0). Tabs and line breaks are not in this category. The input string " " (a single space) matches, so whitespaceMatcher5 is set to true.

boolean whitespaceMatcher5 = Pattern.matches("\\p{Zs}", " ");
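If you want \s itself to cover Unicode whitespace, Java (since version 7) offers the Pattern.UNICODE_CHARACTER_CLASS flag, under which \s follows the Unicode White_Space property. The sketch below, with illustrative class and helper names, compares the default \s, the flagged \s, and \p{Zs} on a no-break space and a tab.

```java
import java.util.regex.Pattern;

public class UnicodeWhitespace {
  static final Pattern ASCII_S = Pattern.compile("\\s");
  static final Pattern UNICODE_S = Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);
  static final Pattern ZS = Pattern.compile("\\p{Zs}");

  // True if the pattern matches the entire string.
  static boolean fullMatch(Pattern p, String s) {
    return p.matcher(s).matches();
  }

  public static void main(String[] args) {
    String nbsp = "\u00A0"; // NO-BREAK SPACE, category Zs
    String tab = "\t";      // a control character, not in category Zs
    System.out.println(fullMatch(ASCII_S, nbsp));   // false
    System.out.println(fullMatch(UNICODE_S, nbsp)); // true
    System.out.println(fullMatch(ZS, nbsp));        // true
    System.out.println(fullMatch(ZS, tab));         // false
    System.out.println(fullMatch(UNICODE_S, tab));  // true
  }
}
```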

Output:

\s+ ---------- true
\s ----------- true
[\t\p{Zs}] --- true
\u0020 ------- true
\p{Zs} ------- true

Use the Pattern and Matcher Classes to Find Whitespace Using Regular Expressions in Java

In addition to the static matches() method, Java provides the Pattern and Matcher classes, which offer more fine-grained control over regular expression matching. The Pattern class compiles a regular expression into a reusable pattern, while the Matcher class performs matching operations on a given input string using a compiled pattern.

The find() method of the Matcher class scans the input for the next subsequence that matches the pattern. Unlike matches(), it does not require the entire input to match, which makes it the natural choice for checking whether a string contains whitespace anywhere.

boolean find()
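The difference between find() and matches() on a Matcher shows up as soon as the input contains anything besides whitespace. In this sketch (class and helper names are illustrative), matches() fails on "hello world" while find() succeeds, and start() reports where the match begins.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVersusMatches {
  static final Pattern WHITESPACE = Pattern.compile("\\s");

  // Index of the first whitespace character, or -1 if there is none.
  static int firstWhitespaceIndex(String input) {
    Matcher m = WHITESPACE.matcher(input);
    return m.find() ? m.start() : -1;
  }

  public static void main(String[] args) {
    Matcher m = WHITESPACE.matcher("hello world");
    // matches() needs the whole input to be a single whitespace character.
    System.out.println(m.matches()); // false

    // reset() discards the previous match state before scanning again.
    m.reset();
    // find() succeeds as soon as any subsequence matches.
    System.out.println(m.find()); // true

    System.out.println(firstWhitespaceIndex("hello world")); // 5
    System.out.println(firstWhitespaceIndex("helloworld"));  // -1
  }
}
```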

Below is a Java code example that applies the same five regular expressions using the Pattern and Matcher classes:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegWhiteSpaceMatcher {
  public static void main(String[] args) {
    // Compile the regex, bind it to the input, and let find() search for a match anywhere
    Pattern pattern1 = Pattern.compile("\\s+");
    Matcher matcher1 = pattern1.matcher("   ");
    boolean whitespaceMatcher1 = matcher1.find();

    // \s : a single whitespace character
    Pattern pattern2 = Pattern.compile("\\s");
    Matcher matcher2 = pattern2.matcher(" ");
    boolean whitespaceMatcher2 = matcher2.find();

    // [\t\p{Zs}] : a tab or a Unicode space separator
    Pattern pattern3 = Pattern.compile("[\\t\\p{Zs}]");
    Matcher matcher3 = pattern3.matcher(" ");
    boolean whitespaceMatcher3 = matcher3.find();

    // Regex escape for code point 0020: only the SPACE character
    Pattern pattern4 = Pattern.compile("\\u0020");
    Matcher matcher4 = pattern4.matcher(" ");
    boolean whitespaceMatcher4 = matcher4.find();

    // \p{Zs} : any Unicode space separator
    Pattern pattern5 = Pattern.compile("\\p{Zs}");
    Matcher matcher5 = pattern5.matcher(" ");
    boolean whitespaceMatcher5 = matcher5.find();

    System.out.println("\\s+ ---------- " + whitespaceMatcher1);
    System.out.println("\\s ----------- " + whitespaceMatcher2);
    System.out.println("[\\t\\p{Zs}] --- " + whitespaceMatcher3);
    System.out.println("\\u0020 ------- " + whitespaceMatcher4);
    System.out.println("\\p{Zs} ------- " + whitespaceMatcher5);
  }
}

In the first case, the pattern \s+ is compiled with Pattern.compile(), and a Matcher is created for the string "   " (three spaces). Calling find() locates a run of one or more whitespace characters, so whitespaceMatcher1 is set to true.

Pattern pattern1 = Pattern.compile("\\s+");
Matcher matcher1 = pattern1.matcher("   ");
boolean whitespaceMatcher1 = matcher1.find();
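The extra control that Matcher provides becomes useful when you call find() repeatedly: each successful call resumes after the previous match, so a loop can report every whitespace run together with its position. This is a minimal sketch; the class name and the "start-end" string format are illustrative choices, and end() is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceRuns {
  // Compiled once and reused; Pattern instances are immutable.
  private static final Pattern RUN = Pattern.compile("\\s+");

  // Collects "start-end" pairs (end exclusive) for every run of whitespace.
  static List<String> runs(String input) {
    List<String> result = new ArrayList<>();
    Matcher m = RUN.matcher(input);
    // Each successful find() continues from where the previous match ended.
    while (m.find()) {
      result.add(m.start() + "-" + m.end());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(runs("a  b\tc")); // [1-3, 4-5]
  }
}
```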

In the second case, the pattern \s is compiled, and a Matcher is created for the string " " (a single space). The find() method detects the space, so whitespaceMatcher2 is set to true.

The third case compiles the character class [\t\p{Zs}], which matches a tab or any Unicode space separator. This is not the same as \s, since it excludes line breaks but includes non-ASCII spaces such as U+00A0; both, however, match an ordinary space. find() locates the space, so whitespaceMatcher3 is set to true.

In the fourth case, \u0020 is compiled as a pattern that matches exactly one SPACE character (U+0020). Using a Matcher on the input string " ", find() locates the space, and whitespaceMatcher4 is set to true.

Finally, the fifth case compiles \p{Zs}, which matches any Unicode space separator character. The Matcher finds one in the input string " ", setting whitespaceMatcher5 to true.

Output:

\s+ ---------- true
\s ----------- true
[\t\p{Zs}] --- true
\u0020 ------- true
\p{Zs} ------- true

Use the String.replaceAll Method to Find Whitespace Using Regular Expressions in Java

The replaceAll method replaces every substring that matches a given regular expression with a replacement string and returns the result as a new string; the original string is left unchanged. To detect whitespace with it, you can remove all whitespace matches and compare the length of the result with the length of the original.

Its syntax is as follows:

String replaceAll(String regex, String replacement)
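Beyond detection, replaceAll is the usual tool for cleaning up whitespace. The sketch below (class and method names are illustrative) collapses every run of whitespace into a single space and trims the ends, and separately removes all whitespace. Note that trim() only strips characters up to U+0020, which covers the ends here because replaceAll has already turned each run into a plain space.

```java
public class NormalizeWhitespace {
  // Collapses every run of whitespace into one space, then trims both ends.
  static String normalize(String input) {
    return input.replaceAll("\\s+", " ").trim();
  }

  // Removes every whitespace character.
  static String stripAll(String input) {
    return input.replaceAll("\\s", "");
  }

  public static void main(String[] args) {
    String messy = "  a \t b\n";
    System.out.println("[" + normalize(messy) + "]"); // [a b]
    System.out.println("[" + stripAll(messy) + "]");  // [ab]
  }
}
```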

Here’s a code example that uses String.replaceAll with three of the regular expressions from the previous sections:

public class RegWhiteSpaceReplace {
  public static void main(String[] args) {
    // Remove every run of whitespace; a shorter result means the input had whitespace
    String input1 = "   ";
    String whitespaceReplaced1 = input1.replaceAll("\\s+", "");
    boolean hasWhitespace1 = input1.length() != whitespaceReplaced1.length();

    // Remove each whitespace character individually
    String input2 = " ";
    String whitespaceReplaced2 = input2.replaceAll("\\s", "");
    boolean hasWhitespace2 = input2.length() != whitespaceReplaced2.length();

    // Remove tabs and Unicode space separators (line breaks are not matched)
    String input3 = " ";
    String whitespaceReplaced3 = input3.replaceAll("[\\t\\p{Zs}]", "");
    boolean hasWhitespace3 = input3.length() != whitespaceReplaced3.length();

    System.out.println("\\s+ ---------- " + hasWhitespace1);
    System.out.println("\\s ----------- " + hasWhitespace2);
    System.out.println("[\\t\\p{Zs}] --- " + hasWhitespace3);
  }
}

In the first case, replaceAll is applied to the string "   " (three spaces) with the regex \s+, which matches runs of whitespace. The result, whitespaceReplaced1, has all whitespace removed, and hasWhitespace1 is true if its length differs from that of the original, which indicates that the input contained at least one whitespace character.

String input1 = "   ";
String whitespaceReplaced1 = input1.replaceAll("\\s+", "");
boolean hasWhitespace1 = input1.length() != whitespaceReplaced1.length();
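Because the replacement is an empty string, the length difference also tells you how many whitespace characters were removed. For a pure yes/no check, building a new string is not necessary; a Matcher's find() can stop at the first match. Both ideas are sketched below with illustrative names.

```java
import java.util.regex.Pattern;

public class CountWhitespace {
  private static final Pattern WHITESPACE = Pattern.compile("\\s");

  // Number of whitespace characters = number of characters replaceAll removed.
  static int countWhitespace(String input) {
    return input.length() - input.replaceAll("\\s", "").length();
  }

  // Detection only: find() returns as soon as one whitespace character is seen.
  static boolean hasWhitespace(String input) {
    return WHITESPACE.matcher(input).find();
  }

  public static void main(String[] args) {
    System.out.println(countWhitespace("a b\tc\n")); // 3
    System.out.println(hasWhitespace("abc"));        // false
  }
}
```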

In the second case, the regex \s is applied to the string " ", removing each whitespace character individually. As before, hasWhitespace2 is true if whitespaceReplaced2 is shorter than the original string.

The third case applies the regex [\t\p{Zs}] to the string " ". This class matches tabs and Unicode space separators; unlike \s, it does not match line breaks. The single space is removed from whitespaceReplaced3, so the lengths differ and hasWhitespace3 is set to true.

Output:

\s+ ---------- true
\s ----------- true
[\t\p{Zs}] --- true
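The choice of character class matters when the input spans several lines. In this sketch (class and method names are illustrative), replaceAll with [\t\p{Zs}] strips spaces, tabs, and no-break spaces while keeping the line break, whereas \s removes the line break but leaves the no-break space in place.

```java
public class RemoveSpacesKeepLines {
  // Removes tabs and Unicode space separators but keeps line breaks.
  static String removeSpacing(String input) {
    return input.replaceAll("[\\t\\p{Zs}]", "");
  }

  public static void main(String[] args) {
    String text = "line one\u00A0\nline\ttwo";
    // Prints "lineone" and "linetwo" on separate lines.
    System.out.println(removeSpacing(text));
    // For comparison: \s removes the newline but not the no-break space.
    System.out.println(text.replaceAll("\\s", "").equals("lineone\u00A0linetwo")); // true
  }
}
```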

Conclusion

Regular expressions give us a versatile set of tools for identifying whitespace in Java strings. The matches() method, the Pattern and Matcher classes, and the String.replaceAll method each offer distinct advantages for different use cases and coding preferences.

The matches() method is concise and well suited to checking whether an entire string fits a pattern, while the Pattern and Matcher classes provide more control, such as searching for a match anywhere in the input with find(). The String.replaceAll method, on the other hand, excels at removing or replacing whitespace.

Understanding these methods, along with the differences between whitespace patterns such as \s, \p{Zs}, and \u0020, equips us to handle whitespace-related tasks effectively. Regular expressions remain a powerful ally for string manipulation in Java.

Author: Rupam Yadav

Rupam Saini is an Android developer who also sometimes works as a web developer. He likes to read books and write about various things.
