How to Parse a String in Java

Rupam Yadav Feb 02, 2024
  1. Use the split Method to Parse a String in Java
  2. Use the Scanner Class to Parse a String in Java
  3. Use the StringUtils Class to Parse a String in Java
  4. Use the StringTokenizer Class to Parse a String in Java
  5. Use parse to Parse a String in Java
  6. Use Regular Expressions (Regex) to Parse a String in Java
  7. Conclusion

String parsing, the process of extracting specific information from a string, is a common and essential task in Java programming. Java offers a variety of tools and techniques for parsing strings, ranging from simple methods like split to more sophisticated approaches using regular expressions.

In this article, we’ll explore various methods to parse strings in Java.

Use the split Method to Parse a String in Java

One powerful tool for string parsing in Java is the split method. This method is part of the String class and is particularly useful when you need to break down a string into smaller components based on a specified delimiter.

Syntax of the split method:

String[] result = inputString.split(regex);

Here, inputString is the original string you want to parse, and regex is the regular expression used as the delimiter. The method returns an array of substrings resulting from the split operation.

The split method divides the original string at every match of the regular expression and returns the substrings between those matches as an array. The matched delimiter text itself is not included, and trailing empty strings are dropped unless a limit argument is passed.
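As a minimal sketch of this behavior, the following example splits on a plain comma and contrasts the one-argument call with the two-argument overload split(regex, limit). The class name SplitBasics and its helper methods are made up for illustration.

```java
import java.util.Arrays;

class SplitBasics {
  // Splits on a literal comma; trailing empty strings are dropped by default.
  static String[] splitDefault(String input) {
    return input.split(",");
  }

  // A negative limit keeps trailing empty strings in the result.
  static String[] splitKeepTrailing(String input) {
    return input.split(",", -1);
  }

  public static void main(String[] args) {
    String csv = "a,b,,c,,";
    System.out.println(Arrays.toString(splitDefault(csv)));      // [a, b, , c]
    System.out.println(Arrays.toString(splitKeepTrailing(csv))); // [a, b, , c, , ]
  }
}
```

The empty string between b and c is kept in both cases; only the trailing empties differ.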

Let’s consider a scenario where we have a date represented as a string in the format MonthDayYear, and we want to split it into two components: the month name, and the digits holding the day and year.

public class StringParsingExample {
  public static void main(String[] args) {
    String dateString = "March032021";

    String[] dateComponents = dateString.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

    System.out.println("Month: " + dateComponents[0]);
    System.out.println("Day and Year: " + dateComponents[1]);
  }
}

In this code example, we start by declaring a string variable dateString containing our sample date March032021. We then use the split method to extract the components based on the regular expression (?<=\\D)(?=\\d)|(?<=\\d)(?=\\D).

This expression ensures that we split where there is a transition from a non-digit (\D) to a digit (\d) or from a digit to a non-digit. The (?<= ... ) and (?= ... ) are lookbehind and lookahead assertions, respectively.

The resulting array, dateComponents, holds the parsed parts. Printing these components to the console provides the output:

Month: March
Day and Year: 032021

The split method successfully separated the month (March) and the day with the year (032021).
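If the day and year are needed separately, one possible follow-up is to slice the digit run by position. The sketch below assumes the day is always written with two digits (as in 03) and the remainder is the year; the class name DateParts is hypothetical.

```java
import java.util.Arrays;

class DateParts {
  // Returns {month, day, year}, assuming the input looks like <letters><2-digit day><year>.
  static String[] parse(String dateString) {
    // Split once, at the first boundary between a non-digit and a digit.
    String[] parts = dateString.split("(?<=\\D)(?=\\d)", 2);
    String digits = parts[1];
    // Assumption: the first two digits are the day, the rest is the year.
    return new String[] {parts[0], digits.substring(0, 2), digits.substring(2)};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(parse("March032021"))); // [March, 03, 2021]
  }
}
```

Input that does not follow the assumed layout (no digits, or a one-digit day) would need extra checks before slicing.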

Use the Scanner Class to Parse a String in Java

In addition to the split method, Java provides the Scanner class as another tool for parsing strings. Where split returns every piece at once as an array, a Scanner reads tokens one at a time and can also convert them to primitive types with methods such as nextInt().

The Scanner class is part of Java’s java.util package and is commonly used for parsing primitive types and strings. Its primary method for string parsing is next(), which retrieves the next token based on a specified delimiter pattern.

Here’s a brief overview of this approach:

Scanner scanner = new Scanner(inputString);
scanner.useDelimiter(pattern);
while (scanner.hasNext()) {
  String token = scanner.next();
  // Process or display the token as needed
}

Where:

  • inputString: The original string to be parsed.
  • pattern: The delimiter pattern that determines how the string is tokenized.

Calling useDelimiter is optional. If you skip it, the Scanner separates tokens on whitespace; if you call it, the argument is treated as a regular expression.
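To illustrate the default whitespace delimiter together with Scanner's typed methods, here is a small sketch that collects every integer token from a string. The class name ScannerDefaults and the sample input are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerDefaults {
  // Collects every whitespace-separated token that can be read as an int.
  static List<Integer> extractInts(String input) {
    List<Integer> numbers = new ArrayList<>();
    try (Scanner scanner = new Scanner(input)) { // no useDelimiter: whitespace is used
      while (scanner.hasNext()) {
        if (scanner.hasNextInt()) {
          numbers.add(scanner.nextInt());
        } else {
          scanner.next(); // skip a non-numeric token
        }
      }
    }
    return numbers;
  }

  public static void main(String[] args) {
    System.out.println(extractInts("width 1280 height 720")); // [1280, 720]
  }
}
```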

Consider a scenario where we have a sentence describing a person’s birthdate, and we want to split it into the text before and after the word born.

import java.util.Scanner;

public class ScannerExample {
  public static void main(String[] args) {
    String text = "John Evans was born on 25-08-1980";

    Scanner scanner = new Scanner(text);

    scanner.useDelimiter("born");

    while (scanner.hasNext()) {
      String token = scanner.next();
      System.out.println("Output is: " + token.trim());
    }
  }
}

In this code example, we start by initializing a Scanner object named scanner with the input string. We then use useDelimiter to set the pattern to born, indicating that the string should be tokenized whenever born is encountered.

The while loop iterates through the tokens using the hasNext() and next() methods. Inside the loop, each token is processed or displayed as needed.

In this case, we print each token to the console after trimming any leading or trailing spaces. The output of the code will be as follows:

Output is: John Evans was
Output is: on 25-08-1980

In this output, you can observe that the Scanner class tokenized the input string at the delimiter born. The delimiter itself is discarded, so it appears in neither token.
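Because the delimiter is a regular expression, a longer pattern can isolate just the name and the date. The following sketch assumes the sentence always contains the phrase "was born on"; the class name BirthRecord is hypothetical.

```java
import java.util.Arrays;
import java.util.Scanner;

class BirthRecord {
  // Returns {name, date}, assuming the text has the form "<name> was born on <date>".
  static String[] nameAndDate(String text) {
    try (Scanner scanner = new Scanner(text)) {
      // The whole phrase, with surrounding whitespace, acts as the delimiter.
      scanner.useDelimiter("\\s+was born on\\s+");
      String name = scanner.next(); // throws NoSuchElementException if a token is missing
      String date = scanner.next();
      return new String[] {name, date};
    }
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(nameAndDate("John Evans was born on 25-08-1980")));
    // [John Evans, 25-08-1980]
  }
}
```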

Use the StringUtils Class to Parse a String in Java

In Java, the StringUtils class, part of the Apache Commons Lang library, offers a robust set of tools for working with strings. Among them is the substringsBetween method, which parses a string by extracting every substring that sits between a specified opening string and closing string.

To use this class in your Java project, you’ll need to add the following Maven dependency to your project’s pom.xml file:

<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-lang3 -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.11</version>
</dependency>

StringUtils offers a wide range of string manipulation methods; substringsBetween is the one used here for parsing.

Here’s an overview of its syntax:

String[] result = StringUtils.substringsBetween(inputString, open, close);

Where:

  • inputString: The original string to be parsed.
  • open: The opening string that marks the beginning of the desired substring.
  • close: The closing string that marks the end of the desired substring.

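The substringsBetween method searches for substrings between the specified opening and closing strings and returns them in an array. It returns null when no match is found, so check the result before iterating over input you do not control. A related method, substringBetween (singular), returns only the first match as a single String.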

Let’s consider a scenario where we have a sentence, and we want to extract the words that appear between The and fox.

import org.apache.commons.lang3.StringUtils;

public class StringUtilsExample {
  public static void main(String[] args) {
    String sentence = "The quick brown fox jumps over the lazy dog";

    String[] phrases = StringUtils.substringsBetween(sentence, "The ", " fox");

    for (String phrase : phrases) {
      System.out.println("Output: " + phrase);
    }
  }
}

In this code example, we start by importing the StringUtils class from the Apache Commons Lang library. We then initialize a string variable named sentence with an input string.

Here, the goal is to extract the words between The and fox.

The StringUtils.substringsBetween method is employed to perform this extraction. It takes the sentence as the input string, the opening string The followed by a space, and the closing string fox preceded by a space as the markers. Including the spaces in the markers keeps them out of the result.

Here, the result is an array containing the extracted phrases. The for loop iterates through the array of extracted phrases, and each phrase is printed to the console.

The output of the code will be as follows:

Output: quick brown

In this output, you can see that the StringUtils class successfully extracted the substring between The and fox from the original sentence, showing the effectiveness of this method for parsing strings in Java.
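When adding a dependency is not an option, a similar first-match extraction can be approximated with indexOf and substring from the standard library. This is a rough sketch and not the Commons Lang implementation; the class BetweenSketch and its between method are made-up names.

```java
class BetweenSketch {
  // Returns the text between the first `open` and the next `close`,
  // or null if either marker is missing.
  static String between(String input, String open, String close) {
    int start = input.indexOf(open);
    if (start < 0) {
      return null;
    }
    start += open.length(); // move past the opening marker
    int end = input.indexOf(close, start);
    if (end < 0) {
      return null;
    }
    return input.substring(start, end);
  }

  public static void main(String[] args) {
    String sentence = "The quick brown fox jumps over the lazy dog";
    System.out.println(between(sentence, "The ", " fox")); // quick brown
  }
}
```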

Use the StringTokenizer Class to Parse a String in Java

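In Java, the StringTokenizer class provides a straightforward mechanism for tokenizing strings. This class is part of the java.util package. Note that its Javadoc describes it as a legacy class retained for compatibility and recommends split or the java.util.regex package for new code, so you will mostly meet it in older codebases.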

The StringTokenizer class operates on the principle of tokenization, where a string is broken down into smaller units called tokens. Here’s an overview of the syntax:

StringTokenizer tokenizer = new StringTokenizer(inputString, delimiter);
while (tokenizer.hasMoreTokens()) {
  String token = tokenizer.nextToken();
  // Process or display the token as needed
}

Where:

  • inputString: The original string to be tokenized.
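  • delimiter: The delimiter characters. Each character in this string is treated as a separate delimiter; the string is not a regular expression.

The hasMoreTokens() method checks if there are more tokens in the string, and nextToken() retrieves the next token. If no delimiter argument is given, the tokenizer splits on whitespace characters (space, tab, newline, carriage return, and form feed).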

Let’s consider a scenario where we have a string containing information about fruits, separated by commas, and we want to extract each fruit as a separate token.

import java.util.StringTokenizer;

public class StringTokenizerExample {
  public static void main(String[] args) {
    String fruits = "apple,orange,banana,grape,mango";

    StringTokenizer tokenizer = new StringTokenizer(fruits, ",");

    while (tokenizer.hasMoreTokens()) {
      String fruit = tokenizer.nextToken();
      System.out.println("Output: " + fruit);
    }
  }
}

In this code example, we start by initializing a string variable named fruits with the input string apple,orange,banana,grape,mango. We then create a StringTokenizer object named tokenizer with the input string and a comma (,) as the delimiter.

The while loop iterates through the tokens using the hasMoreTokens() and nextToken() methods. Inside the loop, each token (fruit) is processed or displayed as needed.

In this case, we print each fruit to the console. The output of the code will be as follows:

Output: apple
Output: orange
Output: banana
Output: grape
Output: mango

In this output, you can see that the StringTokenizer class successfully tokenized the input string based on the specified comma delimiter. This allows for the extraction of individual fruits.
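One practical difference from split is worth seeing side by side: StringTokenizer silently skips empty tokens, while split keeps empty strings between adjacent delimiters. The sketch below uses a hypothetical class TokenizerVsSplit and an input with a doubled comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplit {
  // Tokenizes with StringTokenizer; adjacent delimiters produce no empty token.
  static List<String> withTokenizer(String input) {
    List<String> tokens = new ArrayList<>();
    StringTokenizer tokenizer = new StringTokenizer(input, ",");
    while (tokenizer.hasMoreTokens()) {
      tokens.add(tokenizer.nextToken());
    }
    return tokens;
  }

  // Tokenizes with split; an empty string marks the gap between adjacent commas.
  static List<String> withSplit(String input) {
    return Arrays.asList(input.split(","));
  }

  public static void main(String[] args) {
    String fruits = "apple,,banana";
    System.out.println(withTokenizer(fruits)); // [apple, banana]
    System.out.println(withSplit(fruits));     // [apple, , banana]
  }
}
```

Which behavior is preferable depends on whether an empty field carries meaning in your data, as it usually does in CSV-like input.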

Use parse to Parse a String in Java

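Java has no single parse method. Instead, the wrapper classes and several other types provide static methods whose names start with parse, such as Integer.parseInt and Double.parseDouble, that convert a string into a specific data type. These methods are most often used for numerical or date values.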

It’s important to note that the usage and syntax may vary depending on the specific data type you are parsing.

These methods share a general shape, where Xxx stands for the type name:

dataType parsedValue = DataType.parseXxx(inputString);

Where:

  • dataType: The target data type to which you want to parse the string.
  • DataType: The wrapper class corresponding to the target data type.
  • inputString: The string representation of the value you want to parse.

For instance, if you’re parsing an integer, the syntax would be:

int intValue = Integer.parseInt(inputString);

For parsing other data types like double, float, or long, you would use the corresponding wrapper class and its parse method: Double.parseDouble, Float.parseFloat, or Long.parseLong.
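As a brief sketch of several members of this family, the example below parses an int, a long, a boolean, and a date. The date case uses LocalDate.parse with a DateTimeFormatter from java.time; the dd-MM-yyyy pattern and the class name ParseFamily are assumptions chosen for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class ParseFamily {
  // Parses a date written as day-month-year, e.g. 25-12-2022.
  static LocalDate parseDate(String input) {
    return LocalDate.parse(input, DateTimeFormatter.ofPattern("dd-MM-yyyy"));
  }

  public static void main(String[] args) {
    int count = Integer.parseInt("42");
    long big = Long.parseLong("9000000000");     // too large for an int
    boolean flag = Boolean.parseBoolean("TRUE"); // comparison ignores case
    LocalDate date = parseDate("25-12-2022");

    System.out.println(count + " " + big + " " + flag + " " + date);
    // prints "42 9000000000 true 2022-12-25"
  }
}
```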

Let’s consider a scenario where we have a string representing the temperature in Celsius, and we want to parse it into a double value.

public class ParseExample {
  public static void main(String[] args) {
    String temperatureString = "25.5";

    double temperature = Double.parseDouble(temperatureString);

    System.out.println("Parsed Temperature: " + temperature);
  }
}

In this code example, we initialize a string variable named temperatureString with the input string 25.5, representing the temperature in Celsius. We then use the Double.parseDouble method to parse this string into a double value named temperature.

The parsed temperature value is then displayed to the console using System.out.println.

The output of the code will be as follows:

Parsed Temperature: 25.5

In this output, you can observe that Double.parseDouble converted the string representation of the temperature into a double value. If the string is not a valid number, for example 25,5 or an empty string, the method throws a NumberFormatException, so input from users or files should be validated or wrapped in a try/catch block.
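Since the numeric parse methods throw NumberFormatException on malformed input, untrusted strings are often parsed inside a try/catch block. One possible way to package that is sketched below; the class SafeParse and the tryParseDouble helper are hypothetical names, not part of the JDK.

```java
import java.util.OptionalDouble;

class SafeParse {
  // Returns the parsed value, or an empty OptionalDouble if the input is null or malformed.
  static OptionalDouble tryParseDouble(String input) {
    if (input == null) {
      return OptionalDouble.empty();
    }
    try {
      return OptionalDouble.of(Double.parseDouble(input));
    } catch (NumberFormatException e) {
      return OptionalDouble.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(tryParseDouble("25.5").isPresent()); // true
    System.out.println(tryParseDouble("25,5").isPresent()); // false
  }
}
```

Returning an OptionalDouble forces the caller to decide what a missing value means instead of letting the exception propagate.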

Use Regular Expressions (Regex) to Parse a String in Java

Regular Expressions, commonly known as Regex, provide a powerful and flexible approach to string parsing in Java. With Regex, you can define patterns that match specific parts of a string, allowing for intricate and precise parsing.

To do this, we create a pattern and utilize a Matcher to find matches in the input string. Here’s an overview:

import java.util.regex.*;

// Create a pattern
Pattern pattern = Pattern.compile(regexPattern);

// Create a matcher
Matcher matcher = pattern.matcher(inputString);

// Find matches
while (matcher.find()) {
  // Process or display the matched substring
  String matchedSubstring = matcher.group();
  // Additional logic as needed
}

Where:

  • regexPattern: The regular expression pattern defining the match criteria.
  • inputString: The string to be parsed using the Regex pattern.

The while (matcher.find()) loop iterates through the input string, finding each match based on the specified pattern. The matcher.group() method retrieves the matched substring.
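A compact instance of this find loop is pulling every number out of free text. The sketch below assumes each run of digits fits in an int; the class name FindAllNumbers and the sample sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllNumbers {
  // One or more consecutive digits.
  private static final Pattern DIGITS = Pattern.compile("\\d+");

  // Returns every digit run in the input, in order of appearance.
  static List<Integer> numbersIn(String input) {
    List<Integer> numbers = new ArrayList<>();
    Matcher matcher = DIGITS.matcher(input);
    while (matcher.find()) {
      // group() with no argument returns the whole match.
      numbers.add(Integer.parseInt(matcher.group()));
    }
    return numbers;
  }

  public static void main(String[] args) {
    System.out.println(numbersIn("Order 66 shipped 3 items on day 12")); // [66, 3, 12]
  }
}
```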

Let’s consider a scenario where we have a string representing dates in the format DD-MM-YYYY, and we want to extract the day, month, and year.

import java.util.regex.*;

public class RegexExample {
  public static void main(String[] args) {
    String date = "25-12-2022";

    String regexPattern = "(\\d{2})-(\\d{2})-(\\d{4})";

    Pattern pattern = Pattern.compile(regexPattern);

    Matcher matcher = pattern.matcher(date);

    while (matcher.find()) {
      String day = matcher.group(1);
      String month = matcher.group(2);
      String year = matcher.group(3);

      System.out.println("Day: " + day);
      System.out.println("Month: " + month);
      System.out.println("Year: " + year);
    }
  }
}

Here, we start by initializing a string variable named date with the input string 25-12-2022 representing a date. We then define a regex pattern "(\\d{2})-(\\d{2})-(\\d{4})" to match the DD-MM-YYYY format.

A Pattern is created using Pattern.compile(regexPattern), and a Matcher is then created with pattern.matcher(date). The while (matcher.find()) loop iterates through the input string, and for each match, the day, month, and year are extracted using matcher.group(1), matcher.group(2), and matcher.group(3) respectively.

The parsed components are then displayed to the console.

The output of the code will be as follows:

Day: 25
Month: 12
Year: 2022

In this output, you can observe that the Regex successfully matched and extracted the day, month, and year components from the input date string.
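A variation on the same pattern uses named capturing groups, which can be easier to read than numeric indexes, and matches() instead of find() so that the whole string must be a date. This is a sketch under those choices; the class name NamedGroups is hypothetical, and the pattern checks only the digit layout, not whether the date exists on the calendar.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroups {
  private static final Pattern DATE =
      Pattern.compile("(?<day>\\d{2})-(?<month>\\d{2})-(?<year>\\d{4})");

  // Returns {day, month, year} as ints, or throws if the input is not DD-MM-YYYY.
  static int[] parse(String input) {
    Matcher matcher = DATE.matcher(input);
    if (!matcher.matches()) { // the entire input must match, not just a part of it
      throw new IllegalArgumentException("Not a DD-MM-YYYY date: " + input);
    }
    return new int[] {
      Integer.parseInt(matcher.group("day")),
      Integer.parseInt(matcher.group("month")),
      Integer.parseInt(matcher.group("year"))
    };
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(parse("25-12-2022"))); // [25, 12, 2022]
  }
}
```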

Conclusion

Java provides a rich set of tools and techniques for parsing strings, each suited for different scenarios and preferences. Whether you need simple tokenization, complex pattern matching, or type-specific parsing, Java’s versatile features have you covered.

Understanding these methods allows you to handle string data effectively in your Java applications.

Author: Rupam Yadav