Professor Hans Noodles

The String.split() method in Java: splitting a string into parts

Published in the Java Developer group
Let's talk about Java's String.split method: what it does and why it is needed. It's not hard to guess that it splits a Java string, but how does this work in practice? Let's dive deep into the operation of the method and discuss some non-obvious details. At the same time, we'll learn how many split methods the String class actually has. Let's go!

Description and signature for Java's String.split

In Java, the split method splits a string into substrings using a delimiter defined using a regular expression. Let's present the method signature and begin our dive:

String[] split(String regex)
Two things are clear from the signature:
  1. The method returns an array of strings.
  2. The method has a string input parameter called regex.
Let us analyze each of these separately as we break down the description given above.
  1. The method returns an array of strings.

    The description contains the following words: "In Java, the split method splits a string into substrings." The method collects these substrings into an array that becomes the return value.

  2. The method has a string input parameter called regex.

    Again, recall the description: "splits a string into substrings using a delimiter defined using a regular expression." The regex input parameter is a regular expression that is applied to the original string. Every character or sequence of characters that matches the expression is treated as a delimiter.
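Because the delimiter is a regular expression, it does not have to be a single fixed character. Here is a minimal sketch of that idea. The class name, method names, and sample strings are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; names and sample strings are illustrative only.
class RegexDelimiterDemo {

    // Splits on one or more whitespace characters (spaces, tabs, newlines).
    static String[] splitOnWhitespace(String text) {
        return text.split("\\s+");
    }

    // Splits on either a comma or a semicolon, using a regex character class.
    static String[] splitOnCommaOrSemicolon(String text) {
        return text.split("[,;]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnWhitespace("I   love\tJava")));
        // [I, love, Java]
        System.out.println(Arrays.toString(splitOnCommaOrSemicolon("a,b;c")));
        // [a, b, c]
    }
}
```

In both calls the matched text itself is consumed and never appears in the resulting array.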

Java's split in practice

Now let's get closer to the point. Let's imagine we have a string of words. For example, like this:
I love Java
We need to split the string into words. We see that the words in this string are separated from one another by spaces. In this case, a space character is the perfect candidate for our delimiter. The code for solving our task would look like this:

public class Main {
    public static void main(String[] args) {
        String str = "I love Java";
        String[] words = str.split(" ");
        for (String word : words) {
            System.out.println(word);
        }
    }
}
The output of the main method will be the following lines:
I
love
Java
Let's see a few more examples of how the split method would work:
String                | Delimiter             | Result of the method
"I love Java"         | " " (space character) | {"I", "love", "Java"}
"192.168.0.1:8080"    | ":"                   | {"192.168.0.1", "8080"}
"Red, orange, yellow" | ","                   | {"Red", " orange", " yellow"}
"Red, orange, yellow" | ", "                  | {"Red", "orange", "yellow"}
Notice the differences between the last two rows in the table above. In the second to last row, a comma is used as the delimiter. As a result, when the string is split, some of the words have leading spaces. In the last row, we used a comma and a space as our delimiter. That's why there were no substrings with leading spaces in the resulting array. This is just a subtle detail that demonstrates how important it is to carefully choose the right delimiter.
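When the spacing after the commas is not consistent, one possible approach is to let the regular expression absorb the optional whitespace. The sketch below assumes a hypothetical helper named splitList; it is only one way to handle such input.

```java
import java.util.Arrays;

// Hypothetical demo class; the helper name and sample strings are illustrative only.
class FlexibleCommaSplit {

    // Treats a comma plus any whitespace that follows it as a single delimiter.
    static String[] splitList(String text) {
        return text.split(",\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitList("Red, orange,yellow")));
        // [Red, orange, yellow]
        System.out.println(Arrays.toString(splitList("Red,   orange")));
        // [Red, orange]
    }
}
```

The pattern ",\\s*" means "a comma followed by zero or more whitespace characters", so it covers both delimiters from the last two rows of the table.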

Leading delimiter

This is another important nuance. If the original string begins with the delimiter, then the first element of the resulting array will be an empty string. For example:
Original string: " I love Java"
Delimiter: " "
Resulting array: { "", "I", "love", "Java" }
But if the original string ends with a delimiter rather than beginning with one, the result is different, because trailing empty strings are removed from the resulting array:
Original string: "I love Java "
Delimiter: " "
Resulting array: { "I", "love", "Java" }
Look at the code and see how the split method works differently with a delimiter symbol at the end and/or beginning of the original string:

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        print("I love Java".split(" "));
        print(" I love Java".split(" "));
        print("I love Java ".split(" "));
        print(" I love Java ".split(" "));
    }

    static void print(String[] arr) {
        System.out.println(Arrays.toString(arr));
    }
}
The main method's output will be like this:
[I, love, Java]
[, I, love, Java]
[I, love, Java]
[, I, love, Java]
Once again, note that when the first character of the original string is a delimiter, the first element of the array is an empty string, while a delimiter at the end of the string adds nothing to the array.
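If the leading empty string is unwanted, there are a couple of common workarounds. The sketch below shows two of them with hypothetical helper names: trimming the string first (suitable when the delimiter is whitespace), and filtering out empty elements after the split (which also drops empty elements in the middle of the array).

```java
import java.util.Arrays;

// Hypothetical demo class; helper names and sample strings are illustrative only.
class LeadingDelimiterWorkarounds {

    // Removes leading and trailing whitespace before splitting on a space.
    static String[] splitTrimmed(String text) {
        return text.trim().split(" ");
    }

    // Splits first, then drops every empty element, wherever it occurs.
    static String[] splitDroppingEmpty(String text, String regex) {
        return Arrays.stream(text.split(regex))
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTrimmed(" I love Java ")));
        // [I, love, Java]
        System.out.println(Arrays.toString(splitDroppingEmpty(",a,,b", ",")));
        // [a, b]
    }
}
```

Which option fits depends on whether empty elements carry meaning in the data, so treat both as starting points rather than a general rule.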

Overloaded sibling

The String class has another split method with the following signature:

String[] split(String regex, int limit)
This method has an additional limit parameter: it determines how many times the regex pattern will be applied to the original string. See the explanations below:

limit > 0

The pattern is applied at most limit - 1 times. What's more, the length of the returned array will not exceed the value of the limit parameter. The last element of the array will be the entire remainder of the string after the last matched delimiter, even if that remainder contains more delimiters. Example:

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        print("I love Java".split(" ", 1));
        print("I love Java".split(" ", 2));
        /*
         Output: 
         [I love Java]
         [I, love Java]
        */
    }

    static void print(String[] arr) {
        System.out.println(Arrays.toString(arr));
    }
}
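A positive limit is handy when only the first delimiter matters. As a sketch, assume entries of the form key=value where the value may itself contain "=" characters. The parseEntry helper and the sample entries are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class; the helper name and sample entries are illustrative only.
class KeyValueSplit {

    // Splits an entry into a key and a value at the first '=' only.
    // If there is no '=', the value is returned as an empty string.
    static String[] parseEntry(String entry) {
        String[] parts = entry.split("=", 2);
        if (parts.length < 2) {
            return new String[] {parts[0], ""};
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseEntry("query=a=b")));
        // [query, a=b]
        System.out.println(Arrays.toString(parseEntry("flag")));
        // [flag, ]
    }
}
```

With a limit of 2 the pattern is applied at most once, so everything after the first "=" stays together in the second element.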

limit < 0

The delimiter regular expression is applied to the string as many times as possible. The resulting array can have any length. Example:

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        // Note the space at the end of the string
        print("I love Java ".split(" ", -1));
        print("I love Java ".split(" ", -2));
        print("I love Java ".split(" ", -12));
        /*
         Output:
        [I, love, Java, ]
        [I, love, Java, ]
        [I, love, Java, ]
        
        Please note that the last element of the array is
        an empty string. This is caused by the whitespace
        at the end of the original string. 
        */
    }

    static void print(String[] arr) {
        System.out.println(Arrays.toString(arr));
    }
}

limit = 0

As with the case where limit < 0, the delimiter pattern is applied to the string as many times as possible. The final array can have any length. If the last elements are empty strings, they are discarded from the final array. Example:

import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        // Note the space at the end of the string
        print("I love Java ".split(" ", 0));
        print("I love Java ".split(" ", 0));
        print("I love Java ".split(" ", 0));
        /*
         Output:
        [I, love, Java]
        [I, love, Java]
        [I, love, Java]
        Note the absence of empty strings at the end of the arrays
        */
    }

    static void print(String[] arr) {
        System.out.println(Arrays.toString(arr));
    }
}
If we peek at the implementation of the one-parameter version of the split method, we can see that it behaves like its overloaded sibling with the second argument set to zero. The exact source differs between JDK releases, but in essence it looks like this:

    public String[] split(String regex) {
        return split(regex, 0);
    }
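One practical consequence is that the one-parameter version silently drops trailing empty fields. The sketch below compares the two behaviors on a comma-separated line. The class name, method names, and sample line are made up for illustration.

```java
// Hypothetical demo class; method names and the sample line are illustrative only.
class TrailingFieldsDemo {

    // One-parameter split: behaves like limit = 0, so trailing empty strings are removed.
    static int countFieldsDefault(String line) {
        return line.split(",").length;
    }

    // Negative limit: trailing empty strings are kept.
    static int countFieldsKeepingEmpty(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        String line = "a,b,,";
        System.out.println(countFieldsDefault(line));      // 2
        System.out.println(countFieldsKeepingEmpty(line)); // 4
    }
}
```

If the number of fields matters, for example when every line is expected to have a fixed number of columns, a negative limit is usually the safer choice.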

Various examples

In real-world practice, it sometimes happens that we have strings that are generated according to certain rules. Such a string might come into our program from anywhere:
  • from a third-party service;
  • from a request sent to our server;
  • from a configuration file;
  • and so on.
In these situations, the programmer usually knows the "rules of the game". Let's say a programmer knows that he or she is dealing with user information stored according to this pattern:
user_id|user_login|user_email
Let's take some specific values as an example:
135|bender|bender@gmail.com
Suppose the programmer's task is to write a method that sends an email to the user. The programmer has access to user data, which is recorded in the format given above. The subtask that we will now analyze is how to isolate the email address from the rest of the user data. This is one instance where the split method can be useful. After all, if we look at the user data template, we realize that extracting the user's email address is as simple as calling the split method to split the string. The email address will then be the last element of the resulting array. Here is an example of a method that takes a string containing user data and returns the user's email address. For simplicity, let's say that the data string is always in the format we want:

public class Main {
    public static void main(String[] args) {
        String userInfo = "135|bender|bender@gmail.com";
        System.out.println(getUserEmail(userInfo));
        // Output: bender@gmail.com
    }

    static String getUserEmail(String userInfo) {
        String[] data = userInfo.split("\\|");
        return data[2]; // or data[data.length - 1]
    }
}
Notice the delimiter: "\\|". In regular expressions, "|" is a special character (it denotes alternation), so if we want to use it as an ordinary character (i.e. what we want to find in the original string), we need to escape it with a backslash. Inside a Java string literal the backslash itself must also be escaped, which is why the delimiter is written with two backslashes.
Consider another example. Let's say we have order information that is structured like this:
item_number_1,item_name_1,item_price_1;item_number_2,item_name_2,item_price_2;...;item_number_n,item_name_n,item_price_n
Or we can even adopt some specific values:
1,cucumbers,2.39;2,tomatoes,1.89;3,bacon,4.99
Our task is to calculate the total cost of the order. Here we will have to apply the split method several times. The first step is to split the string using ";" as the delimiter in order to break it into its component parts. Then each resulting substring will hold information about a separate product, which we can process later. Then, for each product, we will split apart the corresponding information using the "," symbol. We will take an element with a specific index (the one where the product price is stored) from the resulting string array, convert it to numerical form, and tally up the total cost of the order. Let's write a method that will do all these calculations:

public class Main {
    public static void main(String[] args) {
        String orderInfo = "1,cucumbers,2.39;2,tomatoes,1.89;3,bacon,4.99";
        System.out.println(getTotalOrderAmount(orderInfo));
        // Output: 9.27
    }

    static double getTotalOrderAmount(String orderInfo) {
        double totalAmount = 0d;
        final String[] items = orderInfo.split(";");

        for (String item : items) {
            final String[] itemInfo = item.split(",");
            totalAmount += Double.parseDouble(itemInfo[2]);
        }

        return totalAmount;
    }
}
See if you can figure out how this method works on your own. Based on these examples, we can say that the split method is used when we have some data formatted as a string, and we need to extract certain more specific information from it.
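One more detail ties back to the escaped "\\|" delimiter in the email example. When a delimiter should be matched literally, an alternative to escaping it by hand is the standard Pattern.quote method, which turns any string into a literal pattern. The sketch below uses a hypothetical splitLiteral helper.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the helper name and sample strings are illustrative only.
class LiteralDelimiterSplit {

    // Splits on the delimiter taken literally, whatever regex metacharacters it contains.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // An unescaped "." matches every character, so nothing is left after the split.
        System.out.println("192.168.0.1".split(".").length);
        // 0
        System.out.println(Arrays.toString(splitLiteral("192.168.0.1", ".")));
        // [192, 168, 0, 1]
        System.out.println(Arrays.toString(splitLiteral("135|bender|bender@gmail.com", "|")));
        // [135, bender, bender@gmail.com]
    }
}
```

This can be convenient when the delimiter comes from configuration or user input and may contain characters such as ".", "|", or "*".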

Summary

We examined the split method of the String class. It's just what you need when you have to split a string into its component parts with the help of a special delimiter. The method returns an array of strings (the substrings that comprise the original string). It accepts a regular expression whose matches represent the delimiter character(s). We examined various subtleties of this method:
  • a leading delimiter;
  • its overloaded sibling with two parameters.
We also tried to model some real-life situations where we used the split method to solve hypothetical, but quite realistic, problems.