Methods to Split or Tokenize String into its Components

Tokenizing or splitting a string is one of the most common tasks we have to do in an application. Usually we split the string into its components on an ordinary character such as a space or a letter, but when we read data from CSV files and other formatted files we have to split on special characters like dot (.) or pipe (|) to retrieve the required token or substring.

Among the various ways to tokenize a string, I am going to discuss some of the most common ones.


Methods to split/tokenize the String in Java are-

  1. Using split() method of String class(String.split()) [ RECOMMENDED ]
  2. Using StringTokenizer class [ LEGACY ]
  3. Using StringUtils.split()

Using split() method of String class(String.split())

This is the recommended method for any splitting-related operation on Java Strings. It is better than the StringTokenizer class because it returns a String array that you can use in any way you want, whereas with StringTokenizer, once we have consumed the tokens we cannot traverse them again.

Syntax:-

  • public String[] split(String regex, int limit)
  • public String[] split(String regex)

Parameters:-

  • Regex – Regular Expression to split the String
  • Limit – Maximum number of tokens to be returned, i.e. the maximum number of partitions of the string

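With a positive limit, the string is split on the regex until the specified number of tokens is reached, and the last token holds the rest of the string unsplit. If the limit is larger than the maximum possible number of tokens, it simply returns all the tokens, including any trailing empty ones. The overload without a limit behaves like a limit of 0: it splits as often as possible and discards trailing empty strings. Let's see how this works.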
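
Before the full examples, here is a minimal sketch of how positive, zero and negative limit values differ. The class name SplitLimitDemo and the helper splitWithLimit are only illustrative names, and the input string is made up for the demonstration.

```java
import java.util.Arrays;

public class SplitLimitDemo {

    // Illustrative wrapper around String.split(regex, limit) so results are easy to compare.
    public static String[] splitWithLimit(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    public static void main(String[] args) {
        String csv = "a,b,c,,"; // note the two trailing commas

        // limit > 0: at most `limit` elements, the last one holds the unsplit remainder
        System.out.println(Arrays.toString(splitWithLimit(csv, ",", 2)));  // [a, b,c,,]

        // limit == 0: split as often as possible, trailing empty strings are dropped
        System.out.println(Arrays.toString(splitWithLimit(csv, ",", 0)));  // [a, b, c]

        // limit < 0: split as often as possible, trailing empty strings are kept
        System.out.println(Arrays.toString(splitWithLimit(csv, ",", -1))); // [a, b, c, , ]
    }
}
```

A negative limit is usually the safer choice when reading fixed-column data, because empty trailing columns are not silently lost.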

Splitting on Space without any Limit.

package codingeekStringTutorials;

public class StringSplittingExample {

	public static void main(String[] args) {

		String string="Welcome to Codingeek. A programmers home."; //String to split
		String array[] = string.split(" "); //Returned Array

		System.out.println("No of splitted tokens - "+ array.length);

		int index=0;
		for(String str: array){
			index++;
			System.out.println("Token "+index+" - "+str);
		}
	}
}

Output:-

No of splitted tokens - 6
Token 1 - Welcome
Token 2 - to
Token 3 - Codingeek.
Token 4 - A
Token 5 - programmers
Token 6 - home.

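In the example below we split on dot (.) using a regex and a limit of 5. After the fourth token the string is not split any further, so the fifth token holds the rest of the string. Note that the two consecutive dots produce an empty third token, and that the dot has to be escaped as "\\." because an unescaped dot is a regex metacharacter that matches any character. You can try other limit values to check that it behaves as described earlier.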

package codingeekStringTutorials;

public class StringSplittingExample {

	public static void main(String[] args) {

		String string="Welcome.to..Codingeek.A.programmers.home"; //String to split
		String array[] = string.split("\\.", 5); //Returned Array- Splitting on dot(.)

		System.out.println("No of splitted tokens - "+ array.length);

		int index=0;
		for(String str: array){
			index++;
			System.out.println("Token "+index+" - "+str);
		}
	}
}

Output:-

No of splitted tokens - 5
Token 1 - Welcome
Token 2 - to
Token 3 - 
Token 4 - Codingeek
Token 5 - A.programmers.home
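
The escaping matters for every regex metacharacter, including the pipe (|) mentioned at the start of this article. The sketch below shows what happens when the dot is left unescaped, and one way to avoid manual escaping with Pattern.quote. The class name SplitSpecialCharsDemo, the helper splitLiteral and the sample strings are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitSpecialCharsDemo {

    // Hypothetical helper: splits on a literal delimiter by quoting it first,
    // so regex metacharacters such as '.' or '|' lose their special meaning.
    public static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // An unescaped dot matches every character, so only empty strings remain
        // and they are all discarded: the result is an empty array.
        System.out.println("a.b.c".split(".").length);                          // 0

        // Escaping by hand and quoting with Pattern.quote both split on the literal character.
        System.out.println(Arrays.toString("a.b.c".split("\\.")));              // [a, b, c]
        System.out.println(Arrays.toString(splitLiteral("id|name|city", "|"))); // [id, name, city]
    }
}
```

Pattern.quote is handy when the delimiter comes from configuration or user input and you cannot know in advance whether it contains regex metacharacters.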

Using StringTokenizer class [ LEGACY ]

As mentioned earlier, this is a legacy class and is not recommended for use in new programs. Oracle's documentation says the same about it:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

Declaration:-

public class StringTokenizer extends Object implements Enumeration<Object>

This class is useful in cases where we have to split the String on several different delimiter characters. Every character of the delimiter argument is treated as a separate delimiter, and a run of consecutive delimiters is treated as a single one, i.e. if I use “abc” as the delimiter argument then it will split wherever it finds “a”, “b” or “c”, or any sequence of them such as “ab”, “ac”, “bc” or “abc”. Empty tokens are never returned.

Let's have a look at the example.

package codingeekStringTutorials;

import java.util.StringTokenizer;

public class StringSplittingExample {

	public static void main(String[] args) {
		String url = "http://www.codingeek.com/java/methods-to-split-or-tokenize-string-into-its-components/";
		StringTokenizer multiTokenizer = new StringTokenizer(url, ":/-"); //Splitting on any of the characters ':', '/' and '-'
		int index=0;
		while (multiTokenizer.hasMoreTokens())
		{
			index++;
		    System.out.println("Token "+index+" - "+multiTokenizer.nextToken());
		}
	}
}

Output:-

Token 1 - http
Token 2 - www.codingeek.com
Token 3 - java
Token 4 - methods
Token 5 - to
Token 6 - split
Token 7 - or
Token 8 - tokenize
Token 9 - string
Token 10 - into
Token 11 - its
Token 12 - components
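
Notice that the "://" after "http" did not produce any empty tokens. The following sketch contrasts this with String.split() on a string with two consecutive dots. The class name TokenizerVsSplitDemo and the helper tokenize are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerVsSplitDemo {

    // Illustrative helper: collects every token produced by StringTokenizer into a list.
    public static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "Welcome.to..Codingeek";

        // StringTokenizer skips the empty token between the two dots
        System.out.println(tokenize(input, "."));              // [Welcome, to, Codingeek]

        // String.split keeps it as an empty string
        System.out.println(Arrays.asList(input.split("\\."))); // [Welcome, to, , Codingeek]
    }
}
```

If empty fields carry meaning in your data, for example a missing column in a CSV row, String.split() is the one that preserves them.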

Using StringUtils.split()

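This one is similar to the first method we discussed. StringUtils is part of the Apache Commons Lang library, so the library has to be added to your project, and it comes with a large set of other String helper functions. It also returns an array of Strings, but the difference is that it works on a String of separator characters and not on a regex, so no regular expression has to be processed and no special characters need escaping. Adjacent separators are treated as one separator, so no empty tokens are returned.

Let's have an example of it-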

package codingeekStringTutorials;

import org.apache.commons.lang3.StringUtils;

public class StringSplittingExample {

	public static void main(String[] args) {
		String string="Welcome to Codingeek. A programmers home"; //String to split
		String array[] = StringUtils.split(string," "); //Returned Array

		System.out.println("No of splitted tokens - "+ array.length);

		int index=0;
		for(String str: array){
			index++;
			System.out.println("Token "+index+" - "+str);
		}
	}
}

Output:-

No of splitted tokens - 6
Token 1 - Welcome
Token 2 - to
Token 3 - Codingeek.
Token 4 - A
Token 5 - programmers
Token 6 - home

Comment if you need any help or if we are wrong anywhere. We will try to help you in the best possible way.
