Java split() Method: How to Split a String
The split method on String breaks text into an array of pieces based on a regular-expression delimiter. It's the go-to for parsing CSV-like lines, paths, URLs and configuration strings.
The basics
String csv = "apple,banana,cherry";
String[] fruits = csv.split(",");
// fruits = {"apple", "banana", "cherry"}
for (String f : fruits) System.out.println(f);
split(regex) takes a regular expression, not a plain substring. This is the number one source of confusion.
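To make the "regular expression, not a plain substring" point concrete, here is a minimal sketch in which the delimiter is a pattern (any run of digits) rather than fixed text. The class name RegexDelimiterDemo, the helper and the sample input are made up for illustration.

```java
import java.util.Arrays;

final class RegexDelimiterDemo {
    // The argument is a pattern: "\\d+" means "one or more digits".
    static String[] splitOnDigits(String input) {
        return input.split("\\d+");
    }

    public static void main(String[] args) {
        // Each run of digits acts as one delimiter.
        System.out.println(Arrays.toString(splitOnDigits("a1b22c333d"))); // [a, b, c, d]
    }
}
```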
Special characters need escaping
Regex metacharacters such as . * + ? ( ) [ ] { } | \ ^ $ must be escaped when you want to match them literally.
// "." matches ANY character, so every character is treated as a delimiter:
"a.b.c".split("."); // wrong: returns an empty array
// Escape the dot:
"a.b.c".split("\\."); // correct: {"a", "b", "c"}
// Same for pipe:
"a|b|c".split("\\|"); // {"a", "b", "c"}
// Safer: use Pattern.quote()
String sep = ".";
"a.b.c".split(Pattern.quote(sep)); // {"a", "b", "c"}
Splitting on whitespace
String text = " hello world ";
String[] words = text.trim().split("\\s+");
// {"hello", "world"}
\\s+ matches one or more whitespace characters (spaces, tabs, newlines). trim() first removes leading/trailing whitespace, otherwise the first token can be empty.
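The effect of skipping trim() is easy to check. The sketch below compares both calls and also guards against blank input, because splitting an empty string returns an array holding one empty string. The class name Words and its helper are hypothetical.

```java
import java.util.Arrays;

final class Words {
    // Hypothetical helper: returns no tokens at all for blank input.
    static String[] words(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Without trim(), the leading space produces an empty first token.
        System.out.println(Arrays.toString(" hello world ".split("\\s+"))); // [, hello, world]
        System.out.println(Arrays.toString(words(" hello world ")));        // [hello, world]
        System.out.println(words("   ").length);                            // 0
    }
}
```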
Limiting the number of splits
The two-argument form caps how many pieces you get; the last piece keeps the remaining text intact.
"a,b,c,d".split(",", 2); // {"a", "b,c,d"}
"a,b,c,d".split(",", 3); // {"a", "b", "c,d"}
"a,b,c,d,".split(",", -1); // {"a", "b", "c", "d", ""}, trailing empty string kept
A negative limit preserves trailing empty strings, which is useful when parsing CSV-like input where "a,b,," should yield 4 fields.
Default behavior drops trailing empty strings
"a,,b,,".split(","); // {"a", "", "b"}, trailing empty strings dropped
"a,,b,,".split(",", -1); // {"a", "", "b", "", ""}, all preserved
This surprise bites CSV parsers constantly. Always use -1 for data parsing.
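A short sketch of how the field count changes with the limit, assuming a comma-separated record. The class name TrailingFields and the helper fields are illustrative only.

```java
import java.util.Arrays;

final class TrailingFields {
    // Hypothetical helper: keeps every field, including trailing empty ones.
    static String[] fields(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("a,b,,".split(","))); // [a, b]
        System.out.println(Arrays.toString(fields("a,b,,")));    // [a, b, , ]
        // A line made only of separators collapses to nothing by default.
        System.out.println(",,".split(",").length);              // 0
        System.out.println(fields(",,").length);                 // 3
    }
}
```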
Multiple delimiters
// Split on any of comma, semicolon, or pipe
"a,b;c|d".split("[,;|]"); // {"a", "b", "c", "d"}
// Split on any non-alphanumeric run
"a, b; c!d".split("\\W+"); // {"a", "b", "c", "d"}
Split by a fixed character sequence
"one---two---three".split("---"); // {"one", "two", "three"}
When the separator is always the same short string and you want maximum performance, avoid regex entirely:
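// Apache Commons Lang: splitByWholeSeparator treats "---" as one literal separator
// (StringUtils.split(str, "---") would treat the second argument as a set of separator characters)
StringUtils.splitByWholeSeparator("one---two---three", "---"); // plain string split, no regex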
// Guava
Iterable<String> parts = Splitter.on("---").split("one---two---three");
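If adding a library is not an option, a literal split can also be sketched with indexOf from the JDK alone. This is a minimal illustration, not a drop-in replacement for either library: the class PlainSplit and its method are hypothetical, and unlike String.split it keeps every empty piece.

```java
import java.util.ArrayList;
import java.util.List;

final class PlainSplit {
    // Splits on a literal separator without touching the regex engine.
    static List<String> splitPlain(String input, String sep) {
        if (sep.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = input.indexOf(sep, start)) >= 0) {
            parts.add(input.substring(start, hit));
            start = hit + sep.length(); // continue after the separator
        }
        parts.add(input.substring(start)); // remainder after the last separator
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitPlain("one---two---three", "---")); // [one, two, three]
        System.out.println(splitPlain("a.b.c", "."));               // [a, b, c]
    }
}
```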
Performance
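String.split goes through the regex engine, and for most patterns it compiles the regex on every call (OpenJDK short-circuits only simple single-character literal separators such as ","). Inside a hot loop, precompile:
import java.util.regex.Pattern;
// Compiled once and reused; this pattern also strips spaces around each comma
static final Pattern COMMA = Pattern.compile("\\s*,\\s*");
for (String line : lines) {
    String[] fields = COMMA.split(line);
    // ...
}
This can be 10-20x faster on long jobs, though the gain depends on the pattern, the input and the JDK version, so measure before relying on it.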
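For reference, here is the same idea as a complete class. It is a sketch under assumptions: the class FieldParser, the method totalFields and the sample lines are invented for illustration, and the pattern also swallows spaces around each comma.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class FieldParser {
    // Compiled once when the class loads, then reused for every line.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Counts fields across all lines, keeping trailing empty fields (-1).
    static int totalFields(List<String> lines) {
        int total = 0;
        for (String line : lines) {
            total += COMMA.split(line, -1).length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a, b ,c", "d,,");
        System.out.println(totalFields(lines)); // 6
    }
}
```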
Streaming with Pattern.splitAsStream
Pattern.compile(",")
.splitAsStream("a,b,c")
.filter(s -> !s.isEmpty())
.forEach(System.out::println);
Avoids allocating an intermediate array.
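When the pieces are needed as a collection rather than printed, the stream can be collected directly. A minimal sketch, with the class name StreamSplit and the helper nonBlankTokens chosen for illustration:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class StreamSplit {
    private static final Pattern COMMA = Pattern.compile(",");

    // Trims each piece and drops the blank ones before collecting.
    static List<String> nonBlankTokens(String input) {
        return COMMA.splitAsStream(input)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonBlankTokens(" a, ,b,, c ")); // [a, b, c]
    }
}
```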
Common patterns
Parse key=value pairs
String[] pair = "user=alice".split("=", 2);
String key = pair[0];
String val = pair.length > 1 ? pair[1] : "";
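The same pattern extends to a whole string of pairs. The sketch below assumes entries are separated by ";" (an assumption, not something the format above dictates); the class KeyValues and its parse method are hypothetical. The limit of 2 keeps any further "=" inside the value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KeyValues {
    // Parses "k1=v1;k2=v2" into an insertion-ordered map.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : input.split(";")) {
            if (entry.isEmpty()) {
                continue; // skip empty entries such as the one in "a=1;;b=2"
            }
            String[] pair = entry.split("=", 2);
            result.put(pair[0], pair.length > 1 ? pair[1] : "");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> cfg = parse("user=alice;flag;url=http://x?a=b");
        System.out.println(cfg.get("user")); // alice
        System.out.println(cfg.get("url"));  // http://x?a=b
    }
}
```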
Lines of text
String[] lines = text.split("\\R"); // matches \n, \r\n, \r and Unicode line terminators
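A quick check of \R against mixed line endings, as a sketch with made-up input. Note that "\r\n" counts as a single terminator, and that trailing blank lines are dropped unless a negative limit is passed.

```java
import java.util.Arrays;

final class Lines {
    // Splits on any line terminator, keeping trailing blank lines.
    static String[] allLines(String text) {
        return text.split("\\R", -1);
    }

    public static void main(String[] args) {
        String mixed = "one\ntwo\r\nthree\rfour";
        System.out.println(Arrays.toString(mixed.split("\\R"))); // [one, two, three, four]
        System.out.println("a\n\nb\n".split("\\R").length);      // 3
        System.out.println(allLines("a\n\nb\n").length);         // 4
    }
}
```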
URL path segments
String[] segments = "/a/b/c".split("/");
// {"", "a", "b", "c"}: leading empty string because of the leading slash
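If the empty segments are not wanted, one option is to filter them out after splitting. A minimal sketch; the class PathSegments and its helper are hypothetical, and real URLs may also need decoding, which is not handled here.

```java
import java.util.ArrayList;
import java.util.List;

final class PathSegments {
    // Returns only the non-empty segments of a slash-separated path.
    static List<String> segments(String path) {
        List<String> out = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segments("/a/b/c"));  // [a, b, c]
        System.out.println(segments("/a//b/"));  // [a, b]
        System.out.println(segments("/"));       // []
    }
}
```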
CSV: don't use split
Real CSV has quoting, embedded commas, escaped quotes, and multiline fields. split(",") handles none of these. Use a proper library: Jackson CSV, Commons CSV, or OpenCSV.
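The failure is easy to reproduce with a single quoted field. The sketch below uses an invented record with three logical fields; the class name CsvPitfall and the helper naiveFields are illustrative.

```java
final class CsvPitfall {
    // Naive approach: treats every comma as a field boundary.
    static String[] naiveFields(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // Three logical fields: 1 | Smith, John | NY
        String line = "1,\"Smith, John\",NY";
        String[] parts = naiveFields(line);
        System.out.println(parts.length); // 4, the quoted field was cut in two
        for (String p : parts) {
            System.out.println("[" + p + "]");
        }
    }
}
```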
Common mistakes
- Forgetting that ., |, +, etc. are regex metacharacters.
- Losing trailing empty fields without -1.
- Using split for real CSV.
- Compiling the same regex over and over in a loop.
- Forgetting the extra empty string when the input starts with the separator.
Used correctly, split handles 90% of text parsing needs in a single line. For the other 10%, reach for a proper parser.