Java substring: Extract Parts of a String
The substring method on String extracts a contiguous slice of characters. It's one of the most-used methods in Java, but its index conventions trip up beginners.
The two forms
String s = "Hello World";
String a = s.substring(6);    // "World": from index 6 to the end
String b = s.substring(0, 5); // "Hello": from 0 (inclusive) to 5 (exclusive)
Key rule: beginIndex is inclusive, endIndex is exclusive. The length of the result is endIndex - beginIndex.
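One way to internalize the rule is to read each call as the half-open interval [beginIndex, endIndex). The sketch below uses a hypothetical class `HalfOpenSlices` to print slices in that notation. It also checks a useful consequence: because [0, k) and [k, length) touch without overlapping, splitting at any k and rejoining rebuilds the original string.

```java
class HalfOpenSlices {
    // Formats a slice in interval notation: begin is included, end is not.
    static String describe(String s, int begin, int end) {
        String slice = s.substring(begin, end);
        return "[" + begin + ", " + end + ") -> \"" + slice + "\" (length " + slice.length() + ")";
    }

    // [0, k) and [k, length) are adjacent and disjoint, so joining them
    // should always give back the original string.
    static boolean splitsCleanly(String s, int k) {
        return (s.substring(0, k) + s.substring(k)).equals(s);
    }

    public static void main(String[] args) {
        String s = "Hello World";
        System.out.println(describe(s, 0, 5));  // [0, 5) -> "Hello" (length 5)
        System.out.println(describe(s, 6, 11)); // [6, 11) -> "World" (length 5)
        for (int k = 0; k <= s.length(); k++) {
            if (!splitsCleanly(s, k)) throw new AssertionError("split failed at " + k);
        }
        System.out.println("every split point 0.." + s.length() + " rebuilds the string");
    }
}
```

This no-gap, no-overlap property is the main practical payoff of the exclusive end index: adjacent slices never need a +1 or -1 correction.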
Index visualization
Position:  0 1 2 3 4  5  6 7 8 9 10
Character: H e l l o ' ' W o r l d
s.substring(0, 5) => "Hello" (indices 0..4)
s.substring(6) => "World" (indices 6..10)
s.substring(6, 11) => "World" (same result)
s.substring(0, s.length()) => whole string
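When a slice comes out wrong, printing the index map for the actual input is often the fastest check. Below is a minimal sketch with a hypothetical helper `IndexTable.render` that produces the same two-row layout as the diagram. The indices it shows are UTF-16 positions, which are what `substring` uses.

```java
class IndexTable {
    // Renders two aligned rows: each index above the char stored at that index.
    static String render(String s) {
        StringBuilder positions = new StringBuilder("Position: ");
        StringBuilder chars = new StringBuilder("Character:");
        for (int i = 0; i < s.length(); i++) {
            String label = String.valueOf(i);
            char c = s.charAt(i);
            String shown = (c == ' ') ? "' '" : String.valueOf(c); // make spaces visible
            int width = Math.max(label.length(), shown.length()) + 1; // +1 leaves a gap
            positions.append(String.format("%" + width + "s", label));
            chars.append(String.format("%" + width + "s", shown));
        }
        return positions + "\n" + chars;
    }

    public static void main(String[] args) {
        System.out.println(render("Hello World"));
    }
}
```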
Common slicing patterns
String s = "hello.world.java";
// First N characters
String head = s.substring(0, 5); // "hello"
// Last N characters
String tail = s.substring(s.length() - 4); // "java"
// Everything after the first dot
int dot = s.indexOf('.');
String rest = s.substring(dot + 1); // "world.java"
// Everything before the last dot (extension stripping)
int lastDot = s.lastIndexOf('.');
String noExt = s.substring(0, lastDot); // "hello.world"
// Between two markers
String xml = "<name>Alice</name>";
int start = xml.indexOf("<name>") + "<name>".length();
int end = xml.indexOf("</name>");
String val = xml.substring(start, end); // "Alice"
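These patterns assume the markers are present. `indexOf` returns -1 when they are not, so "between two markers" can silently compute a bogus start index or throw. The sketch below is a defensive variant using a hypothetical `MarkerSlices.between` helper that returns `Optional.empty()` when a marker is missing. It also starts the search for the closing marker after the opening one, so a stray closing tag earlier in the string does not produce an inverted range.

```java
import java.util.Optional;

class MarkerSlices {
    // Text between the first `open` marker and the next `close` marker after it,
    // or Optional.empty() if either marker is missing.
    static Optional<String> between(String s, String open, String close) {
        int openAt = s.indexOf(open);
        if (openAt < 0) return Optional.empty();
        int start = openAt + open.length();
        int end = s.indexOf(close, start); // search only after the opening marker
        if (end < 0) return Optional.empty();
        return Optional.of(s.substring(start, end));
    }

    public static void main(String[] args) {
        System.out.println(between("<name>Alice</name>", "<name>", "</name>"));      // Optional[Alice]
        System.out.println(between("</name><name>Bob</name>", "<name>", "</name>")); // Optional[Bob]
        System.out.println(between("no markers here", "<name>", "</name>"));         // Optional.empty
    }
}
```

With the unguarded version, the second input would search for `</name>` from the start, find it at index 0, and throw because endIndex < beginIndex.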
Exceptions
Out-of-range arguments throw StringIndexOutOfBoundsException:
"abc".substring(-1);    // throws: beginIndex is negative
"abc".substring(0, 10); // throws: endIndex > length()
"abc".substring(2, 1);  // throws: endIndex < beginIndex
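To see exactly which index pairs are legal, you can probe a short string and catch the exception. The sketch uses a hypothetical `SubstringExceptions.attempt` helper. It catches `IndexOutOfBoundsException`, the supertype named in the `String` Javadoc, and reports the concrete class rather than the message, since message wording varies between JDK versions. Note that `substring(3, 3)` on "abc" is legal and returns the empty string.

```java
class SubstringExceptions {
    // Returns the quoted slice, or "throws <ExceptionClass>" if the indices are invalid.
    static String attempt(String s, int begin, int end) {
        try {
            return "\"" + s.substring(begin, end) + "\"";
        } catch (IndexOutOfBoundsException e) {
            return "throws " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        int[][] cases = {{0, 3}, {-1, 3}, {0, 10}, {2, 1}, {3, 3}};
        for (int[] c : cases) {
            System.out.println("\"abc\".substring(" + c[0] + ", " + c[1] + ") -> "
                    + attempt("abc", c[0], c[1]));
        }
    }
}
```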
Always validate or clamp before slicing untrusted input:
public static String safeSubstring(String s, int start, int end) {
    if (s == null) return "";
    start = Math.max(0, Math.min(start, s.length())); // clamp start into [0, length]
    end = Math.max(start, Math.min(end, s.length())); // clamp end into [start, length]
    return s.substring(start, end);
}
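Clamping hides bad requests: `safeSubstring("abc", 0, 10)` quietly returns "abc". If the caller should know that the indices were wrong, validation fits better. The sketch below is one way to do it, with a hypothetical `ValidatedSlices.trySubstring` that returns `Optional.empty()` for null input or out-of-range indices instead of throwing or shortening the result.

```java
import java.util.Optional;

class ValidatedSlices {
    // Returns the slice only when 0 <= begin <= end <= s.length(); otherwise empty.
    // Unlike clamping, an invalid request is reported rather than silently adjusted.
    static Optional<String> trySubstring(String s, int begin, int end) {
        if (s == null || begin < 0 || end > s.length() || begin > end) {
            return Optional.empty();
        }
        return Optional.of(s.substring(begin, end));
    }

    public static void main(String[] args) {
        System.out.println(trySubstring("abc", 0, 2));  // Optional[ab]
        System.out.println(trySubstring("abc", 0, 10)); // Optional.empty
        System.out.println(trySubstring("abc", 2, 1));  // Optional.empty
    }
}
```

Which style to use depends on the caller: clamping suits display truncation, while validation suits parsing, where a wrong index usually signals malformed input.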
The surrogate pair trap
Strings are indexed by UTF-16 code units, not by characters. An emoji outside the Basic Multilingual Plane occupies two code units (a surrogate pair), and cutting between them leaves an unpaired surrogate, which is malformed UTF-16:
String s = "Hi \uD83D\uDE00 there";   // "Hi 😀 there"
String bad = s.substring(0, 4);       // "Hi \uD83D": half an emoji
// Correct: find a code-point-aligned cutoff
int end = s.offsetByCodePoints(0, 4); // index just past the first 4 code points (5 here)
String good = s.substring(0, end);    // "Hi 😀"
For user content that may contain emoji, use offsetByCodePoints to locate boundaries.
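`offsetByCodePoints` throws `IndexOutOfBoundsException` if the string has fewer code points than requested, so a truncation helper needs a guard for short input. The sketch below assumes the goal is "keep at most N code points" and uses a hypothetical `CodePointTruncation.truncate`. It counts with `codePointCount` first and only then cuts.

```java
class CodePointTruncation {
    // Keeps at most maxCodePoints code points without splitting a surrogate pair.
    static String truncate(String s, int maxCodePoints) {
        if (maxCodePoints < 0) throw new IllegalArgumentException("maxCodePoints < 0");
        // offsetByCodePoints would throw on strings with fewer code points,
        // so short strings are returned unchanged.
        if (s.codePointCount(0, s.length()) <= maxCodePoints) return s;
        int end = s.offsetByCodePoints(0, maxCodePoints);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "Hi \uD83D\uDE00 there"; // "Hi 😀 there"
        String bad = s.substring(0, 4);
        System.out.println(Character.isHighSurrogate(bad.charAt(bad.length() - 1))); // true
        System.out.println(truncate(s, 4).length()); // 5: four code points, five UTF-16 units
        System.out.println(truncate("abc", 10));     // abc
    }
}
```

Code points are still not the same as user-perceived characters: flags and emoji joined with zero-width joiners span several code points, so this helper can split them. For those cases, `java.text.BreakIterator.getCharacterInstance()` is the usual starting point, though how fully it follows Unicode grapheme rules depends on the JDK version.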
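Memory on a modern JVM
In Java 6 and earlier, substring returned a String that shared the original's char array, so holding on to a short slice could keep a very large parent array alive. Since Java 7u6, substring copies the selected characters into a new String. That removes the leak, but every call that extracts a proper slice now allocates and copies.
When you take many slices from very large strings, those allocations add up. In tight loops, work with indices instead, for example by appending character ranges to a StringBuilder or operating on a char[] directly.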
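As an illustration of working with indices, `StringBuilder.append(CharSequence, int, int)` copies a range straight from the source string, so no intermediate substring is created per piece. The sketch below reverses dot-separated segments ("hello.world.java" becomes "java.world.hello") using a hypothetical `SegmentReverser.reverseSegments`. It still allocates the builder and the final String; the saving is only the per-segment substrings, which matters mainly when this runs many times.

```java
class SegmentReverser {
    // Reverses separator-delimited segments by appending index ranges
    // instead of calling substring once per segment.
    static String reverseSegments(String s, char sep) {
        StringBuilder sb = new StringBuilder(s.length());
        int end = s.length();                      // exclusive end of the current segment
        while (true) {
            int cut = s.lastIndexOf(sep, end - 1); // -1 when no separator remains
            sb.append(s, cut + 1, end);            // copies chars [cut + 1, end) directly
            if (cut < 0) break;
            sb.append(sep);
            end = cut;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseSegments("hello.world.java", '.')); // java.world.hello
        System.out.println(reverseSegments("com.example.app", '.'));  // app.example.com
    }
}
```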
Alternatives
split: break on a separator (the argument is a regular expression)
String[] parts = "a,b,c".split(","); // {"a", "b", "c"}
replace / replaceAll: substitute text (replace matches literally; replaceAll takes a regular expression)
"Hello World".replace("World", "Java"); // "Hello Java"
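Pattern / Matcher: regex extraction
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String xml = "<name>Alice</name>";
Matcher m = Pattern.compile("<name>(.*?)</name>").matcher(xml);
if (m.find()) {
    String name = m.group(1); // "Alice": the text captured by (.*?)
}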
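Because split and replaceAll treat their first argument as a regex, a literal "." matches every character. The sketch below demonstrates that pitfall alongside two related behaviors: split drops trailing empty strings unless you pass a negative limit, and a `while (m.find())` loop collects every match rather than just the first. `RegexAlternatives` and its methods are hypothetical names; `Pattern.quote`, `String.split(String, int)`, and `Matcher.find` are standard API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexAlternatives {
    // Splits on a literal separator by quoting it, so "." is not "any character".
    static String[] splitLiteral(String s, String separator) {
        return s.split(Pattern.quote(separator));
    }

    // Collects every <name>...</name> value, not just the first one.
    static List<String> allNames(String xml) {
        List<String> names = new ArrayList<>();
        Matcher m = Pattern.compile("<name>(.*?)</name>").matcher(xml);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String path = "hello.world.java";
        System.out.println(path.split(".").length);                       // 0: "." matches every char
        System.out.println(Arrays.toString(splitLiteral(path, ".")));     // [hello, world, java]
        System.out.println("a,b,,".split(",").length);                    // 2: trailing empties dropped
        System.out.println("a,b,,".split(",", -1).length);                // 4: negative limit keeps them
        System.out.println("1.2.3".replace(".", "-"));                    // 1-2-3 (literal)
        System.out.println("1.2.3".replaceAll(".", "-"));                 // ----- (regex: any char)
        System.out.println(allNames("<name>Alice</name><name>Bob</name>")); // [Alice, Bob]
    }
}
```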
Chaining with trim, toLowerCase etc.
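String trimmed = raw.trim();
String clean = trimmed.substring(0, Math.min(trimmed.length(), 100)).toLowerCase();
Chaining is idiomatic, but compute each bound against the string that substring is actually called on. A tempting one-liner, raw.trim().substring(0, Math.min(raw.length(), 100)), measures raw before trimming: for input such as "  hi" the bound is 4 while the trimmed string has length 2, so substring throws.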
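A small helper keeps the bound tied to the string being sliced, so a chain cannot measure the wrong one. The sketch below uses hypothetical names (`SafeChains.left`, `SafeChains.clean`) and assumes null input should become "". It also passes `Locale.ROOT` to `toLowerCase`, a common choice that keeps results independent of the JVM's default locale.

```java
import java.util.Locale;

class SafeChains {
    // First n chars of s, or all of s if it is shorter. Assumes n >= 0.
    static String left(String s, int n) {
        return s.substring(0, Math.min(n, s.length()));
    }

    // Trim, then bound the *trimmed* string, then lower-case.
    static String clean(String raw, int maxLen) {
        if (raw == null) return "";
        return left(raw.trim(), maxLen).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(clean("  Hi  ", 100));         // hi
        System.out.println(clean("  HELLO WORLD ", 5));   // hello
    }
}
```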
Quick reference
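| Intent | Expression | Notes |
|---|---|---|
| First 10 chars | s.substring(0, Math.min(10, s.length())) | Safe for strings shorter than 10 |
| Last 4 chars | s.substring(Math.max(0, s.length() - 4)) | Without Math.max, throws when s.length() < 4 |
| After first char | s.substring(1) | Throws on an empty string |
| Between two indices | s.substring(a, b) | Requires 0 <= a <= b <= s.length() |
| File extension | s.substring(s.lastIndexOf('.') + 1) | Returns the whole string if there is no dot |
| File name without extension | s.substring(0, s.lastIndexOf('.')) | Throws if there is no dot |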
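The file-name rows behave awkwardly when there is no dot, because `lastIndexOf` returns -1. The sketch below wraps the riskier rows in hypothetical helpers (`QuickRef.last`, `extension`, `baseName`) that handle short strings and dotless names explicitly. It assumes a bare file name: a path like "dir.v2/file" or a dotfile like ".bashrc" would need extra rules.

```java
class QuickRef {
    // Last n chars, or the whole string if it is shorter. Assumes n >= 0.
    static String last(String s, int n) {
        return s.substring(Math.max(0, s.length() - n));
    }

    // Extension without the dot, or "" when the name has no dot.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    // Name without its last extension, or the whole name when there is no dot.
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    public static void main(String[] args) {
        System.out.println(last("abc", 4));                 // abc
        System.out.println(extension("report.tar.gz"));     // gz
        System.out.println(baseName("report.tar.gz"));      // report.tar
        System.out.println("[" + extension("README") + "]"); // []
        System.out.println(baseName("README"));             // README
    }
}
```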
Master the inclusive/exclusive index convention and the surrogate-pair caveat, and substring covers most everyday string-slicing needs in Java.