String Length in Java: length() Explained

Finding the length of a String in Java is straightforward — until you hit emojis. The length() method returns the number of UTF-16 code units, which is not always the number of characters a human would count.

Basic usage

String name = "Alice";
int n = name.length();   // 5

String empty = "";
int e = empty.length();  // 0

length() is a method, so the parentheses are required. Arrays, by contrast, expose length as a field (arr.length, no parentheses).

What length() really counts

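Through its API, a Java String behaves as a sequence of 16-bit UTF-16 code units (the char type), whatever the JVM does internally (since JDK 9, strings containing only Latin-1 characters are stored more compactly). For characters in the Basic Multilingual Plane (most Latin, accented, CJK, and Cyrillic characters), one character = one code unit, so the count matches intuition.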
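
To make "code unit" concrete, here is a minimal sketch that prints each 16-bit unit of a string in hex. The class and method names (CodeUnitDump, codeUnitsHex) are made up for this illustration; only charAt, length, and String.format come from the JDK.

```java
final class CodeUnitDump {
    // Formats each UTF-16 code unit of s as four hex digits, separated by spaces.
    static String codeUnitsHex(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "caf\u00E9"; // "cafe" with a precomposed e-acute (U+00E9)
        System.out.println(word.length());      // 4
        System.out.println(codeUnitsHex(word)); // 0063 0061 0066 00E9
    }
}
```

Each BMP character shows up as exactly one four-digit unit, which is why length() agrees with what you see on screen for this kind of text.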

"hello".length();          // 5
"café".length();           // 4 (é is a single code unit U+00E9)
"日本語".length();          // 3

The emoji trap

Characters above U+FFFF (emojis, rare CJK, ancient scripts) use a surrogate pair — two 16-bit units for one logical character.

String greeting = "Hi 😀";
System.out.println(greeting.length());
// 5 — not 4! The smiley counts as 2

To count code points instead, which is much closer to what a human would count:

int characters = greeting.codePointCount(0, greeting.length());
System.out.println(characters); // 4

If your application handles user input (social media bios, chat messages, names), prefer codePointCount over length() whenever a count or limit is shown to the user, so that a single emoji is not charged as two characters.
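
The difference between the two counts can be checked directly. The sketch below writes the emoji as its explicit surrogate pair so the two code units are visible in the source; SurrogateDemo and its helper methods are illustrative names, not part of the JDK.

```java
final class SurrogateDemo {
    // Number of Unicode code points in s; a surrogate pair counts once.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if s contains at least one surrogate code unit (high or low).
    static boolean hasSurrogates(String s) {
        return s.chars().anyMatch(c -> Character.isSurrogate((char) c));
    }

    public static void main(String[] args) {
        // "Hi " followed by U+1F600 (grinning face), written as its surrogate pair.
        String greeting = "Hi \uD83D\uDE00";
        System.out.println(greeting.length());         // 5
        System.out.println(codePointLength(greeting)); // 4
        System.out.println(hasSurrogates(greeting));   // true
        System.out.println(hasSurrogates("Hi"));       // false
    }
}
```

A check like hasSurrogates can be handy in tests or logging to spot the inputs where length() and codePointCount will disagree.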

Combining characters add another twist

Some "characters" are composed of multiple code points: é can be U+00E9 (precomposed) or U+0065 U+0301 (the letter e followed by a combining acute accent). codePointCount counts the decomposed form as two.

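For visual grapheme clusters (what a user sees as "one character"), use BreakIterator. How multi-code-point emoji sequences are segmented can depend on the JDK version, so test on the runtime you target: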

import java.text.BreakIterator;

String s = "cafe\u0301"; // café (decomposed)
BreakIterator it = BreakIterator.getCharacterInstance();
it.setText(s);
int count = 0;
while (it.next() != BreakIterator.DONE) count++;
System.out.println(count); // 4 grapheme clusters
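
Putting the three notions of length side by side makes the differences easier to see. This is a sketch, with LengthKinds and its methods as hypothetical names. The nfcLength helper is an addition not discussed above: it uses java.text.Normalizer to compose e + accent into a single é before counting, which helps for accents but is not a general substitute for grapheme segmentation.

```java
import java.text.BreakIterator;
import java.text.Normalizer;

final class LengthKinds {
    // Counts user-perceived characters using the default-locale character iterator.
    static int graphemeCount(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    // Length in code units after composing to NFC (e + U+0301 becomes U+00E9).
    static int nfcLength(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC).length();
    }

    public static void main(String[] args) {
        String decomposed = "cafe\u0301"; // e followed by a combining acute accent
        System.out.println(decomposed.length());                               // 5
        System.out.println(decomposed.codePointCount(0, decomposed.length())); // 5
        System.out.println(graphemeCount(decomposed));                         // 4
        System.out.println(nfcLength(decomposed));                             // 4
    }
}
```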

Null and blank

Calling length() on a null reference throws NullPointerException. Guard or use a utility:

int safeLength = (s != null) ? s.length() : 0;

// Or, with Apache Commons Lang (org.apache.commons.lang3.StringUtils)
int len = StringUtils.length(s); // returns 0 for null

// Check blank (null / empty / whitespace-only) — Java 11+
boolean blank = s == null || s.isBlank();

Empty string vs null

An empty string has length zero but is a valid object — you can call methods on it. A null reference is not an object at all. Don't confuse the two.
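
The three cases (null, empty, whitespace-only) are easy to mix up, so here is a small sketch that runs each through the same helpers. NullSafeLength, safeLength, and isNullOrBlank are illustrative names; isBlank() requires Java 11 or later.

```java
final class NullSafeLength {
    // Returns 0 for null instead of throwing NullPointerException.
    static int safeLength(String s) {
        return s == null ? 0 : s.length();
    }

    // True for null, "", and strings made up only of whitespace (Java 11+).
    static boolean isNullOrBlank(String s) {
        return s == null || s.isBlank();
    }

    public static void main(String[] args) {
        String missing = null;
        String empty = "";
        String spaces = "   ";

        System.out.println(safeLength(missing));    // 0
        System.out.println(safeLength(empty));      // 0
        System.out.println(safeLength(spaces));     // 3
        System.out.println(isNullOrBlank(missing)); // true
        System.out.println(isNullOrBlank(spaces));  // true
        System.out.println(empty.isEmpty());        // true, and safe: empty is a real object
    }
}
```

Note that safeLength deliberately hides the difference between null and "". If your code needs to tell "no value" apart from "empty value", check for null explicitly instead.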

Maximum length

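length() returns an int, so the theoretical ceiling for a String is Integer.MAX_VALUE (2,147,483,647) UTF-16 code units. In practice the limit is lower: the backing array cannot quite reach that size, on JDK 9 and later a string containing any non-Latin-1 character needs two bytes per code unit and so tops out at roughly half that, and you will usually exhaust the heap long before either bound.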

length() vs length vs size()

Type        Usage        Notes
String      s.length()   Method, UTF-16 code units
Array       arr.length   Field (no parentheses)
Collection  list.size()  Method, element count
Map         map.size()   Method, key count
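
Seen together in one expression, the four accessors look like this. The sketch assumes Java 9+ for List.of and Map.of; SizeApis and report are names invented for the example.

```java
import java.util.List;
import java.util.Map;

final class SizeApis {
    // Builds a one-line report using the size accessor each type provides.
    static String report(String s, int[] arr, List<?> list, Map<?, ?> map) {
        return "string=" + s.length()   // method on String
             + " array=" + arr.length   // field on arrays, no parentheses
             + " list=" + list.size()   // method on Collection
             + " map=" + map.size();    // method on Map
    }

    public static void main(String[] args) {
        System.out.println(report("Alice", new int[] {1, 2, 3},
                List.of("a", "b"), Map.of("x", 1)));
        // string=5 array=3 list=2 map=1
    }
}
```

Writing arr.length() or s.length without parentheses is a compile-time error, so the compiler catches a mix-up immediately.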

Performance

length() is O(1): it is derived from the size of the backing array rather than by scanning the string. Call it freely in loop conditions; caching it in a local variable is not needed for performance.
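
A typical loop that uses length() as its bound is a walk over code points, where the index advances by one or two code units per step. The following is a sketch; CodePointLoop and countSupplementary are hypothetical names, while codePointAt, Character.charCount, and Character.isSupplementaryCodePoint are standard JDK methods.

```java
final class CodePointLoop {
    // Counts code points above U+FFFF, i.e. those stored as a surrogate pair.
    static int countSupplementary(String s) {
        int count = 0;
        // s.length() is evaluated on every pass; that is fine because it is O(1).
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                count++;
            }
            i += Character.charCount(cp); // advance by 1 or 2 code units
        }
        return count;
    }

    public static void main(String[] args) {
        // "Hi " followed by U+1F600, written as its surrogate pair.
        System.out.println(countSupplementary("Hi \uD83D\uDE00")); // 1
        System.out.println(countSupplementary("hello"));           // 0
    }
}
```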

Common validations

// Username between 3 and 20 characters (code point count for Unicode safety)
int len = username.codePointCount(0, username.length());
if (len < 3 || len > 20) {
    throw new IllegalArgumentException("Invalid length");
}

// Truncate to 280 code points without splitting a surrogate pair
if (message.codePointCount(0, message.length()) > 280) {
    int cutoff = message.offsetByCodePoints(0, 280);
    message = message.substring(0, cutoff);
}
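
Cutting at a code point boundary never splits a surrogate pair, but it can still separate a base letter from its combining accent. If that matters, one possible approach is to cut at grapheme boundaries with BreakIterator instead. The sketch below assumes that behavior is what you want; GraphemeTruncate and truncateGraphemes are illustrative names.

```java
import java.text.BreakIterator;

final class GraphemeTruncate {
    // Returns at most max grapheme clusters from the start of s.
    static String truncateGraphemes(String s, int max) {
        if (s == null || max < 0) {
            throw new IllegalArgumentException("s must be non-null and max >= 0");
        }
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int end = 0;
        for (int seen = 0; seen < max; seen++) {
            int next = it.next();
            if (next == BreakIterator.DONE) {
                return s; // s has fewer than max clusters; nothing to cut
            }
            end = next;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "cafe\u0301"; // e followed by a combining acute accent

        // Code point truncation to 4 keeps the e but drops its accent.
        String byCodePoints = s.substring(0, s.offsetByCodePoints(0, 4));
        System.out.println(byCodePoints.length());            // 4

        // Grapheme truncation to 4 keeps the accent attached to the e.
        System.out.println(truncateGraphemes(s, 4).length()); // 5
        System.out.println(truncateGraphemes(s, 3));          // caf
    }
}
```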

Quick takeaways

  • Use s.length() for UTF-16 code-unit counts, charAt indexing, and loop bounds. It does not count bytes.
  • Use s.codePointCount(0, s.length()) when you need the number of Unicode code points.
  • Use BreakIterator for user-visible grapheme clusters.
  • Always guard against null.

For most code, plain length() is enough. The surrogate-pair trap matters mainly where user-submitted content, and emojis in particular, has to be counted, validated, or truncated.