Question 1

What is a Unicode code point?

Accepted Answer

A code point is the unique number assigned to every character in the Unicode standard — letters, punctuation, currency symbols, mathematical operators, emoji, ancient scripts, you name it. They're written in hexadecimal with a U+ prefix: the letter A is U+0041, the copyright symbol © is U+00A9, and the rocket emoji 🚀 is U+1F680. The number itself doesn't change across languages, operating systems, or programming environments — that universality is the whole point of Unicode.
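As a rough illustration of the U+ notation described above, the sketch below reads the code point of each sample character with the built-in `String.prototype.codePointAt` and formats it in hexadecimal. The helper name `toCodePointLabel` is made up for this example and is not part of any library.

```javascript
// Format a single code point in the conventional U+XXXX notation.
// toCodePointLabel is a hypothetical helper name used only for illustration.
function toCodePointLabel(codePoint) {
  // Pad to at least four hex digits, matching how code points are usually written.
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

// Escapes are used so the exact characters are unambiguous: A, ©, 🚀
const samples = ["A", "\u00A9", "\u{1F680}"];
const labels = samples.map((ch) => toCodePointLabel(ch.codePointAt(0)));

console.log(labels); // [ 'U+0041', 'U+00A9', 'U+1F680' ]
```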

Question 2

Why do emojis have longer code points than regular letters?

Accepted Answer

The original Unicode design allocated code points from U+0000 to U+FFFF, a range now called the Basic Multilingual Plane (BMP). That covers most scripts in use today. When most emoji and many rare historic characters were added later, they went into the Supplementary Planes, which run from U+10000 to U+10FFFF. So the rocket emoji at U+1F680 simply lives higher in the numbering system than the letter A at U+0041. A few older symbols that are now treated as emoji, such as ❤ (U+2764), still sit in the BMP, so the length of a code point mostly reflects when a character was added and which plane it was assigned to.
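Because every plane spans exactly 0x10000 code points, the plane a character belongs to can be worked out with one division. The following is a minimal sketch; `planeOf` is a hypothetical helper name, not a built-in.

```javascript
// planeOf is a hypothetical helper: each plane holds 0x10000 code points,
// so the plane index is the code point divided by 0x10000, rounded down.
function planeOf(codePoint) {
  return Math.floor(codePoint / 0x10000);
}

const planeOfA = planeOf("A".codePointAt(0));              // U+0041
const planeOfRocket = planeOf("\u{1F680}".codePointAt(0)); // U+1F680 🚀
const planeOfLast = planeOf(0x10ffff);                     // highest valid code point

console.log(planeOfA, planeOfRocket, planeOfLast); // 0 1 16
```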

Question 3

What is a surrogate pair, and why does it matter in JavaScript?

Accepted Answer

JavaScript strings are sequences of UTF-16 code units, each 16 bits wide. A 16-bit number can hold 65,536 values: enough for the Basic Multilingual Plane, but not for Supplementary Plane characters like most emoji. To handle those, UTF-16 uses surrogate pairs: a high surrogate (U+D800–U+DBFF) followed by a low surrogate (U+DC00–U+DFFF) that together encode one code point. This is why '🚀'.length returns 2 in JavaScript, not 1: the string is one code point but two code units. This tool uses the ES6 for...of iterator, which understands surrogate pairs and extracts the true single code point (U+1F680) rather than the two halves.
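To make the difference between code units and code points concrete, here is a small sketch using only built-in string methods. It is an illustration of the general technique, not this tool's actual source; the variable names are chosen for the example.

```javascript
const rocket = "\u{1F680}"; // 🚀

// .length counts UTF-16 code units, not characters.
const unitCount = rocket.length;

// charCodeAt exposes the two surrogate halves individually.
const high = rocket.charCodeAt(0); // 0xD83D (high surrogate)
const low = rocket.charCodeAt(1);  // 0xDE80 (low surrogate)

// The standard UTF-16 formula recombines the halves into one code point.
const combined = (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;

// for...of iterates by code point, so it yields a single item here.
const codePoints = [];
for (const ch of rocket) {
  codePoints.push(ch.codePointAt(0));
}

console.log(unitCount);                // 2
console.log(combined.toString(16));    // 1f680
console.log(codePoints.length);        // 1
```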

Question 4

What is a ZWJ sequence?

Accepted Answer

ZWJ stands for Zero-Width Joiner (U+200D). It's an invisible character used to combine multiple separate emoji into one visual rendering. The family emoji 👨‍👩‍👧 is actually five separate code points: the man emoji, a ZWJ, the woman emoji, another ZWJ, and the girl emoji. The browser reads that sequence and renders a single combined graphic. Paste it into this tool and you'll see it unpack into each component, which is useful for understanding why string length checks on emoji sequences are so often wrong.
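The unpacking can be reproduced in a few lines. This sketch builds the family emoji from escapes so the invisible joiners are explicit, then splits it by code point; it shows the general approach and is not taken from the tool itself.

```javascript
// Man, ZWJ, woman, ZWJ, girl: renders as one family emoji where supported.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

// Spreading a string iterates by code point, so surrogate pairs stay together.
const parts = [...family].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(parts);         // [ 'U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F467' ]
console.log(family.length); // 8 (UTF-16 code units)
console.log(parts.length);  // 5 (code points)
```

Neither 8 nor 5 matches the single glyph a reader sees. In environments that provide `Intl.Segmenter`, segmenting by grapheme is one way to count user-perceived characters instead.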

Question 5

When would a developer actually need these code points?

Accepted Answer

A few common situations: writing a regex to block or allow a specific Unicode range (like filtering Cyrillic characters with /[\u0400-\u04FF]/); debugging why a character isn't rendering in a custom font (you check whether the font file has a glyph for that specific U+ value); investigating why a string comparison is failing (invisible characters like soft hyphens, zero-width spaces, or directional marks cause subtle bugs); and encoding special characters in HTML entities or CSS content properties.
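Two of those situations can be sketched briefly: the Cyrillic range check from the regex above, and a scan for invisible characters that break string comparison. The list of invisible characters here is a small, non-exhaustive selection chosen for the example, and `findInvisibles` is a hypothetical helper name.

```javascript
// Matches any character in the Cyrillic block (U+0400 to U+04FF).
const cyrillicBlock = /[\u0400-\u04FF]/;

// Non-exhaustive set of invisible characters: soft hyphen, zero-width
// space/non-joiner/joiner, left-to-right and right-to-left marks, and the BOM.
const invisibles = /[\u00AD\u200B-\u200F\uFEFF]/g;

// findInvisibles is a hypothetical helper that reports each hit and its index.
function findInvisibles(text) {
  return [...text.matchAll(invisibles)].map((m) => ({
    index: m.index,
    codePoint: "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

const withZeroWidthSpace = "user\u200Bname"; // looks like "username" on screen
const plain = "username";

console.log(withZeroWidthSpace === plain);        // false
console.log(findInvisibles(withZeroWidthSpace));  // [ { index: 4, codePoint: 'U+200B' } ]
console.log(cyrillicBlock.test("\u0414\u0430"));  // true ("Да")
console.log(cyrillicBlock.test("hello"));         // false
```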

Question 6

Are there characters that look identical but have different code points?

Accepted Answer

Yes — this is one of the sneakiest bugs in internationalized applications. Homoglyphs are characters from different scripts that look visually identical or nearly identical to the human eye. The Latin letter 'a' (U+0061) and the Cyrillic 'а' (U+0430) are visually indistinguishable in most fonts. Attackers use this to register deceptive domain names or bypass content filters. Pasting suspicious text into this tool will reveal if any characters are not what they appear to be.
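One simple heuristic for spotting this kind of spoofing is to flag text that mixes Latin and Cyrillic letters, using Unicode property escapes (available with the `u` flag since ES2018). The sketch below covers only that one script pair and will flag legitimate mixed-script text too; `mixesLatinAndCyrillic` is a hypothetical helper name. Thorough confusable detection relies on the Unicode Consortium's confusables data rather than a check like this.

```javascript
// Unicode property escapes require the u flag.
const latin = /\p{Script=Latin}/u;
const cyrillic = /\p{Script=Cyrillic}/u;

// mixesLatinAndCyrillic is a hypothetical helper; it only covers this one pair.
function mixesLatinAndCyrillic(text) {
  return latin.test(text) && cyrillic.test(text);
}

const genuine = "paypal";
const spoofed = "p\u0430ypal"; // second letter is Cyrillic а (U+0430), not Latin a

console.log(genuine === spoofed);            // false
console.log(mixesLatinAndCyrillic(genuine)); // false
console.log(mixesLatinAndCyrillic(spoofed)); // true
```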

Code Point	Character	Name	Notes
U+1F468	👨	Man	Supplementary Plane — emoji
U+200D		Zero-Width Joiner	Invisible combiner
U+1F469	👩	Woman	Supplementary Plane — emoji
U+200D		Zero-Width Joiner	Invisible combiner
U+1F467	👧	Girl	Supplementary Plane — emoji

Plane	Range	What lives here
BMP (Plane 0)	U+0000–U+FFFF	Latin, Greek, Cyrillic, Arabic, CJK, most punctuation
SMP (Plane 1)	U+10000–U+1FFFF	Emoji, historic scripts (Linear B, Gothic, Cuneiform)
SIP (Plane 2)	U+20000–U+2FFFF	CJK extension B–F, rare Chinese/Japanese/Korean characters
TIP (Plane 3)	U+30000–U+3FFFF	CJK extension G–H (extremely rare)
Planes 4–13	U+40000–U+DFFFF	Unassigned (reserved for future use)
SSP (Plane 14)	U+E0000–U+EFFFF	Language tags and variation selectors
SPUA-A/B (15–16)	U+F0000–U+10FFFF	Private Use Areas — app-specific glyphs

Text to Code Points

What are Unicode code points?

Reading the output: a practical example

When developers actually need this

Regex character class ranges

Font glyph debugging

Homoglyph / spoofing detection

Invisible character bugs

Emoji-aware string length

HTML entities & CSS content

Why `.length` lies to you

A quick map of Unicode planes

Tips for getting reliable results

Frequently asked questions
