
ASCII Converter & Encoding Reference

Translate plain text into ASCII decimal, hex, binary, or octal codes — and back. Spot hidden control characters, debug encoding mismatches, and verify ASCII-safe strings. Everything runs locally in your browser; nothing is sent to a server.

Decimal · Hex · Binary · Octal · Control char detection · Bidirectional · Zero-server


How it works

What this tool actually does

Every character on a standard US keyboard has a numeric code assigned by the ASCII standard: the letter A is 65, a space is 32, an exclamation mark is 33. This tool makes those numbers visible.

Type in the left box and each character instantly maps to its code in whatever format you choose. Switch to decode mode, paste a string of codes, and recover the original text.
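Both directions come down to looking up a character's code and writing it in a chosen base. The following is a minimal Python illustration, not this tool's browser code. The `encode`/`decode` names, the `FORMATS` table, and the padding widths (two hex digits, eight bits, three octal digits) are assumptions chosen for readability.

```python
# Format name -> (format spec used for output, base used when parsing back).
FORMATS = {
    "decimal": ("d", 10),
    "hex": ("02X", 16),
    "binary": ("08b", 2),
    "octal": ("03o", 8),
}


def encode(text: str, fmt: str = "decimal") -> str:
    """Convert each character to its code, written in the chosen format."""
    spec, _ = FORMATS[fmt]
    return " ".join(format(ord(ch), spec) for ch in text)


def decode(codes: str, fmt: str = "decimal") -> str:
    """Turn space-separated codes in the chosen format back into text."""
    _, base = FORMATS[fmt]
    return "".join(chr(int(token, base)) for token in codes.split())


print(encode("Hi!"))                  # 72 105 33
print(encode("Hi!", "hex"))           # 48 69 21
print(encode("Hi!", "binary"))        # 01001000 01101001 00100001
print(decode("110 151 41", "octal"))  # Hi!
```

Because `ord` returns a Unicode code point, non-ASCII input yields values above 127 that can exceed the padding widths. Decoding is simply the inverse: `int(token, base)` followed by `chr`.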

Why this matters beyond curiosity

Hidden characters — null bytes, carriage returns, zero-width spaces — are invisible in most text editors but show up here as distinct codes. That's what makes this tool useful for debugging, security review, and protocol work, not just education.

Quick reference

Key ASCII code ranges

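The 128 standard ASCII characters (codes 0–127) split into two groups: control characters (0–31 and 127) and printable characters (32–126). Anything at 128 or above falls outside ASCII. Knowing the boundaries lets you spot anomalies at a glance.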

Range	Type	Common examples	Note
0 – 31	Control	NUL (0), TAB (9), LF (10), CR (13), ESC (27)	Non-printable; drive terminal and protocol behavior
32 – 47	Printable	Space (32), ! (33), " (34), # (35), % (37)	Punctuation and symbols
48 – 57	Printable	0–9 (48–57)	Digit characters: their char codes, not numeric values
58 – 64	Printable	: (58), ; (59), < (60), = (61), > (62), ? (63), @ (64)	Punctuation between the digits and uppercase letters
65 – 90	Printable	A (65), B (66) … Z (90)	Uppercase Latin alphabet
91 – 96	Printable	[ (91), \ (92), ] (93), ^ (94), _ (95), ` (96)	Brackets and symbols between the two alphabets
97 – 122	Printable	a (97), b (98) … z (122)	Lowercase Latin; exactly 32 above uppercase counterparts
123 – 126	Printable	{ (123), | (124), } (125), ~ (126)	Last printable characters
127	Control	DEL (127)	Delete; a holdover from punched-tape erasure
128+	Extended	Varies by encoding	Outside standard ASCII; interpreted by active encoding (UTF-8, Latin-1…)

The 32-point gap between uppercase and lowercase is intentional — flipping bit 5 (adding or subtracting 32) toggles case, a trick exploited by early systems for fast conversion.
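The bit flip is easy to see in code. This sketch uses a hypothetical `toggle_case` helper that XORs a character's code with 32 (hex 0x20). It deliberately limits itself to A–Z and a–z, because the trick only holds inside those two ranges.

```python
def toggle_case(ch: str) -> str:
    """Flip bit 5 (value 32, hex 0x20) of an ASCII letter to switch its case."""
    code = ord(ch)
    # Only letters: flipping bit 5 on other characters yields unrelated symbols.
    if 65 <= code <= 90 or 97 <= code <= 122:
        return chr(code ^ 0x20)
    return ch


print("".join(toggle_case(c) for c in "Hello, World"))  # hELLO, wORLD
```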

Who uses this

Real-world use cases

🔐

Security & phishing detection

Paste a suspicious URL or domain name. Any character with a code above 127 is not ASCII at all. In a domain that looks like plain Latin text, such a character could be a Cyrillic or Greek lookalike used in a homograph attack.
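Here is a rough version of that check in Python, assuming you only want to flag characters above 127 and see what they are. `flag_non_ascii` is a hypothetical helper. The standard library's `unicodedata.name` supplies each character's official Unicode name, which makes a lookalike's script obvious.

```python
import unicodedata


def flag_non_ascii(s: str) -> list[tuple[int, str, int, str]]:
    """Return (position, character, code point, Unicode name) for every code above 127."""
    flagged = []
    for i, ch in enumerate(s):
        if ord(ch) > 127:
            flagged.append((i, ch, ord(ch), unicodedata.name(ch, "UNKNOWN")))
    return flagged


# Renders like "paypal.com", but the second letter is Cyrillic.
for pos, ch, code, name in flag_non_ascii("p\u0430ypal.com"):
    print(pos, code, name)  # 1 1072 CYRILLIC SMALL LETTER A
```

This flags every non-ASCII character, including those in legitimate internationalized domains. A hit is a prompt to look closer, not proof of an attack.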

🔌

Embedded & serial debugging

Translate raw byte streams from Arduino, PLCs, or RS-232 devices into readable characters. Decimal codes from a serial monitor become instantly interpretable.
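Here is a minimal sketch of turning serial-monitor output into readable text. The readings are invented for illustration: a hypothetical sensor prints `T=23.5` followed by CR LF. The `render_serial` helper and its `CONTROL_NAMES` table are also hypothetical; they show control codes as tags instead of letting them vanish. The sketch assumes 7-bit ASCII input.

```python
# Short labels for control codes commonly seen on serial lines (illustrative subset).
CONTROL_NAMES = {0: "NUL", 9: "TAB", 10: "LF", 13: "CR", 27: "ESC", 127: "DEL"}


def render_serial(codes: list[int]) -> str:
    """Render decimal byte values as text, showing control codes as <NAME> or <number>."""
    out = []
    for code in codes:
        if code < 32 or code == 127:
            out.append(f"<{CONTROL_NAMES.get(code, code)}>")
        else:
            out.append(chr(code))
    return "".join(out)


print(render_serial([84, 61, 50, 51, 46, 53, 13, 10]))  # T=23.5<CR><LF>
```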

🐛

Diagnosing encoding bugs

Seeing replacement characters (�, code 65533) or garbled text? Paste it here. A pair like Ã (195) followed by © (169) where you expected é (233) is the classic sign of UTF-8 bytes decoded as Latin-1 or Windows-1252: the encoder and decoder disagree.
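The mismatch is easy to reproduce, and reproducing it makes it easier to recognize. This sketch assumes a string that was written as UTF-8 but read as Windows-1252. It produces the Ã© pattern and then reverses it.

```python
original = "café"

# Encode as UTF-8 but decode as Windows-1252: the classic mismatch.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)                          # cafÃ©
print([ord(ch) for ch in garbled[3:]])  # [195, 169]

# Reversing the wrong step recovers the text when no bytes were lost.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)             # True
```

Repair isn't always this clean. Python's strict cp1252 codec has no mapping for a few bytes (0x81, for example) and raises `UnicodeDecodeError` on them.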

📡

Protocol & API work

HTTP headers, SMTP handshakes, and many wire protocols use ASCII control characters. Knowing CR LF = 13 10 makes reading raw packet captures much faster.
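This sketch shows why the CR LF codes matter. It splits a hand-written HTTP/1.1 response on the byte pair 13 10. The response is made up for illustration, and this is not a complete HTTP parser: it ignores chunked bodies, header folding, and error handling.

```python
raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 2\r\n\r\nhi"

CRLF = bytes([13, 10])  # the same two bytes as b"\r\n"

# A blank line (CRLF CRLF) separates the header block from the body.
head, body = raw.split(CRLF + CRLF, 1)
status_line, *header_lines = head.split(CRLF)

print(status_line.decode("ascii"))  # HTTP/1.1 200 OK
print(len(header_lines))            # 2
print(body)                         # b'hi'
```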

🎓

Teaching & learning

The ASCII table is a foundational CS concept. Understanding it demystifies text storage, why case-toggling is a single bit flip, and why '0' ≠ 0.

🏁

CTF challenges

Capture-the-flag puzzles frequently encode flags as decimal or hex ASCII strings. A fast, no-login converter is practically a requirement.

Tips

Getting the most out of it

Detecting hidden characters

Paste text that "looks clean" but behaves oddly: it breaks sorting, fails a regex, or causes database errors. Invisible characters such as zero-width spaces (Unicode 8203, U+200B) and null bytes (code 0) only reveal themselves as distinct codes.
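One way to automate the hunt uses Python's standard `unicodedata` module. The sketch flags every character whose Unicode general category is Cc (control) or Cf (format, which covers zero-width spaces). `find_invisible` is a hypothetical helper name.

```python
import unicodedata


def find_invisible(s: str) -> list[tuple[int, str]]:
    """List (position, U+XXXX) for control (Cc) and format (Cf) characters."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(s)
        if unicodedata.category(ch) in ("Cc", "Cf")
    ]


print(find_invisible("admin\u200b"))    # [(5, 'U+200B')]
print(find_invisible("id\x00=42\r\n"))  # [(2, 'U+0000'), (6, 'U+000D'), (7, 'U+000A')]
```

Tabs and newlines are Cc too. Filter them out if they are expected in your data.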

Checking ASCII-only safety

If a string must be printable 7-bit ASCII (required by certain serial protocols or config parsers), confirm every code falls between 32 and 126. Codes 0–31 and 127 are still ASCII, but they are control characters and usually unwanted; codes 128 and above are outside ASCII entirely.
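The range check takes one line. This sketch contrasts Python's built-in `str.isascii()` (Python 3.7+), which accepts any code from 0 to 127, with a stricter hypothetical `is_printable_ascii` that enforces 32–126.

```python
def is_printable_ascii(s: str) -> bool:
    """True only if every code is in the printable ASCII range 32-126."""
    return all(32 <= ord(ch) <= 126 for ch in s)


print("baud=9600".isascii())               # True  (any code 0-127 passes)
print(is_printable_ascii("baud=9600"))     # True
print("baud=9600\t".isascii())             # True  (TAB is ASCII)
print(is_printable_ascii("baud=9600\t"))   # False (TAB is a control code)
print("naïve".isascii())                   # False (ï is 239)
```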

Using the hex output

Switch to hex when working with memory dumps, hex editors, or network packets. The hex representation of "Hello", 48 65 6C 6C 6F, is the same byte sequence you'll see in Wireshark's packet bytes pane or a GDB memory dump.
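To compare against a capture or debugger, you can have Python produce and parse the same space-separated hex. This sketch uses `bytes.hex` with a separator argument (Python 3.8+) and `bytes.fromhex`.

```python
data = "Hello".encode("ascii")

print(data.hex(" "))          # 48 65 6c 6c 6f
print(data.hex(" ").upper())  # 48 65 6C 6C 6F

# fromhex ignores the spaces and accepts either letter case.
print(bytes.fromhex("48 65 6C 6C 6F").decode("ascii"))  # Hello
```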

Decoding in bulk

In ASCII-to-text mode, the tool accepts space-, comma-, or semicolon-separated values, and even mixed hex (0x48) and decimal in the same string. That helps when you copy code sequences from different sources.
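Here is one way such mixed input could be parsed. The sketch assumes unprefixed tokens are decimal and `0x`-prefixed tokens are hex. The tool's exact parsing rules aren't documented here, and `parse_codes` is a hypothetical name.

```python
import re


def parse_codes(text: str) -> str:
    """Decode codes separated by spaces, commas, or semicolons; 0x-prefixed tokens are hex."""
    chars = []
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:  # empty input produces a single empty token
            continue
        base = 16 if token.lower().startswith("0x") else 10
        chars.append(chr(int(token, base)))  # int() accepts the 0x prefix when base is 16
    return "".join(chars)


print(parse_codes("0x48, 101;108 0x6C,111"))  # Hello
```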

Limitations

What this tool won't do

✕ It doesn't restrict itself to ASCII: non-ASCII characters (emoji, accented letters, CJK) are shown as Unicode code points. Those codes are valid Unicode, not standard ASCII, which stops at 127.
✕ It doesn't perform byte-level UTF-8 encoding. For the raw byte sequence of a UTF-8 string (e.g. how 'é' encodes as the two bytes C3 A9), use a dedicated UTF-8 byte encoder.
✕ It's not a cipher. ASCII encoding is a representation, not encryption; the codes are universally known and trivial to reverse.
✕ Very large inputs (10,000+ characters) may slow the per-character visualizer. The text output itself stays fast.
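This sketch separates a code point (what this tool shows) from UTF-8 bytes (what it doesn't). It prints both for a few characters using only `ord` and `str.encode`. `describe` is a hypothetical helper.

```python
def describe(ch: str) -> str:
    """Contrast a character's Unicode code point with its UTF-8 byte sequence."""
    cp = ord(ch)
    utf8 = ch.encode("utf-8").hex(" ").upper()
    return f"{ch}: code point {cp} (U+{cp:04X}), UTF-8 bytes {utf8}"


for ch in ["A", "é", "€", "😀"]:
    print(describe(ch))
# A: code point 65 (U+0041), UTF-8 bytes 41
# é: code point 233 (U+00E9), UTF-8 bytes C3 A9
# €: code point 8364 (U+20AC), UTF-8 bytes E2 82 AC
# 😀: code point 128512 (U+1F600), UTF-8 bytes F0 9F 98 80
```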

FAQ

Questions worth answering properly

Is ASCII still relevant when we have UTF-8?

Very much so. UTF-8 was deliberately designed to be backward-compatible with ASCII: it encodes the first 128 code points as the same single bytes ASCII uses. Any pure ASCII document is therefore also a valid UTF-8 document, and every standard HTML tag name, HTTP header keyword, and JSON brace is ASCII.
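Here is a quick Python check of the compatibility claim: for text containing only ASCII characters, the ASCII and UTF-8 encodings produce identical bytes. The sample string is arbitrary.

```python
text = 'GET /index.html HTTP/1.1 {"ok": true}'

# For pure ASCII text, both encodings yield byte-for-byte the same output.
print(text.encode("ascii") == text.encode("utf-8"))  # True

# Outside ASCII they diverge: UTF-8 needs two bytes for "é"; ASCII cannot encode it at all.
print(len("é".encode("utf-8")))  # 2
```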

What is a homograph attack, and how does this tool help detect it?

A homograph attack uses characters from non-Latin scripts that are visually indistinguishable from ASCII letters. The Cyrillic letter 'а' (code 1072) and the Latin 'a' (code 97) look the same in most fonts but are entirely different characters. Paste any suspicious string here — any code outside 32–126 is a red flag.
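Browsers and DNS carry internationalized domain names in an ASCII-only form called Punycode, marked by an `xn--` prefix. This sketch applies Python's built-in `idna` codec, which implements the older IDNA 2003 rules, to a spoof containing a Cyrillic 'а'. The sample domain is illustrative.

```python
# The second letter is CYRILLIC SMALL LETTER A (code 1072), not Latin 'a' (97).
spoof = "p\u0430ypal.com"

# The "idna" codec converts each non-ASCII label to an ASCII-only "xn--" label.
ascii_form = spoof.encode("idna").decode("ascii")
print(ascii_form)  # the first label now starts with "xn--"

# A genuinely ASCII domain passes through unchanged.
print("paypal.com".encode("idna").decode("ascii"))  # paypal.com
```

If a domain you expected to be plain ASCII contains an `xn--` label, a lookalike character is very likely present.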

What is Extended ASCII and why was it problematic?

Standard ASCII uses 7 bits (codes 0–127). 'Extended ASCII' refers to various schemes that used the 8th bit for 128 more characters (128–255). The catch: there was never a single standard. IBM code page 850, Windows-1252 (CP-1252), and ISO-8859-1 assign different characters to many of the same codes, so text decoded with the wrong one turns into the infamous garbled 'mojibake'. Unicode, usually stored as UTF-8, solved this by giving every character a single code point.
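Here is a small demonstration of the problem using Python's bundled codecs. The single byte 128 (hex 80) decodes to three different things depending on which legacy code page you assume. `read_as` is a hypothetical helper.

```python
import unicodedata

RAW = bytes([0x80])


def read_as(codec: str) -> str:
    """Decode the single byte 0x80 (decimal 128) with the given legacy code page."""
    return RAW.decode(codec)


for codec in ["cp850", "cp1252", "latin-1"]:
    ch = read_as(codec)
    print(f"{codec:8} U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}")
# cp850    U+00C7 LATIN CAPITAL LETTER C WITH CEDILLA
# cp1252   U+20AC EURO SIGN
# latin-1  U+0080 <control>
```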

Why does the digit '0' have code 48 instead of 0?

ASCII represents the character '0', not the number zero. Codes 0–31 were already reserved for control characters, and the digits were placed at 48–57 (hex 30–39), so the low four bits of each digit's code equal its value. That is why converting a digit character to its numeric value means subtracting 48 (the code of '0'), the core step of classic atoi-style parsers.
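The subtract-48 step is the core of a hand-written parser. This is a minimal sketch, not any particular library's `atoi`. `simple_atoi` is hypothetical, accepts only unsigned digit strings, and rejects anything else.

```python
def simple_atoi(digits: str) -> int:
    """Parse an unsigned decimal string using ord(c) - 48 for each digit."""
    value = 0
    for ch in digits:
        code = ord(ch)
        if not 48 <= code <= 57:  # '0' is 48, '9' is 57
            raise ValueError(f"not a digit: {ch!r}")
        value = value * 10 + (code - 48)
    return value


print(simple_atoi("2024"))  # 2024
print(ord("7") - 48)        # 7
print(ord("7") & 0x0F)      # 7: 48 is 0x30, so the low four bits hold the digit
```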

What was ASCII code 7 (BEL) originally used for?

In the teletype era, sending code 7 rang a physical bell on the machine to alert an operator. Modern terminal emulators still honor this — most produce a system beep or visual flash. Test it in a Unix terminal: printf '\007'

My text has characters above 127. Is that a problem?

It depends on your target system. Codes above 127 are outside standard 7-bit ASCII. In a modern UTF-8 environment they're perfectly valid: accented letters, currency symbols such as € and £, and emoji all live above 127 in Unicode. They're only a problem when the receiving system expects strict 7-bit ASCII, such as some serial protocols or legacy database fields.

Explore the full encoding suite

Pair the ASCII Converter with these tools to cover binary, hex, URL encoding, and beyond.

Binary Converter · Hex Converter · URL Encoder · Base64 Encoder · Unicode to UTF-8
