Unicode Converter

Encode and decode Unicode characters, escape sequences, code points, and HTML entities instantly.

Examples

  • Unicode Escape: \u0048\u0065\u006C\u006C\u006F → Hello
  • Code Points: U+0048 U+0065 U+006C U+006C U+006F → Hello
  • HTML Entities: &#72;&#101;&#108;&#108;&#111; → Hello

What Is Unicode?

Unicode is a universal character encoding standard that assigns a unique number (called a code point) to each character it covers, from Latin letters and Chinese ideographs to emoji and mathematical symbols. Maintained by the Unicode Consortium, the standard defines over 154,000 characters across 168 scripts as of version 16.0.

Before Unicode, dozens of incompatible character sets (ASCII, ISO-8859-1, Shift_JIS, Windows-1252) made multilingual text exchange unreliable. Unicode — and its most common encoding, UTF-8 — solved this by providing a single, consistent mapping used by virtually all modern software, websites, and operating systems.

Encoding Formats Explained

Format                | Syntax    | Example ("A") | Common Use
Unicode Escape        | \uXXXX    | \u0041        | JavaScript, JSON, Java, C#
Code Point            | U+XXXX    | U+0041        | Unicode documentation, specs
HTML Entity (decimal) | &#DDD;    | &#65;         | HTML, XML documents
HTML Entity (hex)     | &#xHH;    | &#x41;        | HTML, XML documents
UTF-8 Bytes           | Hex bytes | 41            | Network protocols, file storage
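
As a rough illustration of how these formats relate, the Python sketch below derives all five from a single character. The function name formats_for and its dictionary keys are made up for this example and are not part of the tool. It also assumes that characters above U+FFFF should be written as a UTF-16 surrogate pair in \uXXXX form, as JavaScript, JSON, and Java do.

```python
def formats_for(ch):
    """Return the five representations from the table for one character."""
    cp = ord(ch)  # the code point as an integer
    # A \uXXXX escape holds four hex digits, so code points above U+FFFF
    # are emitted as two UTF-16 code units (a surrogate pair).
    units = ch.encode("utf-16-be")
    escape = "".join(
        "\\u{:04X}".format(int.from_bytes(units[i:i + 2], "big"))
        for i in range(0, len(units), 2)
    )
    return {
        "unicode_escape": escape,
        "code_point": "U+{:04X}".format(cp),
        "html_decimal": "&#{};".format(cp),
        "html_hex": "&#x{:X};".format(cp),
        "utf8_bytes": ch.encode("utf-8").hex(" ").upper(),
    }


for name, value in formats_for("A").items():
    print(name, value)
```

For "A" this yields the same values as the Example column; for an emoji such as U+1F600 the escape becomes two \u units and the UTF-8 form becomes four bytes.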

How to Use This Tool

  1. Enter your text or encoded string in the Input area.
  2. Click the appropriate conversion button:
    • To Unicode Escape — converts readable text to \uXXXX sequences.
    • From Unicode Escape — decodes escape sequences back to readable text.
    • To/From Code Points — converts between text and U+XXXX notation.
    • To HTML Entities — encodes text as numeric HTML entities for safe embedding.
    • Character Info — shows the code point, UTF-8 bytes, and Unicode name for each character.
  3. View the result and click Copy to copy it to your clipboard.
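
The conversions behind these buttons can be approximated in a few lines of Python. This is only a sketch of the general technique, not the tool's actual implementation: all function names are hypothetical, every character is encoded (including plain ASCII, matching the examples at the top of the page), and the escape decoder assumes well-formed input with no unpaired surrogates.

```python
import re
import unicodedata

_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")
_CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def to_unicode_escape(text):
    # Walk the UTF-16 code units so characters above U+FFFF become surrogate pairs.
    units = text.encode("utf-16-be")
    return "".join(
        "\\u{:04X}".format(int.from_bytes(units[i:i + 2], "big"))
        for i in range(0, len(units), 2)
    )


def from_unicode_escape(escaped):
    # Turn each escape into its code unit, then round-trip through UTF-16
    # so that a surrogate pair merges back into one character.
    raw = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)
    return raw.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


def to_code_points(text):
    return " ".join("U+{:04X}".format(ord(ch)) for ch in text)


def from_code_points(notation):
    return "".join(chr(int(h, 16)) for h in _CODE_POINT.findall(notation))


def to_html_entities(text):
    # Decimal numeric entities for every character.
    return "".join("&#{};".format(ord(ch)) for ch in text)


def char_info(text):
    """Return (character, code point, UTF-8 bytes, Unicode name) per character."""
    return [
        (
            ch,
            "U+{:04X}".format(ord(ch)),
            ch.encode("utf-8").hex(" ").upper(),
            unicodedata.name(ch, "<unnamed>"),
        )
        for ch in text
    ]


print(to_unicode_escape("Hello"))
print(to_code_points("Hello"))
print(to_html_entities("Hello"))
print(char_info("A"))
```

The character names come from the Unicode database bundled with the Python interpreter, so very recently added characters may be reported as unnamed on older versions.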

Common Use Cases

  • Internationalization (i18n): Inspect and debug Unicode strings in multilingual applications.
  • Web Development: Encode special characters as HTML entities to avoid rendering issues and, as part of proper output escaping, to help prevent XSS attacks.
  • JSON/JavaScript: Represent non-ASCII characters as \u escape sequences in JSON strings.
  • Database Debugging: Identify hidden or invisible Unicode characters (zero-width spaces, BOM markers) that cause bugs.
  • Emoji Analysis: Decompose emoji into their constituent code points (many emoji are multi-code-point sequences).
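
For the database-debugging case, one possible way to surface invisible characters is to scan for code points in the Unicode "Cf" (format) general category, which includes the zero-width space (U+200B) and the byte order mark (U+FEFF). The helper name find_invisible is hypothetical, and filtering on "Cf" alone is a simplifying assumption: it will not catch every troublesome character, such as the no-break space (U+00A0).

```python
import unicodedata


def find_invisible(text):
    """Return (index, code point, name) for each format-category character."""
    hits = []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            hits.append(
                (index, "U+{:04X}".format(ord(ch)), unicodedata.name(ch, "<unnamed>"))
            )
    return hits


# A string that starts with a BOM and hides a zero-width space before "c".
suspect = "\ufeffab\u200bc"
for hit in find_invisible(suspect):
    print(hit)
```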

Frequently Asked Questions

What is the difference between Unicode and UTF-8?

Unicode is the standard that assigns code points to characters. UTF-8 is one of several encoding schemes that represent those code points as bytes. UTF-8 uses 1 to 4 bytes per code point, is backwards-compatible with ASCII, and is the dominant encoding on the web (used by over 98% of websites).
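
The variable width is easy to observe in Python. The short sketch below encodes one sample character for each byte length; the helper name utf8_bytes and the choice of sample characters are ours.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of one character as spaced uppercase hex."""
    return ch.encode("utf-8").hex(" ").upper()


# One sample per length: ASCII, Latin-1 range, rest of the BMP, supplementary plane.
for ch in ["A", "\u00E9", "\u20AC", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    print("U+{:04X}".format(ord(ch)), len(encoded), utf8_bytes(ch))

# ASCII compatibility: ASCII text encodes to the same bytes in both encodings.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))
```

The four samples encode to 1, 2, 3, and 4 bytes respectively, and the final comparison prints True.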

Why does a single visible character sometimes consist of multiple code points?

Unicode uses combining characters and emoji sequences where multiple code points render as a single visible glyph. For example, the flag emoji 🇺🇸 is two Regional Indicator symbols (U+1F1FA U+1F1F8), and accented letters like "é" can be either a single precomposed code point (U+00E9) or a base letter plus a combining accent (U+0065 U+0301).
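
Because the two spellings of "é" are different code point sequences, a plain string comparison treats them as unequal. A common remedy is Unicode normalization, sketched here with Python's standard unicodedata module. The helper names code_points and same_text are illustrative, and the choice of NFC over the other normalization forms is an assumption for this example.

```python
import unicodedata


def code_points(text):
    """List the code points in a string in U+XXXX notation."""
    return ["U+{:04X}".format(ord(ch)) for ch in text]


def same_text(a, b):
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00E9"   # single code point
decomposed = "e\u0301"   # base letter plus combining acute accent
flag = "\U0001F1FA\U0001F1F8"  # two Regional Indicator symbols

print(code_points(precomposed), code_points(decomposed))
print(precomposed == decomposed)          # raw comparison
print(same_text(precomposed, decomposed))  # comparison after normalization
print(code_points(flag))
```

Normalization merges combining sequences that have a precomposed equivalent, but it does not collapse emoji sequences: the flag remains two code points in every normalization form.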

Should I use HTML entities or raw Unicode characters in HTML?

If your HTML document uses UTF-8 encoding (the modern default), you can use raw Unicode characters directly. HTML entities are still useful for characters that conflict with HTML syntax (<, >, &) or when you need to ensure compatibility with legacy systems that don't support UTF-8. Use our HTML Entity Encoder for HTML-specific encoding.
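
As a small illustration of escaping only the syntax-significant characters while leaving other Unicode text untouched, the sketch below uses Python's standard html module. The wrapper name escape_for_html is hypothetical, and this covers text placed in HTML element content or quoted attribute values only; other contexts such as inline scripts or URLs need their own escaping rules.

```python
import html


def escape_for_html(text):
    """Escape &, <, >, and quotes; non-ASCII characters pass through unchanged."""
    return html.escape(text, quote=True)


snippet = 'caf\u00E9 <b class="x">Tom & Jerry</b>'
escaped = escape_for_html(snippet)
print(escaped)

# html.unescape reverses the process, including numeric entities.
print(html.unescape(escaped) == snippet)
print(html.unescape("&#72;&#101;&#108;&#108;&#111;"))
```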