HTML UTF-8 Reference Manual

The Unicode Consortium

The Unicode Consortium develops the Unicode Standard. Its goal is to replace the existing character sets with its standard Unicode Transformation Format (UTF).

The Unicode Standard has been a success: it is implemented in HTML, XML, Java, JavaScript, e-mail, ASP, and PHP. It is also supported by many operating systems and all modern browsers.

The Unicode Consortium cooperates with leading standards development organizations such as ISO, W3C, and ECMA.


Unicode character set

Unicode can be implemented by different character sets. The most commonly used encodings are UTF-8 and UTF-16:

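Character set   Description
UTF-8           A character in UTF-8 can be 1 to 4 bytes long. UTF-8 can represent any character in the Unicode Standard. UTF-8 is backward compatible with ASCII. UTF-8 is the preferred encoding for e-mail and web pages.
UTF-16          The 16-bit Unicode Transformation Format is a variable-length character encoding for Unicode that can encode the entire Unicode repertoire. UTF-16 is used mainly in operating systems and environments such as Microsoft Windows, Java, and .NET.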
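The variable lengths described in the table can be checked directly. The following is a minimal Python sketch; the sample characters and the helper name encoded_lengths are illustrative assumptions, not part of the original tutorial.

```python
# Compare how many bytes a character needs in UTF-8 and in UTF-16.
# The sample characters below are arbitrary illustrations.
SAMPLES = [
    "A",           # U+0041, Basic Latin
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the 16-bit range
]


def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-be" is used so that no byte order mark is included in the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in SAMPLES:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")
```

UTF-8 grows from 1 to 4 bytes across these samples, while UTF-16 uses 2 bytes for the first three and 4 bytes (a surrogate pair) for the last one.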

Tip: The first 128 characters of Unicode (which correspond one-to-one with ASCII) are encoded as a single octet with the same binary value as in ASCII, so valid ASCII text is also valid UTF-8-encoded Unicode.
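A short Python sketch of this ASCII compatibility, assuming an arbitrary sample string that contains only ASCII characters:

```python
# Encode the same ASCII-only text as ASCII and as UTF-8 and compare the bytes.
text = "Hello, HTML5!"  # arbitrary sample made up only of ASCII characters

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Each character becomes one octet with the same value in both encodings.
same_bytes = ascii_bytes == utf8_bytes
all_below_128 = all(byte < 128 for byte in utf8_bytes)

print(same_bytes, all_below_128)
```

Both checks hold for any text limited to the first 128 code points; as soon as the text contains a character outside that range, encoding it as ASCII raises an error, while UTF-8 switches to a multi-byte sequence.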

Tip: All HTML 4 processors support UTF-8, and all HTML5 and XML processors support both UTF-8 and UTF-16.


The HTML5 standard: Unicode UTF-8

Because the ISO-8859 character sets are limited in size and incompatible with one another in multilingual environments, the Unicode Consortium developed the Unicode Standard.

The Unicode Standard covers (almost) all characters, punctuation marks, and symbols.

Unicode enables text to be processed, stored, and transported independently of platform and language.

The default character encoding in HTML5 is UTF-8.

Here are some of the UTF-8 character sets supported by HTML5:

Character set                        Decimal   Hexadecimal
C0 Controls and Basic Latin          0-127     0000-007F
C1 Controls and Latin-1 Supplement   128-255   0080-00FF
Latin Extended-A                     256-383   0100-017F
Latin Extended-B                     384-591   0180-024F
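The ranges in the table can be used to look up which character set a given character belongs to. The sketch below is one way to do that in Python; the names BLOCKS and block_of are hypothetical, and only the four ranges listed above are covered.

```python
# The four ranges from the table above, as (first, last, name) tuples.
BLOCKS = [
    (0x0000, 0x007F, "C0 Controls and Basic Latin"),
    (0x0080, 0x00FF, "C1 Controls and Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
]


def block_of(ch):
    """Return the name of the listed range containing ch, or None."""
    code_point = ord(ch)
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None  # the character lies outside the four ranges listed here


# Sample code points chosen for illustration: U+0041, U+00E9, U+0151, U+0192.
for ch in ["A", "\u00e9", "\u0151", "\u0192"]:
    print(f"U+{ord(ch):04X} ({ord(ch)}): {block_of(ch)}")
```

A character such as the euro sign (U+20AC) returns None here, because it belongs to a range that this excerpt of the table does not list.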

If an HTML5 page uses a character set other than UTF-8, it must be specified in the <meta> tag, as follows:

Example

<meta charset="ISO-8859-1">
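To see why the declared character set has to match the bytes actually stored in the page, the following Python sketch decodes the same text with matching and mismatching encodings. It illustrates the principle using Python's own codecs and an arbitrary sample string; it does not model how a particular browser handles the declaration.

```python
original = "caf\u00e9"  # "cafe" with an accented final e, an arbitrary sample

utf8_bytes = original.encode("utf-8")         # accented e becomes 0xC3 0xA9
latin1_bytes = original.encode("iso-8859-1")  # accented e becomes 0xE9

# Reading UTF-8 bytes as ISO-8859-1 silently produces the wrong characters.
misread = utf8_bytes.decode("iso-8859-1")

# Reading ISO-8859-1 bytes as UTF-8 fails: a lone 0xE9 is not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# With the matching character set, the text survives the round trip.
round_trip = latin1_bytes.decode("iso-8859-1")

print(repr(misread), strict_decode_failed, round_trip == original)
```

The misread string contains two characters (U+00C3 and U+00A9) where the original had one, which is the typical symptom of a page whose declared character set does not match its real encoding.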