HTML Character set

A character set is an element of internationalization that maps and translates an alphabet; that is, the characters that are used in a particular language. ... A character set can also be called a coded character set, a code set, a code page, or an encoding.

To display an HTML page correctly, a web browser must know the character set used in the page.

This is specified in the <meta> tag:

<meta charset="UTF-8">
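To illustrate why the browser needs this information, here is a minimal Python sketch that decodes the same bytes with two different character sets. The sample string and the helper name `decode_as` are assumptions made for this example only; this is not how a browser is actually implemented.

```python
def decode_as(raw: bytes, charset: str) -> str:
    """Decode raw bytes with the given charset, replacing undecodable bytes."""
    return raw.decode(charset, errors="replace")

# A sample page fragment, stored as UTF-8 bytes.
text = "café €5"
page_bytes = text.encode("utf-8")

# Decoding with the right charset restores the text;
# decoding with the wrong one produces garbled characters.
for charset in ("utf-8", "cp1252"):
    print(charset, "->", decode_as(page_bytes, charset))
# utf-8 -> café €5
# cp1252 -> cafÃ© â‚¬5
```

The bytes are the same in both cases; only the assumed character set changes, which is what the charset declaration controls.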

The ASCII Character Set

ASCII is an abbreviation of American Standard Code for Information Interchange.

It contains the digits 0 to 9, the uppercase and lowercase English letters from A to Z, and some special characters. ASCII uses the values from 0 to 31 (and 127) for control characters and the values from 32 to 126 for letters, digits, and symbols. Because ASCII is a 7-bit code, it does not define the values from 128 to 255.
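The ranges above can be sketched as a small Python function. The name `ascii_kind` and its return labels are hypothetical, chosen only for this illustration.

```python
def ascii_kind(value: int) -> str:
    """Classify a byte value (0-255) according to the ASCII ranges."""
    if not 0 <= value <= 255:
        raise ValueError("expected a value from 0 to 255")
    if value <= 31 or value == 127:
        return "control"      # e.g. 10 is the line feed character
    if value <= 126:
        return "printable"    # letters, digits, symbols, and the space
    return "not ASCII"        # 128 to 255 are outside the 7-bit range

for value in (10, 65, 127, 200):
    print(value, ascii_kind(value))
# 10 control
# 65 printable
# 127 control
# 200 not ASCII
```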

The ANSI Character Set (Windows-1252)

Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German.

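Windows-1252 was the first default character set in Microsoft Windows. It was the most popular character set in Windows from 1985 to 1990. Windows-1252 is often called ANSI, although it was never an ANSI standard. It is identical to ASCII for the values from 0 to 127. It assigns its own proprietary set of printable characters, such as the euro sign and curly quotation marks, to most of the values from 128 to 159. For the values from 160 to 255 it is identical to ISO-8859-1, and those values match the Unicode code points with the same numbers; UTF-8, however, encodes each of those characters as two bytes.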
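As a rough illustration, the Python sketch below uses the standard `cp1252` codec to show a few of the characters that Windows-1252 places in the 128 to 159 range, and checks that the values from 160 to 255 map to the Unicode code points with the same numbers. The helper name `cp1252_char` is an assumption for this example.

```python
def cp1252_char(value: int) -> str:
    """Return the character Windows-1252 assigns to a single byte value."""
    return bytes([value]).decode("cp1252")

# A few of the proprietary characters in the 128-159 range.
for value in (128, 133, 147, 153):
    ch = cp1252_char(value)
    print(f"{value} -> U+{ord(ch):04X} {ch}")
# 128 -> U+20AC €
# 133 -> U+2026 …
# 147 -> U+201C “
# 153 -> U+2122 ™

# From 160 to 255, the byte value equals the Unicode code point.
same_from_160 = all(cp1252_char(v) == chr(v) for v in range(160, 256))
print(same_from_160)  # True
```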

The ISO-8859-1 Character Set

Latin-1, also called ISO-8859-1, is an 8-bit character set endorsed by the International Organization for Standardization (ISO) and represents the alphabets of Western European languages.

ISO-8859-1 was the default character set in HTML 4.01. ISO (the International Organization for Standardization) defines the standard character sets for different alphabets and languages. ISO-8859-1 is identical to ASCII for the values from 0 to 127. It does not assign printable characters to the values from 128 to 159. For the values from 160 to 255 its characters have the same numbers as the corresponding Unicode code points, although UTF-8 encodes each of them as two bytes.
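One way to see how ISO-8859-1 relates to Windows-1252 is to decode every byte value with both codecs and list the values where the results differ. This is a sketch using Python's built-in codecs; the helper names `decode_or_none` and `differing_values` are hypothetical.

```python
from typing import List, Optional

def decode_or_none(value: int, charset: str) -> Optional[str]:
    """Decode one byte value, or return None if the charset leaves it undefined."""
    try:
        return bytes([value]).decode(charset)
    except UnicodeDecodeError:
        return None

def differing_values() -> List[int]:
    """Byte values that ISO-8859-1 and Windows-1252 interpret differently."""
    return [
        v for v in range(256)
        if decode_or_none(v, "iso-8859-1") != decode_or_none(v, "cp1252")
    ]

diff = differing_values()
print(min(diff), max(diff), len(diff))  # 128 159 32
```

The two character sets agree everywhere except in the 128 to 159 range.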

The UTF-8 Character Set

UTF-8 is a variable-width character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.

The Unicode Consortium develops the Unicode Standard. Its goal is to replace the existing character sets with Unicode and its standard Unicode Transformation Formats (UTF). The Unicode Standard has become a success and is implemented in HTML, XML, Java, JavaScript, e-mail, ASP, PHP, etc. It is also supported in many operating systems and all modern browsers.

UTF-8 is identical to ASCII for the values from 0 to 127: each of those characters is stored as a single byte with the same value. Unicode reserves the code points from 128 to 159 for control characters, so no printable characters appear there. The code points from 160 to 255 hold the same characters as ISO-8859-1 and Windows-1252, but UTF-8 stores each of them as two bytes rather than one. Unicode continues from code point 256 with well over 100,000 further characters, which UTF-8 stores as two to four bytes each.
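The variable width of UTF-8 can be seen by encoding a few characters and counting the bytes. This is a minimal Python sketch; the sample characters and the helper name `utf8_length` are chosen only for illustration.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to store a single character."""
    return len(ch.encode("utf-8"))

for ch in ("A", "é", "€", "😀"):
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} uses {utf8_length(ch)} byte(s): {encoded.hex(' ')}")
# U+0041 uses 1 byte(s): 41
# U+00E9 uses 2 byte(s): c3 a9
# U+20AC uses 3 byte(s): e2 82 ac
# U+1F600 uses 4 byte(s): f0 9f 98 80
```

ASCII characters keep their single-byte values, while characters with higher code points take two, three, or four bytes.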

