HTML Character Set
A character set is an element of internationalization that maps the characters of an alphabet, that is, the characters used in a particular language, to numeric codes. ... A character set can also be called a coded character set, a code set, a code page, or an encoding.
To display an HTML page correctly, a web browser must know the character set used in the page.
This is specified in the <meta> tag, using its charset attribute, for example <meta charset="UTF-8">.
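As a rough illustration of how a program might read this declaration, the sketch below uses Python's standard html.parser module to pull the charset value out of a <meta> tag. The names CharsetFinder, declared_charset, and SAMPLE_PAGE are made up for this example, and a real browser follows a more involved detection procedure than this.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Collects the value of the first <meta charset="..."> declaration."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lower-cased.
        if tag == "meta" and self.charset is None:
            for name, value in attrs:
                if name == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(html_text):
    """Return the declared character set in lower case, or None if absent."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset


# Hypothetical page used only for this illustration.
SAMPLE_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Example</title>
</head>
<body><p>Hello</p></body>
</html>"""

print(declared_charset(SAMPLE_PAGE))
```

A page with no charset declaration makes declared_charset return None, in which case the reader of the page has to fall back on some other source of information or a default.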
The ASCII Character Set
ASCII is an abbreviation of American Standard Code for Information Interchange.
It contains the digits 0 to 9, the uppercase and lowercase English letters A to Z, and some special characters. ASCII uses the values from 0 to 31 (and 127) for control characters, and the values from 32 to 126 for letters, digits, and symbols. ASCII is a 7-bit code, so it does not define the values from 128 to 255.
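The ranges above can be sketched as a small classifier. This is only an illustration in Python. The names ascii_kind and PRINTABLE are invented for the example.

```python
def ascii_kind(value):
    """Classify a byte value (0 to 255) using the ASCII ranges described above."""
    if not 0 <= value <= 255:
        raise ValueError("expected a value from 0 to 255")
    if value <= 31 or value == 127:
        return "control"
    if value <= 126:
        return "printable"
    return "not ASCII"


# The 95 printable ASCII characters, from the space (32) to the tilde (126).
PRINTABLE = "".join(chr(value) for value in range(32, 127))

for value in (9, 48, 65, 97, 127, 200):
    print(value, ascii_kind(value))
print(PRINTABLE)
```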
The ANSI Character Set (Windows-1252)
Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German.
Windows-1252 was the first default character set in Microsoft Windows. It was the most popular character set in Windows from 1985 to 1990. Windows-1252, commonly called ANSI, is identical to ASCII for the values from 0 to 127. It has a proprietary set of characters for the values from 128 to 159. For the values from 160 to 255, it assigns the same characters as ISO-8859-1, with the same character numbers as Unicode, the character set that UTF-8 encodes.
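To make the 128 to 159 range concrete, the sketch below decodes single byte values with Python's "cp1252" codec, which is Python's name for Windows-1252. The helper decode_byte and the other names are invented for this example. Python's codec treats a handful of values in this range as unassigned, which is a property of that codec and may differ in other software.

```python
def decode_byte(value, encoding):
    """Decode one byte value; return None if the codec leaves it unassigned."""
    try:
        return bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        return None


# Values 0 to 127 decode to the same text as ASCII.
LOW_RANGE = bytes(range(128))
SAME_AS_ASCII = LOW_RANGE.decode("cp1252") == LOW_RANGE.decode("ascii")

# Values 128 to 159 hold the characters specific to Windows-1252.
EXTRAS = {value: decode_byte(value, "cp1252") for value in range(128, 160)}
UNASSIGNED = [value for value, char in EXTRAS.items() if char is None]

print("0 to 127 same as ASCII:", SAME_AS_ASCII)
for value in (0x80, 0x85, 0x99):
    # Print the Unicode character number rather than the character itself,
    # so the output is safe on any console.
    print(hex(value), "-> U+%04X" % ord(EXTRAS[value]))
print("unassigned in Python's codec:", [hex(value) for value in UNASSIGNED])
```

Here 0x80 decodes to the euro sign (U+20AC), 0x85 to the horizontal ellipsis (U+2026), and 0x99 to the trademark sign (U+2122).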
The ISO-8859-1 Character Set
Latin-1, also called ISO-8859-1, is an 8-bit character set endorsed by the International Organization for Standardization (ISO) and represents the alphabets of Western European languages.
ISO-8859-1 was the default character set in HTML 4.01. ISO (the International Organization for Standardization) defines the standard character sets for different alphabets/languages. ISO-8859-1 is identical to ASCII for the values from 0 to 127. ISO-8859-1 does not assign printable characters to the values from 128 to 159. For the values from 160 to 255, ISO-8859-1 uses the same character numbers as Unicode, the character set that UTF-8 encodes.
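A short Python check of these properties follows, as a sketch only. It relies on Python's "iso-8859-1" codec, which decodes the values from 128 to 159 as non-printing control characters. The constant names are invented for the example.

```python
import unicodedata

# Decode every possible byte value with ISO-8859-1.
ALL_BYTES = bytes(range(256))
LATIN1_TEXT = ALL_BYTES.decode("iso-8859-1")

# Each byte value equals the Unicode character number of the decoded character.
NUMBERS_MATCH = [ord(char) for char in LATIN1_TEXT] == list(range(256))

# Values 128 to 159 decode to control characters (Unicode category "Cc").
CONTROLS_ONLY = all(
    unicodedata.category(LATIN1_TEXT[value]) == "Cc" for value in range(128, 160)
)

# Values 160 to 255 decode to the same characters as in Windows-1252.
UPPER = bytes(range(160, 256))
UPPER_MATCHES_CP1252 = UPPER.decode("iso-8859-1") == UPPER.decode("cp1252")

print(NUMBERS_MATCH, CONTROLS_ONLY, UPPER_MATCHES_CP1252)
print(hex(0xE9), "->", unicodedata.name(LATIN1_TEXT[0xE9]))
```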
The UTF-8 Character Set
UTF-8 is a variable-width character encoding used for electronic communication. It is defined by the Unicode Standard, and its name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.
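UTF-8 is identical to ASCII for the values from 0 to 127, and each of these characters is stored as the same single byte. Unicode, the character set that UTF-8 encodes, reserves the values from 128 to 159 for control characters and assigns no printable characters to them. Unicode assigns the same characters as both ANSI and 8859-1 to the values from 160 to 255, but UTF-8 stores each of them as two bytes rather than one. Unicode continues from the value 256 with more than 100,000 additional characters, which UTF-8 stores as two to four bytes each.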
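The comparison with the single-byte character sets holds at the level of character numbers, while the bytes actually stored differ. The Python sketch below illustrates this by encoding a few sample characters. SAMPLES, utf8_hex, and is_valid_utf8 are names made up for this example.

```python
# Sample characters written as escapes so the source stays plain ASCII.
SAMPLES = {
    "A": "\u0041",                  # value 65, in the ASCII range
    "e with acute": "\u00e9",       # value 233, in the 160 to 255 range
    "euro sign": "\u20ac",          # value 8364
    "grinning face": "\U0001f600",  # value 128512
}


def utf8_hex(char):
    """Return the UTF-8 bytes of a character as space-separated hex pairs."""
    return " ".join("%02x" % byte for byte in char.encode("utf-8"))


def is_valid_utf8(data):
    """Report whether a byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for label, char in SAMPLES.items():
    print(label, ord(char), utf8_hex(char))

# The single byte 0xE9 is a complete character in ISO-8859-1,
# but on its own it is not valid UTF-8.
print(is_valid_utf8(b"\xe9"), is_valid_utf8(b"\xc3\xa9"))
```

The four samples take one, two, three, and four bytes respectively, which is what "variable-width" means in practice.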