The simplest way to handle this problem is to send the encoding yourself, via your programming language. You might be able to get away with not specifying a character encoding with the META tag as long as your webserver sends the right Content-Type header, but why risk it? Besides, if the user downloads the HTML file, there is no longer any webserver to define the character encoding.

Each UTF is reversible, thus every UTF supports lossless round tripping: mapping from any Unicode coded character sequence S to a sequence of bytes and back will produce S again.

Sometimes developers get around this by adding support for multiple encodings: Big5 when using Chinese, Shift-JIS when using Japanese, something else again when using Greek, and so on.

The first snippet calculates the high (or leading) surrogate from a character code C.

These directives can also be placed in the httpd.conf file for Apache, but in most shared hosting situations you won't be able to edit this file.

Q: Why do some of the UTFs have a BE or LE in their label, such as UTF-16LE?

It will, however, fix the problem we are about to discuss: processing UTF-8 text in PHP. Since Greek is not supported by ISO 8859-1, it will be either ignored or replaced with a question mark: ?

Using the following type definitions. Any U+FEFF would be interpreted as a ZWNBSP.

Q: Why wouldn't I always use a protocol that requires a BOM?

Q: What's the algorithm to convert from UTF-16 to character codes?

However, byte sequences from standard UTF-8 won't interoperate well in an EBCDIC system, because of the different arrangements of control codes between ASCII and EBCDIC.

A: UTF-16 uses a single 16-bit code unit to encode the most common 63K characters, and a pair of 16-bit code units, called surrogates, to encode the 1M less commonly used characters in Unicode.

Due to the aforementioned compatibility issues, a more interoperable way of storing UTF-8 text is to stuff it in a binary datatype.
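
The surrogate arithmetic mentioned above (a leading surrogate computed from a character code C, and the reverse conversion from UTF-16 back to a character code) can be sketched as follows. This is a minimal illustration in Python rather than the snippet the text refers to; the function names `to_surrogates` and `from_surrogates` are hypothetical and chosen only for this example.

```python
from typing import Tuple


def to_surrogates(c: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    (high, low) UTF-16 surrogate pair. Hypothetical helper name."""
    if not 0x10000 <= c <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    v = c - 0x10000               # 20-bit value to spread over two code units
    high = 0xD800 + (v >> 10)     # top 10 bits go into the leading surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits go into the trailing surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(hex(high), hex(low))                # 0xd83d 0xde00
    print(hex(from_surrogates(high, low)))    # 0x1f600
```

Code points below U+10000 need no pair at all: they are stored as a single 16-bit code unit, which is why the sketch rejects them instead of converting them.
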
ASCII is a 7-bit encoding based on the English alphabet.
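
To make the contrast between these encodings concrete, here is a small Python sketch. It assumes only the standard library codecs; the helper names (`round_trips`, `to_latin1_lossy`, `is_seven_bit`) and the Greek sample string are illustrative and not taken from the original text. It shows that the UTFs round-trip losslessly, that Greek text forced into ISO 8859-1 degrades to question marks, and that ASCII bytes never use the eighth bit.

```python
# Greek sample word written with escapes so the code points are unambiguous.
GREEK = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"  # 8 characters


def round_trips(text: str, codec: str) -> bool:
    """True if encoding then decoding with `codec` reproduces `text`."""
    return text.encode(codec).decode(codec) == text


def to_latin1_lossy(text: str) -> bytes:
    """Encode to ISO 8859-1, replacing unsupported characters with '?'."""
    return text.encode("iso-8859-1", errors="replace")


def is_seven_bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits, as all ASCII bytes do."""
    return all(b < 0x80 for b in data)


if __name__ == "__main__":
    for codec in ("utf-8", "utf-16", "utf-32"):
        print(codec, round_trips(GREEK, codec))     # True for each UTF
    print(to_latin1_lossy(GREEK))                   # b'????????'
    print(is_seven_bit("Hello".encode("ascii")))    # True
    print(is_seven_bit(GREEK.encode("utf-8")))      # False
```

The last line also hints at why UTF-8 stays compatible with ASCII: ASCII characters keep their 7-bit byte values, while everything else is encoded with bytes that have the high bit set.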

