A Byte Order Mark (BOM) is a signature at the beginning of a Unicode data stream that may be used by a higher protocol. The signature can indicate whether a data stream is Unicode encoded, and if so, which Unicode Transformation Format (UTF) and byte order are used. The BOM is the character U+FEFF ZERO WIDTH NO-BREAK SPACE (ZWNBSP), which is represented by different byte sequences depending on the UTF:
Byte Sequence   Encoding Form
00 00 FE FF     UTF-32, big-endian
FF FE 00 00     UTF-32, little-endian
FE FF           UTF-16, big-endian
FF FE           UTF-16, little-endian
EF BB BF        UTF-8
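
The table above can be turned into a small detection routine. The following is a minimal sketch in Python, not a definitive implementation: the function name `sniff_bom` and the table `BOM_TABLE` are made up for illustration, while the `codecs.BOM_*` constants come from the standard library. Note that the UTF-32 signatures are checked first, because FF FE 00 00 begins with the UTF-16 little-endian signature FF FE. A stream that really is little-endian UTF-16 starting with U+0000 would be misreported, so the result is only a guess.

```python
import codecs

# Hypothetical lookup table for this sketch. Longest signatures come first:
# the UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
BOM_TABLE = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),   # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),   # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),           # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),   # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),   # FF FE
]


def sniff_bom(data: bytes):
    """Return (codec name, BOM length) or (None, 0) if no BOM is found."""
    for bom, name in BOM_TABLE:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    sample = b"\xef\xbb\xbfhello"
    codec, skip = sniff_bom(sample)
    if codec is not None:
        # Decode the payload after skipping the signature bytes.
        print(codec, sample[skip:].decode(codec))
```

Returning the signature length together with the codec name lets the caller skip the BOM before decoding, so the BOM does not end up in the text.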
If an application does not expect a BOM, it may misinterpret the BOM in various ways. Below are some examples:
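Value    Your Browser   Description
U+BBEF   믯             a Hangul syllable (EF BB read as little-endian UTF-16)
U+EFBB                  a private-use code point (EF BB read as big-endian UTF-16)
U+FEFF                  zero width no-break space (the BOM kept as ordinary text)
U+FFFE                  a noncharacter (the BOM read with the wrong byte order)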
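
The misreadings listed above can be reproduced by decoding BOM bytes with a codec that does not match them. This is a small illustrative sketch; the helper name `misread` is hypothetical, and the explicit `utf-16-le` and `utf-16-be` codecs are used because they do not consume a BOM themselves.

```python
def misread(raw: bytes, codec: str) -> str:
    """Decode raw bytes with the given codec and list the code points."""
    text = raw.decode(codec)
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    # First two bytes of the UTF-8 BOM, read as UTF-16 in either byte order.
    print(misread(b"\xef\xbb", "utf-16-le"))
    print(misread(b"\xef\xbb", "utf-16-be"))
    # A big-endian UTF-16 BOM kept as text instead of being stripped.
    print(misread(b"\xfe\xff", "utf-16-be"))
    # A little-endian UTF-16 BOM read with the wrong byte order.
    print(misread(b"\xff\xfe", "utf-16-be"))
```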
To encode a ZWNBSP as the first character in a data stream that also uses a BOM, start the stream with U+FEFF U+FEFF: the first is consumed as the BOM and the second remains as text. While most (if not all) modern Microsoft applications use a BOM, not all software does. For example, the Java SE 1.4 API treats ZWNBSP like an ordinary character: its Reader classes do not attempt to determine an InputStream's encoding by looking for a BOM signature.
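
The difference between BOM-aware and BOM-unaware readers can be sketched with Python's standard codecs, used here only as a stand-in for the behaviour described above. The `utf-8-sig` codec strips one leading BOM, while the plain `utf-8` codec treats U+FEFF as an ordinary character, much like the Java readers mentioned. The function name `decode_both_ways` is an assumption made for this example.

```python
import codecs


def decode_both_ways(encoded: bytes):
    """Return (BOM-aware text, BOM-unaware text) for UTF-8 input."""
    bom_aware = encoded.decode("utf-8-sig")   # strips a single leading BOM
    bom_unaware = encoded.decode("utf-8")     # keeps every U+FEFF as text
    return bom_aware, bom_unaware


if __name__ == "__main__":
    # The text itself begins with a ZWNBSP, and a BOM is put in front of it,
    # so the stream starts with U+FEFF U+FEFF.
    text = "\ufeffhello"
    stream = codecs.BOM_UTF8 + text.encode("utf-8")

    aware, unaware = decode_both_ways(stream)
    print(len(aware), len(unaware))
```

A BOM-aware reader recovers the original text, including its leading ZWNBSP. A BOM-unaware reader sees one extra U+FEFF at the start, which it must strip itself if the extra character is unwanted.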

Reference(s): FAQ - UTF and BOM http://www.unicode.org/unicode/faq/utf_bom.html
