Encoding Unicode in the obvious way, as 4 bytes per character. To detect byte order, the data is often prefixed with the character 0xfeff (ZERO WIDTH NO-BREAK SPACE), also known as the Byte Order Mark (BOM). Its byte-swapped equivalent, 0xfffe0000, is not a valid Unicode character, so the BOM unambiguously distinguishes the big-endian and little-endian variants.
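A minimal Python sketch of the scheme described above, assuming each character is stored as one 32-bit code point. The helper names `encode_ucs4`, `detect_byte_order` and `decode_ucs4` are hypothetical and made up for illustration. Detection works because reading a big-endian BOM (`00 00 FE FF`) as little-endian yields 0xfffe0000, which can never be a real character, and vice versa.

```python
import struct

BOM = 0xFEFF  # ZERO WIDTH NO-BREAK SPACE, used here as the byte order mark

def encode_ucs4(text: str, big_endian: bool = True) -> bytes:
    """Hypothetical helper: 4 bytes per character, prefixed with a BOM."""
    order = ">" if big_endian else "<"
    code_points = [BOM] + [ord(ch) for ch in text]
    return struct.pack(f"{order}{len(code_points)}I", *code_points)

def detect_byte_order(data: bytes) -> str:
    """Hypothetical helper: return 'big' or 'little' from the leading BOM."""
    if len(data) < 4:
        raise ValueError("data too short to contain a BOM")
    if int.from_bytes(data[:4], "big") == BOM:
        return "big"
    # Read the other way round, the same bytes give 0xfffe0000 (not a character).
    if int.from_bytes(data[:4], "little") == BOM:
        return "little"
    raise ValueError("no byte order mark found")

def decode_ucs4(data: bytes) -> str:
    """Hypothetical helper: strip the BOM and decode 4-byte code points."""
    if len(data) % 4 != 0:
        raise ValueError("length is not a multiple of 4")
    order = detect_byte_order(data)
    # chr() raises ValueError for values above 0x10ffff.
    return "".join(
        chr(int.from_bytes(data[i:i + 4], order)) for i in range(4, len(data), 4)
    )

be = encode_ucs4("A\u00e9", big_endian=True)
le = encode_ucs4("A\u00e9", big_endian=False)
print(be.hex(), detect_byte_order(be))
print(le.hex(), detect_byte_order(le))
print(decode_ucs4(le))
```

Without the BOM, the payload is byte-for-byte what Python's built-in `utf-32-be` / `utf-32-le` codecs produce, so in practice you would use those codecs rather than hand-rolled packing.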
Also called UTF-32, which is exactly the same thing (except for some bogus claim that UTF-32 should not encode any characters greater than 0x10ffff).
There is no reason to use UCS-4 or UTF-16 or any encoding other than UTF-8 anywhere in any program, file, or interface. If you think there is, you should get a clue. Study "combining characters" and other parts of the Unicode standard if you are under the delusion that a fixed-width encoding will somehow make programming easier. Face it: fixed-size characters are gone, and no amount of bits will bring them back. UTF-8 has the advantage of being compatible with ASCII, which is still used for 99.5% of computer text data.
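A small Python sketch of the two points above, using only the standard library. The same visible "é" can be one precomposed code point or "e" followed by a combining character, so even UCS-4 spends a variable number of 4-byte units per user-perceived character. Meanwhile, pure ASCII text encodes to identical bytes in UTF-8. The helper names `utf32_units` and `ascii_is_utf8` are hypothetical.

```python
import unicodedata

def utf32_units(text: str) -> int:
    """Hypothetical helper: number of 4-byte units text occupies in UCS-4/UTF-32."""
    return len(text.encode("utf-32-be")) // 4

def ascii_is_utf8(text: str) -> bool:
    """Hypothetical helper: compare ASCII and UTF-8 bytes (text must be pure ASCII)."""
    return text.encode("ascii") == text.encode("utf-8")

# The same user-visible "é" written two ways.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT

print(utf32_units(precomposed), utf32_units(decomposed))
print(precomposed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(ascii_is_utf8("plain ASCII text"))
```

The two spellings compare unequal until normalized, which is the kind of work a fixed-width encoding does not save you from.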