16-bit characters (as in Unicode) are enough to represent any single language. What they aren't enough to do is represent all languages at the same time, which is what you need in order to mix various Asian languages in the same document (1). To make that possible, Asian encodings use so-called shift codes: some values in the string that would normally be character codes are defined to be an escape. When an escape comes up, the next values are read from the string and combined to find the actual character to use. This allows an arbitrary number of bits per character, but is a pain to program with.
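To give a feel for the escape mechanism, here is a minimal Python sketch. The encoding in it is made up for illustration and is not any real shift-code scheme: I am assuming one byte per ordinary character, a hypothetical escape value of 0xFF, and that the two bytes after an escape are combined into a single 16-bit character code.

```python
ESCAPE = 0xFF  # hypothetical escape value, not taken from any real encoding


def decode(data: bytes) -> list:
    """Decode a byte string into a list of character codes.

    Ordinary bytes stand for themselves. An ESCAPE byte means the next
    two bytes are combined (high byte first) into one character code.
    """
    codes = []
    i = 0
    while i < len(data):
        value = data[i]
        if value == ESCAPE:
            # Both bytes after the escape must be present.
            if i + 2 >= len(data):
                raise ValueError("truncated escape sequence")
            codes.append((data[i + 1] << 8) | data[i + 2])
            i += 3
        else:
            codes.append(value)
            i += 1
    return codes


sample = b"A\xff\x4e\x2dB"
decoded = decode(sample)
```

Here the five bytes in sample decode to only three character codes, so the length of the string no longer tells you how many characters it holds, and you cannot jump to the n-th character without scanning from the start. That is roughly where the pain comes from.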
(1) If I remember correctly, there is an extra constraint, too: people are unwilling to have the same glyph (graphical symbol) encode to the same value when it has different semantic meanings. If we were encoding English with one value per word, that would be like wanting a different value for the "to" in "Go to London" and the "to" in "To be or not to be."