In the context of Unicode, astral planes (officially, supplementary planes) are planes other than the Basic Multilingual Plane. Characters in such planes are called astral characters.

To explain, first a little bit of history. In the bad old days, Unicode characters were 16 bits long, and there were only 65536 characters, period. While this represented a massive improvement over the 7-bit ASCII and 8-bit ISO 8859 series of character sets, it was eventually realized that 16 bits were still not enough. Unicode characters had to be widened. However, this idea was not without its technical challenges.

By this point, a fair bit of software had already been built on the assumption that Unicode characters are 16 bits long, not least the entire Java platform.1 Such a fundamental and sweeping change would break a lot of existing code, and that just wasn't going to fly.

The solution was surrogate pairs. In this scheme, a block of 16-bit codes (0xD800 through 0xDFFF) is reserved as "surrogates": a high surrogate (0xD800 through 0xDBFF) followed by a low surrogate (0xDC00 through 0xDFFF) together carry 20 bits, enough to represent a single character code from 0x10000 up to 0x10FFFF. All other 16-bit codes still represent single characters by themselves, like they did before this transition. The resulting character encoding is called UTF-16. Now you've got a backward-compatible way of having more characters!
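The arithmetic behind surrogate pairs is small enough to sketch. The following Python is a minimal illustration, not taken from any particular library; the function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical. It subtracts 0x10000 from the character code, splits the resulting 20-bit value into two 10-bit halves, and offsets those halves into the high (0xD800 to 0xDBFF) and low (0xDC00 to 0xDFFF) surrogate ranges.

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split an astral code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not an astral code point")
    v = cp - 0x10000              # 20-bit value in 0x00000..0xFFFFF
    high = 0xD800 + (v >> 10)     # top 10 bits go into the high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits go into the low (trail) surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print([hex(u) for u in to_surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```

For U+1F600 (an emoji in plane 1), the pair comes out as 0xD83D 0xDE00, which matches the bytes Python's own `"\U0001F600".encode("utf-16-be")` produces.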

This scheme still divides Unicode into at least two parts: the lower 65536 characters, which fit in a single 16-bit unit, and the remaining characters with codes above that, which need a surrogate pair. For reasons quite possibly related to this, it was decided that Unicode would be divided into character planes, each containing 65536 characters: 17 planes in all, numbered 0 through 16, covering codes up to 0x10FFFF. The first plane, plane 0, containing the original 65536 characters, is known as the Basic Multilingual Plane (BMP).
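Because each plane holds exactly 65536 (0x10000) codes, the plane a character lives in is just its code shifted right by 16 bits. A minimal Python sketch; `plane_of` and `is_astral` are hypothetical helper names, not part of any standard library:

```python
def plane_of(cp: int) -> int:
    """Return the Unicode plane number (0..16) of a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return cp >> 16  # each plane spans 0x10000 = 65536 code points


def is_astral(cp: int) -> bool:
    """True for any code point outside the BMP (plane 0)."""
    return plane_of(cp) != 0


print(plane_of(0x41), plane_of(0x1F600), plane_of(0x10FFFF))  # 0 1 16
```

So "A" (0x41) sits in the BMP, U+1F600 is in plane 1, and the very last code, 0x10FFFF, lands in plane 16.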

So, why the name astral planes? Think about it: the planes other than the BMP are thought of as above it, and special magic tricks (surrogate pairs) are often needed to get to these other planes.


1 Why they didn't use the impressively elegant UTF-8 and avoid this whole stupid surrogate pairs fiasco is beyond me. As originally designed, UTF-8 could encode character codes up to 31 bits long without any special characters or other such garbage (it has since been restricted to the same 0x10FFFF ceiling as UTF-16). Probably the result of design by committee: Unicode's evil twin is the ISO standard ISO/IEC 10646. And ISO... well, that way lies madness.
