Unicode 3.1 (idea) by avjewe

Unicode 3.1 was released in March, 2001, updated to Unicode 3.1.1 in August, 2001, and later updated once again to Unicode 3.1.1 with Corrigendum. The previous version was Unicode 3.0 and the next version is Unicode 3.2.

Unicode 3.1.1 with Corrigendum

This is exactly Unicode 3.1.1, with the addition of Corrigendum #3: U+F951 Normalization (http://www.unicode.org/versions/corrigendum3.html), which states:

The canonical decomposition mapping for U+F951 (陋) was recently found to be in error. The correct mapping is to U+964B (陋). This was printed correctly in Unicode 2.0, but was mistakenly entered as U+96FB (電) in the UnicodeData.txt file, and remained uncorrected in successive versions. This corrigendum fixes that error.
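
The corrected mapping can be observed with a short Python sketch. It assumes the standard `unicodedata` module backed by a Unicode database that postdates the corrigendum, which should hold for any current Python build; the variable names are illustrative only.

```python
import unicodedata

# U+F951 is a CJK compatibility ideograph with a single-character
# canonical decomposition.
compat = "\uF951"

# NFD applies canonical decomposition; with the corrected data the
# result is U+964B rather than the erroneous U+96FB.
decomposed = unicodedata.normalize("NFD", compat)

print(f"U+{ord(compat):04X} -> U+{ord(decomposed):04X}")
```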

Unicode 3.1.1

The Unicode Standard, Version 3.1.1 is defined by: The Unicode Standard, Version 3.0 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5), as amended by the Unicode Standard Annex #27: Unicode 3.1 (http://www.unicode.org/reports/tr27/) and the Unicode 3.1.1 Update Notice (http://www.unicode.org/versions/Unicode3.1.1.html).

Unicode 3.1.1 contains no character additions or major normative changes, only minor changes to a few secondary data files.

Unicode 3.1

The Unicode Standard, Version 3.1.0, is defined by: The Unicode Standard, Version 3.0 (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5), as amended by the Unicode Standard Annex #27: Unicode 3.1 (http://www.unicode.org/reports/tr27/).

Unicode 3.1 adds many characters, and is the first Unicode version to assign characters to the supplementary planes (i.e. code points at U+10000 and above, outside the original 2-byte range). Specifically,

Supplementary Multilingual Plane (SMP) U+10000..U+1FFFF
Supplementary Ideographic Plane (SIP) U+20000..U+2FFFF
Supplementary Special Purpose Plane (SSP) U+E0000..U+EFFFF
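
As a rough illustration of how a code point maps to one of these planes, here is a minimal Python sketch. The helper names `plane_of` and `is_supplementary` are made up for this example and are not part of any library.

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that a code point falls in."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Each plane spans 0x10000 code points, so the plane is the
    # value of the bits above the low 16.
    return code_point >> 16


def is_supplementary(code_point: int) -> bool:
    """True for anything outside plane 0, the BMP."""
    return plane_of(code_point) > 0


for cp in (0x03F4, 0x10330, 0x20000, 0xE0001):
    print(f"U+{cp:04X}: plane {plane_of(cp)}")
```

Plane 1 is the SMP, plane 2 the SIP and plane 14 (hex E) the SSP.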

The Supplementary Multilingual Plane, or Plane 1, contains several historic scripts, and several sets of symbols: Old Italic, Gothic, Deseret, Byzantine Musical Symbols, (Western) Musical Symbols, and Mathematical Alphanumeric Symbols. Together these comprise 1594 newly encoded characters.

The Supplementary Ideographic Plane, or Plane 2, contains a very large collection of additional unified Han ideographs known as Vertical Extension B, comprising 42,711 characters, as well as 542 additional CJK Compatibility ideographs.

The Supplementary Special Purpose Plane, or Plane 14, contains a set of tag characters, 97 in all.

Counting the additions to the three supplementary planes and the two characters on the BMP, Unicode 3.1 adds 44,946 new encoded characters. Together with the 49,194 already existing characters in Unicode 3.0, that comes to a grand total of 94,140 encoded characters in Unicode 3.1.

Of those 94,140 characters, 70,207 are unified Han ideographs, and an additional 832 are CJK Compatibility ideographs -- together slightly more than 75% of the encoded characters in the standard.
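
The totals above can be rechecked with a few lines of Python. The figures are copied from the text; the variable names are only illustrative.

```python
# New characters per plane, as quoted above.
new_by_plane = {
    "SMP": 1594,
    "SIP": 42711 + 542,  # Extension B plus compatibility ideographs
    "SSP": 97,
    "BMP": 2,
}
existing_in_3_0 = 49194

added = sum(new_by_plane.values())
total = existing_in_3_0 + added

# Share of the repertoire taken up by Han ideographs of both kinds.
han_share = (70207 + 832) / total

print(added, total, f"{han_share:.1%}")
```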

There are 34 specific code points in Unicode 3.0 that are characterized as noncharacters: U+nFFFE and U+nFFFF, where n is from 0 to hex 10. Unicode 3.1 adds an additional 32 noncharacters to the BMP at code points U+FDD0 to U+FDEF.
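
A small Python sketch that enumerates both groups of noncharacters, assuming nothing beyond the ranges given above; `is_noncharacter` is a hypothetical helper written for this example.

```python
# The last two code points of each of the 17 planes (0 through hex 10).
plane_end = [
    (plane << 16) | low
    for plane in range(0x11)
    for low in (0xFFFE, 0xFFFF)
]

# The contiguous BMP range added in Unicode 3.1.
bmp_range = list(range(0xFDD0, 0xFDEF + 1))

noncharacters = set(plane_end) | set(bmp_range)


def is_noncharacter(code_point: int) -> bool:
    return code_point in noncharacters


print(len(plane_end), len(bmp_range), len(noncharacters))
```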

Unicode Technical Reports
#11: East Asian Width,
#13: Unicode Newline Guidelines,
#14: Line Breaking Properties,
and #15: Unicode Normalization Forms
have been promoted to the status of Unicode Standard Annex (UAX) and are thus officially part of the Unicode Standard.

Some of the differences between Unicode 3.0 and Unicode 3.1 include:

New Code Blocks

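11 new code blocks were added in 3.1. Each line gives the block's range and name, followed by the number of assigned (or, for the private use areas, usable) code points out of the block's total size.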

U+10300 to U+1032F Old Italic 35/48
U+10330 to U+1034F Gothic 27/32
U+10400 to U+1044F Deseret 76/80
U+1D000 to U+1D0FF Byzantine Musical Symbols 246/256
U+1D100 to U+1D1FF Musical Symbols 219/256
U+1D400 to U+1D7FF Mathematical Alphanumeric Symbols 991/1024
U+20000 to U+2A6DF CJK Unified Ideographs Extension B 42711/42720
U+2F800 to U+2FA1F CJK Compatibility Ideographs Supplement 542/544
U+E0000 to U+E007F Tags 97/128
U+F0000 to U+FFFFF Supplementary Private Use Area A 65534/65536
U+100000 to U+10FFFF Supplementary Private Use Area B 65534/65536
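
The block table can be cross-checked against itself in Python: each block's size should follow from its range, and the assigned counts of the non-private blocks should sum to the characters added on the supplementary planes. This is only a sanity-check sketch; the tuple layout is an assumption made for the example.

```python
# (start, end, name, assigned, size), copied from the table above.
BLOCKS = [
    (0x10300, 0x1032F, "Old Italic", 35, 48),
    (0x10330, 0x1034F, "Gothic", 27, 32),
    (0x10400, 0x1044F, "Deseret", 76, 80),
    (0x1D000, 0x1D0FF, "Byzantine Musical Symbols", 246, 256),
    (0x1D100, 0x1D1FF, "Musical Symbols", 219, 256),
    (0x1D400, 0x1D7FF, "Mathematical Alphanumeric Symbols", 991, 1024),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B", 42711, 42720),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement", 542, 544),
    (0xE0000, 0xE007F, "Tags", 97, 128),
    (0xF0000, 0xFFFFF, "Supplementary Private Use Area A", 65534, 65536),
    (0x100000, 0x10FFFF, "Supplementary Private Use Area B", 65534, 65536),
]

# Blocks whose listed size disagrees with the size implied by the range.
mismatched = [
    name for start, end, name, _, size in BLOCKS if end - start + 1 != size
]

# Characters actually encoded, leaving out the two private use areas.
encoded = sum(
    assigned for _, _, name, assigned, _ in BLOCKS if "Private Use" not in name
)

print(mismatched, encoded)
```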

New Characters

Excluding those in the new code blocks, there were 2 new characters added in Unicode 3.1

Number of characters in each General Category:

Letter, Uppercase          Lu :  1
Letter, Lowercase          Ll :  1

All the characters in this set are in bidirectional category LeftToRight L

The columns below should be interpreted as:

The Unicode code for the character
The character in question
The Unicode name for the character
The Unicode General Category for the character

If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.

Greek and Coptic

Greek symbols

U+03F4 ϴ Greek capital theta symbol Lu: ref U+0472 Cyrillic capital letter fita (Cyrillic)
U+03F5 ϵ Greek lunate epsilon symbol Ll: aka straight epsilon; ref U+220A small element of (Mathematical Operators)

Altered Characters

In addition, 3 characters were altered in 3.1

Runic

U+16EE ᛮ Runic arlaug symbol had its General Category changed from Number, Other to Number, Letter
U+16EF ᛯ Runic tvimadur symbol had its General Category changed from Number, Other to Number, Letter
U+16F0 ᛰ Runic belgthor symbol had its General Category changed from Number, Other to Number, Letter
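
For readers who want to confirm the altered category, the following Python sketch looks up the three characters with the standard `unicodedata` module. It reports whatever Unicode version the running Python ships with, not 3.1 itself, so it assumes the category has not changed again since; `Nl` is the short code for Number, Letter.

```python
import unicodedata

# The three Runic golden number symbols recategorized in Unicode 3.1.
runic = [chr(cp) for cp in (0x16EE, 0x16EF, 0x16F0)]

categories = {ch: unicodedata.category(ch) for ch in runic}

for ch, cat in categories.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} {cat}")
```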

http://unicode.org
