Unicode version 4.0.0 was released in April 2003. The previous version was Unicode 3.2, and the next minor version was Unicode 4.1.

All the gory details can be found at http://www.unicode.org/versions/Unicode4.0.0/

The primary feature of Unicode 4.0 is the addition of 1,226 newly encoded characters.

Version 4.0.1 was released in March 2004 (http://www.unicode.org/versions/Unicode4.0.1/).

The main new features in Unicode 4.0.1 (compared to 4.0.0) are:

  1. The first significant update of the Unihan Database (Unihan.txt) since Unicode 3.2.0, including a large number of fixes and additional data items.
  2. Significant clarifications in four definitions used in conformance.
  3. Unicode Character Database:
    • New character properties: STerm and Variation_Selector
    • Updated significantly: Terminal_Punctuation, Math, Script, and Line_Break
    • Changed: general category of U+200B ZERO WIDTH SPACE (from Zs to Cf)
    • Changed: bidi class of some characters, including +, -, / and FRACTION SLASH
    • Added: property value aliases
    • Revised: formats in some of the data files
  4. Changes in the recommended loose comparison of character name values.
  5. Clearer definition of the encoding of Bengali Reph and Ya-phalaa.
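Item 4 concerns loose matching of character names. As a minimal sketch, the function below implements the rule as it was later stated in UAX #44 (UAX44-LM2: ignore case, whitespace, underscores and medial hyphens, except the hyphen in U+1180 HANGUL JUNGSEONG O-E). That this matches the 4.0.1 wording in every detail is an assumption, the names loose_key and names_match are hypothetical, and "medial" is approximated here as "between two letters or digits".

```python
import re

# The one hyphen that UAX44-LM2 treats as significant (U+1180 HANGUL JUNGSEONG O-E).
_O_E_KEY = "HANGULJUNGSEONGO-E"


def loose_key(name: str) -> str:
    """Return a key such that two names match loosely iff their keys are equal."""
    upper = name.upper()
    # Name with only whitespace and underscores removed, used for the O-E exception.
    compact = re.sub(r"[\s_]", "", upper)
    if compact == _O_E_KEY:
        return _O_E_KEY
    # Drop medial hyphens (approximated as a hyphen between two letters or digits),
    # then drop whitespace and underscores. Hyphens after a space, as in
    # "TIBETAN LETTER -A", are kept.
    no_medial = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", upper)
    return re.sub(r"[\s_]", "", no_medial)


def names_match(a: str, b: str) -> bool:
    return loose_key(a) == loose_key(b)


print(names_match("zero width space", "ZERO-WIDTH_SPACE"))         # True
print(names_match("Hangul Jungseong O-E", "HANGUL JUNGSEONG OE"))  # False
```

The O-E exception exists because U+116C HANGUL JUNGSEONG OE would otherwise collide with U+1180.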

The changes in 4.0.0 since the previous version, Unicode 3.2, are as follows:


New Code Blocks

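15 new code blocks were added in 4.0. Each entry gives the block's code point range and name, followed by the number of characters assigned in 4.0 out of the block's total number of code points.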


U+1900 to U+194F   Limbu 66/80
U+1950 to U+197F   Tai Le 35/48
U+19E0 to U+19FF   Khmer Symbols 32/32
U+1D00 to U+1D7F   Phonetic Extensions 108/128
U+2B00 to U+2BFF   Miscellaneous Symbols and Arrows 14/256
U+4DC0 to U+4DFF   Yijing Hexagram Symbols 64/64
U+10000 to U+1007F   Linear B Syllabary 88/128
U+10080 to U+100FF   Linear B Ideograms 123/128
U+10100 to U+1013F   Aegean Numbers 57/64
U+10380 to U+1039F   Ugaritic 31/32
U+10450 to U+1047F   Shavian 48/48
U+10480 to U+104AF   Osmanya 40/48
U+10800 to U+1083F   Cypriot Syllabary 55/64
U+1D300 to U+1D35F   Tai Xuan Jing Symbols 87/96
U+E0100 to U+E01EF   Variation Selectors Supplement 240/240
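As a consistency check, the list above can be tallied with a short Python sketch. The data is copied from the list; the sketch assumes each range is inclusive and that the second number in each pair is the block size.

```python
# (start, end, name, assigned in 4.0, block size), copied from the list above.
NEW_BLOCKS = [
    (0x1900, 0x194F, "Limbu", 66, 80),
    (0x1950, 0x197F, "Tai Le", 35, 48),
    (0x19E0, 0x19FF, "Khmer Symbols", 32, 32),
    (0x1D00, 0x1D7F, "Phonetic Extensions", 108, 128),
    (0x2B00, 0x2BFF, "Miscellaneous Symbols and Arrows", 14, 256),
    (0x4DC0, 0x4DFF, "Yijing Hexagram Symbols", 64, 64),
    (0x10000, 0x1007F, "Linear B Syllabary", 88, 128),
    (0x10080, 0x100FF, "Linear B Ideograms", 123, 128),
    (0x10100, 0x1013F, "Aegean Numbers", 57, 64),
    (0x10380, 0x1039F, "Ugaritic", 31, 32),
    (0x10450, 0x1047F, "Shavian", 48, 48),
    (0x10480, 0x104AF, "Osmanya", 40, 48),
    (0x10800, 0x1083F, "Cypriot Syllabary", 55, 64),
    (0x1D300, 0x1D35F, "Tai Xuan Jing Symbols", 87, 96),
    (0xE0100, 0xE01EF, "Variation Selectors Supplement", 240, 240),
]


def check_blocks(blocks):
    """Verify each stated block size and return the total number of assigned characters."""
    total = 0
    for start, end, name, assigned, size in blocks:
        # Ranges are inclusive, so a block holds end - start + 1 code points.
        if end - start + 1 != size:
            raise ValueError(f"{name}: range holds {end - start + 1}, list says {size}")
        if assigned > size:
            raise ValueError(f"{name}: more characters than code points")
        total += assigned
    return total


in_new_blocks = check_blocks(NEW_BLOCKS)
elsewhere = 138  # characters added outside the new blocks (see "New Characters")
print(len(NEW_BLOCKS), in_new_blocks, in_new_blocks + elsewhere)  # 15 1088 1226
```

The 1,088 characters in the new blocks plus the 138 added elsewhere give 1,226 new characters in 4.0.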

 

New Characters

Outside the new code blocks, 138 new characters were added in Unicode 4.0.

Number of these characters in each General Category:

Letter, Uppercase        Lu :  5
Letter, Lowercase        Ll : 11
Letter, Other            Lo : 16
Mark, Non-Spacing        Mn : 25
Mark, Spacing Combining  Mc :  1
Number, Other            No : 11
Punctuation, Connector   Pc :  1
Punctuation, Open        Ps :  1
Punctuation, Close       Pe :  1
Punctuation, Other       Po :  2
Symbol, Currency         Sc :  2
Symbol, Modifier         Sk : 17
Symbol, Other            So : 41
Other, Format            Cf :  4

Number of these characters in each Bidirectional Category:

Left To Right                 L : 24
Right To Left Arabic         AL : 14
European Number Terminator   ET :  2
Non Spacing Mark            NSM : 25
Other Neutral                ON : 73
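Both tables can be cross-checked against the characters listed below with Python's standard unicodedata module. This is a sketch: the ranges were transcribed by hand from the listing, and unicodedata reports properties from whatever (later) Unicode version the running Python bundles, so any mismatch it prints may reflect a property change made after 4.0 rather than an error in the tables.

```python
import unicodedata
from collections import Counter

# Inclusive code point ranges of the 138 characters in the listing below.
NEW_OUTSIDE_BLOCKS = [
    (0x0221, 0x0221), (0x0234, 0x0236), (0x02AE, 0x02AF), (0x02EF, 0x02FF),
    (0x0350, 0x0357), (0x035D, 0x035F), (0x03F7, 0x03FB), (0x0600, 0x0603),
    (0x060D, 0x0615), (0x0656, 0x0658), (0x06EE, 0x06EF), (0x06FF, 0x06FF),
    (0x072D, 0x072F), (0x074D, 0x074F), (0x0904, 0x0904), (0x09BD, 0x09BD),
    (0x0A01, 0x0A01), (0x0A03, 0x0A03), (0x0A8C, 0x0A8C), (0x0AE1, 0x0AE3),
    (0x0AF1, 0x0AF1), (0x0B35, 0x0B35), (0x0B71, 0x0B71), (0x0BF3, 0x0BFA),
    (0x0CBC, 0x0CBD), (0x17DD, 0x17DD), (0x17F0, 0x17F9), (0x2053, 0x2054),
    (0x213B, 0x213B), (0x23CF, 0x23D0), (0x24FF, 0x24FF), (0x2614, 0x2615),
    (0x268A, 0x2691), (0x26A0, 0x26A1), (0x321D, 0x321E), (0x3250, 0x3250),
    (0x327C, 0x327D), (0x32CC, 0x32CF), (0x3377, 0x337A), (0x33DE, 0x33DF),
    (0x33FF, 0x33FF), (0xFDFD, 0xFDFD), (0xFE47, 0xFE48), (0x10426, 0x10427),
    (0x1044E, 0x1044F), (0x1D4C1, 0x1D4C1),
]

# Counts from the two tables above (Unicode 4.0 values).
GC_4_0 = {"Lu": 5, "Ll": 11, "Lo": 16, "Mn": 25, "Mc": 1, "No": 11, "Pc": 1,
          "Ps": 1, "Pe": 1, "Po": 2, "Sc": 2, "Sk": 17, "So": 41, "Cf": 4}
BIDI_4_0 = {"L": 24, "AL": 14, "ET": 2, "NSM": 25, "ON": 73}


def expand(ranges):
    """Turn inclusive (start, end) pairs into a flat list of code points."""
    return [cp for start, end in ranges for cp in range(start, end + 1)]


def compare(label, expected, prop, cps):
    """Tally prop() over cps and print every value whose count differs from expected."""
    live = Counter(prop(chr(cp)) for cp in cps)
    for key in sorted(set(expected) | set(live)):
        if expected.get(key, 0) != live[key]:
            print(f"{label} {key}: 4.0 table {expected.get(key, 0)}, "
                  f"UCD {unicodedata.unidata_version}: {live[key]}")
    return live


cps = expand(NEW_OUTSIDE_BLOCKS)
print(len(cps), sum(GC_4_0.values()), sum(BIDI_4_0.values()))  # 138 138 138
gc_live = compare("General Category", GC_4_0, unicodedata.category, cps)
bidi_live = compare("Bidi", BIDI_4_0, unicodedata.bidirectional, cps)
```

Every line printed after the first names a category whose count differs between the 4.0 tables and the interpreter's UCD; if nothing else is printed, that UCD version agrees with both tables.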

The columns in the listings below are:

  1. The Unicode code for the character
  2. The character in question
  3. The Unicode name for the character
  4. The Unicode General Category for the character
  5. The Unicode Bidirectional Category for the character
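A row in this five-column format can be produced with Python's unicodedata module. The helper name listing_row is hypothetical; the UCD gives names in upper case (the listing below uses mixed case), and Python reports properties from its own bundled Unicode version, which is later than 4.0.

```python
import unicodedata


def listing_row(cp: int) -> str:
    """Build one row in the column order described above."""
    ch = chr(cp)
    return " ".join([
        f"U+{cp:04X}",                  # 1. code point
        ch,                             # 2. the character itself
        unicodedata.name(ch, "?"),      # 3. character name ("?" if unnamed)
        unicodedata.category(ch),       # 4. General Category
        unicodedata.bidirectional(ch),  # 5. Bidirectional Category
    ])


print(listing_row(0x0221))  # U+0221 ȡ LATIN SMALL LETTER D WITH CURL Ll L
```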

If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.

 

Latin Extended B

     Miscellaneous additions

U+0221   ȡ   Latin small letter D with curl Ll L
* phonetic use in Sinology

     Additions for Sinology

U+0234   ȴ   Latin small letter L with curl Ll L
U+0235   ȵ   Latin small letter N with curl Ll L
U+0236   ȶ   Latin small letter T with curl Ll L

 

IPA Extensions

     Additions for Sinology

U+02AE   ʮ   Latin small letter turned h with fishhook Ll L
U+02AF   ʯ   Latin small letter turned h with fishhook and tail Ll L

 

Spacing Modifier Letters

     UPA modifiers

U+02EF   ˯   modifier letter low down arrowhead Sk ON
U+02F0   ˰   modifier letter low up arrowhead Sk ON
U+02F1   ˱   modifier letter low left arrowhead Sk ON
U+02F2   ˲   modifier letter low right arrowhead Sk ON
U+02F3   ˳   modifier letter low ring Sk ON
U+02F4   ˴   modifier letter middle grave accent Sk ON
U+02F5   ˵   modifier letter middle double grave accent Sk ON
U+02F6   ˶   modifier letter middle double acute accent Sk ON
U+02F7   ˷   modifier letter low tilde Sk ON
U+02F8   ˸   modifier letter raised colon Sk ON
U+02F9   ˹   modifier letter begin high tone Sk ON
U+02FA   ˺   modifier letter end high tone Sk ON
U+02FB   ˻   modifier letter begin low tone Sk ON
U+02FC   ˼   modifier letter end low tone Sk ON
U+02FD   ˽   modifier letter shelf Sk ON
U+02FE   ˾   modifier letter open shelf Sk ON
U+02FF   ˿   modifier letter low left arrow Sk ON

 

Combining Diacritical Marks

     Additions for the Uralic Phonetic Alphabet

U+0350   ͐   combining right arrowhead above Mn NSM
U+0351   ͑   combining left half ring above Mn NSM
U+0352   ͒   combining fermata Mn NSM
U+0353   ͓   combining x below Mn NSM
U+0354   ͔   combining left arrowhead below Mn NSM
U+0355   ͕   combining right arrowhead below Mn NSM
U+0356   ͖   combining right arrowhead and up arrowhead below Mn NSM
U+0357   ͗   combining right half ring above Mn NSM

     Double diacritics

U+035D   ͝   combining double breve Mn NSM
U+035E   ͞   combining double macron Mn NSM
U+035F   ͟   combining double macron below Mn NSM

 

Greek and Coptic

     Additional archaic letters for Bactrian

U+03F7   Ϸ   Greek capital letter sho Lu L
U+03F8   ϸ   Greek small letter sho Ll L

     Variant letterform

U+03F9   Ϲ   Greek capital lunate sigma symbol Lu L

     Archaic letters

U+03FA   Ϻ   Greek capital letter san Lu L
U+03FB   ϻ   Greek small letter san Ll L

 

Arabic

     Subtending marks

U+0600   ؀   Arabic number sign Cf AL
U+0601   ؁   Arabic sign sanah Cf AL
U+0602   ؂   Arabic footnote marker Cf AL
U+0603   ؃   Arabic sign safha Cf AL

     Punctuation

U+060D   ؍   Arabic date separator Po AL

     Poetic marks

U+060E   ؎   Arabic poetic verse sign So ON
U+060F   ؏   Arabic sign misra So ON

     Honorifics

U+0610   ؐ   Arabic sign sallallahou alayhe wassallam Mn NSM
* represents sallallahu alayhe wasallam "may God's peace and blessings be upon him"
U+0611   ؑ   Arabic sign alayhe assallam Mn NSM
* represents alayhe assalam "upon him be peace"
U+0612   ؒ   Arabic sign rahmatullah alayhe Mn NSM
* represents rahmatullah alayhe "may God have mercy upon him"
U+0613   ؓ   Arabic sign radi allahou anhu Mn NSM
* represents radi allahu 'anhu "may God be pleased with him"
U+0614   ؔ   Arabic sign takhallus Mn NSM
* sign placed over the name or nom-de-plume of a poet, or in some writings used to mark all proper names

     Koranic annotation sign

U+0615   ؕ   Arabic small high tah Mn NSM
* marks a recommended pause position in some Korans published in Iran and Pakistan
* should not be confused with the small TAH sign used as a diacritic for some letters such as U+0679

     Other combining marks

U+0656   ٖ   Arabic subscript alef Mn NSM
U+0657   ٗ   Arabic inverted damma Mn NSM
U+0658   ٘   Arabic mark noon ghunna Mn NSM
* Kashmiri and Baluchi
* indicates nasalization in Urdu

     Extended Arabic letters for Parkari

U+06EE   ۮ   Arabic letter dal with inverted v Lo AL
U+06EF   ۯ   Arabic letter reh with inverted v Lo AL

     Extended Arabic letter for Parkari

U+06FF   ۿ   Arabic letter heh with inverted v Lo AL

 

Syriac

     Persian letters

U+072D   ܭ   Syriac letter persian bheth Lo AL
U+072E   ܮ   Syriac letter persian ghamal Lo AL
U+072F   ܯ   Syriac letter persian dhalath Lo AL

     Sogdian letters

U+074D   ݍ   Syriac letter sogdian zhain Lo AL
U+074E   ݎ   Syriac letter sogdian khaph Lo AL
U+074F   ݏ   Syriac letter sogdian fe Lo AL

 

Devanagari

     Independent vowels

U+0904   ऄ   Devanagari letter short a Lo L

 

Bengali

     Various signs

U+09BD   ঽ   Bengali sign avagraha Lo L

 

Gurmukhi

     Based on ISCII 1988

U+0A01   ਁ   Gurmukhi sign adak bindi Mn NSM
U+0A03   ਃ   Gurmukhi sign visarga Mc L

 

Gujarati

     Independent vowels

U+0A8C   ઌ   Gujarati letter vocalic l Lo L
* used with Sanskrit text

     Additions for use with Sanskrit text

U+0AE1   ૡ   Gujarati letter vocalic ll Lo L
U+0AE2   ૢ   Gujarati vowel sign vocalic l Mn NSM
U+0AE3   ૣ   Gujarati vowel sign vocalic ll Mn NSM

     Currency sign

U+0AF1   ૱   Gujarati rupee sign Sc ET

 

Oriya

     Consonants

U+0B35   ଵ   Oriya letter va Lo L
ref U+0B2C   ବ   Oriya letter ba (Oriya)

     Oriya-specific additions

U+0B71   ୱ   Oriya letter wa Lo L
ref U+0B13   ଓ   Oriya letter O (Oriya)
ref U+0B35   ଵ   Oriya letter va (Oriya)

 

Tamil

     Tamil symbols

U+0BF3   ௳   Tamil day sign So ON
U+0BF4   ௴   Tamil month sign So ON
U+0BF5   ௵   Tamil year sign So ON
U+0BF6   ௶   Tamil debit sign So ON
U+0BF7   ௷   Tamil credit sign So ON
U+0BF8   ௸   Tamil as above sign So ON

     Currency symbol

U+0BF9   ௹   Tamil rupee sign Sc ET

     Tamil symbol

U+0BFA   ௺   Tamil number sign So ON

 

Kannada

     Various signs

U+0CBC   ಼   Kannada sign nukta Mn NSM
U+0CBD   ಽ   Kannada sign avagraha Lo L

 

Khmer

     Various signs

U+17DD   ៝   Khmer sign atthacan Mn NSM
* mostly obsolete
* indicates that the base character is the final consonant of a word with its inherent vowel sound
ref U+17D1   ៑   Khmer sign viriam (Khmer)

     Numeric symbols for divination lore

U+17F0   ៰   Khmer symbol lek attak son No ON
U+17F1   ៱   Khmer symbol lek attak muoy No ON
U+17F2   ៲   Khmer symbol lek attak pii No ON
U+17F3   ៳   Khmer symbol lek attak bei No ON
U+17F4   ៴   Khmer symbol lek attak buon No ON
U+17F5   ៵   Khmer symbol lek attak pram No ON
U+17F6   ៶   Khmer symbol lek attak pram muoy No ON
U+17F7   ៷   Khmer symbol lek attak pram pii No ON
U+17F8   ៸   Khmer symbol lek attak pram bei No ON
U+17F9   ៹   Khmer symbol lek attak pram buon No ON

 

General Punctuation

     General punctuation

U+2053   ⁓   swung dash Po ON
U+2054   ⁔   inverted undertie Pc ON

 

Letterlike Symbols

     Additional letterlike symbols

U+213B   ℻   facsimile sign So ON
ref U+2121   ℡   telephone sign (Letterlike Symbols)

 

Miscellaneous Technical

     Keyboard and UI symbols

U+23CF   ⏏   eject symbol So ON
* UI symbol to eject media

     Special character extension

U+23D0   ⏐   vertical line extension So ON
* used for extension of arrows
ref U+23AF   ⎯   horizontal line extension (Miscellaneous Technical)

 

Enclosed Alphanumerics

     Additional white on black circled number

U+24FF   ⓿   negative circled digit zero No ON
ref U+2776   ❶   dingbat negative circled digit one (Dingbats)

 

Miscellaneous Symbols

     Weather symbol

U+2614   ☔   umbrella with rain drops So ON
aka showery weather

     Miscellaneous symbol

U+2615   ☕   hot beverage So ON
aka tea or coffee, depending on locale
* can be used to indicate a wait
ref U+231A   ⌚   watch (Miscellaneous Technical)
ref U+231B   ⌛   hourglass (Miscellaneous Technical)

     Yijing monogram and digram symbols

U+268A   ⚊   monogram for yang So ON
U+268B   ⚋   monogram for yin So ON
U+268C   ⚌   digram for greater yang So ON
U+268D   ⚍   digram for lesser yin So ON
U+268E   ⚎   digram for lesser yang So ON
U+268F   ⚏   digram for greater yin So ON

     Map markers

U+2690   ⚐   white flag So ON
U+2691   ⚑   black flag So ON

     Warning signs

U+26A0   ⚠   warning sign So ON
U+26A1   ⚡   high voltage sign So ON

 

Enclosed CJK Letters and Months

     Parenthesized Korean words

U+321D   ㈝   parenthesized Korean character ojeon So ON
U+321E   ㈞   parenthesized Korean character o hu So ON

     Squared Latin abbreviation

U+3250   ㉐   partnership sign So ON

     Circled Korean words

U+327C   ㉼   circled Korean character chamko So ON
U+327D   ㉽   circled Korean character jueui So ON

     Squared Latin abbreviations

U+32CC   ㋌   square hg So ON
U+32CD   ㋍   square erg So ON
U+32CE   ㋎   square ev So ON
U+32CF   ㋏   limited liability sign So ON

 

CJK Compatibility

     Squared Latin abbreviations

U+3377   ㍷   square dm So ON
U+3378   ㍸   square dm squared So ON
U+3379   ㍹   square dm cubed So ON
U+337A   ㍺   square iu So ON

     Squared Latin abbreviations

U+33DE   ㏞   square v over m So ON
U+33DF   ㏟   square a over m So ON

     Squared Latin abbreviation

U+33FF   ㏿   square gal So ON

 

Arabic Presentation Forms A

     Symbol

U+FDFD   ﷽   Arabic ligature bismillah ar rahman ar raheem So ON

 

CJK Compatibility Forms

     Glyphs for vertical variants

U+FE47   ﹇   presentation form for vertical left square bracket Ps ON
ref U+23B4   ⎴   top square bracket (Miscellaneous Technical)
U+FE48   ﹈   presentation form for vertical right square bracket Pe ON
ref U+23B5   ⎵   bottom square bracket (Miscellaneous Technical)

 

Deseret

     Uppercase letters

U+10426   𐐦   Deseret capital letter oi Lu L
U+10427   𐐧   Deseret capital letter ew Lu L

     Lowercase letters

U+1044E   𐑎   Deseret small letter oi Ll L
U+1044F   𐑏   Deseret small letter ew Ll L

 

Mathematical Alphanumeric Symbols

     Script symbols

U+1D4C1   𝓁   mathematical script small l Ll L
ref U+2113   ℓ   script small l (Letterlike Symbols)

 

Altered Characters

In addition, the properties of 18 existing characters were altered in 4.0.

 

Latin-1 Supplement


U+00AD   ­   soft hyphen had its General Category changed from Punctuation, Dash to Other, Format

 

Spacing Modifier Letters


U+02B9   ʹ   modifier letter prime had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02BA   ʺ   modifier letter double prime had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02C6   ˆ   modifier letter circumflex accent had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02C7   ˇ   caron had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02C8   ˈ   modifier letter vertical line had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02C9   ˉ   modifier letter macron had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02CA   ˊ   modifier letter acute accent had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02CB   ˋ   modifier letter grave accent had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02CC   ˌ   modifier letter low vertical line had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02CD   ˍ   modifier letter low macron had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02CE   ˎ   modifier letter low grave accent had its General Category changed from Symbol, Modifier to Letter, Modifier
U+02CF   ˏ   modifier letter low acute accent had its General Category changed from Symbol, Modifier to Letter, Modifier

 

Kannada


U+0CBF   ಿ   Kannada vowel sign i had its Bidirectional Category changed from Non Spacing Mark to Left To Right
U+0CC6   ೆ   Kannada vowel sign e had its Bidirectional Category changed from Non Spacing Mark to Left To Right

 

Khmer


U+17B4     Khmer vowel inherent aq had its General Category changed from Mark, Spacing Combining to Other, Format
U+17B5     Khmer vowel inherent aa had its General Category changed from Mark, Spacing Combining to Other, Format

 

Mongolian


U+180E     Mongolian vowel separator had its General Category changed from Other, Format to Separator, Space
U+180E     Mongolian vowel separator had its Bidirectional Category changed from Boundary Neutral to Whitespace
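Later versions of the standard revised some of these properties again (U+180E, for example, is no longer classed as a space character), so it can be useful to compare the 4.0 values with whatever UCD is at hand. A minimal sketch with Python's unicodedata; the entries are transcribed from the list above, and which of them come back as changed depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# (code point, property, value after the 4.0 change), transcribed from the list above.
ALTERED_4_0 = (
    [(0x00AD, "category", "Cf")]
    + [(cp, "category", "Lm") for cp in (0x02B9, 0x02BA, *range(0x02C6, 0x02D0))]
    + [(0x0CBF, "bidirectional", "L"), (0x0CC6, "bidirectional", "L")]
    + [(0x17B4, "category", "Cf"), (0x17B5, "category", "Cf")]
    + [(0x180E, "category", "Zs"), (0x180E, "bidirectional", "WS")]
)

# Map property names to the unicodedata functions that report them.
LOOKUP = {"category": unicodedata.category,
          "bidirectional": unicodedata.bidirectional}


def diff_against_ucd(entries):
    """Return (code point, property, 4.0 value, current value) for every mismatch."""
    changed = []
    for cp, prop, value in entries:
        current = LOOKUP[prop](chr(cp))
        if current != value:
            changed.append((cp, prop, value, current))
    return changed


print(len({cp for cp, _, _ in ALTERED_4_0}))  # 18 distinct characters
for cp, prop, old, new in diff_against_ucd(ALTERED_4_0):
    print(f"U+{cp:04X} {prop}: {old} in 4.0, {new} in UCD {unicodedata.unidata_version}")
```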
http://unicode.org
