Here is a quick guide to Unicode characters used with some non-Western-European languages. It is organized by language.

For Western languages, see HTML symbol reference. Those characters have named HTML entity codes: a name between an ampersand and a semicolon, for example &eacute; for é. Most of these can also be typed on your keyboard using a combination with the Alt, Ctrl, or Option keys: see Special Alt key characters & accents. The Western European character set covers English, French, Spanish, Italian, Portuguese, German, Danish, Swedish, Norwegian, Finnish, and in theory Icelandic, though in practice the letters thorn and edh often come out wrong. Blame your browser. Greek letters can also be represented by HTML entities such as &alpha; for α.

For brevity I am not repeating those letters that are found in the Western set, with acute, grave, circumflex, umlaut, and so on. See Accent marks used with the Latin alphabet for a list of Western and Eastern accented letters arranged by accent.

In general, do not use accented letters in node titles or in hard links. Even if you think they're better that way. They're not. What's better is if other noders can find them. The E2 Search facility is limited in what it can find: it cannot find ü if you search for u, nor vice versa. Acutes and graves are okay, but umlauts won't work. It is better to leave other accents off. E2 is written in English, not Hungarian, and in English we usually leave all accents off. Please do not put in title edit requests asking for them to be added. If you want the accents to appear in your text, pipelink them, e.g. [Lowenbrau|Löwenbräu]. See E2 FAQ: Using Special HTML Characters for more detail on this.
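If you would rather not strip the accents by hand, here is a minimal Python sketch of the pipelink idea. The function names strip_accents and pipelink are my own inventions, not anything E2 provides, and the approach assumes the accent can be split off by Unicode decomposition. That works for umlauts, acutes, haceks and the like, but letters such as Ł or Đ have no decomposition and come through unchanged.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def pipelink(display: str) -> str:
    """Build [plain title|accented text]; hypothetical helper, not an E2 API."""
    plain = strip_accents(display)
    if plain == display:
        # Nothing to strip, so an ordinary hard link is enough.
        return f"[{display}]"
    return f"[{plain}|{display}]"


print(pipelink("L\u00f6wenbr\u00e4u"))
```

Check the plain half of the result against the actual node title before relying on it.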

Never use HTML entities or Unicode in names in node titles. Don't be pedantic about names. Pedantry is bad. Usefulness is good.

In the following tables capital letters come before lowercase. If you can't see them properly, this won't be of use to you. That's a limitation of your browser. A lot of browsers won't be able to show them, and they'll just appear as rectangles or question marks. And I use proper human numbers (decimal), not hexadecimal, which means there's no "x" in the code, just &#nnn;.
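To work out the decimal code for any character, something like the following Python sketch will do. The helper name decimal_entity is just for illustration; ord and html.unescape are from the standard library.

```python
import html


def decimal_entity(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


# A-macron and O-double-acute, capital then lowercase.
for ch in "\u0100\u0101\u0150\u0151":
    print(ch, decimal_entity(ch))

# Going the other way: decimal, hexadecimal and named forms all decode.
print(html.unescape("&#256;"), html.unescape("&#x100;"), html.unescape("&eacute;"))
```

The hexadecimal form &#x100; and the decimal form &#256; name the same character; only the decimal form is used in this write-up.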

Large scripts like Chinese and Devanagari are beyond the scope of this write-up, as are extras like the vowel pointing of Hebrew and Arabic. Go to www.unicode.org/charts for all the rest, like Mongolian, Tamil, and Ogham: the lot.

Albanian

No non-Western letters. Has Ç and Ë.

Arabic

ا  ا    alif
ب  ب    ba
ة  ة    ta marbuta
ت  ت    ta
ث  ث    tha
ج  ج    jim
ح  ح    ha emphatic
خ  خ    kha
د  د    dal
ذ  ذ    dhal
ر  ر    ra
ز  ز    za
س  س    sin
ش  ش    shin
ص  ص    sad
ض  ض    dad
ط  ط    ta emphatic
ظ  ظ    za emphatic
ع  ع    ain
غ  غ    ghain
(a gap in the code numbers here)
ف  ف    fa
ق  ق    qaf
ك  ك    kaf
ل  ل    lam
م  م    mim
ن  ن    nun
ه  ه    ha
و  و    waw
ى  ى    ya undotted
ي  ي    ya dotted

Letters with hamza:
ء  ء    no bearer
أ  أ    alif hamza above
ؤ  ؤ    waw hamza
إ  إ    alif hamza below
ئ  ئ    ya hamza

Other diacritics:
آ  آ    alif maddah
ً  ً    fathah with nunation
ٌ  ٌ    dammah with nunation
ٍ  ٍ    kasrah with nunation
َ  َ    fathah     
ُ  ُ    dammah     
ِ  ِ    kasrah     
ّ  ّ    shaddah  
ْ  ْ    sukun

Numerals:
٠  ٠    0
١  ١    1
٢  ٢    2
٣  ٣    3
٤  ٤    4
٥  ٥    5
٦  ٦    6
٧  ٧    7
٨  ٨    8
٩  ٩    9
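Because the ten numerals sit at consecutive code numbers starting from the zero (1632 in decimal), converting ordinary digits is a matter of adding an offset. A small Python sketch, with to_arabic_indic as a made-up name:

```python
ARABIC_INDIC_ZERO = 0x0660  # 1632 in decimal: the Arabic numeral 0

# Map each ASCII digit to the Arabic numeral at the same offset.
DIGIT_TABLE = {ord(str(d)): chr(ARABIC_INDIC_ZERO + d) for d in range(10)}


def to_arabic_indic(number_text: str) -> str:
    """Replace the digits 0-9 with the Arabic numerals listed above."""
    return number_text.translate(DIGIT_TABLE)


converted = to_arabic_indic("2024")
print(converted)
# Python's int() accepts these digits directly.
print(int(converted))
```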

Arabic transliteration

Ā  Ā   ā  ā   A-macron
Ḍ Ḍ   ḍ ḍ   D-dot-below
Ḥ Ḥ   ḥ ḥ   H-dot-below
Ī  Ī   ī  ī   I-macron
Ṣ Ṣ   ṣ ṣ   S-dot-below
Ṭ Ṭ   ṭ ṭ   T-dot-below
Ū  Ū   ū  ū   U-macron

Azeri

Ə  Ə   ə  ə   schwa
Ğ  Ğ   ğ  ğ   G-breve (yumuşak-G)
İ  İ               I dotted capital
            ı  ı   I undotted lowercase
Ş  Ş   ş  ş   S-cedilla
Also uses Ç, Ö, Ü. Formerly used Ä for Ə, and this is still used when the symbol Ə is unavailable.

Belarusian

Belarusian uses (part of) the Cyrillic alphabet (see under Russian below) with the following additional letters:
Ґ  Ґ   ґ  ґ   G-hook
І  І   і  і   I
Ў  Ў   ў  ў   U-breve

Bulgarian

Bulgarian uses (part of) the Cyrillic alphabet (see under Russian below) but with no additional letters.

Catalan

Ŀ  Ŀ   ŀ  ŀ   L-mid-dot

Chechen

Has a new Roman alphabet which, however, has numerous letters not yet representable in Unicode.

Croatian

Ć  Ć   ć  ć   C-acute
Č  Č   č  č   C-hacek
Đ  Đ   đ  đ   D-bar
Š  Š   š  š   S-hacek
Ž  Ž   ž  ž   Z-hacek

Czech

Č  Č   č  č   C-hacek
Ď  Ď   ď  ď   D-hook
Ě  Ě   ě  ě   E-hacek
Ň  Ň   ň  ň   N-hacek
Ř  Ř   ř  ř   R-hacek
Š  Š   š  š   S-hacek
Ť  Ť   ť  ť   T-hook
Ů  Ů   ů  ů   U-circle
Ž  Ž   ž  ž   Z-hacek
Also uses Á, É, Í, Ó, Ú, Ý.

Esperanto

Ĉ  Ĉ   ĉ  ĉ   C-circumflex
Ĝ  Ĝ   ĝ  ĝ   G-circumflex
Ĥ  Ĥ   ĥ  ĥ   H-circumflex
Ĵ  Ĵ   ĵ  ĵ   J-circumflex
Ŝ  Ŝ   ŝ  ŝ   S-circumflex
Ŭ  Ŭ   ŭ  ŭ   U-breve

Estonian

No non-Western letters. Has Õ, Ö, Ü.

Hawaiian

ʻ  ʻ               'okina
Ā  Ā   ā  ā   A-macron
Ē  Ē   ē  ē   E-macron
Ī  Ī   ī  ī   I-macron
Ō  Ō   ō  ō   O-macron
Ū  Ū   ū  ū   U-macron

Hebrew

(These letter names are Biblical Hebrew because I know more about that.)
א  א    aleph
ב  ב    beth
ג  ג    gimel
ד  ד    daleth
ה  ה    he
ו  ו    waw
ז  ז    zayin
ח  ח    heth
ט  ט    teth
י  י    yod
ך  ך    kaph final
כ  כ    kaph
ל  ל    lamedh
ם  ם    mem final
מ  מ    mem
ן  ן    nun final
נ  נ    nun
ס  ס    samekh
ע  ע    ayin
ף  ף    pe final
פ  פ    pe
ץ  ץ    sadhe final
צ  צ    sadhe
ק  ק    qoph
ר  ר    resh
ש  ש    shin/sin
ת  ת    taw

Hungarian

Ő  Ő   ő  ő   O-double-acute
Ű  Ű   ű  ű   U-double-acute
Also has Ö, Ü, and Á, É, Í, Ó, Ú.

Japanese

See the nodes hiragana and katakana.

Japanese transliteration

Ā  Ā   ā  ā   A-macron
Ē  Ē   ē  ē   E-macron
Ō  Ō   ō  ō   O-macron
Ū  Ū   ū  ū   U-macron

Korean transliteration

In one common romanization (no longer officially used) of Hangul these two are used:
Ŏ  Ŏ   ŏ  ŏ   O-breve
Ŭ  Ŭ   ŭ  ŭ   U-breve

Latin

Ā  Ā   ā  ā   A-macron
Ă  Ă   ă  ă   A-breve
Ē  Ē   ē  ē   E-macron
Ĕ  Ĕ   ĕ  ĕ   E-breve
Ī  Ī   ī  ī   I-macron
Ĭ  Ĭ   ĭ  ĭ   I-breve
Ō  Ō   ō  ō   O-macron
Ŏ  Ŏ   ŏ  ŏ   O-breve
Ū  Ū   ū  ū   U-macron
Ŭ  Ŭ   ŭ  ŭ   U-breve

Latvian

Ā  Ā   ā  ā   A-macron
Č  Č   č  č   C-hacek
Ē  Ē   ē  ē   E-macron
Ģ  Ģ   ģ  ģ   G-cedilla
Ī  Ī   ī  ī   I-macron
Ķ  Ķ   ķ  ķ   K-cedilla
Ļ  Ļ   ļ  ļ   L-cedilla
Ņ  Ņ   ņ  ņ   N-cedilla
Ō  Ō   ō  ō   O-macron
Ŗ  Ŗ   ŗ  ŗ   R-cedilla
Š  Š   š  š   S-hacek
Ū  Ū   ū  ū   U-macron
Ž  Ž   ž  ž   Z-hacek

Lithuanian

Ą  Ą   ą  ą   A-ogonek
Č  Č   č  č   C-hacek
Ę  Ę   ę  ę   E-ogonek
Ė  Ė   ė  ė   E-dot-above
Į  Į   į  į   I-ogonek
Š  Š   š  š   S-hacek
Ū  Ū   ū  ū   U-macron
Ų  Ų   ų  ų   U-ogonek
Ž  Ž   ž  ž   Z-hacek

Macedonian

Macedonian uses (part of) the Cyrillic alphabet (see under Russian below) with the following additional letters:
Ѓ  Ѓ   ѓ  ѓ   GJ (G-acute)
Ѕ  Ѕ   ѕ  ѕ   DZ
Ј  Ј   ј  ј   J
Љ  Љ   љ  љ   LJ
Њ  Њ   њ  њ   NJ
Ќ  Ќ   ќ  ќ   KJ (K-acute)
Џ  Џ   џ  џ   DZ-hacek

Maltese

Ċ  Ċ   ċ  ċ   C-dot-above
Ġ  Ġ   ġ  ġ   G-dot-above
Ħ  Ħ   ħ  ħ   H-bar
Ż  Ż   ż  ż   Z-dot-above

Māori

Ā  Ā   ā  ā   A-macron
Ē  Ē   ē  ē   E-macron
Ī  Ī   ī  ī   I-macron
Ō  Ō   ō  ō   O-macron
Ū  Ū   ū  ū   U-macron

Persian

The following are additions to the Arabic alphabet used in Persian.
پ  پ    p
چ  چ    ch
ژ  ژ    zh
گ  گ    g

Polish

Ą  Ą   ą  ą   A-ogonek
Ć  Ć   ć  ć   C-acute
Ę  Ę   ę  ę   E-ogonek
Ł  Ł   ł  ł   L-slash
Ń  Ń   ń  ń   N-acute
Ś  Ś   ś  ś   S-acute
Ź  Ź   ź  ź   Z-acute
Ż  Ż   ż  ż   Z-dot-above
Also has Ó.

Romanian

Ă  Ă   ă  ă   A-breve
Ş  Ş   ş  ş   S-cedilla
Ţ  Ţ   ţ  ţ   T-cedilla
The Romanians actually prefer commas placed underneath the letter rather than cedillas, and there are symbols defined for these too, but they are less likely to show up:
Ș  Ș   ș  ș   S-comma
Ț  Ț   ț  ț   T-comma
Also has Â, Î.
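If you have Romanian text typed with the cedilla letters and want the comma forms, a translation table is enough. This is a rough Python sketch; to_comma_below is a hypothetical name. Only run it on text you know is Romanian, since Turkish, Azeri and Turkmen really do want S-cedilla.

```python
# Cedilla forms mapped to the comma-below forms, capital and lowercase.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # S-cedilla -> S-comma
    "\u015f": "\u0219",  # s-cedilla -> s-comma
    "\u0162": "\u021a",  # T-cedilla -> T-comma
    "\u0163": "\u021b",  # t-cedilla -> t-comma
})


def to_comma_below(text: str) -> str:
    """Swap cedilla S and T for their comma-below counterparts."""
    return text.translate(CEDILLA_TO_COMMA)


print(to_comma_below("\u015ftiin\u0163\u0103"))
```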

Russian

А  А  а  а      a
Б  Б  б  б      b
В  В  в  в      v
Г  Г  г  г      g
Д  Д  д  д      d
Е  Е  е  е      ye
Ё  Ё  ё  ё      yo (N.B. out of order!)
Ж  Ж  ж  ж      zh
З  З  з  з      z
И  И  и  и      i
Й  Й  й  й      y
К  К  к  к      k
Л  Л  л  л      l
М  М  м  м      m
Н  Н  н  н      n
О  О  о  о      o
П  П  п  п      p
Р  Р  р  р      r
С  С  с  с      s
Т  Т  т  т      t
У  У  у  у      u
Ф  Ф  ф  ф      f
Х  Х  х  х      kh
Ц  Ц  ц  ц      ts
Ч  Ч  ч  ч      ch
Ш  Ш  ш  ш      sh
Щ  Щ  щ  щ      shch
Ъ  Ъ  ъ  ъ      hard sign
Ы  Ы  ы  ы      y
Ь  Ь  ь  ь      soft sign
Э  Э  э  э      e
Ю  Ю  ю  ю      yu
Я  Я  я  я      ya

Sanskrit transliteration

Ā  Ā   ā  ā   A-macron
Ḍ Ḍ   ḍ ḍ   D-dot-below
Ḥ Ḥ   ḥ ḥ   H-dot-below
Ī  Ī   ī  ī   I-macron
Ḷ Ḷ   ḷ ḷ   L-dot-below
Ṃ Ṃ   ṃ ṃ   M-dot-below
Ṅ Ṅ   ṅ ṅ   N-dot-above
Ṇ Ṇ   ṇ ṇ   N-dot-below
Ṛ Ṛ   ṛ ṛ   R-dot-below
Ṝ Ṝ   ṝ ṝ   R-dot-and-macron
Ś  Ś   ś  ś   S-acute
Ṣ Ṣ   ṣ ṣ   S-dot-below
Ṭ Ṭ   ṭ ṭ   T-dot-below
Ū  Ū   ū  ū   U-macron
Also uses Ñ.

Serbian

Serbian uses (part of) the Cyrillic alphabet (see under Russian above) with the following additional letters:
Ђ  Ђ   ђ  ђ   D-bar
Ј  Ј   ј  ј   J
Љ  Љ   љ  љ   LJ
Њ  Њ   њ  њ   NJ
Ћ  Ћ   ћ  ћ   C-acute
Џ  Џ   џ  џ   DZ-hacek

Slovak

Č  Č   č  č   C-hacek
Ď  Ď   ď  ď   D-hook
Ĺ  Ĺ   ĺ  ĺ   L-acute
Ľ  Ľ   ľ  ľ   L-apostrophe
Ň  Ň   ň  ň   N-hacek
Ŕ  Ŕ   ŕ  ŕ   R-acute
Š  Š   š  š   S-hacek
Ť  Ť   ť  ť   T-hook
Ž  Ž   ž  ž   Z-hacek
Also has Á, É, Í, Ó, Ú, Ý, and also Ô.

Turkish

Ğ  Ğ   ğ  ğ   G-breve (yumuşak-G)
İ  İ               I dotted capital
            ı  ı   I undotted lowercase
Ş  Ş   ş  ş   S-cedilla
Also has Ç, Ö, Ü.

Turkmen

Ň  Ň   ň  ň   N-hacek
Ş  Ş   ş  ş   S-cedilla
Ž  Ž   ž  ž   Z-hacek
Also uses Ä, Ç, Ö, Ü, Ý. Originally reported as using currency symbols $, ¢, ¥, but it seems these have now been replaced.

Ukrainian

Ukrainian uses the Cyrillic alphabet (see under Russian above) with the following additional letters:
Є  Є   є  є   curved-E
І  І   і  і   I
Ї  Ї   ї  ї   I-umlaut
Ґ  Ґ   ґ  ґ   G-hook

Vietnamese

Ă  Ă   ă  ă   A-breve
Đ  Đ   đ  đ   D-bar
Ơ  Ơ   ơ  ơ   O-hook
Ư  Ư   ư  ư   U-hook
These and Â, Ê are letters of the Vietnamese alphabet; there are also numerous other accents for tone marks, which may be combined with any of the vowels.
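One way to build a vowel with a tone mark is to put a combining accent after the base letter and let Unicode normalization fuse the pair into a single character where one exists. A short Python sketch follows; the TONE_MARKS table and the with_tone helper are my own labels, and note that Unicode calls the mark on Ơ and Ư a "horn", keeping "hook above" for one of the tone marks.

```python
import unicodedata

# The five combining tone marks (the sixth tone is unmarked).
TONE_MARKS = {
    "acute": "\u0301",
    "grave": "\u0300",
    "hook above": "\u0309",
    "tilde": "\u0303",
    "dot below": "\u0323",
}


def with_tone(vowel: str, tone: str) -> str:
    """Attach a tone mark to a vowel and compose the result (NFC)."""
    return unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])


letter = with_tone("\u01a1", "acute")  # o-hook plus acute accent
print(letter, f"&#{ord(letter)};")
```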

Welsh

Ŵ  Ŵ   ŵ  ŵ   W-circumflex
Ŷ  Ŷ   ŷ  ŷ   Y-circumflex
Also has Â, Ê, Î, Ô, Û, and occasionally some others such as Ï.

Yoruba

Ẹ Ẹ   ẹ ẹ   E-dot-below
Ọ Ọ   ọ ọ   O-dot-below
Ṣ Ṣ   ṣ ṣ   S-dot-below