Unicode > Indic Scripts

Standard Unicode Disclaimer:
Not all the Unicode characters represented below or in other writeups may be viewable in your browser. In fact, some of the characters may not be viewable in any browser. This is because Unicode is an evolving and ever-growing standard with room for more than a million characters and symbols from hundreds of languages and cultures past and present, and not all software or fonts can display all of them. If your browser or font does not recognize a character, it will usually display a small square box or a question mark. This is normal and expected behavior, and does not mean there is a problem with this writeup or your web browser. For additional information see Using Unicode on E2. You may also have luck changing your font in the ekw Preferences to something such as "Arial Unicode MS", which has better Unicode support.
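One reason for the boxes and question marks is that a code point may not be assigned to any character at all (or not yet known to the software). As a rough sketch, Python's standard unicodedata module can report which code points in a range are unassigned. The function names here are my own, the answer depends on the Unicode version bundled with your Python, and this cannot tell you whether any particular font has a glyph for a character.

```python
import unicodedata
from typing import List


def is_assigned(code_point: int) -> bool:
    """Return True if this Python's Unicode database assigns the code point."""
    # General category "Cn" means "Other, not assigned".
    return unicodedata.category(chr(code_point)) != "Cn"


def unassigned_in_range(first: int, last: int) -> List[str]:
    """List the unassigned code points in an inclusive range as U+XXXX strings."""
    return [f"U+{cp:04X}" for cp in range(first, last + 1) if not is_assigned(cp)]


if __name__ == "__main__":
    # Bengali block, 0x980 to 0x9ff; the output varies with the Unicode version.
    print(unassigned_in_range(0x980, 0x9FF))
```

Anything this prints is a position in the block that has no character, so a box or blank there is expected no matter which font you use.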

The Indic scripts of South and Southeast Asia share many common features, and most of them are derived from the ancient Brahmi script. Brahmi and the related ancient Kharoshthi script are themselves thought to be derived from Semitic writing, specifically Aramaic. In most of these scripts words are separated by spaces, and quite often Indic scripts use western punctuation (periods, commas, etc.) in writing, though traditional punctuation is used in certain cases. Most of these scripts are written left to right and are abugidas: writing systems where each basic symbol represents a consonant with an inherent vowel. If a different vowel is needed, a diacritic or other distinguishing mark called a matra is added to the symbol. These vowel signs can come before, after, above, below, or even surround the consonant. If no vowel is needed, another special symbol called a virama is used. If a vowel is needed at the start of a word, or two vowels are next to each other in a word, there are separate independent vowel letters for these cases. While all of these scripts are descended from a common ancestor, there is a wide variety of differences between them, especially between those used in northern India and those used in southern India.
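To make the abugida idea concrete, here is a small sketch in Python using four Devanagari code points. The variable names and the choice of the consonant "ka" are mine, purely for illustration. Note that the vowel sign is stored after the consonant in the text even when it is drawn to the left of it.

```python
import unicodedata

KA = "\u0915"        # DEVANAGARI LETTER KA: consonant k with the inherent vowel a
SIGN_I = "\u093F"    # DEVANAGARI VOWEL SIGN I: a matra, drawn to the left of the consonant
VIRAMA = "\u094D"    # DEVANAGARI SIGN VIRAMA: removes the inherent vowel
LETTER_I = "\u0907"  # DEVANAGARI LETTER I: independent vowel, e.g. at the start of a word

syllables = {
    "ka": KA,            # bare consonant, inherent vowel
    "ki": KA + SIGN_I,   # consonant plus matra
    "k": KA + VIRAMA,    # consonant plus virama, no vowel
    "i": LETTER_I,       # vowel standing on its own
}

if __name__ == "__main__":
    for sound, text in syllables.items():
        # Show the character names that make up each syllable.
        names = [unicodedata.name(ch) for ch in text]
        print(sound, text, names)
```

The same pattern (consonant, optional matra or virama, separate independent vowels) applies to the other scripts listed here, only with different code points.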

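The Unicode Indic scripts covered in this writeup are Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Tibetan, Limbu, and Syloti Nagri.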

The Unicode values for these characters were heavily influenced by the Indian Standard Code for Information Interchange (ISCII). While the Unicode Standard is quite usable and robust for common, modern use, not every word can be represented in its current state.

Below you will find a list of characters and common punctuation included in the Indic scripts. Where applicable, there are additional links to writeups that include more detailed information for each block of Unicode characters.

NOTE: I have enclosed the characters in <big> tags so that it is easier to see the details of various characters you may not be familiar with.
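Each section below gives its block as a range of HTML values in decimal and hexadecimal. If you want to build a row of characters like these yourself, the following Python sketch expands a range into individual numeric character references and wraps them in <big> tags. The function name and the space separator are my own choices, not anything E2 requires.

```python
def block_as_html(first: int, last: int, hexadecimal: bool = False) -> str:
    """Expand an inclusive code point range into HTML numeric character references."""
    refs = []
    for cp in range(first, last + 1):
        # Decimal form looks like &#2304; and hexadecimal form like &#x900;
        refs.append(f"&#x{cp:x};" if hexadecimal else f"&#{cp};")
    return "<big>" + " ".join(refs) + "</big>"


if __name__ == "__main__":
    # Limbu block, 0x1900 to 0x194f.
    print(block_as_html(0x1900, 0x194F))
```

Both forms refer to the same characters, so pick whichever matches the charts you are reading from.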


Devanagari

Primary languages that use these characters: Awadhi, Bagheli, Bhatneri, Bhili, Bihari, Braj Bhasha, Chhattisgarhi, Garhwali, Gondi, Harauti, Hindi, Ho, Jaipuri, Kachchhi, Kanauji, Konkani, Kilui, Kumaoni, Kurku, Marathi, Marwari, Mundari, Nepali, Newari, Palpa, Sanskrit, Santali, and Sindhi

HTML Display Characters: &#2304; to &#2431; (decimal) / &#x900; to &#x97f; (hexadecimal)

                                                                                                                              ि                                                                                                                                 ॿ  


Bengali

Primary languages that use these characters: Assamese, Bengali, Daphla, Hallam, Khasi, Manipuri, Mizo, Munda, Naga, Rian, and Santali

HTML Display Characters: &#2432; to &#2559; (decimal) / &#x980; to &#x9ff; (hexadecimal)

                                                                                                                              ি                                                                                                                                 ৿  


Gurmukhi

Primary languages that use these characters: Panjabi/Punjabi

HTML Display Characters: &#2560; to &#2687; (decimal) / &#xa00; to &#xa7f; (hexadecimal)

                                                                                                                              ਿ                                                                                                                                 ੿  


Gujarati

Primary languages that use these characters: Gujarati

HTML Display Characters: &#2688; to &#2815; (decimal) / &#xa80; to &#xaff; (hexadecimal)

                                                                                                                              િ                                                                                                                                 ૿  


Oriya

Primary languages that use these characters: Khondi, Oriya, and Santali

HTML Display Characters: &#2816; to &#2943; (decimal) / &#xb00; to &#xb7f; (hexadecimal)

                                                                                                                              ି                                                                                                                                 ୿  


Tamil

Primary languages that use these characters: Badaga, Dravidian, Saurashtra, and Tamil

HTML Display Characters: &#2944; to &#3071; (decimal) / &#xb80; to &#xbff; (hexadecimal)

                                                                                                                              ி                                                                                                                                 ௿  


Telugu

Primary languages that use these characters: Gondi, Lambadi, and Telugu

HTML Display Characters: &#3072; to &#3199; (decimal) / &#xc00; to &#xc7f; (hexadecimal)

                                                                                                                              ి                                                                                                                                 ౿  


Kannada

Primary languages that use these characters: Kannada, Kanarese, and Tulu

HTML Display Characters: &#3200; to &#3327; (decimal) / &#xc80; to &#xcff; (hexadecimal)

                                                                                                                              ಿ                                                                                                                                 ೿  


Malayalam

Primary languages that use these characters: Malayalam

HTML Display Characters: &#3328; to &#3455; (decimal) / &#xd00; to &#xd7f; (hexadecimal)

                                                                                                                              ി                                                                                                                                 ൿ  


Sinhala

Primary languages that use these characters: Pali, Sanskrit, and Sinhala

HTML Display Characters: &#3456; to &#3583; (decimal) / &#xd80; to &#xdff; (hexadecimal)

                                                                                                                              ඿                                                                                                                                 ෿  


Tibetan

Primary languages that use these characters: Dzongkha and Tibetan
In addition, Tibetan is the liturgical language of numerous Buddhist sects.

HTML Display Characters: &#3840; to &#4095; (decimal) / &#xf00; to &#xfff; (hexadecimal)

                                                                                                                              ༿                                                                                                                                 ཿ                                                                                                                                 ྿                                                                                                                                 ࿿  


Limbu

Primary languages that use these characters: Limbu

HTML Display Characters: &#6400; to &#6479; (decimal) / &#x1900; to &#x194f; (hexadecimal)

                                                                                                                              ᤿                                  


Syloti Nagri

HTML Display Characters: &#43008; to &#43055; (decimal) / &#xa800; to &#xa82f; (hexadecimal)

                                                                                               


How Do I Use These Characters in My Writeup?

Please see Unicode or Using Unicode on E2 for a quick tutorial on using these characters in your writeups. You may also find the character you need listed in HTML Symbol Reference. Each section above gives the range of HTML Unicode values for its block to help you. If you would like to look up characters on your own, you can go to http://www.unicode.org/charts. A possibly quicker way is to click on the character. If there is not a writeup for that character, the Findings:/Create a Node page will show you the HTML Unicode value. If there is a writeup for that character, just click the search button again, and when you return to this writeup, you will see the HTML Unicode value displayed in the search box.
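If you already have the character itself (copied from a web page, say), you can also work out its HTML value offline. The sketch below uses only Python's standard library; html_values is a name I made up, and the character name comes from whatever Unicode version your Python ships with.

```python
import html
import unicodedata


def html_values(char: str) -> dict:
    """Return the code point, both HTML forms, and the Unicode name of one character."""
    cp = ord(char)
    return {
        "code_point": f"U+{cp:04X}",
        "decimal": f"&#{cp};",
        "hexadecimal": f"&#x{cp:x};",
        # Fall back to a placeholder if this Python does not know the character.
        "name": unicodedata.name(char, "<unnamed>"),
    }


if __name__ == "__main__":
    values = html_values("\u0915")  # Devanagari letter ka
    print(values)
    # Round trip: the reference decodes back to the original character.
    print(html.unescape(values["decimal"]) == "\u0915")
```

Either the decimal or the hexadecimal form can be pasted into a writeup.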
