
Unicode is a unified system for encoding writing of all types. This includes all the world's major alphabets, various handy sets of symbols and the thousands of Chinese characters (漢字, pronounced hanzi in Mandarin Chinese, kanji in Japanese and hanja in Korean). Besides providing the elegance of a unified framework for writing from different scripts, Unicode also makes it much easier for people viewing a page to see any foreign writing you include, because the browser no longer needs to know which encoding the current page uses. This is especially useful on a site like E2, where the content is overwhelmingly in English and therefore almost everyone is going to have their encoding set to Western European. People who want to see content written in a Japanese encoding and so on can do so if they change their Encoding setting, but Internet Explorer, at least, won't do this automatically on E2, and if you leave your browser set to a Chinese or Japanese encoding all the time it will occasionally misinterpret strings of English letters and punctuation as chunks of hanzi. Please note that whatever the encoding, you will not be able to see Chinese and Japanese characters if you do not have the correct font(s) installed; see how to read Japanese characters in E2 for advice on this. If you can see this correctly - - then you should be alright.

So, if you want to enhance your writeups on topics relating to China, Japan and so on by including the original hanzi in Unicode, how should you do it? Well, if you have the text in some other encoding, you can use Java's Unicode-converting tools to get numbers for the characters; see J2U and Using Unicode on E2.
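As a rough illustration of that conversion step, here is a minimal sketch in Python rather than the Java tools mentioned above. The function name `to_numeric_refs` is made up for this example, and the sketch assumes you already know which legacy encoding (Big5, GB2312, Shift_JIS and so on) your text is in.

```python
def to_numeric_refs(raw: bytes, source_encoding: str) -> str:
    """Decode legacy-encoded bytes and return ASCII text in which every
    non-ASCII character is written as a hexadecimal numeric reference."""
    text = raw.decode(source_encoding)
    parts = []
    for ch in text:
        if ord(ch) < 128:
            # Plain ASCII passes through unchanged (including & and <,
            # which this sketch does not escape).
            parts.append(ch)
        else:
            # Everything else becomes &#x....; using its Unicode code point.
            parts.append(f"&#x{ord(ch):X};")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical input: the character 道 as it would be stored in Big5.
    big5_bytes = "道".encode("big5")
    print(to_numeric_refs(big5_bytes, "big5"))
```

The result is plain ASCII, so it should survive being pasted into a writeup whatever encoding the reader's browser happens to be set to.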

If you don't already have the text on your computer you may find www.zhongwen.com helpful, with its facilities for searching for characters by English equivalent, by Pinyin romanisation or by radical. When you click on a character it should appear in a box at the top right, with the origins of the pictogram explained, the main meanings of the character and a list of common compound words it appears in.

All of this is interesting and useful, but for the purpose of Unicode the important bit is the big + sign at the top of the box. If you click the plus sign you'll get to a page with links to the character in assorted online dictionaries, and so on; but like a pipelink, you don't have to click on it to extract useful information. Hover the mouse cursor over the sign, and (if you're using a browser that does this) you should see that its link goes to a page called something like http://zhongwen.com/cgi-bin/zi3.cgi?uni=9053 - the four-digit hexadecimal number at the end here is what you are after. Take that number and insert it into your writeup at the relevant point, prefixed by &#x and suffixed by a semicolon (;), and there you have it - you should have something like &#x9053;, which shows up like this: 道. By default the characters appear the same size as English letters - which of course is rather small for Chinese - so you might also like to wrap the hanzi in several <big> tags to make it look more impressive and easier to see, for example <big><big>&#x9053;</big></big>.
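The link-to-entity step can also be done mechanically. The following is only a sketch in Python: the helper name `entity_from_link` and its `big_levels` parameter are invented for illustration, and it assumes the link carries the code point in a `uni` query parameter, as in the example URL above.

```python
from urllib.parse import urlparse, parse_qs


def entity_from_link(url: str, big_levels: int = 0) -> str:
    """Build an HTML numeric character reference from a link whose
    query string has the form ?uni=<hex code point>."""
    query = parse_qs(urlparse(url).query)
    hex_code = query["uni"][0]   # KeyError if the link has no uni= part
    int(hex_code, 16)            # ValueError if it is not hexadecimal
    entity = f"&#x{hex_code.upper()};"
    # Optionally wrap the entity in nested <big> tags to enlarge it.
    return "<big>" * big_levels + entity + "</big>" * big_levels


if __name__ == "__main__":
    link = "http://zhongwen.com/cgi-bin/zi3.cgi?uni=9053"
    print(entity_from_link(link))
    print(entity_from_link(link, big_levels=2))
```

How many levels of <big> look right is a matter of taste; two is just the value used in this example.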

One final note: Chinese traditionally reads top to bottom, right to left. Nobody's going to jump on you if you write left to right, top to bottom instead, but if you want to format lines of Chinese faithfully you will need to use <pre> tags.
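If you want to try the traditional layout, the arrangement inside the <pre> block can be worked out by a short script. This is a sketch in Python under a few assumptions: the function name `vertical_pre` is hypothetical, each string is one column of text given in reading order, and short columns are padded with the full-width ideographic space (U+3000) so that the columns stay aligned in a font where hanzi are all the same width.

```python
from typing import List


def vertical_pre(columns: List[str]) -> str:
    """Lay out columns of hanzi top to bottom, right to left, in <pre> tags.

    `columns` is in reading order, so the first column ends up rightmost.
    Expects at least one column.
    """
    height = max(len(col) for col in columns)
    pad = "\u3000"  # ideographic (full-width) space
    padded = [col.ljust(height, pad) for col in columns]
    rows = []
    for r in range(height):
        # Reverse the reading order so the first column lands on the right.
        rows.append("".join(col[r] for col in reversed(padded)))
    return "<pre>\n" + "\n".join(rows) + "\n</pre>"


if __name__ == "__main__":
    # Two three-character columns; 道可道 is read first, so it goes on the right.
    print(vertical_pre(["道可道", "非常道"]))
```

For use in a writeup, each character in the output would still need to be replaced by its &#x...; numeric reference.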
