Everything Request for Comments: Kanji Write-Ups

Header:

Target date of implementation: 03/31/2001
Status: Working Draft

Intent:

To standardize the form of write-ups that define Japanese Kanji characters, with the goal of creating a comprehensive on-line Kanji dictionary within the framework of e2.

Specification:

Node Titles:

After much debate, it was decided that node titles be simplified to the best one-word English meaning for the character, to better integrate the kanji into the "nodegel". Thus, a write-up on "ICHI" would be placed under the node for one.

Node Content:

Ideally, an e2 Kanji node would provide the following information:

  1. A nice title.
  2. A listing of all on-yomi and kun-yomi reading(s).
  3. Nanori readings.
  4. English definition(s) for the character.
  5. The etymology of the character.
  6. Other index numbers this character can be found under.
  7. An EUC encoded version of the character.
  8. EUC encoded Compound word examples.
  9. An ASCII-art representation of the character.
  10. Other interesting facts and experiences in reference to the character.
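
As a rough sketch of how these ten items might be tracked while several people fill in one node, the record below models a kanji write-up in Python. The class name, the field names, and the missing_sections helper are all hypothetical, invented here for illustration; nothing in e2 defines them. Item 2 is split into two fields so the on-yomi and kun-yomi readings stay separate.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class KanjiNode:
    """One kanji write-up. Every name in this class is hypothetical."""
    title: str = ""
    on_yomi: List[str] = field(default_factory=list)
    kun_yomi: List[str] = field(default_factory=list)
    nanori: List[str] = field(default_factory=list)
    definitions: List[str] = field(default_factory=list)
    etymology: str = ""
    index_numbers: Dict[str, int] = field(default_factory=dict)
    euc_character: bytes = b""
    compounds: Dict[str, str] = field(default_factory=dict)
    ascii_art: str = ""
    notes: str = ""

    def missing_sections(self) -> List[str]:
        """Return the names of the sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# A partly filled-in record for ICHI, using values from this write-up.
ichi = KanjiNode(
    title="one",
    on_yomi=["ICHI", "ITSU"],
    kun_yomi=["hito", "hitotsu"],
    definitions=["one", "one unit"],
    etymology="A pictograph of a single extended finger.",
)
print(ichi.missing_sections())
```

A later contributor could use the list of empty sections to see what is still worth adding.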

This is a lot of detail to provide, so no one person is expected to fill in all of this information. Rather, it should be seen as an opportunity for write-up hungry e2 users to collectively flex their typing muscles and contribute.

Examples of These Sections for the Character "ICHI" (one):

A Nice Title:

ICHI ITSU hito (one)

A listing of all on-yomi and kun-yomi readings:

on-yomi: ICHI ITSU
kun-yomi: hito hitotsu
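
Comparing the title with the readings above suggests a convention: upper-case words are on-yomi, lower-case words are kun-yomi, and the English meaning sits in parentheses at the end. The sketch below reads a title that way. The function name parse_title is made up for this example, and it only handles the form shown above; tokens with no letters, such as index numbers, are simply ignored.

```python
import re
from typing import Dict, List, Optional


def parse_title(title: str) -> Dict[str, object]:
    """Split a title like 'ICHI ITSU hito (one)' into its parts."""
    meaning: Optional[str] = None
    # The English meaning is assumed to be the trailing parenthesized text.
    match = re.search(r"\(([^)]*)\)\s*$", title)
    if match:
        meaning = match.group(1)
        title = title[:match.start()]
    on_yomi: List[str] = []
    kun_yomi: List[str] = []
    for token in title.split():
        if token.isupper():      # upper case: treated as on-yomi
            on_yomi.append(token)
        elif token.islower():    # lower case: treated as kun-yomi
            kun_yomi.append(token)
    return {"on_yomi": on_yomi, "kun_yomi": kun_yomi, "meaning": meaning}


print(parse_title("ICHI ITSU hito (one)"))
```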

Nanori Readings:

Nanori: kazu i itu iru katsu ten hajime hi hitotsu makoto

English Definitions:

  1. ITSU: One.
  2. hito: One.
  3. hito(tsu): One unit.
  4. hito-: one unit of.
  5. ichi-: one, a certain; the whole; the same (time); petty; worthless.
  6. -ichi: the best, the first.

EUC Encoded Version:

°ì

EUC Encoded Compound Examples:

°ì¿Í: hitori (one person)
°ì·î: ichigatsu (January)
°ì»Ø: isshi (a finger)
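
The strings above look garbled whenever the page is read as ISO-8859-1 instead of EUC. A minimal sketch of undoing that misreading in Python follows; it assumes the text was EUC-JP bytes displayed one byte per character, and the function name recover_euc is invented for this example. The garbled text is written with escapes so the snippet does not depend on how this page itself is encoded.

```python
def recover_euc(garbled: str) -> str:
    """Undo an ISO-8859-1 misreading of EUC-JP bytes."""
    raw = garbled.encode("latin-1")   # back to the original byte values
    return raw.decode("euc_jp")       # reinterpret those bytes as EUC-JP


# "\u00b0\u00ec" is the two-character string shown under
# "EUC Encoded Version" above (a degree sign, then i with a grave accent).
print(recover_euc("\u00b0\u00ec"))
print(recover_euc("\u00b0\u00ec\u00bf\u00cd"))  # the first compound example
```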

Character Etymology:

A pictograph of a single extended finger.

ASCII Art Representation:


    ##############
      ###############

Other Interesting Facts:

This is often the first Kanji character taught to students as it is the easiest to write and recognize.

Conclusion:

There are still some issues I would like commented upon here before we commit to this standard for Kanji write-ups, such as:

  • Is this all the information a node needs to be considered complete?


Revision History:

03/02/2001: Completed First Revision
03/19/2001: EUC encoding only (not Unicode), and title prefix changed from "E2KANJI" to "KANJI"; the "E2" would be too redundant. The title is now also to contain the most prominent kun-yomi reading.
06/03/2001 (1): Added section on namespacing nodes. Added English to the title requirement. Fixed a couple of stupid typos.
06/03/2001 (2): I recant that section, and a bunch of other crap.

Encoding Formats

Check Japanese Character Encoding Formats for a bit on the different encodings that exist for Kanji-based input/output systems. My personal preference, and what I recommend for E2, is EUC, which has been shown to resist being eaten by the E2 filters. SJIS uses the left bracket in a few combinations, which no doubt raises issues. Unicode, while a good idea and all, isn't terribly universal. It's the least common encoding format on the net, and until they get their act together, we probably won't be using it.
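
The left-bracket concern can be checked with Python's standard euc_jp and shift_jis codecs. The sketch below is only an illustration: the helper name bracket_risk is made up, and it assumes the E2 filters react to the raw byte 0x5B (the ASCII left bracket) wherever it appears. In EUC, both bytes of a kanji or kana have the high bit set, so that byte cannot occur inside one; in SJIS the second byte of a character can fall in the ASCII range.

```python
from typing import List


def bracket_risk(text: str, encoding: str) -> List[str]:
    """List non-ASCII characters whose encoded bytes include 0x5B ('[')."""
    return [ch for ch in text
            if ord(ch) > 0x7F and 0x5B in ch.encode(encoding)]


# The kanji for "one", then the katakana long-vowel mark.
sample = "\u4e00\u30fc"
for name in ("euc_jp", "shift_jis"):
    print(name, sample.encode(name).hex(), bracket_risk(sample, name))
```

The long-vowel mark is common in katakana words, so an SJIS write-up would run into this byte fairly often.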

Kanji support:


Netscape, Mozilla, and Internet Explorer all support 3 or more encoding formats, including EUC, JIS, and SJIS, so platform compatibility shouldn't be an issue. Additionally, for Japanese character support in IE, you should download the optional language pack, as it includes the font MS Gothic (if you have Win2K and added Japanese text support, you already have that, MS Mincho, and MS UI Gothic).

Writeup Creation and Editing:

A few difficulties will arise should you start messing with encoding formats while creating/editing a writeup inside the browser.
To avoid them, use a separate editor, either NJStar or my favorite, JWPce (which is open source, and runs on WINE). By editing in there, and copy/pasting (make sure your output format is correct), you can avoid character set mixups. If you edit an EUC encoded writeup while in EUC mode, and you submit it, you will end up with something totally unlike what you put in there originally. Why this happens, I don't know, but it's safest to be in Western European (ISO) or Western European (Windows) if you want to edit the node without opening the external editor.

A few more notes on node formatting, and some definitions:

At the start of your node, or at the bottom, list your encoding format, for example:


------------------
EUC Encoded.

Since E2 defaults to the standard ISO character set, the reader will have to set the page encoding manually. Listing the format makes this easy, so they won't have to point and guess.

Some definitions, which apply to the previous writeup here:

On-yomi are the Chinese-derived readings of the kanji.

Kun-yomi are the native Japanese readings of the kanji.

Nanori are the readings used for names.

All of the above relate solely to the Japanese usage of kanji; the actual Chinese reading of some kanji will differ widely, and Japanese uses all three kinds of reading.

Kanji are an important part of the Japanese language, and to forget them while creating E2 would leave a serious deficit in its banks of information.

I'm going to take a purely English-centric stance on this, because that's the only language I speak and because E2 is and will probably remain an English-centric Web site.

Why not node these under their simple English meanings?

Let's take KANJI: 1281 SHI SU TSU ko (child), for instance. There are three basic problems with a node title like this:

  1. Using "KANJI:" to prefix each and every one of these node titles is awkward at best and problematic at worst. Don't Namespace Your Nodes.
  2. "1281 SHI SU TSU ko" means absolutely nothing to me. What's the number mean? Why is part of it capitalized and part of it not? I have to read the node itself to get some understanding of the title, which defeats the entire point of giving the node a title in the first place.
  3. "(child)" is the only part of the title that has any meaning for me, and it's badly placed. When I search for "child" using the Search tool with "Ignore Exact" checked, this node does not turn up. The parentheses get in the way.

This writeup should, in my opinion, simply have been placed under "child". Of course there will be other writeups above it, and it means someone has to scroll down further to find this one. But I find writeups like this a nice surprise when I'm wandering through the nodespace, not expecting to come across such information. And, again speaking only for myself, I routinely vote them up when I find them that way.

Pick Titles Carefully is practically a mantra on E2, and I think it should be applied here. Don't pick a long, overly-precise node title if it means no one is going to find it except through Random Node jumps. Pick a short, accurate one and allow people to find this information where they don't expect it.
