The Velthuis scheme is a method of encoding transliterated text that includes non-standard characters and accents as plain ASCII. It is named for Frans Velthuis, who developed it in 1991 for use with a Devanagari font that he had designed for the TeX system. Because it was created for use with Devanagari, the scheme is well suited to encoding Indic languages such as Sanskrit, Hindi, and Pali. However, because these languages are typically transliterated into the Latin alphabet with diacritics, the scheme can also cover the diacritics used in French, Spanish, and most other languages with Latin-based alphabets.
Because of its simplicity and accuracy, the Velthuis scheme quickly became a near standard for online discussions among scholars of Pali and Sanskrit. It is also used in some online publications, such as the Journal of Buddhist Ethics, which employs the scheme for its online Tipitaka project.
The system is quite easy to learn and easy to incorporate into a script or macro for text processing. Its simplicity hinges on the fact that you need only know how a character is typeset in order to rewrite it in ASCII; there is no need to work out what sort of vowel or consonant each marking indicates. This makes moving from a typeset, Roman-script text (such as one published by the Pali Text Society) to an ASCII-encoded file very simple, whether done manually or with a script.
There are essentially two rules in the Velthuis scheme:
- Long vowels are represented by doubled short vowels.
- Diacritic marks precede the consonant that they affect.
Rule one means that when you see a long vowel (almost always typeset with a macron), such as 'ā', you write it down by doubling the basic vowel, like 'aa'.
Rule two means that to indicate that a consonant has a diacritic above or below it, you simply put an indicator before the character. The indicators are as follows:
- Consonants typeset with a dot beneath them, usually retroflex (cerebral) consonants such as ṭ, are rewritten using the period key, like '.t'
- Consonants typeset with a dot above them, often guttural nasals like ṅ, are rewritten using the double quotation mark, like '"n'
- Grave accents use the backtick character, like '`a'
- Acute accents use the single quote, like ''a'
- Circumflex accents use the caret, like '^o'
- Finally, consonants usually typeset with a tilde above them, such as ñ, are written with the tilde before the character, as '~n'
That's it. The same scheme can easily be extended to cover most other Latin-script diacritics that might be encountered, but the rules above suffice for writing down most Indic languages (as well as a number of others). I have not seen Velthuis representations for the breve, the diaeresis, or any of the somewhat more obscure accents, which is not to say that they are not out there somewhere, or that it would be difficult to define a convention for including them.
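Since the rules depend only on how a character is typeset, they translate fairly directly into a script. The following is a minimal Python sketch, not a reference implementation: the names `to_velthuis`, `INDICATORS`, and `MACRON` are made up for illustration, and the use of Unicode NFD decomposition (which splits a letter such as 'ṭ' into 't' plus a combining dot below) is an implementation choice of this sketch rather than part of the scheme.

```python
import unicodedata

# Combining mark -> ASCII indicator, following the list above.
# (Hypothetical table for this sketch; extend it as needed.)
INDICATORS = {
    "\u0323": ".",   # dot below
    "\u0307": '"',   # dot above
    "\u0300": "`",   # grave accent
    "\u0301": "'",   # acute accent
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
}
MACRON = "\u0304"    # combining macron, marks a long vowel


def to_velthuis(text):
    """Rewrite Unicode-transliterated text using the two rules above."""
    out = []  # one entry per base letter, with its marks folded in
    for ch in unicodedata.normalize("NFD", text):
        if out and ch in INDICATORS:
            # Rule two: the indicator goes before the letter it affects.
            out[-1] = INDICATORS[ch] + out[-1]
        elif out and ch == MACRON:
            # Rule one: a long vowel is written as a doubled vowel.
            out[-1] = out[-1] + out[-1][-1]
        else:
            out.append(ch)
    # Marks not listed in INDICATORS pass through and are recomposed here.
    return unicodedata.normalize("NFC", "".join(out))


print(to_velthuis("ṭ ṅ ñ ā"))  # .t "n ~n aa
```

Diacritics that the table does not list, such as the breve or diaeresis, are left untouched, so the output is pure ASCII only when every mark in the input has an entry.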
As an example, we'll use the Pali verse that opens the Five Precepts of Buddhism. Without any diacritical marks, it reads:
panatipata veramani sikkha-padam samadiyami
Putting in the diacritics (using Unicode) gives:
pāṇātipātā veramaṇī sikkhā-padaṃ samādiyāmi
This is rather a mess to produce by hand: there are four different hex codes to remember, and even in a properly configured browser the result is somewhat ugly. Applying the Velthuis scheme, however, gives us:
paa.naatipaataa verama.nii sikkhaa-pada.m samaadiyaami
This properly conveys all of the diacritic information without requiring any hex lookups and without rendering the passage unreadable to non-Unicode viewers.
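Going the other way, from the ASCII form back to Unicode, can be sketched in a few lines of Python as well. This is an illustrative sketch under stated assumptions: the names `from_velthuis` and `MARKS` are hypothetical, only the vowels a, i, and u are treated as having long forms, and every indicator immediately followed by a letter is assumed to be a diacritic. That last assumption means ordinary punctuation typed directly against a letter (a quotation mark opening a word, for example) would be misread, so real text may need extra care.

```python
import re
import unicodedata

# ASCII indicator -> combining mark (hypothetical table for this sketch).
MARKS = {
    ".": "\u0323",   # dot below
    '"': "\u0307",   # dot above
    "`": "\u0300",   # grave accent
    "'": "\u0301",   # acute accent
    "^": "\u0302",   # circumflex
    "~": "\u0303",   # tilde
}


def from_velthuis(text):
    """Turn scheme-encoded ASCII back into composed Unicode characters."""
    # Doubled vowel -> vowel followed by a combining macron.
    text = re.sub(r"([aiuAIU])\1", lambda m: m.group(1) + "\u0304", text)
    # Indicator + letter -> letter followed by the combining mark.
    text = re.sub(
        r"([.\"`'^~])([A-Za-z])",
        lambda m: m.group(2) + MARKS[m.group(1)],
        text,
    )
    # Compose letter + mark pairs into single code points where possible.
    return unicodedata.normalize("NFC", text)


line = "paa.naatipaataa verama.nii sikkhaa-pada.m samaadiyaami"
print(from_velthuis(line))  # pāṇātipātā veramaṇī sikkhā-padaṃ samādiyāmi
```

A period followed by a space is left alone, so sentence-final punctuation survives as long as the usual spacing is kept.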