History and Use of the Diaeresis
or, More Than You Ever Wanted to Know About Two Silly Dots Above a Vowel
The concept of diaeresis comes from Latin verse.
A verse of traditional dactylic hexameter consists of six metrons
(or metra, to be pedantic), where each metron is nominally a single dactyl,
i.e. a trisyllabic foot, although a two-syllable spondee may stand in for it.
For example, the first line of the Aeneid is:
arma virumque cano, Troiae qui primus ab oris
which, when divided into metra, becomes:
arma vi | rumque ca | no, Troi | ae qui | primus ab | oris
As you can see, metra are marked off without regard to the
underlying words, but sometimes they coincide, as in the boundary
between the fourth and fifth metra above. Unlike a caesura, which is an
intentional pause on a word boundary within a metron, this is mere
coincidence and should be (almost) ignored when reciting. The mark for this
is the original diaeresis (Greek for "division"), and it was denoted with
two vertical lines, like this:
arma vi | rumque ca | no, Troi | ae qui || primus ab | oris
Romance Usage (Trema)
Skipping ahead a few millennia, this notation shrank down into two little dots
placed on the second character, also called the trema, and was borrowed into
French and most other Romance languages to show that what looks like a digraph
(multiple letters that form only one sound) is really a pair of merely adjacent
vowels, as in naïve, which is na-eve but would be read nayve
without the diaeresis.
English, in turn, borrowed this from French. However, in English the popularity of the diaeresis
has waned during the last 100 years (why bother when English spelling is so
screwy anyway?) and these days only a few bastions of "proper" style, most notably
The New Yorker, still insist on the coöperation of their authors.
Since this use of the diaeresis does not modify the sound of the letter it is on,
it is left out if the word is hyphenated in the same place.
There is one other subtly different usage, not known in English but found in (at least?) French and Spanish: if a diaeresis is placed on the first vowel of a digraph, as in French aigüe or Spanish antigüedad, it indicates that the vowel in question is sounded, not silent. (In French, this is the result of a recent language reform and many people still place the diaeresis on the second vowel.)
Germanic Usage (Umlaut)
And that's where the story ends for English (and Webster 1913), but the
diaeresis has acquired many a new use in other languages, most notably as a
representation for umlaut in Germanic languages (e.g. German, Swedish, Norwegian, Icelandic)
and Celtic. The original glyph for umlaut was a little e atop the modified vowel, but this eventually morphed to look exactly like the diaeresis.
To further confuse things, German vowels with a diaeresis
on top were lifted wholesale into the orthographies of many languages
(e.g. Finno-Ugrian, Turkic, Albanian and even Chinese pinyin),
which do not have the grammatical concept of umlaut.
The unfortunate result is that many of the
speakers of these languages speak of e.g. "umlaut a" when they really mean
"diaeresis a", as can be seen from the utter confusion in the node by that name.
(This usage is so widespread that it can no longer be called incorrect;
it's just highly confusing.)
I meant to provide a summary of how ä, ö and ü are "usually" pronounced here,
but after some thought came to the conclusion that the whole situation is just
way too big a mess to summarize usefully, so please consult the individual
entries for the characters or languages you are interested in instead. (ÿ is an especially odd little character.)
Hungarian deserves a special mention though, since it distinguishes between
a short 'ö/ü' and a long 'ö/ü' by diagonally stretching the dots into something
resembling a quotation mark, resulting in ő and ű!
Unicode terms this the U+030B COMBINING
DOUBLE ACUTE ACCENT ( ̋) but don't be fooled, it's a diaeresis in disguise.
One last note of minor interest: whereas the Romance diaeresis is only punctuation and accented characters are usually not considered letters, characters with a Germanic diaeresis are almost always dealt with as separate letters of the alphabet when sorting and alphabetizing.
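To make the sorting point concrete, here is a minimal sketch that takes Swedish as the example, where å, ä and ö are separate letters placed after z, in that order. The names SWEDISH_ALPHABET and swedish_key are made up for this illustration, and real software would normally use a proper collation library rather than a hand-rolled table.

```python
import unicodedata

# Assumption for illustration: the Swedish alphabet, in which å, ä and ö
# are letters in their own right and come after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
RANK = {letter: index for index, letter in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word):
    """Hypothetical sort key: rank each letter by its place in the alphabet."""
    # Normalize to precomposed form so that 'a' + combining dots becomes 'ä'.
    word = unicodedata.normalize("NFC", word).lower()
    # Characters outside the alphabet sort after all known letters.
    return [RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in word]


words = ["\u00f6l", "zebra", "\u00e4pple", "apa", "\u00e5ka", "ost"]

by_codepoint = sorted(words)                   # raw code point order
by_alphabet = sorted(words, key=swedish_key)   # alphabet order

print(by_codepoint)  # ['apa', 'ost', 'zebra', 'äpple', 'åka', 'öl']
print(by_alphabet)   # ['apa', 'ost', 'zebra', 'åka', 'äpple', 'öl']
```

The two orders differ only in where å lands, which is exactly the kind of detail that naive code point sorting gets wrong.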
In many Slavic languages, the diaeresis is used in the character 'ë', the glyph
for which is essentially identical in its Roman and Cyrillic representations,
although Unicode separates them into U+0451 CYRILLIC SMALL LETTER IO
(ё) and U+00EB LATIN SMALL LETTER E WITH DIAERESIS (ë).
In Russian this is read as "o" or "io" depending on its position within a word.
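The split between the two look-alike characters can be checked with Python's standard unicodedata module. This is only a sketch; the helper name describe is invented for the example.

```python
import unicodedata


def describe(ch):
    """Hypothetical helper: name a character and its canonical decomposition."""
    parts = unicodedata.normalize("NFD", ch)
    pieces = " + ".join(f"U+{ord(p):04X} {unicodedata.name(p)}" for p in parts)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} = {pieces}"


latin = "\u00eb"     # Latin e with diaeresis
cyrillic = "\u0451"  # Cyrillic io

print(describe(latin))
# U+00EB LATIN SMALL LETTER E WITH DIAERESIS = U+0065 LATIN SMALL LETTER E + U+0308 COMBINING DIAERESIS
print(describe(cyrillic))
# U+0451 CYRILLIC SMALL LETTER IO = U+0435 CYRILLIC SMALL LETTER IE + U+0308 COMBINING DIAERESIS

# Same dots, different base letters, so the two never compare equal.
print(latin == cyrillic)  # False
```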
The International Phonetic Alphabet (IPA) has its own meanings for the diaeresis,
which I happen to think are highly bogus. At any rate, in the IPA world,
a diaeresis above a vowel
means "centralized" and a diaeresis below (!) a vowel means "breathy voiced".
These probably confuse even linguists.
English speakers tend to think both that the diaeresis looks funny and that it can be
plunked down anywhere in a word, resulting in names like Motörhead and the wonderfully
perverse Häagen-Dazs. Nöt müch Ï cän säy äböüt thät, nöw ïs thërë?
Incidentally, in case you're wondering why both the examples above are impossible, read up on vowel harmony.
The diaeresis can be found in Unicode as U+0308 COMBINING DIAERESIS ( ̈)
and also within the Latin-1 supplement as U+00A8 DIAERESIS (¨).
Ideally, all accented characters should be formed by using the combining
diaeresis printed on top of the unaccented vowel, but for historical reasons
Unicode includes all of ISO-8859-1 and its vast multitude of precomposed characters
like U+00FC LATIN SMALL LETTER U WITH DIAERESIS (ü).
This is a kludge, but a necessary one for the time being.
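The practical upshot is that the same visible character can be stored in two different ways, and software has to normalize before comparing. A small sketch using Python's standard unicodedata module follows; the helper name codepoints is invented for the example.

```python
import unicodedata


def codepoints(text):
    """Hypothetical helper: list the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


precomposed = "\u00fc"  # ü as a single precomposed character
combining = "u\u0308"   # plain u followed by COMBINING DIAERESIS

# The two render alike but are different strings until normalized.
print(precomposed == combining)                               # False
print(codepoints(unicodedata.normalize("NFC", combining)))    # ['U+00FC']
print(codepoints(unicodedata.normalize("NFD", precomposed)))  # ['U+0075', 'U+0308']

# The Hungarian long ö decomposes to the double acute, not the diaeresis.
print(codepoints(unicodedata.normalize("NFD", "\u0151")))     # ['U+006F', 'U+030B']
```

NFC composes wherever a precomposed character exists and NFD decomposes everything; picking one form and sticking to it is usually enough to make comparisons behave.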
For HTML character entities, the diaeresis is systematically called an
umlaut, or rather just "uml", as in &uml; (¨).
Typographically, the Romance diaeresis is often denoted with smaller, lighter
dots than the other types. Unicode, however, does not make this distinction.
Obscure diaeretic bugs in E2
You can't use a single high-ASCII character like ä as a
node title; you have to use an HTML character entity like &auml; instead.
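If you need to produce such entities in bulk, Python's standard html.entities table can do the lookup. This is a rough sketch: the function name to_entity is made up, and the fallback to a numeric reference for characters without a named entity is my own assumption, not something E2 is known to accept.

```python
import html
from html.entities import codepoint2name


def to_entity(ch):
    """Hypothetical helper: turn one character into an HTML entity."""
    name = codepoint2name.get(ord(ch))
    if name is not None:
        return f"&{name};"
    # Assumption: fall back to a decimal numeric reference when the
    # character has no named entity in the table.
    return f"&#{ord(ch)};"


print(to_entity("\u00e4"))  # &auml;
print(to_entity("\u00a8"))  # &uml;
print(to_entity("\u0151"))  # &#337;

# Round trip: decoding the entity gives the original character back.
print(html.unescape("&auml;") == "\u00e4")  # True
```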
And thanks to Albert Herring, Gritchka, thbz, Tiefling and tres equis for corrections.
- years of personal experience battling with software over the issue