Psst -- they have white space outside the Western world too, you know! Let's slowly work our way east...

Semitic

Arabic

The Arabic script is, odd as it may seem, actually a distant relative of our familiar Roman letters, and its rules of white space and punctuation have cross-fertilized with the Greek and Roman styles of writing described above. There are still quite a few differences, mind you!

Arabic is a cursive script, meaning that its letters flow together, and as in the West it uses white space to separate words. Arabic script has no case, omits short vowels and is written from right to left. However, not all Arabic letters are made equal: certain letters, such as A ( alif ا ), never join to the letter that follows, so they are always followed by a small gap, even in the middle of a word! Thus, "God is Greatest" -- allahu akbar -- is actually written

الله اكبر
rbk a hll a
To the uninitiated this can be pretty confusing: for example, the only difference between an initial L (laam ل ) and an initial A is that the word may continue after an L, but not after an A. Thus, in order to differentiate an L from an A at the end of a word, there is a special final L with a hook at the end. The amount of white space is also often wider between words than within words, but especially with more decorative scripts this alone would not be sufficient.

White space between sentences and paragraphs, on the other hand, is largely unknown in Classical Arabic, as best typified (and still retained) in the Qur'an. The end result is very much like the Medieval writing described by Cletus, except that Qur'anic Arabic has a much wider repertoire of punctuation to insert into the solid block of text. The Western pilcrow (¶) is replaced with a circle-shaped marker ( ۝ ) for the end of a verse (ayah) and another star-shaped marker ( ۞ ) for the rub el-hizb, which marks off each quarter of a hizb, one of the sixty recitation sections of the Qur'an. Western commas, semicolons and dashes are replaced by drawing little superscript Arabic letters, e.g. a meem means a pause is obligatory, a jeem means a pause is recommended but not required, a saad means a pause is not recommended but possible, etc. This system is pretty opaque without extensive study, but it does add to the hypnotic beauty of written Qur'anic verse.

Modern Arabic, on the other hand, uses slightly modified but familiar versions of Western punctuation symbols. The period is still eschewed in Arabic itself, with a wider stretch of white space substituting, but Urdu (which is written with the Arabic script) uses the period. As white space thus acquires syntactic meaning, the preferred means of justification is to stretch the "bar" (tatweel) connecting the characters: بيت and بيـــــت are exactly the same word!
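To make the tatweel point concrete, here is a minimal Python sketch. It assumes the stretching bar is encoded as the Unicode character U+0640 (ARABIC TATWEEL); the helper name strip_tatweel is made up for this illustration and is not part of any library.

```python
# Minimal sketch: a tatweel-stretched word is the same word once the
# elongation character is removed. Helper name is hypothetical.
TATWEEL = "\u0640"  # ARABIC TATWEEL, the "bar" used for justification

def strip_tatweel(text):
    """Return text with every tatweel character removed."""
    return text.replace(TATWEEL, "")

# beh + yeh + teh, written out as code points to avoid any ambiguity
plain = "\u0628\u064a\u062a"
# the same word with five tatweels inserted before the final letter
stretched = "\u0628\u064a" + TATWEEL * 5 + "\u062a"

print(plain == stretched)                 # raw strings differ
print(plain == strip_tatweel(stretched))  # equal once the padding is gone
```

A search or comparison routine that wants to treat both spellings as the same word would need some normalization step along these lines, since the two strings are not equal as stored.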

One last tidbit: the mathematical three-dot "therefore" symbol ∴ is originally from Arabic, where it is yet another Qur'anic symbol known as the muanqah and meaning that the word thus marked "therefore" continues from the previous word.

Don't worry too much if you got a few question marks above; most browsers can't quite handle Qur'anic Unicode yet...

Hebrew

Pretty much the same pattern repeats with Hebrew, which is also derived from the same Canaanite scripts as Arabic and Roman. Classical Hebrew, namely the Torah, has its own system of punctuation, but modern Hebrew is written with Western punctuation and formatting. Unlike Arabic, Hebrew is a block script and there are no funky inter-word white space rules.

Greek

Greek and its many, many relatives and lookalikes like Armenian, Cyrillic, Ethiopic and Georgian all employ modern Western white space and punctuation rules. Yes, this is a broad generalization and there are many tiny variances; drop me a note if you know of something really wacky.

Indic

Devanagari

Devanagari, the script used to write Hindi and many other Indian languages, is a left-to-right joined script much like Arabic, except that the letters in a word are always joined by a bar and the rules for ligatures are very, very complex. Words are separated by white space, sentences by a character called danda and verses with a double danda. Modern usage often substitutes the full set of Western punctuation.

Thai

The Thai script and its close cousins Lao and Myanmar are, like Devanagari, descended from the ancient Brahmi script, but they do not separate words at all! White space is only used to separate sentences. Other Western punctuation marks like the exclamation point and quotation marks are used in modern Thai.
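For the programmatically inclined, here is a small Python sketch of what the missing word breaks mean in practice. The sample sentence ("I eat rice", three Thai words) is my own illustration rather than something from the text above, and naive_words is a hypothetical helper name.

```python
# Minimal sketch: splitting on white space finds English words
# but sees a Thai sentence as one unbroken chunk.
def naive_words(sentence):
    """Split a sentence on white space, the usual Western assumption."""
    return sentence.split()

english = "I eat rice"
thai = "ฉันกินข้าว"  # the same three words, written without spaces

print(len(naive_words(english)))  # one token per word
print(len(naive_words(thai)))     # the whole sentence comes back as one token
```

Finding the actual word boundaries in Thai takes a dictionary or a statistical model, which is well beyond what a plain split can do.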

CJK

Chinese

The Chinese, on the other hand, had a complete system of writing at the time the Egyptians were still doodling hieroglyphs on pyramid walls. In an ideographic writing system like Chinese each character essentially represents one concept or "word" -- yes, this is a simplification, but it will have to do -- so words are already separate from each other with no need for additional white space. And indeed, for a very long time Chinese was written with no white space or punctuation to speak of: text went from top to bottom in columns marching from right to left, leaving the sentences for the reader to figure out:
      ikok   951    海森
      sefi    62    之林
       tln    73    恋是
       hid    84    人大
For short poems and lists, though, line breaks were often inserted at the end of each verse or item, improving readability somewhat. This classical style is still used for things like poetry and Buddhist sutras, which can thus be a royal pain to read since the characters used and their meanings have also tended to change over the millennia... but I digress.

Eventually, in China too the Western punctuation system crept in, once again with a few changes. To prevent confusion with the dots and curlies of the characters themselves, the period became a little circle "。" and the comma shifted direction and became a lot longer, "、".

Japanese

Japanese went the Chinese route and, despite the adoption of its own two kana syllabaries for phonetic writing, never saw the need to adopt white space. In a sentence like 俺が猫を食った, "I ate the cat", the content of the sentence is in the Chinese characters -- 俺 猫 食 -- and the kana syllables -- が を った -- sort out their relationship. This is considerably clearer than Chinese, where you have to rely on word order to figure out whether a particular character is acting as a noun, adjective or verb, and this is in fact one of the rare upsides of the otherwise hideously convoluted Japanese writing system.

After World War II and Japan's almost-wholesale embrace of things and ways Western, the Education Ministry decided to start writing Japanese in horizontal rows from left to right. (This had of course been practiced earlier as well on short texts like signs, but there had been no consensus about the right direction!) However, while Japanese school textbooks are to this day written Western-style, nearly all newspapers, magazines and books retain the old top-to-bottom formatting.

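One last quirk: due to the similarity of the Western quote " and the voiced-sound indicator dakuten ゛, Japanese uses its own quotation marks, 「like this」, instead of the Western ones. Mainland Chinese texts generally stick with Western-style quotes, although the corner brackets do turn up in traditional Chinese typography.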

Korean

And Korean outweirds everybody with its Hangul system of writing, which involves packing little kana-like phonetic signs into square boxes. Each Hangul composed character is one syllable and consists of an optional initial consonant, a medial vowel and an optional final consonant (or two). If there is no final consonant, it can be simply omitted, but a missing initial must be indicated by drawing a circle ᄋ, which thus acts as visible white space -- "Hey! There's nothing here!". There is much more to Hangul than this, but this probably isn't the right place to get into it...
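The box-packing can be peeked at from Python. This is only a sketch: it leans on Unicode's canonical decomposition (NFD), which splits a precomposed Hangul syllable into its component signs, and the helper name jamo is made up for the example.

```python
import unicodedata

def jamo(syllable):
    """Split one precomposed Hangul syllable into its component signs via NFD."""
    return list(unicodedata.normalize("NFD", syllable))

# "han" has an initial, a vowel and a final; "a" has no spoken initial,
# so the circle placeholder fills the first slot.
for syllable in ("한", "아"):
    parts = jamo(syllable)
    print(syllable, [unicodedata.name(p) for p in parts])
```

The second syllable decomposes into two signs, and the first of them is U+110B, the same circle shown above: the "nothing here" marker really is stored as a character of its own.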

Korean was formerly written in Chinese characters with Chinese white space rules (or lack thereof). In modern Hangul, white space is used to separate both words and sentences, and once again Western punctuation is in common use.

Summary

So in all, while the majority of the world appears less than convinced about the merits of the Roman alphabet, nearly the entire planet has adopted Western rules of white space and punctuation. The exclamation mark and question mark are effectively universal and the comma, period and quotation mark are only slightly less so. Every modern script that I know of uses white space to separate its sentences, and many -- albeit far from all -- also use it between their words.

And thanks fly out to the Unicode Consortium for making this writeup possible.