How many haiku can there possibly be? Due to their small, rigid form, we should be able to roughly determine the size of the haikuspace. We will use Japanese, as it is the only language suitable for proper haiku.*
* Of course words come to mean many things, but if you're used to reading and writing haiku in English with a 5-7-5 syllable pattern, I highly recommend investigating some Japanese haiku, and writing with something like 3-4-3 (syllables) or 2-3-2 (words) to get a feeling for the Japanese style.
Phonetic Attack
Japanese syllables are generally smaller than syllables in English. They consist of a consonant and a vowel, or a vowel by itself (plus the standalone syllabic n). Here are various estimates of the size of the Japanese sound inventory:
Source | Count | Notes |
the fifty sounds, see also i ro ha | 50 | only the basic sounds of Japanese, and so a lower bound on their total number |
Wikipedia article on hiragana | 102 | the vowels a/i/u/e/o (5), Ya/Yu/Yo (3), Wa/Wo (2), Da/De/Do (3), K/S/T/N/H/M/R/G/Z/B/P (11) combined with a/i/u/e/o/ya/yu/yo (8), and N by itself (1), for a total of 5+3+2+3+11*8+1 = 102 |
Japanese pronunciation | 113 | 14 consonants * 8 vowels + syllabic n |
The Range of Sounds in Japanese | 133 | |
JMdict | 172 | from all kana entries, counting only syllable-characters, see below |
We'll eliminate the 50, as it's clearly a lowball. A haiku's 5-7-5 pattern is 17 syllables total, and so the upper bound is between 102^17 = 14002414191924244276669361796022272 ≈ 10^34.146 and 172^17 = 100921476901355254279645541839050637312 ≈ 10^38.004.
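As a quick check of the arithmetic, here is a minimal Python sketch that raises an inventory size to the 17th power and reports its base-10 logarithm. The function name `phonetic_bound` is just for illustration.

```python
import math

HAIKU_SYLLABLES = 5 + 7 + 5  # 17 syllables in a 5-7-5 haiku

def phonetic_bound(inventory_size, syllables=HAIKU_SYLLABLES):
    """Number of distinct syllable strings of the given length."""
    return inventory_size ** syllables

# The smallest and largest inventory sizes kept from the table.
for size in (102, 172):
    bound = phonetic_bound(size)
    print(f"{size}^17 = {bound} ≈ 10^{math.log10(bound):.3f}")
```

Python integers are arbitrary precision, so the full values print exactly.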
This is still a pretty wide range (about four orders of magnitude, or a factor of 10,000), and the numbers are pretty unfathomable. Here are a few others for comparison. A googol is 10^100. There are estimated to be about 10^80 atoms in the observable universe. The number of possible positions in chess is fewer than 10^46.7. There are about 10^26 molecules of water in a gallon of the stuff. But those don't really help, do they?
Dictionary Attack
From JMdict, a machine-readable Japanese dictionary containing nearly 160,000 entries, we extract the most common* kanji (ideographic) and kana (syllabic/reading) records from each entry. Syllables are counted by applying the regular expression substitution below, and then taking the length of the resulting string.
* Roughly, determined using JMdict's "priority" markers, otherwise using the first one. (Most entries (92%) have only one anyway.)
Thanks to memoization, it takes mere seconds for these huge permutations to be computed.
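For the curious, here is a rough Python sketch of how such a memoized count might look. It is not the code used for this writeup, and it assumes "fitting in n syllables" means ordered sequences of words whose syllable counts add up to exactly n. The word lengths in `toy_lengths` are made up.

```python
from collections import Counter
from functools import lru_cache

def count_lines(word_lengths, target):
    """Count ordered word sequences whose syllable lengths sum to exactly target."""
    by_length = Counter(word_lengths)  # syllable length -> number of words

    @lru_cache(maxsize=None)
    def ways(remaining):
        if remaining == 0:
            return 1  # exactly one way to fill nothing: the empty sequence
        # Choose the next word by its length, then fill what is left.
        return sum(n * ways(remaining - k)
                   for k, n in by_length.items() if 0 < k <= remaining)

    return ways(target)

# Hypothetical toy dictionary: two 1-syllable words, one 2-syllable, one 3-syllable.
toy_lengths = [1, 1, 2, 3]
five = count_lines(toy_lengths, 5)
seven = count_lines(toy_lengths, 7)
print(five, seven, five * seven * five)  # 84 545 3845520
```

Because only the histogram of word lengths matters, the recursion visits at most `target` distinct states, which is why the real dictionary finishes in seconds. The 5-7-5 totals reported in this writeup are consistent with multiplying the three line counts together in the same way.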
Non-syllable-character removal regex:
s/([きしちにひみりぎじびぴ])[ゃゅょ]/\1/g
(Please let me know if there are other characters or cases which do not count as syllables.)
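Here is a small Python sketch of the same substitution, assuming the syllable count is simply the length of the string after the small ya/yu/yo characters are folded into the preceding kana. The name `count_syllables` is illustrative.

```python
import re

# Same character classes as the substitution above: a small ya/yu/yo that
# follows one of these kana is dropped, so the pair counts as one syllable.
SMALL_Y = re.compile(r"([きしちにひみりぎじびぴ])[ゃゅょ]")

def count_syllables(kana):
    """Length of the kana string after removing merged small ya/yu/yo."""
    return len(SMALL_Y.sub(r"\1", kana))

print(count_syllables("きょう"))  # 2
print(count_syllables("はいく"))  # 3
```

Note that the character classes as written list only hiragana, so a katakana pair such as キャ is still counted as two characters.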
All characters used in JMdict's kana entries: (172 characters)
、〜ぁあぃいうぇえおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゆょよらりるれろゎわゐゑをんゝゞァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロワヰヱヲンヴヶ・ーヽヾ
Using All Kana Entries
Permutations fitting in 5 syllables = 13724842934828
Permutations fitting in 7 syllables = 2495396740987223584
Permutations of 5-7-5 lines = 470061162017233273469657393428518492432749056 ≈ 10^44.672154
Using Only Common* Kana Entries
Permutations fitting in 5 syllables = 94865603412
Permutations fitting in 7 syllables = 2411754014092300
Permutations of 5-7-5 lines = 21704538552340125271960104096068971200 ≈ 10^37.336551
* as denoted by JMdict's "priority" markers
Using Only Unique Kana Entries
Permutations fitting in 5 syllables = 21007905554
Permutations fitting in 7 syllables = 302428066343444
Permutations of 5-7-5 lines = 133471212337745718580643080665018704 ≈ 10^35.125388
Duplicate Kana Entries: 18784 out of 158685 entries.
The duplication is a bit of a wrinkle. It appears (by sifting randomly through duplicates) that the vast majority of duplicate readings are indeed for separate meanings/kanji, and so I am inclined to believe the "all entries" number. The truth is probably somewhere in the middle, but don't forget we've only used one dictionary.
Tangent: I would love to be able to get a number on the phonetic saturation of Japanese from this. Perhaps after some input regarding syllable counting from those more fluent in Japanese. Until then, I'll just say this: if you map kana readings to kanji entries, there are 9377 readings (6.7%) with 2 or more kanji entries, 1161 (0.8%) with 5 or more, and 181 (0.1%) with 10 or more. Look at that beautiful power law action.
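The reading-to-kanji tally above could be reproduced along these lines. This is only a sketch: the `(reading, kanji)` pairs in `toy` are a hypothetical stand-in for the dictionary entries, and `homophone_counts` is an illustrative name.

```python
from collections import defaultdict

def homophone_counts(entries, thresholds=(2, 5, 10)):
    """entries: iterable of (kana_reading, kanji) pairs.

    Returns the number of distinct readings and, for each threshold,
    how many readings map to at least that many distinct kanji entries.
    """
    kanji_by_reading = defaultdict(set)
    for reading, kanji in entries:
        kanji_by_reading[reading].add(kanji)
    total = len(kanji_by_reading)
    counts = {t: sum(1 for ks in kanji_by_reading.values() if len(ks) >= t)
              for t in thresholds}
    return total, counts

# Hypothetical toy data, not taken from JMdict.
toy = [("こうえん", "公園"), ("こうえん", "講演"), ("こうえん", "公演"),
       ("はし", "橋"), ("はし", "箸"), ("ねこ", "猫")]
total, counts = homophone_counts(toy)
for t, n in counts.items():
    print(f"{n} of {total} readings ({n / total:.1%}) have {t} or more kanji entries")
```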
Summary
That was rather blustery, so here's the take-away: haikuspace is huge. Like 10^44 huge. On top of that, a phonetic approach doesn't reach a good upper bound, apparently because of homophones, which increase the haikuspace by almost seven(!) orders of magnitude. Some independent confirmation of that would be nice, though.
The next major step in finding a lower upper-bound would be to apply some sort of "sense-making" filter to the poems. This is beyond the scope of this writeup.
Some Random Haiku
A natural consequence of being able to permute all the words of a Japanese dictionary into haiku is being able to generate random haiku. And so here are a few of those that rose slightly above noise. Translations courtesy of mauler!
詰め込む間
ざあざあネオン
酸化物
While I cram
Whooshing neon
Oxide
|
狂暴戸
レッドテープ子
史籍ポロ
Enraged door
Red tape child
A history of polo
|
険悪絵
願掛け火食
公有気
Hostile pictures
Prayer cooked food
Public aspiration
|
孝道子
引ったくり急
穴居人
Michiko Takashi
Sudden snatching
Caveman
|
国花櫛
結論回目
圏外死
National flower comb
Conclusionth
Out of range death
|
代弁課
身の上西部
簾戸葉書
Department of spokesmen
Circumstances western
Bamboo blinds postcard
|
沿海二
心嚢浸す
教唆罪
Coast two
Soak pericardium
Criminal incitement
|
幼児予示
ボンレスハム荷
バラスト医
Infant foreshadowing
A load of boneless ham
Ballast medicine
|
横に頃
民利草規矩
横丁科
That horizontal time
The people's interests, grass rules
Department of alleys
|
投げ入れミ
拒絶滑りい
浸食シ
Throw mi
Rejection slippage i
Erosion shi
|
表立つ
夏枯れ無窮
真鶸説
Stand out
Summer slump eternal
Siskin theory
|
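For reference, random lines like the ones in this section could be produced with something as simple as the following Python sketch. It is an illustration under assumptions, not the generator used here: `TOY_LEXICON` is a hypothetical five-word list, and the greedy fill does not sample uniformly from all possible lines.

```python
import random

# Hypothetical toy lexicon of (word, syllable count) pairs; a real run would
# use the dictionary entries with syllable counts derived from their kana.
TOY_LEXICON = [("木", 1), ("猫", 2), ("雨", 2), ("桜", 3), ("蛍", 3)]

def random_line(lexicon, target, rng):
    """Pick random words that still fit until the line has exactly target syllables."""
    words, remaining = [], target
    while remaining > 0:
        fitting = [(w, n) for w, n in lexicon if 0 < n <= remaining]
        if not fitting:
            raise ValueError("no word fits the remaining syllables")
        word, n = rng.choice(fitting)
        words.append(word)
        remaining -= n
    return words

def random_haiku(lexicon, rng=None):
    """Return three lines of 5, 7, and 5 syllables as strings."""
    rng = rng or random.Random()
    return ["".join(random_line(lexicon, n, rng)) for n in (5, 7, 5)]

for line in random_haiku(TOY_LEXICON):
    print(line)
```

Sampling uniformly over all lines would mean weighting each word choice by the number of ways to complete the rest of the line.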
Update 2013 March 22
Having just read this exploration of the size of Twitterspace, it occurred to me that I could use written language entropy as another estimate on the size of haikuspace:
number of haiku = 2^((5 + 7 + 5) * b)
where b is the number of bits per character for Japanese. I'm going to use 2.4 (= 452337 * 8 / 1519224) from this paper (html version via google). This gives 2^40.8 ≈ 10^12.3 haiku, a little more than a bit shy (as expected) of my previous estimate of 10^44.
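The entropy arithmetic can be checked with a few lines of Python. This sketch treats each of the 17 syllables as one character, which is the assumption the formula makes, and the function name is illustrative.

```python
import math

def entropy_haiku_log10(bits_per_char, syllables=5 + 7 + 5):
    """Base-10 exponent of 2 ** (syllables * bits_per_char)."""
    total_bits = syllables * bits_per_char
    return total_bits * math.log10(2)

ratio = 452337 * 8 / 1519224  # the ratio quoted above, before rounding to 2.4
print(f"bits per character ≈ {ratio:.2f}")
print(f"with b = 2.4: 2^{17 * 2.4:.1f} ≈ 10^{entropy_haiku_log10(2.4):.1f}")
```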
See Also
kigo—season word
senryu, tanka, renga, waka—other haiku-like forms