can be used to break any substitution ciphers, even if the key is arbitrarily long, provided you can assume some knowledge of the language of the plaintext and of the key. You need a computer, but the program is fast and easy; the harder part is knowing detailed facts about the language.
You could translate the German text of Das Kapital using the French text of Madame Bovary, and the same principles apply, but for ease of exposition I'll stick to English. Here the kind of knowledge you need is that the commonest word is THE, and this occurs in other common words such as THEN, WHETHER, OTHER, and that TH and WH are very common sequences, and so on.
In English text E has a frequency of about 13%, T of 9%, and so on down in a characteristic pattern. Any text QLPZJHWSNA... exhibiting this pattern is monoalphabetic and is crackable instantly. As Xamot has explained above, an n-letter keyword will create ciphertext that exhibits this same characteristic pattern at precisely every nth place, wherever you start counting from. A good example is at the back of Alan Garner's Red Shift: because it also preserves spacing and punctuation, it can be solved by hand. But in general you just feed it to a computer and test spacings of successive n until you hit the magic pattern. This is, as Xamot indicated, pretty trivial for any key length << the text length.
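To make the "test spacings of successive n" step concrete, here is a minimal Python sketch. It is not the program Xamot or anyone else actually used; as a stand-in for eyeballing the "magic pattern" it uses the index of coincidence, a standard statistic that comes out around 0.066 for English-like letter distributions and around 0.038 for uniformly random letters. The function names and the max_len cutoff are my own assumptions.

```python
from collections import Counter
import string


def index_of_coincidence(text):
    """Chance that two letters drawn from text (without replacement) match."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def column_scores(ciphertext, max_len=20):
    """For each candidate key length n, average the index of coincidence
    over the n columns made by taking every nth letter."""
    letters = [c for c in ciphertext.upper() if c in string.ascii_uppercase]
    scores = {}
    for n in range(1, max_len + 1):
        columns = ["".join(letters[i::n]) for i in range(n)]
        scores[n] = sum(index_of_coincidence(col) for col in columns) / n
    return scores
```

With a cyclic key, the score should jump towards the English figure at the true key length and at its multiples, and sit near the random figure elsewhere. On short texts the scores are noisy, which is the "key length << the text length" condition again.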
That depends on the encryption patterns being cyclically re-used. A one-time pad has no cyclic repetition. A random one-time pad is totally uncrackable, unless you find the piece of paper it's written on. In fact it's pointless attacking the cipher: you might as well capture the agent who's already decrypted it and torture them until they tell you what it said.
The interesting middle position is a non-random one-time pad. This is crackable by the CIA and Echelon and Mossad, and by dozens of the people contributing to E2 if they devoted a bit of computer time to it and could be bothered, but it does resist attack by the ten-line program that solves cyclic keys. It keeps your kid sister out, and the beauty and temptation is that it's so easy to do: no hard typing or programming.
The easiest way to get hold of an arbitrarily long key is to use a publicly available text. Chapter One of Pride and Prejudice is a terrible choice, because once someone thinks you're doing this, the first keys they're going to try are IT IS A TRUTH UNIVERSALLY ACKNOWLEDGED, followed by CALL ME ISHMAEL and IN A HOLE IN THE GROUND THERE LIVED A HOBBIT and any novels or lyrics they know mean something to you. But for the purposes of illustration let's say we're using Pride and Prejudice.
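As a rough illustration of how little programming this takes, here is a Python sketch of encrypting with a text as the key, using ordinary Vigenère addition (A=0 ... Z=25, added mod 26). The function names are hypothetical, and stripping everything except the letters A to Z is my own simplifying assumption.

```python
import string

A = ord("A")


def letters_only(text):
    """Upper-case the text and keep only the letters A-Z."""
    return [c for c in text.upper() if c in string.ascii_uppercase]


def running_key_encrypt(plaintext, keytext):
    """Add each plaintext letter to the key letter at the same position."""
    p, k = letters_only(plaintext), letters_only(keytext)
    if len(k) < len(p):
        raise ValueError("key text must be at least as long as the plaintext")
    return "".join(
        chr((ord(a) - A + ord(b) - A) % 26 + A) for a, b in zip(p, k)
    )


def running_key_decrypt(ciphertext, keytext):
    """Subtract the key letters again to recover the plaintext letters."""
    c, k = letters_only(ciphertext), letters_only(keytext)
    if len(k) < len(c):
        raise ValueError("key text must be at least as long as the ciphertext")
    return "".join(
        chr((ord(a) - ord(b)) % 26 + A) for a, b in zip(c, k)
    )
```

Because the key is never re-used from the start, there is no key length for the spacing test to find, which is why the ten-line cyclic-key program gets nowhere with it.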
The commonest word, in fact the commonest three-letter sequence, in Pride and Prejudice, is THE. This is also the commonest three-letter sequence in the plaintext. The T-encryption of THE is MAX; the H-encryption is AOL, and the E-encryption is XLI. (See Vigenère cipher for the table.) So in the encrypted text the sequences MAX, AOL, and XLI are going to occur equally probably, and with far higher probability than the 1 in 26^3 expected by chance. These will stand out. You will also get significantly higher than random occurrences of the three THE-encryptions for the HEN of WHEN and THEN, the ILL of WILL, and for ING and AND and WHO and FOR.
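The table lookups above are easy to check by machine. This Python sketch does only two things: it shifts a word by a single key letter (which reproduces MAX, AOL and XLI), and it counts the three-letter sequences in a ciphertext so you can see which ones stand out. Both function names are my own, and it makes no attempt at the actual decryption.

```python
from collections import Counter


def shift_by_letter(word, key_letter):
    """Encrypt every letter of word with the same Vigenere key letter."""
    k = ord(key_letter) - ord("A")
    return "".join(
        chr((ord(c) - ord("A") + k) % 26 + ord("A")) for c in word.upper()
    )


def trigram_counts(ciphertext):
    """Count every overlapping three-letter sequence, ignoring non-letters."""
    letters = "".join(c for c in ciphertext.upper() if "A" <= c <= "Z")
    return Counter(letters[i:i + 3] for i in range(len(letters) - 2))


for key_letter in "THE":
    print(key_letter, shift_by_letter("THE", key_letter))
```

Calling trigram_counts(ciphertext).most_common(20) on a long enough ciphertext gives the list of sequences to compare against the 1 in 26^3 chance rate.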
If you try to decrypt according to these second-order frequencies, gradually you will build up simultaneous pictures of at least the grammar of both the keytext and the plaintext. If you're the one trying to keep the secret, I suppose picking a Chinese web page and using the hexadecimal image of that as your one-time pad would make it a lot harder for frequency analysis to gain any leverage.