In
cryptography, an "
all-or-nothing transform" (also called a "package transform") is a
randomized unkeyed
reversible transformation (
P'i = f(
Pi )) with the following properties:
- f( Pi ) is easy to calculate;
- f-1( P'i ) is easy to calculate if all bits of P'i are known;
- f-1( P'i ) is difficult to calculate (or even estimate/approximate) if not every bit of P'i is known.
Notice that f() is an unkeyed transformation, which means that (although it can use a
block cipher in its construction) it is not "
encryption"
per se. It is, nevertheless, usually used as a
pre-processing step prior to the use of a keyed
encryption step.
Why?
Cryptanalysis often exploits
known plaintext or
redundancies in the
plaintext (i.e. statistical structure) to deduce information about a cipher's
keystream or key, in order to break this
cipher. Applying an
all-or-nothing transform before encryption effectively randomizes the data (at the cost of a small
expansion in the message size and some computational
overhead), spreading the information you need to recover the original message over
all the message. This generally turns "
partial information about the plaintext" into "no information about the plaintext".
An example would be: imagine someone encrypts two different plaintexts with a
stream cipher using exactly the same key (or key+
IV pair). Classically, an attacker can just
XOR together the two
ciphertexts, effectively removing the
keystream and obtaining the two plaintexts
XORed together; this enables the attacker to eventually obtain the two separate plaintexts, due to assumptions that can be made regarding the statistical structure of the plaintext (e.g. it mostly consists of
spaces and letters). On the other hand, if someone first applies an
all-or-nothing transform to the two plaintexts before encryption with the stream cipher (using again, the same key), the attacker can no longer recover any of the plaintexts, as it's not possible to invert two all-or-nothing transforms if you only have the two transformed plaintexts XORed together (since they are effectively random). This means that an
all-or-nothing transform increases the
resistance of a cipher against
statistical and
known plaintext attacks: even a cipher with
certificational weaknesses can be considered pretty safe, if you only use it to encrypt data that has been processed with an all-or-nothing transform.
It should be noted, for example, that some
public key cryptography protocols are only secure under the assumption that you only encrypt random data (e.g. a randomly-generated
AES session key) with it, as direct encryption of a (redundant/non-random) plaintext might leak information about the
private key. In these types of contexts, it is also useful to apply an all-or-nothing transform prior to the encryption (like
OAEP, which is usually used with
RSA), to prevent leaking information about the private key.
A final example of where an all-or-nothing transform would be useful is a situation where you want to encrypt a plaintext with two different
stream ciphers (Cipher1 and Cipher2), using two distinct keys:
Cipher1( Cipher2(
plaintext ) ) = (
plaintext ⊕ Keystream2)
⊕ Keystream1 = (
plaintext ⊕ Keystream1)
⊕ Keystream2 =
plaintext ⊕ (
Keystream1 ⊕ Keystream2)
As you can see, the problem is that, since
XOR is
commutative, the attacker doesn't have to decrypt the ciphertext by the same order you encrypted it. In fact, the attacker doesn't even have to attack the two ciphers: (s)he can attack an equivalent cipher that outputs
Keystream1 ⊕ Keystream2 as its
keystream. On the other hand, if you apply an all-or-nothing transform between the two encryptions, they are no longer
commutative, and the attacker now has to "peel off" Cipher1 before he can undo the all-or-nothing transform and "peel off" Cipher2.
How?
An example of an all-or-nothing transform (using
AES-256 as block cipher and
SHA-256 as hash function) would be something like:
- Take your message and apply an appropriate padding scheme (PKCS7 is cool, but anything decent is ok), so that its length is an integer multiple of 256 bits;
- Choose a random 256-bit key (K) and encrypt each of the N 256-bit blocks (Pi) with that key using a slightly modified ECB mode:
P'i = AES-256-encrypt( K, Pi ⊕ i ) (for 0 ≤ i ≤ N-1)
Here, the counter prevents equal blocks from encrypting to the same thing, by making the output of the encryption function depend on block position. Encrypting the message with a random key effectively whitens (i.e. randomizes) the message. Note that this would work probably just as well with any other encryption mode, such as plain ECB or CBC.
-
Now, perform the following calculations:
H0 = SHA-256( P'0 )
Hi = SHA-256( P'i || Hi-1 || i ) (for 1 ≤ i ≤ N-1)
-
Then calculate this single 256-bit word:
J = K ⊕ H0 ⊕ H1 ⊕ H2 ⊕ ... ⊕ HN-1
-
And, finally, the "packaged" message is just the randomly encrypted blocks concatenated with J:
P'0 || P'1 || P'2 || ... || P'N-1 || J
Notice that each possible
plaintext maps to
2256 different (expanded) messages, depending on the key you choose. Also, note that the message is effectively randomized, as every element of
P'i has been encrypted under a
random key and element
J results from the
XOR of a randomly chosen number (
K) with several hashes obtained from the (randomized)
P'i blocks. By "randomized", I mean that, even if someone knows
P0 (or any other block) with 100%
certainty, they still cannot
predict P'0, J or any
P'i. Also, even if they know
P'0, they cannot recover
P0 unless they know
J and every other element of
P'i.
To "unpackage" an
incoming transformed message (
Zi, with a length of M blocks), the recipient only has to perform the following steps:
- Calculate the H array:
H0 = SHA-256( Z0 )
Hi = SHA-256( Zi || Hi-1 || i ) (for 1 ≤ i ≤ M-2)
-
Calculate K from J (i.e. the last block in Zi):
J = ZM-1
K = J ⊕ H0 ⊕ H1 ⊕ H2 ⊕ ... ⊕ HN-1
-
Now just decrypt the blocks using K, to get the original plaintext:
Pi = AES-256-decrypt( K, Zi ) ⊕ i (for 0 ≤ i ≤ M-2)
As you can see, as long as someone knows all the bits of the "package", it's trivial to
reverse this transform. On the other hand, to show how this transformation is "all-or-nothing", let's see what happens when only
partial information is available:
-
Imagine you encrypt the first 256-bits of Zi (corresponding to P'0) with some key (L): someone who doesn't know L cannot calculate or estimate H0 (or any of the elements of Hi, actually, due to chaining), so (s)he cannot obtain the random key (K) used for the all-or-nothing transform. Also, note that, since only one block was encrypted with L and that block was randomized to begin with, mounting any type of cryptanalytic attack to recover L becomes virtually impossible (you can't really attack a cipher after having looked at only one output);
-
Imagine you encrypt the last 256-bits of Zi (corresponding to J) with some key (L): some attacker can now calculate the whole H array, but (for the same reason as above) it's still impossible to obtain K, since you need to know J. Again, given that you're only encrypting one block with L and that the block is randomized to begin with, it becomes virtually impossible to mount a cryptanalytic attack to recover L;
This extends to any block: if you
encrypt or
obfuscate any single block of the "package", it becomes
computationally infeasible to recover
K and undo the all-or-nothing transform. Of course, in a real application (unless there are specific
constraints), you would probably apply a keyed encryption step to the whole expanded message (rather than to a single block), effectively increasing the
robustness of the keyed encryption step against cryptanalysis (particularly against "
known plaintext attacks").
Finally, it should be said that, from a purely
academic point-of-view, a type of transform like this is not very
efficient, as it requires an additional encryption and
hashing step per block and results in an expanded message. On the other hand, it's also true that the expansion rate is small for large ciphertexts (and tends to 0% as the ciphertext size grows to
infinity), so it's
negligible when you're encrypting
bulk data and computational
overhead is mostly negligible for small ciphertexts (particularly in a world where people use stuff like
scrypt and
bcrypt). Besides, if it's data you're not likely to decrypt every day, spending a handful of seconds more while decrypting is not the end of the world.
So... yeah... if you're thinking of encrypting data you're not going to touch for a while, the correct order of steps should be:
- compression;
- all-or-nothing transform;
- encryption;
- forward error correction.
Just saying...
References