In cryptography, an "all-or-nothing transform" (also called a "package transform") is a randomized unkeyed reversible transformation (P'i = f( Pi )) with the following properties:
  • f( Pi ) is easy to calculate;
  • f⁻¹( P'i ) is easy to calculate if all bits of P'i are known;
  • f⁻¹( P'i ) is difficult to calculate (or even estimate/approximate) if not every bit of P'i is known.
Notice that f() is an unkeyed transformation, which means that (although it can use a block cipher in its construction) it is not "encryption" per se. It is, nevertheless, usually used as a pre-processing step prior to the use of a keyed encryption step.

Why?

Cryptanalysis often exploits known plaintext or redundancies in the plaintext (i.e. statistical structure) to deduce information about a cipher's keystream or key, in order to break the cipher. Applying an all-or-nothing transform before encryption effectively randomizes the data (at the cost of a small expansion in the message size and some computational overhead), spreading the information you need to recover the original message over the whole message. This generally turns "partial information about the plaintext" into "no information about the plaintext".

For example, imagine someone encrypts two different plaintexts with a stream cipher using exactly the same key (or key+IV pair). Classically, an attacker can just XOR together the two ciphertexts, effectively removing the keystream and obtaining the two plaintexts XORed together; this enables the attacker to eventually obtain the two separate plaintexts, due to assumptions that can be made regarding the statistical structure of the plaintext (e.g. it mostly consists of spaces and letters). On the other hand, if someone first applies an all-or-nothing transform to the two plaintexts before encryption with the stream cipher (again using the same key), the attacker can no longer recover either plaintext, as it's not possible to invert two all-or-nothing transforms if you only have the two transformed plaintexts XORed together (since they are effectively random). This means that an all-or-nothing transform increases the resistance of a cipher against statistical and known-plaintext attacks: even a cipher with certificational weaknesses can be considered pretty safe if you only use it to encrypt data that has been processed with an all-or-nothing transform.
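The keystream-reuse problem described above can be reproduced in a few lines. This is only a sketch: the `keystream` function below is a made-up toy (SHA-256 in counter mode) standing in for a real stream cipher, and the key and messages are invented for illustration.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"the same key, used twice"
p1 = b"attack at dawn, bring maps"
p2 = b"retreat at dusk, burn maps"
c1 = xor(p1, keystream(key, len(p1)))
c2 = xor(p2, keystream(key, len(p2)))

# XORing the two ciphertexts cancels the keystream: the attacker gets
# p1 XOR p2 without ever touching the key.
leak = xor(c1, c2)
assert leak == xor(p1, p2)
```

Wherever the two plaintexts agree (here, the trailing " maps"), the leaked bytes are zero, which is exactly the kind of structure an attacker works from.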

It should be noted, for example, that some public key cryptography protocols are only secure under the assumption that you only encrypt random data (e.g. a randomly-generated AES session key) with them, as direct encryption of a (redundant/non-random) plaintext might leak information about that plaintext or otherwise weaken the scheme. In these types of contexts, it is also useful to apply an all-or-nothing transform prior to the encryption (like OAEP, which is usually used with RSA), to prevent this kind of leak.

A final example of where an all-or-nothing transform would be useful is a situation where you want to encrypt a plaintext with two different stream ciphers (Cipher1 and Cipher2), using two distinct keys:

Cipher1( Cipher2( plaintext ) ) = (plaintext ⊕ Keystream2) ⊕ Keystream1 = (plaintext ⊕ Keystream1) ⊕ Keystream2 = plaintext ⊕ (Keystream1 ⊕ Keystream2)

As you can see, the problem is that, since XOR is commutative and associative, the attacker doesn't have to decrypt the ciphertext in the same order you encrypted it. In fact, the attacker doesn't even have to attack the two ciphers: (s)he can attack an equivalent cipher that outputs Keystream1 ⊕ Keystream2 as its keystream. On the other hand, if you apply an all-or-nothing transform between the two encryptions, they are no longer commutative, and the attacker now has to "peel off" Cipher1 before (s)he can undo the all-or-nothing transform and "peel off" Cipher2.
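To see the commutativity problem concretely, here is a small sketch. The two "ciphers" are the same hypothetical toy keystream generator (SHA-256 in counter mode) under two different keys; real stream ciphers would behave the same way as far as the XOR algebra is concerned.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def stream_encrypt(key: bytes, data: bytes) -> bytes:
    return xor(data, keystream(key, len(data)))

k1, k2 = b"key for Cipher1", b"key for Cipher2"
plaintext = b"two layers, one keystream"

stacked = stream_encrypt(k1, stream_encrypt(k2, plaintext))
swapped = stream_encrypt(k2, stream_encrypt(k1, plaintext))

# The "equivalent cipher" whose keystream is Keystream1 XOR Keystream2.
combined = xor(keystream(k1, len(plaintext)), keystream(k2, len(plaintext)))
single = xor(plaintext, combined)

assert stacked == swapped == single
```

The order of the two layers makes no difference, and both collapse into a single XOR with one combined keystream.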

How?

An example of an all-or-nothing transform (using AES-256 as block cipher and SHA-256 as hash function) would be something like:
  1. Take your message and apply an appropriate padding scheme (PKCS7 is cool, but anything decent is ok), so that its length is an integer multiple of 256 bits;

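  2. Choose a random 256-bit key (K) and encrypt each of the N 256-bit blocks (Pi) with that key using a slightly modified ECB mode (strictly speaking, AES itself has a 128-bit block, so for 256-bit blocks you'd need something like Rijndael with a 256-bit block size; the construction is the same either way):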

    P'i = AES-256-encrypt( K, Pi ⊕ i )   (for 0 ≤ i ≤ N-1)

    Here, the counter prevents equal blocks from encrypting to the same thing, by making the output of the encryption function depend on block position. Encrypting the message with a random key effectively whitens (i.e. randomizes) the message. Note that this would probably work just as well with any other encryption mode, such as plain ECB or CBC.

  3. Now, perform the following calculations:

    H0 = SHA-256( P'0 )
    Hi = SHA-256( P'i || Hi-1 || i )   (for 1 ≤ i ≤ N-1)
    (note: || means concatenation)
  4. Then calculate this single 256-bit word:

    J = K ⊕ H0 ⊕ H1 ⊕ H2 ⊕ ... ⊕ HN-1

  5. And, finally, the "packaged" message is just the randomly encrypted blocks concatenated with J:

    P'0 || P'1 || P'2 || ... || P'N-1 || J

Notice that each possible plaintext maps to 2^256 different (expanded) messages, depending on the key you choose. Also, note that the message is effectively randomized, as every element of P'i has been encrypted under a random key and element J results from the XOR of a randomly chosen number (K) with several hashes obtained from the (randomized) P'i blocks. By "randomized", I mean that, even if someone knows P0 (or any other block) with 100% certainty, they still cannot predict P'0, J or any P'i. Also, even if they know P'0, they cannot recover P0 unless they know J and every other element of P'i.
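Steps 3 to 5 (the hash chain and the masking of K into J) can be sketched on their own, without any block cipher. In this sketch the P'i blocks are just random 256-bit stand-ins, and the encoding of the counter i as an 8-byte big-endian integer is my assumption, since any fixed encoding would do.

```python
import hashlib
import secrets

BLOCK = 32  # 256 bits

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def hash_chain(blocks):
    """H0 = SHA-256(B0); Hi = SHA-256(Bi || H(i-1) || i) for i >= 1."""
    hashes = []
    for i, block in enumerate(blocks):
        if i == 0:
            h = hashlib.sha256(block).digest()
        else:
            # Assumption: i is encoded as 8 bytes, big-endian.
            h = hashlib.sha256(block + hashes[-1] + i.to_bytes(8, "big")).digest()
        hashes.append(h)
    return hashes

def mask(value: bytes, hashes) -> bytes:
    """XOR a 256-bit value with every hash; applying it twice undoes it."""
    for h in hashes:
        value = xor(value, h)
    return value

blocks = [secrets.token_bytes(BLOCK) for _ in range(4)]  # stand-ins for P'0..P'3
k = secrets.token_bytes(BLOCK)                           # the random key K

j = mask(k, hash_chain(blocks))             # J = K xor H0 xor ... xor H3
recovered = mask(j, hash_chain(blocks))     # same operation recovers K

# Flip a single bit in one block: the chain changes and K is lost.
damaged = list(blocks)
damaged[1] = xor(damaged[1], b"\x01" + bytes(BLOCK - 1))
wrong = mask(j, hash_chain(damaged))

assert recovered == k
```

Because XOR is its own inverse, the same `mask` call hides K and recovers it, but only when every block fed to the chain is bit-for-bit correct.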


To "unpackage" an incoming transformed message (Zi, with a length of M blocks), the recipient only has to perform the following steps:
  1. Calculate the H array:

    H0 = SHA-256( Z0 )
    Hi = SHA-256( Zi || Hi-1 || i )   (for 1 ≤ i ≤ M-2)

  2. Calculate K from J (i.e. the last block in Zi):

    J = ZM-1
    K = J ⊕ H0 ⊕ H1 ⊕ H2 ⊕ ... ⊕ HM-2

  3. Now just decrypt the blocks using K, to get the original plaintext:

    Pi = AES-256-decrypt( K, Zi ) ⊕ i   (for 0 ≤ i ≤ M-2)

As you can see, as long as someone knows all the bits of the "package", it's trivial to reverse this transform. On the other hand, to show how this transformation is "all-or-nothing", let's see what happens when only partial information is available:
  • Imagine you encrypt the first 256 bits of Zi (corresponding to P'0) with some key (L): someone who doesn't know L cannot calculate or estimate H0 (or any of the elements of Hi, actually, due to chaining), so (s)he cannot obtain the random key (K) used for the all-or-nothing transform. Also, note that, since only one block was encrypted with L and that block was randomized to begin with, mounting any type of cryptanalytic attack to recover L becomes virtually impossible (you can't really attack a cipher after having looked at only one output);

  • Imagine you encrypt the last 256 bits of Zi (corresponding to J) with some key (L): some attacker can now calculate the whole H array, but (for the same reason as above) it's still impossible to obtain K, since you need to know J. Again, given that you're only encrypting one block with L and that the block is randomized to begin with, it becomes virtually impossible to mount a cryptanalytic attack to recover L.

This extends to any block: if you encrypt or obfuscate any single block of the "package", it becomes computationally infeasible to recover K and undo the all-or-nothing transform. Of course, in a real application (unless there are specific constraints), you would probably apply a keyed encryption step to the whole expanded message (rather than to a single block), effectively increasing the robustness of the keyed encryption step against cryptanalysis (particularly against "known plaintext attacks").
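The packaging and unpackaging steps above can be sketched end to end with nothing but the Python standard library. Big caveat: the standard library has no AES, so `encrypt_block`/`decrypt_block` below are a hypothetical stand-in (a toy 4-round Feistel network over 256-bit blocks, built from HMAC-SHA-256), not the cipher the text calls for. The counter encodings and the PKCS7-style padding to 32 bytes are also my assumptions.

```python
import hashlib
import hmac
import secrets

BLOCK = 32  # 256-bit blocks, as in the text

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, r: int, half: bytes) -> bytes:
    # Feistel round function: HMAC-SHA-256 truncated to half a block.
    return hmac.new(key, bytes([r]) + half, hashlib.sha256).digest()[:BLOCK // 2]

def encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for the block cipher (NOT AES)."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for r in range(4):
        left, right = right, xor(left, _round(key, r, right))
    return left + right

def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for r in reversed(range(4)):
        left, right = xor(right, _round(key, r, left)), left
    return left + right

def hash_chain(blocks):
    """H0 = SHA-256(B0); Hi = SHA-256(Bi || H(i-1) || i)."""
    hashes = []
    for i, b in enumerate(blocks):
        data = b if i == 0 else b + hashes[-1] + i.to_bytes(8, "big")
        hashes.append(hashlib.sha256(data).digest())
    return hashes

def package(message: bytes) -> bytes:
    pad = BLOCK - len(message) % BLOCK           # PKCS7-style, 1..32 bytes
    padded = message + bytes([pad]) * pad
    k = secrets.token_bytes(BLOCK)               # random key K
    out = []
    for i in range(len(padded) // BLOCK):
        p = padded[i * BLOCK:(i + 1) * BLOCK]
        out.append(encrypt_block(k, xor(p, i.to_bytes(BLOCK, "big"))))
    j = k
    for h in hash_chain(out):                    # J = K xor H0 xor ... xor H(N-1)
        j = xor(j, h)
    return b"".join(out) + j

def unpackage(packed: bytes) -> bytes:
    z = [packed[o:o + BLOCK] for o in range(0, len(packed), BLOCK)]
    body, k = z[:-1], z[-1]                      # last block is J
    for h in hash_chain(body):                   # K = J xor H0 xor ... xor H(M-2)
        k = xor(k, h)
    plain = b"".join(xor(decrypt_block(k, c), i.to_bytes(BLOCK, "big"))
                     for i, c in enumerate(body))
    return plain[:-plain[-1]]                    # strip the padding

message = b"all or nothing" * 5
packed = package(message)
assert unpackage(packed) == message

# Change a single bit of the package: the recovered K is wrong and the
# output is garbage rather than a "mostly right" plaintext.
damaged = bytes([packed[0] ^ 1]) + packed[1:]
assert unpackage(damaged) != message
```

Note that `unpackage` does no integrity checking at all; a damaged package simply decrypts to noise, which is the "nothing" half of all-or-nothing.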


Finally, it should be said that, from a purely academic point of view, a type of transform like this is not very efficient, as it requires an additional encryption and hashing step per block and results in an expanded message. On the other hand, it's also true that the expansion rate is small for large ciphertexts (and tends to 0% as the ciphertext size grows to infinity), so it's negligible when you're encrypting bulk data, and the computational overhead is mostly negligible for small ciphertexts (particularly in a world where people use stuff like scrypt and bcrypt). Besides, if it's data you're not likely to decrypt every day, spending a handful of seconds more while decrypting is not the end of the world.
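To put rough numbers on the expansion, here is a back-of-the-envelope sketch. It assumes PKCS7-style padding to 32-byte blocks (which always adds between 1 and 32 bytes) plus the single 32-byte J block; the function name is made up for illustration.

```python
BLOCK = 32  # bytes per 256-bit block

def packaged_size(n: int) -> int:
    """Size in bytes of the package for an n-byte message."""
    padded = (n // BLOCK + 1) * BLOCK   # padding always adds 1..32 bytes
    return padded + BLOCK               # plus the J block

for n in (100, 10_000, 1_000_000, 1_000_000_000):
    extra = packaged_size(n) - n
    print(f"{n:>13,} bytes -> +{extra} bytes ({100 * extra / n:.6f}%)")
```

The overhead never exceeds 64 bytes under these assumptions, so the percentage shrinks quickly as the message grows.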


So... yeah... if you're thinking of encrypting data you're not going to touch for a while, the correct order of steps should be:
  1. compression;
  2. all-or-nothing transform;
  3. encryption;
  4. forward error correction.


Just saying...


