In
cryptography, an "
allornothing transform" (also called a "package transform") is a
randomized unkeyed
reversible transformation (
P'_{i} = f(
P_{i} )) with the following properties:
 f( P_{i} ) is easy to calculate;
 f^{1}( P'_{i} ) is easy to calculate if all bits of P'_{i} are known;
 f^{1}( P'_{i} ) is difficult to calculate (or even estimate/approximate) if not every bit of P'_{i} is known.
Notice that f() is an unkeyed transformation, which means that (although it can use a
block cipher in its construction) it is not "
encryption"
per se. It is, nevertheless, usually used as a
preprocessing step prior to the use of a keyed
encryption step.
Why?
Cryptanalysis often exploits
known plaintext or
redundancies in the
plaintext (i.e. statistical structure) to deduce information about a cipher's
keystream or key, in order to break this
cipher. Applying an
allornothing transform before encryption effectively randomizes the data (at the cost of a small
expansion in the message size and some computational
overhead), spreading the information you need to recover the original message over
all the message. This generally turns "
partial information about the plaintext" into "no information about the plaintext".
An example would be: imagine someone encrypts two different plaintexts with a
stream cipher using exactly the same key (or key+
IV pair). Classically, an attacker can just
XOR together the two
ciphertexts, effectively removing the
keystream and obtaining the two plaintexts
XORed together; this enables the attacker to eventually obtain the two separate plaintexts, due to assumptions that can be made regarding the statistical structure of the plaintext (e.g. it mostly consists of
spaces and letters). On the other hand, if someone first applies an
allornothing transform to the two plaintexts before encryption with the stream cipher (using again, the same key), the attacker can no longer recover any of the plaintexts, as it's not possible to invert two allornothing transforms if you only have the two transformed plaintexts XORed together (since they are effectively random). This means that an
allornothing transform increases the
resistance of a cipher against
statistical and
known plaintext attacks: even a cipher with
certificational weaknesses can be considered pretty safe, if you only use it to encrypt data that has been processed with an allornothing transform.
It should be noted, for example, that some
public key cryptography protocols are only secure under the assumption that you only encrypt random data (e.g. a randomlygenerated
AES session key) with it, as direct encryption of a (redundant/nonrandom) plaintext might leak information about the
private key. In these types of contexts, it is also useful to apply an allornothing transform prior to the encryption (like
OAEP, which is usually used with
RSA), to prevent leaking information about the private key.
A final example of where an allornothing transform would be useful is a situation where you want to encrypt a plaintext with two different
stream ciphers (Cipher1 and Cipher2), using two distinct keys:
Cipher1( Cipher2(
plaintext ) ) = (
plaintext ⊕ Keystream2)
⊕ Keystream1 = (
plaintext ⊕ Keystream1)
⊕ Keystream2 =
plaintext ⊕ (
Keystream1 ⊕ Keystream2)
As you can see, the problem is that, since
XOR is
commutative, the attacker doesn't have to decrypt the ciphertext by the same order you encrypted it. In fact, the attacker doesn't even have to attack the two ciphers: (s)he can attack an equivalent cipher that outputs
Keystream1 ⊕ Keystream2 as its
keystream. On the other hand, if you apply an allornothing transform between the two encryptions, they are no longer
commutative, and the attacker now has to "peel off" Cipher1 before he can undo the allornothing transform and "peel off" Cipher2.
How?
An example of an allornothing transform (using
AES256 as block cipher and
SHA256 as hash function) would be something like:
 Take your message and apply an appropriate padding scheme (PKCS7 is cool, but anything decent is ok), so that its length is an integer multiple of 256 bits;
 Choose a random 256bit key (K) and encrypt each of the N 256bit blocks (P_{i}) with that key using a slightly modified ECB mode:
P'_{i} = AES256encrypt( K, P_{i} ⊕ i ) (for 0 ≤ i ≤ N1)
Here, the counter prevents equal blocks from encrypting to the same thing, by making the output of the encryption function depend on block position. Encrypting the message with a random key effectively whitens (i.e. randomizes) the message. Note that this would work probably just as well with any other encryption mode, such as plain ECB or CBC.

Now, perform the following calculations:
H_{0} = SHA256( P'_{0} )
H_{i} = SHA256( P'_{i}  H_{i1}  i ) (for 1 ≤ i ≤ N1)

Then calculate this single 256bit word:
J = K ⊕ H_{0} ⊕ H_{1} ⊕ H_{2} ⊕ ... ⊕ H_{N1}

And, finally, the "packaged" message is just the randomly encrypted blocks concatenated with J:
P'_{0}  P'_{1}  P'_{2}  ...  P'_{N1}  J
Notice that each possible
plaintext maps to
2^{256} different (expanded) messages, depending on the key you choose. Also, note that the message is effectively randomized, as every element of
P'_{i} has been encrypted under a
random key and element
J results from the
XOR of a randomly chosen number (
K) with several hashes obtained from the (randomized)
P'_{i} blocks. By "randomized", I mean that, even if someone knows
P_{0} (or any other block) with 100%
certainty, they still cannot
predict P'_{0}, J or any
P'_{i}. Also, even if they know
P'_{0}, they cannot recover
P_{0} unless they know
J and every other element of
P'_{i}.
To "unpackage" an
incoming transformed message (
Z_{i}, with a length of M blocks), the recipient only has to perform the following steps:
 Calculate the H array:
H_{0} = SHA256( Z_{0} )
H_{i} = SHA256( Z_{i}  H_{i1}  i ) (for 1 ≤ i ≤ M2)

Calculate K from J (i.e. the last block in Z_{i}):
J = Z_{M1}
K = J ⊕ H_{0} ⊕ H_{1} ⊕ H_{2} ⊕ ... ⊕ H_{N1}

Now just decrypt the blocks using K, to get the original plaintext:
P_{i} = AES256decrypt( K, Z_{i} ) ⊕ i (for 0 ≤ i ≤ M2)
As you can see, as long as someone knows all the bits of the "package", it's trivial to
reverse this transform. On the other hand, to show how this transformation is "allornothing", let's see what happens when only
partial information is available:

Imagine you encrypt the first 256bits of Z_{i} (corresponding to P'_{0}) with some key (L): someone who doesn't know L cannot calculate or estimate H_{0} (or any of the elements of H_{i}, actually, due to chaining), so (s)he cannot obtain the random key (K) used for the allornothing transform. Also, note that, since only one block was encrypted with L and that block was randomized to begin with, mounting any type of cryptanalytic attack to recover L becomes virtually impossible (you can't really attack a cipher after having looked at only one output);

Imagine you encrypt the last 256bits of Z_{i} (corresponding to J) with some key (L): some attacker can now calculate the whole H array, but (for the same reason as above) it's still impossible to obtain K, since you need to know J. Again, given that you're only encrypting one block with L and that the block is randomized to begin with, it becomes virtually impossible to mount a cryptanalytic attack to recover L;
This extends to any block: if you
encrypt or
obfuscate any single block of the "package", it becomes
computationally infeasible to recover
K and undo the allornothing transform. Of course, in a real application (unless there are specific
constraints), you would probably apply a keyed encryption step to the whole expanded message (rather than to a single block), effectively increasing the
robustness of the keyed encryption step against cryptanalysis (particularly against "
known plaintext attacks").
Finally, it should be said that, from a purely
academic pointofview, a type of transform like this is not very
efficient, as it requires an additional encryption and
hashing step per block and results in an expanded message. On the other hand, it's also true that the expansion rate is small for large ciphertexts (and tends to 0% as the ciphertext size grows to
infinity), so it's
negligible when you're encrypting
bulk data and computational
overhead is mostly negligible for small ciphertexts (particularly in a world where people use stuff like
scrypt and
bcrypt). Besides, if it's data you're not likely to decrypt every day, spending a handful of seconds more while decrypting is not the end of the world.
So... yeah... if you're thinking of encrypting data you're not going to touch for a while, the correct order of steps should be:
 compression;
 allornothing transform;
 encryption;
 forward error correction.
Just saying...
References