This writeup is actually about video files inside .rar archives, but there is no such node and I want to place it here anyway. :)

There is a perfectly valid reason to place video files inside a RAR archive: error correction. RAR can add a recovery record to an archive, which can restore up to 10% of lost information. The recovery record is extra information stored in the archive, and its size equals the maximum amount of lost data that you want to be able to recover. Say you want to be able to fix a 600 MB file if up to 1% (6 MB) of it is corrupted. Then RAR increases the archive size by 6 MB. The amazing thing is that with this recovery record RAR can fix the damage no matter which part of the 600 MB file is corrupted.

Now I will give a very simple explanation of how this works¹. Imagine that we have a 3-byte file that we want to protect from errors. Let's say the file is 'ABC', where A, B and C are bytes. First, let's protect this file from up to 33% corruption, that is, the loss of any one byte.

Let's add one byte to our file, D, such as:

D=A+B+C

Now, in case any of the three data bytes is lost, we can recover it, using a simple formula:

A=D-B-C
B=D-A-C
C=D-A-B
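The one-check-byte scheme above fits in a few lines of Python. This is a minimal sketch with the mod-256 arithmetic from note 1 written out explicitly; the function names are hypothetical illustrations, not part of any RAR tool.

```python
def make_check(data: bytes) -> int:
    """Check byte D = A + B + C (mod 256)."""
    return sum(data) % 256


def recover_one(data: list, check: int) -> list:
    """Restore the single lost byte, which is marked as None in data."""
    lost = data.index(None)
    known = sum(b for b in data if b is not None)
    restored = list(data)
    # e.g. B = D - A - C, taken mod 256 so the result is a byte again
    restored[lost] = (check - known) % 256
    return restored


original = b"ABC"
d = make_check(original)
damaged = [original[0], None, original[2]]  # byte B is lost
print(bytes(recover_one(damaged, d)))  # prints b'ABC'
```

The `% 256` after the subtraction matters: when the sum wraps around, D - A - C goes negative as a plain integer, and the modulo brings it back to the lost byte.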

Now, if we want to be able to recover two lost bytes, we just need to add two bytes, D and E, to the recovery record:

D=A+B+C
E=A+2B+3C

If two bytes are lost, say B and C, we can calculate them:

D-A=B+C, E-A=2B+3C
Subtracting twice the first equation from the second: C=E-2D+A
Then: B=D-A-C=3D-E-2A

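Voila. The same can be done for a byte sequence A1...An of any length. To recover k lost bytes, we add a k-byte recovery record r1...rk, where rj=b1j*A1+b2j*A2+...+bnj*An. The condition on the coefficients bmj is stronger than just making them differ: for every possible set of k lost positions, the k equations must have a unique solution, which means the k-by-k determinant of the corresponding coefficients must be invertible modulo 256, that is, odd. Plain mod-256 arithmetic cannot guarantee this for every set of positions. In the two-byte example above, losing A and C gives D-B=A+C and E-2B=A+3C, so only 2C=E-D-B is known, which pins C down only up to 128. Real implementations, such as Reed-Solomon codes, therefore do the arithmetic in the finite field GF(256), where every nonzero value has an inverse.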
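To make the general k-byte scheme work for any lost positions, here is a minimal Python sketch that swaps the mod-256 arithmetic for arithmetic in GF(256), where both addition and subtraction are XOR. It picks the coefficients from a Cauchy matrix, bmj=1/(xj+ym) with distinct xj and ym, because every square sub-table of such a matrix is invertible. This is a Reed-Solomon-style erasure code for illustration only: the field polynomial 0x11d, the choice xj=j and ym=k+m, and all function names are assumptions of this sketch, not RAR's actual format.

```python
# GF(256) log/antilog tables for the primitive polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11d), a common choice for Reed-Solomon codes.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a: int) -> int:
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(256)")
    return EXP[255 - LOG[a]]


def coeff(j: int, m: int, k: int) -> int:
    """Cauchy coefficient b_mj = 1 / (x_j + y_m) with x_j = j, y_m = k + m."""
    return gf_inv(j ^ (k + m))


def encode(data: bytes, k: int) -> list:
    """Recovery record r_j = sum over m of b_mj * A_m, for j = 0..k-1."""
    if len(data) + k > 256:
        raise ValueError("this sketch needs len(data) + k <= 256")
    record = []
    for j in range(k):
        r = 0
        for m, a in enumerate(data):
            r ^= gf_mul(coeff(j, m, k), a)
        record.append(r)
    return record


def recover(data: list, record: list) -> list:
    """Fill in bytes marked None; at most len(record) of them may be lost."""
    k = len(record)
    lost = [m for m, a in enumerate(data) if a is None]
    t = len(lost)
    if t > k:
        raise ValueError("more bytes lost than the record can restore")
    # One equation per check byte: move the known bytes to the right-hand side.
    rows = []
    for j in range(t):
        rhs = record[j]
        for m, a in enumerate(data):
            if a is not None:
                rhs ^= gf_mul(coeff(j, m, k), a)
        rows.append([coeff(j, m, k) for m in lost] + [rhs])
    # Gauss-Jordan elimination over GF(256).
    for col in range(t):
        piv = next(r for r in range(col, t) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, v) for v in rows[col]]
        for r in range(t):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, p) for v, p in zip(rows[r], rows[col])]
    restored = list(data)
    for i, m in enumerate(lost):
        restored[m] = rows[i][t]
    return restored


message = b"Hello, RAR!"
record = encode(message, 3)
damaged = list(message)
for pos in (0, 5, 10):  # lose three bytes anywhere in the message
    damaged[pos] = None
print(bytes(recover(damaged, record)))  # prints b'Hello, RAR!'
```

Because any t rows and t columns of a Cauchy matrix again form a Cauchy matrix, the sketch can always use the first t check bytes, whichever t positions were lost.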

1 - In this explanation, when I write A=B I use the "equals" sign where I actually mean that A and B are equal modulo 256, that is, MOD(A,256)=MOD(B,256). For this I am used to writing three horizontal lines (the Unicode character ≡) with 256 written above; the more common notation is A≡B (mod 256).

In real life the situation is just slightly more complicated. Instead of single bytes, larger data blocks are protected. And of course, in addition to the recovery record you need some error detection to find out which blocks are actually corrupted; archives most often use a CRC for this.
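The block-level version with error detection can be sketched in Python as well. This sketch assumes equal-size blocks (the last one zero-padded), one CRC-32 per block computed with `zlib.crc32`, and a single parity block built with the same mod-256 sum as D above, applied position by position. The block size and function names are illustrative assumptions, not RAR's actual layout. The CRCs tell which block is bad, and the parity block rebuilds it.

```python
import zlib

BLOCK_SIZE = 4  # tiny for the demo; real tools use far larger blocks


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut data into equal blocks, zero-padding the last one."""
    padded = data + bytes(-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def protect(blocks: list) -> tuple:
    """Return per-block CRC-32 values and a mod-256 parity block."""
    crcs = [zlib.crc32(b) for b in blocks]
    parity = bytearray(len(blocks[0]) if blocks else 0)
    for b in blocks:
        for pos, byte in enumerate(b):
            parity[pos] = (parity[pos] + byte) % 256
    return crcs, bytes(parity)


def repair(blocks: list, crcs: list, parity: bytes) -> list:
    """Find the corrupted block by CRC and rebuild it from the others."""
    bad = [i for i, b in enumerate(blocks) if zlib.crc32(b) != crcs[i]]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError("one parity block can rebuild only one bad block")
    fixed = bytearray(parity)
    for i, b in enumerate(blocks):
        if i != bad[0]:
            for pos, byte in enumerate(b):
                fixed[pos] = (fixed[pos] - byte) % 256
    repaired = list(blocks)
    repaired[bad[0]] = bytes(fixed)
    return repaired


data = b"movie bytes inside a RAR archive"
blocks = split_blocks(data)
crcs, parity = protect(blocks)
damaged = list(blocks)
damaged[2] = b"XXXX"  # simulate a corrupted block
restored = b"".join(repair(damaged, crcs, parity))[:len(data)]
print(restored == data)  # prints True
```

A single parity block corresponds to the one-byte D scheme, so it can fix only one bad block; surviving more damage needs more recovery blocks and field arithmetic as described above.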

So you may wonder now: does anyone actually put movies inside archives? The answer is a resounding yes. Online movie distributors sometimes do. Typically, large files are split into multi-volume archives for distribution, with about 80 volumes per CD (I may be wrong about the exact number). In addition to the data volumes there can be a recovery volume. If one volume in 80 arrives corrupted, you can use the recovery volume and the remaining 79 good data volumes to reconstruct the corrupted one exactly. Voila!

In addition to movies, one can also store MP3 songs, JPEG images and even other ZIP and RAR archives inside RAR archives. Not to mention boring DOC, XLS and PPT files. :) Next time you want to store valuable data on unreliable media, such as a floppy disk or a CD, think about using a RAR archive with recovery protection enabled.