This is an idea I had while walking back home one evening after a bit of a heavy drinking session. Files for new games and movies are getting bigger and bigger. Files do sometimes get corrupted during downloads, particularly on connections prone to line noise. Bandwidth is increasing, but not everyone has access to services like cable or DSL. Even if they did, redownloading a CD-sized movie is annoying, and extremely wasteful of network resources.
The problem is, if a file is corrupt, how do we proceed? Well, we can assume that only a limited part of the file is corrupt. Realistically speaking, a few hundred bytes might be damaged; a small amount compared to the file size, but for executables and other non-resilient file types, enough to prevent correct use of the file.
This download system would first generate a checksum, or mathematical hash, of the entire file; something like MD5 would do the trick. It then asks the server to return the checksum for its copy of the file. If the two don't match, the program gets clever: it recursively subdivides the file into two halves and checksums them, then asks the server for its checksum of each section of the file.
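As a rough sketch of that first step, here's how the whole-file checksum might be computed with Python's standard hashlib; the function name and chunk size are just illustrative choices of mine:

```python
import hashlib


def file_md5(path: str) -> str:
    """MD5 of an entire file, read in chunks so a large download
    doesn't have to fit in memory all at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()
```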
We repeatedly split the file, discarding sections whose checksums match. In this way, we rapidly zero in on the damaged section of the file. Once a section shrinks below a certain threshold, we simply fetch that broken section from the server. Because we're only fetching the damaged part of the file, we save network bandwidth.
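Here's a minimal sketch of how that client-side search might look. The `server_checksum` and `server_fetch` callables, the `MIN_CHUNK` threshold, and working on an in-memory `bytearray` are all assumptions of mine; the idea only requires some way of asking the server for a section's checksum and, when needed, for the section's bytes:

```python
import hashlib

# Threshold below which we just refetch the damaged chunk instead of
# subdividing further (a tunable value, not specified in the post).
MIN_CHUNK = 64 * 1024


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def repair(local: bytearray, offset: int, length: int,
           server_checksum, server_fetch) -> None:
    """Recursively locate and refetch corrupted regions of `local`.

    `server_checksum(offset, length)` and `server_fetch(offset, length)`
    are hypothetical calls to the remote side.
    """
    if md5_of(bytes(local[offset:offset + length])) == server_checksum(offset, length):
        return  # this section is intact; discard it from consideration
    if length <= MIN_CHUNK:
        # Small enough: just redownload this damaged section.
        local[offset:offset + length] = server_fetch(offset, length)
        return
    half = length // 2
    repair(local, offset, half, server_checksum, server_fetch)
    repair(local, offset + half, length - half, server_checksum, server_fetch)
```

Since each round halves the suspect region, a single few-hundred-byte error in a CD-sized file is pinned down after only a few dozen checksum exchanges.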
In this way, we trade processor time spent calculating checksums for the network bandwidth we'd otherwise spend redownloading whole files. Obviously, there are problems: the server would be best advised to cache the generated checksums, to prevent excessive CPU usage as they are generated. By generating the checksums when the file is uploaded, or in a batch job at a time of low processor usage, this can be optimised further.
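On the server side, that caching could be as simple as walking the same halving scheme once at upload time and storing the results. The dictionary keyed by `(offset, length)` below is my own invention, not anything the idea itself dictates:

```python
import hashlib


def precompute_checksums(data: bytes, min_chunk: int = 64 * 1024) -> dict:
    """Precompute MD5 checksums for every (offset, length) section a
    client might ask about, down to `min_chunk`.  Built once at upload
    time (or in a low-load batch job) and cached, so the server never
    has to hash the file on demand."""
    cache = {}

    def walk(offset: int, length: int) -> None:
        cache[(offset, length)] = hashlib.md5(data[offset:offset + length]).hexdigest()
        if length > min_chunk:
            half = length // 2
            walk(offset, half)
            walk(offset + half, length - half)

    walk(0, len(data))
    return cache
```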
This is potentially a radical idea, though it has probably occurred to someone else at some point in time. I don't care; it occurred to me independently. Going off on a total tangent, if we could get infinitely fast (or simply very, very fast) processors, downloading itself would be unnecessary. By sending just the checksum and file size, then generating candidate data and checksumming it, we could theoretically reconstruct a file from a sufficiently advanced checksum. This is at present computationally infeasible. Roll on quantum computing ...