Adler32 Checksum - Everything2.com

The Adler32 checksum is a 32-bit checksum invented by Mark Adler for use in the Zlib compression/decompression library. It is described in RFC 1950: "ZLIB Compressed Data Format Specification version 3.3", including sample C code.

Such a checksum is included in every zlib stream (note that zlib's format is different from that of gzip, though zlib also supports gzip), and is checked by the decompressor to guard against accidental modification or corruption. For protection against malicious modification, the use of a cryptographic message authentication code is required, but these are generally much more expensive (in terms of CPU time) than the Adler32 checksum.

Adler32 is also included as part of the Java standard library. It is trivial to program, and doubtless there are dozens of implementations in existence today; I have seen ones in C, C++, C#, Java, Ada95, and Perl.

Adler32 is significantly faster than any CRC algorithm I have seen, and can easily hash well over 750 megabytes of data per second on a reasonably modern machine.

The algorithm goes as follows:

S1, S2: unsigned 16-bit variables
S1 = 1
S2 = 0

foreach byte b in input
    S1 = (S1 + b) % 65521
    S2 = (S2 + S1) % 65521

The final checksum is S2 || S1

65521 is the largest prime smaller than 2**16. You can optimize an implementation significantly by realizing that, if S1 and S2 are stored in 32-bit registers, then you hash quite a bit of data without causing the values to overflow; thus, you only have to do the modulo operation for every 4 kilobytes or so of input; see RFC 1950 for details.

There is one important caveat about using Adler32. Unlike CRCs, it is not the case that all the bits of the checksum rely on all the bits of the input; thus, hashing strings of a short length, or with little entropy, will result in checksums with unevenly distributed bits. In particular, the higher bits will often be zero when hashing short strings.

CRC	Cyclic Redundancy Check	hash function	zlib
Checksum	modification	CRC32	MD5 hash function
Header Checksum	Fortuity	Flagitate	bzip2
What to do if you get in a car accident	malicious	modulo	implementation
digest	register	gzip	overflow
RFC	Integrity	hash