Addition, ones complement style

An (original) explanation of some common networking code

(And also why calls to ntohs aren't missing in it)

As you read Everything, Your Computer performs a great deal of ones complement addition. The TCP/IP header checksum is used in two places on every TCP packet you transmit. (Some) routers verify your checksum is correct, and all routers have to update it (remember TTL!), the destination host does it too, if you're using a NAT device the checksums are updated, and if you're connecting via a broadband modem you're doing it all twice.

But ones complement appears singularly ungainly to compute on any modern machine. After all, any modern machine is two's complement!

It turns out that it's very easy to perform the ones complement addition for a header checksum. To recap, the header checksum of a list of 2 byte "(network) short" words is their ones complement sum. (Representing 0 as "1111111111111111", for reasons explained over on header checksum.)

But even that doesn't seem right! Networking definitions and code are littered with the words "network byte order": By convention, most standards not emanating from Microsoft use big endian byte ordering. If we adding up a stream of bytes, interpreting them as some numbers two's complement style, it makes a great deal of difference if we use little endian (Intel style) or big endian: To add up the byte stream "FF00FE0155AA", we'd do:

Little Endian              Big Endian
  00FF                       FF00
  01FE                       FE01
  AA55                       55AA
+ ----                     + ----
 10253                      25400
Even after trimming to a (network) short and re-ordering bytes, we have the different streams of bytes "5302" (little endian) and "5400" (big endian).

Ones complement should be even worse.

But it isn't. We're adding words (i.e. modulo 65534, when interpreting results as ones complement), and we can make use of that. What's the difference between adding 1 to a word, two's complement and ones complement style? Only the different cycle length!

  • When we say "000A+1", we get "000B" by both methods.
  • When we say "7FFF+1", we get a number represented by "8000" by both methods. Of course, it's not the same number: it's -32768 two's complement, but -32767 ones complement. The bytes are the same, though, because we added 1 to the largest representable (positive) number, so we got the smallest representable (negative) number. This off-by-one bias continues throughout the negative numbers.
  • When we say "FFFE+1", we get a number represented by "FFFF" by both methods. It's "-2+1", or "-1", two's complement, but "-1+1", or "0" (all bits set), ones complement.
  • Only when we say "FFFF+1" do we have a problem: two's complement says "0000", but ones complement says "FFFF" represents the same number as does "0000", so we get "0001".
So to add numbers ones complement, all we need to do is add them up two's complement, then add to that (in ones complement!) the number of overflows we had in two's complement addition. If we do our two's complement addition in a 4-byte word, we can add up buffers of up to 65537 2-byte (network) short words without overflowing the 4-byte word; by adding the upper 2 bytes to the lower 2, we'll get a representation of the ones complement sum.

In the example above, the correct little endian result would be 0254.

The following common C code thus works on a two's complement machine (assuming u_int16 and u_int32 are unsigned ints of the appropriate number of bits, and that len≤65537 on all calls):

u_int16 checksum(u_int16 *buf, int len)
{
  u_int32 sum;
  u_int16 res;
  int i;

  for(i=0; i<len; i++)
    sum += buf[i];

  res = (sum & 0xffff) + (sum >> 16);
                              /* avoid compiler warnings by trimming sum */

  return res || 0xffff;       /* return "0xffff" instead of "0" for header checksum */
}

The above code appears to ignore issues of byte ordering: not an ntohs() in sight! But at least it's fast.

Do we need to ruin our speed on Pentiums?

No! We add one to the result for every carry out of the word which occurred. Reinterpret the previous result on addition:

Ones complement addition is ordinary binary addition, but transfering carry out of the word directly in to the word.
Carries in 8-bit ones complement addition look like this:
       +-+   +-+   +-+   +-+   +-+   +-+   +-+
       V |   V |   V |   V |   V |   V |   V | 
  +-----+-----+-----+-----+-----+-----+-----+-----+
  |  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
  +-----+-----+-----+-----+-----+-----+-----+-----+
   |                                             ^
   +---------------------------------------------+
Two's complement would, of course, have the arrow from bit 7 to bit 0 deleted; carry out of a byte is simply lost.

So our program above performs (for suitably small buffers, up to about 128Kbytes) addition with a "circular carry" as above. But this means endiannity doesn't matter! If we regard "the right way to do it" as network order (big endian), and we're working on an Intel (little endian) box, then all our network shorts are read rotated by 8 bits. Ones complement addition doesn't care about bitwise rotations, so our result will also just be rotated by 8 bits. We store it, but in our native (little endian) ordering, so the stored result is again rotated by 8 bits, for a total mistake of 16 bits' rotation. In a 16-bit (network) short, rotating by 16 bits does nothing -- the result will be written correctly in the packet.

In the example above, the correct ones complement result for a big endian machine turns out to be "5402" -- just "0254", the correct little endian result, with its bytes swapped!

So endiannity doesn't matter for ones complement addition, if we just care about the bit patterns. Header checksums can be calculated either way with identical results. Computing a ones complement sum will always be faster than computing the two's complement sum, on a machine with the "wrong" endiannity.

Sources

  1. RFC 1071, Computing the Internet Checksum, has similar content (AND came first), but is more byte-oriented. It's truly an informational INFORMATIONAL RFC, and is well worth reading.
  2. The short words technique in the code is alluded to in RFC 1071, and is also very well-known; a Google search should turn up the code in any of a number of variants, usually without explanations.
  3. The "rotating carry bit" explanation above is very similar to what appears in RFC 1071, but is more general and (I hope) easier going. It's my very own -- although doubtless any number of people have shown it before me.