Counting 1 bits (idea) by Neodymion2

You can optimise bis's solution even further. The first two lines are the same:

i = (i & 0x55555555) + (i>>1 & 0x55555555);
i = (i & 0x33333333) + (i>>2 & 0x33333333);

Now at this point we have 8 4-bit counters and wish to add them in pairs to get 4 8-bit counters. The maximum number in each counter will be 8, because it's the sum of 8 bits. The trick is that the number 8 fits into 4 bits, which means we can do the mask operation after the add. This is the same as call's optimization, except we're doing it earlier.

i = i + (i >> 4) & 0x0F0F0F0F

We now have our 4 8-bit counters. Here's the clever part: We can add these together in a single operation.

i = i * 0x01010101

How does this work? Imagine the multiplication done by hand.

        01020304
      x 01010101
----------------
        01020304
      0102030400
    010203040000
  01020304000000
----------------
000103060a090704

Notice that the A is the sum of the four digits. Actually, I lied a bit, we still have to shift.

i = i >> 24

the whole thing:

i = (i & 0x55555555) + (i>>1 & 0x55555555);
i = (i & 0x33333333) + (i>>2 & 0x33333333);
i = i + (i >> 4) & 0x0F0F0F0F
i = (i * 0x01010101) >> 24

64 bit version:

i = (i & 0x5555555555555555) + (i>>1 & 0x5555555555555555);
i = (i & 0x3333333333333333) + (i>>2 & 0x3333333333333333);
i = (i & 0x0F0F0F0F0F0F0F0F) + (i>>4 & 0x0F0F0F0F0F0F0F0F);
i = (i * 0x0101010101010101) >> 56

I didn't invent this - search for bit population count on Google.

counting 1 bits SPOILER	hard interview questions	counting 1 bits C implementations	INTERCAL
Compute Fibonacci numbers FAST!	bitfrobber	A dead wish for uniformity	bit reversal function
Power of Two solution	counting bits is deceptive	Zero Population Count	bitcount
lookup table	New Testament	The Old Grey Whistle Test	Logic gate
Reversing a linked list	Interviewer	The C Programming Language	bit twiddling
Duff's device	constant	conventional memory