Counting 1 bits (thing) by call

The algorithm in bis' writeup above can be further optimised as the result of a couple of observations

The size of the sum is proportional to the log₂ of the number of bits being counted. This sum is continuously stored in an n-bit space within the 32-bit integer range.
On most architectures simple 8-bit constants are easier to construct than 32- or 16-bit ones.

On this basis, the final stage in bis's code:

i = (i&MASK16) + (i>>16&MASK16);

Can be reduced to:

i = (i + (i>>16))&MASK6;

Where MASK6 is (1<<6)-1 or 0x1F, and eliminates the need for MASK16.

Typically this will save two operations: one to set up the larger constant, and one AND operation.

A small optimsation, but useful if you're relying on your compiler to generate your code, in tight timing constraints, and with no hardware population count instruction.

counting 1 bits SPOILER	hard interview questions	counting 1 bits C implementations	INTERCAL
Compute Fibonacci numbers FAST!	bitfrobber	A dead wish for uniformity	bit reversal function
Power of Two solution	counting bits is deceptive	Zero Population Count	bitcount
lookup table	New Testament	The Old Grey Whistle Test	Logic gate
Reversing a linked list	Interviewer	The C Programming Language	bit twiddling
Duff's device	constant	conventional memory