Zipf's Law is a power law function that has found many applications in the physical, biological, and behavioral sciences. Popularized by George Zipf, it is also known as the Rank-Size Rule, the Pareto-Zipf Law, the Pareto-Estoup-Zipf Law, and Zipf's Curve.
Zipf worked with language and text. He would take a book and count the number of times each word was used, then rank the words from most common to least common. From the results he formulated Zipf's Law:
r × f = C
where r is the word's rank,
f is its frequency (the number of times it occurs in the text), and
C is a constant that depends on the text being analyzed.
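The counting and ranking procedure described above can be sketched in a few lines of Python. This is only an illustration, not Zipf's own method: the function name rank_frequency and the rule for what counts as a word (a run of letters or apostrophes, compared case-insensitively) are assumptions made for the example.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most common word first."""
    # Assumption: a "word" is a run of letters or apostrophes, ignoring case.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    rows = []
    # most_common() sorts by descending count, so enumerate() yields the rank r.
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, freq, rank * freq))
    return rows

sample = "the cat and the dog and the bird"
for row in rank_frequency(sample):
    print(row)
```

On a sample this small the product r × f will not look constant; the pattern only shows up in long texts such as a whole book.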
In English text, C tends to be about N/10, where N is the number of words in the text.
So, for a text with 200,000 words, C is about 20,000, and we would expect to see the most common word about 20,000 times, the second-ranked word 10,000 times, the third 6,667 times (20,000 / 1, 20,000 / 2, 20,000 / 3, ...), and the 50th most common word 400 times (20,000 / 50).
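The arithmetic in this example can be reproduced with a short Python sketch. It assumes the rule of thumb stated above for English text (C is about N/10); the function name expected_count is hypothetical and used only for illustration.

```python
def expected_count(total_words, rank):
    """Estimate how often the word at a given rank occurs, using f = C / r."""
    # Assumption: the English-text rule of thumb, C is about N / 10.
    c = total_words / 10
    return c / rank

for rank in (1, 2, 3, 50):
    print(rank, round(expected_count(200_000, rank)))
```

For ranks 1, 2, 3, and 50 this gives 20000, 10000, 6667, and 400, matching the figures in the text.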
A generalization from this can be made: a few things happen a lot, a bunch of things happen fairly often, and a lot of things rarely happen at all.