**Theorem:** Any comparison sort must make Ω(*n* log *n*) comparisons in the worst case to sort *n* keys.

**Proof:**

A sorting algorithm, viewed abstractly, takes a sequence of keys *k*_{1}, *k*_{2}, …, *k*_{n} and outputs the permutation that sorts them.
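Concretely, that output can be computed as an argsort of the input; a minimal Python sketch (the helper name `sorting_permutation` is illustrative):

```python
def sorting_permutation(keys):
    """Return the permutation p such that [keys[i] for i in p] is sorted."""
    return sorted(range(len(keys)), key=lambda i: keys[i])

print(sorting_permutation([30, 10, 20]))  # [1, 2, 0]
```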

For a given *n*, we can represent any comparison sort as a binary decision tree: the internal nodes are comparisons of two keys, and the leaves are the output permutations. Sorting a particular sequence is equivalent to walking the tree from the root to a leaf, and the number of comparisons in the worst case is the height of the tree.
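For instance, here is such a decision tree for *n* = 3, written out by hand (a sketch only; a real algorithm induces the tree implicitly rather than storing it):

```python
def sort3(a, b, c):
    """Explicit decision tree for n = 3: each `if` is an internal
    (comparison) node, each returned tuple a leaf permutation."""
    if a <= b:
        if b <= c:
            return (a, b, c)   # a <= b <= c, 2 comparisons
        elif a <= c:
            return (a, c, b)   # a <= c < b, 3 comparisons
        else:
            return (c, a, b)   # c < a <= b, 3 comparisons
    else:
        if a <= c:
            return (b, a, c)   # b < a <= c, 2 comparisons
        elif b <= c:
            return (b, c, a)   # b <= c < a, 3 comparisons
        else:
            return (c, b, a)   # c < b < a, 3 comparisons
```

The tree has 3! = 6 leaves, one per permutation, and no path makes more than ceil(log 3!) = 3 comparisons, matching the bound below exactly.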

Now, there are *n*! permutations of a given sequence, and each must label some leaf, or the algorithm would output the wrong permutation for at least one input. So the tree has at least *n*! leaves. Since a binary tree of height *h* has at most 2^{*h*} leaves, the height must be at least ceil(log *n*!) (all logarithms here are base 2). But we know from Stirling's formula that *n*! ≥ (*n*/e)^{*n*} = *n*^{*n*}e^{-*n*}, which is equivalent to log(*n*!) ≥ *n* log *n* - *n* log e. So *worst case* ≥ ceil(log *n*!) ≥ *n* log *n* - *n* log e = Ω(*n* log *n*). ∎
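As a quick numerical check of these two inequalities (Python with exact factorials; an illustration, not part of the proof):

```python
import math

for n in (4, 16, 64, 256):
    exact = math.ceil(math.log2(math.factorial(n)))   # ceil(log n!)
    lower = n * math.log2(n) - n * math.log2(math.e)  # n log n - n log e
    print(f"n={n:4d}  ceil(log n!)={exact:6d}  n log n - n log e={lower:9.1f}")
```

The Stirling bound is loose for small *n*, but both quantities grow as Θ(*n* log *n*).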

One way of thinking about this result is from an information theory viewpoint: there are *n*! possible permutations of the sequence, so the scrambled version can carry up to log(*n*!) bits of entropy. But a comparison of two keys has only two outcomes, so each comparison extracts at most one bit of information; hence at least log(*n*!) comparisons are needed in the worst case.
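One can watch this budget being spent by counting the comparisons a real algorithm makes. A sketch using a counting wrapper around the keys (the class name `Counted` is illustrative; `sorted` here is CPython's Timsort):

```python
import functools, math, random

@functools.total_ordering
class Counted:
    """Key wrapper that counts every comparison made through it."""
    comparisons = 0
    def __init__(self, key):
        self.key = key
    def __eq__(self, other):
        Counted.comparisons += 1
        return self.key == other.key
    def __lt__(self, other):
        Counted.comparisons += 1
        return self.key < other.key

n = 1000
keys = [Counted(k) for k in random.sample(range(10**6), n)]
sorted(keys)
print("comparisons made:", Counted.comparisons)                      # varies per run, near n log n
print("entropy bound:   ", math.ceil(math.log2(math.factorial(n)))) # 8530 for n = 1000
```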

A counterexample for sorts that are not comparison-based: radix sort runs in O(*n*) time if the keys are integers of a fixed width, since it never compares keys against each other.
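A minimal LSD radix sort sketch, assuming non-negative integer keys below 2^{16} (the `key_bits` parameter is illustrative):

```python
def radix_sort(keys, key_bits=16):
    """LSD radix sort for non-negative integers of at most key_bits bits:
    one stable bucketing pass per 8-bit digit, so O(n) once key_bits is fixed."""
    for shift in range(0, key_bits, 8):
        buckets = [[] for _ in range(256)]
        for key in keys:                       # distribute by current digit, O(n)
            buckets[(key >> shift) & 0xFF].append(key)
        keys = [key for bucket in buckets for key in bucket]  # stable gather
    return keys

print(radix_sort([530, 12, 45001, 7, 530]))  # [7, 12, 530, 530, 45001]
```

No two keys are ever compared with each other, so the lower bound above simply does not apply.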