Theorem: Any comparison sort must in the worst case make at least Ω(n log n) comparisons to sort n keys.
Proof:
A sorting algorithm, viewed abstractly, takes a sequence of keys k1, k2, ... kn, and outputs the permutation that will sort them.
For a given n, we can represent the optimal algorithm as a binary decision tree: the internal nodes are comparisons of two keys, and at the leaves are the correct permutations. Sorting a particular sequence is equivalent to walking the tree from the root to a leaf. The number of comparisons in the worst case is the height of the tree.
Now, there are n! permutations of a given sequence, so the tree has n! leaves. Then its height must be at least ceil(log n!). But we know from Stirling's Formula that n! ≥ nne-n, which is equivalent to log(n!) ≥ n log n - n log e. So worst case ≥ ceil(log n!) ≥ n log n - n log e = &Omega(n log n). ∎
One way of thinking about this result is from an information theory viewpoint: there are n! possible permutations of the sequence, so the scrambled version can contain up to log(n!) bits of entropy. But by comparing two keys, we can extract at most one bit of information.
A counterexample for sorts that are not comparison-based: radix sort runs in O(n) time, if we restrict the keys to a certain interval.