WARNING: Lots of HTML math ahead!

Foreword

There are several specific results known as central limit theorems, each sometimes referred to as "the" central limit theorem. Here we will focus on one particular version, which applies to scaled sums of independent but not necessarily identically distributed random variables.

A word on notation: Here we will use the notation E(x) to denote the expectation value of a random variable x. There are other conventions in common use, including angle brackets ⟨x⟩. The symbol i will be used for the imaginary unit, while j and n will be used for counting indices.

Theorem

Consider a set {x_j}, j = 1, ..., N of N independent random variables with expectations E(x_j) = μ_j and variances E(x_j^2) - E(x_j)^2 = σ_j^2, where the σ_j are real and finite. (A specific additional condition on the σ_j will be discussed later.) Let σ = (Σ_j σ_j^2)^{1/2} and define a new variable z = Σ_j (x_j - μ_j)/σ as the (scaled and shifted) sum of the x_j. Then as N → ∞ the distribution of z approaches normal, i.e. p(z) = (2π)^{-1/2} exp[-z^2/2], where p(z) is the density function of z.
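
Before diving into the proof, a quick numerical illustration may help. The sketch below is a demo rather than part of the argument: NumPy/SciPy, the mix of exponential and uniform variables, and the sample sizes are all arbitrary assumptions. It draws many samples of z for a moderately large N and checks that they look standard normal:

    import numpy as np
    from scipy.stats import kstest

    rng = np.random.default_rng(0)
    N = 200          # number of independent variables x_j
    trials = 20_000  # number of samples of z

    # Alternate exponential(scale 2) and uniform(0,1) variables, so the x_j
    # are independent but not identically distributed.
    even = np.arange(N) % 2 == 0
    mus = np.where(even, 2.0, 0.5)          # mu_j
    sig2 = np.where(even, 4.0, 1.0 / 12.0)  # sigma_j^2
    sigma = np.sqrt(sig2.sum())

    x = np.where(even,
                 rng.exponential(2.0, size=(trials, N)),
                 rng.uniform(0.0, 1.0, size=(trials, N)))
    z = ((x - mus) / sigma).sum(axis=1)

    print(z.mean(), z.var())  # should be near 0 and 1
    print(kstest(z, 'norm'))  # small KS statistic: close to standard normal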


Preliminary Definitions

The characteristic function Φ(k) for a variable x is defined as

Φ(k) = E(exp[ikx]) = ∫ exp[ikx] p(x) dx

This is a calculational device for finding the moments E(x), E(x^2), etc. as

Φ^(m)(0) = i^m E(x^m)

where Φ^(m)(k) represents the mth derivative of Φ(k). If we can write these moments as derivatives of Φ(k), we can also do the reverse and write Φ(k) as a Taylor series:

Φ(k) = Σ_n E(x^n) (ik)^n/n!
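
As a quick check of these two formulas, the following sketch (a demonstration under assumed tooling: SymPy, with the standard normal as the test case, since its characteristic function exp[-k^2/2] and its moments are both known in closed form) confirms Φ^(m)(0) = i^m E(x^m) for the first few m:

    import sympy as sp

    k, x = sp.symbols('k x', real=True)
    p = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)  # standard normal density
    Phi = sp.exp(-k**2 / 2)                     # its characteristic function

    for m in range(5):
        lhs = sp.diff(Phi, k, m).subs(k, 0)                         # Phi^(m)(0)
        rhs = sp.I**m * sp.integrate(x**m * p, (x, -sp.oo, sp.oo))  # i^m E(x^m)
        print(m, lhs, rhs)  # the two columns agree: 1,1  0,0  -1,-1  0,0  3,3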

The logarithm of the characteristic function is known as the cumulant generating function, defined as

Ψ(k) = ln[Φ(k)] = Σ_n C_n (ik)^n/n!

where the C_n, known as cumulants, are polynomials in the moments E(x), E(x^2), etc. Of special note are C_1 = E(x) and C_2 = E(x^2) - E(x)^2 = σ^2. Note that C_0 = Ψ(0) = ln[Φ(0)] = ln(1) = 0 always, so this term is generally ignored.
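
To see cumulants in action, here is a small sketch in the same spirit (the Poisson distribution with rate λ is an assumed test case; its characteristic function exp[λ(e^{ik} - 1)] is standard, and all of its cumulants turn out to equal λ):

    import sympy as sp

    k = sp.symbols('k', real=True)
    lam = sp.symbols('lambda', positive=True)
    Phi = sp.exp(lam * (sp.exp(sp.I * k) - 1))  # Poisson characteristic function
    Psi = sp.log(Phi)                           # cumulant generating function

    # C_n is the coefficient of (ik)^n/n!, i.e. C_n = Psi^(n)(0)/i^n:
    for n in range(1, 5):
        Cn = sp.simplify(sp.diff(Psi, k, n).subs(k, 0) / sp.I**n)
        print(n, Cn)  # prints lambda each time: every Poisson cumulant is lambda

In particular C_1 = C_2 = λ recovers the familiar fact that a Poisson variable's mean and variance coincide.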


Proof

Let Φ_z(k) and Φ_j(k) denote the characteristic functions of z and of the x_j. Then

Φ_z(k) = E(exp[ikz]) = E(exp[ik Σ_j (x_j - μ_j)/σ]) = E(Π_j exp[ik(x_j - μ_j)/σ]) = E(Π_j exp[ikx_j/σ] exp[-ikμ_j/σ])

As the x_j are independent, the expectation of the product factors into the product of the expectations; the exponential in μ_j, being a constant, can also be moved outside the expectation. This results in

Φ_z(k) = Π_j E(exp[ikx_j/σ]) exp[-ikμ_j/σ] = Π_j Φ_j(k/σ) exp[-ikμ_j/σ]
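
This factorization can be sanity-checked numerically. The sketch below assumes, purely for illustration, that the x_j are independent exponentials with scales θ_j, whose characteristic function Φ_j(k) = 1/(1 - ikθ_j) is known in closed form; it compares a Monte Carlo estimate of E(exp[ikz]) against the product formula at one arbitrary test point k:

    import numpy as np

    rng = np.random.default_rng(1)
    thetas = np.array([0.5, 1.0, 1.5, 2.0, 2.5])  # scales: mu_j = theta_j
    mus, sig2 = thetas, thetas**2                 # sigma_j^2 = theta_j^2
    sigma = np.sqrt(sig2.sum())

    x = rng.exponential(thetas, size=(200_000, len(thetas)))
    z = ((x - mus) / sigma).sum(axis=1)

    k = 1.3  # an arbitrary test point
    lhs = np.mean(np.exp(1j * k * z))  # Monte Carlo estimate of E(exp[ikz])
    rhs = np.prod(np.exp(-1j * k * mus / sigma) / (1 - 1j * k * thetas / sigma))
    print(lhs, rhs)  # should agree to Monte Carlo accuracy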

Now we take the logarithm to change the characteristic functions into cumulant generating functions:

Ψ_z(k) = Σ_j [Ψ_j(k/σ) - ikμ_j/σ]

Substituting the Taylor expansions,

Σ_n C_{z,n} (ik)^n/n! = Σ_j [Σ_n C_{j,n} (ik/σ)^n/n! - ikμ_j/σ]

Coefficients of like powers of k must be equal on both sides, so we can solve for the C_{z,n}. As C_{j,1} = μ_j and C_{j,2} = σ_j^2 we find

C_{z,1} = Σ_j (C_{j,1}/σ - μ_j/σ) = Σ_j (μ_j/σ - μ_j/σ) = 0
C_{z,2} = Σ_j C_{j,2}/σ^2 = (Σ_j σ_j^2)/(Σ_j σ_j^2) = 1

Now consider the higher terms. Matching coefficients as before gives C_{z,n} = Σ_j C_{j,n}/σ^n for n > 2, while σ^2 is a sum of N finite σ_j^2. The limit is easiest to see under the simplifying assumption that the x_j have equal variance, i.e. that the σ_j are all equal: then σ^n grows as N^{n/2}, so C_{z,n} is a sum of N finite terms divided by N^{n/2}, scales as N^{1-n/2}, and vanishes as N → ∞ for n > 2. There are also several weaker sufficient restrictions which we can impose on the distribution of the x_j, including the Lyapunov, Lindeberg, and Feller-Lévy conditions; the study and proof of these variants is left to the interested reader. In all cases we find that C_{z,n} → 0 for n > 2.
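
For a concrete look at this decay, the sketch below uses an assumed i.i.d. special case (exponential x_j with μ_j = σ_j = 1, so σ = √N), for which C_{z,3} = 2/√N; scipy.stats.skew estimates exactly this standardized third cumulant:

    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(2)
    for N in (16, 64, 256):
        x = rng.exponential(1.0, size=(50_000, N))
        z = (x - 1.0).sum(axis=1) / np.sqrt(N)  # mu_j = sigma_j = 1 => sigma = sqrt(N)
        print(N, skew(z), 2 / np.sqrt(N))       # measured vs predicted C_{z,3}

With every cumulant beyond the second vanishing in the limit, the cumulant generating function reduces to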

Ψ_z(k) = (ik)^2/2! = -k^2/2

and

Φ_z(k) = exp[-k^2/2]

This is the characteristic function of a standard normal distribution; we can verify this by performing an inverse Fourier transform to recover p(z):

p(z) = (2π)^{-1} ∫ exp[-ikz] Φ_z(k) dk = (2π)^{-1} ∫ exp[-ikz - k^2/2] dk = (2π)^{-1/2} exp[-z^2/2]

Thus z converges to the standard normal distribution, as desired.
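
As a final numerical sanity check, the inverse transform can be carried out by quadrature. This sketch (using scipy.integrate.quad; the imaginary part of the integrand is odd in k and integrates to zero, so only the cosine part is kept) recovers the normal density from Φ_z(k) = exp[-k^2/2] at a few points:

    import numpy as np
    from scipy.integrate import quad

    def p_numeric(z):
        # (2*pi)^-1 times the integral of cos(kz) exp(-k^2/2) over the real line
        val, _ = quad(lambda k: np.cos(k * z) * np.exp(-k**2 / 2), -np.inf, np.inf)
        return val / (2 * np.pi)

    for z in (0.0, 1.0, 2.0):
        exact = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
        print(z, p_numeric(z), exact)  # the two columns should match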