Confidence Interval (idea) by Professor Pi

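A confidence interval is a notion from statistics. It is a range, calculated from a sample, that is expected to contain an unknown population parameter such as the population mean. The confidence level says how reliable the procedure is: if the sampling were repeated many times and an interval calculated each time, a 95% confidence interval would contain the true value in about 95% of the repetitions.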

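For instance, let's suppose we measured a number of unladen swallows and found an average flight speed of 15 m/s with a 95% confidence interval of ±1.5 m/s. This means that the interval from 13.5 m/s to 16.5 m/s was produced by a procedure that captures the true average speed of all swallows 95% of the time (provided they are not carrying a coconut). It does not mean that 95% of individual swallows fly between 13.5 m/s and 16.5 m/s; the spread of individual swallows is described by the standard deviation, not by the confidence interval of the mean.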
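The repeated-sampling meaning of a 95% confidence interval can be checked by simulation. The minimal sketch below assumes an invented swallow population (true mean 15 m/s, standard deviation 2 m/s) and invented experiments of 10 swallows each. For every simulated sample it builds the interval mean ± t·s/√n, using the formula given further down in this writeup and t = 2.2622 (the 95% value for 9 degrees of freedom from the table), then counts how often the interval contains the true mean.

```python
import math
import random
import statistics

TRUE_MEAN = 15.0  # hypothetical true average flight speed, m/s
TRUE_SD = 2.0     # hypothetical spread between individual swallows, m/s
N = 10            # hypothetical number of swallows measured per experiment
T_95 = 2.2622     # two-sided 95% t value for N - 1 = 9 degrees of freedom


def interval_covers_true_mean(rng: random.Random) -> bool:
    """Draw one sample, build its 95% interval, check if it holds TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    half_width = T_95 * statistics.stdev(sample) / math.sqrt(N)
    return mean - half_width <= TRUE_MEAN <= mean + half_width


def coverage(trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated experiments whose interval contains TRUE_MEAN."""
    rng = random.Random(seed)
    hits = sum(interval_covers_true_mean(rng) for _ in range(trials))
    return hits / trials


print(f"fraction of intervals containing the true mean: {coverage():.3f}")
```

The printed fraction should come out close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the long-run behaviour of the procedure.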

With experimental data, it is often impossible to measure the average of the entire population (the population mean). For example, it is impractical to measure the flight speed of every living swallow. Instead, we measure a sample and use the sample mean as an estimate of the population mean.

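There is another problem with experimental data: usually we don't know the variance of the population either. The variance describes how far values typically lie from the population mean (it is the mean of the squared deviations from it). Because we only have a sample, the population variance must also be estimated from that sample, using the sample variance s²: the sum of the squared deviations from the sample mean, divided by n - 1. Its square root, s, is the sample standard deviation.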
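A minimal sketch of the two sample estimates, assuming five invented swallow speeds. The hand-written `sample_variance` divides by n - 1; Python's standard `statistics.variance` does the same, while `statistics.pvariance` divides by n and is only appropriate when the data are the whole population.

```python
import statistics


def sample_variance(values: list[float]) -> float:
    """Estimate of the population variance from a sample: divide by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


speeds = [14.0, 15.5, 16.0, 13.5, 15.0]  # hypothetical swallow speeds, m/s

print(statistics.mean(speeds))       # sample mean, estimates the population mean
print(sample_variance(speeds))       # s^2, divides by n - 1
print(statistics.variance(speeds))   # standard-library equivalent of s^2
print(statistics.pvariance(speeds))  # divides by n: treats speeds as the whole population
```

The n - 1 divisor compensates for the fact that deviations are measured from the sample mean, which is itself fitted to the same data.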

In many experiments, repeated measurements of the same quantity are approximately normally distributed. For instance, if we measure the time it takes a ball to drop from 10 m, and remeasure it a great many times, the measured values will cluster around 1.43 s (the free-fall time t = √(2h/g) with g ≈ 9.81 m/s², ignoring air resistance). Most of the values will be close to the mean and fewer will be far off. If we plot a histogram of the measured times, showing how often each time was observed, it will resemble the typical bell curve described by the normal distribution.
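The ball-drop example can be sketched as follows. The 1.43 s figure follows from h = g·t²/2; the stopwatch error is an assumption (normally distributed with a standard deviation of 0.02 s, a made-up value), and the text histogram stands in for the bell-curve plot described above.

```python
import math
import random
import statistics

G = 9.81  # m/s^2, standard gravity (rounded)


def drop_time(height_m: float) -> float:
    """Free-fall time from rest, ignoring air resistance: h = g * t^2 / 2."""
    return math.sqrt(2.0 * height_m / G)


def simulate_timings(height_m: float, timing_sd: float, n: int, seed: int = 0) -> list[float]:
    """Hypothetical stopwatch readings: true fall time plus normal timing error."""
    rng = random.Random(seed)
    true_t = drop_time(height_m)
    return [rng.gauss(true_t, timing_sd) for _ in range(n)]


def text_histogram(values: list[float], bin_width: float) -> None:
    """Print one row of '#' per bin; the fullest bin gets 50 characters."""
    counts: dict[int, int] = {}
    for v in values:
        b = math.floor(v / bin_width)
        counts[b] = counts.get(b, 0) + 1
    scale = max(counts.values()) / 50
    for b in range(min(counts), max(counts) + 1):
        c = counts.get(b, 0)
        print(f"{b * bin_width:5.2f} s | {'#' * round(c / scale)}")


times = simulate_timings(10.0, timing_sd=0.02, n=5000)
print(f"predicted: {drop_time(10.0):.2f} s, sample mean: {statistics.mean(times):.3f} s")
text_histogram(times, bin_width=0.01)
```

The bars are tallest near 1.43 s and thin out symmetrically on both sides, which is the bell shape of the normal distribution.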

However, as mentioned above, we cannot know the true variance unless we have measured the entire population, so we have to use the sample standard deviation s instead. This adds extra uncertainty, especially for small samples. This is where William S. Gosset, better known by his pen name "Student", comes to the rescue. Student's t-distribution corrects the normal distribution for this extra uncertainty, based on the number of measurements taken from the population.

Values of the t-distribution can be looked up in a book of mathematical tables. You will need the two-sided (two-tailed) values. They are listed for a certain P-value, corresponding to the required confidence level (for P = 0.05, the confidence level is 1 - 0.05 = 0.95 = 95%). For each confidence level, the values are listed for an increasing number of degrees of freedom. A sample of n measurements has n - 1 degrees of freedom, because one degree of freedom is used up in calculating the sample mean. Frequently tabulated confidence levels are 99%, 95%, and 50%:

      Two-sided confidence level
 DF       99%       95%       50%
  1    63.656    12.706    1.0000
  2    9.9250    4.3027    0.8165
  3    5.8408    3.1824    0.7649
  4    4.6041    2.7765    0.7407
  5    4.0321    2.5706    0.7267
  6    3.7074    2.4469    0.7176
  7    3.4995    2.3646    0.7111
  8    3.3554    2.3060    0.7064
  9    3.2498    2.2622    0.7027
 10    3.1693    2.2281    0.6998
 15    2.9467    2.1315    0.6912
 20    2.8453    2.0860    0.6870
 40    2.7045    2.0211    0.6807
100    2.6259    1.9840    0.6770
  ∞    2.5758    1.9600    0.6745

DF = degrees of freedom. Values calculated using the TINV function in Excel.
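Without Excel or a printed table, the finite-DF rows can be reproduced in a few lines. This sketch swaps in a different technique from TINV: it uses the classical closed-form series for the two-sided probability P(-t < T < t) of Student's t with an integer number of degrees of freedom (one series for odd DF, one for even DF), and inverts it by bisection. The function names are illustrative, and the ∞ row (the normal distribution) is not covered.

```python
import math


def central_prob(t: float, df: int) -> float:
    """Two-sided probability P(-t < T < t) for Student's t, integer df >= 1."""
    theta = math.atan(t / math.sqrt(df))
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    cos2 = cos_t * cos_t
    if df == 1:
        return 2.0 * theta / math.pi  # one DF: the Cauchy distribution
    if df % 2 == 1:
        # odd df: (2/pi) * [theta + sin * (cos + (2/3)cos^3 + (2*4)/(3*5)cos^5 + ...)]
        term = total = cos_t
        for k in range(1, (df - 1) // 2):
            term *= (2 * k) / (2 * k + 1) * cos2
            total += term
        return 2.0 / math.pi * (theta + sin_t * total)
    # even df: sin * (1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...)
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos2
        total += term
    return sin_t * total


def t_critical(confidence: float, df: int) -> float:
    """Two-sided t value: the t with P(-t < T < t) = confidence, by bisection."""
    lo, hi = 0.0, 1.0
    while central_prob(hi, df) < confidence:  # widen the bracket until it holds the root
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if central_prob(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(" DF       99%       95%       50%")
for df in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 40, 100):
    row = "".join(f"{t_critical(c, df):10.4f}" for c in (0.99, 0.95, 0.50))
    print(f"{df:>3}{row}")
```

Bisection is used because the two-sided probability grows steadily with t, so a simple bracketing search is enough; a library routine such as an inverse t CDF would do the same job.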

As the number of degrees of freedom grows, the t values approach those of the normal distribution (the ∞ row). Assuming that the measured population is normally distributed, the half-width δq of the confidence interval for a mean q can now be calculated with:

δq = t·s / √n

where t is the value of the t-distribution for n - 1 degrees of freedom at the chosen confidence level, s is the sample standard deviation, and n is the number of measurements in the sample. The factor s / √n is the standard error of the mean, and the confidence interval itself is q ± δq.
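The formula translates directly into a short function. In this sketch the t values are copied from the 95% column of the table, so `confidence_interval_95` (a hypothetical name) only handles sample sizes whose degrees of freedom appear in the table and deliberately raises a `KeyError` instead of interpolating between rows. The five ball-drop timings in the usage line are invented.

```python
import math
import statistics

# Two-sided 95% t values from the table, keyed by degrees of freedom.
T_95 = {1: 12.706, 2: 4.3027, 3: 3.1824, 4: 2.7765, 5: 2.5706, 6: 2.4469,
        7: 2.3646, 8: 2.3060, 9: 2.2622, 10: 2.2281, 15: 2.1315, 20: 2.0860,
        40: 2.0211, 100: 1.9840}


def confidence_interval_95(data: list[float]) -> tuple[float, float]:
    """Return (sample mean q, half-width dq) using dq = t * s / sqrt(n)."""
    n = len(data)
    df = n - 1
    if df not in T_95:
        raise KeyError(f"no tabulated 95% t value for {df} degrees of freedom")
    s = statistics.stdev(data)  # sample standard deviation, n - 1 denominator
    return statistics.mean(data), T_95[df] * s / math.sqrt(n)


timings = [1.41, 1.45, 1.43, 1.40, 1.44]  # hypothetical ball-drop times, s
mean, dq = confidence_interval_95(timings)
print(f"{mean:.3f} ± {dq:.3f} s (95%)")
```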

As an example, let's estimate the average number of users online as reported by the Everything Snapshot. The numbers of users online over one week were:

17 December 2000: 42
18 December 2000: 58
19 December 2000: 48
20 December 2000: 45
21 December 2000: 57
22 December 2000: 43
23 December 2000: 41

The sample mean is 47.7 users and the sample standard deviation s is 7.06433. The two-sided 95% t value for 6 degrees of freedom (7 measurements minus one) is 2.4469. Thus, the half-width of the 95% confidence interval is:

δq = 2.4469 × 7.06433 / √7 ≈ 6.5

Therefore, the average number of users online, as reported by the Everything Snapshot that week, is 47.7 ± 6.5 users at the 95% confidence level.
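The arithmetic of this example can be double-checked with Python's standard `statistics` module. The only value not computed from the data is t = 2.4469, taken from the 95% column of the table for 6 degrees of freedom.

```python
import math
import statistics

users = [42, 58, 48, 45, 57, 43, 41]  # users online, 17-23 December 2000
n = len(users)
mean = statistics.mean(users)   # 334 / 7
s = statistics.stdev(users)     # sample standard deviation, n - 1 denominator
t = 2.4469                      # two-sided 95% t value, 6 degrees of freedom
half_width = t * s / math.sqrt(n)
print(f"{mean:.1f} ± {half_width:.1f} users")  # prints "47.7 ± 6.5 users"
```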
