AKA the Normal distribution. The statistical distribution. Its importance flows from the fact that:
  • Any sum of independent Normal distributed variables is itself a Normal distributed variable
  • Sums of many independent variables that, individually, are not Normal distributed tend to become Normal distributed (asymptotically, provided they have finite variance)
You won't find many stochastic variables on this planet that are not Gaussian by nature.
    Examples
  • The number of single girls in a bar (when measuring e.g. every day at noon in the same bar)
  • The number of cars passing a point on the highway (go ahead: spend an hour a day, 1000 days in a row, and watch the distribution curve smooth out until it looks more and more Gaussian)
  • The height of Japanese people
  • The number of bytes/links/images on a homepage (this one would be easy to check)
Let Z be a Gaussian distributed stochastic variable with mean=0 and standard deviation=1.
    Interesting values of Z follow:
  • Prob(|Z|>=1) ≈ 0.3173105
  • Prob(|Z|>=1.96) ≈ 0.0499958
  • Prob(|Z|>=3.29055) ≈ 0.0010000
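The three values in the list can be reproduced with the complementary error function. The following is a minimal sketch; the function name `two_sided_tail` is made up for illustration, and it relies on the identity Prob(|Z| >= z) = erfc(z / sqrt(2)) for a standard normal Z.

```python
import math

def two_sided_tail(z: float) -> float:
    """Return Prob(|Z| >= z) for a standard normal Z (mean 0, std dev 1)."""
    # For the standard normal, Prob(|Z| >= z) = 1 - erf(z / sqrt(2)) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

# Print the two-sided tail probability for the three thresholds in the list.
for z in (1.0, 1.96, 3.29055):
    print(f"Prob(|Z| >= {z}) = {two_sided_tail(z):.7f}")
```

Halving any of these gives the one-sided probability, since the curve is symmetric around 0.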
For the reader who is not so much into mathematics:
The small list shows that the probability of finding a value in the data set that lies more than 3.29055 standard deviations away from the mean is 1 in a thousand. So - if the cars on the highway are doing 50 on average with a standard deviation of 10, only one car in a thousand will do more than 50+10*3.29055, which is about 83, or less than 50-10*3.29055, which is about 17. (Or to use the first entry in the list: The chance that the number of single girls in a bar is more than one standard deviation above the usual number is 31.7%/2, or about 15.9% - go push your luck!) Well folks - that's all for now. Thanks for letting me use this place as a test stage for my thesis, where I'm actually discussing small uninteresting matters like this (focusing a little less on single girls, though)


And to ariels - yes - you're absolutely right. You'd also never find a car going faster than the speed of light, even though it SHOULD happen from time to time if the velocities were truly Gaussian distributed. Forgive my engineer-geekish way of looking at things (e.g. 0.98 is not close to 1, it IS 1)
All of the examples above are of non-negative quantities, but the Gaussian distribution is unbounded, and in particular always attains negative values with non-zero probability! So all the examples are wrong, at least in the strict sense.

Whether the Gaussian or Poisson distribution is more common depends on what, exactly, you measure. But it is true that many naturally occurring random variables are approximately Gaussian. This is a consequence of the Central Limit Theorem alluded to above: the sum of N iid random variables (which have finite variance, if you must get technical), once centred on its mean and scaled by sqrt(N), converges in distribution to a Gaussian variable. So if you look at people's heights, they're not normally distributed (since they're always positive). But (presumably due to some underlying stochastic process) height can be modelled with reasonable accuracy as a sum of iid random variables; this, in turn, may be approximated by a normal distribution.
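A small simulation can make the Central Limit Theorem statement concrete. This is only a sketch under assumptions of my own choosing: the summands are Uniform(0, 1) draws, the seed and trial count are arbitrary, and the helper names are hypothetical. It measures how often the standardized sum lands at least one standard deviation from its mean, which for a true Gaussian would be about 0.317.

```python
import math
import random

def standardized_sum(n: int, rng: random.Random) -> float:
    """Sum n iid Uniform(0, 1) draws, then centre and scale to mean 0, std dev 1."""
    total = sum(rng.random() for _ in range(n))
    mean = n * 0.5              # each Uniform(0, 1) draw has mean 1/2
    std = math.sqrt(n / 12.0)   # and variance 1/12, so the sum has variance n/12
    return (total - mean) / std

def tail_fraction(n: int, trials: int, seed: int = 0) -> float:
    """Fraction of standardized sums whose absolute value is at least 1."""
    rng = random.Random(seed)   # fixed seed (an arbitrary choice) for repeatability
    hits = sum(1 for _ in range(trials) if abs(standardized_sum(n, rng)) >= 1.0)
    return hits / trials

# A single uniform variable is far from Gaussian; a sum of twelve is already close.
for n in (1, 2, 12):
    print(n, tail_fraction(n, trials=20000))
```

For n = 1 the fraction sits near 0.42, which is the value for a single uniform variable, and it moves towards the Gaussian figure as n grows. The simulation says nothing about whether any real-world quantity actually is such a sum.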

Just don't confuse the pretty mathematical model with what really goes on.


Engineers, Physicists, Statisticians, Computer Scientists, Astronomers, and all the others! Hmmph! I don't know why we allow them to use Mathematics, I really don't...

                                 THE
                                NORMAL
                             LAW OF ERROR
                           STANDS OUT IN THE
                         EXPERIENCE OF MANKIND
                        AS ONE OF  THE BROADEST
                       GENERALIZATIONS OF NATURAL
                     PHILOSOPHY . IT SERVES AS THE
                   GUIDING INSTRUMENT IN RESEARCHES
                IN THE PHYSICAL AND SOCIAL SCIENCES AND
               IN MEDICINE AGRICULTURE AND ENGINEERING .
          IT IS AN INDISPENSABLE TOOL FOR THE ANALYSIS AND THE
INTERPRETATION OF THE BASIC DATA OBTAINED BY OBSERVATION AND EXPERIMENT

--W.J. Youden

This is the normal density curve defined by a mean of 0 and standard deviation of 1, also called the standardized normal curve. A formula for generating the density curve is:
P=(e^(-(x^2)/2))/sqrt(2*pi)
To determine the probability that a value drawn from normally distributed data is at most x, integrate this function from -infinity to (x-mu)/sigma, where mu is the mean and sigma is the standard deviation.
A mainstay of statistics, this curve is symmetric around a single mode (which also happens to be the mean). It has inflection points at one standard deviation to either side of the mean. (Thanks to jt for pointing out that differentiating twice gives exactly these two points.)
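The integration recipe can be sketched numerically as follows. The function names are hypothetical, the lower limit of -10 is an assumed stand-in for -infinity (the density below it is negligible), and the trapezoidal rule is just one simple choice of integrator. The closed form based on `math.erf` is included as a cross-check.

```python
import math

def standard_density(t: float) -> float:
    """Standard normal density: e^(-t^2/2) / sqrt(2*pi)."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def prob_at_most(x: float, mu: float, sigma: float, steps: int = 20000) -> float:
    """Approximate Prob(X <= x) by integrating the standard density up to (x - mu) / sigma."""
    upper = (x - mu) / sigma
    lower = -10.0  # assumed cutoff standing in for -infinity
    if upper <= lower:
        return 0.0
    h = (upper - lower) / steps
    # Composite trapezoidal rule over [lower, upper].
    total = 0.5 * (standard_density(lower) + standard_density(upper))
    for i in range(1, steps):
        total += standard_density(lower + i * h)
    return total * h

def prob_at_most_closed_form(x: float, mu: float, sigma: float) -> float:
    """Same probability via the error function, used here only as a cross-check."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Mean 50 and standard deviation 10 are illustrative values only.
print(prob_at_most(60.0, 50.0, 10.0), prob_at_most_closed_form(60.0, 50.0, 10.0))
```

The two results should agree to many decimal places; in practice the closed form (or a printed table) is what one would actually use.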
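The location of the inflection points can be checked by differentiating the standardized density twice. Writing f for the density given above:

```latex
\begin{aligned}
f(x)   &= \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2} \\
f'(x)  &= -x\, f(x) \\
f''(x) &= -f(x) - x\, f'(x) = (x^{2} - 1)\, f(x)
\end{aligned}
```

Since f(x) is never zero, f''(x) vanishes only at x = -1 and x = +1, and it changes sign at both points. For a curve with mean mu and standard deviation sigma this corresponds to mu - sigma and mu + sigma, that is, exactly one standard deviation to either side of the mean.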

The curve is described by the following equation: y = (1 / sqrt(2 * pi * sigma^2)) * e^(-(x - a)^2 / (2 * sigma^2))

...where a is the mean and sigma is the standard deviation. Also known as the Gaussian curve, the normal curve, the bell curve, or the Laplace-Gaussian curve. Karl Pearson is apparently the person responsible for the term normal, which he coined in order to avoid a naming dispute, but which he apparently came to regret since it incorrectly implies that all other distributions of data are somehow abnormal.


What does this mean to you?

Gaussian curves appear all over the place. IQ is assumed to follow a normal curve, with 100 being the mean (average), and half of the population falling above the mean, half below. Test scores for well-defined tests often fall into this shape. A lot of science, especially social science, tends to assume that data fits this pattern and chooses the statistical tests to use based on that assumption. t-tests and ANOVAs, for example, assume that the samples come from a normally distributed population.

Most statistics books contain tables at the back which list the probability that something occurs however many standard deviations away from the mean of the curve.
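A few rows of such a table can be generated with Python's standard library. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; the helper `table_row` and the chosen z values are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def table_row(z: float) -> tuple:
    """Return (z, Prob(Z <= z), Prob(|Z| >= z)) for the standard normal."""
    below = std_normal.cdf(z)
    two_sided = 2.0 * (1.0 - below)  # both tails together, by symmetry
    return z, below, two_sided

print(" z     P(Z<=z)   P(|Z|>=z)")
for z in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print("%4.1f   %.5f   %.5f" % table_row(z))

# Reverse lookup: the z value that leaves 5% of the probability in the two tails combined.
z_5_percent = std_normal.inv_cdf(1.0 - 0.05 / 2.0)
print("z for a two-sided 5%% tail: %.5f" % z_5_percent)
```

The reverse lookup is the same operation as reading a printed table "backwards", from a probability to a number of standard deviations.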

Statistical term for a symmetric curve in which the measures of central tendency (mean, median, and mode) are all equal. This is extremely important in statistics, as many continuous distributions approximate the normal curve. Using a table of values (or integrating with calculus), we can find the area under the curve, and thus the probability.

Not to be overly nit-picky, but blaaf's writeup is not entirely accurate. He provides the values of the standard normal distribution, which is the normal distribution when the mean is 0 and the standard deviation is 1. This is a particularly useful normal distribution (i.e. it is used to simplify calculations in statistical tests), but only one of an infinite set.

The equation for the normal curve is:

f(x) = e^(-(x-μ)^2 / (2σ^2)) / √(2πσ^2)

Where μ is the mean of the distribution, σ is the standard deviation of the distribution and f(x) is the probability density function. As you can easily see, if the mean is 0 and standard deviation 1, the normal curve becomes:

f(z) = e^(-z^2/2) / √(2π)
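The relationship between the general curve and the standard one can be checked numerically. In this sketch the function names and the values of the mean and standard deviation are illustrative assumptions. With z = (x-μ)/σ, the general density equals the standard density at z divided by σ.

```python
import math

def normal_density(x: float, mu: float, sigma: float) -> float:
    """General normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def standard_density(z: float) -> float:
    """Standard normal density (mean 0, standard deviation 1)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

mu, sigma = 50.0, 10.0  # illustrative values only
for x in (30.0, 50.0, 65.0):
    z = (x - mu) / sigma  # standardize x
    # The two printed numbers on each line should match.
    print(x, normal_density(x, mu, sigma), standard_density(z) / sigma)
```

The division by σ keeps the total area under the curve equal to 1 when the horizontal axis is stretched, which is why a single table for the standard curve serves every member of the family.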
