This node is meant to be a somewhat more technical retelling of the above, with a focus on how to avoid being duped.
The first way in which statistics can be distorted is through the type of sample used. Rarely is an entire population surveyed in a study. More often, a sample is taken and the data from that sample are extrapolated to the rest of the population. It is thus vitally important that the sample be judiciously chosen.
Let’s imagine we are looking for the average height of Canadians. We choose to sample three random Canadians and we get their heights. The mean value we find for the height of Canadians is a random variable (the term has an important, specific meaning here): the sample mean could be any number within a particular range, but not every value in that range is equally probable. Let’s imagine that we take 50 samples of three people each and plot the fifty resulting sample means on a histogram. The mean of this histogram (the average of the fifty averages) is the population mean as nearly as we can determine it.
Each of the fifty data sets could also be plotted on a histogram. The small size of each sample means that there is a relatively high chance of an unusually tall or short person turning up in our data set and thus making its mean dramatically different from the mean of the entire population. A larger sample dilutes the effect of any one unusual person, so as our sample size gets larger, the distribution of the sample averages will have less spread.
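The shrinking spread can be seen in a quick simulation. This is only a sketch under made-up assumptions: the population mean of 170 cm and standard deviation of 10 cm are invented for illustration, heights are assumed to be normally distributed, and the helper name sample_means is hypothetical.

```python
import random
import statistics

# ASSUMPTION: invented population parameters, not real Canadian data.
TRUE_MEAN_CM = 170.0
TRUE_SD_CM = 10.0

def sample_means(sample_size, n_samples=50, seed=0):
    """Draw n_samples samples of sample_size heights; return each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        heights = [rng.gauss(TRUE_MEAN_CM, TRUE_SD_CM) for _ in range(sample_size)]
        means.append(statistics.mean(heights))
    return means

# Fifty samples of three people each, as in the text, versus fifty samples of 300.
means_of_3 = sample_means(3)
means_of_300 = sample_means(300)

spread_3 = statistics.stdev(means_of_3)
spread_300 = statistics.stdev(means_of_300)

print("average of the fifty averages (n=3):", round(statistics.mean(means_of_3), 1))
print("spread of sample means, n=3:  ", round(spread_3, 2))
print("spread of sample means, n=300:", round(spread_300, 2))
```

The second spread should come out far smaller than the first, which is the whole point: the larger the sample, the more tightly the sample averages cluster.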
If we knew the true mean height of the Canadian population, we could mark it on the histogram from one trial. Our estimated value will usually be either too large or too small when compared with it. How far off it will be depends on the sample size of the trial. Since a larger sample represents the whole population more effectively, it makes sense that it would do a better job of estimating the true value.
About 95% of the time, the true value of the mean height of the Canadian population will be within two standard deviations of the estimated value. The standard deviation in question is that of the histogram of sample averages (often called the standard error), and it becomes smaller as the sample becomes larger. This means that the region of the histogram in which the true value almost certainly lies becomes narrower when a larger sample size is used. This concept may be more familiar than you think.
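To make the two-standard-deviations rule concrete, here is a rough sketch that estimates the spread of the sample average from a single sample and builds the corresponding interval. It assumes the usual textbook estimate (sample standard deviation divided by the square root of the sample size); the heights are simulated from an invented normal(170, 10) population, and the function name approx_95_interval is hypothetical.

```python
import math
import random
import statistics

def approx_95_interval(sample):
    """Return (mean, standard_error, low, high) using the 'two standard errors' rule of thumb."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Estimated standard deviation of the sample average.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, standard_error, mean - 2 * standard_error, mean + 2 * standard_error

rng = random.Random(1)
# ASSUMPTION: simulated heights in cm, not real measurements.
small_sample = [rng.gauss(170, 10) for _ in range(10)]
large_sample = [rng.gauss(170, 10) for _ in range(1000)]

small_result = approx_95_interval(small_sample)
large_result = approx_95_interval(large_sample)

for label, (mean, se, low, high) in (("n=10", small_result), ("n=1000", large_result)):
    print(f"{label}: mean={mean:.1f} se={se:.2f} interval=({low:.1f}, {high:.1f})")
```

The interval from the larger sample should be much narrower, even though both samples come from the same population.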
Consider polls. When a poll result is stated, it is usually in the form: “55% of Canadians say Jean Chretien should play more golf, plus or minus 5% 19 times out of 20.” The “19 times out of 20” is the same 95% from the above paragraph. This means that the 5% (strictly speaking, five percentage points) represents twice the standard deviation of the estimate, that is, of the distribution from which the 55% value is drawn. The pollsters are giving you the standard deviation in disguise!
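The poll arithmetic can be unpacked in a few lines. This sketch uses only the numbers from the example poll and the same rule of thumb as the text (margin equals two standard deviations of the estimate). The back-of-envelope sample size is an extra step that assumes a simple random sample, which real pollsters may not use.

```python
# Figures from the example poll quoted in the text.
reported_share = 0.55    # 55% of respondents
margin_of_error = 0.05   # plus or minus 5 points, 19 times out of 20

# Rule of thumb from the text: the margin is two standard deviations of the estimate.
standard_error = margin_of_error / 2

# ASSUMPTION: simple random sampling, where se = sqrt(p * (1 - p) / n).
# Solving for n gives a rough idea of how many people were asked.
implied_sample_size = reported_share * (1 - reported_share) / standard_error ** 2

low = reported_share - margin_of_error
high = reported_share + margin_of_error

print(f"standard deviation of the estimate: {standard_error:.3f}")
print(f"plausible range for the true share: {low:.2f} to {high:.2f}")
print(f"implied sample size (rough): {implied_sample_size:.0f}")
```

Under these assumptions the poll would have reached only a few hundred people, which is why a margin as wide as five points is worth noticing.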
Another common method by which statistics are fudged is conditioning. This is the process of selecting specific subsamples within a data set for comparison. An example is the average male wage compared with the average female wage. The manner in which the subsamples are chosen affects the results you get.
Studies have shown that kids who go to private schools earn 10% more, on average, than those who go to public schools. What does this mean? If we change the conditioning to take neighbourhood and background into account, we see the difference reduced to zero for kids from good areas. This essentially means that the marginal impact of going to private school if you already live in a good area (high average income, low unemployment) is quite small. By contrast, students coming from a poor area stand to gain 10% in their average income by going to private school. Such statistical evidence (keep in mind that this is just an example) can lead to government policy decisions. The above conclusion would support a proposal for vouchers allowing poor kids to go to private school, for example.
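Here is a toy illustration of how conditioning changes the picture. Every number below is invented, and the raw gap in this toy data is not tuned to the 10% figure in the example; only the within-neighbourhood pattern (no gap in the well-off area, a 10% gap in the poor one) mirrors the text. The helper average_income is hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: invented records of (neighbourhood, school type, income, head count).
GROUPS = [
    ("well-off", "private", 60_000, 80),
    ("well-off", "public", 60_000, 20),
    ("poor", "private", 33_000, 10),
    ("poor", "public", 30_000, 90),
]

def average_income(groups, key):
    """Head-count-weighted average income for each value of key(neighbourhood, school)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [income sum, head count]
    for hood, school, income, count in groups:
        k = key(hood, school)
        totals[k][0] += income * count
        totals[k][1] += count
    return {k: income_sum / heads for k, (income_sum, heads) in totals.items()}

# Conditioning on school type only: the raw comparison.
by_school = average_income(GROUPS, lambda hood, school: school)
# Conditioning on neighbourhood as well.
by_both = average_income(GROUPS, lambda hood, school: (hood, school))

print("raw averages:", by_school)
for hood in ("well-off", "poor"):
    gap = by_both[(hood, "private")] / by_both[(hood, "public")] - 1
    print(f"{hood}: private-school gap = {gap:.0%}")
```

The raw comparison shows a large gap mostly because private-school kids in this toy data tend to come from the well-off area; once neighbourhood is held fixed, the gap is much smaller.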
One final statistical trick I shall examine is that of scale. Somebody can call an increase in inflation from 2% to 3% (as calculated by the Consumer Price Index, for example) a “50%” jump. In actuality, the change was rather small: a single percentage point. Whenever percentage changes are used to examine changes in small values, alarmingly large percentage changes can result. For this reason, if you are presented with very large percentage changes you ought to keep in mind that they may simply represent small variations in small quantities. For the GDP of Luxembourg to grow by 10 or even 50% represents very little actual growth compared with the GDP of the United States growing even 1%.
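The difference between a relative change and an absolute one is easy to check for yourself. In this sketch the inflation figures come from the example above, while the two economy sizes are round hypothetical numbers chosen only to show the effect; they are not actual GDP figures for any country.

```python
def relative_change_pct(old, new):
    """Change expressed as a percentage of the starting value."""
    return (new - old) / old * 100

def absolute_growth(size, growth_pct):
    """How much a quantity of the given size grows, in its own units."""
    return size * growth_pct / 100

# Inflation rising from 2% to 3%: one percentage point, but a "50%" jump.
inflation_points = 3.0 - 2.0
inflation_relative = relative_change_pct(2.0, 3.0)
print(f"inflation: +{inflation_points:.0f} point, or a {inflation_relative:.0f}% relative jump")

# ASSUMPTION: hypothetical economy sizes in billions, not real GDP data.
small_economy = 50
large_economy = 10_000

small_growth = absolute_growth(small_economy, 50)  # small economy grows 50%
large_growth = absolute_growth(large_economy, 1)   # large economy grows 1%
print(f"small economy, +50%: {small_growth:.0f} billion")
print(f"large economy, +1%:  {large_growth:.0f} billion")
```

With these made-up sizes, the 1% growth of the large economy is four times the 50% growth of the small one in absolute terms.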
Remember, people can only lie to you with statistics if you let them! Be aware of how they work and you will be a less gullible member of society.