A chi-square test is used in statistics to determine whether observed counts differ significantly from the counts you would expect by chance, for example whether or not there is an association between two categorical variables.
The most common chi-square test is Pearson's chi-square test; if you just hear the words "chi-square," this is almost always what people mean. This test assumes:
- a random sample;
- a large sample size (although this is rather arbitrary);
- every cell in a variable-by-variable table should have an expected count of at least 5; if not, apply Yates' correction (more on that later);
- a sample whose distribution is similar to that of the population; and
- a non-directional hypothesis: the test tells you only whether the variables are related, not in which direction, and discovering that two variables are related does not imply that one causes the other or vice versa.
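The expected-count rule in the list above is easy to check mechanically. Here is a minimal sketch, assuming the expected counts are already in a Python list; the function name `small_expected_cells` is hypothetical, and the second usage line uses made-up numbers.

```python
from typing import List, Sequence


def small_expected_cells(expected: Sequence[float], threshold: float = 5.0) -> List[int]:
    """Return the indices of cells whose expected count is below the threshold.

    An empty result means the 'at least 5 per cell' rule of thumb is satisfied.
    """
    return [i for i, e in enumerate(expected) if e < threshold]


print(small_expected_cells([37.0] * 7))        # no small cells
print(small_expected_cells([12.5, 4.2, 3.0]))  # made-up counts: cells 1 and 2 are too small
```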
The chi-square test formula is
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
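The formula translates almost word for word into code. A minimal sketch, assuming observed and expected counts are given as parallel lists (the name `chi_square_statistic` is hypothetical, not a library function):

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson's statistic: the sum over all cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For example, `chi_square_statistic([42], [37])` gives just Sunday's term from the worked example, 25/37.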
If all that is confusing, don't worry: now that we've covered the statistical mumbo-jumbo, let's put our knowledge to work. First, let's get some data.
Example: The Department of Transportation wants to know if more traffic accidents occur on the weekends or on the weekdays. You head down to the local DMV and get the following data:
Day of the Week Accidents
So, what are we expecting here? If accidents are equally likely on any day, we expect each day to have the same number of accidents. To find that number, take the column of counts (only accidents, in this case), add the values up, and divide the total by the number of categories (here, the 7 days of the week). We get an expected value of 37 accidents per day.
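Under the "every day is the same" hypothesis, each category simply gets an equal share of the total. A minimal sketch; the function name `uniform_expected` is hypothetical, and since the text gives only the expected value of 37, the total of 7 × 37 = 259 accidents used below is implied rather than quoted from the table.

```python
from typing import List


def uniform_expected(total: float, k: int) -> List[float]:
    """Expected count for each of k equally likely categories: total / k."""
    return [total / k] * k


print(uniform_expected(259, 7))  # seven values of 37.0
```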
Next, we apply our formula. I'll do the first one, Sunday, and let you do the rest:
$$\frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
$$\frac{(42 - 37)^2}{37} = \frac{25}{37} \approx 0.676$$
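"The rest" is the same calculation repeated for each day. A small sketch, assuming the counts are stored in a dictionary keyed by day name (the function `cell_contributions` is hypothetical); only Sunday's count of 42 appears in the text, so the other six days would have to be filled in from the table.

```python
from typing import Dict


def cell_contributions(observed: Dict[str, float], expected: float) -> Dict[str, float]:
    """Return (O - E)^2 / E for each day; summing the values gives the chi-square statistic."""
    return {day: (o - expected) ** 2 / expected for day, o in observed.items()}


# Only Sunday's count is given in the text; add the other six days from the table.
terms = cell_contributions({"Sunday": 42}, 37)
print(terms["Sunday"])       # 25/37, about 0.676
print(sum(terms.values()))   # the chi-square value once all seven days are included
```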
Continuing on for the other six days and adding up all seven terms, we end up with a χ² value of 144 / 37, or roughly 3.892.
Next, we figure out the degrees of freedom (df). In a chi-square test with only one column, df is the number of categories minus one, (k - 1); here that is 7 - 1 = 6. In a table with multiple rows and multiple columns, df is equal to (r - 1) * (c - 1).
Now, the fun part: get out your handy-dandy chi-square table. What? Don't have one? You can find them in the back of most statistics books, or, if you're lucky, your calculator or statistical analysis program will have one built in. (Update: blaaf has provided a handy-dandy table at chi-square curve.) Looking up the upper critical value for 6 degrees of freedom at the 0.10 significance level, we see that our χ² value would have to exceed 10.645 to be statistically significant (at the more common 0.05 level, the cutoff is 12.592). Our 3.892 falls well short of either, so we can go tell our boss at the DOT that the data show no significant difference in accident rates across the days of the week. Strictly speaking, we have failed to reject the hypothesis that accidents happen at the same rate every day; we have not proved that they do.
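If you have no table handy, the lookup can be sketched with the Python standard library alone. The helper names below are hypothetical. The sketch relies on the fact that, for an even number of degrees of freedom, the chi-square upper-tail probability has the closed form e^(-x/2) · Σ (x/2)^i / i! for i = 0 … df/2 - 1, which covers df = 6; for odd df you would need a numerical library (if SciPy is available, `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf` handle any df).

```python
import math


def degrees_of_freedom(rows: int, cols: int = 1) -> int:
    """(k - 1) for a single column of k categories; (r - 1)(c - 1) for an r x c table."""
    if cols == 1:
        return rows - 1
    return (rows - 1) * (cols - 1)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(chi^2 > x), valid only for positive even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^i / i!, built up incrementally
    total = 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i
        total += term
    return math.exp(-half) * total


def critical_value_even_df(alpha: float, df: int) -> float:
    """Value the statistic must exceed to be significant at level alpha (bisection)."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:   # widen until the tail drops below alpha
        hi *= 2.0
    for _ in range(100):                     # the tail probability decreases as x grows
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


stat = 144 / 37
df = degrees_of_freedom(7)                  # 6
p = chi2_sf_even_df(stat, df)               # about 0.69
crit = critical_value_even_df(0.10, df)     # about 10.645
print(df, round(p, 3), round(crit, 3), stat > crit)
```

A p-value of roughly 0.69 says the same thing as the table lookup: a spread like this one is entirely ordinary if every day really has the same accident rate.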
In conclusion, a chi-square test compares the observed values with the expected values to see whether the difference between them is bigger than chance alone would explain. It is an excellent simple tool for detecting whether two variables are related, although it cannot tell you that one causes the other.
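For two variables laid out in a multi-row, multi-column table, the same statistic is used with df = (r - 1)(c - 1). The text does not say how the expected counts are found in that case; the sketch below uses the standard rule for a test of independence, expected = row total × column total / grand total. The function name is hypothetical and the counts in the usage line are made up.

```python
from typing import Sequence, Tuple


def contingency_chi_square(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson's chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables are independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = contingency_chi_square([[10, 20], [20, 10]])  # made-up 2 x 2 counts
print(stat, df)
```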
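Addendum: Yates' correction for continuity compensates for using the continuous chi-square distribution to approximate discrete counts. It makes the test more conservative, matters most when expected counts are small, and is usually applied to 2 × 2 tables (1 degree of freedom). To apply the correction, subtract 0.5 from the absolute value of every O - E difference before squaring it and dividing by E: $\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$.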
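The corrected formula is a one-line change from the uncorrected one. A minimal sketch with a hypothetical function name; clamping |O - E| - 0.5 at zero, so the correction never overshoots when a cell is already within 0.5 of its expected value, is a safeguard some implementations use rather than part of the formula as stated above. The usage line uses made-up 2 × 2 counts.

```python
from typing import Sequence


def yates_chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Chi-square with Yates' continuity correction: sum of (|O - E| - 0.5)^2 / E."""
    total = 0.0
    for o, e in zip(observed, expected):
        adjusted = max(abs(o - e) - 0.5, 0.0)  # clamp so the correction never overshoots
        total += adjusted ** 2 / e
    return total


# Made-up 2 x 2 table [[10, 20], [20, 10]] flattened; every expected count is 15.
print(yates_chi_square([10, 20, 20, 10], [15, 15, 15, 15]))
```

For these counts the corrected statistic is 5.4, compared with about 6.67 without the correction, which shows how the correction pulls the result toward "not significant."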