Math 143 F
Probability and Statistics
Spring 2000

An Introduction to Chi-Square

The Chi-Square test statistic is useful for measuring how close counts of categorical variables are to what we would expect under some assumption (which we will call the Null Hypothesis). Larger values of the Chi-Square statistic are more unusual, so if the Chi-Square statistic is large enough, then one of two things has happened:

  1. The null hypothesis is false, or
  2. The null hypothesis is true, but something very unusual has happened.

Probability lets us quantify the phrase "very unusual". The P-value is the probability, computed under the assumption that the null hypothesis is true, of getting data at least as extreme as the data at hand.

We'll develop these ideas by using three examples.

Golf Balls in the Yard

We first encountered the use of Chi-Square to investigate the null hypothesis that the golf balls landing in Professor Rossman's back yard were equally likely to be numbered 1, 2, 3 or 4. That is, the numbers on the golf balls played by the types of players who hit golf balls that might land in his yard are uniformly distributed among these values.

	H-null: The numbers 1, 2, 3, and 4 are equally likely.

	H-alt:  The numbers 1, 2, 3, and 4 are not equally likely.
We expected about 121.5 golf balls (1/4 of 486) of each number. By adding
	                     2                                 2
	(observed - expected)                     (137 - 121.5)
	--------------------- ;    i.e., 1.9774 = ---------------; etc.
	       expected                                121.5
for each of the four values (1 through 4), we obtained Chi-Square = 8.46914:
 
            obs   exp     diff    diff^2/exp   Chi-sq    P-value     cum prob
            137   121.5    15.5   1.97737      8.46914   0.0372487   0.962751
            138   121.5    16.5   2.24074
            107   121.5   -14.5   1.73045
            104   121.5   -17.5   2.52058
The P-value for this test is 0.0372487. This means that if the golf ball numbers are uniformly distributed, then we would expect to get a Chi-Square value this large or larger about 3.7% of the time.
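
The golf ball calculation can be reproduced with a short Python sketch. It assumes 3 degrees of freedom (four categories minus one), which the text does not state explicitly but which reproduces the reported P-value. The function names are illustrative only, and the closed-form tail probability used here is valid only for 3 degrees of freedom.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df3(chi_sq):
    """Upper-tail probability of a Chi-Square distribution with 3 degrees
    of freedom (assumed here: 4 categories - 1). Closed form for df = 3 only."""
    half = chi_sq / 2
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * chi_sq / math.pi) * math.exp(-half)

observed = [137, 138, 107, 104]      # counts of balls numbered 1, 2, 3, 4
total = sum(observed)                # 486 golf balls
expected = [total / 4] * 4           # 121.5 of each number under H-null

chi_sq = chi_square_statistic(observed, expected)
p_value = p_value_df3(chi_sq)

print("Chi-Square =", round(chi_sq, 5))
print("P-value    =", round(p_value, 4))
```

The four terms being summed are the values in the fourth column of the table above.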

Two-way tables

While this example makes the idea of the Chi-Square statistic easy to understand, it is not the most common way that Chi-Square is used. The most common use of Chi-Square is to test for an association between two categorical variables. The data for the two categorical variables are usually presented in a two-way table (also called a contingency table). Let's take a look at an example.

Chi-Square in Randomized Experiments: the Physicians' Health Study

In a randomized experiment, one of the two categorical variables represents the treatments and the other represents the outcomes. For example, in the Physicians' Health Study subjects were treated with either aspirin or a placebo. The main recorded outcome was whether or not the subject suffered a heart attack. In this case our hypotheses can be stated as follows:

	H-null: There is no association between taking aspirin and having a 
		heart attack.  

	H-alt:  There is an association between taking aspirin and having a 
		heart attack. (That is, those taking aspirin are either more 
		likely or less likely to have a heart attack than those 
		taking a placebo.)

Here is a two-way table representing the data from this famous study:

          Heart Attack   No Heart Attack
Aspirin        104           10 933
Placebo        189           10 845

The table lists the number of subjects with each possible combination of treatment and outcome. For example, there were 104 subjects treated with aspirin who had a heart attack during the course of the study.

We can get more information from this table by adding another row and two additional columns:

          Heart Attack   No Heart Attack    Total    Rate per 1000
Aspirin        104           10 933        11 037         9.4
Placebo        189           10 845        11 034        17.1
Total          293           21 778        22 071        13.3

Now we can clearly see that roughly the same number of subjects were in each treatment group, and that the aspirin group had a lower rate of heart attack. In fact the rate of heart attack for the aspirin group was only a little more than half the rate for the placebo group.

Relative Risk

One way of formalizing that last observation is by using relative risk.
			   rate for one group
	Relative risk = ------------------------
			  rate for other group

So in this case, we can say that the rate of heart attacks for those taking the placebo was 1.82 (17.1 / 9.4) times the rate of heart attacks for those taking aspirin. Sometimes this is expressed as an increased risk of 82%.

Of course, we can reverse the roles of the two groups, computing the relative risk to be 0.55 (9.4 / 17.1), and say that taking aspirin reduces the risk of heart attack by 45%.
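
As a quick check of this arithmetic, here is a minimal Python sketch that recomputes the rates and both relative risks from the raw counts. The helper name rate_per_1000 is made up for illustration. The group sizes come from adding the two cells in each row of the original table.

```python
def rate_per_1000(events, group_size):
    """Number of events per 1000 subjects in a group."""
    return 1000 * events / group_size

# Group sizes: 104 + 10933 = 11037 (aspirin), 189 + 10845 = 11034 (placebo).
aspirin_rate = rate_per_1000(104, 104 + 10933)
placebo_rate = rate_per_1000(189, 189 + 10845)

# Relative risk depends on which group goes in the numerator.
rr_placebo_vs_aspirin = placebo_rate / aspirin_rate
rr_aspirin_vs_placebo = aspirin_rate / placebo_rate

print("Aspirin rate per 1000:", round(aspirin_rate, 1))
print("Placebo rate per 1000:", round(placebo_rate, 1))
print("Placebo vs. aspirin:  ", round(rr_placebo_vs_aspirin, 2))
print("Aspirin vs. placebo:  ", round(rr_aspirin_vs_placebo, 2))
```

The two relative risks are reciprocals of each other, so they describe the same comparison from opposite directions.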

At least that was the case for those in the study. That leaves us with at least two questions:

We'll look at the first of these questions in the next section.

The Chi-Square Test

So what are the results of the Chi-Square test on this data? Here is the output from Minitab:

Chi-Square Test

Expected counts are printed below observed counts

	Heart Attack?

           Yes         No    Total

Aspirin    104      10933    11037
        146.52   10890.48

Placebo    189      10845    11034
        146.48   10887.52

Total      293      21778    22071

Chi-Sq = 12.339 +  0.166 +
         12.343 +  0.166 = 25.014
DF = 1, P-Value = 0.000

The Chi-Square statistic is computed by adding the value of
	                     2                                  2
	(observed - expected)                     (104 - 146.52)
	--------------------- ;    So    12.339 = ---------------; etc.
	       expected                                146.52
for each of the four cells in the original two-way table. The four values are added together to produce (in this example) 25.014.

How big is 25? That answer is given by the P-value. It is listed here as 0.000, which means that it is less than 0.0005 (otherwise it would round to 0.001 or more). This P-value was so small that the study was actually terminated early. The evidence was so overwhelming in favor of aspirin that those conducting the study could no longer justify withholding it from the placebo group.

What did we expect?

So where did those expected counts (146.52, 10890.48, 146.48, 10887.52) come from?

Notice that the expected counts in the two rows are approximately the same. That is because there were roughly equal numbers in each treatment group. Had there been more in one group than in the other, we should have expected more heart attacks (and more subjects without heart attacks) in the larger group. More specifically, since 11 037 of the 22 071 subjects took aspirin, we would expect (if the Null Hypothesis is in fact true) that of the 293 heart attacks, approximately

	 11 037           11 037 * 293
	-------- (293) = --------------  = 146.52
	 22 071              22 071
would occur in the aspirin group. That's simply the fair share for that group. In general the expected count is given by
	             Row Total                    Row Total * Column Total
	expected = ------------ (Column Total) = --------------------------
	            Grand Total                          Grand Total
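
The expected-count formula and the resulting Chi-Square statistic can be sketched in Python as follows. This is an illustration under stated assumptions, not the way Minitab computes its output: the function names are invented here, and the P-value shortcut using math.erfc is valid only for 1 degree of freedom, as in a table with 2 rows and 2 columns.

```python
import math

def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def p_value_df1(chi_sq):
    """Upper-tail probability for 1 degree of freedom only."""
    return math.erfc(math.sqrt(chi_sq / 2))

# Rows: aspirin, placebo. Columns: heart attack, no heart attack.
physicians = [[104, 10933],
              [189, 10845]]

expected = expected_counts(physicians)
statistic = chi_square(physicians)
p_value = p_value_df1(statistic)

print("Expected counts:", [[round(e, 2) for e in row] for row in expected])
print("Chi-Square =", round(statistic, 3))
print("P-value    =", p_value)
```

The unrounded P-value is far below 0.0005, which is why Minitab prints it as 0.000.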

Why so many Doctors?

The Physicians' Health Study was very large. What would have happened had they chosen to do the study with only about 2200 doctors instead of 22,000?

If the percentages of heart attacks remained roughly the same, the data in this case would have been the following:

          Heart Attack   No Heart Attack   Total
Aspirin        10             1093          1103
Placebo        19             1085          1104
Total          29             2178          2207


Expected counts are printed below observed counts

	   Heart Attack?

           Yes       No    Total
Aspirin     10     1093     1103
         14.49  1088.51

Placebo     19     1085     1104
         14.51  1089.49

Total       29     2178     2207

Chi-Sq =  1.393 +  0.019 +
          1.392 +  0.019 = 2.822
DF = 1, P-Value = 0.093

Notice how much less significant the result is with a sample of this size!
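
One way to see why sample size matters so much: if every count in a two-way table is multiplied by the same factor, the proportions stay the same but the Chi-Square statistic is multiplied by that factor too. The Python sketch below illustrates this with the smaller table. The function name chi_square and the factor of 10 are choices made for this illustration, and the scaled table is hypothetical rather than the actual study data.

```python
def chi_square(table):
    """Chi-Square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            total += (observed - expected) ** 2 / expected
    return total

# The hypothetical study with about 2200 doctors.
small = [[10, 1093],
         [19, 1085]]

# Same proportions, ten times as many subjects (a made-up table).
scaled = [[10 * count for count in row] for row in small]

small_stat = chi_square(small)
scaled_stat = chi_square(scaled)

print("Chi-Square, about 2200 subjects: ", round(small_stat, 3))
print("Chi-Square, about 22000 subjects:", round(scaled_stat, 3))
print("Ratio:", round(scaled_stat / small_stat, 6))
```

So the same difference in heart attack rates that gives a modest Chi-Square value with about 2200 subjects gives a value ten times larger with ten times as many subjects.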

Chi-Square in Observational Studies

In an observational study, we are simply looking to see if there is an association between the values of two categorical variables collected for each unit. Although the methodology is not quite the same, the statistical procedure is identical.

As an example, let's look at the data from Survey 1. We might be interested, for example, in whether men get more tickets than women. In order to do a Chi-Square analysis, we first must decide what categorical variable to use for "getting tickets". One way to do this would be to compare men and women to see who has received any tickets at all. If we do so, Minitab produces the following output:


 Rows: Sex     Columns: any tickets
 
          No      Yes      All
  
 F     74.00    26.00   100.00
          37       13       50
       32.22    17.78    50.00
  
 M     52.50    47.50   100.00
          21       19       40
       25.78    14.22    40.00
  
 All   64.44    35.56   100.00
          58       32       90
       58.00    32.00    90.00
 
Chi-Square = 4.483, DF = 1, P-Value = 0.034

 
  Cell Contents --
                  % of Row
                  Count
                  Exp Freq

Another way to do this would be to look at multiple offenders (those with 2 or more tickets). Here are the results:


Rows: Sex   Columns: multiple tickets
 
          No      Yes      All
  
 F     92.00     8.00   100.00
          46        4       50
       40.56     9.44    50.00
  
 M     67.50    32.50   100.00
          27       13       40
       32.44     7.56    40.00
  
 All   81.11    18.89   100.00
          73       17       90
       73.00    17.00    90.00
 
Chi-Square = 8.706, DF = 1, P-Value = 0.003

  Cell Contents --
                  % of Row
                  Count
                  Exp Freq
Notice what these results say and don't say.

One final note about statistical design. In a situation like the one we just looked at, there are two ways to design the study:

  1. Randomly select people, recording their sex and speeding ticket category.
  2. Do separate random samples from a population of women and a population of men, recording for each their speeding ticket category.

In some cases, only the first of these design options will be available.

    A Few More Details

    The Chi-Square test is pretty easy to use:
    1. Create a two-way table.
    2. Compute the Chi-Square statistic.
    3. Determine the P-value (df = (rows-1)(cols-1)).
    4. Interpret the result. Lower P-values indicate a more statistically significant result, that is, a result that is less likely to be the result of random chance alone.
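
The four steps above can be sketched in Python for a table of any size. This is a minimal illustration using only the standard library. The names chi_square_test and chi_square_tail are invented here, and the tail probability uses a standard closed form that assumes the degrees of freedom are a positive whole number.

```python
import math

def chi_square_tail(x, df):
    """P(Chi-Square with df degrees of freedom >= x), for whole-number df >= 1."""
    h = x / 2
    if df % 2 == 0:
        series = sum(h ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-h) * series
    series = sum(h ** (i + 0.5) / math.gamma(i + 1.5) for i in range(df // 2))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series

def chi_square_test(table):
    """Return (statistic, df, p_value) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Step 2: add (observed - expected)^2 / expected over all cells.
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # Step 3: df = (rows - 1)(cols - 1), then look up the P-value.
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, chi_square_tail(statistic, df)

# Step 1: the two-way table. Rows: F, M. Columns: no tickets, any tickets.
any_tickets = [[37, 13],
               [21, 19]]

statistic, df, p_value = chi_square_test(any_tickets)

# Step 4: interpret. A lower P-value is a more statistically significant result.
print("Chi-Square =", round(statistic, 3), " DF =", df, " P-Value =", round(p_value, 3))
```

Run on the "any tickets" counts from Survey 1, this agrees with the Minitab output shown earlier to three decimal places.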

    There are, however, a few little details to keep in mind.


    This page is maintained by Randall Pruim. Please email comments, corrections, suggestions and the like to rpruim@calvin.edu.

    Last Modified: Thursday, 11-Jan-2001 16:02:48 EST