Math 143 C/E, Spring 2001

IPS Reading Questions
Chapter 8, Section 2

The data in the table at the top of p. 603 can be thought of as having been collected using questionnaires which asked for two pieces of information (i.e., two variables) from each respondent: the gender of the respondent, and the respondent's status as a binge drinker. A different type of table (than the one on p. 603), called a two-way table, can be constructed to summarize the information collected using such a survey. While you have yet to do any reading regarding two-way tables (and I am not asking you to do any now either), try your hand at constructing one following this brief description:
Along the top, write column headers for each value of one of the two variables, perhaps “gender”. Along the rows, write down row “headings” for each value of the other variable (this would be “drinking status” if you used “gender” for the columns). You should now have headings for two columns and two rows. Add a third column for “Totals” and similarly a third row for “Totals”. Now fill in the 9 entries of the table with the appropriate counts.
Compare your result to Table 2.14 on p. 194.

                        Female      Male     Total
Not a binge drinker      8,232     5,550    13,782
Binge drinker            1,684     1,630     3,314
Total                    9,916     7,180    17,096
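The construction described above can be sketched in a few lines of Python. This is only an illustration: the four cell counts are copied from the table, and the helper name `add_totals` is made up for this sketch. The row and column totals are computed rather than typed in, which gives a quick check on the arithmetic.

```python
# Cell counts for each (drinking status, gender) pair, copied from the table.
counts = {
    "Not a binge drinker": {"Female": 8232, "Male": 5550},
    "Binge drinker": {"Female": 1684, "Male": 1630},
}


def add_totals(cells):
    """Return a copy of the table with a Total column and a Total row added.

    `add_totals` is a hypothetical helper written for this illustration.
    """
    columns = list(next(iter(cells.values())))
    table = {}
    # Row totals: sum across the gender columns.
    for row, values in cells.items():
        table[row] = dict(values, Total=sum(values.values()))
    # Column totals (including the grand total): sum down the original rows.
    table["Total"] = {
        col: sum(table[row][col] for row in cells) for col in columns + ["Total"]
    }
    return table


two_way = add_totals(counts)

# Print the 9 entries with their row headings.
for row, values in two_way.items():
    print(f"{row:<22}" + "".join(f"{v:>10,}" for v in values.values()))
```

The bottom-right entry should equal both the sum of the row totals and the sum of the column totals; if the two disagree, a cell has been entered incorrectly.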

Why does it make more sense when performing a test of significance for the difference in two (independent) proportions to use a pooled estimate (defined at the bottom of p. 604) in the calculation of the spread σ_D than to use one of the proportions p̂₁ or p̂₂ that arise from the samples?
One of the things that is different about 2-sample procedures (from their 1-sample counterparts) is that we are working with differences. The null and alternative hypotheses make statements about these differences; usually the null hypothesis says that the difference in population parameters is 0, which amounts to saying that the parameter is the same in both populations. So we've hypothesized that p₁ and p₂ are equal, but not what value they might (both) be. It seems unfair to give preference to one population over the other by selecting p̂₁ (or p̂₂) as the value to use. Instead, since we're hypothesizing no difference between the two populations anyway, we put the two samples together as if they came from one population to get a “pooled” sample proportion.
Notice that, while we calculate and use a pooled estimate for the population proportion in a test of significance, no such pooling is required for a confidence interval on the difference of two (independent) population proportions, because a confidence interval does not assume that the two proportions are equal.
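To make the contrast concrete, here is a minimal Python sketch using the binge-drinking counts from the two-way table above. It uses the standard large-sample formulas, which may differ in notation or detail from the textbook's presentation. Labelling men as population 1 and women as population 2, and using 1.96 as the critical value for 95% confidence, are choices made only for this illustration.

```python
from math import sqrt

# Binge-drinker counts (x) and sample sizes (n) from the two-way table.
# Which group is called "population 1" is an arbitrary choice here.
x1, n1 = 1630, 7180   # men
x2, n2 = 1684, 9916   # women

p1_hat = x1 / n1
p2_hat = x2 / n2
diff = p1_hat - p2_hat

# Test of significance: under H0 (p1 = p2) both samples estimate the same
# proportion, so combine them into one pooled estimate.
p_pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = diff / se_pooled

# Confidence interval: no assumption that p1 = p2, so each sample
# proportion is used in its own term and nothing is pooled.
se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z_star = 1.96  # assumed critical value for 95% confidence
ci = (diff - z_star * se_unpooled, diff + z_star * se_unpooled)

print(f"pooled estimate = {p_pooled:.4f}")
print(f"z statistic     = {z:.2f}")
print(f"95% CI for p1 - p2 = ({ci[0]:.4f}, {ci[1]:.4f})")
```

The two standard errors come out close to one another for these data, but they answer different questions: the pooled version is the spread of the difference when the null hypothesis is true, while the unpooled version makes no such assumption.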

The authors say in several places that the sample sizes n₁ and n₂ should be large. At the top of p. 606, they remind us why this is necessary: because we are using the normal approximation to distributions that truly are binomial (at least, binomial for the counts X₁ and X₂). For 1-sample procedures, what was the criterion that had to be met in order to use such a normal approximation? How has this criterion been modified for 2-sample procedures?
For 1-sample procedures we required that the expected numbers of successes and failures each be at least 10. In equation form, these requirements were written as
np ≥ 10        and        n(1 − p) ≥ 10.
For 2-sample procedures, we have the modified requirements that the expected numbers of successes and failures for each population be at least 5. In equation form, these translate as
n₁p₁ ≥ 5        and        n₁(1 − p₁) ≥ 5
for population 1, and similarly
n₂p₂ ≥ 5        and        n₂(1 − p₂) ≥ 5
for population 2. Of course, in practical implementation you are not likely to know the (true) population parameters p₁ and p₂ (just as we didn't know p in 1-sample procedures), and it is worth giving a little thought to how you would satisfy yourself that the above requirements are met.
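One common way to check these requirements in practice (offered here as a suggestion, not as the textbook's prescribed method) is to substitute each sample proportion for the unknown population proportion. Since n times the sample proportion is just the observed count X, this amounts to checking that the observed numbers of successes and failures in each sample are at least 5. The function name `counts_large_enough` below is hypothetical, and the counts are those from the binge-drinking table.

```python
def counts_large_enough(successes, n, minimum):
    """Check that observed successes and failures are both at least `minimum`.

    Hypothetical helper: the observed counts stand in for n*p and n*(1 - p),
    because the true proportion p is unknown.
    """
    failures = n - successes
    return successes >= minimum and failures >= minimum


# (binge drinkers, sample size) for each group, from the two-way table.
samples = {"men": (1630, 7180), "women": (1684, 9916)}

# The 2-sample requirement must hold in BOTH samples.
ok = all(counts_large_enough(x, n, minimum=5) for x, n in samples.values())
print("Normal approximation looks reasonable:", ok)
```

With samples this large the check passes easily; it matters most when one of the groups is small or the proportion is close to 0 or 1.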

File translated from TeX by TtH, version 2.87.
On 29 Mar 2001, 09:58.