Math 143 C/E, Spring 2001
IPS Reading Questions
Chapter 8, Section 2



  1. The data in the table at the top of p. 603 can be thought of as having been collected using questionnaires which asked for two pieces of information (i.e., two variables) from each respondent: the gender of the respondent, and the respondent's status as a binge drinker. A different type of table (than the one on p. 603), called a two-way table, can be constructed to summarize the information collected using such a survey. While you have yet to do any reading regarding two-way tables (and I am not asking you to do any now either), try your hand at constructing one following this brief description:
    Along the top, write column headers for each value of one of the two variables, perhaps “gender”. Along the rows, write down row “headings” for each value of the other variable (this would be “drinking status” if you used “gender” for the columns). You should now have headings for two columns and two rows. Add a third column for “Totals” and similarly a third row for “Totals”. Now fill in the 9 entries of the table with the appropriate counts.
    Compare your result to Table 2.14 on p. 194.

                             Female     Male    Total
    Not a binge drinker       8,232    5,550   13,782
    Binge drinker             1,684    1,630    3,314
    Total                     9,916    7,180   17,096

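    As a check on the arithmetic, the totals can be rebuilt from the four interior counts. The following is a minimal sketch; the names counts and add_totals are made up for this illustration and do not come from the text.

```python
# Interior counts of the two-way table (rows: drinking status, columns: gender).
counts = {
    "Not a binge drinker": {"Female": 8232, "Male": 5550},
    "Binge drinker": {"Female": 1684, "Male": 1630},
}


def add_totals(table):
    """Return a copy of the table with a 'Total' column and a 'Total' row."""
    columns = list(next(iter(table.values())))
    result = {}
    # Row totals: sum across the columns of each row.
    for row, cells in table.items():
        result[row] = dict(cells, Total=sum(cells.values()))
    # Column totals (including the grand total): sum down each column.
    result["Total"] = {
        col: sum(result[row][col] for row in table) for col in columns + ["Total"]
    }
    return result


two_way = add_totals(counts)
for row, cells in two_way.items():
    print(f"{row:<22}" + "".join(f"{value:>8,}" for value in cells.values()))
```

    The bottom-right entry is the total number of respondents, and it should come out the same whether you add the row totals or the column totals.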
  2. When performing a test of significance for the difference in two (independent) proportions, why does it make more sense to use a pooled estimate (defined at the bottom of p. 604) in the calculation of the spread σD than to use one of the sample proportions p̂1 or p̂2?

    One of the things that is different about 2-sample procedures (from their 1-sample counterparts) is that we are working with differences. The null and alternative hypotheses make statements about these differences. Usually the null hypothesis says that the difference in population parameters is 0, which translates into saying that the parameter is the same in both populations. So we have hypothesized that p1 and p2 are equal, but not what value they might (both) be. It seems unfair to give preference to one population over the other by selecting p̂1 (or p̂2) as the value to use. Instead, since we are hypothesizing no difference between the two populations anyway, we put the two samples together as if they came from one population to get a “pooled” sample proportion.

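    Notice that, while we calculate and use a pooled estimate for the population proportion in a test of significance, no pooled estimate is required for a confidence interval on the difference of two (independent) population proportions. A confidence interval does not assume that p1 and p2 are equal, so its standard error uses p̂1 and p̂2 separately.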
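    To make the distinction concrete, here is a minimal sketch that computes both standard errors from the binge-drinking counts in question 1. Treating men as population 1 and women as population 2 is an assumption made for this illustration, as are the variable names.

```python
import math

# Counts from the two-way table in question 1 ("success" = binge drinker).
# Assumption for this sketch: men are population 1, women are population 2.
x1, n1 = 1630, 7180   # men: binge drinkers, sample size
x2, n2 = 1684, 9916   # women: binge drinkers, sample size

p1_hat = x1 / n1
p2_hat = x2 / n2

# Pooled estimate: combine both samples as if they came from one population.
p_pooled = (x1 + x2) / (n1 + n2)

# Significance test of H0: p1 = p2 uses the pooled standard error.
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se_pooled

# Confidence interval for p1 - p2 uses the two sample proportions separately.
se_unpooled = math.sqrt(
    p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
)

print(f"pooled estimate     = {p_pooled:.4f}")
print(f"z statistic         = {z:.2f}")
print(f"unpooled std. error = {se_unpooled:.5f}")
```

    The pooled estimate is just the overall proportion of binge drinkers (the “Binge drinker” row total divided by the grand total), which is why it does not favor either sample.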

  3. The authors say in several places that the sample sizes n1 and n2 should be large. At the top of p. 606, they remind us why this is necessary: because we are using the normal approximation to distributions that truly are binomial (at least, binomial for the counts X1 and X2). For 1-sample procedures, what was the criterion that had to be met in order to use such a normal approximation? How has this criterion been modified for 2-sample procedures?

    For 1-sample procedures we required that the expected number of successes and failures be at least 10. In equation form, these requirements were written as


    np ≥ 10        and        n(1-p) ≥ 10.
    For 2-sample procedures, we have the modified requirements that the expected number of successes and failures in each sample be at least 5. In equation form, these translate as


    n1 p1 ≥ 5        and        n1(1-p1) ≥ 5
    for population 1, and similarly


    n2 p2 ≥ 5        and        n2(1-p2) ≥ 5
    for population 2. Of course, in practical implementation you are not likely to know the (true) population parameters p1 and p2 (just like we didn't know p in 1-sample procedures), and it is worth giving a little thought to how you would satisfy yourself that the above requirements are met.
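
    One plausible approach, offered here as an assumption rather than as the text's stated answer, is to substitute the sample proportions for the unknown p1 and p2. Since n1 times p̂1 is just the observed count X1, this amounts to checking that each sample contains at least 5 successes and at least 5 failures. A minimal sketch, with the hypothetical helper name normal_approximation_ok:

```python
def normal_approximation_ok(successes, n, minimum=5):
    """Check the sample-based version of n*p >= minimum and n*(1-p) >= minimum.

    With p replaced by the sample proportion successes/n, the two products
    are simply the observed counts of successes and failures.
    """
    failures = n - successes
    return successes >= minimum and failures >= minimum


# Binge-drinking counts from question 1: (binge drinkers, sample size).
samples = {"men": (1630, 7180), "women": (1684, 9916)}
checks = {group: normal_approximation_ok(x, n) for group, (x, n) in samples.items()}
print(checks)
```

    With samples this large the check passes easily; it matters for small studies, for example a sample of 40 with only 3 successes.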



