Math 143 C/E, Spring 2001
IPS Reading Questions
Chapter 9, Section 1 (up to ``Beyond the basics", p. 632)
In Section 8.2, we were considering two categorical
variables: one of these variables was in the forefront
and took on two values (which we generically classified
as ``Successes" and ``Failures"); the other categorical
variable was ``Population" which, because we were
considering two populations (women and men, children
trained for 6 months on the piano and children who
weren't, etc.), took on just two values as well. Thus,
a 2-way table of information for problems of Section 8.2
would have been 2-by-2 (2 rows, 2 columns excluding the
ones for ``Totals"). See Reading Discussion Question
number 1 from Section 8.2 to see an example of a
two-way table for a problem from that section.
In fact, the chi-square test is an alternate way of testing the null hypothesis that the two populations are the same, which was precisely the null hypothesis used in the significance tests of Section 8.2. The chi-square test is not quite as flexible as the 2-sample z (2-proportion) test we learned, since its alternative hypothesis is always 2-sided. Nevertheless, for a 2-by-2 table the chi-square statistic is the square of the z statistic, so the two tests yield the same P-value whenever the alternative is 2-sided.
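The agreement between the two tests in the 2-by-2 case can be checked numerically. The sketch below is illustrative only: the counts (30 successes out of 100, and 45 out of 120) are made up, the function names are hypothetical, and it assumes the pooled form of the 2-sample z statistic and the usual chi-square statistic (the sum of (observed - expected)^2 / expected over the four cells).

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled 2-sample z statistic for H0: p1 = p2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic for the 2-by-2 table of successes and failures."""
    observed = [[x1, x2], [n1 - x1, n2 - x2]]
    row_totals = [sum(row) for row in observed]
    col_totals = [n1, n2]
    n = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical counts, chosen only for illustration.
z = two_proportion_z(30, 100, 45, 120)
chi2 = chi_square_2x2(30, 100, 45, 120)

# 2-sided P-value from the standard normal distribution.
p_z = math.erfc(abs(z) / math.sqrt(2))
# Upper-tail P-value from the chi-square distribution with 1 degree of freedom.
p_chi2 = math.erfc(math.sqrt(chi2 / 2))

print(z ** 2, chi2)   # the two statistics agree up to rounding error
print(p_z, p_chi2)    # and so do the two P-values
```

With a 1-sided alternative the z test would halve (or complement) its P-value, which the chi-square test cannot do; that is the loss of flexibility mentioned above.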
Now, in Section 9.1, we're opening up the possibility that our two categorical variables may take on more than two values. This added flexibility will come at the cost of no longer being able to construct any sort of confidence interval. The chi-square test, however, provides us with the tools to perform a test of significance. If we follow the convention of placing the values of our explanatory (`population'-like) variable at the heads of columns, then the columns will give conditional distributions that correspond to samples from the populations being considered. Our null hypothesis will always be that there is no real difference in the distributions of these populations. Differences may, indeed, appear in our (column) conditional distributions, but that isn't too surprising since these data come from a random sample (or random samples). The main question of the test of significance for 2-way tables is whether the lack of same-ness between our sampled distributions is strong enough evidence to make us doubt the same-ness of the populations they represent.
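To make the idea of column conditional distributions concrete, here is a minimal sketch. The 3-by-2 table of counts is hypothetical (it does not come from the text), and the function name is an assumption made for illustration. Each column is divided by its own total, so each column of the result describes one sampled population.

```python
def column_conditional_distributions(counts):
    """Divide every cell by its column total.

    counts is a list of rows; columns correspond to the populations
    (values of the explanatory variable).
    """
    col_totals = [sum(col) for col in zip(*counts)]
    return [[cell / total for cell, total in zip(row, col_totals)]
            for row in counts]

# Hypothetical counts: 3 response categories (rows), 2 populations (columns).
counts = [[20, 30],
          [50, 45],
          [30, 25]]

dists = column_conditional_distributions(counts)
for row in dists:
    print([round(p, 3) for p in row])
```

The two columns of proportions differ somewhat; the test of significance asks whether differences of that size are plausible from sampling variation alone.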
The chi-square approximation will be pretty accurate for 2-by-2 tables when the expected count of each cell is at least 5. For larger tables, we should be safe if no cell has an expected count less than 1 and the average of the expected counts (over all cells) is at least 5.
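The rule of thumb above can be written as a small check. This is only a sketch: the function name and the example tables of expected counts are hypothetical, chosen to show one table that passes and two that fail.

```python
def chi_square_conditions_ok(expected):
    """Apply the expected-count guidelines to a table of expected counts.

    expected is a list of rows of expected cell counts (totals excluded).
    """
    cells = [count for row in expected for count in row]
    is_2x2 = len(expected) == 2 and all(len(row) == 2 for row in expected)
    if is_2x2:
        # 2-by-2 tables: every expected count should be at least 5.
        return min(cells) >= 5
    # Larger tables: no expected count below 1, and an average of at least 5.
    return min(cells) >= 1 and sum(cells) / len(cells) >= 5

# Hypothetical tables of expected counts.
small_2x2 = [[12.0, 8.0], [3.0, 2.0]]              # two cells fall below 5
larger_ok = [[10.0, 8.0, 6.0], [4.0, 3.2, 2.4]]    # minimum 2.4, average 5.6
larger_bad = [[6.0, 9.0, 15.0], [0.8, 1.2, 2.0]]   # one cell falls below 1

print(chi_square_conditions_ok(small_2x2))
print(chi_square_conditions_ok(larger_ok))
print(chi_square_conditions_ok(larger_bad))
```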
                        Gender
Smoking Status    Female     Male    Total
Non-smoker                             261
Smoker                                  37
Total                175      123
First, the answers:
Distributions being equal doesn't mean that counts
will be, since more women (175) were included in the
sample than men (123). That's the first lesson.
It does mean, however, that proportions will be equal. Overall,
the proportion of non-smokers to the total is 261/298,
or approximately 87.58%. If the two population
distributions are the same, we would expect this
percentage of females and this percentage of males to
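be non-smokers, which means that the expected count of female
non-smokers would be 87.58% of the 175 questioned, or
(261/298)(175), which is approximately 153.27. The remaining
expected counts are computed the same way: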
                        Gender
Smoking Status    Female     Male    Total
Non-smoker        153.27   107.73      261
Smoker             21.73    15.27       37
Total                175      123      298
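The expected counts in this table can be reproduced from the row and column totals alone. The sketch below assumes the usual rule that each expected count is (row total)(column total)/(table total), which is the same calculation as taking 261/298 of the 175 females; the function name is hypothetical.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts when the column distributions are identical."""
    n = sum(row_totals)  # table total; equals sum(col_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Totals from the smoking table: rows are Non-smoker and Smoker,
# columns are Female and Male.
expected = expected_counts([261, 37], [175, 123])

for label, row in zip(["Non-smoker", "Smoker"], expected):
    print(label, [round(count, 2) for count in row])
```

Note that the expected counts need not be whole numbers, and that each row and column of expected counts adds up to the same total as the observed table.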