Math 143 C/E, Spring 2001
IPS Reading Questions
Chapter 6, Section 3



  1. What concerns go into deciding on a pre-arranged significance level α that serves as the dividing line between rejecting and not rejecting the null hypothesis?

    The authors make several points:

    There is no “magic number” α for which you should doubt the null hypothesis if the P value is below that level and not doubt it if the P value is above it. Consider it to be more a “continuum of doubt”, where you doubt the truth of the null hypothesis more for lower P values. Still, people have decisions to make, and generally invent some boundary α for the P value that dictates which choice is made.
  2. What do the authors mean by the phrase “statistical significance is not the same as practical significance”?

    A statistically significant result, simply put, is one that makes you doubt the location of the parameter as it was proposed in the null hypothesis. While a statistically significant result will lead you to reject the null hypothesis in favor of the alternative, it is possible that the true parameter is still so close to the proposed one that the difference is not likely to be of concern to anyone. For example, you may flip a coin 17000 times and get a proportion of “tails” equal to 0.51. If you take as your null hypothesis H0: p = 0.5, you can show (and I encourage you to try this) that this null hypothesis can be rejected at the 1% level even if you take as your alternative hypothesis the two-sided version Ha: p ≠ 0.5. Nevertheless, the same result (rejection at the 1% level) would occur if you took null and alternative hypotheses H0: p = 0.52 and Ha: p ≠ 0.52, leading us to believe that the true proportion of tails that results from flips of this coin is somewhere between 0.5 and 0.52 (but not equal to either of these). While this conclusion translates into a belief that the coin is unfair, it is not so unfair that many of us would be dismayed.
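    Since the answer encourages you to try the computation, here is one way to sketch it in Python (not from the text), using the usual one-proportion z statistic and the normal approximation:

```python
from math import sqrt, erfc

# Numbers from the coin-flip example above: 17000 flips, observed
# proportion 0.51, null hypothesis H0: p = 0.5.
n, p_hat, p0 = 17000, 0.51, 0.5

se = sqrt(p0 * (1 - p0) / n)   # standard error of p-hat under H0
z = (p_hat - p0) / se          # z is about 2.61
p_value = erfc(z / sqrt(2))    # two-sided normal tail, 2 * P(Z >= |z|)
print(z, p_value)              # P value is about 0.009, below the 1% level
```

    Replacing p0 with 0.52 gives a z of about −2.61 and essentially the same P value, confirming that H0: p = 0.52 would also be rejected at the 1% level against the two-sided alternative.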

  3. The paragraph that begins with “Tests of significance ...” on p. 479 is very revealing, both as to the context of the inference procedures we have learned, and as to how much of the iceberg that is statistical inference still lies below the surface (never to be studied in this course). Try to summarize and/or expound on this paragraph.

    The inference procedures we have learned rely heavily on our ability to evaluate probabilities for various probabilistic models (we can do this for three types of models, namely binomial, normal, and Student's t), as well as on these models being valid for the settings in which we are working (that is, you wouldn't want to use a binomial model if a t distribution was called for, etc.). While the central limit theorem tells us a normal distribution is often appropriate (and there are similar justifications for the use of the binomial and t distributions), all of these rely upon the sample from which we collect our data being a true SRS. If we collect data from a sample that is not an SRS, suddenly we must begin to ask: are any of these models appropriate? If not, just what is the correct model, and how do we assess probabilities from it? As this is an introductory-level course, we will not get into these more complicated questions, but you should know that things do, indeed, become much more intricate very quickly.

  4. We have discussed how to interpret a confidence interval - that a level C confidence interval has probability C of containing the population parameter in question. How should one interpret the P value that one computes in a significance test?

    The P value tells you the probability of getting a random sample of size n whose sample statistic is as extreme as the one computed from your sample (as extreme in one direction if the alternative hypothesis is one-sided; as extreme in either direction if it is two-sided), assuming the center of the sampling distribution is where the null hypothesis proposes it to be.
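    This interpretation can be illustrated by simulation (a sketch, with made-up numbers not taken from the text): generate many samples under the null hypothesis and count how often the sample statistic comes out at least as extreme as the observed one.

```python
import random

random.seed(1)  # reproducible illustration

# Hypothetical numbers: 500 coin flips, null hypothesis p0 = 0.5,
# and an observed sample proportion of 0.53.
n, p0, observed = 500, 0.5, 0.53
trials = 5000

extreme = 0
for _ in range(trials):
    heads = sum(random.random() < p0 for _ in range(n))
    # "as extreme in either direction" -- the two-sided case
    if abs(heads / n - p0) >= abs(observed - p0):
        extreme += 1

print(extreme / trials)  # approximates the two-sided P value
```

    The printed fraction should land near the normal-theory two-sided P value for these numbers (roughly 0.19), illustrating that the P value is just the long-run frequency of samples this extreme when the null hypothesis is true.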

  5. Given this interpretation of a test of significance, why would it not be surprising that if you took a sample and, on each unit, measured 107 variables, possibly one or more of them would have an average that was significant at the 1% level when compared to the corresponding general population mean? Should this make you doubt the value of the population mean in this case?

    Each sample probably has its quirks - its own way in which it does not represent the population at large. If you investigate just one variable and find it significant at the 1% level, there is still a small chance that the population mean is where you thought it was and that, with regard to this specific variable, the random sample you took was not representative - in fact, this can happen as often as 1% of the time when you use a significance level of 1%. Nevertheless, while you'd have to be pretty lucky (or unlucky) to have randomly chosen such a rare sample in one variable, no such luck is required if you investigate 107 different (independent) variables from that sample. In that case, you would almost expect at least one of them to be significant at the 1% level (just as you might expect that if you rolled a 100-sided die 107 times, you would probably get a `1' at least once).
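    The die analogy can be made exact with a short calculation (a sketch, not from the text): if the 107 tests are independent and every null hypothesis is true, the chance that at least one comes out significant at the 1% level is one minus the chance that none do.

```python
# With significance level alpha = 0.01 and 107 independent tests,
# all null hypotheses true:
alpha, k = 0.01, 107
p_none = (1 - alpha) ** k      # chance that no test is significant
p_at_least_one = 1 - p_none    # chance that at least one is
print(round(p_at_least_one, 3))  # about 0.659
```

    So roughly two times out of three, purely by chance, at least one of the 107 variables will look "significant" even though nothing unusual is going on.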

    If, in one of the 107 variables, you get significance, this is not cause for drawing any conclusion about the null hypothesis for the mean of that variable. Nevertheless, it might flag an area for further investigation; that is, you might want to draw a new sample and investigate just this particular variable.