Math 143 C/E, Spring 2001

IPS Reading Questions
Chapter 2, Section 4

Figure 2.18(b) is a residual plot for the fitted line in Figure 2.18(a). Is there any way to "visualize" (b) looking only at (a)?
Yes. The simplest answer would be to rotate the fitted line in (a) until it is horizontal, then slide it down until it lies on top of the horizontal axis; what you see is then (b), although not as magnified as plot (b) appears in your text. Unfortunately, this simplest answer is wrong, though it is close to being correct. The problem is that the residuals in (a) are lengths of vertical line segments, and after a rotation these segments would no longer be vertical, so they would no longer show the vertical distances that a residual plot must show. Here is how to make the idea correct. Think of each observed dot on the scatterplot as a metal ball connected to the regression line by a vertical string. Balls above the line are held taut on their strings by a magnet pulling them upward, while balls below the line are held taut by a magnet pulling them downward. As you rotate the line in (a) and make it the horizontal axis in (b), the balls are still held taut by their respective magnets. In other words, the strings that connect the balls to the line stay vertical throughout the rotation.
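To make the "vertical strings" picture concrete, here is a minimal Python sketch of how the points of a residual plot are computed. The data values and the helper names fit_line and residuals are made up for illustration; they are not taken from the text or from Figure 2.18. Each residual is the signed vertical distance from a point to the least-squares line, and that distance is what gets plotted against x.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys):
    """Signed vertical distance (observed y minus fitted y) for each point."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

res = residuals(xs, ys)

# The residual plot is the set of points (x, residual): the x-coordinates
# are unchanged, and the fitted line becomes the horizontal axis.
for x, r in zip(xs, res):
    print(x, round(r, 3))
```

Notice that the x-coordinates are untouched and only the heights change, which is why a rigid rotation of plot (a) does not quite give plot (b).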

Suppose that a certain scatterplot reveals what appears to be a linear relationship between the explanatory and response variables. Explain the difference between outliers and influential observations. How might you determine which observations are influential and which ones are not?
An influential observation is usually an outlier. Often an outlier is detectable from the scatterplot and regression line because you can see a large residual associated with that particular observation. Unfortunately, some of the most influential outliers may not stand out in this way. For example, if you look at Figure 2.23 (p. 161) in your text, "Child 19" clearly stands out as an outlier because of its large residual, but "Child 18" does not stand out in this way. It is clear that "Child 18" is a little removed from the other data points, and it is said to be influential because its presence plays a large role in determining the regression line: removing it from the data set yields a very different line (see Figure 2.25, p. 162). This suggests a way of finding influential points: one by one, remove a point from the data set (keeping all of the other n-1 points) and perform the regression again. If the regression line for all n points is quite different from the line fitted with a particular point removed, that point is influential.
Note: That certain data points can be so influential in determining the regression line is what we mean when we say that the regression line is not resistant.
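The remove-one-point procedure described above can be sketched in Python as follows. The data are made up (they are not the data behind Figures 2.23 and 2.25), the names fit_line and leave_one_out are hypothetical, and measuring "quite different" by the change in slope is one simple choice among several.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def leave_one_out(xs, ys):
    """Refit the line n times, each time with one point removed.

    Returns a list of (index_removed, intercept, slope).
    """
    fits = []
    for i in range(len(xs)):
        xs_rest = xs[:i] + xs[i + 1:]
        ys_rest = ys[:i] + ys[i + 1:]
        a, b = fit_line(xs_rest, ys_rest)
        fits.append((i, a, b))
    return fits


# Hypothetical data: the last point lies far from the others in x
# and does not follow the trend of the first five points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]

full_a, full_b = fit_line(xs, ys)

# Assumption: use the absolute change in slope as the measure of influence.
slope_changes = [(i, abs(b - full_b)) for i, a, b in leave_one_out(xs, ys)]
most_influential = max(slope_changes, key=lambda pair: pair[1])[0]

for i, change in slope_changes:
    print("point", i, "removed: slope changes by", round(change, 3))
print("most influential point (by slope change):", most_influential)
```

In practice you would also compare the fitted lines on the scatterplot, as the text does in Figure 2.25, rather than relying on a single number.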

In the blue box on p. 166, the authors emphasize that "association does not imply causation." In what situations would you be fairly safe in saying that association does imply causation?
You are fairly safe in saying this when the data come from a controlled randomized experiment: one that selects an SRS of experimental units (though even without an SRS you can still generally draw conclusions about causation, at least for members of the population that resemble these units), assigns the units randomly to treatment groups (to even out any variables not being considered), includes a fairly large number n of units, and makes every effort to avoid the kinds of bias to which experiments are prone.