Math 143 C/E, Spring 2001
IPS Reading Questions
Chapter 2, Section 4
Yes. The simplest answer would be that you can rotate the fitted line in (a) until it is horizontal, then slide it down until it lies on top of the horizontal axis, and you now see (b) (albeit not as magnified as plot (b) is in your text). Unfortunately, the simplest answer is wrong, though it is close to being correct. The problem is that the residuals in (a) are lengths of vertical line segments, and after a rigid rotation those segments would no longer be vertical, whereas a residual plot must show vertical distances. Here is how to make the idea correct. Think of each observed dot on the scatterplot as a metal ball connected to the regression line by a vertical string. Balls above the line are held in place by a magnet pulling them upward, while balls below the line are held taut on their strings by a magnet pulling them downward. As you rotate the line in (a) and make it the horizontal axis in (b), the balls are still held taut by their respective magnets. In other words, the strings that connect the balls to the line stay vertical throughout the rotation.
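To make the distinction concrete, here is a minimal sketch in Python. The data values and the function names (`fit_line`, `residuals`) are invented for illustration and are not from the text. It computes each residual as the vertical distance y minus the fitted value, and compares it with the perpendicular distance to the line, which is what a rigid rotation would show instead.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys):
    """Vertical distances from each point to the fitted line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
res = residuals(xs, ys)

# The residual plot keeps each point's x and plots its vertical residual.
residual_plot_points = list(zip(xs, res))

# A rigid rotation would instead preserve perpendicular distances to the
# line, which are shorter than the vertical ones whenever the slope is nonzero.
perp = [abs(r) / math.sqrt(1 + b ** 2) for r in res]

for (x, r), p in zip(residual_plot_points, perp):
    print(f"x={x:.1f}  residual={r:+.3f}  perpendicular distance={p:.3f}")
```

The x-coordinates are the same in both plots. Only the vertical positions change, which is the "strings stay vertical" picture.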
An influential observation is usually an outlier.
Often an outlier is detectable from the scatterplot and
regression line because you can see a large residual
associated with that particular observation. Unfortunately,
some of the most influential outliers may not stand out
in this way. For example, if you look at Figure 2.23
(p. 161) in your text, "Child 19" clearly stands out
as an outlier because of its large residual, but "Child 18"
does not stand out in this way. It is clear that "Child 18"
is a little removed from the other data points, and it is
said to be influential because its presence plays a large
role in determining the regression line: its removal
from the data set yields a very different line (see
Figure 2.25, p. 162). This suggests a way of finding
influential points: one by one, remove each point from the
data set (keeping all of the other (n-1) points)
and perform the regression. If the regression line for
all n points is quite different from the line fitted with
a particular point removed, that point is influential.
Note: That certain data points can be so influential in determining the regression line is what we mean when we say that the regression line is not resistant.
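The remove-one-point-at-a-time procedure described above can be sketched in Python as follows. This is only an illustration: the data are made up (they are not the data behind Figures 2.23 and 2.25), the helper names are hypothetical, and measuring "quite different" by the change in slope is an assumption made here for simplicity.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def leave_one_out_fits(xs, ys):
    """Fit the line n times, each time with one point removed."""
    fits = []
    for i in range(len(xs)):
        xs_i = xs[:i] + xs[i + 1:]
        ys_i = ys[:i] + ys[i + 1:]
        fits.append(fit_line(xs_i, ys_i))
    return fits

# Made-up data: the last point sits far from the others in the x direction.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]

full_a, full_b = fit_line(xs, ys)
fits = leave_one_out_fits(xs, ys)

# How much does the slope move when each point is dropped?
slope_changes = [abs(b_i - full_b) for (_, b_i) in fits]
most_influential = max(range(len(xs)), key=lambda i: slope_changes[i])

# Compare with the point that has the largest residual in the full fit.
abs_residuals = [abs(y - (full_a + full_b * x)) for x, y in zip(xs, ys)]
largest_residual = max(range(len(xs)), key=lambda i: abs_residuals[i])

print("most influential point (index):", most_influential)
print("largest residual (index):", largest_residual)
```

In this made-up data set the point whose removal changes the slope most is not the point with the largest residual, which mirrors the "Child 18" versus "Child 19" contrast.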
You are fairly safe in saying this when the data come from a controlled randomized experiment: one that selects an SRS of experimental units (though even without an SRS you can still generally draw conclusions about causation, at least for members of the population that are like these units), assigns the units randomly to treatment groups (to even out any variables not being considered), includes a fairly large number n of units, and makes every effort to avoid the biases to which experiments are prone.