Math 143 C/E, Spring 2001

IPS Reading Questions
Chapter 2, Section 1 (pp. 102-117)

At the bottom of p. 104, the authors remind us of three preliminary questions one should ask when looking at a data set. How would you answer these questions for the data in Example 2.3 (p. 106)?
The individuals are the 50 states and the District of Columbia in 1996 (some of this information is shared with us in Problem 1.17). The variables are the mean score on the SAT-verbal exam and the percentage of students taking the exam that year (both quantitative). We might obtain the average score from information published by ETS (the organization that administers the SAT). The percentage might be ascertained by combining data from ETS (the number of students in each state taking the test) and census data (particularly, the number of high school juniors that year in each state).

A scatterplot requires two quantitative variables for each individual. If one has, in addition, a third variable for each individual that is categorical, how might this be displayed on a scatterplot?
Use a different plotting symbol for each category. See Figure 2.2, which plots the data from Figure 2.1 for just the 21 northeastern and midwestern states and uses different symbols to indicate the third variable, geographical region, of each state.
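The idea of one plotting symbol per category can be sketched in a few lines of Python. This is only an illustration, not the method used to draw Figure 2.2: the function name text_scatter, the symbol choices, and the six data points are all made up for this sketch and are not the values from the text.

```python
def text_scatter(points, symbols, width=40, height=12):
    """Draw (x, y, category) points on a character grid, one symbol per category."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_max = min(xs), max(ys)
    # Guard against a zero range so the scaling below never divides by zero.
    x_span = (max(xs) - x_min) or 1
    y_span = (y_max - min(ys)) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y, category in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))  # row 0 is the top
        grid[row][col] = symbols[category]
    return "\n".join("".join(row).rstrip() for row in grid)


# Hypothetical data: (percent taking the exam, mean score, region).
points = [
    (5, 590, "Midwest"), (9, 580, "Midwest"), (20, 560, "Midwest"),
    (60, 500, "Northeast"), (70, 505, "Northeast"), (80, 498, "Northeast"),
]
symbols = {"Midwest": "o", "Northeast": "+"}  # one symbol per category

plot = text_scatter(points, symbols)
print(plot)
```

In the printed grid the two regions are distinguishable at a glance because each point carries its category in its symbol, which is all that the third variable requires.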

Which of the relationships in Examples 2.3-2.7 appear to be linear? Can any of them be categorized as positive or negative associations?
A close look at the data in Example 2.3 shows that the relationship is fairly linear, at least for the midwestern states. The association here is a negative one. The data in Example 2.4 appears to be linear as well, with a positive association. The data in Example 2.5 suggests a possible association, but the relationship is not linear and seems neither positive nor negative. In Example 2.6 the pattern is definitely nonlinear and, again, it would be difficult to characterize the association as positive or negative. The association in Example 2.7 would be practically impossible to detect with the naked eye and, though one might argue that the computer's scatterplot smoother has produced something a little reminiscent of a line, it is too much of a stretch to call this a linear relationship. There is, however, a somewhat negative association.
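One informal way to check a judgment like these is to count how often above-average values of one variable go with above-average values of the other, which is the idea behind a positive association. The sketch below does that count. The function name association_tally and both small data sets are hypothetical, chosen only to illustrate the idea, and a tally like this is no substitute for looking at the scatterplot.

```python
def association_tally(xs, ys):
    """Count points on the same side of both means versus on opposite sides."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        product = (x - mean_x) * (y - mean_y)
        if product > 0:      # both above average or both below average
            same += 1
        elif product < 0:    # one above average, the other below
            opposite += 1
    return same, opposite


# Hypothetical data with a downward trend (a negative association).
falling = association_tally([5, 9, 20, 60, 70, 80],
                            [590, 580, 560, 500, 505, 498])

# Hypothetical data that rises and then falls (a curved pattern).
curved = association_tally([1, 2, 3, 4, 5], [1, 4, 5, 4, 1])

print(falling, curved)
```

When nearly all points fall in the "opposite" count the association is negative; when nearly all fall in the "same" count it is positive. An even split, as with the curved data, is a sign that neither label fits well.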

Under what conditions can we talk about a positive/negative association between a categorical variable and a quantitative one?
Such a label makes sense if the categorical variable has a natural ordering to it. See an example of this in Example 2.8 (pp. 115-116).