A case study: Self-concept Data for 7th graders
This data set includes information on 78 7th graders from a rural Midwestern school. For each student we have their GPA (on a 10 pt scale), their IQ (based on an IQ test), their sex, their age, and their scores on the Piers-Harris Children's self-concept test. The self-concept test has an overall score as well as scores in each of the following categories: behavior, school status, physical appearance, anxiety, popularity, and happiness.
Here are some questions we could ask (and procedures to get statistical answers):
1. What is the typical IQ (GPA, concept test score)? [1-sample t]
For example, do these students have IQs significantly different from 100 (the standardized mean for many IQ tests)?
2. Do boys and girls have significantly different GPAs (or IQs or self-concepts)? [2-sample t]
3. Is there a relationship between the age of the student and GPA (or IQ or self-concept)? [ANOVA]
For example, perhaps older students (maybe held back) have lower self-concepts.
Note: Since there are only 3 or 4 ages (because they are given only by years), we can treat age as a categorical variable. What we would really like for this is a better variable than age in years, since the month of the kids' birthdays could matter for this. Another possibility would be a categorical variable that indicated which students had been "held back" (in kindergarten or 1st grade), repeated a grade, started early, or "on-schedule."
If we divide into only two age groups, then we could use 2-sample t procedures.
4. Is there a relationship between IQ and GPA? between GPA and self-concept? between IQ and self-concept? [correlation coefficient, regression]
For example, perhaps students with high GPAs also have higher self-concepts (at least on some components of the self-concept test).
5. Are boys or girls more likely to be older? [Chi-square]
Perhaps more boys than girls are held back for various reasons.
See notes above about using age as a categorical variable. We could do 2-proportion tests if we divided age into only two categories.
Of course, other questions are possible, but these questions will give us a chance to see many of the statistical procedures we have used this semester.
A Note about Multiple Tests from a Single Data Set
To make the description process simple, all the examples will come from the same data set, but be aware that running multiple tests on the same data set brings with it a certain difficulty. In particular, at the 0.05 level, we expect about 1 test in 20 to give us a p-value less than 0.05 just due to random chance, even if the null hypothesis is true every time. So if we do lots and lots of tests from the same data set, almost surely some of them will be "significant". If we do lots of tests on a single set of data (data exploration) we should confirm the significant ones by running them on a separate set of data. Alternatively, we could require a smaller p-value. For this reason, good statistical researchers have their questions in mind before they collect and analyze their data. Besides, if you know your question, you can design your data collection appropriately for that question. (See Q3 above, for example.)
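The arithmetic behind this warning can be sketched in a few lines of Python. The sketch assumes the tests are independent (tests run on a single data set usually are not, so treat the numbers as a rough guide), and it uses the Bonferroni-style cutoff alpha/k as one common way to "require a smaller p-value". The function names are illustrative only.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one p-value below alpha among k tests,
    ASSUMING the tests are independent and every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha, k):
    """Per-test cutoff that keeps the overall error rate at or below alpha."""
    return alpha / k


alpha = 0.05
for k in (1, 5, 20, 100):
    rate = familywise_error_rate(alpha, k)
    cutoff = bonferroni_threshold(alpha, k)
    print(f"{k:3d} tests: P(at least one 'significant') = {rate:.3f}, "
          f"per-test cutoff = {cutoff:.4f}")
```

With 20 tests the chance of at least one spurious "significant" result is already well over one half, which is why exploratory findings deserve confirmation on fresh data.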
Q1: What is a typical IQ? Is it significantly different from 100?
One-Sample T: IQ
Test of mu = 100 vs mu not = 100
Variable N Mean StDev SE Mean
IQ 78 108.92 13.17 1.49
Variable 95.0% CI T P
IQ ( 105.95, 111.89) 5.98 0.000
Conclusion: There is statistically significant evidence (t = 5.98, df=77, P<.001) that the mean IQ of 7th graders in this district is higher than 100. If this particular IQ test has a national mean of 100 (I couldn't find out if this is the case or not), this would indicate that these students have somewhat higher IQs than the population at large.
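As a check on the Minitab output, the t statistic and confidence interval can be recomputed from the summary statistics alone. This is a minimal sketch: the critical value 1.991 is an assumption (an approximate 97.5th percentile of the t distribution with 77 degrees of freedom, as read from a t table), and the variable names are illustrative.

```python
import math

# Summary statistics from the one-sample output above
n, mean, sd = 78, 108.92, 13.17
mu0 = 100        # hypothesized mean
t_star = 1.991   # ASSUMPTION: approximate t critical value for 95% confidence, df = 77

se = sd / math.sqrt(n)              # standard error of the mean
t_stat = (mean - mu0) / se          # one-sample t statistic
ci = (mean - t_star * se, mean + t_star * se)

print(f"SE = {se:.2f}, t = {t_stat:.2f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```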
Q2: Sex and self-concept
Note: The output below does not use a pooled estimate for standard deviation, but we probably could have. The standard deviations of the two samples were similar, and it is not unreasonable to expect that boys and girls have similar amounts of spread.
Two-Sample T-Test and CI: self-concept, sex
Two-sample T for self-concept
sex N Mean StDev SE Mean
female 31 55.5 12.7 2.3
male 47 57.9 12.3 1.8
Difference = mu (female) - mu (male )
Estimate for difference: -2.40
95% CI for difference: (-8.19, 3.39)
T-Test of difference = 0 (vs not =): T-Value = -0.83 P-Value = 0.411 DF = 62
Conclusion:
Although the boys in our sample have slightly higher self-concept scores on average, this difference is not statistically significant (t=-0.83, df=62, p=0.411). Note that the 95% confidence interval for the difference includes both positive and negative values.
Further note: We could use side-by-side stemplots, or histograms, or boxplots to compare the distributions to see if they have similar shapes as well as similar means and standard deviations.
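The unpooled (Welch) two-sample statistic can be reproduced from the group summaries. The sketch below uses the standard Welch-Satterthwaite formula for the degrees of freedom; the function name `welch_t` is illustrative. The formula gives a fractional value, and truncating it to a whole number matches the DF = 62 printed above.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic with Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1              # squared standard error, group 1
    v2 = sd2 ** 2 / n2              # squared standard error, group 2
    se = math.sqrt(v1 + v2)         # standard error of the difference
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# female: n = 31, mean 55.5, sd 12.7; male: n = 47, mean 57.9, sd 12.3
t_stat, df, se = welch_t(55.5, 12.7, 31, 57.9, 12.3, 47)
print(f"t = {t_stat:.2f}, df = {df:.1f} (truncated: {int(df)}), SE = {se:.2f}")
```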
Q3: Age and Self-concept
One-way ANOVA: self-concept versus age
Analysis of Variance for self-concept
Source DF SS MS F P
age 3 1190 397 2.75 0.049
Error 74 10673 144
Total 77 11863
Individual 95% CIs For Mean
Based on Pooled StDev
Level N Mean StDev -+---------+---------+---------+-----
12 27 57.93 14.14 (--*--)
13 45 58.11 10.23 (--*-)
14 5 42.20 14.74 (------*------)
15 1 53.00 0.00 (---------------*---------------)
-+---------+---------+---------+-----
Pooled StDev = 12.01 30 45 60 75
Note that there is only one 15-year-old. Let's combine the 14- and 15-year-olds into one category and run the ANOVA again:
One-way ANOVA: self-concept versus age2
Analysis of Variance for self-concept
Source DF SS MS F P
age2 2 1093 546 3.80 0.027
Error 75 10770 144
Total 77 11863
Individual 95% CIs For Mean
Based on Pooled StDev
Level N Mean StDev --------+---------+---------+--------
12 27 57.93 14.14 (----*-----)
13 45 58.11 10.23 (----*---)
14 or 15 6 44.00 13.90 (-----------*-----------)
--------+---------+---------+--------
Pooled StDev = 11.98 40.0 48.0 56.0

Checking Model Assumptions: The standard deviations obey our 2:1 ratio rule, and normality is a fairly reasonable assumption. We could check histograms or normal quantile plots of residuals to get further evidence of normality.
Conclusion: In both cases, the P-value is less than 0.05. So there is a significant association between age and self-concept. In particular, the older students seem to have lower self-concepts, just as we had anticipated might be the case.
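The ANOVA table for the three-group analysis can be rebuilt from the group sizes, means, and standard deviations, which shows where each sum of squares comes from. This is a sketch from the rounded summary values, so the results agree with Minitab only up to rounding; the function name is illustrative.

```python
import math

# (n, mean, standard deviation) for ages 12, 13, and "14 or 15"
groups = [(27, 57.93, 14.14), (45, 58.11, 10.23), (6, 44.00, 13.90)]


def anova_from_summary(groups):
    """One-way ANOVA quantities computed from per-group summary statistics."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * s ** 2 for n, _, s in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return ss_between, ss_within, f_stat, math.sqrt(ms_within)


ss_between, ss_within, f_stat, pooled_sd = anova_from_summary(groups)
print(f"SS(age2) = {ss_between:.0f}, SS(Error) = {ss_within:.0f}, "
      f"F = {f_stat:.2f}, pooled StDev = {pooled_sd:.2f}")
```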
Multiple Comparisons: We have concluded that there is evidence that the three groups do not all have the same mean. Now we might like to know which means differ. We can compare any pair of means with a 2-sample t-test, but if we test all the pairs (and there might be a lot), we should require a smaller P-value to conclude significance, since we are making so many tests from the same data. There are several methods for doing this. One such is Tukey's pairwise comparisons:
Tukey's pairwise comparisons
Family error rate = 0.0500
Individual error rate = 0.0194
Critical value = 3.38
Intervals for (column level mean) - (row level mean)
                12        13
13           -7.16
              6.79
14 or 15      1.00      1.66
             26.85     26.56
This produces simultaneous confidence intervals for the difference between each pair. With alpha chosen at 0.05 (family error rate), this says that 95% of the time we run this procedure we should expect ALL of the confidence intervals to contain the true differences; the other 5% of the time one or more will be incorrect. For example, we see here that the CI for the difference between 12-year-olds and 14- or 15-year-olds is (1.00, 26.85). Since this does not include 0, we conclude that the difference is significant (the older kids have lower self-concept scores on average). We draw the same conclusion comparing 13-year-olds with the older kids, but there is not a significant difference between the 12-year-olds and the 13-year-olds.
Note that these conclusions make sense given the confidence interval graphic included with the original ANOVA output. Those intervals are simply confidence intervals for each group (using an estimate for standard deviation based on all the data, since the ANOVA assumption is that all three groups have the same standard deviation). They give a good (but less formal) indication of which differences are significant and which are not.
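The Tukey intervals can be approximately reproduced with the Tukey-Kramer formula, difference +- (q / sqrt(2)) * s_p * sqrt(1/n_i + 1/n_j), taking q to be the "Critical value = 3.38" from the output and s_p the pooled standard deviation. This is a sketch from rounded inputs, so the endpoints may differ from Minitab's in the second decimal place; the names used are illustrative.

```python
import math

q_crit = 3.38       # studentized range critical value from the Minitab output
pooled_sd = 11.98   # pooled StDev from the three-group ANOVA
groups = {"12": (27, 57.93), "13": (45, 58.11), "14 or 15": (6, 44.00)}  # (n, mean)


def tukey_interval(col, row):
    """Simultaneous CI for (column level mean) - (row level mean)."""
    n_c, mean_c = groups[col]
    n_r, mean_r = groups[row]
    margin = q_crit / math.sqrt(2) * pooled_sd * math.sqrt(1 / n_c + 1 / n_r)
    diff = mean_c - mean_r
    return diff - margin, diff + margin


for col, row in [("12", "13"), ("12", "14 or 15"), ("13", "14 or 15")]:
    low, high = tukey_interval(col, row)
    verdict = "significant" if low > 0 or high < 0 else "not significant"
    print(f"{col} - {row}: ({low:.2f}, {high:.2f}) {verdict}")
```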
Q4: GPA and self-concept
First, let's take a look at a scatter plot. This one has the regression line, confidence interval and prediction interval labeled.

Checking model assumptions: We should have a look at the residuals to see if they appear to be normally distributed.

There are more low residuals (less than -2) than high residuals (above 2), but the pattern is more or less linear.

In this residual vs. fit plot we don't want to see any clear patterns that would indicate that the distribution of residuals varies as the data values vary. This looks good. We could also check residuals vs order of measurement, but that should not be a problem in this case, since the order of the rows in the data set probably doesn't reflect anything that is likely to affect the residuals. (It may well be alphabetical, for example.)
Things look pretty good, so let's run the regression!
Regression Analysis: self-concept versus gpa
The regression equation is
self-concept = 33.1 + 3.20 gpa
Predictor Coef SE Coef T P
Constant 33.109 4.408 7.51 0.000
gpa 3.2032 0.5700 5.62 0.000
S = 10.50 R-Sq = 29.4% R-Sq(adj) = 28.4%
Analysis of Variance
Source DF SS MS F P
Regression 1 3482.7 3482.7 31.59 0.000
Residual Error 76 8380.1 110.3
Total 77 11862.9
The most interesting things to note here: There is a statistically significant positive association between gpa and self-concept scores (T=5.62, df=76, P<0.001, alternatively F(1,76)=31.59, P<0.001). The slope of the regression line is about 3.2, which means that each 1 point increase in GPA is associated with about 3.2 additional points on the self-concept test (on average). The 95% CI (using critical value for t with df=76) for this slope is 3.2 +- (2.0)(0.570) = 3.2 +- 1.14 = (2.06, 4.34). About 30% (R-squared = 0.294) of the variability in self-concept scores is explained by this regression line, so while there is a general tendency for children with higher GPAs to have higher self-concept scores, there is a lot of variability due to other factors, too. The wide prediction interval on the scatter plot indicates this as well. The correlation coefficient is R=0.542 (square root of R-squared).
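The slope's t statistic, its confidence interval, R-squared, and the correlation coefficient can all be recomputed from the printed regression output. In this sketch the critical value 1.992 is an assumption (an approximate 97.5th percentile of the t distribution with 76 degrees of freedom, as read from a t table), and the variable names are illustrative.

```python
import math

# Values taken from the regression output above
slope, se_slope = 3.2032, 0.5700
ss_regression, ss_total = 3482.7, 11862.9
t_star = 1.992   # ASSUMPTION: approximate t critical value for 95% confidence, df = 76

t_stat = slope / se_slope                       # test of slope = 0
ci = (slope - t_star * se_slope, slope + t_star * se_slope)
r_squared = ss_regression / ss_total            # share of variability explained
r = math.copysign(math.sqrt(r_squared), slope)  # correlation takes the slope's sign
f_from_t = t_stat ** 2                          # in simple regression, F equals t squared

print(f"t = {t_stat:.2f}, F = {f_from_t:.1f}, "
      f"95% CI for slope = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"R-squared = {r_squared:.3f}, r = {r:.3f}")
```

Taking the sign of the correlation from the slope matters in general: the square root of R-squared alone would always be positive, even for a decreasing relationship.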
Q5: Age and Sex
Minitab provides two types of output for Chi-square testing (depending on whether the test is run from the data directly or from a summary of row totals). Here are both kinds of output for this situation:
Tabulated Statistics: age2, sex
Rows: age2 Columns: sex
            female     male      All
12              11       16       27
             40.74    59.26   100.00
             35.48    34.04    34.62
             14.10    20.51    34.62
             10.73    16.27    27.00
13              17       28       45
             37.78    62.22   100.00
             54.84    59.57    57.69
             21.79    35.90    57.69
             17.88    27.12    45.00
14 or 15         3        3        6
             50.00    50.00   100.00
              9.68     6.38     7.69
              3.85     3.85     7.69
              2.38     3.62     6.00
All             31       47       78
             39.74    60.26   100.00
            100.00   100.00   100.00
             39.74    60.26   100.00
             31.00    47.00    78.00
Chi-Square = 0.347, DF = 2, P-Value = 0.841
2 cells with expected counts less than 5.0
Cell Contents --
Count
% of Row
% of Col
% of Tbl
Exp Freq
Chi-Square Test: female, male
Expected counts are printed below observed counts
female male Total
12 11 16 27
10.73 16.27
13 17 28 45
17.88 27.12
14 or 15 3 3 6
2.38 3.62
Total 31 47 78
Chi-Sq = 0.007 + 0.004 +
0.044 + 0.029 +
0.159 + 0.105 = 0.347
DF = 2, P-Value = 0.841
2 cells with expected counts less than 5.0
Checking the Chi-square assumptions:
All the expected counts are sufficiently large (at least 1 for each, with an average greater than 5), so use of the Chi-square statistic is permitted. (Minitab flags the two cells with expected counts below 5, but both are above 1.)
Conclusion: The P-value is quite large. This data does not indicate a significant association between age and sex.
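The expected counts and the Chi-square statistic can be recomputed directly from the observed table. The sketch below is a minimal illustration using only the standard library; it relies on the fact that with 2 degrees of freedom the upper-tail probability of the Chi-square distribution has the closed form exp(-x/2), so the p-value shortcut applies to this 3 x 2 table only.

```python
import math

# Observed counts; rows: ages 12, 13, "14 or 15"; columns: female, male
observed = [[11, 16], [17, 28], [3, 3]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence = row total * column total / grand total
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid ONLY because df == 2: the chi-square upper tail is exp(-x / 2)
p_value = math.exp(-chi_sq / 2)

all_expected = [e for row in expected for e in row]
print(f"Chi-Square = {chi_sq:.3f}, DF = {df}, P-Value = {p_value:.3f}")
print(f"smallest expected count = {min(all_expected):.2f}, "
      f"average = {sum(all_expected) / len(all_expected):.2f}")
```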