height (male 11) N(146cm,8cm) = N(57.5in,3.15in)
weight (male 18-24) N(162,29.1)
weight (female 18-24) N(134,27)
1994 SAT mean Verbal = 423, 7% above 600, 42% below 400
approx s.d. 120
now renormalized to N(500,100)
what percent score 800?
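A rough check of the SAT numbers above, sketched with Python's standard-library `statistics.NormalDist`. It assumes scores are exactly normal, which is only an approximation (real scores are capped and reported in steps); the variable names are illustrative.

```python
from statistics import NormalDist

# 1994 Verbal: mean 423, 7% above 600, 42% below 400.
# Solve (x - mean) / sd = z for sd at each quoted percentile.
z = NormalDist()  # standard normal, mean 0 and s.d. 1
sd_from_600 = (600 - 423) / z.inv_cdf(1 - 0.07)
sd_from_400 = (400 - 423) / z.inv_cdf(0.42)
print(f"s.d. implied by 7% above 600:  {sd_from_600:.0f}")
print(f"s.d. implied by 42% below 400: {sd_from_400:.0f}")

# Renormalized scale N(500,100): 800 is 3 s.d. above the mean.
renormalized = NormalDist(mu=500, sigma=100)
pct_800_or_more = 100 * (1 - renormalized.cdf(800))
print(f"percent at or above 800: {pct_800_or_more:.2f}%")
```

Both percentiles give an s.d. in the neighborhood of 120, and under the normal model only a small fraction of one percent reach 800.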
1987 CA women's salaries mean=11,600; s.d.=10,500
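One possible reading of the salary example is that a normal curve is a poor fit when the s.d. is nearly as large as the mean. The sketch below, assuming a normal model with the quoted mean and s.d., shows how much of that curve would sit below a salary of 0.

```python
from statistics import NormalDist

# 1987 CA women's salaries: mean 11,600, s.d. 10,500.
# Assumption for illustration only: treat salaries as N(11600, 10500).
salary_model = NormalDist(mu=11_600, sigma=10_500)

# Share of the normal curve that falls below a salary of 0.
share_below_zero = salary_model.cdf(0)
print(f"normal model puts {100 * share_below_zero:.1f}% of salaries below 0")
```

Since negative salaries are impossible, a share this large suggests the real distribution is skewed rather than normal.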
Tuesday, January 11
Topic: More Pictures of Data
Topic: Relationships between Categorical Variables
Topic: Chi-Squared
Read: Utts 9
Due: HW #3 @ hw02.shtml
Vocab: pie chart, bar chart, pictogram, line graph, scatter plot
make some with survey 1 data
Common problems with plots, graphs, and pictures
1) missing labels
2) scale doesn't start at 0
3) changes in labeling along an axis
4) misleading units
5) poor information
Picture Checklist
overall impression
1) is message clear?
2) is purpose clear?
10) is there any clutter?
source
3) is source given?
4) is source reliable?
labeling
5) is labeling clear?
6) do axes start at 0?
7) is scale constant?
8) are there any breaks along axis? are they easy to spot?
9) was inflation adjustment made?
Banner chart and follow-up letters
Utts, Figure 9.9 (page 149) and fixed version
Read: Utts 12
Act: Golf Balls in the Yard @ data/golfballs.shtml
Vocab: contingency table, cell, row, column, conditional percentage,
rate, test statistic, chi-squared statistic, p-value,
statistical significance, proportion, odds, relative risk,
odds ratio, Simpson's paradox
Physician's Health Study data
           attack   no att.    total   rate/1000
Aspirin       104    10,933   11,037         9.4
Placebo       189    10,845   11,034        17.1
Total         293    21,778   22,071
Question of the day: why is this data so compelling?
significance: how unusual is this? (chi-squared)
magnitude: how big is this?
Golf ball distribution and test statistics
4-sided dice, computer simulation
Chi-squared statistic (on golf ball data again)
what should we expect if there is no association?
how can we adjust our measurement to account for sample size?
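A minimal sketch of the two questions above. Under no association, the expected count in a cell is (row total * column total) / grand total, and dividing each squared difference by its expected count puts the differences on a relative scale. The golf ball counts are not in these notes, so the Physician's Health Study counts stand in; the function names are illustrative.

```python
def expected_counts(table):
    """Expected cell counts if rows and columns are unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Stand-in data: (attack, no attack) counts by treatment group.
observed = [[104, 10933],   # aspirin
            [189, 10845]]   # placebo

expected = expected_counts(observed)
statistic = chi_squared(observed)
for label, row in zip(["aspirin", "placebo"], expected):
    print(label, [round(cell, 1) for cell in row])
print(f"chi-squared statistic: {statistic:.2f}")
```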
Hugo -- 4 times in 12 rolls; how unusual is that?
have students roll 12 dice several times and count number of
6's rolled (work in pairs)
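A sketch of the dice question, assuming "4 times in 12 rolls" means four 6's in 12 rolls of a fair die. It gives the exact binomial probability and a small simulation of the class activity; the function names, trial count, and seed are illustrative choices.

```python
import random
from math import comb


def prob_exactly(k, n=12, p=1 / 6):
    """Binomial probability of exactly k sixes in n rolls."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_exactly_4 = prob_exactly(4)
p_4_or_more = sum(prob_exactly(k) for k in range(4, 13))


def simulate(trials=20_000, n=12, seed=0):
    """Fraction of simulated sets of n rolls with at least four 6's."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sixes = sum(1 for _ in range(n) if rng.randint(1, 6) == 6)
        if sixes >= 4:
            hits += 1
    return hits / trials


print(f"P(exactly 4 sixes)  = {p_exactly_4:.3f}")
print(f"P(4 or more sixes)  = {p_4_or_more:.3f}")
print(f"simulated 4 or more = {simulate():.3f}")
```

The "4 or more" figure is the one that answers "how unusual is that?", in the same way a p-value counts results at least as extreme as the one observed.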
Return to Physician's Health Study
significance: do chi squared
P-value: interpreted the same for all statistical tests!
chi-squared table (degrees of freedom)
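A sketch of the significance step for the aspirin data without a printed table. A 2x2 table has (rows - 1) * (columns - 1) = 1 degree of freedom, and for 1 degree of freedom the p-value can be written with the standard-library `erfc`. The shortcut formula for the statistic applies to 2x2 tables only; the function names are illustrative.

```python
from math import erfc, sqrt


def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(statistic):
    """Upper-tail probability of a chi-squared variable with 1 d.f."""
    return erfc(sqrt(statistic / 2))


# Physician's Health Study: aspirin (attack, no attack), then placebo.
statistic = chi_squared_2x2(104, 10933, 189, 10845)
p_value = p_value_df1(statistic)
print(f"chi-squared = {statistic:.2f}, p-value = {p_value:.1e}")

# Sanity check against the usual table entry: 3.84 is the 0.05 cutoff
# for 1 degree of freedom.
print(f"p-value at 3.84: {p_value_df1(3.84):.3f}")
```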
magnitude: relative risk
percentage having trait = (# with trait / total #) * (100%)
proportion having trait = (# with trait / total #)
i.e. probability written as decimal
risk of having trait = # with trait / total #
odds of having trait = # with / # without to 1
= # with to # without
odds against trait = # without / # with to 1
= # without to # with
relative risk: = one risk / other risk
increased risk: = change / original (* 100%)
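The definitions above applied to the Physician's Health Study counts, as a sketch. Which group counts as "original" is a choice; here the aspirin group is taken as the baseline, so the results describe the placebo group relative to aspirin. The function names are illustrative.

```python
def risk(with_trait, total):
    """Risk (proportion) of having the trait."""
    return with_trait / total


def odds(with_trait, total):
    """Odds of having the trait: # with / # without (to 1)."""
    return with_trait / (total - with_trait)


# Heart attacks and group sizes from the study table.
aspirin_risk = risk(104, 11037)
placebo_risk = risk(189, 11034)

# Assumption: aspirin group is the baseline ("original") risk.
relative_risk = placebo_risk / aspirin_risk
increased_risk_pct = (placebo_risk - aspirin_risk) / aspirin_risk * 100
odds_ratio = odds(189, 11034) / odds(104, 11037)

print(f"risk per 1000: aspirin {1000 * aspirin_risk:.1f}, "
      f"placebo {1000 * placebo_risk:.1f}")
print(f"relative risk (placebo / aspirin): {relative_risk:.2f}")
print(f"increased risk: {increased_risk_pct:.0f}%")
print(f"odds ratio: {odds_ratio:.2f}")
```

Because heart attacks are rare in both groups, the odds ratio and the relative risk come out close to each other.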
Misrepresenting risk
1) no baseline risk given
2) no time period given
3) unclear population (may not apply to you)
Simpson's Paradox
hospital example (Utts chapter 12, pages 213-215)
give combined results first, then separate
combined:
           survive    die   s rate   d rate
standard       505    595      .46      .54
new            195    905      .18      .82
total          700   1500
separate (one hospital):
standard         5     95      .05      .95
new            100    900      .10      .90
total          105    995
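A sketch of the reversal in the hospital example. The first table above is read as the combined results and the second as one of the two hospitals; the other hospital's counts are not in these notes and are derived here by subtraction. The names `hospital_1` and `hospital_2` are placeholders.

```python
# (survive, die) counts by treatment.
combined = {"standard": (505, 595), "new": (195, 905)}
hospital_1 = {"standard": (5, 95), "new": (100, 900)}

# Derived, not from the notes: combined minus hospital_1.
hospital_2 = {
    treatment: (
        combined[treatment][0] - hospital_1[treatment][0],
        combined[treatment][1] - hospital_1[treatment][1],
    )
    for treatment in combined
}


def survival_rate(survive, die):
    """Proportion surviving."""
    return survive / (survive + die)


for name, table in [("combined", combined),
                    ("hospital_1", hospital_1),
                    ("hospital_2", hospital_2)]:
    rates = {t: survival_rate(*counts) for t, counts in table.items()}
    better = max(rates, key=rates.get)
    print(f"{name:10s} standard {rates['standard']:.2f}  "
          f"new {rates['new']:.2f}  -> {better} looks better")
```

The new treatment has the higher survival rate within each hospital, yet the standard treatment looks better once the hospitals are pooled, because the two treatments were used in very different proportions at the two hospitals.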
discrimination example (Utts, chapter 12, pages 215-217) ??
Berkeley admissions (Utts page 221, exercise 14)
video clip -- FAPP #10 2:04:45 -- 2:08:40 [maybe longer]
death penalty
326 cases, white defendant: 19/160 get death pen. (.119)
black defendant: 17/166 get death pen. (.102)
when separated by victim's race, see different story
[overhead from Moore 207]
point: statistically significant means that the effect is not
likely to be due to chance alone, but some factor other
than the obvious one may be the reason for it