height (male 11)        N(146cm, 8cm) = N(57.5in, 3.15in)
weight (male 18-24)     N(162, 29.1)
weight (female 18-24)   N(134, 27)
1994 SAT: mean Verbal = 423, 7% above 600, 42% below 400
  approx s.d. 120
  now renormalized to N(500,100)
  what percent score 800?
1987 CA women's salaries: mean = 11,600; s.d. = 10,500

Tuesday, January 11

Topic: More Pictures of Data
Topic: Relationships between Categorical Variables
Topic: Chi-Squared

Read: Utts 9
Due: HW #3 @ hw02.shtml
Vocab: pie chart, bar chart, pictogram, line graph, scatter plot
  make some with survey 1 data

Common problems with plots, graphs, and pictures
  1) missing labels
  2) scale doesn't start at 0
  3) changes in labeling along an axis
  4) misleading units
  5) poor information

Picture Checklist
  overall impression
    1) is message clear?
    2) is purpose clear?
    10) is there any clutter?
  source
    3) is source given?
    4) is source reliable?
  labeling
    5) is labeling clear?
    6) do axes start at 0?
    7) is scale constant?
    8) are there any breaks along axis? are they easy to spot?
    9) was inflation adjustment made?

Banner chart and follow-up letters
Utts, Figure 9.9 (page 149) and fixed version

Read: Utts 12
Act: Golf Balls in the Yard @ data/golfballs.shtml
Vocab: contingency table, cell, row, column, conditional percentage,
%%     rate, test statistic, chi-squared statistic, p-value,
%%     statistical significance, proportion, odds, relative risk,
%%     odds ratio, Simpson's paradox

%% Physicians' Health Study data
             attack   no att.    total   rate/1000
  Aspirin      104    10,933    11,037      9.4
  Placebo      189    10,845    11,034     17.1
  Total        293    21,778    22,071

Question of the day: why is this data so compelling?
  significance: how unusual is this? (chi-squared)
  magnitude: how big is this?

Golf ball distribution and test statistics
  4-sided dice, computer simulation
Chi-squared statistic (on golf ball data again)
  what should we expect if there is no association?
  how can we adjust our measurement to account for sample size?
Hugo -- 4 times in 12 rolls; how unusual is that?
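
One way to put a number on the "4 times in 12 rolls" question is a binomial calculation, checked against a quick computer simulation. This is only a sketch: it assumes a fair six-sided die and that the event being counted is one chosen face (such as a 6), which the notes do not state outright. The function names (binom_pmf, binom_tail, simulate_tail), the number of simulated trials, and the random seed are all made up for illustration.

```python
from math import comb
import random


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


def simulate_tail(k, n, p, trials=20000, seed=1):
    """Estimate the same tail probability by simulating many sets of n rolls."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each roll counts as a success with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        if successes >= k:
            hits += 1
    return hits / trials


# Assumption: fair six-sided die, counting one chosen face (e.g. a 6).
N_ROLLS, P_FACE, OBSERVED = 12, 1 / 6, 4

exactly_four = binom_pmf(OBSERVED, N_ROLLS, P_FACE)
four_or_more = binom_tail(OBSERVED, N_ROLLS, P_FACE)
simulated = simulate_tail(OBSERVED, N_ROLLS, P_FACE)

print(f"P(exactly 4 in 12)    = {exactly_four:.3f}")
print(f"P(4 or more in 12)    = {four_or_more:.3f}")
print(f"simulated P(4+ in 12) = {simulated:.3f}")
```

Under these assumptions the exact values come out to about 0.089 for exactly four and about 0.125 for four or more, so the result is uncommon but not rare (roughly one set of twelve rolls in eight). The "or more" version is the one that matches how a p-value is read: the chance of a result at least this extreme if nothing unusual is going on.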
Have students roll 12 dice several times and count the number of 6's rolled (work in pairs)

Return to Physicians' Health Study
  significance: do chi-squared
    P-value: interpreted the same for all statistical tests!
    chi-squared table (degrees of freedom)
  magnitude: relative risk
    percentage having trait = (# with trait / total #) * (100%)
    proportion having trait = (# with trait / total #), i.e. probability written as a decimal
    risk of having trait    = # with trait / total #
    odds of having trait    = (# with / # without) to 1 = # with to # without
    odds against trait      = (# without / # with) to 1 = # without to # with
    relative risk           = one risk / other risk
    increased risk          = change in risk / original risk (* 100%)

Misrepresenting risk
  1) no baseline risk given
  2) no time period given
  3) unclear population (may not apply to you)

Simpson's Paradox
  hospital example (Utts chapter 12, pages 213-215)
    give combined results first, then separate

                survive    die   s rate   d rate
      standard    505      595     .46      .54
      new         195      905     .18      .82
      total       700     1500

      standard      5       95     .05      .95
      new         100      900     .10      .90
      total       105      995

  discrimination example (Utts, chapter 12, pages 215-217)
  ?? Berkeley admissions (Utts page 221, exercise 14)
  video clip -- FAPP #10 2:04:45 -- 2:08:40 [maybe longer]
  death penalty
    326 cases
    white defendant: 19/160 get death pen. (.119)
    black defendant: 17/166 get death pen. (.102)
    when separated by victim's race, see different story [overhead from Moore 207]
  point: statistically significant means that the effect is not likely to be due to chance alone, but there may be some other factor than the obvious one that is reason