Monday, January 10

Topic: Statistical Summaries
Topic: Distributions
Read:  Utts 7
Read:  Utts 8
Act:   How Many Raisins?
Due:   HW #2 @ hw01.shtml
Vocab: mean, median, mode, outlier, range, stemplot, histogram,
       shape, symmetric, bell-shaped, unimodal, bimodal, skewed,
       five-number summary, quartile, boxplot, interquartile range,
       variance, standard deviation, frequency curve, normal curve,
       proportion, percentile, standardized score, z-score,
       standard normal distribution, "68-95-99.7 Rule"

Where are we now?
    many problems with statistical studies are not mathematical
        7 critical components
        Pam Plantinga left job because she was told what to find
        you can't do good statistics unless you start with good data
    individual issues
        four issues: validity, reliability, bias, variability
    measurement
        must decide what and how to measure -- not always easy
            ask about any problems with survey 1 wording
        dealing with people is especially hard
            do you own stock cartoon
        proxies (validity/reliability trade-off)
    sampling/assignment issues
        good samples are representative and large enough
            1/root n rule
        experiment vs. observation, the role of treatment
        randomness used to reduce bias
    moving into a phase of "what do we do with all this data?"
        but first ...

Ethics of experiments
    informed consent
        use of doctors in Physicians' Health Study
        kids in art experiment
    human subjects & review boards
        Stanley Milgram (Yale): shock and memory
            done 1960's, probably not doable today
        Penny's data collection in grad school
    risk: cost/benefit analysis
        reasonable hope, reasonable doubt criteria for clinical trials
    (did friday) Some specific examples and issues
        Nazi data
            give them article (2 versions) and then discuss
            from Bouma's class
                            Yes   No
                good use     6     2   (1 of 2 struggled)
                criticism    5     3
        twins studies -- ideal matched pairs?
        PHS used only middle-aged men, what about women? minorities?
            1 in 5 men has heart attack before age 65
            1 in 17 women has heart attack before age 65
        (did friday) AIDS and slow process of clinical trials
            measuring easier but less reliable things
            pressure to release drugs before effectiveness demonstrated
        (mentioned friday, mention again) domestic violence: warn and release or arrest
            can a randomized experiment be done? [no informed consent]

Raisins & entry of data not yet entered from survey 1

Looking at the data: stemplots, histograms
    choosing bin-size
    subdividing stems

Measures of center -- what is a typical value?
    mean, median, mode
    what is unusual? outliers

Measures of spread
    range, standard deviation

Five-number summary & boxplots

Some shape descriptions
    symmetric, skewed, bell-shaped, unimodal, bimodal

Intro to frequency curves and cumulative probability (proportions)
    generalization of 5-number summary
    deciles, percentiles from standard tests

Normal distributions
    symmetric, bell-shaped, determined by mean and standard dev.
    Empirical Rule: 68-95-99.7
    standardized scores (z-scores)
    charts and computers to get other values (chart on page 137)
    examples of approx. normal distributions
        height (male 18-74)    N(5'9",3")     = N(69,3)
        height (female 18-74)  N(5'3.5",2.5") = N(63.5,2.5)
        height (male 18-24)    N(5'10",3")    = N(70,2.8)
        height (female 18-24)  N(5'4.3",2.6") = N(64.3,2.6)
        height (male 11)       N(146cm,8cm)   = N(57.5,3.15)
        weight (male 18-24)    N(162,29.1)
        weight (female 18-24)  N(134,27)
        1994 SAT mean Verbal = 423, 7% above 600, 42% below 400
            approx s.d. 120
            now renormalized to N(500,100)
            what percent score 800?
        1987 CA women's salaries mean=11,600; s.d.=10,500
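
Worked sketch for the measures of center and spread and the five-number summary listed above. This is only an illustration, not the method used in class: the raisin counts are made up (the class data is not reproduced here), the helper name five_number_summary is hypothetical, the quartiles are assumed to be the medians of the lower and upper halves, and the 1.5 x IQR outlier cutoff is an assumed convention. Variance and standard deviation are the sample versions (dividing by n - 1).

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the count is odd. Other quartile
    conventions exist and can give slightly different answers.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


# Hypothetical raisin counts per box, made up for illustration
# (NOT the data collected in class).
raisins = [25, 26, 26, 27, 27, 28, 28, 28, 29, 30, 31, 38]

low, q1, med, q3, high = five_number_summary(raisins)
iqr = q3 - q1  # interquartile range

# Measures of center: what is a typical value?
center = {
    "mean": statistics.mean(raisins),
    "median": med,
    "mode": statistics.mode(raisins),
}

# Measures of spread (sample variance and sd divide by n - 1).
spread = {
    "range": high - low,
    "iqr": iqr,
    "variance": statistics.variance(raisins),
    "sd": statistics.stdev(raisins),
}

# Assumed convention: flag values more than 1.5 * IQR beyond the quartiles.
outliers = [x for x in raisins if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("five-number summary:", (low, q1, med, q3, high))
print("center:", center)
print("spread:", spread)
print("possible outliers:", outliers)
```

With these made-up counts the single large box (38) pulls the mean above the median, which is the usual reason to report the median when outliers are present.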
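
Worked sketch for the normal-distribution items above (z-scores, the 68-95-99.7 rule, and the SAT questions), doing by computer what the chart does by lookup. It assumes Python 3.8 or later for statistics.NormalDist and treats each example as exactly normal, which is only an approximation. The helper name z_score and the 72-inch height are illustrative choices, not from the notes.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standardized score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


standard = NormalDist()  # standard normal: mean 0, sd 1

# Empirical Rule: proportion within 1, 2, and 3 standard deviations.
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

# Male height (18-74) modeled as N(69, 3), in inches.
# Illustrative question: what share is taller than 72 inches (6'0")?
heights = NormalDist(mu=69, sigma=3)
z_tall = z_score(72, 69, 3)
share_taller = 1 - heights.cdf(72)

# 1994 SAT Verbal: mean 423, approximate s.d. 120 (from the notes).
sat_1994 = NormalDist(mu=423, sigma=120)
above_600 = 1 - sat_1994.cdf(600)
below_400 = sat_1994.cdf(400)

# Renormalized scale N(500, 100): share at or above 800 under the model.
# Real scores are capped at 800, so this is the model's upper tail only.
sat_new = NormalDist(mu=500, sigma=100)
at_least_800 = 1 - sat_new.cdf(800)

for k, p in within.items():
    print(f"within {k} sd: {p:.4f}")
print(f"z for 72 in: {z_tall:.2f}, share taller: {share_taller:.3f}")
print(f"1994 SAT above 600: {above_600:.3f}, below 400: {below_400:.3f}")
print(f"N(500,100) at or above 800: {at_least_800:.5f}")
```

Under these assumptions the model gives about 7% above 600 and about 42% below 400, in line with the figures in the notes, which is why an s.d. near 120 is a reasonable estimate. A score of 800 on N(500,100) is z = 3, so only roughly 0.13% of the model lies at or above it.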