Monday, January 10
Topic: Statistical Summaries
Topic: Distributions
Read: Utts 7
Read: Utts 8
Act: How Many Raisins?
Due: HW #2 @ hw01.shtml
Vocab:
mean, median, mode, outlier, range, stemplot, histogram, %%
shape, symmetric, bell-shaped, unimodal, bimodal, skewed, %%
five-number summary, quartile, boxplot, interquartile range, %%
variance, standard deviation, frequency curve, normal curve, %%
proportion, percentile, standardized score, z-score, %%
standard normal distribution, "68-95-99.7 Rule"
Where are we now?
many problems with statistical studies are not mathematical
7 critical components
Pam Plantinga left job because she was told what to find
you can't do good statistics unless you start with good data
individual issues
four issues: validity, reliability, bias, variability
measurement
must decide what and how to measure -- not always easy
ask about any problems with survey 1
wording
dealing with people is especially hard
do you own stock cartoon
proxies (validity/reliability trade-off)
sampling/assignment issues
good samples are representative and large enough
1/root n rule
experiment vs. observation, the role of treatment
randomness used to reduce bias
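A minimal sketch of the 1/root n rule, assuming it is used in its usual textbook role as a conservative 95% margin of error for a sample proportion. The function name and the sample sizes are illustrative, not from the notes.

```python
import math

def margin_of_error(n):
    """Approximate 95% margin of error for a sample proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

# Quadrupling the sample size only halves the margin of error.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(n), 3))
```

The diminishing return is the practical point: going from 400 to 1600 respondents buys only 2.5 percentage points.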
moving into a phase of "what do we do with all this data?"
but first ...
... Ethics of experiments
informed consent
use of doctors in Physicians' Health Study
kids in art experiment
human subjects & review boards
Stanley Milgram (Yale): shock and memory
done in the 1960s, probably not doable today
Penny's data collection in grad school
risk: cost/benefit analysis
reasonable hope, reasonable doubt criteria for clinical trials
(did friday)
Some specific examples and issues
Nazi data
give them article (2 versions) and then discuss
from Bouma's class
Yes No
good use 6 2 (1 of 2 struggled)
criticism 5 3
twins studies -- ideal matched pairs?
PHS used only middle-aged men; what about women? minorities?
1 in 5 men has heart attack before age 65
1 in 17 women has heart attack before age 65
(did friday)
AIDS and slow process of clinical trials
measuring easier but less reliable things
pressure to release drugs before effectiveness demonstrated
(mentioned friday, mention again)
domestic violence: warn and release or arrest
can a randomized experiment be done? [no informed consent]
Raisins & entry of data not yet entered from survey 1
Looking at the data: stemplots, histograms
choosing bin-size
subdividing stems
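A rough sketch of how a stemplot can be built by hand or by machine, assuming non-negative whole-number data with the tens digit as the stem and the ones digit as the leaf. The function name and the counts are made up for illustration (they are not the class raisin data), and this version does not subdivide stems.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines: tens digit as stem, ones digit as leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical counts, for illustration only.
counts = [25, 26, 26, 27, 28, 28, 28, 29, 30, 31, 35]
print("\n".join(stemplot(counts)))
```

With only two stems the shape is hard to see, which is the usual argument for subdividing stems or choosing a narrower bin.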
Measures of center -- what is a typical value? mean, median, mode
what is unusual? outliers
Measures of spread
range, standard deviation
Five-number summary & boxplots
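The measures of center and spread above can be sketched with Python's standard statistics module. The data are made up for illustration. The quartiles here are taken as medians of the lower and upper halves (leaving out the middle value when n is odd), which is one common textbook convention; software often uses slightly different rules.

```python
import statistics

def five_number_summary(values):
    """(min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Skip the middle value when n is odd.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

# Hypothetical counts, for illustration only.
counts = [25, 26, 26, 27, 28, 28, 28, 29, 30, 31, 35]

print("mean   ", round(statistics.mean(counts), 2))
print("median ", statistics.median(counts))
print("mode   ", statistics.mode(counts))
print("range  ", max(counts) - min(counts))
print("s.d.   ", round(statistics.stdev(counts), 2))  # sample s.d. (divides by n - 1)

low, q1, med, q3, high = five_number_summary(counts)
print("five-number summary", (low, q1, med, q3, high))
print("interquartile range", q3 - q1)
```

A boxplot is just a drawing of these five numbers: the box runs from Q1 to Q3 with a line at the median.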
Some shape descriptions
symmetric, skewed, bell-shaped, unimodal, bimodal
Intro to frequency curves and cumulative probability (proportions)
generalization of 5-number summary
deciles, percentiles from standard tests
Normal distributions
symmetric, bell-shaped, determined by mean and standard dev.
Empirical Rule: 68-95-99.7
standardized scores (z-scores)
charts and computers to get other values (chart on page 137)
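As a stand-in for the chart, a short sketch using statistics.NormalDist from the Python standard library. It checks the Empirical Rule numerically and converts one value to a z-score. The helper names are illustrative, and the example height of 72 inches is made up; the mean of 69 and s.d. of 3 are the adult male figures used in these notes.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def z_score(x, mean, sd):
    """Standardized score: how many s.d.'s x lies above the mean."""
    return (x - mean) / sd

def proportion_within(k):
    """Proportion of a normal distribution within k s.d.'s of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Empirical Rule: roughly 68%, 95%, 99.7%.
for k in (1, 2, 3):
    print(f"within {k} s.d.: {proportion_within(k):.4f}")

# A 72-inch man under N(69, 3): z-score and proportion shorter than him.
z = z_score(72, mean=69, sd=3)
print("z =", z, " proportion below =", round(standard_normal.cdf(z), 4))
```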
examples of approx. normal distributions
height (male 18-74) N(5'9",3") = N(69,3)
height (female 18-74) N(5'3.5",2.5")= N(63.5,2.5)
height (male 18-24) N(5'10",3") = N(70,2.8)
height (female 18-24) N(5'4.3",2.6")= N(64.3,2.6)
height (male 11) N(146cm,8cm) = N(57.5,3.15)
weight (male 18-24) N(162,29.1)
weight (female 18-24) N(134,27)
1994 SAT mean Verbal = 423, 7% above 600, 42% below 400
approx s.d. 120
now renormalized to N(500,100)
what percent score 800?
1987 CA women's salaries mean=11,600; s.d.=10,500
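A quick check of the SAT figures above, assuming the scores are treated as exactly normal (real scores are only approximately so, and are capped at 800). With mean 423 and s.d. 120, a normal curve should roughly reproduce the quoted 7% above 600 and 42% below 400; with N(500,100), the same method addresses the 800 question. Variable names are illustrative.

```python
from statistics import NormalDist

# 1994 Verbal scores, using the approximate s.d. of 120 from the notes.
verbal_1994 = NormalDist(mu=423, sigma=120)
above_600 = 1 - verbal_1994.cdf(600)
below_400 = verbal_1994.cdf(400)
print(f"1994: above 600 = {above_600:.1%}, below 400 = {below_400:.1%}")

# Renormalized scale N(500, 100): 800 is 3 s.d.'s above the mean.
renormalized = NormalDist(mu=500, sigma=100)
z_800 = (800 - renormalized.mean) / renormalized.stdev
at_or_above_800 = 1 - renormalized.cdf(800)
print(f"renormalized: z = {z_800}, at or above 800 = {at_or_above_800:.2%}")
```

Under this model about 0.13% of scores, a little more than 1 in 1000, would reach 800.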