Against All Odds: Inside Statistics
(c) 1989, COMAP, ASA, Am Society for Quality Control
26 half-hour segments hosted by Teresa Amabile (Brandeis)
Program 1: What is Statistics?
comments:
01:30 Creativity and Rewards (kids art)
issues: measurement, random assignment, blindness
03:30 Inference Questions
04:20 Intro: Statistics -> decisions
04:50 Domino's Pizza example
how to introduce deep dish pizza (recipe, advertising, etc.)
12:00 "No fun to play if you don't keep score"
12:15 statistics is a science, it's our guide to answering the question
"How do you know?"
part of all sciences, fundamental to all sciences
a way of taming uncertainty, of turning raw data into
arguments that resolve profound questions
cloud vs. clarify (hint at misuses of stats)
honest, verifiable arguments that put reason back in charge
13:15 overview: describe data, collect data, infer from data (my words)
Describe data
14:00 lightning
14:40 growth
15:30 manatees
16:05 baseball
16:50 Produce data
17:00 Samples, Experiment, Probability
17:58 aspirin and doctors
18:45 potato chips
19:25 polls (Dukakis/Bush quotes about polls)
20:39 space shuttle (probability that all parts are good...)
21:31 casino: distribution of outcomes is predictable
22:10 Drawing conclusions
23:00 "What are the assumptions that go into the building of this
picture that they're trying to paint for you?"
23:15 batteries: 95% confidence interval for life of battery
24:05 literature (authorship) and Salem witches
25:00 "can only rule out, can't rule in"
26:00 "There you have it -- the big picture"
1) make best use of what you have
2) gather info you don't have
3) draw reliable conclusions from the data
26:20 results: rewards hinder creativity
Program 2: Picturing Distributions
31:40 start
32:00 Tufte's book on displaying data
32:40 distribution: overall pattern in a set of observations
33:30 histograms (lightning example)
41:00 center, spread, outliers
43:30 TV shows
48:20 action vs sitcom (skewed distributions, median)
50:00 rules for making histograms (traffic example; choose bin size)
equal widths
reasonable widths: picture that most clearly tells the story
of the data
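The effect of bin width can be tried out with a few lines of code. A minimal sketch, assuming made-up data (not the traffic data from the video); the helper name `histogram_counts` is hypothetical. It tallies values into equal-width bins keyed by each bin's left edge.

```python
def histogram_counts(data, bin_width, start=0):
    """Count values in equal-width bins, keyed by the left edge of each bin."""
    counts = {}
    for x in data:
        # Left edge of the bin [left, left + bin_width) that contains x.
        left = start + bin_width * int((x - start) // bin_width)
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical observations, for illustration only.
data = [3, 7, 8, 12, 13, 14, 18, 21, 22, 29]

coarse = histogram_counts(data, bin_width=10)
fine = histogram_counts(data, bin_width=5)
print("width 10:", coarse)
print("width 5: ", fine)
```

The same ten values look nearly flat with width 10 but show a peak with width 5, which is the reason the choice of width matters.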
52:00 Payments to Hospitals (distribution of hospital stays)
54:20 Stem plots (hysterectomy example)
56:00 back to back stem plots (male vs. female doctors)
note: video doesn't acknowledge possible ways to do
back to back histograms
Program 4: Normal Distributions
0:00 what do these blocks say to you? (represent distribution)
1:30 old lady bowlers
2:45 10 workers support 2 retirees, ...
5:20 look at age demographics via histograms
6:10 smoothed histograms
6:35 density curve
7:30 median and mean of density curve
8:20 skewing and mean and median
8:30 symmetric distribution: mean = median
9:00 bell shape = normal curve
9:30 normal distributions are all around us [home video]
12:00 what makes the normal curves so special mathematically?
13:45 Boston beanstalks
15:55 estimated % of people eligible for beanstalk club
17:05 68-95-99.7 rule
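The 68-95-99.7 rule can be checked numerically. A minimal sketch, assuming Python 3.8+ (for the standard-library `statistics.NormalDist`); the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
```

The three shares come out close to 0.68, 0.95, and 0.997, so the rule is a rounded summary of the exact normal areas.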
18:05 only tallest 2-3% of women are eligible
18:15 biologist studying .400 hitter
19:45 .400 hitting is the right tail of a normal distribution
23:30 using standardization to compare batting averages
Ty Cobb (.420-.266)/.037 = 4.16 [1911]
Ted Williams (.406-.267)/.033 = 4.21 [1941]
George Brett (.390-.261)/.032 = 4.03 [1980]
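The three standardized scores above can be reproduced directly. A minimal sketch using only the figures recorded in these notes (batting average, league mean, league standard deviation); the names `seasons` and `z_score` are my own.

```python
# (batting average, league mean, league sd) as recorded in the notes above.
seasons = {
    "Ty Cobb (1911)": (0.420, 0.266, 0.037),
    "Ted Williams (1941)": (0.406, 0.267, 0.033),
    "George Brett (1980)": (0.390, 0.261, 0.032),
}

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

z_scores = {name: z_score(*stats) for name, stats in seasons.items()}
for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
```

Standardizing puts each hitter on the scale of his own season, which is what makes the three eras comparable.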
24:55 wrap-up
Program 5: Normal Calculations
0:00 fashions and height (Teresa is short)
1:40 normalization (she's 61.7; mean=65.5; sd=2.5)
3:00 using table to find P(Z < -1.52) = 0.064
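The table lookup can be checked in code. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the numbers are the ones in the notes above (height 61.7, mean 65.5, sd 2.5), and the variable names are my own.

```python
from statistics import NormalDist

height, mean, sd = 61.7, 65.5, 2.5

# Standardize, then look up the area to the left under the standard normal curve.
z = (height - mean) / sd
p_below = NormalDist().cdf(z)

print(f"z = {z:.2f}, P(Z < z) = {p_below:.3f}")
```

This agrees with the table value 0.064: about 6% of women in this distribution are shorter than 61.7.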
4:00 comparing Teresa to her daughter Chris
5:20 GM tests on vehicles
9:20 NOx and normal calculations
10:35 testing prototypes
12:10 moderately high cholesterol (normal calculations for intervals)
??:?? army measurements and distributions
21:00 figure out sizing for new helmet
23:00 normal quantile plots
Program 12: Experimental Design
0:00 Grandpa Phillip and anecdotal evidence [whiskey and raw egg]
0:30 use available data if available
1:00 intro to observation/experimentation
1:45 observing lobsters
4:20 observation -> hypothesis -> test
4:50 treatments = doing something to subjects
5:15 Physicians' Health Study
7:40 double blindness
8:40 why use doctors
9:10 a poor design -- no control group, no blindness
9:45 confounding factors
10:10 why use placebos -- placebo effect
10:35 results (104 vs 189 heart attacks; 47% reduction)
11:00 study stopped early
12:20 biased group assignments (ribavirin and AIDS)
13:40 random assignment to groups
14:30 random digits (drawn from a hat)
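Random assignment is easy to imitate in code. A minimal sketch that swaps the hat of random digits for a pseudorandom shuffle; the subject IDs, the seed, and the function name `randomly_assign` are all hypothetical.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Twenty hypothetical subjects, numbered 1..20.
treatment, control = randomly_assign(range(1, 21), seed=12)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who lands in which group, the experimenter's judgment cannot bias the assignment.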
16:15 domestic violence example
21:15 follow-up data gathering
22:25 tough to design a good experiment
22:40 fictitious example of poor design
24:45 good features
random assignment
comparison
double blindness
sample size
25:50 end
Program 13: Blocking and Sampling
comments: strawberry segments too slow, not much stats
sampling parts better than blocking parts
01:35 Intro: laundry and water temp
subjects, systematic differences, blocking
04:45 strawberries (raising new varieties)
08:10 randomized complete block
11:55 multiple factor experiments
13:00 census
14:50 undercount problems
16:40 300 000 census takers
18:50 sampling (population/sample; Hite report)
20:45 Lay's potato chips
26:30 end
Program 14: Samples and Surveys
comments: pretty good video, but skip 35:20-41:10
31:50 Intro
Statistic vs Parameter
34:00 Stratified Random Sampling
35:20 Rec fishing and fish populations [skip]
39:00 Literary Digest bad election prediction [skip]
40:00 bias
41:10 Mistakes to avoid (examples demoed with "man on street")
41:48 NORC @ U Chicago (GSS since 1972)
45:00 "Think about" to test out questions
51:05 Sampling distributions and approximation
samples of 50 beads from a bin (time lapse photography)
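The bead demonstration can be simulated. A minimal sketch, assuming a true proportion of 0.2 for the colored beads (the notes do not record the real value) and independent draws, which approximates sampling from a large bin; `sample_proportions` is a made-up helper name.

```python
import random
from statistics import mean, stdev

def sample_proportions(p, n, repeats, seed=0):
    """Simulate `repeats` samples of size n; return each sample's proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(repeats)]

P_TRUE = 0.2  # assumed population proportion (the parameter)
N = 50        # beads per sample, as in the video

props = sample_proportions(P_TRUE, N, repeats=2000)
theory_sd = (P_TRUE * (1 - P_TRUE) / N) ** 0.5  # sd of a sample proportion

print(f"mean of sample proportions: {mean(props):.3f}")
print(f"sd of sample proportions:   {stdev(props):.3f} (theory {theory_sd:.3f})")
```

The sample proportions (statistics) vary from sample to sample but center on the parameter, with a spread close to sqrt(p(1-p)/n).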
57:28 end
Program 15: What is Probability?
0:00 gambling with dice; Fermat & Pascal
1:10 what exactly is randomness? (gravity vs coin tosses)
2:55 what do you most fear could harm or kill you?
4:10 car travel prob -- injury: 1 in 100,000 trips; death: 1 in 4 million
4:55 difficult to analyze how experience might mislead our estimates
of probabilities
6:00 asking teenagers about risk assessment (surveys)
7:35 statistics "summarizes experience in a way that will help people
to make decisions"
7:50 Teresa flipping coins
9:10 Persi Diaconis
10:05 spent years studying ...
possible to make things quite random (if you do them vigorously)
but we usually don't do them vigorously
clip of him flipping heads each time
12:25 probability is about an observer's knowledge, not about...
12:50 finding pattern in data by taking averages
13:30 outcomes, sample space, event [dice example]
15:40 1st two probability rules [about roles of 0 and 1]
16:10 traffic in NYC (probabilistic simulation)
22:10 let's say we're traffic engineers (6 types of drivers)
23:20 calculation of probabilities (mutually exclusive events)
Program 16: Random Variables
0:00 disappointed friends and gender of baby
1:15 determining independence can be tricky [Stand and Deliver]
2:25 Challenger explosion
4:20 NASA had inadequate procedures for analyzing risks
5:15 Prob(failure of 1 joint) = .023 [but six such]
5:45 multiplication rule
6:10 applied to coin tosses
6:30 applied to field joints
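The multiplication rule for the six field joints works out as follows. A minimal sketch, assuming the joints fail independently, each with the probability 0.023 recorded above; the variable names are my own.

```python
p_joint_fails = 0.023  # probability that a single field joint fails
n_joints = 6

# Multiplication rule for independent events: all six joints must hold.
p_all_hold = (1 - p_joint_fails) ** n_joints
p_at_least_one_fails = 1 - p_all_hold

print(f"P(all six hold)      = {p_all_hold:.3f}")
print(f"P(at least one fails) = {p_at_least_one_fails:.3f}")
```

A small per-joint risk compounds to roughly a 13% chance that at least one joint fails, and the calculation is only as good as the independence assumption behind it.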
7:40 engineering uses redundancy to reduce risk
but in this case, redundancy was not necessarily independent
9:10
10:05 shuttles are complex; they have a relatively high probability of failure
OK, but should be acknowledged
going to be catastrophic failures every five to ten years
10:50 using the multiplication rule (independence vs disjoint)
11:50 here are the rules (addition, multiplication)
12:20 most interested in numerical outcomes to random phenomena
12:45 X = number of heads (discrete rv)
13:30 weight of babies (continuous rv)
13:50 a little quiz (discrete vs continuous)
14:25 points per game example (NBA)
15:10 "information about what has occurred in the past can be used to
assess probability in the future"
15:40 what about continuous random variables (density curve and area)
16:50 dangerous rv (earthquakes)
17:45 man on street interviews about prep for earthquakes
18:10 don't know enough to predict earthquakes [not enough data]
19:00 Parkfield [small town as earthquake lab]
19:35 time interval viewed as normal distribution
mean= 22; sd =3
21:55 planning for earthquakes in CA based on probability
22:30 calculating mean of discrete rv
23:30 formula for mean of discrete rv
24:00 finding variance (and st dev) of discrete rv
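The mean and variance formulas for a discrete random variable can be illustrated with X = number of heads. A minimal sketch, assuming two tosses of a fair coin (my choice of example, not necessarily the one worked in the video); exact fractions avoid rounding error.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X = number of heads in two fair coin tosses (assumed example).
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
assert sum(dist.values()) == 1  # probabilities must add up to 1

# Mean: sum of x * P(X = x).
mu = sum(x * p for x, p in dist.items())
# Variance: sum of (x - mean)^2 * P(X = x); the sd is its square root.
var = sum((x - mu) ** 2 * p for x, p in dist.items())
sd = sqrt(var)

print(f"mean = {mu}, variance = {var}, sd = {sd:.4f}")
```

Each outcome is weighted by its probability, so the mean is 1 head and the variance is 1/2.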
Program 23: Inference for Proportions
01:40 Woburn Leukemia Intro
02:45 Statistic vs Parameter
03:00 Bureau of Labor Statistics
06:20 band of certainty (about 0.19 wide for BLS)
07:00 "The data are never good enough for the uses that people want to make
of them because we are not oversampling ... for the national
numbers."
08:00 example computations (assuming familiarity with p-val, z-score, etc)
12:00 back to Woburn example
20:15 Salem (1692 witches)
24:00 example (pooled estimate for p)
result: no sig dif b/w men's and women's conviction rates
note: sample size small, so would take a large dif to be sig.
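A two-proportion z test with a pooled estimate can be sketched as below. The counts are hypothetical (the notes do not record the Salem figures), the function name `two_proportion_z` is my own, and Python 3.8+ is assumed for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z test for equal proportions, using the pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # estimate of the common p under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts for illustration only: 6 of 10 vs. 14 of 30.
z, p_value = two_proportion_z(6, 10, 14, 30)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

With groups this small, even a gap of 60% versus 47% is far from significant, which matches the note about sample size. The normal approximation is also rough at these counts.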
Program 24: Inference for 2-way tables
32:30 Intro: Sick cat and 3 drugs
35:00 fossil teeth (scratches vs pits; primate family tree)
39:25 worked out example of chi-square
44:20 breast cancer: treatment vs. age
47:30 grouping matters (categorical age variables)
49:00 worked out example of 3x2 table
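A chi-square calculation for a 3x2 table can be sketched as below. The counts are hypothetical (not the breast cancer data from the video) and `chi_square_statistic` is a made-up name. The p-value line uses the fact that a chi-square distribution with 2 degrees of freedom has upper-tail area exp(-x/2), so it applies only to tables with (rows-1)(columns-1) = 2.

```python
from math import exp

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 3x2 table of counts, for illustration only.
table = [[20, 10], [15, 15], [10, 20]]

chi2 = chi_square_statistic(table)
p_value = exp(-chi2 / 2)  # valid only for 2 degrees of freedom
print(f"chi-square = {chi2:.2f}, df = 2, p-value = {p_value:.4f}")
```

Here every expected count is 15, the statistic is 100/15, and the p-value falls below 0.05.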
52:00 size and strength of difference vs significance
52:45 Mendel: did he cheat? (His data were "too good".)
example: categorizing corn kernels
57:40 Moral: Be careful about how you assign categories