Against All Odds: Inside Statistics
(c) 1989, COMAP, ASA, American Society for Quality Control
26 half-hour segments hosted by Teresa Amabile (Brandeis)

Program 1: What is Statistics?
comments:
01:30 Creativity and Rewards (kids' art)
      issues: measurement, random assignment, blindness
03:30 Inference Questions
04:20 Intro: Statistics -> decisions
04:50 Domino's Pizza example
      how to introduce deep dish pizza (recipe, advertising, etc.)
12:00 "No fun to play if you don't keep score"
12:15 statistics is a science, it's our guide to answering the question "How do you know?"
      part of all sciences, fundamental to all sciences
      a way of taming uncertainty, of turning raw data into arguments that resolve profound questions
      cloud vs. clarify (hint at misuses of stats)
      honest, verifiable arguments that put reason back in charge
13:15 overview: describe data, collect data, infer from data (my words)
      Describe data
14:00 lightning
14:40 growth
15:30 manatees
16:05 baseball
16:50 Produce data
17:00 Samples, Experiment, Probability
17:58 aspirin and doctors
18:45 potato chips
19:25 polls (Dukakis/Bush quotes about polls)
20:39 space shuttle (probability that all parts are good...)
21:31 casino: distribution of outcomes is predictable
22:10 Drawing conclusions
23:00 "What are the assumptions that go into the building of this picture that they're trying to paint for you?"
23:15 batteries: 95% confidence interval for life of battery
24:05 literature (authorship) and Salem witches
25:00 "can only rule out, can't rule in"
26:00 "There you have it -- the big picture"
      1) make best use of what you have
      2) gather info you don't have
      3) draw reliable conclusions from the data
26:20 results: rewards hinder creativity

Program 2: Picturing Distributions
31:40 start
32:00 Tufte's book on displaying data
32:40 distribution: overall pattern in a set of observations
33:30 histograms (lightning example)
41:00 center, spread, outliers
43:30 TV shows
48:20 action vs sitcom (skewed distributions, median)
50:00 rules for making histograms (traffic example; choose bin size)
      equal widths
      reasonable widths: picture that most clearly tells the story of the data
52:00 Payments to Hospitals (distribution of hospital stays)
54:20 Stem plots (hysterectomy example)
56:00 back to back stem plots (male vs. female doctors)
      note: video doesn't acknowledge possible ways to do back to back histograms

Program 4: Normal Distributions
0:00 what do these blocks say to you? (represent distribution)
1:30 old lady bowlers
2:45 10 workers support 2 retirees, ...
5:20 look at age demographics via histograms
6:10 smoothed histograms
6:35 density curve
7:30 median and mean of density curve
8:20 skewing and mean and median
8:30 symmetric distribution: mean = median
9:00 bell shape = normal curve
9:30 normal distributions are all around us [home video]
12:00 what makes the normal curves so special mathematically?
13:45 Boston beanstalks
15:55 estimated % of people eligible for beanstalk club
17:05 68-95-99.7 rule
18:05 only tallest 2-3% of women are eligible
18:15 biologist studying .400 hitter
19:45 .400 hitting is the right tail of a normal distribution
23:30 using standardization to compare batting averages
      Ty Cobb      (.420-.266)/.037 = 4.16 [1911]
      Ted Williams (.406-.267)/.033 = 4.21 [1941]
      George Brett (.390-.261)/.032 = 4.03 [1980]
24:55 wrap-up

Program 5: Normal Calculations
0:00 fashions and height (Teresa is short)
1:40 normalization (she's 61.7; mean=65.5; sd=2.5)
3:00 using table to find P(Z < -1.52) = 0.064
4:00 comparing Teresa to her daughter Chris
5:20 GM tests on vehicles
9:20 NOx and normal calculations
10:35 testing prototypes
12:10 moderately high cholesterol (normal calculations for intervals)
??:?? army measurements and distributions
21:00 figure out sizing for new helmet
23:00 normal quantile plots

Program 12: Experimental Design
0:00 grandpa Phillip and anecdotal evidence [whiskey and raw egg]
0:30 use available data if available
1:00 intro to observation/experimentation
1:45 observing lobsters
4:20 observation -> hypothesis -> test
4:50 treatments = doing something to subjects
5:15 Physicians' Health Study
7:40 double blindness
8:40 why use doctors
9:10 a poor design -- no control group, no blindness
9:45 confounding factors
10:10 why use placebos -- placebo effect
10:35 results (104 vs 189 heart attacks; 47% reduction)
11:00 study stopped early
12:20 biased group assignments (ribavirin and AIDS)
13:40 random assignment to groups
14:30 random digits (drawn from a hat)
16:15 domestic violence example
21:15 follow-up data gathering
22:25 tough to design a good experiment
22:40 fictitious example of poor design
24:45 good features
      random assignment
      comparison
      double blindness
      sample size
25:50 end

Program 13: Blocking and Sampling
comments: strawberry segments too slow, not much stats
          sampling parts better than blocking parts
01:35 Intro: laundry and water temp
      subjects, systematic differences, blocking
04:45 strawberries (raising new varieties)
08:10 randomized complete block
11:55 multiple factor experiments
13:00 census
14:50 undercount problems
16:40 300,000 census takers
18:50 sampling (population/sample; Hite report)
20:45 Lay's potato chips
26:30 end

Program 14: Samples and Surveys
comments: pretty good video, but skip 35:20-41:10
31:50 Intro
      Statistic vs Parameter
34:00 Stratified Random Sampling
35:20 Rec fishing and fish populations [skip]
39:00 Literary Digest bad election prediction [skip]
40:00 bias
41:10 Mistakes to avoid (examples demoed with "man on street")
41:48 NORC @ U Chicago (GSS since 1972)
45:00 "Think about" to test out questions
51:05 Sampling distributions and approximation
      samples of 50 beads from a bin (time lapse photography)
57:28 end

Program 15: What is Probability?
0:00 gambling with dice; Fermat & Pascal
1:10 what exactly is randomness? (gravity vs coin tosses)
2:55 what do you most fear could harm or kill you?
4:10 car travel prob -- injury: 1 in 100,000 trips; death: 1 in 4 million
4:55 difficult to analyze how experience might mislead our estimates of probabilities
6:00 asking teenagers about risk assessment (surveys)
7:35 statistics "summarizes experience in a way that will help people to make decisions"
7:50 Teresa flipping coins
9:10 Persi Diaconis
10:05 spent years studying ... possible to make things quite random (if you do them vigorously)
      but we usually don't do them vigorously
      clip of him flipping heads each time
12:25 probability is about an observer's knowledge, not about...
12:50 finding pattern in data by taking averages
13:30 outcomes, sample space, event [dice example]
15:40 1st two probability rules [about roles of 0 and 1]
16:10 traffic in NYC (probabilistic simulation)
22:10 let's say we're traffic engineers (6 types of drivers)
23:20 calculation of probabilities (mutually exclusive events)

Program 16: Random Variables
0:00 disappointed friends and gender of baby
1:15 determining independence can be tricky [Stand and Deliver]
2:25 Challenger explosion
4:20 NASA had inadequate procedures for analyzing risks
5:15 Prob(failure of 1 joint) = .023 [but six such]
5:45 multiplication rule
6:10 applied to coin tosses
6:30 applied to field joints
7:40 engineering uses redundancy to reduce risk
     but in this case, redundancy was not necessarily independent
9:10
10:05 shuttles are complex; have a rel. high prob of failure
      OK, but should be acknowledged
      going to be catastrophic failures every five to ten years
10:50 using the multiplication rule (independence vs disjoint)
11:50 here are the rules (addition, multiplication)
12:20 most interested in numerical outcomes to random phenomena
12:45 X = number of heads (discrete rv)
13:30 weight of babies (continuous rv)
13:50 a little quiz (discrete vs continuous)
14:25 points per game example (NBA)
15:10 "information about what has occurred in the past can be used to assess probability in the future"
15:40 what about continuous random variables (density curve and area)
16:50 dangerous rv (earthquakes)
17:45 man on street interviews about prep for earthquakes
18:10 don't know enough to predict earthquakes [not enough data]
19:00 Parkfield [small town as earthquake lab]
19:35 time interval viewed as normal distribution
      mean=22; sd=3
21:55 planning for earthquakes in CA based on probability
22:30 calculating mean of discrete rv
23:30 formula for mean of discrete rv
24:00 finding variance (and st dev) of discrete rv

Program 23: Inference for Proportions
01:40 Woburn Leukemia Intro
02:45 Statistic vs Parameter
03:00 Bureau of Labor Statistics
06:20 band of certainty (about 0.19 wide for BLS)
07:00 "The data are never good enough for the uses that people want to make of them because we are not oversampling ... for the national numbers."
08:00 example computations (assuming familiarity with p-val, z-score, etc.)
12:00 back to Woburn example
20:15 Salem (1692 witches)
24:00 example (pooled estimate for p)
      result: no sig dif b/w men's and women's conviction rates
      note: sample size small, so would take a large dif to be sig.

Program 24: Inference for 2-way tables
32:30 Intro: Sick cat and 3 drugs
35:00 fossil teeth (scratches vs pits; primate family tree)
39:25 worked out example of chi-square
44:20 breast cancer: treatment vs. age
47:30 grouping matters (categorical age variables)
49:00 worked out example of 3x2 table example
52:00 size and strength of difference vs significance
52:45 Mendel: did he cheat? (His data were "too good".)
      example: categorizing corn kernels
57:40 Moral: Be careful about how you assign categories
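
Arithmetic check (my addition, not from the videos): a few of the numbers recorded in these notes can be recomputed with a short Python sketch. It covers the standardized batting averages (Program 4, 23:30), the height z-score and table lookup (Program 5, 1:40-3:00), and the multiplication rule for six field joints (Program 16, 5:15-6:30). The function and variable names are my own. The six-joint calculation assumes the joints fail independently, which the 7:40 note in Program 16 says may not hold, so treat that figure as an illustration of the rule rather than a risk estimate.

```python
import math


def z_score(x, mean, sd):
    """Standardize x: number of standard deviations x lies from the mean."""
    return (x - mean) / sd


def normal_cdf(z):
    """P(Z < z) for a standard normal Z, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Program 4, 23:30: (batting average, league mean, league sd) as written in the notes.
hitters = {
    "Ty Cobb 1911": (0.420, 0.266, 0.037),
    "Ted Williams 1941": (0.406, 0.267, 0.033),
    "George Brett 1980": (0.390, 0.261, 0.032),
}
hitter_z = {name: z_score(*values) for name, values in hitters.items()}

# Program 5, 1:40-3:00: height 61.7 against mean 65.5, sd 2.5.
height_z = z_score(61.7, 65.5, 2.5)
height_p = normal_cdf(height_z)  # replaces the table lookup for P(Z < z)

# Program 16, 5:15-6:30: multiplication rule for six joints.
# ASSUMPTION: joints fail independently (the notes flag this as doubtful).
p_joint_fail = 0.023
n_joints = 6
p_all_ok = (1.0 - p_joint_fail) ** n_joints
p_any_fail = 1.0 - p_all_ok

for name, z in hitter_z.items():
    print(f"{name}: z = {z:.2f}")
print(f"height: z = {height_z:.2f}, P(Z < z) = {height_p:.3f}")
print(f"six joints: P(all OK) = {p_all_ok:.3f}, P(at least one fails) = {p_any_fail:.3f}")
```

The z-scores and the P(Z < -1.52) value agree with the figures in the notes after rounding. Under the independence assumption, the chance that at least one of the six joints fails comes out to about 0.13.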