For All Practical Purposes (statistics units)
Sol Garfunkel

Program 6: Statistics Overview
opening: baseball stats
03:15 three issues: 1) how to collect accurate numbers 2) how to describe things in ways that we can work with 3) how to draw conclusions and make predictions
03:25 preview of some examples
04:45 "baseball fans thrive on statistics"; recording stats during a baseball game
06:25 sampling: man-in-the-street interviews at Faneuil Hall
09:10 the science of statistics is designed to solve: 1) correctly choosing a sample 2) eliminating bias of the researcher 3) determining the appropriate size of the sample 4) being precise about the meaning
09:35 Bureau of Labor Statistics
11:30 size, kind, manner drawn (of a sample) ... determined by what you are trying to measure (Janet Norwood, BLS commissioner)
13:10 "a statistical agency must be completely open" -- J. Norwood
13:55 "bias is the bane of the science of statistics"
14:10 Physicians' Health Study
15:40 goals of the study
16:00 randomized assignment
17:15 experiment: why doctors? they understand risk/benefit; informed consent
19:00 AT&T -- design of telephone circuits (and statistical control)
23:45 casinos -- roulette: "this tendency for random events to form very clear and predictable patterns is one of the fundamental concepts in the science of statistics. All it takes is enough repetitions."
27:00 statistics in science (estimates and prediction)
28:00 end

Program 7: Behind the Headlines (starts about 32 minutes into the video)
32:20 start -- stats in newspapers
01:30 question: how reliable are statistics? (afraid of crime)
01:55 cigarettes (the Surgeon General finds the stats compelling; the industry says the findings are weak because they are based on stats)
03:25 "As in any other science the foundation is accurate observation. In statistics this means properly collecting data."
03:50 unemployment rate
05:35 sampling distributions (50-bead samples)
09:20 how the BLS samples (multistage design)
10:50 variance / sampling error / tolerance
12:05 smoking and lung cancer; randomized comparative studies / randomized clinical trials
13:50 Physicians' Health Study (aspirin and beta-carotene): observation vs. experiment; all four treatments (two kinds of placebos)
16:10 placebos
16:25 design of the study
17:10 blinding
18:15 ethics of clinical trials (reasonable chance, reasonable doubt)
19:50 null results / sample size too small
20:15 significant difference
21:00 deliberately introducing chance
22:20 size of sample / Latin square method (motor oil in cars)
25:50 "Deciding what is applicable and appropriate is one of the statistician's most important jobs."
26:40 end

Program 8: Picture This (begins approx. 61:20 into the video)
00:00 "The proliferation of data is both a promise and a problem in modern society"; it threatens to overwhelm us
00:30 data in the search for oil (seismography)
01:25 people need summaries of large data sets
01:40 "In fact, numbers computed from sample data make up the raw material of statistical inference, the science of drawing conclusions from data with the aid of the mathematics of probability."
02:00 focus on describing data: "best-known tool is not a number but a picture" (histogram)
02:10 Tufte's book
03:10 "Pictures are efficient at the task of analyzing data as well as conveying the results of that analysis to others." Computer vs. human eye: combine the computer's ability to handle large data sets and generate pictures with our ability to recognize patterns in images
04:10 how well do baseball players hit?
1980 batting averages
05:40 outliers (George Brett)
06:40 principle: data = smooth plus rough; look for the pattern, then for deviations from the pattern
07:15 quality control data
07:50 Deming (taught quality control to the Japanese)
07:25 "What can we do to work smarter, not harder?" (Deming); flinching example
09:55 descriptive stats ("a few carefully chosen numbers")
10:15 median (of batting averages)
10:30 median income in 1983: $24,580
10:55 spread (Spartany vs. Affluentia)
11:30 range (and problems with that measure -> quartiles)
12:45 exploring data, moving to 2 variables
14:05 concessions (soft drinks): scatter plot of temperature vs. soft drink sales per capita; smooth = line, rough = scatter (other factors)
15:40 equation for the regression line, used to compare sales at different prices by adjusting for temperature
17:10 1970 draft lottery
19:05 combine graphical displays with some basic numerical descriptions: median and quartiles
20:50 3 variables?
22:05 Dr. Huber: 3-D map of earthquake epicenters
25:00 summary: 1) progress from simple to complex 2) remember: data = smooth plus rough (pattern/deviation) 3) remember the power of pictures: the eye is the best device for seeing both smooth and rough
26:20 end

Program 9: Place Your Bets (all times are 1:18:10 plus the minutes shown)
13:10 start -- intro to casinos
14:15 which people die vs. how many (life insurance)
14:35 "a phenomenon is called random if individual outcomes are unpredictable, but the long-term pattern of many outcomes is predictable."
15:50 "mathematical description of randomness is called the theory of probability"
17:20 cumulative proportion of heads over 10,000 coin tosses; probability defined as the limiting value; P(heads) = 0.507 (falls within the acceptable range for a fair coin)
18:45 listing all possible outcomes
19:35 2 laws of probability
21:20 probability of each side of a die turning up (1/6); loaded dice
21:50 sampling is a lot like gambling: the link is randomness; back to the 50-bead samples (histogram-based)
23:40 terminology (sample, statistic, sampling distribution of a statistic); sampling variability
25:40 normal curves
26:20 median vs. mean
28:00 standard deviation; 68-95-99.7 rule
30:40 central limit theorem
31:50 roulette wheel; expected value; central limit theorem again
38:45 summary
39:50 end

Program 10: Confident Conclusions (all times are 1 hour 18 minutes plus the minutes shown)
42:10 start
43:05 "chance is the ally rather than the enemy of conclusions"
43:20 Gallup poll (imagine 2 samples)
45:00 Physicians' Health Study (aspirin, beta-carotene, placebos): brief mention, then back to polling
46:45 sampling distribution; estimating the standard deviation; standard error; confidence intervals
54:25 AT&T circuit boards, quality control
64:45 "one very important goal of the statistician is to identify all of the possible variables that might affect and thereby confuse a conclusion."
65:20 hypothetical example of sex discrimination in college admissions; Simpson's paradox
68:40 "Statistical evidence is not proof...At least statistics announces its degree of trustworthiness openly, right down to the margin of error."
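A worked check that ties the coin-toss note in Program 9 (17:20) to the standard error and confidence intervals in Program 10 (46:45). This is a minimal sketch, not the calculation shown in the programs. It assumes that 0.507 is the proportion of heads over all 10,000 tosses, uses the usual formula sqrt(p(1 - p) / n) for the standard error of a sample proportion, and takes the 95% margin as 2 standard errors (from the 68-95-99.7 rule, which applies because the central limit theorem makes the sampling distribution roughly normal). The function names are hypothetical.

```python
import math


def standard_error(p_hat: float, n: int) -> float:
    """Estimated standard deviation of the sampling distribution of a sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval_95(p_hat: float, n: int) -> tuple[float, float]:
    """Approximate 95% confidence interval: p_hat plus or minus 2 standard errors.

    The multiplier 2 comes from the 68-95-99.7 rule for normal curves;
    the more precise table value is 1.96.
    """
    margin = 2 * standard_error(p_hat, n)
    return (p_hat - margin, p_hat + margin)


if __name__ == "__main__":
    # Assumption: 0.507 is the proportion of heads over all 10,000 tosses.
    low, high = confidence_interval_95(0.507, 10_000)
    print(f"95% interval: ({low:.3f}, {high:.3f})")
    # A fair coin has P(heads) = 0.5; check whether it lies inside the interval.
    print("consistent with a fair coin:", low <= 0.5 <= high)
```

Run as a script, this prints an interval of (0.497, 0.517). Because 0.5 lies inside it, the observed 0.507 is within the range expected from sampling variability alone, which matches the note that it "falls within the acceptable range for a fair coin." Swapping 1.96 for 2 narrows the margin only slightly.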