For All Practical Purposes (statistics units)
Sol Garfunkel
Program 6: Statistics Overview
opening: baseball stats
03:15 three issues
1) how to collect accurate numbers
2) how to describe things in ways that we can work with
3) how to draw conclusions and make predictions
03:25 preview of some examples
04:45 "baseball fans thrive on statistics"
recording stats during baseball game
06:25 sampling
man-in-the-street interviews at Faneuil Hall
09:10 science of statistics is designed to solve
1) correctly choosing a sample
2) eliminating bias of researcher
3) determining appropriate size of the sample
4) being precise about the meaning
09:35 Bureau of Labor Statistics
11:30 size, kind, manner drawn (of a sample) ... determined by what you
are trying to measure (Janet Norwood, BLS commissioner)
13:10 "a statistical agency must be completely open" -- J. Norwood
13:55 "bias is the bane of the science of statistics"
14:10 Physicians' Health Study
15:40 goals of the study
16:00 randomized assignment
17:15 experiment: why doctors? understand risk/benefit, informed consent
19:00 AT&T -- design of telephone circuits (and statistical control)
23:45 casinos -- roulette
"this tendency for random events to form very clear and
predictable patterns is one of the fundamental concepts
in the science of statistics. All it takes is enough
repetitions."
27:00 Statistics in science (estimates and prediction)
28:00 end
Program 7: Behind the Headlines
starts about 32 minutes into video.
32:20 start -- stats in newspaper
01:30 question: how reliable are statistics? (afraid of crime)
01:55 cigarettes (Surgeon General finds stats compelling, industry says
findings are weak because they are based on stats)
03:25 "As in any other science the foundation is accurate observation.
In statistics this means properly collecting data."
03:50 Unemployment rate
05:35 sampling distributions (50-bead samples)
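The bead demonstration can be imitated with a short simulation. This is only a sketch: the 20% share of dark beads, the 1,000 repeated samples, and the function name are assumptions for illustration, not figures from the video, and each bead is drawn independently (as if the box were very large).

```python
import random
from collections import Counter

def bead_sample_counts(p_dark=0.20, sample_size=50, n_samples=1000, seed=0):
    """Draw repeated samples and return the number of dark beads in each.

    p_dark is a hypothetical population proportion, not a value from the video.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_dark for _ in range(sample_size))
        for _ in range(n_samples)
    ]

counts = bead_sample_counts()
# Tally how often each count occurs: a rough text histogram of the
# sampling distribution of the count of dark beads in a 50-bead sample.
for k, freq in sorted(Counter(counts).items()):
    print(f"{k:2d} {'*' * (freq // 5)}")
```

The individual samples vary, but the counts pile up around 50 x 0.20 = 10, which is the pattern the sampling distribution describes.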
09:20 how BLS samples (multistage design)
10:50 variance/sampling error/tolerance
12:05 smoking and lung cancer
randomized comparative studies / randomized clinical trials
13:50 Physicians' Health Study (aspirin and beta-carotene)
observation vs. experiment
all four treatments (two kinds of placebos)
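A minimal sketch of random assignment to the four treatment combinations (each drug or its own placebo). The subject IDs, the equal group sizes, and the function name are assumptions for illustration; the notes do not describe the study's actual assignment procedure.

```python
import random

# Each arm pairs (aspirin or its placebo) with (beta-carotene or its placebo).
TREATMENTS = [
    ("aspirin", "beta-carotene"),
    ("aspirin", "placebo"),
    ("placebo", "beta-carotene"),
    ("placebo", "placebo"),
]

def assign(subject_ids, seed=0):
    """Shuffle the subjects, then deal them out to the four arms in turn."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: TREATMENTS[i % 4] for i, sid in enumerate(shuffled)}

assignment = assign(range(20))  # 20 hypothetical subjects
for sid in sorted(assignment):
    print(sid, assignment[sid])
```

Because chance alone decides who lands in which arm, the groups should be similar in everything except the treatment.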
16:10 placebos
16:25 design of the study
17:10 blinding
18:15 ethics of clinical trials (reasonable chance, reasonable doubt)
19:50 null results/sample size too small
20:15 significant difference
21:00 deliberately introducing chance
22:20 size of sample / latin square method (motor oil in cars)
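A Latin square arranges the treatments so that each one appears exactly once in every row and every column. The sketch below uses a simple cyclic construction; the oil labels and the reading of rows and columns (say, cars and test periods) are hypothetical, and a real design would also randomize the order of rows and columns.

```python
def latin_square(treatments):
    """Build a cyclic Latin square: row i is the treatment list rotated by i."""
    n = len(treatments)
    return [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]

square = latin_square(["oil A", "oil B", "oil C", "oil D"])
for row in square:
    print(row)
```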
25:50 "Deciding what is applicable and appropriate is one of the
statistician's most important jobs."
26:40 end
Program 8: Picture This
begins approx 61:20 into video
00:00 "The proliferation of data is both a promise and a problem in modern
society"
threatens to overwhelm us
00:30 data in search for oil (seismography)
01:25 people need summaries of large data sets
01:40 "In fact, numbers computed from sample data make up the raw
material of statistical inference, the science of
drawing conclusions from data with the aid of the
mathematics of probability."
02:00 focus on describing data
"best-known tool is not a number but a picture" (histogram)
02:10 Tufte's book
03:10 "Pictures are efficient at the task of analyzing data as well as
conveying the results of that analysis to others."
computer vs human eye: combination of computer's ability
to handle large data sets and generate pictures, and
our ability to recognize patterns in images
04:10 how well do baseball players hit?
1980 batting averages
05:40 outliers (George Brett)
06:40 principle: data = smooth plus rough
look for pattern, then for deviations from pattern
07:15 quality control data
07:50 Deming (taught quality control to Japanese)
07:25 "What can we do to work smarter, not harder" (Deming)
flinching example
09:55 descriptive stats ("a few carefully chosen numbers")
10:15 median (of batting averages)
10:30 median income in 1983: $24,580
10:55 spread (Spartany vs Affluentia)
11:30 range (and problems with measure -> quartiles)
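A small sketch of why the range is a fragile measure of spread and the quartiles are not. The income figures (in thousands of dollars) are invented, and the quartiles are taken as medians of the lower and upper halves, which is one common convention among several.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]  # skip the middle value when n is odd
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

# Hypothetical incomes with one very large value.
incomes = [12, 15, 18, 22, 24, 25, 27, 31, 40, 250]
low, q1, med, q3, high = five_number_summary(incomes)
print("range:", high - low)            # dominated by the single outlier
print("quartiles:", q1, med, q3)       # barely affected by it
```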
12:45 exploring data, moving to 2 variables
14:05 concessions (soft drinks)
scatter plot of temperature vs soft drink sales per capita
smooth = line, rough = scatter (other factors)
15:40 equation for regression line and use to compare sales at different
price by adjusting for temperature
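A least-squares sketch of the "smooth plus rough" idea for two variables: the fitted line is the smooth, the residuals are the rough. The temperatures and sales figures are made up for illustration, and the helper name is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: game-time temperature (deg F), soft drinks sold per person.
temps = [60, 65, 70, 75, 80, 85]
sales = [0.50, 0.58, 0.61, 0.72, 0.78, 0.83]

slope, intercept = fit_line(temps, sales)
# Residual = actual sales minus what the line predicts at that temperature.
residuals = [y - (intercept + slope * x) for x, y in zip(temps, sales)]
print(slope, intercept)
print(residuals)
```

Comparing residuals rather than raw sales is one way to compare games played at different prices after adjusting for temperature.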
17:10 1970 draft lottery
19:05 combine graphical with some basic numerical descriptions
median and quartiles
20:50 3 variables?
22:05 Dr. Huber: 3-D map of earthquake epicenters
25:00 summary
1) progress from simple to complex
2) remember: data = smooth plus rough (pattern/deviation)
3) remember power of picture: eye is best device for seeing
both smooth and rough
26:20 end
Program 9: Place Your Bets
all times 1:18:10 plus minutes shown.
13:10 start -- intro to casinos
14:15 which people die vs how many (life insurance)
14:35 "a phenomenon is called random if individual outcomes are
unpredictable, but the long-term pattern of many outcomes
is predictable."
15:50 "mathematical description of randomness is called the theory of
probability"
17:20 cumulative probability of 10,000 coin tosses
probability defined as limiting value
P(heads) = 0.507 (falls within acceptable range for fair coin)
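A simulated version of the coin-tossing record, tracking the running proportion of heads. The seed and function name are arbitrary choices; the simulated coin is exactly fair, which the real coin need not be. For 10,000 tosses of a fair coin the standard deviation of the proportion is sqrt(0.25/10,000) = 0.005, so 0.507 is about 1.4 standard deviations from 0.5.

```python
import random

def running_proportion(n_tosses=10_000, seed=0):
    """Return the proportion of heads after each toss of a simulated fair coin."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for i in range(1, n_tosses + 1):
        heads += rng.random() < 0.5   # True counts as 1
        proportions.append(heads / i)
    return proportions

props = running_proportion()
# The proportion wanders early on and settles near 0.5 as tosses accumulate.
for n in (10, 100, 1000, 10_000):
    print(n, round(props[n - 1], 3))
```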
18:45 listing all possible outcomes
19:35 2 laws of probability
21:20 probability of each side of die turning up (1/6)
loaded dice
21:50 sampling is a lot like gambling -- link is randomness
back to 50-bead samples
histogram based
23:40 terminology (sample, statistic, sampling distribution of stat)
sampling variability
25:40 normal curves
26:20 median vs mean
28:00 standard deviation
68-95-99.7 rule
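The rounded figures in the rule can be checked against the normal distribution directly. The sketch uses the standard identity that the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)); the function name is a hypothetical helper.

```python
import math

def within_k_sd(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
```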
30:40 central limit theorem
31:50 roulette wheel
expected value
central limit theorem again
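A worked sketch of expected value for a $1 bet on red, and of how the central limit theorem narrows the spread of the average outcome as bets accumulate. It assumes an American wheel (38 slots, 18 red) and an even-money payout; the notes do not say which wheel the program uses.

```python
import math

SLOTS = 38   # assumption: American wheel with 0 and 00
RED = 18

def bet_on_red_stats():
    """Mean and standard deviation of the player's net win on a $1 bet on red."""
    p_win = RED / SLOTS
    mean = 1 * p_win + (-1) * (1 - p_win)
    variance = (1 - mean) ** 2 * p_win + (-1 - mean) ** 2 * (1 - p_win)
    return mean, math.sqrt(variance)

mean, sd = bet_on_red_stats()
for n in (100, 10_000, 1_000_000):
    # The average net win over n bets is roughly normal with the same mean
    # and a standard deviation of sd / sqrt(n).
    print(n, round(mean, 4), round(sd / math.sqrt(n), 4))
```

The expected loss per bet stays at 2/38 of a dollar, while the uncertainty about the average shrinks, so the casino's long-run take is highly predictable.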
38:45 summary
39:50 end
Program 10: Confident Conclusions
all times 1 hour 18 minutes plus minutes shown.
42:10 start
43:05 "chance is the ally rather than the enemy of conclusions"
43:20 Gallup poll (imagine 2 samples)
45:00 Physicians' Health Study (aspirin, beta-carotene, placebos)
brief mention, then back to polling
46:45 sampling distribution
estimating st. dev.
standard error
confidence intervals
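A sketch of an approximate 95% confidence interval for a poll proportion, built from the estimated standard error. The poll numbers are hypothetical, and the multiplier 1.96 is the usual normal-approximation value (many introductory treatments round it to 2).

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate confidence interval for a population proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    return p_hat - z * se, p_hat + z * se

# Hypothetical poll: 855 of 1,500 respondents say "yes".
low, high = proportion_ci(successes=855, n=1500)
print(round(low, 3), round(high, 3))
```

Half the width of the interval is the margin of error; it shrinks in proportion to the square root of the sample size.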
54:25 AT&T circuit boards, quality control
64:45 "one very important goal of the statistician is to identify
all of the possible variables that might affect and thereby
confuse a conclusion."
65:20 hypothetical example of sex discrimination in college admissions
Simpson's paradox
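A small numerical illustration of Simpson's paradox. The admission counts are invented for this sketch and are not the figures used in the program: women are admitted at a higher rate in each department, yet at a lower rate overall, because most women apply to the department that is harder to get into.

```python
# Hypothetical counts: (applicants, admitted) by department and sex.
data = {
    "Dept A": {"men": (80, 60), "women": (20, 16)},
    "Dept B": {"men": (20, 4), "women": (80, 20)},
}

def rate(applied, admitted):
    """Admission rate as a fraction of applicants."""
    return admitted / applied

def overall_rate(sex):
    """Admission rate for one sex with both departments pooled."""
    applied = sum(d[sex][0] for d in data.values())
    admitted = sum(d[sex][1] for d in data.values())
    return rate(applied, admitted)

for dept, groups in data.items():
    print(dept, {sex: rate(*counts) for sex, counts in groups.items()})
print("Overall", {sex: overall_rate(sex) for sex in ("men", "women")})
```

Here the department applied to is the lurking variable: ignoring it reverses the direction of the comparison.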
68:40 "Statistical evidence is not proof...At least statistics
announces its degree of trustworthiness openly, right down
to the margin of error."