    2      175      325      500
        208.33   291.67

Total      625      875     1500

Chi-Sq = 2.667 + 1.905 + 5.333 + 3.810 = 13.714
DF = 1, P-Value = 0.000
------------------------------------------------
Simpson's Paradox

If we divide by programs applied to, we see a different story
------------------------------------------------
Expected counts are printed below observed counts

        acceptA  rejectA    Total
    1       400      250      650
         403.45   246.55

    2        50       25       75
          46.55    28.45

Total       450      275      725

Chi-Sq = 0.029 + 0.048 + 0.255 + 0.418 = 0.751
DF = 1, P-Value = 0.386
------------------------------------------------
Expected counts are printed below observed counts

        acceptB  rejectB    Total
    1        50      300      350
          79.03   270.97

    2       125      300      425
          95.97   329.03

Total       175      600      775

Chi-Sq = 10.665 + 3.111 + 8.783 + 2.562 = 25.120
DF = 1, P-Value = 0.000
------------------------------------------------

hospital example (Utts chapter 12, pages 213-215)
  give combined results first, then separate

            survive    die   s rate   d rate
  standard      505    595      .46      .54
  new           195    905      .18      .82
  total         700   1500

  standard        5     95      .05      .95
  new           100    900      .10      .90
  total         105    995

discrimination example (Utts, chapter 12, pages 215-217)
  ?? death penalty
  326 cases,
    white defendant: 19/160 get death pen. (.119)
    black defendant: 17/166 get death pen. (.102)
  when separated by victim's race, see different story
  [overhead from Moore 207]

point: statistically significant means that the effect is not likely to be
  due to chance alone, but some factor other than the obvious one may be
  the reason

Probability
  random: long-term predictability vs short-term unpredictability
  law of large numbers / "law" of small numbers
  scale: 0 to 1 (0% to 100%)
  personal vs.
  mathematical (relative frequency)

4 Rules and applications
  axiomatic method
  four rules
  "overhead" examples
    probability of losing luggage is 1/176 (Krantz)
    P(heart attack kills) = .33, P(cancer kills) = .2 [assuming death]
    estimated probability of grades
    probability of two girls (P(boy) about .512)
    probability of winning 2 of 3, 3 of 5 given an estimate for each game

video -- Life By the Numbers (#4 Prob)
  02:00 (or 08:20) - 27:00: intro to prob., Graunt, casinos
  27:00 - 42:25: polling, polio, prob assesses results

note that (p)(1-p) <= .25, so use .25
note that p-hat is usually very close to p, especially if the sample is large
example: sample 1600 people and 500 people say yes
example: Reese's Pieces

Testing a hypothesis
  1) determine null and alternative hypotheses
  2) collect data
  3) compute test statistic
     test stat is a measure of how far the data fall from what the
     null hypothesis predicts
  4) determine likelihood of a test statistic at least this extreme
     if null hypothesis is true (p-value)
  5) make a decision

Testing Hypotheses for Proportions
  test statistic is z-score
  example: predicting election outcomes

==
Tuesday, January 25
Topic: Time Series
Topic: Wrap-Up
Due: HW #10 @ hw06.shtml
Vocab: time series, long-term trend, seasonal variation,%%
       seasonal adjustment, cycle

A look at Calvin Tuition Data
  Time series plot
    Calvin Tuition Data
    plot CPI (with Calvin Price Index?)
    births
    Dow Jones
    postage?
  Things to watch for
    cherry picking data
    choice of units (\$ vs inflation adjustments, etc)
    vertical axis doesn't start at zero (magnifies steepness)
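
Looking back at the Simpson's Paradox admissions tables: the chi-square statistics and p-values in that output can be rechecked with a short script. This is only a sketch, not the software that produced the original output. It assumes the combined table is the cell-by-cell sum of the program A and program B tables, which matches the row 2 and Total figures shown. The function and variable names (chi_square_2x2, program_a, program_b, combined, acceptance_rates) are made up for illustration. Rows are the groups labeled 1 and 2 in the tables.

```python
import math

def chi_square_2x2(table):
    """Return (statistic, p_value) for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail area is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def acceptance_rates(table):
    """Fraction accepted in each row (column 0 = accept, column 1 = reject)."""
    return [row[0] / sum(row) for row in table]

# Observed counts copied from the tables: rows = groups 1 and 2.
program_a = [[400, 250], [50, 25]]
program_b = [[50, 300], [125, 300]]

# Assumption: the combined table is the sum of the two program tables.
combined = [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(program_a, program_b)]

for name, table in [("combined", combined),
                    ("program A", program_a),
                    ("program B", program_b)]:
    stat, p = chi_square_2x2(table)
    r1, r2 = acceptance_rates(table)
    print(f"{name:10s} chi-sq = {stat:7.3f}  p = {p:.3f}  "
          f"accept rate: group 1 = {r1:.3f}, group 2 = {r2:.3f}")
```

The acceptance rates show the reversal: group 1 has the higher rate in the combined table but the lower rate within each program.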