Math 143 E
Probability and Statistics
Fall 2000

Statistical Slogans

Bigger is Better.

Generally, the larger the sample size, the better. Larger sample sizes produce smaller confidence intervals and are more likely to produce statistically significant results.
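To see why larger samples give smaller confidence intervals, here is a minimal sketch. It assumes the population standard deviation is known, so it uses a z interval rather than the t interval used in practice, and it takes a hypothetical sigma of 10. The half-width is z* times sigma divided by the square root of n, so quadrupling the sample size cuts the interval in half.

```python
from statistics import NormalDist

def z_interval_half_width(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a z confidence interval for a mean, assuming sigma is known."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z_star * sigma / n ** 0.5

# Hypothetical population standard deviation of 10 units.
for n in (25, 100, 400):
    print(n, round(z_interval_half_width(10.0, n), 2))  # 3.92, then 1.96, then 0.98
```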

But...

Quality is Job 1 ...

because

You can't do Good Statistics from Bad Data.

If the design of the study is flawed, the statistical analysis is not meaningful.

We talked about many such flaws earlier in the semester.

Statistical Significance is not the same as Practical Importance.

  1. A statistically significant difference may be of no practical importance if the magnitude of the difference is small.
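  2. If a hypothesis test does not indicate statistical significance, that does not show there is no difference. A difference of practical importance can go undetected when the sample size is too small.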
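A P-value reflects the sample size as well as the size of the difference. The sketch below is a simplified two-sample z test. It assumes a known common standard deviation, and all the numbers are hypothetical: sigma = 15, a difference of 0.5 units with 100,000 observations per group, and a difference of 5 units with 10 per group. The tiny difference comes out highly significant. The larger difference gives a P-value well above 0.05.

```python
from statistics import NormalDist

def two_sample_z_p_value(diff: float, sigma: float, n_per_group: int) -> float:
    """Two-sided P-value for a difference between two sample means.

    Simplifying assumption: both populations share a known standard
    deviation sigma, and each group has n_per_group observations.
    """
    se = sigma * (2 / n_per_group) ** 0.5   # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical scores with sigma = 15.
tiny_diff_huge_n = two_sample_z_p_value(diff=0.5, sigma=15, n_per_group=100_000)
big_diff_small_n = two_sample_z_p_value(diff=5.0, sigma=15, n_per_group=10)
print(f"{tiny_diff_huge_n:.2g}", f"{big_diff_small_n:.2f}")
```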

Significance Levels are not Magic.

It is generally best to report a P-value when doing a hypothesis test. Although significance levels of 5% (most common), 10%, or 1% are often used as decision thresholds, these choices are fairly arbitrary. A P-value of 4.9% is not much different from a P-value of 5.1%, even though in the first case we would claim significance at the 5% level and in the second case we would not.
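The next sketch shows how little separates the two sides of the 5% cutoff. It assumes a two-sided z test. Test statistics of 1.95 and 1.97 carry almost the same evidence, but their P-values (about 0.0512 and 0.0488) fall on opposite sides of 0.05.

```python
from statistics import NormalDist

def two_sided_p(z: float) -> float:
    """Two-sided P-value for a standard normal test statistic z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.95, 1.97):
    p = two_sided_p(z)
    print(f"z = {z}: P = {p:.4f}, significant at 5%: {p < 0.05}")
```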

Identifying Cause is harder than identifying Effect.

When a test indicates a significant difference between two populations, it is saying that there appears to be a difference that is not due to chance alone, but it does not say what it is due to. Pinpointing cause is a tricky business and requires careful design of a study (recall the issues of control from your water drops on coins experiment). One must always consider possible alternative causes due to extraneous differences, confounding factors, etc.

Keep your Assumptions in mind.

Many statistical tests are based on assumptions about the population involved. If these assumptions are not reasonable for the population in question, then the test should not be used. Some tests are robust against certain types of violations of the assumptions but not against others. For example, the t tests assume that the population is normal. They are fairly robust against skewed distributions (more so as the sample size increases), but they do not do as well when there are outliers or when the population is not unimodal.
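The robustness claim can be checked by simulation. The sketch below builds many 95% t intervals from samples of size 10 and counts how often they contain the true mean. It samples from a normal population and from a strongly skewed exponential population. The critical value 2.262 is the t value for 9 degrees of freedom and is hard-coded, because Python's standard library has no t distribution. The populations, seed, and number of repetitions are illustrative choices. Coverage should be close to 95% for the normal population and somewhat lower for the skewed one.

```python
import random
import statistics

T_STAR_DF9 = 2.262  # 95% t critical value for 9 degrees of freedom (samples of size 10)
N = 10

def t_interval_coverage(draw, true_mean: float, reps: int = 10_000, seed: int = 1) -> float:
    """Fraction of 95% t intervals (samples of size N) that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(N)]
        m = statistics.fmean(sample)
        se = statistics.stdev(sample) / N ** 0.5
        if m - T_STAR_DF9 * se <= true_mean <= m + T_STAR_DF9 * se:
            hits += 1
    return hits / reps

normal = t_interval_coverage(lambda r: r.gauss(0, 1), true_mean=0.0)
skewed = t_interval_coverage(lambda r: r.expovariate(1.0), true_mean=1.0)  # exponential, mean 1
print(f"normal population: {normal:.3f}, skewed population: {skewed:.3f}")
```

Increasing N (with the matching t critical value) would bring the skewed coverage closer to 95%, which is the "more so as the sample size increases" part of the claim.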

Sometimes assumptions are difficult to check from the data and must be justified some other way. (There are also tests that require fewer assumptions.)

There is No such thing as a Sure Thing.

Statistics deals with uncertainty. It uses probability to measure the long-run frequency of a random event. Whenever you see a probability, think: "What is the random event here?" In statistical procedures, the random event is the selection of the sample. Thus,

  1. 95% of samples produce 95% confidence intervals that contain the population parameter.
  2. A P-value measures what percentage of samples would give data at least as extreme as the data you collected, if the null hypothesis were true.
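Both statements describe what happens over many random samples, and a simulation can make that concrete. This sketch assumes a hypothetical normal population with mean 50 and a known standard deviation of 8, with samples of size 25, so it uses z intervals. It first counts how often the 95% interval captures the true mean. It then takes a hypothetical observed mean of 53.2 and estimates its P-value as the fraction of samples drawn under H0: mu = 50 that land at least that far from 50. That fraction is compared with the exact normal-theory value.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical population: normal, mean 50, KNOWN standard deviation 8.
MU, SIGMA, N = 50.0, 8.0, 25
SE = SIGMA / N ** 0.5                    # standard error of the sample mean (1.6)
Z_STAR = NormalDist().inv_cdf(0.975)     # about 1.96
REPS = 10_000
rng = random.Random(2000)

def sample_mean() -> float:
    """Draw one random sample of size N and return its mean."""
    return statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))

# Point 1: fraction of samples whose 95% z interval contains MU.
coverage = sum(abs(sample_mean() - MU) <= Z_STAR * SE for _ in range(REPS)) / REPS

# Point 2: a P-value as a fraction of samples. With H0 true, how often is a
# sample mean at least as far from 50 as the hypothetical observed 53.2?
observed = 53.2
simulated_p = sum(abs(sample_mean() - MU) >= abs(observed - MU) for _ in range(REPS)) / REPS
exact_p = 2 * (1 - NormalDist(MU, SE).cdf(observed))   # about 0.0455 (z = 2)

print(f"coverage = {coverage:.3f}, simulated P = {simulated_p:.3f}, exact P = {exact_p:.3f}")
```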

Know what you are Looking for.

Searching for significance by running lots of tests on the same data set is bad statistical practice. For example, if you run 20 tests in which the null hypothesis is actually true, you should expect about one of them to be significant at the 5% level by chance alone. So if you keep looking, you will eventually find "significant" results, but they are not meaningful. Knowing what you are looking for also helps you improve the design of your study.
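A quick simulation shows how often chance alone produces "significant" results. This sketch assumes 20 independent two-sided z tests in which every null hypothesis is true, so each test statistic is standard normal. The seed and number of simulated studies are arbitrary. On average about one test per study is significant at the 5% level. Under the independence assumption, the chance that at least one test is significant is 1 - 0.95^20, or about 64%.

```python
import random
from statistics import NormalDist

ALPHA, TESTS = 0.05, 20

def count_false_positives(rng: random.Random) -> int:
    """Run TESTS two-sided z tests where every null hypothesis is true."""
    hits = 0
    for _ in range(TESTS):
        z = rng.gauss(0, 1)                          # test statistic when H0 holds
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        if p < ALPHA:
            hits += 1
    return hits

rng = random.Random(143)
studies = [count_false_positives(rng) for _ in range(5000)]
avg_hits = sum(studies) / len(studies)                    # expected: TESTS * ALPHA = 1
share_with_any = sum(h > 0 for h in studies) / len(studies)
exact_share = 1 - (1 - ALPHA) ** TESTS                    # assumes independent tests

print(f"average significant per study: {avg_hits:.2f}")
print(f"studies with at least one: {share_with_any:.3f} (theory {exact_share:.3f})")
```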

When running multiple tests on the same data, one way to compensate for this is to require a stricter level of significance (smaller P-value) for each of the tests done. Or you can ...

Try, try again.

Another way to check whether results suggested by exploring the data with multiple tests are real is to conduct a new study designed to test only the effects that appeared significant in the initial study.
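Here is a sketch of why a follow-up study helps. The setup is entirely hypothetical: 20 comparisons, where only the first has a real effect, modeled as a z statistic centred at 3.5 instead of 0. An exploratory study flags whatever is significant at the 5% level. A fresh study then retests only the flagged comparisons. Chance findings are confirmed only about 5% of the time, while the real effect is usually confirmed.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)   # about 1.96

# Hypothetical setup: 20 comparisons; only the first has a real effect.
SHIFTS = [3.5] + [0.0] * 19

def significant(shift: float, rng: random.Random) -> bool:
    """One two-sided z test whose statistic is Normal(shift, 1)."""
    return abs(rng.gauss(shift, 1)) > Z_CRIT

rng = random.Random(57)
flagged = {"real": 0, "null": 0}
confirmed = {"real": 0, "null": 0}
for _ in range(5000):                        # repeat the two-study process many times
    for shift in SHIFTS:
        kind = "real" if shift > 0 else "null"
        if significant(shift, rng):          # initial exploratory study
            flagged[kind] += 1
            if significant(shift, rng):      # new study testing only this comparison
                confirmed[kind] += 1

null_rate = confirmed["null"] / flagged["null"]
real_rate = confirmed["real"] / flagged["real"]
print(f"chance findings confirmed: {null_rate:.3f}, real effect confirmed: {real_rate:.3f}")
```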




File translated from TEX by TTH, version 2.78.
On 30 Oct 2000, 13:34.