Chapter 8: Correlation Confusion

One of the easiest errors to make in statistics is the correlation error. This error, often loosely called post hoc (more precisely, cum hoc ergo propter hoc), occurs when one sees a strong correlation between two variables and assumes that one variable must be causing the other. Often this is not the case. The correlation may be flawed or its interpretation may be faulty, and even when the correlation is legitimate, it should not be the sole basis for decision making. In the end, we must put any claimed relationship through strict scrutiny before we allow ourselves to agree with the statistical conclusions.

One possible explanation for a correlation between two variables is chance. It is quite possible for two variables to be correlated for no real reason. For example, one may find that the 100 meter Olympic times of the past 50 years correlate with the number of hurricanes occurring in the same years as those Olympics. These two variables, however, are obviously not related. Another, more common, source of chance correlation is a small sample size. Given a small enough sample, almost any type of correlation may arise. Some companies use this tactic in advertising, relying on small samples to suggest a correlation between their product and an appealing characteristic.
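The effect of sample size on chance correlation can be checked with a small simulation. The sketch below is illustrative only: it draws pairs of completely unrelated random variables and counts how often their correlation coefficient looks "strong". The function names, the threshold of 0.5, and the sample sizes of 5 and 100 are assumptions chosen for the example, not figures from any real study.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def share_of_strong_correlations(sample_size, trials=2000, threshold=0.5, seed=0):
    """Fraction of trials in which two independent variables show |r| >= threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    strong = 0
    for _ in range(trials):
        # xs and ys are generated independently, so any correlation is pure chance.
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        if abs(pearson_r(xs, ys)) >= threshold:
            strong += 1
    return strong / trials


small_sample_share = share_of_strong_correlations(sample_size=5)
large_sample_share = share_of_strong_correlations(sample_size=100)
print("share of strong chance correlations, n=5:  ", small_sample_share)
print("share of strong chance correlations, n=100:", large_sample_share)
```

With only five observations per variable, a substantial share of the unrelated pairs should clear the threshold, while with one hundred observations almost none should.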

Another possible reason for a perceived correlation is a lurking variable. Lurking variables are hidden variables that affect both of the observed variables, and they are more common than most would think. For example, a correlation could be found between the rise of Christianity in China and the rise in pollutants produced in China. However, there is a lurking variable: time. Time has allowed more Christians to enter China and has also allowed China to develop industrially.
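A lurking variable such as time can be illustrated with synthetic data. In the sketch below, two made-up series both grow with time but are otherwise unrelated; the series, their slopes, and their noise levels are invented for the example and do not describe real figures for China or anywhere else. Removing the time trend from each series (one simple way to control for the lurking variable, assumed here for illustration) should make the apparent correlation largely disappear.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def detrend(ts, ys):
    """Residuals of ys after subtracting a least-squares straight line in ts."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
             / sum((t - mean_t) ** 2 for t in ts))
    intercept = mean_y - slope * mean_t
    return [y - (intercept + slope * t) for t, y in zip(ts, ys)]


rng = random.Random(42)  # fixed seed so the run is repeatable
time_steps = list(range(200))

# Two hypothetical series: each depends on time plus its own independent noise.
series_a = [1.0 * t + rng.gauss(0, 10) for t in time_steps]
series_b = [3.0 * t + rng.gauss(0, 30) for t in time_steps]

raw_r = pearson_r(series_a, series_b)
adjusted_r = pearson_r(detrend(time_steps, series_a),
                       detrend(time_steps, series_b))

print("correlation of the raw series:          ", round(raw_r, 3))
print("correlation after removing the time trend:", round(adjusted_r, 3))
```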

A third statistical error to guard against is confusing which variable caused which. A study may set out to prove that the number of wrinkles one has affects one's age, and a strong correlation may be found. However, it is obvious in this case that age causes wrinkles, not the other way around. Sometimes the variables are so intertwined that it is impossible to tell which causes which. In either case, damage may be done by hastily assuming that one caused the other.
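Part of the reason direction is so easy to confuse is that the correlation coefficient itself is symmetric: it gives the same value whichever variable is treated as the "cause". The sketch below shows this on invented age and wrinkle data; the numbers and the relationship used to generate them are hypothetical.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical data: wrinkle counts are generated FROM age, plus noise.
ages = [rng.randint(20, 80) for _ in range(50)]
wrinkles = [0.5 * age + rng.gauss(0, 5) for age in ages]

r_age_then_wrinkles = pearson_r(ages, wrinkles)
r_wrinkles_then_age = pearson_r(wrinkles, ages)

# The two values agree, so r alone cannot say which variable drives the other.
print("r(age, wrinkles):", round(r_age_then_wrinkles, 4))
print("r(wrinkles, age):", round(r_wrinkles_then_age, 4))
```

The direction has to come from outside knowledge, such as how the data were generated or which event came first.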

Sometimes there may in fact be a genuine relationship between two variables, but it is still important not to extend the correlation too far. A correlation between health and the amount of exercise one gets may point towards exercising more. However, too much exercise is unhealthy, as it does not give the body time to recover and build new tissue and muscle. Even within the range for which the correlation holds, it is important to think critically. There may be a correlation between some desirable outcome and some action, but that does not mean one should always take the action. For example, there may be a correlation between moving to Boston and making more money, but there will still be exceptions, and it is foolish to think that moving to Boston will automatically mean that one will make more money. Judgement and rationality should always be an integral part of decision making.
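The danger of extending a correlation beyond the range in which it was observed can be sketched with a toy model. The "health score" curve below is entirely made up for illustration (it rises with weekly exercise hours up to a point and then falls); it is not a physiological model. A straight line fitted to the low-exercise range fits well there but gives a badly wrong answer when extrapolated.

```python
def toy_health_score(hours):
    """Hypothetical curve: health improves up to 10 hours a week, then declines."""
    return 100 - (hours - 10) ** 2


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Observe only people who exercise 0 to 8 hours a week.
observed_hours = list(range(0, 9))
observed_scores = [toy_health_score(h) for h in observed_hours]

slope, intercept = fit_line(observed_hours, observed_scores)

# Extrapolate the fitted line far outside the observed range.
extreme_hours = 20
predicted_score = intercept + slope * extreme_hours
actual_score = toy_health_score(extreme_hours)

print("fitted slope (score per extra hour):", slope)
print("line's prediction at 20 hours:", predicted_score)
print("toy model's value at 20 hours:", actual_score)
```

Within the observed range the positive slope is a fair summary; at 20 hours the line predicts a large gain where the toy curve has fallen back to where it started.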

When dealing with statistics, one must always be skeptical. Statistics are often misleading, and decisions should never be based solely upon one piece of statistical evidence. Even if there is a relationship between two variables, there are often many exceptions, and other knowledge should be used to come to a decision.


For more examples check out:

Smoking and Income

Tiger Woods Driving Accuracy 

Presidents and Job Loss

Children Living at Home