Correlation Does Not Prove Causation, But Correlation Does Prove Correlation

It took a surprisingly long time for the modern statistical concept of correlation to emerge. The idea had long been implicit, but Francis Galton didn't work out the basics until 1888 (when, by the way, he was 66 years old).

So, it's not surprising that people aren't really good yet at thinking about correlation.

In recent years, the cliche "Correlation does not prove causation" has emerged as a staple of Internet discussions, which I guess is a good thing, although it often appears in the more questionable form "Correlation does not imply causation."

In truth, correlation suggests causation. If A correlates with B, then perhaps A causes B. Or maybe B causes A. Quite possibly, some C causes both A and B, or various combinations.
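To make the "some C causes both A and B" case concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the variables `a`, `b`, and `c` are simulated random numbers, not real data, and `pearson_r` is a hand-rolled helper rather than a library call. C feeds into both A and B, while A and B never influence each other.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so reruns give the same numbers
n = 5000

# Hypothetical setup: C drives both A and B; A and B never touch each other.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

r_ab = pearson_r(a, b)

# Subtract C's contribution and correlate what is left over.
a_resid = [ai - ci for ai, ci in zip(a, c)]
b_resid = [bi - ci for bi, ci in zip(b, c)]
r_resid = pearson_r(a_resid, b_resid)

print(f"r(A, B) = {r_ab:.2f}")
print(f"r(A, B) with C removed = {r_resid:.2f}")
```

Under these assumptions the theoretical correlation between A and B is 0.5, even though neither causes the other, and it should drop to roughly zero once C's contribution is removed. The correlation is perfectly real; only the causal story behind it is up for grabs.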

Now, it could be that a finding that A correlates with B to some extent is just a coincidence due to a limited sample size. Fortunately, we have good statistical techniques for measuring that possibility, and we have the test of replication. Similarly, attempts at replication can help weed out apparent correlations caused by incompetence, fraud, unconscious bias, and the like, although those can never be ruled out completely.
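The sample-size worry can be illustrated with a rough sketch, again in Python and again on simulated numbers rather than real data. The helper names (`pearson_r`, `share_of_flukes`), the 0.5 cutoff, and the sample sizes of 10 and 200 are all arbitrary choices of mine. The sketch draws two variables that are unrelated by construction, over and over, and counts how often they look fairly strongly correlated anyway.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def share_of_flukes(sample_size, trials, threshold, rng):
    """Fraction of trials in which two independent variables show |r| > threshold."""
    flukes = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        if abs(pearson_r(xs, ys)) > threshold:
            flukes += 1
    return flukes / trials


rng = random.Random(1)  # fixed seed so reruns give the same numbers
small = share_of_flukes(sample_size=10, trials=2000, threshold=0.5, rng=rng)
large = share_of_flukes(sample_size=200, trials=2000, threshold=0.5, rng=rng)

print(f"n = 10:  share of trials with |r| > 0.5 = {small:.3f}")
print(f"n = 200: share of trials with |r| > 0.5 = {large:.3f}")
```

With only 10 observations, a noticeable share of the trials clears the cutoff by pure luck; with 200, essentially none do. Each rerun with a fresh seed is also a crude stand-in for a replication attempt: a fluke found in one small sample rarely shows up again in the next.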

But, keeping those caveats in mind, we can say (with only a reasonable degree of overstatement):

Correlation does prove correlation.

For example, illegal immigration is correlated with many important social measures. This doesn't prove that illegal immigration "causes" high or low school achievement. As we've all heard, "Correlation does not prove causation!"

But, few have heard, "Correlation does prove correlation."

For example, it does prove (to the extent that anything can be proved) that illegal immigration is correlated with low school achievement. Moreover, the children and grandchildren and great-grandchildren of illegal immigrants tend to have below-average school performance.

Furthermore, these correlations have been around for as long as they've been measured.

Now, it is conceivable that these correlations will vanish tomorrow. On that basis, many insist that the burden of proof must rest on those pointing out the correlations, who are expected to prove causality beyond any doubt.

But, shouldn't the burden of proof be on the people asserting that the correlations will vanish to come up with at least a prima facie theory of why that will happen?
