An Econometrics Lesson

Friday, June 20th, 2008

Arnold Kling offers An Econometrics Lesson:

I received an email from a reader who was very excited to find that over the past 70 years the correlation between excess health care inflation (the price of health care relative to the overall CPI) and the proportion of health care spending paid for by third parties was 0.92 (out of a maximum of 1.00).

I wrote back saying that correlation does not imply causation. He replied that he understood that, but still, with a correlation that high there must be something.

I’m sorry, but the inability to infer causation from correlation has nothing to do with the size of the correlation coefficient. It reflects the process generating the data. In a controlled experiment, you often can say something about causation. When you just observe some data, you cannot.

In addition, time series data (data that cover long time periods) are very subject to spurious correlation. Over time, data tend to follow trends. Any two trends are automatically correlated, whether there is a causal relationship or not.
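To make the point about trends concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from Kling's post: the two series are simulated, not real health care data, and the slopes, noise levels, and seed are arbitrary. The series share nothing except that each drifts upward over 70 "years", yet their levels come out highly correlated. Once you look at year-to-year changes instead, which removes the trend, the apparent relationship should largely vanish.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def diff(x):
    """Year-to-year changes; differencing removes a linear trend."""
    return [later - earlier for earlier, later in zip(x, x[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(70)

# Hypothetical series: each is its own linear trend plus independent noise.
# Neither one is generated from the other.
series_a = [1.0 * t + rng.gauss(0, 5) for t in years]
series_b = [0.3 * t + rng.gauss(0, 2) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(diff(series_a), diff(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The high correlation in levels here comes entirely from the shared direction of drift, which is exactly the situation a 70-year comparison of two rising series is in.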

When you look at data over time, it is important to ask yourself how many data points you really have. With a strong trend, you probably should just think of yourself as having two data points — the beginning and the end point. If there are a few sharp swings in the data, then you might have three or four effective data points. The fewer the number of effective data points, the harder it is to distinguish among alternative sources of causality.
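One rough way to put a number on "effective data points" is sketched below. It uses a common rule-of-thumb adjustment for first-order autocorrelation, n_eff = n * (1 - r1 * r2) / (1 + r1 * r2), where r1 and r2 are the lag-1 autocorrelations of the two series. This formula is an assumption brought in for illustration, not something Kling's post proposes, and it is only an approximation. The function names and the simulated inputs are likewise hypothetical.

```python
import random


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation: how strongly each value predicts the next."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def effective_n(x, y):
    """Rule-of-thumb effective sample size for correlating two series.

    Assumes roughly first-order autocorrelation in both series; treat the
    result as a ballpark figure, not an exact count.
    """
    n = len(x)
    p = lag1_autocorr(x) * lag1_autocorr(y)
    return n * (1 - p) / (1 + p)


n = 70
# Two pure trends: 70 observations, but almost no independent information.
trend_a = [1.0 * t for t in range(n)]
trend_b = [0.3 * t for t in range(n)]
n_eff_trend = effective_n(trend_a, trend_b)

# Two trendless noise series: each observation is close to independent.
rng = random.Random(1)  # fixed seed so the run is repeatable
noise_a = [rng.gauss(0, 1) for _ in range(n)]
noise_b = [rng.gauss(0, 1) for _ in range(n)]
n_eff_noise = effective_n(noise_a, noise_b)

print(f"effective points, trending series:  {n_eff_trend:.1f}")
print(f"effective points, trendless series: {n_eff_noise:.1f}")
```

Under these assumptions, 70 annual observations of two strongly trending series behave like only a handful of independent points, while trendless series keep most of their nominal sample size.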

That is why most macro-econometrics is junk science.
