Sunday, December 23, 2012

Correlation and Causation

So true
Funny how it seems
Always in time
But never in line for dreams
--Spandau Ballet

Several times recently I have read statements claiming that correlation does not mean causation. Quite true. One morning a few months back I noted a small puddle of water on my basement floor near a corner wall. It had been raining heavily through the previous night. "Rain must be getting in somewhere," I thought.

Before donning rain gear and heading outside, I decided to inspect the area above the puddle, including the various pipe runs nearby. That's when I spied a drop of water hanging from the copper tubing that runs into the hot water heater. On closer examination, I found that a slight leak had developed at a joint connecting two pieces of tubing, and water was occasionally squeezing through the joint and dripping to the floor.

I had mistakenly presumed that the puddle on the floor was caused by the rain. Seemed reasonable--particularly because the rain was heavy and there hadn't been much rainfall in weeks. Because the puddle and rain showed up at the same time, their presence was correlated ('co-related'). But the cause of the floor puddle turned out to be something else, rendering the correlation between rain and puddle meaningless (what stats folks sometimes refer to as 'spurious').

Whenever a dependent variable is regressed on one or more independent variables, the results carry the limitation that causality cannot be inferred from the data alone. Even if a strong relationship between X and Y is detected, we don't know whether X caused Y or vice versa, or whether one or more unaccounted-for variables render the observed X/Y correlation spurious.

In empirical studies, analysts can try to reduce the potential for false conclusions by 'controlling' for other variables that might influence the relationships of interest. In most social settings, however, it is difficult to control for the myriad variables that might influence the findings.
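To make the 'unaccounted-for variable' problem concrete, here is a minimal Python sketch on simulated data; every number in it is an assumption chosen for illustration, not an estimate from any study. A hypothetical lurking variable `z` drives both `x` and `y`, and neither of them affects the other. The raw correlation between `x` and `y` comes out strong, yet 'controlling' for `z` by removing its linear influence from both (a partial correlation) pushes the association to roughly zero. The catch is that this only works when `z` has been identified and measured.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """Residuals of an ordinary least-squares fit of ys on zs (with intercept)."""
    mz, my = mean(zs), mean(ys)
    slope = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum(
        (z - mz) ** 2 for z in zs
    )
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]


def partial_corr(xs, ys, zs):
    """Correlation of x and y after removing the linear influence of z from both."""
    return pearson(residuals(xs, zs), residuals(ys, zs))


# Simulated data: z is a hypothetical confounder; x and y each depend on z only.
rng = random.Random(42)
zs = [rng.gauss(0, 1) for _ in range(5000)]
xs = [2.0 * z + rng.gauss(0, 1) for z in zs]
ys = [3.0 * z + rng.gauss(0, 1) for z in zs]

raw = pearson(xs, ys)                  # strong; about 0.85 in expectation
controlled = partial_corr(xs, ys, zs)  # near zero once z is controlled for
print(f"raw r = {raw:.2f}, r controlling for z = {controlled:.2f}")
```

Note also that `pearson(xs, ys)` equals `pearson(ys, xs)`: the coefficient is symmetric, so on its own it cannot say whether x drives y or the reverse.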

Unfortunately, much social science research has been moving in the direction of empirical, multivariate studies. Because most of these studies explicitly or implicitly seek to determine or verify cause-and-effect relationships, the validity of their findings is usually severely limited. Lacking the careful controls of the lab bench, empirical studies are generally incapable of producing highly convincing cause-and-effect findings.

Moreover, simple linear models in which X is thought to cause Y may misspecify the true relationships between variables. Senge's (1990) work demonstrated that real-life systems are shaped by reinforcing and balancing feedback loops, which blur the direction of causation between variables.
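A toy simulation shows why feedback loops undermine a one-way 'X causes Y' story. The sketch below is a hypothetical two-variable model built purely for illustration (it is not taken from Senge): each period X changes in proportion to Y, and Y then changes in proportion to the updated X. The gains are arbitrary assumptions. With same-sign gains the loop is reinforcing and both variables grow together; with opposite signs it is balancing and they cycle up and down. In either case X affects Y and Y affects X, so asking which one is 'the cause' misses the structure.

```python
def simulate_loop(gain_on_x, gain_on_y, steps=100, x0=1.0, y0=1.0):
    """
    Hypothetical two-variable feedback loop (illustrative, not from Senge 1990).

    Each step, X changes by gain_on_x * Y, then Y changes by gain_on_y times
    the updated X. Same-sign gains give a reinforcing loop; opposite signs
    give a balancing loop.
    """
    xs, ys = [x0], [y0]
    for _ in range(steps):
        x = xs[-1] + gain_on_x * ys[-1]  # Y pushes on X
        y = ys[-1] + gain_on_y * x       # the updated X pushes on Y
        xs.append(x)
        ys.append(y)
    return xs, ys


# Reinforcing loop: X and Y amplify each other and both keep growing.
rx, ry = simulate_loop(0.1, 0.1)

# Balancing loop: X raises Y, but Y pulls X down, so both cycle up and down.
bx, by = simulate_loop(-0.1, 0.1)

print(f"reinforcing: x goes from {rx[0]:.1f} to {rx[-1]:.1f}")
print(f"balancing: x ranges from {min(bx):.2f} to {max(bx):.2f}")
```

Updating X first and letting Y respond to the new X is a modeling choice; it keeps the balancing case bounded instead of slowly spiraling outward.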

Given the empirical difficulties, how can accurate conclusions about causality be reached? One way is to favor reasoning over statistics when conducting analysis. For example, the Austrian school of economics employs praxeology, the study of human action grounded in a relatively small set of axioms about nature and human behavior, to deduce economic relationships.

Similarly, we can deduce that, because people generally prefer leisure to work, and less work to more work, they will be prone to act in ways that permit acquisition of more income (physical or psychic) with the least amount of effort. The prospect of getting something for nothing is the primary cause of most political behavior because people recognize that their situations can be made better on the backs of the productive effort of others.

We should also note that numerical data are not totally useless in studying causality. In fact, longitudinal series of proposed dependent variables that include known points where independent variables changed can be an effective way to investigate cause and effect.

For example, the 200+ year series of federal debt and deficits (two dependent variables) shown a few posts back includes labels for several possible independent variables thought to influence the levels and trends of the proposed dependent variables. One of them is war. Because several wars occur over that time span, and debt and deficits move in similar directions each time, there can be little doubt that war causes higher debt and spending. These patterns are consistent with deductive reasoning that predicts similar outcomes.
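The 'turned on and off' logic can be sketched in a few lines. The Python below uses a purely synthetic series, not the actual federal debt or deficit data, and the episode positions, effect size, and window length are all arbitrary assumptions. For each hypothetical episode it compares the series' average during the episode with its average just before; the evidence comes from the direction of the change repeating across every episode, not from any single jump.

```python
import random


def episode_changes(series, episodes, window=5):
    """
    For each (start, end) episode, return the mean of the series during the
    episode minus its mean over the `window` periods just before it.
    """
    changes = []
    for start, end in episodes:
        before = series[max(0, start - window):start]
        during = series[start:end]
        changes.append(sum(during) / len(during) - sum(before) / len(before))
    return changes


# Synthetic series: a noisy baseline that jumps while a hypothetical
# "treatment" (e.g. a war) is switched on. None of these numbers are real.
rng = random.Random(7)
episodes = [(20, 25), (50, 55), (90, 96), (140, 150)]
series = []
for t in range(200):
    on = any(start <= t < end for start, end in episodes)
    series.append(2.0 + (5.0 if on else 0.0) + rng.gauss(0, 0.5))

changes = episode_changes(series, episodes)
rises = sum(1 for c in changes if c > 0)
print(f"{rises} of {len(episodes)} episodes show a rise")
```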

Longitudinal data are particularly useful when there is a lag between changes to independent variables and changes to the dependent variable. Back to the debt/deficit chart, note that the labels marking the institution of the Fed (1913) and Nixon's closing of the gold window (1971) were followed, after an initial lag, by large increases in debt and deficits. This lag could have been deduced ahead of time, since new institutional arrangements and routines take time to establish. We should expect some lag between cause and effect here.
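When a lag is suspected but its length isn't known, one simple exploratory step is to correlate the candidate cause with the outcome shifted forward by different amounts and see where the association peaks. The sketch below does this on simulated data in which the outcome responds to a hypothetical driver after an assumed four-period delay; the delay, noise levels, and series length are illustrative assumptions. A peak at some lag is consistent with a delayed effect, but it is still a correlation and carries the same limitations discussed above.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def best_lag(xs, ys, max_lag):
    """Return the lag k in 0..max_lag that maximizes corr(xs[t], ys[t + k]),
    along with the correlation found at every lag."""
    scores = {k: pearson(xs[:len(xs) - k], ys[k:]) for k in range(max_lag + 1)}
    return max(scores, key=scores.get), scores


# Simulated data: y responds to x only after a hypothetical 4-period delay.
rng = random.Random(3)
true_lag = 4
xs = [rng.gauss(0, 1) for _ in range(400)]
ys = [
    (0.8 * xs[t - true_lag] if t >= true_lag else 0.0) + rng.gauss(0, 0.5)
    for t in range(400)
]

lag, scores = best_lag(xs, ys, max_lag=10)
print(f"estimated lag: {lag} (r = {scores[lag]:.2f})")
```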

On the other hand, pinning the change in debt and deficits on these two independent variables is not as convincing as the war variable, which was 'turned on and off' multiple times. To gain more understanding of the effect of the Fed and removal of the gold standard, it would be nice to dismantle the Fed and go back on the gold standard, and then study changes to federal debt and deficits. We can only hope for such an opportunity!

Potentially, then, there is some causal information available from time series data. Studying US homicide rates in relation to changes in gun control laws and gun ownership can yield some sense of cause and effect. When US homicide rates decline following weaker gun laws and increasing gun ownership, that is consistent with theoretical predictions of causality.

Of course, studying additional periods of changing gun control laws and gun ownership would be more convincing. While hard to do with US data alone, it is possible to do so by looking at changes in gun control laws and gun ownership in other countries. I am pretty sure such research has been done, but I have only been privy to snippets here and there.

Thus far, the snippet data are consistent with the pattern observed in the US data. The lower the gun controls and the higher the gun ownership, the lower the violent crime. The more situations where such a pattern holds, the greater the evidence of causality.
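The idea that each additional case fitting the pattern adds evidence can be put in rough numbers with a sign test. The sketch assumes, hypothetically, that each case is independent and that without any real effect the pattern would be equally likely to go either way (a coin flip). Under those assumptions it computes the chance of seeing at least `k` agreeing cases out of `n` by luck alone. The counts in the example are invented for illustration, not tallies from any study, and independence is a strong assumption for countries that share trends.

```python
from math import comb


def sign_test_p(k, n):
    """
    One-sided sign test: probability of at least k of n independent cases
    pointing the same way if each case were a 50/50 coin flip.
    """
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Hypothetical counts: how quickly does luck become an unlikely explanation?
for k, n in [(3, 3), (5, 5), (8, 10), (10, 10)]:
    print(f"{k} of {n} agree: p = {sign_test_p(k, n):.4f}")
```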

References

Senge, P. 1990. The fifth discipline. New York: Doubleday.
