Marek J Druzdzel

I think that, outside of some really puzzling correlations (e.g., the divorce rate in Maine and per capita consumption of margarine), there is always a causal graph that explains the correlation. I take it that the question refers to observing a pair of variables that are correlated but not directly causally related. In such cases, there is usually some causal connection (e.g., a common cause or an observed common effect) that explains the correlation.

Here are three examples that I use in teaching causal graphs and, more generally, the relationship between causality and probability. They use probabilistic dependence, which is more general than correlation: correlation is a measure of linear dependence, and dependence does not have to be linear.
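To make the difference between dependence and correlation concrete, here is a minimal sketch in Python. The setup (a variable x drawn uniformly from [-1, 1] and y = x²) and the helper name `pearson` are illustrative assumptions, not part of the examples that follow. Here y is completely determined by x, yet their linear correlation is close to zero.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Illustrative assumption: x is symmetric around zero and y = x^2.
xs = [rng.uniform(-1, 1) for _ in range(100_000)]
ys = [x * x for x in xs]

# y depends on x deterministically, but not linearly.
r_linear = pearson(xs, ys)

# The dependence shows up once we look at |x| instead of x.
r_abs = pearson([abs(x) for x in xs], ys)

print(f"corr(x, y)   = {r_linear:.3f}")
print(f"corr(|x|, y) = {r_abs:.3f}")
```

The first number should come out near zero and the second close to one, even though both describe the same pair of variables.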

(1) Ice cream consumption and drowning. The two are strongly correlated, but the correlation disappears when we condition on the outside temperature (their common cause). Conditioning amounts to looking at hot days and cold days separately.
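The ice cream example can be sketched as a small simulation. All probabilities below are made up for illustration, and temperature is simplified to hot versus cold. Ice cream and drowning each depend only on temperature, never on each other, yet they look dependent until we split the days by temperature.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def bernoulli(p):
    """Return True with probability p."""
    return rng.random() < p

# Each day is a tuple (hot, ice_cream, drowning).
# Hypothetical numbers: both effects depend only on temperature.
days = []
for _ in range(200_000):
    hot = bernoulli(0.5)
    ice_cream = bernoulli(0.8 if hot else 0.2)
    drowning = bernoulli(0.10 if hot else 0.01)
    days.append((hot, ice_cream, drowning))

def drowning_rate(rows):
    """Fraction of days in rows on which a drowning occurred."""
    return sum(d for _, _, d in rows) / len(rows)

def gap(rows):
    """P(drowning | ice cream) - P(drowning | no ice cream) within rows."""
    with_ice = [r for r in rows if r[1]]
    without_ice = [r for r in rows if not r[1]]
    return drowning_rate(with_ice) - drowning_rate(without_ice)

overall_gap = gap(days)                          # all days pooled
hot_gap = gap([r for r in days if r[0]])         # hot days only
cold_gap = gap([r for r in days if not r[0]])    # cold days only

print(f"all days:  {overall_gap:+.3f}")
print(f"hot days:  {hot_gap:+.3f}")
print(f"cold days: {cold_gap:+.3f}")
```

The pooled gap should be clearly positive, while the gaps within hot days and within cold days should both be close to zero.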

(2) Negative correlation between being a good surgeon and operation statistics. It turns out that the best surgeons do not have the best survival statistics (i.e., they have a higher patient death rate than average or poor surgeons). This is explained by a common cause: the difficulty of the case. The best surgeons get the hardest cases. They are good, and on cases of the same difficulty they perform best (so, when you condition on the difficulty of the case, their statistics are very good), but the patients they take on are simply more likely to die.
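The surgeon example can be checked with plain arithmetic. The case mix and death rates below are hypothetical, chosen only to show the pattern: the best surgeon has the lower death rate at every level of difficulty but the higher death rate overall, because most of that surgeon's cases are hard ones.

```python
# Hypothetical case mix: fraction of each surgeon's cases that are hard.
p_hard_given = {"best": 0.8, "average": 0.2}

# Hypothetical death rates by (surgeon, difficulty).
# The best surgeon is better at every level of difficulty.
p_death = {
    ("best", "hard"): 0.30,
    ("best", "easy"): 0.02,
    ("average", "hard"): 0.45,
    ("average", "easy"): 0.05,
}

def overall_death_rate(surgeon):
    """Death rate pooled over difficulty (law of total probability)."""
    p_hard = p_hard_given[surgeon]
    return (p_hard * p_death[(surgeon, "hard")]
            + (1 - p_hard) * p_death[(surgeon, "easy")])

for surgeon in ("best", "average"):
    print(f"{surgeon:8s} overall: {overall_death_rate(surgeon):.3f}  "
          f"hard: {p_death[(surgeon, 'hard')]:.2f}  "
          f"easy: {p_death[(surgeon, 'easy')]:.2f}")
```

With these numbers the pooled death rate is 0.244 for the best surgeon and 0.130 for the average one, even though the best surgeon does better on hard cases (0.30 vs. 0.45) and on easy cases (0.02 vs. 0.05).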

(3) Negative correlation between heart problems and lung problems among patients in intensive care units (ICUs). This is harder to explain, as it involves conditional dependence rather than conditional independence. Landing in an ICU has two strong causes: problems with breathing (lungs) and circulatory problems (heart). ICU patients are likely to have either problem, but if they have heart problems, these already explain why they landed in the ICU and, hence, make lung problems less probable. This relation is symmetric, so learning that an ICU patient has lung problems somewhat decreases the probability that the same patient also has heart problems (compare the first to the second picture below). Conditional dependence, which occurs when we condition on a common effect of multiple causes, is much harder to understand intuitively, but it is as important as conditional independence. It explains induced correlations that occur in almost every data set collected, as every collection is conditioned on something.
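The ICU example can be worked out exactly by enumerating a tiny three-variable model. Everything numeric here is an assumption made for illustration: heart and lung problems are independent with a prior of 0.1 each, and ICU admission follows a noisy-OR rule. This is not taken from the GeNIe model behind the pictures.

```python
from itertools import product

# Hypothetical priors: the two problems are independent in the population.
P_HEART = 0.1
P_LUNG = 0.1

def p_icu(heart, lung):
    """Noisy-OR admission probability (assumed parameters)."""
    p_not_admitted = 0.99          # 1% of patients are admitted for other reasons
    if heart:
        p_not_admitted *= 0.2      # heart problems alone admit with prob. 0.8
    if lung:
        p_not_admitted *= 0.2      # lung problems alone admit with prob. 0.8
    return 1 - p_not_admitted

def joint(heart, lung, icu):
    """Joint probability of one full assignment (heart, lung, icu)."""
    p = (P_HEART if heart else 1 - P_HEART) * (P_LUNG if lung else 1 - P_LUNG)
    p_in = p_icu(heart, lung)
    return p * (p_in if icu else 1 - p_in)

def prob(event, given):
    """P(event | given), by summing the joint over all eight states."""
    states = list(product([False, True], repeat=3))
    denominator = sum(joint(*s) for s in states if given(*s))
    numerator = sum(joint(*s) for s in states if given(*s) and event(*s))
    return numerator / denominator

def has_lung(h, l, i):
    return l

p_lung_prior = prob(has_lung, lambda h, l, i: True)
p_lung_given_heart = prob(has_lung, lambda h, l, i: h)
p_lung_in_icu = prob(has_lung, lambda h, l, i: i)
p_lung_in_icu_heart = prob(has_lung, lambda h, l, i: i and h)

print(f"P(lung)              = {p_lung_prior:.3f}")
print(f"P(lung | heart)      = {p_lung_given_heart:.3f}")
print(f"P(lung | ICU)        = {p_lung_in_icu:.3f}")
print(f"P(lung | ICU, heart) = {p_lung_in_icu_heart:.3f}")
```

In the general population, learning about heart problems tells us nothing about the lungs (both probabilities are 0.1). Among ICU patients under these assumptions, the probability of lung problems is about 0.50, and it drops to about 0.12 once we learn that the patient has heart problems.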

If you want to play with these ideas, I recommend Bayesian networks, which, while a mathematical formalism, can be interpreted causally and allow for creating models and testing what happens at the intersection of probability and causality. My favorite Bayesian network package, from which I took the above pictures, is GeNIe. I’m rather biased here, as I took an active role in developing it over twenty or so years in my research group at the University of Pittsburgh. GeNIe is currently a commercial product but is still available free of charge to academic users.
