Correlation versus Causation: The Science, Art, and Magic of Experimental Design


Recently I was chastised as ‘unnecessarily obscure’ for reacting to a specious conclusion by suggesting that it risked ‘conflating correlation with causation’. Guilty as charged!  I apologize: the expression is quite a mouthful and requires unraveling for those unfamiliar with the nuances of the applied experimental method.  However, I feel passionately that this is a concept we all need to understand better, so a post unraveling it seemed appropriate.  Even those of us who grasp the central notion need to remind ourselves constantly of its logic, nuances, and pernicious symptoms.

Improperly ‘conflating causation with correlation’ is a central but often overlooked danger in business analysis and data science initiatives.  Especially with ‘Big Data’ sets, analysis will often reveal patterns that suggest a causal element but are in fact only correlative (co-occurring) phenomena, or worse, ‘phantom phenomena’ (i.e. coincidences, or artifacts of a limited or unintentionally biased or skewed dataset).

[Of note, this post examines the theoretical and historical foundations of experimental model design.  If you are seeking focused technical information on computational / analytical / data science approaches to experimental model design, please see the companion post: Data science as an experimental process: unsupervised and supervised learning.]



Some practical examples of mistaking correlation for causation: a recent letter to the editor in the INFORMS society magazine by Dr. John Crocker, entitled “Numbers don’t lie and other myths”, raised two excellent examples of improper causal attribution (Crocker, June 2013).  In one case, he noted that a recent article claimed that suffering hair loss predisposes one to migraine headaches.  This is an example of correlation, not causation.  While there may be a statistical correlation between the two phenomena (baldness and migraines), this is not a license to conclude that one ‘causes’ the other, merely that they have a propensity to co-occur.  Such an observation suggests there is likely a more fundamental phenomenon at play (e.g. a genetic predisposition to higher testosterone levels, which leads both to hair loss and to greater stress, leading to high blood pressure, which predisposes one to migraines).

In another case, Crocker cites a ‘mathematician’ who claimed he could predict winning horses.  The ‘mathematician’s’ conclusion was based on an analysis of a sample of 176 races: winning horses, he concluded, statistically have a single name of between eight and ten letters.  This is an example of phantom phenomenon: in a small dataset, an observed correlation emerged by coincidence which had no causal bearing on the core phenomenon (winning races).

The phenomenon of phantom or spurious correlations (external, ancillary, or tangential to a causal hypothesis) can also occur in large datasets: indeed, a growing discussion in the discipline of data science notes that the larger the dataset (particularly many variables combined with many instances), the greater the propensity for spurious correlations to be observed.  Thus, one may find that vegetarians are less likely to be late for work, that car owners are more likely to own a dog, that Baptists often own a barbecue, or that gun owners are more likely to keep a garden.  These quickly fall into ‘so what?’ or, more appropriately, ‘yes… but then what, so what else, and why?’ buckets.
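The hazard scales with the number of variable pairs an analyst (or algorithm) is free to scan.  A minimal Python sketch (with entirely made-up, purely random ‘attributes’) illustrates how a small sample plus many variables all but guarantees an impressive-looking correlation:

```python
# Sketch: spurious correlations emerge by chance alone. We generate many
# *independent* random variables for a small sample, then search for the
# strongest pairwise correlation -- exactly what an unguided data-mining
# pass would do.
import random

random.seed(42)

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

n_people, n_vars = 30, 40  # small sample, many attributes
data = [[random.gauss(0, 1) for _ in range(n_people)] for _ in range(n_vars)]

# Scan all 780 variable pairs for the strongest 'pattern'
best = max(
    (abs(pearson(data[i], data[j])), i, j)
    for i in range(n_vars) for j in range(i + 1, n_vars)
)
print(f"strongest 'pattern' found: |r| = {best[0]:.2f} "
      f"between variables {best[1]} and {best[2]}")
```

Since each pair’s sample correlation fluctuates widely with only 30 observations, the best of the 780 candidate pairs is almost always ‘strong’, despite there being nothing real to find.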

There may be genuine correlation between two phenomena, but the observation of correlation begs both: 1) rigorous consideration of the data sample and possible embedded sample biases (e.g. we analyzed only state employees in Alabama), and 2) consideration of underlying phenomena at play (e.g. rural residents have more land, on which there tends to be a garden; they also own more guns, as they have more opportunities to shoot as a hobby and feel a need to protect their land; thus, missing causal variables or external phenomena).  This process of second-guessing and refining causal hypotheses is at the core of a robust analytics process.

At root is the propensity of the human brain to seek patterns in the midst of complexity.  As clarified by the research of Nobel Prize-winning psychologist Daniel Kahneman (awarded in Economic Sciences), covered in his recent book ‘Thinking, Fast and Slow’, our brains evolved to survive, not necessarily to be ‘right’ (i.e. to determine fundamental causes).  After all, it is enough to know not to eat uncooked pork without having a detailed understanding of trichinosis.

Likewise, we can navigate using the stars without a fundamental understanding of astrophysics.  Indeed, for the bulk of human history the stars were a navigational aid for people with no notion of outer space, or even that the earth was round and itself revolved around a star (the sun).  Thus, we can go quite far without knowing the details, and indeed many businesses proceed in this practical fashion: navigating by the stars while never taking the extra step to an understanding of the earth’s place in the universe.

However, while we can say that washing one’s hands before operating is a good thing to do, not knowing about bacteria and viruses makes it easy to forget to wash the surgical instruments as well.  Likewise, placing a vat of sugary grape juice in the open air for a few weeks may lead to wine, but understanding the action of fermentation via yeast brings a whole new level of ability to control and operationalize fermentation operations.

In dangerous situations, or when we are under pressure to make a decision, Kahneman has shown that we take ‘cognitive shortcuts’: we sift through a large set of data (in the form of impressions and sensory input) to decide and act quickly.  This propensity he terms ‘System 1’, our evolutionary intuitive executive that tells us to run, throw, or jump without lengthy ‘academic’ analysis or study.  You have likely seen articles and derivative research building on Kahneman’s work under the topic of ‘behavioral biases’.

Unfortunately, the same propensity to take the System 1 shortcut exists when we are tired, overwhelmed by complex circumstances, or simply too rushed or lazy to invest thought in a problem.  System 1, associated with intuition and quick conclusions, can be powerful and efficacious under duress (e.g. the young bears playing in the woods mean an angry momma bear is likely nearby: run!), but it also leads us to formulate rapid and specious theories (e.g. because the rooster crows at sunrise each morning, the rooster makes the sun rise; therefore we cannot kill the rooster and must celebrate it as a sun deity).  False syllogisms quickly abound: all the swans I see are white, therefore there are only white swans.

Kahneman’s System 2, the more structured of our two decision modes, helps us to consider difficult problems ‘scientifically’.  An example would be formulating theories concerning soil fertility, weather, and seasonal variations in order to engage in productive agriculture (suggesting farmers were the original scientists of human history).  System 1 rushes us to conclusions such as “when I sacrifice a goat on an altar in August, there is a resulting prolonged ‘Indian summer’ which allows me extra time to harvest the corn”.  There is little concern for the difference between correlation and causation (i.e. there is typically an ‘Indian summer’ in August in my locale, whether I sacrifice a goat or not).

The tendency to find ‘magical’, superstitious, and ritual examples of ‘improperly conflating correlation with causation’ is not coincidental.  Some have suggested that science itself is a highly refined version of a pattern-finding propensity inherent in humans.  Whereas the process of scientific inquiry can give rise to ‘practical truths’ (i.e. theories that are reliable over time and have efficacy in implementation, as in medicine and engineering), spurious pattern-seeking easily leads to superstition, magic, and ritual (even in modern times).  For instance, the idea that sickness was the result of bad humors and bile led to the practice of extracting blood via leeches, whereas bloodletting is of dubious medical value (although, surprisingly, subsequent experimentation has shown that applying leeches may actually boost the immune system due to the biochemical action of the leech itself).

In another example, although the role of yeast in fermentation was only demonstrated by Louis Pasteur in 1857, humans have been making alcohol for at least 9,000 years.  The genesis of alcohol was likely first attributed to some magical source, whereby the juice would transmute provided a particular ‘ritual’ was carried out (honoring the spirits, preparing and storing the juices in a particular way).  Indeed, in some traditional open-air fermentation breweries in Belgium, the proprietors still refuse to clean or change anything in the brewery for fear of upsetting some minute factor which might change the outcome.

There is no better encyclopedia of this propensity to devolve to our superstitious and magical mind than ‘The Golden Bough: A Study in Magic and Religion’ by Sir James Frazer.  His twelve (!) volume masterwork traces mythic, superstitious, and religious thought from ancient times into 19th-century Europe.  In particular, his thesis cites the frequent overlap between magical and superstitious beliefs and primitive attempts at scientific theory-building.

Frazer distinguishes between sympathetic magic (whereby something resembling something else has power over it, such as a wooden doll in someone’s likeness allowing control over that person) and contagious magic (whereby something which touched or was connected to something else maintains the connection, such that possessing a lock of someone’s hair allows the possessor to punish or control the subject).

These two principles, observing the importance of resemblance (i.e. similar form or behavior) and observing the connection between things (i.e. things touching, interacting, and reacting), remain a core basis of scientific theory-building as we attempt to model and understand unknown phenomena.  However, the danger always lurks that we are misapplying an observed similarity or misunderstanding the connection between phenomena.  Thus, all science runs the risk of arriving at spurious hypotheses, in the same way a witch doctor may assume rubbing a broken egg on a pregnant woman’s belly is a route to natal health: the egg and the belly are both generative structures (sympathetic magic), and one touching the other predisposes heightened fertility (contagious magic).

Even today, well past the Enlightenment, even the most scientific of us must struggle to overcome the instinct to quickly congeal observed patterns into working theories.  An example is the phenomenon of synchronicity, whereby we suddenly notice a rapid succession of a particular number appearing everywhere.  We may, for instance, on leaving home see the number 56 in front of our neighbor’s house, subsequently note an electronic sign showing that it is 56 degrees Fahrenheit outside, arrive at work at precisely 8:56, and receive a phone call from 565-5665.  The superstitious mind suddenly alerts us to a meaningful pattern, and the more superstitious of us will likely be tempted to buy a lottery ticket with the chosen number 565656 (which, I assure you, has no better chance of winning than any other number).

We by nature seek patterns in complexity and chaos, and indeed have a propensity to formulate rapid conclusions, whereby a correlation of phenomena suggests causation to our minds.  This is not to discount the propensity: System 1, the intuitive faculty, has allowed humanity to survive, to rise up from the muck and chaos of primordial times to the ‘modern’, hierarchical, highly structured technological society of today, with all its conveniences (at least for the lucky).

The lesson for business analytics and data science professionals is that we must always consciously engage System 2 to second-guess and re-test our causal hypotheses.  Following an ‘experimental process’ or ‘decision process’ helps us to doubt quick conclusions.  This is becoming more pressing as we deal with larger and larger datasets and increasingly complex phenomena.

‘Big Data’ means that we may have hundreds of variables and millions of rows of observations to push through algorithms and statistical routines (e.g. multiple regression analysis or decision trees).  Increasingly there are also non-linear, ‘black box’ decision algorithms (e.g. neural networks) which offer no explanatory power, merely computational predictive power.  Some data scientists have spoken of a ‘new scientific revolution’, whereby we begin with no hypothesis but let the computer identify correlations in order to form causal hypotheses.  While this is indeed possible, it is dangerous to make direct intuitive leaps by taking statistically significant correlation to imply causation.
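The danger can be sketched directly as a multiple-comparisons problem: if an automated routine tests enough candidate predictors against an outcome, a predictable fraction will clear the usual 5% significance bar by chance alone.  The sketch below uses the standard large-sample approximation that |r| > 1.96/√n marks two-sided significance at the 5% level:

```python
# Sketch of the multiple-comparisons trap: test 100 *pure noise*
# predictors against a random outcome and count how many look
# 'statistically significant' at the conventional 5% level.
import random

random.seed(7)
n = 1000
outcome = [random.gauss(0, 1) for _ in range(n)]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

threshold = 1.96 / n ** 0.5  # ~0.062 for n = 1000
hits = sum(
    1 for _ in range(100)
    if abs(pearson([random.gauss(0, 1) for _ in range(n)], outcome)) > threshold
)
print(f"{hits} of 100 pure-noise predictors look 'statistically significant'")
```

Roughly 5 of the 100 will clear the bar, by the very design of a 5% test; an unguided hypothesis generator would report each one as a finding.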

A good scientist knows, based on an understanding of the philosophy of science (e.g. a reading of Thomas Kuhn’s ‘The Structure of Scientific Revolutions’ and Popper’s ‘The Logic of Scientific Discovery’), that there are challenges inherent in basing scientific theory purely on inductive experimentation (i.e. all the swans we see are white, therefore there are no black swans).  Popper proposes that scientific theory testing is a process of ‘falsification’: continually attempting, via empirical observation, to prove the theory wrong, and provisionally accepting it as “not, not true” (what researchers and scientists term ‘rejecting the null hypothesis’, a mainstay codified in regression significance testing).

Popper is notable for having established a strong argument that no scientific theory can result from pure empirical testing, or induction.  He argues that our ‘bootstrap’ in the sciences, continually formulating and testing theories, is in fact a form of deductive reasoning, whereby a working theory holds only as long as testing fails to falsify its empirical premises.  A more detailed explanation of this potentially confusing yet powerful proposition can be located here:

For business analytics professionals and data scientists (the latter being more oriented toward big data and focused computational analysis utilizing refined statistical techniques), the crucial point is that if we defer to our computer or software tool for the formation of causal hypotheses, we run the risk of identifying spurious correlations or phantom patterns.

Going back to our original examples, letting our SAS or R tool loose on a large dataset to establish working theories (e.g. via multiple regression, stepwise regression, or a neural network algorithm) runs the risk of conflating correlation with causation: the rooster causes the sun to rise, bald men are prone to headaches, horses with a single name win races, living in Alabama predisposes you to Republicanism.

There may be a component of causal significance suggested by observed correlation, but typically the indication is of a missing variable or variables, and of a corresponding need to collect more data and refine the experimental hypothesis (to continue down the path of falsification as outlined by Popper).  For additional information on critiques of allowing observed patterns in large datasets to generate autonomous hypotheses, see

An example may be that men experiencing pattern baldness statistically have a genetic predisposition toward higher levels of testosterone.  Testosterone contributes to a weaker ability to moderate physiological stress responses, which leads to more frequent daily adrenaline surges, causing frequent spikes in heart rate, leading to high blood pressure and stress, and thus to a higher statistical incidence of migraine headaches.
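This hypothesized chain is of course speculative physiology, but its statistical signature can be sketched with a toy simulation: a latent common cause (call it ‘testosterone’, purely for illustration) drives both outcomes, which then appear correlated with each other until the common cause is controlled for via the standard partial-correlation formula:

```python
# Toy model of a hidden confounder (the variable names follow the
# author's hypothetical example, not medical fact). A latent factor
# drives both 'baldness' and 'migraines'; the two appear correlated
# until we control for the common cause.
import random

random.seed(1)
n = 2000

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

latent = [random.gauss(0, 1) for _ in range(n)]          # e.g. testosterone
baldness = [t + random.gauss(0, 1) for t in latent]      # driven by latent
migraines = [t + random.gauss(0, 1) for t in latent]     # driven by latent

r_bm = pearson(baldness, migraines)
r_bt = pearson(baldness, latent)
r_mt = pearson(migraines, latent)

# Partial correlation of baldness and migraines, controlling for the cause
partial = (r_bm - r_bt * r_mt) / ((1 - r_bt**2) * (1 - r_mt**2)) ** 0.5
print(f"raw correlation: {r_bm:.2f}, controlling for the cause: {partial:.2f}")
```

The raw correlation is substantial while the partial correlation is near zero: the ‘baldness causes migraines’ pattern dissolves once the missing variable is included, which is exactly the refinement step the falsification process demands.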

This is how medical researchers approach a problem, via a process called evidence-based management.  Notably, while there is a reliance on empirical testing, the cart is not put before the horse: a working experimental model that has some logical concurrence with the experience of experts is continually tested, modified, and refined via the process of deductive falsification.  Thus, evidence-based management is not an open license for computer algorithms to form theory, but a template allowing researchers to continually refine deductive theory.

The final problem, phantom correlation extrapolated to causation, leads back to the discussion of System 1 and magical thinking.  Our computational data analysis tools are becoming more and more sophisticated, even as we crunch larger and larger datasets.  Similar to the obsessive-compulsive who sees the number 23 everywhere simply because yesterday they happened to see the number 23 twenty-three times, an algorithmic data analysis may ‘deify’ trends or patterns in a dataset which are statistically significant, but which occur purely because the particular sample happened to contain abnormal variations.

In any sufficiently large set of variables (as opposed to number of instances), there is an increasing chance that spurious or serendipitous correlations emerge.  This can be mitigated by gathering larger sets of instances, but even in a dataset with millions of instances there is the chance that some bias in the sample predisposes certain patterns to emerge (e.g. national, cultural, temporal / historical, or geographic biases).  Examples include Sigmund Freud concluding that all humans are inherently neurotic and preoccupied with sex on the basis of treating and observing neglected upper-middle-class 19th-century Viennese housewives seeking help for hysteria.  Another example concerns the growing realization that many theories emerging from experimental psychology apply uniquely to 20th-century American college students (otherwise the ‘white mice’ of the bulk of experimental psychology research).

To wrap up, we have taken a ‘deep dive’ into a problem which afflicts the human mind and, arguably, increasingly our computational tools (by virtue of our own improper framing, or of drawing conclusions too quickly from the suggestions of a sophisticated algorithm).  The root problem is that we humans have a tendency to quickly extract patterns, and thus to quickly form causal hypotheses (working theories) based on observations of correlative phenomena (patterns).

There is no quick fix: we must continually battle our tendency to leap to conclusions, driven by Kahneman’s System 1, and consciously engage System 2 to ‘think like scientists’.  We must continually struggle to realize the goals of the Enlightenment in ourselves: applying the scientific method where magical thinking abounds.  Acknowledging and following Popper’s principle of falsification, that no theory is finally and conclusively ‘proven’, only continually shown ‘not, not true’ (rejecting the null hypothesis), keeps us honest and humble, whether we are scientists, data scientists, business people, or simply superstitious humans seeking better and better explanations.


Crocker, J. (2013, June). Numbers don’t lie and other myths. OR/MS Today (INFORMS), p. 16.



About SARK7

Scott Allen Mongeau (SARK7) is an INFORMS Certified Analytics Professional (CAP) and a Data Scientist in the Cybersecurity business unit at SAS Institute. Scott has over 20 years of experience in project-focused analytics functions in a range of industries, including IT, biotech, pharma, materials, insurance, law enforcement, financial services, and start-ups. Scott is a part-time PhD (ABD) researcher at Nyenrode Business University. He holds a Global Executive MBA (OneMBA) and Masters in Financial Management from Erasmus Rotterdam School of Management (RSM). He has a Certificate in Finance from University of California at Berkeley Extension, a MA in Communication from the University of Texas at Austin, and a Graduate Degree (GD) in Applied Information Systems Management from the Royal Melbourne Institute of Technology (RMIT). He holds a BPhil from Miami University of Ohio. Having lived and worked in a number of countries, Scott is a dual American (native) and Dutch citizen. He may be contacted at: All posts are copyright © 2015 SARK7 All external materials utilized imply no ownership rights and are presented purely for educational purposes.
