This week generated a meme which nearly melted the internet: the case of the gold or blue dress. A picture of a dress worn at a wedding was seen by some people as gold and white and by others as blue and black. A riotous global argument that set friends and lovers against one another ensued.
While seemingly silly at base, the fact that this went viral points to how quickly we become disturbed when our version of reality is called into question. Anyone who tries to trick a child on a matter of perception will raise similarly angry protestations: people become disturbed when what they see or experience is fundamentally called into doubt. The incident also illustrates how easily different people can perceive things in completely different ways.
This raises a central challenge for data analytics: how do we validate and socialize the conclusions we derive from data when different people may perceive completely different patterns?
Gold and white or blue and black?
The consternation generated by the inability to reach consensus on the dress's color was, as a larger phenomenon, quite interesting. Indeed, it raises a fundamental issue from the philosophy of science: how can we rely upon empirical observations if they too are filtered by our perceptions? And if our perceptions lead to different conclusions, how can we achieve the consensus needed to form scientific propositions? Such arguments fueled the 20th-century attack against pure positivism.
The fact that so many people apparently saw two markedly different color schemes deeply disturbed (or perhaps just deeply annoyed) many. Even when we do reach a consensus, there is always the looming threat that new information may appear which invalidates our original assumptions (the famous black swan of Karl Popper). How do we know anything at all, especially when the world cannot agree on the color of a dress?
Popper applied a structured set of arguments to posit that we cannot know anything through pure induction (i.e., pure empirical or sensory observation). Even when perceptions are validated by consensus and scientific measurement, the possibility of future disconfirming evidence always exists. As such, he proposed that we can only falsify theories: based on observations, we can only propose that a hypothesis is not NOT true. Moreover, for a working theory to be valid, it must specify the conditions under which it could be disproven: the principle of falsifiability.
The power of reason: Karl Popper
In the case of the fateful dress, we can only posit an empirical perception of the picture, which is essentially an opinion; the “real” truth is found by actually locating the dress and viewing its color in strong light (this was done, and the answer is… BLUE!). Yet is this enough? Can we also admit that there are similar dresses that may appear blue under certain conditions but are actually gold?
Why are we spilling electronic ink on this? It is a fundamental but troubling topic underlying the analytics movement: how do we validate conclusions drawn from data analytics? The answer is that we must frame conclusions as hypotheses that are capable of being falsified (of being proven false). Additionally, we must supply the caveats for any confirmatory evidence that has been gathered and produced.
For instance, let's say we propose that a particular type of customer likes a particular type of product more than another does. We can use a statistical significance test to examine the association. Such a test asks whether the data are consistent with the null hypothesis, that is, with the disconfirming case that there is no association. If we reject the null hypothesis (as per falsification), we are suggesting that, at some threshold (e.g., the 5% significance level), the hypothesis of no association is not supported by our sample.
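To make this concrete, here is a minimal sketch of such a test using entirely hypothetical counts of customer types versus product preferences. It computes a chi-square test of independence by hand (so no external libraries are needed) and compares the statistic against the standard critical value; the data and the 5% threshold are illustrative assumptions, not figures from the text.

```python
# Hypothetical contingency table: rows = customer types, cols = products.
observed = [[60, 40],
            [30, 70]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts under the null hypothesis of no association.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

# Critical value for 1 degree of freedom at the 5% significance level.
CRITICAL_VALUE = 3.841

if chi2 > CRITICAL_VALUE:
    print("Reject the null: 'no association' is not supported by this sample")
else:
    print("Fail to reject the null on this sample")
```

Note that even a rejection only says the no-association hypothesis is implausible for this sample at this threshold; it does not prove the proposed preference true.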
The crucial thing for data scientists to keep in mind is that we have only shown that the proposition is not NOT true in the specific test on the specific data sample. Thus, we admit that more data may change the conclusion and that the theory is always open to being disproved. Ideally, the conditions for disproving the theory become clearer in the process; for instance, the test may suggest that a different data sample would be useful for increasing confidence.
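The sample-dependence of such conclusions can be sketched as follows: running the same hand-computed chi-square test on two hypothetical samples, one lopsided and one nearly balanced, yields opposite verdicts. Both tables are invented for illustration.

```python
def chi2_statistic(observed):
    """Chi-square statistic for a 2x2 contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    return sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))

CRITICAL_VALUE = 3.841  # dof = 1, 5% significance level

sample_a = [[60, 40], [30, 70]]   # strong apparent association
sample_b = [[52, 48], [47, 53]]   # weak apparent association

for name, sample in [("A", sample_a), ("B", sample_b)]:
    verdict = ("reject" if chi2_statistic(sample) > CRITICAL_VALUE
               else "fail to reject")
    print(f"Sample {name}: {verdict} the null")
```

Sample A rejects the null while sample B does not, underscoring that the conclusion belongs to the sample, not to the world at large.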
Without this very frank treatment of the errors possible in both sampling and hypothesis testing, data analytics experts can indeed reach a point where a blue dress appears gold. We can fool ourselves and believe, through tricks of perception, that we have validated something when in fact we are seeing a trick of the light…