In any business today we are regular consumers of reports that indicate where we are and suggest where we're going. They are often the distillation of a network of complex, interconnected processes, systems and personalities, and are intended to present that complexity in a digestible way: one that lets us see what's happening and make decisions based upon those 'facts'. But do they?

In one of Guinness' myriad cult TV ads, comedian Vic Reeves states that "88.2% of statistics are made up on the spot" [1] – a quote I often use, although the precise number is rarely the same twice. It's a quote that helps illustrate the vagaries of statistics. In principle the numbers are hard, fact-based evidence, but there's always a way to cut them to paint a picture that fits your agenda, and so it's incredibly difficult to strip away those little biases, that spin and loaded meaning, to get down to the hard facts.

I was reminded of this when a friend and former colleague of mine, Steve Fenton, posted the article "Why Unique Visitors in Analytics Never Adds Up". In his example, the order and level at which you aggregate the data and then sum it has a material impact on the figures you arrive at, making it entirely possible that the answer you're looking at does not reflect the question you're actually asking. And this is for a seemingly simple question: "How many unique visitors did I have this week / month / year?" So what hope do we have of truly understanding whether the answer we're getting addresses the question we're asking once things get much, much more sophisticated and include the often black-box inferences of machine learning?
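The pitfall is easy to reproduce. A minimal sketch in Python (the visitor IDs and visit data here are entirely invented for illustration) shows why summing per-day unique counts does not give the week's unique visitors:

```python
# Hypothetical data: day -> set of visitor IDs seen that day.
visits = {
    "Mon": {"alice", "bob"},
    "Tue": {"alice", "carol"},
    "Wed": {"bob", "carol", "dave"},
}

# Summing per-day uniques counts a repeat visitor once per day.
sum_of_daily_uniques = sum(len(ids) for ids in visits.values())

# The correct weekly figure de-duplicates across the whole period.
weekly_uniques = len(set().union(*visits.values()))

print(sum_of_daily_uniques)  # 7
print(weekly_uniques)        # 4
```

Both numbers are "unique visitors", and both are arithmetically correct; they simply answer different questions, which is exactly the trap Fenton describes.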

Enter the realm of data science. To fully understand the nuances, scientific method and rigour must be applied to defining, designing and developing both the questions and the answers. The data, the measures, the metrics and the facts that exist in the systems we rely upon to tell us what happened, what's going to happen and what we can do about it need to be carefully considered and widely understood by those who consume them, or the actions we take based upon them may be materially flawed.

In some respects, and in some areas of business, the mere fact of having data to act upon and to steer the ship is enough. If the interpretation isn't pixel-perfect, it may not be the end of the world. If the quantitative interpretation is consistently flawed, but it's the trend, the qualitative pattern, we're looking for rather than the minutiae – e.g. "I care whether more people came to my website this month than last" and "as long as I'm comparing apples to apples, the precise number isn't what I'm worried about" – then the macro trend may be enough.

But that’s not always the case. Sometimes the context matters.

An oft-quoted example is crime statistics. "Crime statistics are up, blame the government" cry tabloids everywhere, but the reality is far more nuanced. If reported crimes have increased, is that because we've changed the definitions somehow, lowering the bar at which we define something as a crime? That might be a good thing if major crimes are down and our perception of what's now unacceptable has tightened. Or is it that, as a society, we have renewed confidence in the system and are more willing to openly report a crime? Or, of course, crime could simply be growing.

We need to be incredibly careful in how we frame the data we capture and how we use it. So next time you're presented with a stat, a fact, a beautifully polished pivot chart, take a moment to consider the provenance of what you're seeing. What journey has the data I'm looking at been on? Do I trust the chain through which it's been supplied to me? And, most importantly, is it answering the question I think it is?

The numbers may be correct, but how they’ve been sliced, how they’ve been aggregated and any potential spin that’s been applied to them is entirely human and very much open to interpretation.

[1] http://news.bbc.co.uk/1/hi/uk/859476.stm