I love data – I love the unfolding stories it contains. But a word of warning for the untrained: caution is needed when drawing conclusions from data.
I have a background in Econometrics and Data Analysis, having completed an MSc a few years back, and that really opened my eyes to the everyday assumptions people make when dealing with data. Data is biased, pretty much always, and our assumptions are not much better. Most of us lack the understanding needed to handle data correctly.
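One of the most common misunderstandings is around endogeneity. Loosely, that means biased data; more precisely, a factor we use to explain an outcome is correlated with things we haven't accounted for, so our estimates come out biased.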
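For readers who want the econometrics, here is the textbook version in a generic linear model (the symbols are generic and not tied to any data in this post). Endogeneity means an explanatory variable x is correlated with the error term u, and ordinary least squares then converges to the wrong value. Leaving out a relevant factor z is the classic way this happens, assuming the remaining error ε is uncorrelated with both x and z:

```latex
y = \beta_0 + \beta_1 x + u, \qquad
\operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\operatorname{Cov}(x, u)}{\operatorname{Var}(x)} \neq \beta_1
\quad \text{whenever } \operatorname{Cov}(x, u) \neq 0

\text{True model } y = \beta_0 + \beta_1 x + \beta_2 z + \varepsilon \text{ with } z \text{ omitted:} \quad
\operatorname{plim} \hat{\beta}_1 = \beta_1 + \beta_2 \frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}
```

The bias only disappears if z has no effect on y (β₂ = 0) or is uncorrelated with x.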
Here’s a great example of biased data leading to dubious conclusions – inspired by “How Not To Be Wrong” by Jordan Ellenberg – a great read if you’re into a bit of maths!
During the Second World War, US Bomber Command was looking at ways to protect its aeroplanes. You don't want your planes to get shot down, so you put armour on them. But you don't want so much armour that they become unwieldy sitting ducks for marauding German Messerschmitts.
So you optimise the placement of the armour.
Bomber Command enlisted some mathematicians to help solve the optimisation problem, presenting them with the data below.
Section of plane | Bullet holes per square foot
Rest of plane |
Where would you place the armour?
Take a minute to look at the data and decide where you should place the armour.
Bomber Command's instinct was to be efficient and place armour only where it was needed most, i.e. on the parts being hit the most.
This is a fatal assumption – literally!
The actual solution is to put the armour where the bullet holes aren’t.
Why put the armour there?
If bullets hit the planes more or less at random, the holes should be spread fairly evenly over the plane, square foot for square foot. But the data doesn't bear this out: we see far fewer holes in the engine section.
So where are the missing holes?
The missing holes are on the planes that are not being included in our sample. Those planes never made it back to base because they were too badly damaged. And that is why you put the armour on the engine section.
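To see the mechanism in numbers, here is a minimal Python simulation. It assumes, hypothetically, that hits land uniformly by area and that a single engine hit is far more likely to bring a plane down than a hit elsewhere. The section names, areas and loss probabilities are invented for illustration; they are not the data Bomber Command had. Counting holes only on the planes that return reproduces the pattern, even though every section is hit at the same rate.

```python
import random

# Hypothetical section areas in square feet (illustrative only).
SECTION_AREA = {"engine": 20.0, "fuselage": 80.0, "rest of plane": 100.0}
# Hypothetical chance that a single hit to each section brings the plane down.
LOSS_PROB = {"engine": 0.6, "fuselage": 0.05, "rest of plane": 0.02}


def holes_per_sq_ft(n_planes=20_000, hits_per_plane=5, seed=0):
    """Return (all planes, returning planes only) bullet holes per square foot by section."""
    rng = random.Random(seed)
    names = list(SECTION_AREA)
    weights = [SECTION_AREA[s] for s in names]  # hits land uniformly by area
    all_holes = dict.fromkeys(names, 0)
    returned_holes = dict.fromkeys(names, 0)
    for _ in range(n_planes):
        hits = rng.choices(names, weights=weights, k=hits_per_plane)
        for s in hits:
            all_holes[s] += 1
        # The plane makes it home only if it survives every hit it took.
        if all(rng.random() >= LOSS_PROB[s] for s in hits):
            for s in hits:
                returned_holes[s] += 1

    def per_sq_ft(counts):
        return {s: counts[s] / SECTION_AREA[s] for s in names}

    return per_sq_ft(all_holes), per_sq_ft(returned_holes)


if __name__ == "__main__":
    everyone, survivors = holes_per_sq_ft()
    for s in SECTION_AREA:
        print(f"{s:14} all planes: {everyone[s]:6.1f}   returned only: {survivors[s]:6.1f}")
```

Across all planes the holes per square foot come out roughly equal; among the returning planes only, the engine rate drops well below the rest. The sample you get to inspect is the biased part.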
When working with data, you need to understand its limitations. If your sample is biased, if you misread causality, or if you leave out factors that influence outcomes, you can make poor decisions that you believe are based on fact.
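The last of those pitfalls, leaving out a factor that influences outcomes, can be sketched in a few lines of Python. Everything here is hypothetical: a hidden factor `z` makes `x` more likely and also raises the outcome `y`, while `x` itself has no effect on `y` at all. A naive comparison of `y` with and without `x` finds a large gap; making the same comparison within each level of `z` makes it vanish.

```python
import random


def simulate(n=100_000, seed=1):
    """Generate (x, z, y) rows where z drives both x and y, and x has no effect on y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                  # hidden factor, present half the time
        x = rng.random() < (0.7 if z else 0.2)  # x is much more common when z is present
        y = 2.0 * z + rng.gauss(0.0, 1.0)       # y depends on z only, never on x
        rows.append((x, z, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_gap(rows):
    """Difference in mean y between rows with and without x, ignoring z."""
    with_x = [y for x, _, y in rows if x]
    without_x = [y for x, _, y in rows if not x]
    return mean(with_x) - mean(without_x)


def stratified_gap(rows):
    """Same comparison made separately within z=False and z=True, then averaged (unweighted)."""
    gaps = []
    for level in (False, True):
        with_x = [y for x, z, y in rows if z == level and x]
        without_x = [y for x, z, y in rows if z == level and not x]
        gaps.append(mean(with_x) - mean(without_x))
    return mean(gaps)


if __name__ == "__main__":
    rows = simulate()
    print(f"naive gap:      {naive_gap(rows):.2f}")
    print(f"stratified gap: {stratified_gap(rows):.2f}")
```

Stratifying on `z` only works here because `z` is recorded in the simulation; the real danger of an omitted factor is that you often don't know it exists, or can't measure it.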
Here’s a quick one for you to think about:
Does having a video on your website help improve your Google ranking? A lot of people claim it does.
Indeed, looked at in aggregate, pages with videos do rank higher. But those videos get shared, and that sharing creates links back to the website, a known major influence on Google rankings.
I’ll leave you with that one!
Finally, if you want to improve your statistical knowledge, “How Not To Be Wrong” is an eye-opening and entertaining read. And it also details how a bunch of MIT students cracked the lottery!