Wednesday, 15 April 2015

A reminder that "Big Data" analysis isn't research, it's sympathetic magic


IF we analyse the principles of thought on which magic is based, they will probably be found to resolve themselves into two: first, that like produces like, or that an effect resembles its cause; and, second, that things which have once been in contact with each other continue to act on each other at a distance after the physical contact has been severed. The former principle may be called the Law of Similarity, the latter the Law of Contact or Contagion. From the first of these principles, namely the Law of Similarity, the magician infers that he can produce any effect he desires merely by imitating it: from the second he infers that whatever he does to a material object will affect equally the person with whom the object was once in contact, whether it formed part of his body or not. (from The Golden Bough, Sir James Frazer)

The current obsession with 'big data' should concern us - not because the data is useless but because it makes people think in peculiar (and worrying) ways:

But with the advent of “big data” this argument has started to shift. Large data sets can throw up intriguing correlations that may be good enough for some purposes. (Who cares why price cuts are most effective on a Tuesday? If it’s Tuesday, cut the price.) Andy Haldane, chief economist of the Bank of England, recently argued that economists might want to take mere correlations more seriously. He is not the first big-data enthusiast to say so.

This quote is from Tim Harford and describes what I refer to as sympathetic magic. We pile up enormous mountains of data and interrogate that data with clever computer technology (that mostly we didn't create and don't understand), find correlations and make sweeping assumptions based on the correlations we do find - as opposed to the myriad other correlations we haven't found.
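To make the point about the correlations we do find versus the myriad we don't, here is a minimal sketch in Python. Everything in it is my own illustration rather than anything from Harford or Haldane: the function names, the 200 variables, the 30 observations and the seed are all assumptions, and every number is pure random noise. It picks out the strongest correlation from a pile of meaningless variables and then checks the same "relationship" on a fresh sample.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_noise_correlation(n_variables=200, n_observations=30, seed=0):
    """Hypothetical illustration: trawl random noise for its 'best' correlation.

    Returns (index of the winning variable, its correlation with the target,
    the correlation of freshly drawn noise standing in for a second sample).
    """
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0, 1) for _ in range(n_observations)]

    # A target (say, "sales") and many candidate predictors, all unrelated noise.
    target = noise()
    candidates = [noise() for _ in range(n_variables)]

    # The trawl: keep only the single strongest correlation we stumble on.
    correlations = [pearson(c, target) for c in candidates]
    best = max(range(n_variables), key=lambda i: abs(correlations[i]))

    # The check we rarely run: does the pattern survive a new sample?
    # Since nothing links the variables, the fresh draw is just more noise.
    fresh_r = pearson(noise(), noise())
    return best, correlations[best], fresh_r


if __name__ == "__main__":
    index, found_r, fresh_r = strongest_noise_correlation()
    print(f"'Best' variable #{index}: r = {found_r:.2f} in the original sample")
    print(f"Same relationship on fresh data: r = {fresh_r:.2f}")
```

With enough candidate variables the trawl will always hand back something that looks impressive, even though nothing in the data is related to anything else. Whether it turns up again in a second sample is simply a matter of luck.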

So what that chap from the Bank of England is saying is that if we pull this lever here and press that button there it does seem that this result occurs. We've no idea why it occurs or even whether, given a different set of instructions to the clever computer technology, we'd get the same correlation again. Yet the economist takes the result spat out by the big data black box and declares it to be scriptural - the latest set of levers and buttons that will set the economy on the right course.

All the acolytes of that economist then produce graphs showing the results of all that wonderful (and essentially magical) data-crunching. Until such a time as a different mountain of data or a different analysis tool produces a different set of buttons and levers to press or pull. This continues in cycles as the followers of one or other school of magic contest to either create new answers or - more commonly - to argue backwards and forwards why the other school is wrong.

Back in 1990, before all this Internet lark, us direct marketers were playing with big databases - the geodemographics and psychographics that economists and such folk think are new and exciting were the tools we used. We experimented with expert systems and with emerging data mining tools of one sort or another. And we discovered that the results of such analyses (price cuts are more effective on Tuesdays or whatever) were very useful. But not as useful as we'd like them to be. Big data analysis was still no substitute for information about real purchase behaviour, meaning that the database analyses were more useful as a planning tool than as a pointer to where marketing investment might work best.

Much of macroeconomics - for all the volume of learned interpretation it generates - falls into this trap. There is a great deal (probably too much) of information, but what matters isn't how much data we have but the tools we use to assess that data. And these tools provide conflicting information, meaning that there simply isn't a right answer - other than that something should be done to direct the economy.

None of this is to say we shouldn't analyse that data, crunch those numbers, try to understand what these Big Data runes tell us about the world. But we should do it with humility and should recognise that this is not real knowledge but rather a chimaera of knowledge - real knowledge is to know, for now at least, the causes of something:

Do Big Data help us establish ‘causation’ more accurately? No. But new and unexpected patterns might emerge that suggest how combinations of risks interact unexpectedly.

Though even then some patterns are just, well, luck. Their probative value can not be assumed. Quick, give me another grant! We need more data to help us understand what Big Data are telling us!

Such is the nature of this big data thing. Yet we assume - because there is so much information - that the answers it spews out will be better, more true. To which I reply with this research:

Demographic segmentation variables are cheap and easy to measure, while psychographic variables are more expensive and harder to measure, but can provide more insight into consumers’ psychology. Suggests that a prima facie case exists for the suitability of astrology as a segmentation variable with the potential to combine the measurement advantages of demographics with the psychological insights of psychographics and to create segments which are measurable, substantial, exhaustive, stable over time, and relatively accessible. Tests the premise empirically using results from a Government data set, the British General Household Survey. The analyses show that astrology does have a significant, and sometimes predictable, effect on behavior in the leisure, tobacco, and drinks markets. 

This is Big Data analysis. Do you believe it?

....

1 comment:

FrankC said...

"things which have once been in contact with each other continue to act on each other at a distance after the physical contact has been severed."

Sounds like those tangled protons, or whatever, that physicists are playing with.