“Big Data has essentially nothing to say about causation. It’s a common misconception that an influx of data flushes cause-effect from its hiding place.”
You recently graduated from law school and are still searching for a job. You get a voicemail from your school saying it is surveying recent graduates about whether they have found jobs. If you do not respond, the school will assume you have one. Do you bother to call back with the disappointing news?
Odds are, you don’t. That’s why law schools use this and other techniques to game their metrics, inflating entrance GPAs and LSAT scores, reputational reviews, and post-graduation employment statistics. Too often, unfortunately, those metrics are taken at face value.
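To see how much the "no reply means employed" rule can move the headline number, here is a minimal Python sketch. The class size and response counts are invented purely for illustration; they do not come from Fung's book or from any real school, and the function name `employment_rates` is my own.

```python
def employment_rates(employed, unemployed, employed_responding, unemployed_responding):
    """Return (true_rate, reported_rate) for a graduating class.

    The reported rate counts every non-responder as employed,
    which is the survey rule described above.
    """
    total = employed + unemployed
    non_responders = total - employed_responding - unemployed_responding
    true_rate = employed / total
    # Non-responders are assumed to have jobs, whatever their real status.
    reported_rate = (employed_responding + non_responders) / total
    return true_rate, reported_rate


# Hypothetical class of 100: 60 employed, 40 still searching.
# Assume most job seekers do not call back (10 of 40 respond),
# while most employed graduates do (50 of 60 respond).
true_rate, reported_rate = employment_rates(60, 40, 50, 10)
print(f"true: {true_rate:.0%}, reported: {reported_rate:.0%}")
# prints "true: 60%, reported: 90%"
```

Under these made-up numbers, 30 unemployed graduates who never call back are silently counted as employed, and the published figure jumps from 60% to 90%.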
In Numbersense, Kaiser Fung argues that we are in the age of Big Data – an age of extensive, personalized information useful for purposes including marketing, economics, and sports, but also a source of confusion, doubt, and increased evidence for theories both good and bad. Numbersense is the willingness to probe behind headline figures and decide if the data is actually meaningful, whether law school statistics or the unemployment rate. We turn to data for answers, but it is too often overwhelming, misleading, or evidence only of correlation, not causation.
The last point is perhaps the most critical. Target, a large retail chain, became so effective at predicting pregnancy from shopping patterns that it accidentally informed a customer’s parents before she had told them herself: a triumph for Big Data, if something of an awkward one (details can be found in Charles Duhigg’s report here). Of course, this doesn’t mean that buying a large purse causes pregnancy, simply that the two are correlated. Regardless of the size of the data set, Fung argues, Big Data shows correlation, not causation.
Big Data has become something of a buzzword in recent years, and the explosion in available information is indeed of huge importance. It is not, however, a panacea, and Fung rightly emphasizes this. Whether giving a how-to manual for law school deans looking to game the system, criticizing the Groupon business model, or studying obesity, Numbersense is an entertaining read. It will likely have the most appeal, however, to non-statisticians: Fung has succeeded in creating an almost entirely non-mathematical introduction to Big Data, explaining the challenges of econometrics without requiring knowledge of statistics, and for that reason alone the book is a worthwhile read. Understanding the difference between headline and core inflation may not induce murder-mystery suspense, but Fung makes it both interesting and enjoyable.
Disclosure: I read Numbersense as an advance reader copy – it is released tomorrow.