The New York Times recently ran an excellent overview of the evolving state of data analytics.
Big Data’s Impact in the World – NYTimes.com: “The story is similar in fields as varied as science and sports, advertising and public health – a drift toward data-driven discovery and decision-making. “It’s a revolution,” says Gary King, director of Harvard’s Institute for Quantitative Social Science. “We’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.”
Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web – online searches, posts and messages – with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.
This is the latest iteration in the ongoing interplay between judgment and evidence in decision making. Which means it’s worth considering how this argument has been evolving over time and how new discoveries, technologies, and techniques could change the issues or cause lasting change in how we go about making decisions.
Probability theory traces its roots to a conversation between Blaise Pascal and Pierre de Fermat over how to divide the pot in a card game if it weren’t possible to finish the game. Could you estimate the relative chances of each player winning the game based on the current state of the game and use those estimates to fairly distribute the pot? In other words, how can you use the evidence at hand to make a better decision?
When I was getting an undergraduate degree in probability and statistics (a long time ago), the core issues centered on what inferences you could draw about the real world from limited samples. What kinds of errors and mistakes did you need to protect yourself from? What precautions were appropriate to keep you from going beyond the data? The tools would always find some pattern in the data and we were repeatedly cautioned to take care not to see things that weren’t there.
A few years later, in graduate school, I revisited the topic in various required methods and analytical tools courses. The software tools were more powerful and were still capable of finding the slightest hint of a pattern in the noise. The faculty offered their obligatory cautions, but I watched plenty of students wreaking intellectual havoc with their new power tools, spinning conclusions from the thinnest threads of pattern in the data. For every hundred MBAs who learned to run a multivariable regression, one might read Darrel Huff’s How to Lie with Statistics.
Today, the tools continue to become more and more powerful at teasing out patterns from the data. At the same time, the exponential growth in available data means that we aren’t sampling so much as we are searching for patterns in the population as a whole. What is the lag between the power of our analytical tools and our capacity to apply sound judgment to the results?
Here are some of the questions I am beginning to explore:
- How does statistical inference change as we move from small, representative, samples to all, or most, of the population of interest?
- How do we distinguish between patterns in the data that are spurious and patterns that reveal important underlying drivers?
- When is an arbitrary or spurious correlation good enough to support a business course of action (Amazon doesn’t, and probably shouldn’t, care why “other people who bought title X, also bought title Y.” Calling my attention to title Y leads the incremental sales; who needs a causal model?)
- How does our deepening understanding of the limits and biases of human decision making connect to the opportunities presented in “Big Data”? Here, I’m thinking of the work Dan Ariely on behavioral economics and Daniel Kahneman on decision making.
I would value pointers and suggestions on where to look next for answers or insight.