Predicting the Future Is Hard: Building Better Models, from Elections to Financial Markets

The Signal and the Noise: Why So Many Predictions Fail-But Some Don't, by Nate Silver, Penguin Press, 544 pages, $27.95

The Physics of Wall Street: A Brief History of Predicting the Unpredictable, by James Owen Weatherall, Houghton Mifflin, 304 pages, $27

HUMAN BEINGS naturally look for patterns in the mess of events and data that surrounds us. Groping for hidden architecture is an evolutionary response to a complex world. In general it serves us well, but we risk detecting patterns where none actually exist.

Sometimes we learn after the fact that our pattern-based predictions were wrong, and we revise and move on, ideally with more humility and a better mental model for the future. But biases often persist even after correction, especially when the subject of our attention has deep emotional roots, like the predicted outcome of an election.

Given the power of pattern recognition and our inherent biases, how do we separate the signal from the noise? That question has intrigued statisticians for centuries, including the statistician of the moment, Nate Silver. In The Signal and the Noise, the well-known New York Times poll-watcher examines the phenomenon of prediction. Silver asks how, in the face of uncertainty, we can separate meaningful patterns from the vast amount of information and data available to us.

Our innate cognitive limitations, the biases of perception, and the biases we introduce through interpretation and analysis all combine to distort rather than clarify. As Yogi Berra once observed, "Prediction is very hard, especially about the future."

Prediction involves three components: a theoretical model that formulates a hypothesis; an empirical model that gathers and analyzes the (necessarily incomplete) data to test that hypothesis; and a method of evaluating the resulting inferences, so that both models can be improved to generate better predictions in the future.

Silver argues that better models and more successful predictions come from applying Bayesian reasoning, which dates to the 18th century and is now used to analyze data in engineering, medicine, and economics. Bayesian reasoning involves estimating the probability that an event will occur, then updating that probability as new data arrive. Silver illustrates this with the example of finding a strange pair of underwear in your partner's drawer. A Bayesian analysis of whether your partner is cheating on you requires three estimates: the prior probability you would have assigned to the cheating hypothesis before finding the underwear; the probability that the underwear would turn up if your partner were cheating; and the probability that it would turn up if your partner were not, because of some innocent alternative explanation. The prior is crucial. Given these three estimates, you can calculate the probability that your partner is cheating on you, which you can express as a degree of confidence in the cheating hypothesis.
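To make the arithmetic concrete, here is a minimal Python sketch of Bayes's theorem for a yes-or-no hypothesis. The function name `posterior` and every input probability below are hypothetical, chosen for illustration rather than taken from the book.

```python
def posterior(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Bayes's theorem for a binary hypothesis H and evidence E: returns P(H | E)."""
    joint_true = prior * p_evidence_if_true            # P(H) * P(E | H)
    joint_false = (1.0 - prior) * p_evidence_if_false  # P(not H) * P(E | not H)
    return joint_true / (joint_true + joint_false)


# Hypothetical estimates, for illustration only:
prior = 0.04                     # P(cheating) before the underwear turned up
p_underwear_if_cheating = 0.50   # P(underwear | cheating)
p_underwear_if_innocent = 0.05   # P(underwear | not cheating)

print(round(posterior(prior, p_underwear_if_cheating, p_underwear_if_innocent), 3))  # 0.294
print(round(posterior(0.20, p_underwear_if_cheating, p_underwear_if_innocent), 3))   # 0.714
```

Holding the two likelihoods fixed and raising the assumed prior from 4 percent to 20 percent moves the posterior from roughly 29 percent to roughly 71 percent, which shows why the prior carries so much weight.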

A fundamental Bayesian insight is that we learn about the world (and its patterns) incrementally. As we gather more data, says Silver, we get "closer and closer to the truth" (emphasis in original). Thus we can refine our models and make better approximations, yielding more accurate estimates of our confidence in the truth of the hypothesis.
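Incremental learning can be sketched the same way: each posterior becomes the prior for the next batch of data. The example below assumes a coin of unknown bias and uses a Beta prior with binomial data, a standard conjugate model chosen here for convenience rather than drawn from the book; the true bias of 0.3, the batch sizes, the seed, and the helper names are all hypothetical.

```python
import math
import random


def update(alpha: float, beta: float, heads: int, tails: int) -> tuple[float, float]:
    """Beta-binomial conjugate update: the posterior is again a Beta distribution."""
    return alpha + heads, beta + tails


def summarize(alpha: float, beta: float) -> tuple[float, float]:
    """Return the mean and standard deviation of a Beta(alpha, beta) distribution."""
    n = alpha + beta
    mean = alpha / n
    sd = math.sqrt(alpha * beta / (n * n * (n + 1)))
    return mean, sd


def run(true_bias: float = 0.3, batch_sizes=(10, 100, 1000, 10000), seed: int = 1):
    """Simulate tosses in batches, updating the belief about the bias after each batch."""
    rng = random.Random(seed)
    alpha, beta = 1.0, 1.0  # uniform prior: every bias in [0, 1] equally plausible
    history = [summarize(alpha, beta)]
    for size in batch_sizes:
        heads = sum(rng.random() < true_bias for _ in range(size))
        # Today's posterior becomes tomorrow's prior.
        alpha, beta = update(alpha, beta, heads, size - heads)
        history.append(summarize(alpha, beta))
    return history


for mean, sd in run():
    print(f"estimate {mean:.3f} +/- {sd:.3f}")
```

As the batches accumulate, the estimate settles near the assumed true bias and its standard deviation shrinks, a small-scale version of getting closer and closer to the truth.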

Silver has applied these techniques in formulating statistical models in poker, in baseball, and most famously in U.S. presidential elections. (In 2008 he accurately predicted the outcome in 49 out of 50 states. In 2012 he was right about all 50.)

The Bayesian approach to probability and statistics is not the only one, and it is not always intuitive. The central debate in probability theory is between the Bayesian and the frequentist approaches. Frequentists interpret the probability of an event as the relative frequency of its occurrence, which is defined only with reference to a base set of events (for example, the probability of heads in a large number of coin tosses). …
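By contrast, a frequentist estimate is simply the observed share of occurrences in a base set of trials. This minimal Python sketch assumes a simulated fair coin; the function name and seed are hypothetical. It reports the relative frequency of heads as the number of tosses grows.

```python
import random


def relative_frequency(n_tosses: int, p_heads: float = 0.5, seed: int = 0) -> float:
    """Fraction of heads in n_tosses simulated tosses of a coin with P(heads) = p_heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses


# The relative frequency tends to drift toward 0.5 as the base set of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```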