Einstein's Razor: The Principle of Making Scientific Models as Simple as Possible but Not Simpler Is Difficult to Do

Albert Einstein said, "Everything should be made as simple as possible, but not simpler." Whether in the physical, biological, or social sciences, this admonition applies to everything from the hypotheses generated to the explanatory models proposed. Oversimplification can omit one or more important causal variables, or fail to capture the interactions among the variables. Overly simplistic models arouse doubts and fuel skepticism, with justification. Upon reading the article by Frank Borzellieri in SKEPTIC ("Roswell, Aliens, and Belief: Who Believes that Aliens Landed at Roswell?" Vol. 16, No. 4, 21-28), I realized that I was not fully convinced by his explanation of why survey respondents tended (or did not tend) to believe in an alien visitation at Roswell, NM. His conclusions may well be correct, but they rest on a model that may be overly simplistic.

Although Borzellieri captured many variables in his survey, his conclusions were derived from a set of isolated bivariate analyses. That approach alone is sometimes problematic in social science research, because examining variables two at a time cannot reveal confounding or interactions among them. I also suspected that not all relevant variables had been captured in the survey. Furthermore, the survey data came from self-selected volunteers. Such data may be biased because some respondents hold very strong opinions, which they are predisposed to "share" whenever given the opportunity. Conversely, others may decline to respond to a survey, fearing ridicule for their opinions. Less committed individuals may not care to get involved at all in a study that holds little interest for them. Still others may never have heard of Roswell, yet feel compelled to express an opinion.
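The concern raised above about isolated bivariate analyses can be made concrete with a small sketch. The counts below are invented purely for illustration (they are not Borzellieri's data), and the names `counts`, `pooled_rate`, and `stratum_rate` are hypothetical. The point is only that a pooled two-variable comparison can point in the opposite direction from the same comparison made within each level of a third variable.

```python
# Hypothetical counts, invented for illustration only (not survey data).
# Key: (stratum of a third variable, exposed to some factor?)
# Value: (number who believe, number of respondents)
counts = {
    ("A", True): (18, 20),
    ("A", False): (64, 80),
    ("B", True): (24, 80),
    ("B", False): (4, 20),
}


def rate(cells):
    """Proportion of believers across a list of (believers, respondents) cells."""
    believers = sum(b for b, _ in cells)
    respondents = sum(n for _, n in cells)
    return believers / respondents


def pooled_rate(exposed):
    """Bivariate view: belief rate by exposure, ignoring the third variable."""
    return rate([v for (_, e), v in counts.items() if e == exposed])


def stratum_rate(stratum, exposed):
    """Multivariate view: belief rate by exposure within one stratum."""
    return rate([counts[(stratum, exposed)]])


if __name__ == "__main__":
    print("pooled:", pooled_rate(True), "vs", pooled_rate(False))
    for s in ("A", "B"):
        print("stratum", s, ":", stratum_rate(s, True), "vs", stratum_rate(s, False))
```

With these made-up counts, the exposed group shows a lower belief rate when the strata are pooled, yet a higher belief rate within each stratum taken separately. A purely bivariate analysis would see only the pooled comparison.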

These potential problems and biases are, in fact, common in scientific research, and I have spent a career implementing methods to deal with them. Thus, Borzellieri's study provides an excellent opportunity to demonstrate how to design and analyze a study in a complete, transparent, and rigorous multivariate framework. The technology that enables these improvements is the Bayesian Belief Network (BBN). BBNs provide a computational, probabilistic framework for modeling causal relationships among any reasonable number of variables. This somewhat opaque statement should become less so as we progress through an example. For the best learning experience, download the BBN modeling program Netica from the web site of its developer, Norsys (Norsys.com). The program is free and fully functional, though it has limitations unless licensed. In addition to the program, you will need the network described below, which is available at awkml.com/Roswell_BBN.zip. This network appears in Figure 1. Be aware that the network is a "toy," that is, a teaching tool only. It is not based on the data collected by Borzellieri, and it neither reproduces nor challenges his results. That said, it is not an arbitrary network, and a research-grade network would probably be quite similar to it in many regards.
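For readers who want to see the basic mechanics before opening Netica, here is a minimal sketch of a three-node belief network in plain Python. It is not the network of Figure 1 and does not use Netica or any Netica interface. The node names (`media`, `sci`, `belief`), the network structure, and every probability are assumptions invented for illustration. Inference is done by brute-force enumeration of the joint distribution, which is practical only for very small networks.

```python
from itertools import product

# Hypothetical network: media -> belief <- sci. All numbers are invented.
P_MEDIA = {True: 0.3, False: 0.7}  # P(heavy exposure to paranormal media)
P_SCI = {True: 0.4, False: 0.6}    # P(formal science training)

# Conditional probability table: P(belief = True | media, sci)
P_BELIEF = {
    (True, True): 0.30,
    (True, False): 0.70,
    (False, True): 0.05,
    (False, False): 0.20,
}


def joint(media, sci, belief):
    """Joint probability of one complete assignment of the three nodes."""
    p_b = P_BELIEF[(media, sci)]
    return P_MEDIA[media] * P_SCI[sci] * (p_b if belief else 1.0 - p_b)


def query(target, evidence):
    """P(target = True | evidence), by summing the joint over all assignments."""
    names = ("media", "sci", "belief")
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # Skip assignments that contradict the observed evidence.
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print("P(belief):", query("belief", {}))
    print("P(media | belief):", query("media", {"belief": True}))
```

Under these invented numbers, the marginal probability of belief is 0.26, and learning that a respondent believes raises the probability of heavy media exposure from 0.30 to about 0.62. Updating every node in light of evidence entered at any other node is the kind of reasoning a BBN program automates for larger networks.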

In the next few paragraphs, I just scratch the surface of the BBN approach to probabilistic data modeling, analysis, and hypothesis testing, as an example of how to think about applying Einstein's principle of simplifying without oversimplifying. I hope this discussion will itself be as simple as possible, but not simpler. The Norsys web site has abundant information about BBNs, with numerous examples, and a web search will turn up a great deal more, not all of it reliable, of course. Most important for our purposes, BBN system models are rigorous probability models (unlike, say, fuzzy-logic models), and they can address both linear and nonlinear dependencies.

Most researchers are familiar with regression analysis: one variable's response or value is predicted from the values of one or more predictor variables, or covariates. The goal is simply prediction and usually nothing else. The issue of causality does not arise. In a sense, a BBN generalizes and expands upon a regression analysis. …
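As a reminder of what regression does, here is a minimal sketch of a one-predictor least-squares fit in plain Python. The data and the names `hours`, `score`, `fit_line`, and `predict` are hypothetical and invented for illustration. Note that the fitted line only predicts one variable from the other and says nothing about which variable, if either, is the cause.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted response for a new predictor value."""
    return intercept + slope * x


# Invented data: weekly hours of paranormal TV vs. a 0-10 belief score.
hours = [0, 1, 2, 3, 4]
score = [1.2, 2.9, 5.1, 6.8, 9.0]

if __name__ == "__main__":
    slope, intercept = fit_line(hours, score)
    print("slope:", slope, "intercept:", intercept)
    print("predicted score at 2.5 hours:", predict(2.5, slope, intercept))
```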