Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window

by DAVID MADIGAN and ADRIAN E. RAFTERY

We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism that averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximizing predictive ability. But this has not been used in practice, because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable log-linear models. For each of these, we develop efficient ways of computing exact Bayes factors and hence posterior model probabilities. For the decomposable log-linear models, this is based on properties of chordal graphs and hyper-Markov prior distributions, and the resultant calculations can be carried out locally.
The end product is an overall strategy for model selection and accounting for model uncertainty that searches efficiently through the very large classes of models involved. Three examples are given. The first two concern data sets that have been analyzed by several authors in the context of model selection. The third addresses a urological diagnostic problem. In each example, our model averaging approach provides better out-of-sample predictive performance than any single model that might reasonably have been selected.

KEY WORDS: Chordal graph; Contingency table; Decomposable log-linear model; Expert system; Hyper-Markov distribution; Recursive causal model.

1. INTRODUCTION

Fruitful approaches to inference in high-dimensional contingency tables all involve choosing a broad class of models to be considered and then comparing them on the basis of how well they predict the data. Typically, the model classes are huge, and inference in the presence of the many competing models is not easy. Here we consider two classes of graphical models: the recursive causal models of Kiiveri, Speed, and Carlin (1984) and the decomposable log-linear models introduced by Goodman (1970) and Haberman (1974).

This work is motivated by applications in expert systems that use a belief network to represent knowledge and perform inference (Lauritzen and Spiegelhalter 1988). These are the two model classes that arise in such applications. Potentially the most important advantage of constructing expert systems in this fashion is the system's ability to modify itself as data become available. In a series of recent papers, Spiegelhalter and Lauritzen (1990a, 1990b), Dawid and Lauritzen (1993), and Spiegelhalter and Cowell (1991) have addressed the issue of updating the quantitative layer of such models. Building on this work, we address the issue of updating the qualitative layer: How can the graphical structure itself be updated as data become available?
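As a minimal illustration of the model-averaging formalism described in the abstract, the sketch below turns log marginal likelihoods into posterior model probabilities and averages a quantity of interest over the models. Every name and number in it is hypothetical. The uniform model prior is an assumption, and the `restrict_to_window` rule (keep only models within a fixed factor of the best one) is a simple stand-in for averaging over a smaller model set, not necessarily the rule the authors use.

```python
import math


def posterior_model_probs(log_marginals, log_priors=None):
    """Turn log marginal likelihoods into posterior model probabilities."""
    n = len(log_marginals)
    if log_priors is None:
        # Assumption: every model is equally likely a priori.
        log_priors = [math.log(1.0 / n)] * n
    scores = [lm + lp for lm, lp in zip(log_marginals, log_priors)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def restrict_to_window(probs, ratio=20.0):
    """Hypothetical rule: keep models whose posterior probability is within
    a factor `ratio` of the best model's, then renormalize."""
    best = max(probs)
    kept = {k: p for k, p in enumerate(probs) if p * ratio >= best}
    total = sum(kept.values())
    return {k: p / total for k, p in kept.items()}


def model_average(per_model_estimates, weights):
    """Weighted average of a per-model estimate; `weights` maps model index to weight."""
    return sum(w * per_model_estimates[k] for k, w in weights.items())


# Made-up inputs: three candidate models and their estimates of one quantity.
log_marginals = [-100.0, -101.0, -110.0]
estimates = [0.30, 0.40, 0.90]

probs = posterior_model_probs(log_marginals)
window = restrict_to_window(probs, ratio=20.0)
averaged = model_average(estimates, window)
```

With these made-up inputs the third model falls outside the window, so the averaged estimate is a weighted combination of the first two models only.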
Currently, the most commonly used approach to model selection in contingency tables is a stepwise one, adapted from stepwise regression by Goodman (1971); see also Bishop, Fienberg, and Holland (1975, sec. 4.5 and chap. 9). This consists of sequentially adding and deleting terms on the basis of approximate asymptotic likelihood ratio tests, leading to the selection of a single model. Inference about the quantities of interest is then made conditionally on the selected model.

There are several difficulties with this approach. The sampling properties of the overall strategy are complex because it involves multiple tests and, at least implicitly, the comparison of nonnested models (Fenech and Westfall 1988). The use of P values themselves is controversial, even when there are only two models to be compared, because of the so-called "conflict between P values and evidence" discussed by Berger and Sellke (1987) and Berger and Delampady (1987). One aspect of this is that tests based on P values tend to reject even apparently satisfactory models when the sample size ...