The Intrinsic Bayes Factor for Model Selection and Prediction

Journal article by James O. Berger, Luis R. Pericchi; Journal of the American Statistical Association, Vol. 91, 1996


1. INTRODUCTION

1.1 Is Another Model Selection Criterion Needed?

We obviously think so, but why? First, we feel that model selection should have a Bayesian basis. This is based not so much on generic Bayesian arguments as on a belief that Bayesian methods of model selection and hypothesis testing are particularly needed for the following reasons:

a. Measures based on frequentist computations, such as P values (in, say, chi-squared testing of fit), are at best difficult to interpret and at worst highly misleading (Berger and Delampady 1987; Berger and Sellke 1987; Delampady and Berger 1990; Edwards, Lindman, and Savage 1963).

b. Analysis of nonnested and/or multiple models or hypotheses is very difficult in a frequentist framework.

c. Non-Bayesian methods have difficulty incorporating "Occam's razor," the notion that if two models explain data equally well, then the simpler model is to be preferred; Bayes factors do this automatically (Jefferys and Berger 1992; Spiegelhalter and Smith 1982, and the references therein), whereas other methods require introduction of ad hoc penalties for model complexity.

d. Prediction is often the real goal and, in accounting for model uncertainty in prediction, Bayesian methods are natural in that they can keep all models under consideration, weighted by their posterior probabilities (see Draper 1995 for a review and earlier references, and also Sec. 5 herein).

Among the many fine discussions of these issues are those provided by Edwards et al. (1963), Jeffreys (1961), and Kass and Raftery (1995).

A second basic premise of our motivation is that one needs automatic methods of model selection. Within the Bayesian community there has been continual debate over whether subjective or objective Bayesian methods should be used; most Bayesians today accept that both methods can be useful. The argument in favor of automatic methods of model selection is particularly compelling, because one often initially entertains a wide variety of models, and careful subjective specification of prior distributions for all parameters of all the models is typically not feasible.

Unfortunately, operation in strict accordance with these two basic premises is not possible. The reason is that Bayes factors in hypothesis testing and model selection typically depend rather strongly on the prior distributions, much more so than in, say, estimation. (For instance, as the sample size grows, the influence of the prior distribution disappears in estimation, but does not in hypothesis testing or model selection.) And, for most model selection problems, one cannot use standard improper noninformative priors; such priors are defined only up to a constant multiple, and the Bayes factor is itself a multiple of this arbitrary constant.
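The dependence on the arbitrary constant can be seen in a small illustrative case that is not taken from the article. Assume unit-variance normal data, with $M_1$ fixing the mean at $\theta = 0$ and $M_2$ leaving $\theta$ free under the improper "prior" $\pi(\theta) = c$. Integrating out $\theta$ gives $B_{21} = c\,\sqrt{2\pi/n}\,\exp(n\bar{x}^2/2)$, so the Bayes factor inherits the constant $c$ directly. The function name and the data below are hypothetical; this is a sketch of the problem, not a recommended procedure.

```python
import math


def bayes_factor_flat_prior(xs, c):
    """B_21 for M1: theta = 0 versus M2: theta free, with N(theta, 1) data.

    Assumption for illustration only: under M2 the prior is the improper
    density pi(theta) = c, which is defined only up to the constant c.
    """
    n = len(xs)
    xbar = sum(xs) / n
    # Closed form of m_2(x) / m_1(x) after integrating theta over the real line.
    return c * math.sqrt(2.0 * math.pi / n) * math.exp(n * xbar ** 2 / 2.0)


xs = [0.3, -0.1, 0.8, 0.4, 0.2]  # made-up observations

# Rescaling the improper prior rescales the Bayes factor by the same amount,
# so the data alone cannot determine which model is favored.
for c in (1.0, 10.0, 100.0):
    print(f"c = {c:6.1f}  ->  B_21 = {bayes_factor_flat_prior(xs, c):.4f}")
```

Because any positive $c$ describes the same improper prior, every one of the printed values is an equally "valid" Bayes factor, which is the difficulty described above.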

The best one can hope for is thus a method that is automatic and yet produces actual Bayes factors corresponding to reasonable (proper) prior distributions. An obvious way to achieve this is simply to choose "conventional" proper prior distributions for testing or model selection, priors that seem likely to be reasonable for typical problems. This was the approach espoused by Jeffreys (1961), who recommended specific proper priors for certain standard testing problems. This approach is arguably about the best that can be done from a default perspective, but it is very difficult to implement in general, requiring careful selection of a default prior for each specific situation.

In this article, we introduce a technique for developing default Bayes factors, which we call intrinsic Bayes factors (IBF's). These can be constructed in very general situations - nested, nonnested, and even irregular problems - and they seem to correspond to actual Bayes factors, at least asymptotically.

1.2 Preliminaries

Models $M_1, M_2, \ldots, M_q$ are under consideration, with the data $X$ having density $f_i(x \mid \theta_i)$ under model $M_i$. (The densities are assumed to be taken with respect to a common measure, which is otherwise irrelevant to our analysis.) The parameter vectors $\theta_i$ are unknown and are of dimension $k_i$.

Bayesian model selection proceeds by selecting prior distributions $\pi_i(\theta_i)$ for the parameters of each model, together with prior probabilities $p_i$ of each model being true. The posterior probability that $M_i$ is true is then

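$$P(M_i \mid x) = \left( \sum_{j=1}^{q} \frac{p_j}{p_i} B_{ji} \right)^{-1},$$ (1)

where $B_{ji}$, the Bayes factor of $M_j$ to $M_i$, is defined by

$$B_{ji} = \frac{m_j(x)}{m_i(x)} = \frac{\int f_j(x \mid \theta_j)\, \pi_j(\theta_j)\, d\theta_j}{\int f_i(x \mid \theta_i)\, \pi_i(\theta_i)\, d\theta_i};$$ (2)

here $m_j(x)$ is the marginal or predictive density of $X$ under $M_j$.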
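As a minimal sketch of how (1) and (2) combine in practice, assume the marginal densities $m_i(x)$ have already been computed and that (1) takes the standard form $P(M_i \mid x) = \left(\sum_j (p_j/p_i)\, B_{ji}\right)^{-1}$ with $B_{ji} = m_j(x)/m_i(x)$. The function names and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def bayes_factor(log_m_j, log_m_i):
    """B_ji = m_j(x) / m_i(x), computed from log marginal densities."""
    return math.exp(log_m_j - log_m_i)


def posterior_model_probs(log_marginals, prior_probs):
    """Posterior probability of each model given log m_i(x) and prior p_i."""
    probs = []
    for log_m_i, p_i in zip(log_marginals, prior_probs):
        # Sum over all models j of (p_j / p_i) * B_ji, then invert.
        total = sum(
            (p_j / p_i) * bayes_factor(log_m_j, log_m_i)
            for log_m_j, p_j in zip(log_marginals, prior_probs)
        )
        probs.append(1.0 / total)
    return probs


# Made-up marginal densities for three candidate models, equal prior weights.
log_marginals = [math.log(0.02), math.log(0.06), math.log(0.02)]
prior_probs = [1 / 3, 1 / 3, 1 / 3]

print(posterior_model_probs(log_marginals, prior_probs))
```

Working with log marginals is a design choice made here to avoid underflow when the densities are very small; the resulting probabilities sum to one by construction.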

Although we use this standard Bayesian language, note that one does not strictly have to assume that one of the models is true; in particular, $B_{ji}$ can be viewed as the "weighted" likelihood ratio of $M_j$ ...
