A Bayesian Method for Combining Results from Several Binomial Experiments



Consider a collection of $I$ independent binomial experiments, with experiment $i$ having size $n_i$ and success probability $\Theta_i$, $i = 1, \ldots, I$. A typical Bayesian hierarchical approach would assume the $\Theta_i$'s to be exchangeable (see, for example, Albert and Gupta 1983, sec. 3, and Leonard 1972). Exchangeability may at times be too strong an assumption: for example, when the value of a relevant covariate is available for each experiment or, more generally, when one suspects that the experiments have varying degrees of similarity. Even when no specific prior information is available, one may wish to adopt a more flexible approach: entertain several partial exchangeability structures for the $\Theta_i$'s and then combine the corresponding inferences. This idea was suggested by Malec and Sedransk (1992) and implemented with normal data. In a nonhierarchical setting, a very similar approach underpins the "partition models" introduced by Hartigan (1990) and later applied in a number of articles with Barry (see, for example, Barry and Hartigan 1993).
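A minimal Python sketch of what exchangeability buys within one group of experiments: the marginal likelihood of the counts when the $\Theta_i$'s share an unknown common mean. The hierarchy is an illustrative assumption, not necessarily the model of this article: given a mean mu, each $\Theta_i$ is Beta(M*mu, M*(1 - mu)) with a fixed precision M, mu is Uniform(0, 1), and the integral over mu is approximated by a midpoint rule. The names `log_beta_binomial`, `log_group_marginal`, `M`, and `grid` are hypothetical.

```python
import math


def log_beta_binomial(x, n, a, b):
    """Log pmf of the beta-binomial: X ~ Bin(n, theta) with theta ~ Beta(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    log_beta_post = math.lgamma(x + a) + math.lgamma(n - x + b) - math.lgamma(n + a + b)
    log_beta_prior = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_beta_post - log_beta_prior


def log_group_marginal(xs, ns, M=10.0, grid=400):
    """Log marginal likelihood of one exchangeable group of binomial experiments.

    Hypothetical hierarchy: Theta_i | mu ~ Beta(M*mu, M*(1-mu)), mu ~ Uniform(0, 1).
    Computes log of the integral over mu of prod_i BB(x_i; n_i, M*mu, M*(1-mu))
    with a midpoint rule on `grid` points, using log-sum-exp for stability.
    """
    logs = []
    for j in range(grid):
        mu = (j + 0.5) / grid  # midpoint of the j-th subinterval of (0, 1)
        logs.append(sum(log_beta_binomial(x, n, M * mu, M * (1 - mu))
                        for x, n in zip(xs, ns)))
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / grid)
```

Because all experiments in the group share mu, this marginal does not split into a product of per-experiment terms; that induced dependence is what lets experiments borrow strength from one another. With mu known, the $\Theta_i$'s would be independent and grouping would have no effect.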

Consider a partition $g$ of the experimental set $\{1, \ldots, I\}$, with subsets denoted by $S_1, \ldots, S_d$. The basic assumption is that only the $\Theta_i$'s of experiments belonging to the same subset $S_k$ are exchangeable, whereas the $\Theta_i$'s of experiments in distinct subsets are independent. Typically, several partitions $g$ will be plausible, and their relative plausibility is described by a prior probability mass function $p(g)$. The final inference on $\Theta = (\Theta_1, \ldots, \Theta_I)$ is then a mixture of the inferences on $\Theta$ given $g$, weighted by the posterior distribution $p(g \mid \text{data}) \propto p(g)\, p(\text{data} \mid g)$; that is, $p(\Theta \mid \text{data}) = \sum_g p(\Theta \mid g, \text{data})\, p(g \mid \text{data})$. We emphasize that this methodology is especially suitable when the number of experiments is small or when there is substantial prior information on the possible ways in which the experiments may cluster together.
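The mixture over partitions can be sketched in Python as follows. Each subset $S_k$ of a partition $g$ is treated as a separate exchangeable group, so $p(\text{data} \mid g)$ is the product of the subset marginals, $p(g \mid \text{data})$ is proportional to $p(g)\, p(\text{data} \mid g)$, and the estimate of each $\Theta_i$ is the posterior-weighted average of its conditional means given $g$. The within-subset hierarchy (a Beta(M*mu, M*(1 - mu)) prior with uniform mu, integrated on a grid) is our illustrative assumption, not the article's closed-form approximation, and every function and parameter name is hypothetical.

```python
import math


def log_bb(x, n, a, b):
    """Log beta-binomial pmf: X ~ Bin(n, theta) with theta ~ Beta(a, b)."""
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + math.lgamma(x + a) + math.lgamma(n - x + b) - math.lgamma(n + a + b)
            - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))


def subset_fit(xs, ns, M, grid):
    """Log marginal and posterior means of Theta_i for one exchangeable subset.

    Assumed (hypothetical) hierarchy: Theta_i | mu ~ Beta(M*mu, M*(1-mu)),
    mu ~ Uniform(0, 1); the integral over mu uses a midpoint grid.
    """
    mus = [(j + 0.5) / grid for j in range(grid)]
    logs = [sum(log_bb(x, n, M * mu, M * (1 - mu)) for x, n in zip(xs, ns))
            for mu in mus]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]  # unnormalized posterior weights of mu
    total = sum(w)
    log_marg = top + math.log(total / grid)
    # E[Theta_i | mu, data] = (M*mu + x_i) / (M + n_i), averaged over mu's posterior
    means = [sum(wj * (M * mu + x) / (M + n) for wj, mu in zip(w, mus)) / total
             for x, n in zip(xs, ns)]
    return log_marg, means


def combine(x, n, partitions, prior, M=10.0, grid=400):
    """Return p(g | data) for each candidate partition and the mixture posterior means."""
    log_post, cond_means = [], []
    for g, pg in zip(partitions, prior):
        lp = math.log(pg)             # log p(g)
        means = [0.0] * len(x)
        for subset in g:              # subsets are independent given g
            lm, m = subset_fit([x[i] for i in subset], [n[i] for i in subset], M, grid)
            lp += lm
            for i, mi in zip(subset, m):
                means[i] = mi
        log_post.append(lp)
        cond_means.append(means)
    top = max(log_post)
    w = [math.exp(v - top) for v in log_post]
    s = sum(w)
    post = [wi / s for wi in w]
    # p(Theta | data) = sum_g p(Theta | g, data) p(g | data), applied to the means
    mixture = [sum(p * cm[i] for p, cm in zip(post, cond_means)) for i in range(len(x))]
    return post, mixture
```

For instance, `combine([2, 3, 15, 17], [20] * 4, [[[0, 1, 2, 3]], [[0, 1], [2, 3]]], [0.5, 0.5])` compares full exchangeability with a two-cluster split; with these made-up counts the two-cluster partition receives most of the posterior mass.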

The main advantage of our analysis is its ability to borrow strength from several related experiments without imposing a prespecified dependence structure on the $\Theta_i$'s. For this reason, our procedure appears especially valuable for binary response data with categorical covariates and in Bayesian meta-analysis (see, for example, Morris and Normand 1992).

The method adopted in this article may be seen as the Bayesian counterpart of the frequentist approach leading to multiple shrinkage estimators, pioneered by Efron and Morris (1973) and developed by George (1986). A related line of research has developed in nonparametric empirical Bayes estimation (see Laird 1978 and Leonard 1984).

The article is organized as follows. Section 2 introduces the basic hierarchical model for a given partition $g$, which expresses the prior dependence structure among the $\Theta_i$'s. It then derives, in closed form, the posterior expectation and covariance matrix of $\Theta$ for a given $g$, using an approximation to the beta-binomial likelihood, and finally computes the posterior distribution of $g$. Section 3 discusses two proper noninformative priors for $\Theta$ and analyzes in detail the role played by a specific prior hyperparameter. Finally, Section 4 illustrates the methodology with three sets of real data, showing in particular how classifying factors might help choose the collection of partitions $g$ and assess the sensitivity of the analysis to variations in the prior inputs. It also presents an empirical comparison with alternative methodologies: nonparametric and parametric empirical Bayes, standard logistic regression, and a class of combined general Stein estimators.
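The article picks the collection of candidate partitions with the help of classifying factors. A short Python sketch (our own illustration; the helper names and the 2 x 2 factor layout are hypothetical) shows why some restriction is needed: the number of partitions of $I$ experiments is the Bell number, which grows very quickly (115,975 partitions already for $I = 10$), whereas each classifying factor, or combination of factors, contributes a single candidate partition.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        # place `first` into each existing block in turn ...
        for k in range(len(p)):
            yield p[:k] + [[first] + p[k]] + p[k + 1:]
        # ... or into a new singleton block
        yield [[first]] + p


def factor_partition(levels):
    """Group experiment indices by the level of a classifying factor."""
    blocks = {}
    for i, lev in enumerate(levels):
        blocks.setdefault(lev, []).append(i)
    return list(blocks.values())


if __name__ == "__main__":
    # Hypothetical 2 x 2 layout: two binary classifying factors A and B over 4 experiments
    A = ["a1", "a1", "a2", "a2"]
    B = ["b1", "b2", "b1", "b2"]
    candidates = [
        [[0, 1, 2, 3]],                     # full exchangeability
        factor_partition(A),                # exchangeable within levels of A
        factor_partition(B),                # exchangeable within levels of B
        factor_partition(list(zip(A, B))),  # within A x B cells (all singletons here)
    ]
    print(len(candidates), "factor-based candidates out of",
          sum(1 for _ in set_partitions(list(range(4)))), "possible partitions")
```

Running the script prints that 4 factor-based candidates are considered out of the 15 partitions of four experiments; a prior $p(g)$ would then be placed on this smaller collection only.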


Let $X_i$ given $\Theta_i$ be independently distributed as binomial (n …