Teaching Bayesian Statistics Using Sampling Methods and MINITAB
Albert, James H., The American Statistician
In recent years, the Bayesian approach to statistical inference has received increasing attention in both applied and theoretical statistical research. Many statisticians regard it as a natural paradigm for solving applied problems. However, the education of undergraduate and graduate students in Bayesian methods appears to have lagged behind the research advances. Many students go through their program in statistics without hearing even a mention of Bayesian ideas. In some courses, such as an introductory mathematical statistics course, Bayesian methods are treated as just an alternative technique for deriving a test or estimator. In these classes, the student gets a very distorted view of the Bayesian paradigm and gains no experience in constructing a prior or in understanding how prior beliefs are updated by data.
One of the hurdles in teaching the Bayesian paradigm is the difficulty of computation. Once the posterior distribution is defined, the student is typically introduced to a series of examples involving estimation of parameters from standard parametric families. In each example, a conjugate prior distribution is chosen so that the posterior distribution can be derived in closed form. The student is rarely exposed to nonconjugate priors, since the posterior summaries (for example, the mean and standard deviation) must then be computed by numerical methods. On the basis of this experience, the student views Bayes's rule as a mathematical exercise and does not understand that this recipe applies to any likelihood and any prior.
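To make the last point concrete, the following is a minimal sketch of computing posterior summaries numerically when the prior is not conjugate. It is an illustration only, not a method taken from this article: the data (7 successes in 20 trials), the triangular prior, the grid size, and all function names are assumptions chosen for the example.

```python
import numpy as np

def grid_posterior(prior_pdf, likelihood, grid):
    """Posterior probabilities on an evenly spaced grid: prior times likelihood, normalized."""
    unnormalized = prior_pdf(grid) * likelihood(grid)
    return unnormalized / unnormalized.sum()

def summarize(probs, grid):
    """Posterior mean and standard deviation from grid probabilities."""
    mean = float(np.sum(grid * probs))
    sd = float(np.sqrt(np.sum((grid - mean) ** 2 * probs)))
    return mean, sd

# Hypothetical data: 7 successes in 20 Bernoulli trials.
successes, trials = 7, 20

# Midpoints of 1000 equal-width bins on (0, 1).
grid = np.linspace(0.0005, 0.9995, 1000)

def likelihood(p):
    # Binomial likelihood up to a constant factor.
    return p ** successes * (1 - p) ** (trials - successes)

def uniform_prior(p):
    # Conjugate case: uniform is Beta(1, 1), so the exact posterior is Beta(8, 14).
    return np.ones_like(p)

def triangular_prior(p):
    # Nonconjugate case (assumed for illustration): triangular density peaked at 0.5.
    return 1.0 - np.abs(2.0 * p - 1.0)

uni_mean, uni_sd = summarize(grid_posterior(uniform_prior, likelihood, grid), grid)
tri_mean, tri_sd = summarize(grid_posterior(triangular_prior, likelihood, grid), grid)

print(f"uniform prior:    mean={uni_mean:.4f}, sd={uni_sd:.4f}")
print(f"triangular prior: mean={tri_mean:.4f}, sd={tri_sd:.4f}")
```

The same two functions handle both priors; only the prior density changes. In the uniform case the grid answer can be checked against the closed-form Beta(8, 14) posterior mean of 8/22.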
In the last 10 years, considerable progress has been made in the area of Bayesian computation. A number of good numerical integration methods have been developed recently for computing the integrals that are common in Bayesian inference. [See Naylor and Smith (1988) and Smith, Skene, Shaw, Naylor, and Dransfield (1985) for recent surveys.] One general class of computational algorithms uses sampling techniques to simulate posterior distributions. One particular simulation method, the Gibbs sampler (Gelfand and Smith 1990), has been shown to be remarkably successful in simulating posterior distributions of a large number of parameters.
Although a number of good algorithms currently exist to perform Bayesian computations, these methods are not widely used. In particular, these computational methods are not taught in an introductory Bayesian course. Two reasons can be suggested for this lack of general use. First, most of these computational methods need to be applied by a user who is familiar with the algorithm and understands when the method has converged or produced an answer of sufficient accuracy. The second and probably most important reason is that there is little Bayesian computer software available. Most of the Bayesian software commercially available is designed for particular inference problems, such as linear models and time series. There is little software available that allows one to implement Bayes's theorem for arbitrary prior and likelihood specifications.
In this article, we explore the use of one particular simulation technique, the Sampling-Importance-Resampling (SIR) algorithm (Gelfand and Smith 1992; Rubin 1987, 1988), in teaching Bayesian statistics. This algorithm, defined in Section 2, can be viewed as a general approximate method of simulating from a posterior distribution. For teaching purposes, this method has several advantages over alternative numerical integration schemes. First, the SIR algorithm can be performed somewhat automatically for a wide range of Bayesian inference problems. The student does not need to monitor or adjust the parameters of the algorithm to obtain satisfactory answers. Since the method is automatic, the student can focus his or her attention on the construction of the prior and on how the posterior summarizes the information contained in the data and the prior. …
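As a rough orientation before the formal definition in Section 2, sampling-importance-resampling is commonly described in three steps: draw values from a proposal distribution, weight each draw by the ratio of the unnormalized posterior to the proposal density, and resample the draws with probabilities proportional to those weights. The Python sketch below follows that generic description under stated assumptions. It is not the article's MINITAB implementation, and the data, the uniform proposal, the sample sizes, and all names are hypothetical.

```python
import numpy as np

def sir(draw_proposal, proposal_pdf, unnorm_posterior, n_draws, n_resample, rng):
    """Generic SIR sketch: sample from a proposal, weight, then resample."""
    theta = draw_proposal(n_draws, rng)                      # 1. sampling
    weights = unnorm_posterior(theta) / proposal_pdf(theta)  # 2. importance weights
    weights = weights / weights.sum()
    # 3. resampling; drawing with replacement is an assumption of this sketch
    keep = rng.choice(n_draws, size=n_resample, replace=True, p=weights)
    return theta[keep]

# Hypothetical data: 7 successes in 20 Bernoulli trials, with a uniform prior.
successes, trials = 7, 20

def unnorm_posterior(p):
    prior = np.ones_like(p)  # swap in any prior density here
    return prior * p ** successes * (1 - p) ** (trials - successes)

rng = np.random.default_rng(0)
sample = sir(
    draw_proposal=lambda n, r: r.uniform(0.0, 1.0, size=n),
    proposal_pdf=lambda p: np.ones_like(p),
    unnorm_posterior=unnorm_posterior,
    n_draws=20000,
    n_resample=5000,
    rng=rng,
)

post_mean = float(sample.mean())
post_sd = float(sample.std())
print(f"approximate posterior mean={post_mean:.3f}, sd={post_sd:.3f}")
```

With the uniform prior used here the exact posterior is Beta(8, 14), so the simulated mean can be compared with 8/22. Replacing the `prior` line with any other density requires no other change to the code.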