Academic journal article The Psychological Record

Experimental Tests of Subjective Bayesian Methods

Article excerpt

Statistical methods are subject to dual evaluation: both their mathematical and practical properties matter. Although the mathematical properties of various statistical methods have been studied extensively and formally, the evaluation of statistical practice through field tests has been mostly informal and anecdotal. Subjective Bayesian methods are an exception: the substantial literature on subjective probability judgments bears directly on their evaluation. In particular, the large literature on the miscalibration of subjective probability estimates casts doubt on the use of subjective priors in Bayesian estimation. However, an interesting defense of subjective Bayesian methods was offered by Samaniego and Reneau (1994). In a classroom experiment using a novel method to elicit subjective priors, they showed that their students were mostly good Bayesians: the students generated point estimates with smaller average squared error than the best frequentist estimators. This result could be regarded as a small but important empirical verification of the benefits of subjective Bayesian practice.
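The squared-error comparison at issue can be illustrated with a minimal simulation. This is not the Samaniego and Reneau experiment; it is a generic sketch, with an assumed true proportion, an assumed Beta prior, and assumed sample sizes, showing how a reasonably placed subjective prior can yield a posterior-mean estimate with smaller mean squared error than the frequentist sample proportion.

```python
import random

def simulate_mse(true_p=0.3, prior_a=2, prior_b=5, n=10, reps=20000, seed=1):
    """Compare mean squared error of a Bayesian posterior-mean estimate of a
    binomial proportion against the frequentist sample proportion.
    All parameter values are illustrative assumptions, not from the study."""
    rng = random.Random(seed)
    se_bayes = se_freq = 0.0
    for _ in range(reps):
        successes = sum(rng.random() < true_p for _ in range(n))
        freq_est = successes / n  # sample proportion (the MLE)
        # Posterior mean under a Beta(prior_a, prior_b) subjective prior
        bayes_est = (prior_a + successes) / (prior_a + prior_b + n)
        se_freq += (freq_est - true_p) ** 2
        se_bayes += (bayes_est - true_p) ** 2
    return se_bayes / reps, se_freq / reps

mse_bayes, mse_freq = simulate_mse()
print(f"Bayes MSE {mse_bayes:.4f}, frequentist MSE {mse_freq:.4f}")
```

With a prior centered near the truth, the shrinkage of the posterior mean trades a small bias for a large variance reduction; a badly placed prior would reverse the comparison.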

We discuss the relationship of the calibration literature to subjective Bayesian methods, the novel method used by Samaniego and Reneau, and some issues arising from it. We then report three experiments designed to investigate the usefulness of their proposed method.

Background

Calibration and Interval Estimation in Statistics

A series of subjective probabilities is said to be calibrated if, grouping together all events with subjective probability near p, the proportion of them actually found to occur is approximately equal to p. Many studies of calibration have subjects directly estimate a subjective interval with a specified probability (e.g., 50% or 80%) of containing a particular parameter. Often these are "almanac" questions--parameters such as the lengths of famous rivers, and so forth. Such studies generally show poor calibration, typically in the direction of overconfidence (Keren, 1997; Lichtenstein, Fischhoff, & Phillips, 1982). For example, Alpert and Raiffa (1982) found that when respondents tried to provide interval estimates with a 50% probability of including the true value, only about 33% of the intervals produced actually did include it; for 98% target intervals, actual coverage was only 57%. Extensive efforts to rectify poor calibration have met with limited success (Arkes, Christensen, Lai, & Blumer, 1987; Fischer, 1982; Fischhoff & Bar-Hillel, 1984; Koriat, Lichtenstein, & Fischhoff, 1980; Van Lenthe, 1994). However, the conditions under which overconfidence occurs are only partly understood, despite extensive research (Keren, 1997; Tversky & Kahneman, 1974; Wright & Wisudha, 1982; Yaniv & Foster, 1997). Expertise also plays an important role, but the effects are mixed. Generally, weather forecasters, odds makers, and certified property appraisers are well calibrated (Hoerl & Fallin, 1974; Murphy & Winkler, 1977; Spence, 1996), and more experienced technical operators have shown less overconfidence than inexperienced operators in their area of expertise (Cooke, Mendel, & Thijs, 1988). This has not held true in the case of doctors' diagnoses (Christensen-Szalanski & Bushyhead, 1981; Oskamp, 1965). Method of elicitation is also important (Fischer, 1982; Hogarth, 1975).
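The calibration criterion just described can be computed mechanically: bin forecasts by stated probability and compare each bin's stated probability with the observed frequency of the event. The sketch below uses invented forecast/outcome pairs purely for illustration.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, width=0.1):
    """Group forecasts into bins of the given width and return, per bin,
    (mean stated probability, observed relative frequency, count)."""
    bins = defaultdict(list)
    for p, hit in zip(forecasts, outcomes):
        bins[round(p / width) * width].append((p, hit))
    table = {}
    for b, pairs in sorted(bins.items()):
        ps = [p for p, _ in pairs]
        hits = [h for _, h in pairs]
        table[round(b, 2)] = (sum(ps) / len(ps), sum(hits) / len(hits), len(pairs))
    return table

# Invented, perfectly calibrated toy data: events stated at 0.8 occur
# 8 times out of 10, and events stated at 0.5 occur 5 times out of 10.
forecasts = [0.8] * 10 + [0.5] * 10
outcomes = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 5
for b, (stated, observed, n) in calibration_table(forecasts, outcomes).items():
    print(f"bin {b}: stated {stated:.2f}, observed {observed:.2f} (n={n})")
```

Overconfidence of the kind reported in the studies above would show up here as observed frequencies systematically closer to 0.5 than the stated probabilities.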

Although calibration questions can be raised for all sorts of probabilistic forecasts of events, the questions most relevant to statistical practice concern interval estimates. Classical statistics is much concerned with "coverage probability" for confidence intervals: 95%-confidence intervals are supposed to include the true parameter value with probability 0.95; this is a calibration criterion, though not ordinarily concerned with subjective confidence. Turning to Bayesian methods, it is natural to ask that Bayesian posterior distributions, and in particular, posterior credible intervals, should be approximately well-calibrated. …
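The frequentist coverage notion mentioned above can be checked directly by simulation. The sketch below is a generic illustration with assumed parameters (known sigma, z = 1.96): it draws many samples from a normal distribution and counts how often the nominal 95% confidence interval for the mean actually contains the true mean.

```python
import math
import random

def coverage(true_mu=0.0, sigma=1.0, n=25, reps=10000, z=1.96, seed=7):
    """Estimate the actual coverage of the nominal 95% CI for a normal mean
    with known sigma. All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        half = z * sigma / math.sqrt(n)  # half-width of the interval
        hits += (xbar - half) <= true_mu <= (xbar + half)
    return hits / reps

print(coverage())  # close to the nominal 0.95
```

The same check applied to Bayesian posterior credible intervals is precisely the calibration requirement the passage raises: a well-calibrated 95% credible interval should likewise cover the truth about 95% of the time.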
