On the Probability of Observing Misleading Statistical Evidence

The law of likelihood explains how to interpret statistical data as evidence. Specifically, it gives to the discipline of statistics a precise and objective measure of the strength of statistical evidence supporting one probability distribution vis-a-vis another. That measure is the likelihood ratio. But evidence, even when properly interpreted, can be misleading--observations can truly constitute strong evidence supporting one distribution when the other is true. What makes statistical evidence valuable to science is that this cannot occur very often. Here we examine two bounds on the probability of observing strong misleading evidence. One is a universal bound, applicable to every pair of probability distributions. The other bound, much smaller, applies to all pairs of distributions within fixed-dimensional parametric models in large samples. The second bound comes from examining how the probability of strong misleading evidence varies as a function of the alternative value of the parameter. We show that in large samples one curve describes how this probability first rises and then falls as the alternative moves away from the true parameter value for a very wide class of models. We also show that this large-sample curve, and the bound that its maximum value represents, applies to profile likelihood ratios for one-dimensional parameters in fixed-dimensional parametric models, but does not apply to the estimated likelihood ratios that result from replacing the nuisance parameters by their global maximum likelihood estimates.

KEY WORDS: Evidence; Evidential paradigm; Law of likelihood; Likelihood principle; Nuisance parameters; Profile likelihood; Support.


An important role of statistical analysis in science is interpreting observed data as evidence--showing "what the data say." Although standard statistical methods (e.g., hypothesis testing, estimation, confidence intervals) are routinely used for this purpose, the theory behind those methods contains no defined concept of evidence and no answer to the basic question "when is it correct to say that a given body of data represents evidence supporting one statistical hypothesis over another?" or to its sequel "can we give an objective measure of the strength of statistical evidence?" Because of this theoretical inadequacy, the use of statistical methods in science is guided largely by convention and intuition (common sense) and is marked by unresolvable controversies, such as those over the proper use and interpretation of "p values" and adjustments for multiple testing (Bower 1997; Cohen 1994; Goodman 1998; Morrison and Henkel 1970).

I have argued elsewhere (Royall 1997, 1998) that the law of likelihood represents the missing evidence concept, and that a paradigm based on this law can generate a frequentist methodology that avoids the logical inconsistencies pervading current methods while maintaining the essential properties that have made those methods into important scientific tools. These properties relate to the measurement and control of the probabilities of certain types of errors or misleading results. This article discusses the probabilities of observing strong misleading evidence--quantities that are fundamental both for evaluating and for implementing the proposed likelihood paradigm.
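To make "the probability of observing strong misleading evidence" concrete, the following is a minimal sketch under stated assumptions, not the article's own computation. It assumes a normal model with known variance, in which the log likelihood ratio for a false alternative versus the true mean reduces to c*Z - c^2/2, where Z is standard normal and c is the distance between the two means in standard-error units. The threshold k = 8, the function names, and the simulation settings are illustrative choices. The general inequality P(LR >= k) <= 1/k, which follows from Markov's inequality, is supplied here as a known result; its value is not stated in this excerpt.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def misleading_probability_exact(c, k):
    """P(LR >= k) in favor of a false alternative, normal mean, known variance.

    c is the (positive) distance between the alternative and the true mean,
    measured in standard errors. Since log LR = c*Z - c**2/2 with Z ~ N(0, 1),
    the event LR >= k is the event Z >= c/2 + ln(k)/c.
    """
    return normal_cdf(-c / 2.0 - math.log(k) / c)


def misleading_probability_simulated(c, k, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the same probability (illustrative settings)."""
    rng = random.Random(seed)
    log_k = math.log(k)
    hits = 0
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)          # standardized sample mean under the truth
        log_lr = c * z - c * c / 2.0     # log likelihood ratio, alternative vs. truth
        if log_lr >= log_k:
            hits += 1
    return hits / n_draws


k = 8                                    # assumed threshold for "strong" evidence
c_worst = math.sqrt(2.0 * math.log(k))   # c that maximizes the exact probability
exact_max = misleading_probability_exact(c_worst, k)
simulated = misleading_probability_simulated(c_worst, k)

print(f"universal bound 1/k      : {1.0 / k:.4f}")
print(f"exact maximum over c     : {exact_max:.4f}")
print(f"simulated at the same c  : {simulated:.4f}")
```

In this assumed model the probability first rises and then falls as c grows, and its maximum, Phi(-sqrt(2 ln k)), lies well below 1/k.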

1.1 The Law of Likelihood

The fundamental questions about interpreting statistical data as evidence are answered by what Hacking (1965) termed the law of likelihood:

If one hypothesis, $H_1$, implies that a random variable $X$ takes the value $x$ with probability $f_1(x)$, while another hypothesis, $H_2$, implies that the probability is $f_2(x)$, then the observation $X = x$ is evidence supporting $H_1$ over $H_2$ if $f_1(x) > f_2(x)$, and the likelihood ratio, $f_1(x)/f_2(x)$, measures the strength of that evidence.
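As a small illustration of the law as stated, the sketch below computes a likelihood ratio for two simple hypotheses about a binomial success probability. The binomial model, the hypothesized values 0.5 and 0.25, the data (6 successes in 10 trials), and the function names are all assumptions made for the example; none of them come from the article.

```python
import math


def binomial_probability(p, successes, trials):
    """Probability of exactly `successes` in `trials` independent Bernoulli(p) draws."""
    return math.comb(trials, successes) * p**successes * (1.0 - p) ** (trials - successes)


def likelihood_ratio(f1_x, f2_x):
    """Strength of the evidence for H1 over H2 given the observation x."""
    return f1_x / f2_x


# Hypothetical data and hypotheses, chosen only for illustration.
successes, trials = 6, 10
f1 = binomial_probability(0.5, successes, trials)    # probability of the data under H1
f2 = binomial_probability(0.25, successes, trials)   # probability of the data under H2

lr = likelihood_ratio(f1, f2)
supports_h1 = f1 > f2                                # direction of the evidence

print(f"f1(x) = {f1:.4f}, f2(x) = {f2:.4f}, likelihood ratio = {lr:.2f}")
```

Here the binomial coefficient cancels in the ratio, so the result depends only on the two hypothesized probabilities and the observed counts.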

The law of likelihood is an axiom for the interpretation of statistical data as evidence. …