Academic journal article Psychonomic Bulletin & Review

P^sub Rep^ Misestimates the Probability of Replication

Article excerpt

The probability of "replication," p^sub rep^, has been proposed as a means of identifying replicable and reliable effects in the psychological sciences. We conduct a basic test of p^sub rep^ that reveals that it misestimates the true probability of replication, especially for small effects. We show how these general problems with p^sub rep^ play out in practice, when it is applied to predict the replicability of observed effects over a series of experiments. Our results show that, over any plausible series of experiments, the true probabilities of replication will be very different from those predicted by p^sub rep^. We discuss some basic problems in the formulation of p^sub rep^ that are responsible for its poor performance, and conclude that p^sub rep^ is not a useful statistic for psychological science.

Searching for significant effects in psychological experiments is a risky business, because data are often sparse and noisy. Killeen (2005a) rightly pointed out that searching for small effects is especially perilous using the contorted logic of null hypothesis significance testing (see Wagenmakers, 2007, for a review). So, in his influential article, Killeen (2005a; see also Killeen, 2005b, 2005c, 2006; Sanabria & Killeen, 2007) proposed a measure, the probability of "replication," p^sub rep^ (where replication means "agreeing in sign"), that is claimed to offer hope.

The simplest way to understand p^sub rep^ is to consider the standard situation, in which data are normally distributed with a common known variance σ², and with an experimental group mean µ^sub E^ and control group mean µ^sub C^. If both the experimental and control groups have n subjects, the observed effect size d is a draw from a normal distribution with mean δ = (µ^sub E^ - µ^sub C^)/σ, where δ is the "true" underlying effect size, and variance 2/n.
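The sampling distribution just described can be checked with a short simulation. The following is only a minimal sketch under the stated assumptions (normal data, common known σ, n subjects per group); the function name `simulate_effect_sizes`, the choice of δ = 0.5 and n = 10, and the number of simulated experiments are illustrative and do not come from the article.

```python
import random
import statistics


def simulate_effect_sizes(delta, n, sigma=1.0, reps=20000, seed=0):
    """Draw `reps` observed effect sizes d for a true effect size `delta`.

    Each simulated experiment samples n experimental and n control scores
    with a common known standard deviation sigma, then standardizes the
    difference of the two group means by sigma.
    """
    rng = random.Random(seed)
    mu_c = 0.0                    # control mean (arbitrary origin)
    mu_e = mu_c + delta * sigma   # experimental mean implied by delta
    ds = []
    for _ in range(reps):
        mean_e = statistics.fmean(rng.gauss(mu_e, sigma) for _ in range(n))
        mean_c = statistics.fmean(rng.gauss(mu_c, sigma) for _ in range(n))
        ds.append((mean_e - mean_c) / sigma)
    return ds


# Illustrative values only: delta = 0.5, n = 10 subjects per group.
ds = simulate_effect_sizes(delta=0.5, n=10)
sample_mean = statistics.fmean(ds)
sample_var = statistics.variance(ds)
print(f"mean of d     = {sample_mean:.3f} (theory: delta = 0.500)")
print(f"variance of d = {sample_var:.3f} (theory: 2/n = {2 / 10:.3f})")
```

Up to simulation noise, the empirical mean and variance of d should sit close to δ and 2/n, respectively.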

Under these assumptions, p^sub rep^ is derived as the probability that both d and an imagined replicate observed effect size d^sub rep^ have the same sign. A standard Bayesian posterior predictive calculation then gives p^sub rep^ = Φ(|d|√(n/4)), as long as a uniform prior is placed on δ (e.g., Doros & Geier, 2005). We give formal details of this derivation in the Appendix, but immediately make three clarifying observations.
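As a concrete illustration, the formula for p^sub rep^ can be written as a small function of the observed effect size d and the per-group sample size n. This is a sketch only: the function name `p_rep` and the example (d, n) pairs are illustrative choices, not values taken from the article.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def p_rep(d, n):
    """Return Phi(|d| * sqrt(n / 4)) for observed effect size d and n per group."""
    return PHI(abs(d) * math.sqrt(n / 4.0))


# Illustrative (d, n) pairs; n is the number of subjects in each group.
for d, n in [(0.2, 10), (0.5, 10), (0.5, 40), (1.0, 10)]:
    print(f"d = {d:.1f}, n = {n:2d}: p_rep = {p_rep(d, n):.3f}")
```

Because of the absolute value, the function returns the same value for d and -d, and it can never fall below .5.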

First, note that it is important to take the absolute value of the effect size in calculating p^sub rep^. Otherwise, for example, an observed effect d = -2 with n = 25 would give a p^sub rep^ < .00001, corresponding to an extremely strong belief that the replicate effect would have a positive sign, which is ridiculous. We mention this point because it is not very clear in the existing p^sub rep^ literature, where sometimes the absolute value notation has been omitted from key equations.
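The arithmetic behind this example follows directly from the formula for p^sub rep^ with d = -2 and n = 25. The numerical values of Φ below are standard normal probabilities, rounded for display.

```latex
\begin{align*}
\text{without the absolute value:} \quad
  \Phi\!\left(d\sqrt{n/4}\right)
    &= \Phi\!\left(-2\sqrt{25/4}\right) = \Phi(-5)
     \approx 2.9 \times 10^{-7} < .00001, \\
\text{with the absolute value:} \quad
  \Phi\!\left(|d|\sqrt{n/4}\right)
    &= \Phi(5) \approx 0.9999997 .
\end{align*}
```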

Second, note that our notation differs from that of Killeen, who used n to denote the combined sample size from both the control and experimental groups, whereas we use n for each group separately. We prefer our notation, because it will generalize more naturally to cases where the number of subjects in each group is not the same.

Third, we note that for small sample sizes, Killeen (2005a) promoted the use of an ad hoc correction in which n is replaced by n - 2 (in our notation). This makes a small quantitative difference that disappears quickly as n increases, and it changes neither the qualitative pattern of our results nor our substantive conclusions.

The General Pattern of Misestimation for p^sub rep^

In this section, we present a general pattern of results that makes it clear that p^sub rep^ is a poor estimator. We do this by comparing the true probability of replication for a fixed effect size (i.e., a δ value) with the estimates of the probability of replication provided by p^sub rep^.
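One way to make this comparison concrete is sketched below. It is not necessarily the authors' own computation: it assumes that, for a fixed δ, the observed and replicate effect sizes are independent draws from the normal distribution with mean δ and variance 2/n described earlier, so that the true probability of agreeing in sign is Φ(δ√(n/2))² + (1 - Φ(δ√(n/2)))². The function names, the value n = 10, and the grid of δ values are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def true_replication_probability(delta, n):
    """P(d and d_rep agree in sign) when both are independent N(delta, 2/n)."""
    p_pos = PHI(delta * math.sqrt(n / 2.0))  # P(d > 0) in a single experiment
    return p_pos ** 2 + (1.0 - p_pos) ** 2


def p_rep(d, n):
    """Phi(|d| * sqrt(n / 4)) for observed effect size d and n per group."""
    return PHI(abs(d) * math.sqrt(n / 4.0))


def simulated_p_rep(delta, n, reps=20000, seed=0):
    """p_rep values from `reps` simulated experiments with true effect delta."""
    rng = random.Random(seed)
    sd = math.sqrt(2.0 / n)  # standard deviation of the observed effect size d
    return [p_rep(rng.gauss(delta, sd), n) for _ in range(reps)]


n = 10  # illustrative per-group sample size
for delta in (0.0, 0.2, 0.5, 1.0, 2.0):
    values = simulated_p_rep(delta, n)
    print(f"delta = {delta:.1f}: "
          f"true = {true_replication_probability(delta, n):.3f}, "
          f"mean p_rep = {statistics.fmean(values):.3f}")
```

Under these assumptions the mismatch is easiest to see at δ = 0: the true probability of agreeing in sign is exactly .5, whereas every value of p^sub rep^ is at least .5, so its average over experiments must exceed the true value.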

Each panel of Figure 1 shows, for a different sample size n, a broken line corresponding to the true probability of replication for underlying effect sizes from 0 to 2. …
