Replication Is Not Coincidence: Reply to Iverson, Lee, and Wagenmakers (2009)
Bruno Lecoutre and Peter R. Killeen, Psychonomic Bulletin & Review
Iverson, Lee, and Wagenmakers (2009) claimed that Killeen's (2005) statistic $p_{\text{rep}}$ overestimates the "true probability of replication." We show that Iverson et al. confused the probability of replication of an observed direction of effect with a probability of coincidence: the probability that two future experiments will return the same sign. The theoretical analysis is punctuated with a simulation of the predictions of $p_{\text{rep}}$ for a realistic random-effects world of representative parameters, when those parameters are unknown a priori. We emphasize throughout that $p_{\text{rep}}$ is intended to evaluate the probability of a replication outcome after observations, not to estimate a parameter. Hence, the conventional criteria for judging estimators (unbiasedness, minimum variance) are not appropriate for probabilities such as p and $p_{\text{rep}}$.
Iverson, Lee, and Wagenmakers (2009; hereafter, ILW) claimed that Killeen's (2005) $p_{\text{rep}}$ "misestimates the true probability of replication" (p. 424). But $p_{\text{rep}}$ was never designed to estimate what they call the true probability of replication (the broken lines labeled "Truth" in their Figure 1). We clarify this by showing that their "true probability" for a fixed parameter δ (their scenario) is the probability that the effects of two future experiments will agree in sign, given knowledge of δ. We call this the probability of coincidence and show that its goals differ from those of $p_{\text{rep}}$, the predictive probability that a future experiment will return the same sign as one already observed. ILW's "truth" has nothing to do with the "true probability of replication" in its most useful instantiation, the one proposed by Killeen (2005).
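To make the probability of coincidence concrete, here is a minimal Python sketch. It assumes the normal model with known variance used in the next section, in which an observed standardized difference is distributed Normal(δ, 2/n) with n subjects per group. Given δ, each future experiment is positive with probability Φ(δ√(n/2)), where Φ is the standard normal distribution function. Two independent experiments then agree in sign with probability Φ² + (1 − Φ)². The function names are hypothetical illustrations and do not come from ILW or from Killeen (2005).

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def prob_positive(delta: float, n: int) -> float:
    """P(d > 0 | delta) for one future experiment, with d ~ Normal(delta, 2/n)."""
    return norm_cdf(delta * sqrt(n / 2.0))

def prob_coincidence(delta: float, n: int) -> float:
    """P(two independent future experiments agree in sign | delta)."""
    p = prob_positive(delta, n)
    # Both experiments positive, or both negative.
    return p * p + (1.0 - p) * (1.0 - p)

for delta in (0.0, 0.3, 0.5, 1.0):
    print(f"delta = {delta:.2f}: coincidence = {prob_coincidence(delta, 10):.3f}")
```

This quantity depends only on δ and n and never on an observed effect. That is exactly what separates it from $p_{\text{rep}}$.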
The "True Probability of Replication"
Statistical analysis of experimental results inevitably involves unknown parameters. Suppose that you have observed a positive standardized difference of $d_{\text{obs}} = 0.30$ between experimental and control group means, with n = 10 subjects in each group.¹ You assume the usual normal model with an unknown true effect size δ and (for simplicity) a known variance. What is the probability of again obtaining a positive effect in a replication ($d_{\text{rep}} > 0$)? If you are ready to assume a particular value for δ, the answer is trivial: It follows from the sampling distribution of $d_{\text{rep}}$ given this δ. The true probability of replication is the (sampling) probability $\varphi_{+|\delta}$ (a function of δ and n) that a normal variable with mean δ and variance 2/n exceeds 0: $\varphi_{+|\delta} = \Phi(\delta\sqrt{n/2})$, where Φ is the standard normal distribution function. If you hypothesize that δ is 0, then $\varphi_{+|0} = 0.5$. Some other values, for different hypothesized δs, are $\varphi_{+|0.50} = 0.868$, $\varphi_{+|1.00} = 0.987$, and $\varphi_{+|2.00} \approx 1$. These values do not depend on $d_{\text{obs}}$: It would not matter whether $d_{\text{obs}} = 0.30$ or $d_{\text{obs}} = 1.30$. Of course, by symmetry, $\varphi_{+|-\delta} = 1 - \varphi_{+|\delta}$.
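The values of $\varphi_{+|\delta}$ quoted above can be checked with a few lines of Python. This sketch assumes the same model, in which $d_{\text{rep}}$ is distributed Normal(δ, 2/n) with n = 10 per group, and evaluates Φ through the standard library's error function. The function name `true_replication_prob` is a hypothetical label chosen for illustration.

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def true_replication_prob(delta: float, n: int) -> float:
    """varphi_{+|delta}: P(d_rep > 0) when d_rep ~ Normal(delta, 2/n)."""
    return norm_cdf(delta * sqrt(n / 2.0))

n = 10  # subjects per group, as in the example
for delta in (0.0, 0.5, 1.0, 2.0):
    print(f"delta = {delta:.2f}: {true_replication_prob(delta, n):.3f}")
```

After rounding, the printed values agree with those in the text (0.500, 0.868, 0.987, and 1.000). None of them involves $d_{\text{obs}}$.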
What was novel about Killeen's (2005) statistic $p_{\text{rep}}$ was his attempt to move away from assuming knowledge of parameter values, and from the "true replication probabilities" $\varphi_{+|\delta}$ that can be calculated when those values are known. The Bayesian derivation of $p_{\text{rep}}$ involves no knowledge about δ other than the effect size measured in the first experiment, $d_{\text{obs}}$. This is made explicit by assuming an uninformative (uniform) prior for δ before observation. The resulting posterior distribution for δ is normal, centered on $d_{\text{obs}}$, with variance 2/n. To illustrate the nature and purpose of $p_{\text{rep}}$, consider the steps one must follow to simulate its value, starting with a known first observation:
Repeat the two following steps many times:
(1) generate a value δ from a Normal($d_{\text{obs}}$, 2/n) distribution;
(2) given this δ value, generate a value $d_{\text{rep}}$ from a Normal(δ, 2/n) distribution;
and then compute the proportion of $d_{\text{rep}}$ values having the same sign as $d_{\text{obs}}$. Each particular value of $d_{\text{rep}}$ is the realization of a particular experiment with true effect size δ. It corresponds to a "true probability of replication" $\varphi_{+|\delta}$ (if $d_{\text{obs}} > 0$) or $1 - \varphi_{+|\delta}$ (if $d_{\text{obs}} < 0$). …
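The two-step procedure above can be sketched in Python under the same assumptions: a normal model, known variance, a uniform prior on δ, and n subjects per group. The two variance components add, so the predictive distribution of $d_{\text{rep}}$ given $d_{\text{obs}}$ is Normal($d_{\text{obs}}$, 4/n). The simulated proportion should therefore approach $\Phi(|d_{\text{obs}}|\sqrt{n/4})$, and the sketch computes both quantities as a check. The function names, the seed, and the number of repetitions are arbitrary illustrative choices.

```python
import random
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def simulate_p_rep(d_obs: float, n: int, reps: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo version of the two-step procedure (hypothetical helper)."""
    rng = random.Random(seed)
    sd = sqrt(2.0 / n)  # sampling SD of d; also the posterior SD of delta under a uniform prior
    same_sign = 0
    for _ in range(reps):
        delta = rng.gauss(d_obs, sd)  # step (1): draw delta from its posterior
        d_rep = rng.gauss(delta, sd)  # step (2): draw a replication result given delta
        if (d_rep > 0) == (d_obs > 0):
            same_sign += 1
    return same_sign / reps

def p_rep_closed_form(d_obs: float, n: int) -> float:
    """Analytic value: marginally, d_rep ~ Normal(d_obs, 4/n)."""
    return norm_cdf(abs(d_obs) * sqrt(n / 4.0))

print(simulate_p_rep(0.30, 10), p_rep_closed_form(0.30, 10))
```

For $d_{\text{obs}} = 0.30$ and n = 10, both numbers come out close to 0.68. That is lower than $\varphi_{+|0.30} \approx 0.749$, the value obtained by treating the observed effect as if it were the true δ, because the procedure carries the uncertainty about δ into the prediction.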