The Random Effects P^sub Rep^ Continues to Mispredict the Probability of Replication
Iverson, Geoffrey J., Lee, Michael D., Wagenmakers, Eric-Jan, Psychonomic Bulletin & Review
In their reply, Lecoutre and Killeen (2010) argue for a random effects version of p^sub rep^, in which the observed effect from one experiment is used to predict the probability that an effect from a different but related experiment will have the same sign. They present a figure giving the impression that this version of p^sub rep^ accurately predicts the probability of replication. We show that their results are incorrect and conceptually limited, even when corrected. We then present a meaningful evaluation of the random effects p^sub rep^ as a predictor and find that, as with the fixed effects p^sub rep^, it performs very poorly.
This reply addresses the two issues raised by Lecoutre and Killeen (2010; hereafter, LK). The first is their claim that we conflated two probabilities. The second is their claim that p^sub rep^ is an accurate predictor.
The first issue is easy to address. LK (2010) assert that Iverson, Lee, and Wagenmakers (2009) conflated two probabilities: the probability of coincidence and Killeen's (2005) probability of replication. On the basis of this supposed conflation, LK argue that "ILW's conclusions are irrelevant for Killeen's (2005) statistic" (p. 269). The fact of the matter is otherwise. We did not confuse these two probabilities. In Iverson, Lee, and Wagenmakers (2009), and in all of our earlier commentaries (Iverson, Lee, Zhang, & Wagenmakers, 2009; Iverson, Wagenmakers, & Lee, in press), we used exactly the fixed effects p^sub rep^ definition that appears in the third column of Table 1 in LK. We most certainly did not confuse the statistic p^sub rep^ with the parameter p^sub coinc^ (for probability of coincidence), and we invite readers to verify this for themselves.
The second claim, regarding the accuracy of p^sub rep^, is a more important source of disagreement. In their reply, LK (2010) stress a p^sub rep^ that is conceptually different from the fixed effects version and that, they claim, returns accurate predictions for both simulated and real-world data (LK, 2010, p. 266). Both versions of p^sub rep^ use a known effect size from an experiment. In the fixed effects formulation, p^sup F^^sub rep^, the goal is to predict the probability that a replication of the same experiment would yield an effect size of the same sign as the original. In the random effects version, p^sup R^^sub rep^, the goal is to use an effect size from one experiment to predict the probability of getting an effect of the same sign from a different experiment, albeit one coming from the same literature. This new formulation seems to us a strange goal for empirical science. Does it make sense to think that, having observed people preferring oval to square faces, we want to predict whether they will prefer natural to morphed faces?
But whatever the conceptual challenges, it is possible to continue analyzing p^sup R^^sub rep^ as a statistic. In more or less technical terms, our previous commentaries showed that p^sup F^^sub rep^ made poor predictions about the true replication probability. This reply extends those analyses to evaluate p^sup R^^sub rep^.
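To make the distinction between the two statistics concrete, the following Python sketch computes both from an observed effect size. It is illustrative only and rests on assumptions not stated in this reply: it uses the normal-approximation forms commonly attributed to Killeen (2005), in which the fixed effects statistic is Phi(d / (sqrt(2) * sigma_d)) and the random effects statistic adds a realization variance sigma_delta^2 to the sampling variance. The function names and the numerical inputs are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_rep_fixed(d, sigma_d):
    """Fixed effects sketch (assumed form): Phi(d / (sqrt(2) * sigma_d)).

    d       -- observed effect size from the original experiment
    sigma_d -- standard error of that effect size
    """
    return normal_cdf(d / (math.sqrt(2.0) * sigma_d))


def p_rep_random(d, sigma_d, sigma_delta):
    """Random effects sketch (assumed form).

    sigma_delta is the assumed standard deviation of true effects across
    related experiments; its square is added to the sampling variance.
    """
    return normal_cdf(d / math.sqrt(2.0 * (sigma_d ** 2 + sigma_delta ** 2)))


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    d, sigma_d, sigma_delta = 0.5, 0.25, 0.3
    print("fixed effects: ", p_rep_fixed(d, sigma_d))
    print("random effects:", p_rep_random(d, sigma_d, sigma_delta))
```

Under these assumed forms, the random effects value is always closer to .5 than the fixed effects value for the same observed effect, because the added variance widens the predictive distribution. Neither function says anything about how well the statistic tracks the true replication probability, which is the question at issue here.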
The Meaning of LK's (2010) Figure 5
The flowchart simulation presented by LK (2010), culminating in their Figure 5, gives the illusion of successful prediction under uncertainty. The abscissa is p^sup R^^sub rep^. The ordinate is a different random effects formulation of p^sub rep^, for which we derive an analytic expression (see Note 1), and which we denote p^sup O^^sub rep^. LK use numerical simulation to evaluate this ordinate.
The relationship between the functions p^sup R^^sub rep^ and p^sup O^^sub rep^, for the same set of total sample sizes N as that considered by LK (2010), is shown in our Figure 1A. Each line corresponds to a different sample size, and, by choosing different effect sizes, the whole curve relating the two p^sub rep^ versions can be traced out. We were surprised that these patterns did not seem to agree with Figure 5 in LK, and so we used their flowchart to calculate the results numerically. …