Reanalyses of Group Telepathy Data with a Focus on Variability
Jan Dalkvist, William Montgomery, Henry Montgomery, and Joakim Westerlund. The Journal of Parapsychology
The vast majority of ESP experiments have been performed and analyzed at the individual level. That is, data have been collected for each participant individually, and the unit of analysis has been the participant, even though the results have generally been summarized at the group level.
One reason why group experiments on ESP are relatively rare is probably the long-standing and widespread belief that group testing is inferior to individual testing in producing positive results (see, e.g., Rhine, 1947/1971). In line with this negative evaluation, several later studies have failed to produce any positive results (e.g., Haight, Weiner, & Morrison, 1978; Milton & Wiseman, 1999). However, positive results have also been reported (Barker, Messer, & Drucker, 1975; Carpenter, 1988; Dalkvist & Westerlund, 1998), although attempts to replicate some of them have failed (Carpenter, 1991; Westerlund & Dalkvist, 2004).
In any case, it would be premature to abandon group testing at this point. One reason is that too few well-controlled group studies, using different designs and types of ESP tasks, have so far been conducted to permit any definite assessment of the merits and drawbacks of group testing. For example, most studies have been concerned with clairvoyance or precognition rather than telepathy. Apart from the above-mentioned studies by two of us (JD and JW), we know of only one group telepathy study (Auriol et al., 2004). This long-term experimental series failed to demonstrate any deviation from chance expectation in overall performance level, but it did find variations in performance among experiments that deviated significantly from chance expectation.
Another reason for continuing to use group testing is that it is much less time-consuming than individual testing. Thus, as long as we cannot be certain that group testing, unlike individual testing, will fail to uncover any ESP phenomena, group testing should be used for purely practical reasons. A further, less obvious reason for not abandoning group testing in ESP research is that ESP may depend critically on social factors, such as the psychological atmosphere in a group of senders or receivers in a telepathy experiment.
Unfortunately, group experiments raise a serious statistical problem, known as "stacking," which has probably deterred many researchers from running them. The problem is this: in group testing, participants' responses may be mutually dependent (e.g., because of a common response bias, such as a tendency among respondents to give one type of response at the beginning of a run and another type at the end of it), so the statistical assumption of independent measures may be violated. In general, positive correlations among participants' responses caused by stacking reduce the effective "n," so that any conventional statistical test, which assumes the nominal n, tends to inflate the results (for example, when all participants invariably respond in exactly the same way, the effective n is reduced to one).
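The effective-n argument can be made concrete under a simplifying assumption that is ours, not the authors': suppose the scores X_1, ..., X_n of the n participants in a group each have variance σ², and that every pair of scores has the same positive correlation ρ. The variance of the group total then exceeds the value a conventional test assumes, and the effective n (the number of independent scores that would give the same variance of the mean) follows directly:

```latex
\operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big) = n\sigma^2 + n(n-1)\rho\,\sigma^2 = n\sigma^2\,[1 + (n-1)\rho],
\qquad
n_{\mathrm{eff}} = \frac{n}{1 + (n-1)\rho},
\qquad
z_{\mathrm{conv}} = z_{\mathrm{true}}\,\sqrt{1 + (n-1)\rho}.
```

At ρ = 0 this gives n_eff = n, and at ρ = 1 it gives n_eff = 1, the extreme case in which all participants respond identically. A conventional test that uses the nominal n therefore overstates its z statistic by the factor √(1 + (n − 1)ρ).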
There are several ways of overcoming the stacking problem, however. One is to correct the data statistically for the stacking effect (Thouless & Brier, 1970), although, depending on the specific technique used, this method is generally either extremely laborious or uncertain.
Another solution is to make the whole group of participants who have received the same experimental treatment, rather than the individual participant, the unit of analysis. The rationale is, of course, that correlations among responses within a group then become irrelevant, and that no stacking effect can occur between different groups because there is no communication among them. There are drawbacks to this method, however. …
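The group-as-unit logic can be illustrated with a small Monte Carlo sketch. Everything in it is a hypothetical assumption rather than the authors' procedure: binary targets and responses with a chance hit rate of 0.5, no ESP at all (so every rejection is a false positive), eight independent groups that each share one target sequence, and stacking mimicked by letting each participant copy a shared group response with probability `rho`. The helper names (`group_hits`, `rejection_rates`, and so on) are illustrative only. The sketch compares a conventional binomial z-test pooled over all responses with a one-sample t-test on the eight group hit proportions, and reports how often each rejects the null hypothesis at the nominal .05 level.

```python
import math
import random
import statistics

# Hypothetical setup: 8 independent groups, so the group-level t-test has 7 df.
N_GROUPS = 8
# Two-sided .05 critical value of Student's t with 7 degrees of freedom.
T_CRIT_DF7 = 2.365


def group_hits(n_participants, n_trials, rho, rng):
    """Total hits for one group whose members all face the same target sequence.

    Assumed null model (no ESP): targets and responses are fair binary symbols.
    Stacking is mimicked by each participant copying a shared group response
    with probability rho and otherwise responding independently at random.
    """
    hits = 0
    for _ in range(n_trials):
        target = rng.getrandbits(1)
        shared = rng.getrandbits(1)  # common response bias on this trial
        for _ in range(n_participants):
            response = shared if rng.random() < rho else rng.getrandbits(1)
            hits += response == target
    return hits


def pooled_test_rejects(all_hits, n_total):
    """Conventional binomial z-test that treats every response as independent."""
    z = (sum(all_hits) - n_total / 2) / math.sqrt(n_total / 4)
    return abs(z) > 1.96


def group_test_rejects(proportions):
    """One-sample t-test of the group hit proportions against chance (0.5)."""
    sd = statistics.stdev(proportions)
    if sd == 0:  # all groups identical: no evidence either way
        return False
    t = (statistics.mean(proportions) - 0.5) / (sd / math.sqrt(len(proportions)))
    return abs(t) > T_CRIT_DF7


def rejection_rates(rho, n_participants=10, n_trials=40, n_sims=500, seed=7):
    """Return (pooled, group-level) false-positive rates under the null model."""
    rng = random.Random(seed)
    per_group = n_participants * n_trials
    pooled = grouped = 0
    for _ in range(n_sims):
        hits = [group_hits(n_participants, n_trials, rho, rng) for _ in range(N_GROUPS)]
        pooled += pooled_test_rejects(hits, per_group * N_GROUPS)
        grouped += group_test_rejects([h / per_group for h in hits])
    return pooled / n_sims, grouped / n_sims


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0):
        pooled, grouped = rejection_rates(rho)
        print(f"rho={rho:.1f}  pooled z-test: {pooled:.3f}  group t-test: {grouped:.3f}")
```

With `rho = 0` both tests should reject at roughly the nominal .05 rate. As `rho` rises, the pooled test rejects far more often than .05, whereas the group-level test, whose units are independent by construction, stays close to it. Note that the group-level test in this sketch has only eight units.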