Planning Significant and Meaningful Research in Exercise Science: Estimating Sample Size
Thomas, Jerry R., Lochbaum, Marc R., Landers, Daniel M., He, Chunxiao, Research Quarterly for Exercise and Sport
Sensitivity in research involving treatment is defined as the ability to detect a real difference between two or more groups on some variable of interest (Lipsey, 1990). Specifically, the sensitivity of research experiments depends on six factors (Lipsey, 1990, pp. 14-15):
1. Effect size: the magnitude of the "real" effect to be detected;
2. Subject heterogeneity: individual differences among members of the relevant population on the dependent variable of interest;
3. Sample size: the size of the sample taken from the population to constitute the experimental groups;
4. Experimental error: procedural variation in the way members of the experimental groups are treated during the research;
5. Measurement: muted or inconsistent response of the measurement instrument to the outcome of interest;
6. Data analysis: the inherent power of the statistical data analysis technique employed to test the difference between experimental groups.
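Several of these factors act through the same quantity, the standardized effect size d: the mean difference between groups divided by the within-group standard deviation. The sketch below is a minimal illustration, not Lipsey's procedure, and it assumes equal group SDs and that subject heterogeneity, procedural variation, and measurement error add variance without shifting the group means. Under classical test theory, a dependent measure with reliability r has observed variance equal to true variance divided by r, so the observed d shrinks to d times the square root of r. Because the required sample size per group is roughly proportional to 1/d² (normal approximation), the ratio (d_reference / d_actual)² shows how many times larger the sample must be. All function names are hypothetical.

```python
from math import sqrt


def cohens_d(mean_diff: float, sd: float) -> float:
    """Standardized effect size: raw mean difference divided by the within-group SD."""
    return mean_diff / sd


def attenuated_d(true_d: float, reliability: float) -> float:
    """Classical test theory: unreliable measurement inflates the observed SD
    by a factor of 1/sqrt(reliability), so the observed d shrinks by sqrt(reliability).
    Assumes measurement error does not shift the group means."""
    return true_d * sqrt(reliability)


def sample_size_inflation(d_reference: float, d_actual: float) -> float:
    """How many times larger n must be when the effect shrinks from d_reference
    to d_actual, using the approximation n proportional to 1 / d**2."""
    return (d_reference / d_actual) ** 2


# Hypothetical example: the same 5-unit treatment effect measured in two populations.
d_homogeneous = cohens_d(5.0, 10.0)    # SD = 10
d_heterogeneous = cohens_d(5.0, 20.0)  # SD = 20, a more heterogeneous population
print(sample_size_inflation(d_homogeneous, d_heterogeneous))

# Hypothetical example: the same true effect measured with an instrument of reliability .60.
d_unreliable = attenuated_d(d_homogeneous, 0.60)
print(round(sample_size_inflation(d_homogeneous, d_unreliable), 2))
```

Doubling the within-group SD halves d and therefore roughly quadruples the required sample, while a reliability of .60 raises it by a factor of about 1.67.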
At the end of an experiment, the researcher is left with a set of data in which some or all of the above issues are represented, depending on the planning and control in the study.
To make sense of the data set, the researcher should establish the power of the study: the probability of correctly detecting a real difference between the treatment groups when one exists. The researcher should establish power during the planning of the study in order to estimate sample size; doing so, however, requires knowledge of the sensitivity factors discussed by Lipsey (1990). Of course, standard parametric and nonparametric statistical procedures are well known for detecting significant differences between treatment groups. A real difference is a reliable one: the null hypothesis is rejected consistently when the research is replicated. But, as Cohen (1990, p. 1308) has said about behavioral research, "the null hypothesis . . . is always false," meaning that, given enough subjects, even very small and trivial differences will be declared statistically significant. The focus of this paper is how a researcher goes about planning an experiment to detect differences between treatment groups that are not only real but also meaningful, where a meaningful difference is one that represents an important finding within the theory and the context of the research.
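Cohen's point can be made concrete with a little arithmetic. The sketch below is a minimal illustration under stated assumptions: two independent groups of equal size, a known common variance (so a z test applies), and a hypothetical, practically trivial true effect of d = 0.05. For n participants per group the expected z statistic is d times the square root of n/2, and the two-tailed p value attached to that expected statistic drops below .05 once n is large enough. Treating the expected statistic as if it were observed is a simplification; real samples scatter around it. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def p_at_expected_z(d: float, n_per_group: int) -> float:
    """Two-tailed p value for the z statistic expected when the true standardized
    difference is d and each of two independent groups has n_per_group members.
    The standard error of the mean difference is sigma * sqrt(2 / n), so the
    expected z is d * sqrt(n / 2)."""
    expected_z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(expected_z))


trivial_d = 0.05  # hypothetical effect, far too small to matter in practice
for n in (100, 1_000, 10_000):
    print(n, round(p_at_expected_z(trivial_d, n), 4))
```

With 100 participants per group the p value is around .72, but with 10,000 per group it falls well below .001, even though the effect itself has not changed.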
Cohen (1988) has discussed the need to determine power in research. The concept requires knowing (or estimating) three of the following four parameters: alpha, power, effect size, and sample size. If three are known (or can be estimated), the fourth can be calculated. Other authors (e.g., Kraemer & Thiemann, 1987; Lipsey, 1990) have provided straightforward procedures for using these parameters in planning experiments, particularly for estimating the number of participants needed to detect real differences (e.g., when the effect size is tied to a previous important outcome). Although these procedures are generally known and widely advocated (e.g., Cohen, 1988, 1990; Kraemer & Thiemann, 1987; Lipsey, 1990; Rosnow & Rosenthal, 1989), the authors of this study searched Volumes 65 and 66 (1994, 1995) of Research Quarterly for Exercise and Sport (RQES) and found that, across 48 data-based or empirical papers and 18 research notes, no authors reported having used these parameters a priori to estimate the minimum sample size needed.
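The "three known, fourth calculated" relationship can be sketched for the simplest case, a two-tailed comparison of two independent groups. This is a minimal sketch using the normal approximation rather than the exact noncentral t distribution, so it slightly overstates power for small samples; the function names are hypothetical. Given sample size, effect size, and alpha, it returns power; given sample size, alpha, and power, it returns the smallest detectable effect size.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def power_two_groups(n_per_group: int, d: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed test comparing two independent groups.
    Normal approximation; ignores the negligible chance of rejecting in the
    wrong direction."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(d * sqrt(n_per_group / 2) - z_alpha)


def min_detectable_d(n_per_group: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardized effect detectable with the given n, alpha, and power."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 / n_per_group)


# Hypothetical planning scenario: 64 participants per group, alpha = .05.
print(round(power_two_groups(64, 0.5), 3))  # power, given n, d, and alpha
print(round(min_detectable_d(64), 3))       # effect size, given n, alpha, and power
```

With 64 participants per group, a medium effect (d = .5) is detected with power of roughly .81, and the smallest effect detectable with power .80 is about d = .50. The two functions are inverses of each other: feeding the detectable d back into the power function returns the power that was requested.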
The question researchers most often ask about these parameters is, "How many participants are needed to detect a real difference?" The question should instead be, "How many participants are needed to detect a real and meaningful difference?" Answering it requires the experimenter to establish alpha, select a level of power, identify an appropriate effect size (within the theory and context of the planned research), and then consult a source such as Figure 1 or Figure 2 (effect size curves plotted against power and sample size for alpha = .05 and .01) to estimate the sample size needed for independent groups. …
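The lookup described above can also be approximated by formula. The sketch below is not a reproduction of Figure 1 or Figure 2; it assumes a two-tailed test of two independent groups and uses the normal approximation n = 2((z₁₋α/₂ + z_power) / d)² per group, where z denotes standard normal quantiles. Exact noncentral t calculations give slightly larger values (for example, about 64 per group rather than 63 for d = .5 at alpha = .05 and power = .80). The function name is hypothetical, and Cohen's conventional small, medium, and large effects (d = .2, .5, .8) are used only as examples; in practice the effect size should come from the theory and context of the research.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-tailed comparison
    of two independent groups: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    rounded up to the next whole participant."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Required n per group at power .80 for the two alpha levels the figures cover.
for alpha in (0.05, 0.01):
    row = {d: n_per_group(d, alpha) for d in (0.2, 0.5, 0.8)}
    print(f"alpha = {alpha}: {row}")
```

This prints `alpha = 0.05: {0.2: 393, 0.5: 63, 0.8: 25}` and `alpha = 0.01: {0.2: 584, 0.5: 94, 0.8: 37}`. Tightening alpha from .05 to .01 raises the required sample by roughly half at every effect size, while moving from a large to a small effect raises it more than fifteenfold.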