Sensitivity in research involving treatment is defined as the ability to detect a real difference between two or more groups on some variable of interest (Lipsey, 1990). Specifically, the sensitivity of research experiments depends on six factors (Lipsey, 1990, pp. 14-15):
1. Effect size: the magnitude of the "real" effect to be detected;
2. Subject heterogeneity: individual differences among members of the relevant population on the dependent variable of interest;
3. Sample size: the size of the sample taken from the population to constitute the experimental groups;
4. Experimental error: procedural variation in the way members of the experimental groups are treated during the research;
5. Measurement: muted or inconsistent response of the measurement instrument to the outcome of interest;
6. Data analysis: the inherent power of the statistical data analysis technique employed to test the difference between experimental groups.
At the end of an experiment, the researcher is left with a set of data in which some or all of the above issues are represented, depending on the planning and control in the study.
To make sense of the data set, the researcher should establish the power of the study - the chances of making a correct decision about whether a real difference exists between the treatment groups. Power should be established during the planning of the study in order to estimate the needed sample size; doing so, however, requires knowledge of the sensitivity factors discussed by Lipsey (1990). Of course, standard parametric and nonparametric statistical procedures are well known for detecting significant differences between treatment groups. A real difference is a reliable one: the null hypothesis is rejected consistently in replications of the research. But, as Cohen (1990, p. 1308) has said about behavioral research, "the null hypothesis . . . is always false," meaning that, given enough subjects, very small and trivial differences are declared statistically significant. The focus of this paper is how a researcher plans an experiment to detect not only real but also meaningful differences between treatment groups, where meaningful denotes a finding that is important within the theory and context of the research.
Cohen (1988) has discussed the need to determine power in research. The concept involves four parameters - alpha, power, effect size, and sample size; if any three are known (or can be estimated), the fourth can be calculated. Other authors (e.g., Kraemer & Thiemann, 1987; Lipsey, 1990) have provided straightforward procedures for using these parameters in planning experiments, particularly for estimating the number of participants needed to detect real differences (e.g., where the effect size is tied to a previous important outcome). While the procedures are generally known and widely advocated (e.g., Cohen, 1988, 1990; Kraemer & Thiemann, 1987; Lipsey, 1990; Rosnow & Rosenthal, 1989), the authors of this study searched Volumes 65 and 66 of Research Quarterly for Exercise and Sport (RQES) (1994, 1995) and found that, of 48 data-based or empirical papers and 18 research notes, none reported having used these parameters a priori to estimate the minimum sample size needed.
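To make this three-of-four relationship concrete, the following is a minimal Python sketch using the statsmodels library (our illustration, not a procedure drawn from the sources cited; all values are hypothetical). Whichever parameter is left unspecified is solved for from the other three:

```python
# Illustrative sketch (hypothetical values): solving for the fourth
# power-analysis parameter once the other three are fixed, using
# statsmodels' power routines for an independent-groups t test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Given effect size, alpha, and power, solve for sample size per group.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"n per group: {n_per_group:.1f}")  # approximately 64

# Given sample size, effect size, and alpha, solve for power instead.
achieved_power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)
print(f"power with n = 30 per group: {achieved_power:.2f}")
```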
The question researchers ask most about these parameters is, "How many participants are needed to detect a real difference?" However, the question should be, "How many participants are needed to detect a real and meaningful difference?" Answering it requires the experimenter to establish alpha, select a level of power, identify an appropriate effect size (within the theory and context of the planned research), and then consult a source such as Figure 1 or Figure 2 (effect size curves plotted against power and sample size, when alpha is .05 and .01) to estimate the needed sample size for independent groups. Traditionally, alpha has been set at .05 or .01, although this should not be an arbitrary decision; as Rosnow and Rosenthal (1989, p. 1277) noted, "surely God loves the .06 nearly as much as the .05." Cohen (1988) distinguished two types of error in statistical tests: Type I - declaring statistical significance when the null hypothesis is true (probability established by alpha); and Type II - failing to reject the null hypothesis when it is false (probability established by beta). Cohen also indicated that in most behavioral and applied research a 4:1 ratio between beta and alpha is appropriate; thus, if alpha is set at .05, beta would be .2 (.05 x 4). Because power is the chance of making a correct decision, power equals 1.0 - beta (1.0 - .2), or .8.
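The arithmetic behind such figures can also be sketched directly. The function below uses the common normal-approximation formula for a two-tailed, two-independent-groups comparison, n per group = 2((z_{1-alpha/2} + z_{1-beta})/ES)^2; this is an illustration on our part, and exact t-based tables give slightly larger values:

```python
# Illustrative sketch: normal-approximation sample-size formula for
# two independent groups, two-tailed test. A z-based approximation,
# slightly smaller than exact t-based values.
from scipy.stats import norm

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """n per group = 2 * ((z_{1-alpha/2} + z_{1-beta}) / ES)^2."""
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 for alpha = .05
    z_beta = norm.ppf(power)           # 0.84 for power = .80
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# With alpha = .05, power = .80, and a medium effect (ES = 0.5):
print(round(n_per_group(0.5)))  # about 63 per group
```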
If the previous estimates (or other reasonable estimates) are selected for alpha and power, then obtaining an estimate of needed sample size for an experiment requires the researcher to estimate an effect size that has meaning for the planned research. But what is effect size? Effect size (ES) represents the influence of the treatment or grouping variable on the dependent variable, expressed in standard deviation units, and can be estimated as:
ES = (M_E - M_C) / S_C    (1)

where M_E = mean of the experimental group, M_C = mean of the control group, and S_C = standard deviation of the control group (if there is no clear control condition, then Hedges & Olkin, 1985, suggest the use of a pooled standard deviation, s_p).
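As a concrete illustration of Equation 1 (all summary statistics hypothetical), the calculation, including the pooled standard deviation alternative, might be sketched as:

```python
# Illustrative sketch of Equation 1 with hypothetical summary statistics.
import math

def effect_size(mean_e: float, mean_c: float, sd_c: float) -> float:
    """ES = (M_E - M_C) / S_C, in control-group standard deviation units."""
    return (mean_e - mean_c) / sd_c

def pooled_sd(sd_e: float, n_e: int, sd_c: float, n_c: int) -> float:
    """Pooled SD (per Hedges & Olkin, 1985) when no clear control exists."""
    return math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / (n_e + n_c - 2))

# Hypothetical values: experimental mean 54, control mean 50, control SD 8.
es = effect_size(mean_e=54.0, mean_c=50.0, sd_c=8.0)
print(f"ES = {es:.2f}")  # 0.50: the groups differ by half a standard deviation
```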
Cohen (1988) developed ES as a way to standardize the size of treatment effects, and Glass (1977) adopted it as the standard metric for combining the results of a series of studies in meta-analysis. In fact, Cohen (1988) suggested that ES for the social and behavioral sciences could be interpreted as 0.8 representing a …