Parker, R.A., and Berman, N.G. (2003), "Sample Size: More Than Calculations," The American Statistician, 57, 166-170: Comment by Lenth and Reply
Lenth, Russell V., The American Statistician
Parker and Berman (2003) made the important point that it is usually much better to make a statement about the information offered by a planned study than simply to compute a sample size from somewhat arbitrary criteria. I agree with this view, but I take exception to some of the details concerning effect size.
Section 2, for example, lists four potential problems with a particular sample-size calculation, the last three of which make sense to me; but the first one reads:
(a) we assume that the effect size in this population will be similar to that previously reported, even though there is reason to suspect it might be smaller.
So what? The goal of the study is to estimate this effect size--so how can it be a problem to not already know it? What Parker and Berman are really saying is that the computed sample size may not be sufficient to detect the effect that actually exists. Apparently, the goal of the study is to obtain P < .05, rather than to detect an effect, if it exists, of size sufficient to be of clinical importance. I view sample-size determination as an attempt to equate statistical significance and practical significance, and that involves thinking specifically about what would constitute practical significance. The real problem with Parker and Berman's scenario is that this was not done.
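The distinction can be made concrete with a standard sample-size formula. The sketch below is mine, not the authors'; the clinically important difference and the standard deviation are hypothetical values chosen only for illustration. It sizes a two-sample comparison so that the smallest difference of clinical importance, stated in the actual measurement units, is what the study is powered to detect.

```python
# A minimal sketch (not from the article): choose the sample size so that a
# clinically important difference, stated in absolute measurement units, is
# detectable with the desired power. delta_clinical and sigma are hypothetical.
from math import ceil
from scipy.stats import norm

alpha = 0.05            # two-sided significance level
power = 0.80            # desired power
delta_clinical = 50.0   # smallest clinically important difference (measurement units)
sigma = 120.0           # assumed standard deviation of the response (same units)

z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Normal-approximation sample size per group for a two-sample comparison of means
n_per_group = ceil(2 * ((z_alpha + z_beta) * sigma / delta_clinical) ** 2)
print(n_per_group)
```

Stating the target difference in measurement units forces exactly the discussion of practical significance that, in my view, was missing from the scenario.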
In the discussion of their Figure 1, Parker and Berman further described "information" as a standardized quantity, not unlike what Cohen (1988) recommended. I warned against this in Lenth (2001). Imagine a patient who wonders whether the treatment is worth undergoing in view of the unpleasant side effects, and who is told: "I expect that it will improve your symptoms by .75 standard deviations." Somehow, I don't think the patient's question has been adequately answered. She needs the answer in the same units as the measurements, such as about how much it will increase her CD4 cell count.
My colleague, Steve Hillis, has convinced me that, in some cases, standard-deviation units are reasonable clinical descriptors of effect size--in cases like blood chemistry or IQ, where norms are based on a central range of the distribution of normal subjects. Accordingly, I would concede that .75 times the between-subject standard deviation may be a sensible way to measure information in the context of that discussion. However, when such norms exist, it is easy to translate the effect size into absolute units--units that will be easier to discuss in a meaningful way with physicians.
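When such norms do exist, the translation is a one-line computation. The sketch below is illustrative only; the normative standard deviation is a hypothetical placeholder, not a published norm.

```python
# Illustrative only: re-expressing a standardized effect size in absolute units,
# assuming a normative between-subject standard deviation is available.
d = 0.75          # standardized (Cohen-type) effect size
sd_norm = 180.0   # hypothetical normative between-subject SD, in measurement units

delta_absolute = d * sd_norm   # effect size in the units actually measured
print(f"Expected improvement: about {delta_absolute:.0f} units")
```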
In Section 4 of the article, however, "information" is now measured as a multiple of the within-subject standard deviation. This is inconsistent with the earlier discussion, and it clearly no longer has to do with clinical norms. Moreover, it has a lot more to do with the accuracy of the measurements. By the authors' criterion, the planned study with eight patients will yield the same information regardless of whether the data come from detailed laboratory analyses of the blood, simple urinalyses, or just looking at the patient and guessing. Again, the capability of a study must be measured in meaningful clinical units. Often that entails getting some prior estimate of the error variance; and although that is indeed problematic, there is no honest way around it.
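A small numerical sketch (again my own, with hypothetical standard deviations) illustrates the point: with the same eight patients, the absolute precision of the estimate, say the half-width of a confidence interval, scales directly with the within-subject standard deviation, so measurement methods of very different quality do not yield the same amount of clinically interpretable information.

```python
# With n fixed, the CI half-width in absolute units depends on the within-subject
# (error) SD, so "information" measured in SD units hides real differences in
# measurement quality. The SD values below are hypothetical.
from math import sqrt
from scipy.stats import t

n = 8
conf = 0.95
t_crit = t.ppf(1 - (1 - conf) / 2, df=n - 1)

for method, sd_within in [("laboratory assay", 15.0), ("looking and guessing", 90.0)]:
    half_width = t_crit * sd_within / sqrt(n)   # CI half-width for a mean, absolute units
    print(f"{method}: +/- {half_width:.1f} measurement units")
```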
Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences (2nd ed.), Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Lenth, R. V. (2001), "Some Practical Guidelines for Effective Sample Size Determination," The American Statistician, 55, 187-193.
Although we are familiar with Lenth's work and found his articles on sample size very useful, we are puzzled by his comments on our article. He seems either to misunderstand what we have said or to be reinterpreting it. …