# Empirical Direction in Design and Analysis

By Norman H. Anderson

NOTES
18.1a
This discussion of effect size and importance is based on Anderson (1982, Chapter 6).
18.1b
The need for an empirical rather than a statistical framework is striking with the issue of effect size. Numerous writers have extolled the virtue of going beyond the significance test to report some measure of effect size. This theme seems totally persuasive, until one looks at the measures of effect size that have been proposed. They smother the empirical effect under an additional blanket of statistics.

A total of 40 measures of effect size were collected from the literature by Kirk (1996) and are listed in his Table 1. Notably absent from this list are the most important indexes: the mean, the one-variable regression coefficient, and their confidence intervals. This state of affairs is a symptom of the pervasive orientation that fixates on statistics to the neglect of empirics.

Questions of size and importance have no easy answer because they are basically extrastatistical issues. They need to be addressed in empirical terms, within the framework of each experiment. Focus on statistical indexes obscures the real issue. See further Importance Indexes and Self-Estimation Methodology in Chapter 6 of Anderson (1982) as well as Anderson and Zalinski (1991). Also of interest are Kruskal and Majors (1989) and Wright (1988).

18.1.1a
That small effects can cumulate across many instances to be important for outcome analysis has been emphasized by numerous writers, including Gilbert, Light, and Mosteller (1975), Yeaton and Sechrest (1981), and Abelson (1985). Prentice and Miller (1992) point up the importance of small effects in process analysis.
18.1.2a
The d index of Equation 3 is not a general measure of effect size because effect size is a substantive concept that must be understood in extrastatistical terms. The same d of .50 would mean different things in experiments on person cognition, verbal memory, animal learning, and even in two different experiments within any of these fields. Cohen (1988) covers over this substantive problem by classifying d values of .20, .50, and .80 as “small,” “medium,” or “large.” This arbitrary classification obscures the prime importance of interpreting the mean difference, in its unstandardized form, within its own empirical framework.
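
The point that one d value can correspond to quite different empirical effects can be illustrated with a minimal sketch. Equation 3 is not reproduced in these notes, so the code assumes the common form of d, the mean difference divided by the pooled standard deviation. The function name `cohens_d` and both data sets are hypothetical, invented only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (assumed form of d): mean difference / pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical data: words recalled in a verbal memory task.
recall_treatment = [12, 14, 16]
recall_control = [11, 13, 15]

# Hypothetical data: response times in milliseconds.
rt_treatment = [520, 540, 560]
rt_control = [510, 530, 550]

for label, a, b in [
    ("recall (words)", recall_treatment, recall_control),
    ("response time (ms)", rt_treatment, rt_control),
]:
    raw_difference = mean(a) - mean(b)
    print(f"{label}: raw difference = {raw_difference}, d = {cohens_d(a, b):.2f}")
```

Both made-up data sets yield d = .50, yet the unstandardized differences are 1 word and 10 ms. Whether either difference matters can only be judged within its own empirical framework.
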
18.1.2b
A much-cited non sequitur that published empirical reports generally have inadequate power was initiated by Cohen (1962) on the basis of his small-medium-large classification of effect sizes cited in the previous note. This claim was a non sequitur because, as Mulaik, Raju, and Harshman (1997) point out, Cohen did not calculate power from the data of any empirical report.

Instead, Cohen tabulated sample size for each report. This sample size was used to estimate power for the three cited hypothetical effect sizes of .20, .50, and .80 using Equation 3 (or formulas for comparable measures of effect size such as r). Cohen found that power averaged just under .50 for a “medium” effect size. He arbitrarily assumed that a “medium” effect size was the norm for empirical studies and so concluded that published experiments generally lack power.
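
The kind of calculation described above, power computed from a sample size and an assumed rather than observed effect size, can be sketched as follows. This is not Cohen's procedure: it substitutes a normal approximation to a two-sided, two-group test for the exact calculation, and the function name `approx_power`, the sample size of 30 per group, and the .05 alpha level are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test for a hypothesized d.

    Normal approximation (an assumption of this sketch, not an exact t-based result).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the hypothesized d were true.
    shift = d * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


n_per_group = 30  # hypothetical sample size, not taken from any report
for d in (0.20, 0.50, 0.80):
    print(f"hypothesized d = {d:.2f}: approximate power = {approx_power(d, n_per_group):.2f}")
```

Nothing in this calculation uses observed data. The resulting power depends entirely on which d is assumed, which is why the conclusion about published experiments rests on the assumption that a “medium” effect is the norm.
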


