Nonnormality is likely to widen confidence intervals and decrease power. The robustness of Anova applies mainly to the false alarm parameter, α. Power, in contrast, can be seriously affected by nonnormality.
The real trouble is extreme scores, which contribute disproportionately to error variance. Nonnormality is a problem mainly because it is usually accompanied by such scores: distributions with longer or heavier tails than the normal distribution are common in practice, and the scores in those tails inflate the error variance out of proportion to their number.
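The arithmetic behind this point can be sketched with a small example. The scores below are hypothetical, made up only for illustration, and the function name `extreme_score_share` is not from the text. The sketch computes what fraction of the total sum of squared deviations comes from the single score farthest from the mean.

```python
import statistics

def extreme_score_share(scores):
    """Fraction of the sum of squared deviations that is due to the
    one score lying farthest from the mean."""
    mean = statistics.fmean(scores)
    squared_devs = [(x - mean) ** 2 for x in scores]
    return max(squared_devs) / sum(squared_devs)

# Hypothetical data: nine ordinary scores plus one extreme score.
ordinary = [4, 5, 5, 6, 6, 6, 7, 7, 8]
with_extreme = ordinary + [20]

share = extreme_score_share(with_extreme)
var_without = statistics.variance(ordinary)      # sample variance, n - 1 divisor
var_with = statistics.variance(with_extreme)

print(f"share of sum of squares from the extreme score: {share:.2f}")
print(f"variance without / with the extreme score: {var_without:.2f} / {var_with:.2f}")
```

In this made-up sample, one score out of ten supplies most of the sum of squares, so the variance estimate is governed largely by that one score.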
Avoiding extreme scores depends first on empirics: experimental task, experimental procedure, and response measure. Better instructions, for example, will reduce the likelihood that an occasional subject will misunderstand the task and yield an extreme score. Similar considerations apply to every aspect of experimental procedure. Choice of response measure can have marked effects on the shape of the data distribution—which should be a primary concern in planning any investigation.
After the data are in, the shape of their distribution can still be changed statistically. Trimming seems the most useful statistical technique for dealing with extreme scores. It is one of numerous “robust” techniques, distinguished from the others by a simple formula for the variance that makes it easy to use.
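A minimal sketch of trimming follows. It assumes that the variance formula referred to here is the usual one based on the Winsorized sample, in which each trimmed score is replaced by the nearest retained score; the text above does not spell the formula out, so this is an assumption. The function name, the 20% default, and the use of the realized trimming fraction g/n in the standard error are illustrative choices, not the book's.

```python
import math
import statistics

def trimmed_mean_and_se(scores, proportion=0.2):
    """Trimmed mean with a standard error based on the Winsorized variance.

    Assumed formula (not taken from the text):
        se = s_w / ((1 - 2g/n) * sqrt(n)),  df = n - 2g - 1
    where g scores are trimmed from each tail and s_w is the standard
    deviation of the Winsorized sample.
    """
    ordered = sorted(scores)
    n = len(ordered)
    g = math.floor(proportion * n)           # scores trimmed from each tail
    if n - 2 * g < 2:
        raise ValueError("too few scores left after trimming")
    kept = ordered[g:n - g]
    trimmed_mean = statistics.fmean(kept)
    # Winsorize: each trimmed score is replaced by the nearest retained score.
    winsorized = [kept[0]] * g + kept + [kept[-1]] * g
    s_w = statistics.stdev(winsorized)       # n - 1 divisor
    se = s_w / ((1 - 2 * g / n) * math.sqrt(n))
    df = n - 2 * g - 1
    return trimmed_mean, se, df

# Hypothetical data with one extreme score.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
t_mean, se, df = trimmed_mean_and_se(scores)
print(t_mean, se, df)
```

A confidence interval would then take the form trimmed mean ± t* × se, with t* read from a t table at the returned df.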
Three alternatives to trimming are also considered: response transformation, outlier methods, and distribution-free rank tests. Time scores, for example, often have long right tails; transforming time to speed tends to normalize the data and increase power. Statistical tests for outliers are nonrobust, useful only in very special situations. Distribution-free tests of the ranked data appear inferior to Anova of these same ranks.
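The time-to-speed transformation can be illustrated with a short sketch. The times below are hypothetical, chosen so that the corresponding speeds are evenly spaced and the effect is easy to see; real data will be less tidy. Skewness is measured here by the standardized third moment, which is one common choice and not necessarily the one used elsewhere in the book.

```python
import statistics

def skewness(values):
    """Standardized third central moment (population form)."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical response times (seconds) with a long right tail.
times = [1.0, 1.2, 1.5, 2.0, 3.0, 6.0]
speeds = [1.0 / t for t in times]            # speed = reciprocal of time

time_skew = skewness(times)
speed_skew = skewness(speeds)
print(f"skewness of times:  {time_skew:.2f}")
print(f"skewness of speeds: {speed_skew:.2f}")
```

The reciprocal pulls in the long right tail of the time scores, so the slowest response no longer dominates the spread.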
Unequal variance can have adverse effects on overall Anova, especially if coupled with unequal n. Available evidence suggests that extensions of Anova to handle unequal variance with more than two groups are sensitive to nonnormality and so not very useful. Even when applicable, moreover, overall Anova does little to localize the effects and explicate the data pattern.
Accordingly, an alternative to overall Anova with unequal variance is advocated in this book: Focus on two-mean comparisons at the outset. To this end, formulas are given for confidence intervals that allow unequal variance and unequal n as well.
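As a rough sketch of such a two-mean comparison, the code below uses the standard unpooled standard error with the Welch-Satterthwaite approximation for degrees of freedom. This is an assumption: the book's own formulas are not reproduced in this summary and may differ in detail. The function names and data are hypothetical, and the critical t value is left for the reader to supply from a t table at the computed df.

```python
import math
import statistics

def two_mean_comparison(group_a, group_b):
    """Mean difference, unpooled standard error, and approximate df
    for two groups that may differ in variance and in n."""
    n_a, n_b = len(group_a), len(group_b)
    va = statistics.variance(group_a) / n_a  # squared SE of mean A
    vb = statistics.variance(group_b) / n_b  # squared SE of mean B
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    se = math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return diff, se, df

def confidence_interval(diff, se, t_crit):
    """Interval diff ± t_crit * se; t_crit is supplied by the caller."""
    half_width = t_crit * se
    return diff - half_width, diff + half_width

# Hypothetical groups with unequal variance and unequal n.
group_a = [10, 12, 14, 16, 18]
group_b = [1, 2, 3]
diff, se, df = two_mean_comparison(group_a, group_b)
print(diff, se, df)
```

Because the df is generally fractional, the critical value is usually read at the next lower whole df, which errs on the conservative side.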