FREQUENCY DATA AND CHI-SQUARE
A new kind of data is considered in this chapter: frequency data. Each case is classified into one of several categories; the analysis looks at the frequency of cases in the different categories.
The Experimental-Control paradigm is sometimes used this way. In the test of polio vaccine discussed below, the data are frequencies of polio cases in the vaccine and placebo groups. Frequencies of successes and failures are sometimes used in psychology, as in the experiment on smoking prevention cited in the preface to this chapter, and in several experiments in the Exercises.
Category data are qualitative, not quantitative. Each subject's “score” is the category to which they belong. This category is not typically a measure of amount or magnitude. Category data are thus very different from the numerical magnitude data used in Anova-regression.
Here is the key to analysis of category data:
Look at the pattern of frequencies across different categories.
In the polio example, this pattern is the simplest possible: the relative frequencies of polio in the vaccine and placebo groups. Since Anova-regression is not properly applicable to frequency data, some new statistical technique is required.
This new statistical technique is chi-square. It is analogous to analysis of variance, but considerably simpler for common applications.
The conceptual framework of previous chapters transfers directly to chi-square analysis. This will be illustrated in the following polio experiment. The null hypothesis is that the vaccine has no effect, that is, that polio has the same frequency with the vaccine as with placebo. The chi-square test addresses this question: Is the difference between the observed frequencies and those predicted under the null hypothesis “large enough” to indicate an effective vaccine?
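The comparison of observed frequencies with those predicted under the null hypothesis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are hypothetical, not the polio trial data, and the function names expected_counts and chi_square are invented for this example. The sketch uses the standard Pearson form of the statistic, the sum over cells of (observed - expected) squared, divided by expected.

```python
def expected_counts(table):
    """Frequencies predicted under the null hypothesis of no group difference.

    Each expected cell count is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# HYPOTHETICAL counts for illustration only (not the actual polio data).
# Rows: vaccine group, placebo group. Columns: disease, no disease.
observed = [
    [10, 990],
    [30, 970],
]

print(expected_counts(observed))
print(round(chi_square(observed), 2))
```

With these made-up counts, the null hypothesis predicts 20 cases in each group, and the statistic measures how far the observed 10 and 30 depart from that prediction. Whether the departure is “large enough” is judged by comparing the statistic with a critical value of the chi-square distribution (3.84 at the .05 level for a two-by-two table, which has one degree of freedom).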