B. S. Everitt Institute of Psychiatry, University of London
This chapter is primarily concerned with methods for the analysis of data arising in the form of counts or frequencies. Such categorical data are particularly common in the social and behavioral sciences. Table 11.1, for example, shows the results of recording hair color for a number of individuals, and Table 11.2 the results of recording eye color for the same sample of people. Of far more interest, of course, is Table 11.3, which gives the cross-classification of hair and eye color for these individuals. Table 11.3 is a simple example of a contingency table.
The numbers appearing in Tables 11.1, 11.2, and 11.3 are counts of individuals falling into particular categories of the categorical variable(s) forming the table. These numbers might be transformed into proportions or percentages, but it is important to note that, in whatever form they are presented, the data were originally frequencies or counts rather than continuous measurements. Of course, continuous data are often put into discrete form by the use of intervals on a continuous scale. Age, for example, is a continuous variable, but if people are classified into different age groups, the intervals corresponding to these groups can be treated as if they were discrete units.
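As a minimal sketch of how such tables arise from raw records, the following Python fragment counts individuals into one-way tables, cross-classifies them into a two-way contingency table, and groups a continuous age into intervals. The records, the category labels, and the age boundaries are hypothetical and chosen only for illustration; they are not the data of Tables 11.1 to 11.3.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical records: (hair color, eye color, age in years).
# These are illustrative values, NOT the data behind Tables 11.1-11.3.
people = [
    ("fair", "blue", 23),
    ("dark", "brown", 41),
    ("fair", "brown", 35),
    ("dark", "brown", 67),
    ("fair", "blue", 18),
    ("red", "blue", 52),
]

# One-way frequency tables: one count per category of a single variable.
hair_counts = Counter(hair for hair, _, _ in people)
eye_counts = Counter(eye for _, eye, _ in people)

# Two-way contingency table: one count per (hair, eye) combination.
cross_counts = Counter((hair, eye) for hair, eye, _ in people)

# Assumed age boundaries; an age equal to a boundary falls in the upper group.
AGE_CUTS = [30, 50]
AGE_LABELS = ["under 30", "30-49", "50 and over"]


def age_group(age):
    """Map a continuous age onto one of the discrete age groups."""
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]


age_counts = Counter(age_group(age) for _, _, age in people)

# Print the contingency table with hair color as rows and eye color as columns.
hair_levels = sorted(hair_counts)
eye_levels = sorted(eye_counts)
print("hair".ljust(8) + "".join(eye.rjust(8) for eye in eye_levels))
for hair in hair_levels:
    row = "".join(str(cross_counts[(hair, eye)]).rjust(8) for eye in eye_levels)
    print(hair.ljust(8) + row)
```

Summing each row of the cross-classification recovers the one-way hair color counts, and summing each column recovers the eye color counts, which is why the one-way tables are often called the margins of the contingency table.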
The questions we might wish to ask about categorical data are similar in many respects to those usually of concern for continuous data. For example, we may wish to investigate whether two categorical variables are related, or how a categorical response variable is related to a number of explanatory variables (which may or may not themselves be categorical). So why should the analysis of categorical data need separate consideration? Of course, the distributional as-