Analysis of Categorical Data
Stephen S. Brier
In this chapter we consider the analysis of data that are discrete, or categorical, in nature, as opposed to measurements made on a continuous scale. Examples are numerous in the biological and social sciences: Americans may be classified according to ethnic background (e.g., English, Italian, or Russian); in an opinion poll, people's attitudes toward an issue may be recorded as "favor," "oppose," or "indifferent"; in a medical experiment, patients may be classified in two ways, as treated or not treated and as recovered or not recovered. Discrete data may also arise from partitioning the range of a continuous variable (e.g., a family's income might be described as low, middle, or high).
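As a minimal sketch of how partitioning a continuous variable produces categorical data, the following Python fragment maps incomes to the categories low, middle, and high. The function name `income_category`, the two cutoffs, and the income values are hypothetical choices made purely for illustration; they do not come from any survey discussed in this chapter.

```python
def income_category(income, low_cutoff=30_000, high_cutoff=90_000):
    """Map a continuous income to 'low', 'middle', or 'high'.

    The cutoffs are hypothetical; any two increasing thresholds
    partition the income range into three categories.
    """
    if income < low_cutoff:
        return "low"
    if income < high_cutoff:
        return "middle"
    return "high"


# Made-up incomes, used only to show the mapping.
incomes = [12_500, 48_000, 150_000, 30_000]
categories = [income_category(x) for x in incomes]
print(categories)  # ['low', 'middle', 'high', 'middle']
```

Note that the choice of cutoffs is part of the analysis: different partitions of the same continuous variable yield different categorical data.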
In most statistical problems we are interested in relationships between two or more variables. When all variables are categorical, the data are presented in the form of a contingency table. Table 9.1 presents the results of a survey of hospital patients analyzed by Cohen (1976). The contingency table in this case is called a 2 × 2 table because there are two variables, each having two categories. Because a contingency table is a table of counts, discrete data are often referred to as counted data. We primarily consider contingency tables; the paper by Nerlove and Press (1976), however, deals with problems in which the response (dependent) variables are discrete while the explanatory (independent) variables are continuous. This would be the case if we were interested, for example, in how the concentration of a poison affects the probability of death of an insect pest.
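To make the idea of a contingency table concrete, the sketch below cross-classifies individual records on two categorical variables and counts how many fall in each cell, giving a 2 × 2 table. The records are invented for illustration (they are not the data of Table 9.1), and the helper name `contingency_table` is an assumption of this example rather than a standard routine.

```python
from collections import Counter

# Hypothetical patient records: (treatment status, outcome).
# These values are made up and are NOT the survey data of Table 9.1.
records = [
    ("treated", "recovered"),
    ("treated", "recovered"),
    ("treated", "recovered"),
    ("treated", "not recovered"),
    ("not treated", "recovered"),
    ("not treated", "not recovered"),
    ("not treated", "not recovered"),
]


def contingency_table(records, row_levels, col_levels):
    """Return a list of rows of cell counts for two categorical variables."""
    counts = Counter(records)  # counts each (row, column) pair
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


row_levels = ["treated", "not treated"]
col_levels = ["recovered", "not recovered"]
table = contingency_table(records, row_levels, col_levels)

for label, row in zip(row_levels, table):
    print(label, row)
# treated [3, 1]
# not treated [1, 2]
```

Each cell is a count of individuals, and the cell counts sum to the number of records, which is why such data are described as counted data.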