The Statistical Analysis of Discrete Data

Thomas J. Santner and Diane E. Duffy. New York: Springer-Verlag, 1989. xii + 367 pp. $45.

The Statistical Analysis of Discrete Data (SADD) is an exceptional contribution to the textbook/monograph literature on the theory and methods of discrete data. It contains only four chapters beyond a very brief introduction: Chapter 2, Univariate Discrete Responses (79 pages); Chapter 3, Loglinear Models (23 pages); Chapter 4, Cross-Classified Data (53 pages); and Chapter 5, Univariate Discrete Data With Covariates (65 pages). There are four appendixes. Each chapter, and also Appendix 4, is followed by an excellent set of problems.

To give the reader an impression of what SADD does and does not cover, it is best to quote the authors:

... we felt there was a need for a book which incorporated some of the
myriad recent research advances. Our motivation was to introduce the
subject by emphasizing its ties to the well-known theories of linear
models, experimental design, and regression diagnostics, as well as
to describe alternative methodologies (Bayesian, smoothing, etc.).
... (p. xi)

SADD covers the following specific topics: classical maximum likelihood (ML) estimators; estimators based on Bayesian, smoothing, shrinkage, or ridge approaches for estimating parameters in structured and unstructured problems; ML estimation theory for log-linear models based on the idea of linear projection, which demonstrates the similarities with ML estimation for normal linear models; standard testing and estimation formulations; simultaneous interval estimation; multiple comparisons; ranking and selection problems; some small-sample methods, especially for common confidence interval problems; and recent research on graphical models for contingency tables and on diagnostic tools for log-linear models and logistic regression. The authors wisely decided not to cover all aspects of the analysis of count data. Thus, for example, they did not include subjects such as measures of association, models for measuring change, ordinal data, incomplete and missing data, and the analysis of panel or repeated measurement data.

Chapter 2, on univariate discrete responses, is an exceptional chapter and offers a more thorough discussion of the binomial, multinomial, and Poisson distributions than any comparable chapter or appendix in similar textbooks. In fact, I would recommend that this chapter be assigned in lieu of the usual chapter on these three distributions that appears in texts used in graduate courses in mathematical statistics. Many frequentists would not agree with this suggestion because the material is heavily Bayesian. One does not have to be an entrenched Bayesian, however, to appreciate the contribution this chapter makes to the graduate teaching of inferential statistics. In fact, as will be pointed out, the reader will find here many current ideas and procedures important to general statistical theory and methodology. As an illustration, consider the discussion of the multinomial. This distribution receives the greatest attention because the topics in multivariate discrete data analysis depend heavily on the multinomial or product multinomial. Four areas are discussed (the same areas are discussed for the Poisson but only the first two for the binomial). First, the authors discuss point estimation of the probability vector p. They derive the MLE of p and give its optimality properties. Then there is a discussion of loss functions, namely squared error loss (SEL), relative squared error loss (RSEL), and entropy loss, followed by a listing of some properties of each. Entropy loss, derived from the Kullback-Leibler distance function between two multinomial probability functions, is presented as representative of loss functions that distinguish between positive and zero estimates of $p_i$ ($p_i > 0$). The behavior of the MLE is then examined with respect to SEL and RSEL. …
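
The distinction drawn above between entropy loss and the two squared-error losses can be illustrated with a small sketch. It is only an illustration under stated assumptions: the loss functions are written in their usual textbook forms (the book's own definitions may differ, for example by a scaling factor), and the probability vector and cell counts are hypothetical values chosen so that one cell is empty.

```python
import math


def multinomial_mle(counts):
    """MLE of the cell probabilities: the observed proportions x_i / n."""
    n = sum(counts)
    return [x / n for x in counts]


def squared_error_loss(p, a):
    """SEL (assumed form): sum of (p_i - a_i)^2."""
    return sum((pi - ai) ** 2 for pi, ai in zip(p, a))


def relative_squared_error_loss(p, a):
    """RSEL (assumed form): sum of (p_i - a_i)^2 / p_i; needs every p_i > 0."""
    return sum((pi - ai) ** 2 / pi for pi, ai in zip(p, a))


def entropy_loss(p, a):
    """Entropy loss (assumed form): Kullback-Leibler distance sum p_i log(p_i / a_i)."""
    total = 0.0
    for pi, ai in zip(p, a):
        if pi == 0:
            continue  # a term with p_i = 0 contributes nothing, by convention
        if ai == 0:
            return math.inf  # a zero estimate of a positive p_i is penalized infinitely
        total += pi * math.log(pi / ai)
    return total


# Hypothetical example: true probabilities and a sample of n = 10 with an empty cell.
p_true = [0.5, 0.3, 0.2]
counts = [6, 4, 0]
p_hat = multinomial_mle(counts)

print("MLE:         ", p_hat)
print("SEL:         ", squared_error_loss(p_true, p_hat))
print("RSEL:        ", relative_squared_error_loss(p_true, p_hat))
print("Entropy loss:", entropy_loss(p_true, p_hat))
```

Because the third cell is empty, the MLE estimates its probability as zero. Under SEL and RSEL this costs only a finite amount, whereas the entropy loss is infinite, which is the sense in which it separates zero from positive estimates of a positive $p_i$.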