Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. (Book Reviews)
Rao, Sunil J., Journal of the American Statistical Association
Frank E. HARRELL, JR. New York: Springer-Verlag, 2001.
ISBN 0-387-95232-2. xii + 568 pp. $79.95 (H).
Predictive modeling has received renewed focus in applied statistics in the last 10-15 years, with the advent of a wealth of new techniques. Our toolbox has increased in weight considerably. Yet bringing all of these ideas together has meant searching through a broad range of literature--often outside of statistics. While some advances in this area have been made (e.g., Hastie, Tibshirani, and Friedman 2001; Ripley 1996; Hastie and Tibshirani 1990), the focus has been much more on explaining what the different techniques are than on how and when best to use them.
Frank Harrell has long been known as someone who has brought intelligent modeling tools to the masses through his broad range of S-PLUS-based freeware. Many applied statistics and biostatistics departments make regular use of his software. Now Harrell has gone a step further with Regression Modeling Strategies. As the title indicates, this is not simply a book describing modern techniques and subtle modeling issues, but more a book about strategies for better gleaning structure from data. The meat of the book is in Chapters 2-4, which introduce a myriad of recently developed techniques. These techniques include flexible modeling (with spline models, tree-based models, and kinds of shrinkage to relax and test the more strict assumptions of linear regression models), dealing with missing data, validating models through resampling methods (including some discussion on the limitations of these methods), and using data-reduction strategies (which are becoming all the more important with the emergence of extremely high-dimensional data such as DNA microarray data). In addition, more subtle issues, like overfitting and effective degrees of freedom, are addressed.
Harrell spends a fair amount of space describing how an honest assessment of model fit should be made and explaining that some price must be paid for the increased flexibility that modern methods bring. He also drives home points through the use of innovative graphics--not only to explore datasets, but also to summarize and validate model fits creatively. Chapters 3 and 4 conclude nicely with detailed summary guidelines on how to bring the various ideas together to develop modeling strategies.
The remaining chapters illustrate how one might approach various data analyses by modeling multivariable data with a continuous response, binary response, ordinal response, and failure-time response. Each task is accompanied by a preliminary chapter as background to better prepare the reader for these paradigm shifts, as well as detailed case-study analyses with extensive explanations of software syntax (most often in S-PLUS). …