Flexible Discriminant Analysis by Optimal Scoring

TREVOR HASTIE, ROBERT TIBSHIRANI, and ANDREAS BUJA*

Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that are "optimal" for separating the groups. With two such functions, one can produce a classification map that partitions the reduced space into regions that are identified with group membership, and the decision boundaries are linear. This article is about richer nonlinear classification schemes. Linear discriminant analysis is equivalent to multiresponse linear regression using optimal scorings to represent the groups. In this paper, we obtain nonparametric versions of discriminant analysis by replacing linear regression by any nonparametric regression method. In this way, any multiresponse regression technique (such as MARS or neural networks) can be postprocessed to improve its classification performance.

KEY WORDS: Classification; Discriminant analysis; Nonparametric regression; MARS.

1. INTRODUCTION

Multigroup classification or discrimination is an important problem with applications in many fields. In the generic problem, the outcome of interest G falls into J unordered classes, which for convenience we denote by the set {1, 2, ..., J}. We wish to build a rule for predicting the class membership of an item based on p measurements of predictors or features X ∈ R^p. Our training sample consists of the class membership and predictors for N items. Traditional statistical methods for this problem include linear discriminant analysis and multiple logistic regression. Neural network classifiers have become a powerful alternative, with the ability to incorporate a very large number of features in an adaptive nonlinear model. Ripley (1994) gave an informative survey from a statistician's viewpoint. The recent success and popularity of neural networks led us to look for similar methodologies in the statistical literature, but this seems to be a relatively unexplored area. One significant approach is the classification and regression tree (CART) methodology of Breiman, Friedman, Olshen, and Stone (1984), which is well known to statisticians and is becoming popular in the artificial intelligence community.
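To fix the setup in code, here is a minimal sketch (in Python with NumPy; the data are synthetic placeholders invented for illustration, not taken from the article) of a training sample in this notation, together with the N x J indicator response matrix that represents class membership and underlies the optimal-scoring view of discriminant analysis developed below.

```python
import numpy as np

# Synthetic stand-in for a training sample: N items, p predictors,
# J unordered classes. All values here are invented for illustration.
rng = np.random.default_rng(0)
N, p, J = 100, 5, 3
X = rng.normal(size=(N, p))            # predictor matrix; each row lies in R^p
G = rng.integers(1, J + 1, size=N)     # class labels in {1, ..., J}

# N x J indicator response matrix: Y[i, j] = 1 iff item i belongs to class j + 1.
Y = np.zeros((N, J))
Y[np.arange(N), G - 1] = 1.0
```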

There have been a number of recent advances in the nonparametric multiple regression literature. These include projection pursuit regression (Friedman and Stuetzle 1981), the ACE algorithm (Breiman and Friedman 1985), additive models (Hastie and Tibshirani 1990), multivariate adaptive regression splines (MARS; Friedman 1991), Breiman's (1991) Π method, the interaction spline methodology of Wahba (1990), and more recently the hinging hyperplanes of Breiman (1991a). Neural networks (e.g., Barron and Barron 1988; Lippmann 1989; Hinton 1989) can be viewed as yet another approach to nonparametric regression. In this article we describe methods for multigroup classification that use these tools to generalize linear discriminant analysis.
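To make the generalization concrete, the following is a hedged sketch of the recipe implied above, not the authors' implementation: fit any multiresponse regression of the class indicators on X, then extract optimal scores by an eigenanalysis of the fitted values. The function name fda_optimal_scores is ours, and scikit-learn's KNeighborsRegressor merely stands in for a flexible regressor such as MARS or a neural network.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor  # stand-in for MARS, a neural net, etc.

def fda_optimal_scores(X, G, J, regressor):
    """Sketch: regress class indicators on X with any multiresponse
    regressor, then obtain optimal scores by eigenanalysis of the fit."""
    N = len(G)
    Y = np.zeros((N, J))
    Y[np.arange(N), G - 1] = 1.0     # N x J indicator responses

    regressor.fit(X, Y)              # the (possibly nonparametric) regression step
    Y_hat = regressor.predict(X)     # fitted values, N x J

    # Optimal scores: eigenvectors of (Y'Y)^{-1} (Y'Y_hat). With ordinary
    # linear regression in the step above, this recovers Fisher's LDA
    # directions up to normalization; the trivial constant score
    # (eigenvalue near 1) is discarded in the article's development.
    # For a general regressor the problem need not be exactly symmetric,
    # so we keep the real parts of the eigen-solution.
    M = np.linalg.solve(Y.T @ Y, Y.T @ Y_hat)
    eigvals, Theta = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], Theta.real[:, order], regressor

# Continuing with X, G, and J from the sketch above:
eigvals, Theta, fit = fda_optimal_scores(X, G, J, KNeighborsRegressor())
```

A new point x would then be mapped to the reduced space of variates Theta' h(x), where h is the fitted regression function, and classified by, for example, nearest class centroid; the article develops the appropriate normalizations and distances.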

The foundations for the developments described here can be found in the nonlinear scaling literature, notably the work of Gifi (1981, 1990). Our work was motivated by the unpublished paper of Breiman and Ihaka (1984); Section 6.3 details the connection with their work. Ripley and Hjort (1994) were similarly motivated.

This article focuses on adaptive classification procedures. A companion article, "Penalized Discriminant Analysis" (Hastie, Buja, and Tibshirani 1994), gives a more technical basis for some of the procedures described here and focuses on obtaining smooth, interpretable canonical variates for high-dimensional problems such as spectral and image analysis. Both articles rely on the connections between penalized optimal scoring and penalized discriminant analysis. Hereafter we will refer to this companion article as PDA.
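As a pointer to that connection, the short sketch below (again hypothetical, reusing X, G, J, and fda_optimal_scores from the sketches above) swaps a quadratically penalized regression into the optimal-scoring step; PDA develops this penalized variant properly.

```python
from sklearn.linear_model import Ridge

# Penalized optimal scoring, sketched under the same assumptions as above:
# the regression step becomes ridge, i.e. quadratically penalized least
# squares. The penalty alpha=10.0 is an arbitrary illustrative value,
# not one taken from the article.
eigvals, Theta, fit = fda_optimal_scores(X, G, J, regressor=Ridge(alpha=10.0))
variates = fit.predict(X) @ Theta   # penalized discriminant variates
```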

2. LINEAR DISCRIMINANT ANALYSIS AND GENERALIZATIONS

2. …