Binomial Regression with Monotone Splines: A Psychometric Application
Ramsay, J. O., Abrahamowicz, M., Journal of the American Statistical Association
1. THE BINOMIAL REGRESSION PROBLEM
Let vectors [x.sub.j] (j = 1, ..., J) be observed independent variables or covariates, and let vectors [[theta].sub.j] be associated latent variables. The data on which the estimation of a binomial regression function p is based are observations of binomial random variables [R.sub.j] (j = 1, ..., J) having the binomial distribution
$$\Pr(R_j = r_j \mid x_j, \theta_j) = \binom{n_j}{r_j}\, p(x_j, \theta_j)^{r_j}\, [1 - p(x_j, \theta_j)]^{n_j - r_j} \qquad (1.1)$$
That is, the probability of a successful outcome in a single trial, given covariate [x.sub.j] and latent variable value [[theta].sub.j], is p([x.sub.j], [[theta].sub.j]), and the associated binomial observation is [r.sub.j] successes out of [n.sub.j] trials.
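As an illustration of how the binomial model described above enters estimation, the following minimal sketch evaluates the log-likelihood of observed counts given success probabilities. The function name `binomial_log_likelihood` and its arguments are hypothetical and not taken from the article; the probabilities are assumed to lie strictly between 0 and 1 and to have been computed already as p([x.sub.j], [[theta].sub.j]).

```python
import math


def binomial_log_likelihood(successes, trials, probs):
    """Sum of binomial log-probabilities over observations j = 1, ..., J.

    successes[j] plays the role of r_j, trials[j] of n_j, and probs[j] of
    p(x_j, theta_j). Each probability is assumed to be strictly inside (0, 1).
    """
    total = 0.0
    for r_j, n_j, p_j in zip(successes, trials, probs):
        total += (
            math.log(math.comb(n_j, r_j))      # log binomial coefficient
            + r_j * math.log(p_j)              # contribution of successes
            + (n_j - r_j) * math.log(1.0 - p_j)  # contribution of failures
        )
    return total


# Binary item responses: n_j = 1 and r_j indicates a correct answer.
example = binomial_log_likelihood([1, 0, 1], [1, 1, 1], [0.8, 0.3, 0.6])
```

In the binary case the binomial coefficient equals 1, so the expression reduces to the familiar Bernoulli log-likelihood.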
In the context of analyzing test data, psychometricians refer to p as the item-characteristic curve, and normally [n.sub.j] = 1 and [r.sub.j] is an indicator variable for the correct response on a test item. In this enterprise [theta] quantifies examinee ability, and is viewed either as a parameter requiring estimation along with the binary regression function p or as an incidental parameter to be eliminated eventually by marginalization or other techniques. Observed covariate x may be the total score on the test containing the item, or performance on some other tests.
The log-linear family p(z) = exp(z)/[1 + exp(z)], where z = [alpha] + [beta]x, was discussed extensively by Cox (1970). The extension p(z) = [gamma] + (1 - [gamma])exp(z)/[1 + exp(z)] is now used almost exclusively in psychometric modeling, where it is called the three-parameter logistic function. Parameters [alpha], [beta], and [gamma] are taken as indexes of item difficulty, discriminability, and probability of success by guessing, respectively. Lord (1980) provided an introduction to many applications of logistic modeling to testing problems; the field was surveyed by Lewis (1986). One can question, however, whether functions as simple as these are sufficient in a particular application. More flexible models would be desirable, if only to provide a justification for using the logistic. In fact, departures from such models are clearly visible where enough data have been available to plot p directly (Lord 1980; Wainer 1983).
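The three-parameter logistic function can be sketched in a few lines. The names `logistic` and `three_pl` are hypothetical; the branch on the sign of z is an implementation choice made here only to avoid floating-point overflow and is not part of the model.

```python
import math


def logistic(z):
    """exp(z) / (1 + exp(z)), evaluated without overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def three_pl(x, alpha, beta, gamma):
    """Three-parameter logistic curve: gamma + (1 - gamma) * logistic(alpha + beta * x)."""
    z = alpha + beta * x
    return gamma + (1.0 - gamma) * logistic(z)


# With gamma = 0.2 the curve rises from a floor of 0.2 toward 1.
low = three_pl(-50.0, 0.0, 1.0, 0.2)
mid = three_pl(0.0, 0.0, 1.0, 0.2)
high = three_pl(50.0, 0.0, 1.0, 0.2)
```

Setting gamma to 0 recovers the two-parameter form p(z) = exp(z)/[1 + exp(z)].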
This article develops some flexible tools for modeling binomial regression functions, with special attention to the psychometric problem. These techniques are illustrated in the following section by the analysis of some multiple-choice test data involving 100 items and 379 examinees. Since a typical test has many items, the result of such an analysis is many estimated binomial regression functions; the third section considers the principal-components analysis of these.
The use of monotone regression splines provides a semi-parametric approach. That is, although a small number of parameters is estimated per curve, these parameters themselves do not have an immediate interpretation. Rather, the local characteristics of the curve provide information about the relation between the item and examinees of various ability levels. A completely nonparametric approach would also be possible using smoothing splines (Hastie and Tibshirani 1986, 1987; O'Sullivan, Yandell, and Raynor 1986; Villalobos 1983), but various computational problems appear formidable. Samejima (1988) considered the nonparametric estimation of item-characteristic curves using a different approach.
The monotone regression spline technique used in this article was described in more detail in Ramsay (1982a, 1988) and Winsberg and Ramsay (1983), and involves the use of linear combinations of monotone splines [I.sub.kv] of order v (k = 1, ..., K) as a basis for functions mapping a closed real interval [A, B] into [0, 1]. These authors discussed the characteristics of monotone regression splines and applied them to several problems; Winsberg, Thissen, and Wainer (1984) specifically considered the use of splines in item analysis. The approach is monotone in two senses: The basis functions used for estimating p are monotone, and if monotonicity is required of p, this can be conveniently achieved by imposing simple restrictions on the estimated parameters. …
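The idea of building a monotone curve from a monotone basis can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: the function names, the knot placement, and the coefficients are hypothetical; each monotone basis function is obtained by integrating a nonnegative M-spline; a midpoint rule stands in for the closed-form integral; and the restriction used here (nonnegative coefficients summing to at most 1) is one simple sufficient condition for a nondecreasing curve with values in [0, 1].

```python
def m_spline(x, i, order, knots):
    """Nonnegative M-spline i of the given order, via the usual recursion."""
    if order == 1:
        lo, hi = knots[i], knots[i + 1]
        return 1.0 / (hi - lo) if lo <= x < hi else 0.0
    width = knots[i + order] - knots[i]
    if width == 0.0:  # degenerate support caused by repeated boundary knots
        return 0.0
    left = (x - knots[i]) * m_spline(x, i, order - 1, knots)
    right = (knots[i + order] - x) * m_spline(x, i + 1, order - 1, knots)
    return order * (left + right) / ((order - 1) * width)


def monotone_curve(coefs, order, knots, n_grid=1000):
    """Evaluate sum_k coefs[k] * I_k(x) on a grid over [A, B].

    I_k is the integral of M-spline k from A to x, approximated here by a
    cumulative midpoint rule. Nonnegative coefs give a nondecreasing curve.
    """
    a, b = knots[0], knots[-1]
    h = (b - a) / n_grid
    xs, ps = [a], [0.0]
    total = 0.0
    for s in range(n_grid):
        mid = a + (s + 0.5) * h
        density = sum(c * m_spline(mid, k, order, knots) for k, c in enumerate(coefs))
        total += h * density
        xs.append(a + (s + 1) * h)
        ps.append(total)
    return xs, ps


# Hypothetical setup: order 3 on [A, B] = [-3, 3] with interior knots at -1 and 1.
order = 3
knots = [-3.0] * order + [-1.0, 1.0] + [3.0] * order
coefs = [0.1, 0.2, 0.3, 0.2, 0.1]  # K = len(knots) - order = 5 coefficients
xs, ps = monotone_curve(coefs, order, knots)
```

Because each M-spline integrates to 1 over [A, B], the curve ends near the sum of the coefficients, so the sum controls the upper asymptote.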