Testing for Polynomial Regression Using Nonparametric Regression Techniques

In this study, we employ nonparametric regression techniques to design diagnostic tests of the validity of a polynomial regression model. Because polynomial regression is so widely used, it is desirable that a high degree of certainty be attached to the adequacy of such a model. The standard diagnostic tests in this situation fit a polynomial of higher order than the proposed model and then test for lack of fit. Unfortunately, such tests cannot detect departures that are orthogonal to the fitted higher-order terms. The nonparametric test proposed here is designed to overcome this shortcoming.

Suppose that independent observations $y_1, y_2, \ldots, y_n$ are obtained from a response variable $y$ at known points $t_1, t_2, \ldots, t_n$ of the predictor variable $t$ according to the model

$$y_i = \sum_{j=0}^{k-1} \beta_j t_i^j + f(t_i) + \varepsilon_i, \qquad i = 1, \ldots, n. \tag{1}$$

Here $f$ represents the departures from the polynomial model that are orthogonal to $1, t, \ldots, t^{k-1}$, and the $\varepsilon_i$ are iid random variables with mean zero and variance $\sigma^2$. Then our goal is to test the hypothesis $H_0\colon f = 0$. In contrast to the parametric approach, we do not assume that $f$ has a specific form, polynomial or otherwise. We merely assume that $f$ has certain smoothness properties. For example, when testing for linearity (i.e., $k = 2$), we assume that $f$ belongs to the class of absolutely continuous, square-integrable functions that have an absolutely continuous first derivative and a square-integrable second derivative.
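To make the setup concrete, here is a small simulation sketch for the $k = 2$ (linearity) case. All numerical choices (design, coefficients, noise level, and the departure $f(t) = \cos 2\pi t$, which is orthogonal to $1$ and $t$ on $[0, 1]$) are illustrative assumptions, not values from the article:

```python
import numpy as np

n = 200
# midpoint design on [0, 1]; k = 2 corresponds to testing linearity
t = (np.arange(n) + 0.5) / n

# a departure f orthogonal to span{1, t}: the integrals of cos(2*pi*t)
# and of t*cos(2*pi*t) over [0, 1] are both zero, and on this midpoint
# grid the discrete inner products vanish as well
f = np.cos(2 * np.pi * t)
print(np.mean(f), np.mean(t * f))  # both essentially zero

rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 0.2
y = beta0 + beta1 * t + f + sigma * rng.normal(size=n)  # model (1), k = 2

# least squares under H0 cannot absorb f: the departure stays in the residuals
X = np.column_stack([np.ones(n), t])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
```

Because $f$ lies orthogonal to the null space of the fit, the linear least squares line leaves it intact in the residuals, which is exactly what a nonparametric diagnostic can exploit and a higher-order parametric lack-of-fit test may miss.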

One approach to testing $H_0$ has been considered by Eubank and Spiegelman (1990). They derived goodness-of-fit tests for a linear model based on test statistics constructed from nonparametric regression fits to the residuals from linear regression. Specifically, they used a cubic smoothing spline fit in a test of linearity, assuming normal errors and certain smoothness properties for $f$. They derived the asymptotic distribution of the test statistic under local alternatives converging to the null at a specific rate. This rate is slower than $n^{-1/2}$, the parametric rate.

In a test of the validity of a parametric model, Härdle and Mammen (1990) proposed a test statistic based on a weighted $L_2$ distance between the parametric fit and a nonparametric fit based on a kernel estimator. Because the asymptotic distribution of their test statistic proved inadequate in small samples, they demonstrated a bootstrap method for generating critical values in finite samples.
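A rough sketch in the spirit of this construction, not Härdle and Mammen's exact statistic: a Nadaraya-Watson kernel fit is compared with a kernel-smoothed linear fit via an unweighted $L_2$-type distance, and critical values come from a simple wild (sign-flip) bootstrap. The kernel, bandwidth, sample, and bootstrap size are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def nw_smooth(x_eval, x, y, h):
    """Nadaraya-Watson kernel regression estimate with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

def l2_stat(x, y, h):
    """L2-type distance between a kernel fit and a linear fit under H0.
    The parametric fit is kernel-smoothed too, so the smoothing bias
    common to both fits cancels in the difference."""
    coef = np.polyfit(x, y, 1)               # parametric (linear) fit
    m_hat = nw_smooth(x, x, y, h)            # nonparametric fit to the data
    m_par = nw_smooth(x, x, np.polyval(coef, x), h)
    return np.mean((m_hat - m_par) ** 2)

n, h = 100, 0.1
x = np.sort(rng.uniform(0, 1, n))
y = 1.0 + 2.0 * x + 0.5 * np.sin(4 * np.pi * x) + 0.2 * rng.normal(size=n)
t_obs = l2_stat(x, y, h)

# wild bootstrap: regenerate data under H0 by sign-flipping the residuals
coef = np.polyfit(x, y, 1)
resid = y - np.polyval(coef, x)
t_boot = np.array([
    l2_stat(x, np.polyval(coef, x) + resid * rng.choice([-1.0, 1.0], n), h)
    for _ in range(200)
])
p_value = np.mean(t_boot >= t_obs)
```

With a genuine smooth departure in the data, the observed distance should exceed most bootstrap replicates, giving a small $p$-value; the sign-flip scheme destroys the smooth structure of the departure while preserving the residual scale.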

A common idea underlies the formulation of the test statistics in the works cited above: in each case the test statistic is based on some measure of the distance between a nonparametric estimator of the regression function and an estimator of the regression curve under the null hypothesis. The test based on kernel estimators developed by King, Hart, and Wehrly (1991) to check the equality of two curves is another example. Other related work includes that of Azzalini, Bowman, and Härdle (1989), Cox and Koh (1989), Cox, Koh, Wahba, and Yandell (1988), Eubank and Hart (1992), Hall and Hart (1990), Härdle and Marron (1990), Holst and Rao (1980), Müller (1989), Raz (1990), and Staniswalis and Severini (1991).

The remainder of this article is organized as follows. In Section 2 a test of $H_0$ is derived, and its asymptotic distribution theory is studied under the null hypothesis and under local alternatives. This generalizes the approach of Eubank and Spiegelman (1990) to nonnormal errors and to polynomials of arbitrary order $k$. First, a least squares estimator of $y$ is obtained under the null model, and an estimator based on a $2k$th-order smoothing spline fit to the data is obtained under the alternative. Next, a test statistic is formulated based on an approximation to a weighted $L_2$ distance between the two estimators. …