The Convergence of Efroymson's Stepwise Regression Algorithm


1. EFROYMSON'S ALGORITHM

The algorithm that is commonly called just stepwise regression was first presented by Efroymson (1960). It is basically as follows:

1. Enter into the (linear) regression model any variables that are to be "forced in."

2. Find the variable from those not in the model but available for inclusion that has the largest F-to-enter value. If it is at least as great as a prespecified value, $F_{\mathrm{in}}$, then add the variable to the model. Stop if no variable can be added.

3. Find that variable among those in the model, other than those forced in, that has the smallest F-to-remove value. If it is less than a prespecified value, $F_{\mathrm{out}}$, then drop the variable from the model. Repeat this step until no further variables can be dropped; then go to step 2.

Efroymson says that $F_{\mathrm{out}}$ must be no greater than $F_{\mathrm{in}}$, and suggests a value of 4.0 for both, but does not comment on whether the process terminates. As the algorithm is used thousands of times a day, there can be little doubt that it does converge, but nobody seems to have presented a proof.

In the above process, if $\mathrm{RSS}_p$ is the residual sum of squares for a model with p parameters, then the F-to-enter statistic is

$$\frac{\mathrm{RSS}_p - \mathrm{RSS}_{p+1}}{\mathrm{RSS}_{p+1}/(n - p - 1)},$$

and similarly the F-to-remove statistic is

$$\frac{\mathrm{RSS}_{p-1} - \mathrm{RSS}_p}{\mathrm{RSS}_p/(n - p)}.$$
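
As a concrete illustration, the following is a minimal Python sketch of the three steps above, with RSS values obtained from ordinary least squares via numpy. The helper names (`rss`, `stepwise`) and the treatment of the intercept as the only "forced in" term are illustrative assumptions, not part of Efroymson's presentation; the sketch also assumes n comfortably exceeds the number of candidate variables.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares for a least-squares fit on the listed
    predictor columns plus an intercept."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return float(resid @ resid)

def stepwise(X, y, f_in=4.0, f_out=4.0):
    """Efroymson's stepwise selection; returns the indices of the
    selected variables.  Step 1 is trivial here: only the intercept
    is "forced in"."""
    n, k = X.shape
    selected = []
    while True:
        p = len(selected) + 1              # parameters in current model
        rss_cur = rss(X, y, selected)

        # Step 2: largest F-to-enter among variables not yet in the model.
        best_f, best_j = -np.inf, None
        for j in range(k):
            if j in selected:
                continue
            rss_new = rss(X, y, selected + [j])
            f = (rss_cur - rss_new) / (rss_new / (n - p - 1))
            if f > best_f:
                best_f, best_j = f, j
        if best_j is None or best_f < f_in:
            return selected                # nothing qualifies: stop
        selected.append(best_j)

        # Step 3: drop variables while the smallest F-to-remove is
        # below f_out, then return to step 2.
        while True:
            p = len(selected) + 1
            rss_cur = rss(X, y, selected)
            worst_f, worst_j = np.inf, None
            for j in selected:
                rss_red = rss(X, y, [i for i in selected if i != j])
                f = (rss_red - rss_cur) / (rss_cur / (n - p))
                if f < worst_f:
                    worst_f, worst_j = f, j
            if worst_f >= f_out:
                break
            selected.remove(worst_j)

# Example: only the first two of five predictors actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=100)
print(stepwise(X, y))                      # typically [0, 1]
```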

2. PROOF OF CONVERGENCE

For any subset of variables, S, consider the objective or Lyapunov function

$$L(S) = \mathrm{RSS}_S \prod_{i=1}^{p} \left( 1 + \frac{F}{n - i} \right), \qquad (1)$$

where n is the number of cases, and F is any value such that $F_{\mathrm{out}} \le F \le F_{\mathrm{in}}$. In most practical cases the p parameters will consist of an intercept and (p - 1) regression coefficients.
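
In code, the objective function (1) is a one-line computation; a minimal sketch (the function name and signature are illustrative):

```python
from math import prod

def lyapunov(rss_s, n, p, f):
    """L(S) = RSS_S * prod_{i=1..p} (1 + F / (n - i)), as in (1)."""
    return rss_s * prod(1.0 + f / (n - i) for i in range(1, p + 1))
```

Note that the product factor depends only on p, so adding a variable multiplies it by exactly $(1 + F/(n - p - 1))$; this is the fact exploited in the argument below.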

As there is only a finite number of different subsets of variables to be considered, the algorithm must either terminate or cycle. The above objective function can be shown to decrease every time that a variable is added or deleted, and therefore the algorithm cannot cycle. Thus the Efroymson stepwise regression algorithm can be viewed as a heuristic algorithm to minimize the objective function L(S). Although the algorithm often finds the global minimum, there are many examples in which it stops at a local minimum.
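
The decrease can be verified directly from the definitions of the two F statistics together with (1); a sketch of the two cases:

```latex
% Adding a variable: F-to-enter >= F_in >= F means
%   RSS_p - RSS_{p+1} >= F * RSS_{p+1} / (n - p - 1),
% i.e.  RSS_p >= RSS_{p+1} (1 + F/(n - p - 1)).  Hence
\[
L(S_{p+1})
  = \mathrm{RSS}_{p+1}\Bigl(1 + \frac{F}{n-p-1}\Bigr)
    \prod_{i=1}^{p}\Bigl(1 + \frac{F}{n-i}\Bigr)
  \le \mathrm{RSS}_{p}\prod_{i=1}^{p}\Bigl(1 + \frac{F}{n-i}\Bigr)
  = L(S_{p}).
\]
% Dropping a variable: F-to-remove < F_out <= F means
%   RSS_{p-1} < RSS_p (1 + F/(n - p)),
% and the same cancellation gives L(S_{p-1}) < L(S_p), a strict
% decrease.  This is where the condition F_out <= F <= F_in is used.
```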

Strictly, the objective function can remain constant when a variable is added whose F-to-enter is exactly equal to $F_{\mathrm{in}}$, but it decreases strictly whenever a variable is dropped. Thus if at some stage in the iterations the currently selected model has p parameters, and at a later stage there are again p parameters, there must have been one or more deletions of variables in between, each of which caused a decrease in the objective function. As each subset has only one value of L(S), when the process again has p parameters in the model, it must be for a different subset.

In practice, the algorithm could cycle due to rounding errors in calculating the F-to-enter and F-to-remove values, but this is extremely unlikely except for deliberately constructed examples. …