Bootstrap Model Selection

Journal article by Jun Shao; Journal of the American Statistical Association, Vol. 91, 1996

1. INTRODUCTION

In a regression problem, typically there is a vector x of p explanatory variables to be used to fit a model between x and a response variable y. Because some of the components of x may not be related to y, using all p components of x does not necessarily produce a better model than using part of the components of x. Because the relative performance of each model (corresponding to a set of components of x) is usually unknown, we have to select a set of explanatory variables (components of x) based on a data set {([x.sub.i], [y.sub.i]), i = 1,..., n}, where [y.sub.i] is the response at x = [x.sub.i]. This variable selection problem is equivalent to a model selection problem in which each model corresponds to a particular set of the p components of x.

There exist many variable/model selection procedures in the case where the relationship between x and y is linear; for example, the Akaike information criterion (AIC) (Akaike 1970); the [C.sub.p] method (Mallows 1973); the Bayes information criterion (BIC) (Hannan and Quinn 1979; Schwarz 1978); the final prediction error ([FPE.sub.[Lambda]]) method (Shibata 1984); the generalized information criterion (Rao and Wu 1989) and its analogs (Potscher 1989); the delete-one cross-validation (Allen 1974; Stone 1974); the generalized cross-validation (Craven and Wahba 1979); and the delete-d cross-validation (Burman 1989; Geisser 1975; Shao 1993; Zhang 1993a). This article introduces some selection methods based on the bootstrap.

Besides the theoretical and empirical properties of the bootstrap selection procedures established in this article, there are at least two other reasons to use a bootstrap model selection procedure:

1. In the linear regression context, the bootstrap method provides inference procedures (e.g., confidence sets) that are asymptotically more accurate than those produced by the other methods (Adkins and Hill 1990; Hall 1989). It may be preferable to use the same method both in model selection and in the subsequent inference based on the selected model. In addition, if we use the bootstrap for both model selection and the subsequent inference, then the bootstrap observations generated for model selection can also be used in the subsequent inference; that is, in terms of generating bootstrap observations, there is no extra cost for using a bootstrap model selection procedure when the bootstrap is also used for inference. If a cross-validation method is used for model selection and the bootstrap is used for the subsequent inference, then the extra computations in generating resamples for cross-validating cannot be avoided.

2. The bootstrap selection procedure developed in the linear regression case can be extended, without any theoretical derivation, to more complicated problems such as the nonlinear regression models, generalized linear models, and autoregression models. The cross-validation method, which is also a data-resampling method, can also be easily extended to nonlinear regression and generalized linear models, but not to autoregression models.

In Section 2 we focus on the case where the relationship between x and y is linear. We consider two different ways of generating bootstrap observations: bootstrapping residuals and bootstrapping pairs (x, y). The main theoretical study of a bootstrap selection procedure is its consistency; that is, whether the probability of selecting a nonoptimal model vanishes as the sample size n increases to infinity. Finite-sample performances of some bootstrap selection procedures are studied by simulation. We consider more complicated cases in Section 3 and establish some results similar to those in Section 2 in nonlinear regression, generalized linear, and autoregression models.

Our main discovery is that a straightforward application of the bootstrap does not yield a consistent model selection procedure, although some simple modifications can be used to rectify this inconsistency. Consider, for example, the method of bootstrapping pairs. One usually generates n independent and identically distributed (iid) bootstrap observations from [Mathematical Expression Omitted], the empirical distribution putting mass [n.sup.-1] on each pair ([x.sub.i], [y.sub.i]), i = 1,..., n (Efron 1982, 1983; Freedman 1981). But our results in Sections 2 and 3 show that this leads to an inconsistent bootstrap selection procedure. A simple modification that results in a consistent bootstrap selection procedure is to generate fewer bootstrap observations from [Mathematical Expression Omitted]. More precisely, if m (instead of n) iid bootstrap observations are generated from [Mathematical Expression Omitted], then the bootstrap selection procedure is consistent if and only if m/n [approaches] 0 and m [approaches] [infinity]. Changing the bootstrap sample size to rectify the inconsistency of the bootstrap has been shown to be successful in various other problems (Arcones and Gine 1989; Deheuvels, Mason, and Shorack 1993; Hall 1990; Huang, Sen, and Shao 1996; Shao 1994; Swanepoel 1986).

2. LINEAR MODELS

Let {([x.sub.i],[y.sub.i]), i = 1,..., n} be the available data set, where [x.sub.i] is the ith value of a p vector of explanatory variables and [y.sub.i] is the response at [x.sub.i]. We confine our study to the case where p is fixed; that is, p does not increase as n increases. The explanatory variable x is either random or deterministic. In the former case we assume that ([x.sub.i], [y.sub.i]), i = 1,..., n, are iid. In the latter case, we assume that [y.sub.i], i = 1,..., n, are independent. In both cases, we assume that X = ([x.sub.1],..., [x.sub.n])[prime] is of full rank and

[[Mu].sub.i] = E([y.sub.i] [where] [x.sub.i]) = [x[prime].sub.i][Beta], var([y.sub.i] [where] [x.sub.i]) = [[Sigma].sup.2],

i = 1,...,n, (1)

where [Beta] is a p vector of unknown parameters.

2.1 The Optimal Model

Let [Alpha] be a subset of {1,..., p} of size [p.sub.[Alpha]] and let [x.sub.i[Alpha]] (or [[Beta].sub.[Alpha]]) be the subvector of [x.sub.i] (or [Beta]) containing the components of [x.sub.i] (or [Beta]) indexed by the integers in [Alpha]. Then a model corresponding to [Alpha], called model [Alpha] for simplicity, is

[[Mu].sub.i[Alpha]] = E([y.sub.i] [where] [x.sub.i]) = [x[prime].sub.i[Alpha]][[Beta].sub.[Alpha]], var([y.sub.i] [where] [x.sub.i]) = [[Sigma].sup.2],

i = 1,...,n. (2)

For a given [Alpha], model [Alpha] is not necessarily a correct model in the sense that E([y.sub.i] [where] [x.sub.i]) is actually not always equal to [x[prime].sub.i[Alpha]][[Beta].sub.[Alpha]]. If [[Beta].sub.[Alpha]] contains all nonzero components of [Beta], then [x[prime].sub.i][Beta] = [x[prime].sub.i[Alpha]][[Beta].sub.[Alpha]] for any [x.sub.i] and model (2) is called a correct model. There may be more than one correct model.

Suppose that under each [Alpha], the model is fit using the least squares method; that is, [[Beta].sub.[Alpha]] is estimated by the least squares estimator (LSE),

[Mathematical Expression Omitted],

where [X.sub.[Alpha]] = ([x.sub.1[Alpha]],..., [x.sub.n[Alpha]])[prime] and y = ([y.sub.1],..., [y.sub.n])[prime]. Then the efficiency of model [Alpha] can be measured by the average loss,

[Mathematical Expression Omitted],

where [Mu] = ([[Mu].sub.1],..., [[Mu].sub.n])[prime], [Mathematical Expression Omitted] [[a]] = [square root of a[prime]a] for any vector a. After observing the data, our concern is to select a model [Alpha] [element of] A so that [L.sub.n]([Alpha]) may be as small as possible, where A is a collection of some subsets of {1,..., p}. The largest possible A is the one containing all nonempty subsets of {1,..., p}. But in a practical problem, we may consider a smaller collection of subsets.

Let [z.sub.i] be a future response at [x.sub.i], i = 1,..., n, and assume that the [z.sub.i] are independent of the [y.sub.i]. Then the average conditional expected loss in prediction is

[Mathematical Expression Omitted].

Thus selecting a model with the smallest [L.sub.n]([Alpha]) over all [Alpha] [element of] A is equivalent to selecting a model with the best prediction ability over all [Alpha] [element of] A.
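The quantities above can be sketched numerically. The displayed expressions are omitted in this excerpt, so the code assumes the usual forms: the LSE is the ordinary least squares fit on the columns indexed by [Alpha], and the average loss is the squared distance between [Mu] and the fitted values, divided by n. The function names and the simulated design are illustrative assumptions, not part of the article.

```python
import numpy as np

def lse(X, y, alpha):
    """Least squares estimate of beta_alpha using only the columns in alpha."""
    beta_a, *_ = np.linalg.lstsq(X[:, alpha], y, rcond=None)
    return beta_a

def average_loss(X, y, mu, alpha):
    """Assumed form of L_n(alpha): ||mu - X_alpha beta_hat_alpha||^2 / n."""
    fitted = X[:, alpha] @ lse(X, y, alpha)
    return float(np.sum((mu - fitted) ** 2) / len(y))

# Simulated data in which mu is known (hypothetical design, for illustration only).
rng = np.random.default_rng(0)
n, p = 200, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([2.0, 3.0, 0.0, 0.0])   # only the first two components are nonzero
mu = X @ beta
y = mu + rng.normal(size=n)

loss_small = average_loss(X, y, mu, [0, 1])        # smallest correct model
loss_full = average_loss(X, y, mu, [0, 1, 2, 3])   # full model, also correct
loss_wrong = average_loss(X, y, mu, [0, 2, 3])     # omits a needed variable
```

In a simulation like this one, the model that omits a needed variable carries a loss that does not shrink with n, while the two correct models differ only by a term of order 1/n.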

Let [Epsilon] = y - [Mu], [H.sub.[Alpha]] = [X.sub.[Alpha]][([X[prime].sub.[Alpha]][X.sub.[Alpha]]).sup.-1][X[prime].sub.[Alpha]], and

[[Delta].sub.n]([Alpha]) = [[[[Mu] - [H.sub.[Alpha]][Mu]]].sup.2]/n. (4)

Then

[L.sub.n]([Alpha]) = [[Delta].sub.n]([Alpha]) + [[[[H.sub.[Alpha]][Epsilon]]].sup.2]/n. (5)

When model [Alpha] is correct, [Mu] = X[Beta] = [X.sub.[Alpha]][[Beta].sub.[Alpha]] = [H.sub.[Alpha]][Mu], [[Delta].sub.n]([Alpha]) = 0, and

[L.sub.n]([Alpha]) = [[[[H.sub.[Alpha]][Epsilon]]].sup.2]/n. (6)
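As a quick numerical check of (4) and (6), the sketch below builds the projection matrix for a correct model and confirms that the bias term vanishes and the loss reduces to the projected error. It assumes the loss is the squared distance between [Mu] and the least squares fitted values divided by n (the defining display is omitted in this excerpt); the data are simulated and the helper name is hypothetical.

```python
import numpy as np

def hat_matrix(Xa):
    """Projection matrix H_alpha = X_alpha (X_alpha' X_alpha)^{-1} X_alpha'."""
    return Xa @ np.linalg.solve(Xa.T @ Xa, Xa.T)

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, 0.0])
mu = X @ beta
eps = rng.normal(size=n)
y = mu + eps

alpha = [0, 1]                                  # correct: contains all nonzero components
H = hat_matrix(X[:, alpha])
delta_n = np.sum((mu - H @ mu) ** 2) / n        # equation (4)
loss = np.sum((mu - H @ y) ** 2) / n            # assumed L_n(alpha); fitted values are H y
rhs = np.sum((H @ eps) ** 2) / n                # right-hand side of (6)
```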

Let [[Alpha].sub.0] be the subset corresponding to the correct model with the smallest size; that is, [[Beta].sub.[Alpha]0] contains exactly all nonzero components of [Beta]. Then, under (1) and

[Mathematical Expression Omitted],

model [[Alpha].sub.0] is optimal in the sense that it minimizes [L.sub.n]([Alpha]) over [Alpha] [element of] A for sufficiently large n; that is,

[Mathematical Expression Omitted].

Because [L.sub.n]([Alpha]) involves the unknown parameter [Beta], the optimal [[Alpha].sub.0] must be estimated. Selecting a model is the same as finding an estimate of [[Alpha].sub.0]. Let [Mathematical Expression Omitted] be the estimate of [Alpha] based on a model selection procedure. Then the model selection procedure is said to be consistent if

[Mathematical Expression Omitted].

2.2 Bootstrap Selection Procedures

We now introduce bootstrap model selection procedures (bootstrap estimators of [[Alpha].sub.0]). Under linear model (1), there are different ways of generating bootstrap observations:

1. Bootstrapping residuals (Efron 1979). Let [Mathematical Expression Omitted] be the ith residual, where [Mathematical Expression Omitted] is the LSE under model (1) (or model [Alpha] = {1,..., p}). Generate iid [Mathematical Expression Omitted],..., [Mathematical Expression Omitted] from the empirical distribution that puts mass [n.sup.-1] on [Mathematical Expression Omitted], i = 1,..., n, where [Mathematical Expression Omitted] is the average of the [r.sub.i]. The bootstrap observations under model [Alpha] are [Mathematical Expression Omitted], where [Mathematical Expression Omitted], i = 1,..., n. The bootstrap analog of [Mathematical Expression Omitted] is

[Mathematical Expression Omitted],

where [Mathematical Expression Omitted].

2. Bootstrapping pairs (Efron 1982). Let [Mathematical Expression Omitted] be the empirical distribution putting mass [n.sup.-1] on each pair ([x.sub.i], [y.sub.i]), i = 1,..., n. Generate iid bootstrap data {[Mathematical Expression Omitted],..., n} from [Mathematical Expression Omitted]. The bootstrap analog of [Mathematical Expression Omitted] is

[Mathematical Expression Omitted],

where [Mathematical Expression Omitted] and [Mathematical Expression Omitted]. Under the weak condition that X[prime]X [approaches] [infinity], [X.sup.*][prime][X.sup.*] [approaches] [infinity] almost surely. Hence [Mathematical Expression Omitted] exists asymptotically. In applications, [Mathematical Expression Omitted] can be replaced by [Mathematical Expression Omitted] in the event that [Mathematical Expression Omitted] does not exist.

Bootstrapping residuals is more suitable for the case of deterministic x, whereas bootstrapping pairs is more appropriate for the case of random x. But bootstrapping pairs can also be used for deterministic x (Efron 1982).
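The two resampling schemes can be sketched as follows. Several displayed formulas are omitted in this excerpt, so the code makes labeled assumptions: residuals are taken from the full model and centered, the bootstrap response under model [Alpha] is the fitted value under [Alpha] plus a resampled residual, and no further rescaling of residuals is applied. Function names are hypothetical.

```python
import numpy as np

def bootstrap_residuals(X, y, alpha, rng):
    """One residual-bootstrap response vector for model alpha.

    Assumption: residuals come from the full model and are centered
    before being resampled with replacement.
    """
    n = len(y)
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta_full
    eps_star = rng.choice(r - r.mean(), size=n, replace=True)
    Xa = X[:, alpha]
    beta_a, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return Xa @ beta_a + eps_star

def bootstrap_pairs(X, y, rng, m=None):
    """Draw m pairs (x_i, y_i) with replacement; m defaults to n."""
    n = len(y)
    m = n if m is None else m
    idx = rng.integers(0, n, size=m)
    return X[idx], y[idx]

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=30)
y_star = bootstrap_residuals(X, y, [0, 1], rng)
X_star, y_pairs = bootstrap_pairs(X, y, rng, m=10)
```

Residual resampling keeps the design fixed, which matches the deterministic-x setting; pair resampling redraws rows of the design together with their responses.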

Efron (1982, 1983) derived the following bootstrap estimate of the mean of the prediction error [[Gamma].sub.n]([Alpha]) in (3). First, define the expected excess error under model [Alpha] by

[Mathematical Expression Omitted].

Then

[Mathematical Expression Omitted].

A bootstrap estimate of [e.sub.n] ([Alpha]) is

[Mathematical Expression Omitted],

where [Mathematical Expression Omitted], and [E.sub.*] is the expectation with respect to the bootstrap sampling described under bootstrapping residuals or bootstrapping pairs. (Note that [Mathematical Expression Omitted] and [Mathematical Expression Omitted] for bootstrapping residuals.) A bootstrap estimate of E[[[Gamma].sub.n]([Alpha])] is

[Mathematical Expression Omitted].

This estimator is almost unbiased. Some similar estimates were provided by Bunke and Droge (1984).

It seems natural to define the bootstrap estimate of the optimal model [[Alpha].sub.0] as the model [Mathematical Expression Omitted] that minimizes [Mathematical Expression Omitted]. But this procedure is inconsistent:

[Mathematical Expression Omitted],

unless [[Alpha].sub.0] = {1,..., p}; that is, model (1) is the only correct model. The empirical result in Section 2.5 shows that this inconsistency can be quite serious: The probability in (12) can be very low.

It is interesting to note that the bootstrap selection procedures described here are asymptotically equivalent to the selection procedures using the information criterion, [C.sub.p], and delete-one cross-validation methods (see (14) and Shao 1993, form. (3.6)).

2.3 The Reason for Inconsistency

Let [[Alpha].sub.p] = {1,..., p} be the largest subset. For any [Alpha] [element of] A, define

[D.sub.n]([Alpha]) = E[[[Gamma].sub.n]([Alpha]) - [[Gamma].sub.n]([[Alpha].sub.p])] = E[[L.sub.n]([Alpha]) - [L.sub.n]([[Alpha].sub.p])]

= ([p.sub.[Alpha]] - p)[[Sigma].sup.2]/n + [[Delta].sub.n]([Alpha]). (13)

Minimizing E[[[Gamma].sub.n]([Alpha])] or E[[L.sub.n]([Alpha])] is then the same as minimizing [D.sub.n] ([Alpha]). Although the bootstrap estimator [Mathematical Expression Omitted] in (11) is a reasonably good estimator of E[[[Gamma].sub.n]([Alpha])], the difference [Mathematical Expression Omitted] is not a consistent estimator of [D.sub.n] ([Alpha]) when [Alpha][not equal to] [[Alpha].sub.p] and [Alpha] is also correct. More precisely, it is shown in the Appendix that when [Alpha] is correct but [Alpha] [not equal to] [[Alpha].sub.p],

[Mathematical Expression Omitted].

Then, by (13)-(14) and the fact that [[Delta].sub.n]([Alpha]) = 0,

[Mathematical Expression Omitted] in probability.

This leads to the inconsistency of the bootstrap selection procedures described in Section 2.2.
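Equation (13) can be evaluated directly when [Mu] and [[Sigma].sup.2] are known, as in a simulation. The sketch below does this for a few candidate subsets of a hypothetical design; it illustrates why the difficulty arises among correct models, whose values of [D.sub.n] differ only by multiples of [[Sigma].sup.2]/n. All names and data here are illustrative assumptions.

```python
import numpy as np

def delta_n(X, mu, alpha):
    """Equation (4): squared distance from mu to its projection on span(X_alpha), over n."""
    Xa = X[:, alpha]
    coef, *_ = np.linalg.lstsq(Xa, mu, rcond=None)
    return float(np.sum((mu - Xa @ coef) ** 2) / len(mu))

def d_n(X, mu, sigma2, alpha):
    """Equation (13): (p_alpha - p) sigma^2 / n + Delta_n(alpha)."""
    n, p = X.shape
    return (len(alpha) - p) * sigma2 / n + delta_n(X, mu, alpha)

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
mu = X @ np.array([2.0, 3.0, 0.0, 0.0])
candidates = [(0, 1), (0, 1, 2), (0, 1, 2, 3), (0, 2, 3)]
values = {alpha: d_n(X, mu, 1.0, list(alpha)) for alpha in candidates}
best = min(values, key=values.get)
```

Here the gap between the smallest correct model and the full model is only 2/n, so an estimator of [D.sub.n] must be accurate to a smaller order than 1/n to separate them.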

2.4 Modified Bootstrap Selection Procedures

It is clear that if we can find a consistent estimator [Mathematical Expression Omitted] of [D.sub.n]([Alpha]) in the sense that

[Mathematical Expression Omitted] in probability, [Alpha] [element of] A,

then we can derive a consistent model selection procedure. Unfortunately, a consistent estimator of [D.sub.n]([Alpha]) is not available, unless [[Alpha].sub.p] is the only correct model.

Let {[m.sub.n]} be a sequence of integers such that [m.sub.n] [approaches] [infinity] and [m.sub.n]/n [approaches] 0 as n [approaches] [infinity]. Then with n pairs of data we can find a consistent estimator of [D.sub.[m.sub.n]]([Alpha]). Under condition (7),

[Mathematical Expression Omitted];

that is, [D.sub.n]([Alpha]) and [D.sub.[m.sub.n]] ([Alpha]) share the same minimizer [[Alpha].sub.0] for sufficiently large n. This leads us to obtain consistent model selection procedures by minimizing consistent estimators of [D.sub.[m.sub.n]] ([Alpha]) or E[[[Gamma].sub.[m.sub.n]]([Alpha])].

First, consider bootstrapping pairs. For m [less than] n, a simple bootstrap estimator of E[[[Gamma].sub.m]([Alpha])] is

[Mathematical Expression Omitted],

where [Mathematical Expression Omitted] is the bootstrap analog of [Mathematical Expression Omitted] based on m iid pairs [Mathematical Expression Omitted] generated from the empirical distribution putting mass [n.sup.-1] on ([x.sub.i], [y.sub.i]), i = 1,..., n; that is,

[Mathematical Expression Omitted].

A modified bootstrap model selection procedure is to select a model [Mathematical Expression Omitted] that minimizes [Mathematical Expression Omitted].
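A minimal sketch of the modified pairs procedure follows. Because the displays (15) and (16) are omitted in this excerpt, the code assumes that the criterion is the bootstrap expectation of the average squared error of the original responses predicted with the coefficient vector fitted to m resampled pairs, and it approximates that expectation by Monte Carlo. The function names, the simulated data, and the use of a minimum-norm least squares fit when a resample is rank deficient are all assumptions of this sketch.

```python
import numpy as np

def pairs_criterion(X, y, alpha, m, B, rng):
    """Monte Carlo estimate of the assumed criterion
    E_* ||y - X_alpha beta*_{alpha,m}||^2 / n."""
    n = len(y)
    Xa = X[:, alpha]
    total = 0.0
    for _ in range(B):
        idx = rng.integers(0, n, size=m)      # m pairs drawn with replacement
        # lstsq returns a minimum-norm solution if the resample is rank deficient
        beta_star, *_ = np.linalg.lstsq(Xa[idx], y[idx], rcond=None)
        total += np.sum((y - Xa @ beta_star) ** 2) / n
    return total / B

def select_model(X, y, candidates, m, B=100, seed=0):
    """Return the candidate subset minimizing the bootstrap criterion."""
    scores = {}
    for alpha in candidates:
        rng = np.random.default_rng(seed)     # same resamples for every model
        scores[tuple(alpha)] = pairs_criterion(X, y, list(alpha), m, B, rng)
    best = min(scores, key=scores.get)
    return best, scores

rng = np.random.default_rng(6)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, 3.0, 0.0, 0.0]) + rng.normal(size=n)
candidates = [(0, 1), (0, 1, 2), (0, 1, 2, 3), (0, 2, 3)]
best, scores = select_model(X, y, candidates, m=25)
```

Choosing m well below n inflates the variance penalty carried by larger models, which is what separates the correct models from one another.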

Next, consider bootstrapping residuals. Unless there is a special structure in the [x.sub.i] (see, e.g., Hall 1990 for the case where [x.sub.i] = i/n), it may not be easy to find a way to bootstrap residuals with a bootstrap sample size m [less than] n. In view of the fact that only the first two moments of the bootstrap distribution are involved in [Mathematical Expression Omitted] and the fact that

[Mathematical Expression Omitted],

where [Mathematical Expression Omitted] and [Mathematical Expression Omitted] are defined in (10) and (16), we can modify the procedure in bootstrapping residuals by multiplying the values from which the bootstrap data are generated by a factor of [square root of n/m]. That is, let [Mathematical Expression Omitted], i = 1,..., n, be iid from the distribution that puts mass [n.sup.-1] on each [Mathematical Expression Omitted], and let [Mathematical Expression Omitted] and [Mathematical Expression Omitted]. Then estimate E[[[Gamma].sub.m]([Alpha])] by

[Mathematical Expression Omitted].

The model selected by this modified bootstrap procedure is still denoted by [Mathematical Expression Omitted], which minimizes [Mathematical Expression Omitted] over [Alpha] [element of] A.

For bootstrapping residuals, it can be shown that under the linear model (1),

[Mathematical Expression Omitted].

Hence the method of bootstrapping residuals is the same as the generalized information criterion (Rao and Wu 1989); however, this is not true in nonlinear models (see Sec. 3).
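The link to a penalized criterion can be illustrated with a sketch. It assumes the criterion is the bootstrap expectation of the average squared error of y predicted from the coefficients refitted to residual-bootstrap responses, with centered full-model residuals inflated by [square root of n/m]. Under that assumption a short calculation gives the closed form RSS/n plus (residual variance) times [p.sub.[Alpha]]/m, which has the shape of a generalized information criterion with penalty n/m. The article's omitted expression may differ in details such as degrees-of-freedom corrections; function names and data are hypothetical.

```python
import numpy as np

def residual_criterion_mc(X, y, alpha, m, B, rng):
    """Monte Carlo version of the assumed modified residual-bootstrap criterion."""
    n = len(y)
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta_full
    scaled = np.sqrt(n / m) * (r - r.mean())     # inflate centered residuals by sqrt(n/m)
    Xa = X[:, alpha]
    beta_a, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    total = 0.0
    for _ in range(B):
        y_star = Xa @ beta_a + rng.choice(scaled, size=n, replace=True)
        beta_star, *_ = np.linalg.lstsq(Xa, y_star, rcond=None)
        total += np.sum((y - Xa @ beta_star) ** 2) / n
    return total / B

def residual_criterion_closed_form(X, y, alpha, m):
    """RSS_alpha / n + sigma_hat^2 * p_alpha / m (assumes X_alpha has full column rank)."""
    n = len(y)
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta_full
    sigma2_hat = np.mean((r - r.mean()) ** 2)
    Xa = X[:, alpha]
    beta_a, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    rss = np.sum((y - Xa @ beta_a) ** 2)
    return rss / n + sigma2_hat * len(alpha) / m

rng = np.random.default_rng(7)
n, m = 60, 15
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)
mc_value = residual_criterion_mc(X, y, [0, 1], m, B=2000, rng=rng)
exact_value = residual_criterion_closed_form(X, y, [0, 1], m)
```

Because the closed form is available in the linear case, the Monte Carlo loop is only needed for checking or for models where no such formula exists.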

As a special case of the general result in Section 3, both modified bootstrap selection procedures are consistent; that is,

[Mathematical Expression Omitted],

provided that m satisfies m/n [approaches] 0 and m [approaches] [infinity].

2.5 A Simulation Study

A simulation study was carried out to examine the finite-sample performance of the selection procedures based on bootstrapping pairs with different m. Model (1) was considered, with p = 5, n = 40, and iid standard normal errors [[Epsilon].sub.i], i = 1,..., n. The first component of each [x.sub.i] is 1 and the values of other components of [x.sub.i] were taken from the solid waste data example of Gunst and Mason (1980) (see also Shao 1993, table 1). The ratio of a component of [Beta] over [Sigma] was chosen to be [greater than or equal to] 2. If this ratio is too small, then one must increase the sample size n to show a good performance of any model selection procedure.

The bootstrap estimators [Mathematical Expression Omitted] were computed by Monte Carlo with size B = 100. The computation was done on an IBM 3090 at the University of Ottawa. IMSL subroutines DRNNOA and RNUND were used for random number generation.

Table 1 reports the empirical probabilities (based on 1,000 simulations) of selecting each model using the modified bootstrap with various m. When m = 40, the bootstrap procedure is the unmodified bootstrap; that is, the model is selected by minimizing [Mathematical Expression Omitted] in (11). For comparison, empirical selection probabilities using the [C.sub.p] and the BIC are included.

The following is a summary of the simulation results in Table 1:

1. The empirical results clearly support the asymptotic result previously stated. First, the unmodified bootstrap selection procedure (m = 40) performs poorly unless the optimal model is the full model (the largest model). Second, the modified bootstrap selection procedure with an m smaller than 40 clearly improves the unmodified bootstrap selection procedure unless the optimal model is the full model.

2. The [C.sub.p] and the unmodified bootstrap selection procedures perform almost the same.

3. The modified bootstrap selection procedure can be substantially better than the BIC.

4. The optimal choice of m depends on the parameter [Beta].

2.6 Discussions

2.6.1 The Choice of the Bootstrap Sample Size m. The previous discussion indicates that for the consistency of the bootstrap selection procedure, m should satisfy m [approaches] [infinity] and m/n [approaches] 0. For practical uses, m needs to be specified for a fixed n. One restriction on m is that p/m should be reasonably small; we should choose an m so that the least squares fitting of a regression model with p regressors does not have too high a variability.

Zhang (1993b) derived the convergence rates for the [C.sub.p] and BIC procedures. It would be nice if we could choose m = [m.sub.n] so that the probability [Mathematical Expression Omitted] converges to 1 at the fastest rate. But such an optimal [m.sub.n] may depend on model parameters and thus may be very difficult or impossible to determine. For example, the results in Table 1 indicate that if model {1, 2, 3, 4, 5} is not the optimal model, then m = 15 is the best choice among all the bootstrap sample sizes considered in the simulation study; otherwise, m = n = 40 is the best choice.

[TABULAR DATA FOR TABLE 1 OMITTED]

In a practical problem, statistical inference usually is required after model selection. The bootstrap sample size m for model selection can then be determined by minimizing an accuracy measure of the inference procedure after model selection. We illustrate this idea by considering the case where a confidence interval for c[prime][Beta] with a fixed vector c is required after model selection. Under model [Alpha], a bootstrap-t confidence interval for [c[prime].sub.[Alpha]][[Beta].sub.[Alpha]] with approximate level 1 - 2a (0 [less than] a [less than] 1/2) is

[Mathematical Expression Omitted],

where [c.sub.[Alpha]] is the subvector of c containing the components of c indexed by the integers in [Alpha],

[Mathematical Expression Omitted],

[Mathematical Expression Omitted] (a) is the quantile function of the bootstrap distribution

[Mathematical Expression Omitted],

[Mathematical Expression Omitted] equals [Mathematical Expression Omitted] for bootstrapping residuals and [Mathematical Expression Omitted] for bootstrapping pairs, and [Mathematical Expression Omitted] is the bootstrap analog of [Mathematical Expression Omitted]. An important accuracy measure for the confidence interval in (17) is its length,

[Mathematical Expression Omitted].

Then we can choose an [Mathematical Expression Omitted] and select a model, [Mathematical Expression Omitted], by solving

[Mathematical Expression Omitted],

where [Mathematical Expression Omitted] is the model selected using one of the modified bootstrap methods described in Section 2.4, and {[a.sub.n]} and {[b.sub.n]} are two sequences satisfying [a.sub.n] [approaches] [infinity] and [b.sub.n]/n [approaches] 0 (e.g., [a.sub.n] = log log n and [b.sub.n] = n/log log n). This choice of [a.sub.n] and [b.sub.n] ensures that [Mathematical Expression Omitted] and [Mathematical Expression Omitted], and hence the selected model [Mathematical Expression Omitted] is consistent; that is, (9) holds with [Mathematical Expression Omitted].

A similar result can be obtained if simultaneous confidence intervals for c[prime][Beta], c [element of] C are required. Hall and Pittelkow (1990) derived the following bootstrap simultaneous confidence intervals:

[Mathematical Expression Omitted],

where [Mathematical Expression Omitted] and [Mathematical Expression Omitted] satisfy

[Mathematical Expression Omitted].

Then [Mathematical Expression Omitted] and [Mathematical Expression Omitted] can be obtained by solving (19) with [Mathematical Expression Omitted].

Rao and Tibshirani (1993) developed a method of determining the parameter [Lambda] in the [FPE.sub.[Lambda]] method (Shibata 1984), which can also be used here to determine m.

2.6.2 Bootstrap Monte Carlo Approximations. Computation of [Mathematical Expression Omitted] in (15) may require a Monte Carlo approximation. For bootstrapping pairs, [Mathematical Expression Omitted] can be approximated by

[Mathematical Expression Omitted],

where B is the Monte Carlo sample size, [Mathematical Expression Omitted] is computed according to (16) with [Mathematical Expression Omitted] replaced by [Mathematical Expression Omitted], and

[Mathematical Expression Omitted]

are mB independent bootstrap data generated from the empirical distribution putting mass [n.sup.-1] on ([x.sub.i], [y.sub.i]). Note that these bootstrap data are used for computing [Mathematical Expression Omitted] for all [Alpha] [element of] A.

The bootstrap data in (20) can still be used in inference after model selection. In bootstrap inference, such as setting bootstrap confidence interval (17), we need to compute Monte Carlo approximations to [Mathematical Expression Omitted] in (18) and its quantiles. With the bootstrap data in (20), we need only generate (n - m)B additional pairs of bootstrap data,

[Mathematical Expression Omitted].

This means that the total number of bootstrap data generated for model selection and the subsequent inference is nB, which is the same as that required in bootstrap inference without performing model selection.
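The bookkeeping behind this reuse can be sketched with index arrays. The layout below, in which the first m draws of each replicate serve model selection and the remaining n - m draws are added for inference, is one possible arrangement and an assumption of this sketch; the values n = 40, m = 15, and B = 100 simply mirror the simulation settings.

```python
import numpy as np

def bootstrap_indices(n, B, rng):
    """B rows of n iid indices; each row is one bootstrap sample of pairs."""
    return rng.integers(0, n, size=(B, n))

n, m, B = 40, 15, 100
rng = np.random.default_rng(3)
idx_full = bootstrap_indices(n, B, rng)   # nB draws in total, used for inference
idx_selection = idx_full[:, :m]           # mB draws reused for model selection
idx_extra = idx_full[:, m:]               # the (n - m)B additional draws
```

Since the draws within a row are iid, any m of them form a valid bootstrap sample of size m, so no separate resampling pass is needed for selection.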

3. GENERAL RESULTS

We now consider more complicated situations where the relationship between the mean response and the explanatory variables can be nonlinear.

3.1 Nonlinear Regression

The following model is an extension of the linear model (1):

[[Mu].sub.i] = E([y.sub.i][where][x.sub.i]) = f([x.sub.i], [Beta]), var([y.sub.i][where][x.sub.i]) = [[Sigma].sup.2],

i = 1,...,n,

where f is a known function defined on X x B, and X and B are admissible sets for [x.sub.i] and [Beta]. Let [f.sub.i]([Beta]) = f([x.sub.i], [Beta]), A be a collection of some subsets of {1,...,p}, and let [f.sub.i[Alpha]] ([[Beta].sub.[Alpha]]) = [f.sub.[Alpha]]([x.sub.i[Alpha]], [[Beta].sub.[Alpha]]), where [Alpha] [element of] A and [f.sub.[Alpha]] is the restriction of the function f to the admissible set of ([x.sub.i[Alpha]], [[Beta].sub.[Alpha]]). Let

[A.sub.c] = {[Alpha] [element of] A: [f.sub.[Alpha]]([x.sub.[Alpha]], [[Beta].sub.[Alpha]]) = f(x, [Beta]) [for every] x [element of] X}

be the collection of correct models, and assume that [A.sub.c] is nonempty.

The simplest example is f(x, [Beta]) = [Phi](x[prime][Beta]) with a function [Phi] on R (the real line). Then [f.sub.[Alpha]]([x.sub.[Alpha]], [[Beta].sub.[Alpha]]) = [Phi]([x[prime].sub.[Alpha]][[Beta].sub.[Alpha]]), and the correctness of a model is defined the same as that in Section 2. Another example is

[Beta] = (a, b)[prime], a [element of] R, b [element of] [0, [infinity]),

x = (1, z)[prime], z [element of] (0, [infinity]),

f(x, [Beta]) = a + [e.sup.-bz].

In this case, A = {[[Alpha].sub.i], i = 1, 2, 3},

[f.sub.[Alpha]1]([x.sub.[Alpha]1], [[Beta].sub.[Alpha]1]) = a + 1 (b=0)

[f.sub.[Alpha]2]([x.sub.[Alpha]2], [[Beta].sub.[Alpha]2]) = [e.sup.-bz], (a = 0)

[f.sub.[Alpha]3]([x.sub.[Alpha]3], [[Beta].sub.[Alpha]3]) = f(x, [Beta]) = a + [e.sup.-bz].

If a = 0, then [A.sub.c] = {[[Alpha].sub.2], [[Alpha].sub.3]} and [[Alpha].sub.0] = [[Alpha].sub.2] (the correct model with the smallest size). If b = 0, then [A.sub.c] = {[[Alpha].sub.1], [[Alpha].sub.3]} and [[Alpha].sub.0] = [[Alpha].sub.1]. If a [not equal to] 0 and b [not equal to] 0, then [A.sub.c] = {[[Alpha].sub.3]} and [[Alpha].sub.0] = [[Alpha].sub.3].
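The case analysis for this example can be written out as a small lookup. The model labels and sizes follow the text; the function names are hypothetical, and the case a = 0 and b = 0 (where two one-parameter models are both correct) is not discussed in the article, so the sketch simply returns both.

```python
SIZES = {"alpha_1": 1, "alpha_2": 1, "alpha_3": 2}   # number of free parameters

def correct_models(a, b):
    """Correct models for f(x, beta) = a + exp(-b z)."""
    correct = {"alpha_3"}            # the full model is always correct
    if b == 0:
        correct.add("alpha_1")       # f reduces to a + 1
    if a == 0:
        correct.add("alpha_2")       # f reduces to exp(-b z)
    return correct

def smallest_correct(a, b):
    """Correct model(s) of smallest size, as a sorted list."""
    correct = correct_models(a, b)
    min_size = min(SIZES[name] for name in correct)
    return sorted(name for name in correct if SIZES[name] == min_size)
```

For instance, `smallest_correct(0, 1.5)` returns `["alpha_2"]` and `smallest_correct(2.0, 1.0)` returns `["alpha_3"]`.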

The model selection problem in nonlinear regression is similar to that in linear regression. The model corresponding to [Alpha] [element of] A is

[[Mu].sub.i] = E([y.sub.i][where][x.sub.i]) = [f.sub.i[Alpha]]([[Beta].sub.[Alpha]]), var([y.sub.i][where][x.sub.i]) = [[Sigma].sup.2],

i = 1,...,n.

We wish to select a model that minimizes the loss

[Mathematical Expression Omitted]

over [Alpha] [element of] A, where [Mu] = ([[Mu].sub.1],..., [[Mu].sub.n])[prime], [Mathematical Expression Omitted],..., [Mathematical Expression Omitted], and [Mathematical Expression Omitted] is the LSE of [[Beta].sub.[Alpha]]. For any function g([Gamma]), let [Mathematical Expression Omitted] and [Mathematical Expression Omitted]. Then [Mathematical Expression Omitted] is a solution of

[Mathematical Expression Omitted].

Result (8) still holds in this case; that is, the correct model with the smallest size is the optimal model.

The two modified bootstrap model selection procedures in Section 2.4 can be extended to this case. First, consider bootstrapping pairs. Let [Mathematical Expression Omitted], i = 1,..., m, be iid from the empirical distribution putting mass [n.sup.-1] on each ([x.sub.i], [y.sub.i]), i = 1,...,n, and let [Mathematical Expression Omitted] and

[Mathematical Expression Omitted].

Define

[Mathematical Expression Omitted].

Note that the exact bootstrap estimator of [[Beta].sub.[Alpha]] is the solution of [Mathematical Expression Omitted] and [Mathematical Expression Omitted] is the result from the first-step iteration in solving the exact bootstrap estimator using Newton's method. Hence [Mathematical Expression Omitted] in (21) is an approximation to the exact bootstrap estimator and is much easier to compute. Define

[Mathematical Expression Omitted].

The model selected by this bootstrap procedure is [Mathematical Expression Omitted], which minimizes [Mathematical Expression Omitted] over [Alpha] [element of] A.
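The idea of a one-step bootstrap estimator can be sketched as follows. Display (21) is omitted in this excerpt, so the code uses a single Gauss-Newton step from the original estimate as a stand-in for the Newton step described above; the exact update in the article may differ. The function names, the starting value, and the simulated data are assumptions of this sketch.

```python
import numpy as np

def one_step_update(f, jac, beta_hat, x_star, y_star):
    """One Gauss-Newton step from beta_hat toward the least squares fit
    on the bootstrap sample (x_star, y_star)."""
    J = jac(x_star, beta_hat)                          # m x k matrix of partial derivatives
    resid = y_star - f(x_star, beta_hat)
    step, *_ = np.linalg.lstsq(J, resid, rcond=None)   # solves (J'J) step = J' resid
    return beta_hat + step

def f_exp(z, beta):
    """Mean function f = a + exp(-b z) from the example above."""
    a, b = beta
    return a + np.exp(-b * z)

def jac_exp(z, beta):
    """Partial derivatives of f_exp with respect to (a, b)."""
    a, b = beta
    return np.column_stack([np.ones_like(z), -z * np.exp(-b * z)])

rng = np.random.default_rng(4)
z = rng.uniform(0.1, 3.0, size=80)
y = f_exp(z, (0.5, 1.0)) + 0.1 * rng.normal(size=80)
beta_hat = np.array([0.5, 1.0])        # stand-in for the LSE on the original data
idx = rng.integers(0, 80, size=20)     # m = 20 bootstrap pairs
beta_star = one_step_update(f_exp, jac_exp, beta_hat, z[idx], y[idx])
```

The appeal of a one-step update is cost: each bootstrap replicate needs one linear solve rather than a full iterative fit.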

Next, consider bootstrapping residuals. Let [Mathematical Expression Omitted], i = 1,..., n, be iid from the empirical distribution putting mass [n.sup.-1] on each [Mathematical Expression Omitted], i = 1,...,n, where [Mathematical Expression Omitted] with [Alpha] = {1,...,p}, and [Mathematical Expression Omitted]. Let

[Mathematical Expression Omitted],

[Mathematical Expression Omitted],

and let

[Mathematical Expression Omitted].

The model selected by this bootstrap procedure is [Mathematical Expression Omitted] which minimizes [Mathematical Expression Omitted] over [Alpha] [element of] A.

The following regularity conditions are required in studying the consistency of bootstrap procedures:

C1. For each [Mathematical Expression Omitted] are continuous functions on X x B.

C2. For each [Alpha] [element of] [A.sub.c], [Mathematical Expression Omitted] a.s.

C3. a. For deterministic [x.sub.i], [sup.sub.i] [[[x.sub.i]]] [less than] [infinity] and lim [inf.sub.n] [[Lambda].sub.[Alpha],n] [greater than] 0, where [[Lambda].sub.[Alpha],n] is the smallest eigenvalue of [M.sub.[Alpha]]([[Beta].sub.[Alpha]])/n and [Alpha] [element of] [A.sub.c].

b. For random iid [x.sub.i], there is a function [h.sub.[Alpha]] ([x.sub.[Alpha]]) such that [Mathematical Expression Omitted] and [Mathematical Expression Omitted] for [Mathematical Expression Omitted], where [[Epsilon].sub.0] [greater than] 0 is fixed and [Alpha] [element of] [A.sub.c]. Also, [Mathematical Expression Omitted].

C4. For any incorrect model [Alpha],

[Mathematical Expression Omitted]

Regularity conditions C1-C3 are typical conditions for establishing asymptotic normality of [Mathematical Expression Omitted] and its bootstrap analog. Condition C4 is reasonable because

[Mathematical Expression Omitted]

for any correct model [Alpha]. Under the linear model (1), C1-C2 are clearly satisfied; C4 is the same as (7); and C3 can be replaced by [Mathematical Expression Omitted].

The following result shows the consistency of the two bootstrap model selection procedures.

Theorem 1. Assume that conditions C1-C4 hold and that m [approaches] [infinity] and m/n [approaches] 0 as n [approaches] [infinity]. Then (9) holds with [Mathematical Expression Omitted].

3.2 Generalized Linear Models

A generalized linear model is characterized by the following structure: the responses [y.sub.1],..., [y.sub.n] are independent and

[[Mu].sub.i] = E([y.sub.i][where][x.sub.i]) = [Mu]([[Eta].sub.i]), [Mathematical Expression Omitted],

i = 1,...,n, (22)

where [Phi] [greater than] 0 is an unknown scale parameter; [Mu]([Eta]) is a known differentiable function with derivative [Mathematical Expression Omitted]; the [[Eta].sub.i] are related to [x.sub.i], the values of explanatory variables, by a known injective and third-order continuously differentiable link function f,

f([Mu]([[Eta].sub.i])) = [x[prime].sub.i][Beta]; (23)

and [Beta] is a p vector of unknown parameters. Examples of generalized linear models, including logit models, log-linear models, gamma-distributed data models, and survival data models, have been provided by McCullagh and Nelder (1989). The linear model (1) is clearly a special case of model (22)-(23).

Let A be a collection of subsets of {1,..., p} and let

[[Mu].sub.i] = [Mu]([[Eta].sub.i[Alpha]]), [Mathematical Expression Omitted],

[Mathematical Expression Omitted],

be the model corresponding to [Alpha], where the [x.sub.i[Alpha]] and [[Beta].sub.[Alpha]] are defined the same as before. The correctness of a model is defined the same as that in Section 2, and the optimal model is still the correct model with the smallest size.

Note that in model (22)-(23) the distribution of [y.sub.i] is not specified. Hence we may not be able to obtain the maximum likelihood estimator of [[Beta].sub.[Alpha]]. We consider the general estimation equation approach. That is, under model [Alpha], [[Beta].sub.[Alpha]] is estimated by [Mathematical Expression Omitted], a solution of

[summation of] [x.sub.i[Alpha]][Psi] ([x[prime].sub.i[Alpha]][Gamma]) [[y.sub.i] - [f.sup.-1]([x[prime].sub.i[Alpha]][Gamma])] = 0 where i = 1 to n,

where [Psi] is the first-order derivative of [(f o [Mu]).sup.-1]. [Mathematical Expression Omitted] can be called a weighted least squares estimator of [[Beta].sub.[Alpha]].
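As a rough illustration of the estimating equation above, consider the special case in which f o [Mu] is the identity, so that [Psi] is identically 1, and [f.sup.-1] is the logistic function. The Python sketch below solves the equation by Newton's method. The function name solve_estimating_equation, the logistic choice of [f.sup.-1], and the simulated data are assumptions made for illustration; they are not taken from the article.

```python
import numpy as np

def solve_estimating_equation(X, y, n_iter=50, tol=1e-10):
    """Solve sum_i x_i * [y_i - f^{-1}(x_i' gamma)] = 0 by Newton's method.

    Assumption: psi == 1 and f^{-1}(t) = 1 / (1 + exp(-t)) (logistic mean).
    """
    n, k = X.shape
    gamma = np.zeros(k)
    for _ in range(n_iter):
        mean = 1.0 / (1.0 + np.exp(-X @ gamma))
        # Left-hand side of the estimating equation at the current gamma.
        score = X.T @ (y - mean)
        # Negative Jacobian of the score with respect to gamma.
        info = X.T @ (X * (mean * (1.0 - mean))[:, None])
        step = np.linalg.solve(info, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

# Simulated data (hypothetical): intercept plus one explanatory variable.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([-0.5, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)

gamma_hat = solve_estimating_equation(X, y)
# At a solution the estimating function is numerically zero.
residual_score = X.T @ (y - 1.0 / (1.0 + np.exp(-X @ gamma_hat)))
```

For a submodel [Alpha], the same routine would be applied to the columns of X indexed by [Alpha].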

The modified bootstrap model selection procedures in Section 2.4 can be used here for selecting a model from A; that is, we select a model that minimizes

[Mathematical Expression Omitted]

over [Alpha] [element of] A, where [Mathematical Expression Omitted], [Mathematical Expression Omitted] [Mathematical Expression Omitted] and [Mathematical Expression Omitted] is a bootstrap analog of [Mathematical Expression Omitted] obtained by either bootstrapping residuals or bootstrapping pairs. For bootstrapping residuals, we generate iid [Mathematical Expression Omitted],..., [Mathematical Expression Omitted] from the distribution putting mass [n.sup.-1] to [Mathematical Expression Omitted], where [Mathematical Expression Omitted], and [Mathematical Expression Omitted] and [v.sub.i] [Mathematical Expression Omitted] and [v.sub.i[Alpha]] with [Alpha] = {1,...,p}. Then we can define [Mathematical Expression Omitted] to be the linear bootstrap estimator

[Mathematical Expression Omitted],

where [Mathematical Expression Omitted]. For bootstrapping pairs, we generate iid pairs [Mathematical Expression Omitted],..., [Mathematical Expression Omitted] from the distribution putting mass [n.sup.-1] to each ([x.sub.i], [y.sub.i]) and define [Mathematical Expression Omitted] to be the linear bootstrap estimator

[Mathematical Expression Omitted].

Theorem 2. Assume that conditions C2-C3 hold, with [Mathematical Expression Omitted] replaced by [Mathematical Expression Omitted], and that

[Mathematical Expression Omitted]

for any incorrect [Alpha]. If m [approaches] [infinity] and m/n [approaches] 0, then (9) holds, with [Mathematical Expression Omitted] the model selected by the modified bootstrap.

The proof of Theorem 2 is very similar to the proof of Theorem 1 given in the Appendix and thus is omitted.

3.3 Autoregressive Time Series

A series {[y.sub.t], t = 0, [+ or -]1, [+ or -]2,...} is called an autoregressive time series of order p if

[y.sub.t] = [[Theta].sub.1][y.sub.t-1] + [[Theta].sub.2][y.sub.t-2] + ... + [[Theta].sub.p][y.sub.t-p] + [[Epsilon].sub.t],

t = 0, [+ or -]1, [+ or -]2,..., (24)

where p is a fixed positive integer, [[Theta].sub.i], i = 1,...,p, are unknown parameters, and the [[Epsilon].sub.t] are iid random variables with mean zero and variance [[Sigma].sup.2]. The observed data are [y.sub.1-p],..., [y.sub.0], [y.sub.1],..., [y.sub.n].

In many practical problems the order of an autoregressive series is unknown and must be estimated using the data. Estimating the order can be formulated as a model selection problem in which we select a model [Alpha] from A = {1,..., p} and each [Alpha] corresponds to the autoregressive model of order [Alpha]:

[y.sub.t] = [[Theta].sub.1][y.sub.t-1] + ... + [[Theta].sub.[Alpha]][y.sub.t-[Alpha]] + [[Epsilon].sub.t], t = 0, [+ or -]1, [+ or -]2,... (25)

Under model [Alpha], [[Beta].sub.[Alpha]] = ([[Theta].sub.1],..., [[Theta].sub.[Alpha]])[prime] is estimated by the LSE

[Mathematical Expression Omitted],

where

[S.sub.[Alpha]] = [summation of] [z.sub.t[Alpha]][z[prime].sub.t[Alpha]] where t = 1 to n and [z.sub.t[Alpha]] = ([y.sub.t-1],..., [y.sub.t-[Alpha]])[prime].
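The least squares fit under model [Alpha] can be sketched in Python as follows. The displayed formula for the LSE is omitted in this excerpt, so the sketch assumes the usual form, [S.sub.[Alpha]] inverse times the sum of [z.sub.t[Alpha]][y.sub.t] over t = 1,..., n. The function name ar_lse, the array layout, and the simulated AR(2) series are illustrative assumptions.

```python
import numpy as np

def ar_lse(series, alpha, p):
    """Least squares fit of an AR(alpha) model.

    series holds y_{1-p}, ..., y_0, y_1, ..., y_n, so series[p + t - 1] is y_t.
    Returns (beta_hat, S_alpha), where S_alpha = sum_t z_t z_t'.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - p
    # Row t-1 of Z is z_{t,alpha} = (y_{t-1}, ..., y_{t-alpha})'.
    Z = np.column_stack([series[p - j : p - j + n] for j in range(1, alpha + 1)])
    y = series[p : p + n]
    S = Z.T @ Z
    beta_hat = np.linalg.solve(S, Z.T @ y)
    return beta_hat, S

# Simulated AR(2) series (hypothetical parameter values), with a burn-in.
rng = np.random.default_rng(1)
theta = np.array([0.5, -0.3])
p, n, burn = 4, 2000, 200
path = np.zeros(burn + n + p)
eps = rng.normal(size=path.size)
for t in range(2, path.size):
    path[t] = theta[0] * path[t - 1] + theta[1] * path[t - 2] + eps[t]
series = path[burn:]  # y_{1-p}, ..., y_n

beta_hat, S = ar_lse(series, alpha=2, p=p)
```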

We assume that [Alpha] = p is the largest possible model. The optimal model is

[[Alpha].sub.0] = max{j: 1 [less than or equal to] j [less than or equal to] p, [[Theta].sub.j] [not equal to] 0}.

The modified bootstrap model selection procedure can be extended to this problem as follows. Let [Mathematical Expression Omitted], t = 0, [+ or -]1, [+ or -]2,..., be iid from the distribution putting mass [n.sup.-1] to [Mathematical Expression Omitted], i = 1,..., n, where [Mathematical Expression Omitted] is the ith residual under the largest model [Alpha] = p. The bootstrap analog [Mathematical Expression Omitted] of [Mathematical Expression Omitted] is defined by (26) with n replaced by m and with [y.sub.t] replaced by

[Mathematical Expression Omitted], t = 1 - [Alpha],..., 0, 1,..., m,

where [Mathematical Expression Omitted]. The model selected by the bootstrap, denoted by [Mathematical Expression Omitted], is then the minimizer of

[Mathematical Expression Omitted]

over [Alpha] = 1,..., p.

Theorem 3. Assume that the roots of 1 + [[Theta].sub.1]z + [[Theta].sub.2][z.sup.2] + ... + [[Theta].sub.p][z.sup.p] = 0 are outside of the unit circle, E[[absolute value of [[Epsilon].sub.1]].sup.2(s+1)] [less than] [infinity] for some s [greater than or equal to] 3, and that [Mathematical Expression Omitted] satisfies Cramer's condition; that is, for every c [greater than] 0, there exists [[Delta].sub.c] [greater than] 0 such that [Mathematical Expression Omitted]. If m [approaches] [infinity] and m/n [approaches] 0, then the bootstrap model selection procedure is consistent; that is, (9) holds with [Mathematical Expression Omitted].

The result in Theorem 3 can be easily extended to the case where a constant term [Mu] is added to models given by (24) and (25).
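The order selection procedure of this section can be sketched in Python under several stated assumptions, because the displayed formulas are omitted in this excerpt. The sketch assumes that the residuals from the largest model are centered before resampling, that the bootstrap series is generated from the fitted AR(p) model started at zero with a burn-in period, and that the criterion is the average squared one-step prediction error on the original series of the estimator refitted to a bootstrap series of length m. All function names and tuning values (n_boot, burn, m) are hypothetical.

```python
import numpy as np

def lagged_design(series, alpha, p):
    """Return (Z, y): rows of Z are (y_{t-1}, ..., y_{t-alpha}), t = 1..n."""
    series = np.asarray(series, dtype=float)
    n = len(series) - p
    Z = np.column_stack([series[p - j : p - j + n] for j in range(1, alpha + 1)])
    return Z, series[p : p + n]

def ar_lse(series, alpha, p):
    Z, y = lagged_design(series, alpha, p)
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

def bootstrap_ar_order(series, p, m, n_boot=200, burn=100, seed=0):
    rng = np.random.default_rng(seed)
    theta_p = ar_lse(series, p, p)
    Z_p, y = lagged_design(series, p, p)
    resid = y - Z_p @ theta_p
    resid = resid - resid.mean()  # assumption: centered residuals
    crit = np.zeros(p)
    for _ in range(n_boot):
        eps = rng.choice(resid, size=burn + p + m, replace=True)
        star = np.zeros(burn + p + m)
        for t in range(p, star.size):
            # star[t-p:t][::-1] is (y*_{t-1}, ..., y*_{t-p}).
            star[t] = theta_p @ star[t - p : t][::-1] + eps[t]
        star = star[burn:]  # p presample values followed by m observations
        for alpha in range(1, p + 1):
            beta_star = ar_lse(star, alpha, p)
            # Assumed criterion: squared prediction error on the original data.
            crit[alpha - 1] += np.mean((y - Z_p[:, :alpha] @ beta_star) ** 2)
    crit /= n_boot
    return int(np.argmin(crit)) + 1, crit

# Simulated AR(2) series (hypothetical parameter values).
rng = np.random.default_rng(2)
p, n, burn = 3, 400, 200
path = np.zeros(burn + n + p)
noise = rng.normal(size=path.size)
for t in range(2, path.size):
    path[t] = 0.5 * path[t - 1] - 0.6 * path[t - 2] + noise[t]
series = path[burn:]

order_hat, crit = bootstrap_ar_order(series, p=p, m=40)
```

The bootstrap sample size m = 40 is small relative to n = 400, in the spirit of the condition m/n [approaches] 0 in Theorem 3; the specific value is an arbitrary choice.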

4. CONCLUSIONS

We have studied bootstrap model selection procedures in linear regression, nonlinear regression, generalized linear models, and autoregressive time series models. We have shown that the procedure that selects a model by minimizing Efron's (1982, 1983) estimators of prediction error is inconsistent as the sample size tends to infinity. We have proposed two consistent modified bootstrap selection procedures. For bootstrapping pairs, we suggest generating m pairs of bootstrap data; for bootstrapping residuals, we suggest multiplying the residuals by a factor [square root of n/m], where m satisfies m/n [right arrow] 0 and m [approaches] [infinity].
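The two suggestions can be sketched for linear regression in Python. This is a minimal illustration under assumptions, not the article's exact procedure: the criterion is taken to be the average squared error on the original data of the least squares estimator refitted to bootstrap data, the residuals come from the largest model and are centered, and the function names, candidate models, and simulated data are hypothetical.

```python
import numpy as np

def lse(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def bootstrap_select(X, y, models, m, method="pairs", n_boot=200, seed=0):
    """Pick the column subset in `models` minimizing a bootstrap criterion."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Centered residuals from the largest model (all columns of X).
    resid = y - X @ lse(X, y)
    resid = resid - resid.mean()
    # Fitted values under each candidate model, used for residual bootstrap.
    fits = [X[:, c] @ lse(X[:, c], y) for c in models]
    crit = np.zeros(len(models))
    for _ in range(n_boot):
        if method == "pairs":
            # Draw only m pairs (x_i, y_i) with replacement.
            idx = rng.integers(0, n, size=m)
        else:
            # Draw n residuals and inflate them by sqrt(n / m).
            e_star = np.sqrt(n / m) * rng.choice(resid, size=n, replace=True)
        for k, cols in enumerate(models):
            if method == "pairs":
                beta_star = lse(X[idx][:, cols], y[idx])
            else:
                beta_star = lse(X[:, cols], fits[k] + e_star)
            # Assumed criterion: squared prediction error on the original data.
            crit[k] += np.mean((y - X[:, cols] @ beta_star) ** 2)
    crit /= n_boot
    return models[int(np.argmin(crit))], crit

# Simulated data (hypothetical): only the intercept and x_1 matter.
rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)
models = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]

selected_pairs, crit_pairs = bootstrap_select(X, y, models, m=25, method="pairs")
selected_resid, crit_resid = bootstrap_select(X, y, models, m=25, method="residuals")
```

Taking m much smaller than n inflates the variability of the bootstrap estimator, which penalizes unnecessarily large models more heavily than the choice m = n would.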

APPENDIX: PROOFS

Throughout this article, [E.sub.*] and [var.sub.*] should be understood as the asymptotic expectation and variance (see Akahira and Takeuchi 1991), conditioned on [y.sub.1],..., [y.sub.n] (and [x.sub.1],..., [x.sub.n] if they are random). Thus if [Mathematical Expression Omitted] is a function of the bootstrap sample, then

[Mathematical Expression Omitted],

where [Mathematical Expression Omitted] (1) denotes a quantity [Mathematical Expression Omitted] satisfying [Mathematical Expression Omitted] for any [Epsilon] [greater than] 0.

Proof of (14)

We provide a proof only for bootstrapping pairs; the proof for bootstrapping residuals is similar. From the definition of [Mathematical Expression Omitted] and [Mathematical Expression Omitted],

[Mathematical Expression Omitted].

Because [Mathematical Expression Omitted]

[Mathematical Expression Omitted]

where the last equality holds when [Alpha] [element of] [A.sub.c] and [Mathematical Expression Omitted]. Because [Mathematical Expression Omitted].

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

[Mathematical Expression Omitted].

For a correct model [Alpha], [Mathematical Expression Omitted], and thus

[Mathematical Expression Omitted],

assuming that [Mathematical Expression Omitted]. This proves (14).

Proof of Theorem 1

First, consider bootstrapping pairs. From Conditions C1-C3, (21) gives that for [Alpha] [element of] [A.sub.c],

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

and

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

[Mathematical Expression Omitted].

Because

[Mathematical Expression Omitted],

we can similarly show that

[Mathematical Expression Omitted].

Then, for [Alpha] [element of] [A.sub.c],

[Mathematical Expression Omitted].

The same result also holds for bootstrapping residuals. Let [[Epsilon].sub.i] = [y.sub.i] - [f.sub.i]([Beta]). For [Alpha] [element of] [A.sub.c],

[Mathematical Expression Omitted],

[Mathematical Expression Omitted],

and

[Mathematical Expression Omitted].

Hence for [Alpha] [element of] [A.sub.c],

[Mathematical Expression Omitted].

Because [Mathematical Expression Omitted] a.s., it follows from C4 that

[Mathematical Expression Omitted]

for [Alpha] [not an element of] [A.sub.c]. From the definition of [Mathematical Expression Omitted],

[Mathematical Expression Omitted].

Hence for [Alpha] [not an element of] [A.sub.c],

[Mathematical Expression Omitted].

By (A.2), for [Alpha] [element of] [A.sub.c] and [Alpha] [not equal to] [[Alpha].sub.0],

[Mathematical Expression Omitted].

This proves that (9) holds.

The result for bootstrapping residuals can be shown similarly.

Proof of Theorem 3

Let [[Sigma].sub.[Alpha]] be the [Alpha] x [Alpha] matrix whose (i,j)th element is cov([y.sub.i], [y.sub.j])/[[Sigma].sup.2] and let [Mathematical Expression Omitted] be the [Alpha] x [Alpha] matrix whose (i, j)th element is [Mathematical Expression Omitted]. Bose (1988) showed that

[Mathematical Expression Omitted]

where [H.sub.[Alpha]](x) is the distribution of [Mathematical Expression Omitted] and [Mathematical Expression Omitted] is the bootstrap distribution of [Mathematical Expression Omitted].

When [Alpha] [greater than or equal to] [[Alpha].sub.0], using result (A.3), we obtain that

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

[Mathematical Expression Omitted],

where the last equality follows from the fact that when [Alpha] [greater than or equal to] [[Alpha].sub.0],

[Mathematical Expression Omitted]

and

[Mathematical Expression Omitted].

Because both [Mathematical Expression Omitted] and [n.sup.-1][S.sub.[Alpha]]/[[Sigma].sup.2] converge to [[Sigma].sub.[Alpha]] (Bose 1988), we have

[Mathematical Expression Omitted].

When [Alpha] [less than] [[Alpha].sub.0],

[Mathematical Expression Omitted]

and

[Mathematical Expression Omitted]

(Wei 1992). It follows from (A.4)-(A.6) that

[Mathematical Expression Omitted].

REFERENCES

Adkins, L. C., and Hill, R. C. (1990), "An Improved Confidence Ellipsoid for the Linear Regression Models," Journal of Statistical Computation and Simulation, 36, 9-18.

Akahira, M., and Takeuchi, K. (1991), "On the Definition of Asymptotic Expectation," in Asymptotic Theory of Statistical Estimation, ed. M. Akahira, Institute of Mathematics, Univ. of Tsukuba, Japan.

Akaike, H. (1970), "Statistical Predictor Identification," Annals of the Institute of Statistical Mathematics, 22, 203-217.

Allen, D. M. (1974), "The Relationship Between Variable Selection and Data Augmentation and a Method for Prediction," Technometrics, 16, 125-127.

Arcones, M. A., and Gine, E. (1989), "The Bootstrap of the Mean With Arbitrary Bootstrap Sample Size," Annals of the Institute Henri Poincare, 25, 457-481.

Bickel, P. J., and Freedman, D. A. (1981), "Some Asymptotic Theory for the Bootstrap," The Annals of Statistics, 9, 1196-1217.

Bose, A. (1988), "Edgeworth Correction by Bootstrap in Autoregressions," The Annals of Statistics, 16, 1709-1722.

Bunke, O., and Droge, B. (1984), "Bootstrap and Cross-Validation Estimates of the Prediction Error for Linear Regression Models," The Annals of Statistics, 12, 1400-1424.

Burman, P. (1989), "A Comparative Study of Ordinary Cross-Validation, v-Hold Cross-Validation and Repeated Learning-Testing Methods," Biometrika, 76, 503-514.

Craven, P., and Wahba, G. (1979), "Smoothing Noisy Data With Spline Functions: Estimating the Correct Degree of Smoothing by the Method of Generalized Cross-Validation," Numerische Mathematik, 31, 377-403.

Deheuvels, P., Mason, D. M., and Shorack, G. R. (1993), "Some Results on the Influence of Extremes on the Bootstrap," Annals of the Institute Henri Poincare, 29, 83-103.

Efron, B. (1979), "Bootstrap Methods: Another Look at the Jackknife," The Annals of Statistics, 7, 1-26.

----- (1982), The Jackknife, the Bootstrap, and Other Resampling Plans, Philadelphia: SIAM.

----- (1983), "Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation," Journal of the American Statistical Association, 78, 316-331.

Freedman, D. A. (1981), "Bootstrapping Regression Models," The Annals of Statistics, 9, 1218-1228.

Geisser, S. (1975), "The Predictive Sample Reuse Method With Applications," Journal of the American Statistical Association, 70, 320-328.

Gunst, R. F., and Mason, R. L. (1980), Regression Analysis and Its Applications, New York: Marcel Dekker.

Hall, P. (1989), "Unusual Properties of Bootstrap Confidence Intervals in Regression Problems," Probability Theory and Related Fields, 81, 247-273.

----- (1990), "Using the Bootstrap to Estimate Mean Squared Error and Select Smoothing Parameters in Nonparametric Problems," Journal of Multivariate Analysis, 32, 177-203.

Hall, P., and Pittelkow, Y. E. (1990), "Simultaneous Bootstrap Confidence Bands in Regression," Journal of Statistical Computation and Simulation, 37, 99-113.

Hannan, E. J., and Quinn, B. G. (1979), "The Determination of the Order of an Autoregression," Journal of the Royal Statistical Society, Ser. B, 41, 190-195.

Huang, J. S., Sen, P. K., and Shao, J. (1996), "Bootstrapping a Sample Quantile When the Density Has a Jump," Statistica Sinica, 6, 299-309.

Mallows, C. L. (1973), "Some Comments on [C.sub.p]," Technometrics, 15, 661-675.

McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models, 2nd ed., London: Chapman and Hall.

Potscher, B. M. (1989), "Model Selection Under Nonstationarity: Autoregressive Models and Stochastic Linear Regression Models," The Annals of Statistics, 17, 1257-1274.

Rao, C. R., and Wu, Y. (1989), "A Strongly Consistent Procedure for Model Selection in a Regression Problem," Biometrika, 76, 369-374.

Rao, J. S., and Tibshirani, R. (1993), "Bootstrap Model Selection via the Cost Complexity Parameter in Regression," technical report, University of Toronto.

Schwartz, G. (1978), "Estimating the Dimension of a Model," The Annals of Statistics, 6, 461-464.

Shao, J. (1993), "Linear Model Selection by Cross-Validation," Journal of the American Statistical Association, 88, 486-494.

----- (1994), "Bootstrap Sample Size in Non-Regular Cases," Proceedings of the American Mathematical Society, 122, 1251-1262.

Shibata, R. (1984), "Approximate Efficiency of a Selection Procedure for the Number of Regression Variables," Biometrika, 71, 43-49.

Stone, M. (1974), "Cross-Validation Choice and Assessment of Statistical Predictions," Journal of the Royal Statistical Society, Ser. B, 36, 111-147.

Swanepoel, J. W. H. (1986), "A Note on Proving That the (Modified) Bootstrap Works," Communications in Statistics, Part A - Theory and Methods, 15, 3193-3203.

Wei, C. Z. (1992), "On Predictive Least Squares Principles," The Annals of Statistics, 20, 1-42.

Zhang, P. (1993a), "Model Selection Via Multifold Cross-Validation," The Annals of Statistics, 21, 299-313.

----- (1993b), "On the Convergence Rate of Model Selection Criteria," Communications in Statistics, Part A - Theory and Methods, 22, 2765-2775.

Jun Shao is Associate Professor, Department of Statistics, University of Wisconsin, Madison, WI 53706. The author would like to thank R. Tibshirani for conversations that led to this study and the referees for helpful comments. The research was supported by NSF Grant DMS-9504425.
