# Bootstrap Methods for Developing Predictive Models

By Austin, Peter C.; Tu, Jack V. | The American Statistician, May 2004

1. INTRODUCTION

Researchers frequently develop regression models to predict dichotomous outcomes. Investigators need to maintain a balance between including too many variables and model parsimony (Murtaugh 1998; Wears and Lewis 1999). Omitting important prognostic factors results in a systematic misestimation of the regression coefficients and biased prediction, but including too many predictors will result in loss of precision in the estimation of the regression coefficients and the predictions of new responses (Murtaugh 1998).

Automated variable selection methods such as backwards elimination are frequently used for the purpose of identifying independent predictors or for developing parsimonious regression models (Miller 1984, 2002; Hocking 1976). Several studies have shown that automatic variable selection methods in ordinary least squares regression result in spurious noise variables being mistakenly identified as independent predictors of the outcome (Derksen and Keselman 1992; Flack and Chang 1987) and that global measures of goodness of fit are overly optimistic (Flack and Chang 1987; Copas and Long 1991). Similarly, the use of automated variable selection methods with logistic regression results in the identification of nonreproducible models (Austin and Tu in press).

The purpose of this article is to propose a method for developing predictive models that combines bootstrap resampling with automated variable selection methods. The article is divided into three sections. First, we describe a model selection method that uses backwards elimination on multiple bootstrap samples. Second, we apply our methods to a clinical dataset to develop a model for predicting mortality within 30 days of a heart attack. Third, we summarize our results.

2. BOOTSTRAP METHODS FOR MODEL SELECTION

The bootstrap is a well-known statistical method used to assess the variability of test statistics (Efron and Tibshirani 1993; Davison and Hinkley 1997). The nonparametric bootstrap allows one to estimate an empirical distribution function by repeated sampling from the observed data. The use of bootstrap methods allows one to approximate the distribution of test statistics in settings in which analytic calculations are intractable or in small samples in which large-sample asymptotic results may not hold.
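As an illustration of the resampling idea described above, the following is a minimal sketch in Python using only the standard library. It draws samples of the same size as the observed data, with replacement, and recomputes a statistic on each one. The function name `bootstrap_replicates`, its parameters, and the toy measurements are assumptions made for this example and are not taken from the article.

```python
import random
import statistics


def bootstrap_replicates(data, statistic, n_boot=1000, seed=0):
    """Recompute `statistic` on n_boot samples drawn with replacement from `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # One bootstrap sample: n observations drawn with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(sample))
    return replicates


# Toy measurements, invented purely for illustration.
observed = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
medians = bootstrap_replicates(observed, statistics.median)

# The spread of the replicates approximates the variability of the median.
print("bootstrap standard error of the median:", statistics.stdev(medians))
```

The list of replicates plays the role of the approximate sampling distribution of the statistic. Any summary of it, such as its standard deviation, can then be read off directly.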

Earlier studies have described the instability of automated variable selection methods. Studies have demonstrated that spurious noise variables are mistakenly identified as independent predictors of the outcome (Derksen and Keselman 1992; Flack and Chang 1987). Furthermore, the number of noise variables included increased as the number of candidate variables increased, and the probability of correctly identifying variables was inversely proportional to the number of variables under consideration (Murtaugh 1998).

Our proposed model selection method is based upon drawing repeated bootstrap samples from the original dataset. Within each bootstrap sample, backwards elimination is used to develop a parsimonious predictive model. For each candidate variable, the proportion of bootstrap samples in which that variable was identified as an independent predictor of the outcome is determined. Candidate variables are then ranked according to the proportion of bootstrap samples in which they were identified as independent predictors of the outcome. A preliminary predictive model would consist of those variables that were identified as significant predictors in all bootstrap samples. Variables could then be sequentially added to this preliminary model according to the proportion of bootstrap samples in which they were selected as significant predictors. Each candidate model can then be assessed for its predictive accuracy and a final model identified. Our approach is a simplification of one proposed by Sauerbrei and Schumacher (1992) for identifying strong and weak factors for predicting survival, based upon repeated bootstrap sampling. …
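The ranking procedure described in this paragraph can be sketched roughly as follows. This is a hedged illustration, not the authors' implementation: the variable selection step is passed in as a hypothetical `select` callable, and the `toy_select` function shown here is a deliberately simple stand-in that is not backwards elimination. In practice, `select` would fit a logistic regression model to the bootstrap sample and apply backwards elimination. All function names, parameters, and data rows below are assumptions made for the example.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_inclusion_proportions(rows, candidates, select, n_boot=200, seed=0):
    """Proportion of bootstrap samples in which `select` keeps each candidate."""
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_boot):
        # One bootstrap sample: n rows drawn with replacement from the data.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        selected = set(select(sample))
        counts.update(v for v in candidates if v in selected)
    return {v: counts[v] / n_boot for v in candidates}


def nested_candidate_models(proportions):
    """Preliminary model (variables kept in every sample), then one variable
    added per step in decreasing order of inclusion proportion."""
    ranked = sorted(proportions, key=lambda v: (-proportions[v], v))
    core = [v for v in ranked if proportions[v] == 1.0]
    models = [list(core)] if core else []
    current = list(core)
    for v in ranked[len(core):]:
        current = current + [v]  # new list each step, so earlier models stay intact
        models.append(current)
    return models


def toy_select(sample, outcome="died", threshold=0.5):
    """Stand-in selector (NOT backwards elimination): keep a variable when its
    mean differs between the two outcome groups by more than `threshold`."""
    dead = [r for r in sample if r[outcome] == 1]
    alive = [r for r in sample if r[outcome] == 0]
    if not dead or not alive:
        return set()  # a bootstrap sample may contain only one outcome group
    names = [k for k in sample[0] if k != outcome]
    return {k for k in names
            if abs(mean(r[k] for r in dead) - mean(r[k] for r in alive)) > threshold}


# Synthetic toy rows, invented purely for illustration.
rows = [
    {"died": 1, "age_decades": 8.1, "noise": 0.2},
    {"died": 1, "age_decades": 7.4, "noise": 1.1},
    {"died": 1, "age_decades": 7.9, "noise": 0.6},
    {"died": 0, "age_decades": 5.2, "noise": 0.9},
    {"died": 0, "age_decades": 6.0, "noise": 0.3},
    {"died": 0, "age_decades": 4.8, "noise": 0.7},
]
props = bootstrap_inclusion_proportions(rows, ["age_decades", "noise"], toy_select)
for model in nested_candidate_models(props):
    print(model)
```

Each list returned by `nested_candidate_models` is one candidate model. Assessing the predictive accuracy of each candidate, as the paragraph describes, is left outside this sketch.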
