On Measuring and Correcting the Effects of Data Mining and Model Selection

Journal article by Jianming Ye; Journal of the American Statistical Association, Vol. 93, 1998

1. INTRODUCTION

In the theory of linear models, the concept of degrees of freedom plays an important role. This concept has several different interpretations. The degrees of freedom in regression are the number of variables in the model. Accordingly, degrees of freedom are often used as a model complexity measure in various model selection criteria, such as the Akaike information criterion (AIC) (Akaike 1973), C_p (Mallows 1973), the Bayesian information criterion (BIC) (Schwarz 1978), generalized cross-validation (GCV) (Craven and Wahba 1979), and the risk inflation criterion (RIC) (George and Foster 1994). Degrees of freedom can also be interpreted as the cost of the estimation process and thus can be used for obtaining an unbiased estimation of the error variance. Finally, the degrees of freedom in regression are the trace of the so-called "hat" matrix; that is, the sum of the sensitivity of each fitted value with respect to the corresponding observed value.
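The three interpretations above can be checked numerically for ordinary least squares. The following is a minimal sketch, not taken from the article: it assumes a full-column-rank design matrix and uses NumPy, and the function names (hat_matrix, fitted_values, sensitivity_sum, error_variance) are hypothetical names chosen for illustration. It computes the trace of the hat matrix, estimates the sum of sensitivities by perturbing one observation at a time, and uses the number of columns as the cost when estimating the error variance.

```python
import numpy as np


def hat_matrix(X):
    """Return H = X (X^T X)^{-1} X^T, assuming X has full column rank."""
    # For a full-column-rank X, pinv(X) equals (X^T X)^{-1} X^T.
    return X @ np.linalg.pinv(X)


def fitted_values(X, y):
    """Ordinary least squares fitted values for response y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def sensitivity_sum(X, y, eps=1e-4):
    """Sum over i of d(fitted_i)/d(y_i), by perturbing each y_i in turn."""
    base = fitted_values(X, y)
    total = 0.0
    for i in range(len(y)):
        y_pert = y.copy()
        y_pert[i] += eps  # nudge only the i-th observation
        total += (fitted_values(X, y_pert)[i] - base[i]) / eps
    return total


def error_variance(X, y):
    """Residual sum of squares divided by (n - p), with p = number of columns."""
    n, p = X.shape
    resid = y - fitted_values(X, y)
    return float(resid @ resid) / (n - p)


if __name__ == "__main__":
    # Simulated data (an assumption for illustration only).
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(30), rng.normal(size=(30, 3))])
    y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=30)
    print("number of columns:     ", X.shape[1])
    print("trace of hat matrix:   ", np.trace(hat_matrix(X)))
    print("sum of sensitivities:  ", sensitivity_sum(X, y))
    print("error variance estimate:", error_variance(X, y))
```

Because the least squares fit is linear in the observations, the finite-difference sensitivities agree with the diagonal of the hat matrix up to rounding error, so the first three printed quantities should coincide.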

An extension of degrees of freedom to general model structures is useful both practically and theoretically. The last two decades have brought rapid progress in modeling high-dimensional data by means of complex statistical procedures. These procedures typically require minimum assumptions about the structures of the underlying models and try to capture the structures through adaptive fitting. But their flexibility often leads to substantial overfitting. The complex nature of these procedures makes it difficult to study their statistical behavior and to assess their performance objectively.

Traditionally, for general modeling problems, statisticians tend to define degrees ...








































