Estimation and Inference for Logistic Regression with Covariate Misclassification and Measurement Error in Main Study/Validation Study Designs

Journal article by Donna Spiegelman, Bernard Rosner, Roger Logan; Journal of the American Statistical Association, Vol. 95, 2000


In epidemiological studies, continuous covariates often are measured with error and categorical covariates often are misclassified. Using the logistic regression model to represent the relationship between the binary outcome and the perfectly measured and classified covariates, the model for the observed main study data is derived. This derivation relies on the assumption that the error in the continuous covariates is multivariate normally distributed and uses a chain of logistic regression models to describe the misclassification processes. These model assumptions are empirically verified in the validation study, where the misclassified and mismeasured covariates are validated using perfectly measured and classified data. The full data likelihood, including contributions from both the main study and the validation study, is maximized to obtain the maximum likelihood estimates for the parameters of the underlying logistic regression model and of the measurement error and reclassification models simultaneously. Standard asymptotic theory is applied. An example of this methodology is presented from the Nurses' Health Study investigating the relationship between cumulative incidence of breast cancer and saturated fat, total energy, and alcohol intake. A detailed simulation study was conducted to investigate the small-sample properties of these likelihood-based estimates and inferential quantities. No single estimation/inference option performed satisfactorily when the main study/validation study size was representative of that typically encountered in practice. When the validation study was at least twice the usual size, features of asymptotic optimality were more apparent. By example and through simulation, the procedures appeared to be robust to misspecification of the order of the chain of conditional measurement error/reclassification models.

KEY WORDS: Breast cancer; Epidemiology; Logistic regression; Maximum likelihood; Measurement error; Misclassification; Regression calibration.

1. INTRODUCTION

Numerous statistical investigators have proposed methods for estimation and, to a lesser extent, inference for binary response data with covariate measurement error in one or more continuous exposure variables. (See Carroll, Ruppert, and Stefanski 1995 for a recent comprehensive review.) Computational barriers have impeded the use of maximum likelihood (ML) methods for routine data analysis of binary response models with measurement error, usually taken to be Gaussian, and/or misclassification in model covariates. Because of these barriers, investigators have devised other estimators that are more computationally feasible but still inconsistent and inefficient (Armstrong 1985; Stefanski 1989; Stefanski and Carroll 1985). These estimators are approximately consistent under additional assumptions, involving small measurement error or small relative risk. One inconsistent estimator has enjoyed widespread popularity in applications, because of its intuitive appeal and computational simplicity (Rosner, Spiegelman, and Willett 1990, 1992). Because the maximum likelihood estimator (MLE) is both consistent and efficient, in the absence of obstacles to its implementation it is a desirable choice.

In this article we present efficient computational strategies to facilitate routine maximum likelihood calculations for binary response data with covariate measurement error and reclassification processes of arbitrary complexity in one or more model covariates. Likelihood-based estimation and inference in this setting have the additional advantage of offering full flexibility in model choices. In particular, the models that we present accommodate both measurement error in continuous model covariates and misclassification in categorical ones. These models arise in the analysis of data obtained to study the epidemiology of chronic diseases in relation to nutritional, environmental, and other determinants.

In Section 2 we develop the models. We give a detailed example using data investigating the relationship between dietary fat intake and breast cancer incidence in Section 3, and describe results from a simulation study in Section 4. In Section 5 we present our conclusions and propose directions for further research.

2. THE MODEL

As is customary in epidemiology, in the absence of measurement error the data are assumed to follow a logistic regression model,

$$f_1(D \mid w, x, u_1; \beta) = \frac{e^{(\beta_0 + \beta_1' x + \beta_2' w + \beta_3' u_1) D}}{1 + e^{\beta_0 + \beta_1' x + \beta_2' w + \beta_3' u_1}}, \tag{1}$$

where $D$ is the binary response variable and $u_1$ is a vector of model covariates that are assumed always perfectly measured, with $\dim(u_1) = s$. We call $f_1$ the relative risk model.
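As an illustration only, the relative risk model in (1) can be evaluated numerically as in the following minimal Python sketch. The function names `linear_predictor` and `relative_risk_model`, and any coefficient or covariate values passed to them, are hypothetical and are not taken from the article; the sketch covers only the error-free model (1), not the measurement error or reclassification models.

```python
import math


def linear_predictor(beta0, beta1, beta2, beta3, x, w, u1):
    """Return beta0 + beta1'x + beta2'w + beta3'u1 for one subject.

    x  : continuous covariates (error-free values)
    w  : binary covariates (error-free values)
    u1 : covariates assumed always perfectly measured
    """
    def dot(a, b):
        if len(a) != len(b):
            raise ValueError("coefficient and covariate lengths differ")
        return sum(ai * bi for ai, bi in zip(a, b))

    return beta0 + dot(beta1, x) + dot(beta2, w) + dot(beta3, u1)


def relative_risk_model(d, eta):
    """Evaluate exp(eta * d) / (1 + exp(eta)) for d in {0, 1}.

    The denominator is handled on the log scale so that large |eta|
    does not overflow.
    """
    if d not in (0, 1):
        raise ValueError("d must be 0 or 1")
    # log(1 + exp(eta)) written in an overflow-safe form
    log_denominator = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
    return math.exp(d * eta - log_denominator)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    eta = linear_predictor(
        beta0=-2.0, beta1=[0.5], beta2=[0.3], beta3=[0.1],
        x=[1.2], w=[1], u1=[0.0],
    )
    p_case = relative_risk_model(1, eta)
    p_noncase = relative_risk_model(0, eta)
    print(p_case, p_noncase, p_case + p_noncase)
```

For any value of the linear predictor, the two probabilities returned for D = 1 and D = 0 sum to 1, which is a quick check that the expression is a proper probability function for a binary response.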

Throughout, the binary covariate $q$-vector is denoted by $w$, the continuous covariate $p$-vector is denoted by $x$, lowercase notation refers to error-free measurements, and uppercase notation refers to error-prone measurements. Note that $\beta$, or some elements thereof, are the parameters of interest in the research. Other models for $D$ given $(x, w, u_1)$ may be considered, although this one typically appears in biomedical applications, and empirical verification of fit as ...
