Prospective Analysis of Logistic Case-Control Studies

[Mathematical Expression Omitted].

Let $\Psi_{12} = \Psi_{22}(\cdot, \Theta)$ be the vector of size $2M$ whose $(dM + m)$th element equals $I(D = d)\{I(Z = m) - \Theta_{2,md}\}$.

Let [Mathematical Expression Omitted]. Denote the logistic argument in (A.12) by $H_*(Z, X, \Theta)$ and write [Mathematical Expression Omitted].

At the end of this section, the estimating equation is shown to be unbiased, the particular method being to condition on all [Mathematical Expression Omitted] or, equivalently, on all the $\Delta$'s. In addition, with …

1. INTRODUCTION

In a classical prospective logistic regression study, a random sample from a source population is taken and the status of a binary outcome D is ascertained, along with the values of covariates (Z, X), these being related via the logistic regression model

[Mathematical Expression Omitted],

where $H(\cdot)$ is the logistic distribution function. The classical case-control study (choice-based sample in econometrics) begins with model (1), but instead uses retrospective sampling. Specifically, one first obtains a set of cases (D = 1) and controls (D = 0), and then samples from within the cases and controls to observe the covariates. The analysis of case-control studies of this type was described by Prentice and Pyke (1979), who showed that if one ignores the case-control sampling scheme and analyzes the data as if they came from a prospective sampling scheme, then the resulting estimates of $(\Theta_{11}, \Theta_{12})$ are consistent and the usual standard errors are asymptotically correct.
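The Prentice and Pyke result can be checked numerically. The following Python sketch is illustrative only and is not taken from the article. It assumes that model (1) has the familiar form pr(D = 1 | X) = H(theta0 + theta1 X) with a single scalar covariate X and no Z; the parameter values, the sample sizes, and the function names `fit_logistic` and `simulate` are hypothetical choices made for this example. The sketch draws a source population, samples equal numbers of cases and controls, and then maximizes the prospective logistic likelihood by Newton-Raphson, ignoring the sampling scheme. Under these assumptions the fitted slope should be close to the population slope, whereas the fitted intercept is shifted by the log ratio of the case and control sampling fractions.

```python
import math
import random


def logistic(t):
    """Logistic distribution function H(t)."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(xs, ds, iters=25):
    """Fit pr(D=1|X) = H(b0 + b1*X) by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d in zip(xs, ds):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            # Score vector (gradient of the log-likelihood).
            g0 += d - p
            g1 += (d - p) * x
            # Observed information matrix (2 x 2, symmetric).
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + information^{-1} * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate(seed=0, n_pop=200_000, n_per_group=1000, theta0=-4.0, theta1=1.0):
    """Hypothetical example: case-control sample analyzed prospectively.

    Returns (fitted intercept, fitted slope, intercept predicted after
    shifting theta0 by the log ratio of sampling fractions).
    """
    rng = random.Random(seed)
    cases, controls = [], []
    # Source population following the assumed prospective logistic model.
    for _ in range(n_pop):
        x = rng.gauss(0.0, 1.0)
        if rng.random() < logistic(theta0 + theta1 * x):
            cases.append(x)
        else:
            controls.append(x)
    # Retrospective sampling: fixed numbers of cases and of controls.
    xs = rng.sample(cases, n_per_group) + rng.sample(controls, n_per_group)
    ds = [1] * n_per_group + [0] * n_per_group
    b0, b1 = fit_logistic(xs, ds)
    # log{(n / #cases) / (n / #controls)} = log(#controls / #cases).
    offset = math.log(len(controls) / len(cases))
    return b0, b1, theta0 + offset


if __name__ == "__main__":
    print(simulate())
```

Calling `simulate()` returns the fitted intercept, the fitted slope, and the shifted intercept implied by the sampling fractions. Only the slope is directly comparable with the population parameter, which is why the consistency statement above concerns the regression coefficients and not the intercept.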

For prospective logistic regression studies, many other types of analyses and sampling schemes are possible. Here are a few examples:

* One might replace the classical logistic regression parameter estimates by robust methods of estimation (Copas 1988; Carroll and Pederson 1993; Kunsch, Stefanski, and Carroll 1989).

* When X is measured with error, there is a large literature dealing with techniques for measurement error corrections in logistic regression (e.g., Carroll and Stefanski 1994; Rosner, Willett, and Spiegelman 1989; Satten and Kupper 1993; Stefanski and Carroll 1987).

* In problems with partially missing data, one can use likelihood techniques (Little and Rubin 1987) or unbiased estimating equations due to Robins, Rotnitzky, and Zhao (1994).

Although the properties of these techniques under prospective sampling have been worked out, there is to date no corresponding general theory for whether they even lead to consistent estimates when applied to case-control studies and, if they do, whether the prospectively calculated standard errors are asymptotically correct in case-control studies. Our aim is to provide one version of such a theory, and in particular to answer the question: When can prospective analyses be used in case-control studies without having to adjust for the retrospective sampling structure?

We will show that, in general, using prospectively derived standard errors is at worst asymptotically conservative; that is, the standard errors are at worst too large. In addition, we derive a simple sufficient condition guaranteeing that prospective standard errors are asymptotically correct.

In the Appendix we sketch an informal argument derived from a semiparametric perspective that suggests that prospectively computed standard errors are retrospectively correct whenever the distribution of (Z, X) is left unrestricted. Much of this article is a formalization of this argument, along with consideration of cases that are not so easily categorized. The key feature of our analysis is that we start with a general class of unbiased estimating equations, instead of working with specific examples. The results allow for general patterns of missing data as well as for stratified studies. The asymptotic distribution theory is almost trivial to derive in this general framework, thus facilitating the identification of a simple sufficient condition for checking whether prospectively derived standard errors are asymptotically correct. …