Survival Analysis with Median Regression Models

1. INTRODUCTION

An alternative to the Cox proportional hazards model in the analysis of survival data is the so-called accelerated failure time (AFT) model. The AFT model simply regresses the logarithm of the survival time on its covariates and is rather appealing to practitioners due to its ease of interpretation. This log-linear model has been studied extensively with observations subject to right censorship (Buckley and James 1979; Koul, Susarla, and Van Ryzin 1981; Leurgans 1987; Miller 1976; Prentice 1978). Recently, some useful semiparametric inference procedures for the AFT model have been proposed. These methods have sound theoretical justification and can be implemented efficiently using a simulated annealing algorithm (Lai and Ying 1991; Lin and Geyer 1992; Ritov 1990; Robins and Tsiatis 1992; Tsiatis 1990; Wei, Ying, and Lin 1990).

Since the AFT model relates the mean of the logarithm of the failure time to the covariates, the presence of censoring usually precludes any serious attempt to estimate the intercept parameter in the model (Meier 1975). Furthermore, all of the semiparametric inference procedures for the AFT model in the literature are derived under a rather strong assumption that the error terms are identically distributed.

The median is a simple and meaningful measure for the center of a long-tailed survival distribution. Moreover, the median, unlike the mean, can be well estimated provided that the censoring is not too heavy. Therefore, a natural alternative to the AFT model is to regress the median of the failure time or a monotone transformation thereof on the covariates.
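
As an illustration of this point, here is a minimal sketch that is not taken from the article. It computes a Kaplan-Meier survival curve on a small made-up sample and reads off the median. The function names `kaplan_meier` and `km_median` and the data are hypothetical, chosen only for this example.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed failure time.

    times  : observed times (failure or censoring, whichever came first)
    events : 1 if the failure was observed, 0 if the time is censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def km_median(curve):
    """Smallest t with S(t) <= 0.5, or None if the curve never drops that far."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Made-up sample: half of the observations are censored.
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 0, 1, 1, 0, 0, 0]
curve = kaplan_meier(times, events)
print(curve)
print("estimated median:", km_median(curve))
```

In this toy sample the estimated survival curve levels off at about 0.45 after the last observed failure, so the upper tail, and with it the mean, cannot be recovered without further assumptions, while the median (6) is still read directly from the curve.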

In this article we propose semiparametric inference procedures for median regression models with possibly censored observations and examine their large-sample properties. In particular, we show that the new proposals are valid even when the error term in the model depends on the covariates. Furthermore, for practical sample sizes, we demonstrate through numerical studies that the new methods are more robust than their counterparts for the AFT model. The procedures presented in this article are illustrated with a lung cancer data set.

For noncensored data, robust estimation of regression models has been the focus of recent attention in econometric research (Bassett and Koenker 1978; Koenker and Bassett 1978). Powell (1984, 1986) used the least absolute deviations (LAD) method to analyze data with censored observations when the censoring variables are always observable.
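
For readers unfamiliar with LAD, the following is a minimal sketch for noncensored data with a single covariate. It is not the procedure proposed in this article, nor the algorithm of the cited authors: it is a brute-force search that relies on the known fact that some minimizer of the sum of absolute residuals passes through two data points. The function name `lad_fit` and the data are hypothetical.

```python
from itertools import combinations


def lad_fit(x, y):
    """Exact LAD fit of y ~ b0 + b1 * x for small samples.

    Tries every line through two data points with distinct x values
    and keeps the one with the smallest sum of absolute residuals.
    Cost is O(n^3), so this is for illustration only.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a regression fit
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = sum(abs(yk - b0 - b1 * xk) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Four points on the line y = 1 + 2x plus one gross outlier.
x = [0, 1, 2, 3, 4]
y = [1.0, 3.0, 5.0, 7.0, 100.0]
b0, b1 = lad_fit(x, y)
print(b0, b1)
```

On this made-up sample the fitted line is the one through the four well-behaved points (intercept 1, slope 2); the outlier contributes to the loss but does not move the fit, which is the robustness property that motivates median regression.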

This article is organized as follows. In Section 2 we propose an estimating equation, the solution of which extends the usual LAD estimator to censored regression data. Procedures for obtaining point and interval estimators are presented and illustrated with a real example. In Section 3 we document some simulation studies, and in Section 4 we present concluding remarks. All technical derivations are summarized in the Appendixes.

2. MEDIAN REGRESSION ANALYSIS

Let $T_i$ be the $i$th failure time or a transformation of it, where $i = 1, \ldots, n$. For $T_i$, we observe a bivariate vector $(Y_i, \Delta_i)$, where $Y_i = \min(T_i, C_i)$ and $\Delta_i = 1$ if $T_i = Y_i$ and 0 otherwise. The $C$'s are censoring variables that have a common distribution function $(1 - G)$. We assume that $C_i$ and $T_i$ are independent. Let $X_i$ be a $p \times 1$ vector of covariates for $T_i$. Also, assume that $\{(T_i, C_i, X_i),\ i = 1, \ldots, n\}$ are iid. Conditional on $X_i$, let the median of the conditional distribution of $T_i$ be denoted by $m_i$. Suppose that there exists an unknown $(p + 1) \times 1$ vector $\beta\,(= (\beta_0, \beta_1, \ldots, \beta_p)')$ such that

$$m_i = \beta' Z_i, \tag{1}$$

where $Z_i = (1, X_i')'$ and $i = 1, \ldots, n$. Furthermore, let $\beta_0$ be the true value of $\beta$ and let the conditional distribution of [T. …
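
The data structure and model (1) described in this section can be sketched in a few lines. This is only an illustration of the notation under stated assumptions: the helper names `observe`, `design_row`, and `conditional_median` and all numeric values are hypothetical and do not come from the article.

```python
def observe(t, c):
    """Return (Y, Delta) for a latent failure time t and censoring time c."""
    y = min(t, c)
    delta = 1 if t == y else 0  # Delta = 1 when T = Y, i.e. T <= C
    return y, delta


def design_row(x):
    """Z = (1, X')': prepend the intercept term to the covariate vector."""
    return [1.0] + list(x)


def conditional_median(beta, x):
    """m = beta' Z under model (1); beta has length p + 1."""
    z = design_row(x)
    if len(beta) != len(z):
        raise ValueError("beta must have one more entry than x")
    return sum(b * zk for b, zk in zip(beta, z))


# Made-up values with p = 2 covariates.
beta = [1.0, 0.5, -2.0]
x = [2.0, 0.25]
print(conditional_median(beta, x))  # 1.0 + 0.5 * 2.0 - 2.0 * 0.25 = 1.5
print(observe(3.2, 5.0))            # failure observed: (3.2, 1)
print(observe(3.2, 1.5))            # censored: (1.5, 0)
```

Note that in practice only the pair returned by `observe` and the covariates are available to the analyst; the latent failure time is hidden whenever the indicator is 0.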