Time-Dependent Hazard Ratio: Modeling and Hypothesis Testing with Application in Lupus Nephritis. by Michal Abrahamowicz , Todd MacKenzie , John M. Esdaile 1. INTRODUCTION The proportional hazards model is the most popular regression method for analyzing censored survival data. It requires the assumption (the proportional hazards, or PH, assumption) that the hazard ratio, which describes the effect of a predictor on survival, is constant over the entire follow-up period (Cox 1972). However, the PH assumption may not hold in some survival studies. Predictors of interest are often measured only at time zero, even if their values may later change, so that a time-dependent covariate must be treated as a fixed one. Further, the impact of a predictor, which is by definition fixed in time, may change during follow-up. For example, in studies of chronic diseases, the hazard ratio of disease over control may increase if damage to the affected organs increases with disease duration. Numerous tests of the PH hypothesis have been proposed (Lin and Wei 1991). However, after having rejected the PH hypothesis, it is not obvious how to summarize the effect of a predictor. Such a difficulty arose in our investigation of the role of prognostic factors for mortality in lupus nephritis, a rare rheumatologic disease that affects mostly young women. Eighty-seven lupus patients undergoing renal biopsy between 1967 and 1983 at the Yale-New Haven Hospital were followed since biopsy until death or the end of 1990 (Esdaile, Abrahamowicz, MacKenzie, Hayslett, and Kashgarian 1994). There were 35 deaths in the cohort. Comparison of the results of published short- and long-term studies suggested that the PH assumption may be violated in the case of variable "disease duration" (the length of period prior to biopsy when the disease was active but not treated). Indeed, in our preliminary analyses the Cox model estimates of the hazard ratio were 1.16 (p = .22) and 1.64 (p = .005) for the entire 16-year follow-up and for the initial 5-year period. Thus to describe the effect of disease duration accurately, we need to model the changes in the hazard ratio over time. Different nonparametric regression methods have been proposed to estimate the hazard ratio as a function of time (HR function). Gray (1992) and Kooperberg, Stone, and Truong (1995) used low-order regression splines (step functions and breaking lines). Verweij and Houwelingen (1995) smoothed local hazard ratios by introducing a penalty for the first difference. Hess (1994) worked with natural cubic splines. Zucker and Karr (1990) and Hastie and Tibshirani (1993) relied on smoothing splines. However, several issues of practical interest remain to be addressed. As adaptive modeling techniques are often used, the impact of model selection on the accuracy of estimation and inference must be studied. Little is known about the behavior of the proposed methods in small samples, and simulation results are scarce. It is not clear how to test the null hypothesis of no association against the compound alternative of either constant or time-dependent HR. In this article we propose a method that uses regression splines to model a time-dependent hazard ratio. Our objective is to provide a flexible tool that is simple to use yet able to yield smooth, clinically plausible estimates and reliable inference. In Section 2 we describe maximum partial likelihood estimation, model selection, and inference. We discuss in detail testing of the hypothesis of no association against a possibly time-dependent predictor effect. In Section 3 we present the results of simulations under different assumptions about the shape of the HR function. We investigate the accuracy of the estimates and of pointwise confidence intervals. We compare the model-based test of the PH assumption with two conventional tests, and evaluate the power of the proposed procedure for testing the hypothesis of no association. In Section 4 we reexamine the association between duration of untreated disease and survival in lupus nephritis and analyze a well-known primary biliary cirrhosis (PBC) data set. We summarize our findings and suggest some directions for future work in the final section. 2. ESTIMATION AND INFERENCE 2.1 Estimation We consider the following model, [Lambda](t; x) = [[Lambda].sub.0](t) exp ([summation of] [[Beta].sub.i](t)[x.sub.i] where i = 1 to p), (1) where x = ([x.sub.1], . . ., [x.sub.p]) is a vector of p covariates, [[Lambda].sub.0](t) is an unspecified baseline hazard function corresponding to x = 0, and [[Beta].sub.i](t) is the logarithm of the hazard ratio at time t corresponding to a unit increase in covariate [x.sub.i]. Model (1) generalizes the PH model as the constant log hazard ratios [[Beta].sub.i] are replaced by estimable functions of time [[Beta].sub.i](t). Of course, effects of some covariates may still be represented by a constant HR, [[Beta].sub.i](t) = [[Beta].sub.i]. We assume that non-constant functions [[Beta].sub.i](t) lie in a pre-chosen polynomial regression spline space. Regression splines are smooth piece-wise polynomials with pieces that join at knots. The spline order k (order of polynomials) determines the continuity properties of the estimate and the number of interior knots m controls its flexibility. A space of regression splines is linear, and its dimension is m + k. Thus for each i, [[Beta].sub.i](t) = [summation of] [[Alpha].sub.ij][g.sub.ij] (t) where j = 1 to [r.sub.i], (2) where [r.sub.i] is the dimension of the regression spline space for the ith covariate and [g.sub.ij] (t) are basis functions for this space. We use a B-spline basis (de Boor 1978); very similar results will be obtained with other bases. Given the expansion in (2), model (1) can be written as [Mathematical Expression Omitted], (3) where [y.sub.ij](t) = [g.sub.ij](t)[x.sub.i]. Model (3) can be considered a PH model with "artificial" time-dependent covariates [y.sub.ij](t). Thus conventional methods for time-dependent covariates in the Cox model can be used to obtain the maximum partial likelihood estimates of the regression parameters [[Alpha].sub.ij] (MacKenzie 1993). 2.2 Model Selection To specify model (1), one needs to choose a particular sequence of knots and spline order. We place interior knots so as to ensure an approximately equal number of failures in each between-knots interval (as in Abrahamowicz, Ciampi, and Ramsay 1992). This reduces the model selection problem to the choice of the spline order k and number of knots m suitable for a given application. Two approaches are possible. First, the model may be fixed a priori, to ensure the flexibility sufficient to approximate all functions of potential interest (Gray 1992; Ramsay 1988). Second, a posteriori model selection criteria such as the Akaike information criterion (AIC) (Akaike 1974) may be used to find a reasonable trade-off between model parsimony and fit to the data at hand and to reduce the risk of overfitting bias (Sleeper and Harrington 1990). We use both the a priori fixed model and the minimum AIC model selection approaches. To ensure the continuity of [Beta](t) and its first derivative, we choose quadratic splines (k = 3). Further, we use not more than two interior knots (m [less than or equal to] 2), because we do not expect the log HR function to have many local extrema and/or inflection points. We also consider a simple linear model (k = 2, m = 0) as it satisfies our continuity requirements. Thus we restrict the choice to four models: linear, quadratic polynomial, and two quadratic spline models (m = 1, 2), with 2-5 degrees of freedom. In small samples we further restrict the choice, because our experience shows that we need a minimum number of failures per degree of freedom. In our simulations, requiring that this ratio be at least 10 resulted in reasonably accurate inference. This ad hoc requirement may be modified somewhat, depending on the censoring level and/or sample size. After having defined the set of models to be considered in a given application, we use the minimum AIC to select the final a posteriori model from this set. For the fixed model, we choose the most flexible (i.e., highest degree of freedom) among these models. 2.3 Inference Confidence intervals for [Beta](t) and hypotheses tests proposed in this section are based on large-sample theory of maximum partial likelihood estimation (Andersen and Gill 1982). The theory holds if the spline basis is chosen a priori. The basis for the fixed model is data dependent only to the extent that m [less than or equal to] 2 interior knots are placed at the quantiles of the sample distribution of failure times. This should have a minor impact on the accuracy of inference. However, in the case of AIC-based modeling, the dimension of the basis is also selected a posteriori. The implications of these violations of the fixed-basis assumption will be assessed in simulations. We estimate the pointwise variance of [[Beta].sub.i](t) from the observed information matrix, [Mathematical Expression Omitted], (4) and rely on normal approximation to obtain the corresponding pointwise confidence intervals that may be used to test the effect of a predictor at a specific time. If a (1 - [Alpha]) confidence interval at time [t.sub.0] excludes the HR of 1.0, then the pointwise hypothesis of no association [Beta]([t.sub.0]) = 0 can be rejected by a two-tailed test at [Alpha] significance level. The impact of a predictor on hazard may be null, constant over time, or time dependent. Because each polynomial spline space includes both the null model [Beta](t) = 0 and the constant model [Beta](t) = C, we can use likelihood ratio (LR) tests to discriminate between different models. Three LR tests of interest are described in Table 1 and are referred [TABULAR DATA FOR TABLE 1 OMITTED] to by their respective degrees of freedom in what follows. The 1 df test is the conventional test of no association in the PH model. The (v - 1) df test is a model-based test of the PH assumption. Notice that rejection of the hypothesis [Beta](t) = C [element of] R by the (v - 1) df test implies rejection of the hypothesis of no association [Beta](t) = 0. Finally, the v df test can be used to test the hypothesis of no association when it is believed a priori that the HR may vary over time. However, if the HR is approximately constant, then the 1 df test is more powerful than either the (v - 1) df or the v df test. If both contrast HR and time-dependent HR are a priori plausible alternatives to the null, then combining two tests may be advantageous. We propose a combined testing procedure in which the hypothesis of no association is rejected whenever the 1 df test and/or the (v - 1) df test yields a significant result. The null hypothesis [Beta](t) = 0 for all t implies the conjunction of two logically independent statements: (a) the average, over t, value of [Beta](t) equals zero and (b) [Beta](t) is constant (PH assumption). In our procedure the 1 df and the (v - 1) df tests are used to assess (a) and (b) at significance levels [[Alpha].sub.1] and [[Alpha].sub.2]. Because the two tests are independent, the overall probability of type I error is simply [[Alpha].sub.1] + (1 - [[Alpha].sub.1])[[Alpha].sub.2]. We assign the respective error rates so as to minimize the loss of power, compared to the conventional 1 df test, in the arguably most plausible case when the HR is constant. For example, to ensure the overall error rate of .05, we propose to assign [[Alpha].sub.2] = .01 for the (v - 1) df test and [[Alpha].sub.1] = .04/.99 for the 1 df test. If departures from the constant HR are minor, then the 1 df test will still offer an acceptable power. But the HR changes substantially over time, then the (v - 1) df test of the PH assumption should detect such changes even with low [[Alpha].sub.2] = .01. In our simulations this combined testing procedure was more powerful than the alternative approach based on combining the 1 df and the v df tests, two tests that are not independent. 3. SIMULATIONS 3.1 Design and Data Generation Our method was evaluated in a simulation study involving some variation of sample size, model complexity, and shape of the true HR function. In all cases data were generated assuming that the marginal distribution of the survival time is exponential with 33% random censoring. (Details of the data-generating procedure are described in the Appendix.) In the "univariate simulations," where the hazard depended on a single binary predictor (P[Z = 1] = P[Z = 0] = .5), we generated samples of size 50, 150, and 300, corresponding on average to 33, 100, and 200 deaths. In the "multivariable simulation" (n = 120, 80 deaths), the regression model included four normally N[0; 1] distributed variables: a predictor of interest (Z) and three covariates. We assumed that the effect of Z may vary over time, whereas the HR for each of the three lid covariates was a priori known to be constant in time. Two covariates (true HR 1.0 and 1.5) were independent of Z, but the third covariate (HR = 2.0) was correlated with Z (Pearson r = .7). This allowed us to assess the impact of a confounder for which HR is constant on the accuracy of the estimation and inference related to a potentially time-dependent effect of Z. For each case, data were generated under five different models for the effect of Z on hazard. The "null model" corresponds to [[Beta].sub.z](t) = 0 for all t. The shapes of the four other true HR function models are represented by solid lines in Figure 1. We refer to these models as constant, rise/fall, linear inversion (O'Quigley and Pessione 1991), and exponential decay. The rise/fall model has considerable curvature over a limited time interval and will be useful for assessing the performance of our method in the case when the true HR function is too complex to be fully recovered by low-dimensionality polynomial splines. When the results do not depend on the shape of the true HR function, only selected models are reported. We generated 4,000 random samples for the null and constant models and 1,000 in other cases. Table 2. Root Mean Squared Errors: AIC-Based Versus Fixed-ModelIn each case the range of models considered for minimum AIC selection was defined following the approach of Section 2.2. In the small-sample univariate simulations (n = 50, 33 deaths), the range of models was restricted to 2 or 3 df, whereas in all other cases models with 2-5 df were considered. Accordingly, the a priori fixed model corresponded to a quadratic polynomial (3 df) and a quadratic spline with two interior knots (5 df). In preliminary simulations we used also the 5 df model to analyze small-sample data (n = 50) and found that the accuracy of inference decreased, compared to a more parsimonious 3 df model. In preliminary univariate simulations we also considered uniformly distributed survival times as well as some variations of the data-generating procedure. The results were very similar to those reported herein. 3.2 Bias and Variance of Point Estimates Figure 1 shows that although even in small samples, mean estimates for the fixed model typically agree well with the true HR functions (solid lines), AIC model selection tends to minimize the bias, in particular when the true HR is rather simple. However, our 5 df estimates cannot accurately recover considerable local curvature of the rise-and-fall model, though the general pattern of change in correctly represented. Increasing degrees of freedom or adaptive knot placement would help in reducing bias in this case. In the linear inversion and exponential decay models, the estimates of time-dependent effects are relatively unbiased even if the predictor is strongly correlated with a covariate for which the true HR is constant (data not shown). At 20 different points in time, we computed the pointwise root mean square error (RMSE) of estimates and the resulting ratio of corresponding RMSE values for AIC-based and fixed-model estimates. Table 2 shows, for different HR shapes and sample sizes, the median of 20 pointwise ratios, which is typically below 1.0, indicating that AIC-based estimates are more accurate. Minimum AIC criterion tends to select more parsimonious models, which reduces over fitting bias and stabilizes the estimates. [TABULAR DATA FOR TABLE 3 OMITTED] 3.3 Pointwise Confidence Intervals Table 3 shows empirical coverage rates (i.e., proportion of samples in which the nominal 95% pointwise confidence interval includes the true value) in the exponential decay model. For the fixed model estimates, the rates are close to the nominal .95. Results for other HR functions are similar except for the rise-and-fall model, where the rate was low due mainly to bias (data not shown). Coverage rates for the minimum AIC models are low, particularly in large samples. This reflects the inability of conventional inference to account for additional variance due to a posteriori model selection. Asymptotics-based inference assumes a fixed model, and the effect of sampling error at the model selection stage is ignored. In hypotheses testing this results in an inflation of type I error rates. In Table 3 coverage rates at P90, where the true HR equals 1.0, are "correct," which suggests that type I error rate of the pointwise test of association (see Sec. 2.3) is close to the nominal [Alpha]. 3.4 Testing Proportional Hazards Assumption We compare the proposed LR (v - 1) df test of PH assumption, based on the fixed model, with the tests described by Nagelkerke, Oosting, and Hart (1984) and by Wei (1984). Table 4 presents the proportion of samples for which the constant HR hypothesis was rejected, at [Alpha] = .05. Results for the constant model indicate that in small samples type I error rate for our test is somewhat inflated, but with 200 deaths, error rate for the fixed model-based test is very close to the nominal .05. As expected, the test is unreliable in AIC model selection, even with large samples (error rate of .109). When the true HR changes (decay and rise-and-fall models), the empirical power of the test proposed by Nagelkerke et al. is very low. Our fixed model-based test has slightly better power than Wei's test for the exponential decay model and is substantially more powerful in the rise and fall case where the HR is a nonmonotone function of time. This suggests that the test of the PH assumption based on a flexible model may be more powerful against a wider range of alternatives than some conventional tests. Table 5 shows the proportion of samples in which. the shape of the rise and fall HR function is correctly recovered based on relative positions of [Mathematical Expression Omitted] at the selected times. In larger samples the conclusions about the general pattern of changes are quite accurate. In small samples, however, the probability of the shape of [Mathematical Expression Omitted] being "correct" is higher in samples in which the PH assumption is rejected. Thus testing PH assumption may protest an analyst against a false conclusion. Table 4. Comparison of Three Tests of PH Hypothesis: Proportion of3.5 Testing the Hypothesis of No Association The hypothesis [[Beta].sub.Z] (t) = 0 for all t was tested using (a) the conventional 1 df test and two methods designed to detect associations even if HR is not constant, (b) the supremum test described in Fleming and Harrington (1990), and (c) our combined procedure (Sec. 2.3). In each case the overall type I error rate was set to .05. Table 6 shows the changes in the empirical power due to replacing the 1 df test by the alternative methods in (b) and (c). Although the absolute power and the relative power of a given testing strategy clearly depends on the magnitude and pattern of changes in the data-generating HR model, some general conclusions can be afforded. In the inversion model, where the average, over time, of the log HR is close to zero, the power of the 1 df test is very low, so that using alternative methods is essential to detect association. In that case our procedure is considerably more powerful than the supremum test, and the difference in empirical power may be as high as 13%. Notice also that our procedure can be applied even if the model includes additional covariates, whereas the supremum test is essentially univariate. In the decay and rise-and-fall models, where the log HR remains nonnegative over the entire follow-up period, the 1 df test performs reasonably well, but its power may be still about 10% lower than that of flexible alternatives in (b) and (c). Not surprisingly, when HR is constant, the 1 df test, specifically designed for that case, maximizes the power. However, the loss of power associated with alternative procedures is very minor (about 3%) and, in our opinion, is outweighed by substantial gains in the case when the HR does change over time. Table 5. Proportion of Samples Where the Rise-and-Fall Pattern was4. EXAMPLES OF DATA ANALYSIS 4.1 Duration of Untreated Disease in Lupus Nephritis We reexamine the evidence of association between duration of untreated disease and risk of lupus fatality. Because only 35 deaths occurred, the fixed model is a simple quadratic polynomial (df = 3) (see Sec. 2.2). Figure 2 shows point estimates corresponding to the fixed model (solid curve) and to the minimum AIC selection (dashed line). Both estimates indicate that the predictive ability of disease duration decreases with increasing follow-up time. Pointwise confidence intervals for the fixed model (dotted line) suggest that during the first few years of follow-up, patients with longer duration of previous disease may be at significantly higher risk. However, the hypothesis of no association cannot be rejected according to the proposed combined testing procedure. The conventional 1 df test and the (v - 1) df test of the PH assumption based on the quadratic model yield p values of .22 and .055. The v = 3 df test of the fixed model versus the null model yields a p value of .066. On the other hand, with only 35 deaths, the power is low. In this case prior considerations might help in choosing a more efficient testing strategy. If we rejected a prior the constant HR model and focused only on testing the PH assumption, then the result could be deemed almost significant at [Alpha] = .05. This example illustrates the problems typically encountered when applying powerful techniques in small samples, although the number of deaths observed in our data is actually one of the highest among all lupus nephritis studies (Esdaile et al. 1994). There are substantive grounds to explain why the strength of the association between the duration of untreated disease and risks may decrease over time. The severity of the patient's condition at time zero (biopsy) is likely to correlate with the length of the previous period when the disease was active but not treated. However, aggressive [TABULAR DATA FOR TABLE 6 OMITTED] treatment after the biopsy gradually reduces this impact. 4.2 Prothrombin Time as a Predictor of Death in PBC For a large sample illustration, we use data on 312 patients with primary biliary cirrhosis of the liver (PBC), reported by Fleming and Harrington (1990). Median follow-up time was 8 years, and 125 deaths occurred (60% censoring). Among variables measured at time zero, prothrombin time (PT) is of particular interest to us. Fleming and Harrington reject the PH assumption for PT but include it in the Cox model while acknowledging the need to remodel the data. The model should include five other covariates that are significant at [Alpha] = .05 and for which the PH assumption holds. We use model (1) to represent the effects of each of five covariates by a constant HR and to model simultaneously the time-dependent effect of PT. Figure 3 shows our fixed-model estimate of the PT effect (quadratic spline with two interior knots, solid line), adjusted for five covariates, and its 95% pointwise confidence limits (dotted line). The minimum AIC estimate (quadratic polynomial, not shown) is very similar but decreases mono-tonically in the initial phase. All three tests described in Table 1 yield p values below .001, which clearly indicates that there is an association between PT and hazard and that its strength decreases with time. Point estimate and confidence intervals suggest that the initial PT measurement maintains some predictive ability for about 3-4 years. The dashed horizontal line in Figure 3 represents the Cox model estimate of constant log HR (.27), adjusted for the same five covariates. As expected, the constant estimate is close to the average, over follow-up time, value of the time-varying HR function. However, under the Cox model, the prognostic ability of PT is considerably underestimated for early deaths and overestimated for late mortality. Kooperberg et al. (1995) also analyzed the PBC data. They used HARE, a very powerful adaptive model that allows for stepwise selection of variables and of their possibly nonlinear or time-dependent effects. Their conclusions regarding the pattern of changes in the PT effect are very similar to ours. By providing reasonably reliable significance testing and pointwise confidence intervals, our method may complement the results of more complex analyses based on adaptive models. The prognostic utility of PT measured at time zero decreases with increasing follow-up time. One explanation could be that the PT values in individual patients change over time, so that their correlation with the initial values may gradually decrease. This might weaken the effect of the initial PT even if the actual impact of updated PT remained constant over time. We cannot reject this explanation without measuring PT repeatedly and estimating the effect of the resulting time-dependent covariate. However, this explanation does not apply to fixed-in-time covariates, such as duration of lupus prior to biopsy. 5. DISCUSSION By using regression splines that share several properties of parametric models to estimate changes in the hazard ratio over time, we develop a method that requires only tools used conventionally in maximum likelihood estimation. We have proposed a method for generating data suitable for simulations aimed at evaluating maximum partial likelihood estimates. The results of simulations indicate that unless there is substantial local curvature, our estimates of HR functions are relatively unbiased even in small samples. AIC-based model selection may further improve the accuracy of the estimates. However, the AIC-based approach is less useful when inference is of primary interest, as standard large-sample theory cannot account for the effect of sampling error at the model selection stage. If the model is fixed a priori, then the inference is accurate unless the model is biased or data are very scarce. The reliability of pointwise confidence intervals may enhance the interpretation of results. Our model-based test of the PH assumption is quite powerful, especially when the HR is a nonmonotone function of time. We proposed a procedure for combined testing of the hypothesis of no association in the case when both constant and time-varying HR are a priori plausible. Compared to the Cox model-based test, this may offer a substantial gain in power when HR changes over time, whereas the loss of efficiency in the constant HR case is minor. Our method allows for simultaneous estimation of the effects of variables for which the PH assumption holds and those for which it is violated. In many applications there may be a priori interest in modeling and testing the time-dependent effect of a single predictor when adjusting for covariates whose effects are a priori known to be constant over time. For example, a study may focus on the effectiveness of a new treatment or the impact of a newly postulated aetiologic factor while adjusting for covariates well studied in the past. In the multivariable simulations, where time dependence of the effect was restricted to a single predictor, we found that estimation and inference are reasonably accurate even if the predictor is strongly correlated with a covariate whose effect is constant over time. However, a more complex case, in which it is not known a priori which among the effects of several correlated variables are time-varying and which are constant, remains to be studied. The issue of simultaneous estimation of nonlinear and time-dependent effects of the same predictor also warrants further study. This would require modeling the HR, corresponding to different combinations of time and predictor values, as a flexible bivariate response surface. If the predictor has strong impact, so that almost all individuals with high values die early, then there may be not enough data to estimate a portion of the response surface. Gray (1992) developed elegant spline-based tools to model nonlinear or time-dependent effects of covariates but did not discuss how to represent both types of effects for the same covariate. Although HARE allows for simultaneous modeling of these effects, the impact of adaptive model building on the variance of the estimates remains to be quantified (Kooperberg et al. 1995). Quantifying the variance inflation due to a posteriori model selection is of more general interest given the increasing popularity of adaptive methods in conventional and nonparametric regression, but no satisfactory analytical solutions have been proposed (Hurvich and Tsai 1990). Until the problem of combining adaptive modeling with reliable inference is solved, it will be necessary to strike a compromise between the desires to model all potentially interesting features of the data and to make reliable inference about the effects being modeled. APPENDIX: DATA GENERATION ALGORITHM In simulations where hazard is assumed to depend on covariates, it is customary to first generate covariate values [x.sub.i] and then generate survival times from respective conditional survival functions S(t[where][x.sub.i]) = exp[-[Lambda](t[where][x.sub.i])]. However, in our model it is difficult to specify the conditional cumulative hazard functions A(t[where]x). This difficulty arises because in general, given x [not equal to] 0, the integral [Lambda](t[where]x) = [integral of] exp [[Beta](u)x][[Lambda].sub.0] (u) du between limits t and 0 cannot be solved analytically even in the simple case of an exponential baseline hazard [[Lambda].sub.0] (u) = A for all u. (Exceptions that we are aware of are when log hazard ratio function [Beta](u) is linear and when [Beta](u) = ln(a + u).) Thus neither S([t.sub.i][where][x.sub.i]) nor f([t.sub.i][where][x.sub.i]) can be evaluated without numerical integration. This will make conventional data generation procedures computationally expensive. Several strategies could be investigated, including a possibility of extending to the censored data case the acceptance/rejection method proposed by Lewis and Shedler (1979). However, the goal of our simulations is restricted to evaluating the estimates of hazard ratio obtained by maximizing partial likelihood that does not require parameterization of the baseline or conditional distribution. Therefore, we propose a computationally inexpensive data generation procedure that avoids specifying conditional distributions while ensuring that the data are generated according to the same partial likelihood on which estimation is based. The procedure involves generating survival times from a marginal distribution (rather than from distributions conditional on specific covariate values) and then assigning these times to independently generated covariate values by sampling from a permutation distribution. More specifically, we first generate an n-tuple of failure/censoring times [Mathematical Expression Omitted] such that [t.sub.1] [less than or equal to] . . . [less than or equal to] [t.sub.n] from the marginal distribution and generate, independently of times [t.sub.i], an n-tuple of covariate values [Mathematical Expression Omitted]. Then, given a function [Beta](t), we form a data set of n "observations," [Mathematical Expression Omitted], by sampling a permutation [Sigma] from the set of all n! possible permutations according to the probability [Mathematical Expression Omitted]. (A.1) The algorithm includes the following steps: 1. Generate covariate values, [x.sub.1], . . ., [x.sub.n] from d(x). 2. Generate iid [Mathematical Expression Omitted] from f(T) and iid [Mathematical Expression Omitted] from g(C). 3. For i = 1 to n: [t.sub.i] = min{[T.sub.i], [C.sub.i]}. If [t.sub.i] = [T.sub.i], then [[Delta].sub.i] = 1; otherwise, [[Delta].sub.i] = 0. 4. Sort [Mathematical Expression Omitted] so that [t.sub.i] [less than or equal to] [t.sub.i+1]. 5. R = {1, 2, . . ., n} and data = [null set]. 6. For i = 1 to n: a. If [[Delta].sub.i] = 0, sample j from R with equiprobability. b. If [[Delta].sub.i] = 1, sample j from R with probability exp[[Beta]([t.sub.i])[x.sub.j]]/[summation over k [element of] R] exp[[Beta]([t.sub.i])[x.sub.k]]. c. [Sigma](i) = j, add {[t.sub.i], [[Delta].sub.i], [x.sub.[Sigma](i)])} to data. d. R = R - {j}. Although sampling from a permutation distribution means that the n observations are not independent, the dependence of observations become negligible as sample size increases. In our simulations the distributions of failure times f(T) and censoring times g(C) are exponential with means 1 and 2, which yields 33% censoring. The choices for covariate distribution d(x) and log hazard ratio function [Beta](t) are discussed in Section 3.1. REFERENCES Abrahamowicz, M., Ciampi, A., and Ramsay, J. O. (1992), "Nonparametric Density Estimation for Censored Survival Data: Regression-Spline Approach," Canadian Journal of Statistics, 20, 171-185. Akaike, H. (1974), "A New Look at Statistical Model Identification," IEEE Transactions on Automatic Control, AC-19, 716-723. Andersen, P. K., and Gill, R. D. (1982), "Cox's Regression Model for Counting Processes: A Large Sample Study," The Annals of Statistics, 10, 1100-1120. Cox, D. R. (1972), "Regression Models and Life Tables" (with discussion), Journal of the Royal Statistical Society, Set. B, 34, 187-220. de Boor, C. (1978), A Practical Guide to Splines, New York: Springer. Esdaile, J. M., Abrahamowicz, M., MacKenzie, T., Hayslett, J.P., and Kashgarian, M. (1994), "The Time-Dependence of Long-Term Prediction in Lupus Nephritis," Arthritis and Rheumatism, 37, 359-368. Fleming, T., and Harrington, D. (1990), Counting Processes and Survival Analysis, New York: Wiley. Gray, R. J. (1992), "Flexible Methods for Analyzing Survival Data Using Splines, With Applications to Breast Cancer Prognosis," Journal of the American Statistical Association, 87, 942-951. Hastie, T. J., and Tibshirani, R. J. (1993), "Varying-Coefficient Models" (with discussion), Journal of The Royal Statistical Society, Ser. B, 55, 757-796. Hess, K. R. (1994), "Assessing Time-By-Covariate Interactions in Proportional Hazards Regression Models Using Cubic Spline Functions," Statistics in Medicine, 13, 1045-1062. Hurvich, C. M., and Tsai, C. L. (1990), "The Impact of Model Selection on Inference in Linear Regression," The American Statistician, 44, 214-217. Kooperberg, C., Stone, C. J., and Truong, Y. K. (1995), "Hazard Regression," Journal of the American Statistical Association, 90, 78-94. Lewis, P. A., and Shedler, G. S. (1979), "Simulation of Nonhomogeneous Poisson Processes by Thinning," Naval Research Logistics Quarterly, 26, 403-413. Lin, D. Y., and Wei, L. J. (1991), "Goodness-of-Fit Tests for the General Cox Regression Model," Statistica Sinica, 1, 1-17. MacKenzie, T. (1993), "Modeling a Time-Dependent Hazard Ratio With Regression Splines," master thesis, McGill University, Dept. of Mathematics and Statistics. Nagelkerke, N. J. D., Oosting, J., and Hart, A. A. M. (1984), "A Simple Test for Goodness of Fit of Cox's Proportional Hazards Model," Biometrics, 40, 483-486. O'Quigley, J., and Pessione, F. (1991), "The Problem of a Covariate-Time Qualitative Interaction in a Survival Study," Biometrics, 47, 101-115. Ramsay, J. O. (1988), "Monotone Regression Splines in Action" (with discussion), Statistical Science, 3, 425-461. Sleeper, L. A., and Harrington, D. P. (1990), "Regression Splines in the Cox Model With Application to Covariate Effects in Liver Disease" Journal of the American Statistical Society, 85, 941-949. Verweij, J. M., and Houwelingen, H. C. (1995), "Time-Dependent Effects of Fixed Covariates in Cox Regression" Biometrics, 51, 1550-1556. Wei, L. J. (1984), "Testing Goodness of Fit for Proportional Hazards Model With Censored Observations," Journal of the American Statistical Society, 79, 649-652. Zucker, D. M., and Karr, A. F. (1990), "Nonparametric Survival Analysis With Time-Dependent Covariate Effects: A Penalized Partial Likelihood Approach" The Annals of Statistics, 18, 329-353. Michal Abrahamowicz is Associate Professor, Department of Epidemiology and Biostatistics, McGill University, Montreal, Quebec, Canada H3A 1A2, and Division of Clinical Epidemiology, Department of Medicine, Montreal General Hospital, Montreal, Quebec, Canada H3G 1A4. Todd MacKenzie is a Ph.D. candidate, Department of Mathematics and Statistics, McGill University, Montreal, Quebec, Canada H3A 2K6. John M. Esdaile is Professor, Department of Medicine, McGill University, Montreal, Quebec, Canada H3G 1Y6. M. Abrahamowicz and J. M. Esdaile are Senior Research Scholars of the Fonds de Recherche sur la Sante du Quebec. This work was supported in part by the Natural Sciences and Engineering Research Council, Dairy Farmers of Canada, and the Canadian Arthritis Society. The authors are indebted to Roxane du Berger, Antonio Ciampi, Sungsub Choi, Ming-Gao Gu, Jim Ramsay and three anonymous reviewers for their helpful comments. -1- |
To continue reading this publication, you must have a Questia Subscription.Questia provides the world's largest online library of scholarly books and journal articles, with integrated footnote and bibliography tools, highlighting, note taking and book marking. With a Questia subscription, you'll have access to the full text of more than 67,000 books and 1.5 million articles.