
# Computational Methods for Multiplicative Intensity Models Using Weighted Gamma Processes: Proportional Hazards, Marked Point Processes, and Panel Count Data

By: Ishwaran, Hemant; James, Lancelot F. | Journal of the American Statistical Association, March 2004


1. INTRODUCTION

Aalen (1975, 1978) developed a unified theory for nonparametric inference in multiplicative intensity models from a frequentist perspective. This treatment included, for example, the life-testing model, the multiple decrement model, birth and death processes, and branching processes. A Bayesian treatment for the real line was given by Lo and Weng (1989), who modeled hazard rates in the multiplicative intensity model as mixtures of a known kernel $k$ with a finite measure $\mu$ modeled as a weighted gamma measure on the real line. That is, a hazard $r$ is modeled as

$$r(x \mid \mu) = \int_{\mathbb{R}} k(x, v)\, \mu(dv). \qquad (1)$$

Lo and Weng (1989) showed that using arbitrary kernels provides the user with a great deal of flexibility; for instance, the choice of the kernel $k(x, v) = I\{v \le x\}$ gives monotone increasing hazards as considered by Dykstra and Laud (1981), whereas the kernels $k(x, v) = I\{|x - a| \ge v\}$ and $k(x, v) = I\{|x - a| \le v\}$ give U-shaped hazards (with minimum and maximum at $a$, respectively) similar to those of Glaser (1980), and normal density kernels $k(x, v) = \exp\{-(x - v)^2/(2\tau^2)\}/\sqrt{2\pi\tau^2}$ can be used to estimate hazards without shape restriction.
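As a concrete illustration (a sketch, not from the paper; the atoms, weights, and constants below are hypothetical), the shape restrictions implied by these kernel choices can be seen by evaluating the mixture (1) for a discrete measure $\mu = \sum_j w_j \delta_{v_j}$:

```python
import numpy as np

# Sketch: evaluate the hazard mixture r(x | mu) = sum_j w_j * k(x, v_j)
# for a discrete measure mu = sum_j w_j delta_{v_j}.  The kernels below
# follow the choices discussed in the text; atoms, weights, a, and tau
# are hypothetical.

def hazard(x, atoms, weights, kernel):
    """r(x | mu) for a discrete mu with the given atoms and weights."""
    return sum(w * kernel(x, v) for v, w in zip(atoms, weights))

inc = lambda x, v: float(v <= x)                  # monotone increasing
a = 2.0
ushape = lambda x, v: float(abs(x - a) >= v)      # U-shaped, minimum at a
tau = 0.5
normal = lambda x, v: (np.exp(-0.5 * (x - v) ** 2 / tau ** 2)
                       / np.sqrt(2 * np.pi * tau ** 2))  # unrestricted

atoms = [0.5, 1.0, 1.5]     # hypothetical support points v_j
weights = [0.2, 0.3, 0.5]   # hypothetical masses w_j (mu need not sum to 1)

# With the indicator kernel the hazard is a nondecreasing step function:
for x in (0.7, 1.2, 2.0):
    print(x, hazard(x, atoms, weights, inc))
```

Only the atoms with $v_j \le x$ contribute under the indicator kernel, so the hazard steps up as $x$ crosses each atom.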

In this article we develop a general approach to Bayesian inference for hazard (intensity) rates in nonparametric and semiparametric multiplicative intensity models by incorporating kernel mixtures of spatial weighted gamma process priors. This approach extends the work of Lo and Weng (1989) and Dykstra and Laud (1981) from a nonparametric setting on the real line to the nonparametric and semiparametric settings over general spaces and applies to the nonparametric multiplicative intensity models considered by Aalen (1975, 1978) and their semiparametric extensions developed by Andersen, Borgan, Gill, and Keiding (1993, chap. III). Models that fall within this framework that have been considered using gamma and weighted gamma processes from a Bayesian perspective include Markov models used in survival analysis subject to certain types of censoring, filtering, and truncation (Arjas and Gasbarra 1994; Laud, Smith, and Damien 1996; Gasbarra and Karia 2000), as well as Poisson point process models used in reliability (Kuo and Ghosh 1997), forest ecology (Wolpert and Ickstadt 1998a), and health exposure analysis (Best, Ickstadt, and Wolpert 2000). Another related application was given by Ibrahim, Chen, and MacEachern (1999) who used a weighted gamma process to select variables in Cox proportional hazards models.

A major contribution of this article is to develop a unified computational treatment of these problems from a Bayesian perspective. As shown by Lo and Weng (1989) (see also Lo, Brunner, and Chan 1996), the posterior for multiplicative intensity models under weighted gamma processes shares common structural features with posterior distributions for models subject to the Dirichlet process (i.e., Dirichlet process mixture models). (See Lo 1984 for background and posterior descriptions of Dirichlet process mixture models.) Recently, James (2003) extended these results to an abstract semiparametric setting (see Sec. 3), thus providing an explicit calculus for relating posteriors for spatial semiparametric intensity models to posteriors for Dirichlet process mixture models. This equivalence, in combination with our use of a weighted gamma process approximation (Sec. 3), allows us to use efficient Dirichlet process computational procedures and to approximate the laws of general functionals of interest. An important aspect is that we avoid ad hoc methods used to approximate likelihoods. For example, in survival analysis problems we do not discretize time, as is often done to simplify computations. Another benefit of using Dirichlet process methods is that they are well understood and have a rich literature that can be drawn on. For example, the number of iterations and burn-in iterations suggested for Gibbs sampling Dirichlet process mixture models, and other such practical experience, can all be applied here. Moreover, computational strategies that have been developed to handle problems such as nonconjugacy and slow Markov chain mixing all have natural analogs in our setting. Handling nonconjugacy is especially relevant, because it allows us to deal with complicated terms that can appear in our posteriors. Slow mixing is also an important consideration; here we deal with it by adapting acceleration steps used to improve mixing in Dirichlet process problems.
Computational methods that we consider include Polya urn Gibbs samplers (Escobar 1988, 1994; Escobar and West 1995) and blocked Gibbs sampling methods (Ishwaran and Zarepour 2000; Ishwaran and James 2001).

The article is organized as follows. Section 2 presents the multiplicative intensity likelihood and defines the weighted gamma process. Applications to proportional hazards, nonhomogeneous Poisson processes, and marked processes used in spatial regression models are discussed. Sections 3 and 4 present the details of our Polya urn Gibbs sampler and blocked Gibbs sampler. The methods are illustrated by applications to a long-term follow-up study of heart rate recovery and mortality (Sec. 3) and to simulations from a nonhomogeneous Poisson spatial process (Sec. 4). Section 2 also discusses applications to recurrent event data arising from conditionally independent nonhomogeneous Poisson processes. Section 5 illustrates this extension to the panel count data problem (Kalbfleisch and Lawless 1985; Sun and Kalbfleisch 1995), a generalization of the interval-censoring problem. Using a Poisson process model discussed by Wellner and Zhang (2000), we show how one can obtain smoothed kernel estimates for the intensity, thus providing a novel Bayesian nonparametric approach complementary to the iterative convex minorant algorithm used by frequentists (Wellner and Zhang 2000).

2. SEMIPARAMETRIC MULTIPLICATIVE INTENSITY LIKELIHOODS

Throughout, we work with a multiplicative intensity likelihood of the form

$$L(\mu, \theta) = \exp\left\{-\sum_{i=1}^{n+m} \int_S \left[\int_{\mathcal{X}} Y_i(x)\, k_i(x, v, \theta)\, \eta(dx)\right] \mu(dv)\right\} \times \prod_{i=1}^{n} \int_S k_i(X_i, v_i, \theta)\, \mu(dv_i), \qquad (2)$$

where $\mu$ is a finite measure over a measurable space $(S, \mathcal{A})$, the value $\theta$ is a Euclidean parameter with parameter space $\Theta$ (in the applications considered here, $\Theta = \mathbb{R}^d$), and $\mathcal{X}$ is the sample space for the data $X_1, \ldots, X_{n+m}$. (In right-censored survival analysis problems, $n$ and $m$ denote the sample sizes for the failure and censored times; however, the values for $n$ and $m$ vary depending on the specific application.) The functions $k_i(x, v, \theta)$ in (2) are known nonnegative kernels that are jointly measurable in $(x, v, \theta)$ and integrable with respect to $\mu$ and $\eta$, where $\eta$ is some fixed $\sigma$-finite measure. Their role is to smooth the unknown hazard (intensity) function via a mixing approach similar to (1).

Censoring and more general types of filtration are captured by the functions $Y_i(x)$ appearing in (2), which in the counting process literature are usually called predictable functions. In many applications in event history analysis, $Y_i(x)$ records whether an individual is still at risk just before time $x$. A key point is that using predictable functions allows the likelihood (2) to retain the same structure under different types of filtration, thus presenting a unified approach for studying the multiplicative intensity model (see Andersen et al. 1993, chap. III). In right censoring (see Sec. 2.2), $X_1, \ldots, X_n$ denote observed failure times and $X_{n+1}, \ldots, X_{n+m}$ denote censored times. In the case of Poisson spatial processes, considered in Section 2.3, all observations are observed, so that $m = 0$. Setting $Y_i(x) = 1/n$ yields a likelihood (2) for a nonhomogeneous Poisson spatial process based on a single realization. More complex marked processes are also possible, as discussed in Section 2.3. Our methods can also be extended to more general likelihoods that arise from conditionally independent nonhomogeneous Poisson processes. We motivate this idea briefly in Section 2.4 for recurrent event data, and in Section 5 we illustrate the approach in depth for panel count data.

2.1 Missing Spatial Data and Weighted Gamma Processes

Let $v = (v_1, \ldots, v_n)$. Note that the integrals on the right side of (2) index $v$ by a subscript $i$, even though this is notationally redundant. This is done to emphasize that $v$ is thought of as missing spatial data. The idea is to augment the parameter space and the likelihood function (2) to include these missing data. Placing a prior on the parameters $\theta$, $v$, and $\mu$ induces a posterior, which we can then use to compute various quantities, including estimates for the smoothed intensity function. The key is the prior for $(v, \mu)$, which is assumed to have a joint product measure such that $v_1, \ldots, v_n$, given $\mu$, represent $n$ conditionally independent realizations from the random finite measure $\mu$, where $\mu$ has a weighted gamma process law. Let $\pi(d\theta)$ denote our prior for $\theta$. We assume a prior on $(\theta, v, \mu)$ with the joint product measure

$$\pi(d\theta) \prod_{i=1}^{n} \mu(dv_i)\, \mathcal{G}(d\mu \mid \alpha, \beta).$$

The expression $\mathcal{G}(\cdot \mid \alpha, \beta)$ denotes a weighted gamma process law with shape parameter $\alpha$ (a finite measure over $S$) and scale parameter $\beta$ (a positive integrable function over $S$). That is, for each measurable set $A \in \mathcal{A}$, the random measure $\mu$, defined by

$$\mu(A) = \int_A \beta(s)\, \gamma_\alpha(ds),$$

is said to have a $\mathcal{G}(\cdot \mid \alpha, \beta)$ law, where $\gamma_\alpha$ is a gamma process over $S$ with shape measure $\alpha$. We call $\gamma_\alpha$ a gamma process with shape $\alpha$ if $\gamma_\alpha(A)$ is a gamma$(\alpha(A))$ random variable with mean $\alpha(A)$ and variance $\alpha(A)$.
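Because a gamma process has independent increments over disjoint sets, a draw of $\mu$ can be approximated on a fine grid by giving each cell an independent gamma mass and scaling it by $\beta$. The following sketch is an illustration, not the authors' construction; it assumes $S = [0, 1]$, $\alpha$ proportional to Lebesgue measure, and a made-up $\beta$:

```python
import numpy as np

# Sketch: approximate draw from a weighted gamma process
# mu(A) = int_A beta(s) gamma_alpha(ds) on S = [0, 1].
# Assumptions (not from the paper): alpha = c * Lebesgue measure,
# beta(s) = 1 + s.  A gamma process has independent increments, so
# each grid cell A_j receives an independent Gamma(alpha(A_j), 1) mass.

rng = np.random.default_rng(0)

c = 5.0                              # total shape mass alpha(S)
beta = lambda s: 1.0 + s             # hypothetical positive weight function
J = 1000                             # number of grid cells
edges = np.linspace(0.0, 1.0, J + 1)
mids = 0.5 * (edges[:-1] + edges[1:])

alpha_cell = c * np.diff(edges)          # alpha(A_j), here c / J per cell
gamma_mass = rng.gamma(shape=alpha_cell) # gamma process increments
mu_mass = beta(mids) * gamma_mass        # weighted gamma measure on cells

# mu(S) is random; its mean is int_S beta d(alpha) = c * 3/2 = 7.5 here.
print(mu_mass.sum())
```

The cell masses `mu_mass` can then stand in for $\mu(dv)$ when numerically evaluating mixture integrals such as (1).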

Remark 1. The gamma process was described in an early paper by Moran (1956). A more complete description was given by Ferguson and Klass (1972), who discussed gamma processes over $\mathbb{R}$, and Kingman (1975), who provided a description over arbitrary measurable spaces. Lo (1982) described weighted gamma processes over Polish spaces, and Dykstra and Laud (1981) provided a description over $\mathbb{R}$.

It is important to keep in mind that $\mu$ is an arbitrary finite measure and thus is not necessarily a probability measure. Because of this, what the resulting posterior for (2) will look like is not at all obvious. For example, it is not even clear whether the posterior values for $v_1, \ldots, v_n$ can be associated with a proper probability distribution, which is something we would need if we wanted to use Monte Carlo methods to approximate posterior quantities related to $v$. In familiar Dirichlet process mixture models, these kinds of connections follow automatically, because we always work with proper probability measures. It might be guessed, though, that these same features apply here, due to the intimate connection between the gamma process and the Dirichlet process. It is well known that if $\mathcal{P}(\cdot \mid \alpha)$ is a Dirichlet process law with a finite measure parameter $\alpha$ (Ferguson 1973, 1974), then $\mathcal{P}(\cdot \mid \alpha)$ can be expressed as a normalized gamma process; that is, the random probability measure

$$P(\cdot) = \frac{\gamma_\alpha(\cdot)}{\gamma_\alpha(S)} \qquad (3)$$

has a $\mathcal{P}(\cdot \mid \alpha)$ law. Thus it stands to reason that there must be an intimate connection between Dirichlet process mixture models and multiplicative intensity posteriors derived from weighted gamma process priors. This is, of course, only an informal argument. The exact details are more subtle, as we spell out in Section 3. Before going into these details, however, we provide some motivating examples.
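The identity (3) can be checked numerically over any finite partition: normalized gamma increments have a Dirichlet distribution, which is the defining finite-dimensional property of the Dirichlet process. A small sketch (the partition masses are made up):

```python
import numpy as np

# Sketch illustrating (3): over a finite partition A_1, ..., A_K,
# the normalized increments gamma_alpha(A_k) / gamma_alpha(S) are
# Dirichlet(alpha(A_1), ..., alpha(A_K)) distributed.

rng = np.random.default_rng(1)

alpha_mass = np.array([0.5, 1.0, 2.5])   # alpha(A_k) over a 3-set partition
gamma_inc = rng.gamma(shape=alpha_mass)  # independent gamma_alpha(A_k)
P = gamma_inc / gamma_inc.sum()          # normalized gamma process, eq. (3)

# P is one Dirichlet(0.5, 1.0, 2.5) draw; check empirically that the
# mean over many draws matches alpha_mass / alpha_mass.sum().
draws = rng.gamma(shape=alpha_mass, size=(100_000, 3))
P_draws = draws / draws.sum(axis=1, keepdims=True)
print(P_draws.mean(axis=0))   # approximately alpha_mass / 4
```

This is exactly why posterior structure for weighted gamma process priors can be related back to Dirichlet process computations.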

2.2 Proportional Hazards

The Cox regression model (Cox 1972) is an important example of a multiplicative intensity model of the form (2). Under independent right censoring, the Cox proportional hazards likelihood may be written as

$$\prod_{i=1}^{n+m} \left(r_0(T_i \mid \mu) \exp(\theta^T Z_i)\right)^{\Delta_i} \exp\left\{-\int_{\mathbb{R}} Y_i(t)\, r_0(t \mid \mu) \exp(\theta^T Z_i)\, \eta(dt)\right\}, \qquad (4)$$

where the $T_i$ are failure times, $\Delta_i = I\{T_i \le C_i\}$ is a 0-1 indicator recording whether the failure time $T_i$ was observed before the censoring time $C_i$, the $Z_i$ are covariate vectors with parameter vector $\theta$, and $Y_i(t) = I\{T_i \ge t\}$ is the predictable function. To produce a smoothed estimate for the hazard, we model the unknown baseline hazard function $r_0(t \mid \mu)$ as a mixture of the form

$$r_0(t \mid \mu) = \int_S k_0(t, v)\, \mu(dv), \qquad (5)$$

where $k_0(t, v)$ is some prespecified kernel. Typically, $S \subseteq \mathbb{R}^+$, with the exact choice depending on the selected kernel and the specific problem. Now, to express (4) in the form of (2), let $X_i = T_i$ for $i = 1, \ldots, n + m$, and set $\mathcal{X} = \mathbb{R}$. If $T_1, \ldots, T_n$ are the uncensored observed failure times, then (4) can be expressed in the form of (2) if

$$k_i(t, v, \theta) = \exp(\theta^T Z_i)\, k_0(t, v). \qquad (6)$$

Observe that the underlying hazard is modeled so that it is a kernel mixture,

$$r(t \mid Z, \theta, \mu) = \exp(\theta^T Z) \int_S k_0(t, v)\, \mu(dv).$$

Kalbfleisch (1978) presented one of the earliest Bayesian approaches for estimating the Cox model based on a gamma process. In this method the baseline cumulative hazard function is modeled as a gamma process. Time is then discretized by chopping the time axis into a collection of nonoverlapping intervals, which, due to the independent increments of the gamma process, yields a cumulative hazard that can be modeled as a collection of independent gamma random variables (see also Burridge 1981). In more recent work, Ibrahim et al. (1999) took a different approach and modeled the baseline hazard function as a mixture (5), where $\mu$ has a weighted gamma process prior. This method applies to kernels of the form

$$k_0(t, v) = I\{v \le t\} \qquad (7)$$

and leads to smoothed increasing hazard functions similar to those of Dykstra and Laud (1981). To sample the posterior, Ibrahim et al. (1999) constructed a refined partition of the time axis and worked with the resulting approximate likelihood of (4) (see also Laud et al. 1996).
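To make the role of kernel (7) concrete, here is a small sketch (not the authors' sampler; the atoms, weights, coefficients, and covariates are all made up) showing that a discrete $\mu$ combined with (7) yields a nondecreasing step-function baseline hazard, which the Cox factor then scales:

```python
import numpy as np

# Sketch: with kernel (7), k_0(t, v) = 1{v <= t}, and a discrete
# mu = sum_j w_j delta_{v_j}, the baseline hazard (5) becomes the
# nondecreasing step function r_0(t) = sum_{v_j <= t} w_j, and the
# Cox hazard multiplies it by exp(theta' Z).

v = np.array([0.8, 1.5, 3.0])    # hypothetical atoms of mu
w = np.array([0.1, 0.25, 0.4])   # hypothetical masses

def r0(t):
    """Baseline hazard under kernel (7): nondecreasing in t."""
    return w[v <= t].sum()

def cox_hazard(t, theta, z):
    """r(t | Z, theta, mu) = exp(theta' Z) * r_0(t)."""
    return np.exp(theta @ z) * r0(t)

theta = np.array([0.5, -0.2])    # hypothetical regression coefficients
z = np.array([1.0, 2.0])         # hypothetical covariates

# The baseline hazard steps up at the atoms 0.8, 1.5, and 3.0:
print([round(r0(t), 3) for t in (0.5, 1.0, 2.0, 3.5)])
```

Replacing the indicator with a smooth kernel in `r0` gives the unrestricted smoothed hazards discussed below.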

However, in Section 3 we take a different approach, showing, by exploiting an equivalence to Dirichlet process mixture models, that the Cox posterior under a weighted gamma process can be sampled using a Polya urn Gibbs sampler without requiring any approximation to the likelihood or the prior. This is especially advantageous with large datasets containing, say, thousands of observations (see the example of Sec. 3.3), where with a discretized approach it becomes tricky to select a suitably refined partition of the time axis while keeping computations manageable. Moreover, this approach also applies to more general kernels, and allows us to estimate smoothed hazard functions with or without imposed shape restrictions.

2.3 Poisson Process Spatial Regression Models

Lo and Weng (1989) discussed the use of a weighted gamma process for estimating the intensity of a nonhomogeneous Poisson process (see also Kuo and Ghosh 1997). This approach also falls within our framework. Suppose that $X_1, \ldots, X_n$ are the observed points from one realization of a Poisson random measure, $\mathrm{PRM}(\Lambda)$, with mean intensity $\Lambda$. Assume that the derivative of $\Lambda$ exists; that is,

$$\Lambda(dx \mid \theta, \mu) = \eta(dx) \int_S k_0(x, v)\, \mu(dv),$$

where $k_0$ is some positive integrable kernel but $\mu$ is unknown. The likelihood function is

$$L(\mu) = \exp\left\{-\int_S \left[\int_{\mathcal{X}} k_0(x, v)\, \eta(dx)\right] \mu(dv)\right\} \times \prod_{i=1}^{n} \int_S k_0(X_i, v_i)\, \mu(dv_i).$$

This is equivalent to (2) after setting $k_i(x, v, \theta) = k_0(x, v)$, $Y_i(x) = 1/n$, and $m = 0$. (See Snyder and Miller 1991, chap. 2.5, for background on multidimensional Poisson processes and their likelihoods.)
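For intuition, this likelihood is easy to evaluate once $\mu$ is discrete: the exponential factor contributes the compensator $\int_{\mathcal{X}} \lambda(x)\, \eta(dx)$ and the product contributes $\sum_i \log \lambda(X_i)$, where $\lambda(x) = \int_S k_0(x, v)\, \mu(dv)$. A sketch under made-up choices (the Gaussian kernel, atoms, weights, and observed points are all hypothetical):

```python
import numpy as np

# Sketch: nonhomogeneous Poisson log-likelihood implied by the display
# above, log L(mu) = -int lam(x) eta(dx) + sum_i log lam(X_i), with
# lam(x) = int_S k_0(x, v) mu(dv), for a discrete mu = sum_j w_j delta_{v_j}
# on X = S = [0, 1] and eta = Lebesgue measure.

tau = 0.1

def k0(x, v):
    """Gaussian kernel, as in the unrestricted case of Section 1."""
    return np.exp(-0.5 * (x - v) ** 2 / tau ** 2) / np.sqrt(2 * np.pi * tau ** 2)

v = np.array([0.3, 0.7])   # hypothetical atoms of mu
w = np.array([2.0, 3.0])   # hypothetical masses

def lam(x):
    """Mixture intensity lam(x) = sum_j w_j k_0(x, v_j)."""
    return sum(wj * k0(x, vj) for wj, vj in zip(w, v))

def loglik(points, grid_size=2000):
    """log L(mu): midpoint-rule compensator plus log-intensity at the data."""
    dx = 1.0 / grid_size
    mids = (np.arange(grid_size) + 0.5) * dx
    compensator = lam(mids).sum() * dx        # approximates int_0^1 lam
    return -compensator + np.log(lam(np.asarray(points))).sum()

X_obs = [0.25, 0.32, 0.68, 0.71, 0.74]        # made-up observed points
print(loglik(X_obs))
```

Points clustered near the atoms of $\mu$ raise the log-likelihood, while the compensator penalizes total intensity mass.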

In many statistical settings, in addition to the locations [X.sub.1],..., [X.sub.n] of the observed points from the point process, covariate information [Z.sub.1],..., [Z.sub.n] is also observed. In such studies the goals are to discover the underlying mechanism producing the spatial counts and to explore its dependence on covariates. Such problems can be naturally subsumed within our multiplicative intensity framework, providing an important extension to the method of Lo and Weng (1989) and leading to what can broadly be considered a class of semiparametric Poisson spatial regression models.

A good illustration of the potential of such models is a spatial setting in which count data have been aggregated at a macroregional level. Disease mapping and aggregate spatial epidemiology studies that analyze count data are good examples. In such studies the data comprise the observed counts $N_i$ of a rare event, such as an incident of a rare disease or a death, within a particular fixed geographical region $R_i$, for $i = 1, \ldots, m$. There may also