Testing for Overdispersion in Poisson and Binomial Regression Models

Count data analyzed under a Poisson assumption, or data in the form of proportions analyzed under a binomial assumption, often exhibit overdispersion. Breslow (1984), Brillinger (1986), Lawless (1987a), and McCullagh and Nelder (1989, Sec. 6.2) discuss the analysis of count data when extra-Poisson variation is present; corresponding analyses for data in the form of proportions are given in Crowder (1978, 1985) and Williams (1982). It is desirable to use a model that allows for extra-Poisson or extra-binomial variation if we are interested primarily in inference concerning regression parameters and if the situation is one in which overdispersion routinely occurs. Often, however, the Poisson and binomial models remain valid; because of the simplicity and appeal of these models, it is of real interest to ascertain when they apply.

One way to test for extra-Poisson or extra-binomial variation is to fit a more comprehensive model that includes the Poisson or binomial as a special case, and then to test for a reduction to the simple model using, for example, a likelihood ratio test. This approach may give misleading results. Lawless (1987a) notes that in certain circumstances the asymptotic distributions used with these tests may be unreliable, as they tend to underestimate the evidence against the base model. Score tests for overdispersion have also been proposed. With these tests we may fit the Poisson or binomial model as a first step in the model-building process and then test for overdispersion. Score tests for detecting extra-Poisson variation have been discussed by Cameron and Trivedi (1986), Collings and Margolin (1985), Dean and Lawless (1989), and Fisher (1950); tests for extra-binomial variation have been presented by Tarone (1979) and Prentice (1986).
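As a rough illustration of the "fit the simple model first, then test" workflow, the following sketch computes a score-type statistic for extra-Poisson variation in a single sample. It is not taken from this excerpt. It assumes an intercept-only Poisson model, so that every observation shares the fitted mean (the sample mean), and it assumes the statistic sum{(y_i - mu)^2 - y_i} / sqrt(2 * sum(mu^2)), a form that appears in the score-test literature for count data, referred to a standard normal distribution. The function names and the counts are hypothetical.

```python
import math


def poisson_overdispersion_score(y):
    """Score-type statistic for extra-Poisson variation in a single sample.

    Assumption of this sketch: an intercept-only Poisson model, so the
    fitted mean of every observation is the sample mean. Large positive
    values point toward overdispersion.
    """
    n = len(y)
    mu = sum(y) / n  # Poisson maximum likelihood estimate of the common mean
    if mu <= 0:
        raise ValueError("sample mean must be positive")
    # Under the Poisson model E{(Y - mu)^2 - Y} = 0, so the numerator
    # measures variation in excess of the mean.
    numerator = sum((yi - mu) ** 2 - yi for yi in y)
    # With a common fitted mean, sum(mu^2) over observations is n * mu^2.
    denominator = math.sqrt(2.0 * n * mu ** 2)
    return numerator / denominator


def upper_tail_p_value(z):
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Made-up counts, used only to exercise the functions.
counts = [0, 1, 0, 12, 2, 0, 9, 1, 0, 15]
t = poisson_overdispersion_score(counts)
print(t, upper_tail_p_value(t))
```

A one-sided p-value is used here because overdispersion corresponds to large positive values of the statistic. With covariates, the common mean would be replaced by fitted means from a Poisson regression, which this sketch does not attempt.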

This paper develops a unifying theory for all the score tests mentioned above. The test for overdispersion is derived in Section 1. The tests above are special cases of this generalized overdispersion test. Many other tests for overdispersion may be obtained using the general framework; the construction of these tests depends on the mechanism through which the overdispersion may arise. Omnibus tests are also discussed. Asymptotic distributions of the tests are given and, in the binomial case, the tests of Tarone and Prentice are extended from a consideration of the single sample problem to the general regression setup. The tests derived are designed to be powerful against arbitrary alternative distributions where only the first two moments are specified. In Sections 2 and 3, we focus on tests for extra-Poisson and extra-binomial variation, respectively. Section 4 offers some miscellaneous comments.

1. DERIVATION OF OVERDISPERSION TESTS

Consider the natural exponential family with probability density function

$$f(Y; \theta) = \exp\{a(\theta) Y - g(\theta) + c(Y)\}, \qquad (1.1)$$

where $Y$ represents the response variable and $\theta$ is an unknown parameter on which the distribution of $Y$ depends. This family includes both the Poisson and binomial distributions. Let $Y_1, \ldots, Y_n$ be a sample of independent observations, with $Y_i$ hypothesized to be from (1.1) and the corresponding $\theta_i$ a function of a $p \times 1$ vector of covariates $\mathbf{x}_i$ and regression parameters $\boldsymbol{\beta}$; that is, $\theta_i = \theta_i(\mathbf{x}_i; \boldsymbol{\beta})$, $i = 1, \ldots, n$. The mean and variance of $Y_i$ are

$$E(Y_i) = \mu_i(\theta_i) = \{a_i'(\theta_i)\}^{-1} g_i'(\theta_i),$$

$$\operatorname{var}(Y_i) = \sigma_i^2(\theta_i) = \{a_i'(\theta_i)\}^{-2} \{g_i''(\theta_i) - a_i''(\theta_i) E(Y_i)\}, \qquad (1.2)$$

where a prime ($'$) denotes differentiation with respect to $\theta_i$.
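As a check on these moment formulas, the two distributions named above can be worked through directly. The block below is an illustrative sketch rather than part of the original derivation: it drops the subscript $i$, takes the Poisson parameter $\theta$ to be the mean and the binomial parameter $\theta$ to be the success probability with known index $m$ (both parameterizations are assumptions made here), and reads the variance as $\{g''(\theta) - a''(\theta)E(Y)\}/\{a'(\theta)\}^2$.

```latex
% Poisson with mean \theta:
%   f(y;\theta) = \exp\{ y\log\theta - \theta - \log y! \}
\begin{align*}
a(\theta) &= \log\theta, & g(\theta) &= \theta, \\
a'(\theta) &= \theta^{-1}, & a''(\theta) &= -\theta^{-2}, \qquad
g'(\theta) = 1, \quad g''(\theta) = 0, \\
E(Y) &= \{a'(\theta)\}^{-1} g'(\theta) = \theta, \\
\operatorname{var}(Y) &= \{a'(\theta)\}^{-2}\{g''(\theta) - a''(\theta)E(Y)\}
  = \theta^{2}\,(0 + \theta^{-2}\theta) = \theta .
\end{align*}

% Binomial with index m and success probability \theta:
%   f(y;\theta) = \exp\{ y\log[\theta/(1-\theta)] + m\log(1-\theta) + \log\tbinom{m}{y} \}
\begin{align*}
a(\theta) &= \log\frac{\theta}{1-\theta}, & g(\theta) &= -m\log(1-\theta), \\
a'(\theta) &= \frac{1}{\theta(1-\theta)}, &
a''(\theta) &= -\frac{1-2\theta}{\theta^{2}(1-\theta)^{2}}, \\
g'(\theta) &= \frac{m}{1-\theta}, & g''(\theta) &= \frac{m}{(1-\theta)^{2}}, \\
E(Y) &= \theta(1-\theta)\,\frac{m}{1-\theta} = m\theta, \\
\operatorname{var}(Y) &= \theta^{2}(1-\theta)^{2}
  \left\{ \frac{m}{(1-\theta)^{2}}
        + \frac{(1-2\theta)\,m\theta}{\theta^{2}(1-\theta)^{2}} \right\}
  = m\theta^{2} + m\theta(1-2\theta) = m\theta(1-\theta).
\end{align*}
```

Both cases recover the familiar results: mean and variance equal to $\theta$ for the Poisson, and mean $m\theta$ with variance $m\theta(1-\theta)$ for the binomial.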

To derive a test for the adequacy of the model (1.1), we shall first construct a larger "overdispersed" family of models, for which var([y. …