Academic journal article Political Research Quarterly

Comparing GEE and Robust Standard Errors for Conditionally Dependent Data

In recent years political scientists have become increasingly sensitive to questions of conditional dependence in their data. I outline and compare two general, widely used approaches for addressing such dependence (robust variance estimators and generalized estimating equations, or GEEs) using data on votes in Supreme Court search and seizure decisions between 1963 and 1981. The results make clear that choices about the unit on which data are grouped, i.e., clustered, are typically of far greater significance than are decisions about which type of estimator is used.

(ProQuest Information and Learning: ... denotes formulae omitted.)

(ProQuest Information and Learning: ... denotes obscured text omitted.)

Regression and regression-like models form the basis of most quantitative work in the social sciences. In the cross-sectional realm, advances in such models have largely occurred through the development of models for response variables which do not conform to standard linear-normal assumptions, including binary, nominal, ordinal, and count data. In recent years, however, scholars have increasingly begun to move beyond the form taken by the dependent variable to consider other issues in their data analyses. One of the most prominent has been violation of the exchangeability assumption, that is, data where, conditional on the influence of the model's covariates, values of the response variable are not independently and identically distributed (King 2001; McCullagh 2004). Such a situation may be especially likely to arise when data are grouped or clustered; examples include dyadic data (e.g., Hojnacki and Kimball 1998; Oneal and Russett 1999) or panel/time-series cross-sectional data with repeated observations on units (Stimson 1985).

While a number of approaches to dealing with such possible dependence exist, two merit special attention because of their generality and predominance in the field. The first, commonly referred to as robust standard errors, is a general means of empirically correcting variance-covariance estimates in the presence of heteroscedasticity, clustering, and other forms of conditional dependence. The second, the method of generalized estimating equations (GEE) (Liang and Zeger 1986), offers researchers the benefits of asymptotically consistent variance-covariance estimates when data are nonexchangeable, even when the precise nature of that dependence is unknown. Both methods have seen increasing use by applied researchers in recent years, yet most know little about their respective properties and even less about how to make an informed choice on their use.

The purpose of this article is to begin to address this lacuna. I begin with an outline and comparison of the development and general characteristics of these two approaches. I then illustrate, using a realistic example from the literature on judicial politics, the potential benefits and tradeoffs associated with each of the methods, with particular focus on how decisions over model specification and choice of clustering unit can influence one's results. I conclude with some general guidelines for applied researchers about choosing among these methods.


Consider at the outset a basic regression-like model, in which an N × 1 vector Y of responses is modeled as some stochastic function of k covariates X (typically including a constant term) and a vector of disturbances u:

Y = f(Xβ) + u (1)

This general model subsumes a number of commonly used alternatives, including exponential-family GLMs (McCullagh and Nelder 1989) as well as panel, time-series cross-sectional, and other grouped data models. In a likelihood-based estimation framework (e.g., King 1989), we can derive the log-likelihood for (1) by selecting f(·) and making a distributional assumption about u. Under the usual regularity conditions (that is, given a properly specified model and conditionally independent, identically distributed observations), one can then obtain a consistent estimate of the variance of the estimated parameter vector . …
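
To make the contrast between conventional and robust variance estimates concrete, the following is a minimal sketch in Python, not the article's own analysis. It assumes the simplest case of model (1), a linear f with a constant and one covariate, and uses simulated clustered data rather than the Supreme Court votes. The function name `ols_cluster_se`, the simulation settings, and the omission of any finite-sample correction are all illustrative assumptions. It compares the conventional variance estimate with a cluster-robust "sandwich" estimate; GEE is not shown.

```python
import math
import random


def ols_cluster_se(x, y, groups):
    """OLS of y on a constant and x (hypothetical helper for illustration).

    Returns (coefficients, conventional SEs, cluster-robust SEs).
    No finite-sample correction is applied to the robust estimate.
    """
    n = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # "Bread": inverse of X'X for the design matrix [1, x].
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Conventional variance s^2 (X'X)^{-1}, appropriate under i.i.d. errors.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conv = [math.sqrt(s2 * inv[0][0]), math.sqrt(s2 * inv[1][1])]

    # "Meat": sum the score contributions (e_i, x_i * e_i) within each
    # cluster, then add up the outer products of those cluster totals.
    scores = {}
    for xi, ei, g in zip(x, resid, groups):
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += ei
        s[1] += xi * ei
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(2)]
            for i in range(2)]

    # Sandwich: (X'X)^{-1} * meat * (X'X)^{-1}.
    tmp = [[sum(inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    var = [[sum(tmp[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    se_clu = [math.sqrt(max(var[0][0], 0.0)), math.sqrt(max(var[1][1], 0.0))]
    return (b0, b1), se_conv, se_clu


# Simulated example (assumed settings): 30 clusters of 10 observations,
# a cluster-level covariate, and a shared cluster effect that makes
# observations within a cluster conditionally dependent.
random.seed(0)
x, y, groups = [], [], []
for g in range(30):
    x_g = random.gauss(0, 1)   # covariate measured at the cluster level
    a_g = random.gauss(0, 1)   # unobserved cluster effect
    for _ in range(10):
        x.append(x_g)
        y.append(1.0 + 0.5 * x_g + a_g + random.gauss(0, 1))
        groups.append(g)

coef, se_conv, se_clu = ols_cluster_se(x, y, groups)
print("coefficients:     ", coef)
print("conventional SEs: ", se_conv)
print("cluster-robust SEs:", se_clu)
```

Passing a different `groups` list to the same function changes only the "meat" of the sandwich, which is one way to see how the choice of clustering unit, rather than the point estimates, drives the reported uncertainty.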
