Statistical Methods for Linking Health, Exposure, and Hazards
Mather, Frances Jean; White, LuAnn Ellis; Langlois, Elizabeth Cullen; Shorter, Charles Franklin; Swalm, Christopher Martin; Shaffer, Jeffrey George; Hartley, William Ralph. Environmental Health Perspectives
The Environmental Public Health Tracking Network (EPHTN) proposes to link environmental hazards and exposures to health outcomes. Statistical methods used in case-control and cohort studies to link health outcomes to individual exposure estimates are well developed. However, reliable exposure estimates for many contaminants are not available at the individual level. In these cases, exposure/hazard data are often aggregated over a geographic area, and ecologic models are used to relate health outcome and exposure/hazard. Ecologic models are not without limitations in interpretation. EPHTN data are characteristic of much information currently being collected: they are multivariate, with many predictors and response variables, often aggregated over geographic regions (small and large) and correlated in space and/or time. The methods to model trends in space and time, handle correlation structures in the data, estimate effects, test hypotheses, and predict future outcomes are relatively new and without extensive application in environmental public health. In this article we outline a tiered approach to data analysis for EPHTN and review the use of standard methods for relating exposure/hazards, disease mapping and clustering techniques, Bayesian approaches, Markov chain Monte Carlo methods for estimation of posterior parameters, and geostatistical methods. The advantages and limitations of these methods are discussed.
Key words: Bayesian modeling, data linkage, exposure, GIS, hazards, health outcome data, statistical methods.
Environ Health Perspect 112:1440-1445 (2004). doi:10.1289/ehp.7145 available via http://dx.doi.org/ [Online 3 August 2004]
The environment plays an important role in health and human development. Acute effects from exposure to environmental contaminants, such as pesticide poisoning, are well recognized, but the environmental link to most chronic diseases remains unclear. Researchers have linked exposure to specific environmental hazards with a health effect, such as benzene and leukemia. Other associations are suspected, such as exposure to mixtures of drinking water disinfection by-products and bladder cancer. In other cases, linkages between environmental agents, individually or as mixtures, and health outcomes lack epidemiologic evidence and are postulated from laboratory animal studies. The Pew Environmental Health Commission (2000) calls the lack of information linking environmental hazards and chronic disease the "environmental health gap." To address this gap, the Centers for Disease Control and Prevention (CDC) established the National Environmental Public Health Tracking Network (EPHTN; CDC 2003a), which is developing the infrastructure, resources, and methods for assembling and using available environmental hazard, exposure, and health outcome data (HOD). This initiative presents great methodologic challenges, such as using existing data in new ways and for purposes other than those for which they were collected; expanding the limited guidance for using available statistical methods to analyze and link data; and closing gaps in methodology for linking disparate data sets. Despite the challenges, great opportunities exist to forge partnerships to make data more available, develop standards to facilitate data exchange, and analyze data to describe the impact of environmental hazards on human health. However, without appropriate rules for data linkage, indiscriminate linking may lead to erroneous conclusions. This highlights the need to understand each data set, articulate its uses and limits, and standardize methods for using the data.
Fundamental Premise for Linking Data
Fundamental questions must be asked before linking different types of data. For example: Is there a scientific basis for connecting the data sets? Are the data to be linked adequate and appropriate for addressing the issue? A useful framework for examining these questions has been presented by Thacker et al. …