Academic journal article Journal of Marriage and Family

Handling Missing Values in Longitudinal Panel Data with Multiple Imputation

Article excerpt

The use of longitudinal panel (prospective) survey data is common in the area of family research. From 2010 to 2014, approximately 287 quantitative and qualitative research articles (excluding theory development, research reviews, comments, rejoinders, and methodological innovation articles) were published in the Journal of Marriage and Family (JMF). Of these, 176 (61%) analyzed longitudinal data. Data on the same individuals or families at multiple points in time provide for stronger inferences about change processes and allow for more control of unmeasured differences between individuals that can bias study findings (Johnson, 1995, 2005). What tempers these advantages is the large amount of missing data found in many longitudinal studies. Nearly all of the JMF articles explicitly mentioned the presence of missing values and study dropout, which suggests widespread concern with missing data in panel studies.

Few guidelines for the analysis of longitudinal panel data in the presence of missing values are accessible to family researchers. Moreover, no clear appraisals of the consequences of different ways of handling missing data are readily available. Existing guidelines tend to be directed toward statisticians or focus on types of longitudinal data rarely found in the family literature, such as randomized clinical trials (e.g., Daniels & Hogan, 2008; Enders, 2011; Hedeker & Gibbons, 2006; National Research Council, 2010) or data sets with few cases but many waves, such as cross-national time-series studies (e.g., Honaker & King, 2010). Methods for handling missing values have been addressed in the family literature (e.g., Acock, 2005; Johnson & Young, 2011; Young & Johnson, 2013), but these resources focus primarily on cross-sectional data. Although much of what we know about the approaches to handling missing values in cross-sectional situations applies to longitudinal panel data, panel data have characteristics that complicate the application of techniques such as multiple imputation (MI). Such complications, along with a lack of accessible guides to help address these issues, may be contributing to the limited use of modern methods like maximum likelihood (ML) or MI among the many family studies that use longitudinal data (Jelicic, Phelps, & Lerner, 2009).

In this article, we review standard approaches to handling missing data in longitudinal panel studies, apply several techniques in a simulation study based on an empirical family research problem using a multiwave panel data set, and assess how different strategies affect the research findings. Our focus is on missing values in panel data sets with large numbers of respondents but small numbers of survey waves administered at fixed intervals, conditions typical of the data sets found in much family research. We examine MI strategies for missing data with fixed effect, pooled time-series models and event-history (Cox proportional hazard) models. Our review of the methods used in 176 JMF articles suggests that the most common models for analyzing longitudinal data were event history (19%), fixed effects (18%, or 19% including change scores), and mixed effect or multilevel (17%, or 22% including growth curve), followed by linear regression (16%), logistic regression (15%), and structural equation models (10%, or 15% including growth curve and latent class analysis). Less common methods for analyzing longitudinal data included multinomial regression (5%) and qualitative analysis (2%). (Note that percentages sum to more than 100% because many articles used more than one method.)

BACKGROUND

Longitudinal panel studies have several features that complicate the techniques commonly applied when handling missing data. Unlike cross-sectional data sets, longitudinal data sets have both within-wave and whole-wave missingness. Longitudinal data analysis methods require a particular data structure (long vs. wide) that creates issues when handling missing data (Lloyd, Obradovic, Carpiano, & Motti-Stefanidi, 2013). …
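
The distinction between wide and long data structures, and between within-wave and whole-wave missingness, can be illustrated with a minimal sketch. The data, variable names (`happy`, `income`), and helper functions below are hypothetical and are not taken from the article. The sketch also assumes that "within-wave" means some items are missing in a wave the respondent took part in, and that "whole-wave" means every item is missing for that wave.

```python
# Hypothetical wide-format panel: one row per respondent, one column per
# variable-wave pair. None marks a missing value. All names are illustrative.
WAVES = (1, 2, 3)
ITEMS = ("happy", "income")

wide = [
    {"id": 1, "happy_w1": 4, "income_w1": 30,
     "happy_w2": 5, "income_w2": 32,
     "happy_w3": 4, "income_w3": 35},
    {"id": 2, "happy_w1": 3, "income_w1": None,    # item skipped in wave 1
     "happy_w2": 3, "income_w2": 28,
     "happy_w3": None, "income_w3": None},         # wave 3 not completed
    {"id": 3, "happy_w1": 2, "income_w1": 41,
     "happy_w2": None, "income_w2": None,          # dropped out after wave 1
     "happy_w3": None, "income_w3": None},
]


def wide_to_long(rows, items=ITEMS, waves=WAVES):
    """Reshape to long format: one record per respondent-wave."""
    long_rows = []
    for row in rows:
        for wave in waves:
            record = {"id": row["id"], "wave": wave}
            for item in items:
                record[item] = row.get(f"{item}_w{wave}")
            long_rows.append(record)
    return long_rows


def classify_missingness(record, items=ITEMS):
    """Label a respondent-wave record by its pattern of missing items."""
    n_missing = sum(record[item] is None for item in items)
    if n_missing == 0:
        return "complete"
    if n_missing == len(items):
        return "whole-wave"   # assumed definition: every item missing
    return "within-wave"      # assumed definition: only some items missing


long_rows = wide_to_long(wide)
for record in long_rows:
    print(record["id"], record["wave"], classify_missingness(record))
```

In the wide layout, a skipped wave shows up as a block of empty columns in the respondent's single row. In the long layout, the same gap becomes an entire person-wave record with no observed items, which is one reason the choice of structure matters when missing values are imputed.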
