Academic journal article Social Work Research

Imputing Missing Data: A Comparison of Methods for Social Work Researchers

Article excerpt

Choosing the most appropriate method to handle missing data during analyses is one of the most challenging decisions confronting researchers. Often, missing values are just ignored rather than replaced with a reliable imputation method. Six methods of data imputation were used to replace missing data from two data sets of varying sizes; this article examines the results. Each imputation method is defined, and the pros and cons of its use in social science research are identified. The authors discuss comparisons of descriptive measures and multivariate analyses with the imputed variables and the results of a timed study to determine how long it took to use each imputation method on first and subsequent use. Implications for social work research are suggested.

KEY WORDS: data analysis; data imputation methods; missing data; research methods


"Five hundred high school students completed the longitudinal study ... The analysis suggests that a significant difference was found between ..."

These hypothetical results may appear to be positive, but the researcher failed to report that originally 850 students were in the study, and that each year 5% to 6% of the sample could not be found because they had moved, no longer had a phone, or chose not to participate. Furthermore, because of incomplete data for some variables, researchers had to drop other cases from the analysis. So in reality, more than 50% of the original sample might not be included, or accounted for, in this statement. It is possible that the participants not included in the final analysis have different characteristics from those who were included. How does this dearth of data affect the outcomes reported? Unfortunately, this scenario is all too common in the social work research reported in the literature. This article summarizes the hazards of ignoring missing data and identifies six data imputation methods that can resolve this problem. To examine how results might differ based on the imputation procedure selected, each of these methods was used on two different data sets, each with missing values. The results effectively demonstrate the importance of dealing with missing data and the many issues confronting the social work researcher in this regard.

The researcher's goal is to conduct the most accurate analysis of the data to make valid and efficient inferences about a population to guide practitioners and researchers alike (Schafer & Graham, 2002). Accomplishing this goal requires choosing the most appropriate method to handle missing data. Too often, social work researchers ignore missing data and their effects on data analysis, thus limiting their own ability to achieve this goal. Missing data are typically ignored when researchers fail to understand the significance of the problem or are unaware of the solutions available (Figueredo, McKnight, McKnight, & Sidani, 2000).

The handling of missing data is not typically addressed in research reports; literature reviews prove this point. Of approximately 100 articles reviewed between 2001 and 2003 from three social work research journals (Journal of Social Service Research, Social Work, and Social Work Research), only 15 percent reported any information about the amount of missing data or how missing data were handled in the analysis. Because virtually all social science survey research involves some incomplete data, treatment of missing data should be a universal concern and addressed in all research reports.

Numerous methods exist to handle the problem of missing data. They include both "old" methods requiring just a few mathematical computations and "new" methods requiring more complex computations that are increasingly easy for social work researchers to perform with statistical software. Here we examine the traditional methods, including listwise deletion (the least sophisticated method), mean substitution, hot-deck imputation, and regression imputation. …
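
To make the four traditional methods concrete, the following is a minimal illustrative sketch in Python, not the authors' procedure. The toy data set, the variable names (age, score), and the function names are all assumptions introduced here. The hot-deck variant shown draws a donor at random from the observed cases; in practice, donors are often restricted to cases that resemble the incomplete case on other variables.

```python
import random
from statistics import mean

# Hypothetical toy data (an assumption for illustration): (age, score) pairs.
# None marks a missing score.
rows = [(15, 70.0), (16, None), (17, 80.0), (18, 85.0), (16, 76.0), (17, None)]

def listwise_deletion(rows):
    """Drop every case that has any missing value."""
    return [r for r in rows if None not in r]

def mean_substitution(rows):
    """Replace each missing score with the mean of the observed scores."""
    m = mean(y for _, y in rows if y is not None)
    return [(x, m if y is None else y) for x, y in rows]

def hot_deck(rows, seed=0):
    """Replace each missing score with the score of a randomly drawn
    observed case (a simple random hot deck, without donor matching)."""
    rng = random.Random(seed)
    donors = [y for _, y in rows if y is not None]
    return [(x, rng.choice(donors) if y is None else y) for x, y in rows]

def regression_imputation(rows):
    """Replace each missing score with the value predicted by a
    least-squares line of score on age, fitted to the complete cases."""
    complete = listwise_deletion(rows)
    xbar = mean(x for x, _ in complete)
    ybar = mean(y for _, y in complete)
    sxy = sum((x - xbar) * (y - ybar) for x, y in complete)
    sxx = sum((x - xbar) ** 2 for x, _ in complete)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [(x, intercept + slope * x if y is None else y) for x, y in rows]

if __name__ == "__main__":
    for method in (listwise_deletion, mean_substitution,
                   hot_deck, regression_imputation):
        print(method.__name__, method(rows))
```

Even on this small example the trade-offs are visible: listwise deletion shrinks the sample, mean substitution gives every incomplete case the same value and so understates variability, and regression imputation places every imputed value exactly on the fitted line.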
