The Malawi Diffusion and Ideational Change Project 2004-06: Data Collection, Data Quality, and Analysis of Attrition
Philip Anglewicz, Jimi Adams, Francis Obare, Hans-Peter Kohler, and Susan Watkins. Demographic Research.
In this paper we evaluate the quality of survey data collected by the Malawi Diffusion and Ideational Change Project by investigating four potential sources of bias: sample representativeness, interviewer effects, response unreliability, and sample attrition. We discuss the results of our analysis and the implications of our findings for the collection of data in similar contexts.
Empirical analysis in demographic publications typically involves testing hypotheses about the determinants and correlates of demographically relevant outcomes. Although high-quality data are essential for these analyses, published articles rarely address important characteristics of the data, such as interviewer effects or, in longitudinal data, the implications of attrition for the results. This paper examines the data quality of the Malawi Diffusion and Ideational Change Project (MDICP), a data set that is widely used for analysis of social networks, HIV/AIDS, and family planning in sub-Saharan Africa. We investigate several sources of potential bias in this longitudinal data set: sample representativeness, interviewer effects, response unreliability, and sample attrition.
The analysis in this paper builds on an earlier evaluation conducted by Bignami et al. (2003). We extend this previous research for several reasons. First, because the MDICP has completed three additional waves since 2003 and now encompasses five waves of data collection (1998, 2001, 2004, 2006, and 2008), some aspects of data quality have become more important. For instance, potential attrition biases may have increased as attrition of the initial cohorts has accumulated across survey waves, and the addition of new samples of respondents (most importantly, a new adolescent sample in 2004) may have changed the properties and representativeness of the survey sample. To address these issues, we conduct a series of data quality analyses for the MDICP data, including comparisons with the Malawi Demographic and Health Surveys (MDHS) and analyses of interviewer effects, response reliability, and sample attrition. These analyses are similar to those of Bignami et al. (2003), which permits a comparison of data quality issues across the first four waves of the project. We do not include the 2008 data, because they are not yet fully ready for analysis.
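The point about accumulating attrition can be made concrete: if retention is measured from one wave to the next, the share of the baseline cohort still in the sample is the product of the wave-to-wave retention rates. The sketch below uses hypothetical, equal retention rates of 85% for each follow-up (these are not MDICP figures) and ignores the re-entry of earlier non-respondents, which the 2001 wave did attempt; the function name `cumulative_retention` is likewise hypothetical.

```python
def cumulative_retention(wave_rates):
    """Return the share of the baseline sample retained after each follow-up.

    Assumes each rate is the fraction of the previous wave's respondents who
    are re-interviewed, and that no earlier non-respondents re-enter (a
    simplifying assumption for illustration only).
    """
    retained = 1.0
    path = []
    for rate in wave_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("retention rates must lie in [0, 1]")
        retained *= rate  # losses compound multiplicatively across waves
        path.append(retained)
    return path


# Hypothetical: 85% wave-to-wave retention at the 2001, 2004 and 2006 follow-ups.
for wave, share in zip((2001, 2004, 2006), cumulative_retention([0.85, 0.85, 0.85])):
    print(wave, f"{share:.1%}")
```

With these illustrative rates, roughly 61% of the baseline cohort would remain after three follow-ups, even though no single wave loses more than 15% of the previous wave's respondents.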
The MDICP is a longitudinal research project with three overall goals: investigating the multiple processes and influences that contribute to variation in HIV risks in a sub-Saharan African context, identifying prevention strategies for managing risks, and assessing the potential effect of HIV risk reduction programs on infection risks and disease dynamics. An unusual feature of the data is information on social networks, which permits examination of the role of social interactions in shaping attitudes related to contraceptive use and family planning, as well as AIDS knowledge and risk behavior.
The data collection takes place in three sites in rural Malawi, each representing one of the three regions of the country: Balaka (southern region), Mchinji (central), and Rumphi (northern). The first wave was conducted in 1998 among ever-married women aged 15-49 years and their husbands. Interviews were completed with 1,541 women (out of a possible 1,790) and 1,065 of their husbands (out of a possible 1,520). In 2001, the first follow-up wave, information was collected from (1) the same respondents, (2) sample members who were not found in 1998, and (3) new spouses of respondents who married again between 1998 and 2001.
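The 1998 baseline counts translate directly into response rates. A minimal Python sketch using only the counts reported for that wave; `response_rate` is a hypothetical helper, and defining the rate simply as completed interviews divided by eligible sample members is an assumption, not necessarily the project's formal response-rate calculation.

```python
def response_rate(completed: int, eligible: int) -> float:
    """Completed interviews as a percentage of eligible sample members."""
    if eligible <= 0:
        raise ValueError("eligible must be a positive count")
    return 100.0 * completed / eligible


# Completed and eligible counts reported for the 1998 MDICP baseline wave.
baseline_1998 = {
    "women": (1541, 1790),
    "husbands": (1065, 1520),
}

for group, (completed, eligible) in baseline_1998.items():
    print(f"{group}: {response_rate(completed, eligible):.1f}%")
```

Under this simple definition the baseline response rate is about 86.1% for women and 70.1% for husbands.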
In 2004, the third wave of MDICP data collection, interviews were conducted with the same respondents as in 1998 and 2001, as well as all new spouses of respondents. In addition, two new samples were added. First, a sample of approximately 1,500 married and never-married adolescents aged 15-28 years was added in each site for two reasons: to adjust for aging of the 1998 sample over time, which had led to under-representation of the adolescent population by 2004; and to introduce never-married adolescents into the MDICP sample (the 1998 sample was restricted to ever-married men and women). …