Academic journal article The Journal of Consumer Affairs

Complex Samples and Regression-Based Inference: Considerations for Consumer Researchers

Article excerpt

This article demonstrates that researchers who treat data collected via complex sampling procedures as if they were collected via simple random sample (SRS) may draw improper inferences when estimating regression models. Using complex sample data from the 2004 panel of the Survey of Income and Program Participation (SIPP), two models (one ordinary least squares (OLS) regression and one logistic regression) were estimated using three methods: SRS with and without population weights, Taylor series linearization, and Fay's Balanced Repeated Replication (BRR). The results of the alternative models demonstrate that, depending on the variables of interest, authors who fail to incorporate sample design information or fail to consider the effects of weighting may draw improper inferences from their regression models. Reasons why researchers continue to neglect complex sample-based variance are proposed and discussed, and example SAS and Stata code is offered to encourage adoption by the consumer research community.

**********

Nearly all publicly available survey data are collected via complex sampling methods such as stratification, oversampling, and multiframe samples, where respondents have an unequal probability of selection. The use of complex sampling methods allows for the collection of data from populations of interest in an efficient and cost-effective manner. However, many commonly used statistical packages default to the assumption that data were collected through a simple random sample (SRS). Adhering to this default assumption when analyzing data collected via complex sampling methods typically increases the probability of a Type I error (Kish 1965, 2004; Wolter 1985), although Type II errors may also occur (Davern et al. 2007).
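The link between an SRS assumption and Type I errors can be illustrated with a small calculation. The sketch below uses Kish's approximate design effect for equal-sized clusters, deff = 1 + (m - 1) * rho, and inflates an SRS standard error by the square root of deff. It covers clustering only, not stratification or unequal weights, and applying the factor to a regression coefficient is a simplification. The estimate, standard error, cluster size, and intraclass correlation are hypothetical values chosen for illustration, not figures from the article.

```python
import math


def kish_design_effect(cluster_size: float, icc: float) -> float:
    """Kish's approximate design effect for equal-sized clusters.

    cluster_size: respondents per cluster (m)
    icc: intraclass correlation within clusters (rho)
    """
    return 1.0 + (cluster_size - 1.0) * icc


def design_adjusted_se(srs_se: float, deff: float) -> float:
    """Inflate an SRS standard error by the square root of the design effect."""
    return srs_se * math.sqrt(deff)


# Hypothetical values, for illustration only.
estimate = 0.50   # a coefficient or mean of interest
srs_se = 0.20     # standard error reported under the SRS default
deff = kish_design_effect(cluster_size=10, icc=0.10)

adjusted_se = design_adjusted_se(srs_se, deff)
t_srs = estimate / srs_se            # test statistic under the SRS assumption
t_adjusted = estimate / adjusted_se  # test statistic after the adjustment

print(f"deff = {deff:.2f}")
print(f"t under SRS = {t_srs:.2f}, design-adjusted t = {t_adjusted:.2f}")
```

With these assumed inputs, the SRS test statistic exceeds the conventional 1.96 threshold while the design-adjusted one does not, which is the pattern that produces an unwarranted rejection of the null hypothesis.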

Nielsen et al. (2009) demonstrated how neglecting to incorporate complex sample design information results in substantially smaller standard errors and an increased risk of Type I errors. That demonstration was motivated by a review of 51 articles in the 2000-2007 issues of the Journal of Consumer Affairs, Journal of Financial Counseling and Planning, and the Journal of Family and Economic Issues that used data from the Consumer Expenditure Survey (CE), Current Population Survey (CPS), Survey of Consumer Finances (SCF), and the Survey of Income and Program Participation (SIPP). The review revealed that few of these articles indicated whether standard errors accounted for sample design. Our review of the 2008-2013 issues of the same journals revealed that this practice continues: only four (Coleman-Jensen 2011; Rhine and Greene 2013; Sanders and Porterfield 2010; Tamborini, Iams, and Reznik 2012) of the 27 articles that used the CE, CPS, SCF, or SIPP indicated that the analyses accounted for the given survey's complex sample design. (1)

The analysis by Nielsen et al. (2009) focused on the standard errors associated with weighted descriptive estimates of population parameters. Of course, weighting descriptive statistics is standard practice. However, nearly all articles evaluated in both reviews progressed from weighted estimates of the population to one or more unweighted multivariate models. The decision to weight, or not weight, a multivariate model has long been debated (Deaton 1997; Lee and Forthofer 2006; Pfeffermann 1993; Scott and Wild 2002). However, the near-uniform practice of estimating unweighted regression models revealed in the two reviews is noteworthy given that CE, CPS, SCF, and SIPP samples are all derived from oversampling, stratification, multistage, and other complex sample selection strategies that render an unweighted sample dissimilar from the population of interest. When modeled without weights, subpopulations selected via complex sample selection methods may exert unequal influence, resulting in regression coefficients that usually do not accurately represent the population of interest (Deaton 1997; Lee and Forthofer 2006; Pfeffermann 1993). Deaton (1997) indicates that weights are necessary for models seeking to estimate population regression coefficients. …
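The effect of oversampling on an unweighted regression coefficient can be made concrete with a toy example. The sketch below is not the article's SAS or Stata code and does not use SIPP data. It assumes a hypothetical population of 900 people in group A and 100 in group B, with five respondents sampled from each group, so group B is oversampled. The outcome is noise-free, with a slope of 1 in group A and 3 in group B, and each respondent's weight is the inverse of the selection probability. Only point estimates are shown; design-based variance estimation is a separate step.

```python
def weighted_slope(x, y, w):
    """Slope of a weighted least squares fit of y on x (with an intercept)."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return s_xy / s_xx


# Hypothetical design: the same five x values are observed in each group.
x_values = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Group A: 900 in the population, 5 sampled, true slope 1.
# Group B: 100 in the population, 5 sampled, true slope 3 (oversampled).
x = x_values + x_values
y = [1.0 * xi for xi in x_values] + [3.0 * xi for xi in x_values]
weights = [900 / 5] * 5 + [100 / 5] * 5   # inverse selection probabilities

unweighted = weighted_slope(x, y, [1.0] * len(x))
weighted = weighted_slope(x, y, weights)

# In this setup the population-level slope is 0.9 * 1 + 0.1 * 3 = 1.2.
print(f"unweighted slope = {unweighted:.2f}")
print(f"weighted slope   = {weighted:.2f}")
```

In this constructed case the unweighted fit treats the two groups as equally common and lands midway between their slopes, while the weighted fit recovers the slope implied by the assumed population shares.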
