Academic journal article The Journal of Consumer Affairs

Complex Sample Design Effects and Health Insurance Variance Estimation

Fifty-one articles using complex sample data published between 2000 and 2007 in three journals are reviewed. Of these, only three indicate whether the analyses account for the sampling design when calculating standard errors. To demonstrate how failing to calculate variances properly increases the probability of Type I errors, data from the Survey of Income and Program Participation (SIPP) are used to estimate health insurance coverage with three methods: simple random sample (SRS) formulas, generalized variance functions (GVFs), and direct estimation via replicate weights. The analysis shows that researchers using complex sample data are likely to draw improper inferences if they do not use replicate weights to estimate standard errors.

**********

A review of any recent volume of this journal confirms that consumer affairs researchers often rely on complex secondary data (e.g., the Survey of Consumer Finances [SCF] from the Federal Reserve [Board of Governors of the Federal Reserve System 2009]; the Survey of Income and Program Participation [SIPP] from the U.S. Census Bureau [U.S. Census Bureau 2009a]; the Consumer Expenditure Survey [CE] from the Bureau of Labor Statistics [U.S. Bureau of Labor Statistics 2009]; and the Current Population Survey [CPS] from the U.S. Census Bureau [U.S. Census Bureau 2008a]). None of these data are collected by means of a simple random sample (SRS) with replacement. Rather, they are collected via complex sample designs that involve clustering and stratification and in which respondents have unequal probabilities of selection. Complex sampling methods allow one to obtain data from populations of interest efficiently and cost-effectively, but they are at odds with the default assumption of most statistical packages that the elements were collected through an SRS. As a result, the proper estimation of variances, and of the standard errors and confidence intervals derived from those variances, introduces a layer of analytic complexity that many researchers fail to incorporate into their analyses (Brick, Morganstein, and Valliant 2000).
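Kish (1965) summarizes the precision lost to a complex design with the design effect (deff): the ratio of a statistic's variance under the actual design to its variance under an SRS of the same size. The Python sketch below uses two textbook approximations, assumed here for illustration rather than taken from this article: the unequal-weighting effect n * sum(w^2) / (sum(w))^2 and the clustering effect 1 + (m - 1)ρ, where m is the mean cluster size and ρ the intraclass correlation. Multiplying the two is itself a rough approximation, and the function names and all input values are hypothetical.

```python
import math
from typing import Sequence


def weighting_deff(weights: Sequence[float]) -> float:
    """Kish's approximate design effect due to unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def clustering_deff(mean_cluster_size: float, icc: float) -> float:
    """Kish's approximate design effect due to clustering: 1 + (m - 1) * rho."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def srs_se_proportion(p: float, n: int) -> float:
    """Standard error of a proportion under the SRS assumption most packages default to."""
    return math.sqrt(p * (1.0 - p) / n)


def design_adjusted_se(p: float, n: int, deff: float) -> float:
    """Inflate the SRS standard error by sqrt(deff) to approximate the design-based SE."""
    return srs_se_proportion(p, n) * math.sqrt(deff)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; these are not SIPP values.
    weights = [1.0, 1.0, 2.0, 4.0] * 500  # n = 2000 respondents
    deff = weighting_deff(weights) * clustering_deff(mean_cluster_size=11, icc=0.05)
    p, n = 0.85, len(weights)
    print(f"deff = {deff:.4f}, effective n = {n / deff:.0f}")
    print(f"SRS SE = {srs_se_proportion(p, n):.4f}")
    print(f"design-adjusted SE = {design_adjusted_se(p, n, deff):.4f}")
```

Because the design-based standard error is the SRS standard error times the square root of deff, a design effect of 2 means a package's default SRS standard error is about 29 percent too small (1/√2 ≈ 0.71).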

Unfortunately, failing to account for a complex sample design typically increases the probability that the resulting inferences are incorrect. Usually, the omission leads to an incorrect calculation of the variances (Kish 1965, 2004; Wolter 1985): the corresponding standard errors are demonstrably smaller than they would be if the sample design were properly taken into account. Smaller standard errors inflate test statistics and therefore increase the probability of a Type I error, that is, rejecting a null hypothesis when the null hypothesis is true (Davern et al. 2006, 2007). This increased probability of incorrect inference is present whether one is simply calculating confidence intervals for point estimates, testing the relationship between two values via chi-square tests or t-tests, or specifying multivariate models.
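The link between understated standard errors and Type I errors can be made concrete. Suppose, as an illustrative assumption, that the true variance of an estimate is deff times the SRS variance a package reports and that a large-sample two-sided z-test is used. The naive test statistic is then √deff times too large, so a test at nominal α = 0.05 rejects a true null hypothesis more than 5 percent of the time. A minimal sketch using only the Python standard library (the function name and deff values are hypothetical):

```python
from statistics import NormalDist


def actual_type1_rate(deff: float, alpha: float = 0.05) -> float:
    """Actual two-sided Type I error rate of a z-test that uses SRS standard errors
    when the true sampling variance is deff times the SRS variance."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    # The naive statistic equals the correct statistic times sqrt(deff), so the
    # naive test rejects whenever |Z| > z_crit / sqrt(deff), with Z ~ N(0, 1) under H0.
    threshold = z_crit / deff ** 0.5
    return 2.0 * (1.0 - std_normal.cdf(threshold))


if __name__ == "__main__":
    for deff in (1.0, 1.5, 2.0, 3.0):  # hypothetical design effects
        print(f"deff = {deff:.1f}: nominal 0.05, actual {actual_type1_rate(deff):.3f}")
```

Under these assumptions, a design effect of 2 raises the actual Type I error rate of a nominal 5 percent test to about 16.6 percent.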

To demonstrate how failing to incorporate sample design information can produce substantially different standard errors and improper inferences, we use the SIPP to estimate several simple statistics. Although our demonstration uses SIPP data, the agencies that produce many of the datasets commonly used in this journal explicitly advise researchers to account for sample designs using methods similar to those demonstrated here. For example, documentation explaining the need for researchers to account for sample design is available for the Consumer Expenditure Survey (Blaha 2003, p. 30; U.S. Bureau of Labor Statistics 2007, p. 4; U.S. Bureau of Labor Statistics 2008, p. 276), the CPS (Davern et al. 2006; U.S. Bureau of Labor Statistics 2006, pp. 1-10), and the SCF (Board of Governors of the Federal Reserve System 2006; Kennickell 2005, p. 21).
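Replicate weights, the approach the abstract recommends, work by computing a statistic once with the full-sample weight and once with each replicate weight, then scaling the spread of the replicate estimates around the full-sample estimate. The scaling constant differs across surveys and must be taken from each survey's own documentation; the sketch below assumes Fay's modified balanced repeated replication (BRR) with perturbation factor k = 0.5, for which the variance is sum((theta_r - theta_0)^2) / (R(1 - k)^2) over R replicates. The function names and toy data are hypothetical.

```python
import math
from typing import Sequence


def weighted_proportion(y: Sequence[int], w: Sequence[float]) -> float:
    """Weighted share of respondents with y = 1 (e.g., 1 = has health insurance)."""
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)


def fay_brr_variance(full: float, replicates: Sequence[float], fay_k: float = 0.5) -> float:
    """Fay-modified BRR variance: sum((theta_r - theta_0)^2) / (R * (1 - k)^2)."""
    r = len(replicates)
    return sum((t - full) ** 2 for t in replicates) / (r * (1.0 - fay_k) ** 2)


def replicate_se(
    y: Sequence[int],
    full_weights: Sequence[float],
    replicate_weights: Sequence[Sequence[float]],
    fay_k: float = 0.5,
) -> tuple[float, float]:
    """Point estimate from the full-sample weight; SE from re-estimating with each replicate weight."""
    theta = weighted_proportion(y, full_weights)
    reps = [weighted_proportion(y, rw) for rw in replicate_weights]
    return theta, math.sqrt(fay_brr_variance(theta, reps, fay_k))


if __name__ == "__main__":
    # Toy data (hypothetical): 4 respondents and 2 Fay replicates with k = 0.5,
    # so each replicate weight is 1.5 or 0.5 times the full-sample weight.
    y = [1, 1, 0, 1]
    full_w = [1.0, 1.0, 1.0, 1.0]
    rep_w = [[1.5, 0.5, 1.5, 0.5], [0.5, 1.5, 0.5, 1.5]]
    est, se = replicate_se(y, full_w, rep_w)
    print(f"estimate = {est:.3f}, replicate SE = {se:.3f}")
```

Real surveys supply many more replicate weights than the two in this toy example, and the same loop applies to any statistic, such as a regression coefficient, not only a proportion.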

MOTIVATION

This demonstration was motivated by a review of all 51 articles in the 2000-2007 issues of the Journal of Consumer Affairs, Financial Counseling and Planning, and the Journal of Family and Economic Issues that used CE, CPS, SCF, or SIPP data. …
