A Hierarchical Latent Variable Model for Ordinal Data from a Customer Satisfaction Survey with "No Answer" Responses
Zaslavsky, Alan M., Bradlow, Eric T., Journal of the American Statistical Association
Corporations survey customer satisfaction to support crucial decisions about market segmentation, product repositioning, allocation of sales personnel, new research and development, and marketing campaigns. Given the importance and costs of these surveys, corporations should use the most effective data analysis methods.
In 1992 the DuPont Corporation conducted more than 130 surveys (one for each core business) of current and potential customers as part of its ambitious (and successful) continuous improvement program. These surveys cost millions of dollars, but the analysis methods used were restricted to calculating means, standard deviations, and distributions of rating scores (on an ordinal 1-10 scale, from "strongly disagree" to "strongly agree") by item, by demographic subgroup, and for the overall population.
Although this information was useful as a reference, DuPont was interested in determining whether more rigorous analytical methods could provide deeper insight in the following areas.
1. Analyses of means treat the data as interval-scaled rather than ordinal. DuPont suspected that one-point differences on the extreme ends of the scales were more meaningful than those at the middle. Available analyses provided no basis for determining whether this was true.
2. The analyses did not provide an adequate basis for determining population relationships between respondent characteristics and opinions. DuPont desired an approach that could validate initial findings when true, and also identify additional associations that were not spurious.
3. The analyses did not provide stable estimates of individual satisfaction when the data were sparse due to item nonresponse.
4. The treatment of item nonresponse was of concern. Basing inferences on the mean of the observed responses makes sense only under strong assumptions that are unlikely to hold. Specifically, DuPont wanted to test whether item nonresponse was informative about satisfaction, and thus regarded nonresponse as a process of intrinsic interest.
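The fourth concern can be illustrated with a small deterministic calculation. The sketch below is not drawn from the DuPont data: the uniform rating distribution, the response probabilities, and the function name `observed_mean` are all assumptions chosen for illustration. It compares the mean of observed ratings with the true population mean when the chance of answering does or does not depend on the rating itself.

```python
def observed_mean(rating_probs, response_probs):
    """Expected mean rating among respondents who answer the item.

    rating_probs:   dict mapping rating -> population share (sums to 1)
    response_probs: dict mapping rating -> P(answer | rating)
    """
    answered_mass = sum(p * response_probs[k] for k, p in rating_probs.items())
    weighted_sum = sum(k * p * response_probs[k] for k, p in rating_probs.items())
    return weighted_sum / answered_mass


# Hypothetical population: ratings 1-10, equally common (an assumption).
rating_probs = {k: 0.1 for k in range(1, 11)}
true_mean = sum(k * p for k, p in rating_probs.items())

# Case 1: nonresponse unrelated to satisfaction (same answer rate everywhere).
unrelated = {k: 0.7 for k in range(1, 11)}

# Case 2: informative nonresponse (satisfied customers answer more often).
informative = {k: 0.4 + 0.05 * k for k in range(1, 11)}

mean_unrelated = observed_mean(rating_probs, unrelated)
mean_informative = observed_mean(rating_probs, informative)

print(f"true mean:                      {true_mean:.2f}")
print(f"observed mean, unrelated case:  {mean_unrelated:.2f}")
print(f"observed mean, informative case: {mean_informative:.2f}")
```

Under these assumed numbers the observed mean matches the true mean only in the first case. When the answer rate rises with satisfaction, the observed mean overstates it.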
Our goal was to address these concerns, using the following parametric models.
1. The ordinal data structure is modeled by an ordinal probit model with effects for persons and items.
2. Person and item parameters are linked to covariates through regression models, yielding hypothesis tests for covariate effects on satisfaction ratings.
3. The hierarchical Bayes framework allows for borrowing of strength across persons and items.
4. We incorporate a subject matter-based theory for item nonresponse that jointly models it with the ordinal ratings by positing a series of cognitive steps, described via latent variables, that determine the observed outcome.
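As a rough illustration of the first model component, the sketch below computes category probabilities for a generic ordinal probit model. It is not the article's exact specification: the additive latent mean (person effect plus item effect), the cutpoint values, and the names `normal_cdf` and `ordinal_probit_probs` are assumptions made for this example.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordinal_probit_probs(person_effect, item_effect, cutpoints):
    """Return [P(Y = 1), ..., P(Y = K)] with K = len(cutpoints) + 1.

    Assumes a latent rating Z ~ Normal(person_effect + item_effect, 1),
    with Y = k when Z falls between cutpoints k-1 and k.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    mean = person_effect + item_effect
    # Cumulative probabilities P(Y <= k), padded with 0 and 1 at the ends.
    cumulative = [0.0] + [normal_cdf(c - mean) for c in cutpoints] + [1.0]
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


# Nine hypothetical cutpoints give ten categories, matching a 1-10 scale.
cutpoints = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

probs_low = ordinal_probit_probs(-0.5, 0.0, cutpoints)   # less satisfied person
probs_high = ordinal_probit_probs(1.0, 0.0, cutpoints)   # more satisfied person

expected_low = sum(k * p for k, p in enumerate(probs_low, start=1))
expected_high = sum(k * p for k, p in enumerate(probs_high, start=1))
print(f"expected rating, low person effect:  {expected_low:.2f}")
print(f"expected rating, high person effect: {expected_high:.2f}")
```

Because the cutpoints need not be equally spaced, a formulation of this kind lets a one-point difference at the ends of the scale carry a different meaning from one in the middle, which is the first concern raised above.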
Our work draws on previous research on ordinal data modeling. Item response theory models for binary data (Rasch 1960) are widely used, and were extended by Birnbaum (1968) and Lord (1980), among others. Bahadur (1961) modeled dependent multinomial responses using an independence model modified by a quadratic factor. Agresti (1977) and McCullagh (1980) developed generalized linear models for ordinal data. Andrich (1978), Holland (1981), and Masters (1982) discussed maximum likelihood estimation, and Albert and Chib (1993) conducted Bayesian inference using iterative simulation.
The remainder of the article is laid out as follows. In Section 2 we describe the goals, methods, and standard exploratory and descriptive analysis of a DuPont customer satisfaction survey (CSS) dataset and introduce notation. We present an analysis using standard ordinal and logistic regression in Section 3. We describe our Bayesian hierarchical approach in Section 4, and also compare inferences using the three approaches. In Sections 5.1-5.2 we identify outliers and check the influence of individual cases and of components of the model specification on the inferences. …