Accuracy and Bias of Race/ethnicity Codes in the Medicare Enrollment Database

Medicare's administrative files have the potential to be a rich source of information about disparities in health care access, use, or outcomes. Detailed information can be extracted on services used, diagnosis, treatment, and program expenditures. Used alone or in combination with other sources, the administrative files can be of enormous help in macro- and micro-analysis of disparities.

The inclusion of race and ethnicity in such analysis is almost universal, but not without problems. It has been argued, for example, that using race as a variable in social science research distracts attention from the life factors that are causally related to differences in health outcomes (Fullilove, 1998). Others have contended that race and ethnicity categories are too aggregated to be meaningful (Bhopal and Donaldson, 1998; Kaufman, 1999). These concerns notwithstanding, race and ethnicity remain among the most frequently used demographic characteristics in exploratory analysis of potential disparities in health care (e.g., Gornick, Eggers, and Reilly, 1996; Eggers and Greenberg, 2000). They serve as much to stand in for characteristics that cannot be captured in administrative data (such as culture, social networks, and socioeconomic status) as to detect the role of race as an overt barrier to health care.

No matter how one uses race and ethnicity in analysis of differences in program experience, that analysis can easily be confounded if race and ethnicity are not well identified in the administrative records. This inquiry extends an earlier examination of the Medicare data (Arday et al., 2000) and has two goals: to review the accuracy of race and ethnicity classification in Medicare administrative data, and to help researchers identify potential problems when using those administrative data to examine disparities in health and health care. How accurate are Medicare data files as a source of information on race and ethnicity? Are there differences between those members of racial and ethnic groups who are identified as such in the administrative data, and those who are not identified?


At present, race and ethnicity are recorded in Medicare's administrative files with a single variable. This variable can take one of six different values: White, Black, Asian, North American Native, Hispanic, and Other (plus an unknown category). Only one category is coded for each person in the file. This variable is populated principally with data provided to the Social Security Administration (SSA) by people at the time they apply for a Social Security number, when they apply or re-apply for benefits, or when they apply for a replacement Social Security card. Reporting race is voluntary, and the response is not checked or verified by the Social Security staff receiving the application. Data are transferred from SSA's databases to the Medicare enrollment database (EDB), maintained at CMS.
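The coding scheme described above can be pictured with a minimal Python sketch. The class names, the `beneficiary_id` field, and the choice to default an unreported value to the unknown category are hypothetical conveniences for illustration; only the category labels and the one-code-per-person rule come from the text, and the numeric codes used in the actual files are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class RaceCode(Enum):
    """Labels for the single race/ethnicity variable (illustrative only)."""
    WHITE = "White"
    BLACK = "Black"
    ASIAN = "Asian"
    NORTH_AMERICAN_NATIVE = "North American Native"
    HISPANIC = "Hispanic"
    OTHER = "Other"
    UNKNOWN = "Unknown"


@dataclass(frozen=True)
class EnrollmentRecord:
    # Hypothetical record layout: exactly one code per person.
    beneficiary_id: str
    # Assumption: an unreported value is stored as UNKNOWN.
    race_code: RaceCode = RaceCode.UNKNOWN


def substantive_categories():
    """Return the six substantive category labels, excluding Unknown."""
    return [code.value for code in RaceCode if code is not RaceCode.UNKNOWN]


record = EnrollmentRecord("A001", RaceCode.HISPANIC)
print(record.race_code.value)
print(len(substantive_categories()))
```

Because the record holds one value, a person coded Hispanic carries no separate race code, and a person coded White or Black carries no indication of Hispanic ethnicity.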

Previous researchers have documented a number of problems with race coding in the EDB, concluding that these files were especially prone to difficulties in identifying people who affiliate with Hispanic/Latino ethnicity or with race groups other than White or Black. Comparing the EDB distribution of race/ethnicity with U.S. Bureau of the Census figures and with SSA data on country of birth, Lauderdale and Goldberg (1996) concluded that the EDB captures one-quarter of people with Hispanic heritage, less than one-fifth of American Indian enrollees, and just over one-half of Asian enrollees (with marked differences in success at capturing subracial groups in the Asian population). Pan and colleagues (1999) compared Medicare and Medicaid race codes for people in New Jersey enrolled in both programs, and found that agreement between the two administrative sets was highest for White and Black enrollees. Agreement was quite poor for Hispanic enrollees; too few Asian or American Indian enrollees were in the sample to draw conclusions about agreement on those codings. …
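The notion of a group being "captured" by the EDB can be made concrete with a small Python sketch. It assumes person-level pairs of a reference classification and the administrative code, which is a simplification: the studies cited above relied in part on aggregate comparisons. The function name `capture_rates` is hypothetical, and the toy pairs are invented for illustration and not drawn from any study.

```python
from collections import Counter


def capture_rates(pairs):
    """Share of each reference group whose administrative code matches.

    pairs: iterable of (reference_group, administrative_code) tuples.
    """
    totals = Counter()
    matches = Counter()
    for reference_group, administrative_code in pairs:
        totals[reference_group] += 1
        if reference_group == administrative_code:
            matches[reference_group] += 1
    return {group: matches[group] / totals[group] for group in totals}


# Invented toy data: (reference classification, administrative code).
toy_pairs = [
    ("Hispanic", "Hispanic"),
    ("Hispanic", "Hispanic"),
    ("Hispanic", "White"),
    ("Hispanic", "White"),
    ("Hispanic", "Other"),
    ("White", "White"),
    ("White", "White"),
    ("Black", "Black"),
    ("Black", "Black"),
]

for group, rate in capture_rates(toy_pairs).items():
    print(f"{group}: {rate:.2f}")
```

A low rate for a group means that analyses restricted to people carrying that code describe only a subset of the group, which is why differences between identified and unidentified members matter.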