Use of Medicare Data to Identify Incident Breast Cancer Cases

Surveillance, Epidemiology and End Results (SEER) data from the National Cancer Institute (NCI) provide reliable information about cancer incidence. However, because SEER data are geographically limited and have a 2-year time lag, we evaluated whether Medicare data could provide timely information on cancer incidence. When women hospitalized for breast cancer in the Medicare data were compared with women reported to SEER, the Medicare data had high specificity (96.6 percent) but low sensitivity (59.4 percent). We conclude that Medicare hospitalization data can identify incident cases of cancers that usually require inpatient treatment. For cancers that are often treated only in the outpatient setting, such as breast cancer, additional Medicare data, such as physician bills, are needed to capture the full range of treatment practices.
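The two summary measures above can be made concrete with a small sketch. It assumes the standard definitions with SEER as the reference standard: sensitivity is the share of SEER-reported cases that the Medicare hospitalization data also flag, and specificity is the share of women not reported to SEER whom the Medicare data also do not flag. The class name `TwoByTwo` and all counts are hypothetical illustrations, not the study's data or method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical cross-classification of women: Medicare flag vs. SEER (reference)."""
    true_pos: int   # flagged in Medicare data and reported to SEER
    false_pos: int  # flagged in Medicare data, not reported to SEER
    false_neg: int  # reported to SEER, not flagged in Medicare data
    true_neg: int   # neither flagged nor reported

    def sensitivity(self) -> float:
        # Share of SEER-reported cases that the Medicare data also identify.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share of women not reported to SEER whom the Medicare data do not flag.
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical counts for illustration only, not study data.
table = TwoByTwo(true_pos=60, false_pos=3, false_neg=40, true_neg=897)
print(f"sensitivity = {table.sensitivity():.1%}")  # sensitivity = 60.0%
print(f"specificity = {table.specificity():.1%}")  # specificity = 99.7%
```

With these made-up counts, a source can look highly specific while still missing many true cases, which is the pattern the abstract describes.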


Data collected by the SEER program maintained by NCI are usually considered the "gold standard" for estimating the incidence and treatment of cancers throughout the United States. Before 1992, five States (Connecticut, Hawaii, Iowa, New Mexico, and Utah) and four metropolitan areas (Seattle, San Francisco-Oakland, Detroit, and Atlanta) participated in the SEER program. These areas represent about 10 percent of the Nation's population (Miller et al., 1993), are concentrated in the western United States, and do not include large numbers of some demographic groups, such as African Americans, raising concerns about the representativeness of the data. Moreover, because the SEER areas are geographically limited, they may not capture regional variation in treatment practices for specific cancers, costs of care, and medical outcomes following treatment. In addition, there is a 2-year lag while NCI obtains case reports from the State registries.

Given these concerns about the representativeness and timeliness of SEER data, other data sources might provide accurate and more current information on cancer incidence and treatment. One potential alternative source of information about cancer incidence in the population is administrative data collected for insurance billing purposes, such as Medicare data. These data offer the opportunity for timely studies that cover the entire United States. Cancer diagnoses recorded in administrative data for inpatient stays have been found to have high sensitivity and specificity when compared with the medical record for the hospitalization (Fisher et al., 1992; Romano and Luft, 1992). Previous studies have used Medicare hospitalization bills to assess whether incidence rates from Medicare data were comparable to incidence rates from SEER data for six cancers: breast, colon, esophagus, lung, prostate, and uterus (Whittle et al., 1991; McBean, Warren, and Babish, 1994; McBean, Babish, and Warren, 1993). The comparability of the rates varied by type of cancer. McBean, Warren, and Babish (1994) and McBean, Babish, and Warren (1993) found that for cancers usually treated in the hospital setting, such as esophageal, lung, and uterine cancer, Medicare rates were comparable to rates from the SEER data. For colon and prostate cancer, which are often treated only in the outpatient setting, the rates calculated from Medicare hospitalization data and from SEER data differed significantly. These studies compared only aggregate data from SEER and Medicare; they did not link the files at the individual level to determine whether the two independent data sources identified the same persons.

The purpose of this study was to determine whether persons identified as having incident breast cancer in the Medicare data were also found in the SEER data for that year. If a method could be developed to identify specific women with incident breast cancer from administrative data, it would help researchers assemble cohorts for examining treatment practices for the Nation as a whole or for subgroups. …
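A minimal sketch of why an individual-level comparison can reveal what aggregate counts hide. It assumes that each woman already carries a common identifier in both files; this excerpt does not describe how the linkage is actually performed, and the function `link_cases` and all identifiers are hypothetical. In the toy data, both sources report four cases, yet only two of them are the same women.

```python
def link_cases(medicare_ids: set[str], seer_ids: set[str]) -> dict[str, set[str]]:
    """Partition person identifiers by which source flagged them as incident cases.

    Assumes both files share a common person identifier (hypothetical here).
    """
    return {
        "both": medicare_ids & seer_ids,           # identified by both sources
        "medicare_only": medicare_ids - seer_ids,  # flagged in Medicare, absent from SEER
        "seer_only": seer_ids - medicare_ids,      # reported to SEER, missed by Medicare
    }


# Hypothetical identifiers for illustration only, not study data.
medicare = {"A1", "A2", "A3", "A5"}
seer = {"A1", "A2", "A4", "A6"}

groups = link_cases(medicare, seer)
# Aggregate counts agree (4 vs. 4), but only half of the SEER cases are the same women.
print(len(medicare), len(seer))         # 4 4
print(len(groups["both"]) / len(seer))  # 0.5
print(sorted(groups["seer_only"]))      # ['A4', 'A6']
```

The `seer_only` group holds the cases that hospitalization data miss, for example women whose treatment did not involve a hospital stay; an aggregate rate comparison alone cannot distinguish this situation from perfect agreement.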