
Executive Summary

Creation of New Race-Ethnicity Codes and Socioeconomic Status (SES) Indicators for Medicare Beneficiaries


This project sub-task continues an earlier task order project (Contract Number 500-00-0024, Task 8), Health Disparities: Measuring Health Care Use and Access for Racial/Ethnic Populations (2005), which was intended to identify racial/ethnic disparities in the use of selected Medicare covered services. This sub-task has two objectives. The first is to create and validate an acceptable measure of socioeconomic status (SES) that could be incorporated into further analyses of health care disparities among the racial/ethnic groups participating in the Medicare program. The second objective is to prepare tabulations (using an improved race/ethnicity measure created in the earlier project) that incorporate the SES variable (as well as age and gender) in such a way that differences in utilization associated with race/ethnicity are highlighted while the effects of the aforementioned covariates are controlled or held constant.

To rigorously investigate whether there are racial/ethnic health care disparities present in Medicare, it is critical to be able to assess the extent to which disparities are associated with an improved race/ethnicity variable alone, rather than with socioeconomic status (SES), because the impacts of these variables are often confounded. Thus we sought to examine apparent racial/ethnic health care disparities while other important factors, such as SES, age, and gender were controlled. In the past, it has not been possible to do this kind of analysis using Medicare administrative data alone because the enrollment database (EDB) which contains the person-level characteristics of beneficiaries does not include an appropriate variable or surrogate to measure SES. In this sub-task we created and validated such a measure, building on our efforts in the earlier project to geocode beneficiaries' addresses and link them to US Census data on their block group. In addition, accuracy of the race/ethnicity coding on the EDB was increased by using the improved race/ethnicity measure we developed in the earlier project. Because the current task was based largely on our previous efforts, it is important to understand the work done in the earlier project to lay the foundation for the work performed in this task. For that reason, a summary of this previous work has been included.


Research Methods

Race/Ethnicity Coding on the EDB. The race data on the EDB has historically been obtained from the Social Security Administration's (SSA's) master beneficiary record (MBR). Until 1980, the race item on the form SS-5 completed by applicants for a Social Security number permitted classification of race only as "White", "Black", or "Other", and missing responses were coded as "Unknown". In 1980, the SSA's race categories were expanded to "Hispanic"; (non-Hispanic) "White"; (non-Hispanic) "Black"; Asian, Asian-American, or Pacific Islander; American Indian or Alaska Native; and "Unknown". When the SSA began enrolling applicants at birth by extracting data from birth certificates, it was not considered necessary to include race or ethnicity (Scott, 1999). In 1994, the expanded race/ethnicity codes from the SS-5 form were incorporated into the Medicare EDB. This update was repeated in 1997 and 2000, and annually since then. In 1997, the Health Care Financing Administration (now the Centers for Medicare & Medicaid Services, or CMS) conducted a postcard survey to improve the EDB's race/ethnicity coding. Following these efforts, researchers assessed the improvement in the EDB's race/ethnicity data and concluded that, while there was a noticeable improvement in the coding, identification of Hispanics, Asians/Pacific Islanders, and American Indians/Alaska Natives was still incomplete (Arday et al., 2000; Eggers and Greenberg, 2000; Waldo, 2005).

Assessing Current Status of EDB Race/Ethnicity. The first step in our earlier project was to assess the correctness of the EDB race/ethnicity coding on the mid-2003 EDB. To do this, we compared the race/ethnicity on the EDB for the almost 831,000 respondents to the 2000-2002 Medicare CAHPS surveys (also known as the Medicare Satisfaction Surveys). The survey self-response to the race/ethnicity items served as the gold standard against which the accuracy of the EDB race/ethnicity codes was assessed. We used sensitivity, specificity, positive predictive value, negative predictive value, and Kappa to measure agreement between the two sources of race/ethnicity data. The accuracy of the EDB was highest for non-Hispanic Blacks, with all measures above 90 percent. Non-Hispanic Whites were the next most accurately coded on the EDB. Only specificity (62 percent) and Kappa (0.71) were less than 90 percent for non-Hispanic Whites. The moderate level of specificity and Kappa reflect a considerable number of self-reported non-White CAHPS respondents coded as White on the EDB. Sensitivity for American Indians/Alaska Natives was only 36 percent and the positive predictive value just 60 percent, contributing to a low Kappa (0.45).

Hispanics and Asians/Pacific Islanders were the minority groups of particular interest since we planned to develop an algorithm based on their unique surnames to improve their coding on the EDB. The sensitivity for Hispanic coding on the EDB was a low 30 percent and for Asian/Pacific Islander coding it was 55 percent. Closer examination revealed that these low sensitivities largely reflected self-identified Hispanics coded as White on the EDB, and self-identified Asians coded as Other on the EDB. The Kappas were 0.45 and 0.66 respectively for Hispanic and Asian/Pacific Islander Medicare beneficiaries, reflecting the low sensitivities, but the other measures were acceptable at approximately 90 percent or more.
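The five agreement measures named above can all be computed from a single two-by-two table that crosses the EDB code with the self-reported gold standard. The following is a minimal sketch of those standard formulas; the function name and the counts are made up for illustration and are not taken from the report.

```python
def agreement_measures(tp, fp, fn, tn):
    """Agreement between an administrative code and a self-reported gold standard.

    tp: coded in the group and self-reported in the group
    fp: coded in the group but self-reported outside it
    fn: coded outside the group but self-reported in it
    tn: coded outside the group and self-reported outside it
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement expected from the row and column totals alone.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts, chosen only to show the calculation.
m = agreement_measures(tp=30, fp=5, fn=70, tn=895)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, a low sensitivity pulls Kappa down even when specificity and the predictive values are high, which is the pattern described above for Hispanic and Asian/Pacific Islander beneficiaries.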

Developing an Algorithm to Accurately Impute Race/Ethnicity. Having established the need to improve coding for the Hispanic and Asian/Pacific Islander Medicare beneficiaries, we undertook creation of imputation algorithms for both minority groups. The algorithms made use of information on the EDB, such as beneficiaries' language preference for mailed informational materials, the source of their race/ethnicity code, and whether they resided in Hawaii or Puerto Rico. The algorithms also used Hispanic (Word and Perkins, 1996) and Asian/Pacific Islander (Falkenstein and Word, 2002) surname lists developed by the U.S. Census Bureau. The Hispanic surname list included a percentage for each name representing the proportion of times a household headed by an individual with a particular Hispanic surname was indeed a Hispanic household as reported to the Census. There were similar percentages for the Asian/Pacific Islander surnames. We also considered typical Hispanic and Asian first names.

We incorporated these pieces of information into a SAS program that, through an iterative process which differed slightly for Hispanics and Asians/Pacific Islanders, created an algorithm that improved the race/ethnicity variable. In the algorithm, a beneficiary was considered Hispanic (or Asian) if any of the following conditions, checked in this order, held: the beneficiary's surname was identified as Hispanic (Asian) by the Census at least 70 percent of the time; the EDB coded the beneficiary as Hispanic (Asian); the person was a resident of Puerto Rico (Hawaii); the beneficiary preferred to get program information in Spanish; or the beneficiary's first name had Hispanic (Asian) origins and the surname was considered Hispanic (Asian) at least 50 percent of the time by the Census. Conditions were also identified under which a race/ethnicity changed according to these rules was restored to its EDB code.
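The Hispanic branch of the rule cascade can be sketched as follows. This is an illustration in Python rather than the SAS program used in the project; the record layout, field names, and rule labels are hypothetical, and the conditions under which a changed code was restored to its EDB value are not modeled because the summary does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Beneficiary:
    # All field names are hypothetical, not actual EDB variable names.
    surname_pct: float          # Census percentage for the surname; 0 if not listed
    edb_code: str               # race/ethnicity code currently on the EDB
    state: str                  # state or territory of residence
    prefers_spanish: bool       # language preference for program mailings
    first_name_hispanic: bool   # first name appears on a Hispanic first-name list


def hispanic_rule(b: Beneficiary) -> Optional[str]:
    """Return a label for the first rule that classifies b as Hispanic, else None."""
    if b.surname_pct >= 70:
        return "surname>=70"
    if b.edb_code == "Hispanic":
        return "edb_code"
    if b.state == "PR":
        return "puerto_rico_resident"
    if b.prefers_spanish:
        return "spanish_preference"
    if b.first_name_hispanic and b.surname_pct >= 50:
        return "first_name_and_surname>=50"
    return None


example = Beneficiary(surname_pct=55, edb_code="White", state="TX",
                      prefers_spanish=False, first_name_hispanic=True)
print(hispanic_rule(example))
```

Returning the label of the rule that fired, rather than a bare yes or no, makes it easier to audit how many codes each rule changed.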

Assessment of the Algorithm. Using the self-reported race/ethnicity data from the 2000-2002 Medicare CAHPS survey respondents as the gold standard again, we assessed the results of applying the algorithm to the CAHPS respondents. We found the algorithm significantly improved the race/ethnicity categorization of Hispanic and Asian/Pacific Islander Medicare beneficiaries. Among Hispanic beneficiaries, sensitivity improved from 30 to 77 percent, the Kappa coefficient rose from 0.43 to 0.79, and the other measures (specificity and predictive values) remained virtually unchanged. The improvement for Asian/Pacific Islander beneficiaries was equally impressive – sensitivity rose from 55 to 80 percent, Kappa increased from 0.66 to 0.80, and the other measures were not materially changed. Applying the algorithm to the entire 41.7 million persons on the mid-2003 EDB resulted in changing race/ethnicity codes to Hispanic for nearly two million Medicare beneficiaries and to Asian/Pacific Islander for three hundred thousand beneficiaries. Hispanics increased from 2.2 percent of Medicare beneficiaries to 7.0 percent, and Asians/Pacific Islanders increased from 1.4 percent to 2.0 percent.

Geocoding Beneficiary Addresses. As part of the earlier project, we employed a software package from GeoLytics, Incorporated called Geocode CD (release 2.60) to geocode the addresses of Medicare beneficiaries listed on the mid-2003 EDB. Because Geocode CD required the elements of beneficiary addresses in a very particular order, we needed to clean and reorder the addresses before processing them. The geocoding was performed to generate a FIPS code that would allow linkage of beneficiaries to the socioeconomic characteristics of their residential neighborhood (block group) from the 2000 Census. We were able to run the addresses of 87.5 percent of the 41.7 million Medicare beneficiaries through the geocoding process. Those that were not processed either had a box or route number and no street address or were foreign addresses, conditions that Geocode CD could not handle. We obtained FIPS codes for 99.2 percent of those processed by invoking options that allowed the use of variations from the input address when that address could not be found in the Geocode CD database.
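The two rates above can be combined to estimate how much of the full EDB ended up with a FIPS code. The figure is derived here from the reported percentages and does not appear in the report itself; the variable names are illustrative.

```python
total_beneficiaries = 41.7e6   # persons on the mid-2003 EDB
share_processed = 0.875        # addresses that could be run through the geocoder
share_with_fips = 0.992        # processed addresses that received a FIPS code

processed = total_beneficiaries * share_processed
with_fips = processed * share_with_fips
overall_share = share_processed * share_with_fips

print(f"Processed: {processed / 1e6:.1f} million")
print(f"With FIPS code: {with_fips / 1e6:.1f} million ({overall_share:.1%} of the EDB)")
```

Under these figures, roughly 36.2 million beneficiaries, or about 86.8 percent of the EDB, would have a block group link.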

Creating an SES Index. In the current sub-task, we used the beneficiary-linked block group data to develop a single measure of SES for beneficiaries that incorporated the common strains of the separate socioeconomic variables of their neighborhood (block group) extracted from the Census. Following the work of Krieger et al. (2003a) at Harvard University, we used the same block group level socioeconomic characteristics they extracted from the Census to create an SES index for the sample of 1.96 million Medicare beneficiaries selected for study in our previous task order project. These characteristics were representative of the occupational, income, wealth, and educational characteristics of residents in the block group.

Just as Krieger et al. (2003a) did, we performed a principal components analysis of the following seven Census variables: percentage of persons in the labor force who are unemployed; percentage of persons living below poverty level; median household income; median value of owner-occupied dwellings; percentage of persons 25 years of age or older with less than a 12th grade education; the percentage of persons 25 years of age or older completing four or more years of college; and the percentage of households that average one or more persons per room. The weights from the first principal component were used to create an SES index score for the 1.57 million beneficiaries in the Medicare sample who had a FIPS code and block group Census data associated with their address. The continuous range of SES index scores was standardized so scores could range between 0 and 100. The scores were then grouped into four categories to facilitate tabular analysis.
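The index construction can be illustrated with a small, dependency-free sketch: standardize each variable, take the first principal component of the correlation matrix, score each block group, rescale the scores to 0 to 100, and cut them into four groups. Several choices here are assumptions rather than details from the report: only three invented variables are used instead of seven, the component is oriented so that higher income means a higher score, rescaling is min-max, and the four categories are equal-width bands.

```python
import math


def standardize(column):
    """Return z-scores for one variable (sample standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def first_component(z, iterations=500):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    p, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    v = [1.0] + [0.0] * (p - 1)  # start from the first axis
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def ses_index(columns, income_position):
    """Return (loadings, 0-100 index, category 1-4) for each block group."""
    z = [standardize(c) for c in columns]
    loadings = first_component(z)
    # The sign of a principal component is arbitrary; orient it (assumption)
    # so that higher income raises the score.
    if loadings[income_position] < 0:
        loadings = [-w for w in loadings]
    n = len(columns[0])
    raw = [sum(loadings[i] * z[i][k] for i in range(len(z))) for k in range(n)]
    lo, hi = min(raw), max(raw)
    index = [100.0 * ((s - lo) / (hi - lo)) for s in raw]
    # Equal-width bands are an assumption; the report does not give cut points.
    categories = [min(int(s // 25), 3) + 1 for s in index]
    return loadings, index, categories


# Invented block group values: percent below poverty, median household
# income (thousands of dollars), percent with four or more years of college.
columns = [
    [30, 25, 18, 12, 8, 4],
    [22, 28, 35, 48, 60, 85],
    [5, 9, 14, 22, 30, 45],
]
loadings, index, categories = ses_index(columns, income_position=1)
print([round(s, 1) for s in index], categories)
```

In practice a statistical package would be used for the principal components step; the power iteration above stands in for it only to keep the sketch self-contained.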

Validating the SES Index. Before using the four-category SES measure in tabulations, we validated it. We used the national probability sample of Medicare beneficiary respondents to the three Medicare fee-for-service CAHPS surveys for 2002-2004 as the basis for our validation. In addition to the CAHPS survey measures, we had requested and received some income-related information for CAHPS respondents from the Social Security Administration (SSA): the indexed monthly earnings (IME) that were taxed for Social Security purposes while the beneficiary was paying the Social Security tax, and the monthly benefit amount (MBA) that Social Security is currently paying beneficiaries.

The first step in the validation process involved computing the SES index scores for the full validation sample of over 381,000 Medicare fee-for-service CAHPS respondents and creating the four-category SES measure. We next computed the means of the two SSA variables within each level of SES, and we also cross-tabulated the two SSA variables with SES scores. We found that the mean IMEs increased significantly as the SES level rose. The distribution of beneficiaries across the four categories of SES according to the four categories of their IME was also highly significant, indicating that proportionately more beneficiaries with lower IME were classified in lower SES categories, and proportionately more of those with higher IME were classified in higher SES categories. We also found that the mean MBA increased significantly as the SES category went from the lowest to the highest. The cross-tabulation of MBA and SES showed a similar significant association, with proportionately more low MBA beneficiaries in the lowest SES category and proportionately more high MBA beneficiaries in the highest SES category.
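The first validation check, comparing the mean of an SSA variable across SES categories, amounts to a grouped mean followed by a check that the means rise with SES. A minimal sketch follows; the function names and values are invented for illustration, and the significance tests used in the project are not reproduced.

```python
from collections import defaultdict


def mean_by_ses(ses_categories, values):
    """Mean of `values` within each SES category, keyed by category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, value in zip(ses_categories, values):
        sums[category] += value
        counts[category] += 1
    return {c: sums[c] / counts[c] for c in sorted(sums)}


def is_increasing(means):
    """True if the means rise strictly from the lowest to the highest category."""
    ordered = [means[c] for c in sorted(means)]
    return all(a < b for a, b in zip(ordered, ordered[1:]))


# Invented SES categories (1 = lowest) and monthly earnings values.
ses = [1, 1, 2, 2, 3, 3, 4, 4]
ime = [900, 1100, 1400, 1600, 2000, 2200, 2900, 3100]

means = mean_by_ses(ses, ime)
print(means, is_increasing(means))
```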

In addition to the two SSA variables, we had several others from the CAHPS survey — having additional insurance (not including Medicaid), having private insurance to cover prescription drugs, reporting health status to be fair or poor, and achieving educational status no higher than high school graduate — and one from the EDB — whether or not a beneficiary is simultaneously eligible for both Medicare and Medicaid — that we believed should be related to SES. Eligibility for both Medicare and Medicaid was significantly associated with SES: eligibility was greatest among the lowest SES category. The associations between the four CAHPS measures and the SES variable were also highly significant. The direction of the associations was as expected: larger percentages of persons in poor or fair health and persons who had no more than a high school education were in the lower SES categories, and fewer persons with other insurance (not including Medicaid) and private prescription drug coverage were in the lower SES categories.



Sample Selection. The analyses planned for this sub-task were performed on a probability sample of 1.96 million Medicare beneficiaries selected for analysis in the previous task order contract and reported on in the report titled Health Disparities: Measuring Health Care Use and Access for Racial/Ethnic Populations (2005). This sample was selected from the full 10 segments of the mid-2003 unloaded EDB. To be eligible for inclusion in the sample, beneficiaries must have been enrolled in traditional fee-for-service (FFS) Medicare (Part A, Part B, or both) for the full 12 months of the 2002 calendar year and not have been enrolled in a Group Health Organization at all during that calendar year. In addition, beneficiaries must have been alive for the full 12 months of calendar year 2002. We set these criteria to allow the maximum opportunity (period of time) for beneficiaries to submit claims documenting their use of preventive and other Medicare covered services.

The primary sampling goal at the time this sample was selected was to have sufficient sample size to provide equally precise estimates of health care utilization for the different racial/ethnic groups. We therefore sampled such that, to the extent possible, the same number of Medicare beneficiaries would be included in the sample in each of the different racial/ethnic groups. The sampling rates based on the NEWRACE code were 11 percent for Black Medicare beneficiaries, 1.2 percent for White, 26 percent for Hispanic, 71 percent for Asian/Pacific Islander, and 100 percent for American Indian/Alaska Native, Other, and Unknown.
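A stratified draw at these rates can be sketched as below. The rates are those reported above; the selection mechanism (an independent draw per beneficiary) and the attached weight of one over the sampling rate are assumptions made for illustration, since the summary does not describe how the draw was implemented or whether weights were applied.

```python
import random

# Sampling rates by NEWRACE group, as reported in the text.
SAMPLING_RATES = {
    "Black": 0.11,
    "White": 0.012,
    "Hispanic": 0.26,
    "Asian/Pacific Islander": 0.71,
    "American Indian/Alaska Native": 1.0,
    "Other": 1.0,
    "Unknown": 1.0,
}


def draw_sample(population, rng):
    """Keep each (record_id, group) pair with its group's sampling rate.

    Returns (record_id, group, weight) tuples, where weight = 1 / rate
    (an assumed inverse-probability weight).
    """
    sample = []
    for record_id, group in population:
        rate = SAMPLING_RATES[group]
        if rng.random() < rate:
            sample.append((record_id, group, 1.0 / rate))
    return sample


# Hypothetical population of record identifiers and groups.
population = [(i, "White") for i in range(10000)] + [(i, "Unknown") for i in range(10000, 10050)]
sample = draw_sample(population, random.Random(2003))
print(len(sample), "records sampled")
```

Oversampling the smaller groups in this way is what makes the estimates for each group comparably precise, at the cost of a sample that no longer mirrors the racial/ethnic mix of the full EDB.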

Tabulations. We redesigned a number of tabulations performed for the earlier CMS task order to identify health care disparities among Medicare beneficiaries by race/ethnicity. The tabulations for this project incorporated a four-category version of the SES index score along with race/ethnicity, gender (where appropriate) and age group. The health care utilization variables analyzed in the tabulations included the use of cancer screening services, services for the secondary prevention of complications of diabetes, hospitalizations for ambulatory care sensitive conditions that are indicators of inadequate primary care, and the number of, length of, and expenditures for common hospitalizations experienced by Medicare beneficiaries. These were extracted for the sample members from their 2002 Medicare claims.

As part of the expansion of the tabulations to include SES, the tabulations were done for the nation as a whole and repeated for the 10 metropolitan statistical areas (MSAs) where the largest number of Hispanics and Asians/Pacific Islanders 65 years of age and older reside. Since four of the MSAs were in common between the two groups of ten MSAs, the tabulations were only prepared for the nation as a whole and the 16 unique MSAs.

Multivariate Modeling. In an effort to better understand the overall impact of the SES measure on the disparities in health care utilization between White Medicare beneficiaries and those who are members of racial/ethnic minorities at the national level, we estimated several multivariate logistic regression analytic models. Our analytic approach involved three steps. The first was to estimate the size of the White-minority group differences in utilization, controlling only on gender and age group, and then we re-estimated the differences by controlling on SES as well as gender and age group in the model. In the final step, we re-estimated the differences in utilization controlling on the interaction between SES and race/ethnicity as well as gender and age group.
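One way to write the three model specifications is shown below. The notation is ours and is offered only as a sketch: $Y_i$ indicates whether beneficiary $i$ used the service, $R_{ir}$, $A_{ia}$, and $S_{is}$ are indicator variables for race/ethnicity group, age group, and SES category, $G_i$ is gender, and including the SES main effects alongside the interaction in the third model is an assumption.

```latex
\begin{align*}
\text{Step 1:}\quad \operatorname{logit} P(Y_i = 1) &= \beta_0 + \sum_r \beta_r R_{ir} + \gamma G_i + \sum_a \delta_a A_{ia} \\
\text{Step 2:}\quad \operatorname{logit} P(Y_i = 1) &= \beta_0 + \sum_r \beta_r R_{ir} + \gamma G_i + \sum_a \delta_a A_{ia} + \sum_s \theta_s S_{is} \\
\text{Step 3:}\quad \operatorname{logit} P(Y_i = 1) &= \beta_0 + \sum_r \beta_r R_{ir} + \gamma G_i + \sum_a \delta_a A_{ia} + \sum_s \theta_s S_{is} + \sum_r \sum_s \lambda_{rs} R_{ir} S_{is}
\end{align*}
```

With White beneficiaries as the reference group, comparing the estimates of $\beta_r$ between Step 1 and Step 2 shows how much of each White-minority difference is accounted for by SES, and the $\lambda_{rs}$ terms in Step 3 allow that difference to vary by SES level.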

We conducted regression model analyses on seven of the 45 utilization measures included in the tabulations, and only at the national level. They included three cancer screening measures (past 12 month receipt of: the combination of mammogram and Pap smear for women, the prostate specific antigen (PSA) test for men, and any of the three colorectal cancer screening tests for both sexes), three diabetes secondary preventive services for beneficiaries identified as diagnosed with diabetes (past 12 month receipt of: physiologic testing (hemoglobin A1c, lipid profile, or microalbumin) to monitor insulin needs, an eye exam, and instruction in self-care (diabetes education and self-monitoring)), and whether or not a beneficiary had a hospital or emergency department admission with a diagnosis of any of the 15 ambulatory care sensitive conditions (ACSCs) we included.

For the first six service use measures, minorities typically had lower utilization than Whites, whereas equal or higher utilization would have been better. For the ACSC measure, the direction is reversed, because a higher level of hospitalization for these diagnoses indicates poorer quality care: such hospitalizations are considered largely avoidable with appropriate and timely ambulatory care. Furthermore, the magnitude of the disparities between minority beneficiaries and Whites on these seven utilization measures ranged from very small or none to substantial.

The regression models confirmed that controlling the impact of SES (as well as age and gender) typically reduces the size of the utilization difference between Whites and minorities, i.e., the disparity. The amount of the reduction varied with the measure of use and the minority group. It is important to note, however, that it never came close to eliminating the difference. A final set of regression models was run to investigate whether there is a statistical interaction between race/ethnicity and SES that impacts the differences in utilization. We found that there were interactions, and that the reduction in differences between Whites and minorities varied according to race/ethnicity and level of SES. The interaction between race/ethnicity and SES revealed that the differences between Whites and minorities were not uniform across SES levels, but were often larger among beneficiaries in the higher SES levels than they were in the lower SES levels. Based on so few measures of utilization, however, these results are suggestive at best, and indicate that analyses of additional utilization variables are needed.



Important Results. Both of the objectives of this sub-task were achieved. We developed an index of socioeconomic status (SES) for a probability sample of nearly 1.6 million Medicare beneficiaries stratified by an upgraded measure of race/ethnicity. Development of the upgraded race/ethnicity measure was itself an achievement because it made it possible to examine racial/ethnic disparities in the health care utilization of Black, Hispanic and Asian/Pacific Islander Medicare beneficiaries with more confidence.

We developed the SES index from Census data representing the beneficiaries' residential neighborhood. The methods we employed were similar to those of other researchers seeking to measure the impact of socioeconomic status on disparities in health services utilization. The resulting SES measure was subsequently validated on a large independent sample of Medicare beneficiaries using economic, social, and behavioral measures presumed to be related to SES that we obtained from the Social Security Administration and CMS (data from the EDB and fee-for-service CAHPS). The validation activity showed that these variables were moderately related to the SES measure and in the expected direction, exactly what one wants in the validation of an index based on multiple related items. The associations were all very highly statistically significant.

We also achieved the second objective which was to generate tabulations on a variety of health services utilization measures controlling on SES, as well as age and gender, to provide a better estimate of potential disparities between Whites and minority group members in their use of health care. While limited time and resources prevented analysis of the hundreds of tabulations prepared, the results of our limited multivariate modeling analyses confirmed that controlling on SES did reduce the difference in health care utilization between White and minority Medicare beneficiaries. Our examination of the interaction between race/ethnicity and SES indicated that these differences in utilization were often smaller among beneficiaries in the lower SES categories than in the higher ones.

Limitations. As we have indicated, the results of this sub-task utilizing the SES index are an important contribution to the understanding of racial/ethnic disparities in the use of health care. The results of the race/ethnicity imputation algorithm used in this sub-task also represent an important expansion in the use of a large and potentially fruitful administrative database that to date has been of limited use for examining disparities beyond White and Black differences. There are, however, limitations to the work and aspects of it for which additional research is needed.

While the naming algorithm did greatly improve the coding of Hispanics and Asians/Pacific Islanders, there is still room for further improvement, not to mention the need for continual updating as new beneficiaries are added to the Medicare program. We did nothing to improve the accuracy of the American Indian/Alaska Native group and our analysis suggests that they remain seriously under-identified. There may be some way to better identify who these beneficiaries are through reservation addresses or sources of care used. This should be further investigated as the available data show this group to be particularly vulnerable to not using health care as much as others.

The geocoding of beneficiary addresses has allowed researchers to associate some type of SES measure based on place of residence with most Medicare beneficiaries, but not all. Because many of the addresses were box numbers or rural routes, or were in Puerto Rico or a foreign country, and could not be geocoded by the software, they were not linked to the Census data needed to create the SES index. Further research should be undertaken to determine how these addresses should be handled.

Further, preliminary examination of a sample of geocoded addresses indicates that employing some of the Geocode CD software's options may have resulted in misidentifying a small proportion of block groups. In the future it would be advisable to more thoroughly investigate the impact of this misidentification on results.

Finally, our multivariate analysis and the associated tabulations were produced using available Medicare claims data from 2002, and may not represent the situation in 2006. Just as the race/ethnicity algorithm and geocoding process should be updated, so also should the 2002 sample and claims data, to reflect the new beneficiaries in Medicare and any improvements made in the more equitable receipt of care.


Page last reviewed January 2008
Internet Citation: Executive Summary: Creation of New Race-Ethnicity Codes and Socioeconomic Status (SES) Indicators for Medicare Beneficiaries. January 2008. Agency for Healthcare Research and Quality, Rockville, MD.