Using Administrative Data To Monitor Access, Identify Disparities, and Assess Performance of the Safety Net
By John Billings, J.D.
Strengths and Limitations of Administrative Data
Vital Statistics (Birth Records)
Hospital Discharge Records
Emergency Room Records
Appendix A. Sample Data Confidentiality Agreement
Appendix B. Ambulatory Care Sensitive Conditions
One of the greatest difficulties in assessing barriers to access, health disparities, and performance of the safety net in a community is obtaining meaningful data to measure these factors. While it is quite easy to conceptualize indicators that might provide some insights, obtaining the data can be challenging.
Many potential measures involve talking directly to patients to determine whether the patient has a usual source of care, had a doctor visit in the last year, or was unable to obtain needed care. However, population surveys can be costly, difficult to administer effectively (many of the most vulnerable can be hard to reach), and usually provide, at best, data at community-wide levels.1
While some population subgroup analysis is usually possible (How do low-income patients compare to higher income patients? Do the measures differ by gender, age, or race/ethnicity?), targeting problems in a specific geographic area within a community is seldom possible with survey data. Moreover, some of these measures are quite subjective (e.g., "unable to obtain needed care"), and differences in expectations, culture, or health beliefs may mask important differences among population subgroups or generate an appearance of disparity when none exists.
Other data that seem quite straightforward are simply not available. For example, data on immunizations for children can be important potential indicators of how well the local safety net is performing and of possible barriers to accessing needed care. However, short of manual review of patient medical records in physicians' offices (a costly and impracticable task), this information is generally not available. Some communities have attempted to establish immunization registries, but most are incomplete. Schools require evidence of immunization on initial enrollment, but few maintain records on compliance that can be used for assessment, and there is usually no means of distinguishing whether enrolling students received immunizations in accordance with the recommended schedules or simply received immunization just prior to school enrollment.
Accordingly, there is an increasing interest in examining "administrative" data as a means of assessing barriers to access, health disparities, and the performance of the safety net. These data are computerized records that are gathered for some administrative purpose, but contain information that can be used for other purposes as well. A classic example is birth records. These data are maintained as a matter of public record and have long been computerized to facilitate their use. The main administrative use, of course, is as a legal record certifying the birth of an individual and recording the newborn's parentage. However, the record also includes information on prenatal care, birth weight, gestation period, and birth outcome—data that can be of enormous value in assessing patterns of care for pregnant women in a community.
Another important example of administrative data is computerized hospital discharge data. These data are used primarily for electronic billpaying, for both Government (Medicare and Medicaid) and commercial payers. Because payment levels are determined by a variety of factors, these computerized records contain a substantial amount of information about the patient (age, gender, expected payer, and so on) and the hospitalization (diagnoses, procedures, discharge status, and so on). As a result, hospital discharge data have been used increasingly for many other purposes as well. An obvious use is marketing: providers (hospitals and managed care plans) have a direct interest in understanding patient origin and the dynamics of market share (e.g., Where are patients going to obtain heart bypass surgery? How do these patterns change over time?). The data have also been of interest to researchers and policymakers. Since the 1970s, analysis of hospital discharge data has revealed substantial variation in hospitalization rates that have challenged assumptions about medical practice and raised important issues concerning cost, quality, and patient outcomes (Wennberg and Gittelsohn, 1973; Wennberg et al., 1989). In addition, as is explored below, many researchers and analysts also have used hospital data to assess access problems and to identify areas within a community where there may be more substantial barriers to timely and effective ambulatory care (Billings et al., 1993; Bindman et al., 1995; Billings, Anderson, and Newman, 1996; Millman, 1993).
In the material that follows, the general strengths and weaknesses of using administrative data are discussed, followed by a more detailed examination of how three different types of administrative data (vital statistics records, hospital discharge data, and emergency department data) have been used to monitor access, identify disparities in health outcomes, and assess the performance of local safety nets. Included in these sections are brief descriptions of what is required to use these data, information on how to obtain the data, and some examples of their use.
1 For information on conducting surveys, go to the chapter by Joel Cantor.
Strengths and Limitations of Administrative Data
The main strength of administrative data is their availability. No new data collection is required. Someone else has gathered and entered the data, and your job is simply to analyze them. The data are available electronically, so they are relatively easy to transport (on disk or CD-ROM) and archive for easy access (hard disk space has become incredibly affordable). Even analysis itself can be relatively inexpensive, since most databases are in standardized formats and virtually all analyses can be conducted on a desktop computer, often using off-the-shelf analytic programs (e.g., SAS®, Stata®, SPSS®, and others).
Moreover, most of these databases are relatively large. That usually means it is possible to analyze population subgroups separately (e.g., by race/ethnicity, national origin, and so on) or to focus on specific geographic areas (a part of town, an individual ZIP code, or even a census tract). Most problems with access are not uniform across populations or within areas (the companion volumes document huge variation in virtually all measures related to access and the safety net). Therefore, administrative data can often be useful in identifying problems specific to a particular subgroup or geographic area, something that is often impossible with survey data, which are typically available only at the metropolitan statistical area (MSA) or county level.
However, administrative data are not without serious problems and limitations. First and foremost, the data can be "dirty." It is critical to remember that the data were gathered for another purpose. The fields you are most interested in may or may not be central to the primary record keeping or payment purpose.
For example, information on birth records for the month that prenatal care began is not a vital statistic for the purposes of creating a certification of a birth. It may be missing altogether, and even if the information is provided, its accuracy is not guaranteed. Issuing payments is not contingent on the field being completed, and there is certainly no penalty for providing inaccurate information. The good news is that information on the date prenatal care began as well as birth weight and gestation period is usually completed by a physician who may actually know the answer (and who has no reason to get it wrong). Sometimes, however, the data are incomplete, missing, or simply wrong. For that reason, exercising caution is necessary.
In all administrative data sets, some fields are likely to be more accurate than others. For birth records, the date of birth is undoubtedly accurate. The baby's weight is likely to be correct too (if it's there), but other fields may be more worrisome. Information on smoking and drug/alcohol use during pregnancy is self-reported by the mother, who may be reluctant to reveal personal information that may be embarrassing or even illegal. With hospital discharge data, the diagnosis and procedure fields determine payment levels. Getting it wrong constitutes fraud. Within the generally accepted parameters of up-coding to maximize reimbursement, the data are likely to be accurate. But information on the expected payer may or may not be reliable, and using the field to assess utilization by Medicaid and uninsured patients can be problematic. A patient may enter the hospital uninsured, but leave with an expected payer status of "Medicaid," as the hospital's reimbursement staff helps the patient secure coverage (and the hospital secure reimbursement). And what is expected may not happen—coverage may be denied or retrospectively changed.
Accordingly, the first step in any use of administrative data is "cleaning" and data quality analysis. Simple frequency distributions and cross tabulations of variables of interest can identify the extent to which data in the field are complete and may reveal serious anomalies. The first rule of analysis of administrative data is the assumption that an unexpectedly high or low number or rate is probably due to bad data or to events not captured in the data. Check and re-check to be sure. If a middle-class ZIP code has admission rates for heroin use that are 10 times the area average, probe the data before notifying authorities. A simple cross tabulation may reveal that all the admissions are at a single hospital, and further analysis may indicate that the hospital is mistakenly coding the ZIP code field with the hospital's own ZIP code, rather than the residence of the patient. Or perhaps a new residential facility for individuals with drug problems has opened within the ZIP code in question—verify the accuracy of the data before concluding that a drug epidemic exists among suburban teens in the area.
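The cross tabulation described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the record layout, the hospital identifiers, and the thresholds (`min_records`, `share`) are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict

# Hypothetical discharge records: (patient ZIP code, hospital ID). Illustrative only.
records = [
    ("10001", "H1"), ("10001", "H2"), ("10002", "H1"),
    ("10003", "H3"), ("10003", "H3"), ("10003", "H3"), ("10003", "H3"),
]

def crosstab(rows):
    """Count records for each hospital within each ZIP code."""
    table = defaultdict(Counter)
    for zip_code, hospital in rows:
        table[zip_code][hospital] += 1
    return table

def single_source_zips(table, min_records=3, share=0.9):
    """Return ZIP codes where one hospital supplies at least `share` of the records."""
    flagged = []
    for zip_code in sorted(table):
        counts = table[zip_code]
        total = sum(counts.values())
        top_hospital, top_count = counts.most_common(1)[0]
        if total >= min_records and top_count / total >= share:
            flagged.append((zip_code, top_hospital, total))
    return flagged

table = crosstab(records)
flagged = single_source_zips(table)
print(flagged)
```

A flagged ZIP code is only a prompt for further checking, for example whether the hospital is recording its own ZIP code in the patient residence field.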
That there could be alternative explanations for a finding suggests the second major limitation of administrative data: they seldom tell the whole story, and further analysis is often necessary. The strength of analysis of administrative data is identifying patterns and formulating hypotheses. But, in most cases, additional information is necessary to determine the next step. For example, in the material that follows, using hospital discharge data to identify areas with high rates of preventable/avoidable conditions and analyzing emergency department data to document areas with high emergency room use for primary care treatable conditions are described. These findings can be vital to policymakers in understanding how utilization patterns differ among population subgroups and pinpointing geographic areas within a community where problems are the most severe. But these data only help analysts partially understand the nature of the problem, and often cannot explain the underlying cause of the problem. Is it a lack of primary care resources? Do patients need more help deciding when to seek care or how to manage chronic illnesses? Are the providers in the area performing suboptimally? Administrative data cannot answer these questions—analyzing such data is often critical in posing and focusing the next question, but must be part of a more comprehensive assessment process to help policymakers respond effectively to meeting the needs of vulnerable populations.
Vital Statistics (Birth Records)
Among the first administrative databases to receive attention from policy analysts seeking to understand the nature and extent of access problems were vital statistics records, the databases that record and generate certificates of births and deaths. With regard to access issues, most of the attention has focused on birth records, examining rates of late/no prenatal care and birth outcomes (infant mortality, low birth weight, preterm births, and so on). In addition, identifying potential problems among pregnant women and newborns usually engenders significant concern and response from policymakers. Little disagreement exists about the importance of prenatal care, and poor birth outcomes have enormous physical, social, emotional, and financial implications that usually command attention.
Death records also have been examined, often revealing substantial disparities in death rates for various causes among population subgroups (e.g., race/ethnicity and gender) and geographic areas (those where low-income populations live) (Institute of Medicine, 2003). While these disparities in death rates raise important policy issues, death records are usually perceived as less useful in assessing access because health care access problems resulting in immediate death are rare, and problems contributing to premature death are likely to involve circumstances and experiences encountered over a lifetime. Accordingly, drawing conclusions from death records about a specific population in a specific area can be misleading, or at least subject to the criticism that multiple factors, many unrelated to current access issues in the community, may have contributed to the death. A high homicide death rate among young males in an area is obviously quite well-anchored in time and place (indicating a serious and immediate problem), but higher death rates due to diabetes and heart disease in an area or population present greater challenges to interpretation, especially given the mobility in society and high rates of immigration in some communities. Therefore, in the material that follows in this section, attention is focused on birth records, although many of the same issues apply to both.
Obtaining Birth Records
Computerized birth records are maintained by State and local health departments. In most circumstances, it makes sense to obtain records from State authorities for several reasons. First, health departments are, of course, bureaucracies, and obtaining the release of computerized records requires an application and usually some process for approving their release. State-level agencies are more likely to have had similar requests in the past, and are therefore more experienced in responding to such inquiries. Most bureaucracies do not cope well with the unexpected or with something new, and there is an obvious potential advantage to not being the first to ask for such data. Secondly, statewide data offer an opportunity for comparison. If you are examining birth data for Trenton, NJ, and have data showing what you think are alarmingly high rates of late/no prenatal care, rest assured that the first question from local policymakers is likely to be: Compared to what? The second and third questions may be: What about Paterson? How about Elizabeth and Newark? Being able to analyze data for multiple areas has distinct advantages. And in many cases, the "local" area of interest may involve several local health departments; applying once is clearly preferable to multiple forays through the bureaucratic maze.
Most health departments publish aggregate data at the county or municipal level. These data are obviously useful, but are generally not available at smaller geographic levels and may not include data for specific population subgroups. Accordingly, the data request is for "micro" or individual record-level data—one record for each birth. If there were 10,000 births in the area, you want 10,000 records. There is a uniform birth record format with essential data elements, and many jurisdictions have added additional elements of interest. In making a data request, be flexible in terms of the medium for providing the data (tape, cartridge, CD-ROM, and so on) and the specific data elements requested. The rule of thumb is to make processing your request as easy as possible for the agency supplying the data. Accept whatever format they provide, e.g., tape or cartridges. And if there is a standard data format that contains the essential data elements you need, accept what they offer, even if getting a little bit more might potentially be useful. In most circumstances, external requests for data are an additional task for someone, and being able to make that job as easy as possible often makes a huge difference. While the records are uniform, charges for the data can vary widely among jurisdictions. Some have only nominal charges (but appreciate you supplying the tape, cartridge, or CD-ROM), while others have a moderate per record charge that can mount up when the data set is large.
There is an increasing concern among health officials and administrators about maintaining confidentiality. Computerized data provided for analysis will not contain patient names, and obtaining addresses (for possible geocoding to small geographic areas such as census tracts or blocks) is also unlikely. ZIP code-level data are usually available, but even this information may be of concern to some agencies (see the discussion of the implications of restrictions on hospital records arising from the Health Insurance Portability and Accountability Act). One approach to dealing with these issues pre-emptively is to offer up-front confidentiality restrictions. A sample user confidentiality agreement that offers some protections (e.g., provisions on security in maintaining the data and restrictions on the number of records for any published "cell size") is included in Appendix A. Including such an agreement in a request for data can often obviate problems or at least expedite discussion of the issues.
There is often a considerable time lag in data availability, typically with a 1- to 2-year delay of the release of data from the end of the year of interest. While this is an obvious concern (most policymakers are interested in recent occurrences), it is important to recognize that changes in rates associated with prenatal care and birth outcomes occur slowly. Data that are 2 years old are likely to remain useful in assessing which populations are having specific problems (racial/ethnic subgroups, recent immigrants, and so on) or particular geographic areas where concerns are most acute. The time lag problem is of greater concern when the data are being used to assess the impact of an intervention, as opposed to a more general needs assessment or evaluation of a local safety net. In these circumstances, it may be useful to obtain the data directly from the local health department (rather than the State agency), where the delays may be shorter because there is no need to assemble data from multiple jurisdictions and the task of assessing data quality is less substantial.
Analyzing the Data
As noted, the first step in any analysis is an assessment of data quality. Of critical concern are completeness of the data set and missing data for individual variables. The number of births in a jurisdiction does not change substantially from year to year, so if the data set has a significantly larger or smaller number of records in the current year than in the previous year, there is likely to be something wrong. Simple frequency distributions of date and hospital fields can often identify systematic problems.
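A completeness check of this kind might look like the sketch below. The yearly counts are invented, and the 10-percent tolerance is an assumption for illustration; an appropriate threshold depends on how stable births are in the jurisdiction.

```python
# Hypothetical record counts by year; real counts would come from the birth file.
births_by_year = {1998: 10120, 1999: 10045, 2000: 7310, 2001: 10210}

def flag_count_changes(counts, tolerance=0.10):
    """Return (year, change) pairs where the record count differs from the
    previous year's count by more than `tolerance` (a proportion)."""
    flagged = []
    years = sorted(counts)
    for prev, curr in zip(years, years[1:]):
        change = (counts[curr] - counts[prev]) / counts[prev]
        if abs(change) > tolerance:
            flagged.append((curr, round(change, 3)))
    return flagged

suspect_years = flag_count_changes(births_by_year)
print(suspect_years)
```

Here the drop in 2000 and the rebound in 2001 are both flagged, which would suggest that the 2000 file is incomplete rather than that births actually fell.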
Frequency distributions of each data element of interest are also critical to identifying missing data. The rate of late/no prenatal care is not the number of births with no prenatal care or care in the third trimester divided by total births—the denominator is the number of births for which prenatal care is known. Rates of missing data can be in the 5- to 10-percent range, and if births with missing data are included in the denominator, findings can understate problems and make comparisons to findings from other sources inaccurate.
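The effect of the denominator can be seen in a small sketch. The coding of the field is an assumption made for the example (month care began as 1 through 9, 0 for no care, `None` when not recorded, with months 7 through 9 treated as the third trimester); actual birth files may code it differently.

```python
# Month prenatal care began for each birth: 1-9, 0 for no care, None if not recorded.
# Values are made up for illustration.
care_start_month = [1, 2, 3, 8, 0, None, 2, 7, None, 4]

def is_late_or_none(month):
    """True for no prenatal care (0) or care begun in the third trimester (months 7-9)."""
    return month == 0 or month >= 7

def late_no_care_rate(months):
    """Rate of late/no prenatal care among births where the field is known."""
    known = [m for m in months if m is not None]
    return sum(is_late_or_none(m) for m in known) / len(known)

rate_known = late_no_care_rate(care_start_month)

# Wrong denominator (all births, including those with missing data), shown for contrast.
late_count = sum(is_late_or_none(m) for m in care_start_month if m is not None)
rate_all_births = late_count / len(care_start_month)

missing_share = care_start_month.count(None) / len(care_start_month)
print(rate_known, rate_all_births, missing_share)
```

With two of ten records missing, dividing by all births gives a lower rate than dividing by births with known prenatal care, which is the understatement the text warns about.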
The birth measures related to potential barriers to access and performance of the local safety net include:
- Late/no prenatal care. The percentage of births with no prenatal care or prenatal care initiated in the third trimester.
- Low birth weight full-term births. The percentage of full-term births (gestation period 37 weeks or longer) with birth weight less than 2,500 grams.
- Preterm births. The percentage of births that are preterm (gestation period less than 37 weeks).
These rates are easy to calculate and do not require additional databases, as the data used in both the numerator and denominator of the equation used for calculating the rates come from birth records. Analysis of low birth weight is typically limited to full-term births, since preterm births often involve smaller babies, and analysis of birth weight and preterm births may measure different aspects of health status or adequacy of prenatal care.
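The two gestation-based measures in the list above can be computed directly from birth records, as in this sketch. The field names (`gestation_weeks`, `weight_g`) and the sample records are hypothetical, and the handling of missing values (dropping records where a needed field is unknown) is an assumption.

```python
# Hypothetical birth records; the field names are illustrative, not a real file layout.
births = [
    {"gestation_weeks": 39, "weight_g": 3400},
    {"gestation_weeks": 38, "weight_g": 2300},
    {"gestation_weeks": 35, "weight_g": 2100},
    {"gestation_weeks": 40, "weight_g": 3100},
    {"gestation_weeks": 36, "weight_g": 2600},
    {"gestation_weeks": 37, "weight_g": None},    # weight not recorded
    {"gestation_weeks": None, "weight_g": 3000},  # gestation not recorded
]

def gestation_measures(rows):
    """Percent preterm, and percent low birth weight among full-term births."""
    gest_known = [r for r in rows if r["gestation_weeks"] is not None]
    preterm = [r for r in gest_known if r["gestation_weeks"] < 37]
    # Low birth weight is assessed only for full-term births with a recorded weight.
    full_term = [r for r in gest_known
                 if r["gestation_weeks"] >= 37 and r["weight_g"] is not None]
    low_weight = [r for r in full_term if r["weight_g"] < 2500]
    return {
        "preterm_pct": 100.0 * len(preterm) / len(gest_known),
        "low_weight_full_term_pct": 100.0 * len(low_weight) / len(full_term),
    }

measures = gestation_measures(births)
print(measures)
```

Note that the 2,100-gram baby born at 35 weeks counts toward the preterm measure but not toward the low birth weight measure, which keeps the two indicators separate.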
Rates of infant mortality are often of significant interest, but require databases that link births and deaths, which are more difficult to obtain and typically involve greater delays. Moreover, analysis of infant mortality for small geographic areas or population subgroups is likely to have serious limitations because of "small number" problems. Infant mortality is a relatively rare event (6.9/1,000 live births in the United States in 2000 [Anderson, 2002]), and analysis often requires multiple years of data to assure statistical significance, even at the county or municipal level, making ZIP code and population subgroup analysis problematic.
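The "small number" problem can be made concrete with a rough calculation. The sketch below assumes infant deaths behave approximately like Poisson counts, so that the relative standard error of a rate based on d deaths is about 1/sqrt(d); this is a common approximation and not a method given in the source. The area sizes are invented.

```python
import math

US_RATE_PER_1000 = 6.9  # U.S. infant mortality rate in 2000, as cited in the text

def expected_deaths(births, rate_per_1000=US_RATE_PER_1000):
    """Expected number of infant deaths for a given number of live births."""
    return births * rate_per_1000 / 1000.0

def relative_standard_error(deaths):
    """Approximate relative standard error of a rate based on `deaths` events
    (Poisson assumption)."""
    return 1.0 / math.sqrt(deaths)

# Hypothetical annual birth counts for areas of different sizes.
for label, births in [("ZIP code", 400), ("municipality", 5000), ("large county", 50000)]:
    deaths = expected_deaths(births)
    rse = relative_standard_error(deaths)
    print(f"{label}: about {deaths:.1f} expected deaths, relative standard error {rse:.0%}")
```

An area with a few hundred births a year expects only a handful of infant deaths, so a single additional death changes the rate substantially; pooling several years of data increases the count and narrows the uncertainty.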
An important advantage of the birth data is the richness of the database. Most data sets include a substantial amount of data about the mother and father, including race/ethnicity, national origin, education, and employment status and sector. This information is useful in identifying important differences in rates among population subgroups. For example, in New York City, when compared with the total population, rates of late/no prenatal care were 25-percent higher for black mothers, and as much as 60-percent higher for foreign-born mothers from Mexico and Africa. Similar differences were observed for low birth weight and preterm births (Table 1).
More sophisticated multivariate analysis can help control for various factors simultaneously, thereby helping to pinpoint particular problems. For example, in New York City, black and foreign-born mothers had higher rates of late/no prenatal care after controlling for level of education (a proxy for income) and insurance status, and Medicaid and uninsured mothers had much higher rates controlling for all other factors. Table 2 displays the "relative risk" for these measures for various characteristics of the mother, where values greater than one represent risks that are above the rate for the comparison group (e.g., greater than high school education, white, not married) and values less than one reflect lower risks.
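As a simple illustration of how a relative risk is read, the sketch below computes an unadjusted ratio of two rates. It does not reproduce the multivariate analysis behind Table 2, which controls for other factors simultaneously, and the counts are hypothetical.

```python
def relative_risk(events_group, total_group, events_ref, total_ref):
    """Ratio of the event rate in a group to the rate in the comparison group.
    Values above 1 mean higher risk than the comparison group; below 1, lower."""
    rate_group = events_group / total_group
    rate_ref = events_ref / total_ref
    return rate_group / rate_ref

# Hypothetical counts of births with late/no prenatal care, not figures from Table 2.
rr = relative_risk(events_group=150, total_group=1000, events_ref=100, total_ref=1000)
print(round(rr, 2))
```

A value of 1.5 here would mean the group's rate of late/no prenatal care is 50 percent higher than the comparison group's, before adjusting for anything else.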
ZIP code-level analysis often can provide findings of significant interest to planners and policymakers as well. Combining birth data with information on characteristics of ZIP codes can help produce charts and maps that are very intuitive to policymakers. For example, simply examining area income reveals a relatively strong association between area income and rates of late/no prenatal care. Displaying these data in a graphic format often has more power than mere tables with numbers or more complex multivariate analyses, which may be confusing to some readers. For example, in Figure 1, the relationship between income and rates of late/no prenatal care in New York City is apparent, even to observers without statistical backgrounds. It is easy to see that higher rates tend to be in low-income areas, with generally lower rates in high-income areas.
These types of charts also illustrate another important point: not all low-income areas have the same rates. Some have very high rates, while others have more moderate rates. And not all high-income areas have low rates. This can also be communicated to policymakers by mapping the data. In Figure 2, rates of late/no prenatal care for black mothers in New York City are mapped at the ZIP code level, showing the highest rates to be in middle-class areas of Coney Island and Staten Island, not Central Harlem, Bedford Stuyvesant, or South Bronx, which are neighborhoods usually associated with access problems. Maps like these can help local policymakers begin to understand the complexity of access issues, in this case raising important concerns about the adequacy of safety net services in neighborhoods without large concentrations of low-income or other vulnerable populations.
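The data preparation behind a chart or map of this kind, pairing each ZIP code's rate with its area income, can be sketched as follows. The ZIP code labels, the income figures, and the birth records are all invented, and the input is assumed to contain only births with known prenatal care.

```python
from collections import defaultdict

# Hypothetical inputs: (ZIP code, late/no care flag) for each birth with known
# prenatal care, and an area income figure for each ZIP code.
births = [("Z1", True), ("Z1", False), ("Z1", True), ("Z1", False),
          ("Z2", False), ("Z2", True), ("Z2", False), ("Z2", False),
          ("Z3", False), ("Z3", False), ("Z3", False), ("Z3", False), ("Z3", True)]
area_income = {"Z1": 22000, "Z2": 41000, "Z3": 78000}

def zip_rates(rows):
    """Percent of births with late/no prenatal care in each ZIP code."""
    counts = defaultdict(lambda: [0, 0])  # ZIP code -> [late/no care births, all births]
    for zip_code, late in rows:
        counts[zip_code][0] += int(late)
        counts[zip_code][1] += 1
    return {z: 100.0 * late / total for z, (late, total) in counts.items()}

rates = zip_rates(births)

# One (income, rate, ZIP code) point per area, ordered from lowest to highest income.
points = sorted((area_income[z], rates[z], z) for z in rates if z in area_income)
for income, rate, z in points:
    print(f"{z}  income={income:>6}  late/no care={rate:.1f}%")
```

The resulting points can be passed to any charting or mapping tool; areas whose rates are out of line with their income stand out as candidates for closer examination.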