Over 200 indicators (listed in Appendix 7 of the full report) that could be specified using inpatient discharge data, such as the HCUP NIS, and that met our criteria for a "quality indicator" (i.e., they examined an aspect of quality as defined above and were applicable to most providers or areas) were identified and evaluated as potential HCUP QIs. Based on a preliminary application of our criteria for indicator validity, 45 promising indicators were retained for comprehensive literature review and empirical evaluation. In some cases, an indicator was retained partly because it complemented other promising indicators, allowing the HCUP indicators to provide more depth in specific areas.
The Evidence Report provides detailed literature summaries and data from empirical analyses on each of the 45 indicators. The indicators were constructed, as appropriate, for two perspectives on quality: "provider-level" and "area-level." Provider-level indicators are designed using a hospital-level denominator. Area-level indicators are designed with population-based denominators, specifically the population of the metropolitan statistical area (MSA). There are 25 provider-level quality indicators and 20 area-level indicators recommended for use.
While none of these indicators is without limitations, in most cases a considerable literature, coupled with evidence of satisfactory empirical performance, suggests that the recommended indicators may be useful additions to the "toolkit" of clinical quality professionals, health care managers, health policymakers, and researchers. Each recommended indicator is appropriate for use as a quality "screen," that is, a first examination of potential quality problems, to be followed by more in-depth investigation. Our evaluation notes the most promising uses of each indicator, as well as important limitations and suggestions for further investigation.
Provider indicators are constructed at the provider level; they provide information related to the quality of care at individual hospitals. There are four types:
- Volume indicators include inpatient procedures for which a substantial research literature has detected a significant relationship between hospital volume and outcomes, and for which a nontrivial number of procedures are performed by institutions that do not meet recommended volume thresholds. The volume indicators differ somewhat from the other provider-level indicators in that they simply count admissions in which particular intensive procedures were performed, rather than measuring quality more directly.
- Utilization indicators include procedures whose use varies significantly across hospitals, and for which high or low rates of use are likely to represent inappropriate or inefficient delivery of care, leading to worse outcomes, higher costs or both.
- Mortality indicators for inpatient procedures include those for which mortality has been shown to vary substantially across institutions and for which evidence suggests that high mortality may be associated with deficiencies in the quality of care.
- Mortality indicators for inpatient conditions include those for which mortality has also been shown to vary substantially across institutions, and for which evidence suggests that high mortality may be associated with deficiencies in the quality of care.
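As a rough illustration of the volume indicators described above, the Python sketch below counts admissions per hospital for a given procedure and computes the share of those procedures performed at hospitals below a volume threshold. The record layout (a hospital ID paired with a set of procedure labels), the label "CABG", and the threshold value are hypothetical placeholders; real HCUP files use coded procedure fields, and recommended thresholds come from the literature reviewed in the full report.

```python
from collections import Counter


def hospital_volumes(discharges, procedure):
    """Count admissions per hospital in which `procedure` was performed.

    `discharges` is an iterable of (hospital_id, procedure_labels) pairs, where
    procedure_labels is a set of strings. This layout is a hypothetical
    simplification, not the actual HCUP file format.
    """
    return Counter(hospital for hospital, procs in discharges if procedure in procs)


def share_below_threshold(volumes, threshold):
    """Fraction of all procedures performed at hospitals whose volume falls
    below `threshold` (a hypothetical cutoff supplied by the analyst)."""
    total = sum(volumes.values())
    if total == 0:
        return 0.0
    low_volume = sum(n for n in volumes.values() if n < threshold)
    return low_volume / total
```

Because the volume indicators are simply counts, the sketch makes no attempt at risk adjustment; the second function only quantifies how much care falls below a chosen threshold.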
The evidence report includes a set of quality indicators constructed at the area level. Versions of some of these indicators were previously recommended as HCUP I indicators; however, their construction now differs in that the denominator is defined at the area level. For most of these indicators, the denominator is the age- and gender-adjusted population of the area, and the numerator is the number of hospitalizations with the procedure or diagnosis. These indicators are constructed at the level of metropolitan statistical areas (MSAs).
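This summary does not spell out how the area populations are age- and gender-adjusted. One common approach is direct standardization: compute the area's hospitalization rate within each age-sex stratum, then weight those stratum rates by a reference population's age-sex distribution. The sketch below assumes that approach, with hypothetical dictionary inputs and an analyst-chosen reference population; the HCUP software may adjust differently.

```python
def directly_standardized_rate(events, population, standard_population, per=100_000):
    """Directly age/sex-standardized rate for one area (e.g., one MSA).

    events, population: dicts mapping an (age_group, sex) stratum to the area's
        hospitalization count and resident population in that stratum.
    standard_population: dict mapping the same strata to a reference population
        (which reference to use is an assumption left to the analyst).
    Returns the rate per `per` residents.
    """
    total_standard = sum(standard_population.values())
    rate = 0.0
    for stratum, std_pop in standard_population.items():
        pop = population.get(stratum, 0)
        if pop <= 0:
            raise ValueError(f"no area population in stratum {stratum!r}")
        stratum_rate = events.get(stratum, 0) / pop          # area's own rate in this stratum
        rate += stratum_rate * (std_pop / total_standard)    # weight by reference-population share
    return rate * per
```

Using the same reference population for every MSA means that differences in the resulting rates are not driven by differences in the areas' age and sex mix.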
At the county level (a finer area measure), evidence from Medicare and California data suggests that at many hospitals a significant proportion of patients come from outside the area, and that many patients from an area seek care at facilities in other areas.
At the MSA level, by contrast, the vast majority of patients treated in an MSA come from that MSA, and the vast majority of MSA residents receive treatment there. With more detailed information on patient residence (currently not available in the HCUP NIS), richer and more accurate area indicators could be constructed using the definitions applied in this report.
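Where patient residence is available (it is not in the current HCUP NIS), the two MSA-level statements above can be checked directly. A minimal sketch, assuming one (residence area, treatment area) pair per discharge; the function name and input layout are hypothetical.

```python
import math


def area_flow_shares(records, area):
    """Measure patient flow ("leakage") for one area.

    records: iterable of (residence_area, treatment_area) pairs, one per discharge.
    Returns a pair:
      - share of discharges treated in `area` whose patients also live there;
      - share of discharges of `area` residents that were treated there.
    Either value is NaN if the corresponding group is empty.
    """
    records = list(records)
    treated_here = [res for res, treat in records if treat == area]
    resident_here = [treat for res, treat in records if res == area]
    share_from_area = (
        sum(res == area for res in treated_here) / len(treated_here)
        if treated_here else math.nan
    )
    share_kept_in_area = (
        sum(treat == area for treat in resident_here) / len(resident_here)
        if resident_here else math.nan
    )
    return share_from_area, share_kept_in_area
```

High values of both shares correspond to the "vast majority" pattern described for MSAs; lower values, as reported at the county level, indicate that area rates are contaminated by patient flow across boundaries.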
There are two types of area indicators assessed:
- Utilization indicators include procedures for which use has been shown to vary widely across relatively similar geographic areas, with (in most cases) substantial inappropriate utilization.
- Avoidable hospitalizations/Ambulatory care sensitive condition (ACSC) indicators involve admissions that evidence suggests could have been avoided, at least in part, through better access to high quality outpatient care.
Even though these quality indicators are area-based, an important role remains for hospital-level measures of procedures or ACSC admissions. If an area is found to have unusually high procedure rates, the hospitals that perform a relatively large proportion of the area's procedures are a natural focus for efforts to understand why rates are high and possibly to reduce them. Similarly, if an area is found to have unusually high admission rates for potentially avoidable conditions, then the patient populations treated by hospitals with a relatively large share of these admissions might be a good starting point for interventions to understand and reduce hospitalization rates.
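Finding the hospitals that account for a large share of an area's procedures or ACSC admissions is a simple ranking exercise. The sketch below assumes per-hospital counts for one area are already available and returns the largest contributors until a chosen cumulative share is reached; the function name and the 50% default are illustrative choices, not recommendations from the report.

```python
def top_contributors(hospital_counts, cumulative_share=0.5):
    """Return (hospital, share) pairs, largest first, for the smallest set of
    hospitals that together account for at least `cumulative_share` of an
    area's procedures or ACSC admissions."""
    total = sum(hospital_counts.values())
    if total == 0:
        return []
    selected, running = [], 0
    for hospital, n in sorted(hospital_counts.items(), key=lambda kv: kv[1], reverse=True):
        if running / total >= cumulative_share:
            break  # target share already reached
        selected.append((hospital, n / total))
        running += n
    return selected
```

For example, `top_contributors({"H1": 50, "H2": 30, "H3": 20}, 0.6)` selects H1 and H2, which together perform 80% of the area's procedures.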
Using Indicators as Groups
Any indicator in isolation provides a unidimensional and fairly limited picture of quality. As the results of this report indicate, many factors besides quality, including random variation, may contribute to provider or area performance on a single quality indicator. However, consistently good or bad performance on several related indicators is more convincing evidence of a true underlying difference in performance, because such a pattern is less likely to arise from random events. Looking at groups of indicators together is therefore likely to provide a more complete picture of quality. Although the HCUP indicators were not designed to be averaged or combined into an overall quality score, they do group together both by clinical domain and by aspects of care or outcome.
For example, coronary artery bypass graft (CABG) mortality rates must be viewed in the context of CABG utilization and volume (i.e., grouping by clinical domain), since inappropriate utilization in less severe patients may increase provider volumes and decrease postoperative mortality. Mortality rates for major medical diagnoses should also be viewed together (i.e., grouping by outcome), because skill in caring for community-acquired pneumonia would be expected to carry over to diagnoses such as congestive heart failure. This report does not present findings on the validity of such groupings, although some, such as the ACSC indicators, have been examined extensively elsewhere.
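The argument that consistent performance across several indicators is unlikely to arise by chance can be quantified under a simplifying assumption. If each of n indicators independently flags a provider with probability p even when there is no true quality difference, the chance of at least k flags follows a binomial tail. Independence is a strong, illustrative assumption: related indicators (for example, the mortality indicators) are correlated, which would make chance clustering more likely than this calculation suggests.

```python
from math import comb


def prob_at_least_k_flags(n, k, p):
    """P(at least k of n indicators flag a provider purely by chance),
    assuming each indicator flags independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

With p = 0.1, the probability of being flagged on at least 3 of 5 independent indicators by chance alone is about 0.0086, compared with 0.1 for a single indicator.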
As noted, each potential indicator underwent extensive evaluations based on literature reviews and empirical analyses. Table 1S (provider-level indicators) and Table 2S (area-level indicators) list each indicator, describe its definition, rate its empirical performance, recommend a risk adjustment strategy, and note important caveats identified in the literature reviews.
Empirical performance rating. Our rating of empirical performance is a numerical score ranging from 0 to 26. It summarizes performance on four empirical tests of precision (signal variance, provider/area-level share, signal ratio, and R-squared) and five tests of minimum bias (rank correlation, top/bottom decile movement, absolute change, and change over 2 deciles). Because we could measure the precision of an indicator more conclusively than minimum bias (available risk adjustment techniques were not clinically comprehensive and thus may underestimate some bias), we weighted the precision tests more heavily than the minimum-bias tests. Each indicator received a score of 0-4 on each precision test, based on its performance relative to the other indicators and on specific cutoffs described in the main document. Likewise, each indicator received a score of 0-2 on each bias test. The empirical performance rating is the sum of these nine scores. The mean rating was 9.7 (S.D. = 6.5) for the provider indicators and 16.2 (S.D. = 3.4) for the area indicators, a difference that primarily reflects the better precision of area measures relative to mortality measures. Where multivariate smoothing techniques increase the share of variance that can be attributed to true differences in performance, the tables note that smoothing is recommended.
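The arithmetic behind the 0-26 range can be made explicit: four precision tests scored 0-4 contribute at most 16 points, and five minimum-bias tests scored 0-2 contribute at most 10. The sketch below only range-checks and sums scores that have already been assigned; the cutoffs that translate test results into scores are described in the full report and are not reproduced here, and the function name and argument layout are our own.

```python
def empirical_performance_rating(precision_scores, bias_scores):
    """Sum four precision scores (each 0-4) and five minimum-bias scores
    (each 0-2), giving a rating between 0 and 26."""
    if len(precision_scores) != 4 or len(bias_scores) != 5:
        raise ValueError("expected 4 precision scores and 5 bias scores")
    if any(not 0 <= s <= 4 for s in precision_scores):
        raise ValueError("each precision score must be between 0 and 4")
    if any(not 0 <= s <= 2 for s in bias_scores):
        raise ValueError("each bias score must be between 0 and 2")
    return sum(precision_scores) + sum(bias_scores)
```

The heavier weighting of precision is visible in the maxima: precision can contribute 16 of the 26 possible points.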
Caveats from the literature review. During the review of the literature we identified serious and potential caveats for each of the recommended indicators. These caveats tended to follow general themes, and are summarized in the table below. When specific evidence was found demonstrating that the caveat applies to that indicator, that caveat is preceded by a checkmark. When no such evidence was located, but there is a strong theoretical basis or suggestive evidence that the caveat applies, a question mark precedes the caveat name in the table. The specific caveats are described below, along with potential remedies:
Proxy indicator. Some indicators do not specifically measure a patient outcome or a process measure of quality. Rather, some indicators measure an aspect of care that has been correlated with process measures of quality or patient outcomes. The validity of these indicators relies on the persistent and strong relationship between the measured phenomenon and actual quality. For example, provider volume has been correlated with better outcomes for numerous procedures, but volume, in the absence of these relationships, does not tell one anything about quality. Area utilization measures are another example of proxy indicators. High procedure rates do not necessarily imply overuse or inappropriate utilization; for some areas, higher rates may actually represent better care.
In cases where this concern is noted, continued research on the relationship validating the indicator (such as volume-outcome relationships) is required to ensure the validity of this indicator. These indicators are best used in conjunction with other indicators measuring similar aspects of clinical care, or when followed with more direct and in-depth investigations of quality.
Selection bias. Selection bias results when the cases with a condition or procedure ascertainable from HCUP data do not represent the universe of patients with that condition or procedure. As a result, the rate of an indicator based on HCUP data may differ from the true value in the population. This problem arises when a substantial percentage of care for a condition or procedure is provided in the outpatient setting, so the subset of inpatient cases may be unrepresentative. For example, laparoscopic cholecystectomy rates based on HCUP data may be biased because hospitals admit all patients who require open cholecystectomy, but only some patients scheduled for laparoscopic cholecystectomy. Similarly, patients with mild congestive heart failure may be admitted at some hospitals, but managed as outpatients elsewhere. A related problem is that inadequate or variable coding of key diagnoses may interfere with consistent ascertainment of cases, such as for vaginal births after cesarean delivery.
In cases where this concern is noted, examining outpatient care or patients not admitted to the hospital (e.g., emergency room data) may help to improve indicator performance. For mortality indicators, better risk adjustment may help reduce the selection bias that arises from variation in the threshold for admission.
Information bias. HCUP II QIs are based on information available in hospital discharge data sets, but some information missing from these data may be important for evaluating the outcomes of hospital care. For instance, for some conditions, 30-day mortality has been shown to substantially exceed inpatient mortality. Without 30-day mortality data (ascertained from death certificates), hospitals with short lengths of stay may appear to have better patient outcomes than other hospitals with equivalent 30-day mortality.
In cases where this concern is noted, examining missing information, such as 30-day mortality, may help to improve indicator performance.
Confounding bias. Patient characteristics, such as disease severity, comorbidities, physiologic derangements, and functional status, may substantially affect performance on a measure, and may vary systematically across providers or areas. We are especially concerned about confounders that cannot be identified from HCUP data, such as physical examination, laboratory, radiographic, and functional abnormalities.
In cases where this concern is noted, adequate risk adjustment may help to improve indicator performance. In some cases, such risk adjustment may require only the demographic and comorbidity data captured by APR-DRGs or similar systems. In other cases, detailed clinical data may be necessary for adequate risk adjustment.
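A simple form of risk adjustment that uses patient-level predictions is indirect standardization: compare a provider's observed number of events with the number expected given its patients' predicted risks, then scale the overall rate by that observed-to-expected ratio. The sketch assumes predicted probabilities are already available from some risk model (for example, one built on APR-DRG categories); it does not implement APR-DRGs or any particular model, and the function name is hypothetical.

```python
def risk_adjusted_rate(outcomes, predicted, overall_rate):
    """Indirectly standardized (observed / expected) rate for one provider.

    outcomes: 0/1 observed outcomes (e.g., in-hospital death) for the provider's patients.
    predicted: matching predicted probabilities from a risk model fit on all
        providers' patients (the model itself is assumed and not shown here).
    overall_rate: the crude outcome rate across all providers.
    """
    if len(outcomes) != len(predicted) or not outcomes:
        raise ValueError("outcomes and predicted must be non-empty and of equal length")
    observed = sum(outcomes)
    expected = sum(predicted)
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (observed / expected) * overall_rate
```

A provider treating sicker patients has a larger expected count, so the same number of observed deaths yields a lower risk-adjusted rate. The adjustment can only be as good as the risk model: confounders absent from the data, such as laboratory or functional abnormalities, remain unadjusted.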
Unclear construct validity. Many indicators have not been examined extensively in the literature, although they are currently in use by various health care organizations. Problems with construct validity include:
- Uncertain or poor correlations with widely accepted process measures.
- Uncertain or poor correlations with risk-adjusted outcome measures.
Although these indicators have adequate face validity, they would benefit from further research to establish their relationship with quality care.
Easily manipulated. When quality indicators are instituted, they may create perverse incentives to improve performance on the quality indicator without actually improving quality. Dysfunctional organizational responses might include "cherry-picking" the easiest cases, "teaching to the test" by ignoring broader aspects of quality, "deception" through "upcoding" of comorbidities used in risk adjustment, and being overcritical of quality measurement efforts. Providers may admit or perform procedures on less severe patients with dubious indications in order to inflate their volumes and improve apparent performance. Although very few of these perverse responses have been proven to occur, they are important theoretical concerns that should be monitored to ensure true quality improvement.
Unclear benchmark. Some indicators have clear goals for performance. Fewer deaths is always better; fewer low birth weight infants is ideal. However, for a few indicators, the numerator may include appropriate and unavoidable occurrences. When there is a base "right rate" of the indicator, either too low a rate or too high a rate may be a quality problem. For procedure utilization and ACSC admissions, too low a rate may indicate poor access to care or underuse of appropriate care. For these indicators, the "right rate" has not been established, so comparison with national, regional, or peer group means may be the best benchmark available.
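Because too low a rate can be as concerning as too high a rate for these indicators, a comparison against a peer-group benchmark should flag deviations in both directions. The sketch below applies a simple "mean plus or minus z standard deviations" rule across areas. This rule, the default z = 2, and the function name are illustrative assumptions rather than the report's method; in practice, rates with small denominators should be smoothed before such a comparison.

```python
from statistics import mean, stdev


def flag_two_sided(rates, z=2.0):
    """Flag areas whose rate lies more than `z` peer-group standard deviations
    above ("high") or below ("low") the peer-group mean.

    rates: dict mapping area -> rate; at least two areas are required.
    """
    values = list(rates.values())
    m = mean(values)
    s = stdev(values)
    return {
        area: ("high" if r > m + z * s else "low")
        for area, r in rates.items()
        if abs(r - m) > z * s
    }
```

Flagging in both directions reflects the point above: for procedure utilization and ACSC admissions, an unusually low rate may signal poor access to care or underuse of appropriate care.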
Conclusions and Future Research
For use as screens for quality concerns, each of the indicators evaluated and included in this report performed adequately. In many cases, however, adequate performance required important statistical enhancements (risk adjustment, smoothing methods) beyond simply calculating average rates. These indicators, accompanied by statistical enhancements, are recommended for implementation in software modules to replace the current HCUP QI set. For users of these indicators, further investigation is likely to be necessary when an indicator flags a potential problem. That is, even if an indicator identifies "outlier" hospitals or areas with a great degree of precision, the cause of systematic differences in performance may be something other than poor quality. Our report presents specific suggestions for such follow-up steps for each type of indicator; we summarize some of the general findings here.
Provider-level Volume Indicators
The HCUP QI empirical results confirm that hospital volume is an important correlate of quality of care. However, our empirical results as well as the prior studies summarized in the detailed reviews of each indicator also make clear that volume is at best a quite noisy reflection of true quality or performance differences. While hospital volume has significant explanatory power, the relationship is not precise; in practical terms, there appear to be many high-quality procedures performed by low-volume institutions, and conversely many low-quality procedures performed by high-volume institutions. Causes of the relatively weak relationship between volume and quality include the confounding role of surgeon volume (not captured presently in HCUP data), differences in the severity and complexity of cases treated, and differences in training and experience that are not reflected in volume.
Moreover, use of volume as a quality indicator may lead to undesirable hospital responses, such as performing more procedures on patients who have mild disease or who are otherwise inappropriate candidates. Thus, while volume is a useful proxy for quality, it is important to consider more direct measures of hospital performance to help determine whether a high-volume hospital provides excellent quality of care, and whether a low-volume hospital provides poor quality of care.
Provider-level Mortality Indicators
The recommended hospital mortality indicators are all associated with large systematic differences in hospital performance, that is, differences in mortality outcomes between lower- and higher-performing hospitals are often several percentage points or larger. Thus, the mortality indicators may be helpful in identifying opportunities for large improvements in outcomes. However, many of the mortality indicators require careful attention to risk adjustment, and virtually all benefit from "smoothing" methods to help remove differences in hospital performance that are due to random chance. Because unmeasured differences in patient mix and other factors besides quality of care may influence hospital mortality, these measures can benefit significantly from use in conjunction with other sources of data on hospital quality.
For example, medical chart reviews and other types of electronic clinical data collection (e.g., laboratory test results) can be used to better adjust for severity and comorbidity in comparisons across hospitals. Record reviews may also be helpful for identifying weaknesses in processes of care that are correlated with mortality.
Our empirical analysis also showed that many of the mortality indicators are significantly related to each other, suggesting that information on more general aspects of hospital quality (e.g., staffing ratios, procedures to avoid medication errors) may be useful to examine in hospitals with unusual performance. Better information on post-hospitalization morbidity can be obtained by linking hospital records longitudinally or by surveying patients, and better information on post-admission mortality can be obtained by linking death certificate data. Finally, analyses of hospital outpatient data (particularly ambulatory surgery and emergency room data) in conjunction with inpatient discharge data can help to determine whether the mortality measures reflect differences in outpatient practices.
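The "smoothing" mentioned above shrinks each hospital's observed rate toward the overall rate, more strongly when the hospital has few cases and its rate is therefore noisy. The report recommends multivariate smoothing; the sketch below substitutes a simpler univariate empirical Bayes estimator that uses a normal approximation and a method-of-moments estimate of the signal (between-hospital) variance. It illustrates the idea only and is not the HCUP QI software's method; the function name is hypothetical.

```python
from statistics import mean, pvariance


def eb_smooth(deaths, cases):
    """Shrink raw hospital rates toward the overall rate (univariate empirical Bayes).

    deaths, cases: dicts mapping hospital -> counts; every case count must be positive.
    Returns (smoothed_rates, signal_variance).
    """
    hospitals = list(cases)
    p = sum(deaths[h] for h in hospitals) / sum(cases[h] for h in hospitals)
    raw = {h: deaths[h] / cases[h] for h in hospitals}
    # Binomial sampling ("noise") variance of each raw rate, evaluated at the overall rate.
    noise = {h: p * (1 - p) / cases[h] for h in hospitals}
    # Method-of-moments signal variance: spread of raw rates minus average noise.
    signal = max(0.0, pvariance(list(raw.values())) - mean(list(noise.values())))
    smoothed = {}
    for h in hospitals:
        denom = signal + noise[h]
        weight = signal / denom if denom > 0 else 0.0  # reliability of this hospital's raw rate
        smoothed[h] = weight * raw[h] + (1 - weight) * p
    return smoothed, signal
```

In this scheme, a hospital with only a handful of cases is pulled strongly toward the overall rate, while a hospital with hundreds of cases stays close to its raw rate. If the estimated signal variance is zero, every hospital is assigned the overall rate, which corresponds to concluding that the observed differences are consistent with chance.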
Provider-level Utilization Indicators
The hospital utilization indicators not only show large variations across hospitals; they also show some relationships to other hospital quality indicators and thus may be helpful as "proxies" for other aspects of care. As with the HCUP mortality indicators, these indicators are generally likely to be most useful as a "screen" for further evaluations using supplemental data to determine whether utilization is truly inappropriate. However, these indicators are generally more precisely measured, so that "smoothing" methods are less critical for identifying systematic differences in hospital performance.
Additional data collection (e.g., chart review) is also less critical for some of these measures. For example, incidental appendectomy is almost always inappropriate and bilateral catheterization is usually inappropriate, though review of some of the cases performed might identify valid exceptions. For the other utilization indicators, detailed clinical guidelines on appropriate use have been developed and could be applied to determine whether hospitals that appear to have high rates are in fact treating an unusually large number of inappropriate or questionable cases.
Area-level Utilization Indicators
The area utilization indicators all demonstrate substantial differences in procedure rates across MSAs that are apparent even without sophisticated statistical methods. For all of these indicators, detailed clinical guidelines exist for judging the appropriateness of procedure use. Such guidelines can be applied to sample cases from hospitals that make large contributions to high area rates, to help identify specific opportunities for safely lowering rates. For some of the area utilization indicators, e.g., CABG rate, previous studies have shown little variation in inappropriate procedure use and significant underutilization in "necessary" cases, so any effort to lower procedure rates should be undertaken very cautiously. However, in conjunction with the other recommended CABG indicators, this indicator can help provide a relatively comprehensive picture of CABG utilization and outcomes in an area and so may be helpful for public health purposes.
Further investigation of area rate differences might also involve collecting information on patient residence, to identify and exclude patients from outside the area from the area rate calculations. Patient residence information could also be used to provide a "proxy" (based on ZIP code) for patient income and other characteristics of the area that may influence rates.
Area-level Avoidable Hospitalizations/ACSC
All of the recommended ACSC indicators also show considerable variation across areas, though for some of the indicators, smoothing methods should be used to avoid erroneous classification of outliers. Unfortunately, for many of the ACSC indicators, the available literature on causes of area rate differences is limited.
Nonetheless, some further investigations are likely to provide useful insights. The vast majority of patients hospitalized with a subset of the ACSCs are elderly (e.g., dehydration, pneumonia). For these conditions, complementary analyses of data from the Medicare program, which include longitudinal records of both inpatient and outpatient care, can provide further insights about whether high area rates are associated with less use of outpatient care. Even though HCUP data lack detail, they are much more complete in terms of providing information on Medicare beneficiaries enrolled in managed care plans (historically, managed care plans in Medicare have not reported inpatient or outpatient encounter data). Thus, Medicare and HCUP data may be complementary, especially in areas with high rates of managed care enrollment among the elderly. As with the area utilization indicators, additional information on patient residence can support analyses of the impact of "leakage" in and out of MSAs, and analyses of effects of socioeconomic and other area characteristics on rates.
In addition, information abstracted from medical records can provide evidence on whether some of the admissions might have been avoidable, and on whether hospitals and areas differ in their ability to manage some of the ACSCs effectively on an outpatient basis.
Extensive literature review and empirical evaluation identified 45 quality indicators, out of over 200 indicators inventoried, that can be used with hospital administrative data such as HCUP data. These 45 indicators had the best face validity and empirical performance of all evaluated indicators. The results of that evaluation are presented in this report. In addition, the indicators are available in a software package written in the SAS programming language. These quality indicators are intended as quality screens or tools to identify potential problem areas in health care quality, primarily providing an impetus for further investigation. The report discusses the proper use of these indicators, making indicator-specific recommendations for further investigation. Such recommendations include analyzing indicators in the context of related indicators, using additional data or chart review to identify quality problems, and further investigating sources of potential bias. For reasons fully described in the report, these indicators may not be appropriate for public accountability programs, at least without further attention to their potential limitations and sources of bias. We conclude by setting forth suggestions for future enhancements to HCUP data and recommendations for future research on quality indicators.
Availability of Full Report
The full Technical Review from which this summary was derived was prepared for the Agency for Healthcare Research and Quality by the UCSF-Stanford Evidence-based Practice Center under contract No. 290-97-0013. Printed copies may be obtained free of charge from the AHRQ Publications Clearinghouse by calling 800-358-9295. Requestors should ask for Technical Review No. 4, Refinement of the HCUP Quality Indicators (AHRQ Publication No. 01-0035).
The Technical Review is also online on the National Library of Medicine Bookshelf.
Current as of May 2001