Technical Review, Number 4
Under its Evidence-based Practice Program, the Agency for Healthcare Research and Quality (AHRQ) is developing scientific information for other agencies and organizations on which to base clinical guidelines, performance measures, and other quality improvement tools. Contractor institutions review all relevant scientific literature on assigned clinical care topics and produce evidence reports and technology assessments, conduct research on methodologies and the effectiveness of their implementation, and participate in technical assistance activities.
Introduction
Healthcare quality has received heightened attention over the last decade, leading to a growing demand by providers, payers, policymakers, and patients for information on quality of care to help guide their decisions and efforts to improve health care delivery. At the same time, progress in electronic data collection and storage has enhanced opportunities to provide data related to health care quality. In 1989, the Agency for Health Care Policy and Research (AHCPR, now the Agency for Healthcare Research and Quality, AHRQ) initiated the Healthcare Cost and Utilization Project (HCUP). HCUP is an ongoing federal-state-private collaboration to build uniform databases from administrative hospital-based data collected by state data organizations and hospital associations.
The first products of the collaboration were:
- Creation of a comprehensive dataset of inpatient administrative records called the HCUP Nationwide Inpatient Sample (NIS).
- Development of a set of healthcare quality indicators (QIs).
The HCUP quality indicator set, developed in 1994 and hereafter referred to as HCUP I, consists of 33 measures constructed from administrative data available in the NIS. The set includes indicators of procedure utilization, ambulatory care sensitive condition admissions, post-operative and other complications, and mortality. Many measurement systems rely on extensive and expensive data collection, which places financial burdens on health care organizations and makes ongoing, comprehensive monitoring of quality of care less likely. The HCUP indicators were developed as a low-cost, ongoing quality measurement mechanism for states able to develop standardized hospital discharge data. Because of the limitations of administrative data, the indicators were intended as a screening tool rather than an absolute measure of quality problems. They were based primarily on measures described in the literature at the time of development, and they were defined to be empirically simple: broad "denominator" populations were used in lieu of complicated risk adjustment systems.
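To make the "numerator over a broad denominator" design concrete, here is a minimal Python sketch of an unadjusted hospital-level indicator rate. It assumes each discharge record has already been flagged as meeting the indicator's denominator and numerator definitions; the `Discharge` fields and the `raw_rates` function are hypothetical illustrations, not part of the HCUP software.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Discharge:
    hospital_id: str
    in_denominator: bool  # hypothetical flag: record meets the denominator definition
    in_numerator: bool    # hypothetical flag: record meets the numerator (event) definition


def raw_rates(discharges):
    """Unadjusted rate per hospital: numerator events / denominator discharges."""
    num = defaultdict(int)
    den = defaultdict(int)
    for d in discharges:
        if d.in_denominator:  # events count only if the record is in the denominator
            den[d.hospital_id] += 1
            if d.in_numerator:
                num[d.hospital_id] += 1
    return {h: num[h] / den[h] for h in den}
```

No case-mix information enters this calculation, which is why such raw rates are best read as screening signals rather than absolute measures of quality.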
Since the original HCUP QI development work in 1994, numerous managed care organizations, state Medicaid agencies and hospital associations, quality improvement organizations, the Joint Commission on Accreditation of Healthcare Organizations (JCAHO), the National Committee for Quality Assurance (NCQA), academic researchers, and others have contributed substantially to the knowledge base of hospital quality indicators. Based on input from current users and advances in the scientific base for specific indicators, AHRQ decided to fund a research project to refine and further develop the HCUP QIs.
As a result, AHRQ charged the UCSF-Stanford Evidence-based Practice Center (EPC) to revisit the initial 33-indicator set (HCUP I QIs), evaluate their effectiveness as indicators, identify potential new indicators, and ultimately propose a revised set of indicators. This report documents the evidence gathered in that project and the resulting recommendations for improving the HCUP I indicators.
In evaluating potential quality indicators, we applied the Institute of Medicine's widely cited definition of quality of care as "the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge." We further focused on the clinical domains of potential underuse, overuse, and misuse, and excluded potential indicators based on patient satisfaction, health professional satisfaction, or cost containment. Only indicators ascertainable from current HCUP data were eligible for detailed review and empirical analysis. This report also excludes indicators relating to potential complications of care, because this set will be included in a separate evidence report covering patient safety indicators.
Three primary goals were established to accomplish this task:
- Identify indicators in use and potential indicators.
- Evaluate existing HCUP indicators and potential indicators using both literature review and empirical analyses of indicator performance.
- Examine the need for risk adjustment of recommended indicators.
The team designed a series of investigations to accomplish these goals: telephone interviews with a small, purposeful sample of individuals knowledgeable about quality measurement, two phases of extensive literature review, and a series of empirical analyses using the State Inpatient Databases (SID) from 5 states. The in-depth review, supplemented by extensive empirical evaluation, focused on information that would be useful for implementing a revised set of HCUP quality indicators.
Reporting the Evidence
The approach to identification and evaluation of QIs presented in this report serves as the basis for development of the revised HCUP QIs, hereafter referred to as HCUP II. The primary goal of the report is to document the evidence, both from the literature and from empirical analysis, on quality indicators suitable for use based on hospital discharge abstract data. By identifying and evaluating potential indicators, the report may serve as a springboard for commentary on proposed recommendations for specific improvements to the HCUP I QIs.
Six specific key questions were formulated to guide the research process:
- What indicators are currently in use or described in the literature that could be defined using HCUP discharge data?
- What are the quality relationships reported in the literature that could be used to define new indicators using HCUP discharge data?
- What evidence exists for indicators in AHRQ's designated expansion areas—pediatric conditions, chronic disease, new technologies, and ambulatory care sensitive conditions?
- Of the existing HCUP I and potential indicators, which ones have literature-based evidence to support face validity, precision of measurement, minimum bias, and construct validity of the indicator?
- What risk-adjustment method should be supported, given the limits of administrative data and other practical concerns, for use in conjunction with the recommended indicators?
- Of the existing HCUP I and potential indicators, which ones perform well on empirical tests of precision of measurement, minimum bias, and construct validity?
The results of this project are:
- This evidence report, which summarizes all analyses and evaluations.
- Software, written in the SAS™ programming language, that can be used with hospital discharge data such as HCUP data.
Methodology
The project team interviewed a purposeful sample of 31 quality measurement stakeholders and experts affiliated with hospital associations, business coalitions, State data groups, Federal agencies, and academia. These individuals, most of whom were either current or prospective users of HCUP QIs, provided the project team with background information regarding quality indicator use, suggested new indicators and risk adjustment methods, and helped frame our evaluation of potential indicators. (Interview methods are described in detail in Section 2.A. of the full report.)
Development of Evaluation Framework
Based on the interviews and a review of the relevant literature, the project team developed an evaluation framework of ideal standards by which to judge quality indicator performance:
- Face validity. An adequate quality indicator must have a sound clinical and/or empirical rationale for its use. It should measure an important aspect of quality that is subject to provider or health care system control.
- Precision. An adequate quality indicator should have relatively large variation among providers that is not due to random variation or patient characteristics.
- Minimum bias. The indicator should not be affected by systematic differences in patient case-mix, including disease severity and comorbidity. In cases where such systematic differences exist, an adequate risk adjustment system should be available based on HCUP discharge data.
- Construct validity. The indicator should be supported by evidence of a relationship to quality, and should be related to other indicators intended to measure the same or related aspects of quality.
- Fosters real quality improvement. The indicator should not create incentives or rewards for providers to improve measured performance without truly improving quality of care.
- Application. The indicator should have been used effectively in the past, and/or have high potential for working well with other indicators currently in use.
In applying these criteria, the research team also considered the completeness of the evidence: obviously, it was more difficult to reach conclusions about each of these topics for indicators that had not been evaluated much in previous research. (More detail regarding the evaluation framework is available in Section 2.B. of the full report.)
The literature review was completed in two phases:
- The first phase was designed to identify potential indicators. Quality indicators could apply to comparisons among providers of health care (e.g., hospitals, health systems) or among geographic areas (e.g., metropolitan service areas, counties), and had to be applicable to a majority of providers or areas (i.e., not highly specialized care such as burn units).
- The second phase included a detailed review of the evidence on each indicator identified in Phase 1 using the criteria described in our evaluation framework. (Figure 1S diagrams the literature review process. Literature methods are described in detail in Section 2.C. of the full report).
Phase 1. To identify potential indicators, we performed a structured review of the literature. Using Medline, we identified the search strategy that returned a test set of known applicable articles in the most concise manner. The final Medical Subject Headings (MeSH) terms used were "hospital, statistic and methods" and "quality indicators." This search returned over 2,000 articles published during or since 1994. These articles were screened for relevance to this project according to specified criteria, yielding 181 relevant articles.
Information from these articles was abstracted in two stages by clinicians, health services researchers, and other team members. In the first stage, preliminary abstraction, each of the 181 identified articles was evaluated for the presence of a defined quality indicator, potential quality indicators, and obvious strengths and weaknesses. To qualify for full abstraction (stage 2 of Phase 1), an article had to explicitly define and evaluate a novel quality indicator. As in previous attempts to cull new indicators from the peer-reviewed literature, few articles (27) met this criterion. Information on each indicator's definition, validation, and rationale was collected during full abstraction.
Additional potential indicators were identified using the CONQUEST (COmputerized Needs-oriented QUality Measurement Evaluation SysTem) database, a list of ORYX™-approved indicators provided by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO), Healthy People 2010 reports, the interviews, and known Web sites.
Phase 2. The inventory assembled from Phase 1 consisted of over 200 potential indicators. Initially, team members evaluated the clinical rationale of each indicator, and selected the most promising indicators based on a preliminary evaluation according to certain criteria, including minimum frequency of the event and sound clinical rationale. HCUP I indicators were not evaluated in this stage; they were automatically selected for the next step of evaluation.
Second, indicators passing the initial screen (including the HCUP I indicators) were evaluated according to basic empirical tests of precision, including significant variation across providers, as described below.
Third, a full literature review was conducted for those indicators with adequate performance on empirical precision tests. Medline was searched for articles relating to each of the six areas of evaluation, described in the evaluation framework. Clinicians, health services researchers and other team members searched the literature for evidence, and prepared a referenced summary description of the evidence from the literature on each indicator. Each of these indicators also underwent a full empirical evaluation.
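The screen for "significant variation across providers" in the second step can be illustrated with a Pearson chi-square test of homogeneity, which asks whether observed provider rates differ more than a single shared rate would allow. This is a minimal Python sketch under that assumption; the report does not name the exact test here, and `chi_square_homogeneity` is a hypothetical helper. The p-value uses the Wilson-Hilferty normal approximation, which is rough when there are very few providers.

```python
import math


def chi_square_homogeneity(events, totals):
    """Pearson chi-square test of the hypothesis that all providers share one rate.

    events[i], totals[i]: numerator and denominator counts for provider i.
    Returns (statistic, degrees_of_freedom, approximate_p_value).
    """
    p_pooled = sum(events) / sum(totals)
    if not 0 < p_pooled < 1:
        raise ValueError("pooled rate must be strictly between 0 and 1")
    stat = 0.0
    for e, n in zip(events, totals):
        exp_yes, exp_no = n * p_pooled, n * (1 - p_pooled)
        # Contributions from the event and non-event cells of this provider's row
        stat += (e - exp_yes) ** 2 / exp_yes + ((n - e) - exp_no) ** 2 / exp_no
    df = len(totals) - 1
    # Wilson-Hilferty normal approximation to the chi-square upper tail
    z = ((stat / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return stat, df, p_value
```

A small p-value suggests real between-provider variation worth carrying forward; a large one suggests the indicator cannot separate providers at current volumes.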
Risk adjustment review and selection
The literature regarding risk adjustment systems was reviewed. Adjustment approaches described in the literature were examined by type of indicator (mortality, utilization, volume, ambulatory care sensitive condition) with respect to analytic approach, method of development, feasibility of implementation given data availability, and empirical measures of discrimination and calibration. The evidence from the literature and information collected in the interviews with potential HCUP users were used to identify a practical method for risk adjustment of HCUP indicators.
Few risk adjustment systems could feasibly be implemented, given the lack of ambulatory, clinical, and longitudinal patient information in the current HCUP database. Diagnosis Related Group (DRG) systems met more of the users' preference-based criteria than other alternatives. In particular, a majority of users interviewed already used All Patients Refined (APR)-DRGs, and APR-DRGs have been reported to perform as well as or better than other DRG-based systems in predicting resource use and death for most indicators. Where feasible, the APR-DRG system was used to determine the effect of risk adjustment on the measured performance of providers on each reviewed indicator. (Risk adjustment methods are described in detail in Section 2.D. of the main report.)
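Discrimination and calibration, as used above, can be illustrated with two common measures: the c statistic (the probability that a patient who had the outcome received a higher predicted risk than one who did not) and the overall observed-to-expected ratio. The text does not say which specific measures the reviewed studies reported; this minimal Python sketch simply assumes these two for illustration, and both function names are hypothetical.

```python
def c_statistic(predicted, outcome):
    """Discrimination: share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, y in zip(predicted, outcome) if y == 1]
    neg = [p for p, y in zip(predicted, outcome) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the outcome")
    score = 0.0
    for a in pos:
        for b in neg:
            score += 1.0 if a > b else 0.5 if a == b else 0.0
    return score / (len(pos) * len(neg))


def observed_to_expected(predicted, outcome):
    """Calibration-in-the-large: observed events / sum of predicted risks (1.0 = no overall bias)."""
    return sum(outcome) / sum(predicted)
```

A value of 0.5 for the c statistic means the model ranks patients no better than chance; an observed-to-expected ratio far from 1.0 means the model systematically over- or under-predicts.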
Extensive empirical testing of all potential indicators was conducted (see Tables 1-3 in the full report for a summary of empirical tests). In this overview, we summarize the data sets used and the specific tests for each of the evaluation criteria that were assessed empirically: precision, bias, and construct validity.
Data set. The primary data sets used were the HCUP Nationwide Inpatient Sample and the State Inpatient Databases for 1995-1997. The annual NIS contains about 6,000,000 discharges from over 900 hospitals in participating states. The SID contains all discharges for the included states. Most of the statistical tests used to compare candidate indicators were calculated using the SID, because provider-level results were similar to those from the NIS, and the SID includes all discharges needed to calculate area rates.
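Area rates need every admission among an area's residents, which is why complete state data matter. The minimal Python sketch below shows a crude area rate per 100,000 residents and, because the report notes that age-sex adjustment is generally the only feasible adjustment for area-level indicators, a directly age-sex standardized rate. The function names, the stratum labels, and the population inputs (which would come from an external source such as census estimates) are all hypothetical.

```python
def area_rates(admissions, population, per=100_000):
    """Crude rate per `per` residents; admissions and population map area -> count."""
    return {area: per * admissions.get(area, 0) / pop for area, pop in population.items()}


def direct_age_sex_rate(stratum_admissions, stratum_population, standard_weights, per=100_000):
    """Directly standardized rate for one area.

    Each age-sex stratum's rate is weighted by that stratum's share of a standard
    population, so areas with different age-sex mixes can be compared.
    """
    total_weight = sum(standard_weights.values())
    return per * sum(
        (w / total_weight) * stratum_admissions.get(s, 0) / stratum_population[s]
        for s, w in standard_weights.items()
    )
```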
Precision. The first step in the analysis involved precision tests to determine how reliably each indicator distinguishes real differences in provider performance. Any quality indicator consists of both signal (the "true" quality, that is, what is intended to be measured) and noise (measurement error due to sampling variation or other non-persistent factors). For indicators that may be used for quality improvement or other purposes, it is important to know how precisely, or surely, a measure can be attributed to an actual construct rather than to random variation. For some indicators, the raw measure is quite precise; for others, precision is rather low.
However, it is possible to apply additional statistical techniques to improve the precision of these indicators. These techniques are called signal extraction, and are designed to "clean" or "smooth" the data of noise, and extract the actual signal associated with provider or area performance. We used two techniques for signal extraction to potentially improve the precision of an indicator. Detailed methods are contained in the methods section of the main report (Section 2.C). First, univariate methods estimated the "true" quality signal of an indicator based on information from the specific indicator and one year of data. Second, new multivariate signal extraction (MSX) methods estimated the signal based on information from a set of indicators and multiple years of data. In most cases, MSX methods extract additional signal.
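The univariate idea can be sketched as empirical-Bayes-style shrinkage: estimate how much of the between-provider variance is signal, then pull each provider's raw rate toward the pooled rate in proportion to how noisy that provider's estimate is. This minimal Python sketch assumes binomial sampling noise and a simple method-of-moments estimate of the signal variance; it is an illustration of the general technique, not the report's exact estimator, and it does not attempt the multivariate, multi-year MSX method. The function name is hypothetical.

```python
from statistics import mean, pvariance


def univariate_signal_extraction(events, totals):
    """Shrink each provider's raw rate toward the pooled rate.

    events[i], totals[i]: numerator and denominator counts for provider i.
    Returns (signal_variance, smoothed_rates).
    """
    rates = [e / n for e, n in zip(events, totals)]
    p_bar = sum(events) / sum(totals)                  # pooled rate across providers
    noise = [p_bar * (1 - p_bar) / n for n in totals]  # binomial sampling variance per provider
    # Method-of-moments estimate: between-provider variance left after removing average noise
    signal_var = max(0.0, pvariance(rates) - mean(noise))
    smoothed = []
    for r, v in zip(rates, noise):
        total = signal_var + v
        # Weight = share of this provider's variance that is signal (its reliability)
        weight = signal_var / total if total > 0 else 0.0
        smoothed.append(weight * r + (1 - weight) * p_bar)
    return signal_var, smoothed
```

Small providers, whose raw rates carry the most noise, are pulled furthest toward the pooled rate; when no signal variance remains, every provider receives the pooled rate.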
Bias. To provide empirical evidence on the sensitivity of candidate QIs to potential bias from differences in patient severity, we compared unadjusted performance measures for specific hospitals with performance measures that were adjusted for age, gender, and, where possible, patient clinical factors available in discharge data. We used the 3M APR-DRG System Version 12 with Severity of Illness and Risk of Mortality subclasses, as appropriate, for risk adjustment of the hospital quality indicators. For a few measures, no APR-DRG severity categories were available, so that unadjusted measures were compared to age-sex adjusted measures. Because HCUP data do not permit the construction of area measures of differences in risk, only age-sex adjustment is generally feasible for area-level indicators. We used a range of bias performance measures, most of which have been applied in previous studies. We note that these comparisons are based entirely on discharge data.
In general, we expect performance measures that are more sensitive to risk adjustment using discharge data also to be more sensitive to risk adjustment using more complete clinical data, though the differences between the adjusted and unadjusted measures may be larger in absolute magnitude than the discharge data analysis would suggest. However, there may not be a correlation between discharge and clinical-record adjustment. Specific cases where previous studies suggest a greater need for clinical risk adjustment are discussed in our literature reviews of relevant indicators.
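One common way to produce the adjusted hospital measures compared above is indirect standardization: each patient contributes the reference rate of their risk cell to the hospital's expected count, and the adjusted rate is the observed-to-expected ratio times the overall rate. The report describes adjustment with age-sex groups and APR-DRG subclasses but not its exact estimator, so this Python sketch is an assumption for illustration; `risk_cell` labels and the function name are hypothetical.

```python
from collections import defaultdict


def indirectly_standardized_rates(records):
    """records: list of (hospital_id, risk_cell, outcome) tuples, outcome 0 or 1.

    risk_cell is a hypothetical stratum label, e.g. an age-sex group or an
    APR-DRG plus severity subclass. Reference rates come from all records pooled.
    """
    cell_events, cell_n = defaultdict(int), defaultdict(int)
    for _, cell, y in records:
        cell_events[cell] += y
        cell_n[cell] += 1
    overall_rate = sum(cell_events.values()) / len(records)

    observed, expected = defaultdict(int), defaultdict(float)
    for hospital, cell, y in records:
        observed[hospital] += y
        expected[hospital] += cell_events[cell] / cell_n[cell]  # this patient's reference risk

    # Risk-adjusted rate = (observed / expected) x overall rate
    return {h: observed[h] / expected[h] * overall_rate for h in observed if expected[h] > 0}
```

In a toy case where one hospital treats only low-risk patients and another only high-risk patients, raw rates can differ sharply while adjusted rates coincide, which is exactly the kind of shift the bias tests are designed to detect.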
To investigate the degree of bias in a measure, we performed five empirical tests: Spearman rank correlation, the percentage remaining in the highest decile, the percentage remaining in the lowest decile, absolute change, and the percentage changing more than two deciles. Each test was repeated for the "raw" data, for data smoothed by univariate techniques (one year of data, one indicator), and for data smoothed by multivariate (MSX) techniques (multiple years of data, all indicators).
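The five bias tests can be sketched as comparisons between two vectors of provider rates, unadjusted and adjusted, listed in the same provider order. This minimal Python sketch assumes deciles are assigned by rank position and reports absolute change as the mean absolute difference between the two rates; the report's exact decile and normalization conventions may differ, and all names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def deciles(values):
    """Decile 0 (lowest) to 9 (highest), based on rank position."""
    n = len(values)
    return [min(9, int((r - 1) * 10 / n)) for r in average_ranks(values)]


def bias_tests(raw, adjusted):
    n = len(raw)
    d_raw, d_adj = deciles(raw), deciles(adjusted)
    top = [i for i in range(n) if d_raw[i] == 9]
    bottom = [i for i in range(n) if d_raw[i] == 0]

    def pct(hits, group):
        return 100 * hits / len(group) if group else None

    return {
        "spearman": spearman(raw, adjusted),
        "pct_top_decile_remaining": pct(sum(d_adj[i] == 9 for i in top), top),
        "pct_bottom_decile_remaining": pct(sum(d_adj[i] == 0 for i in bottom), bottom),
        "mean_absolute_change": sum(abs(a - r) for r, a in zip(raw, adjusted)) / n,
        "pct_moving_more_than_2_deciles": 100 * sum(abs(a - b) > 2 for a, b in zip(d_raw, d_adj)) / n,
    }
```

A measure that is insensitive to risk adjustment shows a rank correlation near 1, nearly all extreme-decile providers staying put, and few large decile moves. The Spearman step assumes the rates are not all tied.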
Construct validity. Two measures of the same construct would be expected to yield similar results. If quality indicators do indeed measure quality, at least in a specified domain, such as ambulatory care, one would expect measures to be related. As quality relationships are likely to be complex, and outcomes of medical care are not entirely explained by quality, perfect relationships between indicators seem unlikely. We performed analyses to assess the potential relationships between indicators.
To measure the degree of relatedness between indicators, we conducted a factor analysis, a statistical technique used to reveal underlying patterns among large numbers of variables. The output of a factor analysis is a series of "factors," or overarching constructs, on which each indicator "loads," reflecting its relationship with the other indicators in the same factor. The assumption is that indicators loading strongly on the same factor are related to each other through some underlying construct. We used an orthogonal rotation to maximize the possibility that each indicator would load on one factor only, which eases interpretation of the results. In addition to the factor analysis, we also analyzed correlation matrices for each type of indicator (provider level, ambulatory care sensitive condition (ACSC) area level, and utilization area level).
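The correlation-matrix part of this analysis is easy to sketch: each indicator is a vector of rates over the same providers (or areas), and each cell of the matrix is the Pearson correlation between two such vectors. This minimal Python sketch covers only that step; the factor analysis with orthogonal rotation would normally be run in a statistical package and is not reproduced here. The function names and input layout are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(indicators):
    """indicators: dict mapping indicator name -> rates, all lists in the same provider order."""
    names = list(indicators)
    return {a: {b: pearson(indicators[a], indicators[b]) for b in names} for a in names}
```

Indicators intended to capture the same aspect of quality would be expected to show moderate positive correlations, though, as noted above, perfect relationships are unlikely.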
The construct validity analyses provided information regarding the relatedness or independence of the indicators. Such analyses cannot prove that quality relationships exist, but they can provide preliminary evidence on whether the indicators appear to provide consistent evidence related to quality of care. For hospital volume quality indicators, we evaluated correlations with other volume and hospital mortality indicators, to determine whether the proposed HCUP II indicators suggested the same types of volume-outcome relationships as have been demonstrated in the literature.
Results of empirical evaluations. Statistical test results for candidate indicators were compared. First, the results from precision tests were used to sort the indicators, and those performing poorly were eliminated. Second, bias tests were used to determine the need for risk adjustment. Finally, construct validity analyses were used to provide some evidence on the nature of the relationships between potential indicators.
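The first two steps of this sequence can be sketched as a filter followed by a flag. The thresholds below (a minimum signal share and a minimum raw-versus-adjusted rank correlation) are hypothetical values chosen only for illustration; the report does not state the cutoffs it used, and the `IndicatorResult` type and `screen` function are likewise hypothetical. The construct validity step is not modeled.

```python
from dataclasses import dataclass


@dataclass
class IndicatorResult:
    name: str
    signal_share: float        # from precision tests: share of variance that is signal
    rank_corr_adjusted: float  # from bias tests: Spearman correlation, raw vs risk-adjusted


def screen(results, min_signal_share=0.5, min_rank_corr=0.9):
    """Return (name, needs_risk_adjustment) for indicators that pass the precision screen.

    Both thresholds are hypothetical illustration values, not the report's cutoffs.
    """
    retained = []
    for r in results:
        if r.signal_share < min_signal_share:
            continue  # step 1: drop indicators with poor precision
        needs_adjustment = r.rank_corr_adjusted < min_rank_corr  # step 2: rankings shift under adjustment
        retained.append((r.name, needs_adjustment))
    return retained
```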