Healthcare Cost and Utilization Project (HCUP)
Prepared by Rosanna Coffey, Ph.D., Marguerite Barrett, M.S., Bob Houchens, Ph.D.,
Ed Kelley, Ph.D., Ernest Moy, M.D., M.P.H., Roxanne Andrews, Ph.D.
January 4, 2008
This section discusses methods for applying the Agency for Healthcare Research and Quality (AHRQ) Quality Indicators (QIs) to the Healthcare Cost and Utilization Project (HCUP) hospital discharge data for several measures in the 2007 National Healthcare Disparities Report.
AHRQ Quality Indicators
The AHRQ Quality Indicators (QIs) are measures of quality associated with processes of care that occurred in an outpatient or an inpatient setting. The QIs rely solely on hospital inpatient administrative data and, for this reason, are screens for examining quality that may indicate the need for more in-depth studies. The AHRQ QIs used for this report include four sets of measures:
- Prevention Quality Indicators (PQIs)—or ambulatory-care-sensitive conditions—identify hospital admissions that evidence suggests could have been avoided, at least in part, through high-quality outpatient care.1
- Inpatient Quality Indicators (IQIs) reflect quality of care inside hospitals and include measures of utilization of procedures for which there are questions of overuse, underuse, or misuse.2
- Patient Safety Indicators (PSIs) reflect quality of care inside hospitals, by focusing on surgical complications and other iatrogenic events.3
- Pediatric Quality Indicators (PDIs) reflect quality of care inside hospitals and identify potentially avoidable hospitalizations among children.4
The QI measures selected for this report are described in Table B.1.
The Healthcare Cost and Utilization Project (HCUP) is a family of health care databases and related software tools and products developed through a Federal-State-industry partnership and sponsored by AHRQ. HCUP databases bring together the data collection efforts of State data organizations, hospital associations, private data organizations, and the Federal Government to create a national information resource of discharge-level health care data. HCUP includes the largest collection of longitudinal hospital care data in the United States, with all-payer, encounter-level information beginning in 1988. These databases enable research on a broad range of health policy issues, including cost and quality of health services, medical practice patterns, access to health care programs, and outcomes of treatments at the national, State, and local market levels.
The 2004 HCUP State Inpatient Databases (SID), a census of hospitals (with all of their discharges) from 23 participating States, were used to create a disparities analysis file designed to provide national estimates on disparities for this report. A sample of hospitals from the following States was included: Arizona, Arkansas, California, Colorado, Connecticut, Florida, Georgia, Hawaii, Kansas, Maryland, Massachusetts, Michigan, Missouri, New Hampshire, New Jersey, New York, Rhode Island, South Carolina, Tennessee, Texas, Virginia, Vermont, and Wisconsin. For the list of the HCUP data sources, go to Table B.2.
Steps for Applying the AHRQ Quality Indicators to HCUP Data
To apply the AHRQ Quality Indicators to HCUP hospital discharge data, several steps were taken:
1. QI software review and modification.
2. Acquisition of population-based data.
3. Special methods for race/ethnicity reporting.
4. Preparation of HCUP data and development of the disparities analysis file.
5. Identification of statistical methods.
These steps, described briefly below, are presented in detail in the "Technical Specifications for HCUP Measures in the Fifth National Healthcare Quality Report and the National Healthcare Disparities Report"5 (available from AHRQ on request).
QI Software Review and Modification. For this report, we started with the following QI software versions: PQI Version 3.0, IQI Version 3.0, PSI Version 3.0, and PDI Version 3.0b. Because each of these software modules was developed for State and hospital-level rates, rather than national rates, some changes to the QI calculations were necessary.5 We also added two indicators particularly relevant to the structure of the NHQR, for patients age 65 years and over: immunization-preventable influenza and adult asthma admissions.
Acquisition of Population-Based Data. The next step was to acquire data for the numerator and denominator populations for the QIs. A QI is a measure of an event that occurs in a hospital, requiring a numerator count of the event of interest and a denominator count of the population (within the hospital or within the geographic area) to which the event relates. For the numerator counts of the AHRQ QIs, we used HCUP data selected from the SID for a disparities analysis file (described under Step 4, below).
We identified two sources of denominator counts for all reporting categories and for all adjustment categories listed in the HCUP-based tables. The HCUP data were used for national-level discharge denominator counts for QIs that related to providers. Population ZIP Code-level counts by age and gender from Claritas (a vendor that compiles and adds value to Bureau of Census data for sale) were used for denominator counts for QIs that related to geographic areas. For the area-based QIs, we also used the Claritas population data for risk adjustment by age and male-female gender.
Claritas uses intracensus methods to estimate ZIP Code-level statistics.6 ZIP Code-level counts were necessary for statistics by median income and urban-rural location of the patient's ZIP Code.
Special Methods for Race/Ethnicity Reporting. Race and ethnicity measures can be problematic in hospital discharge databases. Many hospitals do not code race and ethnicity completely. Because race/ethnicity is a pivotal measure for the NHDR, we explored the reporting of the race/ethnicity data in the 37 States that participated in the 2004 HCUP SID. Nine States did not provide information on patient race to HCUP. Five States did not report Hispanic ethnicity. The remaining 23 States were used for the creation of the disparities analysis file. The following table demonstrates the representation by region of the 23 States.
||Number of States used for the disparities analysis file
||Number of States in the region
||Percentage of States in the region included in the disparities analysis file
The table below compares aggregated totals of various measures for the 23 States as a percentage of the national measure. In 2004, the 23 States accounted for 60% of U.S. hospital discharges (based on the American Hospital Association's Annual Survey). They accounted for about 60% of Whites and African Americans in the Nation (based on 2004 Claritas data) and about 80% of Asian/Pacific Islanders and Hispanics.
||Total of 23 HCUP States with race/ethnicity as a percentage of national total
|Total resident population
|Population by race/ethnicity:
| African American
| Asian/Pacific Islander
|Population by age:
| Population under age 18
| Population age 18-64
| Population over age 64
| Population with income under the poverty level
* Calculated using 2004 Claritas and 1990 Census race definitions.
** Calculated using Urban Institute and Kaiser Commission on Medicaid and the Uninsured estimates based on pooled March 2004 and 2005 Current Population Surveys.
Data on Hispanics are collected differently among the States and also can differ from the Census methodology of collecting information on race (White, African American, Asian, Native American) separately from ethnicity (Hispanic, non-Hispanic). States often collect Hispanic ethnicity as one of several categories that include race. Clerks use these combined race/ethnicity categories to classify patients on admission to the hospital, often by observing rather than asking the patient. The HCUP databases maintain the combined categorization of race and ethnicity. When a State and its hospitals collect Hispanic ethnicity separately from race, HCUP processing for a uniform database uses Hispanic ethnicity to override any other race category.
Preparation of HCUP Data and Development of the Disparities Analysis File. Several HCUP data issues had to be resolved before applying the QI algorithms. First, we selected only community* hospitals from the 23 States and eliminated rehabilitation hospitals in the 2004 SID because the completeness of reporting for rehabilitation hospitals was inconsistent across States.
Second, community hospitals from these 23 States were sampled to approximate a 40% stratified sample of U.S. community hospitals. The sampling strata were defined based on five hospital characteristics: geographic region, hospital control (i.e., public, private not for profit, and proprietary), urbanized location, teaching status, and bed size. Hospitals were excluded from the sampling frame if the coding of patient race was suspect (i.e., more than 30% of the discharges in the hospital had the race reported as "other"; more than 50% of the discharges in the hospital had no information on the race of the patient; all of the discharges in the hospital had race coded as White, other, or missing; or 100% of the discharges in the hospital had race coded as White and the hospital had more than 50 beds).
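The exclusion criteria for suspect race coding can be sketched as a simple screening function. This is a minimal illustration, not the code used for the report: the `HospitalRaceProfile` class and its field names are hypothetical, and the handling of a hospital with zero discharges is an assumption.

```python
from dataclasses import dataclass


@dataclass
class HospitalRaceProfile:
    """Hypothetical per-hospital summary; field names are illustrative only."""
    beds: int
    n_discharges: int
    n_white: int
    n_other: int
    n_missing: int


def race_coding_suspect(h: HospitalRaceProfile) -> bool:
    """Return True if the hospital meets any of the exclusion criteria in the text."""
    if h.n_discharges == 0:
        return True  # assumption: nothing to assess, so treat as unusable

    share_other = h.n_other / h.n_discharges
    share_missing = h.n_missing / h.n_discharges
    # All discharges coded as White, other, or missing.
    only_white_other_missing = (h.n_white + h.n_other + h.n_missing) == h.n_discharges
    # 100% White in a hospital with more than 50 beds. Read literally, the
    # previous criterion already covers this case; both are kept to mirror the text.
    all_white_large = h.n_white == h.n_discharges and h.beds > 50

    return (
        share_other > 0.30
        or share_missing > 0.50
        or only_white_other_missing
        or all_white_large
    )
```

A hospital flagged by this function would be dropped from the sampling frame before the stratified sample is drawn.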
Once the 40% sample was drawn, discharge-level weights were developed to produce national-level estimates when applied to the disparities analysis file. The sampling and weighting strategy used for the disparities analysis file is similar to the method used to create the HCUP Nationwide Inpatient Sample (NIS), except that the disparities analysis file samples from 23 of the 37 States included in the 2004 NIS and is a 40% sample of community hospitals rather than a 20% sample as in the NIS. The final disparities analysis file included about 14.7 million hospital discharges from more than 1,800 hospitals.
Third, for missing age, gender, ZIP Code, race/ethnicity, and payer data that occurred on a small proportion of discharge records, we used a "hot deck" imputation method (which draws donors from strata of similar hospitals and patients) to assign values while preserving the variance within the data.
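The general idea of hot deck imputation can be illustrated as follows. This is a simplified sketch under stated assumptions, not the procedure used for the report: records are plain dictionaries, the stratum keys (`hosp_stratum`, `age_group`) are hypothetical, and a donor value is drawn at random from records in the same stratum that have the field present.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, field, strata_keys, seed=0):
    """Fill missing `field` values with a value drawn from a donor in the same stratum.

    Returns new records; the input list is left unchanged. Records whose
    stratum has no donor keep the missing value (None).
    """
    rng = random.Random(seed)

    # Build donor pools: observed values of `field`, grouped by stratum.
    donors = defaultdict(list)
    for r in records:
        if r.get(field) is not None:
            donors[tuple(r[k] for k in strata_keys)].append(r[field])

    imputed = []
    for r in records:
        r = dict(r)  # copy so the caller's data is not modified
        if r.get(field) is None:
            pool = donors.get(tuple(r[k] for k in strata_keys))
            if pool:
                r[field] = rng.choice(pool)
        imputed.append(r)
    return imputed
```

Because donated values come from observed records in similar strata, the imputed data retain the spread of the observed data rather than collapsing to a single fill-in value such as the mean or mode.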
Fourth, we assigned median household income, in addition to hospital urban-rural location based on ZIP Code data obtained from Claritas. The urban-rural location of the patient was already in the HCUP data.
The 2007 NHDR also includes information derived from the 2003 disparities analysis file. This data file was developed using the 2003 SID and the approach described above. For more details, refer to "Methods Applying AHRQ Quality Indicators to Healthcare Cost and Utilization Project (HCUP) Data for the Fourth (2006) National Healthcare Disparities Report."7
Identification of Statistical Methods. Statistical issues included the following: age-gender adjustment for all QIs; severity/comorbidity adjustment for the discharge-based IQIs, PSIs, and PDIs; and derivation of standard errors and appropriate hypothesis tests.
Age-Gender Adjustment. For the PQIs and area-based IQIs, PSIs, and PDIs, age-gender adjustments were made for age and gender differences across population subgroups and were based on methods of direct standardization.8 Age was categorized into 18 five-year increments. The AHRQ QI software uses a somewhat different approach to adjust the area-based QIs. We relied on direct standardization because of the additional reporting categories and population denominators required in the NHDR.
Age, Gender, Severity, and Comorbidity Adjustment. For the discharge-based PSIs, adjustments were made for age, gender, age-gender interaction, DRG cluster, and comorbidity, using the regression-based standardization that is part of the AHRQ PSI software. For the discharge-based IQIs, adjustments were made for age, gender, age-gender interaction, and 3M™ all patient refined diagnosis related groups (APR-DRGs) risk of mortality or severity score using the regression-based standardization that is part of the AHRQ IQI software.
For the discharge-based PDIs, adjustments were made for age, gender, DRG and major diagnostic category (MDC) clusters, and comorbidity, using the regression-based standardization that is part of the AHRQ PDI software. Measure-specific stratification by risk group, clinical category, and procedure type was also applied.
Standard Errors and Hypothesis Tests. Standard error calculations for the rates were based on the HCUP report titled "Calculating Nationwide Inpatient Sample (NIS) Variances."9 There is no sampling error associated with Claritas census population counts. The appropriate statistics were obtained through the Statistical Analysis System (SAS) procedure called PROC SURVEYMEANS. The threshold selected for reporting estimates in this report is a relative standard error less than 30% and at least 10 unweighted cases in the denominator. Statistical calculations are explained in the "Statistical Methodology and Calculations" section below and in "Technical Specifications for HCUP Measures in the Fifth National Healthcare Quality Report and the National Healthcare Disparities Report."5
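The reporting threshold can be expressed as a short check. This is an illustrative sketch; the function name and parameters are hypothetical, and treating a zero or negative estimate as not reportable is an assumption, since the relative standard error is undefined in that case.

```python
def is_reportable(estimate, standard_error, unweighted_denominator,
                  max_rse=0.30, min_cases=10):
    """Apply the reporting threshold: relative standard error < 30% and
    at least 10 unweighted cases in the denominator."""
    if estimate <= 0:
        return False  # assumption: RSE is undefined, so suppress the estimate
    relative_standard_error = standard_error / estimate
    return relative_standard_error < max_rse and unweighted_denominator >= min_cases
```

For example, a rate of 50.0 with a standard error of 10.0 has a relative standard error of 20% and would be reported if its denominator contains at least 10 unweighted cases.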
Caveats Relating to Data Collection Differences Among States
Some caution should be used in interpreting the AHRQ QI statistics presented in this report. The caveats relate to differences among States in data collection and are discussed below.
Data Collection Differences Among States. Organizations that collect statewide data generally collect data using the Uniform Bill (UB-92) formats and, for earlier data, the Uniform Hospital Discharge Data Set (UHDDS) format. However, not every statewide data organization collects all data elements or codes them the same way. For this report, uneven availability of a few data elements underlies some estimates, as noted below.
Data Elements Needed in Some QIs. Two data elements not available in every State that are required for certain QIs are "secondary procedure day" and "admission type" (elective, urgent, and emergency). These data elements are used to exclude specific cases from some QI measures. The PSIs that use secondary procedure day were modified to not use this information for any State. Admission types of elective and newborn are used in four PSIs. We imputed the missing admission type using available information. For all States except California, an admission type of "elective" was assigned if the DRG did not indicate trauma, delivery, or newborn. An admission type of newborn was assigned if the DRG indicated a newborn. For California, which did not provide any information on admission type, information on scheduled admissions was used to identify elective admissions and DRGs were used to identify newborn admissions.
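The admission type imputation rules can be sketched as below. This is only an illustration of the logic as described: the `drg_group` labels stand in for a DRG classification that the text does not spell out, the `scheduled` flag represents California's scheduled-admission information, and leaving the value missing when no rule applies is an assumption.

```python
# Hypothetical DRG groupings; the actual DRG lists are not given in the text.
NON_ELECTIVE_DRG_GROUPS = {"trauma", "delivery", "newborn"}


def impute_admission_type(reported, drg_group, state, scheduled=None):
    """Return the reported admission type, or an imputed one if it is missing."""
    if reported is not None:
        return reported  # only missing values are imputed
    if drg_group == "newborn":
        return "newborn"  # a newborn DRG identifies newborn admissions in every State
    if state == "CA":
        # California: scheduled-admission information identifies elective stays.
        return "elective" if scheduled else None
    if drg_group not in NON_ELECTIVE_DRG_GROUPS:
        return "elective"
    return None  # assumption: no rule in the text applies, so leave it missing
```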
Number of Clinical Fields. Another data collection issue relates to the number of fields that statewide data organizations permit for reporting patients' diagnoses and procedures during the hospitalization and whether they specifically require coding of external cause of injury (E codes). The SID for different States contain as few as 6 or as many as 30 fields for reporting diagnoses and procedures, as shown in Table B.3. The more fields used, the more quality-related events that can be captured in the statewide databases. However, even for States with 30 diagnosis fields available in the year 2004, 95% of their discharge records captured all of patients' diagnoses in 10 to 13 data elements. For States with 30 procedure fields available, 95% of records captured all of patients' procedures in 5 fields. Thus, limited numbers of fields available for reporting diagnoses and procedures are unlikely to have much effect on results, because all statewide data organizations participating in HCUP allow at least nine diagnoses and six procedures. We decided not to truncate artificially the diagnosis and procedure fields reported, so that the full richness of the databases would be used.
Another issue relates to external cause of injury reporting. Eight of the 27 Patient Safety Indicators use external cause of injury (E code) data to help identify complications of care or to exclude cases (e.g., poisonings, self-inflicted injury, trauma) from numerators and denominators, as shown in Table B.4. Although E codes in the AHRQ PSI software have been augmented wherever possible with the related non-E codes in the ICD-9-CM system (go to Table B.4 for specific details), E codes are still included in some AHRQ PSI definitions. Uneven capture of these data has the potential to affect some PSI rates and should be kept in mind when judging the level of these events.
Race/Ethnicity Coding. Even excluding hospitals with a large proportion of missing race/ethnicity coding, differences among States may remain in race/ethnicity coding that affect estimates. For example, some States include Hispanic ethnicity as a category among racial categories, and some ask about Hispanic ethnicity separately from race. At the hospital level, policies vary on methods for collecting such data. Some hospitals ask patients to identify their race and ethnicity; some determine it from observation. The effect of these and other unmeasured differences in coding of race and ethnicity across the States and hospitals cannot be assessed.
Statistical Methodology and Calculations
This section explains the statistical methods and gives formulas for the calculations of standard errors and hypothesis tests. These statistics are derived from the disparities analysis file created from the HCUP SID and Claritas (a vendor that compiles and adds value to Bureau of Census data). For disparities analysis file estimates, the standard errors are calculated as described in the HCUP report titled "Calculating Nationwide Inpatient Sample (NIS) Variances."9 We will refer to this report simply as the NIS Variance Report throughout this section. This method takes into account the cluster and stratification aspects of the disparities analysis file sample design when calculating these statistics using the SAS procedure PROC SURVEYMEANS. For Claritas population counts, there is no sampling error.
Even though the disparities analysis file contains discharges from a finite sample of hospitals, we treat the sample as though it was drawn from an infinite population. We do not employ finite population correction factors in estimating standard errors. We take this approach because we view the outcomes as a result of myriad processes that go into treatment decisions rather than being the result of specific, fixed processes generating outcomes for a specific population and a specific year. We consider the disparities analysis file to be a sample from a "super-population" for purposes of variance estimation. Further, we assume the counts (of QI events) to be binomial.
Section 1. Area Population QIs Using Claritas Population Data
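a. Standard error estimates for discharge rates per 100,000 population using the 2004 Claritas population data.
The observed rate was calculated as follows:

$$R = 100{,}000 \times \frac{\sum_{i} w_i\, x_i}{N} = 100{,}000 \times \frac{S}{N} \qquad \text{(A.1)}$$

$w_i$ and $x_i$, respectively, are the discharge weight and variable of interest for patient $i$ in the disparities analysis file, $S$ is the weighted sum in the numerator, and $N$ is the Claritas population count. To obtain the estimate of $S$ and its standard error, $SE_S$, we followed instructions in the NIS Variance Report.
The population count in the denominator is a constant. Consequently, the standard error of the rate $R$ was calculated as:

$$SE_R = 100{,}000 \times \frac{SE_S}{N} \qquad \text{(A.2)}$$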
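As a worked illustration of the area rate and its standard error, the sketch below takes the standard error of the weighted sum as an input, because that quantity comes from a design-based procedure (PROC SURVEYMEANS in the report) and is not recomputed here. The function and argument names are hypothetical.

```python
def area_rate_per_100k(weights, indicators, population, se_weighted_sum):
    """Observed rate per 100,000 population and its standard error.

    weights: discharge weight for each record
    indicators: variable of interest for each record (e.g., 0 or 1)
    population: population count in the denominator, treated as a constant
    se_weighted_sum: standard error of the weighted sum, supplied by the caller
    """
    weighted_sum = sum(w * x for w, x in zip(weights, indicators))
    rate = 100_000 * weighted_sum / population
    # The denominator is a constant, so the standard error scales the same way.
    se_rate = 100_000 * se_weighted_sum / population
    return rate, se_rate
```

With three records of weight 5, two of which have the event, and a population of 1,000,000, the weighted sum is 10 and the rate is 1.0 per 100,000.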
b. Standard error estimates for age/sex adjusted inpatient rates per 100,000 population using the 2004 Claritas data.
We adjusted rates for age and sex using the method of direct standardization.8 We estimated the observed rates for each of 36 age/sex categories (18 age groups for each sex). We then calculated a weighted average of those 36 rates using weights proportional to the percentage of a standard population in each cell. Therefore, the adjusted rate represents the rate that would be expected for the observed study population if it had the same age and sex distribution as the standard population.
For the standard population, we used the age and sex distribution of the United States as a whole for the year 2000. In theory, differences among adjusted rates are not attributable to differences in the age and sex distributions among the comparison groups because the rates were all calculated with a common age and sex distribution.
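The adjusted rate was calculated as follows (and subsequently multiplied by 100,000):

$$A = \frac{\sum_{g=1}^{36} N_{g,\mathrm{std}} \left[ \dfrac{\sum_{i=1}^{n(g)} w_{g,i}\, x_{g,i}}{N_{g,\mathrm{obs}}} \right]}{\sum_{g=1}^{36} N_{g,\mathrm{std}}} = \frac{S^*}{\sum_{g=1}^{36} N_{g,\mathrm{std}}} \qquad \text{(A.3)}$$

$g$ = Index for the 36 age/sex cells.
$N_{g,\mathrm{std}}$ = Standard population for cell g (year 2000 total U.S. population in cell g).
$N_{g,\mathrm{obs}}$ = Observed population for cell g (year 2001 subpopulation in cell g; e.g., Medicare insureds, age greater than 65).
$n(g)$ = Number in the sample for cell g.
$x_{g,i}$ = Observed quality indicator for observation i in cell g (e.g., 0 or 1 indicator).
$w_{g,i}$ = Disparities analysis file discharge weight for observation i in cell g.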
The estimates for the numerator, $S^*$, and its standard error, $SE_{S^*}$, were calculated in similar fashion to the unadjusted estimates for the numerator $S$ in formula A.1. The only difference was that the weight for patient i in cell g was redefined to account for the weighting for direct standardization and the discharge weight as:

$$w^*_{g,i} = \frac{N_{g,\mathrm{std}}}{N_{g,\mathrm{obs}}} \times w_{g,i} \qquad \text{(A.4)}$$
Following instructions in the NIS Variance Report, we used PROC SURVEYMEANS to obtain the estimate of $S^*$ (A.3), the weighted sum in the numerator using the revised weights (A.4), and the estimate $SE_{S^*}$, the standard error of the weighted sum $S^*$. The denominator of the rate is a constant. Therefore, the standard error of the adjusted rate, $A$, was calculated as:

$$SE_A = \frac{SE_{S^*}}{\sum_{g=1}^{36} N_{g,\mathrm{std}}} \qquad \text{(A.5)}$$
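The direct standardization described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the report's SAS code: records are `(cell, discharge_weight, indicator)` tuples, `n_std` and `n_obs` are dictionaries of standard and observed population counts by cell, and all names are hypothetical. The standard error is omitted because it depends on the survey design.

```python
def standardized_weights(records, n_std, n_obs):
    """Rescale each discharge weight by the ratio of standard to observed
    population in the record's age/sex cell."""
    return [(g, w * n_std[g] / n_obs[g], x) for g, w, x in records]


def adjusted_rate_per_100k(records, n_std, n_obs):
    """Directly standardized rate per 100,000 population."""
    s_star = sum(w_star * x
                 for _, w_star, x in standardized_weights(records, n_std, n_obs))
    # The denominator is the total standard population over all cells.
    return 100_000 * s_star / sum(n_std.values())
```

When every cell has the same observed rate, the adjusted rate equals that common rate regardless of the standard population chosen, which is a convenient sanity check on an implementation.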
Section 2. Provider-Based QIs Using Weighted Discharge Data (Disparities Analysis File)
a. Standard error estimates for inpatient rates per 1,000 discharges using discharge counts in both the numerator and the denominator.
We calculated the observed rate as follows:

$$R = 1{,}000 \times \frac{\sum_{i} w_i\, x_i}{\sum_{i} w_i} = 1{,}000 \times \frac{S}{N}$$

Following instructions in the HCUP NIS Variance Report, we used PROC SURVEYMEANS to obtain estimates of the discharge weighted mean, $S/N$, and the standard error of that weighted mean, $SE_{S/N}$. We multiplied this standard error by 1,000.
b. Standard error estimates for age/sex adjusted inpatient rates per 1,000 discharges using inpatient counts in both the numerator and the denominator.
We used the 2000 Nationwide Inpatient Sample estimates for the standard inpatient population age-sex distribution. For each of the 36 age-sex categories, we estimated the number of U.S. inpatient discharges, $\hat{N}_{g,\mathrm{std}}$, in category g. We calculated the directly adjusted rate:

$$A = \frac{\sum_{g=1}^{36} \hat{N}_{g,\mathrm{std}} \left( \dfrac{\sum_{i=1}^{n(g)} w_{g,i}\, x_{g,i}}{\sum_{i=1}^{n(g)} w_{g,i}} \right)}{\sum_{g=1}^{36} \hat{N}_{g,\mathrm{std}}}$$

$g$ = Index for the 36 age/sex cells.
$\hat{N}_{g,\mathrm{std}}$ = Standard inpatient population for cell g (estimate of year 2000 total U.S. inpatient population for cell g).
$n(g)$ = Number in the sample for cell g.
$x_{g,i}$ = Observed quality indicator for observation i in cell g.
$w_{g,i}$ = Disparities analysis file discharge weight for observation i in cell g.
Note that $\hat{N}_{g,\mathrm{std}} / \sum_{h=1}^{36} \hat{N}_{h,\mathrm{std}}$ is the proportion of the standard inpatient population in cell g. Consequently, the adjusted rate is a weighted average of the cell-specific rates with cell weights equal to this proportion. These cell weights are merely a convenient, reasonable standard inpatient population distribution for the direct standardization. Therefore, we treat these cell weights as constants in the variance calculations:

$$\mathrm{Var}(A) = \sum_{g=1}^{36} \left[ \frac{\hat{N}_{g,\mathrm{std}}}{\sum_{h=1}^{36} \hat{N}_{h,\mathrm{std}}} \right]^2 \mathrm{Var}\left( \frac{\sum_{i=1}^{n(g)} w_{g,i}\, x_{g,i}}{\sum_{i=1}^{n(g)} w_{g,i}} \right)$$
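The variance of the ratio enclosed in parentheses was estimated separately for each cell g by squaring the SE calculated using the method of Section 2.a:

$$\widehat{\mathrm{Var}}(R_g) = SE_{R_g}^2, \qquad R_g = \frac{\sum_{i=1}^{n(g)} w_{g,i}\, x_{g,i}}{\sum_{i=1}^{n(g)} w_{g,i}}$$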
Following instructions in the HCUP NIS Variance Report, we used PROC SURVEYMEANS to obtain estimates of the discharge- and standardization-weighted means, Rg, and their standard errors.
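The combination of cell-specific rates and standard errors into an adjusted provider-level rate can be sketched as follows. This assumes the cell-specific weighted means and their standard errors have already been obtained from a design-based procedure. The function and argument names are hypothetical, and the scaling to a rate per 1,000 discharges is applied at the end.

```python
import math


def adjusted_provider_rate(cell_rates, cell_ses, n_std):
    """Directly standardized rate per 1,000 discharges and its standard error.

    cell_rates: weighted mean of the indicator in each age/sex cell (a proportion)
    cell_ses:   standard error of each cell-specific weighted mean
    n_std:      standard inpatient population estimate for each cell
    All three dictionaries are assumed to share the same keys.
    """
    total_std = sum(n_std.values())
    # Cell weights are treated as constants in the variance calculation.
    shares = {g: n_std[g] / total_std for g in n_std}
    rate = sum(shares[g] * cell_rates[g] for g in n_std)
    variance = sum(shares[g] ** 2 * cell_ses[g] ** 2 for g in n_std)
    return 1_000 * rate, 1_000 * math.sqrt(variance)
```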
Section 3. Significance Tests
Let $R_1$ and $R_2$ be either observed or adjusted rates calculated for comparison groups 1 and 2, respectively. Let $SE_1$ and $SE_2$ be the corresponding standard errors for the two rates. We calculated the test statistic and (two-sided) p-value:

$$t = \frac{R_1 - R_2}{\sqrt{SE_1^2 + SE_2^2}}, \qquad p = 2 \times \Pr(Z > |t|)$$

where $Z$ is a standard normal variate.
Note: the following functions calculate p in SAS and EXCEL:
SAS: p = 2 * (1 - PROBNORM(ABS(t)));
EXCEL: = 2*(1- NORMDIST(ABS(t),0,1,TRUE))
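For readers working outside SAS or Excel, an equivalent calculation can be sketched in Python using only the standard library. This assumes the test statistic is the difference in rates divided by the square root of the sum of the squared standard errors. The function name is hypothetical.

```python
import math


def two_sided_p(r1, se1, r2, se2):
    """Test statistic and two-sided p-value for the difference between two rates."""
    t = (r1 - r2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2)) for a standard normal variate.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p
```

Using the complementary error function avoids subtracting two nearly equal numbers when the test statistic is large, which keeps very small p-values accurate.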
1. Agency for Healthcare Research and Quality. AHRQ Quality Indicators—Guide to Prevention Quality Indicators: Hospital Admission for Ambulatory Care Sensitive Conditions, AHRQ Pub. No. 02-R0203, Revision 3. Rockville, MD: Agency for Healthcare Research and Quality, 2004.
2. Agency for Healthcare Research and Quality. AHRQ Quality Indicators—Guide to Inpatient Quality Indicators: Quality of Care in Hospitals—Volume, Mortality, and Utilization, AHRQ Pub. No. 02-R0204, Revision 3. Rockville, MD: Agency for Healthcare Research and Quality, 2004.
3. Agency for Healthcare Research and Quality. AHRQ Quality Indicators—Guide to Patient Safety Indicators, AHRQ Pub. No. 03-R203, Revision 2. Rockville, MD: Agency for Healthcare Research and Quality, 2004.
4. Agency for Healthcare Research and Quality. AHRQ Quality Indicators—Guide to Pediatric Quality Indicators, Version 3.0b. Rockville, MD: Agency for Healthcare Research and Quality, 2006.
5. Barrett ML, Houchens R, Coffey RM, Moy E, Andrews R, Kelley E. Technical Specifications for HCUP Measures in the Fifth National Healthcare Quality Report and the National Healthcare Disparities Report. Washington, DC: The Medstat Group, Inc., 2007.
6. Claritas, Inc. The Claritas Demographic Update Methodology, July 2003.
7. Coffey RM, Barrett ML, Houchens R, Moy E, Andrews R. Methods Applying AHRQ Quality Indicators to Healthcare Cost and Utilization Project (HCUP) Data for the Fourth National Healthcare Disparities Report. 2006. HCUP Methods Series Report #2006-08 Online. October 30, 2006. U.S. Agency for Healthcare Research and Quality. Available at: http://www.hcup-us.ahrq.gov/reports/methods.jsp.
8. Fleiss JL. Statistical Methods for Rates and Proportions. New York: Wiley, 1973.
9. Houchens R, Elixhauser A. Final Report on Calculating Nationwide Inpatient Sample (NIS) Variances, 2001. HCUP Methods Series Report #2003-2. ONLINE. June 2005 (revised June 6, 2005). U.S. Agency for Healthcare Research and Quality. Available at: http://www.hcup-us.ahrq.gov/reports/methods.jsp.
* Community hospitals are defined by the AHA as "non-Federal, short-term, general, and other specialty hospitals, excluding hospital units of institutions." Specialty hospitals included among community hospitals are obstetrics-gynecology, ear-nose-throat, short-term rehabilitation, orthopedic, and pediatric institutions. Also included are public hospitals and academic medical centers. Excluded are short-term rehabilitation hospitals, long-term hospitals, psychiatric hospitals, and alcoholism/chemical dependency treatment facilities.