
This information is for reference purposes only. It was current when produced and may now be outdated. Archive material is no longer maintained, and some links may not work. Persons with disabilities having difficulty accessing this information should contact us at: Let us know the nature of the problem, the Web address of what you want, and your contact information.


CareScience Risk Assessment Model - Hospital Performance Measurement

Presentations from a November 2008 meeting to discuss issues related to mortality measures.

By E. A. Kroch and M. Duan


I. Introduction
  1.1 Quality of Care
  1.2 CareScience Quality Measures
    1.2.1 Mortality
    1.2.2 Complications
    1.2.3 Morbidity
  1.3 CareScience Efficiency Measures
    1.3.1 Length of Stay
    1.3.2 Charges
    1.3.3 Costs
  1.4 Outcome Evaluation and Risk Adjustment
  1.5 CareScience Risk Assessment Method
II. Risk-Assessment Model Specification
  2.1 Functional Form
  2.2 Dependent Variables
    2.2.1 Mortality
    2.2.2 Complications
    2.2.3 Morbidity
    2.2.4 Length of Stay
    2.2.5 Total Charges
    2.2.6 Comparative Costs
  2.3 Independent Variables
  2.4 Semi-log Model
III. Model Calibration
  3.1 Data Source
  3.2 Missing Outcomes and Independent Variables
  3.3 SAS Programming
    3.3.1 Data Transforming
    3.3.2 Model Selection
    3.3.3 Macro Function
  3.4 Beta Tables
    3.4.1 Regression Information Table
    3.4.2 Beta Table
    3.4.3 Covariance Table
  3.5 Model Implementation Test
    3.5.1 Technical Test
    3.5.2 Clinical Validation
IV. Clinical Knowledge Base
  4.1 Comorbidity-Adjusted Complication Indices (CACI)
  4.2 Diagnosis Morbidity
  4.3 Chronic Diseases and Disease History
  4.4 Valid Procedure
  4.5 Other Clinical Elements and Considerations
    4.5.1 Relative Value Unit
    4.5.2 "Do Not Resuscitate" Orders
V. Statistical Significance
  5.1 Claim-level Computation
  5.2 Aggregation
  5.3 Environmental Description
VI. Select Practice
  6.1 Setting
  6.2 Methodological Details
  6.3 Scaling Factors
  6.4 Other Implementation of Select Practice Method

Appendix A—Calculating Costs
Appendix B—Semilog Modeling
Appendix C—Select Practice Formulas
Appendix D—Technical Details about Model Specification
Appendix E—Technical Details about SAS Programming
Appendix F—New Methods on Horizon

I. Introduction

1.1 Quality of Care

The ongoing debate over how to measure inpatient quality of care has, from time to time, focused on different aspects of Avedis Donabedian's 1966 "structure-process-outcome" triad.1 The focus on observable outcomes has shifted conceptualizations of hospital quality towards the ideas of Joseph Juran and others2 who define manufacturing quality as the absence of defects. By this definition, deaths, complications, unusually long hospital stays, unscheduled ICU admissions, and other sentinel events that are deemed universally negative (or nearly so) signal the extent to which a care provider deviates from "good quality." Despite disagreements over the best approach to measuring inpatient quality, most investigators accept that treatment quality is the absence of adverse events.

1.2 CareScience Quality Measures

CareScience measures three adverse outcomes to capture quality of care: mortality, complications, and morbidity. Each of these measures possesses its own strengths and weaknesses and is better suited to certain patient populations and applications. Together they provide a complementary and effective means for screening quality improvement opportunities.

1.2.1 Mortality

Mortality is perhaps the most widely used quality measure, since the occurrence of death seems unambiguously a defect of care. Inpatient mortality rates are also easily observed by simply counting deaths among discharges. While mortality has advantages as a quality of care indicator, it also has drawbacks. Simply counting deaths among discharges can mask "true" mortality rates, which may be disguised by discharge policies. For instance, inpatient mortality rates can be reduced by transferring the most severely afflicted patients to another acute care facility, skilled nursing home, or hospice. Mortality rates also vary widely across diseases, which makes them uninformative for quality analysis in certain populations. In populations where death is very rare (e.g., kidney and ureter calculus) or largely expected (e.g., patients admitted with a DNR order), mortality becomes a less meaningful quality measure.

1.2.2 Complications

Complications are a relevant quality measure for most patient populations. They range from trivial to significant and can result in increased lengths of stay or unscheduled treatments. The challenge in measuring complications is that they are difficult to observe and depend on good documentation and consistent coding. Traditionally, complications have been tracked using chart reviews, during which clinicians pull and review individual patient charts. These reviews are time-consuming, laborious, and expensive, and consequently unsuitable for large-scale data analysis. CareScience has developed a decision-theoretic complication tracking model that uses comorbidity-adjusted complication indices (CACI) to distinguish complications from comorbidities. This model adopts a nonstandard definition of complications, defining them as conditions that arise during a patient's hospital stay. By this construction, complications do not necessarily imply iatrogenic events or physician negligence.

Validation studies have shown that the Comorbidity-Adjusted Complication Risk (CACR) model yields results similar to those of chart reviews at the aggregate level, particularly for surgically treated patients.3

1.2.3 Morbidity

Morbidity is defined as the severity of a patient's complications. Within the CareScience model, morbidity is divided into five severity levels, A-E, which follow a Likert scale. Complications in categories D and E are considered 'morbid' complications, and they are measured separately under the label of 'morbidity'. They often result in temporary impairment, unscheduled ICU admission, and a significant increase in length of stay.


1.3 CareScience Efficiency Measures

In addition to quality of care, a hospital's success depends on its financial performance and its ability to manage patients' lengths of stay, costs, and charges efficiently. These economic concerns are particularly relevant in today's climate of rising healthcare costs, whose growth routinely exceeds that of the Consumer Price Index (CPI) severalfold. CareScience tracks and examines three efficiency outcomes: length of stay, costs, and charges.

1.3.1 Length of Stay

Length of stay is a commonly used proxy for resource usage, reflecting how efficiently a hospital allocates resources. It is easy to observe and compare across hospitals and offers the advantage of being reliably recorded. Despite these desirable attributes, length of stay is sensitive to differences in hospital discharge policies, which can bias it as an outcome measure. Hospitals that regularly transfer patients to affiliated long-term care facilities often have reduced lengths of stay. As a result, the relationship between efficiency and length of stay can be confounded by the possibility that shorter stays are achieved at the expense of sufficient treatment. Nevertheless, length of stay is widely accepted as a proxy for efficiency.

1.3.2 Charges

Patient charges can serve as a supplementary efficiency measure to length of stay. Charge information can be found in electronic record databases built for billing. Although charge data are readily available, charge practices are generally not comparable across facilities or even departments and therefore require adjustments.

1.3.3 Costs

In measuring hospital performance, charges have become a useful proxy for the "costliness" of care. Unadorned charges, however, are a poor indicator of hospital expenditures, especially at the patient level. Nonetheless, by applying a well defined and hospital-specific Cost-to-Charge-Ratio (CCR), hospitals' reported charges can be adjusted to a dollar amount that more closely approximates the "true" cost of care. These adjusted costs can be used to compare hospitals that do not share the same accounting standards. Appendix A provides a detailed discussion of calculating costs.


1.4 Outcome Evaluation and Risk Adjustment

Inpatient care can be viewed as a process in which the patient's characteristics upon admission (e.g., comorbidities) are the inputs and the patient's health status and financial outcome upon discharge are the outputs. Patient health and financial outcomes are influenced not only by the care process but also by the severity of the patient's condition upon entering the hospital; sicker patients are at higher risk for worse outcomes than patients who are less severely afflicted upon hospital admission. Risk adjustment strives to account for these differences in evaluations of care.

Evaluation of patient outcomes4 requires benchmarking, wherein a hospital's (or physician's) outcome rates are compared to their expected rates (outcome risks) as suggested by their case-mix. Expected outcome rates for any facility or grouping of patients are based on the characteristics of those patients and a model of the relationship between patient characteristics and outcomes.

"True" patient severity upon admission cannot be directly measured. Instead, it must be inferred from the patient characteristics recorded in the available data. Administrative billing data, compiled after treatments are completed, are the data most readily available for these kinds of large-scale analyses. These data, optimized for reimbursement purposes and often only secondarily for quality analysis, are prone to inconsistencies and variations in coding that can distort "true" severity. Incomplete coding, such as omission of secondary diagnosis codes, can erroneously make patients appear healthier. Conversely, overly aggressive coding can give an inflated impression of severity and complications. Often the "real" picture upon admission cannot be easily reconstructed from the records.

Patient health status at the output end of the care process is also ambiguous. As previously described, discharge policy can affect mortality rate and length of stay. Moreover, discharge codes do not fully reflect patients' health condition. Being discharged home does not necessarily equate to full recovery of the patient, and some patients are re-admitted shortly afterward. CareScience clients' data show that about 10% of patients are re-admitted to hospital within 30 days after they have been discharged. Readmission information is unavailable in public data sets.

Due to imperfect information, risk adjustment is subject to limitations. Nevertheless, it remains a widely accepted approach for measuring hospital performance by controlling for patient characteristics and allowing benchmarking to compare apples to apples.


1.5 CareScience Risk Assessment Method

The CareScience risk assessment model is estimated statistically by regression analysis of a defined population of hospital discharges. The population can be restricted to a single hospital over a quarter or can encompass a broad range of hospitals across the country over several years. The more encompassing the population, the broader the basis for comparison. Basing the benchmark as broadly as possible permits comparisons of hospitals and their physicians across all possible markets and locations. On the other hand, benchmarking on the experiences of a small region restricts comparisons to the hospitals and physicians of that region alone.

CareScience calibrates its model of hospitalization outcomes and related performance measures on the largest available number of discharges from both private and public sources. The aim is to construct a set of parameter estimates (coefficients or beta values) for each of six measures that can be used to predict outcome rates for any set of patients in CareScience products. The six outcomes are in-hospital mortality, major morbidity, complications, length of stay, charges, and costs. Predicted rates for these outcomes can be compared to the actual rates to evaluate performance for any set of patients based on their case-mix within a particular facility, service line, diagnosis, age grouping, or treating physician grouping.

This approach has a number of advantages over the alternative method of calibrating the risk assessment on the basis of the individual hospital or hospital system. First, a universally calibrated model makes it possible to generate predicted outcomes (risks) and other relevant statistics without rerunning regressions for every new set of patients. That reduces the processing time for new data and potentially even allows real-time processing of discharges. Second, outcome risks that are generated by a set of universal beta values can be used to compare patients within different facilities and physician groupings regardless of where they practice. Third, any set of discharges can be analyzed, no matter how small, because the model parameters themselves do not need to be estimated from the analysis data set.
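The comparison of actual and predicted rates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patient-level risks are taken as already produced by a calibrated model, and the function name `observed_vs_expected` and all sample numbers are hypothetical rather than part of the CareScience system.

```python
from statistics import mean

def observed_vs_expected(observed, predicted_risk):
    """Compare the observed outcome rate with the rate expected from case-mix.

    observed: 0/1 outcome flags (e.g. 1 = died), one per discharge.
    predicted_risk: model-predicted probabilities for the same discharges.
    Returns (observed_rate, expected_rate, observed_rate - expected_rate).
    """
    if not observed or len(observed) != len(predicted_risk):
        raise ValueError("inputs must be non-empty and of equal length")
    observed_rate = mean(observed)
    expected_rate = mean(predicted_risk)
    return observed_rate, expected_rate, observed_rate - expected_rate

# Hypothetical data: five discharges from one physician grouping.
deaths = [0, 0, 1, 0, 0]
risks = [0.02, 0.10, 0.40, 0.05, 0.03]
obs_rate, exp_rate, diff = observed_vs_expected(deaths, risks)
print(f"observed={obs_rate:.3f} expected={exp_rate:.3f} difference={diff:+.3f}")
```

Because the risks come from coefficients estimated elsewhere, the same function works for a handful of discharges as well as for a whole facility.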


II. Risk-Assessment Model Specification

2.1 Functional Form

The purpose of the model is to generate expected or "standard" outcomes (risks) under typical care, based on a patient's characteristics and socioeconomic factors. Patient-level risks for a variety of target outcomes are assessed via a stratified multiple regression model. The model has the following functional form:

$$y_{ijkl} = x_{ijkl}\,\beta_{kl} + \varepsilon_{ijkl}, \quad \forall\, i, j, k, l$$

where $y_{ijkl}$ is the value of outcome $l$ for patient $i$ treated by provider $j$ with principal diagnosis $k$; $x_{ijkl}$ is a vector of patient characteristics and socioeconomic factors; $\beta_{kl}$ is the marginal effect of the independent variables on the outcome measure; and $\varepsilon_{ijkl}$ is the random error component of the model. The strata ($k$) are roughly based on 3-digit ICD-9-CM diagnosis codes. Rare and insignificant diagnoses are rolled up into broad diagnosis groups (BDGs), which are defined in the ICD-9-CM book. There are a total of 142 disease strata and over 800 equations in the model. Details about the stratification can be found in Appendix D (Technical Details about Model Specification).
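To make the stratification concrete, the sketch below looks up one coefficient vector per (diagnosis stratum, outcome) pair and forms the linear prediction for a single patient. The table name `BETAS`, the stratum label, the regressor names, and every coefficient are invented for illustration; they are not values from the actual beta tables.

```python
# Hypothetical beta table: one coefficient vector per (stratum k, outcome l).
# All labels and numbers below are made up for illustration only.
BETAS = {
    ("428", "mortality"): {"intercept": 0.01, "age": 0.0005, "age_sq": 0.00001},
    ("428", "complications"): {"intercept": 0.05, "age": 0.001, "age_sq": 0.0},
}

def predict(stratum, outcome, x):
    """Return the linear prediction x . beta_kl for one patient.

    x maps regressor names to values; the intercept term is added here.
    """
    beta = BETAS[(stratum, outcome)]
    features = {"intercept": 1.0, **x}
    return sum(beta[name] * features[name] for name in beta)

patient = {"age": 70.0, "age_sq": 70.0 ** 2}
risk = predict("428", "mortality", patient)
print(f"predicted mortality risk: {risk:.3f}")
```

Keying the table on both the stratum and the outcome mirrors the subscripts on $\beta_{kl}$: each disease stratum carries its own equation for each outcome.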


2.2 Dependent Variables

The following outcome measures are modeled separately with their own set of specifications:

Mortality
Complications
Morbidity
Length of Stay
Total Charges
Comparative Costs

2.2.1 Mortality

At the patient level, mortality is captured by discharge disposition. Category '20' is designated for patients who expired. The ccms_exp_flag field in the CareScience database indicates the mortality status (expired/alive) of patients based on their discharge disposition. Patients who were transferred to another acute care facility (discharge disposition code '02') have an indeterminate mortality value and are consequently excluded from mortality analyses. The mortality risk for these patients is therefore set to 'null.' Exclusions that apply to mortality outcomes are described in Appendix D (Technical Details about Model Specification).
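The disposition rules above can be expressed as a small function. This is a sketch only: the codes '20' and '02' come from the text, while the function name, the use of `None` for the 'null' value, and the sample code '01' are assumptions, and the actual derivation of ccms_exp_flag may differ.

```python
EXPIRED = "20"         # discharge disposition for patients who expired
ACUTE_TRANSFER = "02"  # transferred to another acute care facility

def mortality_outcome(discharge_disposition):
    """Return 1 (expired), 0 (alive), or None (excluded from mortality analyses)."""
    if discharge_disposition == ACUTE_TRANSFER:
        return None  # indeterminate outcome, so no mortality value is assigned
    return 1 if discharge_disposition == EXPIRED else 0

# '01' is used here as a hypothetical non-expired, non-transfer disposition.
for code in ("20", "02", "01"):
    print(code, mortality_outcome(code))
```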

2.2.2 Complications

Complication is defined as the probability of having at least one complication. It is calculated as

$$1 - \prod_{j=1}^{m} \left(1 - p_{ij}\right)$$

where $m$ is the number of secondary diagnoses and $p_{ij}$ is the probability that the $j$th secondary diagnosis is a complication given principal diagnosis $i$. The probability that any given secondary diagnosis is a complication of a given principal diagnosis is determined ex ante by clinical experts. The method is called Comorbidity-Adjusted Complication Indices (CACI) and is elaborated in the Clinical Knowledge Base section. The algorithm for calculating complications is described in Appendix D (Technical Details about Model Specification).
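A minimal sketch of this formula follows. The function name and the sample probabilities are hypothetical, and the full algorithm in Appendix D may apply additional rules that are not shown here.

```python
from math import prod

def complication_probability(p):
    """Probability of at least one complication: 1 - prod(1 - p_ij).

    p: complication probabilities, one per secondary diagnosis.
    """
    if any(not 0.0 <= pj <= 1.0 for pj in p):
        raise ValueError("each probability must lie in [0, 1]")
    return 1.0 - prod(1.0 - pj for pj in p)

# Hypothetical patient with two secondary diagnoses.
print(complication_probability([0.5, 0.2]))
# A patient with no secondary diagnoses has an empty product, so the result is 0.0.
print(complication_probability([]))
```

The product form treats the secondary diagnoses as independent events, so the result is the complement of the chance that none of them is a complication.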

2.2.3 Morbidity

Morbidity is defined as the probability of having at least one morbid complication. It is calculated in the same manner as complications, except that only secondary diagnoses rated 'D' or 'E' are included in the calculation. Consequently, its value can never exceed that of complications. The same rules that govern calculating complications apply to calculating morbidity.
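The relationship between the two measures can be illustrated as follows. The sketch assumes each secondary diagnosis arrives as a (severity level, probability) pair; that data layout, the function names, and the numbers are assumptions made for the example.

```python
from math import prod

MORBID_LEVELS = {"D", "E"}  # severity levels treated as 'morbid'

def at_least_one(probabilities):
    """Probability that at least one of the listed events occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

def complication_and_morbidity(secondary):
    """secondary: list of (severity_level, probability) pairs for one patient."""
    complications = at_least_one(p for _, p in secondary)
    morbidity = at_least_one(p for level, p in secondary if level in MORBID_LEVELS)
    return complications, morbidity

# Hypothetical patient with three secondary diagnoses.
secondary_dx = [("B", 0.30), ("D", 0.20), ("E", 0.10)]
comp, morb = complication_and_morbidity(secondary_dx)
print(f"complications={comp:.3f} morbidity={morb:.3f}")
```

Since morbidity uses a subset of the terms in the complications product, it is always less than or equal to the complications value.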

2.2.4 Length of Stay

Length of Stay (LOS) is defined as the number of full days a patient stays in the hospital. It is calculated as the difference between discharge date and admission date. The shortest valid LOS is one day. If a patient is admitted and discharged on the same day and coded as inpatient, LOS is counted as one day. If a patient stays in the hospital for more than 100 days, the case, as an outlier, is dismissed from LOS analysis.
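These rules translate directly into date arithmetic. The sketch below is one possible reading: the function name is hypothetical, and returning `None` for a trimmed outlier is an assumption about how exclusion might be represented.

```python
from datetime import date, timedelta

MAX_LOS_DAYS = 100  # stays longer than this are dropped from LOS analysis

def length_of_stay(admitted, discharged):
    """Return LOS in days, or None when the stay is trimmed as an outlier."""
    if discharged < admitted:
        raise ValueError("discharge date precedes admission date")
    # Same-day inpatient stays count as one day.
    days = max((discharged - admitted).days, 1)
    return None if days > MAX_LOS_DAYS else days

admit = date(2008, 1, 1)
print(length_of_stay(admit, admit))                        # same-day stay
print(length_of_stay(admit, admit + timedelta(days=3)))    # ordinary stay
print(length_of_stay(admit, admit + timedelta(days=150)))  # trimmed outlier
```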

2.2.5 Total Charges

Total Charges represent the dollar amount charged to a patient during the hospital stay. The field is directly available in both private and public data. If the dollar amount is greater than 500,000 USD, the case is excluded from both charge and cost analysis.

2.2.6 Comparative Costs

The conversion of charges to costs is a simple matter of multiplying the patient-level total charges from the discharge abstract (typically the UB-92 record) by the facility-specific cost-to-charge ratio. The computation is performed for each individual patient stay in the hospital. To calculate costs, total charges must be recorded (and fall within trimming guidelines). The calculation is performed during early data processing prior to CareScience risk assessment.
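A minimal sketch of the conversion is shown below. It applies only the 500,000 USD upper limit stated for total charges; any other trimming guidelines are not represented. The function name, the use of `None` for excluded cases, and the sample ratio of 0.5 are assumptions for illustration.

```python
MAX_CHARGES_USD = 500_000  # cases above this are excluded from charge and cost analysis

def comparative_cost(total_charges, cost_to_charge_ratio):
    """Return charges * CCR, or None when charges are missing or trimmed."""
    if total_charges is None or total_charges > MAX_CHARGES_USD:
        return None
    return total_charges * cost_to_charge_ratio

# Hypothetical facility with a cost-to-charge ratio of 0.5.
print(comparative_cost(20_000, 0.5))   # ordinary case
print(comparative_cost(600_000, 0.5))  # trimmed: charges exceed the limit
print(comparative_cost(None, 0.5))     # charges not recorded
```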


2.3 Independent Variables

The following patient characteristics and socioeconomic factors comprise the set of regressors.

Age (quadratic form)
Birth weight (quadratic form, for neonatal model only)
Sex (female, male, unknown)
Race (White, Black, Asian-Pacific Islander, unknown)
Income (median household income within a zip code reported by US Census Bureau)
Distance traveled (the centroid-to-centroid distance between the zip code of the household and the zip code of the hospital or provider, represented as a relative term)
Principal diagnosis (terminal or three digit ICD-9-CM code, where statistically significant)
CACR5 comorbidity scores (count of comorbidities within each of five severity categories on the CACR Likert scale)
Defining diagnosis (three digit ICD-9-CM code for neonatal model only)
Cancer status (benign, malignant, carcinoma in situ, history of cancer, derived from secondary diagnoses)
Chronic disease and disease history (terminal digit ICD-9-CM diagnosis codes, such as diabetes, renal failure, hypertension, chronic GI, chronic CP, obesity, and history of substance abuse)
Valid procedure (terminal ICD-9-CM procedure codes, where clinically relevant and statistically significant)
Time trend factor (to control for inflation specific to each disease in the inpatient hospital setting, derived from discharge date, for Cost and Charge model only)
Admission source (Physician Referral, Clinic Referral, HMO Referral, Transfer from a Hospital, Skilled Nursing Facility or Another Health Care Facility, Emergency Room, Court/Law Enforcement, Newborn—Normal Delivery, Premature Delivery, Sick Baby, or Extramural Birth, Unknown/Other)
Admission type (Emergency, Urgent, Elective, Newborn, Delivery, Unknown/Other)
Payer class (Self-pay, Medicaid, Medicare, BC/BS, Commercial, HMO, Workman's Compensation, CHAMPUS/FEHP/Other Federal Government, Unknown/Other)
Discharge disposition (Home or Self Care, Short-term General Hospital, Skilled Nursing Facility, Intermediate Care Facility, Other Type of Institution, Home under Care of Organized Home Health Service, Left against Medical Advice, Discharged Home on IV Medications, Expired, Unknown/Other)
Facility type (Acute, long-term, Psych.)
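Two of the regressors above, age in quadratic form and the CACR comorbidity counts, can be sketched as a feature-building step. This is an assumption-laden illustration: the feature names are invented, and labeling the five comorbidity severity categories A through E is assumed here for convenience rather than taken from the model specification.

```python
from collections import Counter

SEVERITY_LEVELS = ("A", "B", "C", "D", "E")  # assumed labels for the five categories

def build_features(age, comorbidity_levels):
    """Build age, age squared, and a comorbidity count per severity category."""
    counts = Counter(comorbidity_levels)
    features = {"age": float(age), "age_sq": float(age) ** 2}
    for level in SEVERITY_LEVELS:
        features[f"comorb_{level}"] = counts.get(level, 0)
    return features

# Hypothetical 50-year-old patient with three coded comorbidities.
print(build_features(50, ["A", "A", "D"]))
```

Including both age and its square lets a linear model capture risk that rises faster, or slower, at older ages.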

Risk factors used in the CareScience risk assessment model are tailored to specific patient subpopulations and outcomes. The use of the following risk factors may vary depending on the specific subpopulation and outcome evaluated:

  • Diagnosis detail.
  • Significant comorbidities.
  • Defining procedures.
  • Birth weight (used instead of age for neonates).
  • Time trend (controls inflation for costs and charges).
  • Discharge disposition (excluded in mortality analyses).

The following table summarizes the independent variables for specific outcomes and subpopulations. Some independent variables are defined and selected according to clinical relevance, and some are transformed mathematically. These methods are described in Appendix D (Technical Details about Model Specification).


2.4 Semi-log Model

Length-of-Stay (LOS), Costs, and Charges are distributed with a rightward (positive) skew. Applying linear regression to dependent variables with skewed distributions gives rise to a number of pathologies, including inefficient and often biased parameter estimates and predictions outside logical bounds (e.g., negative values for LOS and costs). When outcome measures are not symmetrically distributed, analysis of performance can be disproportionately influenced by outliers and extreme cases. A robust solution is to take the natural log of the dependent variable, which results in an approximately symmetric distribution and contracts the outliers inward toward the center of the data (i.e., the area of greatest density within the distribution). It also ensures that all predicted values will be positive: no matter how negative the predicted log value is, taking the anti-log to restore the original scale guarantees a positive result. Details on the semi-log model can be found in Appendix B (Semilog Modeling).
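The idea can be demonstrated with a single-regressor least-squares fit on the logged outcome. This is a simplified sketch, not the model described in Appendix B: it uses one regressor instead of the full set, applies no retransformation correction, and the function names and data are hypothetical.

```python
from math import exp, log

def fit_semilog(x, y):
    """Least-squares fit of ln(y) = a + b*x for positive y; returns (a, b)."""
    ln_y = [log(v) for v in y]
    n = len(x)
    x_bar = sum(x) / n
    ln_bar = sum(ln_y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (li - ln_bar) for xi, li in zip(x, ln_y))
    b = sxy / sxx
    a = ln_bar - b * x_bar
    return a, b

def predict_semilog(a, b, x_new):
    """Take the anti-log of the fitted line; the result is always positive."""
    return exp(a + b * x_new)

# Hypothetical right-skewed lengths of stay (days) by patient age.
ages = [30, 45, 60, 75, 90]
los_days = [2, 3, 3, 6, 21]
a, b = fit_semilog(ages, los_days)
print(predict_semilog(a, b, 50))
print(predict_semilog(a, b, -50))  # still positive even for an extreme input
```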

Geometric vs. arithmetic means:

The arithmetic mean is the simple average, computed by adding up all values ($x_i$) in the sample and dividing by the number of such values ($n$):

$$\text{Mean} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The geometric mean follows the same principle, but instead of adding the values and dividing by $n$, they are multiplied together and the $n$th root of the product is taken:

$$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$

An equivalent way to compute the geometric mean is to take advantage of natural logarithms. Defining $y$ as the natural log of $x$, $y = \ln(x)$, the geometric mean is the anti-log (exp) of the arithmetic mean of $y$:

$$\text{Geometric mean} = \exp(\bar{y}), \quad \text{where } \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

Because the geometric mean is based on log values and the log transformation tends to draw extreme values toward the center of the data, the geometric mean is more "robust" than the arithmetic mean; the geometric mean is less influenced by outliers and consequently is a better representation of the data distribution. In the Care Management System tool, Length-of-Stay (LOS), Costs, and Charges are reported as geometric means.
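The difference in robustness is easy to see numerically. The sketch below computes both means for a small, made-up set of lengths of stay containing one extreme value; the function names and data are illustrative only.

```python
from math import exp, log

def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)

def geometric_mean(values):
    """Anti-log of the arithmetic mean of the natural logs (values must be > 0)."""
    return exp(arithmetic_mean([log(v) for v in values]))

# Hypothetical lengths of stay in days, with one extreme outlier.
stays = [2, 3, 4, 5, 60]
print(f"arithmetic mean: {arithmetic_mean(stays):.1f}")
print(f"geometric mean:  {geometric_mean(stays):.1f}")
```

The single 60-day stay pulls the arithmetic mean to 14.8 days, while the geometric mean stays at about 5.9 days, much closer to the typical patient.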

1 Donabedian A. Evaluating the Quality of Medical Care, Milbank Quarterly, 1966; 44:166-203.
2 Juran JM et al. Quality Planning and Analysis: From Product Development Through Use. McGraw-Hill Series in Industrial Engineering and Management Science, 1993.
3 Azimuddin K, Rosen L, Reed JF. Computerized Assessment of Complications after Colorectal Surgery. Diseases of Colon & Rectum 2001; 44:500-505.
4 "Outcome" is a term of art that includes a range of observable performance measures beyond mortality and morbidity. Length of stay and treatment costs can be considered "outcomes," since they are indicators of efficiency as well as efficacy.
5 Comorbidity Adjusted Complication Risk—Brailer DJ, Kroch E, Pauly MV, Huang J. Comorbidity-Adjusted Complication Risk: A New Outcome Quality Measure, Medical Care 1996; 34:490-505.



Page last reviewed March 2009
Internet Citation: CareScience Risk Assessment Model - Hospital Performance Measurement. March 2009. Agency for Healthcare Research and Quality, Rockville, MD.



