CareScience Mortality Risk Model
By Eugene Kroch, PhD, Michael Duan, MS, Emi Terasawa
Defining the Mortality Population
Risk Model Specification
Out of Range Predictions
Data Source and Model Calibration
All-Payer State Data
Private Client Data
Model Selection for Private Client Data
Model Selection for Public Data
Comparison to a Logit Model
Logit Model Functional Form
Logit Model Considerations
Mortality is arguably the most commonly employed outcome measure in quality of care studies. Easily measured by simply counting deaths from discharges, inpatient mortality presents a seemingly unambiguous yardstick for judging quality. As an outcome measure, its clinical significance and relevance are unequivocal. It is the archetypical "sentinel event," signaling ultimate failure in care. For hospital staff and leadership, it forms the basis of Mortality and Morbidity Reviews, and for the public and media, it is a focus of quality assessment. In addition to its clinical relevance, mortality is easily explained and understood, a valuable attribute in performance improvement discussions and public reporting.
Despite the aforementioned advantages, mortality presents challenges as an outcome measure. The approach of counting deaths from discharges can inadvertently mask "true" mortality rates, which may be disguised by discharge policies. More specifically, inpatient mortality rates may be reduced by transferring the most severely afflicted patients to other acute care facilities, skilled nursing homes, or hospice facilities. Mortality rates also vary widely across diseases, which makes them unsuitable for quality analysis in certain populations. In populations where death is very rare (e.g., kidney and ureter calculus) or largely expected (e.g., admitted with DNR), mortality becomes a less meaningful quality measure.
Mortality rates can be defined for a range of periods (e.g., inpatient stay, N days post hospital admission, etc.). The CareScience Mortality Risk Model, however, restricts its purview to inpatient mortality to isolate in-hospital care effects.
The purpose of the CareScience Mortality Risk Model is to generate the expected or "standard" mortality rate ("risk" rate) under typical care, given the patient's health status and relevant characteristics. Patient-level mortality risk is assessed via a stratified multiple regression model with the following functional form:
$$y_{ijk} = x_{ijk}\,\beta_k + \varepsilon_{ijk}$$

where $y_{ijk}$ is the mortality risk rate for patient i, provider j, and principal diagnosis k; $x_{ijk}$ is a vector of patient characteristics and socioeconomic factors; $\beta_k$ is the marginal effect of the independent variables on the mortality outcome measure; and $\varepsilon_{ijk}$ is the random error component of the model. The strata (k) are roughly based on 3-digit level ICD-9-CM diagnosis codes. Rare and insignificant diagnoses are rolled up into broad diagnosis groups, which are defined in the ICD-9-CM book. A total of 142 disease strata are analyzed.
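As a rough illustration of how a stratified linear model yields a patient-level risk, the sketch below selects the coefficient set for the patient's disease stratum and forms the linear combination of coefficients and patient characteristics. The coefficient values, the feature names, and the `predict_risk` helper are hypothetical and are not taken from the calibrated model.

```python
# Hypothetical coefficient sets, one per disease stratum k.
# The numbers are illustrative only, not calibrated estimates.
COEFFICIENTS = {
    "038": {"intercept": 0.02, "age": -0.002, "age_sq": 0.00004},
    "stratum_b": {"intercept": 0.01, "age": 0.001},
}


def predict_risk(stratum, features):
    """Return the linear prediction x * beta_k for one patient in stratum k."""
    beta = COEFFICIENTS[stratum]
    risk = beta.get("intercept", 0.0)
    for name, value in features.items():
        # Variables absent from the stratum's coefficient set contribute nothing.
        risk += beta.get(name, 0.0) * value
    return risk


age = 50
risk = predict_risk("038", {"age": age, "age_sq": age ** 2})
print(round(risk, 4))
```

Because each stratum carries its own coefficient set, the same patient characteristic can have a different marginal effect in different disease groups.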
The following patient characteristics and socioeconomic factors comprise the set of regressors (i.e., classes of independent variables) used in the CareScience Mortality Risk Model.
- Age (quadratic form)
- Birth weight (quadratic form, for neonatal model only)
- Sex (female, male, unknown)
- Race (white, black, Asian-Pacific Islander, unknown)
- Income (median household income within a zip code reported by US Census Bureau)
- Distance traveled (the centroid-to-centroid distance between the zip code of the household and the zip code of the hospital or provider, represented as a relative term)
- Principal diagnosis (terminal or three-digit ICD-9-CM code, where statistically significant)
- CACR[1] comorbidity scores (count of comorbidities within each of five severity categories on the CACR Likert scale)
- Defining diagnosis (three-digit ICD-9-CM code, for neonatal model only)
- Cancer status (benign, malignant, carcinoma in situ, history of cancer, derived from secondary diagnoses)
- Chronic disease and disease history (terminal-digit ICD-9-CM diagnosis codes, such as diabetes, renal failure, hypertension, chronic GI, chronic CP, obesity, and history of substance abuse)
- Valid procedure (terminal ICD-9-CM procedure codes, where clinically relevant and statistically significant)
- Admission source (Physician Referral, Clinic Referral, HMO Referral, Transfer from a Hospital, Skilled Nursing Facility or Another Health Care Facility, Emergency Room, Court/Law Enforcement, Newborn - Normal Delivery, Premature Delivery, Sick Baby, or Extramural Birth, Unknown/Other)
- Admission type (Emergency, Urgent, Elective, Newborn, Delivery, Unknown/Other)
- Payer class (Self-pay, Medicaid, Medicare, BC/BS, Commercial, HMO, Workman's Compensation, CHAMPUS/FEHP/Other Federal Government, Unknown/Other)
- Facility type (Acute, Long-term, Psych.)
Risk factors used in the CareScience risk assessment model are tailored to specific patient subpopulations and outcomes. Use of the following risk factors may vary depending on the specific subpopulation and outcome evaluated:
- Diagnosis detail.
- Significant comorbidities.
- Defining procedures.
- Birth weight (used instead of age for neonates).
CACR Comorbidity Scores
CACR comorbidity scores are derived from principal and secondary diagnosis codes. Secondary diagnoses are first categorized according to a five-point Likert scale of increasing severity (A-E), where E is most severe.[2] Comorbidities are calculated for each severity level as
where $N_{is}$ is the expected number of comorbidities of severity s for a patient with principal diagnosis i, $p_{ij}$ is the CACI probability of complication for the jth secondary diagnosis given principal diagnosis i, and s is one of the severity levels, A-E.
Common chronic diseases enter the model as dummy variables separate from comorbidities. The coefficients of both comorbidities and chronic diseases are constrained to be non-negative in the model calibration.
Strictly speaking, a procedure is not a patient characteristic but rather a provider care choice. For example, two physicians may opt to pursue two different yet equally effective courses of treatment for the same patient. Although procedures represent the discretion of the care provider, they can signal important information about the patient's overall health status. Certain procedures can serve as effective proxies for lab reports and treatment history that are not available in the current database, as well as for other unobservable critical factors. To be included in the model, procedures must be designated as "valid" for the patient's particular disease stratum. Additionally, the timing of certain procedures relative to the patient's hospital admission must be considered. Valid procedures are grouped into one of two categories based on timing criteria.
Each disease stratum has a unique set of valid procedures. If a procedure falls into Category 1, timing of the procedure is not considered, and the analytic program simply searches for the procedure's corresponding coefficient. (Procedures failing to be statistically significant are not included in the model and have no impact on the risk score.[3])
If a procedure is mapped to Category 2, inclusion of the procedure in the model depends on the procedure's timing during the inpatient stay. If the procedure occurs within a critical time period from the patient's hospital admission, the procedure is included in the model. If not, the procedure is excluded. The critical time windows for Category 2 procedures are assigned by internal panels of clinicians.
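The timing rule for the two procedure categories can be sketched as a small predicate. This is a minimal sketch under stated assumptions: the function name, the use of hours as the time unit, and the inclusive window boundary are all hypothetical, since the paper does not specify them. It covers only the timing criterion, not the separate requirement that a procedure be statistically significant.

```python
def procedure_enters_model(category, hours_since_admission, window_hours=None):
    """Decide whether a valid procedure is eligible to enter the risk model.

    category: 1 (timing ignored) or 2 (must fall inside a critical window).
    hours_since_admission: procedure time relative to admission (assumed unit).
    window_hours: critical window for Category 2 procedures (assumed unit).
    """
    if category == 1:
        # Category 1: timing is not considered at all.
        return True
    if category == 2:
        if window_hours is None:
            raise ValueError("Category 2 procedures need a critical time window")
        # Category 2: include only if performed within the critical window.
        return 0 <= hours_since_admission <= window_hours
    raise ValueError("category must be 1 or 2")


print(procedure_enters_model(1, hours_since_admission=200))
print(procedure_enters_model(2, hours_since_admission=30, window_hours=24))
```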
For several disease strata, the risk model does not incorporate valid procedures. These groups include DRGs 103, 480, 481, 495, 512, and 513.
Missing Independent Variables
As with most large databases, some records may lack one or more independent variables. Dismissing these records completely from the analysis may eliminate important patient information and in turn shrink the base sample size. This is particularly true for public data sets where missing data elements are more common. Recognizing that independent variables have varying impacts on risk scores, the risk model is designed to tolerate missing values to some extent.
Principal Diagnosis, Age, and Birthweight (for neonates) are mandatory elements in the risk assessment model. Patient records missing any of these required elements are excluded from the model.
For most categorical variables, such as Admission Source, an 'Unknown' category is designated for unrecognizable or missing values. Because missing data are due to random errors, 'Unknown' is statistically the category most likely to have the highest count. In risk modeling, the largest and most common category is often used as the reference group. Assigning the 'Unknown' category as the reference group is thus justifiable; however, a high proportion of 'Unknown' values risks diluting the real characteristics of the reference group.
Due to tight quality control, 'Unknown' values are very rare in private client data. In public data, however, the missing portion ranges from a couple of percent to around ten percent. It is therefore necessary to check the distribution of the data before calibration. In general, the 'Unknown' values should not represent more than one third of the entire sample in order to be used as the reference group.
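The pre-calibration check on the share of 'Unknown' values could look like the following sketch. The function name and the default label are hypothetical; the one-third threshold is the rule of thumb stated above.

```python
from collections import Counter


def unknown_can_be_reference(values, unknown_label="Unknown", max_share=1 / 3):
    """Return True if the 'Unknown' share is small enough to serve as reference."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        # No data: there is nothing to calibrate against.
        return False
    return counts[unknown_label] / total <= max_share


# Illustrative admission-source values, not real data.
mostly_known = ["Emergency Room"] * 9 + ["Unknown"]
half_unknown = ["Emergency Room"] * 5 + ["Unknown"] * 5
print(unknown_can_be_reference(mostly_known))
print(unknown_can_be_reference(half_unknown))
```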
Income and Relative Distance are derived from zip code information. In the case of Income, the patient's residence zip code is used. For Relative Distance, both the patient's residence zip code and the hospital zip code are employed. If the patient's zip code is missing, the average Distance and Income of all patients in that hospital will be applied. In cases where both patient and hospital zip codes are unavailable, the Relative Distance is set to 1, and the national average income is applied.
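The fallback rules for Income and Relative Distance can be written as a short decision chain. This is a sketch only: the function and parameter names are hypothetical, the numeric values are illustrative, and the case where the patient zip code is present but the hospital zip code is missing is not described in the text, so the sketch simply passes through whatever was derived upstream.

```python
def fill_income_and_distance(income, distance, patient_zip, hospital_zip,
                             hospital_avg_income, hospital_avg_distance,
                             national_avg_income):
    """Return (income, relative_distance) after applying the fallback rules."""
    if patient_zip is not None:
        # Values were derived from zip codes upstream; keep them as given.
        return income, distance
    if hospital_zip is not None:
        # Patient zip missing: use the averages over all patients in the hospital.
        return hospital_avg_income, hospital_avg_distance
    # Both zip codes missing: national average income, relative distance of 1.
    return national_avg_income, 1.0


# Illustrative values only.
print(fill_income_and_distance(None, None, None, "19107", 45000, 1.1, 50000))
print(fill_income_and_distance(None, None, None, None, 45000, 1.1, 50000))
```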
Select patients are excluded from the CareScience Mortality Risk Model and do not receive mortality risk scores, either because hospital discharge policies can mask their "true" mortality or because of measurement considerations.
Discharged to Acute Care Facility
At the patient level, mortality is captured by the discharge disposition field in the administrative patient record. Patients expiring in hospital can be identified by discharge disposition codes of '20.'
Patients who are transferred to an acute care facility receive discharge disposition codes of '02.' These patients have an indeterminate mortality value and are consequently excluded from mortality analyses. The mortality risk for these patients is accordingly set to 'null.'
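The mapping from discharge disposition to the mortality outcome can be sketched as follows. The function name is hypothetical, and the sketch assumes disposition codes arrive as zero-padded two-character strings.

```python
EXPIRED = "20"            # patient expired in hospital
TRANSFERRED_ACUTE = "02"  # transferred to another acute care facility


def mortality_outcome(disposition):
    """Return 1 (expired), 0 (survived), or None (excluded from analysis)."""
    if disposition == TRANSFERRED_ACUTE:
        # Outcome is indeterminate, so the patient gets no mortality value.
        return None
    return 1 if disposition == EXPIRED else 0


# "01" stands in here for any other disposition code.
print([mortality_outcome(code) for code in ["01", "20", "02"]])
```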
Insufficient Mortality for Measurement
Hospital-level mortality rates hover around 2 to 3 percent; however, wide variation exists across the model's 142 disease strata. Some of the strata have very low mortality rates, indicating that mortality may not be an appropriate performance measure for all disease strata. For example, among intervertebral disc disorder patients (ICD-9 722), mortality rates are less than 0.1%.
Death is so rare that mortality is difficult to model for these types of disease strata. As a result, these disease groups are omitted from mortality analyses rather than forced into a poor model.
The CareScience mortality model is based on linear regression, and consequently the predicted mortality risks may fall outside the range between zero and one at the patient level. Out-of-range risks are acceptable unless they fall outside the "reasonable range" of -0.5 ≤ predicted risk ≤ 1.5, at which point they are considered invalid. If negative risks occur in aggregate reporting, they are rounded to zero.[4]
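A minimal sketch of the range handling follows. The function names are hypothetical, and the sketch assumes that invalid patient-level risks are dropped before aggregation, which the text does not state explicitly.

```python
def is_valid_patient_risk(risk):
    """Patient-level risks are accepted inside the 'reasonable range'."""
    return -0.5 <= risk <= 1.5


def aggregate_risk(risks):
    """Average the valid risks; a negative aggregate is rounded to zero."""
    valid = [r for r in risks if is_valid_patient_risk(r)]  # assumption: drop invalid
    if not valid:
        return None
    mean = sum(valid) / len(valid)
    return max(mean, 0.0)


# Illustrative risks only.
print(aggregate_risk([-0.10, -0.05]))
print(aggregate_risk([0.2, 0.4, 2.0]))
```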
CareScience employs three main data sources: MedPAR, All-Payer State data, and private client data. All three datasets are calibrated separately.
MedPAR consists of approximately 12 million inpatient visits covered by Medicare each year. These fiscal-year data are generally consistent and are updated annually with roughly a one-year lag (e.g., fiscal year 2004 data were available at the end of 2005). MedPAR covers all U.S. states and territories and is publicly available. Unsurprisingly, many research projects and publications are based on MedPAR. MedPAR covers around one-third of all hospital inpatients, almost all of whom are 65 and older. Consequently, some specialties, such as Pediatrics and Obstetrics, are practically absent.
All-Payer State data include all inpatients regardless of payer type or other restrictions, an advantage over MedPAR. They also offer a larger volume: roughly 20 million records from around 2,700 hospitals. Despite these advantages, the data set has limitations. The most noticeable is that the data are less geographically representative: All-Payer State data come from fewer than 20 states, located mostly on the coasts. The data set also lacks a continuous series for each state, since changing regulations often affect the availability of a state's data from year to year. This lack of continuity can severely limit the feasibility of longitudinal studies. Additionally, because the data are released by individual states with their own data specifications, they are often inconsistent across states. As a result, All-Payer State data require significant internal resources to validate and improve their quality. The two-year lag in release prevents All-Payer State data from being chosen as the model's calibration database, because the standards of hospital care are in constant flux (reflected in part by new diagnosis, procedure, and DRG codes appearing every year). Despite these limitations, All-Payer State data remain a good choice for hospital ranking because of their volume and completeness of disease segments. They also serve as a reference data set for CareScience's private data.
In addition to the public data sets, CareScience collects private data from clients. Client data are submitted in compliance with CareScience's Master Data Specifications (MDS), which ensures their consistency and quality. The data are updated frequently, with a lag of three to six months, and offer much richer content that allows exploration of new model specifications. Annually, the combined Premier-CareScience database consists of about 8 million records from over 600 hospitals dispersed across the United States. Because the client base is continually changing, the number of hospitals and records may fluctuate each year. The quality and richness of the client data make them an ideal calibration database, even though they are smaller than the two public data sets.
To avoid overfitting, CareScience's model calibration employs Stepwise Selection for private client data with critical significance set at 0.10. Variables are added to the model one at a time with the computational program selecting the variable whose F statistic is the largest and also meets the specified critical significance. After a variable is added, the stepwise method inspects all variables in the model and deletes any whose F statistic fails to meet the specified significance threshold. Once the check is made and the necessary deletions accomplished, another variable is added to the model. This process effectively reduces the possibility of multicollinearity caused by highly correlated independent variables. The stepwise process ends when the F statistics for every variable outside the model fail to meet the significance threshold while the F statistics for every variable within the model satisfy the significance criterion.
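The control flow of stepwise selection can be sketched independently of the regression itself. In the sketch below, `p_value` is a hypothetical callable that returns the p-value of a variable's partial F test given the other variables in the model; a real implementation would compute it from a fitted regression. For single-degree-of-freedom variables, choosing the smallest p-value is equivalent to choosing the largest F statistic. The `max_steps` guard is an added safety measure, not part of the described procedure.

```python
def stepwise_select(candidates, p_value, alpha=0.10, max_steps=1000):
    """Alternate forward additions and backward deletions until nothing changes."""
    selected = []
    for _ in range(max_steps):
        # Forward step: find outside variables that meet the threshold.
        outside = [v for v in candidates if v not in selected]
        scored = [(p_value(v, tuple(selected)), v) for v in outside]
        eligible = [(p, v) for p, v in scored if p <= alpha]
        if not eligible:
            break  # no outside variable qualifies, so the process ends
        _, best = min(eligible)  # smallest p-value, i.e. largest F
        selected.append(best)
        # Backward step: delete any variable that no longer meets the threshold.
        selected = [
            v for v in selected
            if p_value(v, tuple(u for u in selected if u != v)) <= alpha
        ]
    return selected


# Toy p-values: x looks significant alone but not once y is in the model.
TOY = {("x", ()): 0.01, ("y", ()): 0.05, ("y", ("x",)): 0.02, ("x", ("y",)): 0.30}
chosen = stepwise_select(["x", "y"], lambda v, others: TOY[(v, others)])
print(chosen)
```

In the toy run, `x` enters first, `y` enters second, and `x` is then deleted because it loses significance once `y` is present, which is the behavior that limits highly correlated regressors.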
Due to the selection criteria, the number of selected independent variables ranges from several to dozens, depending on the disease. The R-Square of the model may be smaller than that of an unrestricted full model, but the parameter estimates are far more robust than those of an overfitted full model. For out-of-sample predictions, robust parameter estimates generate more reliable risk scores.
Chronic conditions and comorbidities are restricted to positive-only parameter estimates due to their clinical attributes.
Public data sets are always calibrated on themselves. Because their parameter estimates are not used for out-of-sample predictions, a full model is preferred as it provides a higher R-Square.
Provider performance can be assessed for virtually any patient grouping (e.g., hospital-level, physician-level, principal diagnosis, DRG, procedure, etc.) through aggregation and comparison of the model's raw and risk mortality rates. Positive deviations, as calculated below, indicate worse than expected (average) performance, while negative deviations indicate better than expected (average) performance.

$$\text{Deviation}_i = \frac{1}{n}\sum_{p=1}^{n} y_p - \frac{1}{n}\sum_{p=1}^{n} \hat{y}_p$$

where n is the number of patients in the ith patient group, $y_p$ is the observed outcome for patient p (0 = survived, 1 = expired), and $\hat{y}_p$ is that patient's mortality risk.
Statistical significance tests can be used to determine whether mortality deviations indicate reliable areas of opportunity. CareScience performance reports flag deviations significant at 75% and 95% confidence levels.
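The raw rate, risk rate, and deviation for a patient group can be computed as in the sketch below. The function name is hypothetical, and the outcomes and risks are illustrative values rather than data from the paper.

```python
def mortality_deviation(outcomes, risks):
    """Return (raw_rate, risk_rate, deviation) for one patient group.

    outcomes: observed values, 0 = survived, 1 = expired.
    risks: model-predicted mortality risks for the same patients.
    """
    n = len(outcomes)
    if n == 0 or n != len(risks):
        raise ValueError("outcomes and risks must be non-empty and equal in length")
    raw_rate = sum(outcomes) / n   # observed share of deaths
    risk_rate = sum(risks) / n     # expected share of deaths
    return raw_rate, risk_rate, raw_rate - risk_rate


# Illustrative group of six patients.
outcomes = [0, 1, 0, 0, 1, 0]
risks = [0.10, 0.40, 0.05, 0.20, 0.35, 0.10]
raw, expected, deviation = mortality_deviation(outcomes, risks)
print(f"raw={raw:.3f} risk={expected:.3f} deviation={deviation:+.3f}")
```

A positive deviation means more deaths were observed than the model expected for that group.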
Figure 5: Example of Computing Mortality Risk Rates and Deviations
Principal Diagnosis: Septicemia (038)
Sample Patient Characteristics
| Patient | Dependent Variable | Independent Variables |
| Income | Comorbidities Severity D | Comorbidities Severity E | Procedure 96.72 (Cont. Mech. Ventilation >96 hrs) |
Coefficients for Principal Diagnosis: Septicemia (038)
| Independent Variable | Coefficient (Parameter Estimate) |
| Comorbidities Severity D | 0.0694 |
| Comorbidities Severity E | 0.1896 |
| Cont. Mech. Ventilation >96 hrs | 0.0939 |
Mortality Risk = b0 + b1(age) + b2(age^2) + b3(gender) + b4(income) + …
= 0.0186 - 0.0022(age) + 0.000043(age^2) + 0.0123(gender) - 0.00000046(income) + …
= 0.0186 - 0.0022(42) + 0.000043(1764) + 0.0123(1) - 0.00000046(40,000) + … = 0.1882
Patient 1 has an 18.8% chance of expiring during her inpatient stay.
| Outcome (0 = Survived, 1 = Expired) | Mortality Risk Rate (%) |
Raw Rate = 2/6 = 33%
Risk Rate = 131%/6 = 22%
Mortality Deviation = 33% - 22% = 11% (excess mortality)
Mortality is a binary outcome; the patient either lives or expires. In the CareScience Mortality Model, however, risk scores may fall outside of the 0 to 1 range due to the inherently unbounded nature of linear regression models. One approach to correcting this discrepancy is to use a logit model.
Logit models are often the preferred choice for modeling binary outcomes such as mortality, since their output values are restricted to a range between 0 and 1. Mathematically, the model is expressed as
$$\log\left[\frac{P_i}{1 - P_i}\right] = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$$

where k is the number of explanatory variables, $i = 1, \dots, n$ indexes individuals, and $P_i$ is the probability that $Y_i = 1$. The expression on the left-hand side is usually referred to as the logit or log-odds.[5] Similar to an ordinary linear regression, the x's may be either continuous or dummy variables. The logit equation can be solved for $P_i$ to obtain

$$P_i = \frac{\exp(\alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})}{1 + \exp(\alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})}$$

This equation can be further simplified by dividing both the numerator and denominator by the numerator itself:

$$P_i = \frac{1}{1 + \exp(-\alpha - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik})}$$

The resulting equation has the desirable property that, regardless of what values are substituted for the β's and x's, $P_i$ will always be a number between 0 and 1.
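The bounded behavior of the logit transformation is easy to check numerically. The sketch below is illustrative: the function name and the coefficient values are hypothetical. It evaluates the inverse logit in a numerically stable way so that large positive or negative linear predictors do not overflow.

```python
import math


def inverse_logit(z):
    """Map a linear predictor z = alpha + sum(beta * x) to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form to avoid overflow.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Illustrative coefficients and patient values, not calibrated estimates.
alpha, betas, xs = -4.0, [0.03, 0.8], [70, 1]
z = alpha + sum(b * x for b, x in zip(betas, xs))
print(round(inverse_logit(z), 4))
```

Unlike the linear model, the output cannot leave the unit interval, whatever the value of the linear predictor.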
The linear regression model used by CareScience provides a good approximation to the logistic curve in localized regions of the mortality model.
At the aggregate level, the logit model generates results similar to those of the linear model. At the patient level, however, the logit model offers better face validity. Although the logit model presents certain advantages, there are considerations as well.
In-hospital death is rare among many patient populations. At the hospital level, the survival to death split is around 98% to 2%. This split can be more extreme among many disease groups. For a given sample size, the standard errors of the coefficients depend heavily on the split on the dependent variable. As a general rule, the model is better with a 50%-50% split than with a 95%-5% split. The logit model, however, has a unique sampling property that allows disproportionate stratified random sampling on the dependent variable without biasing the coefficient estimates. Under such sampling schemes, the intercept changes, and the data set needs to be specifically tailored to each disease stratum.
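One standard way to handle the intercept shift under outcome-dependent sampling is the prior correction from the case-control literature: subtract the log of the ratio of the two sampling fractions from the fitted intercept. The sketch below applies that correction. It is a general statistical result rather than a procedure described in this paper, and the function and parameter names are hypothetical.

```python
import math


def corrected_intercept(fitted_intercept, death_fraction, survivor_fraction):
    """Undo the intercept shift caused by sampling on the outcome.

    death_fraction: share of deaths kept in the calibration sample.
    survivor_fraction: share of survivors kept in the calibration sample.
    """
    for fraction in (death_fraction, survivor_fraction):
        if not 0 < fraction <= 1:
            raise ValueError("sampling fractions must be in (0, 1]")
    # Slope coefficients are unaffected; only the intercept needs adjusting.
    return fitted_intercept - math.log(death_fraction / survivor_fraction)


# Example: keep every death but only 10% of survivors (illustrative values).
print(round(corrected_intercept(-1.0, 1.0, 0.1), 4))
```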
Convergence failure is a common issue with the logit model. Most independent variables are categorical and enter the model equation as dummy variables. Often some of the dummy variables exhibit the following property: at one level of the dummy variable every case has a 1 on the dependent variable or every case has a 0. This property causes complete separation or quasi-complete separation preventing convergence. Removing problematic dummy variables can achieve convergence. Alternatively, uncommon categories can be collapsed. In each case, the data set must be specifically tailored to each disease stratum, which is a labor-intensive process.
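A simple screen for the separation problem described above is to check, for each level of a dummy variable, whether every case shares the same outcome. The sketch below is a minimal illustration with a hypothetical function name and made-up data. It only detects separation caused by a single variable; separation arising from combinations of variables would need a different check.

```python
from collections import defaultdict


def separating_levels(dummy_values, outcomes):
    """Return the levels of a dummy variable at which the outcome never varies."""
    seen = defaultdict(set)
    for level, y in zip(dummy_values, outcomes):
        seen[level].add(y)
    # A level with a single observed outcome value signals (quasi-)separation.
    return sorted(level for level, ys in seen.items() if len(ys) == 1)


# Illustrative data: every patient with dummy = 1 survived.
dummy = [1, 1, 0, 0, 0]
died = [0, 0, 0, 1, 0]
print(separating_levels(dummy, died))
```

Variables flagged this way are candidates for removal or for collapsing with other uncommon categories before the logit model is fitted.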
[1] Comorbidity-Adjusted Complication Risk: Brailer DJ, Kroch E, Pauly MV, Huang J. Comorbidity-Adjusted Complication Risk: A New Outcome Quality Measure. Medical Care 1996; 34:490-505.
[2] Severity ratings are assigned by an internal panel of clinicians.
[3] See Sections 4.4 and 4.5 on Model Selection.
[4] Theoretically, it is possible to have mortality risks greater than 1 in aggregate reporting. In reality, however, these events never happen, since mortality is a relatively rare occurrence. (Aggregate mortality risks of ~0.80 are already considered unusually high.)
[5] Transforming the dependent variable to the odds, $P_i / (1 - P_i)$, removes the equation's upper bound of 1. The lower bound of 0 is removed by taking the logarithm of the odds.