Archive: Agency for Healthcare Research Quality
This information is for reference purposes only. It was current when produced and may now be outdated. Archive material is no longer maintained, and some links may not work. Persons with disabilities having difficulty accessing this information should contact us at: Let us know the nature of the problem, the Web address of what you want, and your contact information.

CareScience Risk Assessment Model - Hospital Performance Measurement

Presentations from a November 2008 meeting to discuss issues related to mortality measures.

Appendix F — New Methods on Horizon

Logit Modeling of Mortality

1. The concept of Logit Model

Mortality is a binary outcome; a patient either lives or expires upon discharge. Mortality risk is a predicted probability within a (0, 1) interval. A problem with the linear probability model is that valid probabilities are bounded by 0 and 1, but linear functions are inherently unbounded. A logit model can resolve this discrepancy.

Transforming the probability to odds, P/(1-P), removes the upper bound. The lower bound is then removed by taking the logarithm of the odds. Setting the result equal to a linear function of the explanatory variables yields a logit model. For k explanatory variables and i = 1,..., n individuals, this model is expressed as

$$\log\left[\frac{P_i}{1-P_i}\right] = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$$

where Pi is the probability that Yi = 1. The expression on the left-hand side is usually referred to as the logit or log-odds. As in ordinary linear regression, the x's may be either continuous or dummy variables. The logit equation can be solved for Pi to obtain

$$P_i = \frac{\exp(\alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})}{1 + \exp(\alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})}$$

We can further simplify this equation by dividing both the numerator and denominator by the numerator itself:

$$P_i = \frac{1}{1 + \exp(-\alpha - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik})}$$

The resulting equation has the desirable property that, regardless of what values are substituted for the β's and x's, Pi will always be a number between 0 and 1.
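The bounded behavior described above can be checked numerically. The following is a minimal sketch, not the CareScience implementation; the function names and the coefficient values are hypothetical and chosen only for illustration. It uses both algebraic forms of the equation for Pi, picking whichever avoids overflow in the exponential.

```python
import math

def logistic(alpha, betas, xs):
    """Return P = 1 / (1 + exp(-(alpha + sum(beta_j * x_j))))."""
    linear = alpha + sum(b * x for b, x in zip(betas, xs))
    if linear >= 0:
        # Form with the negated exponent: exp(-linear) <= 1, so no overflow.
        return 1.0 / (1.0 + math.exp(-linear))
    # Equivalent form exp(linear) / (1 + exp(linear)), safe for negative values.
    e = math.exp(linear)
    return e / (1.0 + e)

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept and coefficients (e.g. age in years, one dummy variable).
alpha = -4.0
betas = [0.03, 1.2]

for xs in ([40, 0], [85, 1], [0, 0]):
    p = logistic(alpha, betas, xs)
    # Applying the logit to p recovers the linear predictor.
    print(xs, round(p, 4), round(logit(p), 4))
```

Whatever moderate values are substituted for the coefficients and the x's, the printed probability stays strictly between 0 and 1, and applying the logit returns the linear predictor.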

2. The Algorithm of Calculating Risk

Logit model implementation requires the following equations to calculate risk at the patient-level:

[1] Logit of Mortality Risk = $\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_n X_{in}$
The Standard Error of the Logit is defined as:
[2] Logit Mortality Risk Standard Error = $\sqrt{X_i \, [\mathrm{COV}_b] \, X_i'}$,
where $\mathrm{COV}_b$ is the nonlinear ML estimated variance-covariance matrix of the estimated β coefficients.

Note that the log of the Mortality Risk is not the same as the logit of the Mortality Risk: the logit of p is the log of the odds, log(p/(1-p)), not log(p). The same distinction applies to the Standard Error value.
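As an illustration of the patient-level logit and its standard error, here is a small sketch using plain Python lists. The coefficient vector, covariance matrix, and risk-factor vector are made-up values, and the function names are hypothetical; the first element of `x` is assumed to be 1 so that it multiplies the intercept.

```python
import math

def patient_logit(beta, x):
    """Logit of mortality risk: dot product of coefficients and risk factors."""
    return sum(b * v for b, v in zip(beta, x))

def patient_logit_se(x, cov_b):
    """Standard error of the logit: sqrt(x * COV_b * x')."""
    k = len(x)
    quad = sum(x[r] * cov_b[r][c] * x[c] for r in range(k) for c in range(k))
    return math.sqrt(quad)

# Hypothetical estimates: intercept plus one dummy risk factor.
beta = [-3.0, 0.5]
cov_b = [[0.04, -0.01],
         [-0.01, 0.09]]
x = [1.0, 1.0]  # leading 1.0 pairs with the intercept (assumption)

logit_risk = patient_logit(beta, x)
se = patient_logit_se(x, cov_b)
risk = 1.0 / (1.0 + math.exp(-logit_risk))

# The log of the risk and the logit of the risk are different quantities.
print(logit_risk, se, math.log(risk))
```

The last printed value, log(risk), differs from the logit, which is the point of the note above.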

3. The Algorithm at Front-End Report

When reporting Mortality Risk, CareScience analytics first determines the arithmetic average of the patient-level Logit Mortality Risk and then transforms it into the real Mortality Risk using the following equation:

[3] Aggregate Mortality Risk = [1/(1+exp( (-1)* Avg(Logit Mortality Risk)))]*100

Before multiplication by 100, Mortality Risk will always be a number between zero and one; as reported, it is a percentage between 0 and 100. The Raw Mortality Rate and deviation are computed as:

[4] Raw Mortality Rate = (Number of Expired Cases/Number Of Eligible Cases)*100
[5] Deviation = Raw Mortality Rate - Mortality Risk
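Equations [3] through [5] can be sketched as follows. The patient-level logit values and case counts are invented for illustration, and the function names are hypothetical rather than taken from CareScience software.

```python
import math

def aggregate_mortality_risk(logit_risks):
    """Equation [3]: back-transform the average logit, as a percentage."""
    avg_logit = sum(logit_risks) / len(logit_risks)
    return 100.0 / (1.0 + math.exp(-avg_logit))

def raw_mortality_rate(expired_cases, eligible_cases):
    """Equation [4]: observed deaths per 100 eligible cases."""
    return expired_cases / eligible_cases * 100.0

def deviation(raw_rate, mortality_risk):
    """Equation [5]: observed minus expected, in percentage points."""
    return raw_rate - mortality_risk

# Hypothetical group of four patients, one of whom expired.
logit_risks = [-3.0, -2.0, -1.0, -2.0]
risk = aggregate_mortality_risk(logit_risks)
raw = raw_mortality_rate(1, 4)
print(round(risk, 2), raw, round(deviation(raw, risk), 2))
```

Note that averaging on the logit scale and then back-transforming generally gives a different number than averaging the patient-level probabilities directly; the sketch follows the order of operations stated in the text.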

4. Computing the Significance Flag of the Mortality Deviation

"Significance flags" indicate how plausible it is that the observed deviation could have occurred randomly if there were no true underlying effect. In other words, significance flags represent the probability, or confidence level, at which the value generated from a variable in the raw data (sample) reflects the value generated from the same variable in the calibration data (entire population).

Deviation values are rounded to the first decimal place. For example, a value of 0.04% is rounded to 0.0%. As with other outcomes, if the Mortality Deviation equals 0.0% after rounding, the significance level is not computed and the calculations below are skipped.

In order to determine the significance level of the Mortality Deviation, the following algorithm is applied:

  1. First, transform the Raw Mortality value, expressed as a proportion between 0 and 1, into the Logit Raw Mortality value by applying the following equation (see note 22):
    [6] Logit Raw Mortality = ln(Raw Mortality/(1 - Raw Mortality))
  2. Compute the Logit Mortality Deviation as:
    [7] Logit Mortality Deviation = Logit Raw Mortality - Logit Mortality Risk
  3. Calculate the T_VALUE (Z_VALUE) as:
    [8] T_VALUE = ABS{(Logit Mortality Deviation)/ [AVG(Logit Mortality Risk Standard Error)]}
  4. Compare the T_VALUE to the T_DISTRIBUTION table and obtain the significance level.

Note: If the Raw Mortality Rate is either 0 or 1, the Logit Raw Mortality value is undefined; if the Average Logit Mortality Risk Standard Error is 0, the T_VALUE is undefined. In either case, the significance level is set at the highest significance level (i.e., 90%).
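The steps above can be sketched as follows. This is an illustration under stated assumptions, not the CareScience code: the source does not give the degrees of freedom for the T_DISTRIBUTION lookup, so the sketch substitutes a two-sided standard normal approximation (consistent with the "Z_VALUE" label) computed with `math.erf`. The function name, the returned dictionary, and the input values are hypothetical.

```python
import math

def mortality_significance(raw_rate_pct, avg_logit_risk, avg_logit_se):
    """Return the status, T_VALUE, and an approximate two-sided confidence level."""
    risk_pct = 100.0 / (1.0 + math.exp(-avg_logit_risk))
    # Deviation is rounded to one decimal place; a 0.0% deviation is not tested.
    if round(raw_rate_pct - risk_pct, 1) == 0.0:
        return {"status": "not_computed", "t_value": None, "confidence": None}

    p = raw_rate_pct / 100.0  # equation [6] needs a proportion, not a percentage
    if p <= 0.0 or p >= 1.0 or avg_logit_se == 0.0:
        # Undefined case: the source assigns the highest significance level here.
        return {"status": "undefined", "t_value": None, "confidence": None}

    logit_raw = math.log(p / (1.0 - p))             # equation [6]
    logit_deviation = logit_raw - avg_logit_risk    # equation [7]
    t_value = abs(logit_deviation / avg_logit_se)   # equation [8]
    # Assumption: normal approximation in place of the t-table, P(|Z| < t).
    confidence = math.erf(t_value / math.sqrt(2.0))
    return {"status": "ok", "t_value": t_value, "confidence": confidence}

result = mortality_significance(25.0, -2.0, 0.5)
print(result)
```

How the resulting confidence level is mapped onto reporting flags is not specified in this section, so the sketch stops at the confidence value.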

5. Concerns about Implementing Logit Model

In-hospital death is rare among many patient populations. At the hospital level, the survival-death split is around 98-2, and it can be more extreme within individual patient populations. For a given sample size, the standard errors of the coefficients depend heavily on the split of the dependent variable; as a general rule, a 50-50 split is better than a 95-5 split. The logit model has a useful sampling property: disproportionate stratified random sampling on the dependent variable does not bias the coefficient estimates, although the intercept does change under such sampling schemes. This property is extremely useful in our situation, but the data set has to be specifically tailored to each disease stratum.
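The change in the intercept under outcome-based sampling has a standard closed form: if deaths are retained with one sampling fraction and survivors with another, the intercept estimated on the sample exceeds the population intercept by the log of the ratio of those fractions. The sketch below applies that correction; it is a general statistical result, not a procedure described in the source, and the function name and sampling fractions are hypothetical.

```python
import math

def corrected_intercept(sample_intercept, frac_deaths_kept, frac_survivors_kept):
    """Undo the intercept shift caused by sampling on the outcome.

    The slope coefficients are left as estimated; only the intercept moves,
    by log(frac_deaths_kept / frac_survivors_kept).
    """
    return sample_intercept - math.log(frac_deaths_kept / frac_survivors_kept)

# Hypothetical 98-2 population rebalanced to 50-50: keep every death and
# 2 of every 98 survivors. An intercept-only model on the balanced sample
# has intercept 0 (log-odds of a 50-50 split).
alpha = corrected_intercept(0.0, 1.0, 2.0 / 98.0)
population_risk = 1.0 / (1.0 + math.exp(-alpha))
print(round(alpha, 4), round(population_risk, 4))
```

In this toy case the corrected intercept recovers the 2% population death rate, which is a quick sanity check on the direction of the adjustment.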

Convergence failure is a common issue with the logit model. Most of the independent variables are categorical and enter the model equation as a series of dummy variables. Some dummy variables may have the following property: at one level of the dummy variable, either every case has a 1 on the dependent variable or every case has a 0. This causes complete or quasi-complete separation, and in either case the logit model will not converge. Removing the problematic dummy variables can achieve convergence, as can collapsing uncommon categories. But again, the data set has to be specifically tailored to each disease stratum.
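A simple pre-fit screen for the single-variable case described above might look like the following sketch. The record layout, field names, and function name are hypothetical. The check only catches separation caused by one dummy variable at a time, not separation produced by combinations of variables.

```python
def separated_dummies(rows, outcome_key, dummy_keys):
    """Flag dummy variables for which one level has outcomes that are all 0 or all 1.

    Returns a list of (variable, level, constant_outcome) tuples.
    """
    flagged = []
    for key in dummy_keys:
        for level in (0, 1):
            outcomes = {row[outcome_key] for row in rows if row[key] == level}
            # Exactly one distinct outcome at this level means no variation.
            if len(outcomes) == 1:
                flagged.append((key, level, outcomes.pop()))
    return flagged

# Hypothetical stratum: every patient with rare_dx = 1 died.
rows = [
    {"died": 0, "icu": 0, "rare_dx": 0},
    {"died": 1, "icu": 1, "rare_dx": 0},
    {"died": 0, "icu": 1, "rare_dx": 0},
    {"died": 1, "icu": 0, "rare_dx": 1},
    {"died": 1, "icu": 0, "rare_dx": 1},
]
problem_vars = separated_dummies(rows, "died", ["icu", "rare_dx"])
print(problem_vars)
```

Variables flagged this way are candidates for removal or for collapsing with other uncommon categories before the model is fit for that stratum.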

The logit model does not generate a standard error for the model as a whole. Only the covariance matrix of the parameter estimates can be used to calculate standard errors for individual patients during out-of-sample prediction. Therefore, the significance level on the front-end report is less robust than that of the linear model.

At the aggregate level, the logit model generates results similar to those of the linear model. At the patient level, however, the logit model offers better face validity. Implementing the logit model is a costly endeavor requiring the overhaul of current methods and programs. From a methodological perspective, the logit model offers greater advantages in the long term.

Prospective Risk

A specialized version of the model deals with "prospective" patient risk for adverse outcomes and resource demands. This model restricts explanatory patient risk factors to those known and recorded upon admission to the hospital. While most CareScience patient-level variables qualify for inclusion, the model excludes discharge disposition factors and most procedure information. The comorbidity score for this version of the model is based purely on known chronic conditions rather than on all secondary diagnoses from the discharge abstract. The model compensates for information unavailable on the day of admission by including information from prior hospital admissions, obtained by linking visit records to unique patient identifiers.

Still to be developed is a version of the CareScience prospective risk model that makes use of test results and signals clinical findings within the first 24 hours of admission. An increasing number of CareScience clients are supplying such clinically rich data elements, but we are not yet able to construct a database that is large enough to build a statistically robust model.

Discharge Disposition Model (To be added)

Modeling Complication as a Binary Outcome (To be added)

22 Note: ABS = Absolute and AVG = Average


Page last reviewed March 2009
Internet Citation: CareScience Risk Assessment Model - Hospital Performance Measurement. March 2009. Agency for Healthcare Research and Quality, Rockville, MD.

