CareScience Risk Assessment Model - Hospital Performance Measurement
III. Model Calibration
3.1 Data Source
Three data sources are employed in CareScience risk model calibration. They are MedPAR, All-Payer State data, and private data.
MedPAR consists of approximately 12 million inpatient visits that are covered by Medicare each year. MedPAR covers all U.S. states and territories and is publicly available. Many research projects and publications are based on MedPAR. CareScience acquires the pre-processed data annually from Solucient. The time range of the data is based on CMS's fiscal year, which contains the fourth quarter of the previous year and the first three quarters of the current year. MedPAR data is generally available with a one-year lag (e.g., Year 2004 data were available by the end of 2005). MedPAR covers around one-third of all hospital inpatients, and almost all of its patients are 65 or older. Consequently, some specialties such as Pediatrics and Obstetrics are practically absent.
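The fiscal-year convention described above can be sketched as a small helper. This is an illustration only, not CareScience code: the function name is hypothetical, and it assumes the fiscal year runs from October 1 of the previous calendar year through September 30 of the current one, which is what "fourth quarter of the previous year plus the first three quarters of the current year" implies.

```python
from datetime import date


def cms_fiscal_year(discharge_date: date) -> int:
    """Return the fiscal year that a discharge date falls into (hypothetical helper)."""
    # October through December (the fourth calendar quarter) is counted
    # toward the following fiscal year.
    if discharge_date.month >= 10:
        return discharge_date.year + 1
    return discharge_date.year
```

Under this assumption, a discharge in November 2003 and a discharge in September 2004 both belong to fiscal year 2004.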
All-Payer State data includes all inpatients regardless of payer type or other restrictions, thus providing an advantage over MedPAR data. Additionally, All-Payer State data contains a larger volume: roughly 20 million records from around 2700 hospitals. Despite these advantages, the data set has limitations. The most noticeable of these is that the data are less geographically representative. All-Payer State data comes from fewer than 20 states located mostly on the coasts. In addition to this handicap, the data set lacks a continuum of data for each of the states, since changing regulatory laws often affect the availability of states' data from year to year. This lack of continuous data can severely limit the feasibility of longitudinal studies. CareScience acquires the pre-processed data annually from Solucient. There is usually a two-year lag. Because State data is released by individual states with their own data specification, the data is often inconsistent across states. As a result, All-Payer State data requires significant internal resources from Data Management Group and Research to validate and improve the quality. The lag time in release also prevents All-Payer State data from being chosen as the model's calibration database, because the standards of hospital care are in constant flux (reflected in part by new codes appearing every year in order to reflect changes in diagnosis, procedure, DRG, etc). Despite the aforementioned limitations, All-Payer State data remains a good choice for hospital ranking because of its volume and completeness of disease segments. It also serves as a reference data set to CareScience's private data.
In addition to the public data sets, CareScience collects private data from clients. Client data is submitted in compliance with CareScience's Master Data Specifications (MDS), ensuring its consistency and quality. The data are updated frequently, with a lag of three to six months, and offer much richer content that allows exploration of new model specifications. Annually there are around two million records from 140 hospitals dispersed across 35 states. Because the client base changes continuously, the number of hospitals and records may fluctuate each year, although the general trend is upward. The quality and richness of the client data make it an ideal calibration database despite its being significantly smaller than the two public data sets.
3.2 Missing Outcomes and Independent Variables
As with most large databases, some records may lack one or more data elements. When the outcome is missing, the record is automatically removed from the analysis. When one or more independent variables are missing, the record is handled by one of the following alternatives.
Principal Diagnosis, Age, and Birth Weight (for immature newborns) are mandatory elements in the risk assessment model. They are considered essential, non-replaceable risk factors. Missing any of them will result in exclusion from the risk assessment. Time_Trend is a mandatory field to predict Charges and Costs. Omission of this value results in 'null' risk scores for Cost and Charge outcomes for these specific cases.
For most categorical variables, such as Admission Source, there is an 'Unknown' category designated for unrecognizable or missing values. Assuming that 'Unknown' values arise from random error, the missing value most likely belongs to the category with the highest frequency. For example, ER is the most common admission source. If a patient's admission source is missing, ER would be the most likely admission source. In risk modeling, the most common category is often used as the reference group. Grouping the 'Unknown' category with the most common category (the reference group) is thus justifiable. However, a high proportion of 'Unknown' values risks diluting the real characteristics of the reference group. Due to tight quality control, 'Unknown' values are very rare in private client data. In public data, the missing portion ranges from a couple of percent to around ten percent. It is therefore necessary to check the distribution of the data before calibration. In general, the 'Unknown' values should not represent more than one third of the reference group.
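The pre-calibration distribution check described above might look like the following sketch. The function name and the 'Unknown' label are illustrative. The sketch also assumes one particular reading of the one-third rule: that 'Unknown' values should make up no more than one third of the combined reference group after they are merged into the most common category. The source does not spell out the denominator, so this is an assumption.

```python
from collections import Counter


def check_unknown_share(values, unknown="Unknown", max_ratio=1 / 3):
    """Find the reference category and test the share of 'Unknown' values.

    Returns (reference_category, unknown_share, is_acceptable).
    Assumption: the share is measured against the merged reference group
    (most common category plus 'Unknown').
    """
    counts = Counter(values)
    unknown_n = counts.pop(unknown, 0)
    if not counts:
        raise ValueError("no known categories to use as a reference group")
    reference, reference_n = counts.most_common(1)[0]
    share = unknown_n / (unknown_n + reference_n)
    return reference, share, share <= max_ratio
```

For example, six ER admissions, three transfers, and two unknown values give a share of 2 / 8 = 0.25 for the merged ER group, which passes the check.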
Income and Relative Distance are derived from zip code information. In the case of Income, the patient's residence zip code is used. For Relative Distance, both the patient's residence zip code and the hospital zip code are employed. If the patient's zip code is missing, the average Distance and Income of all patients in that hospital will be applied. In cases where both patient and hospital zip codes are unavailable, the Relative Distance shall be set to one, and the national median income will be applied.
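The fallback rules for Income and Relative Distance can be summarized in a short sketch. All names here are hypothetical: `zip_income` stands in for a zip-to-income lookup, `hospital_means` for per-hospital averages, and `distance_fn` for whatever routine derives Relative Distance from the two zip codes (its definition is outside this section). The case where the patient zip code is present but the hospital zip code is missing is not covered by the text, so the sketch leaves the distance undefined there.

```python
def fill_income_and_distance(patient_zip, hospital_zip, zip_income,
                             hospital_means, national_median_income,
                             distance_fn):
    """Return (income, relative_distance) using the documented fallbacks."""
    if patient_zip is not None:
        income = zip_income[patient_zip]
        # Relative Distance needs both zip codes; the source does not say
        # what to do if only the hospital zip is missing, so return None.
        distance = (distance_fn(patient_zip, hospital_zip)
                    if hospital_zip is not None else None)
        return income, distance
    if hospital_zip is not None:
        # Patient zip missing: use the averages of all patients in the hospital.
        means = hospital_means[hospital_zip]
        return means["income"], means["relative_distance"]
    # Both zip codes missing: national median income, Relative Distance of one.
    return national_median_income, 1.0
```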
3.3 SAS Programming
Since 2003, model calibration has been executed by SAS-based programs that are created and maintained by the Research Dept. The SAS-based programs replaced the previously used CareScience Regression Language (CRL), which was maintained by the Software Engineering Dept.
3.3.1 Data Transforming
CareScience's database is on an Oracle platform. The database schema and table structure are designed for a specific product and its related data processing tool. It is therefore essential for Research to extract key data elements and transform them into SAS-recognizable data sets. The technical details are elaborated in Appendix E (Technical Details about SAS Programming).
3.3.2 Model Selection
Not all variables carry the same weight. Some variables may have little impact on risk scores, and some may affect only a specific outcome. CareScience's model calibration employs Stepwise selection to identify variables that are significant at the 0.10 level, which is close to the upper limit that SAS recommends. Alternative selection options are discussed in Appendix E (Technical Details about SAS Programming).
With the Stepwise option, variables are added to the model one at a time, with the program selecting the variable whose F statistic is the largest and also meets the specified significance threshold. After a variable is added, the stepwise method inspects all variables in the model and deletes any whose F statistic fails to meet the specified significance threshold. Only after the check is made and the necessary deletions accomplished can another variable be added to the model. This process effectively reduces the risk of multicollinearity, which is caused by highly correlated independent variables. The stepwise process ends when the F statistics for every variable outside the model fail to meet the significance threshold while the F statistics for every variable in the model satisfy the significance criterion. Alternatively, the process ends when the next variable to be added to the model is the one just deleted from it.
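The add-then-check loop described above can be illustrated with a small, dependency-free sketch. It is not the SAS implementation. To avoid an F-distribution routine, it compares partial F statistics against fixed thresholds (`f_enter`, `f_stay`) rather than p-values; the default of 2.71 is an assumption that approximates a 0.10 significance level for one numerator degree of freedom when the sample is large. The `max_steps` guard and all function names are likewise illustrative.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def partial_f(X, y, cols, c):
    """Partial F statistic of column c given the other columns in cols."""
    full = rss(X, y, cols)
    reduced = rss(X, y, [v for v in cols if v != c])
    dof = len(y) - len(cols) - 1  # observations minus predictors minus intercept
    return (reduced - full) / (full / dof)


def stepwise(X, y, f_enter=2.71, f_stay=2.71, max_steps=100):
    """Return the sorted column indices chosen by stepwise selection."""
    selected, last_removed = [], None
    for _ in range(max_steps):
        candidates = [c for c in range(len(X[0])) if c not in selected]
        if not candidates:
            break
        # Forward step: pick the candidate with the largest partial F.
        best_f, best = max((partial_f(X, y, selected + [c], c), c)
                           for c in candidates)
        # Stop if nothing qualifies, or if the best candidate was just deleted.
        if best_f < f_enter or best == last_removed:
            break
        selected.append(best)
        last_removed = None
        # Backward check: delete variables that no longer meet the threshold.
        while selected:
            worst_f, worst = min((partial_f(X, y, selected, c), c)
                                 for c in selected)
            if worst_f >= f_stay:
                break
            selected.remove(worst)
            last_removed = worst
    return sorted(selected)


# Synthetic data: y depends on columns 0 and 1; column 2 is pure noise.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * r[0] + 2.0 * r[1] + random.gauss(0, 1) for r in X]
selected = stepwise(X, y)
print(selected)
```

With such a strong signal, columns 0 and 1 are selected. Whether the noise column also enters depends on the random draw, which is the expected behavior at a 0.10-style threshold.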
Due to the selection criteria, the number of selected independent variables ranges from several to dozens, depending on outcome and disease. The R-Square of the model may be lower than a full model without any restriction. But the parameter estimates from the selected model are far more robust than an over-fitted full model. In out-of-sample prediction, robust parameter estimates generate reliable risk scores.
Public data sets are always calibrated on themselves. No parameter estimates from their calibration are used to assess other data sets. Therefore, a full model is preferred because it provides higher R-Square.
Regardless of its significance, Time Trend is exempt from stepwise selection. It is forced into the Cost and Charge models by an 'include' option in the program. Internal studies have demonstrated that Time Trend is a strong predictor for measuring inflation rate for most diseases. The average inflation rate is around 8.8%. Across the 142 disease strata, Time Trend fails to meet the 0.10 significance level in only one instance.
Chronic conditions and Comorbidities are restricted to positive-only parameter estimates according to their clinical attribute.
3.3.3 Macro Function
It is a monumental task to manipulate millions of records with hundreds of fields and subsequently run more than 800 regression equations through SAS programming. In the current model, regression equations are specified distinctly depending on disease and outcome. Regression coefficients and covariance matrices are then reshaped to fit the structure of the SAS Macro language, which streamlines the programming and makes execution at least partially automatic. The technical details about the Macro processing are elaborated in Appendix E (Technical Details about SAS Programming).
3.4 Beta Tables
Beta tables include a coefficients table (Beta table), a covariance table, and a regression information table. All three tables are initially generated by the model calibration process and then are manipulated to fit specific formats. These tables comprise the key components of CareScience's risk assessment tool.
3.4.1 Regression Information Table
As its name suggests, the regression information table summarizes the results of model calibration. For each model equation, regression information reports R-square, root mean square error (RMSE), number of selected independent variables, and the number of valid records in the calibration data.
R-square varies substantially across outcome and disease. Of the outcomes measured, mortality often suffers from consistently low R-square values, which can be attributed to two reasons. First, expiration is rare among patient discharges. At the hospital level, the mortality rate hovers around 2-3 percent. Most expiration cases are concentrated in a few high-risk diseases, such as Septicemia, AMI, and Lung Cancer. Furthermore, when mortality is an infrequent occurrence within a low-risk disease, it is more difficult to predict. The second reason is that the model calibration relies on claim data, which does not cover all clinical factors. Because the data are designed for billing purposes, it is unsurprising that financial outcomes such as LOS, Costs, and Charges have higher R-squares than mortality does. (R-square for the efficiency outcomes LOS, Costs, and Charges can be as high as 0.70.)
RMSE is used, along with the Covariance Table, to calculate standard error associated with predicted risk at the patient level.
3.4.2 Beta Table
The Beta table includes all coefficients that are significant at the 0.10 level. A coefficient (Beta) can be interpreted as the risk factor's marginal effect upon the risk score. By applying the set of corresponding coefficients, a risk score can be calculated for each outcome at the patient level. This processing is handled within Data Manager (formerly VIMR and CRL, or CIA).
For categorical variables, each coefficient corresponds to one category and shows the difference between that category and the reference category. For example, Hospital Transfer (04) is one category of Admission Source. The coefficient for Hospital Transfer can be interpreted as the difference in risk between Hospital Transfer and the reference category of Admission Source.
Regarding valid procedures and chronic conditions, each code is treated as a separate variable in the model. Their coefficients can be interpreted as the risk difference between patients having and not-having that code.
For each model equation, the number of coefficients is consistent across the Beta table, the Covariance Table, and the Regression Information Table. This feature can be used as a QA check during the manipulation of the beta tables.
3.4.3 Covariance Table
The covariance table is derived from the covariance matrix of coefficients. Due to Data Manager's requirements, the matrix is reshaped into a two-way table, consisting of RowName and ColumnName. The covariance table contains millions of values and is much larger than the Beta table. Due to the legacy of an earlier tool (CRL), the correlation matrix of coefficients, (X'X)^-1, is actually used to compute standard error. Consequently, the covariance matrix has been transformed into the correlation matrix in the 'covariance' table.
3.5 Model Implementation Test
Two steps are involved in a model implementation test. The first is a technical test, which confirms that Data Manager (formerly VIMR and CRL, or CIA) correctly implements the beta tables to compute risks and the associated standard errors at the patient level. The second is a clinical review, which validates outcome reports from the clinical perspective.
3.5.1 Technical Test
During model calibration, SAS regression procedures generate predicted values of dependent variables and standard errors for all observations if options are correctly specified. The SAS output term 'P' (predicted) is equivalent to the risk in the Data Manager output while the SAS output term 'STDI' is equivalent to the standard error in Data Manager. The SAS-generated data set can therefore serve as a convenient reference for the technical model implementation test.
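To make the comparison concrete, the sketch below computes a predicted value and a standard error of individual prediction from a coefficient set, a two-way (RowName, ColumnName) table holding (X'X)^-1, and the RMSE. It assumes a linear model and uses the usual definition of STDI for linear regression, RMSE * sqrt(1 + x (X'X)^-1 x'). The function name and the dictionary layout are hypothetical, and the sketch does not claim to reproduce how Data Manager performs the calculation internally. Coefficients missing from the Beta table are treated as zero.

```python
import math


def predict_with_stdi(x, beta, xtx_inv, rmse):
    """Return (predicted value, standard error of individual prediction).

    x       : dict of variable name -> value (include 'Intercept': 1.0)
    beta    : dict of variable name -> coefficient (absent means 0)
    xtx_inv : dict of (row_name, column_name) -> element of (X'X)^-1
    rmse    : root mean square error of the fitted equation
    """
    names = list(x)
    predicted = sum(beta.get(name, 0.0) * x[name] for name in names)
    # Quadratic form x (X'X)^-1 x', read from the two-way table.
    leverage = sum(x[r] * xtx_inv.get((r, c), 0.0) * x[c]
                   for r in names for c in names)
    stdi = rmse * math.sqrt(1.0 + leverage)
    return predicted, stdi
```

A technical test would then compare these two numbers, record by record, against the SAS 'P' and 'STDI' columns within a small tolerance.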
The technical model implementation test is conducted by Software Engineering and Quality Assurance Team, with Research and Data Management Group assisting them in assembling the data set. Research also consults on technical requirements, ensuring that the test data set has full coverage of all 142 disease groups (ccms_crl_group_by).
The test data may be selected from one hospital or one hospital system that has a sufficient number of cases for all disease groups. A more conservative approach, however, is to randomly extract data from the entire calibration database by disease group. The latter method guarantees full coverage of the data at a global level. It is recommended that at least 1000 cases be pulled for each disease stratum. For a few strata (e.g. DRG103), however, this may not be possible. In these situations, all cases in the disease stratum should be included in the testing data set. It may take a couple of months, or longer, to complete the test. Research plays an essential role in the process. When there is a discrepancy between the SAS score and the analytics score, finding the reason or reasons requires extensive knowledge of the methodology as well as a thorough understanding of the database schema.
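The random extraction by disease group can be sketched as follows. The function and field names are hypothetical, and the fixed seed is only there to make the sample reproducible. Strata with fewer cases than the target are taken in full, as the text recommends.

```python
import random
from collections import defaultdict


def sample_by_stratum(records, stratum_key, per_stratum=1000, seed=0):
    """Draw up to `per_stratum` random records from each disease stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[stratum_key]].append(rec)
    sample = []
    for stratum in sorted(groups):
        cases = groups[stratum]
        if len(cases) <= per_stratum:
            # Small stratum: include every case.
            sample.extend(cases)
        else:
            sample.extend(rng.sample(cases, per_stratum))
    return sample
```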
3.5.2 Clinical Validation
The main purpose of risk assessment is to differentiate patients based on their individual characteristics. Statistically, including more independent variables often increases R-square and creates more distinctive risk scores at the patient level. However, a model with the highest R-square may not have the best performance when coefficients are implemented in an out-of-sample prediction and risk scores are extrapolated. For instance, allowing more procedure codes in the model will certainly increase the model fit, but it may also result in inflated risk scores among patients treated by multiple significant procedures. From a statistical perspective, a few outliers are acceptable and do not exert much influence at the aggregate level. They may become a serious issue, however, when grouped in CareScience front-end reports by certain criteria (e.g. DRG), causing risk scores to deviate far beyond conventional clinical wisdom. A clinically based review can help identify problems such as this. Since the front-end tool allows users to select virtually any combination of patients, extensive reviews by the Consulting team are required to avoid issues of clinically unfounded risk scores.
Clinical reviews are also necessary to improve the quality of CareScience's clinical knowledge base used in the model. This knowledge base includes the comorbidity-adjusted complication indices (CACI), chronic condition designations, diagnosis morbidity designations, valid procedure designations, etc. This information has been gathered over the course of many years and is continually reviewed and updated by internal and external clinical experts. Despite these efforts, gaps in the knowledge base exist, along with areas where the data are contradictory due to changes in treatment or coding practices. Clinical validation of the model performance provides an opportunity to identify some of these problems and upgrade the clinical knowledge base accordingly.
IV. Clinical Knowledge Base
It would be a mistake to characterize the CareScience risk model as purely a statistical model. Clinical knowledge plays a key role from the beginning of data processing to the end of risk assessment. The following section systematically examines the key components comprising the CareScience clinical knowledge base.
4.1 Comorbidity-Adjusted Complication Indices (CACI)
CareScience uses a decision-theoretic model called Comorbidity-Adjusted Complication Risk (CACR) to track complications. This model assumes a nonstandard definition of complications, defining them as conditions that arise during a patient's hospital stay. By this construction, complications do not necessarily imply iatrogenic events or physician negligence.
The CACR model is based on the assumption that most secondary diagnoses do not occur purely as comorbidities or as complications. Instead, some proportion of each of these recorded secondary diagnoses represent conditions that emerge during a hospital stay while the remaining proportion represent conditions that were present when the patient was admitted (i.e., comorbid conditions).
A comorbidity-adjusted complication index (CACI) is the probability that a given secondary diagnosis is a complication (a condition developed during a patient's hospital stay) for a patient with a specific 3-digit ICD-9 principal diagnosis. For example, the CACI for a secondary diagnosis of urinary tract infection with a principal diagnosis of simple pneumonia is 90%, indicating that for 90% of patients with this principal-secondary diagnosis pair, the urinary tract infection emerged during their inpatient stay. For the remaining 10% of patients, the urinary tract infection was present at the time of admission. CACIs exist for most common principal-secondary diagnosis pairs and are assigned by Delphi panels of physicians (see footnote 6).
Because CACIs are probabilities, they can only operate effectively on aggregated data, where they provide estimated complication rates. CACIs cannot be used to pinpoint which patients have specific complications within a given population. CACIs are periodically reviewed and reevaluated as a result of changes in medical practice, contestations, or additional information such as empirically derived "present on admission (POA)" data.
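A minimal sketch of how CACIs yield an aggregate estimate is shown below. It assumes that the estimated rate is the expected number of complication diagnoses per patient, obtained by summing the CACI of every recorded principal-secondary pair, and that pairs without a CACI are skipped. Both are assumptions, since the text does not give the exact aggregation. The diagnosis labels are placeholders rather than ICD-9 codes.

```python
def expected_complications(patients, caci):
    """Return (expected complication count, expected count per patient).

    patients : list of (principal_dx, [secondary_dx, ...]) tuples
    caci     : dict of (principal_dx, secondary_dx) -> probability
    """
    total = 0.0
    for principal, secondaries in patients:
        for secondary in secondaries:
            # Assumption: pairs with no CACI contribute nothing.
            total += caci.get((principal, secondary), 0.0)
    rate = total / len(patients) if patients else 0.0
    return total, rate


# The pneumonia / urinary tract infection example from the text.
caci = {("PNEUMONIA", "UTI"): 0.90}
patients = [("PNEUMONIA", ["UTI"])] * 10
total, rate = expected_complications(patients, caci)
```

For the ten patients above, the estimate is about nine hospital-acquired urinary tract infections, but the calculation says nothing about which nine patients they are.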
4.2 Diagnosis Morbidity
Diagnosis is one of the most important factors used to measure patient risk. The CareScience risk model is stratified primarily according to principal diagnosis at the three-digit level. Principal diagnosis alone is able to explain a great portion of risk variation across the patient population. When the principal diagnosis is the same, secondary diagnoses provide critical information to differentiate patient characteristics. Two clinical outcomes, Complications and Morbidity, are derived from secondary diagnoses. Among the independent variables, comorbidities and chronic conditions are likewise derived from secondary diagnoses. A five-level Likert scale was created to denote the severity or morbidity of each diagnosis.
The morbidity Likert scale levels range from 'A' to 'E' with 'A' reserved for conditions that are least severe. Secondary diagnoses in category 'A' possess minimal or no impact on patient risk. Typical diagnosis codes in this category include Headache (ICD9 Dx 7840), Backache NOS (ICD9 Dx 7245), and Diarrhea NOS (ICD9 Dx 78791). Categories 'B' and 'C' denote mild conditions that may impact patient risk and the course of treatment. These conditions may increase length of stay and cost of treatment. Most common secondary diagnoses fall into categories 'B' or 'C.' Diagnoses that are classified into categories 'D' and 'E' are truly severe and sometimes life-threatening conditions (e.g. Occlusion, cerebral artery NOS with infarction (ICD9 Dx 43491)). These conditions may substantially increase the probability of expiration and length of stay; patients with these conditions may require additional rescue treatment, and their costs of treatment may spike.
Morbidity designations are always assigned at the terminal digit level of a diagnosis code. For instance, uncomplicated type I DM (ICD9 Dx 25001) is classified into category 'B' while type I DM w/neuro (ICD9 Dx 25061) is considered more severe, thus designated into category 'C.'
The Diagnosis Morbidity Table was originally created to measure morbid complications (Morbidity: complications with morbidity designations of 'D' or 'E'). In 2003, the table was expanded to include all common secondary diagnoses at the terminal digit level. With this expansion, the comorbidity score was broken into five categories, corresponding to the morbidity Likert scale. This modification has substantially improved the model performance. There are currently 514 secondary diagnosis codes in the table that account for about 80% of all secondary diagnoses. The less common diagnosis codes are not dropped from analysis but instead grouped into category 'U,' which stands for 'Unspecified.' As the normal distribution of the morbidity assignments suggests, category 'U' diagnoses tend to share characteristics with the most common categories, 'B' and 'C.'
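The lookup with its 'U' fallback can be sketched as below. The table holds only the handful of codes whose categories are stated in this section; the real Diagnosis Morbidity Table has 514 entries, and the function names are hypothetical.

```python
from collections import Counter

# Excerpt only: codes and categories quoted in the text above.
DIAGNOSIS_MORBIDITY = {
    "7840": "A",   # Headache
    "7245": "A",   # Backache NOS
    "78791": "A",  # Diarrhea NOS
    "25001": "B",  # Uncomplicated type I DM
    "25061": "C",  # Type I DM w/neuro
}


def morbidity_category(dx_code):
    """Return the morbidity category, or 'U' for codes not in the table."""
    return DIAGNOSIS_MORBIDITY.get(dx_code, "U")


def count_by_category(secondary_dx_codes):
    """Count a patient's secondary diagnoses by morbidity category."""
    return Counter(morbidity_category(code) for code in secondary_dx_codes)
```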
4.3 Chronic Diseases and Disease History
A secondary diagnosis can either be a complication that developed after admission or a comorbid condition that existed before the patient was admitted. A few secondary diagnoses are considered "pure" comorbidities. These preexisting conditions can be identified by their complication probabilities of zero in the CACI table. These diagnoses form the basis of the risk model's chronic diseases list, which takes the form of an expanded disease-specific Chronic Condition Table. The expansion takes into account the volume of common chronic disease codes in each of the disease strata (ccms_crl_group_by). To be included, a chronic disease code must occur among at least 1% of patients in a given disease stratum. Less common codes still undergo CACI processing but are counted into one of the six comorbidity categories.
Because some chronic conditions are similar, they tend to have similar influence on patient risk assessment, e.g. 491.2x (obstructive chronic bronchitis), 493.2x (chronic obstructive asthma), and 496 (chronic airway obstruction, NEC). Chronic conditions of this kind are mapped to a common chronic condition code and share the same coefficients.
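The roll-up of similar chronic conditions can be illustrated with the three code families named above. The label used for the common code is a placeholder, since the text does not give the actual identifier, and the function name is hypothetical.

```python
# Placeholder label: the actual common chronic condition code is not given in the text.
COMMON_CODE = "CHRONIC_AIRWAY_OBSTRUCTION"


def common_chronic_code(dx_code):
    """Map 491.2x, 493.2x, and 496 to one shared code; pass other codes through."""
    code = dx_code.replace(".", "")
    if code.startswith("4912") or code.startswith("4932") or code == "496":
        return COMMON_CODE
    return code
```

Because all three families resolve to one variable, they share a single coefficient in the beta table.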
Diagnoses in the Chronic Condition Table are incorporated into the risk model as independent variables. The following example shows how the algorithm works:
If a secondary diagnosis is on the list of chronic conditions for a given disease stratum and thus included in the Chronic Condition Table, the secondary diagnosis will not undergo CACI processing. The diagnosis will not be counted in any of the six comorbidity categories. Instead, it is treated as a separate independent variable with its own corresponding risk coefficient, which can be found in the beta tables, assuming that it is statistically significant. If the corresponding coefficient cannot be found in the beta tables, the coefficient should be assigned a default value of zero. In that case, the chronic condition has no impact on the risk score, although it is still considered clinically relevant.
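The routing rule in this example can be written as a short sketch. The data structures are hypothetical: `chronic_table` maps a disease stratum to its set of chronic condition codes, and `beta_table` maps (stratum, code) pairs to coefficients.

```python
def route_secondary_diagnosis(dx_code, stratum, chronic_table, beta_table):
    """Decide how a secondary diagnosis enters the risk model.

    Returns ("chronic_variable", coefficient) when the code is a chronic
    condition for the stratum, otherwise ("caci_processing", None).
    """
    if dx_code in chronic_table.get(stratum, set()):
        # Not significant in calibration means no beta entry: default to zero.
        coefficient = beta_table.get((stratum, dx_code), 0.0)
        return "chronic_variable", coefficient
    # All other secondary diagnoses go through CACI and the comorbidity categories.
    return "caci_processing", None
```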
For the purposes of reporting, all comorbidities and chronic conditions should be included in the comorbidity count of CareScience Quality Manager front-end reports. The total number of comorbidities is calculated as the sum of CACI_Severity_Score_Cate_A to E and U plus the number of chronic conditions. For comorbidity counts by severity category, chronic conditions are mapped to the appropriate morbidity level in the Diagnosis_Morbidity table and are then added to the corresponding CACI_Severity_Score categories.
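The reporting counts can be sketched as follows. The input layout is hypothetical: `severity_scores` holds the six CACI severity scores keyed by category letter. Sending a chronic condition that is absent from the Diagnosis_Morbidity table to category 'U' is an assumption made for the sketch, not something the text specifies.

```python
def comorbidity_counts(severity_scores, chronic_dx_codes, diagnosis_morbidity):
    """Return (total comorbidity count, counts by severity category).

    severity_scores     : dict of category ('A'-'E', 'U') -> CACI severity score
    chronic_dx_codes    : chronic condition codes recorded for the patient
    diagnosis_morbidity : dict of diagnosis code -> morbidity category
    """
    by_category = dict(severity_scores)
    for code in chronic_dx_codes:
        # Assumption: unlisted chronic conditions are counted under 'U'.
        category = diagnosis_morbidity.get(code, "U")
        by_category[category] = by_category.get(category, 0.0) + 1
    total = sum(by_category.values())
    return total, by_category
```

The total equals the sum of the six severity scores plus the number of chronic conditions, as described above.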
Since the CACI and Chronic Condition Tables are reviewed separately by different clinical panels, discrepancies between the two can arise. Whenever one is updated, the other must be reviewed for consistency. If discrepancies exist, they are resolved before the update becomes effective.
4.4 Valid Procedure
The 'validity' of a procedure is defined by its value as a clinical proxy for patient characteristics and its statistical significance for the risk score. Validity concerns patient risk assessment only. An 'invalid' procedure has no influence on the risk score because it lacks either clinical proxy value or statistical significance. The method does not intend to judge whether a procedure is an appropriate treatment for a patient; an 'invalid' procedure is NOT necessarily an inappropriate treatment. To qualify for valid procedure candidacy for a given disease stratum, a procedure must satisfy a frequency criterion relative to that stratum. The minimum frequency is defined as:
0.05*N / EXP (LOG10(N) - 1),
where N is the number of cases in the disease stratum.
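The threshold can be evaluated directly. The sketch below transcribes the formula as written, reading EXP as the natural exponential and LOG10 as the base-10 logarithm, and it assumes the result is a minimum number of cases, which the text does not state explicitly. The function name is hypothetical.

```python
import math


def minimum_frequency(n_cases):
    """Evaluate 0.05*N / EXP(LOG10(N) - 1) for a stratum with N cases."""
    return 0.05 * n_cases / math.exp(math.log10(n_cases) - 1)
```

Under this reading, a stratum of 1,000 cases gives 50 / e^2, or roughly 6.8, and a stratum of 100,000 cases gives 5000 / e^4, or roughly 92, so the threshold grows more slowly than the stratum size.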
Procedures meeting this frequency requirement are then reviewed by the CareScience Clinical Expert Panel and classified into four categories. Category 1 consists of procedures that are unequivocally included in the model estimation. Most procedures in this category are considered defining procedures (e.g. ICD9 Px 361.x - Coronary Arterial Bypass Graft; see footnote 7) that are often administered upon admission. Category 2 includes procedures that are considered for inclusion depending on timing and combination with other procedures. ICD9 procedure 4443 (Gastric bleeding endoscopic control) is one such example. Use of this procedure within the first 48 hours strongly suggests it is related to the patient's condition upon admission. On the other hand, use of this procedure in the later stages of a hospital stay may indicate that it was necessitated by an earlier treatment. In the latter case, the procedure should not be included as a patient characteristic. Category 3 includes procedures whose model inclusion or exclusion remains unresolved because of disagreement in clinical review. Most procedures in this category are diagnostic and therapeutic procedures. Category 4 consists of procedures excluded as risk factors because of coding variation or features masking attributes of the care process.
Procedures are strong risk predictors, especially for length of stay, costs, and charges. Allowing more procedures to enter the model certainly increases model fit. As previously discussed, however, a higher R-Square does not necessarily result in better model performance for out-of-sample predictions. The number of procedures patients may receive can vary from zero to several dozen despite having the same principal diagnosis. Whether a patient receives a procedure is determined by the patient's clinical status, as well as the care provider's judgment. Adding procedures without considering their clinical implications may introduce bias by inadvertently including provider attributes that should be excluded from risk assessment. Moreover, coding practices vary across hospitals. Some hospitals do not regularly record diagnostic and therapeutic procedures while others do. Additionally, some hospitals record only the main procedures and leave out the linking minor codes. All of these considerations contribute to the controversy surrounding the use of procedures as risk factors.
The current Valid Procedure selection process is conservatively designed and aimed at avoiding data noise to the maximum extent. Consequently, the current Valid Procedure list is shorter than earlier versions with an emphasis on 'defining procedures' (Category 1) that are clinically relevant and seldom omitted from hospital data. These procedures are minimally controversial. Category 2 procedures are only included in the model if their timing criteria are met, reducing the ambiguity between patient attributes and care provider effects. The coding accuracy of these procedures, however, is less reliable than that of Category 1 procedures.
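The inclusion logic for the four categories might be sketched as follows. The 48-hour window comes from the procedure 4443 example and is used here as a default for every Category 2 procedure, which is an assumption because timing criteria may differ by procedure. Treating Category 3 as excluded is also an assumption, since the text describes its status as unresolved. The function name is hypothetical.

```python
def include_as_risk_factor(category, hours_after_admission=None, window_hours=48):
    """Decide whether a reviewed procedure enters the model as a patient characteristic."""
    if category == 1:
        # Defining procedures are always included.
        return True
    if category == 2:
        # Included only when performed early enough to reflect condition on admission.
        return (hours_after_admission is not None
                and hours_after_admission <= window_hours)
    # Categories 3 (unresolved, assumed excluded) and 4 (excluded).
    return False
```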
4.5 Other Clinical Elements and Considerations
4.5.1 Relative Value Unit
'Relative Value Unit' or RVU describes a procedure's intensity of resource usage. Thus, significant procedures such as coronary arterial bypass grafts (CABGs), which demand greater resources than minor procedures, possess greater RVUs. Although RVUs may seem to offer useful risk assessment information, they are not patient characteristics but instead reflections of the care provider's resource usage. For example, consider two AMI patients, one receiving a CABG (RVU 56.5) and the other receiving Mechanical Ventilation (RVU 1.89). Does the high RVU of the CABG procedure indicate that the condition of the first patient is worse than that of the second who is treated with a low RVU procedure? Although it could be argued that greater resources are only used when needed (i.e., the condition of the first patient must be more severe), the answer is probably "no." The high RVU of CABG only indicates that the hospital allocated more resources to the first patient. In terms of severity, the condition of the second patient is conceivably graver as suggested by the rescue procedure. For this reason, RVUs are not directly incorporated in the risk model. They are poor predictors of clinical outcomes, and their predicting power for financial outcomes is already largely captured by valid procedures.
4.5.2 "Do Not Resuscitate" Orders
'Do Not Resuscitate' (DNR) orders instruct hospital staff not to attempt life-saving procedures or treatments should a patient's condition turn critical. In most circumstances, once a DNR order is issued, the patient expires within a few days. In certain cases, however, the patient actually displays signs of returning to normal status after supports are withdrawn, in which case the DNR order is dismissed and supports are restored. A crucial study at Cooper Health System revealed that patients receiving DNR orders represent a large percentage of all expired patients. These findings suggest that consideration of DNR orders may significantly alter the overall picture of hospital expiration. More sophisticated analyses with data gathered from a range of hospitals may help improve the accuracy of the mortality model. This raises the following questions: 1) Does a common guideline for DNR orders exist across hospitals? 2) What are the implications of DNR orders issued at different stages of the hospital stay in terms of patient characteristics? 3) What is the interpretation of the DNR on/off switch from a clinical perspective? 4) What kind of role does a DNR order play in the conventional measure of mortality rates? Further study in this field is needed before DNR becomes a part of the CareScience risk assessment model.
6 The Delphi method, named for the famed Greek oracle, consists of several "rounds" of input and feedback. In the first round, participants are asked for their independent judgment. Once all responses are gathered, the mean response is posted. In the second round, participants are given the chance to change their response in light of the group's mean. The revised mean is posted to begin the third round, and so on.
7 All CABG procedures are rolled up into a three-digit code 361 in the model due to their similarities.