CareScience Risk Assessment Model - Hospital Performance Measurement
Appendix E—Technical Details about SAS Programming
1. Transforming Date-Time Format
SAS/Access software allows SAS to read data directly from an Oracle database. Most formats, except date-time, are preserved in the SAS data set. Oracle date-time formatting is recognized in the SAS environment; however, SAS automatically converts it into its own date-time format when Oracle data are read into SAS. This conversion affects numerous timing-related fields, including Treatment_or_Admit_Date, Inpatient_Disch_Date, and Diagnosis_or_Procedure_Date, as well as numeric fields derived from timing-related fields. Time_Trend, a field derived from Inpatient_Disch_Date, falls into the latter category. Because the SAS date-time format is used throughout the subsequent SAS data processing, care is taken to ensure the accuracy of variables whose format has been transformed.
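As a rough illustration of why this care is needed, the sketch below (in Python rather than SAS) reproduces the arithmetic behind SAS time values: a SAS date-time value counts seconds since 1 January 1960, whereas a SAS date value counts days since the same origin. The discharge timestamp and the function names are hypothetical, and the sketch does not show how Time_Trend itself is derived.

```python
from datetime import date, datetime

SAS_EPOCH = datetime(1960, 1, 1)  # origin of SAS date and date-time values
SECONDS_PER_DAY = 86400


def to_sas_datetime(ts):
    """Seconds elapsed between the SAS origin and the timestamp."""
    return int((ts - SAS_EPOCH).total_seconds())


def to_sas_date(d):
    """Whole days elapsed between the SAS origin and the calendar date."""
    return (d - SAS_EPOCH.date()).days


def datepart(sas_datetime):
    """Reduce a date-time value (seconds) to a date value (days)."""
    return sas_datetime // SECONDS_PER_DAY


# Hypothetical discharge timestamp, for illustration only.
discharge = datetime(2005, 3, 15, 14, 30)
sas_dt = to_sas_datetime(discharge)   # a value in seconds
sas_d = datepart(sas_dt)              # the same day expressed in days

print(sas_dt, sas_d)
```

A numeric field derived from a date-time value therefore differs by a factor of 86,400 from the same field derived from a date value, which is one way a derived variable can go wrong unnoticed.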
2. Transforming Categorical Variables
For some categorical variables, the transformation is simple and straightforward; for example, sex can easily be converted to a dummy variable with value '0' for all male patients and '1' for all female patients. Other categorical variables, such as admission source, may consist of multiple categories, but each visit has only one corresponding category. Such variables can be incorporated directly into the model statement with the 'class' option. They can also be transformed into dummy variables with 'if, then' clauses. For the purpose of manipulating the parameter estimates and the covariance matrix, the latter method is preferred.
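The dummy-variable approach can be pictured with a short Python sketch (not the SAS code used in production). The field names, the sex coding 'F'/'M', and the admission source categories are assumptions made for illustration; one category is left out as the reference level.

```python
def add_dummies(visit, admit_sources):
    """Return a copy of the visit with 0/1 dummy variables added.

    `admit_sources` lists the non-reference admission source categories;
    a visit in the reference category gets 0 for every admission dummy.
    """
    row = dict(visit)
    row["female"] = 1 if visit["sex"] == "F" else 0
    for source in admit_sources:
        row["admit_" + source] = 1 if visit["admission_source"] == source else 0
    return row


# Hypothetical categories; "REFERRAL" acts as the reference level here.
sources = ["ER", "TRANSFER"]
visit = {"visit_id": 101, "sex": "F", "admission_source": "TRANSFER"}
coded = add_dummies(visit, sources)
print(coded)
```

Because every dummy is an ordinary numeric column, each one has its own named parameter estimate and its own row and column in the covariance matrix.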
Other categorical variables, by contrast, may take several values for a single visit. For example, one visit may have three chronic conditions while another has none. For this kind of categorical variable, the 'class' option is invalid, and hand-written 'if, then' clauses are not technically feasible. SAS/Macro offers a solution to this issue. The details are elaborated in the Macro Function section.
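The target layout for such a multi-valued variable is one 0/1 indicator per possible value. A minimal Python sketch, assuming a hypothetical list of chronic conditions and hypothetical column names, is:

```python
def condition_indicators(visit_conditions, condition_list):
    """One 0/1 indicator per condition in `condition_list`."""
    present = set(visit_conditions)
    return {"cc_" + name: int(name in present) for name in condition_list}


# Hypothetical condition names, for illustration only.
all_conditions = ["diabetes", "chf", "copd", "renal"]

visit_a = condition_indicators(["diabetes", "chf", "renal"], all_conditions)
visit_b = condition_indicators([], all_conditions)
print(visit_a)
print(visit_b)
```

Unlike the single-category case, any number of indicators can equal 1 for the same visit, including none at all.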
3. Adding New Variables
When the risk model is recalibrated, new variables are often introduced. New variables are commonly obtained in one of three ways. First, some variables were not included in the previous model calibration but already exist in the database. For example, Race has long been available in client data but was incorporated into the model only after 2005. Second, new variables may not exist in the current data but can be derived from existing fields. Birth weight, which can be derived from existing diagnosis codes, is one such example. Third, new variables may be neither present in nor derivable from the current data but can be requested. In these instances, Research proposes changes to the Master Data Specification requirements that define what fields are collected from clients. Once sufficient data have accumulated, the new variable can be included in the model calibration.
4. Modifying Existing Variables
The introduction of new variables may require modification of existing variables. For example, the separate inclusion of chronic conditions as independent variables in the model necessitates the adjustment of the CACR Comorbidity Score.
Alternative Model Selection Options
1. Forward Selection
Forward selection begins with no variables in the model. For each of the independent variables, the forward selection process calculates an F statistic that reflects the variable's contribution to the model if it were included. The p-values for the F statistics are then compared to a user-specified critical value for entering the model. If no candidate variable has a p-value below the critical value (that is, none is significant at the entry level), the forward selection process stops. Otherwise, the process continues by adding the variable with the largest F statistic to the model. The iterative process continues until no remaining variable produces a significant F statistic. Once a variable is added to the model in the forward selection process, it remains in the model.
2. Backward Selection
Backward selection begins by calculating statistics for a saturated model that includes all independent variables. The variables are then deleted from the model one at a time until every variable remaining in the model produces an F statistic that is significant at the critical level specified for staying in the model. At each step, the variable showing the smallest contribution to the model is deleted. Given the number of potential variables in our models, backward selection is less efficient than forward selection, since only a handful of variables typically meet the significance requirement.
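The two procedures can be sketched in Python as follows. This is an illustration under stated assumptions, not the SAS implementation: it fits ordinary least squares with an intercept, and it compares each partial F statistic directly with a hypothetical threshold (f_to_enter, f_to_stay) instead of comparing p-values with a significance level, which avoids needing the F distribution. The data and all function names are made up for the example.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def partial_f(data, y, base, name):
    """F statistic for adding `name` to a model that already holds `base`."""
    reduced = rss([data[v] for v in base], y)
    full = rss([data[v] for v in base] + [data[name]], y)
    df_resid = len(y) - (len(base) + 2)  # intercept + base + tested variable
    return (reduced - full) / (full / df_resid)


def forward_select(data, y, f_to_enter=4.0):
    """Add the variable with the largest F until none passes the threshold."""
    selected = []
    while len(selected) < len(data):
        remaining = [v for v in data if v not in selected]
        scores = {v: partial_f(data, y, selected, v) for v in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < f_to_enter:
            break
        selected.append(best)  # once added, a variable is never removed
    return selected


def backward_select(data, y, f_to_stay=4.0):
    """Start from all variables and drop the weakest until all pass."""
    selected = list(data)
    while selected:
        scores = {v: partial_f(data, y, [u for u in selected if u != v], v)
                  for v in selected}
        worst = min(scores, key=scores.get)
        if scores[worst] >= f_to_stay:
            break
        selected.remove(worst)
    return selected


# Made-up data: y depends on x1; x2 is unrelated to y.
data = {"x1": [1, 2, 3, 4, 5, 6, 7, 8], "x2": [1, 0, 0, 1, 1, 0, 0, 1]}
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
print(forward_select(data, y), backward_select(data, y))
```

With many candidate variables and few survivors, the forward loop fits only small models, whereas the backward loop must begin by fitting the full model, which is the efficiency difference noted above.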
Macro Function
The SAS/Macro language employs two main devices: Macro variables and Macro processing. Macro variables enable SAS users to dynamically modify text in a SAS program through symbolic substitution. When the substitution expands to entire compiled SAS programs, the term 'Macro' is applied. When a Macro is called, the compiled programs are executed automatically. This feature is called Macro processing. The benefits of the Macro language are clear: it reduces the number of SAS statements in a program, decreases manual mistakes, and saves time for the SAS user. However, Macros do NOT reduce data processing time. The following example illustrates how the Macro language works.
Patients may be treated with multiple procedures. Depending on the patient's disease stratum, some procedures may qualify as valid procedure candidates and require dummy variables to be assigned to them. The conventional SAS command for assigning dummy variables is 'If ... then...; else... ;'. Because the valid procedure list is disease-specific and contains thousands of rows, and because a record may be mapped to dozens of procedure codes, the SAS command for assigning dummy variables must be repeated tens of thousands of times. The scope of this task makes it impossible to accomplish manually; the use of Macro programming statements, however, renders it feasible.
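The idea of symbolic substitution can be pictured outside SAS with a short Python sketch that fills one statement template once per procedure code. The procedure codes, the variable name proc_code, and the dummy variable naming are hypothetical; the sketch only shows how a single template replaces many hand-written statements and is not the Macro used in the calibration.

```python
# One template stands in for every hand-written 'if ... then ...; else ...;'.
TEMPLATE = "if proc_code = '{code}' then proc_{code} = 1; else proc_{code} = 0;"


def generate_statements(valid_codes):
    """Substitute each procedure code into the statement template."""
    return [TEMPLATE.format(code=code) for code in valid_codes]


# Hypothetical valid procedure codes for one disease stratum.
statements = generate_statements(["3601", "3722", "8154"])
for line in statements:
    print(line)
```

Extending the list of codes lengthens the generated program without any additional typing, which is where the reduction in manual mistakes comes from.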
More specifically, a series of Macro variables, corresponding to the 142 disease groups, is first created using the SAS command 'select ... into ... from ...'. Next, a Macro is created to repeat the data processing for the 142 disease strata. Within the Macro, the same SAS command 'select...into...from' is used to create a series of Macro variables corresponding to the Valid Procedures of a disease stratum. For each of the valid procedure Macro variables, a SAS program is executed that scans all procedure codes and picks out the patients who were treated with that procedure. The processing is repeated automatically until all valid procedure Macro variables have been processed. After all records in a disease stratum have been assigned valid procedure dummy variables, the Macro proceeds to the next disease stratum, and the valid procedure Macro variables are automatically rewritten to reflect the Valid Procedure candidates of that stratum. The processing continues until all 142 disease strata have passed through the Macro.
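The control flow described above amounts to two nested loops, one over disease strata and one over the valid procedures of the current stratum. The Python sketch below mirrors that flow on made-up data; the stratum names, procedure codes, and record layout are all hypothetical, and the real processing runs as a SAS Macro over 142 strata.

```python
def assign_procedure_dummies(records, valid_procedures):
    """Assign 0/1 valid-procedure dummies, one disease stratum at a time."""
    output = []
    for stratum, codes in valid_procedures.items():        # outer loop: strata
        rows = [{"id": r["id"], "stratum": stratum, "_procs": set(r["procedures"])}
                for r in records if r["stratum"] == stratum]
        for code in codes:                                  # inner loop: valid procedures
            for row in rows:                                # scan every record's codes
                row["proc_" + code] = int(code in row["_procs"])
        for row in rows:
            del row["_procs"]                               # drop the helper field
        output.extend(rows)
    return output


# Hypothetical valid procedure lists, keyed by disease stratum.
valid_procedures = {"stratum_A": ["3601", "3722"], "stratum_B": ["8154"]}
records = [
    {"id": 1, "stratum": "stratum_A", "procedures": ["3722", "9904"]},
    {"id": 2, "stratum": "stratum_A", "procedures": []},
    {"id": 3, "stratum": "stratum_B", "procedures": ["8154", "3601"]},
]
result = assign_procedure_dummies(records, valid_procedures)
for row in result:
    print(row)
```

Record 3 carries code 3601 but receives no dummy for it, because that code is valid only in the other stratum; this is the disease-specific behavior the Macro achieves by rewriting the valid procedure Macro variables for each stratum.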
The Macro function greatly simplifies the SAS statements needed for data processing. However, Macro logic is far more complicated than plain SAS statements, especially when Macro variables or multiple Macros are assembled into one global Macro. For that reason, it is always recommended to test Macros independently before implementing them at full scale in a model calibration.