Appendix 1: Validity and Systematic Performance Misclassification
Methodological Considerations in Generating Provider Performance Score
A. What is validity?
The validity of a provider performance report is the extent to which the performance information contained in the report means what it is supposed to mean (rather than meaning something else).50-51 What a provider performance report is "supposed to mean" may depend on the purpose of reporting. But generally speaking, a report of provider performance is supposed to indicate something about providers (or something under providers' control), rather than something not inherently about providers or not under providers' control.xi
Put another way, a performance report will have high validity when its quality results truly represent the quality of care that a provider delivers. Similarly, a report with high validity will show efficiency results that truly represent provider efficiency, and so on. Reports would have low validity if they claimed to represent the quality of care delivered by a provider but instead truly represented the availability of parking in the provider's vicinity.
As a first step toward creating valid performance reports, CVEs should select performance measures that have "construct validity," which means that, under ideal circumstances, the measures actually represent what they are supposed to represent. For example, consider a hypothetical quality measure that counts the number of times drug X is given to patients with diabetes. For this measure to truly represent the quality of care, drug X must produce a health benefit for patients with diabetes. If drug X improves the health of patients with diabetes, then the measure has construct validity. If drug X instead produces no health benefit (or even causes harm) for these patients, then the measure does not have construct validity.
Having performance measures with construct validity should be considered a bare minimum requirement for performance reporting. However, even when measures have construct validity, they can still be used to produce performance reports that have low validity. The following sections on systematic performance misclassification explain how even "valid measures" can lead to invalid reports of provider performance.
B. Systematic performance misclassification: a threat to validity
One way for a report of provider performance to have low validity is for the report to systematically misclassify provider performance. Systematic performance misclassification happens when the performance being reported is actually determined, to a significant degree, by something other than the performance that the report is supposed to present. To see how this can happen, consider the following scenario. Imagine that a CVE is reporting the performance of two hospitals on a measure of patient mortality and that the hospitals are identical in every way, except for one thing. One hospital serves a much older population than the other. Such a report of mortality is supposed to indicate which hospital is truly doing a better job at keeping its patients alive. However, because one hospital has an older patient population and older patients have higher average mortality than younger patients, the report will instead indicate which hospital has a younger patient population (rather than which hospital is truly better). The report will therefore have low validity, even if there is no random measurement error.xii
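The scenario above can be made concrete with a small calculation. The sketch below assumes hypothetical age-specific mortality rates (2% for patients under 65, 8% for patients 65 and older) that are identical at both hospitals, so the two hospitals are equally good at keeping patients of any given age alive; only the age mix differs. The names `AGE_SPECIFIC_MORTALITY`, `PATIENTS`, and `expected_crude_mortality` are illustrative and are not part of any CVE tool.

```python
# Hypothetical age-specific mortality rates, identical at both hospitals
# (the hospitals are "identical in every way" except patient age mix).
AGE_SPECIFIC_MORTALITY = {"under_65": 0.02, "65_plus": 0.08}

# Hypothetical patient counts by age group at each hospital.
PATIENTS = {
    "Hospital A (younger)": {"under_65": 800, "65_plus": 200},
    "Hospital B (older)": {"under_65": 200, "65_plus": 800},
}


def expected_crude_mortality(age_counts, rates):
    """Expected overall (crude, unadjusted) mortality rate for one hospital."""
    deaths = sum(n * rates[group] for group, n in age_counts.items())
    return deaths / sum(age_counts.values())


for hospital, counts in PATIENTS.items():
    rate = expected_crude_mortality(counts, AGE_SPECIFIC_MORTALITY)
    print(f"{hospital}: {rate:.1%}")
```

With these assumed numbers the script prints 3.2% for Hospital A and 6.8% for Hospital B, so an unadjusted report would rank Hospital A ahead of Hospital B even though neither hospital does better for patients of the same age.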
In the real world, two hospitals are unlikely to be alike in every way, and measurement error will be present. A hospital serving older patients could still outperform a hospital serving younger patients on a mortality measure, either through extraordinary efforts or by chance. But on average, we would expect hospitals serving younger patients to outperform those serving older patients on mortality measures. Therefore, on average, the mortality measure will still reflect the age of each hospital's patient population rather than how good each hospital is at keeping its patients alive. In other words, the hospitals will be systematically misclassified on the mortality measure.
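A small Monte Carlo simulation can illustrate the interplay between chance and systematic misclassification described here, and why collecting more patients does not remove the problem. The sketch below reuses hypothetical age-specific rates for two equally good hospitals, one with 20% and one with 80% of its patients aged 65 and older, and counts how often the older-serving hospital shows the lower observed mortality. The function names, parameter values, and random seed are assumptions chosen for illustration.

```python
import random

# Hypothetical age-specific mortality rates, identical at both hospitals.
RATES = {"under_65": 0.02, "65_plus": 0.08}


def simulate_crude_rate(n_patients, share_65_plus, rng):
    """Simulate one hospital's observed (unadjusted) mortality rate."""
    deaths = 0
    for _ in range(n_patients):
        group = "65_plus" if rng.random() < share_65_plus else "under_65"
        if rng.random() < RATES[group]:
            deaths += 1
    return deaths / n_patients


def share_older_hospital_looks_better(n_patients, trials, rng):
    """Fraction of simulated reports in which the older-serving hospital
    has strictly lower observed mortality than the younger-serving one."""
    wins = 0
    for _ in range(trials):
        younger = simulate_crude_rate(n_patients, 0.2, rng)
        older = simulate_crude_rate(n_patients, 0.8, rng)
        if older < younger:
            wins += 1
    return wins / trials


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
for n in (50, 500, 2000):
    share = share_older_hospital_looks_better(n, trials=100, rng=rng)
    print(f"{n:>5} patients per hospital: older-serving hospital "
          f"looks better in {share:.0%} of reports")
```

With small samples the older-serving hospital occasionally comes out ahead by chance; as the number of patients grows, this happens less and less often. A larger sample therefore makes the misclassification more consistent rather than correcting it.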
C. Causes of systematic performance misclassification
In this section, we present three major causes of systematic performance misclassification that are addressed in this report. These causes are statistical bias, selection bias, and information bias.
Statistical bias. When systematic performance misclassification is present because of differences in the patient populations served by different providers, the performance report contains "statistical bias" (also known as "omitted variable bias"). Two major techniques to address the problem of statistical bias in performance reports are case mix adjustment and stratification. Case mix adjustment uses statistical models to remove associations between patient characteristics and reported performance. For example, in a report that is case mix adjusted for patient age, there will be no association between patient age and reported provider performance (in other words, no providers will be "penalized" for having younger or older patients). Case mix adjustment is especially desirable when stakeholders feel that the patient characteristic in question is a cause of lower or higher measured performance.
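One common way to carry out case mix adjustment is indirect standardization: each provider's observed count is divided by the count expected if its own patients had experienced reference (for example, all-provider) rates for their age group, giving an observed-to-expected (O/E) ratio. The sketch below is a minimal illustration using hypothetical data; real reports typically derive expected values from a multivariable risk model (such as logistic regression) rather than a single age split, and a more detailed treatment of these techniques appears in the section on Task Number 5. `REFERENCE_RATES` and `observed_to_expected` are illustrative names.

```python
# Hypothetical reference (e.g., all-hospital) mortality rates by age group.
REFERENCE_RATES = {"under_65": 0.02, "65_plus": 0.08}


def observed_to_expected(observed_deaths, age_counts, reference_rates):
    """Indirectly standardized O/E ratio: observed deaths divided by the
    deaths expected if this hospital's patients died at the reference
    age-specific rates."""
    expected = sum(n * reference_rates[g] for g, n in age_counts.items())
    return observed_deaths / expected


# name: (observed deaths, patients by age group); hypothetical data
hospitals = {
    "Hospital A (younger)": (32, {"under_65": 800, "65_plus": 200}),
    "Hospital B (older)": (68, {"under_65": 200, "65_plus": 800}),
}

for name, (deaths, counts) in hospitals.items():
    crude = deaths / sum(counts.values())
    oe = observed_to_expected(deaths, counts, REFERENCE_RATES)
    print(f"{name}: crude {crude:.1%}, O/E {oe:.2f}")
```

Both hospitals receive an O/E ratio of 1.00 despite crude rates of 3.2% and 6.8%. In this example, the adjusted ratio no longer depends on patient age, so neither hospital is "penalized" for the age of its patients.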
Stratification, which means reporting separate results for different groups of patients (e.g., younger and older patients), can accomplish the same goals as case mix adjustment in some cases. A more detailed overview of these techniques is presented in the section on Task Number 5. We emphasize case mix adjustment here to make three key points:
- Statistical bias that causes systematic performance misclassification cannot be solved by adding more observations or specifying minimum sample sizes. The problem of systematic performance misclassification is methodologically distinct from the problem of performance misclassification due to chance (which is discussed in Appendix 2).
- Statistical bias can only be detected when the factor that is causing the bias (e.g., different patient age distributions) can be identified and measured by the CVE. In the example of hospital mortality rates, there would be no way to know whether statistical bias and systematic performance misclassification are present without first knowing the ages of the patients who receive care from each hospital.
- Even when statistical bias may be present, whether and how to account for it in performance reports is a value judgment. It depends on the nature of the performance measure in question, the story behind the statistical bias (i.e., the most likely reasons the bias is present), the purposes of public reporting, and the results of negotiations between CVE stakeholders.
Selection bias. When the patients for whom performance data are available are not representative of the patients who will be using a performance report, the validity of the report may be threatened by "selection bias." For example, a performance report may be based only on the care provided to patients in commercial health plans. If providers' care for commercial enrollees systematically differs from the care for other patient populations (e.g., Medicare or Medicaid), then from the perspective of a noncommercial enrollee, the performance report may systematically misclassify provider performance. In other words, a patient enrolled in Medicaid may believe that a given provider has average performance when the provider actually has low (or high) performance for Medicaid enrollees. CVEs can address this threat to validity by gathering performance data from a wide variety of patients (discussed in the section on Task Number 2) and by creating "stratified" performance reports that show different performance scores for different patient populations (discussed in the section on Task Number 5).
Information bias. When certain providers underreport performance data (a particular concern when these data would indicate low performance), the validity of a performance report is threatened by "information bias." If providers with low performance tend to have more missing data than other providers, the report may systematically misclassify low-performing providers as having "observed" performance that is higher than their "true" performance. This threat to validity, and potential ways of addressing it, is discussed in the section on Task Number 4.
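The effect of underreporting on observed performance can be shown with simple arithmetic. The sketch assumes, hypothetically, that every case in which a provider met the measure is reported, while only a fraction of the "not met" cases are submitted and the rest are missing. `observed_rate` and all numbers used are illustrative only.

```python
def observed_rate(n_met, n_not_met, share_of_failures_reported):
    """Observed share of reported cases that met the measure.

    Hypothetical assumption: every 'met' case is reported, but only
    `share_of_failures_reported` of the 'not met' cases are reported;
    the rest are missing from the data.
    """
    reported_not_met = n_not_met * share_of_failures_reported
    return n_met / (n_met + reported_not_met)


# Hypothetical providers, 100 eligible patients each.
low_true = observed_rate(60, 40, 1.0)       # true rate 60%, complete data
low_observed = observed_rate(60, 40, 0.5)   # same provider, half of failures missing
high_observed = observed_rate(70, 30, 1.0)  # true rate 70%, complete data

print(f"Low performer, complete data:   {low_true:.0%}")
print(f"Low performer, incomplete data: {low_observed:.0%}")
print(f"High performer, complete data:  {high_observed:.0%}")
```

Under these assumptions, a provider whose true rate is 60% but who reports only half of its failures shows an observed rate of 75%, which places it above a provider with a true rate of 70% and complete reporting.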
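Stratification, mentioned in this appendix as a response to both statistical bias and selection bias, amounts to computing the measure separately within each patient group. The sketch below uses hypothetical patient-level records for a binary quality measure stratified by payer. The record layout (provider, stratum, whether the measure was met) and the helper names are assumptions for illustration.

```python
from collections import defaultdict


def make_records(provider, stratum, n_met, n_total):
    """Build hypothetical patient records: (provider, stratum, measure_met)."""
    return [(provider, stratum, i < n_met) for i in range(n_total)]


records = (
    make_records("Provider 1", "Commercial", 3, 4)
    + make_records("Provider 1", "Medicaid", 1, 4)
    + make_records("Provider 2", "Commercial", 2, 4)
    + make_records("Provider 2", "Medicaid", 2, 4)
)


def pooled_rates(records):
    """Share of patients meeting the measure, ignoring strata."""
    met, total = defaultdict(int), defaultdict(int)
    for provider, _stratum, measure_met in records:
        total[provider] += 1
        met[provider] += int(measure_met)
    return {p: met[p] / total[p] for p in total}


def stratified_rates(records):
    """Share of patients meeting the measure within each provider-stratum cell."""
    met, total = defaultdict(int), defaultdict(int)
    for provider, stratum, measure_met in records:
        total[(provider, stratum)] += 1
        met[(provider, stratum)] += int(measure_met)
    report = defaultdict(dict)
    for (provider, stratum), n in total.items():
        report[provider][stratum] = met[(provider, stratum)] / n
    return dict(report)


print(pooled_rates(records))
print(stratified_rates(records))
```

Here both providers have a pooled rate of 50%, but the stratified report shows Provider 1 at 25% for Medicaid enrollees versus 50% for Provider 2. A pooled report would hide this difference from a Medicaid enrollee.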