Summary of Methodological Decisions Made by a Sample of CVE Stakeholders
Methodological Considerations in Generating Provider Performance Scores
During the spring of 2010, we interviewed the leaders of nine Chartered Value Exchanges (CVEs) and their stakeholder organizations to get a sense of how their collaboratives were approaching the decision points discussed in this paper. In this section, we present a synthesis of these responses. Because the interviews were qualitative in nature and the sample of CVEs was small, we refrain from presenting counts of each type of response. The aim of this section is to convey the range of methodological choices and, in illustrative cases, provide examples of the rationale behind some of these choices.
What are the purposes of publicly reporting provider performance?
In general, while public reporting efforts may have begun with the primary goal of helping patients choose providers, providers themselves turned out to be the primary audience for public performance reports. For example, access logs for Internet-based reports revealed that the overwhelming majority of individuals reviewing the reports were located in physician offices or hospitals.
The reports were felt to motivate providers to improve, and some reporting organizations also engaged in efforts to assist providers' improvement efforts. In addition, reports were often intended to help patients become partners in producing high-quality health care. For example, a report of diabetes performance could be used to educate patients about their care.
What will be the general format of performance reports?
CVEs and their stakeholder organizations adopted a variety of reporting formats, ranging from simplified reports that used symbols to indicate categories of provider performance to more complex numeric displays of performance data. In some cases, Internet-based reports offered both simple and complex formats, with numeric results available by selecting a provider's performance symbol.
Often the distinction between reports of relative performance and absolute performance was blurred. For example, a report might contain absolute performance percentages (e.g., the percentage of patients who received a necessary medical service) and arrange the providers in rank order based on these percentages. Thus, the report became, in effect, a table of relative performance in which some providers were, by necessity, at the bottom and others at the top.
What will be the acceptable level of performance misclassification due to chance?
CVE leaders tended to view the acceptable risk of performance misclassification as being subject to ongoing conversation and negotiation, partly because misclassification risk is a relatively new concept in the field of health care provider performance reporting. However, there was consensus that providers, patients, and CVE leaders all wanted valid and reliable performance reports (i.e., reports that displayed performance data that were close to providers' "true" performance).
Because "misclassification risk" can be a foreign concept to many CVE stakeholders, it was suggested that CVEs could improve participation and engage stakeholders in fruitful discussion by debating more concrete questions, such as "What constitutes a fair minimum sample size?" Fundamentally, these concrete questions matter because they influence the risk of performance misclassification.
Which measures will be included in a performance report?
In general, CVEs reported measures of the technical quality of care. Less commonly, CVEs reported measures of patients' health care experiences, and very few CVEs planned to report measures of cost or efficiency of care in the near future.
Leaders of CVEs or CVE stakeholder organizations with several years of performance reporting experience tended to describe formalized measure selection processes. These frequently involved committees of providers, purchasers, patient advocates, academics, and other interested parties. Measure selection processes were designed to identify measures that were aligned with local and national priorities, that would be plausibly valid and reliable in the provider population to be measured, and that had already been developed. In some cases, older CVEs or CVE stakeholder organizations developed their own performance measures when no existing measures were available to address key local priorities for performance improvement.
Newer CVEs gravitated toward performance measures that were in common use across the country and that were already familiar to local stakeholders, often because individual health plans already had begun to give providers feedback on these measures. Measures from the Healthcare Effectiveness Data and Information Set (HEDIS) of the National Committee for Quality Assurance (NCQA)—or measures that are designed to capture similar aspects of provider performance—were commonly identified as initial priorities for public reporting.
How will performance measures be specified?
The leaders of CVEs and CVE stakeholder organizations reported a strong desire to use nationally endorsed measure specifications whenever these were available. Leaders found nationally endorsed specifications advantageous because they allowed comparison with national performance benchmarks and because national endorsement facilitated stakeholder buy-in. CVE leaders generally were not eager to try to "improve" nationally endorsed measure specifications. When improvements were deemed necessary, some leaders indicated that their preferred strategy would be to present their suggestions for improvement to national bodies, with the aim of changing the nationally endorsed measure.
However, it was not always possible to use nationally endorsed specifications when constructing performance measures. For example, nationally endorsed specifications might be designed for application to health plan claims data, but the performance data available to a CVE might come from provider registries. This type of "data source mismatch" often necessitated modifications to the nationally endorsed specifications so that similar performance measures could be constructed from locally available data. When modifications were made, CVE leaders emphasized the need to explain to stakeholders that comparisons with national benchmarks probably would not be valid.
What patient populations will be included?
CVE leaders generally sought to include as many patients as possible within their communities, subject to the limitations imposed by the sources of performance data. When health care providers supplied the performance data, such as data from registries or medical records, all patient populations receiving care from the providers could be included. However, when health plan claims were used, only the patients enrolled in the participating plans could be included. In some cases, this meant that only patients with commercial health insurance (and in some cases Medicaid) were included in performance reports. CVE stakeholders almost unanimously expressed a strong desire for Medicare fee-for-service claims data so that performance reports would reflect the care delivered to the Medicare population.
What kinds of data sources will be included?
CVEs and their stakeholder organizations reported using a variety of data sources as the basis for constructing performance measures. These sources included health plan claims, data from provider registries or medical records, patient survey data, and "prescored" data such as Leapfrog measures of hospital safety. When using health plan claims, CVEs usually contracted with an experienced claims analysis firm (when "raw" claims data were obtained directly from health plans) or used a distributed data model in which each health plan processed its own raw claims according to the measure specifications the CVE supplied.
How will data sources be combined?
CVEs and stakeholder organizations with more extensive public reporting experience heavily emphasized the importance of having a complete and accurate provider directory when combining sources of performance data (especially for ambulatory care, since even a relatively small locality can have many ambulatory providers). Such a directory was felt to be the best way to create a "crosswalk" between data sources, since each source might have its own identifier for the same provider. However, the leaders of these experienced CVEs noted that creating an accurate provider directory required the investment of substantial staff time and financial resources over multiple years. In addition, once the directory was accurate, maintaining its accuracy required significant ongoing investment.
Establishing good relationships with the provider community was mentioned as a key ingredient for successfully building such a directory. But even CVEs with greater reporting experience using directories of ambulatory providers faced reporting limitations. For example, these directories tended not to include providers in practices below a certain size threshold (e.g., below two to four physicians in a single clinic).
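The "crosswalk" role a provider directory plays can be sketched in a few lines. In this illustrative example (all names, sources, and identifiers are hypothetical, not drawn from any actual CVE), a master directory holds one canonical record per provider, and a crosswalk maps each data source's own identifier to that canonical record so performance data from different sources can be pooled:

```python
# Minimal sketch of a provider directory "crosswalk": each data source uses
# its own identifier for the same provider, and the directory maps every
# source-specific ID to one canonical record. All IDs are hypothetical.

# Master directory: canonical provider ID -> descriptive record.
directory = {
    "P001": {"name": "Dr. A. Rivera", "clinic": "Northside Family Medicine"},
    "P002": {"name": "Dr. B. Chen", "clinic": "Lakeview Internal Medicine"},
}

# Crosswalk: (data source, source-specific ID) -> canonical provider ID.
crosswalk = {
    ("plan_x_claims", "NPI-1234567890"): "P001",
    ("plan_y_claims", "PRV-88421"): "P001",  # same provider, different ID
    ("registry", "REG-0007"): "P002",
}

def resolve(source: str, source_id: str):
    """Map a source-specific provider ID to the canonical ID, if known."""
    return crosswalk.get((source, source_id))

# Records from two claims sources resolve to the same provider,
# so their performance data can be combined.
assert resolve("plan_x_claims", "NPI-1234567890") == resolve("plan_y_claims", "PRV-88421")
```

The sketch also makes the maintenance burden concrete: every new data source, renamed clinic, or provider who changes practices requires updating the crosswalk, which is why the experienced CVEs described ongoing investment.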
How frequently will data be updated?
CVE leaders generally described updating the performance data in their public reports every 1 to 2 years. However, nearly all expressed a desire to both increase the frequency of data updates and decrease the lag between the time clinical care is delivered and the time of performance reporting based on that care. Some expressed the hope that electronic health records would allow "real-time" data collection, making both goals achievable.
How will tests for missing data be performed?
CVE leaders' approach to missing data depended on the data source. For data obtained directly from providers, some CVEs used auditing procedures that examined a sample of provider records to ensure completeness. For health plan claims data, some CVEs contracted with experienced claims analysis firms and verified that these firms performed tests for missing data. In addition, some CVE leaders emphasized the importance of knowing the data source. For example, if a certain health plan is known to have capitation products for which no fee-for-service claims are generated, claims data for patients enrolled in these capitation products will be "missing" from the standpoint of performance measure construction.
How will missing data be handled?
The approach of CVEs and stakeholder organizations to missing data was generally to first attempt to recover as much missing data as possible by working with data sources. After this step, CVEs reported provider performance based on the available data. Performance data were not imputed (i.e., statistically estimated), primarily because imputation was thought to be unacceptable to CVE stakeholders, especially providers. When a provider was known to be providing patient care in a CVE's community but no performance data for that provider could be reported, CVE reports generally displayed a symbol indicating that performance could not be reported due to a lack of sufficient performance data.
How will accuracy of data interpretation be assessed?
CVEs' general approach to ensuring accuracy of data interpretation was similar to the approach to identifying and handling missing data. However, in addition to working with experienced data analysts and knowing their data suppliers, some CVE leaders pointed out the importance of calculating community-level performance scores as an initial "reality check" on the accuracy of data interpretation.
How will performance data be attributed to providers?
Attribution rules varied from CVE to CVE and from performance measure to performance measure (even within the same CVE). For example, the attribution rules applied to screening measures might differ from the rules applied to measures of chronic disease care. Attribution to organizations (e.g., hospitals) followed national guidelines when these were available, but there was more heterogeneity in attribution strategies for ambulatory providers (including individual practitioners). In general, CVEs and CVE stakeholder organizations used plurality-based algorithms (e.g., majority of visits) or minimum-visit thresholds (e.g., at least one visit for a certain condition within the measurement year) to attribute ambulatory care measures to providers.
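A plurality-based attribution rule of the kind described above can be sketched briefly. This is a generic illustration under assumed rules (the visit data, the plurality tie handling, and the minimum-visit threshold are all hypothetical, not any particular CVE's specification):

```python
# Hedged sketch of plurality-of-visits attribution: a patient's measure
# result is attributed to the provider who delivered the most qualifying
# visits during the measurement year, subject to a minimum-visit threshold.
from collections import Counter

def attribute_patient(visits, min_visits=1):
    """Return the provider with the plurality of this patient's visits,
    or None if no provider reaches the minimum-visit threshold."""
    if not visits:
        return None
    counts = Counter(visits)
    provider, n = counts.most_common(1)[0]  # provider with the most visits
    return provider if n >= min_visits else None

# A patient with three visits to provider A and one to provider B
# is attributed to A.
assert attribute_patient(["A", "A", "B", "A"]) == "A"
# A patient with no qualifying visits is attributed to no one.
assert attribute_patient([]) is None
```

In practice the `visits` list would itself be filtered by the measure's rules (e.g., only visits for the relevant condition within the measurement year), which is one reason attribution rules varied from measure to measure.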
Will case mix adjustment be performed? (If so, how?)
When nationally endorsed measure specifications incorporate methods for case mix adjustment (e.g., measures of mortality rates for hospitals), CVEs and CVE stakeholder organizations generally applied these nationally endorsed case mix adjustment methods. However, when nationally endorsed case mix adjustment methods were not available (which was the case for most measures reported by CVEs), the leaders of CVEs and CVE stakeholder organizations reported that they did not apply new case mix adjustment methods in creating performance reports. For example, no CVE leader that we interviewed performed case mix adjustment of process measures of the quality of care. When there was concern that certain patient populations might be more "challenging" than others, some CVEs reported stratified results instead of results that were case mix adjusted. The rationale for using stratification rather than case mix adjustment was that adjustment would "hide" undesirable disparities in care, while stratification would allow fair comparisons between providers without hiding disparities.
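The rationale for stratification over adjustment can be made concrete with a small numeric example (the rates and the stratifying characteristic below are hypothetical, chosen only to illustrate the argument):

```python
# Illustrative sketch: stratified reporting shows each subgroup's rate
# directly, while a single pooled (or adjusted) rate can hide a disparity
# between subgroups. All numbers are hypothetical.

def rate(met, eligible):
    return met / eligible

# Hypothetical process-measure results for one provider, stratified by a
# patient characteristic of concern (e.g., insurance type).
strata = {
    "commercial": {"met": 90, "eligible": 100},  # 90% received the service
    "medicaid":   {"met": 60, "eligible": 100},  # 60% received the service
}

# Stratified report: one rate per subgroup; the 30-point gap stays visible.
stratified = {name: rate(s["met"], s["eligible"]) for name, s in strata.items()}

# A single pooled rate collapses the gap into one number.
overall = rate(sum(s["met"] for s in strata.values()),
               sum(s["eligible"] for s in strata.values()))

assert abs(stratified["commercial"] - stratified["medicaid"] - 0.30) < 1e-9
assert overall == 0.75  # the disparity is no longer visible
```

The pooled 75 percent score looks like a single fact about the provider, while the stratified view preserves the comparison stakeholders said they did not want hidden.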
What strategies will be used to limit the risk of misclassification due to chance?
CVEs and CVE stakeholder organizations used a wide variety of strategies to limit the risk of performance misclassification due to chance. These included:
- Basing performance thresholds on tests of statistical significance. When this option was chosen, CVEs generally used statistical significance thresholds to err on the side of classifying provider performance as "average." This approach limited the probability of misclassifying a truly average provider as above or below average to no more than 5 percent. But in some cases, statistical confidence intervals were used to always give providers the benefit of the doubt. Each provider's performance was classified in the highest category that overlapped the provider's 95 percent confidence interval for the measure in question.
- Using a "zone of uncertainty." Because the risk of performance misclassification rises as provider performance gets close to a classification threshold, some CVEs gave providers the benefit of the doubt whenever their performance was within a "zone of uncertainty" around each threshold. In other words, performance was reported as being above threshold for all providers whose performance was within the zone of uncertainty.
- Using a minimum reliability criterion. Some CVEs limited the risk of misclassification due to chance by setting a minimum reliability for performance reporting (generally using a minimum reliability of 0.7 when this strategy was chosen). For this strategy, CVEs calculated reliability on a measure-by-measure and provider-by-provider basis, excluding from public reporting measures and providers that did not meet the minimum reliability standard.
- Using a minimum number of observations (a "minimum N"). Instead of using a minimum reliability criterion, some CVEs used a minimum N criterion. In deciding the right minimum number of observations, some CVEs looked for guidance from other reporting collaboratives and negotiated with their stakeholders. Other CVEs took a more mathematical approach, calculating for each performance measure the number of observations needed to achieve a minimum level of reliability (or limit the risk of misclassification to a certain level). CVEs taking the more mathematical approach found that (1) the minimum necessary number of observations could vary by measure, and (2) the minimum number of observations could be far greater than the minimum numbers used by other performance reporting collaboratives that had not taken a mathematical approach.
- Reporting performance at higher levels of provider organization. Most CVEs reported the performance of provider organizations (including ambulatory clinics) rather than individual practitioners. But some did express the goal of eventually finding ways to report individual practitioners' performance in ways that would not introduce too much risk of performance misclassification due to chance.
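The "more mathematical approach" to a minimum N can be sketched under a common reliability model: reliability is between-provider (signal) variance divided by the sum of between-provider variance and within-provider (noise) variance per observation, R = s²_b / (s²_b + s²_w / n). Solving for n gives the smallest sample that reaches a target reliability. The variance values below are hypothetical, chosen only to show why the resulting minimum N can far exceed conventional thresholds:

```python
# Sketch of deriving a minimum N from a target reliability (e.g., 0.7),
# under the signal-to-(signal + noise) reliability model. Variance inputs
# are hypothetical; a real calculation would estimate them from the data.
import math

def reliability(sigma2_between, sigma2_within, n):
    """Reliability of a provider's mean score based on n observations."""
    return sigma2_between / (sigma2_between + sigma2_within / n)

def min_n(sigma2_between, sigma2_within, target=0.7):
    """Smallest n achieving the target reliability:
    n >= (target / (1 - target)) * (sigma2_within / sigma2_between)."""
    return math.ceil((target / (1 - target)) * (sigma2_within / sigma2_between))

# Example: when between-provider variation is modest relative to
# patient-level noise, the required N can be in the hundreds -- far above
# minimums negotiated without a reliability calculation.
n = min_n(sigma2_between=0.004, sigma2_within=0.24, target=0.7)
assert reliability(0.004, 0.24, n) >= 0.7
```

Because the variance components differ measure by measure, so does the minimum N, which matches the interviewees' first observation; and when the signal variance is small, the derived N dwarfs rule-of-thumb minimums, matching their second.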
Will composite measures be used?
Some CVEs and CVE stakeholder organizations reported provider performance on composite measures. When they were reported, composite measures were generally based on single health conditions (e.g., diabetes care, heart attack care) and used an approach based on taking the weighted average of the individual measures when calculating the composite measure score. However, some CVEs used an "all-or-none" approach to combining individual measure scores. CVE leaders observed two advantages of all-or-none composites:
- The range of between-provider variation on "all-or-none" composites was higher than the range on individual measures.
- The "all-or-none" approach was thought to be relatively easy to explain to stakeholders, including both patients and providers.
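The two composite approaches can be contrasted with a small sketch. The per-patient results below are hypothetical, and the "weighted average" is shown in one common form (opportunity-weighted: passes divided by total opportunities); the source does not specify which weighting the CVEs used:

```python
# Sketch of two composite approaches for a single-condition (e.g., diabetes)
# measure set. Each inner list is one patient's pass/fail results on the
# component measures. All data are hypothetical.

patients = [
    [True, True, True],    # passed all three component measures
    [True, True, False],
    [True, False, False],
    [True, True, True],
]

def weighted_average_composite(results):
    """Opportunity-weighted average: passes / total opportunities."""
    passes = sum(sum(p) for p in results)
    opportunities = sum(len(p) for p in results)
    return passes / opportunities

def all_or_none_composite(results):
    """Fraction of patients who passed EVERY component measure."""
    return sum(all(p) for p in results) / len(results)

avg = weighted_average_composite(patients)  # 9 of 12 opportunities = 0.75
aon = all_or_none_composite(patients)       # 2 of 4 patients = 0.50
assert aon <= avg  # the all-or-none score is never higher
```

Because a single missed component drops a patient's all-or-none result to zero, the all-or-none score sits lower and spreads providers out more, which is the wider between-provider variation the CVE leaders cited.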
What final validity checks might improve the accuracy and acceptance of performance reports?
In general, CVEs and CVE stakeholder organizations described making final validity checks with the assistance of health care providers. These validity checks included giving providers a confidential preview of their performance results, which, in some cases, included patient-by-patient performance data. With these previews, providers could correct or appeal their performance results in some cases or, in other cases, opt out of a single round of public reporting to either correct problems with their data or improve their performance.