Pay for Performance: A Decision Guide for Purchasers

Phase 3. Implementation

Questions 14-18 need to be addressed in the implementation of a P4P program:

Question 14. How do we address providers’ concerns about whether risk adjustment adequately captures the severity of illness of their patients?
Question 15. If we currently sponsor a private or public report card, will P4P offer more of an incentive? If we are considering both a public report and P4P, which should we pursue first?
Question 16. Should we tailor P4P for subsets of a particular group of providers, e.g., safety-net hospitals?
Question 17. How should we think about P4P and its relationship to benefit design, including tiered networks?
Question 18. Is there any special advice for Medicaid agencies and Medicaid managed care plans interested in P4P?

Question 14. How do we address providers’ concerns about whether risk adjustment adequately captures the severity of illness of their patients?

Providers who treat a larger proportion of higher risk or less adherent patients may receive lower ratings on process and outcome measures, despite making equal efforts to practice high-quality care. Thus, providers legitimately want to make sure that a P4P program accounts fairly for patient differences. Risk adjustment models to correct patient outcome estimates (usually mortality rates) for underlying differences in patient populations have been under development for many years.19,41 Nonetheless, providers worry about the adequacy of risk adjustment.18,38,44 Furthermore, refusal to address such concerns may threaten the legitimacy and sustainability of any incentive program.10,45-47

Risk adjustment is generally less effective when administrative data are used because detailed clinical information (e.g., blood pressure) is typically unavailable. Analysts have shown, however, that in some cases the addition of a few simple clinical variables to administrative data would be sufficient to make risk adjustment comparable to that which can be achieved with the sophisticated databases many specialty societies have developed.19,41 This is especially the case in States such as California that are adding "condition present on admission" indicators to their administrative data to distinguish pre-existing comorbidities from treatment-related complications.

A P4P sponsor could engage providers in designing a clinical data collection system that either is consistent with one of the growing number of national databases (specialty societies, CDC, JCAHO, others) or is a less burdensome augmentation of administrative databases. The sponsor could then test whether the additional data actually change the distribution of rewards.

Finally, some approaches to P4P will be less sensitive to differences in patient characteristics than others. In particular, if a purchaser decides to reward providers for improvement relative to their own baseline rather than for meeting a common standard, risk adjustment will be less of an issue than if a tournament approach is used where only the top ranked providers receive a bonus.
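The contrast between these two reward rules can be made concrete with a minimal sketch. The provider names, rates, the 5-point improvement threshold, and the single-winner tournament below are all hypothetical and chosen only for illustration; they are not drawn from any actual program.

```python
def improvement_winners(baseline, current, min_gain):
    """Providers whose rate rose by at least min_gain points over their own baseline."""
    return {p for p in current if current[p] - baseline[p] >= min_gain}


def tournament_winners(current, top_n):
    """The top_n providers ranked by current rate, regardless of starting point."""
    ranked = sorted(current, key=current.get, reverse=True)
    return set(ranked[:top_n])


# Hypothetical rates: percent of patients receiving recommended care.
# Provider C is assumed to serve a higher risk population and starts lowest.
baseline = {"A": 80.0, "B": 70.0, "C": 55.0}
current = {"A": 82.0, "B": 76.0, "C": 63.0}

improved = improvement_winners(baseline, current, min_gain=5.0)
top = tournament_winners(current, top_n=1)

print("Rewarded for improvement:", sorted(improved))
print("Rewarded in tournament:", sorted(top))
```

In this made-up case, the provider with the largest gain is rewarded under the improvement rule but not under the tournament, which is why the adequacy of risk adjustment matters more for rank-based rewards.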


Question 15. If we currently sponsor a private or public report card, will P4P offer more of an incentive? If we are considering both a public report and P4P, which should we pursue first?

No studies have compared the effects of report cards relative to P4P. There is evidence that providers respond to public reports about their performance,18,48,49 although hospital executives have indicated that their response to public reporting may wane over time, especially if there are no supporting financial incentives.12,45 Thus, the approaches may best be viewed as complementary, rather than mutually exclusive.

Public reporting may be part of a phase-in strategy for P4P; this appears to be the strategy chosen by CMS in the case of the Hospital Quality Alliance data, although the specifics of a P4P program for hospitals have not yet been determined. An advantage of this approach is that it gives providers time to improve their data collection and become more proficient in using methods of performance measurement before the measures become economically significant. This may facilitate the use of a measure set of greater scope than would be acceptable to providers if P4P were to start with the initial measurement period.

In some cases, public reporting and P4P may differ somewhat in focus. For example, research has shown that it is preferable not to include a large number of technical quality measures in a public report card if the goal is to affect consumer choice. So a report card might display a few composite measures of evidence-based care and patient experience, while the P4P program could separately target specific processes and outcomes where the purchaser has identified a shortfall in quality.


Question 16. Should we tailor P4P for subsets of a particular group of providers, e.g., safety-net hospitals?

Providers treating patient populations that are low income and/or have low educational attainment or literacy may be disadvantaged by a "one size fits all" approach to P4P because these communities have poorer health behavior than others (patient differences could also affect patient experience of care, for example, because of cultural issues). To the extent that a payer is concerned about improving performance of all providers or is particularly interested in reducing disparities in the quality of care, a more targeted approach might be warranted.

Purchasers could tailor a P4P initiative in a variety of ways:

  • Purchasers could make the reward larger for some providers—either those providers with the lowest performance ratings or, for example, safety-net providers. One argument for increasing payments is that the costs of improving care will be greater for some providers because of geographic, linguistic, financial, and other barriers that they or their patients face or a lack of infrastructure and poor human resource capacity for quality improvement.
  • Purchasers could provide capital grants and/or technical assistance to poor-performing providers, again as a way of offsetting their presumed higher costs of complying with performance standards. Independent Health in New York, for example, assists providers serving large numbers of Medicaid patients in planning quality improvement programs.
  • Purchasers could allow performance measures to vary across providers. Again, Independent Health involved providers with large numbers of Medicaid patients in the selection of site-specific quality metrics.50

A final strategy for tailoring P4P would be for purchasers to set lower performance standards for certain kinds of providers that have lower performance or fewer resources—for example, small practices or rural hospitals. To illustrate, a plan might provide a bonus to all urban hospitals that give at least 90 percent of their patients beta-blockers after a heart attack but advise rural hospitals (who in this example are assumed to have lower rates of beta-blocker usage) they need only achieve 80 percent adherence to receive a bonus. The important argument against this approach is that it will institutionalize disparities in quality. For this reason, approaches that differentially empower low-resource providers and those serving disadvantaged populations are preferred.


Question 17. How should we think about P4P and its relationship to benefit design, including tiered networks?

P4P programs have been implemented in the context of health maintenance organizations (HMOs), point-of-service plans, preferred provider organizations, indemnity plans, and consumer-directed health plans.1 In principle, provider incentives can be established independently of benefit design, but in practice there will be important interactions to consider, including assignment of accountability and alignment of physician and patient incentives.

The first consideration is assigning accountability. In many HMO arrangements, patients must select a physician or medical group to act as a primary care "home" and possibly as a gatekeeper for referrals. These providers will then be a natural unit of accountability for the quality of primary prevention and chronic illness management. In contrast, in a setting where patients do not have identified or assigned primary care providers, attributing responsibility becomes somewhat more complex, but not insurmountably so.

Two basic strategies for attribution of responsibility for the quality of care of individual patients based on contact have been used in practice, each with advantages and disadvantages:

  1. All physicians with a minimum level of contact are accountable for a patient’s care.
  2. A primary responsible physician is determined retrospectively based on contact.

With regard to the first approach, if multiple physicians share responsibility for delivering a specific test or service, all have a reason to ensure quality, but shirking of responsibility also might occur. In addition, physicians might order redundant tests or services if they do not receive information about services provided by the other physicians.

With regard to the second strategy, a key disadvantage is that during the course of the year, physicians will be uncertain as to whether any given patient will affect their performance estimate because attribution is determined retrospectively.
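The two attribution strategies can be sketched as follows. This is only an illustration under stated assumptions: "contact" is measured as a count of visits, the minimum contact level is set at two visits, the primary physician is the one with the most visits, and ties go to the physician whose identifier sorts first. None of these specifics come from the programs discussed here.

```python
from collections import Counter


def shared_attribution(visits, min_visits=2):
    """Strategy 1: every physician with at least min_visits is accountable."""
    counts = Counter(visits)  # (patient, physician) -> number of visits
    accountable = {}
    for (patient, physician), n in counts.items():
        if n >= min_visits:
            accountable.setdefault(patient, set()).add(physician)
    return accountable


def primary_attribution(visits):
    """Strategy 2: one physician per patient, chosen after the fact by most visits."""
    counts = Counter(visits)
    best = {}
    # Sorting makes the tie-break deterministic (assumed rule: first identifier wins).
    for (patient, physician), n in sorted(counts.items()):
        if patient not in best or n > best[patient][1]:
            best[patient] = (physician, n)
    return {patient: physician for patient, (physician, _) in best.items()}


# Hypothetical visit log for one year: (patient, physician) pairs.
visits = [
    ("p1", "dr_a"), ("p1", "dr_a"),
    ("p1", "dr_b"), ("p1", "dr_b"), ("p1", "dr_b"),
    ("p2", "dr_c"),
]

print(shared_attribution(visits))
print(primary_attribution(visits))
```

Under the first rule, patient p2 is attributed to no one because no physician reaches the minimum contact level. Under the second rule, dr_a cannot know until year end that p1 will count toward dr_b instead.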

The second important connection between P4P and benefit design is the congruence of physician and patient incentives. Although there is no empirical evidence of a connection, it is logical to conclude that patient and provider incentives each will be more powerful if they are aligned. For example, some purchasers in the Bridges to Excellence Diabetes Care Link program offer their employees rewards for participating in improving the management of their diabetes.51 Similarly, purchasers who have constructed or are considering tiered provider networks may want to consider focusing on the same sets of performance measures for P4P to intensify the impact.v


v. Some measures, however, may be appropriate for tiering but not for P4P—for example, the volume of certain kinds of procedures.


Question 18. Is there any special advice for Medicaid agencies and Medicaid managed care plans interested in P4P?

In many States, including Michigan, Pennsylvania, and New York, Medicaid agencies offer auto-assignment and/or financial bonuses to managed care organizations that perform well on clinical quality and patient satisfaction measures. Medicaid managed care organizations also have implemented P4P. For example:

  • The Local Initiative Rewarding Results program in California offers financial rewards based on the quality of ambulatory care for MediCal beneficiaries.
  • Hudson Health Plan, a Medicaid managed care plan in New York, also has a number of P4P initiatives including rewards for childhood immunization and effective management of patients with diabetes.
  • The Neighborhood Health Plan of Rhode Island uses P4P to target asthma care.
  • In North Carolina, the Primary Care Case Management program has introduced both financial bonuses and recognition for physicians that either reach a best practice performance goal (85th percentile of baseline performance) or improve by 20 percent and exceed the median level of baseline performance. Performance measures in the first incentive year (through June 2006) are related to care for asthma, diabetes, and prescribing patterns.
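A two-path rule of the kind described for North Carolina can be sketched as below. This is a hedged reading, not the program's actual specification: it assumes "improve by 20 percent" means a 20 percent relative gain over the physician's own baseline rate, that percentiles are computed with the inclusive interpolation method of Python's statistics module, and that the rates shown are invented for illustration.

```python
import statistics


def bonus_thresholds(baseline_rates):
    """Return (best-practice goal, median) computed from baseline performance."""
    cut_points = statistics.quantiles(baseline_rates, n=100, method="inclusive")
    goal = cut_points[84]  # 85th percentile of baseline performance
    median = statistics.median(baseline_rates)
    return goal, median


def qualifies(own_baseline, current, goal, median):
    """Bonus if the goal is reached, or if the physician improves 20% and tops the median."""
    reached_goal = current >= goal
    improved_enough = current >= own_baseline * 1.2 and current > median
    return reached_goal or improved_enough


# Hypothetical baseline rates (percent) for five physicians.
baseline_rates = [40.0, 50.0, 60.0, 70.0, 80.0]
goal, median = bonus_thresholds(baseline_rates)

print(qualifies(80.0, 80.0, goal, median))  # already at the best-practice goal
print(qualifies(50.0, 62.0, goal, median))  # improved 20 percent and above the median
print(qualifies(40.0, 50.0, goal, median))  # improved, but still below the median
```

The second path gives low-baseline physicians a reachable target, while the median floor keeps a bonus from going to performance that remains below the middle of the baseline distribution.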

Purchasers such as Medicaid and Medicaid managed care plans face many of the same obstacles discussed above, particularly with regard to the need to protect safety-net providers and their patients (Question 16). In addition, constrained Medicaid budgets have resulted in below-market provider reimbursements so that program participation is an ongoing concern. These issues highlight the need to involve providers early and continuously in the development and evolution of an incentive program. The experiences of two New York Medicaid plans corroborate this observation. The Hudson Health Plan focused intently on provider communication. Health Now management developed its initial P4P program internally, albeit with the intention of creating a program that providers would find easy to understand and implement. A survey by the Center for Health Care Strategies found better provider acceptance of the Hudson Health program than the Health Now program, and Health Now has moved to increase provider participation in the redesign of its program.50

In addition, because of particular concerns with patient adherence in populations with low literacy and other challenges, Medicaid programs and plans may find it particularly beneficial to emphasize patient incentives alongside provider incentives, which is likely to improve provider perception of the P4P program as well. Patient incentives are currently in use by some Medicaid managed care plans to encourage appropriate use of services such as adolescent wellness visits and prenatal care.50 Executives at CalOptima, a Medicaid managed care program in California, believe that participants in a beneficiary incentive program in which department store gift cards are offered for adherence to preventive care recommendations are more likely to receive appropriate immunization and prenatal care.50

Medicaid programs may wish to consider P4P in one market in which they are the dominant payer and thus could have substantial impact: nursing home care. Legislation passed in 2005 in Ohio outlines such a program and sets aside 2 percent of average payments to be allocated to the best-performing facilities with regard to a set of structure, process, and outcome measures of quality (and casemix). Performance data on nursing homes are currently being collected and publicly reported by CMS; these data would be a natural platform for P4P. In addition, CMS has recently begun designing a nursing home P4P demonstration project, which may provide both momentum and information for State Medicaid agencies interested in implementing programs of their own.52

Finally, Medicaid programs will need to consider regulatory requirements, particularly if they intend to receive a Federal match for the payment incentive (Box).


Programmatic Issues for State Medicaid Programs Considering Pay for Performance

The method by which a State may choose to accomplish its quality-based purchasing program can vary greatly because of the variety of approaches available to a State to administer its Medicaid and State Children’s Health Insurance Programs. In general, States have broad flexibility, within established Federal regulations, to decide on medically necessary services that will be covered and rates that will be paid to providers or plans. CMS may review these plans through a State plan or a Medicaid demonstration project application or amendment and through various other mechanisms.

In general, if the pay-for-performance program is a part of a fee-for-service delivery system, a State may include its initiative in its State plan. While the requirements for payment for managed care are somewhat more complicated, CMS will work with States to determine the proper method to implement such an initiative. A waiver under Sections 1115, 1915(b), or 1915(c) of the Social Security Act may be necessary when the initiative will not be statewide; will impact the amount, duration, and scope of benefits; will affect the comparability of benefits across the eligible population; or will restrict beneficiary choice of provider.

Source: Jean Moody-Williams, Centers for Medicare & Medicaid Services.



Phase 4. Evaluation

P4P programs are a work in progress and, because there is little evidence as to the effects of specific approaches, will need to be monitored and improved on an ongoing basis. Although evaluation will naturally follow implementation, the two questions in this section need to be asked during the design phase to assure that the implementation of the program will support meaningful evaluation. They are:

Question 19. How can we tell if the P4P program is working?
Question 20. What unintended consequences should we look for?

Question 19. How can we tell if the program is working?

Learning about the impacts of a P4P program can be particularly challenging because a multitude of additional forces simultaneously affect the quality of patient care and costs. Ideally, purchasers would implement P4P in one market or sub-market and track the same performance measures on a set of comparison providers. Some large purchasers and CMS may be in a position to implement P4P in this way, but most purchasers will not design their programs as controlled trials. Therefore, some care is needed to disentangle the effects of the program from other trends.

At a minimum, purchasers should collect baseline data on the targeted quality measures (this will be a critical part of implementation too, of course, because providers without a clear understanding of their performance can hardly be expected to respond optimally to P4P). Then, as performance data are collected for payment purposes, the main effect of the program can be evaluated in terms of the change in performance, preferably compared either to some comparable but unaffected population or the trend in performance prior to implementation.
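One simple way to express the comparison described above is a difference-in-differences calculation: the change among providers in the program minus the change among comparable providers outside it. The sketch below uses invented rates and assumes that, absent P4P, both groups would have followed the same trend. That assumption is not guaranteed in practice and would need to be checked against pre-implementation data.

```python
from statistics import mean


def difference_in_differences(p4p_pre, p4p_post, comparison_pre, comparison_post):
    """Change in the P4P group minus change in the comparison group."""
    p4p_change = mean(p4p_post) - mean(p4p_pre)
    comparison_change = mean(comparison_post) - mean(comparison_pre)
    return p4p_change - comparison_change


# Hypothetical provider-level rates (percent) on one targeted measure.
p4p_pre = [58.0, 62.0]          # baseline, providers in the program
p4p_post = [69.0, 71.0]         # first payment year, same providers
comparison_pre = [61.0, 63.0]   # baseline, comparable providers outside the program
comparison_post = [65.0, 67.0]  # same period, no P4P

effect = difference_in_differences(p4p_pre, p4p_post, comparison_pre, comparison_post)
print(f"Estimated program effect: {effect:.1f} percentage points")
```

Here the P4P group improved by 10 points, but 4 points of that gain also appeared in the comparison group, so only the remaining 6 points would be credited to the program.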

Purchasers will have to decide how rigorous an evaluation needs to be to ascertain whether a program is working and how to improve it. To adhere strictly to scientific standards of evidence may be too costly and produce evidence too late to be useful for decisionmaking. On the other hand, erroneous conclusions that may be drawn from anecdotal or incomplete information may have substantial costs as well.


Question 20. What unintended consequences should we look for?

In addition to the hoped-for effects of the program, purchasers will need to monitor, and try to minimize, unintended negative consequences. Three important negative effects to look for are patient selection, diversion of attention away from other important aspects of care, and widening gaps in performance among providers.

  • Patient selection. Providers may avoid sicker patients in the belief that risk adjustment is not adequate and that caring for such patients will reduce their measured performance. Surveys done after New York instituted public reporting for coronary bypass found that two-thirds of cardiac surgeons admitted to avoiding the most severely ill patients.53 To minimize the potential for the P4P program to result in selection of the "easiest" patients or exclusion of high-risk or non-adherent patients, purchasers can focus on structural or process measures of quality. Risk adjustment of performance measures, particularly those that relate to patient outcomes such as complication or readmission rates, should help to minimize selection incentives as long as providers believe the risk adjustment is adequate. In addition, including explicit reporting of casemix data—which would show providers who are avoiding or accepting the more difficult cases—or providing differential rewards for meeting performance goals with more difficult patients could increase providers’ willingness to take on these cases. Another possibility would be to collect and report information about patients who change from one provider to another. A provider who was avoiding sicker patients would be identified by the high casemix scores of patients leaving his practice.
  • Diverting attention from other aspects of care. Targeting specific performance measures may focus provider attention on the conditions or care processes for which there is measurement and payment, to the detriment of performance in other areas.15 At a minimum, this problem suggests the need for careful measure selection and attention to interrelationships among targeted and untargeted domains of performance. Rewarding providers for performance on some broader measures of outcome, such as patient experience or decubitus ulcer (bed sore) rates and pain scores in hospitals, would mitigate this problem as well.
  • Widening performance gaps. This may be particularly likely to occur if the purchaser chooses to reward only providers that meet a high standard of performance or those that are the highest ranked among peers. If P4P results in a substantial redistribution of resources, then some providers may actually worsen with respect to quality of care. This will be a particular concern if those providers serve large numbers of beneficiaries/enrollees or are part of the safety net, and/or if there are not enough suitable choices for the population that receives care from these poor-performing providers. If these adverse consequences are anticipated or noted, purchasers can consider the solutions described in Question 16.

These examples give important clues about what evidence to seek in evaluating programs for unintended consequences. Clinician feedback should be sought about unexpected problems with the measures used, including difficulties with both access to care and pressure to offer inappropriate care. Since such data would come from clinician surveys (and unhappy clinicians would be expected to be motivated to respond), getting this feedback should not be too burdensome. Similarly, purchasers should consider tracking a set of performance indicators that are outside of the P4P program to better understand both negative and positive spillover effects from the program onto untargeted clinical domains. Finally, evaluation of the program should not just look at average performance but at the effects of P4P on different parts of the delivery system including providers with high and low baseline performance.



A Final Note—Sustaining Quality Improvement

Even the best-designed P4P program will require maintenance. For example, if the program uses fixed targets, the targets will need to be advanced as providers improve. We note, however, that if providers see that targets are fully adjusted to reflect gains in prior year performance, incentives to improve quality in the current period may be dampened. For most measures, there are also natural "ceiling" effects that will lead to diminished opportunities to improve quality over time. As adherence rates to evidence-based guidelines approach 100 percent, the incremental cost of improving quality is likely to increase as only the cases that failed to respond to initial quality improvement efforts remain.

As clinical evidence about best practices changes, structural (e.g., information technology requirements) and process measures will also need to be updated. Purchasers will have to balance the need to keep P4P programs effective by retiring measures that are no longer useful against the concern that P4P programs provide some stability so that providers can undertake larger investments with the expectation that the reward structure will not be dramatically altered in the short run (and hence a reasonable return on investment can be expected). To this end, explicitly including providers in the decisions about measure selection and retention may be desirable. One approach that has been adopted by some programs, including the California IHA, is to commit to medium-term plans (2 or 3 years) with regard to measure sets and introduce measures in a "testing set" prior to their full inclusion.

To the extent possible, purchasers should use their P4P programs to promote continuous innovation rather than institutionalize a single approach to delivering high-quality care. This concern might be addressed by rewarding, at least in part, outcome measures. Vigorous attempts to keep structure and process measure targets up-to-date with the latest technology will also reduce system rigidity, but political and bureaucratic barriers to change will be inherently limiting.


Page last reviewed April 2006
Internet Citation: Pay for Performance: A Decision Guide for Purchasers: Phase 3. Implementation. April 2006. Agency for Healthcare Research and Quality, Rockville, MD.