Evaluating the Impact of Value-Based Purchasing: A Guide for Purchasers
How Do You Choose a Research Design?
A number of factors will influence which of the research designs (alone or in combination) would be best suited for an evaluation of your VBP initiatives. Purchasers, possibly working with other stakeholders, can get started by trying to reach agreement on the following questions:
- What Do You Want To Learn and How Do You Expect To Use the Information? The first task is to identify the research designs that can provide you with useful information. For example, some purchasers conduct evaluations to learn how the initiative is being perceived by stakeholders and to identify any barriers; in those cases, interviews or focus groups are likely to be most useful. Others want to gather data to get a general idea of whether the program is on the right track, so simple quantitative analyses are often appropriate. Still others pursue an evaluation in order to decide whether to continue the investment in a VBP activity; this goal may call for a quantitative research design that would support a solid analysis of the costs and benefits of the initiative.
- What Kind of Evidence Do You Need? One of the most important criteria for choosing a research design is the kind of relationship you want to see. In some cases, it may be sufficient to see evidence of a possible correlation; for example, a purchaser that has implemented an initiative to spur providers to adopt computer systems that double-check prescriptions may be satisfied to know that hospitals are investing more in information technology. In other cases, purchasers may want evidence of causation, i.e., results that demonstrate that the VBP activity is having its desired effect. To learn this, purchasers must choose the study design that will be the strongest for showing whether or not the activity causes the result the purchaser wants to see.
Purchasers must keep in mind that statistical analyses vary in their ability to detect an effect if one exists. For all of the quantitative research designs, statistical power will depend on the size of the effect and the size of the sample. For example, it is easier to detect a 10 percent decrease in mortality than a 5 percent decrease; and, whatever the effect is, it will be easier to detect with 1,000 observations than with 100. The earlier example from the New Jersey Medicaid program shows how power was reduced, despite randomization, because only half of the intervention group actually remembered receiving the report card.
- Do You Need To Defend the Results to an External Audience? A related issue involves the level of certainty you want to have about the results. If providers or health plans (or your own managers) are likely to scrutinize and question the findings, you may need to choose a design that can adjust for or explain the effects of variables other than the VBP activity. Your ability to implement one of these designs will depend on whether you have baseline data, comparison groups, adequate sample sizes, and randomized assignment.
- How Much Money Can You Put Towards the Evaluation? Some evaluation designs are more expensive than others, so it is important to know what your limits are. That said, other considerations may be more important than financial concerns. For example, if you need a strong analytic study with defensible results but cannot afford one, paying for a cheaper study that produces questionable results would not be a worthwhile option.
- Do You Have Access to Other Resources? Some purchasers can overcome financial limitations by taking advantage of resources available within the organization or through partners in the VBP activity. For example, academic researchers at local universities may be willing to donate their time to an evaluation (especially if the findings can be published); the purchasing organization may be able to provide the analysts with office space and computers. A related question is whether you have access to analysts who can handle sophisticated evaluation designs. While you can always find appropriate researchers if you have the funds to look outside of your organization, this option is not available to all value-based purchasers.
- How Much Time Do You Have for the Evaluation? The answer to this question will be driven primarily by when you need the results, but it may also depend on budgets and staff availability. The options available to you if you need an answer in 6 months are very different from what you can do if you can wait for 3 years. The collection of primary data is especially time-consuming.
- What Kinds of Data Are Available to You? The choice of research design is often circumscribed by the nature and scope of the clinical or administrative data that are readily available. To the extent that the data you need are controlled by health care organizations, you may need to consider how much cooperation you can anticipate from local providers and health plans. One way to address this issue is to plan ahead for the evaluation by incorporating requests for data into contract negotiations. However, a significant amount of data is now readily available thanks to standardized measurement tools such as HEDIS® and CAHPS® (go to Step 4 for a discussion of these two tools).
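The point about statistical power raised under "What Kind of Evidence Do You Need?" can be illustrated with a rough calculation. The sketch below uses a normal approximation for comparing two proportions. The 20 percent baseline mortality rate, the group sizes, and the function and parameter names are illustrative assumptions, not figures from any study discussed in this guide.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control, relative_decrease, n_per_group,
                 exposed_share=1.0, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    relative_decrease is the effect among people actually exposed to the
    intervention; exposed_share dilutes it when only part of the
    intervention group is reached (hypothetical parameter).
    """
    p_treated = p_control * (1 - relative_decrease * exposed_share)
    std_err = sqrt(p_control * (1 - p_control) / n_per_group
                   + p_treated * (1 - p_treated) / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_control - p_treated) / std_err - z_crit)


# Illustrative baseline: 20 percent mortality in the comparison group.
for decrease in (0.10, 0.05):
    for n in (100, 1000):
        power = approx_power(0.20, decrease, n)
        print(f"decrease={decrease:.0%} n={n}: power={power:.2f}")
```

Under these assumptions, power rises with both the size of the effect and the size of the sample, and it falls when exposed_share is set below 1, which is the situation described in the New Jersey Medicaid example.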
Step 4. Implement the Research
Once the research design has been selected, the evaluation itself can begin. This process may take many forms, but it generally requires three tasks:
- Identifying appropriate measures.
- Collecting the data.
- Analyzing the data.
Since this guide is designed for the decisionmaking purchaser, rather than the analysts who may actually implement an evaluation, this section simply reviews some of the issues and resources that purchasers should be aware of with respect to choosing measures and collecting the data. It assumes that the data analysis will be handled by experienced researchers, whether internal or external to the organization.
Task 1: Identify Appropriate Measures
During the process of selecting a research design, purchasers often have to consider how they expect to define and measure the outcomes in which they are interested. For example, if a VBP activity was intended to improve quality of care for employees with heart disease, how exactly will you measure quality? Will you look at measures of health status, patient satisfaction, or clinical processes?
The specific definitions of quality and cost are less important than the recognition that both are important for defining, measuring, and focusing on value. Although this point may seem obvious, it is a crucial step in thinking about how to assess the impact of VBP activities because it draws attention to both the costs of those activities and the extent to which those activities improve the quality or reduce the costs of care. This section offers a broad discussion of measurement issues for both final and intermediate outcomes of interest to value-based purchasers.
It is important to remember that the measurement strategy must fit the intended research design, with quantitative research designs and methods generally imposing more formal measurement requirements. For example, a pre-test/post-test research design will require the ability to measure specific outcomes before and after the intervention. As mentioned previously, data availability and measurement issues can preclude the selection of specific research designs.
Measuring Impact on Health Status. Evaluators rely on a wide range of measures to capture the impact of VBP activities on health outcomes. But purchasers must think carefully about which of those definitions and measures they want to use, particularly if health plans or providers may challenge their decisions.
Health status outcomes are not easy to measure, in part because it is not clear which perspective to take (i.e., the patient's or the clinician's) and which domains of health to evaluate. On the one hand, you could evaluate health outcomes using clinical measures, such as weight, cholesterol level, and other commonly used metrics. However, clinical measures do not capture the perspective of the person whose health is being evaluated, and therefore can miss very important aspects of health, such as mental and social well-being. Many researchers recognize the importance of both the clinical and patient perspective in defining the health status of individuals, and use a combination of both approaches to arrive at a final judgment.
Available measures of health outcomes include mortality, morbidity, and self-reported health status, each of which is discussed below.
Measuring Mortality. Mortality rates, i.e., the rate of death for a given population, are sometimes regarded as a measure of health. For example, researchers often compare mortality rates and life expectancy rates to gauge the health status of different countries. When they find significant differences in these rates, they may say that one country is healthier than the other. But differences in the mortality rate do not always point to the cause of the differences, which undermines its usefulness as a measure of health.
In theory, purchasers could also use mortality rates as a measure of the health status of their covered populations. However, the benefit of using this measure is questionable because only a small percentage of a population dies in a given year, particularly when the population is younger and healthier as in an employment-based setting.
That said, mortality rates are a feasible and useful measure for certain VBP activities. For example, to evaluate a VBP initiative that tries to steer bypass surgery patients to high-volume providers, it would be reasonable to assess the impact on mortality rates for those patients. When used for this purpose, mortality rates should be age-adjusted and based on reasonable time windows (e.g., annual mortality).
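One common way to age-adjust a mortality rate is direct standardization, in which each age band's rate is weighted by that band's share of a reference population. The sketch below is a minimal illustration of that calculation; the age bands, counts, and weights are hypothetical, and this guide does not prescribe a particular adjustment method.

```python
def age_adjusted_rate(deaths, patients, standard_weights):
    """Directly standardized mortality rate.

    deaths, patients: counts per age band for the group being evaluated.
    standard_weights: share of a reference population in each age band
    (hypothetical values here); normalized so they sum to 1.
    """
    total_weight = sum(standard_weights.values())
    return sum(
        (standard_weights[band] / total_weight) * deaths[band] / patients[band]
        for band in standard_weights
    )


# Hypothetical one-year counts for bypass surgery patients.
weights = {"under 65": 0.5, "65 and over": 0.5}
high_volume = age_adjusted_rate(
    {"under 65": 2, "65 and over": 12},
    {"under 65": 200, "65 and over": 300},
    weights,
)
low_volume = age_adjusted_rate(
    {"under 65": 3, "65 and over": 5},
    {"under 65": 200, "65 and over": 100},
    weights,
)
print(f"high-volume: {high_volume:.4f}, low-volume: {low_volume:.4f}")
```

In this made-up example, the crude rates (14 of 500 versus 8 of 400) would favor the low-volume group, while the adjusted rates favor the high-volume group, because the high-volume group treats a larger share of older patients.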
Measuring Morbidity. Morbidity is a term used to describe the average level of illness in a population. Morbidity accounts for pain, chronic illness, acute illness, mental illness, etc. On a societal level, researchers often measure morbidity by the prevalence of chronic disease in a population or by measures that are correlated with illness, such as missed school days and job-based disability claims. Not surprisingly, morbidity occurs more frequently than mortality.
From a purchaser's perspective, morbidity can be regarded as a function of both the prevalence of chronic illness and the level of functioning of those with chronic illness. In most cases, VBP activities are more likely to affect the latter than the former. For instance, a VBP program aimed at diabetics cannot be expected to reduce the prevalence of diabetes in a given population. In fact, to the extent that the activity is designed to facilitate the identification of diabetics, the activity itself may reveal a higher prevalence than existed at baseline. However, the VBP activity could reduce some of the negative effects associated with chronic conditions. For example, a VBP initiative aimed at asthmatics might help patients take better control of the disease, reducing complications due to asthma and lowering the number of unnecessary emergency room visits and missed school or work days. Thus, for the purposes of assessing VBP, appropriate measures of morbidity would include indicators such as hospital readmission rates or correlated measures such as absenteeism from school or work.
In the long term, it may be possible for some VBP activities to affect the prevalence of chronic illness in a population. For example, many people believe that early intervention programs focusing on diet, exercise, and regular screening can prevent or reduce the level of chronic illnesses such as diabetes. Early screening and detection can also minimize major complications associated with diseases like cancer.
Measuring Health Status. The set of tools for measuring health outcomes for populations includes survey instruments designed to measure self-reported health status. The most common of these survey instruments is the SF-36® (QualityMetric, 2001). These instruments, which can be administered to populations including those covered by purchasers, have been demonstrated to measure health status broadly. Similar but more focused instruments have been developed for measuring health status for people with specific conditions, such as depression or asthma.
Although purchasers could use health status assessment instruments to measure final outcomes, it may not be reasonable to expect VBP activities to have a strong influence on these outcomes at the population level. However, for specific VBP activities, such as those that target care for chronic illnesses, it may be feasible to use the assessment instruments developed specifically for those conditions to detect differences in health status for relevant segments of the population.
Health status measures such as the SF-36® do not capture patient preferences for various health states, but other standard tools permit preference weighting of health states. For example, researchers might consider the Quality of Well-Being Index or the Health Utilities Index. These indices, which generate measures of quality-adjusted life years, are the approach recommended by the Panel on Cost-Effectiveness in Health and Medicine (Gold et al., 1996). Many disease-specific indices exist as well, although most are not preference weighted. (For more information, go to Gold et al., 1996.)
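The arithmetic behind a quality-adjusted life year measure can be sketched as follows: each period of time is multiplied by the preference weight of the health state experienced during it, and the products are summed. The weights and durations below are hypothetical, and the sketch leaves out discounting of future years.

```python
def quality_adjusted_life_years(health_states):
    """Sum of (preference weight times years) over a sequence of health states.

    Weights run from 0 (death) to 1 (full health). Discounting of future
    years is omitted to keep the sketch short.
    """
    return sum(weight * years for weight, years in health_states)


# Hypothetical patient paths: (preference weight, years spent in that state).
usual_care = [(0.70, 5.0)]
with_program = [(0.70, 1.0), (0.80, 4.0)]
gain = (quality_adjusted_life_years(with_program)
        - quality_adjusted_life_years(usual_care))
print(f"QALYs gained: {gain:.2f}")
```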
Measuring Impact on Satisfaction With Health Plans and Care Delivery. The difficulty with measuring satisfaction with health plans and care delivery is that the scope of services, activities, and benefits encompassed by these two topics is quite large. As a result, it is a real challenge to develop a single, meaningful measure and to cover all relevant domains without making the questionnaire unreasonably long. These challenges become even greater when purchasers want to learn what is causing satisfaction to be less than optimal so that they can identify and make appropriate changes in policy.
However, satisfaction and opinion surveys of this type do exist and are used by many purchasers. For example, the CAHPS® survey, which is discussed in Task 2, provides measures in several domains that are relevant to consumers, including a measure that reflects overall satisfaction with one's health care plan.
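Measuring Impact on Costs. The measurement of costs will likely cover several different types of costs, including the cost of the VBP activity itself, health care costs, and costs outside the health care system.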
Measuring VBP Activity Cost. Measurement of this type of cost depends on the accounting systems involved and the extent to which resources devoted to value-based purchasing are shared with other activities. As a basic principle for cost measurement, the evaluator would identify all resources devoted to value-based purchasing and then assign a cost to those resources. One option is to take a narrow perspective that focuses only on costs borne by the purchaser. Typically, the cost of staff resources would be the percent of time that each staff member devoted to the activity multiplied by the relevant wage rate (plus fringe benefits). Other resources might include computer time, office space, printing, supplies and any outside consulting expenses. Some purchasers might want to take a broader perspective that includes provider-level costs associated with data preparation and implementation of the VBP initiative.
When relevant resources are used for activities other than value-based purchasing, you will have to decide how much of those resources to allocate to the VBP activity. One approach is to identify all of the costs that would disappear, in the long run, if the VBP activity were not conducted. For example, office space used by staff associated with the VBP activity might be considered a fixed cost that would be incurred even without the VBP activity, and therefore should not be included as a VBP cost.
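The cost principles described above can be sketched as a short calculation: staff time is valued at the share of time devoted to the activity multiplied by the wage rate plus fringe benefits, and other resources are counted only if they would disappear without the VBP activity. All names and dollar figures below are hypothetical.

```python
def vbp_activity_cost(staff, other_resources):
    """Annual purchaser-side cost of a VBP activity.

    staff: list of (share_of_time, annual_wage, fringe_rate).
    other_resources: list of (label, annual_cost, avoidable), where
    avoidable marks costs that would disappear without the activity.
    """
    staff_cost = sum(share * wage * (1 + fringe) for share, wage, fringe in staff)
    other_cost = sum(cost for _, cost, avoidable in other_resources if avoidable)
    return staff_cost + other_cost


# All figures are hypothetical.
staff = [(0.50, 60_000, 0.25), (0.10, 90_000, 0.25)]
other = [
    ("printing", 2_000, True),
    ("consulting", 15_000, True),
    ("office space", 10_000, False),  # fixed cost, excluded from the total
]
total = vbp_activity_cost(staff, other)
print(f"Estimated annual VBP cost: ${total:,.0f}")
```

A purchaser taking the broader perspective could extend the same structure with entries for provider-level costs.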
Measuring Health Care Costs. If the VBP activity is intended to have broad effects on health care expenditures, competition among health plans, or even employee enrollment decisions, it is reasonable to use premiums as a measure of costs. Premiums are usually easy to measure if contracting health plans are providing a full range of administrative and risk-bearing services. Purchasers that use third-party administrators or other support entities should include the costs for those services in premium costs.
If benefit designs differ between the treatment and comparison groups, the evaluators will have to make adjustments, either directly in the measurement of premiums or in the research design. If direct adjustments are going to be made, they should be based on actuarial assumptions. One type of adjustment in the research design that could control for differences in benefit design would be the inclusion of indicator variables representing various aspects of benefit design as covariates in multivariate analyses. This is feasible only if there are a sufficient number of observations.
In the nonequivalent comparison group design, benefit design differences will matter only if they affect the trends in premiums. Any constant effect on the level of premiums cancels out when the change over time in the treatment group is compared with the change over time in the comparison group. For example, if the differences in benefit design between the treatment and comparison groups cause premiums to differ by a constant amount, the analysis will control for that difference.
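One simple way to see why a constant difference in premium levels does not bias this design is a difference-in-differences calculation, which compares the change in the treatment group with the change in the comparison group. The sketch below is an illustration under that assumption; the premium figures are hypothetical, and an actual evaluation would typically use a regression with more observations.

```python
def difference_in_differences(treat_pre, treat_post, comp_pre, comp_post):
    """Change in the treatment group minus change in the comparison group."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)


# Hypothetical monthly premiums per employee, before and after the VBP activity.
estimate = difference_in_differences(300.0, 315.0, 260.0, 285.0)

# A richer benefit design that adds a constant $40 to treatment-group
# premiums in both periods leaves the estimate unchanged.
shifted = difference_in_differences(340.0, 355.0, 260.0, 285.0)
print(estimate, shifted)
```

If the benefit design difference instead caused treatment-group premiums to grow faster or slower over time, the two estimates would differ, which is the case in which an adjustment is needed.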
In some analyses, evaluators may wish to measure health care costs using data on health care expenditures. Relative to premium data, this has the advantage of allowing detailed analyses of sub-populations or cohorts with particular clinical conditions. However, this approach does not capture any savings in administrative costs at the insurer level or any gains from competition among insurers. Measurement of spending on health care services at the individual level typically requires access to claims data. From the purchaser's perspective, the appropriate cost measure is the amount actually paid for services, including the contribution of the employee/patient. In some cases, such as pharmaceutical rebates, some effort may be required to determine true expenditures. (See Gold et al., 1996, for details on various approaches to measuring health care costs.)
It is important to remember that expenditures may not reflect true costs. Payments may be above or below what the providers actually spend to deliver the service. If the evaluators wish to measure true resource use, they will have to conduct a more detailed accounting of the process of care delivery, identifying resources used to deliver care and valuing those resources. In some settings, this can be done with provider accounting systems. In other cases, charges adjusted by cost-to-charge ratios could be appropriate. Either way, the evaluators must also pay attention to how overhead costs are allocated to various activities and services.
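The cost-to-charge adjustment mentioned above amounts to multiplying billed charges by the ratio of costs to charges, often department by department. The following sketch shows the arithmetic; the departments, charges, and ratios are hypothetical.

```python
def estimated_cost(charges, cost_to_charge_ratios):
    """Approximate resource cost as charges times cost-to-charge ratio, by department."""
    return sum(amount * cost_to_charge_ratios[dept]
               for dept, amount in charges.items())


# Hypothetical charges for one admission and hypothetical departmental ratios.
charges = {"room and board": 4_000.0, "laboratory": 1_000.0, "surgery": 5_000.0}
ratios = {"room and board": 0.75, "laboratory": 0.25, "surgery": 0.5}
billed = sum(charges.values())
cost = estimated_cost(charges, ratios)
print(f"Billed: ${billed:,.0f}, estimated cost: ${cost:,.0f}")
```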
The duration of observation might have important implications for the observed impact of value-based purchasing on medical care costs. Many VBP activities generate some short-term costs. For example, programs to improve compliance with medications might increase short-term expenditures. Some VBP activities, such as asthma management programs, may produce offsetting savings even in the short run. For others, such as diabetes control programs, it may be many years until any medical savings are realized. Because it can take a long time for key gains to be realized, evaluators may want to rely on simulation techniques if they wish to construct a full analysis of the impact of certain VBP activities.
Measuring Costs Outside the Health Care System. Cost measurement from a broad perspective would entail measuring non-health care related resources, costs of informal care, and costs of patient (and family) time related to the consumption of medical care and a change in health status. If these costs are likely to be important, they should be included. However, because these costs are often borne by the patient and family, they may be captured in quality measures such as satisfaction. Gold et al. (1996) describe measurement strategies for these variables, but purchasers may want to measure these variables as costs only if the VBP activity is likely to affect them and they are not adequately captured by quality measures.
Choosing Measures That Match the Intervention
Sometimes, purchasers implement limited, focused interventions and then wish to detect an effect on broader, aggregate outcomes. But evaluations must focus on measures that reflect appropriate and relevant outcomes of the VBP activity. For example, suppose a purchaser implements a diabetes case management program that succeeds in reducing expenditures for diabetics by 10 percent. Even if diabetics represent 20 percent of the population, it would be difficult to detect a measurable impact at an aggregate level (e.g., by measuring premiums or population health status), particularly if there is a lot of variance in expenditures for the remaining population. More appropriate measures could include the annual costs of care for diabetics, their satisfaction with care, and complication rates for diabetics.
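A back-of-the-envelope calculation shows how much the diabetes example shrinks at the aggregate level. The sketch below assumes a hypothetical parameter, relative_spending, for how much more diabetics spend on average than everyone else, since the example does not specify it.

```python
def aggregate_reduction(population_share, relative_spending, subgroup_reduction):
    """Share of total spending saved when one subgroup's spending falls.

    relative_spending is the subgroup's average spending divided by the
    average for everyone else (an assumption the guide does not specify).
    """
    subgroup = population_share * relative_spending
    everyone_else = 1 - population_share
    spending_share = subgroup / (subgroup + everyone_else)
    return spending_share * subgroup_reduction


# Diabetics are 20 percent of the population and their costs fall 10 percent.
same_spending = aggregate_reduction(0.20, 1.0, 0.10)
double_spending = aggregate_reduction(0.20, 2.0, 0.10)
print(f"{same_spending:.1%} to {double_spending:.1%} of total spending")
```

Under these assumptions the aggregate effect is only a few percent of total spending, which can easily be masked by ordinary variation in the costs of the remaining population.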
Measuring Impact on Labor Market Outcomes. Among the most difficult costs to measure are the costs associated with decreased labor productivity as a result of employees seeking care or experiencing poor health. This includes the costs associated with absenteeism, decreased productivity while working, and labor turnover. In theory, these costs could be measured by the value of lost production associated with the absenteeism and lost productivity, plus the administrative costs of replacing workers or adjusting production processes. In practice, measuring these costs is a serious challenge. Some evaluators assign a cost to absenteeism by valuing missed days from work at the wage rate of the workers. A more thorough analysis would use accounting principles to assess the impact of absenteeism on production costs.
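The simpler of the two approaches, valuing missed days at the wage rate, can be sketched in a few lines. The employee records below are hypothetical, and the calculation ignores reduced productivity while working and turnover costs.

```python
def absenteeism_cost(missed_days_by_worker, daily_wage_by_worker):
    """Value missed workdays at each worker's daily wage."""
    return sum(days * daily_wage_by_worker[worker]
               for worker, days in missed_days_by_worker.items())


# Hypothetical records for three employees over one year.
missed_days = {"A": 3, "B": 0, "C": 6}
daily_wage = {"A": 200.0, "B": 250.0, "C": 150.0}
wage_valued_cost = absenteeism_cost(missed_days, daily_wage)
print(f"Wage-valued absenteeism cost: ${wage_valued_cost:,.0f}")
```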
In some cases, the evaluators might want to treat variables such as missed workdays as measures of quality. If so, they must be careful not to double-count these variables by also including them in the calculation of costs. Gold et al. (1996) recommend that, if you are using a quality-adjusted life year measure of quality, production costs should be excluded, or at least reported separately. But measures of the impact on production costs can be important variables for many VBP activities, especially from an employer's perspective. If VBP activities may have a measurable impact on these variables, evaluators should try to measure the effects and include them with costs unless the effects are explicitly captured by quality variables.
Measuring Impact on Utilization of Services. If you want to use utilization as an indicator of quality, several measurement options exist. The easiest is to simply measure the use of the target service. For example, one could measure Cesarean section rates or mammography rates. Presumably, the VBP initiative would try to decrease the former and increase the latter, but this measurement strategy does not attempt to distinguish between appropriate and inappropriate changes in the rates of either service. The HEDIS® system follows this approach.
An alternative approach would be to conduct a detailed analysis of care, perhaps using medical records. This tends to be very expensive, but it is feasible and has been used in a variety of studies. For some illnesses, there are quality-of-care assessment tools that can be applied.
The utilization measures most commonly used fall into the following categories:
- Inpatient hospital admissions, days, and length of stay.
- Emergency room use.
- Outpatient hospital services.
- Outpatient physician visits.
- Referrals for specialty physician consultation.
- Pharmaceutical utilization.
These measures may be broken down by patient characteristics (e.g., gender, age, race), provider characteristics (e.g., high-volume vs. low-volume hospital), delivery setting (e.g., group vs. solo practice), diagnosis, and procedure.
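Utilization measures of this kind are often expressed as events per 1,000 member-years so that groups of different sizes can be compared. The sketch below shows one way to compute such rates and break them down by a single characteristic; the record layout, field names, and counts are hypothetical.

```python
from collections import defaultdict


def rate_per_1000(records, group_key):
    """Events per 1,000 member-years, broken down by one characteristic.

    records: dicts with "events", "member_years", and grouping fields.
    """
    events = defaultdict(float)
    exposure = defaultdict(float)
    for rec in records:
        events[rec[group_key]] += rec["events"]
        exposure[rec[group_key]] += rec["member_years"]
    return {group: 1000 * events[group] / exposure[group] for group in events}


# Hypothetical inpatient admissions by age band and sex.
records = [
    {"age_band": "18-44", "sex": "F", "events": 30, "member_years": 1000},
    {"age_band": "18-44", "sex": "M", "events": 20, "member_years": 1000},
    {"age_band": "45-64", "sex": "F", "events": 45, "member_years": 500},
    {"age_band": "45-64", "sex": "M", "events": 55, "member_years": 500},
]
by_age = rate_per_1000(records, "age_band")
by_sex = rate_per_1000(records, "sex")
print(by_age)
print(by_sex)
```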
Purchasers should recognize that savings from reduced resource use do not necessarily flow back to them. For example, if providers get paid full capitation rates, they will capture any savings unless the capitation rates are lowered. Similarly, if hospital admissions are paid on a per case basis, as with diagnosis-related groups, reductions in length of stay will not generate savings for the purchaser.
Measuring Impact on Health-Related Behaviors. Evaluators can assess whether VBP programs encourage low-risk behaviors by measuring changes in the number or percent of employees who engage in these behaviors.
Measuring Impact on Patients' Decisions. To assess whether a VBP activity has affected the choices that patients make, evaluators can look for changes in the percentage of patients or employees choosing providers or health plans that have been identified as "top" or preferred performers on report cards or quality evaluations.
Dealing With Multidimensional Outcomes
Evaluators are often interested in several outcomes. One important question to resolve is whether to examine these outcomes separately or in aggregate. This is similar to the issue that arises in any cost-effectiveness analysis when the intervention affects several aspects of health status. The Panel on Cost-Effectiveness in Health and Medicine (go to Gold et al., 1996) recommends that multiple outcomes be aggregated into a quality-adjusted life years scale. Conceptually, this kind of aggregation is possible for evaluations of VBP activities, but it would require measuring the many aspects of health status that might be affected. This is difficult to do but may be feasible for VBP activities that closely resemble a clinical intervention.
In other cases, when the outcomes of interest are intermediate health outcomes (including process or structure measures), aggregation is more complex. One approach is to not aggregate the various outcomes. The evaluation would report the various outcomes, and users of the research findings would need to draw their own conclusions about whether the investment was justified. This approach is most feasible when the number of outcomes is small and the users are comfortable weighing measures of quality against measures of costs. An alternative aggregation methodology involves combining outcomes into one or more domains of performance based upon subjective values. For more on this topic, go to the discussion of grouping HEDIS® measures.
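The alternative of combining outcomes into domains based on subjective values can be sketched as a weighted average within each domain. The measures, domain assignments, scores, and weights below are hypothetical; in practice the weights would come from the purchaser or other stakeholders.

```python
def domain_scores(measures, weights):
    """Weighted average of measure scores within each performance domain.

    measures: {measure: (domain, score on a 0-100 scale)}.
    weights: {measure: subjective importance weight}.
    """
    totals, weight_sums = {}, {}
    for name, (domain, score) in measures.items():
        totals[domain] = totals.get(domain, 0.0) + weights[name] * score
        weight_sums[domain] = weight_sums.get(domain, 0.0) + weights[name]
    return {domain: totals[domain] / weight_sums[domain] for domain in totals}


# Hypothetical measures, scores, and stakeholder-assigned weights.
measures = {
    "mammography screening": ("prevention", 80.0),
    "childhood immunization": ("prevention", 60.0),
    "beta-blocker after heart attack": ("chronic care", 90.0),
}
weights = {
    "mammography screening": 1.0,
    "childhood immunization": 3.0,
    "beta-blocker after heart attack": 1.0,
}
scores = domain_scores(measures, weights)
print(scores)
```

Because the weights are subjective, it is worth checking whether the conclusions change when a different but equally defensible set of weights is used.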
Task 2: Collect the Data
Data may take the form of qualitative information or quantifiable values. They can be obtained from a variety of primary sources (where the evaluator collects the data) and secondary sources (where the data are collected by someone else but used by the evaluator). Regardless of the type or source of data, the quality of the program evaluation will depend on the data's reliability and validity. Since final and intermediate outcomes are the focus of VBP activities, the manner in which you measure these outcomes is crucial for producing credible and useful evaluation results. However, the importance of measurement accuracy applies equally to non-outcome data that are used in evaluations to control for other confounding factors.
For the purposes of evaluating VBP activities, common primary sources of data include administrative claims data, medical records, stakeholders (e.g., the health plans involved in guideline development), and health care consumers. Since this guide presumes that purchasers interested in gathering data from primary sources are likely to consult and contract with outside experts, this section is limited to a discussion of two major secondary sources:
- HEDIS® (Health Plan Employer Data and Information Set).
- CAHPS® (Consumer Assessment of Health Plans).
A Quick Look at HEDIS®. HEDIS® is a set of about 60 process and outcome measures designed to capture dimensions of health plan quality. Initially developed by a group of large private employers, HEDIS® is now administered by the National Committee for Quality Assurance, Washington, DC. To date, HEDIS® has been used primarily to monitor the performance of HMOs, although research is currently being conducted to examine the feasibility of HEDIS® indicators for PPOs and other types of insurance products. Because each of the approximately 60 performance measures includes specific guidelines for data collection and reporting, the results are standardized. This allows purchasers and others to compare the performance of any health plan to the performance of other health plans nationally, regionally, and locally.
With the exception of the CAHPS® composites that recently became part of the HEDIS® reporting requirements (see more on this below), HEDIS® measures do not capture final outcomes. However, expert panels selected the measures included in HEDIS® because research evidence indicates that they are correlated with both costs and health status. For example, some of the HEDIS® measures capture the utilization rate of health care services and surgical procedures that are often overused, resulting in unnecessary costs and risks to patients. The Cesarean section (C-section) rate is an example of one such measure:
- First, C-sections are more costly than vaginal deliveries.
- Second, the scientific literature suggests that many C-sections are unwarranted (since vaginal deliveries are possible) and that those unwarranted C-sections put women at unnecessary risk for infection and other post-surgical complications (Sakala, 1993).
Other HEDIS® measures capture utilization rates for preventive care services and for screenings that are recommended for subsets of enrolled populations. For example, HEDIS® includes rates of mammography screening for women, prostate cancer screening for men, and immunizations for infants and adolescents. Although preventive care and health screenings do not directly capture any of the final outcomes in which purchasers are interested, they are thought to be correlated with health status and costs since screenings can lead to early detection and less expensive treatment, and prevention can lead to the avoidance of illness. HEDIS® also includes several clinical measures of treatment for selected diseases, such as rates of prescription of beta-blockers following a heart attack or readmission rates following discharge for a mental health diagnosis.
The collection, analysis, and dissemination of HEDIS® data have been a major focus of employers' VBP activities in the last 8 years. More recently, employers have been analyzing HEDIS® results to evaluate the impact of those and other activities. There are many approaches purchasers can use to measure the effects of VBP initiatives on HEDIS® scores. For example, you could examine whether a plan's scores surpass minimally accepted standards, or compare a plan's scores to regional or national averages or the Nation's top performing plans. Another option is to look for changes in a plan's HEDIS® scores from one period to the next.
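The three comparisons described above (against a minimum standard, against a benchmark, and against the prior period) can be sketched as a simple summary for each rate. The measure names, rates, benchmark values, and minimum standards below are hypothetical and are not actual HEDIS® results or thresholds.

```python
def compare_scores(current, previous, benchmark, minimum):
    """Summarize each rate against a floor, a benchmark, and the prior period."""
    summary = {}
    for measure, score in current.items():
        summary[measure] = {
            "meets_minimum": score >= minimum[measure],
            "vs_benchmark": round(score - benchmark[measure], 1),
            "change": round(score - previous[measure], 1),
        }
    return summary


# Hypothetical plan rates (percent); not actual HEDIS results.
current = {"mammography": 74.0, "beta-blocker": 88.0}
previous = {"mammography": 70.0, "beta-blocker": 90.0}
benchmark = {"mammography": 72.0, "beta-blocker": 91.0}
minimum = {"mammography": 65.0, "beta-blocker": 85.0}
results = compare_scores(current, previous, benchmark, minimum)
for measure, result in results.items():
    print(measure, result)
```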
However, for the individual purchaser, it is not clear how well HEDIS® serves as a source of data for evaluation purposes. One complication of using HEDIS® to assess the impact of VBP activities is the fact that there are more than 100 rates (i.e., a single measure may include separate rates for men and women or for people of different ages). It is not uncommon for plans to perform well on some rates but not on others, making it difficult to conclude anything about the overall performance of the plan across all rates. (Go to the box for more on this topic.) In addition, because many purchasers collect and analyze HEDIS® data, there is no way to know whether changes in performance were due to the collective focus of all purchasers or the specific activities of a single purchaser.
Additional information about HEDIS® can be found on the NCQA Web site at www.ncqa.org.