Evaluating the Impact of Value-Based Purchasing: A Guide for Purchasers
Focus Groups. Focus groups bring together individuals who meet a specific set of criteria (e.g., health plan enrollees who have diabetes, hourly employees enrolled in health maintenance organizations [HMOs]) to discuss a set of topics or questions as a group. This research design aims to shed light on an uncertain situation by eliciting feedback from one or more groups and gleaning insights from the various perspectives that participants share. It is a useful technique for getting general reactions to VBP activities and a sense of how well the programs meet the needs of their target audience. For example, to evaluate a VBP program to reduce health risks associated with diabetes, you could convene a focus group of patients to learn how much they know about the ongoing program and what they think of it. Similarly, you could convene a group of physicians to solicit their opinions and ideas about the program. Focus groups can also provide insights into findings from quantitative studies; for example, evaluators could use focus groups to help elaborate on and interpret survey findings.
An evaluation using this research design generally includes two or more focus group sessions. (One session is never sufficient because there is no way to know whether the results have been biased in some way, for example, if one person dominated the group or if the participants were somehow different from their counterparts in other markets.) During each of these sessions, an experienced moderator (sometimes called a facilitator) leads the group of approximately seven to ten people through a series of discussion topics. This person is responsible for encouraging everyone to share their own views, making respondents feel relaxed and comfortable with the process, drawing out any pertinent concerns or issues, and recording responses in a nonjudgmental way. To guide the conversation, the moderator uses an open-ended interview guide, or protocol, that is basically an outline of topics or questions that need to be covered.
Sessions typically take 1 to 2 hours. Participants are usually compensated for their time in some way, such as a small cash payment or gift certificate. After the session, the moderator analyzes the discussion and produces a report that captures any recurring themes, concerns, or feelings of the participants.
Advantages of This Approach. Focus groups are a relatively easy way to learn about a particular issue from key stakeholders. In particular, they can be very effective at creating a relaxed atmosphere that encourages people to share opinions, interact with others, and express views they might suppress in more formal one-on-one interviews. As a result, the method facilitates a sharing of information, ideas, opinions, and experiences that can result in unexpected insights.
Another benefit is that focus groups tend to be less expensive than the one-on-one interviews that would be required to get a comparable level of feedback. The turnaround time for results is also likely to be faster.
Drawbacks of This Approach. Because the sample size for a focus group is so small and the sample has not been randomized, you cannot use the findings to make inferences about the larger population. Consequently, while you may gain insights into various opinions, you cannot determine how widespread an opinion is or how deeply held it may be. Another issue is that focus groups can only capture the thoughts, feelings, and opinions of people who are able and willing to verbalize their views. As a result, this technique does not capture the perspectives of others whose contribution could be very valuable, such as people with speech or hearing problems, those who are very young or very old, those who are shy about speaking openly in public, or those who could not participate for other reasons (e.g., because they are ill, cannot afford to take the time off from work, or do not have child care).
Depending on how many sessions you conduct, focus groups can become expensive, especially if you use a professional firm to recruit or conduct the sessions. (Travel costs may also pose an obstacle if the potential participants are scattered around the country.) For busy stakeholders, scheduling can also be a significant barrier to implementing a focus group; one-on-one interviews may offer greater flexibility.
Example: Focus Groups
Consumers' Responses to Reports on Health Plan Performance
Purchasers. The California Public Employees' Retirement System (CalPERS), the Missouri Consolidated Health Care Plan, and General Motors.
Description of the Research Activity. The "Report on Report Cards" was a 2-year study that aimed at documenting and assessing two VBP programs of five prominent public and private purchasers: their reports for employees on the performance of contracted health plans and their use of financial incentives to promote quality. (In addition to the three purchasers listed above, the study evaluated VBP activities of The Alliance in Denver, Colorado, and the Cleveland Health Quality Choice program in Ohio.) As part of this study, researchers conducted a series of focus group sessions with employees to learn about their experiences with and attitudes towards health plan report cards. These sessions covered several topics, including the information sources that consumers used to choose a health plan, the factors they considered in their decision, whether and how they used the report cards, their reactions to the report cards, and their suggestions for improvement.
Evaluators. The study was conducted by the Washington, DC-based Economic and Social Research Institute (ESRI) and its contractors, with funding from the Robert Wood Johnson Foundation. An experienced moderator with Lake Snell Perry & Associates, also based in Washington, DC, produced the focus group findings.
Research Design. The evaluators conducted focus groups with employees receiving insurance coverage through three of the five purchasers studied: the California Public Employees' Retirement System, the Missouri Consolidated Health Care Plan, and General Motors.
Methods. An experienced moderator with a strong background in health care issues conducted eight focus groups between October 27 and November 10, 1998: three for CalPERS in Sacramento, Fresno, and San Diego; two for the public purchaser in Missouri in St. Louis and Jefferson City; and three for General Motors (two in Dayton, OH, and one in Detroit, MI). In preparation for these sessions, the moderator consulted with the ESRI researchers to establish the goals of the focus group design and to determine what issues to cover. The moderator then drafted a protocol that was reviewed and approved by ESRI. Employees were invited to participate; in only one site, CalPERS, were the participants randomly recruited by a professional recruiting facility. The other two purchasers identified the participants themselves.
Each session lasted 1 to 2 hours and included 8 to 10 participants; all sessions were recorded and transcribed. After each set of sessions, the moderator produced a report summarizing the findings and worked with ESRI to identify cross-cutting themes and recurrent suggestions.
Results. By introducing the consumers' perspective, the focus groups made a valuable contribution to this evaluation of VBP activities. In particular, the focus group findings offered insights into the barriers that report cards face in being embraced by each group of employees and what needs to be done to make the reports more widely accepted and used. While the feedback from each session reflected a unique set of experiences and perspectives, as well as the specific content and style of each purchaser's performance report, the analysis of the focus groups also uncovered a number of similarities across the sessions. These cross-cutting themes included skepticism about performance information, uncertainty about factoring quality information into decisions, appreciation for the convenience that the reports can offer, problems understanding the presentation of data, and concern about information overload.
Advantages and Disadvantages of the Evaluation Strategy. The focus groups were an important complement to this study's extensive interviews with purchasers, health plans, and consumer representatives (e.g., union leaders) at each site. In addition to offering a fresh perspective, the findings allowed the evaluators to see the ways in which employees' perceptions and reactions to the report cards were consistent with or different from the impressions of the stakeholders interviewed for the study.
The downside of the focus groups was the inability to generalize the findings to all employees, let alone all consumers. There was no way to know whether the participants were more or less familiar with or inclined towards the report cards than their colleagues would be. A second problem was that, because of the costs associated with the focus group sessions, the study was not able to include comparable research in all five sites.
Source: Meyer JA, Wicks EK, Rybowski LS et al., Report on Report Cards: Initiatives of Health Coalitions and State Government Employers to Report on Health Plan Performance and Use Financial Incentives. Vol. II. Washington, DC: Economic and Social Research Institute; 1999. (See also Meyer et al., 1998.)
Interviews. Interviews are a means of collecting various kinds of information, including facts, impressions, opinions, and concerns. Through interviews, evaluators can gain insights into the intent of a VBP activity, learn how the activity is really being implemented, probe the perceptions and responses of key stakeholders such as health plans or providers, and identify issues and barriers not apparent on the surface. Whether interviews are conducted on their own or as part of a case study, their findings can play a critical role in identifying questions and hypotheses worthy of further research and appraising the potential value of a quantitative research design.
Depending on the goals and design of the evaluation, individual stakeholders may take part in one or more interviews over a period of time. These interviews are typically conducted by one or two people, either in person or over the telephone. When two researchers are present, one usually takes responsibility for asking the questions while the other takes notes. Some researchers prefer to rely on recordings and/or transcriptions. Prior to the interview, the researchers develop a protocol, or interview guide, consisting of open-ended questions that are designed to keep the conversation focused on the topic at hand. Depending on the purpose of the interview and the nature of the respondents, the interviewers may follow the protocol very closely or just use it as a prompt when needed.
Another option: cognitive interviews. A cognitive interview is a variation on a standard, one-on-one interview that aims to elicit opinions and other information. What makes a cognitive interview different is that it is designed to find out whether and how the respondent understands and thinks about a given set of materials, tasks, or activities. Cognitive interviewers may observe how a respondent navigates through a set of materials, or may use techniques such as "think-aloud" exercises, in which the respondent openly expresses thoughts and questions while reviewing the materials. Traditionally, these techniques have been used by survey developers to assess whether respondents will understand survey questions in the ways in which they are intended. More recently, evaluators have been using cognitive interviews to learn how people perceive, interpret, and use information on health care quality.
Advantages of This Approach. As an evaluation method, interviews offer the benefit of flexibility; they can be formal or informal, detailed or superficial, long or short. They also offer a rich source of individual data that can be mined for useful insights, common themes, and potential trends that may invite further investigation. Because of their intimate, one-on-one structure, interviews can also elicit more honest feedback and assessments than may be available through a focus group, where participants may feel reluctant to disagree with others or offer their own opinions and ideas.
Drawbacks of This Approach. Depending on the number of interviews, the difficulty of recruiting respondents, and the design of the study (e.g., if the interviews need to be done in person), interviews can be both time-consuming and costly. If the sample of respondents is not representative, the evaluators are also limited in their ability to generalize their findings. For example, if the researchers are able to interview all but one of the medical directors of the health plans involved in a regional VBP activity, they can be fairly confident that their findings reflect the "population" of health plans. But if the interviewers also spoke to a handful of the doctors practicing in the area, they would not be able to draw any broad conclusions about the "population" of physician practices.
Example: Cognitive Interviews
Medicare Beneficiaries' Use of Comparative Information When Choosing Health Plans
Purchaser. Centers for Medicare & Medicaid Services (CMS), formerly the Health Care Financing Administration (HCFA).
Description of the Research Activity. The Medicare program provides beneficiaries with information about health plan costs, benefits, and performance (expressed in HEDIS® and CAHPS® measures) via the Medicare Web site and a toll-free telephone number that beneficiaries can call to request the information. Researchers attempted to understand how Medicare beneficiaries used the comparative information when evaluating their health plan choices and what they thought of the information (e.g., the amount of information, how easy it was to understand, its usefulness, the presentation, whether beneficiaries trust it, what they like, and what improvements they would like to see). The results offer insights into how Medicare beneficiaries incorporate various pieces of information into their decision-making process and suggest ways to improve the existing information.
Evaluators. The evaluation was conducted by academic researchers from Research Triangle Institute and Pennsylvania State University.
Research Design. The evaluators conducted cognitive interviews with 25 Medicare beneficiaries from three counties in Pennsylvania.
Methods. The researchers first assembled booklets that used the same format and presented the same information as the Medicare Compare booklets available through the 1-800-Medicare number. They also developed and pilot-tested an interview protocol. Using the booklets and protocol, the researchers then conducted exploratory qualitative research using a convenience sample of Medicare beneficiaries.
The study participants were presented with a booklet that included descriptive information about the options available through Medicare, as well as comparative cost, benefits, and quality information from the Medicare Compare database for plans available in their county. The interviewer asked the participants to imagine that they were choosing a health plan for themselves using the information provided, and to "think aloud" while comparing the plans and making their ultimate choice. The interviewer observed, used scripted probes when applicable, took handwritten notes, and audiotaped the interviews. The interviews lasted 1.5 to 2 hours each.
To identify themes and make the data systematically comparable across interviews, the evaluators subjected the qualitative data collected during the interviews to content analysis. They used the transcripts to create a spreadsheet that included data from all 25 interviews, then used the spreadsheet to group similar/dissimilar responses and obtain frequency counts across interviews.
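The tallying step in this kind of content analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluators' actual procedure: the theme labels, the interview identifiers, and the function names are all hypothetical, and the sketch assumes that each transcript has already been coded by hand into a set of themes.

```python
from collections import Counter

# Hypothetical coded data: one set of theme codes per interview transcript.
# The labels are illustrative and are not taken from the study.
coded_interviews = {
    "interview_01": {"compared_costs", "trusted_information", "confused_by_ratings"},
    "interview_02": {"compared_costs", "confused_by_ratings"},
    "interview_03": {"compared_costs", "trusted_information"},
    "interview_04": {"trusted_information", "wanted_medigap_comparison"},
}


def theme_frequencies(coded):
    """Count how many interviews mention each theme at least once."""
    counts = Counter()
    for themes in coded.values():
        counts.update(set(themes))  # a set ensures one count per interview
    return counts


def share_of_interviews(coded):
    """Return each theme's share of all interviews, as a fraction."""
    total = len(coded)
    return {theme: n / total for theme, n in theme_frequencies(coded).items()}


if __name__ == "__main__":
    for theme, n in theme_frequencies(coded_interviews).most_common():
        print(f"{theme}: {n} of {len(coded_interviews)} interviews")
```

Because the sample is a small convenience sample, counts like these describe the interviews themselves; they are not estimates of how common a view is among all beneficiaries.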
Results. All subjects read through the information booklets sequentially. Most compared the plans on the specific costs and benefits that mattered to them personally, but used the performance measures to confirm or supplement their choices. Participants spent the most time on the costs and benefits section and the least on the specific quality ratings, but rated quality just as important as costs for picking a particular plan. Overall, the majority felt confident in their ability to pick the best plan. Most were generally satisfied with the amount of information provided, said it was useful for comparing plans, and trusted it. The subjects generally liked the information, but many felt the costs and benefits could be presented better and would like comparisons of Medigap plans to be added. Finally, respondents had some common areas of confusion and misunderstandings regarding the information presented to them.
Advantages and Disadvantages of the Evaluation Strategy. The primary advantage of the study design is the ability to obtain in-depth information about how beneficiaries incorporate different types of information into their decision-making process, how they navigate the materials, how they understand and interpret what they see, and what they think of how the data was presented. This feedback also suggested ways to improve existing information to make it more useful to beneficiaries. Two disadvantages of the design are that the results are not generalizable to the entire Medicare population and the study participants were not making binding health plan choices.
Source: Uhrig JD. Beneficiaries' Use of Quality Reports for Choosing Medicare Health Plans. [Ph.D. Dissertation]. Pennsylvania State University; 2001.
Quantitative Research Designs
The purpose of quantitative approaches is to establish numerical evidence regarding correlation or causality between VBP activities and their intended outcomes. While there are a number of ways in which this goal may be accomplished, quantitative research designs basically vary in two ways:
- Timing of observations.
- Use of a comparison group.
One key difference is in the timing of observations relative to the intervention (i.e., the implementation of a VBP activity) and to other observations. For example, some quantitative designs call for the collection of baseline data before the intervention as well as data after the intervention. Also, some designs collect data once, while others specify that data be collected in multiple periods, both before and after the intervention.
The designs also differ in their use of a comparison (or control) group, which is a group that is similar to the intervention group but is not affected by the VBP activity. From a research perspective, a related issue is whether the individuals or organizations affected by an intervention were randomly selected; however, randomization is often not feasible for VBP activities.
These two variables determine whether the design allows for comparisons, which enable the evaluator to isolate or disentangle the impact of an intervention from other events or phenomena that could affect the outcomes of a VBP activity. VBP activities take place in a complicated environment with ongoing changes in purchaser and provider behavior, most of which have nothing to do with any VBP activities. Having a comparison point before the intervention or a comparison group adds validity or strength to a research design's ability to capture the causal effects of VBP activities.
This section describes the five quantitative research designs that are likely to be the most helpful for evaluating VBP activities:
- Cross-sectional design with no comparison group.
- Pre-test/post-test (or before/after).
- Cross-sectional design with comparison group (or static group comparison).
- Nonequivalent comparison group.
- Time series.
Cross-Sectional Design With No Comparison Group. The cross-sectional design without a comparison group basically involves measuring variables of interest in the intervention group (or the population affected by VBP activities) one or more times after the VBP intervention occurs. For example, to measure the impact of a disease management program, evaluators could collect and analyze detailed measures of health status or health care utilization. There is no pre-test or comparison point from before the intervention, nor is there any comparison group not receiving the intervention. Thus, the outcomes must be interpreted relative to an internally defined set of standards or external benchmarks. (However, if external benchmarks are used, you may interpret them as a comparison group, depending on how they were derived.)
See Figure 2.
To analyze data collected for this design, evaluators typically use simple descriptive statistics and statistical tests of the difference in means and frequencies. If there are multiple observation points after the intervention, researchers can also use statistical tests to compare the indicators or measures with each other over time. Finally, multivariate statistical techniques allow researchers to see if the outcomes of interest vary across subgroups (e.g., by the gender or age of a patient, or by clinic site within a large health care system).
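As a rough sketch of the simplest of these analyses, the Python below computes descriptive statistics for two subgroups and a test statistic for the difference in their means. The data are invented for illustration (post-intervention scores at two hypothetical clinic sites), the function names are assumptions, and Welch's t statistic is used here as one common choice rather than as a method the guide prescribes.

```python
from math import sqrt
from statistics import mean, stdev


def describe(values):
    """Return simple descriptive statistics for one group."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom
    for the difference in means between two independent groups."""
    a, b = describe(group_a), describe(group_b)
    var_a = a["sd"] ** 2 / a["n"]  # squared standard error of mean A
    var_b = b["sd"] ** 2 / b["n"]  # squared standard error of mean B
    t = (a["mean"] - b["mean"]) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (a["n"] - 1) + var_b ** 2 / (b["n"] - 1)
    )
    return t, df


# Hypothetical post-intervention outcome scores at two clinic sites.
clinic_a = [7.1, 6.8, 7.4, 7.0, 6.9, 7.3]
clinic_b = [7.9, 8.1, 7.6, 8.0, 7.7, 8.3]

if __name__ == "__main__":
    print(describe(clinic_a))
    print(describe(clinic_b))
    t_stat, dof = welch_t(clinic_a, clinic_b)
    print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

To judge significance, you would compare the t statistic with the critical value for the computed degrees of freedom. A significant difference between subgroups shows only that the outcome varies across them; without a pre-test or comparison group it says nothing about the program's impact.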
Advantages of This Approach. Since the evaluators are gathering only post-intervention observations for the intervention group, the data requirements are lower than those required for other design approaches. In addition, this design lends itself to an in-depth analysis of the intervention group and may provide important and useful insights regarding the functioning of the program. The basic cross-sectional design is especially useful in early stages of implementation, as it can provide valuable information regarding how a VBP program was implemented in practice, whether the activity appears to be correlated with the intended outcomes, and how the program might be improved before a more rigorous evaluation is undertaken.
Drawbacks of This Approach. Because there is no pre-test or comparison group, this design cannot be used to make statements about the impact of VBP activities relative to what was occurring before they were implemented. Also, since this design cannot disentangle the effects of the VBP activity from any of the other many forces that might influence the outcome variables of interest (such as a time trend that would have occurred anyway), you cannot use this approach to establish causal relationships.
Example: Cross-Sectional Design With No Comparison Group
An Evaluation of a Defined-Contribution Model
Purchaser. The Buyers Health Care Action Group (BHCAG), a health benefit purchasing alliance in the Minneapolis-St. Paul area.
Description of the Research Activity. BHCAG is a coalition of employers that created an innovative direct-contracting model designed to provide local service delivery organizations (known as care systems) with the incentive to compete on the basis of premium cost and quality. The purchasing model incorporates risk-adjusted payments, standardized benefits, and the dissemination of a report card containing satisfaction and quality information. Based on competitive bids, the care systems are placed into one of three cost tiers, with higher premiums required for care systems in the middle- and high-cost tiers. One design feature of this model is the use of "level dollar" (also known as fixed or defined) premium contributions by employers. This policy exposes employees to the marginal difference in premiums. In theory, when employees pay more of the marginal cost of insurance, their choices should be more efficient; the expectation is that individuals who do not value insurance at its full marginal cost will choose cheaper alternatives. The purpose of the evaluation was to determine whether a defined contribution model increases employees' sensitivity to premiums.
Evaluators. Academic researchers from Cornell University and the University of Minnesota evaluated the model by determining whether employees responded to premium and quality differences across the care systems.
Research Design. The evaluators conducted a cross-sectional survey of employees enrolled in BHCAG's program.
Methods. To collect information on enrollment, premiums, and provider group characteristics, as well as demographic data and measures of socioeconomic status and health status for employees, the evaluators fielded a post-intervention telephone survey and reviewed administrative files. They then constructed regression models to predict the probability that single (nonmarried, no dependents) employees enrolled in one of the care systems as a function of the out-of-pocket premium, characteristics of the care system, employee characteristics, and report card ratings. Conditional logistic regression methods incorporating characteristics of provider groups and employees were used to estimate care system choice models.
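To show the general shape of a conditional logit choice model, the sketch below computes choice probabilities for three care systems from a given set of coefficients. It rests on several assumptions: the coefficient values, care system names, premiums, and ratings are hypothetical and are not the study's estimates; only attributes of the alternatives are included, although the study's models also incorporated employee characteristics; and the estimation step (typically maximum likelihood) is omitted.

```python
from math import exp


def choice_probabilities(alternatives, coefficients):
    """Conditional logit: P(j) = exp(V_j) / sum_k exp(V_k),
    where V_j is a linear combination of alternative j's attributes."""
    utilities = {
        name: sum(coefficients[attr] * value for attr, value in attrs.items())
        for name, attrs in alternatives.items()
    }
    # Subtract the largest utility before exponentiating for numerical stability.
    top = max(utilities.values())
    weights = {name: exp(v - top) for name, v in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical coefficients: a negative sign on premium means that higher
# out-of-pocket premiums make a care system less likely to be chosen.
coefficients = {"premium": -0.05, "rating": 0.8}

# Hypothetical care systems in the low-, middle-, and high-cost tiers.
care_systems = {
    "system_low": {"premium": 20.0, "rating": 3.5},
    "system_mid": {"premium": 45.0, "rating": 4.0},
    "system_high": {"premium": 70.0, "rating": 4.2},
}

if __name__ == "__main__":
    for name, p in choice_probabilities(care_systems, coefficients).items():
        print(f"{name}: {p:.3f}")
```

In a model of this form, premium sensitivity is read from the sign and size of the premium coefficient: the more negative it is, the more a premium increase shifts predicted enrollment toward cheaper care systems.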
Results. The empirical results indicate that single employees are, on average, very responsive to premiums. The sensitivity to premiums is reduced for older employees and for those who value high-quality care. However, employees with more experience with the health care market are more price-responsive. Employees are also sensitive to differences in the quality of care systems, as presented in the report card, and to differences in convenience measures, such as the distance to clinics. Based on these findings, it appears that a defined-contribution policy may make employees more cost-conscious in their health care decisions.
Advantages and Disadvantages of the Evaluation Strategy. This evaluation was able to control for factors such as age and gender that may have biased the estimated effect of premiums. Nonetheless, the data do not allow the effects of other important factors to be identified separately; these include the restriction that primary care physicians can be affiliated with only one care system and variations in employers' fixed-dollar contribution policies. The lack of pre-intervention data and of a comparison group also limits the inferences that can be made.
Source: Schultz J, Thiede Call K, Feldman R et al., Do Employees Use Report Cards to Assess Health Care Provider Systems? Health Services Research 2001;36(3):509-30.
Pre-test/Post-test (or Before/After). This approach allows evaluators to compare individual or organizational data collected after an intervention to data collected prior to the intervention. The assumption is that the analytic units (i.e., individuals or organizations) under observation would look the same at both points in time in the absence of any intervention. In general, as long as the data were collected and aggregated in the same way at the two points in time, this design can be implemented.
See Figure 3.
Evaluators typically use a multivariate analytical approach to determine whether any differences in the health care outcomes measured before and after the intervention are statistically significant. However, when data are not available at the individual level, it becomes more difficult to conduct multivariate statistical analysis because the number of observations is insufficient. For example, if you wanted to know whether and how a VBP activity affected the health plan premiums of the few plans you offer to employees, an analysis of premiums before and after the intervention would have limited statistical reliability because the data came from only those plans. In some cases, statistical tests can be conducted with aggregate data if appropriate denominators exist. For example, if you had data on the population at risk in the time periods before and after an intervention, you could test whether a VBP activity was associated with a statistically significant change in admissions per thousand.
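The admissions example can be worked through with aggregate counts alone. The sketch below is one reasonable way to run such a test, not a prescribed method: it treats admission counts as Poisson and uses a normal approximation, which assumes the counts are reasonably large. The admission counts and population sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def admissions_per_thousand(admissions, population):
    """Express an admission count as a rate per 1,000 people at risk."""
    return 1000 * admissions / population


def rate_change_test(adm_before, pop_before, adm_after, pop_after):
    """Two-sided z-test for a change in admission rates between two periods,
    using a pooled rate under the null hypothesis of no change."""
    rate_before = adm_before / pop_before
    rate_after = adm_after / pop_after
    pooled = (adm_before + adm_after) / (pop_before + pop_after)
    se = sqrt(pooled * (1 / pop_before + 1 / pop_after))
    z = (rate_after - rate_before) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: 10,000 people at risk in each period.
    print(admissions_per_thousand(900, 10_000))  # rate before the intervention
    print(admissions_per_thousand(810, 10_000))  # rate after the intervention
    z, p = rate_change_test(900, 10_000, 810, 10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value indicates that the change is unlikely to be due to chance alone; it does not show that the VBP activity caused the change.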
Advantages of This Approach. The strength of this approach is that researchers can use pre-intervention information to help interpret the information collected after the purchaser has implemented a VBP activity. As with the other quantitative designs, more than one observation can be made post-intervention. By collecting multiple observations after the intervention, researchers are better equipped to investigate such questions as whether intervention effects continue over time and whether or not there is a lag between when the intervention is implemented and when effects become observable.
Drawbacks of This Approach. This design suffers from several weaknesses, as there are many "rival hypotheses" or alternative explanations for changes from the pre-test to the post-test outcome observations. First, the design does not control for time trends, so it is possible that any observed change would have taken place even in the absence of the intervention because of a trend already underway. For example, if HMO premiums were falling for reasons unrelated to the VBP activity, this design would tend to attribute those falling premiums to the VBP activity when in fact they would have occurred anyhow. Another weakness of this design is that it cannot tell whether another intervention or external event that was occurring simultaneously with the VBP intervention, rather than the VBP activity itself, could be responsible for the apparent change between the pre-test and post-test observations.
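A small numerical sketch makes the time-trend problem concrete. The premium figures are hypothetical, and the trend adjustment assumes that several pre-intervention observations happen to be available, which the basic pre-test/post-test design does not require. The adjustment simply extends the average past change by one period; it is a rough illustration rather than a full time series analysis.

```python
def naive_change(pre_values, post_value):
    """Pre-test/post-test estimate: post value minus the last pre-intervention value."""
    return post_value - pre_values[-1]


def trend_adjusted_change(pre_values, post_value):
    """Post value minus the value projected by extending the
    average period-to-period change seen before the intervention."""
    steps = [later - earlier for earlier, later in zip(pre_values, pre_values[1:])]
    average_step = sum(steps) / len(steps)
    projected = pre_values[-1] + average_step
    return post_value - projected


# Hypothetical monthly HMO premiums (dollars) in the three years before the VBP activity.
premiums_before = [220.0, 210.0, 200.0]
# Hypothetical premium one year after the activity began.
premium_after = 190.0

if __name__ == "__main__":
    print(naive_change(premiums_before, premium_after))           # apparent change
    print(trend_adjusted_change(premiums_before, premium_after))  # change beyond the prior trend
```

With these figures, the simple before/after comparison shows a $10 drop, but premiums were already falling by $10 a year, so nothing is left over once the prior trend is taken into account.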
A third weakness of this design applies when the people or organizations involved in the evaluation know they are being observed or measured in some way. In some cases, the act of being observed or studied before the intervention leads to a change in the subsequent observation, even without any effect from the intervention. For example, simply completing a survey regarding attitudes and satisfaction before the intervention might lead to improved attitudes and satisfaction (i.e., the act of measurement becomes a sort of intervention itself). This effect (referred to as the Hawthorne effect or a testing effect) can lead to erroneous conclusions about the apparent impact of an intervention.
Caveat: Watching for the Hawthorne Effect
In quantitative as well as qualitative research, the Hawthorne effect refers to the possibility that the process of evaluation may affect the results (Babbie, 1998). In other words, if individuals and organizations know that they are being monitored, their performance may improve regardless of whether or not the program or intervention is actually effective. This effect tends to lead to estimates that are more favorable to the intervention than otherwise would be expected.
The Hawthorne effect should be anticipated in evaluations of VBP efforts, although it is not necessarily undesirable if the ultimate goal is improvement. For example, if health plans know that the purchaser is analyzing certain HEDIS® data to evaluate a VBP initiative, plans may be more prone to improve on these measures than if they did not know what measures the purchaser would be examining. When analyzing the data, evaluators may have to try to distinguish whether plans improved simply because they were being watched or examined or because of the direct effect of the VBP activities.