Summary of the Presentations (continued 3)
Expanding Research and Evaluation Designs to Improve the Science Base for Health Care and Public Health Quality Improvement Symposium
Session III. QII at the Multiple Clinical Systems Level: Improving Chronic Illness Care Evaluation
Tracy Orleans, Ph.D. (Chair/Moderator)
Senior Scientist/Senior Program Officer, Department of Research and Evaluation, Robert Wood Johnson Foundation
The Improving Chronic Illness Care Evaluation
Edward Wagner, M.D., M.P.H.
Director, Center for Health Studies, MacColl Institute for Healthcare Innovation
Dr. Wagner noted that when the Robert Wood Johnson Foundation (RWJF) funded the Improving Chronic Illness Care (ICIC) program, there had been advances in chronic illness care that the Institute of Medicine's (IOM) Crossing the Quality Chasm report showed were not being implemented in practice. In addition, a reasonable body of evidence showed what worked for translating research findings into practice, but these findings also were not being used to change care for most patients. There was also evidence that the continuous quality improvement (CQI) methods implemented in the 1980s and 1990s and traditional continuing medical education were not effective. The Chronic Care Model (CCM) came out of the ICIC team's reading of the literature and their attempts to implement the findings in the Group Health system. His group next wanted to examine whether busy practices could change their practice systems in accord with a multi-component change model, such as the CCM, since multi-component approaches are more likely to be successful than a single-component approach. The questions they wanted to address were: can busy practices change their practice systems in accord with a multi-component change model, and, if so, what impact will it have on the quality of care and the outcomes of their patients?
The most successful model for implementing the CCM appeared to be the Breakthrough Series. At the heart of these collaboratives is an approach to quality improvement (QI) developed by Lloyd Provost and his colleagues called the "model for improvement," which has three basic elements:
- Set a clear aim.
- Have a measurement system in place that charts whether or not progress is being made.
- Implement a set of grounded changes and test them using rapid cycle PDSA methods to determine whether or not the changes accomplish what they were hypothesized to do.
The Breakthrough Series is a yearlong process that brings together teams from organizations wanting to make change with faculty in a series of in-person meetings, or learning sessions, with much electronic communication in between the meetings. During learning sessions, teams plan sets of changes to be tested in the action periods that follow. One of the questions they hoped to tackle with the Chronic Care Model was whether or not it is truly a generic model, meaning that if it works for diabetes, it should also work for other conditions. Dr. Wagner said that over the course of several years, Improving Chronic Illness Care conducted collaboratives in a variety of chronic conditions including diabetes, depression, asthma, and congestive heart failure. In the earliest collaboratives, participating organizations reported mild to moderate improvements in the quality of care. However, they wanted to conduct a more rigorous research-oriented evaluation. Through a competitive process, RAND Health was selected to evaluate early collaboratives. The program, the Improving Chronic Illness Care Evaluation (ICICE), was next described by Dr. Keeler.
Design Decisions and Lessons Learned in the Improving Chronic Illness Care Evaluation (ICICE)
Emmett Keeler, Ph.D.
Senior Mathematician, RAND Corporation
The Improving Chronic Illness Care Evaluation looked to answer these main questions:
- Did participating in the collaboratives induce positive change?
- Did implementing the CCM improve processes of care?
- Did implementing the CCM improve patient health?
- What did participation and implementation cost?
- What factors were associated with success?
On average, organizations in the study made more than 30 systemic changes over the year. These changes enabled the researchers to determine if moving toward the CCM is good for patients. The results showed that the process of care, patient self-management, and some outcomes improved more for intervention patients than control site patients. The areas that were emphasized in the learning sessions were the ones that changed the most. The Web site http://www.rand.org/health/ICICE includes findings from multiple papers and other information about the study.
Dr. Keeler discussed design considerations. In any intervention, it is important for the "signal" of the treatment effect to stand out by reducing the statistical "noise." One way to achieve a high signal-to-noise ratio is to have a large number of subjects, which reduces problems caused by random error. Another approach is to have tight, homogeneous criteria for enrollment and tight protocols for treatment. Tight designs reduce noise, but limit the generalizability of the results. The RCT paradigm eliminates biases further through randomization and blinding.
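The point about sample size can be illustrated with a short calculation. The sketch below is not from the presentation; it assumes a simple comparison of two group means with equal group sizes and a common standard deviation, and the effect size and standard deviation are hypothetical numbers chosen only for illustration.

```python
import math


def signal_to_noise(effect: float, sd: float, n_per_group: int) -> float:
    """Ratio of a treatment effect to the standard error of a difference
    between two group means (equal group sizes, common SD assumed)."""
    se_diff = sd * math.sqrt(2.0 / n_per_group)
    return effect / se_diff


# Hypothetical values: an effect of 0.5 units against an SD of 2.0 units.
for n in (25, 100, 400):
    ratio = signal_to_noise(effect=0.5, sd=2.0, n_per_group=n)
    print(f"n per group = {n:4d}  signal-to-noise = {ratio:.2f}")
```

Under these assumptions, quadrupling the number of subjects doubles the ratio, and halving the standard deviation (for example, through tighter enrollment criteria) has the same effect.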
Is randomization possible in all studies? Organizations want success (meaning that their organization improves), not necessarily science. Moreover, the organizations with which the ICICE team worked stated that their patients did not like to "feel like guinea pigs." His group was not able to convince participating organizations to accept randomization of sites for the trials, and patient-level randomization was not possible because this project addresses systemic changes. Components of strong study designs that are alternatives to the RCT include:
- Before and after with a matched control group.
- Multiple sources of data and an evaluation logic model.
- Planning for and testing potential biases.
The first study design decision concerns the constitution of the control group. Dr. Keeler suggested using another section of the organization, such as a clinic or a group of doctors, to serve as an internal control group. A before and after with a matched control group design can be used to control for secular trends and unrelated changes in the organization. Also, patients with chronic diseases that are fairly stable, such as diabetes, can be used as their own controls. One of the benefits of an external control site is the lack of contamination from intervention, although these sites are less likely to cooperate and are more expensive to include. The ICICE used internal control sites. To identify patients, Dr. Keeler's group used sampling frames from site registries of patients with the diseases of interest. There were some inaccuracies in the registry, and patients who said they did not have the disease in question or did not obtain care at the sites were excluded.
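The logic of a before-and-after design with a matched control group can be sketched as a difference-in-differences calculation. This is a generic illustration, not the ICICE analysis itself; the function name and all numbers are hypothetical.

```python
def difference_in_differences(
    int_before: float, int_after: float, ctl_before: float, ctl_after: float
) -> float:
    """Change at the intervention site minus change at the control site.

    Subtracting the control-site change removes secular trends and other
    changes that affected both sites equally.
    """
    change_intervention = int_after - int_before
    change_control = ctl_after - ctl_before
    return change_intervention - change_control


# Hypothetical data: percent of patients receiving a recommended process of care.
estimate = difference_in_differences(
    int_before=50.0, int_after=62.0, ctl_before=51.0, ctl_after=55.0
)
print(f"Estimated effect: {estimate:.1f} percentage points")
```

In this made-up example the intervention site improved by 12 points, but 4 of those points also appeared at the control site, so only 8 points would be attributed to the intervention. Contamination of an internal control site would bias this estimate toward zero.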
Five different sources of information were used:
- Patient telephone surveys—The patient is the best source of information on what the care provider does in terms of patient education and effective communication and whether information was retained. With chronic diseases, the patient is the one who has to do the work of self-care. The phone calls were used to cross-check if improvements noted in medical charts also were reported by the patient. The telephone survey could be used to determine if improvements in charts are documented or real. The cost of the survey was $100 per call, with approximately 4,000 patients called. Phone surveys were done at the end of the collaborative.
- Medical record abstraction—Information from charts can provide true before and after intervention data.
- Clinical and administrative staff surveys.
- Monthly progress reports—The collaborative asked the team at each intervention site to complete a brief report each month for the leaders and for the Institute for Healthcare Improvement (IHI). The reports were helpful in determining what the sites did, but the lack of standardization across sites reduced the value of their statistics.
- Final calls with leaders—The team conducted follow-up phone calls with leaders one year later to determine 1) major successes, 2) major barriers in implementing the CCM efforts and how they were addressed, and 3) continuation and spread of CCM efforts.
The team examined how "close" control sites were to intervention sites in terms of geography and overlap of staff. "Closeness" was a mild predictor of control sites showing improvement. The team also asked during exit interviews whether control sites were engaging in other QI activities.
Several other important points were made regarding contamination, bias from the use of volunteers, concerns about pressure from funders, and potential ways to decrease evaluation costs, such as 1) lowering multi-site institutional review board (IRB) and consent costs, perhaps via prior patient consent for quality improvement and QI research activities, and 2) reducing data collection costs via the use of electronic medical records or clever use of existing data, like claims.
Comments on QIIs at the Multiple Clinical Systems Level
Marshall Chin, M.D., M.P.H.
Associate Professor of Medicine, University of Chicago
Dr. Chin concurred with most of Dr. Keeler's points and used his time to build upon them. He noted that people sometimes propose a time series design as an alternative to a before-after design, but often it is difficult to gather the necessary data points. In addition, the patient-level cohort design (tracking the same patients longitudinally) can be difficult to implement because of concerns related to the Health Insurance Portability and Accountability Act (HIPAA) of 1996. When institutional review boards require informed consent from individual patients for such studies, the resulting sample of patients may be biased. He compared studies using the "population of focus" (meaning a subset of the organization) mentioned by Dr. Keeler with those using a random sample of the whole clinic population. The results are frequently different depending on which method is used, although the direction of bias can vary. For short-term evaluations, it is reasonable to use the population of focus model because in a short period of time, one does not expect the intervention to spread to the whole clinic. However, over the long term, the more important goal is improving the overall quality of care of all patients; thus, the analysis of the whole clinic population is more appropriate in that case.
Dr. Chin endorsed Dr. Keeler's approach of using multiple sources of data. However, in Dr. Chin's studies of health centers, monthly reports have been of limited use because participants do not always supply detailed data.
Dr. Chin reviewed the key research questions and methods challenges in health care QI. He noted that there are some nice reviews that profile critical components of QI collaboratives and the traits of successful collaborative teams. Dr. Chin recommended articles by Wilson (24) and Ovretveit (25).
Some of the challenges in QI research include isolating the effects of individual components within multifactorial interventions, since policymakers and managers want to know where to invest limited resources. Another challenge is how to go beyond general statements, such as the importance of "leadership support and buy-in," via more detailed analyses and new measurement tools. Third, there is a need to be able to tailor interventions to an organization's stage of change. Fourth, one must strike a balance between structure and autonomy. The Breakthrough Series is a general QI process rather than a set of prescribed interventions. Centers also have found it useful to have a menu of intervention choices that they can adapt to their needs. Fifth, an important area for future research is to identify what incentives and assistance can enhance and sustain quality improvement efforts further.
It is very hard to get the appropriate data and there are few sophisticated models for analyzing the business case for quality improvement, which may be why there is so little on this in the literature. Yet, for QI to be viable, we need to be able to make the business case. The societal perspective is also important and difficult to study. Sustainability is a difficult issue with limited data, partly because funding cycles are short and studies frequently are limited to one to two years. What are the unintended consequences of QI? This topic is rarely examined, but should be.
Dr. Chin concluded by noting that RWJF has a new initiative titled "Finding Answers: Disparities Research for Change" to apply QI and other innovations to disparities issues. The goal is to identify what interventions work to reduce disparities in real world practice.
Dr. Wagner was asked to address the value of the external evaluation of his program. Dr. Wagner responded that the RAND evaluation has been very helpful. New papers are becoming available that will aid in dissemination of the CCM. Health care leaders are becoming increasingly sophisticated and understand the limitations of their own QI data.
A participant indicated it will be difficult to "unpack" some multi-factor interventions, especially when certain studies have found that some interventions perform better when they are "packed." Concepts and methods are available for examining bundles of activities or configuration work in organizational science, which is not often referenced in health services research. We may find that the "key ingredient" only works when it is in interaction with four or five other ingredients. Another point is that there may be multiple routes to the same destination, or equifinality. It would be useful to have studies comparing different approaches to improvement to test the equifinality hypothesis.
In response, Dr. Keeler cited a recently published meta-analysis of the components of QI efficacy studies. Each of the four parts of the CCM that was testable was effective, but none was essential. Statistical analysis showed that the remaining components are effective even when one component is removed.
In response to a question about why there was so little use of provider-level statistics, Dr. Keeler replied those statistics lacked standardization. His study found that not every practice had an existing registry. It would be possible to put more pressure on participants to standardize in order to get usable data, but Dr. Keeler's group decided just to use their own measures. Dr. Wagner noted that these were early collaboratives and their ability to guide teams has gotten better since then. IHI has a philosophy of not letting measurement slow down improvement, adding that if an organization has a measure it is accustomed to using, the researchers just use that measure in order to facilitate the improvement process. However, over time, there are more and more standardized measures in use.
One participant noted the years of work on the part of Dr. Wagner and IHI to determine what model was going to be tested to aid in the rapid cycle of change. Dr. Wagner's ability to publish his early work and refine the model over time was important in warranting a major investment in this research.
A participant asked what will be the QI example paralleling the clinical world's example of hormone replacement therapy (HRT), noting that HRT had been found to be safe in non-RCTs and its cardio-protective nature had biological plausibility. For HRT, a randomized trial was thought to be nearly unethical since it seemed self-evident that HRT was a good thing. When an RCT was conducted, it was found that HRT put women at increased risk of cardiovascular disease in the first 12 months of treatment. In other words, what is the QI disaster that is waiting to happen? Dr. Orleans replied that the U.S. Preventive Services Task Force conducted a systematic review of the literature on this topic. Similar standards are needed for accumulating, critiquing, and synthesizing the evidence as it becomes available, and that is a role for journal editors, funders, and everyone involved in this new field. We need to build the ability for systematic review in the field of QI. Others noted that we have a positive publication bias and that many studies are short-term and describe the intervention generally. There needs to be more attention to disseminating the specific details of an intervention.
Dr. Keeler was asked to comment about the variation across sites and analyses of the factors contributing to success. Dr. Keeler replied that there are about 20 sites and hundreds of variables on site characteristics, subjects, etc. Statistically, it is not appropriate to examine so many variables for so few sites. However, it is valuable to take a common sense approach to talking to people about what worked. Dr. Wagner noted that Steve Shortell et al. have an article in Medical Care (26) on organizations and quality improvement teams that is an important step in the direction of understanding process and mediators. Dr. Orleans noted that one important outcome of Shortell's research was the new taxonomy that was developed.
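One way to see why examining hundreds of variables is statistically risky is to count the false positives that chance alone would be expected to produce. The sketch below is an illustration added here, not part of the discussion; it assumes independent tests at a 0.05 significance level, and the figure of 200 variables is a hypothetical stand-in for "hundreds."

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of tests significant by chance when no true effects exist."""
    return n_tests * alpha


def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one chance finding across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_variables = 200  # hypothetical count of site-level variables screened
print(f"Expected chance findings: {expected_false_positives(n_variables):.0f}")
print(f"P(at least one chance finding): {prob_any_false_positive(n_variables):.4f}")
```

Under these assumptions, about 10 variables would appear "associated with success" even if none truly were. With only about 20 sites there is also too little data to fit all the variables in one model, which is a separate problem this sketch does not address.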
Session IV. QII at the State/Regional Level
California State Tobacco Prevention and Control, Public Health QII Evaluation Design Issues, and Lessons for Health Care QII in the Policy Environment
Peter Briss, M.D., M.P.H. (Chair/Moderator)
Chief, Community Guide Branch, Centers for Disease Control and Prevention
Shawna Mercer, M.Sc., Ph.D. (Chair/Moderator)
Health Scientist and Senior Advisor, Office of the Chief Science Officer, Centers for Disease Control and Prevention
Dr. Briss introduced this session by mentioning that the California tobacco control model was a complex multi-component strategy with evolving interventions and different approaches to measuring outcomes. It presents an important, real world problem mixing rigor and relevance.
Program Evaluation in Public Health: California's Effort to Reduce Tobacco Use, 1989–2005
David Hopkins, M.D., M.P.H.
Staff Scientist, Community Guide Branch, Centers for Disease Control and Prevention
Dr. Hopkins noted that this presentation was on program evaluation in public health, and that at the end of the presentation, he would discuss bridging the gap between how public health thinks about these issues versus how health care thinks about these issues. Dr. Hopkins asked participants to think about how California's experience mirrors participants' experience with QI or differs from it, and about what we can learn from this population-based public health approach.
Dr. Hopkins began by presenting background information on California at the inception of the state's tobacco control program in 1988. With a population of more than 28 million people, California had 4.8 million adult smokers (22.8 percent), which was the second lowest smoking rate in the country. Led by a health coalition, voters approved Proposition 99 in November 1988, which increased the excise tax on cigarettes by 25 cents per pack and earmarked funding for a statewide program, which became the largest tobacco prevention and control program in the world.
California then had a great deal of money for programs and an urgent need to decide what to do and how to evaluate its efforts. In 1988, there was limited experience with effective population-based tobacco control interventions. There were some studies at the community level on cardiovascular disease risk reduction, and some experience with multi-component programs. Meanwhile, there was an appreciation of the effects of policy change on behavior as econometric studies on the effects of price on tobacco use were being published.
California had program options and had to decide which interventions to use and how to implement them. California opted not to take a top-down approach in which all programs would be implemented statewide. Instead, they had a couple of statewide program components and left the rest to local decision-makers. They opted to pursue a "full court press" from the outset, instead of developing programs built on the results of smaller-scale demonstration projects. Meanwhile, the National Cancer Institute (NCI) was advocating a comprehensive approach that was based on local control of the issue and emphasis on, and funding of, community coalitions to examine issues. The goal of NCI's approach was to have multiple channels, such as media campaigns and school-based programs, and multiple targets, such as increased cessation and reduced initiation of tobacco use, for the interventions. California provided a field test for the approach that NCI advocated. In 1990, California adopted tobacco prevention and control initiatives, which included a paid mass media campaign, funding for school-based programs, and funding for intervention and treatment research. Although there were challenges in the evaluations of this program, the state built evaluations into the mandate. Surveillance systems were put into place and more surveys were added, enabling the state to receive answers quickly. Components of the program were evaluated through contracts (independent evaluators) and a research program was funded within the University of California. This had the benefit that both good news and bad news got published. Funding for local intervention and research projects came with the stipulation that 10% of the budget be spent on evaluation, and support for local groups was provided that directed them to experts who could consult on or conduct evaluations.
An oversight committee was appointed to conduct an annual review of the surveillance and research results and to provide advice and recommendations based on the findings.
Outcomes of the California program included a decrease of 32.5% in smoking prevalence among California adults between 1988 and 2004. In addition, consumption decreased 55.6% in California, compared to a 32% decrease in the rest of the U.S., between 1988 and 2003.
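A short calculation can put these percentages in context. The sketch assumes that the 32.5% figure is a relative decrease from the 22.8 percent adult prevalence reported above for 1988; the summary does not state this explicitly, so the implied 2004 prevalence is an inference rather than a reported result.

```python
def after_relative_decrease(baseline: float, pct_decrease: float) -> float:
    """Value remaining after a relative decrease expressed in percent."""
    return baseline * (1.0 - pct_decrease / 100.0)


# Assumption: 22.8 percent is the 1988 baseline for the 32.5% relative decrease.
implied_2004_prevalence = after_relative_decrease(22.8, 32.5)
print(f"Implied 2004 adult prevalence: {implied_2004_prevalence:.1f} percent")

# Gap between California and the rest of the U.S. in consumption declines.
consumption_gap = 55.6 - 32.0
print(f"Consumption decline gap: {consumption_gap:.1f} percentage points")
```

Under this reading, prevalence fell by roughly 7 percentage points, and California's decline in consumption exceeded that of the rest of the country by about 24 percentage points.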
The California strategy plays a large role in the tobacco control literature, having produced dozens of publications influencing tobacco prevention and control efforts. The evaluations have documented the overall impact of a comprehensive tobacco control effort as well as the independent contribution of some components, such as the helpline and smoke-free policies. The evaluations contributed to program survival. At the same time, local program impact is unclear because most evaluations have not been published and comparisons have been difficult. The effectiveness of some interventions, such as school-based programs, remains unclear. There have been some adjustments made to enhance evaluation, such as adopting more uniform surveillance tools.
California's program has become the model for state-level comprehensive tobacco control and the California experience contributed to the contents of the Centers for Disease Control and Prevention's (CDC) Best Practices for Comprehensive Tobacco Control Programs (27).
Questions revolved around issues of sustainability and ceiling effects.
Dr. Hopkins noted that once the tobacco control and prevention program was de-funded in Minnesota, there was a relatively quick increase in susceptibility to smoking among teenagers. He noted that many political issues make funding these programs difficult. Several states have de-funded their programs, suggesting that the weight of the evidence was not sufficient to counter the political processes that undermine such programs.
A participant asked if Dr. Hopkins has observed a plateau in the decrease in smoking. If so, are there plans to change the approach to deal with the "bedrock" of smokers who will not quit? A national survey on systems did show a leveling off in the decline of the numbers of smokers.
Comments on the California Tobacco Program
Edward Wagner, M.D., M.P.H.
Director, Center for Health Studies, MacColl Institute for Healthcare Innovation
Dr. Wagner noted that the California tobacco control and prevention experience has some analogies or lessons for the improvement of medical care. The program, while it used a multilevel approach that goes beyond most published health care QIIs, has many parallels with health care QIIs. These include an attempt to change a system and a culture (tobacco acquisition, initiation, and use). The intervention occurred at the levels of society/policy, organizations, schools, community institutions, and individuals, the last occurring through quitlines and other direct-to-consumer services. Local adaptation was a built-in, essential feature of the program; thus, internal variation was encouraged and program standardization was not encouraged. Evaluations depended on ongoing surveillance data supplemented by intervention-specific measurement and evaluation activities. This is a nice model for complex quality improvement interventions.
The same questions we would ask of a medical care quality improvement intervention were asked in the California effort. General evaluation questions included: Do QIIs work? If so, what is the contribution of individual components? If not, were there promising components whose effect perhaps was overwhelmed by ineffective components?
For this complex intervention, the gold standard of evaluation, a rigorous test of a standardized, simple, unchanging treatment over a relatively short time span, would have been totally irrelevant. The ability to randomize was not the issue. Researchers randomized where they could, but some situations, such as the state-level change in excise tax, precluded randomization. The evaluators "played the hand they were dealt" and developed the strongest quasi-experiment possible within the context of this natural experiment. Evidence and theory, such as the COMMIT trial experience and studies of economic effects, suggested that comprehensive system change was indicated, and this approach indeed was taken in California. The standardization of an intervention across sites and over time precludes learning. It was clear before the program began that excise tax increases and quitlines were two effective methods in tobacco control. There was not such a strong evidence base for what local communities could do, so innovation at that level was encouraged.
Comprehensive system change often is necessary. Interventions such as the molar randomized trial of a single pill or a single machine are irrelevant when one needs to change multiple interacting components. In the California tobacco case, the conceptual model was to make tobacco less attractive to buy, more difficult to use, and easier to give up. In the future, health care interventions also will have to be multilevel because practices do not exist in a vacuum. They interact with other practices, insurance providers, and larger organizations and they are affected by policy changes. Future interventions will need to consider these interactions. Sequential testing and factorial designs may not be feasible or possible in all cases to test all of the multiple components; thus, we need to increase our efforts with multi-component interventions and increase our ability to learn from them. Dr. Wagner asked rhetorically whether multi-component, multilevel, adaptive, system change interventions are evaluable. For this question, we can learn valuable lessons from evaluation sciences in other disciplines, such as education, social services, and welfare, and in the use of regression discontinuity designs, for example. At a conference that Dr. Wagner attended many years ago, Mark Lipsey emphasized small "t" theories of treatment, which are not general theories of how the world works. Instead, they are attempts to describe how complex interventions work by positing how the various elements of the intervention work, what the mediators and moderators are, and what outcome variables will be affected in a given subpopulation at a certain time. To develop a small "t" theory, first one would lay all that out and then attempt to develop an appropriate measurement strategy.
For example, one might theorize that leadership influences the effectiveness of quality improvement teams in an organization and then the effectiveness of the quality improvement team affects the depth and the relevance of the changes which next lead to improvement in care.
The lessons that Dr. Wagner drew from the California example are:
- Maximize the design within the constraints.
- Maximize learning from variation within the intervention.
- Validate and then use existing surveillance measures.
- Use multiple and differing measures of the critical phenomenon to increase one's confidence that there is a real treatment effect.
Discussion focused on expansion of Dr. Wagner's comments and enhancement strategies for doing quick turnaround research on natural experiments, the interface of clinical care with public health messages, and political pressures.
Dr. Green affirmed a point made by Dr. Wagner, which is that a comprehensive program has merit not just because the various elements support each other synergistically but also because some elements work for some people and other elements work for other people. Dr. Green attended meetings with the Oversight Committee (California's Evaluation Advisory Committee) which was being asked which elements of the program worked and which ones did not. Certain groups were looking to cut funding for parts of the program that could not be defended, but the Committee argued that the program should not be dismantled. Contracts that were given to private companies to evaluate the intermediate effects for specific interventions indicated that some things were not working in some places, but did work in other areas. In a multi-level, multi-component intervention, individuals are surrounded by variations on the program message through different channels and different means of communication; some people are more responsive to certain channels than others, and different people respond to different channels. All of this is what makes comprehensiveness valuable. This is what made it worthwhile that the planners remained steadfast regarding holding the elements of the program together, even when some elements were faltering in some places. Dr. Green encouraged those in medical care settings to consider the principle of avoiding disaggregation of complex programs and to look for ways to evaluate them comprehensively.
A participant commented that many policy changes are occurring that are not preplanned and have not incorporated evaluation, and sought suggestions concerning funding mechanisms and strategies for capturing information about the effects of the policy changes. Currently, we are not doing a good job of capturing real world changes. To what extent does one benefit from having baseline data? What does one do when one does not have baseline data? Typically, a policy change will have already occurred by the time one secures the funding needed to study the change. Dr. Briss responded that it is a conventional practice in public health to measure important outcomes over time using surveillance, thereby providing for baseline measurement. When changes occur, scientists can then take advantage of the natural experiment. This approach could be taken in health care systems as well. Dr. Wagner noted that a survey of communities interested in RHIOs (regional health information organizations) was conducted. Participants were asked what function they saw RHIOs serving. Only half reported that they were going to use them for community-based performance measures, thereby missing an opportunity to get community-based information exchanges to produce community-wide performance measures. Resulting data could provide a platform for evaluating policy change.
Pointing out that there are many public health messages in California, another participant asked what happens when teenagers visit their primary care providers and are not asked if they smoke. She is involved in an intervention program with primary care providers and feels this program interacts nicely with the different levels of influence adolescents receive from public health messages.
A participant noted that both the intervention and the target were evolving over time in California. The panel was asked if they had any way of examining industry pushback in response to their evaluations. Dr. Hopkins referred participants interested in this topic to T.E. Novotny and M.B. Siegel's article titled "California's Tobacco Control Saga" published in Health Affairs (28). Dr. Hopkins described a period in the 1990s when the tobacco industry contributed more to California political campaigns than it did to Congressional campaigns, made an effort to have the word "addiction" removed from program materials, and made other efforts to interfere with the program. Dr. Green noted that the industry pushback was felt on the evaluation side. The tobacco control effort in Massachusetts was able to build on California's effort and began with a higher tax and a faster take-off for the program. The CDC extracted from these efforts a set of "best practices." There were concerns about using that term for findings that were not based on randomized studies; however, other states gave greater weight to these two states' experiences than they did to decades of findings from randomized controlled trials. This may be an important perspective for quality improvement efforts, in that other practitioners may pay more attention to findings from settings similar to their own than to other data.