Lessons Learned from the Process Used to Identify an Initial Core Quality Measure Set for Children's Health Care in Medicaid and CHIP
A Report from the Subcommittee on Children's Healthcare Quality Measures for Medicaid and CHIP Programs (SNAC)
By Rita Mangione-Smith, MD, MPH, Associate Professor of Pediatrics, University of Washington, SNAC Co-Chair
This article provides a brief overview and evaluation of the process used by AHRQ's Subcommittee on Children's Healthcare Quality Measures for Medicaid and CHIP Programs to identify the recommended core set of children's health care quality measures. It also suggests ways this process might be improved for similar efforts in the future.
Title IV of the Children's Health Insurance Program Reauthorization Act (CHIPRA; Public Law 111-3) required the Secretary of the U.S. Department of Health and Human Services (HHS) to identify and post for public comment by January 1, 2010, an initial, recommended core set of children's health care quality measures for voluntary use by Medicaid and Children's Health Insurance Programs (CHIP), health insurance issuers and managed care entities that enter into contracts with such programs, and providers of items and services under such programs.
In response to this legislative directive, the Agency for Healthcare Research and Quality (AHRQ) and the Centers for Medicare & Medicaid Services (CMS) signed a memorandum of understanding giving AHRQ leadership responsibilities for identifying the initial core set, working in very close partnership with CMS. CMS has the authority for implementation of all CHIPRA provisions.
As one of the first steps in the process of identifying the recommended core set of measures, the AHRQ Director approved a charter creating the AHRQ National Advisory Council on Healthcare Research and Quality (NAC) Subcommittee on Children's Healthcare Quality Measures for Medicaid and CHIP (SNAC). The AHRQ NAC had agreed to provide advice to AHRQ and CMS to facilitate their work to recommend an initial core set of measures of children's health care quality for Medicaid and CHIP programs. To provide the requisite expertise and input from the range of stakeholders identified in the CHIPRA legislation, the NAC established the SNAC.
The SNAC included four State Medicaid program officials (from Alabama, Minnesota, Missouri, and the District of Columbia) and one State CHIP official (from Alabama). Other members represented Medicaid, CHIP, and other State programs more generally (i.e., representatives of the National Academy on State Health Policy, National Association of State Medicaid Directors, and the Association of Maternal and Child Health Programs).
Representatives of health care provider groups came from the American Academy of Family Physicians, American Academy of Pediatrics, American Board of Pediatrics, the National Association of Children's Hospitals and Related Institutions, and the National Association of Pediatric Nurse Practitioners, and there was a Medicaid health plan representative. The interests of families and children were represented by the March of Dimes. Individual SNAC members provided expertise in children's health care quality measurement, children's health care disparities, tribal health care, pediatric dental care, substance abuse and mental health care, adolescent health, and children's health care delivery systems in general. Two members of the NAC also participated in the SNAC.
The SNAC was charged with providing guidance on measure evaluation criteria to be used in identifying an initial core measurement set, providing guidance on a strategy for gathering additional measures and measure information from State programs and others, and reviewing and applying criteria to a compilation of measures currently in use by Medicaid and CHIP programs to begin selection of the initial core measurement set. SNAC recommendations were to be provided to CMS and the NAC, which in turn would advise the Director of AHRQ. The Directors of AHRQ and CMS would then review and decide on the final recommended core set to be presented to the HHS Secretary for consideration.
With assistance from CMS, AHRQ staff identified a set of 77 measures that were currently in use by Medicaid and/or CHIP programs. The next step was to decide on an evaluation process the SNAC could use to assess these 77 measures. The SNAC co-chairs, AHRQ staff, CMS staff, and other representatives from the CHIPRA Federal Quality Workgroup agreed that the SNAC should use the RAND/UCLA modified Delphi process to evaluate the identified measures.1
When applied to quality of care measures, the RAND/UCLA modified Delphi process involves a series of assessments by a panel of experts, in this case the SNAC. The experts are usually provided with standard definitions for measure validity and feasibility and then asked to apply these criteria to each measure under consideration. The measures are scored on a 1 to 9 scale for each criterion. Scores of 7-9 mean the measure is considered highly valid and/or feasible, scores of 4-6 are assigned to measures with equivocal validity and/or feasibility, and scores of 1-3 indicate the measure is not considered valid and/or feasible. These measure assessments are first done individually at the panelists' home institutions. This is followed by a group discussion of the measures in a face-to-face meeting, after which panelists individually score the measures again. The summation of this final set of individual assessments is used to determine whether particular measures in the set under consideration are retained or deleted. Explicit ratings are used to determine which measures are included in the final quality measurement set because in small group discussions some members tend to dominate the conversation, and this can lead to a decision that does not reflect the sense of the group.2
To facilitate the SNAC members' individual assessments of the 77 measures under consideration prior to their first face-to-face meeting, they were provided with measure evaluation criteria definitions for validity and feasibility before the meeting. Because one of the main charges to the SNAC included providing guidance on the evaluation criteria to be used in evaluating measures for the core set, it was clear that the criteria definitions provided for the first round of the Delphi process would need to be reviewed and would potentially change at the first face-to-face meeting. Although this process was not ideal, it did facilitate a round of quality measure assessment prior to the first SNAC meeting, and it was felt to be necessary given the constricted timeframe in which the Subcommittee had to complete this work. Doing this pre-meeting scoring also oriented the SNAC to the Delphi method early in this process of measure selection, which facilitated subsequent rounds of measure scoring and assessment.
When scoring the measures for validity, the SNAC members were asked to assess the degree to which the measures were supported by scientific evidence and/or expert professional consensus, whether the measures supported a link between the structure, processes, and outcomes of care, and whether the majority of factors that determine adherence to a measure were under the control of the health care organizations subject to measurement. For feasibility, the SNAC was asked to evaluate whether:
- The data needed to assess the measures were readily available to health care organizations.
- The measures were currently in use (thus supporting their feasibility of implementation).
- Estimates of adherence to the measure based on available data sources were likely to be reliable and unbiased.
The median scores for validity and feasibility were used to determine whether candidate measures would be discussed at the face-to-face meeting. Traditionally, when using the RAND/UCLA modified Delphi process, all measures are discussed at the face-to-face meeting regardless of their first-round median scores. However, this was not feasible given the time constraints under which the SNAC was working. As such, measures with a median validity score of 6 or 7, a median feasibility score ≥4, and a relatively wide distribution of scores across members (suggesting little consensus among the group) were discussed by the SNAC. Forty-five of the originally identified 77 measures in use by Medicaid or CHIP programs met these scoring criteria and were discussed.
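The Round I screening rule described above can be sketched as follows. This is an illustrative reconstruction only: the report does not define what counted as a "relatively wide distribution" of scores, so the `spread_cutoff` threshold here is an assumption.

```python
from statistics import median

def needs_discussion(validity_scores, feasibility_scores, spread_cutoff=4):
    """Round I screen: flag a measure for face-to-face discussion when its
    median validity is 6-7, median feasibility is >= 4, and the validity
    scores are widely spread across members (suggesting little consensus).
    spread_cutoff is an assumed value, not one stated in the report."""
    v_med = median(validity_scores)
    f_med = median(feasibility_scores)
    wide_spread = max(validity_scores) - min(validity_scores) >= spread_cutoff
    return 6 <= v_med <= 7 and f_med >= 4 and wide_spread
```

For example, a measure scored [2, 6, 6, 6, 9] on validity (median 6, spread 7) with passing feasibility would be flagged for discussion, while a measure scored uniformly at 7 would not.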
Refinement of the Measure Evaluation Criteria
Refinement of the measure evaluation criteria involved reviewing, discussing, and reaching consensus on the definitions the SNAC would use for validity and feasibility (including reliability) when evaluating candidate measures in future rounds of the Delphi process (see evaluation criteria definitions for Delphi Round II). In addition, importance was added as a third criterion, along with validity and feasibility, for the SNAC to consider when evaluating potential measures. This refinement process, although important and necessary, led to some inefficiency and re-work related to identifying the recommended initial core set of measures. Ideally, the SNAC would have had the opportunity to meet, discuss, and reach consensus on the measure evaluation criteria definitions prior to doing any individual measure scoring.
Other Steps and Decisions at the First Face-to-Face Meeting
The SNAC's discussion at this first meeting resulted in the recommendation that more information related to measure validity, feasibility, and importance (VFI) would be needed before any further consideration and evaluation of the measures could take place. The SNAC also determined that a call for nominations of additional pediatric quality measures in use (either within or outside of Medicaid and CHIP programs) should be used to identify a larger set of measures to consider for the final core set. AHRQ staff was also asked to identify VFI-relevant information on the measures scored in Delphi Round I. SNAC members felt it was important to open the nomination process as broadly as possible to other stakeholder groups.
Ideally, the decision to conduct a broad nomination process of quality measures in use both within and outside of Medicaid and CHIP programs would have been made much earlier, before any measure evaluation and scoring had occurred. AHRQ had initially felt it was important to limit consideration to measures already in use in at least one Medicaid or CHIP program because of feasibility concerns related to implementation. Nevertheless, the SNAC felt it was essential to broaden the measures considered to those in use by entities outside of Medicaid and CHIP; otherwise, many valid, feasible, and important measures would not have been considered for inclusion in the initial recommended core set. Thus, after the first face-to-face meeting, the final decision was made to conduct a broad measure nomination process.
Developing an Online Measure Nomination Template
During the 2 months between the first and second SNAC meetings, AHRQ staff worked to develop an online quality measure nomination template. The measure nomination template asked for key pieces of information that SNAC members would need to evaluate the VFI of nominated measures. An ideal nomination would include the following information on the measure: the numerator and denominator, scientific evidence supporting the measure, evidence that the measure truly assesses what it purports to measure, detailed measure specifications, evidence of the measure's reliability, whether the measure addresses an area of care mandated for inclusion in the CHIPRA legislation, and evidence of variation in performance on the measure in different populations or organizations. Unfortunately, many of the nominated measures as submitted lacked much of this information. AHRQ staff and the SNAC co-chairs worked to fill in information gaps for several of the nominated measures and for all of the measures that required reassessment after Delphi Round I.
The AHRQ staff worked to find measure specifications and information related to importance criteria, e.g., evidence of variation in performance across insurance types or racial/ethnic groups. The SNAC co-chairs performed focused literature reviews to identify scientific evidence supporting links between structure, processes, or outcomes of care for the nominated measures. They also assigned grades to the level of evidence supporting the measures (Table 1) using the Oxford Center for Evidence Based Medicine grading criteria.
|Evidence Grade|Definition of Grade|Definition of Study Types|
|---|---|---|
|A|Consistent level 1 studies|Level 1: Randomized controlled trials|
|B|Consistent level 2 or 3 studies or extrapolations* from level 1 studies|Level 2: Cohort studies; outcomes research. Level 3: Case-control studies|
|C|Level 4 studies or extrapolations from level 2 or 3 studies|Level 4: Case series|
|D|Level 5 evidence or troublingly inconsistent or inconclusive studies of any level|Level 5: Expert consensus opinion|
All of the information for the measures supplied by the nominators, the AHRQ staff, and the SNAC co-chairs was abstracted into one-page summaries for each measure (example of the one-page summary sheet). These summaries were made available to all SNAC members to review during their next round of Delphi scoring.
The second round of Delphi scoring included a total of 119 quality measures: the 70 measures that either passed Delphi round I (25 measures) or were discussed at the first face-to-face SNAC meeting (45 measures) and 42 new measures nominated after the first meeting. While SNAC members had more information during their individual scoring for Delphi round II, much of the needed information was still missing (Table 2).
|Criteria|Number of Measures|Percent|
|---|---|---|
|No reliability data|59|50|
|Not in use|29|24|
|No measure validation|42|35|
|No evidence/Unable to grade| | |
|No information on variation/disparities|76|64|
Additionally, the SNAC members had 1 week to assess 119 measures, which limited the amount of time that could be spent evaluating the merits of any one measure in the set. Given these limitations, the SNAC adopted a philosophy of "leaving an empty chair" rather than recommending quality measures that were too weak or for which not enough information was available (Table 3).
Of the 119 measures evaluated in Delphi Round II, 65 were scored as being valid, feasible, and important by the SNAC members. Due to the abbreviated timeline and the need to identify a reasonable core set of measures (the SNAC's target number was 25 measures for the core set), the initial plan was to discuss and consider only these 65 measures at the second face-to-face meeting. However, initial discussions at the meeting resulted in adding back five measures that did not strictly pass the second Delphi round (i.e., those with high median feasibility and importance scores [≥7] and median validity scores of 6 or 6.5 rather than the cutoff of 7). Thus, 70 of the 119 measures scored in Delphi round II were discussed and considered for the core set at the meeting.
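The Round II decision rule described above can be sketched as follows. This is an illustrative reconstruction: the report implies that passing required a median score of 7 or higher on all three criteria, and that measures with passing feasibility and importance but median validity of 6 or 6.5 were added back for discussion; both rules are encoded here on that assumption.

```python
from statistics import median

def round_two_status(validity, feasibility, importance):
    """Delphi Round II: a measure passes outright with median scores >= 7
    on validity, feasibility, and importance; measures with passing
    feasibility and importance but a median validity of 6 or 6.5 were
    added back for discussion at the second face-to-face meeting."""
    v = median(validity)
    f = median(feasibility)
    i = median(importance)
    if f >= 7 and i >= 7:
        if v >= 7:
            return "pass"
        if v in (6, 6.5):
            return "add back for discussion"
    return "drop"
```

Under this sketch, a measure with median validity 6 but strong feasibility and importance scores would be among the five added back rather than eliminated outright.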
The RAND/UCLA Delphi process usually involves the experts re-rating the measures individually after all discussions are completed. The panel's scores are then summarized, and measures with passing median VFI scores go on to be included in the recommended core set. However, given the large proportion of measures that passed the initial phase of this round of Delphi scoring (54 percent; 65 of the 119 measures assessed), it was unlikely that re-rating the measures after discussing them would result in 25 or fewer measures in the final recommended core set. It was also important that the SNAC be able to recommend a core set balanced in terms of the requirements of the legislation, with at least some measures representing several different areas of care (e.g., prevention and health promotion, provision of acute care, and provision of chronic care). Thus, the SNAC agreed to use an alternative approach to further assess the remaining 70 measures under consideration.
This alternative approach involved a series of private votes, using electronic voting devices, to further reduce the number of measures under consideration. The process involved discussing and prioritizing the measures according to legislative criteria and eliminating overlapping or redundant measures that had passed the VFI criteria (e.g., there were multiple dental measures and multiple measures pertaining to healthy birth, including the prevention of premature birth). This process resulted in 31 measures for final consideration.
Getting to a Parsimonious and Grounded Core Set of Measures
Three rounds of voting were conducted in succession on the 31 remaining measures. SNAC members could vote for their top 20 measures out of the 31 that remained. In round one, members individually voted for their top 10 measures; in round two their next 5 measures; and in round three their final 5 measure choices. In the first round of voting, measures received 3 points per vote, then 2 points per vote in the second round, and finally, 1 point per vote in the third round. A priority score was then calculated for each measure that represented the total points assigned to that measure by SNAC members after the three rounds of voting. The top 25 measures according to final priority scores were retained for the final recommended core set.
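The weighted voting arithmetic above can be sketched as follows; the ballots and measure names in the example are hypothetical.

```python
from collections import Counter

def priority_scores(ballots):
    """Each ballot is a (top_10, next_5, final_5) tuple of measure names.
    Votes earn 3, 2, and 1 points in rounds one, two, and three,
    respectively; a measure's priority score is its total points across
    all members and all three rounds of voting."""
    weights = (3, 2, 1)
    scores = Counter()
    for ballot in ballots:
        for round_votes, points in zip(ballot, weights):
            for measure in round_votes:
                scores[measure] += points
    return scores

# Two hypothetical members' ballots over measures "A"-"D":
ballots = [(["A", "B"], ["C"], ["D"]),
           (["B"], ["A"], ["C"])]
scores = priority_scores(ballots)
# "B" scores 3 + 3 = 6, "A" scores 3 + 2 = 5, "C" 2 + 1 = 3, "D" 1
```

In the actual process, the 25 measures with the highest priority scores out of the 31 remaining were retained for the final recommended core set.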
If AHRQ and CMS were to embark on a similar process in the future, it might ideally be organized differently than described here. With a short timeline, the order in which the steps of the process are pursued is critical for efficiency. Because the first charge to the SNAC was to identify measure evaluation criteria, this is likely where we should have started. But even prior to this, an open nomination process lasting 1 to 2 months (rather than 3 weeks), during which various stakeholder groups could recommend measures for consideration, might have resulted in a much richer set of candidate measures. That said, the Federal Quality Workgroup, including AHRQ and CMS, had to balance the need to consider a comprehensive set of measures in use with the need to ultimately recommend a feasible set of measures (in terms of the number of measures) for implementation by Medicaid and CHIP programs.
For similar efforts in the future, more time and resources should be allocated to both the evaluation of nominations and gathering of missing information based on those evaluations. As much as we tried to "level the playing field" for the measures under consideration, some of them had far more complete VFI information than others. In some cases this occurred simply because there was not enough time or resources allocated to gathering all of the missing information for the nominated measures.
One advantage of the short timeframe for completing this work is that it resulted in the timely recommendation of a relatively good set of quality measures for the initial core set. The recommended core set is not perfect, and neither was this process. That said, the SNAC felt it was critical not to let the perfect become the enemy of the good. If we had set our standards at too aspirational a level, we would have had very few measures to recommend. By design, we took into consideration the staffing, funding, and infrastructure that would be needed to implement the recommended measures. In the end, if we wanted these measures to have a chance of being implemented by Medicaid and CHIP programs, we determined that the recommended core set had to be a grounded, parsimonious set of measures that were in use and thus demonstrated to be feasible to implement. This may be a lower bar than we should have established for the core set. Fortunately, CHIPRA provided support for advancing and improving pediatric quality measures and called for priorities to be set to guide a new pediatric quality measures program. This provides the opportunity to improve the core set moving forward.
By critically analyzing the process used to identify the initial core set of quality measures for voluntary use by Medicaid and CHIP programs, we learned which parts of the process worked and which parts need improvement. We hope similar processes of evaluation and improvement of child health care will be stimulated by implementation of the recommended core set of quality measures.
1. Brook RH. The RAND/UCLA appropriateness method. In: McCormick KA, Moore SR, Siegel RA, eds. Clinical practice guidelines development: methodology perspectives. Rockville, MD: Agency for Health Care Policy and Research; 1994.
2. McGlynn EA, Kosecoff J, Brook RH. Format and conduct of consensus development conferences: a multi-nation comparison. In: Goodman C, Baratz S, eds. Improving consensus development for health technology assessment. Washington, DC: National Academy Press; 1990.
Disclaimer: The views expressed in this paper do not necessarily reflect those of the Agency for Healthcare Research and Quality (AHRQ) National Advisory Council Subcommittee on Children's Healthcare Quality Measures for Medicaid and CHIP Programs (SNAC), the Agency for Healthcare Research and Quality, the Centers for Medicare & Medicaid Services, or other components of the U.S. Department of Health and Human Services. The work was supported by AHRQ Contract HHSN263200500063293 to teamPSA, with funding from the Centers for Medicare & Medicaid Services.