2012 Meeting of the Subcommittee on Quality Measures for Children's Healthcare
Agenda Item: Reflection and Recommended Process Improvements
Charles Gallia: We could have a little evaluation sheet that you fill out, rate on a scale, and then leave. We would look at it after you are gone and that would be the end of it. But we thought it was important to spend some time talking about the process of getting here, while staying solution oriented and forward thinking. I want to keep that. If we started listing complaints at the end of the day, we could be here for a while. There are some challenges and then some observations that I wanted to share with you that might trigger a discussion about how we go about this moving forward.
One of the things that became apparent to me in implementing or using the Delphi process was that we are at a different stage in what we are doing. Instead of nominating measures to a blank slate where there was nothing there, we are now in a position of having relative importance be a consideration that was not in existence previously.
The process that we had used was a mirror of what had succeeded in winnowing thousands and thousands of measures down to a few. And now we are at a stage where we are actually making improvements in an existing core set. That just tells me that we have to think differently about how we are going about it.
In an ideal world, we would have some data-driven decisions, and in fact we do have some, based on what States' behavior has been in terms of reporting. We know which measures are working, which ones are adopted and used. But there is a lot more that needs to go into that, and that is how well this answers the key question, at least as far as I am concerned: how does this core set really reflect the question and the charge that was put to the group through the legislation? How well does this core set reflect the quality of care that is provided to children? We need a process that helps us move closer not to perfection, because that will never exist, but to progress in improving the core set so that it reflects the quality of care that is provided. I am not sure exactly how to do that.
I have some ideas about a checkbox approach, where we have the IOM domains of quality listed, we have age groups reflected, and we have delivery systems reflected, and where we make certain that the end result covers each facet of staying healthy, getting better, and living with illness, so that we have something covered there like was done in the beginning.
And then going back to some of the earlier work that said we know there are some gaps: care coordination, medical homes, and the ones that span delivery system locations. What is in the works, with the COEs or otherwise, to move us forward? I have just been talking.
I am hoping that you might have some thoughts or ideas or suggestions about moving forward. And I know that since there are only five minutes left, we will not be able to get into the details of it. I am going to ask purposefully for people who might be willing to work even more on a group to help shape some suggestions and advice about a process moving forward.
Denise Dougherty: Charles, can I also ask that we talk—I was going to say we are not likely to change this whole Delphi approach of addressing evidence base and importance and all that, measure by measure, in the future, but that is up for grabs as well. I would also like some suggestions about how the actual mechanics of this process worked and did not work, and suggestions for improvements, as well as the big—I know the big charge is much more important.
Charles Gallia: I was fishing.
Clint Koenig: I hope this is on cue. It is not directly addressing what you are—but I was keeping a list of opportunities, maybe. The first one was feasibility, and not knowing what States and other organizations felt about a measure or were concerned about. Perhaps some kind of feasibility weigh-in by appropriate stakeholders would be helpful. Luckily you were here to help weigh in on that. I cannot guarantee I will be next to you next time.
And I thought that there was some discussion about convening the adult versus the pediatric measures and some overlap. I did not know if it would be appropriate to maybe convene a group or look at a process that helps reconcile those between the two and if that might make sense.
I noticed this morning that we had a low, medium, and high process and the voting bar was set. But including that medium option potentially diluted a measure out. Things that could have passed on a single yes or no ended up being a no. Perhaps give some consideration to just an up-or-down vote.
Charles Irwin: People should have been voting 1 if it was 1, 2, or 3; voting 2 if it was 4, 5, or 6; and voting 3 if it was 7 through 9. We had to somehow adopt a metric that was equivalent to the 1 to 9 metric we had been using. I am not certain—I think it was hard because people tend to go to 2 a lot of times, not to extremes, whereas people might have gone to a 7 in the past when in fact they would not go to a 9. A 3 was 100 percent when it really was not that.
Clint Koenig: I think just functionally a 2 is really the same as a no as it comes out because your resolution is binary. Either it passes or it does not pass. A 2 is—I am not really sure what that means in the outcome.
Glenn Flores: Actually, I want to interject something about that, because it has been brought to my attention that there is a long history, several decades, of Delphi methodology, and it is fairly rigorous. There are some well-proven approaches. One of them is that you do use the 1 to 9 scoring, because otherwise you get this problem where the collapsed categories can completely push things out that the 1 to 9 scale would not have. I would suggest that going forward we think about making sure that we are consistent with the proven methodology, because otherwise, if you want to write a paper about this and get it published, people are going to say that is not how you usually do the scoring for Delphi.
Denise Dougherty: Well, limited Federal resources. I am serious. AHRQ bought the system with only five buttons on it. That is why we had 1, 2, and 3. That was the reality. It still would not have been consistent with the 1 to 9.
Francis Chesley: We are not going to solve this—but I do think that is well taken. There are other methodologies to facilitate voting other than the one we chose.
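The dilution the panelists describe can be sketched numerically. This is a hypothetical illustration: the five rater scores below are invented, and "passing" is assumed to mean a median in the top band (a 3 on the three-button scale, i.e., 7 through 9), standing in for whatever acceptance rule the panel actually used. Only the bucket mapping itself comes from the discussion above.

```python
# Hypothetical sketch of collapsing the 1-9 Delphi scale into the
# three buttons described above: 1-3 -> 1, 4-6 -> 2, 7-9 -> 3.
from statistics import median

def to_three_point(score_9pt):
    """Map a 1-9 score onto the three-button scale per the stated rule."""
    if score_9pt <= 3:
        return 1
    if score_9pt <= 6:
        return 2
    return 3

nine_point = [6, 7, 7, 7, 8]                      # invented panel scores
faithful = [to_three_point(s) for s in nine_point]
print(faithful, median(faithful))                  # [2, 3, 3, 3, 3] median 3: passes

# If raters avoid the "extreme" button, as Charles Irwin suggests, two
# borderline 7s become 2s, and the same panel now fails even though
# nothing substantive about the measure changed:
hedged = [2, 2, 2, 3, 3]
print(median(hedged))                              # 2: fails
```

The point of the sketch is that the middle button absorbs borderline high votes that a 1 to 9 scale would have recorded as passing scores.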
Clint Koenig: I had a couple more, if that is okay. I was not sure if there was an explicit process for retiring or replacing measures. If a new C-section measure was good and it was better than the one we had—how does that happen? My voting was influenced by the thought that we are potentially going to have two C-section measures here and have States and so on jump through hoops. Knowing that process would have been helpful for me, as would knowing the process for stopping measuring.
Again, for me one of the issues was voting for the first generation or seven generations down the road. Do we vote for things that are practical and feasible now or do we vote for things that are incredibly aspirational? I was really torn between that. I do not know if some context or guidance would be helpful.
Charles Gallia: Those are great comments. One of the things about retirement—when I write contracts and do State-level performance measures, there is a tendency for people to want to keep doing the same thing once it is there and it becomes routine. You try to change it and people get really panicked because there is a new measure. But there are some asymptotic performance endpoints at which we have to ask whether we are really gaining anything more. It is not necessarily even replacement or substitution, but reconsidering: is this an important area to keep considering on an ongoing basis moving forward? There may be some advances, changes, or successes that should make us reconsider whether a measure is even part of the set because of the gains that were made. That kind of criterion is really something that does not exist yet, as far as I know.
Feliciano Yu: As we go through this journey of measurement, data collection is a big burden as we operationalize this idea, especially in the execution phase. Carole has raised the emergence of electronic health records in our clinical settings. Perhaps one opportunity here is to look at all the measures that we have suggested and ask which of them we might tweak over time to make the definitions ready for EMR or EHR or some kind of system, so that over time we decrease the burden of data collection and increase the specificity of the data that we send out, and we can look at the data more accurately. Maybe some kind of effort moving in that direction.
One of the parameters on our form is health information technology. Some of them are blank because people could not address it. Maybe over time there is an opportunity to increase that specification.
Naihua Duan: To follow up on that point, it would be very useful for us to think about the possibility of facilitating AHRQ and CMS doing some evaluation, or at least monitoring, of how those measures work out in the field. What do we think is feasible? We think a GEE modeling is doable, but how does it really work out in the field? That feedback would be useful for improving future years. Marsha talked to me before lunch, and she thought it would be something she would be very interested in doing. If we as a committee make some kind of a recommendation, that might help facilitate them doing that.
Another thought I want to share: this morning when we were discussing a measure—the blood pressure screening—I realized something I never knew before about the statistical method of using the interquartile range. There is a fallacy in it: if you make the high scores lower, it actually makes the measure look better. That measure in the Delphi—the second round—had a median of 7. That was pretty good. If you look at the plot on the top, there was a very interesting distribution. Five people scored 7. Five people scored 9. That made the interquartile range large. That measure, I guess, did not make it through that Delphi round.
For that measure, if the five people who scored 9 had scored lower instead—if their score were 7—the measure would have had a smaller interquartile range, and it would have made it into the accepted category.
What made me realize this? There are two things. Statistically, the idea of looking at a median and an interquartile range is really for unimodal distributions, like a bell shape. A number of the distributions we are looking at are not actually unimodal. And the dispersion we really care about is whether there is a group that is scoring lower; whether a group scores 9 or 8 or 7 really does not matter.
I think one possible remedy I would suggest we consider in the future is to not use the IQR and instead, like what we did today, use the percentage: what is the percentage that scored 7 or higher?
Another possibility is to use the left—
Participant: (off mic)—that we have been talking about is using the actual—software for the Delphi—which gives you the absolute deviation from the median, which is really what tells you whether there is agreement, disagreement, or it is indeterminate. It gives you much richer, more accurate information than what you are getting here—
Naihua Duan:—the deviation measures dispersion on both sides, but we really only care about the dispersion on the left side, not the right side. One option is to use the first quartile and the median, but not worry about where the third quartile is. It is something we can think about, and maybe in future years we can find a more informative way to do this.
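Naihua Duan's point about the IQR penalizing high-side agreement can be sketched with a small, hypothetical example. The 12-rater score distributions below are invented for illustration, and the acceptance rule of median ≥ 7 with IQR ≤ 1 is an assumption standing in for the panel's actual thresholds.

```python
# Hypothetical illustration of the IQR pitfall described above: a panel
# whose only disagreement is on the HIGH side (7s vs 9s) gets a large
# IQR and is rejected, while a panel with identical support at 7 or
# above passes. Distributions and thresholds are invented examples.
from statistics import median, quantiles

def delphi_summary(scores, accept_median=7, max_iqr=1):
    """Summarize a Delphi round: median, IQR, share scoring >= 7, and
    whether a median/IQR acceptance rule would pass the measure."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    pct_high = sum(s >= 7 for s in scores) / len(scores)
    accepted = median(scores) >= accept_median and iqr <= max_iqr
    return median(scores), iqr, pct_high, accepted

# Five raters at 9, five at 7, two dissenters below: spread is on the
# high side only, yet the IQR is inflated and the rule rejects it.
split_high = [5, 6] + [7] * 5 + [9] * 5
# Same panel, but the five 9s "settle" to 7: identical support at >= 7,
# smaller IQR, and now the rule accepts it.
settled = [5, 6] + [7] * 10

for name, scores in [("split high", split_high), ("settled", settled)]:
    med, iqr, pct, ok = delphi_summary(scores)
    print(f"{name}: median={med}, IQR={iqr}, pct>=7={pct:.0%}, accepted={ok}")
```

Both panels have the same median (7) and the same share of raters at 7 or higher (83%), so the percentage criterion Duan suggests treats them identically, while the IQR rule rejects the first and accepts the second purely because of agreement among the highest scorers.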
Stephen Saunders: A couple of comments. One is just a sort of mundane how-to-make-the-process-better comment. It seems like we could have made it more efficient. We only had 10 or 12 measures to review. Those could have been summarized on a one-pager—the numerator, the denominator, and some of the key facts—that we could have had up on the screen right away or handed out, as opposed to trying to find them and fishing for them. That would probably have cut an hour or two out of the process.
But the other thing that is probably more important is that there were also some measures we looked at that we thought were important, like the asthma hospitalization, but we did not like one aspect, such as it being population based. There ought to be a way that we report that back to the author and say: we really like this measure; it has a high score; everybody thought hospitalization was an important issue; but we did not like this one part. Would you think about changing and modifying it slightly? And we should actually formalize that, so it is not just something that maybe happens, but we come up with a couple of recommendations. We would have approved it if it were based on Medicaid beneficiaries with a diagnosis of asthma, for example. That would have been a good measure. That is as opposed to making them guess: you did not choose me; was it because you did not like it? And so on.
Rajendu Srivastava: Following up on that point, I wonder if—and I thought a little bit about this because I know the timelines were really compressed. Trying to do my due diligence reading all the measures, trying to understand them all, score them within a very tight timeline and then doing it a couple of times. You probably already know that is difficult.
The piece I was trying to figure out is, in addition to getting more time, what is an alternative solution. I do not know if this is a great one, but I will float it out there. Similar to grant review processes, there are a number of grants that go in, and when I review on a committee I know that, yes, I am responsible for voting on all of them, but I only have to present a few and sort of talk. I am kind of surprised I am arriving at this, because I have never liked the grant review process, ever. Even when I have gotten grants, I have not been very happy. What I was struck by was that it was a little disconnected. I am thinking back to June. June was the first one and then August was the second. And here we are September 15, and a few things have happened in my life. I do not actually remember certain things.
And then we are looking at this, and the ones we discussed this morning, the Medicaid ones, I think actually got a bit of a benefit from being discussed. We had to refresh. People talked about it. And then you could see they either passed or did not pass with stronger opinions, because we got to norm. I do not know if you want to consider something that allows a person or a group of people to take ownership of a measure, at least on the SNAC panel, and say: I am going to spend 3 minutes and summarize the measure and tell you what I thought the strengths and the weaknesses were, or something. I was just worried that if nobody talked, a measure got no press, and maybe that is fine. But at times I sort of felt like I had to rally to try to save a measure if I thought it was reasonable. I do not know if that is the right process.
Denise Dougherty: We can follow up with you. One question is the sequencing. Do we do that from the beginning or do we do that just from having this final meeting where people are responsible?
Rajendu Srivastava: You only want us to talk for the ones that matter I think.
Charles Irwin: Everyone around here has worked overtime doing this. Reviewing all the materials has been truly overwhelming. I was trying to think of how we can facilitate this process. I think we spent a lot of time reviewing measures that we should never have had. I feel like someone else should have reviewed those; I do not know who should have. I edit a journal, and we only review about half of what comes in—1,400 manuscripts come in, but we only send out 600 for formal review. I am trying to think of whether there is a process by which AHRQ or someone can do the first cut or take some of these away, because the hours spent reviewing this were unbelievable.
Francis Chesley: I think those are excellent points. I have to tell you that I took my group down the pathway, Raj as you suggested, as thinking about how we might have some folks on the SNAC represent some of the measures so that they could do a deeper dive than everybody, which would be better than a superficial dive by everybody. And we talked about a number of strategies, but we actually ended up not going that pathway for this meeting. We can talk offline. There was a variety of reasons that we felt that that was not the way to go. It is something I think we could do if we did a better job of planning it in advance.
What we did not want to do—and it was not an easy marriage to make—was to take a measure and its domain or the disciplines involved and match that with individuals on the SNAC, because we could not get that precision without adding more people to the SNAC. But nonetheless, I think that these are excellent comments. I think there is something between not doing that and assuming everybody is going to do everything equally. I think it is fair to say that in the next iteration we will have a revised approach.
Charles Gallia:—a parallel to what you are saying. It is a primary—in some of the grant reviews, but also in human subjects review, we have a primary presenter and a secondary reviewer. There are two people who would do that. Essentially, it is a summary presentation of what is there; it is dispatched quickly in its presentation, and then the recommendations follow. It just makes things move so much quicker, but it does take a degree of trust that the reviewer is doing an adequate job in the assessment and has the subject matter expertise to raise the questions or considerations in their presentation.
Francis Chesley: And on that last piece, trust, the reason why we chose to do it this way is that we would need to develop some trust among the measure submitters: they would have to understand that a second party representing their measure would represent it fairly and objectively. There are just some steps to go through. I think we can get there. But, understanding the overwhelming nature of the work we asked the SNAC to do, we wanted to be very careful that we treated all the measures that were submitted in the same way for this round, recognizing that it was not going to be perfect and that there is going to be room for improvement.
Mary Evans: I would like to speak to and support the idea of some triage. I was very disappointed with some of the applications, in which there was no evidence whatsoever. I would really like to congratulate the people in the Centers of Excellence for the kind of material that they gave us, which allowed us to make intelligent decisions about the measures.
In particular, I was really disappointed with some of the emotional, behavioral, and mental health measures that came forward, where they said there was no evidence whatsoever or did not have any listed. I think those are the things that should be triaged. I think the use of the new form is going to be very helpful because it requires that you actually respond. If a submission comes to AHRQ and it does not have that kind of information, it should not be considered by a group like this and take up the time of people who cannot then evaluate the evidence.
I also hope just personally that some of the COEs will consider taking on some of the emotional behavioral and mental health issues in greater detail so that we can actually move those forward. They came really close this time. But because they did not have the evidence, we did not get them to 7.
Andrea Benin: I sit on this committee called the MAP, which is CMS and NQF committee to look at the adult core measure sets. The idea of that committee is to both create the core measure sets specifically for the inpatient rule making, but also to create this broader set of measures that in theory is going to be used by all payers or whatever. The idea is that conceptually it is a single core set that anybody could access. I think that is the idea of that group. It is a very similar process although actuated very differently.
The advantage that that process has is that there is a fair amount of information on the technical review from the NQF process so they will be able to say this one came to a technical review panel that said it did not meet X, Y, Z. I think that some of what I am hearing is sort of that lack of the initial technical review. And I do not know if there is a way to formalize that. There are several different propositions I guess around that.
My other comment is that there were a number of pediatric measures that we voted on there, and I think they then go to CMS for the rule-making process for this year. It would be nice to think that in this next coming cycle we would look at some of those. If they are on that list, maybe we should be reaching out to the intellectual property owners and asking whether they want to nominate a measure for this process, because to me the idea of coming together and creating these master sets of metrics that people are going to use, regardless of the scenario, to improve child health is that we would potentially look at that. Now, I know that this is actually very specifically about what Medicaid is going to report to the feds, and that is a different question to some extent. But I think there is a space moving forward to cross-reference. Some of the measures on this list did get into that set, but shouldn't those be weighed against each other so that the two groups are at least considering each other's recommendations in some way, shape, or form?
Charles Irwin: I want to thank everyone for all their hard work. It is 5 after 5. I think it is time to adjourn the meeting. We have been going since 7:45. It is now 2 o'clock in the afternoon for me. We started at 4. I want to thank everyone for coming and really participating fully. It was a great group. I think we accomplished a lot. I think the comments made at the end will be really helpful as we move the train forward. I hope everyone has a safe—I also want to thank the people at RTI who really worked hard, all of you at AHRQ, people at CMS, all of you have just done a lot of work.
(Whereupon, at 5:03 p.m., the meeting adjourned.)