Pediatric Health Care Quality Measures Program
Measure Criteria Expert Meeting Transcript
February 24, 2010 - Morning Session
Dr. Dougherty: Good morning. Welcome to AHRQ, everybody. I'm Denise Dougherty, senior adviser for Child Health and Quality Improvement at the Agency for Healthcare Research and Quality. Francis, you're up here. My boss, Francis Chesley, who's the Director of the Office of Extramural Research, Education, and Priority Populations, is coming to the table.
So, first things first. I put this together and put the finishing touches on it this morning before I got in my car and heard the weather.
If you haven't heard the weather forecast, close your ears; you don't need to know this. But if you are already thinking about the weather, there is some prediction of snow, maybe some wind, whatever, overnight. I don't have the exact details, but some people do, and they have already asked if they can leave tonight. So here's what we're going to be doing. We don't want you to spend your time on your laptops looking for the weather forecasts or which airline is canceling. We will keep an eye on the weather and the airlines for you and provide updates.
If you really are certain that you need to leave tonight, that's fine, and we can give you a number so you can call in and join the conversation tomorrow, if that's your wish. Or you can stay and we will see how the weather goes tomorrow and we can end early so people can get out of here if they need to, or if we somehow get stuck, you can stay overnight tomorrow night.
I want to thank everybody for agreeing to participate in this important meeting and for doing the pre-work that you did (we sent you many, many E-mails and documents, even during the last snowstorm, as Nora reminded me), and then for participating today and tomorrow. So we really appreciate your being here, and thanks so far for the work you've done.
We have a very busy agenda and a lot of work to do today. You are intentionally a very diverse group, so here we have clinicians who may not be measurement experts. We've got lots of measurement experts; some are child measurement experts, some are what we call generic measurement experts. We have users of measures who may not be measurement experts. And who else do we have? If you're not in one of those categories, you must be bureaucrats. We've got lots of those, yes.
So, this is going to be a challenge. The nature of the task is a challenge and maybe having conversations with all these different languages going on may be a challenge, so we ask you to be patient with each other and with us. We did this intentionally, but we don't know how it's going to turn out, and we would like to know what you think. So we will be sending you an evaluation or feel free to E-mail or call me or whatever if you have any comments, or come up to us during the break and let us know what you think.
So I'm just going to give a little bit of the purpose of the meeting and the overview of the day just to get us oriented. You have your briefing books with the agendas in them. So why are we here? We're here to improve health care quality for children, and since CHIPRA has been referred to as a model for perhaps other measurement development and quality improvement road maps, this may be applicable to other populations, other settings, other situations.
So here's the model. We measure, we examine the measures—which doesn't always happen, you know—examine the data, then act to improve the situation if there are quality gaps or disparities in quality, then re-measure, continuously re-measure to see how we're doing with those quality improvements and eventually we succeed, and we have a healthy population. So, a healthy population with lots of high-quality health care. So we're in part of that road map, that model, and one CHIPRA goal is to have an improved core set of children's health care quality measures. So, we started with an initial core set of Medicaid and CHIP measures, and you're going to hear about how we did that and the challenges we faced from Rita Mangione-Smith a little bit later this morning. So that's over here in 2009. Now we have the 2010-2013 period, and there's lots of stuff going on with measurements and with quality improvement activities.
We're getting public comments on the initial core set, and we also will be learning from experience the way we always do. CMS has just announced its quality demonstration awards to 10 States, some of which are multi-State programs, and a number of those proposals said that they would work on improving the initial core set or testing the initial core set and building additional measures and identifying additional needs. So that will be going on, and what we do here today will be relevant to that quality measurement activity. Then we have in the legislation—it's very unusual that we have this big effort for quality measurement and improvement and money to actually do it, especially for kids. So we have the Pediatric Quality Measurement Program of grants and contracts, and AHRQ is taking the lead on that, working very collaboratively with CMS.
Then there will be other measurement developments. So when we get to the point in 2013 when the CHIPRA legislation calls for the improved core set of children's health care quality measures, all of that will be going into our learning to identify the improved core sets. So we'll probably have a public call for other measures that have been developed while these other CMS activities are going on and perhaps another subcommittee-like group to determine the consensus on the improved core set.
So that's basically why we're here today, because to have an improved core set that is usable by public payers, private payers, providers, and patients across the whole spectrum of children's health care, including perinatal care, we really need to have some consistency in what all these folks are doing. The CMS demo people, the Pediatric Quality Measurement Program people and whoever else wants to use the core set. We can't force anybody to use these criteria elsewhere, but we certainly can force it—sort of force it, I hate to use that word when I'm a government employee—but strongly encourage and provide technical assistance to people to use it in the CMS demos and in our grants and contracts program.
Our target audience for what we are doing here today is really these awardees: the States that have received the CMS quality demonstration awards and the forthcoming awardees of the Pediatric Quality Measurement Program. But that announcement is not out on the streets yet, so don't ask me for it now.
So here are our questions. When we started out and asked how we were going to get to a consistent set of core measures applicable across all these programs the way CHIPRA wants, the first question was whether existing measurement criteria are sufficient and specific enough to guide future developers and enhancers of measures. That means the criteria could be used prospectively by these awardees, and they would know, for example, that they would have something to take to the National Committee for Quality Assurance (NCQA) or to the National Quality Forum (NQF). So are the measurement criteria specific enough to give them guidance so they don't spend 3 years working on measures and then find out that they're not going to pass any of the current criteria? And if they're not specific enough, what additional recommendations would experts make? You all are the experts who are going to give us additional recommendations today. And then, assuming we will not get down to the nitty-gritty of specific criteria and confidence intervals for every criterion domain and sub-domain today, we want you to tell us what next steps we should be taking.
So here's a summary of what we'd like from you. Needed changes to the current measure criteria, particularly in the context of the CHIPRA requirements, and the more specific and targeted to our audiences and awardees the better, and then thoughts and recommendations on next steps that we should take.
How are we going to do that in these 2 days? Our agenda today is that after we go around and say our names and affiliations very quickly, we're going to have some framing by AHRQ and CMS leaders, and that will be followed by some stage-setting about criteria. We're going to have Rita Mangione-Smith talking about the National Advisory Council on Healthcare Research and Quality Subcommittee on Quality Measures for Children's Healthcare in Medicaid and CHIP (SNAC) CHIPRA process, the identification of initial core measures. Ernest Moy is going to talk about the National Healthcare Quality and Disparities Reports' criteria. Helen Burstin will join us by phone, we have her slides here, to talk about—to do some updates on what NQF is doing, and then I will give you the charge for the day for the breakouts and for tomorrow too.
We'll have two breakout sessions this afternoon, both on validity. Tomorrow, if all goes well (these have not been updated due to weather conditions), we'll have three more breakouts, an overall synthesis of where we are, and then additional thoughts about next steps. Just an FYI, I heard Gil Scott-Heron has a new album out, and he's the one who said the revolution will not be televised, so it stuck in my head. This evolution of criteria, however, will be recorded. It will be posted on the AHRQ CHIPRA Web site as a slightly edited transcript of the meeting and then possibly as audio for more accessibility. That would be true of the plenary sessions only, not the breakout sessions. We don't have enough equipment for that, and you need some time when you're not being listened to, I assume. So if you do want your name associated with comments, you should speak into the microphone and say who you are before you make a comment. We'll be calling on you for Q&A in a little while.
Now I'd like to actually see if anybody has any—no? Okay, we're going to then I think go to Carolyn. Carolyn Clancy has some welcoming and framing remarks for us. She's coming to us technologically via DVD. There she is.
Dr. Clancy: Good morning and welcome to the Agency for Healthcare Research and Quality. You can probably see just a few remnants around the building of "Snowmageddon" as President Obama called the blizzards that hit us a few weeks back. It was a very interesting time in Washington. I can't recall another time when weather caused the Federal Government here to shut down for 4 consecutive days. We also had the President's Day Federal holiday and 2 days for which Federal employees could use leave to avoid getting caught in the storm.
So basically our schedule was affected from February 5 through February 15. When we came in last Tuesday for the first time since the blizzards, the snow was plowed so high around the parking lot that you couldn't see the cars when you came through the gates. Fortunately, most of that is behind us now. I personally cannot wait for warm weather and sunshine, and we can get back to a more normal schedule. Unfortunately my schedule did not permit me to be with you today, so I very much appreciate the opportunity to share some of my thoughts on the work that you're about to begin.
But first, I want to thank each and every one of you in advance for coming together to offer your expertise for this extremely important project. Obviously, the key to our success in this endeavor is having standardized measures for use in the improved core measure set that the Children's Health Insurance Program Reauthorization Act of 2009, or CHIPRA, calls for. Your knowledge will be critical to ensuring that we're able to accomplish this goal because, as we all know, health care quality is still an emerging science. Unlike other fields of science, it has no well-tested theories and few criteria that can be used prospectively by new entrants in the field.
Until now, we haven't had substantial funding to support new entrants into health care quality measurement development and testing. Now that Federal funding has been made available, your work will give us a head start. As I've said ever since its passage a little over a year ago, CHIPRA gives us a road map and a model for linking health care quality measurement to quality improvements, and as so often happens, our child health colleagues are leading the way. Here we have an unprecedented opportunity to develop and enhance children's health care quality measures and improve on the initial core set that has already been published for public comments. We also have a deadline, January 1 of 2013.
We're holding this meeting so that those who will be working to develop and enhance children's health care quality measures under CHIPRA programs will have a common set of criteria from which to work. Only with a common set of criteria can we hope to have a standardized set of core measures.
Standardization can be a scary word, but many State programs and health care providers have come to see the wisdom of consistency across measures. As we work to build this consistency, here is our challenge for today. The health care quality measurement and child health communities have made enormous strides in identifying and often agreeing on key criteria domains of validity and feasibility, but we've let thousands of flowers bloom in very different kinds of soil. We remain consensus-based and not as transparent about the application of criteria as we could be, and this means we can't give measurement developers and enhancers a clear set of rules that will let them know whether their new or enhanced measures will pass the test we've created. This is the challenge we'll begin to overcome today and tomorrow with some very hard work.
Of course, 2 days is probably not enough time to develop quantitative and universal criteria for every aspect of quality measurement, but we hope it's time enough for you to get us off to a good start. We're also very interested in getting your ideas on how we should move forward to achieve our goals. So here's to a very productive meeting. I'm looking forward to hearing about it and seeing the results. Thank you.
Dr. Dougherty: Obviously that was recorded before the latest weather forecast. Okay, now we're going to hear from Victoria Wachino who is going to give us some framing comments and some guidance from CMS, our partner in this effort.
Ms. Wachino: Good morning. It's really exciting for me to be here with you all this morning. I'm Vicki Wachino. I'm the director of Family and Children's Health Programs at CMS, and my job and my group's job—many of you probably already know Barbara Dailey—is basically to manage all of the policy and operations and procedures and management for the Medicaid and CHIP programs, in partnership of course with the States. And that includes quality, and I wanted to talk with you a little bit this morning about quality and why the work you're doing is so important to all of our efforts at CMS.
I think the starting point for all of CMS' work in general, and with respect to our quality efforts in particular, is really fulfilling the commitment and the promise that the Medicaid and CHIP programs offer to low-income kids and families, and fulfilling the promise of giving them health care that meets their needs and improves their health status. I think that's a big Federal priority in general, I think it's a big priority for this Administration, and I think it's clearly a big priority for the States.
CHIPRA and also the Health Information Technology for Economic and Clinical Health (HITECH) Act create fantastic new opportunities to bring to bear on your work and your thinking to move quality forward. And I think as Carolyn said, it's really quite unprecedented. And since we recently passed the 1-year mile marker of the enactment of those laws I wanted to reflect a little bit on CMS' accomplishments, particularly as they pertain to CHIPRA which I think is what brought you all here today.
If you look at CHIPRA, it did a lot of different things, but its efforts really concentrate around two areas. One of them is expanding coverage to as many eligible low-income kids as possible, and it does that through a combination of creating new options for States and new incentives for States to enroll eligible low-income kids. And so over the course of the past year we have worked closely at the Federal level and in partnership with States to really bring new focus and resources to bear on getting kids in the door. And Secretary Sebelius is very aware of and committed to the fact that right now there are 5 million kids out there who are eligible for our programs but not yet enrolled, and CHIPRA brings a lot to bear on that effort of reaching those 5 million kids. And over the past year we've seen a number of States taking up options around eligibility expansions and making it easier for families to enroll by streamlining their programs, by rethinking the way they do things and get families in the door, and that's been extremely productive.
The second and related area that CHIPRA brings a ton of focus to is obviously quality and the reason you're all here today. And the reason I think about those two things in concert is I think it's both important to get all of the kids who need our programs and are eligible for them in the door, and once they're in the door, it's also critically important that our programs work as well for them as possible in terms of providing them with the quality of care that is designed to meet their health care needs. That's why you're here today, and that's why I'm excited to be here with you.
In my career, I haven't spent as much time on quality as I have over the past 3 months at CMS, and I was just saying that even though I'm reflecting on the past year of CMS' efforts, I can take credit for none of the things I just described because I've only been there for 3 months. But it's been really striking to me, as I've gotten to know and understand the CHIPRA quality provisions and as we've been implementing them in partnership with AHRQ, what an inspired piece of legislation it really is, and I say inspired for three reasons. First, because it really brings a level of commitment and focus to efforts around quality measurement and management for kids that I don't think has existed before. Second, it brings a level of Federal resources that I'm quite sure is unprecedented when it comes to quality of care for kids and families: $225 million over 5 years. And third, there is the very strategic nature of the process it laid out. It's really quite a thoughtful piece of legislation, starting with the development of the initial core measurement set, which as Denise and Carolyn said is out for public comment now. Then it moves to the stage of the quality grants and making sure that CMS and AHRQ are working with States and moving along at the State level as fast as we can and as thoughtfully as we can around the adoption of quality measurement, trying to move toward consistency as Carolyn said, but recognizing that consistency across a 50-State program can be extremely challenging.
I'm inspired because the next step of that process, even as we are finalizing and refining and polishing the initial core set, is what you all are doing today: starting to think about pediatric quality measures for the whole population. Obviously, that's not the end of the process, and I think one of the striking things about CHIPRA is the way it takes where we were a year ago and, by forcing us all to do this work, moves us so much closer to a model where we're really able to measure outcomes for kids.
And I will tell you that one of the very striking things about CHIPRA to me and one of my personally favorite parts of the legislation is that we start with Medicaid and CHIP, and that's a good place to start because we cover—at any point in the year a quarter of the kids in the United States come through our programs. So just by developing the initial core set for Medicaid and CHIP, we've made a starting point in improving quality of care for kids.
It's exciting to me that we start there and then build to the rest of the pediatric population because too often in the history of changes to the health care system or improvements to the health care system, we start with private insurance first, and we get as far as we can with that, and then we look back and we say, well what about Medicaid? You know, what about CHIP? What about the public programs? What are we going to do for them? And so it feels right to me both because of the size of the population, the kid population and in the sequencing that we deal with Medicaid first. So I think it's really exciting.
You have a really hard job today, and I will say I know there's snow coming, but I hope you all brought your snow boots and are intrepid and stick it out. I find defining what constitutes quality care and establishing measures and management to be extremely difficult. And I say that, and I don't have the background that a lot of you do, but just—I find thinking about quality in a meaningful way to be extremely challenging, so I don't envy you. As hard as that job is, I think I'm going to make it a little bit harder by telling you about some of the things that we at CMS would like you to think about as you think about quality. And there are just a few so I won't totally over-burden or overwhelm you.
The first is access to care, and making sure that access and quality are thought of together is extremely important. I think if we look at the experience of kids in Medicaid and CHIP, making sure they have access to the services they need is incredibly important.
The second thing is balancing consistency with State variation, as I kind of said a little bit earlier. It is critical, and I think this is why everyone's doing this, to have consistent national measures. It's challenging in any health program, but I think it can be especially challenging in Medicaid and in CHIP because it is a 50-State program, and States are in all different places. Some of them are leading the way, and some of them haven't had as much time to think about it. So really thinking about how we can accommodate State variation and work with States is very important.
The other thing I'm sure that none of you will lose sight of is the need to think about health disparities, disparities in care for different subpopulations, particularly racial and ethnic subpopulations. But I think underlying it all, and the single thing that's most important, is really thinking about quality and quality measures as tangibly as possible so that we at CMS and other health insurers and the States can really use quality measures as a management tool. I think that can be extremely challenging to move from the abstract to the particular, and to really establish—to find the right measures is hard, to define quality is hard, and then to think about the data and what we will do with the data over time as managers of this program and as purchasers—Medicaid purchases one-fifth of the health insurance and health care services in the United States. So we at CMS really think of the work you're doing as management tools, and as hard as it can be to be tangible, I would just urge you to take it to that level in your conversations to the degree that you are able.
I've said a lot about CHIPRA, but I also wanted to talk just a little bit about HITECH, which is also such an enormous opportunity. And as you're sitting here today I feel like we're standing at a turning point where we now have this level of investment to move from systems that have been designed primarily for payment, to pay providers, to information systems that are designed primarily to measure and gather information on the quality of care, and that is both unprecedented and extremely welcome. But I think it also presents some challenges.
One of the challenges for you all may be that the environment is changing so quickly, and who knows what these systems are going to look like. I think we have some idea, but it is a very big change for us. And the second challenge and one that I bring to all of my thinking about HITECH, although I say that and need to note that there are many people at CMS who do more thinking about HITECH than I do, but for me it's really making sure that the technology isn't driving the quality measurements. The goal of HITECH, health information technology (health IT), and electronic health records (EHRs) is to measure quality, and the quality measures should always come first, and we should always be thinking about how all of these new technological resources that are being brought to bear support the goals of quality measurement. So don't put the cart before the horse, don't put the technology before the quality measurement. I would urge that we do the opposite.
The second challenge, and this is a challenge I think for us at CMS and perhaps for AHRQ as well—and I will say that from what I know of our collaboration with AHRQ so far, it's been really fabulously successful and something we value a great deal—is really ensuring coordination at the ground level between CHIPRA and HITECH. I think we have staff that are trying to do that. I think one of the ironies is that after years of maybe not spending as much as we could on quality and certainly not spending as much as we could on health IT, suddenly we had these two huge pieces of legislation enacted almost at the same time, but to my thinking and reading, not a lot of thought about how they would relate to each other. So I think one of the things that we're doing at CMS is trying to bring them together and to make sure that those efforts are complementing and not duplicating, or worse, getting in the way of each other.
The last thing I wanted to say, since I'm new to CMS, is that we really, really want to know how it's going for you. We want to know how this effort is going, we want to know what more we can be doing to improve the quality of care for kids in this context, in measurement, and just in general. I know a lot of you have interactions with the Medicaid program outside of this context, so we're looking for feedback. And I kind of make that offer at my own peril because I don't want to over-promise, and I'm not over-promising, but I can tell you that we won't be able to solve any problem that we don't know about. I say this to almost every audience I talk to: we really want feedback on how it's going for everyone our program touches, including beneficiaries and providers. A number of you are extremely thoughtful senior researchers who have thought about our programs for a long time, and we want to hear from you. So thanks so much for having me here, and thank you for all of the work you're going to do over the next 2 days. It's really important to us.
Dr. Dougherty: Thanks very much. I realize we skipped the part where we all go around and say who we are, but you'll have lots of time to do that during this meeting. We only have a few minutes for Q&A. Obviously Carolyn is not going to be able to answer any questions, but we have other people here from AHRQ, and Vicki, you'll be here for a few more minutes?
Ms. Wachino: I'll be here for the next few minutes.
Dr. Dougherty: Okay, great. So let's—anybody have any questions, comments for AHRQ? Yes, Judith.
Dr. Thierry: I just have a question, probably for CMS, or maybe just a comment: what about portability of insurance and interstate issues?
Ms. Wachino: Can you say a little bit more about that?
Dr. Thierry: Well, for American Indians/Alaska Natives, they go to off-reservation boarding schools. They may live in one State, have family in another State, need services, and their services aren't portable.
Ms. Wachino: I think that's an important point, and I don't know; I mean, Barb might know better than I do. We have Native American quality grants, specific Native American quality grants, and we are reviewing the applications right now. I don't know if there are opportunities there to address that issue, but we'll certainly look out for it.
Ms. Dailey: And also, with the opportunities before us under HITECH, as well as the CHIPRA provision on EHRs, we're developing a program and a specific format for children. That is one of the areas where we're actually seriously taking a look at how those records will help support portability and sharing of information. So in that aspect, we are definitely focused on that.
Dr. Dougherty: Anybody else? Yes. Can you say your name?
Ms. Reuland: I'm Colleen Reuland from the Child and Adolescent Health Measurement Initiative (CAHMI), and I just want to make sure I'm crystal clear on the goals of the measures, because as we think about the models, your "uber" goal is probably the most important thing to drive how that model plays out. And so I'm hearing you say that the goal of the measures is to improve care and to drive improvements. And I just want to make sure that that's the centering goal, because when I look at some of the legislation, it seems like there's also a goal of standardization, to be able to compare care across States and to have benchmarks, to be able to make a statement about the quality of care that children receive. That could lead you in a different direction than trying to have measures that drive improvement, because some measures may allow you to compare States but not be able to drive improvement.
So could you help clarify that dissonance in my head? Because it would affect how—it affects the validity, it affects the usability, it affects the feasibility conversation if the goal is slightly different or if the goal is maybe all of those.
Dr. Dougherty: Well, I would leave that to my CMS colleagues, but I kind of disagree with you that measures for accountability cannot be used to drive improvement because there's nothing like seeing that some State is better than you at something and then trying to figure out why that is and improve your care. But as to the general issue of comparability, Barbara or Vicki?
Ms. Dailey: Actually, part of the CHIPRA program also requires annual quality reporting, and one of the roles for CMS is to develop procedures and a format for States to submit information so that the Secretary of Health and Human Services (HHS) publishes an annual report that is transparent and provides information on what is going on with the States in terms of improving care and, I think ultimately, too, on demonstrating that we're improving health outcomes. So that is also a component of it. In terms of driving comparability, I think the first part is the focus on improving care and then, through this transparent annual reporting, identifying next steps if we need to, if we haven't gotten to the point where we can do comparability, if the States are still saying they're struggling and haven't been able to meet the specifications.
We're still exploring what the options are going to be for the States. One of the things we're very anxious to learn from these public comments on the initial core measures is what States can do and what they can't do with what exists now. And then that information is going to feed into the quality measures program. So ultimately, with the next set of core measures, we're hoping that there will be some degree of comparability, but that's going to be part of our learning process over the next 2 years. I hope that answers your question.
Ms. Wachino: The only thing I'd add to that, and it's a smaller subpoint, and I probably should have mentioned it earlier, is that the State reporting on the initial core measure set is voluntary for the next several years, which I think is really important—because I think it's important for us to learn where States are and what they're able to do easily and what's hard, moving gradually over time to consistent national reporting. And I am glad that CHIPRA gives us the early experience of working with States who are volunteers and the time to really learn what works and what doesn't as we move forward towards consistency because as I said earlier, it's very, very tough.
Dr. Dougherty: Hello, who's there?
Dr. Miller: Oh, hi, I'm sorry. This is Marlene Miller.
Dr. Dougherty: Oh, hi Marlene. Thanks.
Dr. Miller: I was a little bit late starting out, but I also wanted—I signed on a few minutes ago, but the last speaker I could barely hear.
Dr. Dougherty: Okay. You mean during the comment, during the Q&A period, or while she was speaking at the mic? Though you don't know where she is, so that's hard for you to answer. Okay. Can you make sure your mic is on, Vicki? You're not coming through as clearly as Barbara. Okay. Well, thank you for joining us, Marlene. So, I think one message for this group today, even though these are voluntary and may eventually get to State comparability, is that States will compare with each other if they want to, and I think it behooves us as a quality measurement scientific group to provide as much consistency in the measures as we can so that the States don't just have to develop their own measures willy-nilly, in which case they will never be comparable. So I think that's one of the charges for us. I don't want us to think, oh, this is all voluntary, so we don't need any consistency. I think we're on the path toward more transparency and consistency and rigor in measurement, and this meeting is part of that.
Ms. Dailey: And that was our experience in working with States for the CHIP program since, I think, 2003 or 2004. We had four measures with clear specifications that we asked States to report on, and again, they basically had to change the specifications to fit their eligibility systems, you know, the whole set of enrollment period issues that develop. They couldn't meet the full-year criteria for some of their kids who were churning in and out, and over 5 years we still weren't necessarily able to compare across those programs. And so I think that's why this is so critical: to give them something in terms of expectations and a scientific basis behind it, you know, educating them about why it's this way.
Ms. Wachino: Yes, I think one of the challenges we're going to have, having worked with Medicaid agencies for 13 years on measurement for accountability and for improvement, is that trying to have measures that all States can collect might compete with trying to have measures that can actually drive improvement. What is measurable and what can feasibly and sustainably be done might not be what drives improvement, and the measures that could actually drive improvement might be a longer track. So we'll have that balancing act. And thankfully, there's a Federal commitment to support the work that needs to be done.
Ms. Dailey: And through our reports to Congress, we can make recommendations going forward if there's additional need.
Dr. Dougherty: Yes. Thank you, Vicki. And I think we'll move on to the next set of speakers. As I said, Rita Mangione-Smith is going to provide us an overview of the challenges we faced during the identification of the initial core measurement set. You will also learn what the identification of the initial core measurement set and the SNAC [National Advisory Council on Healthcare Research and Quality Subcommittee] are because I've been flying these little acronyms around. So thank you, Rita.
Dr. Mangione-Smith: Good morning everybody, and thank you Denise for the opportunity to talk to this group about what was really an incredible experience last summer and fall. I think it's fitting that I have 10 minutes to tell you about it because we felt like we had about 10 minutes to do it. So I'm going to be talking to you about lessons learned in that process that the SNAC—we call it the SNAC, actually my co-chair came up with that name I think at 2:00 in the morning as we were producing slides for the NAC. He said, "SNAC CHIP, don't you think that's a good name for it?" I E-mailed back, "Yeah." So that's how the birth of the SNAC happened. So I want you to understand what a multidisciplinary process this was. I definitely want to take a minute to recognize my co-chair for the SNAC, Jeff Schiff, who was amazing to work with. Denise and her staff were incredible during the whole process, also Barbara and her staff. We had two NAC members, which is the AHRQ National Advisory Council for those of you who may not be familiar with health care quality at AHRQ, Tim Brei and Kathy Lohr. And then we had another 20 individuals who took part, some of whom are sitting in the room, and it's nice to get back together with them today. So lots of different areas of experience around the table.
So what was our charge? Our charge was first to do some of what we're talking about today: provide guidance on what criteria we should be using to identify the measures, or to assess the measures that we were going to recommend for this core set. So that was our first charge. Our second charge was to provide a strategy that we would use to try to find all the measures that we should be looking at, and then finally, to come up with a strategy for applying the criteria we agreed on to those measures to arrive at our recommendations for the core set. Our activities ran from July to, in all honesty, the end of September. So it was a really, really short timeframe to get a lot of work done.
So this was our process, and I'm going to go over it in broad brush strokes and focus more on what we learned from the process, what the problems were, and what we might try to do differently next time. It started out with a big effort by AHRQ and the Centers for Medicare & Medicaid Services (CMS) to identify existing measures in use by Medicaid and CHIP. We then got together, the co-chairs and the Federal Quality Workgroup, and decided that for our first meeting in July, it would be most useful if we could present the Subcommittee with those measures, along with some definitions of criteria for evaluating them, so that when we got to the meeting we would actually have some evaluated measures to talk about. So that's what we did.
We decided we would do a RAND/University of California, Los Angeles (UCLA) modified Delphi process, and if anybody knows my history, why we did that is probably obvious. Beth McGlynn's in the room; she taught me everything I know, and that's the way I've been taught to evaluate measures. So that's the process we decided to embark on. We took the validity and feasibility definitions that are traditionally used at RAND, but then we also worked together as a group to modify those definitions a bit for the process we were about to go through, which was a little bit different from the kinds of panels that we do at RAND.
First, we sent all the measures to the Subcommittee, we sent them the criteria, and we said in a week, can you please get us all of your evaluations back? And they did, which was amazing. So we generated scores from that process and had our meeting. At that meeting, one of the very first things we did as our first charge was we looked at those criteria that we had just used for validity and feasibility, and we did a lot of tweaking. It was a great group, and they had a lot of great ideas. We did decide that the criteria we were using needed to have some changes made to them. We also added an additional criterion we felt was needed, which was importance of the measure, and worked to gain consensus on what that criterion—how that criterion should be defined.
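For anyone less familiar with the modified Delphi approach Dr. Mangione-Smith describes, here is a minimal sketch of what the scoring step can look like in practice. The 1-to-9 rating scale and median-based scoring are standard features of the RAND/UCLA method; the measure names, ratings, cutoff, and disagreement rule below are illustrative assumptions, not the SNAC's actual data or decision rules.

    # Illustrative sketch of a modified Delphi scoring step (assumed details, not the SNAC's).
    from statistics import median

    # Hypothetical panel ratings: measure -> list of 1-9 scores on one criterion (e.g., validity).
    ratings = {
        "well_child_visits_first_15_months": [8, 9, 7, 8, 9, 8, 7],
        "adolescent_depression_screening": [5, 3, 7, 8, 2, 6, 9],
    }

    def score_measure(scores, retain_cutoff=7):
        """Return (median, retained, disagreement) using illustrative rules."""
        med = median(scores)
        low = sum(1 for s in scores if s <= 3)    # ratings in the low tercile
        high = sum(1 for s in scores if s >= 7)   # ratings in the high tercile
        # Simple stand-in for the method's formal disagreement definition:
        # a third or more of the panel at each extreme.
        disagreement = low >= len(scores) / 3 and high >= len(scores) / 3
        retained = med >= retain_cutoff and not disagreement
        return med, retained, disagreement

    for name, scores in ratings.items():
        med, retained, disagreement = score_measure(scores)
        print(f"{name}: median={med}, retained={retained}, disagreement={disagreement}")

A second round with revised criteria repeats the same step, which is why agreeing on the criteria up front saves the kind of rework described below.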
The other big decision we made at that meeting was we needed to go beyond CHIP and Medicaid measures. Everybody felt very strongly about that. We would stick to measures in use, but we wanted to look at measures that were outside of what was being used in CHIP and Medicaid. We then went between the two meetings through the process of trying to find measures and come up with a process that would allow people to nominate measures, and I'll go into that in a little bit more detail. So we got a whole new group of measures, we applied our criteria again at our September meeting and went through some more ranking and voting and came up with the final 25 measures that were put up for public comment.
Okay, so what was our problem? I think I've already made it clear it was the short timeline. Already you can probably tell there were some problems with how we went through our process. Our process required that we establish draft measure evaluation criteria for validity and feasibility before we ever even met as a group. So what did that result in? We ended up doing some rework. There was some inefficiency that was introduced because of that, because once we came to our own consensus criteria, we all felt like we needed to look again at the measures we had just scored, because, you know, we wanted all of the measures to be graded using the same criteria. So Delphi Round 1 was not a total waste of time. I think it gave everybody a chance to get their feet wet with using the Delphi process and applying criteria to measures, but it really was a practice round for the group.
So I'm not going to spend a lot of time talking to you about what we agreed on for definitions. Denise has given everybody a handout that basically outlines our definitions for validity, feasibility, and importance, but for the sake of time, I'm not going to go over them in detail right now.
This was a conceptual model that was actually suggested by one of the people who was on the Subcommittee and is sitting here in the room, Marina Weiss. From all of our conversations the first day, she had the idea that what we were really talking about, what all of our conversations seemed to revolve around, was this: there were clearly some grounded measures, there were some intermediate measures that had been developed but had not been used extensively, and then there were the measures we all wish we had, the aspirational measures. So as a group, the Subcommittee decided that what we really wanted the scope of the core set to be was the grounded measures. We wanted to come up with a group of 10 to 25 measures that were currently feasible according to our definition. The one thing I will emphasize about that feasibility definition is that the measure had to be in use, and there had to be existing detailed specifications for it. For the intermediate group, again, we don't know how many measures are out there like that; some of them have good specifications, but they aren't being used as widely. And then the aspirational measures are obviously the measures that need to be developed through this upcoming program.
Another important decision about the scope of what we ended up with was that we really needed to be realistic about staffing and funding and all the needs for collecting, analyzing, and reporting data at the State level. Given the economic crisis that we're in, this just was something that we had to keep at the forefront of our thinking. And I already mentioned that we were going to expand beyond the CHIP and Medicaid measures. So those were the main decisions that happened at our first meeting.
Between the meetings, AHRQ developed an online nomination template for measures. So people could go in and they could nominate measures. Many representatives from the Federal Workgroup did that. There were some that were actually entered by AHRQ from people in the public who wanted to enter measures. So basically when you went into that template, you had to enter information about feasibility, validity, and importance. It kind of guided you through the template asking you key questions to help us get at how well that measure met the criteria.
So the problem was that the template became available online at the beginning of August, and we stopped accepting measures about 3 weeks into August. We had to do that because we had to have some time to synthesize that information for the Subcommittee meeting in September. So we got many incomplete submissions, unfortunately. Some of them had no specifications attached to the submission, didn't give us any evidence related to the scientific soundness of the measure, and presented incomplete information about some of our importance criteria.
So what did we do? We said okay, as much as we can, we're going to try to fill in the gaps for the missing information. So Denise and her staff and my co-chair and I spent last summer trying to fill in the gaps. We looked for specifications on measures that had been nominated, and we attempted to obtain information related to some of our importance criteria, especially around disparities and variation in care according to the measure. We did evidence reviews of the literature, trying to look for any evidence that supported a given measure, and then, after we identified evidence, we graded it using the Oxford Centre for Evidence-Based Medicine criteria; I think Denise also gave you a copy of what we were using. And then we decided the Subcommittee would go crazy if we didn't distill this even further, so we created these one-page summaries that basically told them the measure name, who owned the measure, the numerator, the denominator, the evidence that supported it, whether any validity testing on the measure had been done, any reliability testing, and whether it met some of the importance criteria.
So we got to look at 119 measures, 119 one-page summaries, and despite all of our efforts there was still a lot of missing information. For 22 percent of the measures, no specifications could be identified. There was no reliability data on about 50 percent of the measures, and 24 percent of them, despite what we said at the very beginning of the nomination template, were not currently in use by anybody that we could identify. The evidence grades were about what we would expect for pediatric quality measures: there's not a lot of randomized controlled trial data, which is Level A evidence, and a lot of outcome studies and cohort studies, which is Level B in the Oxford system. There was very little information about variation in performance on the measure or disparities; less than half of the nominators provided that, and we couldn't find it. So we had our second Delphi process. We had more information on the proposed measures than we did the first time around, but it was still pretty incomplete.
We had 1 week to assess summaries on 119 nominated measures before the meeting, and we decided as a group to adopt a philosophy, and this actually came from the first meeting, that we preferred to leave an empty chair rather than fill it with a measure, no matter how weak it was or how little evidence there was for its validity. So, here are some of the empty chairs. These are things that are called for in the legislation, areas of measurement that the legislation really wanted in the core measurement set, where we honestly just could not find measures that we felt good about recommending. Those included things that have to do with measuring the medical home, most integrated health care systems, and you can see down the list, duration of care, inpatient care. Mental health care was a big one where we really just had a very hard time finding good measures.
So what lessons did we learn? If I were going to do this again, here's how I would do it. I would start with the nomination process before we ever met, and I would want more time for a few reasons. First of all, more time to obtain nominations from people, because I think we missed out on some potentially good measures because of that truncated timeline. More time to evaluate and summarize the nominations, and having more than, like, five of us trying to fill in all the gaps would have been really helpful. And then more time for the Subcommittee to really take in the information and do their scoring. I just felt it was so rushed that it was really hard for people to make good, sound assessments about the measures.
I think it probably would have been smart to reach consensus on our evaluation criteria before we tried to grade measures; that would have prevented a lot of rework on the part of the Subcommittee. And I've come to the conclusion that even if we had had years, the process was never going to be perfect. So I'm going to stop there because we have other people who need to talk.
Dr. Dougherty: And we'll hear these three presentations and then have some Q&A. Dr. Ernest Moy has been with AHRQ for about 10 years now. He came to us from the AAMC, the Association of American Medical Colleges, if I have that name correct, and he was working on disparities there. And since he's come to AHRQ, he's been working on the congressionally mandated National Healthcare Quality and Disparities Reports that AHRQ produces for the U.S. Department of Health and Human Services. So he's going to share his wisdom in trying to sort out which measures are good enough for those reports to Congress. Thank you, Ernest.
Dr. Moy: Thank you, Denise. I want to start off with that last comment about having forever and the process still being imperfect and the outcome still being imperfect. I can certainly reinforce that, although our activity is a little bit different. Ours relates to selecting measures and a measure set for a national health care quality report and disparities report, and this involved picking measures and developing measure sets, so there's that commonality, but there are some differences as well. And so the message that I'm trying to relate to you is certainly not to do it our way, because we've been working on our measure set now for over 10 years, and it is still imperfect, and we still add to it every year, and we still refine it every year. So, probably my message would be not to do it our way, unless I wanted Denise to kill me. But some of the things that we explored are probably things that might be of interest to this group, and so I just wanted to tell you our story.
This is kind of what we went through and some of the lessons that we learned. Two seconds on what the reports actually are. The long and short of it is that we were asked by Congress, through authorization in 1999, to produce these two reports: the National Healthcare Quality Report, which we view as a summary of trends in quality of care in the Nation, and the National Healthcare Disparities Report, which focuses on disparities related to race, ethnicity, and socioeconomic status. Operationally, what does that mean? Well, it means the first five Institute of Medicine (IOM) categories (effectiveness, safety, timeliness, patient-centeredness, and efficiency) are in the quality report, and all of equity is in the disparities report. Operationally this also means the disparities report is several times larger than the quality report.
I wanted to talk about some of the similarities and differences perhaps between our endeavor and this endeavor. We had a number of advantages, I think, first of all, because this was given to us by law. So we kind of knew or we had some insight into what folks wanted from us, whereas this group might have a little bit less insight I think. And so these were some of our assumptions. First of all, the law did specify some of the things we were supposed to do. So for instance, the disparities report said look at racial, ethnic, and socioeconomic disparities, not other disparities.
Then we could make some intelligent assumptions about other things, things that were important for us to specify in criteria for measures and for the measure set. First and foremost was our primary audience. We knew our primary audience was Congress, and so we knew that we were supposed to look at the national level and provide this big-picture kind of reporting as opposed to other ways of looking at this information.
Secondarily, we knew that our analytic unit then was the Nation or other geographic units, not the provider, and so that's the big advantage. We did not have to incorporate the aspect of accountability. I think someone raised that issue, you know, you want something accountable, you're going to have different criteria. We didn't have to deal with that. We did not have accountability criteria because we were not going to be looking at providers. Third, our primary purpose, our primary use was viewed a priori as national tracking. So how are we doing, what's the direction we're going, maybe providing some geographic benchmarks, not quality improvement, not pay-for-performance, not public reporting. So again, we didn't have to deal with those aspects of developing a measure or measure set that relate specifically to that, and we don't have criteria necessarily that involve those topics. We also had a number of constraints, annual reporting, using extant data only, and these obviously entered very strongly into our assessment of feasibility.
Our process was perhaps similar to the CHIPRA process but probably also a little bit different. We also had a call for measures. The time period for this was 1999 until 2002, when we really started working on the first set of reports. We had a call for measures and got over 600 nominations from different kinds of organizations, and at that point we had a Federal interagency workgroup work with these measures. This was an internal process, not an external process; again, a little bit different, maybe a little bit more streamlined as a consequence. This group whittled those 600 measures down to a smaller set. That set was then published in a Federal Register notice, which solicited input on this initial measure set, and it was further tweaked. We also had the National Committee on Vital and Health Statistics hold public hearings about this draft measure set to solicit input, and from this we got what we finally came up with.
Another building block that we had available to us, and I don't know if you have something like this available to you, was the IOM framework. A priori, our organization said we're going to use the IOM framework for quality of care, and it pretty much arrays the major dimensions, or domains, of quality of care (effectiveness, safety, timeliness, and patient-centeredness) against patient perceptions of care: staying healthy, getting better, living with illness or disability, and coping with the end of life. So that was the initial framework for the quality report.
The IOM also told us to then take that and look at it for disparities. They never graphed this out to see what it actually looked like, and when we graphed it out, this is kind of what we made of it, and it also made us appreciate how big an undertaking that disparities aspect really is. You take this matrix of all these measures, this square as it were, and now we're going to array it by race. We have more than two racial groups, so it's multiple racial comparisons, multiple ethnic comparisons, multiple socioeconomic comparisons. You get this cube. And so that was our framework. Not necessarily the most elegant thing, but it gives you an appreciation of what work is involved with incorporating a disparities element, something that I think this group wants to do.
In addition, we had to add the dimension of access to care for the disparities report, because you can't really look at disparities without it. Someone made that comment earlier: you can't differentiate between access and quality from a disparities perspective, and so we had to build in this access concept. And then, of course, this is all supported by and driven by health care needs.
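To make the "cube" a bit more concrete, here is a minimal sketch of the kind of stratified comparison each cell implies: one measure, reported by group, compared against a reference group. The group labels, rates, and the use of a simple rate ratio are illustrative assumptions, not figures or methods from the reports.

    # Illustrative disparity comparison for a single quality measure (assumed data).
    reference_group = "reference"
    receipt_rates = {              # hypothetical share of eligible children receiving a service
        "reference": 0.78,
        "group_a": 0.66,
        "group_b": 0.61,
        "low_income": 0.58,
    }

    ref_rate = receipt_rates[reference_group]
    for group, rate in receipt_rates.items():
        if group == reference_group:
            continue
        # A ratio below 1.0 flags a potential disparity relative to the reference group.
        print(f"{group}: rate={rate:.2f}, ratio vs. {reference_group}={rate / ref_rate:.2f}")

Multiplying that single comparison by roughly 200 measures, several comparison groups, and the added access dimension gives a sense of the scale Dr. Moy describes.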
So those are the building blocks that we had, and these are the criteria that we applied. We started off with the IOM's recommendations, their criteria for measure selection: importance, scientific soundness, and feasibility, and I think at least those latter two are the foci of the breakouts for this group. But in addition, because we're the Federal Government, we wanted to be consistent with existing consensus-based measures where possible, and so we relied heavily on other things, both inside and outside the Federal sector: a strong preference for Healthy People 2010, the National Quality Forum (NQF), and other kinds of consensus-based measure sets. And that's pretty much our criteria for measure selection. Our outcome was basically about 150 measures of quality of care, and then, using a similar process, about 50 measures of access to care, and that's what we started off with.
Well, about a year or two after we came out with the first report, we said this is not the most optimal of outcomes. Picking a measure set based purely on the measure selection criteria had a couple of problems. First of all, we had this huge measure set, over 200 measures because people would say well, this measure is as good as that measure so you should include that one as well, and this is much too large really for reporting. It was much too large for communicating this information in any effective manner. Secondly, and I heard the balance term used a lot, it was unbalanced. Picking the measures purely on their individual measure criteria resulted in a very unbalanced measure set, often difficult to interpret. So we would have multiple different measures for a particular topical area, and it's kind of hard to explain to a policymaker what to make of it—there are three up and four down, and so on—they don't want to hear that. They want to hear a single message, so it was very difficult to interpret.
Lastly, this measure set had a large number of measures that weren't applicable to disparities. So we went through this process of looking at the quality of the measure and we wound up with a number of measures for which there was no disparities information, not usable therefore in the disparities report.
So we wanted to rework it, and in reworking these were our goals. One was to get a core measure set that was smaller, and so this was what we ultimately operationally figured out was feasible. We can report on about 50 measures every year and actually, we think, say something reasonable about them, track them every year, and people can know that they're coming up and know what to expect and have some knowledge about that number of measures, not more. We wanted to balance out the measure set, and I'll talk about some of our balancing criteria. We wanted to emphasize understandability and for policymakers, understandability meant summarization and composites. We had a number of expert panels helping us with that process. But that's the direction we went in intentionally. And we wanted to make everything usable in the disparities report, so everything ought to have a disparities analogue.
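The summarization and composites mentioned above lend themselves to a small illustration. Below is a minimal sketch, with hypothetical component measures and rates, of one simple way a composite could be constructed (an unweighted average of component rates); it is only an illustration of the idea, not the method the expert panels used.

```python
# Minimal sketch: roll several related measure rates into one summary number
# for policymakers. Component names and rates are hypothetical.
component_rates = {
    "well-child visits, ages 3-6": 0.72,
    "adolescent well-care visits": 0.48,
    "childhood immunization combination": 0.81,
}

# Unweighted average; real composites may weight components differently.
composite = sum(component_rates.values()) / len(component_rates)
print(f"preventive-care composite: {composite:.1%}")
```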
So, we went to Phase II measure set selection criteria, and again we got our whole Federal interagency workgroup to review our measure set. And we've done this pretty much every year, continually adding to it, refining it. And the criteria we used have naturally grown as our measure set has. So we still use the IOM criteria (importance, scientific soundness, feasibility), we still try to maximize consistency with what others put out there so that people don't get conflicting information, and these are new kinds of criteria that we considered in looking at the measure set as well as individual measures. We focused on issues that were of high utility for directing public policy, and for us that means things where we think there actually is a driver there. We looked at things that potentially were sensitive to change, and that introduced a bias: it favors processes, which are changeable, over outcomes, which are more difficult to change. Ease of interpretation, applicability to the overall population, data collected regularly and recently, ability to do multiple disparities comparisons as opposed to only one or two, and ability to support multivariate modeling. One of the things that came out of our disparities report was that certain constituents really wanted to be able to do multivariate models to isolate the specific racial or socioeconomic disparities effect. So these are some of the new measure criteria we considered in improving our measure set.
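The multivariate modeling that constituents asked for can be illustrated briefly. Below is a minimal sketch, using synthetic data and hypothetical variable names, of estimating a disparity effect on a yes/no quality measure while adjusting for other factors; it is not the analysis used for the reports, only an illustration of the idea.

```python
# Minimal sketch: adjusted logistic model isolating a disparity effect.
# Data are simulated; variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "minority": rng.integers(0, 2, n),     # 1 = member of a minority group
    "low_income": rng.integers(0, 2, n),   # 1 = low-income household
    "age": rng.integers(1, 18, n),         # child's age in years
})
# Simulate a process measure (e.g., received a recommended service).
xb = 0.8 - 0.5 * df["minority"] - 0.4 * df["low_income"] + 0.02 * df["age"]
df["received_service"] = (rng.random(n) < 1 / (1 + np.exp(-xb))).astype(int)

# The 'minority' coefficient is the disparity estimate net of income and age.
model = smf.logit("received_service ~ minority + low_income + age",
                  data=df).fit(disp=False)
print(model.params)
```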
In addition, we had a whole series of balance criteria. So these are not looking at measures one by one but instead stepping back and looking at our whole panel, our entire measure set, and these are some of the things that we've tried to balance across. One, balance across quality domains. When we started out, we were very, very highly focused on effectiveness, but we think we've been able to enrich our safety, timeliness, patient-centeredness, and equity measures over time. Another was that even though process measures are more actionable, we intentionally wanted to balance process and outcome measures because many policymakers are interested in the outcomes, and preferably, if the process and outcomes can be linked, as in the CHIP criteria, that was the best story that could be delivered. We wanted to balance across different kinds of health conditions. We wanted to balance across different sites of care. We intentionally wanted different types of data. The notion here was that every kind of data has its own kind of limitation, its own potential biases, and so we felt most comfortable operationally when we had different kinds of data: administrative data, surveys, and clinical data, all telling the same story. That's when we felt the most confidence about the assessment of quality.
We wanted to include at least some measures that had State data and some measures that allowed for multivariate models. Even after 10 years, there are a couple of issues that are still unresolved for us. One, we still take the approach of starting from a quality measure set and then looking at it across different populations to assess disparities. We did not include specific measures for specific populations. So, for instance, if one population has a very important topical area that isn't applicable to the general population, we don't have measures like that. We also don't have explicit measures of disparity itself. We're always taking quality measures and then looking at differences as our measure of disparity. We know this is suboptimal, but these are simply unresolved issues, even after 10 years. So that's our process, and hopefully it might be helpful to you.
Dr. Dougherty: Thank you. So you had a lot of the same issues we dealt with last summer. And if people want to see the good—actually, it's a very good measure set I think, not to bias your public comments that you're going to send in, but in the back of your briefing book you can see the 24 measures that are actually out for public comment that came out of all that work that Rita described. So now I'm going to ask Helen, are you ready?
Dr. Burstin: I'm on the line.
Dr. Dougherty: Okay, great. I'm going to try to find—this is Helen Burstin who is the Vice President for Performance Measurement, is that still your title, Helen?
Dr. Burstin: Vice President of Performance Measures at NQF, yes.
Dr. Dougherty: At NQF, okay. And she is going to speak through the phone.
Dr. Burstin: Great, thanks so much. My apologies for not being there in person. I can't be split in two today, unfortunately. What I want to do is address some of the areas that Denise asked me to cover in advance. So I'll be talking a bit about the National Quality Forum (NQF) measure evaluation criteria and how we use them, and then specifically about some of the children's health care quality work we're doing currently, including a recent meeting NQF held with the National Initiative for Children's Healthcare Quality (NICHQ), our current child health outcomes project, and then the plans for endorsement that we're discussing for the initial CHIPRA core set, which I'll talk about at the end.
These are the updated NQF evaluation criteria. They were updated about a year ago with a couple of important changes that I'll mention. The first is the importance to measure and report criterion. This is now a must-have criterion. If a measure doesn't make it through this criterion, we actually won't look at the other criteria at all, because we think it's really important to make sure that, for the measures we're putting forward, there's clearly an appropriate level of evidence for the focus of the measure, there's an opportunity for improvement overall or significant variation across providers or regions, and there's a relation to a priority area, one of the national priorities, for example, or a high-impact area of care. We define that quite broadly in terms of cost, morbidity, and mortality for a given population.
The second one really gets into the measurement properties themselves, scientific acceptability, with a specific focus on reliability and validity. I'll give you further information on each of these later. Usability is an important one because you really want to understand whether the intended audiences can use the results of the measure for decisionmaking, to make better decisions. And lastly, feasibility: can we implement the measures without undue burden in terms of paper trails or things like that, and can we move towards capturing those data electronically, in an electronic health record (EHR), moving forward?
So, as I mentioned, the importance to measure and report criterion. We really tried to make the argument that the measure focus has to be important because you want to ensure that it's important enough to expend the resources for measurement and reporting, not just that it's an important broad topic area. I mentioned the issue about the relationship to a specific National Priorities Partnership (NPP) goal or a high-impact area of care. Evidence to support the measure focus and opportunity for improvement is a really important one, and it's often somewhat challenging in terms of having the baseline data available to say what the current level of performance is overall and whether it's an important area to measure. It's fine if there's a reasonably high level of performance, as long as there's significant variation across providers or populations.
Next is scientific acceptability of the measurement properties, as I mentioned. Probably the most important one here is that the measure has to have precise specifications such that those who measure it in different settings are able to get the same answer. Reliability and validity obviously are quite important. We do have an opportunity for measures that otherwise pass all the other NQF endorsement criteria but have not yet undergone adequate testing to come in under a time-limited endorsement status for a year while the measure is tested, to be able to give us further information on reliability and validity, but it's really not endorsement lite. There's an expectation that the measure will otherwise fulfill all the other NQF evaluation criteria. As much as possible we'd like to be able to see whether there's comparability if different data sources are used to collect the information for the measure, and we also want to ensure, particularly given the strong emphasis on disparities, obviously quite relevant to the CHIPRA discussion, that the specifications allow for identification of disparities. If it's an outcome measure, there should be appropriate risk adjustment, or there should be some justification to explain why risk adjustment is not needed. And we're increasingly looking hard at exclusions, with the idea that oftentimes it's the exclusions that are making these measures very difficult to implement if the measure is heavily weighted down with exclusions. We're increasingly asking developers to give us evidence to demonstrate the impact of the exclusion on the overall rates of the measure.
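The request for evidence on an exclusion's impact can be shown in a few lines. The sketch below, with made-up records and hypothetical field names, simply computes a measure rate with and without the exclusion applied and reports how much of the denominator the exclusion removes; it is an illustration of the kind of evidence a developer might present, not NQF's actual procedure.

```python
# Minimal sketch: measure rate with and without an exclusion applied.
# Records and field names are hypothetical.
records = [
    {"met_numerator": True,  "excluded": False},
    {"met_numerator": False, "excluded": False},
    {"met_numerator": False, "excluded": True},   # e.g., a contraindication
    {"met_numerator": True,  "excluded": False},
    {"met_numerator": False, "excluded": False},
]

def rate(recs, apply_exclusions):
    denom = [r for r in recs if not (apply_exclusions and r["excluded"])]
    return sum(r["met_numerator"] for r in denom) / len(denom)

print(f"rate without exclusions: {rate(records, False):.2f}")
print(f"rate with exclusions:    {rate(records, True):.2f}")
print(f"denominator cases removed by exclusions: "
      f"{sum(r['excluded'] for r in records)} of {len(records)}")
```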
Next I mentioned usability overall. Are the results meaningful and usable, understandable to the intended audiences? An important consideration as well from NQF's perspective and obviously very appropriate for the CHIPRA measures that we're thinking about here is that the intent of an NQF-endorsed measure is that it's useful for both public reporting and informing quality improvement, but especially that it is appropriate for public reporting. We don't endorse measures solely for quality improvement.
The last item as well is important. We want to ensure that the measures are harmonized—we don't want a cacophony of measures on similar topics with different specifications—and that the measure provides a distinct or additive value to what is already endorsed.
Next is the feasibility criterion. Again, as I mentioned, this is the extent to which the data are readily available without undue burden. This is particularly important as we move towards getting these data elements out of electronic sources, and we actually are now requiring that people at least provide for us a credible, near-term path for how these data elements can be collected in an electronic system, and we're moving towards specifically documenting which data elements are part of the NQF quality data set. This is work that was supported by AHRQ and that has been very important to thinking about what those core data elements are that need to be in the EHR to allow us to do quality measurement, reporting, and improvement. As we think about this quality data set, we want to get at those standard elements. That might include the code set (for diabetes, for example, the ICD-9 code list with the specific codes) and then the ability to get at a similar language and terminology around quality data types, an active diagnosis for example. We've now done this across the 500 measures within the portfolio, and obviously, as we move towards more children's measures, that quality data set will be alive and well, and we'll add those data elements as needed to make that work.
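As an illustration of the code-list idea in the quality data set, here is a minimal sketch of checking whether a record qualifies for a data element such as an active diabetes diagnosis. The ICD-9 codes shown are an intentionally incomplete, illustrative subset, and the problem-list structure is a hypothetical one, not any particular EHR's format.

```python
# Minimal sketch: a data element ("active diabetes diagnosis") defined by a
# code list that an EHR problem list can be queried against.
DIABETES_ICD9 = {"250.00", "250.01", "250.02"}   # illustrative subset only

def has_active_diagnosis(problem_list, code_list):
    """Return True if any active problem-list entry matches the code list."""
    return any(entry["code"] in code_list and entry["status"] == "active"
               for entry in problem_list)

patient_problems = [
    {"code": "250.00", "status": "active"},
    {"code": "493.90", "status": "resolved"},
]
print(has_active_diagnosis(patient_problems, DIABETES_ICD9))  # True
```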
So, just turning in the last couple of moments to where we are in terms of some current initiatives and work to follow on CHIPRA. We did do a meeting with NICHQ, Charlie Homer's group, in January, and the specific idea there was to apply the NPP framework to the needs of children, to think about how those definitions and categories might need to be interpreted slightly differently to fit the overall framework, and, thinking about Charlie's group and others, how you then build broad engagement around that framework for use by a broad set of stakeholders including States and others. There is a whole set of emerging gaps in child health quality. The report is not completed yet, but just a couple of areas that were highlighted as emerging gaps: patient and family engagement, mental health, care coordination, population health, safety and overuse, and access. And we'd be delighted to share those proceedings with the group as they're available.
In the last couple of moments, just to give you an update of where we are in our child health outcomes project and our work, the add-on work on CHIPRA. To date as you know from prior discussions, NQF has endorsed more than 70 child and perinatal measures, but there is a recognition that there were very few outcome measures for children.
We formed a child health outcomes steering committee within our overall outcomes project, funded by the Department of Health and Human Services (HHS) and ably chaired by Marina Weiss and Charlie Homer. They're going to be meeting in May to review 32 submitted measures, with an expectation of endorsement of those outcomes by November of 2010. There was actually some very good thinking by the committee to really push the envelope on what counts as an outcome for children beyond just the health care setting. Just from the titles of some measures that were submitted, including number of school days missed, children living with illness, safe schools and neighborhoods, and pediatric pain assessment, you can see there's definitely a pushing of the envelope towards a different set of measures.
Lastly, here are the planned CHIPRA projects. We've been working with HHS, specifically the Centers for Medicare & Medicaid Services (CMS), to think about how we might bring in these core measures, as they're finalized after the comment period, for evaluation. We're going to use that same child health steering committee I mentioned, since they're already in place, and begin with a call for measures for this CHIPRA set of measures, likely in July.
One important consideration is that NQF does not just bring in a limited set of measures. We would need to do a call for measures broad enough to bring in the core set of measures but also other measures that could be used to assess care provided to children at the State and programmatic levels. We'll obviously work with HHS to think through how to write that call so we're not deluged, but at the same time we're thinking about the future iterations of the CHIPRA core set moving forward. We plan to have the measures reviewed in September and, going through our process, we hope they would be endorsed by April of 2011.
Again, reaching back to this concept of Health IT: we were specifically asked by HHS to think about how these measures ultimately might be brought to multiple data platforms, including the EHR. We've currently got some work beginning around retooling of measures this year and the development of a measure authoring tool, built on the quality data set, that will allow measure developers to develop measures de novo for EHRs or retool them. We'd obviously be delighted to work with the measure developers to move towards retooling those endorsed measures to get at the EHR. If you have any questions, I'm happy to take them.
Dr. Dougherty: Okay, thank you very much, Helen. So you have a few minutes for questions. And before I go into what we're going to do here today—but you already know this was broad stage-setting, except for Rita's talking about some specification issues and so forth—do you have any questions or comments for Helen, Rita Mangione-Smith, or Ernest? Rita, you can come up here if you want to use the mic. There's also another pediatric quality measure activity going on. Congress under the CHIPRA legislation called for an Institute of Medicine (IOM) study of pediatric health measures, health status measures, and health quality measures. So they're also looking at this issue. They'll be done by July 2011, and by the end of 2011, we'll have lots of information. Hopefully we can put it all together. Yes, Jerod.
Dr. Loeb: Denise, I'm Jerod Loeb from The Joint Commission. I guess I'm getting increasingly worried as I listen to the discussion here and at many other tables about a variety of trains operating on a variety of tracks, all going at different speeds, all of which appear to be crashing at some point. There are measure prioritization activities underway every place I go, and everybody is doing their own little siloed measure prioritization. I guess my question is how do we get these trains on one track? This is—it almost defies logic.
Helen, you've heard me say this a gazillion times already I know, but this is, you know, it's legislatively driven in some respects, it's driven at the Federal and the State level, it's driven by payers, it's driven by entities like my own and others. I don't know how we take that cacophony that Helen just talked about and truly harmonize it, not in words only, but really make it happen.
Dr. Burstin: Just one brief response Jerod, and I agree with you it's really important to get a handle on this. That was part of the idea of asking NICHQ to kind of take on this role of bringing together the stakeholders to try to at least make some sense on the child health side. But I agree the more we can stay coordinated moving forward, which is why I specifically wanted to bring up some of the measures already in the pipeline that might be used for consideration as well as there's this new effort to bring in new measures.
Dr. Dougherty: Yes. I mean, the one good thing is that CMS and AHRQ and HHS more broadly are all involved in this and keeping each other posted, and hopefully we will be talking to each other. I mean, we do have—the CHIPRA legislation did require the Secretary of HHS to have a multi-stakeholder activity, and we interpreted that to mean a highly publicly transparent process, to identify the initial core set of measures, put it out for public comment, and then to have this next phase of enhancing, improving the measures that we already have, and developing new measures where such are required. I know CMS, maybe Barbara can speak to this, is really trying to figure out how do we keep all these trains moving on the same track. And when we do our pediatric quality measurement program that will be in coordination with the CMS quality demos and also we'll have a coordinating center that will have public engagement in what the awardees do. And the last stage is this improved consistent set of core measures by January 1, 2013, which as I said will take not only the pediatric quality measures program into account, but the CMS demos measures and every other measure that's out there in order to come up with this improved core set.
As for priorities, we are also asked/required in the CHIPRA legislation to have the Secretary come up with a set of priorities using multi-stakeholder engagement. We have not put out a Federal Register notice on that yet because we figured we had enough from the first process we had, the SNAC [National Advisory Council on Healthcare Research and Quality Subcommittee] process, to have some priorities, fleshing that out with some CMS priorities in terms of high prevalence and so forth. So as you said, some is driven legislatively, some is driven in other ways.
Dr. Loeb: To be clear Denise, this problem isn't unique to the peds population. This problem is absolutely ubiquitous.
Dr. Dougherty: Yes, right. Anybody else? Please say your name.
Dr. Romano: This is Patrick Romano from UC-Davis in Sacramento, CA. I'm also a contractor to AHRQ. I'm going to pick up on Jerod's point because I share a little bit of concern that this process, as well-designed and as conscientious as it is, may be reinventing the wheel a bit. And NQF is the organization that is sort of the official congressionally recognized multi-stakeholder organization for endorsement of health care quality standards and measures, and they've thought quite a bit about child health care quality measures. I've been involved in a couple of committees with NQF that have looked at child health care quality measures. Obviously they're heavily involved in this CHIPRA process as well. So I'm wondering if people could comment a little bit more on why there's a need for a separate process here to create a separate set of priorities other than what NQF has already enunciated, and how will we make sure that we're reconciled with what NQF is doing?
Dr. Dougherty: Okay. And that was going to be in my next presentation, so I can go through that more quickly, but what we found when we asked people to send in their measure criteria is that all the basics are there, validity, feasibility. We decided not to do importance because of the separate effort on priority-setting. All the basics are there. What you can't really tell from that is if somebody is starting to develop or enhance a measure, you can't give them something they can use prospectively in enough detail that they will be able to pass that NQF test. I mean, there are a lot of—I know there's lots of call for documentation by NQF, by National Committee for Quality Assurance (NCQA), by the American Medical Association (AMA). Everybody has the same kind of big criteria domains, right? And then they have varying levels of what you need for documentation and so forth. But the decisionmaking process once you submit your measures is less than quantitative and transparent, and I think if we're going to say to awardees you have to use these criteria and at the end we're going to be able to tell Congress these awardees for $60 million or whatever have used a standard set of criteria, and here are our new measures or our better—improved measures, we don't have a way right now to do that except, everybody can interpret. I don't know how many awardees we'll have. Everybody could interpret what you need to demonstrate underlying scientific soundness by themselves, in which case we'd have even more chaos at the end because we wouldn't have a consistent set of measures to give to the States.
Now, whether the States use those consistent measures, tweak them or whatever, the legislation did not speak to that. But we need to be able to give guidance to the people who are going to be taking this process forward on behalf of HHS and the Congress. Does that make any sense? Okay, thanks. And I don't want to diss anybody's measures or, I mean, I think it's been fabulous. Everybody is coming together on what the domains are, and there's lots of commonality across everybody. It's just prospectively, I think prospectively this is beginning to happen now because now people will say okay, I need to pass the NQF endorsement test so I need to look at what their documentation is so I can develop my measures to meet those. But I think we still need to go a step further at least for people who are going to be grantees or contractors and tell them specifically what we mean by criteria. And that's the purpose of this exercise. I'm glad you asked the question. Yes?
Dr. Burstin: Just one small comment. You made the point that oftentimes the transparency is limited. I just want to point out that actually everything we do is completely transparent, I mean down to the transcripts of the steering committee meetings on our Web site. So that should not be a concern. I certainly understand some of the need to give grantees additional guidance, but in terms of what makes a measure at the end of the day a standard that could be endorsed and used as a national consensus voluntary standard, I mean that should be a set of criteria that I think has been already held up to the test of time here.
Dr. Dougherty: Yes, that is very true, and I didn't realize those transcripts were on the Web site. But I think in addition, CHIPRA had some additional—there's overlap between what CHIPRA is calling for and the generic set of criteria domains and criteria. So CHIPRA wants these measures to be applicable across all public and private payers, public programs, patients, and providers, right? Consistent across all those. Well, we don't have criteria for developing a measure that looks like that yet. Also, to be able to identify racial and ethnic disparities in quality—we really don't have prospective criteria to tell people how to develop or enhance a measure so it does that. Or to identify children with special health care needs and disparities between children with special health care needs and other children. So that's another reason for this kind of meeting and exercise. Does anybody want to add anything? Now I don't have to do my other presentation.
Dr. Savitz: Actually, this is Lucy Savitz from Intermountain. Could I just comment on two things? First of all, thank you for letting me join by phone. My two concerns, and I hear this sort of scattered throughout the comments and also in some of the materials, are the desire to draw on measures that can be constructed from EHRs and the importance of looking at health disparities. I'm just concerned that there's such low uptake of EHRs, and such limited flexibility in those records, that I think many people don't appreciate that they may not be able to draw down the data that would be necessary to construct measures. So I think that's an important thing to keep in mind as we go forward. And the other thing is the absolutely poor data that exist, and in many cases don't exist at all, on race and ethnicity. And I don't know whether or not there's consideration about requiring that those data elements be collected as we move forward, or standardizing that in some way.
Dr. Dougherty: Yes, thank you for bringing that up, Lucy. And yes, that's on everybody's mind here I think. We are going to be talking in-depth about the feasibility issues, especially with relation to the American Recovery and Reinvestment Act (ARRA) regulations and also about how we move forward on identifying those racial and ethnic and special health care needs groups, given the data that we have. And maybe—what Ernest said is about selecting some measures that are very important to specific populations may be a way to go at that. But we'll be talking about those tomorrow. Luckily you're on the phone, so you don't have an issue of whether you have to fly home tonight. I think what Vicki Wachino said this morning, which was very heartening, was let's not let the technology drive the quality measurement and improvement. Let's think about what the quality is first and then figure out how we get the technology to do the right thing and help us out with that. So does that help, Lucy?
Dr. Savitz: Yes, yes it does. I just wanted to be sure that people were clear because even at Intermountain we have problems on both of those issues.
Dr. Dougherty: Okay, any other comments, questions? Helen, are you still there?
Dr. Burstin: Yes. The only other comment I would make is that while I agree that the EHR piece is aspirational, I think it's important to at least get the measure developers to begin thinking about what the key data elements are that would be required to make that measure work in an EHR environment. And just to add a piece to that, we actually did explicitly include race/ethnicity and language within the quality data set as core data elements, with clarity on exactly how those should be built in and be part of the quality data set.
Dr. Dougherty: Okay, thank you very much, Helen. Now, we're mixing the agenda up a little bit. Now that you've heard all the big-picture stuff that's going on, let's talk about what we are asking you to do for the rest of the day and for tomorrow. This is just a little summary of what we asked our measure criteria entities and all of you to do during the pre-work. As you know, we're working toward an improved core set, which means we're going to try to make it as consistent as possible going forward with the CMS demos and the pediatric quality measurement program.
Why are we looking at measure criteria? I think we've gone over that already. We learned one lesson from Phase I: it's better to have criteria up front than to try to retrofit. Even though we weren't trying to develop new measures, it's still better to have the criteria up front. The users are going to be these awardees, and others can certainly use the criteria we come up with. So our goals today: we're going to be handing you some templates and asking you to suggest changes to any existing measure criteria, very much with a focus on choosing the most important change, addition, deletion, or piece of documentation that's needed, writing that up, and giving that back to us. We know we can't cover every criterion, domain, and subdomain today.
Then, since we won't be able to cover everything, we'll be asking for your next steps about how to move forward on this process of getting more specific and predictable and usable criteria for prospective uses. And just let me say, we know that this is an issue—and it's not just an issue for quality measurement. We have had, you know, RCTs, randomized controlled trials, at the top of the hierarchy of evidence as if they were the most terrific thing since sliced bread. We also have multiple sets of guidance for people who are going to be doing RCTs, who are going to be publishing RCTs about what you need to report, exactly how you need to do the randomization, exactly how you need to do your sample selection, how you need to do the analysis—so this is an ongoing scientific process across all of health care and medical science. So the quality measurement folks are part of that now. I think it's important to be, as I said, transparent, predictable, and as rigorous as possible so that we can share our information.
So what did we do first? We asked leading holders of measurement criteria who are using the criteria for either endorsing or developing health care quality measures either generically or for child health to share their measure criteria. We then got a lot of different documents, and we tried to put them into a common spreadsheet by criteria domain. That was not an easy process as I'll mention later. And then we sent the section of the draft spreadsheet to each measure criteria holder and asked them to look at whether we had it right. But we still probably don't have everything right because in a lot of cases, there is no right answer.
Then most recently, in the last couple of weeks, we asked you, after we had looked at all these criteria, to use an Internet response method which we didn't quite have figured out, I mean I didn't quite have it figured out. So we sent you four questions, and some people said I can't possibly answer these last two items, and they were absolutely correct for Items 3 and 4.
So what did we ask? And the survey monkey questionnaire—it's not really a survey, it's a bunch of questions—it's number four or six. Number four, okay, so we asked what general approach should be used in coming up with criteria and using the criteria. We asked about two aspects of validity. One was underlying scientific soundness, and there we used the Oxford Center for Evidence-based Medicine/U.S. Preventive Services Task Force criteria and the SNAC modified Delphi criteria that Rita talked about, and then we had a category of "other." I have a "do not enter" sign there for the validity of the measure itself. Those were supposed to be open-ended questions, not quite articulated clearly enough, so we didn't get much useful information there, and that's not anybody's fault but mine. Feasibility was also open-ended, and that didn't work out too well. The CHIPRA criteria question was open-ended, and we did get some interesting information there.
Okay, here are the results from the measure criteria sets. Criteria domains, as I said before, we found people—entities use very common domains, validity, two aspects, underlying scientific soundness which we may now add to—I think Helen put it under importance which may be the way to go because it's, you know, why have a quality measure for something, some service that's not evidence-based and going to make a difference? So that's a possibility.
So then the other one is the validity of the measure itself, which has a lot of aspects to it. Reliability we included with validity, as most people do, and feasibility, as I said, we're going to talk about tomorrow; we got their information on that. We excluded importance for the reasons I said before. We included understandability in our request for criteria and our analysis of criteria, but when we looked at it for today's agenda, there wasn't much variation. People basically say understandability means the measure is understandable by the user, so I didn't think we should spend our time drilling down on that topic. Certainly people have. We also added the CHIPRA domains, which are child-specific, which we interpreted as focusing on the multitude of care settings and financing approaches that are used for children's health care and also the ability to identify disparities.
As to the results, we had 12 entities participating in full. For this meeting, we're focusing on eight of those, the National Quality Forum, National Committee for Quality Assurance, the AMA Physician Consortium for Performance Improvement (PCPI), the AHRQ Pediatric Quality Indicators, the National Quality Measures Clearinghouse, The Joint Commission, the National Association of Children's Hospitals and Related Institutions (NACHRI), and the AAP Steering Committee on Quality Improvement and Management (SCOQIM). Those last two are child-specific, along with the pediatric quality indicators. We thank these folks who gave us their information.
Here are some general observations, and you've heard these already. A rose is not a rose is not a rose: terms are not used consistently, which makes the spreadsheet hard to scan across and say, okay, when they say underlying scientific soundness, here's exactly what it means across all the different measurement entities. That is, again, not surprising. That's science; you'd think everything would be organized, but it's definitely not. We still have discussions of what evidence-based medicine means. So the bottom line is that scanning is not easy, but lots of wonderful work and collaboration have been done, and there's enough similarity overall that we can understand the state of the art. However, work, as I've already said millions of times, is still needed: what documentation is required (there's not consistency about that) and how the criteria and documentation are actually used in practice. Does one confidence interval get you into the endorsement and another confidence interval not get you in? We don't have that level of specificity yet. Work is still needed on the CHIPRA-specific criteria domains, as people have mentioned.
IOM actually did a report for CMS focused on Medicare performance measures and devoted a whole chapter to saying we really need a research agenda on performance measurement development. So this is our chance to do that for children's health care quality measures and not just a research agenda for the sake of having the most wonderful psychometric properties of measures, but actually to do something useful. Our tentative conclusion was that it would be difficult to give awardees criteria in specific enough detail to arrive at a standardized core set by January 1, 2013. So that's why you're all here.
Okay, the survey monkey results. The general approach, the top choice, and again, not everybody filled this in, is the pre-specified quantitative cutoffs for criteria to the extent possible with calculations publicly reported. So that was the one with the blue. The blue columns are strongly agree, the purple columns are agree. So the strongest disagreement was about choice one, which was evidence-informed consensus without the details publicly reported. So I think we're headed in the right direction, or at least the people we invited to this meeting think just like we were thinking initially.
Dr. Miller: This is Marlene. I have to say I at least had a very hard time completing that survey and ended up not completing it because it was unclear what was being asked.
Dr. Dougherty: Okay. Yes, I admit to that, and some people had difficulty because they felt like they didn't have the expertise. Other people had difficulty because the questions were unclear, and so this is an N of about 12 to 14 or 15 filled this out. We can discuss later whether you disagree with this general approach.
Dr. Miller: No, I think it was just that each question was actually 10 questions embedded into one. So when I hear you say oh, everyone sort of agrees on it, I can at least say I didn't even know what was being asked because a lot of questions were bundled into one.
Dr. Dougherty: Yes, well this particular one though was I think pretty clear. I mean, first what we asked people to do was pick their topic since we're going to break out into breakout groups today by topic because you can't possibly discuss medical home and inpatient measures and health outcome measures in the same discussion when you're talking about validity. I think you could, but we thought it was difficult. It would be better to make it more concrete. So the first thing we did was ask people to select the topic that they were responding on, but most people actually said all topics. Most people who responded said they were talking about all the topics. So I haven't included that here. I think this one was clear, and I can get you the responses on this one. The differences aren't huge, there's no statistical significance here, it just gives an idea that I think some people who responded think we might want to have some more specific and transparent information, but not everybody certainly, and probably not for every purpose.
So this other one was choose your topic and then tell us which—this one I think also was pretty clear because it only had a couple of options. For the topic you've chosen, indicate the extent to which specific measure criteria should be required for assessing whether a proposed measure has sufficient underlying scientific soundness. And choice one was the Oxford Center for Evidence-based Medicine hierarchy or the U.S. Preventive Services Task Force, and again, low response rate, but that got the strongest ratings for strongly agree and agree. But there is, as you can see, considerable disagreement for the options of disagree and strongly disagree. There were quite a few people who said that neither of those should be used.
Now we're skipping the validity of the measure itself and the feasibility, and going on to the CHIPRA domains. And here we asked—we said here are the CHIPRA requirements for criteria: evidence-based, understandable, children with special health care needs, children in general, racial and ethnic disparities, socioeconomic disparities. Is there any factor that you think we need to take into consideration while we're developing criteria? And here are the results. We'll circulate these when we get a chance.
People said the measures must be actionable, there has to be concern about the extent of the quality problem versus special interests—I'm not sure what that means. Shouldn't be cost-prohibitive, rely on existing data sources, be able to do subgroup analyses within States, evidence-based is a factor, and there are small samples for some conditions. Evidence-based means we don't have a whole lot of evidence for a lot of pediatric health conditions and their services. Here are some others on the race, ethnicity, and gender issues, concerns that Lucy raised in her comment before. Race and ethnicity are very hard to get a handle on, there are a lot of unknowns—what do you do with those unknowns? And then for States with extremely small numbers for some racial and ethnic groups, when should those results not be reported? What's too small? We need to think about that. And then again the stratification and risk adjustment issues, the importance of socioeconomic and cultural factors, and then the applicability of measures to those socioeconomically and culturally diverse populations. Risk adjustment is needed. This is on the children with special health care needs. We need risk adjustment somebody said, and then we need a common definition for children with special health care needs that is used across all of the measures and disparities assessments.
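On the question raised here of when subgroup results are too small to report, one common approach is to suppress any rate whose denominator falls below a pre-specified cutoff. The sketch below is a minimal illustration of that idea; the cutoff, group names, and counts are all hypothetical, not a recommendation of a specific threshold.

```python
# Minimal sketch: suppress subgroup rates with denominators below a cutoff.
MIN_DENOMINATOR = 30   # hypothetical cutoff; the group would need to agree on one

subgroup_counts = {
    "White":           {"numerator": 410, "denominator": 520},
    "Black":           {"numerator": 88,  "denominator": 130},
    "American Indian": {"numerator": 9,   "denominator": 14},   # too small
    "Unknown":         {"numerator": 35,  "denominator": 60},
}

for group, c in subgroup_counts.items():
    if c["denominator"] < MIN_DENOMINATOR:
        print(f"{group}: suppressed (n={c['denominator']})")
    else:
        print(f"{group}: {c['numerator'] / c['denominator']:.1%} "
              f"(n={c['denominator']})")
```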
So here are my tentative conclusions. People can disagree. This is a small sample size, a bad survey you might say. More transparency and rigor are desirable. There was interest in all the measure topics combined, which may put a little wrinkle in getting you to go into specific topic groups, and there are limitations. So, we weren't able to get you to actually focus on what the specifics of the criteria should be because of the not-so-great survey.
Here's what we're going to ask you to do. Taking all that in mind or not—yes, Sarah. Can you say your name for the record?
Dr. Scholle: Sarah Scholle. So I've been struggling with this morning's discussion because the National Quality Forum (NQF) endorsement criteria I think are very clear, and that's something that the National Committee for Quality Assurance (NCQA)—we looked—those criteria are very similar to the criteria that our committee on performance measurement uses to determine what measures get into the Healthcare Effectiveness Data and Information Set (HEDIS), and I know the American Medical Association (AMA) Physician Consortium for Performance Improvement (PCPI) has a set of criteria that are very similar, so we've all been trying to work towards the same set of criteria. And when I see some of the topics that you've listed here, to me they're not criteria for a measure as much as information about the measure that you'd like to have that would determine whether the measure would be useful in reporting on CHIPRA—for CHIPRA under the CHIPRA rules. So for example, the question about race/ethnicity—having information about race and ethnicity, you could ask, as part of your importance criteria you could say is this a problem more in a minority population, or you could ask people to discuss that as part of setting up the importance of the measure. But you could also say to the grantees in the States that are developing measures and testing measures that we want to know what it looks like when you use this measure in different populations. And you want to do that because you want to be able to set sample sizes, and you want to set—you want to know something about prevalence of the problem and how well this measure is going to work in those different populations. But whether that measure is actually useful, you know NQF endorses measures for use in a lot of different settings I think, and so it would be up to different settings to say that's good. So are we advising AHRQ and the Centers for Medicare & Medicaid Services (CMS) about what—how a measure should perform in different populations, and what information should be collected so that you can gauge whether this is going to really be useful to you in evaluating the care for minority populations as well as for children with special health care needs?
Dr. Dougherty: Well, I think what we're trying to do is have some consistency. So—among these awardees—so that every awardee for every measure does not do something, specify something differently for dealing with unknowns, or for knowing what are the quantitative criteria that should be used to say whether a particular State or health plan or whatever has just such a small population of African-American children that you should disregard those data. We're trying to set some bars for consistency because without that, when CMS gets the data from States using these, hopefully the initial core measure set and the improved core measure set, right now they have no way to know, even if they're not comparing States, they have no way to know whether one State is actually doing—collecting the data in such a way that it's valid. Is that reasonable to say? Maybe somebody else can help me here who's more of a measurement expert. So we want to be able to give guidance to the people.
Now, maybe for race/ethnicity it's not so much a measure development issue as a data collection issue, but that's important. We want some specifics to say, you know, here's the best we can tell you about collecting data, and how you collect data across four particular racial and ethnic groups. And I think Patrick Romano may be able to help us. The Healthcare Cost and Utilization Project (HCUP), with the State inpatient data system, has done that a lot already. I almost took inpatient measures and disparities off. So there are only, you know, 22 States where, when they look at how the data are actually being collected and reported, they feel comfortable including the data in a national estimate, say, in the National Healthcare Quality or Disparities Report in this case. And they've done a lot of work on that, and I would imagine that the State health data organizations and the hospitals hopefully know what those criteria are ahead of time. So if they want to participate and know what their disparities rate is on a particular quality issue for inpatient care, they know that they need to collect the data the same way that the other States are doing it. Does that make sense? Maybe somebody else can explain it better.
Dr. Scholle: So I'm still struggling with this. I don't know if it's taking us off track, but in order to evaluate the measures that could be used as the core set in 2013, you need to have consistent information on each of the measures, so you want to have—you want it to be in a structured format so you can just go down the list and say here's the importance and you need what those criteria are.
Dr. Dougherty: Not importance. That's different.
Dr. Scholle: Well—so if we're getting into the issues of feasibility and rating of disparities and sample sizes, then it's going to be important—at what level of reporting should we have in our mind as we're thinking about these criteria? Are we thinking about it at the State level, evaluating State Medicaid programs, State CHIPRA programs, or are we evaluating it at a practice or physician organization level or an individual clinician level? Because you'd have to have separate criteria for risk adjustment if you were going to do—I mean, we want to consider whether risk adjustment would be different for population versus for an individual clinician because we know there's a lot of selection bias. So that's where—if you could say how these criteria would be used—this is information that you want every measure developer to provide on each measure, and the measures are going to be used to evaluate Medicaid programs, State Medicaid programs—at the State level?
Dr. Dougherty: Well, I mean, Congress in its wisdom wanted measures and criteria that would go vertically, right? You could collect it at the—and it's the electronic health record (EHR) idea, right? You collect it once and Beth, you've certainly talked about this, maybe you can help out here. You collect it once, then you roll it up, and then you roll it up, and then you roll it up again; that's I think the goal. We're not there yet, but I think CMS is still struggling with that question. Certainly the State Medicaid programs may want to compare their health plans, and the health plans will want to compare the providers they hire, and maybe someday Congress or somebody will want to compare across States. Even if for just confidentiality and for the purpose of saying where do we need to provide more technical assistance, more quality improvement money, you know, even if it's not a publicly reported thing, which the legislation says it should be eventually. So I think it's not possible to answer that question right now.
Ms. Dailey: I think that's one of the unknowns. What we have struggled with is that States have had various experiences in terms of the majority being in managed care. When we've worked with CHIP we saw similar challenges in terms of whether they are going to be collecting information that is patient-centered versus practice-oriented, or whether they have a fee-for-service program; we have almost no information in that kind of a delivery system. So ultimately I think what we're looking for are measures that can go across delivery systems, and ultimately it would be patient-centered because we want to evaluate health outcomes, but I think that's more aspirational, and that's where the EHR comes in as a venue for collecting information; again, being aspirational, that's what we want to reach.
So our struggle now is States are going to be required to submit information to CMS. In order for them to get their information, how are these measures going to be structured so they can collect information from different types of providers? And so that is our struggle in terms of trying to give you guidance. States are going to have their collection methodologies, and then they have to report to us, ultimately we're hoping in a comparative way, but again, that's aspirational. We're not anywhere near at that point. So when we have to give technical assistance to States and tell them what kind of specifications they need to use to collect information, that's where we're really struggling in terms of, okay, what kind of criteria do we give those States?
Dr. Dougherty: Right now, unless it's a structural measure of certain types, I think most of the data come from the individual provider level. So no, that may be—if you're looking for a level here, and people disagree with me—yes.
Ms. McColm: I'm Denni McColm, I'm with Citizens Memorial, and just looking at the 24 core measures, it's like a mix. I see the confusion. It's a mix of things that the provider wouldn't have to report, that would have to be at the State level, like the whole population of patients who had an ED visit. So those are two different things.
Dr. Scholle: Most of the measures, 14 out of the 24, are HEDIS measures, and those are health plan population measures that could be used for other populations, but they largely come from claims data or claims augmented by chart review.
Dr. Dougherty: And so that means in a sense yes, they're coming from individual providers' claims? No. Okay. Yes?
Dr. Scholle: They're not intended to represent an individual provider. They are intended to represent the populations served who are members of that health plan.
Ms. McColm: If a patient doesn't have a visit with their pediatrician, what provider are you going to allocate that one to?
Dr. McIntyre: And I think we did talk about this as far as the discussion about accountability: was the State accountable, or were we talking about the provider? I think we kind of ended up saying we were really looking at State-level accountability because, even though we pulled some of these measures from information related to a provider, ultimately at this point we were trying to figure out what to do from a State-entity standpoint and how to roll it up to the national side. So even if a State ended up using one of the measures to look at providers, that was not how we ended up designing it or saying that we wanted to put a particular measure in there. From my standpoint, what I've been trying to explain, because I've had a lot of people asking, well, what are you trying to do with the CHIPRA measures, is that we're looking not just at Medicaid but also at our private insurers and other groups, because we already had them pulled together, to say that we would ultimately like to get some consistency in what we're looking at when it comes down to children's health care quality and together work to get some improvement. So it's not so much what's happening at the provider level, but what's happening in the State when it comes down to health care quality.
Dr. Dougherty: But there is some trickle-down. If the State has to collect things one way, then they will have the health plans and the fee-for-service providers collect that data the same way.
Ms. Dailey: And of course we have the churning issue with the children in and out, across programs, between private, between Medicaid and CHIP, and that is why we've been struggling in terms of how do we set up the specifications that apply to not only different providers but different delivery systems if a child is in transition.
Dr. Dougherty: And that's high on the priority list for next time, but it's not on—we're not breaking out to figure out how to do that today.
Dr. Brown: Question? Hi, this is Julie Brown from RAND. You keep talking about providing specifications so that these higher, aggregated entities can hold these lower, more granular entities accountable. And when you say "specifications," do you really mean guidance, instructions, specifically this is what you must collect and how you must collect it?
Dr. Dougherty: Well, I think you could probably give us an example from the CAHPS work, right?
Dr. Brown: I guess I'm kind of with—I think I'm with Sarah on this whereas where I'm struggling is I'm trying to imagine how would you pull a State-level measure, maybe it's, I don't know, immunizations delivered within pediatric practices over the prior 12 months. How would you do that without conducting chart review, and who would conduct the chart review, and what are the entities that they would, you know, would they visit each individual practice and how would you make sure everybody is charting it the same too?
Dr. Dougherty: Well, I think NCQA has specifications or guidance on that issue.
Dr. Brown: Well, that's the point. When you say "specifications" you're pretty much saying collect this information the way NCQA requires health plans to collect and report this information.
Dr. Dougherty: Except that here we're not starting from the basis of NCQA or CAHPS or whatever. We're saying, okay, in the best of all possible worlds if we wanted to collect the data across all providers or health plans what would be the guidance that we would give. So specifications—I think Rita, can you—is "guidance" another word for "specifications?"
Dr. Mangione-Smith: I mean, I understand the discomfort here because I think, you know, I have certainly felt that also, like, what exactly do we mean when we say there need to be good, detailed specifications? It's different when it's for a health plan versus something like a QA Tools measure; the way we specify one of those is vastly different from the way we would specify a HEDIS measure. So I think it is a little bit daunting for us to decide on consistent criteria when we don't know the level at which we want measures to be developed. What's the unit of analysis? Is it States, is it health plans, is it providers, you know? And I think that makes it very hard with these specifications.
Dr. Brown: It's like you're being asked to weigh something that could be weighed in grams, or in tens of thousands of pounds, and the specification you give for measuring that is really going to vary if you're measuring a mouse or an elephant.
Dr. Scholle: I wonder if what we—because what I'm hearing is that you're very interested in State-level accountability across the delivery system. So that's one level of interest. And then the other thing that I'm reading between the lines is that EHRs are important and that somehow the data are going to be there. So for convenience's sake, maybe what we need to do is say: if you're doing it at the State accountability level, what would be the issues that are important? If you're doing it through EHRs, what would be the issues that are important at a provider level, because the EHR reporting would be sort of at the physician-organization level? I think they're going to be different, and that might help to organize our thinking as we go through this. I'm looking to my colleagues who've worked with us, because that's what we've sort of set up, that we're going to be doing specs at both of those levels.
Dr. Dougherty: And I think to get us moving forward that would be a good way to go.
Ms. Fei: My name's Kerri Fei, and I'm from the AMA PCPI, and we've worked closely with NCQA on some measures. Traditionally we have done provider-level measurement, so the majority of the measures we have developed, like those for the Physician Quality Reporting Initiative (PQRI), are at that level. One concern that I have, maybe you can clear it up for me. It seems like when we're talking about these criteria, as a measure developer would there maybe be a separate set of criteria we would need to meet for a pediatric measure versus the criteria we already have set for ourselves based on NQF? Because I mean, we follow NQF pretty much to the letter. So in developing a measure, if we meet the NQF criteria, is there going to be some separate criteria that come out of here that we would also have to meet? Because that would be concerning.
Dr. Dougherty: Well I mean, NQF doesn't have criteria for how you collect the data or design a measure so that it could be applied across, say, the primary care provider, the managed behavioral health plan, and the public mental health system. And for a lot of kids that's an issue. Right now those other settings are pretty much excluded. So, for States, and States correct me if I'm wrong, Medicaid kids are going to a lot of different places for care. And CHIPRA does not say, you know, use the NQF criteria, though certainly there's lots and lots of overlap. But most of the measurement development work in this country has focused on the Medicare population or slightly younger adults. There are similar issues for some of the elderly that we haven't grappled with, like the transition between long-term care and home health care, which are similar to some of these kids' issues, but we certainly haven't grappled at a detailed level with the issues of kids and their fragmented health system. Jeff?
Dr. Thompson: Well, I just wanted to comment on that. I mean, as States, we're just in pandemonium right now. And I think you developed these measures at a time when there was a little bit more stability, but I don't know of a State that isn't sort of waxing or waning in not only benefits, but eligibility. So I'd really like you to sort of think about that. And unless health care reform comes around, we're probably looking at, you know, another 3 years of instability at the State level. And so for your idea of sort of aspirational, I think we've got to tone it down. Because I can tell you, from a chart review perspective, we're cutting provider rates, and then we're going to ask them to do chart reviews at no extra payment? It's just—it's not going to happen.
Dr. Dougherty: I mean, that's in part why it's voluntary and in part why the CHIPRA legislation said CMS shall provide technical assistance to the States. At the same time we realize this isn't -
Dr. Thompson:—the financial system. We're barely keeping afloat. So I just think we've got to, you know, sort of tone down a little bit from a State perspective. It's just unbelievable what's going on at the State level.
Dr. Dougherty: Yes, I think you're right, and this is definitely an evolutionary process. We are not going to answer every question and have specifics at the end of today. We want to make some progress so that when people are developing measures or enhancing the measures we have now for the future that they have more guidance than they have now.
Dr. Thompson: And I'm not going to say stop, but I'm saying think about things. I know we can't talk about this, but you know, eligibility for the denominator. You know, 6 or more months of stability in fee-for-service or managed care or whatever would help out, rather than a denominator where you're trying to figure out whether they switched three or four times during a year. Or even a year of eligibility in one plan or the other would make it a little bit more doable. But if you start cutting it too fine I don't think you're going to get the answers you want at CMS.
Dr. Dougherty: Okay. But let's do what we can, for a brighter future at some point. Yes, Mary. Then we're going to break for lunch.
Dr. McIntyre: And I hate to bring this up. I just wanted to give a concrete example, and this is Mary McIntyre, Alabama Medicaid, of trying to take the measures that currently exist. We looked at the immunizations, and I'm just going to do the 2-year immunization, okay? In there it talks about on or before the child's second birthday, and you're looking at the individual measures and then there are the combo measures; you've actually got combo two and combo three. Well, when we go in and look, because we're looking specifically at Medicaid data, you're looking at the continuous eligibility requirements as the specs identify them, and allowing for a gap and what that gap is, so that basically you end up with a 30-day gap that we can identify and look at, which means 11 months of eligibility. Well, when we do all of that and I roll it up, and I actually wrote the numbers down because I couldn't believe the results, we ended up with 3 percent for the combo three, okay? A lot of that deals with the on-or-before-the-second-birthday requirement: some of those kids get the immunizations, but it's 30 days after, 60 days after. And then there's the fact that we do not have the information for any kind of gap, so the population number of 2-year-olds that you start with is not the number you end up with that's actually in that timeframe; there is some drop-off. So those are things that need to be considered as far as the specifications and the population that you're looking at. And we've done that with several of the other measures just to see what we could get if we stuck strictly to the specs. So there's modification that really needs to happen in order to make them usable for the population that we're dealing with.
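A minimal sketch of the arithmetic Dr. McIntyre describes, assuming hypothetical record fields, a 30-day allowable enrollment gap, and illustrative antigen counts; it is not the NCQA/HEDIS specification she is working from, only an illustration of how a combo rate with continuous-eligibility rules might be computed:

```python
from datetime import timedelta

# Illustrative only: field names, the 30-day allowable gap, and the antigen
# counts are assumptions for this sketch, not the NCQA/HEDIS specification.

def continuously_enrolled(segments, start, end, max_gap_days=30):
    """True if enrollment covers [start, end] with at most one gap of no
    more than max_gap_days. segments is a list of (seg_start, seg_end) dates."""
    covering = sorted(s for s in segments if s[1] >= start and s[0] <= end)
    if not covering or covering[0][0] > start or covering[-1][1] < end:
        return False
    gaps = 0
    for (_, prev_end), (next_start, _) in zip(covering, covering[1:]):
        gap_days = (next_start - prev_end).days - 1
        if gap_days > 0:
            gaps += 1
            if gaps > 1 or gap_days > max_gap_days:
                return False
    return True

def combo_met(shots, required, cutoff):
    """True if every antigen in `required` (e.g. {"DTaP": 4, "IPV": 3}) has
    enough doses on or before `cutoff`. shots is a list of (antigen, date)."""
    return all(
        sum(1 for antigen_name, given in shots
            if antigen_name == antigen and given <= cutoff) >= count
        for antigen, count in required.items()
    )

def combo_rate(children, required):
    """Denominator: children turning 2 with 12 months of nearly continuous
    enrollment before the second birthday (Feb 29 birthdays ignored for brevity)."""
    denominator = numerator = 0
    for child in children:
        second_bday = child["dob"].replace(year=child["dob"].year + 2)
        start = second_bday - timedelta(days=365)
        if continuously_enrolled(child["enrollment"], start, second_bday):
            denominator += 1
            if combo_met(child["immunizations"], required, second_bday):
                numerator += 1
    return numerator / denominator if denominator else None
```

With a strict on-or-before-the-second-birthday cutoff, doses given even a few weeks late drop a child out of the numerator, which is the kind of drop-off she reports.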
Dr. Dougherty: Okay. I think lunch is out where the registration area was. Bring it back here, and then we'll ask you to go into breakout groups, take on one of these challenges, and come back with at least one idea for an improved, specific, concrete specification. Thank you.
Lunch Break
February 24, 2010 - Afternoon Session
Dr. Dougherty: Just a couple of announcements and then I think we're going to do this a little bit differently than we originally planned. We were going to have all of you give the thumb drives back to us, but now we're going to ask either the facilitator or the reporter to read what you did and sort of summarize it, and then we will—we'll synthesize it as we go or later. We have a public comment period at 2:45 which is one of the announcements.
Okay. Just a couple of announcements. Weather, and I'm biased here. I looked at the local weather, and it seems like by tomorrow sometime there's going to be one inch of slushy mix but high wind. It's supposed to be 36 degrees tonight and in the upper 30s or low 40s tomorrow. Okay, Linnea. You can get it from her because I'm biased, I want to keep you all here. She has checked the airports and there are no weather cancellations. Okay, hang on to your flash drives because you're going to report back using those as quickly as we can. I think some of you had some experiences and changed around some of the criteria and, you know, the assignment; of course, we expected that.
The other couple of things: I don't know if there's anybody here who's planning to make a public comment at the public comment period, which is 2:45 to 3:30, but if you are, could you please go and sign up at the registration desk for that? The other thing is that we said we would organize it if people wanted to go out and have dinner together. I believe we have made a reservation at the Thai Farm restaurant, which is easy walking distance from the Sheraton Rockville Hotel, if people want to go there.
So I floated a little bit and saw that you were all having an interesting time on this assignment. So let's get some reports back. Let's start with the easy one, medical home, underlying scientific soundness for the medical home. You know, when will we know we have a valid measure in terms of underlying scientific soundness for medical home, what kind of criteria do we need, so forth. So either the facilitator or the reporter—Gareth, you're on.
Dr. Parry: Yes, this was so easy. So, we had some kind of criteria laid out for us. I'll start with the first one which says a quality measure should be considered valid if it meets the following—well, you've seen it I suppose, the following criteria: scientific soundness, adequate scientific evidence, and so on. I think you've all seen it. What we decided to do was we decided to keep that criterion but also to add some new criteria to it, specifically around what was actually—or how we actually defined adequate scientific evidence. And we suggested that adequate scientific evidence be in the following order of importance, the first thing being professional consensus, then existence of one or more published quality improvement (QI) or QI-related studies in a peer-reviewed journal, followed by the existence of evidence-based guidelines. We put it in that order because we thought that professional consensus is likely to exist if actually there are things like published QI studies already in existence, so that's why it's kind of in that order.
Dr. Brown: Can I just make one point? We just thought the content, medical "hominess" as one of our team members described it, was so new that we might need to start with professional consensus, and then over time you could move to the peer-reviewed journal and then to the highest degree of evidence.
Dr. Parry: Then under the required documentation for criteria to be kept or added, we put in specifics for a medical home. A definition of a medical home should be part of the validity piece, and consensus was fine in the absence of actual guidelines, realizing that in pediatrics that's probably a necessity. Trying to think on the others: the idea that evidence of linking to improved health and avoidance of harm is something you have to continue to monitor over time to see if there's a linkage, because it's not always clear. And we would look for required documentation as it relates to combining different measures, to make sure that if you're doing a composite, for example, and we talked a lot about that, you would still tie it to the guidelines.
Dr. Loeb: The only thing I would add to what Barb said is we also talked about the notion that a pediatric measure is not an adult measure that's dialed down. Rather, where appropriate, the scientific evidence should be gathered in the pediatric population, and where not, it needs to be thoughtfully reconstituted with the notion that it would be better if it was in fact within the pediatric literature. But, considering there are large gaps, it might be okay at least on a temporary basis to go ahead and use something that is dialed down. But that's not optimal by any means.
Dr. Dougherty: Okay, let's move on to health outcomes. Patrick Romano.
Dr. Romano: Okay, so I think it's fair to say that generally we were in support of these three criteria that were specified as they apply to health outcome measures. We did think that the third one on the list, which is the evidence of a link to improved health or avoidance of harm, maybe ought to come first because it's really central to deciding whether an outcome measure is a legitimate health outcome. And we talked about some examples, such as looking at missed days of school or looking at hemoglobin A1cs for kids with diabetes. We agreed that evidence of a relationship between process and outcomes is very important, and that we would ask measure developers to present evidence, if it's available, regarding the ability to improve an outcome through specific changes in the health care delivery system, in how health care is organized, or in what specific treatments are provided. We also agreed that in some cases the evidence will be insufficient and professional consensus should be relied upon. Here we thought that there needed to be more attention, because professional consensus bodies have generally focused on processes and on deciding which processes are evidence-based, but they also need to think about which outcomes are evidence-based. So we suggested that there really should be more focus and more effort to develop professional consensus around what are valid outcomes. We mentioned that the U.S. Preventive Services Task Force is sort of the gold standard. Other government task forces and professional societies, some of them have been active in this area. But we did lean toward favoring multidisciplinary professional consensus processes, to avoid perhaps undue influence from people who may have a stake in a particular measure or outcome. Does that cover it?
Dr. Dougherty: Okay, anything on documentation?
Dr. Romano: Nothing specifically.
Dr. Dougherty: Okay, thank you. And let's see. We have disparities. I don't know if you chose a facilitator, but Dr. McIntyre was the -
Dr. McIntyre: Well, I was the reporter, and we ended up with several people acting as facilitator, and if it's okay with the group, I was going to go ahead and report. I can't get this thing to open up, so I'm just going to go from what I have written down here. We had a lot of discussion on the whole issue of what the evidence is when it comes to disparities. There is evidence relating racial and ethnic disparities and socioeconomic disparities to worse outcomes for specific conditions. But the first criterion talks about a causal relationship, and then it talks about the type of measures as far as structure and process, linkage of structure to outcome, process to outcome, and the real discussion ended up being centered around the fact that there is still not a lot of evidence in those areas, but that if we didn't go in and actually look at disparities, we never would have evidence. So there was discussion about looking at this as not necessarily being initially required, until we could get the systems where they need to be in order to get the information. There was some discussion that a lot of the reports that exist now do not include that information and that we were not able to get it from the bottom up; we can generate it from a State level, but not necessarily from other entities, and I think with managed care plans and some of the hospitals that was the information that we were given. So we ended up not really throwing any of these out, but looking at the fact that, while there is evidence of a link to worse outcomes, the link to improved health and avoidance of harm is really not there at this point.
Where evidence is insufficient, we did talk about professional consensus to support the stated relationship, and that that's probably where we were going to have to be at the very beginning for this. But we don't really have it, and we were kind of confused, I'm going to just tell you, about what we were supposed to be doing. We actually went to the other document and started trying to address it from the standpoint of looking specifically at those areas under the validity part, and that's where we spent a lot of time in the beginning, and then we came back to the specific areas under validity and underlying scientific soundness. So the ultimate result is that we don't think we're really there when it comes to the evidence, but we really think we need to start pulling the information in order to be able to get the evidence. Group, was that what we said?
Ms. McColm: So a clarification of the next round. Are we supposed to be looking at it relative to this criterion, or just this?
Dr. McIntyre: We were like, and then somebody ran out and said find Denise.
Dr. Dougherty: But I saw Ernest in there so I walked on by and went to another room.
Dr. McIntyre: Well, Ernest was trying to help us.
Dr. Dougherty: What we've tried to do here on this sheet is to take the domains mostly from the NQF descriptions. So the purpose of having the spreadsheet is that, if you have time, you can see how other people and NQF have more specifically defined these topics. We couldn't replicate the entire spreadsheet and still give you room to say which ones should be in or out.
Dr. McIntyre: So we should be going through this list right here?
Dr. Dougherty: You don't need to because I think what we're getting here is more of a sense of the group about what's very important rather than going down a checklist that exists now and then checking off what you should delete and what you should add to and so forth which, you know, neither way is right. But I think given the short amount of time, and it may get shorter, that we have here today, if you have to go for something, go for what your gut says is really the most important thing to focus on. And it sounds like folks have done that.
Now, you know, what would have been wonderful is if you could tell us exactly the process that should be used for professional consensus, and some people got a little bit closer to that. You know, if you're saying use professional consensus, what are the specifics that we mean there. And we heard use a multidisciplinary group, that kind of thing. So, but the big ideas, give us a sense of what's most important, and then we'll talk tomorrow about next steps to actually flesh those out more. And that could be while people are developing and enhancing the measures themselves along with our coordinating center and that kind of thing. So this is an ongoing dialogue.
Dr. Moy: Denise, I was just going to put in yes, I did help with the confusion there and contributed to it, I think greatly. But we spent at least part of the time thinking about how scientific soundness related to disparities, and I think what we thought was in the best of all possible worlds you would want scientific soundness demonstrated not just for the general population, but for different populations that you would want to compare. But then when we thought about that more, we thought this is a really high bar, and so we were willing to give that up, although if you're prioritizing measures we would probably give bonus points for those where it had been demonstrated through scientific literature to be applicable to all the populations that we were comparing. So it was a bonus, not a criterion that you had to meet.
Dr. Dougherty: Okay. Would the rest of the group agree with that? Okay. Not just the dominant population or whatever. Whatever we're calling middle-class white people these days. Okay, so child. I think that was Rita. And you're not the reporter. Mark Antman?
Dr. Antman: Yes, thanks Denise. So Rita's group began by—we started by just quibbling a bit with the description of the measure topic in the template. Given that it described a causal relationship between structure and process, structure and outcome, et cetera, our thought was simply that a causal relationship may be very, very difficult to establish, so it should be more correctly worded as a link or association between structure and process, et cetera. Beyond that, I think a lot of the comments that we had as to scientific soundness are very comparable to what some of the other groups have said, but just to run through them. By all means it's a key criterion to keep. We added several items to the criterion, specifically that the application to children within specific age ranges must be clear, and it must be based on scientific evidence or research that has in fact been conducted in that age range, not extrapolated from research in adults. With a nod to something that Marina Weiss said this morning, we noted that it's critical to prioritize finding evidence for scientific soundness in particular for children with severe health conditions, who have been under-studied. As to outcomes, we noted that it's critical that the outcomes be well-defined, but with some clarity as to whether we're talking about outcomes measured at the population level, at the individual clinician level, at the plan level, whatever the unit of analysis is. And then as to professional consensus, we kept that in there, but it was noted, as I think was also said earlier, that if measures are based on consensus, there should be an effort to do some outcome validation. And then in the required documentation column, we noted that there must be documentation of the scientific evidence to support the linkage, as I said, between structure and process or other linkages, and that documentation again must be specific to the age group. We also noted that there should be some specific documentation as to the type of evidence that was found, meaning randomized controlled trials, consensus-based recommendations, et cetera, and, if available, documentation of the grade of evidence. And I'll look to the others in our group if I've missed anything.
Dr. Dougherty: Okay. And this documentation I'm assuming should be publicly reported along with the measure?
Dr. Mangione-Smith: We assumed that the grantee should be required to document this information: this is why we have this measure, and here's the evidence to support it.
Dr. Dougherty: Okay. Thank you. And now last but not least is the meta measures group. So there was a question about what meta measures means, so count your blessings if you were not in that group. They got a lot of credit. And they didn't have a facilitator formally so they did group work. So Chris Carlucci is the reporter on that one.
Dr. Carlucci: Thank you. I'll just use the one Mark used. So we were certainly cursed and not blessed with meta measures and no facilitator to boot. We looked at each of the three topics, and for criteria to delete we basically had nothing on each one. For key criteria to keep, we indicated all as drafted. However, we did put some suggestions in the new criteria to add. For the "quality measure should be considered valid if" topic, we indicated that in order to rely on professional consensus, evidence must, underscore must, be insufficient, and obviously you have to validate that that evidence is insufficient. The second criterion to add, looking across measures, is a defined conflict of interest disclosure process confirming that no relevant interests exist for all members of the expert panel developing, maintaining, and updating the measures. Obviously, when we look at the required documentation section, having those conflict of interest disclosures documented would apply here as well. We also discussed that measures should be ranked and transparent, and Jeffrey provided us examples of number needed to treat, or NNT, number needed to harm, NNH, and return on investment, or ROI. And then an additional comment that we made for each of these is that each component measure must meet scientific soundness as per the Centre for Evidence-Based Medicine (CEBM).
On the explicitness of the evidence base, again, the new criterion to add is that in order to rely on professional consensus, evidence must be insufficient, similar to what we had for the first, and also that each component measure must meet scientific soundness as per the CEBM. And then on the evidence of a link to improved health or avoidance of harm, we also added the comment on professional consensus, that evidence must be insufficient, as well as that evidence of improved health or avoidance of harm must have documentation of the resource cost and fiscal impact. So the required documentation there also would be that evidence of improved health and avoidance of harm has documentation of the resource cost and the fiscal impact.
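The ranking quantities the group cites are simple arithmetic; the sketch below illustrates them with hypothetical rates and dollar figures, none of which come from the meeting:

```python
# Hypothetical inputs, used only to illustrate the NNT/NNH/ROI arithmetic
# the group mentions; none of these numbers come from the meeting.

def number_needed_to_treat(control_event_rate, treated_event_rate):
    """NNT = 1 / absolute risk reduction."""
    arr = control_event_rate - treated_event_rate
    return 1.0 / arr if arr > 0 else float("inf")

def number_needed_to_harm(control_harm_rate, treated_harm_rate):
    """NNH = 1 / absolute risk increase."""
    ari = treated_harm_rate - control_harm_rate
    return 1.0 / ari if ari > 0 else float("inf")

def return_on_investment(savings, cost):
    """ROI expressed as net gain per dollar spent."""
    return (savings - cost) / cost

# Example: an event in 20% of untreated vs 12% of treated children.
print(number_needed_to_treat(0.20, 0.12))      # 12.5 children per event avoided
print(number_needed_to_harm(0.01, 0.03))       # 50 children per added harm
print(return_on_investment(150_000, 100_000))  # 0.5, i.e., 50 cents per dollar spent
```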
Dr. Thompson: From my perspective at the State level, if we're going to spend money, we need to know how much are we going to spend, what's the FTE count. It doesn't mean that you have to show savings, it just means we need to know how much are we going to spend to get these outcomes. I mean, the pie is only so big. So it could be how much are we going to require a health plan to spend or how much are we going to as a State spend for the fee-for-service operation. There has to be a fiscal impact.
Ms. Reuland: For the aspect of health care being assessed or for the composite measures being collected or for the outcomes that are impacted based on those?
Dr. Thompson: Well, I'm assuming that for every outcome there's going to be a process. Those processes have a resource, an FTE, and a cost, a fiscal cost, and that should be transparent. Right?
Dr. Dougherty: Okay, that's something we can drill down on, a different point I think. I mean, fiscal impact to whoever the interested and accountable party is, probably. So, do we have some questions? I just want to say that I think the recurrent themes I see here are, one, professional consensus will have to do for a lot of the underlying scientific soundness right now, but I heard Jeff, your group, and Chris's group talk about the point that if there is scientific evidence, professional consensus can't be used to trump it. Did you drop that one, or is that embedded in there somewhere?
Dr. Thompson: It's embedded. You have to have some sort of evidence to support what they do or disclose that it's only -
Dr. Dougherty: Okay. And then the other theme, which may seem contradictory to professional consensus but I don't think it is, is more transparent documentation of where the measures are coming from, what the specifications are, and what the costs of measuring and not measuring are. So, yes, we open it up for questions. Dr. Gonzalez, thank you for coming.
Dr. Gonzalez: Yes, our airport just closed last night. I want to follow up on Jeff's comment because pediatrics doesn't have the level of objective evidence to support treatments or interventions that adult medicine does. We've known that for 1,500 years. So we've accepted—I've only been here for 50 of those. We have accepted professional consensus. That term scares me, and it scares me because my colleagues sometimes have a different opinion of what professional consensus means. To many of them, it's a consensus of one. And I think we really need to define it, not because there isn't such a thing as the art of medicine, but especially when there is a very defined treatment intervention with very defined outcomes. My only fear is I'm not sure that I'm willing to go as far as this last group, where for you to accept consensus you must have no objective evidence obtained by other means. So I just think we need to define what we're going to mean when we ask the possible grantees: if you use professional consensus, please define it, what does that mean? I know they mentioned a multi-specialty group, and that's one good way to begin, but again, that could be a surgeon and a primary care provider over a beer.
Dr. Dougherty: Would anyone like to build on that about the—how you get some more definition around professional consensus?
Dr. Mangione-Smith: I think it's hard to define consensus, but one of the conclusions our group came to was that it does then put the burden on people who are going to apply for this funding to have some plan to show a link to outcomes. I mean, if you're doing a process-based measure that's based on expert professional consensus, which is 90 percent of what you're going to get, I really think so, then there should be some onus to link it to outcomes and to show return on investment. I think they're both going to be important things to demonstrate.
Dr. Dougherty: So where would you take that with medical home? Going back to the medical home where, well, I mean that group said professional consensus is a start.
Dr. Mangione-Smith: Where would I take it? A study that we're apparently going to launch at Seattle Children's which is looking at trying to create a shared care model for children with chronic illness in the outpatient setting, which is trying to really build on them having an outpatient medical home and trying to keep them out of the hospital for things that can be managed in the outpatient setting. And part of that evaluation is trying to look at how does setting up this new program, which we think is creating a better medical home for these children, how does that impact outcomes, including fiscal outcomes?
Dr. Thompson: We agreed that we should run it as a randomized trial, that there really ought to be two arms. And we're doing that with the adult medical home in Seattle because without that, we really don't know.
Dr. Dougherty: Very interesting. This also suggests that, you know, there are lots of medical home demonstrations out there, and there will be more if you look at the list of the Centers for Medicare & Medicaid Services (CMS) demos. Somehow we need some way to keep collecting the information from these QI studies and understand which are the most important elements, which lead to outcomes, and which don't. And I'm not sure right now that there's a plan to do that for medical home. There's no plan to do it for any other disease or condition except when AHRQ does an evidence review, but by the time an evidence review gets done, there will be a new model of the medical home. So I think there needs to be more of a link, and this is my personal opinion, between quality measure developers and the evidence as it's emerging out there. So that's just my editorial. Yes, Nora.
Ms. Wells: This is an overall comment. It didn't exactly fit any one topic, but I think it comes up as we think about all these comments. Another important component in thinking about what is happening to help improve children's health care would be the families themselves, the people who are receiving the care. And we talked in our group about, you know, there are little pockets of evidence that say something works, but if the partner, the family, doesn't know that that is even out there, then they have no opportunity to take a role in it. So I'd like to charge—you know, I loved what the person from CMS said this morning about this being a wonderful moment in time, an opportunity, and I'd like to raise the stakes. I know the law says we're going to be including everybody in all of these things, but if AHRQ is going to be giving money to developers who are working on these criteria, I'd like to charge you with thinking about how those partners, those families who are the people who actually carry out the care 90 percent of the time at home, can be involved in this measurement development at the beginning stages. It isn't easy. I'm a pretty experienced family leader and I've been sitting here totally bewildered with a lot of this conversation. Over the 35 years I've been involved in kids' health care I've been confused a lot of the time, but I know there are some very basic principles, and one is that if we can make things understandable to everybody in the room, then we have a better opportunity, I think, to move forward. So I guess that's the charge to AHRQ. I'd like to see those people applying for the funds show some way that they are involving the consumer element in their work.
Dr. Dougherty: So would you add that as a criteria domain?
Ms. Wells: It would need to go across all—I mean, it's not really in any one of them. It's kind of an overarching thought I would say.
Dr. Dougherty: Okay, so it's part of the general approach then. Yes, Cathy, and then we're going to take a break.
Ms. Hess: I'm not understanding actually—I'm trying to figure out if this is a comment or a question—why we're taking this up measure by measure. It seems to me that if you want us to talk about important, salient things, they are generally things that cut across everything. So it's a little unclear, and I'm not suggesting you explain what each of them means, but could you talk about the rationale a little bit? Or when we're in our breakout groups, should we not worry about that as we're having our conversations?
Dr. Dougherty: Our thinking was a lot of this language on domains and criteria and stuff is so abstract that we thought in order to have people actually start to talk to each other we needed something concrete, which is why we focused on these different kinds of measurement topics. Certainly if you all want to go into your next group and say we're not only talking about medical home, we're talking about inpatient because it's all the same, and you can make that justification when you come back, how it applies to a bunch of different topics. Well, certainly these are not the only topics in the world, right? There's duration of enrollment, you know, which is a required measure. There's availability of services we're not talking about.
Ms. Hess: I guess what's hard is if we have a crosscutting comment, which would be mostly all that I would have because I'm not an expert in any particular area, I have nowhere to say it. So that's what I'm trying to figure out I guess.
Dr. Dougherty: Well, I think you should feel free to give your crosscutting comment anywhere you can fit it in. That's what we generalists do, don't we? Are there any other thoughts on that? Anybody—you know, this was kind of a pilot test of this process. I think folks did really well under challenging circumstances. Does it help to have at least a starting topic to go with? And if it doesn't help you in your group, just say let's throw that out and do a different topic. Is that okay? Or do all topics. Okay. Well, it's time for a break. And come back in 15 minutes if you'd like to make a public comment, and you can do that if you're part of this group as well, because we have you pretty restricted in these little breakouts and stuff. If you want to make an overarching comment, please sign up at the registration desk. We have 45 minutes for the public comment. If we don't need it all we won't use it.
Ms. Hess: Can we put comments in writing after the fact or is it just during this process?
Dr. Dougherty: Oh, you can put comments in writing after the fact. Sure.
Break
Dr. Dougherty: Thank you. Well, we were supposed to have a public comment period now, but we don't have anybody signed up to give us a comment from the public or from anybody in this room, so the good news is we can start earlier with our next breakout session which is even tougher—the validity of the measures themselves—and end early today.
So a couple of reminders again. One is please say your name for the transcriptionist and the voice recording unless, you know, you don't want your name associated with your remarks. And then we weren't clear about this, but Barbara wants to make it clear, about the issue that Sarah Scholle brought up this morning. I thought we had come to consensus that we were looking at measures at the State level or at the electronic health record (EHR) level, assuming they're reportable. So if you're still having difficulty because you're not sure what level, that's what we're going for in this process. So, does anybody have any questions? No? Okay, so the good news is that the groups are in the same rooms. The bad news is that you're probably in a different group.
So that's why we're moving on. Okay Kerri, could you be the facilitator for the medical home group? And you can stay right there I think. And Linda, you'll be the recorder for the medical home group. And the others, please join the medical home group colleagues.
For the inpatient group we don't have a facilitator, and Beth McGlynn, I'm not sure that you're—are you able to kind of join that group? No?
Dr. McGlynn: I'd prefer not to be the reporter. I'm waiting for something that's coming in. You also had me in two groups, so.
Dr. Dougherty: Okay yes, that's—we did not create the program to sort people into groups. So can somebody else in that group which is meeting in the Rock Creek Room decide who your facilitator is and your reporter? And the issue—the topic we're moving into now, the criteria domain is the validity of the measure properties themselves, the validity of the measures themselves. And there's a 2-page, more than 2-page list of possible validity topics that you could address in this for your specific measure.
The health outcomes group is Patrice Holtz, okay, and you're in this room. So that'll be your table. And the reporter for that group is Nora Wells. Can you join Patrice Holtz from CMS? Thank you.
For disparities, Lisa Iezzoni. Would you be able to go to the Great Falls Room and lead that group? And is Cynthia Tuttle here? Hi Cynthia, nice to meet you. Okay, thank you. Could you be the reporter for that group with Lisa Iezzoni? Thank you.
The child group, Sarah Scholle, could you facilitate that group, and Patrick Romano, well you may have to choose a different reporter because he doesn't seem to be here right now. That would be in the Room 1101. Okay. Did I skip something?
And the meta measures group, Gareth Parry you really get to show your stuff now. You're the facilitator for the meta issues group, meta measures issues. And Denni McColm, okay. Can you go with—that would be in Room 1111.
Meeting of Breakout Groups—2:50 pm - 3:40 pm
Dr. Dougherty: Okay. Thank you all very much. I know we didn't give you enough time to get through 2-1/2 pages of sub-criteria for validity of the measures. Just one announcement before we get started. Team PSA, our logistics contractor, is going to send all of you an E-mail with the call-in number (actually, she'll send it to me so I can send it to the Feds) in case you can't get here tomorrow morning or decide that you need to leave tonight or something like that.
Okay, Patrice Holtz is going to start with health outcomes this time.
Ms. Holtz: Okay. We started with—we didn't finish our list, unfortunately—but I can tell you, with my group's help, how far we did get. We started with well-defined and precisely specified, as to how important they are for the validity of a measure, and everybody agreed that those were important to keep in the list of criteria. As far as face validity, content validity, and construct validity, the group felt that all three of those were important, probably face validity a little less important than construct and content validity, and that the health outcome should be something that needs to be improved to maintain the quality of care.
The one I think that the group felt needed to be explored a little further, not necessarily to take it off the list—but we didn't come to resolution as to how to address it—was whether the measure demonstrates quality of care provided. And we just felt that sometimes good care may not necessarily equate with a good outcome and vice versa. We did not get, I apologize, to clinically sound and accurate or comparable.
Dr. Dougherty: Okay. And say your name.
Dr. Glomb: Brendle Glomb from Texas Medicaid and CHIP. I think that demonstrates, and it was great just within our group, how much thought really has to go into each aspect of these measures. One could make a good case either way. Obviously something has to be useful that you're measuring, and you'd love for it to affect the health outcome, but the two do not necessarily go hand in hand. And as someone pointed out, old measures should not necessarily be forgotten in favor of new problems that you need to concentrate on, or we may lose the progress made with old health outcome measures. So I think of your example of how there are only so many seats on the plane. You can only look at so many things at one time. It is so hard to decide where you draw the line and include or exclude various measures. This was one of the most useful exercises of the day.
Dr. Dougherty: Oh good, okay.
Ms. Reuland: Oh sure. This is Colleen Reuland from CAHMI. When we were talking about the limited seats on the plane, we kept going back to what's the goal of the CHIPRA measures because if they are to be surveillance and baseline measures that we use to assess quality of care in the States and to be able to compare the States, then you have a certain framework to decide how many—who gets to sit on the plane. But if you're using the framework that these are the measures that are supposed to drive improvement, then you may take off some of the measures that are on the plane because you want to have only the ones that you think you need to try to improve care. And so that was some of our back and forth was, well, if you only have the ones you need for improvement, you won't have good surveillance about what's going on in the States.
Dr. Dougherty: Okay. Anybody else from that group want to add anything? Okay. Who would like to go next? We're trying to be a little more democratic here. Okay. So we will have this—let's see.
Dr. McIntyre: Okay, and I'm Mary McIntyre from Alabama. And we didn't get past the first part of it. We did not get into reliability, but we basically looked at all of these and said that we needed to keep them all; they needed to be included. But then we had some specific comments and some suggestions regarding them. For face validity we indicated that this should not be the only thing addressed, because one of the spreadsheets was saying that if face validity was the only validity, it should be assessed systematically, and so we wanted to make sure that we identified that we didn't want a measure with just face validity.
We got into well-defined and what documentation supports the criteria, and that these were not really separate constructs when we looked under them; this is basically where you get into identifying the fact that we need all of this. Also, how do you apply these in the real world, how valid is valid, and what are you willing to trade off? That was part of the discussion.
Under precisely specified we talked about the need to have a data dictionary, but that it is a balancing act between defining things precisely and capturing what you are actually trying to capture, and that this is an overall issue.
In addition to the comment about face validity not being the only thing looked at, we wanted to know how you document it and what the process needs to be: when the measures are presented to providers, what would they say the measures actually do? Do they do what the intent is? Does it make sense to them? And this could be done in a number of ways, like an external reference point such as looking at consistency with guidelines, or convening a multi-stakeholder panel made up of providers of different specialties and getting some kind of consensus from them, but defining that process and what it is.
Content validity we looked at as: is it important in the scheme of things? You know, you could have a measure that is clinically sound, but it could really not matter. And then we looked at examples, like the included populations or specifications being so narrow that really no important information was obtained from the process. So we had a hard time sorting out what this means, and there was a lot of discussion spent on it. An example we thought of was one of the inpatient hospital measures, with claims data and the inability to capture date of onset or time of onset; the data source may not be sufficient to capture what it is you really want to capture.
Construct validity: if you are developing a new measure you would have nothing to compare it to, so this really only makes sense if you're trying to get a measure to replace or update one that's in existence, versus brand new measures. So it's probably of less importance. For whether a measure demonstrates the quality of care provided, outcome measures really should be risk-adjusted. That doesn't necessarily need to happen for process measures, but it does need to happen for outcome measures.
And clinically sound and accurate: we talked about what the hospital can capture versus the clinical evidence. Time of arrival was an example, and what that meant varied widely initially. Clinically, what they were really interested in was symptom onset, but that was something that couldn't be captured by the system consistently, so what was used instead was arrival at the hospital.
And then comparable, we talked about making sure that you consider that the measure is generalizable outside of the setting that it was developed in. You know, if it was developed in a general children's hospital, is it actually something that could be implemented in a general hospital, and is it generalizable? That needs to be considered.
Did I miss anything, group? Anything else? Okay.
Dr. Dougherty: Could you say a little bit more about that, about the children's hospital versus the general hospital? Do you mean—oh, you weren't the one.
Dr. McIntyre: I wasn't the one, but I know exactly what they're talking about: if you develop a measure where specific processes and things are in place, okay, that may not apply in another setting. Can you pick that measure up from where it was actually developed and put it somewhere else? And I'm going to give you an example: things occur all the time in research hospitals, and then they're put out into what we call the general environment where none of the things are the same, okay? So now you have a measure that was developed in what I call a vacuum, okay? Not in the real world, and then we're trying to apply it in the real world. So that's the whole idea of generalizability: does it actually apply? Staffing and other things need to be considered.
Dr. Dougherty: So if we were giving advice to awardees and they wanted to develop an inpatient measure would the guidance be that they should develop the measure or enhance the existing measure so that it's applicable across all different kinds of hospitals? Rita, I know you have your—you want to say something about this topic?
Dr. Mangione-Smith: Yes. So we're currently in the process of developing some inpatient hospital measures, first using medical records data at three children's hospitals, then trying to translate that to use PHIS data, the Pediatric Hospital Information System data. And one of the comments we got back from the National Institutes of Health (NIH) study section was just because it works with the PHIS database doesn't mean a community hospital could measure this with their administrative database. So then we came back in our revision and said we'll beta test it in three community hospitals using standard administrative data. So I think that's kind of what that's getting at a little bit.
Dr. Dougherty: Yes, okay. I wasn't sure whether you were saying we need different measures for children's or research versus other hospitals --
Dr. McIntyre: We need to make sure that they apply in the different settings.
Dr. Dougherty: Okay. Thank you. Somebody else want to say something?
Dr. McGlynn: I was just going to add to that—Beth McGlynn—that the sort of more generalizable comment is your ability to move across different data sources. So just because you've developed a measure that's kind of valid within a particular environment, or you know, this will be particularly true as we move into the wonderful world of EHRs, that the translation of a data element that has the same label in your EHR and my EHR, your claims data set and my claims data set may be wildly different in its actual definition. So it's just kind of a little bit of a warning about—and sometimes a flag for that is "what's the environment in which the measure is developed and tested?"
Dr. Dougherty: Thank you. Anybody else on that point? Okay. Who would like to go next? Medical home? Meta measures? Medical home, okay.
Ms. Fei: All right. Hi, Kerri Fei. So I'm going to report out on what we talked about in medical home. Some of what we talked about had to do directly with medical home, and then we kind of got into more general conversation about measures in general. So I think it's kind of a little bit of a mish-mosh of both.
But, starting with—we actually did get through everything except disparities, so I think we did actually a pretty good job. So starting at the top. We decided that all of the criteria here would be required except we felt that you could maybe take out the "measure demonstrates quality of care" and "clinically sound" and "accurate" only because we felt that it was covered under the well-defined, precisely specified, and the different types of validity listed. Under documentation required, for face validity it's similar to what other folks already kind of discussed in that you'd want a multi-stakeholder group, and we also specifically called out that you'd want consumers to take part in the expert panel as well. Beyond that we talked a lot about clinically sound and accurate and comparable. Under clinically sound and accurate, we were kind of looking for definitions. You know, what does this mean? It might need a little more definition so that people can provide the proper documentation.
And under comparable we talked a lot about what are we comparing? You'd want them to be specific about the units of analysis they were looking at so that you can kind of compare apples to apples. So that was for the first question. And did I miss anything? Does that cover it, group? Okay.
So then under reliability. Under "the results are consistent and repeatable over a period of time," same population under similar circumstances, we talked a lot about requiring some very specific documentation here, some actual statistical numbers perhaps demonstrating that the measure is reliable. We thought the lowest value that would maybe be acceptable would be 0.7, which seems to be pretty standard, and then documenting the who, how, and what of how they tested it and how they came to that conclusion. Okay. Then we thought it was good to include test/retest, inter-rater reliability, and internal consistency, but they should be included when appropriate based on the type of measure being developed. Some of these tests are more appropriate for different types of measures, and the example that was given was that test/retest reliability may be more important for measures of functional status or patient experience, but it wouldn't be as appropriate for other types of measures. We talked a little bit about exclusions and decided that we'd like to keep both of those in. Under risk adjustment, they would need to provide the model and why the measure needs to be risk-adjusted. We'd also be looking for documentation that their adjustment methodology was tested and validated as part of their testing process.
Dr. Dougherty: Could you say the second one again?
Ms. Fei: Oh sure. So providing the model and why the measure would need to be risk-adjusted, along with evidence that they tested their model as part of the testing process and that it's been validated. Under scoring and analysis, we were looking for documentation of the field-test results, and, if the measure is publicly reported, that the scale of the results is meaningful to the users who would be interpreting them.
And then under multiple data sources, we had a little discussion about what this means. Are we talking about whether the measure can be used across multiple data sources, or are you looking for testing across the same data source in different sites? So for example, looking at three sites that use three different electronic health records (EHRs), are you getting the same results? Or are you looking at the same measure specified for administrative claims and for the EHR: are you getting the same kinds of results, and would you expect to? So I think a little clarity here would be helpful.
Dr. Dougherty: Okay. It says here National Quality Forum (NQF) language is if multiple data sources and methods are allowed, there is demonstration they produce comparable results. So—
Ms. Fei: So is that interpreted as same measure, different specifications depending on data source?
Dr. Dougherty: Why don't we ask Helen Burstin when she comes tomorrow?
Ms. Fei: Because I think both are—we thought both were valuable, but we just need clarity as to what we would be looking for.
Dr. Dougherty: Okay, great.
Ms. Fei: Did I miss anything? Okay. And that was it. The only thing we didn't get to was disparities.
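To make the 0.7 reliability floor and the reliability types in the medical home report above concrete, here is a minimal sketch; the statistics chosen (Pearson test/retest correlation, Cronbach's alpha for internal consistency) and the toy data are assumptions of the sketch, not anything the group specified:

```python
from statistics import pvariance, correlation  # correlation requires Python 3.10+

# Sketch of reliability statistics checked against the 0.7 floor mentioned
# in the report. The data and the choice of statistics are illustrative.

RELIABILITY_FLOOR = 0.7

def test_retest(scores_time1, scores_time2):
    """Pearson correlation between two administrations to the same children."""
    return correlation(scores_time1, scores_time2)

def cronbach_alpha(item_scores):
    """Internal consistency. item_scores: one list of scores per survey item,
    each list the same length (one entry per respondent)."""
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_variance = sum(pvariance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1 - item_variance / pvariance(totals))

def acceptable(statistic, floor=RELIABILITY_FLOOR):
    return statistic >= floor

# Made-up survey data: 3 items, 5 respondents.
items = [[3, 4, 5, 2, 4], [3, 5, 5, 2, 3], [4, 4, 5, 1, 4]]
alpha = cronbach_alpha(items)
print(round(alpha, 2), acceptable(alpha))  # 0.93 True for this toy data
```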
Dr. Dougherty: Okay. Well, thank you. This is—you're being a lot more specific on this one, but that's easier to do, and we appreciate it. And everybody is doing a great job. Barbara came up to me and said maybe people think they're not doing the right thing, but believe me you are doing the right thing. We are getting a lot of great stuff that I think we can follow up on and make things very specific for our awardees, and then continue the process. Okay. Who would like to go next? What do we have? Disparities and children's health care payers and programs and meta issues. Yes?
Ms. Tuttle: Disparities can go next. Okay, I'm Cindy Tuttle, I'm with the National Business Group on Health. And I just want to start by giving kudos to our facilitator who got us through the whole list in a very short period of time. We had a very good discussion. I spent a good deal of time typing, so I know that the members of my group might want to fill in as I go along.
Just starting off with well-defined under key criteria to keep. We just put the note that we need to capture information that is meaningful related to racial, ethnic, and socioeconomic categories and have standard criteria. And then under required documentation we put that a little more clearly. We said we really need to define what is meant by ethnicity, race, and then list the literature for the evidence so that there's standardization for those definitions.
Dr. Dougherty: The evidence for how to define a measure?
Ms. Tuttle: Yes. Well, the disparities issues, the categories. And then for the three issues of validity, kind of similar to the group that reported out first, we had a little bit of discussion about face validity versus content and construct validity, and we ended up feeling that they were all important, but probably face validity was the least important of the three. In terms of comments for the validity categories, we had a lot of comments in the comments column, so I'll be reading a lot of comments.
Dr. Dougherty: Okay. You can just give us the most important comments, and we will have your thumb drive.
Ms. Tuttle: Okay. Face validity needs to be looked at through the lens of disparities. Culture and community may be more important. And it needs to be developed so that it's measuring the same thing in different cultures, across cultures. I don't know if anyone wants to add to that. We had a pretty good discussion around the issues of validity.
Dr. Mangione-Smith: That was focused around concern that when survey measures are developed in one population and then get translated for another population many times because of cultural differences or interpretation, they're not really capturing the same construct. That's kind of what we were trying to get at, that you needed to be sensitive to that if you were doing an outcome measure development that was in different populations.
Dr. Savitz: This is Lucy at Intermountain. Can I make a comment? On this particular topic I'd like to recommend that we think about aligning the definitions of race and ethnicity with what we have in the U.S. Census, so that there would be a way to tie in to other data sources and display the measures. And then the other thing, we might want to give people guidance on how to collect the data, because I've seen a lot of variation; sometimes it's self-report, sometimes it's the way somebody sounds on the telephone. So some guidance for people in terms of trying to standardize the data collection as well.
Dr. Dougherty: Okay, thank you. Those are good points.
Ms. Tuttle: Just in addition to that comment, the Institute of Medicine released their report on racial and ethnic disparities just late last year, and I know they're encouraging Federal agencies to try to adopt standardization of these definitions as well. Okay, on to the next category that we looked at, which was "measure demonstrates quality of care provided." And the comment we had with that one was that this criterion needs to be better defined; it's really discriminant validity. And I'm not sure if somebody wants to speak to that.
Dr. Mangione-Smith: Well, Mia actually read us the NQF definition, and it was basically that the measure should be able to tell poor from good outcomes, or poor from good quality. So that really felt like discriminant validity, which, as a measure property, we agreed we would want.
Ms. Tuttle: In terms of the reliability measures, we really just put one comment that was related to all of them, and the comment was that reliability is particularly important for survey-based measures, and that they need to be developed so that they're measuring, again, the same constructs in different cultures.
Dr. Dougherty: Okay. We will get that from the thumb drive.
Ms. Tuttle: The next one that we had a comment on was "clinically necessary measure exclusions are identified and must be supported by evidence." And our comment on that one was that children with special health care needs should be specifically considered, that the PedsQL (Pediatric Quality of Life Inventory™) doesn't apply to children with special health care needs, and that we need better and more meaningful outcome measures.
Dr. Dougherty: But there are PedsQLs for specific -
Dr. Mangione-Smith: Age groups but not for children with special health care --
Dr. Dougherty: Okay.
Ms. Tuttle: Under exclusions and patient preference, if the child or parent is giving the response there may need to be some cultural sensitivity. A caveat that patient preference can be used as an excuse for disparities when there may be communication issues or institutional racism occurring.
And then under risk adjustment, we should not adjust away for race, ethnicity, and socioeconomic status and other factors that could relate to disparities, so we should do risk stratification first to see whether the different subpopulations are being accounted for. Under scoring and analysis, the way you analyze can mask disparities. We need to be careful with this. Under multiple data sources we need to ensure that the way that different data sources capture the subpopulations of interest is identical. And in terms of identification of disparities through stratification, we just wrote "yes."
So that's it. I don't know if people who were in the group would like to add to that.
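To make the stratification point above concrete: before any risk adjustment, a measure developer might first tabulate the raw measure rate for each racial, ethnic, and socioeconomic subgroup to see whether performance differs. The following is only a minimal sketch in Python (pandas), using hypothetical column names and made-up data rather than anything discussed at the meeting:

    import pandas as pd

    # Hypothetical patient-level file: one row per child in the measure denominator.
    # "met_measure" is 1 if the recommended care was received, 0 otherwise.
    df = pd.DataFrame({
        "race_ethnicity": ["Hispanic", "White", "Black", "White", "Hispanic", "Black"],
        "income_band":    ["low", "high", "low", "high", "low", "low"],
        "met_measure":    [0, 1, 0, 1, 1, 0],
    })

    # Step 1: stratify. Report the unadjusted rate and denominator for each subgroup
    # before adjusting for anything, so differences are visible rather than hidden.
    strata = (
        df.groupby(["race_ethnicity", "income_band"])["met_measure"]
          .agg(rate="mean", denominator="size")
          .reset_index()
    )
    print(strata)

    # Step 2 (only after inspecting the strata): decide which factors, if any, belong
    # in a risk-adjustment model, so race, ethnicity, and income are not adjusted away.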
Dr. Iezzoni: Thank you, Cynthia. She really sat there very calmly during a barrage of comments from the group. This is Lisa Iezzoni from Harvard Medical School. I do just want to state at the very outset that a framing issue that I put on the table was that we do want to include children with special health care needs as a subpopulation that experiences disparities in care. My understanding from the group is that there is specific language in CHIPRA about children with special health care needs, and I want to make sure that all discussions about disparities include them as a subpopulation of interest.
Dr. Dougherty: Yes. The legislation—you didn't write it?—is very clear that disparities must be identified by race, ethnicity, socioeconomic status, and special health care needs.
Dr. Mangione-Smith: Just the other thing we said up front, and this echoes what the person over the phone had said, was that if somebody were developing a measure and chose to use their own homegrown definitions for race, ethnicity, or special health care needs, they should provide a rationale for why they're not using more standard definitions for those.
Dr. Dougherty: Okay. Thank you. We have lots of room here for being more specific I think. So who would like to go next? Sarah. And your group was?
Dr. Scholle: It was the comparisons across different populations. So we talked pretty generally, and some of the comments have already been made. I did want to pull out a few specific issues that came out that are different from what other people have said. We talked a lot about the need for the specifications to focus on issues around program eligibility and coding sets, and how to handle measurement when you're looking at different programs (PCCM, fee-for-service, managed care) or different payers (Medicaid, CHIP, or uninsured); what data are available; and how the availability of benefits or of data might vary if there are carve-outs. And in terms of the validity issues, one of the concerns we had was that we were interested in construct validity but concerned that it may be challenging to do. If there are existing measures that could be used to correlate with the measures, or if there could be information on how the measures correlate with cost and utilization, that would be useful.
In terms of accuracy of the information, we wanted to make sure that, if there's an opportunity, the data are compared to gold standard data. So, for example, validating claims against patient surveys or medical records; or asking which diagnosis, the one from the mental health provider or the one from the primary care provider, does a better job of getting at the patient's underlying need. Those kinds of things should be done in the testing.
For the reliability issues, as the other groups have noted, not all of these types of reliability are salient for all kinds of measures. In particular, we didn't think test/retest reliability was a big deal for most measures that would fall under this category. Inter-rater reliability would be important if there were manual chart review. Internal consistency would be relevant if you're doing a scale, but there are some measures that would be indexes where you don't expect all the components to be internally consistent in capturing one thing. So it's not just one thing, you can't ask all measures to meet the same criteria.
And in terms of the exclusions, we had a concern about the need to allow exceptions where the data are being used for some sort of action, like pay-for-performance or payment incentives, so that if the data identified a low-performing doctor who would then be in trouble, that doctor would be able to look at the data, try to correct the information, or request an exception. And one thing to consider for an exclusion is how to handle immunization refusals, which vary across the population.
And I think we talked a little bit about it, we had to cut our conversation short, but we did talk a little bit about the scoring and how it really depends on how you're going to use the data and what the purpose of the measure is, and whether it's being used for population evaluation versus taking an action to exclude a provider from a network or for pay-for-performance. There needs to be enough information to make a judgment like that, and there needs to be some sort of justification of what the peer group is and how you're setting a benchmark for comparison. Anyone from my group have anything?
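As a rough illustration of the reliability properties just mentioned, a developer doing manual chart review might report inter-rater agreement with Cohen's kappa, and a developer fielding a multi-item survey scale might report internal consistency with Cronbach's alpha. The sketch below uses invented data and is only meant to show the shape of such checks, not a method endorsed at the meeting:

    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    # Inter-rater reliability: two abstractors review the same 10 charts (1 = measure met).
    rater_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
    rater_b = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
    print("Cohen's kappa:", cohen_kappa_score(rater_a, rater_b))

    def cronbach_alpha(items: np.ndarray) -> float:
        """Internal consistency of a multi-item scale; rows are respondents, columns are items."""
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1)
        total_variance = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Hypothetical 5-respondent, 4-item survey scale (for example, care-experience items).
    scale = np.array([
        [4, 5, 4, 4],
        [2, 2, 3, 2],
        [5, 5, 5, 4],
        [3, 3, 2, 3],
        [4, 4, 5, 5],
    ])
    print("Cronbach's alpha:", cronbach_alpha(scale))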
Dr. Dougherty: Great, thank you. And now our favorite, meta. Oh, I just wanted to ask you about the gold standard. Is there any gold standard for anything? That's my question. I mean, you said claims versus medical record, but how would you—how would you go about knowing whether something was a gold standard?
Dr. Scholle: I think it depends on the measure and the piece of information. For some information, we might say that the pharmacy claim—you might say a pharmacy claim is the gold standard for whether the patient actually filled the medication, right? So it depends.
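To illustrate the kind of testing being described, a developer might treat one data source as the reference for a given piece of information (for example, a pharmacy claim for a fill, or chart review for a diagnosis) and report how well another source agrees with it, typically as sensitivity and positive predictive value. This is only a minimal sketch with hypothetical data; the choice of reference source is itself an assumption, as the exchange above notes:

    # Each pair is (claims_flag, chart_flag) for one child; chart review is treated here
    # as the reference source, which is an assumption rather than a settled gold standard.
    pairs = [(1, 1), (1, 0), (0, 1), (1, 1), (0, 0), (1, 1), (0, 0), (0, 1)]

    tp = sum(1 for claim, chart in pairs if claim == 1 and chart == 1)
    fp = sum(1 for claim, chart in pairs if claim == 1 and chart == 0)
    fn = sum(1 for claim, chart in pairs if claim == 0 and chart == 1)

    sensitivity = tp / (tp + fn)   # share of chart-confirmed cases the claims flag catches
    ppv = tp / (tp + fp)           # share of claims-flagged cases confirmed in the chart
    print(f"sensitivity = {sensitivity:.2f}, PPV = {ppv:.2f}")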
Dr. Dougherty: It depends, okay. Anybody else have any questions, comments on that one? Okay, yes Darryl. Could you say your name?
Mr. Gray: Darryl Gray, medical officer at AHRQ. In terms of the payer categorization, in the case of an inpatient admission, some patients were originally Medicaid-eligible but not enrolled on admission and were subsequently enrolled during the admission. If you're thinking about the measure, do you ascribe it to a patient who was previously uninsured or to one who was a Medicaid recipient? I'm not sure how often that's an issue and whether or not that's something that needs to be considered explicitly when deciding how to categorize this.
Dr. Dougherty: Yes. There was one article on that when we were looking at the inpatient measures during the SNAC [National Advisory Council on Healthcare Research and Quality Subcommittee] process. But Patrick, I don't know, do you want to say anything more about that? That one article suggested that for about 5 percent of people who are listed or recorded with Medicaid as the expected payer, Medicaid does not end up paying. The hospital records it that way, as who the expected payer is. So there's some issue with that, but it's not very big, is what this one study said. But there hasn't been a whole lot of money to look further at that area, so it's maybe a topic for your next intramural research project.
Mr. Gray: Not necessarily where the claim is denied. I'm just saying I'm not sure what percentage of kids who end up being discharged with Medicaid as the actual expected payer actually came in uninsured and had their eligibility established during the admission.
Dr. Dougherty: There was a lot of that.
Mr. Gray: Yes. And so I mean I think that that's probably potentially something that we would want to make a distinction between patients -
Dr. Dougherty: I see.
Mr. Gray:—and patients that came in as Medicaid-eligible.
Dr. Dougherty: Okay, so why should the Medicaid program be responsible for a hospitalization of a kid?
Mr. Gray: Well, I mean, I'm just saying that it ought to be thought about. There may or may not be reasons to classify them one way or the other, but the idea is to think about that as a specific issue.
Ms. Brach: Hi, Cindy Brach from AHRQ. And I'm just pointing out that that's not just an inpatient issue. There's the whole question of how long a person should be enrolled before you hold that payer accountable. And in the case of Medicaid in particular, there's a large amount of churning, so kids stay on for very short periods of time and aren't there for the 12 months that you would need to properly calculate some of the measures that you would like. The issue I haven't heard raised is how to measure quality of care for kids who have brief stays on Medicaid.
Dr. Dougherty: Yes, that's something for the subcommittee work, the SNAC work, that Jenny Kenney did a paper on, measuring duration of enrollment. But she did it as a quality measure, and the next step I think needs to be how do you have a common denominator so that you can address these issues?
Ms. Brach: That's a different issue which is about retention and continuity of coverage as a measure of quality.
Dr. Dougherty: But I'm saying the next step now is to develop a denominator so that you're not excluding a lot of kids—like she found 25 percent of kids would be excluded from some National Committee for Quality Assurance (NCQA) measures because they weren't enrolled long enough. So how do you adjust for that in your denominator? And that's a big, big issue that needs to be addressed. So it's high on the priority list, even though we haven't announced the priorities yet. But it certainly is, so kids don't just get excluded from the whole quality measurement process. Okay, anybody want to say anything about that from the hospital people? Okay, meta issues.
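As a small illustration of the denominator point just discussed, a developer could tabulate how many children would be dropped by different continuous-enrollment requirements before settling on one. The sketch below uses made-up enrollment counts and simplifies continuous enrollment to a count of enrolled months:

    # Hypothetical: months of Medicaid/CHIP enrollment in the measurement year, one value per child.
    months_enrolled = [12, 3, 12, 7, 12, 2, 11, 12, 5, 12]

    def excluded_share(months, required):
        """Fraction of children dropped from the denominator by an enrollment requirement."""
        return sum(1 for m in months if m < required) / len(months)

    for required in (12, 6):
        print(f"requiring {required} months of enrollment excludes "
              f"{excluded_share(months_enrolled, required):.0%} of children")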
Dr. Parry: Yes, I'll—this is Gareth Parry from the Institute of Healthcare Improvement. Yes, we had a lively discussion about meta measures.
Dr. Dougherty: Issues, actually.
Dr. Parry: Yes. One of the things we did think about, and I think we agreed with someone who mentioned it over here, was that first series of criteria, the clinically sound and accurate one. We felt that can probably be deleted, especially if all these other kinds of criteria are met. We kind of had our composite, pulling-things-together hat on, so we thought we could probably delete that one. But within that first raft of validity properties, here are some of the key kinds of issues we came up with around meta measures or composite measures or whatever you want to call them. Clearly, each of the individual measures which would go into a composite needs to be well-defined, as indeed must the overall composite measure. But in terms of reporting out a composite measure, we think it's also important to report out what the individual components are too, especially if these things are going to result in some kind of improvement.
We also talked a little bit about what would make up a good kind of composite measure. We talked about perhaps having things that go together, which in their total are more than the sum of their parts, otherwise we may as well just report these things individually. And we also talked about the idea of timeliness, that in putting together a composite measure it might be a good idea to put together things that are supposed to be done around the same time. And yes, we thought an awful lot about process measures here, actually.
We talked a lot about the idea of things that are supposed to be done around the same time period. It didn't really seem to fit, to have a composite measure that would work, which said do something now, do something in 6 months' time, and then maybe in a year's time, but we kind of thought that if you're going to put a composite together that is meaningful, have it focus on a fairly short time period.
Things around face validity, content, construct validity. First of all, face validity, actually I think we did think this was very important, especially when we have composite measures. If a measure, especially a composite measure, doesn't make sense to the kind of front line staff or the people who are being—who potentially could be measured according to this, they're not going to engage with it, and if they're not going to engage with it, you're not going to see any improvement. So we thought face validity is actually very important if we really want these measures to end up being used for improvement. Let's see, what else.
Construct validity, we also thought—we looked at this piece which said it correlates with other measures of the same aspects of care, and we thought that especially in a composite measure might be hard to do. And we also thought therefore that maybe we should link it all up to see how it correlates with outcomes as well. So there should be a strong sense that these composite measures do in an evidence-based way correlate well with outcomes.
Then, the measure demonstrates quality of care provided. We thought maybe we should think of that a bit more in terms of whether the measure is actually sensitive enough to demonstrate improvement, so that if a composite measure changes, we can actually be confident that improvement has occurred. Again, we were thinking about these things being used as quality improvement measures; they often are treated as kind of static measures.
Yes, on the comparability, or comparable, bit, we may not have quite understood the definition there, but we talked about this in terms of comparing States, organizations, entities, or whatever. We thought one key thing a composite measure will need to do is to be usable in a way that can actually identify excellence, I suppose, or so that it can actually discriminate well. One of the big problems in putting together a whole pile of measures is that you get this kind of automatic regression toward the mean, and everybody just ends up in the middle. There's a key piece here where we need to make sure that doesn't happen, and that, again, goes back to some of the stuff we talked about before, about how you actually put together measures that would add up to more than the sum of their parts.
I hope I'm—yes, going down the inter-rater reliability piece, I suppose the answer to that is yes. And then going down a little bit, let me see, exclusions. We started to talk about exclusions as we were wrapping up, and we didn't really finish this discussion or come to any consensus on it, but we thought that there really does need to be some kind of clear definition of what happens when a patient's preference is that they don't want something. We think there needs to be a clear definition and a clear approach for that.
The only other piece I think we talked about, and other people can chime in if I've missed anything that was important here, was the risk adjustment piece. Because we were thinking about these in terms of process measures rather than outcomes, we didn't think that was such a big issue here. If anyone in my group thinks I've missed anything, I would be happy for people to add.
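To make the composite discussion concrete, two common ways to roll up a set of time-linked process steps are an opportunity (per-step) score and an all-or-none score; the all-or-none form tends to spread entities apart more, which speaks to the concern about everyone ending up in the middle. This sketch uses invented data and is not a method proposed by the group:

    # Hypothetical: for each patient, whether each of three related process steps was done.
    patients = [
        {"step_a": 1, "step_b": 1, "step_c": 1},
        {"step_a": 1, "step_b": 0, "step_c": 1},
        {"step_a": 1, "step_b": 1, "step_c": 0},
        {"step_a": 0, "step_b": 1, "step_c": 1},
    ]
    steps = ["step_a", "step_b", "step_c"]

    # Opportunity score: share of all eligible steps completed (tends to cluster toward the middle).
    opportunity = sum(p[s] for p in patients for s in steps) / (len(patients) * len(steps))

    # All-or-none score: share of patients who received every step (discriminates more sharply).
    all_or_none = sum(all(p[s] for s in steps) for p in patients) / len(patients)

    print(f"opportunity score = {opportunity:.2f}, all-or-none score = {all_or_none:.2f}")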
Dr. Dougherty: Okay. Anybody else? So this is a lot of—a lot to get us started, and we need to organize this. We're not organizing it today, but does anybody want to point out some themes that they heard here that we should be especially careful to attend to? I mean, we will be doing kind of an overall synthesis tomorrow afternoon assuming we're all here, but it's just information overload. That's how I feel, so that's why I'm asking you to come up with some things.
Okay. Well, thank you very much. I mean, this is really terrific. We're really getting down to what's important for children's measures and where we need some more documentation and criteria and so forth. So I think this is extraordinarily useful to us, to the awardees and to the states as they will be reporting to the Centers for Medicare & Medicaid Services (CMS). So I don't know if you want to say anything, Barbara?
Ms. Dailey: I'll wait until the discussion is done tomorrow.
Dr. Dougherty: Okay. So tomorrow morning we start here at 8:00. So you can come a little bit earlier if you want breakfast. There will be a shuttle.
So before we end for today, if anybody wants to, they can come up confidentially or shout it out to everybody else and make some suggestions for improving our process for tomorrow. In the morning we're going to start out with a couple of very brief background presentations on what all this health information technology (health IT) stuff that's happening is about, how it relates to CHIPRA, and how it provides some opportunities for the core measures, the core measure sets, and so I think that'll be helpful. It will be helpful to me. And then we'll get into talking about the feasibility criteria and the extent to which we can use certain elements of EHRs and other health IT. There are health information exchanges and other things too—what the criteria should be for these awardees, how far they can possibly go with the EHR and health IT criteria, at least in the beginning and then moving forward. So that should be interesting.
And then we will get into—we'll try to wrap up some of what we've heard here, trying to go back to the fragmentation in payment and delivery systems for children, make sure we've got it covered there, and then going back again to the disparities issues to make sure that we've got some—not consensus, but you know, that we've covered all the bases to the extent we can on racial and ethnic disparities in particular. And we'll cover the children with special health care needs as a disparity issue too, so. Any questions, comments, or suggestions?
So go back to your hotel and put on the Weather Channel.
Adjourn
February 25, 2010 - Morning Session
Dr. Dougherty: Good morning. Welcome back. I see it's a bright and sunny day. I have just a couple of reminders about the agenda for the day. We are hoping to be able to do three breakout groups: one on feasibility criteria, one on criteria specific to children that would bring together the previous criteria and see if there's anything specific to children, and a third on racial and ethnic disparities, again bringing the other criteria together and seeing what is specific to identifying racial and ethnic disparities. And so disparities affecting children with special health care needs will be in the children's group.
So we're going to start. Just a couple of reminders—if you are asking questions, making comments, could you please say your name for the benefit of the transcriptionist?
Another reminder—I understand and not surprisingly by the end of the day yesterday, some folks had kind of lost track of who our target audiences are. The target audiences for these criteria are going to be the awardees for the Children's Health Insurance Program Reauthorization Act (CHIPRA) Pediatric Quality Measures Program, which is described—well, there is a section of the law for the Pediatric Quality Measures Program.
Unfortunately, that announcement is not out yet, so I can't say too much about what that is except for what was on the Web site and in the grants.gov guide back in November when we had hoped to release the announcement, which is that these will be cooperative agreements, which means for-profit entities are welcome to apply. It also means that the awardees will work very, very closely with each other and with the Agency for Healthcare Research and Quality (AHRQ) and the Centers for Medicare & Medicaid Services (CMS).
So that's our opportunity for future development of criteria if we don't quite get to the really nitty-gritty specifics at this meeting. So your effort here will not be wasted.
And the users will be the awardees of the CMS Quality Demonstration grants in the quality measurement efforts that they have said they are going to do. One of the foci of one of the States is school-based health centers. But it is a very diverse and good group of awardees.
The other reminder is about what level we are measuring at and who is going to be doing the reporting: it's the State level. Of most interest to CMS is that the States have to report. That's in the legislative language.
And also, there is the American Recovery and Reinvestment Act (ARRA) language that is going to be closely connected to CHIPRA somehow, which Michele Mills and Jon White are going to explain today. So we're interested in the electronic health record (EHR) and how far you can get with that.
So the focus of our first breakout is feasibility, with a heavy emphasis on the health information technology (health IT) and EHR components. And because it can be hard to keep track of what's happening with health IT and EHRs, this morning we are going to ask Michele Mills from CMS to give us a little background on that.
And then Jon White, who is the health IT lead here at AHRQ, is also going to give a little background on the overall picture as well as the AHRQ and CMS partnership on creating a model template for EHRs.
Dr. White: Thank you and good morning. My name is Jon White. My official title is Health IT Portfolio Director here at AHRQ. I've been here for 5 years now.
I want to start off by saluting those of you who stared in the face of impending doom from snow, only to see the threat fade in the face of your strengths. So, well done.
I also want to say it is a pleasure to see some friends here, Denni McColm, a long-time AHRQ grantee, and expert health IT user. And on just a very personal note, I want to welcome Kevin Lorah to AHRQ. It was completely unexpected when I walked in yesterday.
So 5 years ago, I was a family doctor. And I was delivering babies in Lancaster, PA. And Kevin is a neonatologist in Lancaster, PA. So I would catch them, and I'd hand them to Kevin. And Kevin would buff them up and hand them back to mom. Or he'd say I'm going to take this one back to the workshop for a little bit, and kind of go back and buff him up a little bit, and then bring him back and hand him to mom. So it was a delight to see him. And Kevin and I still haven't had a chance to talk and say hi. But we will.
So thank you for coming. Let me just give you a little sense for what's happening in health IT, especially with the Recovery Act. And then Michele will connect the dots for you.
So health IT has long been of interest. AHRQ and its predecessors have actually funded health IT research for 30-plus years, in a lot of different places, whether it's Intermountain Health or the Regenstrief Institute or what is now Partners Healthcare up in Boston. So it's been of interest to the Agency as a means to improving health care quality.
And for the past 5 years, Congress has set aside part of AHRQ's budget to be able to fund research and synthesize the best evidence about how you do that, improving quality using health IT.
For a long time, it's been talked about, the interest has been mounting. And then 1 year ago, the world changes because the Recovery Act passes. And of the 400 pages of the Recovery Act, 100 of them are dedicated to health IT and establishing the Office of National Coordinator (ONC) and setting up all sorts of advisory councils and, most importantly, setting up an incentive program, okay, where for the first time, the Federal Government, in a very substantive way, will pay doctors and hospitals and some other folks to adopt and—this wonderful phrase of art, meaningfully use, health IT. You know as opposed to the meaningless use, which has happened previous to this. Now it is meaningful.
So—which has been a really fascinating concept and it's been, you know, this was not in the plan when I went to medical school, but it has been a really fascinating time to be in the Federal Government and watch all this unfold.
In essence—and the meaningful use component is what I really want you to be able to pick up on because it's not just that, you know, the doctor buys the equipment and bitches about it for a while and then eventually gets back up to speed with productivity. They have to also then be able to do certain things with that and demonstrate that they can do certain things with that to get their money from CMS.
And when we talk about their money, it's for doctors and hospitals. It is through Medicare and Medicaid, okay. And the estimates are that it is on the order of $20, $30, $40 billion. It's not specified. It's formulaic, okay, but it authorizes CMS to make those payments out. So nobody knows exactly how much.
But there is a regulation out on the street now, notice of proposed rulemaking, a draft regulation that specifies what the incentive program looks like. The comment period is up until March 15th. There will be a period of time. And then a final regulation will happen. And that regulation will specify the payment rules for 2011, when the program starts, and to a limited degree, 2012—not completely. I think there will be some tweaks in the payment regulation for 2012.
So it is still forming. It's not set yet. But it's on its way. A key part of the definition of meaningful use, as established by Congress and then further promulgated out in this regulation, is the reporting of quality measurement data through these systems.
The way it is set up right now in the proposed rule, a number of specialties are called out specifically. Pediatrics is definitely one of them. A number of measures are put in there for 2011, with a proviso that A, that may change, and B, it can definitely change in future years.
Let me tell you what the reality is. The reality is that most of the systems that exist right now currently cannot do this. I'll just, you know, tell you that.
Denni is working on the grant, and she can probably tell you very explicitly the challenges that go along with actually doing this. It is a real challenge.
For the first year in the rule as proposed right now, providers are asked to attest to the fact that they have gathered these data. And then in 2012, provided the Secretary can accept it, the providers are going to be asked to send the data in. Where, how, don't know yet. Still working on that, okay, so it's still under development.
So the key thing that you need to know as you move forward with this terribly important work that you are about today—yesterday and today—is that maybe not in 2011, maybe not in 2012, but in 2013 and in years subsequent to that, there likely will be an expectation that the measures that are decided upon get baked into the information systems. And a means to gather that data, aggregate it, interpret it, report it out is going to be baked into these systems.
So as you go about this work, I know you're going to spend some time doing this, and I'll stop mostly on that and just talk briefly about the pediatric formats that are going to be a key part of that. So as you go about the work that you do, that's the background that you're working on.
Just very briefly, part of the CHIPRA legislation was a demonstration or, you know, the requirement to define—I'm sorry. Erin Grace is in the back. She is part of the health IT team. Erin has been the Project Officer and, in what I genuinely consider a shining example of good intergovernmental working relationships, has been just a paragon of virtue working with CMS around the establishment of this task order.
CMS and AHRQ have partnered up to compete and award a task order to establish pediatric EHR formats as required by the CHIPRA legislation through one of our contract mechanisms. So that project has been solicited for, but it is not yet awarded. They're still going back and forth in negotiating.
But there are two key things to understand about that. Number one, over a period of time, these folks are going to develop not a new EHR, okay, but formats that will be expected to be able to be used by pediatricians using EHRs, and that they will meet the needs as laid out in the CHIPRA legislation. And quality measurement is definitely going to be a part of this.
The other thing to note about that is that you are aware of the Medicaid demonstration grants that were awarded recently. Those folks are expected to interact with the development of those pediatric EHR formats.
So just very briefly that's a touch on that. And you've plumbed the depth of my knowledge. If you want to know more, talk to Erin.
So thank you very much. I hope I didn't overextend my time. And I will relinquish to Michele.
Ms. Mills: Thanks, Jon. I'm excited to come and talk to folks here today about this. And I'm going to extend what Jon was talking about and tie it back to the overlap with CHIPRA. I've been one of the folks working on the Notice of Proposed Rulemaking at CMS. And also I came out almost exactly a year ago to help implement both CHIPRA and the Health Information Technology for Economic and Clinical Health (HITECH) Act.
I came out from the Chicago Regional Office for the purpose of making sure that the Health IT parts of both the Recovery Act and CHIPRA didn't fall through the cracks because there were parts of both. And our interim center director thought we needed one person to make sure that we didn't have elements—the overlap elements falling through the cracks.
So I've been working on this specific issue for about a year now. And so I'm excited to talk to you about this today. Taking a step back, we had this pediatric core measure set that everyone here has been working on now for the last 6 months or so. And we had the pediatric measures that were proposed in the Recovery Act under the HITECH Notice of Proposed Rulemaking that was just published about the same time.
So if we look at this as a Venn diagram, we had about four measures that were overlapping in the middle. For the voluntary measure set that you folks have been working on, we have measures that States can't pay for—or maybe they are paying or proposing incentives under managed care or some other activities in their States now—but we're hearing from States that they don't know how they are going to implement or pay for these programs in their States.
So we're looking for ways to help leverage HITECH with the CHIPRA core measure sets, since it is a voluntary program. We have right now four measures in that interim space in the middle. So we want to look out on the horizon for how we can extend that electronic set of derivable measures for the long run.
I think what we want to talk to you about today is how we can continue to think strategically from what we're doing now—the activities this week and what we will be doing with the rest of the measure activities going forward.
I think that with what Jon was just saying, the program going forward will be—we want to look at alignment where possible and—let's see here—
Dr. Dougherty: We can just have people ask you questions.
Ms. Mills: So I had a couple of things I wanted to say, and Jon said about four of them. So let's do that. Let's just go to questions. The idea that I was just trying to get across was that we want to make sure that as many of the measures that we began to look at are going to be electronically derivable as possible. We know that many of the measures are going to be survey-based. And that's necessary to cover a number of the populations and components of the program that you are looking at.
But going forward, we need to continue to have overlap with HITECH as much as possible because we need to leverage the program or otherwise the CHIPRA measures won't be successful and States won't be able to implement them.
Dr. White: I just can't emphasize enough how bureaucratically important it is to—and I say that, you know, without smiling, which is something—how bureaucratically important it is to have somebody like Michele do exactly what she just described, which is keep their eye on both things.
It is not hard at all for trains like that to uncouple, you know, with the result being a collision down in the stockyard much later. So you've just described a terribly important function.
Dr. Dougherty: Yes, Barbara?
Ms. Dailey: I just wanted to add two more points. Thank you very much, Michele, for that.
One of the other points I wanted to highlight in terms of the work that Michele has also done, we've mentioned the partnership that we've had between CMS and AHRQ as being very successful.
One of the things we found between ARRA and CHIPRA is we've actually had to talk to a lot of Federal agencies. And we've had so many interesting discussions and successes. And our work with the Office of the National Coordinator has also been one of those significant successes because they also were working on a regulation at the same time in terms of the certification technology.
All of that intermingles. And as Jon was just pointing to, we all have to make sure that we are aligning all of these efforts in order for providers and States to be successful with these efforts.
One of the points I wanted to make when you go into your breakout groups is specifically for Medicaid, there are five eligible provider types. And this is why it is significant because it goes beyond physicians.
We have physicians, but there is a focus on pediatrics, as was mentioned. Dentists are eligible for incentives. Certified nurse midwives are eligible for incentives, as are nurse practitioners and physician assistants.
This is significant for Medicaid because of the types of complex needs that these children have. We've been talking about medical homes and alternate care settings; all of this comes into play in how we are going to utilize exchange of information, electronic health records, and interoperability of these systems in various care settings.
There could be telemedicine to a rural area that may even—how are we going to collate this information and demonstrate that these children are getting quality care? So I just wanted to put those thoughts to you when you are brainstorming that, you know, there are acute care hospitals, there's children's hospitals. But we are looking at how this is also going to interplay with alternate care settings. So I just wanted to mention that.
Dr. Dougherty: Thank you, Barbara. Okay, questions?
Mr. Young: John Young at CMS—this is more of a point than a question. Just a couple of things piggybacking on what Barb has had to say. For the medical home also, the interfaces that sort of work beyond just medical care but looking at the interfaces of public health, looking at the interfaces with other activities, I mean in terms of school-based health centers and so forth, so that's part of the equation as well.
And I've got to admit, one of the things that I thought would be very difficult early on was this sort of fusion between ARRA and CHIPRA because the intent is a little bit different. CHIPRA is in a developmental stage where we're looking at experimenting and assessing activities that States can do and can't do within their programs, whereas HITECH and ARRA are a little bit more mature in that sense.
So I think going back to what Michele was saying, that whole fusion between the two becomes critically important. How do we work down that path without extra burden on our providers and managed care plans and States as well? So that's what makes that exercise I think that much more important.
Dr. Dougherty: Well, can I start with one question that I've continued to ask my CMS colleagues? And tell me if you can't answer me. The fact that there are four measures in the middle of those Venn diagrams, CHIPRA and ARRA, does that mean that when we're really talking about what core measures you want States to choose to voluntarily use, we're only talking about those four? Or are you talking about the whole CHIPRA core measures set?
Ms. Mills: And could you tell us what those four are?
Ms. Dailey: Sure. The four measures that overlap are body mass index (BMI) for children 2 to 18 years of age. It is a National Committee for Quality Assurance (NCQA) measure.
Dr. Dougherty: Is that BMI documentation? The same as the CHIPRA core measure?
Ms. Dailey: These are the ones that overlap, yes. Followup care for children prescribed medication for attention-deficit/hyperactivity disorder (ADHD), annual hemoglobin A1c (HA1c) testing, but for CHIPRA, and this is where we get into the specifications, it's targeted obviously for children and adolescents with diabetes versus the NCQA measure, which focuses on adults. Is that right, Sarah?
Dr. Scholle: Yes.
Ms. Dailey: That one is for adults specifically?
Dr. Scholle: I believe that it—the endorsed measure is for children. We have both.
Ms. Dailey: Oh, thank you. Okay. And then the last one is appropriate testing for pharyngitis. Okay. So those are the four measures.
To answer Denise's question, for CHIPRA, the intent is for voluntary State reporting for as many of the core measures as possible. Obviously, the four—we wanted to have some overlap, as mentioned by Michele, because we want to have some financial support for States to be able to at least pursue those measures, if possible.
One of the things that we're really curious to see is how the final comments come in. We're in the last week now of public comments for the initial core measure set in terms of what the States and various users anticipate as being problematic.
We have until next February to release the procedures and the approaches that we're going to be recommending to States to use in voluntarily using these measures. But this is new territory for us. I mean Medicaid is in its infancy in terms of fully utilizing the full quality improvement (QI) and management cycles.
So we have a lot to learn, and these are complex kids. It's all kinds of populations. They utilize all kinds of care settings. And so it is a learning process.
And we wanted to start somewhere. And that was the purpose of the Subcommittee on Quality Measures for Children's Healthcare in Medicaid and CHIP (SNAC) this summer: trying to use what we called the grounded measures, measures where there had been some experience, some scientific evidence base, to show the meaningfulness of those measures.
But it is new ground. And we recognize, again, for States and the position that they're in in terms of serving their beneficiaries, how are they going to be able to apply resources to this?
So my final note on this to move the quality agenda forward is we're also—this is the year we're finalizing a report—our first report to Congress on quality. And we're going to be having a couple of sessions with State and Medicaid CHIP directors and then a separate one with national stakeholders to get their input. What do we want to tell Congress we need to move the quality agenda forward?
And obviously resources are a big one. But we need to be specific. If you were given resources, how would you use them? And as for how we tie this into EHRs and health IT, this administration has been extremely focused on having this be a successful endeavor. And it's helping us to move our quality agenda.
But maybe the need for resources hasn't been put in the right place. And so here is our chance to use our voice. And so we'll be looking forward to talking to you more about that in the next couple of months.
Dr. Dougherty: And you did mention to me that what's happening here and your suggestions or recommendations will be fed into, you know, information or requests for resources about specific different criteria building.
So, Colleen, you had your hand up? We'll do only—sorry—only do about 5 minutes for questions since we're already 15 minutes behind. And then go into our breakout groups which, Michele and Jon, you are welcome to join, and Erin as well. So go ahead, Colleen.
Ms. Reuland: I was wondering, since the CHIPRA legislation talks about having the measures stratified by groups (race, ethnicity, children with special health care needs), are you guys exploring in the model EHR format methods by which patient-reported data can be implemented in the EHR?
So, for example, children with special health care needs, there has been a lot of work to develop a non-condition-based approach. But it is a parent-reported or patient-reported measure. If it could be inserted into the EHR, it could be used to stratify the data.
So are you guys looking at those elements in trying to keep coordination in considering that important avenue?
Ms. Dailey: Yes, we're very early in this phase. So we're exploring all kinds of options. And I know that there has been discussion about that.
I don't know, Erin, did you want to say anything to that point? I mean it's basically too early to really have any specifics that we can provide.
Ms. Grace: Erin Grace with AHRQ. In the solicitation that we put out to the offerors, we referred them back to the CHIPRA legislation and the import of the kinds of things that CHIPRA referred to. So as Barbara said, we're early in the process. And so we haven't, you know, finalized who is going to get this. But this is certainly something that we're trying to be aware of as we work with the contractors.
Ms. Reuland: And are you guys, when you are coming up with the model format, are you—is there going to be a separate process in which you work with the major makers of EHRs to try to incorporate that? Because just in working with Kaiser, it's been—one region of Kaiser can't share what they do in their EHR with another region of Kaiser.
So is that—given that they are normally for-profit entities, how do you guys see that working?
Ms. Grace: Most of the proposers have in their proposals ways to work with vendors. And for those that didn't, when we're doing the negotiations, we're certainly aware, as we are selecting the contractor, of how they are going to be incorporating vendors.
And also the CHIPRA legislation and then hence the solicitation was very specific on a dissemination plan of the model format, and obviously vendors are a key audience in that.
Ms. Dailey: And under the CHIPRA provision, for the EHR format, it's actually an EHR program which CMS is still fleshing out. So we'll be providing more information.
But as an example, one of the components that we're pursuing as part of that program is how we're going to have outreach to parents and caretakers to educate and encourage them to use EHRs. So the point in terms of how patients can actually have their information included is being evaluated.
Dr. Dougherty: Nora, you had your hand up?
Ms. Wells: My name is Nora Wells, and I'm from Family Voices, a national organization that speaks on behalf of children with special health care needs and their families.
And I would just like to encourage—kind of building on Jon, your point, and a couple of points that have been made here, definitely now is the time to get people involved in the thinking about how consumers are going to have a role here.
We've had a little teeny tiny input in some of the EHR pieces and the health IT. And we know there are a lot of issues and concerns. And there are clearly, I think, avenues for us to start—you know, this is an opportunity, as was said yesterday, and I think we have to start at all levels to include those voices in the thinking and the discussions.
So I just want to applaud what I heard about this morning, which was the opportunities that these grants are giving for youth, a project that has just been funded, right, for youth to actually report on their own care. It has nothing to do with EHRs, but it is the principle of it.
So you have awarded one of these grants. I think we need a huge push—this is what the legislation says really—and I think all the history has been in a different direction. So it's kind of like a change of direction.
Dr. White: I just want to say one or two or three things about that. The legislation is obviously very, you know, doctor and hospital focused.
Dr. Dougherty: Which legislation? Not CHIPRA?
Dr. White: Not CHIPRA. The Recovery Act and the HITECH stuff in particular. Here's my expectation over the next 5 years, okay? There is going to be a lot of angst and, you know, blood in the streets, and by the end of that period of time—there will be a reasonably broad installed base of these tools in the health care system. Okay?
We have some lead time here to be able to look at what those needs are. And at least at AHRQ, we've already started looking at that.
If you go to healthit.ahrq.gov, there is actually a nice report on a series of 20 or 30 focus groups that we held around the country asking consumers about health IT and what their attitudes were about it and really what they wanted from their health information. As you know, that's a woefully under-explored issue.
What I would love to be able to do is start things in motion now so that 5 years from now, when these projects are ready to bear fruit, we are in a place to answer the questions now that we've got this installed base about how it should be used to be able to address exactly those needs and those issues that you want.
So yes, I completely agree. And there are things moving in that way.
Ms. Mills: Additionally, in our Notice of Proposed Rulemaking we identified a number of places where Cindy Mann, our Medicaid Center Director at CMS had asked us to point out areas where we will look at duals and children and children with special needs and CHIPRA. These are areas of priority for us for health IT. And we will continue to look at these things.
We're hoping to get a lot of feedback and comments and places where folks have provided data for us for the final rule so that we can continue to consider these issues. But these are areas of priority, especially when they have an impact on the consumer population.
Dr. Dougherty: And your comments on the final rule are due?
Ms. Mills: March 15th. And then we do expect to have a final rule on the street by the early summer, June.
Dr. Dougherty: Okay. And the comments on the CHIPRA measures are due March 1st. So you have the Federal Register notice, at least one copy on your table. Yes? One more question.
Ms. Hess: Cathy Hess, National Academy for State Health Policy. And I'm sorry, I missed the presentations. But I'd be surprised if you touched on this.
Just a question, and also I guess a plea, and Cindy Mann may have raised this also, that as we work on health IT and health information exchange (HIE) that we think about enrollment and retention needs as well. There is a whole movement going on to try and use technology more effectively for that purpose.
And having kids enrolled, as we know, is essential to the ability to implement—to have them access quality care and to be able to measure that care. So I don't think there has been a lot of connection going on between those. I don't know if you can speak to that a little bit.
Dr. White: Yes, you know, one of the neat things about the legislation, there's a lot of provisions in there. There is a provision for State grants. And ONC had to decide where to put down their chips. They had about $2 billion in discretionary funds to put down. And they put down $600, $700 million into grants to States to develop HIE, in particular.
Now that's key both—you know, it was established in the legislation also on how they rolled it out—just a couple of weeks ago, ONC awarded 40 State grants. And those groups are going to be key. Absolutely key.
Erin—I'll let the cat out of the bag, you were one of the reviewers—and actually Erin, in a nice confluence of events, was also a project officer for contracts that we had running for the past 5 years called the State and Regional Demonstrations, which really, in many ways, are prototypes of what has rolled out subsequently.
So those folks are going to be focused on it. And I promise you the State governments are involved. And the State Medicaid programs, I'm pretty sure are extensively involved in the applications. Is that fair to say?
Ms. Grace: Yes, they are. The State applicants had to have a letter of support from the State Medicaid Director. And there has been a lot of overlap between CMS and ONC in terms of reviewing the State Medicaid health IT plans and also CMS staff reviewing the State HIE plans, which were required for the grant applications.
So that goes back to what Barbara was saying in terms of a really successful and strong overlap in collaboration between the ONC and CMS.
Ms. Dailey: And right now CMS has had a total of 43 States that they've been working with on those plans. So—
Dr. Dougherty: So we could talk about this.
Dr. White: Yes, as the life blood of money pumps out from the Federal aorta and diffuses out to the capillary beds, that's where that diffusion exchange is going to happen. And that's going to be a key place to monitor it.
Dr. Dougherty: So thank you very much. And we will take your wisdom. Obviously there is a meeting at least every week, right, on this topic that goes on for days. So children aren't always the focus of the meeting, but they are definitely in there.
Ms. Dailey: And one of the things that I think shows Cindy Mann's dedication to this is within 3 months of her starting at CMS in June, she had a new Deputy Director specifically brought on to focus on quality and health information systems. So she is very focused on this.
Dr. Dougherty: Next we'll turn our attention to the OPM, Office of Personnel Management, and health IT.
Ms. Mills: And she is going to be talking about this to the National Governors' Association (NGA) today, too—this issue.
Dr. Dougherty: Great. So—and I'm sure she'll bring up children and CHIPRA. So—
Ms. Mills: Yes, she's talking about the CHIPRA HITECH overlap.
Dr. Dougherty: Okay, great. Well, thank you very much. And feel free to stay and offer what wisdom you can. People are going into their breakout groups to—now—yes, Patrick?
Dr. Romano: I just—on the agenda, it says that we're supposed to do application to examples.
Dr. Dougherty: Yes.
Dr. Romano: I'm wondering what that means. Did you have specific examples in mind?
Dr. Dougherty: Yes, the examples are the measurement topics. And they are the same that we had yesterday. So—and here are the room assignments. And I have the thumb drives up here if you want to come and get them.
And do people still have a sheet saying where they should go? Actually, unless you've been asked to be a facilitator or a reporter, you can go wherever you'd like. So—if you have any questions, come up and ask. And, again, this feasibility list is a very long list. So I would encourage you, since we only have 40 minutes for this, to—somebody still has a thumb drive out so please go into a group and share it.
Dr. Dougherty: Okay. Thank you all for doing speed dating on two pages of feasibility criteria. It couldn't have been easy. So who would like to go first with their guidance that we—okay, Ellen Schwalenstocker, say your name and where you are from. This is how we're all introducing each other.
Dr. Schwalenstocker: Okay. I'm Ellen Schwalenstocker. I'm from the National Association of Children's Hospitals and Related Institutions. And I was in the medical home group. So I'd invite my team members to just jump in if I'm missing a key point.
To preface our remarks, we thought it was important to define exactly what we were talking about with regard to medical home, meaning are we talking about actual measures of medical homeness? Or were we talking about measuring the effectiveness of the medical home? So we chose the former. So our responses are based on feasibility about measuring medical homeness. But we also noted that how we answered the questions would differ if we were actually evaluating the effectiveness of the medical home.
Dr. Dougherty: That's right. And, you know, we could have gone the other way, toward underlying scientific soundness and the need to link the measures, the outcome measures, to the process measures. But we had that from yesterday so—
Dr. Schwalenstocker: Right, so, the other thing that—to just preface our remarks with is that the work with regard to this construct is very nascent. And we made the grounded assumption, if you will, that probably a lot of these measures would be survey-based, either a survey of the provider or a survey of the consumer.
So with regard to the individual criteria, we thought, obviously, the data availability and collection feasibility were very important. That we would need clear definitions. And we would probably have consumer measures as well as provider measures.
That the cost of collecting data is likely to be somewhat high, given survey measures. And so we would want grantees to note how they are going to approach the issue of cost in developing the infrastructure to collect the data—having a good idea for the States on what the cost is going to be.
We assumed that the confidentiality issues would be addressed by whatever institutional review board (IRB) process, et cetera, that the grant would go through.
We thought that feasibility with regard to health information technology (health IT) may be too high of a bar to hold, especially for survey data. But that it is a potential testing area. So, you know, are there new technologies that could facilitate data collections of survey data? I think Julie mentioned some work that they are doing that may ease the collection of survey data.
Dr. Dougherty: That's with Consumer Assessment of Healthcare Providers & Systems (CAHPS®)?
Dr. Schwalenstocker: Yes. So, on feasibility of precision of specifications, the words in the parentheses there threw us a little bit; we thought if you were restricting it to electronic health records (EHRs), that it may, again, be too high of a bar to pass. However, the precision in that measure and the definitions are important. If the words in parentheses weren't there, we would say precision is, obviously, an important criterion, especially since there are many different operationalizations of how to measure a medical home.
Again, assuming that we're looking at the measure of medical home, we did not see exclusions as applying. But even if they did, we figured we could handle that with skip patterns, et cetera, in surveys. And that same comment applies to the next one.
Readiness for operational use, again, you know, we noted that there is not really a standardized definition or operationalization of medical home. That we'd want to see measures tested in a variety of settings, especially survey measures, to make sure we didn't have response bias issues, et cetera. But, again, having the measures already in use would be a high bar, since this is an early enterprise.
We should be capable of subgroup analysis with survey data, given adequate sample sizes. And again, we noted that in looking at analyses of subgroups, or analyses of survey data overall, we would want some understanding of how response bias may have affected the results, given the likelihood of having less than 100 percent response.
That is my hesitant summary. Does the team have anything to add?
Dr. Mangione-Smith: Just one thing—I would add that we talked about identification of susceptibility to exclusions, that with survey data, we felt it was important, especially when we kind of brainstormed about ideas about how the surveys would be collected, either by phone, possibly by Web-based surveys, mailed surveys, that you wouldn't have unintended exclusions because of no access to telephone, no access to the Web, survey language issues, that kind of thing.
And that it would be important for grantees to—depending on the methodology they were going to use, to address those issues.
Dr. Dougherty: Thank you. Okay, who would like to go next?
Dr. Thompson: I'll do health outcomes. This is Jeff Thompson. We thought that the feasibility criteria complemented very well some of the validity questions. And so a few additional comments. One on the feasibility of non-health IT data: grantees must have the appropriate resources to address the reliability and retrievability of non-health IT data.
And then on the confidentiality, we said that the grantees may need to specifically look at their mental health and medical privacy rules because when you are actually trying to retrieve some of that data, it may not be available to you because of State rules, especially mental health when it comes to things like attention-deficit/hyperactivity disorder (ADHD).
On the feasibility of health IT, we said attention needs to be directed toward State sources and the consistency of those sources because we may have to marry many sources together—and different locations—and so there has to be a lot of attention to the definitions for retrievability, which may be provider-specific or venue-specific for each of those sources of care.
On the feasibility of precision, we thought precision needs to be examined for every element and documented very well, noting whether that precision is available or not for each element.
Dr. Dougherty: Could you say your first clause again? Your first sentence—there needs to be—
Dr. Thompson: I'm sorry. I've got a cold so it's hard to multitask, too. Which one?
Dr. Dougherty: The first—you said—before you said need to document, you need to be—
Dr. Thompson: Well, precision needs to be documented on every element.
Dr. Dougherty: Okay. Thanks.
Dr. Thompson: On the feasibility of readiness, system capabilities by States and plans need adequate time and attention to resource needs, because especially if you are marrying up several data sources, you'll need to spend a lot of time talking about definitions and how the codes or system codes may have to be married, especially if there are many different sources.
If there is a subgroup analysis, we thought that there should be some agreement and communication with those people that are going to look at these subgroups because, for example, you know, one group might not agree what the ages—age spans for adolescent or child or infant may be, and they'll just throw out all of your data if they don't agree with your age spans or something like that. So you probably need to work with your communities on that if you are doing sub-analysis.
And then on feasibility of statistics, we thought these actually address well the considerations of power, statistical analysis, the estimates, confidence intervals, and those things. And they need to be looked at, and some questions asked prior to reporting out, on how these are going to be compared so that people accept the comparisons.
Dr. Dougherty: Okay. Thank you. Very helpful. Who would like to go next?
Dr. Scholle: This is Sarah Scholle. I'll do disparities. And so we were thinking about the CHIPRA language on disparities related to race, ethnicity, children with special health care needs, and socioeconomic status. And in terms of criteria about the availability of data and retrievability, that one of the key issues that AHRQ and the Centers for Medicare & Medicaid Services (CMS) could help with is to recommend specific definitions for children with special health care needs for each kind of data source. And so there is a different way of defining it from surveys and then from—so there are lots of different definitions out there. And having standardization would help.
The same goes for race/ethnicity. That it should follow the Institute of Medicine (IOM) committee's recommendations for how to use the Office of Management and Budget (OMB) categories.
In terms of some of the other feasibility issues, the group thought that we need to think strategically about how different data sources can be linked to create better information over the long term. And that we should have a focus on low-cost systems and how to do that.
But over time it may mean augmenting data sources. And part of what measure developers should be able to do is to talk about how those data sources could be linked and what the costs are, so that we're focusing on low-cost mechanisms rather than really elegant approaches.
In terms of the confidentiality piece, the focus should be on how to protect data confidentiality. And that CMS and AHRQ should help States get over the reluctance of other parties to share data. And should, you know, make that an expectation that labs will share data and make—and help States work through some of the roadblocks that they are facing, maybe use examples from States that have done this successfully to make that feasible.
And in terms of making data available electronically, again, we don't want to focus only on measures that are available from currently existing electronic data, or only on currently existing measures. We should be thinking about what new measures we could create with electronic data that could be created in a new system, so that there is a path to electronic data collection, rather than saying, you know, you have to just use what is available today.
The last—I want to call out under readiness for operational use, I think our group was concerned that the focus shouldn't be on things that could be operationalized now but rather what could you do in the future. And that there really should be sort of a long-term vision maybe and steps to getting to that vision. And that measures should be able to be stratified.
On the last criterion, about statistical feasibility, we want to be careful because there are some categories, particularly if you are looking at disparities, where you can get very small numbers. And that doesn't mean we want to throw out the measures or not look at those issues. But think about whether we need to aggregate data over years or across States for reporting, rather than saying there is a strict criterion for a measure based on available sample size. That may eliminate some of the things that we're most interested in looking at. Anybody from my group want to add anything?
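As a minimal sketch of that kind of aggregation, assuming hypothetical yearly numerator and denominator counts for a small subgroup and an illustrative reporting threshold, pooling across years before computing a rate might look like this:

```python
# Minimal sketch (hypothetical counts): pool numerator/denominator totals for a
# small subgroup across reporting years before computing a rate, rather than
# dropping the subgroup for failing a per-year sample-size threshold.
yearly_counts = [
    {"year": 2010, "numerator": 4, "denominator": 18},
    {"year": 2011, "numerator": 6, "denominator": 22},
    {"year": 2012, "numerator": 5, "denominator": 25},
]

MIN_DENOMINATOR = 30  # illustrative reporting threshold, not a CHIPRA rule

pooled_num = sum(y["numerator"] for y in yearly_counts)
pooled_den = sum(y["denominator"] for y in yearly_counts)

if pooled_den >= MIN_DENOMINATOR:
    print(f"Pooled 2010-2012 rate: {pooled_num / pooled_den:.1%} (n={pooled_den})")
else:
    print("Still too few cases to report, even pooled")
```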
Dr. Dougherty: Okay. Very nice, very visionary. Okay, we'll move on. We have the trial group and the meta-measure issues and the inpatient group. Who would like to go next?
Glenn, which group are you?
Dr. Takata: Children's health care payers and programs. First we had some general comments we wanted to make—and forgive me if I'm repeating what the other groups have already said—that each of these criteria has to be developed and applied differently, especially in nontraditional settings. We felt, for example, that in schools where data may be collected, as we've discussed earlier, enrollment issues may make feasibility difficult for measure collection as well.
In terms of data availability, we do want to keep that criterion with requiring documentation on data sources and data files and details about both of those. And the comment was, if something was not billable, it would be difficult to get the data.
Feasibility with regard to cost, we would keep that criterion and required documentation might include total cost estimates based on pilots and total cost to everyone in the system and also cost to end users. And I think the National Quality Forum (NQF) application already requires that you state, for example, if there is an end-user fee for proprietary measures.
Let's see, confidentiality, we would keep that criterion, and documentation would be agreements compliant with Federal and State laws. Feasibility with regard to health IT, we sort of echoed the discussion the other groups had that it is sort of a high bar currently to reach so we would not—we would delete that as a current criterion for the measures.
However, that it certainly is a key criterion for the future, and perhaps one could get bonus points for things that can be collected electronically currently. And for the next—
Dr. Dougherty: I'm sorry, the grantee should get bonus points or the—
Dr. Takata: The measure could get bonus points, and I guess the grantee also.
Feasibility of precision of specifications, we had the same difficulty as the first group with the parenthetical part of that criterion. And so we actually split it. We thought the electronic collection component of it could be deleted at present because, again, that's sort of a high bar to achieve. But we certainly agree that precision of specifications is important in general.
And then I think my group might need to help me on our comments but yes, if you have a timeline for movement into electronic data collection, for example where two measures might be equal otherwise, it could be the tiebreaker for those that currently cannot be measured electronically.
The other general comment was that it may be difficult in nontraditional settings to collect things electronically, for example schools. So it is harder to collect electronically across payers and programs, at least currently. And hard to be precise when, again, collecting data by different regions, States, and programs, at least currently.
Feasibility of data sources for exclusions should not require additional data sources; we agree that should be kept as a criterion. Identification of susceptibility to exclusion, we agreed that should be retained.
Readiness for operational use, we thought that was important to keep as a criterion, and if there was documentation on field test results, that would be important to include. Subgroup analysis, we agreed that that should be retained, that we recognize that some States currently may not be collecting the data that would allow for such analysis. And we also discussed the need for standardized definitions in this area.
And actually I think there were only two of us in the room when we discussed the last one. But we both decided yes—yes, we would keep the last criterion, that feasibility regarding statistical issues. So the rest of the group, please speak up.
Ms. Hess: Cathy Hess, National Academy for State Health Policy. He did a great job.
Just to reinforce that we did have a comment similar to, I think, another group that in the area of confidentiality, that there is a need for guidance, technical assistance around that, what are actually legal requirements. There are a lot of assumptions about what they are and what they aren't, et cetera.
And the other thing I probably harped on more than others, but this whole question of the precision of specifications across payers and programs, I think is going to be extremely challenging. And I'm not an expert in these kinds of things but it may come down to a question of balancing the degree of precision with how many different entities you want reporting it.
I mean I was in a group yesterday where experts were talking about, you know, how precise is precise? How valid is—you know all of these things are a bit relative so that you may really need to look at the standard for that when we're talking about having so many different entities reporting.
Dr. Dougherty: Thank you. Okay, I think we have meta and inpatient. Yes?
Dr. Antman: Mark Antman representing the American Medical Association (AMA) and the Physician Consortium for Performance Improvement (PCPI).
Before I start, I'll note specifically that we are thinking of meta measures as applying either to measure sets or composite measures. Some of the considerations may be different whether you are talking about sets or composites, but we tried to consider both types of meta measures.
So, in general, we felt that all of the criteria, all of the feasibility criteria are keepers. They are all important except for one, which I'll get to shortly.
So just briefly, on data availability, yes, that's important for meta measures. Under costs, we noted that it is particularly important for the grantees to separate their development costs from their implementation costs, and in particular to anticipate what their costs will be in the first year and subsequent years of implementation—and also testing costs, if they are going to be testing measures as well. That will be especially helpful to be able to assess return on investment over time.
For confidentiality, again, important but we discussed the fact that for meta measures, whether it be individual measures in measure sets or individual components of composites, the confidentiality of those individual components may need to be considered carefully individually, and perhaps it may be a consideration if there are different confidentiality issues for different components of a composite, that may influence what, in fact, is put into a composite measure.
As to the next two, emerging health information technology (EHIT) and specifications, we felt that the precision of specifications is a must. That's essential. The availability of the data electronically is certainly desirable. We wouldn't say that that is required at this point in time.
As to data sets for exclusions, again, for meta sets or meta measures, we noted that there could be a difference in the exclusions among the different measures, the individual measures in a set or the components of a composite. And that there needs to be acknowledgment of the fact that there may be differences related to the exclusions. That is particularly challenging for measure sets and composites.
And there is the potential that individual exclusions could undermine the validity of a measure. And so the grantees need to note that for each component for these measures.
Regarding susceptibility—identification of susceptibility to exclusion, this seemed to us to be something that may not be identifiable until testing. And you'll hear that that's a recurrent theme for some of the remaining feasibility elements.
And, again, this may be especially difficult in a composite measure. We discussed the fact that noting differences among the individual components that go into a composite would be an important consideration.
I noted that there was one element that we thought could be dropped, and that's the next one, the readiness for operational use, which we felt—which our group felt may be a little bit too much to expect for new measures in development. Again, once the measures are tested, it would be better—it would be more possible to say how ready they are for operational use. But at the front end, we thought that that may be a bit too much to expect from the grantees.
Then the last two, subgroup analysis and statistical, we talked about these very briefly, felt that they are criteria that should be kept. But, again, may not—the feasibility of these elements may not be known. It may not be identifiable until initial testing has been done of the measures.
Then we just had some general comments related to documentation. Documentation from the grantees of all of the elements that we're keeping is certainly important. Our thinking was that higher scores should be given to the grantees depending on the extent of documentation that they provide for each of the elements.
We noted in particular, going back to the IT readiness, that—someone used the phrase bonus points—that would give additional credit to the grantees if they demonstrate the electronic readiness of the measures. And in particular, we noted if they are ready to enter into the Quality Data Set (QDS), that would certainly speak well for their readiness. But again, in general, the more documentation for all of these elements, the better. And I'll look to the other members of the group if I've missed anything.
Dr. Dougherty: Okay. These grantees have a lot of documentation to be done, which is good. I guess we'll have transparency. Now the inpatient group, I think—Patrick?
Dr. Romano: Yes, I'll take the lead on that. I don't want to repeat what others have said, but basically we pretty much endorsed the NQF criteria under feasibility. So we felt that all of these criteria were relevant to some extent, and we didn't really have any new criteria to add.
We did discuss issues related to data availability, and the fact that the medical record, whether it is paper or electronic, may not capture some aspects of care very well. So really this may need to be specified. We don't necessarily want to give up on measures that aren't easily available from the EHR, although that may be an overall goal.
Also, there is a need to consider both the direct and the indirect costs associated with collecting data for a measure, the effect that it may have on decreased efficiency, including bottlenecks within hospitals due to a limited number of provider staff who may have appropriate training. So there is a need to assess the overall economic impact of the measure and not simply say that because a data element is available in an EHR, that it is cheap.
We also talked a little about confidentiality and the fact that the confidentiality issues generally are resolvable in fairly standard ways, such as aggregate reporting or reporting whether a particular test or procedure was done rather than the results of that procedure for an individual patient.
I think we agreed that the EHIT specifications would have to be considered optional but desirable at this point. They may not apply to all types of inpatient measures.
With regard to the data sources for exclusions, we felt one potential issue for hospitals is the ability to link multiple data systems within the hospital—for example laboratory systems, radiology systems, other types of systems—and so hospitals may be encouraged to take advantage of those linkages to meet this criterion better.
We also had some discussion about the need for specific procedures for auditing. In general, grantees should propose some specific provisions for data auditing that can be implemented. Similarly, grantees should propose or describe specific pretesting that the measures have gone through, recognizing that new procedures may need to be established or existing procedures may need to be modified based on the results of that ongoing pretesting.
For subgroup analyses, I think we basically deferred to the disparities group because that's principally an issue related to disparities, but we did talk a little bit about the fact that children are very unevenly distributed across hospitals. And so there are some children's hospitals that take care of the great majority of kids with serious diseases. Some measures, like central line infections, for example, may not apply to the great majority of hospitals because most hospitals don't put central lines in kids.
So there just needs to be an effort to explain what types of providers and hospitals, in this case, an indicator would apply to and which ones it wouldn't. And that really ties in with the feasibility. I don't know if others from my group—
Dr. Gregory: Kim Gregory from Cedars-Sinai Health System. I think the only other thought about the indirect costs was that the cost to report the measure is not necessarily the same as the cost of analyzing the data to drive quality improvement (QI). And that inpatient hospitals needed to know that.
Dr. Dougherty: Okay. That's a good point. Well, I know you can't see everything. I think one common theme I saw was definitely this issue of—well, grantees providing us with good documentation and with specifications, including guidance for moving forward on how the data should be collected and audited and so forth.
So I think that's something we hadn't quite thought through all the way. And that's very helpful.
The other one is about the use of the EHR or other health IT capacity as a requirement. And since we've been—I can't tell you what's in the funding opportunity announcement (FOA), but there is a timing issue here. So, you know, we award the grants in late September, and we have them working on some high priority issues that will change over time perhaps because we know what some immediate high priority issues are. We may identify more as time goes on.
So thinking about what kind of guidance, I'm not sure exactly how long it takes to develop a measure. So on Day 1, would we have the grantees start assuming that they would, for example, be using the QDS for whatever measure even though I'm not sure whether the QDS has applications for medical home, for example, or for race and ethnicity. So—
Yes, Helen, I would like for you to address that.
Dr. Burstin: Helen Burstin, NQF. I was just going to make the point that in general, our experience—and the measure developers that are here can share theirs—but it is probably at least a year before you really get to a measure that you are comfortable with that is developed. So I think the key thing is, as you are developing the measure, to consider what data elements would need to be in your measure if you are thinking about a measure that will eventually live in the EHR.
So it's still a prospective process you can go through even if you are not at the end yet. I think that's still the right way to go especially thinking—and we talked about it in our group actually around the disparities group but also about how important it is, even if your measure itself is not a measure that can be built into an EHR, thinking about how it could be interoperable and connected with these other electronic data sources to get at some of those key demographics and things like that.
Dr. Dougherty: Yes, Mark?
Dr. Antman: Mark Antman from the AMA. I would just add from a measure developer's perspective, I know our staff, as they have been entering measures, the data elements for measures into the QDS, they have found that it is especially important to understand the relationship between different data elements in an individual measure because there must be absolute clarity in the QDS system as to what is dependent on what. So identifying the relationships and the interdependencies of the different data elements is critical.
Dr. Dougherty: Can you give an example for those us who don't do this?
Dr. Thompson: This is Jeff Thompson from Washington State. I can give you an example. Looking at the denominator in clients, one could get it from the client eligibility data source, which we call ACES [Washington State's welfare information system], which is done outside of the agency. Or you could look at your claims data and they don't necessarily correlate all of the time.
In other words, if you've got eligibility data sitting out in the community and what we call our community services offices (CSO) resources in another dataset, it doesn't necessarily—I think it's like 90, 95 percent correlated with eligibility that might be contained in a Medicaid data source with claims data. And that's a threat to validity and reliability.
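A minimal sketch of how that kind of cross-source agreement might be checked, using hypothetical per-client eligibility flags rather than any actual State data:

```python
# Minimal sketch (hypothetical data): compare eligibility flags for the same
# clients as derived from two sources, e.g., an eligibility system and
# claims-based enrollment, and report the share of clients on which they agree.
def agreement_rate(flags_a, flags_b):
    """Fraction of clients on whom the two data sources agree."""
    assert len(flags_a) == len(flags_b)
    matches = sum(a == b for a, b in zip(flags_a, flags_b))
    return matches / len(flags_a)

eligibility_system = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]  # 1 = eligible in that source
claims_enrollment  = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]

print(f"Agreement: {agreement_rate(eligibility_system, claims_enrollment):.0%}")  # 90%
```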
Dr. Dougherty: Okay, so I think when we talk about what's doable now, it seems like we're talking about what States will be reporting to CMS in the very near future for their purposes in terms of the initial core measure set versus development or enhancement of measures in the core measure set that may be coming down the pike, where we have the opportunity to really have people think hard about how you collect this using electronic methods.
Is that—we didn't talk about it this morning, but maybe Erin can help us here—Erin Grace from AHRQ—about the health information exchange (HIE) because as people have said over and over again, you know, just because you have the electronic record or any record from a primary care office or a particular specialist, doctor, doesn't mean you have the full picture of the children's health and outcomes or the services they receive linked to the outcomes.
So is there any opportunity? What would grantees or the States do to make sure that they either work with their HIEs in their States or get the data that are now available?
Is there some sort of way that they—we could make the contact for a particular State or a particular awardee working on some topic and integrate them into the HIEs? So where is all that?
I mean to go beyond the individual physician, even though we have dentists and nurses and so forth, there are other places out there that are collecting data and giving care to kids. Schools, public health clinics for immunizations and for mental health care—is it reasonable to expect any linkages there? Helen?
Dr. Burstin: I think the term "health information exchanges" is too limited. That's implying data flowing through a given region.
I think the key thing for the grantees is to identify how they'll get at data aggregation, how they are going to look towards these other datasets and think prospectively. Even if you can't do it now, as you are building this, how do you make those linkages so you can pull in data from schools or other places to get at the bigger picture?
Dr. Dougherty: But—I agree, but to give the grantees sort of a place to go to see how to do it, how to aggregate the data and make the linkages, do we have any—is there any assistance that the existing HIEs in regions could provide? Have they collected any data from outside—some of them were supposed to be linking hospitals and emergency departments (EDs) and primary care docs and specialists. So can we send the awardees to someplace and say here, if you want to figure out how to do this linking and aggregation, here's a way for you to start?
Mr. Young: John Young, CMS. I think there has to be an acknowledgment of a significant investment that CMS has made in transformation grants, $150 million to about 22 States, that looked at many of the issues you just talked about: linkages from, and I hate to use this term, the nontraditional modes of information flow, from public health, from vital statistics, from other entities within the States. So that is important. And I think the thing that we can do is to go back and look at sharing that information, the summaries that we received from those States that received that $150 million.
Ms. Dailey: And the other point I'd like to add, too, is a tremendous amount of work is just unfolding now by ONC, the Office of the National Coordinator. And so part of our role is, in terms of collaborating with them, being able to have one central location where States really can go for information.
There are so many different types of grants, there is so much work on health IT going on right now, that we really need to coordinate how that information is dispersed. So that's one of the things that we're trying to address, as well as we're developing a technical assistance plan.
Dr. Dougherty: Yes, I'm thinking—I mean just brainstorming here that one of the first meetings that we might have is to bring together the different HIEs and transformation grantees and others that have their own aggregated data or good data systems for school-based health so that people are aware.
I think it is an issue of awareness of what's out there, not only in terms of individual data points but of methods and mechanisms to make those linkages and aggregation.
Erin, did you want to add anything? No? Yes, Denni?
Ms. McColm: I'm Denni McColm from Citizens Memorial Healthcare. And I think it is important to not lose sight of the other thing that an HIE can do for quality measurement and public health reporting—they can act as not just the data aggregator, that's only one of their purposes, but they can be the conduit and the broker for a provider to have a one-to-many relationship. So we could report to public health through the exchange and we could also report for quality to Medicaid through the exchange.
And it provides us with an efficient way to do that, which sometimes gets lost in the HIE realm. But it is the most efficient and effective way. It's the thing that pays and has a business case in the HIE world.
Dr. Dougherty: Thank you. That's a good point. So you'll tell them that this FOA is coming out? I mean they could actually apply. So can everybody here.
Okay. Well, anybody want to add anything else? Is there anybody on the phone? No.
Participant: Can we have a break now? It was supposed to be at 10.
Dr. Dougherty: Yes, we're a little bit behind. But take a 7-minute break, and then we will move into the next session, which would be taking all of these thoughts that we've all had and making sure we are child-specific about them. So a 7-minute break—a 5-minute break for seven minutes as people have said.
Break
Dr. Dougherty: Okay, this time it's a little different. Let's take a look at the child-specific issues, and let's see where they are in this. It's called Settings and Types of Care Beyond Traditional Health Care Delivery Settings, which is how we're categorizing children. But if you have other child issues, you can—the topics are a little bit different.
The first group, instead of being medical home is related to medical home but is a more specific topic, the newborn screening system and linkages to health care providers of all types. So that's going up a level. Assuming that somebody will be responsible for that—accountable for how that happens.
The next is what I'm calling non-doctor settings. So what specific issues need to be addressed, and you don't have to cover the new issues. But you may want to reemphasize issues that have come up in terms of underlying scientific soundness, validity of measures, and feasibility, specifically for children.
And health outcomes is in there. Disparities, the disparities group should go to Great Falls, the newborn screening group should go to Rock Creek. Everybody else stays in here.
Okay, so there is a little overlap between specific criteria issues and big topics. But you know what to do—make it up as you—no, just kidding.
Okay, so this group, Mary, you are the—you are meta—here's meta over here. And this group is—you are not moving—okay. So this is—okay.
Participant: Health outcomes is over here.
Dr. Dougherty: Health outcomes is at the middle table here. And the non-doc settings are where? Newborn screening? Yes, that would be—and if you're not listed as a member of the group, you can float wherever you would like to. We just wanted to make sure every group had at least some people who were not facilitators or recorders.
Lunch Break
Dr. Dougherty: Okay, we're having a working lunch and moving things up a bit. So, just a couple of announcements. If there are any people who want to make public comments, please sign up. I will be here at 3 o'clock to hear them. I'm not sure anybody else will, but they will go into the public record.
Okay, so right now we're kind of grabbing lunch and going to do reports back. And then at 12:15, go into our final breakout session on identifying racial and ethnic disparities in children's health care quality, come back in 45 minutes, have a report back, then attempt an overall synthesis and get your ending thoughts, which will probably be basically the same thing, you know, just a sort of coming together of what insights you've had during the course of this meeting, or what you wish we had done. Then next steps, I'll tell you about.
So sorry for all the changes to the agenda, but weather-related, work-related, and with a smaller group, I think folks are getting things done more quickly than we thought. But if you want to hang out here until 4, that's fine with us.
So who is ready? We'll start, okay. And your group was—sorry—
Dr. McIntyre: Mary McIntyre, Alabama, and the group is meta measures. And I'm trying to get closer because it's not picking up. We wanted to start with just a kind of overall comment, and it was about the name of meta measures.
And the suggestion is just maybe reconsider that name because all of us went through, I think, a similar thing about what does that mean. So that maybe instead, since we're talking about composite measures or groups of measures, why don't we just say composite and define what that means because more people would at least start out with some idea of what we're talking about because all of us went to meta analysis or something like that when we were looking at that. So that's just an overall comment about the name.
Then we went into the different areas, the child-specific criteria, and we said yes, every piece needs to be defined with documentation of established linkages across entities, why do these things fit together, and what is the conceptual framework.
Define each measure within the set or group to make sure that it is clear, and that the measures need to define specific ages, sex when applicable, dosing, developmental status, coding and recognize that coding may vary based on age, okay?
Group, you all better pipe up because some of the people in the group actually understood some of this better than I did.
Dr. Dougherty: So coding of what—needs—may vary—
Dr. McIntyre: Specifically, and I'm going to give you an example that was given—and I think it's in here, but it may have gone when I had to reformat these cells. We talked about things such as, depending on the age of a child, when you're trying to capture asthma, it may actually vary as to whether it is coded as bronchiolitis or specifically as asthma—so you're talking about the same condition, but it may have a different name depending on the age of the child.
The next one was links across public programs. And then it has here Medicaid, CHIP, maternal and child health, mental health. We said that yes, this is very important, but we need to look at movement across these programs as key to addressing multiple services.
Include information from schools; the issue needs to be addressed of how to match the data from these varying sources, such as whether they're going to use a master patient index (MPI) or unique identifiers. Look at the juvenile justice system and also other social entities as part of this. It is going to be very important to identify them.
Some measure sets may—and this is on the required documentation—require time to gather maternal and child health data and health department data. Also consider that it may not be just the child that you are trying to tie data from; you may have to have parental information tied to child information in order to have a complete measure set. So that needs to be identified as part of that.
Links across public and private payers, yes—and it is very important to address issues such as training in order to have a complete set and more accurate data. Examples would even include additional sources, such as vital statistics being able to address that.
Another issue will be the source to link to. A plan for interoperability between payers has to be identified, and the ability to track on and off of eligibility from one to the other to identify where they go. When they come off of one plan, where can you pick them up in the other payers' information?
The next one was links between traditional medical and non-medical therapies, yes, key at schools and school-based centers. And then we'll have to have linkages between measure developers and other sources of data as part of field testing to make sure that the information that is obtained is valid.
Consider care coordination, and we gave examples of why you need to get information from what is not considered traditional sources, such as the ability to get information like missed school days. A lot of times that's part of the calculations when you're looking at the information, but you will not have that information available if you don't link.
The next one was accountability. Yes, but it needs to be defined as to what we are referring to.
Dr. Dougherty: Any suggestions there?
Dr. McIntyre: Well, we went into who is responsible for the—as an example of the confidential data from multiple sources. If there is a breach, then who do you look to as far as being responsible? So basically who is in charge, okay? And that's going to have to be defined if you're looking at data from a lot of different sources and putting it in one place.
We looked at composites and we were not—we said this is not really relevant because we were talking about composites. Okay? So delete it here.
And then all or none, we also changed that to say all or none should be deleted, and it would be more accurate to say scoring should be included, so that you are actually identifying an ability to score the data instead of all or none.
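As a minimal sketch of the distinction, with hypothetical pass/fail component flags for one child, an all-or-none score collapses to zero unless every component is met, while a proportion score still conveys partial performance:

```python
# Minimal sketch (hypothetical components): contrast all-or-none scoring with
# a proportion-of-components score for one child's measure set.
components_met = {"well-child visit": True, "immunizations": True, "dental visit": False}

all_or_none = 1 if all(components_met.values()) else 0
proportion = sum(components_met.values()) / len(components_met)

print(f"All-or-none score: {all_or_none}")    # 0
print(f"Proportion score:  {proportion:.2f}")  # 0.67
```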
Missing data will be more of an issue—and we talked about this—missing data will be more of an issue or conflict in data. So when you get data that actually are contradictory, then how do you resolve that? And it needs to be defined.
And the last one, disparities across all, and what we said was how would they look for them. There needs to be consistency in application because you could lose disparities in the composites or vice versa where you could identify issues by looking at each individual measure. So what is the expectation going into this?
And that was it. Group, did I miss anything?
Ms. McColm: We didn't talk about it specifically, but I got the sense from the group that it would be exciting to be able to do composite measures across different care settings. And that would be something that would be nice to be able to find a grantee that was capable of doing that and understood the challenges of doing that.
Dr. Dougherty: Great, okay, who would like to be next?
Dr. Thompson: I'll go next. This is—let's see, we had disparities. So we talked a lot about differing venues and differing provider types as they relate to special populations. So, for example, on children-specific criteria, we may need to be very specific in each State based on contracting rules and laws, such as age of consent and how you obtain some data.
And then attention needs to be given towards children with special health care needs because they may require additional documentation and thought. For example, developmentally disabled (DD) children or clients may have differing venues for health resources like dental services. Rather than regular dental offices, they might be in traveling clinics where they can do anesthesia.
Another one was documentation and consistency of who is the reporter of the client may need to be considered. For instance, is it a school, a parent, a public health social worker, and that will need to be consistent.
And then some attention will be needed for sampling criteria, especially as it relates to customer service. So just to beat up on California, will "Octomom" count as eight or one? But if you do have two kids in a family, do you count that as one as it relates to customer service data?
Links across public and private programs—or public programs, attention needs to be given to specific contracting rules because it can get quite complex with carve-outs and carve-ins across your managed care, your mental health, your medical, and so systems and laws will need to be taken into consideration for reporting numerators and definitions.
And then additional comments—attention needs to be given to State registries, so in each State you might have an asthma registry, a diabetes registry, an obstetrical registry. And then medical home may or may not require a registry. So we'll need some special considerations around these disparities in disease registries like Ed Wagner's model or the medical home.
Links across public and private payers, in Medicaid, you know, 5 percent of the population spends 50 percent of the cost. And that means that a lot of the population is crossing over between public and private programs. For example, a child that is on a private health plan like the Blues, if they are in a neonatal intensive care unit (NICU) for more than 23 days, they automatically become Medicaid eligible.
Or sometimes when a hemophiliac will hit their $1 million max, then they become Medicaid, and this can happen at any time during an eligibility period. So tracking those outcomes and reporting accuracy will be important.
Links between traditional and non-medical therapies—States will have differing rules on provider contracting and benefits that will be barriers to tracking and accountability. And then mandated benefits may need to be considered. Some States will have mandated benefits across private programs that may not be mandated in public programs, or even vice versa.
On accountability, close attention in the measures to who is accountable. Is it the State agency? Is it the provider? Is it the health care service venue? Because it will really be related to basically who has the data, who is accountable for the data, and so there will be some attention there.
On composites, consideration of the role of other venues of care, like daycare and schools, as well as their resource needs will require consideration. This gets back maybe more to accountability.
And then on disparities across all, documenting the services and confounders, like things that are outside the health care services—mentoring, peer support, church-related services—may need to be given some consideration around variations in a State, because not all disparities in health care are related to the health care venue or provider type. There may be additional confounders out in the community that may explain variations.
Dr. Dougherty: Okay. Anybody from the group want to add? No? Okay, thank you very much. Those are great specifics. Nobody will apply for these awards with all these requirements. Kim, you were—newborn screening, okay.
Dr. Gregory: Okay, Kim Gregory, and I'm representing the newborn screening group.
We felt that it was absolutely child-specific. With regard to links across public reporting programs, this is a criterion to keep. And I think that new criteria to add—or additional comments as it were—is that there needs to be documentation and confirmation that the information was handed off. And it needs to allow for possibility of an appropriate third party to access the information.
With regard to links across public and private payers, we were a little unclear about what this meant. So we interpreted it our own way and came up with two criteria that we felt might be relevant.
One is that—if this relates to churning people through the public program, you want to be able to track the clients as they churn from public to private or private to public insurance. And then the other way we thought it might be useful is if you wanted to stratify and compare outcomes with the public system versus the private system.
And then links between traditional medical and non-medical therapies, we thought that this was a criterion to keep. And pointed out that in the current system, especially outside of hospitals, it's not clear that these providers will have access to an EMR, and yet these are the people most likely to be providing the interventions. So if you are tracking outcomes, there needs to be a mechanism to do that.
And then accountability, we agree that this is a criterion that should be kept. It needs to be specified who is accountable, and again, emphasizing the issue of handoff and who is accountable for the handoff.
And then composites, we weren't sure that this was pertinent, as well as the all or none. Although we did sort of address the fact that this is a screening test; there will be a few false-positives, and then there will be a real positive. And so in our mind, there should be no positives that are not followed up on in the all or none concept. But we weren't sure if that was exactly what you had in mind.
And then finally with regard to disparities, again we thought that hand-offs were critical. And are they getting the screenings—are the results of the screening getting back to the provider? And is there disparity in the types of providers that are not getting the information?
Dr. Dougherty: Okay, thank you very much. So can we move on to the non-doc specialties? Was that you? Okay, great.
Dr. Gonzalez: I'm Jose Gonzalez from Texas. We started out by trying to get some definition to the actual criteria of setting some types of care beyond traditional health care delivery settings. And we thought that it was important for these alternate places of service with different provider types, that they have experience with quality improvement (QI) processes to apply.
And then another goal for these entities was to be able to connect certain populations across separate delivery settings by some method. Although health information technology (health IT) was preferred, that is not essential because some of these places will be marginalized from participating if we require connectivity with systems like health IT for example.
Under child-specific criteria, we felt that this was a key criterion to keep, but that it needed to be child and adolescent centered as to the service setting itself.
Dr. Dougherty: Okay. Could you just clarify for me, when you say to apply, are you—
Dr. Gonzalez: I'm sorry, to submit a grant application.
Dr. Dougherty: Okay. So if some non-medical group wanted to apply and be the creator of quality measures, they would have to meet these criteria?
Dr. Gonzalez: Yes. They'd still be medical, they just wouldn't be physicians.
So anyway, to finish off on the child-specific criteria, again, we thought it was a key criterion to keep but that the actual setting needed to be child- and adolescent-centered.
Under links across public programs, we thought that this criterion would be very difficult to achieve, and that we should consider deleting it.
Links across public and private payers, we thought that this requirement, if it is kept, should be contingent on the type of delivery settings. Some delivery settings may not be able to satisfy this, but others may. So we thought that, in a sense, it is desirable but it shouldn't be required.
We felt very similarly about the next one, between traditional medical and non-medical therapies, in that it is a desirable criterion but should not be required because the lack of those links within the community may be due to the absence of the specific services in the community. So they may not be able to offer all of the ones that are listed there like nutritionists, occupational therapy/physical therapy, mental health, and so on.
Okay, going into accountability, we thought that this criterion was an absolute requirement to keep. As far as composites, we thought that this criterion was important, but it is actually included in the all or none criteria. And that needs to be a little better defined.
As far as disparities across the system, we thought that this was an essential criterion to keep.
And then we added an additional criterion that should be considered, which is the criterion that assesses the burden on the site. And we should try to mitigate as much as possible or as much as appropriate so that they can still be participants.
That's all I have. So if anybody in the group has other stuff that I missed—
Dr. Brown: Yes, I have one. Linda, you can correct me if I'm wrong, but I think one criterion we added was that we felt that these non-physician specialty settings should demonstrate some experience with QI.
Dr. Dudley: And if I could just add—this is Adams Dudley from the University of California, San Francisco (UCSF)—just a clarification about the links to payers—it wasn't that we meant it would be nice, but if it's not there, it's okay, which sort of lowers the impact of the criterion. Rather we meant there are times when it is relevant, but there are times clearly when it is not. For instance, if the population you are looking at is in juvenile detention, there is no billing going on there and there never will be. And so looking for links to payers would exclude—or requiring links to payers would exclude the development of measures and the determination of whether or not they were applicable in that population.
So where it is relevant, we thought it would be useful. But that criterion should be conditional upon its relevance. Is that clear?
Dr. Dougherty: Yes.
Dr. Juszczak: This is Linda Juszczak from the National Assembly of School-Based Health Care. I want to clarify a little bit on the QI criteria. I didn't understand that to be an applicant but to be a participant. So it may not be that these alternate providers would be the applicant. But they may show up as part of an applicant's grant. And there should be some familiarity with QI because this group, when we defined it, was huge. I mean it could be a lot of different things. Some that you would expect to be familiar and others might not be.
Dr. Dougherty: So a little more clarity on that at least for me who is thinking about this and you all will, too. So suppose you had an independent, you know, mental health therapist some place out in Shady Grove, which there are many. In order to participate in a measurement testing or development program, you would require that person to have knowledge or to have participated in a QI project? I'm not quite getting the requirement.
Dr. Juszczak: I felt, and we went back and forth on this, and Julie can do the other side of it, that the process of getting people to buy in to participating in these involves their understanding of what it is they are doing and why. And it certainly has been my experience it takes a long time to get that buy in.
So in the beginning when these grants are first going out, you want to have them participate in this and be more likely to succeed and be active participants, as opposed to spending an inordinate amount of time explaining the QI process, why it is important to do, how it should be used, and so on and so forth.
So that was where I sat. And then the alternate argument was—
Dr. Brown: I don't know that it was an alternate argument, I think it was—we talked about whether it was experience with QI or commitment to QI. And it was kind of, you know, around the margins of what that meant.
I think the concern was that people can sign on to test the measures at my site—and I'll partner with you in developing and testing the measures—and not really understand the burden that they are taking on. And as a result, drop out part way through. Because you can talk about the commitment, you can talk about the time, but if you've never walked the walk, you don't really know what you are signing on for.
Dr. Dudley: I think part of the issue may be—this is Adams Dudley from UCSF again—I think part of the issue may have been the statement of linking it to QI. So it is more demonstration. The underlying concept here is that the further out you reach from the traditional places where this is done, the more you need to make sure that they understand what they are getting into and are capable of achieving it.
Dr. Dougherty: Rich suggests we need a—some funds and attention to QI integration of these other settings. So Barbara, who is writing the report on what we need in the next piece of legislation and resources, can take that into consideration.
I mean it seems as if we always have this dilemma in QI, right? Should you go for the people who are absolutely ready, can start on day 1? Or should you go for the areas that are high opportunity areas where you are going to need to do more work, you know, and it is going to take more time. It is a dilemma. Nora, you wanted to—
Ms. Wells: I just wanted to add that one aspect of this conversation was an overview point—where were we coming from in terms of are we looking at the outcome for the child? Are we trying to look at what happens to a child through multiple settings? Or are we looking at what's happening in various settings, and if the child is coming back and forth, how do we connect them?
So I know that is obviously an underlying reason why this one is even here, but when we got to the links across public programs, I was very concerned that we want to be sure that we are keeping track of that child who is moving from all these different programs.
Dr. Dougherty: That's certainly what we don't have yet. If we don't start now, we will never have it. Okay, I think the health outcomes group is the last one.
Ms. Reuland: Okay, so to keep us grounded, Rita did a good job for us of first defining what we were thinking about when we heard nontraditional settings, and so we anchored ourselves to settings like school-based health centers, telemedicine, urgent care, retail clinics, private clinics that are set up just to do early and periodic screening, diagnosis, and treatment (EPSDT) visits.
So our measure topic was health outcomes. In terms of child-specific criteria, of course we would want that to be a focus. We thought it was particularly important in nontraditional settings where it is not going to be one size fits all, where they are actually probably going to be most likely to see more than just children, and that they are not going to have established, set ways to handle a child-specific measure. And so that's going to be part of the development work—how to actually take all of the information that they have or all the measure testing that they are going to do and separate it out for just children.
For the rest of the criteria, we thought that they were all similar to the last group. Important, and it would be great if the grantees could think about them, but not required in terms of if they don't have something about this in their application, they shouldn't be funded, or that all the Medicaid grantees have to do this. So in terms of links across—
Dr. Dougherty: I'm sorry, could you clarify? They have to do what?
Ms. Reuland: So for the rest of the criteria, we thought that they are all really important. That if the group can consider it, they get more bonus points. And that they look better. But do you require them to link across public programs, or do you require them to link to traditional medical or non-medical therapies? No, particularly when you are talking about health outcomes.
So, for example, if you are measuring a health outcome in a school-based health center, do you require them to share that information with Medicaid and share that information with private payers?
So the next line was link across public programs. And then we thought this was actually one of the biggest challenges and also one of the biggest opportunities for meaningful collection and use of health outcomes data, and that it should be a key focus for future research and information.
We talked a lot about how it depended on whether the State reimburses or not. So, for example, in school-based health centers, does the State reimburse the school-based health center? And we thought that if applicants in nontraditional settings could try to think about how they could collect data and create measures that are specified by the different units for their patients—or their group of children that they see in Medicaid, for example, or in private coverage—that would be something really valuable for them to consider and try to build into their processes.
And we interpreted links across public/private to be—well, I mean it's the same issues as link across public programs. For the links between traditional medical and non-medical therapies and settings, again, we would rate grants higher that try and accomplish this goal and recognize that particularly for outcomes data, nontraditional settings might be the place where those outcomes data exist.
For accountability for health outcomes, this is where we found it most problematic, but yet we felt that it was important to try to develop a shared—a process by which there was shared accountability across the groups because there was also a fear that if you say well, no, that one group shouldn't be accountable, that everyone is going to say they're not accountable. And then no one is accountable for the health outcomes of the children.
Obviously for some health outcomes—central line infection—it would be easy to assign them. But when we were trying to think of nontraditional settings, it was hard to think of what health outcomes could be easily assigned to a nontraditional setting.
Missed days of school? But should missed days of school be accountable to a—
Participant: Shared.
Ms. Reuland: Shared, yes. In terms of composites, we thought for health outcomes, it actually might be helpful to come up with a parsimonious balance measure that is based on various components of health outcomes and particularly when we're talking about nontraditional health settings, it may help us to create a composite measure that has a similar end statement, given that they might be collecting the data in a different way.
So, for example, obesity: maybe they don't collect or calculate body mass index (BMI) in the same way, but they may have a skin fold measurement. And they may have level of activity, and that would help us get to the same conclusion that a provider in a pediatric office got to.
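A minimal sketch of that kind of composite logic, assuming hypothetical field names and percentile thresholds (the 85th-percentile cutoffs and the activity rule below are illustrative, not specifications from the group):

```python
# Hypothetical sketch: classify obesity risk from whichever data a setting collects.
# Field names and thresholds are illustrative assumptions, not endorsed specifications.

def obesity_risk_flag(record):
    """Return True/False for elevated obesity risk, or None if no usable data."""
    # Pediatric office: a BMI percentile for age and sex is already available.
    if record.get("bmi_percentile") is not None:
        return record["bmi_percentile"] >= 85
    # Nontraditional setting: fall back on a skinfold measurement plus activity level.
    if record.get("triceps_skinfold_percentile") is not None:
        sedentary = record.get("weekly_active_days", 0) < 3
        return record["triceps_skinfold_percentile"] >= 85 and sedentary
    return None  # Not enough information to score this child.

# Two children measured in different settings can reach a comparable conclusion.
clinic_child = {"bmi_percentile": 92}
school_child = {"triceps_skinfold_percentile": 90, "weekly_active_days": 1}
print(obesity_risk_flag(clinic_child), obesity_risk_flag(school_child))  # True True
```

The point is only that different settings can feed different raw data into the same end classification.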
For an all-or-none measure, again, not required. But, similar to yesterday, if they use an all-or-none measure, we think it obviously can speak volumes in terms of the signal it can send, but the individual components of it are just as important.
In terms of disparities across all settings, we thought it was important to require all the nontraditional health care settings to try to collect the race, ethnicity, SES, and special health care needs measures.
And we noted here—as was mentioned yesterday—that you need to have specifications by data source for those components, but you have another wrinkle here where you not only have to specify by data source but also by setting. The three data sources we mentioned yesterday may not exist—we said, as I recall, and correct me if I'm wrong, claims, electronic health record (EHR), and parent survey—and that you would clarify how you would collect those data on SES, special health care needs, and race in those settings for those three domains.
In a nontraditional setting, they may not have those three data sources. So you need to think about what data do they have that you could be able to derive those three important disparity groups.
Dr. Dougherty: But you would still want to make it consistent with the other settings?
Ms. Reuland: The end goal consistency, yes. Did I miss anything group? No? Okay.
Dr. Dougherty: So just thinking through this, are there any questions for any of the groups or any child-specific issues that have not come up yet—other than disparities, which we are going to in the next group? Yes? Cathy?
Ms. Hess: Cathy Hess, National Academy for State Health Policy. Our continuing to use the word child just made me—reminded me—I'm assuming, because I know this was prominent in the work of the SNAC, that they were talking about trying to have measures for different age groups, particularly adolescents as well as younger kids.
Dr. Dougherty: Yes. So, if you think about Medicaid, you've got the EPSDT population, which is 0 to 20, so—and the law didn't specify, but it certainly said healthy birth. So that's prenatal. Infants, all age groups. So okay.
Actually, we're down to four breakout groups here, and they are all in this room. We're going to be talking about disparities. And I know this is a very tough one, and that we have Institute of Medicine (IOM) reports and such. But if you could just think about very specifically within the child group, how you would guide measure developers, measure users, on collecting data on race and ethnicity and then combining those data.
So there may not be much to add, but we didn't think there would be much to add about child-specific either, and you all managed to come up with some great stuff that hadn't been thought about before.
So let's see, the groups are medical home, which Jose Gonzalez is now the facilitator, with Kerri, and that can be on your right over there, you can go to that table.
Inpatient, to Beth McGlynn's table. She'll be the facilitator for this one. And Mark, is the reporter. So they are in the back. The outcomes, health outcomes group would be Helen, and she's already at this table. And the meta issues, which we are now calling composite measures and all-or-none measures or measure sets, should go to the left table over here. We don't have any facilitator or reporter on hand for that one. So you can choose your own. I think this table has had the meta—you've all had the meta issues, haven't you, this group of people? So welcome back. It will make it a little bit easier.
And we are asking you to report back at 1:00 p.m.
February 25, 2010 - Afternoon Session
Dr. Dougherty: Okay, I think we can get you all out of here by 2 pm. So who is in the room? Jose Gonzalez, do you want to report out?
Dr. Gonzalez: Oh, no, I have a great reporter.
Dr. Dougherty: A great reporter, okay. Last but not least, you know there are some States where less than 50 percent of the population—sometimes quite a bit less—is truly non-Hispanic white. I think there are seven States in that situation. So we're an increasingly diverse population, and some groups are more advantaged than others. But we do want to keep track of racial and ethnic disparities, says Congress, not just me.
So with that in mind, I think Jose—are you ready? Oh, Kerri, sorry.
Ms. Fei: So we had medical home. I just have some general comments about this. Overall there needs to be a better defined, standardized process for data collection. And it shouldn't be limited to race and ethnicity. It needs to include socioeconomic status and children with special health care needs. Then we said, for example, using the Office of Management and Budget (OMB) categories for race and ethnicity, and poverty levels for socioeconomic status. And then additionally, specifically to the medical home, it may actually be a little bit easier to stratify each measure of disparities within a medical home.
Dr. Dougherty: Why? Sorry.
Ms. Fei: Well, if you think about it, depending on the definition or the concept of medical home, it is going to be a more discrete, more—I don't want to say confined—but defined population of folks that are assigned to the medical home. So it might not be as fluid as a regular general practice or something along those lines.
Okay. So under the next one, which is stratification of vulnerable populations, we had this as a criterion to keep. But we left it as desired but not required. And then we had discussion around, again, like we said before, that the criterion would need to be better defined: what is meant by vulnerable, and how is vulnerable defined?
Are we talking about—is it different from what we've already defined? So the race, ethnicity, and socioeconomic status in children with special health care needs, and then we had a little bit of discussion about considering language proficiency, which would then lead into health literacy where there are some issues as well.
Stratification by racial groups, again, a standardized approach with preference to the OMB categories. And we had that as a criterion to keep. For the next three, which were underlying scientific soundness, validity of measure properties, and feasibility, we have these as criteria to delete because they were covered in other discussions. We didn't feel it was necessary to rehash them here.
And then under settings and types of care beyond traditional health care settings, we also had this as a criterion to delete, and we felt that it should be more of an overarching idea that may be covered in a proposal but setting of care need not be specific—may not be a specific criterion. And that was it.
Dr. Dougherty: Okay, thank you. Next? Who wants to go next? Inpatient? Who is inpatient? Okay, Mark?
Dr. Antman: So, for the inpatient setting, first of all, we began the discussion by acknowledging that inpatient could be defined in different ways. So for our discussion, we limited inpatient to acute care only. And I don't know if that was the intent in identifying inpatient as a category. But that was our decision.
With regard to stratification by vulnerable populations, I think as Kerri just said for the other group, we noted that vulnerable needs to be well defined.
And we talked about three different categories—at least three categories. Are they medically vulnerable? Financially vulnerable? Socially vulnerable? The thinking being that whatever the vulnerable categories that are defined, they should be defined based on national standards. So in other words, not selected—not categories selected by the individual grantees.
And in general, we noted that in the inpatient setting, gathering this information may be challenging—well, not that gathering the information itself will be challenging, but there will be a problem of small numbers in the inpatient setting for any of the individual subpopulations.
And the grantees should be required to note whether they are going to stratify by the vulnerable populations and, if so, how they are going to do it. And we also noted that this, in fact, may not be relevant to each and every measure.
As to stratification by racial groups, again, as I think that Kerri reported for the previous group, there should be a national standard for how the groups are identified. I think you said the OMB categories. We talked about the definitions of the Institute of Medicine (IOM) and the Census categories. Whatever the choice is, it has to be some national standard so there is standardization there.
Underlying scientific soundness, we interpreted this as questioning whether or not there is evidence for disparities across these different groups. And certainly there is. But that being the case, it seems that gathering the information may be more appropriate for some measures than for others. So, for example, measures relevant to diseases such as sickle cell or Tay-Sachs or other conditions that are concentrated in particular ethnic or racial populations are obviously going to be more relevant to those groups.
Dr. Dougherty: Right. So as Ernest was saying, you know, do we make sure that we have a measure for a particularly vulnerable population as opposed to taking standard measures and cutting them by race and ethnicity. Is that what you meant yesterday?
Dr. Moy: Yes.
Dr. Antman: Okay. So as to measure properties, which we took as a reference to validity, reliability, feasibility—all the properties that we've talked about in the last day and a half—I guess the question is, would the measures under development by the grantees have those same properties for each of the ethnic or racial populations defined? Reliability, or validity, or feasibility may vary across the different groups. And that being the case, the grantees would hopefully document how well any one of these measures applies to each of these groups. And if they don't apply, say so.
And in particular, we noted in the inpatient setting, collecting the data—again, with regard to the feasibility of data collection, collecting these racial and ethnic data may be more challenging. On the other hand, we also talked about the fact that this may be motivation or this may spur some changes in processes in the inpatient setting or changes in documentation to be sure that in the admission process or somewhere in inpatient documentation, they provide the means to record that information. So that may be some—that may be a benefit of adding that.
For feasibility, we noted that, again in the inpatient setting, re-admissions may raise some feasibility issues, particularly because kids may be discharged from one facility and readmitted to another. So that would be a challenge.
And then lastly with regard to settings and types of care beyond the traditional health care delivery settings, possibly not relevant to this topic in that we were talking about inpatient. But we also talked about the fact that the care that kids receive in these other settings may influence whether or not they are, in fact, admitted or re-admitted. So there is a relationship between—and that also ties to disparities, of course. So might the disparities that we see across these groups affect their access to admissions or re-admissions? And I think that covers it. I'll defer to any of the group if I've missed something.
Dr. Dougherty: Anybody else want to add something? Subtract something? No? Okay, and next we have Glenn.
Dr. Takata: Okay. We talked about health outcomes, and our discussion sort of revolved around the same issues as the other groups, but I'll go ahead and report.
We felt that stratification by vulnerable populations should be kept as a criterion with the knowledge that there may be do-ability issues. But, indeed, there is a need to define the vulnerable populations in detail, as has already been said: children with special health care needs, the homeless, other groups such as infants born in a low-quality hospital—someone brought up the issue that if you start out in a low-quality hospital, it can have an impact on future health outcomes. And the need for standard definitions, for example, from OMB, as has already been mentioned.
Also, if it is a known disparity, it really should be included in the analysis. And if it is a known disparity, as has already been said, the measure could apply just to that population.
Our discussion around racial groups was quite similar in the need for a detailed definition and standard definitions. And also, was the intent to include ethnicity as well as racial group in this question?
With regard to underlying scientific soundness, since we were talking about outcomes, we wanted to have outcomes measured that are linked to a process that would lead to improvement. So is it a modifiable outcome?
In terms of measure properties, the same discussion about validity and reliability. And we would expect that there would be some field testing done so that the report would include information on those performance aspects of the measure because it may not be available up front but certainly should be by the end of the project.
In terms of feasibility, yes, that was also a criterion to keep. The data sources should be detailed, including information on the vulnerable populations' race and ethnicity. In a parent or guardian or patient survey or questionnaire, how will translation be provided and literacy issues be addressed? How will the proposal deal with combined other race or no response categories, and particularly if those categories are large, provide information on those specific groups if it is known up front?
The comments were that it may be difficult to pick up children in the combined category, and again, that the other, or combined, or no response categories that you are trying to measure may be large.
In terms of settings beyond traditional health care delivery settings, we thought, as I believe the previous group said, that that should not be an absolute criterion. So we would delete it. It would be nice if we could study differences in outcome in different settings, but we felt the feasibility might be difficult. So to require that as a criterion might not be a good idea. Does the rest of the group—
Dr. Dougherty: Anything from the rest of the group? No, okay. Thanks very much. Just a point of information that the Medical Expenditure Panel Survey (MEPS) and the Healthcare Cost and Utilization Project (HCUP), which are two big data sources, do pick up this other or unknown or multiple races information. And when you look at the quality measures, they typically have the worst quality reporting. So who knows what that means. So, okay, Rita?
Dr. Mangione-Smith: Okay. So our group went fiercely rogue in this round. We did not stick to the table at all. I'm just warning you.
Okay. So we'll start with what we finished with, which was at the very end, Ernest did make some points about meta measures and disparities that again do not go along with the table, but I think are good points. And so we'll make those first.
In terms of disparities and meta measures, grantees should discuss how they plan to measure disparities if they have a composite measure or a measurement set. Will they use OMB criteria? IOM criteria? Or some other locally defined criteria?
And we felt it is okay to use locally defined criteria for groups that might be of special interest in your local area for quality measurement, but those groups needed to be able to be rolled up into one of the bigger national criteria sets.
An example that was given was there was a strong interest in the Hmong population in San Francisco for a certain area of quality measurement. And they could look at that group specifically for improvement reasons. But then that group could be rolled up into the Asian category for one of the bigger nationally defined ways of looking at disparities.
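A minimal sketch of that roll-up idea, assuming a hypothetical local-to-OMB crosswalk (the mapping below is illustrative, not an official one):

```python
# Hypothetical sketch: keep locally defined subgroups for local improvement work,
# then roll them up into broader OMB-style categories for national comparability.
from collections import defaultdict

LOCAL_TO_OMB = {  # illustrative mapping, not an official crosswalk
    "Hmong": "Asian",
    "Vietnamese": "Asian",
    "Somali": "Black or African American",
}

def roll_up(local_counts):
    rolled = defaultdict(int)
    for local_group, count in local_counts.items():
        rolled[LOCAL_TO_OMB.get(local_group, local_group)] += count
    return dict(rolled)

print(roll_up({"Hmong": 120, "Vietnamese": 45, "Somali": 60}))
# {'Asian': 165, 'Black or African American': 60}
```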
A grantee should also give a rationale for what subgroups they plan to look at and why. Rather than look at—so disparities, this was an interesting point. So should we be developing our quality measures and then doing stratified analyses to look at different groups where there may be disparities? Or should we be developing measures of disparities themselves? And the example that was given was self-report measures such as, do you think you got worse care because of your race or ethnicity? That was just more of a question than a criterion.
And then the other type of measure that was mentioned was we tend to look at disparities of one group versus another referent group. And apparently there are now measures available where you can summarize all differences across all groups and get a score. And that might be a way that some people may—Ernest, I don't know if you want to say anything more about that because you brought that up.
Dr. Moy: Yes, I think it was as much something to think about as it was a specific recommendation or criteria. But there are different scoring techniques to try to summarize disparities across multiple groups within a population. And they might have some interest when you are comparing different States or different geographic regions.
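One way such a summary score could work is sketched below as a population-weighted mean absolute difference of group rates from the overall rate; this is an illustrative assumption about the kind of technique being referenced, not necessarily the specific scoring method mentioned:

```python
# Hypothetical sketch: summarize disparities across all groups into a single score,
# here the population-weighted mean absolute deviation of group rates from the overall rate.

def disparity_index(groups):
    """groups maps group name -> (numerator, denominator) for one quality measure."""
    total_num = sum(num for num, den in groups.values())
    total_den = sum(den for num, den in groups.values())
    overall_rate = total_num / total_den
    # Weight each group's gap from the overall rate by its share of the population.
    return sum(
        (den / total_den) * abs(num / den - overall_rate)
        for num, den in groups.values()
    )

# 0 means every group has the same rate; larger values mean wider gaps.
rates = {"Group A": (80, 100), "Group B": (60, 100), "Group C": (35, 50)}
print(round(disparity_index(rates), 3))  # 0.08
```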
Dr. Mangione-Smith: So back to the beginning of our conversation, which I think was really very interesting. These were more kind of overarching thoughts about the last couple of days.
Alright—no one who applies for one of these grants will meet all of the criteria we've said they need to meet. And the criteria that will be important will really be quite dependent on the type of measure they are proposing to develop.
Some criteria will only apply to some measures. And we felt that it was important that the criteria not be met just by each individual grantee, but that the grant portfolio as a whole needs to meet all of the criteria that have been laid out today.
Dr. Dougherty: So going back to nobody would be able to meet all the criteria, but you would see that the grant portfolio might—
Dr. Mangione-Smith: The composite of the grants will meet the criteria. There we go. It's a meta measure. See, we were on topic.
Excellent, okay. Then we got into the whole health information technology (health IT) thing. That was a very interesting conversation. Our group was a little bit—I don't want to say we're split. I think in some ways we were actually all on the same page, but we were saying it in different ways. I expressed some concern that the health IT—potential health IT requirement might limit the types of measures that can be developed if you require that the measure be ready to be deployed with health IT today.
We think it is very important moving forward as these people develop measures that health IT be very much in the front of their thinking in terms of how the measure might later be implemented when we have better health IT and better electronic health records (EHRs), and it would help to inform the enhancement of EHRs to do this kind of measurement.
One thing that came up towards the end, but I think it is a really good example of this, is the whole idea of patient self-reported measures or parent-reported measures. In any current EHR in use today, these don't exist. But in pediatrics, they can be a really important part of measurement. So we wouldn't want grantees to hold back from putting forward those types of measures just because they can't be measured with an EHR.
I think that was about—does that summarize it? Okay, that's it.
Dr. Scholle: Actually one more thing. If the focus is on health IT measures, the approach to testing the measures could be really different. I mean, if you start off by saying, okay, here are some measures—could they work in an EHR environment?—then it is not likely to look like a field test. It would look more like a case study about feasibility in different systems, and what the workflow implications and the implications for the technology would be.
Dr. Dougherty: Okay, thanks. That's a good point. Okay, well Barbara and I have a couple of things to say. And you may have some things to say to us, which we would like to hear.
So I think this has been beyond our expectations. You know we certainly have more work to do. And you've given us—we'd like to hear more ideas on what next steps we should take.
I think CHIPRA has given us this enormous opportunity to create a science of quality measurement, basically, and though that was not the only thing it was intended to do, it is creating the science—and this is the diverse group here—creating a science that is actually useful and was planned to be useful from the beginning.
So we're not just asking grantees to follow all these criteria because we like to dictate what they need to do. It is really because these are the criteria that are going to be useful—this is the information that is going to be useful to the States, to providers, to the Centers for Medicare & Medicaid Services (CMS), so they can understand where the measures are coming from, what their advantages, disadvantages, and limitations are, and how they can actually be used.
I think this is one of the most exciting opportunities in my career of working on child health and health care quality, which goes back to 1986. So let me just say a couple of things about what our plans are after this meeting.
First of all, we have a transcriptionist who is taking down notes, and we will get that transcription of this meeting—at least the plenary sessions early next week. And our editor here—our Web site editor at AHRQ has promised to turn it around very quickly. So it will be posted by about a week from Tuesday, something like that, about 10 business days.
So the next step is releasing the funding opportunity announcement for these awards. And by posting the transcription quickly—we won't be able to put all these criteria into the funding opportunity announcement (FOA), but the transcript will be available for whoever wants to apply to take a look at, to see what the thinking of this group was.
Now this—the thinking of this group, there are some discrepancies among different folks and some issues that still need to be worked out. So AHRQ and CMS will be working together with our other Federal partners, the Federal Quality Workgroup, two members of the workgroup are over there—working out how we can turn this into a guidance document basically for awardees. Both the CMS demo awardees and the Pediatric Quality Measurement Program awardees.
So as is clear, we have made enormous progress here today. But there are still some issues, the details of which will need to be worked out and will probably not be worked out by us before these awards are made. So we will be counting on the awardees to help us come to where we need more consensus around what the guidance should be. We will be working with them just as we've worked with all of you. It is a quality improvement (QI) cycle to be sure, which will take quite some time.
The other thing is, as I announced at the beginning, we'll be sending you emails for an evaluation—comments on what we could have done better at this meeting and what you liked about it—because we will be doing more quality measures work, and to get any further thoughts you may have: the "why didn't I say that when I was there" moments, or when you go back to your office and realize there is an important issue that I have to deal with right now that we didn't deal with at this meeting, but that we should know about. So I will be doing that.
And I want to thank you—yes, Patrick?
Dr. Romano: I just wanted to make one other sort of general comment which is that I think we had a very robust discussion about some of the issues that are particular to child health measures and CHIPRA. But, again, from my perspective anyway as a general measurement person and as a primary care physician who works with both adults and kids, I think it is important to keep in mind that the commonalities are just as important or more important than the things that may be unique to this population. So again, I would encourage people to look back at the work that has already been done with the National Quality Forum (NQF) guidelines and so forth and to avoid, you know, creating additional criteria where those criteria have already been well specified by NQF or by other organizations that are active in the field.
So it just strikes me that a lot of the issues that we've discussed, some are particular to kids, and those need to be carefully dealt with. But others are just general issues of measurement.
Dr. Dougherty: Okay. So, Barbara, before you make your closing comments, should we get idea like that from the rest of the group?
Ms. Dailey: Sure.
Dr. Dougherty: I think that's an excellent point and where we started out. Certainly we don't want to reinvent the wheel just for kids because then that's always a difficulty because people will continue to say well, kids are too different and too unique, and we'll deal with them later or that kind of thing.
So, are there other thoughts about next steps that we could take? We're posting the transcript. We'll get ourselves together. We will be working with whoever the grantees are and the CMS grantees and the States, continuing to work.
Are there other things we should be doing? Should we have—or should we be asking for resources for specific enhancements to this criteria specification goal? I mean I assume one of the reasons why we don't have a big guidebook of how to specify a measure, all the different kinds of measures, is because there have never been any resources for that kind of thing.
And, okay, Jeff, I think you were first, then Colleen, and then Nora.
Dr. Thompson: So this is Jeff Thompson. Just one, be kind to us, for the States. That's all I ask. But then the other one, we keep sort of skirting around the issue of length of eligibility in a plan and how that will be treated. When will you have a final—whether it is 1 month, 6 months, 12 months? When will you decide that for either all or any of the measures?
Dr. Dougherty: I would expect that would be one of the charges of—correct me if I'm wrong—of whoever these grantees are—awardees to figure out what is the best way to do that.
And I'm not sure whether it is going to be, you know, 6 months versus 12 months, or maybe some kind of algorithm to include all kids but adjust for the necessary duration of enrollment. You are going well beyond my level of expertise on how to actually do scoring and weighting. But I'm assuming that could happen.
Ms. Dailey: And more specific to the initial core measure set that is out for public comment, that's one of the areas of information that we were looking for comments on. Under the CHIPRA law, we're required to provide procedures and approaches that we recommend to States to do voluntary reporting. And that's due by February 1st.
Our initial thoughts—
Dr. Dougherty: March 1st.
Ms. Dailey: March 1st, excuse me. Our initial thoughts were to put out some recommendations, maybe even do some brainstorming with people, or with States and/or small groups that have had experience with the measures on the ground, to find out what the specific challenges are before we release the final set of recommended procedures and approaches.
Dr. Thompson: Well, and this is Jeff Thompson again—I'll put in my two cents, you know, in writing. But, you know, continuous eligibility for a sustained time, long enough for an effect, would be something that I would like to see, so then it is not confusing—you know, "I didn't have him long enough to have that effect."
And the churn in Medicaid, depending upon the State, with 6 months or 12 months of required eligibility tracking, can actually be pretty darn high. So I think it is something—you know, I would like to see a lot of due diligence and a lot of background, so that if you are going to say 1 month, I'd kind of like to see what the rationale is.
Dr. Dougherty: Okay. Let me ask Sarah, because I know you've—not all of the National Committee for Quality Assurance (NCQA) measures but many of them have different continuous enrollment criteria. So has NCQA ever tested that actual part of the measure? Like what would happen if we—
Dr. Scholle: Yes. So we've tested it, and in the set of well-care measures that we just tested, we do have data comparing a 6- to 12-month enrollment period with an at-least-12-month enrollment period.
And then there is also some information in the literature about this as well. So, in fact, we haven't analyzed the data yet from our field test, but I think we can help to inform that question from our field tests and from some of our prior work.
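To make the trade-off concrete, here is a minimal sketch of how a measure denominator shrinks as the enrollment requirement lengthens, using total enrolled days as a simple stand-in for a continuous-enrollment rule; the enrollment spans and thresholds are illustrative assumptions, not tested specifications:

```python
# Hypothetical sketch: count how many children qualify for a measure denominator
# under different enrollment-duration requirements during a measurement year.
# Total enrolled days is used here as a simple proxy for continuous enrollment.
from datetime import date

YEAR_START, YEAR_END = date(2009, 1, 1), date(2009, 12, 31)

def covered_days(spans):
    """Total enrolled days falling within the measurement year."""
    total = 0
    for start, end in spans:
        s, e = max(start, YEAR_START), min(end, YEAR_END)
        if s <= e:
            total += (e - s).days + 1
    return total

children = {
    "child_1": [(date(2009, 1, 1), date(2009, 12, 31))],   # enrolled all year
    "child_2": [(date(2009, 1, 1), date(2009, 5, 31))],    # churns out mid-year
    "child_3": [(date(2009, 1, 1), date(2009, 3, 31)),
                (date(2009, 6, 1), date(2009, 12, 31))],    # gap, then re-enrolls
}

for required_months in (1, 6, 12):
    qualifying = sum(
        covered_days(spans) >= required_months * 30 for spans in children.values()
    )
    print(f"{required_months}-month requirement: {qualifying} of {len(children)} children qualify")
```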
Dr. Dougherty: Okay. Yes, there was Colleen, and then Nora, and then Kim. And then was there somebody—Beth, were you—did you have your hand up?
Dr. McGlynn: Yes.
Dr. Dougherty: Okay.
Ms. Reuland: I may have misinterpreted your call for resources, but I know just in terms of parents' surveys, we've had a really hard time trying to get funding to develop and validate our surveys for non-English-speaking populations and for the different racial/ethnic groups.
So educating the quality measurement world that it is not just translation but is it culturally sensitive and appropriate, and are we really getting at the same concepts would be great because it has been an uphill battle to try to get funding for it.
So when you have the requirement that the measures be sensitive to it, as a measure developer, we're not finding a lot of funders who are willing to support that work, really important needed work.
Dr. Dougherty: Thanks. Nora?
Ms. Wells: Well, I know this is a comment I keep repeating, but I'm just wondering in these grant—these RFAs [Requests for Applications] that are going out, what the requirement is for involvement of consumers in the development stage. But also a plan for how there is going to be the involvement of consumers at every other stage of the use of this—of these measures? So I am talking about really thoughtful consideration of the kind of education that will need to happen and the way that these measures might be used in partnership with communities to improve care.
So I don't know whether that's already in there. And if it's not, it is a cooperative agreement. And maybe it can sneak in.
Dr. Dougherty: Okay, thank you. Kim, I think you were next. No? Okay. Beth?
Dr. McGlynn: I may have misheard you, so please correct me if I'm wrong, but you said something that sort of raised a red flag for me, which is that you are creating a science of quality measurement.
I just would argue that one already exists, and I think the point of the CHIPRA effort is that it is no longer acceptable to ignore the fact that there is a science of quality measurement.
And in terms of the kinds of resources that I think would be useful, it struck me that one of the unique opportunities here—at least my read of the initial plan—is that there is the potential for different kinds of grantees: measure developers, who often get accused of spending too much time in petri dishes; people who are actually running programs, who may not have a lot of patience for the science piece behind the measurement; and then the people to whom these measures are being applied, who mostly feel overburdened as it is and often feel that the importance of doing the work is not so clear to them.
So it seems to me that one of the resources you might want to ask for is some sort of, you know, cross-cultural education among these groups—some way of bringing everybody into some alignment with the concerns and perspectives of each of these groups, because they all kind of need to be able to walk together to really take advantage of what I think is a pretty unique opportunity.
They all have important perspectives. But I think often each group thinks their thing is the most important. And they don't really have ways of understanding the perspectives of the others.
It looks to me like you could end up with a group of grantees in this set who don't really have to talk to the other kinds of grantees. And so if there is kind of a way to create that, I think that would be terrific.
The second thing is, the other red flag that I heard, having been on the bad end of this more than once, is we can't figure it out, so we're going to throw it out there, and we'll figure it out as we go along. And I have to tell you that it is almost never the case that a grant has enough flexibility for the grantee to allow the grantor to come up with new things in the process of doing the work.
So I guess I would urge you to be pretty clear and come to consensus on the absolute critical must haves. And then think about what the sort of learning organization approach would be to learning things where there may be less clarity but not so much with the idea that you're going to give people different driving directions as they are trying to do this work. But more that you use it to inform the next round of grants. And that it is really worth being clear before you release the RFAs about what the must haves are.
I feel, kind of to a point that Rita made earlier, that there is a pretty—I was imagining grant applications that would have to be 175 pages long in order to satisfy the criteria and actually also say what you were planning to do.
So I do think it is worth being really clear and focused about the must-haves, and then having what I suspect will be a much longer list of nice-to-haves—things that might discriminate one grant over another if the grantee was able to get into the nice-to-have area—but being careful not to sink the whole enterprise by trying to be overly inclusive.
So that's my guidance du jour.
Dr. Dougherty: Yes, we'll go back—thank you—we'll go back to grounded, intermediate, and aspirational as we did with the—okay, who was next? Cathy Hess? Or did you want to say something exactly on that point? Barbara?
Ms. Dailey: Oh, the only other point I was going to make related to that is we actually just went through some of that experience with the quality demonstration process that we had for the quality grants. With $100 million being dedicated to looking at four specific categories that CHIPRA outlined, it also did give us an option to do an innovation category.
So we went through some of that exact area that you were talking about in terms of what are things that we explicitly do want to recommend that people look at versus what would be nice to have and where you may get bonus points if you consider this.
And an example—Cindy Mann just joined us in June, and she was pretty clear that when we were looking at the demonstrations with the initial core measure set, were we going to look at all of them? Or just some of them? And she was explicit that there is a reason that the initial core set was released. And so we required the grantees to be able to look at all of them.
The other area in terms of the innovativeness, there was a section on provider delivery models, but it wasn't specific to medical home. So we made sure we put into the solicitation that you may want to consider looking at medical home specifically. And additionally we mentioned early and periodic screening, diagnosis, and treatment (EPSDT) specifically because that wasn't highlighted as a specific area. But that is an area we know needs improvement. And so I appreciate that comment very much. And we're taking some of our lessons learned through the quality demonstrations, and we will be working with AHRQ in that respect.
Dr. Dougherty: Yes. And, Beth, these—we already have the notification up, as you know, that these will be cooperative agreements. And so there will be different kinds of grantees.
Now so that means we expect a lot of interaction and learning during the process. Some grantees or awardees, contractors, whatever, don't always appreciate that. They say, you know, give me my money for my grant and go away, and, you know, don't change anything over time. But we're hoping to work with the group to have a continuous learning and improvement process. And there will be some quick turnarounds and some designated priorities because that's what the law says we need to do.
But the priorities will change over time, and the criteria will change over time. We just need to figure out how we herd all the cats and make that all transparent. So we're thinking that way, and any suggestions you can give would help. So we're not expecting that we're going to put this in place, get all the information back, and then—well, we don't even know if we're going to have a next round of grantees. We need to do it right the first time. Even though the legislation calls for annual reporting—or annual modifications to the improved core set—we don't have any resources to do that.
So anyway, who was next? Cathy Hess?
Ms. Hess: Just to get back to the discussion a little bit on enrollment of duration and whether that should be made consistent in Medicaid with National Committee for Quality Assurance (NCQA) and the Healthcare Effectiveness Data and Information Set (HEDIS) and other measures, just a comment.
While I understand the rationale, I think we have to remember that what gets measured is what gets done. And I think that the reason there is reporting—or one of the reasons there is reporting—on kids who have been enrolled for any amount of time is because public programs do have an affirmative responsibility for making sure that kids are getting services, even if they are only in for a short amount of time.
In fact, those kids that are churning may be some of the most vulnerable kids, and you really do need to make sure that you pay attention to them when they're in.
I don't know what the answer is because I do understand the importance of being able to try and move toward comparing apples and apples. And it may be that the Centers for Medicare & Medicaid Services (CMS) can require in the annual reporting, at least for CHIP—Medicaid doesn't, I guess, have a comparable—a shorter period. I know there's either what is done now, which is for any amount of time, or I know there is discussion about maybe going to like 3 months. Maybe we have both. Maybe we have 11 months, and we have 3 months.
But just to throw out—we can't forget the policy issues while we're focused on measurement issues here.
Dr. Dougherty: Okay. Anybody else? Specific next steps we should take? Helen?
Ms. Haskell: Well, I have a comment. It's far from specific but—this is Helen Haskell from Mothers Against Medical Error—and I've been sort of trying to put it all together over the past couple of days what's going on here. And one of the things that troubles me is that these are all process measures. To patients, quality is something that actually affects, you know, what care they get and, more specifically, how that care affects their outcome.
My concern is that we need to somehow have a method of assessing the intervention that is being measured. And maybe that's not the place of these data. But if not, I'm not sure where that's being collected in a way that could be related back to these interventions.
So that's just my comment. That I would like to see these connected in a very concrete way to health outcomes if that's possible.
Dr. Dougherty: Could you give an example of—
Ms. Haskell: Well, there are several examples I've been looking through. But, for example, the attention-deficit/hyperactivity disorder (ADHD)—follow up on ADHD drugs, which is not addressing any of the real questions about the ADHD drugs. The fact that people are measuring body mass index (BMI) without knowing what the BMIs are and that sort of thing.
Dr. Dougherty: Yes, we definitely realize those are limitations that need more work. Okay, Barbara, I'm going to turn it over to my CMS colleagues here for their parting thoughts.
Ms. Dailey: Thank you. First of all, on behalf of CMS, we really do appreciate your time and expertise. This was a huge commitment. And one of the things that we've said over and over again, and I want to reiterate one of the things Vicki Wachino mentioned her first day here, was this is a real opportunity for quality to drive health information technology (health IT) and electronic health record (EHR) versus the other way around. And we recognize that. So as much as we've talked about the need for it, I did want to reemphasize that point.
We have a lot of work ahead of us over the next few years. And this will be an evolutionary process. And we're going to learn a lot, and we're not going to—we know we are going to make some mistakes. We're trying to do as much brainstorming and working with partners as we can now.
As I have mentioned, we want to have a couple of meetings in the next couple of months to get your input in terms of our first set of recommendations we want to take to Congress about what is really needed on moving the quality agenda forward, particularly for Medicaid and CHIP, which is much further behind than what has happened with Medicare because they are completely different programs. They are different systems.
Medicaid and CHIP, as I've mentioned, involve hundreds of programs. And how do we bring all of these partners together. And that is what has been a really big challenge.
But this opportunity we have now in terms of really impacting health from newborn to children to adolescents, we're putting $225 million into building this new approach to addressing health care for children. And by doing so, by impacting at such a young age, we really are making a difference for a whole new generation going forward. And so just knowing that this is an opportunity for us is one of the things that makes this so exciting for us and why we are so enthusiastic and motivated.
It requires a lot of us to work together. So CMS, with a lot of new transition and leadership, is very open to feedback and input at this point. We're trying to balance that with getting the work done. So that has obviously been a big challenge for us. I do encourage you to please contact me if you would like to participate in either of these workgroups in terms of making recommendations to Congress.
We are going to be holding an early and periodic screening, diagnosis, and treatment (EPSDT) national improvement workgroup. And so we are trying to gather people who are interested in participating in that going forward. But at the same time, there are so many moving parts in terms of trying to integrate the work that we're doing with the Quality Measures Program, providing and exchanging information with the grantees for the quality grants, of which this is a big component, feeding information to the Institute of Medicine, which has to do a study, and to the Government Accountability Office, which also has to do a study.
We're working with States in terms of developing an annual quality reporting process, with which we have had experience, as has been mentioned, with CHIP. But our systems right now are still payment systems. So we have a lot of work before us.
And trying to pull all this together with the work that we're trying to do with the Office of National Coordinator (ONC) and all of our Federal partners has been really challenging, but at the same time, so exciting.
So I just wanted to again thank everybody for your time and your feedback. And I really do look forward to continuing to receive that. I do want to acknowledge Lekisha Daniel Robinson and John Young who have been doing an incredible amount of background work on the CMS side and supporting the work that Denise and I have been doing directly with you all. So I did want to acknowledge and recognize their dedicated efforts because it has been a lot of overtime and weekend work. And I wanted to thank you in public for them.
So, again, thank you everybody. And I look forward to working with you all going forward.
Dr. Dougherty: And I'd like to echo that thanks. And please keep in touch. So do people have your email address Barbara?
Ms. Dailey: It's a long one. But it's barbara.dailey@cms.hhs.gov.
Dr. Dougherty: And I would also like to thank Mia DeSoto, who joined us in late August when we were smack in the—well, it was the beginning, the middle, and the end all at once. So she was one of those people trying to create those one-pagers.
And Matt Levy, who is an intern working with us on health policy. He needs a health policy internship, he's a University of Maryland senior. So we'll have another candidate for quality measurement and improvement work after this meeting, right, Matt?
Ms. Dailey: And John did want to say something.
Mr. Young: Yes, John Young, CMS, just one point I wanted to make briefly and that's around disparities.
With this new Administration, there is a real groundswell and focus on disparities. I think we could have had the same conversation about disparities 5, 10, 15 years ago—that they exist. Now there is a real wherewithal, a momentum, to really begin to address disparities. And the conversation about measurement and disparities, I think, goes hand in hand.
So the information that we develop here easily translates into a strategy, a vision for disparities. I like to liken it to a system for quality. We're building the system to address quality.
So in looking at disparities, the technical aspects—it's not just social work. Now it becomes the science of how do we begin to address disparities in a number of areas.
So we're really excited about that. There are a number of workgroups around the U.S. Department of Health and Human Services (HHS) actually, not just CMS, but HHS as a whole looking at disparities now and developing a strategy and a vision, a leadership vision, as I call it, around disparities. So that's it. Thank you.
Dr. Dougherty: And I forgot to thank Ernest Moy. He's sitting over there quietly. I asked him if he would, you know, be the co-chair of some of these sessions. But he's shy. He just—he likes his National Healthcare Quality and Disparities reports, but he's been very helpful in planning this meeting.
And I'm sure this will influence the reports as well as there is an Institute of Medicine (IOM) report coming out on how to improve the National Healthcare Quality and Disparities reports. So we'll see how much we overlap with those.
So thank you all very much. Stay in touch. And we will be in touch with you.
Adjourn

