Health Care Efficiency Measures: Identification, Categorization, and Evaluation
The measurement of health care efficiency has lagged behind the measurement of health care quality. Providers, payers, purchasers, consumers, and regulators all could benefit from more information on value for money in health care. Purchasers, particularly large employers, have been demanding that health plans incorporate economic profiling into their products and information packages. Despite its importance, efficiency measurement has not had a systematic and rigorous process in place for development and improvement, as other domains of performance have. Recognizing the importance of improving efficiency measurement, the Agency for Healthcare Research and Quality (AHRQ) has sponsored this systematic review and analysis of available measures. Our work was designed to reach a wide variety of stakeholders, each of which faces different pressures and holds different values in the selection and application of efficiency measures. Thus, we anticipate that some sections of the report will be less useful to some readers than others. This report should be viewed as the first of several steps that are necessary to create agreement among stakeholders about the adequacy of tools to measure efficiency.
Because we found that many stakeholders attach different meanings to the word "efficiency," we first developed a definition of efficiency. We believe that being explicit about how the term is being used is helpful in advancing the dialogue among stakeholders. In this report, we define efficiency as an attribute of performance that is measured by examining the relationship between a specific product of the health care system (also called an output) and the resources used to create that product (also called inputs). Under our definition, a provider in the health care system (e.g., hospital, physician) would be efficient if it was able to maximize output for a given set of inputs or to minimize inputs used to produce a given output.
Building on this definition, we created a typology of efficiency measures. The purpose of the typology is to make explicit the content and use of a measure of efficiency. Our typology has three levels:
- Perspective: who is evaluating the efficiency of what entity and what is their objective?
- Outputs: what type of product is being evaluated?
- Inputs: what resources are used to produce the output?
The first tier in the typology, perspective, requires an explicit identification of the entity that is evaluating efficiency, the entity that is being evaluated, and the objective or rationale for the assessment. We distinguish between four different types of entities:
- Health care providers (e.g., physicians, hospitals, nursing homes) that deliver health care services.
- Intermediaries (e.g., health plans, employers) who act on behalf of collections of either providers or individuals (and, potentially, their own behalf) but do not directly deliver health care services.
- Consumers/patients who use health care services.
- Society, which encompasses the first three.
Each of these types of entities has different objectives for considering efficiency, has control over a particular set of resources or inputs, and may seek to deliver or purchase a different set of products. Efficiency for society as a whole, or "social efficiency," refers to the allocation of available resources; social efficiency is achieved when it is not possible to make a person or group in society better off without making another person or group worse off. The perspective from which efficiency is measured has strong implications for the measurement approach, because what looks efficient from one perspective may look inefficient from another. For example, a physician may produce CT scans efficiently in her office, but the physician may not appear efficient to a health plan if a less expensive diagnostic test could have been substituted in some cases. The intended application of an efficiency measure (e.g., pay-for-performance, quality improvement) offers another way of assessing perspective.
The second tier of the typology identifies the outputs of interest and how those will be measured. We distinguish between two types of outputs: health services (e.g., visits, drugs, admissions) and health outcomes (e.g., preventable deaths, functional status, clinical outcomes such as blood pressure or blood sugar control). The typology addresses the role of quality (or effectiveness) metrics in the assessment of efficiency. A key issue that arises in external evaluations of efficiency is whether the outputs are comparable. Threats to comparability arise when there is (perceived or real) heterogeneity in the content of a single service, the mix of services in a bundle, and the mix of patients seeking or receiving services. Pairing quality measures with efficiency measures is one approach that has been suggested by AQA and others to assess comparability directly.
In this typology, we do not require that the health service outputs be constructed as quality/effectiveness metrics. For example, an efficiency measure could consider the relative cost of a procedure without evaluating whether the use of the procedure was appropriate. Similarly, an efficiency measure could evaluate the relative cost of a hospital stay for a condition without considering whether the admission was preventable or appropriate. However, the typology allows for health service outputs to be defined with reference to quality criteria. That is, the typology is broad enough to include either definition of health services. We deliberately constructed the typology in this way to facilitate dialogue among stakeholders with different perspectives on this issue.
The third tier of the typology identifies the inputs that are used to produce the output of interest. Inputs can be measured as counts by type (e.g., nursing hours, bed days, days supply of drugs) or they can be monetized (real or standardized dollars assigned to each unit). We refer to these, respectively, as physical inputs or financial inputs. The way in which inputs are measured may influence the way the results are used. Efficiency measures that count the amounts of different inputs used to produce an output (physical inputs) help to answer questions about whether the output could be produced faster, with fewer people, less time from people, or fewer supplies. In economic terms, the focus is on whether the output is produced with the minimum amount of each input and is called technical efficiency. Efficiency measures that monetize the inputs (financial inputs) help to answer questions about whether the output could be produced less expensively, that is, whether the total cost of labor, supplies, and other capital could be reduced. A focus on cost minimization corresponds to the economic concept of productive efficiency, which incorporates considerations related to the optimal mix of inputs (e.g., could we substitute nursing labor for physician labor without changing the amount and quality of the output?) and the total cost of inputs.
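The distinction between physical and financial inputs can be illustrated with a small sketch. The two clinics, their staffing hours, and the hourly prices below are hypothetical values chosen for illustration; they do not come from any study reviewed in this report. The sketch shows a case in which neither provider dominates the other on physical inputs, yet one produces the same output at lower total cost.

```python
from dataclasses import dataclass

# Hypothetical hourly prices used to monetize physical inputs (assumption).
PRICES = {"physician_hours": 100.0, "nurse_hours": 40.0}


@dataclass(frozen=True)
class Provider:
    name: str
    visits: int              # output: health services
    physician_hours: float   # physical input
    nurse_hours: float       # physical input


def physical_inputs_per_visit(p: Provider) -> dict:
    """Physical inputs per unit of output (the technical-efficiency view)."""
    return {
        "physician_hours": p.physician_hours / p.visits,
        "nurse_hours": p.nurse_hours / p.visits,
    }


def cost_per_visit(p: Provider, prices: dict = PRICES) -> float:
    """Monetized inputs per unit of output (the cost-minimization view)."""
    total = (p.physician_hours * prices["physician_hours"]
             + p.nurse_hours * prices["nurse_hours"])
    return total / p.visits


def dominates(a: Provider, b: Provider) -> bool:
    """True if a uses no more of every input per visit than b, and less of one."""
    ia, ib = physical_inputs_per_visit(a), physical_inputs_per_visit(b)
    return all(ia[k] <= ib[k] for k in ia) and any(ia[k] < ib[k] for k in ia)


clinic_a = Provider("Clinic A", visits=100, physician_hours=50, nurse_hours=30)
clinic_b = Provider("Clinic B", visits=100, physician_hours=30, nurse_hours=60)

for clinic in (clinic_a, clinic_b):
    print(clinic.name, physical_inputs_per_visit(clinic), cost_per_visit(clinic))
```

With these illustrative numbers, Clinic B substitutes nursing time for physician time. A comparison of physical inputs alone cannot rank the two clinics, while the monetized comparison favors Clinic B at the assumed prices.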
This typology provides a framework within which stakeholders can have an explicit discussion about the intended use of measures, the choice and measurement of outputs, and the choice and measurement of inputs. Requesting that groups use a standard format, such as that suggested by the typology, allows stakeholders to systematically examine what is being measured and whether the measure (and available data) is appropriate for the purpose.
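One way to picture a standard format of this kind is as a structured record with one field per tier of the typology. The sketch below is only an illustration: the class and field names are hypothetical and are not part of the typology itself. The example record describes the average length of stay ratio, with an assumed evaluator and objective.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class EntityType(Enum):
    PROVIDER = "health care provider"
    INTERMEDIARY = "intermediary"
    CONSUMER = "consumer/patient"
    SOCIETY = "society"


class OutputType(Enum):
    HEALTH_SERVICES = "health services"
    HEALTH_OUTCOMES = "health outcomes"


class InputType(Enum):
    PHYSICAL = "physical"
    FINANCIAL = "financial"


@dataclass(frozen=True)
class MeasureDescription:
    """Hypothetical record describing one efficiency measure by typology tier."""
    name: str
    evaluator: EntityType        # tier 1: who is evaluating
    evaluated: EntityType        # tier 1: who is being evaluated
    objective: str               # tier 1: rationale for the assessment
    output_type: OutputType      # tier 2
    outputs: Tuple[str, ...]     # tier 2
    input_type: InputType        # tier 3
    inputs: Tuple[str, ...]      # tier 3

    def summary(self) -> str:
        return (
            f"{self.name}: {self.evaluator.value} evaluates "
            f"{self.evaluated.value} ({self.objective}); "
            f"outputs [{self.output_type.value}]: {', '.join(self.outputs)}; "
            f"inputs [{self.input_type.value}]: {', '.join(self.inputs)}"
        )


# Illustrative record; the evaluator and objective are assumptions.
alos_measure = MeasureDescription(
    name="Average length of stay",
    evaluator=EntityType.INTERMEDIARY,
    evaluated=EntityType.PROVIDER,
    objective="compare hospitals",
    output_type=OutputType.HEALTH_SERVICES,
    outputs=("discharges",),
    input_type=InputType.PHYSICAL,
    inputs=("hospital days",),
)
print(alos_measure.summary())
```

Filling in every field forces the perspective, outputs, and inputs of a measure to be stated explicitly, which is the purpose the typology is meant to serve.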
Evidence Sources and Searches
We searched Medline® and EconLit for articles published between 1990 and 2005 describing measures of health care efficiency. Two independent reviewers screened titles, abstracts, and articles, resolving disagreements by consensus. We focused on studies reporting efficiency of U.S. health care and excluded studies focusing on other countries. Data were abstracted onto Evidence Tables and also summarized narratively.
Because we expected some of the most commonly used efficiency measures might not appear in the published literature, we developed a list of organizations that we knew had developed or were considering developing their own efficiency measures. We contacted key people at these organizations in an attempt to collect the information necessary to describe and compare their efficiency measures to others we abstracted from articles.
A Technical Expert Panel (TEP) advised the project staff on the typology and sources of information, and reviewed a draft of this report. The TEP is listed in Appendix D of this report.
We found little overlap between the peer-reviewed literature that describes the development, testing, and application of efficiency measures and the vendor-based efficiency metrics that are most commonly used. From the perspective of policymakers and purchasers, the published literature provides little guidance for solving current challenges to managing rising health care costs. From the perspective of measurement experts, the vendor-based metrics are largely untested and as such the results may be problematic to interpret accurately. These observations have implications for the recommendations we make at the end of the report regarding future research.
In total, RAND reviewers examined 4,324 titles for the draft version of this report. Of these, 563 articles were retrieved and reviewed. There were 158 articles describing measures of health care efficiency in the United States.
The majority of peer-reviewed literature on health care efficiency has been related to the production of hospital care. Of the 158 priority articles abstracted, 93 articles (59%) measured the efficiency of hospitals. Studies of physician efficiency were second most common (33 articles, 21%), followed by fewer articles on the efficiency of nurses, health plans, other providers, or other entities. None of the abstracted articles reported the efficiency of health care at the national level, although two articles examined efficiency in the Medicare program.
Almost all of the measures abstracted from the articles used health services as outputs. Common health service types used as outputs included inpatient stays, physician visits, and procedures. Only four measures were found that included health outcomes as outputs. In addition, none of the outputs explicitly accounted for the quality of service provided. A small subset of measures attempted to account for quality by including it as an explanatory variable in a regression model in which efficiency was the dependent variable. Some articles also conducted analyses of outcomes separately from analyses of efficiency.
The health care efficiency measures abstracted were divided between measures using physical or financial inputs. There were more articles that used physical inputs than financial inputs. No articles were found containing measures of social efficiency.
Most of the measures abstracted from the peer-reviewed literature used econometric or mathematical programming methodologies for measuring health care efficiency. Two approaches were most common: data envelopment analysis (DEA) and stochastic frontier analysis (SFA). DEA is a non-parametric deterministic approach that solves a linear programming problem in order to define efficient behavior. SFA is a parametric approach that defines efficient behavior by specifying a stochastic (or probabilistic) model of output and maximizing the probability of the observed outputs given the model. These techniques can explicitly account for multiple inputs and multiple outputs. For example, DEA and SFA could be used to measure the efficiency of hospitals that use nursing labor and supplies to produce inpatient stays and ambulatory visits. DEA and SFA differ in a number of respects. DEA makes fewer assumptions than SFA about how inputs are related to outputs. DEA compares the efficiency of an entity to that of its peers (rather than an absolute benchmark) and typically ignores statistical noise in the observed relationship between inputs and outputs.
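For readers unfamiliar with these methods, the following is a sketch of common textbook formulations, not necessarily the specifications used in any of the abstracted articles. The DEA problem shown is the input-oriented, constant-returns-to-scale form for an entity o evaluated against n peers with m inputs and s outputs. The SFA model shown assumes a single output and a log-linear (Cobb-Douglas) production function; both choices are assumptions made here for illustration, and applications with multiple outputs often estimate a cost frontier instead.

```latex
% DEA: efficiency score \theta for entity o, given inputs x_{ij} and outputs y_{rj}
\[
\begin{aligned}
\min_{\theta,\,\lambda}\quad & \theta \\
\text{s.t.}\quad
  & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, && i = 1,\dots,m \\
  & \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro},          && r = 1,\dots,s \\
  & \lambda_j \ge 0,                                      && j = 1,\dots,n
\end{aligned}
\]

% SFA: production frontier with noise v_j and one-sided inefficiency u_j
\[
\ln y_j = \beta_0 + \sum_{i=1}^{m} \beta_i \ln x_{ij} + v_j - u_j,
\qquad v_j \sim N(0, \sigma_v^2), \quad u_j \ge 0,
\qquad \mathrm{TE}_j = \exp(-u_j)
\]
```

In the DEA problem, an optimal value of \theta equal to 1 places the entity on the frontier defined by its peers, and a value below 1 means a weighted combination of peers could produce at least the same outputs with proportionally fewer inputs. In the SFA model, v_j absorbs statistical noise and u_j captures inefficiency; a distribution for u_j (for example, half-normal) must be assumed, and the parameters are typically estimated by maximum likelihood.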
Some measures were ratio-based. Ratios were more common for physician efficiency measures than hospital efficiency measures. The main difference between the various measurement approaches is that ratio-based measures include only single inputs and outputs (although various elements are sometimes aggregated to a single quantity), whereas SFA, DEA, and regression-based approaches explicitly account for multiple inputs and outputs.
An example of a measure that uses multiple physical inputs and multiple health services outputs comes from Grosskopf.1 This DEA-based measure used the following inputs (counts): physicians; nurses; other personnel; and hospital beds. As outputs it used (again, counts): outpatient procedures; inpatient procedures; physician visits in outpatient clinics; hospital discharges; and emergency visits. In comparison, a typical example of a measure that uses a single physical input and health services output (ratio) was the number of hospital days (input) divided by the number of discharges (output), that is, the average length of stay.2
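The single-input, single-output ratio described above is simple to compute. The sketch below uses hypothetical hospital names and counts, which are assumptions for illustration and not data from the cited study.

```python
def average_length_of_stay(hospital_days: float, discharges: int) -> float:
    """Ratio measure: hospital days (input) per discharge (output)."""
    if discharges <= 0:
        raise ValueError("discharges must be positive")
    return hospital_days / discharges


# Hypothetical annual counts of (hospital days, discharges) for two hospitals.
hospitals = {
    "Hospital X": (4500, 1000),
    "Hospital Y": (5200, 1000),
}

alos = {
    name: average_length_of_stay(days, discharges)
    for name, (days, discharges) in hospitals.items()
}

# A lower ratio means fewer inputs per unit of output. The ratio by itself
# says nothing about differences in case mix or quality between hospitals.
ranked = sorted(alos, key=alos.get)
print(alos, ranked)
```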
Vendors and Stakeholder Interviews
Thirteen organizations were selected using a purposive reputational sampling approach. The results presented here are based on information gathered from eight vendors and five stakeholders who responded to our request for an interview. The TEP, which included various stakeholders and experts on efficiency measurement, also provided input into the search and reviewed this report. The TEP members are listed in Appendix D.
Most of the measures used by purchasers and payers are proprietary. The main application of these measures by purchasers and plans is to reduce costs through pay-for-performance, tiered product offerings, public reporting, and feedback for performance improvement. For the purpose of assessing efficiency, these measures generally take the form of a ratio, such as observed-to-expected ratios of costs per episode of care, adjusting for risk severity and case-mix. Efforts to validate and test the reliability of these algorithms as tools to create relevant clinical groupings for comparison are documented in either internal reports or white papers. External evaluations of performance characteristics of these measures are beginning to emerge from the Medicare Payment Advisory Commission (MedPAC), the Centers for Medicare and Medicaid Services (CMS), and other research groups including RAND. Our scan identified seven major developers of proprietary software packages for measuring efficiency, with other vendors providing additional analytic tools, solution packages, applications, and consulting services that build on top of these existing platforms.
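An observed-to-expected ratio of this kind can be sketched as follows. Because the vendor algorithms are proprietary, this is not any vendor's method: the expected cost is assumed here to be the peer average cost for each episode type, which is only a crude stand-in for risk and case-mix adjustment, and all names and costs are hypothetical.

```python
from collections import defaultdict

# Hypothetical episodes of care attributed to two physicians.
episodes = [
    {"provider": "dr_a", "type": "diabetes", "cost": 900.0},
    {"provider": "dr_a", "type": "diabetes", "cost": 1100.0},
    {"provider": "dr_a", "type": "back_pain", "cost": 500.0},
    {"provider": "dr_b", "type": "diabetes", "cost": 1000.0},
    {"provider": "dr_b", "type": "back_pain", "cost": 700.0},
    {"provider": "dr_b", "type": "back_pain", "cost": 600.0},
]


def expected_cost_by_type(all_episodes):
    """Expected cost per episode type, assumed to be the mean across all providers."""
    totals, counts = defaultdict(float), defaultdict(int)
    for ep in all_episodes:
        totals[ep["type"]] += ep["cost"]
        counts[ep["type"]] += 1
    return {t: totals[t] / counts[t] for t in totals}


def observed_to_expected(all_episodes, provider, expected):
    """Total observed cost divided by total expected cost for one provider."""
    own = [ep for ep in all_episodes if ep["provider"] == provider]
    if not own:
        raise ValueError(f"no episodes attributed to {provider}")
    observed = sum(ep["cost"] for ep in own)
    expected_total = sum(expected[ep["type"]] for ep in own)
    return observed / expected_total


expected = expected_cost_by_type(episodes)
oe_a = observed_to_expected(episodes, "dr_a", expected)
oe_b = observed_to_expected(episodes, "dr_b", expected)
print(round(oe_a, 3), round(oe_b, 3))
```

A ratio below 1 indicates lower cost than expected for the provider's mix of episodes, and a ratio above 1 indicates higher cost. The result depends heavily on how episodes are defined, attributed, and adjusted.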
The proprietary measures fall into two main categories: episode-based or population-based. An episode-based approach to measuring efficiency uses diagnosis and procedure codes from claims/encounter data to construct discrete episodes of care, which are a series of temporally contiguous health care services related to the treatment of a specific acute illness, a set time period for the management of a chronic disease, or provided in response to a specific request by the patient or other relevant entity. On the other hand, a population-based approach to efficiency measurement classifies a patient population according to morbidity burden in a given period (e.g., one year).
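The idea of grouping temporally contiguous services into episodes can be sketched in a few lines. This is a deliberately simplified illustration and not a reconstruction of any proprietary grouper: it assumes each claim already carries a single condition label, and it starts a new episode whenever the gap since the previous related claim exceeds a hypothetical 30-day threshold.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claims; real claims carry diagnosis and procedure codes.
claims = [
    {"patient": "p1", "condition": "bronchitis", "date": date(2005, 1, 3), "cost": 80},
    {"patient": "p1", "condition": "bronchitis", "date": date(2005, 1, 10), "cost": 40},
    {"patient": "p1", "condition": "bronchitis", "date": date(2005, 6, 1), "cost": 90},
    {"patient": "p2", "condition": "bronchitis", "date": date(2005, 2, 1), "cost": 70},
]


def build_episodes(all_claims, gap_days=30):
    """Group claims by patient and condition, splitting on gaps longer than gap_days."""
    by_key = defaultdict(list)
    for claim in all_claims:
        by_key[(claim["patient"], claim["condition"])].append(claim)

    result = []
    for key in sorted(by_key):
        group = sorted(by_key[key], key=lambda c: c["date"])
        current = [group[0]]
        for claim in group[1:]:
            gap = (claim["date"] - current[-1]["date"]).days
            if gap > gap_days:
                # Gap too long: close the current episode and start a new one.
                result.append(current)
                current = [claim]
            else:
                current.append(claim)
        result.append(current)
    return result


episode_list = build_episodes(claims)
episode_costs = [sum(c["cost"] for c in ep) for ep in episode_list]
print(len(episode_list), episode_costs)
```

A population-based approach would instead assign each patient to a morbidity category for the whole period and compare total costs within categories, without splitting care into episodes.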
We contacted a sample of stakeholders to seek their insights on efficiency measurement. We used their input to cross-validate our selection of vendors described above. Our sample included two coalitions on the national level; two coalitions on the state level; and an accrediting agency. We asked these stakeholders to provide the definition of efficiency they used to guide their efforts; describe desirable attributes they considered as they searched for available measures; comment on their interest or objectives in developing and/or implementing efficiency measures; and list proprietary measures they have considered.
While the stakeholders used different definitions of "efficiency," they shared a number of common concerns related to efficiency measurement. Many concerns were related to methodological issues such as data quality, attribution of responsibility for care to providers, risk adjustment, and identification of outliers. The stakeholders also shared a number of concerns related to the use of efficiency measures, including the perceptions of providers and patients, and the cost of using proprietary measures and transparency of the methods used to construct the measures.
Measures of any construct can rarely be evaluated in the abstract. The evaluation must take into account the purpose or application of the measure; some measures that work well for research, for example, may be unusable for internal quality improvement.
We suggest that measures of health care efficiency be evaluated using the same framework as measures of quality:
- Important: is the measure assessing an aspect of efficiency that is important to providers, payers, and policymakers? Has the measure been applied at the level of interest to those planning to use the measure? Is there an opportunity for improvement? Is the measure under the control of the provider or health system?
- Scientifically sound: is the measure reliable and reproducible? Does the measure appear to capture the concept of interest? Is there evidence of face, construct, or predictive validity?
- Feasible: are the data necessary to construct this measure available? Is the cost and burden of measurement reasonable?
- Actionable: are the results interpretable? Can the intended audience use the information to make decisions or take action?
An ideal health care efficiency measure does not exist, and therefore the selection of measures will involve tradeoffs between these criteria. We summarize the results of our review of measures below.
The measurement of efficiency meets the test of importance because of the interest and intent among stakeholders in finding and implementing such measures for policy and operations. Although we found differences in the content of measures from peer-reviewed versus vendor-developed sources, they have in common the specification of one or more outputs and one or more inputs in constructing a measure.
The "importance" of measures abstracted from peer-reviewed literature appears low because these have not generally been used in practice and there is no apparent consensus in the academic literature on an optimal method for measuring efficiency. Some academic experts have indicated skepticism that the construct can be adequately measured. Although many peer-reviewed articles identified factors that were found to influence efficiency, the findings appear to be difficult to translate into policy. We found no clear evidence that efficiency measures developed by academics had influenced policy decisions made by providers or policymakers.
The vendor-developed measures meet the importance criterion because they are being widely used by purchasers and plans to inform operational decisions. Some of the vendor-developed measures are based on methods originally developed in the academic world (e.g., Adjusted Clinical Groups).
Very little research on the scientific soundness of efficiency measures has been published to date. This includes measures developed by vendors as well as those published in the peer-reviewed literature. Although academics are more likely to publish articles evaluating scientific soundness, we found little peer-reviewed literature on the reliability and validity of efficiency measures. Several studies have examined some of the measurement properties of vendor-developed measures, but the amount of evidence available is still limited at this time. Vendors typically supply tools (e.g., methods for aggregating claims to construct episodes of care or methods for aggregating the costs of care for a population) from which measures can be constructed; thus, the assessment of scientific soundness requires an evaluation of the application as well as the underlying tools. Significant questions about the scientific soundness of efficiency measures have been raised. The lack of testing of the scientific soundness of efficiency measures reflects in part the pressure to develop tools that can be used quickly and with relative ease of implementation.
The focus of vendor-developed measures is on producing tools that are feasible for routine operational use. Most of the measures abstracted from the peer-reviewed literature were based on available secondary data sources (i.e., claims data). These measures could feasibly be reconstructed at little cost and measurement burden. The vendor-developed measures also rely largely on claims data. Most of the vendor-developed measures require that the user obtain and pay for a license either directly or through a value added reseller. This has prompted some organizations to begin developing open-source, public domain measures of efficiency. This work is at an early stage.
For efficiency metrics to have the effects intended by users, the information produced from measures must be actionable. We found little research on the degree to which the intended audiences for these measures (e.g., consumers, physicians, hospitals) were able to readily use the information to choose or deliver care differently.
We found little overlap between the measures published in the peer-reviewed literature and those in the grey literature, suggesting that the driving forces behind research and practice result in very different choices of measure. We found gaps in some measurement areas, including: no established measures of social efficiency, few measures that evaluated health outcomes as the output, and few measures of providers other than hospitals and physicians.
Efficiency measures have been subjected to relatively few rigorous evaluations of their performance characteristics, including reliability (over time, by entity), validity, and sensitivity to methods used. Measurement scientists would prefer that steps be taken to improve these metrics in the laboratory before implementing them in operational uses. Purchasers and health plans are willing to use measures without such testing under the belief that the measures will improve with use.
The lack of consensus among stakeholders in defining and accepting efficiency measures that motivated this study was evident in the interviews we conducted. An ongoing process to develop consensus among those demanding and using efficiency measures will likely improve the products available for use. A major goal of the AQA has been to develop a consensus around use of language in describing measures of economic constructs. The National Quality Forum is similarly working to achieve consensus on criteria for evaluating measures. Both groups support the use of clear language in describing particular metrics, which may be easier to implement than a consensus definition of efficiency.
Research is already underway to evaluate vendor-developed tools for scientific soundness, feasibility, and actionability. For example, we identified studies being done or funded by the Government Accountability Office, MedPAC, CMS, Department of Labor, Massachusetts Medical Society, and the Society of Actuaries. A research agenda is needed in this area to build on this work. We summarize some of the key areas for future research here but do not intend to signal a prioritization of needed work.
Filling Gaps in Existing Measures
Several stakeholders recognize the importance of using efficiency and effectiveness metrics together but relatively little research has been done on the options for constructing such approaches to measurement. Much of the developmental work currently underway at AQA is focused on this gap.
We found few measures of efficiency that used health outcomes as the output measure. Physicians and patients are likely to be interested in measures that account for the costs of producing desirable outcomes. We highlight some of the challenges of doing this, which parallel the challenges of using outcomes measures in other accountability applications; thus, a program of research designed to advance both areas would be welcome.
We found a number of gaps in the availability of efficiency measures within the classification system of our typology. For example, we found no measures of social efficiency, which might reflect the choice of U.S.-based research. Nonetheless, such measures may advance discussions related to equity and resource allocation choices as various cost containment strategies are evaluated.
Evaluating and Testing Scientific Soundness
There are a variety of methodological questions that should be investigated to better understand the degree to which efficiency measures are producing reliable and valid information. Some of the key issues include whether there is enough information to evaluate performance (e.g., do available sample sizes allow for robust scores to be constructed?); whether the information is reliable over time and in different purchaser data sets (e.g., does one get the same result when examining performance in the commercial versus the Medicare market?); methods for constructing appropriate comparison groups for physicians, hospitals, health plans, and markets; methods for assigning responsibility (attribution) for costs to different entities; and the use of different methods for assigning prices to services. Remarkably little is known about these various methodological issues, and a program of systematic research to answer these questions is critical given the increasing use of efficiency measures in operational applications.
Evaluating and Improving Feasibility
One area for investigation is the opportunity to create easy-to-use products based on methods such as DEA or SFA. This would require work to bridge from tools used for academic research to tools that could be used in operational applications.
Another set of investigations is identifying data sources or variables useful for expanding inputs and outputs measured (e.g., measuring capital requirements or investment, accounting for teaching status or charity care).
Making Measures More Actionable
Considerable research needs to be conducted to develop and test tools for decisionmakers to use for improving health care efficiency (e.g., relative drivers of costs, best practices in efficient care delivery, feedback and reporting methods) and for making choices among providers and plans. Research could also identify areas for national focus on reducing waste and inefficiency in health care. Comparing the relative utility of measurement and reporting on efficiency with that of other methods (Toyota's Lean approach, Six Sigma) could also be worthwhile for setting national priorities.