Chapter 3. Results
Health Care Efficiency Measures: Identification, Categorization, and Evaluation
The electronic literature search identified 4,324 titles (Figure 2). An additional five articles were suggested at a conference attended by the principal investigator. Reference mining identified another 113 potentially relevant titles.
Of the titles identified through our electronic literature search, 3,692 were rejected as not relevant to our project, leaving 632 total from all sources. Repeat review by the research team excluded an additional 62 titles. Seven titles could not be located even after contracting with Infotrieve, a private service that specializes in locating obscure and foreign scientific publications. A total of 563 articles were retrieved and reviewed.
Screening of the retrieved articles/reports resulted in the exclusion of 245: 145 because the research topic was not health care efficiency measurement; 93 that did not report the results of an efficiency measure; 6 duplicate articles that were accidentally ordered; and 1 article with duplicate data. The remaining 318 articles were accepted for detailed review. Because of the volume of articles, we considered as first priority only those studies that reported efficiency using U.S. data sources. There were 158 such articles. (For a list of excluded studies, refer to Appendix F.)
The focus of the majority of articles on health care efficiency has been the production of hospital care. Of the 158 priority articles abstracted, 93 articles (59%) containing 155 measures examined the efficiency of hospitals. Studies of physician efficiency were second most common (33 articles, 21%, 45 measures), followed by much smaller numbers of articles focusing on the efficiency of nurses, health plans, other providers, or other entities. None of the abstracted articles reported the efficiency of health care at the national level, although two articles focused on efficiency in the Medicare program.
Articles were considered to contain an efficiency measure if they met our definition presented earlier—i.e., they included a measurement of the inputs used to produce a health care output. We abstracted 250 efficiency measures, summarized in Table 6 and listed in detail in Appendix G. The measures are organized according to the typology presented above: by perspective, outputs, and inputs. However, perspective—which asks who is the evaluator, who is being evaluated, and what are the objectives—could not be abstracted adequately from most articles, and is represented by unit of analysis in Table 6 and the discussion.
Table 6. Summary of efficiency measures abstracted from the peer-reviewed literature
[Table 6 cross-tabulates the abstracted measures by inputs and by outputs (health services outputs vs. health outcomes); the full table is not reproduced here.]
Source: Authors' analysis.
Almost all of the measures abstracted from the reviewed articles used health care services, such as inpatient discharges, physician visits, or surgical procedures, as outputs. Very few measures (4) included the outcomes of care, such as mortality or improved functional status. In addition, none of the outputs explicitly accounted for the quality of service provided. A small subset of measures attempted to account for quality by including it as an explanatory variable in a regression model in which efficiency was the dependent variable. Some articles also conducted analyses of outcomes separately from analyses of efficiency.
A larger number of measures used physical inputs (118) compared to financial inputs (74). Many measures used both physical and financial inputs (58). Studies of health plan efficiency were more likely to focus on financial inputs, while studies of provider efficiency were more likely to focus on physical inputs (particularly studies of physician efficiency).
Most of the measures abstracted from the peer-reviewed literature used econometric or mathematical programming methodologies for measuring health care efficiency. Two approaches were most common: data envelopment analysis (DEA) and stochastic frontier analysis (SFA). DEA is a non-parametric deterministic approach that solves a linear programming problem in order to define efficient behavior. SFA is a parametric approach that defines efficient behavior by specifying a stochastic (or probabilistic) model of output and maximizing the probability of the observed outputs given the model. These methods are described in more detail in Box 1. Some measures were ratio-based. Ratios were more common for physician efficiency measures than for hospital efficiency measures. The main difference between the various measurement approaches is that ratio-based measures can include only a single input and a single output (although various elements are sometimes aggregated to a single quantity), whereas SFA, DEA, and regression-based approaches explicitly account for multiple inputs and outputs.
The types of measures found are discussed below in more detail, organized primarily by the three tiers of the typology (perspective, outputs, and inputs).
Existing measures are based on a variety of methodologies. Each of these methods compares outputs to inputs across units within some setting. For example, they might compare discharges to labor hours within hospitals. The methods differ in their assumptions and their ease of implementation. Principal methods include ratios, data envelopment analysis (DEA), stochastic frontier analysis (SFA), regression-based approaches, and Malmquist and other index numbers. Ratios divide outputs by inputs. For example, a ratio could include hospital discharges in the numerator and some input into production, such as the number of full-time-equivalent (FTE) personnel, in the denominator, giving a measure of discharges per FTE. Dividing inputs by outputs would give the opposite but essentially equivalent ratio, or FTEs per discharge in our example. Ratios can also measure productive efficiency by treating cost as an input, giving a measure such as "dollars per discharge."
Ratios are easy to implement, requiring only a straightforward calculation based on data on a single output and input. They do not make any potentially mistaken assumptions about the relationship between the input and the output (e.g., that the number of discharges increases by a constant amount with the number of FTEs). However, ratios do not account for multiple outputs (e.g., outpatient treatments as well as inpatient discharges) and inputs (e.g., nursing vs. administrative labor). They also do not provide any direct information about the reasons why hospitals, physicians, or health plans vary in their performance so they may not be useful for directing improvement. Ratios may also mask the magnitude of an effect.
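For concreteness, the two ratio forms described above can be computed directly. The figures below are hypothetical, chosen only to illustrate the calculation:

```python
# Hypothetical hospital-level figures, for illustration only
discharges = 9_450           # annual inpatient discharges (output)
fte_personnel = 2_100        # full-time-equivalent staff (input)
total_cost = 94_500_000      # annual operating cost in dollars (input)

# Technical efficiency: output per unit of physical input
discharges_per_fte = discharges / fte_personnel

# Productive efficiency: cost treated as the input
cost_per_discharge = total_cost / discharges
```

Note that either ratio collapses the hospital's entire case mix into a single output count, which is exactly the limitation discussed above.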
DEA uses complex mathematical-programming techniques to produce an efficiency score for each unit analyzed.18,19 It can account for multiple inputs and outputs without requiring any assumptions about the relationship among them. DEA does assume that all inputs and outputs are included in the analysis, and the results may be unreliable if this assumption is not correct.20 Like ratios, DEA can be used to measure technical or productive efficiency. If cost data are available, differences in technical efficiency can be distinguished from differences in the costliness of the mix of productive inputs (e.g., the balance between physician and nursing labor). DEA is typically "deterministic," that is, this method usually ignores random noise in inputs and outputs as a potential source of variation in efficiency scores.
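The linear program at the heart of DEA can be sketched compactly. The version below is the standard input-oriented, constant-returns-to-scale ("CCR") formulation, solved once per unit; it is a generic illustration, not the implementation used in any particular study reviewed here:

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented CCR DEA efficiency scores.

    X: (n_units, n_inputs) physical or financial inputs.
    Y: (n_units, n_outputs) outputs (e.g., discharges, visits).
    Returns an array of scores in (0, 1]; 1 = on the efficient frontier.
    """
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.r_[1.0, np.zeros(n)]            # minimize theta
        # Inputs of the composite peer must not exceed theta * own inputs
        A_in = np.hstack([-X[o].reshape(m, 1), X.T])
        # Outputs of the composite peer must be at least own outputs
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        res = linprog(c,
                      A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[np.zeros(m), -Y[o]],
                      bounds=[(None, None)] + [(0, None)] * n,
                      method="highs")
        scores.append(res.fun)
    return np.array(scores)
```

A unit scoring 0.8, for example, could in principle produce its current outputs with 80% of its observed inputs, judged against the frontier formed by its peers.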
SFA is an econometric technique that allows for such "stochastic" noise.21 In an analysis of technical efficiency, a particular relationship between outputs and technical inputs is assumed; productive efficiency can be analyzed by specifying the relationship between costs and multiple outputs (if desired). Inefficiency is distinguished from measurement error through assumptions about the distribution of each. In particular, measurement error can lead observed output to be either higher or lower than expected based on observed inputs, while inefficiency can only lead output to be lower than expected. If these assumptions are valid, SFA can be more informative about inefficiency across units than DEA. SFA, like DEA, can be unreliable if some inputs or outputs are excluded.
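The distributional assumptions described above can be made concrete with a sketch of the classic normal/half-normal SFA estimator: two-sided noise v and one-sided inefficiency u, so that observed (log) output is y = X'β + v − u. The log-linear frontier and half-normal inefficiency distribution are illustrative assumptions, not the specification of any particular study reviewed here:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def sfa_halfnormal(X, y):
    """Production-frontier SFA: y = X'beta + v - u,
    v ~ N(0, sigma_v^2), u ~ |N(0, sigma_u^2)|.
    Returns (beta, sigma, lam) where sigma^2 = sigma_v^2 + sigma_u^2
    and lam = sigma_u / sigma_v."""
    A = np.c_[np.ones(len(y)), X]
    k = A.shape[1]

    def negll(p):
        beta, sigma, lam = p[:k], np.exp(p[k]), np.exp(p[k + 1])
        eps = y - A @ beta
        # Normal/half-normal composed-error log density
        ll = (np.log(2 / sigma) + norm.logpdf(eps / sigma)
              + norm.logcdf(-eps * lam / sigma))
        return -ll.sum()

    # Start from OLS estimates of the "average" function
    b0, *_ = np.linalg.lstsq(A, y, rcond=None)
    p0 = np.r_[b0, np.log(np.std(y - A @ b0)), 0.0]
    res = minimize(negll, p0, method="Nelder-Mead",
                   options={"maxiter": 5000})
    return res.x[:k], np.exp(res.x[k]), np.exp(res.x[k + 1])
```

The ratio λ = σ_u/σ_v summarizes the key identifying assumption: a λ well above 1 indicates that one-sided inefficiency, rather than symmetric noise, drives most of the variation around the frontier.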
Finally, there are regression-based approaches. For example, in corrected ordinary least squares (COLS) technical efficiency is analyzed by regressing an output on productive inputs.22 Like SFA, COLS makes an assumption about the relationship between inputs and outputs. COLS is easier to implement, but at the cost of making more restrictive assumptions about the relationship between inputs and outputs across units.23 Productive efficiency can also be analyzed with regression-based approaches.
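COLS is simple enough to state in a few lines: fit ordinary least squares, then shift the fitted function up by the largest residual so it envelops all of the data. A sketch, assuming a log-linear production function (a common but not universal choice):

```python
import numpy as np

def cols_efficiency(log_inputs, log_output):
    """Corrected OLS (COLS) technical-efficiency scores.

    log_inputs: (n, k) array of log-transformed inputs.
    log_output: (n,) array of log-transformed output.
    """
    A = np.c_[np.ones(len(log_output)), log_inputs]   # add intercept
    beta, *_ = np.linalg.lstsq(A, log_output, rcond=None)
    resid = log_output - A @ beta
    # Shift the frontier so the best-performing unit lies on it;
    # each score is output relative to the corrected frontier
    return np.exp(resid - resid.max())                # scores in (0, 1]
```

Unlike SFA, COLS attributes the entire deviation from the frontier to inefficiency, which is the more restrictive assumption noted above.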
Malmquist and other index numbers are a final, albeit infrequently used, approach.24 These methods "solve for" units' relative productivity based on observed data about, and an assumed relationship among, inputs and outputs. Like ratios, index numbers are relatively straightforward to calculate, yet multiple inputs and outputs can be accommodated. However, index numbers do not themselves provide any information about the sources of variation across units. They are also not useful for analyzing productive efficiency.
Box 1. Explanation of methods
Articles measuring hospital efficiency were most common (93 articles containing 155 measures). This focus on hospital efficiency is likely due to the high cost of hospital care, which accounted for 30% of total U.S. health spending in 2004.25 Increasing the efficiency of hospital care has been a longstanding focus of U.S. cost containment, with prospective payment implemented in the Medicare program in the mid-1980s and many private insurers following suit. Several measurement-related issues may also have contributed to the large number of analyses of hospital efficiency. The first is data availability: hospitals routinely collect utilization and cost data that can be used for efficiency measurement. For example, many studies use data from hospital discharge abstracts and the American Hospital Association (AHA) Annual Survey. Second, hospitals are relatively closed systems, so it is easier to measure and attribute all relevant inputs and outputs. The exception is physician services; since many physicians may have admitting privileges at a hospital, it is difficult to count how many physicians the hospital "employs" (although it is possible to measure the volume of physician services).
Most of the articles containing measures of hospital efficiency were research studies. A smaller number of articles were descriptive: these typically reported hospital efficiency scores without addressing a research question, or described efficiency measurement approaches with illustrative examples. In terms of measurement approach, multivariate analyses using DEA, SFA, or regression-based methods were the most common ways of measuring hospital efficiency. These included multiple inputs and outputs and often controlled for patient-level, hospital-level, or area-level factors that could be associated with efficiency. Ratios were also used to measure hospital efficiency. These measures compared the amount of a single input used to produce a single output. In some articles, the ratio was then used as the dependent variable of a regression model.
Outputs. All but 3 of the hospital efficiency measures used health services as outputs. Common outputs were discharges, inpatient days, physician visits in outpatient clinics, and inpatient and/or outpatient procedures performed. Similar outputs were used for hospital measures across the different measurement approaches employed such as SFA, DEA, and ratios. One of the 3 measures using health outcomes as outputs26 measured efficiency using hospital payments per life saved for patients in a single DRG (tracheostomy except for mouth, larynx, and pharynx disorder). Few of the outputs used in the measures accounted for differences in the quality or outcomes of the hospital care provided (i.e., quality was assumed to be equivalent). Several articles (e.g., Zuckerman, 199427) attempted to adjust for quality by entering it as an explanatory variable in regression models of hospital efficiency. Many measures adjusted for the case-mix of the outputs.
Inputs. The hospital efficiency measures were divided between measures using physical inputs and financial inputs, with more measures using physical inputs.
Physical inputs. About two thirds of the hospital efficiency measures (107 of 155 measures from 93 articles) included physical inputs. There were 66 measures that included only physical inputs and 41 measures that included both physical and financial inputs. These measures typically were used to compare the amount of labor, capital, and other resources used to produce outputs such as discharges and outpatient visits. The specific inputs used in the hospital efficiency measures varied widely between measures. There were 40 different inputs used. The average measure used four different inputs. Common physical inputs included:
- Physician labor—number of physicians (usually FTEs) or hours worked—can be difficult to measure since many physicians may have admitting privileges but generally a few account for the majority of admissions.
- Nursing labor—number of nurses (usually FTEs) or hours worked—often split into various categories such as RNs and LPNs.
- Administrative, technical, or other labor categories—number of personnel (usually FTEs) or hours worked.
- Beds—the number of beds was used as the most common indicator of capital stock.
- Depreciation of assets—a measure of capital, calculated in various ways.
An example of a measure that uses multiple physical inputs and multiple health services outputs comes from Grosskopf.1 This DEA-based measure used the following inputs (counts): physicians; nurses; other personnel; and hospital beds. As outputs it used (again, counts): outpatient procedures; inpatient procedures; physician visits in outpatient clinics; hospital discharges; and emergency visits. In comparison, a typical example of a measure that uses a single physical input and a single health services output (a ratio) is the number of hospital days (input) divided by the number of discharges (output), i.e., the average length of stay.2
Financial inputs. About one half of the hospital efficiency measures included financial inputs. These measures typically compare the cost of producing health services outputs such as discharges and outpatient visits. For example, Rosko et al.28,29 measured the total cost (including the costs of labor and capital separately) of producing case-mix-adjusted discharges and physician visits in hospital clinics, adjusting for provider- and area-level characteristics and estimating with SFA. A common example of a ratio-based measure using financial inputs is the total cost (input) used to produce case-mix-adjusted discharges (output).
Physician efficiency measures constituted the second most common category (33 articles containing 45 measures). One possible explanation for the paucity of physician efficiency measures relative to hospital efficiency measures is that the methodology for measuring physician efficiency has developed more recently (e.g., methods of grouping episodes of care to use as outputs). In addition, data sources covering physician care across multiple settings and types of care, including pharmaceuticals, are more difficult to collect and aggregate than data covering hospital stays.
Compared with the literature on hospital efficiency measurement, the physician efficiency literature included more descriptive articles. Approximately half of the articles containing physician efficiency measures were descriptive and half were research. Ratios were the most common methodology used in the physician efficiency measures, although multivariate approaches such as SFA and DEA were also common.
Outputs. All of the physician efficiency measures used health services as outputs. Similar to the hospital efficiency literature, none of the measures of physician efficiency accounted for the quality or outcome of the care provided. The types of health services used as outputs varied widely between measures, depending on the focus of the article. Common outputs included episodes of care and relative value units.
Inputs. Most of the physician efficiency measures (30 of 45) used physical inputs only. There were 7 measures that used financial inputs and 8 that used both physical and financial inputs.
Physical inputs. Ratio-based physician measures using physical inputs often compared the amount of service output produced per physician over a period of time. An example of a typical measure30 would be the relative value units of care provided per physician per month. Another common ratio-based physician measure using physical inputs was the number of visits per physician per week or month (e.g., Garg, 199131). DEA was used for six measures using physical inputs. An example32 used DEA to measure the amount of drugs, physician visits, ER visits, and lab/diagnostic tests used to produce an episode of care.
Financial inputs. There were 7 physician measures using only financial inputs. Three measures used ratios to compare the efficiency of physicians. A typical ratio-based measure33 using financial inputs compared per-member per-year costs (input: costs, output: covered lives) for physicians with responsibility for a defined patient population, controlling for case-mix and other patient characteristics. In this article, the ratio was then used as the dependent variable of a regression to examine the association between payment methods and efficiency. Another article34 measured total costs per episode; it used a regression-based approach to examine the effect of risk adjustment on efficiency measurement using Episode Treatment Groups.
There were nine articles containing ten measures focusing on health plan efficiency. The small number of articles focusing on health plan efficiency is surprising given the rapid increases in health plan premiums that employers and other purchasers of health insurance have faced in recent years. All nine of the articles containing health plan efficiency measures were research articles.
There was very little consistency in the approaches used to measure health plan efficiency. The most common approach was to compare the average amount of physical inputs (e.g., physician visits, hospital days) used by health plan beneficiaries over a period of time. Econometric methods, mostly DEA, were used in all of the measures except one, a ratio-based measure of cost per episode of care.35
Outputs. Four of the health plan efficiency measures used covered lives as the sole output. Two articles, both by Cutler and colleagues,35,36 used episodes of care, focusing on a specific condition (acute myocardial infarction). The three remaining articles used utilization counts as outputs, including multiple types of services such as physician visits and hospital days.
Inputs. Seven of the health plan efficiency measures used financial inputs. Only one measure used only physical inputs; two used both physical and financial inputs. The three measures including physical inputs all used DEA (one article also used SFA, and another a regression-based approach) to analyze the production of covered lives using multiple inputs. Two of these articles used utilization counts as inputs (hospital days, physician visits, etc.). The same variables were used as outputs in several measures of productive efficiency in health plans.
Four measures using only financial inputs used DEA or a regression-based approach to compare the total costs of producing multiple outputs (hospital days, physician visits, etc.). One article used SFA to measure cost per covered life. Finally, two measures used by Cutler et al. (described above) used either ratios35 or regressions36 to compare costs per episodes of care for a specific medical condition.
There were three articles containing three measures focusing on nursing efficiency. The measures described in these articles were all based on ratios, with two articles providing a descriptive, rather than a model-based, analysis. One article was unique in that it used a simulation approach rather than empirical data. Two articles used the number of hospital discharges as the output; the third used the number of non-physician visits. Commonly used inputs included the number of nurses, nurses' time, and labor cost.
Geographic Areas. Two articles37,38 compared the efficiency of hospital care between geographic areas. Both were by the same primary author and used DEA to measure the amount of various physical inputs used to produce physician visits and hospital discharges. These measures were similar to those used in hospital-focused articles, but were aggregated to the regional level.
Medicare. Two articles examined the efficiency of the Medicare program. One article39 reported on an analysis of trends in costs per hospital discharge and average length of hospital stay in hospitals paid by Medicare over time. Another article40 contained an analysis of the efficiency of the Medicare program using an area-level analysis, building on information from the Dartmouth Atlas of Health Care. The efficiency measure was a comparison of Medicare expenditures (inputs) used to produce survival (outputs) between regions. A simple comparison shows a negative relationship—areas with lower survival rates have higher Medicare expenditures. However, this comparison has a problem of reverse causation. Regions with a more severe case mix are expected to have higher spending, but higher spending is also expected to increase survival (other things being equal). In order to address this issue, an instrumental variables approach was used (intensity of care in the last six months of life was used as the primary instrument) to model regional survival rates as a function of Medicare expenditures.
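The instrumental-variables logic described above can be illustrated with a generic two-stage least squares sketch. This is the general technique, not the cited article's actual model; in that application, end-of-life care intensity plays the role of the instrument `z`, regional spending the endogenous regressor `x`, and survival the outcome `y`:

```python
import numpy as np

def two_stage_ls(y, x, z):
    """Two-stage least squares with one endogenous regressor x and
    one instrument z (all 1-D arrays); returns (intercept, slope)."""
    # Stage 1: project the endogenous regressor on the instrument
    Z = np.c_[np.ones(len(z)), z]
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted values
    X = np.c_[np.ones(len(y)), x_hat]
    return np.linalg.lstsq(X, y, rcond=None)[0]
```

Because the first stage keeps only the variation in `x` that is driven by the instrument, the second-stage slope is purged of the reverse-causation bias that contaminates the naive comparison of spending and survival.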
Integrated Delivery Systems. Two articles, both by the same primary author, compared the efficiency of integrated delivery systems. One article included multiple measures of physical inputs;41 the other included one measure using physical inputs and one measure using financial inputs.42 The article with multiple measures included two ratios (average length of stay and days of care per bed) and one DEA-based measure. The DEA measure included beds, ambulatory surgical centers, and total facilities as inputs, and inpatient/outpatient procedures and discharges as outputs. The second article included a similar DEA measure and a ratio-based measure using financial inputs, costs per hospital discharge.
Other Units. There were several units of observation where efficiency was measured in only one article. These included articles focusing on efficiency in community-based youth services,43 physician assistants,44 general practice medical residents,45 area agencies on aging,46 community mental health centers,47 hospital cost centers,48 dialysis centers,49 hospital pharmacies,50 medical groups,51 mental health care programs,52 organ procurement organizations,53 outpatient substance abuse treatment organizations,54 and cancer detection programs.55
In this section we describe the methods behind the efficiency measures abstracted from the articles in more detail. In doing so, we do not distinguish between units of observation as we did in the previous section.
Most of the measures abstracted from the peer-reviewed literature were based on available secondary data sources, most commonly claims or other administrative data. Of the 158 articles containing efficiency measures, 109 used available secondary data sources. The remaining articles collected primary data for the purpose of efficiency measurement (38 articles) or did not report their data source (11 articles).
Seventy-eight percent of the articles examined data at the level of the unit of observation for which efficiency was estimated (e.g., the physician or hospital). Fourteen percent, in addition, examined data at the individual-patient level. Sample sizes ranged from 1 to 6,353 for the former, and from 57 to 1,661,674 for the latter.
The majority of articles (70%) examined one or more explanatory variables, either to control for certain confounding variables (e.g., case-mix, market concentration), or to explain efficiency differences by some observed characteristic (e.g., whether the hospital was under public or private ownership).56 In 52% of the articles, at least one measure was used in combination with provider characteristics as explanatory variables. Similarly, 29% used area characteristics as explanatory variables, 14% of the articles included (diagnosis-unrelated) patient characteristics such as age and gender, and 42% included diagnosis-based case-mix information.
The time frame used by each study varied: 46% of articles examined efficiency at one point in time and based their findings on a single year of data (cross-sectional design), while 54% used data from multiple years and, in some cases, tracked efficiency over time (longitudinal design).
Sensitivity Analysis and Testing of Reliability and Validity
Thirty-six percent of the articles tested the robustness of their findings against alternative specifications of the models used. This approach, commonly known as sensitivity analysis, can provide helpful insights, because the choice of a particular model is often somewhat arbitrary. Viewed in that light, the proportion of articles that examined the sensitivity of their findings is surprisingly low. In addition, only four of the articles attempted to estimate the reliability and/or validity of the measures used.
The grey literature included efficiency measures developed and used by private groups that were otherwise not adequately captured in the peer-reviewed literature. We supplemented the information available in the grey literature with interviews of vendors and stakeholders. Thirteen organizations were contacted using a purposive reputational sampling approach. We identified organizations that had developed measures of health care efficiency, were in the process of developing such measures, or were evaluating and choosing measures. These organizations were selected based on nominations by members of the study team, by the TEP, or by other interviewed stakeholders and vendors. Participation in a meeting on efficiency sponsored by AHRQ and The Alliance, convened in Madison, Wisconsin, in May 2006, also aided in the identification of potential developers of efficiency measures.
Eight of these organizations are vendors marketing proprietary measures. The other five organizations represent stakeholders who have been exploring the use of in-house or vendor-developed measures. The vendor organizations included major developers of proprietary software used as efficiency measurement tools. The stakeholder organizations selected were either national leaders in quality and efficiency measurement and improvement (e.g., The Leapfrog Group, AQA, and NCQA) or regional coalitions with a long history of performance measurement and reporting (e.g., IHA in California and the Employer Health Care Alliance Cooperative, also known as The Alliance, in Wisconsin).
The results presented here are based on information gathered from eight vendors and five stakeholders who responded to our request for an interview.
Our scan identified eight major developers of proprietary software packages for measuring efficiency. Other vendors (not included in our study) provide additional analytic tools, solution packages, applications, and consulting services that build on top of these platforms. Although some of the vendors' measures were mainly developed for other purposes (e.g., risk adjustment) they all have been commonly used by payers and purchasers to profile the efficiency of provider organizations (e.g., hospitals, medical groups) and individual physicians. They have also been used in the selection of provider networks. In some cases they have also been used to create tiered insurance products, where patients are required to pay larger co-payments for visits to providers with lower efficiency scores. Activities to link provider profiling to pay-for-performance initiatives are underway.
These measures, when used to assess efficiency, generally take the form of a ratio, such as an observed-to-expected ratio of costs per episode of care, adjusting for patient risk. None of these measures used SFA, DEA, or the other multiple-input, multiple-output regression-based approaches common in the efficiency measures abstracted from the peer-reviewed literature. Almost all of these measures rely on insurance claims data.
The measures fall into two main categories: episode-based or population-based. An episode-based approach to measuring efficiency uses diagnosis and procedure codes from claims/encounter data to construct discrete episodes of care: series of temporally contiguous health care services related to the treatment of a specific acute illness, a set time period for the management of a chronic disease, or care provided in response to a specific request by the patient or other relevant entity.57 Efficiency is measured by comparing the physical and/or financial resources used to produce an episode of care. After additional risk adjustment, episodes are typically attributed to particular providers through rules based on the amount of care each provider delivered.
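The observed-to-expected construction behind these episode-based ratios can be sketched as follows. The column names are illustrative, and the peer-group mean used as the "expected" cost is a deliberately simple stand-in for a vendor's proprietary severity and risk adjustment:

```python
import pandas as pd

def observed_to_expected(episodes: pd.DataFrame) -> pd.Series:
    """Per-provider observed-to-expected cost ratio.

    `episodes` needs columns: provider, episode_type, cost
    (one row per attributed episode of care).
    """
    # "Expected" cost of an episode: the peer-group mean for its type
    # (a stand-in for proprietary risk adjustment)
    expected = episodes.groupby("episode_type")["cost"].transform("mean")
    totals = (episodes.assign(expected=expected)
                      .groupby("provider")[["cost", "expected"]].sum())
    return totals["cost"] / totals["expected"]   # >1 = costlier than peers
```

Summing observed and expected costs before dividing (rather than averaging per-episode ratios) keeps the ratio from being dominated by inexpensive episodes, a design choice common in profiling applications.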
Examples of episode-based approaches include:
- IHCIS-Symmetry of Ingenix: Episode Treatment Groups (ETGs). ETGs create distinct episodes of care and categorize them based on the relevant clinical condition and the severity of that condition. An episode of care is the unique occurrence of a condition for an individual and the services involved in diagnosing, managing, and treating that condition. ETGs use the diagnosis and procedural information on an individual's billed claims for medical and pharmacy services to identify distinct episodes of care for the individual.
- Thomson Medstat: Medstat Episode Groups (MEGs). MEGs apply the disease staging approach to classify discrete episodes of care into disease stages. The disease staging criteria define levels of biological severity or pathophysiologic manifestations for specific medical conditions (episodes of care); staging is driven by the natural history of the disease. In contrast to ETGs, treatments, whether medical or surgical, are not part of the MEG disease staging classification.
- Cave Consulting Group: Cave Grouper. The CCGroup Marketbasket System™ compares physician efficiency and effectiveness to a specialty-specific peer group using a standardized set of prevalent medical condition episodes, with the intent of minimizing the influence of patient case-mix (or health status) differences and statistical error. The Cave Grouper™ groups over 14,000 unique ICD-9 diagnosis codes into 526 meaningful medical conditions. The CCGroup EfficiencyCare™ Module takes the output of the Cave Grouper™ and develops specialty-specific efficiency scores that compare an individual physician's (or physician group's) efficiency against that of a peer group of interest.
A population-based approach to efficiency measurement classifies a patient population according to morbidity burden in a given period (e.g., one year). Efficiency is measured by comparing the costs or resources used to care for that risk-adjusted patient population for a given period. This approach is used when a single entity, such as a designated primary care provider or an insurance plan, can be assumed to be responsible for the efficiency of a defined patient population's care for a given period.
Examples of population-based approaches include:
- The Johns Hopkins University: Adjusted Clinical Groups (ACGs). ACGs are used to evaluate efficiency with respect to the total health experience of a risk-adjusted population over a given period of time. The ACG system uses automated claims, encounter, and discharge abstract data to characterize the level of overall morbidity in patients and populations. This person-focused approach assigns each individual to a single, mutually exclusive ACG category defined by patterns of morbidity over time, age, and sex.
- 3M Health Information Systems: Clinical Risk Grouping (CRG). The CRG system classifies patients into severity-adjusted, clinically homogeneous groups. It can be used prospectively and retrospectively for both inpatient and ambulatory encounters, and uses demographic data, diagnostic codes, and procedural codes to assign each individual to a single, mutually exclusive risk group that relates the individual's historical clinical and demographic characteristics to the amount and type of health care resources that individual will consume in the future.
- DxCG: Diagnostic Cost Groups (DCGs). DxCG models classify administrative data into coherent clinical groupings based on age, sex, diagnoses, and drug codes, and apply hierarchies and interactions to create an aggregated, empirically valid measure of expected resource use. This measure, called a "relative risk score," is calculated at the individual patient level and quantifies the financial implications of the patient's "illness burden," or morbidity. The classification systems are freely available and transparent.
- Health Dialog: Provider Performance Measurement System (PPMS). PPMS examines the health services resources a person at a given level of comorbidity uses over a predetermined period of time (usually one year). Based on John Wennberg's work, PPMS assesses and attributes unwarranted variations in the system with respect to three dimensions: (1) effective care; (2) preference-sensitive care; and (3) supply-sensitive care.
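At a high level, the population-based approaches above all reduce to comparing observed resource use against what a risk-adjusted peer population would be expected to use. The following is a rough illustration only, not any vendor's actual algorithm; the field names, risk groups, and dollar figures are hypothetical:

```python
from collections import defaultdict

def expected_costs_by_risk_group(members):
    """Mean annual cost per risk group across the whole population.

    `members` is a list of dicts with 'risk_group', 'cost', and 'provider'
    keys (illustrative field names, not any vendor's schema).
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for m in members:
        totals[m["risk_group"]] += m["cost"]
        counts[m["risk_group"]] += 1
    return {g: totals[g] / counts[g] for g in totals}

def observed_to_expected(members, provider):
    """Risk-adjusted efficiency score for one provider's panel (O/E ratio).

    Values above 1.0 mean higher-than-expected resource use for the
    panel's risk mix; values below 1.0 mean lower than expected.
    """
    expected = expected_costs_by_risk_group(members)
    panel = [m for m in members if m["provider"] == provider]
    observed = sum(m["cost"] for m in panel)
    exp = sum(expected[m["risk_group"]] for m in panel)
    return observed / exp

members = [
    {"provider": "A", "risk_group": "low",  "cost": 800},
    {"provider": "A", "risk_group": "high", "cost": 5000},
    {"provider": "B", "risk_group": "low",  "cost": 1200},
    {"provider": "B", "risk_group": "high", "cost": 7000},
]
# Expected cost: low = (800 + 1200) / 2 = 1000; high = (5000 + 7000) / 2 = 6000.
# Provider A: observed 5800 / expected 7000, so the ratio is about 0.83.
print(round(observed_to_expected(members, "A"), 2))  # 0.83
```

Real systems layer far more sophisticated grouping logic, hierarchies, and outlier handling on top of this skeleton; the sketch only shows why risk grouping is the core of the output definition.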
Table 7 provides a summary of key attributes of these vendor-developed measures. For both episode- and population-based measures, measure development has focused mainly on defining the output of the efficiency measure (the second level of the typology presented above). To use these tools as efficiency measures, vendors then construct inputs by adding the costs and/or resources used in producing that output, customized to the specification needs of users representing various perspectives (e.g., payers, health plans). Cost-based inputs can be constructed either with standardized pricing (e.g., Medicare pricing) or by allowing prices to vary according to users' specifications.
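To make the distinction between standardized and actual pricing concrete, here is a minimal sketch (hypothetical services, prices, and field names, not drawn from any actual payer data) of how the same episode yields different cost-based inputs under the two conventions:

```python
# A hypothetical fixed fee schedule, standing in for a Medicare-style price list.
STANDARD_PRICE = {"office_visit": 70.0, "mri": 500.0}

# Claims for one episode; 'paid' holds the actual negotiated price per unit.
episode_claims = [
    {"service": "office_visit", "units": 3, "paid": 95.0},
    {"service": "mri",          "units": 1, "paid": 820.0},
]

def episode_cost(claims, standardized=True):
    """Cost-based input for one episode.

    With standardized=True, each unit is priced from the fixed schedule, so
    differences across providers reflect quantities of resources used.
    With standardized=False, actual paid amounts are used, so price
    variation across payers and contracts is also captured.
    """
    if standardized:
        return sum(c["units"] * STANDARD_PRICE[c["service"]] for c in claims)
    return sum(c["units"] * c["paid"] for c in claims)

print(episode_cost(episode_claims))                      # 3*70 + 500 = 710.0
print(episode_cost(episode_claims, standardized=False))  # 3*95 + 820 = 1105.0
```

The same utilization thus produces two different cost inputs; which one a user wants depends on whether the question is about resource quantities or total spending.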
These tools have had other uses in addition to efficiency measurement. For example, most have been used as methods for risk and case-mix adjustment, and researchers use these grouping algorithms for risk adjustment in resource utilization prediction, provider profiling, and outcomes assessment. Efforts to validate and test the reliability of these algorithms as tools for creating relevant clinical groupings for comparison are documented in internal reports or white papers. However, very little information is available on efforts to validate and test the reliability of these algorithms specifically as efficiency measures (the available evidence is summarized in the next section).
The choice of episode-based versus population-based measures may depend on the context in which the measures are being used. For example, the management of chronic or acute conditions may be best understood at the level of an episode, whereas the management of preventive care may be best understood at the population level. Similarly, fee-for-service payments make episodes somewhat easier to interpret, whereas capitation payments can be evaluated using population-based methods. Adjusting population-based metrics for differences in enrollee characteristics and case mix may be difficult, and taking action on the findings may also be challenging.
Table 7. Efficiency measures developed by vendors
|Organization||Efficiency Measure Name||Approach||Description|
|IHCIS Symmetry of Ingenix||Episode Treatment Groups (ETG)||Episode-based||The ETG™ methodology identifies and classifies episodes of care, defined as unique occurrences of clinical conditions for individuals and the services involved in diagnosing, managing, and treating those conditions. Based on inpatient and ambulatory care, including pharmaceutical services, the ETG classification system groups diagnosis, procedure, and pharmacy (NDC) codes into 574 clinically homogeneous groups, which can serve as analytic units for assessing and benchmarking health care utilization, demand, and management.|
|Thomson Medstat||Medstat Episode Groups (MEG)||Episode-based||MEG is an episode-of-care-based measurement tool predicated on clinical definitions of illness severity. Disease stage is driven by the natural history and progression of the disease, not by the treatments involved. Based on the disease staging patient classification system, inpatient, outpatient, and pharmaceutical claims are clustered into approximately 550 clinically homogeneous disease categories. Clustering logic (i.e., construction of the episode) includes: (1) starting points; (2) episode duration; (3) multiple diagnosis codes; (4) lookback mechanism; (5) inclusion of non-specific coding; and (6) drug claims.|
|Cave Consulting Group||Cave Grouper||Episode-based||The CCGroup Marketbasket System™ compares physician efficiency and effectiveness to a specialty-specific peer group using a standardized set of prevalent medical condition episodes, with the intent of minimizing the influence of patient case-mix (or health status) differences and statistical errors in the methodology. The Cave Grouper™ groups over 14,000 unique ICD-9 diagnosis codes into 526 meaningful medical conditions. The CCGroup EfficiencyCare™ Module takes the output from the Cave Grouper™ and develops specialty-specific physician efficiency scores that compare individual physician (or physician group) efficiency against the efficiency of a peer group of interest.|
|National Committee for Quality Assurance (NCQA)||Relative Resource Use (RRU)||Population-based||The RRU measures report the average relative resource use for health plan members with a particular condition compared to their risk-adjusted peers. Standardized prices are used to focus on the quantities of resources used. Quality measures for the same conditions are reported concurrently.|
|The Johns Hopkins University||Adjusted Clinical Groups (ACG)||Population-based||ACGs are clinically homogeneous health status categories defined by age, gender, and morbidity (e.g., as reflected by diagnostic codes). Based on the patterns of a patient's comorbidities over a period of time (e.g., one year), the ACG algorithm assigns the individual to one of 93 mutually exclusive ACG categories for that span of time. Clustering is based on: (1) duration of the condition; (2) severity of the condition; (3) diagnostic certainty; (4) etiology of the condition; and (5) specialty care involvement.|
|3M Health Information Systems||Clinical Risk Grouping (CRG)||Population-based||The CRG methodology generates hierarchical, mutually exclusive risk groups using administrative claims data, diagnosis codes, and procedure codes. At the foundation of this classification system are 269 base CRGs which can be further categorized according to levels of illness severity. Clustering logic is based on the nature and extent of an individual's underlying chronic illness and combination of chronic conditions involving multiple organ systems further refined by specification of severity of illness within each category.|
|DxCG||Diagnostic Cost Groups (DCG) and RxGroups||Population-based||DxCG models predict cost and other health outcomes from age, sex and administrative data: either or both Diagnostic Cost Groups (DCG) for diagnoses and RxGroups® for pharmacy. Both kinds of models create coherent clinical groupings, and employ hierarchies and interactions to create a summary measure, the "relative risk score," for each person to quantify financial and medical implications of their total illness burden. At the highest level of the classification system are 30 aggregated condition categories (ACCs) which are subclassified into 118 condition categories (CCs) organized by organ system or disease group.|
|Health Dialog||Provider Performance Measurement System||Population-based||The Provider Performance Measurement System examines the health services resources a person at a given level of comorbidity uses over a predetermined period of time (usually one year). The measures incorporate both facility/setting (e.g., use of ER and inpatient services) and types of professional services provided (e.g., physician services, imaging studies, laboratory services). Based on John Wennberg's work, PPMS assesses and attributes unwarranted variations in the system with respect to three dimensions: (1) effective care; (2) preference-sensitive care; and (3) supply-sensitive care.|
We contacted a sample of stakeholders to seek their insights on efficiency measurement, based on their efforts in scanning, developing, and/or implementing efficiency measures. We also used their input to cross-validate our selection of vendors described in the section above. Our sample included two coalitions at the national level, two coalitions at the state level, and an accrediting agency; these stakeholders are listed in Table 8. We asked these stakeholders to provide the definition of efficiency that guided their efforts; describe the desirable attributes they considered as they searched for available measures; comment on their interest or objectives in developing and/or implementing efficiency measures; and list the proprietary measures they had considered. The desirable attributes they described are incorporated in the next section as criteria for assessing efficiency measures. Table 9 summarizes the comments we obtained. The TEP, which included various stakeholders and experts on efficiency measurement, also provided input into the search and reviewed this report. The TEP members are listed in Appendix D.
While the stakeholders used different definitions of "efficiency," they shared a number of common concerns related to efficiency measurement. Many concerns were related to methodological issues such as data quality, attribution of responsibility for care to providers, risk adjustment, and identification of outliers. The stakeholders also shared a number of concerns related to the use of efficiency measures such as the appropriate way to make comparisons, how measures will be perceived by providers and patients, and the cost burden and transparency of measures. All of the stakeholders had been through decision processes about whether to use vendor-developed measures or develop their own measures in-house, with different conclusions reached.
Definition of Efficiency
Responses from stakeholder informants reflected the diversity of perspectives on and definitions of health care efficiency. While some stakeholders considered efficiency as an input-output relationship (e.g., resources used for a given condition), others conceptualized it as costs relative to one's peers. There is wide recognition of the importance of integrating efficiency measurement with quality measurement, particularly for pay-for-performance initiatives. Most stakeholder informants noted that they had considered proprietary software marketed by at least one of the vendors described in Table 7, typically through a request for information (RFI)/request for proposal (RFP) process. Informants also shared that the process of identifying, endorsing, and implementing efficiency measures involved input from multiple stakeholders, especially at the early stage of development.
Table 8. List of contacted stakeholders
|Stakeholders||Description||Perspective||Source of Information|
|The Alliance||The Alliance is a non-profit cooperative that was founded in 1990 by seven local employers in Wisconsin. Its current membership includes approximately 158 employers. Its public reporting program began in 1997.||Multi-stakeholder coalition||Organization's website|
|The Leapfrog Group||Founded and launched in 2000, membership of the Leapfrog Group includes Fortune 500 companies and other large private and public health care purchasers. The Leapfrog Hospital Reward Program is the first nationally standardized hospital incentive program, based on Leapfrog's public reporting program for private health care purchasers to measure and reward for performance in both quality and efficiency in inpatient care.||Business coalition||Telephone discussion, written material per study request, organization's website|
|National Committee for Quality Assurance (NCQA)||NCQA has over 10 years of experience in performance measurement and reporting, particularly among managed care organizations and, more recently, among individual physicians and medical groups. With the support of the Commonwealth Fund, NCQA began in 2005 to develop methods to benchmark physician performance, including efficiency.||Accrediting agency||Telephone discussion, written material per study request, organization's website|
|Integrated Healthcare Association (IHA)||Established in 1994, IHA is an association whose membership includes major health plans, physician groups, hospital systems, academic, consumer, purchaser, pharmaceutical and technology representatives in California. It has over 5 years of experience in pay for performance. One of its current projects is the measurement and reward of efficiency in health care.||Quality Improvement Collaborative||Telephone discussion, written material per study request, organization's website|
Issues of greatest concern to most stakeholders are related to:
- Data aggregation and quality: which organizational entity should provide, clean, and aggregate data files; will data be easily accessible; are data complete and populated correctly for evaluation; are complete, accurate encounter data available for capitated payment arrangements?
- Cost calculation: whether to use standardized costs vs. actual costs (this is especially complicated in regions where providers are heavily capitated, because claims data might not be available); are service-level data on prices or payment rates accurate and complete?
- Case-mix and severity adjustments: whether reliable methods exist to appropriately adjust for case-mix and severity of illness.
- Attribution: how to attribute responsibility for care of a particular episode or patient to a provider.
- Outliers: how should cases with extremely high costs be treated (truncated, trimmed, etc.)?
- Comparison group: how to define appropriate peer groups for comparison.
- Clinical relevance: how will efficiency measures be perceived by the provider and patient communities?
- Transparency: will providers understand how the results of efficiency measurement were reached? Will they be confident that the results are scientifically sound and meaningful?
- Linkage to quality measures: how to evaluate efficiency with respect to quality.
- Score reporting: how to structure the reporting mechanism (single scores or multiple scores for multiple specialties) and make the score transparent and actionable.
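On the outlier question in the list above, the two most common treatments are truncation (dropping extreme cases entirely) and winsorization (capping them at a percentile). A minimal sketch follows; the 5th/95th percentile cutoffs and the $50,000 threshold are illustrative choices, not any organization's standard:

```python
def winsorize(costs, lower_pct=0.05, upper_pct=0.95):
    """Cap extreme episode costs at chosen percentiles instead of
    dropping them, so every case still contributes to the average."""
    ranked = sorted(costs)
    lo = ranked[int(lower_pct * (len(ranked) - 1))]
    hi = ranked[int(upper_pct * (len(ranked) - 1))]
    return [min(max(c, lo), hi) for c in costs]

def truncate(costs, upper=50_000):
    """Drop episodes above a fixed dollar threshold entirely."""
    return [c for c in costs if c <= upper]

costs = [900, 1200, 1500, 2200, 250_000]  # one catastrophic episode
print(winsorize(costs))  # the 250,000 outlier is capped, not removed
print(truncate(costs))   # the outlier is removed
```

The choice matters: truncation changes the denominator (some patients disappear from a provider's panel), whereas winsorization keeps every case but dampens the influence of catastrophic episodes.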
In addition, stakeholders whose initiatives involve voluntary participation expressed concerns about placing the cost burden on their participants. The Leapfrog Group, for example, developed their own efficiency measures for their pay-for-performance program for hospitals because the purchase of vendor-developed software might impose financial barriers to participation. On the other hand, several stakeholders shared with us that they considered vendors because many vendor-developed tools are already used to measure efficiency and they did not need to reinvent the wheel.
Stakeholder informants noted that by and large, efforts to measure and reward health care efficiency are still at a nascent and developmental stage, with most initiatives currently collecting baseline information and assessing feasibility. There are several examples of more mature initiatives, however, including the Massachusetts Group Insurance Commission's Clinical Performance Improvement project and the efforts of some individual health plans, including Blue Cross Blue Shield of Texas, Regence Blue Cross Blue Shield, United Healthcare's Premium Designation Program, and Aetna's Aexcel.
Table 9. Summary of stakeholder inputsa
|Organization||Definition of Efficiency||Objective in Using Efficiency Measures||Description of Development/Selection of Efficiency Measures|
|The Leapfrog Group||Relative resource use, for a given condition||To measure and reward inpatient efficiency and quality among hospitals||Leapfrog's measures were developed in-house, through a multi-stakeholder process. They consulted other organizations with similar experience in measure development and proprietary vendors on constructing severity adjustments. Leapfrog is currently collecting baseline data for its Resource-Based Efficiency Measure in five clinical areas. The measure assesses average actual length of stay (ALOS) per case for a specific bed type (i.e., routine vs. specialty), adjusting for severity and re-admission within 14 days.|
|NCQA||Cost relative to peers||To measure resource use for areas of quality already captured by HEDIS measures||NCQA's efforts in assessing efficiency are implemented on two levels. The first is the systems level, including HMOs, PPOs, and integrated delivery systems (IDSs); the plan is to incorporate resource use into the updated HEDIS measure for 2007 in order to assess quality and cost of care at the health plan level. The second is the individual physician level: to assess quality and cost of care rendered by physicians, adjusting for risk. NCQA is currently in the process of working with stakeholders and selecting a vendor for this initiative.|
|IHA||"Cost of care" is a measure of the total health care spending, including total resource use and unit price(s), by payor or consumer, for a health care service or group of health care services, associated with a specified patient population, time period, and unit(s) of clinical accountability. b||To be used as a part of the pay for performance program||All P4P measurement decisions are made by multi-stakeholder P4P committees. After a comprehensive RFI and RFP process, Thomson Medstat was selected as the vendor/partner for efficiency measurement. Measures and methodologies for efficiency measurement are still being finalized. Measurement will be at the physician group level, and there will be both episode-based measures using Medstat's Medical Episode Grouper (MEG) and population-based measures. Measures will be risk adjusted for patient complexity and disease severity, and output to physician groups will be granular enough to be actionable. Measures are expected to be fully implemented by measurement year 2008.|
|The Alliance||The relationship between cost to the employer and the quality of care delivered.||(1) To implement an incentive program that takes into account performance in both quality measures and severity-adjusted costs; (2) to report health care cost and quality at the provider organization level to consumers so as to better inform decisionmaking.||The Alliance constructed their own measure of efficiency, which integrates both cost and quality dimensions. However, they used proprietary software to calculate severity-adjusted cost and mortality.|
|AQA||"'Efficiency of care' is a measure of cost of care associated with a specified level of quality." c "Cost of care" is a measure of the total health care spending, including total resource use and unit price(s), by payor or consumer, for a health care service or group of health care services, associated with a specified patient population, time period, and unit(s) of clinical accountability.||In addition to assessing individual physician, group, and system performance, efficiency measurement should also be designed for learning and to inform a research agenda.||The AQA aims to develop general principles for comprehensive cost of care measures and a parsimonious "starter" set of cost of care measures related to specific conditions or procedures.|
a Same sources of information as corresponding organizations in Table 8.
b IHA has adopted a working definition of efficiency, based on "cost efficiency" definition provided by the AQA.
c AQA website with email confirmation.