Chapter 5. Root Cause Analysis
Heidi Wald, M.D.
University of Pennsylvania School of Medicine
Kaveh G. Shojania, M.D.
University of California, San Francisco School of Medicine
Historically, medicine has relied heavily on quantitative approaches for quality improvement and error reduction. For instance, the US Food and Drug Administration (FDA) has collected data on major transfusion errors since the mid-1970s.1,2 Using the statistical power of these nationwide data, the most common types of errors have been periodically reviewed and systems improvements recommended.3
These epidemiologic techniques are suited to complications that occur with reasonable frequency, but not to rare (but nonetheless important) errors. Outside of medicine, high-risk industries have developed techniques to address major accidents. Clearly, the nuclear power industry cannot wait for several Three Mile Island-type events to occur in order to conduct valid analyses of their likely causes.
A retrospective approach to error analysis, called root cause analysis (RCA), is widely applied to investigate major industrial accidents.4 RCA has its foundations in industrial psychology and human factors engineering. Many experts have championed it for the investigation of sentinel events in medicine.5-7 In 1997, the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) mandated the use of RCA in the investigation of sentinel events in accredited hospitals.8
The most commonly cited taxonomy of human error in the medical literature is based on the work of James Reason.4,9,10 Reason describes 2 major categories of error: active error, which generally occurs at the point of human interface with a complex system, and latent error, which represents failures of system design. RCA is generally employed to uncover latent errors underlying a sentinel event.6,7
RCA provides a structured and process-focused framework with which to approach sentinel event analysis. Its cardinal tenet is to avoid the pervasive and counterproductive culture of individual blame.11,12 Systems and organizational issues can be identified and addressed, and active errors are acknowledged.6 Systematic application of RCA may uncover common root causes that link a disparate collection of accidents (e.g., a variety of serious adverse events occurring at shift change). Careful analysis may suggest system changes designed to prevent future incidents.13
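Finding root causes shared across otherwise unrelated accidents amounts to pooling the causes identified by each RCA and counting how many distinct events cite each one. Below is a minimal sketch. It assumes each completed RCA is stored as a hypothetical event label mapped to a set of free-text root-cause tags. The tags and events are illustrative, not drawn from any study.

```python
from collections import Counter


def recurring_root_causes(rca_findings, min_events=2):
    """Return root causes cited by at least `min_events` distinct RCAs.

    rca_findings maps a hypothetical event label to the set of
    root-cause tags that the RCA for that event identified.
    """
    counts = Counter()
    for causes in rca_findings.values():
        counts.update(set(causes))  # count each cause at most once per event
    return {cause: n for cause, n in counts.items() if n >= min_events}


# Hypothetical findings from three dissimilar serious adverse events
findings = {
    "chemotherapy overdose": {"shift change handoff", "no independent double-check"},
    "wrong-site surgery": {"shift change handoff", "site not marked"},
    "transfusion reaction": {"shift change handoff", "look-alike labels"},
}
print(recurring_root_causes(findings))  # {'shift change handoff': 3}
```

Raising `min_events` trades sensitivity for a shorter list of candidate latent errors.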
Despite these intriguing qualities, RCA has significant methodologic limitations. RCAs are in essence uncontrolled case studies. As the occurrence of accidents is highly unpredictable, it is impossible to know whether the root cause established by the analysis is the actual cause of the accident.14 In addition, RCAs may be tainted by hindsight bias.4,15,16 Other biases stem from how deeply the causes are probed and from the influence of the prevailing concerns of the day.16,17 The shift away from technological failures (device malfunction), which previously were the focus of most accident analyses, toward staffing issues, management failures, and information systems problems may be an example of the latter bias.17 Finally, RCAs are time-consuming and labor intensive.
Despite legitimate concerns about the place of RCA in medical error reduction, the JCAHO mandate ensures that RCA will be widely used to analyze sentinel events.8 Qualitative methods such as RCA should be used to supplement quantitative methods, to generate new hypotheses, and to examine events not amenable to quantitative methods (for example, those that occur rarely).18 As such, RCA's credibility as a research tool should be judged by the standards appropriate for qualitative research, not those for quantitative research.19,20 Yet the outcomes and costs associated with RCA are largely unreported. This chapter reviews the small body of published literature regarding the use of RCA in the investigation of medical errors.
To be credible, RCA requires rigorous application of established qualitative techniques. Once a sentinel event has been identified for analysis (e.g., a major chemotherapy dosing error, a case of wrong-site surgery, or a major ABO-incompatible transfusion reaction), a multidisciplinary team is assembled to direct the investigation. The members of this team should be trained in the techniques and goals of RCA, as the tendency to revert to personal biases is strong.13,14 Multiple investigators allow triangulation or corroboration of major findings and increase the validity of the final results.19 Based on the concepts of active and latent error described above, accident analysis is generally broken down into the following steps:6,7
- Data collection: establishment of what happened through structured interviews, document review, and/or field observation. These data are used to generate a sequence or timeline of events preceding and following the event.
- Data analysis: an iterative process to examine the sequence of events generated above, with the goal of determining the common underlying factors:
  - Establishment of how the event happened by identification of active failures in the sequence.
  - Establishment of why the event happened through identification of latent failures in the sequence that are generalizable.
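The two steps above can be viewed as a simple data pipeline. Observations gathered from interviews, documents and observation are first put in time order. Failures found along that sequence are then tagged as active (how the event happened) or latent (why it happened). The sketch below is minimal and illustrative. The `TimelineEntry` record, its fields and the example entries are hypothetical, not a standard RCA schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class FailureType(Enum):
    ACTIVE = "active"  # failure at the human interface: how the event happened
    LATENT = "latent"  # failure of system design: why the event happened


@dataclass
class TimelineEntry:
    time: datetime
    description: str
    source: str  # e.g. "interview", "document review", "field observation"
    failures: dict = field(default_factory=dict)  # failure text -> FailureType


def build_timeline(entries):
    """Data collection: arrange the gathered observations in time order."""
    return sorted(entries, key=lambda entry: entry.time)


def split_failures(timeline):
    """Data analysis: separate active failures (how) from latent ones (why)."""
    active, latent = [], []
    for entry in timeline:
        for failure, kind in entry.failures.items():
            target = active if kind is FailureType.ACTIVE else latent
            target.append(failure)
    return active, latent


# Hypothetical fragments of a chemotherapy dosing error, collected out of order
entries = [
    TimelineEntry(datetime(2001, 3, 2, 9, 15), "Dose administered", "document review",
                  {"dose not rechecked before administration": FailureType.ACTIVE}),
    TimelineEntry(datetime(2001, 3, 2, 8, 0), "Order transcribed", "interview",
                  {"order form has no dose-range check": FailureType.LATENT}),
]
timeline = build_timeline(entries)
active, latent = split_failures(timeline)
print([entry.description for entry in timeline])  # ['Order transcribed', 'Dose administered']
print(active)  # ['dose not rechecked before administration']
print(latent)  # ['order form has no dose-range check']
```

In practice the analysis step is iterative. Entries and failure tags would be revised as interviews and document review proceed.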
In order to ensure consideration of all potential root causes of error, one popular conceptual framework for contributing factors has been proposed, based on work by Reason. Several other frameworks also exist.21,22 The categories of factors influencing clinical practice include institutional/regulatory, organizational/management, work environment, team factors, staff factors, task factors, and patient characteristics. Each category can be expanded to provide more detail. A credible RCA considers root causes in all categories before rejecting a factor or category of factors as non-contributory. A standardized template in the form of a tree (or "Ishikawa" diagram) may help direct the process of identifying contributing factors, with the factors leading to the event grouped by category on the tree's "roots." Category labels may vary depending on the setting.23
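The completeness requirement can be pictured as an Ishikawa tree whose roots are the categories listed above. Each root must either hold at least one contributing factor or be explicitly ruled out as non-contributory. Below is a minimal sketch. The function name, the data layout and the example factors are hypothetical. The category labels follow the list in the text and may vary by setting.

```python
# Categories of factors influencing clinical practice, as listed in the text
CATEGORIES = (
    "institutional/regulatory",
    "organizational/management",
    "work environment",
    "team",
    "staff",
    "task",
    "patient characteristics",
)


def unreviewed_categories(fishbone, ruled_out):
    """Return categories with no recorded contributing factor and no
    explicit decision that they were non-contributory."""
    return [category for category in CATEGORIES
            if not fishbone.get(category) and category not in ruled_out]


# Hypothetical, partially completed analysis of a wrong-site surgery
fishbone = {
    "team": ["no pre-incision time-out"],
    "task": ["site marking not required by procedure"],
    "work environment": ["operating room schedule overrun"],
}
ruled_out = {"patient characteristics"}
print(unreviewed_categories(fishbone, ruled_out))
# ['institutional/regulatory', 'organizational/management', 'staff']
```

An empty result would signal that every category has been considered, which is the condition the text sets for a credible RCA.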
At the conclusion of the RCA, the team summarizes the underlying causes and their relative contributions, and begins to identify administrative and systems problems that might be candidates for redesign.6
Prevalence and Severity of the Target Safety Problem
JCAHO's 6-year-old sentinel event database of voluntarily reported incidents (see Chapter 4) has captured a mere 1152 events, of which 62% occurred in general hospitals. Two-thirds of the events were self-reported by institutions, with the remainder coming from patient complaints, media stories and other sources.24 These statistics are clearly affected by underreporting and consist primarily of serious adverse events (76% of events reported resulted in patient deaths), not near misses. The number of sentinel events appropriate for RCA is likely to be orders of magnitude greater.
The selection of events for RCA may be crucial to its successful implementation on a regular basis. Clearly, it cannot be performed for every medical error. JCAHO provides guidance for hospitals about which events are considered "sentinel,"8 but the decision to conduct RCA is at the discretion of the leadership of the organization.12
If the number of events is large and homogeneous, many events can be excluded from analysis. In a transfusion medicine reporting system, all events were screened after initial report and entered in the database, but those not considered sufficiently unique did not undergo RCA.25
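The screening step described above logs every report but sends only "sufficiently unique" events to RCA. The source does not say how uniqueness was judged. The minimal sketch below therefore assumes a hypothetical rule: an event qualifies only if no earlier event with the same (event type, process step) signature has already been selected. The field names and reports are likewise illustrative.

```python
def select_for_rca(events):
    """Log every reported event; send only 'sufficiently unique' ones to RCA.

    Hypothetical uniqueness rule: an event qualifies if no earlier event with
    the same (event type, process step) signature has already been selected.
    """
    database, for_rca = [], []
    seen_signatures = set()
    for event in events:
        database.append(event)  # every event is entered in the database
        signature = (event["type"], event["step"])
        if signature not in seen_signatures:
            seen_signatures.add(signature)
            for_rca.append(event)
    return database, for_rca


# Hypothetical transfusion-medicine reports
reports = [
    {"id": 1, "type": "mislabeled sample", "step": "collection"},
    {"id": 2, "type": "mislabeled sample", "step": "collection"},
    {"id": 3, "type": "wrong unit issued", "step": "issue"},
]
database, for_rca = select_for_rca(reports)
print(len(database), [event["id"] for event in for_rca])  # 3 [1, 3]
```

A real program would use trained reviewers rather than a fixed signature. The sketch only shows the split between logging everything and analyzing a subset.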
Opportunities for Impact
While routine RCA of sentinel events is mandated, the degree to which hospitals carry out credible RCAs is unknown. Given the numerous demands on hospital administrators and clinical staff, it is likely that many hospitals fail to give this process a high profile, assigning the task to a few personnel with minimal training in RCA rather than involving trained leaders from all relevant departments. The degree of underreporting to JCAHO suggests that many hospitals are wary of probationary status and the legal implications of disclosure of sentinel events and the results of RCAs.12,26
As RCA is a qualitative technique, most reports in the literature are case studies or case series of its application in medicine.6,27-30 There is little published literature that systematically evaluates the impact of formal RCA on error rates. The most rigorous study comes from a tertiary referral hospital in Texas that systematically applied RCA to all serious adverse drug events (ADEs) considered preventable. The time series contained background data during the initial implementation period of 12 months and a 17-month follow-up phase.13
Published reports of the application of RCA in medicine generally present incident reporting rates, categories of active errors determined by the RCA, categories of root causes (latent errors) of the events, and suggested systems improvements. While these do not represent clinical outcomes, they are reasonable surrogates for evaluation. For instance, increased incident reporting rates may reflect an institution's shift toward increased acceptance of quality improvement and organizational change.5,21
Evidence for Effectiveness of the Practice
The Texas study revealed a 45% decrease in the rate of voluntarily reported serious ADEs between the study and follow-up periods (7.2 per 100,000 to 4.0 per 100,000 patient-days, p<0.001).13 Although there were no fatal ADEs in the follow-up period, the small number of mortalities in the baseline period resulted in extremely wide confidence intervals, so that comparing the mortality rates serves little purpose.13
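Recomputing the relative decrease from the two rates quoted above is a quick consistency check. In the minimal sketch below, the rounded rates give an absolute drop of 3.2 per 100,000 patient-days and a relative decrease of about 44%. The one-point gap from the reported 45% is presumably due to rounding of the published rates. That is an assumption, since the unrounded rates are not given here.

```python
def relative_reduction(before, after):
    """Relative decrease between two rates expressed in the same units."""
    return (before - after) / before


# Reported rates of serious ADEs, per 100,000 patient-days
baseline_rate, follow_up_rate = 7.2, 4.0
print(f"{baseline_rate - follow_up_rate:.1f} fewer per 100,000 patient-days")  # 3.2 fewer ...
print(f"{relative_reduction(baseline_rate, follow_up_rate):.1%}")  # 44.4%
```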
The authors of the Texas study attribute the decline in serious ADEs to the implementation of blame-free RCA, which prompted important leadership focus and policy changes related to safety issues. Other changes consisted of improvements in numerous aspects of the medication ordering and distribution processes (e.g., the application of "forcing" and "constraining" functions that make it impossible to commit certain common errors), as well as more general changes in organizational features, such as staffing levels.
The significance of the decline in ADEs and its relationship to RCA in the Texas study is unclear. As the study followed a highly publicized, fatal ADE at the hospital, other cultural or systems changes may have contributed to the measured effect. The authors were unable to identify a control group, nor did they report data on serious ADEs in the year preceding the study. Their data may reflect underreporting, as there is no active surveillance for ADEs at the study hospital, leaving the authors to rely on voluntary reports. The decline in reported ADEs may actually call into question the robustness of their reporting system, as other studies have found that instituting a blame-free system leads to large increases in event reporting.5 On the other hand, it seems unlikely that serious ADEs would be missed in a culture of heightened sensitivity to error.
In a separate report, an event reporting system for transfusion medicine was implemented at 2 blood centers and 2 transfusion services.25 Unique events were subjected to RCA, and all events were classified using a model adapted from the petrochemical industry.21 There were 503 events reported and 1238 root causes identified. Human failure accounted for 46% of causes, 27% were due to technical failures, and 27% were from organizational failures. This distribution was very similar to that seen in the petrochemical industry, perhaps an indication of the universality of causes of error in complex systems, regardless of industry.
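Summary figures like these come from assigning each identified root cause to one top-level category and tallying the results. The minimal sketch below uses a hypothetical handful of classified causes, not the study's data. It keeps only the three top-level labels from the text and omits the finer subcategories of the model adapted from the petrochemical industry. It also computes the average number of root causes per event implied by the reported totals.

```python
from collections import Counter


def category_shares(classified_causes):
    """Fraction of root causes falling in each top-level category."""
    counts = Counter(classified_causes)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical root causes from two analyzed events (not the study's data)
causes = ["human", "organizational", "human", "technical"]
print(category_shares(causes))  # {'human': 0.5, 'organizational': 0.25, 'technical': 0.25}

# Average causes per event in the reported series: 1238 causes over 503 events
print(f"{1238 / 503:.2f}")  # 2.46
```

Because most events had more than one root cause, these shares describe causes, not events.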
Potential for Harm
The potential for harm with the use of RCA has received only passing mention in the literature, but harm might result from flawed analyses.31 The cost of pursuing absolute safety may be the implementation of increasingly complex and expensive safeguards, which are themselves prone to systems failures.4,16 Ill-conceived RCAs that result in little effective systems improvement could also dampen enthusiasm for the entire quality improvement process. Arguably, the harm caused by pursuit of incorrect root causes must be weighed against the costs of not pursuing root causes at all.
Costs and Implementation
No estimates of costs of RCA have appeared in the literature, but as it is a labor-intensive process they are likely significant. Counterproductive cultural norms and medico-legal concerns similar to those seen in incident reporting may hinder implementation of RCA.12,26 The authors of the Texas study note the importance of clear expressions of administrative support for the process of blame-free RCA.13 Other studies note the receptiveness of respondents to blame-free investigation in the name of quality improvement, with one health system reporting a sustained 10-fold increase in reporting.25,27
Root cause analyses systematically search out latent or system failures that underlie adverse events or near misses. They are limited by their retrospective and inherently speculative nature. There is insufficient evidence in the medical literature to support RCA as a proven patient safety practice; however, it may represent an important qualitative tool that is complementary to other techniques employed in error reduction. When applied appropriately, RCA may illuminate targets for change and, in certain healthcare contexts, may generate testable hypotheses. The use of RCA merits more consideration, as it lends a formal structure to efforts to learn from past mistakes.
1. Sazama K. Current good manufacturing practices for transfusion medicine. Transfus Med Rev 1996;10:286-295.
2. Food and Drug Administration: Biological products; reporting of errors and accidents in manufacturing. Fed Regist 1997;62:49642-49648.
3. Sazama K. Reports of 355 transfusion-associated deaths: 1976 through 1985. Transfusion 1990;30:583-590.
4. Reason JT. Human Error. New York: Cambridge University Press. 1990.
5. Battles JB, Kaplan HS, Van der Schaaf TW, Shea CE. The attributes of medical event-reporting systems: experience with a prototype medical event-reporting system for transfusion medicine. Arch Pathol Lab Med 1998;122:231-238.
6. Eagle CJ, Davies JM, Reason J. Accident analysis of large-scale technological disasters applied to an anaesthetic complication. Can J Anaesth 1992;39:118-122.
7. Vincent C, Ennis M, Audley RJ. Medical accidents. Oxford; New York: Oxford University Press. 1993.
8. Joint Commission on Accreditation of Healthcare Organizations. Sentinel event policy and procedures. Available at: http://www.JCAHO.org/sentinel/se_pp.html. Accessed May 30, 2001.
9. Reason JT. Managing the Risks of Organizational Accidents. Ashgate Publishing Company. 1997.
10. Reason J. Human error: models and management. BMJ 2000;320:768-770.
11. Leape LL. Error in medicine. JAMA 1994;272:1851-1857.
12. Berman S. Identifying and addressing sentinel events: an interview with Richard Croteau. Jt Comm J Qual Improv 1998;24:426-434.
13. Rex JH, Turnbull JE, Allen SJ, Vande Voorde K, Luther K. Systematic root cause analysis of adverse drug events in a tertiary referral hospital. Jt Comm J Qual Improv 2000;26:563-575.
14. Runciman WB, Sellen A, Webb RK, Williamson JA, Currie M, Morgan C, et al. The Australian Incident Monitoring Study. Errors, incidents and accidents in anaesthetic practice. Anaesth Intensive Care 1993;21:506-519.
15. Caplan RA, Posner KL, Cheney FW. Effect of outcome on physician judgments of appropriateness of care. JAMA 1991;265:1957-1960.
16. Perrow C. Normal accidents: Living with High-Risk Technologies. With a New Afterword and a Postscript on the Y2K Problem. Princeton, NJ: Princeton University Press. 1999.
17. Rasmussen J. Human error and the problem of causality in analysis of accidents. Philos Trans R Soc Lond B Biol Sci 1990;327:449-460.
18. Pope C, Mays N. Reaching the parts other methods cannot reach: an introduction to qualitative methods in health and health services research. BMJ 1995;311:42-45.
19. Giacomini MK, Cook DJ. Users' guides to the medical literature: XXIII. Qualitative research in health care A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA 2000;284:357-362.
20. Giacomini MK, Cook DJ. Users' guides to the medical literature: XXIII. Qualitative research in health care B. What are the results and how do they help me care for my patients? Evidence-Based Medicine Working Group. JAMA 2000;284:478-482.
21. Van der Schaaf T. Near miss reporting in the Chemical Process Industry [Doctoral thesis]. Eindhoven, The Netherlands: Eindhoven University of Technology. 1992.
22. Moray N. Error reduction as a systems problem. In: Bogner M, editor. Human error in medicine. Hillsdale, NJ: Lawrence Erlbaum Associates. 1994.
23. Ishikawa K. What is total quality control? The Japanese way. Englewood Cliffs, NJ: Prentice Hall. 1985.
24. Joint Commission on Accreditation of Healthcare Organizations. Sentinel Event Statistics. Available at: http://www.JCAHO.org. Accessed April 16, 2001.
25. Kaplan HS, Battles JB, Van der Schaaf TW, Shea CE, Mercer SQ. Identification and classification of the causes of events in transfusion medicine. Transfusion 1998;38:1071-1081.
26. Will hospitals come clean for JCAHO? Trustee 1998;51:6-7.
27. Leape LL, Bates DW, Cullen DJ, Cooper J, Demonaco HJ, Gallivan T, et al. Systems analysis of adverse drug events. ADE Prevention Study Group. JAMA 1995;274:35-43.
28. Bhasale A. The wrong diagnosis: identifying causes of potentially adverse events in general practice using incident monitoring. Fam Pract 1998;15:308-318.
29. Troidl H. Disasters of endoscopic surgery and how to avoid them: error analysis. World J Surg 1999;23:846-855.
30. Fernandes CM, Walker R, Price A, Marsden J, Haley L. Root cause analysis of laboratory delays to an emergency department. J Emerg Med 1997;15:735-739.
31. Hofer TP, Kerr EA. What is an error? Eff Clin Pract 2000;3:261-269.