Chapter 41. Human Factors and Medical Devices
Harvey J. Murff, M.D.
Harvard Medical School
John W. Gosbee, M.D., M.S.
Department of Veterans Affairs National Center for Patient Safety
David W. Bates, M.D., M.Sc.
Harvard Medical School
Human factors engineering (HFE), also known as usability engineering or ergonomics, is the study of how humans interact with machines and complex systems.1 Through the merging of cognitive psychology, engineering and other disciplines, human factors researchers have detailed numerous principles concerning device and software program designs that allow for optimal usage.2 When these principles are violated, improper use of a machine is more likely to result.3,4 Specific examples have been detailed during observations of user errors with electronic infusion devices.5
Medical device misuse is an important cause of medical error,6,7 and therefore, incorporating human factors methodology into the design of medical devices has assumed an important role in ensuring patient safety.2,8 This chapter first describes the use of HFE principles as a safety practice in the design of medical devices and their evaluation both prior to and after institutional purchase. Next, medical device alarms and the contribution of HFE to alarm improvements will be evaluated. Finally, the chapter reviews the use of preoperative checklist procedures to reduce anesthesia device failures (see also Chapter 23).
Human factors engineering is a powerful component in the design of usable, safe medical devices.8 HFE principles can be incorporated as safety practices that occur at various points during device development and usage. Industry can use HFE principles at multiple times in the design and developmental cycle of medical devices and software packages.3 Healthcare institutions can consider results of HFE evaluations when deciding which products to purchase. Finally, HFE principles can also be incorporated into the ongoing evaluation of devices that have already been purchased and are in use. While these practices have high face validity, there has been little formal study of their effectiveness in reducing medical error. They are presented here because they may hold promise if scrutinized rigorously, and to familiarize readers with their potential to reduce medical error.
Design and Developmental Phase
Data collected by the United States Food and Drug Administration (FDA) in the late 1980s demonstrated that almost half of all medical device recalls resulted from design flaws.9 In 1990, Congress passed the Safe Medical Devices Act, giving the FDA the ability to mandate good manufacturing practices (GMP). These GMP involve design controls for manufacturers that help ensure the use of HFE within medical device design.9 As described in the Good Manufacturing Practice Regulation, Design Control subsection (Title 21-Section 820.30), these include the use of iterative design and testing during the developmental phase. The Act requires that designs be "appropriate and address the intended use of the device, including the needs of the user and patient."10 Multiple human factors techniques, such as user studies, prototype tests, and task/function analysis, are utilized in the development and design process.
Manufacturers are required not only to use human factors principles to repeatedly test the product in all phases of design, but also to validate the ultimate device design. Validation entails testing the device, either in an actual clinical situation or a simulation, and documenting that the device conforms to the individual user's needs. Thus, manufacturers are required to apply HFE methods through the multiple phases of device design and development cycles.10
Human factors engineering practices for medical device design and evaluation have been well described. In 1993 the Association for the Advancement of Medical Instrumentation and the American National Standards Institute established guidelines for the incorporation of HFE principles into medical device design.11 This comprehensive document helped direct attention to the problem of poor medical device design and helped establish the design standards necessary for safe medical equipment; it should stand as the safety benchmark for industry.
Only limited data are available concerning the application of HFE principles to medical device design, and most are not published. Nonetheless, the application of human factors principles during a device's design phase has been demonstrated to reduce user error. Patient-controlled analgesia (PCA) pumps are a case in point. User errors associated with poor interface design have been described with PCA pumps.12,13 Lin and colleagues investigated whether applying human factors engineering principles to the design of the user interface of a PCA pump could result in fewer dosage errors as well as less time spent programming the device.14 Information on device usage was obtained through cognitive task analysis, which involved observing and interviewing nurses operating PCA pumps both in the laboratory setting and in the field. Utilizing this feedback, as well as other human factors design principles, a "new" PCA pump interface was designed. Twelve recovery room nurses were required to complete specific tasks with both the standard PCA user interface and the newly designed interface. There were 29 programming errors on the traditional interface and 13 on the redesigned interface (an error reduction of 55%, p<0.01). Furthermore, users were able to program the necessary orders in 18% less time.14
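The size of the reported improvement is easy to verify; the short Python sketch below simply recomputes the relative error reduction from the counts quoted above.

```python
# Programming-error counts reported for the two PCA pump interfaces:
# 29 errors on the traditional interface, 13 on the redesigned one.
errors_traditional = 29
errors_redesigned = 13

# Relative reduction in programming errors achieved by the redesign
reduction = (errors_traditional - errors_redesigned) / errors_traditional
print(f"Relative error reduction: {reduction:.0%}")  # prints "Relative error reduction: 55%"
```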
Another example involves the design of an ultrasound machine. In this study, Aucella and colleagues15 interviewed sonographers, videotaped the ultrasound device being used, and performed usability testing through simulation to collect information regarding the operator-machine interface of the ultrasound machine. After their extensive investigations they implemented over 100 design changes to the console and control panel. Although errors with the machine were not measured, comments collected by the authors from the beta operators of the newly designed device suggested that the resulting machine was much easier to use.
Enormous numbers of medical devices and software packages are being designed and developed. The FDA has therefore initiated several regulatory mechanisms to ensure compliance with these guidelines, including site inspections of manufacturers, review and approval of medical devices before marketing, and review of medical device incident reports.16 Despite the tremendous effort put forth by the FDA to ensure compliance with the Good Manufacturing Practices Regulation, individual institutions should critically analyze whether a device they intend to purchase meets HFE principles for user-centered design.
Device Evaluation Prior to Purchase
Adhering to HFE principles during initial design stages of a medical device is essential. However, human factors analysis should also be incorporated into the institutional decision to acquire a new medical device or software program.3 Device purchasers should strongly consider institution-specific human factors testing. Usability testing at the institutional level establishes built-in redundancies to capture any design problems missed by manufacturers. Furthermore, the users and environments at individual institutions will differ, possibly in important ways, from the users and environments in which the device or program was initially designed and tested. It is important for an institution to be aware of who the intended users of the device or software will be, as well as where and when they plan to use the device. The information for such evaluations may be obtained from vendors, from an in-house analysis, or from independent organizations.
Vendors must be able to prove to the FDA that the user will be able to operate the medical device in the way in which it was intended.10 As companies are required to collect human factors analysis data, it is important that institutions wishing to purchase a new medical device or software receive and carefully review this information. Gosbee provides a list of questions to ask a vendor before a purchase, which include: "How long does it take to learn to operate the system? How long does it take to complete typical set-up tasks? What are the types and frequency of errors that could happen, and the systems to thwart them?"3
It is also important to consider the environment in which a device will be used. Idiosyncratic features of the environment, such as excessive noise or poor lighting, and differences in user skill or acuity due to fatigue or otherwise, may affect safety and the device's in-house usability.
Some institutions have developed in-house usability labs, in order to rigorously test any device before purchasing. The Mayo Clinic uses simulations to test the usability of medical software before purchasing.17 By carefully measuring user performance with the software they are able to uncover latent errors in the design. The usability lab is also able to measure the time necessary to learn to use the new software. This important information can help predict the device's or software's influence on workflow as well as its predilection for operator misuse.
Even without sophisticated usability laboratories, an institution can use basic human factors techniques to evaluate a product before purchase.3 Powerful techniques such as the cognitive walk-through can be easily utilized at any institution. This involves observing end-users interacting with the product while they are instructed to "think out loud" as they attempt to use it. Careful observation of the users' actions and comments can identify potential design flaws that might make the device or software difficult to use.
Independent organizations are another potential source of information on device safety. Unfortunately, most independent sources do not make clear to what degree HFE principles were used in product evaluations, although they do provide some assessment of safety. One such organization is ECRI (formerly the Emergency Care Research Institute), a nonprofit international health services research agency. Another is the Institute for Safe Medication Practices (ISMP). Both release newsletters and publications regarding product safety. By searching these and similar databases, institutions can gather additional information concerning product safety prior to purchasing a device. ECRI also publishes articles specifically geared to institutions that wish to purchase a medical device or software.
Regardless of the level of pre-procurement testing, some unsafe designs will not be detected until after the product is in use.3 Therefore, it is important for institutions to continuously evaluate these products to ensure safety.
Ongoing Device Evaluation
Devices and software at greatest risk for user error should be systematically evaluated. This is particularly important in areas where multiple devices are used with different interfaces, such as the operating room or the intensive care units.3 Furthermore, areas where multiple medications are stored together should be scrutinized for potential latent errors within device or software user interfaces prior to user errors occurring.
Resources are available that can help direct an institution's search. Through publications from the FDA, ECRI, ISMP and similar organizations, medical device problems identified at other institutions can be targeted. Thus an important safety practice may be using this published information to search for latent errors within already purchased medical devices and applying this information toward a directed product evaluation at the local institution.
Another potential safety practice is to educate practitioners about HFE principles to increase awareness of medical device user error.3 Several groups, including the American Nurses Credentialing Center and the American Society of Health-System Pharmacists, recommend incorporating HFE training within healthcare curricula as a means to reduce error.18
To create a culture of safety within medicine, practitioners must couple the ability to identify potential design weaknesses with a change in the prevailing culture of silence surrounding medical errors. Educational programs directed at healthcare providers in training should address both of these concerns. Curricula for teaching medical students and residents HFE principles have been described18 and will likely be adopted at other institutions. Casarett and Helms caution that an unintended result of an error curriculum may be that residents become too willing to attribute an error to system causes.19 Their concern is that residents will ignore their own possible contribution to an adverse medical event and fail to learn from analyses of the event. Gosbee discounts this concern, stating that any error-in-medicine curriculum should aim to "teach residents to see when errors are due to inadequate skills and knowledge versus when they are due to inherent cognitive limitations and biases."18
Medical Device Alarms
Numerous aspects of patient care compete for providers' attention and can reduce their vigilance in monitoring medical devices. Alarms can alert providers to urgent situations that might otherwise be missed amid these distractions and have become a necessary part of patient monitoring. In a study of critical incidents within a neonatal intensive care unit, 10% were detected through alarms.20
However, fundamental flaws in the design of current alarm systems likely decrease their impact.21 There are reports documenting some alarm failings in the medical literature,22 but few data address interventions to improve alarm system effectiveness. For an alarm to be effective it requires that a medical problem trigger the alarm, that personnel identify the source and reason for the alarm, and that the medical problem be corrected prior to patient injury. This section reviews 2 aspects of alarm safety: (1) the use of HFE principles in the redesign of medical alarms to improve identification of the source and reason for alarm, and (2) practices in both device design and programming that may improve safety by decreasing false positive alarms.
Identification of Alarm Source and Reason
The recognition accuracy of alarms within the operating room is quite low. When presented with alarm sounds and asked to identify the source, anesthesiologists, operating room technicians, and operating room nurses correctly identify the device producing the alarm only 33% to 53.8% of the time.23-25 Furthermore, experiments suggest that humans have difficulty reliably recognizing more than 6 alarms at one time.26 The sheer number of different medical devices with alarms can make it difficult to discern one alarm from another, and studies within the human factors literature have documented the inability of medical providers to distinguish between high-priority and low-priority alarms.27 While this is a known problem in operating rooms and intensive care units, how well alarms are recognized in other settings has not been described.
Some effort has been made to improve alarm systems through redesign.28 One non-medical study examined ways to improve the recognition of auditory alarms by comparing abstract alarm sounds with specially designed alarms using speech and auditory icons.29 Other studies within the human factors literature have revealed certain acoustical properties that are more likely to result in a higher sense of perceived urgency by the operator.
In a series of experiments, Edworthy required subjects to rank the level of urgency associated with different alarms.30 The acoustical properties of the alarms were altered for the different subjects, and level of urgency was then correlated with a specific alarm sound. After ranking a set of acoustic parameters based on perceived urgency, the experimenters predicted what urgency ranking each alarm would receive and played the alarms for a new set of subjects. The correlation between the subjects' urgency ratings and the investigators' predicted ratings was 93% (p<0.0001). Acoustical properties such as fundamental frequency, harmonic series, and delayed harmonics all affected the user's perceived urgency.
Another study looked at the redesign of an alarm to improve detectability within the operating room.31 An alarm that was spectrally rich, frequency-modulated, and contained small amounts of interpolated silence was detectable with at least 93% accuracy over background operating room noise. However, both of these alarm experiments were conducted only in laboratory settings. In addition, Burt and colleagues found that when subjects were required to perform a task urgently, the acoustically manipulated perception of urgency was ignored in favor of the situational urgency of the task.32 Furthermore, with both alarms and clinical tasks competing for an operator's attention, the newly designed alarm might not be as discernible. Creating the optimal auditory alarm sound to indicate an emergency remains a challenge.
Visual Interfaces for Alarms
Alarms can also be visual. Some research has been done to improve hemodynamic monitoring device displays. Responses to abnormal values are delayed when workload for the anesthesiologist is high,33 prompting interest in improving current visual displays. Furthermore, the clinical decision process often rests on the practitioner's interpretation of a patient's hemodynamic parameters. Thus, it is important that this information be presented in a way that assists with decision making and minimizes errors of interpretation.
Two observational studies have compared different visual displays of data with traditional visual monitors.34,35 Each evaluated errors in performing a designated task as well as time to completion. One measured how quickly subjects recognized a change in a parameter;34 the other measured how long it took anesthesiologists to bring a set of abnormal parameters to a stable set.35 Both studies used computerized simulations of anesthesiology cases, with subjects serving as their own controls. In one study, subjects were required to identify when changes in physiologic parameters occurred using different visual formats.34 Response time and accuracy in the simulated cases were compared among a histogram, a polygon, and a numerical display. Subject responses were more accurate with the histogram and polygon displays (p=0.01).
In the other study, 20 anesthesiologists with an average of 5 years' working experience were required to perform specific tasks on an anesthesia simulator35 (see Chapter 45). The tasks consisted of returning a set of abnormal hemodynamic parameters to normal using intravenous medications. Time to completion was compared among 3 different visual interfaces. Trial time was significantly shorter with the traditional display (p<0.01), yet there were fewer failed trials using the other monitor displays (26% with the profilogram display, 11% with the ecological display, and 42% with the traditional display). The slower times with the non-traditional displays could have resulted from the subjects' lack of experience with such screens. Nevertheless, the newer interfaces produced fewer failed attempts at arriving at the appropriate hemodynamic parameters on the simulator, suggesting that these displays might improve the clinical decision process.
None of the studies comparing traditional auditory alarms and visual monitor displays reported any adverse event associated with the newer technology. However, these studies are limited by the artificial nature of the experiments.29,34,35 Anesthesiologists have many tasks to perform during anesthesia, often amidst great distraction, and attending to monitors is only one aspect of their workload. Because these laboratory experiments do not include all of the "real world" problems and diversions that an anesthesiologist might face, it is difficult to generalize them to the workplace. Moreover, because the experimental task is removed from the context of caring for a patient in the operating room, subjects may simply focus on completing it without considering the other tasks an anesthesiologist would be required to perform in a real situation.
Decreasing the Frequency of Alarms
Poorly designed device alarms can create not only problems with alarm recognition but also frequent false positive alarms. Two observational studies found that 72% to 75% of alarms during routine general anesthesia did not require corrective action.36,37 Another study showed that only 3% of all auditory alarms during routine anesthesia monitoring represented a patient risk.38 Providers frequently must interrupt clinical tasks to silence these false positive alarms. More concerning, unreliable alarms tend to be ignored.21,39 This "cry-wolf" effect significantly degrades the performance of alarm systems and may have dire consequences when "true alarms" are ignored.
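The "cry-wolf" effect follows directly from the arithmetic of rare events. The sketch below applies Bayes' rule with hypothetical but plausible numbers (true events in 1% of monitored intervals, a 25% false positive rate, in line with the ranges cited above) to illustrate how few triggered alarms reflect a real problem.

```python
def alarm_predictive_value(prevalence: float, sensitivity: float,
                           false_positive_rate: float) -> float:
    """Probability that a triggered alarm reflects a real event (Bayes' rule)."""
    true_alarms = sensitivity * prevalence
    false_alarms = false_positive_rate * (1.0 - prevalence)
    return true_alarms / (true_alarms + false_alarms)

# Hypothetical values: real events are rare (1% of monitored intervals),
# the alarm catches nearly all of them, but it fires falsely 25% of the time.
ppv = alarm_predictive_value(prevalence=0.01, sensitivity=0.99,
                             false_positive_rate=0.25)
print(f"Fraction of alarms that are true: {ppv:.1%}")  # roughly 4%
```

Even with near-perfect sensitivity, under these assumptions only about one alarm in twenty-five signals a genuine event, which helps explain why providers learn to disregard them.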
False alarms can be managed in two ways: devices can be designed to identify and eliminate false alarms before triggering, or users can manipulate alarm parameters to reduce false alarms. User manipulation can range from adjusting alarm thresholds40 to turning the alarms off entirely.22 There are no data describing how often operators reset alarm parameters to reduce false positive rates.
Some research has focused on identifying alarm parameters that improve or optimize alarm accuracy (i.e., the ratio of true positives to false positives, the "signal-to-noise" ratio). For example, Rheineck-Leyssius and Kalkman studied how altering an alarm parameter on a pulse oximeter would affect the incidence of hypoxemia.40 Consecutive patients admitted to the recovery room of a regional hospital in the Netherlands after general or regional anesthesia were randomized to a lower alarm limit of either SpO2 90% or SpO2 85%. The 2 groups were comparable at baseline. The outcome measured was hypoxemia, defined as a pulse oximeter reading of 90% or less, or of 85% or less. The authors also judged whether each signal represented artifact or a true positive, and they were blinded to group assignment during artifact assessment and data analysis. The relative risk of a hypoxemic episode (SpO2 ≤85%) in the group with the lower alarm limit set at 85% (as compared with those with the lower alarm limit set at 90%) was 3.10 (95% CI: 1.32-7.28, p<0.001). One weakness of this study was the lack of a bedside observer to verify the validity of the measurements, so it is unclear to what degree measurement bias could have affected the results. The pulse oximeter was considered the "gold standard" for measuring hypoxemia, and thus false positives were calculated based on alarm artifact rates (outliers, loss of signal). Keeping the lower alarm limit at 90% did reduce the number of patients with hypoxemia, but it also increased the false positive rate (33% versus 28%). A higher false positive rate could make it more likely that an operator will disregard the alarm. The majority of alarms were transient, lasting less than 20 seconds.
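Relative risks of this kind, with their 95% confidence intervals, are computed from a 2×2 table of outcomes. The sketch below implements the standard log-scale (Katz) interval; the event counts are hypothetical for illustration, since the trial reports only summary statistics, not the raw table.

```python
import math

def relative_risk(a: int, n1: int, b: int, n2: int):
    """Relative risk of an event in group 1 vs. group 2, with a 95% CI
    computed on the log scale (Katz method).
    a/n1 = events/total in group 1; b/n2 = events/total in group 2."""
    rr = (a / n1) / (b / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)  # SE of log(RR)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi

# Hypothetical counts: 30/150 hypoxemic episodes with the alarm limit at 85%
# versus 10/150 with the limit at 90% (not the trial's actual data).
rr, lo, hi = relative_risk(30, 150, 10, 150)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```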
The authors also noted a 60% reduction in the number of triggered alarms in the SpO2 90% group by introducing a "theoretical delay" of 15 seconds between crossing the alarm threshold and actually triggering the alarm. Other investigators have documented a 26% reduction in mean alarm rate by increasing the alarm delay from 5 to 10 seconds.41
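The mechanism behind such a delay can be illustrated with a simple simulation: threshold crossings shorter than the delay never trigger an alarm, so transient artifacts are filtered out while sustained desaturations still alarm. The once-per-second trace below is invented for illustration.

```python
def count_alarms(spo2_trace, threshold=90, delay=0):
    """Count alarm triggers in a once-per-second SpO2 trace.
    An alarm fires only when the reading has been below `threshold`
    for more than `delay` consecutive seconds."""
    alarms = 0
    below = 0  # consecutive seconds spent below the threshold
    for reading in spo2_trace:
        if reading < threshold:
            below += 1
            if below == delay + 1:  # fires once per sustained episode
                alarms += 1
        else:
            below = 0
    return alarms

# Synthetic trace: a 5-second artifactual dip, then a 30-second true desaturation.
trace = [97] * 20 + [86] * 5 + [97] * 20 + [86] * 30 + [97] * 20

print(count_alarms(trace, delay=0))   # both dips trigger -> 2 alarms
print(count_alarms(trace, delay=15))  # only the sustained dip -> 1 alarm
```

The trade-off, of course, is that the delay also postpones notification of genuine events, which is why delays studied in the literature are on the order of seconds rather than minutes.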
Overall, only modest evidence supports the practice of not lowering pulse oximeter alarm limit settings below 90%. This intervention could reduce hypoxemic events with little added cost, though the increased number of false positive alarms might affect attendance to the device. Fortunately, newer technological advances in oximetry appear to reduce false positive rates and may make this less of a problem. In a study in the Netherlands, a conventional pulse oximeter was compared with a "third generation" pulse oximeter equipped with a signal processing technique designed to reduce false positives.42 This "smart" pulse oximeter applied signal quality tests, differentially amplified the input signal, and applied motion-detection testing to the identified pulse. It triggered only one false positive (an alarm that did not coincide with hypoxemia) and had a relative risk of 0.09 (95% CI: 0.02-0.48) for generating a false positive alarm when compared with conventional pulse oximetry with a 21-second delay.
Observational studies have suggested that current alarm systems could be improved, but further laboratory studies are needed to determine which properties of alarm systems most effectively alert operators. These tests must be followed by field studies and ultimately by trials of actual patient outcomes to determine the best designs. Information concerning the cost and feasibility of implementing these changes should also be gathered.
False positive alarms remain a significant problem. Few data exist on the incidence of resetting alarm parameters or at what parameter values alarm accuracy is optimized. Advances in alarm technology aimed at reducing false positives appear a promising alternative to resetting parameters.