Executive Summary

Assessing the Evidence for Context-Sensitive Effectiveness and Safety


The emergence of new kinds of interventions to improve health care quality and safety has led to a rethinking of traditional health services and clinical research. Interventions intended to improve quality and safety are often complex sociotechnical interventions whose targets may be entire health care organizations or groups of providers, and they may be aimed at extremely rare events. As such, patient safety practices (PSPs) must be evaluated along two dimensions: the evidence regarding the outcomes of the safe practices and the contextual factors influencing the practices' use and effectiveness.

The methodological criteria for assessing the quality of clinical intervention research and evaluation studies may be insufficient for studies of the effectiveness of the organizational and behavioral change required to implement a safety practice. Indeed, researchers of PSPs, like clinical researchers, often have to assess whether an intervention works. Like organizational and behavioral researchers, they also need to determine whether such practices will work in their own settings (i.e., whether they will benefit patients in their own organization, with its unique attributes). In addition to questions of effectiveness (whether, how, and why interventions work), it is also important to consider unintended adverse consequences of implementing the safety practice. In other words, like medications, quality improvement (QI) and safety interventions can have side effects, which must be anticipated and measured.

Origin of this Report

Over the past decade, major concerns about the quality and safety of medical care have surfaced. Influential actors in our health care system, such as government payers, accreditors, and employers, have responded by creating a variety of incentives to promote quality and safety. The lack of consensus about standards creates a risk that the substantial investment in new knowledge will be undermined by poor study design, flawed execution, or inappropriate interpretation of study results. In addition, policymakers are encouraging or requiring provider organizations to implement safe practices in the absence of explicit criteria for evaluating the strength of the evidence supporting the practice under consideration, or evidence about the likelihood that patients will benefit.

Recognizing this major gap in knowledge and understanding, AHRQ supported the development of a report to identify criteria for assessing the context-sensitive effectiveness and safety of PSPs. Context is particularly crucial because it is believed to be a key factor differentiating the evaluation of PSPs from that of clinical interventions. Researchers, policymakers, and providers evaluating PSPs care not only about whether robust evidence supports the PSP, but also about whether and how they can implement the PSP in their organizations to improve patient outcomes.

To address these gaps, the Agency for Healthcare Research and Quality issued a Request for Proposals (RFP) focused on developing criteria to assess the effectiveness and safety of PSPs. In the RFP guiding this project, PSPs are described as "interventions; systems, organizational, and behavioral interventions; and various combinations of these." To provide a real-world basis for committee deliberations regarding the research questions, the study investigators, working with a panel of experts, chose to focus on five PSPs representing various aspects of the patient safety research field:

  1. Checklists for catheter-related bloodstream infection prevention.
  2. The Universal Protocol for preventing wrong procedure, wrong site, wrong person surgery.
  3. Computerized order entry/decision support systems.
  4. Medication reconciliation.
  5. Interventions to prevent in-facility falls.


In this 1-year project, we assembled a 22-member Technical Expert Panel (TEP) comprising international patient safety leaders, clinicians, policymakers, social scientists, and methodologists. We met with the TEP three times, performed many literature reviews, conducted five Internet surveys, and achieved consensus on the points below.

Key Findings

  1. Important evaluation questions for these PSPs are:
    1. What is the effectiveness of the PSP?
    2. What is the implementation experience of the PSP at individual institutions?
    3. What is the success of widespread adoption, spread, and sustainability of the PSP?

    Interpretation and significance: Evaluations of PSPs should explicitly consider these three questions. Journals should consider asking researchers to report on them separately. Also, implementers will want to assess their experience across all three questions.

  2. High-priority contexts for assessing context-sensitive effectiveness at individual institutions are:
    1. Structural organizational characteristics (such as size, location, financial status, existing quality and safety infrastructure).
    2. External factors (such as regulatory requirements, the presence in the external environment of payments or penalties such as pay-for-performance or public reporting, national patient safety campaigns or collaboratives, or local sentinel patient safety events).
    3. Patient safety culture (not to be confused with the larger organizational culture), teamwork, and leadership at the level of the unit.
    4. Availability of implementation and management tools (such as staff education and training, presence of dedicated time for training, use of internal audit-and-feedback, presence of internal or external individuals responsible for the implementation, or degree of local tailoring of any intervention).

    Interpretation and significance: Context is considered important in determining the outcomes of PSPs. The study investigators and the TEP judged these four domains as the most salient areas of context. This recommendation has broad implications for a variety of audiences. Researchers should be encouraged to measure and report on these contexts when describing a study of a PSP. Consumers of research will want to look for such reports, which will influence their interpretation of the study results and affect the applicability of the PSP to their setting. Accreditors and regulators should be reluctant to mandate adoption of a given PSP if it appears to be very dependent on context. In that case, they should also provide guidance on how that PSP might need to be modified depending on local contexts.

  3. There is insufficient evidence and expert opinion to recommend particular measures for patient safety culture, teamwork, or leadership. Given the plethora of existing measurement tools we identified and reviewed, our recommendation is to use whichever method seems most appropriate for the particular PSP being evaluated.
    1. For patient safety culture, the measurement methods with the most support were the AHRQ Patient Safety Culture Surveys, the Safety Climate Scale, and the related Safety Climate Survey.
    2. For teamwork, the most support was given to the ICU [Intensive Care Unit] Nurse-Physician Questionnaire; no other measure received more than half the votes of respondents.
    3. For leadership, the measures receiving the most support were the ICU Nurse-Physician Questionnaire, the Leadership Practice Inventory, and the Practice Environment Scale.

    Interpretation and significance: Because the four areas of context described in Point 2, above, are judged highest priority, it will be crucial to develop and use valid measures of them in PSP studies. Researchers' use of common validated instruments would better enable readers to evaluate whether published results are applicable to their own settings. The state of the science here is immature, and funders and researchers are encouraged to continue to develop standard measures of the key domains of context.

  4. The PSP field would advance by moving past categorizing studies of effectiveness simply as "controlled trials" versus "observational studies." Although controlled trials offer greater control of sources of systematic error, they often are not feasible in terms of time or resources. Controlled trials also often are not possible for PSPs requiring large-scale organizational change or for PSPs targeted at very rare events. Hence, strong evidence about the effectiveness and comparative effectiveness of PSPs can be developed using designs other than randomized controlled trials. However, PSP evaluators should be discouraged from drawing cause-and-effect conclusions from studies with a single pre- and post-intervention measure of outcome. More sophisticated designs (such as a time series or stepped wedge design) are available and should be used when possible.

    Interpretation and significance: Given the major push to improve patient safety and the focus on evidence-based practices (which are rapidly embedded in national standards such as those issued by the National Quality Forum, the Joint Commission, the Institute for Healthcare Improvement, and others), it will be crucial to develop standards for appropriate evaluations to answer key safety-oriented questions. The results above will help journal editors, funders, researchers, and implementers adopt robust study methods for PSPs, methods that most efficiently answer the key questions without undue bias.
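The warning above against single pre- and post-intervention comparisons can be illustrated with a small, purely hypothetical simulation (all numbers invented for illustration): when an outcome is already improving for unrelated reasons, a naive before/after difference attributes the secular trend to the PSP, whereas a segmented (interrupted time series) regression that models the trend does not.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical example: 24 monthly infection rates with a downward
# secular trend. A PSP is introduced at month 12 but has NO true effect.
months = np.arange(24)
trend = 10.0 - 0.2 * months                 # background improvement
rates = trend + rng.normal(0.0, 0.3, 24)    # observed noisy rates
post = (months >= 12).astype(float)         # post-intervention indicator

# Naive single pre/post comparison: mean(post) - mean(pre).
naive_effect = rates[post == 1].mean() - rates[post == 0].mean()

# Interrupted time series (segmented regression):
# rate ~ b0 + b1*month + b2*post, where b2 is the level change at the
# intervention after adjusting for the secular trend.
X = np.column_stack([np.ones(24), months, post])
b0, b1, b2 = np.linalg.lstsq(X, rates, rcond=None)[0]

print(f"naive pre/post effect:     {naive_effect:+.2f}")
print(f"ITS level-change estimate: {b2:+.2f}")
```

The naive comparison reports a sizable "improvement" that is entirely secular trend, while the trend-adjusted level-change estimate stays near zero, which is why designs such as time series (and, across multiple sites, stepped wedge rollouts) support stronger causal claims.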

  5. Regardless of the study design chosen, criteria for reporting on the following items in a PSP evaluation are necessary, both for an understanding of how the PSP worked in the study site, and whether it might work in other sites:
    1. An explicit description of the theory for the chosen intervention components, and/or an explicit logic model for "why this PSP should work."
    2. A description of the PSP in sufficient detail that it can be replicated, including the expected change in staff roles.
    3. Measurement of contexts in the four domains described in Point 2, above.
    4. Details of the implementation process, what the actual effects were on staff roles, and how the implementation or the intervention changed over time.
    5. Assessment of the impact of the PSP on outcomes and possible unexpected effects. Including data on costs, when available, is desirable.
    6. For studies with multiple intervention sites, an assessment of the influence of context on intervention and implementation effectiveness (processes and clinical outcomes).

    Interpretation and significance: These criteria (items a-f) are deemed necessary for an understanding of PSP implementation and effectiveness and the degree to which these elements are sensitive to context. Future AHRQ-supported evaluations of PSP implementation should adhere to the criteria developed by this project. Only through repeated assessments and measurements will it be possible to determine the context-sensitivity of PSPs, build the evidence base for which contexts are most important, and determine how they should be measured and reported.

Recommendations for Future Research

Based on the group discussions and a formal vote by the TEP, the most important needs for future research are:

1. Developing and validating measures of patient safety culture. Discussion at the panel meetings indicated that several technical experts considered patient safety culture to be the overarching important construct. This view may explain why patient safety culture received majority support as a high priority for future research, whereas research on leadership and teamwork measures did not. Specific suggestions for future research included:

  1. Developing validated measures of cultural adaptability to change.
  2. Assessing the potential distinction between a culture of safety, a culture of excellence, and organizational culture.
  3. Establishing connections between aspects of patient safety culture and patient outcomes or processes of care.
  4. Assessing correlations between measures.

Additional comments that we received can be summarized as "we think teamwork and leadership are important," "several measures are currently available," and "the most important thing at this point is for people to use them so we can start building some evidence about this construct."

2. Developing criteria and recommendations for what constitutes "reporting the intervention in sufficient detail that it can be replicated." More precise criteria for how PSP interventions should be described warrant additional research. In particular, the guidance described here, along with that provided by the Standards for Quality Improvement Reporting Excellence (SQUIRE) and the National Quality Forum (NQF), needs to be evaluated. Doing so will help determine which PSP elements need to be described in order to evaluate whether the PSP is truly effective. It also will help maximize the possibility of successful PSP replication with similar outcomes. Further research could also evaluate the effect of applying these draft criteria for PSP descriptions on the quality of PSP projects and published articles. Clearly, thoroughly describing PSPs also can help readers determine the relevance of an evaluation study to other PSPs or other contexts. For example, if a PSP requires an individual behavior change such as hand-washing, knowing the intervention details may help readers assess whether the results are relevant only to hand-washing interventions or could be applied to other types of PSPs requiring individual behavior change. Knowing the details of the intervention also could help readers determine how much the success of the PSP implementation depended on contextual issues (e.g., organization or teamwork).

3. Understanding the important items to measure and report on for implementation. Experts consider having comprehensive information about implementation key to being able to replicate a PSP. However, little empirical evidence exists about what makes a description of the PSP adequate for reporting. Assessing what implementers need to know, if they are to be able to implement or adapt an intervention in their own settings, is critical. Most experts considered "understanding the important items to measure and report on for implementation" to be related to or even the same as "reporting the intervention in sufficient detail that it can be replicated." This view suggests that the distinction between "the intervention" and "the implementation" may be an arbitrary line, and that ideal evaluations of PSP interventions need to consider the implementation as part of the intervention.

4. Developing a theory-based taxonomy or framework with which to describe and evaluate key elements of interventions, contexts, and targeted behaviors. Although the current project made a promising start on meeting this need, progress in this area will require additional development to produce a taxonomy that is sufficiently broad-based and flexible to be widely useful. Issues to be considered include whether a taxonomy is the preferable way to proceed, or whether a more useful strategy might be to create an explicit methodology that researchers could apply to specific problems and contexts. Yet another approach might be to devise an "assessment framework." Some experts sounded cautionary notes on this topic. They reported that outpatient PSP research may be too new to apply a taxonomy at this stage, that a single "unified" taxonomy may not be sufficiently flexible for diverse PSPs, and that multiple taxonomies may be needed in any case. The countervailing view was that the field would not be well served by a proliferation of taxonomies; what is needed instead is a coherent, sufficiently comprehensive taxonomy that can accommodate the challenges of the subject.

5. Refining a framework for assessing the strength of a body of evidence. We did developmental work on an adaptation of the GRADE and Evidence-based Practice Center (EPC) systems for assessing the strength of evidence across studies of a PSP. This work warrants further development.

6. Generating empirical evidence that the contextual factors identified in this project influence the success of PSPs. We acknowledge that most of the recommendations in this report have a thin empirical evidence base, which reflects the immature state of research in this still-young field. Building a stronger evidence base will help future efforts to refine the recommendations presented here.

Page last reviewed December 2010
Internet Citation: Executive Summary: Assessing the Evidence for Context-Sensitive Effectiveness and Safety. December 2010. Agency for Healthcare Research and Quality, Rockville, MD. http://archive.ahrq.gov/research/findings/final-reports/contextsensitive/contextsum.html