Proof and Policy from Medical Research Evidence

Special issue contains articles from the expert meeting "'Evidence': Its Meanings and Uses in Law, Medicine, and Health Care."

Cynthia D. Mulrow, University of Texas Health Science Center—San Antonio, and Kathleen N. Lohr, University of North Carolina School of Public Health—Chapel Hill.

Notice of Copyright

This article was originally published in the Journal of Health Politics, Policy and Law. All rights reserved. This material may be saved for personal use only, but may not be otherwise reproduced, stored, or transmitted by any medium, print or electronic, without the explicit permission of the copyright holder. Any alteration to or republication of this material is expressly prohibited.

It is a violation of copyright law to reproduce any copyrighted information from this publication without first obtaining separate permission directly from the copyright holder who may charge fees for the use of such materials. It is the responsibility of the user to contact and obtain the needed copyright permissions prior to reproducing materials in any form.

Permission requests should be directed to:
Journals Division
Duke University Press
Box 90660
Durham, NC 27708
Fax: (919) 688-3524


Evolution of Ideas about Medical Evidence
Assembling, Evaluating, and Interpreting Medical Research Evidence
Interpreting and Judging Medical Research
Applicability of Medical Research Evidence to Populations or Individuals
Recommendations Based on Evidence: Guidelines versus Standards


When judging the benefits and harms of health care and predicting patient prognosis, clinicians, researchers, and others must consider many types of evidence. Medical research evidence is part of the required knowledge base, and practitioners of evidence-based medicine must attempt to integrate the best available clinical evidence from systematic research with health professionals' expertise and patients' rights to be informed about diagnostic and therapeutic options available to them. Judging what constitutes sound evidence can be difficult because of, among other things, the sheer quantity, diversity, and complexity of medical evidence available today; the various scientific methods that have been advanced for assembling, evaluating, and interpreting such information; and the guides for applying medical research evidence to individual patients' situations. Recommendations based on sound research can then be brought forward as either guidelines or standards, and criteria exist by which valid guidelines and standards can be developed and promulgated. Nonetheless, gaps and deficiencies exist in current guidelines and in the methods for finding and synthesizing evidence. Interpreting and judging medical research involves subjective, not solely explicit, processes. Thus, developments in evidence-based medicine are an aid, but not a panacea, for definitively establishing benefits and harms of medical care, and the contributions that medical research evidence can make in any clinical or legal situation must be understood in a context in which judgment and values, understanding of probability, and tolerance for uncertainty all play a role.



Many types of evidence must be considered when judging the benefits and harms of medical care and forecasting the prognoses of patients (Table 1). This article addresses one form of evidence and answers the question "What constitutes sound medical research evidence?"

Specifically, as the first in a series of papers prepared for the workshop on "'Evidence:' Its Meanings and Uses in Law, Medicine, and Health Care," we address the evolution and current concepts of medical research evidence and methods that are used to synthesize and judge such evidence. Further, we offer an overview of the status of medical evidence, evidence-based medicine, and clinical practice guidelines in medicine. We review the history, development, and current meaning of evidence in medicine, as well as how medical evidence is currently manifested in guidelines.

The primary definition of "evidence" given in Webster's New World Dictionary (1988) applies: the data on which a conclusion or judgment may be based. It is accepted that medical data often are limited. Medical research inadequately addresses many health-related situations that confront patients, practitioners, health care systems, and policy makers. The gaps between what research evidence shows will likely benefit or harm, and what patients and the public receive or are exposed to, can be large (Haynes 1993). We do not focus on such gaps and the reasons behind them (e.g., inadequate decision support systems at the point of care, rapidly evolving complex knowledge, competing priorities and limited resources, conflicting values, errors, or insufficient skills and communication). Rather, we address methods for judging and summarizing health care evidence from the ideological perspective of the medical profession.

Table 1. Types of Evidence Involved in Medical Judgments

  • Medical research.
  • Particulars of patient situations such as course and severity of illness, concurrent mental and physical disease, education, beliefs, social resources, and finances.
  • Medical providers' experiences, beliefs, and skills.
  • Society's values.
  • Patients' readiness to accept, and adherence to, recommended diagnostic, therapeutic, and/or monitoring strategies.
  • Health care systems' rules, resources, and financing.


Evolution of Ideas about Medical Evidence

Both the diversity and quantity of medical evidence increased during the twentieth century. In the first half of the century, advances in medical research were based primarily on basic, physiologic, and reductionist approaches (Annas 1999; Porter 1997). Units of study focused on cells, organs, and animals. By the second half of the century, two major developments changed the face of medical research. First, revolutionary advances in our understanding of molecular and cellular biology prompted scientists to initiate remarkable new avenues of study, such as the Human Genome Project (HGP). Second, the branch of medicine known as epidemiology spawned new research designs for use with human participants, most notably the clinical trial (Bull 1959; Lilienfeld 1982; Porter 1997; Williams 1999). These new tools to answer important scientific questions raised the bar for medical research that was directly applicable to medical care of patients (Williams 1999).

Concomitant with the development of new research designs, the volume of medical research of all types increased. In the 1990s, more than two million articles were published annually in more than 20,000 biomedical journals, more than 250,000 controlled trials of health care therapies had been conducted, and more than $50 billion was being spent annually on medical research (Ad Hoc Working Group for Critical Appraisal of the Medical Literature 1987; Michaud and Murray 1996; Cochrane Collaboration 1999).

Not surprisingly, the medical profession's beliefs concerning evidence have evolved in the United States, influenced in large part by swings in its philosophies of how to approach health care (Figure 1). In the late 1700s, Dr. Benjamin Rush, a signer of the Declaration of Independence and the "founding father" of American medicine, urged practitioners and patients alike to be "heroic, bold, courageous, manly, and patriotic" (Payer 1988; Silverman 1993: 5). Rush's followers sanguinely believed in direct, drastic intervention: "When confronted by a sick patient, providers gather their purges and emetics, bare their lancets, and charge the enemy, prepared to bleed, purge and induce vomiting until the disease is conquered" (Silverman 1993: 6). A hundred years later, this "do everything you can, anything is possible" approach was replaced with a more nihilistic philosophy espoused by the famous North American physician and writer Oliver Wendell Holmes (not to be confused with the little-known son and Justice of the same name!). As a reaction to medicine's unbridled use of treatments such as purging, blistering, mercury, and arsenic, Holmes (1988: 6) espoused "doing nothing because doctors did more harm than good." A renowned early-twentieth-century American physician, William Osler, mirrored Holmes's message: "Most remedies in common use are likely to do more harm than good" (Thomas 1983: 15).

Thus, in the early 1900s, treatment of disease was a minor part of American medical curricula. Rather, the focus was on accurate diagnosis, prediction of course of disease, and doctors standing by as compassionate family friends and advisors (Porter 1997; Williams 1999). A therapeutic explosion around the time of World War II erased any notion that doctors would remain passive observers, sitting with a magazine of largely blank cartridges; a feverish and soaring optimism hit American medicine (Gordon 1994; Porter 1997; Williams 1999). We returned to Rush's "do everything you can, anything is possible" dogma.

Diagnostic and treatment strategies were adopted with little thought given to the need for careful observations in adequate numbers of patients and for comparisons of outcomes between persons given an intervention or diagnostic test and those not given the intervention or test. Potential harms of diagnostic and therapeutic approaches often were not studied, and innovations were adopted enthusiastically and uncritically. Fueled by recognition of some treatment disasters, an underlying value system firmly embedded in scientific inquiry and experiment, marked variation in the practice patterns of medical professionals, and new types of medical research and dissemination strategies, leading North American physicians propagated "evidence-based medicine" during the last decades of the twentieth century.

Evidence-based medicine is defined as the conscientious, explicit, and judicious use of current best evidence in making decisions about health care (Sackett et al. 1997). Evidence-based practice, building on the original definition, is said to be "an approach to decision making in which the clinician uses the best evidence available, in consultation with the patient, to decide upon the option which suits that patient best" (Muir Gray 1997: 9). The latter concept does emphasize the role of patients in shared decision making about their health care. Thus, practicing evidence-based medicine involves integrating the medical professional's expertise and the patient's right to choose among diagnostic and treatment alternatives with the best available external clinical evidence from systematic research.

Best available external clinical evidence is taken to mean clinically relevant evidence, often from the basic sciences of medicine, but especially from patient-centered clinical research into the accuracy and precision of diagnostic tests, the power of prognostic markers, and the safety, efficacy, and effectiveness of therapeutic, rehabilitative, and preventive regimens (Sackett et al. 1997). Although evidence-based medicine has provoked antagonism and skepticism among some academics and practicing physicians, many of its underlying principles reflect the medical profession's current understanding of sound medical evidence (Naylor 1995; Feinstein and Horwitz 1997; Lohr, Eleazer, and Mauskopf 1998). Moreover, evidence-based medicine stresses a structured critical examination of medical research literature; relatively speaking, it deemphasizes personal heuristics and the use of average practice as an adequate standard.


Assembling, Evaluating, and Interpreting Medical Research Evidence

Medical research evidence can be simple and straightforward or complex and conditional. The latter, common instance poses a tremendous challenge to consumers, health care providers, and policy makers who try to understand what scientific evidence is valid. Moreover, understanding the causes of diseases, benefits and harms of diagnostic or therapeutic strategies, and prognoses of patients often requires accumulating and critiquing data from multiple studies and disciplines (Hulka, Kerkvliet, and Tugwell 2000).

When evidence is not simple, and when there is a lot of it, we can use frameworks and trained experts to assemble, sort through, and integrate evidence. Scientific methods for assembling, evaluating, and interpreting medical research evidence have been developing rapidly (Light and Pillemer 1984; Eddy 1992; Cook et al. 1995; Cook, Sackett, and Spitzer 1995; Mulrow and Cook 1998; Cochrane Collaboration n.d.). The principles behind these methods are to avoid bias in finding, sorting, and interpreting data, and to be comprehensive and current (Table 2).

The methods that one uses to assemble and critique relevant evidence vary depending upon the question that is asked. Table 3 displays broad concepts of types of studies to look for and ways to critique and interpret them, depending upon whether the question relates to harm, diagnosis, prognosis, or treatment.

Table 2. Principles for Assembling, Evaluating, and Interpreting Medical Research

  • A priori explicit statements of questions being addressed.
  • Systematic, explicit rather than selective, "file drawer" searching for pertinent research.
  • Systematic sorting of relevant from irrelevant research using preset explicit selection criteria.
  • Systematic critique of the validity of individual pieces of medical research based on the quality of the research methodology.
  • Critique of the generalizability of pieces of research based on characteristics of participants involved in research studies and characteristics of the agents or strategies tested in the research.
  • Integration of bodies of evidence based on sources of evidence, research design, directions and magnitudes of clinical outcomes, coherence, and precision.
  • Extrapolation of research findings to particular situations based on preset criteria.
  • Continual updating and integrating of evidence (perpetual revision).
  • Open attribution and statement of conflict of interest by those who do research synthesis.


Interpreting and Judging Medical Research

Practitioners of evidence-based medicine and developers of clinical guidelines and standards may need to address the quality and strength of medical research at three levels. First (and arguably simplest) is evaluating the quality and applicability of individual studies. In this effort, one attempts to understand how well research studies have been designed and conducted as well as whether results apply to specific or general populations of patients. Second is evaluating the strength and applicability of a body of evidence about a specified clinical question. In the second effort, one judges how much credence and reliance to place on a collection of individual studies. The third consideration involves the intensity of recommendations, and so pertains more to experts developing authoritative guidelines containing recommendations than to experts assembling systematic reviews of the evidence. The force or intensity with which a recommendation is made often reflects the strength of evidence and the level of net benefit expected for the health service in question.

Table 3. Examples of Types of Relevant Research and Methods of Critique and Interpretation

Assemble Relevant Research

Harm:
  • Case reports with challenge designs.
  • Cohort studies.
  • Case-control studies.
  • Controlled trials.

Diagnosis:
  • Diagnostic test studies.

Prognosis:
  • Cohort studies.
  • Controlled trials.

Treatment:
  • Controlled trials.

Critically Evaluate Evidence

Harm:
  • Appropriate temporal relationship?
  • Appropriate follow-up duration?
  • Dose-response gradient?
  • Positive rechallenge test?
  • Comparison groups similar?
  • Exposure measured appropriately?
  • Outcome measured appropriately?
  • Strong and precise association?
  • Biologically plausible association?
  • Research sponsorship clear?

Diagnosis:
  • Test performed appropriately?
  • Independent, blind comparison to appropriate standard?
  • Appropriate spectrum of patients?
  • Standard applied regardless of test result?
  • Diagnostic power and precision?
  • Research sponsorship clear?

Prognosis:
  • Representative patient sample?
  • Follow-up long and complete?
  • Objective outcome criteria applied blindly?
  • Adjustment for known prognostic factors?
  • Validation set if testing predictive power?
  • Likelihood of outcomes over time?
  • Prognostic estimates precise?
  • Research sponsorship clear?

Treatment:
  • Randomized with concealed allocation?
  • Outcome assessments unbiased?
  • Groups treated equally except for intervention strategy?
  • Few withdrawals and dropouts?
  • Intention-to-treat analysis?
  • Tested intervention similar to practice?
  • Trial participants markedly atypical?
  • Research sponsorship clear?

Know How to Interpret

Harm:
  • Relative risk.
  • Relative odds.
  • Odds ratios.
  • Probability tests.
  • Confidence intervals.
  • Meta-analysis.

Diagnosis:
  • Sensitivity.
  • Specificity.
  • Likelihood ratio.
  • Probability tests.
  • Confidence intervals.
  • Meta-analysis.

Prognosis:
  • Absolute terms (five-year survival rate).
  • Relative terms (size of risk from a prognostic factor).
  • Survival curves.
  • Probability tests.

Treatment:
  • Relative risk reduction.
  • Absolute risk reduction.
  • Number needed to treat.
  • Probability tests.
  • Confidence intervals.
  • Meta-analysis.
  • Multivariate statistical techniques.
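
Several of the treatment and diagnosis measures named in Table 3 can be computed from simple counts. The Python sketch below is an illustration only: the function names, field names, and example counts are hypothetical and do not come from the article. It assumes a two-arm trial with an undesirable binary outcome and a 2x2 table comparing a diagnostic test with a reference standard. It omits confidence intervals and does not guard against zero cells, both of which a real analysis would need to address.

```python
from dataclasses import dataclass


@dataclass
class TreatmentEffect:
    relative_risk: float
    relative_risk_reduction: float
    absolute_risk_reduction: float
    number_needed_to_treat: float


def treatment_effect(events_treated, n_treated, events_control, n_control):
    """Summarize a two-arm trial in which the event is an undesirable outcome.

    Assumes the control risk is nonzero and the two risks differ;
    otherwise the divisions below raise ZeroDivisionError.
    """
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    relative_risk = risk_treated / risk_control
    absolute_risk_reduction = risk_control - risk_treated
    return TreatmentEffect(
        relative_risk=relative_risk,
        relative_risk_reduction=1 - relative_risk,
        absolute_risk_reduction=absolute_risk_reduction,
        number_needed_to_treat=1 / absolute_risk_reduction,
    )


def diagnostic_accuracy(true_pos, false_pos, false_neg, true_neg):
    """Summarize a 2x2 table of test result versus reference standard.

    Assumes specificity is neither 0 nor 1, so both likelihood
    ratios are defined.
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_likelihood_ratio": sensitivity / (1 - specificity),
        "negative_likelihood_ratio": (1 - sensitivity) / specificity,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to keep the arithmetic easy to follow.
    print(treatment_effect(100, 1000, 200, 1000))
    print(diagnostic_accuracy(true_pos=90, false_pos=20, false_neg=10, true_neg=80))
```

With these made-up counts, the event risk falls from 20 percent to 10 percent. The relative risk reduction is one half, the absolute risk reduction is 10 percentage points, and the number needed to treat is about 10. The same relative reduction applied to a rarer outcome would give a much larger number needed to treat, which is one reason the table lists relative and absolute measures separately.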

Interpreters and judges of medical evidence are faced with multiple sources and various research designs (Table 4) (Mulrow and Cook 1998). These can include laboratory experiments, observations in a single patient or groups of patients, studies in humans with cases (persons with condition of interest) compared to controls (persons without the condition of interest), and controlled trials of one diagnostic or therapeutic strategy compared to another.

Although in some situations the evidence will be clear, in many other situations judges of medical research are faced with murky, dubious, narrow, conflicting, or irrelevant evidence. They use judgment to weigh types of evidence based on study methodology and on the precision and magnitude of results. As exemplified in Table 3, not all pieces of evidence are equal; their value depends on the specific question and context.



Current as of April 2001
Internet Citation: Proof and Policy from Medical Research Evidence. April 2001. Agency for Healthcare Research and Quality, Rockville, MD.

