An Epistemologist in the Bramble-Bush
At the Supreme Court with Mr. Joiner
Susan Haack, University of Miami
This article was originally published in the Journal of Health Politics, Policy and Law. All rights reserved. This material may be saved for personal use only, but may not be otherwise reproduced, stored, or transmitted by any medium, print or electronic, without the explicit permission of the copyright holder. Any alteration to or republication of this material is expressly prohibited.
Think before you think! (Stanislaw Lec)1
"Judges Become Leery of Expert Witnesses," ran a headline in the Wall Street Journal a couple of years ago; they are "Skeptical of Unproven Science"—the "Testimony of Dilettantes" (Schmitt 1997). Intrigued, I began to struggle through thickets of details of exploding tires, allegedly poisonous perfumes, leaking and bursting breast implants, contaminated insulating oil, etc., etc., and through legal developments from Frye through the Federal Rules of Evidence to Daubert, until eventually I found myself at the U.S. Supreme Court with Mr. Joiner, eavesdropping as the justices—for all the world like a conclave of medieval logicians—disagreed among themselves about whether there is a Categorical Distinction between methodology and conclusions.
Now that, I thought, certainly sounds like the kind of question to which an epistemologist or philosopher of science ought to be able to make a contribution; and, in due course, I shall have something to say about it. But I soon realized it was only the tip of a very large iceberg.
By now, scientific evidence of just about every kind (from DNA fingerprinting to battered-wife syndrome, from studies of mice injected with potentially carcinogenic chemicals to recovered memories) plays a large and apparently ever-growing role in both criminal and civil cases. The long and tortuous history of efforts to ensure that when the legal system relies on scientific evidence it is not flimsy speculation but decent work suggests that this interaction of science and the law raises some very tricky problems. And to judge by how often, in that long and tortuous history, explicit or implicit assumptions about the nature of scientific knowledge and the character of scientific inquiry are crucial, those problems are in part epistemological.
The epistemological issues intersect, of course, with problems of other kinds. Peter Huber is preoccupied with greedy tort lawyers hoping to earn huge contingency fees by winning cases with "junk science,"2 Kenneth Chesebro with heartless corporations hoping to avoid compensating the victims of their profitable but dangerous products (Chesebro 1993). I'm afraid both have a point. Both are well aware, however, that there is something about scientific evidence that encourages and enables the operation of such unsavory motives.
Almost a century ago, Learned Hand argued that the role of the expert witness—who not only may but must offer his opinion, draw conclusions—is anomalous, for if each party presents its own expert witness(es), the jury must decide "between two statements each founded upon an experience foreign in kind to their own"—when "it is just because they are incompetent for such a task that the expert is necessary at all" (Hand 1901: 54). Only a couple of years ago, Justice Breyer—concerned with scientific evidence specifically rather than with expert evidence generally, and focused less on the jury than the judge, on whom a significant gatekeeping burden now falls—suggested an essentially similar diagnosis. Reflecting that Daubert requires judges "to make subtle and sophisticated determinations about scientific methodology," he observes that "judges are not scientists, and do not have the scientific training that can facilitate the making of such decisions."3
In 1901, Hand had suggested court-appointed experts; in 1997, in his concurring opinion in Joiner, Justice Breyer urged that judges make more use of their power under Federal Rule of Evidence 706 to appoint scientists to advise them. But, as Hand himself had observed earlier in his article, when there are expert witnesses on both sides we ask the jury to decide "where doctors disagree" (ibid.: 154; emphasis mine). And now it begins to appear that there is a problem beyond judges' or juries' inability fully to understand scientific evidence. Many scientific claims and theories, at some point in their career, occupy that large grey area of the somewhat-but-far-from-overwhelmingly warranted; so sometimes the scientific determinations that judges or juries are asked to make may be so subtle and sophisticated, so manifold and tangled, that even those competent in the relevant area of science may legitimately disagree—or may agree that there is too little evidence, that they just don't know.
Legal efforts to winnow decent scientific evidence from the chaff, I shall argue, have often been based on false assumptions about science and how it works. It doesn't follow, unfortunately, that if we had a better understanding of science, all problems could be easily resolved. A better understanding of scientific evidence and inquiry will reveal why it has proven so difficult to find a legal form of words that will ensure that only decent scientific evidence is admitted, or a simple way to delegate some of the responsibility to scientists themselves; but rather than suggesting any easy solutions it accentuates the need to think hard and carefully about what goals we should be trying to achieve, and what kinds of imperfection in achieving them we are more willing, and what we are less willing, to tolerate.
Here I can offer only some preparatory steps toward such re-thinking: a brief account, first, of scientific evidence and its special complexities; and then—as I cautiously approach that bramble-bush with my philosophical pruning-shears—a brief epistemological commentary on the legal mechanisms that have been devised to handle scientific evidence in court. But I hope, by cutting away some overgrown epistemological deadwood, to clear the way for potentially healthier new growth.
In their descriptive use, the words "science," "scientific," etc., refer to a loose federation of disciplines including physics, chemistry, biology, and so forth, and excluding history, theology, literary criticism, and so on. But they also have an honorific use; "scientific," and "scientifically," especially, are very often all-purpose terms of epistemic praise, vaguely conveying "strong, reliable, good." They play their honorific role when the credulous are impressed by actors in white coats assuring them that new, scientific Wizzo will get clothes even cleaner, or that new Smoothex is scientifically proven to get rid of wrinkles faster; and no less so when, skeptical of some claim, people ask: "Yes, but is there any scientific evidence for that?"
Unfortunately this dual usage, descriptive and honorific, has encouraged a damaging preoccupation—especially in Popper and among his admirers—with the "problem of demarcation," of distinguishing real science from pretenders.4 It has distorted our perception of the place of the sciences within inquiry generally, and disguised what would otherwise be obvious facts: that neither all nor only scientists are good, honest, thorough inquirers; and that scientific claims and theories run the gamut from the thoroughly speculative to the very firmly warranted.
Natural-scientific inquiry is continuous with other kinds of empirical inquiry. The physicist and the investigative journalist, the X-ray crystallographer and the detective, the astronomer and the ethnomusicologist, etc., etc., all investigate some part or aspect of the same world. And scientists, like detectives, or historians, or anyone who seriously investigates some question, make an informed conjecture about the possible explanation of a puzzling phenomenon, check out how well it stands up to the available evidence and any further evidence they can lay hands on, and then use their judgment whether to give it up and try again, modify it, stick with it, or what.
Nor is there any "scientific method" guaranteeing that, at each step, science adds a new truth, eliminates a falsehood, gets closer to the truth, or becomes more empirically adequate. Scientific inquiry is fallible, its progress ragged and uneven. At some times and in some areas, it may stagnate or even regress; and where there is progress, it may be of any of these kinds, or it may be a matter of devising a better instrument, a better computing technique, a better vocabulary, etc.
As human cognitive enterprises go, natural-scientific inquiry has been remarkably successful. But this is not because it relies on a uniquely rational method unavailable to other inquirers; no, scientific inquiry is like other kinds of empirical inquiry—only more so. As Percy Bridgman once put it, "the scientific method, so far as it is a method, is doing one's damnedest with one's mind, no holds barred" (Bridgman 1955: 535).
Scientific inquiry is "more so" in part because of the many and various helps5 scientists have devised to extend limited human intellectual and sensory powers and to sustain our fragile commitment to finding out: models, metaphors, and analogies to aid the imagination; instruments to aid the senses; elaborate experimental set-ups to aid in testing and checking by flushing out needed evidence; mathematical, statistical, and computing techniques to aid our powers of reasoning; and a tradition of institutionalized mutual disclosure and scrutiny that, at its best, enables the pooling of evidence and helps keep most scientists, most of the time, reasonably honest.
E. O. Wilson describes his work on the pheromone warning system of red harvester ants: collect ants; install them in artificial nests; dissect freshly killed workers, crush the tiny gobbets of white tissue released, and present this stuff, on the sharpened ends of applicator sticks, to resting groups of workers: they "race back and forth in whirligig loops." Enlist a chemist, who uses gas chromatography and mass spectrometry to identify the active substances, and then supplies pure samples of identical compounds synthesized in the laboratory. Present these to the ant colonies: same response as before. Enlist a mathematician, who constructs physical models of the diffusion of the pheromones. Then design experiments to measure the rate of spread of the molecules and the ants' ability to sense them (Wilson 1999: 69-70).
This illustrates both the continuity of scientific inquiry with other kinds of inquiry, and the remarkable persistence with which good scientists go about solving one problem with the help of solutions to others.6 Of course, that carries risks as well as rewards; the earlier results on which a scientist builds could turn out to be mistaken, and possibly in ways that undermine his work. Scientific helps depend on substantive assumptions, and our judgments of their reliability depend on our background information—e.g., our reasons for thinking that gas chromatography reliably indicates chemical composition.
Still, fallible and imperfect as they are, by and large those helps have helped, enormously: helped to stretch scientists' imaginations, to enable their powers of reasoning, to extend their evidential reach, and to stiffen their respect for evidence. Almost every day, it seems, the natural sciences come up with new and better technical helps (from chemical assays through statistical modeling to computer programs). But there are no grounds for complacency. As science has become so expensive that only governments and large industrial concerns can afford to support it, as career pressures grow, so too does the temptation to exaggerate results or ignore awkward evidence for the sake of money, prestige, or an easy life.
Like the evidence with respect to any empirical claim, the evidence with respect to a scientific claim includes both experiential evidence (someone's seeing, hearing, etc., this or that) and reasons (background beliefs) ramifying in all directions; and, as "with respect to" was chosen to indicate, normally includes both positive evidence and negative. But, again, it is "more so"—in the complexity of its ramifications, in the dependence of its experiential components on instrumentation, in the pooling of evidential resources within a scientific community, etc.
A press report describes a meteorite found in Antarctica which when heated gives off a mix of gases unique to the Martian atmosphere—it was part of the crust of Mars about four billion years ago. Lasers and a mass spectrometer reveal that it contains polycyclic aromatic hydrocarbons (PAHs); this residue closely resembles what you have when simple organic matter decays, and might be fossilized bacteria droppings. David McKay of the Johnson Space Center argues: "We have these lines of evidence. None of them by itself is definitive, but taken together, the simplest explanation is early Martian life" (Rogers 1996: 56-57). Other scientists, however, suggest that the PAHs might have been formed at volcanic vents; others agree that they are bacterial traces, but believe they were picked up while the meteorite was in Antarctica; and some think the supposed bacterial traces might be nothing more than artifacts of the instrumentation (Begley and Rogers 1997).
This illustrates both the continuity of scientific evidence with everyday empirical evidence, and the complexities that can make it so strong—or so fragile. All of us, in the most ordinary of everyday inquiry, depend on learned perceptual skills like reading, and many of us rely on glasses, contact lenses, hearing aids; in the sciences, observation is often highly skilled, and usually mediated by sophisticated instruments themselves dependent on theory. All of us, in the most ordinary of everyday inquiry, sometimes depend on what others tell us; a scientist virtually always relies on results achieved by others, from the sedimented work of earlier generations to the latest efforts of his contemporaries—though there is virtually always some disagreement within the relevant scientific community about which results are to be relied on, and which are shaky. A firmly anchored and tightly woven mesh of evidence can be a strong indication of the truth of a claim—that is partly why "scientific evidence" has acquired its honorific use; but where anchoring is iffy, where some of the threads are fragile, where different threads pull in different directions, there will be ambiguity, the potential to mislead.
The structure of evidence, to use an analogy I have long relied on, is more like a crossword puzzle than a mathematical proof.7 Einstein, I recently learned, once described a scientist as like a man "engaged in solving a well-designed word puzzle."8 I will add that scientific inquiry is a deeply and unavoidably social enterprise (otherwise, each scientist would have to start the work alone and from scratch); so that scientists, in the plural, are like a bunch of people working, sometimes in cooperation with each other, sometimes in competition, on this or that part of a vast crossword—a vast crossword in which some entries were completed long ago by scientists long dead, some only last week; some are in almost-indelible ink, some in regular ink, some in pencil, some heavily, some faintly; and some are loudly contested, with rival teams offering rival solutions.
The degree to which a scientific claim or theory is warranted, at a time, for a person or group of people, depends on how good that person's or that group's evidence is, at that time and with respect to that claim or theory. When there is relevant disagreement within the group—as with several people working on the same crossword and disagreeing over certain entries—the group's evidence should be construed as including the reasons on which the group is agreed, and the disjunctions of those about which there is dispute. Talk of the degree of warrant of a claim or theory at a time, simpliciter, can be construed as shorthand for the degree of warrant of the claim for the person or group of people whose evidence, at that time, is best.
"Person or group" because, while usually the pooled evidence of a group is better than that of its members, sometimes a single person has learned something which has not yet been shared with other members of the relevant community: the results of his experiment have not yet been published, or have been published in a journal too obscure to reach others in the field, or, etc.
Though the warrant of a claim at a time depends on the quality of the evidence possessed by some person or persons at that time, the quality of evidence, its strength or weakness, is not subjective or community-relative. How reasonable a crossword entry is depends on how well it is supported by the clue and any already completed entries, how reasonable those entries are, independent of the entry in question, and how much of the crossword has been completed. Analogously, how warranted an empirical claim is depends on how well it is supported by experiential evidence and background beliefs, how reasonable those background beliefs are, independent of the belief in question, and how much of the relevant evidence the evidence includes.
The meteorite example also illustrates the connection between supportiveness of evidence and explanatoriness. Briefly and very roughly, how well evidence supports a claim depends on how well the claim is explanatorily integrated with the evidence. Explanation requires the classification of things into real kinds; so supportiveness, requiring kind-identifying predicates, is vocabulary-sensitive. That is why, though there is supportive-but-not-conclusive evidence, there is no syntactically characterizable inductive logic. Most importantly for our purposes, it is also why scientists so often need to introduce new terms, or to adapt the meaning of old terms, as they try to match their language to the real kinds of thing or stuff. (Friedrich Miescher first found a non-proteinaceous substance in the nucleus of cells and dubbed it nuclein in 1869;9 now molecular biology has refined its classifications over and over: DNA, with its A, B, and Z forms; messenger RNA, transfer RNA, etc.)
Truth-indicative is what evidence has to be to be good; the better-warranted a claim is, the likelier that it is true.10 At any time, some scientific claims and theories are well warranted; others are warranted poorly, if at all; and many lie somewhere in between. When no one has good enough evidence either way, a claim and its negation may be both unwarranted (so degrees of warrant don't work just like mathematical probabilities). Most scientific claims and theories start out as informed but speculative conjectures; some seem for a while to be close to certain, and then turn out to have been wrong after all; a few seem for a while to be out of the running, and then turn out to have been right after all. But, as scientific inquiry has proceeded, a vast sediment of well-warranted claims has accumulated.
Ideally, the degree of credence given a claim by the relevant scientific sub-community would be appropriately correlated with the degree of warrant of the claim. The processes by which a scientific community collects, sifts, and weighs evidence are fallible and imperfect, so the ideal is not always achieved; but they are good enough that it is a reasonable bet that much of the science in the textbooks is right, while only a fraction of today's speculative frontier science will survive, and most will eventually turn out to have been mistaken.11 Only a reasonable bet, however; all the stuff in the textbooks was once speculative frontier science, and textbook science can occasionally be embarrassingly wrong (e.g., the arbitrary tautomeric forms in the chemistry texts on which, before Jerry Donohue set him straight, James Watson relied).12
The quality of evidence is objective, depending on how supportive it is, how comprehensive, and how independently secure the reasons it includes; but judgments of the quality of evidence are perspectival, i.e., they depend on the background beliefs of the person making the judgment. If you and I are working on the same crossword, but have filled in the much-intersected 4 down differently, we will disagree about whether the fact that an entry to 12 across ends in an "F," or the fact that it ends in a "T," makes it reasonable. Similarly, if you and I are on the same hiring committee, and you believe that handwriting is an indication of character, while I think that's all nonsense, we will disagree about whether the fact that a candidate loops his fs is relevant to whether he should be hired. Whether it is relevant, however, depends on whether it is true that handwriting is an indication of character.
If, as I have maintained, the standards of strong evidence and well-conducted inquiry that apply to the sciences are the very same standards that apply to empirical inquiry generally, doesn't it follow that a lay person should be able to judge the worth of scientific evidence as well as a scientist? Unfortunately, no—far from it; for every area of science has its own specialized vocabulary, dense with theory, and judgments of the worth of evidence depend on substantive assumptions. Very often, the only alternative to relying on the judgment of scientists competent in the relevant field is to acquire a competence in that field yourself.
When a lay person (or even a scientist from another specialty) tries to judge the quality of evidence for a scientific claim, he is liable to find himself in the position of the average American asked to judge the reasonableness of entries in a crossword puzzle where, though some of the clues are in pidgin English, the solutions are all in Turkish and presuppose a knowledge of the history of Istanbul, or are all in Bengali and require a knowledge of Islam, or, etc.13 Similarly, to know what kinds of precaution would be adequate to ensure against experimental error requires substantive knowledge of what kinds of thing might interfere. To judge the likelihood that you are not dealing with a real phenomenon but with an artifact of the instrumentation requires substantive knowledge of how the instrument works. And so on.
Still, can't we at least assume that competent scientists in the relevant field will agree whether this is strong or flimsy evidence, whether that experiment is well- or ill-designed, etc.? Unfortunately, no—not always. At the textbook-science end of the continuum, where claims and theories are very well-warranted, competent scientists will agree. But the closer scientific work is to the frontier, the less comprehensive the evidence so far available, the more room there is for legitimate disagreement about what background information is reliable, hence about what evidence is relevant to what, and hence about the warrant of a claim. Even the most competent scientists may be in something like the position of people working on a part of a crossword in which, so far, only a few entries have been completed, leaving open more than one reasonable solution to the remaining entries. As Crick and Watson began work on the structure of DNA, some scientists in the field still believed that protein was the genetic material. As the work proceeded, Crick and Watson were sure DNA was helical; Franklin remained for a good while unconvinced. Crick and Watson thought the backbone was on the inside of the molecule; Franklin suspected it was on the outside. As soon as he learned of Chargaff's discovery of approximate equalities in the purine and pyrimidine residues in DNA, Watson was convinced of its importance; Crick still had to be persuaded.14
For most of what follows, the epistemological points that will most concern me are negative, identifying deadwood in need of pruning, misunderstandings about science and how it works which have hampered legal efforts to distinguish decent science from junk: In the descriptive sense of "science," there is bad science as well as good. There is no peculiar method which distinguishes genuine science from impostors. Usually there is no way of judging the worth of scientific evidence without substantive knowledge of the appropriate field. There is no guarantee that specialists in a scientific field won't sometimes legitimately disagree. And there is no guarantee, either, that at any given time and for any legitimate scientific question, a warranted answer will be available.
Once upon a time, in cases where expert knowledge was required, jurors with the necessary expertise were specially selected—e.g., a jury of butchers when the accused was charged with selling putrid meat; and sometimes specially qualified persons would be summoned to help determine some matter of fact which the court had to decide—e.g., masters of grammar for help in construing doubtful words in a bond. Learned Hand reports that the first case he can find of "real expert testimony"—expert testimony as exception to the rule that the conclusions of a witness are inadmissible—was in 1620.15 But now, of course, when specialized knowledge is needed, the usual method is calling expert witnesses.
Though it was not cited in a federal or state ruling for a decade, the Frye case (1923) gradually began to set the standard of admissibility of scientific evidence, at first mainly in criminal cases but later in civil cases too. Mr. Frye was charged with murder, and had confessed. Later, however, he repudiated the confession; and took, and passed, a polygraph test (or more exactly, a discontinuous test of systolic blood pressure changes under questioning; the technology was in an early and primitive stage).16 But the trial court judge excluded this evidence, taking the view that deception tests were inadmissible unless there is "an infallible instrument for ascertaining whether a person is speaking the truth or not."17 On appeal, the D.C. Court of Appeals affirmed the exclusion of this lie-detector evidence, ruling that novel scientific evidence "crosses the line between the experimental and the demonstrable," and so is admissible, only if it is "sufficiently established to have gained general acceptance in the particular field to which it belongs."18 This is the "Frye rule" or "Frye test."
As the Frye rule was applied and contested in the courts, the effect was sometimes more and sometimes less restrictive. Voice-print evidence, for example, was sometimes admitted under the Frye test, sometimes excluded.19 In People v. Williams (1958), the prosecution's own experts conceded that the medical profession was mostly unfamiliar with the use of Nalline to detect narcotic use, but the court upheld the admissibility of its evidence all the same; the Nalline test was "generally accepted by those who would be expected to be familiar with its use," and "in this age of specialization more should not be required."20 In Coppolino v. State (1968), the prosecution was allowed to introduce the results of a test (for the presence of succinylcholine chloride or its derivatives in human tissues) devised by the local medical examiner specifically for this trial—and so not known to, let alone generally accepted in, any scientific community. The appellate court cited Frye but, ruling that the trial judge did not abuse his discretion, nevertheless upheld the admissibility of this evidence (Giannelli 1980: 1222 ff.).
The epistemological assumptions behind the Frye test are quite crude; and, while it seems overly restrictive in principle, it is indeterminate in ways that made it nearly inevitable that in practice its application would be, not merely variable in borderline cases, but systematically inconsistent.
Rather than requiring the trial judge to determine for himself whether proffered scientific evidence is solidly established work or unreliable speculation, the Frye test had him rely obliquely on the verdict of the appropriate scientific subcommunity. Three assumptions seem to lie behind the test: that there is a definite point at which scientific claims or techniques cease to be "experimental" and become "demonstrable"; that a claim or technique has not achieved this "demonstrable" status unless it is generally accepted in the relevant community; and that only "demonstrable" claims and techniques should be admitted.
The first two assumptions are at best oversimplifications. Rather than a sharp line, there is really a continuum from the unwarranted through the poorly-warranted to the well-warranted; and the degree of credence given a claim in the relevant scientific community is only an imperfect indicator of its degree of warrant (which is only an imperfect indicator—albeit the best we can have—of its truth). Sometimes—perhaps in the case of the medical examiner in Coppolino—one person has better evidence than the community. General acceptance in the relevant community is only a very rough-and-ready, and a quite conservative, guide to what is well-warranted at the time in question.
The third assumption—that only "demonstrable" scientific evidence should be admitted—seems extremely restrictive. Precluding the possibility that there should be scientific witnesses who disagree but both of whose testimony is admissible, it seems to confine the courts, in effect, to textbook science. A physicist colleague tells me he once testified that the hypothesis that the deceased wasn't pushed, but fell, was consistent with the laws of mechanics; but very often, surely, the relevant science will be quite far from the textbook stage.
However, it takes only a moment's reflection to realize that how restrictive the Frye test would be in practice depends on what exactly was required to be accepted by what proportion of what community. The narrower and more homogeneous the relevant community is taken to be, the likelier it is that there will be agreement; the broader and more heterogeneous the community, the likelier that there will be disagreement. (Unlike the Verification Principle, which is broader if "verifiable" is construed broadly and narrower if "verifiable" is construed narrowly, the Frye test is broader if the community is defined narrowly and narrower if the community is defined broadly.) No wonder, then, that, though often criticized as overly restrictive, in practice the test was far from consistent.