An Epistemologist in the Bramble-Bush
At the Supreme Court with Mr. Joiner
The Federal Rules of Evidence (1975) encapsulate a (less ostensibly restrictive) relevancy approach. Rule 104 (a) affirms the gatekeeping role of the court in ruling on admissibility of evidence. But Rule 401 states that relevant evidence—evidence which has any tendency to make the existence of any fact of consequence to the determination of the action either more or less probable than it would otherwise be—is admissible unless otherwise provided by law. Rule 702 states that expert evidence, including but not restricted to scientific evidence, is admissible subject to exclusion under Rule 403. Rule 403, specifying the grounds for exclusion, mentions the danger of unfair prejudice, confusion of the issues, or misleading the jury, but does not mention any requirement of general acceptance in the appropriate scientific community. Rule 706 allows the court to appoint expert witnesses of its own selection.
The Frye rule didn't wither away immediately. Scholars debated whether the Federal Rules were compatible with the Frye test: some arguing that they weren't, because they didn't mention consensus in the relevant community; and some arguing that they were, because they didn't mention consensus in the relevant community (!).21 The 1987 edition of a textbook on the Federal Rules suggests ironically that the Frye test be reconstrued under Rule 403 as "an attempt to prevent jurors from being unduly swayed by unreliable scientific evidence" (Graham 1987: 92).
Most to the point of the present narrative, in Daubert (1993) the trial court relied almost exclusively on Frye in ruling the plaintiffs' expert evidence inadmissible. The plaintiffs were two minor children and their parents, and the claim was that the children's birth defects were caused by their mothers' having taken the morning-sickness drug Bendectin during pregnancy. But the plaintiffs' expert evidence (based on animal studies, pharmacological studies of the chemical structure of Bendectin, and an unpublished "re-analysis" of previously published human statistical studies) was disqualified under the Frye test. The Ninth Circuit affirmed the trial court's decision to exclude.
But in 1993, reversing the exclusion of Daubert's expert testimony, the majority of the U.S. Supreme Court repudiated the Frye test as an "austere standard, absent from, and incompatible with, the [Federal Rules]. . . . [U]nder the Rules the trial judge must ensure that any and all scientific testimony or evidence admitted is not only relevant, but reliable."22 Jurors, whose job it is to determine sufficiency, are to concern themselves with expert witnesses' conclusions; but judges, whose job it is to determine admissibility, must focus "solely on principles and methodology" to make "a preliminary assessment of whether the reasoning or methodology underlying the testimony is scientifically valid and . . . properly can be applied to the facts in issue."23
In determining whether what is offered is really scientific knowledge (knowledge, not mere opinion, and genuinely scientific knowledge, "with a grounding in the methods and procedures of science"), a key question will be "whether it can be (and has been) tested."24 Justice Blackmun's opinion for the majority quotes Green: "'Scientific methodology today is based on generating hypotheses and testing them to see if they can be falsified; indeed, this methodology is what distinguishes science from other fields of human inquiry',"25 and refers to Popper and Hempel. Retaining something of the Frye test in the liberalized form of indications, rather than necessary conditions, of admissibility, the Daubert ruling also mentions peer-review, a "known or potential error rate," and "widespread acceptance."
However, dissenting in part from the majority, after pointing out that there is no reference in Rule 702 to reliability, and urging that the question of expert testimony generally not be confused with the question of scientific testimony specifically, Justice Rehnquist remarks:
I defer to no one in my confidence in federal judges; but I am at a loss to know what is meant when it is said that the scientific status of a theory depends on its 'falsifiability,' and I suspect some of them will be, too. . . . I do not think [Rule 702] imposes on them either the obligation or the authority to become amateur scientists.26
Those reservations are well-founded; for the epistemological assumptions on which the Daubert ruling rests are badly confused.
Unlike the Frye test, the Federal Rules as interpreted in Daubert require the trial judge to make determinations about scientific methodology on his own behalf. But what the Daubert Court has to offer by way of advice about how to make such determinations is, well, a little embarrassing.
The justices are apparently unaware that Popper gives "falsifiable" a very narrow sense, "incompatible with some basic statement" (a basic statement being defined as a singular statement reporting the occurrence of an observable event at a specified place and time); and that according to Popper no scientific claim or theory can ever be shown to be true or even probable, but is at best "corroborated." In Popper's mouth, this is not equivalent to "confirmed," and does not imply truth or probable truth, but means no more than "tested but not yet falsified."27 If Popper were right, no scientific claim would be well-warranted. In fact, it is hard to think of a philosophy of science less congenial than Popper's to the relevance-and-reliability approach (or to the admissibility of psychiatric evidence, but that is a whole other can of worms). And if the reference to Popper is a faux pas, running Popper together with Hempel—a pioneer of the logic of confirmation, an enterprise the legitimacy of which Popper always staunchly denied—is a faux pas de deux.
In and of itself, of course, the Daubert Court's mixing up its Hoppers and its Pempels is just a minor scholarly irritation. A more serious problem is that neither Popper's nor Hempel's philosophy of science will do the job they want it to do. Popper's account of science is in truth a disguised form of skepticism; if it were right, what Popper likes to call "objective scientific knowledge" would be nothing more than conjectures which have not yet been falsified. And, though Hempel's account at least allows that scientific claims can be confirmed as well as disconfirmed, it contains nothing that would help a judge decide either whether evidence proffered is really scientific, or how reliable it is.
And the most fundamental problem is that the Daubert Court (doubtless encouraged by the dual descriptive and honorific uses of "scientific") is preoccupied with specifying what the method of inquiry is that distinguishes the scientific and reliable from the nonscientific and unreliable. There is no such method. There is only making informed conjectures and checking how well they stand up to evidence, which is common to every kind of empirical inquiry; and the many and various techniques used by scientists in this or that scientific field, which are neither universal across the sciences nor constitutive of real science.
The Daubert Court runs together (1) the tangled and distracting questions of demarcation and scientific method with (2) the question of the degree of warrant of specific scientific claims or theories and (3) the question of the reliability of specific scientific techniques or tests—which is different again, for the claim that this technique is unreliable may be well warranted, the claim that this other technique is reliable poorly warranted. Unlike determining whether a claim is falsifiable, however, determining whether a scientific theory (e.g., of the etiology of this kind of cancer) is well warranted, or whether a scientific test (e.g., for the presence of succinylcholine chloride) is reliable, requires substantive scientific knowledge. Justice Rehnquist is right: the reference to falsifiability is no help, and judges are indeed being asked to be amateur scientists.
Furthermore, despite the majority's reassuring noises to the effect that juries can handle scientific evidence well enough, and can always be directed by the judge if they look like going off the rails, one is left wondering: if judges need to act as gatekeepers to exclude scientific evidence which doesn't meet minimal standards of warrant because juries may be taken in by flimsy scientific evidence, how realistic is it to expect juries to discriminate the better from the worse among the half-way decent?
One of the many subsequent cases28 in which the Federal Rules as interpreted in Daubert are applied to the question of the admissibility of scientific evidence is the one that first drew my attention—the case of Mr. Joiner.
Robert Joiner had worked for the Water and Light Department of the City of Thomasville, Georgia, since 1973. Among his tasks was the disassembly and repair of electrical transformers in which a mineral-based dielectric fluid was used as a coolant: fluid into which he had to stick his hands and arms, and which sometimes splashed onto him, occasionally getting into his eyes and mouth. In 1983 the city discovered that the fluid in some of the transformers was contaminated with PCBs, which are considered so hazardous that their production and sale have been banned by Congress since 1978.
In 1991 Mr. Joiner was diagnosed with small-cell lung cancer; he was thirty-seven. He had been a smoker for about eight years, and there was a history of lung cancer in his family. He claimed, however, that had it not been for his exposure to PCBs and their derivatives, furans and dioxins, his cancer would not have developed for many years, if at all. On this basis he sued Monsanto, which had manufactured PCBs from 1935 to 1977, and General Electric and Westinghouse, which manufactured transformers and dielectric fluid. His case relied essentially on expert witnesses who testified that PCBs alone can cause cancer, as can furans and dioxins, and that since he had been exposed to PCBs, furans, and dioxins, this exposure had likely contributed to his cancer.
Removing the case to federal court, GE et al. contended that there was no evidence that Mr. Joiner suffered significant exposure to PCBs, furans, or dioxins, and that in any case there was no admissible scientific evidence that PCBs promoted Joiner's cancer. The district court granted summary judgment, holding that the testimony of Joiner's experts was no more than "subjective belief or unsupported speculation."29
The court of appeals reversed. Federal Rule 702, governing expert testimony, displays a "preference for admissibility," and in the present instance, the question of admissibility was "outcome-determinative": if the scientific evidence offered were excluded, Mr. Joiner would simply have no case. So a "particularly stringent standard of review" should apply to the trial judge's exclusion of expert testimony.30
But in 1997, reversing the admissibility of Mr. Joiner's expert evidence, the U.S. Supreme Court held that the appeals court erred in applying an especially stringent standard of review. The appropriate standard was abuse of discretion; and it was not an abuse of discretion for the district court to have excluded Mr. Joiner's experts' testimony.31
And now it begins to appear how the question of the legitimacy of the distinction between methodology and conclusions came to be a hotly contested issue. The Daubert Court, taking the distinction for granted, had interpreted the gatekeeping role of trial judges as requiring them to focus solely on methodology, not conclusions. But, Mr. Joiner's lawyers argue, the District Court had no objection to the methodology of the studies cited, only to the conclusions that their experts drew; and this was a reversible error.
GE's brief argues that the court of appeals treated Daubert's requirement of scientific methodology "at such a superficial level as to leave it meaningless—calling for no more than the invocation of scientific materials."32 Mr. Joiner's experts rely on the "faggot fallacy": the fallacy of supposing that "multiple pieces of evidence, each independently being suspect or weak, provide strong evidence when bundled together."33 Mr. Joiner's lawyers reply that his experts "were applying a methodology which is well established in the scientific method. It is known as the weight of evidence methodology. . . . There are well-established protocols for this . . . published as the EPA's guidelines. There are similar guidelines for the World Health Organization."34 GE's lawyers never challenged Mr. Joiner's experts' methodology before; indeed, they use the "weight of evidence" methodology themselves.
Rather than challenging Mr. Joiner's claim that the District Court failed to restrict its attention to methodology as Daubert requires, the majority of the Joiner Court sustains its ruling that there was no abuse of discretion by holding that "conclusions and methodology are not entirely distinct from each other."35
Justice Stevens, however (concurring on the question of the correct standard of review but dissenting from the majority's ruling on whether the district court erred) protests that this is neither true nor helpful. "The difference between methodology and conclusions is just as categorical as the distinction between means and ends." The district court ruling on reliability in Joiner, in particular, is "arguably not faithful" to the statement in Daubert that the focus must be on methodology rather than conclusions. The majority "has not adequately explained why its holding is consistent with Federal Rule of Evidence 702 as interpreted in Daubert v. Merrell Dow Pharmaceuticals."36
In the Joiner ruling, Daubert's epistemological chickens come home to roost: with the references to falsifiability gone and the distinction between methodology and conclusions dropped, it is starkly obvious that judges will sometimes be obliged to determine substantive scientific questions.
Given the difficulties with the Daubert Court's efforts to specify what makes evidence genuinely scientific, perhaps the knots in which everyone ties themselves in Joiner (not to mention the absence from the ruling of any reference whatever to falsifiability, testability, Hepper, Pompel, etc.)37 are not so surprising. What is surprising, to me at any rate, is that the Joiner Court should offer, as an interpretation of Daubert, a ruling that denies the legitimacy of a distinction Daubert presupposed. I have no difficulty with the idea that a later ruling may make an earlier ruling determinate in respects in which it was formerly indeterminate (which, incidentally, explains why the Daubert Court could rule that the Frye test is incompatible with the Federal Rules, which at first raised my logical eyebrows quite far). But the idea that a later ruling which flatly denies a clear presupposition of an earlier ruling could qualify as an interpretation, rather than a revision, of it, still strikes me as very strange indeed.
However. What about the distinction between methodology and conclusions presupposed in Daubert, but repudiated in Joiner? In these cases the concept of methodology (never exactly well-defined in the philosophy of science) seems to have turned into an accordion concept,38 expanded and contracted as the argument requires. Is the judge, in determining the validity of experts' "methodology," to decide whether the mouse studies on which Mr. Joiner's experts in part relied were well-conducted, with proper controls and good records, using specially bred genetically uniform mice, etc., etc.; or what weight to give mouse studies with respect to questions about humans; or what weight to give those mouse studies in the context of other studies of the effects on humans of PCB and other contaminants; or what? There are so many ambiguities that everyone is right, and everyone is wrong.
Mr. Joiner's lawyers are right to suggest that drawing the reasonable conclusion from a conglomeration of disparate bits of information (mouse studies, epidemiological evidence, etc.) requires, well, weighing the evidence. But of course it matters whether you weigh the evidence properly; and GE's lawyers are right, too, when they complain that Mr. Joiner's attorneys use "methodology" so loosely as to make Daubert's requirements practically vacuous.
But GE's accusation that Mr. Joiner's experts commit the "faggot fallacy" relies on an equivocation. There is an ambiguity in the reference to "pieces of evidence, each independently . . . suspect or weak": this may mean either "pieces of evidence each themselves poorly warranted" (which seems to be the interpretation intended by Skrabanek and McCormick, to whom the phrase "faggot fallacy" is due), or "pieces of evidence each by itself inadequate to warrant the claim in question" (which seems to be the interpretation most relevant to the case). True, if the reasons for a claim are themselves poorly warranted, this lowers the degree of warrant of the claim itself. But GE's brief offers no argument that the reasons based on the studies to which Mr. Joiner's experts refer are themselves poorly warranted. True again, none of those reasons by itself strongly warrants the claim that PCBs promoted Mr. Joiner's cancer. But GE's brief offers no argument that they don't do so jointly.
Sometimes bits of evidence which are individually weak are jointly strong; sometimes not—it depends what they are, and whether or not they reinforce each other (whether or not the crossword entries interlock). Chargaff's discovery that there are approximate regularities in the relative proportions of adenine and thymine, guanine and cytosine in DNA is hardly, by itself, strong evidence that DNA is a double-helical, backbone-out macromolecule with like-with-unlike base pairs; Franklin's X-ray photographs of the B form of DNA are hardly, by themselves, strong evidence that DNA is a double-helical, backbone-out macromolecule with like-with-unlike base pairs. That the tetranucleotide hypothesis is false is hardly, by itself, strong evidence that DNA is a double-helical, backbone-out macromolecule with like-with-unlike base pairs; and so on. But put all these pieces of evidence together, and the double-helical, backbone-out, like-with-unlike base pairs, structure of DNA is very well-warranted indeed (in fact, the only entry that fits).
Neither party seriously addresses this question of interlocking. But in the very complex EPA guidelines to which Mr. Joiner's attorneys so casually refer, I find this: "Weight of evidence conclusions come from the combined strength and coherence of inferences appropriately drawn from all of the available evidence."39
Justice Stevens is right to say that there is a difference between methodology and conclusions, as there is between ends and means; there is a difference, certainly, between a technique and its result, or between premises and conclusion. But on a more charitable interpretation, the majority's point is not that there is literally no distinction, but that it is impossible to judge methodology without relying on some substantive scientific conclusions. And this is both true and important.
To determine whether this evidence (e.g., of the results of mouse studies) is relevant to that claim (e.g., about the causes of Mr. Joiner's cancer) requires substantive knowledge (e.g., about the respects in which mouse physiology is like human physiology, about how similar or how different the etiologies are of small-cell lung cancer and alveologenic adenomas, etc.). And to determine the reliability of a scientific experiment, technique, or test, it is necessary to know what kinds of thing might interfere with the proper working of this apparatus, what the chemical theory is that underpins this analytical technique, what factors might lead to error in this kind of experiment and what precautions are called for, or to possess a sophisticated understanding of statistical techniques or of complex and controversial methods of meta-analysis pooling data from different studies. And so on.
Which takes us back to that old worry of Justice Rehnquist's, of which Justice Breyer's observation that judges are not scientists reminds us: judges are neither trained nor qualified to do this kind of thing.
Already at the time of Joiner, the Daubert ruling, requiring judges to make a preliminary evaluation of scientific evidence proffered, had prompted wider use of Rule 706, allowing judges to appoint their own experts.
In 1992, the FDA had banned silicone breast implants, formerly "grandfathered in." They were not known to be unsafe; but manufacturers had not, as required under FDA regulations, supplied evidence of their safety. Understandably, the ban caused a good deal of anxiety, and provoked a wave of fear, greed, and litigation. In 1996, Judge Sam Pointer of the U.S. District Court in Birmingham, Alabama, who had been in charge of all several thousand federal implant cases for more than six years, convened a panel of four scientists—an immunologist, an epidemiologist, a toxicologist, and a rheumatologist—to review evidence of the alleged connections between silicone implants and various systemic and connective tissue diseases.
Judge Pointer's carefully phrased remit asks: "to what extent, if any and with what limitations and caveats do existing studies, research, and reported observations provide a reliable and reasonable scientific basis for one to conclude that silicone-gel breast implants cause or exacerbate any . . . 'classic' connective tissue diseases [. . . or] 'atypical' presentations of connective tissue diseases. . . . To what extent, if any, should any of your opinions . . . be considered as subject to sufficient dispute as would permit other persons, generally qualified in your field of expertise, to express opinions that, though contrary to yours, would likely be viewed by others in the field as representing legitimate disagreement within your profession?"40
Two years and (only) $800,000 later,41 after selecting from more than two thousand published and unpublished studies those they thought most "rigorous and relevant," in December 1998 the panel submitted a long report. Their conclusion was that the evidence studied and reanalyzed (apparently the forty or so studies submitted by each side plus about one hundred others, including unpublished studies, Ph.D. dissertations, and letters) does not warrant the claim that silicone breast implants cause these diseases. They add, however, that in some respects "the number and size of studies is inadequate to produce definite results"; that animal testing "may not fully predict the human effects"; that some evidence suggests that silicone implants are not entirely benign (they can cause inflammation, and droplets can turn up in distant tissues); and that while most people in the field would agree with their conclusions, a few might not.42
Despite Judge Pointer's efforts to ensure that his experts were unimpeachably neutral, the plaintiffs' lawyers objected that the rheumatologist on Pointer's panel had undisclosed connections with one of the defendants, Bristol-Myers Squibb (BMS), while a member of the panel: in August 1997, apparently, he signed a letter soliciting up to $10,000 in support of a rheumatology meeting he co-chaired, stating that "the impact of sponsorship will be high, as the individuals invited for this workshop, being opinion leaders in their field, are influential with the regulatory agencies"; in October 1998 he signed a $1,500-a-day fee arrangement with BMS, and in November 1998 he received $750 for participating in a company seminar.43
In April 1999, averring that there was no actual bias, though acknowledging that there might be a regrettable appearance of bias, Judge Pointer ruled against the plaintiffs' motion that the panel's report be excluded. The members of the panel will give videotaped sworn statements that may be used as evidence in courts nationwide.
The bramble-bush, of course, is alive and well, growing new fruit, and new thorns, almost every day.44 In Kumho (1999), considering judges' responsibility for making a preliminary reliability assessment of the testimony of engineers and other non-scientific experts, the Supreme Court stressed that Daubert's test of reliability is "flexible," and that its list of specific factors (falsifiability, peer review, etc.) "neither necessarily nor exclusively applies to all experts or in every case"; thus partially addressing the issues about the place of scientific evidence within expert evidence generally raised by Justice Rehnquist's dissent from the Daubert ruling.45
There have also been some efforts to educate judges scientifically. In April 1999 about two dozen Massachusetts Superior Court judges attended a two-day seminar on DNA at the Whitehead Institute for Biomedical Research. A report in the New York Times quotes the director of the institute: in the O. J. Simpson trial lawyers "befuddle[d] everyone" over the DNA evidence; but after this program, "I don't think a judge will be intimidated by the science." Judges will "understand what is black and white . . . what to allow in the courtroom" (Goldberg 1999: 10).
And in May 1999 the American Association for the Advancement of Science inaugurated a five-year project to make available to judges "independent scientists who would educate the court, testify at trial, assess the litigants' cases, and otherwise aid in the process of determining the truth" (Bandow 1999).
Disentangling "reliable" from "scientific," as Kumho begins to do, is certainly all to the good. But a bit of scientific education for judges is at best a drop in the bucket; and court-appointed panels of experts, though potentially helpful, are no panacea.
Not that educating judges about DNA or whatever mightn't do some good. But a few hours in a science seminar will no more transform judges into scientists competent to make subtle and sophisticated scientific determinations than a few hours in a legal seminar would transform scientists into judges competent to make subtle and sophisticated legal determinations. ("This kind of thing takes a lot of training," as Mad Margaret sings in Ruddigore.) And, to be candid, that New York Times report has me a little worried about the danger of giving judges a false impression that they are qualified to make those "subtle and sophisticated determinations."
"[N]either the difficulty of the task nor any comparative [sic] lack of expertise can excuse the judge from exercising the 'gatekeeper' duties that the Federal Rules impose," Justice Breyer avers.46 More directly than the Frye test, calling on court-appointed panels of scientists turns part of the task over to those who are better equipped to do it. Isn't this a whole lot better than asking judges to be amateur scientists? Sometimes, probably, significantly better (the more so, the closer the work at issue is to black-letter science); not, however, as straightforwardly or unproblematically better as some hope.
As Judge Pointer's panel's report was made public, an optimistic headline in the Washington Times47 proclaimed "Benchmark Victory For Sound Science"; and under the headline "An Unnatural Disaster," an editorial in the Wall Street Journal announced that "reason and evidence have finally won out."48 ABCNEWS.com's "Health and Living" was considerably more cautious: under the headline "No Implant-Disease Link?" a sideline adds, "The panel found no definite links, but it also left the door open for more research."49 Neither quite captures my reaction.
I should be quite surprised if it turned out that silicone implants do, in fact, cause the various diseases they have been alleged to (so far as I can tell it isn't just, as the panel's report says, that there is no evidence that they do; but that there is pretty good evidence that they don't).50 And I don't think it very likely that that $750 seriously affected Dr. Tugwell's opinion (though I must say that—even if this kind of thing is routine in funding applications, as for all I know it may be—that letter boasting of the applicants' influence with regulatory bodies leaves a bad taste in my mouth).
I don't feel equally confident, however, that a really good way has yet been found to delegate part of the responsibility for appraising scientific evidence to scientists themselves. Besides the worry about ensuring neutrality, and the appearance of neutrality,51 there is the worry about how much responsibility falls on how few shoulders—just four people, in the case of Judge Pointer's panel, all of whom combined this work with their regular full-time jobs, each of them in effect solely responsible for a whole scientific area; and the worry about what jurors will make of court-appointed experts' testimony. The history of the Frye test should warn us, also, of potential pitfalls in determining the relevant area of specialization.