The Representativeness Fallacy in Jewish Law
Assia – 2012
Rabbi Dr. Michael Abraham
The Statistical Reliability of Tests for Rare Phenomena
Outline:
Introduction
Preliminary Questions
The Representativeness Fallacy
Mathematical Explanation
Another Example: Munchausen Syndrome by Proxy
Two Important Qualifications
A Legal Implication: "Something More" in Addition to the Defendant's Confession
Two Examples from Jewish Law
Statistical Explanation
Explanation in Terms of the Representativeness Fallacy
The Nature of a Statistical Fallacy: On Mathematics and Psychology
Practical Conclusions
Summary
Introduction
Prof. Daniel Kahneman won the Nobel Prize for his contribution (together with Amos Tversky) to understanding various fallacies in human statistical thinking in general, and in the thinking of experts in various fields in particular. A significant portion of these fallacies is based on confusion between relative frequency and absolute frequency, or between conditional probability and absolute probability[1]. One of the prominent types among these fallacies (today it is included in the material for the matriculation examination in mathematics) is the representativeness fallacy[2], and it concerns the reliability of using statistical tools for rare phenomena.
These fallacies appear in many varied contexts, and it turns out that they also have a place in Jewish law. It appears that the Sages were probably aware of this fallacy, and the medieval authorities (Rishonim) and later authorities (Acharonim) were also careful to guard against it. In this article I wish to explain the fallacy, present several of its implications and manifestations, and finally to derive, in very brief and wholly general terms, a few conclusions relating to medical tests and to the law of evidence in the legal world.
Preliminary Questions
Reuven is sent by a physician to undergo some test. The reliability of the test is 99%, that is, the test results are correct in 99% of cases. In Reuven's case, the test results showed that he has the disease in question. As stated, there is some chance of error in the test. What is the probability that Reuven is really ill?
Ordinary people, including physicians and judges, tend to think that the probability is very high, and some would say 99%, for that is precisely the proven reliability of the test. But it turns out that this is a mistake. The answer to the question of the probability that Reuven is ill depends on another question: what is the prevalence of the disease in the population? For a disease whose prevalence in the population is very low relative to the error rate of the test (that is, significantly less than 1%), a positive result of the test is almost meaningless.
That is with respect to medical diagnosis. Let us now ask a similar question regarding the law of evidence in court. An expert appears before the judges and testifies about some test that it is highly reliable. For example, a person is suspected of murder, and blood is found at the scene of the crime. A forensic test shows that this is the defendant's blood. What is the probability that the defendant is indeed the murderer? Alternatively, in the Judgment of Solomon, a person stands before the court whose lineage (who his mother is) the court must determine. Solomon's court sends him for a genetic test, and an expert informs the court that the reliability of the test is 99%. Can the judges rely on it and determine who the parents of the person before them are? Put differently: the test shows a match to mother A and not to mother B. What is the probability that he is indeed A's son?
In both cases, of course, serious halakhic questions arise regarding reliance on uncertain evidence, and regarding the need for testimony as opposed to other forms of evidence.[3] Here I wish to discuss the subject from a different angle, in fact from a scientific angle: is the evidence really reliable from a statistical point of view? That is, assuming that the reliability of the genetic test is indeed 99%, and the test shows that Reuven is indeed Jacob's son, what is the probability that he really is his son?
The Representativeness Fallacy
The answer to these questions lies in what is called the base-rate fallacy (which is one of the representativeness fallacies). Let us begin with the example of the medical test. As stated, we have before us a medical test intended to diagnose a certain disease. The reliability of the test is 99%, meaning that in 1% of the cases it errs (reversing the true situation: healthy to sick, or sick to healthy). We now ask: Reuven was tested by this test and was found to be ill. What is the probability that he is really ill? The obvious answer is that the probability is 99%, for that is the level of reliability of the test. But, as we shall now see, this may be a grave mistake.
Suppose there are 10,000,000 inhabitants in a country, of whom one thousand truly have the disease under examination (that is, the prevalence of the disease is 1/10,000). Now let us test all the residents of the country with this test. The test will err in 1% of the cases, meaning that 1% of the healthy will be diagnosed as ill, and 1% of the sick will be diagnosed as healthy. Consequently, the test results will indicate that there are about 100,000 sick people. But, as stated, the truth is that there are only 1,000 genuinely sick people among all the residents of the country. Therefore, when a person is diagnosed by this test as ill, the probability that he is really ill is… 1%. It is hard to believe, but a test whose reliability is 99% yields, in such a situation, results whose reliability is 1%.
That is the situation with a disease whose prevalence in the population is low. What happens if it is a more common disease? For example, if the true number of sick people in the population is 1,000,000 (and 9,000,000 residents are healthy, that is, the prevalence of the disease is 1/10). Again, let us assume that 1% of the healthy are mistakenly diagnosed as ill, and 1% of the sick are mistakenly diagnosed as healthy. In such a case, the test results will show us 990,000 (sick people correctly diagnosed) + 90,000 (healthy people diagnosed as sick). Altogether, 1,080,000 people will be found to be diagnosed as sick, most of whom are indeed truly sick. If a person is found to be sick in such a situation, the probability that he is really sick is 11/12, which, although not 99%, is already beginning to sound entirely reasonable. In such a situation, if the test showed that someone is sick, he should certainly be concerned. The conclusion is that the reliability of the test for a person diagnosed as sick approaches more and more its theoretical reliability as the disease becomes more prevalent in the population. In other words: reliable tests for rare diseases are not worth even the shard on which the flask rests.
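The two scenarios above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming (as the text does) that the same 1% error applies to the sick and to the healthy; the function name `screen_population` and its parameters are illustrative and not part of the original article.

```python
def screen_population(population, sick, error_rate):
    """Count test outcomes when every resident is tested once.

    Assumes, as in the text, a symmetric error: the same fraction of
    sick and of healthy people is misclassified.
    """
    healthy = population - sick
    true_positives = sick * (1 - error_rate)   # sick, diagnosed as sick
    false_positives = healthy * error_rate     # healthy, diagnosed as sick
    flagged = true_positives + false_positives
    # Share of those diagnosed as sick who really are sick.
    return flagged, true_positives / flagged


rare_flagged, rare_share = screen_population(10_000_000, 1_000, 0.01)
common_flagged, common_share = screen_population(10_000_000, 1_000_000, 0.01)

print(f"rare disease:   {rare_flagged:,.0f} diagnosed, {rare_share:.2%} truly sick")
print(f"common disease: {common_flagged:,.0f} diagnosed, {common_share:.2%} truly sick")
```

With the rare disease the share of truly sick among those diagnosed comes out at about 1%, and with the common disease at 11/12, in line with the figures in the text.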
The significance of this conclusion is that in the first case (the rare disease), if someone was diagnosed as healthy, that result is reliable, because the rate of healthy people in the population is high. By contrast, a person diagnosed as ill can ignore this diagnosis, because the overall rate of sick people in the population is low. In other words: this test is designed to diagnose healthy people, not sick people.
How high must the prevalence be for the reliability of the test to be reasonable? The relevant measure is the ratio between the unreliability of the test (its chance of error) and the prevalence of the disease. If they are of the same order of magnitude, then the results begin to be meaningful. This means that the test is a kind of microscope whose purpose is to distinguish some object. The magnification (= precision) of the microscope determines which objects can be distinguished. It is impossible to find a bacterium with a microscope whose resolution is not of the proper order of magnitude. It is like firing a cannon at a mosquito, or firing a rifle bullet at a tank.
Here is another implication. If we want to adopt genetic tests in order to decide, according to Jewish law, the question "Who is a Jew?", we will need to use tests whose unreliability (the chance of error) is significantly smaller than the prevalence of Jews in the world population (about a quarter of a percent, 1/400). A genetic test whose reliability is 99.5%, for example, will not suffice. The same applies to any other medical or legal test.
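The "same order of magnitude" criterion can be made precise with a short derivation. This is a sketch under the symmetric-error assumption used throughout; the symbols p (prevalence) and s (reliability of the test, so that 1 − s is its chance of error) are introduced here for illustration and do not appear in the original.

```latex
\[
P(\text{sick}\mid\text{positive})
  = \frac{p\,s}{p\,s + (1-p)(1-s)}
\]
\[
P(\text{sick}\mid\text{positive}) = \tfrac{1}{2}
  \iff p\,s = (1-p)(1-s)
  \iff p = 1 - s .
\]
```

That is, a positive result is exactly a coin toss when the prevalence equals the chance of error (for example p = 1% with a 99% test). It becomes informative only when the prevalence exceeds the error rate, and it is nearly worthless when the prevalence is far below it.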
Mathematical Explanation
This surprising phenomenon can be understood through the formula of total probability:
P(A) = Σi P(A/Bi) P(Bi)
Here P(A/B) is a conditional probability, that is, the probability of A if B is known. In our case there are only two states i, so the sum reads P(A) = P(A/B1) P(B1) + P(A/B2) P(B2), and the meaning of the expressions in the formula is as follows: P(B1) is the probability that a given person is sick. P(B2) is the probability that the person is healthy. P(A) is the probability that the test results for someone show that he is sick. The formula says that the probability that the test will show that some person is sick is composed of two components: the probability that he is sick multiplied by the conditional probability that the test will correctly show this, plus the probability that he is healthy multiplied by the conditional probability that the test shows that he is sick (that is, an erroneous result).
As stated, the probability that the test errs is 1%, but this affects only one of the components in this sum: P(A/B2)=0.01. By contrast, the second conditional probability is very high: P(A/B1)=0.99. And if the prevalence of the disease is very low, then the probability that the person is not sick (P(B2)) is very high, which significantly changes the final result.
Alternatively, the question of the reliability of the test concerns quantities of the type P(A/B) (what are the test results relative to reality), whereas the question we are asking (what is the probability that he is really sick given positive test results) is the reverse question, concerning quantities of the type P(B/A). More precisely, the relation between the two conditional probabilities that interest us is:
P(B1/A) = P(A/B1) P(B1)/P(A)
The reliability of the test is P(A/B1)=0.99, and it is indeed high. But the probability that the person is sick on the assumption that the test results are positive is a different quantity: P(B1/A), and it can be very low. In our case it is indeed low, since the ratio P(B1)/P(A) between the absolute probabilities is low.
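For concreteness, here is the calculation with the numbers of the rare-disease example (prevalence 1/10,000, reliability 99%). It is a worked sketch only: the text's P(A/B) is written P(A | B), and B1, B2 appear as B_1, B_2.

```latex
\begin{align*}
P(A) &= P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) \\
     &= 0.99 \times 0.0001 + 0.01 \times 0.9999
      = 0.000099 + 0.009999 = 0.010098, \\[4pt]
P(B_1 \mid A) &= \frac{P(A \mid B_1)\,P(B_1)}{P(A)}
      = \frac{0.000099}{0.010098} \approx 0.0098 .
\end{align*}
```

Almost all of P(A) comes from the second component, the healthy people mistakenly diagnosed as sick, which is why the result is about 1% and not 99%.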
Another Example: Munchausen Syndrome by Proxy[4]
Mrs. Sally Clark was a British woman whose two infants died at home of unexplained causes (sudden infant death syndrome). She was charged in a British court with the murder of her children, convicted, and sentenced to prison. The conviction was based on the expert testimony of Professor Sir Roy Meadow, who argued that the probability of sudden infant death syndrome was 1/8,500. Therefore, he claimed, the probability of two such deaths is the square of that small number, which comes out to about 1/73,000,000. Professor Meadow claimed that there is a medical syndrome known as "Munchausen syndrome by proxy" (which some in Israel associated with the "starving mother")[5], in which a person harms others in order to receive attention for himself. He argued that since the probability of sudden infant death syndrome is so small, it is clear that this was murder on the basis of that syndrome (in a criminal trial it is not enough to prove that an act of murder occurred; proof of criminal intent is also required).
Without any corroborating evidence, solely on the basis of this statistical consideration, the court found Sally Clark guilty of murdering her children and sent her to prison.
Let us note that Professor Meadow testified in hundreds of trials, and in many of them defendants were found guilty and sentenced to various punishments. Some of them were convicted without any other corroborating evidence (what is called below "something more").
After some time, an expert witness in statistics came and testified in court that the conviction was based on a statistical mistake. His main claim was that it is incorrect to multiply the numbers by one another, since the events may be statistically dependent. Even if the probability that one child will die of sudden infant death syndrome is 1/8,500, it does not follow that the probability of two such deaths is the square of that number. Since the causes of sudden infant death syndrome are unknown, it is reasonable to assume that there are factors in the home, or in the family's genes, that may have caused the death. And since these were two siblings who grew up in the same house, one should assume that the cause of their deaths was the same cause, and therefore the events are dependent on one another.
Let us clarify this by means of an example. Reuven bought a lottery ticket, and his numbers were drawn. What is the probability that precisely those numbers would be drawn? Very small (let us say 1 in a million). And what is the probability that Reuven will win the lottery? Also very small (let us say again 1 in a million). Now let us ask: what is the probability that both those numbers will be drawn and Reuven will win the lottery? Seemingly this is a product, and the result is 1 in a million million. But that is a mistake, since Reuven's winning is a result of the fact that his numbers were drawn. The events are dependent on one another, and therefore it is incorrect to view this conjunction of events as something more surprising than each of them separately.
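The lottery example is an instance of the general multiplication rule. In the sketch below, N stands for "Reuven's numbers are drawn" and W for "Reuven wins"; both symbols are introduced here for illustration.

```latex
\[
P(N \cap W) = P(N)\,P(W \mid N)
\]
\[
\text{independent events: } P(W \mid N) = P(W)
  \;\Rightarrow\; P(N \cap W) = P(N)\,P(W);
\qquad
\text{lottery: } P(W \mid N) = 1
  \;\Rightarrow\; P(N \cap W) = P(N) = 10^{-6}.
\]
```

Squaring the probability is justified only in the first case. When the second event follows from the first, the conjunction is no rarer than the first event alone.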
If so, the expert's testimony shows that the probability is not as small as was initially thought. But it is still clear that the probability is very small (1/8,500 is also a very small number). According to this, one could seemingly sentence every mother whose child died of sudden infant death syndrome to prison.
The main problem in the testimony of the above "expert" was not the dependence of the two deaths, but a completely different problem: the testimony overlooked the representativeness fallacy. One may relate to this statistical argument as a test for diagnosing Munchausen syndrome by proxy. The reliability of the test is 8,499/8,500, and therefore it would appear to identify those with this syndrome with very high reliability. The problem is that the prevalence of this syndrome is extremely low. How many women will murder their children in order to get attention? Let us assume for the sake of discussion that the number is something like 1/100,000, which itself appears to be an overestimate of the true prevalence. One can now immediately see that a test whose chance of error is 1/8,500 is worth nothing. The test is very reliable, but its unreliability is far higher than the prevalence of the phenomenon it is trying to capture. The holes in this statistical net are too large.
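To see how large the holes are, one can plug the two figures from the text into Bayes' formula. This is a sketch only: it treats the statistical argument as a test with the same error rate in both directions, and the prevalence of 1/100,000 is the assumption made in the text, not a measured figure. The function name `posterior` is illustrative.

```python
from fractions import Fraction


def posterior(prior, reliability):
    """Bayes' rule for a test with the same error rate in both directions."""
    hit = prior * reliability                      # phenomenon present, test says so
    false_alarm = (1 - prior) * (1 - reliability)  # phenomenon absent, test says present
    return hit / (hit + false_alarm)


prior = Fraction(1, 100_000)          # prevalence assumed in the text
reliability = Fraction(8_499, 8_500)  # reliability attributed to the argument

result = posterior(prior, reliability)
print(f"P(syndrome | flagged by the argument) is about {float(result):.1%}")
```

Under these assumptions the probability is roughly 8%, so more than nine out of ten mothers "identified" in this way would be innocent.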
Two Important Qualifications
A. This analysis is valid only when the error of the test is symmetric in both directions: it may classify healthy people as sick and sick people as healthy. If this test has errors only in one direction, that is, it may err only regarding sick people but healthy people are never diagnosed as sick, then the reliability of the test regarding those diagnosed as sick is equal to its theoretical reliability. This can be seen by a simple calculation when one applies these data to the previous numbers. There will be no healthy people diagnosed as sick, and therefore the number diagnosed as sick is almost their true number.
B. When we have additional indications of the disease, or of the offense, the situation is of course different. If the defendant was identified at the scene of the murder at the time of the murder, and in addition the test shows that the blood at the scene is his, one can more readily rely on the test. The reason is that the number of people who are potential suspects in the murder is small, and the prevalence of his DNA among them is high (1/10, not 1/10,000,000). That is, this circumstantial addition changes de facto the relevant prevalence of the phenomenon.
But if there is no additional evidence, the number of suspects is all the inhabitants of the country, or of the world, and if we take into account the fact that among all of them there is only one murderer who murdered Shimon, then the prevalence of the measured phenomenon is very low. In such a situation, a test whose reliability is 99% is not worth much. Additional evidence narrows the total number of suspects and increases the relevant prevalence, and therefore also the effectiveness of the test.
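The effect of narrowing the pool of suspects can be illustrated numerically. The sketch below assumes exactly one culprit among equally likely suspects and a 99% test with symmetric error; the function name and the intermediate pool size of 1,000 are illustrative additions, not figures from the text.

```python
def chance_match_is_culprit(pool_size, reliability=0.99):
    """Probability that a person flagged by the test is the one culprit.

    Assumes exactly one culprit in a pool of `pool_size` equally likely
    suspects and a test that errs at the same rate in both directions.
    """
    prior = 1 / pool_size                          # relevant prevalence
    hit = prior * reliability
    false_alarm = (1 - prior) * (1 - reliability)
    return hit / (hit + false_alarm)


for pool in (10, 1_000, 10_000_000):
    print(f"{pool:>10,} suspects: {chance_match_is_culprit(pool):.4%}")
```

With ten suspects the match is strong evidence (11/12), whereas with ten million it is practically worthless. The additional evidence does not improve the test; it raises the relevant prevalence.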
A Legal Implication: "Something More" in Addition to the Defendant's Confession[6]
There is a major dispute in different legal systems regarding the status of self-incrimination. When the defendant confesses guilt, some regard such a confession as "the queen of evidence," while others are skeptical of it. Some are skeptical because of the possibility that it was extracted by improper means (such as violence and threats by investigators). This is the doctrine of the "fruit of the poisonous tree."[7] Others are skeptical because of Maimonides' remarks about deranged people (which became very widespread throughout the world, and also in Israel, following the Miranda ruling in the United States, which cites them). Maimonides writes in the Laws of Sanhedrin 18:6:
It is a decree of Scripture that a religious court does not execute a person or flog him on the basis of his own confession, but only on the basis of two witnesses. As for Joshua's execution of Achan and David's execution of the Amalekite convert on the basis of their own confessions, that was either an emergency measure or an exercise of royal authority. But the Sanhedrin neither executes nor flogs one who confesses to an offense, lest his mind have become deranged in this matter; perhaps he is one of those bitter and distressed souls who long for death, who stab swords into their bellies and throw themselves from rooftops. Perhaps this person too will come and say something he did not do in order to be killed. In short, it is a royal decree.[8]
In Israel, confession has very strong standing, but the law requires that if it was obtained outside the courtroom there must be the addition of "something more" to the body of evidence, beyond the defendant's confession itself. Thus we find in the Attorney General's Directive no. 4.3012, dated Nisan 5767 (April 2007), section 1:[9]
It has long been a settled rule in the case law of the Supreme Court that a person cannot be convicted on the basis of his confession alone when it was given outside the walls of the courtroom, even when that confession was obtained without external pressure, unless "something more" is found to strengthen that confession (CrimA 3/49 Andelersky v. the Attorney General, PD 2, 589; CrimA 290/59, so-and-so v. the Attorney General, PD 14, 1489).
Why is the addition of some further corroborating element so important? It seems to me that this too can be explained by means of the representativeness fallacy.
If we view confession as evidence of high probability, since the chance that a person will incriminate himself when he is not guilty is very low, and on the other hand the number of criminals in the world is also low, then using confession as a test to discover criminality may fall prey to the representativeness fallacy. The relation between these two probabilities, that is, between the reliability of the confession and the prevalence of criminality, may be such that the confession is not effective. The "something more" places the defendant in a narrower group of potential suspects, thereby increasing the relevant prevalence of the phenomenon. If the defendant was seen at the scene of the murder, this already increases the prevalence, and thereby makes the confession more effective evidence, as we saw above.
Similarly, in cases where the unreliability of a medical test is of the same order of magnitude as the prevalence of the disease, then in medical diagnosis too the physician must take into account additional circumstantial evidence beyond the test results. Such additional evidence will increase the relevant prevalence. If the patient before him shows symptoms characteristic of the disease, this means that the relevant prevalence has increased, for among those with these symptoms the disease is certainly much more common. In such a situation, a test at the same level of reliability will be far more effective.
Two Examples from Jewish Law
The halakhic discussion I have chosen in order to illustrate the representativeness fallacy concerns two Talmudic passages in which evidence based on a majority is attacked by a weakening factor. We shall present here two such passages: the passage in Yevamot concerning the majority of women who give birth at nine months, and the passage in Ketubbot concerning the majority of women who marry as virgins.
A. The Passage in Yevamot: Most Women Bear at Nine Months
The Mishnah in tractate Yevamot 35b states:
If it is uncertain whether the child is a nine-month child of the first husband or a seven-month child of the latter husband, he must divorce her, the child is valid, and they are liable for a suspensive guilt-offering.
This concerns one who hastened and performed levirate marriage with his brother's widow immediately after the death of his brother, and afterward a son was born to her at a time that raises doubt whether he is a nine-month child of the first husband or a seven-month child of the second. For the sake of the example, let us assume that he married her leviratically two months after his brother's death.
The Mishnah says that the child is legitimate in any case, for if he belongs to the first husband, then although the levirate marriage is invalid and they transgressed the prohibition of a brother's wife where no commandment applies, the child is certainly legitimate. And if he is a seven-month child of the second husband, then the levirate marriage is valid and he is the legitimate son of the second. Thus, with respect to the child, there is no doubt that he is legitimate. But as for the couple, they bring a suspensive guilt-offering because of the doubt regarding intercourse with a brother's wife where no commandment applies.
The Talmud there, 37a, objects:
An uncertain nine-month child, etc. Rava said to Rav Nahman: Let us say: Follow the majority of women, and most women give birth at nine months!
The Talmud asks why the child is regarded as doubtful. There is a majority that women give birth at nine months, and therefore one should decide that he is the son of the first husband, and the obligation of the offering should therefore be that of a definite sin-offering and not a suspensive guilt-offering.
In the conclusion, the Talmud explains it as follows:
…He said to him: This is what I meant: Most women give birth at nine months, and a minority at seven; and every woman who gives birth at nine months has a fetus that is recognizable by one-third of her term, and this one, since her fetus was not recognizable by one-third of her term, her majority status has been undermined.
At this stage the Talmud proposed that every woman who gives birth at nine months has a visibly recognizable pregnancy by one-third of her term. Here, however, we are dealing with a fetus that was not recognizable, for otherwise no doubt would have arisen here. Since among those who give birth at nine months the fetus is recognizable, the majority that women give birth at nine months is undermined.
The Talmud now objects:
If every woman who gives birth at nine months has a fetus that is recognizable by one-third of her term, then from the fact that this one was not recognizable by one-third of her term, her fetus is certainly a seven-month one from the latter husband!
If indeed every woman who gives birth at nine months has a recognizable pregnancy, then the law should not be that they are liable to a suspensive guilt-offering, but that they are exempt, because it is certain that he was born at seven months.
Finally, the Talmud corrects this and says that it is not certain that the fetus is recognizable in one who gives birth at nine months; rather, this too is only a majority:
Rather, say: Most women who give birth at nine months have a fetus that is recognizable by one-third of the term, and this one, since it was not recognizable by one-third of the term, her majority status has been undermined.
Thus, in the conclusion, the Talmud explains that the majority that women give birth at nine months is undermined because among those who give birth at nine months, the majority have a recognizable pregnancy. Therefore the majority that women give birth at nine months is weakened, and the result is a state of doubt; hence they are liable to a suspensive guilt-offering.
B. The Passage in Ketubbot: Most Women Marry as Virgins
Elsewhere throughout the Talmud and in the medieval and later authorities there are further examples in which some consideration neutralizes evidence that comes by force of a majority, and the majority is thereby undermined. Here we shall present the example of Ketubbot 16a, for the course of the Talmud there is truly identical to the course we saw in the passage in Yevamot. There too, another majority is brought against the initial majority and neutralizes it.
The Talmud deals with the question whether the woman before us was married as a virgin or not. The assumption is that no public report reached us that she was married as a virgin. On the other hand, the majority of women who marry are virgins. Regarding this, the Talmud says there:
Ravina said: For one can say that most women marry as virgins and a minority are widows, and every woman married as a virgin has a public reputation to that effect; and this one, since she has no such reputation, her majority status has been undermined.
There is a majority that women marry as virgins. And the majority is undermined because every woman who marries as a virgin has a public report.
The Talmud now objects that if the rule that those who marry as virgins have a public report is a certainty and not a majority, then not only is the majority undermined, but there is an opposite clarification:
If every woman who is married as a virgin has a public reputation to that effect, then when witnesses come, what of it? Those witnesses are false witnesses!
And in the conclusion the Talmud explains that this too is a majority and not a certainty:
Rather, Ravina said: Most women married as virgins have such a public reputation, and this one, since she has no such reputation, her majority status has been undermined.
As stated, the course of the discussion is precisely parallel to the passage in Yevamot.
Statistical Explanation
In both passages, it is not clear at first glance why the final explanation answers the difficulty. The difficulty was why, after the qualification, there is no situation of certainty in the opposite direction. The Talmud answered that such certainty would arise if the qualification were an absolute rule (that all women who give birth at nine months have a recognizable pregnancy, or that all women who marry as virgins have a public report), but since we are dealing with a majority and not an absolute rule, no opposite certainty is created.
But even if the qualification is only a majority, it would seemingly still create clarification in the opposite direction. For example, in the Yevamot passage one may conclude that if indeed most women who give birth at nine months have a recognizable pregnancy, even if this is only a majority and not a certainty, then if in our case the fetus was not recognizable (and we saw that this is the case), there is a majority in the opposite direction: that this fetus was not born at nine months. If so, why do we not decide here that this fetus was born at seven months and exempt them altogether from an offering? And so too regarding the Ketubbot passage: there too one may ask that if most women who marry as virgins have a public report, then one who married without a public report is apparently not a virgin. So why is this a doubt and not a certainty in the opposite direction?
The answer to this is quite simple, and we can see it through the second example. There is a majority among marrying women that they are virgins. Therefore, in general, if we ask whether the woman before us married after prior intercourse or as a virgin, the answer will be: as a virgin. On the other hand, if no public report circulated about her, there is an opposing majority, for most of those who marry as virgins do have such a report. Suppose there are 1,000 women in the world who got married. Of them, 80% are virgins, that is, there are 800 virgins and 200 non-virgins. By contrast, among the virgins, a majority of 80% have a public report, that is, there are 640 virgins about whom a public report circulated, and consequently 160 virgins who married and no public report circulated about them. Now a woman comes before us who married without a public report, and we are uncertain whether she is a virgin or not. To decide, we compare the number of non-virgins who married (200) with the number of virgins without a public report (160). The determination is clear: she is not a virgin. The second majority neutralized the first. This is the mechanism of a majority that is undermined by an opposing majority.
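The count in this example can be laid out explicitly. The following is a minimal sketch; it assumes, as the example itself does implicitly, that no public report of virginity circulates about a bride who is not a virgin, and the function name `split_no_report` is illustrative.

```python
def split_no_report(total, virgin_share, report_share):
    """Split the brides who married WITHOUT a public report.

    Assumes, as the worked example in the text does, that a report
    never circulates about a bride who is not a virgin.
    """
    virgins = round(total * virgin_share)
    non_virgins = total - virgins
    virgins_with_report = round(virgins * report_share)
    virgins_without_report = virgins - virgins_with_report
    return virgins_without_report, non_virgins


virgins_without_report, non_virgins = split_no_report(1000, 0.80, 0.80)
print("virgins without a report:", virgins_without_report)
print("non-virgins:             ", non_virgins)
```

The woman before us belongs to one of these two groups, and whichever group is larger determines the direction of the majority.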
As an anecdote, I should note that in the lectures of Rabbi Shmuel Rozovsky on tractate Yevamot, sec. 399, he too noted this difficulty, and he formulated it as follows:
Now, in the simple understanding, the second majority is of the same ratio as the first majority. By way of example, if the first majority, that most women give birth at nine months, stands in a ratio of 4 to 5, such that out of one hundred women eighty give birth at nine months, then the second majority, that most women who give birth at nine months have a recognizable pregnancy, is likewise in a ratio of 4 to 5, that is, sixty-four out of the eighty who give birth at nine months. According to this, it follows that out of one hundred women there are twenty women who give birth at seven months and another sixteen women who give birth at nine months and whose pregnancy is not recognizable. If so, this woman certainly belongs to one of these minority groups and not to the rest of the women who give birth at nine months and whose pregnancy is recognizable. Since that is so, why do we remain in doubt about which group she belongs to? There is, after all, a majority that she is from among those who give birth at seven months, for those who give birth at seven months are more numerous than those who give birth at nine months and whose pregnancy is not recognizable.
And one is forced to say that here we are speaking specifically of a case where the second majority is not of the same ratio as the first majority, but rather that out of the eighty women who give birth at nine months there are twenty women whose pregnancy is not recognizable, so that those who give birth at seven months are not more numerous than those who give birth at nine months and whose pregnancy is not recognizable. But in the simple understanding it does not appear that this is the case; rather, the ratio between minority and majority in the latter majority is like the ratio between minority and majority in the former majority, and this requires investigation.
If the two majorities are of the same strength, then an opposing majority is created here, as we saw above. If so, Rabbi Shmuel Rozovsky asks, why do we treat this as a state of doubt (that the majority was undermined) and not as a certain case? Seemingly there is here a determination in the opposite direction and not a state of doubt.
Clearly, the answer to this question depends on the ratio between the two majorities. If the majority of virgins who marry with a public report were less significant, for example 60%, then the number of virgins who married without a public report would be 320, as against the number of non-virgins who married, which is 200. If so, if a woman who married without a public report were to come before us, the determination would still be that she is a virgin. Here the majority was not undermined. At a majority of 75% of the virgins who marry with a public report, the two quantities would be equal, and we would remain in a state of doubt.
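The break-even point can be derived in general. In this sketch N is the number of marrying women, p the share of virgins among them (the first majority), and r the share of virgins about whom a public report circulates (the second majority). These symbols are introduced here for illustration, and it is assumed, as in the example, that no report circulates about non-virgins.

```latex
\[
\underbrace{N\,p\,(1-r)}_{\text{virgins without a report}}
  \;=\;
\underbrace{N\,(1-p)}_{\text{non-virgins}}
\quad\Longleftrightarrow\quad
r^{*} = \frac{2p-1}{p}
\]
\[
p = 0.8 \;\Rightarrow\; r^{*} = \frac{0.6}{0.8} = 0.75 .
\]
```

Below r* the original majority survives (as at 60%), above it an opposing majority is created (as at 80%), and exactly at r* the two groups are equal and a genuine doubt remains.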
Thus, regarding Rabbi Shmuel Rozovsky's question itself, one may answer that Jewish law determines categorically that when a majority has been undermined, we do not take it into account. We have no possibility of conducting a statistical survey in every case that comes before a religious court in order to know the ratio between these majorities, and therefore the assumption is that the situation is balanced and there is no way to decide on the basis of majority considerations. From the standpoint of Jewish law, this is a state of doubt.
It is true that if we conducted a specific survey for a particular case and discovered the proportion of each of these majorities, we could decide the question by the consideration set out above. But so long as we have not conducted such a survey, the determination is that this is a doubt.
Explanation in Terms of the Representativeness Fallacy
As stated, the reliability of the majority we are using depends on the strength of the opposing majority. The result depends on the ratio between the two competing majorities. This situation may be seen as parallel to those I described at the beginning of the article, and Rabbi Shmuel Rozovsky's question reflects the representativeness fallacy. The relation between the two majorities reflects prevalence as against test reliability.
What we really want to examine is whether the woman before us is a virgin or not. The prevalence of the phenomenon (virginity) in the population is 80%. What test are we using? A test through the circulation of a public report (if there is a public report, she is a virgin, and if not, then she is not). What is the reliability of the test? This is determined by the opposing majority. If most women who marry as virgins have a public report, this means that the test is highly reliable. Its reliability is equal to the strength of the opposing majority. If we are dealing with 80% of the women who marry as virgins having a public report, then the reliability of the test is 80%. One can now see that this test is not a successful one, because the prevalence of the phenomenon it seeks to examine is similar to the reliability of the test.
If indeed the public report existed for a much larger majority of the women who marry as virgins, that is, if the test were much more reliable, one could rely on it even in order to examine a phenomenon that is not so prevalent. Everything depends on the relation between the majorities (test reliability versus prevalence of the phenomenon). Once again we discover that the resolution of our microscope must be suited to the size of the observed phenomenon.
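The dependence on the relation between the two majorities can be made explicit with Bayes' rule. The following is only a sketch, under the simplifying assumption built into the test as described (a woman with a public report is a virgin, so non-virgins never have one). The names `prevalence` and `reliability` are introduced here for illustration and are not the article's notation.

```python
def prob_non_virgin_given_no_report(prevalence, reliability):
    """P(non-virgin | no public report).

    prevalence  -- share of marrying women who are virgins (first majority)
    reliability -- share of virgins who marry with a public report
                   (second majority)
    Assumes non-virgins never have a public report.
    """
    virgins_without_report = prevalence * (1 - reliability)
    non_virgins = 1 - prevalence
    return non_virgins / (virgins_without_report + non_virgins)


for reliability in (0.60, 0.75, 0.80, 0.99):
    p = prob_non_virgin_given_no_report(0.80, reliability)
    print(f"reliability {reliability:.0%}: "
          f"P(non-virgin | no report) = {p:.3f}")
```

With a prevalence of 80%, the result stays close to one half for reliabilities around 75% to 80%, which is the region of doubt. It becomes decisive only when the reliability is much higher than the prevalence.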
We have seen that Jewish law does not take account of the detailed calculation, that is, of the ratio between the majorities, since in different cases the results are different. The halakhic solution is that when there is a representativeness fallacy the majority is undermined and one cannot rely on it, and the situation is defined in Jewish law as a doubt.
This is additional support for the reservations raised by halakhic decisors regarding DNA tests or other tests in legal-halakhic contexts. The problem is not only purely halakhic but also scientific-statistical. It is true that if we have data about prevalence and reliability we may perhaps arrive at a different concrete conclusion in this particular case, but there is logic in establishing a sweeping and uniform rule for the sake of coherence and simplicity in Jewish law.
I should note that awareness of this fallacy may explain several additional difficult points in the medieval authorities regarding the passages in Ketubbot and Yevamot, but I will not enter into that here.
The Nature of a Statistical Fallacy: On Mathematics and Psychology
Many may feel that treating the passages in Yevamot and Ketubbot in terms of the representativeness fallacy is unnecessary. The conclusion of the passages is clear even without using statistical tools. On the other hand, in the two contexts described at the beginning of my remarks (the legal and the medical), which are entirely equivalent to the Talmudic situations, it seems that many may go wrong. The case of Munchausen syndrome, which sent many women to prison though they had done no wrong, is a good illustration of this.
Wonderful are the ways of our psychology, and it is not clear why in certain cases we do not fall into this fallacy and in others we do. There are cases in which it is easy for us to see the answer, and in other cases it is very easy for us to err. It is important to understand that a statistical fallacy is a psychological phenomenon and not a statistical one. The structure of our thinking causes us to fail, and it is difficult to know exactly when this happens and when it does not. The Nobel Prize was awarded to Kahneman in the field of economics and not in the field of mathematics. The mathematical discovery is not so impressive, but the psychological discovery (how much people, including experts, may fail in their thinking) is the main point. Both Kahneman and his research partner Amos Tversky are psychologists by training.
As an anecdote, I will add here that Pascal's famous wager (in favor of observing the commandments of religion) also suffers from a similar fallacy[10]. Pascal was one of the founding fathers of probability, and yet he too fell into this fallacy. It turns out that even experts in probability have psychology.
Therefore, the fact that the Talmudic examples seem simpler to us should not mislead us. We are liable to fall into this fallacy, and sometimes the results are disastrous: medically, legally, or humanly.
Practical Conclusions
Many are unaware that judges, and even senior and excellent physicians, may err in their statistical judgment. Sometimes we are advised to take a test for some disease when the error rate of the test is no smaller than the prevalence of the disease. In such a situation there is no point in taking the test. True, if it shows that the person is healthy, then most likely he is indeed healthy, but if he is diagnosed as ill, he should not rely on the results of this test (although see the two qualifications noted above).
As I explained, these remarks apply only where the possibilities of error are symmetric, that is, where the error in the test is such that both a sick person may come out healthy and a healthy person may come out sick. By contrast, when the error is one-directional, that is, when only sick people may come out healthy, but not the reverse, there is no impediment to conducting such tests and taking their results into account.
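The difference between symmetric and one-directional error can be illustrated numerically. The sketch below assumes a disease prevalence of 1% and a test that is 99% reliable. The prevalence figure is an assumption chosen for illustration, and the parameter names `sensitivity` (share of sick people who test positive) and `specificity` (share of healthy people who test negative) are standard terms introduced here rather than the article's own.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (P(sick | positive), P(healthy | negative)) by Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    false_pos = (1 - prevalence) * (1 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Symmetric (two-directional) error: 1% of the sick and 1% of the healthy
# are misclassified.
ppv_sym, npv_sym = predictive_values(0.01, sensitivity=0.99, specificity=0.99)

# One-directional error: a sick person may come out healthy, never the reverse.
ppv_one, npv_one = predictive_values(0.01, sensitivity=0.99, specificity=1.0)

print(f"symmetric:       P(sick | positive) = {ppv_sym:.3f}, "
      f"P(healthy | negative) = {npv_sym:.5f}")
print(f"one-directional: P(sick | positive) = {ppv_one:.3f}, "
      f"P(healthy | negative) = {npv_one:.5f}")
```

Under these assumed numbers a positive result from the symmetric test is right only about half the time, while a negative result is almost always right. When healthy people can never come out sick, a positive result is fully informative.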
In such cases it is desirable to ask the physicians (and to make sure they are answering from knowledge) whether the chance of error is one-directional or two-directional, and whether there is reliable information about the prevalence of the disease. It is also desirable to ask how they arrived at the data regarding the prevalence of the disease, for those data too may be based on statistical considerations of this mistaken kind (that is, on the results of such tests). These remarks are of course relevant to the physicians themselves as well (see the case of Munchausen syndrome).
As stated, these remarks also apply to the law of evidence in the legal/halakhic context. In these contexts different kinds of evidence are brought before courts, such as the results of genetic tests and the like. For example, let us think of a certain person who is covered by medical insurance with a certain insurance company. He is now tested and diagnosed as suffering from a disease included in the insurance coverage. The case comes before the judge, who must decide whether the insurance company has to pay the insured or not. Usually, if a medical expert comes and says that this test is 99% reliable, the judge will accept the expert's testimony and order the insurance company to pay. But, as stated, when we are dealing with a disease of low prevalence (that is, of the same order of magnitude as the error rate of the test), there is no real basis for this. This is a mistaken legal ruling (for the burden of proof rests on the claimant).
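One way to see why the expert's "99% reliable" does not settle the insurance case is to count people rather than percentages. The sketch below assumes one million insured persons, a symmetric 1% error rate, and two hypothetical prevalence figures (1 in 100 and 1 in 1,000). None of these numbers comes from an actual case.

```python
def natural_frequencies(population, prevalence_per_10k, error_per_100):
    """Count positive results in whole people, assuming symmetric errors.

    Returns (sick people who test positive, healthy people who test positive).
    """
    sick = population * prevalence_per_10k // 10_000
    healthy = population - sick
    sick_positive = sick * (100 - error_per_100) // 100
    healthy_positive = healthy * error_per_100 // 100
    return sick_positive, healthy_positive


for label, per_10k in (("1 in 100", 100), ("1 in 1,000", 10)):
    sick_pos, healthy_pos = natural_frequencies(1_000_000, per_10k, 1)
    total = sick_pos + healthy_pos
    print(f"prevalence {label}: {total} positives, "
          f"of whom {sick_pos} are sick and {healthy_pos} are healthy")
```

In the first scenario the positives split evenly between sick and healthy claimants. In the second, healthy claimants outnumber sick ones by roughly ten to one, so a positive result alone would not discharge the claimant's burden of proof.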
The same applies to tests carried out before a proposed match. A person is tested and found to suffer from some disease, and now the question arises whether to cancel the match, or perhaps even to annul a marriage on the basis of a mistaken transaction[11]. Here too, one must exercise statistical judgment (and Torah-legal judgment as well, of course) with great caution.
When taking into account the results of a statistical test, it is important to be aware of this fallacy (and of additional fallacies), for such an error may have disastrous consequences. When the problem is a statistical one, one must not place automatic trust in experts, especially when their expertise is in medicine or law.
And above all, both in the legal context and in the medical context, it is recommended to make use of other independent evidence as an addition to the statistical consideration. The decision to send a woman to prison, or to cancel a match, or to impose payment, on the basis of a statistical consideration of this kind may turn out to be a grave mistake.
Summary
The representativeness fallacy may cause great confusion when weighing the force and reliability of statistical tests for rare phenomena. It has a great many implications. We have seen several implications regarding medical tests for diseases whose prevalence is of the same order of magnitude as the error rate of the test (provided only that the error is two-directional): in most cases a positive result has no meaning whatsoever, and there is no point in carrying out such tests unless there is support from additional directions. In the legal sphere as well, in the law of evidence, it is important to take the representativeness fallacy into account and to make use also of direct and independent evidence.
From inquiries I have made, physicians today are indeed taught, as part of their professional education, the field of statistical fallacies. As for lawyers and judges, this certainly is not part of their professional training. The need for this is perhaps the most important conclusion to be drawn from the picture presented here. As I have shown, this is no more than an added layer of analysis and conceptualization of passages that are studied in any event as part of training for the rabbinical judiciary (the passages in Ketubbot and Yevamot); but it is important to point out the more general significance of a majority that has been undermined, and the additional conclusions that emerge from it. This would provide better preparation of judges for their role when they come to weigh evidence brought before them.
- 1. On this matter, it is recommended to see the book by Varda Liberman and Amos Tversky, Critical Thinking: Statistical Considerations and Intuitive Judgment, The Open University, 1996.
One may also consult the article by Gerd Gigerenzer and his colleagues:
Gerd Gigerenzer, Wolfgang Gaissmaier, Elke Kurz-Milcke, Lisa M. Schwartz, and Steven Woloshin, 'Helping Doctors and Patients Make Sense of Health Statistics', Psychological Science in the Public Interest, Vol. 8, No. 2, pp. 53-96.
The article is also available on the internet, at:
http://www.psychologicalscience.org/journals/pspi/pspi_8_2_article.pdf .
For a Hebrew article that contains part of the material, see: Gil Gringroz, 'Medical Statistics – How to Understand Medical Information Better?', on the Homo Sapiens website.
- 2. See chapter 6 in the book by Liberman and Tversky, and the two articles mentioned in the previous note.
- 3. See Rabbi Wosner's responsum to the Border Police rabbi, "Halakhic Identification by DNA Testing," Tehumin 21 (2001) 121. Also see, at greater length, A. Westreich, "Medicine and the Natural Sciences in the Rulings of the Rabbinical Courts," Mishpatim 26 (1996), pp. 425-492. See also: D. Frimer, "Determining Paternity by Blood-Type Testing [in the A, B, O system] in Israeli Law and in Jewish Law," Assia 5 (1986), p. 185; M. Halperin, H. Brautbar, D. Nelken, "Determining Paternity by Means of the Central Tissue-Matching System," Tehumin 4 (1983), p. 431.
- 4. See a report and analysis in Tal Galili's article, 'How (Non-)Statistical Thinking Sends a Woman to Prison – The Story of Sally Clark', on The Hitchhiker's Guide to Statistics website:
http://www.biostatistics.co.il/?p=20
- 5. Regarding this controversial syndrome, see the relevant entry in Wikipedia. Also see a short article by Professor Zvi Zamishlany, 'What on Earth Is Munchausen Syndrome?', on the mako website,
http://www.mako.co.il/news-columns/Article-e06d4aab8d97221004.htm .
Also see the article by Professor Esther Herzog, 'The Syndrome That Never Was', in the 'Opinions' section of the ynet website, dated 19.7.2009.
Regarding the sequence of events in the Sally Clark case, see the website set up in her honor: http://www.sallyclark.org.uk/.
- 6. See Wikipedia, s.v. 'Confession'. Regarding confession in Jewish law, see Michael Vigoda's article, "Confession in Jewish Law," on the 'Daat' website:
http://www.daat.ac.il/mishpat-ivri/havat/48-2.htm, and the sources cited there.
- 7. See A. Kirshenbaum, "Self-Conviction in Jewish Law," Jerusalem 2005, p. 523 and the surrounding discussion. Kirshenbaum recommends adopting this doctrine in Jewish law as well.
- 8. Compare, however, his remarks in the Laws of Testimony 12:2; the issue is an old one.
- 9. See also the article by Judge Dalia Dorner, "The Queen of Evidence v. Tarek Nujidat – On the Danger of False Confessions and How to Cope with It," HaSanegor 95, February 2005, and the sources cited there. Also online.
- 10. See my book God Plays Dice, Yediot Books, Tel Aviv 2011, pp. 104-112. There I explained the fallacy somewhat differently: the criterion of expected value is not effective for decision-making if the probability of obtaining the expectation is low. This too can be seen, in another way, as a kind of representativeness fallacy, but this is not the place to elaborate.
- 11. Regarding annulment of marriage on the claim of mistaken transaction because of defects or illness, see Babylonian Talmud Ketubbot 73b, and Maimonides, Ishut 25:2, and the Tur and Shulchan Arukh, Even Ha-Ezer 117:4. Likewise, regarding defects in the husband, see Rabbenu Simhah of Speyer, whose words are cited in the responsa of Maharam of Rothenburg (Cremona ed.), sec. 77, and in Or Zarua part I, sec. 761; responsa Havot Yair sec. 221; Beit Shmuel sec. 154, subsec. 2; novellae of Beit Ha-Levi sec. 3; Rabbi Isaac Herzog, Pesakim u-Ketavim, vol. 7, Even Ha-Ezer part I sec. 81; responsa Beit Av, seventh series, part Ezrat Avraham on Even Ha-Ezer sec. 27; Rabbi Shlomo Zalman Auerbach, whose words are cited in Nishmat Avraham, Even Ha-Ezer part I sec. 39, subsec. 1. See also responsa Ein Yitzhak, Even Ha-Ezer part I sec. 24. Also responsa Tashbetz part I sec. 1; Bah on Even Ha-Ezer there; Beit Meir there, subsec. 1; responsa Shevut Yaakov part I sec. 101; responsa Yeri'ot Shlomo part I sec. 8; Pirushei Ivra, p. 41 and onward; Hazon Ish, Even Ha-Ezer sec. 69, subsec. 23; responsa Minhat Yitzhak part VII sec. 128. See also the article by Rabbi D. Bass, Tehumin, 24, 2004, p. 194 and onward.
Discussion
And another point: I did not understand what you are adding to Rabbi Shmuel Rozovsky’s answer. He himself brings what you suggest—that if the ratio between the two majorities is different, then the Gemara is understandable—but he rejects it and writes, “and it is forced to say… and this requires further investigation.”
I have no idea. I do not think that nowadays people speak about a woman who gives birth at seven months, but rather about a premature birth. If so, then clearly there is no difference.
I no longer remember the details. But as far as I recall, I wrote exactly that: that Rabbi Shmuel Rozovsky’s intention is what I was saying.
Thank you very much for the article.
An innocent question: In the Yevamot passage, regarding when the pregnancy is recognizable—women who give birth at seven months: is their pregnancy not recognizable after a third of their term??? From a medical reality standpoint, it seems to me there is no difference, and for both women (whether they give birth at 9 months or at 7) the pregnancy is recognizable after 3 months.