Q&A: The distribution of the hundreds digit in the census of the tribes of Israel in the wilderness
Question
Hello Rabbi,
I heard an idea from the Bible scholar Maor Ovadia, who heard it from the researcher Dr. Neria Klein, that the hundreds digit in the census of the tribes of Israel in the wilderness is not uniformly distributed. These are the census numbers:
First count: Numbers 1. Second count: Numbers 26 (the plains of Moab).
Reuben: first count – 46,500, second count – 43,730
Simeon: first count – 59,300, second count – 22,200
Gad: first count – 45,650, second count – 40,500
Judah: first count – 74,600, second count – 76,500
Issachar: first count – 54,400, second count – 64,300
Zebulun: first count – 57,400, second count – 60,500
Ephraim: first count – 40,500, second count – 32,500
Manasseh: first count – 32,200, second count – 52,700
Benjamin: first count – 35,400, second count – 45,600
Dan: first count – 62,700, second count – 64,400
Asher: first count – 41,500, second count – 53,400
Naphtali: first count – 53,400, second count – 45,400
Total: first count – 603,550, second count – 601,730
You can see that the hundreds digit is never one of the digits 0, 1, 8, or 9 (the two edge digits on each side). Statistically, the probability that a random series of 24 digits would never include even one of those digits is 0.6 to the 24th power, or about 1 in 212,000. In other words, a very small probability. In your opinion, what is the significance of this? Or what can be learned from it?
Best regards,
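The arithmetic in the question can be checked directly. The following is a minimal sketch, assuming (as the question does) that each hundreds digit is an independent, uniformly distributed digit; the names `CENSUS` and `hundreds_digit` are illustrative only.

```python
# Census figures as listed in the question: (first count, second count).
CENSUS = {
    "Reuben": (46500, 43730),
    "Simeon": (59300, 22200),
    "Gad": (45650, 40500),
    "Judah": (74600, 76500),
    "Issachar": (54400, 64300),
    "Zebulun": (57400, 60500),
    "Ephraim": (40500, 32500),
    "Manasseh": (32200, 52700),
    "Benjamin": (35400, 45600),
    "Dan": (62700, 64400),
    "Asher": (41500, 53400),
    "Naphtali": (53400, 45400),
}


def hundreds_digit(n: int) -> int:
    """Return the hundreds digit of a non-negative integer."""
    return (n // 100) % 10


digits = [hundreds_digit(n) for pair in CENSUS.values() for n in pair]
missing = sorted(set(range(10)) - set(digits))

# Assumption: 24 independent uniform digits, each avoiding 4 of the 10 values.
p = 0.6 ** len(digits)

print("hundreds digits:", digits)
print("digits that never appear:", missing)
print(f"0.6^{len(digits)} = {p:.3e}, about 1 in {1 / p:,.0f}")
```

The script also makes it easy to confirm that the listed tribal figures add up to the stated totals.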
Answer
I don’t see what can be learned from this. Probabilities like that don’t say much. Any combination that came out there would be rare, just like in dice rolls.
Discussion on Answer
I don’t have time to get into it right now, but at first glance it sounds very dubious to me. In any case, there are reservations there, such as expansion across several orders of magnitude, the difference between a leading digit and the digits that follow it, and all of that needs to be checked here as well (for example, there aren’t several orders of magnitude here).
I’m not claiming that Benford's law applies here (it doesn’t). I only brought it as an example of drawing conclusions from statistical irregularities in data, and people don’t say there that you can’t do that because every data series is statistically irregular to the same degree (like a sequence of dice rolls). What I’m trying to argue is that if the numbers described in the Torah really represented an actual census of the tribes, then the hundreds digit of that census should have been uniformly distributed. If you run an experiment and generate a series of 24 digits and check how often the digits 0, 1, 8, and 9 are absent, that will happen once in 200,000 series. So apparently one can infer from this that the numbers described in the Torah are not real numbers (and that was also the conclusion of the scholars I mentioned above).
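The experiment described in this comment can be sketched as a small simulation. This is only an illustration under the commenter's own assumption of independent uniform digits; `estimate_absence` is a hypothetical helper name, and an event this rare needs a very large number of trials before the estimate becomes stable.

```python
import random


def estimate_absence(excluded, n_digits=24, trials=100_000, seed=0):
    """Estimate the fraction of random digit series that contain
    none of the digits in `excluded`."""
    rng = random.Random(seed)
    excluded = set(excluded)
    hits = 0
    for _ in range(trials):
        # all() stops at the first excluded digit, so most trials end early.
        if all(rng.randrange(10) not in excluded for _ in range(n_digits)):
            hits += 1
    return hits / trials


# Sanity check on a more common event: the digit 0 never appears.
print("simulated:", estimate_absence({0}, trials=20_000))
print("exact:    ", 0.9 ** 24)

# The event from the discussion; with this few trials the estimate is noisy.
print("simulated:", estimate_absence({0, 1, 8, 9}, trials=200_000))
print("exact:    ", 0.6 ** 24)
```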
You gave an excellent illustration of my argument. This is a biased calculation. What is the probability that any four digits would be missing? Much higher. What is special about these particular digits? Nothing. Therefore the probability that these four specific ones would be missing is a meaningless number. It’s like the probability that the digit sequence would be exactly 1,7,1,4,4,9,8,8,8,1,5… That too is negligible.
What would you say if all digits appeared except for 4? The probability of that too is small (about 5%). Would you infer anything from that? And not because of the difference between 5% and the tiny probability you calculated, but because that calculation is meaningless.
Think about the birthday problem: https://he.m.wikipedia.org/wiki/%D7%A4%D7%A8%D7%93%D7%95%D7%A7%D7%A1_%D7%99%D7%95%D7%9D_%D7%94%D7%94%D7%95%D7%9C%D7%93%D7%AA
As is well known, in a class of 23 people the probability that two of them share the same birthday is high (over 50%). But the probability that both were born on June 5 is utterly negligible. You have a class of 23, and two of them were born on June 5. That’s surely a miracle, right? According to your method, it can’t be chance. You understand that this is nonsense.
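The contrast in the birthday example can be put into numbers. A minimal sketch, assuming 365 equally likely birthdays and independent people; the function names are illustrative.

```python
from math import prod


def p_shared_birthday(n: int, days: int = 365) -> float:
    """P(at least two of n people share some birthday)."""
    return 1 - prod((days - i) / days for i in range(n))


def p_two_on_given_day(n: int, days: int = 365) -> float:
    """P(at least two of n people were born on one day named in advance)."""
    p = 1 / days
    none = (1 - p) ** n
    exactly_one = n * p * (1 - p) ** (n - 1)
    return 1 - none - exactly_one


print("some shared birthday among 23:", p_shared_birthday(23))
print("two or more born on one named day:", p_two_on_given_day(23))
```

The first event leaves the day open and the second fixes it in advance, which is the distinction the comment draws.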
I recalculated the probability that any four digits would be missing, and it still comes out very low. I simply multiplied my previous result by the number of groups of 4 digits that can be chosen out of 10 possible digits. I used the binomial coefficient formula, 10 choose 4, and got 210. In other words, the new calculation shows that the probability that any four digits would be missing is 1 in 1,005. That is still a very low probability.
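Multiplying by 210 is a union bound: a series that is missing five or more digits gets counted under several four-digit groups, so the product somewhat overstates the probability. The sketch below compares that bound with an exact inclusion-exclusion count, reading "any four digits missing" as "at least four digits missing" and assuming independent uniform digits. The helper names are illustrative.

```python
from math import comb

N = 24  # number of census figures


def surjections(n: int, k: int) -> int:
    """Number of length-n sequences over k symbols that use every symbol."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))


def p_exactly_k_distinct(k: int, n: int = N) -> float:
    """P(a uniform random n-digit series uses exactly k distinct digits)."""
    return comb(10, k) * surjections(n, k) / 10 ** n


# The calculation from the comment: 210 groups times 0.6^24.
union_bound = comb(10, 4) * 0.6 ** N

# Exact: at least four digits missing means at most six distinct digits.
p_at_least_four_missing = sum(p_exactly_k_distinct(k) for k in range(1, 7))

print(f"union bound: about 1 in {1 / union_bound:,.0f}")
print(f"exact:       about 1 in {1 / p_at_least_four_missing:,.0f}")
```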
What this means is that God didn’t count people and then write the Torah, but the other way around.
To that I’d give the same answer. What is sacred specifically about four digits? What is the probability that n digits are missing? There is nothing unique specifically about four. It changes nothing.
Go back to birthdays. There is a group of ten people (not 23), and among them there are two with the same birthday, or even three. What do you learn from that? Nothing. That’s just what happened to come up here.
True, if they were all born on the same birthday, that really would say something. The difference is not only quantitative. It turns that day into something special, and when there is something special, that calls for explanation. In all the other cases there is nothing special about the result, and therefore it is rare but not anomalous.
In the fine-tuning argument, for example, the situation is different. The combination of the values of the constants creates life and biology and human beings. That is a special combination, and therefore its rarity calls for explanation. But if there had been some other combination of values that created nothing, that combination still would have been very rare (because every combination is rare to the same degree), but it would not raise any question. After all, something had to come out in the draw.
So why, when tax reports are analyzed and the leading-digit distribution differs from Benford's distribution, do people conclude that the reports are fake? According to your approach, that too should be just a rare case but not an anomaly and not special.
I wrote that I can’t get into Benford right now. You’d have to look into it and understand what the issue is.
Even without studying it, I’m fairly sure that nobody actually draws conclusions from that alone. At most it raises suspicion and then they investigate. If they infer forgery from that alone, that’s simply stupidity. One cannot deny that sometimes there is stupidity in the system, like putting a woman in prison on the basis of Münchausen syndrome by proxy.
Regarding what you wrote above: “To that I’d give the same answer. What is sacred specifically about four digits? What is the probability that n digits are missing? There is nothing unique specifically about four. It changes nothing.”
I’m not claiming there is anything sacred about four digits. I’m claiming that the hundreds digit is supposed to be uniformly distributed, so that each digit has an equal probability of appearing. Once we see that there is a strong deviation from the expected distribution, that raises serious suspicion that there is some factor here that intentionally skewed the distribution.
I understood that. And as I wrote, I disagree. Think about the birthday example.
By the way, if all the digits had appeared, the probability of that would also be fairly small.
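The probabilities mentioned in this exchange (all ten digits appearing, or every digit except 4 appearing) can be computed by inclusion-exclusion. A minimal sketch under the same assumption of 24 independent uniform digits; `p_all_present` is an illustrative name.

```python
from math import comb


def p_all_present(symbols: int, n: int) -> float:
    """P(each of `symbols` equally likely values shows up at least once
    in n independent draws), by inclusion-exclusion."""
    return sum(
        (-1) ** j * comb(symbols, j) * ((symbols - j) / symbols) ** n
        for j in range(symbols + 1)
    )


# All ten digits appear among 24 hundreds digits.
p_all_ten = p_all_present(10, 24)

# Only the digit 4 is missing: no 4 in 24 draws, and the other nine all appear.
p_only_4_missing = 0.9 ** 24 * p_all_present(9, 24)

print("all ten digits appear:", p_all_ten)
print("only the digit 4 is missing:", p_only_4_missing)
```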
Let me sharpen it further. For a phenomenon to require an explanation, it is not enough that its probability is small (that it is rare). It also has to be special. Every chain of results in dice throws is rare, but not special—unlike a long chain consisting entirely of 5s.
There is nothing special about the absence of four digits, nor about these four specific ones. Therefore this calculation has no significance at all.
For the sake of discussion, if the hundreds digit had been only the digit 3, say, even then you wouldn’t see that as something special?
Then it would become something that is special and not merely rare. Like the example of the digit 5 in dice rolls. But that is exactly the boundary question (from what point does it become special and not merely rare), and I don’t know how to answer that. The illusions of statistics.
Let’s assume for the sake of discussion that the current distribution also counts as special. In your opinion, what can be learned from it? Or how can it be explained?
That is another question sitting in the background of the discussion. Why is all this important?
It seems to me that the intention is to argue from here that these numbers were not meant to describe the number of people but something else. Because the number of people is natural data and should have a normal distribution. But if this is special, then the author is using these numbers to convey some other message rather than to describe the number of people. But as I said, in my view all this is irrelevant here because it isn’t special.
There is another statistical problem in the distribution of the tens digit. In all cases it is 0, except in two cases where it is 5 in the first census for the tribe of Gad and 3 in the second census for the tribe of Reuben. If this were a matter of rounding, I would expect the digits 3 and 5 not to appear. And if the tens digit is not rounded, how can it be that it came out exactly 0 in 22 out of 24 cases?
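The observation about the tens digit can be checked against the figures in the question. The sketch below counts the tens digits and, purely for scale, computes how likely 22 or more zeros would be if the tens digits were independent and uniform (an assumption, and the very one this comment doubts).

```python
from math import comb

# Census figures from the question, in tribe order (first count, second count).
COUNTS = [
    46500, 43730, 59300, 22200, 45650, 40500, 74600, 76500,
    54400, 64300, 57400, 60500, 40500, 32500, 32200, 52700,
    35400, 45600, 62700, 64400, 41500, 53400, 53400, 45400,
]

tens = [(n // 10) % 10 for n in COUNTS]
zero_tens = tens.count(0)
nonzero = [n for n in COUNTS if (n // 10) % 10 != 0]

# Binomial tail: at least 22 zeros among 24 uniform digits.
p_tail = sum(comb(24, k) * 0.1 ** k * 0.9 ** (24 - k) for k in range(22, 25))

print("figures with tens digit 0:", zero_tens, "of", len(COUNTS))
print("figures with a nonzero tens digit:", nonzero)
print("P(at least 22 zeros | uniform tens digit):", p_tail)
```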
Right. That stood out to me too.
I thought that perhaps in those two cases there was no rounding, but rather that was the exact number. Once they are already rounding, they do it to the hundreds, because that’s simpler.
Following up on this discussion, I recently came across a study on the way human beings choose numbers randomly. In the study, more than 8,500 students were asked to choose a number between 1 and 10, and this was the resulting distribution:
The digit zero – 0.5% or 47 people
The digit 1 – 3.4% or 294 people
The digit 2 – 8.5% or 731 people
The digit 3 – 9.7% or 834 people
The digit 4 – 9.7% or 833 people
The digit 5 – 12.3% or 1,058 people
The digit 6 – 9.8% or 843 people
The digit 7 – 28.0% or 2,405 people
The digit 8 – 10.8% or 933 people
The digit 9 – 5.3% or 460 people
The number 10 – 1.9% or 166 people
Source: “Asking over 8500 students to pick a random number from 1 to 10 [OC]”, posted by u/monkeymaster56 in r/dataisbeautiful (Reddit).
You can see that the most common digits are the digits in the middle of the range—three through eight—and the digits 1, 2, 9, 10 (the two edge numbers on each side) are the rarest (let’s ignore zero for the moment because it’s not supposed to be in the range).
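The percentages in the list can be recomputed from the raw counts, and the share of the edge numbers compared with the middle ones. A minimal sketch using only the counts quoted above; the variable names are illustrative.

```python
# Answer counts as quoted in the comment (answer -> number of students).
SURVEY = {
    0: 47, 1: 294, 2: 731, 3: 834, 4: 833, 5: 1058,
    6: 843, 7: 2405, 8: 933, 9: 460, 10: 166,
}

total = sum(SURVEY.values())
shares = {k: v / total for k, v in SURVEY.items()}

# Edge numbers named in the comment versus the middle of the range.
edge_share = sum(shares[k] for k in (1, 2, 9, 10))
middle_share = sum(shares[k] for k in range(3, 9))

print("total respondents:", total)
for k, s in shares.items():
    print(f"{k:>2}: {100 * s:.1f}%")
print(f"edge (1, 2, 9, 10): {100 * edge_share:.1f}%")
print(f"middle (3 to 8):    {100 * middle_share:.1f}%")
```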
As a reminder, I showed above that the hundreds digit in the census of the tribes of Israel is never one of the digits 0, 1, 8, 9 (the two edge digits on each side). About that you wrote to me: “This is a biased calculation. What is the probability that any four digits would be missing? Much higher. What is special about those particular digits? Nothing.”
But in light of the study above, one actually can learn that the edge digits are rarer precisely when a person is the one choosing them (and not when they are the result of a random event, such as the number of people in a certain tribe). What do you think?
First, this finding itself may be culture-dependent, meaning dependent on place and time. So it is not clear how valid it is for ancient times and other cultures.
If it is true for every place and time, then it may perhaps change the conclusion, but it is not clear with what strength. You are trying to determine whether these data were written by someone or arose as natural data. For that purpose you need Bayes’ formula, together with the formula of total probability. This finding gives you the conditional probability that if a person chooses, this is the distribution. You are interested in the reverse conditional probability: if this is the distribution, what is the probability that a person chose it. To determine that, you need to know several more probabilities: the prior probability of each hypothesis, and of course also the probability that this would occur by chance (and note, as I remarked above, the issue is not the probability that these specific digits would be missing, but that any three or four digits would be missing. There is nothing special about these particular digits).
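The reversal of the conditional probability described here can be sketched in a few lines. Everything numeric below except 0.6^24 is a hypothetical placeholder, chosen only to show how strongly the result depends on the prior; none of it is taken from the thread or from any data.

```python
def posterior(prior_chosen: float, lik_chosen: float, lik_natural: float) -> float:
    """P(chosen | E) by Bayes' formula.

    The denominator is P(E), expanded with the formula of total probability.
    """
    evidence = lik_chosen * prior_chosen + lik_natural * (1 - prior_chosen)
    return lik_chosen * prior_chosen / evidence


# E = "no 0, 1, 8 or 9 among 24 hundreds digits".
LIK_NATURAL = 0.6 ** 24  # assumes independent uniform digits for real counts
LIK_CHOSEN = 0.05        # HYPOTHETICAL placeholder for P(E | a person chose)

for prior in (0.5, 0.01, 1e-4, 1e-6):  # HYPOTHETICAL priors for "chosen"
    print(f"prior {prior:g} -> posterior {posterior(prior, LIK_CHOSEN, LIK_NATURAL):.4f}")
```

Following the remark about "any three or four digits", `LIK_NATURAL` could equally be replaced by the probability of that broader event; the function itself does not change.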
Rabbi, in calculating the total probability:
In practice, should one use Benford's law for the hypothesis that this is merely rounding overall, and the human choice data that Oren brought for the second hypothesis (typological numbers)?
Also, how much is it correct to assign to the prior probability of each hypothesis itself? And what about the presumption held by a believing person that it is something real unless proven otherwise?
Also, if the number is typological then it has some specific meaning, but that’s not like the study that speaks about random selection of digits, so in practice the probability of what came out is 1. But if so, that will always win… so how do you create an appropriate formula for this case?
Oren, how can it be that the digit 8 is absent throughout all the numbers? If it is actually among the most frequent?
I didn’t understand most of the questions. In general I’ll say that I don’t know how to assign numbers, and of course everyone will assign different numbers.
You said that in order to know statistically whether the list of numbers is real or typological, one has to examine it through the formula of total probability.
That is, we have two hypotheses: 1. This is a real list (or a real list that underwent rounding and approximation). 2. This is a “typological” list.
Now in order to calculate this, one needs to know the probability of each possibility.
That is:
P(E) = P(E|A)P(A) + P(E|¬A)P(¬A)
As I understand it,
P(E) is very rare and calculable, as was done above: 1 in 212,000.
Basically, P(A) and P(¬A) are connected to your a priori assumption regarding the writing of the Torah, how realistic the writer was.
P(E|A) is connected to Benford's law.
But how do you fill in P(E|¬A), since this is not a matter of throwing random numbers around but a typological number, and you don’t know what it symbolizes…
I didn’t understand your question about the disappearance of the digit 8. From which numbers did it disappear?
Now it may be that I understood what you were actually asking. My claim is that in any range in which a person is allowed to generate a number, say from a to b, the distribution will not be uniform; rather, the closer the digits get to the edges, the lower their frequency will be. In the case of the hundreds digit in the census of the tribes of Israel, the digit 8 is second from the edge, and in the case of the experiment with the students, the digit 8 is third from the edge.
The digit 8 does not appear in any number, and not only in the hundreds digit—also not in the thousands or the tens of thousands. (In the numbers in the question.)
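This claim can be checked mechanically against the figures in the question. A minimal sketch that looks at every decimal position of the 24 tribal figures (the two totals are left out); the names are illustrative.

```python
# Census figures from the question, in tribe order (first count, second count).
COUNTS = [
    46500, 43730, 59300, 22200, 45650, 40500, 74600, 76500,
    54400, 64300, 57400, 60500, 40500, 32500, 32200, 52700,
    35400, 45600, 62700, 64400, 41500, 53400, 53400, 45400,
]

# For each digit, count how many figures contain it in any position.
figures_containing = {
    d: sum(str(d) in str(n) for n in COUNTS) for d in range(10)
}
absent = [d for d, c in figures_containing.items() if c == 0]

print("figures containing each digit:", figures_containing)
print("digits absent from every position:", absent)
```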
The point is that here you are not claiming that the writer just made a random draw and filled in quantities. Though that is another possibility.
Here you are suggesting reading them as numbers with additional meaning, in line with what was customary in the region. But if so, how does your calculation help?
Also, it seems to me that the random distribution of these numbers is valid nowadays, not necessarily in the past; one would have to check all the numbers in the Bible and in the ancient world… for example, what would happen in primitive times when counting amounted to distinguishing between one, two, and many… here we are talking about tribes for whom it is hard to remember many shades in large quantities.
The fact that one digit is missing is statistically much more plausible than four missing digits.
It is possible that there is a combination here of choosing numbers based on some meaning together with arbitrary choice. In other words, it is possible that the hundreds digit was chosen randomly, while the thousands and tens-of-thousands digits were chosen in order to convey some message.
In any case, my calculation is trying to show that these are not real numbers that actually existed in a given tribe, but rather numbers that were chosen in some way (randomly or meaningfully directed).
There is a statistical law called Benford's law, which says that the leading digit in certain numbers is supposed to appear more frequently as 1 than as the other digits. The frequency of 1 is about 30%, and of 9 about 5%. Here is a link to an explanation of the law on Wikipedia: https://he.wikipedia.org/wiki/%D7%97%D7%95%D7%A7_%D7%91%D7%A0%D7%A4%D7%95%D7%A8%D7%93
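For reference, the frequencies quoted here follow from the standard statement of Benford's law, in which the leading digit d appears with probability log10(1 + 1/d). A short sketch:

```python
from math import log10

# Benford's law for the leading digit d = 1..9.
benford = {d: log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"leading digit {d}: {100 * p:.1f}%")
```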
Law-enforcement authorities use this law to identify people cheating on their tax reports, because usually when people make up numbers they don’t obey Benford's law and instead distribute them uniformly, whereas real numbers in tax reports do obey Benford's law. But according to what you answered, any person caught cheating on a tax report could argue that any combination that came out in his fake tax report would have been just as rare as a sequence of dice rolls, and therefore no conclusions can be drawn from his statistical anomaly. So how is the above case of the tribes different from the use of Benford's law in tax reports?