
Simplism in Simple Statistical Forecasts (Column 473)

This is an English translation (originally created with ChatGPT 5 Thinking). Read the original Hebrew version.

I’ve just finished reading Jim Holt’s book, which deals with scientific, philosophical, and mathematical issues that have triggered intellectual revolutions and shifts in worldviews. I usually don’t read popular science literature, because it’s quite hard to write interesting, high-quality popular science. Such books often focus on anecdotes and gossip about thinkers and scientists, and on very superficial, top-down descriptions of ideas which—without professional understanding—confuse more than they help. They often aspire to present the philosophical implications of scientific insights, but in practice it usually comes out rather silly (scientists also do plenty of silly things and embrace populist “philosophies” when interpreting scientific results and pointing to their implications for our lives and thought). This book, in part, does those things too, but here and there it also enters the arguments themselves (at least at the popular level), and I really didn’t find gross errors. That, too, is quite rare.

One of the points that got me thinking is a simple argument that appears at the beginning of the fourth chapter (and a variant returns later on). As Holt writes, these arguments are rather astonishing because they are extremely minimalist while their conclusions are very far-reaching. This is exactly the sort of argument I’m fond of. It reminded me of another example.

A Way to Increase State Revenues

I once saw a lecture by Bibi, where he explained why raising taxes does not necessarily increase state revenues, and that lowering taxes can actually do so. He drew axes on the board, with the Y-axis marking the amount of money in the state’s coffers (revenues only), and the X-axis marking tax rates. Now, suppose you set a 0% tax—what are the revenues? Of course, 0. Suppose you set a 100% tax—again, revenues are 0 (no one will work if there’s no profit in it). Thus, the graph of state revenues as a function of the tax rate should look roughly like this: somewhere between the origin and the point (1,0) there must be some maximum (assuming state revenues can’t be negative), and according to him, world experience shows it’s located around 30%.

What does that mean? If the current tax rate is about 50%, then the way to increase state revenues is to lower the tax rate. And if the current tax rate is about 20%, then the way to increase revenues is to raise the tax rate. This is, of course, a simplistic and imprecise argument, but I quite like it because very minimal, simple assumptions lead to an interesting result (perhaps somewhat counter-intuitive).
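The shape of this argument can be sketched numerically. The following is a toy model only: the linear response of reported income to the tax rate is an assumption for illustration (it places the peak at exactly 50%, whereas Bibi's "world experience" puts it near 30%), not an empirical claim.

```python
# Toy Laffer-curve sketch. Assumption (hypothetical): reported income falls
# linearly to zero as the tax rate approaches 100%, so revenue vanishes at
# both endpoints and must peak somewhere in between.

def revenue(rate: float, base_income: float = 100.0) -> float:
    """State revenue = rate * reported income under the toy linear model."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    income = base_income * (1.0 - rate)  # assumed behavioral response
    return rate * income

def best_rate(step: float = 0.001) -> float:
    """Grid-search for the revenue-maximizing tax rate."""
    rates = [i * step for i in range(int(1 / step) + 1)]
    return max(rates, key=revenue)
```

Under this assumed model, revenue at 0% and 100% is zero and the search finds the peak at 50%; with a different response curve the peak moves, which is exactly the empirical question the argument leaves open.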

Back to our topic.

Estimating the Lifespan of Phenomena

At the beginning of chapter four Holt presents an argument first formulated by Princeton astrophysicist J. Richard Gott in a 1993 Nature paper. He assumes the Copernican principle: we are probably not special. From this it follows that if we are acquainted with some phenomenon, then we are likely neither among the first to experience it nor among the last. That is, this moment is neither among the earliest moments of its existence nor among its latest. Note the conclusion that emerges from this simple assumption.

Suppose there’s a Broadway show that has already been performed n times. I, watching it now, am probably not among the first 2.5% of its audience, nor among the last 2.5%; this is true for 95% of observers. I can therefore say, with 95% confidence, that I am somewhere within the middle 95% of the show’s run. Hence one may state, with 95% confidence, that the show will continue for at least another n/39 performances (otherwise I’d be among the last 2.5%), and for no more than another 39n performances (otherwise I’d be among the first 2.5%).
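The arithmetic can be made explicit. Here is a minimal sketch of Gott's interval, showing how the 1/39 and 39 factors fall out of the two 2.5% tails:

```python
def gott_bounds(age: float, confidence: float = 0.95) -> tuple[float, float]:
    """Gott's interval: with the given confidence, the remaining lifetime
    of a phenomenon that has already lasted `age` units lies in [lo, hi]."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    tail = (1.0 - confidence) / 2.0      # each 'special' tail: 2.5% at 95%
    lo = age * tail / (1.0 - tail)       # age / 39 at 95% confidence
    hi = age * (1.0 - tail) / tail       # 39 * age at 95% confidence
    return lo, hi

# A show with 100 performances so far: between ~2.6 and 3,900 more.
low, high = gott_bounds(100)
```

The same call with `gott_bounds(10_000)` reproduces the humanity estimate that follows: roughly 256 to 390,000 more years.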

For example, if the show has had 100 performances so far, it will continue at least another two or three, and no more than about 3,900.

Likewise for humanity. (For simplicity, I’ll speak of “humanity” from here on.) Suppose that in its present form it has existed for ~10,000 years. Then, with 95% confidence, it will continue to exist at least another 250 years and no more than 400,000 years. He estimates, in this way, the lifespan of the Internet, the presence of numbers in our lives, and more. Of course, the older a phenomenon is, the longer its expected remaining lifespan. He even provides several examples that corroborate this result.

This really is a powerful argument. With minimal, reasonable assumptions and a single simple consideration, we arrive at dramatic and surprising results with implications across many fields.

Objections

A first problem with this formula is that the choice of 95% as the “normal” yardstick and 5% as the “special” one is arbitrary. We could just as well have chosen 99% versus 1%, and the bounds would then be divided and multiplied accordingly. But that is not really an objection, since the same question can be addressed at different confidence levels. One may say, with 95% confidence, that our lifespan will be between X and Y, and with higher confidence only that it lies in some wider range. Note that the confidence concerns the entire interval, not just the upper or lower bound; otherwise we would end up saying, with 99% confidence, that we will live between 250 and 400,000 years, and with 100% confidence that we will live a million years, which is absurd. Naturally, assuming we are in the “special” 1% carries little confidence. Conversely, the assertion that we are not in the special 1% is safer than the assertion that we are not in the special 5%, and precisely for that reason it yields a weaker result: the higher the confidence, the wider the interval.
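The trade-off between confidence and interval width can be written out directly. A small sketch of how the multiplier grows as the claim gets "safer":

```python
def gott_multiplier(confidence: float) -> float:
    """Upper-bound factor in Gott's argument: with the given confidence,
    the remaining lifetime is at most this many times the current age
    (and at least the current age divided by this factor)."""
    tail = (1.0 - confidence) / 2.0  # probability mass of each 'special' tail
    return (1.0 - tail) / tail

# 50% confidence -> factor 3; 95% -> 39; 99% -> 199.
```

The safer the claim, the wider and hence weaker the interval, which is exactly the trade-off described above.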

A second issue: as time goes on, the estimates themselves lengthen. If we wait another year and humanity is still on stage, then—surprise—the lifespan estimates will lengthen too. Yet this isn’t really a problem. It’s a dynamic estimate that constantly changes; as we have more information, our estimates can be updated. If we tossed a die without knowing the result, the chance to get a five is one-sixth; if we later learn the outcome is odd, the chance becomes one-third. The longer life goes on, the more information we have, and thus it’s reasonable to update our statistical estimates.
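The die example is ordinary conditioning, and can be checked by brute enumeration. A minimal sketch; the helper `p_given` is ad hoc, not a library function:

```python
from fractions import Fraction

def p_given(event, evidence, outcomes):
    """Conditional probability by enumeration over equally likely outcomes."""
    consistent = [o for o in outcomes if evidence(o)]
    favorable = [o for o in consistent if event(o)]
    return Fraction(len(favorable), len(consistent))

die = range(1, 7)
prior = Fraction(1, 6)                                            # P(five)
posterior = p_given(lambda o: o == 5, lambda o: o % 2 == 1, die)  # P(five | odd)
```

Learning that the outcome is odd shrinks the sample space from six outcomes to three, raising the probability of a five from 1/6 to 1/3, just as the text says.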

The real problem here is that, by the argument’s own lights, I already know today with high probability that humanity will survive another thousand years; that is, I already possess information about where we will be then. Why not use it now and update my estimate? You can see that iterating this update diverges toward infinity. If we look at the other side of the time axis and ask what estimates our ancestors would have made (had they performed them) back when humanity had existed for only a thousand years, we get rather poor results. At the thousand-year mark the estimate would have been between about 25 and 40,000 years; fine, that is still reasonable. But what about 9,900 years ago? Back then humanity was only a hundred years old, so the estimate should have been between about two and a half years and four thousand years. Yet we already know today that this estimate is wrong. If so, why should we trust our current estimates, when we are producing estimates today that our descendants will know are wrong and will toss in the trash?

This isn’t a technical quibble but a problem inherent to phenomena like humanity. It’s obviously hard to bound such estimates because of ambiguity in definitions: at which point in the evolutionary process do I define the creature as “human”? If we define the creature that existed a million years ago as human, then the estimated remaining lifespan of the human species increases dramatically. At the same 95% confidence, we would now expect between 25,000 and 40,000,000 more years. Does this contradict the earlier estimate? Not necessarily. It yields an optimistic estimate, while the previous one is more pessimistic. But notice: if we take the minimal estimate from the optimistic one—say, 250,000 years—that implies we are very special hominids, which breaks the Copernican principle as assumed in the second estimate. Thus there is, in fact, a contradiction between the two estimates.

A similar problem arises regarding the end point. It’s not easy to define when humanity is considered extinct. After significant evolutionary changes, would we still be “human”? Are we ourselves truly the continuation of the caveman, or is “scientific man” already a new creature? If you like, there are names for this: Generation X, Y, Z, and so on.

At root lies this question: every entity is special in some respects and not in others. The question is whether the respect in which you apply the Copernican principle is indeed not a special one. Perhaps we mistakenly applied it along a special axis, rendering the estimate not worth much. Hominids may not be very special, but humans could be more special. If you dig through all the relevant data about any person, you will always find something special: for example, the gematria (numerical value) of his name equals exactly his age; or he lives exactly 10 km from his mother. The probability of each such thing is slim, but clearly some of them happen to everyone. Note that being exactly in the middle is also very special. I could propose, with the same confidence, that I am not at the midpoint of the human period, nor at the one-third or two-thirds mark: each such placement is very special, hence it is unlikely that I live at it. But if so, by definition I always live in some very special region, and thus the Copernican principle kills itself.

All of this hides an assumption that the process by which such phenomena go extinct is random and uniformly distributed over their lifetime. That’s a very strong assumption, and I doubt it’s generally correct. Humanity today might be easier to wipe out than five hundred years ago, because we possess weapons that could end the entire story in seconds. On the other hand, humanity is bigger and more distributed, hence harder to wipe out. Suppose we’re on the brink of nuclear war, with a 50% chance it breaks out. Would it be right to estimate our expected survival under such circumstances using Gott’s formula? Not really. What’s the chance that we are on the brink of such a war? What’s the chance it will break out? Is the distribution of such events uniform? We have no way to know. In the absence of other information one might assume a uniform distribution, but I wouldn’t base anything on that.

Judgment Day Draws Near

On p. 340 Holt describes what is called the “Doomsday Argument.” I don’t know why astrophysicists formulate such arguments, but facts are facts: this one too was raised by an astrophysicist—Brandon Carter of Australia—at a meeting of the Royal Society in London in 1983.

Shall we begin the thought experiment? Spoiler: the astrophysicist concludes that humanity is likely on the verge of extinction.

The argument goes roughly like this. Let’s assume an optimistic future for humanity: it will survive many more generations. The Earth’s population will stabilize at a reasonable number of about fifteen billion; then, as we grow, we will settle other stars in our galaxy, and manage to expand the food supply to match the needs of all humanity. Let’s say that every decade humanity grows by a billion people, until the sun dies out (a reasonable estimate). Suppose, in total, humanity across all generations will number only about fifty billion people before the curtain falls.

If that’s the case, then, according to the Copernican principle, we are exceedingly special: roughly 0.00001 of all humans—wow, admit it, that’s very special. In contrast, if humanity will go extinct soon, then it’s very plausible that our current generation is precisely the largest generation. It’s very plausible we’re living in the last generation, because that’s the likeliest moment.

Holt, who was very impressed by the previous argument, for some reason notes that this one contradicts it head-on. From the fact that we are not special (the Copernican principle), the earlier argument led to the conclusion that our existence will continue for about 40 more times our current age, not end in a few generations. Here, the same mode of reasoning leads to a completely different conclusion, and in many ways the exact opposite. It seems there’s something rotten in the state of Denmark… How can that be?

Numbers of People and the Time Axis: The Wonders of Exponential Processes

In the past I mentioned the uniqueness of exponential processes (in particular regarding COVID spread). Here’s a nice illustration. Think of a regular sheet of paper. You fold it in half, then again in half, and so on, forty times. What will the thickness be? For simplicity, assume the sheet’s original thickness is 1 mm. Each fold doubles the thickness: after one fold, 2 mm; after the next, 4 mm; then 8, 16, and so on. After 40 folds, the total thickness is 2^40 mm. Converting to kilometers (divide by a million), that’s about 2^20 km; and since 2^10 is roughly a thousand, 2^20 km is roughly a million kilometers, almost three times the distance between Earth and the Moon (!!). All from forty folds of a 1 mm sheet.
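The folding arithmetic is easy to verify (assuming, as in the text, an idealized 1 mm sheet that doubles with every fold):

```python
EARTH_MOON_KM = 384_400  # mean Earth-Moon distance, for comparison

def folded_thickness_km(folds: int, start_mm: float = 1.0) -> float:
    """Thickness in kilometers after repeatedly folding (i.e., doubling)."""
    mm = start_mm * 2 ** folds
    return mm / 1_000_000  # 10**6 mm per km

# After 40 folds: 2**40 mm is about 1.1 million km, roughly 2.9 times
# the Earth-Moon distance.
ratio = folded_thickness_km(40) / EARTH_MOON_KM
```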

What’s the point? Humanity grows in an exponential process, doubling itself every few generations (of course there are disasters and extinctions, and growth rates vary by place and time; I’m just illustrating a theoretical point). In such a process, at each generational stage there are as many individuals as the total number who existed up to that point. Today there are about ten billion people on the globe, which is on the order of the total number of people who have lived until now (I’ve read estimates of about forty billion). If so, the last generation won’t be so special, since the number of people in it is comparable to all people who existed throughout history. This allows us to base the estimate on numbers of people rather than the time axis. Of course one can translate it back to the time axis, and thereby resolve the apparent contradiction between the two calculations above.
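The claim that the latest generation rivals all earlier ones combined holds exactly in a pure doubling process (a stylized model, ignoring real demographic history): the sum 1 + 2 + ... + 2^(n-1) is 2^n - 1, just one less than the next term.

```python
def generation_sizes(n: int, first: int = 1) -> list[int]:
    """Sizes of n generations in a stylized model where each generation
    is double the previous one."""
    return [first * 2 ** i for i in range(n)]

gens = generation_sizes(10)
# The last generation (512) is one more than the sum of all earlier ones (511):
last, earlier = gens[-1], sum(gens[:-1])
```

So under pure doubling, anyone in the current generation is, by headcount, entirely ordinary: about half of all people who have ever lived are alive with them.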

(There’s a well-known story about the inventor of chess. The Persian shah offered him any reward he asked for in gratitude for inventing the game, and he asked for a chessboard—8×8 squares—with one grain of wheat on the first square, two on the second, four on the third, and so on across the 64 squares. There wasn’t enough grain in the entire kingdom to pay his request. For illustrated explanations of this story and exponential processes, see: link.)
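The chessboard total is the same geometric sum carried to 64 squares:

```python
# One grain on the first square, doubling on each of the 64 squares:
total_grains = sum(2 ** i for i in range(64))   # = 2**64 - 1
# About 1.8 * 10**19 grains, far beyond any kingdom's granaries.
```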

Back to Simplistic Considerations

Here’s another example of the claim above that it’s hard to speak of the Copernican principle because every person has specialness along some axes and not others. In our case, I can be very special on the time axis (e.g., living in the last generation), yet not special regarding the headcount axis (since in my generation lives about a quarter of all humanity).

Let me stress: I myself don’t accept Holt’s approach. But all this is said just to ask whether the claim that Gott’s approach is correct is itself correct. I think the explanations I presented above apply here as well. For example, the assumption that I am “not special” implies that my soul is drawn at random from some pool and tossed into the world at some stage, with the lottery being uniform (each stage equally likely). I see no real basis for that assumption. In such a context one could perhaps speak of plausibility (in my opinion, even that is dubious, since we have no information about the process), certainly not of probability. And besides: if I myself had been “drawn” in the 12th century BCE, would I still have been me? In what sense? It would be a different person altogether—born at a different time and environment. By that definition, the chance that I would be born exactly now is 1. Asking “what would Maimonides have said if he lived today?” is like asking whether he would still be Maimonides—it’s undefined. He would not have been Maimonides, but someone else.

Incidentally, Holt notes (p. 341) that Brandon Carter coined the term “anthropic principle” about a decade earlier (in the 1970s). To my surprise, the doomsday argument bears many similarities to those ideas. Despite the charm of their simplicity (I discussed this in my first book, God Plays Dice, and also in the third conversation in my book The First Being), I still recommend a healthy dose of skepticism toward elegant, simple arguments. Sometimes a person does hit upon a simple and correct insight; evolutionary theory is like that, and as the philosopher Malcolm put it, it is an “eye-opening tautology.” But as a rule, far-reaching conclusions (alas) require heavier argumentative work; “according to the pain is the reward.” Grand, sweeping conclusions generally demand more thought than minimalist arguments supply. So before you follow Holt and run with an argument like this, it is definitely worth giving it another check.

Homework for readers: Try to raise objections to Bibi’s argument presented at the beginning of the column.


Discussion

Shlomi (2022-05-08)

In the context of Bibi's argument, the argument assumes there is a single maximum, whereas it is certainly possible (and even likely) that there are several maxima, and therefore also at least one minimum. Practically speaking, the argument is not very useful; what it says is that there is some optimal tax rate (from the standpoint of state revenues), which is a fairly trivial claim. The important question is what that optimal tax rate is, which of course can vary from one economy to another and with the macroeconomic situation.
In short, the less information the model contains (correct assumptions about reality), the less useful it is.

Michi (2022-05-08)

That is the weakest criticism. It's not even entirely correct, because most likely there is only one maximum, and in any case it at least proves that an increase in taxes does not necessarily increase revenues. That is the main claim.
I also really don't agree that less information is less helpful. Here too there is a more complex process that has its own optimum.

Tirgitz (2022-05-08)

I haven't yet looked into it, but one remark caught my eye. You wrote that in your view, when we have no information at all about the distribution process, one cannot even speak of probability. Apropos what you mentioned at the end, about parallels to discussions of God and creationism: regarding the proof from the uniqueness of the system of laws, I thought you did argue that one can claim uniqueness without any information at all about the distribution process. What's the difference?

The Last Posek (2022-05-08)

If the assumption is that we are not special, then it makes no difference at all whether what is happening to us is happening for the first time or the last time, with a probability of 50% or a probability of 1 in a trillion, according to statistical rules or contrary to them. None of these changes anything. After all, we are not special.

Therefore this whole discussion is unnecessary.

Michi (2022-05-08)

When the process is entirely unknown to us, but there is some process there, there is no point in assuming a uniform distribution. As I noted, that is at most a default, and not one I would build much on. But in the physico-theological argument there is an assumption that the formation of the world is pure chance out of absolute nothingness (otherwise the question would remain of what created what existed before). In such a situation, the assumption of a uniform distribution is the most reasonable and sensible one. A non-uniform distribution requires a reason. In the lottery of souls, if it is conducted by the Holy One, blessed be He, or by some other mechanism, there is a reason, and one would need to know that reason in order to say anything about it.

Tirgitz (2022-05-08)

It's complicated for me, but I'll try to feel my way a bit further. I find it hard to see the distinction between a uniform distribution and a non-uniform one, but I'll grant it for now (because it's an idea that needs reflection) and ask differently – seemingly, a uniform distribution (which fits symmetry considerations) is actually much more special than some non-uniform distribution.
In addition, and I hope I'm not mistaken and muddling things up, seemingly regarding majority in matters of prohibition, where there too there are mechanisms pushing toward stringency, you also said that one basically assumes some kind of uniform distribution, and therefore the halfway line has significance.

Michi (2022-05-08)

Exactly. That is why, in the absence of other information, one assumes a uniform distribution. It is the simplest and most symmetric.
As for halakhah in matters of prohibition, each case is judged on its own merits. But there one does not follow only the statistical consideration, but also legal-halakhic rules (for example, there is an aspiration to simplicity. There are meta-legal principles that have an influence, etc.).

Tirgitz (2022-05-08)

If it is the simplest and most symmetric, then it is the most special of all, and yet? Sustain me, etc.

Michi (2022-05-08)

We are not randomizing distributions. The distribution governs the randomization. The uniform distribution is the simplest, and therefore we assume it. Just as fitting points along a straight line is preferable to fitting them along a sine wave, even though you could say that the straight line is the simplest and therefore the most special.

Tirgitz (2022-05-08)

Seemingly, from your straight-line example, on the contrary: since one sees that there is a simple and special line that approximately fits what is there, that is precisely why it is reasonable that it is not by chance. But we would not be able to assume from the outset that a certain phenomenon will fall on a straight line without any grounding. I understand that you are saying that considerations of simplicity are entirely a priori, but how does the line show that?
(I reflected before the previous comment about randomizing distributions, but couldn't get anywhere, and I am still wondering.)

Michi (2022-05-09)

I don't really understand what the discussion is about. Do you dispute that, in the absence of other information, it is reasonable to assume a uniform distribution? Why make a distinction between outcomes? If one knows of no differences among outcomes in the sample space, the most reasonable assumption is that they all have the same weight. I don't know what more there is to add.

Tirgitz (2022-05-09)

But you are the one who holds that even in the absence of information it is not reasonable to assume a uniform distribution regarding souls. And you explained that this is because there is an unknown process, and only in an emergence from absolute nothingness would systems of laws have been expected to emerge with a uniform distribution, and therefore from the uniqueness of the system there is a proof of creation.
I still do not have a settled opinion, and perhaps there is a difference between before the events occur (when, if one calculates an expectation, one probably has to assume a uniform distribution) and after the fact (when it is very hard to naively assume that it was supposed to happen according to a uniform distribution). In any case, I asked within your framework, and if it has been exhausted, it has been exhausted.

Michi (2022-05-09)

Exactly. And I explained the distinction. In a random process the distribution is uniform. In a process of choice there is no reason to assume specifically that. I also added that perhaps that is what I would assume in the absence of information, but I would not build anything on it.
It seems to me we have exhausted the matter.

Tirgitz (2022-05-09)

Could you just clarify for me whether I understood correctly that in an emergence from nothingness (if we assume that is possible, for the sake of the physico-theological proof independently of the cosmological one) you are making the positive claim that there would be a uniform distribution there (and this is a critical claim for the proof), and not merely a conjecture based on lack of knowledge.

Michi (2022-05-09)

Yes. If it is from nothingness, then it should be treated as a uniform distribution.

Tirgitz (2022-07-27)

Let the interested reader consult https://mikyab.net/posts/63572#_ftn14.
