A Look at Occam’s Razor (Column 426)

Originally published: November 7, 2021

This is an English translation (originally created with ChatGPT 5 Thinking). Read the original Hebrew version.

In this column I want to touch on the principle of Occam’s razor. It’s a principle I use (alongside many others, of course) in various contexts, and it has a few confusing aspects that are worth discussing.

History

This principle is ancient, with roots in Greek philosophy and later periods, but it is attributed to William of Ockham, a 14th-century English Franciscan friar who first formulated and articulated it. William of Ockham’s principle concerns the number of entities we posit, and it states that entities should not be multiplied beyond necessity. In other words, a theory that posits fewer entities is the better theory. Put differently: whoever proposes a theory with more entities bears the burden of proof.

Over the generations, notions such as “simplicity,” “elegance,” “parsimony,” and “aesthetics” have been attached to this idea. That is, we prefer an explanation or theory that is simpler, more economical, or more elegant/aesthetic. These formulations, of course, go beyond counting how many entities a theory contains and essentially expand the original principle. Still, the idea is similar: a small number of entities is one example of a criterion that expresses a theory’s simplicity and elegance.

In earlier eras, justifications for this principle were grounded in the nature of God—that is, assumptions that God prefers simplicity and thus created His world in that way. Even without resorting to theology, scientists tend to think the world is structured simply; this is a metaphysical justification. It isn’t clear where this belief comes from if not from theology, though some hold it is an inference from our accumulated experience.

Those earlier conceptions assume that the razor guides us to truth: we choose the simpler theory because it is more likely to be true (not necessarily certainly true, but with a higher chance of being true). This is an ontic-metaphysical view of the principle. Today, however, I think most scientists and philosophers tend to see it as merely a methodological principle (see the Wikipedia entry on Occam’s razor and, for example, here), i.e., it is not a claim about the world but a guideline for scientists and philosophers. According to this interpretation, when two theories explain the relevant body of facts, there is no reason to choose the more complex theory. On this view, we choose the simpler one not because it is more likely true, but because there is no reason to use a more complex theory when a simpler option exists, at least until the simpler theory is refuted. If that does happen, experience will force us to adopt a more complex theory.

I doubt you’ll be surprised if I add that this principle may stem from our intuition and that its status is akin to that of the principles of causality or induction, which are also part of the foundations of scientific (and general) thought. These, too, are principles without empirical (certainly not direct) grounding, and yet we assume them to be true—often without even noticing. Even on this view, Occam’s razor is not merely methodological but an ontic claim, i.e., a claim about the nature of the world. I’ll remind you that in my view intuition is a cognitive tool; the upshot of my suggestion is that we grasp these principles through (non-sensory) observation of the world itself.

The significance of these debates

At first glance these are merely theoretical disputes. We all agree the principle should be used; the question is only how to justify it. That’s a philosophical issue that, on the face of it, doesn’t matter to practitioners who use Occam’s razor. As I will show, however, that’s not the case. These debates have implications for the principle itself and for how it is used.

As noted, it is now common to think of the razor as merely methodological. Many argue that quantum theory or relativity are certainly not simple theories, so it’s hard to claim that the laws of nature are simple or that God created a simple world. For example, Newtonian mechanics and gravity are far simpler than quantum mechanics and relativity, and yet we now know the latter are the truer theories. This seems like a decisive argument against the validity of the razor, suggesting it is at most methodological rather than ontic-metaphysical.

This argument targets a very specific understanding of the principle. If the justification is theological, then indeed we’d expect the laws of nature not to be so complicated. The theological thesis is challenged by this argument. But the claim that the principle points to truth need not be based on theology. For example, the intuitive grounding I offered does not buckle under this argument. On my proposal, the razor does not say that the laws of nature or the world’s operation are simple, but that among two possible explanations of the body of facts, the simple explanation has a higher chance of being correct. Over the years since Newton formulated mechanics, it became clear that it cannot explain all the observed facts, and so it isn’t a candidate for a correct theory. Among those that do explain all known facts, we still choose the simplest. Quantum and relativistic theories, for all their complexity, are the simplest theories that fit the totality of facts known to us. As Sherlock Holmes said (The Sign of the Four): once you have eliminated the impossible, whatever remains, however improbable, must be the truth.

Thus, different justifications for the principle yield different formulations of it. This is not merely a theoretical dispute about justification; these disputes have consequences for the principle’s content and its use.

It’s important to add that the picture is now more nuanced. On the one hand, the justification I propose yields an ontic-metaphysical conception of the principle. On the other hand, it is not a direct claim about the world (I’m not asserting that it is simple and elegant). My formulation looks, on the face of it, very similar to the methodological conception, since it doesn’t say anything about the simplicity of the true theories but only guides us in choosing among candidates. Haven’t I circled back, through the back door, to the methodological view?

Not quite. Suppose we have two possible theories, A and B, both explaining the body of facts, with A being the simpler. According to the methodological view, choosing A says nothing about its truth; the chance that A is true equals the chance that B is true. If the chances are equal, there is no reason to choose the complex theory, hence we prefer the simple. By contrast, on the ontic-metaphysical view, the choice of A rests on the assumption that it is more likely true. When facing such competing theories, scientists usually try to devise an experiment to decide between them. They look for an experiment that could deliver one of two possible outcomes: outcome a would fit A, and outcome b would fit B. What, then, is the a priori probability of obtaining outcome a? Advocates of the methodological view should answer that, a priori, the two outcomes are equally likely. We have no information to favor a over b, since A’s simplicity doesn’t mean it is more likely true. By contrast, on my ontic view, the probability of getting a is indeed higher. As noted, my directive to choose A for its simplicity is based on the assessment that A is more likely true than B. Of course this doesn’t mean outcome b cannot occur. That would have been the expectation under the theological justification (which claims the world itself is simple). On my view, the b outcome is possible, but less likely than a.
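The contrast between the two views can be written in probabilistic shorthand. This is only a sketch under simplifying assumptions that are mine, not part of the argument above: A and B are the only candidate theories, A predicts outcome a with certainty, and B predicts outcome b with certainty.

```latex
% Illustrative assumptions: A and B are exclusive and exhaustive,
% P(a | A) = 1 and P(a | B) = 0.
\[
P(a) \;=\; P(a \mid A)\,P(A) + P(a \mid B)\,P(B) \;=\; P(A),
\qquad
P(b) \;=\; P(B) \;=\; 1 - P(A).
\]
% Methodological view: simplicity carries no evidential weight.
\[
P(A) = P(B) = \tfrac{1}{2}
\quad\Longrightarrow\quad
P(a) = P(b) = \tfrac{1}{2}.
\]
% Ontic view: the simpler theory A is more likely to be true.
\[
P(A) > P(B)
\quad\Longrightarrow\quad
P(a) > \tfrac{1}{2} > P(b).
\]
```

On the ontic view nothing forces P(b) to be zero; outcome b stays possible and is merely the less likely of the two.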

So on my account, too, the razor is a claim about the world, not just a methodological guideline—even though I am not claiming the laws of nature are necessarily simple. I claim that simplicity and elegance are indicators of factual truth. They do not guarantee it, but a simple theory has a higher probability of being true.

Proof

Let’s compare the two interpretations. To sharpen the discussion, I’ll focus on a concrete scientific topic, say Newton’s second law of mechanics. It states that there is a linear relation between the force on a body (F) and the acceleration it develops (a), with the proportionality constant being its mass (m): F = m·a.

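How can we arrive at this law empirically? Very simply: perform an experiment. Apply different forces to a body of given mass m and measure the acceleration in each case. Suppose we conducted such an experiment and obtained the results shown in the following graph:

[Graph not reproduced here: five measured points shown as empty circles, with a solid straight line and a dashed curve both passing through them.]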

The five empty circles are the results of the experiment. As the text in the graph notes, there are several ways to connect these points into a continuous curve. Two are shown (the solid straight line and the dashed line). Clearly the straight line appears simpler, and Occam’s razor tells us to choose it.

Two ways to interpret that choice now arise:

The methodological option — We choose the solid line even though it is no truer than the dashed one, simply because it is the simplest.
The ontic option — We choose the solid line because the simplest is probably also the truer (it has the higher likelihood).

Before continuing, let me note that no scientist on earth would seriously claim that the true curve is not the straight line. That’s the simplest option, and it’s obvious to any reasonable person that it is probably the correct one. No scientist would say it’s necessary, but all would agree it is the most plausible. Concretely, any reasonable scientist will tell you that the accelerations in two further measurements, cases 6 and 7, will be those predicted by the solid straight line. In other words, scientists implicitly assume the ontic interpretation of Occam’s razor, not the methodological one.

If I ask such a scientist why they prefer the solid line, they’ll of course say it’s simpler. If I then ask why they choose the simpler line—are the laws of nature always simple?—they usually won’t have an answer. Pressed to the wall, they’ll fall back on the methodological option: the solid line isn’t any truer, but why use a complex curve when a simple one suffices? Yet if I then ask what result they’d bet on if we perform experiments at points 6 and 7, an honest answer must be that they expect the result predicted by the solid line. Their stance is thus ontic, not methodological.

We could stop here, but I want to push it one step further. Suppose I face a scientist who insists they have no expectation whatsoever for experiments 6 and 7: any outcome is equally likely, since the straight line is no more correct than the dashed; its chance of being right is the same as the dashed line’s, because simplicity is no criterion of truth. With such obstinacy, experiments won’t help: even if we get the straight-line outcomes, they’ll claim it’s a lucky coincidence. It could have been any other result as well.

In appendix B of my book God Plays Dice and in my article here, I offered an argument proving that such a stubborn scientist is mistaken. My claim is this: there are, in fact, infinitely many possible curves that “sew through” all the points in the graph. If, as the methodological stance holds, none of them is more likely than any other, then the probability of obtaining precisely the straight-line outcomes for cases 6 and 7 is exactly 0. If, when we run the experiment, we do get those outcomes, that is indeed confirmation of the ontic thesis. Moreover, suppose we collected all the cases in the history of science where results lined up on a straight line, and in each instance where a further experiment was conducted we asked whether it, too, fell on the same straight line. Adherents of the methodological stance would predict that the number of such cases is negligible; in their view, it almost never happened (recall: the probability is exactly 0). Proponents of the ontic stance would say the probability is not negligible (they cannot say exactly how large; their claim is merely that the simple generalization is not a shot in the dark).
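The claim that infinitely many curves pass through the same points can be made concrete with a small sketch. The five data points are invented for illustration (a body of mass 2 in arbitrary units) and the function names are hypothetical. The construction adds to the straight line a term that vanishes at every measured force, so each value of the free constant c gives a different curve through the same five points.

```python
# Illustrative data only: five (force, acceleration) pairs on a = F / m with m = 2.
MASS = 2.0
FORCES = [1.0, 2.0, 3.0, 4.0, 5.0]
ACCELERATIONS = [f / MASS for f in FORCES]


def straight_line(force):
    """The simple hypothesis: a = F / m."""
    return force / MASS


def wiggly_curve(force, c):
    """A rival hypothesis: the line plus c * (F - F1)(F - F2)...(F - F5).

    The product is zero at every measured force, so the curve agrees with
    all five data points for any value of c.
    """
    bump = 1.0
    for measured in FORCES:
        bump *= force - measured
    return straight_line(force) + c * bump


def fits_all_points(curve):
    """True if the curve reproduces every measured acceleration."""
    return all(abs(curve(f) - a) < 1e-9 for f, a in zip(FORCES, ACCELERATIONS))


if __name__ == "__main__":
    # Every c fits the data, but the predictions for cases 6 and 7 differ.
    for c in (0.0, 0.01, -0.05, 1.0):
        curve = lambda f, c=c: wiggly_curve(f, c)
        print(c, fits_all_points(curve), curve(6.0), curve(7.0))
```

Only c = 0 reproduces the straight-line predictions at forces 6 and 7. If every value of c were equally credible, landing on c = 0 would be a single point in a continuum, which is the sense in which its probability is zero.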

Let’s broaden the question further. Ask how many times in the history of science a tested scientific generalization yielded a prediction that was confirmed experimentally. This is essentially the same question, only now about scientific generalizations in general (each being the simplest theory under the circumstances), not just straight lines. Here, too, the methodological camp would have to say the number of such cases was negligible. The ontic camp, by contrast, will say there were quite a few.

At this point it should be clear that the ontic position is correct; the history of science corroborates it. If the methodological stance were right, no scientific generalization would ever be confirmed by any experiment, and we would have no general scientific laws. In other words, our scientific knowledge today would be akin to that of primordial humankind. If there has been scientific progress, it means generalizations work—not always, of course, but in a non-trivial number of cases. That suffices to refute the methodological stance. The number of confirmed generalizations is a measure of the quality of the intuitive cognition underlying Occam’s razor (the quality of our non-sensory “sight”).

In other words, the dispute between methodological and ontic camps is not merely philosophical-theoretical. It has practical significance and can therefore be decided empirically. The course of scientific history factually confirms the ontic thesis and refutes the methodological one.

Objections to the razor

Over history, and especially in recent years, several objections to Occam’s razor have been raised (see the Wikipedia entry mentioned above). With the preliminaries in place, we are ready to examine them one by one.

Distorting scientific considerations

The first objection is that the razor introduces extraneous considerations into science. Science should seek truth, while the razor injects simplicity, parsimony, and elegance—alien considerations. This objection effectively targets the ontic reading but could accept the razor as methodological (why adopt a complex theory when a simpler one does the job?).

Even the ontic reading, however, isn’t genuinely touched by this claim. First, there are philosophies of science that don’t see science as a truth-seeking discipline but rather as one that seeks simple and elegant descriptions of the facts. On that view (which I have, following Ze’ev Bechler, called “actualism”), Occam’s razor is the very essence of science and perfectly at home in it. But I am, of course, not an actualist (the graph argument above was originally offered against actualism). Second, on the ontic reading, the razor is a criterion of truth, not merely a methodological rule; it is therefore only fitting to use it in the pursuit of scientific truth. One might argue that it is not an empirical tool but a metaphysical-philosophical one and thus has no place in scientific methodology. But by the same token one could say that about countless other scientific assumptions that don’t arise from observation—causality, induction, the assumption that nature is time-invariant, and many more. So even when we frame the dilemma—between the two interpretations of the razor—this objection falls in the gap between the horns; it fails to land a blow on either.

The main problem with this objection is that it misses the point of Occam’s razor and therefore fails against either interpretation. As explained, the razor is invoked to decide between two alternatives that both explain all the known facts. Once that’s the situation, we have no purely intra-scientific criterion with which to decide between them. If an empirical decision were available, we would not need the razor at all.

Above I offered an empirical demonstration that the razor works: using it yields better results than a random shot in the dark. That shows that there is no distortion of scientific considerations here, but rather an additional meta-scientific consideration—one among many—that helps us reach scientific truth.

Time dependence

This objection says that using the razor leads to time-dependence in scientific theory. At any given time, what counts as simple can shift in light of the information accumulated up to then, and so the theory deemed simple may also change.

Again, this reflects a misunderstanding of science as well as of the razor. First, scientific theories do change over time. When new facts are discovered, a theory can change or at least be updated. How is that different from any other scientific matter? Clearly, as our knowledge advances, our theory may change—whether we take an actualist stance (theories are claims about us rather than the world) or an information-realist stance (they are claims about the world).

One might refine the objection and argue that our concept of simplicity can shift not because of newly accumulated facts but due to cultural and philosophical currents, and so the theory changes regardless of facts and observation. But that returns us to the previous objection, where I noted that every scientific theory rests on meta-empirical assumptions; the razor is no different. This brings us to the next objection.

The vagueness of “simplicity” and “elegance”

A major objection to the razor is that notions like simplicity and elegance are relative and vague. They can vary between people, eras, or cultures. What’s simple and elegant to me may be crude and very far from simple to you.

That is indeed correct, but it needs qualification. First, there are cases in which simplicity is well-defined. For example, in the illustration above, the straight line is simpler than any other curve because it can be described with fewer parameters (a straight line uses only two parameters). This echoes the original formulation that spoke of the number of entities; there, too, the criterion seems well-defined.

Many argue that counting entities is illusory. Is a train a single thing or many? It has seats, cars, a locomotive, restrooms, passengers, an engine, and so on. What about a car? Or a stone? A stone is made of atoms, each of which is itself complex in terms of elementary particles. Not to mention a person—or even a plant or an animal—organisms of staggering complexity that are hard to regard as simple. Is saying that one person caused phenomenon X simpler than saying it was caused by a collection of other things like stones, clouds, and wind? In my view, this argument is mistaken. When we say a person caused something, we mean the person as an organic whole. In that sense, there is a single cause—even if the person is internally complex. That differs from a non-organic combination of other causes which, even if individually far simpler than a human, still make for a more complex and less elegant explanation.

One could still argue that this very distinction is not scientific but cultural and conceptual. Here are several replies: (a) There are mathematically grounded measures for such distinctions. Entropy is a measure of complexity with concrete expressions in the laws of nature (especially thermodynamics), which makes it hard to see it as purely subjective or culture-bound. (b) Even if simplicity were a vague and relative notion, at least as a methodological tool it is very reasonable to use it. We have nothing better for adjudicating between two theories that both explain all the facts. (c) My claim about an intuitive basis says that parsimony and simplicity are not subjective: we draw them from a kind of (not necessarily sensory) observation of the world. I therefore reject the premise that these notions are merely subjective. True, there are disagreements about them; but in any such disagreement, one side is right and the other wrong—and in some cases science can even decide which.

The same goes for theories that use abstract (theoretical) notions such as energy, potential, wavefunction, or force, which might also be hard to count unequivocally. Here, too, I answer that if a notion functions holistically/organically, then for our purposes it is one notion.

For this objection, too, I can only refer back to the demonstration I gave above: the razor works. For our purposes, the meaning of that demonstration is that our notions of simplicity and elegance are apparently not purely subjective, even if we lack crisp metrics and objective validation. The success of science is the best validation there is: it shows that our notions of simplicity are broadly correct (though not with certainty, of course) rather than merely subjective.

Conservatism

The objection from conservatism can be formulated in two ways: (a) the razor is false because it expresses conservatism; (b) the razor is dangerous because it leads to conservatism.

The first claim is that the razor is merely an expression of conservatism. Scientists facing two equivalent theories will choose the one that fits the current style of thought. That’s what they call simplicity and elegance.

But this claim, too, is wrong. First, a scientist always uses the knowledge accumulated so far, and there is nothing wrong with that. On the contrary, that is what it means to accumulate knowledge: that it serves as a basis for further thinking. This claim is merely another manifestation of what we already saw: our notions of simplicity and elegance are not subjective. They are the product of accumulated knowledge and intuitive cognition and, as such, are fully legitimate scientific tools. True, they are not infallible, but current knowledge and theory are our best starting point at any given moment; there is no reason to ignore them in the name of some illusory objectivity.

The second claim does not attack the razor itself but points to a danger in using it: it may justify conservatism and inertia. Over-reliance on the razor could lead scientists to dismiss revolutionary theses out of hand without due consideration. Einstein’s attitude to quantum theory (he refused to accept it to his dying day, despite being one of its fathers) is a good example.

This is an important warning and certainly deserves attention. But it should not prevent us from using the razor—only from overusing it. What counts as overuse? At first glance, if the theories are equivalent (each explains all the facts), the decision between them depends on simplicity and elegance. When is it wrong to use them? The description “a theory that explains all the facts” is naïve and simplistic. In the overwhelming majority of cases, that is not the situation: a theory explains many facts, but there are always open questions requiring further research. Contrary to Karl Popper’s description, we do not discard a scientific theory because a single experiment contradicts it. Sometimes we place such issues in the “needs investigation” box. A theory that explains many facts and strikes us as highly reasonable has strength and weight of its own. Abandoning it requires several very significant empirical falsifications. Many philosophers of science have noted that this is not mere conservatism; such conservatism is crucial to the scientific process. Without it, we would flit between theories whenever some experiment failed.[1]

Example: What is rational thinking?

I’ve described a case several times that nicely illustrates this fallacy (see, e.g., column 267). When I studied at the Gush Etzion yeshiva, there was a student two years my senior (now a well-known figure) who fell ill with severe jaundice and did not recover for several months. A mutual friend told me what he had seen when visiting him in the hospital. A certain “witch” was brought with pigeons; she placed them, one after the other, on his navel. Each in turn died after a few seconds, then another was placed and died, and so on. After a few days he recovered and returned to the yeshiva. This was a known phenomenon at the time; I believe that in later years its mechanism was understood and it became clear why the pigeons do not cure jaundice. But that’s not the point. When I returned home and told my parents this story, they clapped their hands in dismay at the intellectual darkness and irrationality the ossified yeshiva was instilling in me (note: this was Gush Etzion, not Toldot Avraham Yitzhak). I told them that their “rational” approach didn’t strike me as rational at all.

Rationality does not mean denying facts that seem implausible. If we have reliable testimony about them, then they are probably correct (until proven otherwise). A rational person doesn’t stop at accepting the facts but tries to seek an explanation for them—either in terms of existing knowledge or by expanding knowledge into new horizons. Denying facts is not rationality but sheer conservatism. If all the great scientists had behaved in that “rational” way, we would still have the science of primordial man (since every fact that contradicted it would have been summarily dismissed).

I should add that skepticism about such facts is healthy. Facts that contradict established scientific knowledge are rare, and the more comprehensive and well-founded our knowledge, the less we expect to encounter such facts. Therefore, when such facts reach us, it is indeed important to make sure no mistake has occurred and that the facts are reliable. But once we have checked and concluded they are, the rational path is to accept them and think further about their meaning.

Further cautions

We have seen that using the razor is both necessary and justified, but that it comes with risks we should beware of. Beyond conservatism and irrationality, one common risk is simplistic, catch-all explanations. Sometimes people think that if they have offered a general, simplistic explanation, then it is also simpler—and therefore more correct. For example, I have often heard the claim that God is the most economical explanation for the world’s existence and therefore also the most scientifically correct. The theistic explanation posits a single being; seemingly the most elegant explanation possible.

Such an explanation, however, suffers from two problems, both typical of misuses of the razor—and they are connected: first, it is not open to falsification; second, it is so general that it doesn’t say much.

An unfalsifiable explanation will always fit all the facts, by definition. It’s easy to invent and impossible to refute—but that is precisely its weakness. It’s no accident that the razor is typically used in scientific contexts, where we can test our decisions with future experiments. Consider, for example, Israel’s process of redemption: it has ups and downs. One explanation is that this is the way of the world; there are various forces working in different directions—sometimes more success, sometimes less. Another pins everything on divine providence: it raises and lowers, but always toward a predetermined goal. The second explanation is, superficially, simpler—yet its weakness is that it is unfalsifiable. What could possibly happen that would make us abandon it? Nothing. Beyond that, it is too general; anything that happens fits it, so it doesn’t really explain. What explains everything explains nothing.

I have often heard people claim to find in the Zohar, the Torah, the Maharal, or Rav Kook, evolution, quantum theory, relativity, and so on. In the rare cases where I checked the sources, I found a flimsy remark that, at best, bears some resemblance to the scientific theory in question, but doesn’t actually say anything concrete and certainly doesn’t yield predictions or quantitative principles. Even in cases where one could indeed find something of the idea in question (usually not the case), scattering broad generalities is not an explanation. The tell is that everything that happens will fit those “predictions.” I’m sure that if and when relativity or evolution (which is not falsifiable) are refuted, those same quoters will find brilliant exegesis showing that their source predicted that too. I’m sure that in the past, people found in Torah or Talmud the scientific theories of their time; the question of what they would say today, now that those theories have been shown incorrect, remains open. And again: a theory of everything is a theory of nothing.

I’m sure there will be comments about my own uses of the razor, especially in theological contexts that are not falsifiable. I only suggest thinking before asking, since in my assessment most such remarks will be mistaken. For the public good, I will clarify only this: I did not say that the razor should never be used outside science and in unfalsifiable contexts. I said that when there is a competition between two explanations, it is important to factor in, in deciding between them, that one of them is unfalsifiable and overly general.

Now, to the summaries.

The prosecution’s summation: “Wikipedia”

In the Wikipedia entry mentioned above, all the objections presented here are cited uncritically. For some reason there isn’t a single argument there in defense of the razor, nor any word about the problems with those refutations. You won’t be surprised to read the concluding paragraph from that entry:

One should not treat Occam’s razor as a rule or a law but only as a pragmatic recommendation. If we treat it as a rule, we will use it to choose a certain theory because it is simpler. According to Karl Popper, if one day we find an observation that does not match that theory, the theory is immediately refuted and rejected. But, according to the same Popper, that would immediately refute and reject Occam’s razor as well, by whose lights we chose the now-refuted theory in the first place.

A common mistake is to claim that Occam’s razor is supposed to provide a tool for choosing between true and false theories, and that is not so. The razor helps us choose, from among different true theories, the theory that is simpler in terms of explanation, parsimony, and content, and to use it to proceed with the scientific enterprise. But what about the other theories, those that were rejected? They too are (at that time) correct! Why reject correct theories merely because they are more complex and complicated to understand? Many argue that a scientist’s task is to reject false theories, not complicated ones.

Galileo Galilei mocked Occam’s razor by saying that we should discard all the science books and choose only the letters of the alphabet, since with them one can explain anything and they are simpler than the science books.

Still, one should not entirely reject Occam’s razor. It is useful in areas such as the didactic domain—it is easier to explain and teach theories and concepts in a simpler way than in a complicated way. Choosing a simpler theory can also reduce implementation costs compared to a more complicated one.

I’m pleased by the indulgent tone that allows us to make methodological use of this ancient, primitive principle. We discover that, in the learned author’s eyes, it is forgivable—sometimes even tolerable. I hope it now needs no further explanation why this is a collection of nonsense reflecting a deep misunderstanding of Occam’s razor and of science in general.

The defense’s summation: using the razor

We have seen that the use of the razor is justified and well-grounded in scientific practice. True, care is needed in using it, but it gains strong support from the history of science. We have also seen that all the objections rest on misunderstandings. The conclusion is that the razor is indeed useful—and more than that: it is a tool for grasping ontic truth, not merely a methodological device.

Originally, the razor served William of Ockham in philosophical contexts (many use it to argue for the existence of God). But it is generally used in scientific contexts, and not by accident I focused on those. I will add that the razor is the sole foundation of the non-deductive logic we developed, which underlies “soft” (non-deductive) inferences—the inferences of science and law—as opposed to those of mathematics and formal logic. This is not the place to go into it; it can be found in our two articles in BD”D (Part A and Part B), in the first volume of the Talmudic Logic series, and in a more popular and concise form in part six of my book Truth, Not Stability.

Beyond that, we use the razor at every turn in daily life. We constantly draw conclusions from what we experience and encounter, usually choosing one conclusion from among several possibilities. So there, too, we choose the simplest and most elegant conclusion under the circumstances. If the station is empty, we assume the bus has already passed; if it’s dark, we assume evening has fallen; if someone tells us the time, we assume he’s telling the truth; and so on. Skeptics can always raise alternative possibilities, and unseasoned defenders will tend to retreat to methodological language. If you ask someone why he believes the person who told him the time, or why he assumes the bus has passed, he won’t understand what you want. It’s self-evident to him. If you press and ask who told him he’s right—after all, there are alternative explanations—he will have to say that, methodologically speaking, this is the reasonable choice even if it isn’t necessarily correct. But that’s post-hoc rationalization, not a description of what he actually thinks. To him it’s obvious that it’s correct; lacking a rational account, he escapes to methodological justifications. That is exactly what happens to scientists and philosophers with Occam’s razor. Instead of admitting that we have an intuitive capacity to grasp the right generalizations with decent probability—something that sounds a bit mystical and not very scientific—they retreat to methodological justifications.

[1] In a certain sense, this claim is equivalent to what is called (in machine learning and elsewhere) overfitting (see here). Excessive fit to the facts is known to be a defect in a theory (usually signaling experimenter manipulation or a programmer’s misunderstanding in machine learning). See, from a different angle, chapter six of my book The Science of Freedom.

Discussion

Ro (2021-11-07)

Excellent post! I enjoyed it 🙂

Michi (2021-11-07)

Thanks.

Michi (2021-11-07)

Now I see that the graph is missing. I’ll ask Oren to add it.

Y.D. (2021-11-07)

Now all that’s left is to edit the Wikipedia entry

Avishai (2021-11-08)

I really liked it!
But it seems to me that someone with a methodological position would use your example as a refutation of your view—the number of correct predictions in the history of science that stemmed from incorrect theories (like the example of Newtonian mechanics) is enormous, which shows that a straight line may help generate predictions but not find the more correct theory.
If science is entirely a method for supplying predictions—then there is indeed a “substantive” justification for using the razor, but if the goal of science is to get at the truth—the evidence you bring from the number of predictions is only a justification that this is a successful method for producing predictions, and therefore, although experience shows there is a good chance that later the theory will be proved untrue, for now it is still very reasonable to hold on to it.

Avishai (2021-11-08)

In other words—the razor helps find a theory that will be easy to confirm, but it does not help find the theory that will not be refuted. In most cases it actually will be refuted in the future, and therefore it is proven to be a useful but incorrect tool

Michi (2021-11-08)

That is an absurd claim. First, you assume that mistaken theories yield no fewer predictions than correct theories. That is of course ridiculous, since on that basis it is unclear why they were replaced. Beyond that, the theories that arose and were tested are indeed correct within certain limits even in light of the more revised theory (as with Newtonian mechanics in light of quantum theory and relativity). And that is exactly my claim: in this way we advance toward the truth. On your view, by now we should have had a collection of theories that were not refuted and have no connection to the truth. I would not get on an airplane on the basis of a theory about which all I can say is that it has not been refuted. There are millions like that.

Tirgitz (2021-11-08)

Thinking aloud

A. It seems there is a kind of begging the question here. You take (for illustration) the set of all theories humanity has conceived up to now, and suppose we see that the simpler theory was always right. From this you infer a meta-theory (that is, a theory about scientific theories) that a simple theory has an ontic advantage. But in fact you have assumed the razor here, and found the simplest explanation for why in the past and present the simpler theories have been more successful. The razor hypothesis about all theories is like the straight-line hypothesis about a single theory. Since there are still infinitely many other meta-theories that would generate exactly humanity’s successful theories over the data sets, we still have no justification for choosing the razor meta-theory over any other meta-theory.

B. In your doctrine you hang the matter on cognition by the eye of the intellect.
B1. A scientist can see only the data and nothing from the experiment itself. For example, presumably today the theoretician holds a list of numerical results obtained by various experimenters in different experiments under different conditions, and on that basis conceives a theory. He is not looking at the world itself but at numerical records in human language written on paper and encoded in a human way. So on your view, through that paper the theoretician apprehends the general law? Seemingly cognition requires information that comes directly from the world, not processed and encoded information.
B2. If this is a matter of cognition, then how did Newton, for example, arrive at his own mistaken theory? Even if numerically the results come out very close, clearly cognition of the theory and the objects does not concern numerical prediction but the entities themselves. Thus Newton of course did not apprehend a law that does not exist at all (F=ma), and if so the question returns to its place: how did the razor principle help him predict?

Michi (2021-11-08)

A. You are right in principle, and I think I noted this in Appendix B. In my view, if a theory works then it is also true. The actualists think it may only work and not be true. So on their view, at the meta-level they must conclude methodologically that the theories are true. They too agree that it is right to rely on statistics, and the statistics say that the theories are true. In other words, someone who insists even at this level falls back into ordinary skepticism, and there is nothing to be done against skeptics.
B1. The numbers evoke the situation for him. Suppose a scientist sees a numerical report on the relation between force and acceleration. He understands what force is and what acceleration is, and understands that there is a relation between them. The numbers only instruct him what exactly the quantitative relation is (the proportionality coefficient is the mass) and the form of it (a straight line).
B2. Newton’s theory is also substantively true. It is not just a successful approximation by chance (on the basis of a theory that is ontically mistaken). There really is a gravitational force acting between two masses. Its quantitative description changes slightly in extreme circumstances. Describing the force as a curvature of space is just a different form of description of the matter.
The same ontic reality can be described in several ways, and there may be differences in their precision.

Tirgitz (2021-11-08)

A. I understand you to be saying that the proof from the whole history of science is stronger than a proof from a single scientific theory, in which a scientist proposed a simple theory and it turned out that he was actually fairly close to the truth. But I still haven’t understood where the difference lies. What can an actualist argue against a proof from a single scientific theory that he cannot argue against the proof from the whole history of science? Is looking at the whole history of science intended only to strengthen the statistics (from zero for all practical purposes to an even smaller zero for all practical purposes)?

Michi (2021-11-08)

It is not a quantitative difference. The actualist argues that a theory that works is not true as a description of the world. But he accepts the methodological assumptions of science, according to which a theory should work and statistical confirmation (like generalizations) is acceptable, etc. Except that in his view all this is said only on the methodological plane.
Now I ask a meta-theoretical question: are the theories true, or do they only work? Ostensibly this is only a philosophical question, and therefore the debate about it remains in place and cannot be decided. But my argument—and this is what is unique about it—offers this question a statistical-empirical (scientific) answer and not merely a philosophical one. And after all, the actualist accepts statistics as a decisive tool, at least on the scientific-methodological plane. If so, then at least methodologically he ought to adopt the conception that the theories are true. This admittedly sounds absurd, but that is what comes out—apparently because his very doctrine is absurd (for what works is probably true. It is not plausible that this is merely a collection of miracles. It turns out that one who clings too tightly to facts and rejects speculation must adopt a miraculous view).

Tirgitz (2021-11-08)

You explain here that even one successful prediction is a refutation of the naive actualist who, even methodologically, does not accept that the theory is true but only that it works. But I still haven’t understood what the look at the set of all scientific theories adds to this (where the razor is accepted empirically as a meta-theory explaining the theories), more than your look at a single theory (where a theory that used the razor is accepted empirically as a theory explaining the findings).

B. Regarding cognition, I still haven’t understood. Are you saying that if a scientist were given a graph like the one you presented between two variables, and were not told their meaning, then he would no longer recognize the simplest theory as ontically describing the data and predicting what comes next? If he would still propose that simple theory, that would mean the razor is not a cognitive principle at all.

Avishai (2021-11-08)

1. Incorrect theories can, numerically speaking, give no fewer correct predictions. The reason a theory is replaced is not because we have a theory that gives more predictions, but because the first theory was refuted. That is, even if one theory is confirmed a thousand times, if it is convincingly refuted, it will be replaced by a theory that has been confirmed less but not refuted.

2. I get on a plane not because I am sure that the physics of the scientists who built it will not be refuted, but because I am convinced it is at least a good approximation to the truth. As I wrote: a useful but incorrect tool. On the question of what works, the razor is a good tool for choosing the best approximation (the straight line is the shortest distance between the two observations). On the question of what will always work (that is, what will not be refuted), simplicity of theory offers no advantage. And as time passes, scientific theories do not seem to be becoming simpler, but rather more complex.

Michi (2021-11-09)

A. I don’t know what is unclear about what I wrote and repeated. The difference is not quantitative (one theory versus many) but in the nature of the question. When discussing one theory, we are talking about the question whether it works. When discussing all theories, or the outlook that leads to building theories, we are talking about the question whether we have such a faculty or not—that is, whether theories describe reality or only work. That is the test question, and that is what I tried to answer, not a scientific question as in the case of a single theory. Looking at all theories speaks about us; looking at one theory speaks about the world, or more precisely about science.

B. He will recognize that the straight line is the correct relation. But of course he will not be able to understand which theoretical concepts underlie that straight line. Presumably the level of cognition of reality stands in direct proportion to the level of encounter and acquaintance with the empirical materials.

HaPosek HaAcharon (2021-11-09)

Occam’s razor is just nonsense.
Occam’s electric saw principle is stronger: keep yourself from getting entangled, preserve simplicity, don’t learn anything.

If there are 2 theories that explain the same phenomena, that means both theories are possible.
What determines whether one theory is better than another is not its simplicity but its ability to surprise us with its predictions. And that should be the guiding principle.

yoav (2021-11-10)

I didn’t understand the graph example.
The reason we assume in advance that an experiment will support the straight line is not because of Occam’s razor but because the straight line has a theory that explains it, whereas the curved line has no explanation at all.
If both lines had plausible theories that explained them, then we would not be able to guess the results of the experiment in advance.

Michi (2021-11-10)

Absolutely not true. Here too there is no explanation, and yet one still assumes a straight line; and in general, whenever there is such a relation, a straight line is preferred regardless of explanations.

yoav (2021-11-10)

As far as I know, the purpose of experiments is to distinguish between hypotheses.
With respect to the straight line we have a hypothesis; with respect to a curved line we do not, since it is equivalent to infinitely many other curved lines, so what good would an experiment do?

Michi (2021-11-10)

There is no point to this whole discussion. Take a graph like the one shown above and show it to some scientist. Erase the axis labels, and don’t tell him what the X-axis is and what the Y-axis is or what was measured here.
I’d be happy to hear if there is even one scientist who would not choose the straight line. I assume you won’t find even one.

HaPosek HaAcharon (2021-11-10)

The main reason is that there aren’t enough points to reduce the uncertainty.
Choosing a non-straight line adds more uncertainties to the additional variables.

If there were enough points to reduce the noise and the uncertainty, no one would choose a straight line.

mozer (2021-11-10)

The razor of our Rabbi Moses son of Maimon—Guide of the Perplexed, Part Two, chapter 11:
“Since the aim of this science (astronomy) is to posit a configuration (an astronomical model) with which the motion of this star can be [explained]
… and what follows from this motion will accord with observation.
Yet one seeks to minimize the motions and the spheres as much as possible.
If it is possible for us to posit a configuration according to which the observed motions are explained by means of three spheres (circular motion),
and another configuration according to which the very same thing is possible by means of four spheres—
it is fitting for us to rely on the configuration in which the number of motions is smaller.”

There is no explanation why “it is fitting for us to rely” on it.

HaPosek HaAcharon (2021-11-10)

Occam was born 83 years after Maimonides died.

Another translation:
“Know that these astronomical matters that have been mentioned, if a person reads and understands them in a merely studied way, he will think that this is conclusive proof that the form of the spheres and their number are indeed so. But this is not the case, nor is it the aim of the science of astronomy. Rather, some of its matters are proven to be so, as it has been proved that the path of the sun is inclined from the equator, and this is something about which there is no doubt. But whether it has an eccentric sphere or an epicycle, this has not been proven. And this is something to which the astronomer pays no attention, for the aim of this science is to posit an astronomical configuration by which the star’s motion can be one circular motion, with neither speed nor slowness nor change, and the outcome of that motion will correspond to what is seen.

And together with this he strives *2 to minimize the motions and the number of spheres as much as possible, because if, for example, we could posit a configuration by which what appears from the movement of this star would be realized by three spheres, and another configuration by which that same thing would be realized by four spheres, then it is preferable that we rely on the configuration with the fewer motions. Therefore, with regard to the sun, we chose eccentricity rather than positing an epicycle, as Ptolemy mentioned.”

*2. Here too Rabbi Qafih wrote that our Rabbi corrected in his translation: “and together with this he intends.” And he was right; the meaning is that he strives for this and makes it his aim.

HaPosek HaAcharon (2021-11-10)

It seems that the logic behind this requirement appears later on with the parable of the man who has surplus wealth… (surplus/preferable)

And it seems that the description with fewer rotations is preferable because ostensibly God gave created beings what they need, no more and no less. And if 3 are enough, then there is no reason to add additional tools.

So the principle is then based on a kind of perfection with minimal waste of energy. Or a principle of minimal motion (Hamilton’s principle).

K (2021-11-10)

1. In a naturalistic world, which developed purposelessly in an evolutionary way, can one speak in terms of justification for the principle?
At most one can explain why psychologically, and therefore methodologically, we accept it. But can one speak of ontic justification?
2. Is the argument that Occam’s razor works and that the chance of that is negligible, evidence in favor of the principle?
Because it seems that this succeeds in confirming it only for one who indeed assumes in advance that the principle is correct, since even the findings of the test are analyzed within a conceptual framework of the principle that the simpler explanation is also the correct one. So this is not a sequence of mere flukes.
3. You did not address the objection that this depends on time—not in the sense of the amount of knowledge available to us, but that the principle is applied within the framework of a world that depends on time. (In the sense in which the laws of nature are “constant” in time.)
For example, the very claim that the hypothesis that the bus passed by is preferable, even though there is no probability at all that we can think that within the framework of the world as it is open to human beings, the simplest event is also the correct one.

Michi (2021-11-10)

1. In a materialist world there is no justification for anything. I have written about this more than once. In such a world there is no judgment and we are a mechanical machine. Whatever comes out comes out; there is no issue of justifications.
2. See my reply to Tirgitz, who spoke about begging the question.
3. I didn’t understand the question. I did address dependence on time.

K (2021-11-10)

Thank you very much, I’ll look there.
3. Indeed that was not clear enough. I’ll try another way.
There are two essential parts regarding Occam’s razor.
1. The laws of nature as a kind of source and realization of “pure laws.”
2. The multifaceted human world in which we live.
So while regarding the first part one can quite reasonably accept a belief that the simple description and explanation is also the correct one, (a theological assumption and proof that we are capable of understanding the world is enough to justify this),
it seems that in the second world, the world open to human beings—beings with free desires and judgment, a world built from so many factors/subfactors and different shades, where every tiny thing has major effects—it already seems that the assumption regarding the correctness of the simple description may even be arbitrary.
Because even if we assume that the framework of the laws of nature is simple and deterministic, this really does not require that the human world be simple as well. On the contrary, it changes greatly and is subject to time.
Another aspect from which this can be seen is our inability to predict the future, and from our perspective perhaps it is even something of a chaotic character.
But if so, how can Occam’s razor be justified even on the plane of “our” world, open to human beings and changing in time.

There is no need to repeat the last paragraph of the post and show that Occam’s razor is constantly behind our inferences regarding this human plane as well—whether in daily life, walking and encountering an empty bus stop, or in the legal domain of strong evidentiary presumptions, and so on and so on.

Michi (2021-11-10)

I don’t understand the question. Do you mean to ask whether simplicity is not necessarily true? I explained that. There is evidence that it works.

K (2021-11-11)

What I meant to ask is how one can assume that simplicity is true within the framework of human culture—not within the framework of the laws of nature, where indeed it is understandable that if one assumes that God has implanted in us the ability to understand the world, there is no obstacle to our being able to infer the laws of nature with some implanted basic assumptions.

But human reality is complex by its very nature, because it is open to free desires, to myriads of independent factors, and is not deterministic at all.

From ‘Ockham’ to ‘Occam’ – an application of the razor? (2021-11-11)

It seems that William of Ockham applied his simplifying “razor” even to the name of his birthplace, whose name is Ockham, in the manner of place names in England ending in -ham, such as “Nottingham” and “Birmingham” and the like.

However, in his philosophical writing William refers to his birthplace in a simplified and reduced form: Ockham became Occam, skipping the h. As the poet said: the landscape of his homeland is the mold of the man 🙂

With blessings of simplicity, Frastic of Peshitik

Between ‘Occam’ and depth (2021-11-11)

If we take as an example Newton’s simple mechanics versus the more complex mechanics of the theory of relativity—it is not correct to say that the simple theory is “refuted.” At low speeds Newtonian mechanics is astonishingly accurate, and it is what we use in our ordinary lives. Only when one reaches orders of magnitude approaching the speed of light does the complex mechanics of “the theory of relativity” come into expression.

There are visible and simple layers in the world and in life, and there are deep and hidden layers. In the simple layers, Ockham’s simplifying razor assists us, but beneath the surface, with study and inquiry, deeper and more complex layers are revealed—depth within depth. On the level of plain meaning the Razor helps us—but beyond it there are wondrous worlds of “mystery” and “light.”

Regards, Nehorai Shraga Agami-Psisovitz

Michi (2021-11-11)

I’ve given up. Chinese.

Corrections (2021-11-11)

Paragraph 1 line 2
… at low speeds the mechanics of…

There, line 4
… then the complex mechanics comes to expression…

Melafefon (2021-11-11)

Ostensibly, the very learned discussion of the principle contradicts the principle

Daniel (2021-11-12)

This question is dealt with a lot in machine learning, in what is called “model selection,” which is actually very similar to the example here of the graph—whether to choose a description of a line or of a higher-degree polynomial.
There is a principle called “no free lunch” which, simply put, says that if one makes no a priori assumptions (ones that do not depend on the information) about the function that is supposed to describe the information, then the information cannot teach us anything. If the function may be arbitrary (even one that cannot be described simply as a mapping from x to y), then whatever information we have received teaches us nothing about things we have not seen.
Only when we assume something a priori about a restricted family of functions that can describe the information can the information guide us, from within that family of functions, as to which function is best suited to the information.

There is a measure called the “VC dimension” of the family of functions, which defines the “size” of the family of functions. The more I enlarge the family of functions from which I choose, the less certain I can be in my conclusions, because the fit of the theory I selected to the data may be the result of the fact that there are so many permitted theories that obviously one of them will fit the data. So if I choose a linear description, that is a small family of functions, and therefore if the line really fits the information I can be fairly confident in my conclusions. If I choose a high-degree polynomial (and I don’t have much data), then I can be less sure of my conclusions. So although the high-degree polynomial also fits the information, the bound on my error for information I have not seen will be high.
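To make the point about small and large families concrete, here is a minimal sketch in Python (standard library only). Everything in it is an illustrative assumption and not something taken from the discussion: the underlying law y = 2x + 1, the noise level, the eight observed points, and the helper names fit_line, fit_interpolating_polynomial and mse. It compares a least-squares line with the degree-7 polynomial that passes exactly through all eight observed points, and then scores both on points that were not used for fitting.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares line; returns a function x -> a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_interpolating_polynomial(xs, ys):
    """Lagrange form of the degree n-1 polynomial through all n points."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable

def noisy_line(x):
    # Assumed "true" law plus small Gaussian measurement noise.
    return 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)

# Eight observed points on [0, 1].
x_seen = [i / 7 for i in range(8)]
y_seen = [noisy_line(x) for x in x_seen]

# Unseen points: some between the observations, some slightly beyond them.
x_unseen = [0.05 + 0.1 * k for k in range(13)]
y_unseen = [noisy_line(x) for x in x_unseen]

line = fit_line(x_seen, y_seen)
poly = fit_interpolating_polynomial(x_seen, y_seen)

line_seen, poly_seen = mse(line, x_seen, y_seen), mse(poly, x_seen, y_seen)
line_unseen, poly_unseen = mse(line, x_unseen, y_unseen), mse(poly, x_unseen, y_unseen)

print("error on seen points:   line", line_seen, " polynomial", poly_seen)
print("error on unseen points: line", line_unseen, " polynomial", poly_unseen)
```

Under these assumed settings the polynomial has essentially zero error on the points it saw, yet a larger error than the line on the unseen ones. This only illustrates the worry behind the VC-dimension argument; it does not compute the VC bound itself.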
There is also a Bayesian approach that asks what the probability of the family of functions is in light of the information, and this depends on the probability of the information given the family of functions. That is, the probability that the data would lie on one line when one assumes the function is a line is high. By contrast, if I assume the function is a high-degree polynomial, then the probability that I would see only data lying on one line is low. Therefore the line should be preferred.
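The Bayesian comparison can be sketched with a deliberately tiny toy model, again in plain Python. The setup is an assumption made purely for illustration: each polynomial coefficient may take only the values -1, 0 or 1 with equal prior probability, the data lie exactly on the line y = x, and the noise level sigma is 0.1. The names likelihood and evidence are hypothetical helpers. The "evidence" of a family is the likelihood of the data averaged over all hypotheses in that family.

```python
import itertools
import math

def likelihood(coeffs, xs, ys, sigma):
    """Gaussian likelihood of the data under one polynomial hypothesis."""
    value = 1.0
    for x, y in zip(xs, ys):
        predicted = sum(c * x ** k for k, c in enumerate(coeffs))
        value *= (math.exp(-((y - predicted) ** 2) / (2 * sigma ** 2))
                  / (sigma * math.sqrt(2 * math.pi)))
    return value

def evidence(n_coeffs, xs, ys, sigma, grid=(-1, 0, 1)):
    """P(data | family): likelihood averaged over a uniform prior on the grid."""
    hypotheses = list(itertools.product(grid, repeat=n_coeffs))
    return sum(likelihood(h, xs, ys, sigma) for h in hypotheses) / len(hypotheses)

# Five observations lying exactly on the line y = x.
xs = [-2, -1, 0, 1, 2]
ys = [-2, -1, 0, 1, 2]
sigma = 0.1  # assumed measurement noise

evidence_line = evidence(2, xs, ys, sigma)   # family: a0 + a1*x        (9 hypotheses)
evidence_cubic = evidence(4, xs, ys, sigma)  # family: a0 + ... + a3*x^3 (81 hypotheses)
bayes_factor = evidence_line / evidence_cubic

print("evidence for the line family: ", evidence_line)
print("evidence for the cubic family:", evidence_cubic)
print("Bayes factor (line / cubic):  ", bayes_factor)
```

Both families contain the hypothesis y = x, so both can fit the data perfectly. The cubic family, however, spreads its prior over nine times as many hypotheses, almost all of which predict data that do not lie on a line, so in this toy setup the data favor the line family by a factor of about nine.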

Rabbi Akiva’s razor? (2021-11-12)

With God’s help, 8 Kislev 5782

It would be interesting to examine whether there are parallels to the idea of “Occam’s razor” in Judaism. Perhaps the rule that “a verse does not depart from its plain meaning” goes partly in a similar direction—that the simpler explanation in the language of the text is true. However, unlike “Occam’s razor,” which accepts only the simpler explanation and rejects the more complicated one—in the interpretation of the Torah, the “plain meaning of Scripture” is not the only explanation, but exists alongside the sages’ midrash, and the plain meaning and the homiletical interpretation are complementary facets of understanding the Torah.

The assumption that simplicity and “economy” are preferable to prolixity seemingly underlies Rabbi Akiva’s method of study, for he held that the Torah’s wording ought to have been the shortest and most “economical” possible. Accordingly, Rabbi Akiva maintained that every apparent excess of words or letters, and even the “crowns of the letters,” was not inserted merely to beautify the language (as Rabbi Ishmael held, that “the Torah speaks in human language”), but that every surplus expression comes to teach “great halakhot.”

Regards, Nasaf

And Hillel’s convert (2021-11-12)

And perhaps also the convert who said to Hillel, “Convert me on condition that you teach me the whole Torah while I stand on one foot,” proceeded from a foundational assumption similar to Occam’s—that one should seek an explanation that explains all the “phenomena” of the Torah on the basis of one guiding principle, an assumption Hillel also accepts when he proposes that the guiding principle of the whole Torah is awareness of the other.

Regards, Nasaf

Michi (2021-11-12)

Many thanks. The NFL assumption is exactly the basis for what I said here. (You are of course also assuming that the results you saw are uniformly distributed, meaning that this is a representative sample; otherwise, even in a family of exponential functions it is possible that the results before you fall on a single line.)
Right now I’m in the middle of a Coursera course (from Stanford) on machine learning. Interesting.

Adam (2021-11-12)

Rabbi Michael puts the common denominator on Occam’s razor, and apparently that is the accepted understanding

K (2021-11-12)

Let’s try again, this time with clearer assumptions 🙂 ,
How can one rely on the principle that one must not multiply entities in a world that is not deterministic (but includes beings with free choice)?
Because:
1. The a priori probability of a hypothesis explaining the event depends on the number of entities. (The simpler it is, the more probable you assume it to be.)
2. But the number of entities depends directly on their will.
3. And their will is not subject to the principle.

If so, there is no reason to assume that the principle applies to explanations in a world rich in non-deterministic entities, aside from all the world’s complexity anyway.

I assume you will object to assumption 3, but that contradicts the concept of free choice.

Michi (2021-11-12)

Still in despair. Turkish.

Binyah Yitzhak Koren (2021-11-13)

Daniel, thank you. You saved me a response :-)

In general, I find it disappointing how small a place concepts from statistics and machine learning occupy in philosophical discourse, even though they are the closest thing we have to “epistemology engineering.” I’m glad Prof. Michi chose to study this, and I hope a lot of good philosophy comes out of it.

. (2021-11-14)

I join this as well.

HaPosek HaAcharon (2021-11-14)

Philosophy is the language of clever naive people. People who do not really understand that the world is more complex than the collection of words they mumble.

And the implication for privatizing kashrut (2021-11-16)

It seems that according to Occam, kashrut should not be privatized, for it is preferable to minimize the entities that grant kashrut 🙂

Regards, Ray Golator

Or perhaps the opposite? (2021-11-17)

On the contrary, Occam’s razor requires the kashrut reform,

For until now, the body determining the certification of kashrut providers has been the “Council of the Chief Rabbinate,” which has 15 members and is elected by a 150-member “electoral body,” whereas according to the reform the final decisor will be the “appointee” on behalf of the Minister of Religious Affairs. Thus there will be only one entity according to whose ruling all matters of kashrut will be decided: the Minister of Religious Affairs, may he live long.

Regards, Gillette Glakhovsky
