Man and Machine—May the Lord Save Us: B. Neural Networks (Column 695)
With God’s help
Disclaimer: This post was translated from Hebrew using AI (ChatGPT 5 Thinking), so there may be inaccuracies or nuances lost. If something seems unclear, please refer to the Hebrew original or contact us for clarification.
In the previous column I discussed what a computer is and whether it can be likened to a human being. The discussion focused on a digital computer and on classical computing programs. The new generation of artificial intelligence works in completely different ways, and as we shall see, it is no accident that the question of the differences between a computer and a human resurfaces in full force. As of today, Turing tests are far behind us, but as we shall see here, we still remain unconvinced (see the summary at the end of the column on Turing tests and their broader “halachot”). Again, I need to give some preliminaries that describe the technology itself—at least schematically and in simplified fashion—so we understand what we’re talking about. I hope I haven’t erred (I am not an expert), and in any case I ask the experts’ pardon for resorting to basics.
New-generation machines: Modern Artificial Intelligence
In the past few decades a new concept entered the world of computing that operates very differently from the style of computation described thus far. The original idea is based on a combination of insights about the structure of our brain and results from models of networks in statistical physics (this is how I was first introduced to the whole topic). It turns out one can build a computing machine that works in a very strange and indirect way, not by straightforward execution as described in the previous column. I’ll give a simplified, schematic description whose aim is only to convey the core idea and clarify—especially for the uninitiated—what is happening in and behind this marvel that has dominated headlines over the past two years.
AI models are built as a network of neurons, analogous to the human brain. The structure consists of vertices (sites) and edges that connect them (bonds). Each vertex corresponds to a neuron in the brain and can be in an activated state—1—or off—0. Each connection between vertices is associated with some weight, a number that can change over time (see below). The state of each neuron (on or off) is determined by some function (an activation function) that takes a weighted sum of all the connections entering it: if this weighted sum crosses a certain threshold, the neuron’s state is 1; otherwise it is 0.
Consider, as an example, a simple one-dimensional network. It consists of a chain of vertices connected by such edges—for example, in the following form:
This is a chain of five vertices, and the connections between them are depicted by the lines joining them. Each such line has a numerical weight. You can see that connections between vertices can be made in various patterns. Not every pair of vertices is connected (of course, one could draw every vertex as connected to every other and assign weight 0 to some connections). In the illustration above, the state of the third neuron, for example, is defined by the sum of the two connections from the two neurons entering it (the first and second). The state of the fifth neuron is defined by summing the connections from the second and third.[1]
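To make this concrete, here is a minimal sketch in Python of how a single neuron’s state is computed from the neurons feeding into it, in the spirit of the chain above; the weights and the threshold are invented for illustration.

```python
# A neuron's state: 1 if the weighted sum of its inputs crosses a threshold, else 0.
# The weights and the threshold below are invented for illustration.

def neuron_state(input_states, weights, threshold=0.5):
    weighted_sum = sum(w * s for w, s in zip(weights, input_states))
    return 1 if weighted_sum >= threshold else 0

# Echoing the chain above: the third neuron is fed by the first and second.
first, second = 1, 0                  # states of the neurons feeding in
weights_into_third = [0.4, 0.7]       # one weight per incoming connection
third = neuron_state([first, second], weights_into_third)
print(third)                          # 0: 0.4*1 + 0.7*0 = 0.4, below the 0.5 threshold
```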
In modern systems there is a series of layers of neurons, each layer being a row of neurons connected to the row before it and the row after it. They function as a series of networks where one feeds the next, and so on (in recent years it was discovered that the most efficient updating occurs when information is fed forward and there is also feedback backward). The first layer receives the input (some sequence of 0s and 1s), the last layer outputs the output (also a sequence of 0s and 1s), and in between there are several hidden layers that do the processing. Here is an example image of such a structure (from Wikipedia):
Training a neural network
Think of such a neural network designed to perform some task. To carry out the task we “train” it. The purpose of training is to shape the machine so that, when presented with a future query, it will answer correctly as much as possible (just like human training). The machine’s architecture is usually given, and so the aim of training is chiefly to set the weights of the various connections inside the machine. A machine with the right weights on each connection will generally produce the correct answers (there will always be some errors; humans also err from time to time). Clearly, the more vertices and connections the machine contains, the higher the chance of success because we have more weights (variables) to tune. Sometimes one can also adapt the network’s structure (not just the weights) specifically to the task for which it is built, but the common architecture is roughly what’s depicted above. There are several training paradigms, the best-known being supervised training. The other methods don’t substantially affect our discussion, so I’ll focus on this one because it is the most intuitive. I’ll note that even this I describe very schematically (for example, I won’t enter into the differences between pure feed-forward procedures and procedures that are cyclic—i.e., progressing forward and backward through the network).
Supervised training proceeds by known examples. Suppose I want to train a network to recognize my grandmother’s face. The goal is that if any picture of my grandmother is shown to it—against any background, at any age, and under any circumstances (remember Google’s photo libraries that can track a given person across images; they are the result of such a network)—it will know to say that it is she (i.e., output “yes”), and of course if other images are shown it will know it is not she (i.e., it will not output “yes”). To train the network for this task, I start with random weights on the connections in the network, and then begin feeding it different images—some of my grandmother in various situations, and some of other things or other people (preferably people, or even women, to improve resolution: the closer the negative examples are to the true image, the more effective the training). Such an image is represented by numbers (for example, a numeric description of pixel colors), and I feed the relevant numbers into the network (into the input layer) as input. For each input the network produces (in the output layer) a yes/no output, generated by computing the states of the various neurons as described. If I’ve fed in a photo of my grandmother and the network identified her correctly (answered “yes”), I change nothing. If it failed to identify her (answered “no”), I change the weights of the network’s connections so that the outcome for the cases seen thus far would come out correctly. Next I feed another image and repeat the process. Among these will be images that are not of my grandmother, for which the update is inverted: if the network identifies them as my grandmother I update the weights, and if not—I don’t touch them. Up to this point I’m effectively supplying the network information, since I provide feedback on whether it was right or wrong (note that I am the one who identifies the image and tells it whether it is indeed a photo of my grandmother or not). It turns out that after a sufficiently large number of examples with feedback, the network is trained for the task. In that state, it is asked to recognize new photos of my grandmother, i.e., photos it hasn’t seen before (now it must solve the problem by itself). It turns out that after training, it succeeds at this in a very impressive way.
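For readers who want to see the flavor of this loop in code, here is a minimal illustrative sketch in Python of the supervised training just described: start from random weights, feed labeled examples, and adjust the weights only when the network errs. The “images” here are invented numeric feature vectors; a real system would work on pixel data with a deep network rather than this single-layer toy.

```python
import random

# A perceptron-style sketch of the supervised training described above.
# The "images" are invented numeric feature vectors, not real photo data.

def predict(weights, features):
    """Weighted sum followed by a threshold: 1 means 'yes, this is grandmother'."""
    s = sum(w * f for w, f in zip(weights, features))
    return 1 if s >= 0 else 0

def train(examples, n_features, learning_rate=0.1, epochs=50):
    """Start from random weights; adjust them only when the network errs."""
    weights = [random.uniform(-0.1, 0.1) for _ in range(n_features)]
    for _ in range(epochs):
        for features, label in examples:      # label = the human trainer's feedback
            error = label - predict(weights, features)
            if error != 0:                    # wrong answer: nudge the weights
                weights = [w + learning_rate * error * f
                           for w, f in zip(weights, features)]
    return weights

# Invented training data: (feature vector, 1 if grandmother, 0 otherwise).
examples = [([0.9, 0.1, 0.8], 1), ([0.2, 0.9, 0.1], 0),
            ([0.8, 0.2, 0.7], 1), ([0.1, 0.8, 0.3], 0)]

w = train(examples, n_features=3)
# A "new photo" the network has never seen; after successful training it should answer 1.
print(predict(w, [0.85, 0.15, 0.75]))
```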
It’s important to understand that a classical program (of the sort described in the previous column) cannot succeed at such a task. In a classical program the basic logic is conditional (if … then …). We would have to feed it every photo it needs to recognize, and the program would instruct that if the new picture before it is one of the pictures it “saw” earlier, it should answer “yes,” and if not—then “no” (you can see that, in principle, this is just a collection of conditionals: if it’s one of the pictures you saw—output “yes”). From this you can understand that for any picture it hasn’t encountered, you won’t get a correct answer. This means preparing the machine for action is not “training” but spoon-feeding. I build it so that it does exactly what I put into it, no more and no less. This is just like we saw with electric circuits in the previous column. Generalizing beyond the scope on which the program trained is a property of the new AI networks and does not exist in classical computation. It’s achieved by the training process described above. The new machine does not implement conditionals in the same sense as the old program. You build it so that it operates on its own—i.e., it “knows” how to act correctly on the tasks at hand. Unlike the old computation, the neural network also recognizes images it hasn’t seen before. Note how this is entirely analogous to training a person for certain tasks by practicing solutions to known problems (with a teacher or lecturer), so that they will be able to solve new problems of those types that they will meet in the future. No wonder the question of the relation between such a machine and a human being arises with even greater force regarding these machines.
Another task for a neural network could be to find the relation between force (F) and acceleration (a) for a body of mass m (see another angle on this in column 426). Newton’s second law states there’s a linear relation between them (F = m·a). We want to train the network so that, given a force magnitude, it will tell us what the acceleration of the body will be under that force. Once again we supply it with examples from experiments that were done (for such-and-such force, such-and-such acceleration was observed, and so for many more examples). This updates the internal weights of the network until we end up with weight values that yield the correct answer in all (or most) cases—including force values the network hasn’t encountered.
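As a toy illustration, here is a sketch of a “network” consisting of a single weight that learns the coefficient relating force to acceleration from example pairs; the mass and the data points are invented.

```python
# A toy "network" consisting of a single weight, trained to relate force to
# acceleration from examples. The true law is a = F / m; the mass and the data
# are made up for illustration.

true_mass = 2.0
examples = [(F, F / true_mass) for F in [1.0, 2.0, 5.0, 8.0, 10.0]]   # (force, measured acceleration)

weight = 0.0                        # the network's initial guess for 1/m
learning_rate = 0.01
for _ in range(1000):
    for force, accel in examples:
        error = weight * force - accel
        weight -= learning_rate * error * force   # nudge the weight toward the examples

print(weight)                       # ~0.5, i.e. 1/m
print(weight * 7.0)                 # predicted acceleration for a force it never saw (~3.5)
```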
Up to here we built (or rather trained) a network for a single specific task: recognizing the face of a particular woman, or finding the relation between force and acceleration for a particular object. Now we may want to build a more intelligent network—one that performs multiple tasks of the same kind. For example, it should recognize the face of any person, not just my grandmother (i.e., given a person’s face, it can decide whether other photos depict the same person or not), or relations between any two variables, not just a particular object’s force and acceleration. We train such a network exactly as we did in the single-task case described above. Because the task is more general and complex, the network itself presumably must be more complex (with more connections and vertices). Now we train it in the same way on many photos of many people (whenever two photos of the same person are fed in, it should answer “yes”; otherwise “no”). The assumption is that the network will reach a state in which it can recognize people’s images in general, not just my grandmother’s. I remind you of Google Photos’ feature that can track a person and recognize all their images in any background and at any age. That software was trained exactly like this. After that, one can broaden the scope and train the same network to recognize images of any objects, not only people. Later it will also be able to generate images on request (“a man riding a horse against the sunset with a two-story house in front of him”). Our machine is now carrying out a very wide range of tasks, and note that it’s the same machine. The various programs we used in the old computation are replaced here by different sets of training examples for the different tasks. After all these trainings, we have one network that can do all these tasks.
The next step is to try to create a universal neural machine, i.e., a single network that can perform all tasks—just like human beings. It should recognize my grandmother, or any other face, or any object whatsoever; it should perform mathematical calculations; it should find the relation between force and acceleration for any body; it should converse with me about any topic and supply any information I need, solve physics problems, provide psychological counseling, and more and more. In principle this seems attainable if we take a very, very complex network and feed it enormous numbers of examples of every kind of task (a set of examples for each type of task, but all of them will train the same network), and in effect “feed” it all the information that exists under the sun. In the last two or three years it has become clear that such neural networks are within our reach. The new machines manage to reach these astonishing capabilities. Thus arose OpenAI’s models, called ChatGPT, which burst onto the scene in 2022, followed by similar models from other companies, up to the Chinese DeepSeek that broke out just in recent weeks and is stirring technological and economic upheaval worldwide (see for example here and here). Even the creators of these machines did not anticipate achievements of this magnitude. They were stunned to discover that after feeding the machine monstrous quantities of information, they found themselves with machines that can do everything. They can converse with us in every domain, including heavy professional fields; they already can write programs; they can solve mathematical and scientific problems (without a programmer writing a program; they find the way on their own); translate texts; write papers and essays at a very high level across fields; create works of art and music; recognize images; and of course retrieve accurate information across domains. Needless to say, the Turing test is trivial for them. These machines are already better than us in most areas.
Is a sophisticated neural network a person?
In principle we’ve reached a situation where we have a machine that can do everything we can do—and, in fact, a single machine that does everything humanity as a whole can do (not only what a single person can). Moreover, we have seen that we no longer feed these machines the solution to the problem as in classical computation. The machine finds the solutions to all types of problems itself and does everything by its own “decisions.” It would seem we’ve built a sort of person that learns from experience just like us (only it digests far more information than we do per unit time, and is therefore more knowledgeable and more skilled than we are in most tasks). No wonder the question about the relation between such a network and human beings arises with even greater force regarding these new machines.
Note that the training described here is completely analogous to the process I described in the Chinese Room. The person sitting in the Chinese Room receives questions in Chinese and trains himself to give relevant answers in Chinese, with the aim that ultimately he will answer correctly even to questions he hasn’t encountered. The assumption is that after enough time, this person will be able to do so even without being taught anything about Chinese and the meanings of its words. One can say that the neural network in his brain is being updated via the electrical pulses he receives for wrong answers. This process corresponds to the feedback a neural network receives for wrong answers, which updates its internal connections. John Searle was ahead of his time and essentially described the training of a (human) neural network. Note that both in the case of a neural network and in the Chinese Room, the entity being trained is a mechanical entity, and there is no mental-cognitive dimension behind the communication with it. In the Chinese Room a flesh-and-blood human sits there, but in his case his cognitive dimension does not participate in the process (there is no understanding). He is, essentially, a computer. In a computer or a neural network, of course, there is never such a cognitive dimension.
The complexity and variety of tasks the machine performs, and the lack of need for an algorithm and programmer’s content-level reasoning (only for network architecture and training), do sharpen our question. But to form a view we must relate to the two features of the computer described in the previous column:
- The first feature—that it is the programmer (and not the machine) who solved the problem—does not exist here. In this case, the machine is what solves the problem. The programmer does not know the solution and does not solve the problem. Nor does he write an algorithm that solves the problem; he only feeds the machine examples and answers.
- The second feature—the absence of understanding and meaning behind the mechanical activity—of course does exist here too. This sophisticated network does not really find solutions to problems, or hold information, or think, or understand. It simply outputs the correct answer, just like the person in the Chinese Room. It knows how to pair a relevant answer to the questions it receives, but for it this is a mechanical mapping from input to output that results from mechanical computation, not from understanding. The interpretation of answers and the understanding of content are entirely on the user’s side. The machine only pushes electricity from here to there, and it is the user who grants these operations and results their sense and meaning. Everything I have described so far with regard to these networks pertains only to their capabilities (what they can do), but the mental dimensions are apparently absent in them as well; and if one sees in this difference a decisive difference between a human and a machine, then neural networks are not a person either. But now you can see why neural networks raise the question anew and more forcefully: for them only the second feature holds, not the first. They are closer to human thinking.
Returning to the discussion I conducted at the beginning of the first column: a neural network is like water. We did not create the water, nor did we put into it the capacity to “flow correctly” (i.e., to solve the Navier–Stokes equations); rather, that is embedded in its structure. So much for the first feature. But we saw there that it is implausible to say that water solves the Navier–Stokes equations, even though it flows according to those equations and its dynamics constitute the solution to the equations in that situation. The water simply flows that way because that is its nature. The fact that this is a solution to certain equations is relevant only to us as observers. Understanding and thought arise only in us. The water does not truly solve anything; it simply does what its nature dictates. It has no judgment, no thought, and no understanding. When we observe it, we describe the water and its surroundings by an equation of motion (Navier–Stokes) and interpret its flow in this situation as the solution of those equations. All this occurs only in our own minds. All this is equally true of the neural network. The neural network solves nothing. It shuttles electricity. We, the observers or users, interpret this as problem-solving and as an output that is the solution.
Bottom line, in my judgment the question is not truly troubling even for a sophisticated neural network like the ones mentioned. It is clearly still a golem, not a human. The three claims I presented in the previous column in favor of comparing the computer to a human (each with its rebuttal alongside it) are relevant for neural networks as well, and all are rejected here in exactly the same ways. To think that a neural network is a person is roughly like thinking a Strandbeest is a living creature (see that column). In the end we built a machine that manages to imitate human thinking. Instead of doing this by countless classical machines—or a classical machine with countless programs—we succeeded in building an imitation that does all of those things in one machine. That is not a principled difference. No one would imagine defining those countless machines as a human. The universal machine is not essentially different from them. It is only a more efficient structure.
Moreover, the training process has important significance for this discussion. In classical computation, we saw that the programmer builds the machine and causes it to perform the task. But in a neural network, the programmer does that as well. True, not directly, but he is the one who “feeds” the machine an immense amount of information via training. Therefore de facto the programmer does insert the relevant information into the machine. He also provides the feedback during training—in other words, he even inserts the knowledge relevant to the problem at hand. This means we simply found a way to train the machine without inserting all the information directly but rather indirectly. It turns out that the machine organizes itself in light of the information it receives, and the resulting structure knows how to respond to new challenges (beyond the examples it received). This is only a more sophisticated way to build, with our own hands, the collection of classical machines (which would be scattered across the globe), each trained to perform a particular task. The fact that we build a machine that succeeds in imitating us is not sufficient to declare that this machine is a person. A broom saves us from picking up dirt by hand, and no one would imagine saying it is a person. If we build an automatic broom, it still won’t be a person. If we build yet another machine that performs another one of our tasks, and another, and another—and then connect them all so everything is carried out in one machine—even then we have not obtained a person but rather devices that serve human beings and perhaps imitate them. Combining all of these into one machine is only more efficient, but it should not materially change the conclusion.
In short, thinking is indeed perceived as a human function more significant than sweeping, but an imitation of thinking is not essentially different from an imitation of sweeping. In both cases the imitation is not the thing itself, and despite the resemblance—confusing the imitation with the original is an error. In terms of the examples from the previous column: just as an imitation watermelon made of plastic—red inside, green outside, with “seeds” (and even walking and quacking like a duck)—is not a watermelon, and just as the person sitting in the Chinese Room does not understand Chinese and is not a speaker of Chinese, so a machine that imitates us is not us.
Another look at semantics and syntax
In the previous column we dealt with general distinctions between a computer and human thinking, and one significant difference I noted was in the nature and meaning of the activity. We saw that the computer operates syntactically (mechanically—formally), and the human operates semantically (through meaning). I illustrated this via the distinction between use and meaning (Wittgenstein) and via the examples of the Turing test and the Chinese Room. After acquainting ourselves with how computers operate, we can now return to these ideas and sharpen them.
A person who receives a problem analyzes it and addresses it according to some logical order. This division into stages is rooted in understanding the problem and applying reason to it. An AI model based on a neural network does not operate like that. It does not truly think about the problem, and therefore does not decompose it into its components. It simply receives input and outputs an output produced by computing weighted sums of numbers. The relation between output and input is entirely mechanical, and the transition between them is unconnected to the logical stages of a solution to that problem (as done by a human). The machine is built such that, although it is doing altogether different things (updating weights in the network, computing, and outputting), the output comes out correct relative to the input that was received. The answer is the correct answer, but the “thinking” that led to it is not really thinking, only computation that happens to yield a product that matches what systematic human thinking would produce for such questions. That’s the whole trick.
Think of the electric circuit described in the previous column and you’ll see it is not truly computing a sum or product of numbers but rather performing technical operations that create the “correct” result—i.e., produce some electrical output (which is not an answer at all, certainly not a “correct” one)—and only the user interprets that product as the correct answer. Think of a computer screen that displays before you the equation 1+2=3. Clearly this display is the result of directing beams onto the screen that illuminate pixels in that morphology: 1+2=3. The programmer or computer builder designed it so that the morphology of that shape would represent to the user the correct answer to their problem. This illustrates why the activity the computer performs has no meaning in itself. The computer merely throws electrons at the screen and lights certain points. The computer’s creator performs a trick that generates an output in which the user can find the answer.
To understand this better, I’ll explain a bit more the distinction between semantics (sense or meaning) and syntax (form, structure). Hofstadter, in his book Gödel, Escher, Bach, presents the following game. We have a language whose alphabet consists of three letters: {M, I, U}. But not every combination of these letters yields a legitimate word in the language. The set of legal words can be generated by transformation rules constructed as follows: we are given that the string MI is a legal word in the language. We have four rules that help us take one legal word and create another from it. For example, there could be a rule that any word ending in I may have another I appended. Another rule could be that any occurrence of II may be replaced by MU. (By the way, one could create the word MIII by applying the first rule twice and then replace an occurrence of II by MU, yielding either MMUI or MIMU.) Another rule could be that a word ending in U may be duplicated (e.g., from MU create MUMU), or that any UU in a word may be deleted. These are not the rules Hofstadter presents in his puzzle—only examples of the sort of rules in question. The set of legal words in the language is the set that can be generated by applying these four rules to MI and to any word subsequently generated via those rules. Thus the word MMU is also legal (apply the first rule to get MII, then the second rule that replaces II by MU). The puzzle Hofstadter presents is to determine whether the word MU is legal (remember, his rules are different from those presented here—don’t try to solve it based on these).
Let’s leave Hofstadter’s puzzle for a moment and broaden the frame. Now add syntactic rules. These rules tell us how to build sentences out of legal words. There are certain permitted ways and prohibited ways (a prohibited combination yields a string that is not a proper sentence, a malformed sentence in the language; think of the word collection “went on horse Yosef to ride,” which is not a grammatical sentence, just a collection of words). Now we truly have a language, with an alphabet, with a set of words, and with syntax and grammar.
Now return from sentences to words. Take the set of legal words in this language. Might there exist a wholly different set of transformation rules that generate exactly the same set of legal words (and generate none of the illegal words)? There is no reason to think not. And what of syntax? Might there be a different set of syntactic rules that generate the same set of legal sentences? Again, there is no reason to think not. Thus we can create the same language via entirely different rules.
Hofstadter then proposes a different way to look at his puzzle and, using it, to employ a computer to solve it. Think of the three letters as digits. Suppose M is represented by 3, I by 1, and U by 0. We now have a numeral system consisting of {3, 1, 0}. The word MI is now the number 31. The word MIU is 310. A rule that moves me from MI to MIU can also be presented arithmetically: if a number ends in 1, it may be multiplied by 10. The rule that permits adding I to a word that ends in I would say in this idiom: a number ending in 1 may be multiplied by 10 and 1 added. Thus 31 can become 311. You can see that every typographic rule about deleting and adding shapes can be represented by an arithmetic rule about numerical operations. There is full correspondence between these two modes of description. Now one can present our puzzle to a computer and ask it whether the number 30 (the representation of MU) lies in the set of numbers generated by these rules. That is a mathematical question a computer can try to solve. It will do so without understanding at all that we are dealing with words, and that the question concerns whether a word is legal in a language. None of this interests it. For it, this is a problem in mathematics.
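Here is a small sketch that makes this correspondence tangible. It uses the illustrative rules given above (not Hofstadter’s actual rules): one routine manipulates the letters, an equivalent routine manipulates digit strings under the encoding M=3, I=1, U=0, and the two generate exactly the same set of legal words.

```python
# A sketch of the word game above, using the illustrative rules from the text
# (NOT Hofstadter's actual MIU rules). The same four rules are written twice:
# once over the letters {M, I, U} and once over digit strings with M=3, I=1, U=0.

def successors_letters(word):
    out = set()
    if word.endswith("I"):                      # rule 1: append another I
        out.add(word + "I")
    for i in range(len(word) - 1):              # rule 2: replace any II by MU
        if word[i:i + 2] == "II":
            out.add(word[:i] + "MU" + word[i + 2:])
    if word.endswith("U"):                      # rule 3: duplicate a word ending in U
        out.add(word + word)
    for i in range(len(word) - 1):              # rule 4: delete any UU
        if word[i:i + 2] == "UU":
            out.add(word[:i] + word[i + 2:])
    return out

def successors_digits(num):
    out = set()
    if num.endswith("1"):                       # rule 1: ...1 -> ...11 (times 10, plus 1)
        out.add(num + "1")
    for i in range(len(num) - 1):               # rule 2: replace any 11 by 30
        if num[i:i + 2] == "11":
            out.add(num[:i] + "30" + num[i + 2:])
    if num.endswith("0"):                       # rule 3: duplicate a number ending in 0
        out.add(num + num)
    for i in range(len(num) - 1):               # rule 4: delete any 00
        if num[i:i + 2] == "00":
            out.add(num[:i] + num[i + 2:])
    return out

def generate(start, successors, max_len=8):
    """All strings reachable from `start` without exceeding max_len characters."""
    seen, frontier = {start}, {start}
    while frontier:
        frontier = {w for s in frontier for w in successors(s)
                    if len(w) <= max_len} - seen
        seen |= frontier
    return seen

# The two descriptions coincide: encoding the letter-words as digits gives
# exactly the set generated by the digit-style rules.
encode = str.maketrans("MIU", "310")
letters = generate("MI", successors_letters)
numbers = generate("31", successors_digits)
print(sorted(w.translate(encode) for w in letters) == sorted(numbers))   # True
```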
Now think of a group of people for whom this is truly their language. The words have meaning to them, and they speak the language. For them there are grammatical rules that tell them which words are legal and how to assemble them into sentences (syntactic rules). Suppose one of them proposes building a computer that will generate the legal words in the language but in an arithmetic way, translating the transformation and syntactic rules into arithmetic rules. The computer will generate sentences and words exactly like the person it imitates, but it will do so in a completely different manner. Incidentally, the computer could also use a completely different grammatical system (we saw above that there is no reason to rule out such systems) and translate it into arithmetic. This would, of course, produce exactly the same language.
From the computer’s perspective, these are just collections of arithmetic operations, not communication via language. Whoever uses the computer can interpret the numbers it prints as words and sentences in the language and derive messages from them (which the computer, of course, does not “intend” to convey—the message exists only in the user’s mind and has no relation to what the computer actually does). One can say the computer operates syntactically—i.e., formally—while the human operates semantically—i.e., for him these are propositions with meaning. Thus, what distinguishes the human from the computer is that in the computer there is no semantics, only syntax. It moves from one structure to another, but meaning plays no role there, only in the user’s mind. In principle I might even generate the words and sentences of the language not by an arithmetic translation of the rules I described but by wholly different rules that yield the same words and sentences. For the computer this changes nothing so long as the same words and sentences are produced. For the human, however, syntax represents some content he wishes to express. Therefore, when he generates words and sentences, he will necessarily do so using the “correct” syntactic rules.
Moreover, a different group of people who use different syntactic and transformation rules might speak the same words and sentences, but their meanings will be entirely different (the semantics differ even though the syntax is identical). The computer serving that group will be the very same computer, with the very same program. The computer does not care which semantics you use to interpret its output. Each group will do something else with its output, but the computer will operate in exactly the same way for users from both groups. This illustrates the difference between semantics (meaning) and syntax, i.e., formal manipulations on words or numbers. Think of Reuven from Group A conversing with Shimon from Group B: they can hold a lively conversation even when there is no connection between the meanings in Reuven’s mind and those in Shimon’s (exactly like that old philosophers’ chestnut, the “problem of other minds”; see about it in column 251, in the series 379–381, and in column 399).
The upshot is that the computer generates its output in ways that are in no sense connected to how people think or speak. The analogy between the computer and the human is at the first stage (input) and the final stage adapted to the input (output), and of course in the correspondence between them. That correspondence will be preserved both in the computer and in the human. But the way we get from one to the other is entirely different in us and in it. And as we have seen, the way matters, for it is based on understanding. A person analyzing a problem or composing a sentence does so out of a relation to the contents and their meanings. The computer does it mechanically-formally; meaning plays no role for it. This is yet another angle from which one can see the difference between the computer’s mode of operation and our own as human beings. As noted, this returns us to the distinctions between use and meaning (Wittgenstein) and between semantics and syntax that arose at the start of the previous column. Once we understand how the computer operates, we can better grasp those distinctions.
However, the situation now becomes complicated again. There are new programs that indeed do think precisely according to the stages present in human thinking. Here the analogy to us appears almost perfect. No wonder that, with respect to these machines, many more people think one can say they truly think.
Step-by-step reasoning
My renewed interest in all these questions arose now because of a video I saw not long ago, which explains the difference between the Chinese DeepSeek model that just hit the market (as mentioned above) and earlier models. The basic claim is that DeepSeek thinks precisely according to the stages of human reasoning (the new ChatGPT model, called o1, does this as well). In light of what I argued above, this seems surprising. This machine does analyze the problem and solve it systematically, and not merely produce an output that happens to be the correct answer—raising anew the question of the difference between it and us. Can we say that in these machines there is now meaning and understanding, not merely syntactic manipulations?
When I watched the video, I tried to clarify a few more details with someone a bit more versed in the field. I raised two possibilities for how a machine like DeepSeek operates: (1) their machine undergoes ordinary training, but when one gives it a question and probes internally how it handled it, it turns out—surprisingly—that it thinks precisely according to the human stages (simply because that’s truly a sensible way to handle the problem). (2) the machine undergoes training different from what I’ve described so far: you feed it a given question, but it is required to output a different kind of answer—not only the final solution to the problem (as described above) but the entire chain of steps on the way to the solution. If it didn’t output the correct chain of steps (i.e., the human chain of steps) en route to the solution, it receives negative feedback—i.e., its weights are updated. This is a machine whose expected output is not the solution to the problem but the whole analysis and the solution at the end. It is trained to output that, not only the final answer. One can illustrate this by a student who answered correctly on a math question because two errors canceled each other out. The older machines I described would get 100 on such a test, since what is required of them is the correct answer; the route doesn’t matter. By contrast, this machine fails the test because what’s required of it is also the “correct” path (i.e., the human one), not only the solution.
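A toy way to picture the difference between the two kinds of feedback is sketched below: one scorer rewards only the final answer (the older style), the other also requires the chain of steps to match a human reference (possibility (2)). The “steps” and the exact-match scoring are invented stand-ins; real systems grade reasoning chains with far more sophisticated machinery.

```python
# A toy illustration of the two feedback regimes discussed above. The "reference
# steps" and exact string matching are invented stand-ins for illustration only.

def feedback_answer_only(model_output, correct_answer):
    """Older regime: only the final answer matters."""
    return 1 if model_output["answer"] == correct_answer else 0

def feedback_steps_and_answer(model_output, correct_answer, reference_steps):
    """Step-wise regime: the route must match the human route as well."""
    right_answer = model_output["answer"] == correct_answer
    right_route = model_output["steps"] == reference_steps
    return 1 if (right_answer and right_route) else 0

# The student whose two errors cancel out: a correct answer reached by a wrong route.
reference = {"steps": ["2 + 2 = 4", "4 + 2 = 6"], "answer": 6}
lucky_student = {"steps": ["2 + 2 = 5", "5 + 1 = 6"], "answer": 6}

print(feedback_answer_only(lucky_student, reference["answer"]))        # 1: full marks
print(feedback_steps_and_answer(lucky_student,
                                reference["answer"],
                                reference["steps"]))                   # 0: weights would be updated
```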
From my inquiries it emerged that the correct answer is (2). Already at the training stage the machine is required to output the steps of the analysis, not only the final output. A machine trained merely to output a correct result, as described above, will not operate along the human analytic stages. The conclusion is that nothing here differs from what we saw earlier. The machine is still trained to perform this task; only now the task is not to solve the problem but to find the human analytic steps, follow them, and reach the solution. That is a task like any other, and the machine handles it in the same mechanical way described thus far. In effect, we found a way to build a machine that imitates us better: it will not only happen to arrive (by different means than ours) at the correct output, as described above, but will arrive by the route a human takes. Once that is the required output in training, we can of course succeed at that too. This is no different from training a network to find solutions to all the problems described above. Finding human analytic steps is just another kind of task one can assign it.
True, in this sort of training it is hard to pre-define a single set of rules as the way to solve any given problem. It requires a general presentation of human thinking. But people use various methods such as reinforcement from hindsight or modeling the distribution over possible routes, and so on. One way or another, this is a difficulty for the programmer, not for the machine. Bottom line: this machine is not different from those described above.
One more side remark that helps us see where those who see such a machine as human thinking go wrong. The presenter in the video keeps saying this analysis teaches us how such models operate/think. But you can now see that this statement is entirely mistaken. The analysis teaches us nothing of the kind. We spoon-fed the machine how we think; it was trained accordingly; and lo and behold, it “thinks” like us. How it itself would have operated had it not been trained this way—i.e., had we required only the correct final answer during training—would have been entirely different. Think of a classical (non-AI) program built to answer how to approach solving an equation (a teacher program) versus a program built to solve the equation (a mathematician program). There is no necessary connection between them, and the first does not necessarily teach me about the second.
So, in effect we have discovered nothing here—except ourselves. We inserted into training the way we think, and discovered … the way we ourselves think. We learned nothing new about how the machines themselves operate. It strongly reminds me of Winnie-the-Pooh, who saw tracks in the snow and followed them to discover the mysterious creature that left them—then realized he was walking in circles and that they were his own tracks.
Which is better? Another look at machines and human beings
After I understood how the DeepSeek model works, I asked that person whether this approach yields better results than the non-stepwise approach. I’ll clarify the meaning of the question.
There are two approaches in game theory: (1) its aim is to seek the correct and most efficient solution to the problem; (2) its aim is to seek the way a person solving the problem operates (to imitate human reasoning). According to (1), game theory is a branch of mathematics. According to (2), it is a branch of psychology. Obviously this is not only a philosophical question: the procedures for solving a given problem may differ between the two approaches. Humans have blind spots—types of tasks where, due to the way we think, a person will act incorrectly or at least inefficiently. According to (1), game theory will seek the most efficient and correct solution (not necessarily the human one), whereas according to (2) we will find the human solution—and sometimes that will not be the best solution (because we have a bug in our reasoning for that kind of problem).
Returning to our topic, the question regarding DeepSeek is whether a machine that seeks the human path also arrives at the best solution. Is it more efficient and effective than the old machines that did not operate in steps? It turns out that in our case DeepSeek also gives the better solution (i.e., hits the truth on a higher percentage of tasks). See for example here regarding ChatGPT’s o1 model (algorithmically akin to DeepSeek). In practice it was shown that step-by-step operation (i.e., a machine that imitates us) is much better than a machine that operates directly (unlike humans).
If so, not only is the stepwise machine no closer to being a person, but this analysis highlights even more sharply the differences and further undermines the comparison between machine and human. We saw that the machine does not compete with the human and does not, by itself, arrive at the optimal route to a solution. If we build it to imitate us, it solves the problem best. In other words, we built a machine of imitation. It’s true that we did not do this by copying a human brain one-to-one, but indirectly—training a neural network on enormous information and examples—yet it remains only an indirect way to build an imitation of a person. Ultimately, we’re dealing with a machine trained to do what we do, a kind of parrot; and only for that reason does it achieve the best results. Without that, its performance would be poorer.
Thus it is hard to accept that even in these sophisticated models there is anything human. We saw that stepwise operation raises the doubt more sharply, but understanding how those steps are obtained shows that, as far as the relation between human and machine is concerned, we are still in the same place. It is essentially a device built by a human according to human logic—only faster and more precise—not essentially different from classical software. It is far less threatening to the definition of the human and to “the advantage of man.” As we shall see in the summary, stepwise operation actually sharpens the difference between human and machine even further.
An analogy to the debate over evolution
This analysis of stepwise operation parallels an argument I raised in the debate over God and evolution. As is known, believers claim it is implausible that our sophisticated world arose without a guiding hand (this is the physico-theological argument). It is like monkeys banging on a keyboard and out comes a Shakespeare sonnet. They argue that the chance of obtaining something like that without a guiding hand is minuscule—and the same holds for our universe and, in particular, life. Therefore it is clear there is a guiding hand—i.e., God. Many neo-Darwinians present a counter (a refutation) by the following experiment. Let a computer program randomly draw 13 letters one after another. What is the chance that the well-known Shakespearean phrase (from Hamlet) “to be or not to be” (the experiment is performed on the phrase without the spaces) will be obtained? Virtually zero. A computer randomly generating 13 letters at a time would not reach this sequence for thousands of years. By contrast, if one lets the computer generate one letter and, when it hits T, stops and moves on to drawing the next; when it hits O, stops and moves on; and so on until “TOBEORNOTTOBE” is formed—then the experiment finishes in a fairly short time (not that I understand why an experiment is needed; it’s a simple calculation one can do on paper—like an experiment to check whether the sum of angles in a triangle is 180°, or whether a fair die after many throws lands on 5 one-sixth of the time). The conclusion, claim the neo-Darwinians, is that when there are laws of nature that govern the world and create an evolutionary process, the chance of life arising increases dramatically—i.e., the time it would take for life to arise by chance is much shorter than in mere random drawing in a vacuum (without laws of nature). This experiment is presented as a demonstration of Dawkins’ argument against the physico-theological argument: believers, he says, do not understand that the laws of evolution “smooth the slope of Mount Improbable.”
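For the curious, here is a short sketch of the experiment just described: the 13-letter target drawn letter by letter, where each position is kept as soon as it hits the right letter, contrasted with the essentially zero probability of drawing all 13 letters correctly at once.

```python
import random
import string

# A sketch of the experiment described above: the target phrase without spaces
# (13 letters), drawn letter by letter, where each position is kept ("locked")
# as soon as it hits the right letter.

TARGET = "TOBEORNOTTOBE"
ALPHABET = string.ascii_uppercase

def directed_draw(target):
    """Redraw each position until it matches its target letter; count the draws."""
    draws = 0
    for letter in target:
        while True:
            draws += 1
            if random.choice(ALPHABET) == letter:
                break               # this position is locked; move on to the next one
    return draws

print(directed_draw(TARGET))        # typically a few hundred draws in total

# For comparison: drawing all 13 letters at once and hoping they all match has
# probability (1/26)**13 per attempt, about 4e-19, i.e. practically never.
print((1 / 26) ** len(TARGET))
```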
In my book God Plays Dice I explained why this argument is foolish. Not only does it not refute the physico-theological argument; it constitutes an excellent demonstration of the argument itself. What happened in the second experiment is that the programmer directed the process, and therefore it became possible. Without the programmer’s involvement, it truly would have been an implausible slope—only there was human involvement that smoothed it. It was not the machine that produced the result, but the human who programmed it. In the end, it was not a random emergence of the letter sequence; there was a guiding hand involved (the hand that wrote the program led the process to a pre-set goal: to produce “TOBEORNOTTOBE”). The same holds for evolution. There, too, the laws of nature (i.e., evolution) do not in fact undermine the physico-theological argument. God, who created the laws of nature, created them in such a way that the otherwise impossible process of the random emergence of life became possible thanks to them. The laws direct the process and enable life to arise with a reasonable probability and in reasonable time. This means that the emergence of life proves there was someone who created those laws—and that is precisely the physico-theological argument. Evolution is not a substitute for God but an excellent demonstration of His existence, for without a guiding hand the probability of blind evolution generating life is negligible.
I have formulated this in the past as follows: the physico-theological argument is not an argument “within the laws” but “outside the laws.” The question at its base is not how life arose (given the laws and within them)—to which evolution answers—but how the laws that enabled evolution and the emergence of life arose, and this is a philosophical question unrelated to scientific research and to evolutionary theory. In the parable: the question is how the program arose, not how the program produced the letter sequence.
Note that this is exactly what happens in training an AI network to operate step by step. The programmer causes a process that would not occur on its own to take place by dictating to the machine how to operate (and he does so by decomposing the process into a chain of successive stages of thought—in other words, the move to steps smooths the implausible slope). Therefore the conclusion here too is that the machine is nothing more than the programmer’s long arm, and without him it would not have truly managed to perform the task. It is very odd to claim that if the machine manages to perform tasks that humans perform, then it too is a complete person. This argument ignores the fact that the machine does so only thanks to human involvement that directs and generates the process. The machine is not truly random (just as evolution is not truly blind), for it operates under the direction of a human hand (the role played, in the case of evolution, by the divine hand).
This argument arises and is rejected in the same way even for the previous generation of machines—those that operate directly and not stepwise. Such a machine also arrives at marvelous achievements (as noted, it too passes Turing tests), and already there were those who wanted to regard it as a person. But as we saw, it was training that created the machine’s ability to operate (the information inserted into it during training and the feedback it receives), so in fact there was a human hand involved in the machine’s operation. The machine does not compete with the human; it performs what he inserts into it and dictates to it (even if indirectly). Regarding a machine that operates stepwise, we saw it achieves better results than the previous machine, and when we understand that this is only because it imitates its programmer’s human reasoning more accurately and closely, that demonstrates even more sharply why there is no basis to regard it as a person. This is a more intensive operation of the human being through the machine; hence the improved achievements. It simply imitates us better, and therefore succeeds more.
The conclusion is that these machines are imitators trained by humans to do what they do, even if the training in this case is indirect—unlike the classical computation described in the previous column. There is no essential difference between these new machines and the older-style computers, and it is hard to accept the claim that we are dealing with human beings—or even with human thinking.
A general lesson: how far can we reason about hypothetical situations?
Looking back, we can see that although these machines completely pass the Turing test (anyone who has conversed with such a model can attest), the doubt about whether we are dealing with a person remains. In my view, the discussion here shows clearly that there is no real doubt on the matter. To conclude, I wanted to show that this discussion teaches us a more general lesson.
For Turing, a situation in which a machine passes his test was science fiction. He could not imagine such a situation other than in the wildest imagination. He made a claim that sounded very reasonable at the time: a computer that passes his test is essentially a person. But when we actually reached that situation and experienced it firsthand (today we all converse with these machines), it turns out this claim is patently false. This means we must be cautious in drawing conclusions about situations we think we understand as long as we have not experienced them firsthand. Thought experiments and armchair reasoning are excellent tools—but “respect them and suspect them.”
A beautiful example is the “Mary’s Room” thought experiment. In several past columns (for example 142, 446, 452, 493, 662, 686 and more) I drew similar conclusions, among other things regarding halachic rulings in situations very far from the decisor’s world (such as in the Holocaust). My claim was that a decisor must not rule on questions concerning situations far removed from his world, since he cannot imagine the experience of one who lives in that situation. Theoretical, detached understanding—even with perfect knowledge and brilliant analytic skill—is insufficient for halachic ruling. The same applies to questions about the meaning of sacrifices. I have often been asked whether I yearn for the rebuilding of the Temple, and I answered: decidedly not. I have no desire to see priests slaughtering herds of animals in the courtyard and wading there up to their knees in blood. That is not my dream. But I added that one cannot take a position on such a state until we experience it. Perhaps when we live in a period with a Temple and sacrifices, we will understand that it contributes something to the spiritual and religious dimension of our lives that we cannot grasp in our world today. Therefore I think it is not appropriate to take a firm position on this.
This is another example of the same lesson we should also apply to machines and humans. It is good and important to think in advance about such situations and problems—even before we encounter them. That is part of the preparation needed for the encounter. It is futurism, and it has some value (not the “academic discipline”—ahem—of David Passig; that is worthless nonsense—see for example columns 88 and 663), but we must be very cautious about the conclusions drawn from such thinking. As long as we have not experienced something firsthand—as long as that future has not actually arrived—it is not right to form a decisive position on it. There is something very significant in direct contact with the situation, far beyond abstract and theoretical understanding.
In the last two columns I discussed whether the machine is a person. The conclusion is that the machine is a machine, not a person. But that of course assumes we are truly dealing with two different beings. In the next column I will address the reverse question, no less interesting and in fact arising from the same contexts: is a human being a machine? The achievements of the new machines raise the question whether our basic distinction between human and machine exists at all. Perhaps the human is nothing but a sophisticated machine (a biological computer, i.e., a computer whose hardware is made of flesh and blood), which would render moot the discussion in the last two columns (which, as noted, assumed these are two different beings).
[1] This is a one-dimensional network. Of course there can be a network in any dimension you like (with such a pattern of connections, this isn’t truly a one-dimensional network; one could draw it as a two-dimensional network with connections both within a row and from row to row).
Man is not a machine because he has feelings and yet he digs.
Thank you very much for the columns!
Is it possible (based on the difference you presented) to think of actions in which humans would always be better (based on the current model)?
This is the subject of columns 590-2. I have no way to answer this clearly.
Does the fact that it is actually man who did the thinking here, and that he only spoon-fed the machine (indirectly, etc.), mean that only man can truly innovate? (Because in fact the intellect is always built on existing material and can only refine it more and more.)
This is the topic of the next column (697)
In the Rabbi's opinion, is there a profession that will always be needed? (At least potentially)
I don't know.
Does the Rabbi know the following argument:
https://x.com/WhilleTrue/status/1902228039133446529
I know and I wrote it myself (I think in columns 590-92). But in my opinion he is wrong and out of date (as I was wrong about it in the past) because he did not really understand what artificial intelligence is today. The writer assumes that he is up to date with the principles, and in my opinion he is wrong about that. Today's LLM systems can go outside the system, since they operate in other ways. Although their operation is mechanical, a mechanical imitation of creative thinking is possible. This is the entire issue that I dealt with in the current series of columns, which is more up to date.
Hi
Regarding the article on man and machine (column 695).
As usual, smart and beautifully written.
The section on the difference between semantics and syntax suited me very well.
In 2005, Eshel and I wrote an article, which I have attached here, called
meaning based natural intelligence vs information based artificial intelligence.
It was published in a book called Cradle of Civilization. The idea was mine, but the writing was Eshel's.
I had another thought. The separation between mechanical calculation and understanding is artificial. We are simply a much more complex machine based on carbon instead of silicon. Understanding is an illusion of the machine that originates from a logical order that the machine manages to produce from a lot of information.
It is clear to me that no believer can accept such an approach, but I have not found a way to undermine it.
I really don't agree that a person is a biological computer, and your claim that our mental plane is an illusion is absurd in my opinion. Whose illusion? Without consciousness and mental dimensions, there is no one to have the illusion. A computer cannot deceive itself, regardless of its complexity, because there is no one to deceive. And this is the main difference between it and us. As an aside, I do not form positions on scientific and factual questions at all on the basis of my religious beliefs. I observe the world like any atheist, through reason and cognition. I am convinced that there is a fundamental difference between a person and a computer, without any (at least conscious) connection to my religious beliefs. This is based on logical and scientific considerations. Therefore, I suggest focusing on them. My religious beliefs are irrelevant to this discussion.
I will try to read the article you sent as soon as possible.
Thank you,
I included the article only because you talked, in your column, about semantics versus syntax, and it reminded me of the article from 2005, and not so that you would invest your time in it.
You write that your reference to the essential difference between a human and a computer is based on logical and scientific considerations. It is important for me to understand them.
As far as I am concerned, a scientific consideration should be based, even indirectly, on an observation that is not subjective.
Since I am not aware of such, I ask that you write these considerations, if only at the beginnings of chapters and titles.
I just got to the computer.
Regarding the question of illusion, I suggest reading the follow-up column I posted this morning (696). It seems to me that it sharpens the point, if at all it needs sharpening. My own position is described in the previous two columns. You will not find anything there about religious belief. These are purely logical and scientific arguments.
I wrote in the previous letter that these are logical and scientific considerations. Of course, in this case we are talking mainly about logic, since there are no parameters that can be measured here. Note that of course the opposite thesis is not based on anything else either. Both sides of the argument are like that. Therefore, I argue that my religious beliefs are irrelevant to the discussion, just as someone else's atheism is not relevant to his opposing position. The considerations should be examined on their own merits.
But nevertheless, there is also science in the background. For example, the thought experiment I defined regarding Buridan's man, which is based on science (where the symmetry of the solution is the symmetry of the problem). The perception of the difference between animate and inanimate is also a scientific matter (a computer is inanimate, although many forget this, for some reason). Certainly if psychology is considered a science in this regard, but also without it.
As mentioned, my reasons are described in the last columns and in the references given there (see in the first column the three possibilities for comparing a person to a machine). In short, an iron machine does not develop consciousness, understanding, or thinking. We have no scientific indication of this, and therefore all talk about it is a hallucination in my opinion. Unless we mean that the very thing the machine does is what is called thinking or understanding; but that is just category confusion. My scientific understanding says that a computer is nothing more than iron that transmits electric currents, and nothing more. This is a completely scientific claim. Seeing consciousness and thinking and understanding in a computer is about the same as seeing them in a stone. Would the claim that a stone is devoid of consciousness, understanding, and thinking be considered a scientific claim? I don't know, but this is certainly the accepted scientific view regarding inanimate objects. The same is true of a computer. It is no different in any way from a stone or water. There is nothing there beyond physical processes. The meaning of what it does (the semantics) is solely in the user's head. I have explained in these columns that saying that this machine has consciousness and mental dimensions is about the same as saying that an electrical circuit (a primitive computer that I described in column 694, a half-adder, or a multiplier) has them.
That is the essence. It seems delusional to me that an explanation is needed at all, and even stranger that this simple scientific conception gets attached to religious belief. If anything, I would attach the opposite conception to religious belief, detached as it is from materialism.
Thank you very much for the column.
It is easy to see that man built the AI machine and that it is not a man. But I think the more fundamental difference between man and machine is the difficult issue of consciousness (feeling). Even if man created and directed the machine and it does not think or understand, it can still imitate man perfectly (including something emotion-like, in the future); after all, I also teach and “spoon-feed” my child. What I mean is that if there is a robot that looks and behaves like a human (even in the emotional sense), it will be harder to claim it is not a human (if it imitated well enough; after all, I also “taught” my child), and we are left only with the claim that it lacks mental dimensions. What do you think?
I didn't understand the question. If you think the difference is awareness, then there is indeed a difference. Machines don't have emotion, and imitation is not emotion. It's also true that it's difficult to see this phenomenologically. So what's the question?