Q&A: Why Does Zipf's Law Work?
Question
Hello Rabbi,
There is an empirical statistical law called Zipf's law, which says that if you take any text and rank its words by frequency from highest to lowest, the frequencies fall off roughly like 1 divided by n, where n is the word's position in the frequency ranking. So, for example, the most frequent word will have about twice the frequency of the second word, about three times the frequency of the third, and so on. This is in itself a puzzling fact, discovered by the linguist Zipf, after whom the law is named. Moreover, it turns out that this phenomenon also holds in contexts completely different from linguistic ones: for example, city population sizes, income distributions, and earthquake magnitudes are all said to obey Zipf's law. My question is a bit philosophical. How can it be that all these unrelated things obey the same law? Why this law specifically and not some other one, and why is there any regularity at all in their distributions?
Best regards,
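To make the 1/n rule in the question concrete, here is a minimal sketch in Python. The function names `rank_frequencies` and `zipf_expected` are hypothetical helpers introduced only for illustration, and the sample string is a toy input constructed to fit the rule exactly, not real data.

```python
from collections import Counter
import re


def rank_frequencies(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def zipf_expected(top_count, rank):
    """Count the idealized 1/n rule predicts at a 1-based rank."""
    return top_count / rank


# Toy input built for illustration only; real texts fit far less neatly.
sample = "the " * 6 + "of " * 3 + "and " * 2
ranked = rank_frequencies(sample)

for rank, (word, count) in enumerate(ranked, start=1):
    predicted = zipf_expected(ranked[0][1], rank)
    print(rank, word, count, predicted)
```

On a real text the observed counts would only roughly track the predicted column, which is exactly the kind of deviation the answer below raises.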
Answer
Zipf's law is a well-known law, and it says that different things are distributed according to some power law, where the exponents can vary. I haven't researched the matter, so I'll write what occurs to me off the cuff.
Apparently, this means that this collection of phenomena is distributed according to a power law because of its nature (phenomena of a given kind are distributed according to the law that fits that kind). After all, there are other phenomena that are not distributed according to power laws but according to some other law. Zipf's law deals with those phenomena that are in fact distributed according to a power law, and if you collect phenomena on that basis, there is nothing surprising about the result.
But when I first heard about this law, I suspected that there is nothing special here: when you arrange a set of phenomena in ascending or descending order, there are many cases in which the distribution can be approximated by some power. Even then, there are always deviations, so I am somewhat doubtful whether the law really holds and the deviations are merely statistical, or whether there is in fact no power law at all and it is simply that many distributions can be approximated by a power. In any case, the law holds only for phenomena that have this distribution, and there are other distributions too.
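One rough way to look at the doubt raised above is to plot log frequency against log rank and read off the slope: an exact power law gives a straight line, and the slope is minus the exponent. The sketch below uses only the standard library; `loglog_slope` is a hypothetical helper name, and the two input lists are synthetic, generated from exact formulas rather than measured. A least-squares fit of this kind is a crude diagnostic, not a rigorous test that the data really follow a power law.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    `frequencies` must be positive and sorted from highest to lowest.
    A slope near -1 corresponds to the 1/n form.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic inputs for illustration: exact 1/n and exact 1/n^2 sequences.
ideal = [1000 / n for n in range(1, 51)]
steeper = [1000 / n ** 2 for n in range(1, 51)]

print(loglog_slope(ideal))    # close to -1
print(loglog_slope(steeper))  # close to -2
```

A fitted slope near some value does not by itself settle whether the data follow a power law or are merely well approximated by one over the observed range.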
Discussion on Answer
As far as I remember, I saw that there are quite a few different power laws. But even if the exponent is different, it may be that over a certain range it can be approximated by one divided by x.
I checked again, and it seems you are right: there is an extension of Zipf's law called the Zipf-Mandelbrot law, which allows more general exponents.
But Zipf's law itself is not just some arbitrary power law like one divided by x^2.6; it is one divided by x. And not only that, it is always that same exponent. It doesn't sometimes change to one divided by x^2. There is no "Zipf's Law 2" for phenomena distributed according to one divided by x^2. There is only one Zipf's law, with one particular nice round exponent.
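For reference, the Zipf-Mandelbrot form mentioned in this thread makes the frequency at rank k proportional to 1 / (k + q)^s, with s and q as free parameters. Setting s = 1 and q = 0 recovers the plain 1/n rule. The sketch below is a minimal illustration; the function name `zipf_mandelbrot` and its parameter names are chosen here and are not taken from any library.

```python
def zipf_mandelbrot(n_items, s=1.0, q=0.0):
    """Normalized Zipf-Mandelbrot probabilities for ranks 1..n_items.

    Each rank k gets weight 1 / (k + q) ** s; the weights are then
    divided by their sum so that the result adds up to 1.
    """
    weights = [1.0 / (rank + q) ** s for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


plain = zipf_mandelbrot(10)           # s = 1, q = 0: the classic 1/n shape
squared = zipf_mandelbrot(10, s=2.0)  # same family, exponent 2

print(plain[0] / plain[1])      # ratio of rank 1 to rank 2 under 1/n
print(squared[0] / squared[1])  # the same ratio under 1/n^2
```

In this family the exponent is a parameter to be fitted, so whether a given data set comes out close to 1 is an empirical question and is not built into the formula.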