On vague distinctions, causality, and Simpson’s paradox
An interesting post about binary differences versus statistical differences, and the implications for male and female brains.
I would love to hear your opinion.
Discover more from הרב מיכאל אברהם
Subscribe to get the latest posts sent to your email.
Indeed, this is a common mistake about vague distinctions, like the difference between religious Zionists and the ultra-Orthodox, or between left and right. Interesting and typical. Note the pictures of the cat and the dog there, which really do look similar. I have a feeling that it is not just about statistics. In my opinion, if we change the proportions of similarity and difference, we will not always end up with a dog or a cat, even if we keep most of the dog’s and the cat’s characteristics. There is something essential here beyond the collection of characteristics, but it is hard to capture and to put into words what the simple visual form conveys. Sometimes (and perhaps always), talking about a collection of characteristics reflects a limitation in our ability to define, not something in the things themselves.
See another very interesting post of his.
To be honest, I don’t agree with what he says. If I understood correctly, the tables simply reflect careless sampling. They do not represent randomly selected groups of people, but rather those who agreed to take the medicine. If you look, the two sub-tables do not add up to the overall one (there are 40 in total, and each sub-table also has 40). The men are distributed differently from the women (10 took it and 30 did not, and vice versa). In short, the whole thing looks to me like a deception, and the distinction between causation and correlation still stands. Furthermore, by his own method, when you bring in the story, you are the one introducing causality; it still does not emerge from the statistical data itself.
The cartoon there is beautiful. There seem to be two parallel planes on which the relationship between correlation and causation is examined: in the concepts themselves, and in the learning (did the learning change the position causally or not). But in my opinion, if you pay attention, there is a relation of implication on three different planes, not just two: the implication within the concept of causation itself (a cause implies a variable); the implication between correlation and causation (correlation implies causation); and the implication between learning statistics and learning to distinguish correlation from causation. Lovely. And each of them raises a question. Is there any causal implication at all, or is everything just correlation (David Hume’s dilemma)? Does correlation express causation, or is there merely a correlation between them (i.e., a correlation between correlation and causation)? And did the learning cause the change in perception, or is there just a correlation between them?
——————————————————————————————
Another asker:
By the way, I looked at the overall table: there are 80 in total and 40 in each sub-table, so it does seem to add up.
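To make the arithmetic concrete, here is a small Python sketch that checks whether two sub-tables add up to the overall table and compares recovery rates. The counts are hypothetical, chosen only to match the group sizes mentioned in this thread (80 patients, 40 of each sex, with a 30/10 split between taking and not taking the drug that runs in opposite directions for men and women); they are not taken from the post. The names `TABLES`, `pooled`, and `rate` are illustrative.

```python
from fractions import Fraction

# Hypothetical counts (not from the post): for each sex and treatment arm,
# (number recovered, number of patients).
TABLES = {
    "male":   {"drug": (18, 30), "no_drug": (7, 10)},
    "female": {"drug": (2, 10),  "no_drug": (9, 30)},
}

def pooled(tables):
    """Add the sub-tables cell by cell to get the overall table."""
    total = {"drug": [0, 0], "no_drug": [0, 0]}
    for table in tables.values():
        for arm, (recovered, n) in table.items():
            total[arm][0] += recovered
            total[arm][1] += n
    return {arm: tuple(cell) for arm, cell in total.items()}

def rate(cell):
    """Recovery rate of one (recovered, patients) cell, as an exact fraction."""
    recovered, n = cell
    return Fraction(recovered, n)

overall = pooled(TABLES)
patients = sum(n for _, n in overall.values())
print("total patients:", patients)
for sex, table in TABLES.items():
    print(sex, "patients:", sum(n for _, n in table.values()),
          "drug:", rate(table["drug"]), "no drug:", rate(table["no_drug"]))
print("overall drug:", rate(overall["drug"]), "no drug:", rate(overall["no_drug"]))
```

With these assumed numbers the drug does worse within each sex yet better in the pooled table, and the reversal comes entirely from the unequal split of who took it.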
——————————————————————————————
Rabbi:
I probably made a mistake because I was reading quickly. Thanks.
——————————————————————————————
Asker:
1. I don’t think what he’s talking about is consent to take the medication. It doesn’t play a significant role in the arguments he presents. For that matter, the same post could have been written if the sample had been random.
2. His question of whether there are additional variables that, if we knew them, could reverse the picture is, of course, an excellent question, and it has no answer. Not that we haven’t found the answer: the answer doesn’t exist (unless we have the overall equation of physics or something grandiose like that).
My professor in the relevant statistics course, when he taught Simpson’s paradox, said explicitly that there is no mathematical or statistical way to neutralize it. We have to think carefully about all the variables we manage to control for, and that is the best we have.
3. I have no idea what he means about the “best statisticians” who debated Simpson’s paradox. There is no debate here. When there is a Simpson’s paradox as described in the post, the answer is unequivocal: never give the drug, even when we don’t know the gender of the patient in front of us.
In my opinion, the formal reasoning is as follows. Denote the patient’s gender by G, where G = male or G = female. It is an unknown variable, but of course it is given and fixed with respect to the experiment. Writing C for taking the drug, ~C for not taking it, and E for recovery, we know that for every G,
P(E | C, G) < P(E | ~C, G)
So whichever gender the patient has, the drug lowers the chance of recovery. His mistake, in my opinion, is that there is a hidden condition on other variables here, which he simply did not write down. When you give a patient the drug and examine the result, that experiment is conditional on the patient’s gender, so the notation P(E|C) is sloppy (though acceptable, and there is no way to avoid it).
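The argument above can be illustrated with a short Python sketch, reading C as taking the drug, ~C as not taking it, and E as recovery (an assumption about the notation). It contrasts the naive P(E|C), which pools everyone who took the drug so that the gender mix is whatever it happened to be in that arm, with the adjusted quantity "sum over g of P(E|C,g) * P(g)", in which the gender weights are fixed and do not depend on treatment. Because every term on the drug side is smaller, the adjusted comparison keeps the within-gender inequality. Using this adjustment (standardization) formula assumes gender is the only variable that needs to be conditioned on, which is exactly what the thread says we cannot verify. The counts are hypothetical.

```python
from fractions import Fraction

# Hypothetical counts (not from the post): (recovered, patients) per gender and arm.
TABLES = {
    "male":   {"drug": (18, 30), "no_drug": (7, 10)},
    "female": {"drug": (2, 10),  "no_drug": (9, 30)},
}
TOTAL = sum(n for table in TABLES.values() for _, n in table.values())

def naive_rate(arm):
    """P(E | arm): pool everyone in the arm, whatever their gender mix."""
    recovered = sum(table[arm][0] for table in TABLES.values())
    patients = sum(table[arm][1] for table in TABLES.values())
    return Fraction(recovered, patients)

def adjusted_rate(arm):
    """Sum over g of P(E | arm, g) * P(g): gender weights do not depend on the arm."""
    result = Fraction(0)
    for table in TABLES.values():
        p_g = Fraction(sum(n for _, n in table.values()), TOTAL)  # P(g)
        recovered, patients = table[arm]
        result += Fraction(recovered, patients) * p_g
    return result

for arm in ("drug", "no_drug"):
    print(arm, "naive:", naive_rate(arm), "adjusted:", adjusted_rate(arm))
```

Under these assumed numbers the naive comparison favors the drug (1/2 vs 2/5) while the adjusted one does not (2/5 vs 1/2), in line with not giving the drug when the patient’s gender is unknown.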
——————————————————————————————
Rabbi:
That’s exactly what I wrote. He presents it as if there were a paradox, or a contradiction to the separation between correlation and causation. But the truth is that there are simply more hidden variables (agreement to take the medication, or any other variable; it doesn’t matter which). Therefore, the statement that correlation is not causation stands. In general, I didn’t understand the nature of this paradox, or why it is a paradox. It simply means that there are more variables, or that the sampling was not careful (I am not claiming it could have been more careful; usually we don’t know what the relevant variables are, as you wrote). What’s paradoxical about that?
——————————————————————————————
Asker:
If I understand correctly, the paradox he presents is the question of whether to give the medicine to a person whose gender you don’t know. In my opinion, there is no paradox in this, because the answer is no: the medicine is not given to anyone.
I think the point about causality is generally correct. If I understand correctly, Yehuda Perl’s point is probably this: if we knew the story behind how the correlation in front of us came about, we could avoid falling into Simpson’s paradox in the first place, because we would know that gender (for example) has an effect.
More fundamentally: we have a set of data that links a sample X (for example, people X1, X2, …, Xn) with some parameter Y (for example, Yi = 0 if Xi is healthy and Yi = 1 if Xi is sick).
We want to split X into two subgroups A and B (where B is the complement of A in X) and test whether belonging to group A is related to health (e.g., A is the group of people who took some medicine). Suppose we found such a relationship, and we are happy to conclude that belonging to group A is indeed related to health.
The problem is whether the division into A and B is justified. Perhaps if we add another division by some other parameter (for example, splitting A into men and women, and B into men and women as well), we will get different answers.
So, says Yehuda Perl, we should understand the story underlying the connection between A and health. If we understand this story, which describes the causality, we can decide whether to attribute health to membership in A, or whether there is a Simpson’s paradox here and we actually need to add the men/women division in order to “truly understand” why members of group A are healthier.
In fact, mathematically, we can often find some other, arbitrary division that reverses the trend. Men/women is simply a division we are used to, and one that probably makes sense in all sorts of situations, but statistics are blind to prior logic: you can usually find an arbitrary subgrouping that reverses the trend. So the story probably plays an important role in deciding which subdivision is justified to consider.
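The point that statistics alone cannot pick the “right” subdivision can be sketched in Python: the same pooled table is split in two different ways. The split by sex reverses the pooled trend; a second, hypothetical split by some other label (`label_a`/`label_b`, invented for illustration) does not. All counts are hypothetical. Nothing in the pooled numbers says which split to use; that choice has to come from outside the data, which is the role the thread assigns to the story.

```python
from fractions import Fraction

# Each stratum maps a treatment arm to (recovered, patients). Hypothetical counts.
SPLIT_BY_SEX = {
    "male":   {"drug": (18, 30), "no_drug": (7, 10)},
    "female": {"drug": (2, 10),  "no_drug": (9, 30)},
}
# A second, hypothetical partition of the same pooled counts by some other label.
SPLIT_BY_OTHER_LABEL = {
    "label_a": {"drug": (10, 20), "no_drug": (8, 20)},
    "label_b": {"drug": (10, 20), "no_drug": (8, 20)},
}

def pool(split):
    """Sum the strata cell by cell to recover the overall table."""
    total = {}
    for stratum in split.values():
        for arm, (recovered, n) in stratum.items():
            r0, n0 = total.get(arm, (0, 0))
            total[arm] = (r0 + recovered, n0 + n)
    return total

def drug_looks_better(table):
    """Compare the recovery rates of the two arms in one table."""
    (r_d, n_d), (r_n, n_n) = table["drug"], table["no_drug"]
    return Fraction(r_d, n_d) > Fraction(r_n, n_n)

def verdicts(split):
    """Per-stratum verdict for a given partition."""
    return {name: drug_looks_better(s) for name, s in split.items()}

# Both partitions describe the same overall data.
assert pool(SPLIT_BY_SEX) == pool(SPLIT_BY_OTHER_LABEL)
print("pooled:", drug_looks_better(pool(SPLIT_BY_SEX)))
print("by sex:", verdicts(SPLIT_BY_SEX))
print("by other label:", verdicts(SPLIT_BY_OTHER_LABEL))
```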
——————————————————————————————
Rabbi:
This is perfectly clear, but it seems trivial to me. If you know the story, you know the causality, but that is exactly the problem: correlations don’t give you the story. So what’s new here?
I also completely agree about the paradox: the medicine is not given, because there are other factors that influence the outcome and we haven’t identified them. Therefore our statistical picture is clearly incomplete, and it is wrong to rely on it. Again, I don’t really understand what’s new here.