Thursday 24 January 2013

Probability and sexism

Vincent Chapman recently put this comment on a post (Tuesday's child) from a couple of years ago, but I think it deserves a higher profile, so I'm pinching it as a fresh blog post. [Edit later: there has been a further comment on that original post]

My post had discussed this:
In summary, Bellos reports Gary Foshee saying:
"I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?"
We are then told that the answer is 13/27 (close to a half), but that if he'd not specified the Tuesday - if he'd just said:
"I have two children. One is a boy. What is the probability I have two boys?"
the answer would have been 1/3. 
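Both figures can be checked by brute-force counting. The sketch below assumes the usual textbook reading: boys and girls are equally likely, every weekday is equally likely, and the statement is treated as the bare fact "at least one child is a boy (born on a Tuesday)". The names `families`, `prob_two_boys` and `TUESDAY` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary index for Tuesday among the 7 weekdays (assumption)

# A child is a (sex, weekday) pair; a family is an ordered (elder, younger) pair.
children = list(product(("boy", "girl"), range(7)))
families = list(product(children, repeat=2))  # 14 * 14 = 196 equally likely families


def prob_two_boys(condition):
    """P(two boys | condition), counting every matching family once."""
    matching = [f for f in families if condition(f)]
    both_boys = [f for f in matching if all(sex == "boy" for sex, _ in f)]
    return Fraction(len(both_boys), len(matching))


plain = prob_two_boys(lambda f: any(sex == "boy" for sex, _ in f))
tuesday = prob_two_boys(lambda f: any(child == ("boy", TUESDAY) for child in f))

print(plain)    # 1/3
print(tuesday)  # 13/27
```

The counts behind the second figure are 27 families with at least one Tuesday boy, of which 13 have two boys.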
And here is Vincent's comment (my emphasis):
After a few minutes of trying to place my finger on exactly why the statement that one of the children is born on Tuesday seems to affect the probability of the other child being a boy, I realized that the exact same logic can be applied to the original "two boy puzzle".

The amazing truth is, that those people who automatically say the answer to the two boy puzzle is 1/2, are actually equally as correct as those who answer 1/3. The thing is, that somebody who walks up to you and says "I have two children. One of them is a boy. What is the probability I have two boys?" could have said "I have two children. One of them is a girl..." if they have a girl, but they would be forced to say they had a boy if they had two boys. Hence, if we assume that the person we are talking to is not sexist, then the probability that they have two boys, given that they volunteered the fact that they have a boy is actually twice the chance of each of the two instances of them having a girl. Hence the probability that the other child is a boy is equal to the probability that the other child is a girl and hence 1/2.

The pitfall of the majority of statisticians who claim the answer to this puzzle is 1/3, is that they have taken each possible permutation of the genders of the two children to be equally likely events, something which we simply cannot presume to be the case. Amusingly the actual answer ranges depending on how sexist the people around you are, where 1/3 is the answer in the case where the person would only ever volunteer the fact they have a girl if they didn't have any boys to brag about and 1/2 is the answer in the case where the person would randomly volunteer boy or girl if they had a boy and a girl with equal probability. The final case is where the person is entirely biased against boys, and in this case we could be absolutely sure that the other child is a boy, otherwise they would have told us they have a girl.
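The range described in this paragraph can be sketched as a small Bayes calculation. It assumes the four sex orderings are equally likely beforehand, and it introduces a hypothetical parameter `p_boy_if_mixed`: the probability that a parent with one boy and one girl chooses to mention the boy. Neither the function name nor the parameter comes from the original comment.

```python
from fractions import Fraction


def prob_two_boys_given_says_boy(p_boy_if_mixed):
    """P(two boys | parent volunteers "one is a boy") under the stated assumptions."""
    p = Fraction(p_boy_if_mixed)
    prior_two_boys = Fraction(1, 4)  # boy-boy
    prior_mixed = Fraction(1, 2)     # boy-girl or girl-boy
    # A two-boy parent must say "boy"; a mixed parent says "boy" with probability p;
    # a two-girl parent never says "boy", so that case drops out.
    says_boy_and_two_boys = prior_two_boys * 1
    says_boy_and_mixed = prior_mixed * p
    return says_boy_and_two_boys / (says_boy_and_two_boys + says_boy_and_mixed)


print(prob_two_boys_given_says_boy(1))               # 1/3: always mentions the boy
print(prob_two_boys_given_says_boy(Fraction(1, 2)))  # 1/2: picks a child at random
print(prob_two_boys_given_says_boy(0))               # 1: never mentions a boy if there is a girl
```

Under these assumptions the result simplifies to 1/(1+2p), which reproduces the three cases in the text.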

This conclusion links in quite nicely to ideas of information. A sexist person conveys more information about the gender of the other child by the choice they made of which child to volunteer, in the case where they have a boy and a girl. The only case where no information is conveyed is when the child is picked randomly. This is a similar conclusion to the one we came to when talking about the Game-Show Earthquake problem.

No doubt some people would shoot me down when I say this but I'm absolutely convinced that the actual answer to the two boy puzzle is in fact 1/2, despite it seemingly being taken as fact that the answer is 1/3. I say this because in REALITY, if someone walked up to you and asked this question the probability would be close to 1/2. Although ironically now that this puzzle has circulated, anyone who asked this question will probably be more likely to say boy because they know it as the "two boy puzzle" rather than the "two girl puzzle".

This also demonstrates what I always say about people misunderstanding statistics, even people who are absolutely sure they understand it, because those who answer 1/3 to the puzzle have made a similar mistake to those people who say things like "wow, if only I'd bet on that happening before the match! I'd be a millionaire!" The answer being well yes you would, but you'd have to have made presumptions before the match about something you simply couldn't have known was going to happen. In a similar fashion, we've made presumptions about the nature in which the question has been asked when we simply can't do so.

1 comment:

JeffJo said...

Say you have N identical boxes, N gold coins, and N silver coins. Put two coins into each box in such a way that M boxes have one of each kind. That means that (N-M) have just one kind: (N-M)/2 have two gold coins, and (N-M)/2 will have two silver coins.

Pick a box at random. The chances that it has two coins of the same kind are (N-M)/N. But what if, without looking, you reach into a box and pull out a coin in your closed fist so that you still can’t see it? Whatever color it is, there were [M+(N-M)/2]=(N+M)/2 boxes that had a coin of that color, and (N-M)/2 boxes that had two of them. So it would seem the probability this box has two coins of the same color is now (N-M)/(N+M). The probability changed simply because you took a coin out without looking at it. But it would change the same way if you did look, because it can’t depend on what you would see. How can that be?

This apparent paradox is known as Bertrand’s Box Paradox, named for Joseph Bertrand who first described it in 1889. He used N=3 and M=1, but the basic idea holds no matter what the values are. His point was that when you count cases for this type of problem, you shouldn’t count all of the cases that fit the observation. You should only count the proportion of each case that, when you allow all observations to be made equally, would result in what you observed. So in the above problem, of the M boxes that have a gold coin and a silver coin, you should only count M/2 of them because there are two different kinds of coins that could have been withdrawn.
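The two ways of counting can be put side by side in a short sketch. It assumes the drawn coin turns out to be gold (by symmetry silver gives the same numbers) and that the coin is drawn at random from the chosen box; the function name `same_kind_given_gold` is invented for this illustration.

```python
from fractions import Fraction


def same_kind_given_gold(n, m):
    """Return (naive, weighted) values of P(both coins match | a gold coin was drawn).

    n is the number of boxes and m the number of mixed (one gold, one silver) boxes.
    """
    two_gold = Fraction(n - m, 2)  # boxes holding two gold coins
    mixed = Fraction(m)            # boxes holding one coin of each kind
    # Naive count: every box that contains a gold coin counts in full.
    naive = two_gold / (two_gold + mixed)
    # Weighted count: a mixed box yields gold only half the time, so count m/2 of them.
    weighted = two_gold / (two_gold + mixed * Fraction(1, 2))
    return naive, weighted


print(same_kind_given_gold(3, 1))  # (Fraction(1, 2), Fraction(2, 3)), Bertrand's own numbers
print(same_kind_given_gold(4, 2))  # (Fraction(1, 3), Fraction(1, 2)), the two-children case
```

The naive value equals (N-M)/(N+M) and the weighted value equals (N-M)/N, so the weighted count agrees with the probability before any coin was drawn.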

If Gary Foshee had not mentioned Tuesday, the problem is a Bertrand’s Box Problem with N=4 and M=2. The wrong answer is (N-M)/(N+M)=(4-2)/(4+2)=1/3. It is wrong because it assumes the statement “Gary Foshee has a boy” is logically equivalent to “Gary Foshee tells you he has a boy.” They are not equivalent, because it is possible for Gary Foshee to tell you he has a girl, when he has a boy and a girl. The correct answer is (N-M)/N=(4-2)/4=1/2. And when you add in Tuesday, the problem is a little more complicated, but the result is the same. The answer is not 13/27, it is still 1/2, and your intuition that “Tuesday” shouldn’t matter is confirmed.
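The Tuesday claim can be checked with a weighted enumeration. This is only a sketch under one particular model of how the statement gets made: the parent picks one of the two children at random and reports that child's sex and weekday of birth. That model, the index used for Tuesday, and the name `prob_two_boys_random_statement` are all assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

STATEMENT = ("boy", 1)  # "a boy born on a Tuesday"; index 1 for Tuesday is arbitrary


def prob_two_boys_random_statement(statement=STATEMENT):
    """P(two boys | a randomly chosen child is described by `statement`)."""
    children = list(product(("boy", "girl"), range(7)))
    total = Fraction(0)
    two_boys = Fraction(0)
    for family in product(children, repeat=2):
        for described in family:  # each child is the one described with probability 1/2
            if described == statement:
                weight = Fraction(1, 2)
                total += weight
                if all(sex == "boy" for sex, _ in family):
                    two_boys += weight
    return two_boys / total


print(prob_two_boys_random_statement())  # 1/2
```

Under a different model, where the parent mentions a Tuesday boy whenever there is one, the same enumeration without the 1/2 weights gives 13/27 instead.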