Tuesday, 6 July 2010

Tuesday's child (or why New Scientist has got it wrong)

Well I believe they have.

First, as a preface to this post, I should say it is a not-entirely-irrelevant fact that I have two children, one of whom is a boy born on a Sunday.

There is a discussion on the letters page of the current (3rd July) New Scientist, debating an issue from an article in the 29th May edition (Mathemagical by Alex Bellos). In summary, Bellos reports Gary Foshee saying:
"I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?"
We are then told that the answer is 13/27 (close to a half), but that if he'd not specified the Tuesday - if he'd just said:
"I have two children. One is a boy. What is the probability I have two boys?"
the answer would have been 1/3.

The article concludes:
It seems remarkable that the probability of having two boys changes from 1/3 to 13/27 when the birth day of one boy is stated – yet it does, and it's quite a generous difference at that. In fact, if you repeat the question but specify a trait rarer than 1/7 (the chance of being born on a Tuesday), the closer the probability will approach 1/2.
Now there are lots of counter-intuitive things that emerge from probabilities (see for example the Monty Hall problem), and I believe you can set this one up as a perfectly valid counter-intuitive problem, as I shall discuss later, but in this case Bellos and the New Scientist have got it wrong.

The difficulty is that we are presented with Foshee volunteering the fact that he has a boy born on a Tuesday. The Bellos argument relies on everyone who has a boy born on a Tuesday telling you that fact. I too have a boy who was born on a Tuesday but I didn't tell you about that one, I told you about the one born on a Sunday. Consequently my case would not be included in the population used in Bellos's argument.

Imagine, if you will, a hall containing lots of parents. To be precise, there is one parent - say the father - from each of 1764 families with two children. (I have chosen 1764 deliberately, of course, so that all my numbers in what follows will be integers.) These 1764 families also happen to be exactly representative of the random distribution of gender and days of the week for their birthdays. (So exactly half the children are boys, and of those 1 in 7 is born on a Tuesday etc etc.)

From this hall, one father is going to come out and tell you the gender of one of his children and the day he/she was born.

Out comes Foshee, and says
"I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?"

We know what Bellos's answer would be: 13/27. But I'm suggesting that he's forgotten to take account of the fact that when I come out I only tell you about the son of mine that was born on a Sunday, even though I, too, have a son who was born on Tuesday.

Let's develop the thought experiment a bit more.

First, announce to everyone in the hall that all those who don't have any sons should go home. That gets rid of all the families with two girls, which is a quarter of all families, because there are four equally probable family structures: girl-girl, girl-boy, boy-girl, boy-boy. (Bellos and I agree on that.)

That takes away 1764/4 = 441 leaving 1323 families in the hall, all of whom have at least one son. In fact, the people in there now consist of 441 with two boys and 882 with one boy and one girl.

If any random parent comes out now, the probability that they have two sons is 1/3, again in agreement with Bellos.

Next, tell all the families who don't have any boys who were born on Tuesday to go home. The calculation of what that does to the numbers in the hall is a bit complicated, but I agree with the sums done by Bellos and the key point is that we lose a lot more of the ones who have only one boy than the ones who have two boys, because the ones with two boys have almost twice the chance that one of them was born on Tuesday. More specifically, of the 882 with only one boy, 6 out of every 7 depart, leaving 882/7 = 126. The calculation for the 441 with two boys is summarised below, but it ends up that 324 depart, leaving 117 (of whom 9 have both born on a Tuesday and 108 have one born on a Tuesday and the other born on a different day).

We are therefore left with 243 families, all of whom have at least one son born on a Tuesday. Of these, almost half (117/243 = 13/27) have two sons. This gets us to Bellos's answer. By restricting the people in the hall to those with a boy born on a Tuesday, the probability that any one of them has two sons is almost half. When we didn't specify the Tuesday, when we just required that they have a son, the probability that any one of them had two sons was a third. Adding in the day of the week has had the surprising effect of increasing the probability that any one of them has two boys from a third to almost a half.
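The hall can be checked by brute force. The following is a minimal sketch, assuming (as above) that sex and day of the week are independent and uniform, so that each of the 14 x 14 = 196 ordered combinations of two children accounts for exactly 9 of the 1764 families. The names and the numbering of the days are my own choices for illustration.

```python
from fractions import Fraction
from itertools import product

SEXES = ("boy", "girl")
DAYS = range(7)           # 0..6; which index stands for Tuesday is arbitrary
TUESDAY = 2
FAMILIES_PER_COMBO = 9    # 1764 families / 196 equally likely combinations

children = list(product(SEXES, DAYS))          # 14 kinds of child
families = list(product(children, repeat=2))   # 196 ordered (first, second) pairs

def count(keep):
    """Number of families in the hall (out of 1764) that satisfy `keep`."""
    return FAMILIES_PER_COMBO * sum(1 for f in families if keep(f))

def has_boy(f):
    return any(sex == "boy" for sex, _ in f)

def has_tuesday_boy(f):
    return any(child == ("boy", TUESDAY) for child in f)

def two_boys(f):
    return all(sex == "boy" for sex, _ in f)

with_boy = count(has_boy)
with_boy_two = count(lambda f: has_boy(f) and two_boys(f))
with_tue = count(has_tuesday_boy)
with_tue_two = count(lambda f: has_tuesday_boy(f) and two_boys(f))

print(with_boy, with_boy_two, Fraction(with_boy_two, with_boy))  # 1323 441 1/3
print(with_tue, with_tue_two, Fraction(with_tue_two, with_tue))  # 243 117 13/27
```

The first line of output is the hall after the families with no sons have gone home; the second is the hall after everyone without a Tuesday son has gone home.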

But this is not the case that Bellos has presented. He has not restricted the population in advance to those with boys born on Tuesday. He draws his conclusion from one person who happens to volunteer the fact.

Go back to the hall with all the families that have at least one boy. When anyone comes out of that hall, the probability that they have two sons was, as we saw, 1/3. Suppose that they all come out, one by one, and make a statement like Foshee's or mine. The same argument based on Tuesday can be made for any day of the week. So, for each one, on Bellos's reasoning, the probability that they have two sons is almost 1/2. And yet when they have all come out of the hall, you will find that only 1/3 of them had two sons. What's gone wrong?

It is a sort of double counting. Bellos's reasoning requires everyone who has a son born on Tuesday to tell you that. But since they are only telling you about one son, if they are doing that for Tuesday they can't all do it for Sunday as well. The point is you have to specify what day of the week you are using - which is what happened when we sent home everyone who didn't have a son born on Tuesday.
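The difference between the two situations can be made concrete with a small exact calculation. This is only a sketch under assumptions that go beyond what Bellos's article says: every father in the hall (at least one son) names the birth day of one son; with no preferred day, a father of two sons picks which son to mention at random; with a preferred day, any father who has a son born on that day mentions that son. The function name and the `preferred_day` parameter are hypothetical.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)
TUESDAY = 2   # arbitrary index for Tuesday
CHILDREN = list(product(("boy", "girl"), DAYS))

def p_two_boys_given_says(day_said, preferred_day=None):
    """P(two boys | father says 'one is a boy born on day_said')."""
    said = Fraction(0)          # total weight of fathers naming day_said
    said_and_two = Fraction(0)  # the part of that weight with two sons
    for family in product(CHILDREN, repeat=2):
        son_days = [day for sex, day in family if sex == "boy"]
        if not son_days:
            continue  # no sons: this father was sent home
        if preferred_day is not None and preferred_day in son_days:
            choices = [preferred_day]   # he always mentions the preferred day
        else:
            choices = son_days          # he mentions one of his sons at random
        for day in choices:
            if day == day_said:
                weight = Fraction(1, len(choices))
                said += weight
                if len(son_days) == 2:
                    said_and_two += weight
    return said_and_two / said

print(p_two_boys_given_says(TUESDAY))                         # 1/3
print(p_two_boys_given_says(TUESDAY, preferred_day=TUESDAY))  # 13/27
```

Under the random-mention assumption the answer is 1/3 for every day of the week, which is why the fathers filing out of the hall still average 1/3. Only when one day is singled out in advance does 13/27 appear.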

Coming back to Bellos's conclusion:
It seems remarkable that the probability of having two boys changes from 1/3 to 13/27 when the birth day of one boy is stated ...
I don't believe that simply stating the birth day does this. After all, everyone has some birth day, so if it did work in this way you wouldn't actually need to state the birth day. What makes the change is specifying the birth day.

Specifying the birth day excludes some families, making a real change. Just stating a birth day makes no change. Compare these two:

Case 1

Foshee says: "I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?"

You can answer: "1/3"

Case 2

Foshee says: "I have two children. One is a boy. What is the probability I have two boys?"

You say "Have you a boy that was born on Tuesday?"

Foshee says: "Yes"

You can answer: "About 1/2"

The difference between these two still seems pretty surprising, but not quite as bizarre as Bellos was claiming.

Acknowledgements: I would like to acknowledge the helpful conversations that I’ve had about this with my son, who was born on a Tuesday. He’s doing Further Maths at A-level and reckons this is typical of what he had to analyse in his stats and prob module.

Appendix: Calculation details

441 have two sons.
The probability that both are born on a Tuesday is 1/7 x 1/7 = 1/49. So there are 441/49 = 9 for which both are born on a Tuesday.
The probability that the first is born on Tuesday and the second some other day is 1/7 x 6/7 = 6/49. So there are 441 x 6/49 = 54 in this category.
The probability that the second is born on Tuesday and the first some other day is 6/7 x 1/7 = 6/49. So there are 441 x 6/49 = 54 in this category.
The probability that both are born on some day other than Tuesday is 6/7 x 6/7 = 36/49. So there are 441 x 36/49 = 324 with neither born on a Tuesday.
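For anyone who wants to check the arithmetic, here is a short sketch using exact fractions. The variable names are mine; the only assumption is the one used throughout, that each day of the week is equally likely.

```python
from fractions import Fraction

two_son_families = 441
p_tue = Fraction(1, 7)    # chance a given son was born on a Tuesday
p_other = 1 - p_tue       # chance he was born on some other day

both = two_son_families * p_tue * p_tue            # both born on a Tuesday
first_only = two_son_families * p_tue * p_other    # only the first born on a Tuesday
second_only = two_son_families * p_other * p_tue   # only the second born on a Tuesday
neither = two_son_families * p_other * p_other     # these families go home

staying = both + first_only + second_only
print(both, first_only, second_only, neither, staying)  # 9 54 54 324 117
```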


Vincent Chapman said...

I know it's been two years since this post, but I came back to it recently after looking at one of your other posts and it got me thinking again.

After a few minutes of trying to place my finger on exactly why the statement that one of the children is born on Tuesday seems to affect the probability of the other child being a boy, I realized that the exact same logic can be applied to the original "two boy puzzle".

The amazing truth is, that those people who automatically say the answer to the two boy puzzle is 1/2, are actually equally as correct as those who answer 1/3. The thing is, that somebody who walks up to you and says "I have two children. One of them is a boy. What is the probability I have two boys?" could have said "I have two children. One of them is a girl..." if they have a girl, but they would be forced to say they had a boy if they had two boys. Hence, if we assume that the person we are talking to is not sexist, then the probability that they have two boys, given that they volunteered the fact that they have a boy is actually twice the chance of each of the two instances of them having a girl. Hence the probability that the other child is a boy is equal to the probability that the other child is a girl and hence 1/2.

The pitfall of the majority of statisticians who claim the answer to this puzzle is 1/3, is that they have taken each possible permutation of the genders of the two children to be equally likely events, something which we simply cannot presume to be the case. Amusingly the actual answer ranges depending on how sexist the people around you are, where 1/3 is the answer in the case where the person would only ever volunteer the fact they have a girl if they didn't have any boys to brag about and 1/2 is the answer in the case where the person would randomly volunteer boy or girl if they had a boy and a girl with equal probability. The final case is where the person is entirely biased against boys, and in this case we could be absolutely sure that the other child is a boy, otherwise they would have told us they have a girl.
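The three cases above can be put into one formula. As a sketch, assume the four family structures are equally likely and let p be the probability that a parent with one boy and one girl chooses to mention the boy (p and the function name are labels introduced here for illustration). Then the probability of two boys, given that "boy" was volunteered, works out to 1/(1 + 2p).

```python
from fractions import Fraction

def p_two_boys_given_says_boy(p_say_boy_if_mixed):
    """P(two boys | parent volunteers 'one is a boy').

    p_say_boy_if_mixed is the assumed chance that a parent with one boy
    and one girl mentions the boy rather than the girl.
    """
    quarter = Fraction(1, 4)                           # each of GG, GB, BG, BB
    says_boy_two_boys = quarter                        # boy-boy: must say "boy"
    says_boy_mixed = 2 * quarter * p_say_boy_if_mixed  # boy-girl or girl-boy
    return says_boy_two_boys / (says_boy_two_boys + says_boy_mixed)

print(p_two_boys_given_says_boy(Fraction(1)))     # 1/3: always mentions a boy if possible
print(p_two_boys_given_says_boy(Fraction(1, 2)))  # 1/2: picks a child at random
print(p_two_boys_given_says_boy(Fraction(0)))     # 1: never mentions a boy if there is a girl
```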

This conclusion links in quite nicely to ideas of information. A sexist person conveys more information about the gender of the other child by the choice they made of which child to volunteer, in the case where they have a boy and a girl. The only case where no information is conveyed is when the child is picked randomly. This is a similar conclusion to the one we came to when talking about the Game-Show Earthquake problem.

No doubt some people would shoot me down when I say this but I'm absolutely convinced that the actual answer to the two boy puzzle is in fact 1/2, despite it seemingly being taken as fact that the answer is 1/3. I say this because in REALITY, if someone walked up to you and asked this question the probability would be close to 1/2. Although ironically now that this puzzle has circulated, anyone who asked this question will probably be more likely to say boy because they know it as the "two boy puzzle" rather than the "two girl puzzle".

This also demonstrates what I always say about people misunderstanding statistics, even people who are absolutely sure they understand it, because those who answer 1/3 to the puzzle have made a similar mistake to those people who say things like "wow, if only I'd bet on that happening before the match! I'd be a millionaire!" The answer being well yes you would, but you'd have to have made presumptions before the match about something you simply couldn't have known was going to happen. In a similar fashion, we've made presumptions about the nature in which the question has been asked when we simply can't do so.

JeffJo said...

This apparently got "bumped" on the search engines because of Vincent's comment, and I just found it. David Chapman had the right idea, but didn’t carry it far enough. To illustrate, I'm going to change the problem in a couple of ways that have absolutely no impact on how we should get to the answer. But I also need to point out that Gary Foshee asked this question at a conference honoring Martin Gardner, the long-time author of Scientific American's "Mathematical Games" column.

You run across an old school buddy - you were both in the Mathematical Games Club - on the street. While getting reacquainted, he tells you "I have two children, and one of them is a XXXXX. Given that information about one gender, what is the probability I have a boy and a girl?" But in the middle of his statement, a passing driver honked his horn, drowning out the gender your friend named (as represented by XXXXX). Can you still answer?

If you had actually heard your friend say "boy," then Gary Foshee says the answer is 2/3 (one minus the chance of two boys). Gary Foshee says this, because he remembers that Martin Gardner once said it for a similar problem. And I think, but it isn't clear, that David Chapman would agree.

But now we have a paradox. Because you would also have to say 2/3 if the obliterated word was "girl." And if the answer is 2/3 regardless of which word was obliterated, it has to be 2/3 even if your friend hadn't made the statement. Yet we know that probability is 1/2.

This is a variation of a famous problem introduced by Joseph Bertrand in 1889, called Bertrand's Box Paradox. His point, much like the one David didn’t take far enough, was that knowledge of the possible combinations is not enough to solve a problem like this. You also need to know why this particular fact was given to you, when other equivalent facts could have been.

If your friend actually has two boys, or two girls, there were no alternatives. But if he has one of each, he had to choose. And if you don't know how he made that decision, you can only assume he chose randomly. That is, he had a 50% chance to say "boy," and a 50% chance to say "girl." And that means that the answer Foshee and David should have is 1/2, not 2/3. Yes, 2/3 of all two-child families with at least one boy have a boy and a girl, but only half of the fathers who would randomly *tell* you about a boy do. Ironically, Foshee seems to have missed that Martin Gardner changed his answer in that recalled column for this reason.

And it doesn't matter if other facts are included in the XXXXX that you didn't hear. The answer is always 1/2 - exactly, given the assumptions about children's genders - no matter what is included. Vincent was being generous in claiming that 1/3 (for the original question) was equally as correct as 1/2. 1/3 can only be correct if, as David describes for birth-days but didn’t take far enough, the selected father was forced to tell you about a boy if he could.

David Chapman said...

Thank you for your comment, Jeff, it is fascinating how much there is to this.

In a similar case (though with Knights hunting dragons!) Chris Maslanka in his puzzle column in the Guardian argued a couple of weeks back that you could not give an answer for the probability, because you don't have enough information. You don't know what he's meaning when he says he has one male dragon.

However, I've now thrown the newspaper away and I don't think the puzzles appear in the online version of the paper so I can't check the details.

David Chapman said...

PS, I copied Vincent's comment into a later blog post.