Here is the problem: An American friend
of yours has two children. You know that one of them is a girl, but
cannot remember the gender of the other child. What is the
probability that they are both girls?
Many people would equate the lack of
information concerning the gender of the other child with equal
probability of the two possible genders, and answer 50%.
Their reasoning would be completely
wrong but, as it turns out, their answer would be correct, at least
in practical terms. If you have read my previous article on
probabilities
(http://giuliozambon.blogspot.com.au/2012/11/misunderstood-science-question-of.html),
you might know why. But let’s proceed in order.
For the genders of two children, there
are four possibilities: MM, MF, FM, and FF, which together form the
sample space of the problem. If we assume that boys and girls
are equally probable, the four possibilities are also equally
probable, with a probability of 25% each. Since you know that one of
the children is a girl, you can exclude the MM case. As a result, you
are left with three equally probable possibilities and can conclude
that the probability of both children being girls is 1/3, or
approximately 33.3%. It follows that the probability that your
friend’s other child is a boy is 66.7%.
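The counting argument above can be checked by brute force. The
following is only a minimal sketch, assuming (as in the text) that
the four ordered gender pairs are equally likely; the variable names
are mine.

```python
from fractions import Fraction
from itertools import product

# All ordered gender pairs for two children: MM, MF, FM, FF.
sample_space = ["".join(pair) for pair in product("MF", repeat=2)]

# Knowing that one of the children is a girl rules out MM.
with_girl = [s for s in sample_space if "F" in s]

# The remaining cases are equally likely, so count the favourable ones.
p_two_girls = Fraction(sum(s == "FF" for s in with_girl), len(with_girl))
p_other_is_boy = 1 - p_two_girls

print(with_girl)       # ['MF', 'FM', 'FF']
print(p_two_girls)     # 1/3
print(p_other_is_boy)  # 2/3
```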
But then, why did I say that the 50-50
answer is correct from a practical point of view? There are two
reasons:
1. No parent gives the same name to
their two daughters.
2. The frequencies (and hence, the
inferred probabilities) of given names are very low.
Let’s start by taking into
consideration that two daughters in the same family always have
different names. We do so by splitting the ‘F’ of the above
possibilities into ‘x’ and ‘f’, where ‘x’ indicates the
girls with a particular first name, and ‘f’ the other girls, who
have any other name. This results in a sample space consisting of
MM, Mf, Mx, fM, ff, fx, xM, xf, and xx.
‘x’ can be any name we want,
including the name of the daughter we know to belong to your friend’s
family (even if we don’t know that name). Then, after discarding MM
as we did before, we can also discard the possibilities that don’t
contain ‘x’. This leaves us with Mx, fx, xM, xf, and xx. Now,
as parents never give two daughters the same name, we can also
discard xx, and are left with the four possibilities Mx, xM, fx, and
xf.
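The same elimination can be written out mechanically. This is just
an illustrative sketch of the filtering steps described above; the
one-letter codes are those used in the text, and the variable names
are mine.

```python
from itertools import product

# M = boy, f = girl with any other name, x = girl with the chosen name.
sample_space = ["".join(pair) for pair in product("Mfx", repeat=2)]

# Keep only the families that include a girl named x
# (this also removes MM) ...
with_x = [s for s in sample_space if "x" in s]

# ... and drop xx, since two sisters are assumed never to share a name.
remaining = [s for s in with_x if s != "xx"]

print(sample_space)  # ['MM', 'Mf', 'Mx', 'fM', 'ff', 'fx', 'xM', 'xf', 'xx']
print(remaining)     # ['Mx', 'fx', 'xM', 'xf']
```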
If we assume that boys and girls are on
average equally probable and that the genders of children of the same
family are independent from each other, we can calculate the
probabilities associated with the four possibilities:
$P_{Mx} = P_{xM} = P_M \cdot P_x$
$P_{fx} = P_{xf} = P_f \cdot P_x$
We can use the frequency ‘y’ with
which the name ‘x’ occurs among girls as an estimate of its
probability, and rewrite the two expressions as follows:
$P_{Mx} = P_{xM} = P_M \cdot P_F \cdot y = 0.25 \cdot y$
$P_{fx} = P_{xf} = P_F \cdot (1 - y) \cdot P_F \cdot y = 0.25 \cdot (y - y^2)$
As you can see, if y^2 is
much less than y (i.e., y is much less than 1, as stated in our
condition 2), all four possibilities have, for all practical purposes,
the same probability. Two of the four (fx and xf) are families with
two girls, so the probability that the children are both girls is
indeed 50%.
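For readers who want the exact value rather than the approximation,
the four probabilities can be normalized. This is my own working of
the step, assuming that the four remaining cases are the whole sample
space once MM, the cases without ‘x’, and xx have been discarded:

```latex
P(\text{two girls})
  = \frac{P_{fx} + P_{xf}}{P_{Mx} + P_{xM} + P_{fx} + P_{xf}}
  = \frac{2 \cdot 0.25 \cdot (y - y^2)}{2 \cdot 0.25 \cdot y + 2 \cdot 0.25 \cdot (y - y^2)}
  = \frac{1 - y}{2 - y}
```

For y = 0 this is exactly 1/2, and it decreases only slowly as y
grows, which is why a rare name gives a value so close to 50%.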
But is it true that all names have a
frequency much less than 1? If you look at the web site of the US
Social Security Administration, you will find the page
http://www.ssa.gov/oact/babynames/limits.html from which you can
download the number of children born in any particular year and given
any particular name (but only if that name was given to at least five
children).
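If you want to extract a frequency from the downloaded data
yourself, something like the following could do. It is only a sketch
and rests on assumptions that are not stated in this article: that
each yearly file (here called yob2011.txt) contains one
comma-separated record per line in the form name,sex,count, with
‘F’ marking girls. The function name is hypothetical.

```python
import csv
from typing import Iterable


def name_frequency(lines: Iterable[str], name: str, sex: str = "F") -> float:
    """Return the share of listed births of the given sex that got `name`.

    Assumes each line looks like "name,sex,count" (an assumption about
    the file layout, not something confirmed in the article).
    """
    total = 0
    matched = 0
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        row_name, row_sex, row_count = row
        if row_sex != sex:
            continue
        count = int(row_count)
        total += count
        if row_name == name:
            matched += count
    if total == 0:
        raise ValueError("no rows found for the requested sex")
    return matched / total


# Usage, assuming yob2011.txt has been extracted from the download:
# with open("yob2011.txt", newline="") as fh:
#     y = name_frequency(fh, "Sophia")
```

Note that names given to fewer than five children are not in the
files, so a frequency computed this way is relative to the listed
births only.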
Let’s say that your friend’s
daughter was born in 2011. Then, you quickly find out that of the
33,723 names listed, out of a total of 3,623,043 girls, the most
frequent girl name was Sophia, which was given 21,695 times. If your
friend gave his daughter the name Sophia, then y = 21,695 /
3,623,043 = 0.0060, and the resulting probability for two girls is
around 49.85%.
Perhaps 49.85% is not close enough to
50%, but consider that the average occurrence of any name is
3,623,043 / 33,723 = ~107, which provides y = 0.00003. Then, the
probability of two girls, not knowing the name of the daughter your
friend certainly had, becomes 49.999%. Or perhaps you find out that
the name of your friend’s daughter is Hilde, which in 2011 only
occurred 5 times out of 3,623,043. In that case, the probability of
him having two daughters is almost exactly 50%.
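The arithmetic for the three cases can be rechecked with a few lines
of code. This is a sketch that simply plugs the counts quoted above
into the four probabilities 0.25 * y and 0.25 * (y – y^2) and
normalizes them; the function and constant names are mine.

```python
def p_two_girls(y: float) -> float:
    """Probability of two girls when one daughter has a name of frequency y."""
    p_boy_case = 0.25 * y              # each of Mx and xM
    p_girl_case = 0.25 * (y - y * y)   # each of fx and xf
    return 2 * p_girl_case / (2 * p_boy_case + 2 * p_girl_case)


TOTAL = 3_623_043  # total quoted in the text
NAMES = 33_723     # number of listed names quoted in the text

cases = [
    ("Sophia", 21_695),
    ("average name", TOTAL / NAMES),
    ("Hilde", 5),
]

for label, count in cases:
    y = count / TOTAL
    print(f"{label}: y = {y:.7f}, P(two girls) = {p_two_girls(y):.5%}")
```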
All in all, we can conclude that 50%
is, for all practical purposes, correct, even if the reasoning of
many people to reach that value is wrong.