Suvrat is perfectly correct, but in the interest
of
clarity, and yes
pedantry, I'd like to add
my $0.02.
It may not be clear from above how to combine
probabilities. You should associate AND
with multiplication and OR with addition.
Thus the probability of the person 1 having
a unique birthday AND person 2 having a unique birthday
AND etc. is
365 364 363 365-(n-1) 365! 1
--- X --- X --- X ... X --------- = ----- X -----
365 365 365 365 n! 365^n
Indeed, using this formula, here are the probabilities
that in a group of n people there will be at least
two people sharing the same birthday.
n probability
----- -------------
10 11.6948%
20 41.1438%
23 50.7297%
30 70.6316%
40 89.1232%
50 97.0374%
Well, what we neglected was leap day. This is
a good approximation, but let's see if we can get
the absolutely correct answer including leap day.
Person X has one of 366 possible birthdays, but
not with equal weight. To make it easier, let us
group 4 consecutive years together. The chance of
his/her birthday being January 21 is 4/(3*365 + 366)
vs. the chance it's February 29 which is 1/(3*365 + 366)
since there are 3*365 + 366 (=1461) days in 4 years, with
4 Jan 21th's and 1 Feb 29th. Note 3*365+366 = 4*365+1.
OK, remember we are going to compute the probability
that every one of the n people have unique birthdays,
than subtract that probability from 100%. So in our
computation we have two cases to consider: (A)
one where
no one is born on Feb 29 and (B) one where 1 person is born on Feb 29.
Let me drop the denominator, which will be n factors
of 1461.
Case (A) is the same as our approximation before,
except over 4 years instead of 1:
Num(PA) = (4*365 + 1)*(4(365-1)+1)
*(4(365-2)+1)*...*(4(365-(n-1))+1).
Believe it or not, this product has a name,
and is called the Pochhammer function, a ratio
of Gamma functions.
Case (B) is more complicated. Let's say that
the k'th person is the one with the Feb 29th birthday.
Then
Num(PB)(k) = (4*365 + 1)*(4(365-1)+1)*...*
(4(365-(k-2))+1)*(4(365-(k-2))+0)*(4(365-(k-1))+0)*...*
(4(365-(n-2))+0).
This is so convoluted it's just a unnamed
ratio of Gamma functions,
as far as I know.
OK, I'm getting tired. Still, we aren't done since
we have to sum over all possible k's. I'm not going
to write that down, since there isn't a pretty
closed form that would be enlightening. Schematically,
it looks like
Prob = (1460/1461)*Num(PA)/1461
+ (1/1461)SumkNum(PB)(k)/(1461*n)
Instead I wrote a
python program to compute both
the inexact and exact results. /msg me and I'll
e-mail it to you. Someone checking it would be
useful. In fact I'll check it more when I get time.
Here's a table of the
EXACT results
(to 6 digits)
n probability
----- -------------
10 11.6867%
20 41.1213%
23 50.7045%
30 70.6056%
40 89.1057%
50 97.0297%
As a physicist I have to question
my sanity and priorities in life for spending
1.5 hours of my life to compute a 10-3%
effect in such an academic problem, but I
did actually enjoy it.
More pedantry: what makes this a paradox???
Nothing! It's just farking cool!
Even more pedantry: derobert reminds me that leap day is
suspended on years divisible by 100 unless the year is also
divisible by 400, so my calculations are only exact for a room
full of people born no earlier than 1 January 1901. Since 2000
had a leap day, you can use my code reliably until 2100, when
your grandkid can add the appropriate correction factor.