A well-known ‘paradox’ in probability is the following: suppose we have 23 people in a room. Then the probability that at least two of them share a birthday is more than 50%. This is not a real paradox, but people generally find it somewhat surprising at first, since 23 is so much smaller than 365. In this post, we will look at how the distribution of birthdays affects this probability.
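Let’s suppose that birthdays are selected at random from $\{1, \ldots, 365\}$ uniformly (no leap years), and suppose we have $n$ people in a room. Then the probability that at least two people share a birthday is
$$P(\text{at least two people share a birthday}) = 1 - P(\text{no two people share a birthday}).$$
And the latter probability is easily shown to be
$$P(\text{no two people share a birthday}) = \frac{365 \cdot 364 \cdots (365 - n + 1)}{365^n} = \frac{(365)_n}{365^n}, \qquad (1)$$
where $(365)_n$ denotes the falling factorial.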
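As a quick numerical check of the uniform case, here is a minimal Python sketch of the falling-factorial formula. The function name `p_shared_birthday_uniform` and the `days` parameter are illustrative choices, not part of the original argument; `math.perm` (Python 3.8+) computes the falling factorial exactly.

```python
from math import perm


def p_shared_birthday_uniform(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday,
    assuming birthdays are uniform over `days` equally likely days."""
    if n > days:
        return 1.0  # pigeonhole: a collision is certain
    # perm(days, n) is the falling factorial days * (days - 1) * ... * (days - n + 1)
    return 1.0 - perm(days, n) / days**n


for n in (10, 22, 23, 30):
    print(n, round(p_shared_birthday_uniform(n), 4))
```

With 22 people the probability is still below one half; 23 is the smallest group size for which it exceeds 50%.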
More realistic probabilities
In real life, birthdays are not uniformly distributed. As with a lot of things, we might expect the symmetric uniform case to be either a maximum or a minimum for this problem; that is, we may expect the probability of a birthday collision to either always increase or always decrease when we move the class probabilities away from the uniform distribution. Furthermore, setting the probability of one day to 1 and the others to 0 makes a collision certain as soon as there are two people in the room, which suggests that the uniform case will be a minimum. This would mean that we can drop the assumption that birthdays are uniformly distributed and still claim that, among 23 people in a room, there is at least a 50% chance that two of them share a birthday. Let’s prove this.
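Suppose that for day $i$, the probability of one’s birthday being that day is $p_i$, with $p_i \ge 0$ and $\sum_{i=1}^{365} p_i = 1$.
It is quite straightforward to see that the probability of there being no collisions among $n$ people is
$$n! \sum_{1 \le i_1 < \cdots < i_n \le 365} p_{i_1} \cdots p_{i_n} = n! \, e_n(p_1, \ldots, p_{365}),$$
where $e_n$ is the $n$-th elementary symmetric polynomial: the $n$ people must have $n$ distinct birthdays, and each set of $n$ distinct days can be assigned to them in $n!$ orders. By Maclaurin’s inequality, $e_n(p_1, \ldots, p_{365}) / \binom{365}{n} \le \left( e_1(p_1, \ldots, p_{365}) / 365 \right)^n = 365^{-n}$,
hence the probability of no collisions is bounded above by
$$n! \binom{365}{n} \frac{1}{365^n} = \frac{(365)_n}{365^n},$$
which is exactly the probability in (1). Thus if we subtract the probability of no collisions from 1, we see that the result is at least as large as the corresponding probability for the uniform distribution, which proves the claim.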
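The bound can also be checked numerically. The sketch below assumes the no-collision probability for a general distribution is $n!$ times the $n$-th elementary symmetric polynomial of the day probabilities, and evaluates that polynomial with a standard dynamic-programming recurrence. The function name `p_no_collision` and the skewed example distribution (a sinusoidal seasonal pattern) are hypothetical, chosen only for illustration; they are not real birthday data.

```python
from math import factorial, perm, pi, sin
from typing import Sequence


def p_no_collision(p: Sequence[float], n: int) -> float:
    """Probability that n independent draws from the distribution p are all
    distinct, computed as n! * e_n(p)."""
    # e[k] holds the k-th elementary symmetric polynomial of the p_i seen so far.
    e = [1.0] + [0.0] * n
    for p_i in p:
        # Update from high k to low k so that each p_i is used at most once.
        for k in range(n, 0, -1):
            e[k] += p_i * e[k - 1]
    return factorial(n) * e[n]


days, n = 365, 23
uniform = [1.0 / days] * days

# Hypothetical seasonal weights (an assumption for illustration only).
raw = [1.0 + 0.2 * sin(2 * pi * d / days) for d in range(days)]
skewed = [w / sum(raw) for w in raw]

print("uniform, closed form:", perm(days, n) / days**n)
print("uniform, via e_n:    ", p_no_collision(uniform, n))
print("skewed,  via e_n:    ", p_no_collision(skewed, n))
```

For the uniform distribution the two computations should agree up to floating-point error, and any non-uniform distribution should give a no-collision probability that is no larger.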
In practice, the real probability of two people sharing a birthday barely deviates from the uniform case. Running millions of simulations in which we bootstrap from raw birthday data shows that, with 23 people in a room, one would expect the probability to be between 0.507 and 0.508, essentially the same as with the uniform distribution (about 0.5073).
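A simulation of this kind might look like the following minimal sketch. The raw birthday data is not reproduced here, so the code draws from a synthetic, mildly seasonal set of weights instead; those weights, the function name `simulate_shared_birthday`, and the trial count are all assumptions made for illustration. With real data, one would replace `weights` with the observed count of births on each day.

```python
import math
import random
from itertools import accumulate


def simulate_shared_birthday(weights, n=23, trials=20_000, seed=0):
    """Monte Carlo estimate of the probability that at least two of n people
    share a birthday, with birthdays drawn in proportion to `weights`."""
    rng = random.Random(seed)
    days = range(len(weights))
    cum_weights = list(accumulate(weights))  # precomputed once for speed
    hits = 0
    for _ in range(trials):
        sample = rng.choices(days, cum_weights=cum_weights, k=n)
        if len(set(sample)) < n:  # fewer distinct days than people
            hits += 1
    return hits / trials


# Synthetic stand-in for real birth counts (an assumption, not actual data).
weights = [1.0 + 0.05 * math.sin(2 * math.pi * d / 365) for d in range(365)]
print(simulate_shared_birthday(weights))
```

With only 20,000 trials the standard error is roughly 0.0035, so far more trials are needed to resolve the third decimal place.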