The Birthday Problem with Generalisations I

A well known ‘paradox’ in probability is the following : suppose we have a set of 23 people in a room.  Then the probability that at least two people share a birthday is more than 50%.  This is not a real paradox, but people generally find this somewhat surprising at first, since 23 is so much smaller than 365.  In this post, we will look at how the distribution of birthdays affects the probability.

Simplest case

Let’s suppose that birthdays are selected at random from \{ 1, \dots , 356 \} uniformly (no leap years), and suppose we have n people in a room.  Then the probability that at least two people share a birthday is

1 - \mathbb{P}(\text {no pair of people share a birthday})

And the latter probability is easily shown to be

\frac{365}{365} \cdot \frac{364}{365} \cdots \frac{365 - n + 1}{365} = \frac{(365)_n}{365^n},        (1)

where (n)_m denotes the falling factorial.

More realistic probabilities

In real life, birthdays are not uniformly distributed.  As with a lot of things, we might expect the symmetrical uniform case to be either a maxima or minima to this problem, that is we may expect the probability of a birthday collision to definitely increase or decrease, if we change the class probabilities away to anything other than the uniform.  Furthermore, setting the probability of one day to 1 and the others to 0, suggests that it will be a minimum.  This would mean that we can drop the assumption that birthdays are uniformly distributed to claim that 23 people in a room have at least a 50% chance of sharing a birthday.  Let’s prove this.

Suppose that for day 1 \leq n \leq 365 , the probability of one’s birthday being that day is p_n , with 0 \leq p_n \leq 1 , and \sum_{i=1}^{365} p_n = 1

It is quite straightforward to see that the probability of there being no collisions is

\sum\limits_{\substack {A \subset \{1, \dots , 365 \} \\ | A | = n }} n! \prod\limits_{p \in A} p \,\,\, .  ,        (2)

This is the n^{\text{th}} elementary symmetric polynomial of p_1, \dots ,p_{365}.  We can apply Maclaurin’s inequality to this.  We rewrite this in the notation on the wikipedia page to see that this is

n! {365 \choose n} S_n ,

and S_n \leq S_1^n , where

S_1 = \frac{\sum_{i = 1}^{365} p_i}{365} = \frac{1}{365},

hence the probability of no collisions is bounded above by

n! {365 \choose n} \frac{1}{365^n} = \frac{(365)_n}{365^n},

which is exactly the probability in (1), thus if we subtract the probability of collisions from 1, we see that this is at least as large as the probability from the uniform distribution, which proves the claim.

In practice, the real probability of two people sharing a birthday barely deviates from the case with the uniform distribution, running millions of simulations where we bootstrap from raw birthday data, shows that if we have 23 people in a room, one would expects the probability to be between 0.507 and 0.508, the same as with the uniform.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s