This is a follow-on from the previous post about the birthday problem. Now we will look at the more general case where we want more than two people to share a birthday. Incidentally, my interest in this problem comes from the fact that we used to use it as a coding/probability exercise for data science interviews at Qubit (not anymore!), and from how surprisingly little seems to be written about it online.

We will be using the falling factorial notation $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$ to ease exposition slightly (because writing LaTeX in WordPress is not so fun).

To simplify, we will assume that birthdays are uniformly distributed throughout the year. Consider the case where we want to know if three or more people from a group of $N$ share a birthday. There are other discussions of this on the internet, e.g. this post, but what follows is an easier derivation than what I have seen. Again, we look at the complement probability: let's try and work out the probability that all days have at most two people with that as their birthday. We will break into cases based on how many pairs of people share a birthday, which we will label $k$, so that $0 \le k \le \lfloor N/2 \rfloor$. The idea is to count how many ways we can split our $N$ people into $k$ pairs and $N-2k$ singletons, and count the distinct number of ways to put these into 365 bins. This is:

$$N^{\underline{2k}} \cdot \frac{1}{2^k} \cdot \binom{365}{k} \cdot (365-k)^{\underline{N-2k}}$$

where the four terms are respectively: the number of ways of selecting $k$ pairs from the $N$ people (as an ordered list of $2k$ people), a correction term, since we can swap two people in a pair around, the number of ways of selecting the $k$ 'pair-days' from 365, and finally the number of ways of selecting days for the remaining $N-2k$ people from the remaining $365-k$ days. Simplifying this slightly and taking the sum over $k$, we have the probability of $N$ people having at least one triple is

$$1 - \frac{1}{365^N} \sum_{k=0}^{\lfloor N/2 \rfloor} \frac{N^{\underline{2k}} \, 365^{\underline{N-k}}}{2^k \, k!}$$
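
As a sanity check on the triple case, here is a minimal Python sketch that evaluates the complement count, i.e. the sum over $k$ of $N^{\underline{2k}} \, 365^{\underline{N-k}} / (2^k \, k!)$, with exact integer arithmetic. The names `falling` and `prob_at_least_one_triple` are made up for this illustration, and it assumes 365 equally likely birthdays as in the text.

```python
from fractions import Fraction
from math import factorial

DAYS = 365  # assumption: uniform birthdays, no leap day


def falling(x, k):
    """Falling factorial x * (x - 1) * ... * (x - k + 1); equals 1 when k == 0."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


def prob_at_least_one_triple(n_people):
    """Exact probability that some day is the birthday of 3 or more people."""
    no_triple = 0
    for k in range(n_people // 2 + 1):
        # Ways to pick k unordered pairs from the group (always an integer).
        pairings = falling(n_people, 2 * k) // (2 ** k * factorial(k))
        # The k pairs and n_people - 2k singletons each get a distinct day.
        no_triple += pairings * falling(DAYS, n_people - k)
    return 1 - Fraction(no_triple, DAYS ** n_people)


if __name__ == "__main__":
    print(float(prob_at_least_one_triple(88)))
```

With three people the only way to get a triple is for all of them to match, and the function returns exactly $1/365^2$ in that case. For 88 people the value comes out just above one half, and for 87 just below.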

It seems unlikely that there is a clean closed formula to calculate the general case here (one was claimed in this stackexchange post, but it's wrong because of some erroneous double counting). A natural way to approximate it would be to run a simulation (and in interviews we would push the candidate towards this), but for curiosity's sake, I'm going to show how you can work it out precisely.
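
For reference, the simulation approach could look something like the sketch below. It is only an illustration, not an interview model answer: the function name, the default trial count and the fixed seed are all assumptions of mine, and birthdays are drawn uniformly from 365 days.

```python
import random
from collections import Counter


def simulate_shared_birthday(n_people, group_size, trials=20000, days=365, seed=0):
    """Estimate P(at least `group_size` of `n_people` share a birthday)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        # Count how many people land on each day in this trial.
        counts = Counter(rng.randrange(days) for _ in range(n_people))
        if max(counts.values(), default=0) >= group_size:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(simulate_shared_birthday(23, 2))
```

More trials tighten the estimate at the cost of run time; the standard error shrinks like one over the square root of the number of trials.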

If we think about the case where we want there to be no quadruples, it is already getting quite daunting to try and use our previous strategy to break this up in terms of smaller subgroups (we'd have to look at all the ways that different numbers of triples, pairs and singletons which add up to $N$ can be arranged into 365 bins). You can do it with an induction onto lower terms (left as an exercise!), but it's messy, slow, and memory intensive to implement. Instead, we are going to try and reduce the number of *days* we need to think about. To this end, let $f(n, d, m)$ be the number of ways of pigeonholing $n$ objects into $d$ bins so that there are no more than $m$ objects in any one bin. Then the probability that amongst $N$ randomly selected people, there is one set of at least $M$ who all share a birthday is

$$1 - \frac{f(N, 365, M-1)}{365^N}$$

Consider a binning that is counted by $f(n, d, m)$. Let's assume there are $k$ objects in the first bin, so that $0 \le k \le \min(n, m)$. There are $\binom{n}{k}$ sets of objects this could be. So considering what remains, and summing over all valid $k$'s, we have the recurrence

$$f(n, d, m) = \sum_{k=0}^{\min(n, m)} \binom{n}{k} \, f(n-k, d-1, m)$$

This together with the base cases $f(n, 1, m) = 1$ for $n \le m$ and zero otherwise, allows us to calculate the probability exactly. Here is some Python code to do this (this is essentially a tail recursion on $d$ up to 365). Note the use of Decimal, to keep precision.

    import math
    from decimal import Decimal, getcontext

    # bit of a trade off between speed/memory and precision here
    getcontext().prec = 50


    def binom(n, m):
        # math.comb (Python 3.8+) gives the exact integer binomial coefficient
        return Decimal(math.comb(n, m))


    def get_birthday_prob(N, M):
        """Probability that at least M of N people share a birthday."""
        current_row = [Decimal(0)] * (N + 1)
        # base case: a single bin can hold n objects only if n <= M - 1
        for k in range(min(M, N + 1)):
            current_row[k] = Decimal(1)
        # recursion: add one day (bin) at a time until there are 365
        for d in range(1, 365):
            new_row = [Decimal(0)] * (N + 1)
            for n in range(N + 1):
                s = Decimal(0)
                # m objects go in the newly added bin, at most M - 1 of them
                for m in range(min(n + 1, M)):
                    s += binom(n, m) * current_row[n - m]
                new_row[n] = s
            current_row = new_row
        # a Decimal power avoids the float rounding and overflow of math.pow
        complement_prob = current_row[N] / Decimal(365) ** N
        return float(1 - complement_prob)


    print(get_birthday_prob(23, 2))  # classic birthday problem, about 0.507
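
One way to gain some confidence in the recurrence (the count for $d$ bins is the sum over $k$ of $\binom{n}{k}$ times the count for $n-k$ objects in $d-1$ bins) is to compare it against brute-force enumeration on tiny cases. The sketch below does that with plain integers; `count_binnings` and `brute_force_count` are hypothetical helper names, and the memoised recursion is only meant for small inputs, not for 365 bins.

```python
from collections import Counter
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def count_binnings(n, d, m):
    """Ways to put n labelled objects into d bins with at most m per bin."""
    if d == 1:
        return 1 if n <= m else 0
    # k objects go into the first bin; the rest go into the other d - 1 bins.
    return sum(comb(n, k) * count_binnings(n - k, d - 1, m)
               for k in range(min(n, m) + 1))


def brute_force_count(n, d, m):
    """Same count, by listing every assignment of objects to bins."""
    total = 0
    for assignment in product(range(d), repeat=n):
        if all(c <= m for c in Counter(assignment).values()):
            total += 1
    return total


if __name__ == "__main__":
    for n in range(6):
        for d in range(1, 4):
            for m in range(1, 4):
                assert count_binnings(n, d, m) == brute_force_count(n, d, m)
    print("recurrence matches brute force on all small cases tried")
```

The brute-force side grows like $d^n$, so this only works as a check on very small inputs, but it exercises both the base case and the sum.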

Hi Mike, to cut a long story short, I wanted to know the probability of 5 people in a group of 521 sharing the same birthday. It’s for something of no great consequence, but I am not enough of a mathematician (or Python programmer) to apply your formula, which I came across while searching for a solution to this question, so I wondered if you could simply apply it for me and send me the result. In case you’re interested, I have data on 521 people’s birthdays, and there are 7 different birthdays that are shared by 5 people (5 is the highest number of shared birthdays in the data). Is that within the expected probability?


Heya Harry, I ran a simulation: the chance that 7 or more days are each a birthday shared by at least 5 people is about 30%, so not out of the question!
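
A simulation for this kind of question could be sketched as follows. This is not the code behind the figure quoted in the reply; the function names, trial count and seed are assumptions for illustration, and birthdays are again drawn uniformly from 365 days.

```python
import random
from collections import Counter


def count_crowded_days(birthdays, group_size):
    """Number of distinct days shared by at least `group_size` people."""
    return sum(1 for c in Counter(birthdays).values() if c >= group_size)


def estimate_crowded_days_prob(n_people=521, group_size=5, min_days=7,
                               trials=2000, days=365, seed=0):
    """Estimate P(at least `min_days` days each have >= `group_size` birthdays)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n_people)]
        if count_crowded_days(birthdays, group_size) >= min_days:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(estimate_crowded_days_prob())
```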
