Birthday Problem

Suppose there are \(n\) (\(n\leq 365\)) students in a class. Assuming there are 365 days in a year, what is the probability that at least two students have the same birthday?

Solution: Define the event \(A=\{\text{at least two students have the same birthday}\}\),

then \(A^c=\{\text{All students have different birthdays}\}.\)

\(\#\) of elements in the sample space = \(\#(S)=365^n\),

\(\#\) of elements in the event \(A^c\) = \(\displaystyle \#(A^c)=P(365,n)=\frac{365!}{(365-n)!}\), the number of ordered selections of \(n\) distinct days out of 365.

So, the answer is \[P(A)=1-P(A^c)=\displaystyle 1-\left(\frac{365!}{(365-n)!}\right)\Big/365^n.\]
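The ratio in the answer can also be written as a product of \(n\) factors, which is often easier to evaluate and to interpret: student number \(k+1\) has to avoid the \(k\) birthdays already taken. This is only a restatement of the formula above; the index \(k\) is introduced here for illustration.

```latex
\[
P(A^c)=\frac{365!}{(365-n)!\,365^n}
      =\frac{365}{365}\cdot\frac{364}{365}\cdots\frac{365-n+1}{365}
      =\prod_{k=0}^{n-1}\left(1-\frac{k}{365}\right),
\qquad
P(A)=1-\prod_{k=0}^{n-1}\left(1-\frac{k}{365}\right).
\]
```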

The Exact Probability

The number of students

n<-24 

Then the number of elements in the sample space \(\#(S)\) is \(365^n=3.1262863\times 10^{61}\)

The exact probability of all birthdays being different, \(P(A^c)=(365!/(365-n)!)/365^n=\) 0.4616557

The exact probability of at least two sharing birthdays, \(1-P(A^c) =\) 0.5383443
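The code that produced the three numbers above is not shown, so the following is a minimal sketch of one way to obtain them, not necessarily the way they were computed. It evaluates \(P(A^c)\) as a product of \(n\) ratios rather than through factorials, because factorial(365) overflows in double precision. The names size.S, PAc.exact and PA.exact are chosen here for illustration.

```r
n <- 24                      # number of students, as above

# size of the sample space, 365^n
size.S <- 365^n

# P(A^c) as a product of n ratios: (365/365)*(364/365)*...*((365-n+1)/365)
PAc.exact <- prod((365 - 0:(n - 1)) / 365)

# P(A) by the complement rule
PA.exact <- 1 - PAc.exact

signif(size.S, 8)
round(PAc.exact, 7)
round(PA.exact, 7)
```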

Empirical Estimation of the Probability

One can estimate complicated probabilities empirically by simulation. In particular, to estimate \(P(A)\) where \(A\) is an event in a sample space \(S\), one first simulates (or generates) \(N_{sim}\) outcomes from \(S\) (all outcomes if possible, otherwise a large number of them) and counts the number of times the event \(A\) occurs among these simulated outcomes, denoted \(N_A\). Then, \[P(A) \approx \frac{N_A}{N_{sim}}.\] Since the number of outcomes in the sample space is \(365^{24} \approx 3.13\times 10^{61}\), we cannot list all possible selections of 24 numbers from \(1,2,\ldots,365\) and determine which of them contain a repeated number. Instead, we can sample 24 numbers from \(1,2,\ldots,365\) a large number of times (each such sample of 24 numbers is one outcome) and determine which samples contain a repeated number (or which contain no repeats, to count the samples in the complement).
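To make the two options concrete (listing all outcomes versus sampling), here is a small sketch on a toy version of the problem in which full enumeration is feasible. The values d = 5 possible birthdays and m = 3 students are hypothetical and chosen only so that the sample space has \(5^3=125\) outcomes. All variable names are illustrative.

```r
d <- 5   # hypothetical number of possible "birthdays" (toy value, not 365)
m <- 3   # hypothetical number of students (toy value)

# list all d^m outcomes; each row is one assignment of birthdays to the m students
S <- expand.grid(rep(list(1:d), m))

# event A: the row contains at least one repeated value
in.A <- apply(S, 1, anyDuplicated) > 0
N.A <- sum(in.A)
P.enum <- N.A / nrow(S)                       # exact, by counting

# the same probability from the counting formula
P.formula <- 1 - prod((d - 0:(m - 1)) / d)

# Monte Carlo estimate N_A / N_sim from sampled outcomes
set.seed(1)
Nsim.toy <- 10000
P.sim <- mean(replicate(Nsim.toy, anyDuplicated(sample(d, m, replace = TRUE)) > 0))

c(outcomes = nrow(S), N.A = N.A, P.enum = P.enum, P.formula = P.formula, P.sim = P.sim)
```

Here enumeration gives 65 of the 125 outcomes in \(A\), i.e. \(P(A)=0.52\), and the simulated proportion should land close to that value.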

Now, we estimate this probability with Monte Carlo simulations, i.e., we sample \(n=24\) numbers from \(1,2,\ldots,365\) with replacement \(N_{sim}\) times (Nsim in the code), compute the proportion of samples in which all numbers are distinct (this estimates \(P(A^c)\)), and subtract this proportion from 1 (this estimates \(P(A)\)).

Longer Code

Nsim<-10000 #number of simulations

all.diff.bday<-logical(Nsim) #preallocate one TRUE/FALSE result per simulation
for(i in 1:Nsim) {
  b.days<-sample(365,n,replace=TRUE) #this is sampling with replacement for picking birthdays for n students
  all.diff.bday[i]<-length(unique(b.days))==n #this records TRUE if all n numbers are different
}
1-mean(all.diff.bday)
## [1] 0.5415

The empirical estimate of the probability of all birthdays being different, i.e., the empirical estimate of \(P(A^c)\), is 0.4585, which is the ratio of TRUEs (i.e., all different birthdays) to the number of simulations. The empirical estimate of \(P(A)\), the probability of at least two students sharing a birthday, is then \(1-0.4585=0.5415\).

Shorter Code

cnt<-0 #number of simulations in which all n birthdays are different
for(i in 1:Nsim) {
  if(length(unique(sample(1:365,n,replace=TRUE)))==n) cnt<-cnt+1
}
1-cnt/Nsim
## [1] 0.5347

One-line Code

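#each row of the Nsim x n matrix is one simulated class; anyDuplicated returns 0 when a row has no repeats
PAc<-mean(apply(matrix(sample(1:365,Nsim*n,replace=TRUE),ncol=n),1,anyDuplicated)==0)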
1-PAc
## [1] 0.5379

Note that these three estimates differ from one another because of the randomness in sampling \(n=24\) numbers from \(1,2,\ldots,365\).
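How much run-to-run variation to expect can be gauged with the standard error of a simulated proportion. The sketch below assumes that the \(N_{sim}\) simulations are independent, so the count of successes is binomial; the names se and est are illustrative, and the three estimates are copied from the outputs above.

```r
Nsim <- 10000                 # number of simulations, as above
PA.exact <- 0.5383443         # exact P(A) for n = 24, as reported above

# standard error of a proportion estimated from Nsim independent trials
se <- sqrt(PA.exact * (1 - PA.exact) / Nsim)

# the three estimates reported above, and their distance from the exact value in units of se
est <- c(longer = 0.5415, shorter = 0.5347, one.line = 0.5379)
round(se, 4)
round((est - PA.exact) / se, 2)
```

With \(N_{sim}=10000\) the standard error is about 0.005, and all three estimates lie within one standard error of the exact value 0.5383443.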

Plots of the Exact and Estimated Probabilities

The Exact Probability: We compute the probability of birthday sharing for \(n=2,\ldots,121\) (stored as nseq below); larger values of \(n\) are omitted because \(P(A)\) is virtually 1 for them.

nseq<-2:121 #sequence of number of students 

PA.seq<-vector() #P(A) for each n=2,3,...,121
for (i in seq_along(nseq))
{
#the exact probability of all birthdays being different, P(A^c)
#(for n=121, 365^n overflows to Inf in double precision, so PAc evaluates to 0)
PAc<-factorial(nseq[i])*choose(365,nseq[i])/365^nseq[i]
#the exact probability of at least two sharing birthdays
PA.seq<-c(PA.seq,1-PAc)
}
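The factorial-based expression in the loop works with very large intermediate numbers (365^n exceeds the largest double once n reaches 121). As an alternative sketch, not the method used above, the same probabilities can be obtained from a running product of the ratios \((365-k)/365\), which stays between 0 and 1 throughout. The names PAc.all, PA.stable and check.n are illustrative; pbirthday from the stats package is used only as a cross-check.

```r
nseq <- 2:121

# cumulative product of (365-k)/365 for k = 0,...,120 gives P(A^c) for n = 1,...,121
PAc.all <- cumprod((365 - 0:(max(nseq) - 1)) / 365)
PA.stable <- 1 - PAc.all[nseq]   # P(A) for n = 2,...,121

# cross-check against stats::pbirthday for a few values of n
check.n <- c(2, 24, 60, 121)
cbind(n = check.n,
      cumprod = PA.stable[check.n - 1],       # nseq starts at 2, so n sits at index n-1
      pbirthday = sapply(check.n, pbirthday))
```

Because no large numbers are formed, this version could be extended to any \(n\leq 365\) without overflow.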

Empirical Estimation of the Probability:

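PA.sim<-vector() #simulation estimate of P(A) for each n in nseq
for (i in seq_along(nseq))
{
  cnt<-0 #number of simulations in which all nseq[i] birthdays are different
  for(j in 1:Nsim) {
    if(length(unique(sample(1:365,nseq[i],replace=TRUE)))==nseq[i]) cnt<-cnt+1
  }
  PA.sim<-c(PA.sim,1-cnt/Nsim)
}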

The plot of the probability of birthday sharing as a function of the number of students \(n\).
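The plotting code is not shown here, so the following is only a sketch of one way such a figure could be drawn with base R graphics. The seed, the reduced number of simulations Nsim.plot, the colours and the legend placement are all assumptions made for this sketch, and the exact curve is computed with a running product so that the block runs on its own.

```r
set.seed(1)                   # assumption: fixed seed so the figure is reproducible
nseq <- 2:121
Nsim.plot <- 2000             # assumption: fewer simulations than above, to keep the sketch fast

# exact P(A) for each n in nseq
PA.seq <- 1 - cumprod((365 - 0:(max(nseq) - 1)) / 365)[nseq]

# simulated P(A): proportion of simulated classes with a repeated birthday
PA.sim <- sapply(nseq, function(n)
  mean(replicate(Nsim.plot, anyDuplicated(sample(365, n, replace = TRUE)) > 0)))

# exact curve as a line, simulated values as points
plot(nseq, PA.seq, type = "l", lwd = 2,
     xlab = "number of students (n)", ylab = "P(at least two share a birthday)")
points(nseq, PA.sim, pch = 20, cex = 0.6, col = "red")
legend("bottomright", legend = c("exact", "simulated"),
       lty = c(1, NA), pch = c(NA, 20), col = c("black", "red"))
```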

Notice that the exact and the estimated probabilities are virtually the same.