Suppose there are \(n\) (\(n\leq 365\)) students in a class. Assuming there are 365 days in a year, what is the probability that at least two students have the same birthday?
Solution: Define the event \(A=\{\text{at least two students have the same birthday}\}\),
then \(A^c=\{\text{All students have different birthdays}\}.\)
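Each student's birthday can be any of the 365 days, so the number of elements in the sample space is \(\#(S)=365^n\), and we assume these outcomes are equally likely.
The number of elements in the event \(A^c\) is the number of ordered selections of \(n\) distinct days, \(\displaystyle \#(A^c)=P(365,n)=\frac{365!}{(365-n)!}\).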
So, the answer is \[P(A)=1-P(A^c)=\displaystyle 1-\left(\frac{365!}{(365-n)!}\right)\Big/365^n.\]
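As a quick sanity check of this formula (an illustrative case that is not part of the original solution), the ratio can be written as a product of \(n\) factors, and for a small class of \(n=3\) students it gives a small probability: \[P(A^c)=\prod_{k=0}^{n-1}\frac{365-k}{365},\qquad n=3:\quad P(A)=1-\frac{365\cdot 364\cdot 363}{365^3}\approx 1-0.9918=0.0082.\]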
The number of students
n<-24
Then the number of elements in the sample space \(\#(S)\) is \(365^n=3.1262863\times 10^{61}\).
The exact probability of all birthdays being different, \(P(A^c)=(365!/(365-n)!)/365^n=\) 0.4616557
The exact probability of at least two sharing birthdays, \(1-P(A^c) =\) 0.5383443
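The code that produced these values is not shown above. A minimal sketch that reproduces them, assuming the product form of \(P(A^c)\) (the names `days`, `n.outcomes`, `p.all.diff`, and `p.shared` are illustrative and not from the original), might look like:

```r
n <- 24       # number of students
days <- 365   # number of days in a year (illustrative name)

# size of the sample space, #(S) = 365^n
n.outcomes <- days^n

# P(A^c) = (365/365) * (364/365) * ... * ((365 - n + 1)/365)
p.all.diff <- prod((days - 0:(n - 1)) / days)

# P(A) = 1 - P(A^c)
p.shared <- 1 - p.all.diff

print(c(n.outcomes = n.outcomes, p.all.diff = p.all.diff, p.shared = p.shared))
```

Writing \(P(A^c)\) as a product of \(n\) ratios avoids forming \(365!\) directly, which is too large for double-precision arithmetic.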
One can estimate complicated probabilities empirically by simulation. In particular, to estimate \(P(A)\), where \(A\) is an event in a sample space \(S\), one first simulates (or generates) \(N_{sim}\) outcomes from \(S\) (all outcomes if possible, otherwise a large number of them) and counts the number of times \(A\) occurs among these simulated outcomes, denoted \(N_A\). Then \[P(A) \approx \frac{N_A}{N_{sim}}.\] Since the number of outcomes in the sample space is \(365^{24} \approx 3.13\times 10^{61}\), we cannot list all possible selections of 24 numbers from \(1,2,\ldots,365\) and determine which of them contain at least one repeated number. Instead, we can sample 24 numbers from \(1,2,\ldots,365\) a large number of times (each such sample of 24 numbers is one outcome) and determine which samples contain at least one repeated number (or which contain no repeats, to count the samples in the complement).
Now we estimate this probability with Monte Carlo simulation: sample \(n=24\) numbers from \(1,2,\ldots,365\) with replacement Nsim times, compute the proportion of samples in which all numbers are distinct (this estimates \(P(A^c)\)), and subtract this proportion from 1 (this estimates \(P(A)\)).
Nsim <- 10000 # number of simulations
all.diff.bday <- logical(Nsim) # one TRUE/FALSE result per simulation
for (i in 1:Nsim) {
  # sample with replacement to pick birthdays for n students
  b.days <- sample(365, n, replace = TRUE)
  # record TRUE if all n birthdays are different
  all.diff.bday[i] <- length(unique(b.days)) == n
}
1-mean(all.diff.bday)
## [1] 0.5415
The empirical estimate of the probability of all birthdays being different, i.e., the empirical estimate of \(P(A^c)\), is 0.4585, which is the proportion of TRUEs (i.e., samples with all different birthdays) among the simulations. Hence the empirical estimate of \(P(A)\), the probability of at least two students sharing a birthday, is 0.5415.
# Second approach: keep a running count of the samples with all birthdays different
cnt <- 0
for (i in 1:Nsim) {
  if (length(unique(sample(1:365, n, replace = TRUE))) == n) cnt <- cnt + 1
}
1 - cnt/Nsim
## [1] 0.5347
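# Third approach: draw all samples at once (one row per sample); anyDuplicated() is 0 for a row with no repeats
b.days.mat <- matrix(sample(1:365, Nsim*n, replace = TRUE), ncol = n)
PAc <- mean(apply(b.days.mat, 1, anyDuplicated) == 0)
1 - PAc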
## [1] 0.5379
Note that these three estimates differ slightly from one another (and from the exact value) because of the randomness in sampling \(n=24\) numbers from \(1,2,\ldots,365\).
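The size of this sampling variability can be gauged with the usual binomial standard error of a simulated proportion, \(\sqrt{p(1-p)/N_{sim}}\). The sketch below is an illustration rather than part of the original analysis; it plugs in the exact value and the three estimates reported above, and the names `est`, `se`, and `z` are assumptions.

```r
p.exact <- 0.5383443                 # exact P(A) for n = 24, from above
Nsim <- 10000                        # number of simulations used above
est <- c(0.5415, 0.5347, 0.5379)     # the three reported estimates

# standard error of a proportion estimated from Nsim independent samples
se <- sqrt(p.exact * (1 - p.exact) / Nsim)

# distance of each estimate from the exact value, in standard errors
z <- (est - p.exact) / se

print(round(se, 4))
print(round(z, 2))
```

All three estimates lie within one standard error (about 0.005) of the exact value, which is consistent with sampling variability alone.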
The Exact Probability: We compute the probability of birthday sharing for \(n=2,\ldots,121\) (denoted as nseq below). Larger values of \(n\) are omitted, since \(P(A)\) is then virtually 1.
nseq <- 2:121 # sequence of number of students
PA.seq <- numeric(length(nseq)) # P(A) for each n = 2, 3, ..., 121
for (i in seq_along(nseq)) {
  # the exact probability of all birthdays being different, P(A^c)
  PAc <- factorial(nseq[i])*choose(365, nseq[i])/365^nseq[i]
  # the exact probability of at least two sharing birthdays
  PA.seq[i] <- 1 - PAc
}
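One caveat about the factorial-based formula: in double-precision arithmetic \(365^n\) exceeds the largest representable number once \(n\geq 121\), and the numerator also appears to overflow for slightly larger \(n\), so the formula cannot be pushed much further as written. A minimal sketch of an alternative that avoids large intermediate values, assuming the product form \(P(A^c)=\prod_{k=0}^{n-1}(365-k)/365\) (the names `PAc.all`, `PA.stable`, `n.check`, and `PA.fact` are illustrative), is:

```r
nseq <- 2:365   # every class size up to the number of days

# cumulative product of (365 - k)/365 for k = 0, 1, ..., 364;
# element n of this vector is P(A^c) for n students
PAc.all <- cumprod((365 - 0:364) / 365)

# P(A) for n = 2, ..., 365 (illustrative name, not from the original)
PA.stable <- 1 - PAc.all[nseq]

# compare with the factorial-based formula where that formula is still safe
n.check <- 2:100
PA.fact <- 1 - factorial(n.check) * choose(365, n.check) / 365^n.check
print(max(abs(PA.stable[n.check - 1] - PA.fact)))

print(PA.stable[nseq == 24])   # class of 24 students
print(is.finite(365^121))      # FALSE: the denominator has overflowed
```

Because every factor in the product lies between 0 and 1, the running product can only shrink toward 0, so no intermediate quantity overflows.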
Empirical Estimation of the Probability:
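PA.sim <- numeric(length(nseq)) # estimated P(A) for each n in nseq
for (i in seq_along(nseq)) {
  cnt <- 0 # number of samples with all birthdays different
  for (j in 1:Nsim) {
    if (length(unique(sample(1:365, nseq[i], replace = TRUE))) == nseq[i]) cnt <- cnt + 1
  }
  PA.sim[i] <- 1 - cnt/Nsim
}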
The plot of the probability of birthday sharing as a function of the number of students \(n\).
Notice that the exact and the estimated probabilities are virtually the same.
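The code for the plot is not shown above. A minimal sketch of how such a comparison plot could be drawn with base R graphics follows; it recomputes both curves so that it runs on its own, uses a smaller `Nsim` and a fixed seed purely to keep the example quick and reproducible, and the names `sim.PA`, `PA.exact`, and `PA.est` are illustrative assumptions.

```r
set.seed(1)      # fixed seed (an assumption) so the sketch is reproducible
nseq <- 2:121    # class sizes, as above
Nsim <- 2000     # fewer simulations than above, to keep the sketch fast

# exact P(A) via the product form of P(A^c)
PA.exact <- 1 - cumprod((365 - 0:(max(nseq) - 1)) / 365)[nseq]

# estimated P(A): proportion of simulated classes with a repeated birthday
sim.PA <- function(n) {
  mean(replicate(Nsim, anyDuplicated(sample(365, n, replace = TRUE)) > 0))
}
PA.est <- sapply(nseq, sim.PA)

# exact curve as a line, simulated values as points
plot(nseq, PA.exact, type = "l", lwd = 2,
     xlab = "number of students (n)", ylab = "P(at least two share a birthday)")
points(nseq, PA.est, pch = 20, col = "red")
legend("bottomright", legend = c("exact", "simulated"),
       col = c("black", "red"), lty = c(1, NA), pch = c(NA, 20))
```

Drawing the exact values as a line and the simulated values as points makes any systematic gap between the two easy to spot.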