Skip to main content

Balls and Bins

We throw mm balls into nn bins. Assume that any ball is equally likely to land into any bin. We can represent each outcome by a function f:[m][n]f: [m] \rightarrow [n] where f(i)=jf(i) = j means that the ii-th ball lands into the jj-th bin. This gives us a sample space Ωm,n={f:[m][n]}\Omega_{m,n} = \{f:[m] \rightarrow [n]\}, so we have Ωm,n=nm|\Omega_{m,n}| = n^m. Define the uniform distribution P[f]=1/nm{\mathbb P}[f] = 1/n^m, so this is clearly a probability space.

Example 34

What is the probability that the first bin is empty?

Proof:

Let A={f:(i[m])f(i)1}A = \{f: (\forall i\in [m])f(i) \neq 1\}. Notice that A=(n1)m|A| = (n-1)^m, and therefore, P[A]=A/Ωm,n=(11/n)m{\mathbb P}[A] = |A|/|\Omega_{m,n}|= (1-1/n)^m. If we use the estimate 11/ne1/n1-1/n \approx e^{-1/n}, we see that this probability is em/ne^{-m/n} for big enough nn. When m=nm = n, this is roughly 1/e0.3681/e \approx 0.368.

Analyzing balls and bins phenomena is of crucial interest to computer scientists: Imagine the balls are the computational loads and bins are the machines. In practice, we often distribute loads to machines at random (well, it saves computational overhead this way - and there are several other benefits), and we want to understand whether the machines would be equally loaded with high probability.

Now, let us use the balls and bins setting to model the situation where the balls correspond to students (says mm is around 40-60 students in this class) and bins to possible birthdays (so n=365n=365 - just ignore the weird years). We assume that for every student is equally likely to have their birthday on any particular day. The following theorem is known by the fancy name of "Birthday paradox".

Theorem 38 (Birthday paradox)

When there are n23n \geq 23 students, the probability that some students have the same birthday is more than half.

Remark that in mathematics, a paradox is a term often reserved for logical contradictions. Here, there is no logical contradiction. The term is only used to illustrate the counter-intuitive nature of the statement.

Why is this counter-intuitive? We have 365 days in total, but why do we only need 23 students to make this event likely to occur? Let us prove this formally. Denote by AA the probability that there are (at least) two students with the same birthday. We will calculate the probability that AA does not occur (after that, we use the fact that P[A]+P[A]=1{\mathbb P}[A]+ {\mathbb P}[\overline{A}] =1), that is, the probability that no two students have the same birthday.

P[A]=n(n1)(nm+1)nm=(11/n)(12/n)(1(m1)/n){\mathbb P}[\overline{A}] = \frac{n \cdot (n-1) \cdots (n-m+1)}{n^m} = (1-1/n)(1-2/n)\cdots (1-(m-1)/n)

To upper bound the product, we use 1+xex1+x \leq e^x, so we have

P[A]e(1+2++(m1))/nem2/2n{\mathbb P}[\overline{A}] \leq e^{-(1+2+\ldots+(m-1))/n} \approx e^{-m^2/2n}

If we plug in m=23m=23 and n=365n=365 into this formula, it already shows that P[A]48%{\mathbb P}[\overline{A}] \leq 48\%.

Exercise 88

Calculate the following probabilities using only the rules we have defined so far. In particular, define the events formally and calculate the probability by summing over the atomic events.

  • What is the probability that the first ball and the second ball land in the same bin?
  • What is the probability that the bin's number of the first ball is less than that of the second ball?
  • What is the probability that the first and second balls go into consecutive bins?

Your answers should be in closed form (no summations and no dots-dots-dots).