Balls and Bins
We throw balls into bins. Assume that any ball is equally likely to land into any bin. We can represent each outcome by a function where means that the -th ball lands into the -th bin. This gives us a sample space , so we have . Define the uniform distribution , so this is clearly a probability space.
What is the probability that the first bin is empty?
Let . Notice that , and therefore, . If we use the estimate , we see that this probability is for big enough . When , this is roughly .
Analyzing balls and bins phenomena is of crucial interest to computer scientists: Imagine the balls are the computational loads and bins are the machines. In practice, we often distribute loads to machines at random (well, it saves computational overhead this way - and there are several other benefits), and we want to understand whether the machines would be equally loaded with high probability.
Now, let us use the balls and bins setting to model the situation where the balls correspond to students (says is around 40-60 students in this class) and bins to possible birthdays (so - just ignore the weird years). We assume that for every student is equally likely to have their birthday on any particular day. The following theorem is known by the fancy name of "Birthday paradox".
When there are students, the probability that some students have the same birthday is more than half.
Remark that in mathematics, a paradox is a term often reserved for logical contradictions. Here, there is no logical contradiction. The term is only used to illustrate the counter-intuitive nature of the statement.
Why is this counter-intuitive? We have 365 days in total, but why do we only need 23 students to make this event likely to occur? Let us prove this formally. Denote by the probability that there are (at least) two students with the same birthday. We will calculate the probability that does not occur (after that, we use the fact that ), that is, the probability that no two students have the same birthday.
To upper bound the product, we use , so we have
If we plug in and into this formula, it already shows that .
Calculate the following probabilities using only the rules we have defined so far. In particular, define the events formally and calculate the probability by summing over the atomic events.
- What is the probability that the first ball and the second ball land in the same bin?
- What is the probability that the bin's number of the first ball is less than that of the second ball?
- What is the probability that the first and second balls go into consecutive bins?
Your answers should be in closed form (no summations and no dots-dots-dots).