Estimates on primes and fingerprinting
Let be the number of primes less than or equal to .
The theorem was independently discovered by Hadamard and de la Vallée Poussin in 1896, postulating the density of primes, i.e., out of numbers between and , there is (roughly) a fraction that are primes (for sufficiently large ). This is a nontrivial statement that we will not be proving here. For instance, if we want to calculate the number of primes that can be represented by digits in decimal system, we can use and therefore is roughly (this is like a huge number of primes).
In these notes, we will see an application of PNT in computer science. Note that we will be talking about probability without actually covering them in detail yet. Suppose Amelia and Ariel are working on the course lecture notes from two different locations. Amelia has file and Ariel has file (encoded as bit strings of length at most ). Amelia is confident that her file is the most up-to-date, while Ariel (having been out surfing on a California's beach) may not have one. How can Amelia effectively check whether Ariel has the correct file?
Of course, Amelia can ask Ariel to send his file to her, and this would consume bits of the network's bandwidth. Can they do better than this?1
- Amelia picks a random prime between and and sends it to Ariel along with .
- Upon receiving , Ariel computes and checks whether
This communication protocol uses very little bandwidth, in total bits (instead of bits). The verification process is correct with high probability (as mentioned, we will cover probability more intensively later).
The probability that this algorithm makes an error is at most .
There are two cases. In the first case when Ariel has the correct file, the algorithm never makes an error since implies that for every . In the other case when , we need to analyze the probability that . Consider and note that is (at most) an -bit number. Let be the number of primes that divide . So, if we are the primes that divide , we must have that , and since each prime is at least , this implies . Taking a logarithm gives us .
The algorithm would make an error when Amelia happens to pick one of these primes . From the Prime Number Theorem, there are primes of value at most , so the probability that is one of those primes is at most .
Notice that the above calculation simply replaces by . Certain things should be cautioned. In particular, in the above example, when is small (say, ), the calculation can be off by a significant margin. But in general, once we take sufficiently large (like or so), this calculation is quite accurate.
Use PNT to prove that where is the -th prime number.
Being able to explain the English meaning of statements such as above is crucial for computer scientists. The above statement implies that two consecutive primes are arbitrarily close to each other infinitely many times. For instance, the predicate "" is true for infinitely many .
Use PNT to prove that .
Again, the English interpretation of the above statement is that for infinitely many . Such a statement is in fact implicit in many computer science results, e.g., when we say that a problem does not admit efficient algorithm, it means "every efficient algorithm must fail to solve the problem for infinitely many input" (therefore, hard computational problems may still be solved for real-world input).
Footnotes
-
It might sound a bit silly that Amelia is trying to be smart in this situation. But, imagine a more real-world situation where Amelia is actually running a company, like Dropbox. In such a case, she has to deal with millions of users, not just our own Ariel, so saving the network's bandwidth would be extremely important. ↩