Application: Error-correcting codes
A remarkable application of polynomials (and, more generally, of algebraic techniques in computer science) is the construction of error-correcting codes. In this area, the goal is to encode a message so that it can be recovered even when it gets corrupted (e.g., after sending the message through a communication channel). We consider two types of corruption: (i) erasure errors, where parts of the message may be erased, and (ii) arbitrary errors, where parts of the content can be corrupted and changed. The second type is much harder to handle because we would not know which parts of the message are correct and which are corrupted.
Protection against erasure errors
Suppose $m_1, m_2, \ldots, m_n$ is the message to be sent. That is, we break the message into $n$ parts (blocks), where each block $m_i$ can be seen as a number in $\mathbb{Z}_p$ (well, just choose $p$ to be a large enough prime). Suppose we want to support recovery of the message from up to $k$ erasure errors. We would then need to add some redundancy to the content, and there is a natural way to do so using polynomials.
- There is a unique polynomial $P(x)$ of degree at most $n-1$ such that $P(i) = m_i$ for $i = 1, \ldots, n$. This polynomial contains all the information about the message $m_1, \ldots, m_n$.
- Create $k$ additional blocks by evaluating $c_j = P(j)$ for $j = n+1$ until $j = n+k$. We transmit the codeword $c_1, c_2, \ldots, c_{n+k}$, where $c_j = P(j)$, which contains $n+k$ blocks. Notice that this is only possible if $n + k \leq p$.
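To make the two steps above concrete, here is a minimal Python sketch of the encoder, assuming the blocks are numbers in $\mathbb{Z}_p$ with $n + k \leq p$; the helper names (`lagrange_eval`, `encode`) are illustrative, not standard.

```python
def lagrange_eval(points, x, p):
    """Evaluate, at x, the unique polynomial of degree <= len(points) - 1
    passing through the given (x_i, y_i) pairs, working modulo the prime p."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        # Modular inverse of den via Fermat's little theorem (p is prime).
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

def encode(message, k, p):
    """Transmit n + k blocks c_j = P(j), where P(i) = m_i for i = 1..n."""
    n = len(message)
    points = [(i + 1, m % p) for i, m in enumerate(message)]
    return [lagrange_eval(points, j, p) for j in range(1, n + k + 1)]

# Example: a 3-block message protected against up to 2 erasures.
print(encode([42, 7, 99], k=2, p=101))   # -> [42, 7, 99, 15, 58]
```

Note that the first $n$ blocks of the codeword are just the message itself, since $c_i = P(i) = m_i$ for $i \leq n$.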
Let $S \subseteq \{1, \ldots, n+k\}$ be a subset of size at least $n$ such that the blocks $c_j$ for $j \in S$ are not erased. Then the polynomial $P$ can be interpolated from these blocks.
From Theorem 44, there is a unique polynomial $Q$ of degree at most $n-1$ such that $Q(j) = c_j$ for all $j \in S$. We claim that $P$ and $Q$ are the same polynomial: assume for a contradiction that they are not. Then $P - Q$ is a nonzero polynomial of degree at most $n-1$ with $P(j) - Q(j) = 0$ for all $j \in S$ (so it has at least $n$ roots), a contradiction. Hence interpolating the surviving blocks recovers $P$, and the message is $m_i = P(i)$ for $i = 1, \ldots, n$.
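Here is a minimal sketch of this recovery step, under the same assumptions as the encoding sketch above (the `lagrange_eval` helper is repeated verbatim so the snippet runs on its own).

```python
def lagrange_eval(points, x, p):   # same helper as in the encoding sketch
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

def recover(surviving_blocks, n, p):
    """surviving_blocks: list of (j, c_j) pairs that were not erased.
    Any n of them determine P, and the message is P(1), ..., P(n)."""
    points = surviving_blocks[:n]
    return [lagrange_eval(points, i, p) for i in range(1, n + 1)]

# Example: blocks 2 and 5 of the codeword [42, 7, 99, 15, 58] were erased.
print(recover([(1, 42), (3, 99), (4, 15)], n=3, p=101))   # -> [42, 7, 99]
```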
Protection against arbitrary errors
Notice that the previous idea is to encode the message by introducing redundancies into a "codeword" $c_1, \ldots, c_{n+k}$, which is longer than the original message. The nice thing is that by introducing $k$ additional blocks, we can handle up to $k$ erasures (and this is the best we can do!).
Now, suppose the error is arbitrary. What does it mean for us? In the erasure case, the message could arrive partially erased, but we know exactly which blocks are missing and (as explained before) it is possible to recover. An arbitrary error, however, could change the content of some blocks, and the receiver of the message has no way of knowing which parts of the message are corrupted. Interestingly, it turns out that we do not need to know!
The following description corresponds to the famous Reed-Solomon code.
- On message $m_1, \ldots, m_n$, let $P(x)$ be the polynomial of degree at most $n-1$ where $P(i) = m_i$ for all $i = 1, \ldots, n$ (we know such a $P$ is unique).
- Create $2k$ additional points, so that $c_j = P(j)$ for all $j = 1, \ldots, n+2k$. The codeword $c_1, \ldots, c_{n+2k}$ is sent across the communication channel (notice that now we have $n+2k$ blocks instead of $n+k$).
We will prove some properties of this encoding. For messages $x$ and $y$ of the same length, define $\Delta(x, y)$ as the number of coordinates $i$ where $x_i \neq y_i$ (this is called the Hamming distance between $x$ and $y$).
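As a quick illustration (the function name `hamming` is just for this sketch):

```python
def hamming(x, y):
    """Number of coordinates where the equal-length sequences x and y differ."""
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

print(hamming([1, 2, 3, 4], [1, 9, 3, 0]))   # -> 2
```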
For distinct messages $m \neq m'$ of the same length $n$, the Hamming distance between their codewords is at least $2k+1$.
Let $P$ and $Q$ be the polynomials of degree at most $n-1$ for $m$ and $m'$ respectively. Since they are distinct (do you see why?), we have that $P - Q$ is nonzero and of degree at most $n-1$. Therefore, the codewords of $m$ and $m'$ can agree on at most $n-1$ coordinates (blocks) (do you see why?). This implies the Hamming distance is at least $(n+2k) - (n-1) = 2k+1$.
Theoretically speaking, the distance property allows us to say that if the codeword $c$ is corrupted at up to $k$ blocks into $\tilde{c}$, we are still guaranteed that $c$ is the codeword closest to $\tilde{c}$ (in Hamming distance): we have $\Delta(c, \tilde{c}) \leq k$, while any other codeword $c'$ satisfies $\Delta(c', \tilde{c}) \geq \Delta(c', c) - \Delta(c, \tilde{c}) \geq (2k+1) - k = k+1$. So if we look for the message $m'$ whose codeword minimizes the distance to $\tilde{c}$, then we would manage to find the original message as $m' = m$.
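The following sketch plays out this argument on a tiny example: it encodes a short message with $n+2k$ blocks (as in the list above), corrupts $k$ of them, and recovers the message by exhaustively searching for the closest codeword. The exhaustive search is exponential in $n$ and only illustrates the distance property; it is not the efficient algorithm mentioned below, and the helper names are again illustrative.

```python
from itertools import product

def lagrange_eval(points, x, p):   # same helper as in the earlier sketches
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

def rs_encode(message, k, p):
    """n + 2k blocks c_j = P(j), where P(i) = m_i for i = 1..n."""
    n = len(message)
    points = [(i + 1, m % p) for i, m in enumerate(message)]
    return [lagrange_eval(points, j, p) for j in range(1, n + 2 * k + 1)]

def hamming(x, y):
    return sum(1 for a, b in zip(x, y) if a != b)

def brute_force_decode(received, n, k, p):
    """Return the message whose codeword is closest to `received`."""
    return min(product(range(p), repeat=n),
               key=lambda m: hamming(rs_encode(list(m), k, p), received))

# Tiny example: p = 7, n = 2, k = 1, so codewords have n + 2k = 4 blocks and
# any two distinct codewords differ in at least 2k + 1 = 3 positions.
p, k = 7, 1
message = [3, 5]
received = rs_encode(message, k, p)
received[2] = (received[2] + 4) % p   # corrupt one block arbitrarily
print(brute_force_decode(received, n=2, k=k, p=p) == tuple(message))   # True
```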
Indeed, the decoding algorithm by Berlekamp and Welch exploits this property to efficiently recover the message from the corrupted codeword (with up to $k$ corrupted blocks). The details of this algorithm are beyond the scope of this course, but we invite you to look them up if you are interested.