The Magic of Erasure Codes

Fred Akalin
@fakalin
cashcat0.jpg
cashcat1.jpg
cashcat2.jpg
$\xmapsto{\mathtt{ComputeParity}}$
cashcats.p00
cashcats.p01

Cashcat pictures from @CatsAndMoney.

?
$\xmapsto{\mathtt{ReconstructData}}$






Erasure codes, how do they work?

First, turn into a byte-level problem...
cashcat0.jpg: aa b3 3f 2f 13 33 ...
cashcat1.jpg: bb 34 25 36 3f ed ...
cashcat2.jpg: cc 35 d3 3f ff c0 ...

               |
               |
         ComputeParity
               |
              \|/

cashcats.p00: 14 34 ...
cashcats.p01: 25 53 ...
          
…and so on.
Record hashes, and treat corrupted files as missing.
cashcat0.jpg: XX XX XX XX XX XX ... (sha1 mismatch)
cashcat1.jpg: bb 34 25 36 3f ed ...
cashcat2.jpg: ?? ?? ?? ?? ?? ?? ... (missing)
cashcats.p00: 14 34 11 50 3f fe ...
cashcats.p01: 25 53 23 36 1f 3d ...

               |
               |
         ReconstructData
               |
              \|/

cashcat0.jpg: aa b3 ...
cashcat1.jpg: bb 34 ...
cashcat2.jpg: cc 35 ...
          
…and so on.
$$(p_0, p_1) = \mathtt{ComputeParity}(d_0, d_1, d_2)$$
$\begin{pmatrix} p_0 \\ p_1 \end{pmatrix} = P \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$ $= \begin{pmatrix} 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$

or

$$ \begin{aligned} p_0 &= 1/3 \cdot d_0 + 1/2 \cdot d_1 + 1/1 \cdot d_2 \\ p_1 &= 1/4 \cdot d_0 + 1/3 \cdot d_1 + 1/2 \cdot d_2 \end{aligned}$$

⚠️ For now, rational numbers (fractions), not bytes. ⚠️

Generate parity matrix with parityCount=2 rows and dataCount=3 columns.

$$P = \begin{pmatrix} 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix}$$
$$P[i, j] = \frac{1}{(\texttt{dataCount} + i) - j}$$
for i, j in range(0, 2) × range(0, 3)

Called a Cauchy matrix.

Reconstruction: Start with the parity computation…

$\begin{pmatrix} p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$

Append to identity function…

$\begin{pmatrix} d_0 \\ d_1 \\ d_2 \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$

Remove unknown rows…

$\begin{pmatrix} \xcancel{d_0} \\ d_1 \\ \xcancel{d_2} \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} \xcancel{1} & \xcancel{0} & \xcancel{0} \\ 0 & 1 & 0 \\ \xcancel{0} & \xcancel{0} & \xcancel{1} \\ 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$

Remove unknown rows…

$\begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix} $
$= M \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
$$\begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} = M \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$$
Find $M^{-1}$ such that $M^{-1} \cdot \begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
  • Always exists, by special properties of Cauchy matrices!
  • Use row reduction (a.k.a. Gaussian elimination) to compute.
$$(d_0, d_1, d_2) = \mathtt{ReconstructData}(?, d_1, ?, p_0, p_1)$$
$$ \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix} = M^{-1} \cdot \begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} -1 & -6 & 12 \\ 1 & 0 & 0 \\ -1/6 & 3 & -4 \end{pmatrix} \cdot \begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix}$$

Remaining problem…

How to compute parity bytes instead of parity numbers?

  • Matrix elements must obey a particular interface.
  • They must belong to a field.
  • We want a field with 256 elements.

What is a field?

interface Field<T> {
  static Zero: T, One: T
  plus(b: T): T      // $a \oplus b$
  negate(): T        // $-a$
  times(b: T): T     // $a \otimes b$
  reciprocate(): T   // $a^{-1}$, for $a \ne 0$
  equals(b: T): bool // $a = b$
}
  // $a \ominus b$
  minus(b: T): T => this.plus(b.negate())
  // $a \oslash b$
  dividedBy(b: T): T => this.times(b.reciprocate())
          

Field 📜interface contract📜

  • Identities: $a \oplus 0 = a \otimes 1 = a$.
  • Inverses: $a \oplus -a = 0$, and if $a \ne 0$,
    👉$a \otimes a^{-1} = 1$👈.
  • Associativity: $(a \oplus b) \oplus c = a \oplus (b \oplus c)$, $(a \otimes b) \otimes c = a \otimes (b \otimes c)$.
  • Commutativity: $a \oplus b = b \oplus a$, $a \otimes b = b \otimes a$.
  • Distributivity: $a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$.
  • Rational numbers (i.e., fractions) form a field: $(p/q)^{-1} = q/p$.
  • Integers don’t form a field, since only $1$ and $-1$ have reciprocals.
  • Integers mod $p$ form a field with $p$ elements, only when $p$ is prime, e.g. field of 257 elements.
🤓 Math fact 🤓: Given an integer $0 \lt a \lt 257$, there is exactly one $0 \lt b \lt 257$ such that $(a \cdot b) \bmod 257 = 1\text{.}$
(Can replace $257$ with any prime number $p$).
reciprocate() => {
  if (this === 0) { throw new Error("Division by zero"); }
  for (let i = 0; i < 257; i += 1) {
    if ((this * i) % 257 === 1) { return i; }
  }
  throw new Error("Shouldn't get here!");
}

If speed is a problem, build a table.
  • Field of 257 elements gets us close…
  • Problem: 256 isn’t a prime number!
  • Does this mean we cannot make a field of 256 elements?
No!
We can’t just do so based on standard integer arithmetic.

Binary carry-less arithmetic

Let $a = 23$ and $b = 54$. Then, with binary carry-less addition,
  a = 23 =  10111b
^ b = 54 = 110110b
           -------
           100001b
so $a \oplus b = 100001_{\mathrm{b}} = 33$.
Binary carry-less addition is just bitwise xor!
So is binary carry-less subtraction, since a ^ a = 0.
Let $a = 23$ and $b = 54$. Then, with carry-less arithmetic,
   a = 23 =       10111b
^* b = 54 =      110110b
            ------------
                 10111
          ^     10111
          ^   10111
          ^  10111
            ------------
             1111100010b
so $a \otimes b = 1111100010_{\mathrm{b}} = 994$.
Let $a = 55$ and $b = 19$. Then, with carry-less arithmetic,
                     11b
                --------
b = 19 = 10011b )110111b = 55 = a
               ^ 10011
                 -----
                  10001
                ^ 10011
                  -----
                     10b
so $a \oslash b = 11_{\mathrm{b}} = 3$ with remainder $10_{\mathrm{b}} = 3$.
Important subtlety: we don’t stop when the remainder is less than the divisor, but when it has fewer digits than the divisor.

Properties of (carry-less) mod

  • mod by $256 \le n \lt 512$ $\Rightarrow$ $n$ possible remainders.
  • clmod by $256 \le n \lt 512$ $\Rightarrow$ $256$ possible remainders.
  • Very interesting…clmod by prime number $256 \le p \lt 512$?

What are prime numbers?

  • An integer $p \gt 1$ that cannot be expressed as $p = a \cdot b$, for $a, b \gt 1$.
  • Multiplication operation determines the prime numbers.
  • Want a “carry-less” prime number between 256 and 512!
  • 283 is such a number. (Coincidentally, it’s also a regular prime.)

Field of 256 elements

…is just binary carry-less arithmetic mod 283.
class Field256Element implements Field<Field256Element> {
  static Zero = 0, One = 1
  plus(b) => (a ^ b)
  negate() => a
  times(b): => clmod(clmul(a, b), 283)
  reciprocate() => // next slide
  equals(b) => (a === b)
}
🤓 Math fact 🤓: Given an integer $0 \lt a \lt 256$, there is a $0 \lt b \lt 256$ such that $(a \otimes b) \mathbin{\mathrm{clmod}} 283 = 1$.
reciprocate() => {
  if (this === 0) { throw new Error("Division by zero"); }
  for (let i = 0; i < 256; i += 1) {
    if (clmod(clmul(this, i), 283) === 1) { return i; }
  }
  throw new Error("Shouldn't get here!");
}

The full algorithm

  1. Do everything with Field256Element, i.e. bytes with carry-less arithmetic mod 283.
  2. In particular, compute $P$ like: $$ \begin{aligned} P &= \begin{pmatrix} (3 \oplus 0)^{-1} & (3 \oplus 1)^{-1} & (3 \oplus 2)^{-1} \\ (4 \oplus 0)^{-1} & (4 \oplus 1)^{-1} & (4 \oplus 2)^{-1} \end{pmatrix} \\ &= \begin{pmatrix} 3^{-1} & 2^{-1} & 1 \\ 4^{-1} & 5^{-1} & 6^{-1} \end{pmatrix} = \begin{pmatrix} 246 & 141 & 1 \\ 203 & 82 & 123 \end{pmatrix} \end{aligned} $$
  3. Then proceed as before.
Thank you!
🕴🏼