The Magic of Erasure Codes

Fred Akalin
@fakalin
$\xmapsto{\mathtt{ComputeParity}}$

Cashcat pictures from @CatsAndMoney.

?
$\xmapsto{\mathtt{ReconstructData}}$

Erasure codes, how do they work?

First, turn into a byte-level problem...
cashcat0.jpg: aa b3 3f 2f 13 33 ...
cashcat1.jpg: bb 34 25 36 3f ed ...
cashcat2.jpg: cc 35 d3 3f ff c0 ...

|
|
ComputeParity
|
\|/

cashcats.p00: 14 34 ...
cashcats.p01: 25 53 ...

…and so on.
Record hashes, and treat corrupted files as missing.
cashcat0.jpg: XX XX XX XX XX XX ... (sha1 mismatch)
cashcat1.jpg: bb 34 25 36 3f ed ...
cashcat2.jpg: ?? ?? ?? ?? ?? ?? ... (missing)
cashcats.p00: 14 34 11 50 3f fe ...
cashcats.p01: 25 53 23 36 1f 3d ...

|
|
ReconstructData
|
\|/

cashcat0.jpg: aa b3 ...
cashcat1.jpg: bb 34 ...
cashcat2.jpg: cc 35 ...

…and so on.
$$(p_0, p_1) = \mathtt{ComputeParity}(d_0, d_1, d_2)$$
$\begin{pmatrix} p_0 \\ p_1 \end{pmatrix} = P \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$ $= \begin{pmatrix} 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$

or

\begin{aligned} p_0 &= 1/3 \cdot d_0 + 1/2 \cdot d_1 + 1/1 \cdot d_2 \\ p_1 &= 1/4 \cdot d_0 + 1/3 \cdot d_1 + 1/2 \cdot d_2 \end{aligned}

⚠️ For now, rational numbers (fractions), not bytes. ⚠️

Generate parity matrix with parityCount=2 rows and dataCount=3 columns.

$$P = \begin{pmatrix} 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix}$$
$$P[i, j] = \frac{1}{(\texttt{dataCount} + i) - j}$$
for i, j in range(0, 2) × range(0, 3)

Called a Cauchy matrix.

$\begin{pmatrix} p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$

Append to identity function…

$\begin{pmatrix} d_0 \\ d_1 \\ d_2 \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$

Remove unknown rows…

$\begin{pmatrix} \xcancel{d_0} \\ d_1 \\ \xcancel{d_2} \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} \xcancel{1} & \xcancel{0} & \xcancel{0} \\ 0 & 1 & 0 \\ \xcancel{0} & \xcancel{0} & \xcancel{1} \\ 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$

Remove unknown rows…

$\begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 1/3 & 1/2 & 1/1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
$= M \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
$$\begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} = M \cdot \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$$
Find $M^{-1}$ such that $M^{-1} \cdot \begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}$
• Always exists, by special properties of Cauchy matrices!
• Use row reduction (a.k.a. Gaussian elimination) to compute.
$$(d_0, d_1, d_2) = \mathtt{ReconstructData}(?, d_1, ?, p_0, p_1)$$
$$\begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix} = M^{-1} \cdot \begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix} = \begin{pmatrix} -1 & -6 & 12 \\ 1 & 0 & 0 \\ -1/6 & 3 & -4 \end{pmatrix} \cdot \begin{pmatrix} d_1 \\ p_0 \\ p_1 \end{pmatrix}$$

Remaining problem…

How to compute parity bytes instead of parity numbers?

• Matrix elements must obey a particular interface.
• They must belong to a field.
• We want a field with 256 elements.

What is a field?

interface Field<T> {
static Zero: T, One: T
plus(b: T): T      // $a \oplus b$
negate(): T        // $-a$
times(b: T): T     // $a \otimes b$
reciprocate(): T   // $a^{-1}$, for $a \ne 0$
equals(b: T): bool // $a = b$
}
  // $a \ominus b$
minus(b: T): T => this.plus(b.negate())
// $a \oslash b$
dividedBy(b: T): T => this.times(b.reciprocate())


Field 📜interface contract📜

• Identities: $a \oplus 0 = a \otimes 1 = a$.
• Inverses: $a \oplus -a = 0$, and if $a \ne 0$,
👉$a \otimes a^{-1} = 1$👈.
• Associativity: $(a \oplus b) \oplus c = a \oplus (b \oplus c)$, $(a \otimes b) \otimes c = a \otimes (b \otimes c)$.
• Commutativity: $a \oplus b = b \oplus a$, $a \otimes b = b \otimes a$.
• Distributivity: $a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$.
• Rational numbers (i.e., fractions) form a field: $(p/q)^{-1} = q/p$.
• Integers don’t form a field, since only $1$ and $-1$ have reciprocals.
• Integers mod $p$ form a field with $p$ elements, only when $p$ is prime, e.g. field of 257 elements.
🤓 Math fact 🤓: Given an integer $0 \lt a \lt 257$, there is exactly one $0 \lt b \lt 257$ such that $(a \cdot b) \bmod 257 = 1\text{.}$
(Can replace $257$ with any prime number $p$).
reciprocate() => {
if (this === 0) { throw new Error("Division by zero"); }
for (let i = 0; i < 257; i += 1) {
if ((this * i) % 257 === 1) { return i; }
}
throw new Error("Shouldn't get here!");
}

If speed is a problem, build a table.
• Field of 257 elements gets us close…
• Problem: 256 isn’t a prime number!
• Does this mean we cannot make a field of 256 elements?
No!
We can’t just do so based on standard integer arithmetic.

Binary carry-less arithmetic

Let $a = 23$ and $b = 54$. Then, with binary carry-less addition,
  a = 23 =  10111b
^ b = 54 = 110110b
-------
100001b
so $a \oplus b = 100001_{\mathrm{b}} = 33$.
Binary carry-less addition is just bitwise xor!
So is binary carry-less subtraction, since a ^ a = 0.
Let $a = 23$ and $b = 54$. Then, with carry-less arithmetic,
   a = 23 =       10111b
^* b = 54 =      110110b
------------
10111
^     10111
^   10111
^  10111
------------
1111100010b
so $a \otimes b = 1111100010_{\mathrm{b}} = 994$.
Let $a = 55$ and $b = 19$. Then, with carry-less arithmetic,
                     11b
--------
b = 19 = 10011b )110111b = 55 = a
^ 10011
-----
10001
^ 10011
-----
10b
so $a \oslash b = 11_{\mathrm{b}} = 3$ with remainder $10_{\mathrm{b}} = 3$.
Important subtlety: we don’t stop when the remainder is less than the divisor, but when it has fewer digits than the divisor.

Properties of (carry-less) mod

• mod by $256 \le n \lt 512$ $\Rightarrow$ $n$ possible remainders.
• clmod by $256 \le n \lt 512$ $\Rightarrow$ $256$ possible remainders.
• Very interesting…clmod by prime number $256 \le p \lt 512$?

What are prime numbers?

• An integer $p \gt 1$ that cannot be expressed as $p = a \cdot b$, for $a, b \gt 1$.
• Multiplication operation determines the prime numbers.
• Want a “carry-less” prime number between 256 and 512!
• 283 is such a number. (Coincidentally, it’s also a regular prime.)

Field of 256 elements

…is just binary carry-less arithmetic mod 283.
class Field256Element implements Field<Field256Element> {
static Zero = 0, One = 1
plus(b) => (a ^ b)
negate() => a
times(b): => clmod(clmul(a, b), 283)
reciprocate() => // next slide
equals(b) => (a === b)
}
🤓 Math fact 🤓: Given an integer $0 \lt a \lt 256$, there is a $0 \lt b \lt 256$ such that $(a \otimes b) \mathbin{\mathrm{clmod}} 283 = 1$.
reciprocate() => {
if (this === 0) { throw new Error("Division by zero"); }
for (let i = 0; i < 256; i += 1) {
if (clmod(clmul(this, i), 283) === 1) { return i; }
}
throw new Error("Shouldn't get here!");
}

The full algorithm

1. Do everything with Field256Element, i.e. bytes with carry-less arithmetic mod 283.
2. In particular, compute $P$ like: \begin{aligned} P &= \begin{pmatrix} (3 \oplus 0)^{-1} & (3 \oplus 1)^{-1} & (3 \oplus 2)^{-1} \\ (4 \oplus 0)^{-1} & (4 \oplus 1)^{-1} & (4 \oplus 2)^{-1} \end{pmatrix} \\ &= \begin{pmatrix} 3^{-1} & 2^{-1} & 1 \\ 4^{-1} & 5^{-1} & 6^{-1} \end{pmatrix} = \begin{pmatrix} 246 & 141 & 1 \\ 203 & 82 & 123 \end{pmatrix} \end{aligned}
3. Then proceed as before.
Thank you!
🕴🏼