A Gentle Introduction to Erasure Codes
Fred Akalin
1. Overview
This article explains Reed-Solomon erasure codes and the problems they solve in gory detail, with the aim of providing enough background to understand how the PAR1 and PAR2 file formats work, the details of which will be covered in future articles.
I’m assuming that the reader is familiar with programming, but has not had much exposure to coding theory or linear algebra. Thus, I’ll review the basics and treat the results we need as a “black box”, stating them and moving on. However, I’ll give self-contained proofs of those results in a companion article.
So let’s start with the problem we’re trying to solve! Let’s say you have \(n\) files of roughly the same size, and you want to guard against \(m\) of them being lost or corrupted. To do so, you generate \(m\) parity files ahead of time, and if in the future you lose up to \(m\) of the data files, you can use an equal number of parity files to recover the lost data files.
Note that this works even if you lose some of the parity files also; as long as you have \(n\) files, whether they be data or parity files, you’ll be able to recover the original \(n\) data files. Compare making \(n\) parity files with simply making a copy of the \(n\) data files (for \(n > 1\)). In the latter case, if you lose both a data file and its copy, that data file becomes unrecoverable! So parity files take the same amount of space but provide superior recovery capabilities.
Now we can reduce the problem above to a byte-level problem
as follows. Have ComputeParityFiles
pad all the
data files so they’re the same size, and then for each
byte position i
call a function ComputeParityBytes
on the i
th
byte of each data file, and store the results into the i
th
byte of each parity file. Also take a checksum or hash of
each data file and store those (along with the original data
file sizes) with the parity files. Then, ReconstructDataFiles
can detect corrupted files using the checksums/hashes and
treat them as missing, and then for each byte position i
it
can call a function ReconstructDataBytes
on the i
th
byte of each good data and parity file to recover the i
th byte of the corrupted/missing data files.
A byte error where we know the position of the
dropped/corrupted byte is called an erasure. Then, the pair
of functions ComputeParityBytes
and ReconstructDataBytes
which
behave as described above implements what is called an optimal erasure code;
it’s an erasure code because it guards only against byte
erasures, and not more general errors where we don’t know
which data bytes have been corrupted, and it’s optimal because
in general you need at least \(n\) known bytes to recover the \(n\)
data bytes, and that bound is achieved.
ComputeParityBytes<n, m>(data: byte[n]) -> (parity: byte[m])
that computes \(m\) parity bytes given \(n\) data bytes, and a
function
ReconstructDataBytes<n, m>(partialData: (byte?)[n], partialParity: (byte?)[m]) -> ((data: byte[n]) | Error)
that takes in a partial list of data and parity bytes from an
earlier call to ComputeParity
, and returns the full
list of data bytes if there are at least \(n\) known data or parity
bytes (i.e., there are no more than \(m\) omitted data or parity
bytes), and an error otherwise.(In the above pseudocode, I’m using T[n]
to mean an array of n
objects of type T
,
and byte?
to mean byte | None
. Also, I’ll omit the -Bytes<n, m>
suffix
from now on.)
By the end of this article, we’ll find out exactly how the following example works:
Example 1: ComputeParity
and ReconstructData
ComputeParity
Let
d = [ da, db, 0d ]
be the input data bytes and let
m = 2
be the desired parity byte count. Then the output parity bytes
are
p = [ 52, 0c ].
2. Erasure codes for \(m = 1\)
ComputeParitySum(data: byte[n]) {
return [data[0] + … + data[n-1]]
}
where we consider byte
to be an unsigned type such that
addition and subtraction wrap around, i.e. byte arithmetic is done
modulo \(256\). Then also define
ReconstructDataSum(partialData: (byte?)[n], partialParity: (byte?)[1]) {
if there is more than one entry of partialData or partialParity set to None {
return Error
} else if partialData has no entry set to None {
return partialData
}
i := partialData.firstIndexOf(None);
partialSum = partialData[0] + … + partialData[i-1] + partialData[i+1] + … + partialData[n-1]
return partialData[0:i] ++ [partialParity[0] - partialSum] ++ partialData[i+1:n]
}
where a[i:j]
means the subarray of a
starting at i
and
ending (without inclusion) at j
, and ++
is array concatenation.This simple erasure code uses the fact that if you have the sum of a list of numbers, then you can recover a missing number by subtracting the sum of the other numbers from the total sum, and also that this works even if you do the arithmetic modulo \(256\).
^
, or \(\oplus\)) instead
of arithmetic modulo \(256\). Define
ComputeParityXor(data: byte[n]) {
return [data[0] ⊕ … ⊕ data[n-1]]
}
and
ReconstructDataXor(partialData: (byte?)[n], partialParity: (byte?)[1]) {
if there is more than one entry of partialData or partialParity set to None {
return Error
} else if partialData has no entry set to None {
return partialData
}
i := partialData.firstIndexOf(None);
partialXor = partialData[0] ⊕ … ⊕ partialData[i-1] ⊕ partialData[i⊕1] ⊕ … ⊕ partialData[n-1]
return partialData[0:i] ++ [partialParity[0] ⊕ partialXor] ++ partialData[i+1:n]
}
This relies on the fact that \(a \oplus a = 0\), so given the xor of a list of bytes, you can recover a missing byte by xoring with all the known bytes.
3. Erasure codes for \(m = 2\) (almost)
Now coming up with an erasure code for \(m = 2\) is more involved,
but we can get an inkling of how it could work by letting \(n = 3\)
for simplicity, and also letting the output of ComputeParity
be
non-negative integers, instead of just bytes (i.e., less than
\(256\)). In that case, we can consider parity numbers that are
weighted sums of the data bytes. For example, like in the \(m = 1\)
case, we can have the first parity number be
\[
p_0 = d_0 + d_1 + d_2\text{,}
\]
(using \(d_i\) for data bytes and \(p_i\) for parity numbers)
but for the second parity number, we can pick different weights, say
\[
p_1 = 1 \cdot d_0 + 2 \cdot d_1 + 3 \cdot d_2\text{.}
\]
We want to make sure that the weights for the second parity number
are “sufficiently different” from that of the first
parity number, which we’ll clarify later, but for example note
that setting
\[
p_1 = 2 \cdot d_0 + 2 \cdot d_1 + 2 \cdot d_2
\]
can’t add any new information, since then \(p_1\) will
always be equal to \(2 \cdot p_0\).
ComputeParity
function looks like
ComputeParityWeighted(data: byte[3]) {
return [
int(data[0]) + int(data[1]) + int(data[2]),
int(data[0]) + 2 * int(data[1]) + 3 * int(data[2]),
]
}
ReconstructData
, if we have two missing data bytes,
say \(d_i\) and \(d_j\) for \(i < j\), and \(p_0\) and \(p_1\),
we can rearrange the equations
\[
\begin{aligned}
p_0 &= d_0 + d_1 + d_2 \\
p_1 &= 1 \cdot d_0 + 2 \cdot d_1 + 3 \cdot d_2
\end{aligned}
\]
to get all the unknowns on the left side, letting \(d_k\) be the known data byte:
\[
\begin{aligned}
d_i + d_j &= X = p_0 - d_k \\
(i+1) \cdot d_i + (j+1) \cdot d_j &= Y = p_1 - (k + 1) \cdot d_k\text{.}
\end{aligned}
\]
We can then multiply the first equation by \(i + 1\) and
subtract it from the second to cancel the \(d_i\) term and get
\[
d_j = (Y - (i + 1) \cdot X) / (j - i)\text{,}
\]
and then we can use the first equation to solve for \(d_i\):
\[
d_i = X - d_j = ((j + 1) \cdot X - Y) / (j - i)\text{.}
\]
Thus with these equations, we can implement ReconstructData
:
ReconstructDataWeighted(partialData: (byte?)[3], partialParity: (int?)[2]) {
Handle all cases except when there are exactly two entries set to none in partialData.
[i, j] := indices of the unknown data bytes
k := index of the known data byte
X := partialParity[0] - partialData[k]
Y := partialParity[1] - (k + 1) * partialData[k];
d_i := ((j + 1) * X - Y) / (j - i)
d_j := (Y - (i + 1) * X) / (j - i)
return an array with d_i, d_j, and d[k] in the right order
}
(Generalizing this to larger values of \(n\) is straightforward;
\(p_0\) will still have a weight of \(1\) for each data byte, and
\(p_1\) will have a weight of \(i + 1\) for \(d_i\). \(X\) and \(Y\)
will then have terms for all known bytes, and everything else
proceeds the same after that.)Now what goes wrong if we just try to do everything modulo \(256\)? The most obvious difference from the \(m = 1\) case is that solving for \(d_i\) or \(d_j\) involves division, which works fine for non-negative integers as long as there’s no remainder, but it is not immediately clear how division can make sense modulo \(256\).
One possible way to define division modulo \(256\) would be to first define the multiplicative inverse modulo \(256\) of an integer \(0 \le x \lt 256\) as the integer \(0 \le y \lt 256\) such that \((x \cdot y) \bmod 256 = 1\), if it exists, and then define division by \(x\) modulo \(256\) to be multiplication by \(y\) modulo \(256\). But this immediately runs into problems; \(2\) has no multiplicative inverse modulo \(256\), and the same holds for any even number, so reconstruction will fail if, for example, we have the first and third data bytes missing, since then we’d be trying to divide by \(j - i = 2\).
But for now, let’s leave aside the problem of generating parity bytes instead of parity numbers, and instead focus on how we can generalize the above for larger values of \(m\). To do so, we need to first review some linear algebra.
4. Just enough linear algebra to get by^{[1]}
In our \(n = 3, m = 2\) example in the previous section, the equations for the parity numbers have the form \[ p = a_0 \cdot d_0 + a_1 \cdot d_1 + a_2 \cdot d_2 \] for constants \(a_0\), \(a_1\), and \(a_2\). We call such a weighted sum of the \(d_i\)s a linear combination of the \(d_i\)s, and we write this in a tabular form \[ p = \begin{pmatrix} a_0 & a_1 & a_2 \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ d_1 \\ d_2 \end{bmatrix}\text{,} \] where we define the multiplication of a row vector and a column vector by the equation above, generalized in the straightforward manner for any \(n\).
Then since we have two parity numbers \(p_0\) and \(p_1\), each a linear combination of the \(d_i\)s, i.e. \[ \begin{aligned} p_0 &= a_{00} \cdot d_0 + a_{01} \cdot d_1 + a_{02} \cdot d_2 \\ p_1 &= a_{10} \cdot d_0 + a_{11} \cdot d_1 + a_{12} \cdot d_2\text{,} \end{aligned} \] we can write this in a single tabular form as \[ \begin{bmatrix} p_0 \\ p_1 \end{bmatrix} = \begin{pmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ d_1 \\ d_2 \end{bmatrix}\text{,} \] where we define the multiplication of a matrix and a column vector by the equations above.
Now if we restrict parity numbers to be linear combinations of the
data bytes, then we can identify a function
ComputeParity
using some set of weights with the matrix
formed from that set of weights as above. This holds in general: if
a function is defined as a list of linear combinations of its
inputs, then it can be represented using a matrix as above, and we
call it a
linear function. Then we have a correspondence between
linear functions that take \(n\) numbers to \(m\) numbers and
matrices with \(m\) rows and \(n\) columns, which are denoted as \(m
\times n\) matrices.
As the first example of this correspondence, note that we denote
the elements of the matrix above as \(a_{ij}\), where the first
index is the row index and the second index is the column
index. Looking back to the parity equations, we also see that the
first index corresponds to the output arguments of ComputeParity
, and the second index corresponds to
the input arguments.^{[2]}
The usefulness of the correspondence between linear functions and
matrices is that we can store and manipulate a linear function by
storing and manipulating its corresponding matrix of weights, which
you wouldn’t be able to easily do for functions in
general. For example, as we’ll see below, we’ll be able
to compute the inverse of a linear function by matrix operations,
which will be useful for ReconstructData
.
- Deleting a row of a matrix corresponds to deleting an output of a linear function.
- Swapping two rows of a matrix corresponds to swapping two outputs of a linear function.
- Appending a row to a matrix corresponds to adding an output to a linear function.
matrixMultiply(f: Matrix, g: Matrix) {
if (f.columns != g.rows) {
return Error
}
h := new Matrix(f.rows, g.columns)
for i := 0 to f.rows - 1 {
for j := 0 to g.columns - 1 {
t := 0
for k := 0 to f.columns - 1 {
t += f[i, k] * g[k, j]
}
h[i, j] = t
}
}
return h
}
which you can convince yourself is correct by trying to compose some small linear functions by hand.
A useful property of matrix multiplication is that it’s a generalization of the product of a row vector and a column vector, and the product of a matrix and a column vector as we defined above.
I would be remiss if I didn’t talk about the consequences of defining matrix multiplication as the matrix of the composition of the corresponding linear functions. First, this immediately implies that you can only multiply matrices if the left matrix has the same number of rows as the number of columns of the right matrix, which corresponds to the fact that you can only compose functions if the left function takes the same number of inputs as the number of outputs of the right function. Furthermore, even if you have two \(n \times n\) matrices \(F\) and \(G\), unlike numbers, it is not true that \(F \cdot G = G \cdot F\), which corresponds to the fact that in general, for two functions that take \(n\) inputs to \(n\) outputs, it is not true that \(f \circ g = g \circ f\). If you learned matrix multiplication just from the formula above, then these facts are much less obvious!
Finally, an important function is the identity function \(\mathrm{Id}_n\), which return its \(n\) inputs as its outputs. It corresponds to the identity matrix \[ I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}\text{.} \]
For a linear function \(f\) that takes \(n\) inputs to \(n\) outputs, if there is a function \(g\) such that \(f \circ g = \mathrm{Id}_n\), then we call \(g\) the inverse of \(f\), and denote it as \(f^{-1}\). (It is also true that \(f^{-1} \circ f = \mathrm{Id}_n\), i.e. \((f^{-1})^{-1} = f\).) Not all linear functions taking \(n\) inputs to \(n\) outputs have inverses, but if the inverse exists, it is also linear (and unique, which is why we call it the inverse). Therefore, we can define the inverse of an \(n \times n\) (or square) matrix \(M\) as the unique matrix \(M^{-1}\) such that \(M \cdot M^{-1} = M^{-1} \cdot M = I_n\), if it exists; also, if \(M\) has an inverse, we say that \(M\) is invertible.
Example 2: The matrix/linear function correspondence
f(x: rational[2]) {
return [
1 * x[0] + 2 * x[1],
3 * x[0] + 4 * x[1],
]
}
where rational
is an arbitrary-precision rational
number type.g(y: rational[2]) {
return [
-2 * x[0] + 1 * x[1],
(3/2) * x[0] + (-1/2) * x[1],
]
}
so g
is the inverse function of f
. Indeed, f([5, 6])
is [17, 39]
and g([17, 39])
is [5, 6]
.So now we’ve reduced the problem of finding the inverse of a linear function taking \(n\) inputs to \(n\) outputs to finding the inverse of an \(n \times n\) matrix. Before we tackle the question of computing those inverses, let’s first recast our problem in the language of linear algebra and see why we need to find the inverse of a linear function.
5. Erasure codes in general
ComputeParityWeighted(data: byte[3]) {
return [
int(data[0]) + int(data[1]) + int(data[2]),
int(data[0]) + 2 * int(data[1]) + 3 * int(data[2]),
]
}
which therefore corresponds to the parity matrix
\[
P =
\begin{pmatrix}
1 & 1 & 1 \\
1 & 2 & 3
\end{pmatrix}\text{.}
\]
So in mathematical notation, ComputeParityWeighted
looks like:
\[
\begin{bmatrix}
p_0 \\ p_1
\end{bmatrix}
=
\mathtt{ComputeParityWeighted}(d_0, d_1, d_2) =
\begin{pmatrix}
1 & 1 & 1 \\
1 & 2 & 3
\end{pmatrix}
\cdot
\begin{bmatrix}
d_0 \\ d_1 \\ d_2
\end{bmatrix}\text{.}
\]
So let’s now reimplement ReconstructDataWeighted
using linear algebra. First, append the rows of \(P\) to the identity matrix \(I_3\) to get the matrix equation
\[
\begin{bmatrix}
d_0 \\ d_1 \\ d_2 \\ p_0 \\ p_1
\end{bmatrix}
=
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 1 & 1 \\
1 & 2 & 3
\end{pmatrix}
\cdot
\begin{bmatrix}
d_0 \\ d_1 \\ d_2
\end{bmatrix}\text{,}
\]
which corresponds to a linear function that returns the input data bytes in addition to computing the parity numbers. Now let’s say we lose the data bytes \(d_0\) and \(d_2\). Then let’s remove the rows corresponding to those bytes:
\[
\begin{bmatrix}
\xcancel{d_0} \\ d_1 \\ \xcancel{d_2} \\ p_0 \\ p_1
\end{bmatrix}
=
\begin{pmatrix}
\xcancel{1} & \xcancel{0} & \xcancel{0} \\
0 & 1 & 0 \\
\xcancel{0} & \xcancel{0} & \xcancel{1} \\
1 & 1 & 1 \\
1 & 2 & 3
\end{pmatrix}
\cdot
\begin{bmatrix}
d_0 \\ d_1 \\ d_2
\end{bmatrix}\text{,}
\]
which turns into
\[
\begin{bmatrix}
d_1 \\ p_0 \\ p_1
\end{bmatrix} =
\begin{pmatrix}
0 & 1 & 0 \\
1 & 1 & 1 \\
1 & 2 & 3
\end{pmatrix}
\cdot
\begin{bmatrix}
d_0 \\ d_1 \\ d_2
\end{bmatrix}\text{,}
\]
which corresponds to a linear function that maps the input data
bytes to the non-lost data bytes and the parity bytes. This
is the inverse of the function we want, so we want
to invert the \(3 \times 3\) matrix above, which we’ll
call \(M\). That inverse is
\[
M^{-1} =
\begin{pmatrix}
-1/2 & 3/2 & -1/2 \\
1 & 0 & 0 \\
-1/2 & -1/2 & 1/2
\end{pmatrix}\text{.}
\]
Multiplying both sides above by \(M^{-1}\), we get
\[
\begin{bmatrix}
d_0 \\ d_1 \\ d_2
\end{bmatrix}
=
\begin{pmatrix}
-1/2 & 3/2 & -1/2 \\
1 & 0 & 0 \\
-1/2 & -1/2 & 1/2
\end{pmatrix}
\cdot
\begin{bmatrix}
d_1 \\ p_0 \\ p_1
\end{bmatrix}\text{,}
\]
which is exactly what we want: the original data bytes in
terms of the known data bytes and the parity numbers!^{[3]}
Comparing this equation to the one we manually computed previously, they don’t look immediately similar, but some rearrangement will reveal that they indeed compute the same thing. As a sanity check, notice that the second row of \(M^{-1}\) means that the first input argument is mapped unchanged to the second output argument, which is exactly what we want for the known byte \(d_1\).
Now what does this look like in general, i.e. for
arbitrary \(n\) and \(m\)? First, we have to generate an
\(m \times n\) parity matrix \(P\) whose rows have to be
“sufficiently different” from each other,
which we still have to clarify. Then ComputeParity
just multiplies \(P\) by \([d]\), the column matrix of input bytes, like so:
\[
\begin{bmatrix}
p_0 \\ \vdots \\ p_{m-1}
\end{bmatrix}
=
\mathtt{ComputeParity}(d_0, \ldots, d_{n-1}) =
\begin{pmatrix}
p_0 \\
\vdots \\
p_{m-1}
\end{pmatrix}
\cdot
\begin{bmatrix}
d_0 \\ \vdots \\ d_{n-1}
\end{bmatrix}\text{,}
\]
where the \(p_i\) are the rows of \(P\).
As for ReconstructData
, we first append
the rows of \(P\) to \(I_n\), whose rows we’ll denote as \(e_i\):
\[
\begin{bmatrix}
d_0 \\ \vdots \\ d_{n-1} \\
p_0 \\ \vdots \\ p_{m-1}
\end{bmatrix}
=
\begin{pmatrix}
e_0 \\
\vdots \\
e_{n-1} \\
p_0 \\
\vdots \\
p_{m-1}
\end{pmatrix}
\cdot
\begin{bmatrix}
d_0 \\ \vdots \\ d_{n-1}
\end{bmatrix}\text{.}
\]
Now assume that the indices of the missing \(k \le m\) data
bytes are \(i_0, \ldots, i_{k-1}\).
Then we remove the rows
corresponding to the missing data bytes, and keep some \(k\)
parity rows, e.g. \(p_0\) to \(p_{k-1}\). This yields the equation
\[
\begin{bmatrix}
d_{j_0} \\ \vdots \\ d_{j_{n-k-1}} \\
p_0 \\ \vdots \\ p_{k-1}
\end{bmatrix}
=
\begin{pmatrix}
e_{j_0} \\
\vdots \\
e_{j_{n-k-1}} \\
p_0 \\
\vdots \\
p_{k-1}
\end{pmatrix}
\cdot
\begin{bmatrix}
d_0 \\ \vdots \\ d_{n-1}
\end{bmatrix}\text{,}
\]
where \(j_0, \ldots, j_{m-k-1}\) are the indices of the
present \(n - k\) data bytes. Call that \(n \times n\)
matrix \(M\), and compute its inverse \(M^{-1}\). If \(P\) was chosen correctly, \(M^{-1}\) should always exist, so if the inverse computation fails, raise an error. Therefore, ReconstructData
just multiplies \(M^{-1}\) by the column matrix of present data bytes and chosen parity numbers:
\[
\begin{bmatrix}
d_0 \\ \vdots \\ d_{n-1}
\end{bmatrix}
=
\mathtt{ReconstructData}(d_{j_0}, \ldots, d_{j_{n-k-1}}, p_0, \ldots, p_{k-1})
= M^{-1} \cdot
\begin{bmatrix}
d_{j_0} \\ \vdots \\ d_{j_{n-k-1}} \\
p_0 \\ \vdots \\ p_{k-1}
\end{bmatrix}\text{.}
\]
As an optimization, some rows of \(M^{-1}\) correspond to just shuffling around the known data bytes \(d_{j_*}\), so we can just remove those rows, compute the missing data bytes, and do the shuffling ourselves.
ComputeParity
and ReconstructData
,
but we still have missing pieces. In particular,
- How do we compute matrix inverses?
- How do we generate “optimal” parity matrices so that \(M^{-1}\) always exists?
- How do we compute parity bytes instead of parity numbers?
So first, let’s see how to compute matrix inverses using row reduction.
6. Finding matrix inverses using row reduction
We developed the theory of matrices by identifying them with linear functions of numbers. To show how to find matrix inverses, we have to look at them in a slightly different way by identifying matrix equations with systems of linear equations of numbers.
For example, consider the matrix equation \[ M \cdot x = y\text{,} \] where \[ M = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\text{,} \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \text{,} \quad \text{and } y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\text{.} \] This expands to \[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\text{,} \] or \[ \begin{aligned} y_1 &= 1 \cdot x_1 + 2 \cdot x_2 \\ y_2 &= 3 \cdot x_1 + 4 \cdot x_2\text{,} \end{aligned} \] which is a linear system of equations of numbers. Letting \(M\) be any matrix, and \(x\) and \(y\) be appropriately-sized column matrices of variables, we can see that the matrix equation \(M \cdot x = y\) is shorthand for a system of linear equations of numbers.
If we could find \(M^{-1}\), we could solve the matrix equation easily by multiplying both sides by it: \[ \begin{aligned} M^{-1} \cdot (M \cdot x) &= M^{-1} \cdot y \\ x &= M^{-1} \cdot y\text{,} \end{aligned} \] and therefore solve the linear system for \(x\) in terms of \(y\). Conversely, if we were able to solve the linear system for \(x\), we’d then be able to read off \(M^{-1}\) from the new linear system.
- swapping two equations,
- multiplying an equation by a number,
- adding one equation to another, possibly multiplying the equation by a number before adding.
All of these are valid transformations because they don’t change the solution set of the linear system.
For example, in the equation above, the first step would be to subtract \(3\) times the first equation from the second equation to yield \[ \begin{aligned} y_1 &= x_1 + 2 \cdot x_2 \\ y_2 - 3 \cdot y_1 &= -2 \cdot x_2\text{.} \end{aligned} \] Then, add the second equation back to the first equation: \[ \begin{aligned} y_2 - 2 \cdot y_1 &= x_1 \\ y_2 - 3 \cdot y_1 &= -2 \cdot x_2\text{.} \end{aligned} \] Finally, divide the second equation by \(-2\): \[ \begin{aligned} y_2 - 2 \cdot y_1 &= x_1 \\ (3/2) \cdot y_1 - (1/2) \cdot y_2 &= x_2\text{.} \end{aligned} \] This is equivalent to the matrix equation \[ \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\text{,} \] so \[ M^{-1} = \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix}\text{.} \]
So how do we translate the above process to an algorithm operating on matrices? First, express our matrix equation in a slightly different form: \[ M \cdot x = I \cdot y\text{.} \] Using the example above, this is \[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\text{.} \] Then, you can see that the first step above corresponds to subtracting \(-3\) times the first row from the second row to yield: \[ \begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\text{.} \] We don’t even need to keep writing the \(x\) and \(y\) column matrices; we can just write the “augmented” matrix. \[ A = \begin{pmatrix} 1 & 2 & | & 1 & 0 \\ 0 & -2 & | & -3 & 1 \end{pmatrix} \] and operate on it.
- swapping two equations corresponds to swapping two rows;
- multiplying an equation by a number corresponds to multiplying a row by a number; and
- adding an equation to another, possibly multiplying the equation by a number before adding, corresponds to adding a row to another row, possibly multiplying the row by a number before adding.
- Turn the left side of \(A\) into a unit upper triangular matrix, which means that all the elements on the main diagonal are \(1\), and all elements below the main diagonal are \(0\), i.e. that \(a_{ii} = 1\) for all \(i\), and \(a_{ij} = 0\) for all \(j > i\).
- Then turn the left side of \(A\) into the identity matrix.
- For each column \(i\) of the left side in ascending order:
- If \(a_{ii}\) is zero, look at the rows below the \(i\)th row for a row \(j > i\) such that \(a_{ji} \ne 0\), and swap rows \(i\) and \(j\). If no such row exists, return an error, as that means that \(A\) is non-invertible.
- Divide the \(i\)th row by \(a_{ii}\), so that \(a_{ii} = 1\).
- For each row \(j > i\), subtract \(a_{ji}\) times the \(i\)th row from it, which will set \(a_{ji}\) to \(0\).
- For each column \(i\) of the left side, in order:
- For each row \(j < i\), subtract \(a_{ji}\) times the \(i\)th row from it, which will set \(a_{ji}\) to \(0\).
Note that we’re assuming that all arithmetic is exact, i.e. we use a arbitrary-precision rational number type. If we use floating point numbers, we’d have to worry a lot more about the order in which we do operations and numerical stability.
Example 3: Matrix inversion via row reduction
Let/ 0 2 2 \ M = | 3 4 5 | \ 6 6 7 /.The initial augmented matrix A is
/ 0 2 2 | 1 0 0 \ | 3 4 5 | 0 1 0 | \ 6 6 7 | 0 0 1 /.We need A_{00} to be non-zero, so swap rows 0 and 1:
/ 0 2 2 | 1 0 0 \ / 3 4 5 | 0 1 0 \ | 3 4 5 | 0 1 0 | --> | 0 2 2 | 1 0 0 | \ 6 6 7 | 0 0 1 / \ 6 6 7 | 0 0 1 /.We need A_{00} to be 1, so divide row 0 by 3:
/ 3 4 5 | 0 1 0 \ / 1 4/3 5/3 | 0 1/3 0 \ | 0 2 2 | 1 0 0 | --> | 0 2 2 | 1 0 0 | \ 6 6 7 | 0 0 1 / \ 6 6 7 | 0 0 1 /.We need A_{20} to be 0, so subtract row 0 scaled by 6 from row 2:
/ 1 4/3 5/3 | 0 1/3 0 \ / 1 4/3 5/3 | 0 1/3 0 \ | 0 2 2 | 1 0 0 | --> | 0 2 2 | 1 0 0 | \ 6 6 7 | 0 0 1 / \ 0 -2 -3 | 0 -2 1 /.We need A_{11} to be 1, so divide row 1 by 2:
/ 1 4/3 5/3 | 0 1/3 0 \ / 1 4/3 5/3 | 0 1/3 0 \ | 0 2 2 | 1 0 0 | --> | 0 1 1 | 1/2 0 0 | \ 0 -2 -3 | 0 -2 1 / \ 0 -2 -3 | 0 -2 1 /.We need A_{21} to be 0, so subtract row 1 scaled by −2 from row 2:
/ 1 4/3 5/3 | 0 1/3 0 \ / 1 4/3 5/3 | 0 1/3 0 \ | 0 1 1 | 1/2 0 0 | --> | 0 1 1 | 1/2 0 0 | \ 0 -2 -3 | 0 -2 1 / \ 0 0 -1 | 1 -2 1 /.We need A_{22} to be 1, so divide row 2 by −1, which makes the left side of A a unit upper triangular matrix:
/ 1 4/3 5/3 | 0 1/3 0 \ / 1 4/3 5/3 | 0 1/3 0 \ | 0 1 1 | 1/2 0 0 | --> | 0 1 1 | 1/2 0 0 | \ 0 0 -1 | 1 -2 1 / \ 0 0 1 | -1 2 -1 /.We need A_{12} to be 0, so subtract row 2 from row 1:
/ 1 4/3 5/3 | 0 1/3 0 \ / 1 4/3 5/3 | 0 1/3 0 \ | 0 1 1 | 1/2 0 0 | --> | 0 1 0 | 3/2 -2 1 | \ 0 0 1 | -1 2 -1 / \ 0 0 1 | -1 2 -1 /.We need A_{02} to be 0, so subtract row 2 scaled by 5/3 from row 0:
/ 1 4/3 5/3 | 0 1/3 0 \ / 1 4/3 0 | 5/3 -3 5/3 \ | 0 1 0 | 3/2 -2 1 | --> | 0 1 0 | 3/2 -2 1 | \ 0 0 1 | -1 2 -1 / \ 0 0 1 | -1 2 -1 /.We need A_{01} to be 0, so subtract row 1 scaled by 4/3 from row 0, which makes the left side of A the identity matrix:
/ 1 4/3 0 | 5/3 -3 5/3 \ / 1 0 0 | -1/3 -1/3 1/3 \ | 0 1 0 | 3/2 -2 1 | --> | 0 1 0 | 3/2 -2 1 | \ 0 0 1 | -1 2 -1 / \ 0 0 1 | -1 2 -1 /.Since the left side of A is the identity matrix, the right side of A is M^{-1}. Therefore,
/ -1/3 -1/3 1/3 \ M^{-1} = | 3/2 -2 1 | \ -1 2 -1 /.
Now notice one thing: if \(M\) has a row that is proportional to another row, then row reduction would eventually zero out one of the rows, causing the algorithm to fail, and signaling that \(M\) is non-invertible. In fact, a stronger statement is true: \(M\) has a row that can be expressed as a linear combination of other rows of \(M\) exactly when \(M\) is non-invertible. Informally, this means that the linear function corresponding to \(M\) has one of its outputs redundant with the other outputs, so it is essentially a a linear function taking \(n\) inputs to fewer than \(n\) outputs, and such functions aren’t invertible.
This gets us a partial explanation for what “sufficiently different” means for our parity functions. If one parity function is a linear combination of other parity functions, then it is redundant, and therefore not “sufficiently different”. Therefore, we want our parity matrix \(P\) to be such that no row can be expressed as a linear combination of other rows.
However, this criterion for \(P\) isn’t quite enough
to guarantee that all possible matrices \(M\) computed as
part of ReconstructData
are invertible. For example,
this criterion holds for the identity matrix \(I_n\), but if \(n >
1\) and you pick \(I_n\) as the parity matrix for \(n = m\), you can
certainly end up with a constructed matrix \(M\) with repeated rows,
since you’re starting by appending another copy of \(I_n\) on
top of \(P = I_n\)! This explains in a different way why simply
making a copy of the original data files makes for a poor erasure
code, unless of course you only have one data file. We’re led
to our next topic: what makes a parity matrix “optimal”?
7. Optimal parity matrices
Recall from above that we form the square matrix \[ M = \begin{pmatrix} e_{j_0} \\ \vdots \\ e_{j_{n-k-1}} \\ p_0 \\ \vdots \\ p_{k-1} \end{pmatrix} \] by prepending some rows of the identity matrix to the first \(k\) rows of the parity matrix. We can generalize this a bit more, since we don’t have to take the first \(k\) rows, but instead can take any \(k\) rows of the parity matrix, whose indices we denote here as \(l_0, \ldots, l_{k-1}\): \[ M = \begin{pmatrix} e_{j_0} \\ \vdots \\ e_{j_{n-k-1}} \\ p_{l_0} \\ \vdots \\ p_{l_{k-1}} \end{pmatrix}\text{.} \] So we want to construct \(P\) so that any such square matrix \(M\) formed from the rows of \(P\) is invertible. Therefore, we call a parity matrix \(P\) optimal if it satisfies this criterion.
Example 4: A optimal parity matrix for \(m = 2\)
Recall the parity matrix \[ P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix} \] that we were using for our \(n = 3, m = 2\) example. For any \(n\), this matrix looks like \[ P = \begin{pmatrix} 1 & 1 & \ldots & 1 \\ 1 & 2 & \ldots & n-1 \end{pmatrix}\text{.} \] A \(1 \times 1\) matrix is invertible exactly when its single element is non-zero, so any \(1 \times 1\) submatrix of \(P\) is invertible. Any \(2 \times 2\) submatrix of \(P\) looks like \[ A = \begin{pmatrix} 1 & 1 \\ a & b \end{pmatrix} \] for \(a \ne b\), which, using the formula for inverses of \(2 \times 2\) matrices, has inverse \[ A^{-1} = \begin{pmatrix} b/(b-a) & -1/(b-a) \\ -a/(b-a) & 1/(b-a) \end{pmatrix}\text{.} \] These are all the possible square submatrices of \(P\), so therefore this \(P\) is a optimal parity matrix for \(m = 2\).
Then, finally, we now have a complete definition of what makes a list of parity functions “sufficiently different”; it is exactly when the corresponding parity matrix is optimal as we’ve defined it above.
Now this leads us to the question: how do we find such optimal matrices? Fortunately, there’s a whole class of matrices that are optimal: the Cauchy matrices.
Let \(a_0, \ldots, a_{m+n-1}\) be a sequence of distinct integers, meaning that no two \(a_i\) are equal, and let \(x_0, \ldots, x_{m-1}\) be the first \(m\) integers of \(a_i\) with \(y_0, \ldots, y_{n-1}\) the remaining integers. Then form the \(m \times n\) matrix \(A\) by setting its elements according to: \[ a_{ij} = \frac{1}{x_i - y_j}\text{,} \] which is always defined since the denominator is never zero, by the distinctness of the \(a_i\). Then \(A\) is a Cauchy matrix.
Example 5: Cauchy matrices
Let x = [ 1, 2, 3 ] and y = [ -1, 4, 0 ]. Then, the Cauchy matrix constructed from x and y is/ 1/2 -1/3 1 \ | 1/3 -1/2 1/2 | \ 1/4 -1 1/3 /,which has inverse
/ -36/5 96/5 -36/5 \ | -3/10 9/5 -9/5 | \ 9/2 -9 3 /.
Therefore, to generate a optimal parity matrix for any \((n, m)\), all we need to do is to generate an \(m \times n\) Cauchy matrix. We can pick any sequence of distinct \(m + n\) integers, so for simplicity let’s just use \[ x_i = n + i \quad \text{and} \quad y_i = i\text{.} \]
Example 6: Cauchy parity matrices for \(m = 2\)
For \(n = 3, m = 2\), we have the sequences \[ x_0 = 3, x_1 = 4 \quad \text{and} \quad y_0 = 0, y_1 = 1, y_2 = 2\text{,} \] so the corresponding Cauchy parity matrix is \[ P = \begin{pmatrix} 1/3 & 1/2 & 1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix}\text{.} \] Similarly, for any \(n\), \[ P = \begin{pmatrix} 1/n & \cdots & 1/2 & 1 \\ 1/{n + 1} & \cdots & 1/3 & 1/2 \end{pmatrix}\text{.} \] All entries of \(P\) are non-zero, so any \(1 \times 1\) submatrix of \(P\) is invertible. Any \(2 \times 2\) submatrix of \(P\) looks like \[ A = \begin{pmatrix} 1/a & 1/b \\ 1/(a+1) & 1/(b+1) \end{pmatrix} \] for \(a \ne b\), which, using the formula for inverses of \(2 \times 2\) matrices, has inverse \[ A^{-1} = \begin{pmatrix} \frac{ab(a+1)}{b-a} & -\frac{a(a+1)(b+1)}{b-a} \\ -\frac{ab(b+1)}{b-a} & \frac{b(a+1)(b+1)}{b-a} \end{pmatrix}\text{.} \] These are all the possible square submatrices of \(P\), so therefore this \(P\) is a optimal parity matrix for \(m = 2\).
Note that our first parity matrix for \(n = 3, m = 2\) isn’t a Cauchy matrix, since no Cauchy matrix can have repeating elements in a single row. That means that there are other possible optimal parity matrices that aren’t Cauchy matrices.^{[7]}
Also, our previous parity matrices had integers, and Cauchy matrices have rational numbers (i.e., fractions). This means that our parity numbers are now fractions. This isn’t a serious difference, though, since we’d have to deal with fractions when computing matrix inverses anyway. You could also change a parity matrix with fractions into one without by simply multiplying the entire matrix by some non-zero number that gets rid of all the fractions, which doesn’t change the optimality of the matrix. For example, we can multiply \[ \begin{pmatrix} 1/3 & 1/2 & 1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix} \] by \(12\) to get the equivalent parity matrix \[ \begin{pmatrix} 4 & 6 & 12 \\ 3 & 4 & 6 \end{pmatrix}\text{.} \]
Now our only remaining missing piece is this: how do we compute parity bytes instead of parity numbers? Answering this would render the above discussion moot. However, to do so, we first have to take another look at how we’re doing linear algebra.
8. Linear algebra over fields
We ultimately want our parity numbers to be parity bytes, which means that we want to work with matrices of bytes instead of matrices of rational numbers. In order to do that, we need to define an interface for matrix elements that preserves the operations and properties we care about, and then we have to figure out how to implement that interface using bytes.
Looking at the rule for matrix multiplication, we need to be able to add and multiply matrix elements. Looking at how we do matrix inversion, we also need to be able to subtract and divide matrix elements. Finally, there are certain properties that hold for rational numbers that we implicitly assume when doing matrix operations, but that we have to make explicit for matrix elements.
interface Field<T> {
static Zero: T, One: T
plus(b: T): T
negate(): T
times(b: T): T
reciprocate(): T
equals(b: T): bool
minus(b: T) = this.plus(b.negate())
dividedBy(b: T) = this.times(b.reciprocate())
}
We need to be able to add and multiply field elements, which we’ll denote generically by \(\oplus\) and \(\otimes\). We also need to be able to take the negation (additive inverse) of an element \(x\), which we’ll denote by \(-x\), and the reciprocal (multiplicative inverse) of a non-zero element \(x\), which we’ll denote by \(x^{-1}\). Then we can define subtraction of field elements to be \[ a \ominus b = a \oplus -b \] and division of field elements to be \[ a \cldiv b = a \otimes b^{-1}\text{,} \] when \(b \ne 0\).
Field
must satisfy further
properties, which are copied from the number laws you learn in school:
- Identities: \(a \oplus 0 = a \otimes 1 = a\).
- Inverses: \(a \oplus -a = 0\), and for \(a \ne 0\), \(a \otimes a^{-1} = 1\).
- Associativity: \((a \oplus b) \oplus c = a \oplus (b \oplus c)\), and \((a \otimes b) \otimes c = a \otimes (b \otimes c)\).
- Commutativity: \(a \oplus b = b \oplus a\), and \(a \otimes b = b \otimes a\).
- Distributivity: \(a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)\).
class Field257Element : implements Field<Field257Element> {
plus(b) { return (this + b) % 257 }
negate() { return (257 - this) }
times(b) { return (this * b) % 257 }
reciprocate() {
if (this == 0) { return Error }
for i := 0 to 256 {
if (this.times(b) == 1) { return i; }
}
return Error
}
...
}
Example 7: Field with 257 elements
Denote operations on the field with 257 elements by a _{257} subscript, and let a = 23 and b = 54. Then- a +_{257} b = (23 + 54) mod 257 = 77;
- −_{257}b = (257 − 54) mod 257 = 203;
- a −_{257} b = a +_{257} −_{257}b = (23 + 203) mod 257 = 226;
- a ×_{257} b = (23 × 54) mod 257 = 214;
- 54 ×_{257} 119 = 1, so b^{-1}_{257} = 119;
- a ÷_{257} b = a ×_{257} b^{-1}_{257} = (23 × 119) mod 257 = 167, and indeed b ×_{257} (a ÷_{257} b) = (54 × 167) mod 257 = 23 = a.
Field257Element
instead
of a rational number type when implementing ComputeParity
and ReconstructData
,
and if we’ve abstracted our Matrix
type correctly, almost everything should just work. However, there is one
thing we need to check: Are Cauchy parity matrices still
optimal if we use fields other than the rational numbers? Fortunately, the answer is yes:
Now the question remains: can we construct a field with \(256\) elements? As we saw above, we can’t do so the same way as we constructed the field with \(257\) elements. In fact, we need to start with defining different arithmetic operations on the integers. This brings us to the topic of binary carry-less arithmetic.
9. Binary carry-less arithmetic
The basic idea with binary carry-less (which I’ll henceforth shorten to “carry-less”) arithmetic is to express all integers in binary, then perform all arithmetic operations using binary arithmetic, except ignoring all the carries.^{[8]}
How does this work with addition? Let’s denote binary carry-less add as \(\clplus\),^{[9]} and let’s see how it behaves on single binary digits: \[ \begin{aligned} 0 \clplus 0 &= 0 \\ 0 \clplus 1 &= 1 \\ 1 \clplus 0 &= 1 \\ 1 \clplus 1 &= 0\text{.} \end{aligned} \] This is just the exclusive or operation on bits, so if we do carry-less addition on any two integers, it turns out to be nothing but xor! Since xor can also be denoted by \(\clplus\), we can conveniently think of \(\clplus\) as meaning both carry-less addition and xor.
Example 8: Carry-less addition
Let a = 23 and b = 54. Then, with carry-less arithmetic,a = 23 = 10111b ^ b = 54 = 110110b ------- 100001bso a ⊕ b = 100001_{b} = 33.
What about subtraction? Recall that \((a \clplus b) \clplus b = a\) for any \(a\) and \(b\). Therefore, every element \(b\) is its own (carry-less binary) additive inverse, which means that \(a \clminus b = a \clplus b\), i.e. carry-less subtraction is also just xor.
Carry-less multiplication isn’t as simple, but recall that binary multiplication is just adding shifted copies of \(a\) based on which bits are set in \(b\) (or vice versa). To do carry-less multiplication, just ignore carries when adding the shifted copies again, i.e. xor shifted copies instead of adding them.
Example 9: Carry-less multiplication
Let a = 23 and b = 54. Then, with carry-less arithmetic,a = 23 = 10111b ^* b = 54 = 110110b ------------ 10111 ^ 10111 ^ 10111 ^ 10111 ------------ 1111100010bso a ⊗ b = 1111100010_{b} = 994.
Finally, we can define carry-less division with remainder. Binary division with remainder is subtracting shifted copies of \(b\) from \(a\) until you get a remainder less than the divisor; then carry-less binary division with remainder is xor-ing shifted copies of \(b\) with \(a\) until you get a remainder. However, there’s a subtlety; with carry-less arithmetic, it’s not enough to stop when the remainder (for that step) is less than the divisor, because if the highest set bit of the remainder is the same as the highest set bit of the divisor, you can still xor with the divisor one more time to yield a smaller number (which then becomes the final remainder).
Consider the example below, where we’re dividing \(55\) by \(19\). The first remainder is \(17\), which is less than \(19\), but still shares the same highest set bit, so we can xor one more time with \(19\) to get the remainder \(2\).
Example 10: Carry-less division
Let a = 55 and b = 19. Then, with carry-less arithmetic,11b -------- b = 19 = 10011b )110111b = 55 = a ^ 10011 ----- 10001 ^ 10011 ----- 10bso a ⨸ b = 11_{b} = 3 with remainder 10_{b} = 2.
This leads to an interesting difference between the carry-less modulo operation and the standard modulo operation. If you mod by a number \(n\), you get \(n\) possible remainders, from \(0\) to \(n - 1\). However, if you clmod (carry-less mod) by a number \(2^k \le n \lt 2^{k+1}\), you get \(2^k\) possible remainders, from \(0\) to \(2^k-1\), since those are the numbers whose highest set bit is lower than the highest set bit of \(n\).
In particular, if you clmod by a number \(256 \le n < 512\), you always get \(256\) possible remainders. This is very close to what we want—now the hope is to find some \(256 \le n < 512\) so that doing binary carry-less arithmetic clmod \(n\) yields a field, which will then be a field with \(256\) elements!
10. The finite field with \(256\) elements
Since there are only a few numbers between \(256\) and \(512\), we can just try each one of them to see if clmod-ing by one of them yields a field. However, with a bit of math, we can gain more insight into which numbers will work.
Recall the situation with the standard arithmetic operations: arithmetic mod \(p\) yields a field exactly when \(p\) is prime.^{[10]} But recall the definition of a prime number: it is an integer greater than \(1\) whose positive divisors are only itself and \(1\). Stated another way, a prime number is an integer \(p \gt 1\) that cannot be expressed as \(p = a \cdot b\), for \(a, b \gt 1\).
Thus, the concept of a prime number is determined by the multiplication operation, and therefore we can define a “carry-less” prime number to be an integer \( p \gt 1\) that cannot be expressed as \(p = a \clmul b\), for \(a, b \gt 1\).^{[11]}
It is easy enough to test each number \(256 \le n < 512\) for carry-less primality though; doing so, we find the lowest one, \(283\).^{[12]}
class Field256Element : implements Field<Field256Element> {
plus(b) { return this ^ b }
negate() { return b }
times(b) { return clmod(clmul(this, b), 283) }
reciprocate() {
if (this == 0) { return Error }
for i := 0 to 255 {
if (this.times(b) == 1) { return i; }
}
return Error
}
...
}
Similarly to how we find reciprocals mod \(257\), we brute-force
finding reciprocals clmod \(283\) also.Example 11: Field with 256 elements
Denote operations on the field with 256 elements by a _{256} subscript, and let a = 23 and b = 54. Then- a ⊕_{256} b = 23 ⊕ 54 = 33;
- ⊖_{256}b = b = 54;
- a ⊖_{256} b = a ⊕_{256} ⊖_{256}b = a ⊕_{256} b = 33;
- a ⊗_{256} b = (23 ⊗ 54) clmod 283 = 207;
- 54 ⊗_{256} 102 = 1, so b^{-1}_{256} = 102;
- a ø_{256} b = a ⊗_{256} b^{-1}_{256} = (23 ⊗ 102) clmod 283 = 19, and indeed b ⊗_{256} (a ø_{256} b) = (54 × 19) clmod 283 = 23 = a.
11. The full algorithm
Now we have all the pieces we need to construct erasure codes for any \((n, m)\) such that \(m + n \le 256\). First, we can compute an \(m \times n\) Cauchy parity matrix over the field with \(256\) elements. (Recall that this needs \(m + n\) distinct field elements, which is what imposes the condition \(m + n \le 256\).)
Example 12: Cauchy matrices in general
Working over the field with 256 elements, let x = [ 1, 2, 3 ] and y = [ 4, 5, 6 ]. Then, the Cauchy matrix constructed from x and y is/ 82 203 209 \ | 123 209 203 | \ 209 123 82 /,which has inverse
/ 130 31 176 \ | 252 219 31 | \ 108 252 130 /.
Then we can implement matrix multiplication over arbitrary fields,
and thus we can implement ComputeParity
.
Example 13: ComputeParity
in detail
Let
d = [ da, db, 0d ]
be the input data bytes and let
m = 2
be the desired parity byte count. Then, with the input byte
count
n = 3,
the
m × n
Cauchy parity matrix computed using
x_{i} = n + i
and
y_{i} = i
is
/ f6 8d 01 \ \ cb 52 7b /.Therefore, the parity bytes are computed as
_ _ _ _ / f6 8d 01 \ | da | | 52 | \ cb 52 7b / * | db | = |_ 0c _|, |_ 0d _|and thus the output parity bytes are p = [ 52, 0c ].
Then we can implement matrix inversion using row reduction over arbitrary fields.
Example 14: Matrix inversion via row reduction in general
Working over the field with 256 elements, let/ 0 2 2 \ M = | 3 4 5 | \ 6 6 7 /.The initial augmented matrix A is
/ 0 2 2 | 1 0 0 \ | 3 4 5 | 0 1 0 | \ 6 6 7 | 0 0 1 /.We need A_{00} to be non-zero, so swap rows 0 and 1:
/ 0 2 2 | 1 0 0 \ / 3 4 5 | 0 1 0 \ | 3 4 5 | 0 1 0 | --> | 0 2 2 | 1 0 0 | \ 6 6 7 | 0 0 1 / \ 6 6 7 | 0 0 1 /.We need A_{00} to be 1, so divide row 0 by 3:
/ 3 4 5 | 0 1 0 \ / 1 245 3 | 0 246 0 \ | 0 2 2 | 1 0 0 | --> | 0 2 2 | 1 0 0 | \ 6 6 7 | 0 0 1 / \ 6 6 7 | 0 0 1 /.We need A_{20} to be 0, so subtract row 0 scaled by 6 from row 2:
/ 1 245 3 | 0 246 0 \ / 1 245 3 | 0 246 0 \ | 0 2 2 | 1 0 0 | --> | 0 2 2 | 1 0 0 | \ 6 6 7 | 0 0 1 / \ 0 14 13 | 0 2 1 /.We need A_{11} to be 1, so divide row 1 by 2:
/ 1 245 3 | 0 246 0 \ / 1 245 3 | 0 246 0 \ | 0 2 2 | 1 0 0 | --> | 0 1 1 | 141 0 0 | \ 0 14 13 | 0 2 1 / \ 0 14 13 | 0 2 1 /.We need A_{21} to be 0, so subtract row 1 scaled by 14 from row 2:
/ 1 245 3 | 0 246 0 \ / 1 245 3 | 0 246 0 \ | 0 1 1 | 141 0 0 | --> | 0 1 1 | 141 0 0 | \ 0 14 13 | 0 2 1 / \ 0 0 3 | 7 2 1 /.We need A_{22} to be 1, so divide row 2 by 3, which makes the left side of A a unit upper triangular matrix:
/ 1 245 3 | 0 246 0 \ / 1 245 3 | 0 246 0 \ | 0 1 1 | 141 0 0 | --> | 0 1 1 | 141 0 0 | \ 0 0 3 | 7 2 1 / \ 0 0 1 | 244 247 246 /.We need A_{12} to be 0, so subtract row 2 from row 1:
/ 1 245 3 | 0 246 0 \ / 1 245 3 | 0 246 0 \ | 0 1 1 | 141 0 0 | --> | 0 1 0 | 121 247 246 | \ 0 0 1 | 244 247 246 / \ 0 0 1 | 244 247 246 /.We need A_{02} to be 0, so subtract row 2 scaled by 3 from row 0:
/ 1 245 3 | 0 246 0 \ / 1 245 0 | 7 244 1 \ | 0 1 0 | 121 247 246 | --> | 0 1 0 | 121 247 246 | \ 0 0 1 | 244 247 246 / \ 0 0 1 | 244 247 246 /.We need A_{01} to be 0, so subtract row 1 scaled by 245 from row 0, which makes the left side of A the identity matrix:
/ 1 245 0 | 7 244 1 \ / 1 0 0 | 82 82 82 \ | 0 1 0 | 121 247 246 | --> | 0 1 0 | 121 247 246 | \ 0 0 1 | 244 247 246 / \ 0 0 1 | 244 247 246 /.Since the left side of A is the identity matrix, the right side of A is M^{-1}. Therefore,
/ 82 82 82 \ M^{-1} = | 121 247 246 | \ 244 247 246 /.
Finally, we can use that to implement ReconstructData
.
Example 15: ReconstructData
in detail
Let
d_{partial} = [ ??, db, ?? ]
be the input partial data bytes and
p_{partial} = [ 52, 0c ]
be the input partial parity bytes. Then, with the data byte
count
n = 3
and the parity byte count
m = 2,
and appending the rows of the
m × n
Cauchy parity matrix to the
n × n
identity matrix, we get
/ X01X X00X X00X \ | 00 01 00 | | X00X X00X X01X | | f6 8d 01 | \ cb 52 7b /,where the rows corresponding to the unknown data and parity bytes are crossed out. Taking the first n rows that aren’t crossed out, we get the square matrix
/ 00 01 00 \ | f6 8d 01 | \ cb 52 7b /which has inverse
/ 01 d0 d6 \ | 01 00 00 | \ 7b b8 bb /.Therefore, the data bytes are reconstructed from the first n known data and parity bytes as
_ _ _ _ / 01 d0 d6 \ | db | | da | | 01 00 00 | * | 52 | = | db | \ 7b b8 bb / |_ 0c _| |_ 0d _|,and thus the output data bytes are d = [ da, db, 0d ].
And we’re done!
12. Further reading
Next time we’ll talk about the PAR1 file format, which is a practical implementation of an erasure code very similar to the one described above, and the various challenges to make it perform well on sets of large files.
Also, for those of you interested in the mathematical details, I’ll also write a companion article. (This article is already quite long!)
I gave a 15-minute presentation for WaffleJS covering the same topics as this article but at a higher-level and more informally.
I got the idea for explaining the finite field with \(256\) elements in terms of binary carry-less arithmetic from A Painless Guide to CRC Error Detection Algorithms, which is an excellent document in its own right.
Most sources below use Vandermonde matrices, which I plan to cover in the next article on PAR1, instead of Cauchy matrices. Cauchy matrices are more foolproof, which is why I started with them. templexxx, whose Go implementation I cite below, feels the same way. (His blog post is in Chinese, but using Google Translate or a similar service translates it well enough to English.)
I started learning about erasure codes from James Plank’s papers. See A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like systems, but also make sure to read the very important correction to it! Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Storage Applications covers Cauchy matrices, although in a slightly different context. The first part of Plank’s All About Erasure Codes slides also contains a good overview of the encoding/decoding process, including a nifty color-coded matrix diagram.
As for implementations, klauspost and templexxx have good ones written in Go. They were in turn inspired by Backblaze’s Java implementation. Backblaze’s accompanying blog post is also a good overview of the topic. The toy JS implementation powering the demos on this page are also available on my GitHub.
An Introduction to Galois Fields and Reed-Solomon Coding^{[13]} covers much of the same material as I do, albeit assuming slightly more mathematical background.
Going further afield, Russ Cox, Jeremy Kun, and Nayuki also wrote about finite fields and Reed-Solomon codes.
Thanks to Ying-zong Huang, Ryan Hitchman, Charles Ellis, and Josh Gao for comments/corrections/discussion.
Like this post? Subscribe to my feed or follow me on Twitter .
Footnotes
[1] This discussion of linear algebra is necessarily abbreviated for our purposes. For a more general but still basic treatment, see Khan Academy. ↩
[2] Here and throughout this document, I index vectors and matrices starting with \(0\), to better match array indices in code. Most math texts index vectors and matrices starting at \(1\). ↩
[3] Now would be a good time to talk about the conventions I and other texts use. Following Plank, I use \(n\) for the data byte count and \(m\) for the parity byte count, and I represent arrays and vectors as column vectors, where multiplication with a matrix is done with the column vector on the right, which is the standard in most of math. However, in coding theory, \(k\) is used for the data byte count, which they call the message length, and \(n\) is used for the sum of the data and parity byte counts, which they call the codeword length. Furthermore, contrary to the rest of math, coding theory treats arrays and vectors as row vectors, where multiplication with a matrix is done with the row vector on the left, and the matrix used would be the transpose of the matrix that would be used with a column vector. ↩
[4] Khan Academy has a video stepping through an example for a \(3 \times 3\) matrix. ↩
[5] People with experience in coding theory might recognize that a parity matrix \(P\) being optimal is equivalent to the corresponding erasure code being MDS. ↩
[6] An equivalent statement which is easier to see is that if a row could be expressed as a linear combination of other rows, then one would be able to construct a non-empty square submatrix of \(P\) with those rows, which would then be non-invertible. ↩
[7] It is instead a (transposed) Vandermonde matrix, which we’ll cover when we talk about the PAR1 file format in a follow-up article. ↩
[8] People with experience in abstract algebra might recognize this as arithmetic over \(\mathbb{F}_2[x]\), the polynomials with coefficients in the finite field with \(2\) elements. ↩
[9] Our use of \(\clplus\), \(\clminus\), \(\clmul\), and \(\cldiv\) to denote carry-less arithmetic clashes with our use of the same symbols to denote generic field operations. However, we’ll never need to talk about both at the same time, so whichever one we mean should be obvious in context. ↩
[10] This is a slightly stronger statement than Theorem 3. ↩
[11] People with experience in abstract algebra might recognize carry-less primes as irreducible elements of \(\mathbb{F}_2[x]\). ↩
[12] Coincidentally, \(283\) is also a regular prime number. Using another carry-less prime number \(256 \le p \lt 512\) would also yield a field with \(256\) elements, but is important to consistently use the same carry-less modulus; different carry-less moduli lead to fields with \(256\) elements that are isomorphic, but not identical.
Borrowing notation from CRCs, the carry-less modulus is sometimes represented as a hexadecimal number with the leading digit (which is always \(1\)) omitted. For example, \(283\) would be represented as \(\mathtt{0x1b}\), and we can say that we’re using the field with \(256\) elements defined by \(\mathtt{0x1b}\). ↩
[13] Galois field is just another name for finite field. ↩