Fred Akalin Notes on math, tech, and everything in between 2021-05-17T04:06:01-07:00 https://www.akalin.com/ Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. https://www.akalin.com/fta-connectedness The Fundamental Theorem of Algebra via Connectedness 2021-01-03T00:00:00-08:00 Fred Akalin https://www.akalin.com/ <p>It is intuitive that removing even a single point from a line disconnects it, but removing a finite set of points from a plane leaves it connected.</p> <figure> <img src="fta-connectedness-files/disconnected-line.png" alt="A line disconnected by a single point." /><figcaption aria-hidden="true">A line disconnected by a single point.</figcaption> </figure> <figure> <img src="fta-connectedness-files/connected-plane.png" style="width:50.0%" alt="A plane remaining connected even with a few points removed." /><figcaption aria-hidden="true">A plane remaining connected even with a few points removed.</figcaption> </figure> <p>However, this basic fact leads to a non-trivial property of real and complex polynomials: not all non-constant real polynomials have real roots, but all non-constant complex polynomials have complex roots. The latter is, in fact, the <em>fundamental theorem of algebra</em>:</p> <div class="theorem"> <p>(<span class="theorem-name">Fundamental theorem of algebra</span>.) Every non-constant complex polynomial has a root.</p> </div> <p>We’ll prove this theorem using nothing stronger than the complex inverse function theorem. Here’s a synopsis:</p> <ol type="1"> <li>Let <span class="math inline">p \colon \mathbb{C}→ \mathbb{C}</span> be a non-constant complex polynomial, and <span class="math inline">V_{\text{regular}}</span> its set of regular values.
Let <span class="math inline">P_{\text{pure}}= p^{-1}(V_{\text{regular}})</span> be its set of pure regular points, so that <span class="math inline">p</span> can be thought of as a <span class="math inline">P_{\text{pure}}→ V_{\text{regular}}</span> map.</li> <li>Any complex polynomial, and <span class="math inline">p</span> in particular, is a closed <span class="math inline">\mathbb{C}→ \mathbb{C}</span> map, and thus also a closed <span class="math inline">P_{\text{pure}}→ V_{\text{regular}}</span> map.</li> <li>Furthermore, by the inverse function theorem, <span class="math inline">p</span> is an open <span class="math inline">P_{\text{regular}}→ \mathbb{C}</span> map, and thus also an open <span class="math inline">P_{\text{pure}}→ V_{\text{regular}}</span> map.</li> <li><span class="math inline">p</span>, being non-constant, has only finitely many critical points. Therefore, <span class="math inline">V_{\text{regular}}</span> is the complex plane with a finite set of points removed, and thus is connected. (<em>This is the step that fails for real polynomials.</em>) Similarly, since each critical value has only finitely many preimages, <span class="math inline">P_{\text{pure}}</span> is connected.</li> <li><span class="math inline">p</span>, being a continuous, open, and closed <span class="math inline">P_{\text{pure}}→ V_{\text{regular}}</span> map, must take connected components to connected components.
Since <span class="math inline">P_{\text{pure}}</span> and <span class="math inline">V_{\text{regular}}</span> are both connected, <span class="math inline">p</span> maps <span class="math inline">P_{\text{pure}}</span> onto <span class="math inline">V_{\text{regular}}</span>.</li> <li><span class="math inline">p</span> also maps <span class="math inline">P_{\text{critical}}</span> onto <span class="math inline">V_{\text{critical}}</span>, so <span class="math inline">p</span> maps <span class="math inline">\mathbb{C}</span> onto <span class="math inline">\mathbb{C}</span>; in particular, it takes the value <span class="math inline">0</span>, so it has a root.</li> </ol> <p>This is a wonderfully succinct proof, but it’s full of subtleties and would benefit from elaboration (as well as some diagrams). We’ll do that in the rest of this article. First, we need some definitions.</p> <h3 id="points-and-values">Points and values</h3> <p>If a function <span class="math inline">f(x)</span> maps from <span class="math inline">A</span> to <span class="math inline">B</span>, we’ll call elements of <span class="math inline">A</span> <em>points</em> and elements of <span class="math inline">B</span> <em>values</em>; in our case, <span class="math inline">A</span> and <span class="math inline">B</span> will both be subsets of either <span class="math inline">\mathbb{R}</span> or <span class="math inline">\mathbb{C}</span>, but it’s helpful to distinguish when we’re talking about a real or complex number as a domain element versus a codomain element.</p> <p>If <span class="math inline">f(x)</span> is differentiable, we’ll call <span class="math inline">x</span> a <em>critical point</em> if <span class="math inline">f&#39;(x) = 0</span> and a <em>regular point</em> otherwise. We’ll call <span class="math inline">y</span> a <em>critical value</em> if <span class="math inline">y = f(x)</span> for some critical point <span class="math inline">x</span> and a <em>regular value</em> otherwise.
In particular, if <span class="math inline">y</span> is not in the image of <span class="math inline">f</span>, then <span class="math inline">y</span> is a regular value.</p> <p>A regular point may map to a critical value. In that case, we call it an <em>impure regular point</em> and a <em>pure regular point</em> otherwise. (This is nonstandard terminology, but it helps with visualizing what’s going on.)</p> <figure> <img src="fta-connectedness-files/real-function-points-values.png" style="width:50.0%" alt="The points and values of a real function. Red points are critical points, blue values are critical values, and green points are impure regular points. All other points are pure regular, and all other values are regular." /><figcaption aria-hidden="true">The points and values of a real function. <span style="color: red;">Red points</span> are <span style="color: red;">critical points</span>, <span style="color: blue;">blue values</span> are <span style="color: blue;">critical values</span>, and <span style="color: green;">green points</span> are <span style="color: green;">impure regular points</span>. All other points are pure regular, and all other values are regular.</figcaption> </figure> <figure> <img src="fta-connectedness-files/complex-function-points-values.png" alt="The points and values of a complex function, with the same colors as in the previous figure." /><figcaption aria-hidden="true">The points and values of a complex function, with the same colors as in the previous figure.</figcaption> </figure> <p>The strategy of the proof is to show that a non-constant complex polynomial <span class="math inline">f(x)</span> is surjective. By construction, <span class="math inline">f(x)</span> maps impure regular points and critical points onto the critical values. 
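</p> <p>As a concrete check of these definitions and of the last claim, here is a small sketch in Python (the language, the function names <code>evaluate</code>, <code>derivative</code>, and <code>classify</code>, and the example polynomial are illustrative choices, not from this article). It uses exact rational arithmetic via <code>fractions.Fraction</code> and, to stay short, assumes the critical points are supplied by hand rather than computed. For <span class="math inline">p(x) = x^3 - 3x</span>, the critical points are <span class="math inline">\pm 1</span> with critical values <span class="math inline">\mp 2</span>, and the regular points <span class="math inline">\pm 2</span> are impure because they also map to <span class="math inline">\pm 2</span>.</p>

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n (coefficients in increasing degree) by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def derivative(coeffs):
    """Coefficients of p'(x), given the coefficients of p(x) in increasing degree."""
    return [k * a for k, a in enumerate(coeffs)][1:]


def classify(coeffs, x, critical_points):
    """Label x as 'critical', 'impure regular', or 'pure regular' for p(x).

    Assumption: `critical_points` lists every root of p'(x); this sketch does not
    compute them itself.
    """
    if evaluate(derivative(coeffs), x) == 0:
        return "critical"
    critical_values = {evaluate(coeffs, c) for c in critical_points}
    if evaluate(coeffs, x) in critical_values:
        return "impure regular"  # a regular point that maps to a critical value
    return "pure regular"


# p(x) = x^3 - 3x, so p'(x) = 3x^2 - 3 and the critical points are -1 and 1.
p = [0, -3, 0, 1]
critical_points = [Fraction(-1), Fraction(1)]
for x in [Fraction(-2), Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(1), Fraction(2)]:
    print(x, classify(p, x, critical_points))
```

<p>Running it labels <span class="math inline">-2</span> and <span class="math inline">2</span> as impure regular, <span class="math inline">\pm 1</span> as critical, and <span class="math inline">0</span> and <span class="math inline">1/2</span> as pure regular.</p> <p>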
Then it suffices to show that <span class="math inline">f(x)</span> maps the pure regular points <span class="math inline">P_{\text{pure}}</span> onto the regular values <span class="math inline">V_{\text{regular}}</span>. In doing so, we’ll show that there are only a finite number of critical points, critical values, and impure regular points; therefore, <span class="math inline">P_{\text{pure}}</span> is the complex plane minus a finite number of points, and that is where connectedness comes into play.</p> <h3 id="connected-sets">Connected sets</h3> <p>A subset <span class="math inline">X</span> of a topological space is <em>disconnected</em> if it is the union of two disjoint, non-empty subsets that are open in <span class="math inline">X</span>, and <em>connected</em> otherwise.</p> <p>For example, the set <span class="math inline">X</span> in the first figure is the real line <span class="math inline">\mathbb{R}</span> with a single point <span class="math inline">a</span> removed. Then <span class="math inline">X = (-∞, a) ∪ (a, ∞)</span> is a union of two disjoint, non-empty open sets, so it is disconnected.</p> <p>It is harder to show that a set is connected. Instead, we can often establish a stronger property that’s easier to check directly. A subset <span class="math inline">X</span> of a topological space is <em>path-connected</em> if for every two points <span class="math inline">x</span> and <span class="math inline">y</span> in <span class="math inline">X</span>, there exists a <em>path</em> from <span class="math inline">x</span> to <span class="math inline">y</span>—that is, a continuous function <span class="math inline">f \colon [0, 1] → X</span> such that <span class="math inline">f(0) = x</span> and <span class="math inline">f(1) = y</span>.
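</p> <p>For the case that matters here, the plane with finitely many points removed, a path of this kind can be built by following the straight segment from <span class="math inline">x</span> to <span class="math inline">y</span> and detouring around each removed point on a small semicircle, as the argument below explains. The following Python sketch samples such a path at finitely many points, so it can only spot-check the construction, not prove anything. It represents points of the plane as Python <code>complex</code> numbers; the name <code>detour_path</code>, the sample count, and the collinearity tolerance are illustrative assumptions. It also takes the minimum distance <span class="math inline">d</span> over <span class="math inline">x</span>, <span class="math inline">y</span>, and the removed points together, so that the detours stay clear of the endpoints.</p>

```python
import cmath
import math


def detour_path(x, y, removed, samples=50):
    """Sample a path from x to y in the complex plane that avoids `removed`.

    Follows the straight segment from x to y, but replaces the piece of length 2r
    centered at each removed point on the segment with a semicircle of radius r.
    Assumes x != y and that neither x nor y is a removed point.
    """
    # d: smallest distance between any two of x, y, and the removed points.
    pts = [x, y] + list(removed)
    d = min(abs(a - b) for i, a in enumerate(pts) for b in pts[i + 1:])
    r = d / 3
    length = abs(y - x)
    u = (y - x) / length  # unit vector pointing from x toward y
    # Distances from x of the removed points that lie on the open segment.
    hits = []
    for p in removed:
        w = (p - x) / u  # p in coordinates where the segment is [0, length] on the real axis
        if abs(w.imag) < 1e-12 and 0 < w.real < length:
            hits.append(w.real)
    hits.sort()
    path, start = [], 0.0
    for t in hits:
        # Straight piece from distance `start` up to distance t - r.
        path += [x + (start + (t - r - start) * k / samples) * u for k in range(samples)]
        # Semicircle around the removed point, from distance t - r to t + r.
        center = x + t * u
        path += [center + r * cmath.exp(1j * math.pi * (1 - k / samples)) * u
                 for k in range(samples)]
        start = t + r
    # Final straight piece, ending at distance `length`, i.e. at y.
    path += [x + (start + (length - start) * k / samples) * u for k in range(samples + 1)]
    return path


removed = [0, 1, 1j]
path = detour_path(-2, 2, removed)
print(min(abs(z - q) for z in path for q in removed))  # close to r = 1/3 here
```

<p>With the removed points <span class="math inline">0</span>, <span class="math inline">1</span>, and <span class="math inline">i</span>, the segment from <span class="math inline">-2</span> to <span class="math inline">2</span> passes through <span class="math inline">0</span> and <span class="math inline">1</span>, so <span class="math inline">d = 1</span>, <span class="math inline">r = 1/3</span>, and the sampled path stays about <span class="math inline">1/3</span> away from every removed point.</p> <p>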
A path-connected set is automatically a connected set—being able to draw paths between any two points makes it impossible to split the set into two disjoint non-empty open subsets.</p> <p>Now let <span class="math inline">X</span> be the plane <span class="math inline">\mathbb{R}^2</span> or <span class="math inline">\mathbb{C}</span> with a finite number of points <span class="math inline">p_i</span> removed. Then we’ll show that <span class="math inline">X</span> is path-connected. Given distinct points <span class="math inline">x</span> and <span class="math inline">y</span> in <span class="math inline">X</span> (if <span class="math inline">x = y</span>, a constant path suffices), let <span class="math inline">d</span> be the minimum distance between any two distinct points among <span class="math inline">x</span>, <span class="math inline">y</span>, and the removed points <span class="math inline">p_i</span>, and let <span class="math inline">r = d/3</span>. Let <span class="math inline">f</span> be the straight-line path from <span class="math inline">x</span> to <span class="math inline">y</span>. For any <span class="math inline">p_i</span> that is on <span class="math inline">f</span>, replace the segment of length <span class="math inline">2r</span> centered at <span class="math inline">p_i</span> with a semicircular arc of radius <span class="math inline">r</span> around <span class="math inline">p_i</span>. Since <span class="math inline">r &lt; d/2</span>, no arc has any other removed point on it, no two arcs overlap, and the replaced segments do not reach <span class="math inline">x</span> or <span class="math inline">y</span>. Therefore, this modified path lies entirely in <span class="math inline">X</span>. Since <span class="math inline">x</span> and <span class="math inline">y</span> were arbitrary, <span class="math inline">X</span> is path-connected, and thus connected.</p> <figure> <img src="fta-connectedness-files/path-connected.png" style="width:50.0%" alt="The path between x and y on a plane with a finite number of points removed."
/><figcaption aria-hidden="true">The path between <span class="math inline">x</span> and <span class="math inline">y</span> on a plane with a finite number of points removed.</figcaption> </figure> <p>We’re most interested in connected sets that are maximal in the sense that they’re not contained in a larger connected set. These are called <em>connected components</em>, and any topological space can be partitioned into its connected components. For example, the set <span class="math inline">X</span> in the first figure has two connected components, <span class="math inline">(-∞, a)</span> and <span class="math inline">(a, ∞)</span>, whereas the plane with a finite number of points removed remains connected, and thus has only a single connected component. However, removing a line from a plane splits it into two connected components, one on each side of the line.</p> <p>A continuous function preserves connectedness: it maps connected sets to connected sets. However, it may map a connected component to a connected set that’s not a connected component. We want to show that real and complex polynomials map connected components to connected components—this leads us to the concepts of open and closed maps.</p> <h3 id="open-and-closed-functions">Open and closed functions</h3> <p>If a function <span class="math inline">f(x)</span> between topological spaces <span class="math inline">A</span> and <span class="math inline">B</span> sends open sets of <span class="math inline">A</span> to open sets of <span class="math inline">B</span>, we call it <em>open</em>. Similarly, if it sends closed sets of <span class="math inline">A</span> to closed sets of <span class="math inline">B</span>, we call it <em>closed</em>. Be careful!
As with sets, whether a function is open is unrelated to whether it is closed; a function may be neither open nor closed, just open, just closed, or both.</p> <figure> <img src="fta-connectedness-files/not-open-example.png" style="width:75.0%" alt="The real polynomial p(x) = x^2 + 1 is not open, since it maps the open interval (-1, +1) to the half-open interval \lbrack 1, 2), which is not open." /><figcaption aria-hidden="true">The real polynomial <span class="math inline">p(x) = x^2 + 1</span> is not open, since it maps the open interval <span class="math inline">(-1, +1)</span> to the half-open interval <span class="math inline">\lbrack 1, 2)</span>, which is not open.</figcaption> </figure> <p>We’re more interested in sets and functions that are both open and closed, which we’ll call <em>clopen</em>. A topological space <span class="math inline">A</span> always has two clopen subsets: <span class="math inline">\emptyset</span> and itself. If it’s disconnected, it has more: in general, a clopen subset <span class="math inline">X</span> is a union of connected components of <span class="math inline">A</span>. Conversely, if <span class="math inline">A</span> has finitely many connected components, each connected component is clopen.</p> <p>So if <span class="math inline">A</span> has finitely many connected components, then since a clopen function <span class="math inline">f(x)</span> from <span class="math inline">A</span> to <span class="math inline">B</span> sends clopen sets of <span class="math inline">A</span> to clopen sets of <span class="math inline">B</span>, it sends each connected component of <span class="math inline">A</span> to a union of connected components of <span class="math inline">B</span>.
If <span class="math inline">f(x)</span> is also continuous, then it must send a connected component of <span class="math inline">A</span> to a non-empty, connected, clopen set, which must then be a connected component of <span class="math inline">B</span>.</p> <p>Therefore, since real and complex polynomials are continuous, in order to show that they map connected components to connected components, we need to show that they are also clopen.</p> <h3 id="real-and-complex-polynomials-are-closed">Real and complex polynomials are closed</h3> <p>First, we want to show that a real polynomial <span class="math inline">p(x) \colon \mathbb{R}→ \mathbb{R}</span> or a complex polynomial <span class="math inline">p(x) \colon \mathbb{C}→ \mathbb{C}</span> is closed.</p> <p>If <span class="math inline">p(x)</span> is constant, then this follows immediately, since the image of any set is either empty or a single point. Otherwise, the essential property of polynomials that we use is that if <span class="math inline">|x| → ∞</span>, then <span class="math inline">|p(x)| → ∞</span>. In other words, if <span class="math inline">x_n</span> is a sequence such that <span class="math inline">p(x_n)</span> is bounded, then <span class="math inline">x_n</span> must also be bounded.</p> <p>Then let <span class="math inline">U</span> be a closed set of points, and let <span class="math inline">y ∈ \overline{p(U)}</span>; in other words, <span class="math inline">y</span> is the limit of some sequence of values in <span class="math inline">p(U)</span>. To show that <span class="math inline">p(U)</span> is closed, we want to show that <span class="math inline">y</span> is in fact in <span class="math inline">p(U)</span>. Writing that sequence of values as <span class="math inline">p(x_n)</span>, we get a sequence <span class="math inline">x_n</span> in <span class="math inline">U</span> such that <span class="math inline">p(x_n)</span> converges to <span class="math inline">y</span>.
Then <span class="math inline">p(x_n)</span> is bounded, so by the above, <span class="math inline">x_n</span> is also bounded. By the Bolzano–Weierstrass theorem, some subsequence <span class="math inline">x_m</span> of <span class="math inline">x_n</span> converges to some <span class="math inline">\tilde{x}</span>, and since <span class="math inline">U</span> is closed, <span class="math inline">\tilde{x} ∈ U</span>. Since <span class="math inline">p</span> is continuous, <span class="math inline">p(x_m)</span> then converges to <span class="math inline">p(\tilde{x})</span>, which must then equal <span class="math inline">y</span>. Therefore, <span class="math inline">y</span> is indeed in <span class="math inline">p(U)</span>, which shows that <span class="math inline">p(x)</span> is a closed map.</p> <figure> <img src="fta-connectedness-files/polynomials-are-closed.png" alt="Diagram for the proof that a non-constant polynomial p(x) is closed." /><figcaption aria-hidden="true">Diagram for the proof that a non-constant polynomial <span class="math inline">p(x)</span> is closed.</figcaption> </figure> <figure> <img src="fta-connectedness-files/not-closed.png" style="width:50.0%" alt="The function f(x) = 1/x is not closed, since the closed interval \lbrack 1, ∞) gets mapped to the half-open interval (0, 1\rbrack, which is not closed." /><figcaption aria-hidden="true">The function <span class="math inline">f(x) = 1/x</span> is not closed, since the closed interval <span class="math inline">\lbrack 1, ∞)</span> gets mapped to the half-open interval <span class="math inline">(0, 1\rbrack</span>, which is not closed.</figcaption> </figure> <p>So polynomials <span class="math inline">\mathbb{R}→ \mathbb{R}</span> or <span class="math inline">\mathbb{C}→ \mathbb{C}</span> are closed, but what we really want to show is that they’re also closed as maps from their pure regular points <span class="math inline">P_{\text{pure}}</span> to their regular values <span class="math inline">V_{\text{regular}}</span>.
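</p> <p>The closedness argument above rests on the fact that bounded values force bounded points. For a polynomial <span class="math inline">p(x) = a_n x^n + \dotsb + a_0</span> with <span class="math inline">n \geq 1</span> and <span class="math inline">a_n \neq 0</span>, a standard elementary estimate (not taken from this article) makes this explicit: if <span class="math inline">|x| \geq 1</span> and <span class="math inline">|p(x)| \leq M</span>, then <span class="math inline">|a_n| |x|^n \leq M + S |x|^{n-1}</span> with <span class="math inline">S = |a_0| + \dotsb + |a_{n-1}|</span>, so <span class="math inline">|x| \leq \max(1, (M + S)/|a_n|)</span>. Here is a short Python sketch of that bound with a grid spot-check on <span class="math inline">p(x) = x^2 + 1</span>; the function names and the grid are illustrative choices.</p>

```python
def evaluate(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n (coefficients in increasing degree) by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def preimage_radius(coeffs, M):
    """Return R such that |p(x)| <= M implies |x| <= R.

    Assumes coeffs = [a_0, ..., a_n] with n >= 1 and a_n != 0.
    Uses |x| <= max(1, (M + S) / |a_n|), where S = |a_0| + ... + |a_{n-1}|.
    """
    *lower, lead = coeffs
    S = sum(abs(a) for a in lower)
    return max(1.0, (M + S) / abs(lead))


# Spot-check on p(x) = x^2 + 1 over a grid of complex points with spacing 1/4.
p = [1, 0, 1]
M = 10.0
R = preimage_radius(p, M)  # (10 + 1) / 1 = 11.0
grid = [complex(a / 4, b / 4) for a in range(-80, 81) for b in range(-80, 81)]
small_values = [z for z in grid if abs(evaluate(p, z)) <= M]
print(R, max(abs(z) for z in small_values))
```

<p>The bound is far from tight (for this <span class="math inline">p(x)</span> and <span class="math inline">M</span>, the true radius is <span class="math inline">\sqrt{11}</span>), but any finite bound is enough for extracting a convergent subsequence.</p> <p>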
In general, restricting the domain or codomain of a function doesn’t preserve the property of being closed, but if <span class="math inline">f</span> is a closed map from <span class="math inline">A</span> to <span class="math inline">B</span> and <span class="math inline">D ⊆ B</span>, then <span class="math inline">f</span> is a closed map from <span class="math inline">C = f^{-1}(D)</span> to <span class="math inline">D</span>.</p> <p>A proof: if <span class="math inline">U</span> is a closed subset of <span class="math inline">C</span>, then <span class="math inline">U = U&#39; ∩ C</span> for some closed subset <span class="math inline">U&#39;</span> of <span class="math inline">A</span>, by the definition of the subspace topology. In general we have the inclusion <span class="math inline">f(X ∩ Y) ⊆ f(X) ∩ f(Y)</span>, so <span class="math display"> f(U&#39; ∩ C) ⊆ f(U&#39;) ∩ f(C) ⊆ f(U&#39;) ∩ D\text{.} </span></p> <p>Conversely, if <span class="math inline">y ∈ f(U&#39;) ∩ D</span>, then <span class="math inline">f(x) = y</span> for some <span class="math inline">x ∈ U&#39;</span>. Since <span class="math inline">f(x) ∈ D</span>, <span class="math inline">x ∈ C = f^{-1}(D)</span>, so <span class="math inline">x ∈ U&#39; ∩ C</span>.
Therefore, <span class="math inline">y ∈ f(U&#39; ∩ C)</span>, so <span class="math inline">f(U&#39;) ∩ D ⊆ f(U&#39; ∩ C)</span>, and</p> <p><span class="math display"> f(U) = f(U&#39; ∩ C) = f(U&#39;) ∩ D\text{.}</span></p> <p><span class="math inline">f(U&#39;)</span> is a closed subset of <span class="math inline">B</span> since <span class="math inline">f</span> is closed, and so <span class="math inline">f(U&#39;) ∩ D</span> is a closed subset of <span class="math inline">D</span>.</p> <p>In particular, <span class="math inline">P_{\text{pure}}</span> is the inverse image of <span class="math inline">V_{\text{regular}}</span> by construction, so a real or complex polynomial is a closed map from <span class="math inline">P_{\text{pure}}</span> to <span class="math inline">V_{\text{regular}}</span>.</p> <h3 id="real-and-complex-polynomials-have-finitely-many-critical-points">Real and complex polynomials have finitely many critical points</h3> <p>One subtle but important fact that we need is that non-constant real and complex polynomials have finitely many critical points. A critical point of the real or complex polynomial <span class="math inline">p(x)</span> is a root of <span class="math inline">p&#39;(x)</span>, which is a non-zero polynomial when <span class="math inline">p(x)</span> is non-constant, so the statement that a non-constant real or complex polynomial has finitely many critical points is equivalent to the statement that a non-zero real or complex polynomial has finitely many roots.</p> <p>But isn’t that equivalent to the fundamental theorem of algebra? No! For one, it’s also true for real polynomials.
More importantly, it’s an upper bound on the number of roots, whereas the fundamental theorem of algebra is a lower bound.</p> <p>If a real or complex polynomial <span class="math inline">p(x)</span> of positive degree <span class="math inline">n</span> has a root <span class="math inline">r</span>, then <span class="math inline">p(x) = (x - r) q(x)</span> for some polynomial <span class="math inline">q(x)</span> of degree <span class="math inline">n - 1</span>, and any other root <span class="math inline">s ≠ r</span> of <span class="math inline">p(x)</span> is a root of <span class="math inline">q(x)</span>, since <span class="math inline">(s - r) q(s) = 0</span> and <span class="math inline">s - r ≠ 0</span>. Then since non-zero degree-<span class="math inline">0</span> polynomials have no roots, by induction <span class="math inline">p(x)</span> has at most <span class="math inline">n</span> roots.</p> <p>Therefore, a non-constant real or complex polynomial of degree <span class="math inline">n</span> has at most <span class="math inline">n - 1</span> critical points, since its derivative is a non-zero polynomial of degree <span class="math inline">n - 1</span>.</p> <h3 id="real-and-complex-polynomials-are-open-on-regular-points">Real and complex polynomials are open on regular points</h3> <p>A real polynomial <span class="math inline">p(x) \colon \mathbb{R}→ \mathbb{R}</span> is <em>not</em> open in general; a figure above shows that <span class="math inline">p(x) = x^2 + 1</span> is a counterexample. Fortunately, it’s only the critical points that are the problem: as functions from <span class="math inline">P_{\text{regular}}</span> to <span class="math inline">\mathbb{R}</span>, real polynomials are open.</p> <p>The complex case is actually easier—the <a href="https://en.wikipedia.org/wiki/Open_mapping_theorem_(complex_analysis)">open mapping theorem</a> implies that a non-constant complex polynomial <span class="math inline">p(x) \colon \mathbb{C}→ \mathbb{C}</span> is open.
However, that theorem uses a bit more complex analysis machinery than we’d like—it turns out that we can use the same proof as in the real case (which is simpler) to show that complex polynomials are open as functions from <span class="math inline">P_{\text{regular}}</span> to <span class="math inline">\mathbb{C}</span>.</p> <p>So let’s start the proof. Let <span class="math inline">p(x)</span> be a non-constant real (or complex) polynomial, considered as a function from <span class="math inline">P_{\text{regular}}</span> to <span class="math inline">\mathbb{R}</span> (or <span class="math inline">\mathbb{C}</span>). Let <span class="math inline">U ⊆ P_{\text{regular}}</span> be open; we want to show that <span class="math inline">p(U)</span> is also open.</p> <p>Let <span class="math inline">y ∈ p(U)</span>. Then <span class="math inline">y = p(x)</span> for some regular point <span class="math inline">x ∈ U</span>. Since <span class="math inline">p&#39;(x) ≠ 0</span>, by the real inverse function theorem (or the complex inverse function theorem) there is some open set <span class="math inline">X</span> containing <span class="math inline">x</span> such that <span class="math inline">p(X)</span> is open and <span class="math inline">p</span> restricts to a diffeomorphism from <span class="math inline">X</span> onto <span class="math inline">p(X)</span>.</p> <p><span class="math inline">U</span> is open in <span class="math inline">P_{\text{regular}}</span>, which is <span class="math inline">\mathbb{R}</span> (or <span class="math inline">\mathbb{C}</span>) minus the finitely many critical points of <span class="math inline">p(x)</span>. Therefore, <span class="math inline">U</span> is an open subset of <span class="math inline">\mathbb{R}</span> (or <span class="math inline">\mathbb{C}</span>) minus a finite number of points, and is thus also open in <span class="math inline">\mathbb{R}</span> (or <span class="math inline">\mathbb{C}</span>).
(This is where we use the fact that <span class="math inline">p(x)</span> has a finite number of critical points.)</p> <p>Since <span class="math inline">U</span> is open, so is <span class="math inline">X ∩ U</span>. Since <span class="math inline">p</span> maps <span class="math inline">X</span> diffeomorphically onto the open set <span class="math inline">p(X)</span>, it maps the open subset <span class="math inline">X ∩ U</span> of <span class="math inline">X</span> onto an open set <span class="math inline">p(X ∩ U)</span>, which is contained in <span class="math inline">p(U)</span> and contains <span class="math inline">y</span>. Since <span class="math inline">y</span> was arbitrary, <span class="math inline">p(U)</span> is open.</p> <figure> <img src="fta-connectedness-files/real-open.png" alt="With the real polynomial p(x) = x^3, X = (-a, 1+a) is an open set containing 1 that is diffeomorphic to p(X). Then X ∩ U = (-a, 0) ∪ (0, 1 + a) is also open, and thus so is p(X ∩ U)." /><figcaption aria-hidden="true">With the real polynomial <span class="math inline">p(x) = x^3</span>, <span class="math inline">X = (-a, 1+a)</span> is an open set containing <span class="math inline">1</span> that is diffeomorphic to <span class="math inline">p(X)</span>. Then <span class="math inline">X ∩ U = (-a, 0) ∪ (0, 1 + a)</span> is also open, and thus so is <span class="math inline">p(X ∩ U)</span>.</figcaption> </figure> <figure> <img src="fta-connectedness-files/complex-open.png" alt="A similar diagram for a complex polynomial p(x)."
/><figcaption aria-hidden="true">A similar diagram for a complex polynomial <span class="math inline">p(x)</span>.</figcaption> </figure> <p>Since a real or complex polynomial <span class="math inline">p(x)</span> is open from <span class="math inline">P_{\text{regular}}</span> to <span class="math inline">\mathbb{R}</span> or <span class="math inline">\mathbb{C}</span>, the same reasoning as in the closed case (with open sets in place of closed sets) applies: every point that maps to a regular value is a regular point, so <span class="math inline">P_{\text{pure}}= p^{-1}(V_{\text{regular}})</span> is exactly the inverse image of <span class="math inline">V_{\text{regular}}</span> under <span class="math inline">p</span> restricted to <span class="math inline">P_{\text{regular}}</span>. Therefore, a real or complex polynomial is an open map from <span class="math inline">P_{\text{pure}}</span> to <span class="math inline">V_{\text{regular}}</span>.</p> <h3 id="non-constant-complex-polynomials-are-surjective-but-not-real-ones">Non-constant complex polynomials are surjective (but not real ones)</h3> <p>Now we’re ready to put it all together. Let <span class="math inline">p(x)</span> be a non-constant complex polynomial. By the above, it is clopen as a map from <span class="math inline">P_{\text{pure}}</span> to <span class="math inline">V_{\text{regular}}</span>. Therefore, since it’s also continuous, it maps each connected component of <span class="math inline">P_{\text{pure}}</span> onto a connected component of <span class="math inline">V_{\text{regular}}</span>. But both <span class="math inline">P_{\text{pure}}</span> and <span class="math inline">V_{\text{regular}}</span> are <span class="math inline">\mathbb{C}</span> minus a finite set of points, and thus they both have a single connected component. Therefore, <span class="math inline">p(x)</span> maps <span class="math inline">P_{\text{pure}}</span> onto <span class="math inline">V_{\text{regular}}</span>.
Since it also maps <span class="math inline">P_{\text{critical}}</span> onto <span class="math inline">V_{\text{critical}}</span>, it maps <span class="math inline">\mathbb{C}</span> onto <span class="math inline">\mathbb{C}= V_{\text{critical}}∪ V_{\text{regular}}</span>.</p> <p>In particular, <span class="math inline">0</span> is in the image of <span class="math inline">p(x)</span>, so <span class="math inline">p(x)</span> has a root, which is the fundamental theorem of algebra.</p> <p>What about the real case? Consider the real polynomial <span class="math inline">p(x) = x^2 + 1</span>. Its only critical point is <span class="math inline">0</span>, and <span class="math inline">0</span> is also the only point mapping to the critical value <span class="math inline">1</span>, so <span class="math inline">P_{\text{pure}}= \mathbb{R}\setminus \{0\}</span> has two connected components: <span class="math inline">(-∞, 0)</span> and <span class="math inline">(0, ∞)</span>. <span class="math inline">V_{\text{regular}}</span> also has two connected components, <span class="math inline">(-∞, 1)</span> and <span class="math inline">(1, ∞)</span>, but <span class="math inline">p(x)</span> maps both connected components of <span class="math inline">P_{\text{pure}}</span> onto <span class="math inline">(1, ∞)</span>, so it isn’t surjective on <span class="math inline">\mathbb{R}</span>, and in particular doesn’t have a root.</p> <figure> <img src="fta-connectedness-files/real-poly-connected-components.png" style="width:50.0%" alt="The polynomial p(x) = x^2 + 1 maps the two connected components (-∞, 0) and (0, ∞) of P_{\text{pure}} to only one connected component (1, ∞) of V_{\text{regular}}."
/><figcaption aria-hidden="true">The polynomial <span class="math inline">p(x) = x^2 + 1</span> maps the two connected components <span class="math inline">(-∞, 0)</span> and <span class="math inline">(0, ∞)</span> of <span class="math inline">P_{\text{pure}}</span> to only one connected component <span class="math inline">(1, ∞)</span> of <span class="math inline">V_{\text{regular}}</span>.</figcaption> </figure> <h3 id="further-reading">Further reading</h3> <p><a href="https://mathoverflow.net/a/10684">This MathOverflow answer</a> is where I first found this proof, although it’s slightly less elementary (it relies on polynomials being proper) and even more terse.</p> <p>Milnor’s wonderful book “Topology from the Differentiable Viewpoint” has a <a href="https://www.google.com/books/edition/Topology_from_the_Differentiable_Viewpoi/BaQYYJp84cYC?gbpv=1&amp;pg=PA8">similarly elegant proof</a> using the fact that a sphere minus a finite number of points remains connected, whereas a circle minus at least two points becomes disconnected. However, it requires somewhat more machinery.</p> <p><a href="http://faculty.bard.edu/~belk/math461s11/InverseFunctionTheorem.pdf">This set of notes</a> gives a self-contained proof of the inverse function theorem for <span class="math inline">\mathbb{R}^n</span> (note that the inverse function theorem for <span class="math inline">\mathbb{C}</span> reduces to the inverse function theorem for <span class="math inline">\mathbb{R}^2</span> by the <a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Riemann_equations">Cauchy-Riemann equations</a>). It turns out that a property called “local surjectivity” is all that’s needed to prove openness, but that’s less well-known and only slightly less complicated than the full inverse function theorem.</p> https://www.akalin.com/curvature-moving-frames Curvature computations with moving frames 2018-03-22T00:00:00-07:00 Fred Akalin https://www.akalin.com/
<script> KaTeXMacros = { "\\pd": "\\frac{∂{#1}}{∂{#2}}", "\\CSF": "Γ_{#1}", "\\CS": "{Γ^{#1}}_{#2}", "\\cnf": "{ω^{#1}}_{#2}", "\\crf": "{Ω^{#1}}_{#2}", "\\Riem": "{\\operatorname{Riem}^{#1}}_{#2}", "\\Ric": "\\operatorname{Ric}_{#1}", "\\sgn": "\\operatorname{sgn}", }; </script> <style> div.cheatsheet, div.important-equation { border: 1px solid #002b36; /* solarized base03 */ background-color: #fdf6e3; /* solarized base3 */ color: #111; margin: 0.5em 0em; text-align: left; padding-left: 0.5em; padding-right: 0.5em; } div.cheatsheet > h2 { font-weight: bold; } li > h3 { font-weight: bold; font-style: italic; } </style> <section> <header> <h2>Overview</h2> </header> <p>Given a metric on a manifold, it is often necessary to compute its curvature. However, the usual method of first computing the Christoffel symbols and then using those to compute the Riemann curvature tensor is tedious and error-prone.</p> <p>Fortunately, there&rsquo;s another way to compute the curvature that&rsquo;s often quicker and easier: Cartan&rsquo;s method of moving frames, or the <em>repère mobile</em>. Unfortunately, explanations of this method aren&rsquo;t very clear, so here I&rsquo;m going to provide my own, based on working through a few examples.</p> <p>I&rsquo;m going to assume that you know enough Riemannian geometry to be able to compute curvature the usual way, and also that you&rsquo;re familiar with the basics of differential forms and exterior differentiation. Some familiarity with <a href="https://en.wikipedia.org/wiki/Pseudo-Riemannian_manifold">semi-Riemannian metrics</a> will also be helpful, since a lot of motivating examples come from general relativity, which uses <a href="https://en.wikipedia.org/wiki/Pseudo-Riemannian_manifold#Lorentzian_manifold">Lorentzian metrics</a>.</p> </section> <section> <header> <h2>The coordinate frame method</h2> </header> <p>First, a quick overview of the usual method using coordinate frames. 
Let $$g = g_{ij} \, dx^i ⊗ dx^j$$ be a given semi-Riemannian metric expressed in terms of the coordinates $$(x^1, \dotsc, x^n)$$. We first compute the <em>Christoffel symbols</em> using the formula $\CS{k}{ij} = \frac{1}{2} (g^*)^{kl} \left(∂_j g_{il} + ∂_i g_{lj} - ∂_l g_{ij}\right)\text{,}$ where $$(g^*)^{ij}$$ are the components of the dual metric $$g^*$$, which can be computed by taking components of the inverse of the matrix $$G[i, j] = g_{ij}$$ formed from the metric components, i.e. $$(g^*)^{ij} = G^{-1}[i, j]$$. Recall that the Christoffel symbols are symmetric in the lower indices, so if our manifold is $$n$$-dimensional, then in general we have $$n^2(n+1)/2$$ independent Christoffel symbols.</p> <p>Note that we use the <a href="https://en.wikipedia.org/wiki/Einstein_notation">Einstein summation convention</a>; in the absence of a summation sign, index variables that appear once as a superscript and once as a subscript are implicitly summed over.</p> <p>A useful special case is when the metric $$g$$ is diagonal,<sup><a href="#fn1" id="r1"></a></sup> i.e. $$g = g_{ii} \, dx^i ⊗ dx^i$$. Then $$(g^*)^{ii} = 1/g_{ii}$$ and \begin{alignedat}{2} \CS{k}{ij} &= 0 \qquad & \CS{k}{ik} &= \frac{∂_i g_{kk}}{2 g_{kk}} \\ \CS{k}{ii} &= -\frac{∂_k g_{ii}}{2 g_{kk}} \qquad & \CS{i}{ii} &= \frac{∂_i g_{ii}}{2 g_{ii}}\text{,} \end{alignedat} where $$i$$, $$j$$, and $$k$$ are distinct. Therefore in this case we have $$n^2$$ possibly non-zero independent Christoffel symbols.</p> <p>The Christoffel symbols are important in their own right, but we need them only to compute curvature.
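</p>
<p>The coordinate recipe is mechanical enough to script. Below is a minimal Python sketch, not the author&rsquo;s code. The helpers <code>invert</code> and <code>christoffel</code> are hypothetical, and partial derivatives are approximated by central finite differences at a single point rather than computed symbolically. The sketch evaluates the general formula $$\CS{k}{ij} = \frac{1}{2} (g^*)^{kl} \left(∂_j g_{il} + ∂_i g_{lj} - ∂_l g_{ij}\right)$$, obtaining the dual metric by inverting the matrix $$G$$. It then applies this to the flat metric $$dr ⊗ dr + r^2 \, dθ ⊗ dθ$$ in polar coordinates, where the diagonal shortcuts predict $$\CS{r}{θθ} = -r$$ and $$\CS{θ}{rθ} = 1/r$$.</p>

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row with the corresponding row of the identity matrix.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def christoffel(metric, x, h=1e-5):
    """Approximate Γ^k_ij at the point x; the result is indexed as gamma[k][i][j].

    metric(x) returns the n-by-n list of components g_ij at x.  Partial
    derivatives are approximated by central differences with step h.
    """
    n = len(x)

    def partial(l):
        # Returns the matrix of ∂_l g_ij at x.
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = metric(xp), metric(xm)
        return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)]

    dg = [partial(l) for l in range(n)]  # dg[l][i][j] ≈ ∂_l g_ij
    ginv = invert(metric(x))             # ginv[k][l] = (g*)^{kl}
    return [[[0.5 * sum(ginv[k][l] * (dg[j][i][l] + dg[i][l][j] - dg[l][i][j])
                        for l in range(n))
              for j in range(n)]
             for i in range(n)]
            for k in range(n)]


def polar(x):
    """Flat plane in polar coordinates (r, θ): g = dr⊗dr + r^2 dθ⊗dθ."""
    r = x[0]
    return [[1.0, 0.0], [0.0, r ** 2]]


gamma = christoffel(polar, [2.0, 0.3])
# Index 0 is r and index 1 is θ, so these approximate Γ^r_θθ and Γ^θ_rθ.
print(gamma[0][1][1], gamma[1][0][1])
```

<p>At $$r = 2$$ the printed values should be close to $$-2$$ and $$0.5$$. Because the sketch uses the general formula, it also applies to non-diagonal metrics. For diagonal ones it should agree with the shortcut formulas above, up to finite-difference error.</p>
<p>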
We can compute the components of the <em>Riemann curvature tensor</em> using the formula $\Riem{k}{lij} = ∂_i \CS{k}{jl} - ∂_j \CS{k}{il} + \CS{k}{im} \CS{m}{jl} - \CS{k}{jm} \CS{m}{il}\text{.}$ We can then compute the <em>Ricci curvature tensor</em> and the <em>scalar curvature</em>: $\Ric{ij} = \Riem{k}{ikj} \qquad S = (g^*)^{ij} \Ric{ij}\text{.}$</p> <p>For applications, we&rsquo;re most interested in the Ricci curvature tensor, so we usually just want to calculate that directly: $\Ric{ij} = ∂_k \CS{k}{ji} - ∂_j \CS{k}{ki} + \CS{k}{km} \CS{m}{ji} - \CS{k}{jm} \CS{m}{ki}\text{.}$</p> <div class="cheatsheet"> <h2>Cheatsheet: coordinate frame method</h2> <div class="p">Given the components $$g_{ij}$$ of a semi-Riemannian metric: <ol> <li>Compute the Christoffel symbols. If the metric $$g$$ is diagonal, use \begin{alignedat}{2} \CS{k}{ij} &= 0 \qquad & \CS{k}{ik} &= \frac{∂_i g_{kk}}{2 g_{kk}} \\ \CS{k}{ii} &= -\frac{∂_k g_{ii}}{2 g_{kk}} \qquad & \CS{i}{ii} &= \frac{∂_i g_{ii}}{2 g_{ii}}\text{.} \end{alignedat} Otherwise, compute the dual metric components $$(g^*)^{ij} = G^{-1}[i, j]$$ where $$G[i, j] = g_{ij}$$ and use $\CS{k}{ij} = \frac{1}{2} (g^*)^{kl} \left(∂_j g_{il} + ∂_i g_{lj} - ∂_l g_{ij}\right)\text{.}$</li> <li>Compute the Ricci curvature tensor: $\Ric{ij} = ∂_k \CS{k}{ji} - ∂_j \CS{k}{ki} + \CS{k}{km} \CS{m}{ji} - \CS{k}{jm} \CS{m}{ki}\text{.}$</li> </ol> </div> </div> </section> <section> <header> <h2>The Lagrangian method</h2> </header> <p>An alternate method for computing the Christoffel symbols is to write down the Lagrangian corresponding to the metric: $L(x^1, \dotsc, x^n, v^1, \dotsc, v^n) = g_{ij}(x^1, \dotsc, x^n) \, v^i v^j$ and then to compute the Euler-Lagrange equations for a path $$γ(t) = \big(x^1(t), \dotsc, x^n(t)\big)$$: $\frac{d}{dt} \left( \frac{∂ L}{∂ v^k}(γ(t), \dot{γ}(t)) \right) - \frac{∂ L}{∂ x^k}(γ(t), \dot{γ}(t)) = 0$ to get the geodesic equations. 
Then we can compare these equations to the geodesic equations expressed in terms of the Christoffel symbols $\ddot{γ}^k + \CS{k}{ij} \dot{γ}^i \dot{γ}^j = 0\text{,}$ and, after solving the Euler-Lagrange equations for the $$\ddot{γ}^k$$, read off the Christoffel symbols from the coefficients of the $$\dot{γ}^i \dot{γ}^j$$ terms. (For $$i ≠ j$$, the coefficient of $$\dot{γ}^i \dot{γ}^j$$ collects both $$\CS{k}{ij}$$ and $$\CS{k}{ji}$$, so it equals $$2 \CS{k}{ij}$$.)</p> <p>I&rsquo;m not convinced that this method saves that much work, especially when the metric is diagonal, but it&rsquo;s at least a clearer way to organize the computations for the Christoffel symbols.</p> <div class="cheatsheet"> <h2>Cheatsheet: Lagrangian method</h2> <div class="p">Given the components $$g_{ij}$$ of a semi-Riemannian metric: <ol> <li>With the Lagrangian $L = g_{ij} \, v^i v^j\text{,}$ compute the Euler-Lagrange equations $\frac{d}{dt} \left( \frac{∂ L}{∂ v^k}(γ(t), \dot{γ}(t)) \right) - \frac{∂ L}{∂ x^k}(γ(t), \dot{γ}(t)) = 0\text{.}$</li> <li>Solve the Euler-Lagrange equations for the $$\ddot{γ}^k$$, compare them to the geodesic equation $\ddot{γ}^k + \CS{k}{ij} \dot{γ}^i \dot{γ}^j = 0$ and read off the Christoffel symbols $$\CS{k}{ij}$$, remembering that for $$i ≠ j$$ the coefficient of $$\dot{γ}^i \dot{γ}^j$$ is $$2 \CS{k}{ij}$$.</li> <li>Compute the Ricci curvature tensor: $\Ric{ij} = ∂_k \CS{k}{ji} - ∂_j \CS{k}{ki} + \CS{k}{km} \CS{m}{ji} - \CS{k}{jm} \CS{m}{ki}\text{.}$</li> </ol> </div> </div> </section> <section> <header> <h2>The moving frame method</h2> </header> <p>Now, finally, I can explain the method of moving frames. Don&rsquo;t worry too much about understanding this the first time through; I suggest skimming this section and then following along with the examples below, referring back as necessary.</p> <p>For now, let&rsquo;s assume that we have not a semi-Riemannian, but a Riemannian metric $$g = g_{ij} \, dx^i ⊗ dx^j$$ expressed in terms of the coordinates $$(x^1, \dotsc, x^n)$$.
We want to find <em>basis one-forms</em> $$(θ^1, \dotsc, θ^n)$$ such that $g = ∑_i θ^i ⊗ θ^i\text{.}$ If the metric is diagonal, this is easy (suspending the summation convention): $θ^i = \sqrt{g_{ii}} \, dx^i\text{.}$ If instead the metric is not diagonal, we may still be able to factor it into a &ldquo;sum of squares&rdquo; form by inspection. Otherwise, an equivalent definition of the $$θ^i$$ is that $g^*(θ^i, θ^j) = δ^i_j\text{,}$ i.e. the basis one-forms $$θ^i$$ comprise an <em>orthonormal dual frame</em>. We can then use a <a href="https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process">Gram-Schmidt-like</a> process on the $$dx^i$$ or some ad hoc method to compute the basis one-forms.</p> <p>It is also convenient to express the coordinate forms in terms of the basis one-forms, which is again simple if the metric is diagonal: $dx^i = \frac{1}{\sqrt{g_{ii}}} \, θ^i\text{.}$ Otherwise, one would need to invert the matrix expressing the $$θ^i$$ in terms of the $$dx^i$$.</p> <div class="p">The next step is to compute the <em>connection one-forms</em> $$\cnf{i}{j}$$. To do so, we compute the exterior derivatives $$dθ^i$$ of the basis one-forms and express them in terms of the basis two-forms, i.e. $dθ^i = a^i_{jk} \, θ^j ∧ θ^k$ for functions $$a^i_{jk}$$. Then we can use <em>Cartan&rsquo;s first structure equation</em> <div class="important-equation"> $dθ^i = -\cnf{i}{j} ∧ θ^j$ </div> and the fact that <em>the connection forms are skew symmetric</em> <div class="important-equation"> $\cnf{i}{j} = -\cnf{j}{i}$ </div> to deduce the $$\cnf{i}{j}$$.</div> <p>There&rsquo;s an explicit general formula for $$\cnf{i}{j}$$ in terms of the basis one-forms,<sup><a href="#fn2" id="r2"></a></sup> but it&rsquo;s often easier to compare the expressions for $$dθ^i$$ to the form of the first structure equation, guess what the connection forms are, taking advantage of their skew symmetry, and check that the first structure equation holds.
In fact, if the metric is diagonal, the expressions for $$dθ^i$$ are nice enough that you can immediately read off the connection forms. This &ldquo;guess and check&rdquo; method works because the connection forms are guaranteed to exist, and furthermore are guaranteed to be unique, so any guessed list of $$\cnf{i}{j}$$ that is skew symmetric and satisfies the first structure equation <em>must</em> be the connection forms.</p> <p>Note that skew symmetry immediately implies that (suspending the Einstein summation convention) $\cnf{i}{i} = 0\text{.}$ Therefore, we have $$n(n-1)/2$$ independent connection forms.</p> <p>There <em>is</em> a formula for the connection forms when $$g$$ is diagonal, which is more useful for deducing properties of diagonal metrics than it is for doing calculations. Suspending the summation convention, \begin{aligned} \cnf{i}{j} &= \frac{∂_j g_{ii}}{2 g_{ii} \sqrt{g_{jj}}} \, θ^i - \frac{∂_i g_{jj}}{2 g_{jj} \sqrt{g_{ii}}} \, θ^j \\ &= \frac{∂_j g_{ii}}{2 \sqrt{g_{ii} g_{jj}}} \, dx^i - \frac{∂_i g_{jj}}{2 \sqrt{g_{ii} g_{jj}}} \, dx^j\text{.} \end{aligned} This formula implies that a diagonal metric has connection forms with at most two components each, as opposed to $$n$$ components in general. Furthermore, if a diagonal metric depends only on a single coordinate $$x^r$$, the only possible non-zero connection forms up to skew symmetry are $$\cnf{i}{r}$$, which are proportional to $$θ^i$$. If instead a diagonal metric depends on two coordinates $$x^r$$ and $$x^s$$, then the only possible non-zero connection forms up to skew symmetry are $$\cnf{i}{r}$$, $$\cnf{i}{s}$$, or $$\cnf{r}{s}$$. The first two cases are proportional to $$θ^i$$, and the last case has at most two components: one proportional to $$θ^r$$ and another proportional to $$θ^s$$.</p> <div class="p">The connection forms play an important role similar to the Christoffel symbols, but we need them only to compute curvature.
First, observe that we can express each connection form in two ways: in terms of the $$dx^i$$, and in terms of the $$θ^i$$. We need to compute the derivatives $$d\cnf{i}{j}$$, which is easiest to do if $$\cnf{i}{j}$$ is expressed in terms of the $$dx^i$$, since $$d(dx^i) = 0$$. Then we can compute the <em>curvature forms</em> $$\crf{i}{j}$$ using <em>Cartan&rsquo;s second structure equation</em> <div class="important-equation"> $\crf{i}{j} = d\cnf{i}{j} + \cnf{i}{k} ∧ \cnf{k}{j}\text{.}$ </div> Like the connection forms, <em>the curvature forms are skew symmetric</em>: <div class="important-equation"> $\crf{i}{j} = -\crf{j}{i}\text{,}$ </div> so we need only calculate $$n(n-1)/2$$ independent curvature forms, e.g. the ones where $$i \lt j$$. Also note that in the $$\cnf{i}{k} ∧ \cnf{k}{j}$$ term, one need only take the sum over the $$n - 2$$ terms $$k ∉ \{ i, j \}$$, since (suspending the summation convention) $$\cnf{i}{i} = \cnf{j}{j} = 0$$.</div> <p>From the properties discussed above, if a diagonal metric depends only on a single coordinate, then each curvature form $$\crf{i}{j}$$ is proportional to $$θ^i ∧ θ^j$$.
If instead a diagonal metric depends on two coordinates $$x^r$$ and $$x^s$$, then each curvature form $$\crf{i}{r}$$ or $$\crf{i}{s}$$, up to skew symmetry, has at most two components: one proportional to $$θ^i ∧ θ^r$$ and another proportional to $$θ^i ∧ θ^s$$, and all other curvature forms $$\crf{i}{j}$$ are proportional to $$θ^i ∧ θ^j$$.</p> <p>At this point we&rsquo;re done, since the Riemann curvature tensor with respect to the orthonormal frame $$(E_1, \dotsc, E_n)$$ dual to $$(θ^1, \dotsc, θ^n)$$ is $\Riem{l}{kij} = \crf{l}{k}(E_i, E_j)$ and the Ricci curvature tensor is $\Ric{ij} = \crf{k}{i}(E_k, E_j)\text{.}$ Note that it&rsquo;s not necessary to explicitly calculate $$E_i$$; it&rsquo;s enough to use the definition $θ^i(E_j) = δ^i_j\text{,}$ and the definition of the wedge product to derive the relations $(θ^i ∧ θ^j)(E_k, E_l) = \begin{cases} +1 & k = i ≠ j = l \\ -1 & l = i ≠ j = k \\ 0 & \text{otherwise,} \end{cases}$ which can then be used to compute the curvature tensor components.</p> <p>From the properties discussed above, if a diagonal metric depends only on a single coordinate, then $$\crf{i}{j}$$ is proportional to $$θ^i ∧ θ^j$$, which implies that $$\Ric{}$$ is also diagonal. Furthermore, if the metric is diagonal and depends on two coordinates $$x^k$$ and $$x^l$$, then the only possible off-diagonal component is $$\Ric{kl}$$.<sup><a href="#fn3" id="r3"></a></sup></p> <div class="cheatsheet"> <h2>Cheatsheet: The moving frame method for Riemannian metrics</h2> <div class="p">Given the components $$g_{ij}$$ of a Riemannian metric: <ol> <li>Find an orthonormal dual frame, i.e. 
basis one-forms $$(θ^1, \dotsc, θ^n)$$ such that $g = ∑_i θ^i ⊗ θ^i\text{.}$ If the metric is diagonal, then (suspending the summation convention) $θ^i = \sqrt{g_{ii}} \, dx^i\text{.}$</li> <li>Use the first structure equation $dθ^i = -\cnf{i}{j} ∧ θ^j$ and the skew symmetry relations $\cnf{i}{j} = -\cnf{j}{i}$ to deduce the connection forms $$\cnf{i}{j}$$.</li> <li>Compute the curvature forms using the second structure equation $\crf{i}{j} = d\cnf{i}{j} + \cnf{i}{k} ∧ \cnf{k}{j}$ and the skew symmetry relations $\crf{i}{j} = -\crf{j}{i}\text{.}$ Note that it&rsquo;s easiest to compute $$d\cnf{i}{j}$$ when $$\cnf{i}{j}$$ is expressed in terms of the $$dx^i$$, since $$d(dx^i) = 0$$.</li> <li>Compute the components of the Ricci curvature tensor via $\Ric{ij} = \crf{k}{i}(E_k, E_j)$ and the relations $(θ^i ∧ θ^j)(E_k, E_l) = \begin{cases} +1 & k = i ≠ j = l \\ -1 & l = i ≠ j = k \\ 0 & \text{otherwise.} \end{cases}$</li> </ol> </div> </div> </section> <section> <header> <h2>Comparing the methods</h2> </header> <p>As we saw above, one advantage of the moving frame method is that, in the worst case, one need only compute $$n(n-1)/2$$ independent connection forms, each with at most $$n$$ components, rather than $$n^2(n+1)/2$$ independent Christoffel symbols, a saving of $$n^2$$ &ldquo;component calculations&rdquo;. Even in the simplest case, when the metric is diagonal, you still need to compute $$n^2$$ possibly non-zero independent Christoffel symbols, as opposed to $$n(n - 1)/2$$ independent connection forms, each with at most two components, still a saving of $$n$$ &ldquo;component calculations&rdquo;.</p> <p>Also, when computing a curvature form, one need only compute a single exterior derivative of a connection form and $$n - 2$$ wedge products of connection forms.
This turns out to be less tedious than using coordinate methods to compute the corresponding components $$\Riem{k}{lij}$$ for fixed $$k$$ and $$l$$ such that $$k ≠ l$$.</p> <p>Furthermore, the orthonormality of the dual frame tends to cause symmetries to appear earlier in the calculation, leading to less wasted work. This is advantageous when you know what answer to expect and it&rsquo;s particularly simple, e.g. if you expect the Ricci curvature to be zero, because then a calculation that becomes unduly complicated is a sign of an undetected mistake. With coordinate methods, even if calculations become complicated, you can&rsquo;t rule out terms cancelling if you continue, so errors become apparent only later.</p> <p>On the other hand, the moving frame method requires a certain amount of cleverness, first in coming up with the one-forms $$θ^i$$ if the metric isn&rsquo;t diagonal, and second in deducing the connection forms $$\cnf{i}{j}$$. The coordinate methods require less thought, and are more &ldquo;plug and chug&rdquo;. In fact, once we examine the semi-Riemannian case later, we&rsquo;ll see that the coordinate methods remain unchanged, yet the moving frame method becomes more complicated.</p> </section> <section> <header> <h2>Example 1: Orthogonal coordinates on 2D surfaces</h2> </header> <p>Let $$g$$ be a Riemannian metric on a 2D manifold. The method of moving frames makes calculating curvature particularly easy, since up to skew symmetry there is exactly one connection form and one curvature form. For example, consider the special case when the metric is diagonal, i.e. with line element $ds^2 = E \, du^2 + G \, dv^2\text{.}$ </p> <ol> <li> <h3>Orthonormal dual frame</h3> <p>We can then read off an orthonormal dual frame: $ds^2 = {\underbrace{(\sqrt{E} \, du)}_{θ^1}}^2 + {\underbrace{(\sqrt{G} \, dv)}_{θ^2}}^2\text{,}$ i.e.
$θ^1 = \sqrt{E} \, du \qquad θ^2 = \sqrt{G} \, dv\text{,}$ and express the coordinate forms in terms of it: $du = \frac{1}{\sqrt{E}} \, θ^1 \qquad dv = \frac{1}{\sqrt{G}} \, θ^2\text{.}$</p> </li> <li> <h3>Connection forms</h3> <p>The derivatives of the basis one-forms are \begin{aligned} dθ^1 &= \frac{∂_v E}{2 \sqrt{E}} \, dv ∧ du = \frac{∂_v E}{2 E \sqrt{G}} \, θ^2 ∧ θ^1 \\ dθ^2 &= \frac{∂_u G}{2 \sqrt{G}} \, du ∧ dv = \frac{∂_u G}{2 G \sqrt{E}} \, θ^1 ∧ θ^2 \end{aligned} and the first structure equations are \begin{aligned} dθ^1 &= -\cnf{1}{2} ∧ θ^2 \\ dθ^2 &= -\cnf{2}{1} ∧ θ^1 = \cnf{1}{2} ∧ θ^1\text{.} \end{aligned} Rewriting the derivative equations to match the first structure equations, \begin{aligned} dθ^1 &= -\overbrace{\left(\frac{∂_v E}{2 E \sqrt{G}} \, θ^1\right)}^{\text{one term of } \cnf{1}{2}} ∧ θ^2 \\ dθ^2 &= \underbrace{\left(-\frac{∂_u G}{2 G \sqrt{E}} \, θ^2\right)}_{\text{another term of } \cnf{1}{2}} ∧ θ^1\text{,} \end{aligned} we can guess that $\cnf{1}{2} = \frac{∂_v E}{2 E \sqrt{G}} \, θ^1 - \frac{∂_u G}{2 G \sqrt{E}} \, θ^2\text{.}$ This guess works, since \begin{aligned} -\cnf{1}{2} ∧ θ^2 &= -\left( \frac{∂_v E}{2 E \sqrt{G}} \, θ^1 - \frac{∂_u G}{2 G \sqrt{E}} \, θ^2 \right) ∧ θ^2 \\ &= -\frac{∂_v E}{2 E \sqrt{G}} \, θ^1 ∧ θ^2 + \underbrace{\cancel{\frac{∂_u G}{2 G \sqrt{E}} \, θ^2 ∧ θ^2}}_{θ^2 ∧ θ^2 = 0} \\ &= dθ^1 \end{aligned} and \begin{aligned} \cnf{1}{2} ∧ θ^1 &= \left( \frac{∂_v E}{2 E \sqrt{G}} \, θ^1 - \frac{∂_u G}{2 G \sqrt{E}} \, θ^2 \right) ∧ θ^1 \\ &= \underbrace{\cancel{\frac{∂_v E}{2 E \sqrt{G}} \, θ^1 ∧ θ^1}}_{θ^1 ∧ θ^1 = 0} - \frac{∂_u G}{2 G \sqrt{E}} \, θ^2 ∧ θ^1 \\ &= dθ^2\text{,} \end{aligned} using the fact that $$θ^1 ∧ θ^1 = θ^2 ∧ θ^2 = 0$$. Therefore, by uniqueness of connection forms, this is <em>the</em> connection form.
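</p>
<p>With $$\cnf{1}{2}$$ in hand, the remaining steps of this example can be spot-checked numerically. The Python sketch below is an illustration rather than part of the derivation. <code>gaussian_curvature</code> is a hypothetical helper, and derivatives are approximated by central finite differences. The sketch uses the coordinate expression $$\cnf{1}{2} = \frac{∂_v E}{2 \sqrt{EG}} \, du - \frac{∂_u G}{2 \sqrt{EG}} \, dv$$, which follows from substituting $$θ^1 = \sqrt{E} \, du$$ and $$θ^2 = \sqrt{G} \, dv$$ into the connection form above. It then takes $$d\cnf{1}{2}$$ and evaluates the result on $$(E_1, E_2)$$.</p>

```python
import math


def gaussian_curvature(E, G, u, v, h=1e-4):
    """Approximate K at (u, v) for the metric ds^2 = E du^2 + G dv^2.

    E and G are functions of (u, v).  Writing ω^1_2 = P du + Q dv with
    P = ∂_v E / (2√(EG)) and Q = -∂_u G / (2√(EG)), we get
    Ω^1_2 = dω^1_2 = (∂_u Q - ∂_v P) du∧dv.  Since du∧dv = θ^1∧θ^2 / √(EG)
    and (θ^1∧θ^2)(E_1, E_2) = 1, K = Ω^1_2(E_1, E_2) = (∂_u Q - ∂_v P) / √(EG).
    Derivatives are central finite differences, so the result is approximate.
    """
    def d_u(f, u, v):
        return (f(u + h, v) - f(u - h, v)) / (2 * h)

    def d_v(f, u, v):
        return (f(u, v + h) - f(u, v - h)) / (2 * h)

    def root_eg(u, v):
        return math.sqrt(E(u, v) * G(u, v))

    def P(u, v):  # du-component of ω^1_2
        return d_v(E, u, v) / (2 * root_eg(u, v))

    def Q(u, v):  # dv-component of ω^1_2
        return -d_u(G, u, v) / (2 * root_eg(u, v))

    return (d_u(Q, u, v) - d_v(P, u, v)) / root_eg(u, v)


# Unit sphere with u the polar angle and v the longitude: E = 1, G = sin^2 u.
print(gaussian_curvature(lambda u, v: 1.0, lambda u, v: math.sin(u) ** 2, 1.0, 0.5))
```

<p>On the unit sphere the result should be approximately $$1$$. On the upper half-plane model of the hyperbolic plane ($$E = G = 1/v^2$$) it should be approximately $$-1$$, in agreement with the formula for $$K$$ derived in this example.</p>
<p>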
Then, expressing $$\cnf{1}{2}$$ in terms of both the basis one-forms and the coordinate forms, $\cnf{1}{2} = \frac{∂_v E}{2 E \sqrt{G}} \, θ^1 - \frac{∂_u G}{2 G \sqrt{E}} \, θ^2 = \frac{∂_v E}{2 \sqrt{EG}} \, du - \frac{∂_u G}{2 \sqrt{EG}} \, dv\text{.}$ (By a very similar method, one can derive the formula stated previously for the $$\cnf{i}{j}$$ of a diagonal metric.)</p> </li> <li> <h3>Curvature forms</h3> <p>Since we only have the single connection form $$\cnf{1}{2}$$, there are no non-zero $$\cnf{i}{k} ∧ \cnf{k}{j}$$ terms, since $$i$$, $$j$$, and $$k$$ would all have to be distinct. Using the expression for $$\cnf{1}{2}$$ in terms of the coordinate forms $$du$$ and $$dv$$, and that $$d(du) = d(dv) = 0$$, the single curvature form is: \begin{aligned} \crf{1}{2} = d\cnf{1}{2} &= \pd{}{v} \left( \frac{∂_v E}{2 \sqrt{EG}} \right) dv ∧ du - \pd{}{u} \left( \frac{∂_u G}{2 \sqrt{EG}} \right) du ∧ dv \\ &\begin{alignedat}{2} &= \, & -\frac{1}{2} \left( \pd{}{u} \left( \frac{∂_u G}{\sqrt{EG}} \right) + \pd{}{v} \left( \frac{∂_v E}{\sqrt{EG}} \right) \right) & \, du ∧ dv \\ &= \, & -\frac{1}{2 \sqrt{EG}} \left( \pd{}{u} \left( \frac{∂_u G}{\sqrt{EG}} \right) + \pd{}{v} \left( \frac{∂_v E}{\sqrt{EG}} \right) \right) & \, θ^1 ∧ θ^2\text{.} \end{alignedat} \end{aligned}</p> </li> <li> <h3>Gaussian curvature</h3> <p>Therefore, we get the classical result that the Gaussian curvature $$K$$, which is equal to the single independent component of the Riemann curvature tensor (up to sign), is \begin{aligned} K &= \Riem{1}{212} = \crf{1}{2}(E_1, E_2) \\ &= -\frac{1}{2 \sqrt{EG}} \left( \pd{}{u} \left( \frac{∂_u G}{\sqrt{EG}} \right) + \pd{}{v} \left( \frac{∂_v E}{\sqrt{EG}} \right) \right) \, (θ^1 ∧ θ^2)(E_1, E_2) \\ &= -\frac{1}{2 \sqrt{EG}} \left( \pd{}{u} \left( \frac{∂_u G}{\sqrt{EG}} \right) + \pd{}{v} \left( \frac{∂_v E}{\sqrt{EG}} \right) \right)\text{.} \end{aligned} </p> </li> </ol> </section> <section> <header> <h2>The semi-Riemannian case</h2> </header> <p>As we alluded to
above, in the semi-Riemannian case, the coordinate methods remain unchanged, but the moving frame method gets more complicated. The equation that the one-forms must satisfy becomes $g = ∑_i ε_i \, θ^i ⊗ θ^i\text{,}$ where each $$ε_i$$ is $$±1$$ throughout the whole chart domain.<sup><a href="#fn4" id="r4"></a></sup> For example, in the Riemannian case, we let all $$ε_i = 1$$, and in the Lorentzian case we let $$ε_0 = -1$$ and all other $$ε_i = +1$$. (The entire list $$(ε_i)$$ is called the <a href="https://en.wikipedia.org/wiki/Metric_signature"><em>signature</em></a> of the metric.)</p> <p>If the metric is diagonal, then each $$g_{ii}$$ must be non-zero throughout the whole chart domain, so $$ε_i = \sgn(g_{ii})$$, and (suspending the summation convention) we can take $θ^i = \sqrt{\lvert g_{ii} \rvert} \, dx^i\text{,}$ since then $$ε_i \, θ^i ⊗ θ^i = \sgn(g_{ii}) \lvert g_{ii} \rvert \, dx^i ⊗ dx^i = g_{ii} \, dx^i ⊗ dx^i$$.</p> <p>The equivalent definition of the $$θ^i$$ becomes $g^*(θ^i, θ^j) = ε_i δ^i_j\text{,}$ where each $$ε_i$$ is $$±1$$ throughout the whole chart domain. Furthermore, the Gram-Schmidt process becomes harder to apply, since you&rsquo;ll need to find a <em>non-degenerate basis</em> first; see <a href="https://math.stackexchange.com/q/2622562/343314">this Math StackExchange question</a> for details.</p> <div class="p">Both Cartan structure equations still hold, but the connection and curvature forms are not skew symmetric anymore; instead, they&rsquo;re <em>semi-skew symmetric</em>.
Suspending the summation convention, <div class="important-equation"> \begin{aligned} \cnf{i}{j} &= -ε_i ε_j \cnf{j}{i} \\ \crf{i}{j} &= -ε_i ε_j \crf{j}{i}\text{.} \end{aligned} </div> Fortunately, this still implies that (suspending the Einstein summation convention) $\cnf{i}{i} = \crf{i}{i} = 0\text{.}$ </div> <p>The formula for the connection forms of a diagonal metric, with the $$θ^i$$ chosen as above, becomes (suspending the summation convention) \begin{aligned} \cnf{i}{j} &= \frac{∂_j g_{ii}}{2 g_{ii} \sqrt{\lvert g_{jj} \rvert}} \, θ^i - ε_i ε_j \frac{∂_i g_{jj}}{2 g_{jj} \sqrt{\lvert g_{ii} \rvert}} \, θ^j \\ &= \frac{ε_i}{2 \sqrt{\lvert g_{ii} g_{jj} \rvert}} \left( ∂_j g_{ii} \, dx^i - ∂_i g_{jj} \, dx^j \right)\text{.} \end{aligned} However, none of the deduced properties of diagonal metrics depending on one or two coordinates change.</p> <p>Finally, note that the relations $(θ^i ∧ θ^j)(E_k, E_l) = \begin{cases} +1 & k = i ≠ j = l \\ -1 & l = i ≠ j = k \\ 0 & \text{otherwise.} \end{cases}$ still hold.</p> <p>As you can tell, the moving frame method forces you to keep careful track of signs, which you may count as a disadvantage.</p> <div class="cheatsheet"> <h2>Cheatsheet: The moving frame method for semi-Riemannian metrics</h2> <div class="p">Given the components $$g_{ij}$$ of a semi-Riemannian metric: <ol> <li>Find an orthonormal dual frame, i.e. basis one-forms $$(θ^1, \dotsc, θ^n)$$ such that $g = ∑_i ε_i \, θ^i ⊗ θ^i\text{,}$ where each $$ε_i$$ is $$±1$$ throughout the whole chart domain.
If the metric is diagonal, then (suspending the summation convention) $$ε_i = \sgn(g_{ii})$$, and $θ^i = \sqrt{\lvert g_{ii} \rvert} \, dx^i\text{.}$</li> <li>Use the first structure equation $dθ^i = -\cnf{i}{j} ∧ θ^j$ and the semi-skew symmetry relations (suspending the summation convention) $\cnf{i}{j} = -ε_i ε_j \cnf{j}{i}$ to deduce the connection forms $$\cnf{i}{j}$$.</li> <li>Compute the curvature forms using the second structure equation $\crf{i}{j} = d\cnf{i}{j} + \cnf{i}{k} ∧ \cnf{k}{j}$ and the semi-skew symmetry relations (suspending the summation convention) $\crf{i}{j} = -ε_i ε_j \crf{j}{i}\text{.}$ Note that it&rsquo;s easiest to compute $$d\cnf{i}{j}$$ when $$\cnf{i}{j}$$ is expressed in terms of the $$dx^i$$, since $$d(dx^i) = 0$$.</li> <li>Compute the components of the Ricci curvature tensor via $\Ric{ij} = \crf{k}{i}(E_k, E_j)$ and the relations $(θ^i ∧ θ^j)(E_k, E_l) = \begin{cases} +1 & k = i ≠ j = l \\ -1 & l = i ≠ j = k \\ 0 & \text{otherwise.} \end{cases}$</li> </ol> </div> </div> </section> <section> <header> <h2>Example 2: The Schwarzschild metric</h2> </header> <p>Now we&rsquo;re ready to tackle a more complicated metric. For our first semi-Riemannian example, let $$g$$ be the <a href="https://en.wikipedia.org/wiki/Schwarzschild_metric"><em>Schwarzschild metric</em></a>, with line element $ds^2 = -f(r) \, dt^2 + f(r)^{-1} \, dr^2 + r^2 \, dΩ^2\text{,}$ where $f(r) = 1 - \frac{r_S}{r}\text{,}$ $$r_S$$ is the Schwarzschild radius, which is constant, and $dΩ^2 = dθ^2 + \sin^2 θ \, dφ^2$ is the line element of the round metric $$\mathring{g}$$ on the two-sphere. We want to show that this metric is <em>Ricci-flat</em>, i.e.
has vanishing Ricci curvature.</p> <p>We can skip some steps by taking advantage of the metric being diagonal and depending only on the two coordinates $$r$$ and $$θ$$, but in the interest of showing the general method, we&rsquo;ll do everything the &ldquo;hard way&rdquo; and then double-check our results using the properties of diagonal metrics we deduced earlier.</p> <ol> <li> <h3>Orthonormal dual frame</h3> <p>Since the metric is diagonal, we can read off an orthonormal dual frame with its corresponding signature: $ds^2 = \; \underbrace{-}_{ε_0} \; {\underbrace{\left(f(r)^{1/2} \, dt\right)}_{ϑ^0}}^2 \; \underbrace{+}_{ε_1} \; {\underbrace{\left(f(r)^{-1/2} \, dr\right)}_{ϑ^1}}^2 \; \underbrace{+}_{ε_2} \; {\underbrace{(r \, dθ)}_{ϑ^2}}^2 \; \underbrace{+}_{ε_3} \; {\underbrace{(r \sin θ \, dφ)}_{ϑ^3}}^2\text{,}$ i.e. \begin{alignedat}{2} ϑ^0 &= \, & f(r)^{1/2} & \, dt \\ ϑ^1 &= \, & f(r)^{-1/2} & \, dr \\ ϑ^2 &= \, & r & \, dθ \\ ϑ^3 &= \, & r \sin θ & \, dφ \end{alignedat} with Lorentzian signature $$({-} \; {+} \; {+} \; {+})$$. We can then express the coordinate forms in terms of it: \begin{alignedat}{2} dt &= \, & f(r)^{-1/2} & \, ϑ^0 \\ dr &= \, & f(r)^{1/2} & \, ϑ^1 \\ dθ &= \, & r^{-1} & \, ϑ^2 \\ dφ &= \, & r^{-1} \csc θ & \, ϑ^3\text{.} \end{alignedat} Note that since we&rsquo;re using $$θ$$ as a coordinate, we use $$ϑ^λ$$ to denote the basis one-forms.
Furthermore, since this metric is Lorentzian, we adopt the convention that the index of the first coordinate is $$0$$, Greek indices start from $$0$$, and Latin indices start from $$1$$.</p> </li> <li> <h3>Connection forms</h3> <p>The derivatives of the basis one-forms are \begin{alignedat}{2} dϑ^0 &= \frac{1}{2}f(r)^{-1/2} f'(r) \, dr ∧ dt & &= \frac{1}{2}f(r)^{-1/2} f'(r) \, ϑ^1 ∧ ϑ^0 \\ dϑ^1 &= 0 & & \\ dϑ^2 &= dr ∧ dθ & &= \frac{f(r)^{1/2}}{r} \, ϑ^1 ∧ ϑ^2 \\ dϑ^3 &= \sin θ \, dr ∧ dφ + r \cos θ \, dθ ∧ dφ & &= \frac{f(r)^{1/2}}{r} \, ϑ^1 ∧ ϑ^3 + \frac{\cot θ}{r} \, ϑ^2 ∧ ϑ^3\text{.} \end{alignedat} By semi-skew symmetry, since $$ε_0 = -1$$ and $$ε_i = 1$$, $$\cnf{0}{i} = \cnf{i}{0}$$ and $$\cnf{i}{j} = -\cnf{j}{i}$$. Therefore, we can explicitly write out the first structure equations: \begin{alignedat}{4} dϑ^0 &= & &- \cnf{0}{1} ∧ ϑ^1 & &- \cnf{0}{2} ∧ ϑ^2 & &- \cnf{0}{3} ∧ ϑ^3 \\ dϑ^1 &= -\cnf{0}{1} ∧ ϑ^0 & & & &- \cnf{1}{2} ∧ ϑ^2 & &- \cnf{1}{3} ∧ ϑ^3 \\ dϑ^2 &= -\cnf{0}{2} ∧ ϑ^0 & &+ \cnf{1}{2} ∧ ϑ^1 & & & &- \cnf{2}{3} ∧ ϑ^3 \\ dϑ^3 &= -\cnf{0}{3} ∧ ϑ^0 & &+ \cnf{1}{3} ∧ ϑ^1 & &+ \cnf{2}{3} ∧ ϑ^2\text{,} & & \end{alignedat} and rewriting the derivative equations to match: \begin{alignedat}{3} dϑ^0 &= & \; -\overbrace{\left(\frac{1}{2}f(r)^{-1/2} f'(r) \, ϑ^0\right)}^{\text{one term of } \cnf{0}{1}} &∧ ϑ^1 & & \\ dϑ^1 &= 0 & & & & \\ dϑ^2 &= & \overbrace{\left(-\frac{f(r)^{1/2}}{r} \, ϑ^2\right)}^{\text{one term of } \cnf{1}{2}} &∧ ϑ^1 & & \\ dϑ^3 &= & \underbrace{\left(-\frac{f(r)^{1/2}}{r} \, ϑ^3\right)}_{\text{one term of } \cnf{1}{3}} &∧ ϑ^1 & \; + \; \underbrace{\left( -\frac{\cot θ}{r} \, ϑ^3 \right)}_{\text{one term of } \cnf{2}{3}} &∧ ϑ^2\text{,} \end{alignedat} we can guess that \begin{alignedat}{2} \cnf{0}{1} &= \, & \frac{1}{2} f(r)^{-1/2} f'(r) & \, ϑ^0 \\ \cnf{1}{2} &= \, & -\frac{f(r)^{1/2}}{r} & \, ϑ^2 \\ \cnf{1}{3} &= \, & -\frac{f(r)^{1/2}}{r} & \, ϑ^3 \\ \cnf{2}{3} &= \, & -\frac{\cot θ}{r} & \, ϑ^3\text{.} \end{alignedat} Happily, plugging
these expressions back into the first structure equations, we find that they hold. Therefore, by uniqueness of the connection forms, they are <em>the</em> connection forms.</p> <p>Rather than plugging our guess into the first structure equations, a slicker way to see that it works is to notice that each of our derivative equations has the particularly simple form $dϑ^λ = ∑_μ (c^λ_μ \, ϑ^λ) ∧ ϑ^μ$ for some functions $$c^λ_μ$$, where for each pair $$λ ≠ μ$$ at most one of $$c^λ_μ$$ and $$c^μ_λ$$ is non-zero. (For example, $$dϑ^2 = \left(-\frac{f(r)^{1/2}}{r} \, ϑ^2\right) ∧ ϑ^1$$ while $$dϑ^1 = 0$$.) So set $\cnf{λ}{μ} = -c^λ_μ \, ϑ^λ \quad \text{whenever } c^λ_μ ≠ 0\text{,}$ let semi-skew symmetry determine the partner $$\cnf{μ}{λ}$$, and let all remaining connection forms be zero. Then the term $$-\cnf{λ}{μ} ∧ ϑ^μ$$ of the first structure equation reproduces $$(c^λ_μ \, ϑ^λ) ∧ ϑ^μ$$, and if instead $$c^μ_λ ≠ 0$$, then by semi-skew symmetry $\cnf{λ}{μ} ∧ ϑ^μ = -ε_λ ε_μ \cnf{μ}{λ} ∧ ϑ^μ = ε_λ ε_μ c^μ_λ \, ϑ^μ ∧ ϑ^μ = 0\text{.}$ Thus all the other terms vanish, as required.</p> <p>Then, expressing the connection forms in terms of both the basis one-forms and the coordinate forms, \begin{alignedat}{6} \cnf{0}{1} &= & &\cnf{1}{0} & &= \quad & \frac{1}{2} f(r)^{-1/2} f'(r) \, &ϑ^0 & \quad &= \quad & \frac{1}{2} f'(r) \, &dt \\ \cnf{2}{1} &= & \; -&\cnf{1}{2} & &= \quad & \frac{f(r)^{1/2}}{r} \, &ϑ^2 & \quad &= \quad & f(r)^{1/2} \, &dθ \\ \cnf{3}{1} &= & \; -&\cnf{1}{3} & &= \quad & \frac{f(r)^{1/2}}{r} \, &ϑ^3 & \quad &= \quad & f(r)^{1/2} \sin θ \, &dφ \\ \cnf{3}{2} &= & \; -&\cnf{2}{3} & &= \quad & \frac{\cot θ}{r} \, &ϑ^3 & \quad &= \quad & \cos θ \, &dφ \text{.} \end{alignedat}</p> <p>Note that $$\cnf{2}{1}$$ has only one component instead of two; this is because $$g_{11}$$ doesn&rsquo;t depend on $$θ$$.
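</p>
<p>As a cross-check, the table above can be compared numerically against the diagonal-metric formula. This Python sketch is an illustration under stated assumptions. The helper <code>diagonal_connection_form</code> is hypothetical, partial derivatives are central finite differences, and the formula assumes the coframe $$ϑ^λ = \sqrt{\lvert g_{λλ} \rvert} \, dx^λ$$ used in this example with $$ε_λ = \sgn(g_{λλ})$$. Under those assumptions, $$\cnf{i}{j} = \frac{ε_i}{2 \sqrt{\lvert g_{ii} g_{jj} \rvert}} \left( ∂_j g_{ii} \, dx^i - ∂_i g_{jj} \, dx^j \right)$$ for $$i ≠ j$$. The value $$r_S = 1$$ and the sample point are arbitrary choices for illustration.</p>

```python
import math


def diagonal_connection_form(gdiag, x, i, j, h=1e-6):
    """Coordinate components of ω^i_j (i != j) for a diagonal metric.

    gdiag(x) returns the diagonal components [g_00, g_11, ...] at the point x.
    Assumes the coframe θ^i = √|g_ii| dx^i with ε_i = sgn(g_ii), for which
        ω^i_j = ε_i (∂_j g_ii dx^i - ∂_i g_jj dx^j) / (2 √|g_ii g_jj|).
    Returns {coordinate index: coefficient of that coordinate one-form}.
    """
    if i == j:
        raise ValueError("ω^i_i vanishes; pass i != j")

    def partial(k, l):
        # Central-difference approximation of ∂_l g_kk at x.
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        return (gdiag(xp)[k] - gdiag(xm)[k]) / (2 * h)

    g = gdiag(x)
    scale = math.copysign(1.0, g[i]) / (2 * math.sqrt(abs(g[i] * g[j])))
    return {i: scale * partial(i, j), j: -scale * partial(j, i)}


R_S = 1.0  # Schwarzschild radius (illustrative value)


def schwarzschild(x):
    """Diagonal Schwarzschild components in coordinates x = (t, r, θ, φ)."""
    _, r, theta, _ = x
    f = 1 - R_S / r
    return [-f, 1 / f, r ** 2, (r * math.sin(theta)) ** 2]


point = [0.0, 3.0, 0.7, 0.2]
# The dt-coefficient of ω^0_1 should be f'(3)/2 = 1/18; the dr-coefficient should vanish.
print(diagonal_connection_form(schwarzschild, point, 0, 1))
```

<p>The other entries can be compared the same way. For example, index pairs <code>(2, 1)</code>, <code>(3, 1)</code>, and <code>(3, 2)</code> should reproduce $$f(r)^{1/2} \, dθ$$, $$f(r)^{1/2} \sin θ \, dφ$$, and $$\cos θ \, dφ$$ up to finite-difference error.</p>
<p>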
The other connection forms are either zero or have only one component, as expected for a diagonal metric depending on two coordinates.</p> </li> <li> <h3>Curvature forms</h3> <p>Using the expressions for $$\cnf{μ}{ν}$$ in terms of the coordinate one-forms, since $$d(dt) = d(dr) = d(dθ) = d(dφ) = 0$$, the derivatives of the connection forms are: \begin{aligned} d \cnf{0}{1} &= \frac{1}{2} f''(r) \, dr ∧ dt \\ &= \frac{1}{2} f''(r) \, ϑ^1 ∧ ϑ^0 \\ d \cnf{2}{1} &= \frac{1}{2} f(r)^{-1/2} f'(r) \, dr ∧ dθ \\ &= \frac{f'(r)}{2r} \, ϑ^1 ∧ ϑ^2 \\ d \cnf{3}{1} &= \frac{1}{2} f(r)^{-1/2} f'(r) \sin θ \, dr ∧ dφ + f(r)^{1/2} \cos θ \, dθ ∧ dφ \\ &= \frac{f'(r)}{2r} \, ϑ^1 ∧ ϑ^3 + \frac{f(r)^{1/2} \cot θ}{r^2} \, ϑ^2 ∧ ϑ^3 \\ d \cnf{3}{2} &= -\sin θ \, dθ ∧ dφ \\ &= -\frac{1}{r^2} \, ϑ^2 ∧ ϑ^3\text{.} \end{aligned} For $$\cnf{μ}{λ} ∧ \cnf{λ}{ν}$$, recalling that one need only sum over $$λ ∉ \{ μ, ν \}$$, the non-zero terms are \begin{alignedat}{3} \cnf{0}{λ} ∧ \cnf{λ}{2} &= \cnf{0}{1} ∧ \cnf{1}{2} & &= \; & -\frac{f'(r)}{2r} \, &ϑ^0 ∧ ϑ^2 \\ \cnf{0}{λ} ∧ \cnf{λ}{3} &= \cnf{0}{1} ∧ \cnf{1}{3} & &= \; & -\frac{f'(r)}{2r} \, &ϑ^0 ∧ ϑ^3 \\ \cnf{1}{λ} ∧ \cnf{λ}{3} &= \cnf{1}{2} ∧ \cnf{2}{3} & &= \; & \frac{f(r)^{1/2} \cot θ}{r^2} \, &ϑ^2 ∧ ϑ^3 \\ \cnf{2}{λ} ∧ \cnf{λ}{3} &= \cnf{2}{1} ∧ \cnf{1}{3} & &= \; & -\frac{f(r)}{r^2} \, &ϑ^2 ∧ ϑ^3\text{.} \end{alignedat} Then we can compute the curvature forms: \begin{aligned} \crf{0}{1} &= d\cnf{0}{1} = \frac{1}{2} f''(r) \, ϑ^1 ∧ ϑ^0 \\ \crf{0}{2} &= \cnf{0}{λ} ∧ \cnf{λ}{2} = -\frac{f'(r)}{2r} \, ϑ^0 ∧ ϑ^2 \\ \crf{0}{3} &= \cnf{0}{λ} ∧ \cnf{λ}{3} = -\frac{f'(r)}{2r} \, ϑ^0 ∧ ϑ^3 \\ \crf{1}{2} &= d\cnf{1}{2} = -\frac{f'(r)}{2r} \, ϑ^1 ∧ ϑ^2 \\ \crf{1}{3} &= d\cnf{1}{3} + \cnf{1}{λ} ∧ \cnf{λ}{3} \\ &= -\frac{f'(r)}{2r} \, ϑ^1 ∧ ϑ^3 - \frac{f(r)^{1/2} \cot θ}{r^2} \, ϑ^2 ∧ ϑ^3 + \frac{f(r)^{1/2} \cot θ}{r^2} \, ϑ^2 ∧ ϑ^3 \\ &= -\frac{f'(r)}{2r} \, ϑ^1 ∧ ϑ^3 \\ \crf{2}{3} &= d\cnf{2}{3} + \cnf{2}{λ} ∧ \cnf{λ}{3} \\ &= \frac{1}{r^2} \, ϑ^2 ∧
ϑ^3 - \frac{f(r)}{r^2} \, ϑ^2 ∧ ϑ^3 \\ &= \frac{1 - f(r)}{r^2} \, ϑ^2 ∧ ϑ^3\text{.} \end{aligned} Again by semi-skew symmetry, since $$ε_0 = -1$$ and $$ε_i = 1$$, $$\crf{0}{i} = \crf{i}{0}$$ and $$\crf{i}{j} = -\crf{j}{i}$$. Therefore, \begin{alignedat}{3} \crf{0}{1} &= \; & \crf{1}{0} &= \; & \frac{1}{2} f''(r) \, &ϑ^1 ∧ ϑ^0 \\ \crf{0}{2} &= \; & \crf{2}{0} &= \; & -\frac{f'(r)}{2r} \, &ϑ^0 ∧ ϑ^2 \\ \crf{0}{3} &= \; & \crf{3}{0} &= \; & -\frac{f'(r)}{2r} \, &ϑ^0 ∧ ϑ^3 \\ \crf{1}{2} &= \; & -\crf{2}{1} &= \; & -\frac{f'(r)}{2r} \, &ϑ^1 ∧ ϑ^2 \\ \crf{1}{3} &= \; & -\crf{3}{1} &= \; & -\frac{f'(r)}{2r} \, &ϑ^1 ∧ ϑ^3 \\ \crf{2}{3} &= \; & -\crf{3}{2} &= \; & \frac{1 - f(r)}{r^2} \, &ϑ^2 ∧ ϑ^3\text{.} \end{alignedat} </p> </li> <li> <h3>Ricci curvature</h3> <p>We can compute the Ricci tensor $$\Ric{μν}$$ as $\Ric{μν} = \Riem{λ}{μλν} = \crf{λ}{μ}(E_λ, E_ν)\text{,}$ where the $$E_λ$$ comprise the dual frame to $$ϑ^λ$$. From the relations $(ϑ^μ ∧ ϑ^ν)(E_ρ, E_σ) = \begin{cases} +1 & ρ = μ ≠ ν = σ \\ -1 & σ = μ ≠ ν = ρ \\ 0 & \text{otherwise,} \end{cases}$ we can examine the expressions above and conclude that $$\crf{ρ}{σ}(E_μ, E_ν)$$ is possibly non-zero only when $$\{ μ, ν \} = \{ ρ, σ \}$$. Furthermore, examining the expression for $$\Ric{μν}$$, we can conclude that $$\Ric{μν}$$ is zero when $$μ ≠ ν$$. Therefore, it suffices to check $$\Ric{λλ}$$.
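</p>
<p>The bookkeeping in this step is easy to get wrong by hand, so here is a small Python sketch; all names in it are hypothetical. It encodes the curvature forms from the table above as dictionaries of coefficients. It then evaluates wedge products on frame vectors using $$(ϑ^μ ∧ ϑ^ν)(E_ρ, E_σ) = +1$$ when $$ρ = μ ≠ ν = σ$$, $$-1$$ when $$σ = μ ≠ ν = ρ$$, and $$0$$ otherwise, and sums $$\Ric{μν} = ∑_λ \crf{λ}{μ}(E_λ, E_ν)$$. It checks the result at one numerical radius rather than symbolically; $$r_S = 1$$ and $$r = 3$$ are arbitrary illustrative values.</p>

```python
def wedge_eval(a, b, c, d):
    """Value of (ϑ^a ∧ ϑ^b)(E_c, E_d), where (E_c) is the frame dual to (ϑ^a)."""
    if a == b:
        return 0
    if (c, d) == (a, b):
        return 1
    if (c, d) == (b, a):
        return -1
    return 0


def ricci(curvature, n=4):
    """Ric_μν = Σ_λ Ω^λ_μ(E_λ, E_ν).

    curvature[(ρ, σ)] is a dict {(a, b): coefficient} encoding
    Ω^ρ_σ = Σ coefficient · ϑ^a ∧ ϑ^b; missing entries are zero forms.
    """
    def evaluate(rho, sigma, c, d):
        form = curvature.get((rho, sigma), {})
        return sum(coef * wedge_eval(a, b, c, d) for (a, b), coef in form.items())

    return [[sum(evaluate(lam, mu, lam, nu) for lam in range(n))
             for nu in range(n)]
            for mu in range(n)]


def schwarzschild_curvature(f, fp, fpp, r):
    """Curvature forms from the table above, given f(r), f'(r), f''(r) at one r."""
    a = -fp / (2 * r)
    b = (1 - f) / r ** 2
    return {
        (0, 1): {(1, 0): fpp / 2}, (1, 0): {(1, 0): fpp / 2},
        (0, 2): {(0, 2): a}, (2, 0): {(0, 2): a},
        (0, 3): {(0, 3): a}, (3, 0): {(0, 3): a},
        (1, 2): {(1, 2): a}, (2, 1): {(1, 2): -a},
        (1, 3): {(1, 3): a}, (3, 1): {(1, 3): -a},
        (2, 3): {(2, 3): b}, (3, 2): {(2, 3): -b},
    }


# f(r) = 1 - r_S/r with r_S = 1, evaluated at r = 3.
r_s, r = 1.0, 3.0
f, fp, fpp = 1 - r_s / r, r_s / r ** 2, -2 * r_s / r ** 3
print(ricci(schwarzschild_curvature(f, fp, fpp, r)))
```

<p>For $$f(r) = 1 - r_S/r$$, every entry of the printed matrix should vanish up to rounding. Feeding in the values of some other profile, such as $$f(r) = 1 - 1/r + r^2$$ at $$r = 2$$, gives a non-zero diagonal. This confirms that the bookkeeping isn&rsquo;t trivially returning zero.</p>
<p>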
(One of the properties we deduced for a diagonal metric depending on two coordinates was that $$\Ric{}$$ would be diagonal except for possibly $$\Ric{12}$$, but since $$\cnf{1}{2}$$ turned out to not have a $$ϑ^1$$ term, that immediately leads to $$\Ric{12} = 0$$.)</p> <p>From the expressions above, \begin{aligned} \crf{0}{1}(E_0, E_1) &= -\frac{1}{2} f''(r) \\ \crf{0}{2}(E_0, E_2) &= \crf{0}{3}(E_0, E_3) = \crf{1}{2}(E_1, E_2) = \crf{1}{3}(E_1, E_3) = -\frac{f'(r)}{2r} \\ \crf{2}{3}(E_2, E_3) &= \frac{1 - f(r)}{r^2}\text{,} \end{aligned} so using the skew symmetry of two-forms $\crf{μ}{ν}(E_ρ, E_σ) = -\crf{μ}{ν}(E_σ, E_ρ)$ and the semi-skew symmetry of $$\crf{μ}{ν}$$ $\crf{0}{i} = \crf{i}{0} \quad \text{and} \quad \crf{i}{j} = -\crf{j}{i} \text{,}$ we can compute $$\Ric{λλ}$$: \begin{aligned} \Ric{00} &= \crf{1}{0}(E_1, E_0) + \crf{2}{0}(E_2, E_0) + \crf{3}{0}(E_3, E_0) \\ &= -\crf{0}{1}(E_0, E_1) - \crf{0}{2}(E_0, E_2) - \crf{0}{3}(E_0, E_3) \\ &= \frac{1}{2} f''(r) + \frac{f'(r)}{r} \\ \Ric{11} &= \crf{0}{1}(E_0, E_1) + \crf{2}{1}(E_2, E_1) + \crf{3}{1}(E_3, E_1) \\ &= \crf{0}{1}(E_0, E_1) + \crf{1}{2}(E_1, E_2) + \crf{1}{3}(E_1, E_3) \\ &= -\Ric{00} \\ \Ric{22} &= \crf{0}{2}(E_0, E_2) + \crf{1}{2}(E_1, E_2) + \crf{3}{2}(E_3, E_2) \\ &= \crf{0}{2}(E_0, E_2) + \crf{1}{2}(E_1, E_2) + \crf{2}{3}(E_2, E_3) \\ &= -\frac{f'(r)}{r} + \frac{1 - f(r)}{r^2} \\ \Ric{33} &= \crf{0}{3}(E_0, E_3) + \crf{1}{3}(E_1, E_3) + \crf{2}{3}(E_2, E_3) \\ &= \Ric{22}\text{.} \end{aligned}</p> <p>Finally, a computation shows that for $$f(r) = 1 - \frac{r_S}{r}$$, $\frac{1 - f(r)}{r^2} = -\frac{1}{2} f''(r) = \frac{f'(r)}{r} \text{,}$ so all the Ricci tensor components above vanish.<sup><a href="#fn5" id="r5"></a></sup></p> </li> </ol> </section> <section> <header> <h2>Example 3: The pp-wave metric</h2> </header> <p>For our last example, to keep things interesting, let&rsquo;s consider a non-diagonal metric. 
Let $g = H(u, x, y) \, du ⊗ du + du ⊗ dv + dv ⊗ du + dx ⊗ dx + dy ⊗ dy$ be the <a href="https://en.wikipedia.org/wiki/Pp-wave_spacetime"><em>pp-wave metric</em></a>, where $$H(u, x, y)$$ is some smooth function. We want to derive a necessary and sufficient condition for $$g$$ to be Ricci-flat.</p> <ol> <li> <h3>Orthonormal dual frame</h3> <p>This metric has the matrix $G = \begin{pmatrix} H & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\text{,}$ which has inverse $G^{-1} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & -H & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\text{,}$ so the dual metric is $g^* = ∂_u ⊗ ∂_v + ∂_v ⊗ ∂_u - H(u, x, y) \, ∂_v ⊗ ∂_v + ∂_x ⊗ ∂_x + ∂_y ⊗ ∂_y\text{.}$ We can see that $$dx$$ and $$dy$$ form part of an orthonormal dual frame, but we have to find the other two, which involve $$du$$ and $$dv$$. Since we don&rsquo;t yet know the signature of the metric, we leave the signs $$ε_0$$ and $$ε_1$$ undetermined for now. So set \begin{aligned} θ^0 &= A \, du + B \, dv \\ θ^1 &= C \, du + D \, dv \\ θ^2 &= dx \\ θ^3 &= dy\text{,} \end{aligned} and solve for $$A$$, $$B$$, $$C$$, and $$D$$ using the orthonormality conditions \begin{aligned} g^*(θ^0, θ^0) &= 2AB - B^2 H = ε_0 \\ g^*(θ^0, θ^1) &= AD + BC - BDH = 0 \\ g^*(θ^1, θ^1) &= 2CD - D^2 H = ε_1\text{.} \end{aligned} The tricky thing is to pick the $$θ^μ$$ without assuming that $$H$$ is non-zero, so we can&rsquo;t divide by $$H$$. The simplest way to do that is to assume that none of the coefficients of $$H$$ in these equations vanish (i.e., that $$B$$ and $$D$$ are non-zero), and, since we have four unknowns (not counting $$ε_0$$ and $$ε_1$$) and three equations, to set $$B = 1$$. Then the first equation gives $$A = (ε_0 + H)/2$$, the second equation gives $$C = D(H - A)$$, and plugging everything into the third equation gives $$D^2 = -ε_1 / ε_0$$, which implies that $$ε_1 = -ε_0$$ and $$D = ±1$$. Set $$ε_0 = -1$$ to make the frame have a Lorentzian signature $$({-} \; {+} \; {+} \; {+})$$, and let $$D = ε$$.
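</p>
<p>The solution can be spot-checked numerically. The JavaScript sketch below is an illustration added here (the function names are hypothetical): at a single point it takes a sample value $$h$$ of $$H$$, builds $$A = (ε_0 + H)/2$$, $$B = 1$$, $$D = ε$$, and $$C = D(H - A)$$ with $$ε_0 = -1$$, and evaluates $$g^*$$ on the resulting one-forms using the $$(u, v)$$ block of $$G^{-1}$$.</p>

```javascript
// The (u, v) block of G^{-1} at a point where H takes the value h:
// g*(du, du) = 0, g*(du, dv) = g*(dv, du) = 1, g*(dv, dv) = -h.
function dualMetricUV(h) {
  return [
    [0, 1],
    [1, -h],
  ];
}

// g*(alpha, beta) for one-forms alpha = alpha[0] du + alpha[1] dv, likewise beta.
function gStar(ginv, alpha, beta) {
  let sum = 0;
  for (let a = 0; a < 2; a++) {
    for (let b = 0; b < 2; b++) {
      sum += alpha[a] * ginv[a][b] * beta[b];
    }
  }
  return sum;
}

// Coefficients (A, B) of θ^0 and (C, D) of θ^1 as derived in the text,
// with ε_0 = -1, B = 1, and D = eps (either +1 or -1).
function frameCoefficients(h, eps) {
  const eps0 = -1;
  const B = 1;
  const A = (eps0 + h) / 2;
  const D = eps;
  const C = D * (h - A);
  return { theta0: [A, B], theta1: [C, D] };
}

const h = 0.5; // sample value of H(u, x, y) at some point
const ginv = dualMetricUV(h);
const { theta0, theta1 } = frameCoefficients(h, 1);
console.log(gStar(ginv, theta0, theta0)); // -1 (= ε_0)
console.log(gStar(ginv, theta0, theta1)); // 0
console.log(gStar(ginv, theta1, theta1)); // 1 (= ε_1)
```

<p>Nothing in this check divides by $$H$$, so it also passes at points where $$H = 0$$, which is the point of not assuming that $$H$$ is non-zero. With $$ε = 1$$, the coefficients also satisfy $$C - A = 1$$ and $$D - B = 0$$, i.e. $$θ^1 - θ^0 = du$$.</p>
<p>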
Then \begin{aligned} A &= \frac{H - 1}{2} \\ B &= 1 \\ C &= ε\frac{H + 1}{2} \\ D &= ε\text{.} \end{aligned} Setting $$ε = 1$$ for symmetry, we finally have \begin{aligned} θ^0 &= \frac{H-1}{2} \, du + dv \\ θ^1 &= \frac{H+1}{2} \, du + dv = θ^0 + du \\ θ^2 &= dx \\ θ^3 &= dy \end{aligned} and \begin{aligned} du &= θ^1 - θ^0 \\ dx &= θ^2 \\ dy &= θ^3\text{;} \end{aligned} it&rsquo;ll turn out that we don&rsquo;t need to express $$dv$$ in terms of the $$θ^μ$$.</p> </li> <li> <h3>Connection forms</h3> <p>Since \begin{aligned} θ^1 &= θ^0 + du \\ θ^2 &= dx \\ θ^3 &= dy\text{,} \end{aligned} the derivatives of the basis one-forms are \begin{aligned} dθ^0 &= dθ^1 = \frac{1}{2} (H_x \, dx + H_y \, dy) ∧ du \\ &= \frac{H_x}{2} \, θ^2 ∧ θ^1 - \frac{H_x}{2} \, θ^2 ∧ θ^0 + \frac{H_y}{2} \, θ^3 ∧ θ^1 - \frac{H_y}{2} \, θ^3 ∧ θ^0 \\ dθ^2 &= 0 \\ dθ^3 &= 0\text{.} \end{aligned} (The $$H_u \, du$$ component of $$dH$$ drops out because $$du ∧ du = 0$$.) </p> <p>Similarly to the Schwarzschild example, by semi-skew symmetry, since $$ε_0 = -1$$ and $$ε_i = 1$$, $$\cnf{0}{i} = \cnf{i}{0}$$ and $$\cnf{i}{j} = -\cnf{j}{i}$$. Therefore, we can explicitly write out the first structure equations: \begin{alignedat}{4} dθ^0 &= & &- \cnf{0}{1} ∧ θ^1 & &- \cnf{0}{2} ∧ θ^2 & &- \cnf{0}{3} ∧ θ^3 \\ dθ^1 &= -\cnf{0}{1} ∧ θ^0 & & & &- \cnf{1}{2} ∧ θ^2 & &- \cnf{1}{3} ∧ θ^3 \\ dθ^2 &= -\cnf{0}{2} ∧ θ^0 & &+ \cnf{1}{2} ∧ θ^1 & & & &- \cnf{2}{3} ∧ θ^3 \\ dθ^3 &= -\cnf{0}{3} ∧ θ^0 & &+ \cnf{1}{3} ∧ θ^1 & &+ \cnf{2}{3} ∧ θ^2\text{.} & & \end{alignedat} However, unlike the Schwarzschild example, we can&rsquo;t simply read off the non-zero connection forms; for example, it&rsquo;s not immediately clear whether the $$\frac{H_x}{2} \, θ^2 ∧ θ^1$$ term in $$dθ^0$$ belongs to the $$\cnf{0}{1} ∧ θ^1$$ term or the $$\cnf{0}{2} ∧ θ^2$$ term. But since $$dθ^0 = dθ^1$$, we can guess that $$\cnf{0}{2} = \cnf{1}{2}$$ and $$\cnf{0}{3} = \cnf{1}{3}$$. Subtracting the first structure equation for $$dθ^0$$ from the one for $$dθ^1$$ and using this guess, we get $\cnf{0}{1} ∧ (θ^1 - θ^0) = 0\text{,}$ i.e.
that $$\cnf{0}{1}$$ is proportional to $$θ^1 - θ^0$$. However, plugging this into the first structure equation for $$dθ^0$$ or $$dθ^1$$, we get a $$θ^0 ∧ θ^1$$ term, which isn&rsquo;t present in the derivative equation for $$dθ^0 = dθ^1$$, so $$\cnf{0}{1} = 0$$. Thus, there&rsquo;s only one way to assign each term of the derivative equation for $$dθ^0 = dθ^1$$ to $$\cnf{0}{2} ∧ θ^2$$ or $$\cnf{0}{3} ∧ θ^3$$: \begin{aligned} \cnf{0}{2} &= \cnf{1}{2} = \frac{H_x}{2} \, (θ^1 - θ^0) = \frac{H_x}{2} \, du \\ \cnf{0}{3} &= \cnf{1}{3} = \frac{H_y}{2} \, (θ^1 - θ^0) = \frac{H_y}{2} \, du\text{.} \end{aligned} (For example, $$-\cnf{0}{2} ∧ θ^2 = \frac{H_x}{2} \, θ^2 ∧ (θ^1 - θ^0)$$, which reproduces the two $$H_x$$ terms of $$dθ^0$$.) Plugging this into the structure equations for $$dθ^2$$ and $$dθ^3$$, we get \begin{aligned} dθ^2 &= -\cnf{0}{2} ∧ θ^0 + \cnf{1}{2} ∧ θ^1 - \cnf{2}{3} ∧ θ^3 \\ &= \cnf{0}{2} ∧ du - \cnf{2}{3} ∧ θ^3 \\ &= \frac{H_x}{2} \, du ∧ du - \cnf{2}{3} ∧ θ^3 \\ &= -\cnf{2}{3} ∧ θ^3 \\ dθ^3 &= -\cnf{0}{3} ∧ θ^0 + \cnf{1}{3} ∧ θ^1 + \cnf{2}{3} ∧ θ^2 \\ &= \cnf{0}{3} ∧ du + \cnf{2}{3} ∧ θ^2 \\ &= \frac{H_y}{2} \, du ∧ du + \cnf{2}{3} ∧ θ^2 \\ &= \cnf{2}{3} ∧ θ^2\text{.} \end{aligned} Since $$dθ^2 = dθ^3 = 0$$ from the derivative equations, $$\cnf{2}{3}$$ is proportional to both $$θ^2$$ and $$θ^3$$, i.e. $$\cnf{2}{3} = 0$$. We&rsquo;ve found expressions for $$\cnf{μ}{ν}$$ that satisfy the first structure equations. Therefore, by uniqueness of the connection forms, these expressions are <em>the</em> connection forms.
Then, expressing the connection forms in terms of both the basis one-forms and the coordinate forms, \begin{aligned} \cnf{0}{2} &= \cnf{2}{0} = \cnf{1}{2} = -\cnf{2}{1} = \frac{H_x}{2} \, (θ^1 - θ^0) = \frac{H_x}{2} \, du \\ \cnf{0}{3} &= \cnf{3}{0} = \cnf{1}{3} = -\cnf{3}{1} = \frac{H_y}{2} \, (θ^1 - θ^0) = \frac{H_y}{2} \, du\text{.} \end{aligned}</p> </li> <li> <h3>Curvature forms</h3> <p>Using the expressions for $$\cnf{μ}{ν}$$ in terms of the coordinate one-forms, since $$d(du) = 0$$, the derivative of $$\cnf{0}{2} = \cnf{1}{2}$$ is \begin{aligned} d\cnf{0}{2} &= d\cnf{1}{2} = \frac{1}{2} \, dH_x ∧ du \\ &= \frac{1}{2} (H_{xx} \, dx + H_{xy} \, dy) ∧ du \\ &= \frac{1}{2} (H_{xx} \, θ^2 + H_{xy} \, θ^3) ∧ (θ^1 - θ^0) \end{aligned} and similarly the derivative of $$\cnf{0}{3} = \cnf{1}{3}$$ is $d\cnf{0}{3} = d\cnf{1}{3} = \frac{1}{2} (H_{xy} \, θ^2 + H_{yy} \, θ^3) ∧ (θ^1 - θ^0)\text{.}$ Since all the connection forms are proportional to $$du$$, all possible sums $$\cnf{μ}{λ} ∧ \cnf{λ}{ν}$$ equal $$0$$. Then we can compute the curvature forms: \begin{aligned} \crf{0}{2} &= \crf{1}{2} = \frac{1}{2} (H_{xx} \, θ^2 + H_{xy} \, θ^3) ∧ (θ^1 - θ^0) \\ \crf{0}{3} &= \crf{1}{3} = \frac{1}{2} (H_{xy} \, θ^2 + H_{yy} \, θ^3) ∧ (θ^1 - θ^0)\text{.} \end{aligned} Again by semi-skew symmetry, since $$ε_0 = -1$$ and $$ε_i = 1$$, $$\crf{0}{i} = \crf{i}{0}$$ and $$\crf{i}{j} = -\crf{j}{i}$$. Therefore, \begin{aligned} \crf{0}{2} &= \crf{2}{0} = \crf{1}{2} = -\crf{2}{1} = \frac{1}{2} (H_{xx} \, θ^2 + H_{xy} \, θ^3) ∧ (θ^1 - θ^0) \\ \crf{0}{3} &= \crf{3}{0} = \crf{1}{3} = -\crf{3}{1} = \frac{1}{2} (H_{xy} \, θ^2 + H_{yy} \, θ^3) ∧ (θ^1 - θ^0)\text{.} \end{aligned} </p> </li> <li> <h3>Ricci curvature</h3> <p>We can compute the Ricci tensor $$\Ric{μν}$$ as $\Ric{μν} = \Riem{λ}{μλν} = \crf{λ}{μ}(E_λ, E_ν)\text{,}$ where the $$E_λ$$ comprise the dual frame to $$θ^λ$$.
First, using the relations $(θ^μ ∧ θ^ν)(E_ρ, E_σ) = \begin{cases} +1 & ρ = μ ≠ ν = σ \\ -1 & σ = μ ≠ ν = ρ \\ 0 & \text{otherwise,} \end{cases}$ we compute $\Ric{0ν} = \crf{λ}{0}(E_λ, E_ν) = \crf{0}{λ}(E_λ, E_ν) = \crf{0}{2}(E_2, E_ν) + \crf{0}{3}(E_3, E_ν)$ and see that it&rsquo;s only non-zero for $$ν ∈ \{ 0, 1 \}$$; furthermore, $$\Ric{01} = -\Ric{00}$$. Similarly, $\Ric{1ν} = \crf{λ}{1}(E_λ, E_ν) = -\crf{1}{λ}(E_λ, E_ν) = -\crf{0}{λ}(E_λ, E_ν) = -\Ric{0ν}\text{.}$ For the last two, we can save some effort by calculating $$(θ^1 - θ^0)(E_0 + E_1) = 0$$, which, since $$θ^2$$ and $$θ^3$$ also vanish on $$E_0 + E_1$$, implies $(θ^μ ∧ (θ^1 - θ^0))(E_ν, E_0 + E_1) = 0 \quad \text{for } μ ∈ \{ 2, 3 \}\text{.}$ Then, using skew symmetry of two-forms $\crf{μ}{ν}(E_ρ, E_σ) = -\crf{μ}{ν}(E_σ, E_ρ)\text{,}$ we compute $\Ric{2ν} = \crf{λ}{2}(E_λ, E_ν) = -\crf{λ}{2}(E_ν, E_λ) = -\crf{0}{2}(E_ν, E_0) - \crf{1}{2}(E_ν, E_1) = -\crf{0}{2}(E_ν, E_0 + E_1) = 0$ and $\Ric{3ν} = \crf{λ}{3}(E_λ, E_ν) = -\crf{λ}{3}(E_ν, E_λ) = -\crf{0}{3}(E_ν, E_0) - \crf{1}{3}(E_ν, E_1) = -\crf{0}{3}(E_ν, E_0 + E_1) = 0\text{,}$ so it suffices to compute $$\Ric{00}$$: \begin{aligned} \Ric{00} &= \crf{0}{2}(E_2, E_0) + \crf{0}{3}(E_3, E_0) \\ &= -\frac{1}{2} (H_{xx} + H_{yy})\text{.} \end{aligned} Finally, we can conclude that the pp-wave metric is Ricci-flat exactly when $H_{xx} + H_{yy} = 0\text{.}$</p> </li> </ol> </section> <section> <header> <h2>Further reading</h2> </header> <p>The classic reference for the method of moving frames is Volume&nbsp;2, Chapter&nbsp;7 of Spivak&rsquo;s &ldquo;A Comprehensive Introduction to Differential Geometry&rdquo;. However, this only covers the Riemannian case.
For the semi-Riemannian case, look to §&nbsp;1.8 of O&rsquo;Neill&rsquo;s &ldquo;The Geometry of Kerr Black Holes&rdquo;, or §&nbsp;14.6 of <a href="https://en.wikipedia.org/wiki/Gravitation_(book)">Gravitation</a>.</p> </section> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> A metric can only be diagonal <em>with respect to a particular coordinate system</em>, but for brevity I&rsquo;ll only mention it here. <a href="#r1">↩</a></p> <p id="fn2"> See p.&nbsp;52 of <em>The Geometry of Kerr Black Holes</em> by Barrett O&rsquo;Neill. <a href="#r2">↩</a></p> <p id="fn3"> The paper <a href="https://arxiv.org/abs/gr-qc/9602015">&ldquo;Ricci Tensor of Diagonal Metric&rdquo;</a> has a similar discussion using coordinate methods; note that the calculations are much more laborious! <a href="#r3">↩</a></p> <p id="fn4"> One subtle technical point is that there might not be such an expression for $$g$$ throughout the whole chart domain; see <a href="https://math.stackexchange.com/q/2625887/343314">this Math StackExchange question</a> for details. In practice, though, this doesn&rsquo;t turn out to be a problem. <a href="#r4">↩</a></p> <p id="fn5"> The Schwarzschild metric describes the field outside a spherically symmetric and non-rotating massive body. If we let $$f(r)$$ have an $$r^{-2}$$ term, e.g. $f(r) = 1 - \frac{r_S}{r} + \frac{r_Q^2}{r^2}$ for some constant $$r_Q$$, then we have non-vanishing Ricci components. However, this metric, called the <a href="https://en.wikipedia.org/wiki/Reissner%E2%80%93Nordstr%C3%B6m_metric">Reissner–Nordström metric</a>, is still useful, as it describes a <em>charged</em>, spherically symmetric, non-rotating massive body. <a href="#r5">↩</a></p> </section> https://www.akalin.com/intro-erasure-codes A Gentle Introduction to Erasure Codes 2017-11-30T00:00:00-08:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved.
<script src="https://unpkg.com/preact@8.2.7"></script> <script src="https://cdn.rawgit.com/akalin/jsbn/v1.4/jsbn.js"></script> <script src="https://cdn.rawgit.com/akalin/jsbn/v1.4/jsbn2.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/arithmetic.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/math.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/carryless.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/field_257.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/field_256.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/rational.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/matrix.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/cauchy_erasure_code.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/demo_common.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/carryless_demo_common.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/matrix_demo_common.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/erasure_code_demo_common.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/carryless_div_demo.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/row_reduce.js"></script> <script type="text/javascript" 
src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/carryless_add_demo.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/carryless_mul_demo.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/carryless_div_demo_util.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/field_257_demo.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/field_256_demo.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/cauchy_matrix_demo.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/matrix_inverse_demo.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/compute_parity_demo.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/intro-erasure-codes/8d5e10f/reconstruct_data_demo.js"></script> <script> KaTeXMacros = { "\\clplus": "\\oplus", "\\clminus": "\\ominus", "\\clmul": "\\otimes", "\\cldiv": "\\oslash", "\\bclmod": "\\mathbin{\\mathrm{clmod}}", }; </script> <section> <header> <h2>1. Overview</h2> </header> <p>This article explains Reed-Solomon erasure codes and the problems they solve in gory detail, with the aim of providing enough background to understand how the <a href="https://en.wikipedia.org/wiki/Parchive">PAR1 and PAR2</a> file formats work, the details of which will be covered in future articles.</p> <p>I&rsquo;m assuming that the reader is familiar with programming, but has not had much exposure to coding theory or linear algebra. Thus, I&rsquo;ll review the basics and treat the results we need as a &ldquo;black box&rdquo;, stating them and moving on. 
However, I&rsquo;ll give self-contained proofs of those results in a companion article.</p> <p>So let&rsquo;s start with the problem we&rsquo;re trying to solve! Let&rsquo;s say you have $$n$$ files of roughly the same size, and you want to guard against $$m$$ of them being lost or corrupted. To do so, you generate $$m$$ <em>parity files</em> ahead of time, and if in the future you lose up to $$m$$ of the data files, you can use an equal number of parity files to recover the lost data files.</p> <style> .fig { display: flex; flex-flow: row; width: 100%; } .fig img { border: 1px solid black; height: auto; } .fig div.column { display: flex; align-items: center; flex-flow: column; flex-grow: 1; justify-content: center; } #fig1 div.column > div { margin: 0.5em; } #fig1 img { width: 9.375em; } #fig2 img { margin: 0.5em 0em; width: 6.25em; } </style> <figure> <div class="fig" id="fig1"> <div class="column"> <div> <div><code>cashcat0.jpg</code></div> <img src="intro-erasure-codes-files/cashcat0.jpg" /> </div> <div> <div><code>cashcat1.jpg</code></div> <img src="intro-erasure-codes-files/cashcat1.jpg" /> </div> <div> <div><code>cashcat2.jpg</code></div> <img src="intro-erasure-codes-files/cashcat2.jpg" /> </div> </div> <div class="column"> <div>$$\xmapsto{\mathtt{ComputeParityFiles}}$$</div> </div> <div class="column"> <div> <div><code>cashcats.p00</code></div> <img src="intro-erasure-codes-files/cashcats.p00.png" /> </div> <div> <div><code>cashcats.p01</code></div> <img src="intro-erasure-codes-files/cashcats.p01.jpg" /> </div> </div> </div> <figcaption> <span class="figure-text">Figure 1</span>&ensp; Using parity files to protect against the loss or corruption of up to two images (out of three) of <a href="https://twitter.com/CatsAndMoney">cashcats</a>.
</figcaption> </figure> <figure> <div class="fig" id="fig2"> <div class="column"> <img src="intro-erasure-codes-files/cashcat0-glitched.png" /> <img src="intro-erasure-codes-files/cashcat1.jpg" /> <img src="intro-erasure-codes-files/broken-image.png" /> <img src="intro-erasure-codes-files/cashcats.p00.png" /> <img src="intro-erasure-codes-files/cashcats.p01.jpg" /> </div> <div class="column"> <div>$$\xmapsto{\mathtt{ReconstructDataFiles}}$$</div> </div> <div class="column"> <img src="intro-erasure-codes-files/cashcat0.jpg" /> <img src="intro-erasure-codes-files/cashcat1.jpg" /> <img src="intro-erasure-codes-files/cashcat2.jpg" /> </div> </div> <figcaption> <span class="figure-text">Figure 2</span>&ensp; With a corrupted and a missing file, recovering the original cashcat images using the parity files from Figure&nbsp;1. </figcaption> </figure> <p>Note that this works even if you lose some of the parity files also; as long as you have $$n$$ files, whether they be data or parity files, you&rsquo;ll be able to recover the original $$n$$ data files. Compare making $$n$$ parity files with simply making a copy of the $$n$$ data files (for $$n > 1$$). In the latter case, if you lose both a data file and its copy, that data file becomes unrecoverable! So parity files take the same amount of space but provide superior recovery capabilities.</p> <p>Now we can reduce the problem above to a byte-level problem as follows. Have <code>ComputeParityFiles</code> pad all the data files so they&rsquo;re the same size, and then for each byte position <code>i</code> call a function <code>ComputeParityBytes</code> on the <code>i</code>th byte of each data file, and store the results into the <code>i</code>th byte of each parity file. Also take a checksum or hash of each data file and store those (along with the original data file sizes) with the parity files. 
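</p>
<p>A minimal JavaScript sketch of this padding-and-striping step, assuming the data files are already in memory as <code>Uint8Array</code>s; <code>computeParityFiles</code> is a hypothetical name, the byte-level function is passed in as a parameter, and the checksums and original file sizes are left out for brevity.</p>

```javascript
// Pad every data file with zero bytes to the length of the longest one, then
// build m parity files byte by byte. computeParityBytes(column, m) stands in
// for the byte-level ComputeParityBytes<n, m> and must return m bytes.
function computeParityFiles(dataFiles, m, computeParityBytes) {
  const size = Math.max(0, ...dataFiles.map((file) => file.length));
  const padded = dataFiles.map((file) => {
    const copy = new Uint8Array(size); // zero-filled
    copy.set(file);
    return copy;
  });

  const parityFiles = Array.from({ length: m }, () => new Uint8Array(size));
  for (let i = 0; i < size; i++) {
    // The i-th byte of every data file...
    const column = padded.map((file) => file[i]);
    // ...determines the i-th byte of every parity file.
    const parity = computeParityBytes(column, m);
    for (let j = 0; j < m; j++) {
      parityFiles[j][i] = parity[j];
    }
  }
  return parityFiles;
}

// Stand-in byte-level function: a single xor parity byte (m = 1).
const xorParityBytes = (bytes) => [bytes.reduce((acc, b) => acc ^ b, 0)];

const dataFiles = [
  Uint8Array.from([0xda, 0x01]),
  Uint8Array.from([0xdb]),
  Uint8Array.from([0x0d, 0x02, 0x03]),
];
const [parityFile] = computeParityFiles(dataFiles, 1, xorParityBytes);
console.log(Array.from(parityFile)); // [ 12, 3, 3 ]
```

<p>Here a byte-wise xor stands in for the byte-level function; any function with the <code>ComputeParityBytes</code> interface described below could be passed instead.</p>
<p>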
Then, <code>ReconstructDataFiles</code> can detect corrupted files using the checksums/hashes and treat them as missing, and then for each byte position <code>i</code> it can call a function <code>ReconstructDataBytes</code> on the <code>i</code>th byte of each good data and parity file to recover the <code>i</code>th byte of the corrupted/missing data files.</p> <p>A byte error where we <em>know</em> the position of the dropped/corrupted byte is called an <em>erasure</em>. Then, a pair of functions <code>ComputeParityBytes</code> and <code>ReconstructDataBytes</code> that behaves as described above implements what is called an <a href="https://en.wikipedia.org/wiki/Erasure_code#Optimal_erasure_codes"><em>optimal erasure code</em></a>; it&rsquo;s an erasure code because it guards only against byte erasures, and not more general errors where we don&rsquo;t know which data bytes have been corrupted, and it&rsquo;s optimal because in general you need at least $$n$$ known bytes to recover the $$n$$ data bytes, and that bound is achieved.</p> <div class="p">In detail, an optimal erasure code is composed of some set of possible $$(n, m)$$ pairs, and for each possible pair, a function <pre class="code-container"><code class="language-javascript">ComputeParityBytes&lt;n, m&gt;(data: byte[n]) -> (parity: byte[m])</code></pre> that computes $$m$$ parity bytes given $$n$$ data bytes, and a function <pre class="code-container"><code class="language-javascript">ReconstructDataBytes&lt;n, m&gt;(partialData: (byte?)[n], partialParity: (byte?)[m]) -> ((data: byte[n]) | Error)</code></pre> that takes in a partial list of data and parity bytes from an earlier call to <code>ComputeParityBytes</code>, and returns the full list of data bytes if there are at least $$n$$ known data or parity bytes (i.e., there are no more than $$m$$ omitted data or parity bytes), and an error otherwise.</div> <p>(In the above pseudocode, I&rsquo;m using <code>T[n]</code> to mean an array of <code>n</code>
objects of type <code>T</code>, and <code>byte?</code> to mean <code>byte | None</code>. Also, I&rsquo;ll omit the <code>-Bytes&lt;n, m&gt;</code> suffix from now on.)</p> <p>By the end of this article, we&rsquo;ll find out exactly how the following example works:</p> <div class="interactive-example"> <h3>Example 1: <code>ComputeParity</code> and <code>ReconstructData</code></h3> <div class="interactive-example" id="computeParityDemo"> <h3><code>ComputeParity</code></h3> Let <span style="white-space: nowrap;"> <var>d</var> = [ da, db, 0d ] </span> be the input data bytes and let <span style="white-space: nowrap;"> <var>m</var> = 2 </span> be the desired parity byte count. Then the output parity bytes are <span style="white-space: nowrap;"> <var>p</var> = [ <span class="result">52</span>, <span class="result">0c</span> ]. </span> </div> <script> 'use strict'; (function() { const { h, render } = window.preact; const root = document.getElementById('computeParityDemo'); render(h(ComputeParityDemo, { initialD: 'da, db, 0d', initialM: '2', name: 'computeParityDemo', detailed: false, header: h('h3', {}, h('code', {}, 'ComputeParity')), containerClass: 'interactive-example', inputClass: 'parameter', resultColor: '#268bd2', /* solarized blue */ }), root.parent, root); })(); </script> <br /> <div class="interactive-example" id="reconstructDataDemo"> Let <span style="white-space: nowrap;"> <var>d</var><sub>partial</sub> = [ ??, db, ?? ] </span> be the input partial data bytes and <span style="white-space: nowrap;"> <var>p</var><sub>partial</sub> = [ 52, 0c ] </span> be the input partial parity bytes. Then the output data bytes are <span style="white-space: nowrap;"> <var>d</var> = [ <span class="result">da</span>, <span class="result">db</span>, <span class="result">0d</span> ].
</span> </div> <script> 'use strict'; (function() { const { h, render } = window.preact; const root = document.getElementById('reconstructDataDemo'); render(h(ReconstructDataDemo, { initialPartialD: '??, db, ??', initialPartialP: '52, 0c', name: 'reconstructDataDemo', detailed: false, header: h('h3', {}, h('code', {}, 'ReconstructData')), containerClass: 'interactive-example', inputClass: 'parameter', resultColor: '#268bd2', /* solarized blue */ }), root.parent, root); })(); </script> </div> </section> <section> <header> <h2>2. Erasure codes for $$m = 1$$</h2> </header> <div class="p">The simplest erasure codes are those for $$m = 1$$. For example, define <pre class="code-container"><code class="language-javascript">ComputeParitySum(data: byte[n]) {
  return [data[0] + &hellip; + data[n-1]]
}</code></pre> where we consider <code>byte</code> to be an unsigned type such that addition and subtraction wrap around, i.e. byte arithmetic is done modulo $$256$$. Then also define <pre class="code-container"><code class="language-javascript">ReconstructDataSum(partialData: (byte?)[n], partialParity: (byte?)[1]) {
  if <em>there is more than one entry of partialData or partialParity set to None</em> {
    return Error
  } else if <em>partialData has no entry set to None</em> {
    return partialData
  }

  i := partialData.firstIndexOf(None)
  partialSum := partialData[0] + &hellip; + partialData[i-1] + partialData[i+1] + &hellip; + partialData[n-1]
  return partialData[0:i] ++ [partialParity[0] - partialSum] ++ partialData[i+1:n]
}</code></pre> where <code>a[i:j]</code> means the subarray of <code>a</code> starting at <code>i</code> and ending (without inclusion) at <code>j</code>, and <code>++</code> is array concatenation.</div> <p>This simple erasure code uses the fact that if you have the sum of a list of numbers, then you can recover a missing number by subtracting the sum of the other numbers from the total sum, and also that this works even if you do the arithmetic modulo $$256$$.</p> <div class="p">Another
erasure code for $$m = 1$$ uses <a href="https://en.wikipedia.org/wiki/Exclusive_or#Bitwise_operation">bitwise exclusive or</a> (denoted as xor, <code>^</code>, or $$\oplus$$) instead of arithmetic modulo $$256$$. Define <pre class="code-container"><code class="language-javascript">ComputeParityXor(data: byte[n]) {
  return [data[0] &oplus; &hellip; &oplus; data[n-1]]
}</code></pre> and <pre class="code-container"><code class="language-javascript">ReconstructDataXor(partialData: (byte?)[n], partialParity: (byte?)[1]) {
  if <em>there is more than one entry of partialData or partialParity set to None</em> {
    return Error
  } else if <em>partialData has no entry set to None</em> {
    return partialData
  }

  i := partialData.firstIndexOf(None)
  partialXor := partialData[0] &oplus; &hellip; &oplus; partialData[i-1] &oplus; partialData[i+1] &oplus; &hellip; &oplus; partialData[n-1]
  return partialData[0:i] ++ [partialParity[0] &oplus; partialXor] ++ partialData[i+1:n]
}</code></pre> </div> <p>This relies on the fact that $$a \oplus a = 0$$ (and that xor is associative and commutative), so given the xor of a list of bytes, you can recover a missing byte by xoring the parity byte with all the known bytes.</p> </section> <section> <header> <h2>3. Erasure codes for $$m = 2$$ (almost)</h2> </header> <p>Now coming up with an erasure code for $$m = 2$$ is more involved, but we can get an inkling of how it could work by letting $$n = 3$$ for simplicity, and also letting the output of <code>ComputeParity</code> be non-negative integers rather than just bytes (i.e., integers less than $$256$$). In that case, we can consider parity numbers that are weighted sums of the data bytes.
For example, like in the $$m = 1$$ case, we can have the first parity number be $p_0 = d_0 + d_1 + d_2\text{,}$ (using $$d_i$$ for data bytes and $$p_i$$ for parity numbers) but for the second parity number, we can pick different weights, say $p_1 = 1 \cdot d_0 + 2 \cdot d_1 + 3 \cdot d_2\text{.}$ We want to make sure that the weights for the second parity number are &ldquo;sufficiently different&rdquo; from those of the first parity number, which we&rsquo;ll clarify later, but for example note that setting $p_1 = 2 \cdot d_0 + 2 \cdot d_1 + 2 \cdot d_2$ can&rsquo;t add any new information, since then $$p_1$$ will always be equal to $$2 \cdot p_0$$.</p> <div class="p">So then our <code>ComputeParity</code> function looks like <pre class="code-container"><code class="language-javascript">ComputeParityWeighted(data: byte[3]) {
  return [
    int(data[0]) + int(data[1]) + int(data[2]),
    int(data[0]) + 2 * int(data[1]) + 3 * int(data[2]),
  ]
}</code></pre> </div> <div class="p">As for <code>ReconstructData</code>, if we have two missing data bytes, say $$d_i$$ and $$d_j$$ for $$i &lt; j$$, along with both parity numbers $$p_0$$ and $$p_1$$, we can rearrange the equations \begin{aligned} p_0 &= d_0 + d_1 + d_2 \\ p_1 &= 1 \cdot d_0 + 2 \cdot d_1 + 3 \cdot d_2 \end{aligned} to get all the unknowns on the left side, letting $$d_k$$ be the known data byte: \begin{aligned} d_i + d_j &= X = p_0 - d_k \\ (i+1) \cdot d_i + (j+1) \cdot d_j &= Y = p_1 - (k + 1) \cdot d_k\text{.} \end{aligned} We can then multiply the first equation by $$i + 1$$ and subtract it from the second to cancel the $$d_i$$ term and get $d_j = (Y - (i + 1) \cdot X) / (j - i)\text{,}$ and then we can use the first equation to solve for $$d_i$$: $d_i = X - d_j = ((j + 1) \cdot X - Y) / (j - i)\text{.}$ Thus with these equations, we can implement <code>ReconstructData</code>: <pre class="code-container"><code class="language-javascript">ReconstructDataWeighted(partialData: (byte?)[3], partialParity: (int?)[2]) {
  <em>Handle all cases except when there are exactly two
  entries set to None in partialData.</em>

  [i, j] := <em>indices of the unknown data bytes</em>
  k := <em>index of the known data byte</em>

  X := partialParity[0] - partialData[k]
  Y := partialParity[1] - (k + 1) * partialData[k]

  d_i := ((j + 1) * X - Y) / (j - i)
  d_j := (Y - (i + 1) * X) / (j - i)

  return <em>an array with d_i, d_j, and partialData[k] in the right order</em>
}</code></pre> (Generalizing this to larger values of $$n$$ is straightforward; $$p_0$$ will still have a weight of $$1$$ for each data byte, and $$p_1$$ will have a weight of $$i + 1$$ for $$d_i$$. $$X$$ and $$Y$$ will then have terms for all known bytes, and everything else proceeds the same after that.)</div> <p>Now what goes wrong if we just try to do everything modulo $$256$$? The most obvious difference from the $$m = 1$$ case is that solving for $$d_i$$ or $$d_j$$ involves division, which works fine for non-negative integers as long as there&rsquo;s no remainder, but it is not immediately clear how division can make sense modulo $$256$$.</p> <p>One possible way to define division modulo $$256$$ would be to first define the <em>multiplicative inverse</em> modulo $$256$$ of an integer $$0 \le x \lt 256$$ as the integer $$0 \le y \lt 256$$ such that $$(x \cdot y) \bmod 256 = 1$$, if it exists, and then define division by $$x$$ modulo $$256$$ to be multiplication by $$y$$ modulo $$256$$. But this immediately runs into problems; $$2$$ has no multiplicative inverse modulo $$256$$, and the same holds for any even number (any multiple of an even number is even, so it can never be $$1$$ modulo $$256$$), so reconstruction will fail if, for example, we have the first and third data bytes missing, since then we&rsquo;d be trying to divide by $$j - i = 2$$.</p> <p>But for now, let&rsquo;s leave aside the problem of generating parity bytes instead of parity numbers, and instead focus on how we can generalize the above for larger values of $$m$$. To do so, we need to first review some linear algebra.</p> </section> <section> <header> <h2>4.
Just enough linear algebra to get by<sup><a href="#fn1" id="r1"></a></sup></h2> </header> <p>In our $$n = 3, m = 2$$ example in the previous section, the equations for the parity numbers have the form $p = a_0 \cdot d_0 + a_1 \cdot d_1 + a_2 \cdot d_2$ for constants $$a_0$$, $$a_1$$, and $$a_2$$. We call such a weighted sum of the $$d_i$$s a <em>linear combination</em> of the $$d_i$$s, and we write this in a tabular form $p = \begin{pmatrix} a_0 & a_1 & a_2 \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ d_1 \\ d_2 \end{bmatrix}\text{,}$ where we define the multiplication of a <em>row vector</em> and a <em>column vector</em> by the equation above, generalized in the straightforward manner for any $$n$$.</p> <p>Then since we have two parity numbers $$p_0$$ and $$p_1$$, each a linear combination of the $$d_i$$s, i.e. \begin{aligned} p_0 &= a_{00} \cdot d_0 + a_{01} \cdot d_1 + a_{02} \cdot d_2 \\ p_1 &= a_{10} \cdot d_0 + a_{11} \cdot d_1 + a_{12} \cdot d_2\text{,} \end{aligned} we can write this in a single tabular form as $\begin{bmatrix} p_0 \\ p_1 \end{bmatrix} = \begin{pmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ d_1 \\ d_2 \end{bmatrix}\text{,}$ where we define the multiplication of a <em>matrix</em> and a column vector by the equations above.</p> <p>Now if we restrict parity numbers to be linear combinations of the data bytes, then we can identify a function <code>ComputeParity</code> using some set of weights with the matrix formed from that set of weights as above. This holds in general: if a function is defined as a list of linear combinations of its inputs, then it can be represented using a matrix as above, and we call it a <em>linear function</em>. 
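</p>
<p>For instance, here is a short JavaScript sketch (with a hypothetical helper <code>applyMatrix</code>) that stores the weights of <code>ComputeParityWeighted</code> from the previous section as a $$2 \times 3$$ matrix and evaluates the function by multiplying that matrix with a column vector of data bytes.</p>

```javascript
// Evaluate a linear function stored as a matrix (an array of rows) on a
// column vector (a plain array): output i is the weighted sum given by row i.
function applyMatrix(matrix, vector) {
  return matrix.map((row) => {
    if (row.length !== vector.length) {
      throw new Error("each row must have one weight per input");
    }
    return row.reduce((sum, weight, j) => sum + weight * vector[j], 0);
  });
}

// Weights of ComputeParityWeighted: row 0 gives p_0, row 1 gives p_1;
// columns correspond to d_0, d_1, d_2.
const parityMatrix = [
  [1, 1, 1],
  [1, 2, 3],
];

const dataBytes = [0xda, 0xdb, 0x0d]; // 218, 219, 13
console.log(applyMatrix(parityMatrix, dataBytes)); // [ 450, 695 ]
```

<p>Both outputs exceed $$255$$, so, as in the previous section, these are parity numbers rather than parity bytes.</p>
<p>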
Then we have a correspondence between linear functions that take $$n$$ numbers to $$m$$ numbers and matrices with $$m$$ rows and $$n$$ columns, which are called $$m \times n$$ matrices.</p> <p>As the first example of this correspondence, note that we denote the elements of the matrix above as $$a_{ij}$$, where the first index is the row index and the second index is the column index. Looking back to the parity equations, we also see that the first index corresponds to the output arguments of <code>ComputeParity</code>, and the second index corresponds to the input arguments.<sup><a href="#fn2" id="r2"></a></sup></p> <p>The usefulness of the correspondence between linear functions and matrices is that we can store and manipulate a linear function by storing and manipulating its corresponding matrix of weights, which we can&rsquo;t easily do for functions in general. For example, as we&rsquo;ll see below, we&rsquo;ll be able to compute the inverse of a linear function by matrix operations, which will be useful for <code>ReconstructData</code>.</p> <div class="p">First, let&rsquo;s examine some simple matrix operations and their effects on the corresponding linear function: <ul> <li><em>Deleting a row</em> of a matrix corresponds to <em>deleting an output</em> of a linear function.</li> <li><em>Swapping two rows</em> of a matrix corresponds to <em>swapping two outputs</em> of a linear function.</li> <li><em>Appending a row</em> to a matrix corresponds to <em>adding an output</em> to a linear function.</li> </ul> In general, matrix row operations correspond to manipulations of a linear function&rsquo;s outputs.</div> <div class="p">An important operation on functions is composition: if $$f$$ takes $$k$$ inputs to $$m$$ outputs, and $$g$$ takes $$m$$ inputs to $$n$$ outputs, then we can define $$(g \circ f)(x_0, \dotsc, x_{k-1}) = g(f(x_0, \dotsc, x_{k-1}))$$, which takes $$k$$ inputs to $$n$$ outputs.
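<p>To make composition concrete, here is a small JavaScript sketch. It is not from the original post, and the names <code>compose</code>, <code>matrixOf</code>, <code>f</code>, and <code>g</code> are hypothetical. It composes two linear functions and then reads off the weights of the composition by evaluating it on the inputs that are $$1$$ in a single position and $$0$$ elsewhere, since the output on the $$j$$th such input is exactly the $$j$$th column of the function&rsquo;s matrix.</p>

```javascript
// A sketch, not code from the original post; compose, matrixOf, f, and g
// are hypothetical names chosen for illustration.

// (g ∘ f)(x) = g(f(x)): apply f first, then g.
function compose(g, f) {
  return (x) => g(f(x));
}

// Read off the weights of a linear function taking n inputs: column j of
// its matrix is its output on the input that is 1 in position j and 0
// elsewhere. The matrix is returned as an array of rows.
function matrixOf(fn, n) {
  const columns = [];
  for (let j = 0; j < n; j++) {
    const unit = new Array(n).fill(0);
    unit[j] = 1;
    columns.push(fn(unit));
  }
  // Transpose the list of output columns into a list of rows.
  return columns[0].map((_, i) => columns.map((column) => column[i]));
}

// f takes 3 inputs to 2 outputs, using the parity weights of the
// n = 3, m = 2 example.
const f = (d) => [d[0] + d[1] + d[2], d[0] + 2 * d[1] + 3 * d[2]];
// g takes 2 inputs to 2 outputs; its weights are arbitrary.
const g = (x) => [1 * x[0] + 2 * x[1], 3 * x[0] + 4 * x[1]];

const h = compose(g, f); // takes 3 inputs to 2 outputs
console.log(matrixOf(h, 3)); // [ [ 3, 5, 7 ], [ 7, 11, 15 ] ]
```

<p>Each weight of the composition is a sum of products of weights of <code>g</code> and <code>f</code>; for example, the $$5$$ in the first row is $$1 \cdot 1 + 2 \cdot 2$$, combining the first row of <code>g</code>&rsquo;s weights with the second column of <code>f</code>&rsquo;s weights.</p>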
It turns out that the composition of two linear functions is again a linear function, and so there must be an operation which takes the $$n \times m$$ matrix $$G$$ corresponding to $$g$$ and the $$m \times k$$ matrix $$F$$ corresponding to $$f$$, and yields the $$n \times k$$ matrix corresponding to $$g \circ f$$. This important operation, the bane of high-schoolers everywhere, is called <a href="https://en.wikipedia.org/wiki/Matrix_multiplication"><em>matrix multiplication</em></a>, and that result is denoted by $$G \cdot F$$, with the matrix of the function that is applied first on the right. In general, if $$A$$ is an $$r \times s$$ matrix and $$B$$ is an $$s \times t$$ matrix, then $$H = A \cdot B$$ is the $$r \times t$$ matrix whose elements are $h_{ij} = \sum_{l=0}^{s-1} a_{il} \cdot b_{lj}\text{,}$ which corresponds to the following code: <pre class="code-container"><code class="language-javascript">matrixMultiply(a: Matrix, b: Matrix) {
  if (a.columns != b.rows) {
    return Error
  }
  h := new Matrix(a.rows, b.columns)
  for i := 0 to a.rows - 1 {
    for j := 0 to b.columns - 1 {
      t := 0
      for l := 0 to a.columns - 1 {
        t += a[i, l] * b[l, j]
      }
      h[i, j] = t
    }
  }
  return h
}</code></pre> You can convince yourself that the above formula and code are correct by trying to compose some small linear functions by hand. </div> <p>A useful property of matrix multiplication is that it&rsquo;s a generalization of the product of a row vector and a column vector, and the product of a matrix and a column vector as we defined above.</p> <p>I would be remiss if I didn&rsquo;t talk about the consequences of defining matrix multiplication as the matrix of the composition of the corresponding linear functions. First, this immediately implies that you can only multiply matrices if the left matrix has the same number of columns as the number of rows of the right matrix, which corresponds to the fact that you can only compose functions if the left function takes the same number of inputs as the number of outputs of the right function.
Furthermore, even if you have two $$n \times n$$ matrices $$F$$ and $$G$$, unlike numbers, $$F \cdot G = G \cdot F$$ does not hold in general, which corresponds to the fact that, for two functions that take $$n$$ inputs to $$n$$ outputs, $$f \circ g = g \circ f$$ does not hold in general either. If you learned matrix multiplication just from the formula above, then these facts are much less obvious!</p> <p>Finally, an important function is the <a href="https://en.wikipedia.org/wiki/Identity_function"><em>identity function</em></a> $$\mathrm{Id}_n$$, which returns its $$n$$ inputs as its outputs. It corresponds to the <a href="https://en.wikipedia.org/wiki/Identity_matrix"><em>identity matrix</em></a> $I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}\text{.}$</p> <p>For a linear function $$f$$ that takes $$n$$ inputs to $$n$$ outputs, if there is a function $$g$$ such that $$f \circ g = \mathrm{Id}_n$$, then we call $$g$$ the inverse of $$f$$, and denote it as $$f^{-1}$$. (It is also true that $$f^{-1} \circ f = \mathrm{Id}_n$$, i.e. $$(f^{-1})^{-1} = f$$.) Not all linear functions taking $$n$$ inputs to $$n$$ outputs have inverses, but if the inverse exists, it is also linear (and unique, which is why we call it <em>the</em> inverse).
Therefore, we can define the <em>inverse</em> of an $$n \times n$$ (or <em>square</em>) matrix $$M$$ as the unique matrix $$M^{-1}$$ such that $$M \cdot M^{-1} = M^{-1} \cdot M = I_n$$, if it exists; also, if $$M$$ has an inverse, we say that $$M$$ is <em>invertible</em>.</p> <div class="interactive-example"> <h3>Example 2: The matrix/linear function correspondence</h3> <div class="p">Let $M = \begin{pmatrix} 1 & 2 \\ 3 & 4\end{pmatrix}\text{.}$ This corresponds to the linear function <pre class="code-container"><code class="language-javascript">f(x: rational[]) {
  return [
    1 * x[0] + 2 * x[1],
    3 * x[0] + 4 * x[1],
  ]
}</code></pre> where <code>rational</code> is an arbitrary-precision rational number type.</div> <div class="p">$$M$$ is invertible with inverse $M^{-1} = \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2\end{pmatrix}\text{.}$ This corresponds to the linear function <pre class="code-container"><code class="language-javascript">g(y: rational[]) {
  return [
    -2 * y[0] + 1 * y[1],
    (3/2) * y[0] + (-1/2) * y[1],
  ]
}</code></pre> so <code>g</code> is the inverse function of <code>f</code>. Indeed, <code>f([5, 6])</code> is <code>[17, 39]</code> and <code>g([17, 39])</code> is <code>[5, 6]</code>.</div> </div> <p>So now we&rsquo;ve reduced the problem of finding the inverse of a linear function taking $$n$$ inputs to $$n$$ outputs to finding the inverse of an $$n \times n$$ matrix. Before we tackle the question of computing those inverses, let&rsquo;s first recast our problem in the language of linear algebra and see why we need to find the inverse of a linear function.</p> </section> <section> <header> <h2>5.
Erasure codes in general</h2> </header> <div class="p">So, revisiting our $$n = 3, m = 2$$ erasure code from above, we have the linear function <pre class="code-container"><code class="language-javascript">ComputeParityWeighted(data: byte[]) {
  return [
    int(data[0]) + int(data[1]) + int(data[2]),
    int(data[0]) + 2 * int(data[1]) + 3 * int(data[2]),
  ]
}</code></pre> which therefore corresponds to the <em>parity matrix</em> $P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix}\text{.}$ So in mathematical notation, <code>ComputeParityWeighted</code> looks like: $\begin{bmatrix} p_0 \\ p_1 \end{bmatrix} = \mathtt{ComputeParityWeighted}(d_0, d_1, d_2) = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ d_1 \\ d_2 \end{bmatrix}\text{.}$ </div> <p>So let&rsquo;s now reimplement <code>ReconstructDataWeighted</code> using linear algebra. First, append the rows of $$P$$ to the identity matrix $$I_3$$ to get the matrix equation $\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ p_0 \\ p_1 \end{bmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ d_1 \\ d_2 \end{bmatrix}\text{,}$ which corresponds to a linear function that returns the input data bytes in addition to computing the parity numbers. Now let&rsquo;s say we lose the data bytes $$d_0$$ and $$d_2$$.
Then let&rsquo;s remove the rows corresponding to those bytes: $\begin{bmatrix} \xcancel{d_0} \\ d_1 \\ \xcancel{d_2} \\ p_0 \\ p_1 \end{bmatrix} = \begin{pmatrix} \xcancel{1} & \xcancel{0} & \xcancel{0} \\ 0 & 1 & 0 \\ \xcancel{0} & \xcancel{0} & \xcancel{1} \\ 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ d_1 \\ d_2 \end{bmatrix}\text{,}$ which turns into $\begin{bmatrix} d_1 \\ p_0 \\ p_1 \end{bmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ d_1 \\ d_2 \end{bmatrix}\text{,}$ which corresponds to a linear function that maps the input data bytes to the non-lost data bytes and the parity numbers. This is the <em>inverse</em> of the function we want, so we want to invert the $$3 \times 3$$ matrix above, which we&rsquo;ll call $$M$$. That inverse is $M^{-1} = \begin{pmatrix} -1/2 & 3/2 & -1/2 \\ 1 & 0 & 0 \\ -1/2 & -1/2 & 1/2 \end{pmatrix}\text{.}$ Multiplying both sides above by $$M^{-1}$$, we get $\begin{bmatrix} d_0 \\ d_1 \\ d_2 \end{bmatrix} = \begin{pmatrix} -1/2 & 3/2 & -1/2 \\ 1 & 0 & 0 \\ -1/2 & -1/2 & 1/2 \end{pmatrix} \cdot \begin{bmatrix} d_1 \\ p_0 \\ p_1 \end{bmatrix}\text{,}$ which is exactly what we want: the original data bytes in terms of the known data bytes and the parity numbers!<sup><a href="#fn3" id="r3"></a></sup></p> <p>This equation doesn&rsquo;t immediately look like the one we computed by hand previously, but some rearrangement reveals that they compute the same thing. For example, the first row gives $$d_0 = (3 \cdot p_0 - d_1 - p_1)/2$$, and plugging $$i = 0$$, $$j = 2$$, and $$k = 1$$ into the earlier formula gives $$d_0 = (3X - Y)/2 = (3 \cdot (p_0 - d_1) - (p_1 - 2 \cdot d_1))/2$$, which simplifies to the same expression. As a sanity check, notice that the second row of $$M^{-1}$$ means that the first input argument is mapped unchanged to the second output argument, which is exactly what we want for the known byte $$d_1$$.</p> <p>Now what does this look like in general, i.e. for arbitrary $$n$$ and $$m$$? First, we have to generate an $$m \times n$$ parity matrix $$P$$ whose rows have to be &ldquo;sufficiently different&rdquo; from each other, which we still have to make precise.
Then <code>ComputeParity</code> just multiplies $$P$$ by $$[d]$$, the column matrix of input bytes, like so: $\begin{bmatrix} p_0 \\ \vdots \\ p_{m-1} \end{bmatrix} = \mathtt{ComputeParity}(d_0, \dotsc, d_{n-1}) = \begin{pmatrix} p_0 \\ \vdots \\ p_{m-1} \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ \vdots \\ d_{n-1} \end{bmatrix}\text{,}$ where, inside the matrix, the $$p_i$$ denote the rows of $$P$$ (each row is named after the parity number it computes).</p> <p>As for <code>ReconstructData</code>, we first append the rows of $$P$$ to $$I_n$$, whose rows we&rsquo;ll denote as $$e_i$$: $\begin{bmatrix} d_0 \\ \vdots \\ d_{n-1} \\ p_0 \\ \vdots \\ p_{m-1} \end{bmatrix} = \begin{pmatrix} e_0 \\ \vdots \\ e_{n-1} \\ p_0 \\ \vdots \\ p_{m-1} \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ \vdots \\ d_{n-1} \end{bmatrix}\text{.}$ Now assume that the indices of the missing $$k \le m$$ data bytes are $$i_0, \dotsc, i_{k-1}$$. Then we remove the rows corresponding to the missing data bytes, and keep some $$k$$ parity rows, e.g. $$p_0$$ to $$p_{k-1}$$. This yields the equation $\begin{bmatrix} d_{j_0} \\ \vdots \\ d_{j_{n-k-1}} \\ p_0 \\ \vdots \\ p_{k-1} \end{bmatrix} = \begin{pmatrix} e_{j_0} \\ \vdots \\ e_{j_{n-k-1}} \\ p_0 \\ \vdots \\ p_{k-1} \end{pmatrix} \cdot \begin{bmatrix} d_0 \\ \vdots \\ d_{n-1} \end{bmatrix}\text{,}$ where $$j_0, \dotsc, j_{n-k-1}$$ are the indices of the <em>present</em> $$n - k$$ data bytes. Call the $$n \times n$$ matrix on the right-hand side $$M$$, and compute its inverse $$M^{-1}$$. If $$P$$ was chosen correctly, $$M^{-1}$$ should always exist, so if the inverse computation fails, raise an error.
Therefore, <code>ReconstructData</code> just multiplies $$M^{-1}$$ by the column matrix of present data bytes and chosen parity numbers: $\begin{bmatrix} d_0 \\ \vdots \\ d_{n-1} \end{bmatrix} = \mathtt{ReconstructData}(d_{j_0}, \dotsc, d_{j_{n-k-1}}, p_0, \dotsc, p_{k-1}) = M^{-1} \cdot \begin{bmatrix} d_{j_0} \\ \vdots \\ d_{j_{n-k-1}} \\ p_0 \\ \vdots \\ p_{k-1} \end{bmatrix}\text{.}$ </p> <p>As an optimization, some rows of $$M^{-1}$$ correspond to just shuffling around the known data bytes $$d_{j_*}$$, so we can just remove those rows, compute the missing data bytes, and do the shuffling ourselves.</p> <div class="p">So we now have outlines of implementations of both <code>ComputeParity</code> and <code>ReconstructData</code>, but we still have missing pieces. In particular, <ol> <li>How do we compute matrix inverses?</li> <li>How do we generate &ldquo;optimal&rdquo; parity matrices so that $$M^{-1}$$ always exists?</li> <li>How do we compute parity bytes instead of parity numbers?</li> </ol> </div> <p>So first, let&rsquo;s see how to compute matrix inverses using row reduction.</p> </section> <section> <header> <h2>6. Finding matrix inverses using row reduction</h2> </header> <p>We developed the theory of matrices by identifying them with linear functions of numbers. 
To show how to find matrix inverses, we have to look at them in a slightly different way by identifying matrix equations with systems of linear equations of numbers.</p> <p>For example, consider the matrix equation $M \cdot x = y\text{,}$ where $M = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\text{,} \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \text{,} \quad \text{and } y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\text{.}$ This expands to $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\text{,}$ or \begin{aligned} y_1 &= 1 \cdot x_1 + 2 \cdot x_2 \\ y_2 &= 3 \cdot x_1 + 4 \cdot x_2\text{,} \end{aligned} which is a linear system of equations of numbers. Letting $$M$$ be any matrix, and $$x$$ and $$y$$ be appropriately-sized column matrices of variables, we can see that the matrix equation $$M \cdot x = y$$ is shorthand for a system of linear equations of numbers.</p> <p>If we could find $$M^{-1}$$, we could solve the matrix equation easily by multiplying both sides by it: \begin{aligned} M^{-1} \cdot (M \cdot x) &= M^{-1} \cdot y \\ x &= M^{-1} \cdot y\text{,} \end{aligned} and therefore solve the linear system for $$x$$ in terms of $$y$$. Conversely, if we were able to solve the linear system for $$x$$, we&rsquo;d then be able to read off $$M^{-1}$$ from the new linear system.</p> <div class="p">But how do we solve a linear system? 
From the theory of linear systems of equations, we have a few tools at our disposal: <ul> <li>swapping two equations,</li> <li>multiplying an equation by a non-zero number,</li> <li>adding one equation to another, possibly multiplying the equation by a number before adding.</li> </ul> </div> <p>All of these are valid transformations because each one can be undone by another operation of the same kind (which is why the multiplier in the second one must be non-zero), so they don&rsquo;t change the solution set of the linear system.</p> <p>For example, in the system above, the first step would be to subtract $$3$$ times the first equation from the second equation to yield \begin{aligned} y_1 &= x_1 + 2 \cdot x_2 \\ y_2 - 3 \cdot y_1 &= -2 \cdot x_2\text{.} \end{aligned} Then, add the second equation back to the first equation: \begin{aligned} y_2 - 2 \cdot y_1 &= x_1 \\ y_2 - 3 \cdot y_1 &= -2 \cdot x_2\text{.} \end{aligned} Finally, divide the second equation by $$-2$$: \begin{aligned} y_2 - 2 \cdot y_1 &= x_1 \\ (3/2) \cdot y_1 - (1/2) \cdot y_2 &= x_2\text{.} \end{aligned} This is equivalent to the matrix equation $\begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\text{,}$ so $M^{-1} = \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix}\text{.}$ </p> <p>So how do we translate the above process to an algorithm operating on matrices?
First, express our matrix equation in a slightly different form: $M \cdot x = I \cdot y\text{.}$ Using the example above, this is $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\text{.}$ Then, you can see that the first step above corresponds to subtracting $$3$$ times the first row from the second row, on both sides, to yield: $\begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\text{.}$ We don&rsquo;t even need to keep writing the $$x$$ and $$y$$ column matrices; we can just write the &ldquo;augmented&rdquo; matrix $A = \left( \hskip -5pt \begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & -2 & -3 & 1 \end{array} \hskip -5pt \right)$ and operate on it.</p> <div class="p">Thus, the operations listed above on linear systems have corresponding operations on augmented matrices: <ul> <li><em>swapping two equations</em> corresponds to <em>swapping two rows</em>;</li> <li><em>multiplying an equation by a non-zero number</em> corresponds to <em>multiplying a row by a non-zero number</em>; and</li> <li><em>adding an equation to another</em>, possibly multiplying the equation by a number before adding, corresponds to <em>adding a row to another row</em>, possibly multiplying the row by a number before adding.</li> </ul> Then, the goal is to use these <em>row operations</em> to transform the initial augmented matrix, where the right side looks like the identity matrix, into one where the left side looks like the identity matrix.
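<p>The three row operations are simple to express in code. The following sketch is not from the original post: the function names are hypothetical, and it uses ordinary JavaScript numbers in place of the exact rational arithmetic assumed in the text, which is safe here only because every intermediate value in this small example is exactly representable. It replays the three steps of the $$2 \times 2$$ example above on the initial augmented matrix.</p>

```javascript
// A sketch, not code from the original post; swapRows, scaleRow, and
// addScaledRow are hypothetical names. Plain numbers stand in for exact
// rationals, which only works because this example's values are exact.

// Swap rows i and j of the augmented matrix a (an array of rows).
function swapRows(a, i, j) {
  [a[i], a[j]] = [a[j], a[i]];
}

// Multiply row i by the non-zero number c.
function scaleRow(a, i, c) {
  a[i] = a[i].map((x) => x * c);
}

// Add c times row src to row dest.
function addScaledRow(a, dest, src, c) {
  a[dest] = a[dest].map((x, col) => x + c * a[src][col]);
}

// The augmented matrix for M = [[1, 2], [3, 4]]: M on the left, I on the right.
const a = [
  [1, 2, 1, 0],
  [3, 4, 0, 1],
];
addScaledRow(a, 1, 0, -3); // subtract 3 times the first row from the second
addScaledRow(a, 0, 1, 1); // add the second row back to the first
scaleRow(a, 1, -1 / 2); // divide the second row by -2

// Format each row as a string (this also prints -0 as 0).
console.log(a.map((row) => row.join(" "))); // [ '1 0 -2 1', '0 1 1.5 -0.5' ]
```

<p>The final rows read $$(1, 0 \mid -2, 1)$$ and $$(0, 1 \mid 3/2, -1/2)$$, whose right halves are the rows of $$M^{-1}$$ found above.</p>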
Translating the resulting augmented matrix back into a matrix equation then gives $$M^{-1}$$ on the right side.<sup><a href="#fn4" id="r4"></a></sup></div> <div class="p">When doing this by hand, one usually works with the linear system itself, trying to see which variables can be easily eliminated so as to minimize arithmetic. However, to translate this into an algorithm, we need a systematic way of doing this. Fortunately, there&rsquo;s an easy two-step process: <ol> <li>Turn the left side of $$A$$ into a <em>unit upper triangular matrix</em>, which means that all the elements on the main diagonal are $$1$$, and all elements below the main diagonal are $$0$$, i.e. that $$a_{ii} = 1$$ for all $$i$$, and $$a_{ij} = 0$$ for all $$j &lt; i$$.</li> <li>Then turn the left side of $$A$$ into the identity matrix.</li> </ol> This algorithm is called <a href="https://en.wikipedia.org/wiki/Row_reduction">row reduction</a>. The first step can be further broken down: <ol type="a"> <li>For each column $$i$$ of the left side in ascending order: <ol type="i"> <li>If $$a_{ii}$$ is zero, look at the rows below the $$i$$th row for a row $$j > i$$ such that $$a_{ji} \ne 0$$, and swap rows $$i$$ and $$j$$. If no such row exists, return an error, as that means that $$M$$ is non-invertible.</li> <li>Divide the $$i$$th row by $$a_{ii}$$, so that $$a_{ii} = 1$$.</li> <li>For each row $$j > i$$, subtract $$a_{ji}$$ times the $$i$$th row from it, which will set $$a_{ji}$$ to $$0$$.</li> </ol> </li> </ol> The second step can be similarly broken down: <ol type="a"> <li>For each column $$i$$ of the left side, in order: <ol type="i"> <li>For each row $$j &lt; i$$, subtract $$a_{ji}$$ times the $$i$$th row from it, which will set $$a_{ji}$$ to $$0$$.</li> </ol> </li> </ol> </div> <p>Note that we&rsquo;re assuming that all arithmetic is exact, i.e. we use an arbitrary-precision rational number type.
If we use floating point numbers, we&rsquo;d have to worry a lot more about the order in which we do operations and numerical stability.</p> <style> .swap-row-a { color: #dc322f; /* solarized red */ } .swap-row-b { color: #268bd2; /* solarized blue */ } .divide-row { color: #dc322f; /* solarized red */ } .subtract-scaled-row-src { color: #268bd2; /* solarized blue */ } .subtract-scaled-row-dest { color: #dc322f; /* solarized red */ } </style> <div class="interactive-example" id="matrixInverseDemo"> <h3>Example 3: Matrix inversion via row reduction</h3> Let <pre>    / 0 2 2 \
M = | 3 4 5 |
    \ 6 6 7 /.</pre> The initial augmented matrix <var>A</var> is <pre>/ 0 2 2 | 1 0 0 \
| 3 4 5 | 0 1 0 |
\ 6 6 7 | 0 0 1 /.</pre> We need <var>A</var><sub>00</sub> to be non-zero, so swap rows <span class="swap-row-a">0</span> and <span class="swap-row-b">1</span>: <pre>/ <span class="swap-row-a">0 2 2</span> | <span class="swap-row-a">1 0 0</span> \     / <span class="swap-row-b">3 4 5</span> | <span class="swap-row-b">0 1 0</span> \
| <span class="swap-row-b">3 4 5</span> | <span class="swap-row-b">0 1 0</span> | --> | <span class="swap-row-a">0 2 2</span> | <span class="swap-row-a">1 0 0</span> |
\ 6 6 7 | 0 0 1 /     \ 6 6 7 | 0 0 1 /.</pre> We need <var>A</var><sub>00</sub> to be 1, so divide row <span class="divide-row">0</span> by 3: <pre>/ <span class="divide-row">3 4 5</span> | <span class="divide-row">0 1 0</span> \     / <span class="divide-row">1 4/3 5/3</span> | <span class="divide-row">0 1/3 0</span> \
| 0 2 2 | 1 0 0 | --> | 0   2   2 | 1   0 0 |
\ 6 6 7 | 0 0 1 /     \ 6   6   7 | 0   0 1 /.</pre> We need <var>A</var><sub>20</sub> to be 0, so subtract row <span class="subtract-scaled-row-src">0</span> scaled by 6 from row <span class="subtract-scaled-row-dest">2</span>: <pre>/ <span class="subtract-scaled-row-src">1 4/3 5/3</span> | <span class="subtract-scaled-row-src">0 1/3 0</span> \     / 1 4/3 5/3 | 0 1/3 0 \
| 0   2   2 | 1   0 0 | --> | 0   2   2 | 1   0 0 |
\ <span class="subtract-scaled-row-dest">6   6   7</span> | <span class="subtract-scaled-row-dest">0   0 1</span> /     \ <span class="subtract-scaled-row-dest">0  -2  -3</span> | <span class="subtract-scaled-row-dest">0  -2 1</span> /.</pre> We need <var>A</var><sub>11</sub> to be 1, so divide row <span class="divide-row">1</span> by 2: <pre>/ 1 4/3 5/3 | 0 1/3 0 \     / 1 4/3 5/3 |   0 1/3 0 \
| <span class="divide-row">0   2   2</span> | <span class="divide-row">1   0 0</span> | --> | <span class="divide-row">0   1   1</span> | <span class="divide-row">1/2   0 0</span> |
\ 0  -2  -3 | 0  -2 1 /     \ 0  -2  -3 |   0  -2 1 /.</pre> We need <var>A</var><sub>21</sub> to be 0, so subtract row <span class="subtract-scaled-row-src">1</span> scaled by &minus;2 from row <span class="subtract-scaled-row-dest">2</span>: <pre>/ 1 4/3 5/3 |   0 1/3 0 \     / 1 4/3 5/3 |   0 1/3 0 \
| <span class="subtract-scaled-row-src">0   1   1</span> | <span class="subtract-scaled-row-src">1/2   0 0</span> | --> | 0   1   1 | 1/2   0 0 |
\ <span class="subtract-scaled-row-dest">0  -2  -3</span> |   <span class="subtract-scaled-row-dest">0  -2 1</span> /     \ <span class="subtract-scaled-row-dest">0   0  -1</span> |   <span class="subtract-scaled-row-dest">1  -2 1</span> /.</pre> We need <var>A</var><sub>22</sub> to be 1, so divide row <span class="divide-row">2</span> by &minus;1, which makes the left side of <var>A</var> a unit upper triangular matrix: <pre>/ 1 4/3 5/3 |   0 1/3 0 \     / 1 4/3 5/3 |   0 1/3  0 \
| 0   1   1 | 1/2   0 0 | --> | 0   1   1 | 1/2   0  0 |
\ <span class="divide-row">0   0  -1</span> |   <span class="divide-row">1  -2 1</span> /     \ <span class="divide-row">0   0   1</span> |  <span class="divide-row">-1   2 -1</span> /.</pre> We need <var>A</var><sub>12</sub> to be 0, so subtract row <span class="subtract-scaled-row-src">2</span> from row <span class="subtract-scaled-row-dest">1</span>: <pre>/ 1 4/3 5/3 |   0 1/3  0 \     / 1 4/3 5/3 |   0 1/3  0 \
| <span class="subtract-scaled-row-dest">0   1   1</span> | <span class="subtract-scaled-row-dest">1/2   0  0</span> | --> | <span class="subtract-scaled-row-dest">0   1   0</span> | <span class="subtract-scaled-row-dest">3/2  -2  1</span> |
\ <span class="subtract-scaled-row-src">0   0   1</span> |  <span class="subtract-scaled-row-src">-1   2 -1</span> /     \ 0   0   1 |  -1   2 -1 /.</pre> We need <var>A</var><sub>02</sub> to be 0, so subtract row <span class="subtract-scaled-row-src">2</span> scaled by 5/3 from row <span class="subtract-scaled-row-dest">0</span>: <pre>/ <span class="subtract-scaled-row-dest">1 4/3 5/3</span> |   <span class="subtract-scaled-row-dest">0 1/3  0</span> \     / <span class="subtract-scaled-row-dest">1 4/3 0</span> | <span class="subtract-scaled-row-dest">5/3 -3 5/3</span> \
| 0   1   0 | 3/2  -2  1 | --> | 0   1 0 | 3/2 -2   1 |
\ <span class="subtract-scaled-row-src">0   0   1</span> |  <span class="subtract-scaled-row-src">-1   2 -1</span> /     \ 0   0 1 |  -1  2  -1 /.</pre> We need <var>A</var><sub>01</sub> to be 0, so subtract row <span class="subtract-scaled-row-src">1</span> scaled by 4/3 from row <span class="subtract-scaled-row-dest">0</span>, which makes the left side of <var>A</var> the identity matrix: <pre>/ <span class="subtract-scaled-row-dest">1 4/3 0</span> | <span class="subtract-scaled-row-dest">5/3 -3 5/3</span> \     / <span class="subtract-scaled-row-dest">1 0 0</span> | <span class="subtract-scaled-row-dest">-1/3 -1/3 1/3</span> \
| <span class="subtract-scaled-row-src">0   1 0</span> | <span class="subtract-scaled-row-src">3/2 -2   1</span> | --> | 0 1 0 |  3/2   -2   1 |
\ 0   0 1 |  -1  2  -1 /     \ 0 0 1 |   -1    2  -1 /.</pre> Since the left side of <var>A</var> is the identity matrix, the right side of <var>A</var> is <var>M</var><sup>-1</sup>.
Therefore, <pre>         / -1/3 -1/3 1/3 \
M^{-1} = |  3/2   -2   1 |
         \   -1    2  -1 /.</pre> </div> <script>
'use strict';

(function() {
  const { h, render } = window.preact;
  const root = document.getElementById('matrixInverseDemo');
  render(h(MatrixInverseDemo, {
    initialElements: '0, 2, 2, 3, 4, 5, 6, 6, 7',
    initialFieldType: 'rational',
    name: 'matrixInverseDemo',
    header: h('h3', null, 'Example 3: Matrix inversion via row reduction'),
    containerClass: 'interactive-example',
    inputClass: 'parameter',
    buttonClass: 'interactive-example-button',
    allowFieldTypeChanges: false,
    swapRowAColor: '#dc322f', // solarized red
    swapRowBColor: '#268bd2', // solarized blue
    divideRowColor: '#dc322f', // solarized red
    subtractScaledRowSrcColor: '#268bd2', // solarized blue
    subtractScaledRowDestColor: '#dc322f', // solarized red
  }), root.parent, root);
})();
</script> <p>Now notice one thing: if $$M$$ has a row that is proportional to another row, then row reduction would eventually zero out one of the rows, causing the algorithm to fail, and signaling that $$M$$ is non-invertible. In fact, a stronger statement is true: $$M$$ has a row that can be expressed as a linear combination of other rows of $$M$$ exactly when $$M$$ is non-invertible. Informally, this means that the linear function corresponding to $$M$$ has one of its outputs redundant with the other outputs, so it is essentially a linear function taking $$n$$ inputs to fewer than $$n$$ outputs, and such functions aren&rsquo;t invertible.</p> <p>This gets us a partial explanation for what &ldquo;sufficiently different&rdquo; means for our parity functions. If one parity function is a linear combination of other parity functions, then it is redundant, and therefore not &ldquo;sufficiently different&rdquo;.
Therefore, we want our parity matrix $$P$$ to be such that no row can be expressed as a linear combination of other rows.</p> <p>However, this criterion for $$P$$ isn&rsquo;t quite enough to guarantee that all possible matrices $$M$$ computed as part of <code>ReconstructData</code> are invertible. For example, this criterion holds for the identity matrix $$I_n$$, but if $$n > 1$$ and you pick $$I_n$$ as the parity matrix for $$n = m$$, you can certainly end up with a constructed matrix $$M$$ with repeated rows, since you&rsquo;re starting by appending another copy of $$I_n$$ on top of $$P = I_n$$! This explains in a different way why simply making a copy of the original data files makes for a poor erasure code, unless of course you only have one data file. We&rsquo;re led to our next topic: what makes a parity matrix &ldquo;optimal&rdquo;?</p> </section> <section> <header> <h2>7. Optimal parity matrices</h2> </header> <p>Recall from above that we form the square matrix $M = \begin{pmatrix} e_{j_0} \\ \vdots \\ e_{j_{n-k-1}} \\ p_0 \\ \vdots \\ p_{k-1} \end{pmatrix}$ by prepending some rows of the identity matrix to the first $$k$$ rows of the parity matrix. We can generalize this a bit more, since we don&rsquo;t have to take the first $$k$$ rows, but instead can take any $$k$$ rows of the parity matrix, whose indices we denote here as $$l_0, \dotsc, l_{k-1}$$: $M = \begin{pmatrix} e_{j_0} \\ \vdots \\ e_{j_{n-k-1}} \\ p_{l_0} \\ \vdots \\ p_{l_{k-1}} \end{pmatrix}\text{.}$ So we want to construct $$P$$ so that any such square matrix $$M$$ formed from the rows of $$P$$ is invertible. Therefore, we call a parity matrix $$P$$ <em>optimal</em> if it satisfies this criterion.</p> <div class="p">Fortunately, there is a simpler criterion for optimal parity matrices. 
First, define a <a href="https://en.wikipedia.org/wiki/Matrix_(mathematics)#Submatrix"><em>submatrix</em></a> of a matrix $$P$$ to be a matrix that you get by deleting any number of rows or columns, and call a matrix <em>non-empty</em> if it has at least one row and one column. Then: <div class="theorem">(<span class="theorem-name">Theorem&nbsp;1</span>.) A parity matrix $$P$$ is optimal exactly when any non-empty square submatrix of $$P$$ is invertible.<sup><a href="#fn5" id="r5"></a></sup></div> Note that this criterion is stronger than the one in the previous section, where we want a parity matrix $$P$$ to have no row that can be expressed as a linear combination of other rows. That is, if every non-empty square submatrix of $$P$$ is invertible, then no row can be expressed as a linear combination of other rows.<sup><a href="#fn6" id="r6"></a></sup> However, it is possible to have a matrix $$P$$ where no row can be expressed as a linear combination of other rows, but which is not optimal. We&rsquo;ve already seen an example above: $$I_n$$ for $$n \gt 1$$, and indeed, $I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\text{,}$ has the $$1 \times 1$$ non-invertible submatrix $$\begin{pmatrix} 0 \end{pmatrix}$$.</div> <div class="interactive-example"> <h3>Example 4: An optimal parity matrix for $$m = 2$$</h3> <p>Recall the parity matrix $P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix}$ that we were using for our $$n = 3, m = 2$$ example. For any $$n$$, this matrix looks like $P = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & n \end{pmatrix}\text{.}$ A $$1 \times 1$$ matrix is invertible exactly when its single element is non-zero, and every entry of $$P$$ is non-zero, so every $$1 \times 1$$ submatrix of $$P$$ is invertible.
Any $$2 \times 2$$ submatrix of $$P$$ looks like $A = \begin{pmatrix} 1 & 1 \\ a & b \end{pmatrix}$ for $$a \ne b$$, which, using the <a href="https://en.wikipedia.org/wiki/Invertible_matrix#Inversion_of_2_.C3.97_2_matrices">formula for inverses of $$2 \times 2$$ matrices</a>, has inverse $A^{-1} = \begin{pmatrix} b/(b-a) & -1/(b-a) \\ -a/(b-a) & 1/(b-a) \end{pmatrix}\text{.}$ Since $$P$$ has only two rows, these are all the possible square submatrices of $$P$$, so this $$P$$ is an optimal parity matrix for $$m = 2$$.</p> </div> <p>Finally, we have a complete definition of what makes a list of parity functions &ldquo;sufficiently different&rdquo;: it is exactly when the corresponding parity matrix is optimal as we&rsquo;ve defined it above.</p> <p>Now this leads us to the question: how do we find such optimal matrices? Fortunately, there&rsquo;s a whole class of matrices that are optimal: the <em>Cauchy matrices</em>.</p> <p>Let $$a_0, \dotsc, a_{m+n-1}$$ be a sequence of distinct integers, meaning that no two $$a_i$$ are equal, and let $$x_0, \dotsc, x_{m-1}$$ be the first $$m$$ of the $$a_i$$, with $$y_0, \dotsc, y_{n-1}$$ the remaining ones. Then form the $$m \times n$$ matrix $$A$$ by setting its elements according to: $a_{ij} = \frac{1}{x_i - y_j}\text{,}$ which is always defined since the denominator is never zero, by the distinctness of the $$a_i$$. Then $$A$$ is a <em>Cauchy matrix</em>.</p> <div class="p">What makes Cauchy matrices useful is the following theorem: <div class="theorem">(<span class="theorem-name">Theorem&nbsp;2</span>.) Any non-empty square Cauchy matrix is invertible.</div> Combining this with the simple fact that any submatrix of a Cauchy matrix is also a Cauchy matrix, we get: <div class="theorem">(<span class="theorem-name">Corollary&nbsp;1</span>.)
Any non-empty square submatrix of a Cauchy matrix is invertible, and thus any Cauchy parity matrix is optimal.</div> </div> <div class="interactive-example" id="cauchyMatrixDemo"> <h3>Example 5: Cauchy matrices</h3> Let <span style="white-space: nowrap;"> <var>x</var> = [ 1, 2, 3 ] </span> and <span style="white-space: nowrap;"> <var>y</var> = [ -1, 4, 0 ]. </span> Then, the Cauchy matrix constructed from <var>x</var> and <var>y</var> is <pre>/ 1/2 -1/3   1 \
| 1/3 -1/2 1/2 |
\ 1/4   -1 1/3 /,</pre> which has inverse <pre>/ -36/5 96/5 -36/5 \
| -3/10  9/5  -9/5 |
\   9/2   -9     3 /.</pre> </div> <script> 'use strict'; (function() { const { h, render } = window.preact; const root = document.getElementById('cauchyMatrixDemo'); render(h(CauchyMatrixDemo, { initialX: '1, 2, 3', initialY: '-1, 4, 0', initialFieldType: 'rational', name: 'cauchyMatrixDemo', header: h('h3', null, 'Example 5: Cauchy matrices'), containerClass: 'interactive-example', inputClass: 'parameter', allowFieldTypeChanges: false, }), root.parent, root); })(); </script> <p>Therefore, to generate an optimal parity matrix for any $$(n, m)$$, all we need to do is to generate an $$m \times n$$ Cauchy matrix. We can pick any sequence of distinct $$m + n$$ integers, so for simplicity let&rsquo;s just use $x_i = n + i \quad \text{and} \quad y_i = i\text{.}$</p> <div class="interactive-example"> <h3>Example 6: Cauchy parity matrices for $$m = 2$$</h3> <p>For $$n = 3, m = 2$$, we have the sequences $x_0 = 3, x_1 = 4 \quad \text{and} \quad y_0 = 0, y_1 = 1, y_2 = 2\text{,}$ so the corresponding Cauchy parity matrix is $P = \begin{pmatrix} 1/3 & 1/2 & 1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix}\text{.}$ Similarly, for any $$n$$, $P = \begin{pmatrix} 1/n & \cdots & 1/2 & 1 \\ 1/(n + 1) & \cdots & 1/3 & 1/2 \end{pmatrix}\text{.}$ All entries of $$P$$ are non-zero, so any $$1 \times 1$$ submatrix of $$P$$ is invertible.
Any $$2 \times 2$$ submatrix of $$P$$ looks like $A = \begin{pmatrix} 1/a & 1/b \\ 1/(a+1) & 1/(b+1) \end{pmatrix}$ for $$a \ne b$$, which, using the <a href="https://en.wikipedia.org/wiki/Invertible_matrix#Inversion_of_2_.C3.97_2_matrices">formula for inverses of $$2 \times 2$$ matrices</a>, has inverse $A^{-1} = \begin{pmatrix} \frac{ab(a+1)}{b-a} & -\frac{a(a+1)(b+1)}{b-a} \\ -\frac{ab(b+1)}{b-a} & \frac{b(a+1)(b+1)}{b-a} \end{pmatrix}\text{.}$ These are all the possible square submatrices of $$P$$, so this $$P$$ is an optimal parity matrix for $$m = 2$$.</p> </div>
<p>Note that our first parity matrix for $$n = 3, m = 2$$ isn&rsquo;t a Cauchy matrix, since no Cauchy matrix can have repeated elements in a single row. That means that there are optimal parity matrices that aren&rsquo;t Cauchy matrices.<sup><a href="#fn7" id="r7"></a></sup></p>
<p>Also, our previous parity matrices had integer entries, whereas Cauchy matrices have rational entries (i.e., fractions), so our parity numbers are now fractions. This isn&rsquo;t a serious difference, though, since we&rsquo;d have to deal with fractions when computing matrix inverses anyway. You could also turn a parity matrix with fractions into one without by multiplying the entire matrix by some non-zero number that clears all the denominators. This doesn&rsquo;t change the optimality of the matrix, since scaling by a non-zero number keeps every invertible square submatrix invertible. For example, we can multiply $\begin{pmatrix} 1/3 & 1/2 & 1 \\ 1/4 & 1/3 & 1/2 \end{pmatrix}$ by $$12$$ to get the equivalent parity matrix $\begin{pmatrix} 4 & 6 & 12 \\ 3 & 4 & 6 \end{pmatrix}\text{.}$ </p>
<p>Now the only missing piece is this: how do we compute parity bytes instead of parity numbers? Answering this will also make the above discussion of fractions moot. However, to do so, we first have to take another look at how we&rsquo;re doing linear algebra.</p> </section> <section> <header> <h2>8.
Linear algebra over fields</h2> </header> <p>We ultimately want our parity numbers to be parity bytes, which means that we want to work with matrices of bytes instead of matrices of rational numbers. To do that, we need to define an interface for matrix elements that preserves the operations and properties we care about, and then figure out how to implement that interface using bytes.</p> <p>Looking at the rule for matrix multiplication, we need to be able to add and multiply matrix elements. Looking at how we do matrix inversion, we also need to be able to subtract and divide matrix elements. Finally, there are certain properties of rational numbers that we implicitly assume when doing matrix operations, but that we have to make explicit for matrix elements.</p> <div class="p">This leads us to the concept of a <em>field</em>, which essentially defines the interface that matrix elements should implement. Here it is: <pre class="code-container"><code class="language-javascript">interface Field&lt;T&gt; {
  // The additive and multiplicative identities.
  static Zero: T
  static One: T
  plus(b: T): T
  negate(): T
  times(b: T): T
  // Only defined for non-zero elements.
  reciprocate(): T
  equals(b: T): boolean
  // Subtraction and division are derived from the operations above.
  minus(b: T) = this.plus(b.negate())
  dividedBy(b: T) = this.times(b.reciprocate())
}</code></pre> </div> <p>We need to be able to add and multiply field elements, which we&rsquo;ll denote generically by $$\oplus$$ and $$\otimes$$. We also need to be able to take the negation (additive inverse) of an element $$x$$, which we&rsquo;ll denote by $$-x$$, and the reciprocal (multiplicative inverse) of a non-zero element $$x$$, which we&rsquo;ll denote by $$x^{-1}$$.
Then we can define subtraction of field elements to be $a \ominus b = a \oplus -b$ and division of field elements to be $a \cldiv b = a \otimes b^{-1}\text{,}$ when $$b \ne 0$$.</p> <div class="p">Also, an implementation of <code>Field</code> must satisfy further properties, which mirror the laws of arithmetic you learned in school: <ul>
<li>Identities: $$a \oplus 0 = a \otimes 1 = a$$.</li>
<li>Inverses: $$a \oplus -a = 0$$, and for $$a \ne 0$$, $$a \otimes a^{-1} = 1$$.</li>
<li>Associativity: $$(a \oplus b) \oplus c = a \oplus (b \oplus c)$$, and $$(a \otimes b) \otimes c = a \otimes (b \otimes c)$$.</li>
<li>Commutativity: $$a \oplus b = b \oplus a$$, and $$a \otimes b = b \otimes a$$.</li>
<li>Distributivity: $$a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$$.</li>
</ul> Of the above, guaranteeing the existence of reciprocals of non-zero elements is usually the non-trivial part. The rational numbers satisfy all of the above, since $(p/q)^{-1} = q/p \quad \text{for } p \ne 0\text{,}$ so we say that they <em>form a field</em>. However, the integers <em>do not</em> form a field, since, for example, $$2$$ has no integer reciprocal; only $$1$$ and $$-1$$ have integer reciprocals. Furthermore, the integers modulo $$256$$, i.e., the numbers from $$0$$ to $$255$$ with the standard arithmetic operations performed modulo $$256$$, do not form a field either, since, as we saw earlier, $$(2 \cdot b) \bmod 256 \ne 1$$ for any $$b$$.</div> <div class="p">However, we can construct a field with $$257$$ elements, using the fact that $$257$$ is a prime number, and the following theorem: <div class="theorem">(<span class="theorem-name" id="theorem-3">Theorem&nbsp;3</span>.) Given a prime number $$p$$, for every integer $$0 \lt a \lt p$$, there is exactly one $$0 \lt b \lt p$$ such that $$(a \cdot b) \bmod p = 1$$.</div> There are clever ways to find multiplicative inverses mod $$p$$, but since $$257$$ is so small, we can just brute-force it.
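<p>For the curious, one of those clever ways is the extended Euclidean algorithm. Here&rsquo;s a minimal sketch of it in JavaScript, assuming plain numbers (which is fine for a modulus as small as $$257$$); the name <code>inverseModPrime</code> is made up for illustration and isn&rsquo;t part of this page&rsquo;s demo code. It tracks pairs of remainders $$r$$ and coefficients $$t$$ satisfying $$r \equiv t \cdot a \pmod{p}$$, so once the remainder reaches $$1$$, the matching $$t$$ is the reciprocal of $$a$$.</p>

```javascript
// Hypothetical helper (not part of this page's demo code): the reciprocal of
// a modulo a prime p, via the extended Euclidean algorithm.
function inverseModPrime(a, p) {
  a = ((a % p) + p) % p; // normalize a into the range [0, p)
  if (a === 0) {
    throw new Error('0 has no reciprocal');
  }
  // Invariant: r0 ≡ t0 * a and r1 ≡ t1 * a (mod p).
  let [r0, r1] = [p, a];
  let [t0, t1] = [0, 1];
  while (r1 !== 0) {
    const q = Math.floor(r0 / r1);
    [r0, r1] = [r1, r0 - q * r1];
    [t0, t1] = [t1, t0 - q * t1];
  }
  // Now r0 = gcd(a, p) = 1, since p is prime and 0 < a < p,
  // so t0 * a ≡ 1 (mod p).
  return ((t0 % p) + p) % p;
}

console.log(inverseModPrime(54, 257)); // 119, since (54 * 119) mod 257 = 1
```

<p>The loop runs only a logarithmic number of times in $$p$$, but for $$p = 257$$, brute force is simpler and plenty fast.</p>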
So an implementation would look like: <pre class="code-container"><code class="language-javascript">class Field257Element {
  // Implements Field&lt;Field257Element&gt;; value is an integer from 0 to 256.
  constructor(value) { this.value = value; }
  static Zero = new Field257Element(0);
  static One = new Field257Element(1);
  plus(b) { return new Field257Element((this.value + b.value) % 257); }
  // The final % 257 maps the negation of 0 to 0 rather than 257.
  negate() { return new Field257Element((257 - this.value) % 257); }
  times(b) { return new Field257Element((this.value * b.value) % 257); }
  reciprocate() {
    if (this.value === 0) { throw new Error('0 has no reciprocal'); }
    // By Theorem 3, exactly one candidate works, so just try them all.
    for (let i = 1; i &lt; 257; i++) {
      const candidate = new Field257Element(i);
      if (this.times(candidate).equals(Field257Element.One)) { return candidate; }
    }
    throw new Error('unreachable, since 257 is prime');
  }
  equals(b) { return this.value === b.value; }
  minus(b) { return this.plus(b.negate()); }
  dividedBy(b) { return this.times(b.reciprocate()); }
}</code></pre> </div> <div class="interactive-example" id="field257Demo"> <h3>Example 7: Field with 257 elements</h3> Denote operations on the field with 257 elements by a <sub>257</sub> subscript, and let <span style="white-space: nowrap;"> <var>a</var> = 23 </span> and <span style="white-space: nowrap;"> <var>b</var> = 54. </span> Then <ul> <li> <span style="white-space: nowrap;"> <var>a</var> +<sub>257</sub> <var>b</var> = (23 + 54) mod 257 = <span class="result">77</span>; </span> </li> <li> <span style="white-space: nowrap;"> &minus;<sub>257</sub><var>b</var> = (257 &minus; 54) mod 257 = <span class="result">203</span>; </span> </li> <li> <span style="white-space: nowrap;"> <var>a</var> &minus;<sub>257</sub> <var>b</var> = <var>a</var> +<sub>257</sub> &minus;<sub>257</sub><var>b</var> = (23 + 203) mod 257 = <span class="result">226</span>; </span> </li> <li> <span style="white-space: nowrap;"> <var>a</var> &times;<sub>257</sub> <var>b</var> = (23 &times; 54) mod 257 = <span class="result">214</span>; </span> </li> <li> <span style="white-space: nowrap;"> 54 &times;<sub>257</sub> 119 = 1, </span> so <span style="white-space: nowrap;"> <var>b</var><sup>-1</sup><sub>257</sub> = <span class="result">119</span>; </span> </li> <li> <span style="white-space: nowrap;"> <var>a</var> &divide;<sub>257</sub> <var>b</var> = <var>a</var> &times;<sub>257</sub> <var>b</var><sup>-1</sup><sub>257</sub> = (23 &times; 119) mod 257 = <span class="result">167</span>, </span> and indeed <span style="white-space: nowrap;"> <var>b</var> &times;<sub>257</sub> (<var>a</var>
&divide;<sub>257</sub> <var>b</var>) = (54 &times; 167) mod 257 = 23 = <var>a</var>. </span> </li> </ul> </div>
<script>
'use strict';
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById('field257Demo');
  render(h(Field257Demo, {
    initialA: '23',
    initialB: '54',
    header: h('h3', null, 'Example 7: Field with 257 elements'),
    containerClass: 'interactive-example',
    inputClass: 'parameter',
    resultColor: '#268bd2', // solarized blue
  }), root.parent, root);
})();
</script>
<div class="p">So this gets us closer, since we can use <code>Field257Element</code> instead of a rational number type when implementing <code>ComputeParity</code> and <code>ReconstructData</code>, and if we&rsquo;ve abstracted our <code>Matrix</code> type correctly, almost everything should just work. However, there <em>is</em> one thing we need to check: are Cauchy parity matrices still optimal if we use fields other than the rational numbers? Fortunately, the answer is yes: <div class="theorem">(<span class="theorem-name">Theorem&nbsp;1, general version</span>.) A parity matrix $$P$$ over any field is optimal exactly when every non-empty square submatrix of $$P$$ is invertible.</div> <div class="theorem">(<span class="theorem-name">Theorem&nbsp;2, general version</span>.) Any non-empty square Cauchy matrix over any field is invertible.</div> <div class="theorem">(<span class="theorem-name">Corollary&nbsp;1, general version</span>.) Any non-empty square submatrix of a Cauchy matrix over any field is invertible, and thus any Cauchy parity matrix over any field is optimal.</div> However, note that to construct an $$m \times n$$ Cauchy matrix, we need $$m + n$$ distinct field elements. So if we&rsquo;re working with a field with $$257$$ elements, then this imposes the condition that $$m + n \le 257$$, i.e., using a finite field limits the number of data bytes and parity numbers you can have.</div> <p>Now the question remains: can we construct a field with $$256$$ elements?
As we saw above, we can&rsquo;t construct it the same way as the field with $$257$$ elements, since $$256$$ isn&rsquo;t prime. Instead, we need to start by defining different arithmetic operations on the integers. This brings us to the topic of <em>binary carry-less arithmetic</em>.</p> </section> <section> <header> <h2>9. Binary carry-less arithmetic</h2> </header> <p>The basic idea of binary carry-less (which I&rsquo;ll henceforth shorten to &ldquo;carry-less&rdquo;) arithmetic is to express all integers in binary, then perform all arithmetic operations using binary arithmetic, except ignoring all the carries.<sup><a href="#fn8" id="r8"></a></sup></p> <p>How does this work with addition? Let&rsquo;s denote binary carry-less addition as $$\clplus$$,<sup><a href="#fn9" id="r9"></a></sup> and let&rsquo;s see how it behaves on single binary digits: \begin{aligned} 0 \clplus 0 &= 0 \\ 0 \clplus 1 &= 1 \\ 1 \clplus 0 &= 1 \\ 1 \clplus 1 &= 0\text{.} \end{aligned} This is just the exclusive-or operation on bits, so carry-less addition of any two integers turns out to be nothing but bitwise xor! Since xor can also be denoted by $$\clplus$$, we can conveniently think of $$\clplus$$ as meaning both carry-less addition and xor.</p> <div class="interactive-example" id="carrylessAddDemo"> <h3>Example 8: Carry-less addition</h3> Let <span style="white-space: nowrap;"> <var>a</var> = 23 </span> and <span style="white-space: nowrap;"> <var>b</var> = 54. </span> Then, with carry-less arithmetic,
<pre>  a = 23 =  10111b
^ b = 54 = 110110b
           -------
           100001b</pre>
so <span style="white-space: nowrap;"> <var>a</var> &oplus; <var>b</var> = 100001<sub>b</sub> = <span class="result">33</span>.
</span> </div>
<script>
'use strict';
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById('carrylessAddDemo');
  render(h(AddDemo, {
    initialA: '23',
    initialB: '54',
    name: 'carrylessAddDemo',
    header: h('h3', null, 'Example 8: Carry-less addition'),
    containerClass: 'interactive-example',
    inputClass: 'parameter',
    resultColor: '#268bd2', // solarized blue
  }), root.parent, root);
})();
</script>
<p>What about subtraction? Since $$b \clplus b = 0$$ for any $$b$$, we have $$(a \clplus b) \clplus b = a$$, so every element $$b$$ is its own (carry-less) additive inverse. This means that $$a \clminus b = a \clplus b$$, i.e., carry-less subtraction is also just xor.</p>
<p><a href="https://en.wikipedia.org/wiki/Carry-less_product">Carry-less multiplication</a> isn&rsquo;t as simple, but recall that binary multiplication is just adding shifted copies of $$a$$ based on which bits are set in $$b$$ (or vice versa). To do carry-less multiplication, just ignore the carries when adding the shifted copies, i.e., xor the shifted copies instead of adding them.</p>
<div class="interactive-example" id="carrylessMulDemo"> <h3>Example 9: Carry-less multiplication</h3> Let <span style="white-space: nowrap;"> <var>a</var> = 23 </span> and <span style="white-space: nowrap;"> <var>b</var> = 54. </span> Then, with carry-less arithmetic,
<pre>   a = 23 =       10111b
^* b = 54 =      110110b
------------------------
                 10111
^               10111
^             10111
^            10111
------------------------
             1111100010b</pre>
so <span style="white-space: nowrap;"> <var>a</var> &otimes; <var>b</var> = 1111100010<sub>b</sub> = <span class="result">994</span>.
</span> </div>
<script>
'use strict';
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById('carrylessMulDemo');
  render(h(MulDemo, {
    initialA: '23',
    initialB: '54',
    name: 'carrylessMulDemo',
    header: h('h3', null, 'Example 9: Carry-less multiplication'),
    containerClass: 'interactive-example',
    inputClass: 'parameter',
    resultColor: '#268bd2', // solarized blue
  }), root.parent, root);
})();
</script>
<p>Finally, we can define carry-less division with remainder. Binary division with remainder subtracts shifted copies of $$b$$ from $$a$$ until the remainder is less than the divisor; carry-less binary division with remainder instead xors shifted copies of $$b$$ into $$a$$. However, there&rsquo;s a subtlety: with carry-less arithmetic, it&rsquo;s not enough to stop when the remainder (for that step) is less than the divisor, because if the highest set bit of the remainder is the same as the highest set bit of the divisor, you can still xor with the divisor one more time to yield a smaller number (which then becomes the final remainder). So you stop only once the highest set bit of the remainder is lower than the highest set bit of the divisor.</p>
<p>Consider the example below, where we&rsquo;re dividing $$55$$ by $$19$$. The first remainder is $$17$$, which is less than $$19$$ but still has the same highest set bit, so we can xor with $$19$$ one more time to get the remainder $$2$$.</p>
<div class="interactive-example" id="carrylessDivDemo"> <h3>Example 10: Carry-less division</h3> Let <span style="white-space: nowrap;"> <var>a</var> = 55 </span> and <span style="white-space: nowrap;"> <var>b</var> = 19. </span> Then, with carry-less arithmetic,
<pre>                     11b
                --------
b = 19 = 10011b )110111b = 55 = a
               ^ 10011
                 -----
                  10001
                ^ 10011
                  -----
                     10b</pre>
so <span style="white-space: nowrap;"> <var>a</var> &odiv; <var>b</var> = 11<sub>b</sub> = <span class="result">3</span> </span> with remainder <span style="white-space: nowrap;"> 10<sub>b</sub> = <span class="result">2</span>.
</span> </div>
<script>
'use strict';
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById('carrylessDivDemo');
  render(h(DivDemo, {
    initialA: '55',
    initialB: '19',
    name: 'carrylessDivDemo',
    header: h('h3', null, 'Example 10: Carry-less division'),
    containerClass: 'interactive-example',
    inputClass: 'parameter',
    resultColor: '#268bd2', // solarized blue
  }), root.parent, root);
})();
</script>
<p>This leads to an interesting difference between the carry-less modulo operation and the standard modulo operation. If you mod by a number $$n$$, you get $$n$$ possible remainders, from $$0$$ to $$n - 1$$. However, if you clmod (carry-less mod) by a number $$2^k \le n \lt 2^{k+1}$$, you get $$2^k$$ possible remainders, from $$0$$ to $$2^k-1$$, since those are the numbers whose highest set bit is lower than the highest set bit of $$n$$.</p> <p>In particular, if you clmod by a number $$256 \le n &lt; 512$$, you always get $$256$$ possible remainders. This is very close to what we want; now the hope is to find <em>some</em> $$256 \le n &lt; 512$$ so that doing binary carry-less arithmetic clmod $$n$$ yields a field, which will then be a field with $$256$$ elements!</p> </section> <section> <header> <h2>10. The finite field with $$256$$ elements</h2> </header> <p>Since there are only $$256$$ candidates from $$256$$ to $$511$$, we could just try each one to see whether clmod-ing by it yields a field. However, with a bit of math, we can gain more insight into which numbers will work.</p> <p>Recall the situation with the standard arithmetic operations: arithmetic mod $$p$$ yields a field exactly when $$p$$ is prime.<sup><a href="#fn10" id="r10"></a></sup> Now recall the definition of a prime number: it is an integer greater than $$1$$ whose only positive divisors are itself and $$1$$.
Stated another way, a prime number is an integer $$p \gt 1$$ that cannot be expressed as $$p = a \cdot b$$, for $$a, b \gt 1$$.</p> <p>Thus, the concept of a prime number is determined by the multiplication operation, and therefore we can define a &ldquo;carry-less&rdquo; prime number to be an integer $$p \gt 1$$ that cannot be expressed as $$p = a \clmul b$$, for $$a, b \gt 1$$.<sup><a href="#fn11" id="r11"></a></sup></p> <div class="p">The only remaining question is whether there is an equivalent of <a href="#theorem-3">Theorem&nbsp;3</a> for carry-less arithmetic. And indeed there is: <div class="theorem">(<span class="theorem-name">Theorem&nbsp;4</span>.) Given a carry-less prime number $$2^k \le p \lt 2^{k+1}$$, for every integer $$0 \lt a \lt 2^k$$, there is exactly one $$0 \lt b \lt 2^k$$ such that $$(a \clmul b) \bclmod p = 1$$.</div> Now we just need to find a carry-less prime number $$256 \le p &lt; 512$$. However, the prime numbers and the carry-less prime numbers don&rsquo;t necessarily coincide: for example, even though $$257$$ is a prime number, it is <em>not</em> a carry-less prime number, since $$257 = 3 \clmul 255$$.</div> <p>It is easy enough to test each number $$256 \le n &lt; 512$$ for carry-less primality, though; doing so, we find that the lowest one is $$283$$.<sup><a href="#fn12" id="r12"></a></sup></p> <div class="p">So finally, we have a field with $$256$$ elements: the integers from $$0$$ to $$255$$ with binary carry-less arithmetic clmod $$283$$. An implementation would look like: <pre class="code-container"><code class="language-javascript">class Field256Element {
  // Implements Field&lt;Field256Element&gt;; value is an integer from 0 to 255.
  // clmul and clmod are the carry-less multiplication and mod operations
  // from the previous section.
  constructor(value) { this.value = value; }
  static Zero = new Field256Element(0);
  static One = new Field256Element(1);
  plus(b) { return new Field256Element(this.value ^ b.value); }
  // Every element is its own additive inverse.
  negate() { return this; }
  times(b) { return new Field256Element(clmod(clmul(this.value, b.value), 283)); }
  reciprocate() {
    if (this.value === 0) { throw new Error('0 has no reciprocal'); }
    // By Theorem 4, exactly one candidate works, so just try them all.
    for (let i = 1; i &lt; 256; i++) {
      const candidate = new Field256Element(i);
      if (this.times(candidate).equals(Field256Element.One)) { return candidate; }
    }
    throw new Error('unreachable, since 283 is a carry-less prime');
  }
  equals(b) { return this.value === b.value; }
  minus(b) { return this.plus(b.negate()); }
  dividedBy(b) { return this.times(b.reciprocate()); }
}</code></pre> As with reciprocals mod $$257$$, we find reciprocals clmod $$283$$ by brute force.</div> <div class="interactive-example" id="field256Demo"> <h3>Example 11: Field with 256 elements</h3> Denote operations on the field with 256 elements by a <sub>256</sub> subscript, and let <span style="white-space: nowrap;"> <var>a</var> = 23 </span> and <span style="white-space: nowrap;"> <var>b</var> = 54. </span> Then <ul> <li> <span style="white-space: nowrap;"> <var>a</var> &oplus;<sub>256</sub> <var>b</var> = 23 &oplus; 54 = <span class="result">33</span>; </span> </li> <li> <span style="white-space: nowrap;"> &ominus;<sub>256</sub><var>b</var> = <var>b</var> = <span class="result">54</span>; </span> </li> <li> <span style="white-space: nowrap;"> <var>a</var> &ominus;<sub>256</sub> <var>b</var> = <var>a</var> &oplus;<sub>256</sub> &ominus;<sub>256</sub><var>b</var> = <var>a</var> &oplus;<sub>256</sub> <var>b</var> = <span class="result">33</span>; </span> </li> <li> <span style="white-space: nowrap;"> <var>a</var> &otimes;<sub>256</sub> <var>b</var> = (23 &otimes; 54) clmod 283 = <span class="result">207</span>; </span> </li> <li> <span style="white-space: nowrap;"> 54 &otimes;<sub>256</sub> 102 = 1, </span> so <span style="white-space: nowrap;"> <var>b</var><sup>-1</sup><sub>256</sub> = <span class="result">102</span>; </span> </li> <li> <span style="white-space: nowrap;"> <var>a</var> &oslash;<sub>256</sub> <var>b</var> = <var>a</var> &otimes;<sub>256</sub> <var>b</var><sup>-1</sup><sub>256</sub> = (23 &otimes; 102) clmod 283 = <span class="result">19</span>, </span> and indeed <span style="white-space: nowrap;"> <var>b</var> &otimes;<sub>256</sub> (<var>a</var> &oslash;<sub>256</sub> <var>b</var>) = (54 &otimes; 19) clmod 283 = 23 = <var>a</var>.
</span> </li> </ul> </div>
<script>
'use strict';
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById('field256Demo');
  render(h(Field256Demo, {
    initialA: '23',
    initialB: '54',
    header: h('h3', null, 'Example 11: Field with 256 elements'),
    containerClass: 'interactive-example',
    inputClass: 'parameter',
    resultColor: '#268bd2', // solarized blue
  }), root.parent, root);
})();
</script> </section> <section> <header> <h2>11. The full algorithm</h2> </header> <p>Now we have all the pieces we need to construct erasure codes for any $$(n, m)$$ such that $$m + n \le 256$$. First, we can compute an $$m \times n$$ Cauchy parity matrix over the field with $$256$$ elements. (Recall that this needs $$m + n$$ distinct field elements, which is what imposes the condition $$m + n \le 256$$.)</p> <div class="interactive-example" id="cauchyMatrixDemoGeneral"> <h3>Example 12: Cauchy matrices in general</h3> Working over the field with 256 elements, let <span style="white-space: nowrap;"> <var>x</var> = [ 1, 2, 3 ] </span> and <span style="white-space: nowrap;"> <var>y</var> = [ 4, 5, 6 ].
</span> Then, the Cauchy matrix constructed from <var>x</var> and <var>y</var> is
<pre>/  82 203 209 \
| 123 209 203 |
\ 209 123  82 /,</pre>
which has inverse
<pre>/ 130  31 176 \
| 252 219  31 |
\ 108 252 130 /.</pre> </div>
<script>
'use strict';
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById('cauchyMatrixDemoGeneral');
  render(h(CauchyMatrixDemo, {
    initialX: '1, 2, 3',
    initialY: '4, 5, 6',
    initialFieldType: 'gf256',
    name: 'cauchyMatrixDemoGeneral',
    header: h('h3', null, 'Example 12: Cauchy matrices in general'),
    containerClass: 'interactive-example',
    inputClass: 'parameter',
    allowFieldTypeChanges: true,
  }), root.parent, root);
})();
</script>
<p>Then we can implement matrix multiplication over arbitrary fields, and thus we can implement <code>ComputeParity</code>.</p>
<div class="interactive-example" id="computeParityDetailDemo"> <h3>Example 13: <code>ComputeParity</code> in detail</h3> Let <span style="white-space: nowrap;"> <var>d</var> = [ da, db, 0d ] </span> be the input data bytes and let <span style="white-space: nowrap;"> <var>m</var> = 2 </span> be the desired parity byte count. Then, with the input byte count <span style="white-space: nowrap;"> <var>n</var> = 3, </span> the <span style="white-space: nowrap;"> <var>m</var> &times; <var>n</var> </span> Cauchy parity matrix computed using <span style="white-space: nowrap;"> <var>x</var><sub>i</sub> = <var>n</var> + <var>i</var> </span> and <span style="white-space: nowrap;"> <var>y</var><sub>i</sub> = <var>i</var> </span> is
<pre>/ f6 8d 01 \
\ cb 52 7b /.</pre>
Therefore, the parity bytes are computed as
<pre>                _    _     _    _
/ f6 8d 01 \   |  da  |   |  52  |
\ cb 52 7b / * |  db  | = |_ 0c _|,
               |_ 0d _|</pre>
and thus the output parity bytes are <span style="white-space: nowrap;"> <var>p</var> = [ <span class="result">52</span>, <span class="result">0c</span> ].
</span> </div>
<script>
'use strict';
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById('computeParityDetailDemo');
  render(h(ComputeParityDemo, {
    initialD: 'da, db, 0d',
    initialM: '2',
    name: 'computeParityDetailDemo',
    detailed: true,
    header: h('h3', null, 'Example 13: ', h('code', null, 'ComputeParity'), ' in detail'),
    containerClass: 'interactive-example',
    inputClass: 'parameter',
    resultColor: '#268bd2', // solarized blue
  }), root.parent, root);
})();
</script>
<p>Then we can implement matrix inversion using row reduction over arbitrary fields.</p>
<div class="interactive-example" id="matrixInverseDemoGeneral"> <h3>Example 14: Matrix inversion via row reduction in general</h3> Working over the field with 256 elements, let
<pre>    / 0 2 2 \
M = | 3 4 5 |
    \ 6 6 7 /.</pre>
The initial augmented matrix <var>A</var> is
<pre>/   0   2   2 |   1   0   0 \
|   3   4   5 |   0   1   0 |
\   6   6   7 |   0   0   1 /.</pre>
We need <var>A</var><sub>00</sub> to be non-zero, so swap rows <span class="swap-row-a">0</span> and <span class="swap-row-b">1</span>:
<pre>/ <span class="swap-row-a">  0   2   2</span> | <span class="swap-row-a">  1   0   0</span> \     / <span class="swap-row-b">  3   4   5</span> | <span class="swap-row-b">  0   1   0</span> \
| <span class="swap-row-b">  3   4   5</span> | <span class="swap-row-b">  0   1   0</span> | --> | <span class="swap-row-a">  0   2   2</span> | <span class="swap-row-a">  1   0   0</span> |
\   6   6   7 |   0   0   1 /     \   6   6   7 |   0   0   1 /.</pre>
We need <var>A</var><sub>00</sub> to be 1, so divide row <span class="divide-row">0</span> by 3:
<pre>/ <span class="divide-row">  3   4   5</span> | <span class="divide-row">  0   1   0</span> \     / <span class="divide-row">  1 245   3</span> | <span class="divide-row">  0 246   0</span> \
|   0   2   2 |   1   0   0 | --> |   0   2   2 |   1   0   0 |
\   6   6   7 |   0   0   1 /     \   6   6   7 |   0   0   1 /.</pre>
We need <var>A</var><sub>20</sub> to be 0, so subtract row <span class="subtract-scaled-row-src">0</span> scaled by 6 from row <span class="subtract-scaled-row-dest">2</span>:
<pre>/ <span class="subtract-scaled-row-src">  1 245   3</span> | <span class="subtract-scaled-row-src">  0 246   0</span> \     /   1 245   3 |   0 246   0 \
|   0   2   2 |   1   0   0 | --> |   0   2   2 |   1   0   0 |
\ <span class="subtract-scaled-row-dest">  6   6   7</span> | <span class="subtract-scaled-row-dest">  0   0   1</span> /     \ <span class="subtract-scaled-row-dest">  0  14  13</span> | <span class="subtract-scaled-row-dest">  0   2   1</span> /.</pre>
We need <var>A</var><sub>11</sub> to be 1, so divide row <span class="divide-row">1</span> by 2:
<pre>/   1 245   3 |   0 246   0 \     /   1 245   3 |   0 246   0 \
| <span class="divide-row">  0   2   2</span> | <span class="divide-row">  1   0   0</span> | --> | <span class="divide-row">  0   1   1</span> | <span class="divide-row">141   0   0</span> |
\   0  14  13 |   0   2   1 /     \   0  14  13 |   0   2   1 /.</pre>
We need <var>A</var><sub>21</sub> to be 0, so subtract row <span class="subtract-scaled-row-src">1</span> scaled by 14 from row <span class="subtract-scaled-row-dest">2</span>:
<pre>/   1 245   3 |   0 246   0 \     /   1 245   3 |   0 246   0 \
| <span class="subtract-scaled-row-src">  0   1   1</span> | <span class="subtract-scaled-row-src">141   0   0</span> | --> |   0   1   1 | 141   0   0 |
\ <span class="subtract-scaled-row-dest">  0  14  13</span> | <span class="subtract-scaled-row-dest">  0   2   1</span> /     \ <span class="subtract-scaled-row-dest">  0   0   3</span> | <span class="subtract-scaled-row-dest">  7   2   1</span> /.</pre>
We need <var>A</var><sub>22</sub> to be 1, so divide row <span class="divide-row">2</span> by 3, which makes the left side of <var>A</var> a unit upper triangular matrix:
<pre>/   1 245   3 |   0 246   0 \     /   1 245   3 |   0 246   0 \
|   0   1   1 | 141   0   0 | --> |   0   1   1 | 141   0   0 |
\ <span class="divide-row">  0   0   3</span> | <span class="divide-row">  7   2   1</span> /     \ <span class="divide-row">  0   0   1</span> | <span class="divide-row">244 247 246</span> /.</pre>
We need <var>A</var><sub>12</sub> to be 0, so subtract row <span class="subtract-scaled-row-src">2</span> from row <span class="subtract-scaled-row-dest">1</span>:
<pre>/   1 245   3 |   0 246   0 \     /   1 245   3 |   0 246   0 \
| <span class="subtract-scaled-row-dest">  0   1   1</span> | <span class="subtract-scaled-row-dest">141   0   0</span> | --> | <span class="subtract-scaled-row-dest">  0   1   0</span> | <span class="subtract-scaled-row-dest">121 247 246</span> |
\ <span class="subtract-scaled-row-src">  0   0   1</span> | <span class="subtract-scaled-row-src">244 247 246</span> /     \   0   0   1 | 244 247 246 /.</pre>
We need <var>A</var><sub>02</sub> to be 0, so subtract row <span class="subtract-scaled-row-src">2</span> scaled by 3 from row <span class="subtract-scaled-row-dest">0</span>:
<pre>/ <span class="subtract-scaled-row-dest">  1 245   3</span> | <span class="subtract-scaled-row-dest">  0 246   0</span> \     / <span class="subtract-scaled-row-dest">  1 245   0</span> | <span class="subtract-scaled-row-dest">  7 244   1</span> \
|   0   1   0 | 121 247 246 | --> |   0   1   0 | 121 247 246 |
\ <span class="subtract-scaled-row-src">  0   0   1</span> | <span class="subtract-scaled-row-src">244 247 246</span> /     \   0   0   1 | 244 247 246 /.</pre>
We need <var>A</var><sub>01</sub> to be 0, so subtract row <span class="subtract-scaled-row-src">1</span> scaled by 245 from row <span class="subtract-scaled-row-dest">0</span>, which makes the left side of <var>A</var> the identity matrix:
<pre>/ <span class="subtract-scaled-row-dest">  1 245   0</span> | <span class="subtract-scaled-row-dest">  7 244   1</span> \     / <span class="subtract-scaled-row-dest">  1   0   0</span> | <span class="subtract-scaled-row-dest"> 82  82  82</span> \
| <span class="subtract-scaled-row-src">  0   1   0</span> | <span class="subtract-scaled-row-src">121 247 246</span> | --> |   0   1   0 | 121 247 246 |
\   0   0   1 | 244 247 246 /     \   0   0   1 | 244 247 246 /.</pre>
Since the left side of <var>A</var> is the identity matrix, the right side of <var>A</var> is <var>M</var><sup>-1</sup>.
Therefore,
<pre>         /  82  82  82 \
M^{-1} = | 121 247 246 |
         \ 244 247 246 /.</pre> </div>
<script>
'use strict';
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById('matrixInverseDemoGeneral');
  render(h(MatrixInverseDemo, {
    initialElements: '0, 2, 2, 3, 4, 5, 6, 6, 7',
    initialFieldType: 'gf256',
    name: 'matrixInverseDemoGeneral',
    header: h('h3', null, 'Example 14: Matrix inversion via row reduction in general'),
    containerClass: 'interactive-example',
    inputClass: 'parameter',
    buttonClass: 'interactive-example-button',
    allowFieldTypeChanges: true,
    swapRowAColor: '#dc322f', // solarized red
    swapRowBColor: '#268bd2', // solarized blue
    divideRowColor: '#dc322f', // solarized red
    subtractScaledRowSrcColor: '#268bd2', // solarized blue
    subtractScaledRowDestColor: '#dc322f', // solarized red
  }), root.parent, root);
})();
</script>
<p>Finally, we can use matrix inversion to implement <code>ReconstructData</code>.</p>
<div class="interactive-example" id="reconstructDataDetailDemo"> <h3>Example 15: <code>ReconstructData</code> in detail</h3> Let <span style="white-space: nowrap;"> <var>d</var><sub>partial</sub> = [ ??, db, ?? ] </span> be the input partial data bytes and <span style="white-space: nowrap;"> <var>p</var><sub>partial</sub> = [ 52, 0c ] </span> be the input partial parity bytes. Then, with the data byte count <span style="white-space: nowrap;"> <var>n</var> = 3 </span> and the parity byte count <span style="white-space: nowrap;"> <var>m</var> = 2, </span> and appending the rows of the <span style="white-space: nowrap;"> <var>m</var> &times; <var>n</var> </span> Cauchy parity matrix to the <span style="white-space: nowrap;"> <var>n</var> &times; <var>n</var> </span> identity matrix, we get
<pre>/ X01X X00X X00X \
|  00   01   00  |
| X00X X00X X01X |
|  f6   8d   01  |
\  cb   52   7b  /,</pre>
where the rows corresponding to the unknown data and parity bytes are crossed out.
Taking the first <var>n</var> rows that aren&rsquo;t crossed out, we get the square matrix
<pre>/ 00 01 00 \
| f6 8d 01 |
\ cb 52 7b /</pre>
which has inverse
<pre>/ 01 d0 d6 \
| 01 00 00 |
\ 7b b8 bb /.</pre>
Therefore, the data bytes are reconstructed from the first <var>n</var> known data and parity bytes as
<pre>                _    _     _    _
/ 01 d0 d6 \   |  db  |   |  da  |
| 01 00 00 | * |  52  | = |  db  |
\ 7b b8 bb /   |_ 0c _|   |_ 0d _|,</pre>
and thus the output data bytes are <span style="white-space: nowrap;"> <var>d</var> = [ <span class="result">da</span>, <span class="result">db</span>, <span class="result">0d</span> ]. </span> </div>
<script>
'use strict';
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById('reconstructDataDetailDemo');
  render(h(ReconstructDataDemo, {
    initialPartialD: '??, db, ??',
    initialPartialP: '52, 0c',
    name: 'reconstructDataDetailDemo',
    detailed: true,
    header: h('h3', null, 'Example 15: ', h('code', null, 'ReconstructData'), ' in detail'),
    containerClass: 'interactive-example',
    inputClass: 'parameter',
    resultColor: '#268bd2', // solarized blue
  }), root.parent, root);
})();
</script>
<p>And we&rsquo;re done!</p> </section> <section> <header> <h2>12. Further reading</h2> </header> <p>Next time we&rsquo;ll talk about the PAR1 file format, which is a practical implementation of an erasure code very similar to the one described above, and the various challenges in making it perform well on sets of large files.</p> <p>For those of you interested in the mathematical details, I&rsquo;ll also write a companion article.
(This article is already quite long!)</p> <p>I gave <a href="./magic-erasure-codes">a 15-minute presentation</a> for <a href="https://wafflejs.com">WaffleJS</a> covering the same topics as this article but at a higher level and more informally.</p> <p>I got the idea for explaining the finite field with $$256$$ elements in terms of binary carry-less arithmetic from <a href="http://www.zlib.net/crc_v3.txt">A Painless Guide to CRC Error Detection Algorithms</a>, which is an excellent document in its own right.</p> <p>Most sources below use Vandermonde matrices instead of Cauchy matrices; I plan to cover Vandermonde matrices in the next article on PAR1. Cauchy matrices are more foolproof, which is why I started with them. templexxx, whose Go implementation I cite below, <a href="http://www.templex.xyz/blog/101/cauchy.html">feels the same way</a>. (His blog post is in Chinese, but <a href="https://translate.google.com/">Google Translate</a> or a similar service translates it well enough into English.)</p> <p>I started learning about erasure codes from <a href="https://web.eecs.utk.edu/~plank/">James Plank&rsquo;s</a> papers. See <a href="https://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.pdf">A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like systems</a>, but also make sure to read the very important <a href="https://web.eecs.utk.edu/~plank/plank/papers/CS-03-504.pdf">correction</a> to it! <a href="http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf">Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Storage Applications</a> covers Cauchy matrices, although in a slightly different context.
The first part of Plank&rsquo;s <a href="http://web.eecs.utk.edu/~plank/plank/classes/cs560/560/notes/Erasure/2004-ICL.pdf">All About Erasure Codes</a> slides also contains a good overview of the encoding/decoding process, including a nifty color-coded matrix diagram.</p> <p>As for implementations, <a href="https://github.com/klauspost/reedsolomon">klauspost</a> and <a href="https://github.com/templexxx/reedsolomon">templexxx</a> have good ones written in Go. They were in turn inspired by <a href="https://github.com/Backblaze/JavaReedSolomon">Backblaze&rsquo;s Java implementation</a>. <a href="https://www.backblaze.com/blog/reed-solomon/">Backblaze&rsquo;s accompanying blog post</a> is also a good overview of the topic. The toy JS implementation powering the demos on this page is also available on <a href="https://github.com/akalin/intro-erasure-codes">my GitHub</a>.</p> <p><a href="https://people.cs.clemson.edu/~westall/851/rs-code.pdf">An Introduction to Galois Fields and Reed-Solomon Coding</a><sup><a href="#fn13" id="r13"></a></sup> covers much of the same material as I do, albeit assuming slightly more mathematical background.</p> <p>Going further afield, <a href="https://research.swtch.com/field">Russ Cox</a>, <a href="https://jeremykun.com/2015/03/23/the-codes-of-solomon-reed-and-muller/">Jeremy Kun</a>, and <a href="https://www.nayuki.io/page/reed-solomon-error-correcting-code-decoder">Nayuki</a> also wrote about finite fields and Reed-Solomon codes.</p> </section> <hr /> <p class="thanks">Thanks to Ying-zong Huang, Ryan Hitchman, Charles Ellis, and Josh Gao for comments/corrections/discussion.</p> <p>Like this post? Subscribe to <!-- The image is 256x256, the center of the dot is 189 pixels from the top, and the radius of the dot is 24.
Therefore, the dot is 43/256 = 0.16796875 of the image height above the bottom.--> <a href="feed/atom">my feed <img src="feed-icon.svg" alt="RSS icon" style="width: 1em; height: 1em; vertical-align: -0.16796875em;" /></a> or follow me on <a href="https://twitter.com/fakalin">Twitter <img src="twitter-icon.svg" alt="Twitter icon" style="width: 1em; height: 1em;" /></a>.</p> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> This discussion of linear algebra is necessarily abbreviated for our purposes. For a more general but still basic treatment, see <a href="https://www.khanacademy.org/math/linear-algebra/matrix-transformations">Khan Academy</a>. <a href="#r1">↩</a></p> <p id="fn2"> Here and throughout this document, I index vectors and matrices starting with $$0$$, to better match array indices in code. Most math texts index vectors and matrices starting at $$1$$. <a href="#r2">↩</a></p> <p id="fn3"> Now would be a good time to talk about the conventions I and other texts use. Following <a href="https://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.pdf">Plank</a>, I use $$n$$ for the data byte count and $$m$$ for the parity byte count, and I represent arrays and vectors as <em>column vectors</em>, where multiplication with a matrix is done with the column vector on the <em>right</em>, which is the standard in most of math. However, in coding theory, $$k$$ is used for the data byte count, which they call the <em>message length</em>, and $$n$$ is used for the sum of the data and parity byte counts, which they call the <em>codeword length</em>. Furthermore, contrary to the rest of math, coding theory treats arrays and vectors as <em>row vectors</em>, where multiplication with a matrix is done with the row vector on the <em>left</em>, and the matrix used would be the transpose of the matrix that would be used with a column vector.
<a href="#r3">↩</a></p> <p id="fn4"> Khan Academy has a <a href="https://www.khanacademy.org/math/algebra-home/alg-matrices/alg-determinants-and-inverses-of-large-matrices/v/inverting-matrices-part-3">video stepping through an example</a> for a $$3 \times 3$$ matrix. <a href="#r4">↩</a></p> <p id="fn5"> People with experience in coding theory might recognize that a parity matrix $$P$$ being optimal is equivalent to the corresponding erasure code being <a href="https://en.wikipedia.org/wiki/Singleton_bound#MDS_codes">MDS</a>. <a href="#r5">↩</a></p> <p id="fn6"> An equivalent statement which is easier to see is that if a row could be expressed as a linear combination of other rows, then one would be able to construct a non-empty square submatrix of $$P$$ with those rows, which would then be non-invertible. <a href="#r6">↩</a></p> <p id="fn7"> It is instead a (transposed) <a href="https://en.wikipedia.org/wiki/Vandermonde_matrix"><em>Vandermonde matrix</em></a>, which we&rsquo;ll cover when we talk about the PAR1 file format in a follow-up article. <a href="#r7">↩</a></p> <p id="fn8"> People with experience in abstract algebra might recognize this as <a href="https://en.wikipedia.org/wiki/Finite_field_arithmetic#Effective_polynomial_representation">arithmetic over $$\mathbb{F}_2[x]$$</a>, the polynomials with coefficients in the finite field with $$2$$ elements. <a href="#r8">↩</a></p> <p id="fn9"> Our use of $$\clplus$$, $$\clminus$$, $$\clmul$$, and $$\cldiv$$ to denote carry-less arithmetic clashes with our use of the same symbols to denote generic field operations. However, we&rsquo;ll never need to talk about both at the same time, so whichever one we mean should be obvious in context. <a href="#r9">↩</a></p> <p id="fn10"> This is a slightly stronger statement than <a href="#theorem-3">Theorem&nbsp;3</a>. 
<a href="#r10">↩</a></p> <p id="fn11"> People with experience in abstract algebra might recognize carry-less primes as <a href="https://en.wikipedia.org/wiki/Irreducible_element">irreducible elements</a> of $$\mathbb{F}_2[x]$$. <a href="#r11">↩</a></p> <p id="fn12"> Coincidentally, $$283$$ is also an ordinary prime number. Using another carry-less prime number $$256 \le p \lt 512$$ would also yield a field with $$256$$ elements, but it is important to consistently use the same carry-less modulus; different carry-less moduli lead to fields with $$256$$ elements that are <em>isomorphic</em>, but not identical.</p> <p>Borrowing <a href="https://en.wikipedia.org/wiki/Mathematics_of_cyclic_redundancy_checks#Polynomial_representations">notation from CRCs</a>, the carry-less modulus is sometimes represented as a hexadecimal number with the leading digit (which is always $$1$$) omitted. For example, $$283$$ would be represented as $$\mathtt{0x1b}$$, and we can say that we&rsquo;re using the field with $$256$$ elements <em>defined by</em> $$\mathtt{0x1b}$$. <a href="#r12">↩</a></p> <p id="fn13"> <em>Galois field</em> is just another name for finite field. <a href="#r13">↩</a></p> </section> https://www.akalin.com/quintic-unsolvability Why is the Quintic Unsolvable? 2016-09-26T00:00:00-07:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved.
<link rel="stylesheet" type="text/css" href="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/jsxgraph.css" /> <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jsxgraph/0.99.5/jsxgraphcore.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/complex.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/complex_poly.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/animation.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/rotation_counter.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/display.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/complex_formula.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/quadratic.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/cubic.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/quartic.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/b8e50dd/quintic.js"></script> <!-- KaTeX messes up axes labels, for some reason, so remember to surround a jxgbox div with <nokatex></nokatex>. 
--> <style> .graph { display: block; width: 300px; height: 300px; margin: 0.5em 0.2em; } .graph-container { display: inline-block; vertical-align: top; max-width: 300px; } </style> <p><em>(This was discussed on <a href="https://www.reddit.com/r/math/comments/57n07e/why_is_the_quintic_unsolvable/">r/math</a> and <a href="https://news.ycombinator.com/item?id=14685466">Hacker News</a>.)</em></p> <section> <header> <h2>1. Overview</h2> </header> <p>In this article, I hope to convince you that the quintic equation is unsolvable, in the sense that I can&rsquo;t write down the solution to the equation $ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0$ using only addition, subtraction, multiplication, division, raising to an integer power, and taking an integer root. In fact, I hope to go further and explain how this is true for the same reason that I can&rsquo;t write down the solution to the equation $ax^2 + bx + c = 0$ using only the first five operations above!</p> <p>The usual approach to the above claim involves a semester&rsquo;s worth of abstract algebra and Galois theory. However, there&rsquo;s a much easier and shorter proof which involves only a bit of group theory and complex analysis&mdash;enough to fit in a blog post&mdash;and some interactive visualizations.<sup><a href="#fn1" id="r1"></a></sup></p> </section> <section> <header> <h2>2. Quadratic Equations</h2> </header> <p>Let&rsquo;s start with quadratic equations, which hopefully you all remember from high school. 
Given two complex numbers $$r_1$$ and $$r_2$$, you can determine the quadratic equation whose solutions are $$r_1$$ and $$r_2$$, namely $(x - r_1)(x - r_2) = x^2 - (r_1 + r_2) x + r_1 r_2 = 0\text{.}$ If we take the standard form of a quadratic equation to be $a x^2 + bx + c = 0\text{,}$ then we can define a function from $$r_1$$ and $$r_2$$ to $$a$$, $$b$$, and $$c$$, which is shown by the first two panels in the visualization below; drag either of the points $$r_1$$ or $$r_2$$ and notice how $$b$$ and $$c$$ move ($$a$$ will always remain fixed at $$1$$).</p> <p>Now pretend that we misremember the quadratic formula as $x_{1, 2} = \frac{-b \pm b^2 - 4ac}{4a}\text{.}$ The results of this formula (our candidate solution) are shown in the third panel. Note that since $$x_1$$ and $$x_2$$ depend on $$a$$, $$b$$, and $$c$$, which all depend on $$r_1$$ and $$r_2$$, they also move when you drag either $$r_1$$ or $$r_2$$.</p> <div class="interactive-example"> <h3>Interactive Example 1: An incorrect quadratic formula</h3> <div class="graph-container"> Roots <nokatex><div id="rootBoardQuad1" class="graph jxgbox"></div></nokatex> <button class="interactive-example-button quad1DisableWhileSwapping" type="button" onclick="quad1.swap();"> Swap $$r_1$$ and $$r_2$$ </button> </div> <div class="graph-container"> Coefficients <nokatex><div id="coeffBoardQuad1" class="graph jxgbox"></div></nokatex> </div> <div class="graph-container"> Candidate solution <nokatex><div id="formulaBoardQuad1" class="graph jxgbox"></div></nokatex> </div> </div> <script type="text/javascript"> 'use strict'; function runOp(display, op, time, disableSelector, state, doneCallback) { if (state.running) { return; } state.running = true; var oldFixed = display.setRootsFixed(true); var elems = document.querySelectorAll(disableSelector); for (var i = 0; i < elems.length; ++i) { elems[i].disabled = true; } op.run(time, function() { state.running = false; display.setRootsFixed(oldFixed); for (var i = 0; i <
elems.length; ++i) { elems[i].disabled = false; } if (doneCallback !== undefined) { doneCallback(); } }); } var incorrectQuadraticFormula = (function() { var a = ComplexFormula.select(-1); var b = ComplexFormula.select(-2); var x1 = b.neg().plus(quadraticDiscriminantFormula).div(a.times(4)); var x2 = b.neg().minus(quadraticDiscriminantFormula).div(a.times(4)); return x1.concat(x2); })(); var quad1 = (function() { var initialRoots = [ new Complex(1, 0), new Complex(-1, 0) ]; var display = new Display( "rootBoardQuad1", "coeffBoardQuad1", "formulaBoardQuad1", initialRoots, incorrectQuadraticFormula, function() {}); display._resultRotationCounterPoint.setAttribute({visible: false}); var op = display.swapRootOp(0, 1, function() {}); function swap() { runOp(display, op, 1000, '.quad1DisableWhileSwapping', {}); }; return { display: display, swap: swap }; })(); </script> <p>At first this formula looks right, since $$x_1$$ and $$x_2$$ start at the same coordinates as $$r_1$$ and $$r_2$$. However, if you move $$r_1$$ or $$r_2$$ around, you can easily convince yourself that this formula can&rsquo;t be right, since $$x_1$$ and $$x_2$$ don&rsquo;t follow them.</p> <p>If you remember from high school, the actual quadratic formula involves taking a square root; since our candidate solution doesn&rsquo;t take one, it&rsquo;s probably incorrect. I say &ldquo;probably&rdquo; because there&rsquo;s no immediate reason why there can&rsquo;t be <em>multiple</em> quadratic formulas, some simpler than others, one of which is simple enough not to need a square root.
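</p>
<p>The same experiment can be run numerically instead of by dragging points. Below is a minimal JavaScript sketch, not the code behind the visualization: the helpers <code>cx</code>, <code>add</code>, <code>mul</code>, and so on are hypothetical stand-ins for the page&rsquo;s <code>complex.js</code>. It computes $$a$$, $$b$$, and $$c$$ from $$r_1$$ and $$r_2$$ and checks whether the misremembered formula gives the roots back.</p>

```javascript
// Hypothetical minimal complex-number helpers (not the page's complex.js API).
const cx = (re, im = 0) => ({ re, im });
const add = (z, w) => cx(z.re + w.re, z.im + w.im);
const sub = (z, w) => cx(z.re - w.re, z.im - w.im);
const neg = (z) => cx(-z.re, -z.im);
const mul = (z, w) => cx(z.re * w.re - z.im * w.im, z.re * w.im + z.im * w.re);
const div = (z, w) => {
  const d = w.re * w.re + w.im * w.im;
  return cx((z.re * w.re + z.im * w.im) / d, (z.im * w.re - z.re * w.im) / d);
};
const near = (z, w, eps = 1e-9) => Math.hypot(z.re - w.re, z.im - w.im) < eps;

// (x - r1)(x - r2) = x^2 - (r1 + r2) x + r1 r2, so a = 1, b = -(r1 + r2), c = r1 r2.
function quadraticCoefficients(r1, r2) {
  return { a: cx(1), b: neg(add(r1, r2)), c: mul(r1, r2) };
}

// The misremembered formula x = (-b ± (b^2 - 4ac)) / (4a); note: no square root.
function incorrectFormula({ a, b, c }) {
  const disc = sub(mul(b, b), mul(cx(4), mul(a, c)));
  const denom = mul(cx(4), a);
  return [div(add(neg(b), disc), denom), div(sub(neg(b), disc), denom)];
}

// True if the two candidate values are the roots r1 and r2, in either order.
function reproducesRoots(r1, r2) {
  const [x1, x2] = incorrectFormula(quadraticCoefficients(r1, r2));
  return (near(x1, r1) && near(x2, r2)) || (near(x1, r2) && near(x2, r1));
}

console.log(reproducesRoots(cx(1), cx(-1))); // true: Interactive Example 1's starting roots
console.log(reproducesRoots(cx(1), cx(0, 1))); // false: after moving r2 to i
```

<p>At the starting roots $$1$$ and $$-1$$ the two agree, which is why the formula looks right at first; moving $$r_2$$ to $$i$$ is enough to break it, since the formula then gives $$(1 - i)/4$$ and $$(1 + 3i)/4$$.</p>
<p>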
From manipulating $$r_1$$ and $$r_2$$, we know that our candidate formula is incorrect, but that conclusion doesn&rsquo;t immediately follow from the formula not having a square root.</p> <p>Fortunately, there is a general way to rule out candidate solutions that are similar to the one above, namely those that use only addition, subtraction, multiplication, division, and raising to an integer power; we&rsquo;ll call these <em>rational expressions</em>. Here&rsquo;s how it goes: if you press the button to swap $$r_1$$ and $$r_2$$, which moves $$r_1$$ to $$r_2$$&rsquo;s position and vice versa, $$a$$, $$b$$, and $$c$$ move away from their starting positions but return once $$r_1$$ and $$r_2$$ reach their destinations. This makes sense, because the coefficients of a polynomial don&rsquo;t depend on how you order the roots. But since $$x_1$$ and $$x_2$$ depend only on $$a$$, $$b$$, and $$c$$, they too must loop back to their starting positions.</p> <p>But that means that our candidate solution cannot be the quadratic formula! If it were, then $$x_1$$ and $$x_2$$ would have ended up swapped, too. Instead, they went back to their starting positions, which is a contradiction. This reasoning holds for any expression that is a <em>single-valued</em>, continuous function of $$a$$, $$b$$, and $$c$$, so in particular it holds for rational expressions.</p> <div class="p">Let&rsquo;s summarize our reasoning in a theorem: <div class="theorem">(<span class="theorem-name">Theorem 1</span>.) A rational expression<sup><a href="#fn2" id="r2"></a></sup> in the coefficients of the general quadratic equation $ax^2 + bx + c = 0$ cannot be a solution to this equation.</div> <div class="proof"> <p><span class="proof-name">Sketch of proof.</span> Assume to the contrary that the rational expression $$x = f(a, b, c)$$ is a solution. Suppose that we start with $$r_1 = z_1$$ and $$r_2 = z_2 \ne z_1$$, and, without loss of generality, that $$x = z_1$$ initially.</p>
Assume that we start with $$r_1 = z_1$$ and $$r_2 = z_2 \ne z_1$$, and without loss of generality assume that we start with $$x = z_1$$.</p> <p>Run $$r_1$$ and $$r_2$$ along continuous paths that swap their two positions, i.e. make $$r_1$$ head from $$z_1$$ to $$z_2$$ continuously, and at the same time make $$r_2$$ head from $$z_2$$ to $$z_1$$ continuously, and make sure to pick paths such that $$r_1$$ and $$r_2$$ never coincide.</p> <p>Since $$a$$, $$b$$, and $$c$$ are continuous functions of $$r_1$$ and $$r_2$$, and $$x$$ is a rational function of $$a$$, $$b$$ and $$c$$, and thus continuous, $$x$$ then depends continuously on $$r_1$$ and $$r_2$$. Thus, since we start with $$x = r_1 = z_1$$, and $$r_1$$ never coincides with $$r_2$$, then as $$r_1$$ moves, $$x = r_1$$ must continue to hold, since $$x$$ is a solution, and therefore $$x$$&rsquo;s final position must be the same as $$r_1$$&rsquo;s, which is $$z_2$$.</p> <p>However, since the coefficients $$a$$, $$b$$, and $$c$$ don&rsquo;t depend on the ordering of $$r_1$$ and $$r_2$$, then their final positions are the same as their initial positions. Since $$x$$ is a function of only $$a$$, $$b$$, and $$c$$, its final position also must be the same as its initial position, $$z_1$$. This contradicts the above, and therefore $$x$$ cannot be a solution. &#x220e;</p> </div> Now consider the candidate solution $x_{1,2} = \sqrt{b^2 - 4ac}\text{.}$ This isn&rsquo;t a rational expression since it has a square root. In particular, in the visualization below, it behaves quite differently from our first candidate solution. First, even though we have just a single expression, it yields two points $$x_1$$ and $$x_2$$. Second, and more surprisingly, if you swap $$r_1$$ and $$r_2$$, $$x_1$$ and $$x_2$$ also exchange places, seemingly contradicting Theorem&nbsp;1! What is going on? 
</div> <div class="interactive-example"> <h3>Interactive Example 2: The quadratic equation</h3> <div class="graph-container"> Roots <nokatex><div id="rootBoardQuad2" class="graph jxgbox"></div></nokatex> <button class="interactive-example-button quad2DisableWhileSwapping" type="button" onclick="quad2.swap();"> Swap $$r_1$$ and $$r_2$$ </button> </div> <div class="graph-container"> Coefficients <nokatex><div id="coeffBoardQuad2" class="graph jxgbox"></div></nokatex> </div> <div class="graph-container"> Candidate solution <nokatex><div id="formulaBoardQuad2" class="graph jxgbox"></div></nokatex> <label> <input class="quad2DisableWhileSwapping" name="quad2Formula" type="radio" onchange="quad2.switchFormula(incorrectQuadraticFormula);" /> $$x_{1, 2} = \frac{-b \pm b^2 - 4ac}{4a}$$ </label> <br /> <label> <input class="quad2DisableWhileSwapping" name="quad2Formula" type="radio" onchange="quad2.switchFormula(quadraticDiscriminantFormula);" /> $$x_1 = b^2 - 4ac$$ </label> <br /> <label> <input checked class="quad2DisableWhileSwapping" name="quad2Formula" type="radio" onchange="quad2.switchFormula(quadraticDiscriminantFormula.root(2));" /> $$x_{1, 2} = \sqrt{b^2 - 4ac}$$ </label> <br /> <label> <input class="quad2DisableWhileSwapping" name="quad2Formula" type="radio" onchange="quad2.switchFormula(newQuadraticFormula());" /> $$x_{1, 2} = \frac{-b + \sqrt{b^2 - 4ac}}{2a}$$ <br /> (the quadratic formula) </label> </div> </div> <script type="text/javascript"> 'use strict'; function switchFormula(display, state, formula) { if (state.running) { return; } var numResults = display.setFormula(formula); } var quad2 = (function() { var initialRoots = [ new Complex(1, 0), new Complex(0, 1) ]; var display = new Display( "rootBoardQuad2", "coeffBoardQuad2", "formulaBoardQuad2", initialRoots, quadraticDiscriminantFormula.root(2), function() {}); display._resultRotationCounterPoint.setAttribute({visible: false}); var op = display.swapRootOp(0, 1, function() {}); var state = {}; function 
swap() { runOp(display, op, 1000, '.quad2DisableWhileSwapping', state); } function switchQuadFormula(formula) { switchFormula(display, state, formula); } return { display: display, swap: swap, switchFormula: switchQuadFormula }; })(); </script> <p>To answer this, we first need to review some facts about complex numbers. Recall that a complex number $$z$$ can be expressed in polar coordinates, where it has a length $$r$$ and an angle $$θ$$, and that it can be converted to the usual Cartesian coordinates using <a href="https://en.wikipedia.org/wiki/Euler%27s_formula">Euler&rsquo;s formula</a>: $z = r e^{iθ} = r \cos θ + i \, r \sin θ\text{.}$ Then, if you have two complex numbers $$z_1 = r_1 e^{iθ_1}$$ and $$z_2 = r_2 e^{iθ_2}$$ in polar form, you can multiply them by multiplying their lengths and adding their angles: $z_1 z_2 = r_1 r_2 e^{i (θ_1 + θ_2)}\text{.}$ So a square root of a complex number $$z = r e^{iθ}$$ is $$\sqrt{r} e^{iθ/2}$$, as you can easily verify. However, if $$z$$ is non-zero, there is exactly one other square root of $$z$$, namely $$\sqrt{r} e^{i (θ/2 + π)} = -\sqrt{r} e^{iθ/2}$$, as you can also verify. (Recall that angles that differ by $$2π = 360^\circ$$ are considered the same.)</p> <p>So in general, taking the square root of a rational expression, as in our candidate solution, yields two distinct points wherever the rational expression is non-zero. In our case, since $$a = 1$$, we have $$b^2 - 4ac = (r_1 + r_2)^2 - 4 r_1 r_2 = (r_1 - r_2)^2$$, so $$b^2 - 4ac$$ remains non-zero as long as $$r_1$$ and $$r_2$$ don&rsquo;t coincide. (We&rsquo;ll have more to say about this expression, called the <em>discriminant</em>, once we talk about cubic equations below.) Therefore, if we want to examine how $$x_1$$ and $$x_2$$ move as $$r_1$$ and $$r_2$$ move, we have to number the square roots of $$b^2 - 4ac$$, and we have to keep this numbering consistent.</p> <p>To do so, we have to do two things: vary $$r_1$$ and $$r_2$$ only continuously, and never let them coincide.
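</p>
<p>Here is a rough numerical version of this bookkeeping, as a JavaScript sketch with hypothetical helper names rather than the page&rsquo;s own code. It swaps $$r_1$$ and $$r_2$$ along one assumed path, a counter-clockwise half-turn about their midpoint that keeps them apart, and at each small step keeps whichever of the two square roots of $$b^2 - 4ac$$ (computed with the polar formulas above) is closest to the one it kept before.</p>

```javascript
// Hypothetical minimal complex-number helpers (not the page's complex.js API).
const cx = (re, im = 0) => ({ re, im });
const add = (z, w) => cx(z.re + w.re, z.im + w.im);
const sub = (z, w) => cx(z.re - w.re, z.im - w.im);
const mul = (z, w) => cx(z.re * w.re - z.im * w.im, z.re * w.im + z.im * w.re);
const scale = (z, k) => cx(z.re * k, z.im * k);
const dist = (z, w) => Math.hypot(z.re - w.re, z.im - w.im);

// The two square roots of z = r e^{iθ}: sqrt(r) e^{iθ/2} and sqrt(r) e^{i(θ/2 + π)}.
function squareRoots(z) {
  const r = Math.hypot(z.re, z.im);
  const theta = Math.atan2(z.im, z.re);
  const s = cx(Math.sqrt(r) * Math.cos(theta / 2), Math.sqrt(r) * Math.sin(theta / 2));
  return [s, scale(s, -1)]; // e^{iπ} = -1
}

// Assumed swap path: r1 and r2 turn half a circle counter-clockwise about their
// midpoint, so they stay a fixed distance apart. At each small step we keep
// whichever square root of b^2 - 4ac is closest to the previously chosen one.
function swapAndTrack(r1, r2, steps = 1000) {
  const m = scale(add(r1, r2), 0.5);
  const u = scale(sub(r1, r2), 0.5);
  const coefficientsAt = (t) => {
    const e = cx(Math.cos(Math.PI * t), Math.sin(Math.PI * t));
    const p = add(m, mul(u, e)); // current position of r1
    const q = sub(m, mul(u, e)); // current position of r2
    return { a: cx(1), b: scale(add(p, q), -1), c: mul(p, q) };
  };
  const disc = ({ a, b, c }) => sub(mul(b, b), scale(mul(a, c), 4));
  const start = coefficientsAt(0);
  const initialRoot = squareRoots(disc(start))[0];
  let root = initialRoot;
  for (let k = 1; k <= steps; k++) {
    const [s1, s2] = squareRoots(disc(coefficientsAt(k / steps)));
    root = dist(s1, root) <= dist(s2, root) ? s1 : s2;
  }
  return { start, end: coefficientsAt(1), initialRoot, finalRoot: root };
}

const result = swapAndTrack(cx(1), cx(0, 1)); // Interactive Example 2's starting roots
console.log(dist(result.start.c, result.end.c) < 1e-9); // true: the coefficients come back
console.log(dist(result.finalRoot, scale(result.initialRoot, -1)) < 1e-6); // true: the root is negated
```

<p>With the starting roots of Interactive Example 2, $$r_1 = 1$$ and $$r_2 = i$$, the coefficients return to their starting values, but the followed square root goes from $$1 - i$$ to $$-1 + i$$. Plugging it into $$\frac{-b + \sqrt{b^2 - 4ac}}{2a}$$ gives $$r_1 = 1$$ at the start and $$i$$ at the end, so the two values of the formula trade places. Choosing the nearest candidate only works because the steps are small and the two square roots stay well apart, which is where the requirement that $$r_1$$ and $$r_2$$ never coincide comes in.</p>
<p>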
If we do this, then we can intuitively &ldquo;lift&rdquo; the expression $$b^2 - 4ac$$ from the complex plane to a new surface $$S$$ where we consider only angles that differ by $$4π = 720^\circ$$, rather than $$2π$$, to be the same. In this space, we can take the &ldquo;first&rdquo; square root of a non-zero complex number to be the one with half the angle, and the &ldquo;second&rdquo; square root to be the one with half the angle plus $$π$$, and have these two square root functions behave continuously as their argument goes around the origin.</p> <figure> <img src="quintic-unsolvability-files/Riemann_sqrt.svg"/> <figcaption> <span class="figure-text">Figure 1</span>&ensp;$$S$$, which is the <a href="https://en.wikipedia.org/wiki/Riemann_surface">Riemann surface</a> of $$\sqrt{z}$$. (Image by <a href="https://en.wikipedia.org/wiki/File:Riemann_sqrt.svg">Leonid 2</a> licensed under <a href="https://creativecommons.org/licenses/by-sa/3.0/deed.en">CC BY-SA 3.0</a>.) </figcaption> </figure> <p>This answers the question of why the proof of Theorem&nbsp;1 fails for $$\sqrt{b^2 - 4ac}$$. As $$r_1$$ is swapped with $$r_2$$, the coefficients $$a$$, $$b$$, and $$c$$ trace out a closed loop, and $$b^2 - 4ac$$ winds once around the origin. In the complex plane, $$b^2 - 4ac$$ ends up where it started, but when it is lifted to $$S$$, its final position differs from its initial position by an angle of $$2π$$, so the two are <em>distinct</em>. Thus we can&rsquo;t conclude that the final position of $$\sqrt{b^2 - 4ac}$$ is the same as its initial position; in fact, it ends up at the other square root.</p> <p>Similar reasoning holds for any algebraic expression that isn&rsquo;t a rational expression, i.e., one that involves taking an integer root, so the proof of Theorem&nbsp;1 does not carry over to algebraic expressions in general. Of course, this is consistent with what we know about the quadratic formula, since it has a square root!</p> </section> <section> <header> <h2>3.
Cubic Equations</h2> </header> <p>Now we can move on to cubic equations. Similarly, given three complex numbers $$r_1$$, $$r_2$$, and $$r_3$$, you can determine the cubic equation with those solutions, namely $(x - r_1) (x - r_2) (x - r_3) = x^3 - (r_1 + r_2 + r_3) x^2 + (r_1 r_2 + r_1 r_3 + r_2 r_3) x - r_1 r_2 r_3\text{,}$ and so we can define a function from $$r_1$$, $$r_2$$, and $$r_3$$ to $$a$$, $$b$$, $$c$$, and $$d$$, where $a x^3 + b x^2 + c x + d$ is the standard form of a cubic polynomial; this function is shown in the visualization below.</p> <p>In the previous section, we talked about the discriminant $$b^2 - 4ac$$ of the general quadratic polynomial. However, the discriminant is an expression that is defined for <em>any</em> polynomial. If $$r_1, \dotsc, r_n$$ are the roots of a polynomial (counting multiplicity) with leading coefficient $$a_n$$, then the <a href="https://en.wikipedia.org/wiki/Discriminant">discriminant</a> is $Δ = a_n^{2n - 2} ∏_{i \lt j} (r_i - r_j)^2\text{.}$ In other words, the discriminant is, up to a power of the leading coefficient, the product of the squared differences of all pairs of roots. In particular, if the polynomial has a repeated root, the discriminant is zero.</p> <p>Using the formula above, you can express the discriminant in terms of the coefficients of the polynomial, as you can verify for yourself with the quadratic equation. Indeed this is true in general; for cubic polynomials, the discriminant can be expressed in terms of the coefficients as $Δ = b^2 c^2 - 4 a c^3 - 4 b^3 d - 27 a^2 d^2 + 18 a b c d\text{.}$ But why do we care? Because, as you can see in the visualization below, swapping any pair of roots makes the discriminant loop once around the origin, so it serves as a useful test function for taking roots.</p> <p>So now that we have three roots, we can swap them in multiple ways.
If $$R$$ is a list that starts off as $$\langle r_1, r_2, r_3 \rangle$$, let $$↺_{i, j}$$ denote a counter-clockwise path that takes the root at the $$i$$th index of $$R$$ to the position of the root at the $$j$$th index of $$R$$ and vice versa, and let $$↻_{i, j}$$ denote the corresponding clockwise path. (Note that this is not the same as the paths that swap $$r_i$$ and $$r_j$$! Play around with the buttons in the visualization below to understand the difference.)</p> <div class="interactive-example"> <h3>Interactive Example 3: The cubic discriminant</h3> <div class="graph-container"> Roots <nokatex><div id="rootBoardCubic1" class="graph jxgbox"></div></nokatex> <span id="rootListCubic1"> $$R = \langle r_1, r_2, r_3 \rangle$$ </span> <br /> <button class="interactive-example-button cubic1DisableWhileRunningOp" type="button" onclick="cubic1.runOp(cubic1.opA, 1000);"> $$↺_{1, 2}$$ </button> <button class="interactive-example-button cubic1DisableWhileRunningOp" type="button" onclick="cubic1.runOp(cubic1.opB, 1000);"> $$↺_{2, 3}$$ </button> <br /> <button class="interactive-example-button cubic1DisableWhileRunningOp" type="button" onclick="cubic1.runOp(cubic1.opA.invert(), 1000);"> $$↻_{1, 2}$$ </button> <button class="interactive-example-button cubic1DisableWhileRunningOp" type="button" onclick="cubic1.runOp(cubic1.opB.invert(), 1000);"> $$↻_{2, 3}$$ </button> </div> <div class="graph-container"> Coefficients <nokatex><div id="coeffBoardCubic1" class="graph jxgbox"></div></nokatex> </div> <div class="graph-container"> Candidate solution <nokatex><div id="formulaBoardCubic1" class="graph jxgbox"></div></nokatex> <label> <input class="cubic1DisableWhileRunningOp" name="cubic1Formula" type="radio" onchange="cubic1.switchFormula(cubicDiscFormula);" /> $$x_1 = Δ$$ </label> <br /> <label> <input checked class="cubic1DisableWhileRunningOp" name="cubic1Formula" type="radio" onchange="cubic1.switchFormula(cubicDiscFormula.root(5));" /> $$x_{1, 2, 3, 4, 5} = \sqrt[5]{Δ}$$ </label> </div> </div> <script type="text/javascript">
'use strict'; function updateRootList(display, rootListID) { var rootPermutation = display.getRootPermutation(); var rootList = document.getElementById(rootListID); var TeXOutput = 'R = \\langle ' + rootPermutation.map(function(i) { return 'r_{' + (i+1) + '}'; }).join(', ') + ' \\rangle'; katex.render(TeXOutput, rootList); } function updateResultList(display, resultListID) { var resultPermutation = display.getResultPermutation(); var resultList = document.getElementById(resultListID); var TeXOutput = 'X = \\langle ' + resultPermutation.map(function(i) { return 'x_{' + (i+1) + '}'; }).join(', ') + ' \\rangle'; katex.render(TeXOutput, resultList); } var cubicDiscFormula = cubicScaledDiscFormula.div( ComplexFormula.select(-1).pow(2).times(-27)); var cubic1 = (function() { var initialRoots = [ new Complex(-1, -0.5), new Complex(0.5, 0.5), new Complex(0, 1) ]; var display = new Display( "rootBoardCubic1", "coeffBoardCubic1", "formulaBoardCubic1", initialRoots, cubicDiscFormula.root(5), function() {}); display._resultRotationCounterPoint.setAttribute({visible: false}); function updateRootListCubic(display) { updateRootList(display, "rootListCubic1"); } var opA = display.swapRootOp(0, 1, updateRootListCubic); var opB = display.swapRootOp(1, 2, updateRootListCubic); var state = {}; function runCubicOp(op, time) { runOp(display, op, time, '.cubic1DisableWhileRunningOp', state); }; function switchCubicFormula(formula) { switchFormula(display, state, formula); updateRootAndResultList(display); } return { display: display, opA: opA, opB: opB, runOp: runCubicOp, cubicDiscFormula: cubicDiscFormula, switchFormula: switchCubicFormula }; })(); </script> <p>Now, with the formula $$Δ$$, the same reasoning as in the previous section shows that it cannot possibly be the cubic formula, nor can any other rational expression.
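</p>
<p>Two claims from the start of this section can be spot-checked numerically: the coefficient formula for $$Δ$$ agrees with the product over the roots, and swapping a pair of roots makes $$Δ$$ loop once around the origin. The JavaScript sketch below uses hypothetical helper names (it is not the page&rsquo;s code). It swaps two roots along an assumed path, a counter-clockwise half-turn about their midpoint with the third root held fixed, recomputes $$Δ$$ from the coefficients alone at each step, and adds up the change in its angle.</p>

```javascript
// Hypothetical minimal complex-number helpers (not the page's complex.js API).
const cx = (re, im = 0) => ({ re, im });
const add = (z, w) => cx(z.re + w.re, z.im + w.im);
const sub = (z, w) => cx(z.re - w.re, z.im - w.im);
const mul = (z, w) => cx(z.re * w.re - z.im * w.im, z.re * w.im + z.im * w.re);
const scale = (z, k) => cx(z.re * k, z.im * k);
const dist = (z, w) => Math.hypot(z.re - w.re, z.im - w.im);

// Coefficients of (x - r1)(x - r2)(x - r3), so a = 1.
function cubicCoefficients([r1, r2, r3]) {
  return {
    a: cx(1),
    b: scale(add(add(r1, r2), r3), -1),
    c: add(add(mul(r1, r2), mul(r1, r3)), mul(r2, r3)),
    d: scale(mul(mul(r1, r2), r3), -1),
  };
}

// Δ = b²c² − 4ac³ − 4b³d − 27a²d² + 18abcd, using only the coefficients.
function discriminantFromCoefficients({ a, b, c, d }) {
  const terms = [
    mul(mul(b, b), mul(c, c)),
    scale(mul(a, mul(c, mul(c, c))), -4),
    scale(mul(mul(b, mul(b, b)), d), -4),
    scale(mul(mul(a, a), mul(d, d)), -27),
    scale(mul(mul(a, b), mul(c, d)), 18),
  ];
  return terms.reduce((sum, term) => add(sum, term), cx(0));
}

// Δ = a^4 ∏_{i<j} (r_i − r_j)², with a = 1 here.
function discriminantFromRoots([r1, r2, r3]) {
  const sq = (z) => mul(z, z);
  return mul(mul(sq(sub(r1, r2)), sq(sub(r1, r3))), sq(sub(r2, r3)));
}

// Assumed path: the roots at indices i and j turn half a circle counter-clockwise
// about their midpoint while the third root stays put. Returns how many times Δ
// (from the coefficients alone) winds counter-clockwise around the origin.
function discriminantWinding(roots, i, j, steps = 2000) {
  const m = scale(add(roots[i], roots[j]), 0.5);
  const u = scale(sub(roots[i], roots[j]), 0.5);
  const deltaAt = (t) => {
    const e = cx(Math.cos(Math.PI * t), Math.sin(Math.PI * t));
    const moved = roots.slice();
    moved[i] = add(m, mul(u, e));
    moved[j] = sub(m, mul(u, e));
    return discriminantFromCoefficients(cubicCoefficients(moved));
  };
  let total = 0;
  let prev = deltaAt(0);
  for (let k = 1; k <= steps; k++) {
    const next = deltaAt(k / steps);
    // Angle change between samples, normalized into (-π, π]; fine for small steps.
    let dTheta = Math.atan2(next.im, next.re) - Math.atan2(prev.im, prev.re);
    if (dTheta > Math.PI) dTheta -= 2 * Math.PI;
    if (dTheta <= -Math.PI) dTheta += 2 * Math.PI;
    total += dTheta;
    prev = next;
  }
  return Math.round(total / (2 * Math.PI));
}

// Interactive Example 3's starting roots.
const roots = [cx(-1, -0.5), cx(0.5, 0.5), cx(0, 1)];
const delta0 = discriminantFromCoefficients(cubicCoefficients(roots));
console.log(dist(delta0, discriminantFromRoots(roots)) < 1e-9); // true
console.log(discriminantWinding(roots, 0, 1)); // 1
```

<p>For the starting roots of Interactive Example 3, swapping the first two roots makes $$Δ$$ wind around the origin exactly once. The single loop depends on the path keeping clear of the third root: for roots $$-1$$, $$1$$, and $$0.1i$$, the same half-turn carries the swapped roots around the third one, and the sketch reports a winding number of $$3$$ instead.</p>
<p>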
However, unlike the quadratic case, we can also rule out $$\sqrt[5]{Δ}$$, or any other algebraic formula with no nested radicals (i.e., one that doesn&rsquo;t have a radical within a radical, like $$\sqrt{a - \sqrt{bc - 5}}$$). If you apply the operations $$↺_{2, 3}$$, $$↺_{1, 2}$$, $$↻_{2, 3}$$, and $$↻_{1, 2}$$ in sequence, $$r_1$$, $$r_2$$, and $$r_3$$ rotate among themselves, but all the $$x_i$$ go back to their original positions. Therefore, by reasoning similar to that of the previous section, $$\sqrt[5]{Δ}$$ also cannot possibly be the cubic formula!</p> <p>To make this statement precise, we need to review some group theory. Recall that a <a href="https://en.wikipedia.org/wiki/Group_(mathematics)">group</a> is a set with an associative binary operation, an identity element, and inverse elements. The most basic examples of groups are related to numbers, like the integers under addition or the non-zero rationals under multiplication. However, more interesting examples of groups are related to <em>functions</em>, not least because the group operation for functions is <em>composition</em>, which is in general not commutative: for functions $$f$$ and $$g$$, $$f \circ g$$ and $$g \circ f$$ need not be equal, and it is this non-commutativity that will come in handy for our purposes.</p> <p>So let&rsquo;s say we have a list of $$n$$ objects, and we&rsquo;re interested in the functions that rearrange this list&rsquo;s elements. These are <a href="https://en.wikipedia.org/wiki/Permutation">permutations</a>, and they naturally form a group under composition (as you can check for yourself) called $$S_n$$, the <a href="https://en.wikipedia.org/wiki/Symmetric_group">symmetric group</a> on $$n$$ objects.</p> <p>There&rsquo;s a convenient way to write permutations, called <a href="https://en.wikipedia.org/wiki/Permutation#Cycle_notation">cycle notation</a>.
If you write $$(i_1 \; i_2 \; \dotsc \; i_k)$$, this denotes the permutation that maps the $$i_1$$th position of the list to the $$i_2$$th position, the $$i_2$$th position to the $$i_3$$th, and so on, with the $$i_k$$th position mapped back to the $$i_1$$th; such a permutation is called a <em>cycle</em>. Then you can write <em>any</em> permutation as a composition of disjoint cycles, so this provides a convenient way to write down and compute with permutations.</p> <p>In the visualization above, we have four operations $$↺_{1, 2}$$, $$↺_{2, 3}$$, $$↻_{1, 2}$$, and $$↻_{2, 3}$$, which <em>act on $$R$$</em>, meaning that they define permutations on $$R$$. In particular, $$↺_{1, 2}$$ and $$↻_{1, 2}$$ both swap the first and second elements of $$R$$, so we say that $$↺_{1, 2}$$ and $$↻_{1, 2}$$ act on $$R$$ as $$(1 \; 2)$$, and similarly, $$↺_{2, 3}$$ and $$↻_{2, 3}$$ act on $$R$$ as $$(2 \; 3)$$.</p> <p>Now concatenating two operations, i.e., doing one after the other, corresponds to composing the permutations they induce on $$R$$. Writing $$o_2 * o_1$$ for doing $$o_1$$ and then $$o_2$$, the sequence of operations above is $$↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}$$ (note the order!), which acts on $$R$$ like $$(1 \; 2) (2 \; 3) (1 \; 2) (2 \; 3)$$, which is equal to $$(1 \; 3 \; 2)$$.<sup><a href="#fn3" id="r3">3</a></sup> (The $$\circ$$ is usually dropped when composing permutations.)</p> <p>Now for the formula $$Δ$$, all the operations make $$x_1$$ loop around the origin either clockwise or counter-clockwise; in other words, they all induce a rotation of $$2π$$ or $$-2π$$ on $$x_1$$, and the final distance of $$x_1$$ from the origin is the same as the initial distance. Therefore, if we apply an equal number of clockwise and counter-clockwise rotations, the total angle of rotation will be $$0$$ and the final distance will be the same as the initial distance, i.e., the final position of $$x_1$$ is the same as its initial position.
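</p> <p>The bookkeeping in this argument is simple enough to check mechanically. Below is a minimal standalone sketch in JavaScript (the language of this page&rsquo;s scripts, though it uses none of their code; the helper names <code>fromCycle</code>, <code>compose</code>, <code>swapOp</code>, <code>applySequence</code>, and <code>rootShift</code> are hypothetical). It assumes, as in the argument above, that each operation acts on $$R$$ as a transposition and winds $$Δ$$ once around the origin, counter-clockwise ($$+1$$) for a $$↺$$ and clockwise ($$-1$$) for a $$↻$$; for a candidate $$x = \sqrt[k]{Δ}$$, the net winding modulo $$k$$ then says how far $$x$$ has moved among its $$k$$ values.</p>

```javascript
'use strict';

// Hypothetical helpers (not part of this page's scripts). A permutation of
// {1, ..., n} is an array p with p[i - 1] equal to the image of i.
function fromCycle(n, cycle) {
  var p = [];
  for (var i = 1; i <= n; ++i) {
    p.push(i);
  }
  for (var j = 0; j < cycle.length; ++j) {
    p[cycle[j] - 1] = cycle[(j + 1) % cycle.length];
  }
  return p;
}

// compose(g, h) is the product g h: apply h first, then g.
function compose(g, h) {
  return h.map(function(image) { return g[image - 1]; });
}

// Assumed model of an operation: swap roots i and j, and wind Δ around the
// origin `winding` times (+1 counter-clockwise, -1 clockwise).
function swapOp(i, j, winding) {
  return { perm: fromCycle(3, [i, j]), winding: winding };
}

// Apply operations in the order listed, so [o1, o2, o3, o4] corresponds to
// o4 * o3 * o2 * o1 in the notation of the text.
function applySequence(ops) {
  var perm = fromCycle(3, []);
  var winding = 0;
  ops.forEach(function(op) {
    perm = compose(op.perm, perm);
    winding += op.winding;
  });
  return { perm: perm, winding: winding };
}

// How many steps x = (kth root of Δ) has advanced among its k values.
function rootShift(winding, k) {
  return ((winding % k) + k) % k;
}

var result = applySequence([
  swapOp(2, 3, +1),  // ↺_{2,3}
  swapOp(1, 2, +1),  // ↺_{1,2}
  swapOp(2, 3, -1),  // ↻_{2,3}
  swapOp(1, 2, -1)   // ↻_{1,2}
]);

console.log(result.perm);                   // [3, 1, 2], i.e. (1 3 2)
console.log(rootShift(result.winding, 5));  // 0
```

<p>Running the four operations in the order above gives the permutation $$(1 \; 3 \; 2)$$ on $$R$$ and a net winding of $$0$$, so $$x$$ returns to its starting value for every $$k$$; a single operation, by contrast, has a net winding of $$\pm 1$$, which moves $$x = \sqrt[5]{Δ}$$ but not $$x = Δ$$.</p> <p>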
But the same reasoning holds for the formula $$\sqrt[5]{Δ}$$; all the operations induce a rotation of $$2π/5$$ or $$-2π/5$$ and leave the distance from the origin unchanged, so an equal number of clockwise and counter-clockwise rotations will still induce a total angle of $$0$$ and leave the distance from the origin unchanged. Therefore, the operation $$↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}$$ acts like $$(1 \; 3 \; 2)$$ on $$R$$, but leaves all $$x_i$$ unchanged.</p> <p>But how did we come up with $$↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}$$ in the first place? This involves a bit more group theory. $$S_3$$ is <em>not</em> a <a href="https://en.wikipedia.org/wiki/Abelian_group">commutative group</a>; in particular, $$(1 \; 2) (2 \; 3) \ne (2 \; 3) (1 \; 2)$$. For two group elements $$g$$ and $$h$$, we can define their <a href="https://en.wikipedia.org/wiki/Commutator">commutator</a><sup><a href="#fn4" id="r4">4</a></sup> $$[ g, h ]$$, which is the group element that corrects for $$g$$ and $$h$$ not commuting. That is, we want the equation $g h = h g [g, h]$ to hold, which means that $[g, h] = g^{-1} h^{-1} g h\text{.}$ So the commutator provides a convenient way to generate a non-trivial permutation from two other non-commuting permutations. Furthermore, each of the two elements appears in it twice, once inverted, so we can pick a sequence of operations that induces the commutator and also has an equal number of clockwise and counter-clockwise operations. Then we&rsquo;re guaranteed that this sequence of operations permutes $$R$$ non-trivially and leaves all $$x_i$$ unchanged, even if each individual operation moves some $$x_i$$. But of course, this is just $$↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}$$!</p> <p>Let&rsquo;s define some terminology to make proofs and discussion easier.
If $$o$$ is an operation that acts on $$R$$ non-trivially but leaves the final position of the expression $$x = f(a, b, c, \dotsc)$$ the same as its initial position, we say that $$o$$ <em>rules out</em> the expression $$x = f(a, b, c, \dotsc)$$. For example, Theorem&nbsp;1 says that swapping the two roots of a quadratic rules out all rational expressions.</p> <div class="p">Now we&rsquo;re ready to state and prove the theorem: <div class="theorem">(<span class="theorem-name">Theorem 2</span>.) An algebraic expression with no nested radicals in the coefficients of the general cubic equation $ax^3 + bx^2 + cx + d = 0$ cannot be a solution to this equation.</div> <div class="proof"> <p><span class="proof-name">Sketch of proof.</span> First assume to the contrary that the expression $$x = \sqrt[k]{r(a, b, c, d)}$$ is a solution, where $$r(a, b, c, d)$$ is a rational expression. Assume we start with $$r_1 = z_1$$, $$r_2 = z_2$$, and $$r_3 = z_3$$, where all $$z_i$$ are distinct, and without loss of generality assume that we start with $$x = z_1$$.</p> <p>Each of the operations $$↺_{1, 2}$$, $$↺_{2, 3}$$, $$↻_{1, 2}$$, and $$↻_{2, 3}$$, applied to $$x = r(a, b, c, d)$$, leaves $$x$$&rsquo;s final position the same as its initial position, by Theorem&nbsp;1. Pick a point $$z_0$$ that $$x$$ never passes through under any of these operations. Then, by the same reasoning as above, the total angle induced by $$↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}$$ on $$x = \sqrt[k]{r(a, b, c, d)}$$ around $$z_0$$ is $$0$$, and the distance from $$z_0$$ remains unchanged.
Thus $$x$$ remains fixed, and this operation rules out $$x = \sqrt[k]{r(a, b, c, d)}$$.</p> <p>For the general case, it suffices to show that if $$o$$ rules out the expressions $$f$$ and $$g$$, then $$o$$ also rules out $$f$$ raised to an integer power, $$f + g\text{,}$$ $$f - g\text{,}$$ $$f \cdot g\text{,}$$ and $$f / g$$ where $$g \ne 0\text{.}$$ This is straightforward, since if $$o$$ returns both $$f$$ and $$g$$ to their initial positions, it does the same for each of these combinations; and since every algebraic expression with no nested radicals is built from rational expressions and their $$k$$th roots in this way, the statement holds in general. &#x220e;</p> </div> </div> <p>Theorem&nbsp;2 can be summarized thus: any $$↺_{i, j}$$ or $$↻_{i, j}$$ rules out any rational expression as the cubic formula, and, given an algebraic expression with no nested radicals, either some $$↺_{i, j}$$ or $$↻_{i, j}$$ rules it out, or $$↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}$$ rules it out.</p> <p>Now we can consider algebraic expressions with one level of nesting. Can such formulas be ruled out as being the cubic formula? We can&rsquo;t do so via Theorem&nbsp;2, at least; we would need a non-trivial element of $$S_3$$ that is the commutator of commutators.
But you can calculate that all non-trivial commutators of $$S_3$$ are either $$(3 \; 2 \; 1)$$ or $$(1 \; 2\; 3)$$, and these two elements commute, so $$S_3$$ cannot have a non-trivial commutator of commutators.</p> <p>In fact, as we would expect, the actual <a href="https://en.wikipedia.org/wiki/Cubic_function#General_formula">cubic formula</a> has such an algebraic expression, which is $$C$$ in the visualization below, so that serves as a convenient example of an algebraic expression with a single nested radical that can&rsquo;t be ruled out by Theorem&nbsp;2.</p> <div class="interactive-example"> <h3>Interactive Example 4: The cubic equation</h3> <div class="graph-container"> Roots <nokatex><div id="rootBoardCubic2" class="graph jxgbox"></div></nokatex> <span id="rootListCubic2"> $$R = \langle r_1, r_2, r_3 \rangle$$ </span> <br /> <button class="interactive-example-button cubic2DisableWhileRunningOp" type="button" onclick="cubic2.runOp(cubic2.opA, 1000);"> $$↺_{1, 2}$$ </button> <button class="interactive-example-button cubic2DisableWhileRunningOp" type="button" onclick="cubic2.runOp(cubic2.opB, 1000);"> $$↺_{2, 3}$$ </button> <br /> <button class="interactive-example-button cubic2DisableWhileRunningOp" type="button" onclick="cubic2.runOp(cubic2.opA.invert(), 1000);"> $$↻_{1, 2}$$ </button> <button class="interactive-example-button cubic2DisableWhileRunningOp" type="button" onclick="cubic2.runOp(cubic2.opB.invert(), 1000);"> $$↻_{2, 3}$$ </button> <br /> <button class="interactive-example-button cubic2DisableWhileRunningOp" type="button" onclick="cubic2.runOp(cubic2.opComAB, 4000);"> $$↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}$$ </button> <br /> <button class="interactive-example-button cubic2DisableWhileRunningOp" type="button" onclick="cubic2.runOp(cubic2.opComAB.invert(), 4000);"> $$↺_{1, 2} * ↺_{2, 3} * ↻_{1, 2} * ↻_{2, 3}$$ </button> </div> <div class="graph-container"> Coefficients <nokatex><div id="coeffBoardCubic2" class="graph jxgbox"></div></nokatex> 
</div> <div class="graph-container"> Candidate solution <nokatex><div id="formulaBoardCubic2" class="graph jxgbox"></div></nokatex> <span id="resultListCubic2"> $$X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle$$ </span> <br /> <label> <input class="cubic2DisableWhileRunningOp" name="cubic2Formula" type="radio" onchange="cubic2.switchFormula(cubicScaledDiscFormula);" /> $$x_1 = -27a^2 Δ = {Δ_1}^2 - 4 {Δ_0}^3$$ </label> <br /> <label> <input class="cubic2DisableWhileRunningOp" name="cubic2Formula" type="radio" onchange="cubic2.switchFormula(newCubicCCubedFormula());" /> $$x_{1, 2} = C^3 = \frac{Δ_1 + \sqrt{-27a^2 Δ}}{2}$$ </label> <br /> <label> <input checked class="cubic2DisableWhileRunningOp" name="cubic2Formula" type="radio" onchange="cubic2.switchFormula(newCubicCCubedFormula().root(3));" /> $$x_{1, 2, 3, 4, 5, 6} = C$$ </label> <br /> <label> <input class="cubic2DisableWhileRunningOp" name="cubic2Formula" type="radio" onchange="cubic2.switchFormula(newCubicFormula());" /> $$x_{1, 2, 3} = -\frac{1}{3a} \left( b + C + \frac{Δ_0}{C} \right)$$ <br /> (the cubic formula) </label> </div> </div> <script type="text/javascript"> 'use strict'; var cubic2 = (function() { var initialRoots = [ new Complex(-1, -0.5), new Complex(0.5, 0.5), new Complex(0, 1) ]; var display = new Display( "rootBoardCubic2", "coeffBoardCubic2", "formulaBoardCubic2", initialRoots, newCubicCCubedFormula().root(3), function() {}); display._resultRotationCounterPoint.setAttribute({visible: false}); function updateRootAndResultList(display) { updateRootList(display, "rootListCubic2"); updateResultList(display, "resultListCubic2"); } var opA = display.swapRootOp(0, 1, updateRootAndResultList); var opB = display.swapRootOp(1, 2, updateRootAndResultList); var opComAB = newCommutatorAnimation(opA, opB); var state = {}; function runCubicOp(op, time) { runOp(display, op, time, '.cubic2DisableWhileRunningOp', state); }; function switchCubicFormula(formula) { switchFormula(display, state, formula);
updateRootAndResultList(display); } return { display: display, opA: opA, opB: opB, opComAB: opComAB, runOp: runCubicOp, cubicDiscFormula: cubicDiscFormula, switchFormula: switchCubicFormula }; })(); </script> <p>Note that there is a new list $$X$$, which lists the $$x_i$$ in the order in which they occupy their initial positions, just as $$R$$ does for the $$r_i$$. In general, we can&rsquo;t do this, since a general multi-valued function won&rsquo;t necessarily permute the $$x_i$$ among themselves, but in the interactive visualizations we&rsquo;ll only consider expressions that do.</p> <p>We can then talk about how an operation acts on $$X$$. For example, if we pick $$\sqrt[5]{Δ}$$ in Interactive&nbsp;Example&nbsp;3, we can say that $$↺_{i, j}$$ acts like $$(5 \; 1 \; 2 \; 3 \; 4)$$ on $$X$$ and $$↻_{i, j}$$ acts like its inverse $$(5 \; 4 \; 3 \; 2 \; 1)$$ on $$X$$. Therefore, $$↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}$$ acts non-trivially on $$R$$ but trivially on $$X$$, which is another, more algebraic way of saying that this operation rules out $$\sqrt[5]{Δ}$$. (Unlike the action on $$R$$, the action on $$X$$ depends on the candidate formula.) On the other hand, if you choose $$C$$ in the visualization above, you can convince yourself that no operation acts non-trivially on $$R$$ without also acting non-trivially on $$X$$, and so $$C$$ can&rsquo;t be ruled out as the cubic formula this way.</p> </section> <section> <header> <h2>4. Quartic Equations</h2> </header> <p>Now we can move on to quartic equations. As usual, given four complex numbers $$r_1$$, $$r_2$$, $$r_3$$, and $$r_4$$, you can map them to the coefficients $$a$$, $$b$$, $$c$$, $$d$$, and $$e$$ of the standard form of a quartic polynomial, as shown in the visualization below, such that the $$r_i$$ are the solutions to the quartic equation $a x^4 + b x^3 + c x^2 + d x + e = 0\text{.}$</p> <p>Now that we have four roots, we have even more ways to permute them using the $$↺_{i, j}$$ and $$↻_{i, j}$$.
Before we move on, we need more terminology and group theory to handle this more complicated case.</p> <p>First, we want a convenient way to denote the combination of operations that act like a commutator. Writing operations with $$*$$ as before, let&rsquo;s define $$↺_{i, j}^\prime$$ to mean $$↻_{i, j}$$ and vice versa, $$(o_1 * o_2 * \dotsb * o_n)^\prime$$ to mean $$o_n^\prime * o_{n-1}^\prime * \dotsb * o_1^\prime$$, and $$[\![ o_1, o_2 ]\!]$$ to mean $$o_1^\prime * o_2^\prime * o_1 * o_2$$, so that if $$o_i$$ acts on $$R$$ like $$g_i$$, then $$o_i^\prime$$ acts on $$R$$ like $$g_i^{-1}$$ and $$[\![o_i, o_j]\!]$$ acts on $$R$$ like $$[g_i, g_j]$$. For example, in the previous section, we were using $$[\![ ↺_{1, 2}, ↺_{2, 3} ]\!]$$ to rule out algebraic expressions with no nested radicals.</p> <p>Next, we want to talk not only about commutators of particular permutations, but also about all the commutators of a particular group. For a group $$G$$, these commutators generate a subgroup $$K(G)$$ called the <a href="https://en.wikipedia.org/wiki/Commutator_subgroup">commutator subgroup</a>. For the quadratic case, we just have $$S_2$$, which has only a single non-trivial element and is therefore commutative, so its commutator subgroup $$K(S_2)$$ is the trivial group. For the cubic case, we started with $$S_3$$, and we computed the commutator subgroup $$K(S_3)$$, which is just $$\{ e, (1 \; 2 \; 3), (3 \; 2 \; 1) \}$$. We can also compute the commutator subgroup of <em>this</em> group, which is just the trivial group again, since $$K(S_3)$$ is commutative. Writing $$K^{(i)}(G)$$ for the result of applying $$K$$ to $$G$$ $$i$$ times, we can see that $$K^{(2)}(S_3) = K(K(S_3))$$ being the trivial group means that this approach can&rsquo;t rule out algebraic expressions with nested radicals as solutions to the general cubic equation.</p> <p>Given all the elements of a group $$G$$, it&rsquo;s not particularly complicated to compute the commutator subgroup: just take all possible pairs of elements $$g, h \in G$$, compute $$[g, h]$$, remove duplicates, and then close the result under composition, since a product of commutators need not itself be a commutator.
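</p> <p>For small $$n$$ this brute-force computation is easy to carry out by machine. Below is a minimal standalone sketch in JavaScript, the language of this page&rsquo;s scripts, though it uses none of their code; every name in it (<code>commutatorSubgroup</code>, <code>derivedSeriesOrders</code>, and so on) is hypothetical. It stores a permutation of $$\{0, \dotsc, n-1\}$$ as an array (so indices are shifted down by one from the cycle notation in the text), uses the definition $$[g, h] = g^{-1} h^{-1} g h$$, builds $$K(G)$$ by closing the set of pairwise commutators under composition, and repeats this to get the orders of $$S_n$$, $$K(S_n)$$, $$K(K(S_n))$$, and so on, until the sequence stops shrinking.</p>

```javascript
'use strict';

// Hypothetical helpers, not part of this page's scripts.
// A permutation of {0, ..., n-1} is an array p with p[i] the image of i.

function identity(n) {
  var p = [];
  for (var i = 0; i < n; ++i) {
    p.push(i);
  }
  return p;
}

// compose(g, h) is g h: apply h first, then g.
function compose(g, h) {
  return h.map(function(image) { return g[image]; });
}

function inverse(g) {
  var inv = new Array(g.length);
  g.forEach(function(image, i) { inv[image] = i; });
  return inv;
}

// [g, h] = g^{-1} h^{-1} g h, as defined in the text.
function commutator(g, h) {
  return compose(inverse(g), compose(inverse(h), compose(g, h)));
}

// The subgroup generated by gens: a breadth-first search from the identity,
// multiplying by generators until no new permutations appear. In a finite
// group this also reaches every inverse.
function generate(gens, n) {
  var start = identity(n);
  var seen = {};
  seen[start.join(',')] = start;
  var queue = [start];
  while (queue.length > 0) {
    var p = queue.shift();
    gens.forEach(function(g) {
      var q = compose(g, p);
      var key = q.join(',');
      if (!(key in seen)) {
        seen[key] = q;
        queue.push(q);
      }
    });
  }
  return Object.keys(seen).map(function(key) { return seen[key]; });
}

// K(G): the subgroup generated by all commutators of pairs of elements of G.
function commutatorSubgroup(group, n) {
  var comms = {};
  group.forEach(function(g) {
    group.forEach(function(h) {
      var c = commutator(g, h);
      comms[c.join(',')] = c;
    });
  });
  return generate(Object.keys(comms).map(function(key) { return comms[key]; }), n);
}

// Orders of S_n, K(S_n), K(K(S_n)), ..., stopping once K(G) = G.
function derivedSeriesOrders(n) {
  var swaps = [];
  for (var i = 0; i + 1 < n; ++i) {
    var s = identity(n);
    s[i] = i + 1;
    s[i + 1] = i;
    swaps.push(s);
  }
  var group = generate(swaps, n);  // S_n is generated by adjacent swaps
  var orders = [group.length];
  for (;;) {
    var next = commutatorSubgroup(group, n);
    if (next.length === group.length) {
      break;  // K(G) is a subgroup of G, so equal size means K(G) = G
    }
    orders.push(next.length);
    group = next;
  }
  return orders;
}

console.log(derivedSeriesOrders(3));  // [6, 3, 1]
console.log(derivedSeriesOrders(4));  // [24, 12, 4, 1]
```

<p>For $$n = 3$$ this gives the orders $$6, 3, 1$$, matching $$K(S_3) = \{ e, (1 \; 2 \; 3), (3 \; 2 \; 1) \}$$ and a trivial $$K(K(S_3))$$. For $$n = 4$$ it gives $$24, 12, 4, 1$$: the chain passes through a group of order $$4$$ before reaching the trivial group.</p> <p>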
However, we can make things easier for ourselves by finding generators for $$K(G)$$ as commutators of generators of $$G$$, since then we can easily map those back to $$[\![ o_1, o_2 ]\!]$$ applied to the appropriate operations. Fortunately, when $$G = S_n$$, we can use a few facts from group theory to easily compute $$K(S_n)$$. First, $$K(S_n)$$ is the <a href="https://en.wikipedia.org/wiki/Alternating_group">alternating group</a> $$A_n$$, and for $$n \ge 3$$ it is generated by the $$3$$-cycles of the form $$(i \enspace i+1 \enspace i+2)$$, similar to how $$S_n$$ is generated by the $$2$$-cycles of the form $$(i \enspace i + 1)$$. But a $$3$$-cycle $$(i \enspace i+1 \enspace i+2)$$ can be expressed as the commutator of two $$2$$-cycles $$[(i+2 \enspace i+1), (i \enspace i+1)]$$.</p> <p>Therefore, for $$S_4$$, the generators for $$K(S_4)$$ are just $$(1 \; 2 \; 3) = [(2 \; 3), (1 \; 2)]$$ and $$(2 \; 3 \; 4) = [(3 \; 4), (2 \; 3)]$$, with respective operations $$[\![ ↺_{2, 3}, ↺_{1, 2} ]\!]$$ and $$[\![ ↺_{3, 4}, ↺_{2, 3} ]\!]$$. However, the commutators of these two generators alone are not quite enough to generate $$K^{(2)}(S_4) = K(K(S_4))$$. Fortunately, it suffices to just add $$↺_{4, 1}$$ to the list of operations, which lets us add $$(1 \; 4)$$ to the list of generators for $$S_4$$, and then add $$(3 \; 4 \; 1)$$ to the list of generators for $$K(S_4)$$. Then $$(1 \; 4) (2 \; 3) = [(2 \; 3 \; 4), (1 \; 2 \; 3)]$$ and $$(2 \; 1) (3 \; 4) = [(3 \; 4 \; 1), (2 \; 3 \; 4)]$$ suffice to generate $$K^{(2)}(S_4)$$.<sup><a href="#fn5" id="r5">5</a></sup> Finally, since $$K^{(2)}(S_4) = \{ e, (1 \; 2) (3 \; 4), (1 \; 3) (2 \; 4), (1 \; 4) (2 \; 3) \}$$ is commutative, $$K^{(3)}(S_4)$$ is the trivial group.</p> <p>What does that tell us about what expressions we can rule out as solutions to the general quartic equation?
Similarly to the cubic case, we expect to be able to rule out rational expressions and algebraic expressions with no nested radicals, and since $$K^{(2)}(S_4)$$ is not the trivial group, we also expect to be able to rule out algebraic expressions with singly-nested radicals, like $$\sqrt{a - \sqrt{bc - 4}}$$. But since $$K^{(3)}(S_4)$$ is the trivial group, we don&rsquo;t expect to be able to rule out algebraic expressions with doubly-nested radicals, like $$\sqrt{a - \sqrt{bc - \sqrt{d + 3}}}$$.</p> <p>As an antidote to all the abstractness above, here is a visualization for quartics, where you can examine how the various operations interact with the <a href="https://en.wikipedia.org/wiki/Quartic_function#General_formula_for_roots">quartic formula</a> and its subexpressions.</p> <div class="interactive-example"> <h3>Interactive Example 5: The quartic equation</h3> <div class="graph-container"> Roots <nokatex><div id="rootBoardQuartic" class="graph jxgbox"></div></nokatex> <span id="rootListQuartic"> $$R = \langle r_1, r_2, r_3, r_4 \rangle$$ </span> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.resetRootAndResultList();"> Reset $$R$$ and $$X$$ order </button> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opA1, 1000);"> $$A_1 = ↺_{1, 2}$$ </button> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opA2, 1000);"> $$A_2 = ↺_{2, 3}$$ </button> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opA3, 1000);"> $$A_3 = ↺_{3, 4}$$ </button> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opA4, 1000);"> $$A_4 = ↺_{4, 1}$$ </button> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" 
onclick="quartic.runOp(quartic.opA1.invert(), 1000);"> $$A_1^\prime = ↻_{1, 2}$$ </button> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opA2.invert(), 1000);"> $$A_2^\prime = ↻_{2, 3}$$ </button> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opA3.invert(), 1000);"> $$A_3^\prime = ↻_{3, 4}$$ </button> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opA4.invert(), 1000);"> $$A_4^\prime = ↻_{4, 1}$$ </button> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opB1, 4000);"> $$B_1 = [\![ A_2, A_1 ]\!] \mapsto (1 \; 2 \; 3)$$ </button> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opB2, 4000);"> $$B_2 = [\![ A_3, A_2 ]\!] \mapsto (2 \; 3 \; 4)$$ </button> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opB3, 4000);"> $$B_3 = [\![ A_4, A_3 ]\!] \mapsto (3 \; 4 \; 1)$$ </button> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opB1.invert(), 4000);"> $$B_1^\prime$$ </button> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opB2.invert(), 4000);"> $$B_2^\prime$$ </button> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opB3.invert(), 4000);"> $$B_3^\prime$$ </button> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opC1, 16000);"> $$C_1 = [\![ B_2, B_1 ]\!] 
\mapsto (1 \; 4) (2 \; 3)$$ </button> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opC2, 16000);"> $$C_2 = [\![ B_3, B_2 ]\!] \mapsto (2 \; 1) (3 \; 4)$$ </button> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opC1.invert(), 16000);"> $$C_1^\prime$$ </button> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.runOp(quartic.opC2.invert(), 16000);"> $$C_2^\prime$$ </button> </div> <div class="graph-container"> Coefficients <nokatex><div id="coeffBoardQuartic" class="graph jxgbox"></div></nokatex> </div> <div class="graph-container"> Candidate solution <nokatex><div id="formulaBoardQuartic" class="graph jxgbox"></div></nokatex> <span id="resultListQuartic"> $$X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle$$ </span> <span id="resultNoteQuartic"></span> <br /> <button class="interactive-example-button quarticDisableWhileRunningOp" type="button" onclick="quartic.findFirstOpRulingOutSelectedFormula();"> Find first operation that rules out selected formula </button> <span id="findFirstOpStatusQuartic"></span> <br /> <label> <input class="quarticDisableWhileRunningOp" name="formulaQuartic" type="radio" onchange="quartic.switchFormula(quarticScaledDiscFormula);" /> $$x_1 = -27 Δ$$ </label> <br /> <label> <input class="quarticDisableWhileRunningOp" name="formulaQuartic" type="radio" onchange="quartic.switchFormula(newQuarticQCubedFormula());" /> $$x_{1, 2} = Q^3 = \frac{Δ_1 + \sqrt{-27 Δ}}{2}$$ </label> <br /> <label> <input checked class="quarticDisableWhileRunningOp" name="formulaQuartic" type="radio" onchange="quartic.switchFormula(newQuarticQCubedFormula().root(3));" /> $$x_{1, 2, 3, 4, 5, 6} = Q$$ </label> <br /> <label> <input class="quarticDisableWhileRunningOp" name="formulaQuartic" type="radio" onchange="quartic.switchFormula(newQuarticSFormula());" /> $$x_{1, 
2, 3, 4, 5, 6} = S =$$ <br /> $$\qquad \frac{1}{2} \sqrt{-\frac{2}{3} p + \frac{1}{3a} \left( Q + \frac{Δ_0}{Q} \right)}$$ </label> <br /> <label> <input class="quarticDisableWhileRunningOp" name="formulaQuartic" type="radio" onchange="quartic.switchFormula(newQuarticFormula());" /> $$x_{1, 2, 3, 4} =$$ <br /> $$\qquad -\frac{b}{4a} \mp S + \frac{1}{2} \sqrt{-4S^2 - 2p \pm \frac{q}{S}}$$ <br /> (the quartic formula) </label> </div> </div> <script type="text/javascript"> 'use strict'; function isIdentityPermutation(permutation) { for (var i = 0; i < permutation.length; ++i) { if (permutation[i] != i) { return false; } } return true; } function updateResultNote(display, resultNoteID, formulaName) { var rootPermutation = display.getRootPermutation(); var resultPermutation = display.getResultPermutation(); var resultNote = document.getElementById(resultNoteID); if (isIdentityPermutation(rootPermutation) == isIdentityPermutation(resultPermutation)) { resultNote.innerHTML = ''; } else { resultNote.innerHTML = '(Applied operation rules out selected formula as the ' + formulaName + ' formula.)'; } } function checkOpRulesOutFormula( display, resetFn, runOpFn, op, time, undoCallback, doneCallback) { resetFn(); runOpFn(op, time, function() { var rootPermutation = display.getRootPermutation(); var resultPermutation = display.getResultPermutation(); var rulesOut = (isIdentityPermutation(rootPermutation) != isIdentityPermutation(resultPermutation)); undoCallback(); runOpFn(op.invert(), time, function() { doneCallback(rulesOut); }); }); } function findFirstOpRulingOutSelectedFormulaHelper( display, resetFn, runOpFn, opInfos, statusCallback, doneCallback) { var i = 0; var undoCallback = function() { statusCallback(opInfos[i], true); }; var _doneCallback = function(rulesOut) { if (rulesOut) { doneCallback(opInfos[i]); return; } ++i; if (i >= opInfos.length) { doneCallback(null); return; } statusCallback(opInfos[i], false); checkOpRulesOutFormula( display, resetFn, runOpFn,
opInfos[i].op, opInfos[i].time, undoCallback, _doneCallback); }; statusCallback(opInfos[0], false); checkOpRulesOutFormula( display, resetFn, runOpFn, opInfos[0].op, opInfos[0].time, undoCallback, _doneCallback); } function findFirstOpRulingOutSelectedFormula( display, resetFn, runOpFn, opInfos, statusID) { var status = document.getElementById(statusID); var statusCallback = function(opInfo, isUndo) { if (isUndo) { status.innerHTML = 'Undoing ' + opInfo.name + '...'; } else { status.innerHTML = 'Trying ' + opInfo.name + '...'; } }; var doneCallback = function(opInfo) { if (opInfo === null) { status.innerHTML = 'No op ruling out selected formula found'; } else { status.innerHTML = opInfo.name + ' rules out selected formula'; } }; findFirstOpRulingOutSelectedFormulaHelper( display, resetFn, runOpFn, opInfos, statusCallback, doneCallback); } var quartic = (function() { var initialRoots = [ new Complex(0, 1), new Complex(-0.5, -0.5), new Complex(0.5, 0.5), new Complex(0.5, -0.5) ]; var display = new Display( "rootBoardQuartic", "coeffBoardQuartic", "formulaBoardQuartic", initialRoots, newQuarticQCubedFormula().root(3), function() {}); display._resultRotationCounterPoint.setAttribute({visible: false}); function updateRootAndResultList(display) { updateRootList(display, "rootListQuartic"); updateResultList(display, "resultListQuartic"); updateResultNote(display, "resultNoteQuartic", "quartic"); } var state = {}; function runQuarticOp(op, time, doneCallback) { runOp(display, op, time, '.quarticDisableWhileRunningOp', state, doneCallback); }; function switchQuarticFormula(formula) { switchFormula(display, state, formula); updateRootAndResultList(display); } function resetRootAndResultList() { display.reorderPointsBySubscript(); display.resetResultRotationCounters(); updateRootAndResultList(display); } var opA1 = display.swapRootOp(0, 1, updateRootAndResultList); var opA2 = display.swapRootOp(1, 2, updateRootAndResultList); var opA3 = display.swapRootOp(2, 3, updateRootAndResultList); var
opA4 = display.swapRootOp(3, 0, updateRootAndResultList); var opB1 = newCommutatorAnimation(opA2, opA1); var opB2 = newCommutatorAnimation(opA3, opA2); var opB3 = newCommutatorAnimation(opA4, opA3); var opC1 = newCommutatorAnimation(opB2, opB1); var opC2 = newCommutatorAnimation(opB3, opB2); var opInfos = [ { name: 'A<sub>1</sub>', op: opA1, time: 1000 }, { name: 'A<sub>2</sub>', op: opA2, time: 1000 }, { name: 'A<sub>3</sub>', op: opA3, time: 1000 }, { name: 'A<sub>4</sub>', op: opA4, time: 1000 }, { name: 'B<sub>1</sub>', op: opB1, time: 4000 }, { name: 'B<sub>2</sub>', op: opB2, time: 4000 }, { name: 'B<sub>3</sub>', op: opB3, time: 4000 }, { name: 'C<sub>1</sub>', op: opC1, time: 16000 }, { name: 'C<sub>2</sub>', op: opC2, time: 16000 } ]; function findFirstOpRulingOutSelectedFormulaQuartic() { findFirstOpRulingOutSelectedFormula( display, resetRootAndResultList, runQuarticOp, opInfos, 'findFirstOpStatusQuartic'); } return { display: display, opA1: opA1, opA2: opA2, opA3: opA3, opA4: opA4, opB1: opB1, opB2: opB2, opB3: opB3, opC1: opC1, opC2: opC2, runOp: runQuarticOp, resetRootAndResultList: resetRootAndResultList, switchFormula: switchQuarticFormula, findFirstOpRulingOutSelectedFormula: findFirstOpRulingOutSelectedFormulaQuartic }; })(); </script> <p>There are a few additions to the interactive display above. First, it now prints a message when it detects that the selected expression is ruled out as the quartic formula; the check simply looks at whether exactly one of $$R$$ and $$X$$ is in its original order. There&rsquo;s also a button to reset the ordering of $$R$$ and $$X$$.</p> <p>The second addition is that the operations have been organized to make clear which commutator subgroup they map into. The $$A_i$$ map to generators of $$S_4$$.
Then taking the commutators of adjacent $$A_i$$ gives the $$B_i$$, which map to generators of $$K(S_4)$$, and taking the commutators of adjacent $$B_i$$ gives the $$C_i$$, which map to generators of $$K^{(2)}(S_4)$$.</p> <div class="p">The third addition is a button that finds the first operation that rules out the selected formula, if any. It simply tries all the $$A_i$$s, then all the $$B_i$$s, then all the $$C_i$$s, checking $$R$$ and $$X$$ in between. The general algorithm, which assumes a fixed set of roots $$r_1, \dotsc, r_n\text{,}$$ takes an expression $$f(a_n, a_{n-1}, \dotsc)$$, where $$a_n x^n + a_{n-1} x^{n-1} + \dotsb + a_0 = 0$$ is the general $$n$$th-degree polynomial equation, and a depth limit $$k$$, and looks like this (defining $$K^{(0)}(G)$$ to be just $$G$$, and $$K^{(i)}(G)$$ to be $$K(K^{(i-1)}(G))$$ for $$i > 0$$): <ol> <li>For $$i$$ from 0 to $$k$$: <ol> <li>If $$K^{(i)}(S_n)$$ is trivial, then terminate, indicating that $$f(a_n, a_{n-1}, \dotsc)$$ could not be ruled out because $$K^{(i)}(S_n)$$ is trivial.</li> <li>Otherwise, find operations $$o_1$$ to $$o_m$$ that act as the generators $$g_1$$ to $$g_m$$ of $$K^{(i)}(S_n)$$. For $$i > 0$$, this can be done by applying $$[\![ \cdot, \cdot ]\!]$$ to pairs of the operations corresponding to the generators of $$K^{(i-1)}(S_n)$$ (possibly after adding extra generators, as we did with $$↺_{4, 1}$$ for the quartic).</li> <li>For each $$o_j$$: <ol> <li>Apply $$o_j$$.</li> <li>If $$R$$ is not in order but $$X$$ is, terminate, indicating that $$o_j$$ rules out $$f(a_n, a_{n-1}, \dotsc)$$.</li> <li>Undo $$o_j$$, i.e.
apply $$o_j^\prime$$ or reset to the initial state of $$r_1, \dotsc, r_n$$.</li> </ol></li> </ol></li> <li>Terminate, indicating that $$f(a_n, a_{n-1}, \dotsc)$$ could not be ruled out because the depth limit has been reached.</li> </ol> </div> <p>This algorithm essentially implements the proof of the following lemma, which generalizes the previous theorems, except that it searches for the simplest generator operation that rules out the given expression.</p> <p>Before we state the lemma, we need another definition: let the <em>radical level</em> of an algebraic expression $$f(a_n, a_{n-1}, \dotsc)$$ be $$0$$ if $$f(a_n, a_{n-1}, \dotsc)$$ is a rational expression, $$1$$ if $$f(a_n, a_{n-1}, \dotsc)$$ has only non-nested radicals, and, more generally, $$m + 1$$ if its most deeply nested radical sits $$m$$ levels inside other radicals (so an expression with singly-nested radicals, like $$\sqrt{a - \sqrt{bc - 4}}$$, has radical level $$2$$).</p> <div class="theorem">(<span class="theorem-name">Lemma 3</span>.) If the algebraic expression $$f(a_n, a_{n-1}, \dotsc)$$ has radical level $$d$$ and $$K^{(d)}(S_n)$$ is non-trivial, then any operation that maps to a non-trivial element $$g$$ in $$K^{(d)}(S_n)$$ rules out $$f(a_n, a_{n-1}, \dotsc)$$ as a solution to the general $$n$$th-degree polynomial equation $a_n x^n + a_{n-1} x^{n-1} + \dotsb + a_0 = 0\text{.}$</div> <div class="proof"> <p><span class="proof-name">Rough sketch of proof.</span> We just do induction on $$d$$. For the base case $$d = 0$$, if $$K^{(0)}(S_n)$$ is non-trivial, then $$n \ge 2$$. Let $$g = (i \; j)$$ for any $$i \ne j$$, of which there must be at least one pair. Then by the same reasoning as Theorem&nbsp;1, $$g$$ rules out $$f(a_n, a_{n-1}, \dotsc)$$. Since the $$(i \; j)$$ generate $$S_n$$, any non-trivial $$g \in S_n$$ is the composition of some sequence of $$(i \; j)$$s, each of which rules out $$f(a_n, a_{n-1}, \dotsc)$$, so $$g$$ must also rule it out.</p> <p>Assume the lemma holds for $$d$$, and let $$x = f_{d+1}(a_n, a_{n-1}, \dotsc) = \sqrt[k]{f_d(a_n, a_{n-1}, \dotsc)}$$ for some $$k$$, where $$f_d$$ has radical level $$d$$.
Let $$o$$ act on $$R$$ like any non-trivial element $$g$$ of $$K^{(d+1)}(S_n)$$. By the induction hypothesis, all elements $$h_i \in K^{(d)}(S_n)$$ cause $$x = f_d(a_n, a_{n-1}, \dotsc)$$ to go around a loop, so pick a point $$z_0$$ that is never equal to any point $$x$$ traverses under any operation corresponding to $$h_i$$. Then, since $$g$$ is a product of commutators $$[h, h^\prime]$$ with $$h, h^\prime \in K^{(d)}(S_n)$$, by the same reasoning as in Theorem&nbsp;2, the total angle induced by $$o$$ on $$x = f_{d+1}(a_n, a_{n-1}, \dotsc)$$ around $$z_0$$ is $$0$$, and the distance from $$z_0$$ remains unchanged. Thus, $$x = f_{d+1}(a_n, a_{n-1}, \dotsc)$$ remains fixed, and $$o$$ rules it out.</p> <p>By the same reasoning as in Theorem&nbsp;2, this can be extended to the general case of $$f(a_n, a_{n-1}, \dotsc)$$ being any algebraic expression with radical level $$d + 1$$. &#x220e;</p> </div> <div>We can immediately deduce the following corollary, using the fact that $$K^{(2)}(S_4)$$ is non-trivial: <div class="theorem">(<span class="theorem-name">Corollary 4</span>.) An algebraic expression with at most singly-nested radicals in the coefficients of the general quartic equation $ax^4 + bx^3 + cx^2 + dx + e = 0$ cannot be a solution to this equation.<sup><a href="#fn6" id="r6">6</a></sup></div> </div> </section> <section> <header> <h2>5. Quintic Equations</h2> </header> <p>Now, finally, the quintic.
Let&rsquo;s jump right to the interactive example.</p> <div class="interactive-example"> <h3>Interactive Example 6: The quintic equation</h3> <div class="graph-container"> Roots <nokatex><div id="rootBoardQuintic" class="graph jxgbox"></div></nokatex> <span id="rootListQuintic"> $$R = \langle r_1, r_2, r_3, r_4, r_5 \rangle$$ </span> <br /> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.resetRootAndResultList();"> Reset $$R$$ and $$X$$ order </button> <br /> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opA1, 1000);"> $$A_1 = ↺_{1, 2}$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opA2, 1000);"> $$A_2 = ↺_{2, 3}$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opA3, 1000);"> $$A_3 = ↺_{3, 4}$$ </button> <br /> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opA4, 1000);"> $$A_4 = ↺_{4, 5}$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opA5, 1000);"> $$A_5 = ↺_{5, 1}$$ </button> <br /> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opA1.invert(), 1000);"> $$A_1^\prime = ↻_{1, 2}$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opA2.invert(), 1000);"> $$A_2^\prime = ↻_{2, 3}$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opA3.invert(), 1000);"> $$A_3^\prime = ↻_{3, 4}$$ </button> <br /> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" 
onclick="quintic.runOp(quintic.opA4.invert(), 1000);"> $$A_4^\prime = ↻_{4, 5}$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opA5.invert(), 1000);"> $$A_5^\prime = ↻_{5, 1}$$ </button> <br /> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opB1, 4000);"> $$B_1 = [\![ A_2, A_1 ]\!] \mapsto (1 \; 2 \; 3)$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opB2, 4000);"> $$B_2 = [\![ A_3, A_2 ]\!] \mapsto (2 \; 3 \; 4)$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opB3, 4000);"> $$B_3 = [\![ A_4, A_3 ]\!] \mapsto (3 \; 4 \; 5)$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opB4, 4000);"> $$B_4 = [\![ A_5, A_4 ]\!] \mapsto (4 \; 5 \; 1)$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opB5, 4000);"> $$B_5 = [\![ A_1, A_5 ]\!] 
\mapsto (5 \; 1 \; 2)$$ </button> <br /> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opB1.invert(), 4000);"> $$B_1^\prime$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opB2.invert(), 4000);"> $$B_2^\prime$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opB3.invert(), 4000);"> $$B_3^\prime$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opB4.invert(), 4000);"> $$B_4^\prime$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opB5.invert(), 4000);"> $$B_5^\prime$$ </button> <br /> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opC1, 16000);"> $$C_1 = [\![ B_3, B_1 ]\!] \mapsto (2 \; 3 \; 5)$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opC2, 16000);"> $$C_2 = [\![ B_4, B_2 ]\!] \mapsto (3 \; 4 \; 1)$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opC3, 16000);"> $$C_3 = [\![ B_5, B_3 ]\!] \mapsto (4 \; 5 \; 2)$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opC4, 16000);"> $$C_4 = [\![ B_1, B_4 ]\!] \mapsto (5 \; 1 \; 3)$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opC5, 16000);"> $$C_5 = [\![ B_2, B_5 ]\!] 
\mapsto (1 \; 2 \; 4)$$ </button> <br /> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opC1.invert(), 16000);"> $$C_1^\prime$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opC2.invert(), 16000);"> $$C_2^\prime$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opC3.invert(), 16000);"> $$C_3^\prime$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opC4.invert(), 16000);"> $$C_4^\prime$$ </button> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.runOp(quintic.opC5.invert(), 16000);"> $$C_5^\prime$$ </button> </div> <div class="graph-container"> Coefficients <nokatex><div id="coeffBoardQuintic" class="graph jxgbox"></div></nokatex> </div> <div class="graph-container"> Candidate solution <nokatex><div id="formulaBoardQuintic" class="graph jxgbox"></div></nokatex> <span id="resultListQuintic"> $$X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle$$ </span> <span id="resultNoteQuintic"></span> <br /> <button class="interactive-example-button quinticDisableWhileRunningOp" type="button" onclick="quintic.findFirstOpRulingOutSelectedFormula();"> Find first operation that rules out selected formula </button> <span id="findFirstOpStatusQuintic"></span> <br /> <label> <input class="interactive-example-button quinticDisableWhileRunningOp" name="formulaQuintic" type="radio" onchange="quintic.switchFormula(quintic.fA);" /> $$x_1 = f_A = Δ$$ </label> <br /> <label> <input class="interactive-example-button quinticDisableWhileRunningOp" name="formulaQuintic" type="radio" onchange="quintic.switchFormula(quintic.newFB());" /> $$x_{1, 2} = f_B = \sqrt{f_A}$$ </label> <br /> <label> <input checked class="interactive-example-button 
quinticDisableWhileRunningOp" name="formulaQuintic" type="radio" onchange="quintic.switchFormula(quintic.newFC());" /> $$x_{1, 2, 3, 4, 5, 6} = f_C =$$ <br /> $$\qquad \sqrt{(f_B - 0.8)(f_B - 0.75)}$$ </label> </div> </div> <script type="text/javascript"> 'use strict'; var quintic = (function() { var initialRoots = [ new Complex(0, 1), new Complex(-0.5, -0.5), new Complex(0.5, -0.5), new Complex(1, 0), new Complex(0.5, 0.5) ]; var display = new Display( "rootBoardQuintic", "coeffBoardQuintic", "formulaBoardQuintic", initialRoots, newFC(), function() {}); display._resultRotationCounterPoint.setAttribute({visible: false}); for (var i = 0; i < display._rootPointsBySubscript.length; ++i) { display._rootPointsBySubscript[i].setAttribute({ fixed: true }); } function updateRootAndResultList(display) { updateRootList(display, "rootListQuintic"); updateResultList(display, "resultListQuintic"); updateResultNote(display, "resultNoteQuintic", "quintic"); } var state = {}; function runQuinticOp(op, time, doneCallback) { runOp(display, op, time, '.quinticDisableWhileRunningOp', state, doneCallback); }; function switchQuinticFormula(formula) { switchFormula(display, state, formula); updateRootAndResultList(display); } function resetRootAndResultList() { display.reorderPointsBySubscript(); display.resetResultRotationCounters(); updateRootAndResultList(display); } var opA1 = display.swapRootOp(0, 1, updateRootAndResultList); var opA2 = display.swapRootOp(1, 2, updateRootAndResultList); var opA3 = display.swapRootOp(2, 3, updateRootAndResultList); var opA4 = display.swapRootOp(3, 4, updateRootAndResultList); var opA5 = display.swapRootOp(4, 0, updateRootAndResultList); var opA1Inv = opA1.invert(); var opA2Inv = opA2.invert(); var opA3Inv = opA3.invert(); var opA4Inv = opA4.invert(); var opA5Inv = opA5.invert(); var opB1 = newCommutatorAnimation(opA2, opA1); var opB2 = newCommutatorAnimation(opA3, opA2); var opB3 = newCommutatorAnimation(opA4, opA3); var opB4 = 
newCommutatorAnimation(opA5, opA4); var opB5 = newCommutatorAnimation(opA1, opA5); var opB1Inv = opB1.invert(); var opB2Inv = opB2.invert(); var opB3Inv = opB3.invert(); var opB4Inv = opB4.invert(); var opB5Inv = opB5.invert(); var opC1 = newCommutatorAnimation(opB3, opB1); var opC2 = newCommutatorAnimation(opB4, opB2); var opC3 = newCommutatorAnimation(opB5, opB3); var opC4 = newCommutatorAnimation(opB1, opB4); var opC5 = newCommutatorAnimation(opB2, opB5); var opC1Inv = opC1.invert(); var opC2Inv = opC2.invert(); var opC3Inv = opC3.invert(); var opC4Inv = opC4.invert(); var opC5Inv = opC5.invert(); var opInfos = [ { name: 'A<sub>1</sub>', op: opA1, time: 1000 }, { name: 'A<sub>2</sub>', op: opA2, time: 1000 }, { name: 'A<sub>3</sub>', op: opA3, time: 1000 }, { name: 'A<sub>4</sub>', op: opA4, time: 1000 }, { name: 'A<sub>5</sub>', op: opA5, time: 1000 }, { name: 'B<sub>1</sub>', op: opB1, time: 4000 }, { name: 'B<sub>2</sub>', op: opB2, time: 4000 }, { name: 'B<sub>3</sub>', op: opB3, time: 4000 }, { name: 'B<sub>4</sub>', op: opB4, time: 4000 }, { name: 'B<sub>5</sub>', op: opB5, time: 4000 }, { name: 'C<sub>1</sub>', op: opC1, time: 16000 }, { name: 'C<sub>2</sub>', op: opC2, time: 16000 }, { name: 'C<sub>3</sub>', op: opC3, time: 16000 }, { name: 'C<sub>4</sub>', op: opC4, time: 16000 }, { name: 'C<sub>5</sub>', op: opC5, time: 16000 } ]; function findFirstOpRulingOutSelectedFormulaQuintic() { findFirstOpRulingOutSelectedFormula( display, resetRootAndResultList, runQuinticOp, opInfos, 'findFirstOpStatusQuintic'); } // Ruled out by A_i. var fA = quinticDiscFormula; // Ruled out by B_i. function newFB() { return quinticDiscFormula.root(2); } // Has a rotation number with B_1, B_2, B_4, and B_5. function newPreFC1() { return newFB().minusAll(0.8); } // Has a rotation number with B_3. function newPreFC2() { return newFB().minusAll(0.75); } // Has a rotation number with all B_i. 
function newPreFC3() { return ComplexFormula.times( newPreFC1(), newPreFC2() ); } // 2 evenly divides the rotation numbers with B_1, B_2, B_4, and B_5, so // this doesn't work for f_C. function newPreFC4() { return newPreFC3().root(2); } // Ruled out by C_i. function newFC() { return newPreFC3().root(3); } return { display: display, opA1: opA1, opA2: opA2, opA3: opA3, opA4: opA4, opA5: opA5, opB1: opB1, opB2: opB2, opB3: opB3, opB4: opB4, opB5: opB5, opC1: opC1, opC2: opC2, opC3: opC3, opC4: opC4, opC5: opC5, fA: fA, newFB: newFB, newFC: newFC, runOp: runQuinticOp, resetRootAndResultList: resetRootAndResultList, switchFormula: switchQuinticFormula, findFirstOpRulingOutSelectedFormula: findFirstOpRulingOutSelectedFormulaQuintic }; })(); </script> <p>As in the interactive example for the quartic, the operations are organized to make clear which commutator subgroup they&rsquo;re in. There&rsquo;s something interesting, though: the $$C_i$$ seem very similar to the $$B_i$$. In fact, like the $$B_i$$, the $$C_i$$ act on $$R$$ as $$3$$-cycles, and together they generate all of $$A_5$$! Also, if you compute $$D_i = [\![ C_{i+1}, C_i ]\!]$$ (with indices taken mod $$5$$, so that $$C_6 = C_1$$), you will find that $$D_i$$ acts exactly like $$B_i$$ on $$R$$!</p> <div class="p">Why can we do this for the quintic, but not for anything of lower degree? This is because $$A_5$$ is <a href="https://en.wikipedia.org/wiki/Perfect_group">perfect</a>, which means that it equals its own commutator subgroup. (You can verify this yourself by brute force, e.g., by writing a program, or you can play around with $$3$$-cycles and see that any $$3$$-cycle is the commutator of two other $$3$$-cycles.) This immediately implies that $$K^{(d)}(S_5) = A_5$$ for every $$d \ge 1$$, so $$K^{(d)}(S_5)$$ is non-trivial for any $$d$$, which then implies our main result: <div class="theorem">(<span class="theorem-name">Abel-Ruffini theorem</span>.) 
An algebraic expression in the coefficients of the general $$n$$th-degree polynomial equation $a_n x^n + a_{n-1} x^{n-1} + \dotsb + a_0 = 0$ for $$n \ge 5$$ cannot be a solution to this equation.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> By the above, $$A_5$$ is perfect, so $$K^{(d)}(S_5)$$ is non-trivial for all $$d$$.</p> <p>Since $$S_5$$ is a subgroup of $$S_n$$ for $$n \ge 5$$, $$A_5 = K(S_5)$$ is also a subgroup of $$A_n = K(S_n)$$. The commutator subgroup of a subgroup is contained in the commutator subgroup of the whole group, so applying this $$d$$ times and using the fact that $$A_5$$ is perfect, $$A_5 = K^{(d)}(A_5)$$ is a subgroup of $$K^{(d)}(A_n) = K^{(d+1)}(S_n)$$ for any $$d$$. Together with $$K^{(0)}(S_n) = S_n$$, this means that $$K^{(d)}(S_n)$$ is non-trivial for any $$d$$ and $$n \ge 5$$.</p> <p>An algebraic expression has some finite radical level $$d$$, but $$K^{(d)}(S_n)$$ is non-trivial for any $$d$$ and $$n \ge 5$$, so by Lemma&nbsp;3 no algebraic expression can be a solution to the general $$n$$th-degree polynomial equation for $$n \ge 5$$. &#x220e;</p> </div> </div> <p>With the theorem above, we now have a succinct answer to the question at the beginning of this article. You can&rsquo;t write down a solution to the general quadratic equation that is a rational expression because you can find an operation on the roots that will permute them non-trivially and yet leave the result of the expression constant. For the same reason, you can&rsquo;t write down a solution to the general $$n$$th-degree polynomial equation, for $$n \ge 5$$, that is an algebraic expression!</p> <p>Finally, as a bonus, I&rsquo;ll explain how to generate algebraic expressions that require a &ldquo;$$d$$th-level&rdquo; operator, meaning an operator that maps to an element of $$K^{(d)}(S_n)$$, assuming it&rsquo;s non-trivial. This shows that there&rsquo;s no single &ldquo;super-operation&rdquo; that rules out all algebraic expressions.</p> <div class="p">As an example, the formulas in the interactive example above are chosen so that $$f_A$$ is ruled out by the $$A_i$$, $$f_B$$ is ruled out by the $$B_i$$, etc.
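<p>The operations named above act on $$R$$ as the permutations shown on the buttons of the interactive example, and the permutation claims in this section (that the $$C_i$$ act as $$3$$-cycles, that each $$D_i$$ acts like $$B_i$$, and that $$A_5$$ is perfect) can be checked by brute force. Here is a minimal JavaScript sketch of such a check; it is not part of this page&rsquo;s code, and its helper names are hypothetical. A permutation of $$\{1, \dotsc, n\}$$ is stored as an array whose entry $$i$$ is the image of $$i$$, products compose right to left, and the commutator convention $$[g, h] = g^{-1} h^{-1} g h$$ is an assumption, chosen because it reproduces the cycles listed on the buttons.</p>

```javascript
// Brute-force checks of the permutation claims about A_i, B_i, C_i, D_i and A_5.
// A permutation of {1, ..., n} is an array p of length n + 1 where p[i] is the
// image of i (index 0 is unused and always maps to 0).
const identity = (n) => Array.from({ length: n + 1 }, (_, i) => i);
const key = (p) => p.join(",");
const equal = (g, h) => key(g) === key(h);

// cycle(n, 1, 2, 3) maps 1 -> 2 -> 3 -> 1 and fixes everything else.
function cycle(n, ...elems) {
  const p = identity(n);
  elems.forEach((e, k) => {
    p[e] = elems[(k + 1) % elems.length];
  });
  return p;
}

// Right-to-left product: (g h)(i) = g(h(i)), i.e. h is applied first.
const compose = (g, h) => h.map((image) => g[image]);

function inverse(g) {
  const inv = identity(g.length - 1);
  g.forEach((image, i) => {
    inv[image] = i;
  });
  return inv;
}

// Assumed convention: [g, h] = g^-1 h^-1 g h.
const commutator = (g, h) =>
  compose(inverse(g), compose(inverse(h), compose(g, h)));

// Closes gens under composition; for a finite group this yields the subgroup
// they generate.
function generate(n, gens) {
  const e = identity(n);
  const seen = new Map([[key(e), e]]);
  const stack = [e];
  while (stack.length > 0) {
    const g = stack.pop();
    for (const s of gens) {
      const gs = compose(g, s);
      if (!seen.has(key(gs))) {
        seen.set(key(gs), gs);
        stack.push(gs);
      }
    }
  }
  return [...seen.values()];
}

// K(G): the subgroup generated by all commutators of pairs of elements of G.
function commutatorSubgroup(n, G) {
  const comms = new Map();
  for (const g of G) {
    for (const h of G) {
      const c = commutator(g, h);
      comms.set(key(c), c);
    }
  }
  return generate(n, [...comms.values()]);
}

// Sizes of S_n, K(S_n), ..., K^(depth)(S_n); S_n is generated by adjacent swaps.
function derivedSeriesSizes(n, depth) {
  const swaps = [];
  for (let i = 1; i < n; ++i) swaps.push(cycle(n, i, i + 1));
  let G = generate(n, swaps);
  const sizes = [G.length];
  for (let d = 1; d <= depth; ++d) {
    G = commutatorSubgroup(n, G);
    sizes.push(G.length);
  }
  return sizes;
}

// The quintic operations, with indices wrapping around (wrap(6) === 1).
const wrap = (i) => ((i - 1) % 5) + 1;
const A = [], B = [], C = [], D = [];
for (let i = 1; i <= 5; ++i) A[i] = cycle(5, i, wrap(i + 1)); // A_i = (i i+1)
for (let i = 1; i <= 5; ++i) B[i] = commutator(A[wrap(i + 1)], A[i]);
for (let i = 1; i <= 5; ++i) C[i] = commutator(B[wrap(i + 2)], B[i]);
for (let i = 1; i <= 5; ++i) D[i] = commutator(C[wrap(i + 1)], C[i]);

console.log(equal(B[1], cycle(5, 1, 2, 3))); // B_1 acts as (1 2 3)
console.log(equal(C[1], cycle(5, 2, 3, 5))); // C_1 acts as (2 3 5)
console.log([1, 2, 3, 4, 5].every((i) => equal(D[i], B[i]))); // D_i acts like B_i
console.log(derivedSeriesSizes(4, 3)); // sizes 24, 12, 4, 1
console.log(derivedSeriesSizes(5, 3)); // sizes 120, 60, 60, 60: A_5 is perfect
```

<p>The derived series of $$S_4$$ reaches the trivial group after three steps, whereas the one for $$S_5$$ gets stuck at the $$60$$ elements of $$A_5$$, which is exactly the property the proof above relies on.</p>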
These formulas depend on the particular roots chosen, of course, which is why this interactive example doesn&rsquo;t let you move the roots around, but in principle you could build formulas for any polynomial that are first ruled out by the $$C_i$$, or the $$D_i$$, or whatever you wish. Given a polynomial $$P = a_n x^n + a_{n-1} x^{n-1} + \dotsb + a_0$$ of degree $$n \ge 5$$ and a level $$d$$, a recursive algorithm to generate an expression that is ruled out only by a &ldquo;$$d$$th-level&rdquo; operator is: <ol> <li>If $$d = 0$$, return the discriminant $$Δ(a_n, a_{n-1}, \dotsc)$$.</li> <li>Otherwise, run this algorithm with $$P$$ and $$d-1$$ to get $$f_{d-1}(a_n, a_{n-1}, \dotsc)$$.</li> <li>Find operations $$o_1$$ to $$o_m$$ that correspond to generators $$g_1$$ to $$g_m$$ of $$K^{(d-1)}(S_n)$$.</li> <li>For each $$o_i$$: <ol> <li>Apply $$o_i$$, which makes $$x = f_{d-1}(a_n, a_{n-1}, \dotsc)$$ go around a loop. Record the looped-around regions and their associated rotation numbers (i.e., the total angle divided by $$2π$$).</li> </ol> </li> <li>Pick points $$z_1, \dotsc, z_t$$ such that each $$z_i$$ has a non-zero rotation number for at least one $$o_j$$. $$t$$ can be at most $$m$$.</li> <li>Let $$k$$ be the smallest integer $$k \ge 2$$ such that, for every $$o_i$$, $$k$$ doesn&rsquo;t divide the rotation number of $$\prod_j (f_{d-1}(a_n, a_{n-1}, \dotsc) - z_j)$$ around $$0$$ with respect to $$o_i$$ (that is, the sum of the rotation numbers around the $$z_j$$). Return $$f_d(a_n, a_{n-1}, \dotsc) = \sqrt[k]{\prod_j (f_{d-1}(a_n, a_{n-1}, \dotsc) - z_j)}$$. </li> </ol> </div> </section> <hr /> <p>Like this post? Subscribe to <!-- The image is 256x256, the center of the dot is 189 pixels from the top, and the radius of the dot is 24.
Therefore, the dot is 43/256 = 0.16796875 of the image height above the bottom.--> <a href="feed/atom">my feed <img src="feed-icon.svg" alt="RSS icon" style="width: 1em; height: 1em; vertical-align: -0.16796875em;" /></a> or follow me on <a href="https://twitter.com/fakalin">Twitter <img src="twitter-icon.svg" alt="Twitter icon" style="width: 1em; height: 1em;" /></a>.</p> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> This proof is originally due to <a href="https://en.wikipedia.org/wiki/Vladimir_Arnold">Arnold</a>. There are a <a href="https://www.youtube.com/watch?v=RhpVSV6iCko">couple</a> of <a href="http://drorbn.net/dbnvp/AKT-140314.php">videos</a> that discuss this proof, as well as <a href="http://link.springer.com/book/10.1007%2F1-4020-2187-9">this book</a> based on Arnold&rsquo;s lectures, and <a href="https://www.tmna.ncu.pl/static/files/v16n2-02.pdf">this paper</a>. I mostly follow Boaz&rsquo;s video, and the interactive visualizations are based on the ones in his video.</p> <p>The interactive visualizations were generated using the excellent <a href="http://jsxgraph.uni-bayreuth.de/wp/index.html">JSXGraph</a> library. <a href="#r1">↩</a></p> <p id="fn2"> Theorem&nbsp;1 can be generalized even more! We can add other functions and operations to rational expressions, as long as those functions and operations are continuous and single-valued. For example, we can allow the use of exponentials and trigonometric functions, which is something that standard Galois theory cannot handle. <a href="#r2">↩</a></p> <p id="fn3"> More precisely, a $$↺_{i, j}$$ contains a pair of simple paths, i.e., continuous injective functions $$[0, 1] \to \mathbb{C}$$, between two distinct points of $$\mathbb{C}$$, such that their concatenation defines a simple closed curve around a region in $$\mathbb{C}$$ with a counter-clockwise orientation.
Also, depending on the exact method of formalizing $$↺_{i, j}$$, it either explicitly or implicitly encodes a permutation on $$R$$. Then we can define an operation $$*$$ on the $$↺_{i, j}$$ and $$↻_{i, j}$$ (defined analogously) which concatenates the paths (and composes the permutations, if explicitly encoded). Since the space of paths has neither inverses nor an identity, the $$↺_{i, j}$$ and $$↻_{i, j}$$ generate a <a href="https://en.wikipedia.org/wiki/Free_semigroup">free semigroup</a> with the operation $$*$$. Then this semigroup defines an <a href="https://en.wikipedia.org/wiki/Semigroup_action">action</a> on $$R$$ via its associated permutation on $$R$$, which then just generates $$S_n$$, since $$S_n$$ is generated by adjacent swaps.</p> <p>We make a distinction between the operation $$↺_{i, j}$$ and the permutation it induces on $$R$$, since the latter &ldquo;loses&rdquo; the orientation information, which is important to preserve when talking about the action of $$↺_{i, j}$$ on some $$x_i$$. <a href="#r3">↩</a></p> <p id="fn4"> Note that, depending on the text, the commutator may be defined slightly differently as $$g h g^{-1} h^{-1}$$. <a href="#r4">↩</a></p> <p id="fn5"> $$K(A_4)$$ is isomorphic to $$V$$, the <a href="https://en.wikipedia.org/wiki/Klein_four-group">Klein four-group</a>. <a href="#r5">↩</a></p> <p id="fn6"> In fact, the quartic formula has three nested radicals. I wonder why? <a href="#r6">↩</a></p> </section> https://www.akalin.com/computing-iroot Computing Integer Roots 2016-01-10T00:00:00-08:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved.
<script> KaTeXMacros = { "\\iroot": "\\operatorname{iroot}", "\\Bits": "\\operatorname{Bits}", "\\Err": "\\operatorname{Err}", "\\NewtonRoot": "\\mathrm{N{\\small EWTON}\\text{-}I{\\small ROOT}}", }; </script> <script src="https://cdn.rawgit.com/akalin/jsbn/v1.4/jsbn.js"></script> <script src="https://cdn.rawgit.com/akalin/jsbn/v1.4/jsbn2.js"></script> <section> <header> <h2>1. The algorithm</h2> </header> <p>Today I&rsquo;m going to talk about the generalization of the <a href="/computing-isqrt">integer square root algorithm</a> to higher roots. That is, given $$n$$ and $$p$$, we want to compute $$\iroot(n, p) = \lfloor \sqrt[p]{n} \rfloor$$, the greatest integer whose $$p$$th power is less than or equal to $$n$$. The generalized algorithm is straightforward, and it&rsquo;s easy to generalize the proof of correctness, but the run-time bound is a bit trickier, since it depends on $$p$$.</p> <div class="p">First, the algorithm, which we&rsquo;ll call $$\NewtonRoot$$: <ol> <li>If $$n = 0$$, return $$0$$.</li> <li>If $$p \ge \Bits(n)$$, return $$1$$.</li> <li>Otherwise, set $$i$$ to $$0$$ and set $$x_0$$ to $$2^{\lceil \Bits(n) / p\rceil}$$.</li> <li>Repeat: <ol> <li>Set $$x_{i+1}$$ to $$\lfloor ((p - 1) x_i + \lfloor n/x_i^{p-1} \rfloor) / p \rfloor$$.</li> <li>If $$x_{i+1} \ge x_i$$, return $$x_i$$. Otherwise, increment $$i$$.</li> </ol> </li> </ol> </div> <div class="p">and its implementation in JavaScript:<sup><a href="#fn1" id="r1"></a></sup> <script> // iroot returns the greatest number x such that x^p <= n. The type of // n must behave like BigInteger (e.g., // https://github.com/akalin/jsbn ), n must be non-negative, and // p must be a positive integer.
// // Example (open up the JS console on this page and type): // // iroot(new BigInteger("64"), 3).toString() function iroot(n, p) { var s = n.signum(); if (s < 0) { throw new Error('negative radicand'); } if (p <= 0) { throw new Error('non-positive degree'); } if (p !== (p|0)) { throw new Error('non-integral degree'); } if (s == 0) { return n; } var b = n.bitLength(); if (p >= b) { return n.constructor.ONE; } // x = 2^ceil(Bits(n)/p) var x = n.constructor.ONE.shiftLeft(Math.ceil(b/p)); var pMinusOne = new n.constructor((p - 1).toString()); var pBig = new n.constructor(p.toString()); while (true) { // y = floor(((p-1)x + floor(n/x^(p-1)))/p) var y = pMinusOne.multiply(x).add(n.divide(x.pow(pMinusOne))).divide(pBig); if (y.compareTo(x) >= 0) { return x; } x = y; } } </script> <pre class="code-container"><code class="language-javascript">// iroot returns the greatest number x such that x^p &lt;= n. The type of // n must behave like BigInteger (e.g., // https://github.com/akalin/jsbn ), n must be non-negative, and // p must be a positive integer. 
// // Example (open up the JS console on this page and type): // // iroot(new BigInteger(&quot;64&quot;), 3).toString() function iroot(n, p) { var s = n.signum(); if (s &lt; 0) { throw new Error(&#39;negative radicand&#39;); } if (p &lt;= 0) { throw new Error(&#39;non-positive degree&#39;); } if (p !== (p|0)) { throw new Error(&#39;non-integral degree&#39;); } if (s == 0) { return n; } var b = n.bitLength(); if (p &gt;= b) { return n.constructor.ONE; } // x = 2^ceil(Bits(n)/p) var x = n.constructor.ONE.shiftLeft(Math.ceil(b/p)); var pMinusOne = new n.constructor((p - 1).toString()); var pBig = new n.constructor(p.toString()); while (true) { // y = floor(((p-1)x + floor(n/x^(p-1)))/p) var y = pMinusOne.multiply(x).add(n.divide(x.pow(pMinusOne))).divide(pBig); if (y.compareTo(x) &gt;= 0) { return x; } x = y; } }</code></pre> </div> <p>This algorithm turns out to require $$Θ(p) + O(\lg \lg n)$$ loop iterations, with the run-time for a loop iteration depending on what kind of arithmetic operations are used.</p> </section> <section> <header> <h2>2. Correctness</h2> </header> <p>Again we look at the iteration rule: $x_{i+1} = \left\lfloor \frac{(p - 1) x_i + \left\lfloor \frac{n}{x_i^{p-1}} \right\rfloor}{p} \right\rfloor$ Letting $$f(x)$$ be the right-hand side, we can again use basic properties of the floor function to remove the inner floor: $f(x) = \left\lfloor \frac{1}{p} ((p-1) x + n/x^{p-1}) \right\rfloor$ Letting $$g(x)$$ be its real-valued equivalent: $g(x) = \frac{1}{p} ((p-1) x + n/x^{p-1})$ we can, again using basic properties of the floor function, show that $$f(x) \le g(x)$$, and for any integer $$m$$, $$m \le f(x)$$ if and only if $$m \le g(x)$$.</p> <p>Finally, let&rsquo;s give a name to our desired output: let $$s = \iroot(n, p) = \lfloor \sqrt[p]{n} \rfloor$$.<sup><a href="#fn2" id="r2"></a></sup></p> <div class="p">Unsurprisingly, $$f(x)$$ never underestimates: <div class="theorem">(<span class="theorem-name">Lemma 1</span>.) 
For $$x \gt 0$$, $$f(x) \ge s$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> By the basic properties of $$f(x)$$ and $$g(x)$$ above, it suffices to show that $$g(x) \ge s$$. $$g'(x) = (1 - 1/p) (1 - n/x^p)$$ and $$g''(x) = (p - 1) (n/x^{p+1})$$. Therefore, $$g(x)$$ is concave-up for $$x \gt 0$$; in particular, its single positive extremum at $$x = \sqrt[p]{n}$$ is a minimum. But $$g(\sqrt[p]{n}) = \sqrt[p]{n} \ge s$$. &#x220e;</p> </div> Also, our initial guess is always an overestimate: <div class="theorem">(<span class="theorem-name">Lemma 2</span>.) $$x_0 \gt s$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> $$\Bits(n) = \lfloor \lg n \rfloor + 1 \gt \lg n$$. Therefore, \begin{aligned} x_0 &= 2^{\lceil \Bits(n) / p \rceil} \\ &\ge 2^{\Bits(n) / p} \\ &\gt 2^{\lg n / p} \\ &= \sqrt[p]{n} \\ &\ge s\text{.} \; \blacksquare \end{aligned} </p> </div> Therefore, by Lemmas&nbsp;1 and&nbsp;2, we again have the invariant that $$x_i \ge s$$, which lets us prove partial correctness: <div class="theorem">(<span class="theorem-name">Theorem 1</span>.) If $$\NewtonRoot$$ terminates, it returns the value $$s$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> Assume it terminates. If it terminates in step $$1$$, it returns $$0 = s$$. If it terminates in step $$2$$, then $$1 \le n \lt 2^{\Bits(n)} \le 2^p$$, so $$1 \le \sqrt[p]{n} \lt 2$$, and it returns $$1 = s$$. Otherwise, it can only terminate in step $$4.2$$, where it returns $$x_i$$ such that $$x_{i+1} = f(x_i) \ge x_i$$. Since $$f(x_i) \le g(x_i)$$, this implies $$g(x_i) = ((p-1)x_i + n/x_i^{p-1}) / p \ge x_i$$. Rearranging yields $$n \ge x_i^p$$, and combining with our invariant we get $$\sqrt[p]{n} \ge x_i \ge s$$. But $$s + 1 \gt \sqrt[p]{n}$$, so $$x_i$$ must be $$s$$, and thus $$\NewtonRoot$$ returns $$s$$ if it terminates. &#x220e;</p> </div> </div> <div class="p">Total correctness is also easy: <div class="theorem">(<span class="theorem-name">Theorem 2</span>.) $$\NewtonRoot$$ terminates.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> Assume it doesn&rsquo;t terminate.
Then we have a strictly decreasing infinite sequence of integers $$\{ x_0, x_1, \dotsc \}$$. But this sequence is bounded below by $$s$$, so it cannot decrease indefinitely. This is a contradiction, so $$\NewtonRoot$$ must terminate. &#x220e;</p> </div> Note that, as with the integer square root algorithm, the check in step $$4.2$$ cannot be weakened to $$x_{i+1} = x_i$$, as doing so would cause the algorithm to oscillate for some inputs. In fact, as $$p$$ grows, so does the number of values of $$n$$ that exhibit this behavior, as does the number of possible oscillations. For example, $$n = 972$$ with $$p = 3$$ would yield the sequence $$\{ 16, 11, 10, 9, 10, 9, \dotsc \}$$, and $$n = 80$$ with $$p = 4$$ would yield the sequence $$\{ 4, 3, 2, 4, 3, 2, \dotsc \}$$.</div> </section> <section> <header> <h2>3. Run-time</h2> </header> <p>We will show that $$\NewtonRoot$$ takes $$Θ(p) + O(\lg \lg n)$$ loop iterations. Then we will analyze a single loop iteration and the arithmetic operations used to get a total run-time bound.</p> <div class="p">Analogous to the square root case, define $$\Err(x) = x^p/n - 1$$ and let $$ϵ_i = \Err(x_i)$$. First, let&rsquo;s prove our lower bound for $$ϵ_i$$, which translates directly from the square root case: <div class="theorem">(<span class="theorem-name">Lemma 3</span>.) $$x_i \ge s + 1$$ if and only if $$ϵ_i \ge 1/n$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> $$n \lt (s + 1)^p$$, so $$n + 1 \le (s + 1)^p$$, and therefore $$\Err(s + 1) = (s + 1)^p/n - 1 \ge 1/n$$. Since $$\Err$$ is increasing, $$x_i \ge s + 1$$ implies $$ϵ_i \ge \Err(s + 1) \ge 1/n$$. Conversely, if $$ϵ_i \ge 1/n$$, then $$x_i^p \ge n + 1 \gt n$$, so $$x_i \gt \sqrt[p]{n} \ge s$$, and since $$x_i$$ is an integer, $$x_i \ge s + 1$$. &#x220e;</p> </div> </div> <p>Now for the next few lemmas we need to do some algebra and calculus. Inverting $$\Err(x)$$, we get that $$x_i = \sqrt[p]{(ϵ_i + 1) \cdot n}$$.
Expressing $$g(x_i)$$ in terms of $$ϵ_i$$ and $$q = 1 - 1/p$$, we get $g(x_i) = \sqrt[p]{n} \left( \frac{ϵ_i q + 1}{(ϵ_i + 1)^q} \right)$ and $\Err(g(x_i)) = \frac{(q ϵ_i + 1)^p}{(ϵ_i + 1)^{p-1}} - 1\text{.}$ Let $f(ϵ) = \frac{(q ϵ + 1)^p}{(ϵ + 1)^{p-1}} - 1\text{,}$ so that $$ϵ_{i+1} \le f(ϵ_i)$$, since $$x_{i+1} \le g(x_i)$$ and $$\Err$$ is increasing. Then computing derivatives, \begin{aligned} f'(ϵ) &= q ϵ \frac{(q ϵ + 1)^{p-1}}{(ϵ + 1)^p}\text{,} \\ f''(ϵ) &= q \frac{(q ϵ + 1)^{p-2}}{(ϵ + 1)^{p + 1}}\text{, and} \\ f'''(ϵ) &= -q (2 + q (2 + 3 ϵ)) \frac{(q ϵ + 1)^{p-3}}{(ϵ + 1)^{p + 2}}\text{.} \end{aligned} Note that $$f(0) = f'(0) = 0$$, and $$f''(0) = q$$. Also, for $$ϵ \gt 0$$, $$f'(ϵ) \gt 0$$, $$f''(ϵ) \gt 0$$, and $$f'''(ϵ) \lt 0$$.</p> <div class="p">Now we&rsquo;re ready to show that the $$ϵ_i$$ shrink quadratically: <div class="theorem">(<span class="theorem-name">Lemma 4</span>.) $$f(ϵ) \lt (ϵ/\sqrt{2})^2$$ for $$ϵ \gt 0$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> Taylor-expand $$f(ϵ)$$ around $$0$$ with the <a href="https://en.wikipedia.org/wiki/Taylor%27s_theorem#Explicit_formulae_for_the_remainder">Lagrange remainder form</a> to get $f(ϵ) = f(0) + f'(0) ϵ + \frac{f''(0)}{2} ϵ^2 + \frac{f'''(\xi)}{6} ϵ^3$ for some $$\xi$$ such that $$0 \lt \xi \lt ϵ$$. Plugging in values, we see that $$f(ϵ) = \frac{1}{2} q ϵ^2 + \frac{1}{6} f'''(\xi) ϵ^3$$ with the last term being negative, so $$f(ϵ) \lt \frac{1}{2} q ϵ^2 \lt \frac{1}{2} ϵ^2$$. &#x220e;</p> </div> But this is only a useful upper bound when $$ϵ_i \le 1$$. In the square root case this was okay, since $$ϵ_1 \le 1$$, but that is not true for larger values of $$p$$. In fact, in general, the $$ϵ_i$$ start off shrinking <em>linearly</em>: <div class="theorem">(<span class="theorem-name">Lemma 5</span>.)
For $$ϵ \gt 1$$, $$f(ϵ) \gt ϵ/8$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> Since $$f(0) = f'(0) = 0$$, and $$f''(ϵ) \gt 0$$ for $$ϵ \ge 0$$, $$f'(ϵ)$$ and $$f(ϵ)$$ are increasing, and thus $$f(1) \gt 0$$ and $$f(ϵ)$$ is a concave-up curve.</p> <p>Then $$(0, 0)$$ and $$(1, f(1))$$ are two points on a concave-up curve, and thus geometrically the line $$y = f(1) ϵ$$ must lie below $$y = f(ϵ)$$ for $$ϵ \gt 1$$, and thus $$f(ϵ) \gt f(1) ϵ$$ for $$ϵ \gt 1$$. Algebraically, this also follows from the definition of <a href="https://en.wikipedia.org/wiki/Convex_function">(strict) convexity</a> (with $$x_1 = 0$$, $$x_2 = ϵ$$, and $$t = 1 - 1/ϵ$$).</p> <p>But $$f(1) = (2 - 1/p)^p/2^{p-1} - 1 = 2 \left(1 - \frac{1}{2p}\right)^p - 1$$, which is always increasing as a function of $$p$$, as you can see by calculating its derivative. Therefore, its minimum is at $$p = 2$$, which is $$1/8$$, and so $$f(ϵ) \gt f(1) ϵ \ge ϵ/8$$. &#x220e;</p> </div> Finally, let&rsquo;s bound our initial values: <div class="theorem">(<span class="theorem-name">Lemma 6</span>.) $$x_0 \le 2s$$ and $$ϵ_0 \le 2^p - 1$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> This is a straightforward generalization of the equivalent lemma from the square root case. Let&rsquo;s start with $$x_0$$: \begin{aligned} x_0 &= 2^{\lceil \Bits(n) / p \rceil} \\ &= 2^{\lfloor (\lfloor \lg n \rfloor + 1 + p - 1)/p \rfloor} \\ &= 2^{\lfloor \lg n / p \rfloor + 1} \\ &= 2 \cdot 2^{\lfloor \lg n / p \rfloor}\text{.} \end{aligned} Then $$x_0/2 = 2^{\lfloor \lg n / p \rfloor} \le 2^{\lg n / p} = \sqrt[p]{n}$$. Since $$x_0/2$$ is an integer, $$x_0/2 \le \sqrt[p]{n}$$ if and only if $$x_0/2 \le \lfloor \sqrt[p]{n} \rfloor = s$$. Therefore, $$x_0 \le 2s$$.</p> <p>As for $$ϵ_0$$: \begin{aligned} ϵ_0 &= \Err(x_0) \\ &\le \Err(2s) \\ &= (2s)^p/n - 1 \\ &= 2^p s^p/n - 1\text{.} \end{aligned} Since $$s^p \le n$$, $$2^p s^p/n \le 2^p$$ and thus $$ϵ_0 \le 2^p - 1$$. 
&#x220e;</p> </div> </div> <div class="p">Now we&rsquo;re ready to show our main result, which involves calculating how long the $$ϵ_i$$ shrink linearly: <div class="theorem">(<span class="theorem-name">Theorem 3</span>.) $$\NewtonRoot$$ performs $$Θ(p) + O(\lg \lg n)$$ loop iterations.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> Assume that $$ϵ_i \gt 1$$ for $$i \le j$$, $$ϵ_{j+1} \le 1$$, and $$j+k$$ is the number of loop iterations performed when running the algorithm for $$n$$ and $$p$$ (i.e., $$x_{j+k} \ge x_{j+k-1}$$). Using Lemma 5, $\left( \frac{1}{8} \right)^{j+1} ϵ_0 \lt ϵ_{j+1} \le 1\text{,}$ which implies $j \gt \frac{\lg ϵ_0}{3} - 1\text{.}$ </p> <p>Similarly, $\left( \frac{1}{8} \right)^j ϵ_0 \ge ϵ_j \gt 1\text{,}$ which implies $j \lt \frac{\lg ϵ_0}{3} \text{.}$ Therefore, $$j = Θ(\lg ϵ_0)$$, which is $$Θ(p)$$ by Lemma 6.</p> <p>Now assume $$k \ge 5$$. Then $$x_i \ge s + 1$$ for $$i \lt j + k - 1$$. Since $$ϵ_{j+1} \le 1$$ by assumption, $$ϵ_{j+3} \le 1/2$$ and $$ϵ_i \le (ϵ_{j+3})^{2^{i-j-3}}$$ for $$j + 3 \le i \lt j + k - 1$$ by Lemma 4, then $$ϵ_{j+k-2} \le 2^{-2^{k-5}}$$. But $$1/n \le ϵ_{j+k-2}$$ by Lemma 3, so $$1/n \le 2^{-2^{k-5}}$$. Taking logs to bring down the $$k$$ yields $$k - 5 \le \lg \lg n$$. Then $$k \le \lg \lg n + 5$$, and thus $$k = O(\lg \lg n)$$.</p> <p>Therefore, the total number of loop iterations is $$Θ(p) + O(\lg \lg n)$$. &#x220e;</p> </div> </div> <p>Note that $$p \le \lg n$$, so we can just say that $$\NewtonRoot$$ performs $$O(\lg n)$$ loop iterations. But that obscures rather than simplifies. The proof above is very similar to the proof of the worse run-time of $$\mathrm{N{\small EWTON}\text{-}I{\small SQRT}'}$$ where the initial guess varies.
In this case, the error in our initial guess is magnified, since we raise it to the $$(p-1)$$th power, and so that manifests as the $$Θ(p)$$ term.</p> <p>Furthermore, unlike the square root case, the number of arithmetic operations in a loop iteration isn&rsquo;t constant. In particular, the sub-step to compute $$x_i^{p-1}$$ takes a number of arithmetic operations dependent on $$p - 1$$. Using repeated squarings, this computation would take $$Θ(\lg p)$$ squarings and at most $$O(\lg p)$$ multiplications.</p> <p>If the cost of an arithmetic operation is constant, e.g., we&rsquo;re working with fixed-size integers, then the run-time bound is the above multiplied by $$Θ(\lg p)$$.</p> <p>Otherwise, if the cost of an arithmetic operation depends on the length of its arguments, then we only have to multiply by a constant factor to get the run-time bound in terms of arithmetic operations. If the cost of multiplying two numbers $$\le x$$ is $$M(x) = O(\lg^k x)$$, then the cost of computing $$x^p$$ is $$O((p \lg x)^k)$$. But $$x$$ is $$Θ(n^{1/p})$$, so the cost of computing $$x^p$$ is $$O(\lg^k n)$$, which is on the order of the cost of multiplying two numbers $$\le n$$. Furthermore, since we divide $$n$$ by the result, we can stop once the computation of $$x_i^{p-1}$$ exceeds $$n$$, as the quotient is then $$0$$. So in that case, we can treat a loop iteration as if it were performing a constant number of arithmetic operations on numbers of order $$n$$, and so, like in the square root case, we pick up a factor of $$D(n)$$, where $$D(n)$$ is the run-time of dividing $$n$$ by some number $$\le n$$.</p> </section> <hr /> <p>Like this post? Subscribe to <!-- The image is 256x256, the center of the dot is 189 pixels from the top, and the radius of the dot is 24.
Therefore, the dot is 43/256 = 0.16796875 of the image height above the bottom.--> <a href="feed/atom">my feed <img src="feed-icon.svg" alt="RSS icon" style="width: 1em; height: 1em; vertical-align: -0.16796875em;" /></a> or follow me on <a href="https://twitter.com/fakalin">Twitter <img src="twitter-icon.svg" alt="Twitter icon" style="width: 1em; height: 1em;" /></a>.</p> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> Go and JS implementations are available on <a href="https://github.com/akalin/iroot">my GitHub</a>. <a href="#r1">↩</a></p> <p id="fn2"> Here, and in most of the article, we&rsquo;ll implicitly assume that $$n \gt 0$$ and $$p \gt 1$$. <a href="#r2">↩</a></p> </section> https://www.akalin.com/sampling-visible-sphere Sampling the Visible Sphere 2015-08-26T00:00:00-07:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <p><em>(Note: this article is a summary of <a href="http://ompf2.com/viewtopic.php?f=3&t=1914">this thread on ompf2</a>.) </em></p> <p>The usual method for sampling a sphere from a point outside the sphere is to calculate the angle of the cone of the visible portion and uniformly sample within that cone, as described in <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.6561">Shirley/Wang</a>.</p> <p>However, one detail that is glossed over is that you still need to map from the sampled direction to the point on the sphere. The usual method is to simply generate a ray from the point and the sampled direction and intersect it with the sphere. However, this intersection test may fail due to floating-point inaccuracies (e.g., if the sphere is small and the distance from the point is large).</p> <p>I've found a couple of existing ways to deal with this.
As described in the pbrt book, pbrt simply assumes that the ray just grazes the sphere if the intersection fails, and then projects the center of the sphere onto the ray (<a href="https://github.com/mmp/pbrt-v2/blob/master/src/shapes/sphere.cpp#L249">code here</a>). Mitsuba moves the origin of the ray closer to the sphere (in fact, inside it) before doing the test, falling back to projecting the center onto the ray if that still fails (<a href="https://www.mitsuba-renderer.org/repos/mitsuba/files/aeb7f95b37111187cc2ddf21cfffeff118bc52d2/src/shapes/sphere.cpp#L287">code here</a>).</p> <p>However, this seems inelegant. I've come up with a better way, which involves converting the sampled cone angle $$θ$$ (as measured from the segment connecting the point to the sphere center) into an angle $$α$$ measured at the center of the sphere, and then using $$α$$ and the sampled azimuthal angle $$\varphi$$ to compute the point on the sphere directly. This turns out to be simple, and in my unscientific tests a bit faster.</p> <p>Here's a crude diagram showing the geometry:</p> <img src="/sampling-visible-sphere-files/diagram.png" alt="Diagram for derivation of cos &alpha;" /> <p>Let $$d$$ be the distance from the point $$P$$ to the sphere center $$C$$, $$r$$ the radius of the sphere, and $$L$$ the distance from $$P$$ to the sampled point on the sphere. You can see that $L = d \cos θ - \sqrt{r^2 - d^2 \sin^2 θ}$ and also by the law of cosines, $L^2 = d^2 + r^2 - 2 d r \cos α\text{.}$ We're actually more interested in $$\cos α$$, so solving for that we get $\cos α = \frac{d}{r} \sin^2 θ + \cos θ \sqrt{1 - \frac{d^2}{r^2} \sin^2 θ}\text{.}$ An alternate form, which may be easier to analyze, recalling that $$\sin θ_{\max} = r/d$$, is $\cos α = \frac{\sin^2 θ}{\sin θ_{\max}} + \cos θ \sqrt{1 - \frac{\sin^2 θ}{\sin^2 θ_{\max}}}\text{.}$ </p> <div class="p">So sampling pseudocode would look like: <pre class="code-container"><code class="language-c++">(cos θ, φ) = uniformSampleCone(rng, cos θmax)
D = 1 - d² sin² θ / r²
if D ≤ 0 {
  cos α = sin θmax
} else {
  cos α = (d/r) sin² θ + cos θ √D
}
ω = sphericalDirection(cos α, φ)
pSurface = C + r ω</code></pre> (I haven't done any analysis yet
on the most robust way [in the floating-point sense] to do the calculations above.)</div> <p>There's no backfacing since we clamp $$\cos α$$ to $$\sin θ_{\max}$$, which is analogous to the case when the ray from $$P$$ misses the sphere.</p> <p>Note that one cannot just compute $$α_{\max}$$ and uniformly sample the cone from inside the sphere, as that doesn't produce the same distribution over the visible region as sampling the cone from outside the sphere. To preserve correctness, you would have to use the (uniform) PDF over the surface area of the visible portion of the sphere, but you would have to then convert that to a PDF with respect to projected solid angle from $$P$$, which is suboptimal compared to just doing the sampling with respect to projected solid angle from $$P$$ as described above.</p> https://www.akalin.com/computing-isqrt Computing the Integer Square Root 2014-12-09T00:00:00-08:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <script> KaTeXMacros = { "\\isqrt": "\\operatorname{isqrt}", "\\Bits": "\\operatorname{Bits}", "\\Err": "\\operatorname{Err}", "\\NewtonSqrt": "\\mathrm{N{\\small EWTON}\\text{-}I{\\small SQRT}}", }; </script> <script src="https://cdn.rawgit.com/akalin/jsbn/v1.4/jsbn.js"></script> <script src="https://cdn.rawgit.com/akalin/jsbn/v1.4/jsbn2.js"></script> <section> <header> <h2>1.
The algorithm</h2> </header> <p>Today I&rsquo;m going to talk about a fast algorithm to compute the <em><a href="https://en.wikipedia.org/wiki/Integer_square_root">integer square root</a></em> of a non-negative integer $$n$$, $$\isqrt(n) = \lfloor \sqrt{n} \rfloor$$, or in words, the greatest integer whose square is less than or equal to $$n$$.<sup><a href="#fn1" id="r1"></a></sup> Most sources that describe the algorithm take it for granted that it is correct and fast. This is far from obvious! So I will prove both correctness and speed below.</p> <p>One simple fact is that $$\isqrt(n) \le n/2$$ for $$n \ge 2$$, so a straightforward algorithm is just to test every non-negative integer up to $$n/2$$. This takes $$O(n)$$ arithmetic operations, which is bad since it&rsquo;s exponential in the <em>size</em> of the input. That is, letting $$\Bits(n)$$ be the number of bits required to store $$n$$ and letting $$\lg n$$ be the base-$$2$$ logarithm of $$n$$, $$\Bits(n) = O(\lg n)$$, and thus this algorithm takes $$O(2^{\Bits(n)})$$ arithmetic operations.</p> <p>We can do better by doing binary search; start with the range $$[0, n/2]$$ and adjust it based on comparing the square of an integer in the middle of the range to $$n$$. This takes $$O(\lg n) = O(\Bits(n))$$ arithmetic operations.</p> <div class="p">However, the algorithm below is even faster:<sup><a href="#fn2" id="r2"></a></sup> <ol> <li>If $$n = 0$$, return $$0$$.</li> <li>Otherwise, set $$i$$ to $$0$$ and set $$x_0$$ to $$2^{\lceil \Bits(n) / 2\rceil}$$.</li> <li>Repeat: <ol> <li>Set $$x_{i+1}$$ to $$\lfloor (x_i + \lfloor n/x_i \rfloor) / 2 \rfloor$$.</li> <li>If $$x_{i+1} \ge x_i$$, return $$x_i$$. Otherwise, increment $$i$$.</li> </ol> </li> </ol> </div> <div class="p">Call this algorithm $$\NewtonSqrt$$, since it&rsquo;s based on <a href="https://en.wikipedia.org/wiki/Newton%27s_method">Newton&rsquo;s method</a>.
It&rsquo;s not obvious, but this algorithm returns $$\isqrt(n)$$ using only $$O(\lg \lg n) = O(\lg(\Bits(n)))$$ arithmetic operations, as we will prove below. But first, here&rsquo;s an implementation of the algorithm in JavaScript:<sup><a href="#fn3" id="r3"></a></sup> <script>
// isqrt returns the greatest number x such that x^2 <= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/akalin/jsbn ), and n must be non-negative.
//
//
// Example (open up the JS console on this page and type):
//
//   isqrt(new BigInteger("64")).toString()
function isqrt(n) {
  var s = n.signum();
  if (s < 0) {
    throw new Error('negative radicand');
  }
  if (s == 0) {
    return n;
  }
  // x = 2^ceil(Bits(n)/2)
  var x = n.constructor.ONE.shiftLeft(Math.ceil(n.bitLength()/2));
  while (true) {
    // y = floor((x + floor(n/x))/2)
    var y = x.add(n.divide(x)).shiftRight(1);
    if (y.compareTo(x) >= 0) {
      return x;
    }
    x = y;
  }
}
</script> <pre class="code-container"><code class="language-javascript">// isqrt returns the greatest number x such that x^2 &lt;= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/akalin/jsbn ), and n must be non-negative.
//
//
// Example (open up the JS console on this page and type):
//
//   isqrt(new BigInteger(&quot;64&quot;)).toString()
function isqrt(n) {
  var s = n.signum();
  if (s &lt; 0) {
    throw new Error(&#39;negative radicand&#39;);
  }
  if (s == 0) {
    return n;
  }
  // x = 2^ceil(Bits(n)/2)
  var x = n.constructor.ONE.shiftLeft(Math.ceil(n.bitLength()/2));
  while (true) {
    // y = floor((x + floor(n/x))/2)
    var y = x.add(n.divide(x)).shiftRight(1);
    if (y.compareTo(x) &gt;= 0) {
      return x;
    }
    x = y;
  }
}</code></pre> </div> </section> <section> <header> <h2>2.
Correctness</h2> </header> <p>The core of the algorithm is the iteration rule: $x_{i+1} = \left\lfloor \frac{x_i + \lfloor \frac{n}{x_i} \rfloor}{2} \right\rfloor$ where the <a href="https://en.wikipedia.org/wiki/Floor_and_ceiling_functions">floor functions</a> are there only because we&rsquo;re using integer division. Define an integer-valued function $$f(x)$$ for the right side. Using basic properties of the floor function, you can show that you can remove the inner floor: $f(x) = \left\lfloor \frac{1}{2} (x + n/x) \right\rfloor$ which makes it a bit easier to analyze. Also, the properties of $$f(x)$$ are closely related to its equivalent real-valued function: $g(x) = \frac{1}{2} (x + n/x)\text{.}$</p> <p>For starters, again using basic properties of the floor function, you can show that $$f(x) \le g(x)$$, and for any integer $$m$$, $$m \le f(x)$$ if and only if $$m \le g(x)$$.</p> <p>Finally, let&rsquo;s give a name to our desired output: let $$s = \isqrt(n) = \lfloor \sqrt{n} \rfloor$$.<sup><a href="#fn4" id="r4"></a></sup></p> <div class="p">Intuitively, $$f(x)$$ and $$g(x)$$ &ldquo;average out&rdquo; however far away their input $$x$$ is from $$\sqrt{n}$$. Conveniently, this &ldquo;average&rdquo; is never an underestimate: <div class="theorem">(<span class="theorem-name">Lemma 1</span>.) For $$x \gt 0$$, $$f(x) \ge s$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> By the basic properties of $$f(x)$$ and $$g(x)$$ above, it suffices to show that $$g(x) \ge s$$. $$g'(x) = (1 - n/x^2)/2$$ and $$g''(x) = n/x^3$$. Therefore, $$g(x)$$ is concave-up for $$x \gt 0$$; in particular, its single positive extremum at $$x = \sqrt{n}$$ is a minimum. But $$g(\sqrt{n}) = \sqrt{n} \ge s$$.
&#x220e;</p> </div> (You can also prove Lemma 1 without calculus; show that $$g(x) \ge s$$ if and only if $$x^2 - 2sx + n \ge 0$$, which is true when $$s^2 \le n$$, which is true by definition.)</div> <div class="p">Furthermore, our initial estimate is always an overestimate: <div class="theorem">(<span class="theorem-name">Lemma 2</span>.) $$x_0 \gt s$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> $$\Bits(n) = \lfloor \lg n \rfloor + 1 \gt \lg n$$. Therefore, \begin{aligned} x_0 &= 2^{\lceil \Bits(n) / 2 \rceil} \\ &\ge 2^{\Bits(n) / 2} \\ &\gt 2^{\lg n / 2} \\ &= \sqrt{n} \\ &\ge s\text{.} \; \blacksquare \end{aligned} </p> </div> </div> <p>(Note that any number greater than $$s$$, say $$n$$ or $$\lceil n/2 \rceil$$, can be chosen for our initial guess without affecting correctness. However, the expression above is necessary to guarantee performance. Another possibility is $$2^{\lceil \lceil \lg n \rceil / 2 \rceil}$$, which has the advantage that if $$n$$ is an even power of $$2$$, then $$x_0$$ is immediately set to $$\sqrt{n}$$. However, this is usually not worth the cost of checking that $$n$$ is a power of $$2$$, as is required to compute $$\lceil \lg n \rceil$$.)</p> <div class="p">An easy consequence of Lemmas 1 and 2 is that the invariant $$x_i \ge s$$ holds. That lets us prove partial correctness of $$\NewtonSqrt$$: <div class="theorem">(<span class="theorem-name">Theorem 1</span>.) If $$\NewtonSqrt$$ terminates, it returns the value $$s$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> Assume it terminates. If it terminates in step $$1$$, then we are done. Otherwise, it can only terminate in step $$3.2$$ where it returns $$x_i$$ such that $$x_{i+1} = f(x_i) \ge x_i$$. This implies that $$g(x_i) = (x_i + n/x_i) / 2 \ge x_i$$. Rearranging yields $$n \ge x_i^2$$ and combining with our invariant we get $$\sqrt{n} \ge x_i \ge s$$. 
But $$s + 1 \gt \sqrt{n}$$, so that forces $$x_i$$ to be $$s$$, and thus $$\NewtonSqrt$$ returns $$s$$ if it terminates. &#x220e;</p> </div> For total correctness we also need to show that $$\NewtonSqrt$$ terminates. But this is easy: <div class="theorem">(<span class="theorem-name">Theorem 2</span>.) $$\NewtonSqrt$$ terminates.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> Assume it doesn&rsquo;t terminate. Then we have a strictly decreasing infinite sequence of integers $$\{ x_0, x_1, \dotsc \}$$. But this sequence is bounded below by $$s$$, so it cannot decrease indefinitely. This is a contradiction, so $$\NewtonSqrt$$ must terminate. &#x220e;</p> </div> </div> <p>We are done proving correctness, but you might wonder if the check $$x_{i+1} \ge x_i$$ in step $$3.2$$ is necessary. That is, can it be weakened to the check $$x_{i+1} = x_i$$? The answer is &ldquo;no&rdquo;; to see that, let $$k = n - s^2$$. Since $$n \lt (s+1)^2$$, $$k \lt 2s + 1$$. On the other hand, consider the inequality $$f(x_i) \gt x_i$$. Since that would cause the algorithm to terminate and return $$x_i$$, that implies that $$x_i = s$$. Therefore, that inequality is equivalent to $$f(s) \gt s$$, which is equivalent to $$f(s) \ge s + 1$$, which is equivalent to $$g(s) = (s + n/s) / 2 \ge s + 1$$. Rearranging yields $$n \ge s^2 + 2s$$. Substituting in $$n = s^2 + k$$, we get $$s^2 + k \ge s^2 + 2s$$, which is equivalent to $$k \ge 2s$$. But since $$k \lt 2s + 1$$, that forces $$k$$ to equal $$2s$$. That is the maximum value $$k$$ can be, so therefore $$n$$ must be one less than a perfect square. Indeed, for such numbers, weakening the check would cause the algorithm to oscillate between $$s$$ and $$s + 1$$. For example, $$n = 99$$ would yield the sequence $$\{ 16, 11, 10, 9, 10, 9, \dotsc \}$$.</p> </section> <section> <header> <h2>3. Run-time</h2> </header> <p>We will show that $$\NewtonSqrt$$ takes $$O(\lg \lg n)$$ arithmetic operations. 
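To make the bound concrete, here is a minimal sketch rather than the implementation above: it ports $$\NewtonSqrt$$ to JavaScript's built-in <code>BigInt</code> (an assumption made so the example runs without jsbn) and counts loop iterations. The name <code>isqrtWithCount</code> is hypothetical.

```javascript
// Hypothetical helper (not the jsbn-based isqrt above): NEWTON-ISQRT ported
// to native BigInt, returning the root and the number of loop iterations.
// Assumes n is a non-negative BigInt.
function isqrtWithCount(n) {
  if (0n > n) {
    throw new RangeError('negative radicand');
  }
  // Step 1: isqrt(0) = 0, with no loop iterations.
  if (n === 0n) {
    return { root: 0n, iterations: 0 };
  }
  // Step 2: x_0 = 2^ceil(Bits(n)/2), where Bits(n) is the binary length of n.
  const bits = n.toString(2).length;
  let x = 2n ** BigInt(Math.ceil(bits / 2));
  let iterations = 0;
  // Step 3: iterate until the sequence stops strictly decreasing.
  while (true) {
    iterations += 1;
    // x_{i+1} = floor((x_i + floor(n/x_i))/2); BigInt division truncates,
    // which equals the floor here since all values are positive.
    const y = (x + n / x) / 2n;
    if (y >= x) {
      return { root: x, iterations };
    }
    x = y;
  }
}
```

For example, <code>isqrtWithCount(99n)</code> returns a root of <code>9n</code> after 4 loop iterations, following the sequence 16, 11, 10, 9, 10 from the previous section, and for $$n \ge 2$$ the count stays within the $$\lg \lg n + 4$$ bound proved below.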
Since each loop iteration does only a fixed number of arithmetic operations (with the division of $$n$$ by $$x$$ being the most expensive), it suffices to show that our algorithm performs $$O(\lg \lg n)$$ loop iterations.</p> <p>It is well known that Newton&rsquo;s method <a href="https://en.wikipedia.org/wiki/Newton%27s_method#Proof_of_quadratic_convergence_for_Newton.27s_iterative_method">converges quadratically</a> sufficiently close to a simple root. We can&rsquo;t actually use this result directly, since it&rsquo;s not clear that the convergence properties of Newton&rsquo;s method are preserved when using integer operations, but we can do something similar.</p> <p>Define $$\Err(x) = x^2/n - 1$$ and let $$ϵ_i = \Err(x_i)$$. Intuitively, $$\Err(x)$$ is a conveniently-scaled measure of the error of $$x$$: it is less than $$1$$ for most of the values we care about and it is bounded below for integers greater than our target $$s$$. Also, we will show that the $$ϵ_i$$ shrink quadratically. These facts will then let us show our bound for the iteration count.</p> <div class="p">First, let&rsquo;s prove our lower bound for $$ϵ_i$$: <div class="theorem">(<span class="theorem-name">Lemma 3</span>.) $$x_i \ge s + 1$$ if and only if $$ϵ_i \ge 1/n$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> $$n \lt (s + 1)^2$$, so $$n + 1 \le (s + 1)^2$$, and therefore $$(s + 1)^2/n - 1 \ge 1/n$$. But the expression on the left side is just $$\Err(s + 1)$$. $$x_i \ge s + 1$$ if and only if $$ϵ_i \ge \Err(s + 1)$$, so the result immediately follows. &#x220e;</p> </div> Then we can use that to show that the $$ϵ_i$$ shrink quadratically: <div class="theorem">(<span class="theorem-name">Lemma 4</span>.) If $$x_i \ge s + 1$$, then $$ϵ_{i+1} \lt (ϵ_i/2)^2$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> $$ϵ_{i+1}$$ is just $$\Err(f(x_i)) \le \Err(g(x_i))$$, so it suffices to show that $$\Err(g(x_i)) \lt (ϵ_i/2)^2$$.
Inverting $$\Err(x)$$, we get that $$x_i = \sqrt{(ϵ_i + 1) \cdot n}$$. Expressing $$g(x_i)$$ in terms of $$ϵ_i$$ we get $g(x_i) = \frac{\sqrt{n}}{2} \left( \frac{ϵ_i + 2}{\sqrt{ϵ_i + 1}} \right)$ and $\Err(g(x_i)) = \frac{(ϵ_i/2)^2}{ϵ_i+1}\text{.}$ Therefore, it suffices to show that the denominator is greater than $$1$$. But $$x_i \ge s + 1$$ implies $$ϵ_i \gt 0$$ by Lemma 3, so that follows immediately and the result is proved. &#x220e;</p> </div> Then let&rsquo;s bound our initial values: <div class="theorem">(<span class="theorem-name">Lemma 5</span>.) $$x_0 \le 2s$$, $$ϵ_0 \le 3$$, and $$ϵ_1 \le 1$$.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> Let&rsquo;s start with $$x_0$$: \begin{aligned} x_0 &= 2^{\lceil \Bits(n) / 2 \rceil} \\ &= 2^{\lfloor (\lfloor \lg n \rfloor + 1 + 1)/2 \rfloor} \\ &= 2^{\lfloor \lg n / 2 \rfloor + 1} \\ &= 2 \cdot 2^{\lfloor \lg n / 2 \rfloor}\text{.} \end{aligned} Then $$x_0/2 = 2^{\lfloor \lg n / 2 \rfloor} \le 2^{\lg n / 2} = \sqrt{n}$$. Since $$x_0/2$$ is an integer, $$x_0/2 \le \sqrt{n}$$ if and only if $$x_0/2 \le \lfloor \sqrt{n} \rfloor = s$$. Therefore, $$x_0 \le 2s$$.</p> <p>As for $$ϵ_0$$: \begin{aligned} ϵ_0 &= \Err(x_0) \\ &\le \Err(2s) \\ &= (2s)^2/n - 1 \\ &= 4s^2/n - 1\text{.} \end{aligned} Since $$s^2 \le n$$, $$4s^2/n \le 4$$ and thus $$ϵ_0 \le 3$$.</p> <p>Finally, $$ϵ_1$$ is just $$\Err(f(x_0))$$. Using calculations from Lemma 4, \begin{aligned} ϵ_1 &\le \Err(g(x_0)) \\ &= (ϵ_0/2)^2/(ϵ_0 + 1) \\ &\le (3/2)^2/(3 + 1) \\ &= 9/16\text{.} \end{aligned} Therefore, $$ϵ_1 \le 1$$. &#x220e;</p> </div> </div> <div class="p">Finally, we can show our main result: <div class="theorem">(<span class="theorem-name">Theorem 3</span>.) $$\NewtonSqrt$$ performs $$O(\lg \lg n)$$ loop iterations.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> Let $$k$$ be the number of loop iterations performed when running the algorithm for $$n$$ (i.e., $$x_k \ge x_{k-1}$$) and assume $$k \ge 4$$. 
Then $$x_i \ge s + 1$$ for $$i \lt k - 1$$. Since $$ϵ_1 \le 1$$ by Lemma 5, $$ϵ_2 \le 1/2$$ and $$ϵ_i \le (ϵ_2)^{2^{i-2}}$$ for $$2 \le i \lt k - 1$$ by Lemma 4, then $$ϵ_{k-2} \le 2^{-2^{k-4}}$$. But $$1/n \le ϵ_{k-2}$$ by Lemma 3, so $$1/n \le 2^{-2^{k-4}}$$. Taking logs to bring down the $$k$$ yields $$k - 4 \le \lg \lg n$$. Then $$k \le \lg \lg n + 4$$, and thus $$k = O(\lg \lg n)$$. &#x220e;</p> </div> Note that in general, an arithmetic operation is not constant-time, and in fact has run-time $$\Omega(\lg n)$$. Since the most expensive arithmetic operation we do is division, we can say that $$\NewtonSqrt$$ has run-time that is both $$\Omega(\lg n)$$ and $$O(D(n) \cdot \lg \lg n)$$, where $$D(n)$$ is the run-time of dividing $$n$$ by some number $$\le n$$.<sup><a href="#fn5" id="r5"></a></sup></div> </section> <section> <header> <h2>4. The Initial Guess</h2> </header> <p>It&rsquo;s also useful to show that if the initial guess $$x_0$$ is bad, then the run-time degrades to $$Θ(\lg n)$$. We&rsquo;ll do this by defining the function $$\NewtonSqrt'$$ to be $$\NewtonSqrt$$ except that it takes a function $$\mathrm{I{\small NITIAL}\text{-}G{\small UESS}}$$ that is called with $$n$$ and whose result is assigned to $$x_0$$ in step 2. Then, we can treat $$ϵ_0$$ as a function of $$n$$ and analyze how long $$ϵ_i$$ stays above $$1$$ to show that $$\NewtonSqrt'$$ performs $$Θ(\lg ϵ_0(n)) + O(\lg \lg n)$$ loop iterations (Theorem 4). Since $$\NewtonSqrt$$ uses an initial guess such that $$ϵ_0(n) = Θ(1)$$, Theorem 4 reduces to Theorem 3 in that case. However, if $$x_0$$ is chosen to be $$Θ(n)$$, e.g., the initial guess is just $$n$$ or $$n/k$$ for some constant $$k$$, then $$ϵ_0(n)$$ will also be $$Θ(n)$$, and so the run-time will degrade to $$Θ(\lg n)$$. So having a good initial guess is important for the performance of $$\NewtonSqrt$$!</p> </section> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> Aside from the <a href="https://en.wikipedia.org/wiki/Integer_square_root">Wikipedia article</a>, the algorithm is described as Algorithm 9.2.11 in <a href="http://www.amazon.com/Prime-Numbers-A-Computational-Perspective/dp/0387252827">Prime Numbers: A Computational Perspective</a>. <a href="#r1">↩</a></p> <p id="fn2"> Note that only integer operations are used, which makes this algorithm suitable for arbitrary-precision integers. <a href="#r2">↩</a></p> <p id="fn3"> Go and JS implementations are available on <a href="https://github.com/akalin/iroot">my GitHub</a>. <a href="#r3">↩</a></p> <p id="fn4"> Here, and in most of the article, we&rsquo;ll implicitly assume that $$n \gt 0$$. <a href="#r4">↩</a></p> <p id="fn5"> $$D(n)$$ is $$Θ(\lg^2 n)$$ using long division, but fancier division algorithms have better run-times. <a href="#r5">↩</a></p> </section> https://www.akalin.com/constant-time-mssb Finding the Most Significant Set Bit of a Word in Constant Time 2014-07-03T00:00:00-07:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <script>
// Converts the given binary string (possibly with whitespace) to an integer.
function b(s) {
  return parseInt(s.replace(/\s+/g, ''), 2);
}

// Converts the given integer to a binary string.
function bs(x) {
  return x.toString(2);
}
</script> <section> <header> <h2>1.
Overall method</h2> </header> <p>Finding the most significant set bit of a word (equivalently, finding the integer log base 2 of a word, or counting the leading zeros of a word) is a <a href="https://stackoverflow.com/questions/2589096/find-most-significant-bit-left-most-that-is-set-in-a-bit-array">well-studied problem</a>. <a href="http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious">Bit Twiddling Hacks</a> lists various methods, and <a href="https://en.wikipedia.org/wiki/Count_leading_zeros">Wikipedia</a> gives the CPU instructions that perform the operation directly.</p> <p>However, all of these methods are either specific to a certain word size or take more than constant time (in terms of number of word operations). That raises the question of whether there <em>is</em> a method that takes constant time&mdash;surprisingly, the answer is &ldquo;yes&rdquo;!<sup><a href="#fn1" id="r1"></a></sup></p> <p>The key idea is to split a word into $$\lceil \sqrt{w} \rceil$$ blocks of $$\lceil \sqrt{w} \rceil$$ bits (where $$w$$ is the number of bits in a word). One can then do certain operations on blocks &ldquo;in parallel&rdquo; by stuffing multiple blocks into a word and then performing a single word operation.</p> <p>Furthermore, since the block size and block count are the same, one can transform the bits of a block into the blocks of a word and vice versa in various ways using only a constant number of word operations.</p> <p>In particular, this lets us split up the problem into two parts: finding the most significant set (i.e., non-zero) block, and finding the most significant set bit within that block. It then turns out that both parts can be done in constant time.</p> <p>For concreteness, we'll use 32-bit words when explaining the method below.<sup><a href="#fn2" id="r2"></a></sup></p> </section> <section> <header> <h2>2. 
Finding the most significant set bit of a block</h2> </header> <p>First, let's consider the sub-problem of finding the most significant set bit of a block. In fact, let's give ourselves a bit of room and consider only blocks with the high bit cleared for now; we'll see why we need this extra bit of room soon.</p> <div class="p">For 32 bits, the block size is 6 bits, so with the high bit of a block cleared we're left with 5 bits. Let's look at a naive implementation: <script>
function mssb5_naive(x) {
  var c = 0;
  for (var i = 0; i < 5 && x >= (1 << i); ++i) {
    ++c;
  }
  return c - 1;
}
</script> <pre class="code-container"><code class="language-javascript">function mssb5_naive(x) {
  var c = 0;
  for (var i = 0; i &lt; 5 &amp;&amp; x &gt;= (1 &lt;&lt; i); ++i) {
    ++c;
  }
  return c - 1;
}</code></pre> In the above, we consider successive powers of 2 until we find one greater than our given number. Then the answer is simply one less than the exponent of that power.</div> <p>Notice that the loop has at most 5 iterations; this lines up nicely with the 5 full blocks in an entire 32-bit word. (This is why we saved our extra bit of room.) If we can copy our block to the higher 4 blocks and then use word operations to operate on those blocks in parallel, then we're good.</p> <p>For our example, let $$x = 5 = 00101$$.
Duplicating $$x$$ among all the blocks can easily be done by multiplying by the appropriate constant:</p> <style> pre.binary-example { border: 1px solid #073642; /* solarized base02 */ background-color: #fdf6e3; /* solarized base3 */ color: #586e75; padding: 1em; } pre.binary-example span.dont-care { color: #a3b1b1; } pre.binary-example span.last-operand { text-decoration: underline; } </style> <pre class="binary-example">
  <span class="first-five" >00 000000 000000 000000 000000 000101</span>
* <span class="last-operand low-bit-full" >00 000001 000001 000001 000001 000001</span>
  <span class="first-five" >00 000000 000000 000000 000000 000101</span>
  <span class="first-five" >00 000000 000000 000000 000101</span>
  <span class="first-five" >00 000000 000000 000101</span>
  <span class="first-five" >00 000000 000101</span>
  <span class="first-five last-operand" >00 000101                            </span>
  <span class="lower-bits-full" >00 000101 000101 000101 000101 000101</span>
</pre> <p>In fact, this is a simple use of a more general tool. If $$x$$ and $$y$$ are expressed in binary, then multiplying $$x$$ by $$y$$ can be seen as taking the index of each set bit in $$y$$, creating a copy of $$x$$ shifted by each such index, and then adding up all the shifted copies. This case is just taking $$y$$ to be the constant where the $$\{ 0, 6, 12, 18, 24 \}$$th bits are set.</p> <p>The first operation we need to parallelize is the comparison to the powers of 2. This can be converted to a word operation by noting the comparison $$x \geq y$$ can be performed by checking the sign of $$x - y$$, and that checking the sign can be done by setting the unused high bit of $$x$$ before doing the subtraction, and then checking to see if that high bit was left intact (i.e., not borrowed from). So we pre-compute a constant with the $$n$$th block containing the $$n$$th power of 2, then subtract that from our word containing the duplicated blocks with the high bit set.
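The borrow trick can be checked on a single block in isolation. The sketch below is a hypothetical helper, <code>geqViaBorrow</code>, not part of the original code; it assumes both inputs fit in 5 bits so that the sixth bit of the block is free:

```javascript
// Hypothetical helper (not part of the original code): compares two 5-bit
// values x and y with the borrow trick on a single 6-bit block.
function geqViaBorrow(x, y) {
  const HIGH = 0b100000; // the unused high bit of a 6-bit block
  // (x | HIGH) - y equals 32 + x - y, which lies in [32, 63] when x >= y
  // and in [1, 31] otherwise, so the high bit survives exactly when x >= y.
  return (((x | HIGH) - y) & HIGH) !== 0;
}
```

For instance, <code>geqViaBorrow(5, 4)</code> is <code>true</code> and <code>geqViaBorrow(5, 8)</code> is <code>false</code>; the word-level version performs the same subtraction on all five blocks at once.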
Finally, we can then mask off the unneeded lower bits:</p> <pre class="binary-example"> <span class="lower-bits-full" >00 000101 000101 000101 000101 000101</span> | <span class="last-operand high-bit-full" >00 100000 100000 100000 100000 100000</span> <span class="full" >00 100101 100101 100101 100101 100101</span> - <span class="last-operand lower-bits-full" >00 010000 001000 000100 000010 000001</span> <span class="high-bit-full" >00 010101 011101 100001 100011 100100</span> & <span class="last-operand high-bit-full" >00 100000 100000 100000 100000 100000</span> <span class="high-bit-full" >00 000000 000000 100000 100000 100000</span> </pre> <p>We're left with a word where all bits except for the high bits of a block are zero. We still need to sum up those bits, but since they're a block apart, that can be done by multiplication with a constant to line up the bits in a single column. The constant turns out to have the $$\{ 0, 6, 12, 18, 24 \}$$th bits set, with the answer being in the top three bits:<sup><a href="#fn3" id="r3"></a></sup></p> <pre class="binary-example"> <span class="high-bit-full" >00 000000 000000 100000 100000 100000</span> * <span class="last-operand low-bit-full" >00 000001 000001 000001 000001 000001</span> <span class="high-bit-full" >00 000000 000000 100000 100000 100000</span> <span class="high-bit-full" >00 000000 100000 100000 100000</span> <span class="high-bit-full" >00 100000 100000 100000</span> <span class="high-bit-full" >00 100000 100000</span> <span class="high-bit-full last-operand" >00 100000 </span> <span class="top-three" >01 100001 100001 100001 000000 100000</span> MSSB5(x) = 011 - 1 = 2 </pre> <div class="p">We can now write <code>mssb5()</code> using a constant number of word operations:<sup><a href="#fn4" id="r4"></a></sup> <script> function mssb5(x) { // Duplicate x among all the blocks. x *= b('00 000001 000001 000001 000001 000001'); // Compare to successive powers of 2 in parallel. 
  x |= b('00 100000 100000 100000 100000 100000');
  x -= b('00 010000 001000 000100 000010 000001');
  x &= b('00 100000 100000 100000 100000 100000');
  // Sum up the bits into the high 3 bits.
  x *= b('00 000001 000001 000001 000001 000001');
  // Shift down and subtract 1 to get the answer.
  return (x >>> 29) - 1;
}
</script> <pre class="code-container"><code class="language-javascript">function mssb5(x) {
  // Duplicate x among all the blocks.
  x *= b(&#39;00 000001 000001 000001 000001 000001&#39;);
  // Compare to successive powers of 2 in parallel.
  x |= b(&#39;00 100000 100000 100000 100000 100000&#39;);
  x -= b(&#39;00 010000 001000 000100 000010 000001&#39;);
  x &amp;= b(&#39;00 100000 100000 100000 100000 100000&#39;);
  // Sum up the bits into the high 3 bits.
  x *= b(&#39;00 000001 000001 000001 000001 000001&#39;);
  // Shift down and subtract 1 to get the answer.
  return (x &gt;&gt;&gt; 29) - 1;
}</code></pre> Then we can find the most significant set bit of a full block by simply testing the high bit first: <script>
function mssb6(x) {
  return (x & b('100000')) ? 5 : mssb5(x);
}
</script> <pre class="code-container"><code class="language-javascript">function mssb6(x) {
  return (x &amp; b(&#39;100000&#39;)) ? 5 : mssb5(x);
}</code></pre> </div> </section> <section> <header> <h2>3. Finding the most significant set block</h2> </header> <p>Let's now consider the sub-problem of finding the most significant set block (i.e., the most significant non-zero block) of a word, ignoring the partial one. As above, we'd like to be able to use subtraction to compare all the blocks to zero at the same time. However, that requires the high bit of each block to be unused.
That's easy enough to handle: just separate the high bit and the lower bits of each block, test whether the lower bits are non-zero by subtracting them from the high-bit mask $$C$$ (a block's high bit gets borrowed from exactly when its lower bits are non-zero), and then bitwise-or the results together:</p> <pre class="binary-example"> x = <span class="full" >00 100000 000000 010000 000000 000001</span> & C = <span class="last-operand high-bit-full" >00 100000 100000 100000 100000 100000</span> y1 = <span class="high-bit-full" >00 100000 000000 000000 000000 000000</span> x = <span class="full" >00 100000 000000 010000 000000 000001</span> & ~C = <span class="last-operand lower-bits-full" >00 011111 011111 011111 011111 011111</span> t1 = <span class="lower-bits-full" >00 000000 000000 010000 000000 000001</span> C = <span class="full" >00 100000 100000 100000 100000 100000</span> - t1 = <span class="last-operand lower-bits-full" >00 000000 000000 010000 000000 000001</span> t2 = <span class="high-bit-full" >00 100000 100000 010000 100000 011111</span> ~t2 = <span class="high-bit-full" >11 011111 011111 101111 011111 100000</span> & C = <span class="last-operand high-bit-full" >00 100000 100000 100000 100000 100000</span> y2 = <span class="high-bit-full" >00 000000 000000 100000 000000 100000</span> y1 = <span class="high-bit-full" >00 100000 000000 000000 000000 000000</span> | y2 = <span class="last-operand high-bit-full" >00 000000 000000 100000 000000 100000</span> y = <span class="high-bit-full" >00 100000 000000 100000 000000 100000</span> </pre> <p>The result is stored in the high bits of each block: the high bit of a block of $$y$$ is set exactly when that block of $$x$$ is non-zero. If we could pack all the bits together, we'd then be able to use <code>mssb5()</code>. This is similar to where we had to add all the bits together in section 2, but we need a constant to stagger the bits instead of lining them up.
The constant to put the answer in the high bits turns out to have the $$\{ 7, 12, 17, 22, 27 \}$$th bits set:</p> <pre class="binary-example"> y >>> 5 = <span class="low-bit-full" >00 000001 000000 000001 000000 000001</span> * <span class="last-operand every-fifth-from-seventh" >00 001000 010000 100001 000010 000000</span> <span class="low-bit-full" >10 000000 000010 000000 00001</span> <span class="low-bit-full" >00 000001 000000 000001</span> <span class="low-bit-full" >00 100000 000000 1</span> <span class="low-bit-full" >00 000000 01</span> <span class="last-operand low-bit-full" >00 001 </span> = <span class="top-five" >10 101001 010010 100001 000010 000000</span> </pre> <p>This yields the answer <code>10101</code> in the top five bits, where the $$i$$th bit is set exactly when the $$i$$th block of $$x$$ is non-zero. Therefore, the index of the most significant set block is simply <code>mssb5(b('10101'))</code>, which is 4.</p> </section> <section> <header> <h2>4. Putting it all together</h2> </header> <div class="p">With the building blocks above, we can now implement the algorithm for finding the most significant set bit in the full blocks of a word:<sup><a href="#fn5" id="r5"></a></sup> <script>
function mssb30(x) {
  var C = b('00 100000 100000 100000 100000 100000');
  // Check whether the high bit of each block is set.
  var y1 = x & C;
  // Check whether the lower bits of each block are non-zero.
  var y2 = ~(C - (x & ~C)) & C;
  var y = y1 | y2;
  // Shift the result bits down to the lowest 5 bits.
  var z = ((y >>> 5) * b('0000 10000 10000 10000 10000 10000000')) >>> 27;
  // Compute the bit index of the most significant set block.
  var b1 = 6 * mssb5(z);
  // Compute the most significant set bit inside the most significant
  // set block.
  var b2 = mssb6((x >>> b1) & b('111111'));
  return b1 + b2;
}
</script> <pre class="code-container"><code class="language-javascript">function mssb30(x) {
  var C = b(&#39;00 100000 100000 100000 100000 100000&#39;);
  // Check whether the high bit of each block is set.
  var y1 = x &amp; C;
  // Check whether the lower bits of each block are non-zero.
  var y2 = ~(C - (x &amp; ~C)) &amp; C;
  var y = y1 | y2;
  // Shift the result bits down to the lowest 5 bits.
  var z = ((y &gt;&gt;&gt; 5) * b(&#39;0000 10000 10000 10000 10000 10000000&#39;)) &gt;&gt;&gt; 27;
  // Compute the bit index of the most significant set block.
  var b1 = 6 * mssb5(z);
  // Compute the most significant set bit inside the most significant
  // set block.
  var b2 = mssb6((x &gt;&gt;&gt; b1) &amp; b(&#39;111111&#39;));
  return b1 + b2;
}</code></pre> And then it's simple enough to extend it to find the most significant set bit of a full word: <script>
function mssb32(x) {
  // Check the high duplet and fall back to mssb30 if it's not set.
  var h = x >>> 30;
  return h ? (30 + mssb5(h)) : mssb30(x);
}
</script> <pre class="code-container"><code class="language-javascript">function mssb32(x) {
  // Check the high duplet and fall back to mssb30 if it&#39;s not set.
  var h = x &gt;&gt;&gt; 30;
  return h ? (30 + mssb5(h)) : mssb30(x);
}</code></pre> So the code above shows that we can find the most significant set bit of a 32-bit word in a constant number of 32-bit word operations. It is easy enough to see how it can be adapted to yield a similar algorithm for a given arbitrary (but sufficiently large) word size, simply by pre-computing the various word-size-dependent constants.</div> <p>It is also easy to see why no one actually uses this method on real computers even in the absence of built-in instructions: it is much more complicated and almost certainly slower than existing methods for real word sizes! Also, the word-RAM model&mdash;where we assume all word operations take constant time&mdash;is useful only when the word size is fixed or narrowly bounded. When we allow word size to vary arbitrarily, the word-RAM model breaks down&mdash;for one, the cost of multiplication grows super-linearly with respect to word size!
Alas, this method is doomed to remain a theoretical curiosity, albeit one that uses a few clever tricks.</p> <script> function highlightIndices(str, indices) { var highlightedStr = ''; var i = 0, j = 0; for (var k = 0; k < str.length; ++k) { var chStr = str[str.length - k - 1]; if (chStr == '0' || chStr == '1') { if (j < indices.length && i == indices[j]) { ++j; } else { chStr = '<span class="dont-care">' + chStr + '</span>'; } ++i; } highlightedStr = chStr + highlightedStr; } return highlightedStr; } function highlightElements(selector, indices) { var es = document.querySelectorAll(selector); for (var i = 0; i < es.length; ++i) { var e = es[i]; e.innerHTML = highlightIndices(e.textContent, indices); } } highlightElements('pre.binary-example span.first-five', [0, 1, 2, 3, 4]); highlightElements('pre.binary-example span.low-bit-full', [0, 6, 12, 18, 24]); highlightElements('pre.binary-example span.every-fifth-from-seventh', [7, 12, 17, 22, 27]); highlightElements('pre.binary-example span.lower-bits-full', [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28]); highlightElements('pre.binary-example span.high-bit-full', [5, 11, 17, 23, 29]); highlightElements('pre.binary-example span.full', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]); highlightElements('pre.binary-example span.top-three', [29, 30, 31]); highlightElements('pre.binary-example span.top-five', [27, 28, 29, 30, 31]); </script> </section> <hr /> <p>Like this post? Subscribe to <!-- The image is 256x256, the center of the dot is 189 pixels from the top, and the radius of the dot is 24. 
Therefore, the dot is 43/256 = 0.16796875 of the image height above the bottom.--> <a href="feed/atom">my feed <img src="feed-icon.svg" alt="RSS icon" style="width: 1em; height: 1em; vertical-align: -0.16796875em;" /></a> or follow me on <a href="https://twitter.com/fakalin">Twitter <img src="twitter-icon.svg" alt="Twitter icon" style="width: 1em; height: 1em;" /></a>.</p> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> The constant-time method is detailed in the original papers for the <a href="https://en.wikipedia.org/wiki/Fusion_tree">fusion tree</a> data structure. <a href="http://dl.acm.org/citation.cfm?id=100217">The first paper</a> is unfortunately behind a paywall, but <a href="https://www.sciencedirect.com/science/article/pii/0022000093900404?np=y">the second paper</a>, essentially a rehash of the first one, is freely downloadable.</p> <p>The method is also explained in <a href="http://courses.csail.mit.edu/6.851/spring12/lectures/L12.html">lecture 12</a> of Erik Demaine's <a href="http://courses.csail.mit.edu/6.851/spring12/">Advanced Data Structures</a> class, which is how I originally found out about it. <a href="#r1">↩</a></p> <p id="fn2"> Demaine uses 16-bit words, which factor nicely into 4 blocks of 4 bits, but it is instructive to see how the method deals with a word size that is not a perfect square. <a href="#r2">↩</a></p> <p id="fn3"> In this case, the partial 6th block has enough room to hold the answer, but this may not be true in general. This can be remedied easily enough by shifting the block high bits down to the low bits before multiplying; the answer will then be in the last full block. <a href="#r3">↩</a></p> <p id="fn4"> <code>b(str)</code> just parses a number from its binary string representation. <a href="#r4">↩</a></p> <p id="fn5"> Try out this function (and the others on this page) by opening up the JS console!
<a href="#r5">↩</a></p> </section> https://www.akalin.com/primality-testing-polynomial-time-part-2 Primality Testing in Polynomial Time (Ⅱ) 2012-12-29T00:00:00-08:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.0/knockout-min.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/simple-arith.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/trial-division.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/euler-phi.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/multiplicative-order.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/primality-testing.js"></script> <p><em>(Note: this article isn't fully polished yet, but I thought it would be a shame to let it languish during my sabbatical. Happy new year!)</em></p> <section> <header> <h2>5. Strengthening the AKS theorem</h2> </header> <div class="p">It turns out the conditions of the AKS theorem are stronger than they appear; they themselves imply that $$n$$ is prime. To show this, we need the following theorem, which we'll state without proof: <div class="theorem"> (<span class="theorem-name">Lenstra's squarefree test</span>.) If $$a^n \equiv a \pmod{n}$$ for $$1 \le a \lt \ln^2 n$$, then $$n$$ is <a href="http://en.wikipedia.org/wiki/Squarefree">squarefree</a>.<sup><a href="#fn1" id="r1"></a></sup></div> We also need a couple of lemmas: <div class="theorem"> (<span class="theorem-name">Lemma 1</span>.) 
For $$0 \le a \lt n$$ and $$r \gt 1$$, suppose that $(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}$ Then $(a + 1)^n \equiv a + 1 \pmod{n}\text{.}$ </div> <div class="proof"> <p><span class="proof-name">Proof.</span> By definition, there is some polynomial $$k(X)$$ such that $$(X + a)^n - (X^n + a) \equiv k(X) \cdot (X^r - 1) \pmod{n}$$. Treating both sides as polynomials in $$X$$ and substituting $$X = 1$$ makes the right-hand side vanish, so we immediately get $$(1 + a)^n - (1 + a) \equiv 0 \pmod{n}$$. &#x220e;</p> </div> <div class="theorem"> (<span class="theorem-name">Lemma 2</span>.) For $$n \gt 1$$, $$\lfloor \lg n \rfloor \cdot \lg n \gt \ln^2 n$$. </div> <div class="proof"> <p><span class="proof-name">Proof.</span> Since $$\ln n = \frac{\lg n}{\lg e}$$ and $$\lg e \gt 1$$ (as $$e \gt 2$$), $$\lg n \gt \ln n$$ for $$n \gt 1$$.</p> <p>Letting $$k = \lfloor \lg n \rfloor$$, we have $$\lg n \lt k + 1$$ and so $$\ln n \lt \frac{k + 1}{\lg e}$$. Thus if $$\frac{k + 1}{\lg e} \lt k$$, then $$\ln n \lt k$$. Solving for $$k$$, we get $$k \gt \frac{1}{\lg e - 1} \approx 2.26$$, which holds exactly when $$k \ge 3$$, i.e., when $$n \ge 8$$.</p> <p>So if $$n \ge 8$$, then $$\ln n \lt \lfloor \lg n \rfloor$$. Checking manually, we find that $$\ln n \lt \lfloor \lg n \rfloor$$ also holds for $$n \in \{ 2, 4, 5, 6, 7 \}$$. Combined with $$\lg n \gt \ln n \gt 0$$, this immediately implies the lemma for all $$n \gt 1$$ except $$3$$. But checking manually again, we find that the lemma holds for $$3$$ also, since $$\lfloor \lg 3 \rfloor \cdot \lg 3 \approx 1.58 \gt \ln^2 3 \approx 1.21$$. &#x220e;</p> </div> </div> <div class="p">Then, we can prove the strong version of the AKS theorem: <div class="theorem"> (<span class="theorem-name">AKS theorem, strong version</span>.) Let $$n \ge 2$$, $$r$$ be relatively prime to $$n$$ with $$o_r(n) \gt \lg^2 n$$, and $$M \gt \lfloor \sqrt{φ(r)} \rfloor \lg n$$. Furthermore, let $$n$$ have no prime factor less than $$M$$ and let $(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}$ for $$0 \le a \lt M$$. Then $$n$$ is prime.</div> <div class="proof"> <p><span class="proof-name">Proof.</span> From Lemma 1, we know that $$a^n \equiv a \pmod{n}$$ for $$1 \le a \lt M$$.
Since $$o_r(n)$$ divides $$φ(r)$$, we have $$φ(r) \ge o_r(n) \gt \lg^2 n$$ and thus $$\lfloor \sqrt{φ(r)} \rfloor \ge \lfloor \lg n \rfloor$$, so $$M \gt \lfloor \sqrt{φ(r)} \rfloor \lg n \ge \lfloor \lg n \rfloor \cdot \lg n \gt \ln^2 n$$ by Lemma 2. Therefore, we can apply Lenstra's squarefree test to show that $$n$$ is squarefree. From the weak version of the AKS theorem, we also know that $$n$$ is a prime power $$p^k$$. But a prime power is squarefree only when $$k = 1$$, which immediately implies that $$n$$ is prime. &#x220e;</p> </div> </div> </section> <section> <header> <h2>6. Finding a suitable $$r$$</h2> </header> <div class="p">The only remaining loose end is to show that there exists an $$r$$ with $$o_r(n) \gt \lg^2 n$$ and that it's small enough (i.e., polylog in $$n$$). The existence of $$r$$ is easy to see: we can simply pick the smallest $$r$$ that is co-prime to $$n$$ and greater than $$n^{\lg^2 n}$$, since then $$1 \lt n^k \lt r$$, and hence $$n^k \not\equiv 1 \pmod{r}$$, for $$1 \le k \le \lg^2 n$$. But that's obviously too big. We can do better: <div class="theorem"> <span class="theorem-name">(Upper bound for $$r$$.)</span> Let $$n \ge 2$$. Then there exists some $$r \le \max(3, \lceil \lg n \rceil^5)$$ such that $$o_r(n) \gt \lceil \lg n \rceil^2$$.<sup><a href="#fn2" id="r2"></a></sup> </div> <div class="proof"> <div class="p"><span class="proof-name">(Proof.)</span> Let's first prove the following lemma: <div class="theorem"> <span class="theorem-name">(Lemma 3.)</span> Let $$n \ge 9$$ and $$b = \lceil \lg n \rceil$$. Then for $$m \ge 1$$, there exists some $$r \le b^{2m + 1}$$ such that $$o_r(n) \gt b^m$$. </div> <div class="proof"> <p><span class="proof-name">(Proof.)</span> Let $N = n \cdot (n - 1) \cdot (n^2 - 1) \dotsm (n^{b^m} - 1)\text{.}$ Note that if a prime $$r$$ does not divide $$N$$, then $$r$$ does not divide $$n$$ (so $$o_r(n)$$ is defined), and $$o_r(n) \gt b^m$$, since otherwise $$r$$ would divide $$n^{o_r(n)} - 1$$, which is one of the factors of $$N$$.
So it suffices to find some prime $$r$$ that does not divide $$N$$.</p> <p>We can see that: \begin{aligned} N &= n \cdot (n - 1) \cdot (n^2 - 1) \dotsm (n^{b^m} - 1) \\ &\lt n \cdot n \cdot n^2 \dotsm n^{b^m} \\ &= n^{1 + 1 + 2 + 3 + \dotsm + b^m} \\ &= n^{1 + b^m (b^m + 1) / 2} \\ &= n^{\frac{1}{2} b^{2m} + \frac{1}{2} b^m + 1}\text{.} \end{aligned} Furthermore, we can upper-bound the exponent of $$n$$ by $$b^{2m}$$; each of the following inequalities is equivalent to the one before it: \begin{aligned} b^{2m} &\gt \frac{1}{2} b^{2m} + \frac{1}{2} b^m + 1 \\ \frac{1}{2} b^{2m} - \frac{1}{2} b^m - 1 &\gt 0 \\ b^{2m} - b^m - 2 &\gt 0 \\ (b^m - 2) \cdot (b^m + 1) &\gt 0\text{.} \end{aligned} The last statement holds when $$b^m \gt 2$$, which is always the case since $$b \ge 4$$ (as $$n \ge 9$$) and $$m \ge 1$$.</p> <p>Applying the upper bound, along with $$n \le 2^b$$, \begin{aligned} N &\lt n^{\frac{1}{2} b^{2m} + \frac{1}{2} b^m + 1} \\ &\lt n^{b^{2m}} \\ &\le 2^{b^{2m + 1}}\text{.} \end{aligned} </p> <div class="p">We can then use the following theorem, which we'll state without proof: <div class="theorem"> <span class="theorem-name">(<a href="http://en.wikipedia.org/wiki/Primorial">Primorial</a> lower bound.)</span> For $$x \ge 31$$, the product of primes $$\le x$$ exceeds $$2^x$$.<sup><a href="#fn3" id="r3"></a></sup> That is, $x\# = \prod_{p \le x\text{, }p\text{ is prime}} p \gt 2^x\text{.}$ </div> <p>Since $$b \ge 4$$ and $$m \ge 1$$, $$b^{2m + 1} \ge 4^3 = 64 \gt 31$$, and so $$2^{b^{2m + 1}} \lt (b^{2m + 1})\#$$. Therefore, $N \lt 2^{b^{2m + 1}} \lt (b^{2m + 1})\#\text{.}$ But that implies that there is some prime number $$p_0 \le b^{2m + 1}$$ that does not divide $$N$$; if they all did, then $$N$$ would be at least their product $$(b^{2m + 1})\#$$, contradicting the inequality above. Therefore, $$o_{p_0}(n) \gt b^m$$. &#x220e;</p> </div> </div> </div> <p>We can then prove our theorem: for $$n \ge 9$$, apply Lemma 3 with $$m = 2$$, which gives some $$r \le \lceil \lg n \rceil^5$$ with $$o_r(n) \gt \lceil \lg n \rceil^2$$. Here are explicit values for the rest: for $$n = 2$$, $$r = 3$$; $$n = 3$$, $$r = 7$$; $$n \in \{ 4, 6, 7, 8\}$$, $$r = 11$$; and for $$n = 5$$, $$r = 17$$.
&#x220e;</p> </div> </div> <div class="p">Also, it turns out that for about half of all odd $$n$$, we can do better. We'll state this theorem without proof: <div class="theorem"><span class="theorem-name">(Tight upper bound for some $$r$$.)</span> Let $$n \equiv \pm 3 \pmod{8}$$. Then there exists some $$r \lt 8 \lceil \lg n \rceil^2$$ such that $$o_r(n) \gt \lceil \lg n \rceil^2$$.<sup><a href="#fn4" id="r4"></a></sup></div> </div> </section> <section> <header> <h2>7. The AKS algorithm (simple version)</h2> </header> <div class="p">Without further ado, here is a simple version of the AKS algorithm, given $$n \ge 2$$: <ol> <li>Starting from $$\lceil \lg n \rceil^2 + 2$$, find an $$r$$ such that $$\gcd(r, n) = 1$$ and $$o_r(n) \gt \lceil \lg n \rceil^2$$.</li> <li>Compute $$M = \lfloor \sqrt{r - 1} \rfloor \lceil \lg n \rceil + 1$$.</li> <li>Search for a prime factor of $$n$$ less than $$M$$, excluding $$n$$ itself. If one is found, return &ldquo;composite&rdquo;. If none are found and $$M \gt \lfloor \sqrt{n} \rfloor$$, return &ldquo;prime&rdquo;.</li> <li>For each $$1 \le a \lt M$$, compute $$(X + a)^n$$, reducing coefficients mod $$n$$ and powers mod $$r$$. If the result is not equal to $$X^{n\text{ mod }r} + a$$, return &ldquo;composite&rdquo;.</li> <li>Otherwise, return &ldquo;prime&rdquo;.</li> </ol> </div> <p>As we showed in the previous section, there always exists an $$r$$ such that $$o_r(n) \gt \lceil \lg n \rceil^2$$, so step 1 will terminate. All other steps are bounded, so the entire algorithm will always terminate.</p> <p>In step 2, since $$φ(r) \le r - 1$$, the value of $$M$$ that we compute is always greater than $$\lfloor \sqrt{φ(r)} \rfloor \lceil \lg n \rceil \ge \lfloor \sqrt{φ(r)} \rfloor \lg n$$. If step 3 returns &ldquo;prime&rdquo;, then $$n$$ has no prime factor at most $$\lfloor \sqrt{n} \rfloor$$, so $$n$$ is indeed prime; otherwise, $$n$$ has no prime factor less than $$M$$. Step 4 checks whether $$(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}$$ holds for $$1 \le a \lt M$$ (it holds trivially for $$a = 0$$). Therefore, by the strong AKS theorem, if the algorithm returns &ldquo;prime&rdquo; in step 5, then $$n$$ is prime.
Furthermore, if the algorithm returns &ldquo;composite&rdquo;, then $$n$$ is composite: either step 3 found a prime factor of $$n$$ other than $$n$$ itself, or step 4 found an $$a$$ for which $$(X + a)^n \not\equiv X^n + a \pmod{X^r - 1, n}$$, which by the weak version of Fermat's little theorem for polynomials cannot happen when $$n$$ is prime.</p> <p>Since the algorithm always terminates and it returns the correct answer when it terminates, it is <a href="http://en.wikipedia.org/wiki/Total_correctness">totally correct</a>.</p> <p>As shown in the previous section, we have to test $$O(\lg^5 n)$$ values to find a suitable $$r$$. Assuming a straightforward algorithm to compute the multiplicative order that bails out once $$\lceil \lg n \rceil^2$$ is reached, and assuming we use the division-based <a href="http://en.wikipedia.org/wiki/Euclidean_algorithm">Euclidean algorithm</a> for computing the greatest common divisor, testing each value takes $$O(\lg^2 n)$$ multiplies and $$O(\lg r) = O(\lg \lg n)$$ divisions of $$O(\lg r)$$-bit numbers. Let $$M(b)$$ be the cost to multiply two $$b$$-bit numbers. The complexity of division is asymptotically the same as multiplication, so the total cost of step 1 is $$O(\lg^5 n \cdot (\lg^2 n + \lg \lg n) \cdot M(\lg \lg n)) = O(\lg^7 n \cdot M(\lg \lg n))$$, assuming $$M(O(b)) = O(M(b))$$.</p> <p>Step 2 involves one square root, one multiplication, and one increment, all involving $$O(\lg \lg n)$$-bit numbers. The complexity of taking the square root is asymptotically the same as multiplication, so the total cost of step 2 is $$O(M(\lg \lg n))$$.</p> <p>Step 3 takes a square root and tests $$M = O(\lg^{7/2} n)$$ numbers, and each test involves dividing the $$O(\lg n)$$-bit number $$n$$ by an $$O(\lg M)$$-bit number, which takes at most $$O(\lg n)$$ steps costing $$O(M(\lg \lg n))$$ each, so the total cost of step 3 is $$O(\lg^{9/2} n \cdot M(\lg \lg n))$$.</p> <p>Steps 4 and 5 test $$O(\lg^{7/2} n)$$ polynomials. Testing each polynomial involves exponentiating it by $$n$$, reducing powers mod $$r$$ and coefficients mod $$n$$ at each step, which requires $$O(\lg n)$$ multiplications of polynomials with $$O(r)$$ coefficients each of size $$O(\lg n)$$.
The cost of multiplying two polynomials with $$s$$ coefficients of size $$b$$ is $$M(s) \cdot M(b)$$, so the total cost of steps 4 and 5 is $$O(\lg^{9/2} n \cdot M(\lg^5 n \cdot \lg n)) = O(\lg^{9/2} n \cdot M(\lg^6 n))$$, assuming $$M(a) \cdot M(b) = M(a \cdot b)$$.</p> <p>If <a href="http://en.wikipedia.org/wiki/Multiplication_algorithm#Long_multiplication">long multiplication</a> is used, then it costs $$M(b) = b^2$$, which gives a total cost of $$O(\lg^{9/2} n \cdot \lg^{12} n) = O(\lg^{33/2} n)$$ for the whole algorithm. <a href="http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm">More complicated multiplication methods</a> cost only $$M(b) = b \lg b$$, which gives a total cost of $$O(\lg^{21/2} n \cdot \lg \lg n)$$ for the whole algorithm. Either way, the AKS primality test is shown to be implementable in polynomial time.</p> <div class="p">Below is step 1 implemented in Javascript; however, here we bound $$r$$ explicitly to be able to detect bugs quickly.<sup><a href="#fn5" id="r5"></a></sup> <pre class="code-container"><code class="language-javascript">// Returns an upper bound for r such that o_r(n) > ceil(lg(n))^2 that
// is polylog in n.
function calculateAKSModulusUpperBound(n) {
  n = SNat.cast(n);
  var ceilLgN = new SNat(n.ceilLg());
  var rUpperBound = ceilLgN.pow(5).max(3);
  var nMod8 = n.mod(8);
  if (nMod8.eq(3) || nMod8.eq(5)) {
    rUpperBound = rUpperBound.min(ceilLgN.pow(2).times(8));
  }
  return rUpperBound;
}

// Returns the least r such that o_r(n) > ceil(lg(n))^2 >= ceil(lg(n)^2).
function calculateAKSModulus(n, multiplicativeOrderCalculator) {
  n = SNat.cast(n);
  multiplicativeOrderCalculator =
      multiplicativeOrderCalculator || calculateMultiplicativeOrderCRT;
  var ceilLgN = new SNat(n.ceilLg());
  var ceilLgNSq = ceilLgN.pow(2);
  var rLowerBound = ceilLgNSq.plus(2);
  var rUpperBound = calculateAKSModulusUpperBound(n);
  for (var r = rLowerBound; r.le(rUpperBound); r = r.plus(1)) {
    if (n.gcd(r).ne(1)) {
      continue;
    }
    var o = multiplicativeOrderCalculator(n, r);
    if (o.gt(ceilLgNSq)) {
      return r;
    }
  }
  throw new Error('Could not find AKS modulus');
}</code></pre> </div> <div class="p">Here is step 2 implemented in Javascript: <pre class="code-container"><code class="language-javascript">// Returns floor(sqrt(r-1)) * ceil(lg(n)) + 1 > floor(sqrt(Phi(r))) * lg(n).
function calculateAKSUpperBoundSimple(n, r) {
  n = SNat.cast(n);
  r = SNat.cast(r);
  // Use r - 1 instead of calculating Phi(r).
  return r.minus(1).floorRoot(2).times(n.ceilLg()).plus(1);
}</code></pre> </div> <div class="p">Here is part of step 3 implemented in Javascript, along with the comments for the functions used in trial division: <pre class="code-container"><code class="language-javascript">// Given a number n, a generator function getNextDivisor, and a
// processing function processPrimeFactor, factors n using the
// divisors returned by getNextDivisor and passes each prime factor
// with its multiplicity to processPrimeFactor.
//
// getNextDivisor is passed the current unfactorized part of n and it
// should return the next divisor to try, or null if there are no more
// divisors to generate (although processPrimeFactor may still be
// called). processPrimeFactor is called with each non-trivial prime
// factor and its multiplicity. If it returns a false value, it won't
// be called anymore.
function trialDivide(n, getNextDivisor, processPrimeFactor) {
  ...
}

// Returns a generator that generates primes up to 7, then odd numbers
// up to floor(sqrt(n)), using a mod-30 wheel to eliminate odd numbers
// that are known composite (roughly half).
function makeMod30WheelDivisorGenerator() {
  ...
}

// Returns the first prime factor of n that is &lt; M (excluding n itself)
// found using generator, or null if there is no such factor.
function getFirstFactorBelow(n, M, generator) {
  n = SNat.cast(n);
  M = SNat.cast(M);
  generator = generator || makeMod30WheelDivisorGenerator();
  var boundedGenerator = function(n) {
    var d = generator(n);
    return (d && d.lt(M)) ? d : null;
  };
  var factor = null;
  trialDivide(n, boundedGenerator, function(p, k) {
    if (p.lt(M.min(n))) {
      factor = p;
    }
    return false;
  });
  return factor;
}</code></pre> </div> <div class="p">Below is a function that ties steps 1 to 3 together; it is useful for testing purposes to separate it from the other steps. (Actually, we use a different function, <code>calculateAKSUpperBound()</code>, to compute $$M$$; it computes $$φ(r)$$ instead of using $$r - 1$$, so that we always have the tightest bound possible for $$M$$.) <pre class="code-container"><code class="language-javascript">// The getAKSParameters* functions below return a parameters object
// with the following fields:
//
// n: the number the parameters are for.
//
// factor: A prime factor of n. If present, the fields below may
// not be present.
//
// isPrime: if set, n is prime. If present, the fields below may
// not be present.
//
// r: the AKS modulus for n.
//
// M: the AKS upper bound for n.
function getAKSParametersSimple(n) {
  n = SNat.cast(n);
  var r = calculateAKSModulus(n);
  var M = calculateAKSUpperBound(n, r);
  var parameters = { n: n, r: r, M: M };
  var factor = getFirstFactorBelow(n, M);
  if (factor) {
    parameters.factor = factor;
  } else if (M.gt(n.floorRoot(2))) {
    parameters.isPrime = true;
  }
  return parameters;
}</code></pre> </div> <div class="p">Finally, here is step 4 implemented in Javascript: <pre class="code-container"><code class="language-javascript">// Returns whether (X + a)^n != X^n + a mod (X^r - 1, n), i.e., whether a
// witnesses that n is composite.
function isAKSWitness(n, r, a) {
  n = SNat.cast(n);
  r = SNat.cast(r);
  a = SNat.cast(a);
  // Reduce powers mod r (i.e., mod X^r - 1) and coefficients mod n.
  function reduceAKS(p) {
    return p.modPow(r).mod(n);
  }
  function prodAKS(x, y) {
    return reduceAKS(x.times(y));
  }
  var one = new SPoly(new SNat(1));
  // X^n is congruent to X^(n mod r) mod X^r - 1, so use that directly
  // instead of building the degree-n polynomial X^n.
  var xn = one.shiftLeft(n.mod(r));
  var ap = new SPoly(a);
  var lhs = one.shiftLeft(1).plus(ap).pow(n, prodAKS);
  var rhs = reduceAKS(xn.plus(ap));
  return lhs.ne(rhs);
}

// Returns the first a &lt; M that is an AKS witness for n, or null if
// there isn't one.
function getFirstAKSWitness(n, r, M) {
  for (var a = new SNat(1); a.lt(M); a = a.plus(1)) {
    if (isAKSWitness(n, r, a)) {
      return a;
    }
  }
  return null;
}</code></pre> </div> <div class="p">Here's the code that ties it all together: <pre class="code-container"><code class="language-javascript">// Returns whether n is prime or not using the AKS primality test.
function isPrimeByAKS(n) {
  n = SNat.cast(n);
  var parameters = getAKSParameters(n);
  if (parameters.factor) {
    return false;
  }
  if (parameters.isPrime) {
    return true;
  }
  return (getFirstAKSWitness(n, parameters.r, parameters.M) == null);
}</code></pre> </div> <p class="interactive-example" id="aksExample"> Let <span class="fake-katex"><var>n</var> = <input class="parameter" size="6" pattern="[0-9]*" required type="text" value="175507" data-bind="value: nStr, valueUpdate: 'afterkeydown'" /></span>.
<!-- ko template: outputTemplate --><!-- /ko --> <script type="text/html" id="aks.error.invalidN"> <span class="fake-katex"><var>n</var></span> is not a valid number. </script> <script type="text/html" id="aks.error.outOfBoundsN"> <span class="fake-katex"><var>n</var></span> must be greater than or equal to 2. </script> <script type="text/html" id="aks.success"> <span class="fake-katex">&lceil;lg <var>n</var>&rceil;</span> is <span class="fake-katex intermediate" data-bind="text: ceilLgN"></span>, <span class="fake-katex"><var>r</var> = <span class="intermediate" data-bind="text: r"></span></span> is the least value such that <span class="fake-katex">o<sub><var>r</var></sub>(<var>n</var>) = <span class="intermediate" data-bind="text: nOrder"></span> &gt; &lceil;lg <var>n</var>&rceil;<sup>2</sup> = <span class="intermediate" data-bind="text: ceilLgNSq"></span></span>, <span class="fake-katex"><var>&phi;</var>(<var>r</var>) = <span class="intermediate" data-bind="text: eulerPhiR"></span></span>, and <span class="fake-katex"><var>M</var> = &lfloor;&radic;<var>&phi;</var>(<var>r</var>)&rfloor; &sdot; &lceil;lg <var>n</var>&rceil; + 1 = <span class="intermediate" data-bind="text: M"></span> &gt; &lfloor;&radic;<var>&phi;</var>(<var>r</var>)&rfloor; &sdot; lg <var>n</var></span>. <span data-bind="if: factor()"> <span class="fake-katex"><var>n</var></span> has a factor <span class="fake-katex"><span class="intermediate" data-bind="text: factor"></span> &lt; <var>M</var></span>, so <span class="fake-katex"><var>n</var></span> is <span class="result">composite</span>.
</span> <span data-bind="if: isPrime()"> <span class="fake-katex"><var>n</var></span> has no factor <span class="fake-katex">&lt; <var>M</var></span> and <span class="fake-katex"><var>M</var> &gt; &lfloor;&radic;<var>n</var>&rfloor; = <span class="intermediate" data-bind="text: floorSqrtN"></span></span>, so <span class="fake-katex"><var>n</var></span> is <span class="result">prime</span>. </span> <span data-bind="if: !factor() && !isPrime()"> <span class="fake-katex"><var>n</var></span> has no factor <span class="fake-katex">&lt; <var>M</var></span> and <span class="fake-katex"><var>M</var> &le; &lfloor;&radic;<var>n</var>&rfloor; = <span class="intermediate" data-bind="text: floorSqrtN"></span></span>, so <span class="fake-katex"><var>n</var></span> is prime iff <span class="fake-katex">(<var>X</var> + <var>a</var>)<sup><var>n</var></sup> &equiv; <var>X</var><sup><var>n</var></sup> + <var>a</var> (mod <var>X</var><sup><var>r</var></sup> &minus; 1, <var>n</var>)</span> for <span class="fake-katex">0 &le; <var>a</var> &lt; <var>M</var></span>. </span> </script> </p> <script type="text/javascript" src="/primality-testing-polynomial-time-part-2-files/aks-example.js"></script> <p><em>(To-do: Have an interactive box to demonstrate how the per-$$a$$ AKS test works.)</em></p> </section> <section> <header> <h2>8. The AKS algorithm (improved version)</h2> </header> <div class="p">Here is a slightly more complicated version of the AKS algorithm. Again given $$n \ge 2$$: <ol> <li>Search for a prime factor of $$n$$ less than $$\lceil \lg n \rceil^2 + 2$$, excluding $$n$$ itself. If one is found, return &ldquo;composite&rdquo;.</li> <li>For each $$r$$ from $$\lceil \lg n \rceil^2 + 2$$: <ol> <li>If $$r \gt \lfloor \sqrt{n} \rfloor$$, return &ldquo;prime&rdquo;.</li> <li>If $$r$$ divides $$n$$, return &ldquo;composite&rdquo;.</li> <li>Otherwise, factorize $$r$$.</li> <li>Compute $$o_r(n)$$ using $$r$$'s prime factors.
If it is less than or equal to $$\lceil \lg n \rceil^2$$, jump back to the top of the loop with the next $$r$$.</li> <li>Otherwise, compute $$φ(r)$$ using $$r$$'s prime factors.</li> <li>Compute $$M = \lfloor \sqrt{φ(r)} \rfloor \lceil \lg n \rceil + 1$$, and break out of the loop.</li> </ol> </li> <li>For each $$1 \le a \lt M$$, compute $$(X + a)^n$$, reducing coefficients mod $$n$$ and powers mod $$r$$. If the result is not equal to $$X^{n\text{ mod }r} + a$$, return &ldquo;composite&rdquo;.</li> <li>Otherwise, return &ldquo;prime&rdquo;.</li> </ol> </div> <p>The logic of steps 1 to 3 of the simple version is essentially merged into steps 1 and 2 of this version. Each candidate $$r$$ has to be relatively prime to $$n$$, and since no smaller prime divides $$n$$ by the time $$r$$ is reached, that check amounts to testing whether $$r$$ itself divides $$n$$; the loop therefore continues the search for prime factors, so step 1 only has to look for prime factors of $$n$$ below the lower bound of $$r$$. Second, both the multiplicative order $$o_r(n)$$ and the totient $$φ(r)$$ can be computed more quickly given the prime factorization of $$r$$, so we factorize each candidate $$r$$. Third, we use $$φ(r)$$ instead of $$r - 1$$ to give a tighter bound for $$M$$.
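As a rough illustration of the second point, here is a minimal sketch of computing $$φ(r)$$ from the prime factorization of $$r$$, and then $$o_r(n)$$ by starting from $$φ(r)$$ (which $$o_r(n)$$ divides) and dividing out prime factors for as long as $$n$$ raised to the smaller exponent is still $$1$$ modulo $$r$$. It uses plain JavaScript numbers instead of <code>SNat</code>, so it assumes $$r$$ is small (below $$2^{26}$$), and the helper names <code>factorize</code>, <code>totientFromFactors</code>, <code>powMod</code>, and <code>multiplicativeOrder</code> are hypothetical; this is not necessarily how <code>calculateMultiplicativeOrderCRTFactors</code> works.

```javascript
// Hypothetical helpers for illustration. Plain numbers are used, so this
// assumes r < 2^26 so that products of two residues mod r stay exact.

// Trial-division factorization of m >= 1: returns [[prime, exponent], ...].
function factorize(m) {
  const factors = [];
  for (let p = 2; p * p <= m; p++) {
    if (m % p !== 0) continue;
    let e = 0;
    while (m % p === 0) {
      m /= p;
      e++;
    }
    factors.push([p, e]);
  }
  if (m > 1) factors.push([m, 1]);
  return factors;
}

// Euler's totient from a factorization: the product of p^(e-1) * (p - 1).
function totientFromFactors(factors) {
  return factors.reduce((acc, [p, e]) => acc * p ** (e - 1) * (p - 1), 1);
}

// b^e mod m by repeated squaring.
function powMod(b, e, m) {
  let result = 1 % m;
  b %= m;
  while (e > 0) {
    if (e % 2 === 1) result = (result * b) % m;
    b = (b * b) % m;
    e = Math.floor(e / 2);
  }
  return result;
}

// Multiplicative order of n modulo r, assuming gcd(n, r) = 1 and r >= 2.
// o_r(n) divides phi(r), so start at phi(r) and divide out each prime q
// as long as n^(e/q) is still 1 mod r.
function multiplicativeOrder(n, r, rFactors) {
  let e = totientFromFactors(rFactors);
  for (const [q] of factorize(e)) {
    while (e % q === 0 && powMod(n, e / q, r) === 1) {
      e /= q;
    }
  }
  return e;
}
```

For instance, <code>multiplicativeOrder(2, 7, factorize(7))</code> returns 3, since $$2^3 = 8 \equiv 1 \pmod{7}$$.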
Finally, the last two steps are the same as in the simple version.</p> <div class="p">Here are steps 1 and 2 of the above algorithm, implemented in Javascript: <pre class="code-container"><code class="language-javascript">function getAKSParameters(n, factorizer) {
  n = SNat.cast(n);
  factorizer = factorizer || defaultFactorizer;

  var ceilLgN = new SNat(n.ceilLg());
  var ceilLgNSq = ceilLgN.pow(2);
  var floorSqrtN = n.floorRoot(2);
  var rLowerBound = ceilLgNSq.plus(2);
  var rUpperBound = calculateAKSModulusUpperBound(n).min(floorSqrtN);

  var parameters = { n: n };

  // Step 1: look for a prime factor of n below the lower bound for r.
  var factor = getFirstFactorBelow(n, rLowerBound);
  if (factor) {
    parameters.factor = factor;
    return parameters;
  }

  // Step 2: search for the least r with o_r(n) > ceil(lg n)^2.
  for (var r = rLowerBound; r.le(rUpperBound); r = r.plus(1)) {
    // No smaller prime divides n, so r is relatively prime to n
    // unless r itself divides n.
    if (n.mod(r).isZero()) {
      parameters.factor = r;
      return parameters;
    }

    var rFactors = getFactors(r, factorizer);
    var o = calculateMultiplicativeOrderCRTFactors(n, rFactors, factorizer);
    if (o.gt(ceilLgNSq)) {
      parameters.r = r;
      parameters.M = calculateAKSUpperBoundFactors(n, rFactors);
      return parameters;
    }
  }

  // Every candidate up to floor(sqrt(n)) failed to divide n, so n is prime.
  if (rUpperBound.eq(floorSqrtN)) {
    parameters.isPrime = true;
    return parameters;
  }

  throw new Error('Could not find AKS modulus');
}</code></pre> </div> </section> <p><em>(To-do: Wrap up and lead into what will be shown in part 3.)</em></p> <hr /> <p>Like this post? Subscribe to <!-- The image is 256x256, the center of the dot is 189 pixels from the top, and the radius of the dot is 24.
Therefore, the dot is 43/256 = 0.16796875 of the image height above the bottom.--> <a href="feed/atom">my feed <img src="feed-icon.svg" alt="RSS icon" style="width: 1em; height: 1em; vertical-align: -0.16796875em;" /></a> or follow me on <a href="https://twitter.com/fakalin">Twitter <img src="twitter-icon.svg" alt="Twitter icon" style="width: 1em; height: 1em;" /></a>.</p> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> This is a version of Theorem 2 from Lenstra's paper <a href="http://www.math.leidenuniv.nl/~hwl/PUBLICATIONS/1979e/art.pdf">Miller's Primality Test</a>. <a href="#r1">↩</a></p> <p id="fn2"> We work with $$\lceil \lg n \rceil^2$$ instead of $$\lceil \lg^2 n \rceil$$ or $$\lg^2 n$$ because it is easier to compute in an actual implementation. <a href="#r2">↩</a></p> <p id="fn3"> This is exercise 1.27 from <a href="http://www.amazon.com/Prime-Numbers-A-Computational-Perspective/dp/0387252827">Prime Numbers: A Computational Perspective</a>. <a href="#r3">↩</a></p> <p id="fn4"> This is adapted from section 8.4 of Granville's <a href="http://www.dms.umontreal.ca/~andrew/PDF/Bulletin04.pdf">It is Easy to Determine Whether a Given Number is Prime</a>. <a href="#r4">↩</a></p> <p id="fn5"> The <a href="https://cdn.rawgit.com/akalin/num.js/eab08d4/simple-arith.js"><code>SNat</code></a> class used is the same as in my previous article, <a href="intro-primality-testing">An Introduction to Primality Testing</a>. <a href="#r5">↩</a></p> </section> https://www.akalin.com/primality-testing-polynomial-time-part-1 Primality Testing in Polynomial Time (Ⅰ) 2012-08-06T00:00:00-07:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <section> <header> <h2>1.
Introduction</h2> </header> <p>Exactly ten years ago, <a href="http://www.cse.iitk.ac.in/users/manindra/">Agrawal</a>, <a href="http://research.microsoft.com/en-us/people/neeraka/">Kayal</a>, and <a href="http://www.math.uni-bonn.de/people/saxena/">Saxena</a> published <a href="http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf">&ldquo;PRIMES is in P&rdquo;</a>, which described an algorithm that could provably determine whether a given number was prime or composite in polynomial time.</p> <p>The AKS algorithm is quite short, but understanding how it works via the proofs in the paper requires some mathematical sophistication. Also, some results in the last decade have simplified both the algorithm and its accompanying proofs. In this article I will explain in detail the main result of the AKS paper, and in a follow-up article I will strengthen the main result, use it to get a polynomial-time primality testing algorithm, and implement that algorithm in Javascript. If you've understood <a href="/intro-primality-testing">my introduction to primality testing</a>, you should be able to follow along.</p> <div class="p">Let's get started! The basis for the AKS primality test is the following generalization of <a href="http://en.wikipedia.org/wiki/Fermat%27s_little_theorem">Fermat's little theorem</a> to polynomials: <div class="theorem"> (<span class="theorem-name">Fermat's little theorem for polynomials, strong version</span>.) If $$n \ge 2$$ and $$a$$ is relatively prime to $$n$$, then $$n$$ is prime if and only if $(X + a)^n \equiv X^n + a \pmod{n}\text{.}$ </div> </div> <p>The form of the equation above may be unfamiliar. The polynomials in question are <a href="http://en.wikipedia.org/wiki/Polynomial_ring#The_polynomial_ring_K.5BX.5D"><em>formal polynomials</em></a>. That is, we care only about the coefficients of the polynomial and not how it behaves as a function. In this case, we restrict ourselves to polynomials with integer coefficients. 
Then we can meaningfully compare two polynomials modulo $$n$$: we consider two polynomials congruent modulo $$n$$ if their respective coefficients are all congruent modulo $$n$$. (Equivalently, two polynomials $$f(X)$$ and $$g(X)$$ are congruent modulo $$n$$ if $$f(X) - g(X) = n \cdot h(X)$$ for some polynomial $$h(X)$$.) This definition is consistent with how they behave as functions; if two polynomials $$f(X)$$ and $$g(X)$$ are congruent modulo $$n$$, then treating them as functions, $$f(x) \equiv g(x) \pmod{n}$$ for any integer $$x$$.<sup><a href="#fn1" id="r1"></a></sup></p> <div class="p">Unfortunately, this test by itself cannot give a polynomial-time algorithm, as testing even one value of $$a$$ may require looking at $$n$$ coefficients of the left-hand side. (Remember that we're interested in algorithms with time polynomial not in the input $$n$$, but in its bit length $$\lg n$$. Such an algorithm is described as having time <em>polylog in $$n$$</em>.) However, we can reduce the number of coefficients we have to look at by taking the powers of $$X$$ modulo some number $$r$$. This is equivalent to reducing the polynomials themselves modulo $$X^r - 1$$: since $$X^r \equiv 1 \pmod{X^r - 1}$$, any power $$X^k$$ can be replaced by $$X^{k \bmod r}$$. You can also see this for yourself by picking some polynomial and some value for $$r$$ and doing long division by $$X^r - 1$$ to find the remainder. (It may seem odd to reduce one polynomial modulo another, but it works just like reduction of integers.) This gives us a weaker version of the theorem above: <div class="theorem"> (<span class="theorem-name">Fermat's little theorem for polynomials, weak version</span>.) If $$n$$ is prime and $$a$$ is not a multiple of $$n$$, then for any $$r \ge 2$$ $(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}$</div> </div> <p>The &ldquo;double mod&rdquo; notation above may be unfamiliar, but in this case its meaning is simple.
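To make the double mod concrete, here is a minimal sketch that stores a polynomial as an array of $$r$$ coefficients (index $$k$$ holds the coefficient of $$X^k$$), multiplies two such polynomials by adding exponents modulo $$r$$ and coefficients modulo $$n$$, and uses repeated squaring to test whether $$(X + a)^n \equiv X^{n \bmod r} + a \pmod{X^r - 1, n}$$. The helper names are hypothetical, and it uses plain JavaScript numbers, so it assumes $$r \ge 2$$ and $$n \lt 2^{26}$$ so that products of coefficients stay exact.

```javascript
// Hypothetical helpers, not the article's code. A polynomial modulo
// (X^r - 1, n) is an array of r numbers; index k holds the coefficient
// of X^k. Assumes r >= 2 and n < 2^26 so that n * n stays exact.

// Multiplies f and g modulo (X^r - 1, n).
function polyMulMod(f, g, r, n) {
  const result = new Array(r).fill(0);
  for (let i = 0; i < r; i++) {
    if (f[i] === 0) continue;
    for (let j = 0; j < r; j++) {
      const k = (i + j) % r; // X^(i + j) = X^k because X^r = 1
      result[k] = (result[k] + f[i] * g[j]) % n;
    }
  }
  return result;
}

// Computes (X + a)^n modulo (X^r - 1, n) by repeated squaring.
function powXPlusA(a, n, r) {
  let base = new Array(r).fill(0);
  base[0] = a % n; // the constant term a
  base[1] = 1;     // the X term
  let result = new Array(r).fill(0);
  result[0] = 1 % n; // the constant polynomial 1
  for (let e = n; e > 0; e = Math.floor(e / 2)) {
    if (e % 2 === 1) result = polyMulMod(result, base, r, n);
    base = polyMulMod(base, base, r, n);
  }
  return result;
}

// Tests (X + a)^n = X^(n mod r) + a (mod X^r - 1, n).
function satisfiesCongruence(a, n, r) {
  const lhs = powXPlusA(a, n, r);
  const rhs = new Array(r).fill(0);
  rhs[n % r] = 1;
  rhs[0] = (rhs[0] + a) % n;
  return lhs.every((c, k) => c === rhs[k]);
}
```

For example, <code>satisfiesCongruence(1, 7, 3)</code> is true since 7 is prime, while <code>satisfiesCongruence(1, 9, 10)</code> is false: with $$r \gt n$$ nothing is reduced, and the coefficient of $$X^3$$ in $$(X + 1)^9$$ is $${9 \choose 3} = 84 \equiv 3 \pmod{9}$$.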
We consider two polynomials congruent modulo $$X^r - 1, n$$ when they are congruent modulo $$n$$ after reducing the powers of $$X$$ modulo $$r$$ and combining like terms. More generally, two polynomials $$f(X)$$ and $$g(X)$$ are congruent modulo $$n(X), n$$ if $$f(X) - g(X) \equiv n(X) \cdot h(X) \pmod{n}$$ for some polynomial $$h(X)$$.</p> <!-- TODO(akalin): Put interactive applet for the condition here. --> <p>With this theorem, we only have to compare $$r$$ coefficients, but we introduce the possibility of the condition above being met even when $$n$$ is composite. But can we impose conditions on $$r$$ and $$a$$ such that if the condition holds for a number of pairs of $$r$$ and $$a$$ that is polylog in $$n$$, we can be sure that $$n$$ is prime? The answer is &ldquo;yes&rdquo;; in particular, we can find a single $$r$$ and an upper bound $$M$$ polylog in $$n$$ such that if the condition holds for $$r$$ and $$0 \le a \lt M$$, then $$n$$ is prime.</p> <p>In the remainder of this article, we'll work backwards. That is, we'll first assume we have some $$n \ge 2$$, $$r \ge 2$$, and $$M \ge 1$$ such that for all $$0 \le a \lt M$$ $(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}$ Then we'll assume that $$n$$ is not a power of one of its prime divisors $$p$$ and try to deduce the conditions this imposes on $$n$$, $$r$$, $$M$$, and $$p$$. Taking the contrapositive then gives conditions on $$n$$, $$r$$, $$M$$, and $$p$$ that force $$n$$ to be a power of $$p$$. Since we can easily test whether $$n$$ is a <a href="http://en.wikipedia.org/wiki/Perfect_power">perfect power</a>, if it isn't one, we can immediately conclude that $$n = p^1$$ and is thus prime. (Of course, if it does turn out to be a perfect power, then it is trivially composite.)</p> <p>To understand the conditions that we will derive, we must first talk about <em>introspective numbers</em>.</p> </section> <section> <header> <h2>2.
Introspective numbers</h2> </header> <p>Given a base $$b$$, a polynomial $$g(X)$$ and a number $$q$$, we call $$q$$ <em>introspective</em><sup><a href="#fn2" id="r2"></a></sup> for $$g(X)$$ modulo $$b$$ if $g(X)^q \equiv g(X^q) \pmod{b}\text{.}$</p> <p>We also say that $$g(X)$$ is <em>introspective</em> under $$q$$ modulo $$b$$.</p> <p>A basic property of introspective numbers and polynomials is that they are closed under multiplication. That is, if $$q_1$$ and $$q_2$$ are introspective for $$g(X)$$ modulo $$b$$, then $$q_1 \cdot q_2$$ is also introspective for $$g(X)$$ modulo $$b$$, and if $$g_1(X)$$ and $$g_2(X)$$ are introspective under $$q$$ modulo $$b$$, then $$g_1(X) \cdot g_2(X)$$ is also introspective under $$q$$ modulo $$b$$.</p> <p>In particular, given our assumptions above, we can easily see that $$1$$, $$p$$, and $$n$$ are introspective for $$X + a$$ modulo $$X^r - 1, p$$ for any $$0 \le a \lt M$$. (For $$p$$, this is because $$(X + a)^p \equiv X^p + a \pmod{p}$$; for $$n$$, it is because the assumed congruence modulo $$X^r - 1, n$$ also holds modulo $$X^r - 1, p$$.) We can also show that $$n/p$$ is introspective for $$X + a$$ modulo $$X^r - 1, p$$. Using closure under multiplication, we can talk about the set of numbers generated by $$p$$ and $$n/p$$, which are all introspective for $$X + a$$ modulo $$X^r - 1, p$$. Call this set $$I$$:</p> $I = \left\{ p^i \left( n/p \right)^j \mid i, j \ge 0 \right\}\text{.}$ <p>We can also take the closure of all $$X + a$$ to get a set of polynomials which are all introspective under $$p$$, $$n/p$$, or any number in $$I$$. Call this set $$P$$: $P = \left\{ 0 \right\} \cup \left\{ X^{e_0} \cdot (X + 1)^{e_1} \dotsm (X + M - 1)^{e_{M - 1}} \mid e_0, e_1, \dotsc, e_{M - 1} \ge 0 \right\}\text{.}$ To summarize, $$I$$ is a set of numbers and $$P$$ is a set of polynomials such that for any $$i \in I$$ and $$g(X) \in P$$, $$i$$ is introspective for $$g(X)$$ modulo $$X^r - 1, p$$. Of course, it's still not clear what these two sets have to do with whether $$n$$ is prime or not.
But we will examine certain finite sets related to $$I$$ and $$P$$ and their sizes, and we will see that we can deduce their properties depending on the relation of $$n$$ to $$p$$.</p> </section> <section> <header> <h2>3. Bounds on finite sets related to $$I$$ and $$P$$</h2> </header> <p>Now we're ready to work towards finding our restrictions on $$n$$, $$r$$, $$M$$, and $$p$$. We'll slowly build them up such that when the last one falls into place, we know that $$n$$ is a power of $$p$$. Here's what we're starting with:</p> <div class="insert"> $$n \ge 2$$, <br/> $$r \ge 2$$, <br/> $$M \ge 1$$, <br/> $$p$$ is a prime divisor of $$n$$. </div> <p>Let us restrict $$I$$ to a finite set by bounding the exponents of $$p$$ and $$n/p$$: $I_k = \left\{ p^i (n/p)^j \mid 0 \le i, j \lt k \right\} \subset I\text{.}$</p> <p>Notice that if $$n$$ is not a power of $$p$$, then all members of $$I_k$$ are distinct, since an equality $$p^i (n/p)^j = p^{i'} (n/p)^{j'}$$ with $$(i, j) \ne (i', j')$$ would force $$n/p$$, and hence $$n$$, to be a power of $$p$$. Therefore we can easily calculate its size:<sup><a href="#fn3" id="r3"></a></sup> $|I_k| = k^2\text{.}$</p> <p>Let's also restrict $$P$$ to a finite set by bounding the degrees of its polynomials: $P_d = \left\{ g \in P \mid \deg(g) \lt d \right\} \subset P\text{.}$</p> <p>We can calculate $$|P_d|$$ exactly,<sup><a href="#fn4" id="r4"></a></sup> but we only need a lower bound for when $$d \le M$$. In that case, the $$d$$ factors $$X, X + 1, \dotsc, X + d - 1$$ are all available in $$P$$; consider the products of distinct factors taken from these. Each factor is either present or not present, but not all of them can be present at the same time (that product would have degree $$d$$), so there are $$2^d - 1$$ such products, all distinct and of degree less than $$d$$. Adding the zero polynomial gives a set $$P_d^{\{0, 1\}}$$ with $$|P_d^{\{0, 1\}}| = 2^d$$. Since $$P_d^{\{0, 1\}}$$ is a subset of $$P_d$$, $$|P_d| \ge |P_d^{\{0, 1\}}| = 2^d$$.
Therefore, if $$d \le M$$, then<sup><a href="#fn5" id="r5"></a></sup> $|P_d| \ge 2^d\text{.}$ This will be useful later (for a particular value of $$d$$), so let's add the restriction to $$M$$: </p> <div class="insert"> $$n \ge 2$$, <br/> $$r \ge 2$$, <br/> <em>$$M \ge d$$</em>, <br/> $$p$$ is a prime divisor of $$n$$. </div> <p>Let us restrict $$I$$ in a different way, by reducing modulo $$r$$: $J = \left\{ x \bmod r \mid x \in I \right\}$ and let $$t = |J|$$. (This size will play an important role later.)</p> <p>The final set we're interested in needs some background to define. We want to map the polynomials in $$P$$ into some field $$F$$, because fields have some convenient properties that we will use later.<sup><a href="#fn6" id="r6"></a></sup></p> <p>Consider $$\mathbb{Z}/p\mathbb{Z}$$, the ring of <a href="http://en.wikipedia.org/wiki/Integers_modulo_n#Integers_modulo_n">integers modulo $$p$$</a>. Since $$p$$ is prime, it is also a field. In particular, it is the <a href="http://en.wikipedia.org/wiki/Finite_field">finite field</a> $$\mathbb{F}_p$$ of order $$p$$. Then consider $$\mathbb{F}_p[X]$$, its <a href="http://en.wikipedia.org/wiki/Polynomial_ring">polynomial ring</a>, which is the set of polynomials with coefficients in $$\mathbb{F}_p$$. Given some polynomial $$q(X) \in \mathbb{F}_p[X]$$, we can further reduce modulo $$q(X)$$ to get $$\mathbb{F}_p[X] / q(X)$$.
Finally, if $$q(X)$$ is <a href="http://en.wikipedia.org/wiki/Irreducible_polynomial">irreducible</a> over $$\mathbb{F}_p$$, then $$\mathbb{F}_p[X] / q(X)$$ is also a field.</p> <p>(We can show that both $$\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$$ and $$\mathbb{F}_p[X] / q(X)$$ are fields from the same general theorem of rings: if $$R$$ is a <a href="http://en.wikipedia.org/wiki/Principal_ideal_domain">principal ideal domain</a> and $$(c)$$ is the <a href="http://en.wikipedia.org/wiki/Two-sided_ideal#Ideal_generated_by_a_set">two-sided ideal generated by $$c$$</a>, then the <a href="http://en.wikipedia.org/wiki/Quotient_ring">quotient ring</a> $$R / (c)$$ is a field if and only if $$c$$ is a <a href="http://en.wikipedia.org/wiki/Prime_element">prime element</a> of $$R$$.)<sup><a href="#fn7" id="r7"></a></sup></p> <p>So we just need to find a polynomial that's irreducible over $$\mathbb{F}_p$$. We know that $$X^r - 1$$ has $$Φ_r(X)$$, the $$r$$th <a href="http://en.wikipedia.org/wiki/Cyclotomic_polynomial">cyclotomic polynomial</a>, as a factor. $$Φ_r(X)$$ is irreducible over $$\mathbb{Z}$$, but not necessarily over $$\mathbb{F}_p$$. But if $$r$$ is relatively prime to $$p$$, then $$Φ_r(X)$$ factors into irreducible polynomials all of degree $$o_r(p)$$ (the <a href="http://en.wikipedia.org/wiki/Multiplicative_order">multiplicative order</a> of $$p$$ modulo $$r$$) over $$\mathbb{F}_p$$.<sup><a href="#fn8" id="r8"></a></sup> Then we can just require that $$r$$ be relatively prime to $$p$$. If we do so, then we can let $$h(X)$$ be one of the factors of $$Φ_r(X)$$ over $$\mathbb{F}_p$$ and we have our field $$F = \mathbb{F}_p[X] / h(X)$$.</p> <div class="insert"> $$n \ge 2$$, <br/> $$r \ge 2$$, <em>$$r$$ relatively prime to $$p$$</em>,<br/> $$M \ge d$$, <br/> $$p$$ is a prime divisor of $$n$$. </div> <p>Finally, we can define our last set. 
Let $Q = \left\{ f(X) \bmod (h(X), p) \mid f(X) \in P \right\} \subseteq F\text{.}$</p> <p>We can map elements of $$P$$ into $$Q$$ via reduction modulo $$(h(X), p)$$. But we're interested only in the elements of $$P$$ that map to distinct elements of $$Q$$, since that will let us find a lower bound for $$|Q|$$. A simple example would be the set of $$X + a$$ for $$0 \le a \lt M$$; if the degree of $$h(X)$$ is greater than $$1$$ and $$p \ge M$$, then each $$X + a$$ is distinct in $$Q$$.</p> <p>Another interesting set is $$X^k$$ for $$1 \le k \le r$$. Since $$h(X) \equiv 0 \pmod{h(X), p}$$, we can say that $$X$$ is a root of the polynomial function $$h(y)$$ over the field $$F$$. But since $$h(y)$$ is a factor of $$Φ_r(y)$$, $$X$$ is then a primitive $$r$$th root of unity in $$Q$$.<sup><a href="#fn9" id="r9"></a></sup> But the powers of a primitive $$r$$th root of unity (from $$1$$ to $$r$$) are all distinct. Therefore all $$X^k$$ for $$1 \le k \le r$$ are distinct in $$Q$$.</p> <p>Most importantly, we can show that distinct elements in $$P_d$$ map to distinct elements in $$Q$$ if $$d \le t$$. Let $$f(X)$$ and $$g(X)$$ be two different elements of $$P_d$$, and assume that $$f(X) \equiv g(X) \pmod{h(X), p}$$. For $$m \in I$$, introspection gives $f(X^m) \equiv f(X)^m \pmod{X^r - 1, p}$ and $g(X^m) \equiv g(X)^m \pmod{X^r - 1, p}\text{.}$ Since $$h(X)$$ divides $$X^r - 1$$ over $$\mathbb{F}_p$$, both congruences also hold modulo $$h(X), p$$, and raising $$f(X) \equiv g(X) \pmod{h(X), p}$$ to the $$m$$th power then gives $f(X^m) \equiv f(X)^m \equiv g(X)^m \equiv g(X^m) \pmod{h(X), p}\text{.}$ Therefore, all $$X^m$$ for $$m \in I$$ are roots of the polynomial function $$u(y) = f(y) - g(y)$$ over the field $$F$$. Since $$X^r \equiv 1$$ in $$F$$, $$X^m$$ depends only on $$m \bmod r$$, so these roots are exactly the $$X^j$$ for $$j \in J$$, and they are all distinct in $$Q$$ by the argument above, giving $$t$$ distinct roots. Moreover, $$u(y)$$ is not the zero polynomial: when $$p \ge M$$, the linear factors $$y + a$$ remain distinct modulo $$p$$, so by unique factorization $$f(y)$$ and $$g(y)$$ remain different over $$\mathbb{F}_p$$. Therefore, $$u(y)$$ must have degree at least $$t$$, since a nonzero polynomial over a field cannot have more roots than its degree. But the degree of $$u(y)$$ is less than $$d$$ since both $$f(y)$$ and $$g(y)$$ have degree less than $$d$$.
Since $$d \le t$$, this is a contradiction, so $$f(X) \not\equiv g(X) \pmod{h(X), p}$$. But since $$f(X)$$ and $$g(X)$$ were arbitrary, that implies that distinct elements of $$P_d$$ map to distinct elements of $$Q$$ for $$d \le t$$.</p> <p>Given the above, we can conclude that as long as we require that $$d \le t$$, $$p \ge M$$, and $$o_r(p) = \deg(h(X)) \gt 1$$, we have $|Q| \ge |P_d| \ge 2^d\text{.}$</p> <div class="insert"> $$n \ge 2$$, <br/> <em>$$o_r(p) \gt 1$$</em>,<br/> $$M \ge d$$, <br/> <em>$$t \ge d$$</em>,<br/> <em>$$p \ge M$$</em>, $$p$$ is a prime divisor of $$n$$. </div> </section> <section> <header> <h2>4. The AKS theorem (weak version)</h2> </header> <p>We're finally ready to put it all together. Again assume $$n$$ is not a power of $$p$$, and recall that $$|J| = t$$. Let $$s$$ be an integer with $$s \gt \sqrt{t}$$. Then $$|I_s| = s^2 \gt t$$. By the <a href="http://en.wikipedia.org/wiki/Pigeonhole_principle">pigeonhole principle</a>, there must be two distinct elements $$m_1, m_2 \in I_s$$ that map to the same element in $$J$$; that is, $$m_1 \equiv m_2 \pmod{r}$$. Now pick some $$g(X)$$ from $$P$$. Then $g(X)^{m_1} \equiv g(X^{m_1}) \pmod{X^r - 1, p}$ and $g(X)^{m_2} \equiv g(X^{m_2}) \pmod{X^r - 1, p}$ by introspection. But $$X^{m_1} \equiv X^{m_2} \pmod{X^r - 1}$$ since $$m_1 \equiv m_2 \pmod{r}$$, so $g(X^{m_1}) \equiv g(X^{m_2}) \pmod{X^r - 1, p}\text{.}$ Chaining all these congruences together lets us deduce that $g(X)^{m_1} \equiv g(X)^{m_2} \pmod{X^r - 1, p}\text{,}$ and since $$h(X)$$ divides $$X^r - 1$$ over $$\mathbb{F}_p$$, this immediately leads to $g(X)^{m_1} \equiv g(X)^{m_2} \pmod{h(X), p}\text{.}$ </p> <p>That means that $$g(X) \bmod (h(X), p) \in Q$$ is a root of the polynomial function $$u(y) = y^{m_1} - y^{m_2}$$ over the field $$F$$. But $$g(X)$$ was picked arbitrarily from $$P$$, so $$u(y)$$ has at least $$|Q|$$ roots.
$$\deg(u(y)) = \max(m_1, m_2) \le p^{s-1} \cdot (n/p)^{s-1} = n^{s-1}$$, and $$u(y)$$, being a nonzero polynomial over a field (nonzero because $$m_1 \ne m_2$$), cannot have more roots than its degree, so if $$n$$ is not a power of $$p$$, then $$|Q| \le n^{s-1}$$. Equivalently, if $$|Q| \gt n^{s-1}$$, then $$n$$ must be a power of $$p$$.<sup><a href="#fn10" id="r10"></a></sup> But we've shown above that $$|Q| \ge 2^d$$ for $$d \le t$$, so if we can pick $$d$$ and $$s$$ such that $$2^d \gt n^{s-1}$$, then we can force $$n$$ to be a power of $$p$$. Taking logs, we see that this is equivalent to picking $$d$$ and $$s$$ such that $$d \gt (s - 1) \lg n$$. Since $$d \le t$$, this imposes $$t \gt (s - 1) \lg n$$ in order for there to be room to pick $$d$$. Rearranging, we get $$s \lt \frac{t}{\lg n} + 1$$. But $$s \gt \sqrt{t}$$, so this imposes $$\sqrt{t} \lt \frac{t}{\lg n} + 1$$ in order for there to be room to pick $$s$$. Rearranging again, we get $$\frac{t}{\sqrt{t} - 1} \gt \lg n$$. Since $$\frac{t}{\sqrt{t} - 1} \gt \sqrt{t}$$, it suffices to require that $$t \gt \lg^2 n$$ in order for there to be room to pick $$d$$ and $$s$$. Furthermore, since $$s$$ has to be an integer greater than $$\sqrt{t}$$, the smallest choice is $$s = \lfloor \sqrt{t} \rfloor + 1$$, which requires $$d \gt \lfloor \sqrt{t} \rfloor \lg n$$. Let's update our assumptions:</p> <div class="insert"> $$n \ge 2$$, <br/> $$o_r(p) \gt 1$$,<br/> <em>$$M \ge d \gt \lfloor \sqrt{t} \rfloor \lg n$$</em>,<br/> <em>$$t \gt \lg^2 n$$</em>,<br/> $$p \ge M$$, $$p$$ is a prime divisor of $$n$$. </div> <p>So to summarize, if we make the above assumptions, we can pick $$d$$ and $$s$$ such that $$|Q| \ge 2^d \gt n^{s - 1}$$, which implies that $$n$$ must be a power of $$p$$, which was our goal. Now we just have to express all assumptions in terms of $$n$$, $$r$$, and $$M$$, strengthening them if necessary.
$$J$$ is generated by $$p$$ and $$n/p$$, and it contains every power of $$n = p \cdot (n/p)$$ modulo $$r$$, so its size $$t$$ is at least $$o_r(n)$$ (this brings along the assumption that $$r$$ and $$n$$ are relatively prime, which also makes $$r$$ relatively prime to $$p$$). Therefore, we can replace the assumption $$t \gt \lg^2 n$$ with $$o_r(n) \gt \lg^2 n$$. We can also drop $$o_r(p) \gt 1$$: it was only used to show that the $$X + a$$ are distinct in $$Q$$, which already follows from $$p \ge M$$, since any two of them differ by a constant that is nonzero modulo $$p$$. We can remove the reference to $$d$$ by bounding $$t$$ from above. Since $$r$$ is relatively prime to $$n$$, $$J$$ is a subgroup of the multiplicative group $$(\mathbb{Z}/r\mathbb{Z})^*$$, whose order is $$φ(r)$$, and therefore $$t$$ divides (and so is at most) $$φ(r)$$. So we can replace $$M \ge d \gt \lfloor \sqrt{t} \rfloor \lg n$$ with $$M \gt \lfloor \sqrt{φ(r)} \rfloor \lg n$$. Finally, we can remove the reference to $$p$$ by mandating that $$n$$ has no prime factor less than $$M$$. Here are our final assumptions:</p> <div class="insert"> $$n \ge 2$$, <em>$$n$$ has no prime factors less than $$M$$</em>,<br/> <em>$$o_r(n) \gt \lg^2 n$$</em>,<br/> <em>$$M \gt \lfloor \sqrt{φ(r)} \rfloor \lg n$$</em>.<br/> </div> <div class="p">We can summarize the above discussion in the following theorem: <div class="theorem"> (<span class="theorem-name">AKS theorem, weak version</span>.) Let $$n \ge 2$$, $$r$$ be relatively prime to $$n$$ with $$o_r(n) \gt \lg^2 n$$, and $$M \gt \lfloor \sqrt{φ(r)} \rfloor \lg n$$. Furthermore, let $$n$$ have no prime factor less than $$M$$ and let $(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}$ for $$0 \le a \lt M$$. Then $$n$$ is the power of some prime $$p \ge M$$.</div> </div> <p>And that's it for now! In the follow-up article we will strengthen this theorem to further show that $$n$$ is equal to $$p$$, and therefore prime. Then we will use this result to get a primality-testing algorithm that we will prove to be polynomial time.</p> </section> <hr /> <p>Like this post? Subscribe to <!-- The image is 256x256, the center of the dot is 189 pixels from the top, and the radius of the dot is 24.
Therefore, the dot is 43/256 = 0.16796875 of the image height above the bottom.--> <a href="feed/atom">my feed <img src="feed-icon.svg" alt="RSS icon" style="width: 1em; height: 1em; vertical-align: -0.16796875em;" /></a> or follow me on <a href="https://twitter.com/fakalin">Twitter <img src="twitter-icon.svg" alt="Twitter icon" style="width: 1em; height: 1em;" /></a>.</p> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> We use uppercase letters for variables when we treat polynomials as formal polynomials and lowercase letters when we treat them as functions. <a href="#r1">↩</a></p> <p id="fn2"> The term &ldquo;introspection&rdquo;, which comes from the original AKS paper, was probably chosen to evoke the idea that the exponent $$q$$ can be pushed into and pulled out of $$g(X)$$. Here we generalize it a bit. <a href="#r2">↩</a></p> <p id="fn3"> This condition is too weak to be useful by itself, but we will parlay it into something we can use later. <a href="#r3">↩</a></p> <p id="fn4"> Using the ideas on <a href="http://www.johndcook.com/TwelvefoldWay.pdf">this page</a>, we can show that $$|P_d| = {M + d - 1 \choose d - 1} + 1$$ by considering each $$X + a$$ a labeled urn (plus a &ldquo;dummy&rdquo; urn) and each of the $$d - 1$$ units of degree an unlabeled ball, with the extra $$1$$ counting the zero polynomial. (This was used in the AKS paper.) <a href="#r4">↩</a></p> <p id="fn5"> This lower bound, as well as other ideas that simplify the proof, was taken from <a href="http://www.amazon.com/Prime-Numbers-A-Computational-Perspective/dp/0387252827">Prime Numbers: A Computational Perspective</a>. <a href="#r5">↩</a></p> <p id="fn6"> You may first want to brush up on the definitions of <a href="http://en.wikipedia.org/wiki/Group_(mathematics)">group</a>, <a href="http://en.wikipedia.org/wiki/Ring_(mathematics)">ring</a>, and <a href="http://en.wikipedia.org/wiki/Field_(mathematics)">field</a>, and the differences between them.
<a href="#r6">↩</a></p> <p id="fn7"> This is Theorem 1.47(iv) from &ldquo;<a href="http://www.amazon.com/Introduction-Finite-Fields-their-Applications/dp/0521460948">Introduction to finite fields and their applications</a>&rdquo;. <a href="#r7">↩</a></p> <p id="fn8"> The reducibility of $$Φ_r(X)$$ over $$\mathbb{F}_p$$ given $$r$$ relatively prime to $$p$$ is Theorem 2.47(ii) from &ldquo;<a href="http://www.amazon.com/Introduction-Finite-Fields-their-Applications/dp/0521460948">Introduction to finite fields and their applications</a>&rdquo;. <a href="#r8">↩</a></p> <p id="fn9"> It's a bit weird to talk about a polynomial being the root of other polynomials, but recall that we can form a polynomial ring over any field, even a field of polynomials. We keep track of which polynomials belong to which domains by using different variables. <a href="#r9">↩</a></p> <p id="fn10"> Here's where we force $$n$$ to be a prime power. <a href="#r10">↩</a></p> </section> https://www.akalin.com/intro-primality-testing An Introduction to Primality Testing 2012-07-08T00:00:00-07:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.0/knockout-min.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/simple-arith.js"></script> <script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/primality-testing.js"></script> <p>I will explain two commonly-used primality tests: Fermat and Miller-Rabin. Along the way, I will cover the basic concepts of primality testing. I won't be assuming any background in number theory, but familiarity with <a href="http://en.wikipedia.org/wiki/Modular_arithmetic">modular arithmetic</a> will be helpful. I will also be providing implementations in Javascript, so <a href="https://developer.mozilla.org/en/JavaScript">familiarity with it</a> will also be helpful. 
Finally, since Javascript doesn't natively support arbitrary-precision arithmetic, I wrote a simple natural number class (<a href="https://cdn.rawgit.com/akalin/num.js/eab08d4/simple-arith.js"><code>SNat</code></a>) that represents a number as an array of decimal digits. All algorithms used are the simplest possible, except when a more efficient one is needed by the algorithms we discuss.</p> <p>Primality testing is the problem of determining whether a given natural number is prime or composite. Compared to the problem of <a href="http://en.wikipedia.org/wiki/Integer_factorization">integer factorization</a>, which is to determine the prime factors of a given natural number, primality testing turns out to be easier; integer factorization (in its decision form) is in <a href="http://en.wikipedia.org/wiki/NP_(complexity)">NP</a> but is thought to be neither in <a href="http://en.wikipedia.org/wiki/P_(complexity)">P</a> nor <a href="http://en.wikipedia.org/wiki/NP-complete">NP-complete</a>, whereas primality testing is <a href="http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf">now known to be in P</a>.</p> <p>Most primality tests are actually compositeness tests; they involve finding <em>composite witnesses</em>, which are numbers that, along with a given number to be tested, can be fed to some easily-computable function to prove that the given number is composite. (The composite witness, along with the function, is a <em>certificate of compositeness</em> of the given number.) A primality test can either check each possible witness or, like the Fermat and Miller-Rabin tests, it can randomly sample some number of possible witnesses and call the number prime if none turn out to be witnesses.
In the latter case, there is a chance that a composite number can erroneously be called prime; ideally, this chance goes to zero quickly as the sample size increases.</p> <p>The simplest possible witness type is, of course, a factor of the given number, which we'll call a <em>factor witness</em>. If the number to be tested is $$n$$ and the possible factor witness is $$a$$, then one can simply test whether $$a$$ divides $$n$$ (written as $$a \mid n$$) by checking whether $$n \bmod a = 0$$; that is, whether the remainder of $$n$$ divided by $$a$$ is zero. This doesn't yield a feasible deterministic primality test, though, since checking all possible witnesses is equivalent to factoring the given number. Nor does it yield a feasible probabilistic primality test, since in the worst case the given number has very few factors, which random sampling would miss.</p> <div class="p">The simplest useful witness type is a <em>Fermat witness</em>, which relies on the following theorem of Fermat: <div class="theorem"> (<span class="theorem-name">Fermat's little theorem</span>.) If $$n$$ is prime and $$a$$ is not a multiple of $$n$$, then $a^{n-1} \equiv 1 \pmod{n}\text{.}$ </div> </div> <p>Thus, a Fermat witness is a number $$1 \lt a \lt n$$ such that $$a^{n-1} \not\equiv 1 \pmod{n}$$. On the other hand, if $$n$$ is composite and $$a^{n-1} \equiv 1 \pmod{n}$$, then $$a$$ is a <em>Fermat liar</em>.</p> <p class="interactive-example" id="fermatExample"> Let <span class="fake-katex"><var>n</var> = <input class="parameter" size="6" pattern="[0-9]*" required type="text" value="355207" data-bind="value: nStr, valueUpdate: 'afterkeydown'" /></span> and <span class="fake-katex"><var>a</var> = <input class="parameter" size="6" pattern="[0-9]*" required type="text" value="2" data-bind="value: aStr, valueUpdate: 'afterkeydown'" /></span>.
<!-- ko template: outputTemplate --><!-- /ko --> <script type="text/html" id="fermat.error.invalidN"> <span class="fake-katex"><var>n</var></span> is not a valid number. </script> <script type="text/html" id="fermat.error.invalidA"> <span class="fake-katex"><var>a</var></span> is not a valid number. </script> <script type="text/html" id="fermat.error.outOfBoundsN"> <span class="fake-katex"><var>n</var></span> must be greater than <span class="fake-katex">2</span>. </script> <script type="text/html" id="fermat.error.outOfBoundsA"> <span class="fake-katex"><var>a</var></span> must be greater than <span class="fake-katex">1</span> and less than <span class="fake-katex"><var>n</var></span>. </script> <script type="text/html" id="fermat.success"> Then <span class="fake-katex"><var>a</var><sup><var>n</var>&minus;1</sup> &equiv; <span class="intermediate" data-bind="text: r"></span> <span data-bind="if: r() && r().ne(1)">&equiv;&#824; 1</span> (mod <var>n</var>)</span> so therefore <span class="fake-katex"><var>n</var></span> is <span data-bind="if: isCompositeByFermat()"> <span class="result">composite</span>. <span data-bind="if: r() && r().isZero()"> Furthermore, <span class="fake-katex">gcd(<var>a</var>, <var>n</var>) = <span class="intermediate" data-bind="text: k"></span></span> is a non-trivial factor of <span class="fake-katex"><var>n</var></span>. </span> </span> <span data-bind="ifnot: isCompositeByFermat()"> either <span class="result">prime</span> or a <span class="result">Fermat pseudoprime base <span class="fake-katex"><var>a</var></span></span>. </span> </script> </p> <script type="text/javascript" src="/intro-primality-testing-files/fermat-example.js"></script> <p>If $$n$$ has at least one Fermat witness that is relatively prime to $$n$$, then we can show that at least half of all possible witnesses are Fermat witnesses. (Roughly, if $$a$$ is such a witness and $$a_1, a_2, \dotsc, a_s$$ are the Fermat liars, then the products $$a \cdot a_i \bmod n$$ are $$s$$ distinct Fermat witnesses, so there are at least as many witnesses as liars.) 
Therefore, for a sample of $$k$$ possible witnesses of $$n$$, the probability of all of them being Fermat liars is $$\le 2^{-k}$$, which goes to zero quickly enough to be practical.</p> <p>However, there is the possibility that $$n$$ is a composite number with no relatively prime Fermat witnesses. These are called <a href="http://en.wikipedia.org/wiki/Carmichael_numbers"><em>Carmichael numbers</em></a>. Even though Carmichael numbers are rare, their existence still makes the Fermat primality test unsuitable for some situations, as when the numbers to be tested are provided by some adversary.</p> <div class="p">Here is the Fermat compositeness test implemented in Javascript: <pre class="code-container"><code class="language-javascript">// Runs the Fermat compositeness test given n > 2 and 1 &lt; a &lt; n.
// Calculates r = a^{n-1} mod n and whether a is a Fermat witness to n
// (i.e., r != 1, which means n is composite). Returns a dictionary
// with a, n, r, and isCompositeByFermat, which is true iff a is a
// Fermat witness to n.
function testCompositenessByFermat(n, a) {
  n = SNat.cast(n);
  a = SNat.cast(a);
  if (n.le(2)) {
    throw new RangeError('n must be > 2');
  }
  if (a.le(1) || a.ge(n)) {
    throw new RangeError('a must satisfy 1 &lt; a &lt; n');
  }
  var r = a.powMod(n.minus(1), n);
  var isCompositeByFermat = r.ne(1);
  return {
    a: a,
    n: n,
    r: r,
    isCompositeByFermat: isCompositeByFermat
  };
}</code></pre> Note that the algorithm depends on the efficiency of <a href="http://en.wikipedia.org/wiki/Modular_exponentiation"><em>modular exponentiation</em></a> when calculating $$a^{n-1} \pmod{n}$$. The naive method is unsuitable since it requires $$Θ(n)$$ $$b$$-bit multiplications, where $$b = \lceil \lg n \rceil$$. 
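<p>To make the contrast with the naive method concrete, here is a minimal sketch of repeated squaring. It assumes a modern JavaScript engine with the built-in <code>BigInt</code> type and uses that instead of <code>SNat</code>; the names <code>modPow</code>, <code>gcd</code>, and <code>countCoprimeFermatWitnesses</code> are hypothetical helpers, not part of <code>SNat</code>'s API. As a usage example, it checks that $$561 = 3 \cdot 11 \cdot 17$$, the smallest Carmichael number, has no Fermat witness relatively prime to it, while a small composite like $$15$$ does.</p>

```javascript
// Modular exponentiation by repeated squaring (right-to-left binary
// method) on BigInt values: returns base^exp mod m using Θ(lg exp)
// multiplications instead of the naive Θ(exp).
function modPow(base, exp, m) {
  if (m === 1n) return 0n;
  let result = 1n;
  base %= m;
  while (exp > 0n) {
    // If the lowest bit of exp is set, multiply the current power in.
    if (exp & 1n) result = (result * base) % m;
    // Square for the next bit: base becomes base^(2^k) mod m.
    base = (base * base) % m;
    exp >>= 1n;
  }
  return result;
}

// Euclid's algorithm on BigInt values.
function gcd(a, b) {
  while (b !== 0n) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Counts bases 1 < a < n - 1 that are relatively prime to n and are
// Fermat witnesses, i.e., a^(n-1) mod n !== 1.
function countCoprimeFermatWitnesses(n) {
  let count = 0;
  for (let a = 2n; a < n - 1n; a++) {
    if (gcd(a, n) === 1n && modPow(a, n - 1n, n) !== 1n) count++;
  }
  return count;
}

console.log(countCoprimeFermatWitnesses(561n)); // 0 (561 is a Carmichael number)
console.log(countCoprimeFermatWitnesses(15n) > 0); // true
```

<p>Note that $$3$$, which shares a factor with $$561$$, is still a Fermat witness to it; only the bases relatively prime to a Carmichael number are all liars.</p>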
<code>SNat</code> uses <a href="http://en.wikipedia.org/wiki/Repeated_squaring">repeated squaring</a>, which requires only $$Θ(\lg n)$$ $$b$$-bit multiplications.</div> <p>Another useful witness type is a <em>non-trivial square root of unity $$\operatorname{mod} n$$</em>; that is, a number $$a \not\equiv \pm 1 \pmod{n}$$ such that $$a^2 \equiv 1 \pmod{n}$$. It is a theorem of number theory that if $$n$$ is prime, there are no non-trivial square roots of unity $$\operatorname{mod} n$$. Therefore, if we do find one, that means $$n$$ is composite. In fact, finding one leads directly to factors of $$n$$. By definition, a non-trivial square root of unity $$a$$ satisfies $$a \pm 1 \not\equiv 0 \pmod{n}$$ and $$a^2 - 1 \equiv 0 \pmod{n}$$. Factoring the latter leads to $$(a+1)(a-1) \equiv 0 \pmod{n}$$, which means that $$n$$ divides $$(a+1)(a-1)$$. But the first condition says that $$n$$ divides neither $$a+1$$ nor $$a-1$$, so $$n$$ must share a non-trivial factor with each of them: if $$\gcd(a+1, n)$$<sup><a href="#fn1" id="r1"></a></sup> were $$1$$, then $$n$$ would have to divide $$a-1$$, and if it were $$n$$, then $$n$$ would divide $$a+1$$ (and symmetrically for $$\gcd(a-1, n)$$). Therefore $$\gcd(a+1, n)$$ and $$\gcd(a-1, n)$$ are non-trivial factors of $$n$$.</p> <p>Finding non-trivial square roots of unity by itself doesn't give a useful primality testing algorithm, but combining it with the Fermat primality test does. $$a^{n-1} \bmod n$$ either equals $$1$$ or not. If it doesn't, you're done since you have a Fermat witness. If it does equal $$1$$, and $$n-1$$ is even, then consider the square root of $$a^{n-1}$$, i.e., $$a^{(n-1)/2}$$. If it is not $$\pm 1$$, then it is a non-trivial square root of unity. If it is $$-1$$, then you can't do anything else. 
But if it is $$1$$, and $$(n-1)/2$$ is even, you can then take another square root and repeat the test, stopping when the exponent of $$a$$ becomes odd or when you get a result not equal to $$1$$.</p> <p>To turn this into an algorithm, you simply start from the bottom up: find the greatest odd factor of $$n-1$$, call it $$t$$, and keep squaring $$a^t$$ mod $$n$$ until you find a non-trivial square root of unity $$\operatorname{mod} n$$ or until you can deduce the value of $$a^{n-1}$$. In fact, this is almost as fast as the original Fermat primality test, since the exponentiation by $$n-1$$ has to do the same sort of squaring, and we're just adding comparisons to $$\pm 1$$ in between squarings.</p> <p>The original idea for the test above is from Artjuhov, although it is usually credited to Miller. Accordingly, we call $$a$$ an <em>Artjuhov witness<sup><a href="#fn2" id="r2"></a></sup> of $$n$$</em> if it shows $$n$$ composite by the above test.</p> <p class="interactive-example" id="artjuhovExample"> Let <span class="fake-katex"><var>n</var> = <input class="parameter" size="6" pattern="[0-9]*" required type="text" value="561" data-bind="value: nStr, valueUpdate: 'afterkeydown'" /></span> and <span class="fake-katex"><var>a</var> = <input class="parameter" size="6" pattern="[0-9]*" required type="text" value="2" data-bind="value: aStr, valueUpdate: 'afterkeydown'" /></span>. <!-- ko template: outputTemplate --><!-- /ko --> <script type="text/html" id="artjuhov.error.invalidN"> <span class="fake-katex"><var>n</var></span> is not a valid number. </script> <script type="text/html" id="artjuhov.error.invalidA"> <span class="fake-katex"><var>a</var></span> is not a valid number. </script> <script type="text/html" id="artjuhov.error.outOfBoundsN"> <span class="fake-katex"><var>n</var></span> must be greater than <span class="fake-katex">2</span>. 
</script> <script type="text/html" id="artjuhov.error.outOfBoundsA"> <span class="fake-katex"><var>a</var></span> must be greater than <span class="fake-katex">1</span> and less than <span class="fake-katex"><var>n</var></span>. </script> <script type="text/html" id="artjuhov.success.fermatEquivResult"> Then <span class="fake-katex"><var>n</var></span> is even, so this reduces to the Fermat primality test. <span class="fake-katex"><var>a</var><sup><var>n</var>&minus;1</sup> &equiv; <span class="intermediate" data-bind="text: r"></span> <span data-bind="if: r() && r().ne(1)">&equiv;&#824; 1</span> (mod <var>n</var>)</span> so therefore <span class="fake-katex"><var>n</var></span> is <span data-bind="if: isCompositeByArtjuhov()"> <span class="result">composite</span>. <span data-bind="html: factorsHtml"></span> </span> <span data-bind="ifnot: isCompositeByArtjuhov()"> an <span class="result">Artjuhov pseudoprime base <span class="fake-katex"><var>a</var></span></span>. </span> </script> <script type="text/html" id="artjuhov.success.impliesFinalEquivResult"> Then <span class="fake-katex"><var>n</var> &minus; 1 = <span data-bind="html: nMinusOneHtml"></span></span>, and <span class="fake-katex"><var>r</var> &equiv; <span data-bind="html: rHtml"></span> &equiv; <span data-bind="html: rResultHtml"></span> (mod <var>n</var>)</span>, so <span class="fake-katex"><var>a</var><sup><var>n</var>&minus;1</sup> &equiv; <span data-bind="html: aNMinusOneHtml"></span> (mod <var>n</var>)</span>, and therefore <span class="fake-katex"><var>n</var></span> is <span data-bind="if: isCompositeByArtjuhov()"> <span class="result">composite</span>. <span data-bind="html: factorsHtml"></span> </span> <span data-bind="ifnot: isCompositeByArtjuhov()"> either <span class="result">prime</span> or an <span class="result">Artjuhov pseudoprime base <span class="fake-katex"><var>a</var></span></span>. 
</span> </script> <script type="text/html" id="artjuhov.success.nonTrivialSqrtResult"> Then <span class="fake-katex"><var>n</var> &minus; 1 = <span data-bind="html: nMinusOneHtml"></span></span>, <span class="fake-katex"><var>r</var> &equiv; <span data-bind="html: rHtml"></span> &equiv; <span class="intermediate">1</span> (mod <var>n</var>)</span>, and <span class="fake-katex">&radic;<var>r</var> &equiv; <span data-bind="html: rSqrtHtml"></span> &equiv; <span class="intermediate" data-bind="text: rSqrt"></span> (mod <var>n</var>)</span>, which is a non-trivial square root of unity <span class="fake-katex">mod <var>n</var></span> and therefore <span class="fake-katex"><var>n</var></span> is <span class="result">composite</span>. <span data-bind="html: factorsHtml"></span> </script> </p> <script type="text/javascript" src="/intro-primality-testing-files/artjuhov-example.js"></script> <p>If $$n$$ is an odd composite, then it can be shown (originally by Rabin) that at least three quarters of all possible witnesses are Artjuhov witnesses. Therefore, for a sample of $$k$$ possible witnesses of $$n$$, the probability of all of them being Artjuhov liars is $$\le 4^{-k}$$, which is stronger than the bound for the Fermat primality test. Furthermore, this bound is unconditional; there is nothing like Carmichael numbers for the Artjuhov test.</p> <div class="p">Here is the Artjuhov compositeness test, implemented in Javascript: <pre class="code-container"><code class="language-javascript">// Runs the Artjuhov compositeness test given n > 2 and 1 &lt; a &lt; n.
// Finds the largest s such that n-1 = t*2^s, calculates r = a^t mod
// n, then repeatedly squares r (mod n) up to s times until r is
// congruent to -1, 0, or 1 (mod n). Then, based on the value of s
// and the final value of r and i (the number of squarings),
// determines whether a is an Artjuhov witness to n (i.e., n is
// composite).
//
// Returns a dictionary with a, n, s, t, i, r, rSqrt = sqrt(r) if i >
// 0 and null otherwise, and isCompositeByArtjuhov, which is true iff
// a is an Artjuhov witness to n.
function testCompositenessByArtjuhov(n, a) {
  n = SNat.cast(n);
  a = SNat.cast(a);
  if (n.le(2)) {
    throw new RangeError('n must be > 2');
  }
  if (a.le(1) || a.ge(n)) {
    throw new RangeError('a must satisfy 1 &lt; a &lt; n');
  }

  var nMinusOne = n.minus(1);

  // Find the largest s (and hence odd t) such that n-1 = t*2^s.
  var t = nMinusOne;
  var s = new SNat(0);
  while (t.isEven()) {
    t = t.div(2);
    s = s.plus(1);
  }

  // Find the smallest 0 &lt;= i &lt; s such that a^{t*2^i} = 0/-1/+1 (mod
  // n), or stop at i = s if there is none.
  var i = new SNat(0);
  var rSqrt = null;
  var r = a.powMod(t, n);
  while (i.lt(s) && r.gt(1) && r.lt(nMinusOne)) {
    i = i.plus(1);
    rSqrt = r;
    r = r.times(r).mod(n);
  }

  var isCompositeByArtjuhov = false;
  if (s.isZero()) {
    // If 0 = i = s, then this reduces to the Fermat primality test.
    isCompositeByArtjuhov = r.ne(1);
  } else if (i.isZero()) {
    // If 0 = i &lt; s, then:
    //
    // * r = 0 (mod n) -> a^{n-1} = 0 (mod n), and
    // * r = +/-1 (mod n) -> a^{n-1} = 1 (mod n).
    isCompositeByArtjuhov = r.isZero();
  } else if (i.lt(s)) {
    // If 0 &lt; i &lt; s, then:
    //
    // * r = 0 (mod n) -> a^{n-1} = 0 (mod n),
    // * r = +1 (mod n) -> a^{t*2^{i-1}} is a non-trivial square root of
    //   unity mod n, and
    // * r = -1 (mod n) -> a^{n-1} = 1 (mod n).
    //
    // Note that the last case means r = n - 1 > 1.
    isCompositeByArtjuhov = r.le(1);
  } else {
    // If 0 &lt; i = s, then:
    //
    // * r = 0 (mod n) can't happen,
    // * r = +1 (mod n) -> a^{t*2^{i-1}} is a non-trivial square root of
    //   unity mod n, and
    // * r > +1 (mod n) -> failure of the Fermat primality test.
    isCompositeByArtjuhov = true;
  }

  return {
    a: a,
    n: n,
    t: t,
    s: s,
    i: i,
    r: r,
    rSqrt: rSqrt,
    isCompositeByArtjuhov: isCompositeByArtjuhov
  };
}</code></pre> With the two compositeness tests above, we can now write a probabilistic primality test: <pre class="code-container"><code class="language-javascript">// Returns true iff a is a Fermat witness to n, and thus n is
// composite. a and n must satisfy the same conditions as in
// testCompositenessByFermat.
function hasFermatWitness(n, a) {
  return testCompositenessByFermat(n, a).isCompositeByFermat;
}

// Returns true iff a is an Artjuhov witness to n, and thus n is
// composite. a and n must satisfy the same conditions as in
// testCompositenessByArtjuhov.
function hasArtjuhovWitness(n, a) {
  return testCompositenessByArtjuhov(n, a).isCompositeByArtjuhov;
}

// Returns true if n is probably prime, based on sampling the given
// number of possible witnesses and testing them against n. If false
// is returned, then n is definitely composite.
//
// By default, uses the Artjuhov test for witnesses with 20 samples
// and Math.random for the random number generator. This gives an
// error bound of 4^-20 if true is returned.
function isProbablePrime(n, hasWitness, numSamples, rng) {
  n = SNat.cast(n);
  hasWitness = hasWitness || hasArtjuhovWitness;
  rng = rng || Math.random;
  numSamples = numSamples || 20;
  if (n.le(1)) {
    return false;
  }
  if (n.le(3)) {
    return true;
  }
  if (n.isEven()) {
    return false;
  }
  for (var i = 0; i &lt; numSamples; ++i) {
    var a = SNat.random(2, n.minus(2), rng);
    if (hasWitness(n, a)) {
      return false;
    }
  }
  return true;
}</code></pre> </div> <p><code>isProbablePrime</code> called with <code>hasFermatWitness</code> is the <em>Fermat primality test</em>, and <code>isProbablePrime</code> called with <code>hasArtjuhovWitness</code> is the <em>Miller-Rabin primality test</em>. 
The latter is the current general primality test of choice, replacing the <a href="http://en.wikipedia.org/wiki/Solovay-Strassen">Solovay-Strassen primality test</a>.</p> <p>We can also use <code>isProbablePrime</code> to randomly generate probable primes, which is useful for <a href="http://en.wikipedia.org/wiki/RSA_(algorithm)#Key_generation">cryptographic applications</a>:</p> <pre class="code-container"><code class="language-javascript">// Returns a probable b-bit prime, i.e., one that is at least 2^(b-1).
// All parameters but b are passed to isProbablePrime, and rng is also
// used to pick the candidates.
function findProbablePrime(b, hasWitness, rng, numSamples) {
  b = SNat.cast(b);
  var lb = (new SNat(2)).pow(b.minus(1));
  var ub = lb.times(2);
  while (true) {
    var n = SNat.random(lb, ub, rng);
    // isProbablePrime takes numSamples before rng.
    if (isProbablePrime(n, hasWitness, numSamples, rng)) {
      return n;
    }
  }
}</code></pre> <p>In this case, for sufficiently large $$b$$, the Fermat primality test is acceptable, since Carmichael numbers are so rare and we're the ones generating the possible primes to be tested.<sup><a href="#fn3" id="r3"></a></sup></p> <p>There are other primality tests, but they're less often used in practice because they're either <a href="http://en.wikipedia.org/wiki/Solovay%E2%80%93Strassen_primality_test">less efficient</a> or <a href="http://www.pseudoprime.com/pseudo2.pdf">more sophisticated</a> than the algorithms above, or they require $$n$$ to have <a href="http://en.wikipedia.org/wiki/Lucas_primality_test">special</a> <a href="http://en.wikipedia.org/wiki/Proth%27s_theorem">properties</a>. Perhaps the most interesting of these tests is the <a href="http://en.wikipedia.org/wiki/Aks_primality_test"><em>AKS primality test</em></a>, which proved once and for all that primality testing is in P.</p> <hr /> <p>Like this post? Subscribe to <!-- The image is 256x256, the center of the dot is 189 pixels from the top, and the radius of the dot is 24. 
Therefore, the dot is 43/256 = 0.16796875 of the image height above the bottom.--> <a href="feed/atom">my feed <img src="feed-icon.svg" alt="RSS icon" style="width: 1em; height: 1em; vertical-align: -0.16796875em;" /></a> or follow me on <a href="https://twitter.com/fakalin">Twitter <img src="twitter-icon.svg" alt="Twitter icon" style="width: 1em; height: 1em;" /></a>.</p> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> $$\gcd$$ is the <a href="http://en.wikipedia.org/wiki/Greatest_common_divisor">greatest common divisor</a> function. <a href="#r1">↩</a></p> <p id="fn2"> &ldquo;Artjuhov witness&rdquo; is an idiosyncratic name on my part; a more common name is <em>strong witness</em>, which I don't like. <a href="#r2">↩</a></p> <p id="fn3"> <a href="http://en.wikipedia.org/wiki/Fermat_primality_test#Applications">According to Wikipedia</a>, PGP uses the Fermat primality test. <a href="#r3">↩</a></p> </section> https://www.akalin.com/pair-counterexamples-vector-calculus A Pair of Counterexamples in Vector Calculus 2011-11-27T00:00:00-08:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <script> KaTeXMacros = { "\\sgn": "\\operatorname{sgn}", }; </script> <p>While recently reviewing some topics in vector calculus, I became curious as to why violating seemingly innocuous conditions for some theorems leads to surprisingly wild results. 
In fact, I was struck by how these theorems resemble computer programs, not in some <a href="http://en.wikipedia.org/wiki/Curry-Howard_Correspondence">abstract way</a>, but in how the lack of &ldquo;input validation&rdquo; leads to <a href="http://en.wikipedia.org/wiki/Undefined_behavior">non-obvious behavior</a> in the face of erroneous input.</p> <p>I found that understanding why these counterexamples lead to wild results deepened my understanding of the theorems involved and their proofs.<sup><a href="#fn1" id="r1"></a></sup> Besides, pathological examples are more interesting than well-behaved ones!</p> <p>First, let's look at a &ldquo;counterexample&rdquo; to <a href="http://en.wikipedia.org/wiki/Green%27s_theorem">Green's theorem</a>:</p> <p class="example">1. Two functions $$L, M \colon \mathbb{R}^2 \to \mathbb{R}$$ and a positively-oriented, piecewise smooth, simple closed curve $$C$$ in $$\mathbb{R}^2$$ enclosing the region $$D$$ such that $∮_C L \,dx + M \,dy \ne ∬_D \left( \frac{∂{M}}{∂{x}} - \frac{∂{L}}{∂{y}} \right) \,dx \,dy \text{.}$</p> <p>Let $L = -\frac{y}{x^2+y^2} \text{,} \quad M = \frac{x}{x^2+y^2} \text{,}$ and let $$C$$ be a curve going counter-clockwise around the rectangle $$D = [-1, 1]^2$$.<sup><a href="#fn2" id="r2"></a></sup> Then the integral of $$L \,dx + M \, dy$$ around $$C$$ is $$2 π$$ since it encloses the origin. But $\frac{∂{M}}{∂{x}} = \frac{∂{L}}{∂{y}} = \frac{y^2-x^2}{(x^2+y^2)^2}$ so the difference of the two vanishes everywhere but the origin, where neither function is defined. Therefore, the (improper) integral over $$D$$ also vanishes, proving the inequality. &#8718;</p> <p>Of course, the easy explanation is that the discontinuity of $$L$$ and $$M$$ at the origin violates a condition of Green's theorem. But that doesn't really tell us anything, so let's break down the theorem and see where exactly it fails.</p> <p>Green's theorem is usually proved first for rectangles $$[a, b] \times [c, d]$$, which suffices for our purpose. 
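</p>
<p>Before breaking the proof down, the value of the line integral itself is easy to confirm numerically. The following is a minimal sketch (the names <code>simpson</code>, <code>lineIntegralL</code>, and <code>lineIntegralM</code> are hypothetical) that integrates counter-clockwise along the four sides of the square $$[-1, 1]^2$$ with composite Simpson's rule, which is accurate here because $$L$$ and $$M$$ are smooth on the boundary, away from the origin. It also splits the total into $$∮_C L \,dx$$ and $$∮_C M \,dy$$, each of which comes out to $$π$$.</p>

```javascript
// L = -y / (x^2 + y^2) and M = x / (x^2 + y^2), undefined at the origin.
const L = (x, y) => -y / (x * x + y * y);
const M = (x, y) => x / (x * x + y * y);

// Composite Simpson's rule with n (even) subintervals. If a > b, h is
// negative and the result is the signed integral from a to b.
function simpson(g, a, b, n = 1000) {
  const h = (b - a) / n;
  let sum = g(a) + g(b);
  for (let i = 1; i < n; i++) {
    sum += (i % 2 === 1 ? 4 : 2) * g(a + i * h);
  }
  return (sum * h) / 3;
}

// ∮_C L dx: only the horizontal sides contribute, since dx = 0 on the
// vertical ones. The bottom side (y = -1) runs from x = -1 to 1, and the
// top side (y = 1) runs back from x = 1 to -1.
const lineIntegralL =
  simpson((x) => L(x, -1), -1, 1) + simpson((x) => L(x, 1), 1, -1);

// ∮_C M dy: only the vertical sides contribute. The right side (x = 1)
// runs from y = -1 to 1, and the left side (x = -1) runs back from
// y = 1 to -1.
const lineIntegralM =
  simpson((y) => M(1, y), -1, 1) + simpson((y) => M(-1, y), 1, -1);

console.log(lineIntegralL / Math.PI, lineIntegralM / Math.PI); // both ≈ 1
console.log((lineIntegralL + lineIntegralM) / (2 * Math.PI)); // ≈ 1
```

<p>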
If $$C$$ is a curve that goes counter-clockwise around such a rectangle $$D$$, then we can easily show that $∮_C L \,dx = - ∬_D \frac{∂{L}}{∂{y}} \,dx \,dy$ and $∮_C M \,dy = ∬_D \frac{∂{M}}{∂{x}} \,dx \,dy \text{,}$ with the sum of these two formulas proving the theorem.</p> <p>So the first sign of trouble is that the theorem freely interchanges addition and integration. Since the partial derivatives of our functions diverge at the origin, if $$D$$ contains the origin then the integrals of those partial derivatives over $$D$$ may not even be defined, even if the integral of their difference is.</p> <p>But the problem arises even before that. The statements above are proved by showing $∮_C L \,dx = - ∫_a^b \left( ∫_c^d \frac{∂{L}}{∂{y}} \,dy \right) \,dx$ and $∮_C M \,dy = ∫_c^d \left( ∫_a^b \frac{∂{M}}{∂{x}} \,dx \right) \,dy \text{,}$ both of which hold for our example. But notice that in one case we integrate with respect to $$y$$ first, and in the other case we integrate with respect to $$x$$ first. Therefore, we have to interchange the order of integration or convert to a double integral in order to get them to a form where we can add them. And there's the rub: if $$D$$ contains the origin, switching the order of integration for either integral above switches the sign of the result!</p> <p>This fully explains the discrepancy; since the result of both integrals above (with the iteration order preserved) is $$π$$, adding them together as-is gives the expected result of $$2 π$$. But if we switch the iteration order of one of the iterated integrals as done in the proof of Green's theorem, then we switch the result of that integral to $$-π$$, which cancels with the result of the other unchanged integral to produce $$0$$.</p> <p>So now let's examine this strange behavior of the sign of an integration's result depending on the iteration order. 
This leads us to our next &ldquo;counterexample,&rdquo; this time for <a href="http://en.wikipedia.org/wiki/Fubini%27s_theorem">Fubini's theorem</a>:</p> <p class="example">2. A function $$f \colon \mathbb{R}^2 \to \mathbb{R}$$ whose iterated integrals over a rectangle $$D = [a, b] \times [c, d] \subset \mathbb{R}^2$$ differ.</p> <p>Let $f(x, y) = \frac{y^2-x^2}{(x^2+y^2)^2} \quad \text{ and } \quad D = [-1, 1]^2\text{.}$ The two iterated integrals of $$f$$ over $$D$$ are usually written as $∫_{-1}^1 \left( ∫_{-1}^1 f(x, y) \,dy \right) \,dx \qquad \text{ and } \qquad ∫_{-1}^1 \left( ∫_{-1}^1 f(x, y) \,dx \right) \,dy$ but let's define them more carefully to make it easier to justify our calculations.</p> <p>Let \begin{aligned} u_k &= y \mapsto f(k, y) \\ v_l &= x \mapsto f(x, l) \text{.} \end{aligned} In other words, given the real constants $$k$$ and $$l$$, construct the (possibly partial) real functions $$u_k(y)$$ and $$v_l(x)$$ by partially evaluating $$f$$ at $$x = k$$ and $$y = l$$, respectively.</p> <p>Then, if we also let<sup><a href="#fn3" id="r3"></a></sup> $U(x) = ∫_{-1}^1 u_x(y) \,dy \qquad \text{ and } \qquad V(y) = ∫_{-1}^1 v_y(x) \,dx \text{,}$ we can write the iterated integrals as $∫_{-1}^1 U(x) \,dx \qquad \text{ and } \qquad ∫_{-1}^1 V(y) \,dy \text{.}$ </p> <p>Computing $$U(x)$$ for $$x ≠ 0$$, we get<sup><a href="#fn4" id="r4"></a></sup> \begin{aligned} U(x) &= ∫_{-1}^1 \frac{∂{}}{∂{y}} \left( -\frac{y}{x^2+y^2} \right) \,dy \\ &= \left. -\frac{y}{x^2+y^2} \right|_{y=-1}^{y=1} \\ &= -\frac{2}{x^2+1} \text{.} \end{aligned} </p> <p>Attempting to evaluate $$U(0)$$, we see that \begin{aligned} U(0) &= ∫_{-1}^1 \frac{y^2-0^2}{(0^2+y^2)^2} \,dy \\ &= ∫_{-1}^1 \frac{dy}{y^2} \end{aligned} which diverges. 
So $U(x) = -\frac{2}{x^2+1} \text{ for } x \ne 0 \text{.}$ </p> <p> By a similar computation, we find that<sup><a href="#fn5" id="r5"></a></sup> $V(y) = \frac{2}{y^2+1} \text{ for } y \ne 0 \text{.}$ </p> <p>Since $$U(x)$$ isn't defined at $$0$$, we have to treat its integral as an improper integral, although doing so poses no real difficulty: \begin{aligned} ∫_{-1}^1 U(x)\,dx &= \lim_{a \nearrow 0} \left( ∫_{-1}^a -\frac{2}{x^2+1} \,dx \right) + \lim_{a \searrow 0} \left( ∫_{a}^1 -\frac{2}{x^2+1} \,dx \right) \\ &= \lim_{a \nearrow 0} \Bigl( \left. -2 \arctan(x) \right|_{-1}^{a} \Bigr) + \lim_{a \searrow 0} \Bigl( \left. -2 \arctan(x) \right|_{a}^{1} \Bigr) \\ &= \left. -2 \arctan(x) \right|_{-1}^{0} + \left. -2 \arctan(x) \right|_{0}^{1} \\ &= \left. -2 \arctan(x) \right|_{-1}^{1} \\ &= -π \text{.} \end{aligned} </p> <p>Similarly, $∫_{-1}^1 V(y)\,dy = π \text{,}$ so the iterated integrals of $$f(x, y)$$ over $$[-1, 1]^2$$ differ; in fact, as we claimed above, switching the iteration order switches the sign of the result. &#8718;</p> <p>We can repeat the above calculations for an arbitrary rectangle to see that the iterated integrals of $$f(x, y)$$ differ if $$D$$ contains the origin either as an interior point or a corner. But there's an easier way to prove that statement and also gain some insight as to why $$f(x, y)$$ has this strange property.</p> <p>Note that the key facts in the above calculations were that $$U(x) \lt 0$$ for any $$x \ne 0$$ and $$V(y) \gt 0$$ for any $$y \ne 0$$. Therefore, integrating $$U(x)$$ over any interval on the $$x$$-axis would produce a negative result and integrating $$V(y)$$ over any interval on the $$y$$-axis would produce a positive result, leading to the difference in iterated integrals. 
This holds more generally; for any $$m, n \gt 0$$, $∫_{-n}^n f(x, y) \,dy \lt 0 \text{ for } x \ne 0 \qquad \text{ and } \qquad ∫_{-m}^m f(x, y) \,dx \gt 0 \text{ for } y \ne 0 \text{.}$ Therefore, $∫_{-m}^m \left( ∫_{-n}^n f(x, y) \,dy \right) \,dx \lt 0 \qquad \text{ and } \qquad ∫_{-n}^n \left( ∫_{-m}^m f(x, y) \,dx \right) \,dy \gt 0$ so the iterated integrals of $$f(x, y)$$ over $$[-m, m] \times [-n, n]$$ differ. Any rectangle $$D$$ containing the origin as an interior point must contain some smaller rectangle $$E = [-m, m] \times [-n, n]$$. Splitting $$D$$ along the lines $$x = \pm m$$ and $$y = \pm n$$ leaves $$E$$ plus rectangles that avoid the origin; $$f(x, y)$$ is continuous and bounded on each of those, so by Fubini's theorem its two iterated integrals agree there. Therefore, the iterated integrals of $$f(x, y)$$ over $$D$$ differ by exactly as much as they do over $$E$$, so they must also differ over $$D$$.</p> <p>Furthermore, since $$f(x, y)$$ is even in both $$x$$ and $$y$$, you can carry out a similar argument to the above with intervals of the form $$[0, m]$$ or $$[-m, 0]$$ to show that the iterated integrals of $$f(x, y)$$ must also differ over any rectangle with the origin as a corner. </p> <p>So the essential property of $$f(x, y)$$ is that slicing it along the $$x$$-axis gives a function which has positive area under the curve on any interval symmetric around $$0$$ or with $$0$$ as an endpoint, and that slicing it similarly along the $$y$$-axis gives a function which has negative area. Therefore, on a rectangle symmetric around the origin or with the origin as a corner, one can choose the sign of the iterated integral by choosing which axis to slice first.</p> <p>The next thing to investigate is how exactly the iterated integrals of $$f(x, y)$$ over the rectangle $$D$$ are expressed such that they differ only when $$D$$ contains the origin, especially considering that $$f(x, y)$$ is expressed in quite a simple form. 
To do that, let's consider the simple case of a rectangle $$D = [δ, 1] \times [ϵ, 1]$$ where we can vary $$δ$$ and $$ϵ$$ at will.</p> <p>Let \begin{aligned} I_{yx}(δ, ϵ) &= ∫_{δ}^1 \left( ∫_{ϵ}^1 f(x, y) \,dy \right) \,dx \\ I_{xy}(δ, ϵ) &= ∫_{ϵ}^1 \left( ∫_{δ}^1 f(x, y) \,dx \right) \,dy \text{.} \end{aligned} Then, for $$ϵ ≠ 0$$: \begin{aligned} I_{yx}(δ, ϵ) &= ∫_{δ}^1 \left( ∫_{ϵ}^1 \frac{y^2-x^2}{(x^2+y^2)^2} \,dy \right) \,dx \\ &= ∫_{δ}^1 \left( \left. -\frac{y}{x^2+y^2} \right|_{y=ϵ}^{y=1} \right) \,dx \\ &= ∫_{δ}^1 \Biggl( -\frac{1}{1+x^2} - \left( -\frac{ϵ}{ϵ^2+x^2} \right) \Biggr) \,dx \\ &= ∫_{δ}^1 \frac{dx/ϵ}{1+(x/ϵ)^2} - ∫_{δ}^1 \frac{dx}{1+x^2} \\ &= \arctan\left(\frac{1}{ϵ}\right) - \arctan\left(\frac{δ}{ϵ}\right) - \frac{π}{4} + \arctan(δ) \text{,} \end{aligned} and for $$ϵ = 0$$: $I_{yx}(δ, 0) = -\frac{π}{4} + \arctan(δ) \text{.}$ Similarly, for $$δ ≠ 0$$: \begin{aligned} I_{xy}(δ, ϵ) &= ∫_{ϵ}^1 \left( ∫_{δ}^1 \frac{y^2-x^2}{(x^2+y^2)^2} \,dx \right) \,dy \\ &= ∫_{ϵ}^1 \left( \left. \frac{x}{x^2+y^2} \right|_{x=δ}^{x=1} \right) \,dy \\ &= ∫_{ϵ}^1 \left( \frac{1}{1+y^2} - \frac{δ}{δ^2+y^2} \right) \,dy \\ &= ∫_{ϵ}^1 \frac{dy}{1+y^2} - ∫_{ϵ}^1 \frac{dy/δ}{1+(y/δ)^2} \\ &= \frac{π}{4} - \arctan(ϵ) - \arctan\left(\frac{1}{δ}\right) + \arctan\left(\frac{ϵ}{δ}\right) \text{,} \end{aligned} and for $$δ = 0$$: $I_{xy}(0, ϵ) = \frac{π}{4} - \arctan(ϵ) \text{.}$ Then let $$Δ = I_{xy} - I_{yx}$$ be the difference between the two iterated integrals. 
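</p>
<p>These closed forms are easy to sanity-check. The following is a minimal sketch (the names <code>iYX</code>, <code>iXY</code>, <code>simpson</code>, and <code>numericYX</code> are hypothetical) that transcribes them, confirms that $$δ = ϵ = -1$$, i.e., $$D = [-1, 1]^2$$, reproduces the values $$-π$$ and $$π$$ found earlier, and compares them with brute-force nested Simpson quadrature on rectangles that stay away from the origin, where $$f$$ is continuous and both iteration orders must agree.</p>

```javascript
// f(x, y) = (y^2 - x^2) / (x^2 + y^2)^2, undefined at the origin.
const f = (x, y) => (y * y - x * x) / (x * x + y * y) ** 2;

// Closed form of I_yx(δ, ϵ): integrate over y in [ϵ, 1] first, then x in [δ, 1].
function iYX(delta, eps) {
  if (eps === 0) return -Math.PI / 4 + Math.atan(delta);
  return (
    Math.atan(1 / eps) - Math.atan(delta / eps) - Math.PI / 4 + Math.atan(delta)
  );
}

// Closed form of I_xy(δ, ϵ): integrate over x in [δ, 1] first, then y in [ϵ, 1].
function iXY(delta, eps) {
  if (delta === 0) return Math.PI / 4 - Math.atan(eps);
  return (
    Math.PI / 4 - Math.atan(eps) - Math.atan(1 / delta) + Math.atan(eps / delta)
  );
}

// Composite Simpson's rule with n (even) subintervals.
function simpson(g, a, b, n = 400) {
  const h = (b - a) / n;
  let sum = g(a) + g(b);
  for (let i = 1; i < n; i++) {
    sum += (i % 2 === 1 ? 4 : 2) * g(a + i * h);
  }
  return (sum * h) / 3;
}

// Brute-force I_yx by nested quadrature; only meaningful when the
// rectangle [δ, 1] × [ϵ, 1] does not contain the origin.
const numericYX = (delta, eps) =>
  simpson((x) => simpson((y) => f(x, y), eps, 1), delta, 1);

// D = [-1, 1]^2: the two orders give -π and π.
console.log(iYX(-1, -1) / Math.PI, iXY(-1, -1) / Math.PI);
// D = [0.2, 1] × [0.5, 1]: all three values are ≈ 0.1386.
console.log(iYX(0.2, 0.5), iXY(0.2, 0.5), numericYX(0.2, 0.5));
```

<p>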
We can use the identity $\arctan(x) + \arctan\left(\frac{1}{x}\right) = \frac{π}{2} \sgn(x)$ to simplify $$Δ(δ, ϵ)$$ if neither $$δ$$ nor $$ϵ$$ is zero: \begin{aligned} Δ(δ, ϵ) &= \bigl( π/4 - \arctan(ϵ) - \arctan(1/δ) + \arctan(ϵ/δ) \bigr) \\ & \quad \mathbin{-} \bigl( \arctan(1/ϵ) - \arctan(δ/ϵ) - π/4 + \arctan(δ) \bigr) \\ &= π/2 - \bigl( \arctan(ϵ) + \arctan(1/ϵ) \bigr) \\ & \quad \mathbin{-} \bigl( \arctan(δ) + \arctan(1/δ) \bigr) \\ & \quad \mathbin{+} \bigl( \arctan(δ/ϵ) + \arctan(ϵ/δ) \bigr) \\ &= \frac{π}{2} \bigl( 1 - \sgn(ϵ) - \sgn(δ) + \sgn(δ/ϵ) \bigr) \text{.} \end{aligned} </p> <p> Using the properties of $$\sgn(x)$$ (in particular, $$\sgn(δ/ϵ) = \sgn(δ) \sgn(ϵ)$$), we can simplify this to the final expression: $Δ(δ, ϵ) = \frac{π}{2} \bigl( 1 - \sgn(δ) \bigr) \bigl( 1 - \sgn(ϵ) \bigr)$ which we can prove still holds if either $$δ$$ or $$ϵ$$ is zero (or both).</p> <p>So with the simplified expression for $$Δ(δ, ϵ)$$, it becomes apparent how $$\sgn(x)$$ is used to control the value of $$Δ(δ, ϵ)$$; as long as either $$δ$$ or $$ϵ$$ is positive, the corresponding factor $$1 - \sgn(δ)$$ or $$1 - \sgn(ϵ)$$ vanishes and zeroes out the entire expression, whereas if both are negative, $$Δ(δ, ϵ) = 2π$$.</p> <hr /> <p>Like this post? Subscribe to <!-- The image is 256x256, the center of the dot is 189 pixels from the top, and the radius of the dot is 24. Therefore, the dot is 43/256 = 0.16796875 of the image height above the bottom.--> <a href="feed/atom">my feed <img src="feed-icon.svg" alt="RSS icon" style="width: 1em; height: 1em; vertical-align: -0.16796875em;" /></a> or follow me on <a href="https://twitter.com/fakalin">Twitter <img src="twitter-icon.svg" alt="Twitter icon" style="width: 1em; height: 1em;" /></a>.</p> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> There are actually <a href="http://amzn.com/048668735X">whole</a> <a href="http://amzn.com/0486428753">books</a> dedicated to counterexamples. They make good bathroom reading material. 
<a href="#r1">↩</a></p> <p id="fn2"> The vector field $$(L, M)$$ also serves as the canonical &ldquo;counterexample&rdquo; to the <a href="http://en.wikipedia.org/wiki/Gradient_theorem">gradient theorem</a>. <a href="#r2">↩</a></p> <p id="fn3"> $$U(x)$$ and $$V(y)$$ are also (partial) real functions. <a href="#r3">↩</a></p> <p id="fn4"> We're justified in applying standard integration techniques here since $$u_k(y)$$ for $$k \ne 0$$ is defined and bounded for all $$y$$. <a href="#r4">↩</a></p> <p id="fn5"> Note that $$U(x)$$ and $$V(y)$$ differ only in variable name and sign. <a href="#r5">↩</a></p> </section> https://www.akalin.com/evlis-tail-recursion Understanding Evlis Tail Recursion 2011-10-28T00:00:00-07:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <p>While reading about <a href="http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-6.html#%25_sec_3.5">proper tail recursion</a> in Scheme, I encountered a similar but obscure optimization called <em>evlis tail recursion</em>. In <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.8567&rep=rep1&type=pdf">the paper where it was first described</a>, the author claims it "dramatically improve the space performance of many programs," which sounded promising.</p> <p>However, the few places where it's mentioned don't do much more than state its definition and claim its usefulness. Hopefully I can provide a more detailed analysis here.</p> <div class="p">Consider the straightforward factorial implementation in Scheme:<sup><a href="#fn1" id="r1"></a></sup> <pre class="code-container"><code class="language-lisp">(define (fact n)
  (if (&lt;= n 1)
      1
      (* n (fact (- n 1)))))</code></pre> It is not tail-recursive, since the recursive call is nested in another procedure call. 
However, it's <em>almost</em> tail-recursive; the call to <code>*</code> is a tail call, and the recursive call is its last subexpression, so it will be the last subexpression to be evaluated.</div> <p>Recall what happens when a procedure call (represented as a list of subexpressions) is evaluated: each subexpression is evaluated, and the first result (the procedure) is passed the other results as arguments.<sup><a href="#fn2" id="r2"></a></sup></p> <p>Evlis tail recursion can be described as follows: when evaluating the last subexpression of a procedure call, the calling environment is discarded as soon as it is no longer required.<sup><a href="#fn3" id="r3"></a></sup> The distinction between evlis tail recursion and proper tail recursion is subtle. Proper tail recursion requires only that the calling environment be discarded before the actual procedure call; evlis tail recursion discards the calling environment even sooner, if possible.</p> <div class="p">An example will help to clarify things. Given <code>fact</code> as defined above, say you evaluate <code>(fact 10)</code> and you're in the procedure call with <code>n = 5</code>. The call stack of a properly tail-recursive interpreter would look like this: <style> pre.stack { margin-top: 1em; margin-bottom: 1em; } </style> <pre class="stack">
evalExpr
--------
env = { n: 10 } -&gt; &lt;top-level environment&gt;
expr = '(* n (fact (- n 1)))'
proc = &lt;native function: *&gt;
args = [10, &lt;pending evalExpr('(fact (- n 1))', env)&gt;]

evalExpr
--------
env = { n: 9 } -&gt; &lt;top-level environment&gt;
expr = '(* n (fact (- n 1)))'
proc = &lt;native function: *&gt;
args = [9, &lt;pending evalExpr('(fact (- n 1))', env)&gt;]

...

evalExpr
--------
env = { n: 6 } -&gt; &lt;top-level environment&gt;
expr = '(* n (fact (- n 1)))'
proc = &lt;native function: *&gt;
args = [6, &lt;pending evalExpr('(fact (- n 1))', env)&gt;]

evalExpr
--------
env = { n: 5 } -&gt; &lt;top-level environment&gt;
expr = '(if ...)'</pre> whereas the call stack of an evlis tail-recursive interpreter would look like this: <pre class="stack">
evalExpr
--------
env = { n: 5 } -&gt; &lt;top-level environment&gt;
pendingProcedureCalls = [
  [&lt;native function: *&gt;, 10],
  [&lt;native function: *&gt;, 9],
  ...
  [&lt;native function: *&gt;, 6]
]
expr = '(if ...)'</pre> In this implementation, the last subexpression of a procedure call is evaluated exactly like a tail expression, but the procedure and the values of the other subexpressions are pushed onto a stack. Whenever an expression is reduced to a simple one and the stack is non-empty, a pending procedure and its other arguments are popped off, and the procedure is called with the reduced expression as its final argument.</div> <p>Note that this doesn't change the asymptotic behavior of the procedure; it still takes $$Θ(n)$$ memory to evaluate. However, only the bare minimum of information is saved: the list of pending functions and their arguments. Other auxiliary variables, and crucially the nested calling environments, aren't preserved, leading to a significant constant-factor reduction in memory.</p> <div class="p">This raises the question: Are there cases where evlis tail recursion leads to better asymptotic behavior? In fact, yes; consider the following (contrived) implementation of factorial<sup><a href="#fn4" id="r4"></a></sup>: <pre class="code-container"><code class="language-lisp">(define (fact2 n) (define v (make-vector n)) (if (&lt;= n 1) 1 (* n (fact2 (- n 1)))))</code></pre> Before the main body of the function, a vector of size $$n$$ is defined.
This means that the environments in the call stack of a properly tail-recursive interpreter would look like this:<sup><a href="#fn5" id="r5"></a></sup> <pre class="stack">
env = { n: 10, v: &lt;vector of size 10&gt; } -&gt; &lt;top-level environment&gt;
env = { n: 9, v: &lt;vector of size 9&gt; } -&gt; &lt;top-level environment&gt;
env = { n: 8, v: &lt;vector of size 8&gt; } -&gt; &lt;top-level environment&gt;
env = { n: 7, v: &lt;vector of size 7&gt; } -&gt; &lt;top-level environment&gt;
...</pre> whereas an evlis tail-recursive interpreter would keep around only the current environment. Therefore, the properly tail-recursive interpreter would require $$Θ(n^2)$$ memory to evaluate <code>(fact2 n)</code>, since the live vectors alone hold $$n + (n - 1) + \cdots + 1 = n(n+1)/2$$ elements, while the evlis tail-recursive interpreter would require only $$Θ(n)$$.</div> <p>Studying examples like the one above enabled me to finally understand how evlis tail recursion works and what sort of savings it gives. However, I have yet to find a practical example where evlis tail recursion gives the same sort of asymptotic gains as described above, and I'd be interested to hear of some. But perhaps the "large gains" mentioned in the various papers describing it are only constant-factor reductions in memory.</p> <p>In any case, another important difference in Scheme between proper tail recursion and evlis tail recursion is that the former is a <em>language feature</em> and the latter is an <em>optimization</em>. That means that it is acceptable and even encouraged to write Scheme programs that take advantage of proper tail recursion, but it would be unwise to rely on evlis tail recursion for the asymptotic performance of your function. Instead, one should treat it as just a nice constant-factor reduction in memory.</p> <p>Note that it is easy to make evlis tail recursion "smarter." Since Scheme doesn't specify the order of argument evaluation, an interpreter could evaluate arguments to maximize the gains from evlis tail recursion.
As an easy example, if we had switched the arguments to <code>*</code> in <code>fact</code> above, making it not evlis tail-recursive under left-to-right evaluation, a smart interpreter could still evaluate the recursive call last and thus treat it as such. A possible rule of thumb would be to pick a non-trivial procedure call to evaluate last.</p> <p>To complete the picture, I will outline below the evaluation function for a simple evlis tail-recursive Scheme interpreter in JavaScript. All of the sources I've found describe it in terms of compilers, so I think it'll be useful to have a reference implementation for an interpreter.</p> <div class="p">Let's say we already have a properly tail-recursive interpreter:<sup><a href="#fn6" id="r6"></a></sup> <pre class="code-container"><code class="lang-javascript">function evalExpr(expr, env) {
  // Fake tail calls with a while loop and continue.
  while (true) {
    // Symbols, constants, quoted expressions, and lambdas.
    if (isSimpleExpr(expr)) {
      // The only exit point.
      return evalSimpleExpr(expr, env);
    }

    // (if test conseq alt)
    if (isSpecialForm(expr, 'if')) {
      expr = evalExpr(expr[1], env) ? expr[2] : expr[3];
      continue;
    }

    // (set! var expr)
    if (isSpecialForm(expr, 'set!')) {
      env.set(expr[1], evalExpr(expr[2], env));
      expr = null;
      continue;
    }

    // (define var expr?)
    if (isSpecialForm(expr, 'define')) {
      env.define(expr[1], evalExpr(expr[2], env));
      expr = null;
      continue;
    }

    // (begin expr*)
    if (isSpecialForm(expr, 'begin')) {
      if (expr.length == 1) {
        expr = null;
        continue;
      }
      // Evaluate all but the last subexpression.
      for (var i = 1; i &lt; expr.length - 1; ++i) {
        evalExpr(expr[i], env);
      }
      expr = expr[expr.length - 1];
      continue;
    }

    // (proc expr*)
    // Index instead of shift() so the expression itself isn't mutated
    // (procedure bodies get evaluated more than once).
    var proc = evalExpr(expr[0], env);
    var args = expr.slice(1).map(function(subExpr) {
      return evalExpr(subExpr, env);
    });
    // proc.run() returns its body in result.expr and the environment
    // in which to evaluate it (with all its arguments bound) in
    // result.env.
    var result = proc.run(args);
    expr = result.expr;
    // The only time when env is changed.
    env = result.env;
    continue;
  }
}</code></pre> Then implementing evlis tail recursion requires only a few changes: <pre class="code-container"><code class="lang-javascript">function evalExpr(expr, env) {
  // This is a stack of procedures and their non-final arguments that
  // are waiting for their final argument to be evaluated.
  var pendingProcedureCalls = [];

  while (true) {
    if (isSimpleExpr(expr)) {
      expr = evalSimpleExpr(expr, env);
      // Discard calling environment.
      env = null;
      if (pendingProcedureCalls.length == 0) {
        // No pending procedure calls, so we're done (the only exit
        // point).
        return expr;
      }
      var args = pendingProcedureCalls.pop();
      var proc = args.shift();
      args.push(expr);
      var result = proc.run(args);
      expr = result.expr;
      // Change to new environment (the only time when env is
      // changed).
      env = result.env;
      continue;
    }

    ...
    // Everything else remains the same.
    ...

    // (proc expr*)
    if (expr.length == 1) {
      // No arguments, so there is no final argument to defer: call the
      // procedure directly as a tail call.
      var thunkResult = evalExpr(expr[0], env).run([]);
      expr = thunkResult.expr;
      env = thunkResult.env;
      continue;
    }
    // Evaluate the procedure and all but the last argument now, and
    // save them until the last argument has been reduced.
    var nonFinalSubExprs = expr.slice(0, -1).map(function(subExpr) {
      return evalExpr(subExpr, env);
    });
    pendingProcedureCalls.push(nonFinalSubExprs);
    // Evaluate the last subexpression as a tail call.
    expr = expr[expr.length - 1];
    continue;
  }
}</code></pre> </div> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> Assume a left-to-right evaluation order for now.
<a href="#r1">↩</a></p> <p id="fn2"> The function that takes a list of expressions, evaluates them, and returns the results as a list is traditionally called <code>evlis</code>, hence the name of the optimization. <a href="#r2">↩</a></p> <p id="fn3"> This assumes that the calling environment isn't stored somewhere else. <a href="#r3">↩</a></p> <p id="fn4"> This was adapted from an example in <a href="ftp://ftp.ccs.neu.edu/pub/people/will/tail.pdf">Proper Tail Recursion and Space Efficiency</a>. <a href="#r4">↩</a></p> <p id="fn5"> Assume that the interpreter isn't smart enough to deduce that $$v$$ can be optimized out since it's never used. <a href="#r5">↩</a></p> <p id="fn6"> Adapted from Peter Norvig's excellent <a href="http://norvig.com/lispy.html"><code>lis.py</code></a>. <a href="#r6">↩</a></p> </section> https://www.akalin.com/elementary-gaussian-proof An Elementary Way to Calculate the Gaussian Integral 2011-01-06T00:00:00-08:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <p> While reading <a href="http://gowers.wordpress.com">Timothy Gowers's blog</a> I stumbled on <a href="http://gowers.wordpress.com/2007/10/04/when-are-two-proofs-essentially-the-same/#comment-239">Scott Carnahan's comment</a> describing an elegant calculation of the Gaussian integral $∫_{-∞}^{∞} e^{-x^2} \, dx = \sqrt{π}\text{.}$ I was so struck by its elementary character that I imagined what it would be like written up, say, as an extra credit exercise in a single-variable calculus class: </p> <div class="exercise"> <span class="exercise">Exercise 1.</span> (<span class="exercise-name">The Gaussian integral</span>.) Let $F(t) = ∫_0^t e^{-x^2} \, dx \text{, }\qquad G(t) = ∫_0^1 \frac{e^{-t^2 (1+x^2)}}{1+x^2} \, dx \text{,}$ and $$H(t) = F(t)^2 + G(t)$$. <ol class="exercise-list"> <li>Calculate $$H(0)$$.</li> <li>Calculate and simplify $$H'(t)$$. 
What does this imply about $$H(t)$$?</li> <li>Use part&nbsp;b to calculate $$F(∞) = \displaystyle\lim_{t \to ∞} F(t)$$.</li> <li>Use part&nbsp;c to calculate $∫_{-∞}^{∞} e^{-x^2} \, dx\text{.}$</li> </ol> </div> <p> Although this is simpler than <a href="http://en.wikipedia.org/wiki/Gaussian_integral#Careful_proof">the usual calculation of the Gaussian integral</a>, for which careful reasoning is needed to justify the use of polar coordinates, it seems more like a <a href="http://en.wikipedia.org/wiki/Certificate_(complexity)">certificate</a> than an actual proof; you can convince yourself that the calculation is valid, but you gain no insight into the reasoning that led up to it.<sup><a href="#fn1" id="r1"></a></sup> </p> <p> Fortunately, <a href="http://gowers.wordpress.com/2007/10/04/when-are-two-proofs-essentially-the-same/#comment-243">David Speyer's comment</a> solves the mystery; $$G(t)$$ falls out of doing the integration in Cartesian coordinates over a triangular region. Just for kicks, here's what I imagine an exercise based on this method would look like (this time for a multi-variable calculus class): </p> <div class="exercise"> <span class="exercise">Exercise 2.</span> (<span class="exercise-name">The Gaussian integral in Cartesian coordinates.</span>) Let $A(t) = ∬\limits_{\triangle_t} e^{-(x^2+y^2)} \, dx \, dy$ where $$\triangle_t$$ is the triangle with vertices $$(0, 0)$$, $$(t, 0)$$, and $$(t, t)$$. <!-- TODO(akalin): Draw a diagram for \triangle_t.
--> <ol class="exercise-list"> <li>Use the substitution $$y = sx$$ to reduce $$A(t)$$ to a one-dimensional integral.</li> <li>Use part&nbsp;a to calculate $$A(∞) = \lim_{t \to ∞} A(t)$$.</li> <li>Use part&nbsp;b to calculate $∫_{-∞}^{∞} e^{-x^2} \, dx\text{.}$</li> <li>Let $F(t) = ∫_0^t e^{-x^2} \, dx \qquad\text{ and }\qquad G(t) = ∫_0^1 \frac{e^{-t^2 (1+x^2)}}{1+x^2} \, dx \text{.}$ Use part&nbsp;a to relate $$F(t)$$ to $$G(t)$$.</li> <li>Use part&nbsp;d to derive a proof of part&nbsp;c using only single-variable calculus.</li> </ol> </div> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> Similar to proving $$\sum\limits_{i=0}^n i^3 = \frac{n^2(n+1)^2}{4}$$ by induction. <a href="#r1">↩</a></p> </section> https://www.akalin.com/parallelizing-flac-encoding Parallelizing FLAC Encoding 2008-05-05T00:00:00-07:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <style type="text/css" media="all"> /*<![CDATA[*/ table.benchmark-results, table.benchmark-results tr, table.benchmark-results th { border: 1px solid black; } table.benchmark-results { font-family: "Arial", "Helvetica", sans-serif; } table.benchmark-results th, table.benchmark-results td { padding: .2em .4em; } /*]]>*/ </style> <p>One thing I noticed after getting a multi-core system was that the reference FLAC encoder is not multithreaded.
This isn't a huge problem for most people, since you can simply encode multiple files at the same time. However, I usually rip my audio CDs into a single audio file with a cue sheet instead of separate track files, so I am usually encoding one large audio file rather than several smaller ones. Even so, encoding a CD-length audio file takes under a minute, but I thought it would be a fun and useful weekend project to see if I could parallelize the simpler <a href="http://flac.cvs.sourceforge.net/flac/flac/examples/c/encode/file/main.c?revision=1.2&amp;view=markup">example encoder</a>. The <a href="http://flac.sourceforge.net/format.html">format specification</a> indicates that input blocks are encoded independently, which makes the problem <a href="http://en.wikipedia.org/wiki/Embarrassingly_parallel">embarrassingly parallel</a>, and trawling through the <a href="http://www.mail-archive.com/flac-dev@xiph.org/msg00724.html">FLAC mailing lists</a> reveals that no one has had the time or the inclination to look into it.</p> <p>However, I was able to write a multithreaded FLAC encoder that achieves a substantial speedup (about 4.6 times faster with 8 threads than with 1) with only minor hacks to the libFLAC API. Here are some encode times on an 8-core 2.8 GHz Xeon 5400 for a 636 MB wave file (some caveats are discussed below):</p> <table class="benchmark-results"> <tr> <th>encoder</th><th>encode time</th> </tr> <tr> <th>baseline</th><td>34.906s</td> </tr> <tr> <th>1 thread</th><td>31.424s</td> </tr> <tr> <th>2 threads</th><td>16.936s</td> </tr> <tr> <th>4 threads</th><td>10.173s</td> </tr> <tr> <th>8 threads</th><td>6.808s</td> </tr> </table> <p>I took the simple approach of sharding the input file into <var>n</var> roughly equal pieces and passing them to <var>n</var> encoder threads, assembling the output file from the <var>n</var> output buffers.
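The bookkeeping behind this sharding can be sketched in a few lines. This is a minimal JavaScript illustration, not the encoder's actual C code; the helper names are hypothetical, and it assumes 16-bit stereo PCM input (4 bytes per sample frame) so that shard boundaries never split a frame. It computes each shard's byte range within the audio data, and the output offset of each shard's encoded data, which is just a running sum of the encoded sizes of the shards before it (the same quantity the encoder tracks in `byte_offset`).

```javascript
// Hypothetical helpers sketching the sharding arithmetic; not taken from
// mt_encode.c. Assumes 16-bit stereo PCM, where one sample frame (one
// sample per channel) is 4 bytes.
const BYTES_PER_FRAME = 4;

// Splits `dataBytes` bytes of PCM audio (offsets are relative to the start
// of the audio data, after the WAV header) into `numShards` contiguous
// ranges that begin and end on frame boundaries. The last shard absorbs the
// leftover frames, so it may be slightly larger than the others.
function shardRanges(dataBytes, numShards) {
  const totalFrames = Math.floor(dataBytes / BYTES_PER_FRAME);
  const framesPerShard = Math.floor(totalFrames / numShards);
  const ranges = [];
  for (let i = 0; i < numShards; ++i) {
    const startFrame = i * framesPerShard;
    const endFrame =
      i === numShards - 1 ? totalFrames : startFrame + framesPerShard;
    ranges.push({
      offset: startFrame * BYTES_PER_FRAME,
      size: (endFrame - startFrame) * BYTES_PER_FRAME,
    });
  }
  return ranges;
}

// Given the encoded size of each shard's output, in shard order, returns
// the file offset at which each shard's output should be written.
function outputOffsets(encodedSizes) {
  const offsets = [];
  let offset = 0;
  for (const size of encodedSizes) {
    offsets.push(offset);
    offset += size;
  }
  return offsets;
}

// 40 bytes (10 frames) split three ways gives shards of 12, 12, and 16
// bytes; encoded sizes of 100, 120, and 90 bytes go at offsets 0, 100, 220.
console.log(shardRanges(40, 3));
console.log(outputOffsets([100, 120, 90]));
```

Note that a shard's output offset depends only on the shards before it, which is why output can be written in shard order as each thread finishes.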
In general, this is not a good way of partitioning the workload, since time is wasted if one shard takes significantly longer to process than the others, but for my use case this isn't a significant problem.</p> <div class="p">The best way to share the input file among the encoding threads is to map it into memory. In fact, memory-mapped file I/O has so many advantages in general that I'm surprised at how little I see it used, although it does have the disadvantage of requiring a bit more bookkeeping. Here is how I use it in my multithreaded encoder (slightly paraphrased): <pre class="code-container"><code class="language-cpp">#include &lt;fcntl.h&gt;    /* open() */
#include &lt;string.h&gt;   /* memcmp() */
#include &lt;sys/mman.h&gt; /* mmap()/munmap() */
#include &lt;sys/stat.h&gt; /* fstat() */
#include &lt;unistd.h&gt;   /* close() */

int main(int argc, char *argv[]) {
  int fdin;
  struct stat buf;
  char *bufin;

  fdin = open(argv[1], O_RDONLY);
  fstat(fdin, &buf);
  bufin = mmap(NULL, buf.st_size, PROT_READ, MAP_SHARED, fdin, 0);
  /* The input file (passed in via argv[1]) is now mapped read-only to
     the memory region from bufin up to bufin + buf.st_size. */

  /* Note that you can work directly with the mapped input file instead
     of fread()ing the header into a buffer. */
  if((buf.st_size &lt; WAV_HEADER_SIZE) ||
     memcmp(bufin, "RIFF", 4) ||
     memcmp(bufin+8, "WAVEfmt \020\000\000\000\001\000\002\000", 16) ||
     memcmp(bufin+32, "\004\000\020\000data", 8)) {
    /* Invalid input file: print error and exit. */
  }

  for (i = 0; i &lt; num_threads; ++i) {
    shard_infos[i].bufin = bufin + WAV_HEADER_SIZE + i * bytes_per_thread;
    /* bufsize for the last thread may be slightly larger. */
    shard_infos[i].bufsize = bytes_per_thread;
  }

  /* Spawn encode threads (each of which calls encode_shard() below),
     passing an element of shard_infos to each. */
  ...

  munmap(bufin, buf.st_size);
  close(fdin);
}

FLAC__bool encode_shard(struct shard_info *shard_info) {
  FLAC__StreamEncoder *encoder = FLAC__stream_encoder_new();
  ...
  /* The input file is paged in lazily as this function accesses bufin
     from shard_info->bufin. */
  /* (Paraphrased: the real libFLAC call takes an array of FLAC__int32
     samples and a per-channel sample count, so the 16-bit PCM bytes
     have to be converted first.) */
  FLAC__stream_encoder_process_interleaved(encoder, shard_info->bufin, shard_info->bufsize);
  ...
  FLAC__stream_encoder_delete(encoder);
}</code></pre> However, handling the output file is a bit trickier. Since each thread's encoded FLAC output varies in size, we can't know where to write a shard's output until the encoding threads for all preceding shards are done. A convenient and fast way to handle this is to use asynchronous I/O: we wait for the encoding threads in shard order and, as each one finishes, queue up a write request at the offset given by the total size of the preceding shards' output. Here is how I use the POSIX asynchronous I/O API in my multithreaded encoder (again, slightly paraphrased): <pre class="code-container"><code class="language-cpp">#include &lt;aio.h&gt;     /* aio_*() */
#include &lt;fcntl.h&gt;   /* open() */
#include &lt;pthread.h&gt; /* pthread_*() */
#include &lt;string.h&gt;  /* memset() */
#include &lt;unistd.h&gt;  /* close() */

int main(int argc, char *argv[]) {
  int fdout;
  pthread_t threads[MAX_THREADS];
  struct aiocb aiocbs[MAX_THREADS];
  unsigned long byte_offset = 0;

  /* mmap input file in. */
  ...

  /* O_CREAT requires a mode argument for the newly created file. */
  fdout = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);

  /* Spawn encode threads passing an element of shard_infos to each. */
  ...

  /* Wait for each thread in sequence and queue up output writes. */

  /* We need to zero out any aiocb struct that we use before we fill in
     any members. */
  memset(aiocbs, 0, num_threads * sizeof(*aiocbs));
  for (i = 0; i &lt; num_threads; ++i) {
    pthread_join(threads[i], NULL);
    aiocbs[i].aio_buf = shard_infos[i].bufout;
    aiocbs[i].aio_nbytes = shard_infos[i].bytes_written;
    aiocbs[i].aio_offset = byte_offset;
    aiocbs[i].aio_fildes = fdout;
    aio_write(&aiocbs[i]);
    byte_offset += shard_infos[i].bytes_written;
  }

  /* Wait for all output writes to finish. */
  for (i = 0; i &lt; num_threads; ++i) {
    const struct aiocb *aiocbp = &aiocbs[i];
    aio_suspend(&aiocbp, 1, NULL);
    aio_return(&aiocbs[i]);
  }

  close(fdout);
}</code></pre> </div> <p>The POSIX API is a bit unwieldy for this use case; ideally, there would be a version of <code>aio_suspend()</code> that would suspend the calling process until <em>all</em> of the specified requests have completed. As it is, the simplest approach is to loop through the requests as above, especially since the maximum number of simultaneous asynchronous I/O requests is usually quite small (16 on my system).</p> <p>Also, I found that the OS X implementation of <code>aio_write()</code> did not obey this part of the specified behavior:</p> <blockquote> <pre>
If O_APPEND is set for aiocbp->aio_fildes, aio_write() operations append
to the file in the same order as the calls were made.  If O_APPEND is not
set for the file descriptor, the write operation will occur at the
absolute position from the beginning of the file plus aiocbp->aio_offset.</pre> </blockquote> <p>but it was just as easy (and clearer) to explicitly set the correct offset.</p> <p>I had to hack up libFLAC a bit to implement my multithreaded encoder. I exposed <code>update_metadata_()</code> to make it easy to write the correct number of total samples in the metadata field and also to zero out the min/max framesize fields. I also exposed the <code>FLAC__stream_encoder_set_do_md5()</code> function (which should have been public in the first place) so that I can turn off writing the MD5 field in the metadata. Finally, I added the function <code>FLAC__stream_encoder_set_current_frame_number()</code> so that the correct frame numbers are written at encode time.</p> <p>For comparison purposes, I turn off MD5 calculation in my multithreaded encoder as well as in the baseline one. Since calling <code>FLAC__stream_encoder_set_current_frame_number()</code> causes crashes with verification turned on, I also turn verification off.
The numbers above reflect these settings, so they likely underestimate the encode times of a production multithreaded encoder. However, the essential behavior of the program shouldn't change much.</p> <p><a href="/parallelizing-flac-encoding-files/patch-libFLAC.in">Here</a> is a patch file for the <a href="http://downloads.sourceforge.net/flac/flac-1.2.1.tar.gz?modtime=1189961849&amp;big_mirror=0">flac 1.2.1 source</a> that implements the hacks I described above. <a href="/parallelizing-flac-encoding-files/mt_encode.c">Here</a> is the source for my multithreaded FLAC encoder. I've tested it with <code>i686-apple-darwin9-gcc-4.0.1</code> and <code>i686-apple-darwin9-gcc-4.2.1</code> on Mac OS X. I got the above numbers compiling <code>mt_encode.c</code> with gcc 4.2.1 and the switches <code>-Wall -Werror -g -O2 -ansi</code>.</p> https://www.akalin.com/bfpp bfpp 2008-04-23T00:00:00-07:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <div class="p">Okay, I lied; you can't <em>really</em> embed <a href="http://www.muppetlabs.com/~breadbox/bf/">brainfuck</a> in C++, but you can get pretty close. Here is an example: <pre class="code-container"><code class="language-cpp">#include "bfpp.h"

int main() {
  // Prints out factorial numbers in sequence. Adapted from
  // http://www.hevanet.com/cristofd/brainfuck/factorial.b .
  bfpp
  /* ... */
  end_bfpp
}</code></pre> I call this variant &ldquo;bfpp&rdquo; since it has some pretty significant differences from brainfuck. First of all, some commands had to be adapted; although <code>+</code> and <code>-</code> remain the same, <ul> <li><code>&lt;</code> and <code>&gt;</code> were changed to <code>&amp;</code> and <code>*</code>,</li> <li><code>.</code> and <code>,</code> were changed to <code>!</code> and <code>~</code> (mnemonic: <code>!</code> contains <code>.</code> within it and <code>~</code> is kind of like a sideways <code>,</code>),</li> <li>and <code>[</code> and <code>]</code> were changed to <code>--</code> and <code>++</code> (mnemonic: <code>[</code> and <code>]</code> are the most complex brainfuck commands [to implement, at least] and so deserve to be mapped to the wider and more prominent operators).</li> </ul> This magic is made possible by the fact that brainfuck has exactly eight commands and C++ has exactly eight overloadable symbolic unary operators. Add some macros to hide the C++ scaffolding behind some delimiters and you have a convincing illusion of an embedded language.</div> <p><a href="/bfpp-files/bfpp.h">bfpp.h</a> implements a simple (&lt;100 lines) bfpp interpreter and the magic described above, and <a href="/bfpp-files/bf2bfpp.c">bf2bfpp.c</a> is a straightforward translator from brainfuck to bfpp. Gotta love C++!</p> https://www.akalin.com/longest-palindrome-linear-time Finding the Longest Palindromic Substring in Linear Time 2007-11-28T00:00:00-08:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <style type="text/css" media="all"> /*<![CDATA[*/ span.palind { color: red; } /*]]>*/ </style> <script> function trackOutboundLink(url) { ga('send', 'event', 'outbound', 'click', url, { 'hitCallback': function() { document.location = url; } }); } </script> <p>Another <a href="http://www.reddit.com/r/programming/comments/2dykz/finding_palindromes_repairing_endos_dna_and_the/" onclick="trackOutboundLink('http://programming.reddit.com/info/2dykz/comments/c2e7r0'); return false;">interesting problem</a> I stumbled across on reddit is finding the longest substring of a given string that is a palindrome. I found <a href="http://johanjeuring.blogspot.com/2007/08/finding-palindromes.html" onclick="trackOutboundLink('http://johanjeuring.blogspot.com/2007/08/finding-palindromes.html'); return false;">the explanation on Johan Jeuring's blog</a> somewhat confusing, and I had to spend some time poring over the Haskell code (eventually rewriting it in Python) and walking through examples before it "clicked." I haven't found any other explanations of the same approach, so hopefully my explanation below will help the next person who is curious about this problem.</p> <p>Of course, the most naive solution would be to exhaustively examine all $$n \choose 2$$ substrings of the given $$n$$-length string, test whether each one is a palindrome, and keep track of the longest one seen so far.
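A minimal JavaScript sketch of this exhaustive approach (the function names are illustrative) shows where the cost comes from: two nested loops choose a substring, and the palindrome test scans it.

```javascript
// Returns true if s.slice(lo, hi) reads the same forwards and backwards.
function isPalindrome(s, lo, hi) {
  for (let i = lo, j = hi - 1; i < j; ++i, --j) {
    if (s[i] !== s[j]) return false;
  }
  return true;
}

// Exhaustive search: try every substring s.slice(i, j) and keep the
// longest palindromic one. There are O(n^2) substrings and each test
// takes O(n) time, for O(n^3) overall.
function longestPalindromeBruteForce(s) {
  let best = "";
  for (let i = 0; i < s.length; ++i) {
    for (let j = i + 1; j <= s.length; ++j) {
      // Only test substrings that would beat the current best.
      if (j - i > best.length && isPalindrome(s, i, j)) {
        best = s.slice(i, j);
      }
    }
  }
  return best;
}

console.log(longestPalindromeBruteForce("abababa")); // abababa
```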
This has complexity $$O(n^3)$$, but we can easily do better by realizing that a palindrome is centered on either a letter (for odd-length palindromes) or a space between letters (for even-length palindromes). Therefore we can examine all $$2n + 1$$ possible centers (the $$n$$ letters and the $$n + 1$$ spaces, counting those before the first letter and after the last) and find the longest palindrome for each center, keeping track of the overall longest palindrome. Since expanding around a single center takes $$O(n)$$ time, this has complexity $$O(n^2)$$.</p> <div class="p">It is not immediately clear that we can do better, but if we're told that a $$Θ(n)$$ algorithm exists, we can infer that the algorithm is most likely structured as an iteration through all possible centers. As an off-the-cuff first attempt, we can adapt the above algorithm by keeping track of the current center and expanding until we find the longest palindrome around that center, at which point we consider the last letter (or space) of that palindrome as the new center. The algorithm (which isn't correct) looks like this informally: <ol type="1"> <li>Set the current center to the first letter.</li> <li>Loop while the current center is valid: <ol type="a"> <li>Expand to the left and right simultaneously until we find the largest palindrome around this center.</li> <li>If the current palindrome is bigger than the stored maximum one, store the current one as the maximum one.</li> <li>Set the space following the current palindrome as the current center unless the two letters immediately surrounding it are different, in which case set the last letter of the current palindrome as the current center.</li> </ol> </li> <li>Return the stored maximum palindrome.</li> </ol> </div> <p>This seems to work, but it doesn't handle all cases: consider the string "abababa". The first non-trivial palindrome we see is "<span class="palind">a</span>|bababa", followed by "<span class="palind">aba</span>|baba".
Considering the current space as the center doesn't get us anywhere, but considering the preceding letter (the second 'a') as the center, we can expand to get "<span class="palind">ababa</span>|ba". From this state, considering the current space again doesn't get us anywhere, but considering the preceding letter as the center, we can expand to get "ab<span class="palind">ababa</span>|". However, this is incorrect, as the longest palindrome is actually the entire string! We can remedy this case by changing the algorithm to try to set the new center to be one before the end of the last palindrome, but it is clear that having a fixed "lookbehind" doesn't solve the general case, and anything more than that will probably bump us back up to quadratic time.</p> <div class="p">The key question is this: given the state from the example above, "<span class="palind">ababa</span>|ba", what makes the second 'b' so special that it should be the new center? To use another example, in "<span class="palind">abcbabcba</span>|bcba", what makes the second 'c' so special that it should be the new center? Hopefully, the answer to this question will lead to the answer to the more important question: once we stop expanding the palindrome around the current center, how do we pick the next center? To answer the first question, first notice that the current palindromes in the above examples themselves contain smaller non-trivial palindromes: "ababa" contains "aba", and "abcbabcba" contains "abcba", which also contains "bcb". Then, notice that if we expand around the "special" letters, we get a palindrome which shares a right edge with the current palindrome; that is, <em>the longest palindromes around the special letters (within the current palindrome) are proper suffixes of the current palindrome</em>. With a little thought, we can then answer the second question: <em>to pick the next center, take the center of the longest palindromic proper suffix of the current palindrome</em>.
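To make this rule concrete, here is a deliberately naive JavaScript helper (the name is hypothetical) that scans the proper suffixes of a palindrome from longest to shortest and returns the first palindromic one, together with the position of its center. On the two examples above it picks out exactly the "special" letters: the second 'b' in "ababa" and the second 'c' in "abcbabcba". It takes quadratic time, so it only illustrates which center the rule selects, not how to find it efficiently.

```javascript
// Naive illustration only: returns the longest palindromic proper suffix
// of the palindrome p, along with the position of its center in p.
// A whole-number center is a letter; a center ending in .5 is the space
// between two letters.
function longestPalindromicProperSuffix(p) {
  const isPal = (s) => s === [...s].reverse().join("");
  // Try suffixes from longest to shortest; the first palindrome wins.
  for (let start = 1; start < p.length; ++start) {
    const suffix = p.slice(start);
    if (isPal(suffix)) {
      return { suffix, center: start + (suffix.length - 1) / 2 };
    }
  }
  // Only reached when p has at most one letter.
  return null;
}

console.log(longestPalindromicProperSuffix("ababa"));     // suffix "aba", center 3 (the second "b")
console.log(longestPalindromicProperSuffix("abcbabcba")); // suffix "abcba", center 6 (the second "c")
```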
Our algorithm then looks like this: <ol type="1"> <li>Set the current center to the first letter.</li> <li>Loop while the current center is valid: <ol type="a"> <li>Expand to the left and right simultaneously until we find the largest palindrome around this center.</li> <li>If the current palindrome is bigger than the stored maximum one, store the current one as the maximum one.</li> <li>Find the longest palindromic proper suffix of the current palindrome.</li> <li>Set the center of the suffix from step c as the current center, and start expanding from the suffix, since it is already known to be palindromic.</li> </ol> </li> <li>Return the stored maximum palindrome.</li> </ol> </div> <p>However, unless step 2c can be done efficiently, it will cause the algorithm to be superlinear. Doing step 2c efficiently seems impossible, since we have to examine the entire current palindrome to find the longest palindromic suffix, unless we somehow keep track of extra state as we progress through the input string. Notice that the longest palindromic suffix would by definition also be a palindromic substring of the input string, so it might suffice to keep track of every palindrome that we see as we move through the string and hopefully, by the time we finish expanding around a given center, we would know where all the palindromes with centers lying to the left of the current one are. However, a proper suffix of the current palindrome always has its center to the right of the current center, so we would not yet know about it. But we also have at our disposal the very useful fact that <em>a palindromic proper suffix of a palindrome has a corresponding dual palindromic proper prefix</em>. For example, in "abcbabcba" from above, notice that "abcba" appears twice: once as a prefix and once as a suffix.
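The dual exists because reversing a palindrome leaves it unchanged: if t is a palindromic suffix of the palindrome p, then reversing p moves reverse(t) = t to the front, so t is also a prefix of p. A small JavaScript check of this (illustrative helper name, naive quadratic scan) lists the palindromic proper suffixes of a palindrome and records whether each also occurs as a prefix.

```javascript
// Lists the palindromic proper suffixes of the palindrome p, longest
// first, and records for each whether it is also a prefix of p (its dual).
// Since reversing p leaves it unchanged, every entry should be true.
function palindromicSuffixesWithDuals(p) {
  const isPal = (s) => s === [...s].reverse().join("");
  const result = [];
  for (let start = 1; start < p.length; ++start) {
    const suffix = p.slice(start);
    if (isPal(suffix)) {
      result.push({ suffix, alsoPrefix: p.startsWith(suffix) });
    }
  }
  return result;
}

// For "abcbabcba" the palindromic proper suffixes are "abcba" and "a",
// and both are also prefixes.
console.log(palindromicSuffixesWithDuals("abcbabcba"));
```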
Therefore, while we might not know about the palindromic suffixes of our current palindrome themselves, we would know about their duals, whose centers lie to the left of the current center.</p> <p>Another crucial realization is that we don't have to keep track of all the palindromes we've seen. To use the example "abcbabcba" again, we don't really care about "bcb" that much, since it has the same center as the longer palindrome "abcba" and is contained in it. In fact, we only need to keep track of the longest palindrome for each center or, equivalently, its length. But this is simply a more general version of our original problem, which is to find the longest palindrome around <em>any</em> center! Thus, if we can keep track of this state efficiently, maybe by taking advantage of the properties of palindromes, we don't have to keep track of the maximal palindrome and can instead figure it out at the very end.</p> <p>Unfortunately, we seem to be back where we started; the second naive algorithm that we have is simply to loop through all possible centers and, for each one, find the longest palindrome around that center. But our discussion has led us to a different incremental formulation: given a current center, the longest palindrome around that center, and a list of the lengths of the longest palindromes around the centers to the left of the current center, can we efficiently figure out the new center to consider and extend the list of longest palindrome lengths up to that center? 
For example, if we have the state:</p> <p>&lt;"ab<span class="palind">a</span>ba|??", [0, 1, 0, 3, 0, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]&gt;</p> <p>where the highlighted letter is the current center, the vertical line is our current position, the question marks represent unread characters or unknown quantities, and the array represents the list of longest palindrome lengths by center, can we get to the state:</p> <p>&lt;"aba<span class="palind">b</span>a|??", [0, 1, 0, 3, 0, 5, 0, ?, ?, ?, ?, ?, ?, ?, ?]&gt;</p> <p>and then to:</p> <p>&lt;"aba<span class="palind">b</span>aba|", [0, 1, 0, 3, 0, 5, 0, 7, 0, 5, 0, 3, 0, 1, 0]&gt;</p> <p>efficiently? The crucial thing to notice is that the longest palindrome lengths array (we'll simply call it the lengths array) in the final state is palindromic, since the original string is palindromic. In fact, the lengths array obeys a more general property: <em>the longest palindrome <var>d</var> places to the right of the current center (the <var>d</var>-right palindrome) is at least as long as the longest palindrome <var>d</var> places to the left of the current center (the <var>d</var>-left palindrome) if the <var>d</var>-left palindrome is completely contained in the longest palindrome around the current center (the center palindrome), and it is of equal length if the <var>d</var>-left palindrome is not a prefix of the center palindrome or if the center palindrome is a suffix of the entire string</em>. This then implies that we can more or less fill in the values to the right of the current center from the values to the left of the current center. For example, from [0, 1, 0, 3, 0, 5, ?, ?, ?, ?, ?, ?, ?, ?, ?] we can get to [0, 1, 0, 3, 0, 5, 0, &ge;3?, 0, &ge;1?, 0, ?, ?, ?, ?]. This also implies that the first unknown entry (in this case, &ge;3?)
should be the new center because it means that the center palindrome is not a suffix of the input string (i.e., we're not done) and that the <var>d</var>-left palindrome is a prefix of the center palindrome.</p> <div class="p">From these observations we can construct our final algorithm, which returns the lengths array, from which it is easy to find the longest palindromic substring: <ol type="1"> <li>Initialize the lengths array with one (as yet unknown) entry for each possible center.</li> <li>Set the current center to the first center.</li> <li>Loop while the current center is valid: <ol type="a"> <li>Expand to the left and right simultaneously until we find the largest palindrome around this center.</li> <li>Fill in the appropriate entry in the lengths array.</li> <li>Iterate backwards through the lengths array and fill in the corresponding values to the right of the entry for the current center until an unknown value (as described above) is encountered.</li> <li>Set the new center to the index of this unknown value.</li> </ol> </li> <li>Return the lengths array.</li> </ol> </div> <p>Note that at each step of the algorithm we're either incrementing our current position in the input string or filling in an entry in the lengths array. Since the lengths array has size linear in the size of the input string, the algorithm has worst-case linear running time. And since, given the lengths array, we can find and return the longest palindromic substring in linear time, composing these two operations gives a linear-time algorithm for finding the longest palindromic substring.</p> <div class="p">Here is Python code that implements the above algorithm (although it is closer to Johan Jeuring's Haskell implementation than to the above description): <pre class="code-container"><code class="language-python">def fastLongestPalindromes(seq):
    """
    Behaves identically to naiveLongestPalindromes (see below), but
    runs in linear time.
    """
    seqLen = len(seq)
    l = []
    i = 0
    palLen = 0
    # Loop invariant: seq[(i - palLen):i] is a palindrome.
    # Loop invariant: len(l) &gt;= 2 * i - palLen. The code path that
    # increments palLen skips the l-filling inner-loop.
    # Loop invariant: len(l) &lt; 2 * i + 1. Any code path that
    # increments i past seqLen - 1 exits the loop early and so skips
    # the l-filling inner loop.
    while i &lt; seqLen:
        # First, see if we can extend the current palindrome. Note
        # that the center of the palindrome remains fixed.
        if i &gt; palLen and seq[i - palLen - 1] == seq[i]:
            palLen += 2
            i += 1
            continue

        # The current palindrome is as large as it gets, so we append
        # it.
        l.append(palLen)

        # Now to make further progress, we look for a smaller
        # palindrome sharing the right edge with the current
        # palindrome. If we find one, we can try to expand it and see
        # where that takes us. At the same time, we can fill the
        # values for l that we neglected during the loop above. We
        # make use of our knowledge of the length of the previous
        # palindrome (palLen) and the fact that the values of l for
        # positions on the right half of the palindrome are closely
        # related to the values of the corresponding positions on the
        # left half of the palindrome.

        # Traverse backwards starting from the second-to-last index up
        # to the edge of the last palindrome.
        s = len(l) - 2
        e = s - palLen
        for j in range(s, e, -1):
            # d is the value l[j] must have in order for the
            # palindrome centered there to share the left edge with
            # the last palindrome. (Drawing it out is helpful to
            # understanding why the - 1 is there.)
            d = j - e - 1

            # We check to see if the palindrome at l[j] shares a left
            # edge with the last palindrome. If so, the corresponding
            # palindrome on the right half must share the right edge
            # with the last palindrome, and so we have a new value for
            # palLen.
            #
            # An exercise for the reader: in this place in the code you
            # might think that you can replace the == with &gt;= to improve
            # performance. This does not change the correctness of the
            # algorithm but it does hurt performance, contrary to
            # expectations. Why?
            if l[j] == d:
                palLen = d
                # We actually want to go to the beginning of the outer
                # loop, but Python doesn't have loop labels. Instead,
                # we use an else block corresponding to the inner
                # loop, which gets executed only when the for loop
                # exits normally (i.e., not via break).
                break

            # Otherwise, we just copy the value over to the right
            # side. We have to bound l[j] because palindromes on the
            # left side could extend past the left edge of the last
            # palindrome, whereas their counterparts won't extend past
            # the right edge.
            l.append(min(d, l[j]))
        else:
            # This code is executed in two cases: when the for loop
            # isn't taken at all (palLen == 0) or the inner loop was
            # unable to find a palindrome sharing the left edge with
            # the last palindrome. In either case, we're free to
            # consider the palindrome centered at seq[i].
            palLen = 1
            i += 1

    # We know from the loop invariant that len(l) &lt; 2 * seqLen + 1, so
    # we must fill in the remaining values of l.

    # Obviously, the last palindrome we're looking at can't grow any
    # more.
    l.append(palLen)

    # Traverse backwards starting from the second-to-last index up
    # until we get l to size 2 * seqLen + 1. We can deduce from the
    # loop invariants we have enough elements.
    lLen = len(l)
    s = lLen - 2
    e = s - (2 * seqLen + 1 - lLen)
    for i in range(s, e, -1):
        # The d here uses the same formula as the d in the inner loop
        # above. (Computes distance to left edge of the last
        # palindrome.)
        d = i - e - 1

        # We bound l[i] with min for the same reason as in the inner
        # loop above.
        l.append(min(d, l[i]))

    return l</code></pre> And here is a naive quadratic version for comparison: <pre class="code-container"><code class="language-python">def naiveLongestPalindromes(seq):
    """
    Given a sequence seq, returns a list l such that l[2 * i + 1]
    holds the length of the longest palindrome centered at seq[i]
    (which must be odd), l[2 * i] holds the length of the longest
    palindrome centered between seq[i - 1] and seq[i] (which must be
    even), and l[2 * len(seq)] holds the length of the longest
    palindrome centered past the last element of seq (which must be 0,
    as is l[0]).

    The actual palindrome for l[i] is seq[s:(s + l[i])] where s is
    i // 2 - l[i] // 2. (// is integer division.)

    Example:
    naiveLongestPalindromes('ababa') -> [0, 1, 0, 3, 0, 5, 0, 3, 0, 1, 0]

    Runs in quadratic time.
    """
    seqLen = len(seq)
    lLen = 2 * seqLen + 1
    l = []

    for i in range(lLen):
        # If i is even (i.e., we're on a space), this will produce e
        # == s. Otherwise, we're on an element and e == s + 1, as a
        # single letter is trivially a palindrome.
        s = i // 2
        e = s + i % 2

        # Loop invariant: seq[s:e] is a palindrome.
        while s &gt; 0 and e &lt; seqLen and seq[s - 1] == seq[e]:
            s -= 1
            e += 1

        l.append(e - s)

    return l</code></pre> Note that this is not the only efficient solution to this problem; building a suffix tree takes time linear in the length of the input string, and you can use one to solve this problem, but as Johan also mentions, that is a much less direct and efficient solution than this one.</div> <hr /> <p>Like this post? Subscribe to <!-- The image is 256x256, the center of the dot is 189 pixels from the top, and the radius of the dot is 24.
Therefore, the dot is 43/256 = 0.16796875 of the image height above the bottom.--> <a href="feed/atom">my feed <img src="feed-icon.svg" alt="RSS icon" style="width: 1em; height: 1em; vertical-align: -0.16796875em;" /></a> or follow me on <a href="https://twitter.com/fakalin">Twitter <img src="twitter-icon.svg" alt="Twitter icon" style="width: 1em; height 1em;" /></a>.</p> https://www.akalin.com/number-theory-haskell-foray A Foray into Number Theory with Haskell 2007-07-06T00:00:00-07:00 Fred Akalin https://www.akalin.com/ © Fred Akalin 2005–2021. All rights reserved. <script> // See https://github.com/Khan/KaTeX/issues/85 . KaTeXMacros = { "\\cfrac": "\\dfrac{#1}{#2}\\kern-1.2pt", }; </script> <div class="p">I encountered <a href="http://programming.reddit.com/info/216p9/comments">an interesting problem</a> on reddit a few days ago, which can be paraphrased as follows: <blockquote><p>Find a perfect square $$s$$ such that $$1597s + 1$$ is also a perfect square.</p></blockquote> </div> <p>After I had read the discussion about implementing a brute-force algorithm to solve the problem and spent a futile half-hour or so trying to find a better way myself, someone noticed that the problem was an instance of <a href="http://en.wikipedia.org/wiki/Pell%27s_equation">Pell's equation</a>, which is known to have an elegant and fast solution; indeed, he posted a <a href="http://programming.reddit.com/info/216p9/comments/c21dpn">one-liner in Mathematica</a> solving the given problem.
However, I wanted to try coding up the solution myself: the Mathematica solution, while succinct, isn't very enlightening, since the heavy lifting is done by a built-in function and an arbitrary constant was used for this particular instance of Pell's equation.</p> <p>Pell's equation is simply the <a href="http://en.wikipedia.org/wiki/Diophantine_equation">Diophantine equation</a> $$x^2 - dy^2 = 1$$ for a given $$d$$<sup><a href="#fn1" id="r1">1</a></sup>; being Diophantine means that all variables involved take on only integer values. (In our original problem, $$d$$ is 1597 and we are asked for $$s = y^2$$.) The solution involves finding the <em>continued fraction expansion</em> of $$\sqrt{d}$$, finding the first <em>convergent</em> of the expansion that satisfies Pell's equation, and then generating all other solutions from that <em>fundamental solution</em>. We rule out the trivial solution $$x = 1$$, $$y = 0$$; this also means that there is no solution when $$d$$ is a perfect square $$k^2$$, since then $$(x - ky)(x + ky) = 1$$ forces $$y = 0$$.</p> <p>A continued fraction is an expression of the form: $x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots\,}}}}$ where all $$a_i$$ are integers and all but the first one are positive. The standard math notation for continued fractions is quite unwieldy, so from now on we'll use $$\left \langle a_0; a_1, a_2, \dotsc \right \rangle$$ instead of the above.</p> <div class="p">The theory of continued fractions is a rich and beautiful one, but for now we'll just state a few facts: <ul> <li>The continued fraction expansion of a number is (mostly) unique.</li> <li>The continued fraction expansion of a rational number is finite.</li> <li>The continued fraction expansion of an irrational number is infinite.</li> <li>A <a href="http://en.wikipedia.org/wiki/Quadratic_surd">quadratic surd</a> is a number of the form $$\frac{a + \sqrt{b}}{c}$$ where $$a$$, $$b$$, and $$c$$ are integers.
The continued fraction expansion of an irrational quadratic surd is eventually periodic; that is, after a certain number of terms, it repeats forever. This applies in particular to the square root of a non-square integer, where the repetition starts right after the first term.</li> <li>Truncating an infinite continued fraction to get a finite continued fraction gives (in some sense) an optimal rational approximation to the irrational number represented by the infinite continued fraction.</li> </ul> </div> <div class="p">Given a quadratic surd, it is fairly easy to rewrite it in the form $$a + \frac{1}{q}$$, where $$a$$ is an integer and $$q$$ is another quadratic surd. This fact can be used to come up with an algorithm to find the continued fraction expansion of a square root. Wikipedia <a href="http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Continued_fraction_expansion">explains it pretty well</a>, so I won't go over it, but here is my Haskell implementation: <pre class="code-container"><code class="language-haskell">sqrt_continued_fraction n = [ a_i | (_, _, a_i) &lt;- mdas ]
  where
    mdas = iterate get_next_triplet (m_0, d_0, a_0)
    m_0 = 0
    d_0 = 1
    a_0 = truncate $ sqrt $ fromIntegral n
    get_next_triplet (m_i, d_i, a_i) = (m_j, d_j, a_j)
      where
        m_j = d_i * a_i - m_i
        d_j = (n - m_j * m_j) `div` d_i
        a_j = (a_0 + m_j) `div` d_j</code></pre> and here are some examples: <pre class="code-container"><code class="language-shell">Prelude Main> take 20 $ sqrt_continued_fraction 2
[1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2]
Prelude Main> take 20 $ sqrt_continued_fraction 103
[10,6,1,2,1,1,9,1,1,2,1,6,20,6,1,2,1,1,9,1]
Prelude Main> take 20 $ sqrt_continued_fraction 36
[6,*** Exception: divide by zero</code></pre> </div> <p>(Note that we're assuming that we won't be called with a perfect square.
Also, do you notice anything interesting about the periodic portion of the continued fractions, particularly of $$\sqrt{103}$$?)</p> <div class="p">For those who are unfamiliar with Haskell, here's a quick list of key facts: <ul> <li>The first line takes a list of triplets and forms a list of all third elements, which is what we're interested in. (The other two elements of each triplet are auxiliary variables used by the algorithm.)</li> <li><code>iterate</code> is a function that takes a function <code>f</code> and an initial value <code>x</code> and returns the infinite list <code>[ x, f(x), f(f(x)), f(f(f(x))), ... ]</code>.</li> <li>Note that Haskell uses <a href="http://en.wikipedia.org/wiki/Lazy_evaluation">lazy evaluation</a>, so this function does not take an infinite amount of time to run; the elements of the list are evaluated (and memoized) only when needed.</li> <li>The rest of the function is a straightforward translation of the core of the algorithm described in the above Wikipedia entry.</li> </ul> </div> <p>It may not be clear what $$\sqrt{d}$$ and its continued fraction expansion have to do with solving Pell's equation. However, notice that if $$x$$ and $$y$$ solve Pell's equation, then rearranging it shows that $$\frac{x}{y}$$ is a very good approximation of $$\sqrt{d}$$: since $$(x - y\sqrt{d})(x + y\sqrt{d}) = 1$$ and $$x \gt y\sqrt{d}$$, we get $$0 \lt \frac{x}{y} - \sqrt{d} = \frac{1}{y(x + y\sqrt{d})} \lt \frac{1}{2y^2}$$. In fact, it is so good that you can prove that $$\frac{x}{y}$$ <em>must</em> come from truncating the continued fraction expansion of $$\sqrt{d}$$: by a classical theorem of Legendre, any fraction $$\frac{p}{q}$$ within $$\frac{1}{2q^2}$$ of an irrational number is one of its convergents.</p> <p>This leads us to the following: if you have an infinite continued fraction $$\left \langle a_0; a_1, a_2, \dotsc \right \rangle$$, you can truncate it into a finite continued fraction $$\left \langle a_0; a_1, a_2, \dotsc, a_i \right \rangle$$ and simplify it into the rational number $$\frac{p_i}{q_i}$$.
The sequence $$\frac{p_0}{q_0}, \frac{p_1}{q_1}, \frac{p_2}{q_2}, \dotsc$$ forms the <a href="http://en.wikipedia.org/wiki/Convergent_%28continued_fraction%29"><em>convergents</em></a> of $$\left \langle a_0; a_1, a_2, \dotsc \right \rangle$$ and converges to the irrational number it represents.</p> <div class="p">It turns out you can calculate $$p_{i+1}$$ and $$q_{i+1}$$ efficiently from $$p_i$$, $$q_i$$, $$p_{i-1}$$, $$q_{i-1}$$, and $$a_{i+1}$$ using the <a href="http://en.wikipedia.org/wiki/Fundamental_recurrence_formulas"><em>fundamental recurrence formulas</em></a> $$p_{i+1} = a_{i+1} p_i + p_{i-1}$$ and $$q_{i+1} = a_{i+1} q_i + q_{i-1}$$ (which can be proved by induction, starting from $$p_0 = a_0$$, $$q_0 = 1$$, $$p_1 = a_1 a_0 + 1$$, and $$q_1 = a_1$$). Here is my Haskell implementation: <pre class="code-container"><code class="language-haskell">get_convergents (a_0 : a_1 : as) = pqs
  where
    pqs = (p_0, q_0) : (p_1, q_1) :
          zipWith3 get_next_convergent pqs (tail pqs) as
    p_0 = a_0
    q_0 = 1
    p_1 = a_1 * a_0 + 1
    q_1 = a_1
    get_next_convergent (p_i, q_i) (p_j, q_j) a_k = (p_k, q_k)
      where
        p_k = a_k * p_j + p_i
        q_k = a_k * q_j + q_i</code></pre> and some more examples: <pre class="code-container"><code class="language-shell">Prelude Main> take 8 $ get_convergents $ sqrt_continued_fraction 2
[(1,1),(3,2),(7,5),(17,12),(41,29),(99,70),(239,169),(577,408)]
Prelude Main> take 8 $ get_convergents $ sqrt_continued_fraction 103
[(10,1),(61,6),(71,7),(203,20),(274,27),(477,47),(4567,450),(5044,497)]
Prelude Main> take 8 $ get_convergents $ sqrt_continued_fraction 1597
[(39,1),(40,1),(1039,26),(1079,27),(2118,53),(3197,80),(27694,693),(113973,2852)]
Prelude Main> let divFrac (x, y) = (fromInteger x) / (fromInteger y)
Prelude Main> take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 2
[1.0,1.5,1.4,1.4166666666666667,1.4137931034482758,1.4142857142857144,1.4142011834319526,1.4142156862745099]
Prelude Main> take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 103
[10.0,10.166666666666666,10.142857142857142,10.15,10.148148148148149,10.148936170212766,10.148888888888889,10.148893360160965]
Prelude Main> take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 1597
[39.0,40.0,39.96153846153846,39.96296296296296,39.9622641509434,39.9625,39.96248196248196,39.9624824684432]</code></pre> </div> <div class="p">Here are a few more quick facts to help those unfamiliar with Haskell: <ul> <li>The expression <code>a : as</code> forms a new list from the element <code>a</code> and the existing list <code>as</code> (equivalent to <code>cons</code> in Lisp).</li> <li><code>zipWith3</code> is a function that takes a function <code>f</code> and three (possibly infinite) lists <code>a</code>, <code>b</code>, and <code>c</code>, and forms the new list <code>[ f(a[0], b[0], c[0]), f(a[1], b[1], c[1]), ... ]</code>, which is as long as the shortest of the three lists.</li> <li>Note that the result of <code>zipWith3</code> is part of the variable <code>pqs</code>, which itself appears (twice!) in the arguments to <code>zipWith3</code>. This is a Haskell idiom and reflects the fact that the recurrence formulas define a convergent in terms of its two previous convergents. A simpler example (using the Fibonacci sequence) can be found in the <a href="http://en.wikipedia.org/wiki/Lazy_evaluation">Wikipedia entry for lazy evaluation</a>.</li> <li>Haskell has a built-in arbitrary-precision integer type (<code>Integer</code>), which is necessary since the numerators and denominators of the convergents get large quickly. Haskell also has built-in support for rational numbers (represented as fractions), but it doesn't help us much here.</li> </ul> </div> <div class="p">Since we are guaranteed that some convergent eventually satisfies Pell's equation, we can write a simple function to generate all convergents, test each one to see if it satisfies Pell's equation, and return the first one we see.
Here is the Haskell implementation: <pre class="code-container"><code class="language-haskell">get_pell_fundamental_solution n = head $ solutions
  where
    solutions = [ (p, q) | (p, q) &lt;- convergents, p * p - n * q * q == 1 ]
    convergents = get_convergents $ sqrt_continued_fraction n</code></pre> Note the use of Haskell's <a href="http://en.wikipedia.org/wiki/List_comprehension">list comprehension</a> syntax (similar to Python's), which expresses what I just described in a manner reminiscent of set notation.</div> <div class="p">Here is the full Haskell program, designed so that its output may be conveniently piped to <a href="http://en.wikipedia.org/wiki/Bc_programming_language">bc</a> for verification: <pre class="code-container"><code class="language-haskell">module Main where

import System.Environment (getArgs)

sqrt_continued_fraction :: (Integral a) => a -> [a]
{- ... the sqrt_continued_fraction function explained above ... -}

get_convergents :: (Integral a) => [a] -> [(a, a)]
{- ... the get_convergents function explained above ... -}

get_pell_fundamental_solution :: (Integral a) => a -> (a, a)
{- ... the get_pell_fundamental_solution function explained above ... -}

main :: IO ()
main = do
  args &lt;- getArgs
  let d = read (head args) :: Integer
      (p, q) = get_pell_fundamental_solution d
  putStr $ "d = " ++ show d ++ "\n" ++
           "p = " ++ show p ++ "\n" ++
           "q = " ++ show q ++ "\n" ++
           "p^2 - d * q^2 == 1\n"</code></pre> and here it is in action: <pre class="code-container"><code class="language-shell">$ ./solve_pell 1597
d = 1597
p = 519711527755463096224266385375638449943026746249
q = 13004986088790772250309504643908671520836229100
p^2 - d * q^2 == 1</code></pre> </div> <p>The solution to the original problem is therefore <strong>$$s = q^2$$</strong> for the value of $$q$$ printed above, since then $$1597s + 1 = p^2$$ is also a perfect square.</p> <p>Now that we've found a method to get <em>a</em> solution, the question remains as to whether it's the only one.
In fact it is not, but it is the minimal one, and all other solutions (of which there are infinitely many) can be generated from this fundamental one with a simple recurrence relation, as described in the <a href="http://en.wikipedia.org/wiki/Pell%27s_equation#Solution_technique">Wikipedia article</a>. My program above can easily be extended to generate all solutions instead of just the fundamental one (I'll leave that as an exercise for the reader).</p> <p>One remaining question is the efficiency of this algorithm. For simplicity, let's neglect the cost of the arbitrary-precision arithmetic involved and assume that the incremental cost of generating each term of the continued fraction expansion and the convergents is constant. Then the main cost is just how many convergents we have to generate before we find one that satisfies Pell's equation. In fact, it turns out that this depends on the length of the period of the continued fraction expansion of $$\sqrt{d}$$, which has a rough upper bound of $$O(\sqrt{d} \ln d)$$. Therefore, the cost of solving Pell's equation (in terms of how many convergents to generate) for a given $$n$$-bit number $$d$$ is $$O(n 2^{n/2})$$. This is pretty expensive already, although it's still much better than brute-force search (which is on the order of exponentiating the above expression). Can we do better? Well, sort of; it turns out the length of the answer is of the same order as the expression above, so any algorithm that explicitly outputs a solution necessarily takes that long. However, if you can somehow factor $$d$$ into $$s d'$$, where $$s$$ is a perfect square and $$d'$$ is <a href="http://en.wikipedia.org/wiki/Squarefree">squarefree</a> (i.e., not divisible by any perfect square other than 1), then you can solve Pell's equation for the smaller number $$d'$$ and express the solution for $$d$$ compactly in terms of a certain power of the fundamental solution for $$d'$$.
Note that in general this involves factoring $$d$$, another hard problem, though one with a large body of prior work. An interested reader can peruse the papers by <a href="http://www.ams.org/notices/200202/fea-lenstra.pdf">Lenstra</a> and <a href="http://www.math.nyu.edu/~crorres/Archimedes/Cattle/cattle_vardi.pdf">Vardi</a> for more details.</p> <p>As a final note, one of the things I really like about number theory is that investigating such a simple program can lead you down surprising avenues of mathematics and computational theory. In fact, I've had to omit a lot of things I had planned to say to keep this entry from growing any longer than it already is. Hopefully, this entry helps someone else learn more about this interesting corner of number theory.</p> <hr /> <p>Like this post? Subscribe to <!-- The image is 256x256, the center of the dot is 189 pixels from the top, and the radius of the dot is 24. Therefore, the dot is 43/256 = 0.16796875 of the image height above the bottom.--> <a href="feed/atom">my feed <img src="feed-icon.svg" alt="RSS icon" style="width: 1em; height: 1em; vertical-align: -0.16796875em;" /></a> or follow me on <a href="https://twitter.com/fakalin">Twitter <img src="twitter-icon.svg" alt="Twitter icon" style="width: 1em; height 1em;" /></a>.</p> <section class="footnotes"> <header> <h2>Footnotes</h2> </header> <p id="fn1"> As a rule we'll avoid considering trivial cases and re-stating obvious assumptions (like $$d$$ having to be a positive integer). <a href="#r1">↩</a></p> </section>