<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
  <title>Fred Akalin</title>
  <subtitle>Notes on math, tech, and everything in between</subtitle>
  <link type="text/html" rel="alternate" href="https://www.akalin.com/"/>
  <link type="application/atom+xml" rel="self" href="https://www.akalin.com/feed/atom"/>
  <updated>2024-01-29T04:19:26-08:00</updated>
  <id>https://www.akalin.com/</id>
  <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>


  
  <entry>
    <id>https://www.akalin.com/fta-connectedness</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/fta-connectedness"/>
    <title>The Fundamental Theorem of Algebra via Connectedness</title>
    <updated>2021-01-03T00:00:00-08:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;p&gt;It is intuitive that removing even a single point from a line disconnects it, but removing a finite set of points from a plane leaves it connected.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/disconnected-line.png&quot; alt=&quot;A line disconnected by a single point.&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;A line disconnected by a single point.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/connected-plane.png&quot; style=&quot;width:50.0%&quot; alt=&quot;A plane remaining connected even with a few points removed.&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;A plane remaining connected even with a few points removed.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;However, this basic fact leads to a non-trivial property of real and complex polynomials: not all non-constant real polynomials have real roots, but all non-constant complex polynomials have complex roots. The latter is, in fact, the &lt;em&gt;fundamental theorem of algebra&lt;/em&gt;:&lt;/p&gt;
&lt;div class=&quot;theorem&quot;&gt;
&lt;p&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Fundamental theorem of algebra&lt;/span&gt;.) Every non-constant complex polynomial has a root.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We’ll prove this theorem using nothing stronger than the complex inverse function theorem. Here’s a synopsis:&lt;/p&gt;
&lt;ol type=&quot;1&quot;&gt;
&lt;li&gt;Let &lt;span class=&quot;math inline&quot;&gt;p \colon \mathbb{C}→ \mathbb{C}&lt;/span&gt; be a non-constant complex polynomial, and &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt; its set of regular values. Let &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}= p^{-1}(V_{\text{regular}})&lt;/span&gt; be its set of pure regular points, so that &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; can be thought of as a &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}→ V_{\text{regular}}&lt;/span&gt; map.&lt;/li&gt;
&lt;li&gt;Any complex polynomial, and &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; in particular, is a closed &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}→ \mathbb{C}&lt;/span&gt; map, and thus also a closed &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}→ V_{\text{regular}}&lt;/span&gt; map.&lt;/li&gt;
&lt;li&gt;Furthermore, by the inverse function theorem, &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; is an open &lt;span class=&quot;math inline&quot;&gt;P_{\text{regular}}→ \mathbb{C}&lt;/span&gt; map, and thus also an open &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}→ V_{\text{regular}}&lt;/span&gt; map.&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt;, being non-constant, has only finitely many critical points. Therefore, &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt; is the complex plane with a finite set of points removed, and thus is connected. Similarly, &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; is also connected. (&lt;em&gt;This is the step that fails for real polynomials: removing points from the real line disconnects it.&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt;, being a continuous, open, and closed &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}→ V_{\text{regular}}&lt;/span&gt; map, must take connected components to connected components. Since &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt; are both connected, that means that &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; maps &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; onto &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt;.&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; also maps &lt;span class=&quot;math inline&quot;&gt;P_{\text{critical}}&lt;/span&gt; onto &lt;span class=&quot;math inline&quot;&gt;V_{\text{critical}}&lt;/span&gt;, so &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; is surjective on &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt;, and thus must have a root.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is a wonderfully succinct proof, but it’s full of subtleties and would benefit from elaboration (as well as some diagrams). We’ll do that in the rest of this article. First, we need some definitions.&lt;/p&gt;
&lt;h3 id=&quot;points-and-values&quot;&gt;Points and values&lt;/h3&gt;
&lt;p&gt;If a function &lt;span class=&quot;math inline&quot;&gt;f(x)&lt;/span&gt; maps from &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt;, we’ll call elements of &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; &lt;em&gt;points&lt;/em&gt; and elements of &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt; &lt;em&gt;values&lt;/em&gt;; in our case, &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt; will both be subsets of either &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; or &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt;, but it’s helpful to distinguish when we’re talking about a real or complex number as a domain element versus a codomain element.&lt;/p&gt;
&lt;p&gt;If &lt;span class=&quot;math inline&quot;&gt;f(x)&lt;/span&gt; is differentiable, we’ll call &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; a &lt;em&gt;critical point&lt;/em&gt; if &lt;span class=&quot;math inline&quot;&gt;f&amp;#39;(x) = 0&lt;/span&gt; and a &lt;em&gt;regular point&lt;/em&gt; otherwise. We’ll call &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; a &lt;em&gt;critical value&lt;/em&gt; if &lt;span class=&quot;math inline&quot;&gt;y = f(x)&lt;/span&gt; for some critical point &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; and a &lt;em&gt;regular value&lt;/em&gt; otherwise. In particular, if &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; is not in the image of &lt;span class=&quot;math inline&quot;&gt;f&lt;/span&gt;, then &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; is a regular value.&lt;/p&gt;
&lt;p&gt;A regular point may map to a critical value. In that case, we call it an &lt;em&gt;impure regular point&lt;/em&gt; and a &lt;em&gt;pure regular point&lt;/em&gt; otherwise. (This is nonstandard terminology, but it helps with visualizing what’s going on.)&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/real-function-points-values.png&quot; style=&quot;width:50.0%&quot; alt=&quot;The points and values of a real function. Red points are critical points, blue values are critical values, and green points are impure regular points. All other points are pure regular, and all other values are regular.&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;The points and values of a real function. &lt;span style=&quot;color: red;&quot;&gt;Red points&lt;/span&gt; are &lt;span style=&quot;color: red;&quot;&gt;critical points&lt;/span&gt;, &lt;span style=&quot;color: blue;&quot;&gt;blue values&lt;/span&gt; are &lt;span style=&quot;color: blue;&quot;&gt;critical values&lt;/span&gt;, and &lt;span style=&quot;color: green;&quot;&gt;green points&lt;/span&gt; are &lt;span style=&quot;color: green;&quot;&gt;impure regular points&lt;/span&gt;. All other points are pure regular, and all other values are regular.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/complex-function-points-values.png&quot; alt=&quot;The points and values of a complex function, with the same colors as in the previous figure.&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;The points and values of a complex function, with the same colors as in the previous figure.&lt;/figcaption&gt;
&lt;/figure&gt;
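&lt;p&gt;To make these definitions concrete, here is a small worked example. The cubic below is chosen purely for illustration and is not necessarily the function drawn in the figures. Take the real polynomial &lt;span class=&quot;math inline&quot;&gt;p(x) = x^3 - 3x&lt;/span&gt;. Its derivative factors as&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;p&amp;#39;(x) = 3x^2 - 3 = 3(x - 1)(x + 1)\text{,}&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;so the critical points are &lt;span class=&quot;math inline&quot;&gt;1&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;-1&lt;/span&gt;, and the critical values are &lt;span class=&quot;math inline&quot;&gt;p(1) = -2&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;p(-1) = 2&lt;/span&gt;. To find the impure regular points, we solve &lt;span class=&quot;math inline&quot;&gt;p(x) = -2&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;p(x) = 2&lt;/span&gt;, using the factorizations&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;x^3 - 3x + 2 = (x - 1)^2 (x + 2) \qquad \text{and} \qquad x^3 - 3x - 2 = (x + 1)^2 (x - 2)\text{.}&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The only solutions that are not critical points are &lt;span class=&quot;math inline&quot;&gt;x = -2&lt;/span&gt; (which maps to &lt;span class=&quot;math inline&quot;&gt;-2&lt;/span&gt;) and &lt;span class=&quot;math inline&quot;&gt;x = 2&lt;/span&gt; (which maps to &lt;span class=&quot;math inline&quot;&gt;2&lt;/span&gt;), so these are the impure regular points. Every other real number is a pure regular point, and every value other than &lt;span class=&quot;math inline&quot;&gt;-2&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;2&lt;/span&gt; is a regular value.&lt;/p&gt;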
&lt;p&gt;The strategy of the proof is to show that a non-constant complex polynomial &lt;span class=&quot;math inline&quot;&gt;f(x)&lt;/span&gt; is surjective. By construction, &lt;span class=&quot;math inline&quot;&gt;f(x)&lt;/span&gt; maps impure regular points and critical points onto the critical values. Then it suffices to show that &lt;span class=&quot;math inline&quot;&gt;f(x)&lt;/span&gt; maps the pure regular points &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; onto the regular values &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt;. In doing so, we’ll show that there are only a finite number of critical points, critical values, and impure regular points; therefore, &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; is the complex plane minus a finite number of points, and that is where connectedness comes into play.&lt;/p&gt;
&lt;h3 id=&quot;connected-sets&quot;&gt;Connected sets&lt;/h3&gt;
&lt;p&gt;A subset &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt; of a topological space is &lt;em&gt;disconnected&lt;/em&gt; if it is the union of two disjoint, non-empty sets that are open in &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt;, and &lt;em&gt;connected&lt;/em&gt; otherwise.&lt;/p&gt;
&lt;p&gt;For example, the set &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt; in the first figure is the real line &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; with a single point &lt;span class=&quot;math inline&quot;&gt;a&lt;/span&gt; removed. Then &lt;span class=&quot;math inline&quot;&gt;X = (-∞, a) ∪ (a, ∞)&lt;/span&gt;, so it is disconnected.&lt;/p&gt;
&lt;p&gt;It is harder to show that a set is connected. However, we can use a stronger property that’s easier to show. A subset &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt; of a topological space is &lt;em&gt;path-connected&lt;/em&gt; if for every two points &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; in &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt;, there exists a &lt;em&gt;path&lt;/em&gt; from &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt;—that is, a continuous function &lt;span class=&quot;math inline&quot;&gt;f \colon [0, 1] → X&lt;/span&gt; such that &lt;span class=&quot;math inline&quot;&gt;f(0) = x&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;f(1) = y&lt;/span&gt;. A path-connected set is automatically a connected set—being able to draw paths between any two points makes it impossible to split the set into two disjoint non-empty open subsets.&lt;/p&gt;
&lt;p&gt;In particular, let &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt; be the plane &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}^2&lt;/span&gt; or &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt; with a finite number of points &lt;span class=&quot;math inline&quot;&gt;p_i&lt;/span&gt; removed. Then we’ll show that &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt; is path-connected. Given distinct &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; in &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt;, let &lt;span class=&quot;math inline&quot;&gt;d&lt;/span&gt; be the minimum distance between any two of the points &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt;, &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt;, and the removed points, and let &lt;span class=&quot;math inline&quot;&gt;r = d/3&lt;/span&gt;. Let &lt;span class=&quot;math inline&quot;&gt;f&lt;/span&gt; be the straight-line path from &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt;. For any &lt;span class=&quot;math inline&quot;&gt;p_i&lt;/span&gt; that is on &lt;span class=&quot;math inline&quot;&gt;f&lt;/span&gt;, replace the part of &lt;span class=&quot;math inline&quot;&gt;f&lt;/span&gt; within distance &lt;span class=&quot;math inline&quot;&gt;r&lt;/span&gt; of &lt;span class=&quot;math inline&quot;&gt;p_i&lt;/span&gt; with a semi-circular arc of radius &lt;span class=&quot;math inline&quot;&gt;r&lt;/span&gt; around &lt;span class=&quot;math inline&quot;&gt;p_i&lt;/span&gt;. Since &lt;span class=&quot;math inline&quot;&gt;r &amp;lt; d/2&lt;/span&gt;, the arc will not have any other removed point on it, no two arcs will overlap, and each arc begins and ends on &lt;span class=&quot;math inline&quot;&gt;f&lt;/span&gt; strictly between &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt;. Therefore, this modified path lies entirely in &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt;. Since &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; were arbitrary, &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt; is path-connected, and thus connected.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/path-connected.png&quot; style=&quot;width:50.0%&quot; alt=&quot;The path between x and y on a plane with a finite number of points removed.&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;The path between &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; on a plane with a finite number of points removed.&lt;/figcaption&gt;
&lt;/figure&gt;
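&lt;p&gt;One way to write down such a detour explicitly is sketched here in complex notation. The symbols &lt;span class=&quot;math inline&quot;&gt;u&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;γ&lt;/span&gt; are introduced only for this illustration, and &lt;span class=&quot;math inline&quot;&gt;\mathrm{i}&lt;/span&gt; denotes the imaginary unit, not the index of &lt;span class=&quot;math inline&quot;&gt;p_i&lt;/span&gt;. Let &lt;span class=&quot;math inline&quot;&gt;u = (y - x)/|y - x|&lt;/span&gt; be the unit direction of the straight-line path. If &lt;span class=&quot;math inline&quot;&gt;p_i&lt;/span&gt; lies on the path, the segment from &lt;span class=&quot;math inline&quot;&gt;p_i - ru&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;p_i + ru&lt;/span&gt; can be replaced by&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;γ(s) = p_i - r u \, e^{-\mathrm{i} π s}, \qquad 0 ≤ s ≤ 1\text{.}&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Then &lt;span class=&quot;math inline&quot;&gt;γ(0) = p_i - ru&lt;/span&gt;, &lt;span class=&quot;math inline&quot;&gt;γ(1) = p_i + ru&lt;/span&gt;, and &lt;span class=&quot;math inline&quot;&gt;|γ(s) - p_i| = r&lt;/span&gt; for every &lt;span class=&quot;math inline&quot;&gt;s&lt;/span&gt;, so the arc stays at distance exactly &lt;span class=&quot;math inline&quot;&gt;r&lt;/span&gt; from &lt;span class=&quot;math inline&quot;&gt;p_i&lt;/span&gt; and never passes through it.&lt;/p&gt;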
&lt;p&gt;We’re most interested in connected sets that are maximal in the sense that they’re not contained in a larger connected set. These are called &lt;em&gt;connected components&lt;/em&gt;, and any topological space can be decomposed into its connected components. For example, the set &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt; in the first figure has two connected components, &lt;span class=&quot;math inline&quot;&gt;(-∞, a)&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;(a, ∞)&lt;/span&gt;, whereas the plane with a finite number of points removed remains connected, and thus has only a single connected component. However, removing a line from a plane splits it into two connected components, one on each side of the line.&lt;/p&gt;
&lt;p&gt;A continuous function preserves connectedness: it maps connected sets to connected sets. However, it may map a connected component to a connected set that’s not a connected component. We want to show that real and complex polynomials map connected components to connected components—this leads us to the concepts of open and closed maps.&lt;/p&gt;
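&lt;p&gt;As a simple illustration (an example chosen here, not taken from the figures), consider the real polynomial &lt;span class=&quot;math inline&quot;&gt;p(x) = x^2&lt;/span&gt; as a continuous map from &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt;. The domain &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; is its own single connected component, and its image is&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;p(\mathbb{R}) = \lbrack 0, ∞) ⊊ \mathbb{R}\text{.}&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;This image is connected, but it is not a connected component of the codomain, since it is properly contained in the larger connected set &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt;.&lt;/p&gt;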
&lt;h3 id=&quot;open-and-closed-functions&quot;&gt;Open and closed functions&lt;/h3&gt;
&lt;p&gt;If a function &lt;span class=&quot;math inline&quot;&gt;f(x)&lt;/span&gt; between topological spaces &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt; sends open sets of &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; to open sets of &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt;, we call it &lt;em&gt;open&lt;/em&gt;. Similarly, if it sends closed sets of &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; to closed sets of &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt;, we call it &lt;em&gt;closed&lt;/em&gt;. Be careful! Like with sets, whether a function is open is unrelated to whether it is closed; a function may be neither open nor closed, just open, just closed, or both.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/not-open-example.png&quot; style=&quot;width:75.0%&quot; alt=&quot;The real polynomial p(x) = x^2 + 1 is not open, since it maps the open interval (-1, +1) to the half-open interval \lbrack 1,  2).&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;The real polynomial &lt;span class=&quot;math inline&quot;&gt;p(x) = x^2 + 1&lt;/span&gt; is not open, since it maps the open interval &lt;span class=&quot;math inline&quot;&gt;(-1, +1)&lt;/span&gt; to the half-open interval &lt;span class=&quot;math inline&quot;&gt;\lbrack 1,  2)&lt;/span&gt;.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We’re more interested in sets and functions that are both open and closed, which we’ll call &lt;em&gt;clopen&lt;/em&gt;. A topological space &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; always has two clopen subsets: &lt;span class=&quot;math inline&quot;&gt;\emptyset&lt;/span&gt; and itself. However, if it’s disconnected, it may have more: in general, a clopen subset &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt; is a union of connected components of &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt;. Conversely, if &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; has finitely many connected components, each connected component is clopen.&lt;/p&gt;
&lt;p&gt;Then since a clopen function &lt;span class=&quot;math inline&quot;&gt;f(x)&lt;/span&gt; between &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt; sends clopen sets of &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; to clopen sets of &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt;, it then sends connected components of &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; to unions of connected components of &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt;. If &lt;span class=&quot;math inline&quot;&gt;f(x)&lt;/span&gt; is also continuous, then it must send a connected component of &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; to another connected set, which then must be a connected component of &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;Therefore, since real and complex polynomials are continuous, in order to show that they map connected components to connected components, we need to show that they are also clopen.&lt;/p&gt;
&lt;h3 id=&quot;real-and-complex-polynomials-are-closed&quot;&gt;Real and complex polynomials are closed&lt;/h3&gt;
&lt;p&gt;First, we want to show that a real polynomial &lt;span class=&quot;math inline&quot;&gt;p(x) \colon \mathbb{R}→ \mathbb{R}&lt;/span&gt; or a complex polynomial &lt;span class=&quot;math inline&quot;&gt;p(x) \colon \mathbb{C}→ \mathbb{C}&lt;/span&gt; is closed.&lt;/p&gt;
&lt;p&gt;If &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; is constant, then this follows immediately. Otherwise, the essential property of polynomials that we use is that if &lt;span class=&quot;math inline&quot;&gt;x → ∞&lt;/span&gt;, then &lt;span class=&quot;math inline&quot;&gt;p(x) → ∞&lt;/span&gt;. In other words, if &lt;span class=&quot;math inline&quot;&gt;x_n&lt;/span&gt; is a sequence such that &lt;span class=&quot;math inline&quot;&gt;p(x_n)&lt;/span&gt; is bounded, then &lt;span class=&quot;math inline&quot;&gt;x_n&lt;/span&gt; must also be bounded.&lt;/p&gt;
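&lt;p&gt;This growth property can be made quantitative. As a sketch, write &lt;span class=&quot;math inline&quot;&gt;p(x) = a_n x^n + ⋯ + a_1 x + a_0&lt;/span&gt; with &lt;span class=&quot;math inline&quot;&gt;n ≥ 1&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;a_n ≠ 0&lt;/span&gt;, and set &lt;span class=&quot;math inline&quot;&gt;S = |a_0| + ⋯ + |a_{n-1}|&lt;/span&gt; (the names &lt;span class=&quot;math inline&quot;&gt;a_k&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;S&lt;/span&gt; are notation introduced here). For &lt;span class=&quot;math inline&quot;&gt;|x| ≥ 1&lt;/span&gt;, the reverse triangle inequality gives&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;|p(x)| ≥ |a_n| |x|^n - \sum_{k=0}^{n-1} |a_k| |x|^k ≥ |a_n| |x|^n - S |x|^{n-1} = |x|^{n-1} \left( |a_n| |x| - S \right)\text{.}&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;So whenever &lt;span class=&quot;math inline&quot;&gt;|x| ≥ \max(1, 2S/|a_n|)&lt;/span&gt;, we have &lt;span class=&quot;math inline&quot;&gt;|a_n| |x| - S ≥ |a_n| |x| / 2&lt;/span&gt; and hence &lt;span class=&quot;math inline&quot;&gt;|p(x)| ≥ |a_n| |x|^n / 2&lt;/span&gt;, which grows without bound as &lt;span class=&quot;math inline&quot;&gt;|x|&lt;/span&gt; does.&lt;/p&gt;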
&lt;p&gt;Then let &lt;span class=&quot;math inline&quot;&gt;U&lt;/span&gt; be a closed set of points, and let &lt;span class=&quot;math inline&quot;&gt;y ∈ \overline{p(U)}&lt;/span&gt;; in other words, &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; is a limit point of &lt;span class=&quot;math inline&quot;&gt;p(U)&lt;/span&gt;. To show that &lt;span class=&quot;math inline&quot;&gt;p(U)&lt;/span&gt; is closed, we want to show that &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; is in fact in &lt;span class=&quot;math inline&quot;&gt;p(U)&lt;/span&gt;. Since &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; is a limit point of &lt;span class=&quot;math inline&quot;&gt;p(U)&lt;/span&gt;, there is some sequence &lt;span class=&quot;math inline&quot;&gt;x_n&lt;/span&gt; in &lt;span class=&quot;math inline&quot;&gt;U&lt;/span&gt; such that &lt;span class=&quot;math inline&quot;&gt;p(x_n)&lt;/span&gt; converges to &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt;. Then &lt;span class=&quot;math inline&quot;&gt;p(x_n)&lt;/span&gt; is bounded, so by the above, &lt;span class=&quot;math inline&quot;&gt;x_n&lt;/span&gt; is also bounded. Then, by the Bolzano–Weierstrass theorem, some subsequence &lt;span class=&quot;math inline&quot;&gt;x_m&lt;/span&gt; of &lt;span class=&quot;math inline&quot;&gt;x_n&lt;/span&gt; converges to some &lt;span class=&quot;math inline&quot;&gt;\tilde{x}&lt;/span&gt;, which lies in &lt;span class=&quot;math inline&quot;&gt;U&lt;/span&gt; since &lt;span class=&quot;math inline&quot;&gt;U&lt;/span&gt; is closed. Since &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; is continuous, &lt;span class=&quot;math inline&quot;&gt;p(x_m)&lt;/span&gt; then converges to &lt;span class=&quot;math inline&quot;&gt;p(\tilde{x})&lt;/span&gt;, which must then equal &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt;. Therefore, &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; is indeed in &lt;span class=&quot;math inline&quot;&gt;p(U)&lt;/span&gt;, which shows that &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; is a closed map.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/polynomials-are-closed.png&quot; alt=&quot;Diagram for the proof that a non-constant polynomial p(x) is closed.&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;Diagram for the proof that a non-constant polynomial &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; is closed.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/not-closed.png&quot; style=&quot;width:50.0%&quot; alt=&quot;The function f(x) = 1/x is not closed, since the closed interval \lbrack 1, ∞) gets mapped to the half-open interval (0, 1\rbrack.&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;The function &lt;span class=&quot;math inline&quot;&gt;f(x) = 1/x&lt;/span&gt; is not closed, since the closed interval &lt;span class=&quot;math inline&quot;&gt;\lbrack 1, ∞)&lt;/span&gt; gets mapped to the half-open interval &lt;span class=&quot;math inline&quot;&gt;(0, 1\rbrack&lt;/span&gt;.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;So polynomials &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}→ \mathbb{R}&lt;/span&gt; or &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}→ \mathbb{C}&lt;/span&gt; are closed, but what we really want to show is that they’re also closed as maps from their pure regular points &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; to their regular values &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt;. In general, restricting the domain or codomain of a function doesn’t preserve the property of being closed, but if &lt;span class=&quot;math inline&quot;&gt;f&lt;/span&gt; is a closed map from &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;D ⊆ B&lt;/span&gt;, then &lt;span class=&quot;math inline&quot;&gt;f&lt;/span&gt; is a closed map from &lt;span class=&quot;math inline&quot;&gt;C = f^{-1}(D)&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;D&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;A proof: if &lt;span class=&quot;math inline&quot;&gt;U&lt;/span&gt; is a closed subset of &lt;span class=&quot;math inline&quot;&gt;C&lt;/span&gt;, then it is &lt;span class=&quot;math inline&quot;&gt;U&amp;#39; ∩ C&lt;/span&gt; for &lt;span class=&quot;math inline&quot;&gt;U&amp;#39;&lt;/span&gt; a closed subset of &lt;span class=&quot;math inline&quot;&gt;A&lt;/span&gt;. In general we have the inclusion &lt;span class=&quot;math inline&quot;&gt;f(X ∩ Y) ⊆ f(X) ∩ f(Y)&lt;/span&gt;, so &lt;span class=&quot;math display&quot;&gt;
f(U&amp;#39; ∩ C) ⊆ f(U&amp;#39;) ∩ f(C) ⊆ f(U&amp;#39;) ∩ D\text{.}
&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Conversely, if &lt;span class=&quot;math inline&quot;&gt;y ∈ f(U&amp;#39;) ∩ D&lt;/span&gt;, then &lt;span class=&quot;math inline&quot;&gt;f(x) = y&lt;/span&gt; for some &lt;span class=&quot;math inline&quot;&gt;x ∈ U&amp;#39;&lt;/span&gt;. Since &lt;span class=&quot;math inline&quot;&gt;f(x) ∈ D&lt;/span&gt;, &lt;span class=&quot;math inline&quot;&gt;x ∈ C = f^{-1}(D)&lt;/span&gt;, so &lt;span class=&quot;math inline&quot;&gt;x ∈ U&amp;#39; ∩ C&lt;/span&gt;. Therefore, &lt;span class=&quot;math inline&quot;&gt;y ∈ f(U&amp;#39; ∩ C)&lt;/span&gt;, thus &lt;span class=&quot;math inline&quot;&gt;f(U&amp;#39;) ∩ D ⊆ f(U&amp;#39; ∩ C)&lt;/span&gt;, and&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt; f(U) = f(U&amp;#39; ∩ C) = f(U&amp;#39;) ∩ D\text{.}&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math inline&quot;&gt;f(U&amp;#39;)&lt;/span&gt; is a closed subset of &lt;span class=&quot;math inline&quot;&gt;B&lt;/span&gt; by &lt;span class=&quot;math inline&quot;&gt;f&lt;/span&gt; being closed, and so &lt;span class=&quot;math inline&quot;&gt;f(U&amp;#39;) ∩ D&lt;/span&gt; is a closed subset of &lt;span class=&quot;math inline&quot;&gt;D&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;In particular, &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; is the inverse image of &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt; by construction, so a real or complex polynomial is thus a closed map from &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt;.&lt;/p&gt;
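&lt;p&gt;The hypothesis &lt;span class=&quot;math inline&quot;&gt;C = f^{-1}(D)&lt;/span&gt; matters. As an illustrative counterexample (chosen here for simplicity), the identity map &lt;span class=&quot;math inline&quot;&gt;f(x) = x&lt;/span&gt; is closed from &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt;. If we restrict its domain to &lt;span class=&quot;math inline&quot;&gt;C = (0, 1)&lt;/span&gt; but keep the codomain &lt;span class=&quot;math inline&quot;&gt;D = \mathbb{R}&lt;/span&gt;, then &lt;span class=&quot;math inline&quot;&gt;C&lt;/span&gt; is closed as a subset of itself, yet&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;f(C) = (0, 1)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;is not closed in &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt;, so the restricted map is not closed. Here &lt;span class=&quot;math inline&quot;&gt;f^{-1}(D) = \mathbb{R} ≠ C&lt;/span&gt;, so the result above does not apply.&lt;/p&gt;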
&lt;h3 id=&quot;real-and-complex-polynomials-have-finitely-many-critical-points&quot;&gt;Real and complex polynomials have finitely many critical points&lt;/h3&gt;
&lt;p&gt;One subtle but important fact that we need is that non-constant real and complex polynomials have finitely many critical points. A critical point of the real or complex polynomial &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; is a root of &lt;span class=&quot;math inline&quot;&gt;p&amp;#39;(x)&lt;/span&gt;, which is another polynomial, so the statement that a non-constant real or complex polynomial has finitely many critical points is equivalent to the statement that a non-zero real or complex polynomial has finitely many roots.&lt;/p&gt;
&lt;p&gt;But isn’t that equivalent to the fundamental theorem of algebra? No! For one, it’s also true for real polynomials. More generally, it’s an upper bound on the number of roots, whereas the fundamental theorem of algebra is a lower bound.&lt;/p&gt;
&lt;p&gt;If a real or complex polynomial &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; of positive degree &lt;span class=&quot;math inline&quot;&gt;n&lt;/span&gt; has a root &lt;span class=&quot;math inline&quot;&gt;r&lt;/span&gt;, then &lt;span class=&quot;math inline&quot;&gt;p(x) = (x - r) q(x)&lt;/span&gt; for some polynomial &lt;span class=&quot;math inline&quot;&gt;q(x)&lt;/span&gt; of degree &lt;span class=&quot;math inline&quot;&gt;n - 1&lt;/span&gt;. Then since non-zero degree-&lt;span class=&quot;math inline&quot;&gt;0&lt;/span&gt; polynomials have no roots, by induction &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; has at most &lt;span class=&quot;math inline&quot;&gt;n&lt;/span&gt; roots.&lt;/p&gt;
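&lt;p&gt;Therefore, a non-constant real or complex polynomial of degree &lt;span class=&quot;math inline&quot;&gt;n&lt;/span&gt; has at most &lt;span class=&quot;math inline&quot;&gt;n - 1&lt;/span&gt; critical points. Consequently, it has at most &lt;span class=&quot;math inline&quot;&gt;n - 1&lt;/span&gt; critical values. Each critical value &lt;span class=&quot;math inline&quot;&gt;c&lt;/span&gt; has at most &lt;span class=&quot;math inline&quot;&gt;n&lt;/span&gt; preimages, since &lt;span class=&quot;math inline&quot;&gt;p(x) - c&lt;/span&gt; is a non-zero polynomial of degree &lt;span class=&quot;math inline&quot;&gt;n&lt;/span&gt;, so there are also only finitely many impure regular points.&lt;/p&gt;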
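&lt;p&gt;The factorization &lt;span class=&quot;math inline&quot;&gt;p(x) = (x - r) q(x)&lt;/span&gt; used above can be checked directly. As a sketch, write &lt;span class=&quot;math inline&quot;&gt;p(x) = \sum_{k=0}^{n} a_k x^k&lt;/span&gt; (the coefficient names &lt;span class=&quot;math inline&quot;&gt;a_k&lt;/span&gt; are notation introduced here). If &lt;span class=&quot;math inline&quot;&gt;p(r) = 0&lt;/span&gt;, then&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;p(x) = p(x) - p(r) = \sum_{k=1}^{n} a_k \left( x^k - r^k \right) = (x - r) \sum_{k=1}^{n} a_k \left( x^{k-1} + x^{k-2} r + ⋯ + r^{k-1} \right)\text{,}&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;and the second factor is a polynomial &lt;span class=&quot;math inline&quot;&gt;q(x)&lt;/span&gt; of degree &lt;span class=&quot;math inline&quot;&gt;n - 1&lt;/span&gt; with leading coefficient &lt;span class=&quot;math inline&quot;&gt;a_n&lt;/span&gt;.&lt;/p&gt;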
&lt;h3 id=&quot;real-and-complex-polynomials-are-open-on-regular-points&quot;&gt;Real and complex polynomials are open on regular points&lt;/h3&gt;
&lt;p&gt;A real polynomial &lt;span class=&quot;math inline&quot;&gt;p(x) \colon \mathbb{R}→ \mathbb{R}&lt;/span&gt; is &lt;em&gt;not&lt;/em&gt; open in general; a figure above shows that &lt;span class=&quot;math inline&quot;&gt;p(x) = x^2 + 1&lt;/span&gt; is a counterexample. Fortunately, it’s only the critical points that are the problem: as functions from &lt;span class=&quot;math inline&quot;&gt;P_{\text{regular}}&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt;, real polynomials are open.&lt;/p&gt;
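&lt;p&gt;To see this with the same counterexample (a short illustrative computation), note that the only critical point of &lt;span class=&quot;math inline&quot;&gt;p(x) = x^2 + 1&lt;/span&gt; is &lt;span class=&quot;math inline&quot;&gt;0&lt;/span&gt;, so &lt;span class=&quot;math inline&quot;&gt;P_{\text{regular}}&lt;/span&gt; is &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; with &lt;span class=&quot;math inline&quot;&gt;0&lt;/span&gt; removed. Removing &lt;span class=&quot;math inline&quot;&gt;0&lt;/span&gt; from the open interval &lt;span class=&quot;math inline&quot;&gt;(-1, +1)&lt;/span&gt; gives&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math display&quot;&gt;p\left( (-1, 0) ∪ (0, 1) \right) = (1, 2)\text{,}&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;which is open: the problematic endpoint &lt;span class=&quot;math inline&quot;&gt;p(0) = 1&lt;/span&gt; is no longer in the image.&lt;/p&gt;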
&lt;p&gt;The complex case is actually easier—the &lt;a href=&quot;https://en.wikipedia.org/wiki/Open_mapping_theorem_(complex_analysis)&quot;&gt;open mapping theorem&lt;/a&gt; implies that a complex polynomial &lt;span class=&quot;math inline&quot;&gt;p(x) \colon \mathbb{C}→ \mathbb{C}&lt;/span&gt; is open in general. However, that theorem uses a bit more complex analysis machinery than we’d like—it turns out that we can use the same proof as in the real case (which is simpler) to show that complex polynomials are open as functions from &lt;span class=&quot;math inline&quot;&gt;P_{\text{regular}}&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;So let’s start the proof. Let &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; be a real (or complex) polynomial, considered as a function from &lt;span class=&quot;math inline&quot;&gt;P_{\text{regular}}&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; (or &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt;). Let &lt;span class=&quot;math inline&quot;&gt;U ⊆ P_{\text{regular}}&lt;/span&gt; be open; we want to show that &lt;span class=&quot;math inline&quot;&gt;p(U)&lt;/span&gt; is also open.&lt;/p&gt;
&lt;p&gt;Let &lt;span class=&quot;math inline&quot;&gt;y ∈ p(U)&lt;/span&gt;. Then &lt;span class=&quot;math inline&quot;&gt;y = p(x)&lt;/span&gt; for some regular point &lt;span class=&quot;math inline&quot;&gt;x ∈ U&lt;/span&gt;. Since &lt;span class=&quot;math inline&quot;&gt;p&amp;#39;(x) ≠ 0&lt;/span&gt;, by the real inverse function theorem (or the complex inverse function theorem) there is some open set &lt;span class=&quot;math inline&quot;&gt;X&lt;/span&gt; containing &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; that is diffeomorphic to &lt;span class=&quot;math inline&quot;&gt;p(X)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math inline&quot;&gt;U&lt;/span&gt; is open in &lt;span class=&quot;math inline&quot;&gt;P_{\text{regular}}&lt;/span&gt;, which is &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; (or &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt;) minus a finite number of points, and is therefore itself an open subset of &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; (or &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt;). Thus, &lt;span class=&quot;math inline&quot;&gt;U&lt;/span&gt; is also open in &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; (or &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt;). (This is where we use the fact that &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; has a finite number of critical points.)&lt;/p&gt;
&lt;p&gt;Since &lt;span class=&quot;math inline&quot;&gt;U&lt;/span&gt; is open in &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; (or &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt;), so is &lt;span class=&quot;math inline&quot;&gt;X ∩ U&lt;/span&gt;, which is diffeomorphic to &lt;span class=&quot;math inline&quot;&gt;p(X ∩ U)&lt;/span&gt;, which is thus an open set contained in &lt;span class=&quot;math inline&quot;&gt;p(U)&lt;/span&gt; containing &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt;. Since &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt; was arbitrary, &lt;span class=&quot;math inline&quot;&gt;p(U)&lt;/span&gt; is open.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/real-open.png&quot; alt=&quot;With the real polynomial p(x) = x^3, X = (-a, 1+a) is an open set containing 1 that is diffeomorphic to p(X). Then X ∩ U =  (-a, 0) ∪ (0, 1 + a) is also open, and thus so is p(X ∩  U).&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;With the real polynomial &lt;span class=&quot;math inline&quot;&gt;p(x) = x^3&lt;/span&gt;, &lt;span class=&quot;math inline&quot;&gt;X = (-a, 1+a)&lt;/span&gt; is an open set containing &lt;span class=&quot;math inline&quot;&gt;1&lt;/span&gt; that is diffeomorphic to &lt;span class=&quot;math inline&quot;&gt;p(X)&lt;/span&gt;. Then &lt;span class=&quot;math inline&quot;&gt;X ∩ U =  (-a, 0) ∪ (0, 1 + a)&lt;/span&gt; is also open, and thus so is &lt;span class=&quot;math inline&quot;&gt;p(X ∩  U)&lt;/span&gt;.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/complex-open.png&quot; alt=&quot;A similar diagram for a complex polynomial p(x).&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;A similar diagram for a complex polynomial &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt;.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Since a real or complex polynomial &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; is open from &lt;span class=&quot;math inline&quot;&gt;P_{\text{regular}}&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt; or &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt;, the same reasoning as in the closed case shows that since &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}⊆ \mathbb{C}&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}= p^{-1}(V_{\text{regular}})&lt;/span&gt;, then a real or complex polynomial is an open map from &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt;.&lt;/p&gt;
&lt;h3 id=&quot;non-constant-complex-polynomials-are-surjective-but-not-real-ones&quot;&gt;Non-constant complex polynomials are surjective (but not real ones)&lt;/h3&gt;
&lt;p&gt;Now we’re ready to put it all together. Let &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; be a non-constant complex polynomial. By the above, it is clopen as a map from &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt;. Therefore, since it’s also continuous, it maps each connected component of &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; onto a connected component of &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt;. But both &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt; are &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt; minus a finite set of points, and thus they both have a single connected component. Therefore, &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; maps &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; onto &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt;. Since it also maps &lt;span class=&quot;math inline&quot;&gt;P_{\text{critical}}&lt;/span&gt; onto &lt;span class=&quot;math inline&quot;&gt;V_{\text{critical}}&lt;/span&gt;, it maps &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt; onto &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}= V_{\text{critical}}∪ V_{\text{regular}}&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;In particular, this implies that &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; has a root, which is the fundamental theorem of algebra.&lt;/p&gt;
&lt;p&gt;What about the real case? Consider the real polynomial &lt;span class=&quot;math inline&quot;&gt;p(x) = x^2 + 1&lt;/span&gt;. It has a single critical value &lt;span class=&quot;math inline&quot;&gt;1&lt;/span&gt; mapped to by a single critical point &lt;span class=&quot;math inline&quot;&gt;0&lt;/span&gt;, so &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; has two connected components: &lt;span class=&quot;math inline&quot;&gt;(-∞, 0)&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;(0, ∞)&lt;/span&gt;. &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt; has two connected components &lt;span class=&quot;math inline&quot;&gt;(-∞, 1)&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;(1, ∞)&lt;/span&gt;, but &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; maps both connected components of &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;(1, ∞)&lt;/span&gt;, and so isn’t surjective on &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}&lt;/span&gt;, and in particular doesn’t have a root.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;fta-connectedness-files/real-poly-connected-components.png&quot; style=&quot;width:50.0%&quot; alt=&quot;The polynomial p(x) = x^2 + 1 maps the two connected components (-∞, 0) and (0, ∞) of P_{\text{pure}} to only one connected component (1, ∞) of V_{\text{regular}}.&quot; /&gt;&lt;figcaption aria-hidden=&quot;true&quot;&gt;The polynomial &lt;span class=&quot;math inline&quot;&gt;p(x) = x^2 + 1&lt;/span&gt; maps the two connected components &lt;span class=&quot;math inline&quot;&gt;(-∞, 0)&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;(0, ∞)&lt;/span&gt; of &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; to only one connected component &lt;span class=&quot;math inline&quot;&gt;(1, ∞)&lt;/span&gt; of &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt;.&lt;/figcaption&gt;
&lt;/figure&gt;
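&lt;p&gt;For contrast, here is a sketch of the same bookkeeping for &lt;span class=&quot;math inline&quot;&gt;p(x) = x^2 + 1&lt;/span&gt; regarded as a complex polynomial (an illustrative check, not part of the proof). The single critical point is still &lt;span class=&quot;math inline&quot;&gt;0&lt;/span&gt; and the single critical value is still &lt;span class=&quot;math inline&quot;&gt;1&lt;/span&gt;, so &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}= \mathbb{C} \setminus \{0\}&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}= \mathbb{C} \setminus \{1\}&lt;/span&gt;, each of which is connected. For any &lt;span class=&quot;math inline&quot;&gt;y ≠ 1&lt;/span&gt;, both square roots of &lt;span class=&quot;math inline&quot;&gt;y - 1&lt;/span&gt; are non-zero and are mapped to &lt;span class=&quot;math inline&quot;&gt;y&lt;/span&gt;, so &lt;span class=&quot;math inline&quot;&gt;p(x)&lt;/span&gt; maps &lt;span class=&quot;math inline&quot;&gt;P_{\text{pure}}&lt;/span&gt; onto &lt;span class=&quot;math inline&quot;&gt;V_{\text{regular}}&lt;/span&gt;. Together with &lt;span class=&quot;math inline&quot;&gt;p(0) = 1&lt;/span&gt;, this gives surjectivity onto &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt;, and in particular &lt;span class=&quot;math inline&quot;&gt;p(i) = p(-i) = 0&lt;/span&gt;.&lt;/p&gt;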
&lt;h3 id=&quot;further-reading&quot;&gt;Further reading&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://mathoverflow.net/a/10684&quot;&gt;This MathOverflow answer&lt;/a&gt; is where I first found this proof, although it’s slightly less elementary (it relies on polynomials being proper) and even more terse.&lt;/p&gt;
&lt;p&gt;Milnor’s wonderful book “Topology from the Differentiable Viewpoint” has a &lt;a href=&quot;https://www.google.com/books/edition/Topology_from_the_Differentiable_Viewpoi/BaQYYJp84cYC?gbpv=1&amp;amp;pg=PA8&quot;&gt;similarly elegant proof&lt;/a&gt; using the fact that a sphere minus a finite number of points remains connected, whereas a circle minus at least two points becomes disconnected. However, it requires somewhat more machinery.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://faculty.bard.edu/~belk/math461s11/InverseFunctionTheorem.pdf&quot;&gt;This set of notes&lt;/a&gt; is a self-contained proof of the inverse function theorem for &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}^n&lt;/span&gt; (note that the inverse function theorem for &lt;span class=&quot;math inline&quot;&gt;\mathbb{C}&lt;/span&gt; reduces to the inverse function theorem for &lt;span class=&quot;math inline&quot;&gt;\mathbb{R}^2&lt;/span&gt; by the &lt;a href=&quot;https://en.wikipedia.org/wiki/Cauchy%E2%80%93Riemann_equations&quot;&gt;Cauchy-Riemann equations&lt;/a&gt;.) It turns out that a property called “local surjectivity” is all that’s needed to prove openness, but that’s less well-known and only slightly less complicated than the full inverse function theorem.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/curvature-moving-frames</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/curvature-moving-frames"/>
    <title>Curvature computations with moving frames</title>
    <updated>2018-03-22T00:00:00-07:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;script&gt;
KaTeXMacros = {
  &quot;\\pd&quot;: &quot;\\frac{∂{#1}}{∂{#2}}&quot;,
  &quot;\\CSF&quot;: &quot;Γ_{#1}&quot;,
  &quot;\\CS&quot;: &quot;{Γ^{#1}}_{#2}&quot;,
  &quot;\\cnf&quot;: &quot;{ω^{#1}}_{#2}&quot;,
  &quot;\\crf&quot;: &quot;{Ω^{#1}}_{#2}&quot;,
  &quot;\\Riem&quot;: &quot;{\\operatorname{Riem}^{#1}}_{#2}&quot;,
  &quot;\\Ric&quot;: &quot;\\operatorname{Ric}_{#1}&quot;,
  &quot;\\sgn&quot;: &quot;\\operatorname{sgn}&quot;,
};
&lt;/script&gt;

&lt;style&gt;
div.cheatsheet, div.important-equation {
  border: 1px solid #002b36; /* solarized base03 */
  background-color: #fdf6e3; /* solarized base3 */
  color: #111;
  margin: 0.5em 0em;
  text-align: left;
  padding-left: 0.5em;
  padding-right: 0.5em;
}

div.cheatsheet &gt; h2 {
  font-weight: bold;
}

li &gt; h3 {
  font-weight: bold;
  font-style: italic;
}
&lt;/style&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;Overview&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Given a metric on a manifold, it is often necessary to compute its
  curvature. However, the usual method of first computing the
  Christoffel symbols and then using those to compute the Riemann
  curvature tensor is tedious and error-prone.&lt;/p&gt;

&lt;p&gt;Fortunately, there&amp;rsquo;s another way to compute the curvature
  that&amp;rsquo;s often quicker and easier: Cartan&amp;rsquo;s method of
  moving frames, or the &lt;em&gt;repère mobile&lt;/em&gt;. Unfortunately,
  explanations of this method aren&amp;rsquo;t very clear, so here
  I&amp;rsquo;m going to provide my own, based on working through a few
  examples.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m going to assume that you know enough Riemannian geometry
  to be able to compute curvature the usual way, and also that
  you&amp;rsquo;re familiar with the basics of differential forms and
  exterior differentiation. Some familiarity with &lt;a href=&quot;https://en.wikipedia.org/wiki/Pseudo-Riemannian_manifold&quot;&gt;semi-Riemannian metrics&lt;/a&gt;
  will also be helpful, since a lot of motivating examples come from
  general relativity, which uses
  &lt;a href=&quot;https://en.wikipedia.org/wiki/Pseudo-Riemannian_manifold#Lorentzian_manifold&quot;&gt;Lorentzian metrics&lt;/a&gt;.&lt;/p&gt;

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;The coordinate frame method&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;First, a quick overview of the usual method using coordinate
  frames. Let \(g = g_{ij} \, dx^i ⊗ dx^j\) be a given semi-Riemannian
  metric expressed in terms of the coordinates \((x^1, \dotsc, x^n)\).
  We first compute the &lt;em&gt;Christoffel symbols&lt;/em&gt; using the formula
  \[
    \CS{k}{ij} = \frac{1}{2} (g^*)^{kl} \left(∂_j g_{il} + ∂_i g_{lj} - ∂_l g_{ij}\right)\text{,}
  \]
  where \((g^*)^{ij}\) are the components of the dual metric \(g^*\),
  which can be computed by taking components of the inverse of the
  matrix \(G[i, j] = g_{ij}\) formed from the metric components, i.e. \((g^*)^{ij} = G^{-1}[i, j]\). Recall
  that the Christoffel symbols are symmetric in the lower indices, so
  if our manifold is \(n\)-dimensional, then in general we have \(n^2(n+1)/2\) independent
  Christoffel symbols.&lt;/p&gt;

&lt;p&gt;Note that we use the &lt;a href=&quot;https://en.wikipedia.org/wiki/Einstein_notation&quot;&gt;Einstein summation convention&lt;/a&gt;;
  in the absence of a summation sign, index variables that appear once
  as a superscript and once as a subscript are implicitly summed over.&lt;/p&gt;

&lt;p&gt;A useful special case is when the metric \(g\) is diagonal,&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; i.e. \(g = g_{ii} \, dx^i ⊗ dx^i\). Then \((g^*)^{ii} = 1/g_{ii}\) and
  \[
  \begin{alignedat}{2}
    \CS{k}{ij} &amp;= 0                            \qquad &amp; \CS{k}{ik} &amp;= \frac{∂_i g_{kk}}{2 g_{kk}} \\
    \CS{k}{ii} &amp;= -\frac{∂_k g_{ii}}{2 g_{kk}} \qquad &amp; \CS{i}{ii} &amp;= \frac{∂_i g_{ii}}{2 g_{ii}}\text{,}
  \end{alignedat}
  \]
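  where \(i\), \(j\), and \(k\) are distinct. Therefore in this case we have \(n(2n-1)\) possibly non-zero independent Christoffel symbols: \(n(n-1)\) each of the forms \(\CS{k}{ik}\) and \(\CS{k}{ii}\), and \(n\) of the form \(\CS{i}{ii}\).&lt;/p&gt;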
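
&lt;p&gt;As an illustrative check of these formulas (the choice of example is an assumption for illustration), take polar coordinates \((r, φ)\) on the plane, where \(g = dr ⊗ dr + r^2 \, dφ ⊗ dφ\), i.e. \(g_{rr} = 1\) and \(g_{φφ} = r^2\). The only non-vanishing derivative of a metric component is \(∂_r g_{φφ} = 2r\), so the only non-zero Christoffel symbols should be
  \[
    \CS{r}{φφ} = -\frac{∂_r g_{φφ}}{2 g_{rr}} = -r \qquad \CS{φ}{rφ} = \CS{φ}{φr} = \frac{∂_r g_{φφ}}{2 g_{φφ}} = \frac{1}{r}\text{.}
  \]&lt;/p&gt;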

&lt;p&gt;The Christoffel symbols are important in their own right, but we
  need them only to compute curvature. We can compute the components
  of the &lt;em&gt;Riemann curvature tensor&lt;/em&gt; using the formula
  \[
    \Riem{k}{lij} = ∂_i \CS{k}{jl} - ∂_j \CS{k}{il} + \CS{k}{im} \CS{m}{jl} - \CS{k}{jm} \CS{m}{il}\text{.}
  \]
  We can then compute the &lt;em&gt;Ricci curvature tensor&lt;/em&gt; and the &lt;em&gt;scalar curvature&lt;/em&gt;:
  \[
    \Ric{ij} = \Riem{k}{ikj} \qquad S = (g^*)^{ij} \Ric{ij}\text{.}
  \]&lt;/p&gt;

&lt;p&gt;For applications, we&amp;rsquo;re most interested in the Ricci curvature tensor,
  so we usually just want to calculate that directly:
  \[
    \Ric{ij} = ∂_k \CS{k}{ji} - ∂_j \CS{k}{ki} + \CS{k}{km} \CS{m}{ji} - \CS{k}{jm} \CS{m}{ki}\text{.}
  \]&lt;/p&gt;
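
&lt;p&gt;As a worked instance of this formula (a sketch; the example and the coordinate names \((u, v)\) are chosen here for illustration), take the unit sphere with \(g = du ⊗ du + \sin^2 u \, dv ⊗ dv\). Its only non-zero Christoffel symbols are \(\CS{u}{vv} = -\sin u \cos u\) and \(\CS{v}{uv} = \CS{v}{vu} = \cot u\). Keeping only the non-vanishing terms of the formula above,
  \[
  \begin{aligned}
    \Ric{uu} &amp;= -∂_u \CS{v}{vu} - \CS{v}{uv} \CS{v}{vu} = \csc^2 u - \cot^2 u = 1 \\
    \Ric{vv} &amp;= ∂_u \CS{u}{vv} + \CS{v}{vu} \CS{u}{vv} - 2 \, \CS{u}{vv} \CS{v}{uv} \\
             &amp;= (\sin^2 u - \cos^2 u) - \cos^2 u + 2 \cos^2 u = \sin^2 u\text{,}
  \end{aligned}
  \]
  and \(\Ric{uv} = 0\), so that \(\Ric{ij} = g_{ij}\) and \(S = 2\), as expected for the unit sphere.&lt;/p&gt;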

&lt;div class=&quot;cheatsheet&quot;&gt;
  &lt;h2&gt;Cheatsheet: coordinate frame method&lt;/h2&gt;

  &lt;div class=&quot;p&quot;&gt;Given the components \(g_{ij}\) of a semi-Riemannian metric:
    &lt;ol&gt;
      &lt;li&gt;Compute the Christoffel symbols. If the metric \(g\) is
        diagonal, use
        \[
        \begin{alignedat}{2}
          \CS{k}{ij} &amp;= 0                            \qquad &amp; \CS{k}{ik} &amp;= \frac{∂_i g_{kk}}{2 g_{kk}} \\
          \CS{k}{ii} &amp;= -\frac{∂_k g_{ii}}{2 g_{kk}} \qquad &amp; \CS{i}{ii} &amp;= \frac{∂_i g_{ii}}{2 g_{ii}}\text{.}
        \end{alignedat}
        \]
        Otherwise, compute the dual metric components \((g^*)^{ij} = G^{-1}[i, j]\) where \(G[i, j] = g_{ij}\) and use
        \[
          \CS{k}{ij} = \frac{1}{2} (g^*)^{kl} \left(∂_j g_{il} + ∂_i g_{lj} - ∂_l g_{ij}\right)\text{.}
        \]&lt;/li&gt;
      &lt;li&gt;Compute the Ricci curvature tensor:
        \[
          \Ric{ij} = ∂_k \CS{k}{ji} - ∂_j \CS{k}{ki} + \CS{k}{km} \CS{m}{ji} - \CS{k}{jm} \CS{m}{ki}\text{.}
        \]&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;The Lagrangian method&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;An alternate method for computing the Christoffel symbols is to write
  down the Lagrangian corresponding to the metric:
  \[
    L(x^1, \dotsc, x^n, v^1, \dotsc, v^n) = g_{ij}(x^1, \dotsc, x^n) \, v^i v^j
  \]
  and then to compute the Euler-Lagrange equations for a path
  \(γ(t) = \big(x^1(t), \dotsc, x^n(t)\big)\):
  \[
    \frac{d}{dt} \left( \frac{∂ L}{∂ v^k}(γ(t), \dot{γ}(t)) \right) - \frac{∂ L}{∂ x^k}(γ(t), \dot{γ}(t)) = 0
  \]
  to get the geodesic equations. Then we can compare these equations
  to the geodesic equations expressed in terms of the Christoffel symbols
  \[
    \ddot{γ}^k + \CS{k}{ij} \dot{γ}^i \dot{γ}^j = 0\text{,}
  \]
  and then we can read off the Christoffel symbols from the coefficients of the
  \(\dot{γ}^i \dot{γ}^j\) terms.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m not convinced that this method saves that much work,
especially when the metric is diagonal, but it&amp;rsquo;s at least a
clearer way to organize the computations for the Christoffel symbols.&lt;/p&gt;
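
&lt;p&gt;As a small worked sketch of this method (the example is chosen here for illustration), take polar coordinates \((r, φ)\) on the plane, with \(g = dr ⊗ dr + r^2 \, dφ ⊗ dφ\) and hence \(L = (v^r)^2 + r^2 (v^φ)^2\). The Euler-Lagrange equations are
  \[
  \begin{aligned}
    \frac{d}{dt} (2 \dot{r}) - 2 r \dot{φ}^2 = 0 \quad &amp;⟹ \quad \ddot{r} - r \, \dot{φ}^2 = 0 \\
    \frac{d}{dt} (2 r^2 \dot{φ}) = 0 \quad &amp;⟹ \quad \ddot{φ} + \frac{2}{r} \, \dot{r} \dot{φ} = 0\text{.}
  \end{aligned}
  \]
  Comparing with the geodesic equations, the first gives \(\CS{r}{φφ} = -r\), and the second gives \(\CS{φ}{rφ} + \CS{φ}{φr} = 2/r\), so \(\CS{φ}{rφ} = \CS{φ}{φr} = 1/r\) by symmetry in the lower indices. All other Christoffel symbols vanish.&lt;/p&gt;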

&lt;div class=&quot;cheatsheet&quot;&gt;
  &lt;h2&gt;Cheatsheet: Lagrangian method&lt;/h2&gt;

  &lt;div class=&quot;p&quot;&gt;Given the components \(g_{ij}\) of a semi-Riemannian metric:
    &lt;ol&gt;
      &lt;li&gt;With the Lagrangian
        \[
          L = g_{ij} \, v^i v^j\text{,}
        \]
        compute the Euler-Lagrange equations
        \[
        \frac{d}{dt} \left( \frac{∂ L}{∂ v^k}(γ(t), \dot{γ}(t)) \right) - \frac{∂ L}{∂ x^k}(γ(t), \dot{γ}(t)) = 0\text{.}
        \]&lt;/li&gt;
      &lt;li&gt;Compare the Euler-Lagrange equations to the geodesic equation
        \[
        \ddot{γ}^k + \CS{k}{ij} \dot{γ}^i \dot{γ}^j = 0
        \]
        and read off the Christoffel symbols \(\CS{k}{ij}\).
      &lt;/li&gt;
      &lt;li&gt;Compute the Ricci curvature tensor:
        \[
        \Ric{ij} = ∂_k \CS{k}{ji} - ∂_j \CS{k}{ki} + \CS{k}{km} \CS{m}{ji} - \CS{k}{jm} \CS{m}{ki}\text{.}
        \]&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;The moving frame method&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Now, finally, I can explain the method of moving
  frames. Don&amp;rsquo;t worry too much about understanding this the first
  time through; I suggest skimming this section and then following along
  with the examples below, referring back as necessary.&lt;/p&gt;

&lt;p&gt;For now, let&amp;rsquo;s assume that we have not a semi-Riemannian, but
  a Riemannian metric \(g = g_{ij} \, dx^i ⊗ dx^j\) expressed in terms
  of the coordinates \((x^1, \dotsc, x^n)\). We want to find
  &lt;em&gt;basis one-forms&lt;/em&gt;
  \((θ^1, \dotsc, θ^n)\) such that
  \[
    g = ∑_i θ^i ⊗ θ^i\text{.}
  \]
  If the metric is diagonal, this is easy (suspending the summation
  convention):
  \[
    θ^i = \sqrt{g_{ii}} \, dx^i\text{.}
  \]
  If instead the metric is not diagonal, we may still be able to
  factor it into a &amp;ldquo;sum of squares&amp;rdquo; form by
  inspection. Otherwise, an equivalent definition of the \(θ^i\) is that
  \[
    g^*(θ^i, θ^j) = δ^i_j\text{,}
  \]
  i.e. the basis one-forms \(θ^i\) comprise an &lt;em&gt;orthonormal dual frame&lt;/em&gt;.
  We can then use a &lt;a href=&quot;https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process&quot;&gt;Gram-Schmidt-like&lt;/a&gt; process on the \(dx^i\) or
  some ad hoc method to compute the basis one-forms.&lt;/p&gt;

&lt;p&gt;It is also convenient to express the coordinate forms in terms of the
  basis one-forms, which is again simple if the metric is diagonal:
  \[
    dx^i = \frac{1}{\sqrt{g_{ii}}} \, θ^i\text{.}
  \]
  Otherwise, one would need to invert the matrix expressing the \(θ^i\)
  in terms of the \(dx^i\).&lt;/p&gt;
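
&lt;p&gt;To illustrate the non-diagonal case, here is a minimal example (chosen for illustration, with coordinates named \((x, y)\)) where the factoring can be done by inspection. Completing the square,
  \[
    g = dx ⊗ dx + dx ⊗ dy + dy ⊗ dx + 2 \, dy ⊗ dy = (dx + dy) ⊗ (dx + dy) + dy ⊗ dy\text{,}
  \]
  so one possible choice is
  \[
    θ^1 = dx + dy \qquad θ^2 = dy\text{,}
  \]
  which inverts to
  \[
    dx = θ^1 - θ^2 \qquad dy = θ^2\text{.}
  \]&lt;/p&gt;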

&lt;div class=&quot;p&quot;&gt;The next step is to compute the &lt;em&gt;connection one-forms&lt;/em&gt; \(\cnf{i}{j}\).
  To do so, we compute the exterior derivatives of the basis one-forms
  \(dθ^i\) and express them in terms of the basis two-forms, i.e.
  \[
    dθ^i = a^i_{jk} \, θ^j ∧ θ^k
  \]
  for functions \(a^i_{jk}\).

  Then we can use &lt;em&gt;Cartan&amp;rsquo;s first structure equation&lt;/em&gt;

  &lt;div class=&quot;important-equation&quot;&gt;
    \[
    dθ^i = -\cnf{i}{j} ∧ θ^j
    \]
  &lt;/div&gt;
  and the fact that &lt;em&gt;the connection forms are skew symmetric&lt;/em&gt;
  &lt;div class=&quot;important-equation&quot;&gt;
    \[
      \cnf{i}{j} = -\cnf{j}{i}
    \]
  &lt;/div&gt;
  to deduce the \(\cnf{i}{j}\).&lt;/div&gt;

&lt;p&gt;There&amp;rsquo;s an explicit general formula for \(\cnf{i}{j}\) in
  terms of the basis one-forms,&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
  but it&amp;rsquo;s often easier to compare the expressions for \(dθ^i\)
  to the form of the first structure equation, guess what the
  connection forms are, taking advantage of their skew symmetry, and
  check that the first structure equation holds.  In fact, if the
  metric is diagonal, the expressions for \(dθ^i\) are
  nice enough that you can immediately read off the connection
  forms. This &amp;ldquo;guess and check&amp;rdquo; method works because the
  connection forms are guaranteed to exist, and furthermore are
  guaranteed to be unique, so any guessed list of \(\cnf{i}{j}\) that
  satisfies the first structure equation &lt;em&gt;must&lt;/em&gt; be the
  connection forms.&lt;/p&gt;
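
&lt;p&gt;Here is a small sketch of guess and check in action (an illustrative example, using polar coordinates \((r, φ)\) on the plane). With \(θ^1 = dr\) and \(θ^2 = r \, dφ\), we have
  \[
    dθ^1 = 0 \qquad dθ^2 = dr ∧ dφ = \frac{1}{r} \, θ^1 ∧ θ^2\text{.}
  \]
  The first structure equations read \(dθ^1 = -\cnf{1}{2} ∧ θ^2\) and \(dθ^2 = -\cnf{2}{1} ∧ θ^1 = \cnf{1}{2} ∧ θ^1\), which suggests the guess
  \[
    \cnf{1}{2} = -\frac{1}{r} \, θ^2 = -dφ\text{.}
  \]
  Checking, \(-\cnf{1}{2} ∧ θ^2 = \frac{1}{r} \, θ^2 ∧ θ^2 = 0 = dθ^1\) and \(\cnf{1}{2} ∧ θ^1 = -\frac{1}{r} \, θ^2 ∧ θ^1 = \frac{1}{r} \, θ^1 ∧ θ^2 = dθ^2\), so by uniqueness this is the connection form.&lt;/p&gt;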

&lt;p&gt;Note that skew symmetry immediately implies that (suspending the
  Einstein summation convention)
  \[
    \cnf{i}{i} = 0\text{.}
  \]
  Therefore, we have \(n(n-1)/2\) independent connection forms.&lt;/p&gt;

&lt;p&gt;There &lt;em&gt;is&lt;/em&gt; a formula for the connection forms when \(g\) is
  diagonal, which is more useful for deducing properties of diagonal
  metrics than it is for doing calculations. Suspending the summation
  convention,
  \[
    \begin{aligned}
      \cnf{i}{j}
        &amp;= \frac{∂_j g_{ii}}{2 g_{ii} \sqrt{g_{jj}}} \, θ^i - \frac{∂_i g_{jj}}{2 g_{jj} \sqrt{g_{ii}}} \, θ^j \\
        &amp;= \frac{∂_j g_{ii}}{2 \sqrt{g_{ii} g_{jj}}} \, dx^i - \frac{∂_i g_{jj}}{2 \sqrt{g_{ii} g_{jj}}} \, dx^j\text{.}
    \end{aligned}
  \]
  This formula implies that a diagonal metric has connection forms
  with at most two components each, as opposed to \(n\) components in
  general. Furthermore, if a diagonal metric depends only on a single
  coordinate \(x^r\), the only possible non-zero connection forms up to skew symmetry are \(\cnf{i}{r}\),
  which are proportional to \(θ^i\). If instead a diagonal metric depends on two coordinates \(x^r\) and \(x^s\),
  then the only possible non-zero connection forms up to skew symmetry
  are \(\cnf{i}{r}\), \(\cnf{i}{s}\), or \(\cnf{r}{s}\). The first two
  cases are proportional to \(θ^i\), and the
  last case has at most two components: one proportional to \(θ^r\) and another proportional to \(θ^s\).&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;The connection forms play an important role similar to the
  Christoffel symbols, but we need them only to compute
  curvature. First, observe that we can express each connection form
  in two ways: in terms of the \(dx^i\), and in terms of the \(θ^i\). We
  need to compute the derivatives \(d\cnf{i}{j}\), which is easiest to
  do if \(\cnf{i}{j}\) is expressed in terms of the \(dx^i\), since
  \(d(dx^i) = 0\). Then we can compute the &lt;em&gt;curvature forms&lt;/em&gt;
  \(\crf{i}{j}\) using &lt;em&gt;Cartan&amp;rsquo;s second structure equation&lt;/em&gt;
  &lt;div class=&quot;important-equation&quot;&gt;
    \[
      \crf{i}{j} = d\cnf{i}{j} + \cnf{i}{k} ∧ \cnf{k}{j}\text{.}
    \]
  &lt;/div&gt;
  Like the connection forms, &lt;em&gt;the curvature forms are skew symmetric&lt;/em&gt;:
  &lt;div class=&quot;important-equation&quot;&gt;
    \[
      \crf{i}{j} = -\crf{j}{i}\text{,}
    \]
  &lt;/div&gt;
  so we need only calculate \(n(n-1)/2\) independent curvature forms,
  i.e. the ones where \(i ≠ j\). Also note that in the \(\cnf{i}{k} ∧ \cnf{k}{j}\) term, one need only take the sum over the \(n - 2\) terms \(k ∉ \{ i, j \}\), by
  (suspending the summation convention) \(\cnf{i}{i} = \cnf{j}{j} = 0\).&lt;/div&gt;
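
&lt;p&gt;As a quick worked sketch (an illustrative example, with coordinates named \((u, v)\)), take the unit sphere with \(g = du ⊗ du + \sin^2 u \, dv ⊗ dv\), so that \(θ^1 = du\) and \(θ^2 = \sin u \, dv\). The connection form is \(\cnf{1}{2} = -\cos u \, dv\), which one can confirm from the first structure equations: \(-\cnf{1}{2} ∧ θ^2 = 0 = dθ^1\) and \(\cnf{1}{2} ∧ θ^1 = \cos u \, du ∧ dv = dθ^2\). Since \(n = 2\), there are no \(\cnf{i}{k} ∧ \cnf{k}{j}\) terms, and the second structure equation gives
  \[
    \crf{1}{2} = d\cnf{1}{2} = \sin u \, du ∧ dv = θ^1 ∧ θ^2\text{.}
  \]&lt;/p&gt;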

  &lt;p&gt;From the properties discussed above, if a diagonal metric depends
  only on a single coordinate, then each curvature form \(\crf{i}{j}\)
    is proportional to \(θ^i ∧ θ^j\). If instead a diagonal metric depends on two coordinates \(x^r\) and \(x^s\),
  then each curvature form \(\crf{i}{r}\) or \(\crf{i}{s}\), up to skew symmetry, has at most two components: one proportional to \(θ^i ∧ θ^r\) and another proportional to \(θ^i ∧ θ^s\), and all other curvature forms \(\crf{i}{j}\) are
  proportional to \(θ^i ∧ θ^j\).&lt;/p&gt;

  &lt;p&gt;At this point we&amp;rsquo;re done, since the Riemann curvature tensor
  with respect to the orthonormal frame \((E_1, \dotsc, E_n)\) dual to
  \((θ^1, \dotsc, θ^n)\) is
  \[
    \Riem{l}{kij} = \crf{l}{k}(E_i, E_j)
  \]
  and the Ricci curvature tensor is
  \[
    \Ric{ij} = \crf{k}{i}(E_k, E_j)\text{.}
  \]
  Note that it&amp;rsquo;s not necessary to explicitly calculate \(E_i\);
  it&amp;rsquo;s enough to use the definition
  \[
    θ^i(E_j) = δ^i_j\text{,}
  \]
  and the definition of the wedge product to derive the relations
  \[
    (θ^i ∧ θ^j)(E_k, E_l) = \begin{cases}
      +1 &amp; k = i ≠ j = l \\
      -1 &amp; l = i ≠ j = k \\
      0 &amp; \text{otherwise,}
    \end{cases}
  \]
    which can then be used to compute the curvature tensor components.&lt;/p&gt;
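
  &lt;p&gt;For instance, here is a sketch of how these relations get used in the 2D case, assuming the single independent curvature form has been found to be \(\crf{1}{2} = K \, θ^1 ∧ θ^2\) for some function \(K\). Using \(\crf{1}{1} = \crf{2}{2} = 0\) and \(\crf{2}{1} = -\crf{1}{2}\),
  \[
  \begin{aligned}
    \Ric{11} &amp;= \crf{2}{1}(E_2, E_1) = -K \, (θ^1 ∧ θ^2)(E_2, E_1) = K \\
    \Ric{22} &amp;= \crf{1}{2}(E_1, E_2) = K \, (θ^1 ∧ θ^2)(E_1, E_2) = K \\
    \Ric{12} &amp;= \crf{2}{1}(E_2, E_2) = 0 \qquad \Ric{21} = \crf{1}{2}(E_1, E_1) = 0\text{,}
  \end{aligned}
  \]
  so in the orthonormal frame the Ricci curvature tensor is \(K\) times the identity.&lt;/p&gt;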

  &lt;p&gt;From the properties discussed above, if a diagonal metric depends
    only on a single coordinate, then \(\crf{i}{j}\) is proportional to \(θ^i ∧ θ^j\), which implies that \(\Ric{}\) is
    also diagonal. Furthermore, if the metric is diagonal and depends
    on two coordinates \(x^k\) and \(x^l\), then the only possible off-diagonal component is \(\Ric{kl}\).&lt;sup&gt;&lt;a href=&quot;#fn3&quot; id=&quot;r3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;div class=&quot;cheatsheet&quot;&gt;
  &lt;h2&gt;Cheatsheet: The moving frame method for Riemannian metrics&lt;/h2&gt;

  &lt;div class=&quot;p&quot;&gt;Given the components \(g_{ij}\) of a Riemannian metric:
    &lt;ol&gt;
      &lt;li&gt;Find an orthonormal dual frame, i.e. basis one-forms \((θ^1, \dotsc, θ^n)\) such that
        \[
          g = ∑_i θ^i ⊗ θ^i\text{.}
        \]
        If the metric is diagonal, then (suspending the summation
        convention)
        \[
          θ^i = \sqrt{g_{ii}} \, dx^i\text{.}
        \]&lt;/li&gt;
      &lt;li&gt;Use the first structure equation
        \[
          dθ^i = -\cnf{i}{j} ∧ θ^j
        \]
        and the skew symmetry relations
        \[
          \cnf{i}{j} = -\cnf{j}{i}
        \]
      to deduce the connection forms \(\cnf{i}{j}\).&lt;/li&gt;
      &lt;li&gt;Compute the curvature forms using the second structure equation
        \[
          \crf{i}{j} = d\cnf{i}{j} + \cnf{i}{k} ∧ \cnf{k}{j}
        \]
        and the skew symmetry relations
        \[
          \crf{i}{j} = -\crf{j}{i}\text{.}
        \]
        Note that it&amp;rsquo;s easiest to compute \(d\cnf{i}{j}\) when
        \(\cnf{i}{j}\) is expressed in terms of the \(dx^i\), since
        \(d(dx^i) = 0\).&lt;/li&gt;
      &lt;li&gt;Compute the components of the Ricci curvature tensor via
        \[
          \Ric{ij} = \crf{k}{i}(E_k, E_j)
        \]
        and the relations
        \[
          (θ^i ∧ θ^j)(E_k, E_l) = \begin{cases}
            +1 &amp; k = i ≠ j = l \\
            -1 &amp; l = i ≠ j = k \\
            0 &amp; \text{otherwise.}
          \end{cases}
        \]&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/section&gt;

&lt;section&gt;
  &lt;header&gt;
    &lt;h2&gt;Comparing the methods&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p&gt;As we saw above, one advantage of the moving frame method is
    that, in the worst case, one need only compute \(n(n-1)/2\)
    independent connection forms, each with at most \(n\) components,
    rather than \(n^2(n+1)/2\) independent Christoffel symbols&amp;mdash;a
    saving of \(n^2\) &amp;ldquo;component calculations&amp;rdquo;. Even in
    the simplest case, when the metric is diagonal, you still need to
    compute \(n(2n-1)\) possibly
    non-zero independent Christoffel symbols, as opposed to \(n(n -
    1)/2\) independent connection forms, each with at most two components, for \(n(n-1)\) components in total, which is again a saving of \(n^2\) &amp;ldquo;component calculations&amp;rdquo;.&lt;/p&gt;
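
  &lt;p&gt;To make the worst-case count concrete, here is a quick arithmetic check for \(n = 4\) (the dimension is chosen only for illustration). In general there are \(4^2 \cdot 5 / 2 = 40\) independent Christoffel symbols, versus \(4 \cdot 3 / 2 = 6\) independent connection forms with at most \(4\) components each, i.e. at most \(24\) components, for a saving of \(40 - 24 = 16 = 4^2\).&lt;/p&gt;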

  &lt;p&gt;Also, when computing a curvature form, one need only compute a
    single exterior derivative of a connection form and \(n - 2\) wedge
    products of connection forms. This turns out to be less tedious
    than the corresponding calculation using coordinate methods of \(\Riem{k}{lij}\) for
    fixed \(k\) and \(l\) such that \(k ≠ l\).&lt;/p&gt;

  &lt;p&gt;Furthermore, the orthonormality of the dual frame tends to cause
    symmetries to appear earlier in the calculation, leading to less
    wasted work. This is advantageous when you know the answer
    you&amp;rsquo;re looking for, and it&amp;rsquo;s particularly simple,
    e.g. if you expect the Ricci curvature to be zero, because
    calculations becoming unduly complicated becomes a sign of an
    undetected mistake. With coordinate methods, even if calculations
    become complicated, you can&amp;rsquo;t rule out terms cancelling if
    you continue, so errors become apparent only later.&lt;/p&gt;

  &lt;p&gt;On the other hand, the moving frame method requires a certain
    amount of cleverness, first in coming up with the one-forms \(θ^i\) if
    the metric isn&amp;rsquo;t diagonal, and second in deducing the
    connection forms \(\cnf{i}{j}\). The coordinate methods require
    less thought, and are more &amp;ldquo;plug and chug&amp;rdquo;. In fact,
    once we examine the semi-Riemannian case later, we&amp;rsquo;ll see
    that the coordinate methods remain unchanged, yet the moving frame
    method becomes more complicated.&lt;/p&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;Example 1: Orthogonal coordinates on 2D surfaces&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Let \(g\) be a Riemannian metric on a 2D manifold. The method of
  moving frames makes calculating curvature particularly easy, since
  there is exactly one connection form and one curvature form. For
  example, consider the special case when the metric is diagonal,
  i.e. with line element
  \[
    ds^2 = E \, du^2 + G \, dv^2\text{.}
  \]
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;h3&gt;Orthonormal dual frame&lt;/h3&gt;
    &lt;p&gt;We can then read off an orthonormal dual frame:
      \[
        ds^2 = {\underbrace{(\sqrt{E} \, du)}_{θ^1}}^2 + {\underbrace{(\sqrt{G} \, dv)}_{θ^2}}^2\text{,}
      \]
      i.e.
      \[
        θ^1 = \sqrt{E} \, du \qquad θ^2 = \sqrt{G} \, dv\text{,}
      \]
      and express the coordinate forms in terms of it:
      \[
        du = \frac{1}{\sqrt{E}} \, θ^1 \qquad dv = \frac{1}{\sqrt{G}} \, θ^2\text{.}
      \]&lt;/p&gt;
  &lt;/li&gt;

  &lt;li&gt;
    &lt;h3&gt;Connection forms&lt;/h3&gt;
    &lt;p&gt;The derivatives of the basis one-forms are
      \[
      \begin{aligned}
        dθ^1 &amp;= \frac{∂_v E}{2 \sqrt{E}} \, dv ∧ du = \frac{∂_v E}{2 E \sqrt{G}} \, θ^2 ∧ θ^1 \\
        dθ^2 &amp;= \frac{∂_u G}{2 \sqrt{G}} \, du ∧ dv = \frac{∂_u G}{2 G \sqrt{E}} \, θ^1 ∧ θ^2
      \end{aligned}
      \]
      and the first structure equations are
      \[
      \begin{aligned}
        dθ^1 &amp;= -\cnf{1}{2} ∧ θ^2 \\
        dθ^2 &amp;= -\cnf{2}{1} ∧ θ^1 = \cnf{1}{2} ∧ θ^1\text{.}
      \end{aligned}
      \]
      Rewriting the derivative equations to match the first structure
      equations,
      &lt;!-- TODO: File a bug for \(\) in \text{}, and clean up the below once \(\) is supported inside \text{}. --&gt;
      \[
      \begin{aligned}
        dθ^1 &amp;= -\overbrace{\left(\frac{∂_v E}{2 E \sqrt{G}} \, θ^1\right)}^{\text{one term of $\cnf{1}{2}$}} ∧ θ^2 \\
        dθ^2 &amp;= \underbrace{\left(-\frac{∂_u G}{2 G \sqrt{E}} \, θ^2\right)}_{\text{another term of $\cnf{1}{2}$}} ∧ θ^1\text{,}
      \end{aligned}
      \]
      we can guess that
      \[
        \cnf{1}{2} = \frac{∂_v E}{2 E \sqrt{G}} \, θ^1 - \frac{∂_u G}{2 G \sqrt{E}} \, θ^2\text{.}
      \]
      This guess works, since
      \[
      \begin{aligned}
        -\cnf{1}{2} ∧ θ^2
        &amp;= -\left( \frac{∂_v E}{2 E \sqrt{G}} \, θ^1 - \frac{∂_u G}{2 G \sqrt{E}} \, θ^2 \right) ∧ θ^2 \\
        &amp;= -\frac{∂_v E}{2 E \sqrt{G}} \, θ^1 ∧ θ^2 + \underbrace{\cancel{\frac{∂_u G}{2 G \sqrt{E}} \, θ^2 ∧ θ^2}}_{θ^2 ∧ θ^2 = 0} \\
        &amp;= dθ^1
      \end{aligned}
      \]
      and
      \[
      \begin{aligned}
        \cnf{1}{2} ∧ θ^1
        &amp;= \left( \frac{∂_v E}{2 E \sqrt{G}} \, θ^1 - \frac{∂_u G}{2 G \sqrt{E}} \, θ^2 \right) ∧ θ^1 \\
        &amp;= \underbrace{\cancel{\frac{∂_v E}{2 E \sqrt{G}} \, θ^1 ∧ θ^1}}_{θ^1 ∧ θ^1 = 0} - \frac{∂_u G}{2 G \sqrt{E}} \, θ^2 ∧ θ^1 \\
        &amp;= dθ^2\text{,}
      \end{aligned}
      \]
      using the fact that \(θ^1 ∧ θ^1 = θ^2 ∧ θ^2 = 0\). Therefore, by uniqueness of connection forms, this is &lt;em&gt;the&lt;/em&gt; connection form. Then, expressing \(\cnf{1}{2}\) in
      terms of both the basis one-forms and the coordinate forms,
      \[
        \cnf{1}{2} = \frac{∂_v E}{2 E \sqrt{G}} \, θ^1 - \frac{∂_u G}{2 G \sqrt{E}} \, θ^2 = \frac{∂_v E}{2 \sqrt{EG}} \, du - \frac{∂_u G}{2 \sqrt{EG}} \, dv\text{.}
      \]
    (By a very similar method, one can derive the formula stated
    previously for the \(\cnf{i}{j}\) of a diagonal metric.)&lt;/p&gt;
  &lt;/li&gt;

  &lt;li&gt;
    &lt;h3&gt;Curvature forms&lt;/h3&gt;

    &lt;p&gt;With only the single connection form \(\cnf{1}{2}\), there are
      no non-zero \(\cnf{i}{k} ∧ \cnf{k}{j}\) terms, because \(i\), \(j\), and \(k\) would all have to be distinct. Using the expression for
      \(\cnf{1}{2}\) in terms of the coordinate forms \(du\) and \(dv\),
      and that \(d(du) = d(dv) = 0\), the single curvature form is:
      \[
      \begin{aligned}
        \crf{1}{2} = d\cnf{1}{2} &amp;= \pd{}{v} \left( \frac{∂_v E}{2 \sqrt{EG}} \right) dv ∧ du - \pd{}{u} \left( \frac{∂_u G}{2 \sqrt{EG}} \right) du ∧ dv \\
                                 &amp;\begin{alignedat}{2}
                                   &amp;= \, &amp;           -\frac{1}{2} \left( \pd{}{u} \left( \frac{∂_u G}{\sqrt{EG}} \right) + \pd{}{v} \left( \frac{∂_v E}{\sqrt{EG}} \right) \right) &amp; \, du ∧ dv \\
                                   &amp;= \, &amp; -\frac{1}{2 \sqrt{EG}} \left( \pd{}{u} \left( \frac{∂_u G}{\sqrt{EG}} \right) + \pd{}{v} \left( \frac{∂_v E}{\sqrt{EG}} \right) \right) &amp; \, θ^1 ∧ θ^2\text{.}
                                  \end{alignedat}
      \end{aligned}
      \]&lt;/p&gt;
  &lt;/li&gt;

  &lt;li&gt;
    &lt;h3&gt;Gaussian curvature&lt;/h3&gt;
    &lt;p&gt;Therefore, we get the classical result that
      the Gaussian curvature \(K\), which is equal to the single independent
      component of the Riemann curvature tensor (up to sign), is
      \[
      \begin{aligned}
        K &amp;= \Riem{1}{212} = \crf{1}{2}(E_1, E_2) \\
          &amp;= -\frac{1}{2 \sqrt{EG}} \left( \pd{}{u} \left( \frac{∂_u G}{\sqrt{EG}} \right) + \pd{}{v} \left( \frac{∂_v E}{\sqrt{EG}} \right) \right) \, (θ^1 ∧ θ^2)(E_1, E_2) \\
          &amp;= -\frac{1}{2 \sqrt{EG}} \left( \pd{}{u} \left( \frac{∂_u G}{\sqrt{EG}} \right) + \pd{}{v} \left( \frac{∂_v E}{\sqrt{EG}} \right) \right)\text{.}
      \end{aligned}
      \]
    &lt;/p&gt;
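    &lt;p&gt;As a quick sanity check of this formula (a sketch, assuming the
      round sphere of radius \(R\) with \(u\) as the polar angle, \(v\) as the
      azimuthal angle, and \(0 \lt u \lt π\); the example and the symbol \(R\)
      are assumptions introduced here, not part of the derivation above),
      take \(E = R^2\) and \(G = R^2 \sin^2 u\). Then \(\sqrt{EG} = R^2 \sin u\) and \(∂_v E = 0\), so
      \[
      \begin{aligned}
        K &amp;= -\frac{1}{2 R^2 \sin u} \, \pd{}{u} \left( \frac{2 R^2 \sin u \cos u}{R^2 \sin u} \right) \\
          &amp;= -\frac{1}{2 R^2 \sin u} \, \pd{}{u} \left( 2 \cos u \right) \\
          &amp;= \frac{1}{R^2}\text{,}
      \end{aligned}
      \]
      which is the expected constant positive curvature.&lt;/p&gt;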
  &lt;/li&gt;
&lt;/section&gt;

&lt;section&gt;
  &lt;header&gt;
    &lt;h2&gt;The semi-Riemannian case&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p&gt;As we alluded to above, in the semi-Riemannian case, the
    coordinate methods remain unchanged, but the moving frame method
    gets more complicated. The equation that the one-forms must satisfy becomes
    \[
      g = ∑_i ε_i \, θ^i ⊗ θ^i\text{,}
    \]
    where each \(ε_i\) is \(±1\) throughout the whole chart domain.&lt;sup&gt;&lt;a href=&quot;#fn4&quot; id=&quot;r4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
    For example, in the Riemannian case, we let all \(ε_i = 1\), and
    in the Lorentzian case we let \(ε_0 = -1\) and all other \(ε_i = +1\). (The entire list \((ε_i)\) is called the &lt;a href=&quot;https://en.wikipedia.org/wiki/Metric_signature&quot;&gt;&lt;em&gt;signature&lt;/em&gt;&lt;/a&gt; of the metric.)&lt;/p&gt;

  &lt;p&gt;If the metric is diagonal, then each \(g_{ii}\)
    must be non-zero throughout the whole chart domain, so
    \(ε_i = \sgn(g_{ii})\) and (suspending the summation convention)
    \[
      θ^i = ε_i \sqrt{\lvert g_{ii} \rvert} \, dx^i\text{.}
    \]&lt;/p&gt;
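
  &lt;p&gt;To see why this works, and that the overall sign of each \(θ^i\) is
    immaterial (a short check added here, not part of the original
    derivation), note that for either choice of sign (again suspending the
    summation convention),
    \[
      ε_i \left( ±\sqrt{\lvert g_{ii} \rvert} \, dx^i \right) ⊗ \left( ±\sqrt{\lvert g_{ii} \rvert} \, dx^i \right)
        = ε_i \lvert g_{ii} \rvert \, dx^i ⊗ dx^i = g_{ii} \, dx^i ⊗ dx^i\text{,}
    \]
    since \(ε_i \lvert g_{ii} \rvert = \sgn(g_{ii}) \lvert g_{ii} \rvert = g_{ii}\). So one may
    equally well take \(θ^i = \sqrt{\lvert g_{ii} \rvert} \, dx^i\) when that is more convenient.&lt;/p&gt;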

  &lt;p&gt;The equivalent definition of the \(θ^i\) becomes
    \[
      g^*(θ^i, θ^j) = ε_i δ^i_j\text{,}
    \]
    where each \(ε_i\) is \(±1\) throughout the whole chart
    domain. Furthermore, the Gram-Schmidt process becomes harder to
    apply; you&amp;rsquo;ll need to find a &lt;em&gt;non-degenerate basis&lt;/em&gt; first; see &lt;a href=&quot;https://math.stackexchange.com/q/2622562/343314&quot;&gt;this Math StackExchange question&lt;/a&gt; for details.&lt;/p&gt;

  &lt;div class=&quot;p&quot;&gt;Both Cartan structure equations still hold, but the connection
    and curvature forms are not skew symmetric anymore; instead,
    they&amp;rsquo;re &lt;em&gt;semi-skew symmetric&lt;/em&gt;. Suspending the summation convention,
    &lt;div class=&quot;important-equation&quot;&gt;
      \[
      \begin{aligned}
        \cnf{i}{j} &amp;= -ε_i ε_j \cnf{j}{i} \\
        \crf{i}{j} &amp;= -ε_i ε_j \crf{j}{i}\text{.}
      \end{aligned}
      \]
    &lt;/div&gt;
    Fortunately, this still implies that (again suspending the summation convention)
    \[
      \cnf{i}{i} = \crf{i}{i} = 0\text{.}
    \]
  &lt;/div&gt;
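
  &lt;p&gt;To spell out that last step (a one-line check, still suspending the
    summation convention): setting \(j = i\) in the semi-skew symmetry
    relations and using \(ε_i^2 = 1\),
    \[
      \cnf{i}{i} = -ε_i ε_i \cnf{i}{i} = -\cnf{i}{i}
      \quad \text{and} \quad
      \crf{i}{i} = -ε_i ε_i \crf{i}{i} = -\crf{i}{i}\text{,}
    \]
    so both must vanish. In the Lorentzian case, with \(ε_0 = -1\) and all
    other \(ε_i = +1\), the relations read \(\cnf{0}{i} = \cnf{i}{0}\) and
    \(\cnf{i}{j} = -\cnf{j}{i}\) for \(i, j ≥ 1\), and likewise for the
    curvature forms.&lt;/p&gt;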

  &lt;p&gt;The formula for the connection forms of a diagonal metric becomes
  (suspending the summation convention)
  \[
    \begin{aligned}
      \cnf{i}{j}
        &amp;= \frac{∂_j g_{ii}}{2 g_{ii} \sqrt{g_{jj}}} \, θ^i - ε_i ε_j \frac{∂_i g_{jj}}{2 g_{jj} \sqrt{g_{ii}}} \, θ^j \\
        &amp;= \frac{∂_j g_{ii}}{2 \sqrt{g_{ii} g_{jj}}} \, dx^i - ε_i ε_j \frac{∂_i g_{jj}}{2 \sqrt{g_{ii} g_{jj}}} \, dx^j\text{.}
    \end{aligned}
  \]
  However, none of the deduced properties of diagonal metrics
  depending on one or two coordinates change.&lt;/p&gt;

  &lt;p&gt;Finally, note that the relations
    \[
      (θ^i ∧ θ^j)(E_k, E_l) = \begin{cases}
        +1 &amp; k = i ≠ j = l \\
        -1 &amp; l = i ≠ j = k \\
        0 &amp; \text{otherwise.}
      \end{cases}
    \]
    still hold.&lt;/p&gt;

  &lt;p&gt;As you can tell, the moving frame method forces you to keep
    careful track of signs, which you may count as a disadvantage.&lt;/p&gt;

  &lt;div class=&quot;cheatsheet&quot;&gt;
    &lt;h2&gt;Cheatsheet: The moving frame method for semi-Riemannian metrics&lt;/h2&gt;

    &lt;div class=&quot;p&quot;&gt;Given the components \(g_{ij}\) of a semi-Riemannian metric:
      &lt;ol&gt;
        &lt;li&gt;Find an orthonormal dual frame, i.e. basis one-forms \((θ^1, \dotsc, θ^n)\) such that
          \[
          g = ∑_i ε_i \, θ^i ⊗ θ^i\text{,}
          \]
          where each \(ε_i\) is \(±1\) throughout the whole chart
          domain.  If the metric is diagonal, then (suspending the
          summation convention) \(ε_i = \sgn(g_{ii})\), and
          \[
            θ^i = ε_i \sqrt{\lvert g_{ii} \rvert} \, dx^i\text{.}
          \]&lt;/li&gt;
        &lt;li&gt;Use the first structure equation
          \[
            dθ^i = -\cnf{i}{j} ∧ θ^j
          \]
          and the semi-skew symmetry relations (suspending the summation convention)
          \[
            \cnf{i}{j} = -ε_i ε_j \cnf{j}{i}
          \]
          to deduce the connection forms \(\cnf{i}{j}\).&lt;/li&gt;
        &lt;li&gt;Compute the curvature forms using the second structure equation
          \[
            \crf{i}{j} = d\cnf{i}{j} + \cnf{i}{k} ∧ \cnf{k}{j}
          \]
          and the semi-skew symmetry relations (suspending the summation convention)
          \[
            \crf{i}{j} = -ε_i ε_j \crf{j}{i}\text{.}
          \]
          Note that it&amp;rsquo;s easiest to compute \(d\cnf{i}{j}\) when
          \(\cnf{i}{j}\) is expressed in terms of the \(dx^i\), since
          \(d(dx^i) = 0\).&lt;/li&gt;
        &lt;li&gt;Compute the components of the Ricci curvature tensor via
          \[
            \Ric{ij} = \crf{k}{i}(E_k, E_j)
          \]
          and the relations
          \[
            (θ^i ∧ θ^j)(E_k, E_l) = \begin{cases}
              +1 &amp; k = i ≠ j = l \\
              -1 &amp; l = i ≠ j = k \\
              0 &amp; \text{otherwise.}
            \end{cases}
          \]&lt;/li&gt;
      &lt;/ol&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;Example 2: The Schwarzschild metric&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Now we&amp;rsquo;re ready to tackle a more complicated metric. For
  our first semi-Riemannian example, let \(g\) be the &lt;a href=&quot;https://en.wikipedia.org/wiki/Schwarzschild_metric&quot;&gt;&lt;em&gt;Schwarzschild metric&lt;/em&gt;&lt;/a&gt;, with line element
  \[
    ds^2 = -f(r) \, dt^2 + f(r)^{-1} \, dr^2 + r^2 \, dΩ^2\text{,}
  \]
  where
  \[
    f(r) = 1 - \frac{r_S}{r}\text{,}
  \]
  \(r_S\) is the Schwarzschild radius, which is constant, and
  \[
    dΩ^2 = dθ^2 + \sin^2 θ \, dφ^2
  \]
  is the line element of the round metric \(\mathring{g}\) on the
  two-sphere. We want to show that this metric is &lt;em&gt;Ricci-flat&lt;/em&gt;,
  i.e. has vanishing Ricci curvature.&lt;/p&gt;

  &lt;p&gt;We can skip some steps by taking advantage of the metric being
    diagonal and depending only on the two coordinates \(r\) and \(θ\),
    but in the interest of showing the general method, we&amp;rsquo;ll do
    everything the &amp;ldquo;hard way&amp;rdquo;, and then
    double-check our results using the properties of diagonal
    metrics we deduced earlier.&lt;/p&gt;

  &lt;ol&gt;
    &lt;li&gt;
      &lt;h3&gt;Orthonormal dual frame&lt;/h3&gt;
      &lt;p&gt;Since the metric is diagonal, we can read off an orthonormal dual
        frame with its corresponding signature:
        \[
          ds^2 =
          \; \underbrace{-}_{ε_0} \;
          {\underbrace{\left(f(r)^{1/2} \, dt\right)}_{ϑ^0}}^2
          \; \underbrace{+}_{ε_1} \;
          {\underbrace{\left(f(r)^{-1/2} \, dr\right)}_{ϑ^1}}^2
          \; \underbrace{+}_{ε_2} \;
          {\underbrace{(r \, dθ)}_{ϑ^2}}^2
          \; \underbrace{+}_{ε_3} \;
          {\underbrace{(r \sin θ \, dφ)}_{ϑ^3}}^2\text{,}
        \]
        i.e.
        \[
        \begin{alignedat}{2}
          ϑ^0 &amp;= \, &amp; f(r)^{1/2}  &amp; \, dt \\
          ϑ^1 &amp;= \, &amp; f(r)^{-1/2} &amp; \, dr \\
          ϑ^2 &amp;= \, &amp; r           &amp; \, dθ \\
          ϑ^3 &amp;= \, &amp; r \sin θ    &amp; \, dφ
        \end{alignedat}
        \]
        with Lorentzian signature \(({-} \; {+} \; {+} \; {+})\). We can then
        express the coordinate forms in terms of the basis one-forms:
        \[
        \begin{alignedat}{2}
          dt &amp;= \, &amp; f(r)^{-1/2}   &amp; \, ϑ^0 \\
          dr &amp;= \, &amp; f(r)^{1/2}    &amp; \, ϑ^1 \\
          dθ &amp;= \, &amp; r^{-1}        &amp; \, ϑ^2 \\
          dφ &amp;= \, &amp; r^{-1} \csc θ &amp; \, ϑ^3\text{.}
        \end{alignedat}
        \]
        Note that since we&amp;rsquo;re using \(θ\) as a coordinate, we use \(ϑ^λ\) to
        denote the basis one-forms. Furthermore, since this metric is Lorentzian, we
        adopt the convention that the index of the first coordinate is \(0\),
        Greek indices start from \(0\), and Latin indices start from \(1\).&lt;/p&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;h3&gt;Connection forms&lt;/h3&gt;
      &lt;p&gt;The derivatives of the basis one-forms are
        \[
        \begin{alignedat}{2}
          dϑ^0 &amp;= \frac{1}{2}f(r)^{-1/2} f&apos;(r) \, dr ∧ dt &amp; &amp;= \frac{1}{2}f(r)^{-1/2} f&apos;(r) \, ϑ^1 ∧ ϑ^0 \\
          dϑ^1 &amp;= 0                                       &amp; &amp; \\
          dϑ^2 &amp;= dr ∧ dθ                                 &amp; &amp;= \frac{f(r)^{1/2}}{r} \, ϑ^1 ∧ ϑ^2 \\
          dϑ^3 &amp;= \sin θ \, dr ∧ dφ + r \cos θ \, dθ ∧ dφ &amp; &amp;= \frac{f(r)^{1/2}}{r} \, ϑ^1 ∧ ϑ^3 + \frac{\cot θ}{r} \, ϑ^2 ∧ ϑ^3\text{.}
        \end{alignedat}
        \]
        By semi-skew symmetry, since \(ε_0 = -1\) and \(ε_i = 1\), \(\cnf{0}{i} = \cnf{i}{0}\) and
        \(\cnf{i}{j} = -\cnf{j}{i}\). Therefore, we can explicitly write out the first structure equations:
        \[
        \begin{alignedat}{4}
          dϑ^0 &amp;=                  &amp; &amp;- \cnf{0}{1} ∧ ϑ^1 &amp; &amp;- \cnf{0}{2} ∧ ϑ^2 &amp; &amp;- \cnf{0}{3} ∧ ϑ^3 \\
          dϑ^1 &amp;= -\cnf{0}{1} ∧ ϑ^0 &amp; &amp;                   &amp; &amp;- \cnf{1}{2} ∧ ϑ^2 &amp; &amp;- \cnf{1}{3} ∧ ϑ^3 \\
          dϑ^2 &amp;= -\cnf{0}{2} ∧ ϑ^0 &amp; &amp;+ \cnf{1}{2} ∧ ϑ^1 &amp; &amp;                   &amp; &amp;- \cnf{2}{3} ∧ ϑ^3 \\
          dϑ^3 &amp;= -\cnf{0}{3} ∧ ϑ^0 &amp; &amp;+ \cnf{1}{3} ∧ ϑ^1 &amp; &amp;+ \cnf{2}{3} ∧ ϑ^2\text{,} &amp; &amp;
        \end{alignedat}
        \]
        and rewriting the derivative equations to match:
        \[
        \begin{alignedat}{3}
          dϑ^0 &amp;= &amp;
            \; -\overbrace{\left(\frac{1}{2}f(r)^{-1/2} f&apos;(r) \, ϑ^0\right)}^{\text{one term of $\cnf{0}{1}$}} &amp;∧ ϑ^1 &amp;
            &amp; \\
          dϑ^1 &amp;= 0 &amp;
            &amp; &amp;
            &amp; \\
          dϑ^2 &amp;= &amp;
            \overbrace{\left(-\frac{f(r)^{1/2}}{r} \, ϑ^2\right)}^{\text{one term of $\cnf{1}{2}$}} &amp;∧ ϑ^1 &amp;
            &amp; \\
          dϑ^3 &amp;= &amp;
            \underbrace{\left(-\frac{f(r)^{1/2}}{r} \, ϑ^3\right)}_{\text{one term of $\cnf{1}{3}$}} &amp;∧ ϑ^1 &amp;
            \; + \; \underbrace{\left( -\frac{\cot θ}{r} \, ϑ^3 \right)}_{\text{one term of $\cnf{2}{3}$}} &amp;∧ ϑ^2\text{,}
        \end{alignedat}
        \]
        we can guess that
        \[
        \begin{alignedat}{2}
          \cnf{0}{1} &amp;= \, &amp; \frac{1}{2} f(r)^{-1/2} f&apos;(r) &amp; \, ϑ^0 \\
          \cnf{1}{2} &amp;= \, &amp;         -\frac{f(r)^{1/2}}{r} &amp; \, ϑ^2 \\
          \cnf{1}{3} &amp;= \, &amp;         -\frac{f(r)^{1/2}}{r} &amp; \, ϑ^3 \\
          \cnf{2}{3} &amp;= \, &amp;             -\frac{\cot θ}{r} &amp; \, ϑ^3\text{.}
        \end{alignedat}
        \]
        Happily, plugging these expressions back into the first
        structure equations, we find that they hold. Therefore, by
        uniqueness of the connection forms, they are &lt;em&gt;the&lt;/em&gt; connection forms.&lt;/p&gt;
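
      &lt;p&gt;For instance (spelling out two of the four checks, with \(\cnf{0}{2} = \cnf{0}{3} = 0\) as in the guess above), the right-hand sides of the equations for \(dϑ^1\) and \(dϑ^3\) become
        \[
        \begin{aligned}
          -\cnf{0}{1} ∧ ϑ^0 - \cnf{1}{2} ∧ ϑ^2 - \cnf{1}{3} ∧ ϑ^3
            &amp;= -\frac{1}{2} f(r)^{-1/2} f&apos;(r) \, ϑ^0 ∧ ϑ^0 + \frac{f(r)^{1/2}}{r} \, ϑ^2 ∧ ϑ^2 + \frac{f(r)^{1/2}}{r} \, ϑ^3 ∧ ϑ^3 \\
            &amp;= 0 = dϑ^1 \\
          -\cnf{0}{3} ∧ ϑ^0 + \cnf{1}{3} ∧ ϑ^1 + \cnf{2}{3} ∧ ϑ^2
            &amp;= -\frac{f(r)^{1/2}}{r} \, ϑ^3 ∧ ϑ^1 - \frac{\cot θ}{r} \, ϑ^3 ∧ ϑ^2 \\
            &amp;= \frac{f(r)^{1/2}}{r} \, ϑ^1 ∧ ϑ^3 + \frac{\cot θ}{r} \, ϑ^2 ∧ ϑ^3 = dϑ^3\text{,}
        \end{aligned}
        \]
        and the other two follow in the same way.&lt;/p&gt;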

      &lt;p&gt;Rather than plugging our guess into the first structure equations, a
        slicker way to see that it works would be to split up the first
        structure equation thus:
        \[
          dϑ^λ = -∑_{λ \lt μ} \cnf{λ}{μ} ∧ ϑ^μ - ∑_{λ \gt μ} \cnf{λ}{μ} ∧ ϑ^μ\text{,}
        \]
        and notice that our derivative equations have the particularly simple form
        \[
          dϑ^λ =  ∑_{λ \lt μ} (f_μ \, ϑ^λ) ∧ ϑ^μ\text{,}
        \]
        so setting
        \[
          \cnf{λ}{μ} = -f_μ \, ϑ^λ \quad \text{for $λ \lt μ$}
        \]
        takes care of the left sum above. Then by semi-skew symmetry, if \(λ \gt μ\),
        \[
          \lvert \cnf{λ}{μ} ∧ ϑ^μ \rvert = \lvert \cnf{μ}{λ} ∧ ϑ^μ \rvert = \lvert (f_λ \, ϑ^μ) ∧ ϑ^μ \rvert = 0\text{.}
        \]
        Thus all terms in the right sum above vanish as required.&lt;/p&gt;

      &lt;p&gt;Then, expressing the connection forms in terms of both the basis one-forms and
        the coordinate forms,
        \[
        \begin{alignedat}{6}
          \cnf{0}{1} &amp;= &amp;     &amp;\cnf{1}{0} &amp; &amp;= \quad &amp; \frac{1}{2} f(r)^{-1/2} f&apos;(r) \, &amp;ϑ^0 &amp; \quad &amp;= \quad &amp; \frac{1}{2} f&apos;(r) \, &amp;dt \\
          \cnf{2}{1} &amp;= &amp; \; -&amp;\cnf{1}{2} &amp; &amp;= \quad &amp;          \frac{f(r)^{1/2}}{r} \, &amp;ϑ^2 &amp; \quad &amp;= \quad &amp;        f(r)^{1/2} \, &amp;dθ \\
          \cnf{3}{1} &amp;= &amp; \; -&amp;\cnf{1}{3} &amp; &amp;= \quad &amp;          \frac{f(r)^{1/2}}{r} \, &amp;ϑ^3 &amp; \quad &amp;= \quad &amp; f(r)^{1/2} \sin θ \, &amp;dφ \\
          \cnf{3}{2} &amp;= &amp; \; -&amp;\cnf{2}{3} &amp; &amp;= \quad &amp;              \frac{\cot θ}{r} \, &amp;ϑ^3 &amp; \quad &amp;= \quad &amp;            \cos θ \, &amp;dφ \text{.}
        \end{alignedat}
        \]&lt;/p&gt;

      &lt;p&gt;Note that \(\cnf{2}{1}\) has only one component instead of
      two; this is because \(g_{11}\) doesn&amp;rsquo;t depend on \(θ\). The other connection forms are either zero or have only one component, as expected for a diagonal metric depending on two coordinates.&lt;/p&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;h3&gt;Curvature forms&lt;/h3&gt;
      &lt;p&gt;Using the expressions for \(\cnf{μ}{ν}\) in terms of the coordinate
        one-forms, since \(d(dt) = d(dr) = d(dθ) = d(dφ) = 0\), the derivatives of the connection forms are:
        \[
        \begin{aligned}
          d \cnf{0}{1}
            &amp;= \frac{1}{2} f&apos;&apos;(r) \, dr ∧ dt \\
            &amp;= \frac{1}{2} f&apos;&apos;(r) \, ϑ^1 ∧ ϑ^0 \\
          d \cnf{2}{1}
            &amp;= \frac{1}{2} f(r)^{-1/2} f&apos;(r) \, dr ∧ dθ \\
            &amp;= \frac{f&apos;(r)}{2r} \, ϑ^1 ∧ ϑ^2 \\
          d \cnf{3}{1}
            &amp;= \frac{1}{2} f(r)^{-1/2} f&apos;(r) \sin θ \, dr ∧ dφ + f(r)^{1/2} \cos θ \, dθ ∧ dφ \\
            &amp;= \frac{f&apos;(r)}{2r} \, ϑ^1 ∧ ϑ^3 + \frac{f(r)^{1/2} \cot θ}{r^2} \, ϑ^2 ∧ ϑ^3 \\
          d \cnf{3}{2}
            &amp;= -\sin θ \, dθ ∧ dφ \\
            &amp;= -\frac{1}{r^2} \, ϑ^2 ∧ ϑ^3\text{.}
        \end{aligned}
        \]
        For \(\cnf{μ}{λ} ∧ \cnf{λ}{ν}\), recalling that one need only sum over \(λ ∉ \{ μ, ν \}\),
        the non-zero terms are
        \[
        \begin{alignedat}{3}
          \cnf{0}{λ} ∧ \cnf{λ}{2} &amp;= \cnf{0}{1} ∧ \cnf{1}{2} &amp; &amp;= \; &amp; -\frac{f&apos;(r)}{2r} \, &amp;ϑ^0 ∧ ϑ^2 \\
          \cnf{0}{λ} ∧ \cnf{λ}{3} &amp;= \cnf{0}{1} ∧ \cnf{1}{3} &amp; &amp;= \; &amp; -\frac{f&apos;(r)}{2r} \, &amp;ϑ^0 ∧ ϑ^3 \\
          \cnf{1}{λ} ∧ \cnf{λ}{3} &amp;= \cnf{1}{2} ∧ \cnf{2}{3} &amp; &amp;= \; &amp; \frac{f(r)^{1/2} \cot θ}{r^2} \, &amp;ϑ^2 ∧ ϑ^3 \\
          \cnf{2}{λ} ∧ \cnf{λ}{3} &amp;= \cnf{2}{1} ∧ \cnf{1}{3} &amp; &amp;= \; &amp; -\frac{f(r)}{r^2} \, &amp;ϑ^2 ∧ ϑ^3\text{.}
        \end{alignedat}
        \]
        Then we can compute the curvature forms:
        \[
        \begin{aligned}
          \crf{0}{1} &amp;= d\cnf{0}{1} = \frac{1}{2} f&apos;&apos;(r) \, ϑ^1 ∧ ϑ^0 \\
          \crf{0}{2} &amp;= \cnf{0}{λ} ∧ \cnf{λ}{2} = -\frac{f&apos;(r)}{2r} \, ϑ^0 ∧ ϑ^2 \\
          \crf{0}{3} &amp;= \cnf{0}{λ} ∧ \cnf{λ}{3} = -\frac{f&apos;(r)}{2r} \, ϑ^0 ∧ ϑ^3 \\
          \crf{1}{2} &amp;= d\cnf{1}{2} = -\frac{f&apos;(r)}{2r} \, ϑ^1 ∧ ϑ^2 \\
          \crf{1}{3} &amp;= d\cnf{1}{3} + \cnf{1}{λ} ∧ \cnf{λ}{3} \\
                     &amp;= -\frac{f&apos;(r)}{2r} \, ϑ^1 ∧ ϑ^3 - \frac{f(r)^{1/2} \cot θ}{r^2} \, ϑ^2 ∧ ϑ^3 + \frac{f(r)^{1/2} \cot θ}{r^2} \, ϑ^2 ∧ ϑ^3 \\
                     &amp;= -\frac{f&apos;(r)}{2r} \, ϑ^1 ∧ ϑ^3 \\
          \crf{2}{3} &amp;= d\cnf{2}{3} + \cnf{2}{λ} ∧ \cnf{λ}{3} \\
                     &amp;= \frac{1}{r^2} \, ϑ^2 ∧ ϑ^3 - \frac{f(r)}{r^2} \, ϑ^2 ∧ ϑ^3 \\
                     &amp;= \frac{1 - f(r)}{r^2} \, ϑ^2 ∧ ϑ^3\text{.}
        \end{aligned}
        \]
      Again by semi-skew symmetry, since \(ε_0 = -1\) and \(ε_i = 1\), \(\crf{0}{i} = \crf{i}{0}\) and
        \(\crf{i}{j} = -\crf{j}{i}\). Therefore,
        \[
        \begin{alignedat}{3}
          \crf{0}{1} &amp;= \; &amp;  \crf{1}{0} &amp;= \; &amp;   \frac{1}{2} f&apos;&apos;(r) \, &amp;ϑ^1 ∧ ϑ^0 \\
          \crf{0}{2} &amp;= \; &amp;  \crf{2}{0} &amp;= \; &amp;    -\frac{f&apos;(r)}{2r} \, &amp;ϑ^0 ∧ ϑ^2 \\
          \crf{0}{3} &amp;= \; &amp;  \crf{3}{0} &amp;= \; &amp;    -\frac{f&apos;(r)}{2r} \, &amp;ϑ^0 ∧ ϑ^3 \\
          \crf{1}{2} &amp;= \; &amp; -\crf{2}{1} &amp;= \; &amp;    -\frac{f&apos;(r)}{2r} \, &amp;ϑ^1 ∧ ϑ^2 \\
          \crf{1}{3} &amp;= \; &amp; -\crf{3}{1} &amp;= \; &amp;    -\frac{f&apos;(r)}{2r} \, &amp;ϑ^1 ∧ ϑ^3 \\
          \crf{2}{3} &amp;= \; &amp; -\crf{3}{2} &amp;= \; &amp; \frac{1 - f(r)}{r^2} \, &amp;ϑ^2 ∧ ϑ^3\text{.}
        \end{alignedat}
        \]
      &lt;/p&gt;
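      &lt;p&gt;As a sanity check on these expressions (an aside, assuming nothing beyond the formulas above): if \(r_S = 0\), then
        \[
          f(r) = 1\text{,} \qquad f&apos;(r) = f&apos;&apos;(r) = 0\text{,} \qquad 1 - f(r) = 0\text{,}
        \]
        so every \(\crf{μ}{ν}\) vanishes, as it should for flat Minkowski space written in spherical coordinates.&lt;/p&gt;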
    &lt;/li&gt;

    &lt;li&gt;
      &lt;h3&gt;Ricci curvature&lt;/h3&gt;
      &lt;p&gt;We can compute the Ricci tensor \(\Ric{μν}\) as
        \[
          \Ric{μν} = \Riem{λ}{μλν} = \crf{λ}{μ}(E_λ, E_ν)\text{,}
        \]
        where the \(E_λ\) comprise the dual frame to \(ϑ^λ\). From the relations
        \[
          (ϑ^μ ∧ ϑ^ν)(E_ρ, E_σ) = \begin{cases}
           +1 &amp; ρ = μ ≠ ν = σ \\
           -1 &amp; σ = μ ≠ ν = ρ \\
            0 &amp; \text{otherwise,}
          \end{cases}
        \]
        we can examine the expressions above and conclude that \(\crf{ρ}{σ}(E_μ, E_ν)\) is possibly non-zero only when
        \(\{ μ, ν \} = \{ ρ, σ \}\). Furthermore, examining the expression for \(\Ric{μν}\),
        we can further conclude that \(\Ric{μν}\) is zero
        when \(μ ≠ ν\). Therefore, it suffices to check \(\Ric{λλ}\). (One
        of the properties we deduced for a diagonal metric depending
        on two coordinates was that \(\Ric{}\) would be diagonal
        except for possibly \(\Ric{12}\), but since \(\cnf{1}{2}\) turned
        out to not have a \(ϑ^1\) term, that immediately leads to \(\Ric{12} = 0\).)&lt;/p&gt;

      &lt;p&gt;From the expressions above,
        \[
        \begin{aligned}
          \crf{0}{1}(E_0, E_1) &amp;= -\frac{1}{2} f&apos;&apos;(r) \\
          \crf{0}{2}(E_0, E_2) &amp;= \crf{0}{3}(E_0, E_3) = \crf{1}{2}(E_1, E_2) = \crf{1}{3}(E_1, E_3) = -\frac{f&apos;(r)}{2r} \\
          \crf{2}{3}(E_2, E_3) &amp;= \frac{1 - f(r)}{r^2}\text{,}
        \end{aligned}
        \]
        so using the skew symmetry of two-forms
        \[
          \crf{μ}{ν}(E_ρ, E_σ) = -\crf{μ}{ν}(E_σ, E_ρ)
        \]
        and the semi-skew symmetry of \(\crf{μ}{ν}\)
        \[
        \crf{0}{i} = \crf{i}{0} \quad \text{and} \quad \crf{i}{j} = -\crf{j}{i} \text{,}
        \]
        we can compute \(\Ric{λλ}\):
        \[
        \begin{aligned}
          \Ric{00} &amp;= \crf{1}{0}(E_1, E_0) + \crf{2}{0}(E_2, E_0) + \crf{3}{0}(E_3, E_0) \\
                   &amp;= -\crf{0}{1}(E_0, E_1) - \crf{0}{2}(E_0, E_2) - \crf{0}{3}(E_0, E_3) \\
                   &amp;= \frac{1}{2} f&apos;&apos;(r) + \frac{f&apos;(r)}{r} \\
          \Ric{11} &amp;= \crf{0}{1}(E_0, E_1) + \crf{2}{1}(E_2, E_1) + \crf{3}{1}(E_3, E_1) \\
                   &amp;= \crf{0}{1}(E_0, E_1) + \crf{1}{2}(E_1, E_2) + \crf{1}{3}(E_1, E_3) \\
                   &amp;= -\Ric{00} \\
          \Ric{22} &amp;= \crf{0}{2}(E_0, E_2) + \crf{1}{2}(E_1, E_2) + \crf{3}{2}(E_3, E_2) \\
                   &amp;= \crf{0}{2}(E_0, E_2) + \crf{1}{2}(E_1, E_2) + \crf{2}{3}(E_2, E_3) \\
                   &amp;= -\frac{f&apos;(r)}{r} + \frac{1 - f(r)}{r^2} \\
          \Ric{33} &amp;= \crf{0}{3}(E_0, E_3) + \crf{1}{3}(E_1, E_3) + \crf{2}{3}(E_2, E_3) \\
                   &amp;= \Ric{22}\text{.}
        \end{aligned}
        \]&lt;/p&gt;

      &lt;p&gt;Finally, a computation shows that for \(f(r) = 1 - \frac{r_S}{r}\),
        \[
          \frac{1 - f(r)}{r^2} = -\frac{1}{2} f&apos;&apos;(r) = \frac{f&apos;(r)}{r} \text{,}
        \]
        so all the Ricci tensor components above vanish.&lt;sup&gt;&lt;a href=&quot;#fn5&quot; id=&quot;r5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
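
      &lt;p&gt;Spelling out that computation: for \(f(r) = 1 - \frac{r_S}{r}\),
        \[
          f&apos;(r) = \frac{r_S}{r^2}\text{,} \qquad f&apos;&apos;(r) = -\frac{2 r_S}{r^3}\text{,} \qquad 1 - f(r) = \frac{r_S}{r}\text{,}
        \]
        so each of the three expressions equals \(r_S / r^3\), and
        \[
        \begin{aligned}
          \Ric{00} = -\Ric{11} &amp;= \frac{1}{2} f&apos;&apos;(r) + \frac{f&apos;(r)}{r} = -\frac{r_S}{r^3} + \frac{r_S}{r^3} = 0 \\
          \Ric{22} = \Ric{33} &amp;= -\frac{f&apos;(r)}{r} + \frac{1 - f(r)}{r^2} = -\frac{r_S}{r^3} + \frac{r_S}{r^3} = 0\text{.}
        \end{aligned}
        \]&lt;/p&gt;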
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;Example 3: The pp-wave metric&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;For our last example, to keep things interesting, let&amp;rsquo;s consider a non-diagonal metric. Let
  \[
  g = H(u, x, y) \, du ⊗ du + du ⊗ dv + dv ⊗ du + dx ⊗ dx + dy ⊗ dy
  \]
  be the &lt;a href=&quot;https://en.wikipedia.org/wiki/Pp-wave_spacetime&quot;&gt;&lt;em&gt;pp-wave metric&lt;/em&gt;&lt;/a&gt;,
  where \(H(u, x, y)\) is some smooth
  function. We want to derive a necessary and sufficient condition for \(g\) to be Ricci-flat.&lt;/p&gt;

  &lt;ol&gt;
    &lt;li&gt;
      &lt;h3&gt;Orthonormal dual frame&lt;/h3&gt;

      &lt;p&gt;This metric has the matrix
        \[
        G = \begin{pmatrix}
          H &amp; 1 &amp; 0 &amp; 0 \\
          1 &amp; 0 &amp; 0 &amp; 0 \\
          0 &amp; 0 &amp; 1 &amp; 0 \\
          0 &amp; 0 &amp; 0 &amp; 1
        \end{pmatrix}\text{,}
        \]
        which has inverse
        \[
        G^{-1} = \begin{pmatrix}
          0 &amp; 1 &amp; 0 &amp; 0 \\
          1 &amp; -H &amp; 0 &amp; 0 \\
          0 &amp; 0 &amp; 1 &amp; 0 \\
          0 &amp; 0 &amp; 0 &amp; 1
        \end{pmatrix}\text{,}
        \]
        so the dual metric is
        \[
          g^* = ∂_u ⊗ ∂_v + ∂_v ⊗ ∂_u - H(u, x, y) \, ∂_v ⊗ ∂_v + ∂_x ⊗ ∂_x + ∂_y ⊗ ∂_y\text{.}
        \]
        We can see that \(dx\) and \(dy\) form part of an orthonormal dual frame, but
        we have to find the other two, which involve \(du\) and \(dv\). First we
        have to figure out the signature of the metric.

        To that end, set
        \[
        \begin{aligned}
          θ^0 &amp;= A \, du + B \, dv \\
          θ^1 &amp;= C \, du + D \, dv \\
          θ^2 &amp;= dx \\
          θ^3 &amp;= dy\text{,}
        \end{aligned}
        \]
        and solve for \(A\), \(B\), \(C\), and \(D\) using the orthonormality
        conditions
        \[
        \begin{aligned}
          g^*(θ^0, θ^0) &amp;= 2AB - B^2 H = ε_0 \\
          g^*(θ^0, θ^1) &amp;= AD + BC - BDH = 0 \\
          g^*(θ^1, θ^1) &amp;= 2CD - D^2 H = ε_1\text{.}
        \end{aligned}
        \]
        The tricky thing is to pick the \(θ^μ\) without assuming that \(H\) is
        non-zero. The simplest way to do that is to assume that none of the
        coefficients of \(H\) vanish, and, since we have four unknowns (not
        counting \(ε_0\) and \(ε_1\)) and three equations, to set \(B = 1\).
        Then the first equation gives \(A = (ε_0 + H)/2\), the second equation
        gives \(C = D(H - A)\), and plugging everything into the third equation
        gives \(D^2 = -ε_1 / ε_0\), which implies that \(ε_1 = -ε_0\) and
        \(D = ±1\). Set \(ε_0 = -1\) to make the frame have a Lorentzian signature
        \(({-} \; {+} \; {+} \; {+})\), and let \(D = ε\). Then
        \[
        \begin{aligned}
          A &amp;= \frac{H - 1}{2} \\
          B &amp;= 1 \\
          C &amp;= ε\frac{H + 1}{2} \\
          D &amp;= ε\text{.}
        \end{aligned}
        \]
        Setting \(ε = 1\) for symmetry, we finally have
        \[
        \begin{aligned}
          θ^0 &amp;= \frac{H-1}{2} \, du + dv \\
          θ^1 &amp;= \frac{H+1}{2} \, du + dv = θ^0 + du \\
          θ^2 &amp;= dx \\
          θ^3 &amp;= dy
        \end{aligned}
        \]
        and
        \[
        \begin{aligned}
          du &amp;= θ^1 - θ^0 \\
          dx &amp;= θ^2 \\
          dy &amp;= θ^3\text{;}
        \end{aligned}
        \]
        it&amp;rsquo;ll turn out that we don&amp;rsquo;t need to express \(dv\) in terms of
        the \(θ^μ\).&lt;/p&gt;
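
      &lt;p&gt;As a check (not in the derivation above) that this frame reproduces the metric with signature \(({-} \; {+} \; {+} \; {+})\), write \(a = \frac{H-1}{2}\) and \(b = \frac{H+1}{2}\) (shorthand introduced only for this check), so that \(b - a = 1\) and \(b + a = H\). Then
        \[
        \begin{aligned}
          -θ^0 ⊗ θ^0 + θ^1 ⊗ θ^1
            &amp;= (b^2 - a^2) \, du ⊗ du + (b - a) (du ⊗ dv + dv ⊗ du) + (1 - 1) \, dv ⊗ dv \\
            &amp;= H \, du ⊗ du + du ⊗ dv + dv ⊗ du\text{,}
        \end{aligned}
        \]
        and adding \(θ^2 ⊗ θ^2 + θ^3 ⊗ θ^3 = dx ⊗ dx + dy ⊗ dy\) recovers \(g\).&lt;/p&gt;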
    &lt;/li&gt;

    &lt;li&gt;
      &lt;h3&gt;Connection forms&lt;/h3&gt;

      &lt;p&gt;Since
        \[
        \begin{aligned}
          θ^1 &amp;= θ^0 + du \\
          θ^2 &amp;= dx \\
          θ^3 &amp;= dy\text{,}
        \end{aligned}
        \]
        the derivatives of the basis one-forms are
        \[
        \begin{aligned}
          dθ^0 &amp;= dθ^1 = \frac{1}{2} (H_x \, dx + H_y \, dy) ∧ du \\
               &amp;= \frac{H_x}{2} \, θ^2 ∧ θ^1 - \frac{H_x}{2} \, θ^2 ∧ θ^0 + \frac{H_y}{2} \, θ^3 ∧ θ^1 - \frac{H_y}{2} \, θ^3 ∧ θ^0 \\
          dθ^2 &amp;= 0 \\
          dθ^3 &amp;= 0\text{.}
        \end{aligned}
        \]
      &lt;/p&gt;

      &lt;p&gt;Similarly to the Schwarzschild example, by semi-skew symmetry,
      since \(ε_0 = -1\) and \(ε_i = 1\), \(\cnf{0}{i} = \cnf{i}{0}\) and
        \(\cnf{i}{j} = -\cnf{j}{i}\). Therefore, we can explicitly
        write out the first structure equations:
        \[
        \begin{alignedat}{4}
          dθ^0 &amp;=                  &amp; &amp;- \cnf{0}{1} ∧ θ^1 &amp; &amp;- \cnf{0}{2} ∧ θ^2 &amp; &amp;- \cnf{0}{3} ∧ θ^3 \\
          dθ^1 &amp;= -\cnf{0}{1} ∧ θ^0 &amp; &amp;                   &amp; &amp;- \cnf{1}{2} ∧ θ^2 &amp; &amp;- \cnf{1}{3} ∧ θ^3 \\
          dθ^2 &amp;= -\cnf{0}{2} ∧ θ^0 &amp; &amp;+ \cnf{1}{2} ∧ θ^1 &amp; &amp;                   &amp; &amp;- \cnf{2}{3} ∧ θ^3 \\
          dθ^3 &amp;= -\cnf{0}{3} ∧ θ^0 &amp; &amp;+ \cnf{1}{3} ∧ θ^1 &amp; &amp;+ \cnf{2}{3} ∧ θ^2\text{.} &amp; &amp;
        \end{alignedat}
        \]
        However, unlike the Schwarzschild example, we can&amp;rsquo;t
        simply read off the non-zero connection forms; for example, it&amp;rsquo;s not immediately clear whether the \(\frac{H_x}{2} \, θ^2 ∧ θ^1\) term in \(dθ^0\) belongs
        to the \(\cnf{0}{1} ∧ θ^1\) term or the \(\cnf{0}{2} ∧ θ^2\) term. However, since \(dθ^0 = dθ^1\), we can guess that \(\cnf{0}{2} = \cnf{1}{2}\)
        and \(\cnf{0}{3} = \cnf{1}{3}\). Subtracting the first structure equations for \(dθ^1\) and \(dθ^0\), we get
        \[
          \cnf{0}{1} ∧ (θ^1 - θ^0) = 0\text{,}
        \]
        i.e. that \(\cnf{0}{1} ∼ θ^1 - θ^0\). However, plugging this
        into the first structure equation for \(dθ^0\) or \(dθ^1\), we get a \(θ^0
        ∧ θ^1\) term, which isn&amp;rsquo;t present in the derivative
        equation for \(dθ^0 = dθ^1\), which then implies that \(\cnf{0}{1} = 0\). Thus,
        there&amp;rsquo;s only one way to assign each term of the
        derivative equation for \(dθ^0 = dθ^1\) to \(\cnf{0}{2} ∧ θ^2\) or \(\cnf{0}{3} ∧ θ^3\): 
        \[
        \begin{aligned}
          \cnf{0}{2} &amp;= \cnf{1}{2} = \frac{H_x}{2} \, (θ^1 - θ^0) = \frac{H_x}{2} \, du \\
          \cnf{0}{3} &amp;= \cnf{1}{3} = \frac{H_y}{2} \, (θ^1 - θ^0) = \frac{H_y}{2} \, du\text{.}
        \end{aligned}
        \]
        (Indeed, \(-\cnf{0}{2} ∧ θ^2 = -\frac{H_x}{2} \, du ∧ dx = \frac{H_x}{2} \, dx ∧ du\), matching the \(H_x\) terms of the derivative equation, and similarly for \(\cnf{0}{3}\).)
        Plugging this into the structure equations for \(dθ^2\) and \(dθ^3\), we get
        \[
        \begin{aligned}
          dθ^2 &amp;= -\cnf{0}{2} ∧ θ^0 + \cnf{1}{2} ∧ θ^1 - \cnf{2}{3} ∧ θ^3 \\
               &amp;= \cnf{0}{2} ∧ du - \cnf{2}{3} ∧ θ^3 \\
               &amp;= \frac{H_x}{2} \, du ∧ du - \cnf{2}{3} ∧ θ^3 \\
               &amp;= -\cnf{2}{3} ∧ θ^3 \\
          dθ^3 &amp;= -\cnf{0}{3} ∧ θ^0 + \cnf{1}{3} ∧ θ^1 + \cnf{2}{3} ∧ θ^2 \\
               &amp;= \cnf{0}{3} ∧ du + \cnf{2}{3} ∧ θ^2 \\
               &amp;= \frac{H_y}{2} \, du ∧ du + \cnf{2}{3} ∧ θ^2 \\
               &amp;= \cnf{2}{3} ∧ θ^2\text{.}
        \end{aligned}
        \]
        Since \(dθ^2 = dθ^3 = 0\) from the derivative equations, \(\cnf{2}{3}\) is proportional to both \(θ^2\) and \(θ^3\), i.e. \(\cnf{2}{3} = 0\). We&amp;rsquo;ve found expressions for \(\cnf{μ}{ν}\) that
        satisfy the first structure equations. Therefore, by
        uniqueness of the connection forms, these expressions are &lt;em&gt;the&lt;/em&gt; connection forms. Then, expressing the connection forms in terms of both the basis one-forms and
        the coordinate forms,
        \[
        \begin{aligned}
          \cnf{0}{2} &amp;= \cnf{2}{0} = \cnf{1}{2} = -\cnf{2}{1} = \frac{H_x}{2} \, (θ^1 - θ^0) = \frac{H_x}{2} \, du \\
          \cnf{0}{3} &amp;= \cnf{3}{0} = \cnf{1}{3} = -\cnf{3}{1} = \frac{H_y}{2} \, (θ^1 - θ^0) = \frac{H_y}{2} \, du\text{.}
        \end{aligned}
        \]&lt;/p&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;h3&gt;Curvature forms&lt;/h3&gt;

      &lt;p&gt;Using the expressions for \(\cnf{μ}{ν}\) in terms of the coordinate
        one-forms, since \(d(du) = 0\), the derivative of \(\cnf{0}{2} = \cnf{1}{2}\) is
        \[
        \begin{aligned}
          d\cnf{0}{2} &amp;= d\cnf{1}{2} = \frac{1}{2} \, dH_x ∧ du \\
                      &amp;= \frac{1}{2} (H_{xx} \, dx + H_{xy} \, dy) ∧ du \\
                      &amp;= \frac{1}{2} (H_{xx} \, θ^2 + H_{xy} \, θ^3) ∧ (θ^1 - θ^0)
        \end{aligned}
        \]
        and similarly the derivative of \(\cnf{0}{3} = \cnf{1}{3}\) is
        \[
          d\cnf{0}{3} = d\cnf{1}{3} = \frac{1}{2} (H_{xy} \, θ^2 + H_{yy} \, θ^3) ∧ (θ^1 - θ^0)\text{.}
        \]
        Since
        all the connection forms are proportional to \(du\), all possible
        sums \(\cnf{μ}{λ} ∧ \cnf{λ}{ν}\) equal \(0\). Then we can compute the
        curvature forms:
        \[
        \begin{aligned}
          \crf{0}{2} &amp;= \crf{1}{2} = \frac{1}{2} (H_{xx} \, θ^2 + H_{xy} \, θ^3) ∧ (θ^1 - θ^0) \\
          \crf{0}{3} &amp;= \crf{1}{3} = \frac{1}{2} (H_{xy} \, θ^2 + H_{yy} \, θ^3) ∧ (θ^1 - θ^0)\text{.}
        \end{aligned}
        \]
      Again by semi-skew symmetry, since \(ε_0 = -1\) and \(ε_i = 1\), \(\crf{0}{i} = \crf{i}{0}\) and
        \(\crf{i}{j} = -\crf{j}{i}\). Therefore,
        \[
        \begin{aligned}
          \crf{0}{2} &amp;= \crf{2}{0} = \crf{1}{2} = -\crf{2}{1} = \frac{1}{2} (H_{xx} \, θ^2 + H_{xy} \, θ^3) ∧ (θ^1 - θ^0) \\
          \crf{0}{3} &amp;= \crf{3}{0} = \crf{1}{3} = -\crf{3}{1} = \frac{1}{2} (H_{xy} \, θ^2 + H_{yy} \, θ^3) ∧ (θ^1 - θ^0)\text{.}
        \end{aligned}
        \]
      &lt;/p&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;h3&gt;Ricci curvature&lt;/h3&gt;

      &lt;p&gt;We can compute the Ricci tensor \(\Ric{μν}\) as
        \[
          \Ric{μν} = \Riem{λ}{μλν} = \crf{λ}{μ}(E_λ, E_ν)\text{,}
        \]
        where the \(E_λ\) comprise the dual frame to \(θ^λ\). First, using the relations
        \[
          (θ^μ ∧ θ^ν)(E_ρ, E_σ) = \begin{cases}
           +1 &amp; ρ = μ ≠ ν = σ \\
           -1 &amp; σ = μ ≠ ν = ρ \\
            0 &amp; \text{otherwise,}
          \end{cases}
        \]
        we compute
        \[
          \Ric{0ν} = \crf{λ}{0}(E_λ, E_ν) = \crf{0}{λ}(E_λ, E_ν) = \crf{0}{2}(E_2, E_ν) + \crf{0}{3}(E_3, E_ν)
        \]
        and see that it&amp;rsquo;s only non-zero for \(ν ∈ \{ 0, 1 \}\); furthermore, \(\Ric{01} = -\Ric{00}\). Similarly,
        \[
          \Ric{1ν} = \crf{λ}{1}(E_λ, E_ν) = -\crf{1}{λ}(E_λ, E_ν) = -\crf{0}{λ}(E_λ, E_ν) = -\Ric{0ν}\text{.}
        \]
        For the last two, we can save some effort by calculating \((θ^1 - θ^0)(E_0 + E_1) = 0\), which, since \(θ^2\) and \(θ^3\) also vanish on \(E_0 + E_1\), implies that for \(μ ∈ \{ 2, 3 \}\)
        \[
          (θ^μ ∧ (θ^1 - θ^0))(E_ν, E_0 + E_1) = 0\text{.}
        \]
        Then, using skew symmetry of two-forms
        \[
          \crf{μ}{ν}(E_ρ, E_σ) = -\crf{μ}{ν}(E_σ, E_ρ)\text{,}
        \]
        we compute
        \[
          \Ric{2ν} = \crf{λ}{2}(E_λ, E_ν) = -\crf{λ}{2}(E_ν, E_λ) = -\crf{0}{2}(E_ν, E_0) - \crf{1}{2}(E_ν, E_1) = -\crf{0}{2}(E_ν, E_0 + E_1) = 0
        \]
        and
        \[
          \Ric{3ν} = \crf{λ}{3}(E_λ, E_ν) = -\crf{λ}{3}(E_ν, E_λ) = -\crf{0}{3}(E_ν, E_0) - \crf{1}{3}(E_ν, E_1) = -\crf{0}{3}(E_ν, E_0 + E_1) = 0\text{,}
        \]
        so it suffices to compute \(\Ric{00}\):
        \[
        \begin{aligned}
          \Ric{00} &amp;= \crf{0}{2}(E_2, E_0) + \crf{0}{3}(E_3, E_0) \\
                   &amp;= -\frac{1}{2} (H_{xx} + H_{yy})\text{.}
        \end{aligned}
        \]
        Finally, we can conclude that the pp-wave metric is Ricci flat
        exactly when
        \[
          H_{xx} + H_{yy} = 0\text{.}
        \]&lt;/p&gt;
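      &lt;p&gt;As a check on the last step, here is the evaluation of \(\Ric{00}\) written out term by term. It assumes the usual conventions \((α ∧ β)(X, Y) = α(X) \, β(Y) - α(Y) \, β(X)\) and \(θ^μ(E_ν) = δ^μ_ν\), so that \((θ^2 ∧ (θ^1 - θ^0))(E_2, E_0) = (θ^3 ∧ (θ^1 - θ^0))(E_3, E_0) = -1\), while the cross terms \((θ^3 ∧ (θ^1 - θ^0))(E_2, E_0)\) and \((θ^2 ∧ (θ^1 - θ^0))(E_3, E_0)\) vanish:
        \[
        \begin{aligned}
          \crf{0}{2}(E_2, E_0) &amp;= -\frac{1}{2} \left( H_{xx} \cdot (-1) + H_{xy} \cdot 0 \right) = \frac{1}{2} H_{xx} \\
          \crf{0}{3}(E_3, E_0) &amp;= -\frac{1}{2} \left( H_{xy} \cdot 0 + H_{yy} \cdot (-1) \right) = \frac{1}{2} H_{yy}\text{.}
        \end{aligned}
        \]
        Adding the two lines gives the stated value of \(\Ric{00}\).&lt;/p&gt;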
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/section&gt;

&lt;section&gt;
  &lt;header&gt;
    &lt;h2&gt;Further reading&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p&gt;The classic reference for the method of moving frames is
  Volume&amp;nbsp;2, Chapter&amp;nbsp;7 of Spivak&amp;rsquo;s &amp;ldquo;A
  Comprehensive Introduction to Differential Geometry&amp;rdquo;. However,
  this only covers the Riemannian case. For the semi-Riemannian case,
  look to §&amp;nbsp;1.8 of O&amp;rsquo;Neill&amp;rsquo;s &amp;ldquo;The Geometry of
  Kerr Black Holes&amp;rdquo;, or §&amp;nbsp;14.6 of &lt;a href=&quot;https://en.wikipedia.org/wiki/Gravitation_(book)&quot;&gt;Gravitation&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;

&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] A metric can only be diagonal &lt;em&gt;with respect to a
  particular coordinate system&lt;/em&gt;, but for brevity I&amp;rsquo;ll only mention it here. &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn2&quot;&gt;[2] See p.&amp;nbsp;52 of &lt;em&gt;The Geometry of Kerr Black Holes&lt;/em&gt; by Barrett O&amp;rsquo;Neill. &lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn3&quot;&gt;[3] The paper &lt;a href=&quot;https://arxiv.org/abs/gr-qc/9602015&quot;&gt;&amp;ldquo;Ricci Tensor of Diagonal Metric&amp;rdquo;&lt;/a&gt; has
  a similar discussion using coordinate methods; note that the
  calculations are much more laborious! &lt;a href=&quot;#r3&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn4&quot;&gt;[4] One subtle technical point is that there might not be such an expression for \(g\) throughout the whole chart domain; see &lt;a href=&quot;https://math.stackexchange.com/q/2625887/343314&quot;&gt;this Math StackExchange question&lt;/a&gt; for
    details. In practice, though, this doesn&amp;rsquo;t turn out to be a
    problem. &lt;a href=&quot;#r4&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn5&quot;&gt;[5] The Schwarzschild metric describes the field outside
    a spherically symmetric and non-rotating massive body.
    If we let \(f(r)\) have an \(r^{-2}\) term, e.g.
    \[
      f(r) = 1 - \frac{r_S}{r} + \frac{r_Q^2}{r^2}
    \]
    for some constant \(r_Q\), then we have non-vanishing Ricci
    components. However, this metric, called the &lt;a href=&quot;https://en.wikipedia.org/wiki/Reissner%E2%80%93Nordstr%C3%B6m_metric&quot;&gt;Reissner–Nordström metric&lt;/a&gt;,
    is still useful, as it describes a &lt;em&gt;charged&lt;/em&gt;, spherically
    symmetric, non-rotating massive body. &lt;a href=&quot;#r5&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/intro-erasure-codes</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/intro-erasure-codes"/>
    <title>A Gentle Introduction to Erasure Codes</title>
    <updated>2017-11-30T00:00:00-08:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;script src=&quot;https://unpkg.com/preact@8.2.7&quot;&gt;&lt;/script&gt;

&lt;script src=&quot;https://cdn.jsdelivr.net/gh/akalin/jsbn@v1.4/jsbn.js&quot;&gt;&lt;/script&gt;
&lt;script src=&quot;https://cdn.jsdelivr.net/gh/akalin/jsbn@v1.4/jsbn2.js&quot;&gt;&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/arithmetic.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/math.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/carryless.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/field_257.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/field_256.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/rational.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/matrix.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/cauchy_erasure_code.js&quot;&gt;&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/demo_common.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/carryless_demo_common.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/matrix_demo_common.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/erasure_code_demo_common.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/carryless_div_demo.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/row_reduce.js&quot;&gt;&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/carryless_add_demo.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/carryless_mul_demo.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/carryless_div_demo_util.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/field_257_demo.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/field_256_demo.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/cauchy_matrix_demo.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/matrix_inverse_demo.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/compute_parity_demo.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/intro-erasure-codes@8d5e10f/reconstruct_data_demo.js&quot;&gt;&lt;/script&gt;

&lt;script&gt;
KaTeXMacros = {
  &quot;\\clplus&quot;: &quot;\\oplus&quot;,
  &quot;\\clminus&quot;: &quot;\\ominus&quot;,
  &quot;\\clmul&quot;: &quot;\\otimes&quot;,
  &quot;\\cldiv&quot;: &quot;\\oslash&quot;,
  &quot;\\bclmod&quot;: &quot;\\mathbin{\\mathrm{clmod}}&quot;,
};
&lt;/script&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;1. Overview&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;This article explains Reed-Solomon erasure codes and the problems
  they solve in gory detail, with the aim of providing enough
  background to understand how the &lt;a href=&quot;https://en.wikipedia.org/wiki/Parchive&quot;&gt;PAR1
  and PAR2&lt;/a&gt; file formats work, the details of which will be covered in
  future articles.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m assuming that the reader is familiar with programming,
  but has not had much exposure to coding theory or linear
  algebra. Thus, I&amp;rsquo;ll review the basics and treat the results we
  need as a &amp;ldquo;black box&amp;rdquo;, stating them and moving
  on. However, I&amp;rsquo;ll give self-contained proofs of those results
  in a companion article.&lt;/p&gt;

&lt;p&gt;So let&amp;rsquo;s start with the problem we&amp;rsquo;re trying to
  solve! Let&amp;rsquo;s say you have \(n\) files of roughly the
  same size, and you want to guard against \(m\) of them being
  lost or corrupted. To do so, you generate \(m\)
  &lt;em&gt;parity files&lt;/em&gt;
  ahead of time, and if in the future you lose up to \(m\) of the data
  files, you can use an equal number of parity files to recover the
  lost data files.&lt;/p&gt;

&lt;style&gt;
.fig {
  display: flex;
  flex-flow: row;
  width: 100%;
}

.fig img {
  border: 1px solid black;
  height: auto;
}

.fig div.column {
  display: flex;
  align-items: center;
  flex-flow: column;
  flex-grow: 1;
  justify-content: center;
}

#fig1 div.column &gt; div {
  margin: 0.5em;
}

#fig1 img {
  width: 9.375em;
}

#fig2 img {
  margin: 0.5em 0em;
  width: 6.25em;
}
&lt;/style&gt;

&lt;figure&gt;
  &lt;div class=&quot;fig&quot; id=&quot;fig1&quot;&gt;
    &lt;div class=&quot;column&quot;&gt;
      &lt;div&gt;
        &lt;div&gt;&lt;code&gt;cashcat0.jpg&lt;/code&gt;&lt;/div&gt;
        &lt;img src=&quot;intro-erasure-codes-files/cashcat0.jpg&quot; /&gt;
      &lt;/div&gt;
      &lt;div&gt;
        &lt;div&gt;&lt;code&gt;cashcat1.jpg&lt;/code&gt;&lt;/div&gt;
        &lt;img src=&quot;intro-erasure-codes-files/cashcat1.jpg&quot; /&gt;
      &lt;/div&gt;
      &lt;div&gt;
        &lt;div&gt;&lt;code&gt;cashcat2.jpg&lt;/code&gt;&lt;/div&gt;
        &lt;img src=&quot;intro-erasure-codes-files/cashcat2.jpg&quot; /&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class=&quot;column&quot;&gt;
      &lt;div&gt;\(\xmapsto{\mathtt{GenerateParityFiles}}\)&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class=&quot;column&quot;&gt;
      &lt;div&gt;
        &lt;div&gt;&lt;code&gt;cashcats.p00&lt;/code&gt;&lt;/div&gt;
        &lt;img src=&quot;intro-erasure-codes-files/cashcats.p00.png&quot; /&gt;
      &lt;/div&gt;
      &lt;div&gt;
        &lt;div&gt;&lt;code&gt;cashcats.p01&lt;/code&gt;&lt;/div&gt;
        &lt;img src=&quot;intro-erasure-codes-files/cashcats.p01.jpg&quot; /&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;

  &lt;figcaption&gt;
    &lt;span class=&quot;figure-text&quot;&gt;Figure 1&lt;/span&gt;&amp;ensp; Using
    parity codes to protect against the loss or corruption of
    up to two images (out of three) of &lt;a href=&quot;https://twitter.com/CatsAndMoney&quot;&gt;cashcats&lt;/a&gt;.
  &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;figure&gt;
  &lt;div class=&quot;fig&quot; id=&quot;fig2&quot;&gt;
    &lt;div class=&quot;column&quot;&gt;
      &lt;img src=&quot;intro-erasure-codes-files/cashcat0-glitched.png&quot; /&gt;
      &lt;img src=&quot;intro-erasure-codes-files/cashcat1.jpg&quot; /&gt;
      &lt;img src=&quot;intro-erasure-codes-files/broken-image.png&quot; /&gt;
      &lt;img src=&quot;intro-erasure-codes-files/cashcats.p00.png&quot; /&gt;
      &lt;img src=&quot;intro-erasure-codes-files/cashcats.p01.jpg&quot; /&gt;
    &lt;/div&gt;
    &lt;div class=&quot;column&quot;&gt;
      &lt;div&gt;\(\xmapsto{\mathtt{ReconstructDataFiles}}\)&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class=&quot;column&quot;&gt;
      &lt;img src=&quot;intro-erasure-codes-files/cashcat0.jpg&quot; /&gt;
      &lt;img src=&quot;intro-erasure-codes-files/cashcat1.jpg&quot; /&gt;
      &lt;img src=&quot;intro-erasure-codes-files/cashcat2.jpg&quot; /&gt;
    &lt;/div&gt;
  &lt;/div&gt;

  &lt;figcaption&gt;
    &lt;span class=&quot;figure-text&quot;&gt;Figure 2&lt;/span&gt;&amp;ensp; With a
    corrupted and a missing file, recovering the original
    cashcat images using the parity files from Figure&amp;nbsp;1.
  &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Note that this works even if you lose some of the parity files
  also; as long as you have \(n\) files, whether they be data or
  parity files, you&amp;rsquo;ll be able to recover the original \(n\)
  data files. Compare making \(n\) parity files with simply making a
  copy of the \(n\) data files (for \(n &gt; 1\)). In the latter case, if
  you lose both a data file and its copy, that data file becomes
  unrecoverable! So parity files take the same amount of space but
  provide superior recovery capabilities.&lt;/p&gt;

&lt;p&gt;Now we can reduce the problem above to a byte-level problem
  as follows. Have &lt;code&gt;GenerateParityFiles&lt;/code&gt; pad all the
  data files so they&amp;rsquo;re the same size, and then for each
  byte position &lt;code&gt;i&lt;/code&gt; call a function &lt;code&gt;ComputeParityBytes&lt;/code&gt; on the &lt;code&gt;i&lt;/code&gt;th
  byte of each data file, and store the results into the &lt;code&gt;i&lt;/code&gt;th
  byte of each parity file. Also take a checksum or hash of
  each data file and store those (along with the original data
  file sizes) with the parity files. Then, &lt;code&gt;ReconstructDataFiles&lt;/code&gt;
  can detect corrupted files using the checksums/hashes and
  treat them as missing, and then for each byte position &lt;code&gt;i&lt;/code&gt; it
  can call a function &lt;code&gt;ReconstructDataBytes&lt;/code&gt; on the &lt;code&gt;i&lt;/code&gt;th
  byte of each good data and parity file to recover the &lt;code&gt;i&lt;/code&gt;th byte of the corrupted/missing data files.&lt;/p&gt;
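&lt;p&gt;The parity-generation half of this reduction can be sketched as follows. This is only an illustration under stated assumptions: the name &lt;code&gt;generateParityBuffers&lt;/code&gt; is hypothetical, the per-byte function is passed in as a parameter rather than fixed, files are modeled as in-memory &lt;code&gt;Uint8Array&lt;/code&gt;s padded with zeros, at least one data file is assumed, and the checksum step is left out.&lt;/p&gt;

```javascript
// Hypothetical sketch: apply a per-byte parity function column by column.
// computeParityBytes(column, m) must return an array of m parity bytes.
function generateParityBuffers(dataFiles, m, computeParityBytes) {
  const size = Math.max(...dataFiles.map((file) => file.length));

  // Pad every data file with zeros so they all have the same size.
  const padded = dataFiles.map((file) => {
    const buffer = new Uint8Array(size);
    buffer.set(file);
    return buffer;
  });

  const parityFiles = Array.from({ length: m }, () => new Uint8Array(size));
  for (const i of Array(size).keys()) {
    // The i-th byte of each data file goes in; the i-th byte of each
    // parity file comes out.
    const column = padded.map((buffer) => buffer[i]);
    computeParityBytes(column, m).forEach((byte, r) => {
      parityFiles[r][i] = byte;
    });
  }

  // The original sizes are kept so that the padding can be removed later.
  return { parityFiles, sizes: dataFiles.map((file) => file.length) };
}

// Usage with a stand-in per-byte function (xor of the column, so m = 1).
const xorColumn = (column) => [column.reduce((acc, byte) => acc ^ byte, 0)];
const generated = generateParityBuffers(
  [Uint8Array.of(1, 2, 3), Uint8Array.of(4, 5)], 1, xorColumn);
console.log(Array.from(generated.parityFiles[0]), generated.sizes);
```

&lt;p&gt;Returning the original sizes alongside the parity buffers is one way to let a reconstruction step strip the padding again.&lt;/p&gt;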

&lt;p&gt;A byte error where we &lt;em&gt;know&lt;/em&gt; the position of the
  dropped/corrupted byte is called an &lt;em&gt;erasure&lt;/em&gt;. Then, the pair
  of functions &lt;code&gt;ComputeParityBytes&lt;/code&gt; and &lt;code&gt;ReconstructDataBytes&lt;/code&gt; which
  behave as described above implements what is called an &lt;a href=&quot;https://en.wikipedia.org/wiki/Erasure_code#Optimal_erasure_codes&quot;&gt;&lt;em&gt;optimal erasure code&lt;/em&gt;&lt;/a&gt;;
  it&amp;rsquo;s an erasure code because it guards only against byte
  erasures, and not more general errors where we don&amp;rsquo;t know
  which data bytes have been corrupted, and it&amp;rsquo;s optimal because
  in general you need at least \(n\) known bytes to recover the \(n\)
  data bytes, and that bound is achieved.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;In detail, an optimal erasure code is composed of some
  set of possible \((n, m)\) pairs, and for each possible pair, a
  function

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;ComputeParityBytes&amp;lt;n, m&amp;gt;(data: byte[n]) -&gt; (parity: byte[m])&lt;/code&gt;&lt;/pre&gt;

  that computes \(m\) parity bytes given \(n\) data bytes, and a
  function

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;ReconstructDataBytes&amp;lt;n, m&amp;gt;(partialData: (byte?)[n], partialParity: (byte?)[m]) -&gt; ((data: byte[n]) | Error)&lt;/code&gt;&lt;/pre&gt;

  that takes in a partial list of data and parity bytes from an
  earlier call to &lt;code&gt;ComputeParityBytes&lt;/code&gt;, and returns the full
  list of data bytes if there are at least \(n\) known data or parity
  bytes (i.e., there are no more than \(m\) omitted data or parity
  bytes), and an error otherwise.&lt;/div&gt;

&lt;p&gt;(In the above pseudocode, I&amp;rsquo;m using &lt;code&gt;T[n]&lt;/code&gt; to mean an array of &lt;code&gt;n&lt;/code&gt; objects of type &lt;code&gt;T&lt;/code&gt;,
  and &lt;code&gt;byte?&lt;/code&gt; to mean &lt;code&gt;byte | None&lt;/code&gt;. Also, I&amp;rsquo;ll omit the &lt;code&gt;-Bytes&amp;lt;n, m&amp;gt;&lt;/code&gt; suffix
  from now on.)&lt;/p&gt;

&lt;p&gt;By the end of this article, we&amp;rsquo;ll find out exactly how the
  following example works:&lt;/p&gt;

&lt;div class=&quot;interactive-example&quot;&gt;
  &lt;h3&gt;Example 1: &lt;code&gt;ComputeParity&lt;/code&gt; and &lt;code&gt;ReconstructData&lt;/code&gt;&lt;/h3&gt;

  &lt;div class=&quot;interactive-example&quot; id=&quot;computeParityDemo&quot;&gt;
    &lt;h3&gt;&lt;code&gt;ComputeParity&lt;/code&gt;&lt;/h3&gt;
    Let

    &lt;span style=&quot;white-space: nowrap;&quot;&gt;
      &lt;var&gt;d&lt;/var&gt; = [ da, db, 0d ]
    &lt;/span&gt;

    be the input data bytes and let

    &lt;span style=&quot;white-space: nowrap;&quot;&gt;
      &lt;var&gt;m&lt;/var&gt; = 2
    &lt;/span&gt;

    be the desired parity byte count. Then the output parity bytes
    are

    &lt;span style=&quot;white-space: nowrap;&quot;&gt;
      &lt;var&gt;p&lt;/var&gt; = [ &lt;span class=&quot;result&quot;&gt;52&lt;/span&gt;, &lt;span class=&quot;result&quot;&gt;0c&lt;/span&gt; ].
    &lt;/span&gt;
  &lt;/div&gt;
  &lt;script&gt;
  &apos;use strict&apos;;
  (function() {
    const { h, render } = window.preact;
    const root = document.getElementById(&apos;computeParityDemo&apos;);
    render(h(ComputeParityDemo, {
      initialD: &apos;da, db, 0d&apos;, initialM: &apos;2&apos;,
      name: &apos;computeParityDemo&apos;,
      detailed: false,
      header: h(&apos;h3&apos;, {}, h(&apos;code&apos;, {}, &apos;ComputeParity&apos;)),
      containerClass: &apos;interactive-example&apos;,
      inputClass: &apos;parameter&apos;,
      resultColor: &apos;#268bd2&apos;, // solarized blue
    }), root.parent, root);
  })();
  &lt;/script&gt;

  &lt;br /&gt;

  &lt;div class=&quot;interactive-example&quot; id=&quot;reconstructDataDemo&quot;&gt;
    Let

    &lt;span style=&quot;white-space: nowrap;&quot;&gt;
      &lt;var&gt;d&lt;/var&gt;&lt;sub&gt;partial&lt;/sub&gt; = [ ??, db, ?? ]
    &lt;/span&gt;

    be the input partial data bytes and

    &lt;span style=&quot;white-space: nowrap;&quot;&gt;
      &lt;var&gt;p&lt;/var&gt;&lt;sub&gt;partial&lt;/sub&gt; = [ 52, 0c ]
    &lt;/span&gt;

    be the input partial parity bytes. Then the output data bytes are

    &lt;span style=&quot;white-space: nowrap;&quot;&gt;
      &lt;var&gt;d&lt;/var&gt; = [ &lt;span class=&quot;result&quot;&gt;da&lt;/span&gt;, &lt;span class=&quot;result&quot;&gt;db&lt;/span&gt;, &lt;span class=&quot;result&quot;&gt;0d&lt;/span&gt; ].
    &lt;/span&gt;
  &lt;/div&gt;
  &lt;script&gt;
  &apos;use strict&apos;;
  (function() {
    const { h, render } = window.preact;
    const root = document.getElementById(&apos;reconstructDataDemo&apos;);
    render(h(ReconstructDataDemo, {
      initialPartialD: &apos;??, db, ??&apos;, initialPartialP: &apos;52, 0c&apos;,
      name: &apos;reconstructDataDemo&apos;,
      detailed: false,
      header: h(&apos;h3&apos;, {}, h(&apos;code&apos;, {}, &apos;ReconstructData&apos;)),
      containerClass: &apos;interactive-example&apos;,
      inputClass: &apos;parameter&apos;,
      resultColor: &apos;#268bd2&apos;, // solarized blue
    }), root.parent, root);
  })();
  &lt;/script&gt;
&lt;/div&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;2. Erasure codes for \(m = 1\)&lt;/h2&gt;
&lt;/header&gt;

&lt;div class=&quot;p&quot;&gt;The simplest erasure codes are when \(m = 1\). For example, define

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;ComputeParitySum(data: byte[n]) {
  return [data[0] + &amp;hellip; + data[n-1]]
}&lt;/code&gt;&lt;/pre&gt;

  where we consider &lt;code&gt;byte&lt;/code&gt; to be an unsigned type such that
  addition and subtraction wrap around, i.e. byte arithmetic is done
  modulo \(256\). Then also define

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;ReconstructDataSum(partialData: (byte?)[n], partialParity: (byte?)[1]) {
  if &lt;em&gt;there is more than one entry of partialData or partialParity set to None&lt;/em&gt; {
    return Error
  } else if &lt;em&gt;partialData has no entry set to None&lt;/em&gt; {
    return partialData
  }

  i := partialData.firstIndexOf(None)
  partialSum := partialData[0] + &amp;hellip; + partialData[i-1] + partialData[i+1] + &amp;hellip; + partialData[n-1]
  return partialData[0:i] ++ [partialParity[0] - partialSum] ++ partialData[i+1:n]
}&lt;/code&gt;&lt;/pre&gt;

  where &lt;code&gt;a[i:j]&lt;/code&gt; means the subarray of &lt;code&gt;a&lt;/code&gt; starting at &lt;code&gt;i&lt;/code&gt; and
  ending (without inclusion) at &lt;code&gt;j&lt;/code&gt;, and &lt;code&gt;++&lt;/code&gt; is array concatenation.&lt;/div&gt;

&lt;p&gt;This simple erasure code uses the fact that if you have the sum of
  a list of numbers, then you can recover a missing number by
  subtracting the sum of the other numbers from the total sum, and
  also that this works even if you do the arithmetic modulo \(256\).&lt;/p&gt;
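&lt;p&gt;Here is a runnable sketch of the sum code, assuming plain JavaScript arrays of numbers in the range 0 to 255 and using &lt;code&gt;null&lt;/code&gt; in place of &lt;code&gt;None&lt;/code&gt;. The camel-case function names and the thrown &lt;code&gt;Error&lt;/code&gt; are choices made for this example.&lt;/p&gt;

```javascript
// Sum-based erasure code for m = 1; all arithmetic is done modulo 256.
function computeParitySum(data) {
  return [data.reduce((sum, b) => (sum + b) % 256, 0)];
}

function reconstructDataSum(partialData, partialParity) {
  const missing = [...partialData, ...partialParity]
    .filter((b) => b === null).length;
  if (missing > 1) {
    throw new Error('more than one byte is missing');
  }
  const i = partialData.indexOf(null);
  if (i === -1) {
    // Nothing to recover: all data bytes are present.
    return partialData.slice();
  }
  // Sum of the known data bytes, modulo 256.
  const partialSum = partialData
    .reduce((sum, b) => (b === null ? sum : (sum + b) % 256), 0);
  // Adding 256 keeps the difference non-negative before reducing.
  const recovered = (partialParity[0] - partialSum + 256) % 256;
  return partialData.map((b, idx) => (idx === i ? recovered : b));
}

const sumParity = computeParitySum([0xda, 0xdb, 0x0d]);
console.log(sumParity);                                         // [ 194 ]
console.log(reconstructDataSum([0xda, null, 0x0d], sumParity)); // [ 218, 219, 13 ]
```

&lt;p&gt;The three data bytes sum to \(450\), which wraps around to \(194\); subtracting the two known bytes from \(194\) modulo \(256\) gives back \(219\), i.e. &lt;code&gt;db&lt;/code&gt; in hex.&lt;/p&gt;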

&lt;div class=&quot;p&quot;&gt;Another erasure code for \(m = 1\) uses &lt;a href=&quot;https://en.wikipedia.org/wiki/Exclusive_or#Bitwise_operation&quot;&gt;bitwise exclusive
  or&lt;/a&gt; (denoted as xor, &lt;code&gt;^&lt;/code&gt;, or \(\oplus\)) instead
  of arithmetic modulo \(256\). Define

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;ComputeParityXor(data: byte[n]) {
  return [data[0] &amp;oplus; &amp;hellip; &amp;oplus; data[n-1]]
}&lt;/code&gt;&lt;/pre&gt;

  and

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;ReconstructDataXor(partialData: (byte?)[n], partialParity: (byte?)[1]) {
  if &lt;em&gt;there is more than one entry of partialData or partialParity set to None&lt;/em&gt; {
    return Error
  } else if &lt;em&gt;partialData has no entry set to None&lt;/em&gt; {
    return partialData
  }

  i := partialData.firstIndexOf(None)
  partialXor := partialData[0] &amp;oplus; &amp;hellip; &amp;oplus; partialData[i-1] &amp;oplus; partialData[i+1] &amp;oplus; &amp;hellip; &amp;oplus; partialData[n-1]
  return partialData[0:i] ++ [partialParity[0] &amp;oplus; partialXor] ++ partialData[i+1:n]
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This relies on the fact that \(a \oplus a = 0\), so given the xor
  of a list of bytes, you can recover a missing byte by xoring with
  all the known bytes.&lt;/p&gt;
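&lt;p&gt;As a small worked example, write \(d_0, d_1, d_2\) for the data bytes &lt;code&gt;da&lt;/code&gt;, &lt;code&gt;db&lt;/code&gt;, &lt;code&gt;0d&lt;/code&gt; (in hex) and \(p\) for their xor. If \(d_0\) is lost, xoring \(p\) with the two known bytes cancels them in pairs and leaves \(d_0\):

  \[
    \begin{aligned}
      p &amp;= \mathtt{da} \oplus \mathtt{db} \oplus \mathtt{0d} = \mathtt{01} \oplus \mathtt{0d} = \mathtt{0c} \\
      p \oplus d_1 \oplus d_2 &amp;= \mathtt{0c} \oplus \mathtt{db} \oplus \mathtt{0d} = \mathtt{d7} \oplus \mathtt{0d} = \mathtt{da} = d_0\text{.}
    \end{aligned}
  \]&lt;/p&gt;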
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;3. Erasure codes for \(m = 2\) (almost)&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Now coming up with an erasure code for \(m = 2\) is more involved,
  but we can get an inkling of how it could work by letting \(n = 3\)
  for simplicity, and also letting the output of &lt;code&gt;ComputeParity&lt;/code&gt; be
  non-negative integers, instead of just bytes (i.e., less than
  \(256\)). In that case, we can consider parity numbers that are
  weighted sums of the data bytes. For example, like in the \(m = 1\)
  case, we can have the first parity number be

  \[
    p_0 = d_0 + d_1 + d_2\text{,}
  \]

  (using \(d_i\) for data bytes and \(p_i\) for parity numbers)
  but for the second parity number, we can pick different weights, say

  \[
    p_1 = 1 \cdot d_0 + 2 \cdot d_1 + 3 \cdot d_2\text{.}
  \]

  We want to make sure that the weights for the second parity number
  are &amp;ldquo;sufficiently different&amp;rdquo; from that of the first
  parity number, which we&amp;rsquo;ll clarify later, but for example note
  that setting

  \[
    p_1 = 2 \cdot d_0 + 2 \cdot d_1 + 2 \cdot d_2
  \]

  can&amp;rsquo;t add any new information, since then \(p_1\) will
  always be equal to \(2 \cdot p_0\).&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;So then our &lt;code&gt;ComputeParity&lt;/code&gt; function looks like

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;ComputeParityWeighted(data: byte[3]) {
  return [
    int(data[0]) +     int(data[1]) +     int(data[2]),
    int(data[0]) + 2 * int(data[1]) + 3 * int(data[2]),
  ]
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;As for &lt;code&gt;ReconstructData&lt;/code&gt;, if two data bytes are missing,
  say \(d_i\) and \(d_j\) for \(i &amp;lt; j\), but both \(p_0\) and \(p_1\) are known,
  we can rearrange the equations

  \[
    \begin{aligned}
      p_0 &amp;= d_0 + d_1 + d_2 \\
      p_1 &amp;= 1 \cdot d_0 + 2 \cdot d_1 + 3 \cdot d_2
    \end{aligned}
  \]

  to get all the unknowns on the left side, letting \(d_k\) be the known data byte:

  \[
    \begin{aligned}
      d_i + d_j &amp;= X = p_0 - d_k \\
      (i+1) \cdot d_i + (j+1) \cdot d_j &amp;= Y = p_1 - (k + 1) \cdot d_k\text{.}
    \end{aligned}
  \]

  We can then multiply the first equation by \(i + 1\) and
  subtract it from the second to cancel the \(d_i\) term and get

  \[
    d_j = (Y - (i + 1) \cdot X) / (j - i)\text{,}
  \]

  and then we can use the first equation to solve for \(d_i\):

  \[
    d_i = X - d_j = ((j + 1) \cdot X - Y) / (j - i)\text{.}
  \]

  Thus with these equations, we can implement &lt;code&gt;ReconstructData&lt;/code&gt;:

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;ReconstructDataWeighted(partialData: (byte?)[3], partialParity: (int?)[2]) {
  &lt;em&gt;Handle all cases except when there are exactly two entries set to None in partialData.&lt;/em&gt;

  [i, j] := &lt;em&gt;indices of the unknown data bytes&lt;/em&gt;
  k := &lt;em&gt;index of the known data byte&lt;/em&gt;

  X := partialParity[0] - partialData[k]
  Y := partialParity[1] - (k + 1) * partialData[k]

  d_i := ((j + 1) * X - Y) / (j - i)
  d_j := (Y - (i + 1) * X) / (j - i)

  return &lt;em&gt;an array with d_i, d_j, and partialData[k] in the right order&lt;/em&gt;
}&lt;/code&gt;&lt;/pre&gt;

  (Generalizing this to larger values of \(n\) is straightforward;
  \(p_0\) will still have a weight of \(1\) for each data byte, and
  \(p_1\) will have a weight of \(i + 1\) for \(d_i\). \(X\) and \(Y\)
  will then have terms for all known bytes, and everything else
  proceeds the same after that.)&lt;/div&gt;
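&lt;p&gt;The formulas above can be checked with a runnable sketch for \(n = 3\). It assumes plain JavaScript numbers (so the parity values are not limited to a byte), uses &lt;code&gt;null&lt;/code&gt; in place of &lt;code&gt;None&lt;/code&gt;, and only handles the case of exactly two missing data bytes; the camel-case names are choices made for this example.&lt;/p&gt;

```javascript
// Weighted-sum parity for n = 3, m = 2, using unbounded integers.
function computeParityWeighted(data) {
  return [
    data[0] + data[1] + data[2],
    data[0] + 2 * data[1] + 3 * data[2],
  ];
}

function reconstructDataWeighted(partialData, partialParity) {
  const unknown = [0, 1, 2].filter((idx) => partialData[idx] === null);
  if (unknown.length !== 2 || partialParity.includes(null)) {
    throw new Error('sketch handles only two missing data bytes with both parity numbers known');
  }
  const [i, j] = unknown; // i is smaller than j because of the filter order
  const k = [0, 1, 2].find((idx) => partialData[idx] !== null);

  // Move the known byte to the right-hand side of both equations.
  const X = partialParity[0] - partialData[k];
  const Y = partialParity[1] - (k + 1) * partialData[k];

  const data = partialData.slice();
  data[i] = ((j + 1) * X - Y) / (j - i);
  data[j] = (Y - (i + 1) * X) / (j - i);
  return data;
}

const weightedParity = computeParityWeighted([0xda, 0xdb, 0x0d]);
console.log(weightedParity);                                              // [ 450, 695 ]
console.log(reconstructDataWeighted([null, 0xdb, null], weightedParity)); // [ 218, 219, 13 ]
```

&lt;p&gt;In the second call \(i = 0\), \(j = 2\), and \(k = 1\), so \(X = 231\), \(Y = 257\), and both divisions by \(j - i = 2\) come out exact.&lt;/p&gt;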

&lt;p&gt;Now what goes wrong if we just try to do everything modulo \(256\)?
  The most obvious difference from the \(m = 1\) case is that solving
  for \(d_i\) or \(d_j\) involves division, which works fine for
  non-negative integers as long as there&amp;rsquo;s no remainder, but it
  is not immediately clear how division can make sense modulo \(256\).&lt;/p&gt;

&lt;p&gt;One possible way to define division modulo \(256\)
  would be to first define the &lt;em&gt;multiplicative inverse&lt;/em&gt; modulo
  \(256\) of an integer \(0 \le x \lt 256\) as the integer \(0 \le y
  \lt 256\) such that \((x \cdot y) \bmod 256 = 1\), if it exists, and
  then define division by \(x\) modulo \(256\) to be multiplication by
  \(y\) modulo \(256\). But this immediately runs into problems; \(2\)
  has no multiplicative inverse modulo \(256\), and the same holds for
  any even number, so reconstruction will fail if, for example, we
  have the first and third data bytes missing, since then we&amp;rsquo;d
  be trying to divide by \(j - i = 2\).&lt;/p&gt;
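&lt;p&gt;A brute-force search makes the problem concrete. The helper below is a hypothetical illustration that tries every candidate \(y\) for a given \(x\); it is meant as a quick check, not as an efficient way to compute inverses.&lt;/p&gt;

```javascript
// All integers 0, 1, ..., 255.
const residues = Array.from({ length: 256 }, (_, value) => value);

// Returns y with (x * y) % 256 === 1, or null if no such y exists.
function inverseMod256(x) {
  const y = residues.find((candidate) => (x * candidate) % 256 === 1);
  return y === undefined ? null : y;
}

const invertible = residues.filter((value) => inverseMod256(value) !== null);
console.log(inverseMod256(3));                             // 171, since 3 * 171 = 513 = 2 * 256 + 1
console.log(inverseMod256(2));                             // null
console.log(invertible.length);                            // 128
console.log(invertible.every((value) => value % 2 === 1)); // true
```

&lt;p&gt;Only the \(128\) odd values have an inverse, so half of all possible divisors are unusable modulo \(256\).&lt;/p&gt;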

&lt;p&gt;But for now, let&amp;rsquo;s leave aside the problem of generating
  parity bytes instead of parity numbers, and instead focus on how we
  can generalize the above for larger values of \(m\). To do so, we
  need to first review some linear algebra.&lt;/p&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;4. Just enough linear algebra to get by&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;In our \(n = 3, m = 2\) example in the previous section, the
  equations for the parity numbers have the form

  \[
    p = a_0 \cdot d_0 + a_1 \cdot d_1 + a_2 \cdot d_2
  \]

  for constants \(a_0\), \(a_1\), and \(a_2\). We call such a
  weighted sum of the \(d_i\)s a &lt;em&gt;linear combination&lt;/em&gt; of
  the \(d_i\)s, and we write this in a tabular form

  \[
    p =
    \begin{pmatrix}
      a_0 &amp; a_1 &amp; a_2
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      d_0 \\ d_1 \\ d_2
    \end{bmatrix}\text{,}
  \]

  where we define the multiplication of a
  &lt;em&gt;row vector&lt;/em&gt; and a &lt;em&gt;column vector&lt;/em&gt; by the
  equation above, generalized in the straightforward manner
  for any \(n\).&lt;/p&gt;

&lt;p&gt;Then since we have two parity numbers \(p_0\) and \(p_1\),
  each a linear combination of the \(d_i\)s, i.e.

  \[
    \begin{aligned}
      p_0 &amp;= a_{00} \cdot d_0 + a_{01} \cdot d_1 + a_{02} \cdot d_2 \\
      p_1 &amp;= a_{10} \cdot d_0 + a_{11} \cdot d_1 + a_{12} \cdot d_2\text{,}
    \end{aligned}
  \]

  we can write this in a single tabular form as

  \[
    \begin{bmatrix}
      p_0 \\ p_1
    \end{bmatrix}
    =
    \begin{pmatrix}
      a_{00} &amp; a_{01} &amp; a_{02} \\
      a_{10} &amp; a_{11} &amp; a_{12}
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      d_0 \\ d_1 \\ d_2
    \end{bmatrix}\text{,}
  \]

  where we define the multiplication of a &lt;em&gt;matrix&lt;/em&gt; and
  a column vector by the equations above.&lt;/p&gt;
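&lt;p&gt;For instance, taking the weights from the previous section and, purely as an illustration, the data bytes \(218\), \(219\), and \(13\) (that is, &lt;code&gt;da&lt;/code&gt;, &lt;code&gt;db&lt;/code&gt;, &lt;code&gt;0d&lt;/code&gt; in hex), each row of the matrix produces one parity number:

  \[
    \begin{bmatrix}
      p_0 \\ p_1
    \end{bmatrix}
    =
    \begin{pmatrix}
      1 &amp; 1 &amp; 1 \\
      1 &amp; 2 &amp; 3
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      218 \\ 219 \\ 13
    \end{bmatrix}
    =
    \begin{bmatrix}
      1 \cdot 218 + 1 \cdot 219 + 1 \cdot 13 \\
      1 \cdot 218 + 2 \cdot 219 + 3 \cdot 13
    \end{bmatrix}
    =
    \begin{bmatrix}
      450 \\ 695
    \end{bmatrix}\text{.}
  \]&lt;/p&gt;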

&lt;p&gt;Now if we restrict parity numbers to be linear combinations of the
  data bytes, then we can identify a function
  &lt;code&gt;ComputeParity&lt;/code&gt; using some set of weights with the matrix
  formed from that set of weights as above. This holds in general: if
  a function is defined as a list of linear combinations of its
  inputs, then it can be represented using a matrix as above, and we
  call it a
  &lt;em&gt;linear function&lt;/em&gt;. Then we have a correspondence between
  linear functions that take \(n\) numbers to \(m\) numbers and
  matrices with \(m\) rows and \(n\) columns, which are denoted as \(m
  \times n\) matrices.&lt;/p&gt;

&lt;p&gt;As the first example of this correspondence, note that we denote
  the elements of the matrix above as \(a_{ij}\), where the first
  index is the row index and the second index is the column
  index. Looking back to the parity equations, we also see that the
  first index corresponds to the output arguments of &lt;code&gt;ComputeParity&lt;/code&gt;, and the second index corresponds to
  the input arguments.&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;The usefulness of the correspondence between linear functions and
  matrices is that we can store and manipulate a linear function by
  storing and manipulating its corresponding matrix of weights, which
  you wouldn&amp;rsquo;t be able to easily do for functions in
  general. For example, as we&amp;rsquo;ll see below, we&amp;rsquo;ll be able
  to compute the inverse of a linear function by matrix operations,
  which will be useful for &lt;code&gt;ReconstructData&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;First, let&amp;rsquo;s examine some simple matrix operations and their
  effects on the corresponding linear function:

  &lt;ul&gt;
    &lt;li&gt;&lt;em&gt;Deleting a row&lt;/em&gt; of a matrix corresponds to &lt;em&gt;deleting an output&lt;/em&gt; of a linear function.&lt;/li&gt;
    &lt;li&gt;&lt;em&gt;Swapping two rows&lt;/em&gt; of a matrix corresponds to &lt;em&gt;swapping two outputs&lt;/em&gt; of a linear function.&lt;/li&gt;
    &lt;li&gt;&lt;em&gt;Appending a row&lt;/em&gt; to a matrix corresponds to &lt;em&gt;adding an output&lt;/em&gt; to a linear function.&lt;/li&gt;
  &lt;/ul&gt;

  In general, matrix row operations correspond to manipulations of a
  linear function&amp;rsquo;s outputs.&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;An important operation on functions is composition: if
  \(f\) takes \(k\) inputs to \(m\) outputs, and \(g\) takes
  \(m\) inputs to \(n\) outputs, then we can define \((g \circ
  f)(x_0, \dotsc, x_{k-1}) = g(f(x_0, \dotsc, x_{k-1}))\) which takes
  \(k\) inputs to \(n\) outputs. It turns out that the
  composition of two linear functions is again a linear
  function, and so there must be an operation which takes the
  corresponding \(n \times m\) matrix \(G\) and the \(m \times
  k\) matrix \(F\) and yields an \(n \times k\) matrix. This
  important operation, the bane of high-schoolers everywhere,
  is called &lt;a href=&quot;https://en.wikipedia.org/wiki/Matrix_multiplication&quot;&gt;&lt;em&gt;matrix multiplication&lt;/em&gt;&lt;/a&gt;,
  denoted by \(G \cdot F\). If \(H = G \cdot F\), then the
  explicit formula for its elements is

  \[
    h_{ij} = \sum_{l=0}^{m-1} g_{il} \cdot f_{lj}\text{,}
  \]

  which corresponds to the following code for multiplying two matrices, with the left factor passed as the first argument and the right factor as the second:

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;matrixMultiply(f: Matrix, g: Matrix) {
  if (f.columns != g.rows) {
    return Error
  }

  h := new Matrix(f.rows, g.columns)
  for i := 0 to f.rows - 1 {
    for j := 0 to g.columns - 1 {
      t := 0
      for k := 0 to f.columns - 1 {
        t += f[i, k] * g[k, j]
      }
      h[i, j] = t
    }
  }
  return h
}&lt;/code&gt;&lt;/pre&gt;

  You can convince yourself that the above formula and code are correct
  by trying to compose some small linear functions by hand.
&lt;/div&gt;
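
&lt;div class=&quot;p&quot;&gt;As a quick numerical check of the link between composition and multiplication, here is a minimal JavaScript sketch. The helper names &lt;code&gt;multiply&lt;/code&gt; and &lt;code&gt;applyMatrix&lt;/code&gt; and the sample matrices are made up for illustration. It applies \(g\) and then \(f\) to an input, and compares the result with applying the single matrix \(F \cdot G\):

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Illustrative sketch; the helper names and sample matrices are assumptions.
// Matrices are arrays of rows.

// Returns the product a * b, where a has as many columns as b has rows.
function multiply(a, b) {
  return a.map(function (row) {
    return b[0].map(function (unused, j) {
      return row.reduce(function (t, v, k) { return t + v * b[k][j]; }, 0);
    });
  });
}

// Applies the linear function with matrix m to the list of inputs x.
function applyMatrix(m, x) {
  return m.map(function (row) {
    return row.reduce(function (t, v, k) { return t + v * x[k]; }, 0);
  });
}

const F = [[1, 1], [0, 2], [5, 0]]; // f takes 2 inputs to 3 outputs
const G = [[1, 2], [3, 4]];         // g takes 2 inputs to 2 outputs
const x = [5, 6];

const composed = applyMatrix(F, applyMatrix(G, x)); // f(g(x))
const viaProduct = applyMatrix(multiply(F, G), x);  // (F * G) applied to x
console.log(composed, viaProduct);&lt;/code&gt;&lt;/pre&gt;

  Both results are &lt;code&gt;[56, 78, 85]&lt;/code&gt;, as expected if the product matrix represents the composed function.
&lt;/div&gt;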

&lt;p&gt;A useful property of matrix multiplication is that it&amp;rsquo;s a
  generalization of the product of a row vector and a column vector,
  and the product of a matrix and a column vector as we defined above.&lt;/p&gt;

&lt;p&gt;I would be remiss if I didn&amp;rsquo;t talk about the consequences of
  defining matrix multiplication as the matrix of the composition of
  the corresponding linear functions. First, this immediately implies
  that you can only multiply matrices if the left matrix has the same
  number of columns as the number of rows of the right matrix, which
  corresponds to the fact that you can only compose functions if the
  left function takes the same number of inputs as the number of
  outputs of the right function. Furthermore, even if you have two \(n
  \times n\) matrices \(F\) and \(G\), unlike numbers, it is not true
  that \(F \cdot G = G \cdot F\), which corresponds to the fact that
  in general, for two functions that take \(n\) inputs to \(n\)
  outputs, it is not true that \(f \circ g = g \circ f\). If you
  learned matrix multiplication just from the formula above, then
  these facts are much less obvious!&lt;/p&gt;

&lt;p&gt;Finally, an important function is the &lt;a href=&quot;https://en.wikipedia.org/wiki/Identity_function&quot;&gt;&lt;em&gt;identity function&lt;/em&gt;&lt;/a&gt;
  \(\mathrm{Id}_n\), which returns its \(n\) inputs as its outputs. It
  corresponds to the &lt;a href=&quot;https://en.wikipedia.org/wiki/Identity_matrix&quot;&gt;&lt;em&gt;identity matrix&lt;/em&gt;&lt;/a&gt;
  \[
  I_n =
  \begin{pmatrix}
    1 &amp; 0 &amp; \cdots &amp; 0 &amp; 0 \\
    0 &amp; 1 &amp; \cdots &amp; 0 &amp; 0 \\
    \vdots &amp; \vdots &amp; \ddots &amp; \vdots &amp; \vdots \\
    0 &amp; 0 &amp; \cdots &amp; 1 &amp; 0 \\
    0 &amp; 0 &amp; \cdots &amp; 0 &amp; 1
  \end{pmatrix}\text{.}
  \]&lt;/p&gt;

&lt;p&gt;For a linear function \(f\) that takes \(n\) inputs to \(n\)
  outputs, if there is a function \(g\) such that \(f \circ g =
  \mathrm{Id}_n\), then we call \(g\) the inverse of \(f\), and denote
  it as \(f^{-1}\). (It is also true that \(f^{-1} \circ f =
  \mathrm{Id}_n\), i.e. \((f^{-1})^{-1} = f\).) Not all linear
  functions taking \(n\) inputs to \(n\) outputs have inverses, but if
  the inverse exists, it is also linear (and unique, which is why we
  call it &lt;em&gt;the&lt;/em&gt; inverse). Therefore, we can define
  the &lt;em&gt;inverse&lt;/em&gt; of an \(n \times n\) (or &lt;em&gt;square&lt;/em&gt;)
  matrix \(M\) as the unique matrix \(M^{-1}\) such that \(M \cdot
  M^{-1} = M^{-1} \cdot M = I_n\), if it exists; also, if \(M\) has an
  inverse, we say that \(M\) is &lt;em&gt;invertible&lt;/em&gt;.&lt;/p&gt;

&lt;div class=&quot;interactive-example&quot;&gt;
  &lt;h3&gt;Example 2: The matrix/linear function correspondence&lt;/h3&gt;

  &lt;div class=&quot;p&quot;&gt;Let

  \[
    M = \begin{pmatrix} 1 &amp; 2 \\ 3 &amp; 4\end{pmatrix}\text{.}
  \]

    This corresponds to the linear function

    &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;f(x: rational[2]) {
  return [
    1 * x[0] + 2 * x[1],
    3 * x[0] + 4 * x[1],
  ]
}&lt;/code&gt;&lt;/pre&gt;

    where &lt;code&gt;rational&lt;/code&gt; is an arbitrary-precision rational
    number type.&lt;/div&gt;

  &lt;div class=&quot;p&quot;&gt;\(M\) is invertible with inverse

    \[
    M^{-1} = \begin{pmatrix} -2 &amp; 1 \\ 3/2 &amp; -1/2\end{pmatrix}\text{.}
    \]

    This corresponds to the linear function

    &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;g(y: rational[2]) {
  return [
    -2 * y[0] + 1 * y[1],
    (3/2) * y[0] + (-1/2) * y[1],
  ]
}&lt;/code&gt;&lt;/pre&gt;

    so &lt;code&gt;g&lt;/code&gt; is the inverse function of &lt;code&gt;f&lt;/code&gt;. Indeed, &lt;code&gt;f([5, 6])&lt;/code&gt; is &lt;code&gt;[17, 39]&lt;/code&gt; and &lt;code&gt;g([17, 39])&lt;/code&gt; is &lt;code&gt;[5, 6]&lt;/code&gt;.&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;So now we&amp;rsquo;ve reduced the problem of finding the inverse of a
  linear function taking \(n\) inputs to \(n\) outputs to finding the
  inverse of an \(n \times n\) matrix. Before we tackle the question
  of computing those inverses, let&amp;rsquo;s first recast our problem in
  the language of linear algebra and see why we need to find the
  inverse of a linear function.&lt;/p&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;5. Erasure codes in general&lt;/h2&gt;
&lt;/header&gt;

&lt;div class=&quot;p&quot;&gt;So, revisiting our \(n = 3, m = 2\) erasure code from
  above, we have the linear function

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;ComputeParityWeighted(data: byte[3]) {
  return [
    int(data[0]) +     int(data[1]) +     int(data[2]),
    int(data[0]) + 2 * int(data[1]) + 3 * int(data[2]),
  ]
}&lt;/code&gt;&lt;/pre&gt;

  which therefore corresponds to the &lt;em&gt;parity matrix&lt;/em&gt;

  \[
    P =
    \begin{pmatrix}
      1 &amp; 1 &amp; 1 \\
      1 &amp; 2 &amp; 3
    \end{pmatrix}\text{.}
  \]

  So in mathematical notation, &lt;code&gt;ComputeParityWeighted&lt;/code&gt; looks like:

  \[
    \begin{bmatrix}
      p_0 \\ p_1
    \end{bmatrix}
    =
    \mathtt{ComputeParityWeighted}(d_0, d_1, d_2) =
    \begin{pmatrix}
      1 &amp; 1 &amp; 1 \\
      1 &amp; 2 &amp; 3
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      d_0 \\ d_1 \\ d_2
    \end{bmatrix}\text{.}
  \]
&lt;/div&gt;

&lt;p&gt;So let&amp;rsquo;s now reimplement &lt;code&gt;ReconstructDataWeighted&lt;/code&gt; using linear algebra. First, append the rows of \(P\) to the identity matrix \(I_3\) to get the matrix equation

  \[
    \begin{bmatrix}
      d_0 \\ d_1 \\ d_2 \\ p_0 \\ p_1
    \end{bmatrix}
    =
    \begin{pmatrix}
      1 &amp; 0 &amp; 0 \\
      0 &amp; 1 &amp; 0 \\
      0 &amp; 0 &amp; 1 \\
      1 &amp; 1 &amp; 1 \\
      1 &amp; 2 &amp; 3
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      d_0 \\ d_1 \\ d_2
    \end{bmatrix}\text{,}
  \]

  which corresponds to a linear function that returns the input data bytes in addition to computing the parity numbers. Now let&amp;rsquo;s say we lose the data bytes \(d_0\) and \(d_2\). Then let&amp;rsquo;s remove the rows corresponding to those bytes:

  \[
    \begin{bmatrix}
      \xcancel{d_0} \\ d_1 \\ \xcancel{d_2} \\ p_0 \\ p_1
    \end{bmatrix}
    =
    \begin{pmatrix}
      \xcancel{1} &amp; \xcancel{0} &amp; \xcancel{0} \\
      0 &amp; 1 &amp; 0 \\
      \xcancel{0} &amp; \xcancel{0} &amp; \xcancel{1} \\
      1 &amp; 1 &amp; 1 \\
      1 &amp; 2 &amp; 3
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      d_0 \\ d_1 \\ d_2
    \end{bmatrix}\text{,}
  \]

  which turns into

  \[
    \begin{bmatrix}
      d_1 \\ p_0 \\ p_1
    \end{bmatrix} =
    \begin{pmatrix}
      0 &amp; 1 &amp; 0 \\
      1 &amp; 1 &amp; 1 \\
      1 &amp; 2 &amp; 3
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      d_0 \\ d_1 \\ d_2
    \end{bmatrix}\text{,}
  \]

  which corresponds to a linear function that maps the input data
  bytes to the non-lost data bytes and the parity bytes. This
  is the &lt;em&gt;inverse&lt;/em&gt; of the function we want, so we want
  to invert the \(3 \times 3\) matrix above, which we&amp;rsquo;ll
  call \(M\). That inverse is

  \[
    M^{-1} =
    \begin{pmatrix}
      -1/2 &amp; 3/2 &amp; -1/2 \\
      1 &amp; 0 &amp; 0 \\
      -1/2 &amp; -1/2 &amp; 1/2
    \end{pmatrix}\text{.}
  \]

  Multiplying both sides above by \(M^{-1}\), we get

  \[
    \begin{bmatrix}
      d_0 \\ d_1 \\ d_2
    \end{bmatrix}
    =
    \begin{pmatrix}
      -1/2 &amp; 3/2 &amp; -1/2 \\
      1 &amp; 0 &amp; 0 \\
      -1/2 &amp; -1/2 &amp; 1/2
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      d_1 \\ p_0 \\ p_1
    \end{bmatrix}\text{,}              
  \]

  which is exactly what we want: the original data bytes in
  terms of the known data bytes and the parity numbers!&lt;sup&gt;&lt;a href=&quot;#fn3&quot; id=&quot;r3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Comparing this equation to the one we manually computed previously,
  they don&amp;rsquo;t look immediately similar, but some rearrangement
  will reveal that they indeed compute the same thing. As a sanity
  check, notice that the second row of \(M^{-1}\) means that the first
  input argument is mapped unchanged to the second output argument,
  which is exactly what we want for the known byte \(d_1\).&lt;/p&gt;
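
&lt;p&gt;To make this concrete, here is a worked instance with made-up data bytes \(d_0 = 5\), \(d_1 = 6\), and \(d_2 = 7\) (these values are chosen only for illustration). The parity numbers are \(p_0 = 5 + 6 + 7 = 18\) and \(p_1 = 5 + 2 \cdot 6 + 3 \cdot 7 = 38\). If \(d_0\) and \(d_2\) are lost, multiplying \(M^{-1}\) by the surviving values gives

  \[
    \begin{pmatrix}
      -1/2 &amp; 3/2 &amp; -1/2 \\
      1 &amp; 0 &amp; 0 \\
      -1/2 &amp; -1/2 &amp; 1/2
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      6 \\ 18 \\ 38
    \end{bmatrix}
    =
    \begin{bmatrix}
      -3 + 27 - 19 \\ 6 \\ -3 - 9 + 19
    \end{bmatrix}
    =
    \begin{bmatrix}
      5 \\ 6 \\ 7
    \end{bmatrix}\text{,}
  \]

  which recovers all three data bytes.&lt;/p&gt;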

&lt;p&gt;Now what does this look like in general, i.e. for
  arbitrary \(n\) and \(m\)? First, we have to generate an
  \(m \times n\) parity matrix \(P\) whose rows have to be
  &amp;ldquo;sufficiently different&amp;rdquo; from each other,
  which we still have to clarify. Then &lt;code&gt;ComputeParity&lt;/code&gt; just multiplies \(P\) by \([d]\), the column matrix of input bytes, like so:

  \[
    \begin{bmatrix}
      p_0 \\ \vdots \\ p_{m-1}
    \end{bmatrix}
    =
    \mathtt{ComputeParity}(d_0, \dotsc, d_{n-1}) =
    \begin{pmatrix}
      p_0 \\
      \vdots \\
      p_{m-1}
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      d_0 \\ \vdots \\ d_{n-1}
    \end{bmatrix}\text{,}
  \]

  where the \(p_i\) are the rows of \(P\).&lt;/p&gt;

&lt;p&gt;As for &lt;code&gt;ReconstructData&lt;/code&gt;, we first append
  the rows of \(P\) to \(I_n\), whose rows we&amp;rsquo;ll denote as \(e_i\):

\[
  \begin{bmatrix}
    d_0 \\ \vdots \\ d_{n-1} \\
    p_0 \\ \vdots \\ p_{m-1}
  \end{bmatrix}
  =
  \begin{pmatrix}
    e_0 \\
    \vdots \\
    e_{n-1} \\
    p_0 \\
    \vdots \\
    p_{m-1}
  \end{pmatrix}
  \cdot
  \begin{bmatrix}
    d_0 \\ \vdots \\ d_{n-1}
  \end{bmatrix}\text{.}
\]

Now assume that the indices of the missing \(k \le m\) data
bytes are \(i_0, \dotsc, i_{k-1}\).
Then we remove the rows
corresponding to the missing data bytes, and keep some \(k\)
parity rows, e.g. \(p_0\) to \(p_{k-1}\). This yields the equation

\[
  \begin{bmatrix}
    d_{j_0} \\ \vdots \\ d_{j_{n-k-1}} \\
    p_0 \\ \vdots \\ p_{k-1}
  \end{bmatrix}
  =
  \begin{pmatrix}
    e_{j_0} \\
    \vdots \\
    e_{j_{n-k-1}} \\
    p_0 \\
    \vdots \\
    p_{k-1}
  \end{pmatrix}
  \cdot
  \begin{bmatrix}
    d_0 \\ \vdots \\ d_{n-1}
  \end{bmatrix}\text{,}
\]

where \(j_0, \dotsc, j_{n-k-1}\) are the indices of the
&lt;em&gt;present&lt;/em&gt; \(n - k\) data bytes. Call that \(n \times n\)
matrix \(M\), and compute its inverse \(M^{-1}\). If \(P\) was chosen correctly, \(M^{-1}\) should always exist, so if the inverse computation fails, raise an error. Therefore, &lt;code&gt;ReconstructData&lt;/code&gt; just multiplies \(M^{-1}\) by the column matrix of present data bytes and chosen parity numbers:

\[
  \begin{bmatrix}
    d_0 \\ \vdots \\ d_{n-1}
  \end{bmatrix}
  =
  \mathtt{ReconstructData}(d_{j_0}, \dotsc, d_{j_{n-k-1}}, p_0, \dotsc, p_{k-1})
  = M^{-1} \cdot
  \begin{bmatrix}
    d_{j_0} \\ \vdots \\ d_{j_{n-k-1}} \\
    p_0 \\ \vdots \\ p_{k-1}
  \end{bmatrix}\text{.}
\]
&lt;/p&gt;

&lt;p&gt;As an optimization, some rows of \(M^{-1}\) correspond to just
  shuffling around the known data bytes \(d_{j_*}\), so we can just
  remove those rows, compute the missing data bytes, and do the
  shuffling ourselves.&lt;/p&gt;
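
&lt;p&gt;As a small sketch of the general recipe, take the \(n = 3, m = 2\) parity matrix from above and assume, purely for illustration, that only \(d_1\) is lost (so \(k = 1\)) and that we keep the parity row \(p_0\). Then the rows of \(M\) are \(e_0\), \(e_2\), and \(p_0\), and

  \[
    M =
    \begin{pmatrix}
      1 &amp; 0 &amp; 0 \\
      0 &amp; 0 &amp; 1 \\
      1 &amp; 1 &amp; 1
    \end{pmatrix}\text{,}
    \quad
    M^{-1} =
    \begin{pmatrix}
      1 &amp; 0 &amp; 0 \\
      -1 &amp; -1 &amp; 1 \\
      0 &amp; 1 &amp; 0
    \end{pmatrix}\text{,}
    \quad
    \begin{bmatrix}
      d_0 \\ d_1 \\ d_2
    \end{bmatrix}
    =
    M^{-1} \cdot
    \begin{bmatrix}
      d_0 \\ d_2 \\ p_0
    \end{bmatrix}\text{.}
  \]

  The first and third rows of \(M^{-1}\) only copy the known bytes \(d_0\) and \(d_2\) into place, so the only row that needs computing is the middle one, \(d_1 = p_0 - d_0 - d_2\).&lt;/p&gt;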

&lt;div class=&quot;p&quot;&gt;So we now have outlines of implementations of both &lt;code&gt;ComputeParity&lt;/code&gt; and &lt;code&gt;ReconstructData&lt;/code&gt;,
  but we still have missing pieces. In particular,
  &lt;ol&gt;
    &lt;li&gt;How do we compute matrix inverses?&lt;/li&gt;
    &lt;li&gt;How do we generate &amp;ldquo;optimal&amp;rdquo; parity matrices so that \(M^{-1}\) always exists?&lt;/li&gt;
    &lt;li&gt;How do we compute parity bytes instead of parity numbers?&lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;

&lt;p&gt;So first, let&amp;rsquo;s see how to compute matrix inverses using row
  reduction.&lt;/p&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;6. Finding matrix inverses using row reduction&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;We developed the theory of matrices by identifying them with linear
  functions of numbers. To show how to find matrix inverses, we have
  to look at them in a slightly different way by identifying matrix
  equations with systems of linear equations of numbers.&lt;/p&gt;

&lt;p&gt;For example, consider the matrix equation

  \[
    M \cdot x = y\text{,}
  \]

  where

  \[
    M =
    \begin{pmatrix}
      1 &amp; 2 \\
      3 &amp; 4
    \end{pmatrix}\text{,}
    \quad
    x =
    \begin{bmatrix}
      x_1 \\ x_2
    \end{bmatrix}
    \text{,} \quad \text{and }
    y =
    \begin{bmatrix}
      y_1 \\ y_2
    \end{bmatrix}\text{.}
  \]

  This expands to

  \[
    \begin{pmatrix}
      1 &amp; 2 \\
      3 &amp; 4
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      x_1 \\ x_2
    \end{bmatrix} =
    \begin{bmatrix}
      y_1 \\ y_2
    \end{bmatrix}\text{,}
  \]

  or

  \[
    \begin{aligned}
      y_1 &amp;= 1 \cdot x_1 + 2 \cdot x_2 \\
      y_2 &amp;= 3 \cdot x_1 + 4 \cdot x_2\text{,}
    \end{aligned}
  \]

  which is a linear system of equations of numbers. Letting \(M\) be
  any matrix, and \(x\) and \(y\) be appropriately-sized column
  matrices of variables, we can see that the matrix equation
  \(M \cdot x = y\) is shorthand for a system of linear equations of
  numbers.&lt;/p&gt;

&lt;p&gt;If we could find \(M^{-1}\), we could solve the matrix
  equation easily by multiplying both sides by it:

  \[
    \begin{aligned}
      M^{-1} \cdot (M \cdot x) &amp;= M^{-1} \cdot y \\
      x &amp;= M^{-1} \cdot y\text{,}
    \end{aligned}
  \]

  and therefore solve the linear system for \(x\) in terms of \(y\).
  Conversely, if we were able to solve the linear system for \(x\),
  we&amp;rsquo;d then be able to read off \(M^{-1}\) from the new linear
  system.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;But how do we solve a linear system? From the theory of linear systems of equations, we have a few tools at our disposal:
  &lt;ul&gt;
    &lt;li&gt;swapping two equations,&lt;/li&gt;
    &lt;li&gt;multiplying an equation by a number,&lt;/li&gt;
    &lt;li&gt;adding one equation to another, possibly multiplying
      the equation by a number before adding.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/div&gt;

&lt;p&gt;All of these are valid transformations because they
  don&amp;rsquo;t change the solution set of the linear system.&lt;/p&gt;

&lt;p&gt;For example, in the equation above, the first step would be
  to subtract \(3\) times the first equation from the second
  equation to yield

  \[
  \begin{aligned}
    y_1 &amp;= x_1 + 2 \cdot x_2 \\
    y_2 - 3 \cdot y_1 &amp;= -2 \cdot x_2\text{.}
  \end{aligned}
  \]

  Then, add the second equation back to the first equation:

  \[
    \begin{aligned}
      y_2 - 2 \cdot y_1 &amp;= x_1 \\
      y_2 - 3 \cdot y_1 &amp;= -2 \cdot x_2\text{.}
    \end{aligned}
  \]

  Finally, divide the second equation by \(-2\):

  \[
    \begin{aligned}
      y_2 - 2 \cdot y_1 &amp;= x_1 \\
      (3/2) \cdot y_1 - (1/2) \cdot y_2 &amp;= x_2\text{.}
    \end{aligned}
  \]

  This is equivalent to the matrix equation

  \[
    \begin{pmatrix}
      -2 &amp; 1 \\ 3/2 &amp; -1/2
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      y_1 \\ y_2
    \end{bmatrix} =
    \begin{bmatrix}
      x_1 \\ x_2
    \end{bmatrix}\text{,}
  \]

  so

  \[
    M^{-1} = \begin{pmatrix}
      -2 &amp; 1 \\ 3/2 &amp; -1/2
    \end{pmatrix}\text{.}
  \]
&lt;/p&gt;

&lt;p&gt;So how do we translate the above process to an algorithm operating on matrices? First, express our matrix equation in a slightly
different form:

  \[
    M \cdot x = I \cdot y\text{.}
  \]

  Using the example above, this is

  \[
    \begin{pmatrix}
      1 &amp; 2 \\
      3 &amp; 4
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      x_1 \\ x_2
    \end{bmatrix}
    =
    \begin{pmatrix}
      1 &amp; 0 \\
      0 &amp; 1
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      y_1 \\ y_2
    \end{bmatrix}\text{.}
  \]

  Then, you can see that the first step above corresponds to subtracting \(3\) times the first row from the second row to yield:

  \[
    \begin{pmatrix}
      1 &amp; 2 \\
      0 &amp; -2
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      x_1 \\ x_2
    \end{bmatrix}
    =
    \begin{pmatrix}
      1 &amp; 0 \\
      -3 &amp; 1
    \end{pmatrix}
    \cdot
    \begin{bmatrix}
      y_1 \\ y_2
    \end{bmatrix}\text{.}
  \]

  We don&amp;rsquo;t even need to keep writing the \(x\) and \(y\)
  column matrices; we can just write the &amp;ldquo;augmented&amp;rdquo; matrix

  \[
    A =
    \left( \hskip -5pt
    \begin{array}{cc|cc}
      1 &amp; 2 &amp; 1 &amp; 0 \\
      0 &amp; -2 &amp; -3 &amp; 1
    \end{array}
    \hskip -5pt \right)
  \]
  and operate on it.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Thus, the operations listed above on linear systems have corresponding operations on augmented matrices:
  &lt;ul&gt;
    &lt;li&gt;&lt;em&gt;swapping two equations&lt;/em&gt; corresponds to &lt;em&gt;swapping two rows&lt;/em&gt;;&lt;/li&gt;
    &lt;li&gt;&lt;em&gt;multiplying an equation by a number&lt;/em&gt; corresponds to &lt;em&gt;multiplying a row by a number&lt;/em&gt;; and&lt;/li&gt;
    &lt;li&gt;&lt;em&gt;adding an equation to another&lt;/em&gt;, possibly multiplying the
      equation by a number before adding, corresponds to &lt;em&gt;adding a row to another row&lt;/em&gt;,
      possibly multiplying the row by a number before adding.&lt;/li&gt;
  &lt;/ul&gt;

Then, the goal is to use these &lt;em&gt;row operations&lt;/em&gt; to transform
  the initial augmented matrix, where the right side looks like the
  identity matrix, into one where the left side looks like the
  identity matrix. Then, translating the augmented matrix back into a
  matrix equation, that would give \(M^{-1}\) on the right side.&lt;sup&gt;&lt;a href=&quot;#fn4&quot; id=&quot;r4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;
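
&lt;p&gt;For the \(2 \times 2\) example above, the whole process can be sketched in augmented-matrix form. The same three row operations used to solve the linear system (subtract \(3\) times row 0 from row 1, add row 1 to row 0, divide row 1 by \(-2\)) take the initial augmented matrix to the final one:

  \[
    \left( \hskip -5pt
    \begin{array}{cc|cc}
      1 &amp; 2 &amp; 1 &amp; 0 \\
      3 &amp; 4 &amp; 0 &amp; 1
    \end{array}
    \hskip -5pt \right)
    \longrightarrow
    \left( \hskip -5pt
    \begin{array}{cc|cc}
      1 &amp; 0 &amp; -2 &amp; 1 \\
      0 &amp; 1 &amp; 3/2 &amp; -1/2
    \end{array}
    \hskip -5pt \right)\text{,}
  \]

  and the right side of the final matrix is exactly the \(M^{-1}\) found earlier.&lt;/p&gt;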

&lt;div class=&quot;p&quot;&gt;When doing this by hand, one usually works with the linear
  system itself, trying to see which variables can be easily
  eliminated so as to minimize arithmetic. However, to
  translate this to an algorithm, we&amp;rsquo;re more interested
  in a systematic way of doing this. Fortunately,
  there&amp;rsquo;s an easy two-step process:
  &lt;ol&gt;
    &lt;li&gt;Turn the left side of \(A\) into a &lt;em&gt;unit upper triangular matrix&lt;/em&gt;,
      which means that all the elements on the main diagonal are
      \(1\), and all elements below the main diagonal are \(0\),
      i.e. that \(a_{ii} = 1\) for all \(i\), and \(a_{ij} = 0\) for
      all \(i &gt; j\).&lt;/li&gt;
    &lt;li&gt;Then turn the left side of \(A\) into the identity matrix.&lt;/li&gt;
  &lt;/ol&gt;
  This algorithm is called &lt;a href=&quot;https://en.wikipedia.org/wiki/Row_reduction&quot;&gt;row reduction&lt;/a&gt;. The
  first step can be further broken down:
  &lt;ol type=&quot;a&quot;&gt;
    &lt;li&gt;For each column \(i\) of the left side in ascending order:
      &lt;ol type=&quot;i&quot;&gt;
        &lt;li&gt;If \(a_{ii}\) is zero, look at the rows below the
          \(i\)th row for a row \(j &gt; i\) such that \(a_{ji} \ne
          0\), and swap rows \(i\) and \(j\). If no such row
          exists, return an error, as that means that the original
          matrix \(M\) is non-invertible.&lt;/li&gt;
        &lt;li&gt;Divide the \(i\)th row by \(a_{ii}\), so that \(a_{ii}
          = 1\).&lt;/li&gt;
        &lt;li&gt;For each row \(j &gt; i\), subtract \(a_{ji}\) times the
          \(i\)th row from it, which will set \(a_{ji}\) to \(0\).&lt;/li&gt;
      &lt;/ol&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
  The second step can be similarly broken down:
  &lt;ol type=&quot;a&quot;&gt;
    &lt;li&gt;For each column \(i\) of the left side, in any order:
      &lt;ol type=&quot;i&quot;&gt;
        &lt;li&gt;For each row \(j &amp;lt; i\), subtract \(a_{ji}\) times the
          \(i\)th row from it, which will set \(a_{ji}\) to \(0\).&lt;/li&gt;
      &lt;/ol&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
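
&lt;div class=&quot;p&quot;&gt;The two steps above can be sketched in JavaScript as follows. This is only an illustration under some assumptions: the function names are hypothetical, a fraction is stored as a pair of plain integers (so the arithmetic is exact only while the values stay below \(2^{53}\), unlike a true arbitrary-precision type), and a non-invertible input is reported by returning &lt;code&gt;null&lt;/code&gt; rather than raising an error.

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Sketch only: all names here are hypothetical. A fraction is a pair
// [numerator, denominator] in lowest terms with a positive denominator.
function gcd(a, b) {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return Math.abs(a);
}

function frac(n, d) {
  // Giving g the sign of d makes the reduced denominator positive.
  const g = gcd(n, d) * Math.sign(d);
  return [n / g + 0, d / g]; // adding 0 turns -0 into 0
}

function sub(x, y) { return frac(x[0] * y[1] - y[0] * x[1], x[1] * y[1]); }
function mul(x, y) { return frac(x[0] * y[0], x[1] * y[1]); }
function div(x, y) { return frac(x[0] * y[1], x[1] * y[0]); }
function isZero(x) { return x[0] === 0; }

// Returns the inverse of the square integer matrix m (an array of rows)
// as a matrix of fractions, or null if m is not invertible.
function invertMatrix(m) {
  const n = m.length;
  const cols = [...Array(n).keys()];
  // Build the augmented matrix (M | I).
  const a = m.map(function (row, i) {
    const left = row.map(function (v) { return frac(v, 1); });
    const right = cols.map(function (j) { return frac(i === j ? 1 : 0, 1); });
    return left.concat(right);
  });

  // Subtracts s times row src from row dest.
  function subtractScaledRow(dest, src, s) {
    a[dest] = a[dest].map(function (v, c) { return sub(v, mul(s, a[src][c])); });
  }

  // Step 1: make the left side unit upper triangular.
  for (const i of cols) {
    // Find a row at or below row i with a non-zero entry in column i.
    const p = cols.slice(i).find(function (r) { return !isZero(a[r][i]); });
    if (p === undefined) {
      return null;
    }
    [a[i], a[p]] = [a[p], a[i]];
    const pivot = a[i][i];
    a[i] = a[i].map(function (v) { return div(v, pivot); });
    for (const j of cols.slice(i + 1)) {
      subtractScaledRow(j, i, a[j][i]);
    }
  }

  // Step 2: clear the entries above the main diagonal.
  for (const i of cols) {
    for (const j of cols.slice(0, i)) {
      subtractScaledRow(j, i, a[j][i]);
    }
  }

  // The right side of the augmented matrix is now the inverse.
  return a.map(function (row) { return row.slice(n); });
}&lt;/code&gt;&lt;/pre&gt;

  For instance, &lt;code&gt;invertMatrix([[1, 2], [3, 4]])&lt;/code&gt; returns &lt;code&gt;[[[-2, 1], [1, 1]], [[3, 2], [-1, 2]]]&lt;/code&gt;, i.e. the fractions \(-2\), \(1\), \(3/2\), and \(-1/2\) of the \(M^{-1}\) computed by hand above.
&lt;/div&gt;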

&lt;p&gt;Note that we&amp;rsquo;re assuming that all arithmetic is
  exact, i.e. we use an arbitrary-precision rational number
  type. If we use floating point numbers, we&amp;rsquo;d have to
  worry a lot more about the order in which we do operations
  and numerical stability.&lt;/p&gt;

&lt;style&gt;
.swap-row-a { color: #dc322f; /* solarized red */ }
.swap-row-b { color: #268bd2; /* solarized blue */ }

.divide-row { color: #dc322f; /* solarized red */ }

.subtract-scaled-row-src { color: #268bd2; /* solarized blue */ }
.subtract-scaled-row-dest { color: #dc322f; /* solarized red */ }
&lt;/style&gt;

&lt;div class=&quot;interactive-example&quot; id=&quot;matrixInverseDemo&quot;&gt;
  &lt;h3&gt;Example 3: Matrix inversion via row reduction&lt;/h3&gt;
  Let

  &lt;pre&gt;    / 0 2 2 \
M = | 3 4 5 |
    \ 6 6 7 /.&lt;/pre&gt;

  The initial augmented matrix &lt;var&gt;A&lt;/var&gt; is

  &lt;pre&gt;/ 0 2 2 | 1 0 0 \
| 3 4 5 | 0 1 0 |
\ 6 6 7 | 0 0 1 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;00&lt;/sub&gt; to be non-zero, so swap rows &lt;span class=&quot;swap-row-a&quot;&gt;0&lt;/span&gt; and &lt;span class=&quot;swap-row-b&quot;&gt;1&lt;/span&gt;:

  &lt;pre&gt;/ &lt;span class=&quot;swap-row-a&quot;&gt;0 2 2&lt;/span&gt; | &lt;span class=&quot;swap-row-a&quot;&gt;1 0 0&lt;/span&gt; \     / &lt;span class=&quot;swap-row-b&quot;&gt;3 4 5&lt;/span&gt; | &lt;span class=&quot;swap-row-b&quot;&gt;0 1 0&lt;/span&gt; \
| &lt;span class=&quot;swap-row-b&quot;&gt;3 4 5&lt;/span&gt; | &lt;span class=&quot;swap-row-b&quot;&gt;0 1 0&lt;/span&gt; | --&gt; | &lt;span class=&quot;swap-row-a&quot;&gt;0 2 2&lt;/span&gt; | &lt;span class=&quot;swap-row-a&quot;&gt;1 0 0&lt;/span&gt; |
\ 6 6 7 | 0 0 1 /     \ 6 6 7 | 0 0 1 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;00&lt;/sub&gt; to be 1, so divide row &lt;span class=&quot;divide-row&quot;&gt;0&lt;/span&gt; by 3:

  &lt;pre&gt;/ &lt;span class=&quot;divide-row&quot;&gt;3 4 5&lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt;0 1 0&lt;/span&gt; \     / &lt;span class=&quot;divide-row&quot;&gt;1 4/3 5/3&lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt;0 1/3 0&lt;/span&gt; \
| 0 2 2 | 1 0 0 | --&gt; | 0  2   2  | 1  0  0 |
\ 6 6 7 | 0 0 1 /     \ 6  6   7  | 0  0  1 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;20&lt;/sub&gt; to be 0, so subtract row &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0&lt;/span&gt; scaled by 6 from row &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;2&lt;/span&gt;:

  &lt;pre&gt;/ &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;1 4/3 5/3&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0 1/3 0&lt;/span&gt; \     / 1 4/3 5/3 | 0 1/3 0 \
| 0  2   2  | 1  0  0 | --&gt; | 0  2   2  | 1  0  0 |
\ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;6  6   7&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  0  1&lt;/span&gt; /     \ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0 -2  -3&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0 -2  1&lt;/span&gt; /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;11&lt;/sub&gt; to be 1, so divide row &lt;span class=&quot;divide-row&quot;&gt;1&lt;/span&gt; by 2:

  &lt;pre&gt;/ 1 4/3 5/3 |  0  1/3 0 \     / 1 4/3 5/3 |  0  1/3 0 \
| &lt;span class=&quot;divide-row&quot;&gt;0  2   2 &lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt; 1   0  0&lt;/span&gt; | --&gt; | &lt;span class=&quot;divide-row&quot;&gt;0  1   1 &lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt;1/2  0  0&lt;/span&gt; |
\ 0 -2  -3  |  0  -2  1 /     \ 0 -2  -3  |  0  -2  1 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;21&lt;/sub&gt; to be 0, so subtract row &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;1&lt;/span&gt; scaled by &amp;minus;2 from row &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;2&lt;/span&gt;:

  &lt;pre&gt;/ 1 4/3 5/3 |  0  1/3 0 \     / 1 4/3 5/3 |  0  1/3 0 \
| &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0  1   1&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;1/2  0  0&lt;/span&gt; | --&gt; | 0  1   1  | 1/2  0  0 |
\ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0 -2  -3&lt;/span&gt;  |  &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  -2  1&lt;/span&gt; /     \ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  0  -1&lt;/span&gt;  |  &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1  -2  1&lt;/span&gt; /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;22&lt;/sub&gt; to be 1, so divide row &lt;span class=&quot;divide-row&quot;&gt;2&lt;/span&gt; by &amp;minus;1, which makes the left side of &lt;var&gt;A&lt;/var&gt; a
  unit upper triangular matrix:

  &lt;pre&gt;/ 1 4/3 5/3 |  0  1/3 0 \     / 1 4/3 5/3 |  0  1/3 0 \
| 0  1   1  | 1/2  0  0 | --&gt; | 0  1   1  | 1/2  0  0 |
\ &lt;span class=&quot;divide-row&quot;&gt;0  0  -1 &lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt; 1  -2  1&lt;/span&gt; /     \ &lt;span class=&quot;divide-row&quot;&gt;0  0   1 &lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt;-1   2 -1&lt;/span&gt; /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;12&lt;/sub&gt; to be 0, so subtract row &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;2&lt;/span&gt; from row &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1&lt;/span&gt;:

  &lt;pre&gt;/ 1 4/3 5/3 |  0  1/3 0 \     / 1 4/3 5/3 |  0  1/3 0 \
| &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  1   1&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1/2  0  0&lt;/span&gt; | --&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  1   0&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;3/2 -2  1&lt;/span&gt; |
\ &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0  0   1&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;-1   2 -1&lt;/span&gt; /     \ 0  0   1  | -1   2 -1 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;02&lt;/sub&gt; to be 0, so subtract row &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;2&lt;/span&gt; scaled by 5/3 from row &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0&lt;/span&gt;:

  &lt;pre&gt;/ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1 4/3 5/3&lt;/span&gt; |  &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  1/3 0&lt;/span&gt; \     / &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1 4/3  0&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;5/3 -3 5/3&lt;/span&gt; \
| 0  1   0  | 3/2 -2  1 | --&gt; | 0  1   0  | 3/2 -2  1  |
\ &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0  0   1&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;-1   2 -1&lt;/span&gt; /     \ 0  0   1  | -1   2 -1  /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;01&lt;/sub&gt; to be 0, so subtract row &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;1&lt;/span&gt; scaled by 4/3 from row &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0&lt;/span&gt;, which makes the left side of &lt;var&gt;A&lt;/var&gt; the identity matrix:

  &lt;pre&gt;/ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1 4/3  0&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;5/3  -3 5/3&lt;/span&gt; \     / &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1  0   0&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;-1/3 -1/3 1/3&lt;/span&gt; \
| &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0  1   0&lt;/span&gt;  | &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;3/2 -2   1&lt;/span&gt;  | --&gt; | 0  1   0  |  3/2  -2   1  |
\ 0  0   1  | -1   2  -1  /     \ 0  0   1  |  -1    2  -1  /.&lt;/pre&gt;

  Since the left side of &lt;var&gt;A&lt;/var&gt; is the identity matrix, the right side of &lt;var&gt;A&lt;/var&gt; is &lt;var&gt;M&lt;/var&gt;&lt;sup&gt;-1&lt;/sup&gt;. Therefore,

  &lt;pre&gt;         / -1/3 -1/3 1/3 \
M^{-1} = |  3/2  -2   1  |
         \  -1    2  -1  /.&lt;/pre&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;matrixInverseDemo&apos;);
  render(h(MatrixInverseDemo, {
    initialElements: &apos;0, 2, 2, 3, 4, 5, 6, 6, 7&apos;, initialFieldType: &apos;rational&apos;,
    name: &apos;matrixInverseDemo&apos;,
    header: h(&apos;h3&apos;, null, &apos;Example 3: Matrix inversion via row reduction&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    buttonClass: &apos;interactive-example-button&apos;,
    allowFieldTypeChanges: false,
    swapRowAColor: &apos;#dc322f&apos;, // solarized red
    swapRowBColor: &apos;#268bd2&apos;, // solarized blue
    divideRowColor: &apos;#dc322f&apos;, // solarized red
    subtractScaledRowSrcColor: &apos;#268bd2&apos;, // solarized blue
    subtractScaledRowDestColor: &apos;#dc322f&apos;, // solarized red
  }), root.parent, root);
})();
&lt;/script&gt;

&lt;p&gt;Now notice one thing: if \(M\) has a row that is proportional to
  another row, then row reduction would eventually zero out one of the
  rows, causing the algorithm to fail, and signaling that \(M\) is
  non-invertible. In fact, a stronger statement is true: \(M\) has a
  row that can be expressed as a linear combination of other rows of
  \(M\) exactly when \(M\) is non-invertible. Informally, this means
  that the linear function corresponding to \(M\) has one of its
  outputs redundant with the other outputs, so it is essentially a
  linear function taking \(n\) inputs to fewer than \(n\) outputs, and
  such functions aren&amp;rsquo;t invertible.&lt;/p&gt;
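
&lt;p&gt;As a small illustration (the matrix is made up for this purpose), take a matrix whose second row is twice its first row. Subtracting \(2\) times row 0 from row 1 in the augmented matrix gives

  \[
    \left( \hskip -5pt
    \begin{array}{cc|cc}
      1 &amp; 2 &amp; 1 &amp; 0 \\
      2 &amp; 4 &amp; 0 &amp; 1
    \end{array}
    \hskip -5pt \right)
    \longrightarrow
    \left( \hskip -5pt
    \begin{array}{cc|cc}
      1 &amp; 2 &amp; 1 &amp; 0 \\
      0 &amp; 0 &amp; -2 &amp; 1
    \end{array}
    \hskip -5pt \right)\text{,}
  \]

  and now the left side has a row of zeros, so there is no non-zero entry to put at \(a_{11}\) and row reduction returns an error.&lt;/p&gt;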

&lt;p&gt;This gets us a partial explanation for what &amp;ldquo;sufficiently
  different&amp;rdquo; means for our parity functions. If one parity
  function is a linear combination of other parity functions, then it
  is redundant, and therefore not &amp;ldquo;sufficiently
  different&amp;rdquo;. Therefore, we want our parity matrix \(P\) to be
  such that no row can be expressed as a linear combination of other
  rows.&lt;/p&gt;

&lt;p&gt;However, this criterion for \(P\) isn&amp;rsquo;t quite enough
  to guarantee that all possible matrices \(M\) computed as
  part of &lt;code&gt;ReconstructData&lt;/code&gt; are invertible. For example,
  this criterion holds for the identity matrix \(I_n\), but if \(n &gt;
  1\) and you pick \(I_n\) as the parity matrix for \(n = m\), you can
  certainly end up with a constructed matrix \(M\) with repeated rows,
  since you&amp;rsquo;re starting by appending another copy of \(I_n\) on
  top of \(P = I_n\)! This explains in a different way why simply
  making a copy of the original data files makes for a poor erasure
  code, unless of course you only have one data file. We&amp;rsquo;re led
  to our next topic: what makes a parity matrix &amp;ldquo;optimal&amp;rdquo;?&lt;/p&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;7. Optimal parity matrices&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Recall from above that we form the square matrix

  \[
    M =
    \begin{pmatrix}
      e_{j_0} \\
      \vdots \\
      e_{j_{n-k-1}} \\
      p_0 \\
      \vdots \\
      p_{k-1}
    \end{pmatrix}
  \]

  by prepending some rows of the identity matrix to the first
  \(k\) rows of the parity matrix. We can generalize this a
  bit more, since we don&amp;rsquo;t have to take the first \(k\)
  rows, but instead can take any \(k\) rows of the parity
  matrix, whose indices we denote here as \(l_0, \dotsc, l_{k-1}\):

  \[
  M =
  \begin{pmatrix}
    e_{j_0} \\
    \vdots \\
    e_{j_{n-k-1}} \\
    p_{l_0} \\
    \vdots \\
    p_{l_{k-1}}
  \end{pmatrix}\text{.}
  \]

  So we want to construct \(P\) so that any such square matrix
  \(M\) formed from the rows of \(P\) is invertible. Therefore,
  we call a parity matrix \(P\) &lt;em&gt;optimal&lt;/em&gt; if it satisfies this
  criterion.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Fortunately, there is a simpler criterion for optimal parity
  matrices. First, define a &lt;a href=&quot;https://en.wikipedia.org/wiki/Matrix_(mathematics)#Submatrix&quot;&gt;&lt;em&gt;submatrix&lt;/em&gt;&lt;/a&gt;
  of a matrix \(P\) to be a matrix that you get by deleting
  any number of rows or columns, and call a matrix &lt;em&gt;non-empty&lt;/em&gt; if
  it has at least one row and one column. Then:

  &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem&amp;nbsp;1&lt;/span&gt;.)
    A parity matrix \(P\) is optimal exactly when any non-empty square
    submatrix of \(P\) is invertible.&lt;sup&gt;&lt;a href=&quot;#fn5&quot; id=&quot;r5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;

Note that this criterion is stronger than the one in the previous
  section, where we want a parity matrix \(P\) to have no row that can
  be expressed as a linear combination of other rows. That is, if any
  non-empty square submatrix of \(P\) is invertible, that means that
  no row can be expressed as a linear combination of other rows.&lt;sup&gt;&lt;a href=&quot;#fn6&quot; id=&quot;r6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; However, it is possible to have a matrix
  \(P\) where no row can be expressed as a linear combination of
  other rows, but which is not optimal. We&amp;rsquo;ve already seen an
  example above: \(I_n\) for \(n \gt 1\), and indeed,

  \[
    I_2 =
    \begin{pmatrix}
      1 &amp; 0 \\
      0 &amp; 1
    \end{pmatrix}\text{,}
  \]

  has the \(1 \times 1\) non-invertible submatrix
  \(\begin{pmatrix} 0 \end{pmatrix}\).&lt;/div&gt;

&lt;div class=&quot;interactive-example&quot;&gt;
  &lt;h3&gt;Example 4: An optimal parity matrix for \(m = 2\)&lt;/h3&gt;
  &lt;p&gt;Recall the parity matrix

    \[
      P =
      \begin{pmatrix}
        1 &amp; 1 &amp; 1 \\
        1 &amp; 2 &amp; 3
      \end{pmatrix}
    \]

    that we were using for our \(n = 3, m = 2\) example. For any \(n\),
    this matrix looks like

    \[
      P =
      \begin{pmatrix}
        1 &amp; 1 &amp; \cdots &amp; 1 \\
        1 &amp; 2 &amp; \cdots &amp; n
      \end{pmatrix}\text{.}
    \]

    A \(1 \times 1\) matrix is invertible exactly when its single
    element is non-zero, so any \(1 \times 1\) submatrix of \(P\) is
    invertible. Any \(2 \times 2\) submatrix of \(P\) looks like

    \[
      A =
      \begin{pmatrix}
        1 &amp; 1 \\
        a &amp; b
      \end{pmatrix}
    \]

    for \(a \ne b\), which, using the &lt;a href=&quot;https://en.wikipedia.org/wiki/Invertible_matrix#Inversion_of_2_.C3.97_2_matrices&quot;&gt;formula for inverses of \(2 \times 2\) matrices&lt;/a&gt;, has inverse

    \[
      A^{-1} = \begin{pmatrix} b/(b-a) &amp; -1/(b-a) \\ -a/(b-a) &amp; 1/(b-a) \end{pmatrix}\text{.}
    \]
    These are all the possible square submatrices of \(P\), so
    this \(P\) is an optimal parity matrix for \(m = 2\).&lt;/p&gt;
&lt;/div&gt;
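&lt;p&gt;For small matrices, the criterion of Theorem&amp;nbsp;1 can be checked by brute force. The following is only a sketch under stated assumptions: the helper names &lt;code&gt;det&lt;/code&gt;, &lt;code&gt;subsets&lt;/code&gt;, and &lt;code&gt;isOptimalParityMatrix&lt;/code&gt; are made up for illustration, the entries are assumed to be small integers so that ordinary number arithmetic stays exact, and the running time grows exponentially with the matrix size.&lt;/p&gt;

```javascript
// Exact determinant by cofactor expansion along the first row.
// Only suitable for tiny matrices with small integer entries.
function det(M) {
  if (M.length === 1) return M[0][0];
  return M[0].reduce((sum, entry, j) => {
    const minor = M.slice(1).map((row) => row.filter((_, c) => c !== j));
    const sign = j % 2 === 0 ? 1 : -1;
    return sum + sign * entry * det(minor);
  }, 0);
}

// All size-k subsets of [0, 1, ..., count - 1], each in increasing order.
function subsets(count, k, start = 0) {
  if (k === 0) return [[]];
  const result = [];
  for (let i = start; count - i >= k; i++) {
    for (const rest of subsets(count, k - 1, i + 1)) result.push([i, ...rest]);
  }
  return result;
}

// True exactly when every non-empty square submatrix of P has a
// non-zero determinant, i.e. is invertible over the rationals.
function isOptimalParityMatrix(P) {
  const m = P.length;
  const n = P[0].length;
  for (let k = 1; Math.min(m, n) >= k; k++) {
    for (const rows of subsets(m, k)) {
      for (const cols of subsets(n, k)) {
        const sub = rows.map((r) => cols.map((c) => P[r][c]));
        if (det(sub) === 0) return false;
      }
    }
  }
  return true;
}

console.log(isOptimalParityMatrix([[1, 1, 1], [1, 2, 3]])); // true
console.log(isOptimalParityMatrix([[1, 0], [0, 1]]));       // false
```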

&lt;p&gt;Then, finally, we now have a complete definition of what makes a
list of parity functions &amp;ldquo;sufficiently different&amp;rdquo;; it is
exactly when the corresponding parity matrix is optimal as we&amp;rsquo;ve
defined it above.&lt;/p&gt;

&lt;p&gt;Now this leads us to the question: how do we find such optimal
matrices? Fortunately, there&amp;rsquo;s a whole class of matrices that
are optimal: the &lt;em&gt;Cauchy matrices&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Let \(a_0, \dotsc, a_{m+n-1}\) be a sequence of distinct
  integers, meaning that no two \(a_i\) are equal, and let
  \(x_0, \dotsc, x_{m-1}\) be the first \(m\) integers of \(a_i\) with \(y_0, \dotsc, y_{n-1}\)
  the remaining integers. Then form the \(m \times n\)
  matrix \(A\) by setting its elements according to:

  \[
    a_{ij} = \frac{1}{x_i - y_j}\text{,}
  \]

  which is always defined since the denominator is never zero, by the distinctness of the \(a_i\). Then \(A\) is a &lt;em&gt;Cauchy matrix&lt;/em&gt;.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;What makes Cauchy matrices useful is the following theorem:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem&amp;nbsp;2&lt;/span&gt;.)
  Any non-empty square Cauchy matrix is invertible.&lt;/div&gt;

Combining this with the simple fact that any submatrix of a
  Cauchy matrix is also a Cauchy matrix, we get:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Corollary&amp;nbsp;1&lt;/span&gt;.)
  Any non-empty square submatrix of a Cauchy matrix is
  invertible, and thus any Cauchy parity matrix is optimal.&lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;interactive-example&quot; id=&quot;cauchyMatrixDemo&quot;&gt;
  &lt;h3&gt;Example 5: Cauchy matrices&lt;/h3&gt;
  Let
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;x&lt;/var&gt; = [ 1, 2, 3 ]
  &lt;/span&gt;
  and
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;y&lt;/var&gt; = [ -1, 4, 0 ].
  &lt;/span&gt;
  Then, the Cauchy matrix constructed from
  &lt;var&gt;x&lt;/var&gt; and &lt;var&gt;y&lt;/var&gt; is

  &lt;pre&gt;/ 1/2 -1/3  1  \
| 1/3 -1/2 1/2 |
\ 1/4  -1  1/3 /,&lt;/pre&gt;

  which has inverse

  &lt;pre&gt;/ -36/5 96/5 -36/5 \
| -3/10  9/5  -9/5 |
\  9/2  -9     3   /.&lt;/pre&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;cauchyMatrixDemo&apos;);
  render(h(CauchyMatrixDemo, {
    initialX: &apos;1, 2, 3&apos;, initialY: &apos;-1, 4, 0&apos;, initialFieldType: &apos;rational&apos;,
    name: &apos;cauchyMatrixDemo&apos;,
    header: h(&apos;h3&apos;, null, &apos;Example 5: Cauchy matrices&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    allowFieldTypeChanges: false,
  }), root.parent, root);
})();
&lt;/script&gt;

&lt;p&gt;Therefore, to generate an optimal parity matrix for any \((n,
  m)\), all we need to do is to generate an \(m \times n\)
  Cauchy matrix. We can pick any sequence of \(m +
  n\) distinct integers, so for simplicity let&amp;rsquo;s just use

  \[
    x_i = n + i \quad \text{and} \quad y_i = i\text{.}
  \]&lt;/p&gt;
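&lt;p&gt;As a rough sketch of this construction, the function below builds the \(m \times n\) Cauchy parity matrix for the choice \(x_i = n + i\), \(y_j = j\). The name &lt;code&gt;cauchyParityMatrix&lt;/code&gt; and the decision to represent each entry as a fraction string are assumptions made for illustration only; a real implementation would use a proper rational number type.&lt;/p&gt;

```javascript
// Builds the m x n Cauchy parity matrix for x_i = n + i and y_j = j.
// Every entry is 1 / (x_i - y_j), so each one is stored as the
// fraction string "1/d" (or "1" when d is 1) to keep it exact.
function cauchyParityMatrix(n, m) {
  const x = Array.from({ length: m }, (_, i) => n + i);
  const y = Array.from({ length: n }, (_, j) => j);
  return x.map((xi) =>
    y.map((yj) => {
      const d = xi - yj; // never zero, since x and y share no elements
      return d === 1 ? "1" : `1/${d}`;
    })
  );
}

console.log(JSON.stringify(cauchyParityMatrix(3, 2)));
// [["1/3","1/2","1"],["1/4","1/3","1/2"]]
```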

&lt;div class=&quot;interactive-example&quot;&gt;
  &lt;h3&gt;Example 6: Cauchy parity matrices for \(m = 2\)&lt;/h3&gt;
  &lt;p&gt;For \(n = 3, m = 2\), we have the sequences

    \[
      x_0 = 3, x_1 = 4 \quad \text{and} \quad y_0 = 0, y_1 = 1, y_2 = 2\text{,}
    \]

    so the corresponding Cauchy parity matrix is

    \[
      P = 
      \begin{pmatrix}
        1/3 &amp; 1/2 &amp; 1 \\
        1/4 &amp; 1/3 &amp; 1/2
      \end{pmatrix}\text{.}
    \]

    Similarly, for any \(n\),

    \[
      P =
      \begin{pmatrix}
        1/n &amp; \cdots &amp; 1/2 &amp; 1 \\
        1/(n + 1) &amp; \cdots &amp; 1/3 &amp; 1/2
      \end{pmatrix}\text{.}
    \]

    All entries of \(P\) are non-zero, so any \(1 \times 1\)
    submatrix of \(P\) is invertible. Any \(2 \times 2\) submatrix
    of \(P\) looks like

    \[
      A =
      \begin{pmatrix}
        1/a &amp; 1/b \\
        1/(a+1) &amp; 1/(b+1)
      \end{pmatrix}
    \]

    for \(a \ne b\), which, using the &lt;a href=&quot;https://en.wikipedia.org/wiki/Invertible_matrix#Inversion_of_2_.C3.97_2_matrices&quot;&gt;formula for inverses of \(2 \times 2\) matrices&lt;/a&gt;, has inverse

    \[
      A^{-1} =
      \begin{pmatrix}
        \frac{ab(a+1)}{b-a} &amp; -\frac{a(a+1)(b+1)}{b-a} \\
        -\frac{ab(b+1)}{b-a} &amp; \frac{b(a+1)(b+1)}{b-a}
      \end{pmatrix}\text{.}
    \]

    These are all the possible square submatrices of \(P\), so
    this \(P\) is an optimal parity matrix for \(m = 2\).&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Note that our first parity matrix for \(n = 3, m = 2\)
  isn&amp;rsquo;t a Cauchy matrix, since no Cauchy matrix can have
  repeating elements in a single row. That means that there
  are other possible optimal parity matrices that aren&amp;rsquo;t
  Cauchy matrices.&lt;sup&gt;&lt;a href=&quot;#fn7&quot; id=&quot;r7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Also, our previous parity matrices had integers, and
  Cauchy matrices have rational numbers (i.e.,
  fractions). This means that our parity numbers are now
  fractions. This isn&amp;rsquo;t a serious difference, though,
  since we&amp;rsquo;d have to deal with fractions when
  computing matrix inverses anyway. You could also change a
  parity matrix with fractions into one without by simply
  multiplying the entire matrix by some non-zero number that gets
  rid of all the fractions, which doesn&amp;rsquo;t change the
  optimality of the matrix. For example, we can multiply

  \[
    \begin{pmatrix}
      1/3 &amp; 1/2 &amp; 1 \\
      1/4 &amp; 1/3 &amp; 1/2
    \end{pmatrix}
  \]

  by \(12\) to get the equivalent parity matrix

  \[
    \begin{pmatrix}
      4 &amp; 6 &amp; 12 \\
      3 &amp; 4 &amp; 6
    \end{pmatrix}\text{.}
  \]
&lt;/p&gt;
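&lt;p&gt;One way to pick the multiplier is to take the least common multiple of all the denominators. The sketch below assumes every entry has numerator \(1\), as in the Cauchy parity matrices above, and takes the matrix of denominators as input; the name &lt;code&gt;clearDenominators&lt;/code&gt; is hypothetical.&lt;/p&gt;

```javascript
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

function lcm(a, b) {
  return (a / gcd(a, b)) * b;
}

// denominators[i][j] = d means that the matrix entry is 1/d.
// Multiplying every entry by the least common multiple of all the
// denominators yields a matrix of integers.
function clearDenominators(denominators) {
  const scale = denominators.flat().reduce((acc, d) => lcm(acc, d), 1);
  return denominators.map((row) => row.map((d) => scale / d));
}

console.log(JSON.stringify(clearDenominators([[3, 2, 1], [4, 3, 2]])));
// [[4,6,12],[3,4,6]]
```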

&lt;p&gt;Now our only remaining missing piece is this: how do we
  compute parity bytes instead of parity numbers? Answering
  this would render the above discussion of fractions moot. However, to do
  so, we first have to take another look at how we&amp;rsquo;re
  doing linear algebra.&lt;/p&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;8. Linear algebra over fields&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;We ultimately want our parity numbers to be parity bytes, which
  means that we want to work with matrices of bytes instead of
  matrices of rational numbers. In order to do that, we need to define
  an interface for matrix elements that preserves the operations and
  properties we care about, and then we have to figure out how to
  implement that interface using bytes.&lt;/p&gt;

&lt;p&gt;Looking at the rule for matrix multiplication, we need to be able
  to add and multiply matrix elements. Looking at how we do matrix
  inversion, we also need to be able to subtract and divide matrix
  elements. Finally, there are certain properties that hold for
  rational numbers that we implicitly assume when doing matrix
  operations, but that we have to make explicit for matrix elements.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;This leads us to the concept of a &lt;em&gt;field&lt;/em&gt;, which
  essentially defines the interface that matrix elements
  should implement. Here it is:

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;interface Field&amp;lt;T&amp;gt; {
  static Zero: T, One: T

  plus(b: T): T
  negate(): T

  times(b: T): T
  reciprocate(): T

  equals(b: T): bool

  minus(b: T) = this.plus(b.negate())
  dividedBy(b: T) = this.times(b.reciprocate())
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We need to be able to add and multiply field elements,
  which we&amp;rsquo;ll denote generically by \(\oplus\) and \(\otimes\). We
  also need to be able to take the negation (additive inverse) of an element \(x\),
  which we&amp;rsquo;ll denote by \(-x\), and the reciprocal (multiplicative inverse) of a
  non-zero element \(x\), which we&amp;rsquo;ll denote by
  \(x^{-1}\). Then we can define subtraction of field elements to be

  \[
    a \ominus b = a \oplus -b
  \]

  and division of field elements to be

  \[
    a \cldiv b = a \otimes b^{-1}\text{,}
  \]

  when \(b \ne 0\).&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Also, an implementation of &lt;code&gt;Field&lt;/code&gt; must satisfy further
properties, which are copied from the number laws you learn in school:

&lt;ul&gt;
  &lt;li&gt;Identities: \(a \oplus 0 = a \otimes 1 = a\).&lt;/li&gt;
  &lt;li&gt;Inverses: \(a \oplus -a = 0\), and for \(a \ne 0\), \(a
    \otimes a^{-1} = 1\).&lt;/li&gt;
  &lt;li&gt;Associativity: \((a \oplus b) \oplus c = a \oplus (b
    \oplus c)\), and \((a \otimes b) \otimes c = a \otimes (b
    \otimes c)\).&lt;/li&gt;
  &lt;li&gt;Commutativity: \(a \oplus b = b \oplus a\), and \(a \otimes
    b = b \otimes a\).&lt;/li&gt;
  &lt;li&gt;Distributivity: \(a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)\).&lt;/li&gt;
&lt;/ul&gt;

Of the above, guaranteeing the existence of reciprocals of
  non-zero elements is usually the non-trivial part. Now the
  rational numbers satisfy all of the above, since

  \[
    (p/q)^{-1} = q/p\text{,}
  \]

  so we say that they &lt;em&gt;form a field&lt;/em&gt;. However, the integers
  &lt;em&gt;do not&lt;/em&gt; form a field, since for example \(2\) has no
  integer reciprocal; only \(1\) and \(-1\) have integer
  reciprocals. Furthermore, as we saw earlier, the integers
  modulo \(256\), i.e. the numbers from \(0\) to \(255\) with
  standard arithmetic operations modulo \(256\), do not form a
  field, since \((2 \cdot b) \bmod 256 \ne
  1\) for any \(b\).&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;However, we can construct a field with \(257\) elements, using the
fact that \(257\) is a prime number, and the following theorem:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot; id=&quot;theorem-3&quot;&gt;Theorem&amp;nbsp;3&lt;/span&gt;.)
  Given a prime number \(p\), for every integer \(0 \lt a \lt p\),
  there is exactly one \(0 \lt b \lt p\) such that \((a \cdot b) \bmod
  p = 1\).&lt;/div&gt;

There are clever ways to find multiplicative inverses mod \(p\), but
  since \(257\) is so small, we can just brute-force it. So an
  implementation would look like:

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;class Field257Element : implements Field&amp;lt;Field257Element&amp;gt; {
  plus(b) { return (this + b) % 257 }
  negate() { return (257 - this) % 257 }
  times(b) { return (this * b) % 257 }
  reciprocate() {
    if (this == 0) { return Error }
    // Brute-force search over every non-zero element i.
    for i := 1 to 256 {
      if (this.times(i) == 1) { return i; }
    }
    return Error
  }
  ...
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
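&lt;p&gt;For experimenting, here is a runnable sketch of the same operations, written as free functions on plain numbers rather than as a class. The constant &lt;code&gt;MODULUS&lt;/code&gt; and the function names are choices made for this illustration, and inputs are assumed to already lie between \(0\) and \(256\).&lt;/p&gt;

```javascript
const MODULUS = 257;

const plus = (a, b) => (a + b) % MODULUS;
const negate = (a) => (MODULUS - a) % MODULUS;
const times = (a, b) => (a * b) % MODULUS;
const minus = (a, b) => plus(a, negate(b));

// Brute-force search for the unique b with a * b = 1 (mod 257).
function reciprocate(a) {
  if (a === 0) throw new RangeError("0 has no reciprocal");
  for (let b = 1; MODULUS > b; b++) {
    if (times(a, b) === 1) return b;
  }
  throw new Error("unreachable when the modulus is prime");
}

const dividedBy = (a, b) => times(a, reciprocate(b));

console.log(plus(23, 54));      // 77
console.log(negate(54));        // 203
console.log(minus(23, 54));     // 226
console.log(times(23, 54));     // 214
console.log(reciprocate(54));   // 119
console.log(dividedBy(23, 54)); // 167
```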

&lt;div class=&quot;interactive-example&quot; id=&quot;field257Demo&quot;&gt;
  &lt;h3&gt;Example 7: Field with 257 elements&lt;/h3&gt;
  Denote operations on the field with 257
  elements by a &lt;sub&gt;257&lt;/sub&gt; subscript, and let
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;a&lt;/var&gt; = 23
  &lt;/span&gt;
  and
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;b&lt;/var&gt; = 54.
  &lt;/span&gt;
  Then
  &lt;ul&gt;
    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;a&lt;/var&gt; +&lt;sub&gt;257&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt; = (23 + 54) mod 257 = &lt;span class=&quot;result&quot;&gt;77&lt;/span&gt;;
      &lt;/span&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &amp;minus;&lt;sub&gt;257&lt;/sub&gt;&lt;var&gt;b&lt;/var&gt; = (257 &amp;minus; 54) mod 257 = &lt;span class=&quot;result&quot;&gt;203&lt;/span&gt;;
      &lt;/span&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;a&lt;/var&gt; &amp;minus;&lt;sub&gt;257&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt; = &lt;var&gt;a&lt;/var&gt; +&lt;sub&gt;257&lt;/sub&gt; &amp;minus;&lt;sub&gt;257&lt;/sub&gt;&lt;var&gt;b&lt;/var&gt; = (23 + 203) mod 257 = &lt;span class=&quot;result&quot;&gt;226&lt;/span&gt;;
      &lt;/span&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;a&lt;/var&gt; &amp;times;&lt;sub&gt;257&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt; = (23 &amp;times; 54) mod 257 = &lt;span class=&quot;result&quot;&gt;214&lt;/span&gt;;
      &lt;/span&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        54 &amp;times;&lt;sub&gt;257&lt;/sub&gt; 119 = 1,
      &lt;/span&gt;
      so
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;b&lt;/var&gt;&lt;sup&gt;-1&lt;/sup&gt;&lt;sub&gt;257&lt;/sub&gt; = &lt;span class=&quot;result&quot;&gt;119&lt;/span&gt;;
      &lt;/span&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;a&lt;/var&gt; &amp;divide;&lt;sub&gt;257&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt; = &lt;var&gt;a&lt;/var&gt; &amp;times;&lt;sub&gt;257&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt;&lt;sup&gt;-1&lt;/sup&gt;&lt;sub&gt;257&lt;/sub&gt; = (23 &amp;times; 119) mod 257 = &lt;span class=&quot;result&quot;&gt;167&lt;/span&gt;,
      &lt;/span&gt;
      and indeed
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;b&lt;/var&gt; &amp;times;&lt;sub&gt;257&lt;/sub&gt; (&lt;var&gt;a&lt;/var&gt; &amp;divide;&lt;sub&gt;257&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt;) = (54 &amp;times; 167) mod 257 = 23 = &lt;var&gt;a&lt;/var&gt;.
      &lt;/span&gt;
    &lt;/li&gt;
  &lt;/ul&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;field257Demo&apos;);
  render(h(Field257Demo, {
    initialA: &apos;23&apos;, initialB: &apos;54&apos;,
    header: h(&apos;h3&apos;, null, &apos;Example 7: Field with 257 elements&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    resultColor: &apos;#268bd2&apos;, // solarized blue
  }), root.parent, root);
})();
&lt;/script&gt;

&lt;div class=&quot;p&quot;&gt;So this gets us closer, since we can use &lt;code&gt;Field257Element&lt;/code&gt; instead
of a rational number type when implementing &lt;code&gt;ComputeParity&lt;/code&gt; and &lt;code&gt;ReconstructData&lt;/code&gt;,
and if we&amp;rsquo;ve abstracted our &lt;code&gt;Matrix&lt;/code&gt; type correctly, almost everything should just work. However, there &lt;em&gt;is&lt;/em&gt; one
thing we need to check: Are Cauchy parity matrices still
optimal if we use fields other than the rational numbers? Fortunately, the answer is yes:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem&amp;nbsp;1, general version&lt;/span&gt;.)
  A parity matrix \(P\) over any field is optimal exactly when any
  non-empty square submatrix of \(P\) is invertible.&lt;/div&gt;

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem&amp;nbsp;2, general version&lt;/span&gt;.)
  Any non-empty square Cauchy matrix over any field is invertible.&lt;/div&gt;

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Corollary&amp;nbsp;1, general version&lt;/span&gt;.)
  Any non-empty square submatrix of a Cauchy matrix over any field is
  invertible, and thus any Cauchy parity matrix over any field is
  optimal.&lt;/div&gt;

However, note that to construct an \(m \times n\) Cauchy matrix, we
  need \(m + n\) distinct elements. So if we&amp;rsquo;re working with a
  field with \(257\) elements, then this imposes the condition that
  \(m + n \le 257\), i.e. using a finite field limits the number of
  data bytes and parity numbers you can have.&lt;/div&gt;

&lt;p&gt;Now the question remains: can we construct a field with \(256\)
  elements? As we saw above, we can&amp;rsquo;t do so the same way as we
  constructed the field with \(257\) elements. In fact, we need to
  start with defining different arithmetic operations on the
  integers. This brings us to the topic of
  &lt;em&gt;binary carry-less arithmetic&lt;/em&gt;.&lt;/p&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;9. Binary carry-less arithmetic&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;The basic idea with binary carry-less (which I&amp;rsquo;ll henceforth
  shorten to &amp;ldquo;carry-less&amp;rdquo;) arithmetic is to express all
  integers in binary, then perform all arithmetic operations using
  binary arithmetic, except ignoring all the carries.&lt;sup&gt;&lt;a href=&quot;#fn8&quot; id=&quot;r8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;How does this work with addition? Let&amp;rsquo;s denote binary
  carry-less add as \(\clplus\),&lt;sup&gt;&lt;a href=&quot;#fn9&quot; id=&quot;r9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt; and let&amp;rsquo;s see how it behaves on single binary digits:

  \[
    \begin{aligned}
      0 \clplus 0 &amp;= 0 \\
      0 \clplus 1 &amp;= 1 \\
      1 \clplus 0 &amp;= 1 \\
      1 \clplus 1 &amp;= 0\text{.}
    \end{aligned}
  \]

  This is just the exclusive or operation on bits, so if we do
  carry-less addition on any two integers, it turns out to be
  nothing but xor! Since xor can also be denoted by \(\clplus\),
  we can conveniently think of \(\clplus\) as meaning both carry-less
  addition and xor.&lt;/p&gt;
        
&lt;div class=&quot;interactive-example&quot; id=&quot;carrylessAddDemo&quot;&gt;
  &lt;h3&gt;Example 8: Carry-less addition&lt;/h3&gt;
  Let
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;a&lt;/var&gt; = 23
  &lt;/span&gt;
  and
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;b&lt;/var&gt; = 54.
  &lt;/span&gt;
  Then, with carry-less arithmetic,
  &lt;pre&gt;  a = 23 =  10111b
^ b = 54 = 110110b
           -------
           100001b&lt;/pre&gt;
  so
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;a&lt;/var&gt; &amp;oplus; &lt;var&gt;b&lt;/var&gt; = 100001&lt;sub&gt;b&lt;/sub&gt; =
    &lt;span class=&quot;result&quot;&gt;33&lt;/span&gt;.
  &lt;/span&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;carrylessAddDemo&apos;);
  render(h(AddDemo, {
    initialA: &apos;23&apos;, initialB: &apos;54&apos;,
    name: &apos;carrylessAddDemo&apos;,
    header: h(&apos;h3&apos;, null, &apos;Example 8: Carry-less addition&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    resultColor: &apos;#268bd2&apos;, // solarized blue
  }), root.parent, root);
})();
&lt;/script&gt;

&lt;p&gt;What about subtraction? Recall that \((a \clplus b) \clplus
  b = a\) for any \(a\) and \(b\). Therefore, every element
  \(b\) is its own (carry-less binary) additive inverse, which
  means that \(a \clminus b = a \clplus b\), i.e. carry-less
  subtraction is also just xor.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Carry-less_product&quot;&gt;Carry-less multiplication&lt;/a&gt;
  isn&amp;rsquo;t as simple, but recall that binary multiplication
  is just adding shifted copies of \(a\) based on which bits
  are set in \(b\) (or vice versa). To do carry-less
  multiplication, just ignore carries when adding the shifted
  copies again, i.e. xor shifted copies instead of adding
  them.&lt;/p&gt;
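&lt;p&gt;A minimal sketch of this shift-and-xor procedure follows. The name &lt;code&gt;clmul&lt;/code&gt; is used here for illustration, and the inputs are assumed to be non-negative integers small enough that the product fits in 31 bits, since JavaScript&amp;rsquo;s xor operates on 32-bit integers.&lt;/p&gt;

```javascript
// Carry-less product: xor together a copy of a shifted left by i
// for every bit i that is set in b.
function clmul(a, b) {
  let product = 0;
  for (let shifted = a; b !== 0; shifted *= 2, b = Math.floor(b / 2)) {
    if (b % 2 === 1) product ^= shifted;
  }
  return product;
}

console.log(clmul(23, 54)); // 994
console.log(clmul(3, 3));   // 5, whereas 3 * 3 = 9
```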

&lt;div class=&quot;interactive-example&quot; id=&quot;carrylessMulDemo&quot;&gt;
  &lt;h3&gt;Example 9: Carry-less multiplication&lt;/h3&gt;
  Let
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;a&lt;/var&gt; = 23
  &lt;/span&gt;
  and
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;b&lt;/var&gt; = 54.
  &lt;/span&gt;
  Then, with carry-less arithmetic,
  &lt;pre&gt;   a = 23 =       10111b
^* b = 54 =      110110b
            ------------
                 10111
          ^     10111
          ^   10111
          ^  10111
            ------------
             1111100010b&lt;/pre&gt;
  so
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;a&lt;/var&gt; &amp;otimes; &lt;var&gt;b&lt;/var&gt; = 1111100010&lt;sub&gt;b&lt;/sub&gt; =
    &lt;span class=&quot;result&quot;&gt;994&lt;/span&gt;.
  &lt;/span&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;carrylessMulDemo&apos;);
  render(h(MulDemo, {
    initialA: &apos;23&apos;, initialB: &apos;54&apos;,
    name: &apos;carrylessMulDemo&apos;,
    header: h(&apos;h3&apos;, null, &apos;Example 9: Carry-less multiplication&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    resultColor: &apos;#268bd2&apos;, // solarized blue
  }), root.parent, root);
})();
&lt;/script&gt;

&lt;p&gt;Finally, we can define carry-less division with remainder. Binary
  division with remainder is subtracting shifted copies of \(b\) from
  \(a\) until you get a remainder less than the divisor; then
  carry-less binary division with remainder is xor-ing shifted copies
  of \(b\) with \(a\) until you get a remainder. However,
  there&amp;rsquo;s a subtlety; with carry-less arithmetic, it&amp;rsquo;s not
  enough to stop when the remainder (for that step) is less than the
  divisor, because if the highest set bit of the remainder is the same
  as the highest set bit of the divisor, you can still xor with the
  divisor one more time to yield a smaller number (which then becomes
  the final remainder).&lt;/p&gt;

&lt;p&gt;Consider the example below, where we&amp;rsquo;re dividing \(55\) by
  \(19\). The first remainder is \(17\), which is less than \(19\),
  but still shares the same highest set bit, so we can xor one more
  time with \(19\) to get the remainder \(2\).&lt;/p&gt;

&lt;div class=&quot;interactive-example&quot; id=&quot;carrylessDivDemo&quot;&gt;
  &lt;h3&gt;Example 10: Carry-less division&lt;/h3&gt;
  Let
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;a&lt;/var&gt; = 55
  &lt;/span&gt;
  and
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;b&lt;/var&gt; = 19.
  &lt;/span&gt;
  Then, with carry-less arithmetic,
  &lt;pre&gt;                     11b
                --------
b = 19 = 10011b )110111b = 55 = a
               ^ 10011
                 -----
                  10001
                ^ 10011
                  -----
                     10b&lt;/pre&gt;
  so
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;a&lt;/var&gt; &amp;odiv; &lt;var&gt;b&lt;/var&gt; = 11&lt;sub&gt;b&lt;/sub&gt; =
    &lt;span class=&quot;result&quot;&gt;3&lt;/span&gt;
  &lt;/span&gt;
  with remainder
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    10&lt;sub&gt;b&lt;/sub&gt; = &lt;span class=&quot;result&quot;&gt;2&lt;/span&gt;.
  &lt;/span&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;carrylessDivDemo&apos;);
  render(h(DivDemo, {
    initialA: &apos;55&apos;, initialB: &apos;19&apos;,
    name: &apos;carrylessDivDemo&apos;,
    header: h(&apos;h3&apos;, null, &apos;Example 10: Carry-less division&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    resultColor: &apos;#268bd2&apos;, // solarized blue
  }), root.parent, root);
})();
&lt;/script&gt;
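&lt;p&gt;Carry-less division with remainder can be sketched as follows. The stopping condition compares bit lengths rather than values, which is what takes care of the subtlety described above. The names &lt;code&gt;bitLength&lt;/code&gt; and &lt;code&gt;cldivmod&lt;/code&gt; are made up for this illustration, and the inputs are assumed to be non-negative integers that fit in 31 bits.&lt;/p&gt;

```javascript
// Number of binary digits of n (0 for n = 0).
const bitLength = (n) => (n === 0 ? 0 : n.toString(2).length);

// Carry-less division with remainder of a by b, for b > 0.
// Keep xor-ing a shifted copy of b into the remainder for as long as
// the remainder has at least as many bits as b.
function cldivmod(a, b) {
  if (b === 0) throw new RangeError("division by zero");
  let quotient = 0;
  let remainder = a;
  while (bitLength(remainder) >= bitLength(b)) {
    const shift = bitLength(remainder) - bitLength(b);
    quotient ^= 2 ** shift;
    remainder ^= b * 2 ** shift;
  }
  return { quotient, remainder };
}

console.log(cldivmod(55, 19)); // { quotient: 3, remainder: 2 }
```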

&lt;p&gt;This leads to an interesting difference between the carry-less
  modulo operation and the standard modulo operation. If you mod by a
  number \(n\), you get \(n\) possible remainders, from \(0\) to \(n -
  1\). However, if you clmod (carry-less mod) by a number \(2^k \le n
  \lt 2^{k+1}\), you get \(2^k\) possible remainders, from \(0\) to
  \(2^k-1\), since those are the numbers whose highest set bit is
  lower than the highest set bit of \(n\).&lt;/p&gt;

&lt;p&gt;In particular, if you clmod by a number \(256 \le n &amp;lt;
  512\), you always get \(256\) possible remainders. This is
  very close to what we want&amp;mdash;now the hope is to find &lt;em&gt;some&lt;/em&gt; \(256
  \le n &amp;lt; 512\) so that doing binary carry-less arithmetic clmod
  \(n\) yields a field, which will then be a field with \(256\)
  elements!&lt;/p&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;10. The finite field with \(256\) elements&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Since there are only a few numbers between \(256\) and \(512\), we
  can just try each one of them to see if clmod-ing by one of them
  yields a field. However, with a bit of math, we can gain more
  insight into which numbers will work.&lt;/p&gt;

&lt;p&gt;Recall the situation with the standard arithmetic
  operations: arithmetic mod \(p\) yields a field exactly when
  \(p\) is prime.&lt;sup&gt;&lt;a href=&quot;#fn10&quot; id=&quot;r10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt; But recall
  the definition of a prime number: it is an integer greater than
  \(1\) whose positive divisors are only itself and \(1\). Stated
  another way, a prime number is an integer \(p \gt 1\) that cannot be
  expressed as \(p = a \cdot b\), for \(a, b \gt 1\).&lt;/p&gt;

&lt;p&gt;Thus, the concept of a prime number is determined by the
  multiplication operation, and therefore we can define a
  &amp;ldquo;carry-less&amp;rdquo; prime number to be an integer \( p \gt 1\)
  that cannot be expressed as \(p = a \clmul b\), for \(a, b \gt 1\).&lt;sup&gt;&lt;a href=&quot;#fn11&quot; id=&quot;r11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;The only question remaining is whether there is an equivalent of &lt;a href=&quot;#theorem-3&quot;&gt;Theorem&amp;nbsp;3&lt;/a&gt; for
  carry-less arithmetic. And indeed there is:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem&amp;nbsp;4&lt;/span&gt;.)
  Given a carry-less prime number \(2^k \le p \lt 2^{k+1}\), for every
  integer \(0 \lt a \lt 2^k\), there is exactly one \(0 \lt b \lt
  2^k\) such that \((a \clmul b) \bclmod p = 1\).&lt;/div&gt;

Now we just need to find a carry-less prime number \(256
  \le p &amp;lt; 512\). However, the set of prime numbers and the
  set of carry-less prime numbers are not necessarily related,
  so for example, even though \(257\) is a prime number, it is &lt;em&gt;not&lt;/em&gt; a
  carry-less prime number.&lt;/div&gt;

&lt;p&gt;It is easy enough to test each number \(256 \le n &amp;lt; 512\) for
  carry-less primality though; doing so, we find the lowest one,
  \(283\).&lt;sup&gt;&lt;a href=&quot;#fn12&quot; id=&quot;r12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
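&lt;p&gt;The search can be sketched as below. Instead of trying every product \(a \clmul b\), it uses the equivalent test that no smaller number greater than \(1\) divides \(n\) with carry-less remainder \(0\). The helper names are hypothetical, and this is a brute-force illustration rather than an efficient primality test.&lt;/p&gt;

```javascript
// Number of binary digits of n (0 for n = 0).
const bitLength = (n) => (n === 0 ? 0 : n.toString(2).length);

// Remainder of the carry-less division of a by b, for b > 0.
function clmod(a, b) {
  while (bitLength(a) >= bitLength(b)) {
    a ^= b * 2 ** (bitLength(a) - bitLength(b));
  }
  return a;
}

// n is a carry-less prime when n > 1 and no d strictly between
// 1 and n divides it, i.e. clmod(n, d) is never 0 for such d.
function isCarrylessPrime(n) {
  if (2 > n) return false;
  for (let d = 2; n > d; d++) {
    if (clmod(n, d) === 0) return false;
  }
  return true;
}

const carrylessPrimes = [];
for (let n = 256; 512 > n; n++) {
  if (isCarrylessPrime(n)) carrylessPrimes.push(n);
}

console.log(carrylessPrimes[0]);    // 283
console.log(isCarrylessPrime(257)); // false
```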

&lt;div class=&quot;p&quot;&gt;So finally, we have a field with \(256\) elements: the
  integers with binary carry-less arithmetic clmod \(283\). An
  implementation would look like:

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;class Field256Element : implements Field&amp;lt;Field256Element&amp;gt; {
  plus(b) { return this ^ b }
  negate() { return this }
  times(b) { return clmod(clmul(this, b), 283) }
  reciprocate() {
    if (this == 0) { return Error }
    for i := 1 to 255 {
      if (this.times(i) == 1) { return i; }
    }
    return Error
  }
  ...
}&lt;/code&gt;&lt;/pre&gt;

  As with reciprocals mod \(257\), we find reciprocals clmod \(283\)
  by brute force.&lt;/div&gt;
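&lt;p&gt;For readers who want something runnable, here is a minimal sketch of the same operations as plain JavaScript functions on numbers. It assumes field elements are integers from \(0\) to \(255\); the standalone function names and the &lt;code&gt;bitLength&lt;/code&gt; helper are choices made for this sketch rather than part of the class above.&lt;/p&gt;

```javascript
const MODULUS = 283; // the carry-less prime chosen above

// Number of binary digits in n.
const bitLength = (n) => 32 - Math.clz32(n);

// Carry-less product: XOR together one shifted copy of a per set bit of b.
function clmul(a, b) {
  let product = 0;
  while (b > 0) {
    if (b % 2 === 1) product ^= a;
    a *= 2;
    b = Math.floor(b / 2);
  }
  return product;
}

// Carry-less remainder of a divided by a non-zero m.
function clmod(a, m) {
  while (bitLength(a) >= bitLength(m)) {
    a ^= m * 2 ** (bitLength(a) - bitLength(m));
  }
  return a;
}

const plus = (a, b) => a ^ b;
const negate = (a) => a;
const times = (a, b) => clmod(clmul(a, b), MODULUS);

// Brute-force search for the b with a times b equal to 1.
function reciprocate(a) {
  if (a === 0) throw new RangeError('0 has no reciprocal');
  for (let b = 1; b !== 256; b++) {
    if (times(a, b) === 1) return b;
  }
  throw new Error('unreachable if MODULUS is a carry-less prime');
}

const dividedBy = (a, b) => times(a, reciprocate(b));

console.log(plus(23, 54));      // 33
console.log(times(23, 54));     // 207
console.log(reciprocate(54));   // 102
console.log(dividedBy(23, 54)); // 19
```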

&lt;div class=&quot;interactive-example&quot; id=&quot;field256Demo&quot;&gt;
  &lt;h3&gt;Example 11: Field with 256 elements&lt;/h3&gt;
  Denote operations on the field with 256
  elements by a &lt;sub&gt;256&lt;/sub&gt; subscript, and let
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;a&lt;/var&gt; = 23
  &lt;/span&gt;
  and
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;b&lt;/var&gt; = 54.
  &lt;/span&gt;
  Then
  &lt;ul&gt;
    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;a&lt;/var&gt; &amp;oplus;&lt;sub&gt;256&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt; = 23 &amp;oplus; 54 = &lt;span class=&quot;result&quot;&gt;33&lt;/span&gt;;
      &lt;/span&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &amp;ominus;&lt;sub&gt;256&lt;/sub&gt;&lt;var&gt;b&lt;/var&gt; = &lt;var&gt;b&lt;/var&gt; = &lt;span class=&quot;result&quot;&gt;54&lt;/span&gt;;
      &lt;/span&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;a&lt;/var&gt; &amp;ominus;&lt;sub&gt;256&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt; = &lt;var&gt;a&lt;/var&gt; &amp;oplus;&lt;sub&gt;256&lt;/sub&gt; &amp;ominus;&lt;sub&gt;256&lt;/sub&gt;&lt;var&gt;b&lt;/var&gt; = &lt;var&gt;a&lt;/var&gt; &amp;oplus;&lt;sub&gt;256&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt; = &lt;span class=&quot;result&quot;&gt;33&lt;/span&gt;;
      &lt;/span&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;a&lt;/var&gt; &amp;otimes;&lt;sub&gt;256&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt; = (23 &amp;otimes; 54) clmod 283 = &lt;span class=&quot;result&quot;&gt;207&lt;/span&gt;;
      &lt;/span&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        54 &amp;otimes;&lt;sub&gt;256&lt;/sub&gt; 102 = 1,
      &lt;/span&gt;
      so
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;b&lt;/var&gt;&lt;sup&gt;-1&lt;/sup&gt;&lt;sub&gt;256&lt;/sub&gt; = &lt;span class=&quot;result&quot;&gt;102&lt;/span&gt;;
      &lt;/span&gt;
    &lt;/li&gt;

    &lt;li&gt;
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;a&lt;/var&gt; &amp;oslash;&lt;sub&gt;256&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt; = &lt;var&gt;a&lt;/var&gt; &amp;otimes;&lt;sub&gt;256&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt;&lt;sup&gt;-1&lt;/sup&gt;&lt;sub&gt;256&lt;/sub&gt; = (23 &amp;otimes; 102) clmod 283 = &lt;span class=&quot;result&quot;&gt;19&lt;/span&gt;,
      &lt;/span&gt;
      and indeed
      &lt;span style=&quot;white-space: nowrap;&quot;&gt;
        &lt;var&gt;b&lt;/var&gt; &amp;otimes;&lt;sub&gt;256&lt;/sub&gt; (&lt;var&gt;a&lt;/var&gt; &amp;oslash;&lt;sub&gt;256&lt;/sub&gt; &lt;var&gt;b&lt;/var&gt;) = (54 &amp;otimes; 19) clmod 283 = 23 = &lt;var&gt;a&lt;/var&gt;.
      &lt;/span&gt;
    &lt;/li&gt;
  &lt;/ul&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;field256Demo&apos;);
  render(h(Field256Demo, {
    initialA: &apos;23&apos;, initialB: &apos;54&apos;,
    header: h(&apos;h3&apos;, null, &apos;Example 11: Field with 256 elements&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    resultColor: &apos;#268bd2&apos;, // solarized blue
  }), root.parent, root);
})();
&lt;/script&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;11. The full algorithm&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Now we have all the pieces we need to construct erasure codes for
  any \((n, m)\) such that \(m + n \le 256\). First, we can compute an
  \(m \times n\) Cauchy parity matrix over the field with \(256\)
  elements. (Recall that this needs \(m + n\) distinct field elements,
  which is what imposes the condition \(m + n \le 256\).)&lt;/p&gt;
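&lt;p&gt;As a rough sketch of that first step, the following JavaScript builds a Cauchy matrix over the field with \(256\) elements, with entry \((i, j)\) equal to the reciprocal of \(x_i \clminus y_j\). The names &lt;code&gt;gfMul&lt;/code&gt;, &lt;code&gt;gfInv&lt;/code&gt;, and &lt;code&gt;cauchyMatrix&lt;/code&gt; are hypothetical, and &lt;code&gt;gfMul&lt;/code&gt; reduces clmod \(283\) as it goes instead of calling separate clmul and clmod functions; that is an assumption of this sketch, not necessarily how the demos do it.&lt;/p&gt;

```javascript
// Multiplication in the field with 256 elements, reducing clmod 283
// whenever the running multiple of a grows past 8 bits.
function gfMul(a, b) {
  let product = 0;
  while (b > 0) {
    if (b % 2 === 1) product ^= a;
    a *= 2;
    if (a >= 256) a ^= 283;
    b = Math.floor(b / 2);
  }
  return product;
}

// Brute-force reciprocal.
function gfInv(a) {
  if (a === 0) throw new RangeError('0 has no reciprocal');
  for (let b = 1; b !== 256; b++) {
    if (gfMul(a, b) === 1) return b;
  }
  throw new Error('unreachable in a field');
}

// Cauchy matrix built from x and y, which together must hold
// distinct field elements. Subtraction in this field is XOR.
function cauchyMatrix(x, y) {
  if (new Set([...x, ...y]).size !== x.length + y.length) {
    throw new RangeError('x and y must hold distinct elements');
  }
  return x.map((xi) => y.map((yj) => gfInv(xi ^ yj)));
}

console.log(cauchyMatrix([1, 2, 3], [4, 5, 6]));
// [ [ 82, 203, 209 ], [ 123, 209, 203 ], [ 209, 123, 82 ] ]
```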

&lt;div class=&quot;interactive-example&quot; id=&quot;cauchyMatrixDemoGeneral&quot;&gt;
  &lt;h3&gt;Example 12: Cauchy matrices in general&lt;/h3&gt;
  Working over the field with 256 elements, let
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;x&lt;/var&gt; = [ 1, 2, 3 ]
  &lt;/span&gt;
  and
  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;y&lt;/var&gt; = [ 4, 5, 6 ].
  &lt;/span&gt;
  Then, the Cauchy matrix constructed from
  &lt;var&gt;x&lt;/var&gt; and &lt;var&gt;y&lt;/var&gt; is
  &lt;pre&gt;/  82 203 209 \
| 123 209 203 |
\ 209 123  82 /,&lt;/pre&gt;
  which has inverse
  &lt;pre&gt;/ 130  31 176 \
| 252 219  31 |
\ 108 252 130 /.&lt;/pre&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;cauchyMatrixDemoGeneral&apos;);
  render(h(CauchyMatrixDemo, {
    initialX: &apos;1, 2, 3&apos;, initialY: &apos;4, 5, 6&apos;, initialFieldType: &apos;gf256&apos;,
    name: &apos;cauchyMatrixDemoGeneral&apos;,
    header: h(&apos;h3&apos;, null, &apos;Example 12: Cauchy matrices in general&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    allowFieldTypeChanges: true,
  }), root.parent, root);
})();
&lt;/script&gt;

&lt;p&gt;Then we can implement matrix multiplication over arbitrary fields,
  and thus we can implement &lt;code&gt;ComputeParity&lt;/code&gt;.&lt;/p&gt;
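&lt;p&gt;A minimal sketch of &lt;code&gt;ComputeParity&lt;/code&gt; in JavaScript might look like the following. It assumes the Cauchy parity matrix is built from \(x_i = n + i\) and \(y_j = j\), and it computes each matrix row on the fly; the helper names &lt;code&gt;gfMul&lt;/code&gt;, &lt;code&gt;gfInv&lt;/code&gt;, and &lt;code&gt;hex&lt;/code&gt; are made up for this sketch.&lt;/p&gt;

```javascript
// Multiplication in the field with 256 elements (carry-less, clmod 283).
function gfMul(a, b) {
  let product = 0;
  while (b > 0) {
    if (b % 2 === 1) product ^= a;
    a *= 2;
    if (a >= 256) a ^= 283;
    b = Math.floor(b / 2);
  }
  return product;
}

// Brute-force reciprocal.
function gfInv(a) {
  if (a === 0) throw new RangeError('0 has no reciprocal');
  for (let b = 1; b !== 256; b++) {
    if (gfMul(a, b) === 1) return b;
  }
  throw new Error('unreachable in a field');
}

// Multiplies the m-by-n Cauchy parity matrix by the data bytes.
function computeParity(data, m) {
  const n = data.length;
  if (n + m > 256) throw new RangeError('need m + n to be at most 256');
  const parity = [];
  for (let i = 0; i !== m; i++) {
    // Row i has entries 1 / (x_i - y_j); subtraction here is XOR.
    const row = data.map((_, j) => gfInv((n + i) ^ j));
    // Field addition is XOR, so the dot product is an XOR of products.
    parity.push(row.reduce((sum, v, j) => sum ^ gfMul(v, data[j]), 0));
  }
  return parity;
}

const hex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, '0'));
console.log(hex(computeParity([0xda, 0xdb, 0x0d], 2))); // [ '52', '0c' ]
```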

&lt;div class=&quot;interactive-example&quot; id=&quot;computeParityDetailDemo&quot;&gt;
  &lt;h3&gt;Example 13: &lt;code&gt;ComputeParity&lt;/code&gt; in detail&lt;/h3&gt;
  Let

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;d&lt;/var&gt; = [ da, db, 0d ]
  &lt;/span&gt;

  be the input data bytes and let

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;m&lt;/var&gt; = 2
  &lt;/span&gt;

  be the desired parity byte count. Then, with the input byte
  count

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;n&lt;/var&gt; = 3,
  &lt;/span&gt;

  the

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;m&lt;/var&gt; &amp;times; &lt;var&gt;n&lt;/var&gt;
  &lt;/span&gt;

  Cauchy parity matrix computed using

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;x&lt;/var&gt;&lt;sub&gt;i&lt;/sub&gt; = &lt;var&gt;n&lt;/var&gt; + &lt;var&gt;i&lt;/var&gt;
  &lt;/span&gt;

  and

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;y&lt;/var&gt;&lt;sub&gt;i&lt;/sub&gt; = &lt;var&gt;i&lt;/var&gt;
  &lt;/span&gt;

  is
  &lt;pre&gt;/ f6 8d 01 \
\ cb 52 7b /.&lt;/pre&gt;
  Therefore, the parity bytes are computed as
  &lt;pre&gt;                _    _     _    _
/ f6 8d 01 \   |  da  |   |  52  |
\ cb 52 7b / * |  db  | = |_ 0c _|,
               |_ 0d _|&lt;/pre&gt;
  and thus the output parity bytes are

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;p&lt;/var&gt; = [ &lt;span class=&quot;result&quot;&gt;52&lt;/span&gt;, &lt;span class=&quot;result&quot;&gt;0c&lt;/span&gt; ].
  &lt;/span&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;computeParityDetailDemo&apos;);
  render(h(ComputeParityDemo, {
    initialD: &apos;da, db, 0d&apos;, initialM: &apos;2&apos;,
    name: &apos;computeParityDetailDemo&apos;,
    detailed: true,
    header: h(&apos;h3&apos;, null, &apos;Example 13: &apos;, h(&apos;code&apos;, null, &apos;ComputeParity&apos;), &apos; in detail&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    resultColor: &apos;#268bd2&apos;, // solarized blue
  }), root.parent, root);
})();
&lt;/script&gt;

&lt;p&gt;Then we can implement matrix inversion using row reduction over
  arbitrary fields.&lt;/p&gt;
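&lt;p&gt;Here is one way row reduction over the field with \(256\) elements could be sketched in JavaScript. It is a Gauss&amp;ndash;Jordan variant that clears each column above and below the pivot in a single pass, so the order of its row operations may differ from a step-by-step walkthrough, though the resulting inverse is the same. The names &lt;code&gt;gfMul&lt;/code&gt;, &lt;code&gt;gfInv&lt;/code&gt;, and &lt;code&gt;invertMatrix&lt;/code&gt; are assumptions of this sketch.&lt;/p&gt;

```javascript
// Multiplication in the field with 256 elements (carry-less, clmod 283).
function gfMul(a, b) {
  let product = 0;
  while (b > 0) {
    if (b % 2 === 1) product ^= a;
    a *= 2;
    if (a >= 256) a ^= 283;
    b = Math.floor(b / 2);
  }
  return product;
}

// Brute-force reciprocal.
function gfInv(a) {
  if (a === 0) throw new RangeError('0 has no reciprocal');
  for (let b = 1; b !== 256; b++) {
    if (gfMul(a, b) === 1) return b;
  }
  throw new Error('unreachable in a field');
}

// Inverts a square matrix by row-reducing the augmented matrix [M | I].
function invertMatrix(M) {
  const n = M.length;
  const A = M.map((row, i) => [...row, ...row.map((_, j) => (i === j ? 1 : 0))]);
  for (let c = 0; c !== n; c++) {
    // Find a row at or below c with a non-zero entry in column c.
    const offset = A.slice(c).findIndex((row) => row[c] !== 0);
    if (offset === -1) throw new Error('matrix is not invertible');
    const pivot = c + offset;
    [A[c], A[pivot]] = [A[pivot], A[c]];
    // Divide the pivot row so that its leading entry becomes 1.
    const scale = gfInv(A[c][c]);
    A[c] = A[c].map((v) => gfMul(v, scale));
    // Subtract (XOR) a scaled copy of the pivot row from every other row.
    A.forEach((row, r) => {
      if (r === c || row[c] === 0) return;
      const factor = row[c];
      A[r] = row.map((v, j) => v ^ gfMul(factor, A[c][j]));
    });
  }
  // The left half is now the identity; the right half is the inverse.
  return A.map((row) => row.slice(n));
}

console.log(invertMatrix([[0, 2, 2], [3, 4, 5], [6, 6, 7]]));
// [ [ 82, 82, 82 ], [ 121, 247, 246 ], [ 244, 247, 246 ] ]
```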

&lt;div class=&quot;interactive-example&quot; id=&quot;matrixInverseDemoGeneral&quot;&gt;
  &lt;h3&gt;Example 14: Matrix inversion via row reduction in general&lt;/h3&gt;
  Working over the field with 256 elements, let

  &lt;pre&gt;    / 0 2 2 \
M = | 3 4 5 |
    \ 6 6 7 /.&lt;/pre&gt;

  The initial augmented matrix &lt;var&gt;A&lt;/var&gt; is

  &lt;pre&gt;/ 0 2 2 | 1 0 0 \
| 3 4 5 | 0 1 0 |
\ 6 6 7 | 0 0 1 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;00&lt;/sub&gt; to be non-zero, so swap rows &lt;span class=&quot;swap-row-a&quot;&gt;0&lt;/span&gt; and &lt;span class=&quot;swap-row-b&quot;&gt;1&lt;/span&gt;:

  &lt;pre&gt;/ &lt;span class=&quot;swap-row-a&quot;&gt;0 2 2&lt;/span&gt; | &lt;span class=&quot;swap-row-a&quot;&gt;1 0 0&lt;/span&gt; \     / &lt;span class=&quot;swap-row-b&quot;&gt;3 4 5&lt;/span&gt; | &lt;span class=&quot;swap-row-b&quot;&gt;0 1 0&lt;/span&gt; \
| &lt;span class=&quot;swap-row-b&quot;&gt;3 4 5&lt;/span&gt; | &lt;span class=&quot;swap-row-b&quot;&gt;0 1 0&lt;/span&gt; | --&gt; | &lt;span class=&quot;swap-row-a&quot;&gt;0 2 2&lt;/span&gt; | &lt;span class=&quot;swap-row-a&quot;&gt;1 0 0&lt;/span&gt; |
\ 6 6 7 | 0 0 1 /     \ 6 6 7 | 0 0 1 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;00&lt;/sub&gt; to be 1, so divide row &lt;span class=&quot;divide-row&quot;&gt;0&lt;/span&gt; by 3:

  &lt;pre&gt;/ &lt;span class=&quot;divide-row&quot;&gt;3 4 5&lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt;0 1 0&lt;/span&gt; \     / &lt;span class=&quot;divide-row&quot;&gt;1 245 3&lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt;0 246 0&lt;/span&gt; \
| 0 2 2 | 1 0 0 | --&gt; | 0  2  2 | 1  0  0 |
\ 6 6 7 | 0 0 1 /     \ 6  6  7 | 0  0  1 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;20&lt;/sub&gt; to be 0, so subtract row &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0&lt;/span&gt; scaled by 6 from row &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;2&lt;/span&gt;:

  &lt;pre&gt;/ &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;1 245 3&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0 246 0&lt;/span&gt; \     / 1 245  3 | 0 246 0 \
| 0  2  2 | 1  0  0 | --&gt; | 0  2   2 | 1  0  0 |
\ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;6  6  7&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  0  1&lt;/span&gt; /     \ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  14 13&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  2  1&lt;/span&gt; /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;11&lt;/sub&gt; to be 1, so divide row &lt;span class=&quot;divide-row&quot;&gt;1&lt;/span&gt; by 2:

  &lt;pre&gt;/ 1 245  3 | 0 246 0 \     / 1 245  3 |  0  246 0 \
| &lt;span class=&quot;divide-row&quot;&gt;0  2   2&lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt;1  0  0&lt;/span&gt; | --&gt; | &lt;span class=&quot;divide-row&quot;&gt;0  1   1&lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt;141  0  0&lt;/span&gt; |
\ 0 14  13 | 0  2  1 /     \ 0 14  13 |  0   2  1 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;21&lt;/sub&gt; to be 0, so subtract row &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;1&lt;/span&gt; scaled by 14 from row &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;2&lt;/span&gt;:

  &lt;pre&gt;/ 1 245  3 |  0  246 0 \     / 1 245  3 |  0  246 0 \
| &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0  1   1&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;141  0  0&lt;/span&gt; | --&gt; | 0  1   1 | 141  0  0 |
\ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0 14  13&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt; 0   2  1&lt;/span&gt; /     \ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  0   3&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt; 7   2  1&lt;/span&gt; /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;22&lt;/sub&gt; to be 1, so divide row &lt;span class=&quot;divide-row&quot;&gt;2&lt;/span&gt; by 3, which makes the left side of &lt;var&gt;A&lt;/var&gt; a
  unit upper triangular matrix:

  &lt;pre&gt;/ 1 245  3 |  0  246 0 \     / 1 245  3 |  0  246  0  \
| 0  1   1 | 141  0  0 | --&gt; | 0  1   1 | 141  0   0  |
\ &lt;span class=&quot;divide-row&quot;&gt;0  0   3&lt;/span&gt; |  &lt;span class=&quot;divide-row&quot;&gt;7   2  1&lt;/span&gt; /     \ &lt;span class=&quot;divide-row&quot;&gt;0  0   1&lt;/span&gt; | &lt;span class=&quot;divide-row&quot;&gt;244 247 246&lt;/span&gt; /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;12&lt;/sub&gt; to be 0, so subtract row &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;2&lt;/span&gt; from row &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1&lt;/span&gt;:

  &lt;pre&gt;/ 1 245  3 |  0  246  0  \     / 1 245  3 |  0  246  0  \
| &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  1   1&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;141  0   0 &lt;/span&gt; | --&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0  1   0&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;121 247 246&lt;/span&gt; |
\ &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0  0   1&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;244 247 246&lt;/span&gt; /     \ 0  0   1 | 244 247 246 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;02&lt;/sub&gt; to be 0, so subtract row &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;2&lt;/span&gt; scaled by 3 from row &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0&lt;/span&gt;:

  &lt;pre&gt;/ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1 245  3&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt; 0  246  0 &lt;/span&gt; \     / &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1 245  0&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt; 7  244  1 &lt;/span&gt; \
| 0  1   0 | 121 247 246 | --&gt; | 0  1   0 | 121 247 246 |
\ &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0  0   1&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;244 247 246&lt;/span&gt; /     \ 0  0   1 | 244 247 246 /.&lt;/pre&gt;

  We need &lt;var&gt;A&lt;/var&gt;&lt;sub&gt;01&lt;/sub&gt; to be 0, so subtract row &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;1&lt;/span&gt; scaled by 245 from row &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;0&lt;/span&gt;, which makes the left side of &lt;var&gt;A&lt;/var&gt; the identity matrix:

  &lt;pre&gt;/ &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1 245  0&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt; 7  244  1 &lt;/span&gt; \     / &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt;1 0 0&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-dest&quot;&gt; 82  82  82&lt;/span&gt; \
| &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;0  1   0&lt;/span&gt; | &lt;span class=&quot;subtract-scaled-row-src&quot;&gt;121 247 246&lt;/span&gt; | --&gt; | 0 1 0 | 121 247 246 |
\ 0  0   1 | 244 247 246 /     \ 0 0 1 | 244 247 246 /.&lt;/pre&gt;

  Since the left side of &lt;var&gt;A&lt;/var&gt; is the identity matrix, the right side of &lt;var&gt;A&lt;/var&gt; is &lt;var&gt;M&lt;/var&gt;&lt;sup&gt;-1&lt;/sup&gt;. Therefore,

  &lt;pre&gt;         /  82  82  82 \
M^{-1} = | 121 247 246 |
         \ 244 247 246 /.&lt;/pre&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;matrixInverseDemoGeneral&apos;);
  render(h(MatrixInverseDemo, {
    initialElements: &apos;0, 2, 2, 3, 4, 5, 6, 6, 7&apos;, initialFieldType: &apos;gf256&apos;,
    name: &apos;matrixInverseDemoGeneral&apos;,
    header: h(&apos;h3&apos;, null, &apos;Example 14: Matrix inversion via row reduction in general&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    buttonClass: &apos;interactive-example-button&apos;,
    allowFieldTypeChanges: true,
    swapRowAColor: &apos;#dc322f&apos;, // solarized red
    swapRowBColor: &apos;#268bd2&apos;, // solarized blue
    divideRowColor: &apos;#dc322f&apos;, // solarized red
    subtractScaledRowSrcColor: &apos;#268bd2&apos;, // solarized blue
    subtractScaledRowDestColor: &apos;#dc322f&apos;, // solarized red
  }), root.parent, root);
})();
&lt;/script&gt;

&lt;p&gt;Finally, we can use that to implement &lt;code&gt;ReconstructData&lt;/code&gt;.&lt;/p&gt;
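&lt;p&gt;Putting the pieces together, a sketch of &lt;code&gt;ReconstructData&lt;/code&gt; in JavaScript could look like this. It assumes unknown bytes are passed as &lt;code&gt;null&lt;/code&gt; and that the parity matrix uses \(x_i = n + i\) and \(y_j = j\); these conventions and all helper names are assumptions of this sketch, not the API of the demo code.&lt;/p&gt;

```javascript
// Multiplication in the field with 256 elements (carry-less, clmod 283).
function gfMul(a, b) {
  let product = 0;
  while (b > 0) {
    if (b % 2 === 1) product ^= a;
    a *= 2;
    if (a >= 256) a ^= 283;
    b = Math.floor(b / 2);
  }
  return product;
}

// Brute-force reciprocal.
function gfInv(a) {
  if (a === 0) throw new RangeError('0 has no reciprocal');
  for (let b = 1; b !== 256; b++) {
    if (gfMul(a, b) === 1) return b;
  }
  throw new Error('unreachable in a field');
}

// Gauss-Jordan inversion of a square matrix over the field.
function invertMatrix(M) {
  const n = M.length;
  const A = M.map((row, i) => [...row, ...row.map((_, j) => (i === j ? 1 : 0))]);
  for (let c = 0; c !== n; c++) {
    const offset = A.slice(c).findIndex((row) => row[c] !== 0);
    if (offset === -1) throw new Error('matrix is not invertible');
    [A[c], A[c + offset]] = [A[c + offset], A[c]];
    const scale = gfInv(A[c][c]);
    A[c] = A[c].map((v) => gfMul(v, scale));
    A.forEach((row, r) => {
      if (r === c || row[c] === 0) return;
      const factor = row[c];
      A[r] = row.map((v, j) => v ^ gfMul(factor, A[c][j]));
    });
  }
  return A.map((row) => row.slice(n));
}

// Unknown bytes in partialData and partialParity are given as null.
function reconstructData(partialData, partialParity) {
  const n = partialData.length;
  const rows = [];  // first n rows that are not crossed out
  const known = []; // the known bytes matching those rows
  partialData.forEach((d, j) => {
    if (d === null) return;
    rows.push(partialData.map((_, k) => (j === k ? 1 : 0))); // identity row
    known.push(d);
  });
  partialParity.forEach((p, i) => {
    if (p === null || rows.length === n) return;
    rows.push(partialData.map((_, j) => gfInv((n + i) ^ j))); // parity row
    known.push(p);
  });
  if (rows.length !== n) throw new Error('not enough known bytes');
  // Multiply the inverse of the square matrix by the known bytes.
  return invertMatrix(rows).map((row) =>
    row.reduce((sum, v, k) => sum ^ gfMul(v, known[k]), 0));
}

const hex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, '0'));
console.log(hex(reconstructData([null, 0xdb, null], [0x52, 0x0c])));
// [ 'da', 'db', '0d' ]
```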

&lt;div class=&quot;interactive-example&quot; id=&quot;reconstructDataDetailDemo&quot;&gt;
  &lt;h3&gt;Example 15: &lt;code&gt;ReconstructData&lt;/code&gt; in detail&lt;/h3&gt;
  Let

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;d&lt;/var&gt;&lt;sub&gt;partial&lt;/sub&gt; = [ ??, db, ?? ]
  &lt;/span&gt;

  be the input partial data bytes and

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;p&lt;/var&gt;&lt;sub&gt;partial&lt;/sub&gt; = [ 52, 0c ]
  &lt;/span&gt;

  be the input partial parity bytes. Then, with the data byte
  count

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;n&lt;/var&gt; = 3
  &lt;/span&gt;

  and the parity byte count

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;m&lt;/var&gt; = 2,
  &lt;/span&gt;

  and appending the rows of the

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;m&lt;/var&gt; &amp;times; &lt;var&gt;n&lt;/var&gt;
  &lt;/span&gt;

  Cauchy parity matrix to the

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;n&lt;/var&gt; &amp;times; &lt;var&gt;n&lt;/var&gt;
  &lt;/span&gt;

  identity matrix, we get

  &lt;pre&gt;/ X01X X00X X00X \
|  00   01   00  |
| X00X X00X X01X |
|  f6   8d   01  |
\  cb   52   7b  /,&lt;/pre&gt;

  where the rows corresponding to the unknown data and parity
  bytes are crossed out. Taking the first &lt;var&gt;n&lt;/var&gt; rows that
  aren&amp;rsquo;t crossed out, we get the square matrix

  &lt;pre&gt;/ 00 01 00 \
| f6 8d 01 |
\ cb 52 7b /&lt;/pre&gt;

  which has inverse

  &lt;pre&gt;/ 01 d0 d6 \
| 01 00 00 |
\ 7b b8 bb /.&lt;/pre&gt;

  Therefore, the data bytes are reconstructed from the first
  &lt;var&gt;n&lt;/var&gt; known data and parity bytes as

  &lt;pre&gt;                _    _     _    _
/ 01 d0 d6 \   |  db  |   |  da  |
| 01 00 00 | * |  52  | = |  db  |
\ 7b b8 bb /   |_ 0c _|   |_ 0d _|,&lt;/pre&gt;

  and thus the output data bytes are

  &lt;span style=&quot;white-space: nowrap;&quot;&gt;
    &lt;var&gt;d&lt;/var&gt; = [ &lt;span class=&quot;result&quot;&gt;da&lt;/span&gt;, &lt;span class=&quot;result&quot;&gt;db&lt;/span&gt;, &lt;span class=&quot;result&quot;&gt;0d&lt;/span&gt; ].
  &lt;/span&gt;
&lt;/div&gt;
&lt;script&gt;
&apos;use strict&apos;;
(function() {
  const { h, render } = window.preact;
  const root = document.getElementById(&apos;reconstructDataDetailDemo&apos;);
  render(h(ReconstructDataDemo, {
    initialPartialD: &apos;??, db, ??&apos;, initialPartialP: &apos;52, 0c&apos;,
    name: &apos;reconstructDataDetailDemo&apos;,
    detailed: true,
    header: h(&apos;h3&apos;, null, &apos;Example 15: &apos;, h(&apos;code&apos;, null, &apos;ReconstructData&apos;), &apos; in detail&apos;),
    containerClass: &apos;interactive-example&apos;,
    inputClass: &apos;parameter&apos;,
    resultColor: &apos;#268bd2&apos;, // solarized blue
  }), root.parent, root);
})();
&lt;/script&gt;

&lt;p&gt;And we&amp;rsquo;re done!&lt;/p&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
  &lt;h2&gt;12. Further reading&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Next time we&amp;rsquo;ll talk about the PAR1 file format, which is a
  practical implementation of an erasure code very similar to the one
  described above, and the various challenges in making it perform well
  on sets of large files.&lt;/p&gt;

&lt;p&gt;For those of you interested in the mathematical details,
  I&amp;rsquo;ll also write a companion article. (This article is already
  quite long!)&lt;/p&gt;

&lt;p&gt;I gave &lt;a href=&quot;./magic-erasure-codes&quot;&gt;a 15-minute
  presentation&lt;/a&gt; for &lt;a href=&quot;https://wafflejs.com&quot;&gt;WaffleJS&lt;/a&gt; covering
  the same topics as this article but at a higher level and more
  informally.&lt;/p&gt;

&lt;p&gt;I got the idea for explaining the finite field with \(256\)
  elements in terms of binary carry-less arithmetic from &lt;a href=&quot;http://www.zlib.net/crc_v3.txt&quot;&gt;A
  Painless Guide to CRC Error Detection Algorithms&lt;/a&gt;, which is an
  excellent document in its own right.&lt;/p&gt;

&lt;p&gt;Most sources below use Vandermonde matrices, which I plan to cover
  in the next article on PAR1, instead of Cauchy matrices. Cauchy
  matrices are more foolproof, which is why I started with
  them. templexxx, whose Go implementation I cite below, &lt;a href=&quot;http://www.templex.xyz/blog/101/cauchy.html&quot;&gt;feels the same way&lt;/a&gt;. (His
  blog post is in Chinese, but &lt;a href=&quot;https://translate.google.com/&quot;&gt;Google Translate&lt;/a&gt; or
  a similar service translates it into English well enough.)&lt;/p&gt;

&lt;p&gt;I started learning about erasure codes from &lt;a href=&quot;https://web.eecs.utk.edu/~plank/&quot;&gt;James
  Plank&amp;rsquo;s&lt;/a&gt; papers. See &lt;a href=&quot;https://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.pdf&quot;&gt;A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like systems&lt;/a&gt;, but also make sure to read the very important &lt;a href=&quot;https://web.eecs.utk.edu/~plank/plank/papers/CS-03-504.pdf&quot;&gt;correction&lt;/a&gt; to it! &lt;a href=&quot;http://web.eecs.utk.edu/~plank/plank/papers/CS-05-569.pdf&quot;&gt;Optimizing
  Cauchy Reed-Solomon Codes for Fault-Tolerant Storage Applications&lt;/a&gt; covers
  Cauchy matrices, although in a slightly different context. The first
  part of Plank&amp;rsquo;s &lt;a href=&quot;http://web.eecs.utk.edu/~plank/plank/classes/cs560/560/notes/Erasure/2004-ICL.pdf&quot;&gt;All About Erasure Codes&lt;/a&gt; slides
  also contains a good overview of the encoding/decoding process,
  including a nifty color-coded matrix diagram.&lt;/p&gt;

&lt;p&gt;As for implementations, &lt;a href=&quot;https://github.com/klauspost/reedsolomon&quot;&gt;klauspost&lt;/a&gt; and &lt;a href=&quot;https://github.com/templexxx/reedsolomon&quot;&gt;templexxx&lt;/a&gt; have
  good ones written in Go. They were in turn inspired by &lt;a href=&quot;https://github.com/Backblaze/JavaReedSolomon&quot;&gt;Backblaze&amp;rsquo;s Java implementation&lt;/a&gt;. &lt;a href=&quot;https://www.backblaze.com/blog/reed-solomon/&quot;&gt;Backblaze&amp;rsquo;s
  accompanying blog post&lt;/a&gt; is also a good overview of the topic. The
  toy JS implementation powering the demos on this page is also
  available on &lt;a href=&quot;https://github.com/akalin/intro-erasure-codes&quot;&gt;my GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://people.cs.clemson.edu/~westall/851/rs-code.pdf&quot;&gt;An Introduction to Galois Fields and Reed-Solomon Coding&lt;/a&gt;&lt;sup&gt;&lt;a href=&quot;#fn13&quot; id=&quot;r13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt; covers
  much of the same material as I do, albeit assuming slightly more
  mathematical background.&lt;/p&gt;

&lt;p&gt;Going further afield, &lt;a href=&quot;https://research.swtch.com/field&quot;&gt;Russ Cox&lt;/a&gt;, &lt;a href=&quot;https://jeremykun.com/2015/03/23/the-codes-of-solomon-reed-and-muller/&quot;&gt;Jeremy Kun&lt;/a&gt;, and &lt;a href=&quot;https://www.nayuki.io/page/reed-solomon-error-correcting-code-decoder&quot;&gt;Nayuki&lt;/a&gt;
  also wrote about finite fields and Reed-Solomon codes.&lt;/p&gt;
&lt;/section&gt;

&lt;hr /&gt;

&lt;p class=&quot;thanks&quot;&gt;Thanks to Ying-zong Huang, Ryan Hitchman, Charles
  Ellis, and Josh Gao for comments/corrections/discussion.&lt;/p&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;


&lt;section class=&quot;footnotes&quot;&gt;
&lt;header&gt;
  &lt;h2&gt;Footnotes&lt;/h2&gt;
&lt;/header&gt;

&lt;p id=&quot;fn1&quot;&gt;[1] This discussion of linear algebra is necessarily
  abbreviated for our purposes. For a more general but still basic
  treatment, see &lt;a href=&quot;https://www.khanacademy.org/math/linear-algebra/matrix-transformations&quot;&gt;Khan Academy&lt;/a&gt;. &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn2&quot;&gt;[2] Here and throughout this document, I index vectors and
  matrices starting with \(0\), to better match array indices in
  code. Most math texts index vectors and matrices starting at \(1\). &lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn3&quot;&gt;[3] Now would be a good time to talk about the conventions
  I and other texts use. Following &lt;a href=&quot;https://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.pdf&quot;&gt;Plank&lt;/a&gt;,
  I use \(n\) for the data byte count and \(m\) for the parity
  byte count, and I represent arrays and vectors as &lt;em&gt;column vectors&lt;/em&gt;, where multiplication with a matrix is done with the column vector on the &lt;em&gt;right&lt;/em&gt;,
  which is the standard in most of math. However, in coding theory,
  \(k\) is used for the data byte count, which they call the &lt;em&gt;message length&lt;/em&gt;, and \(n\) is used for the sum of the data and parity byte counts, which they call the &lt;em&gt;codeword length&lt;/em&gt;. Furthermore,
  contrary to the rest of math, coding theory treats arrays and
  vectors as &lt;em&gt;row vectors&lt;/em&gt;, where multiplication with a matrix
  is done with the row vector on the &lt;em&gt;left&lt;/em&gt;, and the matrix used would be
  the transpose of the matrix that would be used with a column
  vector. &lt;a href=&quot;#r3&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn4&quot;&gt;[4] Khan Academy has a &lt;a href=&quot;https://www.khanacademy.org/math/algebra-home/alg-matrices/alg-determinants-and-inverses-of-large-matrices/v/inverting-matrices-part-3&quot;&gt;video stepping through an example&lt;/a&gt; for a \(3 \times 3\) matrix. &lt;a href=&quot;#r4&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn5&quot;&gt;[5] People with experience in coding theory might
  recognize that a parity matrix \(P\) being optimal is equivalent to
  the corresponding erasure code being &lt;a href=&quot;https://en.wikipedia.org/wiki/Singleton_bound#MDS_codes&quot;&gt;MDS&lt;/a&gt;. &lt;a href=&quot;#r5&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn6&quot;&gt;[6] An equivalent statement which is easier to see is that
  if a row could be expressed as a linear combination of other rows,
  then one would be able to construct a non-empty square submatrix of
  \(P\) with those rows, which would then be non-invertible. &lt;a href=&quot;#r6&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn7&quot;&gt;[7] It is instead a (transposed) &lt;a href=&quot;https://en.wikipedia.org/wiki/Vandermonde_matrix&quot;&gt;&lt;em&gt;Vandermonde matrix&lt;/em&gt;&lt;/a&gt;,
  which we&amp;rsquo;ll cover when we talk about the PAR1 file format in a
  follow-up article. &lt;a href=&quot;#r7&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn8&quot;&gt;[8] People with experience in abstract algebra might
  recognize this as &lt;a href=&quot;https://en.wikipedia.org/wiki/Finite_field_arithmetic#Effective_polynomial_representation&quot;&gt;arithmetic over \(\mathbb{F}_2[x]\)&lt;/a&gt;,
  the polynomials with coefficients in the finite field with
  \(2\) elements. &lt;a href=&quot;#r8&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn9&quot;&gt;[9] Our use of \(\clplus\), \(\clminus\), \(\clmul\), and
  \(\cldiv\) to denote carry-less arithmetic clashes with our use of
  the same symbols to denote generic field operations. However,
  we&amp;rsquo;ll never need to talk about both at the same time, so
  whichever one we mean should be obvious in context. &lt;a href=&quot;#r9&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn10&quot;&gt;[10] This is a slightly stronger statement than &lt;a href=&quot;#theorem-3&quot;&gt;Theorem&amp;nbsp;3&lt;/a&gt;. &lt;a href=&quot;#r10&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn11&quot;&gt;[11] People with experience in abstract algebra might recognize carry-less primes as &lt;a href=&quot;https://en.wikipedia.org/wiki/Irreducible_element&quot;&gt;irreducible
  elements&lt;/a&gt; of \(\mathbb{F}_2[x]\). &lt;a href=&quot;#r11&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn12&quot;&gt;[12] Coincidentally, \(283\) is also an ordinary prime
  number. Using another carry-less prime number \(256 \le p \lt 512\)
  would also yield a field with \(256\) elements, but it is important to
  consistently use the same carry-less modulus; different carry-less
  moduli lead to fields with \(256\) elements that are &lt;em&gt;isomorphic&lt;/em&gt;, but not identical.&lt;/p&gt;

&lt;p&gt;Borrowing &lt;a href=&quot;https://en.wikipedia.org/wiki/Mathematics_of_cyclic_redundancy_checks#Polynomial_representations&quot;&gt;notation from CRCs&lt;/a&gt;,
  the carry-less modulus is sometimes represented as a hexadecimal
  number with the leading digit (which is always \(1\)) omitted. For
  example, \(283\) would be represented as \(\mathtt{0x1b}\), and we can say
  that we&amp;rsquo;re using the field with \(256\) elements &lt;em&gt;defined
  by&lt;/em&gt;
  \(\mathtt{0x1b}\). &lt;a href=&quot;#r12&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn13&quot;&gt;[13] &lt;em&gt;Galois field&lt;/em&gt; is just another name for finite field. &lt;a href=&quot;#r13&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/quintic-unsolvability</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/quintic-unsolvability"/>
    <title>Why is the Quintic Unsolvable?</title>
    <updated>2016-09-26T00:00:00-07:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;link rel=&quot;stylesheet&quot; type=&quot;text/css&quot; href=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/jsxgraph.css&quot; /&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdnjs.cloudflare.com/ajax/libs/jsxgraph/0.99.5/jsxgraphcore.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/complex.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/complex_poly.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/animation.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/rotation_counter.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/display.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/complex_formula.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/quadratic.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/cubic.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/quartic.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/abel-ruffini-topological-proof@b8e50dd/quintic.js&quot;&gt;&lt;/script&gt;

&lt;!-- KaTeX messes up axes labels, for some reason, so remember to surround a
     jxgbox div with &lt;nokatex&gt;&lt;/nokatex&gt;. --&gt;
&lt;style&gt;
.graph {
  display: block;
  width: 300px;
  height: 300px;
  margin: 0.5em 0.2em;
}

.graph-container {
  display: inline-block;
  vertical-align: top;
  max-width: 300px;
}
&lt;/style&gt;

&lt;p&gt;&lt;em&gt;(This was discussed on &lt;a href=&quot;https://www.reddit.com/r/math/comments/57n07e/why_is_the_quintic_unsolvable/&quot;&gt;r/math&lt;/a&gt; and &lt;a href=&quot;https://news.ycombinator.com/item?id=14685466&quot;&gt;Hacker News&lt;/a&gt;.)&lt;/em&gt;&lt;/p&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;1. Overview&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;In this article, I hope to convince you that the quintic equation
  is unsolvable, in the sense that I can&amp;rsquo;t write down the solution
  to the equation
  \[
    ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0
  \]
  using only addition, subtraction, multiplication, division, raising
  to an integer power, and taking an integer root. In fact, I hope to
  go further and explain how this is true for the same reason
  that I can&amp;rsquo;t write down the solution to the equation
  \[
    ax^2 + bx + c = 0
  \]
  using only the first five operations above!&lt;/p&gt;

&lt;p&gt;The usual approach to the above claim involves a semester&amp;rsquo;s
  worth of abstract algebra and Galois theory. However, there&amp;rsquo;s
  a much easier and shorter proof which involves only a bit of group
  theory and complex analysis&amp;mdash;enough to fit in a blog
  post&amp;mdash;and some interactive
  visualizations.&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;2. Quadratic Equations&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Let&amp;rsquo;s start with quadratic equations, which hopefully you all
remember from high school. Given two complex numbers \(r_1\) and
\(r_2\), you can determine the quadratic equation whose solutions are
\(r_1\) and \(r_2\), namely

\[
  (x - r_1)(x - r_2) = x^2 - (r_1 + r_2) x + r_1 r_2 = 0\text{.}
\]

If we take the standard form of a quadratic equation to be

\[
  a x^2 + bx + c = 0\text{,}
\]

then we can define a function from \(r_1\) and \(r_2\) to \(a\), \(b\),
and \(c\), which is shown by the first two panels in the visualization below;
drag either of the points \(r_1\) and \(r_2\) and notice how \(b\) and
\(c\) move (\(a\) will always remain fixed at \(1\)).&lt;/p&gt;

&lt;p&gt;Now pretend that we misremember the quadratic formula as

\[
  x_{1, 2} = \frac{-b ± b^2 - 4ac}{4a}\text{.}
\]

The results of this formula&amp;mdash;our candidate solution&amp;mdash;are
shown in the third panel. Note that since \(x_1\) and \(x_2\) depend
on \(a\), \(b\), and \(c\), which all depend on \(r_1\) and \(r_2\),
they also move when you drag either \(r_1\) or \(r_2\).&lt;/p&gt;

&lt;div class=&quot;interactive-example&quot;&gt;
  &lt;h3&gt;Interactive Example 1: An incorrect quadratic formula&lt;/h3&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Roots
    &lt;nokatex&gt;&lt;div id=&quot;rootBoardQuad1&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;
    &lt;button class=&quot;interactive-example-button quad1DisableWhileSwapping&quot;
            type=&quot;button&quot; onclick=&quot;quad1.swap();&quot;&gt;
      Swap \(r_1\) and \(r_2\)
    &lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Coefficients
    &lt;nokatex&gt;&lt;div id=&quot;coeffBoardQuad1&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Candidate solution
    &lt;nokatex&gt;&lt;div id=&quot;formulaBoardQuad1&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
&apos;use strict&apos;;

function runOp(display, op, time, disableSelector, state, doneCallback) {
  if (state.running) {
    return;
  }
  state.running = true;

  var oldFixed = display.setRootsFixed(true);

  var elems = document.querySelectorAll(disableSelector);
  for (var i = 0; i &lt; elems.length; ++i) {
    elems[i].disabled = true;
  }

  op.run(time, function() {
    state.running = false;

    display.setRootsFixed(oldFixed);

    for (var i = 0; i &lt; elems.length; ++i) {
      elems[i].disabled = false;
    }
    if (doneCallback !== undefined) {
      doneCallback();
    }
  });
}

var incorrectQuadraticFormula = (function() {
  var a = ComplexFormula.select(-1);
  var b = ComplexFormula.select(-2);
  var x1 = b.neg().plus(quadraticDiscriminantFormula).div(a.times(4));
  var x2 = b.neg().minus(quadraticDiscriminantFormula).div(a.times(4));
  return x1.concat(x2);
})();

var quad1 = (function() {
  var initialRoots = [ new Complex(1, 0), new Complex(-1, 0) ];

  var display = new Display(
    &quot;rootBoardQuad1&quot;, &quot;coeffBoardQuad1&quot;, &quot;formulaBoardQuad1&quot;, initialRoots,
    incorrectQuadraticFormula, function() {});

  display._resultRotationCounterPoint.setAttribute({visible: false});

  var op = display.swapRootOp(0, 1, function() {});

  // Shared across calls so that runOp can ignore re-entrant swaps.
  var state = {};

  function swap() {
    runOp(display, op, 1000, &apos;.quad1DisableWhileSwapping&apos;, state);
  }

  return {
    display: display,
    swap: swap
  };
})();
&lt;/script&gt;

&lt;p&gt;Now this formula looks right, since \(x_1\) and \(x_2\) are at the
  same coordinates as \(r_1\) and \(r_2\). However, if you move
  \(r_1\) or \(r_2\) around, you can easily convince yourself that
  this formula can&amp;rsquo;t be right, since \(x_1\) and \(x_2\)
  don&amp;rsquo;t move in the same way.&lt;/p&gt;
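
&lt;p&gt;To see this concretely, here is a quick check by hand; the second
  pair of roots is chosen only for illustration. With the initial roots
  \(r_1 = 1\) and \(r_2 = -1\), we have \(a = 1\), \(b = 0\), and
  \(c = -1\), so
  \[
    \frac{-b \pm (b^2 - 4ac)}{4a} = \frac{0 \pm 4}{4} = \pm 1\text{,}
  \]
  which matches the roots. But with \(r_1 = 2\) and \(r_2 = -1\), we have
  \(a = 1\), \(b = -1\), and \(c = -2\), so
  \[
    \frac{-b \pm (b^2 - 4ac)}{4a} = \frac{1 \pm 9}{4} \in \left\{ \tfrac{5}{2}, -2 \right\}\text{,}
  \]
  which matches neither root.&lt;/p&gt;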

&lt;p&gt;Now if you remember from high school, the real quadratic formula
  involves taking a square root, and since our candidate solution
  doesn&amp;rsquo;t do that, that means it&amp;rsquo;s probably incorrect. I
  say &amp;ldquo;probably&amp;rdquo; because there&amp;rsquo;s no immediate reason
  why there can&amp;rsquo;t be &lt;em&gt;multiple&lt;/em&gt; quadratic formulas, some
  simpler than others, of which one is simple enough to not need a
  square root. From manipulating \(r_1\) and \(r_2\), we know that our
  candidate formula is incorrect, but that doesn&amp;rsquo;t immediately
  follow from it not having a square root.&lt;/p&gt;

&lt;p&gt;Fortunately, there is a general way to rule out candidate solutions
  that are similar to the one above, namely those that use only
  addition, subtraction, multiplication, division, and raising to an
  integer power; we&amp;rsquo;ll call these &lt;em&gt;rational expressions&lt;/em&gt;. Here&amp;rsquo;s
  how it goes: if you press the button to swap \(r_1\) and \(r_2\),
  which moves \(r_1\) to \(r_2\)&amp;rsquo;s position and vice versa,
  \(a\), \(b\), and \(c\) move from their starting positions but
  return once \(r_1\) and \(r_2\) reach their destinations. This makes
  sense, because the coefficients of a polynomial don&amp;rsquo;t depend
  on how you order the roots. But since \(x_1\) and \(x_2\) depend
  only on \(a\), \(b\), and \(c\), they too must loop back to their
  starting positions.&lt;/p&gt;

&lt;p&gt;But that means that our candidate solution cannot be the quadratic
  formula! If it were, then \(x_1\) and \(x_2\) would have ended up
  swapped, too. Instead, they went back to their starting positions,
  which is a contradiction. This reasoning holds for any expression
  which is a &lt;em&gt;single-valued&lt;/em&gt; function of \(a\), \(b\), and \(c\),
  so in particular this holds for rational expressions.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Let&amp;rsquo;s summarize our reasoning in a theorem:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem 1&lt;/span&gt;.) A
  rational expression&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; in the coefficients of the general quadratic
  equation
  \[
  ax^2 + bx + c = 0
  \]
  cannot be a solution to this equation.&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
&lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Sketch of proof.&lt;/span&gt; Assume to the
  contrary that the rational expression \(x = f(a, b, c)\) is a
  solution. Assume that we start with \(r_1 = z_1\) and \(r_2 = z_2
  \ne z_1\), and without loss of generality assume that we start with
  \(x = z_1\).&lt;/p&gt;

&lt;p&gt;Run \(r_1\) and \(r_2\) along continuous paths that swap their two
  positions, i.e. make \(r_1\) head from \(z_1\) to \(z_2\)
  continuously, and at the same time make \(r_2\) head from \(z_2\) to
  \(z_1\) continuously, and make sure to pick paths such that \(r_1\)
  and \(r_2\) never coincide.&lt;/p&gt;

&lt;p&gt;Since \(a\), \(b\), and \(c\) are continuous functions of \(r_1\)
and \(r_2\), and \(x\) is a rational function of \(a\), \(b\) and
\(c\), and thus continuous, \(x\) then depends continuously on \(r_1\)
and \(r_2\). Thus, since we start with \(x = r_1 = z_1\), and \(r_1\)
never coincides with \(r_2\), then as \(r_1\) moves, \(x = r_1\) must
continue to hold, since \(x\) is a solution, and therefore
\(x\)&amp;rsquo;s final position must be the same as \(r_1\)&amp;rsquo;s,
which is \(z_2\).&lt;/p&gt;

&lt;p&gt;However, since the coefficients \(a\), \(b\), and \(c\) don&amp;rsquo;t
  depend on the ordering of \(r_1\) and \(r_2\), then their final
  positions are the same as their initial positions. Since \(x\) is a
  function of only \(a\), \(b\), and \(c\), its final position also
  must be the same as its initial position, \(z_1\). This contradicts
  the above, and therefore \(x\) cannot be a solution. &amp;#x220e;&lt;/p&gt;
&lt;/div&gt;

Now consider the candidate solution

\[
  x_{1,2} = \sqrt{b^2 - 4ac}\text{.}
\]

This isn&amp;rsquo;t a rational expression since it has a square root. In
particular, in the visualization below, it behaves quite differently
from our first candidate solution. First, even though we have just a
single expression, it yields two points \(x_1\) and \(x_2\). Second,
and more surprisingly, if you swap \(r_1\) and \(r_2\), \(x_1\) and
\(x_2\) also exchange places, seemingly contradicting Theorem&amp;nbsp;1!
What is going on?
&lt;/div&gt;

&lt;div class=&quot;interactive-example&quot;&gt;
  &lt;h3&gt;Interactive Example 2: The quadratic equation&lt;/h3&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Roots
    &lt;nokatex&gt;&lt;div id=&quot;rootBoardQuad2&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;
    &lt;button class=&quot;interactive-example-button quad2DisableWhileSwapping&quot;
            type=&quot;button&quot; onclick=&quot;quad2.swap();&quot;&gt;
      Swap \(r_1\) and \(r_2\)
    &lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Coefficients
    &lt;nokatex&gt;&lt;div id=&quot;coeffBoardQuad2&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Candidate solution
    &lt;nokatex&gt;&lt;div id=&quot;formulaBoardQuad2&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;

    &lt;label&gt;
      &lt;input class=&quot;quad2DisableWhileSwapping&quot; name=&quot;quad2Formula&quot; type=&quot;radio&quot;
             onchange=&quot;quad2.switchFormula(incorrectQuadraticFormula);&quot; /&gt;
      \(x_{1, 2} = \frac{-b \pm b^2 - 4ac}{4a}\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;quad2DisableWhileSwapping&quot; name=&quot;quad2Formula&quot; type=&quot;radio&quot;
             onchange=&quot;quad2.switchFormula(quadraticDiscriminantFormula);&quot; /&gt;
      \(x_1 = b^2 - 4ac\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input checked class=&quot;quad2DisableWhileSwapping&quot; name=&quot;quad2Formula&quot; type=&quot;radio&quot;
             onchange=&quot;quad2.switchFormula(quadraticDiscriminantFormula.root(2));&quot; /&gt;
      \(x_{1, 2} = \sqrt{b^2 - 4ac}\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;quad2DisableWhileSwapping&quot; name=&quot;quad2Formula&quot; type=&quot;radio&quot;
             onchange=&quot;quad2.switchFormula(newQuadraticFormula());&quot; /&gt;
      \(x_{1, 2} = \frac{-b + \sqrt{b^2 - 4ac}}{2a}\)
      &lt;br /&gt;
      (the quadratic formula)
    &lt;/label&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
&apos;use strict&apos;;

function switchFormula(display, state, formula) {
  if (state.running) {
    return;
  }
  var numResults = display.setFormula(formula);
}

var quad2 = (function() {
  var initialRoots = [ new Complex(1, 0), new Complex(0, 1) ];

  var display = new Display(
    &quot;rootBoardQuad2&quot;, &quot;coeffBoardQuad2&quot;, &quot;formulaBoardQuad2&quot;, initialRoots,
    quadraticDiscriminantFormula.root(2), function() {});

  display._resultRotationCounterPoint.setAttribute({visible: false});

  var op = display.swapRootOp(0, 1, function() {});

  var state = {};

  function swap() {
    runOp(display, op, 1000, &apos;.quad2DisableWhileSwapping&apos;, state);
  }

  function switchQuadFormula(formula) {
    switchFormula(display, state, formula);
  }

  return {
    display: display,
    swap: swap,
    switchFormula: switchQuadFormula
  };
})();
&lt;/script&gt;

&lt;p&gt;To answer this, we first need to review some facts about complex
numbers. Recall that a complex number \(z\) can be expressed in polar
coordinates, where it has a length \(r\) and an angle \(θ\), and
that it can be converted to the usual Cartesian coordinates using &lt;a href=&quot;https://en.wikipedia.org/wiki/Euler%27s_formula&quot;&gt;Euler&amp;rsquo;s formula&lt;/a&gt;:
\[
  z = r e^{iθ} = r \cos θ + i \, r \sin θ\text{.}
\]
Then, if you have two complex numbers \(z_1 = r_1 e^{iθ_1}\) and
\(z_2 = r_2 e^{iθ_2}\) in polar form, you can multiply them by
  multiplying their lengths, and adding their angles:
\[
  z_1 z_2 = r_1 r_2 e^{i (θ_1 + θ_2)}\text{.}
\]
  So a square root of a complex number \(z = r e^{iθ}\) is just
  \(\sqrt{r} e^{iθ/2}\), as you can easily verify. However, if
  \(z\) is non-zero, there is one more square root of \(z\), namely
  \(\sqrt{r} e^{i (θ/2 + π)}\), as you can also verify. (Recall
  that angles that differ by \(2π = 360^\circ\) are considered the
  same.)&lt;/p&gt;
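
&lt;p&gt;For instance, taking \(z = 4i = 4 e^{iπ/2}\) (a value chosen only
  for illustration), the two square roots described above are
  \(2 e^{iπ/4}\) and \(2 e^{i 5π/4}\), and indeed
  \[
    \left( 2 e^{iπ/4} \right)^2 = 4 e^{iπ/2} = 4i
    \quad \text{and} \quad
    \left( 2 e^{i 5π/4} \right)^2 = 4 e^{i 5π/2} = 4 e^{i (π/2 + 2π)} = 4i\text{.}
  \]&lt;/p&gt;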

&lt;p&gt;So in general, the square root of a rational expression, like our
  candidate solution, yields two distinct points as long as the
  rational expression is non-zero. In our case, \(b^2 - 4ac\) remains
  non-zero as \(r_1\) and \(r_2\) don&amp;rsquo;t coincide. (We&amp;rsquo;ll
  have more to say about this expression, called the &lt;em&gt;discriminant&lt;/em&gt;,
  once we talk about cubic equations below.) Therefore, if we want to
  examine how \(x_1\) and \(x_2\) move as \(r_1\) and \(r_2\) move, we
  have to number the square roots of \(b^2 - 4ac\), and we have to
  keep this numbering consistent.&lt;/p&gt;

&lt;p&gt;To do so, we have to do two things: we have to vary \(r_1\) and
  \(r_2\) only continuously, and we have to vary \(r_1\) and \(r_2\)
  such that they never coincide. If we do this, then we can
  intuitively &amp;ldquo;lift&amp;rdquo; the expression \(b^2 - 4ac\) from the
  complex plane to a new surface \(S\) where we consider only angles
  that differ by \(4π = 720^\circ\), rather than \(2π\), to be
  the same. In this space, we can take the &amp;ldquo;first&amp;rdquo; square
  root of a non-zero complex number to be the one with half the angle,
  and the &amp;ldquo;second&amp;rdquo; square root to be the one with half the
  angle plus \(π\), and have these two square root functions behave
  continuously as their argument goes around the origin.&lt;/p&gt;

&lt;figure&gt;
  &lt;img src=&quot;quintic-unsolvability-files/Riemann_sqrt.svg&quot;/&gt;
  &lt;figcaption&gt;
    &lt;span class=&quot;figure-text&quot;&gt;Figure 1&lt;/span&gt;&amp;ensp;\(S\), which is the
    &lt;a href=&quot;https://en.wikipedia.org/wiki/Riemann_surface&quot;&gt;Riemann surface&lt;/a&gt;
    of \(\sqrt{z}\). (Image by &lt;a href=&quot;https://en.wikipedia.org/wiki/File:Riemann_sqrt.svg&quot;&gt;Leonid 2&lt;/a&gt; licensed under &lt;a href=&quot;https://creativecommons.org/licenses/by-sa/3.0/deed.en&quot;&gt;CC BY-SA 3.0&lt;/a&gt;.)
  &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Now this answers the question of why the proof of Theorem&amp;nbsp;1
  fails for \(\sqrt{b^2 - 4ac}\). The coefficients \(a\), \(b\), and \(c\) go around a
  single loop as \(r_1\) is swapped with \(r_2\), and therefore \(b^2
  - 4ac\) goes around a single loop in the complex plane, but when
  \(b^2 - 4ac\) is lifted to \(S\), the final position of \(b^2 -
  4ac\) differs from the initial position only by an angle of
  \(2π\), so it is &lt;em&gt;distinct&lt;/em&gt; from the initial position, and
  thus we can&amp;rsquo;t conclude that the final position of \(\sqrt{b^2
  - 4ac}\) is the same as the initial position.&lt;/p&gt;
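
&lt;p&gt;One way to make this concrete is the following sketch. It assumes
  \(a = 1\) and, purely for illustration, that the swap rotates both
  roots half a turn about their midpoint \(m = (z_1 + z_2)/2\) as a
  parameter \(t\) runs from \(0\) to \(1\); the paths used in the
  visualization may differ. Then
  \[
    r_1(t) = m + \frac{z_1 - z_2}{2} e^{iπt}\text{,} \qquad
    r_2(t) = m - \frac{z_1 - z_2}{2} e^{iπt}\text{,}
  \]
  and since \(b = -(r_1 + r_2)\) and \(c = r_1 r_2\),
  \[
    b^2 - 4ac = (r_1 + r_2)^2 - 4 r_1 r_2 = (r_1 - r_2)^2 = (z_1 - z_2)^2 e^{2πit}\text{.}
  \]
  So \(b^2 - 4ac\) makes exactly one loop around the origin, while the
  continuously varying square root \((z_1 - z_2) e^{iπt}\) starts at
  \(z_1 - z_2\) and ends at \(-(z_1 - z_2)\), which is the other square
  root.&lt;/p&gt;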

&lt;p&gt;Similar reasoning holds for any algebraic expression that
isn&amp;rsquo;t a rational expression, i.e. ones that involve taking any
integer root, so Theorem&amp;nbsp;1 cannot apply to algebraic expressions
in general. Of course, this is consistent with what we know about the
quadratic formula, since we know that it has a square root!&lt;/p&gt;

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;3. Cubic Equations&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Now we can move on to cubic equations. Similarly, given three
complex numbers \(r_1\), \(r_2\), and \(r_3\), you can determine the
cubic equation with those solutions, namely

\[
  (x - r_1) (x - r_2) (x - r_3) = x^3 - (r_1 + r_2 + r_3) x^2 + (r_1 r_2 + r_1 r_3 + r_2 r_3) x - r_1 r_2 r_3 = 0\text{,}
\]

and so we can define a function from \(r_1\), \(r_2\), and \(r_3\) to
\(a\), \(b\), \(c\), and \(d\), where

\[
  a x^3 + b x^2 + c x + d
\]

is the standard form of a cubic polynomial, and this is shown in the
visualization below.&lt;/p&gt;

&lt;p&gt;In the previous section, we talked about the discriminant \(b^2 -
  4ac\) of the general quadratic polynomial. However, the discriminant
  is an expression that is defined for &lt;em&gt;any&lt;/em&gt; polynomial. If
  \(r_1, \dotsc, r_n\) are the roots of a polynomial (counting multiplicity)
  with leading coefficient \(a_n\), then the
  &lt;a href=&quot;https://en.wikipedia.org/wiki/Discriminant&quot;&gt;discriminant&lt;/a&gt; is
  \[
  Δ = a_n^{2n - 2} ∏_{i \lt j} (r_i - r_j)^2\text{.}
  \]
  In other words, the discriminant is, up to sign and a power of the
  leading coefficient, the product of the differences of all pairs of
  different roots. In particular, if the polynomial has repeated roots,
  the discriminant is zero.&lt;/p&gt;

&lt;p&gt;Using the formula above, you can express the discriminant in terms
  of the coefficients of the polynomial, as you can verify for
  yourself with the quadratic equation. Indeed this is true in
  general; for cubic polynomials, the discriminant can be expressed in
  terms of the coefficients as
  \[
  Δ = b^2 c^2 - 4 a c^3 - 4 b^3 d - 27 a^2 d^2 + 18 a b c d\text{.}
  \]
  But why do we care? Because, as you can see in the visualization below, if
  you swap any pair of roots, this causes the discriminant to make a single
  loop around the origin, so it serves as a useful test function for
  taking roots.&lt;/p&gt;
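
&lt;p&gt;For the quadratic case, the verification mentioned above can be
  sketched as follows. Take \(n = 2\) and use \(r_1 + r_2 = -b/a\) and
  \(r_1 r_2 = c/a\), which follow from expanding
  \(a (x - r_1)(x - r_2)\). Then
  \[
    Δ = a^2 (r_1 - r_2)^2
      = a^2 \left( (r_1 + r_2)^2 - 4 r_1 r_2 \right)
      = a^2 \left( \frac{b^2}{a^2} - \frac{4c}{a} \right)
      = b^2 - 4ac\text{.}
  \]&lt;/p&gt;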

&lt;p&gt;So now that we have three roots, we can swap them in multiple
  ways. If \(R\) is a list that starts off as \(\langle r_1, r_2, r_3
  \rangle\), let \(↺_{i, j}\) denote the counter-clockwise
  paths that take the root at the \(i\)th index of \(R\) to the one
  at the \(j\)th index of \(R\) and vice versa, and similarly for
  \(↻_{i, j}\). (Note that this is not the same as the
  paths that swap \(r_i\) and \(r_j\)!  Play around with the buttons
  in the visualization below to understand the difference.)&lt;/p&gt;

&lt;div class=&quot;interactive-example&quot;&gt;
  &lt;h3&gt;Interactive Example 3: The cubic discriminant&lt;/h3&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Roots
    &lt;nokatex&gt;&lt;div id=&quot;rootBoardCubic1&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;

    &lt;span id=&quot;rootListCubic1&quot;&gt;
      \(R = \langle r_1, r_2, r_3 \rangle\)
    &lt;/span&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button cubic1DisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;cubic1.runOp(cubic1.opA, 1000);&quot;&gt;
      \(↺_{1, 2}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button cubic1DisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;cubic1.runOp(cubic1.opB, 1000);&quot;&gt;
      \(↺_{2, 3}\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button cubic1DisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;cubic1.runOp(cubic1.opA.invert(), 1000);&quot;&gt;
      \(↻_{1, 2}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button cubic1DisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;cubic1.runOp(cubic1.opB.invert(), 1000);&quot;&gt;
      \(↻_{2, 3}\)
    &lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Coefficients
    &lt;nokatex&gt;&lt;div id=&quot;coeffBoardCubic1&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Candidate solution
    &lt;nokatex&gt;&lt;div id=&quot;formulaBoardCubic1&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;

    &lt;label&gt;
      &lt;input class=&quot;cubic1DisableWhileRunningOp&quot; name=&quot;cubic1Formula&quot; type=&quot;radio&quot;
             onchange=&quot;cubic1.switchFormula(cubicDiscFormula);&quot; /&gt;
      \(x_1 = Δ\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input checked class=&quot;cubic1DisableWhileRunningOp&quot; name=&quot;cubic1Formula&quot; type=&quot;radio&quot;
             onchange=&quot;cubic1.switchFormula(cubicDiscFormula.root(5));&quot; /&gt;
      \(x_{1, 2, 3, 4, 5} = \sqrt[5]{Δ}\)
    &lt;/label&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
&apos;use strict&apos;;

function updateRootList(display, rootListID) {
  var rootPermutation = display.getRootPermutation();
  var rootList = document.getElementById(rootListID);
  var TeXOutput = &apos;R = \\langle &apos; + rootPermutation.map(function(i) {
    return &apos;r_{&apos; + (i+1) + &apos;}&apos;;
  }).join(&apos;, &apos;) + &apos; \\rangle&apos;;
  katex.render(TeXOutput, rootList);
}

function updateResultList(display, resultListID) {
  var resultPermutation = display.getResultPermutation();
  var resultList = document.getElementById(resultListID);
  var TeXOutput = &apos;X = \\langle &apos; + resultPermutation.map(function(i) {
    return &apos;x_{&apos; + (i+1) + &apos;}&apos;;
  }).join(&apos;, &apos;) + &apos; \\rangle&apos;;
  katex.render(TeXOutput, resultList);
}

var cubicDiscFormula = cubicScaledDiscFormula.div(
  ComplexFormula.select(-1).pow(2).times(-27));

var cubic1 = (function() {
  var initialRoots = [
    new Complex(-1, -0.5), new Complex(0.5, 0.5), new Complex(0, 1)
  ];

  var display = new Display(
    &quot;rootBoardCubic1&quot;, &quot;coeffBoardCubic1&quot;, &quot;formulaBoardCubic1&quot;, initialRoots,
    cubicDiscFormula.root(5), function() {});

  display._resultRotationCounterPoint.setAttribute({visible: false});

  function updateRootListCubic(display) {
    updateRootList(display, &quot;rootListCubic1&quot;);
  }

  var opA = display.swapRootOp(0, 1, updateRootListCubic);
  var opB = display.swapRootOp(1, 2, updateRootListCubic);

  var state = {};

  function runCubicOp(op, time) {
    runOp(display, op, time, &apos;.cubic1DisableWhileRunningOp&apos;, state);
  };

  function switchCubicFormula(formula) {
    switchFormula(display, state, formula);
    updateRootListCubic(display);
  }

  return {
    display: display,
    opA: opA,
    opB: opB,
    runOp: runCubicOp,
    cubicDiscFormula: cubicDiscFormula,
    switchFormula: switchCubicFormula
  };
})();
&lt;/script&gt;

&lt;p&gt;Now, with the formula \(Δ\), the same reasoning as in the
  previous section shows that it cannot possibly be the cubic formula,
  nor can any other rational expression. However, unlike the quadratic
  case, we can also rule out \(\sqrt[5]{Δ}\), or any other
  algebraic formula with no nested radicals (i.e., that doesn&amp;rsquo;t
  have a radical within a radical like \(\sqrt{a - \sqrt{bc - 5}}\)).
  If you apply the operations \(↺_{2, 3}\),
  \(↺_{1, 2}\), \(↻_{2, 3}\), and
  \(↻_{1, 2}\) in sequence, \(r_1\), \(r_2\), and
  \(r_3\) rotate among themselves, but all the \(x_i\) go back to
  their original positions. Therefore, by similar reasoning as the
  previous section, \(\sqrt[5]{Δ}\) also cannot possibly be the
  cubic formula!&lt;/p&gt;

&lt;p&gt;To make this statement precise, we need to review some group
  theory.  Recall that a
  &lt;a href=&quot;https://en.wikipedia.org/wiki/Group_(mathematics)&quot;&gt;group&lt;/a&gt;
  is a set with an associative binary operation, an identity element,
  and inverse elements. Most basic examples of groups are related to
  numbers, like the integers under addition, or the non-zero rationals
  under multiplication. However, more interesting examples of groups
  are related to &lt;em&gt;functions&lt;/em&gt;, not least because the group
  operation for functions is &lt;em&gt;composition&lt;/em&gt;, which is in general
  not commutative; in other words, if \(f\) and \(g\) are functions,
  \(f \circ g\) need not equal \(g \circ f\), and it is this
  non-commutativity that will come in handy for our purposes.&lt;/p&gt;

&lt;p&gt;So let&amp;rsquo;s say we have a list of \(n\) objects, and we&amp;rsquo;re
  interested in the functions that rearrange this list&amp;rsquo;s
  elements. These are &lt;a href=&quot;https://en.wikipedia.org/wiki/Permutation&quot;&gt;permutations&lt;/a&gt;,
  and they naturally form a group under composition, as you can check
  for yourself, called \(S_n\), the &lt;a href=&quot;https://en.wikipedia.org/wiki/Symmetric_group&quot;&gt;symmetric group&lt;/a&gt; on
  \(n\) objects.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a convenient way to write permutations, called &lt;a href=&quot;https://en.wikipedia.org/wiki/Permutation#Cycle_notation&quot;&gt;cycle notation&lt;/a&gt;. If
you write \((i_1 \; i_2 \; \dotsc \; i_k)\), this denotes the
permutation that maps the \(i_1\)th position of the list to the
\(i_2\)th position, the \(i_2\)th position to the \(i_3\)th, and so on,
  with the \(i_k\)th position mapping back to the \(i_1\)th; such a
  permutation is called a &lt;em&gt;cycle&lt;/em&gt;. Then you can write &lt;em&gt;any&lt;/em&gt; permutation
  as a composition of disjoint cycles, so this provides a convenient
  way to write down and compute with permutations.&lt;/p&gt;

  &lt;p&gt;In the visualization above, we have four operations
    \(↺_{1, 2}\), \(↺_{2, 3}\),
    \(↻_{1, 2}\), and \(↻_{2, 3}\),
    which &lt;em&gt;act on \(R\)&lt;/em&gt;, meaning that they define permutations
    on \(R\). In particular, \(↺_{1, 2}\) and
    \(↻_{1, 2}\) both swap the first and second
    elements of \(R\), so we say that \(↺_{1, 2}\) and
    \(↻_{1, 2}\) act on \(R\) as \((1 \; 2)\), and
    similarly, \(↺_{2, 3}\) and \(↻_{2,
    3}\) act on \(R\) as \((2 \; 3)\).&lt;/p&gt;

  &lt;p&gt;Now concatenating two operations&amp;mdash;doing one after the
  other&amp;mdash;corresponds to composing their mapped-to permutations on
  \(R\). Denoting \(o_2 * o_1\) as doing \(o_1\), then doing \(o_2\),
  the sequence of operations above is \(↻_{1, 2} *
  ↻_{2, 3} * ↺_{1, 2} *
  ↺_{2, 3}\) (note the order!), which acts on \(R\)
  like \((1 \; 2) (2 \; 3) (1 \; 2) (2 \; 3)\), which is equal to \((1
  \; 3 \; 2)\).&lt;sup&gt;&lt;a href=&quot;#fn3&quot; id=&quot;r3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; (The
  \(\circ\) is usually dropped when composing permutations.)&lt;/p&gt;
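
  &lt;p&gt;To check this product by hand, one can track each position
  through the four transpositions, applying the rightmost one first.
  This is a sketch that assumes cycles are composed right to left, like
  functions:
  \[
    1 \mapsto 1 \mapsto 2 \mapsto 3 \mapsto 3\text{,} \qquad
    2 \mapsto 3 \mapsto 3 \mapsto 2 \mapsto 1\text{,} \qquad
    3 \mapsto 2 \mapsto 1 \mapsto 1 \mapsto 2\text{.}
  \]
  So overall \(1 \mapsto 3\), \(3 \mapsto 2\), and \(2 \mapsto 1\),
  which is the cycle \((1 \; 3 \; 2)\).&lt;/p&gt;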

  &lt;p&gt;Now for the formula \(Δ\), all the operations make \(x_1\)
  loop around the origin either clockwise or counter-clockwise; in
  other words, they all induce a rotation of \(2π\) or \(-2π\) on
  \(x_1\), and the final distance of \(x_1\) from the origin is the
  same as the initial distance. Therefore, if we apply an equal number
  of clockwise and counter-clockwise rotations, the total angle of
  rotation will be \(0\) and the final distance will be the same as
  the initial distance, i.e. the final position of \(x_1\) is the same
  as its initial position. But the same reasoning holds for the
  formula \(\sqrt[5]{Δ}\); all the operations induce a rotation
  of \(2π/5\) or \(-2π/5\) and leave the distance from the origin
  unchanged, so an equal number of clockwise and counter-clockwise
  rotations will still induce a total angle of \(0\) and leave the
  distance from the origin unchanged. Therefore, the operation
  \(↻_{1, 2} * ↻_{2, 3} *
  ↺_{1, 2} * ↺_{2, 3}\) acts like \((1
  \; 3\; 2)\) on \(R\), but leaves all \(x_i\) unchanged.&lt;/p&gt;
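
  &lt;p&gt;In symbols, writing \(ω = e^{2πi/5}\) (notation introduced
  only for this sketch), each counter-clockwise operation ends with
  every \(x_i\) multiplied by \(ω\), and each clockwise operation
  ends with every \(x_i\) multiplied by \(ω^{-1}\). The net effect
  of two counter-clockwise operations followed by two clockwise ones is
  therefore multiplication by
  \[
    ω^{-1} \cdot ω^{-1} \cdot ω \cdot ω = 1\text{,}
  \]
  and the same cancellation happens for \(\sqrt[k]{Δ}\) with
  \(e^{2πi/k}\) in place of \(ω\).&lt;/p&gt;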

  &lt;p&gt;But how did we come up with \(↻_{1, 2} *
    ↻_{2, 3} * ↺_{1, 2} *
    ↺_{2, 3}\) in the first place? This involves a bit
    more group theory. \(S_3\) is &lt;em&gt;not&lt;/em&gt; a &lt;a href=&quot;https://en.wikipedia.org/wiki/Abelian_group&quot;&gt;commutative
  group&lt;/a&gt;; in particular, \((1 \; 2) (2 \; 3) \ne (2 \; 3) (1 \;
  2)\). For two group elements \(g\) and \(h\), we can define
  their
  &lt;a href=&quot;https://en.wikipedia.org/wiki/Commutator&quot;&gt;commutator&lt;/a&gt;&lt;sup&gt;&lt;a href=&quot;#fn4&quot; id=&quot;r4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;
  \([ g, h ]\), which is the group element that corrects for
  \(g\) and \(h\) not commuting. That is, we want the equation
  \[
  g h = h g [g, h]
  \]
  to hold, which means that
  \[
  [g, h] = g^{-1} h^{-1} g h\text{.}
  \]
  So the commutator provides a convenient way to generate a non-trivial
  permutation from two other non-commuting permutations. Furthermore, it
  involves two appearances of both elements, so we can pick a sequence of
  operations that induce the commutator and also have an equal number of
  clockwise and counter-clockwise operations. Then we&amp;rsquo;re guaranteed
  that this sequence of operations permutes \(R\) and leaves all \(x_i\)
  unchanged, even if each individual operation moves some \(x_i\). But of
  course, this is just \(↻_{1, 2} * ↻_{2, 3} *
  ↺_{1, 2} * ↺_{2, 3}\)!&lt;/p&gt;
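
  &lt;p&gt;As a quick check that this definition does what we want,
  substituting it into the right-hand side of the defining equation
  gives
  \[
    h g [g, h] = h g \, g^{-1} h^{-1} g h = h h^{-1} g h = g h\text{.}
  \]
  With \(g = (1 \; 2)\) and \(h = (2 \; 3)\), each transposition is its
  own inverse, so
  \[
    [g, h] = (1 \; 2) (2 \; 3) (1 \; 2) (2 \; 3)\text{,}
  \]
  which is the product computed earlier.&lt;/p&gt;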

  &lt;p&gt;Let&amp;rsquo;s define some terminology to make proofs and discussion
  easier. If \(o\) is an operation that acts on \(R\) non-trivially
  but has the final position of the expression \(x = f(a, b, c,
  \dotsc)\) the same as its initial position, we say that \(o\) &lt;em&gt;rules out&lt;/em&gt; the
  expression \(x = f(a, b, c, \dotsc)\). For example, Theorem&amp;nbsp;1
  says that swapping the two roots of a quadratic rules out all rational
  expressions.&lt;/p&gt;

  &lt;div class=&quot;p&quot;&gt;Now we&amp;rsquo;re ready to state and prove the theorem:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem 2&lt;/span&gt;.) An
  algebraic expression with no nested radicals in the coefficients of
  the general cubic equation
  \[
  ax^3 + bx^2 + cx + d = 0
  \]
  cannot be a solution to this equation.&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Sketch of proof.&lt;/span&gt; First assume to
    the contrary that the expression \(x = \sqrt[k]{r(a, b, c, d)}\) is
    a solution, where \(r(a, b, c, d)\) is a rational
    expression. Assume we start with \(r_1 = z_1\), \(r_2 = z_2\), and
    \(r_3 = z_3\), where all \(z_i\) are distinct, and without loss of
    generality assume that we start with \(x = z_1\).&lt;/p&gt;

  &lt;p&gt;Any of the operations \(↺_{1, 2}\),
    \(↺_{2, 3}\), \(↻_{1, 2}\), and
    \(↻_{2, 3}\) applied to \(x = r(a, b, c, d)\)
    causes \(x\)&amp;rsquo;s final position to be the same as its initial
    position, by Theorem&amp;nbsp;1. Pick a point \(z_0\) that is never
    equal to any point \(x\) traverses under any operation. Then, by
    the same reasoning as above, the total angle induced by
    \(↻_{1, 2} * ↻_{2, 3} *
    ↺_{1, 2} * ↺_{2, 3}\) on \(x =
    \sqrt[k]{r(a, b, c, d)}\) around \(z_0\) is \(0\), and the
    distance from \(z_0\) remains unchanged.  Thus \(x\) remains
    fixed, and this operation rules out \(x = \sqrt[k]{r(a, b, c,
    d)}\).&lt;/p&gt;

&lt;p&gt;For the general case, it suffices to show that if \(o\) rules out
  the expressions \(f\) and \(g\), then \(o\) also rules out \(f\)
  raised to an integer power, \(f + g\text{,}\) \(f - g\text{,}\) \(f
  \cdot g\text{,}\) and \(f / g\) where \(g \ne 0\text{.}\) But this
  is straightforward, and such formulas are just the algebraic
  expressions with no nested radicals, so the statement holds in
  general. &amp;#x220e;&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;

&lt;p&gt;Theorem&amp;nbsp;2 can be summarized thus: any \(↺_{i,
  j}\) or \(↻_{i, j}\) rules out any rational
  expression as the cubic formula, and if given an algebraic
  expression with no nested radicals, either some
  \(↺_{i, j}\) or \(↻_{i, j}\) rules it
  out, or \(↻_{1, 2} * ↻_{2, 3} *
  ↺_{1, 2} * ↺_{2, 3}\) rules it out.&lt;/p&gt;

&lt;p&gt;Now we can consider algebraic expressions with one level of
nesting. Can such formulas be ruled out as being the cubic formula?
We can&amp;rsquo;t do so via Theorem&amp;nbsp;2, at least; we would need a
non-trivial element of \(S_3\) that is a commutator of
commutators. But you can calculate that all non-trivial commutators of
\(S_3\) are either \((3 \; 2 \; 1)\) or \((1 \; 2\; 3)\), and these
two elements commute, so \(S_3\) cannot have a non-trivial commutator
of commutators.&lt;/p&gt;

&lt;p&gt;In fact, as we would expect, the actual &lt;a href=&quot;https://en.wikipedia.org/wiki/Cubic_function#General_formula&quot;&gt;cubic formula&lt;/a&gt;
has such an algebraic expression, which is \(C\) in the visualization
below, so that serves as a convenient example of an algebraic
expression with a single nested radical that can&amp;rsquo;t be ruled out
by Theorem&amp;nbsp;2.&lt;/p&gt;

&lt;div class=&quot;interactive-example&quot;&gt;
  &lt;h3&gt;Interactive Example 4: The cubic equation&lt;/h3&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Roots
    &lt;nokatex&gt;&lt;div id=&quot;rootBoardCubic2&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;

    &lt;span id=&quot;rootListCubic2&quot;&gt;
      \(R = \langle r_1, r_2, r_3 \rangle\)
    &lt;/span&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button cubic2DisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;cubic2.runOp(cubic2.opA, 1000);&quot;&gt;
      \(↺_{1, 2}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button cubic2DisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;cubic2.runOp(cubic2.opB, 1000);&quot;&gt;
      \(↺_{2, 3}\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button cubic2DisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;cubic2.runOp(cubic2.opA.invert(), 1000);&quot;&gt;
      \(↻_{1, 2}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button cubic2DisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;cubic2.runOp(cubic2.opB.invert(), 1000);&quot;&gt;
      \(↻_{2, 3}\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button cubic2DisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;cubic2.runOp(cubic2.opComAB, 4000);&quot;&gt;
      \(↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button cubic2DisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;cubic2.runOp(cubic2.opComAB.invert(), 4000);&quot;&gt;
      \(↻_{2, 3} * ↻_{1, 2} * ↺_{2, 3} * ↺_{1, 2}\)
    &lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Coefficients
    &lt;nokatex&gt;&lt;div id=&quot;coeffBoardCubic2&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Candidate solution
    &lt;nokatex&gt;&lt;div id=&quot;formulaBoardCubic2&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;

    &lt;span id=&quot;resultListCubic2&quot;&gt;
      \(X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle\)
    &lt;/span&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;cubic2DisableWhileRunningOp&quot; name=&quot;cubic2Formula&quot; type=&quot;radio&quot;
             onchange=&quot;cubic2.switchFormula(cubicScaledDiscFormula);&quot; /&gt;
      \(x_1 = -27a^2 Δ = {Δ_1}^2 - 4 {Δ_0}^3\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;cubic2DisableWhileRunningOp&quot; name=&quot;cubic2Formula&quot; type=&quot;radio&quot;
             onchange=&quot;cubic2.switchFormula(newCubicCCubedFormula());&quot; /&gt;
      \(x_{1, 2} = C^3 = \frac{Δ_1 + \sqrt{-27a^2 Δ}}{2}\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input checked class=&quot;cubic2DisableWhileRunningOp&quot; name=&quot;cubic2Formula&quot; type=&quot;radio&quot;
             onchange=&quot;cubic2.switchFormula(newCubicCCubedFormula().root(3));&quot; /&gt;
      \(x_{1,2,3,4,5,6} = C\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;cubic2DisableWhileRunningOp&quot; name=&quot;cubic2Formula&quot; type=&quot;radio&quot;
             onchange=&quot;cubic2.switchFormula(newCubicFormula());&quot; /&gt;
      \(x_{1, 2, 3} = -\frac{1}{3a} \left( b + C + \frac{Δ_0}{C} \right)\)
      &lt;br /&gt;
      (the cubic formula)
    &lt;/label&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
&apos;use strict&apos;;

var cubic2 = (function() {
  var initialRoots = [
    new Complex(-1, -0.5), new Complex(0.5, 0.5), new Complex(0, 1)
  ];

  var display = new Display(
    &quot;rootBoardCubic2&quot;, &quot;coeffBoardCubic2&quot;, &quot;formulaBoardCubic2&quot;, initialRoots,
    newCubicCCubedFormula().root(3), function() {});

  display._resultRotationCounterPoint.setAttribute({visible: false});

  function updateRootAndResultList(display) {
    updateRootList(display, &quot;rootListCubic2&quot;);
    updateResultList(display, &quot;resultListCubic2&quot;);
  }

  var opA = display.swapRootOp(0, 1, updateRootAndResultList);
  var opB = display.swapRootOp(1, 2, updateRootAndResultList);
  var opComAB = newCommutatorAnimation(opA, opB);

  var state = {};

  function runCubicOp(op, time) {
    runOp(display, op, time, &apos;.cubic2DisableWhileRunningOp&apos;, state);
  };

  function switchCubicFormula(formula) {
    switchFormula(display, state, formula);
    updateRootAndResultList(display);
  }

  return {
    display: display,
    opA: opA,
    opB: opB,
    opComAB: opComAB,
    runOp: runCubicOp,
    cubicDiscFormula: cubicDiscFormula,
    switchFormula: switchCubicFormula
  };
})();
&lt;/script&gt;

&lt;p&gt;Note that there is a new list \(X\), which lists the \(x_i\) in the
order in which they occupy their initial positions, like how \(R\) does
the same for the \(r_i\). In general, we can&amp;rsquo;t do this, since a
general multi-valued function won&amp;rsquo;t necessarily permute the
\(x_i\) among themselves, but in the interactive visualizations
we&amp;rsquo;ll only consider expressions that do.&lt;/p&gt;

&lt;p&gt;We can then talk about how an operation acts on \(X\). For example, if we
  pick \(\sqrt[5]{Δ}\) in Interactive&amp;nbsp;Example&amp;nbsp;3, we can
  say that \(↺_{i, j}\) acts like \((5 \; 1 \; 2 \; 3
  \; 4)\) on \(X\) and \(↻_{i, j}\) acts like \((1 \; 2 \; 3 \; 4 \;
  5)\) on \(X\). Therefore, \(↻_{1, 2} *
  ↻_{2, 3} * ↺_{1, 2} *
  ↺_{2, 3}\) acts non-trivially on \(R\) but acts
  trivially on \(X\), which is another, more algebraic way of saying
  that this operation rules out \(\sqrt[5]{Δ}\). Note that the
  action on \(X\) depends on the candidate formula. On the other hand,
  if you choose \(C\) in the visualization above, you can convince
  yourself that no operation acts non-trivially on \(R\) without also
  acting non-trivially on \(X\), and so \(C\) can&amp;rsquo;t be ruled out
  as the cubic formula.&lt;/p&gt;
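
&lt;p&gt;To spell out the \(\sqrt[5]{Δ}\) computation, write \(\sigma\) (a
  symbol introduced only for this illustration) for the \(5\)-cycle by
  which every \(↺_{i, j}\) acts on \(X\), so that every
  \(↻_{i, j}\) acts like \(\sigma^{-1}\). Then
  \(↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}\) acts on \(X\) like
  \[
  \sigma^{-1} \sigma^{-1} \sigma \sigma = e\text{,}
  \]
  regardless of the order in which the factors are composed, since
  powers of \(\sigma\) commute with each other. The same cancellation
  happens whenever all the operations involved act on \(X\) by powers
  of a single cycle.&lt;/p&gt;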
&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;4. Quartic Equations&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Now we can move on to quartic equations. As usual, given four
complex numbers \(r_1\), \(r_2\), \(r_3\), and \(r_4\), you can map
them to the coefficients \(a\), \(b\), \(c\), \(d\), and \(e\) of the
standard form of a quartic polynomial, as shown in the visualization
below, such that the \(r_i\) are the solutions to the quartic
equation

\[
  a x^4 + b x^3 + c x^2 + d x + e = 0\text{.}
\]&lt;/p&gt;

&lt;p&gt;Now that we have four roots, we have even more ways to permute them
using the \(↺_{i, j}\) and \(↻_{i,
j}\). Before we move on, we need more terminology and group theory to
handle this more complicated case.&lt;/p&gt;

&lt;p&gt;First, we want a convenient way to denote the combination of operations
that act like a commutator, so let&amp;rsquo;s define
\(↺_{i, j}^\prime\) to mean \(↻_{i,
j}\) and vice versa, \((o_1 \circ o_2 \circ \dotsb \circ o_n)^\prime\)
to mean \(o_n^\prime \circ o_{n-1}^\prime \circ \dotsb \circ
o_1^\prime\), and \([\![ o_1, o_2 ]\!]\) to mean \(o_1^\prime \circ
o_2^\prime \circ o_1 \circ o_2\), so that if \(o_i\) acts on \(R\)
like \(g_i\), then \(o_i^\prime\) acts on \(R\) like \(g_i^{-1}\) and
\([\![o_i, o_j]\!]\) acts on \(R\) like \([g_i, g_j]\). For example,
in the previous section, we were using \([\![ ↺_{1, 2},
↺_{2, 3} ]\!]\) to rule out algebraic expressions with
no nested radicals.&lt;/p&gt;
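
&lt;p&gt;As a small worked consequence of these definitions, note that
  applying the prime twice gives back the original operation, both for
  the \(↺_{i, j}\) and \(↻_{i, j}\) and for their compositions.
  Therefore
  \[
  [\![ o_1, o_2 ]\!]^\prime
  = (o_1^\prime \circ o_2^\prime \circ o_1 \circ o_2)^\prime
  = o_2^\prime \circ o_1^\prime \circ o_2 \circ o_1
  = [\![ o_2, o_1 ]\!]\text{,}
  \]
  which mirrors the group identity \([g, h]^{-1} = [h, g]\).&lt;/p&gt;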

&lt;p&gt;Then not only do we want to talk about commutators of particular
  permutations, we want to talk about the set of commutators
  of a particular group. In fact, for a group \(G\), this set of
  commutators generates a subgroup \(K(G)\) called the &lt;a href=&quot;https://en.wikipedia.org/wiki/Commutator_subgroup&quot;&gt;commutator subgroup&lt;/a&gt;
  (for the groups considered here, the set of commutators is already
  closed under composition, so it is itself the subgroup). For
  the quadratic case, we just have \(S_2\), which has only a single
  non-trivial element, so its commutator subgroup \(K(S_2)\) is the
  trivial group. For the cubic case, we started with \(S_3\), and we
  computed the commutator subgroup \(K(S_3)\), which is just \(\{ e,
  (1 \; 2 \; 3), (3 \; 2 \; 1) \}\). We can also compute the
  commutator subgroup of &lt;em&gt;this&lt;/em&gt; group, \(K^{(2)}(S_3) =
  K(K(S_3))\), which is just the trivial group
  again, since \(K(S_3)\) is commutative. So we can see that
  \(K(K(S_3))\) being the trivial group means that we can&amp;rsquo;t rule
  out algebraic expressions with nested radicals as solutions to the
  general cubic equation.&lt;/p&gt;

&lt;p&gt;Given all the elements of a group \(G\), it&amp;rsquo;s not
particularly complicated to compute the commutator subgroup&amp;mdash;just
take all possible pairs of elements \(g, h \in G\), compute \([g,
h]\), and remove duplicates. However, we can make things easier for
ourselves by finding generators for \(K(G)\) as commutators of
generators of \(G\), since then we can easily map those back to \([\![
o_1, o_2 ]\!]\) applied on the appropriate operations. Fortunately,
when \(G = S_n\), we can use a few facts from group theory to easily
compute \(K(S_n)\). First, \(K(S_n)\) is the &lt;a href=&quot;https://en.wikipedia.org/wiki/Alternating_group&quot;&gt;alternating group&lt;/a&gt; \(A_n\),
  and is generated by the \(3\)-cycles of the form \((i \enspace i+1
  \enspace i+2)\), similar to how \(S_n\) is generated by the
  \(2\)-cycles of the form \((i \enspace i + 1)\). But a \(3\)-cycle
  \((i \enspace i+1 \enspace i+2)\) can be expressed as the commutator
  of two \(2\)-cycles \([(i+2 \enspace i+1), (i \enspace
  i+1)]\).&lt;/p&gt;
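
&lt;p&gt;To check this identity by hand, note that \(2\)-cycles are their
  own inverses, so the commutator is the product \((i+1 \enspace i+2)
  (i \enspace i+1) (i+1 \enspace i+2) (i \enspace i+1)\). Assuming the
  product is read right to left (a convention assumed here because it
  reproduces the stated \(3\)-cycle), the three affected points move
  as follows:
  \[
  i \mapsto i+1 \mapsto i+2 \mapsto i+2 \mapsto i+1\text{,}
  \]
  \[
  i+1 \mapsto i \mapsto i \mapsto i+1 \mapsto i+2\text{,}
  \]
  \[
  i+2 \mapsto i+2 \mapsto i+1 \mapsto i \mapsto i\text{.}
  \]
  So \(i\) goes to \(i+1\), \(i+1\) goes to \(i+2\), and \(i+2\) goes
  to \(i\), which is exactly \((i \enspace i+1 \enspace i+2)\).&lt;/p&gt;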

&lt;p&gt;Therefore, for \(S_4\), the generators for \(K(S_4)\) are just \((1
  \; 2 \; 3) = [(2 \; 3), (1 \; 2)]\) and \((2 \; 3 \; 4) = [(3 \; 4),
  (2 \; 3)]\), with respective operations \([\![ ↺_{2,
  3}, ↺_{1, 2} ]\!]\) and \([\![ ↺_{3,
  4}, ↺_{2, 3} ]\!]\). However, these two generators
  are not quite enough to generate \(K^{(2)}(S_4)\) via
  commutators. Fortunately, it suffices to just add
  \(↺_{4, 1}\) to the list of operations, which lets us
  add \((1 \; 4)\) to the list of generators for \(S_4\), and then add
  \((3 \; 4 \; 1)\) to the list of generators for \(K(S_4)\). Then
  \((1 \; 4) (2 \; 3) = [(2 \; 3 \; 4), (1 \; 2 \; 3)]\) and \((2 \;
  1) (3 \; 4) = [(3 \; 4 \; 1), (2 \; 3 \; 4)]\) suffice to generate
  \(K^{(2)}(S_4)\).&lt;sup&gt;&lt;a href=&quot;#fn5&quot; id=&quot;r5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt; Finally,
  we can easily compute \(K^{(3)}(S_4)\) to be the trivial group.&lt;/p&gt;
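
&lt;p&gt;The chain of commutator subgroups above can also be checked by
  brute force. The following is a minimal sketch, not the code behind
  the visualizations: all function names are hypothetical,
  permutations are modeled as 0-indexed arrays of images, products are
  assumed to compose right to left, and the set of commutators is
  closed under composition before being treated as a subgroup.&lt;/p&gt;

```javascript
// Permutations are arrays p where p[i] is the image of i (0-indexed).

// Right-to-left composition (an assumed convention): apply q, then p.
function compose(p, q) {
  return q.map(function(image) { return p[image]; });
}

function inverse(p) {
  var inv = new Array(p.length);
  p.forEach(function(image, i) { inv[image] = i; });
  return inv;
}

// [g, h] = g^-1 h^-1 g h, as defined in the text.
function commutator(g, h) {
  return compose(compose(inverse(g), inverse(h)), compose(g, h));
}

// All permutations of [0, ..., n-1], i.e. the elements of S_n.
function symmetricGroup(n) {
  if (n === 0) {
    return [[]];
  }
  var result = [];
  symmetricGroup(n - 1).forEach(function(p) {
    // Insert n-1 at every position of each permutation of [0, ..., n-2].
    for (var i = 0; i !== n; ++i) {
      result.push(p.slice(0, i).concat([n - 1], p.slice(i)));
    }
  });
  return result;
}

// Closes a non-empty list of permutations under composition, which for
// a finite set of permutations yields the subgroup they generate.
function generatedSubgroup(elements) {
  var seen = new Map();
  var frontier = [];
  function add(p) {
    var key = p.join();
    if (!seen.has(key)) {
      seen.set(key, p);
      frontier.push(p);
    }
  }
  elements.forEach(add);
  while (frontier.length !== 0) {
    var a = frontier.pop();
    Array.from(seen.values()).forEach(function(b) {
      add(compose(a, b));
      add(compose(b, a));
    });
  }
  return Array.from(seen.values());
}

// K(G): the subgroup generated by all commutators of elements of G.
function commutatorSubgroup(group) {
  var commutators = [];
  group.forEach(function(g) {
    group.forEach(function(h) {
      commutators.push(commutator(g, h));
    });
  });
  return generatedSubgroup(commutators);
}

// Orders of G, K(G), K(K(G)), ... for G = S_n, stopping once the series
// reaches the trivial group or stops shrinking.
function derivedSeriesOrders(n) {
  var group = symmetricGroup(n);
  var orders = [group.length];
  while (group.length !== 1) {
    var next = commutatorSubgroup(group);
    if (next.length === group.length) {
      break;
    }
    orders.push(next.length);
    group = next;
  }
  return orders;
}
```

&lt;p&gt;Under these assumptions, &lt;code&gt;derivedSeriesOrders(3)&lt;/code&gt;
  evaluates to &lt;code&gt;[6, 3, 1]&lt;/code&gt; and
  &lt;code&gt;derivedSeriesOrders(4)&lt;/code&gt; to &lt;code&gt;[24, 12, 4,
  1]&lt;/code&gt;, in line with \(K^{(2)}(S_3)\) and \(K^{(3)}(S_4)\) being
  trivial while \(K^{(2)}(S_4)\) is not.&lt;/p&gt;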

&lt;p&gt;What does that tell us about what expressions we can rule out as
solutions to the general quartic equation? Similarly to the cubic
case, we expect to be able to rule out rational expressions and
algebraic expressions with no nested radicals, and since
\(K^{(2)}(S_4)\) is not the trivial group, we also expect to be able
to rule out algebraic expressions with singly-nested radicals, like
\(\sqrt{a - \sqrt{bc - 4}}\). But since \(K^{(3)}(S_4)\) is the
trivial group, we don&amp;rsquo;t expect to be able to rule out algebraic
expressions with doubly-nested radicals, like \(\sqrt{a - \sqrt{bc -
\sqrt{d + 3}}}\).&lt;/p&gt;

&lt;p&gt;As an antidote to all the abstractness above, here is a
  visualization for quartics, where you can examine how the various
  operations interact with the &lt;a href=&quot;https://en.wikipedia.org/wiki/Quartic_function#General_formula_for_roots&quot;&gt;quartic formula&lt;/a&gt;
  and its subexpressions.&lt;/p&gt;

&lt;div class=&quot;interactive-example&quot;&gt;
  &lt;h3&gt;Interactive Example 5: The quartic equation&lt;/h3&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Roots
    &lt;nokatex&gt;&lt;div id=&quot;rootBoardQuartic&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;

    &lt;span id=&quot;rootListQuartic&quot;&gt;
      \(R = \langle r_1, r_2, r_3, r_4 \rangle\)
    &lt;/span&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.resetRootAndResultList();&quot;&gt;
      Reset \(R\) and \(X\) order
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opA1, 1000);&quot;&gt;
      \(A_1 = ↺_{1, 2}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opA2, 1000);&quot;&gt;
      \(A_2 = ↺_{2, 3}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opA3, 1000);&quot;&gt;
      \(A_3 = ↺_{3, 4}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opA4, 1000);&quot;&gt;
      \(A_4 = ↺_{4, 1}\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opA1.invert(), 1000);&quot;&gt;
      \(A_1^\prime = ↻_{1, 2}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opA2.invert(), 1000);&quot;&gt;
      \(A_2^\prime = ↻_{2, 3}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opA3.invert(), 1000);&quot;&gt;
      \(A_3^\prime = ↻_{3, 4}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opA4.invert(), 1000);&quot;&gt;
      \(A_4^\prime = ↻_{4, 1}\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opB1, 4000);&quot;&gt;
      \(B_1 = [\![ A_2, A_1 ]\!] \mapsto (1 \; 2 \; 3)\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opB2, 4000);&quot;&gt;
      \(B_2 = [\![ A_3, A_2 ]\!] \mapsto (2 \; 3 \; 4)\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opB3, 4000);&quot;&gt;
      \(B_3 = [\![ A_4, A_3 ]\!] \mapsto (3 \; 4 \; 1)\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opB1.invert(), 4000);&quot;&gt;
      \(B_1^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opB2.invert(), 4000);&quot;&gt;
      \(B_2^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opB3.invert(), 4000);&quot;&gt;
      \(B_3^\prime\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opC1, 16000);&quot;&gt;
      \(C_1 = [\![ B_2, B_1 ]\!] \mapsto (1 \; 4) (2 \; 3)\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opC2, 16000);&quot;&gt;
      \(C_2 = [\![ B_3, B_2 ]\!] \mapsto (2 \; 1) (3 \; 4)\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opC1.invert(), 16000);&quot;&gt;
      \(C_1^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.runOp(quartic.opC2.invert(), 16000);&quot;&gt;
      \(C_2^\prime\)
    &lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Coefficients
    &lt;nokatex&gt;&lt;div id=&quot;coeffBoardQuartic&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Candidate solution
    &lt;nokatex&gt;&lt;div id=&quot;formulaBoardQuartic&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;

    &lt;span id=&quot;resultListQuartic&quot;&gt;
      \(X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle\)
    &lt;/span&gt;

    &lt;span id=&quot;resultNoteQuartic&quot;&gt;&lt;/span&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quarticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quartic.findFirstOpRulingOutSelectedFormula();&quot;&gt;
      Find first operation that rules out selected formula
    &lt;/button&gt;

    &lt;span id=&quot;findFirstOpStatusQuartic&quot;&gt;&lt;/span&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;quarticDisableWhileRunningOp&quot; name=&quot;formulaQuartic&quot; type=&quot;radio&quot;
             onchange=&quot;quartic.switchFormula(quarticScaledDiscFormula);&quot; /&gt;
      \(x_1 = -27 Δ\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;quarticDisableWhileRunningOp&quot; name=&quot;formulaQuartic&quot; type=&quot;radio&quot;
             onchange=&quot;quartic.switchFormula(newQuarticQCubedFormula());&quot; /&gt;
      \(x_{1, 2} = Q^3 = \frac{Δ_1 + \sqrt{-27 Δ}}{2}\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input checked class=&quot;quarticDisableWhileRunningOp&quot; name=&quot;formulaQuartic&quot; type=&quot;radio&quot;
             onchange=&quot;quartic.switchFormula(newQuarticQCubedFormula().root(3));&quot; /&gt;
      \(x_{1, 2, 3, 4, 5, 6} = Q\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;quarticDisableWhileRunningOp&quot; name=&quot;formulaQuartic&quot; type=&quot;radio&quot;
             onchange=&quot;quartic.switchFormula(newQuarticSFormula());&quot; /&gt;
      \(x_{1, 2, 3, 4, 5, 6} = S =\)
      &lt;br /&gt;
      \(\qquad \frac{1}{2} \sqrt{-\frac{2}{3} p + \frac{1}{3a} \left( Q + \frac{Δ_0}{Q} \right)}\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;quarticDisableWhileRunningOp&quot; name=&quot;formulaQuartic&quot; type=&quot;radio&quot;
             onchange=&quot;quartic.switchFormula(newQuarticFormula());&quot; /&gt;
      \(x_{1, 2, 3, 4} = \)
      &lt;br /&gt;
      \(\qquad -\frac{b}{4a} \mp S + \frac{1}{2} \sqrt{-4S^2 - 2p \pm \frac{q}{S}}\)
      &lt;br /&gt;
      (the quartic formula)
    &lt;/label&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
&apos;use strict&apos;;

function isIdentityPermutation(permutation) {
  for (var i = 0; i &lt; permutation.length; ++i) {
    if (permutation[i] != i) {
      return false;
    }
  }
  return true;
}

function updateResultNote(display, resultNoteID, formulaName) {
  var rootPermutation = display.getRootPermutation();
  var resultPermutation = display.getResultPermutation();
  var resultNote = document.getElementById(resultNoteID);
  if (isIdentityPermutation(rootPermutation) ==
    isIdentityPermutation(resultPermutation)) {
      resultNote.innerHTML = &apos;&apos;;
  } else {
    resultNote.innerHTML = &apos;(Applied operation rules out selected formula as the &apos; + formulaName + &apos; formula.)&apos;;
  }
}

function checkOpRulesOutFormula(
  display, resetFn, runOpFn, op, time, undoCallback, doneCallback) {
  resetFn();
  runOpFn(op, time, function() {
    var rootPermutation = display.getRootPermutation();
    var resultPermutation = display.getResultPermutation();
    var rulesOut = (isIdentityPermutation(rootPermutation) !=
      isIdentityPermutation(resultPermutation));
    undoCallback();
    runOpFn(op.invert(), time, function() {
      doneCallback(rulesOut);
    });
  });
}

function findFirstOpRulingOutSelectedFormulaHelper(
  display, resetFn, runOpFn, opInfos, statusCallback, doneCallback) {
    var i = 0;
    var undoCallback = function() {
      statusCallback(opInfos[i], true);
    };
    var _doneCallback = function(rulesOut) {
      if (rulesOut) {
        doneCallback(opInfos[i]);
        return;
      }
      ++i;
      if (i &gt;= opInfos.length) {
        doneCallback(null);
        return;
      }
      statusCallback(opInfos[i], false);
      checkOpRulesOutFormula(
        display, resetFn, runOpFn,
        opInfos[i].op, opInfos[i].time, undoCallback, _doneCallback);
    };
    statusCallback(opInfos[0], false);
    checkOpRulesOutFormula(
      display, resetFn, runOpFn,
      opInfos[0].op, opInfos[0].time, undoCallback, _doneCallback);
}

function findFirstOpRulingOutSelectedFormula(
  display, resetFn, runOpFn, opInfos, statusID) {
  var status = document.getElementById(statusID);
  var statusCallback = function(opInfo, isUndo) {
    if (isUndo) {
      status.innerHTML = &apos;Undoing &apos; + opInfo.name + &apos;...&apos;;
    } else {
      status.innerHTML = &apos;Trying &apos; + opInfo.name + &apos;...&apos;;
    }
  };
  var doneCallback = function(opInfo) {
    if (opInfo === null) {
      status.innerHTML = &apos;No op ruling out selected formula found&apos;;
    } else {
      status.innerHTML = opInfo.name + &apos; rules out selected formula&apos;;
    }
  };
  findFirstOpRulingOutSelectedFormulaHelper(
    display, resetFn, runOpFn, opInfos, statusCallback, doneCallback);
}

var quartic = (function() {
  var initialRoots = [
    new Complex(0, 1), new Complex(-0.5, -0.5),
    new Complex(0.5, 0.5), new Complex(0.5, -0.5)
  ];

  var display = new Display(
    &quot;rootBoardQuartic&quot;, &quot;coeffBoardQuartic&quot;, &quot;formulaBoardQuartic&quot;,
    initialRoots, newQuarticQCubedFormula().root(3), function() {});

  display._resultRotationCounterPoint.setAttribute({visible: false});

  function updateRootAndResultList(display) {
    updateRootList(display, &quot;rootListQuartic&quot;);
    updateResultList(display, &quot;resultListQuartic&quot;);
    updateResultNote(display, &quot;resultNoteQuartic&quot;, &quot;quartic&quot;);
  }

  var state = {};

  function runQuarticOp(op, time, doneCallback) {
    runOp(display, op, time, &apos;.quarticDisableWhileRunningOp&apos;, state, doneCallback);
  };

  function switchQuarticFormula(formula) {
    switchFormula(display, state, formula);
    updateRootAndResultList(display);
  }

  function resetRootAndResultList() {
    display.reorderPointsBySubscript();
    display.resetResultRotationCounters();
    updateRootAndResultList(display);
  }

  var opA1 = display.swapRootOp(0, 1, updateRootAndResultList);
  var opA2 = display.swapRootOp(1, 2, updateRootAndResultList);
  var opA3 = display.swapRootOp(2, 3, updateRootAndResultList);
  var opA4 = display.swapRootOp(3, 0, updateRootAndResultList);
  var opB1 = newCommutatorAnimation(opA2, opA1);
  var opB2 = newCommutatorAnimation(opA3, opA2);
  var opB3 = newCommutatorAnimation(opA4, opA3);
  var opC1 = newCommutatorAnimation(opB2, opB1);
  var opC2 = newCommutatorAnimation(opB3, opB2);

  var opInfos = [
    {
      name: &apos;A&lt;sub&gt;1&lt;/sub&gt;&apos;,
      op: opA1,
      time: 1000
    },
    {
      name: &apos;A&lt;sub&gt;2&lt;/sub&gt;&apos;,
      op: opA2,
      time: 1000
    },
    {
      name: &apos;A&lt;sub&gt;3&lt;/sub&gt;&apos;,
      op: opA3,
      time: 1000
    },
    {
      name: &apos;A&lt;sub&gt;4&lt;/sub&gt;&apos;,
      op: opA4,
      time: 1000
    },
    {
      name: &apos;B&lt;sub&gt;1&lt;/sub&gt;&apos;,
      op: opB1,
      time: 4000
    },
    {
      name: &apos;B&lt;sub&gt;2&lt;/sub&gt;&apos;,
      op: opB2,
      time: 4000
    },
    {
      name: &apos;B&lt;sub&gt;3&lt;/sub&gt;&apos;,
      op: opB3,
      time: 4000
    },
    {
      name: &apos;C&lt;sub&gt;1&lt;/sub&gt;&apos;,
      op: opC1,
      time: 16000
    },
    {
      name: &apos;C&lt;sub&gt;2&lt;/sub&gt;&apos;,
      op: opC2,
      time: 16000
    }
  ];

  function findFirstOpRulingOutSelectedFormulaQuartic() {
    findFirstOpRulingOutSelectedFormula(
      display, resetRootAndResultList, runQuarticOp, opInfos,
      &apos;findFirstOpStatusQuartic&apos;);
  }

  return {
    display: display,
    opA1: opA1,
    opA2: opA2,
    opA3: opA3,
    opA4: opA4,
    opB1: opB1,
    opB2: opB2,
    opB3: opB3,
    opC1: opC1,
    opC2: opC2,
    runOp: runQuarticOp,
    resetRootAndResultList: resetRootAndResultList,
    switchFormula: switchQuarticFormula,
    findFirstOpRulingOutSelectedFormula: findFirstOpRulingOutSelectedFormulaQuartic
  };
})();
&lt;/script&gt;

&lt;p&gt;There are a few additions to the interactive display above. First, it now
  prints a message when it detects that the selected expression is
  ruled out as the quartic formula, which just checks whether \(R\)
  is out of order while \(X\) is in order, or vice versa. There&amp;rsquo;s also a
  button to reset the ordering of \(R\) and \(X\).&lt;/p&gt;

&lt;p&gt;The second addition is that the operations have been organized to
make clear what commutator subgroup they&amp;rsquo;re in. The \(A_i\) map
to generators of \(S_4\). Then taking the commutators of adjacent
\(A_i\) gives the \(B_i\), which map to the generators of \(K(S_4)\), and
similarly for \(C_i\).&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;The third addition is a button that finds the first operation that
rules out the selected formula, if any. It simply tries all the
\(A_i\)s, then all the \(B_i\)s, then all the \(C_i\)s, checking \(R\)
and \(X\) in between. The general algorithm, which assumes a fixed set
of roots \(r_1, \dotsc, r_n\text{,}\) takes an expression \(f(a_n, a_{n-1}, \dotsc)\)
where \(a_n x^n + a_{n-1} x^{n-1} + \dotsb + a_0 = 0\) is the general
\(n\)th-degree polynomial equation, takes a depth limit \(k\), and
looks like this (defining \(K^{(0)}(G)\) to be just \(G\)):

&lt;ol&gt;
  &lt;li&gt;For \(i\) from 0 to \(k\):
  &lt;ol&gt;
    &lt;li&gt;If \(K^{(i)}(S_n)\) is trivial, then terminate indicating that
      \(f(a_n, a_{n-1}, \dotsc)\) was unable to be ruled out because
      \(K^{(i)}(S_n)\) is trivial.&lt;/li&gt;
    &lt;li&gt;Otherwise, find operations \(o_1\) to \(o_m\) that act as the
      generators \(g_1\) to \(g_m\) of \(K^{(i)}(S_n)\). For \(i &gt;
      0\), this can be done by applying \([\![ o_1, o_2 ]\!]\) to the
      operations corresponding to the generators of
      \(K^{(i-1)}(S_n)\).&lt;/li&gt;
    &lt;li&gt;For each \(o_j\):
    &lt;ol&gt;
      &lt;li&gt;Apply \(o_j\).&lt;/li&gt;
      &lt;li&gt;If \(R\) is not in order but \(X\) is, terminate indicating
        that \(o_j\) rules out \(f(a_n, a_{n-1}, \dotsc)\).&lt;/li&gt;
      &lt;li&gt;Undo \(o_j\), i.e. apply \(o_j^\prime\) or reset to the
      initial state of \(r_1, \dotsc, r_n\).&lt;/li&gt;
    &lt;/ol&gt;&lt;/li&gt;
  &lt;/ol&gt;&lt;/li&gt;
  &lt;li&gt;Terminate indicating that \(f(a_n, a_{n-1}, \dotsc)\) was unable to
  be ruled out because the depth limit has been reached.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
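
&lt;p&gt;The steps above can be sketched in code. This is only an
  illustrative model, not the implementation used by the
  visualization: each operation is represented by a hypothetical
  object holding the permutations it induces on \(R\) and on \(X\)
  (&lt;code&gt;onR&lt;/code&gt; and &lt;code&gt;onX&lt;/code&gt;, both names assumed
  here), permutations are 0-indexed arrays composed right to left,
  and the next level is built from commutators of all pairs of
  operations rather than only adjacent ones, so the number of
  operations grows quickly with the depth.&lt;/p&gt;

```javascript
// Permutations are arrays p where p[i] is the image of i (0-indexed).
function compose(p, q) {
  // Right-to-left composition (an assumed convention): apply q, then p.
  return q.map(function(image) { return p[image]; });
}

function inverse(p) {
  var inv = new Array(p.length);
  p.forEach(function(image, i) { inv[image] = i; });
  return inv;
}

// [g, h] = g^-1 h^-1 g h, as defined in the text.
function commutator(g, h) {
  return compose(compose(inverse(g), inverse(h)), compose(g, h));
}

function isIdentity(p) {
  return p.every(function(image, i) { return image === i; });
}

// Hypothetical model: an operation is just the pair of permutations it
// induces on R and on X, so [[o1, o2]] is a commutator on each part.
function commutatorOp(o1, o2) {
  return {
    onR: commutator(o1.onR, o2.onR),
    onX: commutator(o1.onX, o2.onX)
  };
}

// An operation rules out the formula if R is not in order but X is.
function rulesOut(o) {
  return isIdentity(o.onR) ? false : isIdentity(o.onX);
}

// Returns {level, op} for the first operation found that rules out the
// formula, or null if none is found within depthLimit levels.
function findFirstOpRulingOut(generatorOps, depthLimit) {
  var ops = generatorOps;
  for (var level = 0; level !== depthLimit + 1; ++level) {
    // Step 1.1: stop if every operation at this level is trivial on R.
    if (ops.every(function(o) { return isIdentity(o.onR); })) {
      return null;
    }
    // Step 1.3: look for an operation that rules out the formula.
    var found = ops.find(rulesOut);
    if (found !== undefined) {
      return {level: level, op: found};
    }
    // Step 1.2 for the next level: commutators of all pairs.
    var next = [];
    ops.forEach(function(o1) {
      ops.forEach(function(o2) { next.push(commutatorOp(o1, o2)); });
    });
    ops = next;
  }
  return null;
}
```

&lt;p&gt;For instance, modeling \(\sqrt[5]{Δ}\) for the cubic by letting
  both \(↺_{1, 2}\) and \(↺_{2, 3}\) act on \(X\) as the same
  \(5\)-cycle, this sketch finds a ruling-out operation at level 1,
  in line with the discussion of Theorem&amp;nbsp;2. If instead each
  operation acts on \(X\) exactly as it does on \(R\), it returns
  &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;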

&lt;p&gt;This algorithm basically just implements the proof of the following
  lemma, which generalizes the previous theorems, except that it tries
  to find the simplest operation that is a generator that rules out
  the given expression.&lt;/p&gt;

&lt;p&gt;Before we state the lemma, we need another definition: let the &lt;em&gt;radical level&lt;/em&gt; of an algebraic expression
  \(f(a_n, a_{n-1}, \dotsc)\) be \(0\) if \(f(a_n, a_{n-1}, \dotsc)\) is a
  rational expression, \(1\) if \(f(a_n, a_{n-1}, \dotsc)\) has only
  non-nested radicals, and \(m + 1\) if the maximum nesting depth of
  its radicals is \(m\).&lt;/p&gt;
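
&lt;p&gt;To illustrate the definition (the first two expressions are
  hypothetical examples chosen only for illustration):
  \(\frac{a + b}{c}\) has radical level \(0\), \(\sqrt{a} +
  \sqrt[3]{bc}\) has radical level \(1\), the singly-nested \(\sqrt{a
  - \sqrt{bc - 4}}\) has radical level \(2\), and the doubly-nested
  \(\sqrt{a - \sqrt{bc - \sqrt{d + 3}}}\) has radical level
  \(3\).&lt;/p&gt;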

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 3&lt;/span&gt;.) If the
  algebraic expression \(f(a_n, a_{n-1}, \dotsc)\) has radical level
  \(d\) and \(K^{(d)}(S_n)\) is non-trivial, then any operator that
  maps to a non-trivial element \(g\) in \(K^{(d)}(S_n)\) rules out
  \(f(a_n, a_{n-1}, \dotsc)\) as the solution to the general
  \(n\)th-degree polynomial equation
  \[
  a_n x^n + a_{n-1} x^{n-1} + \dotsb + a_0 = 0\text{.}
  \]&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
&lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Rough sketch of proof.&lt;/span&gt; We just do
  induction on \(d\). For the base case \(d = 0\), if \(K^{(0)}(S_n)\)
  is non-trivial, then \(n \ge 2\). Let \(g = (i \; j)\) for any \(i
  \ne j\), of which there must be at least one. Then by the same
  reasoning as Theorem&amp;nbsp;1, \(g\) rules out \(f(a_n, a_{n-1},
  \dotsc)\). Since the \((i \; j)\) generate \(S_n\), then any \(g \in
  S_n\) is the composition of some sequence of \((i \; j)\)s, each of
  which rules out \(f(a_n, a_{n-1}, \dotsc)\), so \(g\) must also rule
  it out.&lt;/p&gt;

&lt;p&gt;Assume the lemma holds for \(d\), and let \(x = f_{d+1}(a_n,
  a_{n-1}, \dotsc) = \sqrt[k]{f_d(a_n, a_{n-1}, \dotsc)}\) for some
  \(k\), where \(f_d\) has radical level \(d\). Let \(o\) act on \(R\)
  like any non-trivial element \(g\) of \(K^{(d+1)}(S_n)\). By the
  induction hypothesis, all elements \(h_i \in K^{(d)}(S_n)\) cause
  \(x = f_d(a_n, a_{n-1}, \dotsc)\) to go around a loop, so pick a
  point \(z_0\) that is never equal to any point \(x\) traverses under
  any operation corresponding to \(h_i\). Then, since \(g = [h, k]\)
  for \(h, k \in K^{(d)}(S_n)\), by the same reasoning as in
  Theorem&amp;nbsp;2, the total angle induced by \(o\) on \(x =
  f_{d+1}(a_n, a_{n-1}, \dotsc)\) around \(z_0\) is \(0\), and the
  distance from \(z_0\) remains unchanged. Thus, \(x = f_{d+1}(a_n,
  a_{n-1}, \dotsc)\) remains fixed, and \(o\) rules it out.&lt;/p&gt;

&lt;p&gt;By the same reasoning as in Theorem&amp;nbsp;2, this can be extended to the
  general case of \(f(a_n, a_{n-1}, \dotsc)\) being any algebraic
  formula with radical level \(d + 1\). &amp;#x220e;&lt;/p&gt;
&lt;/div&gt;

&lt;div&gt;We can immediately deduce the following corollary, using the fact
that \(K^{(2)}(S_4)\) is non-trivial:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Corollary 4&lt;/span&gt;.) An
algebraic expression with at most singly-nested radicals in the
coefficients of the general quartic equation
  \[
  ax^4 + bx^3 + cx^2 + dx + e = 0
  \]
  cannot be a solution to this equation.&lt;sup&gt;&lt;a href=&quot;#fn6&quot; id=&quot;r6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;
&lt;/div&gt;
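
&lt;p&gt;The fact that \(K^{(2)}(S_4)\) is non-trivial can be checked by brute force. The following is a minimal sketch, assuming permutations are stored as arrays of images and taking the commutator to be \(p q p^{-1} q^{-1}\); other conventions generate the same subgroups. All function names here are hypothetical and independent of the code behind the interactive examples.&lt;/p&gt;

```javascript
// Permutations of {0, ..., n-1} are arrays p, where p[i] is the image of i.
function identity(n) {
  return Array.from({ length: n }, (_, i) => i);
}

// compose(p, q) applies q first, then p.
function compose(p, q) {
  return q.map((x) => p[x]);
}

function inverse(p) {
  const inv = new Array(p.length);
  p.forEach((x, i) => { inv[x] = i; });
  return inv;
}

// The commutator p q p^-1 q^-1 (one of several common conventions).
function commutator(p, q) {
  return compose(compose(p, q), compose(inverse(p), inverse(q)));
}

// Returns every element of the group generated by gens.
function closure(gens, n) {
  const seen = new Map([[identity(n).join(','), identity(n)]]);
  const pending = [identity(n)];
  while (pending.length > 0) {
    const g = pending.pop();
    for (const s of gens) {
      const h = compose(s, g);
      const key = h.join(',');
      if (!seen.has(key)) {
        seen.set(key, h);
        pending.push(h);
      }
    }
  }
  return Array.from(seen.values());
}

// Returns the subgroup generated by all commutators of elements of group.
function derivedSubgroup(group, n) {
  const commutators = new Map();
  for (const g of group) {
    for (const h of group) {
      const c = commutator(g, h);
      commutators.set(c.join(','), c);
    }
  }
  return closure(Array.from(commutators.values()), n);
}

// Returns the orders of K^(0)(S_n), K^(1)(S_n), ..., stopping at maxDepth
// or as soon as a trivial group is reached.
function derivedSeriesOrders(n, maxDepth) {
  // Adjacent swaps generate S_n.
  const swaps = [];
  for (let i = 1; n > i; ++i) {
    const swap = identity(n);
    swap[i - 1] = i;
    swap[i] = i - 1;
    swaps.push(swap);
  }
  let group = closure(swaps, n);
  const orders = [group.length];
  for (let depth = 1; maxDepth >= depth; ++depth) {
    if (group.length === 1) break;
    group = derivedSubgroup(group, n);
    orders.push(group.length);
  }
  return orders;
}
```

&lt;p&gt;Under these assumptions, &lt;code&gt;derivedSeriesOrders(4, 3)&lt;/code&gt; gives &lt;code&gt;[24, 12, 4, 1]&lt;/code&gt;: \(K^{(2)}(S_4)\) has order \(4\), while \(K^{(3)}(S_4)\) is trivial.&lt;/p&gt;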

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;5. Quintic Equations&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Now, finally, the quintic. Let&amp;rsquo;s jump right to the interactive example.&lt;/p&gt;

&lt;div class=&quot;interactive-example&quot;&gt;
  &lt;h3&gt;Interactive Example 6: The quintic equation&lt;/h3&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Roots
    &lt;nokatex&gt;&lt;div id=&quot;rootBoardQuintic&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;

    &lt;span id=&quot;rootListQuintic&quot;&gt;
      \(R = \langle r_1, r_2, r_3, r_4, r_5 \rangle\)
    &lt;/span&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.resetRootAndResultList();&quot;&gt;
      Reset \(R\) and \(X\) order
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opA1, 1000);&quot;&gt;
      \(A_1 = ↺_{1, 2}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opA2, 1000);&quot;&gt;
      \(A_2 = ↺_{2, 3}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opA3, 1000);&quot;&gt;
      \(A_3 = ↺_{3, 4}\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opA4, 1000);&quot;&gt;
      \(A_4 = ↺_{4, 5}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opA5, 1000);&quot;&gt;
      \(A_5 = ↺_{5, 1}\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opA1.invert(), 1000);&quot;&gt;
      \(A_1^\prime = ↻_{1, 2}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opA2.invert(), 1000);&quot;&gt;
      \(A_2^\prime = ↻_{2, 3}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opA3.invert(), 1000);&quot;&gt;
      \(A_3^\prime = ↻_{3, 4}\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opA4.invert(), 1000);&quot;&gt;
      \(A_4^\prime = ↻_{4, 5}\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opA5.invert(), 1000);&quot;&gt;
      \(A_5^\prime = ↻_{5, 1}\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opB1, 4000);&quot;&gt;
      \(B_1 = [\![ A_2, A_1 ]\!] \mapsto (1 \; 2 \; 3)\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opB2, 4000);&quot;&gt;
      \(B_2 = [\![ A_3, A_2 ]\!] \mapsto (2 \; 3 \; 4)\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opB3, 4000);&quot;&gt;
      \(B_3 = [\![ A_4, A_3 ]\!] \mapsto (3 \; 4 \; 5)\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opB4, 4000);&quot;&gt;
      \(B_4 = [\![ A_5, A_4 ]\!] \mapsto (4 \; 5 \; 1)\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opB5, 4000);&quot;&gt;
      \(B_5 = [\![ A_1, A_5 ]\!] \mapsto (5 \; 1 \; 2)\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opB1.invert(), 4000);&quot;&gt;
      \(B_1^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opB2.invert(), 4000);&quot;&gt;
      \(B_2^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opB3.invert(), 4000);&quot;&gt;
      \(B_3^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opB4.invert(), 4000);&quot;&gt;
      \(B_4^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opB5.invert(), 4000);&quot;&gt;
      \(B_5^\prime\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opC1, 16000);&quot;&gt;
      \(C_1 = [\![ B_3, B_1 ]\!] \mapsto (2 \; 3 \; 5)\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opC2, 16000);&quot;&gt;
      \(C_2 = [\![ B_4, B_2 ]\!] \mapsto (3 \; 4 \; 1)\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opC3, 16000);&quot;&gt;
      \(C_3 = [\![ B_5, B_3 ]\!] \mapsto (4 \; 5 \; 2)\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opC4, 16000);&quot;&gt;
      \(C_4 = [\![ B_1, B_4 ]\!] \mapsto (5 \; 1 \; 3)\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opC5, 16000);&quot;&gt;
      \(C_5 = [\![ B_2, B_5 ]\!] \mapsto (1 \; 2 \; 4)\)
    &lt;/button&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opC1.invert(), 16000);&quot;&gt;
      \(C_1^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opC2.invert(), 16000);&quot;&gt;
      \(C_2^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opC3.invert(), 16000);&quot;&gt;
      \(C_3^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opC4.invert(), 16000);&quot;&gt;
      \(C_4^\prime\)
    &lt;/button&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.runOp(quintic.opC5.invert(), 16000);&quot;&gt;
      \(C_5^\prime\)
    &lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Coefficients
    &lt;nokatex&gt;&lt;div id=&quot;coeffBoardQuintic&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;

  &lt;/div&gt;

  &lt;div class=&quot;graph-container&quot;&gt;
    Candidate solution
    &lt;nokatex&gt;&lt;div id=&quot;formulaBoardQuintic&quot; class=&quot;graph jxgbox&quot;&gt;&lt;/div&gt;&lt;/nokatex&gt;

    &lt;span id=&quot;resultListQuintic&quot;&gt;
      \(X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle\)
    &lt;/span&gt;

    &lt;span id=&quot;resultNoteQuintic&quot;&gt;&lt;/span&gt;

    &lt;br /&gt;

    &lt;button class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot;
            type=&quot;button&quot; onclick=&quot;quintic.findFirstOpRulingOutSelectedFormula();&quot;&gt;
      Find first operation that rules out selected formula
    &lt;/button&gt;

    &lt;span id=&quot;findFirstOpStatusQuintic&quot;&gt;&lt;/span&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot; name=&quot;formulaQuintic&quot; type=&quot;radio&quot;
             onchange=&quot;quintic.switchFormula(quintic.fA);&quot; /&gt;
      \(x_1 = f_A = Δ\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot; name=&quot;formulaQuintic&quot; type=&quot;radio&quot;
             onchange=&quot;quintic.switchFormula(quintic.newFB());&quot; /&gt;
      \(x_{1, 2} = f_B = \sqrt{f_A}\)
    &lt;/label&gt;

    &lt;br /&gt;

    &lt;label&gt;
      &lt;input checked class=&quot;interactive-example-button quinticDisableWhileRunningOp&quot; name=&quot;formulaQuintic&quot; type=&quot;radio&quot;
             onchange=&quot;quintic.switchFormula(quintic.newFC());&quot; /&gt;
      \(x_{1, 2, 3, 4, 5, 6} = f_C =\)
      &lt;br /&gt;
      \(\qquad \sqrt[3]{(f_B - 0.8)(f_B - 0.75)}\)
    &lt;/label&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
&apos;use strict&apos;;

var quintic = (function() {
  var initialRoots = [
    new Complex(0, 1), new Complex(-0.5, -0.5), new Complex(0.5, -0.5),
    new Complex(1, 0), new Complex(0.5, 0.5)
  ];

  var display = new Display(
    &quot;rootBoardQuintic&quot;, &quot;coeffBoardQuintic&quot;, &quot;formulaBoardQuintic&quot;,
    initialRoots, newFC(), function() {});

  display._resultRotationCounterPoint.setAttribute({visible: false});

  for (var i = 0; i &lt; display._rootPointsBySubscript.length; ++i) {
    display._rootPointsBySubscript[i].setAttribute({
      fixed: true
    });
  }


  function updateRootAndResultList(display) {
    updateRootList(display, &quot;rootListQuintic&quot;);
    updateResultList(display, &quot;resultListQuintic&quot;);
    updateResultNote(display, &quot;resultNoteQuintic&quot;, &quot;quintic&quot;);
  }

  var state = {};

  function runQuinticOp(op, time, doneCallback) {
    runOp(display, op, time, &apos;.quinticDisableWhileRunningOp&apos;, state, doneCallback);
  };

  function switchQuinticFormula(formula) {
    switchFormula(display, state, formula);
    updateRootAndResultList(display);
  }

  function resetRootAndResultList() {
    display.reorderPointsBySubscript();
    display.resetResultRotationCounters();
    updateRootAndResultList(display);
  }

  var opA1 = display.swapRootOp(0, 1, updateRootAndResultList);
  var opA2 = display.swapRootOp(1, 2, updateRootAndResultList);
  var opA3 = display.swapRootOp(2, 3, updateRootAndResultList);
  var opA4 = display.swapRootOp(3, 4, updateRootAndResultList);
  var opA5 = display.swapRootOp(4, 0, updateRootAndResultList);
  var opA1Inv = opA1.invert();
  var opA2Inv = opA2.invert();
  var opA3Inv = opA3.invert();
  var opA4Inv = opA4.invert();
  var opA5Inv = opA5.invert();
  var opB1 = newCommutatorAnimation(opA2, opA1);
  var opB2 = newCommutatorAnimation(opA3, opA2);
  var opB3 = newCommutatorAnimation(opA4, opA3);
  var opB4 = newCommutatorAnimation(opA5, opA4);
  var opB5 = newCommutatorAnimation(opA1, opA5);
  var opB1Inv = opB1.invert();
  var opB2Inv = opB2.invert();
  var opB3Inv = opB3.invert();
  var opB4Inv = opB4.invert();
  var opB5Inv = opB5.invert();
  var opC1 = newCommutatorAnimation(opB3, opB1);
  var opC2 = newCommutatorAnimation(opB4, opB2);
  var opC3 = newCommutatorAnimation(opB5, opB3);
  var opC4 = newCommutatorAnimation(opB1, opB4);
  var opC5 = newCommutatorAnimation(opB2, opB5);
  var opC1Inv = opC1.invert();
  var opC2Inv = opC2.invert();
  var opC3Inv = opC3.invert();
  var opC4Inv = opC4.invert();
  var opC5Inv = opC5.invert();

  var opInfos = [
    {
      name: &apos;A&lt;sub&gt;1&lt;/sub&gt;&apos;,
      op: opA1,
      time: 1000
    },
    {
      name: &apos;A&lt;sub&gt;2&lt;/sub&gt;&apos;,
      op: opA2,
      time: 1000
    },
    {
      name: &apos;A&lt;sub&gt;3&lt;/sub&gt;&apos;,
      op: opA3,
      time: 1000
    },
    {
      name: &apos;A&lt;sub&gt;4&lt;/sub&gt;&apos;,
      op: opA4,
      time: 1000
    },
    {
      name: &apos;A&lt;sub&gt;5&lt;/sub&gt;&apos;,
      op: opA5,
      time: 1000
    },
    {
      name: &apos;B&lt;sub&gt;1&lt;/sub&gt;&apos;,
      op: opB1,
      time: 4000
    },
    {
      name: &apos;B&lt;sub&gt;2&lt;/sub&gt;&apos;,
      op: opB2,
      time: 4000
    },
    {
      name: &apos;B&lt;sub&gt;3&lt;/sub&gt;&apos;,
      op: opB3,
      time: 4000
    },
    {
      name: &apos;B&lt;sub&gt;4&lt;/sub&gt;&apos;,
      op: opB4,
      time: 4000
    },
    {
      name: &apos;B&lt;sub&gt;5&lt;/sub&gt;&apos;,
      op: opB5,
      time: 4000
    },
    {
      name: &apos;C&lt;sub&gt;1&lt;/sub&gt;&apos;,
      op: opC1,
      time: 16000
    },
    {
      name: &apos;C&lt;sub&gt;2&lt;/sub&gt;&apos;,
      op: opC2,
      time: 16000
    },
    {
      name: &apos;C&lt;sub&gt;3&lt;/sub&gt;&apos;,
      op: opC3,
      time: 16000
    },
    {
      name: &apos;C&lt;sub&gt;4&lt;/sub&gt;&apos;,
      op: opC4,
      time: 16000
    },
    {
      name: &apos;C&lt;sub&gt;5&lt;/sub&gt;&apos;,
      op: opC5,
      time: 16000
    }
  ];

  function findFirstOpRulingOutSelectedFormulaQuintic() {
    findFirstOpRulingOutSelectedFormula(
      display, resetRootAndResultList, runQuinticOp, opInfos,
      &apos;findFirstOpStatusQuintic&apos;);
  }

  // Ruled out by A_i.
  var fA = quinticDiscFormula;

  // Ruled out by B_i.
  function newFB() {
    return quinticDiscFormula.root(2);
  }

  // Has a rotation number with B_1, B_2, B_4, and B_5.
  function newPreFC1() {
    return newFB().minusAll(0.8);
  }

  // Has a rotation number with B_3.
  function newPreFC2() {
    return newFB().minusAll(0.75);
  }

  // Has a rotation number with all B_i.
  function newPreFC3() {
    return ComplexFormula.times(
      newPreFC1(),
      newPreFC2()
    );
  }

  // 2 evenly divides the rotation numbers with B_1, B_2, B_4, and B_5, so
  // this doesn&apos;t work for f_C.
  function newPreFC4() {
    return newPreFC3().root(2);
  }

  // Ruled out by C_i.
  function newFC() {
    return newPreFC3().root(3);
  }

  return {
    display: display,
    opA1: opA1,
    opA2: opA2,
    opA3: opA3,
    opA4: opA4,
    opA5: opA5,
    opB1: opB1,
    opB2: opB2,
    opB3: opB3,
    opB4: opB4,
    opB5: opB5,
    opC1: opC1,
    opC2: opC2,
    opC3: opC3,
    opC4: opC4,
    opC5: opC5,
    fA: fA,
    newFB: newFB,
    newFC: newFC,
    runOp: runQuinticOp,
    resetRootAndResultList: resetRootAndResultList,
    switchFormula: switchQuinticFormula,
    findFirstOpRulingOutSelectedFormula: findFirstOpRulingOutSelectedFormulaQuintic
  };
})();
&lt;/script&gt;

&lt;p&gt;Similarly to the interactive example for the quartic, the
  operations are organized to make clear which commutator subgroup
  they&amp;rsquo;re in. There&amp;rsquo;s something interesting, though:
  the \(C_i\) seem very similar to the \(B_i\). In fact, like the
  \(B_i\), the \(C_i\) act on \(R\) as elements of the alternating
  group \(A_5\) (not to be confused with the operation \(A_5\) above)!
  Also, if you compute \(D_i = [\![ C_{(i+1) \bmod 5}, C_{i \bmod
  5} ]\!]\), you will find that \(D_i\) acts exactly like \(B_i\) on
  \(R\)!&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Why can we do this for the quintic, but not for anything of lower
  degree? This is because the alternating group \(A_5 = K(S_5)\) is &lt;a href=&quot;https://en.wikipedia.org/wiki/Perfect_group&quot;&gt;perfect&lt;/a&gt;,
  which means that it equals its own commutator subgroup. (You can
  verify this yourself by brute force, e.g. by writing a program, or you
  can play around with \(3\)-cycles and see that any \(3\)-cycle is
  the commutator of two other \(3\)-cycles.) This immediately
  implies that \(K^{(d)}(S_5)\) is non-trivial for any \(d\), which
  in turn implies our main result:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Abel-Ruffini theorem&lt;/span&gt;.)
  An algebraic expression in the coefficients of the general
  \(n\)th-degree polynomial equation
  \[
  a_n x^n + a_{n-1} x^{n-1} + \dotsb + a_0 = 0
  \]
  for \(n \ge 5\) cannot be a solution to this equation.&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
&lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; By the above, \(A_5\) is
  perfect, so \(K^{(d)}(S_5)\) is non-trivial for all \(d\).&lt;/p&gt;

&lt;p&gt;Since \(S_5\) is a subgroup of \(S_n\) for \(n \ge 5\), \(A_5 =
K(S_5)\) must also be a subgroup of \(A_n = K(S_n)\) for \(n \ge
5\). But since \(A_5\) is perfect, then \(A_5\) must also be a
subgroup of \(K^{(d)}(S_n)\) for any \(d\), which means that
\(K^{(d)}(S_n)\) is non-trivial for any \(d\) and \(n \ge 5\).&lt;/p&gt;

&lt;p&gt;An algebraic expression has some finite radical level \(d\), but
  \(K^{(d)}(S_n)\) is non-trivial for any \(d\) and \(n \ge 5\), so by
  Lemma&amp;nbsp;3 no algebraic expression can be a solution to the general
  \(n\)th-degree polynomial equation for \(n \ge 5\). &amp;#x220e;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
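
&lt;p&gt;The claim that any \(3\)-cycle in \(S_5\) is the commutator of two other \(3\)-cycles can be checked by brute force. Here is a minimal sketch, assuming permutations are stored as arrays of images and taking the commutator to be \(g h g^{-1} h^{-1}\). Since the set of \(3\)-cycles is closed under inverses, the set of commutators reached does not depend on that convention. The function names are hypothetical.&lt;/p&gt;

```javascript
// Permutations of {0, ..., n-1} are arrays p, where p[i] is the image of i.
// compose(p, q) applies q first, then p.
function compose(p, q) {
  return q.map((x) => p[x]);
}

function inverse(p) {
  const inv = new Array(p.length);
  p.forEach((x, i) => { inv[x] = i; });
  return inv;
}

function commutator(g, h) {
  return compose(compose(g, h), compose(inverse(g), inverse(h)));
}

// Returns all 3-cycles (a b c) on {0, ..., n-1}, each listed exactly once.
function threeCycles(n) {
  const cycles = [];
  for (let a = 0; a !== n; ++a) {
    for (let b = 0; b !== n; ++b) {
      for (let c = 0; c !== n; ++c) {
        // Requiring a to be the smallest entry picks one rotation of each cycle.
        if (!(b > a) || !(c > a) || b === c) continue;
        const p = Array.from({ length: n }, (_, i) => i);
        p[a] = b;
        p[b] = c;
        p[c] = a;
        cycles.push(p);
      }
    }
  }
  return cycles;
}

// Returns true if every 3-cycle in S_n is the commutator of two 3-cycles.
function everyThreeCycleIsACommutator(n) {
  const cycles = threeCycles(n);
  const reached = new Set();
  for (const g of cycles) {
    for (const h of cycles) {
      reached.add(commutator(g, h).join(','));
    }
  }
  return cycles.every((c) => reached.has(c.join(',')));
}
```

&lt;p&gt;With these definitions, &lt;code&gt;everyThreeCycleIsACommutator(5)&lt;/code&gt; returns &lt;code&gt;true&lt;/code&gt;, while &lt;code&gt;everyThreeCycleIsACommutator(4)&lt;/code&gt; returns &lt;code&gt;false&lt;/code&gt;, which matches the quartic case stopping at a finite depth. Since the \(3\)-cycles generate \(A_5\), the first result implies that \(A_5\) equals its own commutator subgroup.&lt;/p&gt;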

&lt;p&gt;With the theorem above, we now have a succinct answer to the
question at the beginning of this article. You can&amp;rsquo;t write down
a solution to the general quadratic equation that is a rational
expression because you can find an operation on the roots that will
permute them non-trivially and yet leave the result of the expression
constant. For the same reason, you can&amp;rsquo;t write down a solution
to the general \(n\)th-degree polynomial equation for \(n \ge 5\) that
is an algebraic expression!&lt;/p&gt;

&lt;p&gt;Finally, as a bonus, I&amp;rsquo;ll explain how to generate algebraic
  expressions that require a &amp;ldquo;\(d\)th-level&amp;rdquo; operator,
  meaning an operator that maps to an element of \(K^{(d)}(S_n)\),
  assuming it&amp;rsquo;s non-trivial. This shows that there&amp;rsquo;s no
  single &amp;ldquo;super-operation&amp;rdquo; that rules out all algebraic
  expressions.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;As an example, the formulas in the interactive example above are
chosen so that \(f_A\) is ruled out by the \(A_i\), \(f_B\) is ruled
out by the \(B_i\), etc. They depend on the particular roots chosen,
of course, which is why this interactive example doesn&amp;rsquo;t let you
move the roots around, but in principle you could build, for any
polynomial, formulas that are first ruled out by the \(C_i\), or the
\(D_i\), or whatever you wish. Given a polynomial \(P = a_n x^n + a_{n-1} x^{n-1}
+ \dotsb + a_0\) of degree \(n \ge 5\) and a level \(d\), a recursive
algorithm to generate an expression that is ruled out only by a
&amp;ldquo;\(d\)th-level&amp;rdquo; operator is:

  &lt;ol&gt;
    &lt;li&gt;If \(d = 0\), return \(Δ(a_n, a_{n-1}, \dotsc)\).&lt;/li&gt;
    &lt;li&gt;Otherwise, run this algorithm with \(P\) and \(d-1\) to get
      \(f_{d-1}(a_n, a_{n-1}, \dotsc)\).&lt;/li&gt;
      &lt;li&gt;Find operations \(o_1\) to \(o_m\) that correspond to
        generators \(g_1\) to \(g_m\) of \(K^{(d-1)}(S_n)\).&lt;/li&gt;
      &lt;li&gt;For each \(o_i\):
        &lt;ol&gt;
          &lt;li&gt;Apply \(o_i\), which makes \(x = f_{d-1}(a_n, a_{n-1},
          \dotsc)\) go around a loop. Record the looped-around regions
          and their associated rotation numbers (i.e., the total angle
          divided by \(2π\)).&lt;/li&gt;
        &lt;/ol&gt;
      &lt;/li&gt;
      &lt;li&gt;Pick points \(z_1, \dotsc, z_t\) such that each \(z_i\) has
        a non-zero rotation number for at least one \(o_j\). \(t\) can
        be at most \(m\).&lt;/li&gt;
      &lt;li&gt;Let \(k\) be the least number such that, for every \(o_i\),
        \(k\) doesn&amp;rsquo;t divide any of the rotation numbers of any
        \(z_j\) with respect to \(o_i\). Return \(f_d(a_n, a_{n-1}, \dotsc) = \sqrt[k]{\prod_i
        (f_{d-1}(a_n, a_{n-1}, \dotsc) - z_i)}\).
      &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
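
&lt;p&gt;The choice of \(k\) in the last step can be sketched as follows. This is only one reading of that step: it assumes the rotation numbers have already been measured and are passed in as a table, and it works with the total rotation number of each operation, i.e. the sum over the chosen points, since the angle swept out by a product is the sum of the angles swept out by its factors. The function name and the sample numbers are made up for illustration and are not measured from the interactive example.&lt;/p&gt;

```javascript
// rotations[i][j] is the (integer) rotation number of f_{d-1} around the
// point z_j under the operation o_i. Returns the least k >= 2 that does not
// divide the total rotation number of any operation, or null if some
// operation has total rotation number 0, in which case no k can work.
function leastRootDegree(rotations) {
  const totals = rotations.map((row) => row.reduce((sum, r) => sum + r, 0));
  if (totals.some((t) => t === 0)) {
    return null;
  }
  // The loop terminates: any k larger than every |t| divides no non-zero t.
  for (let k = 2; ; ++k) {
    if (totals.every((t) => t % k !== 0)) {
      return k;
    }
  }
}

// Hypothetical table for five operations and two points.
const sampleRotations = [
  [2, 0],
  [2, 0],
  [0, 2],
  [2, 0],
  [2, 0],
];
const sampleDegree = leastRootDegree(sampleRotations);
```

&lt;p&gt;For this made-up table every total is \(2\), so a square root would be left fixed by every operation, and the sketch picks a cube root instead.&lt;/p&gt;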

&lt;/section&gt;

&lt;hr /&gt;



&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] This proof is originally due to &lt;a href=&quot;https://en.wikipedia.org/wiki/Vladimir_Arnold&quot;&gt;Arnold&lt;/a&gt;. There
  are a &lt;a href=&quot;https://www.youtube.com/watch?v=RhpVSV6iCko&quot;&gt;couple&lt;/a&gt;
    of &lt;a href=&quot;http://drorbn.net/dbnvp/AKT-140314.php&quot;&gt;videos&lt;/a&gt; that
    talk about this proof, as well as
    &lt;a href=&quot;http://link.springer.com/book/10.1007%2F1-4020-2187-9&quot;&gt;this book&lt;/a&gt;
    based on Arnold&amp;rsquo;s lectures, and
    &lt;a href=&quot;https://www.tmna.ncu.pl/static/files/v16n2-02.pdf&quot;&gt;this paper&lt;/a&gt;.
    I mostly follow Boaz&amp;rsquo;s video, and the interactive
    visualizations are based on the visualizations he has in his
    video.&lt;/p&gt;

  &lt;p&gt;The interactive visualizations were generated using
  the excellent
    &lt;a href=&quot;http://jsxgraph.uni-bayreuth.de/wp/index.html&quot;&gt;JSXGraph&lt;/a&gt; library.
    &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn2&quot;&gt;[2] Theorem&amp;nbsp;1 can be generalized even more! We can
    append other functions and operations to rational expressions, as
    long as those functions and operations are continuous and
    single-valued. For example, we can allow the use of exponentials
    and trigonometric functions, which is something that the standard
    Galois theory cannot handle.&lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn3&quot;&gt;[3] More precisely, a \(↺_{i, j}\)
    contains a pair of simple paths, i.e. continuous injective
    functions \([0, 1] \to \mathbb{C}\), between two distinct points
    of \(\mathbb{C}\), such that their concatenation defines a simple
    closed curve
    around a region in \(\mathbb{C}\) with a counter-clockwise
    orientation. Also, depending on the exact method of formalizing
    \(↺_{i, j}\), it either explicitly or implicitly
    encodes a permutation on \(R\). Then we can define an operation
    \(*\) on the \(↺_{i, j}\) and
    \(↻_{i, j}\) (defined analogously) which
    concatenates the paths (and composes the permutations, if
    explicitly encoded). Since the space of paths has neither inverses
    nor an identity, the \(↺_{i, j}\) and
    \(↻_{i, j}\) generate a &lt;a
    href=&quot;https://en.wikipedia.org/wiki/Free_semigroup&quot;&gt;free semigroup&lt;/a&gt; with
    the operation \(*\). Then this semigroup defines an
    &lt;a href=&quot;https://en.wikipedia.org/wiki/Semigroup_action&quot;&gt;action&lt;/a&gt;
    on \(R\) via its associated permutation on \(R\), which then just
    generates \(S_n\), since \(S_n\) is generated by adjacent swaps.&lt;/p&gt;

  &lt;p&gt;We make a distinction between the operation
    \(↺_{i, j}\) and the permutation it induces on
    \(R\), since the latter &amp;ldquo;loses&amp;rdquo; the orientation
    information, which is important to preserve when talking about the
    action of \(↺_{i, j}\) on some \(x_i\).
    
    &lt;a href=&quot;#r3&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn4&quot;&gt;[4] Note that, depending on the text, the commutator may
    be defined slightly differently as \(g h g^{-1} h^{-1}\).
    &lt;a href=&quot;#r4&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn5&quot;&gt;[5] \(K(A_4)\) is isomorphic to \(V\), the
    &lt;a href=&quot;https://en.wikipedia.org/wiki/Klein_four-group&quot;&gt;Klein four-group&lt;/a&gt;.
    &lt;a href=&quot;#r5&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn6&quot;&gt;[6] In fact, the quartic formula has three nested
  radicals. I wonder why?
    &lt;a href=&quot;#r6&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/computing-iroot</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/computing-iroot"/>
    <title>Computing Integer Roots</title>
    <updated>2016-01-10T00:00:00-08:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;script&gt;
KaTeXMacros = {
  &quot;\\iroot&quot;: &quot;\\operatorname{iroot}&quot;,
  &quot;\\Bits&quot;: &quot;\\operatorname{Bits}&quot;,
  &quot;\\Err&quot;: &quot;\\operatorname{Err}&quot;,
  &quot;\\NewtonRoot&quot;: &quot;\\mathrm{N{\\small EWTON}\\text{-}I{\\small ROOT}}&quot;,
};
&lt;/script&gt;

&lt;script src=&quot;https://cdn.jsdelivr.net/gh/akalin/jsbn@v1.4/jsbn.js&quot;&gt;&lt;/script&gt;
&lt;script src=&quot;https://cdn.jsdelivr.net/gh/akalin/jsbn@v1.4/jsbn2.js&quot;&gt;&lt;/script&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;1. The algorithm&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Today I&amp;rsquo;m going to talk about the generalization of
the &lt;a href=&quot;/computing-isqrt&quot;&gt;integer square root algorithm&lt;/a&gt; to
higher roots. That is, given \(n\) and \(p\), computing
\(\iroot(n, p) = \lfloor \sqrt[p]{n} \rfloor\), or the
greatest integer whose \(p\)th power is less than or equal to
\(n\). The generalized algorithm is straightforward, and it&amp;rsquo;s
easy to generalize the proof of correctness, but the run-time bound is
a bit trickier, since it has a dependence on \(p\).&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;First, the algorithm, which we&amp;rsquo;ll call \(\NewtonRoot\):

  &lt;ol&gt;
    &lt;li&gt;If \(n = 0\), return \(0\).&lt;/li&gt;
    &lt;li&gt;If \(p \ge \Bits(n)\), return \(1\).&lt;/li&gt;
    &lt;li&gt;Otherwise, set \(i\) to \(0\) and set \(x_0\) to \(2^{\lceil
      \Bits(n) / p\rceil}\).&lt;/li&gt;
    &lt;li&gt;Repeat:
      &lt;ol&gt;
        &lt;li&gt;Set \(x_{i+1}\) to \(\lfloor ((p - 1) x_i + \lfloor
	  n/x_i^{p-1} \rfloor) / p \rfloor\).&lt;/li&gt;
        &lt;li&gt;If \(x_{i+1} \ge x_i\), return \(x_i\). Otherwise, increment
          \(i\).&lt;/li&gt;
      &lt;/ol&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;and its implementation in JavaScript:&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;



&lt;script&gt;
// iroot returns the greatest number x such that x^p &lt;= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/akalin/jsbn ), n must be non-negative, and
// p must be a positive integer.
//
// Example (open up the JS console on this page and type):
//
//   iroot(new BigInteger(&quot;64&quot;), 3).toString()
function iroot(n, p) {
  var s = n.signum();
  if (s &lt; 0) {
    throw new Error(&apos;negative radicand&apos;);
  }
  if (p &lt;= 0) {
    throw new Error(&apos;non-positive degree&apos;);
  }
  if (p !== (p|0)) {
    throw new Error(&apos;non-integral degree&apos;);
  }

  if (s == 0) {
    return n;
  }

  var b = n.bitLength();
  if (p &gt;= b) {
    return n.constructor.ONE;
  }

  // x = 2^ceil(Bits(n)/p)
  var x = n.constructor.ONE.shiftLeft(Math.ceil(b/p));
  var pMinusOne = new n.constructor((p - 1).toString());
  var pBig = new n.constructor(p.toString());
  while (true) {
    // y = floor(((p-1)x + floor(n/x^(p-1)))/p)
    var y = pMinusOne.multiply(x).add(n.divide(x.pow(pMinusOne))).divide(pBig);
    if (y.compareTo(x) &gt;= 0) {
      return x;
    }
    x = y;
  }
}
&lt;/script&gt;

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// iroot returns the greatest number x such that x^p &amp;lt;= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/akalin/jsbn ), n must be non-negative, and
// p must be a positive integer.
//
// Example (open up the JS console on this page and type):
//
//   iroot(new BigInteger(&amp;quot;64&amp;quot;), 3).toString()
function iroot(n, p) {
  var s = n.signum();
  if (s &amp;lt; 0) {
    throw new Error(&amp;#39;negative radicand&amp;#39;);
  }
  if (p &amp;lt;= 0) {
    throw new Error(&amp;#39;non-positive degree&amp;#39;);
  }
  if (p !== (p|0)) {
    throw new Error(&amp;#39;non-integral degree&amp;#39;);
  }

  if (s == 0) {
    return n;
  }

  var b = n.bitLength();
  if (p &amp;gt;= b) {
    return n.constructor.ONE;
  }

  // x = 2^ceil(Bits(n)/p)
  var x = n.constructor.ONE.shiftLeft(Math.ceil(b/p));
  var pMinusOne = new n.constructor((p - 1).toString());
  var pBig = new n.constructor(p.toString());
  while (true) {
    // y = floor(((p-1)x + floor(n/x^(p-1)))/p)
    var y = pMinusOne.multiply(x).add(n.divide(x.pow(pMinusOne))).divide(pBig);
    if (y.compareTo(x) &amp;gt;= 0) {
      return x;
    }
    x = y;
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This algorithm turns out to require \(Θ(p) + O(\lg \lg n)\)
  loop iterations, with the run-time for a loop iteration depending on
  what kind of arithmetic operations are used.&lt;/p&gt;
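&lt;p&gt;To watch the iteration count empirically, here is a minimal sketch of the same loop using the native BigInt type. This is an assumption for illustration only: the implementation above uses a BigInteger library instead, and the names bitLength and irootCount are hypothetical. The sketch assumes \(n \ge 0\) and an integer \(p \ge 1\).&lt;/p&gt;

```javascript
// Hypothetical helper: number of bits needed to store a positive BigInt.
function bitLength(n) {
  return n.toString(2).length;
}

// Sketch of the Newton iteration with native BigInt. Returns the root and
// the number of loop iterations performed. Assumes n >= 0n and integer p >= 1.
function irootCount(n, p) {
  if (n === 0n) {
    return { root: 0n, iterations: 0 };
  }
  const b = bitLength(n);
  if (p >= b) {
    return { root: 1n, iterations: 0 };
  }
  const pBig = BigInt(p);
  // x = 2^ceil(Bits(n)/p)
  let x = 2n ** BigInt(Math.ceil(b / p));
  let iterations = 0;
  while (true) {
    iterations++;
    // y = floor(((p-1)x + floor(n/x^(p-1)))/p). BigInt division truncates,
    // which equals floor for non-negative operands.
    const y = ((pBig - 1n) * x + n / x ** (pBig - 1n)) / pBig;
    if (y >= x) {
      return { root: x, iterations };
    }
    x = y;
  }
}
```

&lt;p&gt;For example, irootCount(64n, 3) starts at 8, steps to 5 and then 4, and returns 4n on its third loop iteration.&lt;/p&gt;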

&lt;/section&gt;

&lt;section&gt;
  &lt;header&gt;
    &lt;h2&gt;2. Correctness&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p&gt;Again we look at the iteration rule:

  \[
  x_{i+1} = \left\lfloor \frac{(p - 1) x_i + \left\lfloor \frac{n}{x_i^{p-1}}
  \right\rfloor}{p} \right\rfloor
  \]

  Letting \(f(x)\) be the right-hand side, we can again use basic
  properties of the floor function to remove the inner floor:

  \[
  f(x) = \left\lfloor \frac{1}{p} ((p-1) x + n/x^{p-1}) \right\rfloor
  \]

  Letting \(g(x)\) be its real-valued equivalent:

  \[
  g(x) = \frac{1}{p} ((p-1) x + n/x^{p-1})
  \]

  we can, again using basic properties of the floor function, show that
  \(f(x) \le g(x)\), and for any integer \(m\), \(m \le f(x)\) if and
  only if \(m \le g(x)\).&lt;/p&gt;
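&lt;p&gt;As a quick sanity check of the claim that the inner floor can be dropped, the sketch below compares the iteration rule with the single-floor form \(\lfloor ((p-1) x^p + n) / (p x^{p-1}) \rfloor\) using native BigInt arithmetic. The function names are hypothetical, and this is only a spot check over small inputs, not a proof.&lt;/p&gt;

```javascript
// Iteration rule with the inner floor, as in the algorithm.
// All arguments are BigInts with x >= 1n and p >= 1n.
function stepWithInnerFloor(n, p, x) {
  return ((p - 1n) * x + n / x ** (p - 1n)) / p;
}

// Same rule with the inner floor removed, written as one integer division:
// floor(((p-1) x^p + n) / (p x^(p-1))).
function stepWithoutInnerFloor(n, p, x) {
  return ((p - 1n) * x ** p + n) / (p * x ** (p - 1n));
}

// Returns true if both forms agree for every x from 1 to maxX and every
// n from 0 to maxN.
function formsAgree(p, maxN, maxX) {
  for (let n = 0n; maxN >= n; n++) {
    for (let x = 1n; maxX >= x; x++) {
      if (stepWithInnerFloor(n, p, x) !== stepWithoutInnerFloor(n, p, x)) {
        return false;
      }
    }
  }
  return true;
}
```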

&lt;p&gt;Finally, let&amp;rsquo;s give a name to our desired output: let \(s =
  \iroot(n, p) = \lfloor \sqrt[p]{n} \rfloor\).&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Unsurprisingly, \(f(x)\) never underestimates:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 1&lt;/span&gt;.) For
  \(x \gt 0\), \(f(x) \ge s\).&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
&lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; By the basic properties of
  \(f(x)\) and \(g(x)\) above, it suffices to show that \(g(x) \ge
  s\). \(g&apos;(x) = (1 - 1/p) (1 - n/x^p)\) and \(g&apos;&apos;(x) = (p - 1)
  (n/x^{p+1})\). Therefore, \(g(x)\) is concave-up for \(x \gt 0\); in
  particular, its single positive extremum at \(x = \sqrt[p]{n}\) is a
  minimum. But \(g(\sqrt[p]{n}) = \sqrt[p]{n} \ge s\). &amp;#x220e;&lt;/p&gt;
&lt;/div&gt;

Also, our initial guess is always an overestimate:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 2&lt;/span&gt;.) \(x_0
  \gt s\).&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
&lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; \(\Bits(n) =
  \lfloor \lg n \rfloor + 1 \gt \lg n\). Therefore,

\[
  \begin{aligned}
  x_0 &amp;=   2^{\lceil \Bits(n) / p \rceil} \\
  &amp;\ge 2^{\Bits(n) / p} \\
  &amp;\gt 2^{\lg n / p} \\
  &amp;= \sqrt[p]{n} \\
  &amp;\ge s\text{.} \; \blacksquare
  \end{aligned}
\]
&lt;/p&gt;
&lt;/div&gt;

Therefore, we again have the invariant that \(x_i \ge s\), which
  lets us prove partial correctness:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem 1&lt;/span&gt;.) If
  \(\NewtonRoot\) terminates, it
  returns the value \(s\).&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
&lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; Assume it terminates. If it
  terminates in step \(1\) or \(2\), then we are done. Otherwise, it can
  only terminate in step \(4.2\) where it returns \(x_i\) such that
  \(x_{i+1} = f(x_i) \ge x_i\). This implies \(g(x_i) = ((p-1)x_i +
  n/x_i^{p-1}) / p \ge x_i\). Rearranging yields \(n \ge x_i^p\) and
  combining with our invariant we get \(\sqrt[p]{n} \ge x_i \ge s\). But
  \(s + 1 \gt \sqrt[p]{n}\), so that forces \(x_i\) to be \(s\), and
  thus \(\NewtonRoot\) returns \(s\)
  if it terminates. &amp;#x220e;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Total correctness is also easy:

&lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem 2&lt;/span&gt;.)
  \(\NewtonRoot\) terminates.&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
&lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; Assume it doesn&amp;rsquo;t
  terminate. Then we have a strictly decreasing infinite sequence of
  integers \(\{ x_0, x_1, \dotsc \}\). But this sequence is bounded below
  by \(s\), so it cannot decrease indefinitely. This is a contradiction,
  so \(\NewtonRoot\) must
  terminate. &amp;#x220e;&lt;/p&gt;
&lt;/div&gt;

Note that, as with \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\),
  the check in step \(4.2\) cannot be weakened to \(x_{i+1} = x_i\), as
  doing so would cause the algorithm to oscillate for some inputs. In fact, as \(p\)
  grows, so does the number of values of \(n\) that exhibit this behavior,
  and so does the number of possible oscillations. For example, \(n =
  972\) with \(p = 3\) would yield the sequence \(\{ 16, 11, 10, 9, 10,
  9, \dotsc \}\), and \(n = 80\) with \(p = 4\) would yield the sequence
\(\{ 4, 3, 2, 4, 3, 2, \dotsc \}\).&lt;/div&gt;
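&lt;p&gt;The two sequences above can be reproduced with a short sketch that runs the iteration rule with no termination check at all. It uses native BigInt, and the helper names bitLength and newtonSequence are hypothetical.&lt;/p&gt;

```javascript
// Hypothetical helper: number of bits needed to store a positive BigInt.
function bitLength(n) {
  return n.toString(2).length;
}

// Runs the iteration WITHOUT any termination check and returns the first
// `count` values x_0, x_1, ... Assumes n > 0n and an integer p >= 2.
function newtonSequence(n, p, count) {
  const pBig = BigInt(p);
  // x_0 = 2^ceil(Bits(n)/p)
  let x = 2n ** BigInt(Math.ceil(bitLength(n) / p));
  const xs = [];
  while (count > xs.length) {
    xs.push(x);
    x = ((pBig - 1n) * x + n / x ** (pBig - 1n)) / pBig;
  }
  return xs;
}

console.log(newtonSequence(972n, 3, 6).join(", ")); // 16, 11, 10, 9, 10, 9
console.log(newtonSequence(80n, 4, 6).join(", "));  // 4, 3, 2, 4, 3, 2
```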

&lt;/section&gt;

&lt;section&gt;
  &lt;header&gt;
    &lt;h2&gt;3. Run-time&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p&gt;We will show that \(\NewtonRoot\)
    takes \(Θ(p) + O(\lg \lg n)\) loop iterations. Then we will
    analyze a single loop iteration and the arithmetic operations used to
    get a total run-time bound.&lt;/p&gt;

  &lt;div class=&quot;p&quot;&gt;Analogous to the square root case, define \(\Err(x) =
    x^p/n - 1\) and let \(ϵ_i = \Err(x_i)\). First,
    let&amp;rsquo;s prove our lower bound for \(ϵ_i\), which translates
    directly from the square root case:

  &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 3&lt;/span&gt;.) \(x_i
    \ge s + 1\) if and only if \(ϵ_i \ge 1/n\).&lt;/div&gt;

  &lt;div class=&quot;proof&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; \(n \lt (s + 1)^p\), so \(n + 1
    \le (s + 1)^p\), and therefore \((s + 1)^p/n - 1 \ge 1/n\). But the
    expression on the left side is just \(\Err(s +
    1)\). \(x_i \ge s + 1\) if and only if \(ϵ_i \ge
    \Err(s + 1)\), so the result immediately
    follows. &amp;#x220e;&lt;/p&gt;
  &lt;/div&gt;
  &lt;/div&gt;

  &lt;p&gt;Now for the next few lemmas we need to do some algebra and
    calculus. Inverting \(\Err(x)\), we get that \(x_i =
    \sqrt[p]{(ϵ_i + 1) \cdot n}\). Expressing \(g(x_i)\) in terms
    of \(ϵ_i\) and \(q = 1 - 1/p\) we get

    \[ g(x_i) = \sqrt[p]{n} \left( \frac{ϵ_i q +
    1}{(ϵ_i + 1)^q} \right) \]

    and

    \[
    \Err(g(x_i))
    = \frac{(q ϵ_i + 1)^p}{(ϵ_i + 1)^{p-1}} - 1\text{.}
    \]

    Let
    \[
    f(ϵ) = \frac{(q ϵ + 1)^p}{(ϵ + 1)^{p-1}} - 1\text{.}
    \]

    Then computing derivatives,

\[
    \begin{aligned}
    f&apos;(ϵ) &amp;= q ϵ \frac{(q ϵ + 1)^{p-1}}{(ϵ + 1)^p}\text{,} \\
    f&apos;&apos;(ϵ) &amp;= q \frac{(q ϵ + 1)^{p-2}}{(ϵ + 1)^{p + 1}}\text{, and} \\
    f&apos;&apos;&apos;(ϵ) &amp;= -q (2 + q (2 + 3 ϵ)) \frac{(q ϵ + 1)^{p-3}}{(ϵ + 1)^{p + 2}}\text{.}
    \end{aligned}
\]

    Note that \(f(0) = f&apos;(0) = 0\), and \(f&apos;&apos;(0) = q\). Also, for
    \(ϵ &gt; 0\), \(f&apos;(ϵ) \gt 0\), \(f&apos;&apos;(ϵ) \gt 0\), and
    \(f&apos;&apos;&apos;(ϵ) &amp;lt; 0\).&lt;/p&gt;

  &lt;div class=&quot;p&quot;&gt;Now we&amp;rsquo;re ready to show that the \(ϵ_i\) shrink
    quadratically:

  &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 4&lt;/span&gt;.)
    \(f(ϵ) \lt (ϵ/\sqrt{2})^2\) for \(ϵ \gt 0\).&lt;/div&gt;

  &lt;div class=&quot;proof&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; Taylor-expand \(f(ϵ)\)
    around \(0\) with
    the &lt;a href=&quot;https://en.wikipedia.org/wiki/Taylor%27s_theorem#Explicit_formulae_for_the_remainder&quot;&gt;Lagrange
    remainder form&lt;/a&gt; to get \[ f(ϵ) = f(0) + f&apos;(0) ϵ +
    \frac{f&apos;&apos;(0)}{2} ϵ^2 + \frac{f&apos;&apos;&apos;(\xi)}{6} ϵ^3 \] for
    some \(\xi\) such that \(0 \lt \xi \lt ϵ\). Plugging in
    values, we see that \(f(ϵ) = \frac{1}{2} q ϵ^2 +
    \frac{1}{6} f&apos;&apos;&apos;(\xi) ϵ^3\) with the last term being negative,
    so \(f(ϵ) \lt \frac{1}{2} q ϵ^2 \lt \frac{1}{2}
    ϵ^2\). &amp;#x220e;&lt;/p&gt;
  &lt;/div&gt;

  But this is only a useful upper bound when \(ϵ_i \le 1\). In
    the square root case this was okay, since \(ϵ_1 \le 1\), but
    that is not true for larger values of \(p\). In fact, in general, the
    \(ϵ_i\) start off shrinking &lt;em&gt;linearly&lt;/em&gt;:

  &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 5&lt;/span&gt;.) For
    \(ϵ \gt 1\), \(f(ϵ) \gt ϵ/8\).&lt;/div&gt;

  &lt;div class=&quot;proof&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; Since \(f(0) = f&apos;(0) = 0\), and
    \(f&apos;&apos;(ϵ) \gt 0\) for \(ϵ \ge 0\), \(f&apos;(ϵ)\) and
    \(f(ϵ)\) are increasing, and thus \(f(1) \gt 0\) and
    \(f(ϵ)\) is a concave-up curve.&lt;/p&gt;

  &lt;p&gt;Then \((0, 0)\) and \((1, f(1))\) are two points on a concave-up
    curve, and thus geometrically the line \(y = f(1) ϵ\) must lie
    below \(y = f(ϵ)\) for \(ϵ \gt 1\), and thus
    \(f(ϵ) \gt f(1) ϵ\) for \(ϵ \gt
    1\). Algebraically, this also follows from the definition
    of &lt;a href=&quot;https://en.wikipedia.org/wiki/Convex_function&quot;&gt;(strict)
    convexity&lt;/a&gt; (with \(x_1 = 0\), \(x_2 = ϵ\), and \(t = 1 -
    1/ϵ\)).&lt;/p&gt;

  &lt;p&gt;But \(f(1) = (2 - 1/p)^p/2^{p-1} - 1 = 2 \left(1 -
    \frac{1}{2p}\right)^p - 1\), which is always increasing as a function
    of \(p\), as you can see by calculating its derivative. Therefore, its
    minimum is at \(p = 2\), which is \(1/8\), and so \(f(ϵ) \gt
    f(1) ϵ \ge ϵ/8\). &amp;#x220e;&lt;/p&gt;
  &lt;/div&gt;

  Finally, let&amp;rsquo;s bound our initial values:

  &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 6&lt;/span&gt;.) \(x_0
    \le 2s\) and \(ϵ_0 \le 2^p - 1\).&lt;/div&gt;

  &lt;div class=&quot;proof&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt;
    This is a straightforward generalization of the equivalent lemma
      from the square root case. Let&amp;rsquo;s start with \(x_0\):

\[
      \begin{aligned}
      x_0 &amp;=   2^{\lceil \Bits(n) / p \rceil} \\
      &amp;=   2^{\lfloor (\lfloor \lg n \rfloor + 1 + p - 1)/p \rfloor} \\
      &amp;=   2^{\lfloor \lg n / p \rfloor + 1} \\
      &amp;=   2 \cdot 2^{\lfloor \lg n / p \rfloor}\text{.}
      \end{aligned}
\]

      Then \(x_0/2 = 2^{\lfloor \lg n / p \rfloor} \le 2^{\lg n / p} =
      \sqrt[p]{n}\). Since \(x_0/2\) is an integer, \(x_0/2 \le
      \sqrt[p]{n}\) if and only if \(x_0/2 \le \lfloor \sqrt[p]{n} \rfloor =
      s\). Therefore, \(x_0 \le 2s\).&lt;/p&gt;

    &lt;p&gt;As for \(ϵ_0\):

\[
      \begin{aligned}
      ϵ_0 &amp;=   \Err(x_0) \\
      &amp;\le \Err(2s) \\
      &amp;=   (2s)^p/n - 1 \\
      &amp;=   2^p s^p/n - 1\text{.}
      \end{aligned}
\]

      Since \(s^p \le n\), \(2^p s^p/n \le 2^p\) and thus \(ϵ_0 \le
      2^p - 1\). &amp;#x220e;&lt;/p&gt;
  &lt;/div&gt;
  &lt;/div&gt;
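  &lt;p&gt;The bounds of Lemmas 4 and 5 can be spot-checked numerically. The sketch below evaluates \(f(ϵ)\) in ordinary floating point at a few sample points. The names errStep and boundsHold are hypothetical, and a handful of samples is of course no substitute for the proofs.&lt;/p&gt;

```javascript
// f(eps) = (q eps + 1)^p / (eps + 1)^(p-1) - 1, with q = 1 - 1/p.
function errStep(eps, p) {
  const q = 1 - 1 / p;
  return Math.pow(q * eps + 1, p) / Math.pow(eps + 1, p - 1) - 1;
}

// Spot-checks both bounds at the given sample points; returns true if
// every check passes.
function boundsHold(ps, epsilons) {
  for (const p of ps) {
    for (const eps of epsilons) {
      const value = errStep(eps, p);
      // Lemma 4: f(eps) is below eps^2 / 2 for eps > 0.
      if (value >= (eps * eps) / 2) {
        return false;
      }
      // Lemma 5: f(eps) is above eps / 8 for eps > 1.
      if (eps > 1) {
        if (eps / 8 >= value) {
          return false;
        }
      }
    }
  }
  return true;
}

console.log(boundsHold([2, 3, 5, 10], [0.01, 0.5, 1, 1.5, 10, 100]));
```

  &lt;p&gt;The value errStep(1, 2) is exactly \(1/8\), matching the minimum of \(f(1)\) found in the proof of Lemma 5.&lt;/p&gt;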

  &lt;div class=&quot;p&quot;&gt;Now we&amp;rsquo;re ready to show our main result, which involves
    calculating how long the \(ϵ_i\) shrink linearly:

    &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem 3&lt;/span&gt;.)
      \(\NewtonRoot\) performs \(Θ(p)
      + O(\lg \lg n)\) loop iterations.&lt;/div&gt;

    &lt;div class=&quot;proof&quot;&gt;
    &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; Assume that \(ϵ_i \gt 1\)
      for \(i \le j\), \(ϵ_{j+1} \le 1\), and \(j+k\) is the number
      of loop iterations performed when running the algorithm for \(n\) and
      \(p\) (i.e., \(x_{j+k} \ge x_{j+k-1}\)). Using Lemma 5,

      \[
      \left( \frac{1}{8} \right)^{j+1} ϵ_0 \lt ϵ_{j+1} \le 1\text{,}
      \]

      which implies

      \[
      j \gt \frac{\lg ϵ_0}{3} - 1\text{.}
      \]
    &lt;/p&gt;

    &lt;p&gt;Similarly,

    \[
    \left( \frac{1}{8} \right)^j ϵ_0 \ge ϵ_j \gt 1\text{,}
    \]

    which implies

    \[
    j \lt \frac{\lg ϵ_0}{3} \text{.}
    \]

    Therefore, \(j = Θ(\lg ϵ_0)\), which is \(Θ(p)\)
    by Lemma 6.&lt;/p&gt;

    &lt;p&gt;Now assume \(k \ge 5\). Then \(x_i \ge s + 1\) for \(i \lt j + k -
      1\). Since \(ϵ_{j+1} \le 1\) by assumption, \(ϵ_{j+3}
      \le 1/2\) and \(ϵ_i \le (ϵ_{j+3})^{2^{i-j-3}}\) for \(j
      + 3 \le i \lt j + k - 1\) by Lemma 4, then \(ϵ_{j+k-2} \le
      2^{-2^{k-5}}\). But \(1/n \le ϵ_{j+k-2}\) by Lemma 3, so \(1/n
      \le 2^{-2^{k-5}}\). Taking logs to bring down the \(k\) yields \(k - 5
      \le \lg \lg n\). Then \(k \le \lg \lg n + 5\), and thus \(k = O(\lg
      \lg n)\).&lt;/p&gt;

    &lt;p&gt;Therefore, the total number of loop iterations is \(Θ(p) +
      O(\lg \lg n)\). &amp;#x220e;&lt;/p&gt;
    &lt;/div&gt;
  &lt;/div&gt;

  &lt;p&gt;Note that \(p \le \lg n\), so we can just say that
    \(\NewtonRoot\) performs
    \(O(\lg n)\) loop iterations. But that obscures rather than
    simplifies. Note that the proof above is very similar to the proof of
    the worse run-time of \(\mathrm{N{\small EWTON}\text{-}I{\small
    SQRT}&apos;}\) where the initial guess varies. In this case, the error in
    our initial guess is magnified, since we raise it to the \((p-1)\)th
    power, and so that manifests as the \(Θ(p)\) term.&lt;/p&gt;

  &lt;p&gt;Furthermore, unlike the square root case, the number of arithmetic
    operations in a loop iteration isn&amp;rsquo;t constant. In particular,
    the sub-step to compute \(x_i^{p-1}\) takes a number of arithmetic
    operations dependent on \(p - 1\). Using repeated squarings, this
    computation would take \(Θ(\lg p)\) squarings and
    \(O(\lg p)\) multiplications.&lt;/p&gt;
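  &lt;p&gt;To make the operation count concrete, here is a minimal sketch of exponentiation by repeated squaring that tallies the squarings and multiplications it performs. The name powCounting is hypothetical, the exponent is assumed to be a non-negative integer Number, and the first multiplication into the running result of \(1\) is counted even though it is trivial.&lt;/p&gt;

```javascript
// Computes x^e for a BigInt x and a non-negative integer e (a Number) by
// repeated squaring, counting the squarings and multiplications used.
function powCounting(x, e) {
  let result = 1n;
  let base = x;
  let squarings = 0;
  let multiplications = 0;
  while (e > 0) {
    if (e % 2 === 1) {
      // The current bit of e is set, so fold this power of x into the result.
      result *= base;
      multiplications++;
    }
    e = Math.floor(e / 2);
    if (e > 0) {
      // More bits remain, so the next power of x is needed.
      base *= base;
      squarings++;
    }
  }
  return { result, squarings, multiplications };
}
```

  &lt;p&gt;For instance, powCounting(3n, 10) uses three squarings and two multiplications, one multiplication per set bit of the exponent.&lt;/p&gt;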

  &lt;p&gt;If the cost of an arithmetic operation is constant, e.g.,
    we&amp;rsquo;re working with fixed-size integers, then the run-time bound
    is the above multiplied by \(Θ(\lg p)\).&lt;/p&gt;

  &lt;p&gt;Otherwise, if the cost of an arithmetic operation depends on the
    length of its arguments, then we only have to multiply by a constant
    factor to get the run-time bounds in terms of arithmetic
    operations. If the cost of multiplying two numbers \(\le x\) is \(M(x)
    = O(\lg^k x)\), then the cost of computing \(x^p\) is \(O((p \lg
    x)^k)\). But \(x\) is \(Θ(n^{1/p})\), so the cost of computing
    \(x^p\) is \(O(\lg^k n)\), which is on the order of the cost of
    multiplying two numbers \(\le n\). Furthermore, note that we divide
    the result into \(n\), so we can stop once the computation of
    \(x_i^{p-1}\) exceeds \(n\). So in that case, we can treat a loop
    iteration as if it were performing a constant number of arithmetic
    operations on numbers of order \(n\), and so, like in the square root
    case, we pick up a factor of \(D(n)\), where \(D(n)\) is the run-time
    of dividing \(n\) by some number \(\le n\).&lt;/p&gt;
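  &lt;p&gt;One way the early exit might look is sketched below: the power computation gives up as soon as it can tell the result exceeds \(n\), in which case the quotient \(\lfloor n/x^{p-1} \rfloor\) is \(0\). The names powCapped and quotientByPower are hypothetical, and the sketch assumes \(x \ge 1\), \(n \ge 1\), and a non-negative integer exponent.&lt;/p&gt;

```javascript
// Returns x^e if it is at most limit, or null as soon as the running value
// is known to exceed limit. Assumes x >= 1n, limit >= 1n, and a
// non-negative integer e (a Number).
function powCapped(x, e, limit) {
  let result = 1n;
  let base = x;
  while (e > 0) {
    if (e % 2 === 1) {
      result *= base;
      if (result > limit) {
        return null;
      }
    }
    e = Math.floor(e / 2);
    if (e > 0) {
      // A remaining exponent bit means this square, or a later and larger
      // one, will be multiplied into the result, so overflow here is final.
      base *= base;
      if (base > limit) {
        return null;
      }
    }
  }
  return result;
}

// floor(n / x^e), without ever forming a power much larger than n.
function quotientByPower(n, x, e) {
  const power = powCapped(x, e, n);
  return power === null ? 0n : n / power;
}
```

  &lt;p&gt;With this cap, the intermediate values stay below roughly \(n^2\), so each multiplication is on numbers of order \(n\).&lt;/p&gt;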
&lt;/section&gt;

&lt;hr /&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;


&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] Go and JS implementations are available
    on &lt;a href=&quot;https://github.com/akalin/iroot&quot;&gt;my GitHub&lt;/a&gt;.
    &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn2&quot;&gt;[2] Here, and in most of the article, we&amp;rsquo;ll
    implicitly assume that \(n \gt 0\) and \(p \gt 1\).
    &lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/sampling-visible-sphere</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/sampling-visible-sphere"/>
    <title>Sampling the Visible Sphere</title>
    <updated>2015-08-26T00:00:00-07:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;p&gt;&lt;em&gt;(Note: this article is a summary of
&lt;a href=&quot;http://ompf2.com/viewtopic.php?f=3&amp;t=1914&quot;&gt;this thread on
ompf2&lt;/a&gt;.)
&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The usual method for sampling a sphere from a point outside the
sphere is to calculate the angle of the cone of the visible portion
and uniformly sample within that cone, as described in
&lt;a href=&quot;http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.6561&quot;&gt;Shirley/Wang&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, one detail that is glossed over is that you still need to map
from the sampled direction to the point on the sphere. The usual
method is to simply generate a ray from the point and the sampled
direction and intersect it with the sphere. However, this intersection
test may fail due to floating point inaccuracies (e.g., if the sphere
is small and the distance from the point is large).&lt;/p&gt;

&lt;p&gt;I&apos;ve found a couple of existing ways to deal with this. As
described in the pbrt book, pbrt simply assumes that the ray just
grazes the sphere if the intersection fails, and then projects the
center of the sphere onto the ray
(&lt;a href=&quot;https://github.com/mmp/pbrt-v2/blob/master/src/shapes/sphere.cpp#L249&quot;&gt;code
here&lt;/a&gt;). mitsuba moves the origin of the ray closer to the sphere
(in fact, from within it) before doing the test (falling back to
projecting the center onto the ray if that still fails)
(&lt;a href=&quot;https://www.mitsuba-renderer.org/repos/mitsuba/files/aeb7f95b37111187cc2ddf21cfffeff118bc52d2/src/shapes/sphere.cpp#L287&quot;&gt;code
here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;However, this seems inelegant. I&apos;ve come up with a better way,
which involves converting the sampled cone angle \(θ\) (as
measured from the segment connecting the point to the sphere center)
into an angle \(α\) from the inside of the sphere, and then
simply using \(α\) and the sampled polar angle \(\varphi\) to compute the point on
the sphere. This turns out to be simple, and in my unscientific tests
a bit faster.&lt;/p&gt;

&lt;p&gt;Here&apos;s a crude diagram showing the geometry:&lt;/p&gt;

&lt;img src=&quot;/sampling-visible-sphere-files/diagram.png&quot; alt=&quot;Diagram for derivation of cos &amp;alpha;&quot; /&gt;

&lt;p&gt;You can see that

\[
  L = d \cos θ - \sqrt{r^2 - d^2 \sin^2 θ}
\]

  and also by the law of cosines,

\[
  L^2 = d^2 + r^2 - 2 d r \cos α\text{.}
\]

We&apos;re actually more interested in \(\cos α\), so solving for that
  we get

\[
\cos α = \frac{d}{r} \sin^2 θ + \cos θ \sqrt{1 - \frac{d^2}{r^2} \sin^2 θ}\text{.}
\]

An alternate form, which may be easier to analyze, recalling that
\(\sin θ_{\max} = r/d\), is

\[
\cos α = \frac{\sin^2 θ}{\sin θ_{\max}} + \cos θ \sqrt{1 - \frac{\sin^2 θ}{\sin^2 θ_{\max}}}\text{.}
\]
&lt;/p&gt;
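&lt;p&gt;Spelling out the algebra behind solving for \(\cos α\) (a routine expansion, included only for convenience): squaring the expression for \(L\) gives

\[
  L^2 = d^2 \cos^2 θ - 2 d \cos θ \sqrt{r^2 - d^2 \sin^2 θ} + r^2 - d^2 \sin^2 θ\text{,}
\]

and substituting that into the law of cosines, using \(d^2 - d^2 \cos^2 θ = d^2 \sin^2 θ\), yields

\[
  \begin{aligned}
  \cos α &amp;= \frac{d^2 + r^2 - L^2}{2 d r} \\
  &amp;= \frac{2 d^2 \sin^2 θ + 2 d \cos θ \sqrt{r^2 - d^2 \sin^2 θ}}{2 d r} \\
  &amp;= \frac{d}{r} \sin^2 θ + \cos θ \sqrt{1 - \frac{d^2}{r^2} \sin^2 θ}\text{.}
  \end{aligned}
\]
&lt;/p&gt;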

&lt;div class=&quot;p&quot;&gt;So sampling pseudocode would look like:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-c++&quot;&gt;(cos θ, φ) = uniformSampleCone(rng, cos θmax)
D = 1 - d² sin² θ / r²
if D ≤ 0 {
  cos α = sin θmax
} else {
  cos α = (d/r) sin² θ + cos θ √D
}
ω = sphericalDirection(cos α, φ)
pSurface = C + r ω&lt;/code&gt;&lt;/pre&gt;

(I haven&apos;t done any analysis yet on the most robust way, in the
  floating-point sense, to do the calculations above.)&lt;/div&gt;
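&lt;p&gt;For concreteness, here is one way the pseudocode might be fleshed out in JavaScript. This is a sketch under several assumptions: vectors are plain three-element arrays, the vector helpers and the name sampleVisibleSphere are hypothetical, the cone is sampled with \(\cos θ\) uniform in \([\cos θ_{\max}, 1]\), and the choice of tangent frame around the axis is arbitrary (since \(\varphi\) is uniform, that choice does not change the distribution).&lt;/p&gt;

```javascript
// Vector helpers on plain [x, y, z] arrays (hypothetical, for illustration).
const add = (a, b) => [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
const sub = (a, b) => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const scale = (a, s) => [a[0] * s, a[1] * s, a[2] * s];
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (a) => scale(a, 1 / Math.sqrt(dot(a, a)));

// Samples a point on the part of the sphere (center C, radius r) visible
// from a point P outside it. u1 and u2 are uniform random numbers in [0, 1).
function sampleVisibleSphere(P, C, r, u1, u2) {
  const toP = sub(P, C);
  const d = Math.sqrt(dot(toP, toP));
  const sinThetaMax = r / d;
  const cosThetaMax = Math.sqrt(Math.max(0, 1 - sinThetaMax * sinThetaMax));

  // Uniformly sample the cone: cos(theta) is uniform in [cosThetaMax, 1].
  const cosTheta = 1 - u1 * (1 - cosThetaMax);
  const sin2Theta = 1 - cosTheta * cosTheta;
  const phi = 2 * Math.PI * u2;

  // Convert theta (measured at P) into alpha (measured at C), clamping to
  // the grazing case when D is not positive.
  const D = 1 - sin2Theta / (sinThetaMax * sinThetaMax);
  const cosAlpha = D > 0
    ? sin2Theta / sinThetaMax + cosTheta * Math.sqrt(D)
    : sinThetaMax;
  const sinAlpha = Math.sqrt(Math.max(0, 1 - cosAlpha * cosAlpha));

  // Orthonormal frame (u, v, w) with w pointing from C toward P.
  const w = scale(toP, 1 / d);
  const helper = Math.abs(w[0]) > 0.9 ? [0, 1, 0] : [1, 0, 0];
  const u = normalize(cross(helper, w));
  const v = cross(w, u);

  // Unit direction from C to the sampled surface point.
  const omega = add(
    add(scale(u, sinAlpha * Math.cos(phi)), scale(v, sinAlpha * Math.sin(phi))),
    scale(w, cosAlpha)
  );
  return add(C, scale(omega, r));
}
```

&lt;p&gt;Because the point is built directly from \(α\) and \(\varphi\), it lies on the sphere by construction, with no ray intersection test that could fail.&lt;/p&gt;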

&lt;p&gt;There&apos;s no backfacing since we clamp \(\cos α\) to \(\sin
θ_{\max}\), which is analogous to the case when the ray from
\(P\) misses the sphere.&lt;/p&gt;

&lt;p&gt;Note that one cannot just compute \(α_{\max}\) and uniformly
sample the cone from inside the sphere, as that doesn&apos;t produce the
same distribution over the visible region as sampling the cone from
outside the sphere. To preserve correctness, you would have to use the
(uniform) PDF over the surface area of the visible portion of the
sphere, but you would have to then convert that to a PDF with respect
to projected solid angle from \(P\), which is worse than just doing
the sampling with respect to projected solid angle from \(P\) as
described above.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;

</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/computing-isqrt</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/computing-isqrt"/>
    <title>Computing the Integer Square Root</title>
    <updated>2014-12-09T00:00:00-08:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;script&gt;
KaTeXMacros = {
  &quot;\\isqrt&quot;: &quot;\\operatorname{isqrt}&quot;,
  &quot;\\Bits&quot;: &quot;\\operatorname{Bits}&quot;,
  &quot;\\Err&quot;: &quot;\\operatorname{Err}&quot;,
  &quot;\\NewtonSqrt&quot;: &quot;\\mathrm{N{\\small EWTON}\\text{-}I{\\small SQRT}}&quot;,
};
&lt;/script&gt;

&lt;script src=&quot;https://cdn.jsdelivr.net/gh/akalin/jsbn@v1.4/jsbn.js&quot;&gt;&lt;/script&gt;
&lt;script src=&quot;https://cdn.jsdelivr.net/gh/akalin/jsbn@v1.4/jsbn2.js&quot;&gt;&lt;/script&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;1. The algorithm&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Today I&amp;rsquo;m going to talk about a fast algorithm to compute
the &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Integer_square_root&quot;&gt;integer
square root&lt;/a&gt;&lt;/em&gt; of a non-negative integer \(n\),
\(\isqrt(n) = \lfloor \sqrt{n} \rfloor\), or in words,
the greatest integer whose square is less than or equal to \(n\).&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; Most
  sources that describe the algorithm take it for granted that it is
  correct and fast. This is far from obvious! So I will prove both
  correctness and speed below.&lt;/p&gt;

&lt;p&gt;One simple fact is that \(\isqrt(n) \le n/2\) for \(n \ge 2\), so a
  straightforward algorithm is just to test every non-negative integer
  up to \(n/2\). This takes \(O(n)\) arithmetic operations, which is bad
  since it&amp;rsquo;s exponential in the &lt;em&gt;size&lt;/em&gt; of the input. That
  is, letting \(\Bits(n)\) be the number of bits required
  to store \(n\) and letting \(\lg n\) be the base-\(2\) logarithm of
  \(n\), \(\Bits(n) = O(\lg n)\), and thus this algorithm
  takes \(O(2^{\Bits(n)})\) arithmetic operations.&lt;/p&gt;

&lt;p&gt;We can do better by doing binary search; start with the range \([0,
  n/2]\) and adjust it based on comparing the square of an integer in
  the middle of the range to \(n\). This takes \(O(\lg n) =
  O(\Bits(n))\) arithmetic operations.&lt;/p&gt;
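&lt;p&gt;The binary search just described might look like the following sketch, written with native BigInt. The name isqrtBinarySearch is hypothetical, and the upper end of the range is taken as \(\lfloor n/2 \rfloor + 1\) rather than \(n/2\) so that very small \(n\) need no special case.&lt;/p&gt;

```javascript
// Binary search for the integer square root of a non-negative BigInt n.
// Invariant: lo squared is at most n, and the answer is at most hi.
function isqrtBinarySearch(n) {
  let lo = 0n;
  let hi = n / 2n + 1n;
  while (hi > lo) {
    // Round the midpoint up so that the range always shrinks.
    const mid = (lo + hi + 1n) / 2n;
    if (mid * mid > n) {
      hi = mid - 1n;
    } else {
      lo = mid;
    }
  }
  return lo;
}
```

&lt;p&gt;Each pass halves the range, which is where the \(O(\lg n)\) bound on arithmetic operations comes from.&lt;/p&gt;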

&lt;div class=&quot;p&quot;&gt;However, the algorithm below is even faster:&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;

  &lt;ol&gt;
    &lt;li&gt;If \(n = 0\), return \(0\).&lt;/li&gt;
    &lt;li&gt;Otherwise, set \(i\) to \(0\) and set \(x_0\) to \(2^{\lceil
      \Bits(n) / 2\rceil}\).&lt;/li&gt;
    &lt;li&gt;Repeat:
      &lt;ol&gt;
        &lt;li&gt;Set \(x_{i+1}\) to \(\lfloor (x_i + \lfloor n/x_i \rfloor) /
          2 \rfloor\).&lt;/li&gt;
        &lt;li&gt;If \(x_{i+1} \ge x_i\), return \(x_i\). Otherwise, increment
          \(i\).&lt;/li&gt;
      &lt;/ol&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Call this algorithm \(\NewtonSqrt\), since it&amp;rsquo;s based
  on &lt;a href=&quot;https://en.wikipedia.org/wiki/Newton%27s_method&quot;&gt;Newton&amp;rsquo;s
  method&lt;/a&gt;. It&amp;rsquo;s not obvious, but this algorithm returns
  \(\isqrt(n)\) using only \(O(\lg \lg n) =
  O(\lg(\Bits(n)))\) arithmetic operations, as we will
  prove below. But first, here&amp;rsquo;s an implementation of the
  algorithm in JavaScript:&lt;sup&gt;&lt;a href=&quot;#fn3&quot; id=&quot;r3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;



&lt;script&gt;
// isqrt returns the greatest number x such that x^2 &lt;= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/akalin/jsbn ), and n must be non-negative.
//
//
// Example (open up the JS console on this page and type):
//
//   isqrt(new BigInteger(&quot;64&quot;)).toString()
function isqrt(n) {
  var s = n.signum();
  if (s &lt; 0) {
    throw new Error(&apos;negative radicand&apos;);
  }
  if (s == 0) {
    return n;
  }

  // x = 2^ceil(Bits(n)/2)
  var x = n.constructor.ONE.shiftLeft(Math.ceil(n.bitLength()/2));
  while (true) {
    // y = floor((x + floor(n/x))/2)
    var y = x.add(n.divide(x)).shiftRight(1);
    if (y.compareTo(x) &gt;= 0) {
      return x;
    }
    x = y;
  }
}
&lt;/script&gt;

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// isqrt returns the greatest number x such that x^2 &amp;lt;= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/akalin/jsbn ), and n must be non-negative.
//
//
// Example (open up the JS console on this page and type):
//
//   isqrt(new BigInteger(&amp;quot;64&amp;quot;)).toString()
function isqrt(n) {
  var s = n.signum();
  if (s &amp;lt; 0) {
    throw new Error(&amp;#39;negative radicand&amp;#39;);
  }
  if (s == 0) {
    return n;
  }

  // x = 2^ceil(Bits(n)/2)
  var x = n.constructor.ONE.shiftLeft(Math.ceil(n.bitLength()/2));
  while (true) {
    // y = floor((x + floor(n/x))/2)
    var y = x.add(n.divide(x)).shiftRight(1);
    if (y.compareTo(x) &amp;gt;= 0) {
      return x;
    }
    x = y;
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;/section&gt;

&lt;section&gt;
  &lt;header&gt;
    &lt;h2&gt;2. Correctness&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p&gt;The core of the algorithm is the iteration rule:

    \[
    x_{i+1} = \left\lfloor \frac{x_i + \lfloor \frac{n}{x_i}
    \rfloor}{2} \right\rfloor
    \]

    where
    the &lt;a href=&quot;https://en.wikipedia.org/wiki/Floor_and_ceiling_functions&quot;&gt;floor
    functions&lt;/a&gt; are there only because we&amp;rsquo;re using integer
    division. Define an integer-valued function \(f(x)\) for the right
    side. Using basic properties of the floor function, you can show that
    you can remove the inner floor:

    \[
    f(x) = \left\lfloor \frac{1}{2} (x + n/x) \right\rfloor
    \]

    which makes it a bit easier to analyze. Also, the properties of
    \(f(x)\) are closely related to its equivalent real-valued function:

    \[
    g(x) = \frac{1}{2} (x + n/x)\text{.}
    \]&lt;/p&gt;

  &lt;p&gt;For starters, again using basic properties of the floor function,
    you can show that \(f(x) \le g(x)\), and for any integer \(m\), \(m
    \le f(x)\) if and only if \(m \le g(x)\).&lt;/p&gt;

  &lt;p&gt;Finally, let&amp;rsquo;s give a name to our desired output: let \(s =
    \isqrt(n) = \lfloor \sqrt{n} \rfloor\).&lt;sup&gt;&lt;a href=&quot;#fn4&quot; id=&quot;r4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

  &lt;div class=&quot;p&quot;&gt;Intuitively, \(f(x)\) and \(g(x)\) &amp;ldquo;average out&amp;rdquo;
    however far away their input \(x\) is from \(\sqrt{n}\). Conveniently,
    this &amp;ldquo;average&amp;rdquo; is never an underestimate:

    &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 1&lt;/span&gt;.) For
    \(x \gt 0\), \(f(x) \ge s\).&lt;/div&gt;

    &lt;div class=&quot;proof&quot;&gt;
      &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; By the basic properties of
        \(f(x)\) and \(g(x)\) above, it suffices to show that \(g(x) \ge
        s\). \(g&apos;(x) = (1 - n/x^2)/2\) and \(g&apos;&apos;(x) = n/x^3\). Therefore,
        \(g(x)\) is concave-up for \(x \gt 0\); in particular, its single
        positive extremum at \(x = \sqrt{n}\) is a minimum. But \(g(\sqrt{n})
        = \sqrt{n} \ge s\). &amp;#x220e;&lt;/p&gt;
    &lt;/div&gt;

    (You can also prove Lemma 1 without calculus; show that \(g(x) \ge
    s\) if and only if \(x^2 - 2sx + n \ge 0\), which is true when \(s^2
    \le n\), which is true by definition.)&lt;/div&gt;

  &lt;div class=&quot;p&quot;&gt;Furthermore, our initial estimate is always an overestimate:

  &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 2&lt;/span&gt;.) \(x_0
    \gt s\).&lt;/div&gt;

  &lt;div class=&quot;proof&quot;&gt;
    &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; \(\Bits(n) =
      \lfloor \lg n \rfloor + 1 \gt \lg n\). Therefore,

      \[
      \begin{aligned}
      x_0 &amp;=   2^{\lceil \Bits(n) / 2 \rceil} \\
      &amp;\ge 2^{\Bits(n) / 2} \\
      &amp;\gt 2^{\lg n / 2} \\
      &amp;= \sqrt{n} \\
      &amp;\ge s\text{.} \; \blacksquare
      \end{aligned}
      \]
    &lt;/p&gt;
  &lt;/div&gt;
  &lt;/div&gt;

  &lt;p&gt;(Note that any number greater than \(s\), say \(n\) or \(\lceil n/2
    \rceil\), can be chosen for our initial guess without affecting
    correctness. However, the expression above is necessary to guarantee
    performance. Another possibility is \(2^{\lceil \lceil \lg n \rceil /
    2 \rceil}\), which has the advantage that if \(n\) is an even power of
    \(2\), then \(x_0\) is immediately set to \(\sqrt{n}\). However, this
    is usually not worth the cost of checking that \(n\) is a power of
    \(2\), as is required to compute \(\lceil \lg n \rceil\).)&lt;/p&gt;
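  &lt;p&gt;To see the effect of the initial guess on performance, the sketch below runs the same iteration from an arbitrary starting value and counts loop iterations. It uses native BigInt, and the names bitLength, isqrtFrom, and powerOfTwoGuess are hypothetical.&lt;/p&gt;

```javascript
// Hypothetical helper: number of bits needed to store a positive BigInt.
function bitLength(n) {
  return n.toString(2).length;
}

// Runs the Newton iteration from the starting value x0, which must be at
// least isqrt(n), and reports the result and the number of loop
// iterations. Assumes n > 0n.
function isqrtFrom(n, x0) {
  let x = x0;
  let iterations = 0;
  while (true) {
    iterations++;
    // y = floor((x + floor(n/x))/2)
    const y = (x + n / x) / 2n;
    if (y >= x) {
      return { root: x, iterations };
    }
    x = y;
  }
}

// The guess from the text: 2^ceil(Bits(n)/2).
function powerOfTwoGuess(n) {
  return 2n ** BigInt(Math.ceil(bitLength(n) / 2));
}

const n = 10n ** 40n;
console.log(isqrtFrom(n, powerOfTwoGuess(n)).iterations);
console.log(isqrtFrom(n, n).iterations);
```

  &lt;p&gt;Both calls return the same root, but starting from \(n\) spends many iterations roughly halving \(x_i\) before the fast convergence sets in.&lt;/p&gt;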

  &lt;div class=&quot;p&quot;&gt;An easy consequence of Lemmas 1 and 2 is that the invariant \(x_i
    \ge s\) holds. That lets us prove partial correctness of
    \(\NewtonSqrt\):

    &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem 1&lt;/span&gt;.) If
    \(\NewtonSqrt\) terminates, it
    returns the value \(s\).&lt;/div&gt;

    &lt;div class=&quot;proof&quot;&gt;
      &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; Assume it terminates. If it
        terminates in step \(1\), then we are done. Otherwise, it can only
        terminate in step \(3.2\) where it returns \(x_i\) such that \(x_{i+1}
        = f(x_i) \ge x_i\). This implies that \(g(x_i) = (x_i + n/x_i) / 2 \ge
        x_i\). Rearranging yields \(n \ge x_i^2\) and combining with our
        invariant we get \(\sqrt{n} \ge x_i \ge s\). But \(s + 1 \gt
        \sqrt{n}\), so that forces \(x_i\) to be \(s\), and thus
        \(\NewtonSqrt\) returns \(s\) if it
        terminates. &amp;#x220e;&lt;/p&gt;
    &lt;/div&gt;

    For total correctness we also need to show that
    \(\NewtonSqrt\) terminates. But this
    is easy:

  &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem 2&lt;/span&gt;.)
    \(\NewtonSqrt\) terminates.&lt;/div&gt;

  &lt;div class=&quot;proof&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; Assume it doesn&amp;rsquo;t
    terminate. Then we have a strictly decreasing infinite sequence of
    integers \(\{ x_0, x_1, \dotsc \}\). But this sequence is bounded below
    by \(s\), so it cannot decrease indefinitely. This is a contradiction,
    so \(\NewtonSqrt\) must
    terminate. &amp;#x220e;&lt;/p&gt;
  &lt;/div&gt;
  &lt;/div&gt;

  &lt;p&gt;We are done proving correctness, but you might wonder if the check
    \(x_{i+1} \ge x_i\) in step \(3.2\) is necessary. That is, can it be
    weakened to the check \(x_{i+1} = x_i\)? The answer is
    &amp;ldquo;no&amp;rdquo;; to see that, let \(k = n - s^2\). Since \(n \lt
    (s+1)^2\), \(k \lt 2s + 1\). On the other hand, consider the
    inequality \(f(x_i) \gt x_i\). Since that would cause the algorithm to
    terminate and return \(x_i\), that implies that \(x_i =
    s\). Therefore, that inequality is equivalent to \(f(s) \gt s\), which
    is equivalent to \(f(s) \ge s + 1\), which is equivalent to \(g(s) =
    (s + n/s) / 2 \ge s + 1\). Rearranging yields \(n \ge s^2 +
    2s\). Substituting in \(n = s^2 + k\), we get \(s^2 + k \ge s^2 +
    2s\), which is equivalent to \(k \ge 2s\). But since \(k \lt 2s + 1\),
    that forces \(k\) to equal \(2s\). That is the maximum value \(k\) can
    take, so \(n\) must be one less than a perfect square. Indeed,
    for such numbers, weakening the check would cause the algorithm to
    oscillate between \(s\) and \(s + 1\). For example, \(n = 99\) would
    yield the sequence \(\{ 16, 11, 10, 9, 10, 9, \dotsc \}\).&lt;/p&gt;
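  &lt;p&gt;The oscillation is easy to reproduce. The sketch below uses BigInt arithmetic and simply lists iterates without any termination check; the function names &lt;code&gt;step&lt;/code&gt; and &lt;code&gt;iterates&lt;/code&gt; are hypothetical and not part of the algorithm as stated:&lt;/p&gt;

```javascript
// One Newton step on integers: f(x) = floor((x + floor(n / x)) / 2).
// BigInt division truncates, which equals floor for positive operands.
function step(n, x) {
  return (x + n / x) / 2n;
}

// Returns the first `count` terms x_0, x_1, ... with no termination check,
// to show what a loop that only stopped on x_{i+1} = x_i would keep doing.
function iterates(n, x0, count) {
  const xs = [x0];
  while (count > xs.length) {
    xs.push(step(n, xs[xs.length - 1]));
  }
  return xs;
}

// n = 99 is one less than a perfect square; x_0 = 16 as in the text.
console.log(iterates(99n, 16n, 8).join(', '));
// 16, 11, 10, 9, 10, 9, 10, 9
```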

&lt;/section&gt;

&lt;section&gt;
  &lt;header&gt;
    &lt;h2&gt;3. Run-time&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p&gt;We will show that \(\NewtonSqrt\)
    takes \(O(\lg \lg n)\) arithmetic operations. Since each loop
    iteration does only a fixed number of arithmetic operations (with the
    division of \(n\) by \(x\) being the most expensive), it suffices to
    show that our algorithm performs \(O(\lg \lg n)\) loop iterations.&lt;/p&gt;

  &lt;p&gt;It is well known that Newton&amp;rsquo;s
    method &lt;a href=&quot;https://en.wikipedia.org/wiki/Newton%27s_method#Proof_of_quadratic_convergence_for_Newton.27s_iterative_method&quot;&gt;converges
    quadratically&lt;/a&gt; sufficiently close to a simple root. We can&amp;rsquo;t
    actually use this result directly, since it&amp;rsquo;s not clear that the
    convergence properties of Newton&amp;rsquo;s method are preserved when
    using integer operations, but we can do something similar.&lt;/p&gt;

  &lt;p&gt;Define \(\Err(x) = x^2/n - 1\) and let \(ϵ_i =
    \Err(x_i)\). Intuitively, \(\Err(x)\) is a
    conveniently-scaled measure of the error of \(x\): it is less than
    \(1\) for most of the values we care about and it is bounded below for
    integers greater than our target \(s\). Also, we will show that the
    \(ϵ_i\) shrink quadratically. These facts will then let us show
    our bound for the iteration count.&lt;/p&gt;

  &lt;div class=&quot;p&quot;&gt;First, let&amp;rsquo;s prove our lower bound for \(ϵ_i\):

  &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 3&lt;/span&gt;.) \(x_i
    \ge s + 1\) if and only if \(ϵ_i \ge 1/n\).&lt;/div&gt;

  &lt;div class=&quot;proof&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; \(n \lt (s + 1)^2\), so \(n + 1
    \le (s + 1)^2\), and therefore \((s + 1)^2/n - 1 \ge 1/n\). But the
    expression on the left side is just \(\Err(s +
    1)\). \(x_i \ge s + 1\) if and only if \(ϵ_i \ge
    \Err(s + 1)\), so the result immediately
    follows. &amp;#x220e;&lt;/p&gt;
  &lt;/div&gt;

  Then we can use that to show that the \(ϵ_i\) shrink
    quadratically:

  &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 4&lt;/span&gt;.) If
    \(x_i \ge s + 1\), then \(ϵ_{i+1} \lt (ϵ_i/2)^2\).&lt;/div&gt;

  &lt;div class=&quot;proof&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; \(ϵ_{i+1}\) is just
    \(\Err(f(x_i)) \le \Err(g(x_i))\), so it
    suffices to show that \(\Err(g(x_i)) \lt
    (ϵ_i/2)^2\). Inverting \(\Err(x)\), we get that
    \(x_i = \sqrt{(ϵ_i + 1) \cdot n}\). Expressing \(g(x_i)\) in
    terms of \(ϵ_i\) we get

    \[ g(x_i) = \frac{\sqrt{n}}{2} \left( \frac{ϵ_i +
    2}{\sqrt{ϵ_i + 1}} \right) \]

    and

    \[
    \Err(g(x_i)) = \frac{(ϵ_i/2)^2}{ϵ_i+1}\text{.}
    \]

    Therefore, it suffices to show that the denominator is greater than
    \(1\). But \(x_i \ge s + 1\) implies \(ϵ_i \gt 0\) by Lemma 3,
    so that follows immediately and the result is proved. &amp;#x220e;&lt;/p&gt;
  &lt;/div&gt;
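  &lt;p&gt;For reference, the expression for \(\Err(g(x_i))\) used in the proof
    of Lemma 4 can be checked by expanding directly from the formula for
    \(g(x_i)\) given there:

    \[
    \begin{aligned}
    \Err(g(x_i)) &amp;= \frac{g(x_i)^2}{n} - 1 \\
    &amp;= \frac{(ϵ_i + 2)^2}{4 (ϵ_i + 1)} - 1 \\
    &amp;= \frac{ϵ_i^2 + 4 ϵ_i + 4 - 4 (ϵ_i + 1)}{4 (ϵ_i + 1)} \\
    &amp;= \frac{(ϵ_i/2)^2}{ϵ_i + 1}\text{.}
    \end{aligned}
    \]
  &lt;/p&gt;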

  Then let&amp;rsquo;s bound our initial values:

  &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 5&lt;/span&gt;.) \(x_0
    \le 2s\), \(ϵ_0 \le 3\), and \(ϵ_1 \le 1\).&lt;/div&gt;

  &lt;div class=&quot;proof&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; Let&amp;rsquo;s start with \(x_0\):

\[
      \begin{aligned}
      x_0 &amp;=   2^{\lceil \Bits(n) / 2 \rceil} \\
      &amp;=   2^{\lfloor (\lfloor \lg n \rfloor + 1 + 1)/2 \rfloor} \\
      &amp;=   2^{\lfloor \lg n / 2 \rfloor + 1} \\
      &amp;=   2 \cdot 2^{\lfloor \lg n / 2 \rfloor}\text{.}
      \end{aligned}
\]

      Then \(x_0/2 = 2^{\lfloor \lg n / 2 \rfloor} \le 2^{\lg n / 2} =
      \sqrt{n}\). Since \(x_0/2\) is an integer, \(x_0/2 \le \sqrt{n}\) if
      and only if \(x_0/2 \le \lfloor \sqrt{n} \rfloor = s\). Therefore,
      \(x_0 \le 2s\).&lt;/p&gt;

    &lt;p&gt;As for \(ϵ_0\):

\[
      \begin{aligned}
      ϵ_0 &amp;=   \Err(x_0) \\
      &amp;\le \Err(2s) \\
      &amp;=   (2s)^2/n - 1 \\
      &amp;=   4s^2/n - 1\text{.}
      \end{aligned}
\]

      Since \(s^2 \le n\), \(4s^2/n \le 4\) and thus \(ϵ_0 \le 3\).&lt;/p&gt;

    &lt;p&gt;Finally, \(ϵ_1\) is just
      \(\Err(f(x_0))\). Using calculations from Lemma 4,

\[
      \begin{aligned}
      ϵ_1 &amp;\le \Err(g(x_0)) \\
      &amp;=   (ϵ_0/2)^2/(ϵ_0 + 1) \\
      &amp;\le (3/2)^2/(3 + 1) \\
      &amp;=   9/16\text{.}
      \end{aligned}
\]

      Therefore, \(ϵ_1 \le 1\). &amp;#x220e;&lt;/p&gt;
  &lt;/div&gt;
  &lt;/div&gt;

  &lt;div class=&quot;p&quot;&gt;Finally, we can show our main result:

    &lt;div class=&quot;theorem&quot;&gt;(&lt;span class=&quot;theorem-name&quot;&gt;Theorem 3&lt;/span&gt;.)
      \(\NewtonSqrt\) performs \(O(\lg \lg
      n)\) loop iterations.&lt;/div&gt;

    &lt;div class=&quot;proof&quot;&gt;
    &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; Let \(k\) be the number of loop
      iterations performed when running the algorithm for \(n\) (i.e., \(x_k
      \ge x_{k-1}\)) and assume \(k \ge 4\). Then \(x_i \ge s + 1\) for \(i
      \lt k - 1\). Since \(ϵ_1 \le 1\) by Lemma 5, \(ϵ_2 \le
      1/2\) and \(ϵ_i \le (ϵ_2)^{2^{i-2}}\) for \(2 \le i \lt
      k - 1\) by Lemma 4, then \(ϵ_{k-2} \le 2^{-2^{k-4}}\). But
      \(1/n \le ϵ_{k-2}\) by Lemma 3, so \(1/n \le
      2^{-2^{k-4}}\). Taking logs to bring down the \(k\) yields \(k - 4 \le
      \lg \lg n\). Then \(k \le \lg \lg n + 4\), and thus \(k = O(\lg \lg
      n)\). &amp;#x220e;&lt;/p&gt;
    &lt;/div&gt;

    Note that in general, an arithmetic operation is not constant-time,
    and in fact has run-time \(\Omega(\lg n)\). Since the most expensive
    arithmetic operation we do is division, we can say that
    \(\NewtonSqrt\) has run-time that is
    both \(\Omega(\lg n)\) and \(O(D(n) \cdot \lg \lg n)\), where \(D(n)\)
    is the run-time of dividing \(n\) by some number \(\le n\).&lt;sup&gt;&lt;a href=&quot;#fn5&quot; id=&quot;r5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;
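  &lt;p&gt;The bound of Theorem 3 can also be observed empirically. The sketch below is one possible transcription of the loop into JavaScript using BigInt, assuming \(n \gt 0\); the name &lt;code&gt;newtonSqrtCounted&lt;/code&gt; and the shape of its return value are assumptions made for illustration. Since \(\Bits(n) \gt \lg n\), it compares the iteration count against \(\lg \Bits(n) + 4\), which is slightly weaker than the bound in the proof:&lt;/p&gt;

```javascript
// Bits(n) for a positive BigInt n.
function bits(n) {
  return n.toString(2).length;
}

// A sketch of NewtonSqrt that also counts loop iterations. Assumes n > 0.
function newtonSqrtCounted(n) {
  let x = 2n ** BigInt(Math.ceil(bits(n) / 2));
  let iterations = 0;
  while (true) {
    // f(x); BigInt division floors for positive operands.
    const y = (x + n / x) / 2n;
    ++iterations;
    if (y >= x) {
      return { root: x, iterations };
    }
    x = y;
  }
}

for (const n of [99n, 10n ** 20n, 10n ** 100n, 10n ** 1000n]) {
  const { root, iterations } = newtonSqrtCounted(n);
  // Bits(n) > lg n, so lg(Bits(n)) + 4 is at least lg lg n + 4.
  const bound = Math.log2(bits(n)) + 4;
  console.log(bits(n), iterations, bound.toFixed(2), n >= root * root);
}
```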

&lt;/section&gt;

&lt;section&gt;
  &lt;header&gt;
    &lt;h2&gt;4. The Initial Guess&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p&gt;It&amp;rsquo;s also useful to show that if the initial guess \(x_0\) is
    bad, then the run-time degrades to \(Θ(\lg n)\). We&amp;rsquo;ll do
    this by defining a variant of \(\NewtonSqrt\) that is identical
    except that it takes a function \(\mathrm{I{\small
    NITIAL}\text{-}G{\small UESS}}\) that is called with \(n\), with the
    result assigned to \(x_0\) in step 1. Then, we can treat \(ϵ_0\) as a
    function of \(n\) and analyze how long \(ϵ_i\) stays above \(1\).
    Since \(\NewtonSqrt\) uses an
    initial guess such that \(ϵ_0(n) = Θ(1)\), Theorem 4
    reduces to Theorem 3 in that case. However, if \(x_0\) is chosen to be
    \(Θ(n)\), e.g. the initial guess is just \(n\) or \(n/k\) for
    some constant \(k\), then \(ϵ_0(n)\) will also be \(Θ(n)\), and so
    the run-time will degrade to \(Θ(\lg n)\). So having a good
    initial guess is important for the performance of
    \(\NewtonSqrt\)!&lt;/p&gt;
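  &lt;p&gt;The effect of the initial guess can be seen by counting iterations. This is a minimal sketch, assuming \(n \gt 0\) and a guess that is at least \(s\); &lt;code&gt;countIterations&lt;/code&gt;, &lt;code&gt;goodGuess&lt;/code&gt;, and &lt;code&gt;badGuess&lt;/code&gt; are hypothetical names standing in for the parameterized algorithm and two choices of initial guess:&lt;/p&gt;

```javascript
// Bits(n) for a positive BigInt n.
function bits(n) {
  return n.toString(2).length;
}

// Like NewtonSqrt, but the initial guess comes from the caller-supplied
// function initialGuess. Returns the number of loop iterations performed.
function countIterations(n, initialGuess) {
  let x = initialGuess(n);
  let iterations = 0;
  while (true) {
    const y = (x + n / x) / 2n;
    ++iterations;
    if (y >= x) {
      return iterations;
    }
    x = y;
  }
}

// The guess used by NewtonSqrt, 2^ceil(Bits(n) / 2).
const goodGuess = (n) => 2n ** BigInt(Math.ceil(bits(n) / 2));
// A guess that is Θ(n): just n itself.
const badGuess = (n) => n;

// Doubling the bit length of n roughly doubles the count for badGuess,
// while the count for goodGuess barely moves.
for (const e of [50n, 100n, 200n, 400n]) {
  const n = 2n ** e;
  console.log(
    String(e),
    countIterations(n, goodGuess),
    countIterations(n, badGuess)
  );
}
```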

&lt;/section&gt;

&lt;hr /&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;


&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] Aside from
    the &lt;a href=&quot;https://en.wikipedia.org/wiki/Integer_square_root&quot;&gt;Wikipedia
    article&lt;/a&gt;, the algorithm is described as Algorithm 9.2.11 in
    &lt;a href=&quot;http://www.amazon.com/Prime-Numbers-A-Computational-Perspective/dp/0387252827&quot;&gt;Prime
      Numbers: A Computational Perspective&lt;/a&gt;.
    &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn2&quot;&gt;[2] Note that only integer operations are used, which makes this
    algorithm suitable for arbitrary-precision integers.
    &lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn3&quot;&gt;[3] Go and JS implementations are available
    on &lt;a href=&quot;https://github.com/akalin/iroot&quot;&gt;my GitHub&lt;/a&gt;.
    &lt;a href=&quot;#r3&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn4&quot;&gt;[4] Here, and in most of the article, we&amp;rsquo;ll
    implicitly assume that \(n \gt 0\).
    &lt;a href=&quot;#r4&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn5&quot;&gt;[5] \(D(n)\) is \(Θ(\lg^2 n)\) using long division, but
    fancier division algorithms have better run-times.
    &lt;a href=&quot;#r5&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/constant-time-mssb</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/constant-time-mssb"/>
    <title>Finding the Most Significant Set Bit of a Word in Constant Time</title>
    <updated>2014-07-03T00:00:00-07:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;script&gt;
// Converts the given binary string (possibly with whitespace) to an integer.
function b(s) {
  return parseInt(s.replace(/\s+/g, &apos;&apos;), 2);
}

// Converts the given integer to a binary string.
function bs(x) {
  return x.toString(2);
}
&lt;/script&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;1. Overall method&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Finding the most significant set bit of a word (equivalently, finding
the integer log base 2 of a word, or counting the leading zeros of a
word) is
a &lt;a href=&quot;https://stackoverflow.com/questions/2589096/find-most-significant-bit-left-most-that-is-set-in-a-bit-array&quot;&gt;well-studied
problem&lt;/a&gt;. &lt;a href=&quot;http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious&quot;&gt;Bit
Twiddling Hacks&lt;/a&gt; lists various methods,
and &lt;a href=&quot;https://en.wikipedia.org/wiki/Count_leading_zeros&quot;&gt;Wikipedia&lt;/a&gt;
gives the CPU instructions that perform the operation directly.&lt;/p&gt;

&lt;p&gt;However, all of these methods are either specific to a certain word
size or take more than constant time (in terms of number of word
operations). That raises the question of whether there &lt;em&gt;is&lt;/em&gt; a
method that takes constant time&amp;mdash;surprisingly, the answer is
  &amp;ldquo;yes&amp;rdquo;!&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;The key idea is to split a word into \(\lceil \sqrt{w} \rceil\)
blocks of \(\lceil \sqrt{w} \rceil\) bits (where \(w\) is the number
of bits in a word). One can then do certain operations on blocks
&amp;ldquo;in parallel&amp;rdquo; by stuffing multiple blocks into a word and
then performing a single word operation.&lt;/p&gt;

&lt;p&gt;Furthermore, since the block size and block count are the same, one
can transform the bits of a block into the blocks of a word and vice
versa in various ways using only a constant number of word
operations.&lt;/p&gt;

&lt;p&gt;In particular, this lets us split up the problem into two parts:
finding the most significant set (i.e., non-zero) block, and finding
the most significant set bit within that block. It then turns out that
both parts can be done in constant time.&lt;/p&gt;

&lt;p&gt;For concreteness, we&apos;ll use 32-bit words when explaining the
  method below.&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;2. Finding the most significant set bit of a block&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;First, let&apos;s consider the sub-problem of finding the most
significant set bit of a block. In fact, let&apos;s give ourselves a bit of
room and consider only blocks with the high bit cleared for now; we&apos;ll
see why we need this extra bit of room soon.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;For 32 bits, the block size is 6 bits, so with the high bit of a
block cleared we&apos;re left with 5 bits. Let&apos;s look at a naive
implementation:



&lt;script&gt;
function mssb5_naive(x) {
  var c = 0;
  for (var i = 0; i &lt; 5 &amp;&amp; x &gt;= (1 &lt;&lt; i); ++i) {
    ++c;
  }
  return c - 1;
}
&lt;/script&gt;

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;function mssb5_naive(x) {
  var c = 0;
  for (var i = 0; i &amp;lt; 5 &amp;amp;&amp;amp; x &amp;gt;= (1 &amp;lt;&amp;lt; i); ++i) {
    ++c;
  }
  return c - 1;
}&lt;/code&gt;&lt;/pre&gt;


In the above, we consider successive powers of 2 until we find one
greater than our given number. Then the answer is simply one less than
the exponent of that power.&lt;/div&gt;

&lt;p&gt;Notice that the loop has at most 5 iterations; this lines up nicely
with the 5 full blocks in an entire 32-bit word. (This is why we saved
our extra bit of room.) If we can copy our block to the higher 4
blocks and then use word operations to operate on those blocks in
parallel, then we&apos;re good.&lt;/p&gt;

&lt;p&gt;For our example, let \(x = 5 = 00101\). Duplicating \(x\) among all
the blocks can easily be done by multiplying by the appropriate
constant:&lt;/p&gt;

&lt;style&gt;
pre.binary-example {
  border: 1px solid #073642; /* solarized base02 */
  background-color: #fdf6e3; /* solarized base3 */
  color: #586e75;
  padding: 1em;
}

pre.binary-example span.dont-care {
  color: #a3b1b1;
}

pre.binary-example span.last-operand {
  text-decoration: underline;
}
&lt;/style&gt;

&lt;pre class=&quot;binary-example&quot;&gt;
  &lt;span class=&quot;first-five&quot;
 &gt;00 000000 000000 000000 000000 000101&lt;/span&gt;
* &lt;span class=&quot;last-operand low-bit-full&quot;
 &gt;00 000001 000001 000001 000001 000001&lt;/span&gt;
  &lt;span class=&quot;first-five&quot;
 &gt;00 000000 000000 000000 000000 000101&lt;/span&gt;
  &lt;span class=&quot;first-five&quot;
 &gt;00 000000 000000 000000 000101&lt;/span&gt;
  &lt;span class=&quot;first-five&quot;
 &gt;00 000000 000000 000101&lt;/span&gt;
  &lt;span class=&quot;first-five&quot;
 &gt;00 000000 000101&lt;/span&gt;
  &lt;span class=&quot;first-five last-operand&quot;
 &gt;00 000101                            &lt;/span&gt;
  &lt;span class=&quot;lower-bits-full&quot;
 &gt;00 000101 000101 000101 000101 000101&lt;/span&gt;
&lt;/pre&gt;

&lt;p&gt;In fact, this is a simple use of a more general tool. If \(x\) and
\(y\) are expressed in binary, then multiplying \(x\) by \(y\) can be
seen as taking the index of each set bit in \(y\), creating a copy of
\(x\) shifted by each such index, and then adding up all the shifted
copies. This case is just taking \(y\) to be the constant where the
\(\{ 0, 6, 12, 18, 24 \}\)th bits are set.&lt;/p&gt;
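&lt;p&gt;This view of multiplication can be written out explicitly. The sketch below is illustrative only: &lt;code&gt;multiplyByShiftedCopies&lt;/code&gt; and &lt;code&gt;LOW_BITS&lt;/code&gt; are hypothetical names, and it assumes the operands are small enough that the product stays below \(2^{53}\) and is therefore exact in JavaScript numbers:&lt;/p&gt;

```javascript
// Multiplying x by y, computed as the sum of copies of x shifted left by
// the index of each set bit of y. (x * 2 ** i is x shifted left by i.)
function multiplyByShiftedCopies(x, y) {
  // Binary digits of y, least significant first.
  const digits = [...y.toString(2)].reverse();
  return digits.reduce(
    (sum, digit, i) => (digit === '1' ? sum + x * 2 ** i : sum),
    0
  );
}

// The constant with bits 0, 6, 12, 18, and 24 set.
const LOW_BITS = parseInt('000001000001000001000001000001', 2);

const duplicated = multiplyByShiftedCopies(5, LOW_BITS);
console.log(duplicated === 5 * LOW_BITS); // true
console.log(duplicated.toString(2).padStart(30, '0'));
// 000101000101000101000101000101
```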

&lt;p&gt;The first operation we need to parallelize is the comparisons to
the powers of 2. This can be converted to a word operation by noting
that the comparison \(x \geq y\) can be performed by checking the sign
of \(x - y\), and that checking the sign can be done by setting the
unused high bit of \(x\) before doing the subtraction, and then checking
to see if that high bit was left intact (i.e., not borrowed from). So we
pre-compute a constant with the \(n\)th block containing the \(n\)th
power of 2, then subtract that from our word containing the
duplicated blocks with the high bit set. Finally, we mask off
the unneeded lower bits:&lt;/p&gt;

&lt;pre class=&quot;binary-example&quot;&gt;
  &lt;span class=&quot;lower-bits-full&quot;
 &gt;00 000101 000101 000101 000101 000101&lt;/span&gt;
| &lt;span class=&quot;last-operand high-bit-full&quot;
 &gt;00 100000 100000 100000 100000 100000&lt;/span&gt;
  &lt;span class=&quot;full&quot;
 &gt;00 100101 100101 100101 100101 100101&lt;/span&gt;
- &lt;span class=&quot;last-operand lower-bits-full&quot;
 &gt;00 010000 001000 000100 000010 000001&lt;/span&gt;
  &lt;span class=&quot;high-bit-full&quot;
 &gt;00 010101 011101 100001 100011 100100&lt;/span&gt;
&amp; &lt;span class=&quot;last-operand high-bit-full&quot;
 &gt;00 100000 100000 100000 100000 100000&lt;/span&gt;
  &lt;span class=&quot;high-bit-full&quot;
 &gt;00 000000 000000 100000 100000 100000&lt;/span&gt;
&lt;/pre&gt;
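&lt;p&gt;To see that the surviving high bits really are the comparison results, the following sketch performs the single subtraction and then reads the high bit of each block off the binary string rather than masking. It assumes \(0 \le x \lt 32\); the names &lt;code&gt;fromBinary&lt;/code&gt; and &lt;code&gt;compareToPowersInParallel&lt;/code&gt; are hypothetical:&lt;/p&gt;

```javascript
// Converts a binary string (possibly with whitespace) to an integer.
const fromBinary = (s) => parseInt(s.replace(/\s+/g, ''), 2);

const LOW_BITS = fromBinary('000001 000001 000001 000001 000001');
const HIGH_BITS = fromBinary('100000 100000 100000 100000 100000');
const POWERS = fromBinary('010000 001000 000100 000010 000001');

// Returns, from the lowest block upward, whether x >= 2^i for i = 0, ..., 4,
// read off the high bit of each block after a single subtraction.
function compareToPowersInParallel(x) {
  const difference = ((x * LOW_BITS) | HIGH_BITS) - POWERS;
  const blocks = difference.toString(2).padStart(30, '0').match(/.{6}/g);
  return blocks.reverse().map((block) => block[0] === '1');
}

console.log(compareToPowersInParallel(5).join(', '));
// true, true, true, false, false
```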

&lt;p&gt;We&apos;re left with a word where all bits except for the high bits of a
block are zero. We still need to sum up those bits, but since they&apos;re
a block apart, that can be done by multiplication with a constant to
line up the bits in a single column. The constant turns out to have
the \(\{ 0, 6, 12, 18, 24 \}\)th bits set, with the answer being in
  the top three bits:&lt;sup&gt;&lt;a href=&quot;#fn3&quot; id=&quot;r3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;pre class=&quot;binary-example&quot;&gt;
  &lt;span class=&quot;high-bit-full&quot;
 &gt;00 000000 000000 100000 100000 100000&lt;/span&gt;
* &lt;span class=&quot;last-operand low-bit-full&quot;
 &gt;00 000001 000001 000001 000001 000001&lt;/span&gt;
  &lt;span class=&quot;high-bit-full&quot;
 &gt;00 000000 000000 100000 100000 100000&lt;/span&gt;
  &lt;span class=&quot;high-bit-full&quot;
 &gt;00 000000 100000 100000 100000&lt;/span&gt;
  &lt;span class=&quot;high-bit-full&quot;
 &gt;00 100000 100000 100000&lt;/span&gt;
  &lt;span class=&quot;high-bit-full&quot;
 &gt;00 100000 100000&lt;/span&gt;
  &lt;span class=&quot;high-bit-full last-operand&quot;
 &gt;00 100000                            &lt;/span&gt;
  &lt;span class=&quot;top-three&quot;
 &gt;01 100001 100001 100001 000000 100000&lt;/span&gt;

MSSB5(x) = 011 - 1 = 2
&lt;/pre&gt;
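&lt;p&gt;The summing step can be tried in isolation. This sketch assumes its input has bits set only at the block high-bit positions \(\{ 5, 11, 17, 23, 29 \}\); &lt;code&gt;fromBinary&lt;/code&gt; and &lt;code&gt;sumHighBits&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```javascript
// Converts a binary string (possibly with whitespace) to an integer.
const fromBinary = (s) => parseInt(s.replace(/\s+/g, ''), 2);

const LOW_BITS = fromBinary('000001 000001 000001 000001 000001');

// Counts the set high bits of the five full blocks of a 32-bit word by
// multiplying and reading the top three bits of the 32-bit result.
// The unsigned shift first reduces the product modulo 2^32.
function sumHighBits(word) {
  return (word * LOW_BITS) >>> 29;
}

const comparisons = fromBinary('000000 000000 100000 100000 100000');
console.log(sumHighBits(comparisons)); // 3
console.log(sumHighBits(comparisons) - 1); // 2
```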

&lt;div class=&quot;p&quot;&gt;We can now write &lt;code&gt;mssb5()&lt;/code&gt; using a constant number of
  word operations:&lt;sup&gt;&lt;a href=&quot;#fn4&quot; id=&quot;r4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;



&lt;script&gt;
function mssb5(x) {
  // Duplicate x among all the blocks.
  x *= b(&apos;00 000001 000001 000001 000001 000001&apos;);

  // Compare to successive powers of 2 in parallel.
  x |= b(&apos;00 100000 100000 100000 100000 100000&apos;);
  x -= b(&apos;00 010000 001000 000100 000010 000001&apos;);
  x &amp;= b(&apos;00 100000 100000 100000 100000 100000&apos;);

  // Sum up the bits into the high 3 bits.
  x *= b(&apos;00 000001 000001 000001 000001 000001&apos;);

  // Shift down and subtract 1 to get the answer.
  return (x &gt;&gt;&gt; 29) - 1;
}
&lt;/script&gt;

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;function mssb5(x) {
  // Duplicate x among all the blocks.
  x *= b(&amp;#39;00 000001 000001 000001 000001 000001&amp;#39;);

  // Compare to successive powers of 2 in parallel.
  x |= b(&amp;#39;00 100000 100000 100000 100000 100000&amp;#39;);
  x -= b(&amp;#39;00 010000 001000 000100 000010 000001&amp;#39;);
  x &amp;amp;= b(&amp;#39;00 100000 100000 100000 100000 100000&amp;#39;);

  // Sum up the bits into the high 3 bits.
  x *= b(&amp;#39;00 000001 000001 000001 000001 000001&amp;#39;);

  // Shift down and subtract 1 to get the answer.
  return (x &amp;gt;&amp;gt;&amp;gt; 29) - 1;
}&lt;/code&gt;&lt;/pre&gt;


We can then find the most significant set bit of a full block
by simply testing the high bit first:



&lt;script&gt;
function mssb6(x) {
  return (x &amp; b(&apos;100000&apos;)) ? 5 : mssb5(x);
}
&lt;/script&gt;

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;function mssb6(x) {
  return (x &amp;amp; b(&amp;#39;100000&amp;#39;)) ? 5 : mssb5(x);
}&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;3. Finding the most significant set block&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Let&apos;s now consider the sub-problem of finding the most significant
set block of a word (ignoring the partial one). Similar to the above,
we&apos;d like to be able to use subtraction to compare all the blocks to
zero at the same time. However, that requires the high bit of each
block to be unused. That&apos;s easy enough to handle: just separate the
high bit and the lower bits of each block, test the lower bits, and
then bitwise-or the results together:&lt;/p&gt;

&lt;pre class=&quot;binary-example&quot;&gt;
   x = &lt;span class=&quot;full&quot;
    &gt;00 100000 000000 010000 000000 000001&lt;/span&gt;
&amp;  C = &lt;span class=&quot;last-operand high-bit-full&quot;
    &gt;00 100000 100000 100000 100000 100000&lt;/span&gt;
  y1 = &lt;span class=&quot;high-bit-full&quot;
    &gt;00 100000 000000 000000 000000 000000&lt;/span&gt;

   x = &lt;span class=&quot;full&quot;
    &gt;00 100000 000000 010000 000000 000001&lt;/span&gt;
&amp; ~C = &lt;span class=&quot;last-operand lower-bits-full&quot;
    &gt;00 011111 011111 011111 011111 011111&lt;/span&gt;
  t1 = &lt;span class=&quot;lower-bits-full&quot;
    &gt;00 000000 000000 010000 000000 000001&lt;/span&gt;

   C = &lt;span class=&quot;full&quot;
      &gt;00 100000 100000 100000 100000 100000&lt;/span&gt;
- t1 = &lt;span class=&quot;last-operand lower-bits-full&quot;
      &gt;00 000000 000000 010000 000000 000001&lt;/span&gt;
  t2 = &lt;span class=&quot;high-bit-full&quot;
      &gt;00 100000 100000 010000 100000 011111&lt;/span&gt;

 ~t2 = &lt;span class=&quot;high-bit-full&quot;
      &gt;11 011111 011111 101111 011111 100000&lt;/span&gt;
&amp;  C = &lt;span class=&quot;last-operand high-bit-full&quot;
      &gt;00 100000 100000 100000 100000 100000&lt;/span&gt;
  y2 = &lt;span class=&quot;high-bit-full&quot;
      &gt;00 000000 000000 100000 000000 100000&lt;/span&gt;

  y1 = &lt;span class=&quot;high-bit-full&quot;
      &gt;00 100000 000000 000000 000000 000000&lt;/span&gt;
| y2 = &lt;span class=&quot;last-operand high-bit-full&quot;
      &gt;00 000000 000000 100000 000000 100000&lt;/span&gt;
   y = &lt;span class=&quot;high-bit-full&quot;
      &gt;00 100000 000000 100000 000000 100000&lt;/span&gt;
&lt;/pre&gt;

&lt;p&gt;The result is stored in the high bits of each block. If we could
pack all the bits together, we&apos;d then be able to
use &lt;code&gt;mssb5()&lt;/code&gt;. This is similar to where we had to add all
the bits together in part 2, but we need a constant to stagger the
bits instead of lining them up. The constant to put the answer in the
high bits turns out to have the \(\{ 7, 12, 17, 22, 27 \}\)th bits
set:&lt;/p&gt;

&lt;pre class=&quot;binary-example&quot;&gt;
y &gt;&gt;&gt; 5 = &lt;span class=&quot;low-bit-full&quot;
         &gt;00 000001 000000 000001 000000 000001&lt;/span&gt;
        * &lt;span class=&quot;last-operand every-fifth-from-seventh&quot;
         &gt;00 001000 010000 100001 000010 000000&lt;/span&gt;
          &lt;span class=&quot;low-bit-full&quot;
         &gt;10 000000 000010 000000 00001&lt;/span&gt;
          &lt;span class=&quot;low-bit-full&quot;
         &gt;00 000001 000000 000001&lt;/span&gt;
          &lt;span class=&quot;low-bit-full&quot;
         &gt;00 100000 000000 1&lt;/span&gt;
          &lt;span class=&quot;low-bit-full&quot;
         &gt;00 000000 01&lt;/span&gt;
          &lt;span class=&quot;last-operand low-bit-full&quot;
         &gt;00 001                               &lt;/span&gt;
        = &lt;span class=&quot;top-five&quot;
         &gt;10 101001 010010 100001 000010 000000&lt;/span&gt;
&lt;/pre&gt;

&lt;p&gt;This yields the answer &lt;code&gt;10101&lt;/code&gt;, where the \(i\)th bit is
set exactly when the \(i\)th block of \(x\) is non-zero. Therefore,
the index of the most significant set block is
simply &lt;code&gt;mssb5(10101)&lt;/code&gt;.&lt;/p&gt;
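&lt;p&gt;The staggering multiplication can likewise be checked on its own. The sketch assumes \(y\) has bits set only at the block high-bit positions; &lt;code&gt;fromBinary&lt;/code&gt;, &lt;code&gt;STAGGER&lt;/code&gt;, and &lt;code&gt;packHighBits&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```javascript
// Converts a binary string (possibly with whitespace) to an integer.
const fromBinary = (s) => parseInt(s.replace(/\s+/g, ''), 2);

// The constant with bits 7, 12, 17, 22, and 27 set.
const STAGGER = fromBinary('0000 10000 10000 10000 10000 10000000');

// Packs the block high bits of y (bits 5, 11, 17, 23, 29) into the low five
// bits of the result, with bit i coming from block i.
function packHighBits(y) {
  return ((y >>> 5) * STAGGER) >>> 27;
}

const y = fromBinary('100000 000000 100000 000000 100000');
console.log(packHighBits(y).toString(2)); // 10101
```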

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;4. Putting it all together&lt;/h2&gt;
&lt;/header&gt;

&lt;div class=&quot;p&quot;&gt;With the building blocks above, we can now implement the algorithm
for finding the most significant set bit in the full blocks of a
  word:&lt;sup&gt;&lt;a href=&quot;#fn5&quot; id=&quot;r5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;



&lt;script&gt;
function mssb30(x) {
  var C = b(&apos;00 100000 100000 100000 100000 100000&apos;);

  // Check whether the high bit of each block is set.
  var y1 = x &amp; C;

  // Check whether any of the lower bits of each block are set.
  var y2 = ~(C - (x &amp; ~C)) &amp; C;

  var y = y1 | y2;

  // Shift the result bits down to the lowest 5 bits.
  var z = ((y &gt;&gt;&gt; 5) * b(&apos;0000 10000 10000 10000 10000 10000000&apos;)) &gt;&gt;&gt; 27;

  // Compute the bit index of the most significant set block.
  var b1 = 6 * mssb5(z);

  // Compute the most significant set bit inside the most significant
  // set block.
  var b2 = mssb6((x &gt;&gt;&gt; b1) &amp; b(&apos;111111&apos;));

  return b1 + b2;
}
&lt;/script&gt;

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;function mssb30(x) {
  var C = b(&amp;#39;00 100000 100000 100000 100000 100000&amp;#39;);

  // Check whether the high bit of each block is set.
  var y1 = x &amp;amp; C;

  // Check whether any of the lower bits of each block are set.
  var y2 = ~(C - (x &amp;amp; ~C)) &amp;amp; C;

  var y = y1 | y2;

  // Shift the result bits down to the lowest 5 bits.
  var z = ((y &amp;gt;&amp;gt;&amp;gt; 5) * b(&amp;#39;0000 10000 10000 10000 10000 10000000&amp;#39;)) &amp;gt;&amp;gt;&amp;gt; 27;

  // Compute the bit index of the most significant set block.
  var b1 = 6 * mssb5(z);

  // Compute the most significant set bit inside the most significant
  // set block.
  var b2 = mssb6((x &amp;gt;&amp;gt;&amp;gt; b1) &amp;amp; b(&amp;#39;111111&amp;#39;));

  return b1 + b2;
}&lt;/code&gt;&lt;/pre&gt;


And then it&apos;s simple enough to extend it to find the most
significant set bit of a full word:



&lt;script&gt;
function mssb32(x) {
  // Check the high duplet and fall back to mssb30 if it&apos;s not set.
  var h = x &gt;&gt;&gt; 30;
  return h ? (30 + mssb5(h)) : mssb30(x);
}
&lt;/script&gt;

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;function mssb32(x) {
  // Check the high duplet and fall back to mssb30 if it&amp;#39;s not set.
  var h = x &amp;gt;&amp;gt;&amp;gt; 30;
  return h ? (30 + mssb5(h)) : mssb30(x);
}&lt;/code&gt;&lt;/pre&gt;


So the code above shows that we can find the most significant set
bit of a 32-bit word in a constant number of 32-bit word
operations. It is easy enough to see how it can be adapted to yield a
similar algorithm for a given arbitrary (but sufficiently large) word
size, simply by pre-computing the various word-size-dependent
constants.&lt;/div&gt;
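&lt;p&gt;When adapting the constants, it may help to test against an independent reference. The following is a minimal sketch, assuming a JavaScript engine that provides &lt;code&gt;Math.clz32&lt;/code&gt;; &lt;code&gt;mssbReference&lt;/code&gt; is a hypothetical name, and this is a cross-check rather than part of the constant-time method:&lt;/p&gt;

```javascript
// Reference answer using the built-in count-leading-zeros operation:
// the most significant set bit of a non-zero 32-bit word x is at index
// 31 - clz32(x).
function mssbReference(x) {
  return 31 - Math.clz32(x);
}

console.log(mssbReference(5)); // 2
console.log(mssbReference(parseInt('100000000000010000000000000001', 2))); // 29
console.log(mssbReference(0xffffffff)); // 31
```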

&lt;p&gt;It is also easy to see why no one actually uses this method on real
  computers even in the absence of built-in instructions: it is much
  more complicated and almost certainly slower than existing methods for
  real word sizes! Also, the word-RAM model&amp;mdash;where we assume all
  word operations take constant time&amp;mdash;is useful only when the word
  size is fixed or narrowly bounded. When we allow word size to vary
  arbitrarily, the word-RAM model breaks down&amp;mdash;for one,
  multiplication grows super-linearly with respect to word size!  Alas,
  this method is doomed to remain a theoretical curiosity, albeit one
  that uses a few clever tricks.&lt;/p&gt;

&lt;script&gt;
function highlightIndices(str, indices) {
  var highlightedStr = &apos;&apos;;
  var i = 0, j = 0;
  for (var k = 0; k &lt; str.length; ++k) {
    var chStr = str[str.length - k - 1];
    if (chStr == &apos;0&apos; || chStr == &apos;1&apos;) {
      if (j &lt; indices.length &amp;&amp; i == indices[j]) {
        ++j;
      } else {
        chStr = &apos;&lt;span class=&quot;dont-care&quot;&gt;&apos; + chStr + &apos;&lt;/span&gt;&apos;;
      }
      ++i;
    }

    highlightedStr = chStr + highlightedStr;
  }
  return highlightedStr;
}

function highlightElements(selector, indices) {
  var es = document.querySelectorAll(selector);
  for (var i = 0; i &lt; es.length; ++i) {
    var e = es[i];
    e.innerHTML = highlightIndices(e.textContent, indices);
  }
}

highlightElements(&apos;pre.binary-example span.first-five&apos;, [0, 1, 2, 3, 4]);

highlightElements(&apos;pre.binary-example span.low-bit-full&apos;, [0, 6, 12, 18, 24]);

highlightElements(&apos;pre.binary-example span.every-fifth-from-seventh&apos;,
                  [7, 12, 17, 22, 27]);

highlightElements(&apos;pre.binary-example span.lower-bits-full&apos;,
                  [0, 1, 2, 3, 4,
                   6, 7, 8, 9, 10,
                   12, 13, 14, 15, 16,
                   18, 19, 20, 21, 22,
                   24, 25, 26, 27, 28]);

highlightElements(&apos;pre.binary-example span.high-bit-full&apos;, [5, 11, 17, 23, 29]);

highlightElements(&apos;pre.binary-example span.full&apos;,
                  [0, 1, 2, 3, 4, 5,
                   6, 7, 8, 9, 10, 11,
                   12, 13, 14, 15, 16, 17,
                   18, 19, 20, 21, 22, 23,
                   24, 25, 26, 27, 28, 29]);

highlightElements(&apos;pre.binary-example span.top-three&apos;, [29, 30, 31]);

highlightElements(&apos;pre.binary-example span.top-five&apos;, [27, 28, 29, 30, 31]);
&lt;/script&gt;

&lt;/section&gt;

&lt;hr /&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;


&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] The constant-time method is detailed in the original
    papers for the &lt;a href=&quot;https://en.wikipedia.org/wiki/Fusion_tree&quot;&gt;fusion
    tree&lt;/a&gt; data
    structure. &lt;a href=&quot;http://dl.acm.org/citation.cfm?id=100217&quot;&gt;The
    first paper&lt;/a&gt; is unfortunately behind a paywall, but
    &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/0022000093900404?np=y&quot;&gt;the
      second paper&lt;/a&gt;, essentially a rehash of the first one, is
    freely downloadable.&lt;/p&gt;

  &lt;p&gt;The method is also explained in
    &lt;a href=&quot;http://courses.csail.mit.edu/6.851/spring12/lectures/L12.html&quot;&gt;lecture
      12&lt;/a&gt; of Erik
    Demaine&apos;s &lt;a href=&quot;http://courses.csail.mit.edu/6.851/spring12/&quot;&gt;Advanced
      Data Structures&lt;/a&gt; class, which is how I originally found out
    about it.
    &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn2&quot;&gt;[2] Demaine uses 16-bit words, which factor nicely into
    4 blocks of 4 bits, but it is instructive to see how the method
    deals with a word size that is not a perfect square.
    &lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn3&quot;&gt;[3] In this case, the partial 6th block has enough room
    to hold the answer but this may not be true in general. This can
    be remedied easily enough by shifting down the block high bits to
    the low bits before multiplying; the answer will then be in the
    last full block.
    &lt;a href=&quot;#r3&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn4&quot;&gt;[4] &lt;code&gt;b(str)&lt;/code&gt; just parses a number from its
    binary string representation.
    &lt;a href=&quot;#r4&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn5&quot;&gt;[5] Try out this function (and the others on this page)
    by opening up the JS console on this page!
    &lt;a href=&quot;#r5&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/primality-testing-polynomial-time-part-2</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/primality-testing-polynomial-time-part-2"/>
    <title>Primality Testing in Polynomial Time (&#8545;)</title>
    <updated>2012-12-29T00:00:00-08:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;script type=&quot;text/javascript&quot;
        src=&quot;https://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.0/knockout-min.js&quot;&gt;&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/num.js@eab08d4/simple-arith.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/num.js@eab08d4/trial-division.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/num.js@eab08d4/euler-phi.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/num.js@eab08d4/multiplicative-order.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/num.js@eab08d4/primality-testing.js&quot;&gt;&lt;/script&gt;

&lt;p&gt;&lt;em&gt;(Note: this article isn&apos;t fully polished yet, but I thought it
would be a shame to let it languish during my sabbatical.  Happy new
year!)&lt;/em&gt;&lt;/p&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;5. Strengthening the AKS theorem&lt;/h2&gt;
&lt;/header&gt;

&lt;div class=&quot;p&quot;&gt;It turns out the conditions of the AKS theorem are stronger than
they appear; they themselves imply that \(n\) is prime.  To show this,
we need the following theorem, which we&apos;ll state without proof:

&lt;div class=&quot;theorem&quot;&gt;
(&lt;span class=&quot;theorem-name&quot;&gt;Lenstra&apos;s squarefree test&lt;/span&gt;.)  If
\(a^n \equiv a \pmod{n}\) for \(1 \le a \lt \ln^2 n\), then \(n\) is
  &lt;a href=&quot;http://en.wikipedia.org/wiki/Squarefree&quot;&gt;squarefree&lt;/a&gt;.&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;

We also need a couple of lemmas:

&lt;div class=&quot;theorem&quot;&gt;
(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 1&lt;/span&gt;.)
For \(0 \le a \lt n\) and \(r \gt 1\), suppose that

\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}
\]

Then

\[
(a + 1)^n \equiv a + 1 \pmod{n}\text{.}
\]
&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
&lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; By definition,
\((X + a)^n - (X^n + a) \equiv k(X) \cdot (X^r - 1) \pmod{n}\) for some
polynomial \(k(X)\).  Treating both sides as functions of \(X\) and
substituting in \(1\), the right-hand side vanishes and we
immediately get \((1 + a)^n - (1 + a) \equiv 0 \pmod{n}\). &amp;#x220e;&lt;/p&gt;
&lt;/div&gt;
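&lt;p&gt;As a small illustration of Lemma 1 (with values chosen here purely
for convenience), take \(n = 3\), \(r = 2\), and \(a = 1\).  The
hypothesis holds, since

\[
(X + 1)^3 = X^3 + 3X^2 + 3X + 1 \equiv X^3 + 1 \pmod{X^2 - 1, 3}\text{,}
\]

and substituting \(X = 1\) gives \(2^3 = 8 \equiv 2 \pmod{3}\), which
is exactly \((a + 1)^n \equiv a + 1 \pmod{n}\).&lt;/p&gt;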

&lt;div class=&quot;theorem&quot;&gt;
(&lt;span class=&quot;theorem-name&quot;&gt;Lemma 2&lt;/span&gt;.)
For \(n \gt 1\), \(\lfloor \lg n \rfloor \cdot \lg n \gt \ln^2 n\).
&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
&lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; Since \(\ln n = \frac{\lg n}{\lg
e}\) and \(e \gt 2\), \(\lg n \gt \ln n\) for \(n \gt 1\).&lt;/p&gt;

&lt;p&gt;Letting \(k = \lfloor \lg n \rfloor\), \(\ln n \lt \frac{k + 1}{\lg
e}\), so if \(\frac{k + 1}{\lg e} \lt k\), that implies that \(\ln n
\lt k\).  Solving for \(k\), we get that \(k \gt \frac{1}{\lg e -
1}\), which is true when \(n \ge 8\).&lt;/p&gt;

&lt;p&gt;So if \(n \ge 8\), then \(\ln n \lt \lfloor \lg n \rfloor\).
Checking manually, we find that \(\ln n \lt \lfloor \lg n \rfloor\)
holds also for \(n \in \{ 2, 4, 5, 6, 7 \}\), immediately implying the
lemma for all \(n \gt 1\) except \(3\).  But checking manually again,
we find that the lemma holds for \(3\) also.  &amp;#x220e;&lt;/p&gt;
&lt;/div&gt;
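&lt;p&gt;For reference, the manual check for \(n = 3\) works out as follows
(decimal values rounded to three places):

\[
\lfloor \lg 3 \rfloor \cdot \lg 3 \approx 1 \cdot 1.585 = 1.585 \gt
1.207 \approx \ln^2 3\text{.}
\]

Note that \(\ln 3 \approx 1.099 \gt 1 = \lfloor \lg 3 \rfloor\), which
is why \(3\) cannot be handled by the shortcut used for the other small
values.&lt;/p&gt;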
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Then, we can prove the strong version of the AKS theorem:

&lt;div class=&quot;theorem&quot;&gt;
(&lt;span class=&quot;theorem-name&quot;&gt;AKS theorem, strong version&lt;/span&gt;.)  Let
\(n \ge 2\), \(r\) be relatively prime to \(n\) with \(o_r(n) \gt
\lg^2 n\), and \(M \gt \sqrt{φ(r)} \lg n\).  Furthermore, let
\(n\) have no prime factor less than \(M\) and let

\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}
\]

for \(0 \le a \lt M\).  Then \(n\) is prime.&lt;/div&gt;

&lt;div class=&quot;proof&quot;&gt;
&lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;Proof.&lt;/span&gt; From Lemma 1, we know that \(a^n
\equiv a \pmod{n}\) for \(1 \le a \lt M\).  Since \(o_r(n)\) divides
\(φ(r)\), we have \(φ(r) \ge o_r(n) \gt \lg^2 n\), and so \(M \gt
\sqrt{φ(r)} \lg n \gt \lfloor \lg n \rfloor \cdot \lg n \gt \ln^2 n\) by
Lemma 2.  We can therefore apply Lenstra&apos;s squarefree test to show
that \(n\) is squarefree.  From the weak version of the AKS theorem, we
also know that \(n\) is a prime power.  But a squarefree prime power
has a single prime factor with multiplicity one, which immediately
implies that \(n\) is prime. &amp;#x220e;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;6. Finding a suitable \(r\)&lt;/h2&gt;
&lt;/header&gt;

&lt;div class=&quot;p&quot;&gt;The only remaining loose end is to show that there exists an \(r\)
  with \(o_r(n) \gt \lg^2 n\) and that it&apos;s small enough (i.e., polylog
  in \(n\)).  The existence of \(r\) is easy to see; we can simply pick
  the smallest \(r\) that is co-prime to \(n\) and greater than
  \(n^{\lg^2 n}\).  But that&apos;s obviously too big.  We can do better:

  &lt;div class=&quot;theorem&quot;&gt;
    &lt;span class=&quot;theorem-name&quot;&gt;(Upper bound for \(r\).)&lt;/span&gt; Let \(n \ge 2\).
    Then there exists some \(r \le \max(3, \lceil \lg n \rceil^5)\) such
    that \(o_r(n) \gt \lceil \lg n \rceil^2\).&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;
  &lt;/div&gt;

  &lt;div class=&quot;proof&quot;&gt;
    &lt;div class=&quot;p&quot;&gt;&lt;span class=&quot;proof-name&quot;&gt;(Proof.)&lt;/span&gt; Let&apos;s
      first prove the following lemma:

      &lt;div class=&quot;theorem&quot;&gt;
        &lt;span class=&quot;theorem-name&quot;&gt;(Lemma 3.)&lt;/span&gt; Let \(n \ge 9\) and \(b =
        \lceil \lg n \rceil\).  Then for \(m \ge 1\), there exists some \(r
        \le b^{2m + 1}\) such that \(o_r(n) \gt b^m\).
      &lt;/div&gt;

      &lt;div class=&quot;proof&quot;&gt;
        &lt;p&gt;&lt;span class=&quot;proof-name&quot;&gt;(Proof.)&lt;/span&gt; Let

          \[
            N = n \cdot (n - 1) \cdot (n^2 - 1) \dotsm (n^{b^m} - 1)\text{.}
          \]

          Note that if a prime \(r\) does not divide \(N\), then \(r\) does
          not divide \(n\) and \(o_r(n) \gt b^m\).  So it suffices to find
          some prime \(r \le b^{2m + 1}\) that does not divide \(N\).&lt;/p&gt;

        &lt;p&gt;We can see that:

          \[
          \begin{aligned}
            N &amp;= n \cdot (n - 1) \cdot (n^2 - 1) \dotsm (n^{b^m} - 1) \\
            &amp;\lt n \cdot n \cdot n^2 \dotsm n^{b^m} \\
            &amp;= n^{1 + 1 + 2 + 3 + \dotsm + b^m} \\
            &amp;= n^{1 + b^m (b^m + 1) / 2} \\
            &amp;= n^{\frac{1}{2} b^{2m} + \frac{1}{2} b^m + 1}\text{.}
          \end{aligned}
          \]

          Furthermore, the exponent of \(n\) is less than \(b^{2m}\), since
          each of the following inequalities is equivalent to the next:

          \[
          \begin{aligned}
            b^{2m} &amp;\gt \frac{1}{2} b^{2m} + \frac{1}{2} b^m + 1 \\
            \frac{1}{2} b^{2m} - \frac{1}{2} b^m - 1 &amp;\gt 0 \\
            b^{2m} - b^m - 2 &amp;\gt 0 \\
            (b^m - 2) \cdot (b^m + 1) &amp;\gt 0\text{.}
          \end{aligned}
          \]

          The last statement holds when \(b^m \gt 2\), which is always the
          case since \(b \ge 4\) and \(m \ge 1\).&lt;/p&gt;

        &lt;p&gt;Applying the upper bound,

          \[
          \begin{aligned}
            N &amp;\lt n^{\frac{1}{2} b^{2m} + \frac{1}{2} b^m + 1} \\
            &amp;\lt n^{b^{2m}} \\
            &amp;\le 2^{b^{2m + 1}}\text{.}
          \end{aligned}
          \]
        &lt;/p&gt;

        &lt;div class=&quot;p&quot;&gt;We can then use the following theorem, which
we&apos;ll state without proof:

          &lt;div class=&quot;theorem&quot;&gt;
            &lt;span class=&quot;theorem-name&quot;&gt;(&lt;a href=&quot;http://en.wikipedia.org/wiki/Primorial&quot;&gt;Primorial&lt;/a&gt;
              lower bound.)&lt;/span&gt; For \(x \ge 31\), the product of primes \(\le x\)
            exceeds \(2^x\).&lt;sup&gt;&lt;a href=&quot;#fn3&quot; id=&quot;r3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;  That is,

            \[
              x\# = \prod_{p \le x\text{, }p\text{ is prime}} p \gt 2^x\text{.}
            \]
          &lt;/div&gt;

          &lt;p&gt;Since \(b \ge 4\) and \(m \ge 1\), \(b^{2m + 1} \ge 31\), and so
            \(2^{b^{2m + 1}} \lt (b^{2m + 1})\#\).  Therefore,

            \[
            N \lt 2^{b^{2m + 1}} \lt (b^{2m + 1})\#\text{.}
            \]

            But that implies that there is some prime number \(p_0 \le b^{2m +
            1}\) that does not divide \(N\); if they all did, then \(N\) would be
            at least their product \((b^{2m + 1})\#\), contradicting the
            inequality above.  Therefore, \(o_{p_0}(n) \gt b^m\). &amp;#x220e;&lt;/p&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;

    &lt;p&gt;We can then prove our theorem: for \(n \ge 9\), apply Lemma 3 with
      \(m = 2\).  Here are explicit values for the rest: for \(n = 2\), \(r
      = 3\); \(n = 3\), \(r = 7\); \(n \in \{ 4, 6, 7, 8\}\), \(r = 11\);
      and for \(n = 5\), \(r = 17\). &amp;#x220e;&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
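&lt;p&gt;The explicit values of \(r\) quoted in the proof above are easy to
double-check by brute force.  The following is a minimal sketch and not
the code used elsewhere in this article: the helper names
&lt;code&gt;ceilLg&lt;/code&gt;, &lt;code&gt;multiplicativeOrder&lt;/code&gt;, and
&lt;code&gt;checkExplicitPair&lt;/code&gt; are made up for this illustration, and it
uses plain JavaScript numbers, which is fine for inputs this small.&lt;/p&gt;

```javascript
// Hypothetical helpers for illustration only; plain Numbers suffice here
// because every intermediate value is tiny.

// Returns ceil(lg(n)) for an integer n >= 1 using integer arithmetic.
function ceilLg(n) {
  let bits = 0;
  for (let powerOfTwo = 1; n > powerOfTwo; powerOfTwo *= 2) {
    bits += 1;
  }
  return bits;
}

// Returns o_r(n), the least k >= 1 with n^k = 1 (mod r), by brute force.
// Assumes r > 1 and gcd(n, r) = 1; otherwise the loop would never end.
function multiplicativeOrder(n, r) {
  let power = n % r;
  let order = 1;
  while (power !== 1) {
    power = (power * n) % r;
    order += 1;
  }
  return order;
}

// Checks both conditions of the theorem for one (n, r) pair:
// r is at most max(3, ceil(lg n)^5) and o_r(n) > ceil(lg n)^2.
function checkExplicitPair(n, r) {
  const b = ceilLg(n);
  const rBound = Math.max(3, b ** 5);
  return rBound >= r ? multiplicativeOrder(n, r) > b * b : false;
}

// The (n, r) pairs listed in the proof for n from 2 to 8.
const explicitPairs = [[2, 3], [3, 7], [4, 11], [5, 17], [6, 11], [7, 11], [8, 11]];
for (const [n, r] of explicitPairs) {
  console.log('n = ' + n + ', r = ' + r + ', o_r(n) = ' +
              multiplicativeOrder(n, r) + ', ok = ' + checkExplicitPair(n, r));
}
```

&lt;p&gt;Each pair should report &lt;code&gt;ok = true&lt;/code&gt;; for example,
\(o_7(3) = 6 \gt 4\) and \(o_{17}(5) = 16 \gt 9\).&lt;/p&gt;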

&lt;div class=&quot;p&quot;&gt;Also, it turns out that about half the time, we can do better.
We&apos;ll state this theorem without proof:

  &lt;div class=&quot;theorem&quot;&gt;&lt;span class=&quot;theorem-name&quot;&gt;(Tight upper bound for
    some \(r\).)&lt;/span&gt; Let \(n \equiv \pm 3 \pmod{8}\).  Then there
    exists some \(r \lt 8 \lceil \lg n \rceil^2\) such that \(o_r(n) \gt
    \lceil \lg n \rceil^2\).&lt;sup&gt;&lt;a href=&quot;#fn4&quot; id=&quot;r4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/div&gt;
&lt;/div&gt;
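&lt;p&gt;To get a feel for the difference between the two bounds, consider
\(n = 11\) (an example chosen here for illustration).  Since \(11 \equiv
3 \pmod{8}\) and \(\lceil \lg 11 \rceil = 4\),

\[
8 \lceil \lg n \rceil^2 = 8 \cdot 4^2 = 128
\qquad\text{versus}\qquad
\lceil \lg n \rceil^5 = 4^5 = 1024\text{.}
\]

In this particular case a suitable \(r\) shows up well below either
bound: \(r = 23\) works, since \(o_{23}(11)\) divides \(φ(23) = 22\),
while \(11^2 \equiv 6\) and \(11^{11} \equiv -1 \pmod{23}\), so
\(o_{23}(11) = 22 \gt 16\).&lt;/p&gt;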

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;7. The AKS algorithm (simple version)&lt;/h2&gt;
&lt;/header&gt;

&lt;div class=&quot;p&quot;&gt;Without further ado, here is a simple version of the AKS
  algorithm, given \(n \ge 2\):

  &lt;ol&gt;
    &lt;li&gt;Starting from \(\lceil \lg n \rceil^2 + 2\), find an \(r\) such
      that \(\gcd(r, n) = 1\) and \(o_r(n) \gt \lceil \lg n
      \rceil^2\).&lt;/li&gt;
    &lt;li&gt;Compute \(M = \lfloor \sqrt{r - 1} \rfloor \lceil \lg n
      \rceil + 1\).&lt;/li&gt;
    &lt;li&gt;Search for a prime factor of \(n\) less than \(M\).  If one is
      found, return &amp;ldquo;composite&amp;rdquo;.  If none are found and \(M \gt
      \lfloor \sqrt{n} \rfloor\), return &amp;ldquo;prime&amp;rdquo;.&lt;/li&gt;
    &lt;li&gt;For each \(1 \le a \lt M\), compute \((X + a)^n\), reducing
      coefficients mod \(n\) and powers mod \(r\).  If the result is not
      equal to \(X^{n\text{ mod }r} + a\), return
      &amp;ldquo;composite&amp;rdquo;.&lt;/li&gt;
    &lt;li&gt;Otherwise, return &amp;ldquo;prime&amp;rdquo;.&lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
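&lt;p&gt;The five steps above can be sketched end to end as follows.  This is
a rough, self-contained illustration rather than a definitive
implementation: it uses plain JavaScript numbers and arrays instead of
arbitrary-precision types, so it only works for \(n \lt 2^{26}\) (so
that products of residues stay exact), and all function names are
hypothetical.&lt;/p&gt;

```javascript
// Sketch only: plain Numbers, so n must stay below 2^26 to keep every
// product of two residues exactly representable. All names are made up.

function ceilLg(n) {
  let bits = 0;
  for (let p = 1; n > p; p *= 2) bits += 1;
  return bits;
}

function gcd(a, b) {
  while (b !== 0) [a, b] = [b, a % b];
  return a;
}

// Returns true if o_r(n) > bound, assuming gcd(n, r) = 1. Bails out as
// soon as some power n^k with k at most bound is 1 mod r.
function orderExceeds(n, r, bound) {
  let power = 1;
  for (let k = 1; bound >= k; k += 1) {
    power = (power * (n % r)) % r;
    if (power === 1) return false;
  }
  return true;
}

// Multiplies p and q (arrays of r coefficients) mod (X^r - 1, n).
function mulMod(p, q, r, n) {
  const out = new Array(r).fill(0);
  for (let i = 0; r > i; i += 1) {
    if (p[i] === 0) continue;
    for (let j = 0; r > j; j += 1) {
      const k = (i + j) % r;
      out[k] = (out[k] + p[i] * q[j]) % n;
    }
  }
  return out;
}

// Step 4 for one a: true if (X + a)^n differs from X^(n mod r) + a
// mod (X^r - 1, n). Assumes r >= 2 and n >= 2.
function isWitness(n, r, a) {
  let base = new Array(r).fill(0);
  base[0] = a % n;
  base[1] = 1;
  let result = new Array(r).fill(0);
  result[0] = 1;
  for (let e = n; e > 0; e = Math.floor(e / 2)) {
    if (e % 2 === 1) result = mulMod(result, base, r, n);
    base = mulMod(base, base, r, n);
  }
  const expected = new Array(r).fill(0);
  expected[0] = a % n;
  expected[n % r] = (expected[n % r] + 1) % n;
  return result.some((c, i) => c !== expected[i]);
}

function isPrimeSimpleAKS(n) {
  const b = ceilLg(n);
  // Step 1: first r at or above b^2 + 2 with gcd(r, n) = 1 and o_r(n) > b^2.
  let r = b * b + 2;
  while (gcd(r, n) !== 1 || !orderExceeds(n, r, b * b)) r += 1;
  // Step 2.
  const M = Math.floor(Math.sqrt(r - 1)) * b + 1;
  // Step 3: trial division below min(M, n); any divisor found there
  // implies a prime factor below M.
  for (let d = 2; Math.min(M, n) > d; d += 1) {
    if (n % d === 0) return false;
  }
  if (M > Math.floor(Math.sqrt(n))) return true;
  // Steps 4 and 5.
  for (let a = 1; M > a; a += 1) {
    if (isWitness(n, r, a)) return false;
  }
  return true;
}

console.log([2, 3, 4, 9, 91, 97].map(isPrimeSimpleAKS));
```

&lt;p&gt;For small \(n\) (certainly below \(2^{16}\)) the answer is already
decided in step 3, so steps 4 and 5 never run.  For larger \(n\) the
quadratic-time polynomial multiplication in this sketch becomes slow, so
it should be read as an illustration of the control flow and not as a
practical primality test.&lt;/p&gt;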

&lt;p&gt;As we showed in the previous section, there always exists an
\(r\) such that \(o_r(n) \gt \lceil \lg n \rceil^2\), so step 1 will
terminate.  All other steps are bounded, so the entire algorithm will
always terminate.&lt;/p&gt;

&lt;p&gt;In step 2, since \(φ(r) \le r - 1\), the value of \(M\) that
we compute is always greater than \(\sqrt{φ(r)} \lceil \lg n
\rceil\).  Step 4 checks if \((X + a)^n \equiv X^n + a \pmod{X^r - 1,
n}\) holds.  Therefore, by the strong AKS theorem, if the algorithm
returns &amp;ldquo;prime&amp;rdquo;, then \(n\) is prime.  Furthermore, by the
weak version of Fermat&apos;s little theorem for polynomials, if the
algorithm returns &amp;ldquo;composite&amp;rdquo;, then \(n\) is
composite.&lt;/p&gt;

&lt;p&gt;Since the algorithm always terminates and it returns the correct
answer when it terminates, it
is &lt;a href=&quot;http://en.wikipedia.org/wiki/Total_correctness&quot;&gt;totally
correct&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As shown in the previous section, we have to test \(O(\lg^5 n)\)
values to find a suitable \(r\).  Assuming a straightforward algorithm
to compute the multiplicative order that bails out once \(\lceil \lg
n \rceil^2\) is reached, and assuming we use the
division-based &lt;a href=&quot;http://en.wikipedia.org/wiki/Euclidean_algorithm&quot;&gt;Euclidean
algorithm&lt;/a&gt; for computing the greatest common divisor, testing each
value takes \(O(\lg^2 n)\) multiplies and \(O(\lg r) = O(\lg \lg n)\)
divisions of \(O(\lg r)\)-bit numbers.  Let \(M(b)\) be the cost to
multiply two \(b\)-bit numbers.  The complexity of division is
asymptotically the same as multiplication, so the total cost of step 1
is \(O(\lg^5 n \cdot (\lg^2 n + \lg \lg n) \cdot M(\lg \lg n)) =
O(\lg^7 n \cdot M(\lg \lg n))\), assuming \(M(O(b)) = O(M(b))\).&lt;/p&gt;

&lt;p&gt;Step 2 involves one square root, one multiplication, and one
increment, all involving \(O(\lg \lg n)\)-bit numbers.  The complexity
of taking the square root is asymptotically the same as
multiplication, so the total cost of step 2 is \(O(M(\lg \lg n))\).&lt;/p&gt;

&lt;p&gt;Step 3 takes a square root and tests \(M = O(\lg^{7/2} n)\)
numbers, and each test involves dividing the \(O(\lg n)\)-bit number
\(n\) by an \(O(\lg M)\)-bit number, so the total cost of step 3 is
\(O(\lg^{7/2} n \cdot M(\lg n))\).&lt;/p&gt;

&lt;p&gt;Steps 4 and 5 test \(O(\lg^{7/2} n)\) polynomials.  Testing each
polynomial involves exponentiating it by \(n\), reducing powers mod
\(r\) and coefficients mod \(n\) at each step, which requires \(O(\lg
n)\) multiplications of polynomials with \(O(r)\) coefficients each of
size \(O(\lg n)\).  The cost of multiplying two polynomials with \(s\)
coefficients of size \(b\) is \(M(s) \cdot M(b)\), so with \(s =
O(\lg^5 n)\) and \(b = O(\lg n)\) the total cost
of steps 4 and 5 is \(O(\lg^{9/2} n \cdot M(\lg^6 n))\), assuming
\(M(a) \cdot M(b) = M(a \cdot b)\).&lt;/p&gt;

&lt;p&gt;If &lt;a href=&quot;http://en.wikipedia.org/wiki/Multiplication_algorithm#Long_multiplication&quot;&gt;long
multiplication&lt;/a&gt; is used, then it costs \(M(b) = b^2\), which gives
a total cost of \(O(\lg^{9/2} n \cdot \lg^{12} n) = O(\lg^{33/2} n) =
O(\lg^{17} n)\) for the whole
algorithm.  &lt;a href=&quot;http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm&quot;&gt;More
complicated multiplication methods&lt;/a&gt; cost roughly \(M(b) = b \lg b\),
which gives a total cost of \(O(\lg^{21/2} n \cdot \lg \lg n) =
O(\lg^{11} n)\) for the whole algorithm.
Either way, the AKS primality test is shown to be implementable in
polynomial time.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Below is step 1 implemented in JavaScript; however, here we bound
  \(r\) explicitly to be able to detect bugs quickly.&lt;sup&gt;&lt;a href=&quot;#fn5&quot; id=&quot;r5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Returns an upper bound for r such that o_r(n) &gt; ceil(lg(n))^2 that
// is polylog in n.
function calculateAKSModulusUpperBound(n) {
  n = SNat.cast(n);
  var ceilLgN = new SNat(n.ceilLg());
  var rUpperBound = ceilLgN.pow(5).max(3);
  var nMod8 = n.mod(8);
  if (nMod8.eq(3) || nMod8.eq(5)) {
    rUpperBound = rUpperBound.min(ceilLgN.pow(2).times(8));
  }
  return rUpperBound;
}

// Returns the least r such that o_r(n) &gt; ceil(lg(n))^2 &gt;= ceil(lg(n)^2).
function calculateAKSModulus(n, multiplicativeOrderCalculator) {
  n = SNat.cast(n);
  multiplicativeOrderCalculator =
    multiplicativeOrderCalculator || calculateMultiplicativeOrderCRT;

  var ceilLgN = new SNat(n.ceilLg());
  var ceilLgNSq = ceilLgN.pow(2);
  var rLowerBound = ceilLgNSq.plus(2);
  var rUpperBound = calculateAKSModulusUpperBound(n);

  for (var r = rLowerBound; r.le(rUpperBound); r = r.plus(1)) {
    if (n.gcd(r).ne(1)) {
      continue;
    }
    var o = multiplicativeOrderCalculator(n, r);
    if (o.gt(ceilLgNSq)) {
      return r;
    }
  }

  throw new Error(&apos;Could not find AKS modulus&apos;);
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Here is step 2 implemented in JavaScript:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Returns floor(sqrt(r-1)) * ceil(lg(n)) + 1 &gt; floor(sqrt(Phi(r))) * lg(n).
function calculateAKSUpperBoundSimple(n, r) {
  n = SNat.cast(n);
  r = SNat.cast(r);

  // Use r - 1 instead of calculating Phi(r).
  return r.minus(1).floorRoot(2).times(n.ceilLg()).plus(1);
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Here is part of step 3 implemented in JavaScript, along with the
comments for the functions used in trial division:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Given a number n, a generator function getNextDivisor, and a
// processing function processPrimeFactor, factors n using the
// divisors returned by getNextDivisor and passes each prime factor
// with its multiplicity to processPrimeFactor.
//
// getNextDivisor is passed the current unfactorized part of n and it
// should return the next divisor to try, or null if there are no more
// divisors to generate (although processPrimeFactor may still be
// called).  processPrimeFactor is called with each non-trivial prime
// factor and its multiplicity.  If it returns a false value, it won&apos;t
// be called anymore.
function trialDivide(n, getNextDivisor, processPrimeFactor) {
  ...
}

// Returns a generator that generates primes up to 7, then odd numbers
// up to floor(sqrt(n)), using a mod-30 wheel to eliminate odd numbers
// that are known composite (roughly half).
function makeMod30WheelDivisorGenerator() {
  ...
}

// Returns the first factor of n &amp;lt; M from generator, or null if there
// is no such factor.
function getFirstFactorBelow(n, M, generator) {
  n = SNat.cast(n);
  M = SNat.cast(M);
  generator = generator || makeMod30WheelDivisorGenerator();

  var boundedGenerator = function(n) {
    var d = generator(n);
    return (d &amp;&amp; d.lt(M)) ? d : null;
  };
  var factor = null;
  trialDivide(n, boundedGenerator, function(p, k) {
    if (p.lt(M.min(n))) {
      factor = p;
    }
    return false;
  });
  return factor;
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Below is a function that ties steps 1 to 3 together; it is useful
for testing purposes to separate it from the other steps.  (Actually,
we use a different function to compute \(M\) which computes
\(φ(r)\) instead of using \(r - 1\) so that we always have the
tightest bound possible for \(M\).)

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// The getAKSParameters* functions below return a parameters object
// with the following fields:
//
//   n: the number the parameters are for.
//
//   factor: A prime factor of n.  If present, the fields below may
//           not be present.
//
//   isPrime: if set, n is prime.  If present, the fields below may
//            not be present.
//
//   r: the AKS modulus for n.
//
//   M: the AKS upper bound for n.

function getAKSParametersSimple(n) {
  n = SNat.cast(n);

  var r = calculateAKSModulus(n);
  var M = calculateAKSUpperBound(n, r);
  var parameters = {
    n: n,
    r: r,
    M: M
  };

  var factor = getFirstFactorBelow(n, M);
  if (factor) {
    parameters.factor = factor;
  } else if (M.gt(n.floorRoot(2))) {
    parameters.isPrime = true;
  }

  return parameters;
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Finally, here is step 4 implemented in JavaScript:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Returns whether a is an AKS witness for n, i.e. whether
// (X + a)^n != X^n + a mod (X^r - 1, n), which shows that n is composite.
function isAKSWitness(n, r, a) {
  n = SNat.cast(n);
  r = SNat.cast(r);
  a = SNat.cast(a);

  // Reduces powers mod r (i.e., mod X^r - 1) and coefficients mod n.
  function reduceAKS(p) {
    return p.modPow(r).mod(n);
  }

  function prodAKS(x, y) {
    return reduceAKS(x.times(y));
  }

  var one = new SPoly(new SNat(1));
  // X^n reduces to X^(n mod r) mod X^r - 1.
  var xn = one.shiftLeft(n.mod(r));
  var ap = new SPoly(a);
  var lhs = one.shiftLeft(1).plus(ap).pow(n, prodAKS);
  var rhs = reduceAKS(xn.plus(ap));
  return lhs.ne(rhs);
}

// Returns the first a &amp;lt; M that is an AKS witness for n, or null if
// there isn&apos;t one.
function getFirstAKSWitness(n, r, M) {
  for (var a = new SNat(1); a.lt(M); a = a.plus(1)) {
    if (isAKSWitness(n, r, a)) {
      return a;
    }
  }
  return null;
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Here&apos;s the code that ties it all together:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Returns whether n is prime or not using the AKS primality test.
function isPrimeByAKS(n) {
  n = SNat.cast(n);

  var parameters = getAKSParameters(n);
  if (parameters.factor) {
    return false;
  }
  if (parameters.isPrime) {
    return true;
  }
  return (getFirstAKSWitness(n, parameters.r, parameters.M) == null);
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p class=&quot;interactive-example&quot; id=&quot;aksExample&quot;&gt;
  Let
  &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt; =
  &lt;input class=&quot;parameter&quot; size=&quot;6&quot; pattern=&quot;[0-9]*&quot; required
         type=&quot;text&quot; value=&quot;175507&quot;
         data-bind=&quot;value: nStr, valueUpdate: &apos;afterkeydown&apos;&quot; /&gt;&lt;/span&gt;.
  &lt;!-- ko template: outputTemplate --&gt;&lt;!-- /ko --&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;aks.error.invalidN&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; is not a valid number.
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;aks.error.outOfBoundsN&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt;
    must be greater than or equal to 2.
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;aks.success&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&amp;lceil;lg &lt;var&gt;n&lt;/var&gt;&amp;rceil;&lt;/span&gt; is
    &lt;span class=&quot;fake-katex intermediate&quot; data-bind=&quot;text: ceilLgN&quot;&gt;&lt;/span&gt;,
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;r&lt;/var&gt; =
      &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: r&quot;&gt;&lt;/span&gt;&lt;/span&gt;
    is the least value such that
    &lt;span class=&quot;fake-katex&quot;&gt;o&lt;sub&gt;&lt;var&gt;r&lt;/var&gt;&lt;/sub&gt;(&lt;var&gt;n&lt;/var&gt;) =
      &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: nOrder&quot;&gt;&lt;/span&gt;
      &amp;gt; &amp;lceil;lg &lt;var&gt;n&lt;/var&gt;&amp;rceil;&lt;sup&gt;2&lt;/sup&gt;
      = &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: ceilLgNSq&quot;&gt;&lt;/span&gt;&lt;/span&gt;,
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;&amp;phi;&lt;/var&gt;(&lt;var&gt;r&lt;/var&gt;) =
      &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: eulerPhiR&quot;&gt;&lt;/span&gt;&lt;/span&gt;,
    and &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;M&lt;/var&gt; =
      &amp;lfloor;&amp;radic;&lt;var&gt;&amp;phi;&lt;/var&gt;(&lt;var&gt;r&lt;/var&gt;)&amp;rfloor; &amp;sdot;
      &amp;lceil;lg &lt;var&gt;n&lt;/var&gt;&amp;rceil; + 1 =
      &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: M&quot;&gt;&lt;/span&gt; &amp;gt;
    &amp;lfloor;&amp;radic;&lt;var&gt;&amp;phi;&lt;/var&gt;(&lt;var&gt;r&lt;/var&gt;)&amp;rfloor; &amp;sdot;
    lg &lt;var&gt;n&lt;/var&gt;&lt;/span&gt;.

    &lt;span data-bind=&quot;if: factor()&quot;&gt;
      &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt;
      has a factor
      &lt;span class=&quot;fake-katex&quot;&gt;&lt;span class=&quot;intermediate&quot;
                                        data-bind=&quot;text: factor&quot;&gt;&lt;/span&gt;
      &amp;lt; &lt;var&gt;M&lt;/var&gt;&lt;/span&gt;, so
      &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; is
      &lt;span class=&quot;result&quot;&gt;composite&lt;/span&gt;.
    &lt;/span&gt;

    &lt;span data-bind=&quot;if: isPrime()&quot;&gt;
      &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt;
      has no factor &lt;span class=&quot;fake-katex&quot;&gt;&amp;lt; &lt;var&gt;M&lt;/var&gt;&lt;/span&gt;
      and &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;M&lt;/var&gt; &amp;gt;
       &amp;lfloor;&amp;radic;&lt;var&gt;n&lt;/var&gt;&amp;rfloor; =
       &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: floorSqrtN&quot;&gt;&lt;/span&gt;&lt;/span&gt;,
      so
      &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; is
      &lt;span class=&quot;result&quot;&gt;prime&lt;/span&gt;.
    &lt;/span&gt;

    &lt;span data-bind=&quot;if: !factor() &amp;&amp; !isPrime()&quot;&gt;
      &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt;
      has no factor &lt;span class=&quot;fake-katex&quot;&gt;&amp;lt; &lt;var&gt;M&lt;/var&gt;&lt;/span&gt;
      and &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;M&lt;/var&gt; &amp;le;
       &amp;lfloor;&amp;radic;&lt;var&gt;n&lt;/var&gt;&amp;rfloor; =
       &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: floorSqrtN&quot;&gt;&lt;/span&gt;&lt;/span&gt;,
      so &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; is prime iff
      &lt;span class=&quot;fake-katex&quot;&gt;(&lt;var&gt;X&lt;/var&gt; +
        &lt;var&gt;a&lt;/var&gt;)&lt;sup&gt;&lt;var&gt;n&lt;/var&gt;&lt;/sup&gt;
        &amp;equiv; &lt;var&gt;X&lt;/var&gt;&lt;sup&gt;&lt;var&gt;n&lt;/var&gt;&lt;/sup&gt; + &lt;var&gt;a&lt;/var&gt;
        (mod &lt;var&gt;X&lt;/var&gt;&lt;sup&gt;&lt;var&gt;r&lt;/var&gt;&lt;/sup&gt; &amp;minus; 1,
        &lt;var&gt;n&lt;/var&gt;)&lt;/span&gt; for
      &lt;span class=&quot;fake-katex&quot;&gt;0 &amp;le; &lt;var&gt;a&lt;/var&gt;
	&amp;lt; &lt;var&gt;M&lt;/var&gt;&lt;/span&gt;.
    &lt;/span&gt;
  &lt;/script&gt;
&lt;/p&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;/primality-testing-polynomial-time-part-2-files/aks-example.js&quot;&gt;&lt;/script&gt;

&lt;p&gt;&lt;em&gt;(To-do: Have an interactive box to demonstrate how the
    per-\(a\) AKS test works.)&lt;/em&gt;&lt;/p&gt;

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;8. The AKS algorithm (improved version)&lt;/h2&gt;
&lt;/header&gt;

&lt;div class=&quot;p&quot;&gt;Here is a slightly more complicated version of the AKS algorithm.
Again given \(n \ge 2\):

  &lt;ol&gt;
    &lt;li&gt;Search for a prime factor of \(n\) less than \(\lceil \lg n
      \rceil^2 + 2\).  If one is found, return &amp;ldquo;composite&amp;rdquo;.&lt;/li&gt;
    &lt;li&gt;For each \(r\) from \(\lceil \lg n \rceil^2 + 2\):
      &lt;ol&gt;
        &lt;li&gt;If \(r \gt \lfloor \sqrt{n} \rfloor\), return
          &amp;ldquo;prime&amp;rdquo;.&lt;/li&gt;
        &lt;li&gt;If \(r\) divides \(n\), return &amp;ldquo;composite&amp;rdquo;.&lt;/li&gt;
        &lt;li&gt;Otherwise, factorize \(r\).&lt;/li&gt;
        &lt;li&gt;Compute \(o_r(n)\) using \(r\)&apos;s prime factors.  If it is less
          than or equal to \(\lceil \lg n \rceil^2\), jump back to the top of
          the loop with the next \(r\).&lt;/li&gt;
        &lt;li&gt;Otherwise, compute \(φ(r)\) using \(r\)&apos;s prime factors.&lt;/li&gt;
        &lt;li&gt;Compute \(M = \lfloor \sqrt{φ(r)} \rfloor \lceil \lg n
          \rceil + 1\), and break out of the loop.&lt;/li&gt;
      &lt;/ol&gt;
    &lt;/li&gt;
    &lt;li&gt;For each \(1 \le a \lt M\), compute \((X + a)^n\), reducing
      coefficients mod \(n\) and powers mod \(r\).  If the result is not
      equal to \(X^{n\text{ mod }r} + a\), return
      &amp;ldquo;composite&amp;rdquo;.&lt;/li&gt;
    &lt;li&gt;Otherwise, return &amp;ldquo;prime&amp;rdquo;.&lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;

&lt;p&gt;The logic of steps 1 to 3 of the simple version is essentially
merged together to form steps 1 and 2 of this version.  First, since each
\(r\) has to be checked for co-primality with \(n\), that effectively
also checks if \(r\) is a prime factor of \(n\), so we only have to
check for prime factors of \(n\) up to the lower bound of \(r\).
Second, both the multiplicative order and the totient
function can be computed more quickly given a complete prime
factorization, so we compute that for each \(r\).  Third, we use
\(φ(r)\) instead of \(r - 1\) to give a tighter bound for \(M\).
Finally, the last two steps are the same as in the simple version.&lt;/p&gt;
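&lt;p&gt;To illustrate how a prime factorization helps, here is a minimal
sketch that computes \(φ(r)\) and \(o_r(n)\) from the factors of \(r\).
It is not the CRT-based routine used in the listings in this article;
instead it uses the standard fact that \(o_r(n)\) divides \(φ(r)\) and
strips prime factors from \(φ(r)\) while the corresponding power of
\(n\) stays \(1\).  All function names are hypothetical, and plain
JavaScript numbers are assumed to be large enough for the small \(r\)
involved.&lt;/p&gt;

```javascript
// Hypothetical helpers; plain Numbers are assumed to be large enough,
// which holds as long as r * r stays below 2^53.

// Returns the prime factorization of m >= 1 as [prime, exponent] pairs.
function factorize(m) {
  const factors = [];
  for (let p = 2; m >= p * p; p += 1) {
    let k = 0;
    while (m % p === 0) {
      m /= p;
      k += 1;
    }
    if (k > 0) factors.push([p, k]);
  }
  if (m > 1) factors.push([m, 1]);
  return factors;
}

// Euler's totient of r, given the factorization of r:
// the product of (p - 1) * p^(k - 1) over all prime powers p^k.
function eulerPhiFromFactors(rFactors) {
  let phi = 1;
  for (const [p, k] of rFactors) phi *= (p - 1) * p ** (k - 1);
  return phi;
}

// Returns base^exp mod mod by repeated squaring.
function modPow(base, exp, mod) {
  let result = 1 % mod;
  base %= mod;
  for (let e = exp; e > 0; e = Math.floor(e / 2)) {
    if (e % 2 === 1) result = (result * base) % mod;
    base = (base * base) % mod;
  }
  return result;
}

// Returns o_r(n), assuming gcd(n, r) = 1. Starts from phi(r), which is a
// multiple of the order, and divides out each prime q of phi(r) for as
// long as n^(order / q) is still 1 mod r.
function orderFromFactors(n, r, rFactors) {
  let order = eulerPhiFromFactors(rFactors);
  for (const [q] of factorize(order)) {
    while (order % q === 0) {
      if (modPow(n, order / q, r) !== 1) break;
      order /= q;
    }
  }
  return order;
}

console.log(orderFromFactors(5, 17, factorize(17)));
```

&lt;p&gt;Compared with multiplying by \(n\) until the result is \(1\), this
needs only a handful of modular exponentiations per \(r\), at the price
of also factoring \(φ(r)\).&lt;/p&gt;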

&lt;div class=&quot;p&quot;&gt;Here are steps 1 and 2 of the above algorithm, implemented in
JavaScript:

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;function getAKSParameters(n, factorizer) {
  n = SNat.cast(n);
  factorizer = factorizer || defaultFactorizer;

  var ceilLgN = new SNat(n.ceilLg());
  var ceilLgNSq = ceilLgN.pow(2);
  var floorSqrtN = n.floorRoot(2);

  var rLowerBound = ceilLgNSq.plus(2);
  var rUpperBound = calculateAKSModulusUpperBound(n).min(floorSqrtN);

  var parameters = {
    n: n
  };

  var factor = getFirstFactorBelow(n, rLowerBound);
  if (factor) {
    parameters.factor = factor;
    return parameters;
  }

  for (var r = rLowerBound; r.le(rUpperBound); r = r.plus(1)) {
    if (n.mod(r).isZero()) {
      // r itself divides n, so it is the factor we report.
      parameters.factor = r;
      return parameters;
    }

    var rFactors = getFactors(r, factorizer);
    var o = calculateMultiplicativeOrderCRTFactors(n, rFactors, factorizer);
    if (o.gt(ceilLgNSq)) {
      parameters.r = r;
      parameters.M = calculateAKSUpperBoundFactors(n, rFactors);
      return parameters;
    }
  }

  if (rUpperBound.eq(floorSqrtN)) {
    parameters.isPrime = true;
    return parameters;
  }

  throw new Error(&apos;Could not find AKS modulus&apos;);
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
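
&lt;p&gt;Step 3 can be sketched in the same spirit.  The version below is
only an illustration under a few assumptions, not the implementation
that accompanies this article: it uses native &lt;code&gt;BigInt&lt;/code&gt;
values for \(n\) and \(a\) instead of &lt;code&gt;SNat&lt;/code&gt;, a plain
number for \(r\), schoolbook polynomial multiplication, and helper
names (&lt;code&gt;mulMod&lt;/code&gt;, &lt;code&gt;powXPlusA&lt;/code&gt;,
&lt;code&gt;satisfiesAKSCongruence&lt;/code&gt;) that are made up for this
example.&lt;/p&gt;

```javascript
// Hypothetical helper (not part of num.js): multiplies two polynomials
// modulo (X^r - 1, n).  A polynomial is an array of r BigInt
// coefficients, with index k holding the coefficient of X^k.
function mulMod(f, g, r, n) {
  const out = new Array(r).fill(0n);
  f.forEach((fi, i) => {
    if (fi === 0n) return;
    g.forEach((gj, j) => {
      // X^(i+j) wraps around to X^((i+j) mod r).
      const k = (i + j) % r;
      out[k] = (out[k] + fi * gj) % n;
    });
  });
  return out;
}

// Computes (X + a)^n mod (X^r - 1, n) by repeated squaring.
function powXPlusA(a, n, r) {
  const base = new Array(r).fill(0n);
  base[0] = a % n;
  base[1 % r] = (base[1 % r] + 1n) % n;
  let result = new Array(r).fill(0n);
  result[0] = 1n % n;
  let square = base;
  for (let e = n; e !== 0n; e /= 2n) {
    if (e % 2n === 1n) result = mulMod(result, square, r, n);
    square = mulMod(square, square, r, n);
  }
  return result;
}

// Returns true if (X + a)^n equals X^(n mod r) + a modulo (X^r - 1, n).
function satisfiesAKSCongruence(a, n, r) {
  const expected = new Array(r).fill(0n);
  expected[0] = a % n;
  const k = Number(n % BigInt(r));
  expected[k] = (expected[k] + 1n) % n;
  const actual = powXPlusA(a, n, r);
  return actual.every((c, i) => c === expected[i]);
}
```

&lt;p&gt;For example, &lt;code&gt;satisfiesAKSCongruence(2n, 7n, 5)&lt;/code&gt;
holds since \(7\) is prime, while
&lt;code&gt;satisfiesAKSCongruence(1n, 4n, 3)&lt;/code&gt; does not.  With \(r\)
and \(M\) from the parameters above, step 3 would amount to running
this check for each \(1 \le a \lt M\) and returning
&amp;ldquo;composite&amp;rdquo; on the first failure.&lt;/p&gt;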

&lt;/section&gt;

&lt;p&gt;&lt;em&gt;(To-do: Wrap up and lead into what will be shown in part
  3.)&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;


&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] This is a version of Theorem 2 from Lenstra&apos;s
    paper &lt;a href=&quot;http://www.math.leidenuniv.nl/~hwl/PUBLICATIONS/1979e/art.pdf&quot;&gt;Miller&apos;s
    Primality Test&lt;/a&gt;.
    &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn2&quot;&gt;[2] We work with \(\lceil \lg n \rceil^2\) instead of
    \(\lceil \lg^2 n \rceil\) or \(\lg^2 n\) as it&apos;s easier to work
    with in an actual implementation.
    &lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn3&quot;&gt;[3] This is exercise 1.27
    from &lt;a href=&quot;http://www.amazon.com/Prime-Numbers-A-Computational-Perspective/dp/0387252827&quot;&gt;Prime
    Numbers: A Computational Perspective&lt;/a&gt;.
    &lt;a href=&quot;#r3&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn4&quot;&gt;[4] This is adapted from section 8.4 of Granville&apos;s &lt;a href=&quot;http://www.dms.umontreal.ca/~andrew/PDF/Bulletin04.pdf&quot;&gt;It
    is Easy to Determine Whether a Given Number is Prime&lt;/a&gt;.
    &lt;a href=&quot;#r4&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn5&quot;&gt;[5] The &lt;a href=&quot;https://cdn.jsdelivr.net/gh/akalin/num.js@eab08d4/simple-arith.js&quot;&gt;&lt;code&gt;SNat&lt;/code&gt;&lt;/a&gt;
    class used is the same as in my previous
    article, &lt;a href=&quot;intro-primality-testing&quot;&gt;An Introduction to
    Primality Testing&lt;/a&gt;.
    &lt;a href=&quot;#r5&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/primality-testing-polynomial-time-part-1</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/primality-testing-polynomial-time-part-1"/>
    <title>Primality Testing in Polynomial Time (&#8544;)</title>
    <updated>2012-08-06T00:00:00-07:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;1. Introduction&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Exactly ten years
ago, &lt;a href=&quot;http://www.cse.iitk.ac.in/users/manindra/&quot;&gt;Agrawal&lt;/a&gt;,
&lt;a href=&quot;http://research.microsoft.com/en-us/people/neeraka/&quot;&gt;Kayal&lt;/a&gt;,
and &lt;a href=&quot;http://www.math.uni-bonn.de/people/saxena/&quot;&gt;Saxena&lt;/a&gt;
published &lt;a href=&quot;http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf&quot;&gt;&amp;ldquo;PRIMES
is in P&amp;rdquo;&lt;/a&gt;, which described an algorithm that could provably
determine whether a given number was prime or composite in polynomial
time.&lt;/p&gt;

&lt;p&gt;The AKS algorithm is quite short, but understanding how it works
via the proofs in the paper requires some mathematical sophistication.
Also, some results in the last decade have simplified both the
algorithm and its accompanying proofs.  In this article I will explain
in detail the main result of the AKS paper, and in a follow-up article
I will strengthen the main result, use it to get a polynomial-time
primality testing algorithm, and implement that algorithm in
Javascript.  If you&apos;ve
understood &lt;a href=&quot;/intro-primality-testing&quot;&gt;my introduction to
primality testing&lt;/a&gt;, you should be able to follow along.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Let&apos;s get started! The basis for the AKS primality test is the
  following generalization
  of &lt;a href=&quot;http://en.wikipedia.org/wiki/Fermat%27s_little_theorem&quot;&gt;Fermat&apos;s
  little theorem&lt;/a&gt; to polynomials:

  &lt;div class=&quot;theorem&quot;&gt;
    (&lt;span class=&quot;theorem-name&quot;&gt;Fermat&apos;s little theorem for polynomials,
    strong version&lt;/span&gt;.) If \(n \ge 2\) and \(a\) is relatively prime
    to \(n\), then \(n\) is prime if and only if

    \[
      (X + a)^n \equiv X^n + a \pmod{n}\text{.}
    \]
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;The form of the equation above may be unfamiliar.  The polynomials
in question
are &lt;a href=&quot;http://en.wikipedia.org/wiki/Polynomial_ring#The_polynomial_ring_K.5BX.5D&quot;&gt;&lt;em&gt;formal
polynomials&lt;/em&gt;&lt;/a&gt;.  That is, we care only about the coefficients of
the polynomial and not how it behaves as a function.  In this case, we
restrict ourselves to polynomials with integer coefficients.  Then we
can meaningfully compare two polynomials modulo \(n\): we consider two
polynomials congruent modulo \(n\) if their respective coefficients
are all congruent modulo \(n\).  (Equivalently, two polynomials
\(f(X)\) and \(g(X)\) are congruent modulo \(n\) if \(f(X) - g(X) = n
\cdot h(X)\) for some polynomial \(h(X)\).)  This definition is
consistent with how they behave as functions; if two polynomials
\(f(X)\) and \(g(X)\) are congruent modulo \(n\), then treating them
as functions, \(f(x) \equiv g(x) \pmod{n}\) for any integer
  \(x\).&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
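
&lt;p&gt;As a small sanity check (the numbers here are chosen purely for
illustration), take \(a = 1\).  For the prime \(n = 3\), every middle
coefficient is a multiple of \(3\):

\[
(X + 1)^3 = X^3 + 3X^2 + 3X + 1 \equiv X^3 + 1 \pmod{3}\text{,}
\]

whereas for the composite \(n = 4\), one middle coefficient survives:

\[
(X + 1)^4 = X^4 + 4X^3 + 6X^2 + 4X + 1 \equiv X^4 + 2X^2 + 1 \not\equiv
X^4 + 1 \pmod{4}\text{.}
\]&lt;/p&gt;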

&lt;div class=&quot;p&quot;&gt;Unfortunately, this test by itself cannot give a polynomial-time
algorithm as testing even one value of \(a\) may require looking at
\(n\) coefficients of the left-hand side.  (Remember that we&apos;re
interested in algorithms with time polynomial not in the input \(n\),
but in its bit length \(\lg n\).  Such an algorithm is described as
having time &lt;em&gt;polylog in \(n\)&lt;/em&gt;.)  However, we can reduce the
number of coefficients we have to look at by taking the powers of
\(X\) modulo some number \(r\).  This is equivalent to reducing the
polynomials themselves modulo \(X^r - 1\); you can see this for
yourself by picking some polynomial and some value for \(r\) and
doing long division by \(X^r - 1\) to find the remainder.  (It may
seem weird to talk about reducing one polynomial modulo another, but
it&apos;s entirely analogous to the integer case.)  This gives us a
weaker version of the theorem above:

&lt;div class=&quot;theorem&quot;&gt;
(&lt;span class=&quot;theorem-name&quot;&gt;Fermat&apos;s little theorem for polynomials,
weak version&lt;/span&gt;.)  If \(n\) is prime and \(a\) is not a multiple
of \(n\), then for any \(r \ge 2\)

\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}
  \]&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;The &amp;ldquo;double mod&amp;rdquo; notation above may be unfamiliar, but
in this case its meaning is simple.  We consider two polynomials
congruent modulo \(X^r - 1, n\) when they are congruent modulo \(n\)
after you reduce the powers of \(X\) modulo \(r\) and combine like
terms.  More generally, two polynomials \(f(X)\) and \(g(X)\) are
congruent modulo \(n(X), n\) if \(f(X) - g(X) \equiv n(X) \cdot h(X)
\pmod{n}\) for some polynomial \(h(X)\).&lt;/p&gt;
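
&lt;p&gt;For instance (again with numbers picked only for illustration),
take \(n = 5\), \(r = 3\), and \(a = 1\).  Modulo \(5\) we have \((X +
1)^5 \equiv X^5 + 1\), and reducing powers of \(X\) modulo \(3\) turns
\(X^5\) into \(X^2\), so both sides reduce to the same polynomial:

\[
(X + 1)^5 \equiv X^2 + 1 \equiv X^5 + 1 \pmod{X^3 - 1, 5}\text{.}
\]

For the composite \(n = 4\) with the same \(r\) and \(a\), the two
sides differ:

\[
(X + 1)^4 \equiv 2X^2 + X + 1 \not\equiv X + 1 \equiv X^4 + 1
\pmod{X^3 - 1, 4}\text{.}
\]&lt;/p&gt;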

&lt;!-- TODO(akalin): Put interactive applet for the condition here. --&gt;

&lt;p&gt;With this theorem, we only have to compare \(r\) coefficients, but
we introduce the possibility of the condition above being met even
when \(n\) is composite.  But can we impose conditions on \(r\) and
\(a\) such that if the condition holds for a polynomial number of
pairs of \(r\) and \(a\), we can be sure that \(n\) is prime?  The
answer is &amp;ldquo;yes&amp;rdquo;; in particular, we can find a single \(r\)
and an upper bound \(M\) polylog in \(n\) such that if the condition
holds for \(r\) and \(0 \le a \lt M\), then \(n\) is prime.&lt;/p&gt;

&lt;p&gt;In the remainder of this article, we&apos;ll work backwards.  That is,
we&apos;ll first assume we have some \(n \ge 2\), \(r \ge 2\), and \(M \ge
1\) such that for all \(0 \le a \lt M\)

\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}
\]

Then we&apos;ll assume that \(n\) is not a power of one of its prime
divisors \(p\) and try to deduce the conditions that imposes on \(n\),
\(r\), \(M\), and \(p\).  Then we can take the contrapositive to find
the inverse conditions on \(n\), \(r\), \(M\), and \(p\) that would
then force \(n\) to be a power of \(p\).  Since we can easily test
whether \(n\) is
a &lt;a href=&quot;http://en.wikipedia.org/wiki/Perfect_power&quot;&gt;perfect
power&lt;/a&gt;, if it&apos;s not one, we can immediately conclude that \(n =
p^1\) and is thus prime.  (Of course, if it does turn out to be a perfect
power, then it is trivially composite.)&lt;/p&gt;

&lt;p&gt;To understand the conditions that we will derive, we must first
talk about &lt;em&gt;introspective numbers&lt;/em&gt;.&lt;/p&gt;

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;2. Introspective numbers&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Given a base \(b\), a polynomial \(g(X)\) and a number \(q\), we
  call \(q\) &lt;em&gt;introspective&lt;/em&gt;&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; for \(g(X)\) modulo \(b\) if

\[
g(X)^q \equiv g(X^q) \pmod{b}\text{.}
\]&lt;/p&gt;

&lt;p&gt;We also say that \(g(X)\) is &lt;em&gt;introspective&lt;/em&gt; under \(q\)
modulo \(b\).&lt;/p&gt;

&lt;p&gt;A basic property of introspective numbers and polynomials is that
they are closed under multiplication.  That is, if \(q_1\) and \(q_2\)
are introspective for \(g(X)\) modulo \(b\), then \(q_1 \cdot q_2\) is
also introspective for \(g(X)\) modulo \(b\), and if \(g_1(X)\) and
\(g_2(X)\) are introspective under \(q\) modulo \(b\), then \(g_1(X)
\cdot g_2(X)\) is also introspective under \(q\) modulo \(b\).&lt;/p&gt;
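
&lt;p&gt;Both closure properties follow in a couple of lines.  As a sketch,
for the first one we apply introspection for \(q_1\), and then
introspection for \(q_2\) with \(X^{q_1}\) substituted for \(X\):

\[
g(X)^{q_1 q_2} = \left( g(X)^{q_1} \right)^{q_2} \equiv g(X^{q_1})^{q_2}
\equiv g(X^{q_1 q_2}) \pmod{b}\text{,}
\]

and for the second one we apply introspection to each factor:

\[
\left( g_1(X) \cdot g_2(X) \right)^q = g_1(X)^q \cdot g_2(X)^q \equiv
g_1(X^q) \cdot g_2(X^q) \pmod{b}\text{.}
\]

(When we later work modulo \(X^r - 1\) as well, the substitution step
is still valid because \(X^r - 1\) divides \(X^{q_1 r} - 1\).)&lt;/p&gt;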

&lt;p&gt;In particular, given our assumptions above, we can easily see that
\(1\), \(p\), and \(n\) are introspective for \(X + a\) modulo \(p\)
for any \(0 \le a \lt M\).  We can also show that \(n/p\) is also
introspective for \(X + a\) modulo \(p\).  Using closure under
multiplication, we can talk about the set of numbers generated by
\(p\) and \(n/p\), which are all introspective for \(X + a\) modulo
\(p\).  Call this set \(I\):&lt;/p&gt;

\[
I = \left\{ p^i \left( n/p \right)^j \mid i, j \ge 0 \right\}\text{.}
\]

&lt;p&gt;We can also take the closure of all \(X + a\) to get a set of
polynomials which are all introspective under \(p\), \(n/p\), or any
number in \(I\).  Call this set \(P\):

\[
P = \left\{ 0 \right\} \cup
\left\{ X^{e_0} \cdot (X + 1)^{e_1} \dotsm (X + M -
1)^{e_{M - 1}} \mid e_0, e_1, \dotsc, e_{M - 1} \ge 0 \right\}\text{.}
\]

To summarize, \(I\) is a set of numbers and \(P\) is a set of
polynomials such that for any \(i \in I\) and \(g(X) \in P\), \(i\) is
introspective for \(g(X)\) modulo \(p\).  Of course, it&apos;s still not
clear what these two sets have to do with whether \(n\) is prime or
not.  But we will examine certain finite sets related to \(I\) and
\(P\) and their sizes, and we will see that we can deduce their
properties depending on the relation of \(n\) to \(p\).&lt;/p&gt;

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;3. Bounds on finite sets related to \(I\) and \(P\)&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;Now we&apos;re ready to work towards finding our restrictions on \(n\),
\(r\), \(M\), and \(p\).  We&apos;ll slowly build them up such that when
the last one falls into place, we know that \(n\) is a perfect power
of \(p\). Here&apos;s what we&apos;re starting with:&lt;/p&gt;

&lt;div class=&quot;insert&quot;&gt;
  \(n \ge 2\), &lt;br/&gt;
  \(r \ge 2\), &lt;br/&gt;
  \(M \ge 1\), &lt;br/&gt;
  \(p\) is a prime divisor of \(n\).
&lt;/div&gt;

&lt;p&gt;Let us restrict \(I\) to a finite set by bounding the exponents of
\(p\) and \(n/p\):

\[
I_k = \left\{ p^i (n/p)^j \mid 0 \le i, j \lt k \right\} \subset I\text{.}
\]&lt;/p&gt;

&lt;p&gt;Notice that if \(n\) is not a power of \(p\), then all members of
  \(I_k\) are distinct, and therefore we can easily calculate its
  size:&lt;sup&gt;&lt;a href=&quot;#fn3&quot; id=&quot;r3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;

\[
|I_k| = k^2\text{.}
\]&lt;/p&gt;
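
&lt;p&gt;For example (with numbers chosen only for illustration), if \(n =
15\) and \(p = 3\), then \(n/p = 5\) and

\[
I_2 = \left\{ 3^i \cdot 5^j \mid 0 \le i, j \lt 2 \right\} = \left\{ 1, 3, 5,
15 \right\}\text{,}
\]

which has \(2^2 = 4\) elements.  On the other hand, if \(n = 9\) and
\(p = 3\), then \(n/p = 3\) and \(I_2 = \left\{ 1, 3, 9 \right\}\) has only
\(3\) elements, since \(p^1 (n/p)^0 = p^0 (n/p)^1\).&lt;/p&gt;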

&lt;p&gt;Let&apos;s also restrict \(P\) to a finite set by bounding the degrees
of its polynomials:

\[
P_d = \left\{ g \in P \mid \deg(g) \lt d \right\} \subset P\text{.}
\]&lt;/p&gt;

&lt;p&gt;We can calculate \(|P_d|\) exactly,&lt;sup&gt;&lt;a href=&quot;#fn4&quot; id=&quot;r4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; but
we only need a lower bound for when \(d \le M\).  Consider \(P_d^{\{0,
1\}}\), the subset of \(P_d\) built only from the \(d\) factors \(X +
a\) with \(0 \le a \lt d\) (which all exist since \(d \le M\)), where
each factor is present at most once.  Since each such \(X + a\) is
either present or not present, but not all \(d\) of them can be
present at the same time (the product would then have degree \(d\)),
there are \(2^d - 1\) distinct nonzero polynomials in \(P_d^{\{0,
1\}}\).  Adding back the zero polynomial yields \(|P_d^{\{0, 1\}}| =
2^d\).  Since \(P_d^{\{0, 1\}}\) is a subset of \(P_d\), \(|P_d| \ge
|P_d^{\{0, 1\}}| = 2^d\).
  Therefore, if \(d \le M\), then&lt;sup&gt;&lt;a href=&quot;#fn5&quot; id=&quot;r5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;

\[ |P_d| \ge 2^d\text{.}  \]

This will be useful later (for a particular value of \(d\)), so let&apos;s
add the restriction to \(M\):
&lt;/p&gt;

&lt;div class=&quot;insert&quot;&gt;
  \(n \ge 2\), &lt;br/&gt;
  \(r \ge 2\), &lt;br/&gt;
  &lt;em&gt;\(M \ge d\)&lt;/em&gt;, &lt;br/&gt;
  \(p\) is a prime divisor of \(n\).
&lt;/div&gt;

&lt;p&gt;Let us restrict \(I\) in a different way, by reducing modulo \(r\):

\[
J = \left\{ x \bmod r \mid x \in I \right\}
\]

and let \(t = |J|\). (This size will play an important role
later.)&lt;/p&gt;
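
&lt;p&gt;For example (numbers chosen only for illustration), take \(n =
15\), \(p = 3\) (so \(n/p = 5\)), and \(r = 8\).  Since \(3^2 \equiv
5^2 \equiv 1 \pmod{8}\) and \(3 \cdot 5 \equiv 7 \pmod{8}\), reducing
the elements of \(I\) modulo \(8\) gives

\[
J = \left\{ 1, 3, 5, 7 \right\}\text{,}
\]

so \(t = 4\) in this case.&lt;/p&gt;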

&lt;p&gt;Our final set that we&apos;re interested in needs some background to
define.  We want to find a subset of \(P\) that lies in some field
\(F\) because fields have some convenient properties that we will use
  later.&lt;sup&gt;&lt;a href=&quot;#fn6&quot; id=&quot;r6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Consider \(\mathbb{Z}/p\mathbb{Z}\), the ring
of &lt;a href=&quot;http://en.wikipedia.org/wiki/Integers_modulo_n#Integers_modulo_n&quot;&gt;integers
modulo \(p\)&lt;/a&gt;.  Since \(p\) is prime, it is also a field.  In
particular, it is
the &lt;a href=&quot;http://en.wikipedia.org/wiki/Finite_field&quot;&gt;finite
field&lt;/a&gt; \(\mathbb{F}_p\) of order \(p\).  Then consider
\(\mathbb{F}_p[X]\),
its &lt;a href=&quot;http://en.wikipedia.org/wiki/Polynomial_ring&quot;&gt;polynomial
ring&lt;/a&gt;, which is the set of polynomials with coefficients in
\(\mathbb{F}_p\).  Given some polynomial \(q(X) \in \mathbb{F}_p[X]\),
we can further reduce modulo \(q(X)\) to get \(\mathbb{F}_p[X] /
q(X)\).  Finally, if \(q(X)\) is
&lt;a href=&quot;http://en.wikipedia.org/wiki/Irreducible_polynomial&quot;&gt;irreducible&lt;/a&gt;
over \(\mathbb{F}_p\), then \(\mathbb{F}_p[X] / q(X)\) is also a
field.&lt;/p&gt;

&lt;p&gt;(We can show that both \(\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}\)
and \(\mathbb{F}_p[X] / q(X)\) are fields from the same general
theorem of rings: if \(R\) is
a &lt;a href=&quot;http://en.wikipedia.org/wiki/Principal_ideal_domain&quot;&gt;principal
ideal domain&lt;/a&gt; and \((c)\) is
the &lt;a href=&quot;http://en.wikipedia.org/wiki/Two-sided_ideal#Ideal_generated_by_a_set&quot;&gt;two-sided
ideal generated by \(c\)&lt;/a&gt;, then
the &lt;a href=&quot;http://en.wikipedia.org/wiki/Quotient_ring&quot;&gt;quotient
ring&lt;/a&gt; \(R / (c)\) is a field if and only if \(c\) is
a &lt;a href=&quot;http://en.wikipedia.org/wiki/Prime_element&quot;&gt;prime
  element&lt;/a&gt; of \(R\).)&lt;sup&gt;&lt;a href=&quot;#fn7&quot; id=&quot;r7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;So we just need to find a polynomial that&apos;s irreducible over
\(\mathbb{F}_p\).  We know that \(X^r - 1\) has \(Φ_r(X)\), the
\(r\)th &lt;a href=&quot;http://en.wikipedia.org/wiki/Cyclotomic_polynomial&quot;&gt;cyclotomic
polynomial&lt;/a&gt;, as a factor.  \(Φ_r(X)\) is irreducible over
\(\mathbb{Z}\), but not necessarily over \(\mathbb{F}_p\).  But if
\(r\) is relatively prime to \(p\), then \(Φ_r(X)\) factors into
irreducible polynomials all of degree \(o_r(p)\)
(the &lt;a href=&quot;http://en.wikipedia.org/wiki/Multiplicative_order&quot;&gt;multiplicative
  order&lt;/a&gt; of \(p\) modulo \(r\)) over \(\mathbb{F}_p\).&lt;sup&gt;&lt;a href=&quot;#fn8&quot; id=&quot;r8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;  Then we can
just require that \(r\) be relatively prime to \(p\).  If we do so,
then we can let \(h(X)\) be one of the factors of \(Φ_r(X)\) over
\(\mathbb{F}_p\) and we have our field \(F = \mathbb{F}_p[X] /
h(X)\).&lt;/p&gt;

&lt;div class=&quot;insert&quot;&gt;
  \(n \ge 2\), &lt;br/&gt;
  \(r \ge 2\), &lt;em&gt;\(r\) relatively prime to \(p\)&lt;/em&gt;,&lt;br/&gt;
  \(M \ge d\), &lt;br/&gt;
  \(p\) is a prime divisor of \(n\).
&lt;/div&gt;
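
&lt;p&gt;As a standard small example, take \(r = 7\) and \(p = 2\).  The
multiplicative order of \(2\) modulo \(7\) is \(3\) (since \(2^3 = 8
\equiv 1\)), and indeed \(Φ_7(X)\) splits into two irreducible
factors of degree \(3\) over \(\mathbb{F}_2\):

\[
Φ_7(X) = X^6 + X^5 + X^4 + X^3 + X^2 + X + 1 \equiv (X^3 + X + 1)(X^3 +
X^2 + 1) \pmod{2}\text{.}
\]

Either factor could serve as \(h(X)\), giving a field \(F\) with \(2^3 =
8\) elements.&lt;/p&gt;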

&lt;p&gt;Finally, we can define our last set.  Let

\[
Q = \left\{ f(X) \bmod (h(X), p) \mid f(X) \in P \right\} \subseteq F\text{.}
\]&lt;/p&gt;

&lt;p&gt;We can map elements of \(P\) into \(Q\) via reduction modulo
\((h(X), p)\).  But we&apos;re interested in only the elements of \(P\)
that map to distinct elements of \(Q\), since that will let us find a
lower bound for \(|Q|\).  A simple example would be the set of \(X +
a\) for \(0 \le a \lt M\); if the degree of \(h(X)\) is greater than
\(1\) and \(p \ge M\), then each \(X + a\) is distinct in \(Q\).&lt;/p&gt;

&lt;p&gt;Another interesting set is \(X^k\) for \(1 \le k \le r\).  Since
\(h(X) \equiv 0 \pmod{h(X), p}\), we can say that \(X\) is a root of
the polynomial function \(h(y)\) over the field \(F\).  But since
\(h(y)\) is a factor of \(Φ_r(y)\), \(X\) is then a primitive
\(r\)th root of unity in \(Q\).&lt;sup&gt;&lt;a href=&quot;#fn9&quot; id=&quot;r9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;  But the powers of a primitive \(r\)th
root of unity (from \(1\) to \(r\)) are all distinct.  Therefore all
\(X^k\) for \(1 \le k \le r\) are distinct in \(Q\).&lt;/p&gt;

&lt;p&gt;Most importantly, we can show that distinct elements in \(P_d\) map
to distinct elements in \(Q\) if \(d \le t\).  Let \(f(X)\) and
\(g(X)\) be two different elements of \(P_d\).  Assume that \(f(X)
\equiv g(X) \pmod{h(X), p}\).  Then, for \(m \in I\):

\[
f(X^m) \equiv f(X)^m \pmod{X^r - 1, p}
\]

and

\[
g(X^m) \equiv g(X)^m \pmod{X^r - 1, p}
\]

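by introspection modulo \(p\).  Since \(h(X)\) divides \(X^r - 1\)
over \(\mathbb{F}_p\), both congruences also hold modulo \((h(X),
p)\).  Raising our assumption to the \(m\)th power gives

\[
f(X)^m \equiv g(X)^m \pmod{h(X), p}
\]

and chaining the three congruences leads to

\[
f(X^m) \equiv g(X^m) \pmod{h(X), p}\text{.}
\]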

Therefore, all \(X^m\) for \(m \in I\) are roots of the polynomial
function \(u(y) = f(y) - g(y)\) over the field \(F\), and in
particular all \(X^m\) for \(m \in J\).  But all such \(X^m\)
are distinct in \(Q\) by the argument above.  Therefore, \(u(y)\) must
have degree at least \(t\) since a polynomial over a field cannot have
more roots than its degree.  But the degree of \(u(y)\) is less than
\(d\) since both \(f(y)\) and \(g(y)\) have degree less than \(d\).
Since \(d \le t\), this is a contradiction, so therefore \(f(X)
\not\equiv g(X) \pmod{h(X), p}\).  But since \(f(X)\) and \(g(X)\)
were arbitrary, that implies that distinct elements of \(P_d\) map to
distinct elements of \(Q\) for \(d \le t\).&lt;/p&gt;

&lt;p&gt;Given the above, we can conclude that as long as we require that
\(d \le t\), \(p \ge M\), and \(o_r(p) = \deg(h(X)) \gt 1\), then

\[
|Q| \ge |P_d| \ge 2^d\text{.}
\]&lt;/p&gt;

&lt;div class=&quot;insert&quot;&gt;
  \(n \ge 2\), &lt;br/&gt;
  &lt;em&gt;\(o_r(p) \gt 1\)&lt;/em&gt;,&lt;br/&gt;
  \(M \ge d\), &lt;br/&gt;
  &lt;em&gt;\(t \ge d\)&lt;/em&gt;,&lt;br/&gt;
  &lt;em&gt;\(p \ge M\)&lt;/em&gt;, \(p\) is a prime divisor of \(n\).
&lt;/div&gt;

&lt;/section&gt;

&lt;section&gt;
&lt;header&gt;
&lt;h2&gt;4. The AKS theorem (weak version)&lt;/h2&gt;
&lt;/header&gt;

&lt;p&gt;We&apos;re finally ready to put it all together.  Again assume \(n\) is
not a power of \(p\), and recall that \(|J| = t\).  Let \(s \gt
\sqrt{t}\).  Then \(|I_s| = s^2 \gt t\).  By
the &lt;a href=&quot;http://en.wikipedia.org/wiki/Pigeonhole_principle&quot;&gt;pigeonhole
principle&lt;/a&gt;, there must be two elements \(m_1, m_2 \in I_s\) that
map to the same element in \(J\); that is, there must be \(m_1, m_2
\in I_s\) such that \(m_1 \equiv m_2 \pmod{r}\).  Now pick some
\(g(X)\) from \(P\).  Then

\[
g(X)^{m_1} \equiv g(X^{m_1}) \pmod{X^r - 1, p}
\]

and

\[
g(X)^{m_2} \equiv g(X^{m_2}) \pmod{X^r - 1, p}
\]

by introspection modulo \(p\).  But \(X^{m_1} \equiv X^{m_2} \pmod{X^r - 1}\) since \(m_1 \equiv m_2 \pmod{r}\), so

\[
g(X^{m_1}) \equiv g(X^{m_2}) \pmod{X^r - 1, p}\text{.}
\]

Chaining all these congruences together lets us deduce that

\[
g(X)^{m_1} \equiv g(X)^{m_2} \pmod{X^r - 1, p}\text{,}
\]

which immediately leads to

\[
g(X)^{m_1} \equiv g(X)^{m_2} \pmod{h(X), p}\text{.}
\]
&lt;/p&gt;

&lt;p&gt;That means that \(g(X) \bmod (h(X), p) \in Q\) is a root of the
polynomial function \(u(y) = y^{m_1} - y^{m_2}\) over the field \(F\).
But \(g(X)\) was picked arbitrarily from \(P\), so \(u(y)\) has at
least \(|Q|\) roots.  \(\deg(u(y)) = \max(m_1, m_2) \le p^{s-1} \cdot
(n/p)^{s-1} = n^{s-1}\), and \(u(y)\), being a polynomial over a
field, cannot have more roots than its degree, so if \(n\) is not a
power of \(p\), then \(|Q| \le n^{s-1}\).  Equivalently, if \(|Q| \gt
  n^{s-1}\), then \(n\) must be a power of \(p\).&lt;sup&gt;&lt;a href=&quot;#fn10&quot; id=&quot;r10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;  But
we&apos;ve shown above that \(|Q| \ge 2^d\) for \(d \le t\), so if we can
pick \(d\) and \(s\) such that \(2^d \gt n^{s-1}\), then we can force
\(n\) to be a power of \(p\).  Taking logs, we see that this is
equivalent to picking \(d\) and \(s\) such that \(d \gt (s - 1) \lg
n\).  Since \(d \le t\), this imposes \(t \gt (s - 1) \lg n\) in order
for there to be room to pick \(d\).  Rearranging, we get \(s \lt
\frac{t}{\lg n} + 1\).  But \(s \gt \sqrt{t}\), so this imposes
\(\sqrt{t} \lt \frac{t}{\lg n} + 1\) in order for there to be room to
pick \(s\).  Rearranging again, we get \(\frac{t}{\sqrt{t} - 1} \gt
\lg n\).  Since \(\frac{t}{\sqrt{t} - 1} \gt \sqrt{t}\), it suffices
to require that \(t \gt \lg^2 n\) in order for there to be room to
pick \(d\) and \(s\).  Furthermore, since \(s\) has to be an integer,
then \(s \ge \lfloor \sqrt{t} \rfloor + 1\), and therefore \(d \gt
\lfloor \sqrt{t} \rfloor \lg n\).  Let&apos;s update our assumptions:&lt;/p&gt;

&lt;div class=&quot;insert&quot;&gt;
  \(n \ge 2\), &lt;br/&gt;
  \(o_r(p) \gt 1\),&lt;br/&gt;
  &lt;em&gt;\(M \ge d \gt \lfloor \sqrt{t} \rfloor \lg n\)&lt;/em&gt;,&lt;br/&gt;
  &lt;em&gt;\(t \gt \lg^2 n\)&lt;/em&gt;,&lt;br/&gt;
  \(p \ge M\), \(p\) is a prime divisor of \(n\).
&lt;/div&gt;
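
&lt;p&gt;To see that these inequalities leave room in practice, here is the
arithmetic for one illustrative choice of numbers (not taken from
the derivation above): suppose \(n = 31\), so \(\lg n \approx 4.95\)
and \(\lg^2 n \approx 24.5\), and suppose \(t = 28\).  Then \(t \gt
\lg^2 n\), we can take \(s = \lfloor \sqrt{28} \rfloor + 1 = 6\), and
we need \(d \gt 5 \lg n \approx 24.8\), so \(d = 25 \le t\) works.
Indeed,

\[
2^d = 2^{25} = 33554432 \gt 28629151 = 31^5 = n^{s - 1}\text{.}
\]&lt;/p&gt;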

&lt;p&gt;So to summarize, if we make the above assumptions, we can pick
\(d\) and \(s\) such that \(|Q| \ge 2^d \gt n^{s - 1}\), which implies
that \(n\) must be a power of \(p\), which was our goal.  Now we just
have to express all assumptions in terms of \(n\), \(r\), and \(M\),
strengthening them if necessary.  \(J\) is generated by \(p\) and
\(n/p\), so it contains \(n = p \cdot (n/p)\) and all of its powers
modulo \(r\); therefore its order (i.e., \(t\)) is at least
\(o_r(n)\) (this brings along the assumption that \(r\) and \(n\) are
relatively prime).  Furthermore, if \(o_r(n) \gt 1\), then \(n\) must
have some prime factor \(p\) with \(o_r(p) \gt 1\), since otherwise
every prime factor of \(n\), and hence \(n\) itself, would be
congruent to \(1\) modulo \(r\); we take \(p\) to be such a factor.
Therefore, we can replace the assumptions \(t \gt \lg^2 n\)
and \(o_r(p) \gt 1\) with \(o_r(n) \gt \lg^2 n\).  We can remove the
reference to \(d\) by finding the maximum value of \(t\).  Since \(r\)
is relatively prime to \(n\), \(J\) is a subgroup of the
multiplicative group \((\mathbb{Z}/r\mathbb{Z})^*\), and
therefore its order divides (and therefore is at most) \(φ(r)\).
So we can replace \(M \ge d \gt \lfloor \sqrt{t} \rfloor \lg n\) with
\(M \gt \lfloor \sqrt{φ(r)} \rfloor \lg n\).  Finally, we can
remove the reference to \(p\) by mandating that \(n\) has no prime
factor less than \(M\).  Here are our final assumptions:&lt;/p&gt;

&lt;div class=&quot;insert&quot;&gt;
  \(n \ge 2\), &lt;em&gt;\(n\) has no prime factors less than \(M\)&lt;/em&gt;,&lt;br/&gt;
  &lt;em&gt;\(o_r(n) \gt \lg^2 n\)&lt;/em&gt;,&lt;br/&gt;
  &lt;em&gt;\(M \gt \lfloor \sqrt{φ(r)} \rfloor \lg n\)&lt;/em&gt;.&lt;br/&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;We can summarize the above discussion in the following theorem:

&lt;div class=&quot;theorem&quot;&gt;
  (&lt;span class=&quot;theorem-name&quot;&gt;AKS theorem, weak version&lt;/span&gt;.)  Let
  \(n \ge 2\), \(r\) be relatively prime to \(n\) with \(o_r(n) \gt
  \lg^2 n\), and \(M \gt \lfloor \sqrt{φ(r)} \rfloor \lg n\).
  Furthermore, let \(n\) have no prime factor less than \(M\) and let

  \[
  (X + a)^n \equiv X^n + a \pmod{X^r - 1, n}
  \]

  for \(0 \le a \lt M\).  Then \(n\) is a power of some prime \(p \ge
  M\).&lt;/div&gt;
&lt;/div&gt;
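
&lt;p&gt;To get a feel for the sizes involved, here is a rough sketch that
searches for the smallest \(r\) satisfying the hypotheses of the
theorem and the smallest matching \(M\).  It is only meant for small
\(n\): it assumes plain Javascript numbers, a brute-force order and
totient, and function names
(&lt;code&gt;multiplicativeOrder&lt;/code&gt;, &lt;code&gt;totient&lt;/code&gt;,
&lt;code&gt;findWeakAKSParameters&lt;/code&gt;) that are hypothetical and chosen
just for this example.&lt;/p&gt;

```javascript
// Greatest common divisor of two positive integers (Euclid).
function gcd(a, b) {
  while (b !== 0) [a, b] = [b, a % b];
  return a;
}

// Multiplicative order of n modulo r, or 0 if n and r are not
// relatively prime (in which case the order is undefined).
function multiplicativeOrder(n, r) {
  if (gcd(n, r) !== 1) return 0;
  let order = 1;
  for (let power = n % r; power !== 1; power = (power * n) % r) {
    order++;
  }
  return order;
}

// Euler's totient by direct counting; fine for small r.
function totient(r) {
  let count = 0;
  for (let k = 1; k !== r + 1; k++) {
    if (gcd(k, r) === 1) count++;
  }
  return count;
}

// Finds the smallest r relatively prime to n with o_r(n) > lg^2 n,
// and the smallest integer M > floor(sqrt(phi(r))) * lg n.
function findWeakAKSParameters(n) {
  const lgN = Math.log2(n);
  for (let r = 2; ; r++) {
    if (multiplicativeOrder(n, r) > lgN * lgN) {
      const M = Math.floor(Math.floor(Math.sqrt(totient(r))) * lgN) + 1;
      return { r, M };
    }
  }
}
```

&lt;p&gt;For \(n = 31\), this yields \(r = 29\) and \(M = 25\): we have
\(\lg^2 31 \approx 24.5\), and \(29\) is the first modulus for which
the order of \(31\) (namely \(28\)) exceeds that.&lt;/p&gt;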

&lt;p&gt;And that&apos;s it for now!  In the follow-up article we will strengthen
  this theorem to further show that \(n\) is equal to \(p\), and
  therefore prime.  Then we will use this result to get a
  primality-testing algorithm that we will prove to be polynomial
  time.&lt;/p&gt;

&lt;/section&gt;

&lt;hr /&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;


&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] We use uppercase letters for variables when we treat
    polynomials as formal polynomials and lowercase letters when we
    treat them as functions. &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn2&quot;&gt;[2] The term &amp;ldquo;introspection&amp;rdquo;, which comes
    from the original AKS paper, was probably chosen to evoke the idea
    that the exponent \(q\) can be pushed into and pulled out of \(g(X)\).
    Here we generalize it a bit. &lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn3&quot;&gt;[3] This condition is too weak to be useful by itself,
    but we will parlay it into something we can use later.
    &lt;a href=&quot;#r3&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn4&quot;&gt;[4] Using the ideas
    on &lt;a href=&quot;http://www.johndcook.com/TwelvefoldWay.pdf&quot;&gt;this page&lt;/a&gt;,
    we can show that \(|P_d| = {M + d - 1 \choose d - 1} + 1\) by
    considering each \(X + a\) a labeled urn (plus a
    &amp;ldquo;dummy&amp;rdquo; urn) and each unit of power an unlabeled
    ball. (This was used in the AKS paper.)
    &lt;a href=&quot;#r4&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn5&quot;&gt;[5] This lower bound, as well as other ideas that simplify the
    proof, was taken
    from &lt;a href=&quot;http://www.amazon.com/Prime-Numbers-A-Computational-Perspective/dp/0387252827&quot;&gt;Prime
    Numbers: A Computational Perspective&lt;/a&gt;.
    &lt;a href=&quot;#r5&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn6&quot;&gt;[6] You may first want to brush up on the definitions
    of &lt;a href=&quot;http://en.wikipedia.org/wiki/Group_(mathematics)&quot;&gt;group&lt;/a&gt;,
    &lt;a href=&quot;http://en.wikipedia.org/wiki/Ring_(mathematics)&quot;&gt;ring&lt;/a&gt;,
    and &lt;a href=&quot;http://en.wikipedia.org/wiki/Field_(mathematics)&quot;&gt;field&lt;/a&gt;,
    and the differences between them.
    &lt;a href=&quot;#r6&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn7&quot;&gt;[7] This is Theorem 1.47(iv) from
    &amp;ldquo;&lt;a href=&quot;http://www.amazon.com/Introduction-Finite-Fields-their-Applications/dp/0521460948&quot;&gt;Introduction
    to finite fields and their applications&lt;/a&gt;&amp;rdquo;.
    &lt;a href=&quot;#r7&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn8&quot;&gt;[8] The reducibility of \(Φ_r(X)\) over
    \(\mathbb{F}_p\) given \(r\) relatively prime to \(p\) is Theorem
    2.47(ii) from
    &amp;ldquo;&lt;a href=&quot;http://www.amazon.com/Introduction-Finite-Fields-their-Applications/dp/0521460948&quot;&gt;Introduction
    to finite fields and their applications&lt;/a&gt;&amp;rdquo;.
    &lt;a href=&quot;#r8&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn9&quot;&gt;[9] It&apos;s a bit weird to talk about a polynomial being
    the root of other polynomials, but recall that we can form a
    polynomial ring over any field, even a field of polynomials.  We
    keep track of which polynomials belong to which domains by using
    different variables.
    &lt;a href=&quot;#r9&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn10&quot;&gt;[10] Here&apos;s where we force \(n\) to be a prime power.
    &lt;a href=&quot;#r10&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/intro-primality-testing</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/intro-primality-testing"/>
    <title>An Introduction to Primality Testing</title>
    <updated>2012-07-08T00:00:00-07:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;script type=&quot;text/javascript&quot;
        src=&quot;https://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.0/knockout-min.js&quot;&gt;&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/num.js@eab08d4/simple-arith.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;https://cdn.jsdelivr.net/gh/akalin/num.js@eab08d4/primality-testing.js&quot;&gt;&lt;/script&gt;

&lt;p&gt;I will explain two commonly-used primality tests: Fermat and
Miller-Rabin.  Along the way, I will cover the basic concepts of
primality testing.  I won&apos;t be assuming any background in number
theory, but familiarity
with &lt;a href=&quot;http://en.wikipedia.org/wiki/Modular_arithmetic&quot;&gt;modular
arithmetic&lt;/a&gt; will be helpful.  I will also be providing
implementations in JavaScript,
so &lt;a href=&quot;https://developer.mozilla.org/en/JavaScript&quot;&gt;familiarity
with it&lt;/a&gt; will also be helpful.  Finally, since JavaScript doesn&apos;t
natively support arbitrary-precision arithmetic, I wrote a simple
natural number class
(&lt;a href=&quot;https://cdn.jsdelivr.net/gh/akalin/num.js@eab08d4/simple-arith.js&quot;&gt;&lt;code&gt;SNat&lt;/code&gt;&lt;/a&gt;) that
represents a number as an array of decimal digits.  All algorithms
used are the simplest possible, except when a more efficient one is
needed by the algorithms we discuss.&lt;/p&gt;

&lt;p&gt;Primality testing is the problem of determining whether a given
natural number is prime or composite.  Compared to the problem of
&lt;a href=&quot;http://en.wikipedia.org/wiki/Integer_factorization&quot;&gt;integer
factorization&lt;/a&gt;, which is to determine the prime factors of a given
natural number, primality testing turns out to be easier; integer
factorization is
in &lt;a href=&quot;http://en.wikipedia.org/wiki/NP_(complexity)&quot;&gt;NP&lt;/a&gt; and
thought to be neither
in &lt;a href=&quot;http://en.wikipedia.org/wiki/P_(complexity)&quot;&gt;P&lt;/a&gt;
nor &lt;a href=&quot;http://en.wikipedia.org/wiki/NP-complete&quot;&gt;NP-complete&lt;/a&gt;,
whereas primality testing
is &lt;a href=&quot;http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf&quot;&gt;now
known to be in P&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Most primality tests are actually compositeness tests; they involve
finding &lt;em&gt;composite witnesses&lt;/em&gt;, which are numbers that, along
with a given number to be tested, can be fed to some easily-computable
function to prove that the given number is composite.  (The composite
witness, along with the function, is a &lt;em&gt;certificate of
compositeness&lt;/em&gt; of the given number.)  A primality test can either
check each possible witness or, like the Fermat and Miller-Rabin
tests, it can randomly sample some number of possible witnesses and
call the number prime if none turn out to be witnesses.  In the latter
case, there is a chance that a composite number can erroneously be
called prime; ideally, this chance goes to zero quickly as the sample
size increases.&lt;/p&gt;

&lt;p&gt;The simplest possible witness type is, of course, a factor of the
given number, which we&apos;ll call a &lt;em&gt;factor witness&lt;/em&gt;.  If the
number to be tested is \(n\) and the possible factor witness is \(a\),
then one can simply test whether \(a\) divides \(n\) (written as \(a
\mid n\)) by checking whether \(n \bmod a = 0\); that is, whether the
remainder of \(n\) divided by \(a\) is zero.  This doesn&apos;t yield a
feasible deterministic primality test, though, since checking all
possible witnesses is equivalent to factoring the given number.  Nor
does it yield a feasible probabilistic primality test, since in the
worst case the given number has very few factors, which random
sampling would miss.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;The simplest useful witness type is a &lt;em&gt;Fermat witness&lt;/em&gt;,
which relies on the following theorem of Fermat:

&lt;div class=&quot;theorem&quot;&gt;
(&lt;span class=&quot;theorem-name&quot;&gt;Fermat&apos;s little theorem&lt;/span&gt;.)  If \(n\)
is prime and \(a\) is not a multiple of \(n\), then

\[
a^{n-1} \equiv 1 \pmod{n}\text{.}
\]
&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Thus, a Fermat witness is a number \(1 \lt a \lt n\) such that
\(a^{n-1} \not\equiv 1 \pmod{n}\).  Conversely, if \(n\) is composite
and \(a^{n-1} \equiv 1 \pmod{n}\), then \(a\) is a &lt;em&gt;Fermat
liar&lt;/em&gt;.&lt;/p&gt;
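
&lt;p&gt;As a small worked sketch (the numbers are chosen only for
illustration), take \(n = 15\).  Since \(2^4 = 16 \equiv 1 \pmod{15}\),

\[
2^{14} = (2^4)^3 \cdot 2^2 \equiv 4 \not\equiv 1 \pmod{15}\text{,}
\]

so \(2\) is a Fermat witness of \(15\).  On the other hand, \(4^2 = 16
\equiv 1 \pmod{15}\) gives

\[
4^{14} = (4^2)^7 \equiv 1 \pmod{15}\text{,}
\]

so \(4\) is a Fermat liar of \(15\), even though \(15\) is
composite.&lt;/p&gt;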

&lt;p class=&quot;interactive-example&quot; id=&quot;fermatExample&quot;&gt;
  Let
  &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt; =
  &lt;input class=&quot;parameter&quot; size=&quot;6&quot; pattern=&quot;[0-9]*&quot; required
         type=&quot;text&quot; value=&quot;355207&quot;
         data-bind=&quot;value: nStr, valueUpdate: &apos;afterkeydown&apos;&quot; /&gt;&lt;/span&gt;
  and
  &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt; =
  &lt;input class=&quot;parameter&quot; size=&quot;6&quot; pattern=&quot;[0-9]*&quot; required
         type=&quot;text&quot; value=&quot;2&quot;
         data-bind=&quot;value: aStr, valueUpdate: &apos;afterkeydown&apos;&quot; /&gt;&lt;/span&gt;.
  &lt;!-- ko template: outputTemplate --&gt;&lt;!-- /ko --&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;fermat.error.invalidN&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; is not a valid number.
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;fermat.error.invalidA&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt;&lt;/span&gt; is not a valid number.
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;fermat.error.outOfBoundsN&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; must be greater than
    &lt;span class=&quot;fake-katex&quot;&gt;2&lt;/span&gt;.
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;fermat.error.outOfBoundsA&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt;&lt;/span&gt; must be greater than
    &lt;span class=&quot;fake-katex&quot;&gt;1&lt;/span&gt; and less than
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt;.
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;fermat.success&quot;&gt;
    Then
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt;&lt;sup&gt;&lt;var&gt;n&lt;/var&gt;&amp;minus;1&lt;/sup&gt;
      &amp;equiv;
    &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: r&quot;&gt;&lt;/span&gt;
    &lt;span data-bind=&quot;if: r() &amp;&amp; r().ne(1)&quot;&gt;&amp;equiv;&amp;#824; 1&lt;/span&gt;
    (mod &lt;var&gt;n&lt;/var&gt;)&lt;/span&gt; so therefore
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; is
    &lt;span data-bind=&quot;if: isCompositeByFermat()&quot;&gt;
      &lt;span class=&quot;result&quot;&gt;composite&lt;/span&gt;.
      &lt;span data-bind=&quot;if: r() &amp;&amp; r().isZero()&quot;&gt;
        Furthermore,
        &lt;span class=&quot;fake-katex&quot;&gt;gcd(&lt;var&gt;a&lt;/var&gt;, &lt;var&gt;n&lt;/var&gt;) =
        &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: k&quot;&gt;&lt;/span&gt;&lt;/span&gt;
        is a non-trivial factor of
        &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt;.
      &lt;/span&gt;
    &lt;/span&gt;
    &lt;span data-bind=&quot;ifnot: isCompositeByFermat()&quot;&gt;
      either &lt;span class=&quot;result&quot;&gt;prime&lt;/span&gt; or a
      &lt;span class=&quot;result&quot;&gt;Fermat pseudoprime base
        &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt;&lt;/span&gt;&lt;/span&gt;.
    &lt;/span&gt;
  &lt;/script&gt;
&lt;/p&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;/intro-primality-testing-files/fermat-example.js&quot;&gt;&lt;/script&gt;

&lt;p&gt;If \(n\) has at least one Fermat witness that is relatively prime
to \(n\), then we can show that at least half of all possible witnesses
are Fermat witnesses.  (Roughly, if \(a\) is such a Fermat witness and
\(a_1, a_2, \dotsc, a_s\) are the Fermat liars, then the products
\(a \cdot a_i \bmod n\) are \(s\) distinct
Fermat witnesses.)  Therefore, for a sample of \(k\) possible
witnesses of \(n\), the probability of all of them being Fermat liars
is \(\le 2^{-k}\), which goes to zero quickly enough to be
practical.&lt;/p&gt;
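
&lt;p&gt;For a concrete sketch of this argument (the numbers are chosen
only for illustration), take \(n = 15\) and the Fermat witness \(a =
2\), which is relatively prime to \(15\).  The Fermat liars of \(15\)
are \(4\), \(11\), and \(14\), and multiplying each by \(2\) gives

\[
2 \cdot 4 \equiv 8, \quad 2 \cdot 11 \equiv 7, \quad 2 \cdot 14 \equiv 13 \pmod{15}\text{,}
\]

which are three distinct Fermat witnesses: each of \(7\), \(8\), and
\(13\) satisfies \(x^{14} \equiv 4 \not\equiv 1 \pmod{15}\).  So every
liar is matched by its own witness.&lt;/p&gt;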

&lt;p&gt;However, there is the possibility that \(n\) is a composite number
with no Fermat witnesses relatively prime to it.  These are
called &lt;a href=&quot;http://en.wikipedia.org/wiki/Carmichael_numbers&quot;&gt;&lt;em&gt;Carmichael
numbers&lt;/em&gt;&lt;/a&gt;.  Even though Carmichael numbers are rare, their
existence still makes the Fermat primality test unsuitable for some
situations, as when the numbers to be tested are provided by some
adversary.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Here is the Fermat compositeness test implemented in
JavaScript:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Runs the Fermat compositeness test given n &gt; 2 and 1 &amp;lt; a &amp;lt; n.
// Calculates r = a^{n-1} mod n and whether a is a Fermat witness to n
// (i.e., r != 1, which means n is composite).  Returns a dictionary
// with a, n, r, and isCompositeByFermat, which is true iff a is a
// Fermat witness to n.
function testCompositenessByFermat(n, a) {
  n = SNat.cast(n);
  a = SNat.cast(a);

  if (n.le(2)) {
    throw new RangeError(&apos;n must be &gt; 2&apos;);
  }

  if (a.le(1) || a.ge(n)) {
    throw new RangeError(&apos;a must satisfy 1 &amp;lt; a &amp;lt; n&apos;);
  }

  var r = a.powMod(n.minus(1), n);
  var isCompositeByFermat = r.ne(1);
  return {
    a: a,
    n: n,
    r: r,
    isCompositeByFermat: isCompositeByFermat
  };
}&lt;/code&gt;&lt;/pre&gt;

Note that the algorithm depends on the efficiency
of &lt;a href=&quot;http://en.wikipedia.org/wiki/Modular_exponentiation&quot;&gt;&lt;em&gt;modular
exponentiation&lt;/em&gt;&lt;/a&gt; when calculating \(a^{n-1} \pmod{n}\).  The
naive method is unsuitable since it requires \(Θ(n)\) \(b\)-bit
multiplications, where \(b = \lceil \lg n \rceil\).  &lt;code&gt;SNat&lt;/code&gt;
uses &lt;a href=&quot;http://en.wikipedia.org/wiki/Repeated_squaring&quot;&gt;repeated
squaring&lt;/a&gt;, which requires only \(Θ(\lg n)\) \(b\)-bit
multiplications.&lt;/div&gt;
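
&lt;p&gt;To make repeated squaring concrete, here is a small worked sketch
(the choice of numbers is only illustrative) computing \(2^{35} \bmod
561\).  Successive squarings give

\[
2^2 \equiv 4, \quad 2^4 \equiv 16, \quad 2^8 \equiv 256, \quad
2^{16} \equiv 460, \quad 2^{32} \equiv 103 \pmod{561}\text{,}
\]

and since \(35 = 32 + 2 + 1\),

\[
2^{35} = 2^{32} \cdot 2^2 \cdot 2^1 \equiv 103 \cdot 4 \cdot 2 = 824 \equiv 263 \pmod{561}\text{.}
\]

That takes five squarings and two further multiplications, instead of
the \(34\) multiplications by \(2\) that the naive method would
need.&lt;/p&gt;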

&lt;p&gt;Another useful witness type is a &lt;em&gt;non-trivial square root of
unity \(\operatorname{mod} n\)&lt;/em&gt;; that is, a number \(a \not\equiv \pm
1 \pmod{n}\) such that \(a^2 \equiv 1 \pmod{n}\).  It is a theorem of
number theory that if \(n\) is prime, there are no non-trivial square
roots of unity \(\operatorname{mod} n\).  Therefore, if we do find one,
that means \(n\) is composite.  In fact, finding one leads directly to
factors of \(n\).  By definition, a non-trivial square root of unity
\(a\) satisfies \(a \pm 1 \not\equiv 0 \pmod{n}\) and \(a^2 - 1 \equiv 0
\pmod{n}\).  Factoring the latter leads to \((a+1)(a-1) \equiv 0
\pmod{n}\), which means that \(n\) divides \((a+1)(a-1)\).  But the
first condition says that \(n\) divides neither \(a+1\) nor \(a-1\),
so \(n\) must be a product of two numbers \(p \mid a+1\) and \(q \mid
a-1\), both greater than \(1\).  Then \(\gcd(a+1, n)\)&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
and \(\gcd(a-1, n)\) are non-trivial factors of \(n\).&lt;/p&gt;
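
&lt;p&gt;As a quick worked sketch (the numbers are chosen only for
illustration), take \(n = 21\) and \(a = 8\).  Then

\[
8^2 = 64 = 3 \cdot 21 + 1 \equiv 1 \pmod{21}
\quad \text{ and } \quad
8 \not\equiv \pm 1 \pmod{21}\text{,}
\]

so \(8\) is a non-trivial square root of unity \(\operatorname{mod}
21\).  Indeed, \((a+1)(a-1) = 9 \cdot 7 = 63\) is a multiple of
\(21\), and

\[
\gcd(9, 21) = 3
\quad \text{ and } \quad
\gcd(7, 21) = 7
\]

are the two non-trivial factors of \(21\).&lt;/p&gt;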

&lt;p&gt;Finding non-trivial square roots of unity by itself doesn&apos;t give a
useful primality testing algorithm, but combining it with the Fermat
primality test does.  \(a^{n-1} \bmod n\) either equals \(1\) or not.
If it doesn&apos;t, you&apos;re done since you have a Fermat witness.  If it
does equal \(1\), and \(n-1\) is even, then consider the square root
of \(a^{n-1}\), i.e. \(a^{(n-1)/2}\).  If it is not \(\pm 1\), then it
is a non-trivial square root of unity.  If it is \(-1\), then you
can&apos;t do anything else.  But if it is \(1\), and \((n-1)/2\) is even,
you can then take another square root and repeat the test, stopping
when the exponent of \(a\) becomes odd or when you get a result not
equal to \(1\).&lt;/p&gt;

&lt;p&gt;To turn this into an algorithm, you simply start from the bottom
up: find the greatest odd factor of \(n-1\), call it \(t\), and keep
squaring \(a^t\) mod \(n\) until you find a non-trivial square root of
unity \(\operatorname{mod} n\) or until you can deduce the value of \(a^{n-1}\).  In fact, this
is almost as fast as the original Fermat primality test, since the
exponentiation by \(n-1\) has to do the same sort of squaring, and
we&apos;re just adding comparisons to \(±1\) in between squarings.&lt;/p&gt;
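
&lt;p&gt;As a worked sketch of this procedure, take \(n = 561 = 3 \cdot 11
\cdot 17\) and \(a = 2\) (a convenient illustrative choice).  Here
\(n - 1 = 560 = 35 \cdot 2^4\), so \(t = 35\), and repeated squaring
gives

\[
2^{35} \equiv 263, \quad
2^{70} \equiv 263^2 \equiv 166, \quad
2^{140} \equiv 166^2 \equiv 67, \quad
2^{280} \equiv 67^2 \equiv 1 \pmod{561}\text{.}
\]

Since \(67 \not\equiv \pm 1 \pmod{561}\), it is a non-trivial square
root of unity, so \(561\) is composite.  Note that \(2^{560} \equiv 1
\pmod{561}\), so the Fermat test with \(a = 2\) alone would not have
detected this.  The factors follow from \(\gcd(68, 561) = 17\) and
\(\gcd(66, 561) = 33\).&lt;/p&gt;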

&lt;p&gt;The original idea for the test above is from Artjuhov, although it
is usually credited to Miller.  Therefore, we call \(a\) an &lt;em&gt;Artjuhov witness&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; of \(n\)&lt;/em&gt; if it shows \(n\) composite by
the above test.&lt;/p&gt;

&lt;p class=&quot;interactive-example&quot; id=&quot;artjuhovExample&quot;&gt;
  Let
  &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt; =
  &lt;input class=&quot;parameter&quot; size=&quot;6&quot; pattern=&quot;[0-9]*&quot; required
         type=&quot;text&quot; value=&quot;561&quot;
         data-bind=&quot;value: nStr, valueUpdate: &apos;afterkeydown&apos;&quot; /&gt;&lt;/span&gt;
  and
  &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt; =
  &lt;input class=&quot;parameter&quot; size=&quot;6&quot; pattern=&quot;[0-9]*&quot; required
         type=&quot;text&quot; value=&quot;2&quot;
         data-bind=&quot;value: aStr, valueUpdate: &apos;afterkeydown&apos;&quot; /&gt;&lt;/span&gt;.
  &lt;!-- ko template: outputTemplate --&gt;&lt;!-- /ko --&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;artjuhov.error.invalidN&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; is not a valid number.
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;artjuhov.error.invalidA&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt;&lt;/span&gt; is not a valid number.
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;artjuhov.error.outOfBoundsN&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; must be greater than
    &lt;span class=&quot;fake-katex&quot;&gt;2&lt;/span&gt;.
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;artjuhov.error.outOfBoundsA&quot;&gt;
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt;&lt;/span&gt; must be greater than
    &lt;span class=&quot;fake-katex&quot;&gt;1&lt;/span&gt; and less than
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt;.
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;artjuhov.success.fermatEquivResult&quot;&gt;
    Then
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt;
    is even, so this reduces to the Fermat primality test.

    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt;&lt;sup&gt;&lt;var&gt;n&lt;/var&gt;&amp;minus;1&lt;/sup&gt;
      &amp;equiv;
    &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: r&quot;&gt;&lt;/span&gt;
    &lt;span data-bind=&quot;if: r() &amp;&amp; r().ne(1)&quot;&gt;&amp;equiv;&amp;#824; 1&lt;/span&gt;
    (mod &lt;var&gt;n&lt;/var&gt;)&lt;/span&gt; so therefore
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; is
    &lt;span data-bind=&quot;if: isCompositeByArtjuhov()&quot;&gt;
      &lt;span class=&quot;result&quot;&gt;composite&lt;/span&gt;.
      &lt;span data-bind=&quot;html: factorsHtml&quot;&gt;&lt;/span&gt;
    &lt;/span&gt;
    &lt;span data-bind=&quot;ifnot: isCompositeByArtjuhov()&quot;&gt;
      an &lt;span class=&quot;result&quot;&gt;Artjuhov pseudoprime base
        &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt;&lt;/span&gt;&lt;/span&gt;.
    &lt;/span&gt;
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;artjuhov.success.impliesFinalEquivResult&quot;&gt;
    Then
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt; &amp;minus; 1 =
      &lt;span data-bind=&quot;html: nMinusOneHtml&quot;&gt;&lt;/span&gt;&lt;/span&gt;,
    and
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;r&lt;/var&gt; &amp;equiv;
      &lt;span data-bind=&quot;html: rHtml&quot;&gt;&lt;/span&gt; &amp;equiv;
      &lt;span data-bind=&quot;html: rResultHtml&quot;&gt;&lt;/span&gt; (mod &lt;var&gt;n&lt;/var&gt;)&lt;/span&gt;,
    so
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt;&lt;sup&gt;&lt;var&gt;n&lt;/var&gt;&amp;minus;1&lt;/sup&gt;
      &amp;equiv;
    &lt;span data-bind=&quot;html: aNMinusOneHtml&quot;&gt;&lt;/span&gt; (mod &lt;var&gt;n&lt;/var&gt;)&lt;/span&gt;,
    and therefore
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt; is
    &lt;span data-bind=&quot;if: isCompositeByArtjuhov()&quot;&gt;
      &lt;span class=&quot;result&quot;&gt;composite&lt;/span&gt;.
      &lt;span data-bind=&quot;html: factorsHtml&quot;&gt;&lt;/span&gt;
    &lt;/span&gt;
    &lt;span data-bind=&quot;ifnot: isCompositeByArtjuhov()&quot;&gt;
      either &lt;span class=&quot;result&quot;&gt;prime&lt;/span&gt; or an
      &lt;span class=&quot;result&quot;&gt;Artjuhov pseudoprime base
        &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;a&lt;/var&gt;&lt;/span&gt;&lt;/span&gt;.
    &lt;/span&gt;
  &lt;/script&gt;

  &lt;script type=&quot;text/html&quot; id=&quot;artjuhov.success.nonTrivialSqrtResult&quot;&gt;
    Then
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt; &amp;minus; 1 =
      &lt;span data-bind=&quot;html: nMinusOneHtml&quot;&gt;&lt;/span&gt;&lt;/span&gt;,
    &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;r&lt;/var&gt; &amp;equiv;
      &lt;span data-bind=&quot;html: rHtml&quot;&gt;&lt;/span&gt;
      &amp;equiv; &lt;span class=&quot;intermediate&quot;&gt;1&lt;/span&gt;
      (mod &lt;var&gt;n&lt;/var&gt;)&lt;/span&gt;, and
    &lt;span class=&quot;fake-katex&quot;&gt;&amp;radic;&lt;var&gt;r&lt;/var&gt; &amp;equiv;
      &lt;span data-bind=&quot;html: rSqrtHtml&quot;&gt;&lt;/span&gt;
      &amp;equiv; &lt;span class=&quot;intermediate&quot; data-bind=&quot;text: rSqrt&quot;&gt;&lt;/span&gt;
      (mod &lt;var&gt;n&lt;/var&gt;)&lt;/span&gt;, which is a non-trivial square root
    of unity &lt;span class=&quot;fake-katex&quot;&gt;mod &lt;var&gt;n&lt;/var&gt;&lt;/span&gt;
    and therefore &lt;span class=&quot;fake-katex&quot;&gt;&lt;var&gt;n&lt;/var&gt;&lt;/span&gt;
    is &lt;span class=&quot;result&quot;&gt;composite&lt;/span&gt;.
    &lt;span data-bind=&quot;html: factorsHtml&quot;&gt;&lt;/span&gt;
  &lt;/script&gt;
&lt;/p&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;/intro-primality-testing-files/artjuhov-example.js&quot;&gt;&lt;/script&gt;

&lt;p&gt;If \(n\) is an odd composite, then it can be shown (originally by
Rabin) that at least three quarters of all possible witnesses are
Artjuhov witnesses.  Therefore, for a sample of \(k\) possible
witnesses of \(n\), the probability of all of them being Artjuhov
liars is \(\le 4^{-k}\), which is stronger than the bound for the
Fermat primality test.  Furthermore, this bound is unconditional;
there is nothing like Carmichael numbers for the Artjuhov test.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Here is the Artjuhov compositeness test, implemented in
JavaScript:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Runs the Artjuhov compositeness test given n &gt; 2 and 1 &amp;lt; a &amp;lt; n.
// Finds the largest s such that n-1 = t*2^s, calculates r = a^t mod
// n, then repeatedly squares r (mod n) up to s times until r is
// congruent to -1, 0, or 1 (mod n).  Then, based on the value of s
// and the final value of r and i (the number of squarings),
// determines whether a is an Artjuhov witness to n (i.e., n is
// composite).
//
// Returns a dictionary with a, n, s, t, i, r, rSqrt = sqrt(r) if i &gt;
// 0 and null otherwise, and isCompositeByArtjuhov, which is true iff
// a is an Artjuhov witness to n.
function testCompositenessByArtjuhov(n, a) {
  n = SNat.cast(n);
  a = SNat.cast(a);

  if (n.le(2)) {
    throw new RangeError(&apos;n must be &gt; 2&apos;);
  }

  if (a.le(1) || a.ge(n)) {
    throw new RangeError(&apos;a must satisfy 1 &amp;lt; a &amp;lt; n&apos;);
  }

  var nMinusOne = n.minus(1);

  // Find the largest s (and the corresponding odd t) such that n-1 = t*2^s.
  var t = nMinusOne;
  var s = new SNat(0);
  while (t.isEven()) {
    t = t.div(2);
    s = s.plus(1);
  }

  // Find the smallest 0 &amp;lt;= i &amp;lt;= s such that a^{t*2^i} = 0/-1/+1 (mod
  // n), stopping at i = s if there is none.
  var i = new SNat(0);
  var rSqrt = null;
  var r = a.powMod(t, n);
  while (i.lt(s) &amp;&amp; r.gt(1) &amp;&amp; r.lt(nMinusOne)) {
    i = i.plus(1);
    rSqrt = r;
    r = r.times(r).mod(n);
  }

  var isCompositeByArtjuhov = false;
  if (s.isZero()) {
    // If 0 = i = s, then this reduces to the Fermat primality test.
    isCompositeByArtjuhov = r.ne(1);
  } else if (i.isZero()) {
    // If 0 = i &amp;lt; s, then:
    //
    //   * r = 0    (mod n) -&gt; a^{n-1} = 0 (mod n), and
    //   * r = +/-1 (mod n) -&gt; a^{n-1} = 1 (mod n).
    isCompositeByArtjuhov = r.isZero();
  } else if (i.lt(s)) {
    // If 0 &amp;lt; i &amp;lt; s, then:
    //
    //   * r =  0 (mod n) -&gt; a^{n-1} = 0 (mod n),
    //   * r = +1 (mod n) -&gt; a^{t*2^{i-1}} is a non-trivial square root of
    //                       unity mod n, and
    //   * r = -1 (mod n) -&gt; a^{n-1} = 1 (mod n).
    //
    // Note that the last case means r = n - 1 &gt; 1.
    isCompositeByArtjuhov = r.le(1);
  } else {
    // If 0 &amp;lt; i = s, then:
    //
    //   * r =  0 (mod n) can&apos;t happen,
    //   * r = +1 (mod n) -&gt; a^{t*2^{i-1}} is a non-trivial square root of
    //                       unity mod n, and
    //   * r &gt; +1 (mod n) -&gt; failure of the Fermat primality test.
    isCompositeByArtjuhov = true;
  }

  return {
    a: a,
    n: n,
    t: t,
    s: s,
    i: i,
    r: r,
    rSqrt: rSqrt,
    isCompositeByArtjuhov: isCompositeByArtjuhov
  };
}&lt;/code&gt;&lt;/pre&gt;

With the two compositeness tests above, we can now write a
probabilistic primality test:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Returns true iff a is a Fermat witness to n, and thus n is
// composite.  a and n must satisfy the same conditions as in
// testCompositenessByFermat.
function hasFermatWitness(n, a) {
  return testCompositenessByFermat(n, a).isCompositeByFermat;
}

// Returns true iff a is an Artjuhov witness to n, and thus n is
// composite.  a and n must satisfy the same conditions as in
// testCompositenessByArtjuhov.
function hasArtjuhovWitness(n, a) {
  return testCompositenessByArtjuhov(n, a).isCompositeByArtjuhov;
}

// Returns true if n is probably prime, based on sampling the given
// number of possible witnesses and testing them against n.  If false
// is returned, then n is definitely composite.
//
// By default, uses the Artjuhov test for witnesses with 20 samples
// and Math.random for the random number generator.  This gives an
// error bound of 4^-20 if true is returned.
function isProbablePrime(n, hasWitness, numSamples, rng) {
  n = SNat.cast(n);
  hasWitness = hasWitness || hasArtjuhovWitness;
  rng = rng || Math.random;
  numSamples = numSamples || 20;

  if (n.le(1)) {
    return false;
  }

  if (n.le(3)) {
    return true;
  }

  if (n.isEven()) {
    return false;
  }

  for (var i = 0; i &amp;lt; numSamples; ++i) {
    var a = SNat.random(2, n.minus(2), rng);
    if (hasWitness(n, a)) {
      return false;
    }
  }

  return true;
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;code&gt;isProbablePrime&lt;/code&gt; called
with &lt;code&gt;hasFermatWitness&lt;/code&gt; is the &lt;em&gt;Fermat primality
test&lt;/em&gt;, and &lt;code&gt;isProbablePrime&lt;/code&gt; called
with &lt;code&gt;hasArtjuhovWitness&lt;/code&gt; is the &lt;em&gt;Miller-Rabin primality
test&lt;/em&gt;.  The latter is the current general primality test of
choice, replacing
the &lt;a href=&quot;http://en.wikipedia.org/wiki/Solovay-Strassen&quot;&gt;Solovay-Strassen
primality test&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We can also use &lt;code&gt;isProbablePrime&lt;/code&gt; to randomly generate
probable primes, which is useful
for &lt;a href=&quot;http://en.wikipedia.org/wiki/RSA_(algorithm)#Key_generation&quot;&gt;cryptographic
applications&lt;/a&gt;:&lt;/p&gt;

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Returns a probable b-bit prime, i.e. one that is at least
// 2^(b-1).  All parameters but b are passed to isProbablePrime.
function findProbablePrime(b, hasWitness, rng, numSamples) {
  b = SNat.cast(b);
  rng = rng || Math.random;

  var lb = (new SNat(2)).pow(b.minus(1));
  var ub = lb.times(2);
  while (true) {
    var n = SNat.random(lb, ub, rng);
    // Note that isProbablePrime takes numSamples before rng.
    if (isProbablePrime(n, hasWitness, numSamples, rng)) {
      return n;
    }
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this case, for sufficiently large \(b\), the Fermat primality
  test is acceptable, since Carmichael numbers are so rare and we&apos;re the
  ones generating the possible primes to be tested.&lt;sup&gt;&lt;a href=&quot;#fn3&quot; id=&quot;r3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;There are other primality tests, but they&apos;re less often used in
  practice because they&apos;re
  either &lt;a href=&quot;http://en.wikipedia.org/wiki/Solovay%E2%80%93Strassen_primality_test&quot;&gt;less
  efficient&lt;/a&gt; or &lt;a href=&quot;http://www.pseudoprime.com/pseudo2.pdf&quot;&gt;more
  sophisticated&lt;/a&gt; than the algorithms above, or they require \(n\) to
  have &lt;a href=&quot;http://en.wikipedia.org/wiki/Lucas_primality_test&quot;&gt;special&lt;/a&gt; &lt;a href=&quot;http://en.wikipedia.org/wiki/Proth%27s_theorem&quot;&gt;properties&lt;/a&gt;.
  Perhaps the most interesting of these tests is
  the &lt;a href=&quot;http://en.wikipedia.org/wiki/Aks_primality_test&quot;&gt;&lt;em&gt;AKS
  primality test&lt;/em&gt;&lt;/a&gt;, which proved once and for all that primality
  testing is in P.&lt;/p&gt;

&lt;hr /&gt;



&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] \(\gcd\) is
    the &lt;a href=&quot;http://en.wikipedia.org/wiki/Greatest_common_divisor&quot;&gt;greatest
    common divisor&lt;/a&gt; function.
    &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn2&quot;&gt;[2] &amp;ldquo;Artjuhov witness&amp;rdquo; is an idiosyncratic
    name on my part; a more common name is &lt;em&gt;strong witness&lt;/em&gt;, which
    I don&apos;t like.
    &lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn3&quot;&gt;[3]
    &lt;a href=&quot;http://en.wikipedia.org/wiki/Fermat_primality_test#Applications&quot;&gt;According to Wikipedia&lt;/a&gt;, PGP uses the Fermat primality test.
    &lt;a href=&quot;#r3&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/pair-counterexamples-vector-calculus</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/pair-counterexamples-vector-calculus"/>
    <title>A Pair of Counterexamples in Vector Calculus</title>
    <updated>2011-11-27T00:00:00-08:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;script&gt;
KaTeXMacros = {
  &quot;\\sgn&quot;: &quot;\\operatorname{sgn}&quot;,
};
&lt;/script&gt;

&lt;p&gt;While recently reviewing some topics in vector calculus, I became
curious as to why violating seemingly innocuous conditions for some
theorems leads to surprisingly wild results.  In fact, I was struck by
how these theorems resemble computer programs, not in some
&lt;a href=&quot;http://en.wikipedia.org/wiki/Curry-Howard_Correspondence&quot;&gt;abstract
way&lt;/a&gt;, but in how the lack of &amp;ldquo;input validation&amp;rdquo; leads
to
&lt;a href=&quot;http://en.wikipedia.org/wiki/Undefined_behavior&quot;&gt;non-obvious
behavior&lt;/a&gt; in the face of erroneous input.&lt;/p&gt;

&lt;p&gt;I found that understanding why these counterexamples lead to wild
results deepened my understanding of the theorems involved and their
  proofs.&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; Besides,
pathological examples are more interesting than well-behaved ones!&lt;/p&gt;

&lt;p&gt;First, let&apos;s look at a &amp;ldquo;counterexample&amp;rdquo;
to &lt;a href=&quot;http://en.wikipedia.org/wiki/Green%27s_theorem&quot;&gt;Green&apos;s
theorem&lt;/a&gt;:&lt;/p&gt;

&lt;p class=&quot;example&quot;&gt;1. Two functions \(L, M \colon \mathbb{R}^2 \to \mathbb{R}\) and
  a positively-oriented, piecewise smooth, simple closed curve \(C\)
  in \(\mathbb{R}^2\) enclosing the region \(D\) such that

\[
∮_C L \,dx + M \,dy \ne
∬_D \left( \frac{∂{M}}{∂{x}} - \frac{∂{L}}{∂{y}} \right) \,dx \,dy \text{.}
\]&lt;/p&gt;

&lt;p&gt;Let

\[
  L = -\frac{y}{x^2+y^2} \text{,} \quad M = \frac{x}{x^2+y^2} \text{,}
\]

and \(C\) be a curve going counter-clockwise around the rectangle \(D = [-1,
  1]^2\).&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;  Then the integral of \(L \,dx + M \, dy\) around \(C\) is \(2
π\) since it encloses the origin.  But

\[
\frac{∂{M}}{∂{x}} = \frac{∂{L}}{∂{y}} = \frac{y^2-x^2}{(x^2+y^2)^2}
\]

so the difference of the two vanishes everywhere but the origin, where
neither function is defined.  Therefore, the (improper) integral over
\(D\) also vanishes, proving the inequality. &amp;#8718;&lt;/p&gt;

&lt;p&gt;Of course, the easy explanation is that the discontinuity of \(L\)
and \(M\) at the origin violates a condition of Green&apos;s theorem.  But
that doesn&apos;t really tell us anything, so let&apos;s break down the theorem
and see where exactly it fails.&lt;/p&gt;

&lt;p&gt;Green&apos;s theorem is usually proved first for rectangles \([a, b]
\times [c, d]\), which suffices for our purpose.  If \(C\) is a curve
that goes counter-clockwise around such a rectangle \(D\), then we can
easily show that

\[
∮_C L \,dx = - ∬_D \frac{∂{L}}{∂{y}} \,dx \,dy
\]

and

\[
∮_C M \,dy = ∬_D \frac{∂{M}}{∂{x}} \,dx \,dy \text{,}
\]

with the sum of these two formulas proving the theorem.&lt;/p&gt;

&lt;p&gt;So the first sign of trouble is that the theorem freely
interchanges addition and integration.  Since the partial derivatives
of our functions diverge at the origin, if \(D\) contains the origin
then the integrals of those partial derivatives over \(D\) may not
even be defined, even if the integral of their difference is.&lt;/p&gt;

&lt;p&gt;But the problem arises even before that.  The statements above are
proved by showing

\[
∮_C L \,dx = - ∫_a^b \left( ∫_c^d \frac{∂{L}}{∂{y}} \,dy \right) \,dx
\]

and

\[
∮_C M \,dy = ∫_c^d \left( ∫_a^b \frac{∂{M}}{∂{x}} \,dx \right) \,dy
\text{,}
\]

both of which hold for our example.  But notice that in one case we
integrate with respect to \(y\) first, and in the other case we
integrate with respect to \(x\) first.  Therefore, we have to
interchange the order of integration or convert to a double integral
in order to get them to a form where we can add them.  And there&apos;s the
rub: if \(D\) contains the origin, switching the order of integration
for either integral above switches the sign of the result!&lt;/p&gt;

&lt;p&gt;This fully explains the discrepancy; since the result of both
integrals above (with the iteration order preserved) is \(π\),
adding them together as-is gives the expected result of \(2 π\).
But if we switch the iteration order of one of the iterated integrals
as done in the proof of Green&apos;s theorem, then we switch the result of
that integral to \(-π\), which cancels with the result of the other
unchanged integral to produce \(0\).&lt;/p&gt;

&lt;p&gt;So now let&apos;s examine this strange behavior of the sign of an
integration&apos;s result depending on the iteration order.  This leads us
to our next &amp;ldquo;counterexample,&amp;rdquo; this time
for &lt;a href=&quot;http://en.wikipedia.org/wiki/Fubini%27s_theorem&quot;&gt;Fubini&apos;s
theorem&lt;/a&gt;:&lt;/p&gt;

&lt;p class=&quot;example&quot;&gt;2. A function \(f \colon \mathbb{R}^2 \to \mathbb{R}\) whose
  iterated integrals over a rectangle \(D = [a, b] \times [c, d]
  \subset \mathbb{R}^2\) differ.&lt;/p&gt;

&lt;p&gt;Let

\[
f(x, y) = \frac{y^2-x^2}{(x^2+y^2)^2}
\quad \text{ and } \quad
D = [-1, 1]^2\text{.}
\]

The two iterated integrals of \(f\) over \(D\) are usually written as

\[
∫_{-1}^1 \left( ∫_{-1}^1 f(x, y) \,dy \right) \,dx
\qquad \text{ and } \qquad
∫_{-1}^1 \left( ∫_{-1}^1 f(x, y) \,dx \right) \,dy
\]

but let&apos;s define them more carefully to make it easier to justify our
calculations.&lt;/p&gt;

&lt;p&gt;Let

\[
\begin{aligned}
u_k &amp;= y \mapsto f(k, y) \\
v_l &amp;= x \mapsto f(x, l) \text{.}
\end{aligned}
\]

In other words, given the real constants \(k\) and \(l\), construct
the (possibly partial) real functions \(u_k(y)\) and \(v_l(x)\) by
partially-evaluating \(f\) at \(x = k\) and \(y = l\),
respectively.&lt;/p&gt;

&lt;p&gt;Then, if we also let&lt;sup&gt;&lt;a href=&quot;#fn3&quot; id=&quot;r3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;

\[
U(x) = ∫_{-1}^1 u_x(y) \,dy
\qquad \text{ and } \qquad
  V(y) = ∫_{-1}^1 v_y(x) \,dx \text{,}
\]

we can write the iterated integrals as

\[
∫_{-1}^1 U(x) \,dx
\qquad \text{ and } \qquad
∫_{-1}^1 V(y) \,dy \text{.}
\]
&lt;/p&gt;

&lt;p&gt;Computing \(U(x)\) for \(x ≠ 0\), we get&lt;sup&gt;&lt;a href=&quot;#fn4&quot; id=&quot;r4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;

\[
\begin{aligned}
  U(x) &amp;= ∫_{-1}^1 \frac{∂{}}{∂{y}} \left( -\frac{y}{x^2+y^2} \right) \,dy \\
       &amp;= \left. -\frac{y}{x^2+y^2} \right|_{y=-1}^{y=1}              \\
  &amp;= -\frac{2}{x^2+1} \text{.}
\end{aligned}
\]
&lt;/p&gt;
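
&lt;p&gt;As a quick check of the first step, here is a sketch using only
the quotient rule.  For \(x ≠ 0\),

\[
\frac{∂{}}{∂{y}} \left( -\frac{y}{x^2+y^2} \right)
  = -\frac{(x^2+y^2) - y \cdot 2y}{(x^2+y^2)^2}
  = \frac{y^2-x^2}{(x^2+y^2)^2} \text{,}
\]

so \(-y/(x^2+y^2)\) is an antiderivative, with respect to \(y\), of
\((y^2-x^2)/(x^2+y^2)^2\), the same integrand that appears in the
computation of \(I_{yx}\) further down.&lt;/p&gt;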

&lt;p&gt;Attempting to evaluate \(U(0)\), we see that

\[
\begin{aligned}
  U(0) &amp;= ∫_{-1}^1 \frac{y^2-0^2}{(0^2+y^2)^2} \,dy \\
       &amp;= ∫_{-1}^1 \frac{dy}{y^2}
\end{aligned}
\]

which diverges.  So

\[
  U(x) = -\frac{2}{x^2+1} \text{ for } x \ne 0 \text{.}
\]
&lt;/p&gt;

&lt;p&gt;
  By a similar computation, we find that&lt;sup&gt;&lt;a href=&quot;#fn5&quot; id=&quot;r5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;

\[
  V(y) = \frac{2}{y^2+1} \text{ for } y \ne 0 \text{.}
\]
&lt;/p&gt;
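
&lt;p&gt;Spelled out, the similar computation might run as follows, where
the antiderivative \(x/(x^2+y^2)\) is the one that also appears in the
computation of \(I_{xy}\) further down.  For \(y ≠ 0\),

\[
\begin{aligned}
  V(y) &amp;= ∫_{-1}^1 \frac{∂{}}{∂{x}} \left( \frac{x}{x^2+y^2} \right) \,dx \\
       &amp;= \left. \frac{x}{x^2+y^2} \right|_{x=-1}^{x=1}              \\
       &amp;= \frac{2}{y^2+1} \text{.}
\end{aligned}
\]
&lt;/p&gt;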

&lt;p&gt;Since \(U(x)\) isn&apos;t defined at \(0\), we have to treat it as an
improper integral, although doing so poses no real difficulty:

\[
\begin{aligned}
  ∫_{-1}^1 U(x)\,dx
    &amp;= \lim_{a \nearrow 0} \left( ∫_{-1}^a -\frac{2}{x^2+1} \,dx \right) +
       \lim_{a \searrow 0} \left( ∫_{a}^1 -\frac{2}{x^2+1} \,dx \right) \\
    &amp;= \lim_{a \nearrow 0}
         \Bigl( \left. -2 \arctan(x) \right|_{-1}^{a} \Bigr) +
       \lim_{a \searrow 0}
         \Bigl( \left. -2 \arctan(x) \right|_{a}^{1} \Bigr) \\
    &amp;= \left. -2 \arctan(x) \right|_{-1}^{0} +
       \left. -2 \arctan(x) \right|_{0}^{1} \\
    &amp;= \left. -2 \arctan(x) \right|_{-1}^{1} \\
    &amp;= -π \text{.}
\end{aligned}
\]
&lt;/p&gt;

&lt;p&gt;Similarly,

\[
  ∫_{-1}^1 V(y)\,dy = π \text{,}
\]

so the iterated integrals of \(f(x, y)\) over \([-1, 1]^2\) differ; in
fact, as we claimed above, switching the iteration order switches the
sign of the result. &amp;#8718;&lt;/p&gt;

&lt;p&gt;We can repeat the above calculations for an arbitrary rectangle to
see that the iterated integrals of \(f(x, y)\) differ if \(D\)
contains the origin either as an interior point or a corner.  But
there&apos;s an easier way to prove that statement and also gain some
insight as to why \(f(x, y)\) has this strange property.&lt;/p&gt;

&lt;p&gt;Note that the key facts in the above calculations were that \(U(x)
\lt 0\) for any \(x \ne 0\) and \(V(y) \gt 0\) for any \(y \ne 0\).
Therefore, integrating \(U(x)\) over any interval on the \(x\)-axis
would produce a negative result and integrating \(V(y)\) over any
interval on the \(y\)-axis would produce a positive result, leading to
the difference in iterated integrals.  This holds more generally; for
any \(m, n \gt 0\):

\[
∫_{-n}^n f(x, y) \,dy \lt 0
\qquad \text{ and } \qquad
∫_{-m}^m f(x, y) \,dx \gt 0 \text{.}
\]

Therefore,

\[
∫_{-m}^m \left( ∫_{-n}^n f(x, y) \,dy \right) \,dx \lt 0
\qquad \text{ and } \qquad
∫_{-n}^n \left( ∫_{-m}^m f(x, y) \,dx \right) \,dy \gt 0
\]

so the iterated integrals of \(f(x, y)\) differ over the rectangles
\([-m, m] \times [-n, n]\).  Since any rectangle \(D\) containing the
origin as an interior point must contain some smaller rectangle \(E =
[-m, m] \times [-n, n]\), the iterated integrals of \(f(x, y)\) over
\(E\) differ and therefore must also differ over \(D\).&lt;/p&gt;
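
&lt;p&gt;To make the two inequalities above concrete, here is a sketch of
the explicit values, using the same antiderivatives as in the
calculations of \(U(x)\) and \(V(y)\).  For \(x ≠ 0\) and \(y ≠ 0\),

\[
∫_{-n}^n f(x, y) \,dy = -\frac{2n}{x^2+n^2}
\qquad \text{ and } \qquad
∫_{-m}^m f(x, y) \,dx = \frac{2m}{m^2+y^2} \text{,}
\]

so that

\[
∫_{-m}^m \left( -\frac{2n}{x^2+n^2} \right) \,dx
  = -4 \arctan\left(\frac{m}{n}\right)
\qquad \text{ and } \qquad
∫_{-n}^n \frac{2m}{m^2+y^2} \,dy
  = 4 \arctan\left(\frac{n}{m}\right) \text{.}
\]

Setting \(m = n = 1\) recovers the values \(-π\) and \(π\) computed
above.&lt;/p&gt;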

&lt;p&gt;Furthermore, since \(f(x, y)\) is even in both \(x\) and \(y\), you
can carry out a similar argument to the above with intervals of the
form \([0, m]\) or \([-m, 0]\) to show that the iterated integrals of
\(f(x, y)\) must also differ over any rectangle with the origin as a
corner.
&lt;/p&gt;

&lt;p&gt;So the essential property of \(f(x, y)\) is that slicing it along
the \(x\)-axis gives a function which has positive area under the
curve on any interval symmetric around \(0\) or with \(0\) as an
endpoint, and that slicing it similarly along the \(y\)-axis gives a
function which has negative area.  Therefore, on a rectangle symmetric
around the origin or with the origin as a corner, one can choose the
sign of the iterated integral by choosing which axis to slice
first.&lt;/p&gt;

&lt;p&gt;The next thing to investigate is how exactly the iterated integrals
of \(f(x, y)\) over the rectangle \(D\) are expressed such that they
differ only when \(D\) contains the origin, especially considering
that \(f(x, y)\) is expressed in quite a simple form.  To do that,
let&apos;s consider the simple case of a rectangle \(D = [δ, 1] \times
[ϵ, 1]\) where we can vary \(δ\) and \(ϵ\) at
will.&lt;/p&gt;

&lt;p&gt;Let

\[
\begin{aligned}
I_{yx}(δ, ϵ) &amp;=
  ∫_{δ}^1 \left( ∫_{ϵ}^1 f(x, y) \,dy \right) \,dx \\
I_{xy}(δ, ϵ) &amp;=
  ∫_{ϵ}^1 \left( ∫_{δ}^1 f(x, y) \,dx \right) \,dy
\text{.}
\end{aligned}
\]

Then, for \(ϵ ≠ 0\):

\[
\begin{aligned}
I_{yx}(δ, ϵ) &amp;=
  ∫_{δ}^1 \left( ∫_{ϵ}^1
    \frac{y^2-x^2}{(x^2+y^2)^2} \,dy \right) \,dx \\
  &amp;= ∫_{δ}^1 \left(
       \left. -\frac{y}{x^2+y^2} \right|_{y=ϵ}^{y=1} \right) \,dx \\
  &amp;= ∫_{δ}^1 \Biggl(
       -\frac{1}{1+x^2} -
       \left( -\frac{ϵ}{ϵ^2+x^2} \right) \Biggr) \,dx \\
  &amp;= ∫_{δ}^1 \frac{dx/ϵ}{1+(x/ϵ)^2} -
     ∫_{δ}^1 \frac{dx}{1+x^2} \\
  &amp;= \arctan\left(\frac{1}{ϵ}\right) -
     \arctan\left(\frac{δ}{ϵ}\right) -
     \frac{π}{4} + \arctan(δ) \text{,}
\end{aligned}
\]

and for \(ϵ = 0\):

\[
I_{yx}(δ, 0) = -\frac{π}{4} + \arctan(δ) \text{.}
\]

Similarly, for \(δ ≠ 0\):

\[
\begin{aligned}
I_{xy}(δ, ϵ) &amp;=
  ∫_{ϵ}^1 \left( ∫_{δ}^1
    \frac{y^2-x^2}{(x^2+y^2)^2} \,dx \right) \,dy \\
  &amp;= ∫_{ϵ}^1 \left(
       \left. \frac{x}{x^2+y^2} \right|_{x=δ}^{x=1} \right) \,dy \\
  &amp;= ∫_{ϵ}^1 \left(
       \frac{1}{1+y^2} - \frac{δ}{δ^2+y^2} \right) \,dy \\
  &amp;= ∫_{ϵ}^1 \frac{dy}{1+y^2} -
     ∫_{ϵ}^1 \frac{dy/δ}{1+(y/δ)^2} \\
  &amp;= \frac{π}{4} - \arctan(ϵ) -
     \arctan\left(\frac{1}{δ}\right) +
     \arctan\left(\frac{ϵ}{δ}\right) \text{,}
\end{aligned}
\]

and for \(δ = 0\):

\[
I_{xy}(0, ϵ) = \frac{π}{4} - \arctan(ϵ) \text{.}
\]


Then let \(Δ = I_{xy} - I_{yx}\) be the difference between the
two iterated integrals.  We can use the identity

\[
\arctan(x) + \arctan\left(\frac{1}{x}\right) = \frac{π}{2} \sgn(x)
\]

to simplify \(Δ(δ, ϵ)\) if neither \(δ\) nor
\(ϵ\) is zero:

\[
\begin{aligned}
Δ(δ, ϵ)
  &amp;= \bigl( π/4 - \arctan(ϵ) - \arctan(1/δ)
     + \arctan(ϵ/δ) \bigr) \\
  &amp;  \quad \mathbin{-}
     \bigl( \arctan(1/ϵ) - \arctan(δ/ϵ)
     - π/4 + \arctan(δ) \bigr) \\
  &amp;= π/2 - \bigl( \arctan(ϵ) + \arctan(1/ϵ) \bigr) \\
  &amp;  \quad \mathbin{-} \bigl( \arctan(δ) + \arctan(1/δ) \bigr) \\
  &amp;  \quad \mathbin{+}
       \bigl( \arctan(δ/ϵ) + \arctan(ϵ/δ) \bigr) \\
  &amp;= \frac{π}{2} \bigl( 1 - \sgn(ϵ) - \sgn(δ)
     + \sgn(δ/ϵ) \bigr) \text{.}
\end{aligned}
\]
&lt;/p&gt;

&lt;p&gt;
  Using the properties of \(\sgn(x)\), we can simplify this to the final
  expression:

  \[
  Δ(δ, ϵ) =
  \frac{π}{2}
  \bigl( 1 - \sgn(δ) \bigr) \bigl( 1 - \sgn(ϵ) \bigr)
  \]

  which we can prove still holds if either \(δ\) or \(ϵ\) is
  zero (or both).&lt;/p&gt;
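
&lt;p&gt;A sketch of both steps, taking \(\sgn(0) = 0\) as usual.  Since
\(\sgn(δ/ϵ) = \sgn(δ) \sgn(ϵ)\) when neither is zero,

\[
1 - \sgn(ϵ) - \sgn(δ) + \sgn(δ) \sgn(ϵ)
  = \bigl( 1 - \sgn(δ) \bigr) \bigl( 1 - \sgn(ϵ) \bigr) \text{.}
\]

For the zero cases, we fall back on the separate expressions for
\(I_{yx}(δ, 0)\) and \(I_{xy}(0, ϵ)\) given above.  For example, for
\(δ = 0\) and \(ϵ ≠ 0\),

\[
\begin{aligned}
Δ(0, ϵ)
  &amp;= \bigl( π/4 - \arctan(ϵ) \bigr)
     - \bigl( \arctan(1/ϵ) - π/4 \bigr) \\
  &amp;= π/2 - \bigl( \arctan(ϵ) + \arctan(1/ϵ) \bigr) \\
  &amp;= \frac{π}{2} \bigl( 1 - \sgn(ϵ) \bigr) \text{,}
\end{aligned}
\]

which agrees with the formula.  The case \(ϵ = 0\), \(δ ≠ 0\) is
symmetric, and \(Δ(0, 0) = π/4 - (-π/4) = π/2\).&lt;/p&gt;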

&lt;p&gt;So with the simplified expression for \(Δ(δ, ϵ)\),
  it becomes apparent how \(\sgn(x)\) is used to control the value of
  \(Δ(δ, ϵ)\); as long as either \(δ\) or
  \(ϵ\) is positive, \(1 - \sgn(x)\) zeroes out the entire
  expression.&lt;/p&gt;
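
&lt;p&gt;Reading the formula off case by case gives a rough picture (this
is only a tabulation of the expression above):

\[
Δ(δ, ϵ) =
\begin{cases}
0   &amp; \text{if } δ \gt 0 \text{ or } ϵ \gt 0 \\
π/2 &amp; \text{if } δ = ϵ = 0 \\
π   &amp; \text{if one of } δ, ϵ \text{ is zero and the other is negative} \\
2π  &amp; \text{if } δ \lt 0 \text{ and } ϵ \lt 0 \text{.}
\end{cases}
\]

In particular, \(δ = ϵ = -1\) gives \(Δ = π - (-π) = 2π\), matching
the iterated integrals over \([-1, 1]^2\) computed earlier.&lt;/p&gt;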

&lt;hr /&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;


&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] There are
    actually &lt;a href=&quot;http://amzn.com/048668735X&quot;&gt;whole&lt;/a&gt;
    &lt;a href=&quot;http://amzn.com/0486428753&quot;&gt;books&lt;/a&gt; dedicated to
    counterexamples.  They make good bathroom reading material.
    &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn2&quot;&gt;[2] The vector field \((L, M)\) also serves as the
  canonical &amp;ldquo;counterexample&amp;rdquo; to
  the &lt;a href=&quot;http://en.wikipedia.org/wiki/Gradient_theorem&quot;&gt;gradient
  theorem&lt;/a&gt;. &lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn3&quot;&gt;[3] \(U(x)\) and \(V(y)\) are also (partial) real
  functions. &lt;a href=&quot;#r3&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn4&quot;&gt;[4] We&apos;re justified in applying standard integration
  techniques here since \(u_k(y)\) for \(k \ne 0\) is defined and
  bounded for all \(y\). &lt;a href=&quot;#r4&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

&lt;p id=&quot;fn5&quot;&gt;[5] Note that \(U(x)\) and \(V(y)\) differ only in
  variable name and sign. &lt;a href=&quot;#r5&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/evlis-tail-recursion</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/evlis-tail-recursion"/>
    <title>Understanding Evlis Tail Recursion</title>
    <updated>2011-10-28T00:00:00-07:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;p&gt;While reading
about &lt;a href=&quot;http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-6.html#%25_sec_3.5&quot;&gt;proper
tail recursion&lt;/a&gt; in Scheme, I encountered a similar but obscure
optimization called &lt;em&gt;evlis tail recursion&lt;/em&gt;.
In &lt;a href=&quot;http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.8567&amp;rep=rep1&amp;type=pdf&quot;&gt;the
paper where it was first described&lt;/a&gt;, the author claims it can
&quot;dramatically improve the space performance of many programs,&quot; which
sounded promising.&lt;/p&gt;

&lt;p&gt;However, the few places where it&apos;s mentioned don&apos;t do much more than
state its definition and claim its usefulness.  Hopefully I can
provide a more detailed analysis here.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Consider the straightforward factorial implementation in
  Scheme:&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-lisp&quot;&gt;(define (fact n) (if (&amp;lt;= n 1) 1 (* n (fact (- n 1)))))&lt;/code&gt;&lt;/pre&gt;

It is not tail-recursive, since the recursive call is nested in
another procedure call.  However, it&apos;s &lt;em&gt;almost&lt;/em&gt; tail-recursive;
the call to &lt;code&gt;*&lt;/code&gt; is a tail call, and the recursive call is
its last subexpression, so it will be the last subexpression to be
evaluated.&lt;/div&gt;

&lt;p&gt;Recall what happens when a procedure call (represented as a list of
subexpressions) is evaluated: each subexpression is evaluated, and the
first result (the procedure) is passed the other results as
  arguments.&lt;sup&gt;&lt;a href=&quot;#fn2&quot; id=&quot;r2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Evlis tail recursion can be described as follows: when performing a
  procedure call and during the evaluation of the last subexpression,
  the calling environment is discarded as soon as it is not
  required.&lt;sup&gt;&lt;a href=&quot;#fn3&quot; id=&quot;r3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; The distinction
  between evlis tail recursion and proper tail recursion is subtle.
  Proper tail recursion requires only that the calling environment be
  discarded before the actual procedure call; evlis tail recursion
  discards the calling environment even sooner, if possible.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;An example will help to clarify things.  Given &lt;code&gt;fact&lt;/code&gt; as
  defined above, say you evaluate &lt;code&gt;(fact 10)&lt;/code&gt; and you&apos;re in
  the procedure call with &lt;code&gt;n = 5&lt;/code&gt;.  The call stack of a
  properly tail-recursive interpreter would look like this:

  &lt;style&gt;
  pre.stack {
    margin-top: 1em;
    margin-bottom: 1em;
  }
  &lt;/style&gt;

&lt;pre class=&quot;stack&quot;&gt;
evalExpr
--------
env = { n: 10 } -&amp;gt; &amp;lt;top-level environment&amp;gt;
expr = &apos;(* n (fact (- n 1)))&apos;
proc = &amp;lt;native function: *&amp;gt;
args = [10, &amp;lt;pending evalExpr(&apos;(fact (- n 1))&apos;, env)&amp;gt;]

evalExpr
--------
env = { n: 9 } -&amp;gt; &amp;lt;top-level environment&amp;gt;
expr = &apos;(* n (fact (- n 1)))&apos;
proc = &amp;lt;native function: *&amp;gt;
args = [9, &amp;lt;pending evalExpr(&apos;(fact (- n 1))&apos;, env)&amp;gt;]

...

evalExpr
--------
env = { n: 6 } -&amp;gt; &amp;lt;top-level environment&amp;gt;
expr = &apos;(* n (fact (- n 1)))&apos;
proc = &amp;lt;native function: *&amp;gt;
args = [6, &amp;lt;pending evalExpr(&apos;(fact (- n 1))&apos;, env)&amp;gt;]

evalExpr
--------
env = { n: 5 } -&amp;gt; &amp;lt;top-level environment&amp;gt;
expr = &apos;(if ...)&apos;
&lt;/pre&gt;

whereas the call stack of an evlis tail-recursive interpreter would
look like this:

&lt;pre class=&quot;stack&quot;&gt;
evalExpr
--------
env = { n: 5 } -&amp;gt; &amp;lt;top-level environment&amp;gt;
pendingProcedureCalls = [
  [&amp;lt;native function: *&amp;gt;, 10],
  [&amp;lt;native function: *&amp;gt;, 9],
  ...
  [&amp;lt;native function: *&amp;gt;, 6]
]
expr = &apos;(if ...)&apos;
&lt;/pre&gt;

In this implementation, the last subexpression of a procedure call
is evaluated exactly like a tail expression, but the procedure call
and non-last subexpressions are pushed onto a stack.  Whenever an
expression is reduced to a simple one and the stack is non-empty, a
pending procedure call with its other args is popped off, and it is
called with the reduced expression as the final argument.&lt;/div&gt;

&lt;p&gt;Note that this didn&apos;t change the asymptotic behavior of the
procedure; it still takes \(Θ(n)\) memory to evaluate.  However,
only the bare minimum of information is saved: the list of pending
functions and their arguments.  Other auxiliary variables, and
crucially the nested calling environments, aren&apos;t preserved, leading
to a significant constant-factor reduction in memory.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;This raises the question: Are there cases where evlis tail
recursion leads to better asymptotic behavior?  In fact, yes; consider
  the following (contrived) implementation of
  factorial&lt;sup&gt;&lt;a href=&quot;#fn4&quot; id=&quot;r4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-lisp&quot;&gt;(define (fact2 n)
  (define v (make-vector n))
  (if (&amp;lt;= n 1) 1 (* n (fact2 (- n 1)))))&lt;/code&gt;&lt;/pre&gt;

Before the main body of the function, a vector of size \(n\) is
defined.  This means that the environments in the call stack of a
  properly tail-recursive interpreter would look like this:&lt;sup&gt;&lt;a href=&quot;#fn5&quot; id=&quot;r5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;

&lt;pre class=&quot;stack&quot;&gt;
env = { n: 10, v: &amp;lt;vector of size 10&amp;gt; } -&amp;gt; &amp;lt;top-level environment&amp;gt;
env = { n: 9, v: &amp;lt;vector of size 9&amp;gt; } -&amp;gt; &amp;lt;top-level environment&amp;gt;
env = { n: 8, v: &amp;lt;vector of size 8&amp;gt; } -&amp;gt; &amp;lt;top-level environment&amp;gt;
env = { n: 7, v: &amp;lt;vector of size 7&amp;gt; } -&amp;gt; &amp;lt;top-level environment&amp;gt;
...
&lt;/pre&gt;

whereas an evlis tail-recursive interpreter would keep around
only the current environment.  Therefore, the properly tail-recursive
interpreter would require \(Θ(n^2)\) memory to
evaluate &lt;code&gt;(fact2 n)&lt;/code&gt; while the evlis tail-recursive
interpreter would require only \(Θ(n)\).&lt;/div&gt;
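
&lt;p&gt;The quadratic bound comes from summing the sizes of the vectors
held by the live environments.  As a rough count, assuming each
environment costs space proportional to the size of its vector \(v\)
and that the recursion bottoms out at \(n = 1\):

\[
\sum_{k=1}^{n} k = \frac{n(n+1)}{2} = Θ(n^2) \text{,}
\]

whereas keeping only the current environment costs at most \(Θ(n)\)
for the largest vector, plus \(Θ(n)\) for the pending
multiplications.&lt;/p&gt;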

&lt;p&gt;Studying examples like the one above enabled me to finally
understand how evlis tail recursion worked and what sort of savings it
gives.  However, I have yet to find a practical example where evlis
tail recursion gives the same sort of asymptotic gains as described
above, and I&apos;d be interested in seeing one.  But perhaps the &quot;large
gains&quot; mentioned in the various papers describing it are only
constant-factor reductions in memory.&lt;/p&gt;

&lt;p&gt;In any case, another important difference in Scheme between proper
tail recursion and evlis tail recursion is that the former is
a &lt;em&gt;language feature&lt;/em&gt; and the latter is
an &lt;em&gt;optimization&lt;/em&gt;.  That means that it is acceptable and even
encouraged to write Scheme programs that take advantage of proper tail
recursion, but it would be unwise to rely on evlis tail recursion for
the asymptotic performance of your function.  Instead, one should
treat it just as a nice constant-factor reduction in memory.&lt;/p&gt;

&lt;p&gt;Note that it is easy to make evlis tail recursion &quot;smarter.&quot;  Since
Scheme doesn&apos;t specify the order of argument evaluation, an
interpreter could evaluate arguments to maximize the gains from evlis
tail recursion.  As an easy example, if we had switched the arguments
to &lt;code&gt;*&lt;/code&gt; in &lt;code&gt;fact&lt;/code&gt; above, making it
non-evlis-tail-recursive under left-to-right evaluation, a smart
interpreter could still choose to evaluate the recursive call
last.  A possible rule of thumb would be to pick a non-trivial
function call to evaluate last.&lt;/p&gt;

&lt;p&gt;To complete the picture, I will outline below the evaluation
function for a simple evlis tail-recursive Scheme interpreter in
JavaScript.  All of the sources I&apos;ve found describe it in terms of
compilers, so I think it&apos;ll be useful to have a reference
implementation for an interpreter.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;Let&apos;s say we already have a properly tail-recursive
  interpreter:&lt;sup&gt;&lt;a href=&quot;#fn6&quot; id=&quot;r6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;lang-javascript&quot;&gt;function evalExpr(expr, env) {
  // Fake tail calls with a while loop and continue.
  while (true) {
    // Symbols, constants, quoted expressions, and lambdas.
    if (isSimpleExpr(expr)) {
      // The only exit point.
      return evalSimpleExpr(expr, env);
    }
    // (if test conseq alt)
    if (isSpecialForm(expr, &apos;if&apos;)) {
      expr = evalExpr(expr[1], env) ? expr[2] : expr[3];
      continue;
    }
    // (set! var expr)
    if (isSpecialForm(expr, &apos;set!&apos;)) {
      env.set(expr[1], evalExpr(expr[2], env));
      expr = null;
      continue;
    }
    // (define var expr?)
    if (isSpecialForm(expr, &apos;define&apos;)) {
      env.define(expr[1], evalExpr(expr[2], env));
      expr = null;
      continue;
    }
    // (begin expr*)
    if (isSpecialForm(expr, &apos;begin&apos;)) {
      if (expr.length == 1) {
        expr = null;
        continue;
      }
      // Evaluate all but the last subexpression.
      for (var i = 1; i &amp;lt; expr.length - 1; ++i) {
        evalExpr(expr[i], env);
      }
      expr = expr[expr.length - 1];
      continue;
    }
    // (proc expr*)
    // Don&apos;t mutate expr; it may be a procedure body that is
    // evaluated again later.
    var proc = evalExpr(expr[0], env);
    var args = expr.slice(1).map(function(subExpr) { return evalExpr(subExpr, env); });
    // proc.run() returns its body in result.expr and the environment
    // in which to evaluate it (with all its arguments bound) in
    // result.env.
    var result = proc.run(args);
    expr = result.expr;
    // The only time when env is changed.
    env = result.env;
    continue;
  }
}&lt;/code&gt;&lt;/pre&gt;

  Then implementing evlis tail recursion requires only a few
  changes:

  &lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;lang-javascript&quot;&gt;function evalExpr(expr, env) {
  // This is a stack of procedures and their non-final arguments that
  // are waiting for their final argument to be evaluated.
  var pendingProcedureCalls = [];
  while (true) {
    if (isSimpleExpr(expr)) {
      expr = evalSimpleExpr(expr, env);
      // Discard calling environment.
      env = null;
      if (pendingProcedureCalls.length == 0) {
        // No pending procedure calls, so we&apos;re done (the only exit
        // point).
        return expr;
      }
      var args = pendingProcedureCalls.pop();
      var proc = args.shift();
      args.push(expr);
      var result = proc.run(args);
      expr = result.expr;
      // Change to new environment (the only time when env is
      // changed).
      env = result.env;
      continue;
    }
    ...
    // Everything else remains the same.
    ...
    // (proc expr*)
    var nonFinalSubExprs =
      expr.slice(0, -1).map(
        function(subExpr) { return evalExpr(subExpr, env); });
    pendingProcedureCalls.push(nonFinalSubExprs);
    // Evaluate the last subexpression as a tail call.
    expr = expr[expr.length - 1];
    continue;
  }
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;hr /&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;


&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] Assume a left-to-right evaluation order for now.
    &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn2&quot;&gt;[2] The function that takes a list of expressions, evaluates them,
    and returns the results as a list is traditionally
    called &lt;code&gt;evlis&lt;/code&gt;, hence the name of the optimization.
    &lt;a href=&quot;#r2&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn3&quot;&gt;[3] This assumes that the calling environment isn&apos;t
    stored somewhere else.
    &lt;a href=&quot;#r3&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn4&quot;&gt;[4] This was adapted from an example
    in &lt;a href=&quot;ftp://ftp.ccs.neu.edu/pub/people/will/tail.pdf&quot;&gt;Proper
    Tail Recursion and Space Efficiency&lt;/a&gt;.
    &lt;a href=&quot;#r4&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn5&quot;&gt;[5] Assume that the interpreter isn&apos;t smart enough to deduce that \(v\)
    can be optimized out since it&apos;s never used.
    &lt;a href=&quot;#r5&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;

  &lt;p id=&quot;fn6&quot;&gt;[6] Adapted from Peter Norvig&apos;s
    excellent &lt;a href=&quot;http://norvig.com/lispy.html&quot;&gt;&lt;code&gt;lis.py&lt;/code&gt;&lt;/a&gt;.
    &lt;a href=&quot;#r6&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;

</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/elementary-gaussian-proof</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/elementary-gaussian-proof"/>
    <title>An Elementary Way to Calculate the Gaussian Integral</title>
    <updated>2011-01-06T00:00:00-08:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;p&gt;
While reading &lt;a href=&quot;http://gowers.wordpress.com&quot;&gt;Timothy Gowers&apos;s blog&lt;/a&gt; I stumbled on
&lt;a href=&quot;http://gowers.wordpress.com/2007/10/04/when-are-two-proofs-essentially-the-same/#comment-239&quot;&gt;Scott Carnahan&apos;s comment&lt;/a&gt;
describing an elegant calculation of the Gaussian integral
\[
∫_{-∞}^{∞} e^{-x^2} \, dx = \sqrt{π}\text{.}
\]
I was so struck by its elementary character that I imagined what it
would be like written up, say, as an extra credit exercise in a
single-variable calculus class:
&lt;/p&gt;

&lt;div class=&quot;exercise&quot;&gt;
  &lt;span class=&quot;exercise&quot;&gt;Exercise 1.&lt;/span&gt;
  (&lt;span class=&quot;exercise-name&quot;&gt;The Gaussian integral&lt;/span&gt;.)  Let
  \[
  F(t) = ∫_0^t e^{-x^2} \, dx
  \text{, }\qquad
  G(t) = ∫_0^1 \frac{e^{-t^2 (1+x^2)}}{1+x^2} \, dx
  \text{,}
  \]
  and \(H(t) = F(t)^2 + G(t)\).

  &lt;ol class=&quot;exercise-list&quot;&gt;
  &lt;li&gt;Calculate \(H(0)\).&lt;/li&gt;

  &lt;li&gt;Calculate and simplify \(H&apos;(t)\).  What does this
    imply about \(H(t)\)?&lt;/li&gt;

  &lt;li&gt;Use part&amp;nbsp;b to calculate \(F(∞) =
    \displaystyle\lim_{t \to ∞} F(t)\).&lt;/li&gt;

  &lt;li&gt;Use part&amp;nbsp;c to calculate
    \[
    ∫_{-∞}^{∞} e^{-x^2} \, dx\text{.}
    \]&lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;

&lt;p&gt;
Although this is simpler than
&lt;a href=&quot;http://en.wikipedia.org/wiki/Gaussian_integral#Careful_proof&quot;&gt;the
  usual calculation of the Gaussian integral&lt;/a&gt;, for which careful
reasoning is needed to justify the use of polar coordinates, it seems
more like a
&lt;a href=&quot;http://en.wikipedia.org/wiki/Certificate_(complexity)&quot;&gt;certificate&lt;/a&gt;
than an actual
proof; you can convince yourself that the calculation is valid, but
you gain no insight into the reasoning that led up to it.&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;
&lt;/p&gt;
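
&lt;p&gt;For reference, here is a sketch of the verification, assuming that
differentiating \(G(t)\) under the integral sign is justified (which
the exercise leaves implicit).  By the fundamental theorem of calculus
and the substitution \(u = tx\),

\[
\begin{aligned}
H&apos;(t) &amp;= 2 F(t) e^{-t^2} - 2t ∫_0^1 e^{-t^2 (1+x^2)} \, dx \\
      &amp;= 2 e^{-t^2} ∫_0^t e^{-x^2} \, dx
         - 2 e^{-t^2} ∫_0^t e^{-u^2} \, du \\
      &amp;= 0 \text{,}
\end{aligned}
\]

so \(H(t) = H(0) = ∫_0^1 \frac{dx}{1+x^2} = \frac{π}{4}\).  Since
\(0 \le G(t) \le e^{-t^2}\), we have \(G(t) \to 0\) as \(t \to ∞\), so
\(F(∞)^2 = π/4\) and \(F(∞) = \sqrt{π}/2\).  Because \(e^{-x^2}\) is
even, doubling gives \(\sqrt{π}\).&lt;/p&gt;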

&lt;p&gt;
Fortunately, &lt;a href=&quot;http://gowers.wordpress.com/2007/10/04/when-are-two-proofs-essentially-the-same/#comment-243&quot;&gt;David Speyer&apos;s
  comment&lt;/a&gt; solves the mystery; \(G(t)\) falls out of doing the
integration in Cartesian coordinates over a triangular region.  Just
for kicks, here&apos;s what I imagine an exercise based on this method would
look like (this time for a multi-variable calculus class):
&lt;/p&gt;

&lt;div class=&quot;exercise&quot;&gt;
  &lt;span class=&quot;exercise&quot;&gt;Exercise 2.&lt;/span&gt;
  (&lt;span class=&quot;exercise-name&quot;&gt;The Gaussian integral in Cartesian coordinates&lt;/span&gt;.) Let
  \[
  A(t) = ∬\limits_{\triangle_t} e^{-(x^2+y^2)} \, dx \, dy
  \]
  where \(\triangle_t\) is the triangle with vertices \((0, 0)\), \((t,
  0)\), and \((t, t)\).
  &lt;!-- TODO(akalin): Draw a diagram for \triangle_t. --&gt;

  &lt;ol class=&quot;exercise-list&quot;&gt;
  &lt;li&gt;Use the substitution \(y = sx\) to reduce \(A(t)\) to a
    one-dimensional integral.&lt;/li&gt;

  &lt;li&gt;Use part&amp;nbsp;a to calculate \(A(∞) =
    \lim_{t \to ∞} A(t)\).&lt;/li&gt;

  &lt;li&gt;Use part&amp;nbsp;b to calculate
    \[
    ∫_{-∞}^{∞} e^{-x^2} \, dx\text{.}
    \]&lt;/li&gt;

  &lt;li&gt;Let
    \[
    F(t) = ∫_0^t e^{-x^2} \, dx
    \qquad\text{ and }\qquad
    G(t) = ∫_0^1 \frac{e^{-t^2 (1+x^2)}}{1+x^2} \, dx
    \text{.}
    \]
    Use part&amp;nbsp;a to relate \(F(t)\) to \(G(t)\).&lt;/li&gt;

  &lt;li&gt;Use part&amp;nbsp;d to derive a proof of part&amp;nbsp;c
    using only single-variable calculus.&lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
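
&lt;p&gt;A sketch of how parts a and d might go, assuming the order of
integration may be exchanged freely (the integrand is continuous and
bounded on the region).  Writing \(\triangle_t\) as
\(0 \le y \le x \le t\) and substituting \(y = sx\), \(dy = x \, ds\),

\[
\begin{aligned}
A(t) &amp;= ∫_0^t ∫_0^1 x e^{-x^2 (1+s^2)} \, ds \, dx \\
     &amp;= ∫_0^1 \left. -\frac{e^{-x^2 (1+s^2)}}{2 (1+s^2)}
          \right|_{x=0}^{x=t} \, ds \\
     &amp;= \frac{1}{2} ∫_0^1 \frac{1 - e^{-t^2 (1+s^2)}}{1+s^2} \, ds \\
     &amp;= \frac{π}{8} - \frac{G(t)}{2} \text{.}
\end{aligned}
\]

On the other hand, \(\triangle_t\) is half of the square \([0, t]^2\)
and the integrand is symmetric in \(x\) and \(y\), so
\(A(t) = F(t)^2/2\).  Equating the two gives \(F(t)^2 + G(t) = π/4\),
which is the constant value of \(H(t)\) from Exercise&amp;nbsp;1.&lt;/p&gt;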

&lt;hr /&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;


&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] Similar to proving \(\sum\limits_{i=0}^n i^3 =
      \frac{n^2(n+1)^2}{4}\) by induction. &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/parallelizing-flac-encoding</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/parallelizing-flac-encoding"/>
    <title>Parallelizing FLAC Encoding</title>
    <updated>2008-05-05T00:00:00-07:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;style type=&quot;text/css&quot; media=&quot;all&quot;&gt;
/*&lt;![CDATA[*/
table.benchmark-results,
table.benchmark-results tr,
table.benchmark-results th {
  border: 1px solid black;
}

table.benchmark-results {
  font-family: &quot;Arial&quot;, &quot;Helvetica&quot;, sans-serif;
}

table.benchmark-results th,
table.benchmark-results td {
padding: .2em .4em;
}
/*]]&gt;*/
&lt;/style&gt;

&lt;p&gt;One thing I&apos;ve noticed ever since getting a multi-core system
is that the reference FLAC encoder is not multi-threaded.  This isn&apos;t
a huge problem for most people, as you can simply encode multiple files
at the same time, but I usually rip my audio CDs into a single audio
file with a cue sheet instead of separate track files, so I am
usually encoding a single large audio file instead of multiple smaller
ones.  Even so, encoding a CD-length audio file takes under a minute,
but I thought it would be a fun and useful weekend project to see if I
could parallelize the simpler &lt;a href=&quot;http://flac.cvs.sourceforge.net/flac/flac/examples/c/encode/file/main.c?revision=1.2&amp;amp;view=markup&quot;&gt;example encoder&lt;/a&gt;.  The
&lt;a href=&quot;http://flac.sourceforge.net/format.html&quot;&gt;format specification&lt;/a&gt; indicates that input blocks are
encoded independently, which makes the problem &lt;a href=&quot;http://en.wikipedia.org/wiki/Embarrassingly_parallel&quot;&gt;embarrassingly
parallel&lt;/a&gt;, and trawling through the &lt;a href=&quot;http://www.mail-archive.com/flac-dev@xiph.org/msg00724.html&quot;&gt;FLAC
mailing lists&lt;/a&gt; reveals that no one has had either the time
or the inclination to look into it.&lt;/p&gt;

&lt;p&gt;However, I was able to write a multithreaded FLAC encoder that
achieves near-linear speedup with only minor hacks to the libFLAC API.
Here are some encode times on an 8-core 2.8 GHz Xeon 5400 for a 636 MB
wave file (some caveats are discussed below):&lt;/p&gt;

&lt;table class=&quot;benchmark-results&quot;&gt;
&lt;tr&gt;
&lt;th&gt;baseline&lt;/th&gt;&lt;td&gt;34.906s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;1 thread&lt;/th&gt;&lt;td&gt;31.424s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;2 threads&lt;/th&gt;&lt;td&gt;16.936s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;4 threads&lt;/th&gt;&lt;td&gt;10.173s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;8 threads&lt;/th&gt;&lt;td&gt;6.808s&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;I took the simple approach of sharding the input file into

&lt;var&gt;n&lt;/var&gt; roughly equal pieces and passing them to &lt;var&gt;n&lt;/var&gt;
encoder threads, assembling the output file from the &lt;var&gt;n&lt;/var&gt;
output buffers.  In general this is not a good way of partitioning the
workload as time is wasted if one shard takes significantly more time
to process but for my use case this isn&apos;t a significant problem.&lt;/p&gt;
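
&lt;p&gt;As a rough illustration of this sharding scheme, here is a minimal Python sketch that computes the byte range handed to each thread.  The function name &lt;code&gt;shard_ranges&lt;/code&gt; and the &lt;code&gt;align_bytes&lt;/code&gt; parameter are hypothetical and do not come from the encoder source; aligning shard boundaries to whole sample frames is likewise an assumption about what a real encoder would need.  Offsets are relative to the start of the sample data, after the WAV header.&lt;/p&gt;

```python
def shard_ranges(data_size, num_shards, align_bytes=4):
    """Return num_shards contiguous (start, end) byte ranges over data_size bytes.

    Every boundary is a multiple of align_bytes (hypothetical parameter;
    4 would be one 16-bit stereo sample frame).  The last shard absorbs
    the remainder, so it may be slightly larger than the others.
    """
    units = data_size // align_bytes          # whole aligned units available
    units_per_shard = units // num_shards     # rounded down
    ranges = []
    for i in range(num_shards):
        start = i * units_per_shard * align_bytes
        if i == num_shards - 1:
            end = units * align_bytes         # last shard takes the leftovers
        else:
            end = start + units_per_shard * align_bytes
        ranges.append((start, end))
    return ranges


print(shard_ranges(100, 3))  # [(0, 32), (32, 64), (64, 100)]
```

&lt;p&gt;With equal-sized shards the slowest shard determines the total time, which is the weakness noted above; a work queue of smaller pieces would balance better at the cost of more bookkeeping.&lt;/p&gt;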

&lt;div class=&quot;p&quot;&gt;The best way to share the input file among the encoding threads is to
map it into memory.  In fact, memory-mapped file I/O has so many
advantages in general that I&apos;m surprised at how little I see it used,
although it does have the disadvantage of requiring a bit more
bookkeeping.  Here is how I use it in my multithreaded encoder
(slightly paraphrased):

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;#include &amp;lt;fcntl.h&amp;gt; /* open() */
#include &amp;lt;string.h&amp;gt; /* memcmp() */
#include &amp;lt;sys/mman.h&amp;gt; /* mmap()/munmap() */
#include &amp;lt;sys/stat.h&amp;gt; /* fstat() */
#include &amp;lt;unistd.h&amp;gt; /* close() */

int main(int argc, char *argv[]) {
  int fdin;
  struct stat buf;
  char *bufin;

  fdin = open(argv[1], O_RDONLY);
  fstat(fdin, &amp;buf);
  bufin = mmap(NULL, buf.st_size, PROT_READ, MAP_SHARED, fdin, 0);

  /* The input file (passed in via argv[1]) is now mapped read-only to
     the memory region in bufin up to bufin + buf.st_size. */

  /* Note that you can work directly with the mapped input file
     instead of fread()ing the header into a buffer. */
  if((buf.st_size &amp;lt; WAV_HEADER_SIZE) ||
     memcmp(bufin, &quot;RIFF&quot;, 4) ||
     memcmp(bufin+8, &quot;WAVEfmt \020\000\000\000\001\000\002\000&quot;, 16) ||
     memcmp(bufin+32, &quot;\004\000\020\000data&quot;, 8)) {
    /* Invalid input file: print error and exit. */
  }

  for (i = 0; i &amp;lt; num_threads; ++i) {
    shard_infos[i].bufin = bufin + WAV_HEADER_SIZE + i * bytes_per_thread;
    /* bufsize for the last thread may be slightly larger. */
    shard_infos[i].bufsize = bytes_per_thread;
  }

  /* Spawn encode threads (each of which calls encode_shard() below),
     passing an element of shard_infos to each. */

  ...

  munmap(bufin, buf.st_size);
  close(fdin);
}

FLAC__bool encode_shard(struct shard_info *shard_info) {
  FLAC__StreamEncoder *encoder = FLAC__stream_encoder_new();

  ...

  /* The input file is paged in lazily as this function accesses
     bufin from shard_info-&gt;bufin. */
  FLAC__stream_encoder_process_interleaved(encoder,
                                           shard_info-&gt;bufin,
                                           shard_info-&gt;bufsize);

  ...

  FLAC__stream_encoder_delete(encoder);
}&lt;/code&gt;&lt;/pre&gt;

However, handling the output file is a bit trickier.  Since the
encoded FLAC data output by the threads varies in size, the offset at
which a shard&apos;s output belongs is not known until all previous shards
have been encoded.  A convenient and fast way to handle this is
to use asynchronous I/O: we simply wait for the encoding threads in
shard order and queue up a write request at the running offset after
each thread finishes.  Here is how I use the POSIX
asynchronous I/O API in my multithreaded encoder (again, slightly
paraphrased):

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;#include &amp;lt;aio.h&amp;gt; /* aio_*() */
#include &amp;lt;pthread.h&amp;gt; /* pthread_*() */
#include &amp;lt;string.h&amp;gt; /* memset() */

int main(int argc, char *argv[]) {
  int fdout;
  pthread_t threads[MAX_THREADS];
  struct aiocb aiocbs[MAX_THREADS];
  unsigned long byte_offset = 0;

  /* mmap input file in. */

  ...

  /* O_CREAT requires a mode argument. */
  fdout = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);

  /* Spawn encode threads passing an element of shard_infos to
     each. */

  ...

  /* Wait for each thread in sequence and queue up output writes. */

  /* We need to zero out any aiocb struct that we use before we fill
     in any members. */
  memset(aiocbs, 0, num_threads * sizeof(*aiocbs));
  for (i = 0; i &amp;lt; num_threads; ++i) {
    pthread_join(threads[i], NULL);
    aiocbs[i].aio_buf = shard_infos[i].bufout;
    aiocbs[i].aio_nbytes = shard_infos[i].bytes_written;
    aiocbs[i].aio_offset = byte_offset;
    aiocbs[i].aio_fildes = fdout;
    aio_write(&amp;aiocbs[i]);
    byte_offset += shard_infos[i].bytes_written;
  }

  /* Wait for all output writes to finish. */

  for (i = 0; i &amp;lt; num_threads; ++i) {
    const struct aiocb *aiocbp = &amp;aiocbs[i];
    aio_suspend(&amp;aiocbp, 1, NULL);
    aio_return(&amp;aiocbs[i]);
  }

  close(fdout);
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
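
&lt;p&gt;The overall flow of the two listings above (map the input, encode shards in parallel, write each result at a running offset in shard order) can be sketched compactly in Python.  This is only an illustration under stated assumptions: &lt;code&gt;zlib.compress&lt;/code&gt; stands in for the FLAC encoder and produces no valid FLAC data, synchronous &lt;code&gt;os.pwrite&lt;/code&gt; calls replace the POSIX asynchronous I/O requests, and the name &lt;code&gt;parallel_encode&lt;/code&gt; is hypothetical.  It assumes a non-empty input file on a Unix-like system.&lt;/p&gt;

```python
import mmap
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def parallel_encode(in_path, out_path, num_threads):
    """Compress in_path shard by shard into out_path; return per-shard sizes."""
    with open(in_path, "rb") as fin, \
            mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as mapped, \
            memoryview(mapped) as view:
        size = len(view)
        per_shard = size // num_threads
        # The last shard absorbs the remainder.
        bounds = [(i * per_shard,
                   size if i == num_threads - 1 else (i + 1) * per_shard)
                  for i in range(num_threads)]

        def encode(bound):
            # Slicing a memoryview does not copy the mapped data.
            with view[bound[0]:bound[1]] as chunk:
                return zlib.compress(chunk)  # stand-in for a FLAC encoder

        sizes = []
        offset = 0
        fdout = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            with ThreadPoolExecutor(max_workers=num_threads) as pool:
                futures = [pool.submit(encode, b) for b in bounds]
                # Wait in shard order: the offset of a shard is known once
                # all earlier shards have finished.
                for future in futures:
                    encoded = future.result()
                    os.pwrite(fdout, encoded, offset)
                    offset += len(encoded)
                    sizes.append(len(encoded))
        finally:
            os.close(fdout)
    return sizes
```

&lt;p&gt;Calling &lt;code&gt;parallel_encode(&quot;in.wav&quot;, &quot;out.bin&quot;, 8)&lt;/code&gt; would return the eight encoded shard sizes; the running sum of those sizes gives each write offset, just as &lt;code&gt;byte_offset&lt;/code&gt; does in the C listing.&lt;/p&gt;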

&lt;p&gt;The POSIX API is a bit unwieldy for this use case; ideally, there
would be a version of &lt;code&gt;aio_suspend()&lt;/code&gt; that would suspend the
calling process until &lt;em&gt;all&lt;/em&gt; of the specified requests have completed.
As it is now the simplest way is to loop through the requests as
above, especially since the maximum number of simultaneous
asynchronous I/O requests is usually quite small (16 on my system).&lt;/p&gt;

&lt;p&gt;Also, I found that the OS X implementation of &lt;code&gt;aio_write()&lt;/code&gt;
did not obey this part of the specified behavior:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre&gt;  If O_APPEND is set for aiocbp-&gt;aio_fildes, aio_write() operations append
  to the file in the same order as the calls were made.  If O_APPEND is not
  set for the file descriptor, the write operation will occur at the abso-
  lute position from the beginning of the file plus aiocbp-&gt;aio_offset.&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;but it was just as easy (and clearer) to explicitly set the correct
offset.&lt;/p&gt;

&lt;p&gt;I had to hack up libFLAC a bit to implement my multithreaded encoder.
I exposed the &lt;code&gt;update_metadata_()&lt;/code&gt; function to make it easy to write the
correct total number of samples in the metadata field and also to zero
out the min/max framesize fields.  I also exposed the
&lt;code&gt;FLAC__stream_encoder_set_do_md5()&lt;/code&gt; function (which should
have been exposed in the first place) so that I can turn off the writing of
the md5 field in the metadata.  Finally, I added the function
&lt;code&gt;FLAC__stream_encoder_set_current_frame_number()&lt;/code&gt; so that the
correct frame numbers are written at encode time.&lt;/p&gt;

&lt;p&gt;For comparison purposes I turn off md5 calculation in my multithreaded
encoder as well as the baseline one.  Since calling
&lt;code&gt;FLAC__stream_encoder_set_current_frame_number()&lt;/code&gt; causes
crashes with verification turned on, I also turn that off.  The numbers
above reflect that, so they underestimate the encode times of a production
multithreaded encoder.  However, the essential behavior
of the program shouldn&apos;t change much.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/parallelizing-flac-encoding-files/patch-libFLAC.in&quot;&gt;Here&lt;/a&gt; is a patch file for the &lt;a href=&quot;http://downloads.sourceforge.net/flac/flac-1.2.1.tar.gz?modtime=1189961849&amp;amp;big_mirror=0&quot;&gt;flac 1.2.1
source&lt;/a&gt; that implements the hacks I described
above.  &lt;a href=&quot;/parallelizing-flac-encoding-files/mt_encode.c&quot;&gt;Here&lt;/a&gt; is the source for my multithreaded FLAC
encoder.  I&apos;ve tested it with &lt;code&gt;i686-apple-darwin9-gcc-4.0.1&lt;/code&gt;

and &lt;code&gt;i686-apple-darwin9-gcc-4.2.1&lt;/code&gt; on Mac OS X.  I got the
above numbers compiling
&lt;code&gt;mt_encode.c&lt;/code&gt; with gcc 4.2.1 and the switches &lt;code&gt;-Wall
-Werror -g -O2 -ansi&lt;/code&gt;.&lt;/p&gt;


</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/bfpp</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/bfpp"/>
    <title>bfpp</title>
    <updated>2008-04-23T00:00:00-07:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;div class=&quot;p&quot;&gt;Okay, I lied; you can&apos;t &lt;em&gt;really&lt;/em&gt; embed &lt;a href=&quot;http://www.muppetlabs.com/~breadbox/bf/&quot;&gt;brainfuck&lt;/a&gt; in C++
but you can get pretty close.  Here is an example:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;#include &quot;bfpp.h&quot;

int main() {
  // Prints out factorial numbers in sequence.  Adapted from
  // http://www.hevanet.com/cristofd/brainfuck/factorial.b .
  bfpp
    * + + + + + + + + + + * * * + * + -- * * * + -- - -- &amp; &amp; &amp; &amp; &amp; -- +
    &amp; &amp; &amp; &amp; &amp; ++ * * -- -- - ++ * -- &amp; &amp; + * + * - ++ &amp; -- * + &amp; - ++ &amp;
    -- * + &amp; - -- * + &amp; - -- * + &amp; - -- * + &amp; - -- * + &amp; - -- * + &amp; - --
    * + &amp; - -- * + &amp; - -- * + &amp; - -- * -- - ++ * * * * + * + &amp; &amp; &amp; &amp; &amp; &amp;
    - -- * + &amp; - ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ * -- &amp; + * - ++ + * *
    * * * ++ &amp; &amp; &amp; &amp; &amp; -- &amp; &amp; &amp; &amp; &amp; ++ * * * * * * * -- * * * * * ++ + +
    -- - &amp; &amp; &amp; &amp; &amp; ++ * * * * * * - ++ + * * * * * ++ &amp; -- * + + &amp; - ++
    &amp; &amp; &amp; &amp; -- &amp; -- * + &amp; - ++ &amp; &amp; &amp; &amp; ++ * * -- - * -- - ++ + + + + + +
    -- &amp; + + + + + + + + * - ++ * * * * ++ &amp; &amp; &amp; &amp; &amp; -- &amp; -- * + * + &amp; &amp;
    - ++ * ! &amp; &amp; &amp; &amp; &amp; ++ * ! * * * * ++ 
  end_bfpp
}&lt;/code&gt;&lt;/pre&gt;

I call this variant &amp;ldquo;bfpp&amp;rdquo; as it has some pretty significant
differences from brainfuck.  First of all, some commands had to be
adapted; although &lt;code&gt;+&lt;/code&gt; and &lt;code&gt;-&lt;/code&gt; remain the same,

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;&amp;lt;&lt;/code&gt; and &lt;code&gt;&amp;gt;&lt;/code&gt; were changed to &lt;code&gt;&amp;amp;&lt;/code&gt; and
    &lt;code&gt;*&lt;/code&gt;,&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;.&lt;/code&gt; and &lt;code&gt;,&lt;/code&gt; were changed to &lt;code&gt;!&lt;/code&gt; and &lt;code&gt;~&lt;/code&gt;

    (mnemonic: &lt;code&gt;!&lt;/code&gt; contains &lt;code&gt;.&lt;/code&gt; within it and &lt;code&gt;~&lt;/code&gt;
    is kind of like a sideways &lt;code&gt;,&lt;/code&gt;),&lt;/li&gt;
  &lt;li&gt;and &lt;code&gt;[&lt;/code&gt; and &lt;code&gt;]&lt;/code&gt; were changed to &lt;code&gt;--&lt;/code&gt; and

    &lt;code&gt;++&lt;/code&gt; (mnemonic: &lt;code&gt;[&lt;/code&gt; and &lt;code&gt;]&lt;/code&gt; are the most
    complex brainfuck commands [to implement, at least] and so deserve to be mapped to the wider
    and more prominent operators).&lt;/li&gt;
&lt;/ul&gt;

This magic is made possible by the fact that brainfuck has exactly
eight commands and C++ has exactly eight overloadable symbolic unary
operators.  Add some macros to hide the C++ scaffolding behind some
delimiters and you have a convincing illusion of an embedded language.&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;/bfpp-files/bfpp.h&quot;&gt;bfpp.h&lt;/a&gt; implements a simple (&amp;lt;100 lines) bfpp interpreter and
the magic described above, and &lt;a href=&quot;/bfpp-files/bf2bfpp.c&quot;&gt;bf2bfpp.c&lt;/a&gt; is a
straightforward translator from brainfuck to bfpp.  Gotta love C++!&lt;/p&gt;


</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/longest-palindrome-linear-time</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/longest-palindrome-linear-time"/>
    <title>Finding the Longest Palindromic Substring in Linear Time</title>
    <updated>2007-11-28T00:00:00-08:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;style type=&quot;text/css&quot; media=&quot;all&quot;&gt;
/*&lt;![CDATA[*/
span.palind {
  color: red;
}
/*]]&gt;*/
&lt;/style&gt;

&lt;script&gt;
function trackOutboundLink(url) {
  ga(&apos;send&apos;, &apos;event&apos;, &apos;outbound&apos;, &apos;click&apos;, url, {
    &apos;hitCallback&apos;: function() { document.location = url; }
  });
}
&lt;/script&gt;

&lt;p&gt;Another &lt;a href=&quot;http://www.reddit.com/r/programming/comments/2dykz/finding_palindromes_repairing_endos_dna_and_the/&quot;
onclick=&quot;trackOutboundLink(&apos;http://programming.reddit.com/info/2dykz/comments/c2e7r0&apos;);
return false;&quot;&gt;interesting problem&lt;/a&gt; I stumbled across on reddit is
finding the longest substring of a given string that is a palindrome.
I
found &lt;a href=&quot;http://johanjeuring.blogspot.com/2007/08/finding-palindromes.html&quot;
onclick=&quot;trackOutboundLink(&apos;http://johanjeuring.blogspot.com/2007/08/finding-palindromes.html&apos;);
return false;&quot;&gt;the explanation on Johan Jeuring&apos;s blog&lt;/a&gt; somewhat
confusing and I had to spend some time poring over the Haskell code
(eventually rewriting it in Python) and walking through examples
before it &quot;clicked.&quot;  I haven&apos;t found any other explanations of the
same approach so hopefully my explanation below will help the next
person who is curious about this problem.&lt;/p&gt;

&lt;p&gt;Of course, the most naive solution would be to exhaustively examine
all \(n \choose 2\) substrings of the given \(n\)-length string, test each
one if it&apos;s a palindrome, and keep track of the longest one seen so
far.  This has complexity \(O(n^3)\), but we can easily do better by
realizing that a palindrome is centered on either a letter (for
odd-length palindromes) or a space between letters (for even-length
palindromes).  Therefore we can examine all \(2n + 1\) possible centers
and find the longest palindrome for that center, keeping track of the
overall longest palindrome.  This has complexity \(O(n^2)\).&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;It is not immediately clear that we can do better, but
if we&apos;re told that a \(Θ(n)\) algorithm exists we can infer that
the algorithm is most likely structured as an iteration through all
possible centers.  As an off-the-cuff first attempt, we can adapt the
above algorithm by keeping track of the current center and expanding
until we find the longest palindrome around that center, at which point
we consider the last letter (or space) of that palindrome as the
new center.  The algorithm (which isn&apos;t correct) looks like this
informally:

&lt;ol type=&quot;1&quot;&gt;
  &lt;li&gt;Set the current center to the first letter.&lt;/li&gt;
  &lt;li&gt;Loop while the current center is valid:
    &lt;ol type=&quot;a&quot;&gt;
      &lt;li&gt;Expand to the left and right simultaneously until we find
	the largest palindrome around this center.&lt;/li&gt;
      &lt;li&gt;If the current palindrome is bigger than the stored maximum
	one, store the current one as the maximum one.&lt;/li&gt;
      &lt;li&gt;Set the space following the current palindrome as the
	current center unless the two letters immediately surrounding
	it are different, in which case set the last letter of the
	current palindrome as the current center.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;Return the stored maximum palindrome.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

&lt;p&gt;This seems to work but it doesn&apos;t handle all cases: consider the
string &quot;abababa&quot;.  The first non-trivial palindrome we see is &quot;&lt;span
class=&quot;palind&quot;&gt;a&lt;/span&gt;|bababa&quot;, followed by &quot;&lt;span
class=&quot;palind&quot;&gt;aba&lt;/span&gt;|baba&quot;.  Considering the current space as the
center doesn&apos;t get us anywhere but considering the preceding letter
(the second &apos;a&apos;) as the center, we can expand to get &quot;&lt;span
class=&quot;palind&quot;&gt;ababa&lt;/span&gt;|ba&quot;.  From this state, considering the
current space again doesn&apos;t get us anywhere but considering the preceding
letter as the center, we can expand to get &quot;ab&lt;span
class=&quot;palind&quot;&gt;ababa&lt;/span&gt;|&quot;.  However, this is incorrect as the
longest palindrome is actually the entire string!  We can remedy this
case by changing the algorithm to try and set the new center to be one
before the end of the last palindrome, but it is clear that having a
fixed &quot;lookbehind&quot; doesn&apos;t solve the general case and anything more
than that will probably bump us back up to quadratic time.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;The key question is this: given the state from the example above,
&quot;&lt;span class=&quot;palind&quot;&gt;ababa&lt;/span&gt;|ba&quot;, what makes the second &apos;b&apos; so
special that it should be the new center?  To use another example, in
&quot;&lt;span class=&quot;palind&quot;&gt;abcbabcba&lt;/span&gt;|bcba&quot;, what makes the second
&apos;c&apos; so special that it should be the new center?  Hopefully, the
answer to this question will lead to the answer to the more important
question: once we stop expanding the palindrome around the current
center, how do we pick the next center?  To answer the first question,
first notice that the current palindromes in the above examples
themselves contain smaller non-trivial palindromes: &quot;ababa&quot; contains
&quot;aba&quot; and &quot;abcbabcba&quot; contains &quot;abcba&quot; which also contains &quot;bcb&quot;.
Then, notice that if we expand around the &quot;special&quot; letters, we get a
palindrome which shares a right edge with the current palindrome; that
is, &lt;em&gt;the longest palindromes around the special letters are proper
suffixes of the current palindrome&lt;/em&gt;.  With a little thought, we
can then answer the second question: &lt;em&gt;to pick the next center, take
the center of the longest palindromic proper suffix of the current
palindrome&lt;/em&gt;.  Our algorithm then looks like this:

&lt;ol type=&quot;1&quot;&gt;
  &lt;li&gt;Set the current center to the first letter.&lt;/li&gt;
  &lt;li&gt;Loop while the current center is valid:
    &lt;ol type=&quot;a&quot;&gt;
      &lt;li&gt;Expand to the left and right simultaneously until we find
	the largest palindrome around this center.&lt;/li&gt;
      &lt;li&gt;If the current palindrome is bigger than the stored maximum
	one, store the current one as the maximum one.&lt;/li&gt;
      &lt;li&gt;Find the maximal palindromic proper suffix of the current
	palindrome.&lt;/li&gt;
      &lt;li&gt;Set the center of the suffix found in step c as the current
	center and resume expanding from that suffix, since it is already
	known to be palindromic.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;Return the stored maximum palindrome.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

&lt;p&gt;However, unless step 2c can be done efficiently, it will cause the
algorithm to be superlinear.  Doing step 2c efficiently seems
impossible since we have to examine the entire current palindrome to
find the longest palindromic suffix unless we somehow keep track of
extra state as we progress through the input string.  Notice that the
longest palindromic suffix would by definition also be a palindrome of
the input string so it might suffice to keep track of every palindrome
that we see as we move through the string and hopefully, by the time
we finish expanding around a given center, we would know where all the
palindromes with centers lying to the left of the current one are.
However, if the longest palindromic suffix has a center to the right
of the current center, we would not know about it.  But we also have
at our disposal the very useful fact that &lt;em&gt;a palindromic proper
suffix of a palindrome has a corresponding dual palindromic proper
prefix&lt;/em&gt;.  For example, in one of our examples above, &quot;abcbabcba&quot;,
notice that &quot;abcba&quot; appears twice: once as a prefix and once as a
suffix.  Therefore, while we wouldn&apos;t know about all the palindromic
suffixes of our current palindrome, we would know about either it or
its dual.&lt;/p&gt;

&lt;p&gt;Another crucial realization is the fact that we don&apos;t have to keep
track of all the palindromes we&apos;ve seen.  To use the example
&quot;abcbabcba&quot; again, we don&apos;t really care about &quot;bcb&quot; that much, since
it&apos;s already contained in the palindrome &quot;abcba&quot;.  In fact, we only
really care about keeping track of the longest palindromes for a given
center or equivalently, the length of the longest palindrome for a
given center.  But this is simply a more general version of our
original problem, which is to find the longest palindrome around
&lt;em&gt;any&lt;/em&gt; center!  Thus, if we can keep track of this state
efficiently, maybe by taking advantage of the properties of
palindromes, we don&apos;t have to keep track of the maximal palindrome and
can instead figure it out at the very end.&lt;/p&gt;

&lt;p&gt;Unfortunately, we seem to be back where we started; the second
naive algorithm that we have is simply to loop through all possible
centers and for each one find the longest palindrome around that
center.  But our discussion has led us to a different incremental
formulation: given a current center, the longest palindrome around
that center, and a list of the lengths of the longest palindromes
around the centers to the left of the current center, can we figure
out the new center to consider and extend the list of longest
palindrome lengths up to that center efficiently?  For example, if we
have the state:&lt;/p&gt;

&lt;p&gt;&amp;lt;&quot;ab&lt;span class=&quot;palind&quot;&gt;a&lt;/span&gt;ba|??&quot;, [0, 1, 0, 3, 0, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]&amp;gt;&lt;/p&gt;

&lt;p&gt;where the highlighted letter is the current center, the vertical line
is our current position, the question marks represent unread
characters or unknown quantities, and the array represents the list
of longest palindrome lengths by center, can we get to the state:&lt;/p&gt;

&lt;p&gt;&amp;lt;&quot;aba&lt;span class=&quot;palind&quot;&gt;b&lt;/span&gt;a|??&quot;, [0, 1, 0, 3, 0, 5, 0, ?, ?, ?, ?, ?, ?, ?, ?]&amp;gt;&lt;/p&gt;

&lt;p&gt;and then to:&lt;/p&gt;

&lt;p&gt;&amp;lt;&quot;aba&lt;span class=&quot;palind&quot;&gt;b&lt;/span&gt;aba|&quot;, [0, 1, 0, 3, 0, 5, 0, 7, 0, 5, 0, 3, 0, 1, 0]&amp;gt;&lt;/p&gt;

&lt;p&gt;efficiently?  The crucial thing to notice is that the longest
palindrome lengths array (we&apos;ll call it simply the lengths array) in
the final state is palindromic since the original string is
palindromic.  In fact, the lengths array obeys a more general
property: &lt;em&gt;the longest palindrome &lt;var&gt;d&lt;/var&gt; places to the right
of the current center (the &lt;var&gt;d&lt;/var&gt;-right palindrome) is at least
as long as the longest palindrome &lt;var&gt;d&lt;/var&gt; places to the left of the current
center (the &lt;var&gt;d&lt;/var&gt;-left palindrome) if the &lt;var&gt;d&lt;/var&gt;-left
palindrome is completely contained in the longest palindrome around
the current center (the center palindrome), and it is of equal length
if the &lt;var&gt;d&lt;/var&gt;-left palindrome is not a prefix of the center
palindrome or if the center palindrome is a suffix of the entire
string&lt;/em&gt;.  This then implies that we can more or less fill in the
values to the right of the current center from the values to the left
of the current center.  For example, from [0, 1, 0, 3, 0, 5, ?, ?, ?,
?, ?, ?, ?, ?, ?] we can get to [0, 1, 0, 3, 0, 5, 0, &amp;ge;3?, 0,
&amp;ge;1?, 0, ?, ?, ?, ?].  This also implies that the first unknown
entry (in this case, &amp;ge;3?) should be the new center because it
means that the center palindrome is not a suffix of the input string
(i.e., we&apos;re not done) and that the &lt;var&gt;d&lt;/var&gt;-left palindrome is a
prefix of the center palindrome.&lt;/p&gt;
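
&lt;p&gt;The mirroring property can be checked by brute force.  The following Python sketch is only an illustration: the names &lt;code&gt;palindrome_lengths&lt;/code&gt; and &lt;code&gt;mirror_property_holds&lt;/code&gt; are hypothetical, and the lengths array is computed quadratically rather than with the linear-time method.  Here &lt;code&gt;bound&lt;/code&gt; is the length a palindrome &lt;var&gt;d&lt;/var&gt; places from the center must have to reach exactly to the edge of the center palindrome.&lt;/p&gt;

```python
def palindrome_lengths(seq):
    """Brute-force lengths array over the 2 * len(seq) + 1 centers.

    Odd indices are centered on letters, even indices on the spaces
    between (and around) letters.
    """
    n = len(seq)
    lengths = []
    for center in range(2 * n + 1):
        lo = center // 2          # start of the palindrome (inclusive)
        hi = lo + center % 2      # end of the palindrome (exclusive)
        while lo > 0 and n > hi and seq[lo - 1] == seq[hi]:
            lo -= 1
            hi += 1
        lengths.append(hi - lo)
    return lengths


def mirror_property_holds(seq):
    """Check the d-left / d-right relationship for every center of seq."""
    lengths = palindrome_lengths(seq)
    for c, center_len in enumerate(lengths):
        for d in range(1, center_len + 1):
            left = lengths[c - d]
            right = lengths[c + d]
            bound = center_len - d
            if left == bound:
                # The d-left palindrome is a prefix of the center
                # palindrome: the d-right one is at least as long.
                if bound > right:
                    return False
            elif right != min(left, bound):
                # Otherwise the value is copied over, capped at the edge
                # of the center palindrome.
                return False
    return True


print(palindrome_lengths("abababa"))
# [0, 1, 0, 3, 0, 5, 0, 7, 0, 5, 0, 3, 0, 1, 0]
print(mirror_property_holds("abcbabcbabcba"))  # True
```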

&lt;div class=&quot;p&quot;&gt;From these observations we can construct our final algorithm which
returns the lengths array, and from which it is easy to find the
longest palindromic substring:

&lt;ol type=&quot;1&quot;&gt;
  &lt;li&gt;Initialize the lengths array with one (unknown) entry per
  possible center.&lt;/li&gt;
  &lt;li&gt;Set the current center to the first center.&lt;/li&gt;
  &lt;li&gt;Loop while the current center is valid:
    &lt;ol type=&quot;a&quot;&gt;
      &lt;li&gt;Expand to the left and right simultaneously until we find
	the largest palindrome around this center.&lt;/li&gt;
      &lt;li&gt;Fill in the appropriate entry in the longest palindrome
	lengths array.&lt;/li&gt;
      &lt;li&gt;Iterate through the longest palindrome lengths array
	backwards and fill in the corresponding values to the right of
	the entry for the current center until an unknown value (as
	described above) is encountered.&lt;/li&gt;
      &lt;li&gt;Set the new center to the index of this unknown value.&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;Return the lengths array.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

&lt;p&gt;Note that at each step of the algorithm we&apos;re either incrementing
our current position in the input string or filling in an entry in the
lengths array.  Since the lengths array has size linear in the size of
the input string, the algorithm has worst-case linear running time.
Since given the lengths array we can find and return the longest
palindromic substring in linear time, a linear-time algorithm to find
the longest palindromic substring is the composition of these two
operations.&lt;/p&gt;
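
&lt;p&gt;The second of these two operations is short enough to sketch here.  This minimal Python example assumes a lengths array laid out as described above (one entry per center, &lt;code&gt;2 * len(seq) + 1&lt;/code&gt; entries in total); the function name &lt;code&gt;longest_palindrome_from_lengths&lt;/code&gt; is hypothetical.  It makes a single pass over the array, so it runs in linear time.&lt;/p&gt;

```python
def longest_palindrome_from_lengths(seq, lengths):
    """Return the longest palindromic substring of seq.

    lengths[i] is the length of the longest palindrome centered at
    center i, where odd i is the letter seq[i // 2] and even i is the
    space just before seq[i // 2].
    """
    # Index of the first maximal entry.
    best = max(range(len(lengths)), key=lengths.__getitem__)
    # The palindrome extends lengths[best] // 2 letters to the left of
    # position best // 2.
    start = best // 2 - lengths[best] // 2
    return seq[start:start + lengths[best]]


print(longest_palindrome_from_lengths("ababa", [0, 1, 0, 3, 0, 5, 0, 3, 0, 1, 0]))
# ababa
print(longest_palindrome_from_lengths("abba", [0, 1, 0, 1, 4, 1, 0, 1, 0]))
# abba
```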

&lt;div class=&quot;p&quot;&gt;Here is Python code that implements the above algorithm (although
it is closer to Johan Jeuring&apos;s Haskell implementation than to the
above description):

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;def fastLongestPalindromes(seq):
    &quot;&quot;&quot;
    Behaves identically to naiveLongestPalindromes (see below), but
    runs in linear time.
    &quot;&quot;&quot;
    seqLen = len(seq)
    l = []
    i = 0
    palLen = 0
    # Loop invariant: seq[(i - palLen):i] is a palindrome.
    # Loop invariant: len(l) &amp;gt;= 2 * i - palLen. The code path that
    # increments palLen skips the l-filling inner-loop.
    # Loop invariant: len(l) &amp;lt; 2 * i + 1. Any code path that
    # increments i past seqLen - 1 exits the loop early and so skips
    # the l-filling inner loop.
    while i &amp;lt; seqLen:
        # First, see if we can extend the current palindrome.  Note
        # that the center of the palindrome remains fixed.
        if i &amp;gt; palLen and seq[i - palLen - 1] == seq[i]:
            palLen += 2
            i += 1
            continue

        # The current palindrome is as large as it gets, so we append
        # it.
        l.append(palLen)

        # Now to make further progress, we look for a smaller
        # palindrome sharing the right edge with the current
        # palindrome.  If we find one, we can try to expand it and see
        # where that takes us.  At the same time, we can fill the
        # values for l that we neglected during the loop above. We
        # make use of our knowledge of the length of the previous
        # palindrome (palLen) and the fact that the values of l for
        # positions on the right half of the palindrome are closely
        # related to the values of the corresponding positions on the
        # left half of the palindrome.

        # Traverse backwards starting from the second-to-last index up
        # to the edge of the last palindrome.
        s = len(l) - 2
        e = s - palLen
        for j in range(s, e, -1):
            # d is the value l[j] must have in order for the
            # palindrome centered there to share the left edge with
            # the last palindrome.  (Drawing it out is helpful to
            # understanding why the - 1 is there.)
            d = j - e - 1

            # We check to see if the palindrome at l[j] shares a left
            # edge with the last palindrome.  If so, the corresponding
            # palindrome on the right half must share the right edge
            # with the last palindrome, and so we have a new value for
            # palLen.
            #
            # An exercise for the reader: in this place in the code you
            # might think that you can replace the == with &amp;gt;= to improve
            # performance.  This does not change the correctness of the
            # algorithm but it does hurt performance, contrary to
            # expectations.  Why?
            if l[j] == d:
                palLen = d
                # We actually want to go to the beginning of the outer
                # loop, but Python doesn&apos;t have loop labels.  Instead,
                # we use an else block corresponding to the inner
                # loop, which gets executed only when the for loop
                # exits normally (i.e., not via break).
                break

            # Otherwise, we just copy the value over to the right
            # side.  We have to bound l[j] because palindromes on the
            # left side could extend past the left edge of the last
            # palindrome, whereas their counterparts won&apos;t extend past
            # the right edge.
            l.append(min(d, l[j]))
        else:
            # This code is executed in two cases: when the for loop
            # isn&apos;t taken at all (palLen == 0) or the inner loop was
            # unable to find a palindrome sharing the left edge with
            # the last palindrome.  In either case, we&apos;re free to
            # consider the palindrome centered at seq[i].
            palLen = 1
            i += 1

    # We know from the loop invariant that len(l) &amp;lt; 2 * seqLen + 1, so
    # we must fill in the remaining values of l.

    # Obviously, the last palindrome we&apos;re looking at can&apos;t grow any
    # more.
    l.append(palLen)

    # Traverse backwards starting from the second-to-last index up
    # until we get l to size 2 * seqLen + 1. We can deduce from the
    # loop invariants we have enough elements.
    lLen = len(l)
    s = lLen - 2
    e = s - (2 * seqLen + 1 - lLen)
    for i in xrange(s, e, -1):
        # The d here uses the same formula as the d in the inner loop
        # above.  (Computes distance to left edge of the last
        # palindrome.)
        d = i - e - 1
        # We bound l[i] with min for the same reason as in the inner
        # loop above.
        l.append(min(d, l[i]))

    return l&lt;/code&gt;&lt;/pre&gt;

And here is a naive quadratic version for comparison:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;def naiveLongestPalindromes(seq):
    &quot;&quot;&quot;
    Given a sequence seq, returns a list l such that l[2 * i + 1]
    holds the length of the longest palindrome centered at seq[i]
    (which must be odd), l[2 * i] holds the length of the longest
    palindrome centered between seq[i - 1] and seq[i] (which must be
    even), and l[2 * len(seq)] holds the length of the longest
    palindrome centered past the last element of seq (which must be 0,
    as is l[0]).

    The actual palindrome for l[i] is seq[s:(s + l[i])] where s is i
    // 2 - l[i] // 2. (// is integer division.)

    Example:
    naiveLongestPalindromes(&apos;ababa&apos;) -&gt; [0, 1, 0, 3, 0, 5, 0, 3, 0, 1, 0]
    
    Runs in quadratic time.
    &quot;&quot;&quot;
    seqLen = len(seq)
    lLen = 2 * seqLen + 1
    l = []

    for i in xrange(lLen):
        # If i is even (i.e., we&apos;re on a space), this will produce e
        # == s.  Otherwise, we&apos;re on an element and e == s + 1, as a
        # single letter is trivially a palindrome.
        s = i // 2
        e = s + i % 2

        # Loop invariant: seq[s:e] is a palindrome.
        while s &amp;gt; 0 and e &amp;lt; seqLen and seq[s - 1] == seq[e]:
            s -= 1
            e += 1

        l.append(e - s)

    return l&lt;/code&gt;&lt;/pre&gt;

Note that this is not the only efficient solution to this problem:
building a suffix tree takes time linear in the length of the input
string, and you can use one to solve this problem. However, as Johan
also mentions, that is a much less direct and efficient solution than
this one.&lt;/div&gt;

&lt;hr /&gt;

&lt;p&gt;Like this post? Subscribe to
  &lt;!-- The image is 256x256, the center of the dot is 189 pixels from the
     top, and the radius of the dot is 24. Therefore, the dot is 43/256 =
     0.16796875 of the image height above the bottom.--&gt;
&lt;a href=&quot;feed/atom&quot;&gt;my feed &lt;img src=&quot;feed-icon.svg&quot; alt=&quot;RSS icon&quot; style=&quot;width: 1em; height: 1em; vertical-align: -0.16796875em;&quot; /&gt;&lt;/a&gt;

  or follow me on
  &lt;a href=&quot;https://twitter.com/fakalin&quot;&gt;Twitter &lt;img src=&quot;twitter-icon.svg&quot; alt=&quot;Twitter icon&quot; style=&quot;width: 1em; height: 1em;&quot; /&gt;&lt;/a&gt;.&lt;/p&gt;

</content>
  </entry>
  
  <entry>
    <id>https://www.akalin.com/number-theory-haskell-foray</id>
    <link type="text/html" rel="alternate" href="https://www.akalin.com/number-theory-haskell-foray"/>
    <title>A Foray into Number Theory with Haskell</title>
    <updated>2007-07-06T00:00:00-07:00</updated>
    <author>
  <name>Fred Akalin</name>
  <uri>https://www.akalin.com/</uri>
</author>
<rights>© Fred Akalin
2005–2021.
All rights reserved.</rights>

    <content type="html">&lt;script&gt;
// See https://github.com/Khan/KaTeX/issues/85 .
KaTeXMacros = {
  &quot;\\cfrac&quot;: &quot;\\dfrac{#1}{#2}\\kern-1.2pt&quot;,
};
&lt;/script&gt;

&lt;div class=&quot;p&quot;&gt;I encountered
&lt;a href=&quot;http://programming.reddit.com/info/216p9/comments&quot;&gt;an
interesting problem&lt;/a&gt; on reddit a few days ago which can be
paraphrased as follows:

&lt;blockquote&gt;&lt;p&gt;Find a perfect square \(s\) such that \(1597s + 1\) is also
  a perfect square.&lt;/p&gt;&lt;/blockquote&gt;
&lt;/div&gt;

&lt;p&gt;After reading the discussion about implementing a brute-force
algorithm to solve the problem and spending a futile half-hour or so
trying my hand at finding a better way, I saw that someone had noticed
that the problem was an instance
of &lt;a href=&quot;http://en.wikipedia.org/wiki/Pell%27s_equation&quot;&gt;Pell&apos;s
equation&lt;/a&gt;, which is known to have an elegant and fast solution;
indeed, he posted
a &lt;a href=&quot;http://programming.reddit.com/info/216p9/comments/c21dpn&quot;&gt;one-liner
in Mathematica&lt;/a&gt; solving the given problem. However, I wanted to try
coding up the solution myself, as the Mathematica solution, while
succinct, isn&apos;t very enlightening: the heavy lifting is already
done by a built-in function, and an arbitrary constant was used for this
particular instance of Pell&apos;s equation.&lt;/p&gt;

&lt;p&gt;Pell&apos;s equation is simply the
&lt;a href=&quot;http://en.wikipedia.org/wiki/Diophantine_equation&quot;&gt;Diophantine
  equation&lt;/a&gt; \(x^2 - dy^2 = 1\) for a given
\(d\)&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;r1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;; being Diophantine means
that all variables involved take on only integer values. (In our
original problem, \(d\) is 1597 and we are asked for \(y^2\).) The
solution involves finding the &lt;em&gt;continued fraction expansion&lt;/em&gt; of
\(\sqrt{d}\), finding the first &lt;em&gt;convergent&lt;/em&gt; of the expansion
that satisfies Pell&apos;s equation, and then generating all other
solutions from that
&lt;em&gt;fundamental solution&lt;/em&gt;. We rule out the trivial solution \(x =
1\), \(y = 0\), which also implies that if \(d\) is a perfect square then
there is no solution.&lt;/p&gt;

&lt;p&gt;A continued fraction is an expression of the form:
\[
  x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots\,}}}}
\]
where all \(a_i\) are integers and all but the
first one are positive.  The standard math notation for continued
fractions is quite unwieldy so from now on we&apos;ll use \(\left \langle
a_0; a_1, a_2, \dotsc \right \rangle\) instead of the above.&lt;/p&gt;
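
&lt;p&gt;As a small worked example (chosen here for illustration), the
finite continued fraction \(\left \langle 1; 2, 2, 2 \right \rangle\)
can be evaluated from the innermost term outwards:
\[
  1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2}}}
  = 1 + \cfrac{1}{2 + \cfrac{2}{5}}
  = 1 + \cfrac{5}{12}
  = \frac{17}{12}\text{.}
\]&lt;/p&gt;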

&lt;div class=&quot;p&quot;&gt;The theory of continued fractions is a rich and beautiful one but
  for now we&apos;ll just state a few facts:

  &lt;ul&gt;
    &lt;li&gt;The continued fraction expansion of a number is (mostly) unique.&lt;/li&gt;
    &lt;li&gt;The continued fraction expansion of a rational number is
      finite.&lt;/li&gt;
    &lt;li&gt;The continued fraction expansion of an irrational number is
      infinite.&lt;/li&gt;
    &lt;li&gt;A &lt;a href=&quot;http://en.wikipedia.org/wiki/Quadratic_surd&quot;&gt;quadratic
      surd&lt;/a&gt; is a number of the form \(\frac{a + \sqrt{b}}{c}\)
      where
      \(a\), \(b\), and \(c\) are integers.  Except
      maybe for the first term, the continued fraction expansion of a
      quadratic surd is periodic; that is, it repeats forever after a
      certain number of terms. This applies in particular to the square root
      of an integer.&lt;/li&gt;
    &lt;li&gt;Truncating an infinite continued fraction to get a finite
      continued fraction gives (in some sense) an optimal rational
      approximation to the irrational number represented by the infinite
      continued fraction.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Given a quadratic surd it is fairly easy to manipulate it into the
form \(a + \frac{1}{q}\) where \(q\) is another quadratic surd. This fact
can be used to come up with an algorithm to find the continued
fraction expansion of a square
root. Wikipedia &lt;a href=&quot;http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Continued_fraction_expansion&quot;&gt;explains
it pretty well&lt;/a&gt; so I won&apos;t go over it, but here is my Haskell
implementation:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-haskell&quot;&gt;sqrt_continued_fraction n = [ a_i | (_, _, a_i) &amp;lt;- mdas ]
    where
      mdas = iterate get_next_triplet (m_0, d_0, a_0)

      m_0 = 0
      d_0 = 1
      a_0 = truncate $ sqrt $ fromIntegral n

      get_next_triplet (m_i, d_i, a_i) = (m_j, d_j, a_j)
          where
            m_j = d_i * a_i - m_i
            d_j = (n - m_j * m_j) `div` d_i
            a_j = (a_0 + m_j) `div` d_j&lt;/code&gt;&lt;/pre&gt;

and here are some examples:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;Prelude Main&gt; take 20 $ sqrt_continued_fraction 2
[1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2]

Prelude Main&gt; take 20 $ sqrt_continued_fraction 103
[10,6,1,2,1,1,9,1,1,2,1,6,20,6,1,2,1,1,9,1]

Prelude Main&gt; take 20 $ sqrt_continued_fraction 36
[6,*** Exception: divide by zero&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;(Note that we&apos;re assuming that we won&apos;t be called with a perfect
square. Also, do you notice anything interesting about the periodic
portion of the continued fractions, particularly of \(\sqrt{103}\)?)&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;For those who are unfamiliar with Haskell, here&apos;s a quick list of key facts:

  &lt;ul&gt;
    &lt;li&gt;The first line takes a list of triplets and forms a list of all
      third elements, which is what we&apos;re interested in. (The other two
      elements of the triplet are auxiliary variables used by the
      algorithm.)&lt;/li&gt;
    &lt;li&gt;&lt;code&gt;iterate&lt;/code&gt; is a function which takes in another
      function &lt;code&gt;f&lt;/code&gt; and an initial value &lt;code&gt;x&lt;/code&gt;, and
      returns the infinite list &lt;code&gt;[ x, f(x), f(f(x)), f(f(f(x))),
  ... ]&lt;/code&gt;.&lt;/li&gt;
    &lt;li&gt;Note that Haskell
      uses &lt;a href=&quot;http://en.wikipedia.org/wiki/Lazy_evaluation&quot;&gt;lazy
      evaluation&lt;/a&gt; and so this function does not take an infinite amount
      of time to run; the elements of the list are evaluated (and memoized)
      only when needed.&lt;/li&gt;
    &lt;li&gt;The rest of the function is a straightforward representation of
      the meat of the algorithm described in the above Wikipedia entry.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/div&gt;
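
&lt;div class=&quot;p&quot;&gt;The divide-by-zero exception for perfect squares shown
above can be avoided with a small guard. The following is only a sketch
and not part of the original program: the
names &lt;code&gt;isqrt&lt;/code&gt;
and &lt;code&gt;safe_sqrt_continued_fraction&lt;/code&gt; are hypothetical, it
assumes a non-negative &lt;code&gt;Integer&lt;/code&gt; argument, and it swaps
the floating-point square root for an integer Newton iteration so that
large inputs are not rounded:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-haskell&quot;&gt;-- Hypothetical helper (not in the original post): floor of the square
-- root of a non-negative Integer, via Newton&apos;s method.
isqrt :: Integer -&gt; Integer
isqrt 0 = 0
isqrt n = go n
  where
    -- Replace x by (x + n `div` x) `div` 2 until it stops decreasing.
    go x
      | y &gt;= x    = x
      | otherwise = go y
      where
        y = (x + n `div` x) `div` 2

-- Hypothetical variant of sqrt_continued_fraction: returns the finite
-- expansion [a_0] when n is a perfect square.
safe_sqrt_continued_fraction :: Integer -&gt; [Integer]
safe_sqrt_continued_fraction n
  | a_0 * a_0 == n = [a_0]
  | otherwise      = [ a_i | (_, _, a_i) &amp;lt;- iterate get_next_triplet (0, 1, a_0) ]
  where
    a_0 = isqrt n

    get_next_triplet (m_i, d_i, a_i) = (m_j, d_j, a_j)
      where
        m_j = d_i * a_i - m_i
        d_j = (n - m_j * m_j) `div` d_i
        a_j = (a_0 + m_j) `div` d_j&lt;/code&gt;&lt;/pre&gt;

With this variant, a perfect square such as 36 yields the finite
expansion &lt;code&gt;[6]&lt;/code&gt;, while other inputs should yield the
same infinite list as before.&lt;/div&gt;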

&lt;p&gt;It may not be clear what \(\sqrt{d}\) and its continued fraction
expansion have to do with solving Pell&apos;s equation. However, notice that
if \(x\) and \(y\) solve Pell&apos;s equation then manipulating Pell&apos;s equation
to get \(\sqrt{d}\) on one side reveals that \(\frac{x}{y}\) is a good
approximation of \(\sqrt{d}\). In fact, it is so good that you can prove
that \(\frac{x}{y}\) &lt;em&gt;must&lt;/em&gt; come from truncating the continued
fraction expansion of \(\sqrt{d}\).&lt;/p&gt;
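
&lt;p&gt;To sketch that manipulation (assuming \(x\) and \(y\) are positive),
factor the left-hand side of Pell&apos;s equation and divide through:
\[
  (x - y\sqrt{d})(x + y\sqrt{d}) = 1
  \quad\Longrightarrow\quad
  \frac{x}{y} - \sqrt{d} = \frac{1}{y (x + y\sqrt{d})}\text{.}
\]
Since \(x \gt y\sqrt{d}\), the right-hand side is positive and smaller
than \(\frac{1}{2 y^2 \sqrt{d}} \lt \frac{1}{2 y^2}\), and a classical
theorem of Legendre says that any fraction \(\frac{x}{y}\) within
\(\frac{1}{2 y^2}\) of an irrational number is one of its
convergents.&lt;/p&gt;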

&lt;p&gt;This leads us to the following: if you have an infinite continued
fraction \(\left \langle a_0; a_1, a_2, \dotsc \right \rangle\) you can
truncate it into a finite continued fraction \(\left \langle a_0; a_1,
a_2, \dotsc, a_i \right \rangle\) and simplify it into the rational
number \(\frac{p_i}{q_i}\).  The sequence \(\frac{p_0}{q_0},
\frac{p_1}{q_1}, \frac{p_2}{q_2}, \dotsc\) forms the
&lt;a href=&quot;http://en.wikipedia.org/wiki/Convergent_%28continued_fraction%29&quot;&gt;&lt;em&gt;convergents&lt;/em&gt;&lt;/a&gt;
of \(\left \langle a_0; a_1, a_2, \dotsc \right \rangle\) and converges to
the irrational number it represents.&lt;/p&gt;

&lt;div class=&quot;p&quot;&gt;It turns out you can calculate \(p_{i+1}\) and \(q_{i+1}\)
efficiently from \(p_i\), \(q_i\), \(p_{i-1}\), \(q_{i-1}\), and \(a_{i+1}\)
using
the &lt;a href=&quot;http://en.wikipedia.org/wiki/Fundamental_recurrence_formulas&quot;&gt;&lt;em&gt;fundamental
recurrence formulas&lt;/em&gt;&lt;/a&gt; (which can be proved by induction). Here
is my Haskell implementation:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-haskell&quot;&gt;get_convergents (a_0 : a_1 : as) = pqs
    where
      pqs = (p_0, q_0) : (p_1, q_1) :
            zipWith3 get_next_convergent pqs (tail pqs) as

      p_0 = a_0
      q_0 = 1

      p_1 = a_1 * a_0 + 1
      q_1 = a_1

      get_next_convergent (p_i, q_i) (p_j, q_j) a_k = (p_k, q_k)
          where
            p_k = a_k * p_j + p_i
            q_k = a_k * q_j + q_i&lt;/code&gt;&lt;/pre&gt;

and some more examples:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;Prelude Main&gt; take 8 $ get_convergents $ sqrt_continued_fraction 2
[(1,1),(3,2),(7,5),(17,12),(41,29),(99,70),(239,169),(577,408)]

Prelude Main&gt; take 8 $ get_convergents $ sqrt_continued_fraction 103
[(10,1),(61,6),(71,7),(203,20),(274,27),(477,47),(4567,450),(5044,497)]

Prelude Main&gt; take 8 $ get_convergents $ sqrt_continued_fraction 1597
[(39,1),(40,1),(1039,26),(1079,27),(2118,53),(3197,80),(27694,693),(113973,2852)]

Prelude Main&gt; let divFrac (x, y) = (fromInteger x) / (fromInteger y)

Prelude Main&gt; take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 2
[1.0,1.5,1.4,1.4166666666666667,1.4137931034482758,1.4142857142857144,1.4142011834319526,1.4142156862745099]

Prelude Main&gt; take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 103
[10.0,10.166666666666666,10.142857142857142,10.15,10.148148148148149,10.148936170212766,10.148888888888889,10.148893360160965]

Prelude Main&gt; take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 1597
[39.0,40.0,39.96153846153846,39.96296296296296,39.9622641509434,39.9625,39.96248196248196,39.9624824684432]&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Here are a few more quick facts to help those unfamiliar with
  Haskell:

  &lt;ul&gt;
    &lt;li&gt;The expression &lt;code&gt;a : as&lt;/code&gt; forms a new list from the
      element &lt;code&gt;a&lt;/code&gt; and the existing list &lt;code&gt;as&lt;/code&gt;
      (equivalent to &lt;code&gt;cons&lt;/code&gt; in Lisp).&lt;/li&gt;
    &lt;li&gt;&lt;code&gt;zipWith3&lt;/code&gt; is a function that takes in a
      function &lt;code&gt;f&lt;/code&gt; and three (possibly infinite)
      lists &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;,
      and &lt;code&gt;c&lt;/code&gt;, and forms the new list
      &lt;code&gt;[ f(a[0], b[0], c[0]), f(a[1], b[1], c[1]), ... ]&lt;/code&gt;,
      stopping as soon as the shortest of the three lists runs out.&lt;/li&gt;
    &lt;li&gt;Note that the result of &lt;code&gt;zipWith3&lt;/code&gt; is part of the
      variable &lt;code&gt;pqs&lt;/code&gt; which itself appears (twice!) in the
      arguments to &lt;code&gt;zipWith3&lt;/code&gt;. This is a Haskell idiom and
      reflects the fact that the recurrence formulas define a convergent
      in terms of its two previous convergents. A simpler example (using
      the Fibonacci sequence) can be found in the
      &lt;a href=&quot;http://en.wikipedia.org/wiki/Lazy_evaluation&quot;&gt;Wikipedia
        entry for lazy evaluation&lt;/a&gt;.&lt;/li&gt;
    &lt;li&gt;Haskell has a built-in data type for integers of arbitrary size
      (&lt;code&gt;Integer&lt;/code&gt;), which is necessary as the numerators and
      denominators of the convergents get large quickly. In fact, Haskell
      also has a data type for rational numbers (represented as fractions)
      but it doesn&apos;t help us much here.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/div&gt;
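
&lt;p&gt;For reference, the recurrence
that &lt;code&gt;get_next_convergent&lt;/code&gt; implements can be written as
\[
  p_{i+1} = a_{i+1} p_i + p_{i-1}, \qquad
  q_{i+1} = a_{i+1} q_i + q_{i-1}\text{.}
\]
For example, \(\sqrt{2} = \left \langle 1; 2, 2, \dotsc \right \rangle\)
has \(\frac{p_0}{q_0} = \frac{1}{1}\) and \(\frac{p_1}{q_1} =
\frac{3}{2}\), so \(p_2 = 2 \cdot 3 + 1 = 7\) and \(q_2 = 2 \cdot 2 + 1 =
5\), matching the third convergent in the output above.&lt;/p&gt;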

&lt;div class=&quot;p&quot;&gt;Since we are guaranteed that some convergent eventually satisfies
  Pell&apos;s equation, we can write a simple function to generate all
  convergents, test each one to see if it satisfies Pell&apos;s equation,
  and return the first one we see. Here is the Haskell implementation:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-haskell&quot;&gt;get_pell_fundamental_solution n = head $ solutions
    where
      solutions = [ (p, q) | (p, q) &amp;lt;- convergents, p * p - n * q * q == 1 ]

      convergents = get_convergents $ sqrt_continued_fraction n&lt;/code&gt;&lt;/pre&gt;

Note the use of
  Haskell&apos;s &lt;a href=&quot;http://en.wikipedia.org/wiki/List_comprehension&quot;&gt;list
  comprehension&lt;/a&gt; syntax, similar to Python&apos;s, which expresses what I
just described in a manner reminiscent of set notation.&lt;/div&gt;

&lt;div class=&quot;p&quot;&gt;Here is the full Haskell program designed so its output may be
  conveniently piped
  to &lt;a href=&quot;http://en.wikipedia.org/wiki/Bc_programming_language&quot;&gt;bc&lt;/a&gt;
  for verification:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-haskell&quot;&gt;module Main where

import System.Environment (getArgs)

sqrt_continued_fraction :: (Integral a) =&gt; a -&gt; [a]
{- ... the sqrt_continued_fraction function explained above ... -}

get_convergents :: (Integral a) =&gt; [a] -&gt; [(a, a)]
{- ... the get_convergents function explained above ... -}

get_pell_fundamental_solution :: (Integral a) =&gt; a -&gt; (a, a)
{- ... the get_pell_fundamental_solution function explained above ... -}

main :: IO ()
main = do
  -- The first command-line argument is d.
  args &amp;lt;- getArgs
  let d      = (read $ head $ args :: Integer)
      (p, q) = get_pell_fundamental_solution d in
    putStr $ &quot;d = &quot; ++ (show d) ++ &quot;\n&quot; ++
             &quot;p = &quot; ++ (show p) ++ &quot;\n&quot; ++
             &quot;q = &quot; ++ (show q) ++ &quot;\n&quot; ++
             &quot;p^2 - d * q^2 == 1\n&quot;&lt;/code&gt;&lt;/pre&gt;

and here it is in action:

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-shell&quot;&gt;$ ./solve_pell 1597
d = 1597
p = 519711527755463096224266385375638449943026746249
q = 13004986088790772250309504643908671520836229100
p^2 - d * q^2 == 1&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The solution to the original problem is therefore \(s = q^2\), the
square of the 47-digit value of \(q\) printed above (a 93-digit
number).&lt;/p&gt;

&lt;p&gt;Now that we&apos;ve found a method to get &lt;em&gt;a&lt;/em&gt; solution, the
question remains as to whether it&apos;s the only one. In fact it is not,
but it is the minimal one, and all other solutions (of which there are
an infinite number) can be generated from this fundamental one with a
simple recurrence relation as described on
the &lt;a href=&quot;http://en.wikipedia.org/wiki/Pell%27s_equation#Solution_technique&quot;&gt;Wikipedia
article&lt;/a&gt;. My program above can be easily extended to generate all
solutions instead of just the fundamental one (I&apos;ll leave it to the
reader as an exercise).&lt;/p&gt;
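
&lt;div class=&quot;p&quot;&gt;For readers who want to check their answer, here is one
possible sketch of that extension. It is not part of the original
program: the name &lt;code&gt;get_pell_solutions&lt;/code&gt; is hypothetical,
and it assumes you pass in \(d\) together with its fundamental solution
\((x_1, y_1)\). Each step multiplies \(x_k + y_k \sqrt{d}\) by \(x_1 +
y_1 \sqrt{d}\):

&lt;pre class=&quot;code-container&quot;&gt;&lt;code class=&quot;language-haskell&quot;&gt;-- Hypothetical helper (not in the original program): given d and the
-- fundamental solution (x_1, y_1), lazily lists the positive solutions
-- of x^2 - d * y^2 == 1 in increasing order.
get_pell_solutions :: Integer -&gt; (Integer, Integer) -&gt; [(Integer, Integer)]
get_pell_solutions d (x_1, y_1) = iterate get_next_solution (x_1, y_1)
  where
    -- Multiplies (x_k + y_k * sqrt d) by (x_1 + y_1 * sqrt d).
    get_next_solution (x_k, y_k) = (x_1 * x_k + d * y_1 * y_k,
                                    x_1 * y_k + y_1 * x_k)&lt;/code&gt;&lt;/pre&gt;

For \(d = 2\) with fundamental solution \((3, 2)\), the first three
results are \((3, 2)\), \((17, 12)\), and \((99, 70)\), all of which
appear among the convergents of \(\sqrt{2}\) listed earlier.&lt;/div&gt;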

&lt;p&gt;One remaining question is the efficiency of this algorithm. For
  simplicity, let&apos;s neglect the cost of the arbitrary-precision
  arithmetic involved and assume that the incremental cost of generating
  each term of the continued fraction expansion and the convergents is
  constant. Then the main cost is just how many convergents we have to
  generate before we find one that satisfies Pell&apos;s equation. In fact,
  it turns out that this depends on the length of the period of the
  continued fraction expansion of \(\sqrt{d}\), which has a rough upper
  bound of \(O(\sqrt{d} \ln d)\). Therefore, the cost of solving Pell&apos;s
  equation (in terms of how many convergents to generate) for a given
  \(n\)-bit number is \(O(n 2^{n/2})\). This is pretty expensive already,
  although it&apos;s still much better than brute-force search (which is on
  the order of exponentiating the above expression). Can we do better?
  Well, sort of; it turns out the length of the answer is of the same
  order as the expression above, so any algorithm that explicitly
  outputs a solution necessarily takes that long. However, if you can
  somehow factor \(d\) into \(s d&apos;\), where \(s\) is a perfect square and \(d&apos;\)
  is &lt;a href=&quot;http://en.wikipedia.org/wiki/Squarefree&quot;&gt;squarefree&lt;/a&gt;
  (i.e., not divisible by any perfect square other than 1), then you can
  solve Pell&apos;s equation for the smaller number \(d&apos;\) and output the
  solution for \(d\) as the smaller fundamental solution and an
  expression raised to a certain power involving it. Note that in general
  this involves factoring \(d\), another hard problem, but one for which
  there exist tons of prior work. An interested reader can peruse the papers
  by &lt;a href=&quot;http://www.ams.org/notices/200202/fea-lenstra.pdf&quot;&gt;Lenstra&lt;/a&gt;
  and &lt;a href=&quot;http://www.math.nyu.edu/~crorres/Archimedes/Cattle/cattle_vardi.pdf&quot;&gt;Vardi&lt;/a&gt;
  for more details.&lt;/p&gt;
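
&lt;p&gt;To make that reduction concrete, write \(s = k^2\), so that \(d = k^2
  d&apos;\). Then
  \[
    x^2 - d y^2 = x^2 - d&apos; (k y)^2\text{,}
  \]
  so \((x, y)\) solves Pell&apos;s equation for \(d\) exactly when \((x, k y)\)
  solves it for \(d&apos;\). Every positive solution \((x_m, y_m)\) for \(d&apos;\)
  comes from a power of its fundamental solution \((x_1, y_1)\),
  \[
    x_m + y_m \sqrt{d&apos;} = (x_1 + y_1 \sqrt{d&apos;})^m\text{,}
  \]
  and the fundamental solution for \(d\) corresponds to the smallest \(m\)
  for which \(k\) divides \(y_m\). As a small illustration (an example
  chosen here), \(d = 8 = 2^2 \cdot 2\) gives \(d&apos; = 2\) and \(k = 2\);
  the fundamental solution \((3, 2)\) for \(d&apos; = 2\) already has \(y_1\)
  divisible by \(2\), so \((3, 1)\) solves \(x^2 - 8 y^2 = 1\).&lt;/p&gt;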

&lt;p&gt;As a final note, one of the things I really like about number
  theory is that investigating such a simple problem can lead you down
  surprising avenues of mathematics and computational theory. In fact,
  I&apos;ve had to omit a lot of things I had planned to say to avoid making
  this entry longer than it already is. Hopefully, this entry
  helps someone else learn more about this interesting corner of number
  theory.&lt;/p&gt;



&lt;section class=&quot;footnotes&quot;&gt;
  &lt;header&gt;
    &lt;h2&gt;Footnotes&lt;/h2&gt;
  &lt;/header&gt;

  &lt;p id=&quot;fn1&quot;&gt;[1] As a rule we&apos;ll avoid considering trivial cases and
      re-stating obvious assumptions (like \(d\) having to be a positive
      integer). &lt;a href=&quot;#r1&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
</content>
  </entry>
  
 
</feed>
