Fred Akalin
Notes on math, tech, and everything in between
2017-08-09T20:29:24-07:00
https://www.akalin.com/
© Fred Akalin
2005–2017.
All rights reserved.
https://www.akalin.com/quintic-unsolvability
Why is the Quintic Unsolvable?
2016-09-26T00:00:00-07:00
<!-- TODO: Use \dotsc when it is supported by KaTeX. -->
<link rel="stylesheet" type="text/css" href="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/jsxgraph.css" />
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jsxgraph/0.99.5/jsxgraphcore.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/complex.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/complex_poly.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/animation.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/rotation_counter.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/display.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/complex_formula.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/quadratic.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/cubic.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/quartic.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/abel-ruffini-topological-proof/1e2557c/quintic.js"></script>
<!-- KaTeX messes up axes labels, for some reason, so remember to surround a
jxgbox div with <nokatex></nokatex>. -->
<style>
.graph {
display: block;
width: 300px;
height: 300px;
margin: 0.5em 0.2em;
}
.graph-container {
display: inline-block;
vertical-align: top;
max-width: 300px;
}
button.interactive-example-button {
margin: 2.5px;
padding: 5px;
}
</style>
<section>
<header>
<h2>Overview</h2>
</header>
<p>In this article, I hope to convince you that the quintic equation
is unsolvable, in the sense that I can’t write down the solution
to the equation
\[
ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0
\]
using only addition, subtraction, multiplication, division, raising
to an integer power, and taking an integer root. In fact, I hope to
go further and explain how this is true for the same reason
that I can’t write down the solution to the equation
\[
ax^2 + bx + c = 0
\]
using only the first five operations above!</p>
<p>The usual approach to the above claim involves a semester’s
worth of abstract algebra and Galois theory. However, there’s
a much easier and shorter proof which involves only a bit of group
theory and complex analysis—enough to fit in a blog
post—and some interactive
visualizations.<sup><a href="#fn1" id="r1">[1]</a></sup></p>
</section>
<section>
<header>
<h2>Quadratic Equations</h2>
</header>
<p>Let’s start with quadratic equations, which hopefully you all
remember from high school. Given two complex numbers \(r_1\) and
\(r_2\), you can determine the quadratic equation whose solutions are
\(r_1\) and \(r_2\), namely
\[
(x - r_1)(x - r_2) = x^2 - (r_1 + r_2) x + r_1 r_2 = 0\text{.}
\]
If we take the standard form of a quadratic equation to be
\[
a x^2 + bx + c = 0\text{,}
\]
then we can define a function from \(r_1\) and \(r_2\) to \(a\), \(b\),
and \(c\), which is shown by the first two panels in the visualization below;
drag either of the points \(r_1\) and \(r_2\) and notice how \(b\) and
\(c\) move (\(a\) will always remain fixed at \(1\)).</p>
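<p>The map from roots to coefficients can be written down directly. The
following is a minimal sketch, not the code behind the visualization; the
helper names <code>cadd</code>, <code>cmul</code>, <code>cneg</code>, and
<code>quadraticCoefficients</code> are hypothetical, and complex numbers are
assumed to be plain <code>{re, im}</code> objects.</p>

```javascript
'use strict';
// Hypothetical helpers (not from the article's complex.js): a complex
// number is a plain object {re, im}.
function cadd(z, w) { return {re: z.re + w.re, im: z.im + w.im}; }
function cmul(z, w) {
  return {re: z.re * w.re - z.im * w.im, im: z.re * w.im + z.im * w.re};
}
function cneg(z) { return {re: -z.re, im: -z.im}; }

// Coefficients of (x - r1)(x - r2) = x^2 - (r1 + r2) x + r1 r2.
function quadraticCoefficients(r1, r2) {
  return {
    a: {re: 1, im: 0},
    b: cneg(cadd(r1, r2)),
    c: cmul(r1, r2)
  };
}

var r1 = {re: 1, im: 0};
var r2 = {re: 0, im: 1};
var coeffs = quadraticCoefficients(r1, r2);
// Listing the roots in the other order gives the same coefficients.
var swapped = quadraticCoefficients(r2, r1);
console.log(coeffs.b, coeffs.c);    // b = -1 - i, c = i
console.log(swapped.b, swapped.c);  // the same values
```

<p>Note that the coefficients come out the same whichever order the roots are
listed in, which is the fact the rest of this section leans on.</p>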
<p>Now pretend that we misremember the quadratic formula as
\[
x_{1, 2} = \frac{-b \pm b^2 - 4ac}{4a}\text{.}
\]
The results of this formula—our candidate solution—are
shown in the third panel. Note that since \(x_1\) and \(x_2\) depend
on \(a\), \(b\), and \(c\), which all depend on \(r_1\) and \(r_2\),
they also move when you drag either \(r_1\) or \(r_2\).</p>
<div class="interactive-example">
<h3>Interactive Example 1: An incorrect quadratic formula</h3>
<div class="graph-container">
Roots
<nokatex><div id="rootBoardQuad1" class="graph jxgbox"></div></nokatex>
<button class="interactive-example-button quad1DisableWhileSwapping"
type="button" onclick="quad1.swap();">
Swap \(r_1\) and \(r_2\)
</button>
</div>
<div class="graph-container">
Coefficients
<nokatex><div id="coeffBoardQuad1" class="graph jxgbox"></div></nokatex>
</div>
<div class="graph-container">
Candidate solution
<nokatex><div id="formulaBoardQuad1" class="graph jxgbox"></div></nokatex>
</div>
</div>
<script type="text/javascript">
'use strict';
function runOp(display, op, time, disableSelector, state, doneCallback) {
if (state.running) {
return;
}
state.running = true;
var oldFixed = display.setRootsFixed(true);
var elems = document.querySelectorAll(disableSelector);
for (var i = 0; i < elems.length; ++i) {
elems[i].disabled = true;
}
op.run(time, function() {
state.running = false;
display.setRootsFixed(oldFixed);
for (var i = 0; i < elems.length; ++i) {
elems[i].disabled = false;
}
if (doneCallback !== undefined) {
doneCallback();
}
});
}
var incorrectQuadraticFormula = (function() {
var a = ComplexFormula.select(-1);
var b = ComplexFormula.select(-2);
var x1 = b.neg().plus(quadraticDiscriminantFormula).div(a.times(4));
var x2 = b.neg().minus(quadraticDiscriminantFormula).div(a.times(4));
return x1.concat(x2);
})();
var quad1 = (function() {
var initialRoots = [ new Complex(1, 0), new Complex(-1, 0) ];
var display = new Display(
"rootBoardQuad1", "coeffBoardQuad1", "formulaBoardQuad1", initialRoots,
incorrectQuadraticFormula, function() {});
display._resultRotationCounterPoint.setAttribute({visible: false});
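var op = display.swapRootOp(0, 1, function() {});
// Shared across clicks so that runOp() can ignore a swap requested
// while another one is still running.
var state = {};
function swap() {
runOp(display, op, 1000, '.quad1DisableWhileSwapping', state);
}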
return {
display: display,
swap: swap
};
})();
</script>
<p>Now this formula looks right, since \(x_1\) and \(x_2\) are at the
same coordinates as \(r_1\) and \(r_2\). However, if you move
\(r_1\) or \(r_2\) around, you can easily convince yourself that
this formula can’t be right, since \(x_1\) and \(x_2\)
don’t move in the same way.</p>
<p>Now if you remember from high school, the real quadratic formula
involves taking a square root, and since our candidate solution
doesn’t do that, that means it’s probably incorrect. I
say “probably” because there’s no immediate reason
why there can’t be <em>multiple</em> quadratic formulas, some
simpler than others, of which one is simple enough to not need a
square root. From manipulating \(r_1\) and \(r_2\), we know that our
candidate formula is incorrect, but that doesn’t immediately
follow from it not having a square root.</p>
<p>Fortunately, there is a general way to rule out candidate solutions
that are similar to the one above, namely those that use only
addition, subtraction, multiplication, division, and raising to an
integer power; we’ll call these <em>rational expressions</em>. Here’s
how it goes: if you press the button to swap \(r_1\) and \(r_2\),
which moves \(r_1\) to \(r_2\)’s position and vice versa,
\(a\), \(b\), and \(c\) move from their starting positions but
return once \(r_1\) and \(r_2\) reach their destinations. This makes
sense, because the coefficients of a polynomial don’t depend
on how you order the roots. But since \(x_1\) and \(x_2\) depend
only on \(a\), \(b\), and \(c\), they too must loop back to their
starting positions.</p>
<p>But that means that our candidate solution cannot be the quadratic
formula! If it were, then \(x_1\) and \(x_2\) would have ended up
swapped, too. Instead, they went back to their starting positions,
which is a contradiction. This reasoning holds for any expression
which is a <em>single-valued</em> function of \(a\), \(b\), and \(c\),
so in particular this holds for rational expressions.</p>
<p>Let’s summarize our reasoning in a theorem:</p>
<p class="theorem">(<span class="theorem-name">Theorem 1</span>.) A
rational expression<sup><a href="#fn2" id="r2">[2]</a></sup> in the coefficients of the general quadratic
equation
\[
ax^2 + bx + c = 0
\]
cannot be a solution to this equation.</p>
<div class="proof">
<p><span class="proof-name">Sketch of proof.</span> Assume to the
contrary that the rational expression \(x = f(a, b, c)\) is a
solution. Assume that we start with \(r_1 = z_1\) and \(r_2 = z_2
\ne z_1\), and without loss of generality assume that we start with
\(x = z_1\).</p>
<p>Run \(r_1\) and \(r_2\) along continuous paths that swap their two
positions, i.e. make \(r_1\) head from \(z_1\) to \(z_2\)
continuously, and at the same time make \(r_2\) head from \(z_2\) to
\(z_1\) continuously, and make sure to pick paths such that \(r_1\)
and \(r_2\) never coincide.</p>
<p>Since \(a\), \(b\), and \(c\) are continuous functions of \(r_1\)
and \(r_2\), and \(x\) is a rational function of \(a\), \(b\) and
\(c\), and thus continuous, \(x\) then depends continuously on \(r_1\)
and \(r_2\). Thus, since we start with \(x = r_1 = z_1\), and \(r_1\)
never coincides with \(r_2\), then as \(r_1\) moves, \(x = r_1\) must
continue to hold, since \(x\) is a solution, and therefore
\(x\)’s final position must be the same as \(r_1\)’s,
which is \(z_2\).</p>
<p>However, since the coefficients \(a\), \(b\), and \(c\) don’t
depend on the ordering of \(r_1\) and \(r_2\), then their final
positions are the same as their initial positions. Since \(x\) is a
function of only \(a\), \(b\), and \(c\), its final position also
must be the same as its initial position, \(z_1\). This contradicts
the above, and therefore \(x\) cannot be a solution. ∎</p>
</div>
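<p>The argument can also be checked numerically. The sketch below is only an
illustration under stated assumptions: it swaps the roots \(z_1 = 1\) and
\(z_2 = -1\) by a counter-clockwise half-turn about their midpoint (one
possible choice of paths), and evaluates the misremembered formula with
\(a = 1\). All helper names are hypothetical.</p>

```javascript
'use strict';
// Hypothetical helpers; complex numbers are plain {re, im} objects.
function C(re, im) { return {re: re, im: im}; }
function add(z, w) { return C(z.re + w.re, z.im + w.im); }
function sub(z, w) { return C(z.re - w.re, z.im - w.im); }
function mul(z, w) {
  return C(z.re * w.re - z.im * w.im, z.re * w.im + z.im * w.re);
}
function scale(z, s) { return C(z.re * s, z.im * s); }

// Positions of the two roots at time t in [0, 1]: each makes a
// counter-clockwise half-turn around the midpoint of z1 and z2.
function rootsAt(z1, z2, t) {
  var mid = scale(add(z1, z2), 0.5);
  var half = scale(sub(z1, z2), 0.5);
  var turned = mul(half, C(Math.cos(Math.PI * t), Math.sin(Math.PI * t)));
  return [add(mid, turned), sub(mid, turned)];
}

// The misremembered formula (-b + (b^2 - 4ac)) / (4a) with a = 1.
function candidate(r1, r2) {
  var b = scale(add(r1, r2), -1);
  var c = mul(r1, r2);
  var disc = sub(mul(b, b), scale(c, 4));
  return scale(sub(disc, b), 0.25);
}

var z1 = C(1, 0), z2 = C(-1, 0);
var start = rootsAt(z1, z2, 0);
var end = rootsAt(z1, z2, 1);
var xStart = candidate(start[0], start[1]);
var xEnd = candidate(end[0], end[1]);
console.log('r1:', start[0], '->', end[0]);  // ends (numerically) at z2
console.log('x: ', xStart, '->', xEnd);      // ends (numerically) where it began
```

<p>Up to rounding error, \(r_1\) ends at \(z_2\) while the candidate \(x\)
ends back at \(z_1\), which is exactly the contradiction used in the proof.</p>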
<p>Now consider the candidate solution
\[
x_{1,2} = \sqrt{b^2 - 4ac}\text{.}
\]
This isn’t a rational expression since it has a square root. In
particular, in the visualization below, it behaves quite differently
from our first candidate solution. First, even though we have just a
single expression, it yields two points \(x_1\) and \(x_2\). Second,
and more surprisingly, if you swap \(r_1\) and \(r_2\), \(x_1\) and
\(x_2\) also exchange places, seemingly contradicting Theorem 1!
What is going on?</p>
<div class="interactive-example">
<h3>Interactive Example 2: The quadratic equation</h3>
<div class="graph-container">
Roots
<nokatex><div id="rootBoardQuad2" class="graph jxgbox"></div></nokatex>
<button class="interactive-example-button quad2DisableWhileSwapping"
type="button" onclick="quad2.swap();">
Swap \(r_1\) and \(r_2\)
</button>
</div>
<div class="graph-container">
Coefficients
<nokatex><div id="coeffBoardQuad2" class="graph jxgbox"></div></nokatex>
</div>
<div class="graph-container">
Candidate solution
<nokatex><div id="formulaBoardQuad2" class="graph jxgbox"></div></nokatex>
<label>
<input class="quad2DisableWhileSwapping" name="quad2Formula" type="radio"
onchange="quad2.switchFormula(incorrectQuadraticFormula);" />
\(x_{1, 2} = \frac{-b \pm b^2 - 4ac}{4a}\)
</label>
<br />
<label>
<input class="quad2DisableWhileSwapping" name="quad2Formula" type="radio"
onchange="quad2.switchFormula(quadraticDiscriminantFormula);" />
\(x_1 = b^2 - 4ac\)
</label>
<br />
<label>
<input checked class="quad2DisableWhileSwapping" name="quad2Formula" type="radio"
onchange="quad2.switchFormula(quadraticDiscriminantFormula.root(2));" />
\(x_{1, 2} = \sqrt{b^2 - 4ac}\)
</label>
<br />
<label>
<input class="quad2DisableWhileSwapping" name="quad2Formula" type="radio"
onchange="quad2.switchFormula(newQuadraticFormula());" />
\(x_{1, 2} = \frac{-b + \sqrt{b^2 - 4ac}}{2a}\)
<br />
(the quadratic formula)
</label>
</div>
</div>
<script type="text/javascript">
'use strict';
function switchFormula(display, state, formula) {
if (state.running) {
return;
}
var numResults = display.setFormula(formula);
}
var quad2 = (function() {
var initialRoots = [ new Complex(1, 0), new Complex(0, 1) ];
var display = new Display(
"rootBoardQuad2", "coeffBoardQuad2", "formulaBoardQuad2", initialRoots,
quadraticDiscriminantFormula.root(2), function() {});
display._resultRotationCounterPoint.setAttribute({visible: false});
var op = display.swapRootOp(0, 1, function() {});
var state = {};
function swap() {
runOp(display, op, 1000, '.quad2DisableWhileSwapping', state);
}
function switchQuadFormula(formula) {
switchFormula(display, state, formula);
}
return {
display: display,
swap: swap,
switchFormula: switchQuadFormula
};
})();
</script>
<p>To answer this, we first need to review some facts about complex
numbers. Recall that a complex number \(z\) can be expressed in polar
coordinates, where it has a length \(r\) and an angle \(\theta\), and
that it can be converted to the usual Cartesian coordinates using <a href="https://en.wikipedia.org/wiki/Euler%27s_formula">Euler’s formula</a>:
\[
z = r e^{i \theta} = r \cos \theta + i \, r \sin \theta\text{.}
\]
Then, if you have two complex numbers \(z_1 = r_1 e^{i \theta_1}\) and
\(z_2 = r_2 e^{i \theta_2}\) in polar form, you can multiply them by
multiplying their lengths, and adding their angles:
\[
z_1 z_2 = r_1 r_2 e^{i (\theta_1 + \theta_2)}\text{.}
\]
So a square root of a complex number \(z = r e^{i \theta}\) is just
\(\sqrt{r} e^{i \theta/2}\), as you can easily verify. However, if
\(z\) is non-zero, there is one more square root of \(z\), namely
\(\sqrt{r} e^{i (\theta/2 + \pi)}\), as you can also verify. (Recall
that angles that differ by \(2\pi = 360^\circ\) are considered the
same.)</p>
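<p>As a quick illustration of the two square roots, here is a small sketch
that computes both of them from the polar form. The function name
<code>squareRoots</code> is hypothetical and is not part of the scripts used
by the visualizations.</p>

```javascript
'use strict';
// Returns both square roots of the complex number re + i*im, via its
// polar form: halve the angle, take the square root of the length, and
// add pi to the angle to get the second root.
function squareRoots(re, im) {
  var r = Math.hypot(re, im);
  var theta = Math.atan2(im, re);
  var s = Math.sqrt(r);
  return [theta / 2, theta / 2 + Math.PI].map(function(angle) {
    return {re: s * Math.cos(angle), im: s * Math.sin(angle)};
  });
}

// The square roots of 2i are 1 + i and -1 - i (up to rounding).
var roots = squareRoots(0, 2);
console.log(roots);
```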
<p>So in general, the square root of a rational expression, like our
candidate solution, yields two distinct points as long as the
rational expression is non-zero. In our case, \(b^2 - 4ac\) remains
non-zero as \(r_1\) and \(r_2\) don’t coincide. (We’ll
have more to say about this expression, called the <em>discriminant</em>,
once we talk about cubic equations below.) Therefore, if we want to
examine how \(x_1\) and \(x_2\) move as \(r_1\) and \(r_2\) move, we
have to number the square roots of \(b^2 - 4ac\), and we have to
keep this numbering consistent.</p>
<p>To do so, we have to do two things: we have to vary \(r_1\) and
\(r_2\) only continuously, and we have to vary \(r_1\) and \(r_2\)
such that they never coincide. If we do this, then we can
intuitively “lift” the expression \(b^2 - 4ac\) from the
complex plane to a new surface \(S\) where we consider only angles
that differ by \(4 \pi = 720^\circ\), rather than \(2 \pi\), to be
the same. In this space, we can take the “first” square
root of a non-zero complex number to be the one with half the angle,
and the “second” square root to be the one with half the
angle plus \(\pi\), and have these two square root functions behave
continuously as their argument goes around the origin.</p>
<figure>
<img src="quintic-unsolvability-files/Riemann_sqrt.svg"/>
<figcaption>
<span class="figure-text">Figure 1</span> \(S\), which is the
<a href="https://en.wikipedia.org/wiki/Riemann_surface">Riemann surface</a>
of \(\sqrt{z}\). (Image by <a href="https://en.wikipedia.org/wiki/File:Riemann_sqrt.svg">Leonid 2</a> licensed under <a href="https://creativecommons.org/licenses/by-sa/3.0/deed.en">CC BY-SA 3.0</a>.)
</figcaption>
</figure>
<p>Now this answers the question of why the proof of Theorem 1
fails for \(\sqrt{b^2 - 4ac}\). As \(r_1\) is swapped with \(r_2\),
the coefficients \(a\), \(b\), and \(c\) return to their starting
positions, and \(b^2 - 4ac\) makes a single loop around the origin in
the complex plane. But when \(b^2 - 4ac\) is lifted to \(S\), its final
position differs from its initial position by an angle of \(2\pi\), so
the two are <em>distinct</em> points of \(S\), and thus we can’t
conclude that the final position of \(\sqrt{b^2 - 4ac}\) is the same
as its initial position.</p>
<p>Similar reasoning holds for any algebraic expression that
isn’t a rational expression, i.e. ones that involve taking any
integer root, so Theorem 1 cannot apply to algebraic expressions
in general. Of course, this is consistent with what we know about the
quadratic formula, since we know that it has a square root!</p>
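<p>One way to see this numerically is to follow a single square root of the
discriminant continuously while the roots swap. The sketch below is an
illustration under stated assumptions rather than the code behind the
visualization: it takes \(a = 1\), swaps \(r_1 = 1\) and \(r_2 = i\) by a
counter-clockwise half-turn about their midpoint, and at each small step keeps
whichever square root is closest to the previous one. All helper names are
hypothetical.</p>

```javascript
'use strict';
// Hypothetical helpers; complex numbers are plain {re, im} objects.
function mul(z, w) {
  return {re: z.re * w.re - z.im * w.im, im: z.re * w.im + z.im * w.re};
}
function dist(z, w) { return Math.hypot(z.re - w.re, z.im - w.im); }
function squareRoots(z) {
  var s = Math.sqrt(Math.hypot(z.re, z.im));
  var half = Math.atan2(z.im, z.re) / 2;
  var w = {re: s * Math.cos(half), im: s * Math.sin(half)};
  return [w, {re: -w.re, im: -w.im}];
}

// b^2 - 4ac (with a = 1) while r1 = 1 and r2 = i swap places by a
// counter-clockwise half-turn about their midpoint; t runs from 0 to 1.
function discriminantAt(t) {
  var turn = {re: Math.cos(Math.PI * t), im: Math.sin(Math.PI * t)};
  var offset = mul({re: 0.5, im: -0.5}, turn);
  var r1 = {re: 0.5 + offset.re, im: 0.5 + offset.im};
  var r2 = {re: 0.5 - offset.re, im: 0.5 - offset.im};
  var b = {re: -(r1.re + r2.re), im: -(r1.im + r2.im)};
  var c = mul(r1, r2);
  var bb = mul(b, b);
  return {re: bb.re - 4 * c.re, im: bb.im - 4 * c.im};
}

// Follow one square root continuously: at each small step, keep the
// square root closest to the previous value.
function trackSquareRoot(steps) {
  var start = squareRoots(discriminantAt(0))[0];
  var current = start;
  for (var k = 1; k <= steps; ++k) {
    var candidates = squareRoots(discriminantAt(k / steps));
    current = dist(candidates[0], current) <= dist(candidates[1], current) ?
        candidates[0] : candidates[1];
  }
  return {start: start, end: current};
}

var result = trackSquareRoot(1000);
console.log(discriminantAt(0), discriminantAt(1));  // numerically equal
console.log(result.start, result.end);              // end is (numerically) -start
```

<p>The discriminant itself comes back to where it started, but the square
root that was followed along the way ends up at the other square root.</p>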
</section>
<section>
<header>
<h2>Cubic Equations</h2>
</header>
<p>Now we can move on to cubic equations. Similarly, given three
complex numbers \(r_1\), \(r_2\), and \(r_3\), you can determine the
cubic equation with those solutions, namely
\[
(x - r_1) (x - r_2) (x - r_3) = x^3 - (r_1 + r_2 + r_3) x^2 + (r_1 r_2 + r_1 r_3 + r_2 r_3) x - r_1 r_2 r_3\text{,}
\]
and so we can define a function from \(r_1\), \(r_2\), and \(r_3\) to
\(a\), \(b\), \(c\), and \(d\), where
\[
a x^3 + b x^2 + c x + d
\]
is the standard form of a cubic polynomial, and this is shown in the
visualization below.</p>
<p>In the previous section, we talked about the discriminant \(b^2 -
4ac\) of the general quadratic polynomial. However, the discriminant
is an expression that is defined for <em>any</em> polynomial. If
\(r_1, \ldots, r_n\) are the roots of a polynomial (counting multiplicity)
with leading coefficient \(a_n\), then the
<a href="https://en.wikipedia.org/wiki/Discriminant">discriminant</a> is
\[
\Delta = a_n^{2n - 2} \prod_{i \lt j} (r_i - r_j)^2\text{.}
\]
In other words, the discriminant is, up to sign and a power of the
leading coefficient, the product of the differences of all pairs of
different roots. In particular, if the polynomial has repeated roots,
the discriminant is zero.</p>
<p>Using the formula above, you can express the discriminant in terms
of the coefficients of the polynomial, as you can verify for
yourself with the quadratic equation. Indeed this is true in
general; for cubic polynomials, the discriminant can be expressed in
terms of the coefficients as
\[
\Delta = b^2 c^2 - 4 a c^3 - 4 b^3 d - 27 a^2 d^2 + 18 a b c
d\text{.}
\]
But why do we care? Because, as you can see in the visualization below, if
you swap any pair of roots, this causes the discriminant to make a single
loop around the origin, so it serves as a useful test function for
taking roots.</p>
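<p>The two expressions for the cubic discriminant can be compared on a
concrete example. This is a minimal sketch with hypothetical function names;
for brevity it assumes real integer roots, so that the arithmetic is exact,
although the identity itself holds for complex roots too.</p>

```javascript
'use strict';
// Discriminant from the roots: a^(2n-2) times the product of
// (r_i - r_j)^2 over all pairs i < j.
function discriminantFromRoots(a, roots) {
  var product = 1;
  for (var i = 0; i < roots.length; ++i) {
    for (var j = i + 1; j < roots.length; ++j) {
      var diff = roots[i] - roots[j];
      product *= diff * diff;
    }
  }
  for (var k = 0; k < 2 * roots.length - 2; ++k) {
    product *= a;
  }
  return product;
}

// Coefficients of a (x - r1)(x - r2)(x - r3).
function cubicCoefficients(a, r) {
  return {
    a: a,
    b: -a * (r[0] + r[1] + r[2]),
    c: a * (r[0] * r[1] + r[0] * r[2] + r[1] * r[2]),
    d: -a * r[0] * r[1] * r[2]
  };
}

// Discriminant from the coefficients, using the formula above.
function discriminantFromCoefficients(p) {
  return p.b * p.b * p.c * p.c - 4 * p.a * p.c * p.c * p.c -
      4 * p.b * p.b * p.b * p.d - 27 * p.a * p.a * p.d * p.d +
      18 * p.a * p.b * p.c * p.d;
}

var fromRoots = discriminantFromRoots(2, [1, 2, 4]);
var fromCoeffs = discriminantFromCoefficients(cubicCoefficients(2, [1, 2, 4]));
console.log(fromRoots, fromCoeffs);  // both 576

// A repeated root makes the discriminant vanish.
var repeated = discriminantFromCoefficients(cubicCoefficients(1, [1, 1, 3]));
console.log(repeated);
```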
<p>So now that we have three roots, we can swap them in multiple
ways. If \(R\) is a list that starts off as \(\langle r_1, r_2, r_3
\rangle\), let \(\circlearrowleft_{i, j}\) denote counter-clockwise
paths that take the root at the \(i\)th index of \(R\) to the position of the
root at the \(j\)th index of \(R\) and vice versa, and let
\(\circlearrowright_{i, j}\) denote the corresponding clockwise paths. (Note that this is not the same as the
paths that swap \(r_i\) and \(r_j\)! Play around with the buttons
in the visualization below to understand the difference.)</p>
<div class="interactive-example">
<h3>Interactive Example 3: The cubic discriminant</h3>
<div class="graph-container">
Roots
<nokatex><div id="rootBoardCubic1" class="graph jxgbox"></div></nokatex>
<span id="rootListCubic1">
\(R = \langle r_1, r_2, r_3 \rangle\)
</span>
<br />
<button class="interactive-example-button cubic1DisableWhileRunningOp"
type="button" onclick="cubic1.runOp(cubic1.opA, 1000);">
\(\circlearrowleft_{1, 2}\)
</button>
<button class="interactive-example-button cubic1DisableWhileRunningOp"
type="button" onclick="cubic1.runOp(cubic1.opB, 1000);">
\(\circlearrowleft_{2, 3}\)
</button>
<br />
<button class="interactive-example-button cubic1DisableWhileRunningOp"
type="button" onclick="cubic1.runOp(cubic1.opA.invert(), 1000);">
\(\circlearrowright_{1, 2}\)
</button>
<button class="interactive-example-button cubic1DisableWhileRunningOp"
type="button" onclick="cubic1.runOp(cubic1.opB.invert(), 1000);">
\(\circlearrowright_{2, 3}\)
</button>
</div>
<div class="graph-container">
Coefficients
<nokatex><div id="coeffBoardCubic1" class="graph jxgbox"></div></nokatex>
</div>
<div class="graph-container">
Candidate solution
<nokatex><div id="formulaBoardCubic1" class="graph jxgbox"></div></nokatex>
<label>
<input class="cubic1DisableWhileRunningOp" name="cubic1Formula" type="radio"
onchange="cubic1.switchFormula(cubicDiscFormula);" />
\(x_1 = \Delta\)
</label>
<br />
<label>
<input checked class="cubic1DisableWhileRunningOp" name="cubic1Formula" type="radio"
onchange="cubic1.switchFormula(cubicDiscFormula.root(5));" />
\(x_{1, 2, 3, 4, 5} = \sqrt[5]{\Delta}\)
</label>
</div>
</div>
<script type="text/javascript">
'use strict';
function updateRootList(display, rootListID) {
var rootPermutation = display.getRootPermutation();
var rootList = document.getElementById(rootListID);
var TeXOutput = 'R = \\langle ' + rootPermutation.map(function(i) {
return 'r_{' + (i+1) + '}';
}).join(', ') + ' \\rangle';
katex.render(TeXOutput, rootList);
}
function updateResultList(display, resultListID) {
var resultPermutation = display.getResultPermutation();
var resultList = document.getElementById(resultListID);
var TeXOutput = 'X = \\langle ' + resultPermutation.map(function(i) {
return 'x_{' + (i+1) + '}';
}).join(', ') + ' \\rangle';
katex.render(TeXOutput, resultList);
}
var cubicDiscFormula = cubicScaledDiscFormula.div(
ComplexFormula.select(-1).pow(2).times(-27));
var cubic1 = (function() {
var initialRoots = [
new Complex(-1, -0.5), new Complex(0.5, 0.5), new Complex(0, 1)
];
var display = new Display(
"rootBoardCubic1", "coeffBoardCubic1", "formulaBoardCubic1", initialRoots,
cubicDiscFormula.root(5), function() {});
display._resultRotationCounterPoint.setAttribute({visible: false});
function updateRootListCubic(display) {
updateRootList(display, "rootListCubic1");
}
var opA = display.swapRootOp(0, 1, updateRootListCubic);
var opB = display.swapRootOp(1, 2, updateRootListCubic);
var state = {};
function runCubicOp(op, time) {
runOp(display, op, time, '.cubic1DisableWhileRunningOp', state);
}
function switchCubicFormula(formula) {
switchFormula(display, state, formula);
// This example has only a root list to refresh (no result list).
updateRootListCubic(display);
}
return {
display: display,
opA: opA,
opB: opB,
runOp: runCubicOp,
cubicDiscFormula: cubicDiscFormula,
switchFormula: switchCubicFormula
};
})();
</script>
<p>Now, with the formula \(\Delta\), the same reasoning as in the
previous section shows that it cannot possibly be the cubic formula,
nor can any other rational expression. However, unlike the quadratic
case, we can also rule out \(\sqrt[5]{\Delta}\), or any other
algebraic formula with no nested radicals (i.e., that doesn’t
have a radical within a radical like \(\sqrt{a - \sqrt{bc - 5}}\)).
If you apply the operations \(\circlearrowleft_{2, 3}\),
\(\circlearrowleft_{1, 2}\), \(\circlearrowright_{2, 3}\), and
\(\circlearrowright_{1, 2}\) in sequence, \(r_1\), \(r_2\), and
\(r_3\) rotate among themselves, but all the \(x_i\) go back to
their original positions. Therefore, by similar reasoning as the
previous section, \(\sqrt[5]{\Delta}\) also cannot possibly be the
cubic formula!</p>
<p>To make this statement precise, we need to review some group
theory. Recall that a
<a href="https://en.wikipedia.org/wiki/Group_(mathematics)">group</a>
is a set with an associative binary operation, an identity element,
and inverse elements. Most basic examples of groups are related to
numbers, like the integers under addition, or the non-zero rationals
under multiplication. However, more interesting examples of groups
are related to <em>functions</em>, not least because the group
operation for functions is <em>composition</em>, which is in general
not commutative; in other words, if \(f\) and \(g\) are functions,
then in general \(f \circ g \ne g \circ f\), and it is this non-commutativity that
will come in handy for our purposes.</p>
<p>So let’s say we have a list of \(n\) objects, and we’re
interested in the functions that rearrange this list’s
elements. These are <a href="https://en.wikipedia.org/wiki/Permutation">permutations</a>,
and they naturally form a group under composition, as you can check
for yourself, called \(S_n\), the <a href="https://en.wikipedia.org/wiki/Symmetric_group">symmetric group</a> on
\(n\) objects.</p>
<p>There’s a convenient way to write permutations, called <a href="https://en.wikipedia.org/wiki/Permutation#Cycle_notation">cycle notation</a>. If
you write \((i_1 \; i_2 \; \ldots \; i_k)\), this denotes the
permutation that maps the \(i_1\)th position of the list to the
\(i_2\)th position, the \(i_2\)th position to the \(i_3\)th, and so on,
with the \(i_k\)th position mapped back to the \(i_1\)th; such a permutation is
called a <em>cycle</em>. Then you can write <em>any</em> permutation
as a composition of disjoint cycles, so this provides a convenient
way to write down and compute with permutations.</p>
<p>In the visualization above, we have four operations
\(\circlearrowleft_{1, 2}\), \(\circlearrowleft_{2, 3}\),
\(\circlearrowright_{1, 2}\), and \(\circlearrowright_{2, 3}\),
which <em>act on \(R\)</em>, meaning that they define permutations
on \(R\). In particular, \(\circlearrowleft_{1, 2}\) and
\(\circlearrowright_{1, 2}\) both swap the first and second
elements of \(R\), so we say that \(\circlearrowleft_{1, 2}\) and
\(\circlearrowright_{1, 2}\) act on \(R\) as \((1 \; 2)\), and
similarly, \(\circlearrowleft_{2, 3}\) and \(\circlearrowright_{2,
3}\) act on \(R\) as \((2 \; 3)\).</p>
<p>Now concatenating two operations—doing one after the
other—corresponds to composing their mapped-to permutations on
\(R\). Denoting \(o_2 * o_1\) as doing \(o_1\), then doing \(o_2\),
the sequence of operations above is \(\circlearrowright_{1, 2} *
\circlearrowright_{2, 3} * \circlearrowleft_{1, 2} *
\circlearrowleft_{2, 3}\) (note the order!), which acts on \(R\)
like \((1 \; 2) (2 \; 3) (1 \; 2) (2 \; 3)\), which is equal to \((1
\; 3 \; 2)\).<sup><a href="#fn3" id="r3">[3]</a></sup> (The
\(\circ\) is usually dropped when composing permutations.)</p>
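<p>The product above can be checked mechanically. In this sketch, which uses
hypothetical helper names, a permutation is stored as an array whose
\(i\)th entry is the (0-based) position that position \(i\) is mapped to,
and products are composed right to left, matching the convention that
\(o_2 * o_1\) means doing \(o_1\) first.</p>

```javascript
'use strict';
// Builds the permutation of n positions given by a single cycle, with
// positions written 1-based as in cycle notation.
function cycle(n, positions) {
  var p = [];
  for (var i = 0; i < n; ++i) {
    p.push(i);
  }
  for (var k = 0; k < positions.length; ++k) {
    p[positions[k] - 1] = positions[(k + 1) % positions.length] - 1;
  }
  return p;
}

// compose(p, q) applies q first, then p (right to left).
function compose(p, q) {
  return q.map(function(image) { return p[image]; });
}

var s12 = cycle(3, [1, 2]);
var s23 = cycle(3, [2, 3]);

// (1 2)(2 3)(1 2)(2 3), composed right to left.
var product = compose(compose(compose(s12, s23), s12), s23);
console.log(product, cycle(3, [1, 3, 2]));  // the same permutation

// The two transpositions do not commute.
console.log(compose(s12, s23), compose(s23, s12));  // different permutations
```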
<p>Now for the formula \(\Delta\), all the operations make \(x_1\)
loop around the origin either clockwise or counter-clockwise; in
other words, they all induce a rotation of \(2\pi\) or \(-2\pi\) on
\(x_1\), and the final distance of \(x_1\) from the origin is the
same as the initial distance. Therefore, if we apply an equal number
of clockwise and counter-clockwise rotations, the total angle of
rotation will be \(0\) and the final distance will be the same as
the initial distance, i.e. the final position of \(x_1\) is the same
as its initial position. But the same reasoning holds for the
formula \(\sqrt[5]{\Delta}\); all the operations induce a rotation
of \(2\pi/5\) or \(-2\pi/5\) and leave the distance from the origin
unchanged, so an equal number of clockwise and counter-clockwise
rotations will still induce a total angle of \(0\) and leave the
distance from the origin unchanged. Therefore, the operation
\(\circlearrowright_{1, 2} * \circlearrowright_{2, 3} *
\circlearrowleft_{1, 2} * \circlearrowleft_{2, 3}\) acts like \((1
\; 3\; 2)\) on \(R\), but leaves all \(x_i\) unchanged.</p>
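<p>The angle bookkeeping can be checked numerically too. The sketch below is
an illustration under stated assumptions, not the code behind the
visualization: it takes \(a = 1\), uses the starting locations from
Interactive Example 3, models each operation as a half-turn of the two
affected points about their midpoint, and adds up the angle swept by
\(\Delta\) around the origin. All names are hypothetical.</p>

```javascript
'use strict';
// Locations of the three roots, as [re, im] pairs.
var locations = [[-1, -0.5], [0.5, 0.5], [0, 1]];

// Product of (p_i - p_j)^2 over i < j, i.e. the discriminant for a = 1.
function discriminant(points) {
  var re = 1, im = 0;
  for (var i = 0; i < points.length; ++i) {
    for (var j = i + 1; j < points.length; ++j) {
      var dr = points[i][0] - points[j][0];
      var di = points[i][1] - points[j][1];
      var sr = dr * dr - di * di, si = 2 * dr * di;  // (p_i - p_j)^2
      var nextRe = re * sr - im * si;
      im = re * si + im * sr;
      re = nextRe;
    }
  }
  return [re, im];
}

// Total angle (in radians) swept by the discriminant around the origin
// while the points at locations i and j trade places by a half-turn about
// their midpoint; direction is +1 (counter-clockwise) or -1 (clockwise).
function sweptAngle(i, j, direction, steps) {
  var mid = [(locations[i][0] + locations[j][0]) / 2,
             (locations[i][1] + locations[j][1]) / 2];
  var half = [locations[i][0] - mid[0], locations[i][1] - mid[1]];
  var total = 0, previous = null;
  for (var k = 0; k <= steps; ++k) {
    var angle = direction * Math.PI * k / steps;
    var c = Math.cos(angle), s = Math.sin(angle);
    var offset = [half[0] * c - half[1] * s, half[0] * s + half[1] * c];
    var points = locations.slice();
    points[i] = [mid[0] + offset[0], mid[1] + offset[1]];
    points[j] = [mid[0] - offset[0], mid[1] - offset[1]];
    var d = discriminant(points);
    var current = Math.atan2(d[1], d[0]);
    if (previous !== null) {
      var delta = current - previous;
      // Unwrap the jump across atan2's branch cut.
      if (delta > Math.PI) delta -= 2 * Math.PI;
      if (delta < -Math.PI) delta += 2 * Math.PI;
      total += delta;
    }
    previous = current;
  }
  return total;
}

// Counter-clockwise (2,3), counter-clockwise (1,2), clockwise (2,3),
// clockwise (1,2), in the order they are performed (0-based indices).
var sequence = [[1, 2, 1], [0, 1, 1], [1, 2, -1], [0, 1, -1]];
var totalAngle = sequence.reduce(function(sum, op) {
  return sum + sweptAngle(op[0], op[1], op[2], 2000);
}, 0);
console.log(sweptAngle(0, 1, 1, 2000) / (2 * Math.PI));  // close to 1
console.log(totalAngle, totalAngle / 5);                 // both close to 0
```

<p>Each single operation sweeps \(\Delta\) through \(\pm 2\pi\), and hence a
fifth root through \(\pm 2\pi/5\), but over the whole sequence the angles
cancel.</p>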
<p>But how did we come up with \(\circlearrowright_{1, 2} *
\circlearrowright_{2, 3} * \circlearrowleft_{1, 2} *
\circlearrowleft_{2, 3}\) in the first place? This involves a bit
more group theory. \(S_3\) is <em>not</em> a <a href="https://en.wikipedia.org/wiki/Abelian_group">commutative
group</a>; in particular, \((1 \; 2) (2 \; 3) \ne (2 \; 3) (1 \;
2)\). For two group elements \(g\) and \(h\), we can define
their
<a href="https://en.wikipedia.org/wiki/Commutator">commutator</a><sup><a href="#fn4" id="r4">[4]</a></sup>
\([ g, h ]\), which is the group element that corrects for
\(g\) and \(h\) not commuting. That is, we want the equation
\[
g h = h g [g, h]
\]
to hold, which means that
\[
[g, h] = g^{-1} h^{-1} g h\text{.}
\]
So the commutator provides a convenient way to generate a non-trivial
permutation from two other non-commuting permutations. Furthermore, it
involves two appearances of both elements, so we can pick a sequence of
operations that induce the commutator and also have an equal number of
clockwise and counter-clockwise operations. Then we’re guaranteed
that this sequence of operations permutes \(R\) and leaves all \(x_i\)
unchanged, even if each individual operation moves some \(x_i\). But of
course, this is just \(\circlearrowright_{1, 2} * \circlearrowright_{2, 3} *
\circlearrowleft_{1, 2} * \circlearrowleft_{2, 3}\)!</p>
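<p>Here is a small sketch that checks the defining equation of the commutator
for \(g = (1 \; 2)\) and \(h = (2 \; 3)\). The helper names are hypothetical;
permutations are arrays whose \(i\)th entry is the 0-based image of position
\(i\), composed right to left.</p>

```javascript
'use strict';
// compose(p, q) applies q first, then p.
function compose(p, q) {
  return q.map(function(image) { return p[image]; });
}
function inverse(p) {
  var inv = [];
  p.forEach(function(image, i) { inv[image] = i; });
  return inv;
}
// [g, h] = g^{-1} h^{-1} g h.
function commutator(g, h) {
  return compose(compose(inverse(g), inverse(h)), compose(g, h));
}

var g = [1, 0, 2];  // (1 2)
var h = [0, 2, 1];  // (2 3)
var gh = compose(g, h);
var hg = compose(h, g);
var c = commutator(g, h);

console.log(c);                    // [2, 0, 1], i.e. (1 3 2)
console.log(gh, compose(hg, c));   // equal: g h = h g [g, h]
```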
<p>Let’s define some terminology to make proofs and discussion
easier. If \(o\) is an operation that acts on \(R\) non-trivially
but has the final position of the expression \(x = f(a, b, c,
\ldots)\) the same as its initial position, we say that \(o\) <em>rules out</em> the
expression \(x = f(a, b, c, \ldots)\). For example, Theorem 1
says that swapping the two roots of a quadratic rules out all rational
expressions.</p>
<p>Now we’re ready to state and prove the theorem:</p>
<p class="theorem">(<span class="theorem-name">Theorem 2</span>.) An
algebraic expression with no nested radicals in the coefficients of
the general cubic equation
\[
ax^3 + bx^2 + cx + d = 0
\]
cannot be a solution to this equation.</p>
<div class="proof">
<p><span class="proof-name">Sketch of proof.</span> First assume to
the contrary that the expression \(x = \sqrt[k]{r(a, b, c, d)}\) is
a solution, where \(r(a, b, c, d)\) is a rational
expression. Assume we start with \(r_1 = z_1\), \(r_2 = z_2\), and
\(r_3 = z_3\), where all \(z_i\) are distinct, and without loss of
generality assume that we start with \(x = z_1\).</p>
<p>Any of the operations \(\circlearrowleft_{1, 2}\),
\(\circlearrowleft_{2, 3}\), \(\circlearrowright_{1, 2}\), and
\(\circlearrowright_{2, 3}\) applied to \(x = r(a, b, c, d)\)
cause \(x\)’s final position to be the same as its initial
position, by Theorem 1. Pick a point \(z_0\) that is never
equal to any point \(x\) traverses under any operation. Then, by
the same reasoning as above, the total angle induced by
\(\circlearrowright_{1, 2} * \circlearrowright_{2, 3} *
\circlearrowleft_{1, 2} * \circlearrowleft_{2, 3}\) on \(x =
\sqrt[k]{r(a, b, c, d)}\) around \(z_0\) is \(0\), and the
distance from \(z_0\) remains unchanged. Thus \(x\) remains
fixed, and this operation rules out \(x = \sqrt[k]{r(a, b, c,
d)}\).</p>
<p>For the general case, it suffices to show that if \(o\) rules out
the expressions \(f\) and \(g\), then \(o\) also rules out \(f\)
raised to an integer power, \(f + g\text{,}\) \(f - g\text{,}\) \(f
\cdot g\text{,}\) and \(f / g\) where \(g \ne 0\text{.}\) But this
is straightforward, and such formulas are just the algebraic
expressions with no nested radicals, so the statement holds in
general. ∎</p>
</div>
<p>Theorem 2 can be summarized thus: any \(\circlearrowleft_{i,
j}\) or \(\circlearrowright_{i, j}\) rules out any rational
expression as the cubic formula, and if given an algebraic
expression with no nested radicals, either some
\(\circlearrowleft_{i, j}\) or \(\circlearrowright_{i, j}\) rules it
out, or \(\circlearrowright_{1, 2} * \circlearrowright_{2, 3} *
\circlearrowleft_{1, 2} * \circlearrowleft_{2, 3}\) rules it out.</p>
<p>Now we can consider algebraic expressions with one level of
nesting. Can such formulas be ruled out as being the cubic formula?
We can’t do so via Theorem 2, at least; we would need a
non-trivial element of \(S_3\) that is the commutator of
commutators. But you can calculate that all non-trivial commutators of
\(S_3\) are either \((3 \; 2 \; 1)\) or \((1 \; 2\; 3)\), and these
two elements commute, so \(S_3\) cannot have a non-trivial commutator
of commutators.</p>
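<p>This calculation is small enough to do by brute force. The sketch below
(hypothetical helper names, same array representation of permutations as
before, composed right to left) lists every commutator in \(S_3\), and then
every commutator of those commutators.</p>

```javascript
'use strict';
// Permutations are arrays p with p[i] the 0-based image of position i.
function compose(p, q) {  // apply q first, then p
  return q.map(function(image) { return p[image]; });
}
function inverse(p) {
  var inv = [];
  p.forEach(function(image, i) { inv[image] = i; });
  return inv;
}
function commutator(g, h) {  // g^{-1} h^{-1} g h
  return compose(compose(inverse(g), inverse(h)), compose(g, h));
}

// All six elements of S_3.
var s3 = [[0, 1, 2], [1, 0, 2], [0, 2, 1], [2, 1, 0], [1, 2, 0], [2, 0, 1]];

// Distinct values of [g, h] as g and h range over the given elements,
// each written as a string of digits such as '120'.
function commutatorsOf(elements) {
  var seen = {};
  elements.forEach(function(g) {
    elements.forEach(function(h) {
      seen[commutator(g, h).join('')] = true;
    });
  });
  return Object.keys(seen).sort();
}

var first = commutatorsOf(s3);
var second = commutatorsOf(first.map(function(key) {
  return key.split('').map(Number);
}));
console.log(first);   // the identity and the two 3-cycles
console.log(second);  // only the identity
```

<p>Here <code>'120'</code> and <code>'201'</code> are the array forms of the
two 3-cycles, and <code>'012'</code> is the identity.</p>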
<p>In fact, as we would expect, the actual <a href="https://en.wikipedia.org/wiki/Cubic_function#General_formula">cubic formula</a>
has such an algebraic expression, which is \(C\) in the visualization
below, so that serves as a convenient example of an algebraic
expression with a single nested radical that can’t be ruled out
by Theorem 2.</p>
<div class="interactive-example">
<h3>Interactive Example 4: The cubic equation</h3>
<div class="graph-container">
Roots
<nokatex><div id="rootBoardCubic2" class="graph jxgbox"></div></nokatex>
<span id="rootListCubic2">
\(R = \langle r_1, r_2, r_3 \rangle\)
</span>
<br />
<button class="interactive-example-button cubic2DisableWhileRunningOp"
type="button" onclick="cubic2.runOp(cubic2.opA, 1000);">
\(\circlearrowleft_{1, 2}\)
</button>
<button class="interactive-example-button cubic2DisableWhileRunningOp"
type="button" onclick="cubic2.runOp(cubic2.opB, 1000);">
\(\circlearrowleft_{2, 3}\)
</button>
<br />
<button class="interactive-example-button cubic2DisableWhileRunningOp"
type="button" onclick="cubic2.runOp(cubic2.opA.invert(), 1000);">
\(\circlearrowright_{1, 2}\)
</button>
<button class="interactive-example-button cubic2DisableWhileRunningOp"
type="button" onclick="cubic2.runOp(cubic2.opB.invert(), 1000);">
\(\circlearrowright_{2, 3}\)
</button>
<br />
<button class="interactive-example-button cubic2DisableWhileRunningOp"
type="button" onclick="cubic2.runOp(cubic2.opComAB, 4000);">
\(\circlearrowright_{1, 2} * \circlearrowright_{2, 3} * \circlearrowleft_{1, 2} * \circlearrowleft_{2, 3}\)
</button>
<br />
<button class="interactive-example-button cubic2DisableWhileRunningOp"
type="button" onclick="cubic2.runOp(cubic2.opComAB.invert(), 4000);">
\(\circlearrowleft_{1, 2} * \circlearrowleft_{2, 3} * \circlearrowright_{1, 2} * \circlearrowright_{2, 3}\)
</button>
</div>
<div class="graph-container">
Coefficients
<nokatex><div id="coeffBoardCubic2" class="graph jxgbox"></div></nokatex>
</div>
<div class="graph-container">
Candidate solution
<nokatex><div id="formulaBoardCubic2" class="graph jxgbox"></div></nokatex>
<span id="resultListCubic2">
\(X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle\)
</span>
<br />
<label>
<input class="cubic2DisableWhileRunningOp" name="cubic2Formula" type="radio"
onchange="cubic2.switchFormula(cubicScaledDiscFormula);" />
\(x_1 = -27a^2 \Delta = {\Delta_1}^2 - 4 {\Delta_0}^3\)
</label>
<br />
<label>
<input class="cubic2DisableWhileRunningOp" name="cubic2Formula" type="radio"
onchange="cubic2.switchFormula(newCubicCCubedFormula());" />
\(x_{1, 2} = C^3 = \frac{\Delta_1 + \sqrt{-27a^2 \Delta}}{2}\)
</label>
<br />
<label>
<input checked class="cubic2DisableWhileRunningOp" name="cubic2Formula" type="radio"
onchange="cubic2.switchFormula(newCubicCCubedFormula().root(3));" />
\(x_{1,2,3,4,5,6} = C\)
</label>
<br />
<label>
<input class="cubic2DisableWhileRunningOp" name="cubic2Formula" type="radio"
onchange="cubic2.switchFormula(newCubicFormula());" />
\(x_{1, 2, 3} = -\frac{1}{3a} \left( b + C + \frac{\Delta_0}{C} \right)\)
<br />
(the cubic formula)
</label>
</div>
</div>
<script type="text/javascript">
'use strict';
var cubic2 = (function() {
var initialRoots = [
new Complex(-1, -0.5), new Complex(0.5, 0.5), new Complex(0, 1)
];
var display = new Display(
"rootBoardCubic2", "coeffBoardCubic2", "formulaBoardCubic2", initialRoots,
newCubicCCubedFormula().root(3), function() {});
display._resultRotationCounterPoint.setAttribute({visible: false});
function updateRootAndResultList(display) {
updateRootList(display, "rootListCubic2");
updateResultList(display, "resultListCubic2");
}
var opA = display.swapRootOp(0, 1, updateRootAndResultList);
var opB = display.swapRootOp(1, 2, updateRootAndResultList);
var opComAB = newCommutatorAnimation(opA, opB);
var state = {};
function runCubicOp(op, time) {
runOp(display, op, time, '.cubic2DisableWhileRunningOp', state);
};
function switchCubicFormula(formula) {
switchFormula(display, state, formula);
updateRootAndResultList(display);
}
return {
display: display,
opA: opA,
opB: opB,
opComAB: opComAB,
runOp: runCubicOp,
cubicDiscFormula: cubicDiscFormula,
switchFormula: switchCubicFormula
};
})();
</script>
<p>Note that there is a new list \(X\), which lists the \(x_i\) in the
order in which they occupy their initial positions, just as \(R\) does
for the \(r_i\). In general, we can’t do this, since a
general multi-valued function won’t necessarily permute the
\(x_i\) among themselves, but in the interactive visualizations
we’ll only consider expressions that do.</p>
<p>We can then talk about how an operation acts on \(X\). For example, if we
pick \(\sqrt[5]{\Delta}\) in Interactive Example 3, we can
say that \(\circlearrowleft_{i, j}\) acts like \((5 \; 4 \; 3 \; 2
\; 1)\) on \(X\) and \(\circlearrowright_{i, j}\) acts like its inverse \((1 \; 2 \; 3 \; 4 \;
5)\) on \(X\). These two cycles commute, so \(\circlearrowright_{1, 2} *
\circlearrowright_{2, 3} * \circlearrowleft_{1, 2} *
\circlearrowleft_{2, 3}\) acts non-trivially on \(R\) but acts
trivially on \(X\), which is another, more algebraic way of saying
that this operation rules out \(\sqrt[5]{\Delta}\), since the
action on \(X\) depends on the candidate formula. On the other hand,
if you choose \(C\) in the visualization above, you can convince
yourself that no operation acts non-trivially on \(R\) without also
acting non-trivially on \(X\), and so \(C\) can’t be ruled out
as the cubic formula.</p>
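<p>To make the \(\sqrt[5]{\Delta}\) example concrete, here is a toy model rather than anything the visualization itself runs. It assumes each basic operation can be summarized by how it permutes the three roots and by how many steps (mod 5) it rotates the five values in \(X\); which direction counts as \(+1\) is just a labeling choice, and the names <code>rootPerm</code>, <code>xShift</code>, and <code>runSequence</code> are hypothetical.</p>

```javascript
// Assumed model: a counterclockwise swap transposes two roots and rotates
// the five values of the fifth root by one step.
var ccw12 = {rootPerm: [1, 0, 2], xShift: 1};
var ccw23 = {rootPerm: [0, 2, 1], xShift: 1};

function reversed(op) {
  // A transposition is its own inverse; the rotation of X changes direction.
  return {rootPerm: op.rootPerm, xShift: -op.xShift};
}

function runSequence(sequence) {
  var roots = [0, 1, 2];
  var xShift = 0;
  sequence.forEach(function(op) {
    roots = roots.map(function(r) { return op.rootPerm[r]; });
    // Keep the accumulated rotation in the range 0..4.
    xShift = (((xShift + op.xShift) % 5) + 5) % 5;
  });
  return {roots: roots, xShift: xShift};
}

// The commutator-like sequence from the text, applied in listed order.
var result = runSequence([reversed(ccw12), reversed(ccw23), ccw12, ccw23]);
```

<p>In this model <code>result.roots</code> is no longer in its starting order while <code>result.xShift</code> is back to \(0\). Applying the sequence in the opposite order changes which \(3\)-cycle the roots end up in, but not the conclusion.</p>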
</section>
<section>
<header>
<h2>Quartic Equations</h2>
</header>
<p>Now we can move on to quartic equations. As usual, given four
complex numbers \(r_1\), \(r_2\), \(r_3\), and \(r_4\), you can map
this to the coefficients \(a\), \(b\), \(c\), \(d\), and \(e\) of the
standard form of a quartic polynomial, as shown in the visualization
below, such that the \(r_i\) are the solutions to the quartic
equation
\[
a x^4 + b x^3 + c x^2 + d x + e = 0\text{.}
\]</p>
<p>Now that we have four roots, we have even more ways to permute them
using the \(\circlearrowleft_{i, j}\) and \(\circlearrowright_{i,
j}\). Before we move on, we need more terminology and group theory to
handle this more complicated case.</p>
<p>First, we want a convenient way to denote the combination of operations
that act like a commutator, so let’s define
\(\circlearrowleft_{i, j}^\prime\) to mean \(\circlearrowright_{i,
j}\) and vice versa, \((o_1 * o_2 * \cdots * o_n)^\prime\)
to mean \(o_n^\prime * o_{n-1}^\prime * \cdots *
o_1^\prime\), and \([\![ o_1, o_2 ]\!]\) to mean \(o_1^\prime *
o_2^\prime * o_1 * o_2\), so that if \(o_i\) acts on \(R\)
like \(g_i\), then \(o_i^\prime\) acts on \(R\) like \(g_i^{-1}\) and
\([\![o_i, o_j]\!]\) acts on \(R\) like \([g_i, g_j]\). For example,
in the previous section, we were using \([\![ \circlearrowleft_{1, 2},
\circlearrowleft_{2, 3} ]\!] = \circlearrowright_{1, 2} *
\circlearrowright_{2, 3} * \circlearrowleft_{1, 2} *
\circlearrowleft_{2, 3}\) to rule out algebraic expressions with
no nested radicals.</p>
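<p>These definitions are purely mechanical, so they can be sketched in a few lines. The representation below is an assumption made for illustration (it is not how the page’s own <code>newCommutatorAnimation</code> is implemented, which we don’t show here): an operation is a list of basic swaps, and the names <code>basic</code>, <code>prime</code>, <code>doubleBracket</code>, and <code>describe</code> are hypothetical.</p>

```javascript
// A basic move {i, j, ccw} stands for the counterclockwise swap of r_i and
// r_j when ccw is true, and for the clockwise swap when ccw is false.
function basic(i, j, ccw) {
  return [{i: i, j: j, ccw: ccw}];
}

// o' reverses the order of the moves and flips the direction of each one.
function prime(op) {
  return op.slice().reverse().map(function(m) {
    return {i: m.i, j: m.j, ccw: !m.ccw};
  });
}

// [[o1, o2]] is o1' followed by o2', o1, and o2, written in listed order.
function doubleBracket(o1, o2) {
  return prime(o1).concat(prime(o2), o1, o2);
}

function describe(op) {
  return op.map(function(m) {
    return (m.ccw ? 'ccw' : 'cw') + '(' + m.i + ',' + m.j + ')';
  }).join(' * ');
}

var a1 = basic(1, 2, true);
var a2 = basic(2, 3, true);
var a3 = basic(3, 4, true);
var cubicCommutator = doubleBracket(a1, a2);
var b1 = doubleBracket(a2, a1);
var b2 = doubleBracket(a3, a2);
var c1 = doubleBracket(b2, b1);
```

<p>Here <code>describe(cubicCommutator)</code> spells out the four-move sequence used in the previous section, and <code>c1</code>, a commutator of commutators, already consists of 16 basic swaps.</p>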
<p>Then not only do we want to talk about commutators of particular
permutations, we want to talk about the set of all commutators
of a particular group. In fact, for a group \(G\), this set of
commutators generates a subgroup \(K(G)\) called the <a href="https://en.wikipedia.org/wiki/Commutator_subgroup">commutator subgroup</a>
(for the groups considered here, the commutators already form a
subgroup on their own). For
the quadratic case, we just have \(S_2\), which has only a single
non-trivial element, so its commutator subgroup \(K(S_2)\) is the
trivial group. For the cubic case, we started with \(S_3\), and we
computed the commutator subgroup \(K(S_3)\), which is just \(\{ e,
(1 \; 2 \; 3), (3 \; 2 \; 1) \}\). We can also compute the
commutator subgroup of <em>this</em> group, which is just the trivial group
again, since \(K(S_3)\) is commutative. So we can see that
\(K(K(S_3))\), which we’ll also write as \(K^{(2)}(S_3)\), being the
trivial group means that we can’t rule
out algebraic expressions with nested radicals as solutions to the
general cubic equation.</p>
<p>Given all the elements of a group \(G\), it’s not
particularly complicated to compute the commutator subgroup—just
take all possible pairs of elements \(g, h \in G\), compute \([g,
h]\), and remove duplicates. However, we can make things easier for
ourselves by finding generators for \(K(G)\) as commutators of
generators of \(G\), since then we can easily map those back to \([\![
o_1, o_2 ]\!]\) applied on the appropriate operations. Fortunately,
when \(G = S_n\), we can use a few facts from group theory to easily
compute \(K(S_n)\). First, \(K(S_n)\) is called the <a href="https://en.wikipedia.org/wiki/Alternating_group">alternating group</a> \(A_n\),
and is generated by the \(3\)-cycles of the form \((i \enspace i+1
\enspace i+2)\), similar to how \(S_n\) is generated by the
\(2\)-cycles of the form \((i \enspace i + 1)\). But a \(3\)-cycle
\((i \enspace i+1 \enspace i+2)\) can be expressed as the commutator
of two \(2\)-cycles \([(i+2 \enspace i+1), (i \enspace
i+1)]\).</p>
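<p>The last identity depends on conventions, so it is worth checking once. The sketch below assumes that permutations compose right to left and that \([g, h] = g^{-1} h^{-1} g h\), which is one choice that reproduces the identity as written; the helper names are hypothetical.</p>

```javascript
// Permutations of {1, ..., n} are stored as arrays p with p[x] the image of
// x; index 0 is unused and maps to itself.
function fromCycle(n, cycle) {
  var p = [];
  for (var x = 0; x <= n; ++x) {
    p.push(x);
  }
  cycle.forEach(function(x, k) {
    p[x] = cycle[(k + 1) % cycle.length];
  });
  return p;
}
function compose(p, q) {
  // Apply q first, then p.
  return q.map(function(x) { return p[x]; });
}
function inverse(p) {
  var inv = new Array(p.length);
  p.forEach(function(x, i) { inv[x] = i; });
  return inv;
}
function commutator(g, h) {
  // Assumed convention: [g, h] = g^-1 h^-1 g h.
  return compose(compose(inverse(g), inverse(h)), compose(g, h));
}

var n = 6;
var allMatch = true;
for (var i = 1; i + 2 <= n; ++i) {
  // [(i+1 i+2), (i i+1)] should be the 3-cycle (i i+1 i+2).
  var c = commutator(fromCycle(n, [i + 1, i + 2]), fromCycle(n, [i, i + 1]));
  if (c.join(',') !== fromCycle(n, [i, i + 1, i + 2]).join(',')) {
    allMatch = false;
  }
}
```

<p>Under the opposite composition order the same commutator comes out as the inverse \(3\)-cycle, which generates the same subgroup, so the argument doesn’t depend on the choice.</p>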
<p>Therefore, for \(S_4\), the generators for \(K(S_4)\) are just \((1
\; 2 \; 3) = [(2 \; 3), (1 \; 2)]\) and \((2 \; 3 \; 4) = [(3 \; 4),
(2 \; 3)]\), with respective operations \([\![ \circlearrowleft_{2,
3}, \circlearrowleft_{1, 2} ]\!]\) and \([\![ \circlearrowleft_{3,
4}, \circlearrowleft_{2, 3} ]\!]\). However, these two generators
are not quite enough to generate \(K^{(2)}(S_4)\) via
commutators. Fortunately, it suffices to just add
\(\circlearrowleft_{4, 1}\) to the list of operations, which lets us
add \((1 \; 4)\) to the list of generators for \(S_4\), and then add
\((3 \; 4 \; 1)\) to the list of generators for \(K(S_4)\). Then
\((1 \; 4) (2 \; 3) = [(2 \; 3 \; 4), (1 \; 2 \; 3)]\) and \((2 \;
1) (3 \; 4) = [(3 \; 4 \; 1), (2 \; 3 \; 4)]\) suffice to generate
\(K^{(2)}(S_4)\).<sup><a href="#fn5" id="r5">[5]</a></sup> Finally,
we can easily compute \(K^{(3)}(S_4)\) to be the trivial group.</p>
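<p>If we only care about how big each \(K^{(i)}(S_n)\) is, brute force is enough for small \(n\). The following sketch is not how the visualizations work; it simply enumerates the group, collects all commutators, and closes them under composition. It assumes \([g, h] = g^{-1} h^{-1} g h\), and the function names are hypothetical.</p>

```javascript
function compose(p, q) {
  // Apply q first, then p.
  return q.map(function(x) { return p[x]; });
}
function inverse(p) {
  var inv = new Array(p.length);
  p.forEach(function(x, i) { inv[x] = i; });
  return inv;
}
function commutator(g, h) {
  return compose(compose(inverse(g), inverse(h)), compose(g, h));
}
// Closes the generators under composition. In a finite group this yields
// exactly the subgroup they generate.
function generate(generators, n) {
  var identity = [];
  for (var i = 0; i < n; ++i) {
    identity.push(i);
  }
  var elements = {};
  elements[identity.join(',')] = identity;
  var pending = [identity];
  while (pending.length > 0) {
    var p = pending.pop();
    generators.forEach(function(g) {
      var q = compose(g, p);
      var key = q.join(',');
      if (!(key in elements)) {
        elements[key] = q;
        pending.push(q);
      }
    });
  }
  return Object.keys(elements).map(function(key) { return elements[key]; });
}
function commutatorSubgroup(group, n) {
  var commutators = [];
  group.forEach(function(g) {
    group.forEach(function(h) { commutators.push(commutator(g, h)); });
  });
  return generate(commutators, n);
}
// Sizes of S_n, K(S_n), K^(2)(S_n), ... until the trivial group or maxDepth.
function derivedSeriesSizes(n, maxDepth) {
  var generators = [];
  for (var i = 0; i + 1 < n; ++i) {
    var t = [];
    for (var x = 0; x < n; ++x) {
      t.push(x);
    }
    t[i] = i + 1;
    t[i + 1] = i;
    generators.push(t);
  }
  var group = generate(generators, n);
  var sizes = [group.length];
  for (var d = 0; d < maxDepth && group.length > 1; ++d) {
    group = commutatorSubgroup(group, n);
    sizes.push(group.length);
  }
  return sizes;
}

var sizesForS3 = derivedSeriesSizes(3, 5);
var sizesForS4 = derivedSeriesSizes(4, 5);
```

<p>For \(S_4\) this gives one more non-trivial step than for \(S_3\) before reaching the trivial group, in line with \(K^{(2)}(S_4)\) being non-trivial and \(K^{(3)}(S_4)\) being trivial.</p>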
<p>What does that tell us about what expressions we can rule out as
solutions to the general quartic equation? Similarly to the cubic
case, we expect to be able to rule out rational expressions and
algebraic expressions with no nested radicals, and since
\(K^{(2)}(S_4)\) is not the trivial group, we also expect to be able
to rule out algebraic expressions with singly-nested radicals, like
\(\sqrt{a - \sqrt{bc - 4}}\). But since \(K^{(3)}(S_4)\) is the
trivial group, we don’t expect to be able to rule out algebraic
expressions with doubly-nested radicals, like \(\sqrt{a - \sqrt{bc -
\sqrt{d + 3}}}\).</p>
<p>As an antidote to all the abstractness above, here is a
visualization for quartics, where you can examine how the various
operations interact with the <a href="https://en.wikipedia.org/wiki/Quartic_function#General_formula_for_roots">quartic formula</a>
and its subexpressions.</p>
<div class="interactive-example">
<h3>Interactive Example 5: The quartic equation</h3>
<div class="graph-container">
Roots
<nokatex><div id="rootBoardQuartic" class="graph jxgbox"></div></nokatex>
<span id="rootListQuartic">
\(R = \langle r_1, r_2, r_3, r_4 \rangle\)
</span>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.resetRootAndResultList();">
Reset \(R\) and \(X\) order
</button>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opA1, 1000);">
\(A_1 = \circlearrowleft_{1, 2}\)
</button>
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opA2, 1000);">
\(A_2 = \circlearrowleft_{2, 3}\)
</button>
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opA3, 1000);">
\(A_3 = \circlearrowleft_{3, 4}\)
</button>
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opA4, 1000);">
\(A_4 = \circlearrowleft_{4, 1}\)
</button>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opA1.invert(), 1000);">
\(A_1^\prime = \circlearrowright_{1, 2}\)
</button>
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opA2.invert(), 1000);">
\(A_2^\prime = \circlearrowright_{2, 3}\)
</button>
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opA3.invert(), 1000);">
\(A_3^\prime = \circlearrowright_{3, 4}\)
</button>
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opA4.invert(), 1000);">
\(A_4^\prime = \circlearrowright_{4, 1}\)
</button>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opB1, 4000);">
\(B_1 = [\![ A_2, A_1 ]\!] \mapsto (1 \; 2 \; 3)\)
</button>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opB2, 4000);">
\(B_2 = [\![ A_3, A_2 ]\!] \mapsto (2 \; 3 \; 4)\)
</button>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opB3, 4000);">
\(B_3 = [\![ A_4, A_3 ]\!] \mapsto (3 \; 4 \; 1)\)
</button>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opB1.invert(), 4000);">
\(B_1^\prime\)
</button>
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opB2.invert(), 4000);">
\(B_2^\prime\)
</button>
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opB3.invert(), 4000);">
\(B_3^\prime\)
</button>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opC1, 16000);">
\(C_1 = [\![ B_2, B_1 ]\!] \mapsto (1 \; 4) (2 \; 3)\)
</button>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opC2, 16000);">
\(C_2 = [\![ B_3, B_2 ]\!] \mapsto (2 \; 1) (3 \; 4)\)
</button>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opC1.invert(), 16000);">
\(C_1^\prime\)
</button>
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.runOp(quartic.opC2.invert(), 16000);">
\(C_2^\prime\)
</button>
</div>
<div class="graph-container">
Coefficients
<nokatex><div id="coeffBoardQuartic" class="graph jxgbox"></div></nokatex>
</div>
<div class="graph-container">
Candidate solution
<nokatex><div id="formulaBoardQuartic" class="graph jxgbox"></div></nokatex>
<span id="resultListQuartic">
\(X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle\)
</span>
<span id="resultNoteQuartic"></span>
<br />
<button class="interactive-example-button quarticDisableWhileRunningOp"
type="button" onclick="quartic.findFirstOpRulingOutSelectedFormula();">
Find first operation that rules out selected formula
</button>
<span id="findFirstOpStatusQuartic"></span>
<br />
<label>
<input class="quarticDisableWhileRunningOp" name="formulaQuartic" type="radio"
onchange="quartic.switchFormula(quarticScaledDiscFormula);" />
\(x_1 = -27 \Delta\)
</label>
<br />
<label>
<input class="quarticDisableWhileRunningOp" name="formulaQuartic" type="radio"
onchange="quartic.switchFormula(newQuarticQCubedFormula());" />
\(x_{1, 2} = Q^3 = \frac{\Delta_1 + \sqrt{-27 \Delta}}{2}\)
</label>
<br />
<label>
<input checked class="quarticDisableWhileRunningOp" name="formulaQuartic" type="radio"
onchange="quartic.switchFormula(newQuarticQCubedFormula().root(3));" />
\(x_{1, 2, 3, 4, 5, 6} = Q\)
</label>
<br />
<label>
<input class="quarticDisableWhileRunningOp" name="formulaQuartic" type="radio"
onchange="quartic.switchFormula(newQuarticSFormula());" />
\(x_{1, 2, 3, 4, 5, 6} = S =\)
<br />
\(\qquad \frac{1}{2} \sqrt{-\frac{2}{3} p + \frac{1}{3a} \left( Q + \frac{\Delta_0}{Q} \right)}\)
</label>
<br />
<label>
<input class="quarticDisableWhileRunningOp" name="formulaQuartic" type="radio"
onchange="quartic.switchFormula(newQuarticFormula());" />
\(x_{1, 2, 3, 4} = \)
<br />
\(\qquad -\frac{b}{4a} \mp S + \frac{1}{2} \sqrt{-4S^2 - 2p \pm \frac{q}{S}}\)
<br />
(the quartic formula)
</label>
</div>
</div>
<script type="text/javascript">
'use strict';
function isIdentityPermutation(permutation) {
for (var i = 0; i < permutation.length; ++i) {
if (permutation[i] != i) {
return false;
}
}
return true;
}
function updateResultNote(display, resultNoteID, formulaName) {
var rootPermutation = display.getRootPermutation();
var resultPermutation = display.getResultPermutation();
var resultNote = document.getElementById(resultNoteID);
if (isIdentityPermutation(rootPermutation) ==
isIdentityPermutation(resultPermutation)) {
resultNote.innerHTML = '';
} else {
resultNote.innerHTML = '(Applied operation rules out selected formula as the ' + formulaName + ' formula.)';
}
}
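// Resets the display, runs op, and records whether exactly one of R and X
// is back in its initial order (which is what rules out the formula). It
// then undoes op by running its inverse and passes the recorded result to
// doneCallback.
function checkOpRulesOutFormula(
display, resetFn, runOpFn, op, time, undoCallback, doneCallback) {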
resetFn();
runOpFn(op, time, function() {
var rootPermutation = display.getRootPermutation();
var resultPermutation = display.getResultPermutation();
var rulesOut = (isIdentityPermutation(rootPermutation) !=
isIdentityPermutation(resultPermutation));
undoCallback();
runOpFn(op.invert(), time, function() {
doneCallback(rulesOut);
});
});
}
function findFirstOpRulingOutSelectedFormulaHelper(
display, resetFn, runOpFn, opInfos, statusCallback, doneCallback) {
var i = 0;
var undoCallback = function() {
statusCallback(opInfos[i], true);
}
var _doneCallback = function(rulesOut) {
if (rulesOut) {
doneCallback(opInfos[i]);
return;
}
++i;
if (i >= opInfos.length) {
doneCallback(null);
return;
}
statusCallback(opInfos[i], false);
checkOpRulesOutFormula(
display, resetFn, runOpFn,
opInfos[i].op, opInfos[i].time, undoCallback, _doneCallback);
};
statusCallback(opInfos[0], false);
checkOpRulesOutFormula(
display, resetFn, runOpFn,
opInfos[0].op, opInfos[0].time, undoCallback, _doneCallback);
}
function findFirstOpRulingOutSelectedFormula(
display, resetFn, runOpFn, opInfos, statusID) {
var status = document.getElementById(statusID);
var statusCallback = function(opInfo, isUndo) {
if (isUndo) {
status.innerHTML = 'Undoing ' + opInfo.name + '...';
} else {
status.innerHTML = 'Trying ' + opInfo.name + '...';
}
};
var doneCallback = function(opInfo) {
if (opInfo === null) {
status.innerHTML = 'No op ruling out selected formula found';
} else {
status.innerHTML = opInfo.name + ' rules out selected formula';
}
};
findFirstOpRulingOutSelectedFormulaHelper(
display, resetFn, runOpFn, opInfos, statusCallback, doneCallback);
}
var quartic = (function() {
var initialRoots = [
new Complex(0, 1), new Complex(-0.5, -0.5),
new Complex(0.5, 0.5), new Complex(0.5, -0.5)
];
var display = new Display(
"rootBoardQuartic", "coeffBoardQuartic", "formulaBoardQuartic",
initialRoots, newQuarticQCubedFormula().root(3), function() {});
display._resultRotationCounterPoint.setAttribute({visible: false});
function updateRootAndResultList(display) {
updateRootList(display, "rootListQuartic");
updateResultList(display, "resultListQuartic");
updateResultNote(display, "resultNoteQuartic", "quartic");
}
var state = {};
function runQuarticOp(op, time, doneCallback) {
runOp(display, op, time, '.quarticDisableWhileRunningOp', state, doneCallback);
};
function switchQuarticFormula(formula) {
switchFormula(display, state, formula);
updateRootAndResultList(display);
}
function resetRootAndResultList() {
display.reorderPointsBySubscript();
display.resetResultRotationCounters();
updateRootAndResultList(display);
}
var opA1 = display.swapRootOp(0, 1, updateRootAndResultList);
var opA2 = display.swapRootOp(1, 2, updateRootAndResultList);
var opA3 = display.swapRootOp(2, 3, updateRootAndResultList);
var opA4 = display.swapRootOp(3, 0, updateRootAndResultList);
var opB1 = newCommutatorAnimation(opA2, opA1);
var opB2 = newCommutatorAnimation(opA3, opA2);
var opB3 = newCommutatorAnimation(opA4, opA3);
var opC1 = newCommutatorAnimation(opB2, opB1);
var opC2 = newCommutatorAnimation(opB3, opB2);
var opInfos = [
{
name: 'A<sub>1</sub>',
op: opA1,
time: 1000
},
{
name: 'A<sub>2</sub>',
op: opA2,
time: 1000
},
{
name: 'A<sub>3</sub>',
op: opA3,
time: 1000
},
{
name: 'A<sub>4</sub>',
op: opA4,
time: 1000
},
{
name: 'B<sub>1</sub>',
op: opB1,
time: 4000
},
{
name: 'B<sub>2</sub>',
op: opB2,
time: 4000
},
{
name: 'B<sub>3</sub>',
op: opB3,
time: 4000
},
{
name: 'C<sub>1</sub>',
op: opC1,
time: 16000
},
{
name: 'C<sub>2</sub>',
op: opC2,
time: 16000
}
];
function findFirstOpRulingOutSelectedFormulaQuartic() {
findFirstOpRulingOutSelectedFormula(
display, resetRootAndResultList, runQuarticOp, opInfos,
'findFirstOpStatusQuartic');
}
return {
display: display,
opA1: opA1,
opA2: opA2,
opA3: opA3,
opA4: opA4,
opB1: opB1,
opB2: opB2,
opB3: opB3,
opC1: opC1,
opC2: opC2,
runOp: runQuarticOp,
resetRootAndResultList: resetRootAndResultList,
switchFormula: switchQuarticFormula,
findFirstOpRulingOutSelectedFormula: findFirstOpRulingOutSelectedFormulaQuartic
};
})();
</script>
<p>There are a few additions to the interactive display above. It now
prints a message when it detects that the selected expression is
ruled out as the quartic formula, which just checks whether \(R\)
is out of order while \(X\) is in order, or vice versa. There’s also a
button to reset the ordering of \(R\) and \(X\).</p>
<p>The second addition is that the operations have been organized to
make clear what commutator subgroup they’re in. The \(A_i\) map
to generators of \(S_4\). Then taking the commutators of adjacent
\(A_i\) gives the \(B_i\), which map to the generators of \(K(S_4)\), and
taking the commutators of adjacent \(B_i\) gives the \(C_i\), which map
to the generators of \(K^{(2)}(S_4)\).</p>
<p>The third addition is a button that finds the first operation that
rules out the selected formula, if any. It simply tries all the
\(A_i\)s, then all the \(B_i\)s, then all the \(C_i\)s, checking \(R\)
and \(X\) in between. The general algorithm, which assumes a fixed set
of roots \(r_1, \ldots, r_n\text{,}\) takes an expression \(f(a_n, a_{n-1}, \ldots)\)
where \(a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0 = 0\) is the general
\(n\)th-degree polynomial equation, takes a depth limit \(k\), and
looks like this (defining \(K^{(0)}(G)\) to be just \(G\)):
<ol>
<li>For \(i\) from 0 to \(k\):
<ol>
<li>If \(K^{(i)}(S_n)\) is trivial, then terminate indicating that
\(f(a_n, a_{n-1}, \ldots)\) was unable to be ruled out because
\(K^{(i)}(S_n)\) is trivial.</li>
<li>Otherwise, find operations \(o_1\) to \(o_m\) that act as the
generators \(g_1\) to \(g_m\) of \(K^{(i)}(S_n)\). For \(i >
0\), this can be done by applying \([\![ o_1, o_2 ]\!]\) to the
operations corresponding to the generators of
\(K^{(i-1)}(S_n)\).</li>
<li>For each \(o_j\):
<ol>
<li>Apply \(o_j\).</li>
<li>If \(R\) is not in order but \(X\) is, terminate indicating
that \(o_j\) rules out \(f(a_n, a_{n-1}, \ldots)\).</li>
<li>Undo \(o_j\), i.e. apply \(o_j^\prime\) or reset to the
initial state of \(r_1, \ldots, r_n\).</li>
</ol></li>
</ol></li>
<li>Terminate indicating that \(f(a_n, a_{n-1}, \ldots)\) was unable to
be ruled out because the depth limit has been reached.</li>
</ol>
</p>
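<p>The steps above can be sketched without any animation. This is a simplified model, not the code behind the button: it assumes each operation is summarized by the permutation it induces on \(R\) and on \(X\), and it builds the next level from the commutators of <em>all</em> pairs at the current level, which for larger groups may need extra generators, as we saw with \(A_4\) above. The names <code>rootPerm</code>, <code>resultPerm</code>, and <code>findFirstOpRulingOut</code> are hypothetical.</p>

```javascript
function compose(p, q) {
  // Apply q first, then p.
  return q.map(function(x) { return p[x]; });
}
function inverse(p) {
  var inv = new Array(p.length);
  p.forEach(function(x, i) { inv[x] = i; });
  return inv;
}
function isIdentity(p) {
  return p.every(function(x, i) { return x === i; });
}
// [[o1, o2]], applied to both recorded actions.
function commutatorOp(o1, o2) {
  function word(g, h) {
    return compose(compose(inverse(g), inverse(h)), compose(g, h));
  }
  return {
    name: '[[' + o1.name + ', ' + o2.name + ']]',
    rootPerm: word(o1.rootPerm, o2.rootPerm),
    resultPerm: word(o1.resultPerm, o2.resultPerm)
  };
}
function findFirstOpRulingOut(generatorOps, depthLimit) {
  var ops = generatorOps;
  for (var depth = 0; depth <= depthLimit; ++depth) {
    // Keep only the operations that actually move the roots.
    ops = ops.filter(function(op) { return !isIdentity(op.rootPerm); });
    if (ops.length === 0) {
      return {ruledOutBy: null, reason: 'K^(' + depth + ') is trivial'};
    }
    for (var j = 0; j < ops.length; ++j) {
      // R is out of order, but X is back in order.
      if (isIdentity(ops[j].resultPerm)) {
        return {ruledOutBy: ops[j].name, reason: 'acts on R but not on X'};
      }
    }
    // Next level: commutators of all pairs from this level.
    var next = [];
    ops.forEach(function(o1) {
      ops.forEach(function(o2) { next.push(commutatorOp(o1, o2)); });
    });
    ops = next;
  }
  return {ruledOutBy: null, reason: 'depth limit reached'};
}

// A candidate with five values that every swap rotates by one step.
var fifthRootLike = [
  {name: 'A1', rootPerm: [1, 0, 2], resultPerm: [1, 2, 3, 4, 0]},
  {name: 'A2', rootPerm: [0, 2, 1], resultPerm: [1, 2, 3, 4, 0]}
];
// A candidate whose values are permuted exactly like the roots.
var formulaLike = [
  {name: 'A1', rootPerm: [1, 0, 2], resultPerm: [1, 0, 2]},
  {name: 'A2', rootPerm: [0, 2, 1], resultPerm: [0, 2, 1]}
];
var first = findFirstOpRulingOut(fifthRootLike, 3);
var second = findFirstOpRulingOut(formulaLike, 3);
```

<p>In this model the first candidate is ruled out by a commutator of the two swaps, while the second is never ruled out and the search stops once the commutators act trivially on \(R\).</p>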
<p>This algorithm basically just implements the proof of the following
lemma, which generalizes the previous theorems, except that it tries
to find the simplest operation that is a generator that rules out
the given expression.</p>
<p>Before we state the lemma, we need another definition: let the <em>radical level</em> of an algebraic expression
\(f(a_n, a_{n-1}, \ldots)\) be \(0\) if \(f(a_n, a_{n-1}, \ldots)\) is a
rational expression, \(1\) if \(f(a_n, a_{n-1}, \ldots)\) has only
non-nested radicals, and \(m + 1\) if its radicals are nested at most
\(m\) levels deep, so that an expression with singly-nested radicals
has radical level \(2\).</p>
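<p>Put differently, the radical level is the largest number of radical signs stacked inside one another. As a small sketch, assuming expressions are stored as trees (the node shapes and constructor names here are made up for illustration and are unrelated to the formula objects used by the visualizations):</p>

```javascript
// Hypothetical expression tree constructors.
function coeff(name) { return {type: 'coeff', name: name}; }
function num(value) { return {type: 'const', value: value}; }
function op(symbol, args) { return {type: 'op', symbol: symbol, args: args}; }
function root(k, arg) { return {type: 'root', k: k, arg: arg}; }

function radicalLevel(expr) {
  switch (expr.type) {
    case 'coeff':
    case 'const':
      return 0;
    case 'root':
      // Each radical adds one level on top of whatever is under it.
      return 1 + radicalLevel(expr.arg);
    case 'op':
      // Arithmetic doesn't add levels; take the deepest argument.
      return Math.max.apply(null, expr.args.map(function(arg) {
        return radicalLevel(arg);
      }));
    default:
      throw new Error('unknown node type: ' + expr.type);
  }
}

// (a + b) / c
var rational = op('/', [op('+', [coeff('a'), coeff('b')]), coeff('c')]);
// sqrt(a) + cbrt(b)
var nonNested = op('+', [root(2, coeff('a')), root(3, coeff('b'))]);
// sqrt(a - sqrt(bc - 4))
var singlyNested = root(2, op('-', [
  coeff('a'),
  root(2, op('-', [op('*', [coeff('b'), coeff('c')]), num(4)]))
]));
// sqrt(a - sqrt(bc - sqrt(d + 3)))
var doublyNested = root(2, op('-', [
  coeff('a'),
  root(2, op('-', [
    op('*', [coeff('b'), coeff('c')]),
    root(2, op('+', [coeff('d'), num(3)]))
  ]))
]));
var levels = [rational, nonNested, singlyNested, doublyNested].map(
  function(expr) { return radicalLevel(expr); });
```

<p>The four sample expressions have radical levels \(0\), \(1\), \(2\), and \(3\) respectively.</p>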
<p class="theorem">(<span class="theorem-name">Lemma 3</span>.) If the
algebraic expression \(f(a_n, a_{n-1}, \ldots)\) has radical level
\(d\) and \(K^{(d)}(S_n)\) is non-trivial, then any operation that
maps to a non-trivial element \(g\) in \(K^{(d)}(S_n)\) rules out
\(f(a_n, a_{n-1}, \ldots)\) as the solution to the general
\(n\)th-degree polynomial equation
\[
a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0 = 0\text{.}
\]</p>
<div class="proof">
<p><span class="proof-name">Rough sketch of proof.</span> We just do
induction on \(d\). For the base case \(d = 0\), if \(K^{(0)}(S_n)\)
is non-trivial, then \(n \ge 2\). Let \(g = (i \; j)\) for any \(i
\ne j\), of which there must be at least one. Then by the same
reasoning as Theorem 1, \(g\) rules out \(f(a_n, a_{n-1},
\ldots)\). Since the \((i \; j)\) generate \(S_n\), any non-trivial \(g \in
S_n\) is the composition of some sequence of \((i \; j)\)s, each of
which rules out \(f(a_n, a_{n-1}, \ldots)\), so \(g\) must also rule
it out.</p>
<p>Assume the lemma holds for \(d\), and let \(x = f_{d+1}(a_n,
a_{n-1}, \ldots) = \sqrt[k]{f_d(a_n, a_{n-1}, \ldots)}\) for some
\(k\), where \(f_d\) has radical level \(d\). Let \(o\) act on \(R\)
like any non-trivial element \(g\) of \(K^{(d+1)}(S_n)\). By the
induction hypothesis, all elements \(h_i \in K^{(d)}(S_n)\) cause
\(x = f_d(a_n, a_{n-1}, \ldots)\) to go around a loop, so pick a
point \(z_0\) that is never equal to any point \(x\) traverses under
any operation corresponding to \(h_i\). Then, since \(g = [h_1, h_2]\)
for \(h_1, h_2 \in K^{(d)}(S_n)\), by the same reasoning as in
Theorem 2, the total angle induced by \(o\) on \(x =
f_{d+1}(a_n, a_{n-1}, \ldots)\) around \(z_0\) is \(0\), and the
distance from \(z_0\) remains unchanged. Thus, \(x = f_{d+1}(a_n,
a_{n-1}, \ldots)\) remains fixed, and \(o\) rules it out.</p>
<p>By the same reasoning as in Theorem 2, this can be extended to the
general case of \(f(a_n, a_{n-1}, \ldots)\) being any algebraic
expression with radical level \(d + 1\). ∎</p>
</div>
<p>We can immediately deduce the following corollary, using the fact
that \(K^{(2)}(S_4)\) is non-trivial:</p>
<p class="theorem">(<span class="theorem-name">Corollary 4</span>.) An
algebraic expression with at most singly-nested radicals in the
coefficients of the general quartic equation
\[
ax^4 + bx^3 + cx^2 + dx + e = 0
\]
cannot be a solution to this equation.<sup><a href="#fn6" id="r6">[6]</a></sup></p>
</section>
<section>
<header>
<h2>Quintic Equations</h2>
</header>
<p>Now, finally, the quintic. Let’s jump right to the interactive example.</p>
<div class="interactive-example">
<h3>Interactive Example 6: The quintic equation</h3>
<div class="graph-container">
Roots
<nokatex><div id="rootBoardQuintic" class="graph jxgbox"></div></nokatex>
<span id="rootListQuintic">
\(R = \langle r_1, r_2, r_3, r_4, r_5 \rangle\)
</span>
<br />
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.resetRootAndResultList();">
Reset \(R\) and \(X\) order
</button>
<br />
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opA1, 1000);">
\(A_1 = \circlearrowleft_{1, 2}\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opA2, 1000);">
\(A_2 = \circlearrowleft_{2, 3}\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opA3, 1000);">
\(A_3 = \circlearrowleft_{3, 4}\)
</button>
<br />
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opA4, 1000);">
\(A_4 = \circlearrowleft_{4, 5}\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opA5, 1000);">
\(A_5 = \circlearrowleft_{5, 1}\)
</button>
<br />
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opA1.invert(), 1000);">
\(A_1^\prime = \circlearrowright_{1, 2}\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opA2.invert(), 1000);">
\(A_2^\prime = \circlearrowright_{2, 3}\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opA3.invert(), 1000);">
\(A_3^\prime = \circlearrowright_{3, 4}\)
</button>
<br />
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opA4.invert(), 1000);">
\(A_4^\prime = \circlearrowright_{4, 5}\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opA5.invert(), 1000);">
\(A_5^\prime = \circlearrowright_{5, 1}\)
</button>
<br />
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opB1, 4000);">
\(B_1 = [\![ A_2, A_1 ]\!] \mapsto (1 \; 2 \; 3)\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opB2, 4000);">
\(B_2 = [\![ A_3, A_2 ]\!] \mapsto (2 \; 3 \; 4)\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opB3, 4000);">
\(B_3 = [\![ A_4, A_3 ]\!] \mapsto (3 \; 4 \; 5)\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opB4, 4000);">
\(B_4 = [\![ A_5, A_4 ]\!] \mapsto (4 \; 5 \; 1)\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opB5, 4000);">
\(B_5 = [\![ A_1, A_5 ]\!] \mapsto (5 \; 1 \; 2)\)
</button>
<br />
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opB1.invert(), 4000);">
\(B_1^\prime\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opB2.invert(), 4000);">
\(B_2^\prime\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opB3.invert(), 4000);">
\(B_3^\prime\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opB4.invert(), 4000);">
\(B_4^\prime\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opB5.invert(), 4000);">
\(B_5^\prime\)
</button>
<br />
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opC1, 16000);">
\(C_1 = [\![ B_3, B_1 ]\!] \mapsto (2 \; 3 \; 5)\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opC2, 16000);">
\(C_2 = [\![ B_4, B_2 ]\!] \mapsto (3 \; 4 \; 1)\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opC3, 16000);">
\(C_3 = [\![ B_5, B_3 ]\!] \mapsto (4 \; 5 \; 2)\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opC4, 16000);">
\(C_4 = [\![ B_1, B_4 ]\!] \mapsto (5 \; 1 \; 3)\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opC5, 16000);">
\(C_5 = [\![ B_2, B_5 ]\!] \mapsto (1 \; 2 \; 4)\)
</button>
<br />
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opC1.invert(), 16000);">
\(C_1^\prime\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opC2.invert(), 16000);">
\(C_2^\prime\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opC3.invert(), 16000);">
\(C_3^\prime\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opC4.invert(), 16000);">
\(C_4^\prime\)
</button>
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.runOp(quintic.opC5.invert(), 16000);">
\(C_5^\prime\)
</button>
</div>
<div class="graph-container">
Coefficients
<nokatex><div id="coeffBoardQuintic" class="graph jxgbox"></div></nokatex>
</div>
<div class="graph-container">
Candidate solution
<nokatex><div id="formulaBoardQuintic" class="graph jxgbox"></div></nokatex>
<span id="resultListQuintic">
\(X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle\)
</span>
<span id="resultNoteQuintic"></span>
<br />
<button class="interactive-example-button quinticDisableWhileRunningOp"
type="button" onclick="quintic.findFirstOpRulingOutSelectedFormula();">
Find first operation that rules out selected formula
</button>
<span id="findFirstOpStatusQuintic"></span>
<br />
<label>
<input class="interactive-example-button quinticDisableWhileRunningOp" name="formulaQuintic" type="radio"
onchange="quintic.switchFormula(quintic.fA);" />
\(x_1 = f_A = \Delta\)
</label>
<br />
<label>
<input class="interactive-example-button quinticDisableWhileRunningOp" name="formulaQuintic" type="radio"
onchange="quintic.switchFormula(quintic.newFB());" />
\(x_{1, 2} = f_B = \sqrt{f_A}\)
</label>
<br />
<label>
<input checked class="interactive-example-button quinticDisableWhileRunningOp" name="formulaQuintic" type="radio"
onchange="quintic.switchFormula(quintic.newFC());" />
\(x_{1, 2, 3, 4, 5, 6} = f_C =\)
<br />
\(\qquad \sqrt[3]{(f_B - 0.8)(f_B - 0.75)}\)
</label>
</div>
</div>
<script type="text/javascript">
'use strict';
var quintic = (function() {
var initialRoots = [
new Complex(0, 1), new Complex(-0.5, -0.5), new Complex(0.5, -0.5),
new Complex(1, 0), new Complex(0.5, 0.5)
];
var display = new Display(
"rootBoardQuintic", "coeffBoardQuintic", "formulaBoardQuintic",
initialRoots, newFC(), function() {});
display._resultRotationCounterPoint.setAttribute({visible: false});
for (var i = 0; i < display._rootPointsBySubscript.length; ++i) {
display._rootPointsBySubscript[i].setAttribute({
fixed: true
});
}
function updateRootAndResultList(display) {
updateRootList(display, "rootListQuintic");
updateResultList(display, "resultListQuintic");
updateResultNote(display, "resultNoteQuintic", "quintic");
}
var state = {};
function runQuinticOp(op, time, doneCallback) {
runOp(display, op, time, '.quinticDisableWhileRunningOp', state, doneCallback);
};
function switchQuinticFormula(formula) {
switchFormula(display, state, formula);
updateRootAndResultList(display);
}
function resetRootAndResultList() {
display.reorderPointsBySubscript();
display.resetResultRotationCounters();
updateRootAndResultList(display);
}
var opA1 = display.swapRootOp(0, 1, updateRootAndResultList);
var opA2 = display.swapRootOp(1, 2, updateRootAndResultList);
var opA3 = display.swapRootOp(2, 3, updateRootAndResultList);
var opA4 = display.swapRootOp(3, 4, updateRootAndResultList);
var opA5 = display.swapRootOp(4, 0, updateRootAndResultList);
var opA1Inv = opA1.invert();
var opA2Inv = opA2.invert();
var opA3Inv = opA3.invert();
var opA4Inv = opA4.invert();
var opA5Inv = opA5.invert();
var opB1 = newCommutatorAnimation(opA2, opA1);
var opB2 = newCommutatorAnimation(opA3, opA2);
var opB3 = newCommutatorAnimation(opA4, opA3);
var opB4 = newCommutatorAnimation(opA5, opA4);
var opB5 = newCommutatorAnimation(opA1, opA5);
var opB1Inv = opB1.invert();
var opB2Inv = opB2.invert();
var opB3Inv = opB3.invert();
var opB4Inv = opB4.invert();
var opB5Inv = opB5.invert();
var opC1 = newCommutatorAnimation(opB3, opB1);
var opC2 = newCommutatorAnimation(opB4, opB2);
var opC3 = newCommutatorAnimation(opB5, opB3);
var opC4 = newCommutatorAnimation(opB1, opB4);
var opC5 = newCommutatorAnimation(opB2, opB5);
var opC1Inv = opC1.invert();
var opC2Inv = opC2.invert();
var opC3Inv = opC3.invert();
var opC4Inv = opC4.invert();
var opC5Inv = opC5.invert();
var opInfos = [
{
name: 'A<sub>1</sub>',
op: opA1,
time: 1000
},
{
name: 'A<sub>2</sub>',
op: opA2,
time: 1000
},
{
name: 'A<sub>3</sub>',
op: opA3,
time: 1000
},
{
name: 'A<sub>4</sub>',
op: opA4,
time: 1000
},
{
name: 'A<sub>5</sub>',
op: opA5,
time: 1000
},
{
name: 'B<sub>1</sub>',
op: opB1,
time: 4000
},
{
name: 'B<sub>2</sub>',
op: opB2,
time: 4000
},
{
name: 'B<sub>3</sub>',
op: opB3,
time: 4000
},
{
name: 'B<sub>4</sub>',
op: opB4,
time: 4000
},
{
name: 'B<sub>5</sub>',
op: opB5,
time: 4000
},
{
name: 'C<sub>1</sub>',
op: opC1,
time: 16000
},
{
name: 'C<sub>2</sub>',
op: opC2,
time: 16000
},
{
name: 'C<sub>3</sub>',
op: opC3,
time: 16000
},
{
name: 'C<sub>4</sub>',
op: opC4,
time: 16000
},
{
name: 'C<sub>5</sub>',
op: opC5,
time: 16000
}
];
function findFirstOpRulingOutSelectedFormulaQuintic() {
findFirstOpRulingOutSelectedFormula(
display, resetRootAndResultList, runQuinticOp, opInfos,
'findFirstOpStatusQuintic');
}
// Ruled out by A_i.
var fA = quinticDiscFormula;
// Ruled out by B_i.
function newFB() {
return quinticDiscFormula.root(2);
}
// Has a rotation number with B_1, B_2, B_4, and B_5.
function newPreFC1() {
return newFB().minusAll(0.8);
}
// Has a rotation number with B_3.
function newPreFC2() {
return newFB().minusAll(0.75);
}
// Has a rotation number with all B_i.
function newPreFC3() {
return ComplexFormula.times(
newPreFC1(),
newPreFC2()
);
}
// 2 evenly divides the rotation numbers with B_1, B_2, B_4, and B_5, so
// this doesn't work for f_C.
function newPreFC4() {
return newPreFC3().root(2);
}
// Ruled out by C_i.
function newFC() {
return newPreFC3().root(3);
}
return {
display: display,
opA1: opA1,
opA2: opA2,
opA3: opA3,
opA4: opA4,
opA5: opA5,
opB1: opB1,
opB2: opB2,
opB3: opB3,
opB4: opB4,
opB5: opB5,
opC1: opC1,
opC2: opC2,
opC3: opC3,
opC4: opC4,
opC5: opC5,
fA: fA,
newFB: newFB,
newFC: newFC,
runOp: runQuinticOp,
resetRootAndResultList: resetRootAndResultList,
switchFormula: switchQuinticFormula,
findFirstOpRulingOutSelectedFormula: findFirstOpRulingOutSelectedFormulaQuintic
};
})();
</script>
<p>Similarly to the interactive example for the quartic, the
operations are organized to make clear what commutator subgroup
they’re in. There’s something interesting
though—the \(C_i\) seem very similar to the \(B_i\). In fact,
the \(C_i\) also act on \(R\) like \(A_5\)! Also, if you compute
\(D_i = [\![ C_{(i+1) \bmod 5}, C_{i \bmod
5} ]\!]\), you will find that \(D_i\) acts exactly like \(B_i\) on
\(R\)!</p>
<p>Why can we do this for the quintic, but not for anything of lower
degree? This is because \(A_5\) is <a href="https://en.wikipedia.org/wiki/Perfect_group">perfect</a>,
which means that it equals its own commutator subgroup. (You can
verify this yourself by brute force, e.g. writing a program, or you
can play around with \(3\)-cycles and see that any \(3\)-cycle is
the commutator of two other \(3\)-cycles.) Then this immediately
implies that \(K^{(n)}(S_5)\) is non-trivial for any \(n\), which
then implies our main result:</p>
<p class="theorem">(<span class="theorem-name">Abel-Ruffini theorem</span>.)
An algebraic expression in the coefficients of the general
\(n\)th-degree polynomial equation
\[
a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0 = 0
\]
for \(n \ge 5\) cannot be a solution to this equation.</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> By the above, \(A_5\) is
perfect, so \(K^{(d)}(S_5)\) is non-trivial for all \(d\).</p>
<p>Since \(S_5\) is a subgroup of \(S_n\) for \(n \ge 5\), \(A_5 =
K(S_5)\) must also be a subgroup of \(A_n = K(S_n)\) for \(n \ge
5\). But since \(A_5\) is perfect, then \(A_5\) must also be a
subgroup of \(K^{(d)}(S_n)\) for any \(d\), which means that
\(K^{(d)}(S_n)\) is non-trivial for any \(d\) and \(n \ge 5\).</p>
<p>An algebraic expression has some finite radical level \(d\), but
\(K^{(d)}(S_n)\) is non-trivial for any \(d\) and \(n \ge 5\), so by
Lemma 3 no algebraic expression can be a solution to the general
\(n\)th-degree polynomial equation for \(n \ge 5\). ∎</p>
</div>
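<p>The claim that \(A_5\) is perfect can be checked by brute force. Here
is a minimal sketch of such a program. It is not part of the code behind
the interactive examples: the helper names (<code>permutations</code>,
<code>commutatorSubgroup</code>, <code>derivedSeriesSizes</code>) are
made up for illustration, and it uses the \(g h g^{-1} h^{-1}\)
convention for commutators, which generates the same subgroup. It
computes the sizes of \(S_n\), \(K(S_n)\), \(K^{(2)}(S_n)\), and so on,
stopping once the series becomes trivial or stops shrinking.</p>

```javascript
// A permutation of {0, ..., n-1} is an array p where p[i] is the image of i.
function permutations(n) {
  if (n === 0) {
    return [[]];
  }
  var result = [];
  permutations(n - 1).forEach(function(p) {
    // Insert n-1 at every position of each permutation of n-1 items.
    for (var i = 0; i <= p.length; ++i) {
      var q = p.slice();
      q.splice(i, 0, n - 1);
      result.push(q);
    }
  });
  return result;
}

// (p * q)(i) = p[q[i]]
function compose(p, q) {
  return q.map(function(i) { return p[i]; });
}

function inverse(p) {
  var r = [];
  p.forEach(function(v, i) { r[v] = i; });
  return r;
}

// Returns the subgroup generated by all commutators of the given group
// (an array of permutations that is assumed to be closed under composition).
function commutatorSubgroup(group) {
  var seen = {};
  var elements = [];
  function add(p) {
    var key = p.join(',');
    if (!seen[key]) {
      seen[key] = true;
      elements.push(p);
    }
  }
  group.forEach(function(g) {
    group.forEach(function(h) {
      add(compose(compose(g, h), compose(inverse(g), inverse(h))));
    });
  });
  // Close under multiplication by the commutators found so far. In a finite
  // group, the products of generators already form the generated subgroup.
  var commutators = elements.slice();
  for (var i = 0; i < elements.length; ++i) {
    for (var j = 0; j < commutators.length; ++j) {
      add(compose(elements[i], commutators[j]));
    }
  }
  return elements;
}

function derivedSeriesSizes(n) {
  var group = permutations(n);
  var sizes = [group.length];
  while (true) {
    var next = commutatorSubgroup(group);
    sizes.push(next.length);
    if (next.length === 1 || next.length === group.length) {
      return sizes;
    }
    group = next;
  }
}
```

<p>With this sketch, <code>derivedSeriesSizes(4)</code> gives the sizes
24, 12, 4, 1, so the series for \(S_4\) reaches the trivial group, while
<code>derivedSeriesSizes(5)</code> gives 120, 60, 60, so it gets stuck at
\(A_5\).</p>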
<p>With the theorem above, we now have a succinct answer to the
question at the beginning of this article. You can’t write down
a solution to the general quadratic equation that is a rational
expression because you can find an operation on the roots that will
permute them non-trivially and yet leave the result of the expression
constant. For the same reason, you can’t write down a solution
to the general \(n\)th-degree polynomial equation for \(n \ge 5\) that
is an algebraic expression!</p>
<p>Finally, as a bonus, I’ll explain how to generate algebraic
expressions that require a “\(d\)th-level” operator,
meaning an operator that maps to an element of \(K^{(d)}(S_n)\),
assuming it’s non-trivial. This shows that there’s no
single “super-operation” that rules out all algebraic
expressions.</p>
<p>As an example, the formulas in the interactive example above are
chosen so that \(f_A\) is ruled out by the \(A_i\), \(f_B\) is ruled
out by the \(B_i\), etc. They depend on the particular roots chosen,
of course, which is why this interactive example doesn’t let you
move the roots around, but in principle you could build formulas for
any polynomial that is first ruled out by \(C_i\), or \(D_i\), or
whatever you wish. Given a polynomial \(P = a_n x^n + a_{n-1} x^{n-1}
+ \cdots + a_0\) of degree \(n \ge 5\) and \(d\), a recursive
algorithm to generate an expression that is ruled out only by a
“\(d\)th-level” operator is:
<ol>
<li>If \(d = 0\), return \(\Delta(a_n, a_{n-1}, \ldots)\).</li>
<li>Otherwise, run this algorithm with \(P\) and \(d-1\) to get
\(f_{d-1}(a_n, a_{n-1}, \ldots)\).</li>
<li>Find operations \(o_1\) to \(o_m\) that correspond to
generators \(g_1\) to \(g_m\) of \(K^{(d-1)}(S_n)\).</li>
<li>For each \(o_i\):
<ol>
<li>Apply \(o_i\), which makes \(x = f_{d-1}(a_n, a_{n-1},
\ldots)\) go around a loop. Record the looped-around regions
and their associated rotation numbers (i.e., the total angle
divided by \(2\pi\)).</li>
</ol>
</li>
<li>Pick points \(z_1, \ldots, z_t\) such that each \(z_i\) has
a non-zero rotation number for at least one \(o_j\). \(t\) can
be at most \(m\).</li>
<li>Let \(k\) be the least number such that, for every \(o_i\),
\(k\) doesn’t divide any of the rotation numbers of any
\(z_j\) with respect to \(o_i\). Return \(f_d(a_n, a_{n-1}, \ldots) = \sqrt[k]{\prod_i
(f_{d-1}(a_n, a_{n-1}, \ldots) - z_i)}\).
</li>
</ol>
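<p>Step 4.1 above depends on measuring rotation numbers. As a rough
sketch, the rotation number of a closed loop around a point \(z\) can be
estimated by summing the signed angles between consecutive samples of the
loop as seen from \(z\). The function names and the representation of a
loop as an array of <code>[re, im]</code> samples are assumptions made
for illustration; this is not the code used by the interactive examples,
and it assumes the loop is sampled finely and doesn’t pass through
\(z\).</p>

```javascript
// Returns the rotation number of the closed polyline `loop` (an array of
// [re, im] points, with the last point implicitly joined to the first)
// around the point z = [re, im].
function rotationNumber(loop, z) {
  var totalAngle = 0;
  for (var i = 0; i < loop.length; ++i) {
    var a = loop[i];
    var b = loop[(i + 1) % loop.length];
    var ax = a[0] - z[0], ay = a[1] - z[1];
    var bx = b[0] - z[0], by = b[1] - z[1];
    // Signed angle from (a - z) to (b - z), in (-pi, pi].
    totalAngle += Math.atan2(ax * by - ay * bx, ax * bx + ay * by);
  }
  // Total angle divided by 2*pi, rounded to absorb floating-point error.
  return Math.round(totalAngle / (2 * Math.PI));
}

// Hypothetical helper for trying it out: samples a circle around the origin
// that goes around `turns` times (counter-clockwise if positive).
function sampleCircle(radius, turns, samples) {
  var points = [];
  for (var i = 0; i < samples; ++i) {
    var t = 2 * Math.PI * turns * i / samples;
    points.push([radius * Math.cos(t), radius * Math.sin(t)]);
  }
  return points;
}
```

<p>For example, a unit circle traversed twice counter-clockwise has
rotation number \(2\) around the origin and \(0\) around any point
outside of it.</p>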
</section>
<section class="footnotes">
<p id="fn1">[1] This proof is originally due to <a href="https://en.wikipedia.org/wiki/Vladimir_Arnold">Arnold</a>. There
are a <a href="https://www.youtube.com/watch?v=RhpVSV6iCko">couple</a>
of <a href="http://drorbn.net/dbnvp/AKT-140314.php">videos</a> that
talk about this proof, as well as
<a href="http://link.springer.com/book/10.1007%2F1-4020-2187-9">this book</a>
based on Arnold’s lectures, and
<a href="https://www.tmna.ncu.pl/static/files/v16n2-02.pdf">this paper</a>.
I mostly follow Boaz’s video, and the interactive
visualizations are based on the visualizations he has in his
video.</p>
<p>The interactive visualizations were generated using
the excellent
<a href="http://jsxgraph.uni-bayreuth.de/wp/index.html">JSXGraph</a> library.
<a href="#r1">↩</a></p>
<p id="fn2">[2] Theorem 1 can be generalized even more! We can
append other functions and operations to rational expressions, as
long as those functions and operations are continuous and
single-valued. For example, we can allow the use of exponentials
and trigonometric functions, which is something that the standard
Galois theory cannot handle.<a href="#r2">↩</a></p>
<p id="fn3">[3] More precisely, a \(\circlearrowleft_{i, j}\)
contains a pair of simple paths, i.e. continuous injective
functions \([0, 1] \to \mathbb{C}\), between two distinct points
of \(\mathbb{C}\), such that their concatenation defines a simple
closed curve
around a region in \(\mathbb{C}\) with a counter-clockwise
orientation. Also, depending on the exact method of formalizing
\(\circlearrowleft_{i, j}\), it either explicitly or implicitly
encodes a permutation on \(R\). Then we can define an operation
\(*\) on the \(\circlearrowleft_{i, j}\) and
\(\circlearrowright_{i, j}\) (defined analogously) which
concatenates the paths (and composes the permutations, if
explicitly encoded). Since the space of paths has no inverses or
an identity, the \(\circlearrowleft_{i, j}\) and
\(\circlearrowright_{i, j}\) generate a <a
href="https://en.wikipedia.org/wiki/Free_semigroup">free semigroup</a> with
the operation \(*\). Then this semigroup defines an
<a href="https://en.wikipedia.org/wiki/Semigroup_action">action</a>
on \(R\) via its associated permutation on \(R\), which then just
generates \(S_n\), since \(S_n\) is generated by adjacent swaps.</p>
<p>We make a distinction between the operation
\(\circlearrowleft_{i, j}\) and the permutation it induces on
\(R\), since the latter “loses” the orientation
information, which is important to preserve when talking about the
action of \(\circlearrowleft_{i, j}\) on some \(x_i\).
<a href="#r3">↩</a></p>
<p id="fn4">[4] Note that, depending on the text, the commutator may
be defined slightly differently as \(g h g^{-1} h^{-1}\).
<a href="#r4">↩</a></p>
<p id="fn5">[5] \(K(A_4)\) is isomorphic to \(V\), the
<a href="https://en.wikipedia.org/wiki/Klein_four-group">Klein four-group</a>.
<a href="#r5">↩</a></p>
<p id="fn6">[6] In fact, the quartic formula has three nested
radicals. I wonder why?
<a href="#r6">↩</a></p>
</section>
https://www.akalin.com/computing-iroot
Computing Integer Roots
2016-01-10T00:00:00-08:00
Fred Akalin
https://www.akalin.com/
<!-- TODO: use align* instead of aligned when it is supported by KaTeX. -->
<!-- TODO: Use \operatorname when it is supported by KaTeX. -->
<!-- TODO: Use \dotsc when it is supported by KaTeX. -->
<script src="https://cdn.rawgit.com/jasondavies/jsbn/v1.4/jsbn.js"></script>
<script src="https://cdn.rawgit.com/jasondavies/jsbn/v1.4/jsbn2.js"></script>
<section>
<header>
<h2>1. The algorithm</h2>
</header>
<p>Today I’m going to talk about the generalization of
the <a href="/computing-isqrt">integer square root algorithm</a> to
higher roots. That is, given \(n\) and \(p\), computing
\(\mathrm{iroot}(n, p) = \lfloor \sqrt[p]{n} \rfloor\), or the
greatest integer whose \(p\)th power is less than or equal to
\(n\). The generalized algorithm is straightforward, and it’s
easy to generalize the proof of correctness, but the run-time bound is
a bit trickier, since it has a dependence on \(p\).</p>
<p>First, the algorithm, which we’ll call \(\mathrm{N{\small
EWTON}\text{-}I{\small ROOT}}\):</p>
<ol>
<li>If \(n = 0\), return \(0\).</li>
<li>If \(p \ge \mathrm{Bits}(n)\), return \(1\).</li>
<li>Otherwise, set \(i\) to \(0\) and set \(x_0\) to \(2^{\lceil
\mathrm{Bits}(n) / p\rceil}\).</li>
<li>Repeat:
<ol>
<li>Set \(x_{i+1}\) to \(\lfloor ((p - 1) x_i + \lfloor
n/x_i^{p-1} \rfloor) / p \rfloor\).</li>
<li>If \(x_{i+1} \ge x_i\), return \(x_i\). Otherwise, increment
\(i\).</li>
</ol>
</li>
</ol>
<p>and its implementation in JavaScript:<sup><a href="#fn1" id="r1">[1]</a></sup></p>
<script>
// iroot returns the greatest number x such that x^p <= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/jasondavies/jsbn ), n must be non-negative, and
// p must be a positive integer.
//
// Example (open up the JS console on this page and type):
//
// iroot(new BigInteger("64"), 3).toString()
function iroot(n, p) {
var s = n.signum();
if (s < 0) {
throw new Error('negative radicand');
}
if (p <= 0) {
throw new Error('non-positive degree');
}
if (p !== (p|0)) {
throw new Error('non-integral degree');
}
if (s == 0) {
return n;
}
var b = n.bitLength();
if (p >= b) {
return n.constructor.ONE;
}
// x = 2^ceil(Bits(n)/p)
var x = n.constructor.ONE.shiftLeft(Math.ceil(b/p));
var pMinusOne = new n.constructor((p - 1).toString());
var pBig = new n.constructor(p.toString());
while (true) {
// y = floor(((p-1)x + floor(n/x^(p-1)))/p)
var y = pMinusOne.multiply(x).add(n.divide(x.pow(pMinusOne))).divide(pBig);
if (y.compareTo(x) >= 0) {
return x;
}
x = y;
}
}
</script>
<pre class="code-container"><code class="language-javascript">// iroot returns the greatest number x such that x^p <= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/jasondavies/jsbn ), n must be non-negative, and
// p must be a positive integer.
//
// Example (open up the JS console on this page and type):
//
// iroot(new BigInteger("64"), 3).toString()
function iroot(n, p) {
var s = n.signum();
if (s < 0) {
throw new Error('negative radicand');
}
if (p <= 0) {
throw new Error('non-positive degree');
}
if (p !== (p|0)) {
throw new Error('non-integral degree');
}
if (s == 0) {
return n;
}
var b = n.bitLength();
if (p >= b) {
return n.constructor.ONE;
}
// x = 2^ceil(Bits(n)/p)
var x = n.constructor.ONE.shiftLeft(Math.ceil(b/p));
var pMinusOne = new n.constructor((p - 1).toString());
var pBig = new n.constructor(p.toString());
while (true) {
// y = floor(((p-1)x + floor(n/x^(p-1)))/p)
var y = pMinusOne.multiply(x).add(n.divide(x.pow(pMinusOne))).divide(pBig);
if (y.compareTo(x) >= 0) {
return x;
}
x = y;
}
}</code></pre>
<p>This algorithm turns out to require \(\Theta(p) + O(\lg \lg n)\)
loop iterations, with the run-time for a loop iteration depending on
what kind of arithmetic operations are used.</p>
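<p>For comparison, here is a sketch of the same steps using the native
<code>BigInt</code> type instead of jsbn. It assumes an environment that
supports <code>BigInt</code> (ES2020), and the names
<code>bitLength</code> and <code>irootBigInt</code> are chosen here for
illustration; they aren’t part of the implementation above.</p>

```javascript
// Number of bits of a positive BigInt, i.e. Bits(n).
function bitLength(n) {
  return n.toString(2).length;
}

// Returns the greatest BigInt x such that x^p <= n, where n is a
// non-negative BigInt and p is a positive integer Number.
function irootBigInt(n, p) {
  if (typeof n !== 'bigint' || n < 0n) {
    throw new RangeError('n must be a non-negative BigInt');
  }
  if (!Number.isInteger(p) || p <= 0) {
    throw new RangeError('p must be a positive integer');
  }
  if (n === 0n) {
    return 0n;
  }
  var b = bitLength(n);
  if (p >= b) {
    return 1n;
  }
  var pBig = BigInt(p);
  // x = 2^ceil(Bits(n)/p)
  var x = 1n << BigInt(Math.ceil(b / p));
  while (true) {
    // BigInt division truncates, which is a floor for non-negative values.
    var y = ((pBig - 1n) * x + n / x ** (pBig - 1n)) / pBig;
    if (y >= x) {
      return x;
    }
    x = y;
  }
}
```

<p>For example, <code>irootBigInt(64n, 3)</code> returns
<code>4n</code>, and <code>irootBigInt(63n, 3)</code> returns
<code>3n</code>.</p>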
</section>
<section>
<header>
<h2>2. Correctness</h2>
</header>
<p>Again we look at the iteration rule:
\[
x_{i+1} = \left\lfloor \frac{(p - 1) x_i + \left\lfloor \frac{n}{x_i^{p-1}}
\right\rfloor}{p} \right\rfloor
\]
Letting \(f(x)\) be the right-hand side, we can again use basic
properties of the floor function to remove the inner floor:
\[
f(x) = \left\lfloor \frac{1}{p} ((p-1) x + n/x^{p-1}) \right\rfloor
\]
Letting \(g(x)\) be its real-valued equivalent:
\[
g(x) = \frac{1}{p} ((p-1) x + n/x^{p-1})
\]
we can, again using basic properties of the floor function, show that
\(f(x) \le g(x)\), and for any integer \(m\), \(m \le f(x)\) if and
only if \(m \le g(x)\).</p>
<p>Finally, let’s give a name to our desired output: let \(s =
\mathrm{iroot}(n, p) = \lfloor \sqrt[p]{n} \rfloor\).<sup><a href="#fn2" id="r2">[2]</a></sup></p>
<p>Unsurprisingly, \(f(x)\) never underestimates:</p>
<p class="theorem">(<span class="theorem-name">Lemma 1</span>.) For
\(x \gt 0\), \(f(x) \ge s\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> By the basic properties of
\(f(x)\) and \(g(x)\) above, it suffices to show that \(g(x) \ge
s\). \(g'(x) = (1 - 1/p) (1 - n/x^p)\) and \(g''(x) = (p - 1)
(n/x^{p+1})\). Therefore, \(g(x)\) is concave-up for \(x \gt 0\); in
particular, its single positive extremum at \(x = \sqrt[p]{n}\) is a
minimum. But \(g(\sqrt[p]{n}) = \sqrt[p]{n} \ge s\). ∎</p>
</div>
<p>Also, our initial guess is always an overestimate:</p>
<p class="theorem">(<span class="theorem-name">Lemma 2</span>.) \(x_0
\gt s\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> \(\mathrm{Bits}(n) =
\lfloor \lg n \rfloor + 1 \gt \lg n\). Therefore,
\[
\begin{aligned}
x_0 &= 2^{\lceil \mathrm{Bits}(n) / p \rceil} \\
&\ge 2^{\mathrm{Bits}(n) / p} \\
&\gt 2^{\lg n / p} \\
&= \sqrt[p]{n} \\
&\ge s\text{.} \; \blacksquare
\end{aligned}
\]
</p>
</div>
<p>Therefore, we again have the invariant that \(x_i \ge s\), which
lets us prove partial correctness:</p>
<p class="theorem">(<span class="theorem-name">Theorem 1</span>.) If
\(\mathrm{N{\small EWTON}\text{-}I{\small ROOT}}\) terminates, it
returns the value \(s\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Assume it terminates. If it
terminates in step \(1\) or \(2\), then we are done. Otherwise, it can
only terminate in step \(4.2\) where it returns \(x_i\) such that
\(x_{i+1} = f(x_i) \ge x_i\). This implies \(g(x_i) = ((p-1)x_i +
n/x_i^{p-1}) / p \ge x_i\). Rearranging yields \(n \ge x_i^p\) and
combining with our invariant we get \(\sqrt[p]{n} \ge x_i \ge s\). But
\(s + 1 \gt \sqrt[p]{n}\), so that forces \(x_i\) to be \(s\), and
thus \(\mathrm{N{\small EWTON}\text{-}I{\small ROOT}}\) returns \(s\)
if it terminates. ∎</p>
</div>
<p>Total correctness is also easy:</p>
<p class="theorem">(<span class="theorem-name">Theorem 2</span>.)
\(\mathrm{N{\small EWTON}\text{-}I{\small ROOT}}\) terminates.</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Assume it doesn’t
terminate. Then we have a strictly decreasing infinite sequence of
integers \(\{ x_0, x_1, \ldots \}\). But this sequence is bounded below
by \(s\), so it cannot decrease indefinitely. This is a contradiction,
so \(\mathrm{N{\small EWTON}\text{-}I{\small ROOT}}\) must
terminate. ∎</p>
</div>
<p>Note that, like \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\),
the check in step \(4.2\) cannot be weakened to \(x_{i+1} = x_i\), as
doing so would cause the algorithm to oscillate. In fact, as \(p\)
grows, so does the number of values of \(n\) that exhibit this behavior,
and so does the number of possible oscillations. For example, \(n =
972\) with \(p = 3\) would yield the sequence \(\{ 16, 11, 10, 9, 10,
9, \ldots \}\), and \(n = 80\) with \(p = 4\) would yield the sequence
\(\{ 4, 3, 2, 4, 3, 2, \ldots \}\).</p>
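<p>These oscillations are easy to reproduce by applying the iteration
rule without the termination check. The following sketch assumes native
<code>BigInt</code> support, and <code>newtonIterates</code> is a name
made up for this illustration:</p>

```javascript
// Returns the first `count` values x_0, x_1, ... of the iteration rule for
// a positive BigInt n and an integer Number p with 1 < p < Bits(n),
// deliberately ignoring the termination check in step 4.2.
function newtonIterates(n, p, count) {
  var pBig = BigInt(p);
  var bits = n.toString(2).length;
  // x_0 = 2^ceil(Bits(n)/p)
  var x = 1n << BigInt(Math.ceil(bits / p));
  var iterates = [];
  for (var i = 0; i < count; ++i) {
    iterates.push(x);
    // x_{i+1} = floor(((p-1) x_i + floor(n / x_i^(p-1))) / p)
    x = ((pBig - 1n) * x + n / x ** (pBig - 1n)) / pBig;
  }
  return iterates;
}
```

<p>Here <code>newtonIterates(972n, 3, 6)</code> yields 16, 11, 10, 9,
10, 9, and <code>newtonIterates(80n, 4, 6)</code> yields 4, 3, 2, 4, 3,
2.</p>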
</section>
<section>
<header>
<h2>3. Run-time</h2>
</header>
<p>We will show that \(\mathrm{N{\small EWTON}\text{-}I{\small ROOT}}\)
takes \(\Theta(p) + O(\lg \lg n)\) loop iterations. Then we will
analyze a single loop iteration and the arithmetic operations used to
get a total run-time bound.</p>
<p>Analogous to the square root case, define \(\mathrm{Err}(x) =
x^p/n - 1\) and let \(\epsilon_i = \mathrm{Err}(x_i)\). First,
let’s prove our lower bound for \(\epsilon_i\), which translates
directly from the square root case:</p>
<p class="theorem">(<span class="theorem-name">Lemma 3</span>.) \(x_i
\ge s + 1\) if and only if \(\epsilon_i \ge 1/n\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> \(n \lt (s + 1)^p\), so \(n + 1
\le (s + 1)^p\), and therefore \((s + 1)^p/n - 1 \ge 1/n\). But the
expression on the left side is just \(\mathrm{Err}(s +
1)\). \(x_i \ge s + 1\) if and only if \(\epsilon_i \ge
\mathrm{Err}(s + 1)\), so the result immediately
follows. ∎</p>
</div>
<p>Now for the next few lemmas we need to do some algebra and
calculus. Inverting \(\mathrm{Err}(x)\), we get that \(x_i =
\sqrt[p]{(\epsilon_i + 1) \cdot n}\). Expressing \(g(x_i)\) in terms
of \(\epsilon_i\) and \(q = 1 - 1/p\) we get
\[ g(x_i) = \sqrt[p]{n} \left( \frac{\epsilon_i q +
1}{(\epsilon_i + 1)^q} \right) \]
and
\[
\mathrm{Err}(g(x_i))
= \frac{(q \epsilon_i + 1)^p}{(\epsilon_i + 1)^{p-1}} - 1\text{.}
\]
Let
\[
f(\epsilon) = \frac{(q \epsilon + 1)^p}{(\epsilon + 1)^{p-1}} - 1\text{.}
\]
Then computing derivatives,
\[
\begin{aligned}
f'(\epsilon) &= q \epsilon \frac{(q \epsilon + 1)^{p-1}}{(\epsilon + 1)^p}\text{,} \\
f''(\epsilon) &= q \frac{(q \epsilon + 1)^{p-2}}{(\epsilon + 1)^{p + 1}}\text{, and} \\
f'''(\epsilon) &= -q (2 + q (2 + 3 \epsilon)) \frac{(q \epsilon + 1)^{p-3}}{(\epsilon + 1)^{p + 2}}\text{.}
\end{aligned}
\]
Note that \(f(0) = f'(0) = 0\), and \(f''(0) = q\). Also, for
\(\epsilon > 0\), \(f'(\epsilon) \gt 0\), \(f''(\epsilon) \gt 0\), and
\(f'''(\epsilon) < 0\).</p>
<p>Now we’re ready to show that the \(\epsilon_i\) shrink
quadratically:</p>
<p class="theorem">(<span class="theorem-name">Lemma 4</span>.)
\(f(\epsilon) \lt (\epsilon/\sqrt{2})^2\) for \(\epsilon \gt 0\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Taylor-expand \(f(\epsilon)\)
around \(0\) with
the <a href="https://en.wikipedia.org/wiki/Taylor%27s_theorem#Explicit_formulae_for_the_remainder">Lagrange
remainder form</a> to get \[ f(\epsilon) = f(0) + f'(0) \epsilon +
\frac{f''(0)}{2} \epsilon^2 + \frac{f'''(\xi)}{6} \epsilon^3 \] for
some \(\xi\) such that \(0 \lt \xi \lt \epsilon\). Plugging in
values, we see that \(f(\epsilon) = \frac{1}{2} q \epsilon^2 +
\frac{1}{6} f'''(\xi) \epsilon^3\) with the last term being negative,
so \(f(\epsilon) \lt \frac{1}{2} q \epsilon^2 \lt \frac{1}{2}
\epsilon^2\). ∎</p>
</div>
<p>But this is only a useful upper bound when \(\epsilon_i \le 1\). In
the square root case this was okay, since \(\epsilon_1 \le 1\), but
that is not true for larger values of \(p\). In fact, in general, the
\(\epsilon_i\) start off shrinking <em>linearly</em>:</p>
<p class="theorem">(<span class="theorem-name">Lemma 5</span>.) For
\(\epsilon \gt 1\), \(f(\epsilon) \gt \epsilon/8\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Since \(f(0) = f'(0) = 0\), and
\(f''(\epsilon) \gt 0\) for \(\epsilon \ge 0\), \(f'(\epsilon)\) and
\(f(\epsilon)\) are increasing, and thus \(f(1) \gt 0\) and
\(f(\epsilon)\) is a concave-up curve.</p>
<p>Then \((0, 0)\) and \((1, f(1))\) are two points on a concave-up
curve, and thus geometrically the line \(y = f(1) \epsilon\) must lie
below \(y = f(\epsilon)\) for \(\epsilon \gt 1\), and thus
\(f(\epsilon) \gt f(1) \epsilon\) for \(\epsilon \gt
1\). Algebraically, this also follows from the definition
of <a href="https://en.wikipedia.org/wiki/Convex_function">(strict)
convexity</a> (with \(x_1 = 0\), \(x_2 = \epsilon\), and \(t = 1 -
1/\epsilon\)).</p>
<p>But \(f(1) = (2 - 1/p)^p/2^{p-1} - 1 = 2 \left(1 -
\frac{1}{2p}\right)^p - 1\), which is always increasing as a function
of \(p\), as you can see by calculating its derivative. Therefore, its
minimum is at \(p = 2\), which is \(1/8\), and so \(f(\epsilon) \gt
f(1) \epsilon \ge \epsilon/8\). ∎</p>
</div>
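<p>As a sanity check of Lemmas 4 and 5, the case \(p = 2\) (so \(q =
1/2\)) can be worked out by hand:
\[
f(\epsilon) = \frac{(\epsilon/2 + 1)^2}{\epsilon + 1} - 1
  = \frac{\epsilon^2/4 + \epsilon + 1 - (\epsilon + 1)}{\epsilon + 1}
  = \frac{\epsilon^2}{4 (\epsilon + 1)}\text{.}
\]
Then \(f(\epsilon) \lt \epsilon^2/4 \lt \epsilon^2/2\) for \(\epsilon \gt
0\), consistent with Lemma 4. Also, \(f(\epsilon) \gt \epsilon/8\) is
equivalent to \(2 \epsilon \gt \epsilon + 1\), i.e., \(\epsilon \gt 1\),
consistent with Lemma 5, with \(f(1) = 1/8\) at the boundary.</p>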
<p>Finally, let’s bound our initial values:</p>
<p class="theorem">(<span class="theorem-name">Lemma 6</span>.) \(x_0
\le 2s\) and \(\epsilon_0 \le 2^p - 1\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span>
This is a straightforward generalization of the equivalent lemma
from the square root case. Let’s start with \(x_0\):
\[
\begin{aligned}
x_0 &= 2^{\lceil \mathrm{Bits}(n) / p \rceil} \\
&= 2^{\lfloor (\lfloor \lg n \rfloor + 1 + p - 1)/p \rfloor} \\
&= 2^{\lfloor \lg n / p \rfloor + 1} \\
&= 2 \cdot 2^{\lfloor \lg n / p \rfloor}\text{.}
\end{aligned}
\]
Then \(x_0/2 = 2^{\lfloor \lg n / p \rfloor} \le 2^{\lg n / p} =
\sqrt[p]{n}\). Since \(x_0/2\) is an integer, \(x_0/2 \le
\sqrt[p]{n}\) if and only if \(x_0/2 \le \lfloor \sqrt[p]{n} \rfloor =
s\). Therefore, \(x_0 \le 2s\).</p>
<p>As for \(\epsilon_0\):
\[
\begin{aligned}
\epsilon_0 &= \mathrm{Err}(x_0) \\
&\le \mathrm{Err}(2s) \\
&= (2s)^p/n - 1 \\
&= 2^p s^p/n - 1\text{.}
\end{aligned}
\]
Since \(s^p \le n\), \(2^p s^p/n \le 2^p\) and thus \(\epsilon_0 \le
2^p - 1\). ∎</p>
</div>
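<p>The bounds \(s \lt x_0 \le 2s\) from Lemmas 2 and 6 can also be
spot-checked numerically. This is only a sketch for small inputs,
assuming native <code>BigInt</code> support; the function names are
hypothetical, and <code>naiveIroot</code> is a deliberately slow
reference implementation.</p>

```javascript
// x_0 = 2^ceil(Bits(n)/p) for a positive BigInt n and integer Number p.
function initialGuess(n, p) {
  var bits = n.toString(2).length;
  return 1n << BigInt(Math.ceil(bits / p));
}

// Reference implementation: counts up until (s+1)^p exceeds n.
function naiveIroot(n, p) {
  var s = 0n;
  while ((s + 1n) ** BigInt(p) <= n) {
    s += 1n;
  }
  return s;
}

// Returns the [n, p] pairs with 1 <= n <= maxN and 2 <= p <= maxP,
// p < Bits(n), for which s < x_0 <= 2s fails.
function initialGuessCounterexamples(maxN, maxP) {
  var failures = [];
  for (var n = 1n; n <= BigInt(maxN); n += 1n) {
    var bits = n.toString(2).length;
    for (var p = 2; p <= maxP && p < bits; ++p) {
      var s = naiveIroot(n, p);
      var x0 = initialGuess(n, p);
      if (!(s < x0 && x0 <= 2n * s)) {
        failures.push([n, p]);
      }
    }
  }
  return failures;
}
```

<p>If the lemmas hold, <code>initialGuessCounterexamples(2000, 8)</code>
returns an empty array.</p>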
<p>Now we’re ready to show our main result, which involves
calculating how long the \(\epsilon_i\) shrink linearly:</p>
<p class="theorem">(<span class="theorem-name">Theorem 3</span>.)
\(\mathrm{N{\small EWTON}\text{-}I{\small ROOT}}\) performs \(\Theta(p)
+ O(\lg \lg n)\) loop iterations.</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Assume that \(\epsilon_i \gt 1\)
for \(i \le j\), \(\epsilon_{j+1} \le 1\), and \(j+k\) is the number
of loop iterations performed when running the algorithm for \(n\) and
\(p\) (i.e., \(x_{j+k} \ge x_{j+k-1}\)). Using Lemma 5,
\[
\left( \frac{1}{8} \right)^{j+1} \epsilon_0 \lt \epsilon_{j+1} \le 1\text{,}
\]
which implies
\[
j \gt \frac{\lg \epsilon_0}{3} - 1\text{.}
\]
</p>
<p>Similarly,</p>
\[
\left( \frac{1}{8} \right)^j \epsilon_0 \ge \epsilon_j \gt 1\text{,}
\]
which implies
\[
j \lt \frac{\lg \epsilon_0}{3} \text{.}
\]
<p>Therefore, \(j = \Theta(\lg \epsilon_0)\), which is \(\Theta(p)\)
by Lemma 6.</p>
<p>Now assume \(k \ge 5\). Then \(x_i \ge s + 1\) for \(i \lt j + k -
1\). Since \(\epsilon_{j+1} \le 1\) by assumption, \(\epsilon_{j+3}
\le 1/2\) and \(\epsilon_i \le (\epsilon_{j+3})^{2^{i-j-3}}\) for \(j
+ 3 \le i \lt j + k - 1\) by Lemma 4, then \(\epsilon_{j+k-2} \le
2^{-2^{k-5}}\). But \(1/n \le \epsilon_{j+k-2}\) by Lemma 3, so \(1/n
\le 2^{-2^{k-5}}\). Taking logs to bring down the \(k\) yields \(k - 5
\le \lg \lg n\). Then \(k \le \lg \lg n + 5\), and thus \(k = O(\lg
\lg n)\).</p>
<p>Therefore, the total number of loop iterations is \(\Theta(p) +
O(\lg \lg n)\). ∎</p>
</div>
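<p>To get a feel for how the iteration count depends on \(p\), one can
instrument the loop. The following is a sketch assuming native
<code>BigInt</code> support, \(n \gt 0\), and \(1 \lt p \lt
\mathrm{Bits}(n)\); <code>countLoopIterations</code> is a name made up
for this illustration.</p>

```javascript
// Runs step 4 of the algorithm for a positive BigInt n and an integer
// Number p with 1 < p < Bits(n), and returns the result along with the
// number of loop iterations performed.
function countLoopIterations(n, p) {
  var pBig = BigInt(p);
  var bits = n.toString(2).length;
  var x = 1n << BigInt(Math.ceil(bits / p));
  var iterations = 0;
  while (true) {
    ++iterations;
    var y = ((pBig - 1n) * x + n / x ** (pBig - 1n)) / pBig;
    if (y >= x) {
      return { root: x, iterations: iterations };
    }
    x = y;
  }
}
```

<p>For example, <code>countLoopIterations(972n, 3)</code> reports \(4\)
iterations and <code>countLoopIterations(80n, 4)</code> reports \(3\).
Fixing a large \(n\), such as <code>2n ** 4096n</code>, and varying
\(p\) is one way to watch the \(p\)-dependent term.</p>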
<p>Note that \(p \le \lg n\), so we can just say that
\(\mathrm{N{\small EWTON}\text{-}I{\small ROOT}}\) performs
\(O(\lg n)\) loop iterations. But that obscures rather than
simplifies. Note that the proof above is very similar to the proof of
the worse run-time of \(\mathrm{N{\small EWTON}\text{-}I{\small
SQRT}'}\) where the initial guess varies. In this case, the error in
our initial guess is magnified, since we raise it to the \((p-1)\)th
power, and so that manifests as the \(\Theta(p)\) term.</p>
<p>Furthermore, unlike the square root case, the number of arithmetic
operations in a loop iteration isn’t constant. In particular,
the sub-step to compute \(x_i^{p-1}\) takes a number of arithmetic
operations dependent on \(p - 1\). Using repeated squarings, this
computation would take \(\Theta(\lg p)\) squarings and
\(O(\lg p)\) multiplications.</p>
<p>If the cost of an arithmetic operation is constant, e.g.,
we’re working with fixed-size integers, then the run-time bound
is the above multiplied by \(\Theta(\lg p)\).</p>
<p>Otherwise, if the cost of an arithmetic operation depends on the
length of its arguments, then we only have to multiply by a constant
factor to get the run-time bounds in terms of arithmetic
operations. If the cost of multiplying two numbers \(\le x\) is \(M(x)
= O(\lg^k x)\), then the cost of computing \(x^p\) is \(O((p \lg
x)^k)\). But \(x\) is \(\Theta(n^{1/p})\), so the cost of computing
\(x^p\) is \(O(\lg^k n)\), which is on the order of the cost of
multiplying two numbers \(\le n\). Furthermore, note that we divide
the result into \(n\), so we can stop once the computation of
\(x_i^{p-1}\) exceeds \(n\). So in that case, we can treat a loop
iteration as if it were performing a constant number of arithmetic
operations on numbers of order \(n\), and so, like in the square root
case, we pick up a factor of \(D(n)\), where \(D(n)\) is the run-time
of dividing \(n\) by some number \(\le n\).</p>
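<p>The repeated-squaring computation with an early exit can be sketched
as follows. This assumes native <code>BigInt</code> support and uses
left-to-right binary exponentiation, so that every intermediate value is
itself a power of \(x\) no larger than the final result; the name
<code>powCapped</code> and the <code>null</code> return convention are
assumptions for this illustration.</p>

```javascript
// Computes x^e for a BigInt x >= 1 and a non-negative integer Number e,
// but gives up and returns null as soon as an intermediate power exceeds
// `limit`. Uses one squaring per bit of e and one multiplication per set
// bit of e.
function powCapped(x, e, limit) {
  var result = 1n;
  var bits = e.toString(2);
  for (var i = 0; i < bits.length; ++i) {
    result = result * result;
    if (bits[i] === '1') {
      result = result * x;
    }
    // Since x >= 1, intermediate powers never exceed the final power, so
    // exceeding the limit here means x^e exceeds it too.
    if (result > limit) {
      return null;
    }
  }
  return result;
}
```

<p>In a loop iteration, a <code>null</code> result from
<code>powCapped(x, p - 1, n)</code> means that \(\lfloor n/x^{p-1}
\rfloor = 0\), so the division can be skipped entirely.</p>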
</section>
<section class="footnotes">
<p id="fn1">[1] Go and JS implementations are available
on <a href="https://github.com/akalin/iroot">my GitHub</a>.
<a href="#r1">↩</a></p>
<p id="fn2">[2] Here, and in most of the article, we’ll
implicitly assume that \(n \gt 0\) and \(p \gt 1\).
<a href="#r2">↩</a></p>
</section>
https://www.akalin.com/sampling-visible-sphere
Sampling the Visible Sphere
2015-08-26T00:00:00-07:00
Fred Akalin
https://www.akalin.com/
<p><em>(Note: this article is a summary of
<a href="http://ompf2.com/viewtopic.php?f=3&t=1914">this thread on
ompf2</a>.)
</em></p>
<p>The usual method for sampling a sphere from a point outside the
sphere is to calculate the angle of the cone of the visible portion
and uniformly sample within that cone, as described in
<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.6561">Shirley/Wang</a>.</p>
<p>However, one detail that is glossed over is that you still need to map
from the sampled direction to the point on the sphere. The usual
method is to simply generate a ray from the point and the sampled
direction and intersect it with the sphere. However, this intersection
test may fail due to floating point inaccuracies (e.g., if the sphere
is small and the distance from the point is large).</p>
<p>I've found a couple of existing ways to deal with this. As
described in the pbrt book, pbrt simply assumes that the ray just
grazes the sphere if the intersection fails, and then projects the
center of the sphere onto the ray
(<a href="https://github.com/mmp/pbrt-v2/blob/master/src/shapes/sphere.cpp#L249">code
here</a>). mitsuba moves the origin of the ray closer to the sphere
(in fact, from within it) before doing the test (falling back to
projecting the center onto the ray if that still fails)
(<a href="https://www.mitsuba-renderer.org/repos/mitsuba/files/aeb7f95b37111187cc2ddf21cfffeff118bc52d2/src/shapes/sphere.cpp#L287">code
here</a>).</p>
<p>However, this seems inelegant. I've come up with a better way,
which involves converting the sampled cone angle \(\theta\) (as
measured from the segment connecting the point to the sphere center)
into an angle \(\alpha\) from the inside of the sphere, and then
simply using \(\alpha\) and the sampled polar angle \(\varphi\) to
compute the point on the sphere. This turns out to be simple, and in my unscientific tests
a bit faster.</p>
<p>Here's a crude diagram showing the geometry:</p>
<img src="/sampling-visible-sphere-files/diagram.png" alt="Diagram for derivation of cos α" />
<p>You can see that
\[
L = d \cos \theta - \sqrt{r^2 - d^2 \sin^2 \theta}
\]
and also by the law of cosines,
\[
L^2 = d^2 + r^2 - 2 d r \cos \alpha\text{.}
\]
We're actually more interested in \(\cos \alpha\), so solving for that
we get
\[
\cos \alpha = \frac{d}{r} \sin^2 \theta + \cos \theta \sqrt{1 - \frac{d^2}{r^2} \sin^2 \theta}\text{.}
\]
An alternate form, which may be easier to analyze, recalling that
\(\sin \theta_{\max} = r/d\), is
\[
\cos \alpha = \frac{\sin^2 \theta}{\sin \theta_{\max}} + \cos \theta \sqrt{1 - \frac{\sin^2 \theta}{\sin^2 \theta_{\max}}}\text{.}
\]
</p>
<p>So sampling pseudocode would look like:</p>
<pre class="code-container"><code class="language-c++">(cos θ, φ) = uniformSampleCone(rng, cos θmax)
D = 1 - d² sin² θ / r²
if D ≤ 0 {
cos α = sin θmax
} else {
cos α = (d/r) sin² θ + cos θ √D
}
ω = sphericalDirection(cos α, φ)
pSurface = C + r ω</code></pre>
<p>(I haven't done any analysis yet on the most robust way [in the
floating-point sense] to do the calculations above.)</p>
<p>There's no backfacing since we clamp \(\cos \alpha\) to \(\sin
\theta_{\max}\), which is analogous to the case when the ray from
\(P\) misses the sphere.</p>
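<p>The \(\cos \alpha\) step, including the clamp, can be sketched in
JavaScript as follows. This is only a minimal illustration: the
function name <code>cosAlphaFromCosTheta</code> and its signature are
made up here, and it assumes \(d \gt r \gt 0\) and that \(\cos \theta\)
has already been sampled.</p>

```javascript
// Hypothetical helper; the name and signature are illustrative only.
// cosTheta: cosine of the sampled cone angle θ, measured at the outside point P.
// d: distance from P to the sphere center C; r: sphere radius (assumes d > r > 0).
function cosAlphaFromCosTheta(cosTheta, d, r) {
  var sin2Theta = Math.max(0, 1 - cosTheta * cosTheta);
  var D = 1 - (d * d * sin2Theta) / (r * r);
  if (D <= 0) {
    // The ray from P grazes or misses the sphere: clamp to the
    // silhouette, where cos α = sin θmax = r/d.
    return r / d;
  }
  return (d / r) * sin2Theta + cosTheta * Math.sqrt(D);
}
```

<p>For \(\theta = 0\) this returns \(1\) (the point on the sphere
closest to \(P\)), and for any \(\theta\) at or beyond
\(\theta_{\max}\) it returns \(r/d\).</p>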
<p>Note that one cannot just compute \(\alpha_{\max}\) and uniformly
sample the cone from inside the sphere, as that doesn't produce the
same distribution over the visible region as sampling the cone from
outside the sphere. To preserve correctness, you would have to use the
(uniform) PDF over the surface area of the visible portion of the
sphere, but you would then have to convert that to a PDF with respect
to projected solid angle from \(P\), which is worse than just doing
the sampling with respect to projected solid angle from \(P\) as
described above.</p>
https://www.akalin.com/computing-isqrt
Computing the Integer Square Root
2014-12-09T00:00:00-08:00
Fred Akalin
https://www.akalin.com/
<!-- TODO: use align* instead of aligned when it is supported by KaTeX. -->
<!-- TODO: Use \operatorname when it is supported by KaTeX. -->
<!-- TODO: Use \dotsc when it is supported by KaTeX. -->
<script src="https://cdn.rawgit.com/jasondavies/jsbn/v1.4/jsbn.js"></script>
<script src="https://cdn.rawgit.com/jasondavies/jsbn/v1.4/jsbn2.js"></script>
<section>
<header>
<h2>1. The algorithm</h2>
</header>
<p>Today I’m going to talk about a fast algorithm to compute
the <em><a href="https://en.wikipedia.org/wiki/Integer_square_root">integer
square root</a></em> of a non-negative integer \(n\),
\(\mathrm{isqrt}(n) = \lfloor \sqrt{n} \rfloor\), or in words,
the greatest integer whose square is less than or equal to \(n\).<sup><a href="#fn1" id="r1">[1]</a></sup> Most
sources that describe the algorithm take it for granted that it is
correct and fast. This is far from obvious! So I will prove both
correctness and speed below.</p>
<p>One simple fact is that \(\mathrm{isqrt}(n) \le n/2\) for \(n \ge 2\), so a
straightforward algorithm is just to test every non-negative integer
up to \(n/2\). This takes \(O(n)\) arithmetic operations, which is bad
since it’s exponential in the <em>size</em> of the input. That
is, letting \(\mathrm{Bits}(n)\) be the number of bits required
to store \(n\) and letting \(\lg n\) be the base-\(2\) logarithm of
\(n\), \(\mathrm{Bits}(n) = O(\lg n)\), and thus this algorithm
takes \(O(2^{\mathrm{Bits}(n)})\) arithmetic operations.</p>
<p>We can do better by doing binary search; start with the range \([0,
n/2]\) and adjust it based on comparing the square of an integer in
the middle of the range to \(n\). This takes \(O(\lg n) =
O(\mathrm{Bits}(n))\) arithmetic operations.</p>
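<p>For comparison, here's a minimal sketch of the binary search
approach. It assumes native <code>BigInt</code> values rather than the
<code>BigInteger</code> library used elsewhere on this page, the name
<code>isqrtBinarySearch</code> is made up, and it searches \([0, n]\)
rather than \([0, n/2]\) just to keep tiny inputs simple.</p>

```javascript
// Binary-search integer square root on native BigInt values.
// (Assumption: BigInt stands in for an arbitrary-precision integer type.)
function isqrtBinarySearch(n) {
  if (n < 0n) {
    throw new RangeError('negative radicand');
  }
  // Invariant: lo*lo <= n < (hi + 1)*(hi + 1).
  var lo = 0n;
  var hi = n;
  while (lo < hi) {
    // Round the midpoint up so that the range shrinks on every step.
    var mid = (lo + hi + 1n) / 2n;
    if (mid * mid <= n) {
      lo = mid;
    } else {
      hi = mid - 1n;
    }
  }
  return lo;
}
```

<p>Each step halves the range, so the loop runs \(O(\lg n)\) times.</p>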
<p>However, the algorithm below is even faster:<sup><a href="#fn2" id="r2">[2]</a></sup></p>
<ol>
<li>If \(n = 0\), return \(0\).</li>
<li>Otherwise, set \(i\) to \(0\) and set \(x_0\) to \(2^{\lceil
\mathrm{Bits}(n) / 2\rceil}\).</li>
<li>Repeat:
<ol>
<li>Set \(x_{i+1}\) to \(\lfloor (x_i + \lfloor n/x_i \rfloor) /
2 \rfloor\).</li>
<li>If \(x_{i+1} \ge x_i\), return \(x_i\). Otherwise, increment
\(i\).</li>
</ol>
</li>
</ol>
<p>Call this algorithm \(\mathrm{N{\small EWTON}\text{-}I{\small
SQRT}}\), since it’s based
on <a href="https://en.wikipedia.org/wiki/Newton%27s_method">Newton’s
method</a>. It’s not obvious, but this algorithm returns
\(\mathrm{isqrt}(n)\) using only \(O(\lg \lg n) =
O(\lg(\mathrm{Bits}(n)))\) arithmetic operations, as we will
prove below. But first, here’s an implementation of the
algorithm in JavaScript:<sup><a href="#fn3" id="r3">[3]</a></sup></p>
<script>
// isqrt returns the greatest number x such that x^2 <= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/jasondavies/jsbn ), and n must be non-negative.
//
//
// Example (open up the JS console on this page and type):
//
// isqrt(new BigInteger("64")).toString()
function isqrt(n) {
var s = n.signum();
if (s < 0) {
throw new Error('negative radicand');
}
if (s == 0) {
return n;
}
// x = 2^ceil(Bits(n)/2)
var x = n.constructor.ONE.shiftLeft(Math.ceil(n.bitLength()/2));
while (true) {
// y = floor((x + floor(n/x))/2)
var y = x.add(n.divide(x)).shiftRight(1);
if (y.compareTo(x) >= 0) {
return x;
}
x = y;
}
}
</script>
<pre class="code-container"><code class="language-javascript">// isqrt returns the greatest number x such that x^2 <= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/jasondavies/jsbn ), and n must be non-negative.
//
//
// Example (open up the JS console on this page and type):
//
// isqrt(new BigInteger("64")).toString()
function isqrt(n) {
var s = n.signum();
if (s < 0) {
throw new Error('negative radicand');
}
if (s == 0) {
return n;
}
// x = 2^ceil(Bits(n)/2)
var x = n.constructor.ONE.shiftLeft(Math.ceil(n.bitLength()/2));
while (true) {
// y = floor((x + floor(n/x))/2)
var y = x.add(n.divide(x)).shiftRight(1);
if (y.compareTo(x) >= 0) {
return x;
}
x = y;
}
}</code></pre>
</section>
<section>
<header>
<h2>2. Correctness</h2>
</header>
<p>The core of the algorithm is the iteration rule:
\[
x_{i+1} = \left\lfloor \frac{x_i + \lfloor \frac{n}{x_i}
\rfloor}{2} \right\rfloor
\]
where
the <a href="https://en.wikipedia.org/wiki/Floor_and_ceiling_functions">floor
functions</a> are there only because we’re using integer
division. Define an integer-valued function \(f(x)\) for the right
side. Using basic properties of the floor function, you can show that
you can remove the inner floor:
\[
f(x) = \left\lfloor \frac{1}{2} (x + n/x) \right\rfloor
\]
which makes it a bit easier to analyze. Also, the properties of
\(f(x)\) are closely related to its equivalent real-valued function:
\[
g(x) = \frac{1}{2} (x + n/x)\text{.}
\]</p>
<p>For starters, again using basic properties of the floor function,
you can show that \(f(x) \le g(x)\), and for any integer \(m\), \(m
\le f(x)\) if and only if \(m \le g(x)\).</p>
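<p>The claim that the inner floor can be removed is easy to
spot-check by brute force. The sketch below assumes native
<code>BigInt</code> values, and the function names are made up for
illustration. It uses the fact that \(\lfloor (x + n/x)/2 \rfloor =
\lfloor (x^2 + n)/(2x) \rfloor\), which needs only integer
operations.</p>

```javascript
// f with both floors, exactly as in the iteration rule.
// BigInt division truncates, which is a floor for non-negative operands.
function fNested(n, x) {
  return (x + n / x) / 2n;
}

// f with only the outer floor: floor((x + n/x)/2) = floor((x*x + n)/(2*x)).
function fOuterOnly(n, x) {
  return (x * x + n) / (2n * x);
}

// Compare the two forms for all 1 <= n <= limit and 1 <= x <= 2n + 1.
function floorFormsAgree(limit) {
  for (var n = 1n; n <= limit; n += 1n) {
    for (var x = 1n; x <= 2n * n + 1n; x += 1n) {
      if (fNested(n, x) !== fOuterOnly(n, x)) {
        return false;
      }
    }
  }
  return true;
}
```

<p>This is a check over a finite range, not a proof, but it is a quick
way to catch a mistake in the algebra.</p>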
<p>Finally, let’s give a name to our desired output: let \(s =
\mathrm{isqrt}(n) = \lfloor \sqrt{n} \rfloor\).<sup><a href="#fn4" id="r4">[4]</a></sup></p>
<p>Intuitively, \(f(x)\) and \(g(x)\) “average out”
however far away their input \(x\) is from \(\sqrt{n}\). Conveniently,
this “average” is never an underestimate:</p>
<p class="theorem">(<span class="theorem-name">Lemma 1</span>.) For
\(x \gt 0\), \(f(x) \ge s\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> By the basic properties of
\(f(x)\) and \(g(x)\) above, it suffices to show that \(g(x) \ge
s\). \(g'(x) = (1 - n/x^2)/2\) and \(g''(x) = n/x^3\). Therefore,
\(g(x)\) is concave-up for \(x \gt 0\); in particular, its single
positive extremum at \(x = \sqrt{n}\) is a minimum. But \(g(\sqrt{n})
= \sqrt{n} \ge s\). ∎</p>
</div>
<p>(You can also prove Lemma 1 without calculus; show that \(g(x) \ge
s\) if and only if \(x^2 - 2sx + n \ge 0\), which is true when \(s^2
\le n\), which is true by definition.)</p>
<p>Furthermore, our initial estimate is always an overestimate:</p>
<p class="theorem">(<span class="theorem-name">Lemma 2</span>.) \(x_0
\gt s\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> \(\mathrm{Bits}(n) =
\lfloor \lg n \rfloor + 1 \gt \lg n\). Therefore,
\[
\begin{aligned}
x_0 &= 2^{\lceil \mathrm{Bits}(n) / 2 \rceil} \\
&\ge 2^{\mathrm{Bits}(n) / 2} \\
&\gt 2^{\lg n / 2} \\
&= \sqrt{n} \\
&\ge s\text{.} \; \blacksquare
\end{aligned}
\]
</p>
</div>
<p>(Note that any number greater than or equal to \(s\), say \(n\) or \(\lceil n/2
\rceil\), can be chosen for our initial guess without affecting
correctness. However, the expression above is necessary to guarantee
performance. Another possibility is \(2^{\lceil \lceil \lg n \rceil /
2 \rceil}\), which has the advantage that if \(n\) is an even power of
\(2\), then \(x_0\) is immediately set to \(\sqrt{n}\). However, this
is usually not worth the cost of checking that \(n\) is a power of
\(2\), as is required to compute \(\lceil \lg n \rceil\).)</p>
<p>An easy consequence of Lemmas 1 and 2 is that the invariant \(x_i
\ge s\) holds. That lets us prove partial correctness of
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\):</p>
<p class="theorem">(<span class="theorem-name">Theorem 1</span>.) If
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) terminates, it
returns the value \(s\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Assume it terminates. If it
terminates in step \(1\), then we are done. Otherwise, it can only
terminate in step \(3.2\) where it returns \(x_i\) such that \(x_{i+1}
= f(x_i) \ge x_i\). This implies that \(g(x_i) = (x_i + n/x_i) / 2 \ge
x_i\). Rearranging yields \(n \ge x_i^2\) and combining with our
invariant we get \(\sqrt{n} \ge x_i \ge s\). But \(s + 1 \gt
\sqrt{n}\), so that forces \(x_i\) to be \(s\), and thus
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) returns \(s\) if it
terminates. ∎</p>
</div>
<p>For total correctness we also need to show that
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) terminates. But this
is easy:</p>
<p class="theorem">(<span class="theorem-name">Theorem 2</span>.)
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) terminates.</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Assume it doesn’t
terminate. Then we have a strictly decreasing infinite sequence of
integers \(\{ x_0, x_1, \ldots \}\). But this sequence is bounded below
by \(s\), so it cannot decrease indefinitely. This is a contradiction,
so \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) must
terminate. ∎</p>
</div>
<p>We are done proving correctness, but you might wonder if the check
\(x_{i+1} \ge x_i\) in step \(3.2\) is necessary. That is, can it be
weakened to the check \(x_{i+1} = x_i\)? The answer is
“no”; to see that, let \(k = n - s^2\). Since \(n \lt
(s+1)^2\), \(k \lt 2s + 1\). On the other hand, consider the
inequality \(f(x_i) \gt x_i\). Since that would cause the algorithm to
terminate and return \(x_i\), that implies that \(x_i =
s\). Therefore, that inequality is equivalent to \(f(s) \gt s\), which
is equivalent to \(f(s) \ge s + 1\), which is equivalent to \(g(s) =
(s + n/s) / 2 \ge s + 1\). Rearranging yields \(n \ge s^2 +
2s\). Substituting in \(n = s^2 + k\), we get \(s^2 + k \ge s^2 +
2s\), which is equivalent to \(k \ge 2s\). But since \(k \lt 2s + 1\),
that forces \(k\) to equal \(2s\). That is the maximum value \(k\) can
be, so therefore \(n\) must be one less than a perfect square. Indeed,
for such numbers, weakening the check would cause the algorithm to
oscillate between \(s\) and \(s + 1\). For example, \(n = 99\) would
yield the sequence \(\{ 16, 11, 10, 9, 10, 9, \ldots \}\).</p>
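<p>To see the oscillation concretely, here's a small sketch that
lists the raw iterates without any termination check. It assumes
native <code>BigInt</code> values and \(n \gt 0\), and the name
<code>newtonIterates</code> is made up for illustration.</p>

```javascript
// Returns the first `count` iterates x_0, x_1, ... of the Newton rule for
// n > 0, ignoring the termination check entirely.
function newtonIterates(n, count) {
  var bits = n.toString(2).length;             // Bits(n)
  var x = 1n << BigInt(Math.ceil(bits / 2));   // x_0 = 2^ceil(Bits(n)/2)
  var xs = [];
  for (var i = 0; i < count; ++i) {
    xs.push(x);
    x = (x + n / x) >> 1n;                     // x_{i+1} = f(x_i)
  }
  return xs;
}
```

<p>For example, <code>newtonIterates(99n, 6).join(', ')</code> gives
<code>16, 11, 10, 9, 10, 9</code>, whereas for \(n = 100\) the
iterates settle at \(10\).</p>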
</section>
<section>
<header>
<h2>3. Run-time</h2>
</header>
<p>We will show that \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\)
takes \(O(\lg \lg n)\) arithmetic operations. Since each loop
iteration does only a fixed number of arithmetic operations (with the
division of \(n\) by \(x\) being the most expensive), it suffices to
show that our algorithm performs \(O(\lg \lg n)\) loop iterations.</p>
<p>It is well known that Newton’s
method <a href="https://en.wikipedia.org/wiki/Newton%27s_method#Proof_of_quadratic_convergence_for_Newton.27s_iterative_method">converges
quadratically</a> sufficiently close to a simple root. We can’t
actually use this result directly, since it’s not clear that the
convergence properties of Newton’s method are preserved when
using integer operations, but we can do something similar.</p>
<p>Define \(\mathrm{Err}(x) = x^2/n - 1\) and let \(\epsilon_i =
\mathrm{Err}(x_i)\). Intuitively, \(\mathrm{Err}(x)\) is a
conveniently scaled measure of the error of \(x\): it is less than
\(1\) for most of the values we care about and it is bounded below for
integers greater than our target \(s\). Also, we will show that the
\(\epsilon_i\) shrink quadratically. These facts will then let us show
our bound for the iteration count.</p>
<p>First, let’s prove our lower bound for \(\epsilon_i\):</p>
<p class="theorem">(<span class="theorem-name">Lemma 3</span>.) \(x_i
\ge s + 1\) if and only if \(\epsilon_i \ge 1/n\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> \(n \lt (s + 1)^2\), so \(n + 1
\le (s + 1)^2\), and therefore \((s + 1)^2/n - 1 \ge 1/n\). But the
expression on the left side is just \(\mathrm{Err}(s +
1)\). \(x_i \ge s + 1\) if and only if \(\epsilon_i \ge
\mathrm{Err}(s + 1)\), so the result immediately
follows. ∎</p>
</div>
<p>Then we can use that to show that the \(\epsilon_i\) shrink
quadratically:</p>
<p class="theorem">(<span class="theorem-name">Lemma 4</span>.) If
\(x_i \ge s + 1\), then \(\epsilon_{i+1} \lt (\epsilon_i/2)^2\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> \(\epsilon_{i+1}\) is just
\(\mathrm{Err}(f(x_i)) \le \mathrm{Err}(g(x_i))\), so it
suffices to show that \(\mathrm{Err}(g(x_i)) \lt
(\epsilon_i/2)^2\). Inverting \(\mathrm{Err}(x)\), we get that
\(x_i = \sqrt{(\epsilon_i + 1) \cdot n}\). Expressing \(g(x_i)\) in
terms of \(\epsilon_i\) we get
\[ g(x_i) = \frac{\sqrt{n}}{2} \left( \frac{\epsilon_i +
2}{\sqrt{\epsilon_i + 1}} \right) \]
and
\[
\mathrm{Err}(g(x_i)) = \frac{(\epsilon_i/2)^2}{\epsilon_i+1}\text{.}
\]
Therefore, it suffices to show that the denominator is greater than
\(1\). But \(x_i \ge s + 1\) implies \(\epsilon_i \gt 0\) by Lemma 3,
so that follows immediately and the result is proved. ∎</p>
</div>
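<p>Lemma 4 can also be spot-checked with exact integer arithmetic.
Multiplying \(\epsilon_{i+1} \lt (\epsilon_i/2)^2\) through by
\(4n^2\) gives \(4n(x_{i+1}^2 - n) \lt (x_i^2 - n)^2\), which the
sketch below tests. It assumes native <code>BigInt</code> values, and
the function names are made up for illustration.</p>

```javascript
// True if Err(f(x)) < (Err(x)/2)^2, where Err(x) = x^2/n - 1.
// Both sides are multiplied by 4*n^2 so that only integers are involved.
function errShrinksQuadratically(n, x) {
  var y = (x + n / x) >> 1n;                    // y = f(x)
  var lhs = 4n * n * (y * y - n);               // 4 n^2 * Err(y)
  var rhs = (x * x - n) * (x * x - n);          // 4 n^2 * (Err(x)/2)^2
  return lhs < rhs;
}

// Check the inequality for every 1 <= n <= limit and s + 1 <= x <= 4s + 4.
function lemma4HoldsUpTo(limit) {
  for (var n = 1n; n <= limit; n += 1n) {
    var s = 0n;
    while ((s + 1n) * (s + 1n) <= n) {
      s += 1n;                                  // s = isqrt(n), found naively
    }
    for (var x = s + 1n; x <= 4n * s + 4n; x += 1n) {
      if (!errShrinksQuadratically(n, x)) {
        return false;
      }
    }
  }
  return true;
}
```

<p>As with any brute-force check, this only covers a finite range; the
proof above is what establishes the lemma in general.</p>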
<p>Then let’s bound our initial values:</p>
<p class="theorem">(<span class="theorem-name">Lemma 5</span>.) \(x_0
\le 2s\), \(\epsilon_0 \le 3\), and \(\epsilon_1 \le 1\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Let’s start with \(x_0\):
\[
\begin{aligned}
x_0 &= 2^{\lceil \mathrm{Bits}(n) / 2 \rceil} \\
&= 2^{\lfloor (\lfloor \lg n \rfloor + 1 + 1)/2 \rfloor} \\
&= 2^{\lfloor \lg n / 2 \rfloor + 1} \\
&= 2 \cdot 2^{\lfloor \lg n / 2 \rfloor}\text{.}
\end{aligned}
\]
Then \(x_0/2 = 2^{\lfloor \lg n / 2 \rfloor} \le 2^{\lg n / 2} =
\sqrt{n}\). Since \(x_0/2\) is an integer, \(x_0/2 \le \sqrt{n}\) if
and only if \(x_0/2 \le \lfloor \sqrt{n} \rfloor = s\). Therefore,
\(x_0 \le 2s\).</p>
<p>As for \(\epsilon_0\):
\[
\begin{aligned}
\epsilon_0 &= \mathrm{Err}(x_0) \\
&\le \mathrm{Err}(2s) \\
&= (2s)^2/n - 1 \\
&= 4s^2/n - 1\text{.}
\end{aligned}
\]
Since \(s^2 \le n\), \(4s^2/n \le 4\) and thus \(\epsilon_0 \le 3\).</p>
<p>Finally, \(\epsilon_1\) is just
\(\mathrm{Err}(f(x_0))\). Using calculations from Lemma 4,
\[
\begin{aligned}
\epsilon_1 &\le \mathrm{Err}(g(x_0)) \\
&= (\epsilon_0/2)^2/(\epsilon_0 + 1) \\
&\le (3/2)^2/(3 + 1) \\
&= 9/16\text{.}
\end{aligned}
\]
Therefore, \(\epsilon_1 \le 1\). ∎</p>
</div>
<p>Finally, we can show our main result:</p>
<p class="theorem">(<span class="theorem-name">Theorem 3</span>.)
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) performs \(O(\lg \lg
n)\) loop iterations.</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Let \(k\) be the number of loop
iterations performed when running the algorithm for \(n\) (i.e., \(x_k
\ge x_{k-1}\)) and assume \(k \ge 4\). Then \(x_i \ge s + 1\) for \(i
\lt k - 1\). Since \(\epsilon_1 \le 1\) by Lemma 5, \(\epsilon_2 \le
1/2\) and \(\epsilon_i \le (\epsilon_2)^{2^{i-2}}\) for \(2 \le i \lt
k - 1\) by Lemma 4, then \(\epsilon_{k-2} \le 2^{-2^{k-4}}\). But
\(1/n \le \epsilon_{k-2}\) by Lemma 3, so \(1/n \le
2^{-2^{k-4}}\). Taking logs to bring down the \(k\) yields \(k - 4 \le
\lg \lg n\). Then \(k \le \lg \lg n + 4\), and thus \(k = O(\lg \lg
n)\). ∎</p>
</div>
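<p>To get a feel for how small the iteration count is in practice,
here's a sketch that counts loop iterations the same way the proof
does (that is, it returns the \(k\) for which \(x_k \ge x_{k-1}\)). It
assumes native <code>BigInt</code> values and \(n \gt 0\), and the
name <code>countLoopIterations</code> is made up for illustration.</p>

```javascript
// Returns the number of loop iterations the algorithm performs for n > 0,
// i.e. the k for which x_k >= x_{k-1}.
function countLoopIterations(n) {
  var bits = n.toString(2).length;             // Bits(n)
  var x = 1n << BigInt(Math.ceil(bits / 2));   // x_0
  var k = 0;
  while (true) {
    var y = (x + n / x) >> 1n;                 // x_{k+1}
    ++k;
    if (y >= x) {
      return k;
    }
    x = y;
  }
}
```

<p>For example, <code>countLoopIterations(99n)</code> is 4, matching
the sequence \(\{ 16, 11, 10, 9, 10 \}\), which is within the bound
\(\lg \lg 99 + 4 \approx 6.7\).</p>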
<p>Note that in general, an arithmetic operation is not constant-time,
and in fact has run-time \(\Omega(\lg n)\). Since the most expensive
arithmetic operation we do is division, we can say that
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) has run-time that is
both \(\Omega(\lg n)\) and \(O(D(n) \cdot \lg \lg n)\), where \(D(n)\)
is the run-time of dividing \(n\) by some number \(\le n\).<sup><a href="#fn5" id="r5">[5]</a></sup></p>
</section>
<section>
<header>
<h2>4. The Initial Guess</h2>
</header>
<p>It’s also useful to show that if the initial guess \(x_0\) is
bad, then the run-time degrades to \(\Theta(\lg n)\). We’ll do
this by defining the function \(\mathrm{N{\small EWTON}\text{-}I{\small
SQRT}'}\) to be like \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\)
except that it takes a function \(\mathrm{I{\small
NITIAL}\text{-}G{\small UESS}}\) that is called with \(n\) and whose result is
assigned to \(x_0\) in step 2. Then, we can treat \(\epsilon_0\) as a function of
\(n\) and analyze how long \(\epsilon_i\) stays above \(1\) to show
that \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}'}\) takes
\(\Theta(\lg \epsilon_0(n)) + O(\lg \lg n)\) arithmetic
operations.</p>
<p>First, we need a lemma:</p>
<p class="theorem">(<span class="theorem-name">Lemma 6</span>.)
If \(\epsilon_i \gt 1\), then \(\epsilon_{i+1}\) exists and
\(\frac{1}{8}\epsilon_i \le \epsilon_{i+1}\le
\frac{1}{4}\epsilon_i\).</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> First, if \(\epsilon_i \gt 1\),
then \(\epsilon_i \ge 1/n\) and so \(x_i \ge s + 1\), which implies
that the main loop doesn’t terminate with \(x_i\), and thus
\(x_{i+1}\) and \(\epsilon_{i+1}\) exist.</p>
<p>Then,
\[
\begin{aligned}
\mathrm{Err}(g(x_i))
&= \frac{(\epsilon_i/2)^2}{\epsilon_i+1} \\
&= \frac{1}{4} \epsilon_i - \frac{\epsilon_i}{4 (\epsilon_i + 1)} \\
&= \frac{1}{8} \epsilon_i + \frac{\epsilon_i(\epsilon_i - 1)}{8 (\epsilon_i + 1)}\text{,}
\end{aligned}
\]
at which point the result follows. ∎</p>
</div>
<p>Then we can show:</p>
<p class="theorem">(<span class="theorem-name">Theorem 4</span>.)
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}'}\) performs
\(\Theta(\lg \epsilon_0(n)) + O(\lg \lg n)\) loop iterations.</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Assume that \(\epsilon_0(n) \gt
1\), \(\epsilon_i \gt 1\) for \(i \lt j\), and \(\epsilon_j \le
1\). Using Lemma 6,
\[
\left( \frac{1}{8} \right)^j \epsilon_0(n) \le \epsilon_j \le 1\text{,}
\]
which implies
\[
j \ge \frac{1}{3} \lg \epsilon_0(n)\text{.}
\]
</p>
<p>Similarly,
\[
\left( \frac{1}{4} \right)^{j-1} \epsilon_0(n) \ge \epsilon_{j-1} \gt 1\text{,}
\]
which implies
\[
j \lt \frac{1}{2} \lg \epsilon_0(n) + 1\text{.}
\]
</p>
<p>Therefore, \(j = \Theta(\lg \epsilon_0(n))\). Then Theorem 3 can be
adapted to show that only \(O(\lg \lg n)\) more iterations of the loop
are taken. ∎</p>
</div>
<p>Since \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) uses an
initial guess such that \(\epsilon_0(n) \le 3\) (Lemma 5), Theorem 4
reduces to Theorem 3 in that case. However, if \(x_0\) is chosen to be
\(\Theta(n)\), e.g. the initial guess is just \(n\) or \(n/k\) for
some \(k\), then \(\epsilon_0(n)\) will also be \(\Theta(n)\), and so
the run time will degrade to \(\Theta(\lg n)\). So having a good
initial guess is important for the performance of
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\)!</p>
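<p>The effect of the initial guess is easy to observe. The sketch
below assumes native <code>BigInt</code> values and \(n \gt 0\); the
names <code>countLoopIterationsFrom</code> and
<code>powerOfTwoGuess</code> are made up for illustration, with the
former playing the role of the variant that accepts an arbitrary
\(x_0 \ge s\).</p>

```javascript
// Counts loop iterations of the Newton rule for n > 0 starting from x0 >= s.
function countLoopIterationsFrom(n, x0) {
  var x = x0;
  var k = 0;
  while (true) {
    var y = (x + n / x) >> 1n;
    ++k;
    if (y >= x) {
      return k;
    }
    x = y;
  }
}

// The guess used by the algorithm above: 2^ceil(Bits(n)/2).
function powerOfTwoGuess(n) {
  var bits = n.toString(2).length;
  return 1n << BigInt(Math.ceil(bits / 2));
}
```

<p>With \(n = 2^{64}\), starting from \(x_0 = n\) takes more than 32
iterations, since each step can at best roughly halve the estimate,
while starting from <code>powerOfTwoGuess(n)</code> takes only a
handful.</p>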
</section>
<section class="footnotes">
<p id="fn1">[1] Aside from
the <a href="https://en.wikipedia.org/wiki/Integer_square_root">Wikipedia
article</a>, the algorithm is described as Algorithm 9.2.11 in
<a href="http://www.amazon.com/Prime-Numbers-A-Computational-Perspective/dp/0387252827">Prime
Numbers: A Computational Perspective</a>.
<a href="#r1">↩</a></p>
<p id="fn2">[2] Note that only integer operations are used, which makes this
algorithm suitable for arbitrary-precision integers.
<a href="#r2">↩</a></p>
<p id="fn3">[3] Go and JS implementations are available
on <a href="https://github.com/akalin/iroot">my GitHub</a>.
<a href="#r3">↩</a></p>
<p id="fn4">[4] Here, and in most of the article, we’ll
implicitly assume that \(n \gt 0\).
<a href="#r4">↩</a></p>
<p id="fn5">[5] \(D(n)\) is \(\Theta(\lg^2 n)\) using long division, but
fancier division algorithms have better run-times.
<a href="#r5">↩</a></p>
</section>
https://www.akalin.com/constant-time-mssb
Finding the Most Significant Set Bit of a Word in Constant Time
2014-07-03T00:00:00-07:00
Fred Akalin
https://www.akalin.com/
<script>
// Converts the given binary string (possibly with whitespace) to an integer.
function b(s) {
return parseInt(s.replace(/\s+/g, ''), 2);
}
// Converts the given integer to a binary string.
function bs(x) {
return x.toString(2);
}
</script>
<section>
<header>
<h2>1. Overall method</h2>
</header>
<p>Finding the most significant set bit of a word (equivalently, finding
the integer log base 2 of a word, or counting the leading zeros of a
word) is
a <a href="https://stackoverflow.com/questions/2589096/find-most-significant-bit-left-most-that-is-set-in-a-bit-array">well-studied
problem</a>. <a href="http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious">Bit
Twiddling Hacks</a> lists various methods,
and <a href="https://en.wikipedia.org/wiki/Count_leading_zeros">Wikipedia</a>
gives the CPU instructions that perform the operation directly.</p>
<p>However, all of these methods are either specific to a certain word
size or take more than constant time (in terms of number of word
operations). That raises the question of whether there <em>is</em> a
method that takes constant time—surprisingly, the answer is
“yes”!<sup><a href="#fn1" id="r1">[1]</a></sup></p>
<p>The key idea is to split a word into \(\lceil \sqrt{w} \rceil\)
blocks of \(\lceil \sqrt{w} \rceil\) bits (where \(w\) is the number
of bits in a word). One can then do certain operations on blocks
“in parallel” by stuffing multiple blocks into a word and
then performing a single word operation.</p>
<p>Furthermore, since the block size and block count are the same, one
can transform the bits of a block into the blocks of a word and vice
versa in various ways using only a constant number of word
operations.</p>
<p>In particular, this lets us split up the problem into two parts:
finding the most significant set (i.e., non-zero) block, and finding
the most significant set bit within that block. It then turns out that
both parts can be done in constant time.</p>
<p>For concreteness, we'll use 32-bit words when explaining the
method below.<sup><a href="#fn2" id="r2">[2]</a></sup></p>
</section>
<section>
<header>
<h2>2. Finding the most significant set bit of a block</h2>
</header>
<p>First, let's consider the sub-problem of finding the most
significant set bit of a block. In fact, let's give ourselves a bit of
room and consider only blocks with the high bit cleared for now; we'll
see why we need this extra bit of room soon.</p>
<p>For 32 bits, the block size is 6 bits, so with the high bit of a
block cleared we're left with 5 bits. Let's look at a naive
implementation:</p>
<script>
function mssb5_naive(x) {
var c = 0;
for (var i = 0; i < 5 && x >= (1 << i); ++i) {
++c;
}
return c - 1;
}
</script>
<pre class="code-container"><code class="language-javascript">function mssb5_naive(x) {
var c = 0;
for (var i = 0; i < 5 && x >= (1 << i); ++i) {
++c;
}
return c - 1;
}</code></pre>
<p>In the above, we consider successive powers of 2 until we find one
greater than our given number. Then the answer is simply one less than
the exponent of that power.</p>
<p>Notice that the loop has at most 5 iterations; this lines up nicely
with the 5 full blocks in an entire 32-bit word. (This is why we saved
our extra bit of room.) If we can copy our block to the higher 4
blocks and then use word operations to operate on those blocks in
parallel, then we're good.</p>
<p>For our example, let \(x = 5 = 00101\). Duplicating \(x\) among all
the blocks can easily be done by multiplying by the appropriate
constant:</p>
<style>
pre.binary-example {
background: #fdf6e3;
color: #586e75;
padding: 1em;
}
pre.binary-example span.dont-care {
color: #a3b1b1;
}
pre.binary-example span.last-operand {
text-decoration: underline;
}
</style>
<pre class="binary-example">
<span class="first-five"
>00 000000 000000 000000 000000 000101</span>
* <span class="last-operand low-bit-full"
>00 000001 000001 000001 000001 000001</span>
<span class="first-five"
>00 000000 000000 000000 000000 000101</span>
<span class="first-five"
>00 000000 000000 000000 000101</span>
<span class="first-five"
>00 000000 000000 000101</span>
<span class="first-five"
>00 000000 000101</span>
<span class="first-five last-operand"
>00 000101 </span>
<span class="lower-bits-full"
>00 000101 000101 000101 000101 000101</span>
</pre>
<p>In fact, this is a simple use of a more general tool. If \(x\) and
\(y\) are expressed in binary, then multiplying \(x\) by \(y\) can be
seen as taking the index of each set bit in \(y\), creating a copy of
\(x\) shifted by each such index, and then adding up all the shifted
copies. This case is just taking \(y\) to be the constant where the
\(\{ 0, 6, 12, 18, 24 \}\)th bits are set.</p>
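<p>Here's a small sketch of that view of multiplication. The helper
<code>fromBinary</code> is a stand-in for the <code>b()</code> helper
used on this page, and <code>multiplyByShiftedCopies</code> is a
made-up name. It assumes \(y\) fits in 32 bits and that the product
is small enough to be exact in a JavaScript number.</p>

```javascript
// Stand-in for the page's b() helper: parse a binary string, ignoring
// whitespace.
function fromBinary(s) {
  return parseInt(s.replace(/\s+/g, ''), 2);
}

// Multiply x by y the schoolbook way: one shifted copy of x per set bit of y.
// Math.pow(2, i) is used instead of << so that nothing is truncated to
// 32 bits along the way.
function multiplyByShiftedCopies(x, y) {
  var sum = 0;
  for (var i = 0; i < 32; ++i) {
    if ((y >>> i) & 1) {
      sum += x * Math.pow(2, i);
    }
  }
  return sum;
}
```

<p>With \(x = 5\) and the constant above, the shifted copies don't
overlap, so the sum is just \(x\) repeated in each block.</p>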
<p>The first operation we need to parallelize is the comparisons to
the powers of 2. This can be converted to a word operation by noting
the comparison \(x \geq y\) can be performed by checking the sign of \(x
- y\), and that checking the sign can be done by setting the unused
high bit of \(x\) before doing the comparison, and then checking to
see if that high bit was left intact (i.e., not borrowed from). So we
pre-compute a constant with the \(n\)th block containing the \(n\)th
power of 2, then subtract that from our word containing the
duplicated blocks with the high bit set. Finally, we can mask off
the unneeded lower bits:</p>
<pre class="binary-example">
<span class="lower-bits-full"
>00 000101 000101 000101 000101 000101</span>
| <span class="last-operand high-bit-full"
>00 100000 100000 100000 100000 100000</span>
<span class="full"
>00 100101 100101 100101 100101 100101</span>
- <span class="last-operand lower-bits-full"
>00 010000 001000 000100 000010 000001</span>
<span class="high-bit-full"
>00 010101 011101 100001 100011 100100</span>
& <span class="last-operand high-bit-full"
>00 100000 100000 100000 100000 100000</span>
<span class="high-bit-full"
>00 000000 000000 100000 100000 100000</span>
</pre>
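<p>The comparison step on its own can be sketched like so, which makes
it easy to check against the per-bit comparisons of the naive loop.
The helper <code>fromBinary</code> is a stand-in for
<code>b()</code>, the name <code>compareToPowersOfTwo</code> is made
up, and \(x\) is assumed to be a 5-bit value.</p>

```javascript
// Stand-in for the page's b() helper.
function fromBinary(s) {
  return parseInt(s.replace(/\s+/g, ''), 2);
}

// For a 5-bit x, returns a word whose block i has its high bit set exactly
// when x >= 2^i (for i = 0..4); all other bits are zero.
function compareToPowersOfTwo(x) {
  var COPIES = fromBinary('00 000001 000001 000001 000001 000001');
  var HIGH = fromBinary('00 100000 100000 100000 100000 100000');
  var POWERS = fromBinary('00 010000 001000 000100 000010 000001');
  var w = x * COPIES;  // duplicate x into all five blocks
  w |= HIGH;           // set each block's spare high bit
  w -= POWERS;         // block i loses its high bit iff x < 2^i
  return w & HIGH;     // keep only the comparison results
}
```

<p>Since every block has its high bit set before the subtraction, no
block ever borrows from its neighbor, which is what makes the five
comparisons independent.</p>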
<p>We're left with a word where all bits except for the high bit of
each block are zero. We still need to sum up those bits, but since they're
a block apart, that can be done by multiplication with a constant to
line up the bits in a single column. The constant turns out to have
the \(\{ 0, 6, 12, 18, 24 \}\)th bits set, with the answer being in
the top three bits:<sup><a href="#fn3" id="r3">[3]</a></sup></p>
<pre class="binary-example">
<span class="high-bit-full"
>00 000000 000000 100000 100000 100000</span>
* <span class="last-operand low-bit-full"
>00 000001 000001 000001 000001 000001</span>
<span class="high-bit-full"
>00 000000 000000 100000 100000 100000</span>
<span class="high-bit-full"
>00 000000 100000 100000 100000</span>
<span class="high-bit-full"
>00 100000 100000 100000</span>
<span class="high-bit-full"
>00 100000 100000</span>
<span class="high-bit-full last-operand"
>00 100000 </span>
<span class="top-three"
>01 100001 100001 100001 000000 100000</span>
MSSB5(x) = 011 - 1 = 2
</pre>
<p>We can now write <code>mssb5()</code> using a constant number of
word operations:<sup><a href="#fn4" id="r4">[4]</a></sup></p>
<script>
function mssb5(x) {
// Duplicate x among all the blocks.
x *= b('00 000001 000001 000001 000001 000001');
// Compare to successive powers of 2 in parallel.
x |= b('00 100000 100000 100000 100000 100000');
x -= b('00 010000 001000 000100 000010 000001');
x &= b('00 100000 100000 100000 100000 100000');
// Sum up the bits into the high 3 bits.
x *= b('00 000001 000001 000001 000001 000001');
// Shift down and subtract 1 to get the answer.
return (x >>> 29) - 1;
}
</script>
<pre class="code-container"><code class="language-javascript">function mssb5(x) {
// Duplicate x among all the blocks.
x *= b('00 000001 000001 000001 000001 000001');
// Compare to successive powers of 2 in parallel.
x |= b('00 100000 100000 100000 100000 100000');
x -= b('00 010000 001000 000100 000010 000001');
x &= b('00 100000 100000 100000 100000 100000');
// Sum up the bits into the high 3 bits.
x *= b('00 000001 000001 000001 000001 000001');
// Shift down and subtract 1 to get the answer.
return (x >>> 29) - 1;
}</code></pre>
<p>We can then find the most significant set bit of a full block
by simply testing the high bit first:</p>
<script>
function mssb6(x) {
return (x & b('100000')) ? 5 : mssb5(x);
}
</script>
<pre class="code-container"><code class="language-javascript">function mssb6(x) {
return (x & b('100000')) ? 5 : mssb5(x);
}</code></pre>
</section>
<section>
<header>
<h2>3. Finding the most significant set block</h2>
</header>
<p>Let's now consider the sub-problem of finding the most significant
set block of a word (ignoring the partial one). Similar to the above,
we'd like to be able to use subtraction to compare all the blocks to
zero at the same time. However, that requires the high bit of each
block to be unused. That's easy enough to handle: just separate the
high bit and the lower bits of each block, test the lower bits, and
then bitwise-or the results together:</p>
<pre class="binary-example">
x = <span class="full"
>00 100000 000000 010000 000000 000001</span>
& C = <span class="last-operand high-bit-full"
>00 100000 100000 100000 100000 100000</span>
y1 = <span class="high-bit-full"
>00 100000 000000 000000 000000 000000</span>
x = <span class="full"
>00 100000 000000 010000 000000 000001</span>
& ~C = <span class="last-operand lower-bits-full"
>00 011111 011111 011111 011111 011111</span>
t1 = <span class="lower-bits-full"
>00 000000 000000 010000 000000 000001</span>
C = <span class="full"
>00 100000 100000 100000 100000 100000</span>
- t1 = <span class="last-operand lower-bits-full"
>00 000000 000000 010000 000000 000001</span>
t2 = <span class="high-bit-full"
>00 100000 100000 010000 100000 011111</span>
~t2 = <span class="high-bit-full"
>11 011111 011111 101111 011111 100000</span>
& C = <span class="last-operand high-bit-full"
>00 100000 100000 100000 100000 100000</span>
y2 = <span class="high-bit-full"
>00 000000 000000 100000 000000 100000</span>
y1 = <span class="high-bit-full"
>00 100000 000000 000000 000000 000000</span>
| y2 = <span class="last-operand high-bit-full"
>00 000000 000000 100000 000000 100000</span>
y = <span class="high-bit-full"
>00 100000 000000 100000 000000 100000</span>
</pre>
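<p>The steps above can be sketched as a single function and checked
against a block-by-block test. The helper <code>fromBinary</code> is a
stand-in for <code>b()</code>, the name
<code>nonZeroBlockFlags</code> is made up, and \(x\) is assumed to
have its top two bits (the partial block) clear.</p>

```javascript
// Stand-in for the page's b() helper.
function fromBinary(s) {
  return parseInt(s.replace(/\s+/g, ''), 2);
}

// For x < 2^30, returns a word whose block i has its high bit set exactly
// when block i of x is non-zero; all other bits are zero.
function nonZeroBlockFlags(x) {
  var C = fromBinary('00 100000 100000 100000 100000 100000');
  // High bit of each block, taken directly from x.
  var y1 = x & C;
  // C - (lower bits) keeps a block's high bit only if its lower bits are
  // all zero, so inverting and masking flags the blocks with a lower bit set.
  var y2 = ~(C - (x & ~C)) & C;
  return y1 | y2;
}
```

<p>For the example word above, this returns
<code>00 100000 000000 100000 000000 100000</code>.</p>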
<p>The result is stored in the high bits of each block. If we could
pack all the bits together, we'd then be able to
use <code>mssb5()</code>. This is similar to where we had to add all
the bits together in part 2, but we need a constant to stagger the
bits instead of lining them up. The constant to put the answer in the
high bits turns out to have the \(\{ 7, 12, 17, 22, 27 \}\)th bits
set:</p>
<pre class="binary-example">
y >>> 5 = <span class="low-bit-full"
>00 000001 000000 000001 000000 000001</span>
* <span class="last-operand every-fifth-from-seventh"
>00 001000 010000 100001 000010 000000</span>
<span class="low-bit-full"
>10 000000 000010 000000 00001</span>
<span class="low-bit-full"
>00 000001 000000 000001</span>
<span class="low-bit-full"
>00 100000 000000 1</span>
<span class="low-bit-full"
>00 000000 01</span>
<span class="last-operand low-bit-full"
>00 001 </span>
= <span class="top-five"
>10 101001 010010 100001 000010 000000</span>
</pre>
<p>This yields the answer <code>10101</code>, where the \(i\)th bit is
set exactly when the \(i\)th block of \(x\) is non-zero. The index of
the most significant non-zero block is then
simply <code>mssb5(10101)</code>.</p>
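<p>The staggering multiplication can be tried on its own. The sketch
below is self-contained; the names <code>packBlockFlags</code>
and <code>STAGGER</code> are made up for illustration, and the
constant is the one with the \(\{ 7, 12, 17, 22, 27 \}\)th bits set,
written in hexadecimal:</p>

```javascript
// Gathers the five per-block flag bits of y (at bit positions 5, 11, 17,
// 23, and 29) into a single 5-bit number using one multiplication.
function packBlockFlags(y) {
  // Constant with bits 7, 12, 17, 22, and 27 set.
  var STAGGER = 0x08421080;
  // Move each flag to the low bit of its block: bits 0, 6, 12, 18, 24.
  var lowFlags = y >>> 5;
  // lowFlags < 2^25 and STAGGER < 2^28, so the product is below 2^53 and
  // is exact as a JavaScript number. The >>> operator first truncates it
  // to 32 bits, then shifts the top five bits (27 to 31) down.
  return (lowFlags * STAGGER) >>> 27;
}
```

<p>With the \(y\) from the example
above, <code>packBlockFlags(0x20020020)</code> evaluates
to <code>21</code>, which is <code>10101</code> in binary.</p>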
</section>
<section>
<header>
<h2>4. Putting it all together</h2>
</header>
<p>With the building blocks above, we can now implement the algorithm
for finding the most significant set bit in the full blocks of a
word:<sup><a href="#fn5" id="r5">[5]</a></sup></p>
<script>
function mssb30(x) {
var C = b('00 100000 100000 100000 100000 100000');
// Check whether the high bit of each block is set.
var y1 = x & C;
// Check whether any of the lower bits of each block are set.
var y2 = ~(C - (x & ~C)) & C;
var y = y1 | y2;
// Shift the result bits down to the lowest 5 bits.
var z = ((y >>> 5) * b('0000 10000 10000 10000 10000 10000000')) >>> 27;
// Compute the bit index of the most significant set block.
var b1 = 6 * mssb5(z);
// Compute the most significant set bit inside the most significant
// set block.
var b2 = mssb6((x >>> b1) & b('111111'));
return b1 + b2;
}
</script>
<pre class="code-container"><code class="language-javascript">function mssb30(x) {
var C = b('00 100000 100000 100000 100000 100000');
// Check whether the high bit of each block is set.
var y1 = x & C;
// Check whether any of the lower bits of each block are set.
var y2 = ~(C - (x & ~C)) & C;
var y = y1 | y2;
// Shift the result bits down to the lowest 5 bits.
var z = ((y >>> 5) * b('0000 10000 10000 10000 10000 10000000')) >>> 27;
// Compute the bit index of the most significant set block.
var b1 = 6 * mssb5(z);
// Compute the most significant set bit inside the most significant
// set block.
var b2 = mssb6((x >>> b1) & b('111111'));
return b1 + b2;
}</code></pre>
<p>And then it's simple enough to extend it to find the most
significant set bit of a full word:</p>
<script>
function mssb32(x) {
// Check the high duplet and fall back to mssb30 if it's not set.
var h = x >>> 30;
return h ? (30 + mssb5(h)) : mssb30(x);
}
</script>
<pre class="code-container"><code class="language-javascript">function mssb32(x) {
// Check the high duplet and fall back to mssb30 if it's not set.
var h = x >>> 30;
return h ? (30 + mssb5(h)) : mssb30(x);
}</code></pre>
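<p>One way to gain confidence in a function like this is to compare
it against the built-in <code>Math.clz32()</code>. The helper below is
a hypothetical sketch (its names are not part of this page's code); it
takes the function under test as a parameter, so it does not depend on
anything defined above:</p>

```javascript
// Reference answer: index of the most significant set bit of a non-zero
// 32-bit word, via the built-in count-leading-zeros function.
function mssbReference(x) {
  return 31 - Math.clz32(x);
}

// Returns the first input on which `mssb` disagrees with the reference,
// or null if they agree on every non-zero input.
function findMssbMismatch(mssb, inputs) {
  for (var i = 0; i < inputs.length; ++i) {
    var x = inputs[i] >>> 0;
    if (x !== 0 && mssb(x) !== mssbReference(x)) {
      return x;
    }
  }
  return null;
}

// A handful of edge cases: single bits, block boundaries, and all ones.
var sampleInputs = [1, 2, 31, 32, 63, 64, 0x20000000, 0x3fffffff,
                    0x40000000, 0x80000000, 0xffffffff];
```

<p>In the JS console, <code>findMssbMismatch(mssb32, sampleInputs)</code>
should then return <code>null</code> if <code>mssb32()</code> behaves as
described.</p>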
<p>So the code above shows that we can find the most significant set
bit of a 32-bit word in a constant number of 32-bit word
operations. It is easy enough to see how it can be adapted to yield a
similar algorithm for a given arbitrary (but sufficiently large) word
size, simply by pre-computing the various word-size-dependent
constants.</p>
<p>It is also easy to see why no one actually uses this method on real
computers even in the absence of built-in instructions: it is much
more complicated and almost certainly slower than existing methods for
real word sizes! Also, the word-RAM model—where we assume all
word operations take constant time—is useful only when the word
size is fixed or narrowly bounded. When we allow word size to vary
arbitrarily, the word-RAM model breaks down—for one,
multiplication grows super-linearly with respect to word size! Alas,
this method is doomed to remain a theoretical curiosity, albeit one
that uses a few clever tricks.</p>
<script>
function highlightIndices(str, indices) {
var highlightedStr = '';
var i = 0, j = 0;
for (var k = 0; k < str.length; ++k) {
var chStr = str[str.length - k - 1];
if (chStr == '0' || chStr == '1') {
if (j < indices.length && i == indices[j]) {
++j;
} else {
chStr = '<span class="dont-care">' + chStr + '</span>';
}
++i;
}
highlightedStr = chStr + highlightedStr;
}
return highlightedStr;
}
function highlightElements(selector, indices) {
var es = document.querySelectorAll(selector);
for (var i = 0; i < es.length; ++i) {
var e = es[i];
e.innerHTML = highlightIndices(e.textContent, indices);
}
}
highlightElements('pre.binary-example span.first-five', [0, 1, 2, 3, 4]);
highlightElements('pre.binary-example span.low-bit-full', [0, 6, 12, 18, 24]);
highlightElements('pre.binary-example span.every-fifth-from-seventh',
[7, 12, 17, 22, 27]);
highlightElements('pre.binary-example span.lower-bits-full',
[0, 1, 2, 3, 4,
6, 7, 8, 9, 10,
12, 13, 14, 15, 16,
18, 19, 20, 21, 22,
24, 25, 26, 27, 28]);
highlightElements('pre.binary-example span.high-bit-full', [5, 11, 17, 23, 29]);
highlightElements('pre.binary-example span.full',
[0, 1, 2, 3, 4, 5,
6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29]);
highlightElements('pre.binary-example span.top-three', [29, 30, 31]);
highlightElements('pre.binary-example span.top-five', [27, 28, 29, 30, 31]);
</script>
</section>
<section class="footnotes">
<p id="fn1">[1] The constant-time method is detailed in the original
papers for the <a href="https://en.wikipedia.org/wiki/Fusion_tree">fusion
tree</a> data
structure. <a href="http://dl.acm.org/citation.cfm?id=100217">The
first paper</a> is unfortunately behind a paywall, but
<a href="https://www.sciencedirect.com/science/article/pii/0022000093900404?np=y">the
second paper</a>, essentially a rehash of the first one, is
freely downloadable.</p>
<p>The method is also explained in
<a href="http://courses.csail.mit.edu/6.851/spring12/lectures/L12.html">lecture
12</a> of Erik
Demaine's <a href="http://courses.csail.mit.edu/6.851/spring12/">Advanced
Data Structures</a> class, which is how I originally found out
about it.
<a href="#r1">↩</a></p>
<p id="fn2">[2] Demaine uses 16-bit words, which factor nicely into
4 blocks of 4 bits, but it is instructive to see how the method
deals with a word size that is not a perfect square.
<a href="#r2">↩</a></p>
<p id="fn3">[3] In this case, the partial 6th block has enough room
to hold the answer but this may not be true in general. This can
be remedied easily enough by shifting down the block high bits to
the low bits before multiplying; the answer will then be in the
last full block.
<a href="#r3">↩</a></p>
<p id="fn4">[4] <code>b(str)</code> just parses a number from its
binary string representation.
<a href="#r4">↩</a></p>
<p id="fn5">[5] Try out this function (and the others on this page)
by opening up the JS console on this page!
<a href="#r5">↩</a></p>
</section>
https://www.akalin.com/primality-testing-polynomial-time-part-2
Primality Testing in Polynomial Time (Ⅱ)
2012-12-29T00:00:00-08:00
Fred Akalin
https://www.akalin.com/
<!-- TODO: use align* instead of aligned when it is supported by KaTeX. -->
<script type="text/javascript"
src="https://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.0/knockout-min.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/simple-arith.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/trial-division.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/euler-phi.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/multiplicative-order.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/primality-testing.js"></script>
<p><em>(Note: this article isn't fully polished yet, but I thought it
would be a shame to let it languish during my sabbatical. Happy new
year!)</em></p>
<section>
<header>
<h2>5. Strengthening the AKS theorem</h2>
</header>
<p>It turns out the conditions of the AKS theorem are stronger than
they appear; they themselves imply that \(n\) is prime. To show this,
we need the following theorem, which we'll state without proof:</p>
<p class="theorem">
(<span class="theorem-name">Lenstra's squarefree test</span>.) If
\(a^n \equiv a \pmod{n}\) for \(1 \le a \lt \ln^2 n\), then \(n\) is
<a href="http://en.wikipedia.org/wiki/Squarefree">squarefree</a>.<sup><a href="#fn1" id="r1">[1]</a></sup></p>
<p>We also need a couple of lemmas:</p>
<p class="theorem">
(<span class="theorem-name">Lemma 1</span>.)
For \(0 \le a \lt n\) and \(r \gt 1\), let
\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}
\]
Then
\[
(a + 1)^n \equiv a + 1 \pmod{n}\text{.}
\]
</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> By definition,
\((X + a)^n - (X^n + a) \equiv k(X) \cdot (X^r - 1) \pmod{n}\) for some
polynomial \(k(X)\). Treating both sides as functions of \(x\) and
substituting in \(1\), we immediately get \((1 + a)^n - (1 + a) \equiv 0
\pmod{n}\). ∎</p>
</div>
<p class="theorem">
(<span class="theorem-name">Lemma 2</span>.)
For \(n \gt 1\), \(\lfloor \lg n \rfloor \cdot \lg n \gt \ln^2 n\).
</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> Since \(\ln n = \frac{\lg n}{\lg
e}\) and \(e \gt 2\), \(\lg n \gt \ln n\) for \(n \gt 1\).</p>
<p>Letting \(k = \lfloor \lg n \rfloor\), \(\ln n \lt \frac{k + 1}{\lg
e}\), so if \(\frac{k + 1}{\lg e} \lt k\), that implies that \(\ln n
\lt k\). Solving for \(k\), we get that \(k \gt \frac{1}{\lg e -
1}\), which is true when \(n \ge 8\).</p>
<p>So if \(n \ge 8\), then \(\ln n \lt \lfloor \lg n \rfloor\).
Checking manually, we find that \(\ln n \lt \lfloor \lg n \rfloor\)
holds also for \(n \in \{ 2, 4, 5, 6, 7 \}\), immediately implying the
lemma for all \(n \gt 1\) except \(3\). But checking manually again,
we find that the lemma holds for \(3\) also. ∎</p>
</div>
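<p>Lemma 2 is also easy to spot-check numerically. The following is
only a sanity check over a finite range, not a substitute for the
proof; the function name is made up for illustration:</p>

```javascript
// Returns the first n in [2, limit] for which
// floor(lg n) * lg n > ln(n)^2 fails, or null if it holds throughout.
// limit must be below 2^32.
function findLemma2Counterexample(limit) {
  for (var n = 2; n <= limit; ++n) {
    // floor(lg n), computed exactly from the bit length of n.
    var floorLg = 31 - Math.clz32(n);
    var lhs = floorLg * Math.log2(n);
    var ln = Math.log(n);
    if (!(lhs > ln * ln)) {
      return n;
    }
  }
  return null;
}
```

<p>The two sides are never close (their ratio is
\(\lfloor \lg n \rfloor / (\lg n \cdot \ln^2 2)\)), so floating-point
rounding should not affect the comparison.</p>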
<p>Then, we can prove the strong version of the AKS theorem:</p>
<p class="theorem">
(<span class="theorem-name">AKS theorem, strong version</span>.) Let
\(n \ge 2\), \(r\) be relatively prime to \(n\) with \(o_r(n) \gt
\lg^2 n\), and \(M \gt \sqrt{\varphi(r)} \lg n\). Furthermore, let
\(n\) have no prime factor less than \(M\) and let
\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}
\]
for \(0 \le a \lt M\). Then \(n\) is prime.</p>
<div class="proof">
<p><span class="proof-name">Proof.</span> From Lemma 1, we know that \(a^n
\equiv a \pmod{n}\) for \(1 \le a \lt M\). Since \(o_r(n)\) divides
\(\varphi(r)\), we have \(\sqrt{\varphi(r)} \ge \sqrt{o_r(n)} \gt \lg
n\), so \(M \gt \sqrt{\varphi(r)} \lg n \gt \lfloor \lg n \rfloor \cdot
\lg n \gt \ln^2 n\) by Lemma 2, and we can apply Lenstra's squarefree
test to show that \(n\) is squarefree. From the weak version of the AKS
theorem, we also know that \(n\) is a prime power. But since \(n\) is
squarefree, it must have distinct prime factors, which immediately
implies that \(n\) is prime. ∎</p>
</div>
</section>
<section>
<header>
<h2>6. Finding a suitable \(r\)</h2>
</header>
<p>The only remaining loose end is to show that there exists an \(r\)
with \(o_r(n) \gt \lg^2 n\) and that it's small enough (i.e., polylog
in \(n\)). The existence of \(r\) is easy to see; we can simply pick
the smallest \(r\) that is co-prime to \(n\) and greater than
\(n^{\lg^2 n}\). But that's obviously too big. We can do better:</p>
<p class="theorem">
<span class="theorem-name">(Upper bound for \(r\).)</span> Let \(n \ge 2\).
Then there exists some \(r \le \max(3, \lceil \lg n \rceil^5)\) such
that \(o_r(n) \gt \lceil \lg n \rceil^2\).<sup><a href="#fn2" id="r2">[2]</a></sup></p>
<div class="proof">
<p><span class="proof-name">(Proof.)</span> Let's first prove the following
lemma:</p>
<p class="theorem">
<span class="theorem-name">(Lemma 3.)</span> Let \(n \ge 9\) and \(b =
\lceil \lg n \rceil\). Then for \(m \ge 1\), there exists some \(r
\le b^{2m + 1}\) such that \(o_r(n) \gt b^m\).</p>
<div class="proof">
<p><span class="proof-name">(Proof.)</span> Let
\[
N = n \cdot (n - 1) \cdot (n^2 - 1) \cdot \ldots \cdot (n^{b^m} - 1)\text{.}
\]
Note that a prime \(r\) divides \(N\) if and only if \(r\) divides \(n\)
or \(o_r(n) \le b^m\). So it suffices to find some prime \(r\) that does
not divide \(N\).</p>
<p>We can see that:
\[
\begin{aligned}
N &= n \cdot (n - 1) \cdot (n^2 - 1) \cdot \ldots \cdot (n^{b^m} - 1) \\
&\lt n \cdot n \cdot n^2 \cdot \ldots \cdot n^{b^m} \\
&= n^{1 + 1 + 2 + 3 + \ldots + b^m} \\
&= n^{1 + b^m (b^m + 1) / 2} \\
&= n^{\frac{1}{2} b^{2m} + \frac{1}{2} b^m + 1}\text{.}
\end{aligned}
\]
Furthermore, we can upper-bound the exponent of \(n\):
\[
\begin{aligned}
b^{2m} &\gt \frac{1}{2} b^{2m} + \frac{1}{2} b^m + 1 \\
\frac{1}{2} b^{2m} - \frac{1}{2} b^m - 1 &\gt 0 \\
b^{2m} - b^m - 2 &\gt 0 \\
(b^m - 2) \cdot (b^m + 1) &\gt 0\text{.}
\end{aligned}
\]
The last statement holds when \(b^m \gt 2\), which is always the case
since \(b \ge 4\) and \(m \ge 1\).</p>
<p>Applying the upper bound,
\[
\begin{aligned}
N &\lt n^{\frac{1}{2} b^{2m} + \frac{1}{2} b^m + 1} \\
&\lt n^{b^{2m}} \\
&\le 2^{b^{2m + 1}}\text{.}
\end{aligned}
\]
</p>
<p>We can then use the following theorem, which we'll state without
proof:</p>
<p class="theorem">
<span class="theorem-name">(<a href="http://en.wikipedia.org/wiki/Primorial">Primorial</a>
lower bound.)</span> For \(x \ge 31\), the product of primes \(\le x\)
exceeds \(2^x\).<sup><a href="#fn3" id="r3">[3]</a></sup> That is,
\[
x\# = \prod_{p \le x\text{, }p\text{ is prime}} p \gt 2^x\text{.}
\]</p>
<p>Since \(b \ge 4\) and \(m \ge 1\), \(b^{2m + 1} \ge 31\), and so
\(2^{b^{2m + 1}} \lt (b^{2m + 1})\#\). Therefore,
\[
N \lt 2^{b^{2m + 1}} \lt (b^{2m + 1})\#\text{.}
\]
But that implies that there is some prime number \(p_0 \le b^{2m +
1}\) that does not divide \(N\); if they all did, then \(N\) would be
at least their product \((b^{2m + 1})\#\), contradicting the
inequality above. Therefore, \(o_{p_0}(n) \gt b^m\). ∎</p>
</div>
<p>We can then prove our theorem: for \(n \ge 9\), apply Lemma 3 with
\(m = 2\). Here are explicit values for the rest: for \(n = 2\), \(r
= 3\); \(n = 3\), \(r = 7\); \(n \in \{ 4, 6, 7, 8\}\), \(r = 11\);
and for \(n = 5\), \(r = 17\). ∎</p>
</div>
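<p>The explicit values for small \(n\) can be checked by brute force.
The sketch below uses plain JavaScript numbers rather
than <code>SNat</code>, and all of its function names are hypothetical;
it is only meant for small inputs:</p>

```javascript
// ceil(lg n) for an integer 1 <= n <= 2^31.
function ceilLg(n) {
  return 32 - Math.clz32(n - 1);
}

// Greatest common divisor by the Euclidean algorithm.
function gcd(a, b) {
  while (b !== 0) {
    var t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Multiplicative order of n modulo r by brute force, or null if r < 2
// or n and r are not co-prime.
function multiplicativeOrder(n, r) {
  if (r < 2 || gcd(n, r) !== 1) {
    return null;
  }
  var x = n % r;
  var k = 1;
  while (x !== 1) {
    x = (x * (n % r)) % r;
    ++k;
  }
  return k;
}

// Least r with o_r(n) > ceil(lg n)^2, searching up to the bound
// max(3, ceil(lg n)^5) from the theorem; null if none is found.
function leastSuitableModulus(n) {
  var b = ceilLg(n);
  var bound = Math.max(3, Math.pow(b, 5));
  for (var r = 2; r <= bound; ++r) {
    var o = multiplicativeOrder(n, r);
    if (o !== null && o > b * b) {
      return r;
    }
  }
  return null;
}
```

<p>For \(n\) from \(2\) to \(8\), <code>leastSuitableModulus(n)</code>
returns exactly the values of \(r\) listed in the proof, so those are
in fact the least suitable ones.</p>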
<p>Also, it turns out that for about half of all odd \(n\), we can do
better. We'll state this theorem without proof:</p>
<p class="theorem"><span class="theorem-name">(Tight upper bound for
some \(r\).)</span> Let \(n \equiv \pm 3 \pmod{8}\). Then there
exists some \(r \lt 8 \lceil \lg n \rceil^2\) such that \(o_r(n) \gt
\lceil \lg n \rceil^2\).<sup><a href="#fn4" id="r4">[4]</a></sup></p>
</section>
<section>
<header>
<h2>7. The AKS algorithm (simple version)</h2>
</header>
<p>Without further ado, here is a simple version of the AKS
algorithm, given \(n \ge 2\):</p>
<ol>
<li>Starting from \(\lceil \lg n \rceil^2 + 2\), find an \(r\) such
that \(\gcd(r, n) = 1\) and \(o_r(n) \gt \lceil \lg n
\rceil^2\).</li>
<li>Compute \(M = \lfloor \sqrt{r - 1} \rfloor \lceil \lg n
\rceil + 1\).</li>
<li>Search for a prime factor of \(n\) less than \(M\). If one is
found, return “composite”. If none are found and \(M \gt
\lfloor \sqrt{n} \rfloor\), return “prime”.</li>
<li>For each \(1 \le a \lt M\), compute \((X + a)^n\), reducing
coefficients mod \(n\) and powers mod \(r\). If the result is not
equal to \(X^{n\text{ mod }r} + a\), return
“composite”.</li>
<li>Otherwise, return “prime”.</li>
</ol>
<p>As we've shown in the previous section, there always exists an
\(r\) such that \(o_r(n) \gt \lceil \lg n \rceil^2\), so step 1 will
terminate. All other steps are bounded, so the entire algorithm will
always terminate.</p>
<p>In step 2, since \(\varphi(r) \le r - 1\), the value of \(M\) that
we compute is always greater than \(\sqrt{\varphi(r)} \lceil \lg n
\rceil\). Step 4 checks if \((X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\)
holds. Therefore, by the strong AKS theorem, if the algorithm
returns “prime”, then \(n\) is prime. Furthermore, by the
weak version of Fermat's little theorem for polynomials, if the
algorithm returns “composite”, then \(n\) is
composite.</p>
<p>Since the algorithm always terminates and it returns the correct
answer when it terminates, it
is <a href="http://en.wikipedia.org/wiki/Total_correctness">totally
correct</a>.</p>
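<p>To make the check in step 4 concrete, here is a minimal
self-contained sketch of it. It assumes an environment with
built-in <code>BigInt</code> support and represents a polynomial as an
array of \(r\) coefficients, lowest degree first; the function names
are hypothetical and it is not the <code>SNat</code>-based
implementation shown further below:</p>

```javascript
// Multiplies polynomials p and q (arrays of r BigInt coefficients, lowest
// degree first), reducing powers mod r and coefficients mod n.
function polyMulMod(p, q, r, n) {
  var out = new Array(r).fill(0n);
  for (var i = 0; i < r; ++i) {
    if (p[i] === 0n) continue;
    for (var j = 0; j < r; ++j) {
      var k = (i + j) % r;
      out[k] = (out[k] + p[i] * q[j]) % n;
    }
  }
  return out;
}

// Computes p^e mod (X^r - 1, n) by square-and-multiply.
function polyPowMod(p, e, r, n) {
  var result = new Array(r).fill(0n);
  result[0] = 1n;
  while (e > 0n) {
    if (e & 1n) result = polyMulMod(result, p, r, n);
    p = polyMulMod(p, p, r, n);
    e >>= 1n;
  }
  return result;
}

// Returns true if (X + a)^n is congruent to X^n + a mod (X^r - 1, n).
// n and a are BigInts with n >= 2; r is an ordinary number >= 2.
function aksCongruenceHolds(n, r, a) {
  var base = new Array(r).fill(0n);
  base[0] = a % n;
  base[1] = 1n;
  var lhs = polyPowMod(base, n, r, n);
  // Right-hand side: X^(n mod r) + a.
  var rhs = new Array(r).fill(0n);
  rhs[0] = a % n;
  var d = Number(n % BigInt(r));
  rhs[d] = (rhs[d] + 1n) % n;
  for (var i = 0; i < r; ++i) {
    if (lhs[i] !== rhs[i]) return false;
  }
  return true;
}
```

<p>For a prime such as \(n = 101\) the congruence holds for every \(a\)
and \(r\), e.g. <code>aksCongruenceHolds(101n, 7, 1n)</code>
is <code>true</code>. For \(n = 91\) and \(a = 1\) it fails for
every \(r\), since \(2^{91} \equiv 37 \pmod{91}\) and Lemma 1 would
otherwise force \(2^{91} \equiv 2 \pmod{91}\).</p>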
<p>As shown in the previous section, we have to test \(O(\lg^5 n)\)
values to find a suitable \(r\). Assuming a straightforward algorithm
to compute the multiplicative order that bails out once \(\lceil \lg
n \rceil^2\) is reached, and assuming we use the
division-based <a href="http://en.wikipedia.org/wiki/Euclidean_algorithm">Euclidean
algorithm</a> for computing the greatest common divisor, testing each
value takes \(O(\lg^2 n)\) multiplies and \(O(\lg r) = O(\lg \lg n)\)
divisions of \(O(\lg r)\)-bit numbers. Let \(M(b)\) be the cost to
multiply two \(b\)-bit numbers. The complexity of division is
asymptotically the same as multiplication, so the total cost of step 1
is \(O(\lg^5 n \cdot (\lg^2 n + \lg \lg n) \cdot M(\lg \lg n)) =
O(\lg^7 n \cdot M(\lg \lg n))\), assuming \(M(O(b)) = O(M(b))\).</p>
<p>Step 2 involves one square root, one multiplication, and one
increment, all involving \(O(\lg \lg n)\)-bit numbers. The complexity
of taking the square root is asymptotically the same as
multiplication, so the total cost of step 2 is \(O(M(\lg \lg n))\).</p>
<p>Step 3 takes a square root and tests \(M = O(\lg^{7/2} n)\)
numbers, and each test involves dividing two \(O(\lg M)\)-bit numbers,
so the total cost of step 3 is \(O(\lg^{7/2} n \cdot M(\lg \lg
n))\).</p>
<p>Steps 4 and 5 test \(O(\lg^{7/2} n)\) polynomials. Testing each
polynomial involves exponentiating it by \(n\), reducing powers mod
\(r\) and coefficients mod \(n\) at each step, which requires \(O(\lg
n)\) multiplications of polynomials with \(O(r)\) coefficients each of
size \(O(\lg n)\). The cost of multiplying two polynomials with \(s\)
coefficients of size \(b\) is \(M(s) \cdot M(b)\), so with \(s =
O(\lg^5 n)\) and \(b = O(\lg n)\) the total cost of steps 4 and 5 is
\(O(\lg^{9/2} n \cdot M(\lg^6 n))\), assuming \(M(a) \cdot M(b) = M(a
\cdot b)\).</p>
<p>If <a href="http://en.wikipedia.org/wiki/Multiplication_algorithm#Long_multiplication">long
multiplication</a> is used, then it costs \(M(b) = b^2\), which gives
a total cost of \(O(\lg^{33/2} n) = O(\lg^{17} n)\)
for the whole
algorithm. <a href="http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm">More
complicated multiplication methods</a> cost only \(M(b) = b \lg b\),
which gives a total cost of \(O(\lg^{21/2} n \cdot \lg \lg n) =
O(\lg^{11} n)\) for the whole algorithm.
Either way, the AKS primality test is shown to be implementable in
polynomial time.</p>
<p>Below is step 1 implemented in Javascript; however, here we bound
\(r\) explicitly to be able to detect bugs quickly.<sup><a href="#fn5" id="r5">[5]</a></sup></p>
<pre class="code-container"><code class="language-javascript">// Returns an upper bound for r such that o_r(n) > ceil(lg(n))^2 that
// is polylog in n.
function calculateAKSModulusUpperBound(n) {
n = SNat.cast(n);
var ceilLgN = new SNat(n.ceilLg());
var rUpperBound = ceilLgN.pow(5).max(3);
var nMod8 = n.mod(8);
if (nMod8.eq(3) || nMod8.eq(5)) {
rUpperBound = rUpperBound.min(ceilLgN.pow(2).times(8));
}
return rUpperBound;
}
// Returns the least r such that o_r(n) > ceil(lg(n))^2 >= ceil(lg(n)^2).
function calculateAKSModulus(n, multiplicativeOrderCalculator) {
n = SNat.cast(n);
multiplicativeOrderCalculator =
multiplicativeOrderCalculator || calculateMultiplicativeOrderCRT;
var ceilLgN = new SNat(n.ceilLg());
var ceilLgNSq = ceilLgN.pow(2);
var rLowerBound = ceilLgNSq.plus(2);
var rUpperBound = calculateAKSModulusUpperBound(n);
for (var r = rLowerBound; r.le(rUpperBound); r = r.plus(1)) {
if (n.gcd(r).ne(1)) {
continue;
}
var o = multiplicativeOrderCalculator(n, r);
if (o.gt(ceilLgNSq)) {
return r;
}
}
throw new Error('Could not find AKS modulus');
}</code></pre>
<p>Here is step 2 implemented in Javascript:</p>
<pre class="code-container"><code class="language-javascript">// Returns floor(sqrt(r-1)) * ceil(lg(n)) + 1 > floor(sqrt(Phi(r))) * lg(n).
function calculateAKSUpperBoundSimple(n, r) {
n = SNat.cast(n);
r = SNat.cast(r);
// Use r - 1 instead of calculating Phi(r).
return r.minus(1).floorRoot(2).times(n.ceilLg()).plus(1);
}</code></pre>
<p>Here is part of step 3 implemented in Javascript, along with the
comments for the functions used in trial division:</p>
<pre class="code-container"><code class="language-javascript">// Given a number n, a generator function getNextDivisor, and a
// processing function processPrimeFactor, factors n using the
// divisors returned by getNextDivisor and passes each prime factor
// with its multiplicity to processPrimeFactor.
//
// getNextDivisor is passed the current unfactorized part of n and it
// should return the next divisor to try, or null if there are no more
// divisors to generate (although processPrimeFactor may still be
// called). processPrimeFactor is called with each non-trivial prime
// factor and its multiplicity. If it returns a false value, it won't
// be called anymore.
function trialDivide(n, getNextDivisor, processPrimeFactor) {
...
}
// Returns a generator that generates primes up to 7, then odd numbers
// up to floor(sqrt(n)), using a mod-30 wheel to eliminate odd numbers
// that are known composite (roughly half).
function makeMod30WheelDivisorGenerator() {
...
}
// Returns the first factor of n less than M from generator, or null if there
// is no such factor.
function getFirstFactorBelow(n, M, generator) {
n = SNat.cast(n);
M = SNat.cast(M);
generator = generator || makeMod30WheelDivisorGenerator();
var boundedGenerator = function(n) {
var d = generator(n);
return (d && d.lt(M)) ? d : null;
};
var factor = null;
trialDivide(n, boundedGenerator, function(p, k) {
if (p.lt(M.min(n))) {
factor = p;
}
return false;
});
return factor;
}</code></pre>
<p>Below is a function that ties steps 1 to 3 together; it is useful
for testing purposes to separate it from the other steps. (Actually,
we use a different function to compute \(M\) which computes
\(\varphi(r)\) instead of using \(r - 1\) so that we always have the
tightest bound possible for \(M\).)</p>
<pre class="code-container"><code class="language-javascript">// The getAKSParameters* functions below return a parameters object
// with the following fields:
//
// n: the number the parameters are for.
//
// factor: A prime factor of n. If present, the fields below may
// not be present.
//
// isPrime: if set, n is prime. If present, the fields below may
// not be present.
//
// r: the AKS modulus for n.
//
// M: the AKS upper bound for n.
function getAKSParametersSimple(n) {
n = SNat.cast(n);
var r = calculateAKSModulus(n);
var M = calculateAKSUpperBound(n, r);
var parameters = {
n: n,
r: r,
M: M
};
var factor = getFirstFactorBelow(n, M);
if (factor) {
parameters.factor = factor;
} else if (M.gt(n.floorRoot(2))) {
parameters.isPrime = true;
}
return parameters;
}</code></pre>
<p>Finally, here is step 4 implemented in Javascript:</p>
<pre class="code-container"><code class="language-javascript">// Returns whether a is an AKS witness for n, i.e., whether
// (X + a)^n != X^n + a mod (X^r - 1, n).
function isAKSWitness(n, r, a) {
n = SNat.cast(n);
r = SNat.cast(r);
a = SNat.cast(a);
function reduceAKS(p) {
return p.modPow(r).mod(n);
}
function prodAKS(x, y) {
return reduceAKS(x.times(y));
};
var one = new SPoly(new SNat(1));
var xn = one.shiftLeft(n.mod(r));
var ap = new SPoly(a);
var lhs = one.shiftLeft(1).plus(ap).pow(n, prodAKS);
// Use the already-reduced power X^(n mod r) instead of building X^n.
var rhs = reduceAKS(xn.plus(ap));
return lhs.ne(rhs);
}
// Returns the first a < M that is an AKS witness for n, or null if
// there isn't one.
function getFirstAKSWitness(n, r, M) {
for (var a = new SNat(1); a.lt(M); a = a.plus(1)) {
if (isAKSWitness(n, r, a)) {
return a;
}
}
return null;
}</code></pre>
<p>Here's the code that ties it all together:</p>
<pre class="code-container"><code class="language-javascript">// Returns whether n is prime or not using the AKS primality test.
function isPrimeByAKS(n) {
n = SNat.cast(n);
var parameters = getAKSParameters(n);
if (parameters.factor) {
return false;
}
if (parameters.isPrime) {
return true;
}
return (getFirstAKSWitness(n, parameters.r, parameters.M) == null);
}</code></pre>
<p class="interactive-example" id="aksExample">
Let
<span class="fake-katex"><var>n</var> =
<input class="parameter" size="6" pattern="[0-9]*" required
type="text" value="175507"
data-bind="value: nStr, valueUpdate: 'afterkeydown'" /></span>.
<!-- ko template: outputTemplate --><!-- /ko -->
<script type="text/html" id="aks.error.invalidN">
<span class="fake-katex"><var>n</var></span> is not a valid number.
</script>
<script type="text/html" id="aks.error.outOfBoundsN">
<span class="fake-katex"><var>n</var></span>
must be greater than or equal to 2.
</script>
<script type="text/html" id="aks.success">
<span class="fake-katex">⌈lg <var>n</var>⌉</span> is
<span class="fake-katex intermediate" data-bind="text: ceilLgN"></span>,
<span class="fake-katex"><var>r</var> =
<span class="intermediate" data-bind="text: r"></span></span>
is the least value such that
<span class="fake-katex">o<sub><var>r</var></sub>(<var>n</var>) =
<span class="intermediate" data-bind="text: nOrder"></span>
> ⌈lg <var>n</var>⌉<sup>2</sup>
= <span class="intermediate" data-bind="text: ceilLgNSq"></span></span>,
<span class="fake-katex"><var>φ</var>(<var>r</var>) =
<span class="intermediate" data-bind="text: eulerPhiR"></span></span>,
and <span class="fake-katex"><var>M</var> =
⌊√<var>φ</var>(<var>r</var>)⌋ ⋅
⌈lg <var>n</var>⌉ + 1 =
<span class="intermediate" data-bind="text: M"></span> >
⌊√<var>φ</var>(<var>r</var>)⌋ ⋅
lg <var>n</var></span>.
<span data-bind="if: factor()">
<span class="fake-katex"><var>n</var></span>
has a factor
<span class="fake-katex"><span class="intermediate"
data-bind="text: factor"></span>
< <var>M</var></span>, so
<span class="fake-katex"><var>n</var></span> is
<span class="result">composite</span>.
</span>
<span data-bind="if: isPrime()">
<span class="fake-katex"><var>n</var></span>
has no factor <span class="fake-katex">< <var>M</var></span>
and <span class="fake-katex"><var>M</var> >
⌊√<var>n</var>⌋ =
<span class="intermediate" data-bind="text: floorSqrtN"></span></span>,
so
<span class="fake-katex"><var>n</var></span> is
<span class="result">prime</span>.
</span>
<span data-bind="if: !factor() && !isPrime()">
<span class="fake-katex"><var>n</var></span>
has no factor <span class="fake-katex">< <var>M</var></span>
and <span class="fake-katex"><var>M</var> ≤
⌊√<var>n</var>⌋ =
<span class="intermediate" data-bind="text: floorSqrtN"></span></span>,
so <span class="fake-katex"><var>n</var></span> is prime iff
<span class="fake-katex">(<var>X</var> +
<var>a</var>)<sup><var>n</var></sup>
≡ <var>X</var><sup><var>n</var></sup> + <var>a</var>
(mod <var>X</var><sup><var>r</var></sup> − 1,
<var>n</var>)</span> for
<span class="fake-katex">0 ≤ <var>a</var>
< <var>M</var></span>.
</span>
</script>
</p>
<script type="text/javascript" src="/primality-testing-polynomial-time-part-2-files/aks-example.js"></script>
<p><em>(To-do: Have an interactive box to demonstrate how the
per-\(a\) AKS test works.)</em></p>
</section>
<section>
<header>
<h2>8. The AKS algorithm (improved version)</h2>
</header>
<p>Here is a slightly more complicated version of the AKS algorithm.
Again given \(n \ge 2\):</p>
<ol>
<li>Search for a prime factor of \(n\) less than \(\lceil \lg n
\rceil^2 + 2\). If one is found, return “composite”.</li>
<li>For each \(r\) from \(\lceil \lg n \rceil^2 + 2\):
<ol>
<li>If \(r \gt \lfloor \sqrt{n} \rfloor\), return
“prime”.</li>
<li>If \(r\) divides \(n\), return “composite”.</li>
<li>Otherwise, factorize \(r\).</li>
<li>Compute \(o_r(n)\) using \(r\)'s prime factors. If it is less
than or equal to \(\lceil \lg n \rceil^2\), jump back to the top of
the loop with the next \(r\).</li>
<li>Otherwise, compute \(\varphi(r)\) using \(r\)'s prime factors.</li>
<li>Compute \(M = \lfloor \sqrt{\varphi(r)} \rfloor \lceil \lg n
\rceil + 1\), and break out of the loop.</li>
</ol>
</li>
<li>For each \(1 \le a \lt M\), compute \((X + a)^n\), reducing
coefficients mod \(n\) and powers mod \(r\). If the result is not
equal to \(X^{n\text{ mod }r} + a\), return
“composite”.</li>
<li>Otherwise, return “prime”.</li>
</ol>
<p>The logic of steps 1 to 3 of the simple version is essentially
merged together to form steps 1 and 2 of this version. First, since
each \(r\) has to be checked for co-primality with \(n\), that
effectively also checks if \(r\) is a prime factor of \(n\), so we only
have to check for prime factors of \(n\) up to the lower bound of
\(r\). Second, both the multiplicative order and the totient function
can be computed more quickly given a complete prime factorization, so
we compute that for each \(r\). Third, we use \(\varphi(r)\) instead of
\(r - 1\) to give a tighter bound for \(M\). Finally, the last two
steps are the same as in the simple version.</p>
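<p>As a rough illustration of how a factorization of \(r\) helps,
here is a sketch that computes \(\varphi(r)\) and \(o_r(n)\) from the
prime factors of \(r\). It uses plain numbers and brute force within
each prime power, and its function names are hypothetical; it is not
the <code>calculateMultiplicativeOrderCRTFactors()</code> used
below, which may work differently:</p>

```javascript
// Factors r >= 2 by trial division into [[p1, k1], [p2, k2], ...].
function factorize(r) {
  var factors = [];
  for (var p = 2; p * p <= r; ++p) {
    var k = 0;
    while (r % p === 0) {
      r /= p;
      ++k;
    }
    if (k > 0) factors.push([p, k]);
  }
  if (r > 1) factors.push([r, 1]);
  return factors;
}

// Euler's totient from a factorization: the product of p^(k-1) * (p - 1).
function totientFromFactors(factors) {
  var phi = 1;
  for (var i = 0; i < factors.length; ++i) {
    var p = factors[i][0], k = factors[i][1];
    phi *= Math.pow(p, k - 1) * (p - 1);
  }
  return phi;
}

function gcd(a, b) {
  while (b !== 0) {
    var t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Order of n modulo m >= 2 by brute force; throws if gcd(n, m) != 1.
function orderModulo(n, m) {
  if (gcd(n, m) !== 1) throw new Error('n and m must be co-prime');
  var x = n % m, k = 1;
  while (x !== 1) {
    x = (x * (n % m)) % m;
    ++k;
  }
  return k;
}

// o_r(n) as the least common multiple of the orders of n modulo each
// prime power p^k in the factorization of r (Chinese remainder theorem).
function orderFromFactors(n, factors) {
  var o = 1;
  for (var i = 0; i < factors.length; ++i) {
    var q = Math.pow(factors[i][0], factors[i][1]);
    var oq = orderModulo(n, q);
    o = o / gcd(o, oq) * oq;
  }
  return o;
}
```

<p>For example, with \(r = 15 = 3 \cdot 5\) and \(n = 2\), the orders
modulo \(3\) and \(5\) are \(2\) and \(4\), so \(o_{15}(2) = 4\),
and \(\varphi(15) = 2 \cdot 4 = 8\).</p>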
<p>Here are steps 1 and 2 of the above algorithm, implemented in
Javascript:</p>
<pre class="code-container"><code class="language-javascript">function getAKSParameters(n, factorizer) {
n = SNat.cast(n);
factorizer = factorizer || defaultFactorizer;
var ceilLgN = new SNat(n.ceilLg());
var ceilLgNSq = ceilLgN.pow(2);
var floorSqrtN = n.floorRoot(2);
var rLowerBound = ceilLgNSq.plus(2);
var rUpperBound = calculateAKSModulusUpperBound(n).min(floorSqrtN);
var parameters = {
n: n
};
var factor = getFirstFactorBelow(n, rLowerBound);
if (factor) {
parameters.factor = factor;
return parameters;
}
for (var r = rLowerBound; r.le(rUpperBound); r = r.plus(1)) {
if (n.mod(r).isZero()) {
parameters.factor = r;
return parameters;
}
var rFactors = getFactors(r, factorizer);
var o = calculateMultiplicativeOrderCRTFactors(n, rFactors, factorizer);
if (o.gt(ceilLgNSq)) {
parameters.r = r;
parameters.M = calculateAKSUpperBoundFactors(n, rFactors);
return parameters;
}
}
if (rUpperBound.eq(floorSqrtN)) {
parameters.isPrime = true;
return parameters;
}
throw new Error('Could not find AKS modulus');
}</code></pre>
</section>
<p><em>(To-do: Wrap up and lead into what will be shown in part
3.)</em></p>
<section class="footnotes">
<p id="fn1">[1] This is a version of Theorem 2 from Lenstra's
paper <a href="http://www.math.leidenuniv.nl/~hwl/PUBLICATIONS/1979e/art.pdf">Miller's
Primality Test</a>.
<a href="#r1">↩</a></p>
<p id="fn2">[2] We work with \(\lceil \lg n \rceil^2\) instead of
\(\lceil \lg^2 n \rceil\) or \(\lg^2 n\) as it's easier to work
with in an actual implementation.
<a href="#r2">↩</a></p>
<p id="fn3">[3] This is exercise 1.27
from <a href="http://www.amazon.com/Prime-Numbers-A-Computational-Perspective/dp/0387252827">Prime
Numbers: A Computational Perspective</a>.
<a href="#r3">↩</a></p>
<p id="fn4">[4] This is adapted from section 8.4 of Granville's <a href="http://www.dms.umontreal.ca/~andrew/PDF/Bulletin04.pdf">It
is Easy to Determine Whether a Given Number is Prime</a>.
<a href="#r4">↩</a></p>
<p id="fn5">[5] The <a href="https://cdn.rawgit.com/akalin/num.js/eab08d4/simple-arith.js"><code>SNat</code></a>
class used is the same as in my previous
article, <a href="intro-primality-testing">An Introduction to
Primality Testing</a>.
<a href="#r5">↩</a></p>
</section>
https://www.akalin.com/primality-testing-polynomial-time-part-1
Primality Testing in Polynomial Time (Ⅰ)
2012-08-06T00:00:00-07:00
Fred Akalin
https://www.akalin.com/
<!-- TODO: Use \not\equiv instead of \neq when it is supported by KaTeX. -->
<section>
<header>
<h2>1. Introduction</h2>
</header>
<p>Exactly ten years
ago, <a href="http://www.cse.iitk.ac.in/users/manindra/">Agrawal</a>,
<a href="http://research.microsoft.com/en-us/people/neeraka/">Kayal</a>,
and <a href="http://www.math.uni-bonn.de/people/saxena/">Saxena</a>
published <a href="http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf">“PRIMES
is in P”</a>, which described an algorithm that could provably
determine whether a given number was prime or composite in polynomial
time.</p>
<p>The AKS algorithm is quite short, but understanding how it works
via the proofs in the paper requires some mathematical sophistication.
Also, some results in the last decade have simplified both the
algorithm and its accompanying proofs. In this article I will explain
in detail the main result of the AKS paper, and in a follow-up article
I will strengthen the main result, use it to get a polynomial-time
primality testing algorithm, and implement that algorithm in
Javascript. If you've
understood <a href="/intro-primality-testing">my introduction to
primality testing</a>, you should be able to follow along.</p>
<p>Let's get started! The basis for the AKS primality test is the
following generalization
of <a href="http://en.wikipedia.org/wiki/Fermat%27s_little_theorem">Fermat's
little theorem</a> to polynomials:</p>
<p class="theorem">
(<span class="theorem-name">Fermat's little theorem for polynomials,
strong version</span>.) If \(n \ge 2\) and \(a\) is relatively prime
to \(n\), then \(n\) is prime if and only if
\[
(X + a)^n \equiv X^n + a \pmod{n}\text{.}
\]</p>
<p>The form of the equation above may be unfamiliar. The polynomials
in question
are <a href="http://en.wikipedia.org/wiki/Polynomial_ring#The_polynomial_ring_K.5BX.5D"><em>formal
polynomials</em></a>. That is, we care only about the coefficients of
the polynomial and not how it behaves as a function. In this case, we
restrict ourselves to polynomials with integer coefficients. Then we
can meaningfully compare two polynomials modulo \(n\): we consider two
polynomials congruent modulo \(n\) if their respective coefficients
are all congruent modulo \(n\). (Equivalently, two polynomials
\(f(X)\) and \(g(X)\) are congruent modulo \(n\) if \(f(X) - g(X) = n
\cdot h(X)\) for some polynomial \(h(X)\).) This definition is
consistent with how they behave as functions; if two polynomials
\(f(X)\) and \(g(X)\) are congruent modulo \(n\), then treating them
as functions, \(f(x) \equiv g(x) \pmod{n}\) for any integer
\(x\).<sup><a href="#fn1" id="r1">[1]</a></sup></p>
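<p>To make the coefficient-wise comparison concrete, here is a minimal sketch that expands \((X + a)^n\) modulo \(n\) and compares it with \(X^n + a\). The function names are my own, it uses plain Javascript numbers rather than <code>SNat</code>, and it assumes \(0 \le a \lt n\) with \(n\) small; it is meant only to illustrate the condition, not to be a practical test.</p>

```javascript
// Returns the coefficients of (X + a)^n reduced modulo n, lowest degree
// first. Illustrative only: it takes about n^2 steps and assumes n is
// small enough that n * n fits exactly in a Number.
function expandLinearPowerModN(a, n) {
  var coeffs = [1];
  for (var i = 0; i < n; i++) {
    var next = [];
    for (var k = 0; k <= coeffs.length; k++) {
      next.push(0);
    }
    for (var j = 0; j < coeffs.length; j++) {
      // Multiplying by (X + a) sends each term to degrees j and j + 1.
      next[j] = (next[j] + coeffs[j] * a) % n;
      next[j + 1] = (next[j + 1] + coeffs[j]) % n;
    }
    coeffs = next;
  }
  return coeffs;
}

// Returns true iff (X + a)^n and X^n + a have congruent coefficients
// modulo n. Assumes n >= 2 and 0 <= a < n.
function satisfiesStrongCondition(a, n) {
  var lhs = expandLinearPowerModN(a, n);
  for (var k = 0; k <= n; k++) {
    var expected = 0;
    if (k === 0) {
      expected = a % n;
    }
    if (k === n) {
      expected = 1;
    }
    if (lhs[k] !== expected) {
      return false;
    }
  }
  return true;
}
```

<p>For example, <code>satisfiesStrongCondition(1, 7)</code> holds, while <code>satisfiesStrongCondition(1, 6)</code> fails because the coefficient \(\binom{6}{2} = 15\) is not a multiple of \(6\).</p>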
<p>Unfortunately, this test by itself cannot give a polynomial-time
algorithm as testing even one value of \(a\) may require looking at
\(n\) coefficients of the left-hand side. (Remember that we're
interested in algorithms with time polynomial not in the input \(n\),
but in its bit length \(\lg n\). Such an algorithm is described as
having time <em>polylog in \(n\)</em>.) However, we can reduce the
number of coefficients we have to look at by taking the powers of
\(X\) modulo some number \(r\). This is equivalent to taking the
modulo of the polynomials themselves by \(X^r - 1\); you can see this
for yourself by picking some polynomial and some value for \(r\) and
doing long division by \(X^r - 1\) to find the remainder. (It may
seem weird to talk about taking the modulo of one polynomial with
another, but it's entirely analogous to integers.) This gives us a
weaker version of the theorem above:</p>
<p class="theorem">
(<span class="theorem-name">Fermat's little theorem for polynomials,
weak version</span>.) If \(n\) is prime and \(a\) is not a multiple
of \(n\), then for any \(r \ge 2\)
\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}
\]</p>
<p>The “double mod” notation above may be unfamiliar, but
in this case its meaning is simple. We consider two polynomials
congruent modulo \(X^r - 1, n\) when they are congruent modulo \(n\)
after you reduce the powers of \(X\) modulo \(r\) and combine like
terms. More generally, two polynomials \(f(X)\) and \(g(X)\) are
congruent modulo \(n(X), n\) if \(f(X) - g(X) \equiv n(X) \cdot h(X)
\pmod{n}\) for some polynomial \(h(X)\).</p>
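<p>The double-mod condition can be checked with only \(r\) coefficients at a time. The sketch below is one way to do it, under the assumption that \(n\) is small enough for plain Javascript numbers; the function names are hypothetical and not part of <code>SNat</code>. Polynomials are arrays of \(r\) coefficients, and multiplication wraps exponents around modulo \(r\).</p>

```javascript
// Returns an array of r zeros.
function zeroPoly(r) {
  var out = [];
  for (var k = 0; k < r; k++) {
    out.push(0);
  }
  return out;
}

// Multiplies f and g modulo (X^r - 1, n). Coefficients are stored lowest
// degree first, and exponents wrap around modulo r.
function mulModCyclic(f, g, r, n) {
  var out = zeroPoly(r);
  for (var i = 0; i < r; i++) {
    for (var j = 0; j < r; j++) {
      var idx = (i + j) % r;
      out[idx] = (out[idx] + f[i] * g[j]) % n;
    }
  }
  return out;
}

// Computes (X + a)^n modulo (X^r - 1, n) by repeated squaring.
// Assumes r >= 2 and that n * n fits exactly in a Number.
function linearPowerModCyclic(a, n, r) {
  var result = zeroPoly(r);
  result[0] = 1;
  var base = zeroPoly(r);
  base[0] = a % n;
  base[1] = 1;
  var e = n;
  while (e > 0) {
    if (e % 2 === 1) {
      result = mulModCyclic(result, base, r, n);
    }
    base = mulModCyclic(base, base, r, n);
    e = Math.floor(e / 2);
  }
  return result;
}

// Returns true iff (X + a)^n is congruent to X^n + a modulo (X^r - 1, n).
function satisfiesWeakCondition(a, n, r) {
  var lhs = linearPowerModCyclic(a, n, r);
  var rhs = zeroPoly(r);
  rhs[n % r] = 1;
  rhs[0] = (rhs[0] + a) % n;
  for (var k = 0; k < r; k++) {
    if (lhs[k] !== rhs[k]) {
      return false;
    }
  }
  return true;
}
```

<p>Each multiplication here costs about \(r^2\) coefficient operations and there are about \(\lg n\) of them, which is why keeping \(r\) small matters.</p>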
<!-- TODO(akalin): Put interactive applet for the condition here. -->
<p>With this theorem, we only have to compare \(r\) coefficients, but
we introduce the possibility of the condition above being met even
when \(n\) is composite. But can we impose conditions on \(r\) and
\(a\) such that if the condition holds for a polynomial number of
pairs of \(r\) and \(a\), we can be sure that \(n\) is prime? The
answer is “yes”; in particular, we can find a single \(r\)
and an upper bound \(M\) polylog in \(n\) such that if the condition
holds for \(r\) and \(0 \le a \lt M\), then \(n\) is prime.</p>
<p>In the remainder of this article, we'll work backwards. That is,
we'll first assume we have some \(n \ge 2\), \(r \ge 2\), and \(M \ge
1\) such that for all \(0 \le a \lt M\)
\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}
\]
Then we'll assume that \(n\) is not a power of one of its prime
divisors \(p\) and try to deduce the conditions that imposes on \(n\),
\(r\), \(M\), and \(p\). Then we can take the contrapositive to find
the inverse conditions on \(n\), \(r\), \(M\), and \(p\) that would
then force \(n\) to be a power of \(p\). Since we can easily test
whether \(n\) is
a <a href="http://en.wikipedia.org/wiki/Perfect_power">perfect
power</a>, if it's not one, we can immediately conclude that \(n =
p^1\) and thus prime. (Of course, if it does turn out to be a perfect
power, then it is trivially composite.)</p>
<p>To understand the conditions that we will derive, we must first
talk about <em>introspective numbers</em>.</p>
</section>
<section>
<header>
<h2>2. Introspective numbers</h2>
</header>
<p>Given a base \(b\), a polynomial \(g(X)\) and a number \(q\), we
call \(q\) <em>introspective</em><sup><a href="#fn2" id="r2">[2]</a></sup> for \(g(X)\) modulo \(b\) if
\[
g(X)^q \equiv g(X^q) \pmod{X^r - 1, b}\text{.}
\]</p>
<p>We also say that \(g(X)\) is <em>introspective</em> under \(q\)
modulo \(b\).</p>
<p>A basic property of introspective numbers and polynomials is that
they are closed under multiplication. That is, if \(q_1\) and \(q_2\)
are introspective for \(g(X)\) modulo \(b\), then \(q_1 \cdot q_2\) is
also introspective for \(g(X)\) modulo \(b\), and if \(g_1(X)\) and
\(g_2(X)\) are introspective under \(q\) modulo \(b\), then \(g_1(X)
\cdot g_2(X)\) is also introspective under \(q\) modulo \(b\).</p>
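<p>Here is a sketch of why closure holds, assuming that congruences are taken modulo \(X^r - 1, b\) as in the rest of this article. Since \(q_1\) is introspective for \(g(X)\),
\[
g(X)^{q_1 q_2} = \left( g(X)^{q_1} \right)^{q_2} \equiv g(X^{q_1})^{q_2} \pmod{X^r - 1, b}\text{.}
\]
Since \(q_2\) is introspective for \(g(X)\), substituting \(X^{q_1}\) for \(X\) gives
\[
g(X^{q_1})^{q_2} \equiv g(X^{q_1 q_2}) \pmod{X^{q_1 r} - 1, b}\text{,}
\]
and because \(X^r - 1\) divides \(X^{q_1 r} - 1\), the same congruence holds modulo \(X^r - 1, b\). Chaining the two gives \(g(X)^{q_1 q_2} \equiv g(X^{q_1 q_2})\). The statement for polynomials is more direct:
\[
\left( g_1(X) \cdot g_2(X) \right)^q = g_1(X)^q \cdot g_2(X)^q \equiv g_1(X^q) \cdot g_2(X^q) \pmod{X^r - 1, b}\text{.}
\]</p>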
<p>In particular, given our assumptions above, we can easily see that
\(1\), \(p\), and \(n\) are introspective for \(X + a\) modulo \(p\)
for any \(0 \le a \lt M\). We can also show that \(n/p\) is also
introspective for \(X + a\) modulo \(p\). Using closure under
multiplication, we can talk about the set of numbers generated by
\(p\) and \(n/p\), which are all introspective for \(X + a\) modulo
\(p\). Call this set \(I\):</p>
\[
I = \left\{ p^i \left( n/p \right)^j \mid i, j \ge 0 \right\}\text{.}
\]
<p>We can also take the closure of all \(X + a\) to get a set of
polynomials which are all introspective under \(p\), \(n/p\), or any
number in \(I\). Call this set \(P\):
\[
P = \left\{ 0 \right\} \cup
\left\{ X^{e_0} \cdot (X + 1)^{e_1} \cdot \ldots \cdot (X + M -
1)^{e_{M - 1}} \mid e_0, e_1, \ldots, e_{M - 1} \ge 0 \right\}\text{.}
\]
To summarize, \(I\) is a set of numbers and \(P\) is a set of
polynomials such that for any \(i \in I\) and \(g(X) \in P\), \(i\) is
introspective for \(g(X)\) modulo \(p\). Of course, it's still not
clear what these two sets have to do with whether \(n\) is prime or
not. But we will examine certain finite sets related to \(I\) and
\(P\) and their sizes, and we will see that we can deduce their
properties depending on the relation of \(n\) to \(p\).</p>
</section>
<section>
<header>
<h2>3. Bounds on finite sets related to \(I\) and \(P\)</h2>
</header>
<p>Now we're ready to work towards finding our restrictions on \(n\),
\(r\), \(M\), and \(p\). We'll slowly build them up such that when
the last one falls into place, we know that \(n\) is a perfect power
of \(p\). Here's what we're starting with:</p>
<div class="insert">
\(n \ge 2\), <br/>
\(r \ge 2\), <br/>
\(M \ge 1\), <br/>
\(p\) is a prime divisor of \(n\).
</div>
<p>Let us restrict \(I\) to a finite set by bounding the exponents of
\(p\) and \(n/p\):
\[
I_k = \left\{ p^i (n/p)^j \mid 0 \le i, j \lt k \right\} \subset I\text{.}
\]</p>
<p>Notice that if \(n\) is not a power of \(p\), then all members of
\(I_k\) are distinct, and therefore we can easily calculate its
size:<sup><a href="#fn3" id="r3">[3]</a></sup>
\[
|I_k| = k^2\text{.}
\]</p>
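<p>A quick way to convince yourself of this is to enumerate \(I_k\) for small values. The helper below is a hypothetical sketch using plain Javascript numbers, so it only works while the products stay small.</p>

```javascript
// Returns the distinct values p^i * (n/p)^j for 0 <= i, j < k.
// Assumes p divides n and that all products fit exactly in a Number.
function elementsOfIk(n, p, k) {
  var cofactor = n / p;
  var seen = {};
  var values = [];
  for (var i = 0; i < k; i++) {
    for (var j = 0; j < k; j++) {
      var v = Math.pow(p, i) * Math.pow(cofactor, j);
      if (!seen[v]) {
        seen[v] = true;
        values.push(v);
      }
    }
  }
  return values;
}
```

<p>With \(n = 12\), \(p = 2\), and \(k = 3\), all \(9\) products are distinct. With \(n = 8 = 2^3\), the products collapse to the \(7\) powers \(2^0, \ldots, 2^6\), so the count falls short of \(k^2\).</p>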
<p>Let's also restrict \(P\) to a finite set by bounding the degrees
of its polynomials:
\[
P_d = \left\{ g \in P \mid \deg(g) \lt d \right\} \subset P\text{.}
\]</p>
<p>We can calculate \(|P_d|\) exactly,<sup><a href="#fn4" id="r4">[4]</a></sup> but
we only need a lower bound for when \(d \le M\). Consider \(P_d^{\{0,
1\}}\), the subset of \(P_d\) where each \(X + a\) is present at most
once. Since each \(X + a\) is either present or not present, but not
all of them can be present at the same time, there are \(2^d - 1\)
distinct polynomials in \(P_d^{\{0, 1\}}\). Adding back the zero
polynomial yields \(|P_d^{\{0, 1\}}| = 2^d\). Since \(P_d^{\{0,
1\}}\) is a subset of \(P_d\), \(|P_d| \ge |P_d^{\{0, 1\}}| = 2^d\).
Therefore, if \(d \le M\), then<sup><a href="#fn5" id="r5">[5]</a></sup>
\[ |P_d| \ge 2^d\text{.} \]
This will be useful later (for a particular value of \(d\)), so let's
add the restriction to \(M\):
</p>
<div class="insert">
\(n \ge 2\), <br/>
\(r \ge 2\), <br/>
<em>\(M \ge d\)</em>, <br/>
\(p\) is a prime divisor of \(n\).
</div>
<p>Let us restrict \(I\) in a different way, by reducing modulo \(r\):
\[
J = \left\{ x \bmod r \mid x \in I \right\}
\]
and let \(t = |J|\). (This size will play an important role
later.)</p>
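<p>Since \(I\) is infinite but \(J\) is not, \(J\) can be computed by repeatedly multiplying by the two generators modulo \(r\) until nothing new appears. This is a small sketch with a made-up function name, assuming \(p\) divides \(n\) and \(r \ge 2\).</p>

```javascript
// Returns the sorted residues modulo r of all p^i * (n/p)^j with
// i, j >= 0. Assumes p divides n and r >= 2.
function residuesJ(n, p, r) {
  var generators = [p % r, (n / p) % r];
  var seen = {};
  seen[1] = true; // p^0 * (n/p)^0 = 1
  var pending = [1];
  while (pending.length > 0) {
    var x = pending.pop();
    for (var g = 0; g < generators.length; g++) {
      var y = (x * generators[g]) % r;
      if (!seen[y]) {
        seen[y] = true;
        pending.push(y);
      }
    }
  }
  return Object.keys(seen).map(Number).sort(function (u, v) {
    return u - v;
  });
}
```

<p>For \(n = 35\), \(p = 5\), and \(r = 9\), this yields \(\{1, 2, 4, 5, 7, 8\}\), so \(t = 6\).</p>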
<p>Our final set that we're interested in needs some background to
define. We want to find a subset of \(P\) that lies in some field
\(F\) because fields have some convenient properties that we will use
later.<sup><a href="#fn6" id="r6">[6]</a></sup></p>
<p>Consider \(\mathbb{Z}/p\mathbb{Z}\), the ring
of <a href="http://en.wikipedia.org/wiki/Integers_modulo_n#Integers_modulo_n">integers
modulo \(p\)</a>. Since \(p\) is prime, it is also a field. In
particular, it is
the <a href="http://en.wikipedia.org/wiki/Finite_field">finite
field</a> \(\mathbb{F}_p\) of order \(p\). Then consider
\(\mathbb{F}_p[X]\),
its <a href="http://en.wikipedia.org/wiki/Polynomial_ring">polynomial
ring</a>, which is the set of polynomials with coefficients in
\(\mathbb{F}_p\). Given some polynomial \(q(X) \in \mathbb{F}_p[X]\),
we can further reduce modulo \(q(X)\) to get \(\mathbb{F}_p[X] /
q(X)\). Finally, if \(q(X)\) is
<a href="http://en.wikipedia.org/wiki/Irreducible_polynomial">irreducible</a>
over \(\mathbb{F}_p\), then \(\mathbb{F}_p[X] / q(X)\) is also a
field.</p>
<p>(We can show that both \(\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}\)
and \(\mathbb{F}_p[X] / q(X)\) are fields from the same general
theorem of rings: if \(R\) is
a <a href="http://en.wikipedia.org/wiki/Principal_ideal_domain">principal
ideal domain</a> and \((c)\) is
the <a href="http://en.wikipedia.org/wiki/Two-sided_ideal#Ideal_generated_by_a_set">two-sided
ideal generated by \(c\)</a>, then
the <a href="http://en.wikipedia.org/wiki/Quotient_ring">quotient
ring</a> \(R / (c)\) is a field if and only if \(c\) is
a <a href="http://en.wikipedia.org/wiki/Prime_element">prime
element</a> of \(R\).)<sup><a href="#fn7" id="r7">[7]</a></sup></p>
<p>So we just need to find a polynomial that's irreducible over
\(\mathbb{F}_p\). We know that \(X^r - 1\) has \(\Phi_r(X)\), the
\(r\)th <a href="http://en.wikipedia.org/wiki/Cyclotomic_polynomial">cyclotomic
polynomial</a>, as a factor. \(\Phi_r(X)\) is irreducible over
\(\mathbb{Z}\), but not necessarily over \(\mathbb{F}_p\). But if
\(r\) is relatively prime to \(p\), then \(\Phi_r(X)\) factors into
irreducible polynomials all of degree \(o_r(p)\)
(the <a href="http://en.wikipedia.org/wiki/Multiplicative_order">multiplicative
order</a> of \(p\) modulo \(r\)) over \(\mathbb{F}_p\).<sup><a href="#fn8" id="r8">[8]</a></sup> Then we can
just require that \(r\) be relatively prime to \(p\). If we do so,
then we can let \(h(X)\) be one of the factors of \(\Phi_r(X)\) over
\(\mathbb{F}_p\) and we have our field \(F = \mathbb{F}_p[X] /
h(X)\).</p>
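<p>The degree of \(h(X)\) is the multiplicative order \(o_r(p)\), which is easy to compute directly for small \(r\). The following sketch uses hypothetical helper names and plain Javascript numbers.</p>

```javascript
// Returns the greatest common divisor of two non-negative integers.
function gcd(a, b) {
  while (b !== 0) {
    var t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Returns o_r(a), the least k >= 1 such that a^k is congruent to 1
// modulo r. Assumes r >= 2; throws if a and r are not relatively prime,
// since no such k exists in that case.
function multiplicativeOrder(a, r) {
  if (gcd(a, r) !== 1) {
    throw new RangeError('a and r must be relatively prime');
  }
  var base = a % r;
  var x = base;
  var k = 1;
  while (x !== 1) {
    x = (x * base) % r;
    k++;
  }
  return k;
}
```

<p>For example, <code>multiplicativeOrder(2, 7)</code> is \(3\), which matches the fact that \(\Phi_7(X)\) splits into two cubic factors over \(\mathbb{F}_2\).</p>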
<div class="insert">
\(n \ge 2\), <br/>
\(r \ge 2\), <em>\(r\) relatively prime to \(p\)</em>,<br/>
\(M \ge d\), <br/>
\(p\) is a prime divisor of \(n\).
</div>
<p>Finally, we can define our last set. Let
\[
Q = \left\{ f(X) \bmod (h(X), p) \mid f(X) \in P \right\} \subseteq F\text{.}
\]</p>
<p>We can map elements of \(P\) into \(Q\) via reduction modulo
\((h(X), p)\). But we're interested in only the elements of \(P\)
that map to distinct elements of \(Q\), since that will let us find a
lower bound for \(|Q|\). A simple example would be the set of \(X +
a\) for \(0 \le a \lt M\); if the degree of \(h(X)\) is greater than
\(1\) and \(p \ge M\), then each \(X + a\) is distinct in \(Q\).</p>
<p>Another interesting set is \(X^k\) for \(1 \le k \le r\). Since
\(h(X) \equiv 0 \pmod{h(X), p}\), we can say that \(X\) is a root of
the polynomial function \(h(y)\) over the field \(F\). But since
\(h(y)\) is a factor of \(\Phi_r(y)\), \(X\) is then a primitive
\(r\)th root of unity in \(Q\).<sup><a href="#fn9" id="r9">[9]</a></sup> But the powers of a primitive \(r\)th
root of unity (from \(1\) to \(r\)) are all distinct. Therefore all
\(X^k\) for \(1 \le k \le r\) are distinct in \(Q\).</p>
<p>Most importantly, we can show that distinct elements in \(P_d\) map
to distinct elements in \(Q\) if \(d \le t\). Let \(f(X)\) and
\(g(X)\) be two different elements of \(P_d\). Assume that \(f(X)
\equiv g(X) \pmod{h(X), p}\). Then, for \(m \in I\):
\[
f(X^m) \equiv f(X)^m \pmod{X^r - 1, p}
\]
and
\[
g(X^m) \equiv g(X)^m \pmod{X^r - 1, p}
\]
by introspection modulo \(p\), and therefore
\[
f(X^m) \equiv g(X^m) \pmod{X^r - 1, p}
\]
which immediately leads to
\[
f(X^m) \equiv g(X^m) \pmod{h(X), p}\text{.}
\]
Therefore, all \(X^m\) for \(m \in I\) are roots of the polynomial
function \(u(y) = f(y) - g(y)\) over the field \(F\), and in
particular all \(X^m\) for \(m \in J\). But all \(t\) such \(X^m\)
are distinct in \(Q\) by the argument above. Therefore, \(u(y)\) must
have degree at least \(t\) since a polynomial over a field cannot have
more roots than its degree. But the degree of \(u(y)\) is less than
\(d\) since both \(f(y)\) and \(g(y)\) have degree less than \(d\).
Since \(d \le t\), this is a contradiction, so therefore \(f(X)
\neq g(X) \pmod{h(X), p}\). But since \(f(X)\) and \(g(X)\)
were arbitrary, that implies that distinct elements of \(P_d\) map to
distinct elements of \(Q\) for \(d \le t\).</p>
<p>Given the above, we can conclude that as long as we require that
\(d \le t\), \(p \ge M\), and \(o_r(p) = \deg(h(X)) \gt 1\), then
\[
|Q| \ge |P_d| \ge 2^d\text{.}
\]</p>
<div class="insert">
\(n \ge 2\), <br/>
<em>\(o_r(p) \gt 1\)</em>,<br/>
\(M \ge d\), <br/>
<em>\(t \ge d\)</em>,<br/>
<em>\(p \ge M\)</em>, \(p\) is a prime divisor of \(n\).
</div>
</section>
<section>
<header>
<h2>4. The AKS theorem (weak version)</h2>
</header>
<p>We're finally ready to put it all together. Again assume \(n\) is
not a power of \(p\), and recall that \(|J| = t\). Let \(s \gt
\sqrt{t}\). Then \(|I_s| = s^2 \gt t\). By
the <a href="http://en.wikipedia.org/wiki/Pigeonhole_principle">pigeonhole
principle</a>, there must be two elements \(m_1, m_2 \in I_s\) that
map to the same element in \(J\); that is, there must be \(m_1, m_2
\in I_s\) such that \(m_1 \equiv m_2 \pmod{r}\). Now pick some
\(g(X)\) from \(P\). Then
\[
g(X)^{m_1} \equiv g(X^{m_1}) \pmod{X^r - 1, p}
\]
and
\[
g(X)^{m_2} \equiv g(X^{m_2}) \pmod{X^r - 1, p}
\]
by introspection modulo \(p\). But \(X^{m_1} \equiv X^{m_2} \pmod{X^r - 1}\) since \(m_1 \equiv m_2 \pmod{r}\), so
\[
g(X^{m_1}) \equiv g(X^{m_2}) \pmod{X^r - 1, p}\text{.}
\]
Chaining all these congruences together lets us deduce that
\[
g(X)^{m_1} \equiv g(X)^{m_2} \pmod{X^r - 1, p}\text{,}
\]
which immediately leads to
\[
g(X)^{m_1} \equiv g(X)^{m_2} \pmod{h(X), p}\text{.}
\]
</p>
<p>That means that \(g(X) \bmod (h(X), p) \in Q\) is a root of the
polynomial function \(u(y) = y^{m_1} - y^{m_2}\) over the field \(F\).
But \(g(X)\) was picked arbitrarily from \(P\), so \(u(y)\) has at
least \(|Q|\) roots. \(\deg(u(y)) = \max(m_1, m_2) \le p^{s-1} \cdot
(n/p)^{s-1} = n^{s-1}\), and \(u(y)\), being a polynomial over a
field, cannot have more roots than its degree, so if \(n\) is not a
power of \(p\), then \(|Q| \le n^{s-1}\). Equivalently, if \(|Q| \gt
n^{s-1}\), then \(n\) must be a power of \(p\).<sup><a href="#fn10" id="r10">[10]</a></sup> But
we've shown above that \(|Q| \ge 2^d\) for \(d \le t\), so if we can
pick \(d\) and \(s\) such that \(2^d \gt n^{s-1}\), then we can force
\(n\) to be a power of \(p\). Taking logs, we see that this is
equivalent to picking \(d\) and \(s\) such that \(d \gt (s - 1) \lg
n\). Since \(d \le t\), this imposes \(t \gt (s - 1) \lg n\) in order
for there to be room to pick \(d\). Rearranging, we get \(s \lt
\frac{t}{\lg n} + 1\). But \(s \gt \sqrt{t}\), so this imposes
\(\sqrt{t} \lt \frac{t}{\lg n} + 1\) in order for there to be room to
pick \(s\). Rearranging again, we get \(\frac{t}{\sqrt{t} - 1} \gt
\lg n\). Since \(\frac{t}{\sqrt{t} - 1} \gt \sqrt{t}\), it suffices
to require that \(t \gt \lg^2 n\) in order for there to be room to
pick \(d\) and \(s\). Furthermore, since \(s\) has to be an integer,
then \(s \ge \lfloor \sqrt{t} \rfloor + 1\), and therefore \(d \gt
\lfloor \sqrt{t} \rfloor \lg n\). Let's update our assumptions:</p>
<div class="insert">
\(n \ge 2\), <br/>
\(o_r(p) \gt 1\),<br/>
<em>\(M \ge d \gt \lfloor \sqrt{t} \rfloor \lg n\)</em>,<br/>
<em>\(t \gt \lg^2 n\)</em>,<br/>
\(p \ge M\), \(p\) is a prime divisor of \(n\).
</div>
<p>So to summarize, if we make the above assumptions, we can pick
\(d\) and \(s\) such that \(|Q| \ge 2^d \gt n^{s - 1}\), which implies
that \(n\) must be a power of \(p\), which was our goal. Now we just
have to express all assumptions in terms of \(n\), \(r\), and \(M\),
strengthening them if necessary. \(J\) is generated by \(p\) and
\(n/p\), so it contains every power of \(n = p \cdot (n/p)\) modulo
\(r\), and its order (i.e., \(t\)) is therefore at least \(o_r(n)\)
(this brings along the assumption that \(r\) and \(n\) are relatively
prime). Also, if \(o_r(n) \gt 1\), then some prime factor of \(n\) must
have order greater than \(1\) modulo \(r\), and we can take \(p\) to be
that factor. Therefore, we can replace the assumptions \(t \gt \lg^2 n\)
and \(o_r(p) \gt 1\) with \(o_r(n) \gt \lg^2 n\). We can remove the
reference to \(d\) by finding the maximum value of \(t\). Since \(r\)
is relatively prime to \(n\), \(J\) is a subgroup of the multiplicative group \((\mathbb{Z}/r\mathbb{Z})^*\), and
therefore its order divides (and therefore is at most) \(\varphi(r)\).
So we can replace \(M \ge d \gt \lfloor \sqrt{t} \rfloor \lg n\) with
\(M \gt \lfloor \sqrt{\varphi(r)} \rfloor \lg n\). Finally, we can
remove the reference to \(p\) by mandating that \(n\) has no prime
factor less than \(M\). Here are our final assumptions:</p>
<div class="insert">
\(n \ge 2\), <em>\(n\) has no prime factors less than \(M\)</em>,<br/>
<em>\(o_r(n) \gt \lg^2 n\)</em>,<br/>
<em>\(M \gt \lfloor \sqrt{\varphi(r)} \rfloor \lg n\)</em>.<br/>
</div>
<p>We can summarize the above discussion in the following theorem:</p>
<p class="theorem">
(<span class="theorem-name">AKS theorem, weak version</span>.) Let
\(n \ge 2\), \(r\) be relatively prime to \(n\) with \(o_r(n) \gt
\lg^2 n\), and \(M \gt \lfloor \sqrt{\varphi(r)} \rfloor \lg n\).
Furthermore, let \(n\) have no prime factor less than \(M\) and let
\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}
\]
for \(0 \le a \lt M\). Then \(n\) is a power of some prime \(p \ge
M\).</p>
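<p>To get a feel for the sizes involved, here is a rough sketch that searches for the smallest \(r\) satisfying the hypotheses and the smallest \(M\) that goes with it. It is not the algorithm from the follow-up article; the function names are hypothetical, it uses floating-point logarithms, and it relies on the existence of a suitable \(r\), which this article has not proved.</p>

```javascript
// Returns the greatest common divisor of two non-negative integers.
function gcd(a, b) {
  while (b !== 0) {
    var t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Returns o_r(a) for a relatively prime to r, with r >= 2.
function multiplicativeOrder(a, r) {
  var base = a % r;
  var x = base;
  var k = 1;
  while (x !== 1) {
    x = (x * base) % r;
    k++;
  }
  return k;
}

// Returns Euler's totient of r by direct counting.
function eulerPhi(r) {
  var count = 0;
  for (var i = 1; i <= r; i++) {
    if (gcd(i, r) === 1) {
      count++;
    }
  }
  return count;
}

// Finds the smallest r relatively prime to n with o_r(n) > lg^2 n, and
// the smallest integer M > floor(sqrt(phi(r))) * lg n.
function findAksParameters(n) {
  var lg = Math.log(n) / Math.LN2;
  for (var r = 2; ; r++) {
    if (gcd(n, r) !== 1) {
      continue;
    }
    if (multiplicativeOrder(n, r) > lg * lg) {
      var M = Math.floor(Math.floor(Math.sqrt(eulerPhi(r))) * lg) + 1;
      return { r: r, M: M };
    }
  }
}
```

<p>For \(n = 31\) this gives \(r = 29\) and \(M = 25\), so the theorem would ask us to check the congruence for \(0 \le a \lt 25\) and to rule out prime factors below \(25\).</p>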
<p>And that's it for now! In the follow-up article we will strengthen
this theorem to further show that \(n\) is equal to \(p\), and
therefore prime. Then we will use this result to get a
primality-testing algorithm that we will prove to be polynomial
time.</p>
</section>
<section class="footnotes">
<p id="fn1">[1] We use uppercase letters for variables when we treat
polynomials as formal polynomials and lowercase letters when we
treat them as functions. <a href="#r1">↩</a></p>
<p id="fn2">[2] The term “introspection”, which comes
from the original AKS paper, was probably chosen to evoke the idea
that the exponent \(q\) can be pushed into and pulled out of \(g(X)\).
Here we generalize it a bit. <a href="#r2">↩</a></p>
<p id="fn3">[3] This condition is too weak to be useful by itself,
but we will parlay it into something we can use later.
<a href="#r3">↩</a></p>
<p id="fn4">[4] Using the ideas
on <a href="http://www.johndcook.com/TwelvefoldWay.pdf">this page</a>,
we can show that \(|P_d| = {M + d - 1 \choose d - 1} + 1\) by
considering each \(X + a\) a labeled urn (plus a
“dummy” urn) and each unit of power an unlabeled
ball. (This was used in the AKS paper.)
<a href="#r4">↩</a></p>
<p id="fn5">[5] This lower bound, as well as other ideas that simplify the
proof, was taken
from <a href="http://www.amazon.com/Prime-Numbers-A-Computational-Perspective/dp/0387252827">Prime
Numbers: A Computational Perspective</a>.
<a href="#r5">↩</a></p>
<p id="fn6">[6] You may first want to brush up on the definitions
of <a href="http://en.wikipedia.org/wiki/Group_(mathematics)">group</a>,
<a href="http://en.wikipedia.org/wiki/Ring_(mathematics)">ring</a>,
and <a href="http://en.wikipedia.org/wiki/Field_(mathematics)">field</a>,
and the differences between them.
<a href="#r6">↩</a></p>
<p id="fn7">[7] This is Theorem 1.47(iv) from
“<a href="http://www.amazon.com/Introduction-Finite-Fields-their-Applications/dp/0521460948">Introduction
to finite fields and their applications</a>”.
<a href="#r7">↩</a></p>
<p id="fn8">[8] The reducibility of \(\Phi_r(X)\) over
\(\mathbb{F}_p\) given \(r\) relatively prime to \(p\) is Theorem
2.47(ii) from
“<a href="http://www.amazon.com/Introduction-Finite-Fields-their-Applications/dp/0521460948">Introduction
to finite fields and their applications</a>”.
<a href="#r8">↩</a></p>
<p id="fn9">[9] It's a bit weird to talk about a polynomial being
the root of other polynomials, but recall that we can form a
polynomial ring over any field, even a field of polynomials. We
keep track of which polynomials belong to which domains by using
different variables.
<a href="#r9">↩</a></p>
<p id="fn10">[10] Here's where we force \(n\) to be a prime power.
<a href="#r10">↩</a></p>
</section>
https://www.akalin.com/intro-primality-testing
An Introduction to Primality Testing
2012-07-08T00:00:00-07:00
Fred Akalin
https://www.akalin.com/
<!-- TODO: Use \not\equiv instead of \neq when it is supported by KaTeX. -->
<script type="text/javascript"
src="https://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.0/knockout-min.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/simple-arith.js"></script>
<script type="text/javascript" src="https://cdn.rawgit.com/akalin/num.js/eab08d4/primality-testing.js"></script>
<p>I will explain two commonly-used primality tests: Fermat and
Miller-Rabin. Along the way, I will cover the basic concepts of
primality testing. I won't be assuming any background in number
theory, but familiarity
with <a href="http://en.wikipedia.org/wiki/Modular_arithmetic">modular
arithmetic</a> will be helpful. I will also be providing
implementations in Javascript,
so <a href="https://developer.mozilla.org/en/JavaScript">familiarity
with it</a> will also be helpful. Finally, since Javascript doesn't
natively support arbitrary-precision arithmetic, I wrote a simple
natural number class
(<a href="https://cdn.rawgit.com/akalin/num.js/eab08d4/simple-arith.js"><code>SNat</code></a>) that
represents a number as an array of decimal digits. All algorithms
used are the simplest possible, except when a more efficient one is
needed by the algorithms we discuss.</p>
<p>Primality testing is the problem of determining whether a given
natural number is prime or composite. Compared to the problem of
<a href="http://en.wikipedia.org/wiki/Integer_factorization">integer
factorization</a>, which is to determine the prime factors of a given
natural number, primality testing turns out to be easier; integer
factorization is
in <a href="http://en.wikipedia.org/wiki/NP_(complexity)">NP</a> and
thought to be
neither in <a href="http://en.wikipedia.org/wiki/P_(complexity)">P</a>
nor <a href="http://en.wikipedia.org/wiki/NP-complete">NP-complete</a>,
whereas primality testing
is <a href="http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf">now
known to be in P</a>.</p>
<p>Most primality tests are actually compositeness tests; they involve
finding <em>composite witnesses</em>, which are numbers that, along
with a given number to be tested, can be fed to some easily-computable
function to prove that the given number is composite. (The composite
witness, along with the function, is a <em>certificate of
compositeness</em> of the given number.) A primality test can either
check each possible witness or, like the Fermat and Miller-Rabin
tests, it can randomly sample some number of possible witnesses and
call the number prime if none turn out to be witnesses. In the latter
case, there is a chance that a composite number can erroneously be
called prime; ideally, this chance goes to zero quickly as the sample
size increases.</p>
<p>The simplest possible witness type is, of course, a factor of the
given number, which we'll call a <em>factor witness</em>. If the
number to be tested is \(n\) and the possible factor witness is \(a\),
then one can simply test whether \(a\) divides \(n\) (written as \(a
\mid n\)) by evaluating \(n \bmod a = 0\); that is, whether the
remainder of \(n\) divided by \(a\) is zero. This doesn't yield a
feasible deterministic primality test, though, since checking all
possible witnesses is equivalent to factoring the given number. Nor
does it yield a feasible probabilistic primality test, since in the
worst case the given number has very few factors, which random
sampling would miss.</p>
<p>The simplest useful witness type is a <em>Fermat witness</em>,
which relies on the following theorem of Fermat:</p>
<p class="theorem">
(<span class="theorem-name">Fermat's little theorem</span>.) If \(n\)
is prime and \(a\) is not a multiple of \(n\), then
\[
a^{n-1} \equiv 1 \pmod{n}\text{.}
\]
</p>
<p>Thus, a Fermat witness is a number \(1 \lt a \lt n\) such that
\(a^{n-1} \neq 1 \pmod{n}\). Conversely, if \(n\) is composite
and \(a^{n-1} \equiv 1 \pmod{n}\), then \(a\) is a <em>Fermat
liar</em>.</p>
<p class="interactive-example" id="fermatExample">
Let
<span class="fake-katex"><var>n</var> =
<input class="parameter" size="6" pattern="[0-9]*" required
type="text" value="355207"
data-bind="value: nStr, valueUpdate: 'afterkeydown'" /></span>
and
<span class="fake-katex"><var>a</var> =
<input class="parameter" size="6" pattern="[0-9]*" required
type="text" value="2"
data-bind="value: aStr, valueUpdate: 'afterkeydown'" /></span>.
<!-- ko template: outputTemplate --><!-- /ko -->
<script type="text/html" id="fermat.error.invalidN">
<span class="fake-katex"><var>n</var></span> is not a valid number.
</script>
<script type="text/html" id="fermat.error.invalidA">
<span class="fake-katex"><var>a</var></span> is not a valid number.
</script>
<script type="text/html" id="fermat.error.outOfBoundsN">
<span class="fake-katex"><var>n</var></span> must be greater than
<span class="fake-katex">2</span>.
</script>
<script type="text/html" id="fermat.error.outOfBoundsA">
<span class="fake-katex"><var>a</var></span> must be greater than
<span class="fake-katex">1</span> and less than
<span class="fake-katex"><var>n</var></span>.
</script>
<script type="text/html" id="fermat.success">
Then
<span class="fake-katex"><var>a</var><sup><var>n</var>−1</sup>
≡
<span class="intermediate" data-bind="text: r"></span>
<span data-bind="if: r() && r().ne(1)">≢ 1</span>
(mod <var>n</var>)</span> so therefore
<span class="fake-katex"><var>n</var></span> is
<span data-bind="if: isCompositeByFermat()">
<span class="result">composite</span>.
<span data-bind="if: r() && r().isZero()">
Furthermore,
<span class="fake-katex">gcd(<var>a</var>, <var>n</var>) =
<span class="intermediate" data-bind="text: k"></span></span>
is a non-trivial factor of
<span class="fake-katex"><var>n</var></span>.
</span>
</span>
<span data-bind="ifnot: isCompositeByFermat()">
either <span class="result">prime</span> or a
<span class="result">Fermat pseudoprime base
<span class="fake-katex"><var>a</var></span></span>.
</span>
</script>
</p>
<script type="text/javascript" src="/intro-primality-testing-files/fermat-example.js"></script>
<p>If \(n\) has at least one Fermat witness that is relatively prime to \(n\),
then we can show that at least half of all possible witnesses are
Fermat witnesses. (Roughly, if \(a\) is the Fermat witness and \(a_1,
a_2, \ldots, a_s\) are Fermat liars, then all \(a \cdot a_i\) are also
Fermat witnesses.) Therefore, for a sample of \(k\) possible
witnesses of \(n\), the probability of all of them being Fermat liars
is \(\le 2^{-k}\), which goes to zero quickly enough to be
practical.</p>
<p>However, there is the possibility that \(n\) is a composite number
with no relatively prime Fermat witnesses. These are
called <a href="http://en.wikipedia.org/wiki/Carmichael_numbers"><em>Carmichael
numbers</em></a>. Even though Carmichael numbers are rare, their
existence still makes the Fermat primality test unsuitable for some
situations, as when the numbers to be tested are provided by some
adversary.</p>
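<p>Both behaviors are easy to see by brute force on small numbers. The sketch below counts Fermat witnesses and liars among all \(1 \lt a \lt n\); it uses plain Javascript numbers instead of <code>SNat</code>, and the function names are my own, so treat it as an illustration only.</p>

```javascript
// Computes base^exp mod m by repeated squaring. Assumes m * m fits
// exactly in a Number.
function modPowSmall(base, exp, m) {
  var result = 1 % m;
  var b = base % m;
  while (exp > 0) {
    if (exp % 2 === 1) {
      result = (result * b) % m;
    }
    b = (b * b) % m;
    exp = Math.floor(exp / 2);
  }
  return result;
}

// Counts the Fermat witnesses and Fermat liars among 1 < a < n.
function countFermatWitnesses(n) {
  var witnesses = 0;
  var liars = 0;
  for (var a = 2; a < n; a++) {
    if (modPowSmall(a, n - 1, n) !== 1) {
      witnesses++;
    } else {
      liars++;
    }
  }
  return { witnesses: witnesses, liars: liars };
}
```

<p>For \(n = 15\), this finds \(10\) witnesses and \(3\) liars. For the Carmichael number \(561 = 3 \cdot 11 \cdot 17\), it finds \(240\) witnesses and \(319\) liars; the witnesses are exactly the \(a\) that share a factor with \(561\).</p>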
<p>Here is the Fermat compositeness test implemented in
Javascript:</p>
<pre class="code-container"><code class="language-javascript">// Runs the Fermat compositeness test given n > 2 and 1 < a < n.
// Calculates r = a^{n-1} mod n and whether a is a Fermat witness to n
// (i.e., r != 1, which means n is composite). Returns a dictionary
// with a, n, r, and isCompositeByFermat, which is true iff a is a
// Fermat witness to n.
function testCompositenessByFermat(n, a) {
  n = SNat.cast(n);
  a = SNat.cast(a);
  if (n.le(2)) {
    throw new RangeError('n must be > 2');
  }
  if (a.le(1) || a.ge(n)) {
    throw new RangeError('a must satisfy 1 < a < n');
  }
  var r = a.powMod(n.minus(1), n);
  var isCompositeByFermat = r.ne(1);
  return {
    a: a,
    n: n,
    r: r,
    isCompositeByFermat: isCompositeByFermat
  };
}</code></pre>
<p>Note that the algorithm depends on the efficiency
of <a href="http://en.wikipedia.org/wiki/Modular_exponentiation"><em>modular
exponentiation</em></a> when calculating \(a^{n-1} \pmod{n}\). The
naive method is unsuitable since it requires \(\Theta(n)\) \(b\)-bit
multiplications, where \(b = \lceil \lg n \rceil\). <code>SNat</code>
uses <a href="http://en.wikipedia.org/wiki/Repeated_squaring">repeated
squaring</a>, which requires only \(\Theta(\lg n)\) \(b\)-bit
multiplications.</p>
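<p>For reference, repeated squaring can be sketched in a few lines. This version is not the <code>SNat</code> implementation; it assumes an environment with native <code>BigInt</code> support, and the function name is my own.</p>

```javascript
// Computes base^exponent mod modulus by repeated squaring. All
// arguments are BigInts, with exponent >= 0n and modulus >= 1n.
function modPow(base, exponent, modulus) {
  var result = 1n % modulus;
  var b = base % modulus;
  var e = exponent;
  while (e > 0n) {
    // Multiply in the current power of b when the low bit of e is set.
    if (e % 2n === 1n) {
      result = (result * b) % modulus;
    }
    b = (b * b) % modulus;
    e = e / 2n; // BigInt division truncates.
  }
  return result;
}
```

<p>The loop runs once per bit of the exponent, and every intermediate value stays below the square of the modulus, which is where the \(\Theta(\lg n)\) count of \(b\)-bit multiplications comes from.</p>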
<p>Another useful witness type is a <em>non-trivial square root of
unity \(\mathop{\mathrm{mod}} n\)</em>; that is, a number \(a \neq \pm
1 \pmod{n}\) such that \(a^2 \equiv 1 \pmod{n}\). It is a theorem of
number theory that if \(n\) is prime, there are no non-trivial square
roots of unity \(\mathop{\mathrm{mod}} n\). Therefore, if we do find one,
that means \(n\) is composite. In fact, finding one leads directly to
factors of \(n\). By definition, a non-trivial square root of unity
\(a\) satisfies \(a \pm 1 \neq 0 \pmod{n}\) and \(a^2 - 1 \equiv 0
\pmod{n}\). Factoring the latter leads to \((a+1)(a-1) \equiv 0
\pmod{n}\), which means that \(n\) divides \((a+1)(a-1)\). But the
first condition says that \(n\) divides neither \(a+1\) nor \(a-1\),
so \(n\) must be a product of two numbers \(p \mid a+1\) and \(q \mid
a-1\). Then \(\gcd(a+1, n)\)<sup><a href="#fn1" id="r1">[1]</a></sup>
and \(\gcd(a-1, n)\) are factors of \(n\).</p>
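<p>As a small illustration of the argument above, the following sketch turns a non-trivial square root of unity into two factors. It uses plain Javascript numbers and a hypothetical function name, so it is only suitable for small \(n\).</p>

```javascript
// Returns the greatest common divisor of two non-negative integers.
function gcd(a, b) {
  while (b !== 0) {
    var t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Given a non-trivial square root of unity a mod n, returns the two
// factors gcd(a + 1, n) and gcd(a - 1, n). Throws if a is not one.
function factorsFromSquareRootOfUnity(a, n) {
  a = a % n;
  if ((a * a) % n !== 1 || a === 1 || a === n - 1) {
    throw new RangeError('a must be a non-trivial square root of unity mod n');
  }
  return [gcd(a + 1, n), gcd(a - 1, n)];
}
```

<p>For example, \(4^2 = 16 \equiv 1 \pmod{15}\), and <code>factorsFromSquareRootOfUnity(4, 15)</code> returns the factors \(5\) and \(3\).</p>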
<p>Finding non-trivial square roots of unity by itself doesn't give a
useful primality testing algorithm, but combining it with the Fermat
primality test does. \(a^{n-1} \bmod n\) either equals \(1\) or not.
If it doesn't, you're done since you have a Fermat witness. If it
does equal \(1\), and \(n-1\) is even, then consider the square root
of \(a^{n-1}\), i.e. \(a^{(n-1)/2}\). If it is not \(\pm 1\), then it
is a non-trivial square root of unity. If it is \(-1\), then you
can't do anything else. But if it is \(1\), and \((n-1)/2\) is even,
you can then take another square root and repeat the test, stopping
when the exponent of \(a\) becomes odd or when you get a result not
equal to \(1\).</p>
<p>To turn this into an algorithm, you simply start from the bottom
up: find the greatest odd factor of \(n-1\), call it \(t\), and keep
squaring \(a^t\) mod \(n\) until you find a non-trivial square root of
unity mod \(n\) or until you can deduce the value of \(a^{n-1}\). In fact, this
is almost as fast as the original Fermat primality test, since the
exponentiation by \(n-1\) has to do the same sort of squaring, and
we're just adding comparisons to \(\pm 1\) in between squarings.</p>
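<p>To make the bottom-up procedure concrete, here is a rough sketch that just lists the whole chain \(a^t, a^{2t}, \dotsc, a^{n-1}\) without stopping early. It uses the built-in <code>BigInt</code> type instead of the arbitrary-precision type used elsewhere in this article, and <code>powMod</code> and <code>squaringChain</code> are names invented for this illustration.</p>

```javascript
// Computes base^exp mod m by repeated squaring, using BigInt.
function powMod(base, exp, m) {
  var result = 1n;
  base = base % m;
  while (exp > 0n) {
    if (exp % 2n === 1n) {
      result = (result * base) % m;
    }
    base = (base * base) % m;
    exp = exp / 2n;
  }
  return result;
}

// Writes n-1 = t*2^s with t odd, then returns t, s, and the chain
// a^t, a^(2t), ..., a^(t*2^s) = a^(n-1), all reduced mod n.
function squaringChain(n, a) {
  var t = n - 1n;
  var s = 0;
  while (t % 2n === 0n) {
    t = t / 2n;
    s += 1;
  }
  var chain = [powMod(a, t, n)];
  for (var i = 0; i < s; ++i) {
    var prev = chain[chain.length - 1];
    chain.push((prev * prev) % n);
  }
  return { t: t, s: s, chain: chain };
}

// For n = 561 and a = 2, n-1 = 35 * 2^4 and the chain is
// 263, 166, 67, 1, 1. The entry just before the first 1 is 67,
// which is a non-trivial square root of unity mod 561.
var example = squaringChain(561n, 2n);
```

<p>A real test would stop as soon as the chain reaches \(\pm 1\) (or runs out of squarings), rather than computing every entry.</p>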
<p>The original idea for the test above is from Artjuhov, although it
is usually credited to Miller. Therefore, we call \(a\) an <em>Artjuhov witness<sup><a href="#fn2" id="r2">[2]</a></sup> of \(n\)</em> if it shows \(n\) composite by
the above test.</p>
<p class="interactive-example" id="artjuhovExample">
Let
<span class="fake-katex"><var>n</var> =
<input class="parameter" size="6" pattern="[0-9]*" required
type="text" value="561"
data-bind="value: nStr, valueUpdate: 'afterkeydown'" /></span>
and
<span class="fake-katex"><var>a</var> =
<input class="parameter" size="6" pattern="[0-9]*" required
type="text" value="2"
data-bind="value: aStr, valueUpdate: 'afterkeydown'" /></span>.
<!-- ko template: outputTemplate --><!-- /ko -->
<script type="text/html" id="artjuhov.error.invalidN">
<span class="fake-katex"><var>n</var></span> is not a valid number.
</script>
<script type="text/html" id="artjuhov.error.invalidA">
<span class="fake-katex"><var>a</var></span> is not a valid number.
</script>
<script type="text/html" id="artjuhov.error.outOfBoundsN">
<span class="fake-katex"><var>n</var></span> must be greater than
<span class="fake-katex">2</span>.
</script>
<script type="text/html" id="artjuhov.error.outOfBoundsA">
<span class="fake-katex"><var>a</var></span> must be greater than
<span class="fake-katex">1</span> and less than
<span class="fake-katex"><var>n</var></span>.
</script>
<script type="text/html" id="artjuhov.success.fermatEquivResult">
Then
<span class="fake-katex"><var>n</var></span>
is even, so this reduces to the Fermat primality test.
<span class="fake-katex"><var>a</var><sup><var>n</var>−1</sup>
≡
<span class="intermediate" data-bind="text: r"></span>
<span data-bind="if: r() && r().ne(1)">≢ 1</span>
(mod <var>n</var>)</span> so therefore
<span class="fake-katex"><var>n</var></span> is
<span data-bind="if: isCompositeByArtjuhov()">
<span class="result">composite</span>.
<span data-bind="html: factorsHtml"></span>
</span>
<span data-bind="ifnot: isCompositeByArtjuhov()">
an <span class="result">Artjuhov pseudoprime base
<span class="fake-katex"><var>a</var></span></span>.
</span>
</script>
<script type="text/html" id="artjuhov.success.impliesFinalEquivResult">
Then
<span class="fake-katex"><var>n</var> − 1 =
<span data-bind="html: nMinusOneHtml"></span></span>,
and
<span class="fake-katex"><var>r</var> ≡
<span data-bind="html: rHtml"></span> ≡
<span data-bind="html: rResultHtml"></span> (mod <var>n</var>)</span>,
so
<span class="fake-katex"><var>a</var><sup><var>n</var>−1</sup>
≡
<span data-bind="html: aNMinusOneHtml"></span> (mod <var>n</var>)</span>,
and therefore
<span class="fake-katex"><var>n</var></span> is
<span data-bind="if: isCompositeByArtjuhov()">
<span class="result">composite</span>.
<span data-bind="html: factorsHtml"></span>
</span>
<span data-bind="ifnot: isCompositeByArtjuhov()">
either <span class="result">prime</span> or an
<span class="result">Artjuhov pseudoprime base
<span class="fake-katex"><var>a</var></span></span>.
</span>
</script>
<script type="text/html" id="artjuhov.success.nonTrivialSqrtResult">
Then
<span class="fake-katex"><var>n</var> − 1 =
<span data-bind="html: nMinusOneHtml"></span></span>,
<span class="fake-katex"><var>r</var> ≡
<span data-bind="html: rHtml"></span>
≡ <span class="intermediate">1</span>
(mod <var>n</var>)</span>, and
<span class="fake-katex">√<var>r</var> ≡
<span data-bind="html: rSqrtHtml"></span>
≡ <span class="intermediate" data-bind="text: rSqrt"></span>
(mod <var>n</var>)</span>, which is a non-trivial square root
of unity <span class="fake-katex">mod <var>n</var></span>
and therefore <span class="fake-katex"><var>n</var></span>
is <span class="result">composite</span>.
<span data-bind="html: factorsHtml"></span>
</script>
</p>
<script type="text/javascript" src="/intro-primality-testing-files/artjuhov-example.js"></script>
<p>If \(n\) is an odd composite, then it can be shown (originally by
Rabin) that at least three quarters of all possible witnesses are
Artjuhov witnesses. Therefore, for a sample of \(k\) possible
witnesses of \(n\), the probability of all of them being Artjuhov
liars is \(\le 4^{-k}\), which is stronger than the bound for the
Fermat primality test. Furthermore, this bound is unconditional;
there is nothing like Carmichael numbers for the Artjuhov test.</p>
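<p>The bound can be checked by brute force for a small composite. The following sketch counts the Artjuhov liars of the Carmichael number 561 using plain JavaScript numbers, which is only safe while \(n^2 \lt 2^{53}\); the function names are invented for this example, and the liar condition is written in its usual compact form (\(a^t \equiv 1\), or \(a^{t 2^i} \equiv -1\) for some \(0 \le i \lt s\)).</p>

```javascript
// Modular exponentiation by repeated squaring. Plain numbers are fine
// here as long as m*m stays below 2^53.
function powMod(base, exp, m) {
  var result = 1;
  base = base % m;
  while (exp > 0) {
    if (exp % 2 === 1) {
      result = (result * base) % m;
    }
    base = (base * base) % m;
    exp = Math.floor(exp / 2);
  }
  return result;
}

// Returns true iff a fails to show the odd number n > 2 composite,
// i.e. a^t = 1 (mod n) or a^(t*2^i) = -1 (mod n) for some
// 0 <= i < s, where n-1 = t*2^s with t odd.
function isArtjuhovLiar(n, a) {
  var t = n - 1;
  var s = 0;
  while (t % 2 === 0) {
    t /= 2;
    s += 1;
  }
  var r = powMod(a, t, n);
  if (r === 1 || r === n - 1) {
    return true;
  }
  for (var i = 1; i < s; ++i) {
    r = (r * r) % n;
    if (r === n - 1) {
      return true;
    }
  }
  return false;
}

// Counts the liars among 1 <= a <= n-1.
function countArtjuhovLiars(n) {
  var count = 0;
  for (var a = 1; a < n; ++a) {
    if (isArtjuhovLiar(n, a)) {
      count += 1;
    }
  }
  return count;
}

var n = 561;
var liars = countArtjuhovLiars(n);
var liarFraction = liars / (n - 1);
```

<p>Even though 561 fools the Fermat test for every base coprime to it, <code>liarFraction</code> stays below one quarter here.</p>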
<p>Here is the Artjuhov compositeness test, implemented in
Javascript:</p>
<pre class="code-container"><code class="language-javascript">// Runs the Artjuhov compositeness test given n > 2 and 1 < a < n.
// Finds the largest s such that n-1 = t*2^s, calculates r = a^t mod
// n, then repeatedly squares r (mod n) up to s times until r is
// congruent to -1, 0, or 1 (mod n). Then, based on the value of s
// and the final value of r and i (the number of squarings),
// determines whether a is an Artjuhov witness to n (i.e., n is
// composite).
//
// Returns a dictionary with a, n, s, t, i, r, rSqrt = sqrt(r) if i >
// 0 and null otherwise, and isCompositeByArtjuhov, which is true iff
// a is an Artjuhov witness to n.
function testCompositenessByArtjuhov(n, a) {
n = SNat.cast(n);
a = SNat.cast(a);
if (n.le(2)) {
throw new RangeError('n must be > 2');
}
if (a.le(1) || a.ge(n)) {
throw new RangeError('a must satisfy 1 < a < n');
}
var nMinusOne = n.minus(1);
// Find the largest s and t such that n-1 = t*2^s.
var t = nMinusOne;
var s = new SNat(0);
while (t.isEven()) {
t = t.div(2);
s = s.plus(1);
}
// Find the smallest 0 <= i < s such that a^{t*2^i} = 0/-1/+1 (mod
// n).
var i = new SNat(0);
var rSqrt = null;
var r = a.powMod(t, n);
while (i.lt(s) && r.gt(1) && r.lt(nMinusOne)) {
i = i.plus(1);
rSqrt = r;
r = r.times(r).mod(n);
}
var isCompositeByArtjuhov = false;
if (s.isZero()) {
// If 0 = i = s, then this reduces to the Fermat primality test.
isCompositeByArtjuhov = r.ne(1);
} else if (i.isZero()) {
// If 0 = i < s, then:
//
// * r = 0 (mod n) -> a^{n-1} = 0 (mod n), and
// * r = +/-1 (mod n) -> a^{n-1} = 1 (mod n).
isCompositeByArtjuhov = r.isZero();
} else if (i.lt(s)) {
// If 0 < i < s, then:
//
// * r = 0 (mod n) -> a^{n-1} = 0 (mod n),
// * r = +1 (mod n) -> a^{t*2^{i-1}} is a non-trivial square root of
// unity mod n, and
// * r = -1 (mod n) -> a^{n-1} = 1 (mod n).
//
// Note that the last case means r = n - 1 > 1.
isCompositeByArtjuhov = r.le(1);
} else {
// If 0 < i = s, then:
//
// * r = 0 (mod n) can't happen,
// * r = +1 (mod n) -> a^{t*2^{i-1}} is a non-trivial square root of
// unity mod n, and
// * r > +1 (mod n) -> failure of the Fermat primality test.
isCompositeByArtjuhov = true;
}
return {
a: a,
n: n,
t: t,
s: s,
i: i,
r: r,
rSqrt: rSqrt,
isCompositeByArtjuhov: isCompositeByArtjuhov
};
}</code></pre>
<p>With the two compositeness tests above, we can now write a
probabilistic primality test:</p>
<pre class="code-container"><code class="language-javascript">// Returns true iff a is a Fermat witness to n, and thus n is
// composite. a and n must satisfy the same conditions as in
// testCompositenessByFermat.
function hasFermatWitness(n, a) {
return testCompositenessByFermat(n, a).isCompositeByFermat;
}
// Returns true iff a is an Artjuhov witness to n, and thus n is
// composite. a and n must satisfy the same conditions as in
// testCompositenessByArtjuhov.
function hasArtjuhovWitness(n, a) {
return testCompositenessByArtjuhov(n, a).isCompositeByArtjuhov;
}
// Returns true if n is probably prime, based on sampling the given
// number of possible witnesses and testing them against n. If false
// is returned, then n is definitely composite.
//
// By default, uses the Artjuhov test for witnesses with 20 samples
// and Math.random for the random number generator. This gives an
// error bound of 4^-20 if true is returned.
function isProbablePrime(n, hasWitness, numSamples, rng) {
n = SNat.cast(n);
hasWitness = hasWitness || hasArtjuhovWitness;
rng = rng || Math.random;
numSamples = numSamples || 20;
if (n.le(1)) {
return false;
}
if (n.le(3)) {
return true;
}
if (n.isEven()) {
return false;
}
for (var i = 0; i < numSamples; ++i) {
var a = SNat.random(2, n.minus(2), rng);
if (hasWitness(n, a)) {
return false;
}
}
return true;
}</code></pre>
<p><code>isProbablePrime</code> called
with <code>hasFermatWitness</code> is the <em>Fermat primality
test</em>, and <code>isProbablePrime</code> called
with <code>hasArtjuhovWitness</code> is the <em>Miller-Rabin primality
test</em>. The latter is the current general primality test of
choice, replacing
the <a href="http://en.wikipedia.org/wiki/Solovay-Strassen">Solovay-Strassen
primality test</a>.</p>
<p>We can also use <code>isProbablePrime</code> to randomly generate
probable primes, which is useful
for <a href="http://en.wikipedia.org/wiki/RSA_(algorithm)#Key_generation">cryptographic
applications</a>:</p>
<pre class="code-container"><code class="language-javascript">// Returns a probable b-bit prime that is at least 2^(b-1). All
// parameters but b are passed to isProbablePrime.
function findProbablePrime(b, hasWitness, rng, numSamples) {
b = SNat.cast(b);
var lb = (new SNat(2)).pow(b.minus(1));
var ub = lb.times(2);
while (true) {
var n = SNat.random(lb, ub);
if (isProbablePrime(n, hasWitness, numSamples, rng)) {
return n;
}
}
}</code></pre>
<p>In this case, for sufficiently large \(b\), the Fermat primality
test is acceptable, since Carmichael numbers are so rare and we're the
ones generating the possible primes to be tested.<sup><a href="#fn3" id="r3">[3]</a></sup></p>
<p>There are other primality tests, but they're less often used in
practice because they're
either <a href="http://en.wikipedia.org/wiki/Solovay%E2%80%93Strassen_primality_test">less
efficient</a> or <a href="http://www.pseudoprime.com/pseudo2.pdf">more
sophisticated</a> than the algorithms above, or they require \(n\) to
have <a href="http://en.wikipedia.org/wiki/Lucas_primality_test">special</a> <a href="http://en.wikipedia.org/wiki/Proth%27s_theorem">properties</a>.
Perhaps the most interesting of these tests is
the <a href="http://en.wikipedia.org/wiki/Aks_primality_test"><em>AKS
primality test</em></a>, which proved once and for all that primality
testing is in P.</p>
<section class="footnotes">
<p id="fn1">[1] \(\gcd\) is
the <a href="http://en.wikipedia.org/wiki/Greatest_common_divisor">greatest
common divisor</a> function.
<a href="#r1">↩</a></p>
<p id="fn2">[2] “Artjuhov witness” is an idiosyncratic
name on my part; a more common name is <em>strong witness</em>, which
I don't like.
<a href="#r2">↩</a></p>
<p id="fn3">[3]
<a href="http://en.wikipedia.org/wiki/Fermat_primality_test#Applications">According to Wikipedia</a>, PGP uses the Fermat primality test.
<a href="#r3">↩</a></p>
</section>
https://www.akalin.com/pair-counterexamples-vector-calculus
A Pair of Counterexamples in Vector Calculus
2011-11-27T00:00:00-08:00
<!-- TODO: use \operatorname for sgn instead of \mathrm when it is supported by KaTeX. -->
<!-- TODO: use align* instead of aligned when it is supported by KaTeX. -->
<p>While recently reviewing some topics in vector calculus, I became
curious as to why violating seemingly innocuous conditions for some
theorems leads to surprisingly wild results. In fact, I was struck by
how these theorems resemble computer programs, not in some
<a href="http://en.wikipedia.org/wiki/Curry-Howard_Correspondence">abstract
way</a>, but in how the lack of “input validation” leads
to
<a href="http://en.wikipedia.org/wiki/Undefined_behavior">non-obvious
behavior</a> in the face of erroneous input.</p>
<p>I found that understanding why these counterexamples lead to wild
results deepened my understanding of the theorems involved and their
proofs.<sup><a href="#fn1" id="r1">[1]</a></sup> Besides,
pathological examples are more interesting than well-behaved ones!</p>
<p>First, let's look at a “counterexample”
to <a href="http://en.wikipedia.org/wiki/Green%27s_theorem">Green's
theorem</a>:</p>
<p class="example">1. Two functions \(L, M \colon \mathbb{R}^2 \to \mathbb{R}\) and
a positively-oriented, piecewise smooth, simple closed curve \(C\)
in \(\mathbb{R}^2\) enclosing the region \(D\) such that
\[
\oint_C L \,dx + M \,dy \ne
\iint_D \left( \frac{\partial{M}}{\partial{x}} - \frac{\partial{L}}{\partial{y}} \right) \,dx \,dy \text{.}
\]</p>
<p>Let
\[
\begin{aligned}
L &= -\frac{y}{x^2+y^2} \text{, } M = \frac{x}{x^2+y^2} \text{,}
\end{aligned}
\]
and \(C\) be a curve going counter-clockwise around the rectangle \(D = [-1,
1]^2\).<sup><a href="#fn2" id="r2">[2]</a></sup> Then the integral of \(L \,dx + M \, dy\) around \(C\) is \(2
\pi\) since it encloses the origin. But
\[
\frac{\partial{M}}{\partial{x}} = \frac{\partial{L}}{\partial{y}} = \frac{y^2-x^2}{(x^2+y^2)^2}
\]
so the difference of the two vanishes everywhere but the origin, where
neither function is defined. Therefore, the (improper) integral over
\(D\) also vanishes, proving the inequality. ∎</p>
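<p>The value of the line integral is easy to confirm numerically. The sketch below approximates it with the midpoint rule along the four sides of \([-1, 1]^2\), traversed counter-clockwise (the positive orientation in the statement of the example); the function names and the step count are arbitrary choices for this illustration.</p>

```javascript
// The vector field (L, M) from the example.
function L(x, y) { return -y / (x * x + y * y); }
function M(x, y) { return x / (x * x + y * y); }

// Approximates the integral of L dx + M dy along the straight segment
// from (x0, y0) to (x1, y1), using the midpoint rule with the given
// number of steps.
function segmentIntegral(x0, y0, x1, y1, steps) {
  var dx = (x1 - x0) / steps;
  var dy = (y1 - y0) / steps;
  var sum = 0;
  for (var i = 0; i < steps; ++i) {
    var x = x0 + (i + 0.5) * dx;
    var y = y0 + (i + 0.5) * dy;
    sum += L(x, y) * dx + M(x, y) * dy;
  }
  return sum;
}

// Goes counter-clockwise around the boundary of [-1, 1]^2:
// bottom, right, top, then left side.
function boundaryIntegral(steps) {
  return segmentIntegral(-1, -1, 1, -1, steps) +
    segmentIntegral(1, -1, 1, 1, steps) +
    segmentIntegral(1, 1, -1, 1, steps) +
    segmentIntegral(-1, 1, -1, -1, steps);
}

var lineIntegral = boundaryIntegral(10000);
var expected = 2 * Math.PI;
```

<p>Each side contributes about \(\pi/2\), so <code>lineIntegral</code> agrees with <code>expected</code> up to the discretization error.</p>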
<p>Of course, the easy explanation is that the discontinuity of \(L\)
and \(M\) at the origin violates a condition of Green's theorem. But
that doesn't really tell us anything, so let's break down the theorem
and see where exactly it fails.</p>
<p>Green's theorem is usually proved first for rectangles \([a, b]
\times [c, d]\), which suffices for our purpose. If \(C\) is a curve
that goes counter-clockwise around such a rectangle \(D\), then we can
easily show that
\[
\oint_C L \,dx = - \iint_D \frac{\partial{L}}{\partial{y}} \,dx \,dy
\]
and
\[
\oint_C M \,dy = \iint_D \frac{\partial{M}}{\partial{x}} \,dx \,dy \text{,}
\]
with the sum of these two formulas proving the theorem.</p>
<p>So the first sign of trouble is that the theorem freely
interchanges addition and integration. Since the partial derivatives
of our functions diverge at the origin, if \(D\) contains the origin
then the integrals of those partial derivatives over \(D\) may not
even be defined, even if the integral of their difference is.</p>
<p>But the problem arises even before that. The statements above are
proved by showing
\[
\oint_C L \,dx = - \int_a^b \left( \int_c^d \frac{\partial{L}}{\partial{y}} \,dy \right) \,dx
\]
and
\[
\oint_C M \,dy = \int_c^d \left( \int_a^b \frac{\partial{M}}{\partial{x}} \,dx \right) \,dy
\text{,}
\]
both of which hold for our example. But notice that in one case we
integrate with respect to \(y\) first, and in the other case we
integrate with respect to \(x\) first. Therefore, we have to
interchange the order of integration or convert to a double integral
in order to get them to a form where we can add them. And there's the
rub: if \(D\) contains the origin, switching the order of integration
for either integral above switches the sign of the result!</p>
<p>This fully explains the discrepancy; since the result of both
integrals above (with the iteration order preserved) is \(\pi\),
adding them together as-is gives the expected result of \(2 \pi\).
But if we switch the iteration order of one of the iterated integrals
as done in the proof of Green's theorem, then we switch the result of
that integral to \(-\pi\), which cancels with the result of the other
unchanged integral to produce \(0\).</p>
<p>So now let's examine this strange behavior of the sign of an
integration's result depending on the iteration order. This leads us
to our next “counterexample,” this time
for <a href="http://en.wikipedia.org/wiki/Fubini%27s_theorem">Fubini's
theorem</a>:</p>
<p class="example">2. A function \(f \colon \mathbb{R}^2 \to \mathbb{R}\) whose
iterated integrals over a rectangle \(D = [a, b] \times [c, d]
\subset \mathbb{R}^2\) differ.</p>
<p>Let
\[
f(x, y) = \frac{y^2-x^2}{(x^2+y^2)^2}
\quad \text{ and } \quad
D = [-1, 1]^2\text{.}
\]
The two iterated integrals of \(f\) over \(D\) are usually written as
\[
\int_{-1}^1 \left( \int_{-1}^1 f(x, y) \,dy \right) \,dx
\qquad \text{ and } \qquad
\int_{-1}^1 \left( \int_{-1}^1 f(x, y) \,dx \right) \,dy
\]
but let's define them more carefully to make it easier to justify our
calculations.</p>
<p>Let
\[
\begin{aligned}
u_k &= y \mapsto f(k, y) \\
v_l &= x \mapsto f(x, l) \text{.}
\end{aligned}
\]
In other words, given the real constants \(k\) and \(l\), construct
the (possibly partial) real functions \(u_k(y)\) and \(v_l(x)\) by
partially-evaluating \(f\) at \(x = k\) and \(y = l\),
respectively.</p>
<p>Then, if we also let<sup><a href="#fn3" id="r3">[3]</a></sup>
\[
U(x) = \int_{-1}^1 u_x(y) \,dy
\qquad \text{ and } \qquad
V(y) = \int_{-1}^1 v_y(x) \,dx \text{,}
\]
we can write the iterated integrals as
\[
\int_{-1}^1 U(x) \,dx
\qquad \text{ and } \qquad
\int_{-1}^1 V(y) \,dy \text{.}
\]
</p>
<p>Computing \(U(x)\) for \(x \neq 0\), we get<sup><a href="#fn4" id="r4">[4]</a></sup>
\[
\begin{aligned}
U(x) &= \int_{-1}^1 \frac{\partial{}}{\partial{y}} \left( -\frac{y}{x^2+y^2} \right) \,dy \\
&= \left. -\frac{y}{x^2+y^2} \right|_{y=-1}^{y=1} \\
&= -\frac{2}{x^2+1} \text{.}
\end{aligned}
\]
</p>
<p>Attempting to evaluate \(U(0)\), we see that
\[
\begin{aligned}
U(0) &= \int_{-1}^1 \frac{y^2-0^2}{(0^2+y^2)^2} \,dy \\
&= \int_{-1}^1 \frac{dy}{y^2}
\end{aligned}
\]
which diverges. So
\[
U(x) = -\frac{2}{x^2+1} \text{ for } x \ne 0 \text{.}
\]
</p>
<p>
By a similar computation, we find that<sup><a href="#fn5" id="r5">[5]</a></sup>
\[
V(y) = \frac{2}{y^2+1} \text{ for } y \ne 0 \text{.}
\]
</p>
<p>Since \(U(x)\) isn't defined at \(0\), we have to treat it as an
improper integral, although doing so poses no real difficulty:
\[
\begin{aligned}
\int_{-1}^1 U(x)\,dx
&= \lim_{a \nearrow 0} \left( \int_{-1}^a -\frac{2}{x^2+1} \,dx \right) +
\lim_{a \searrow 0} \left( \int_{a}^1 -\frac{2}{x^2+1} \,dx \right) \\
&= \lim_{a \nearrow 0}
\Bigl( \left. -2 \arctan(x) \right|_{-1}^{a} \Bigr) +
\lim_{a \searrow 0}
\Bigl( \left. -2 \arctan(x) \right|_{a}^{1} \Bigr) \\
&= \left. -2 \arctan(x) \right|_{-1}^{0} +
\left. -2 \arctan(x) \right|_{0}^{1} \\
&= \left. -2 \arctan(x) \right|_{-1}^{1} \\
&= -\pi \text{.}
\end{aligned}
\]
</p>
<p>Similarly,
\[
\int_{-1}^1 V(y)\,dy = \pi \text{,}
\]
so the iterated integrals of \(f(x, y)\) over \([-1, 1]^2\) differ; in
fact, as we claimed above, switching the iteration order switches the
sign of the result. ∎</p>
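<p>Here is a rough numerical cross-check of the two results. It evaluates the inner integrals with the same antiderivatives used in the text and approximates the outer integrals with the midpoint rule, choosing an even number of steps so that the sample points never land on \(0\). The helper names are invented for this sketch.</p>

```javascript
// U(x): the inner integral over y, from the antiderivative
// -y / (x^2 + y^2) evaluated between y = -1 and y = 1 (x != 0).
function U(x) {
  var antiderivative = function(y) { return -y / (x * x + y * y); };
  return antiderivative(1) - antiderivative(-1);
}

// V(y): the inner integral over x, from the antiderivative
// x / (x^2 + y^2) evaluated between x = -1 and x = 1 (y != 0).
function V(y) {
  var antiderivative = function(x) { return x / (x * x + y * y); };
  return antiderivative(1) - antiderivative(-1);
}

// Midpoint-rule approximation of the integral of g over [a, b].
function midpointRule(g, a, b, steps) {
  var h = (b - a) / steps;
  var sum = 0;
  for (var i = 0; i < steps; ++i) {
    sum += g(a + (i + 0.5) * h);
  }
  return sum * h;
}

// With 10000 steps on [-1, 1], no midpoint is exactly 0.
var integralOfU = midpointRule(U, -1, 1, 10000);
var integralOfV = midpointRule(V, -1, 1, 10000);
```

<p>The two outer integrals come out close to \(-\pi\) and \(\pi\) respectively, matching the sign flip derived above.</p>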
<p>We can repeat the above calculations for an arbitrary rectangle to
see that the iterated integrals of \(f(x, y)\) differ if \(D\)
contains the origin either as an interior point or a corner. But
there's an easier way to prove that statement and also gain some
insight as to why \(f(x, y)\) has this strange property.</p>
<p>Note that the key facts in the above calculations were that \(U(x)
\lt 0\) for any \(x \ne 0\) and \(V(y) \gt 0\) for any \(y \ne 0\).
Therefore, integrating \(U(x)\) over any interval on the \(x\)-axis
would produce a negative result and integrating \(V(y)\) over any
interval on the \(y\)-axis would produce a positive result, leading to
the difference in iterated integrals. This holds more generally; for
any \(m, n \gt 0\):
\[
\int_{-n}^n f(x, y) \,dy \lt 0
\qquad \text{ and } \qquad
\int_{-m}^m f(x, y) \,dx \gt 0 \text{.}
\]
Therefore,
\[
\int_{-m}^m \left( \int_{-n}^n f(x, y) \,dy \right) \,dx \lt 0
\qquad \text{ and } \qquad
\int_{-n}^n \left( \int_{-m}^m f(x, y) \,dx \right) \,dy \gt 0
\]
so the iterated integrals of \(f(x, y)\) differ over the rectangles
\([-m, m] \times [-n, n]\). Since any rectangle \(D\) containing the
origin as an interior point must contain some smaller rectangle \(E =
[-m, m] \times [-n, n]\), the iterated integrals of \(f(x, y)\) over
\(E\) differ and therefore must also differ over \(D\).</p>
<p>Furthermore, since \(f(x, y)\) is even in both \(x\) and \(y\), you
can carry out a similar argument to the above with intervals of the
form \([0, m]\) or \([-m, 0]\) to show that the iterated integrals of
\(f(x, y)\) must also differ over any rectangle with the origin as a
corner.
</p>
<p>So the essential property of \(f(x, y)\) is that slicing it along
the \(x\)-axis gives a function which has positive area under the
curve on any interval symmetric around \(0\) or with \(0\) as an
endpoint, and that slicing it similarly along the \(y\)-axis gives a
function which has negative area. Therefore, on a rectangle symmetric
around the origin or with the origin as a corner, one can choose the
sign of the iterated integral by choosing which axis to slice
first.</p>
<p>The next thing to investigate is how exactly the iterated integrals
of \(f(x, y)\) over the rectangle \(D\) are expressed such that they
differ only when \(D\) contains the origin, especially considering
that \(f(x, y)\) is expressed in quite a simple form. To do that,
let's consider the simple case of a rectangle \(D = [\delta, 1] \times
[\epsilon, 1]\) where we can vary \(\delta\) and \(\epsilon\) at
will.</p>
<p>Let
\[
\begin{aligned}
I_{yx}(\delta, \epsilon) &=
\int_{\delta}^1 \left( \int_{\epsilon}^1 f(x, y) \,dy \right) \,dx \\
I_{xy}(\delta, \epsilon) &=
\int_{\epsilon}^1 \left( \int_{\delta}^1 f(x, y) \,dx \right) \,dy
\text{.}
\end{aligned}
\]
Then, for \(\epsilon \neq 0\):
\[
\begin{aligned}
I_{yx}(\delta, \epsilon) &=
\int_{\delta}^1 \left( \int_{\epsilon}^1
\frac{y^2-x^2}{(x^2+y^2)^2} \,dy \right) \,dx \\
&= \int_{\delta}^1 \left(
\left. -\frac{y}{x^2+y^2} \right|_{y=\epsilon}^{y=1} \right) \,dx \\
&= \int_{\delta}^1 \Biggl(
-\frac{1}{1+x^2} -
\left( -\frac{\epsilon}{\epsilon^2+x^2} \right) \Biggr) \,dx \\
&= \int_{\delta}^1 \frac{dx/\epsilon}{1+(x/\epsilon)^2} -
\int_{\delta}^1 \frac{dx}{1+x^2} \\
&= \arctan\left(\frac{1}{\epsilon}\right) -
\arctan\left(\frac{\delta}{\epsilon}\right) -
\frac{\pi}{4} + \arctan(\delta) \text{,}
\end{aligned}
\]
and for \(\epsilon = 0\):
\[
I_{yx}(\delta, 0) = -\frac{\pi}{4} + \arctan(\delta) \text{.}
\]
Similarly, for \(\delta \neq 0\):
\[
\begin{aligned}
I_{xy}(\delta, \epsilon) &=
\int_{\epsilon}^1 \left( \int_{\delta}^1
\frac{y^2-x^2}{(x^2+y^2)^2} \,dx \right) \,dy \\
&= \int_{\epsilon}^1 \left(
\left. \frac{x}{x^2+y^2} \right|_{x=\delta}^{x=1} \right) \,dy \\
&= \int_{\epsilon}^1 \left(
\frac{1}{1+y^2} - \frac{\delta}{\delta^2+y^2} \right) \,dy \\
&= \int_{\epsilon}^1 \frac{dy}{1+y^2} -
\int_{\epsilon}^1 \frac{dy/\delta}{1+(y/\delta)^2} \\
&= \frac{\pi}{4} - \arctan(\epsilon) -
\arctan\left(\frac{1}{\delta}\right) +
\arctan\left(\frac{\epsilon}{\delta}\right) \text{,}
\end{aligned}
\]
and for \(\delta = 0\):
\[
I_{xy}(0, \epsilon) = \frac{\pi}{4} - \arctan(\epsilon) \text{.}
\]
Then let \(\Delta = I_{xy} - I_{yx}\) be the difference between the
two iterated integrals. We can use the identity
\[
\arctan(x) + \arctan\left(\frac{1}{x}\right) = \frac{\pi}{2} \mathrm{sgn}(x)
\]
to simplify \(\Delta(\delta, \epsilon)\) if neither \(\delta\) nor
\(\epsilon\) is zero:
\[
\begin{aligned}
\Delta(\delta, \epsilon)
&= \bigl( \pi/4 - \arctan(\epsilon) - \arctan(1/\delta)
+ \arctan(\epsilon/\delta) \bigr) \\
& \quad \mathbin{-}
\bigl( \arctan(1/\epsilon) - \arctan(\delta/\epsilon)
- \pi/4 + \arctan(\delta) \bigr) \\
&= \pi/2 - \bigl( \arctan(\epsilon) + \arctan(1/\epsilon) \bigr) \\
& \quad \mathbin{-} \bigl( \arctan(\delta) + \arctan(1/\delta) \bigr) \\
& \quad \mathbin{+}
\bigl( \arctan(\delta/\epsilon) + \arctan(\epsilon/\delta) \bigr) \\
&= \frac{\pi}{2} \bigl( 1 - \mathrm{sgn}(\epsilon) - \mathrm{sgn}(\delta)
+ \mathrm{sgn}(\delta/\epsilon) \bigr) \text{.}
\end{aligned}
\]
</p>
<p>
Using the properties of \(\mathrm{sgn}(x)\), we can simplify this to the final
expression:
\[
\Delta(\delta, \epsilon) =
\frac{\pi}{2}
\bigl( 1 - \mathrm{sgn}(\delta) \bigr) \bigl( 1 - \mathrm{sgn}(\epsilon) \bigr)
\]
which we can prove still holds if either \(\delta\) or \(\epsilon\) is
zero (or both).</p>
<p>So with the simplified expression for \(\Delta(\delta, \epsilon)\),
it becomes apparent how \(\mathrm{sgn}(x)\) is used to control the value of
\(\Delta(\delta, \epsilon)\); as long as either \(\delta\) or
\(\epsilon\) is positive, \(1 - \mathrm{sgn}(x)\) zeroes out the entire
expression.</p>
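<p>The simplification can be sanity-checked numerically. This sketch evaluates the arctangent expressions for \(I_{yx}\) and \(I_{xy}\) derived above (valid when neither \(\delta\) nor \(\epsilon\) is zero) and compares their difference with the closed form; the function names are made up for this example.</p>

```javascript
// I_yx(delta, epsilon) for epsilon != 0, from the derivation above.
function iYX(delta, epsilon) {
  return Math.atan(1 / epsilon) - Math.atan(delta / epsilon) -
    Math.PI / 4 + Math.atan(delta);
}

// I_xy(delta, epsilon) for delta != 0, from the derivation above.
function iXY(delta, epsilon) {
  return Math.PI / 4 - Math.atan(epsilon) -
    Math.atan(1 / delta) + Math.atan(epsilon / delta);
}

// The simplified expression (pi/2) (1 - sgn(delta)) (1 - sgn(epsilon)).
function deltaClosedForm(delta, epsilon) {
  return (Math.PI / 2) *
    (1 - Math.sign(delta)) * (1 - Math.sign(epsilon));
}

// Largest discrepancy between the two forms over a few sample points,
// covering all four sign combinations.
var samples = [[0.5, 0.25], [-0.5, 0.25], [0.5, -0.25], [-0.5, -0.25]];
var maxError = 0;
samples.forEach(function(p) {
  var direct = iXY(p[0], p[1]) - iYX(p[0], p[1]);
  var error = Math.abs(direct - deltaClosedForm(p[0], p[1]));
  maxError = Math.max(maxError, error);
});
```

<p>Only the sample with both \(\delta\) and \(\epsilon\) negative gives a non-zero difference, namely \(2\pi\).</p>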
<section class="footnotes">
<p id="fn1">[1] There are
actually <a href="http://amzn.com/048668735X">whole</a>
<a href="http://amzn.com/0486428753">books</a> dedicated to
counterexamples. They make good bathroom reading material.
<a href="#r1">↩</a></p>
<p id="fn2">[2] The vector field \((L, M)\) also serves as the
canonical “counterexample” to
the <a href="http://en.wikipedia.org/wiki/Gradient_theorem">gradient
theorem</a>. <a href="#r2">↩</a></p>
<p id="fn3">[3] \(U(x)\) and \(V(y)\) are also (partial) real
functions. <a href="#r3">↩</a></p>
<p id="fn4">[4] We're justified in applying standard integration
techniques here since \(u_k(y)\) for \(k \ne 0\) is defined and
bounded for all \(y\). <a href="#r4">↩</a></p>
<p id="fn5">[5] Note that \(U(x)\) and \(V(y)\) differ only in
variable name and sign. <a href="#r5">↩</a></p>
</section>
https://www.akalin.com/evlis-tail-recursion
Understanding Evlis Tail Recursion
2011-10-28T00:00:00-07:00
<p>While reading
about <a href="http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-6.html#%_sec_3.5">proper
tail recursion</a> in Scheme, I encountered a similar but obscure
optimization called <em>evlis tail recursion</em>.
In <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.8567&rep=rep1&type=pdf">the
paper where it was first described</a>, the author claims it
"dramatically improve the space performance of many programs," which
sounded promising.</p>
<p>However, the few places where it's mentioned don't do much more than
state its definition and claim its usefulness. Hopefully I can
provide a more detailed analysis here.</p>
<p>Consider the straightforward factorial implementation in
Scheme:<sup><a href="#fn1" id="r1">[1]</a></sup></p>
<pre class="code-container"><code class="language-lisp">(define (fact n) (if (<= n 1) 1 (* n (fact (- n 1)))))</code></pre>
<p>It is not tail-recursive, since the recursive call is nested in
another procedure call. However, it's <em>almost</em> tail-recursive;
the call to <code>*</code> is a tail call, and the recursive call is
its last subexpression, so it will be the last subexpression to be
evaluated.</p>
<p>Recall what happens when a procedure call (represented as a list of
subexpressions) is evaluated: each subexpression is evaluated, and the
first result (the procedure) is passed the other results as
arguments.<sup><a href="#fn2" id="r2">[2]</a></sup></p>
<p>Evlis tail recursion can be described as follows: when performing a
procedure call and during the evaluation of the last subexpression,
the calling environment is discarded as soon as it is not
required.<sup><a href="#fn3" id="r3">[3]</a></sup> The distinction
between evlis tail recursion and proper tail recursion is subtle.
Proper tail recursion requires only that the calling environment be
discarded before the actual procedure call; evlis tail recursion
discards the calling environment even sooner, if possible.</p>
<p>An example will help to clarify things. Given <code>fact</code> as
defined above, say you evaluate <code>(fact 10)</code> and you're in
the procedure call with <code>n = 5</code>. The call stack of a
properly tail-recursive interpreter would look like this:</p>
<pre>
evalExpr
--------
env = { n: 10 } -> <top-level environment>
expr = '(* n (fact (- n 1)))'
proc = <native function: *>
args = [10, <pending evalExpr('(fact (- n 1))', env)>]
evalExpr
--------
env = { n: 9 } -> <top-level environment>
expr = '(* n (fact (- n 1)))'
proc = <native function: *>
args = [9, <pending evalExpr('(fact (- n 1))', env)>]
...
evalExpr
--------
env = { n: 6 } -> <top-level environment>
expr = '(* n (fact (- n 1)))'
proc = <native function: *>
args = [6, <pending evalExpr('(fact (- n 1))', env)>]
evalExpr
--------
env = { n: 5 } -> <top-level environment>
expr = '(if ...)'
</pre>
<p>whereas the call stack of an evlis tail-recursive interpreter would
look like this:</p>
<pre>
evalExpr
--------
env = { n: 5 } -> <top-level environment>
pendingProcedureCalls = [
[<native function: *>, 10],
[<native function: *>, 9],
...
[<native function: *>, 6]
]
expr = (if ...)
</pre>
<p>In this implementation, the last subexpression of a procedure call
is evaluated exactly like a tail expression, but the procedure call
and non-last subexpressions are pushed onto a stack. Whenever an
expression is reduced to a simple one and the stack is non-empty, a
pending procedure call with its other args is popped off, and it is
called with the reduced expression as the final argument.</p>
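<p>Stripped of the interpreter machinery, the bookkeeping for <code>fact</code> looks roughly like the following sketch. It is hand-specialized to factorial purely for illustration (a real interpreter would do this generically), and <code>factWithPendingCalls</code> is a hypothetical name.</p>

```javascript
// Evaluates (fact n) the way the evlis tail-recursive scheme described
// above would: instead of nesting one stack frame (and environment)
// per recursive call, it keeps only a stack of pending
// [procedure, non-final argument] pairs.
function factWithPendingCalls(n) {
  var multiply = function(x, y) { return x * y; };
  var pendingProcedureCalls = [];

  // Descend: each (* n (fact (- n 1))) pushes [*, n] and continues
  // with the last subexpression. The old binding of n is overwritten,
  // which models discarding the calling environment.
  while (n > 1) {
    pendingProcedureCalls.push([multiply, n]);
    n = n - 1;
  }

  // Base case: (fact 1) reduces to the simple expression 1.
  var value = 1;

  // Unwind: pop each pending call and supply the reduced value as its
  // final argument.
  while (pendingProcedureCalls.length > 0) {
    var call = pendingProcedureCalls.pop();
    value = call[0](call[1], value);
  }
  return value;
}

var factOfTen = factWithPendingCalls(10);
```

<p>The stack still grows linearly with <code>n</code>, but each entry is just a procedure and a number rather than a full frame.</p>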
<p>Note that this didn't change the asymptotic behavior of the
procedure; it still takes \(\Theta(n)\) memory to evaluate. However,
only the bare minimum of information is saved: the list of pending
functions and their arguments. Other auxiliary variables, and
crucially the nested calling environments, aren't preserved, leading
to a significant constant-factor reduction in memory.</p>
<p>This raises the question: Are there cases where evlis tail
recursion leads to better asymptotic behavior? In fact, yes; consider
the following (contrived) implementation of
factorial<sup><a href="#fn4" id="r4">[4]</a></sup>:</p>
<pre class="code-container"><code class="language-lisp">(define (fact2 n)
(define v (make-vector n))
(if (<= n 1) 1 (* n (fact2 (- n 1)))))</code></pre>
<p>Before the main body of the function, a vector of size \(n\) is
defined. This means that the environments in the call stack of a
properly tail-recursive interpreter would look like this:<sup><a href="#fn5" id="r5">[5]</a></sup></p>
<pre>
env = { n: 10, v: <vector of size 10> } -> <top-level environment>
env = { n: 9, v: <vector of size 9> } -> <top-level environment>
env = { n: 8, v: <vector of size 8> } -> <top-level environment>
env = { n: 7, v: <vector of size 7> } -> <top-level environment>
...
</pre>
<p>whereas an evlis tail-recursive interpreter would keep around
only the current environment. Therefore, the properly tail-recursive
interpreter would require \(\Theta(n^2)\) memory to
evaluate <code>(fact2 n)</code> while the evlis tail-recursive
interpreter would require only \(\Theta(n)\).</p>
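<p>A back-of-the-envelope model makes the difference visible. The sketch below does not run any Scheme; it simply counts, under the simplifying assumptions that each vector cell and each pending call costs one unit and that the recursion stops at 1 as a plain factorial would, the peak number of units alive during <code>(fact2 n)</code>. The function name and the cost model are invented for this illustration.</p>

```javascript
// Returns the peak number of memory units in use while evaluating
// (fact2 n), counting one unit per live vector cell and one per
// pending multiplication. If keepCallerEnvironments is true, every
// caller's vector stays alive (proper tail recursion only); if false,
// only the current call's vector does (evlis tail recursion).
function peakMemory(n, keepCallerEnvironments) {
  var peak = 0;
  var liveVectorCells = 0;
  var pendingCalls = 0;
  for (var k = n; k >= 1; --k) {
    if (keepCallerEnvironments) {
      liveVectorCells += k;
    } else {
      liveVectorCells = k;
    }
    peak = Math.max(peak, liveVectorCells + pendingCalls);
    // Recursing into (fact2 (- k 1)) leaves one multiplication pending.
    pendingCalls += 1;
  }
  return peak;
}

var properPeak = peakMemory(10, true);
var evlisPeak = peakMemory(10, false);
```

<p>In this model the first figure grows like \(n(n+1)/2\) while the second stays at \(n\).</p>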
<p>Studying examples like the one above enabled me to finally
understand how evlis tail recursion worked and what sort of savings it
gives. However, I have yet to find a practical example where evlis
tail recursion gives the same sort of asymptotic gains as described
above, and I'd be interested to receive some. But perhaps the "large
gains" mentioned in the various papers describing it are only
constant-factor reductions in memory.</p>
<p>In any case, another important difference in Scheme between proper
tail recursion and evlis tail recursion is that the former is
a <em>language feature</em> and the latter is
an <em>optimization</em>. That means that it is acceptable and even
encouraged to write Scheme programs that take advantage of proper tail
recursion, but it would be unwise to rely on evlis tail recursion for
the asymptotic performance of your function. Instead, one should
treat it just as a nice constant-factor memory saving.</p>
<p>Note that it is easy to make evlis tail recursion "smarter." Since
Scheme doesn't specify the order of argument evaluation, an
interpreter could evaluate arguments to maximize the gains from evlis
tail recursion. As an easy example, if we had switched the arguments
to <code>*</code> in <code>fact</code> above, making it
non-evlis-tail-recursive, a smart compiler could still treat it as
such. A possible rule of thumb would be to pick a non-trivial
function call to evaluate last.</p>
<p>To complete the picture, I will outline below the evaluation
function for a simple evlis tail-recursive Scheme interpreter in
Javascript. All of the sources I've found describe it in terms of
compilers, so I think it'll be useful to have a reference
implementation for an interpreter.</p>
<p>Let's say we already have a properly tail-recursive
interpreter:<sup><a href="#fn6" id="r6">[6]</a></sup></p>
<pre class="code-container"><code class="lang-javascript">function evalExpr(expr, env) {
  // Fake tail calls with a while loop and continue.
  while (true) {
    // Symbols, constants, quoted expressions, and lambdas.
    if (isSimpleExpr(expr)) {
      // The only exit point.
      return evalSimpleExpr(expr, env);
    }
    // (if test conseq alt)
    if (isSpecialForm(expr, 'if')) {
      expr = evalExpr(expr[1], env) ? expr[2] : expr[3];
      continue;
    }
    // (set! var expr)
    if (isSpecialForm(expr, 'set!')) {
      env.set(expr[1], evalExpr(expr[2], env));
      expr = null;
      continue;
    }
    // (define var expr?)
    if (isSpecialForm(expr, 'define')) {
      env.define(expr[1], evalExpr(expr[2], env));
      expr = null;
      continue;
    }
    // (begin expr*)
    if (isSpecialForm(expr, 'begin')) {
      if (expr.length == 1) {
        expr = null;
        continue;
      }
      // Evaluate all but the last subexpression.
      for (var i = 1; i < expr.length - 1; ++i) {
        evalExpr(expr[i], env);
      }
      expr = expr[expr.length - 1];
      continue;
    }
    // (proc expr*)
    // Use expr[0] and slice() rather than shift() so that the
    // expression itself is left intact for later evaluations.
    var proc = evalExpr(expr[0], env);
    var args = expr.slice(1).map(function(subExpr) {
      return evalExpr(subExpr, env);
    });
    // proc.run() returns its body in result.expr and the environment
    // in which to evaluate it (with all its arguments bound) in
    // result.env.
    var result = proc.run(args);
    expr = result.expr;
    // The only time when env is changed.
    env = result.env;
    continue;
  }
}</code></pre>
<p>Then implementing evlis tail recursion requires only a few
changes:</p>
<pre class="code-container"><code class="lang-javascript">function evalExpr(expr, env) {
  // This is a stack of procedures and their non-final arguments that
  // are waiting for their final argument to be evaluated.
  var pendingProcedureCalls = [];
  while (true) {
    if (isSimpleExpr(expr)) {
      expr = evalSimpleExpr(expr, env);
      // Discard calling environment.
      env = null;
      if (pendingProcedureCalls.length == 0) {
        // No pending procedure calls, so we're done (the only exit
        // point).
        return expr;
      }
      var args = pendingProcedureCalls.pop();
      var proc = args.shift();
      args.push(expr);
      var result = proc.run(args);
      expr = result.expr;
      // Change to new environment (the only time when env is
      // changed).
      env = result.env;
      continue;
    }
    ...
    // Everything else remains the same.
    ...
    // (proc expr*)
    var nonFinalSubExprs =
      expr.slice(0, -1).map(
        function(subExpr) { return evalExpr(subExpr, env); });
    pendingProcedureCalls.push(nonFinalSubExprs);
    // Evaluate the last subexpression as a tail call.
    expr = expr[expr.length - 1];
    continue;
  }
}</code></pre>
<section class="footnotes">
<p id="fn1">[1] Assume a left-to-right evaluation order for now.
<a href="#r1">↩</a></p>
<p id="fn2">[2] The function that takes a list of expressions, evaluates them,
and returns the results as a list is traditionally
called <code>evlis</code>, hence the name of the optimization.
<a href="#r2">↩</a></p>
<p id="fn3">[3] This assumes that the calling environment isn't
stored somewhere else.
<a href="#r3">↩</a></p>
<p id="fn4">[4] This was adapted from an example
in <a href="ftp://ftp.ccs.neu.edu/pub/people/will/tail.pdf">Proper
Tail Recursion and Space Efficiency</a>.
<a href="#r4">↩</a></p>
<p id="fn5">[5] Assume that the interpreter isn't smart enough to deduce that \(v\)
can be optimized out since it's never used.
<a href="#r5">↩</a></p>
<p id="fn6">[6] Adapted from Peter Norvig's
excellent <a href="http://norvig.com/lispy.html"><code>lis.py</code></a>.
<a href="#r6">↩</a></p>
</section>
https://www.akalin.com/elementary-gaussian-proof
An Elementary Way to Calculate the Gaussian Integral
2011-01-06T00:00:00-08:00
Fred Akalin
https://www.akalin.com/
<p>
While reading <a href="http://gowers.wordpress.com">Timothy Gowers's blog</a> I stumbled on
<a href="http://gowers.wordpress.com/2007/10/04/when-are-two-proofs-essentially-the-same/#comment-239">Scott Carnahan's comment</a>
describing an elegant calculation of the Gaussian integral
\[
\int_{-\infty}^{\infty} e^{-x^2} \, dx = \sqrt{\pi}\text{.}
\]
I was so struck by its elementary character that I imagined what it
would be like written up, say, as an extra credit exercise in a
single-variable calculus class:
</p>
<p class="exercise">
<span class="exercise">Exercise 1.</span>
(<span class="exercise-name">The Gaussian integral</span>.) Let
\[
F(t) = \int_0^t e^{-x^2} \, dx
\text{, }\qquad
G(t) = \int_0^1 \frac{e^{-t^2 (1+x^2)}}{1+x^2} \, dx
\text{,}
\]
and \(H(t) = F(t)^2 + G(t)\).
<ol class="exercise-list">
<li>Calculate \(H(0)\).</li>
<li>Calculate and simplify \(H'(t)\). What does this
imply about \(H(t)\)?</li>
<li>Use part b to calculate \(F(\infty) =
\displaystyle\lim_{t \to \infty} F(t)\).</li>
<li>Use part c to calculate
\[
\int_{-\infty}^{\infty} e^{-x^2} \, dx\text{.}
\]</li>
</ol>
</p>
<p>
Although this is simpler than
<a href="http://en.wikipedia.org/wiki/Gaussian_integral#Careful_proof">the
usual calculation of the Gaussian integral</a>, for which careful
reasoning is needed to justify the use of polar coordinates, it seems
more like a
<a href="http://en.wikipedia.org/wiki/Certificate_(complexity)">certificate</a>
than an actual
proof; you can convince yourself that the calculation is valid, but
you gain no insight into the reasoning that led up to it.<sup><a href="#fn1" id="r1">[1]</a></sup>
</p>
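<p>
One quick way to convince yourself is to check numerically that
\(H(t)\) does not depend on \(t\). The following Python sketch does so
with a hand-rolled Simpson's rule; the helper
name <code>simpson</code> and the number of subintervals are arbitrary
choices made for this illustration.
</p>

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def F(t):
    return simpson(lambda x: math.exp(-x * x), 0.0, t)


def G(t):
    return simpson(
        lambda x: math.exp(-t * t * (1 + x * x)) / (1 + x * x), 0.0, 1.0)


def H(t):
    return F(t) ** 2 + G(t)


for t in (0.0, 0.5, 1.0, 2.0, 4.0):
    # Each difference should be zero up to numerical error.
    print(t, H(t) - math.pi / 4)
```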
<p>
Fortunately, <a href="http://gowers.wordpress.com/2007/10/04/when-are-two-proofs-essentially-the-same/#comment-243">David Speyer's
comment</a> solves the mystery; \(G(t)\) falls out of doing the
integration in Cartesian coordinates over a triangular region. Just
for kicks, here's what I imagine an exercise based on this method would
look like (this time for a multi-variable calculus class):
</p>
<p class="exercise">
<span class="exercise">Exercise 2.</span>
(<span class="exercise-name">The Gaussian integral in Cartesian coordinates.</span>) Let
\[
A(t) = \iint\limits_{\triangle_t} e^{-(x^2+y^2)} \, dx \, dy
\]
where \(\triangle_t\) is the triangle with vertices \((0, 0)\), \((t,
0)\), and \((t, t)\).
<!-- TODO(akalin): Draw a diagram for \triangle_t. -->
<ol class="exercise-list">
<li>Use the substitution \(y = sx\) to reduce \(A(t)\) to a
one-dimensional integral.</li>
<li>Use part a to calculate \(A(\infty) =
\lim_{t \to \infty} A(t)\).</li>
<li>Use part b to calculate
\[
\int_{-\infty}^{\infty} e^{-x^2} \, dx\text{.}
\]</li>
<li>Let
\[
F(t) = \int_0^t e^{-x^2} \, dx
\qquad\text{ and }\qquad
G(t) = \int_0^1 \frac{e^{-t^2 (1+x^2)}}{1+x^2} \, dx
\text{.}
\]
Use part a to relate \(F(t)\) to \(G(t)\).</li>
<li>Use part d to derive a proof of part c
using only single-variable calculus.</li>
</ol>
</p>
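<p>
For readers who want to check their work, here is one way part a of
Exercise 2 might go (a sketch, not necessarily the intended solution).
Writing \(A(t)\) as an iterated integral, substituting \(y = sx\) so
that \(dy = x \, ds\), and then swapping the order of integration gives
\[
A(t) = \int_0^t \int_0^x e^{-(x^2+y^2)} \, dy \, dx
= \int_0^1 \int_0^t x e^{-x^2 (1+s^2)} \, dx \, ds
= \int_0^1 \frac{1 - e^{-t^2 (1+s^2)}}{2 (1+s^2)} \, ds
= \frac{\pi}{8} - \frac{G(t)}{2}\text{.}
\]
Since \(\triangle_t\) is half of the square \([0, t]^2\) and the
integrand is symmetric in \(x\) and \(y\), we also have \(A(t) =
F(t)^2/2\). Combining the two gives \(F(t)^2 + G(t) = \pi/4\), which
is exactly the statement that \(H(t)\) from Exercise 1 is constant.
</p>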
<section class="footnotes">
<p id="fn1">[1] Similar to proving \(\sum\limits_{i=0}^n i^3 =
\frac{n^2(n+1)^2}{4}\) by induction. <a href="#r1">↩</a></p>
</section>
https://www.akalin.com/parallelizing-flac-encoding
Parallelizing FLAC Encoding
2008-05-05T00:00:00-07:00
Fred Akalin
https://www.akalin.com/
<style type="text/css" media="all">
/*<![CDATA[*/
table.benchmark-results,
table.benchmark-results tr,
table.benchmark-results th {
border: 1px solid black;
}
table.benchmark-results {
font-family: "Arial", "Helvetica", sans-serif;
}
table.benchmark-results th,
table.benchmark-results td {
padding: .2em .4em;
}
/*]]>*/
</style>
<p>One thing I've noticed ever since getting a multi-core system
is that the reference FLAC encoder is not multi-threaded. This isn't
a huge problem for most people, as you can simply encode multiple files
at the same time, but I usually rip my audio CDs into a single audio
file with a cue sheet instead of separate track files, so I am
usually encoding a single large audio file instead of multiple smaller
ones. Even so, encoding a CD-length audio file takes under a minute,
but I thought it would be a fun and useful weekend project to see if I
could parallelize the simpler <a href="http://flac.cvs.sourceforge.net/flac/flac/examples/c/encode/file/main.c?revision=1.2&view=markup">example encoder</a>. The
<a href="http://flac.sourceforge.net/format.html">format specification</a> indicates that input blocks are
encoded independently, which makes the problem <a href="http://en.wikipedia.org/wiki/Embarrassingly_parallel">embarrassingly
parallel</a>, and trawling through the <a href="http://www.mail-archive.com/flac-dev@xiph.org/msg00724.html">FLAC
mailing lists</a> reveals that no one has had either the time
or the inclination to look into it.</p>
<p>However, I was able to write a multithreaded FLAC encoder that
achieves near-linear speedup with only minor hacks to the libFLAC API.
Here are some encode times on an 8-core 2.8 GHz Xeon 5400 for a 636 MB
wave file (some caveats are discussed below):</p>
<table class="benchmark-results">
<tr>
<th>baseline</th><td>34.906s</td>
</tr>
<tr>
<th>1 thread</th><td>31.424s</td>
</tr>
<tr>
<th>2 threads</th><td>16.936s</td>
</tr>
<tr>
<th>4 threads</th><td>10.173s</td>
</tr>
<tr>
<th>8 threads</th><td>6.808s</td>
</tr>
</table>
<p>I took the simple approach of sharding the input file into
<var>n</var> roughly equal pieces and passing them to <var>n</var>
encoder threads, assembling the output file from the <var>n</var>
output buffers. In general this is not a good way of partitioning the
workload as time is wasted if one shard takes significantly more time
to process but for my use case this isn't a significant problem.</p>
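<p>As a rough sketch of that partitioning (in Python rather than the C
of the actual encoder), the shard boundaries could be computed as
follows. The name <code>shard_ranges</code> and
the <code>align_bytes</code> parameter are hypothetical; the default of
4 assumes 16-bit stereo input, so that no sample frame is split across
two shards.</p>

```python
def shard_ranges(data_bytes, num_threads, align_bytes=4):
    """Split data_bytes into num_threads contiguous (offset, size)
    pieces.  Every piece but the last has a size that is a multiple of
    align_bytes; the last piece absorbs whatever is left over."""
    per_thread = (data_bytes // num_threads) // align_bytes * align_bytes
    ranges = []
    for i in range(num_threads):
        offset = i * per_thread
        if i < num_threads - 1:
            size = per_thread
        else:
            size = data_bytes - offset
        ranges.append((offset, size))
    return ranges


print(shard_ranges(1000, 3))  # [(0, 332), (332, 332), (664, 336)]
```

<p>The offsets here are relative to the start of the audio data, i.e.,
after the WAV header.</p>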
<p>The best way to share the input file among the encoding threads is to
map it into memory. In fact, memory-mapped file I/O has so many
advantages in general that I'm surprised at how little I see it used,
although it does have the disadvantage of requiring a bit more
bookkeeping. Here is how I use it in my multithreaded encoder
(slightly paraphrased):</p>
<pre>
<code class="language-cpp">#include <fcntl.h>    /* open() */
#include <string.h>   /* memcmp() */
#include <sys/mman.h> /* mmap()/munmap() */
#include <sys/stat.h> /* fstat() */
#include <unistd.h>   /* close() */

int main(int argc, char *argv[]) {
  int fdin;
  struct stat buf;
  char *bufin;

  fdin = open(argv[1], O_RDONLY);
  fstat(fdin, &buf);
  bufin = mmap(NULL, buf.st_size, PROT_READ, MAP_SHARED, fdin, 0);
  /* The input file (passed in via argv[1]) is now mapped read-only to
     the memory region in bufin up to bufin + buf.st_size. */

  /* Note that you can work directly with the mapped input file
     instead of fread()ing the header into a buffer. */
  if((buf.st_size < WAV_HEADER_SIZE) ||
     memcmp(bufin, "RIFF", 4) ||
     memcmp(bufin+8, "WAVEfmt \020\000\000\000\001\000\002\000", 16) ||
     memcmp(bufin+32, "\004\000\020\000data", 8)) {
    /* Invalid input file: print error and exit. */
  }

  for (i = 0; i < num_threads; ++i) {
    shard_infos[i].bufin = bufin + WAV_HEADER_SIZE + i * bytes_per_thread;
    /* bufsize for the last thread may be slightly larger. */
    shard_infos[i].bufsize = bytes_per_thread;
  }

  /* Spawn encode threads (which calls encode_shard() below) passing
     an element of shard_infos to each. */
  ...

  munmap(bufin, buf.st_size);
  close(fdin);
}

FLAC__bool encode_shard(struct shard_info *shard_info) {
  FLAC__StreamEncoder *encoder = FLAC__stream_encoder_new();
  ...
  /* The input file is paged in lazily as this function accesses
     bufin from shard_info->bufin. */
  FLAC__stream_encoder_process_interleaved(encoder,
                                           shard_info->bufin,
                                           shard_info->bufsize);
  ...
  FLAC__stream_encoder_delete(encoder);
}</code>
</pre>
<p>However, handling the output file is a bit trickier. Since the
encoded FLAC data output by the threads varies in size, we have to wait
until all encoding threads are done before we know the right offsets
at which to write the output data. A convenient and fast way to handle this is
to use asynchronous I/O; we know where to write the output data for a
shard as soon as the encoding threads for all previous shards finish, so
we simply wait for the encoding threads in shard order and queue up a
write request after each thread finishes. Here is how I use the POSIX
asynchronous I/O API in my multithreaded encoder (again, slightly
paraphrased):</p>
<pre>
<code class="language-cpp">#include <aio.h>     /* aio_*() */
#include <fcntl.h>   /* open() */
#include <pthread.h> /* pthread_*() */
#include <string.h>  /* memset() */
#include <unistd.h>  /* close() */

int main(int argc, char *argv[]) {
  int fdout;
  pthread_t threads[MAX_THREADS];
  struct aiocb aiocbs[MAX_THREADS];
  unsigned long byte_offset = 0;

  /* mmap input file in. */
  ...

  /* open() needs a mode argument whenever O_CREAT is given. */
  fdout = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);

  /* Spawn encode threads passing an element of shard_infos to
     each. */
  ...

  /* Wait for each thread in sequence and queue up output writes. */

  /* We need to zero out any aiocb struct that we use before we fill
     in any members. */
  memset(aiocbs, 0, num_threads * sizeof(*aiocbs));
  for (i = 0; i < num_threads; ++i) {
    pthread_join(threads[i], NULL);
    aiocbs[i].aio_buf = shard_infos[i].bufout;
    aiocbs[i].aio_nbytes = shard_infos[i].bytes_written;
    aiocbs[i].aio_offset = byte_offset;
    aiocbs[i].aio_fildes = fdout;
    aio_write(&aiocbs[i]);
    byte_offset += shard_infos[i].bytes_written;
  }

  /* Wait for all output writes to finish. */
  for (i = 0; i < num_threads; ++i) {
    const struct aiocb *aiocbp = &aiocbs[i];
    aio_suspend(&aiocbp, 1, NULL);
    aio_return(&aiocbs[i]);
  }

  close(fdout);
}</code>
</pre>
<p>The POSIX API is a bit unwieldy for this use case; ideally, there
would be a version of <code>aio_suspend()</code> that would suspend the
calling process until <em>all</em> of the specified requests have completed.
As it is now the simplest way is to loop through the requests as
above, especially since the maximum number of simultaneous
asynchronous I/O requests is usually quite small (16 on my system).</p>
<p>Also, I found that the OS X implementation of <code>aio_write()</code>
did not obey this part of the specified behavior:</p>
<blockquote>
<pre> If O_APPEND is set for aiocbp->aio_fildes, aio_write() operations append
to the file in the same order as the calls were made. If O_APPEND is not
set for the file descriptor, the write operation will occur at the abso-
lute position from the beginning of the file plus aiocbp->aio_offset.</pre>
</blockquote>
<p>but it was just as easy (and clearer) to explicitly set the correct
offset.</p>
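<p>The offset bookkeeping itself is just a running sum of the encoded
shard sizes. The following Python sketch illustrates it with
synchronous positional writes (<code>os.pwrite()</code>, available on
Unix) in place of POSIX asynchronous I/O, so it shows the explicit
offsets rather than the overlap of writing and encoding. The
name <code>write_shards</code> is hypothetical.</p>

```python
import os


def write_shards(path, shard_outputs):
    """Write each encoded shard (a bytes object) at its final offset
    and return the total number of bytes written.

    The offset of shard i is the total size of shards 0..i-1, so it is
    known as soon as all earlier shards have finished encoding.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        for data in shard_outputs:
            # os.pwrite() writes at an explicit offset, independent of
            # the current file position.  It may write fewer bytes than
            # requested, hence the inner loop.
            written = 0
            while written < len(data):
                written += os.pwrite(fd, data[written:], offset + written)
            offset += len(data)
        return offset
    finally:
        os.close(fd)
```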
<p>I had to hack up libFLAC a bit to implement my multithreaded encoder.
I exposed the <code>update_metadata_()</code> function to make it easy to write the
correct number of total samples in the metadata field and also to zero
out the min/max framesize fields. I also exposed the
<code>FLAC__stream_encoder_set_do_md5()</code> function (which should
have been exposed in the first place) so that I can turn off the writing of
the MD5 field in the metadata. Finally, I added the function
<code>FLAC__stream_encoder_set_current_frame_number()</code> so that the
correct frame numbers are written at encode time.</p>
<p>For comparison purposes I turn off MD5 calculation in my multithreaded
encoder as well as the baseline one. Since calling
<code>FLAC__stream_encoder_set_current_frame_number()</code> causes
crashes with verification turned on, I also turn that off. The numbers
above reflect that, so they underestimate the encode times that a production
multithreaded encoder would achieve. However, the essential behavior
of the program shouldn't change much.</p>
<p><a href="/parallelizing-flac-encoding-files/patch-libFLAC.in">Here</a> is a patch file for the <a href="http://downloads.sourceforge.net/flac/flac-1.2.1.tar.gz?modtime=1189961849&big_mirror=0">flac 1.2.1
source</a> that implements the hacks I described
above. <a href="/parallelizing-flac-encoding-files/mt_encode.c">Here</a> is the source for my multithreaded FLAC
encoder. I've tested it with <code>i686-apple-darwin9-gcc-4.0.1</code>
and <code>i686-apple-darwin9-gcc-4.2.1</code> on Mac OS X. I got the
above numbers compiling
<code>mt_encode.c</code> with gcc 4.2.1 and the switches <code>-Wall
-Werror -g -O2 -ansi</code>.</p>
https://www.akalin.com/bfpp
bfpp
2008-04-23T00:00:00-07:00
Fred Akalin
https://www.akalin.com/
<p>Okay, I lied; you can't <em>really</em> embed <a href="http://www.muppetlabs.com/~breadbox/bf/">brainfuck</a> in C++
but you can get pretty close. Here is an example:</p>
<pre>
<code class="language-cpp">#include "bfpp.h"
int main() {
// Prints out factorial numbers in sequence. Adapted from
// http://www.hevanet.com/cristofd/brainfuck/factorial.b .
bfpp
* + + + + + + + + + + * * * + * + -- * * * + -- - -- & & & & & -- +
& & & & & ++ * * -- -- - ++ * -- & & + * + * - ++ & -- * + & - ++ &
-- * + & - -- * + & - -- * + & - -- * + & - -- * + & - -- * + & - --
* + & - -- * + & - -- * + & - -- * -- - ++ * * * * + * + & & & & & &
- -- * + & - ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ * -- & + * - ++ + * *
* * * ++ & & & & & -- & & & & & ++ * * * * * * * -- * * * * * ++ + +
-- - & & & & & ++ * * * * * * - ++ + * * * * * ++ & -- * + + & - ++
& & & & -- & -- * + & - ++ & & & & ++ * * -- - * -- - ++ + + + + + +
-- & + + + + + + + + * - ++ * * * * ++ & & & & & -- & -- * + * + & &
- ++ * ! & & & & & ++ * ! * * * * ++
end_bfpp
}</code>
</pre>
<p>I call this variant “bfpp” as it has some pretty significant
differences from brainfuck. First of all, some commands had to be
adapted; although <code>+</code> and <code>-</code> remain the same,</p>
<ul>
<li><code><</code> and <code>></code> were changed to <code>&</code> and
<code>*</code>,</li>
<li><code>.</code> and <code>,</code> were changed to <code>!</code> and <code>~</code>
(mnemonic: <code>!</code> contains <code>.</code> within it and <code>~</code>
is kind of like a sideways <code>,</code>),</li>
<li>and <code>[</code> and <code>]</code> were changed to <code>--</code> and
<code>++</code> (mnemonic: <code>[</code> and <code>]</code> are the most
complex brainfuck commands [to implement, at least] and so deserve to be mapped to the wider
and more prominent operators).</li>
</ul>
<p>This magic is made possible by the fact that brainfuck has exactly
eight commands and C++ has exactly eight overloadable symbolic unary
operators. Add some macros to hide the C++ scaffolding behind some
delimiters and you have a convincing illusion of an embedded language.</p>
<p><a href="/bfpp-files/bfpp.h">bfpp.h</a> implements a simple (<100 lines) bfpp interpreter and
the magic described above, and <a href="/bfpp-files/bf2bfpp.c">bf2bfpp.c</a> is a
straightforward translator from brainfuck to bfpp. Gotta love C++!</p>
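<p>To make the command mapping concrete, here is a minimal Python
sketch of such a translator. It is not a port
of <code>bf2bfpp.c</code>; the names <code>BF_TO_BFPP</code>
and <code>bf_to_bfpp</code> are made up for this illustration, and the
output still has to be wrapped in the <code>bfpp</code>
and <code>end_bfpp</code> delimiters by hand.</p>

```python
# Command mapping taken from the list above; + and - are unchanged.
BF_TO_BFPP = {
    "+": "+", "-": "-",
    "<": "&", ">": "*",
    ".": "!", ",": "~",
    "[": "--", "]": "++",
}


def bf_to_bfpp(source):
    """Translate brainfuck commands to bfpp tokens, dropping every other
    character (brainfuck treats those as comments).  Tokens are joined
    with spaces so that, for example, two + commands are not lexed by
    C++ as a single ++ operator."""
    return " ".join(BF_TO_BFPP[c] for c in source if c in BF_TO_BFPP)


print(bf_to_bfpp("+[>.<-]"))  # + -- * ! & - ++
```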
https://www.akalin.com/longest-palindrome-linear-time
Finding the Longest Palindromic Substring in Linear Time
2007-11-28T00:00:00-08:00
Fred Akalin
https://www.akalin.com/
<style type="text/css" media="all">
/*<![CDATA[*/
span.palind {
color: red;
}
/*]]>*/
</style>
<script>
function trackOutboundLink(url) {
ga('send', 'event', 'outbound', 'click', url, {
'hitCallback': function() { document.location = url; }
});
}
</script>
<p>Another <a href="http://www.reddit.com/r/programming/comments/2dykz/finding_palindromes_repairing_endos_dna_and_the/"
onclick="trackOutboundLink('http://programming.reddit.com/info/2dykz/comments/c2e7r0');
return false;">interesting problem</a> I stumbled across on reddit is
finding the longest substring of a given string that is a palindrome.
I
found <a href="http://johanjeuring.blogspot.com/2007/08/finding-palindromes.html"
onclick="trackOutboundLink('http://johanjeuring.blogspot.com/2007/08/finding-palindromes.html');
return false;">the explanation on Johan Jeuring's blog</a> somewhat
confusing and I had to spend some time poring over the Haskell code
(eventually rewriting it in Python) and walking through examples
before it "clicked." I haven't found any other explanations of the
same approach so hopefully my explanation below will help the next
person who is curious about this problem.</p>
<p>Of course, the most naive solution would be to exhaustively examine
all \(n \choose 2\) substrings of the given \(n\)-length string, test whether
each one is a palindrome, and keep track of the longest one seen so
far. This has complexity \(O(n^3)\), but we can easily do better by
realizing that a palindrome is centered on either a letter (for
odd-length palindromes) or a space between letters (for even-length
palindromes). Therefore we can examine all \(2n + 1\) possible centers
and find the longest palindrome for that center, keeping track of the
overall longest palindrome. This has complexity \(O(n^2)\).</p>
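<p>A minimal Python sketch of this quadratic approach might look like
the following. The function name and the center numbering (center
\(2k + 1\) is letter \(k\), center \(2k\) is the space just before
letter \(k\)) are choices made for this illustration.</p>

```python
def naive_longest_palindrome(seq):
    """Return the longest palindromic substring of seq by expanding
    around each of the 2n + 1 possible centers (quadratic time in the
    worst case)."""
    n = len(seq)
    best_start, best_len = 0, 0
    for center in range(2 * n + 1):
        # Start from a single letter (odd center) or from the empty
        # string between two letters (even center).
        left = center // 2 - 1
        right = (center + 1) // 2
        while left >= 0 and right < n and seq[left] == seq[right]:
            left -= 1
            right += 1
        length = right - left - 1
        if length > best_len:
            best_start, best_len = left + 1, length
    return seq[best_start:best_start + best_len]


print(naive_longest_palindrome("banana"))  # anana
```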
<p>It is not immediately clear that we can do better but if we're told
that a \(\Theta(n)\) algorithm exists we can infer that the algorithm
is most likely structured as an iteration through all possible
centers. As an off-the-cuff first attempt, we can adapt the above
algorithm by keeping track of the current center and expanding until
we find the longest palindrome around that center, in which case we
then consider the last letter (or space) of that palindrome as the new
center. The algorithm (which isn't correct) looks like this
informally:</p>
<ol type="1">
<li>Set the current center to the first letter.</li>
<li>Loop while the current center is valid:
<ol type="a">
<li>Expand to the left and right simultaneously until we find
the largest palindrome around this center.</li>
<li>If the current palindrome is bigger than the stored maximum
one, store the current one as the maximum one.</li>
<li>Set the space following the current palindrome as the
current center unless the two letters immediately surrounding
it are different, in which case set the last letter of the
current palindrome as the current center.</li>
</ol>
</li>
<li>Return the stored maximum palindrome.</li>
</ol>
<p>This seems to work but it doesn't handle all cases: consider the
string "abababa". The first non-trivial palindrome we see is "<span
class="palind">a</span>|bababa", followed by "<span
class="palind">aba</span>|baba". Considering the current space as the
center doesn't get us anywhere but considering the preceding letter
(the second 'a') as the center, we can expand to get "<span
class="palind">ababa</span>|ba". From this state, considering the
current space again doesn't get us anywhere but considering the preceding
letter as the center, we can expand to get "ab<span
class="palind">ababa</span>|". However, this is incorrect as the
longest palindrome is actually the entire string! We can remedy this
case by changing the algorithm to try and set the new center to be one
before the end of the last palindrome, but it is clear that having a
fixed "lookbehind" doesn't solve the general case and anything more
than that will probably bump us back up to quadratic time.</p>
<p>The key question is this: given the state from the example above,
"<span class="palind">ababa</span>|ba", what makes the second 'b' so
special that it should be the new center? To use another example, in
"<span class="palind">abcbabcba</span>|bcba", what makes the second
'c' so special that it should be the new center? Hopefully, the
answer to this question will lead to the answer to the more important
question: once we stop expanding the palindrome around the current
center, how do we pick the next center? To answer the first question,
first notice that the current palindromes in the above examples
themselves contain smaller non-trivial palindromes: "ababa" contains
"aba" and "abcbabcba" contains "abcba" which also contains "bcb".
Then, notice that if we expand around the "special" letters, we get a
palindrome which shares a right edge with the current palindrome; that
is, <em>the longest palindromes around the special letters are proper
suffixes of the current palindrome</em>. With a little thought, we
can then answer the second question: <em>to pick the next center, take
the center of the longest palindromic proper suffix of the current
palindrome</em>. Our algorithm then looks like this:</p>
<ol type="1">
<li>Set the current center to the first letter.</li>
<li>Loop while the current center is valid:
<ol type="a">
<li>Expand to the left and right simultaneously until we find
the largest palindrome around this center.</li>
<li>If the current palindrome is bigger than the stored maximum
one, store the current one as the maximum one.</li>
<li>Find the maximal palindromic proper suffix of the current
palindrome.</li>
<li>Set the center of the suffix from c as the current center
and start expanding from the suffix as it is palindromic.</li>
</ol>
</li>
<li>Return the stored maximum palindrome.</li>
</ol>
<p>However, unless step 2c can be done efficiently, it will cause the
algorithm to be superlinear. Doing step 2c efficiently seems
impossible since we have to examine the entire current palindrome to
find the longest palindromic suffix unless we somehow keep track of
extra state as we progress through the input string. Notice that the
longest palindromic suffix would by definition also be a palindrome of
the input string so it might suffice to keep track of every palindrome
that we see as we move through the string and hopefully, by the time
we finish expanding around a given center, we would know where all the
palindromes with centers lying to the left of the current one are.
However, if the longest palindromic suffix has a center to the right
of the current center, we would not know about it. But we also have
at our disposal the very useful fact that <em>a palindromic proper
suffix of a palindrome has a corresponding dual palindromic proper
prefix</em>. For example, in one of our examples above, "abcbabcba",
notice that "abcba" appears twice: once as a prefix and once as a
suffix. Therefore, while we wouldn't know about all the palindromic
suffixes of our current palindrome, we would know about either it or
its dual.</p>
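<p>The duality is easy to check by brute force. Because the whole
string equals its own reverse, reversing it turns every suffix into a
prefix, and a palindromic suffix is unchanged by that reversal. The
short Python sketch below (with a made-up helper name) confirms this
for the example string.</p>

```python
def palindromic_proper_suffixes(s):
    """Return every non-empty proper suffix of s that is a palindrome,
    longest first."""
    return [s[k:] for k in range(1, len(s)) if s[k:] == s[k:][::-1]]


pal = "abcbabcba"
for suffix in palindromic_proper_suffixes(pal):
    # Each palindromic proper suffix also appears as a prefix.
    assert pal.startswith(suffix)
    print(suffix)
```

<p>For "abcbabcba" this prints "abcba" and then "a".</p>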
<p>Another crucial realization is the fact that we don't have to keep
track of all the palindromes we've seen. To use the example
"abcbabcba" again, we don't really care about "bcb" that much, since
it's already contained in the palindrome "abcba". In fact, we only
really care about keeping track of the longest palindromes for a given
center or equivalently, the length of the longest palindrome for a
given center. But this is simply a more general version of our
original problem, which is to find the longest palindrome around
<em>any</em> center! Thus, if we can keep track of this state
efficiently, maybe by taking advantage of the properties of
palindromes, we don't have to keep track of the maximal palindrome and
can instead figure it out at the very end.</p>
<p>Unfortunately, we seem to be back where we started; the second
naive algorithm that we have is simply to loop through all possible
centers and for each one find the longest palindrome around that
center. But our discussion has led us to a different incremental
formulation: given a current center, the longest palindrome around
that center, and a list of the lengths of the longest palindromes
around the centers to the left of the current center, can we figure
out the new center to consider and extend the list of longest
palindrome lengths up to that center efficiently? For example, if we
have the state:</p>
<p><"ab<span class="palind">a</span>ba|??", [0, 1, 0, 3, 0, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]></p>
<p>where the highlighted letter is the current center, the vertical line
is our current position, the question marks represent unread
characters or unknown quantities, and the array represents the list
of longest palindrome lengths by center, can we get to the state:</p>
<p><"aba<span class="palind">b</span>a|??", [0, 1, 0, 3, 0, 5, 0, ?, ?, ?, ?, ?, ?, ?, ?]></p>
<p>and then to:</p>
<p><"aba<span class="palind">b</span>aba|", [0, 1, 0, 3, 0, 5, 0, 7, 0, 5, 0, 3, 0, 1, 0]></p>
<p>efficiently? The crucial thing to notice is that the longest
palindrome lengths array (we'll call it simply the lengths array) in
the final state is palindromic since the original string is
palindromic. In fact, the lengths array obeys a more general
property: <em>the longest palindrome <var>d</var> places to the right
of the current center (the <var>d</var>-right palindrome) is at least
as long as the longest palindrome <var>d</var> places to the left of the current
center (the <var>d</var>-left palindrome) if the <var>d</var>-left
palindrome is completely contained in the longest palindrome around
the current center (the center palindrome), and it is of equal length
if the <var>d</var>-left palindrome is not a prefix of the center
palindrome or if the center palindrome is a suffix of the entire
string</em>. This then implies that we can more or less fill in the
values to the right of the current center from the values to the left
of the current center. For example, from [0, 1, 0, 3, 0, 5, ?, ?, ?,
?, ?, ?, ?, ?, ?] we can get to [0, 1, 0, 3, 0, 5, 0, ≥3?, 0,
≥1?, 0, ?, ?, ?, ?]. This also implies that the first unknown
entry (in this case, ≥3?) should be the new center because it
means that the center palindrome is not a suffix of the input string
(i.e., we're not done) and that the <var>d</var>-left palindrome is a
prefix of the center palindrome.</p>
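<p>This mirroring step can be sketched in Python as follows. It is
only an illustration of the property above, not the full
algorithm: <code>mirror_fill</code> is a hypothetical helper that
assumes the lengths array is known up to and including the current
center.</p>

```python
def mirror_fill(lengths, center):
    """Copy known values from the left of center to its right.

    lengths[0..center] must be known.  Returns a tuple of the extended
    list, the index of the next center, and a lower bound for that
    center's palindrome length.
    """
    center_len = lengths[center]
    out = list(lengths[:center + 1])
    for d in range(1, center_len + 1):
        left = lengths[center - d]
        # Length at which the d-left palindrome would exactly touch the
        # left edge of the center palindrome.
        to_edge = center_len - d
        if left == to_edge:
            # The d-left palindrome is a prefix of the center
            # palindrome, so its mirror image is only known to be at
            # least this long: this is the next center.
            return out, center + d, to_edge
        # Otherwise the value is exact; it is capped because the mirror
        # image cannot extend past the right edge.
        out.append(min(left, to_edge))
    # No prefix found: move on to the first center past the palindrome.
    return out, center + center_len + 1, 0


print(mirror_fill([0, 1, 0, 3, 0, 5], 5))
# ([0, 1, 0, 3, 0, 5, 0], 7, 3)
```

<p>The result matches the example above: the entry after the 5 is
filled in as 0, and the next center is index 7 with a palindrome of
length at least 3.</p>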
<p>From these observations we can construct our final algorithm which
returns the lengths array, and from which it is easy to find the
longest palindromic substring:</p>
<ol type="1">
<li>Initialize the lengths array to have one (initially unknown)
entry per possible center.</li>
<li>Set the current center to the first center.</li>
<li>Loop while the current center is valid:
<ol type="a">
<li>Expand to the left and right simultaneously until we find
the largest palindrome around this center.</li>
<li>Fill in the appropriate entry in the longest palindrome
lengths array.</li>
<li>Iterate through the longest palindrome lengths array
backwards and fill in the corresponding values to the right of
the entry for the current center until an unknown value (as
described above) is encountered.</li>
<li>Set the new center to the index of this unknown value.</li>
</ol>
</li>
<li>Return the lengths array.</li>
</ol>
<p>Note that at each step of the algorithm we're either incrementing
our current position in the input string or filling in an entry in the
lengths array. Since the lengths array has size linear in the size of
the input array, the algorithm has worst-case linear running time.
Since given the lengths array we can find and return the longest
palindromic substring in linear time, a linear-time algorithm to find
the longest palindromic substring is the composition of these two
operations.</p>
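<p>The second of those two operations, recovering the substring from
the lengths array, is short enough to sketch directly in Python. The
function name is hypothetical, and the lengths array is assumed to
have \(2n + 1\) entries with letter \(k\) at index \(2k + 1\).</p>

```python
def longest_from_lengths(seq, lengths):
    """Given the lengths array for seq, return the longest palindromic
    substring in linear time."""
    best_center = max(range(len(lengths)), key=lambda c: lengths[c])
    best_len = lengths[best_center]
    # A palindrome of length L around center index c starts at letter
    # (c - L) / 2; c and L always have the same parity.
    start = (best_center - best_len) // 2
    return seq[start:start + best_len]


print(longest_from_lengths(
    "banana", [0, 1, 0, 1, 0, 3, 0, 5, 0, 3, 0, 1, 0]))  # anana
```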
<p>Here is Python code that implements the above algorithm (although
it is closer to Johan Jeuring's Haskell implementation than to the
above description):</p>
<pre class="code-container"><code class="language-python">def fastLongestPalindromes(seq):
    """
    Behaves identically to naiveLongestPalindromes (see below), but
    runs in linear time.
    """
    seqLen = len(seq)
    l = []
    i = 0
    palLen = 0
    # Loop invariant: seq[(i - palLen):i] is a palindrome.
    # Loop invariant: len(l) >= 2 * i - palLen. The code path that
    # increments palLen skips the l-filling inner-loop.
    # Loop invariant: len(l) < 2 * i + 1. Any code path that
    # increments i past seqLen - 1 exits the loop early and so skips
    # the l-filling inner loop.
    while i < seqLen:
        # First, see if we can extend the current palindrome. Note
        # that the center of the palindrome remains fixed.
        if i > palLen and seq[i - palLen - 1] == seq[i]:
            palLen += 2
            i += 1
            continue
        # The current palindrome is as large as it gets, so we append
        # it.
        l.append(palLen)
        # Now to make further progress, we look for a smaller
        # palindrome sharing the right edge with the current
        # palindrome. If we find one, we can try to expand it and see
        # where that takes us. At the same time, we can fill the
        # values for l that we neglected during the loop above. We
        # make use of our knowledge of the length of the previous
        # palindrome (palLen) and the fact that the values of l for
        # positions on the right half of the palindrome are closely
        # related to the values of the corresponding positions on the
        # left half of the palindrome.
        # Traverse backwards starting from the second-to-last index up
        # to the edge of the last palindrome.
        s = len(l) - 2
        e = s - palLen
        for j in range(s, e, -1):
            # d is the value l[j] must have in order for the
            # palindrome centered there to share the left edge with
            # the last palindrome. (Drawing it out is helpful to
            # understanding why the - 1 is there.)
            d = j - e - 1
            # We check to see if the palindrome at l[j] shares a left
            # edge with the last palindrome. If so, the corresponding
            # palindrome on the right half must share the right edge
            # with the last palindrome, and so we have a new value for
            # palLen.
            #
            # An exercise for the reader: in this place in the code you
            # might think that you can replace the == with >= to improve
            # performance. This does not change the correctness of the
            # algorithm but it does hurt performance, contrary to
            # expectations. Why?
            if l[j] == d:
                palLen = d
                # We actually want to go to the beginning of the outer
                # loop, but Python doesn't have loop labels. Instead,
                # we use an else block corresponding to the inner
                # loop, which gets executed only when the for loop
                # exits normally (i.e., not via break).
                break
            # Otherwise, we just copy the value over to the right
            # side. We have to bound l[j] because palindromes on the
            # left side could extend past the left edge of the last
            # palindrome, whereas their counterparts won't extend past
            # the right edge.
            l.append(min(d, l[j]))
        else:
            # This code is executed in two cases: when the for loop
            # isn't taken at all (palLen == 0) or the inner loop was
            # unable to find a palindrome sharing the left edge with
            # the last palindrome. In either case, we're free to
            # consider the palindrome centered at seq[i].
            palLen = 1
            i += 1
    # We know from the loop invariant that len(l) < 2 * seqLen + 1, so
    # we must fill in the remaining values of l.
    # Obviously, the last palindrome we're looking at can't grow any
    # more.
    l.append(palLen)
    # Traverse backwards starting from the second-to-last index up
    # until we get l to size 2 * seqLen + 1. We can deduce from the
    # loop invariants we have enough elements.
    lLen = len(l)
    s = lLen - 2
    e = s - (2 * seqLen + 1 - lLen)
    for i in range(s, e, -1):
        # The d here uses the same formula as the d in the inner loop
        # above. (Computes distance to left edge of the last
        # palindrome.)
        d = i - e - 1
        # We bound l[i] with min for the same reason as in the inner
        # loop above.
        l.append(min(d, l[i]))
    return l</code></pre>
<p>And here is a naive quadratic version for comparison:</p>
<pre class="code-container"><code class="language-python">def naiveLongestPalindromes(seq):
    """
    Given a sequence seq, returns a list l such that l[2 * i + 1]
    holds the length of the longest palindrome centered at seq[i]
    (which must be odd), l[2 * i] holds the length of the longest
    palindrome centered between seq[i - 1] and seq[i] (which must be
    even), and l[2 * len(seq)] holds the length of the longest
    palindrome centered past the last element of seq (which must be 0,
    as is l[0]).

    The actual palindrome for l[i] is seq[s:(s + l[i])] where s is i
    // 2 - l[i] // 2. (// is integer division.)

    Example:
    naiveLongestPalindromes('ababa') -> [0, 1, 0, 3, 0, 5, 0, 3, 0, 1, 0]

    Runs in quadratic time.
    """
    seqLen = len(seq)
    lLen = 2 * seqLen + 1
    l = []
    for i in range(lLen):
        # If i is even (i.e., we're on a space), this will produce e
        # == s. Otherwise, we're on an element and e == s + 1, as a
        # single letter is trivially a palindrome.
        s = i // 2
        e = s + i % 2
        # Loop invariant: seq[s:e] is a palindrome.
        while s > 0 and e < seqLen and seq[s - 1] == seq[e]:
            s -= 1
            e += 1
        l.append(e - s)
    return l</code></pre>
<p>Note that this is not the only efficient solution to this problem;
building a suffix tree takes time linear in the length of the input
string, and you can use one to solve this problem, but as Johan also
mentions, that is a much less direct and less efficient solution than
this one.</p>
https://www.akalin.com/number-theory-haskell-foray
A Foray into Number Theory with Haskell
2007-07-06T00:00:00-07:00
Fred Akalin
https://www.akalin.com/
<style type="text/css" media="all">
/*<![CDATA[*/
pre.console {
background-color: #eee;
overflow-x: auto;
}
/*]]>*/
</style>
<!-- TODO: use \cfrac instead when it is supported by KaTeX. -->
<p>I encountered
<a href="http://programming.reddit.com/info/216p9/comments">an
interesting problem</a> on reddit a few days ago which can be
paraphrased as follows:</p>
<blockquote><p>Find a perfect square \(s\) such that \(1597s + 1\) is also
a perfect square.</p></blockquote>
<p>I read the discussion about implementing a brute-force
algorithm to solve the problem and spent a futile half-hour or so
trying my hand at finding a better way before someone noticed that the
problem was an instance
of <a href="http://en.wikipedia.org/wiki/Pell%27s_equation">Pell's
equation</a> which is known to have an elegant and fast solution;
indeed, he posted
a <a href="http://programming.reddit.com/info/216p9/comments/c21dpn">one-liner
in Mathematica</a> solving the given problem. However, I wanted to try
coding up the solution myself as the Mathematica solution, while
succinct, isn't very enlightening since the heavy lifting is already
done by a built-in function and an arbitrary constant was used for this
particular instance of Pell's equation.</p>
<p>Pell's equation is simply the
<a href="http://en.wikipedia.org/wiki/Diophantine_equation">Diophantine
equation</a> \(x^2 - dy^2 = 1\) for a given
\(d\)<sup><a href="#fn1" id="r1">[1]</a></sup>; being Diophantine means
that all variables involved take on only integer values. (In our
original problem, \(d\) is 1597 and we are asked for \(y^2\).) The
solution involves finding the <em>continued fraction expansion</em> of
\(\sqrt{d}\), finding the first <em>convergent</em> of the expansion
that satisfies Pell's equation, and then generating all other
solutions from that
<em>fundamental solution</em>. We rule out the trivial solution \(x =
1\), \(y = 0\) which also implies that if \(d\) is a perfect square then
there is no solution.</p>
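<p>As a small sanity check of the definition, here is a minimal sketch
of a predicate that tests whether a pair solves Pell's equation for a
given \(d\). The name <code>is_pell_solution</code> is made up for
illustration and isn't used elsewhere in this entry:</p>
<pre>
<code class="language-haskell">-- True exactly when (x, y) satisfies x^2 - d * y^2 = 1.
-- (Hypothetical helper, not part of the program developed below.)
is_pell_solution :: Integer -> (Integer, Integer) -> Bool
is_pell_solution d (x, y) = x * x - d * y * y == 1</code>
</pre>
<p>For example, with \(d = 2\) the pair \((3, 2)\) passes since \(9 - 2
\cdot 4 = 1\), whereas \((7, 5)\) fails since \(49 - 2 \cdot 25 =
-1\).</p>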
<p>A continued fraction is an expression of the form: \[ x = a_0 +
\frac{1}{a_1 + \frac{1}{a_2 + \frac{1}{a_3 +
\frac{1}{\ddots\,}}}}\] where all \(a_i\) are integers and all but the
first one are positive. The standard math notation for continued
fractions is quite unwieldy so from now on we'll use \(\left \langle
a_0; a_1, a_2, \ldots \right \rangle\) instead of the above.</p>
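<p>To make the notation concrete, a finite continued fraction can be
evaluated from the innermost term outward. The following is only a
sketch; the name <code>eval_continued_fraction</code> is my own, and it
assumes every term after the first is positive so that no division by
zero occurs:</p>
<pre>
<code class="language-haskell">-- Evaluate the finite continued fraction [a_0, a_1, ..., a_n] as an
-- exact rational number, using a_0 + 1 / (value of the remaining terms).
-- (Hypothetical helper; assumes a_1, ..., a_n are all positive.)
eval_continued_fraction :: [Integer] -> Rational
eval_continued_fraction []       = error "empty continued fraction"
eval_continued_fraction [a]      = fromInteger a
eval_continued_fraction (a : as) =
    fromInteger a + 1 / eval_continued_fraction as</code>
</pre>
<p>For instance, <code>eval_continued_fraction [1, 2, 2, 2]</code> works
out to \(1 + \frac{1}{2 + \frac{1}{2 + \frac{1}{2}}} =
\frac{17}{12}\).</p>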
<p>The theory of continued fractions is a rich and beautiful one but
for now we'll just state a few facts:</p>
<ul>
<li>The continued fraction expansion of a number is (mostly) unique.</li>
<li>The continued fraction expansion of a rational number is
finite.</li>
<li>The continued fraction expansion of an irrational number is
infinite.</li>
<li>A <a href="http://en.wikipedia.org/wiki/Quadratic_surd">quadratic
surd</a> is a number of the form \(\frac{a + \sqrt{b}}{c}\)
where
\(a\), \(b\), and \(c\) are integers. Except
maybe for the first term, the continued fraction expansion of a
quadratic surd is periodic; that is, it repeats forever after a
certain number of terms. This applies in particular to the square root
of an integer.</li>
<li>Truncating an infinite continued fraction to get a finite
continued fraction gives (in some sense) an optimal rational
approximation to the irrational number represented by the infinite
continued fraction.</li>
</ul>
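<p>The second fact is easy to see in code: expanding a rational number
is just the Euclidean algorithm, which always terminates. Here is a
sketch (the name <code>rational_continued_fraction</code> is made up,
and it assumes a positive denominator):</p>
<pre>
<code class="language-haskell">-- Continued fraction expansion of the rational number p / q, q > 0.
-- Each step peels off the integer part and continues with the
-- reciprocal of the remainder, exactly like the Euclidean algorithm,
-- so the resulting list is always finite.
-- (Hypothetical helper, not part of the program developed below.)
rational_continued_fraction :: Integer -> Integer -> [Integer]
rational_continued_fraction p q
    | r == 0    = [a]
    | otherwise = a : rational_continued_fraction q r
    where
      (a, r) = p `divMod` q</code>
</pre>
<p>For example, <code>rational_continued_fraction 17 12</code> gives
<code>[1,2,2,2]</code>. The "(mostly)" in the first fact refers to the
last term: \(\left \langle 1; 2, 2, 2 \right \rangle\) and \(\left
\langle 1; 2, 2, 1, 1 \right \rangle\) represent the same number.</p>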
<p>Given a quadratic surd it is fairly easy to manipulate it into the
form \(a + \frac{1}{q}\) where \(q\) is another quadratic surd. This fact
can be used to come up with an algorithm to find the continued
fraction expansion of a square
root. Wikipedia <a href="http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Continued_fraction_expansion">explains
it pretty well</a> so I won't go over it, but here is my Haskell
implementation:</p>
<pre>
<code class="language-haskell">sqrt_continued_fraction n = [ a_i | (_, _, a_i) <- mdas ]
    where
      mdas = iterate get_next_triplet (m_0, d_0, a_0)

      m_0 = 0
      d_0 = 1
      a_0 = truncate $ sqrt $ fromIntegral n

      get_next_triplet (m_i, d_i, a_i) = (m_j, d_j, a_j)
          where
            m_j = d_i * a_i - m_i
            d_j = (n - m_j * m_j) `div` d_i
            a_j = (a_0 + m_j) `div` d_j</code>
</pre>
<p>and here are some examples:</p>
<pre class="console">
Prelude Main> take 20 $ sqrt_continued_fraction 2
[1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2]
Prelude Main> take 20 $ sqrt_continued_fraction 103
[10,6,1,2,1,1,9,1,1,2,1,6,20,6,1,2,1,1,9,1]
Prelude Main> take 20 $ sqrt_continued_fraction 36
[6,*** Exception: divide by zero
</pre>
<p>(Note that we're assuming that we won't be called with a perfect
square. Also, do you notice anything interesting about the periodic
portion of the continued fractions, particularly of \(\sqrt{103}\)?)</p>
<p>For those who are unfamiliar with Haskell, here's a quick list of key facts:</p>
<ul>
<li>The first line takes a list of triplets and forms a list of all
third elements, which is what we're interested in. (The other two
elements of the triplet are auxiliary variables used by the
algorithm.)</li>
<li><code>iterate</code> is a function which takes in another
function <code>f</code>, an initial variable <code>x</code>, and
returns the infinite list <code>[ x, f(x), f(f(x)), f(f(f(x))),
... ]</code>.</li>
<li>Note that Haskell
uses <a href="http://en.wikipedia.org/wiki/Lazy_evaluation">lazy
evaluation</a> and so this function does not take an infinite amount
of time to run; all its elements are evaluated (and memoized) only
when needed.</li>
<li>The rest of the function is a straightforward representation of
the meat of the algorithm described in the above Wikipedia entry.</li>
</ul>
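<p>To see <code>iterate</code> and lazy evaluation on their own, here is
a tiny sketch unrelated to the algorithm (the name
<code>powers_of_two</code> is made up for illustration):</p>
<pre>
<code class="language-haskell">-- The infinite list [1, 2, 4, 8, ...]. Defining it costs nothing;
-- elements are only computed when something asks for them.
powers_of_two :: [Integer]
powers_of_two = iterate (* 2) 1</code>
</pre>
<p>Evaluating <code>take 5 powers_of_two</code> yields
<code>[1,2,4,8,16]</code> and never touches the rest of the list.</p>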
<p>It may not be clear what \(\sqrt{d}\) and its continued fraction
expansion have to do with solving Pell's equation. However, notice that
if \(x\) and \(y\) solve Pell's equation then manipulating Pell's equation
to get \(\sqrt{d}\) on one side reveals that \(\frac{x}{y}\) is a good
approximation of \(\sqrt{d}\). In fact, it is so good that you can prove
that \(\frac{x}{y}\) <em>must</em> come from truncating the continued
fraction expansion of \(\sqrt{d}\).</p>
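<p>Here is a sketch of that manipulation, assuming \(x\) and \(y\) are
positive. Factoring the left-hand side of Pell's equation gives \[ (x -
y\sqrt{d})(x + y\sqrt{d}) = 1 \quad\Longrightarrow\quad \frac{x}{y} -
\sqrt{d} = \frac{1}{y(x + y\sqrt{d})}. \] Since \(x \gt y\sqrt{d}\), the
denominator on the right exceeds \(2y^2\sqrt{d}\), so \[ 0 \lt
\frac{x}{y} - \sqrt{d} \lt \frac{1}{2y^2\sqrt{d}} \lt \frac{1}{2y^2}. \]
A classical theorem of Legendre states that any fraction
\(\frac{x}{y}\) within \(\frac{1}{2y^2}\) of an irrational number is one
of the truncations of its continued fraction expansion.</p>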
<p>This leads us to the following: if you have an infinite continued
fraction \(\left \langle a_0; a_1, a_2, \ldots \right \rangle\) you can
truncate it into a finite continued fraction \(\left \langle a_0; a_1,
a_2, \ldots, a_i \right \rangle\) and simplify it into the rational
number \(\frac{p_i}{q_i}\). The sequence \(\frac{p_0}{q_0},
\frac{p_1}{q_1}, \frac{p_2}{q_2}, \ldots\) forms the
<a href="http://en.wikipedia.org/wiki/Convergent_%28continued_fraction%29"><em>convergents</em></a>
of \(\left \langle a_0; a_1, a_2, \ldots \right \rangle\) and converges to
its represented irrational number.</p>
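<p>As a worked example, take \(\sqrt{2} = \left \langle 1; 2, 2, 2,
\ldots \right \rangle\). Its first few truncations are \[ \left \langle
1 \right \rangle = \frac{1}{1}, \quad \left \langle 1; 2 \right \rangle
= 1 + \frac{1}{2} = \frac{3}{2}, \quad \left \langle 1; 2, 2 \right
\rangle = 1 + \frac{1}{2 + \frac{1}{2}} = \frac{7}{5}, \quad \left
\langle 1; 2, 2, 2 \right \rangle = \frac{17}{12}, \] which are \(1\),
\(1.5\), \(1.4\), and \(1.4166\ldots\), landing alternately below and
above \(\sqrt{2} \approx 1.41421\).</p>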
<p>It turns out you can calculate \(p_{i+1}\) and \(q_{i+1}\)
efficiently from \(p_i\), \(q_i\), \(p_{i-1}\), \(q_{i-1}\), and \(a_{i+1}\)
using
the <a href="http://en.wikipedia.org/wiki/Fundamental_recurrence_formulas"><em>fundamental
recurrence formulas</em></a> (which can be proved by induction). Here
is my Haskell implementation:</p>
<pre>
<code class="language-haskell">get_convergents (a_0 : a_1 : as) = pqs
    where
      pqs = (p_0, q_0) : (p_1, q_1) :
            zipWith3 get_next_convergent pqs (tail pqs) as

      p_0 = a_0
      q_0 = 1
      p_1 = a_1 * a_0 + 1
      q_1 = a_1

      get_next_convergent (p_i, q_i) (p_j, q_j) a_k = (p_k, q_k)
          where
            p_k = a_k * p_j + p_i
            q_k = a_k * q_j + q_i</code>
</pre>
<p>and some more examples:</p>
<pre class="console">
Prelude Main> take 8 $ get_convergents $ sqrt_continued_fraction 2
[(1,1),(3,2),(7,5),(17,12),(41,29),(99,70),(239,169),(577,408)]
Prelude Main> take 8 $ get_convergents $ sqrt_continued_fraction 103
[(10,1),(61,6),(71,7),(203,20),(274,27),(477,47),(4567,450),(5044,497)]
Prelude Main> take 8 $ get_convergents $ sqrt_continued_fraction 1597
[(39,1),(40,1),(1039,26),(1079,27),(2118,53),(3197,80),(27694,693),(113973,2852)]
Prelude Main> let divFrac (x, y) = (fromInteger x) / (fromInteger y)
Prelude Main> take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 2
[1.0,1.5,1.4,1.4166666666666667,1.4137931034482758,1.4142857142857144,1.4142011834319526,1.4142156862745099]
Prelude Main> take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 103
[10.0,10.166666666666666,10.142857142857142,10.15,10.148148148148149,10.148936170212766,10.148888888888889,10.148893360160965]
Prelude Main> take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 1597
[39.0,40.0,39.96153846153846,39.96296296296296,39.9622641509434,39.9625,39.96248196248196,39.9624824684432]
</pre>
<p>Here are a few more quick facts to help those unfamiliar with
Haskell:</p>
<ul>
<li>The expression <code>a : as</code> forms a new list from the
element <code>a</code> and the existing list <code>as</code>
(equivalent to <code>cons</code> in Lisp).</li>
<li><code>zipWith3</code> is a function that takes in a
function <code>f</code>, three lists <code>a</code>, <code>b</code>,
and <code>c</code> of the same (possibly infinite)
length <code>n</code>, and forms the new list
<code>[ f(a[0], b[0], c[0]), f(a[1], b[1], c[1]), ..., f(a[n-1], b[n-1],
c[n-1]) ]</code>.</li>
<li>Note that the result of <code>zipWith3</code> is part of the
variable <code>pqs</code> which itself appears (twice!) in the
arguments to <code>zipWith3</code>. This is a Haskell idiom and
reflects the fact that the recurrence formulas define a convergent
in terms of its two previous convergents. A simpler example (using
the Fibonacci sequence) can be found in the
<a href="http://en.wikipedia.org/wiki/Lazy_evaluation">Wikipedia
entry for lazy evaluation</a>.</li>
<li>Haskell has built-in data types for integers of arbitrary size,
which are necessary as the numerators and denominators of the
convergents get large quickly. In fact, Haskell has built-in
data types for rational numbers (represented as fractions) but they
don't help us much here.</li>
</ul>
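<p>For reference, the self-referential list idiom mentioned above looks
like this in its simplest form, using the Fibonacci sequence (the name
<code>fibs</code> is just the conventional one for this example):</p>
<pre>
<code class="language-haskell">-- Each element past the first two is the sum of the two before it.
-- The list refers to itself (and to its own tail), which is fine
-- under lazy evaluation because earlier elements are already known
-- by the time later ones are demanded.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)</code>
</pre>
<p>Here <code>take 10 fibs</code> gives
<code>[0,1,1,2,3,5,8,13,21,34]</code>. The convergents code has the same
shape, except that each new element also consumes one term of the
continued fraction, hence <code>zipWith3</code> rather than
<code>zipWith</code>.</p>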
<p>Since we are guaranteed that some convergent eventually satisfies
Pell's equation, we can write a simple function to generate all
convergents, test each one to see if it satisfies Pell's equation,
and return the first one we see. Here is the Haskell implementation:</p>
<pre>
<code class="language-haskell">get_pell_fundamental_solution n = head $ solutions
    where
      solutions = [ (p, q) | (p, q) <- convergents, p * p - n * q * q == 1 ]
      convergents = get_convergents $ sqrt_continued_fraction n</code>
</pre>
<p>Note the use of
Haskell's <a href="http://en.wikipedia.org/wiki/List_comprehension">list
comprehension</a> syntax, similar to Python's, which expresses what I
just described in a manner reminiscent of set notation.</p>
<p>Here is the full Haskell program, designed so its output may be
conveniently piped
to <a href="http://en.wikipedia.org/wiki/Bc_programming_language">bc</a>
for verification:</p>
<pre>
<code class="language-haskell">module Main where

import System.Environment (getArgs)

sqrt_continued_fraction :: (Integral a) => a -> [a]
{- ... the sqrt_continued_fraction function explained above ... -}

get_convergents :: (Integral a) => [a] -> [(a, a)]
{- ... the get_convergents function explained above ... -}

get_pell_fundamental_solution :: (Integral a) => a -> (a, a)
{- ... the get_pell_fundamental_solution function explained above ... -}

main :: IO ()
main = do
  -- The first command-line argument is d.
  args <- getArgs
  let d = read (head args) :: Integer
      (p, q) = get_pell_fundamental_solution d
  putStr $ "d = " ++ show d ++ "\n" ++
           "p = " ++ show p ++ "\n" ++
           "q = " ++ show q ++ "\n" ++
           "p^2 - d * q^2 == 1\n"</code>
</pre>
<p>and here it is in action:</p>
<pre class="console">
$ ./solve_pell 1597
d = 1597
p = 519711527755463096224266385375638449943026746249
q = 13004986088790772250309504643908671520836229100
p^2 - d * q^2 == 1
</pre>
<p>The solution to the original problem is therefore \(s = q^2\):<br/>
<strong>13004986088790772250309504643908671520836229100<sup>2</sup>,</strong>
a 93-digit number for which \(1597s + 1 = p^2\).</p>
<p>Now that we've found a method to get <em>a</em> solution, the
question remains as to whether it's the only one. In fact it is not,
but it is the minimal one, and all other solutions (of which there are
an infinite number) can be generated from this fundamental one with a
simple recurrence relation as described on
the <a href="http://en.wikipedia.org/wiki/Pell%27s_equation#Solution_technique">Wikipedia
article</a>. My program above can be easily extended to generate all
solutions instead of just the fundamental one (I'll leave it to the
reader as an exercise).</p>
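<p>For the curious, here is one possible sketch of that extension. It
assumes the standard recurrence \(x_{k+1} = x_1 x_k + d y_1 y_k\),
\(y_{k+1} = x_1 y_k + y_1 x_k\), where \((x_1, y_1)\) is the fundamental
solution; the name <code>pell_solutions</code> is made up, and the
function takes the fundamental solution as an argument rather than
computing it:</p>
<pre>
<code class="language-haskell">-- Given d and the fundamental solution (x_1, y_1) of
-- x^2 - d * y^2 = 1, produce the infinite list of positive solutions
-- in increasing order, starting with the fundamental one.
-- (Hypothetical helper; it does not check that its input really is
-- the fundamental solution.)
pell_solutions :: Integer -> (Integer, Integer) -> [(Integer, Integer)]
pell_solutions d (x_1, y_1) = iterate next (x_1, y_1)
    where
      next (x_k, y_k) = (x_1 * x_k + d * y_1 * y_k,
                         x_1 * y_k + y_1 * x_k)</code>
</pre>
<p>For \(d = 2\), whose fundamental solution is \((3, 2)\),
<code>take 3 $ pell_solutions 2 (3, 2)</code> gives
<code>[(3,2),(17,12),(99,70)]</code>, all of which appear among the
convergents of \(\sqrt{2}\) listed earlier.</p>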
<p>One remaining question is the efficiency of this algorithm. For
simplicity, let's neglect the cost of the arbitrary-precision
arithmetic involved and assume that the incremental cost of generating
each term of the continued fraction expansion and the convergents is
constant. Then the main cost is just how many convergents we have to
generate before we find one that satisfies Pell's equation. In fact,
it turns out that this depends on the length of the period of the
continued fraction expansion of \(\sqrt{d}\), which has a rough upper
bound of \(O(\sqrt{d} \ln d)\). Therefore, the cost of solving Pell's
equation (in terms of how many convergents to generate) for a given
\(n\)-bit number \(d \approx 2^n\) is \(O(n 2^{n/2})\), since then
\(\sqrt{d} \approx 2^{n/2}\) and \(\ln d\) is proportional to \(n\).
This is pretty expensive already,
although it's still much better than brute-force search (which is on
the order of exponentiating the above expression). Can we do better?
Well, sort of; it turns out the length of the answer is of the same
order as the expression above, so any algorithm that explicitly
outputs a solution necessarily takes that long. However, if you can
somehow factor \(d\) into \(s d'\), where \(s\) is a perfect square and \(d'\)
is <a href="http://en.wikipedia.org/wiki/Squarefree">squarefree</a>
(i.e., not divisible by any perfect square), then you can solve Pell's
equation for the smaller number \(d'\) and output the solution for \(d\)
as the smaller fundamental solution and an expression raised to a
certain power involving it. Note that in general this involves
factoring \(d\), another hard problem, but one for which there exist tons
of prior work. An interested reader can peruse the papers
by <a href="http://www.ams.org/notices/200202/fea-lenstra.pdf">Lenstra</a>
and <a href="http://www.math.nyu.edu/~crorres/Archimedes/Cattle/cattle_vardi.pdf">Vardi</a>
for more details.</p>
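<p>To illustrate just the decomposition \(d = s d'\) mentioned above,
here is a naive sketch using trial division. The name
<code>squarefree_split</code> is made up, and this approach is only
practical for small \(d\); it is not what the papers above propose for
large inputs:</p>
<pre>
<code class="language-haskell">-- Split a positive integer d into (s, d') with d == s * d', where s
-- is a perfect square and d' is squarefree, by repeatedly dividing
-- out f^2 for f = 2, 3, 4, ... while f^2 still fits in what remains.
-- (Hypothetical helper; trial division is slow for large d.)
squarefree_split :: Integer -> (Integer, Integer)
squarefree_split d = go d 2 1
    where
      go m f s
          | f * f > m            = (s, m)
          | m `mod` (f * f) == 0 = go (m `div` (f * f)) f (s * f * f)
          | otherwise            = go m (f + 1) s</code>
</pre>
<p>For example, <code>squarefree_split 72</code> gives
<code>(36,2)</code>, while <code>squarefree_split 1597</code> gives
<code>(1,1597)</code> since 1597 is prime, so this trick doesn't help
with the original problem.</p>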
<p>As a final note, one of the things I really like about number
theory is that investigating such a simple problem can lead you down
surprising avenues of mathematics and computational theory. In fact,
I've had to omit a lot of things I had planned to say to avoid growing
this entry to be longer than it already is. Hopefully, this entry
helps someone else learn more about this interesting corner of number
theory.</p>
<section class="footnotes">
<p id="fn1">[1] As a rule we'll avoid considering trivial cases and
re-stating obvious assumptions (like \(d\) having to be a positive
integer). <a href="#r1">↩</a></p>
</section>