Why is the Quintic Unsolvable?

Fred Akalin

September 26, 2016

(This was discussed on r/math and Hacker News.)

1. Overview

In this article, I hope to convince you that the quintic equation is unsolvable, in the sense that I can’t write down the solution to the equation \[ ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0 \] using only addition, subtraction, multiplication, division, raising to an integer power, and taking an integer root. In fact, I hope to go further and explain how this is true for the same reason that I can’t write down the solution to the equation \[ ax^2 + bx + c = 0 \] using only the first five operations above!

The usual approach to the above claim involves a semester’s worth of abstract algebra and Galois theory. However, there’s a much easier and shorter proof which involves only a bit of group theory and complex analysis—enough to fit in a blog post—and some interactive visualizations.^[1]

2. Quadratic Equations

Let’s start with quadratic equations, which hopefully you all remember from high school. Given two complex numbers \(r_1\) and \(r_2\), you can determine the quadratic equation whose solutions are \(r_1\) and \(r_2\), namely \[ (x - r_1)(x - r_2) = x^2 - (r_1 + r_2) x + r_1 r_2 = 0\text{.} \] If we take the standard form of a quadratic equation to be \[ a x^2 + bx + c = 0\text{,} \] then we can define a function from \(r_1\) and \(r_2\) to \(a\), \(b\), and \(c\), which is shown by the first two panels in the visualization below; drag either of the points \(r_1\) and \(r_2\) and notice how \(b\) and \(c\) move (\(a\) will always remain fixed at \(1\)).

Now pretend that we misremember the quadratic formula as \[ x_{1, 2} = \frac{-b ± b^2 - 4ac}{4a}\text{.} \] The results of this formula—our candidate solution—are shown in the third panel. Note that since \(x_1\) and \(x_2\) depend on \(a\), \(b\), and \(c\), which all depend on \(r_1\) and \(r_2\), they also move when you drag either \(r_1\) and \(r_2\)

Interactive Example 1: An incorrect quadratic formula

Roots

Coefficients

Candidate solution

Now this formula looks right, since \(x_1\) and \(x_2\) are at the same coordinates as \(r_1\) and \(r_2\). However, if you move \(r_1\) or \(r_2\) around, you can easily convince yourself that this formula can’t be right, since \(x_1\) and \(x_2\) don’t move in the same way.

Now if you remember from high school, the real quadratic formula involves taking a square root, and since our candidate solution doesn’t do that, that means it’s probably incorrect. I say “probably” because there’s no immediate reason why there can’t be multiple quadratic formulas, some simpler than others, of which one is simple enough to not need a square root. From manipulating \(r_1\) and \(r_2\), we know that our candidate formula is incorrect, but that doesn’t immediately follow from it not having a square root.

Fortunately, there is a general way to rule out candidate solutions that are similar to the one above, namely those that use only addition, subtraction, multiplication, division, and raising to an integer power; we’ll call these rational expressions. Here’s how it goes: if you press the button to swap \(r_1\) and \(r_2\), which moves \(r_1\) to \(r_2\)’s position and vice versa, \(a\), \(b\), and \(c\) move from their starting positions but return once \(r_1\) and \(r_2\) reach their destinations. This makes sense, because the coefficients of a polynomial don’t depend on how you order the roots. But since \(x_1\) and \(x_2\) depend only on \(a\), \(b\), and \(c\), they too must loop back to their starting positions.

But that means that our candidate solution cannot be the quadratic formula! If it were, then \(x_1\) and \(x_2\) would have ended up swapped, too. Instead, they went back to their starting positions, which is a contradiction. This reasoning holds for any expression which is a single-valued function of \(a\), \(b\), and \(c\), so in particular this holds for rational expressions.

Let’s summarize our reasoning in a theorem:

(Theorem 1.) A rational expression^[2] in the coefficients of the general quadratic equation \[ ax^2 + bx + c = 0 \] cannot be a solution to this equation.

Sketch of proof. Assume to the contrary that the rational expression \(x = f(a, b, c)\) is a solution. Assume that we start with \(r_1 = z_1\) and \(r_2 = z_2 \ne z_1\), and without loss of generality assume that we start with \(x = z_1\).

Run \(r_1\) and \(r_2\) along continuous paths that swap their two positions, i.e. make \(r_1\) head from \(z_1\) to \(z_2\) continuously, and at the same time make \(r_2\) head from \(z_2\) to \(z_1\) continuously, and make sure to pick paths such that \(r_1\) and \(r_2\) never coincide.

Since \(a\), \(b\), and \(c\) are continuous functions of \(r_1\) and \(r_2\), and \(x\) is a rational function of \(a\), \(b\) and \(c\), and thus continuous, \(x\) then depends continuously on \(r_1\) and \(r_2\). Thus, since we start with \(x = r_1 = z_1\), and \(r_1\) never coincides with \(r_2\), then as \(r_1\) moves, \(x = r_1\) must continue to hold, since \(x\) is a solution, and therefore \(x\)’s final position must be the same as \(r_1\)’s, which is \(z_2\).

However, since the coefficients \(a\), \(b\), and \(c\) don’t depend on the ordering of \(r_1\) and \(r_2\), then their final positions are the same as their initial positions. Since \(x\) is a function of only \(a\), \(b\), and \(c\), its final position also must be the same as its initial position, \(z_1\). This contradicts the above, and therefore \(x\) cannot be a solution. ∎

Now consider the candidate solution \[ x_{1,2} = \sqrt{b^2 - 4ac}\text{.} \] This isn’t a rational expression since it has a square root. In particular, in the visualization below, it behaves quite differently from our first candidate solution. First, even though we have just a single expression, it yields two points \(x_1\) and \(x_2\). Second, and more surprisingly, if you swap \(r_1\) and \(r_2\), \(x_1\) and \(x_2\) also exchange places, seemingly contradicting Theorem 1! What is going on?

Interactive Example 2: The quadratic equation

Roots

Coefficients

Candidate solution

\(x_{1, 2} = \frac{-b \pm b^2 - 4ac}{4a}\)
\(x_1 = b^2 - 4ac\)
\(x_{1, 2} = \sqrt{b^2 - 4ac}\)
\(x_{1, 2} = \frac{-b + \sqrt{b^2 - 4ac}}{2a}\)
(the quadratic formula)

To answer this, we first need to review some facts about complex numbers. Recall that a complex number \(z\) can be expressed in polar coordinates, where it has a length \(r\) and an angle \(θ\), and that it can be converted to the usual Cartesian coordinates using Euler’s formula: \[ z = r e^{iθ} = r \cos θ + i \, r \sin θ\text{.} \] Then, if you have two complex numbers \(z_1 = r_1 e^{iθ_1}\) and \(z_2 = r_2 e^{iθ_2}\) in polar form, you can multiply them by multiplying their lengths, and adding their angles: \[ z_1 z_2 = r_1 r_2 e^{i (θ_1 + θ_2)}\text{.} \] So a square root of a complex number \(z = r e^{iθ}\) is just \(\sqrt{r} e^{iθ/2}\), as you can easily verify. However, if \(z\) is non-zero, there is one more square root of \(z\), namely \(\sqrt{r} e^{i (θ/2 + π)}\), as you can also verify. (Recall that angles that differ by \(2π = 360^\circ\) are considered the same.)

So in general, the square root of a rational expression, like our candidate solution, yields two distinct points as long as the rational expression is non-zero. In our case, \(b^2 - 4ac\) remains non-zero as \(r_1\) and \(r_2\) don’t coincide. (We’ll have more to say about this expression, called the discriminant, once we talk about cubic equations below.) Therefore, if we want to examine how \(x_1\) and \(x_2\) move as \(r_1\) and \(r_2\) move, we have to number the square roots of \(b^2 - 4ac\), and we have to keep this numbering consistent.

To do so, we have to do two things: we have to vary \(r_1\) and \(r_2\) only continuously, and we have to vary \(r_1\) and \(r_2\) such that they never coincide. If we do this, then we can intuitively “lift” the expression \(b^2 - 4ac\) from the complex plane to a new surface \(S\) where we consider only angles that differ by \(4π = 720^\circ\), rather than \(2π\), to be the same. In this space, we can take the “first” square root of a non-zero complex number to be the one with half the angle, and the “second” square root to be the one with half the angle plus \(π\), and have these two square root functions behave continuously as their argument goes around the origin.

Figure 1 \(S\), which is the Riemann surface of \(\sqrt{z}\). (Image by Leonid 2 licensed under CC BY-SA 3.0.)

Now this answers the question of why the proof of Theorem 1 fails for \(\sqrt{b^2 - 4ac}\). \(a\), \(b\), and \(c\), go around a single loop as \(r_1\) is swapped with \(r_2\), and therefore \(b^2 - 4ac\) goes around a single loop in the complex plane, but when \(b^2 - 4ac\) is lifted to \(S\), the final position of \(b^2 - 4ac\) differs from the initial position only by an angle of \(2π\), so it is distinct from the initial position, and thus we can’t conclude that the final position of \(\sqrt{b^2 - 4ac}\) is the same as the initial position.

Similar reasoning holds for any algebraic expression that isn’t a rational expression, i.e. ones that involve taking any integer root, so Theorem 1 cannot apply to algebraic expressions in general. Of course, this is consistent with what we know about the quadratic formula, since we know that it has a square root!

3. Cubic Equations

Now we can move on to cubic equations. Similarly, given three complex numbers \(r_1\), \(r_2\), and \(r_3\), you can determine the cubic equation with those solutions, namely \[ (x - r_1) (x - r_2) (x - r_3) = x^3 - (r_1 + r_2 + r_3) x^2 + (r_1 r_2 + r_1 r_3 + r_2 r_3) x - r_1 r_2 r_3\text{,} \] and so we can define a function from \(r_1\), \(r_2\), and \(r_3\) to \(a\), \(b\), \(c\), and \(d\), where \[ a x^3 + b x^2 + c x + d \] is the standard form of a cubic polynomial, and this is shown in the visualization below.

In the previous section, we talked about the discriminant \(b^2 - 4ac\) of the general quadratic polynomial. However, the discriminant is an expression that is defined for any polynomial. If \(r_1, \dotsc, r_n\) are the roots of a polynomial (counting multiplicity) with leading coefficient \(a_n\), then the discriminant is \[ Δ = a_n^{2n - 2} ∏_{i \lt j} (r_i - r_j)^2\text{.} \] In other words, the discriminant is, up to sign and a power of the leading coefficient, the product of the differences of all pairs of different roots. In particular, if the polynomial has repeated roots, the discriminant is zero.

Using the formula above, you can express the discriminant in terms of the coefficients of the polynomial, as you can verify for yourself with the quadratic equation. Indeed this is true in general; for cubic polynomials, the discriminant can be expressed in terms of the coefficients as \[ Δ = b^2 c^2 - 4 a c^3 - 4 b^3 d - 27 a^2 d^2 + 18 a b c d\text{.} \] But why do we care? Because, as you can see in the visualization below, if you swap any pair of roots, this causes the discriminant to make a single loop around the origin, so it serves as a useful test functions for taking roots.

So now that we have three roots, we can swap them in multiple ways. If \(R\) is a list that starts off as \(\langle r_1, r_2, r_3 \rangle\), let \(↺_{i, j}\) denote counter-clockwise paths that takes the root at the \(i\)th index of \(R\) to the one at the \(j\)th index of \(R\) and vice versa, and similarly for \(↻_{i, j}\). (Note that this is not the same as the paths that swap \(r_i\) and \(r_j\)! Play around with the buttons in the visualization below to understand the difference.)

Interactive Example 3: The cubic discriminant

Roots

\(R = \langle r_1, r_2, r_3 \rangle\)

Coefficients

Candidate solution

\(x_1 = Δ\)
\(x_{1, 2, 3, 4, 5} = \sqrt[5]{Δ}\)

Now, with the formula \(Δ\), the same reasoning as in the previous section shows that it cannot possibly be the cubic formula, nor can any other rational expression. However, unlike the quadratic case, we can also rule out \(\sqrt[5]{Δ}\), or any other algebraic formula with no nested radicals (i.e., that doesn’t have a radical within a radical like \(\sqrt{a - \sqrt{bc - 5}}\)). If you apply the operations \(↺_{2, 3}\), \(↺_{1, 2}\), \(↻_{2, 3}\), and \(↻_{1, 2}\) in sequence, \(r_1\), \(r_2\), and \(r_3\) rotate among themselves, but all the \(x_i\) go back to their original positions. Therefore, by similar reasoning as the previous section, \(\sqrt[5]{Δ}\) also cannot possibly be the cubic formula!

To make this statement precise, we need to review some group theory. Recall that a group is a set with an associative binary operation, an identity element, and inverse elements. Most basic examples of groups are related to numbers, like the integers under addition, or the non-zero rationals under multiplication. However, more interesting examples of groups are related to functions, none the least because the group operation for functions is composition, which is in general not commutative; in other words, if \(f\) and \(g\) are functions, \(f \circ g \ne g \circ f\), and it is this non-commutativity that will come in handy for our purposes.

So let’s say we have a list of \(n\) objects, and we’re interested in the functions that rearrange this list’s elements. These are permutations, and they naturally form a group under composition, as you can check for yourself, called \(S_n\), the symmetric group on \(n\) objects.

There’s a convenient way to write permutations, called cycle notation. If you write \((i_1 \; i_2 \; \dotsc \; i_k)\), this denotes the permutation that maps the \(i_1\)th position of the list to the \(i_2\)th position the \(i_2\)th position to the \(i_3\)th, and so on, called a cycle. Then you can write any permutation as a composition of disjoint cycles, so this provides a convenient way to write down and compute with permutations.

In the visualization above, we have four operations \(↺_{1, 2}\), \(↺_{2, 3}\), \(↻_{1, 2}\), and \(↻_{2, 3}\), which act on \(R\), meaning that they define permutations on \(R\). In particular, \(↺_{1, 2}\) and \(↻_{1, 2}\) both swap the first and second elements of \(R\), so we say that \(↺_{1, 2}\) and \(↻_{1, 2}\) act on \(R\) as \((1 \; 2)\), and similarly, \(↺_{2, 3}\) and \(↻_{2, 3}\) act on \(R\) as \((2 \; 3)\).

Now concatenating two operations—doing one after the other—corresponds to composing their mapped-to permutations on \(R\). Denoting \(o_2 * o_1\) as doing \(o_1\), then doing \(o_2\), the sequence of operations above is \(↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}\) (note the order!), which acts on \(R\) like \((1 \; 2) (2 \; 3) (1 \; 2) (2 \; 3)\), which is equal to \((1 \; 3 \; 2)\).^[3] (The \(\circ\) is usually dropped when composing permutations.)

Now for the formula \(Δ\), all the operations make \(x_1\) loop around the origin either clockwise or counter-clockwise; in other words, they all induce a rotation of \(2π\) or \(-2π\) on \(x_1\), and the final distance of \(x_1\) from the origin is the same as the initial distance. Therefore, if we apply an equal number of clockwise and counter-clockwise rotations, the total angle of rotation will be \(0\) and the final distance will be the same as the initial distance, i.e. the final position of \(x_1\) is the same as it’s initial distance. But the same reasoning holds for the formula \(\sqrt[5]{Δ}\); all the operations induce a rotation of \(2π/5\) or \(-2π/5\) and leave the distance from the origin unchanged, so an equal number of clockwise and counter-clockwise rotations will still induce a total angle of \(0\) and leave the distance from the origin unchanged. Therefore, the operation \(↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}\) acts like \((1 \; 3\; 2)\) on \(R\), but leaves all \(x_i\) unchanged.

But how did we come up with \(↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}\) in the first place? This involves a bit more group theory. \(S_3\) is not a commutative group; in particular, \((1 \; 2) (2 \; 3) \ne (2 \; 3) (1 \; 2)\). For two group elements \(g\) and \(h\), we can define their commutator^[4] \([ g, h ]\), which is the group element that corrects for \(g\) and \(h\) not commutating. That is, we want the equation \[ g h = h g [g, h] \] to hold, which means that \[ [g, h] = g^{-1} h^{-1} g h\text{.} \] So the commutator provides a convenient way to generate a non-trivial permutation from two other non-commuting permutations. Furthermore, it involves two appearances of both elements, so we can pick a sequence of operations that induce the commutator and also have an equal number of clockwise and counter-clockwise operations. Then we’re guaranteed that this sequence of operations permutes \(R\) and leaves all \(x_i\) unchanged, even if each individual operation moves some \(x_i\). But of course, this is just \(↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}\)!

Let’s define some terminology to make proofs and discussion easier. If \(o\) is an operation that acts on \(R\) non-trivially but has the final position of the expression \(x = f(a, b, c, \dotsc)\) the same as its initial position, we say that \(o\) rules out the expression \(x = f(a, b, c, \dotsc)\). For example, Theorem 1 says that swapping both roots of a quadratic rules out all rational expressions.

Now we’re ready to state and prove the theorem:

(Theorem 2.) An algebraic expression with no nested radicals in the coefficients of the general cubic equation \[ ax^3 + bx^2 + cx + d = 0 \] cannot be a solution to this equation.

Sketch of proof. First assume to the contrary that the expression \(x = \sqrt[k]{r(a, b, c, d)}\) is a solution, where \(r(a, b, c, d)\) is a rational expression. Assume we start with \(r_1 = z_1\), \(r_2 = z_2\), and \(r_3 = z_3\), where all \(z_i\) are distinct, and without loss of generality assume that we start with \(x = z_1\).

Any of the operations \(↺_{1, 2}\), \(↺_{2, 3}\), \(↻_{1, 2}\), and \(↻_{2, 3}\) applied to \(x = r(a, b, c, d)\) cause \(x\)’s final position to be the same as its initial position, by Theorem 1. Pick a point \(z_0\) that is never equal to any point \(x\) traverses under any operation. Then, by the same reasoning as above, the total angle induced by \(↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}\) on \(x = \sqrt[k]{r(a, b, c, d)}\) around \(z_0\) is \(0\), and the distance from \(z_0\) remains unchanged. Thus \(x\) remains fixed, and this operation rules out \(x = \sqrt[k]{r(a, b, c, d)}\).

For the general case, it suffices to show that if \(o\) rules out the expressions \(f\) and \(g\), then \(o\) also rules out \(f\) raised to an integer power, \(f + g\text{,}\) \(f - g\text{,}\) \(f \cdot g\text{,}\) and \(f / g\) where \(g \ne 0\text{.}\) But this is straightforward, and such formulas are just the algebraic expressions with no nested radicals, so the statement holds in general. ∎

Theorem 2 can be summarized thus: any \(↺_{i, j}\) or \(↻_{i, j}\) rules out any rational expression as the cubic formula, and if given an algebraic expression with no nested radicals, either some \(↺_{i, j}\) or \(↻_{i, j}\) rules it out, or \(↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}\) rules it out.

Now we can consider algebraic expressions with one level of nesting. Can such formulas be ruled out as being the cubic formula? We can’t do so via Theorem 2, at least; we would need a non-trivial element of \(S_3\) that is the commutator of commutators. But you can calculate that all non-trivial commutators of \(S_3\) are either \((3 \; 2 \; 1)\) or \((1 \; 2\; 3)\), and these two elements commute, so \(S_3\) cannot have a non-trivial commutator of commutators.

In fact, as we would expect, the actual cubic formula has such an algebraic expression, which is \(C\) in the visualization below, so that serves as a convenient example of an algebraic expression with a single nested radical that can’t be ruled out by Theorem 2.

Interactive Example 4: The cubic equation

Roots

\(R = \langle r_1, r_2, r_3 \rangle\)

Coefficients

Candidate solution

\(X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle\)
\(x_1 = -27a^2 Δ = {Δ_1}^2 - 4 {Δ_0}^3\)
\(x_{1, 2} = C^3 = \frac{Δ_1 + \sqrt{-27a^2 Δ}}{2}\)
\(x_{1,2,3,4,5,6} = C\)
\(x_{1, 2, 3} = -\frac{1}{3a} \left( b + C + \frac{Δ_0}{C} \right)\)
(the cubic formula)

Note that there is a new list \(X\), which lists the \(x_i\) in the order which they occupy their initial positions, like how \(R\) does the same for the \(r_i\). In general, we can’t do this, since a general multi-valued function won’t necessarily permute that \(x_i\) among themselves, but in the interactive visualizations we’ll only consider expressions that do.

We can then talk how an operation acts on \(X\). For example, if we pick \(\sqrt[5]{Δ}\) in Interactive Example 3, we can say that \(↺_{i, j}\) acts like \((5 \; 1 \; 2 \; 3 \; 4)\) on \(X\) and \(↻_{i, j}\) acts like \((1 \; 2 \; 3 \; 4 \; 5)\) on \(X\). Therefore, \(↻_{1, 2} * ↻_{2, 3} * ↺_{1, 2} * ↺_{2, 3}\) acts non-trivially on \(R\) but acts trivially on \(X\), which is another more algebraic way of saying that if this operation rules out \(\sqrt[5]{Δ}\), since the action on \(X\) depends on the candidate formula. On the other hand, if you choose \(C\) in the visualization above, you can convince yourself that no operation acts non-trivially on \(R\) without also acting non-trivially on \(X\), and so \(C\) can’t be ruled out as the cubic formula.

4. Quartic Equations

Now we can move on to quartic equations. As usual, given four complex numbers \(r_1\), \(r_2\), \(r_3\), and \(r_4\), you can map this to the coefficients \(a\), \(b\), \(c\), \(d\), and \(e\) of the standard form of a quartic polynomial, as shown in the visualization below, such that the \(r_i\) are the solutions to the quartic equation \[ a x^4 + b x^3 + c x^2 + d x + e = 0\text{.} \]

Now that we have four roots, we have even more ways to permute them using the \(↺_{i, j}\) and \(↻_{i, j}\). Before we move on, we need more terminology and group theory to handle this more complicated case.

First, we want a convenient way to denote the combination of operations that act like a commutator, so let’s define \(↺_{i, j}^\prime\) to mean \(↻_{i, j}\) and vice versa, \((o_1 \circ o_2 \circ \dotsb \circ o_n)^\prime\) to mean \(o_n^\prime \circ o_{n-1}^\prime \circ \dotsb \circ o_1^\prime\), and \([\![ o_1, o_2 ]\!]\) to mean \(o_1^\prime \circ o_2^\prime \circ o_1 \circ o_2\), so that if \(o_i\) acts on \(R\) like \(g_i\), then \(o_i^\prime\) acts on \(R\) like \(g_i^{-1}\) and \([\![o_i, o_j]\!]\) acts on \(R\) like \([g_i, g_j]\). For example, in the previous section, we were using \([\![ ↺_{1, 2}, ↺_{2, 3} ]\!]\) to rule out algebraic expressions with no nested radicals.

Then not only do we want to talk about commutators of particular permutations, we want to talk about the set of commutators of a particular group. In fact, for a group \(G\), this set of commutators forms a subgroup \(K(G)\) called the commutator subgroup. For the quadratic case, we just have \(S_2\), which has only a single non-trivial element, so its commutator subgroup \(K(S_2)\) is the trivial group. For the cubic case, we started with \(S_3\), and we computed the commutator subgroup \(K(S_3)\), which is just \(\{ e, (1 \; 2 \; 3), (3 \; 2 \; 1) \}\). We can also compute the commutator of this group, which is just the trivial group again, since \(K(S_3)\) is commutative. So we can see that \(K(K(S_3))\) being the trivial group means that we can’t rule out algebraic expressions with nested radicals as solutions to the general cubic equation.

Given all the elements of a group \(G\), it’s not particularly complicated to compute the commutator subgroup—just take all possible pairs of elements \(g, h \in G\), compute \([g, h]\), and remove duplicates. However, we can make things easier for ourselves by finding generators for \(K(G)\) as commutators of generators of \(G\), since then we can easily map those back to \([\![ o_1, o_2 ]\!]\) applied on the appropriate operations. Fortunately, when \(G = S_n\), we can use a few facts from group theory to easily compute \(K(S_n)\). First, \(K(S_n)\) is called the alternating group \(S_n\), and is generated by the \(3\)-cycles of the form \((i \enspace i+1 \enspace i+2)\), similar to how \(S_n\) is generated by the \(2\)-cycles of the form \((i \enspace i + 1)\). But a \(3\)-cycle \((i \enspace i+1 \enspace i+2)\) can be expressed as the commutator of two \(2\)-cycles \([(i+2 \enspace i+1), (i \enspace i+1)]\).

Therefore, for \(S_4\), the generators for \(K(S_4)\) are just \((1 \; 2 \; 3) = [(2 \; 3), (1 \; 2)]\) and \((2 \; 3 \; 4) = [(3 \; 4), (2 \; 3)]\), with respective operations \([\![ ↺_{2, 3}, ↺_{1, 2} ]\!]\) and \([\![ ↺_{3, 4}, ↺_{2, 3} ]\!]\). However, these two generators are not quite enough to generate \(K^{(2)}(S_4)\) via commutators. Fortunately, it suffices to just add \(↺_{4, 1}\) to the list of operations, which lets us add \((1 \; 4)\) to the list of generators for \(S_4\), and then add \((3 \; 4 \; 1)\) to the list of generators for \(K(S_4)\). Then \((1 \; 4) (2 \; 3) = [(2 \; 3 \; 4), (1 \; 2 \; 3)]\) and \((2 \; 1) (3 \; 4) = [(3 \; 4 \; 1), (2 \; 3 \; 4)]\) suffice to generate \(K^{(2)}(S_4)\).^[5] Finally, we can easily compute \(K^{(3)}(S_4)\) to be the trivial group.

What does that tell us about what expressions we can rule out as solutions to the general quartic equation? Similarly to the cubic case, we expect to be able to rule out rational expressions and algebraic expressions with no nested radicals, and since \(K^{(2)}(S_4)\) is not the trivial group, we also expect to be able to rule out algebraic expressions with singly-nested radicals, like \(\sqrt{a - \sqrt{bc - 4}}\). But since \(K^{(3)}(S_4)\) is the trivial group, we don’t expect to be able to rule out algebraic expressions with doubly-nested radicals, like \(\sqrt{a - \sqrt{bc - \sqrt{d + 3}}}\).

As an antidote to all the abstractness above, here is a visualization for quartics, where you can examine how the various operations interact with the quartic formula and its subexpressions.

Interactive Example 5: The quartic equation

Roots

\(R = \langle r_1, r_2, r_3, r_4 \rangle\)

Coefficients

Candidate solution

\(X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle\)

\(x_1 = -27 Δ\)
\(x_{1, 2} = Q^3 = \frac{Δ_1 + \sqrt{-27 Δ}}{2}\)
\(x_{1, 2, 3, 4, 5, 6} = Q\)
\(x_{1, 2, 3, 4, 5, 6} = S =\)
\(\qquad \frac{1}{2} \sqrt{-\frac{2}{3} p + \frac{1}{3a} \left( Q + \frac{Δ_0}{Q} \right)}\)
\(x_{1, 2, 3, 4} = \)
\(\qquad -\frac{b}{4a} \mp S + \frac{1}{2} \sqrt{-4S^2 - 2p \pm \frac{q}{S}}\)
(the quartic formula)

There are a few additions to the interactive display above. It now prints a message when it detects that the selected expression is ruled out as the quartic formula, which just looks at whether \(R\) is not in order and \(X\) is, and vice versa. There’s also a button to reset the ordering of \(R\) and \(X\).

The second addition is that the operations have been organized to make clear what commutator subgroup they’re in. The \(A_i\) map to generators of \(S_4\). Then taking the commutators of adjacent \(A_i\) give \(B_i\), which map to the generators of \(K(S_4)\), and similarly for \(C_i\).

The third addition is a button that finds the first operation that rules out the selected formula, if any. It simply tries all the \(A_i\)s, then all the \(B_i\)s, then all the \(C_i\)s, checking \(R\) and \(X\) in between. The general algorithm, which assumes a fixed set of roots \(r_1, \dotsc, r_n\text{,}\) takes an expression \(f(a_n, a_{n-1}, \dotsc)\) where \(a_n x^n + a_{n-1} x^{n-1} + \dotsb + a_0 = 0\) is the general \(n\)th-degree polynomial equation, takes a depth limit \(k\), and looks like this (defining \(K^{(0)}(G)\) to be just \(G\)):

For \(i\) from 0 to \(k\):
1. If \(K^{(i)}(S_n)\) is trivial, then terminate indicating that \(f(a_n, a_{n-1}, \dotsc)\) was unable to be ruled out because \(K^{(i)}(S_n)\) is trivial.
2. Otherwise, find operations \(o_1\) to \(o_m\) that act as the generators \(g_1\) to \(g_m\) of \(K^{(i)}(S_n)\). For \(i > 0\), this can be done by applying \([\![ o_1, o_2 ]\!]\) to the operations corresponding to the generators of \(K^{(i-1)}(S_n)\).
3. For each \(o_j\):
  1. Apply \(o_j\).
  2. If \(R\) is not in order but \(X\) is, terminate indicating that \(o_j\) rules out \(f(a_n, a_{n-1}, \dotsc)\).
  3. Undo \(o_j\), i.e. apply \(o_j^\prime\) or reset to the initial state of \(r_1, \dotsc, r_n\).
Terminate indicating that \(f(a_n, a_{n-1}, \dotsc)\) was unable to be ruled out because the depth limit has been reached.

This algorithm basically just implements the proof of the following lemma, which generalizes the previous theorems, except that it tries to find the simplest operation that is a generator that rules out the given expression.

Before we state the lemma, we need another definition: let the radical level of an algebraic expression \(f(a_n, a_{n-1}, \dotsc)\) be \(0\) if \(f(a_n, a_{n-1}, \dotsc)\) is a rational expression, \(1\) if \(f(a_n, a_{n-1}, \dotsc)\) has only non-nested radicals, and \(n + 1\) if the maximum number of nested radicals is \(n\).

(Lemma 3.) If the algebraic expression \(f(a_n, a_{n-1}, \dotsc)\) has radical level \(d\) and \(K^{(d)}(S_n)\) is non-trivial, then any operator that maps to a non-trivial element \(g\) in \(K^{(d)}(S_n)\) rules out \(f(a_n, a_{n-1}, \dotsc)\) as the solution to the general \(n\)th-degree polynomial equation \[ a_n x^n + a_{n+1} x^{n+1} + \dotsb + a_0 = 0\text{.} \]

Rough sketch of proof. We just do induction on \(d\). For the base case \(d = 0\), if \(K^{(0)}(S_n)\) is non-trivial, then \(n \ge 2\). Let \(g = (i \; j)\) for any \(i \ne j\), of which there must at least be one. Then by the same reasoning as Theorem 1, \(g\) rules out \(f(a_n, a_{n-1}, \dotsc)\). Since the \((i \; j)\) generate \(S_n\), then any \(g \in S_n\) is the composition of some sequence of \((i \; j)\)s, each of which rules out \(f(a_n, a_{n-1}, \dotsc)\), so \(g\) must also rule it out.

Assume the lemma holds for \(d\), and let \(x = f_{d+1}(a_n, a_{n-1}, \dotsc) = \sqrt[k]{f_d(a_n, a_{n-1}, \dotsc)}\) for some \(k\), where \(f_d\) has radical level \(d\). Let \(o\) act on \(R\) like any non-trivial element \(g\) of \(K^{(d+1)}(S_n)\). By the induction hypothesis, all elements \(h_i \in K^{(d)}(S_n)\) cause \(x = f_d(a_n, a_{n-1}, \dotsc)\) to go around a loop, so pick a point \(z_0\) that is never equal to any point \(x\) traverses under any operation corresponding to \(h_i\). Then, since \(g = [h, k]\) for \(h, k \in K^{(d)}(S_n)\), by the same reasoning as in Theorem 2, the total angle induced by \(o\) on \(x = f_{d+1}(a_n, a_{n-1}, \dotsc)\) around \(z_0\) is \(0\), and the distance from \(z_0\) remains unchanged. Thus, \(x = f_{d+1}(a_n, a_{n-1}, \dotsc)\) remains fixed, and \(o\) rules it out.

By the same reasoning as in Theorem 2, this can be extended to the general case of \(f(a_n, a_{n-1}, \dotsc)\) being any algebraic formula with nesting level \(d + 1\). ∎

We can immediately deduce the following corollaries, using the fact that \(K^{(2)}(S_4)\) is non-trivial:

(Corollary 4.) An algebraic expression with at most singly-nested radicals in the coefficients of the general quartic equation \[ ax^4 + bx^3 + cx^2 + dx + e = 0 \] cannot be a solution to this equation.^[6]

5. Quintic Equations

Now, finally, the quintic. Let’s jump right to the interactive example.

Interactive Example 6: The quintic equation

Roots

\(R = \langle r_1, r_2, r_3, r_4, r_5 \rangle\)

Coefficients

Candidate solution

\(X = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle\)

\(x_1 = f_A = Δ\)
\(x_{1, 2} = f_B = \sqrt{f_A}\)
\(x_{1, 2, 3, 4, 5, 6} = f_C =\)
\(\qquad \sqrt[3]{(f_B - 0.8)(f_B - 0.75)}\)

Similarly to the interactive example for the quartic, the operations are organized to make clear what commutator subgroup they’re in. There’s something interesting though—the \(C_i\) seem very similar to the \(B_i\). In fact, the \(C_i\) also act on \(R\) like \(A_5\)! Also, if you compute \(D_i = [\![ C_{(i+1) \bmod 5}, C_{i \bmod 5} ]\!]\), you will find that \(D_i\) acts exactly like \(B_i\) on \(R\)!

Why can we do this for the quintic, but not for anything of lower degree? This is because \(A_5\) is perfect, which means that it equals its own commutator subgroup. (You can verify this yourself by brute force, e.g. writing a program, or you can play around with \(3\)-cycles and see that any \(3\)-cycle is the commutator of two other \(3\)-cycles.) Then this immediately implies that \(K^{(n)}(S_5)\) is non-trivial for any \(n\), which then implies our main result:

(Abel-Ruffini theorem.) An algebraic expression in the coefficients of the general \(n\)th-degree polynomial equation \[ a_n x^n + a_{n-1} x^{n-1} + \dotsb + a_0 = 0 \] for \(n \ge 5\) cannot be a solution to this equation.

Proof. By the above, \(A_5\) is perfect, so \(K^{(d)}(S_5)\) is non-trivial for all \(d\).

Since \(S_5\) is a subgroup of \(S_n\) for \(n \ge 5\), \(A_5 = K(S_5)\) must also be a subgroup of \(A_n = K(S_n)\) for \(n \ge 5\). But since \(A_5\) is perfect, then \(A_5\) must also be a subgroup of \(K^{(d)}(S_n)\) for any \(d\), which means that \(K^{(d)}(S_n)\) is non-trivial for any \(d\) and \(n \ge 5\).

An algebraic expression has some finite radical level \(d\), but \(K^{(d)}(S_5)\) is non-trivial for any \(d\) and \(n \ge 5\), so by Lemma 3 no algebraic expression can be solution to the general \(n\)th-degree polynomial equation for \(n \ge 5\). ∎

With the theorem above, we now have a succinct answer to the question at the beginning of this article. You can’t write down a solution to the general quadratic equation that is a rational expression because you can find an operation on the roots that will permute them non-trivially and yet leave the result of the expression constant. For the same reason, you can’t write down a solution to the general \(n\)th-degree polynomial equation that is an algebraic equation!

Finally, as a bonus, I’ll explain how to generate algebraic expressions that require a “\(d\)th-level” operator, meaning an operator that maps to an element of \(K^{(d)}(S_n)\), assuming it’s non-trivial. This shows that there’s no single “super-operation” that rules out all algebraic expressions.

As an example, the formulas in the interactive example above are chosen so that \(f_A\) is ruled out by the \(A_i\), \(f_B\) is ruled out by the \(B_i\), etc. They depend on the particular roots chosen, of course, which is why this interactive example doesn’t let you move the roots around, but in principle you could build formulas for any polynomial that is first ruled out by \(C_i\), or \(D_i\), or whatever you wish. Given a polynomial \(P = a_n x^n + a_{n-1} x^{n-1} + \dotsb + a_0\) of degree \(n \ge 5\) and \(d\), a recursive algorithm to generate an expression that is ruled out only by a “\(d\)th-level” operator is:

If \(d = 0\), return \(Δ(a_n, a_{n-1}, \dotsc)\).
Otherwise, run this algorithm with \(P\) and \(d-1\) to get \(f_{d-1}(a_n, a_{n-1}, \dotsc)\).
Find operations \(o_1\) to \(o_m\) that correspond to generators \(g_1\) to \(g_m\) of \(K^{(d-1)}(S_n)\).
For each \(o_i\):
1. Apply \(o_i\), which makes \(x = f_{d-1}(a_n, a_{n-1}, \dotsc)\) go around a loop. Record the looped-around regions and their associated rotation numbers (i.e., the total angle divided by \(2π\)).
Pick points \(z_1, \dotsc, z_t\) such that each \(z_i\) has a non-zero rotation number for at least one \(o_j\). \(t\) can be at most \(m\).
Let \(k\) be the least number such that, for every \(o_i\), \(k\) doesn’t divide any of the rotation numbers of any \(z_j\) with respect to \(o_i\). Return \(f_d(a_n, a_{n-1}, \dotsc) = \sqrt[k]{\prod_i (f_{k-1}(a_n, a_{n-1}, \dotsc) - z_i)}\).

Like this post? Subscribe to my feed or follow me on Twitter .

Footnotes

[1] This proof is originally due to Arnold. There are a couple of videos that talk about this proof, as well as this book based on Arnold’s lectures, and this paper. I mostly follow Boaz’s video, and the interactive visualizations are based on the visualizations he has in his video.

The interactive visualizations were generated using the excellent JSXGraph library. ↩

[2] Theorem 1 can be generalized even more! We can append other functions and operations to rational expressions, as long as those functions and operations are continuous and single-valued. For example, we can allow the use of exponentials and trigonometric functions, which is something that the standard Galois theory cannot handle.↩

[3] More precisely, a \(↺_{i, j}\) contains a pair of simple paths, i.e. continuous injective functions \([0, 1] \to \mathbb{C}\), between two distinct points of \(\mathbb{C}\), such that their concatenation defines a simple closed curve around a region in \(\mathbb{C}\) with a counter-clockwise orientation. Also, depending on the exact method of formalizing \(↺_{i, j}\), it either explicitly or implicitly encodes a permutation on \(R\). Then we can define an operation \(*\) on the \(↺_{i, j}\) and \(↻_{i, j}\) (defined analogously) which concatenates the paths (and composes the permutations, if explicitly encoded). Since the space of paths has no inverses or an identity, the \(↺_{i, j}\) and \(↻_{i, j}\) generate a free semigroup with the operation \(*\). Then this semigroup defines an action on \(R\) via its associated permutation on \(R\), which then just generates \(S_n\), since \(S_n\) is generated by adjacent swaps.

We make a distinction between the operation \(↺_{i, j}\) and the permutation it induces on \(R\), since the latter “loses” the orientation information, which is important to preserve when talking about the action of \(↺_{i, j}\) on some \(x_i\). ↩

[4] Note that, depending on the text, the commutator may be defined slightly differently as \(g h g^{-1} h^{-1}\). ↩

[5] \(K(A_4)\) is isomorphic to \(V\), the Klein four-group. ↩

[6] In fact, the quartic formula has three nested radicals. I wonder why? ↩