A Foray into Number Theory with Haskell

Fred Akalin

July 6, 2007

I encountered an interesting problem on reddit a few days ago which can be paraphrased as follows:

Find a perfect square \(s\) such that \(1597s + 1\) is also perfect square.

After reading the discussion about implementing a brute-force algorithm to solve the problem and spending a futile half-hour or so trying my hand at find a better way, someone noticed that the problem was an instance of Pell's equation which is known to have an elegant and fast solution; indeed, he posted a one-liner in Mathematica solving the given problem. However, I wanted to try coding up the solution myself as the Mathematica solution, while succinct, isn't very enlightening since the heavy lifting is already done by a built-in function and an arbitrary constant was used for this particular instance of Pell's equation.

Pell's equation is simply the Diophantine equation \(x^2 - dy^2 = 1\) for a given \(d\)^[1]; being Diophantine means that all variables involved take on only integer values. (In our original problem, \(d\) is 1597 and we are asked for \(y^2\).) The solution involves finding the continued fraction expansion of \(\sqrt{d}\), finding the first convergent of the expansion that satisfies Pell's equation, and then generating all other solutions from that fundamental solution. We rule out the trivial solution \(x = 1\), \(y = 0\) which also implies that if \(d\) is a perfect square then there is no solution.

A continued fraction is an expression of the form: \[ x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots\,}}}} \] where all \(a_i\) are integers and all but the first one are positive. The standard math notation for continued fractions is quite unwieldy so from now on we'll use \(\left \langle a_0; a_1, a_2, \dotsc \right \rangle\) instead of the above.

The theory of continued fractions is a rich and beautiful one but for now we'll just state a few facts:

The continued fraction expansion of a number is (mostly) unique.
The continued fraction expansion of a rational number is finite.
The continued fraction expansion of a irrational number is infinite.
A quadratic surd is a number of the form \(\frac{a + \sqrt{b}}{c}\) where \(a\), \(b\), and \(c\) are integers. Except maybe for the first term, the continued fraction expansion of a quadratic surd is periodic; that is, it repeats forever after a certain number of terms. This applies in particular to the square root of an integer.
Truncating an infinite continued fraction to get a finite continued fraction gives (in some sense) an optimal rational approximation to the irrational number represented by the infinite continued fraction.

Given a quadratic surd it is fairly easy to manipulate it into the form \(a + \frac{1}{q}\) where \(q\) is another quadratic surd. This fact can be used to come up with an algorithm to find the continued fraction expansion of a square root. Wikipedia explains it pretty well so I won't go over it, but here is my Haskell implementation:

sqrt_continued_fraction n = [ a_i | (_, _, a_i) <- mdas ]
    where
      mdas = iterate get_next_triplet (m_0, d_0, a_0)

      m_0 = 0
      d_0 = 1
      a_0 = truncate $ sqrt $ fromIntegral n

      get_next_triplet (m_i, d_i, a_i) = (m_j, d_j, a_j)
          where
            m_j = d_i * a_i - m_i
            d_j = (n - m_j * m_j) `div` d_i
            a_j = (a_0 + m_j) `div` d_j

and here are some examples:

Prelude Main> take 20 $ sqrt_continued_fraction 2
[1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2]

Prelude Main> take 20 $ sqrt_continued_fraction 103
[10,6,1,2,1,1,9,1,1,2,1,6,20,6,1,2,1,1,9,1]

Prelude Main> take 20 $ sqrt_continued_fraction 36
[6,*** Exception: divide by zero

(Note that we're assuming that we won't be called with a perfect square. Also, do you notice anything interesting about the periodic portion of the continued fractions, particularly of \(\sqrt{103}\)?)

For those who are unfamiliar with Haskell, here's a quick list of key facts:

The first line takes a list of triplets and forms a list of all third elements, which is what we're interested in. (The other two elements of the triplet are auxiliary variables used by the algorithm.)
iterate is a function which takes in another function f, an initial variable x, and returns the infinite list [ x, f(x), f(f(x)), f(f(f(x))), ... ].
Note that Haskell uses lazy evaluation and so this function does not take an infinite amount of time to run; all its elements are evaluated (and memoized) only when needed.
The rest of the function is a straightforward representation of the meat of the algorithm described in the above Wikipedia entry.

It may not be clear what \(\sqrt{d}\) and its continued fraction expansion has to do with solving Pell's equation. However, notice that if \(x\) and \(y\) solve Pell's equation then manipulating Pell's equation to get \(\sqrt{d}\) on one side reveals that \(\frac{x}{y}\) is a good approximation of \(\sqrt{n}\). In fact, it is so good that you can prove that \(\frac{x}{y}\) must come from truncating the continued fraction expansion of \(\sqrt{d}\).

This leads us to the following: if you have an infinite continued fraction \(\left \langle a_0; a_1, a_2, \dotsc \right \rangle\) you can truncate it into a finite continued fraction \(\left \langle a_0; a_1, a_2, \dotsc, a_i \right \rangle\) and simplify it into the rational number \(\frac{p_i}{q_i}\). The sequence \(\frac{p_0}{q_0}, \frac{p_1}{q_1}, \frac{p_2}{q_2}, \dotsc\) forms the convergents of \(\left \langle a_0; a_1, a_2, \dotsc \right \rangle\) and converges to its represented irrational number.

It turns out you can calculate \(p_{i+1}\) and \(q_{i+1}\) efficiently from \(p_i\), \(q_i\), \(p_{i-1}\), \(q_{i-1}\), and \(a_{i+1}\) using the fundamental recurrence formulas (which can be proved by induction). Here is my Haskell implementation:

get_convergents (a_0 : a_1 : as) = pqs
    where
      pqs = (p_0, q_0) : (p_1, q_1) :
            zipWith3 get_next_convergent pqs (tail pqs) as

      p_0 = a_0
      q_0 = 1

      p_1 = a_1 * a_0 + 1
      q_1 = a_1

      get_next_convergent (p_i, q_i) (p_j, q_j) a_k = (p_k, q_k)
          where
            p_k = a_k * p_j + p_i
            q_k = a_k * q_j + q_i

and some more examples:

Prelude Main> take 8 $ get_convergents $ sqrt_continued_fraction 2
[(1,1),(3,2),(7,5),(17,12),(41,29),(99,70),(239,169),(577,408)]

Prelude Main> take 8 $ get_convergents $ sqrt_continued_fraction 103
[(10,1),(61,6),(71,7),(203,20),(274,27),(477,47),(4567,450),(5044,497)]

Prelude Main> take 8 $ get_convergents $ sqrt_continued_fraction 1597
[(39,1),(40,1),(1039,26),(1079,27),(2118,53),(3197,80),(27694,693),(113973,2852)]

Prelude Main> let divFrac (x, y) = (fromInteger x) / (fromInteger y)

Prelude Main> take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 2
[1.0,1.5,1.4,1.4166666666666667,1.4137931034482758,1.4142857142857144,1.4142011834319526,1.4142156862745099]

Prelude Main> take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 103
[10.0,10.166666666666666,10.142857142857142,10.15,10.148148148148149,10.148936170212766,10.148888888888889,10.148893360160965]

Prelude Main> take 8 $ map divFrac $ get_convergents $ sqrt_continued_fraction 1597
[39.0,40.0,39.96153846153846,39.96296296296296,39.9622641509434,39.9625,39.96248196248196,39.9624824684432]

Here are a few more quick facts to help those unfamiliar with Haskell:

The expression a : as forms a new list from the element a and the existing list as (equivalent to cons in Lisp).
zipWith3 is a function that takes in a function f, three lists a, b, and c of the same (possibly infinite) length n, and forms the new list [ f(a[0], b[0], c[0]), f(a[1], b[1], c[1]), ..., f(a[n], b[n], c[n]) ].
Note that the result of zipWith3 is part of the variable pqs which itself appears (twice!) in the arguments to zipWith3. This is a Haskell idiom and reflects the fact that the recurrence formulas define a convergent in terms of its two previous convergents. A simpler example (using the Fibonacci sequence) can be found in the Wikipedia entry for lazy evaluation.
Haskell has built-in data types for integers of arbitrary size which is necessary as the numerators and denominators of the convergents get large quickly. In fact, Haskell has built-in data types for rational numbers (represented as fractions) but it doesn't help us much here.

Since we are guaranteed that some convergent eventually satisfies Pell's equation, we can write a simple function to generate all convergents, test each one to see if it satisfies Pell's equation, and return the first one we see. Here is the Haskell implementation:

get_pell_fundamental_solution n = head $ solutions
    where
      solutions = [ (p, q) | (p, q) <- convergents, p * p - n * q * q == 1 ]

      convergents = get_convergents $ sqrt_continued_fraction n

Note the use of the Haskell's list comprehension syntax, similar to Python, which expresses what I just described in a matter reminiscent of set notation.

Here is the full Haskell program designed so its output may be conveniently piped to bc for verification:

module Main where

import System (getArgs)

sqrt_continued_fraction :: (Integral a) => a -> [a]
{- ... the sqrt_continued_fraction function explained above ... -}

get_convergents :: (Integral a) => [a] -> [(a, a)]
{- ... the get_convergents function explained above ... -}

get_pell_fundamental_solution :: (Integral a) => a -> (a, a)
{- ... the get_pell_fundamental_solution function explained above ... -}

main :: IO ()
main = do
  args <- System.getArgs
  let d      = (read $ head $ args :: Integer)
      (p, q) = get_pell_fundamental_solution d in
    putStr $ "d = " ++ (show d) ++ "\n" ++
             "p = " ++ (show p) ++ "\n" ++
             "q = " ++ (show q) ++ "\n" ++
             "p^2 - d * q^2 == 1\n"

and here is it in action:

$ ./solve_pell 1597
d = 1597
p = 519711527755463096224266385375638449943026746249
q = 13004986088790772250309504643908671520836229100
p^2 - d * q^2 == 1

The solution to the original problem is therefore:
5054112910466227478111803017176109047976100000000.

Now that we've found a method to get a solution, the question remains as to whether it's the only one. In fact it is not, but it is the minimal one, and all other solutions (of which there are an infinite number) can be generated from this fundamental one with a simple recurrence relation as described on the Wikipedia article. My program above can be easily extended to generate all solutions instead of just the fundamental one (I'll leave it to the reader as an exercise).

One remaining question is the efficiency of this algorithm. For simplicity, let's neglect the cost of the arbitrary-precision arithmetic involved and assume that the incremental cost of generating each term of the continued fraction expansion and the convergents is constant. Then the main cost is just how many convergents we have to generate before we find one that satisfies Pell's equation. In fact, it turns out that this depends on the length of the period of the continued fraction expansion of \(\sqrt{d}\), which has a rough upper bound of \(O(\ln(d \sqrt{d}))\). Therefore, the cost of solving Pell's equation (in terms of how many convergents to generate) for a given \(n\)-digit number is \(O(n 2^{n/2})\). This is pretty expensive already, although it's still much better than brute-force search (which is on the order of exponentiating the above expression). Can we do better? Well, sort of; it turns out the length of the answer is of the same order as the expression above, so any algorithm that explicitly outputs a solution necessarily takes that long. However, if you can somehow factor \(d\) into \(s d'\), where \(s\) is a perfect square and \(d'\) is squarefree (i.e., not divisible by any perfect square), then you can solve Pell's equation for the smaller number \(d'\) and output the solution for \(d'\) as the smaller fundamental solution and an expression raised to a certain power involving it. Note that in general this involves factoring \(d\), another hard problem, but for which there exists tons of prior work. An interested reader can peruse the papers by Lenstra and Vardi for more details.

As a final note, one of the things I really like about number theory is that investigating such a simple program can lead you down surprising avenues of mathematics and computational theory. In fact, I've had to omit a lot of things I had planned to say to avoid growing this entry to be longer than it already is. Hopefully, this entry helps someone else learn more about this interesting corner of number theory.

Like this post? Subscribe to my feed or follow me on Twitter .

Footnotes

[1] As a rule we'll avoid considering trivial cases and re-stating obvious assumptions (like \(d\) having to be a positive integer). ↩