# Computing the Integer Square Root

## Fred Akalin

## 1. The algorithm

Today I’m going to talk about a fast algorithm to compute
the *integer
square root* of a non-negative integer \(n\),
\(\isqrt(n) = \lfloor \sqrt{n} \rfloor\), or in words,
the greatest integer whose square is less than or equal to \(n\).^{[1]} Most
sources that describe the algorithm take it for granted that it is
correct and fast. This is far from obvious! So I will prove both
correctness and speed below.

One simple fact is that \(\isqrt(n) \le n/2\) for \(n \ge 2\), so a
straightforward algorithm is just to test every non-negative integer
up to \(\max(1, n/2)\). This takes \(O(n)\) arithmetic operations, which is bad
since it’s exponential in the *size* of the input. That
is, letting \(\Bits(n)\) be the number of bits required
to store \(n\) and letting \(\lg n\) be the base-\(2\) logarithm of
\(n\), \(\Bits(n) = O(\lg n)\), and thus this algorithm
takes \(O(2^{\Bits(n)})\) arithmetic operations.

We can do better by doing binary search; start with the range \([0, n/2]\) and adjust it based on comparing the square of an integer in the middle of the range to \(n\). This takes \(O(\lg n) = O(\Bits(n))\) arithmetic operations.
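As a sketch, this binary search can be written directly with JavaScript's built-in `BigInt` (unlike the `BigInteger`-based implementation below, this needs no library; `isqrtBinary` is a name chosen here for illustration):

```javascript
// isqrtBinary returns the greatest x such that x^2 <= n, by binary
// search over [0, max(1, n/2)]. n must be a non-negative BigInt.
function isqrtBinary(n) {
  if (n < 0n) {
    throw new Error('negative radicand');
  }
  if (n < 2n) {
    return n;
  }
  var lo = 0n;
  var hi = n / 2n;
  while (lo < hi) {
    // Bias the midpoint up so the loop always makes progress.
    var mid = (lo + hi + 1n) / 2n;
    if (mid * mid <= n) {
      lo = mid;
    } else {
      hi = mid - 1n;
    }
  }
  return lo;
}
```

Each iteration halves the range \([lo, hi]\), so the loop runs \(O(\lg n)\) times.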

But it turns out we can do even better, using a variant of Newton’s method that uses only integer operations:^{[2]}

1. If \(n = 0\), return \(0\).
2. Otherwise, set \(i\) to \(0\) and set \(x_0\) to \(2^{\lceil \Bits(n) / 2\rceil}\).
3. Repeat:
    1. Set \(x_{i+1}\) to \(\lfloor (x_i + \lfloor n/x_i \rfloor) / 2 \rfloor\).
    2. If \(x_{i+1} \ge x_i\), return \(x_i\). Otherwise, increment \(i\).

Here is an implementation in JavaScript:^{[3]}

```
// isqrt returns the greatest number x such that x^2 <= n. The type of
// n must behave like BigInteger (e.g.,
// https://github.com/akalin/jsbn ), and n must be non-negative.
//
// Example (open up the JS console on this page and type):
//
//   isqrt(new BigInteger("64")).toString()
function isqrt(n) {
  var s = n.signum();
  if (s < 0) {
    throw new Error('negative radicand');
  }
  if (s == 0) {
    return n;
  }
  // x = 2^ceil(Bits(n)/2)
  var x = n.constructor.ONE.shiftLeft(Math.ceil(n.bitLength()/2));
  while (true) {
    // y = floor((x + floor(n/x))/2)
    var y = x.add(n.divide(x)).shiftRight(1);
    if (y.compareTo(x) >= 0) {
      return x;
    }
    x = y;
  }
}
```

## 2. Correctness

The core of the algorithm is the iteration rule: \[ x_{i+1} = \left\lfloor \frac{x_i + \lfloor \frac{n}{x_i} \rfloor}{2} \right\rfloor \] where the floor functions appear only because we’re using integer division. Define the integer-valued function \(f(x)\) to be the right-hand side. Using basic properties of the floor function, you can show that the inner floor can be removed: \[ f(x) = \left\lfloor \frac{1}{2} (x + n/x) \right\rfloor\text{,} \] which makes \(f\) a bit easier to analyze. Furthermore, the properties of \(f(x)\) are closely related to those of the corresponding real-valued function: \[ g(x) = \frac{1}{2} (x + n/x)\text{.} \]
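One way to convince yourself that removing the inner floor is harmless: since \(g(x) = (x^2 + n)/(2x)\) exactly, \(\lfloor g(x) \rfloor\) can be computed with pure integer arithmetic and compared against the double-floored rule. A quick `BigInt` sketch (function names made up for illustration):

```javascript
// fInner is the iteration rule as implemented, with the inner floor:
// floor((x + floor(n/x))/2). fExact is floor(g(x)), computed exactly
// as floor((x^2 + n)/(2x)). BigInt division truncates, which equals
// the floor for non-negative operands.
function fInner(n, x) {
  return (x + n / x) / 2n;
}
function fExact(n, x) {
  return (x * x + n) / (2n * x);
}

// Spot-check the identity over a small range.
for (var n = 1n; n <= 200n; n++) {
  for (var x = 1n; x <= n; x++) {
    if (fInner(n, x) !== fExact(n, x)) {
      throw new Error('floors disagree at n=' + n + ', x=' + x);
    }
  }
}
```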

For starters, again using basic properties of the floor function, you can show that \(f(x) \le g(x)\), and for any integer \(m\), \(m \le f(x)\) if and only if \(m \le g(x)\).

Finally, let’s give a name to our desired output: let \(s =
\isqrt(n) = \lfloor \sqrt{n} \rfloor\).^{[4]}

**Lemma 1.** For every positive integer \(x\), \(f(x) \ge s\).

Proof. By the basic properties of \(f(x)\) and \(g(x)\) above, it suffices to show that \(g(x) \ge s\). \(g'(x) = (1 - n/x^2)/2\) and \(g''(x) = n/x^3\). Therefore, \(g(x)\) is concave-up for \(x \gt 0\); in particular, its single positive extremum at \(x = \sqrt{n}\) is a minimum. But \(g(\sqrt{n}) = \sqrt{n} \ge s\). ∎

**Lemma 2.** \(x_0 \ge s\).

Proof. \(\Bits(n) = \lfloor \lg n \rfloor + 1 \gt \lg n\). Therefore, \[ \begin{aligned} x_0 &= 2^{\lceil \Bits(n) / 2 \rceil} \\ &\ge 2^{\Bits(n) / 2} \\ &\gt 2^{\lg n / 2} \\ &= \sqrt{n} \\ &\ge s\text{.} \; \blacksquare \end{aligned} \]

(Note that any number greater than \(s\), say \(n\) or \(\lceil n/2 \rceil\), can be chosen for our initial guess without affecting correctness. However, the expression above is necessary to guarantee performance. Another possibility is \(2^{\lceil \lceil \lg n \rceil / 2 \rceil}\), which has the advantage that if \(n\) is an even power of \(2\), then \(x_0\) is immediately set to \(\sqrt{n}\). However, this is usually not worth the cost of checking that \(n\) is a power of \(2\), as is required to compute \(\lceil \lg n \rceil\).)

**Theorem 1.** If \(\NewtonSqrt\) terminates, it returns \(s\).

Proof. Assume it terminates. If it terminates in step \(1\), then \(n = 0\) and it correctly returns \(0\). Otherwise, it can only terminate in step \(3.2\), where it returns \(x_i\) such that \(x_{i+1} = f(x_i) \ge x_i\). This implies that \(g(x_i) = (x_i + n/x_i) / 2 \ge x_i\). Rearranging yields \(n \ge x_i^2\), and combining with the invariant \(x_i \ge s\) (which holds by Lemmas 1 and 2) we get \(\sqrt{n} \ge x_i \ge s\). But \(s + 1 \gt \sqrt{n}\), so that forces \(x_i\) to be \(s\), and thus \(\NewtonSqrt\) returns \(s\) if it terminates. ∎

**Theorem 2.** \(\NewtonSqrt\) terminates.

Proof. Assume it doesn’t terminate. Then the loop condition guarantees \(x_{i+1} \lt x_i\) for every \(i\), so we have a strictly decreasing infinite sequence of integers \(\{ x_0, x_1, \dotsc \}\). But this sequence is bounded below by \(s\), so it cannot decrease indefinitely. This is a contradiction, so \(\NewtonSqrt\) must terminate. ∎

We are done proving correctness, but you might wonder if the check \(x_{i+1} \ge x_i\) in step \(3.2\) is necessary. That is, can it be weakened to the check \(x_{i+1} = x_i\)? The answer is “no”; to see that, let \(k = n - s^2\). Since \(n \lt (s+1)^2\), \(k \lt 2s + 1\). On the other hand, consider the inequality \(f(x_i) \gt x_i\). Since that would cause the algorithm to terminate and return \(x_i\), that implies that \(x_i = s\). Therefore, that inequality is equivalent to \(f(s) \gt s\), which is equivalent to \(f(s) \ge s + 1\), which is equivalent to \(g(s) = (s + n/s) / 2 \ge s + 1\). Rearranging yields \(n \ge s^2 + 2s\). Substituting in \(n = s^2 + k\), we get \(s^2 + k \ge s^2 + 2s\), which is equivalent to \(k \ge 2s\). But since \(k \lt 2s + 1\), that forces \(k\) to equal \(2s\). That is the maximum value \(k\) can be, so therefore \(n\) must be one less than a perfect square. Indeed, for such numbers, weakening the check would cause the algorithm to oscillate between \(s\) and \(s + 1\). For example, \(n = 99\) would yield the sequence \(\{ 16, 11, 10, 9, 10, 9, \dotsc \}\).
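You can watch this oscillation directly by running the iteration rule with no termination check at all; here's a small `BigInt` sketch (`iterate` is a name chosen for illustration):

```javascript
// iterate applies x -> floor((x + floor(n/x))/2) to the starting
// guess 2^ceil(Bits(n)/2) the given number of times, with no
// termination check, and returns the whole sequence.
function iterate(n, count) {
  var bits = BigInt(n.toString(2).length);  // Bits(n)
  var x = 1n << ((bits + 1n) / 2n);         // 2^ceil(Bits(n)/2)
  var seq = [x];
  for (var i = 0; i < count; i++) {
    x = (x + n / x) / 2n;
    seq.push(x);
  }
  return seq;
}
```

`iterate(99n, 5)` produces the sequence \(16, 11, 10, 9, 10, 9\) from above.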

## 3. Run-time

We will show that \(\NewtonSqrt\) takes \(O(\lg \lg n)\) arithmetic operations. Since each loop iteration does only a fixed number of arithmetic operations (with the division of \(n\) by \(x\) being the most expensive), it suffices to show that our algorithm performs \(O(\lg \lg n)\) loop iterations.

It is well known that Newton’s method converges quadratically sufficiently close to a simple root. We can’t actually use this result directly, since it’s not clear that the convergence properties of Newton’s method are preserved when using integer operations, but we can do something similar.

Define \(\Err(x) = x^2/n - 1\) and let \(ϵ_i = \Err(x_i)\). Intuitively, \(\Err(x)\) is a conveniently-scaled measure of the error of \(x\): it is less than \(1\) for most of the values we care about, and it is bounded below for integers greater than our target \(s\). We will also show that the \(ϵ_i\) shrink quadratically. These facts will then let us show our bound on the iteration count.

**Lemma 3.** If \(x_i \ge s + 1\), then \(ϵ_i \ge 1/n\).

Proof. \(n \lt (s + 1)^2\), so \(n + 1 \le (s + 1)^2\), and therefore \((s + 1)^2/n - 1 \ge 1/n\). But the expression on the left side is just \(\Err(s + 1)\). Since \(\Err(x)\) is increasing for \(x \gt 0\), \(x_i \ge s + 1\) if and only if \(ϵ_i \ge \Err(s + 1)\), so the result immediately follows. ∎

**Lemma 4.** If \(x_i \ge s + 1\), then \(ϵ_{i+1} \lt (ϵ_i/2)^2\).

Proof. \(ϵ_{i+1}\) is just \(\Err(f(x_i)) \le \Err(g(x_i))\), so it suffices to show that \(\Err(g(x_i)) \lt (ϵ_i/2)^2\). Inverting \(\Err(x)\), we get \(x_i = \sqrt{(ϵ_i + 1) \cdot n}\). Expressing \(g(x_i)\) in terms of \(ϵ_i\), we get \[ g(x_i) = \frac{\sqrt{n}}{2} \left( \frac{ϵ_i + 2}{\sqrt{ϵ_i + 1}} \right) \] and \[ \Err(g(x_i)) = \frac{(ϵ_i/2)^2}{ϵ_i+1}\text{.} \] Therefore, it suffices to show that the denominator \(ϵ_i + 1\) is greater than \(1\). But \(x_i \ge s + 1\) implies \(ϵ_i \gt 0\) by Lemma 3, so that follows immediately and the result is proved. ∎

**Lemma 5.** \(ϵ_1 \le 1\).

Proof. Let’s start with \(x_0\): \[ \begin{aligned} x_0 &= 2^{\lceil \Bits(n) / 2 \rceil} \\ &= 2^{\lfloor (\lfloor \lg n \rfloor + 1 + 1)/2 \rfloor} \\ &= 2^{\lfloor \lg n / 2 \rfloor + 1} \\ &= 2 \cdot 2^{\lfloor \lg n / 2 \rfloor}\text{.} \end{aligned} \] Then \(x_0/2 = 2^{\lfloor \lg n / 2 \rfloor} \le 2^{\lg n / 2} = \sqrt{n}\). Since \(x_0/2\) is an integer, \(x_0/2 \le \sqrt{n}\) if and only if \(x_0/2 \le \lfloor \sqrt{n} \rfloor = s\). Therefore, \(x_0 \le 2s\).

As for \(ϵ_0\): \[ \begin{aligned} ϵ_0 &= \Err(x_0) \\ &\le \Err(2s) \\ &= (2s)^2/n - 1 \\ &= 4s^2/n - 1\text{.} \end{aligned} \] Since \(s^2 \le n\), \(4s^2/n \le 4\) and thus \(ϵ_0 \le 3\).

Finally, \(ϵ_1\) is just \(\Err(f(x_0))\). Using calculations from Lemma 4, \[ \begin{aligned} ϵ_1 &\le \Err(g(x_0)) \\ &= (ϵ_0/2)^2/(ϵ_0 + 1) \\ &\le (3/2)^2/(3 + 1) \\ &= 9/16\text{.} \end{aligned} \] Therefore, \(ϵ_1 \le 1\). ∎

**Theorem 3.** \(\NewtonSqrt\) performs \(O(\lg \lg n)\) loop iterations.

Proof. Let \(k\) be the number of loop iterations performed when running the algorithm for \(n\) (i.e., \(x_k \ge x_{k-1}\)) and assume \(k \ge 4\). Then \(x_i \ge s + 1\) for \(i \lt k - 1\). Since \(ϵ_1 \le 1\) by Lemma 5, \(ϵ_2 \le 1/2\) and \(ϵ_i \le (ϵ_2)^{2^{i-2}}\) for \(2 \le i \lt k - 1\) by Lemma 4, so \(ϵ_{k-2} \le 2^{-2^{k-4}}\). But \(1/n \le ϵ_{k-2}\) by Lemma 3, so \(1/n \le 2^{-2^{k-4}}\). Taking logs twice to bring down the \(k\) yields \(k - 4 \le \lg \lg n\). Then \(k \le \lg \lg n + 4\), and thus \(k = O(\lg \lg n)\). ∎
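The bound \(k \le \lg \lg n + 4\) is easy to check empirically. Here's a `BigInt` sketch that counts loop iterations (`countIterations` is a name made up for illustration):

```javascript
// countIterations runs the loop of NewtonSqrt on a positive BigInt n
// and returns the number of loop iterations performed.
function countIterations(n) {
  var bits = BigInt(n.toString(2).length);  // Bits(n)
  var x = 1n << ((bits + 1n) / 2n);         // 2^ceil(Bits(n)/2)
  var k = 0;
  while (true) {
    var y = (x + n / x) / 2n;
    k++;
    if (y >= x) {
      return k;
    }
    x = y;
  }
}
```

For a \(1024\)-bit \(n\), \(\lg \lg n \approx 10\), so the bound guarantees at most \(14\) iterations.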

Since each loop iteration is dominated by the division of \(n\) by \(x_i\), \(\NewtonSqrt\) therefore runs in \(O(D(n) \lg \lg n)\) time, where \(D(n)\) is the cost of dividing \(n\) by another integer.^{[5]}

## 4. The Initial Guess

It’s also useful to show that if the initial guess \(x_0\) is bad, then the run-time degrades to \(Θ(\lg n)\). To do this, consider a variant of \(\NewtonSqrt\) that takes a function \(\mathrm{I{\small NITIAL}\text{-}G{\small UESS}}\), calls it with \(n\), and assigns the result to \(x_0\) in step \(2\). Then we can treat \(ϵ_0\) as a function of \(n\) and analyze how long \(ϵ_i\) stays above \(1\). If the initial guess is such that \(ϵ_0(n) = Θ(1)\), then the analysis above still applies, and the algorithm performs \(O(\lg \lg n)\) iterations. However, if \(x_0\) is chosen to be \(Θ(n)\), e.g. the initial guess is just \(n\) or \(n/k\) for some constant \(k\), then \(ϵ_0(n)\) will also be \(Θ(n)\); the iteration then roughly halves \(x_i\) each step until \(ϵ_i\) drops below \(1\), so the run time degrades to \(Θ(\lg n)\). So having a good initial guess is important for the performance of \(\NewtonSqrt\)!
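To see the degradation concretely, here's a `BigInt` sketch that runs the iteration from an arbitrary initial guess (`countIterationsFrom` is a name made up for illustration):

```javascript
// countIterationsFrom runs the Newton iteration on n starting from an
// arbitrary initial guess x0 (which must be >= isqrt(n)) and returns
// the number of loop iterations performed.
function countIterationsFrom(n, x0) {
  var x = x0;
  var k = 0;
  while (true) {
    var y = (x + n / x) / 2n;
    k++;
    if (y >= x) {
      return k;
    }
    x = y;
  }
}

var n = (1n << 256n) - 1n;
var good = countIterationsFrom(n, 1n << 128n);  // 2^ceil(Bits(n)/2)
var bad = countIterationsFrom(n, n);            // initial guess Θ(n)
```

For this \(256\)-bit \(n\), the good guess finishes in a handful of iterations, while starting from \(n\) itself takes on the order of \(\lg n\) iterations, since each early step roughly halves \(x_i\).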


## Footnotes

[1] Aside from the Wikipedia article, the algorithm is described as Algorithm 9.2.11 in Prime Numbers: A Computational Perspective. ↩

[2] Note that only integer operations are used, which makes this algorithm suitable for arbitrary-precision integers. ↩

[3] Go and JS implementations are available on my GitHub. ↩

[4] Here, and in most of the article, we’ll implicitly assume that \(n \gt 0\). ↩

[5] \(D(n)\) is \(Θ(\lg^2 n)\) using long division, but fancier division algorithms have better run-times. ↩