Understanding Evlis Tail Recursion

Fred Akalin

While reading about proper tail recursion in Scheme, I encountered a similar but obscure optimization called evlis tail recursion. In the paper where it was first described, the author claims it "dramatically improve the space performance of many programs," which sounded promising.

However, the few places where its mentioned don't do much more than state its definition and claim its usefulness. Hopefully I can provide a more detailed analysis here.

Consider the straightforward factorial implementation in Scheme:[1]

(define (fact n) (if (<= n 1) 1 (* n (fact (- n 1)))))

It is not tail-recursive, since the recursive call is nested in another procedure call. However, it's almost tail-recursive; the call to * is a tail call, and the recursive call is its last subexpression, so it will be the last subexpression to be evaluated.

Recall what happens when a procedure call (represented as a list of subexpressions) is evaluated: each subexpression is evaluated, and the first result (the procedure) is passed the other results as arguments.[2]

Evlis tail recursion can be described as follows: when performing a procedure call and during the evaluation of the last subexpression, the calling environment is discarded as soon as it is not required.[3] The distinction between evlis tail recursion and proper tail recursion is subtle. Proper tail recursion requires only that the calling environment be discarded before the actual procedure call; evlis tail recursion discards the calling environment even sooner, if possible.

An example will help to clarify things. Given fact as defined above, say you evaluate (fact 10) and you're in the procedure call with n = 5. The call stack of a properly tail-recursive interpreter would look like this:

evalExpr
--------
env = { n: 10 } -> <top-level environment>
expr = '(* n (fact (- n 1)))'
proc = <native function: *>
args = [10, <pending evalExpr('(fact (- n 1))', env)>]

evalExpr
--------
env = { n: 9 } -> <top-level environment>
expr = '(* n (fact (- n 1)))'
proc = <native function: *>
args = [9, <pending evalExpr('(fact (- n 1))', env)>]

...

evalExpr
--------
env = { n: 6 } -> <top-level environment>
expr = '(* n (fact (- n 1)))'
proc = <native function: *>
args = [6, <pending evalExpr('(fact (- n 1))', env)>]

evalExpr
--------
env = { n: 5 } -> <top-level environment>
expr = '(if ...)'

whereas the call stack of an evlis tail-recursive interpreter would look like this:

evalExpr
--------
env = { n: 5 } -> <top-level environment>
pendingProcedureCalls = [
  [<native function: *>, 10],
  [<native function: *>, 9],
  ...
  [<native function: *>, 6]
]
expr = (if ...)

In this implementation, the last subexpression of a procedure call is evaluated exactly like a tail expression, but the procedure call and non-last subexpressions are pushed onto a stack. Whenever an expression is reduced to a simple one and the stack is non-empty, a pending procedure call with its other args are popped off, and it is called with the reduced expression as the final argument.

Note that this didn't change the asymptotic behavior of the procedure; it still takes \(\Theta(n)\) memory to evaluate. However, only the bare minimum of information is saved: the list of pending functions and their arguments. Other auxiliary variables, and crucially the nested calling environments, aren't preserved, leading to a significant constant-factor reduction in memory.

This raises the question: Are there cases where evlis tail recursion leads to better asymptotic behavior? In fact, yes; consider the following (contrived) implementation of factorial[4]:

(define (fact2 n)
  (define v (make-vector n))
  (* (n (fact2 (- n 1)))))

Before the main body of the function, a vector of size \(n\) is defined. This means that the environments in the call stack of a properly tail-recursive interpreter would look like this:[5]

env = { n: 10, v: <vector of size 10> } -> <top-level environment>
env = { n: 9, v: <vector of size 9> } -> <top-level environment>
env = { n: 8, v: <vector of size 8> } -> <top-level environment>
env = { n: 7, v: <vector of size 7> } -> <top-level environment>
...

whereas the an evlis tail-recursive interpreter would keep around only the current environment. Therefore, the properly tail-recursive interpreter would require \(\Theta(n^2)\) memory to evaluate (fact2 n) while the evlis tail-recursive interpreter would require only \(\Theta(n)\)

Studying examples like the one above enabled me to finally understand how evlin tail recursion worked and what sort of savings it gives. However, I have yet to find a practical example where evlis tail recursion gives the same sort of asymptotic gains as described above, and I'd be interested to receive some. But perhaps the "large gains" mentioned in the various papers describing it are only constant-factor reductions in memory.

In any case, another important difference in Scheme between proper tail recursion and evlis tail recursion is that the former is a language feature and the latter is an optimization. That means that it is acceptable and even encouraged to write Scheme programs that take advantage of proper tail recursion, but it would be unwise to rely on evlis tail recursion for the asymptotic performance of your function. Instead, one should treat it just as a nice constant-factor speed gain.

Note that it is easy to make evlis tail recursion "smarter." Since Scheme doesn't specify the order of argument evaluation, an interpreter could evaluate arguments to maximize the gains from evlis tail recursion. As an easy example, if we had switched the arguments to + in fact above, making it non-evlis-tail-recursive, a smart compiler could still treat it as such. A possible rule of thumb would be to pick a non-trivial function call to evaluate last.

To complete the picture, I will outline below the evaluation function for a simple evlis tail-recursive Scheme interpreter in Javascript. All of the sources I've found describe it in terms of compilers, so I think it'll be useful to have a reference implementation for an interpreter.

Let's say we already have a properly tail-recursive interpreter:[6]

function evalExpr(expr, env) {
  // Fake tail calls with a while loop and continue.
  while (true) {
    // Symbols, constants, quoted expressions, and lambdas.
    if (isSimpleExpr(expr)) {
      // The only exit point.
      return evalSimpleExpr(expr, env);
    }
    // (if test conseq alt)
    if (isSpecialForm(expr, 'if')) {
      expr = evalExpr(expr[1], env) ? expr[2] : expr[3];
      continue;
    }
    // (set! var expr)
    if (isSpecialForm(expr, 'set!')) {
      env.set(expr[1], evalExpr(expr[2], env));
      expr = null;
      continue;
    }
    // (define var expr?)
    if (isSpecialForm(expr, 'define')) {
      env.define(expr[1], evalExpr(expr[2], env));
      expr = null;
      continue;
    }
    // (begin expr*)
    if (isSpecialForm(expr, 'begin')) {
      if (expr.length == 1) {
        expr = null;
        continue;
      }
      // Evaluate all but the last subexpression.
      for (var i = 1; i < expr.length - 1; ++i) {
        evalExpr(expr[i], env);
      }
      expr = expr[expr.length - 1];
      continue;
    }
    // (proc expr*)
    var proc = evalExpr(expr.shift(), env);
    var args = expr.map(function(subExpr) { return evalExpr(subExpr, env); });
    // proc.run() returns its body in result.expr and the environment
    // in which to evaluate it (with all its arguments bound) in
    // result.env.
    var result = proc.run(args);
    expr = result.expr;
    // The only time when env is changed.
    env = result.env;
    continue;
  }
}

Then implementing evlis tail recursion requires only a few changes:

function evalExpr(expr, env) {
  // This is a stack of procedures and their non-final arguments that
  // are waiting for their final argument to be evaluated.
  var pendingProcedureCalls = [];
  while (true) {
    if (isSimpleExpr(expr)) {
      expr = evalSimpleExpr(expr, env);
      // Discard calling environment.
      env = null;
      if (pendingProcedureCalls.length == 0) {
        // No pending procedure calls, so we're done (the only exit
        // point).
        return expr;
      }
      var args = pendingProcedureCalls.pop();
      var proc = args.shift();
      args.push(expr);
      var result = proc.run(args);
      expr = result.expr;
      // Change to new environment (the only time when env is
      // changed).
      env = result.env;
      continue;
    }
    ...
    // Everything else remains the same.
    ...
    // (proc expr*)
    var nonFinalSubExprs =
      exprs.slice(0, -1).map(
        function(subExpr) { return evalExpr(subExpr, env); });
    pendingProcecureCalls.push(nonFinalSubExprs);
    // Evaluate the last subexpression as a tail call.
    expr = expr[expr.length - 1];
    continue;
  }
}

[1] Assume a left-to-right evaluation order for now.

[2] The function that takes a list of expressions, evaluates them, and returns the results as a list is traditionally called evlis, hence the name of the optimization.

[3] This assumes that the calling environment isn't stored somewhere else.

[4] This was adapted from an example in Proper Tail Recursion and Space Efficiency.

[5] Assume that the interpreter isn't smart enough to deduce that \(v\) can be optimized out since it's never used.

[6] Adapted from Peter Norvig's excellent lis.py.