I've been hoping for tail call elimination[1] on the JVM for a long time, and historically it has been blocked by a security requirement[2]: changing the number of stack frames between two calls would cause an error. Apparently that check is no longer there, so TCE is possible!
[1] Using the word 'optimization' suggests that eliminating tail calls is optional, but for it to be really useful it has to be guaranteed. If you're writing in recursive style, your program becomes instantly incorrect if TCE goes away because many acceptable inputs will start causing stack overflows.
[2] See this answer: https://softwareengineering.stackexchange.com/a/272086
> in jdk classes [...] there are a number of security sensitive methods that rely on counting stack frames between jdk library code and calling code to figure out who's calling them.
That's... lame. BTW, tail call elimination could keep a count of eliminated tail calls for this. It wouldn't work so well for mutual recursion, but maybe that would suffice. In any case, nothing should be so dependent on stack traces!
How to do it for mutual recursion is sketched in my other comment. Java's stack inspection doesn't inherently require counting stack frames -- it's a choice. I agree it's a bad model anyway.
I know that for JavaScript it was not implemented due to issues around debugging. I think for that reason, if tail call optimization were done, you would need some sort of control over it, such as an annotation to turn it on or off per method, maybe even a global flag for it.
It's actually possible to implement calling in such a way that TCO is compatible with Java's security architecture (or at least as it was back in the late '90s when I cared about this). IIRC you'd carry a finite state machine along with your calls, which would transition when you make a tail call of the sort that matters to the security checks. In other words, the amount of security state needed by a tail-call chain was bounded, if you implemented the spec in a tail-call-conscious way.
I thought stack inspection was a bad idea anyway, but you could implement it together with TCO.
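A very rough Scala sketch of that idea (my reading of the comment, not how the JVM actually implements stack inspection; all names here are hypothetical): the security-relevant fact about a tail-call chain is whether an untrusted caller is on it, and that fact fits in a constant-size state value carried along with the call rather than recovered by walking frames.

    object StackInspectionSketch {
      // Hypothetical constant-size security state threaded through a tail-call chain.
      sealed trait TrustState
      case object AllTrusted extends TrustState      // every elided caller so far was privileged
      case object CrossedBoundary extends TrustState // at least one untrusted caller is on the chain

      // Transition applied when a tail call elides a frame:
      // once a trust boundary has been crossed, it stays crossed.
      def transition(state: TrustState, callerIsTrusted: Boolean): TrustState =
        if (callerIsTrusted) state else CrossedBoundary

      // A check that used to count frames between library code and its caller
      // can consult the bounded state instead.
      def checkPrivileged(state: TrustState): Unit =
        if (state == CrossedBoundary)
          throw new SecurityException("untrusted code on the (elided) call chain")
    }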
I remember the rationale behind Clojure's recur: if you want to use it, your recur call needs to be in a tail-call-optimizable position, otherwise it fails to compile (sorry, it's been a long time since I've done any Clojure, so I could be wrong here).
At first I thought: why such a requirement? Then it hit me: this is not bad. It actually enforces that either the call really is a tail call and the compiler optimizes it, or the code doesn't compile at all. So there is a strong guarantee there.
This does not appear to be general TCE (i.e. between two possibly different functions), but strictly recursive TCE. So with mutual recursion you could still overflow the stack. Nevertheless it seems like a big step forward.
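For illustration, a minimal Scala sketch of that distinction (hypothetical isEven/isOdd pair): a self tail call can be rewritten into a jump within one frame, while mutual tail calls still consume a frame each, so a large enough input overflows the stack.

    import scala.annotation.tailrec

    object RecursionShapes {
      // Self recursion: rewritten into a loop, runs in constant stack space.
      @tailrec
      def countDown(n: Long): Long =
        if (n == 0) 0 else countDown(n - 1)

      // Mutual recursion: neither call can be turned into a local jump, so
      // isEven(10000000L) throws StackOverflowError without general TCE.
      def isEven(n: Long): Boolean = if (n == 0) true else isOdd(n - 1)
      def isOdd(n: Long): Boolean = if (n == 0) false else isEven(n - 1)
    }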
This is correct. @tailrec will simply cause a compile failure if the compiler DOESN’T optimize the function, but the compiler does still optimize tail recursion without the annotation.
It’s a weird but helpful annotation. For example, Scala won’t optimize methods that can be overridden (non-final and public/protected), which is easy to forget. So the annotation is a nice check/confirmation that the compiler is doing what you expect.
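A small Scala sketch of that check (class and method names are just for illustration):

    import scala.annotation.tailrec

    class Repeater {
      // Not optimized: the method is public and non-final, so a subclass could
      // override it, and scalac refuses to turn the self call into a jump.
      // Adding @tailrec here turns the silent non-optimization into a compile
      // error saying the method is neither private nor final, so it can be overridden.
      def ping(n: Int): Int =
        if (n <= 0) 0 else ping(n - 1)

      // Optimized: 'final' (or 'private') makes the rewrite safe, and @tailrec
      // confirms at compile time that it actually happened.
      @tailrec
      final def pong(n: Int): Int =
        if (n <= 0) 0 else pong(n - 1)
    }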
This is exciting. I'm guessing that since it applies to bytecode it works for Clojure as well? That would mean we'll be able to use (func (when cond function)) instead of (loop [] (recur)).
The use of recur in Clojure is by design. It would have been quite trivial to optimize this type of self-recursion, but since Clojure doesn't do general tail call optimization, they introduced the recur keyword instead. See also [1]. By the way, you don't need to use loop in Clojure either; recur also works with functions.
This optimization messes with the stack trace: if you throw an exception in one of the recursively called methods, it will show up only once in the stack trace. This may very well cause confusion when the developer is not aware that such an optimization has been performed.
However, you can still perform the optimization on your Java bytecode classes, and then pass them to Graal. (If I understand Graal correctly.)
Code gets optimized out of existence all the time, and it isn't a problem for exception handling code or debugging tools; I don't see why this would be any different.
This optimization is simply a goto to the start of the function, and reassignment of the arguments. It doesn't prevent exception handling in any way. It doesn't interfere with debugging tools.
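Roughly, in Scala terms (hypothetical names; the second method is approximately what the rewrite produces from the first):

    object TailCallRewrite {
      // What you write: the recursive call is the last thing the method does.
      def sum(n: Int, acc: Long): Long =
        if (n == 0) acc else sum(n - 1, acc + n)

      // What the optimization effectively emits: reassign the parameters and
      // jump back to the start of the method body, i.e. a plain loop.
      def sumLoop(n0: Int, acc0: Long): Long = {
        var n   = n0
        var acc = acc0
        while (n != 0) {   // the "goto" back to the start of the function
          acc = acc + n    // reassignment of the arguments
          n   = n - 1
        }
        acc
      }
    }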
The point is that when your program throws an exception, and prints the stacktrace, you will only see a single stack frame for your method. Even if it recursively descended into itself multiple times. This is counter-intuitive as you don't see the evidence of the recursive calls in the stack trace. If you're unaware that tail recursion optimization was performed, you won't know why it seems like your method was only called once.
The program flow stays the same, only the diagnostic information of the program changes.
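For a concrete picture, here is a Scala sketch using scalac's existing self-tail-call rewrite (names are illustrative): even after a thousand recursive "calls", the trace mentions the method only once.

    import scala.annotation.tailrec

    object TraceDemo {
      @tailrec
      def descend(n: Int): Int =
        if (n == 0) throw new IllegalStateException("boom")
        else descend(n - 1)

      def main(args: Array[String]): Unit = descend(1000)
      // The stack trace contains a single frame for 'descend', not 1000:
      //   java.lang.IllegalStateException: boom
      //     at TraceDemo$.descend(TraceDemo.scala:...)
      //     at TraceDemo$.main(TraceDemo.scala:...)
      //     ...
    }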
I spend a small amount of time programming in Scheme as a hobby, and honestly I would find the presence of a single stack frame resulting from a "recursive" call like this no more confusing than having a single stack frame containing a for loop. In both cases the state of the program at the point of the exception is specified by the variable bindings in a single stack frame rather than the depth of the call stack.
The whole idea with tail call elimination is that a stack trace shouldn't be able to see the frames for the tail calls, because the tail call reuses the frame. To add support for tracking the tail call stack frames, you would need to modify the compiler to output code that specifically keeps track of "elided tail call frames", and you would need to update the stack trace traversal code to be able to recognize the extra debug information.
Sure, it's something you could build into a new toolchain, but adding something like this to the JVM would be harder due to the constraints already placed on the JVM. Furthermore, I don't know of such support in any toolchain, not even for Lua or Scheme, both of which guarantee tail call optimization. (If anybody has an example, please share!)
Lua's stack trace includes an indication that one or more tail calls happened, but not how many:
    $ cat tailrec.lua
    function f(g,n)
      if n == 0 then
        error("oh no")
      end
      return g(g,n-1)
    end
    f(f,5)
    $ lua tailrec.lua
    lua: tailrec.lua:3: oh no
    stack traceback:
            [C]: in function 'error'
            tailrec.lua:3: in function 'f'
            (...tail calls...)
            tailrec.lua:7: in main chunk
            [C]: in ?
> There are some non-standard ECMAScript features in JavaScript that work differently in the presence of PTC.
Java has a strict spec, and the relevant methods which would break aren’t non-standard like they are in JavaScript.
If you’re willing to change the spec (I think Loom does) then yeah, but you can’t implement it as an ‘optimisation’ until then, because it’s not an optimisation if it changes behaviour.
The point of the tail call is to not have to maintain the activation that you are tail calling from.
So we’d need to reconstruct some information about the activation. How are you proposing we do that? From what information?
Are you suggesting we could store some metadata, perhaps? Can you suggest a mechanism for the metadata we should use? There's a tricky constraint before you suggest an idea: it has to use constant storage, because keeping storage from growing with the number of calls is the whole point in the first place.
JavaScriptCore has probably the best answer for this question, and it's the hilariously-named "Shadow Chicken" mechanism.
JSC's call stack is the C stack, and this stack has real TCO. There is a separate, heap-allocated "shadow" stack which records calls (but not returns). A single recycled C-stack frame may correspond to multiple shadow-stack calls, but between the two, a debugger can recover many tail calls.
It's best effort: the GC may collect frames if it chooses to, highly optimized code won't touch the shadow stack, etc. But it's good enough for a debugger.
It's named "Shadow Chicken" because it's a shadow stack inspired by CHICKEN Scheme, which uses the legendary Cheney-on-the-MTA strategy.
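To make the "bounded storage" part concrete, here's a toy Scala sketch of the general idea (not JSC's actual data structures; all names are made up): record tail calls into a fixed-size ring buffer so a debugger can show the most recent elided callees plus a count of the rest, with storage that stays constant however long the chain gets.

    object ShadowLogDemo {
      // Hypothetical bounded shadow log of elided tail calls (illustration only).
      final class ShadowLog(capacity: Int) {
        private val ring  = new Array[String](capacity)
        private var total = 0 // tail calls recorded so far

        def recordTailCall(callee: String): Unit = {
          ring(total % capacity) = callee // overwrite the oldest entry: constant storage
          total += 1
        }

        // What a debugger could display: the most recent callees, oldest first,
        // plus how many even older ones were dropped.
        def describe: String = {
          val kept = math.min(total, capacity)
          val newestFirst = (0 until kept).map(i => ring((total - 1 - i) % capacity))
          s"(${math.max(0, total - capacity)} older tail calls elided) " +
            newestFirst.reverse.mkString(" -> ")
        }
      }

      def main(args: Array[String]): Unit = {
        // Example: a chain of five tail calls recorded with room for only four.
        val log = new ShadowLog(4)
        Seq("f", "g", "f", "g", "f").foreach(log.recordTailCall)
        println(log.describe) // prints the last four callees plus a count of the older ones
      }
    }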