I've been hoping for tail call elimination[1] on the JVM for a long time, and historically it has been blocked by a security requirement[2]: changing the number of stack frames between two calls would cause an error. Apparently that check is no longer there, so TCE is possible!
[1] Using the word 'optimization' suggests that eliminating tail calls is optional, but for it to be really useful it has to be guaranteed. If you're writing in recursive style, your program becomes instantly incorrect if TCE goes away because many acceptable inputs will start causing stack overflows.
[2] See this answer: https://softwareengineering.stackexchange.com/a/272086
> in jdk classes [...] there are a number of security sensitive methods that rely on counting stack frames between jdk library code and calling code to figure out who's calling them.
That's... lame. BTW, tail call elimination could keep a count of eliminated tail calls for this. It wouldn't work so well for mutual recursion, but maybe that would suffice. In any case, nothing should be so dependent on stack traces!
How to do it for mutual recursion is sketched in my other comment. Java's stack inspection doesn't inherently require counting stack frames -- it's a choice. I agree it's a bad model anyway.
I know that for JavaScript it was not implemented due to issues around debugging. I think for that reason, if tail call optimization were done, you would need some sort of control over it, such as an annotation to turn it on or off per method, maybe even a global flag for it.
It's actually possible to implement calling in such a way that TCO is compatible with Java's security architecture (or at least as it was back in the late '90s when I cared about this). IIRC you'd carry a finite state machine along with your calls, which would transition when you make a tail call of the sort that matters to the security checks. In other words, the amount of security state needed by a tail-call chain was bounded, if you implemented the spec in a tail-call-conscious way.
I thought stack inspection was a bad idea anyway, but you could implement it together with TCO.
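A very rough Scala sketch of that idea (my reading of the comment, not how the JVM actually implements stack inspection; all names here are hypothetical): the security-relevant fact about a tail-call chain is whether an untrusted caller is on it, and that fact fits in a constant-size state value carried along with the call rather than recovered by walking frames.

    object StackInspectionSketch {
      // Hypothetical constant-size security state threaded through a tail-call chain.
      sealed trait TrustState
      case object AllTrusted extends TrustState      // every elided caller so far was privileged
      case object CrossedBoundary extends TrustState // at least one untrusted caller is on the chain

      // Transition applied when a tail call elides a frame:
      // once a trust boundary has been crossed, it stays crossed.
      def transition(state: TrustState, callerIsTrusted: Boolean): TrustState =
        if (callerIsTrusted) state else CrossedBoundary

      // A check that used to count frames between library code and its caller
      // can consult the bounded state instead.
      def checkPrivileged(state: TrustState): Unit =
        if (state == CrossedBoundary)
          throw new SecurityException("untrusted code on the (elided) call chain")
    }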
I remember the rationale behind Clojure's recur: if you want to use it, your recur call needs to be in a tail-call-optimizable position, otherwise it fails to compile (sorry, it's been a long time since I've done any Clojure, so I could be wrong here).
At first I thought: why such a requirement? Then it hit me: this is not bad. It actually enforces that either the call really is a tail call and the compiler optimizes it, or the code doesn't compile at all. So there is a strong guarantee there.
This does not appear to be general TCE (i.e. between two possibly different functions), but strictly recursive TCE. So with mutual recursion you could still overflow the stack. Nevertheless it seems like a big step forward.
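For illustration, a minimal Scala sketch of that distinction (hypothetical isEven/isOdd pair): a self tail call can be rewritten into a jump within one frame, while mutual tail calls still consume a frame each, so a large enough input overflows the stack.

    import scala.annotation.tailrec

    object RecursionShapes {
      // Self recursion: rewritten into a loop, runs in constant stack space.
      @tailrec
      def countDown(n: Long): Long =
        if (n == 0) 0 else countDown(n - 1)

      // Mutual recursion: neither call can be turned into a local jump, so
      // isEven(10000000L) throws StackOverflowError without general TCE.
      def isEven(n: Long): Boolean = if (n == 0) true else isOdd(n - 1)
      def isOdd(n: Long): Boolean = if (n == 0) false else isEven(n - 1)
    }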
This is correct. @tailrec will simply cause a compile failure if the compiler DOESN’T optimize the function, but the compiler does still optimize tail recursion without the annotation.
It’s a weird but helpful annotation. For example, Scala won’t optimize methods that can be overridden (non-final and public/protected), which is easy to forget. So the annotation is a nice check/confirmation that the compiler is doing what you expect.
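A small Scala sketch of that check (class and method names are just for illustration):

    import scala.annotation.tailrec

    class Repeater {
      // Not optimized: the method is public and non-final, so a subclass could
      // override it, and scalac refuses to turn the self call into a jump.
      // Adding @tailrec here turns the silent non-optimization into a compile
      // error saying the method is neither private nor final, so it can be overridden.
      def ping(n: Int): Int =
        if (n <= 0) 0 else ping(n - 1)

      // Optimized: 'final' (or 'private') makes the rewrite safe, and @tailrec
      // confirms at compile time that it actually happened.
      @tailrec
      final def pong(n: Int): Int =
        if (n <= 0) 0 else pong(n - 1)
    }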
This is exciting. I'm guessing that since it applies to bytecode it works for Clojure as well? That would mean we'll be able to use (func (when cond function)) instead of (loop [] (recur)).
The use of recur in Clojure is by design. It would have been quite trivial to optimize this type of self-recursion, but since Clojure doesn't do general tail call optimization, they introduced the recur keyword instead. See also [1]. By the way, you don't need to use loop in Clojure either; recur also works with functions.
This optimization messes with the stack trace: if you throw an exception in one of the recursively called methods, it will show up only once in the stack trace. This may very well cause confusion when the developer is not aware that such an optimization has been performed.
However, you can still perform the optimization on your Java bytecode classes, and then pass them to Graal. (If I understand Graal correctly.)
Code gets optimized out of existence all the time, and it isn't a problem for exception handling code or debugging tools; I don't see why this would be any different.
This optimization is simply a goto to the start of the function, and reassignment of the arguments. It doesn't prevent exception handling in any way. It doesn't interfere with debugging tools.
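Roughly, in Scala terms (hypothetical names; the second method is approximately what the rewrite produces from the first):

    object TailCallRewrite {
      // What you write: the recursive call is the last thing the method does.
      def sum(n: Int, acc: Long): Long =
        if (n == 0) acc else sum(n - 1, acc + n)

      // What the optimization effectively emits: reassign the parameters and
      // jump back to the start of the method body, i.e. a plain loop.
      def sumLoop(n0: Int, acc0: Long): Long = {
        var n   = n0
        var acc = acc0
        while (n != 0) {   // the "goto" back to the start of the function
          acc = acc + n    // reassignment of the arguments
          n   = n - 1
        }
        acc
      }
    }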
The point is that when your program throws an exception, and prints the stacktrace, you will only see a single stack frame for your method. Even if it recursively descended into itself multiple times. This is counter-intuitive as you don't see the evidence of the recursive calls in the stack trace. If you're unaware that tail recursion optimization was performed, you won't know why it seems like your method was only called once.
The program flow stays the same, only the diagnostic information of the program changes.
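For a concrete picture, here is a Scala sketch using scalac's existing self-tail-call rewrite (names are illustrative): even after a thousand recursive "calls", the trace mentions the method only once.

    import scala.annotation.tailrec

    object TraceDemo {
      @tailrec
      def descend(n: Int): Int =
        if (n == 0) throw new IllegalStateException("boom")
        else descend(n - 1)

      def main(args: Array[String]): Unit = descend(1000)
      // The stack trace contains a single frame for 'descend', not 1000:
      //   java.lang.IllegalStateException: boom
      //     at TraceDemo$.descend(TraceDemo.scala:...)
      //     at TraceDemo$.main(TraceDemo.scala:...)
      //     ...
    }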
I spend a small amount of time programming in Scheme as a hobby, and honestly I would find the presence of a single stack frame resulting from a "recursive" call like this no more confusing than having a single stack frame containing a for loop. In both cases the state of the program at the point of the exception is specified by the variable bindings in a single stack frame rather than the depth of the call stack.
The whole idea with tail call elimination is that a stack trace shouldn't be able to see the frames for the tail calls, because the tail call reuses the frame. To add support for tracking the tail call stack frames, you would need to modify the compiler to output code that specifically keeps track of "elided tail call frames", and you would need to update the stack trace traversal code to be able to recognize the extra debug information.
Sure, it's something you could build into a new toolchain, but adding something like this to the JVM would be harder due to the constraints already placed on the JVM. Furthermore, I don't know of such support in any toolchain, not even for Lua or Scheme, both of which guarantee tail call optimization. (If anybody has an example, please share!)
Lua's stack trace includes an indication that one or more tail calls happened, but not how many:
    $ cat tailrec.lua
    function f(g,n)
      if n == 0 then
        error("oh no")
      end
      return g(g,n-1)
    end
    f(f,5)
    $ lua tailrec.lua
    lua: tailrec.lua:3: oh no
    stack traceback:
            [C]: in function 'error'
            tailrec.lua:3: in function 'f'
            (...tail calls...)
            tailrec.lua:7: in main chunk
            [C]: in ?
> There are some non-standard ECMAScript features in JavaScript that work differently in the presence of PTC.
Java has a strict spec, and the relevant methods which would break aren’t non-standard like they are in JavaScript.
If you’re willing to change the spec (I think Loom does) then yeah, but you can’t implement it as an ‘optimisation’ until then, because it’s not an optimisation if it changes behaviour.
The point of the tail call is to not have to maintain the activation that you are tail calling from.
So we’d need to reconstruct some information about the activation. How are you proposing we do that? From what information?
Are you suggesting we could store some metadata, perhaps? Can you suggest a mechanism for the metadata we should use? There's a tricky constraint before you suggest an idea: it has to use constant storage, because keeping storage from growing with the number of calls is the whole point in the first place.
JavaScriptCore has probably the best answer for this question, and it's the hilariously-named "Shadow Chicken" mechanism.
JSC's call stack is the C stack, and this stack has real TCO. There is a separate, heap-allocated "shadow" stack which records calls (but not returns). A single recycled C-stack frame may correspond to multiple shadow-stack calls, but between the two, a debugger can recover many tail calls.
It's best effort: the GC may collect frames if it chooses to, highly optimized code won't touch the shadow stack, etc. But it's good enough for a debugger.
It's named "Shadow Chicken" because it's a shadow stack inspired by CHICKEN Scheme, which uses the legendary Cheney-on-the-MTA strategy.
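To make the "bounded storage" part concrete, here's a toy Scala sketch of the general idea (not JSC's actual data structures; all names are made up): record tail calls into a fixed-size ring buffer so a debugger can show the most recent elided callees plus a count of the rest, with storage that stays constant however long the chain gets.

    object ShadowLogDemo {
      // Hypothetical bounded shadow log of elided tail calls (illustration only).
      final class ShadowLog(capacity: Int) {
        private val ring  = new Array[String](capacity)
        private var total = 0 // tail calls recorded so far

        def recordTailCall(callee: String): Unit = {
          ring(total % capacity) = callee // overwrite the oldest entry: constant storage
          total += 1
        }

        // What a debugger could display: the most recent callees, oldest first,
        // plus how many even older ones were dropped.
        def describe: String = {
          val kept = math.min(total, capacity)
          val newestFirst = (0 until kept).map(i => ring((total - 1 - i) % capacity))
          s"(${math.max(0, total - capacity)} older tail calls elided) " +
            newestFirst.reverse.mkString(" -> ")
        }
      }

      def main(args: Array[String]): Unit = {
        // Example: a chain of five tail calls recorded with room for only four.
        val log = new ShadowLog(4)
        Seq("f", "g", "f", "g", "f").foreach(log.recordTailCall)
        println(log.describe) // prints the last four callees plus a count of the older ones
      }
    }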