
True, but if you have a great idea for a new language you really don't want to spend years implementing your own code generator, garbage collector, runtime system and standard library; you want to get a useful language as quickly as possible.

As an educational experience doing everything yourself from the machine code layer up is great, but for getting stuff off the ground it's not.



I agree that the VM can be a lot of work. However, some language features may need support from the VM. E.g. Scala and Clojure don't have full TCO support because the JVM doesn't support it (yet!). If a language uses an existing VM, it may need to work around the VM's limitations. I suspect that's the reason Jython is slower than CPython even though the JVM is much faster than the CPython VM.

IMO the big advantage of using a mature existing VM is the library. It should be possible to have at least a usable GC (say simple mark-and-sweep), code generator etc. without too much effort.


It'd be a good idea to decide which features you want most and whether the disadvantages of a VM outweigh its advantages. Although Clojure lacks TCO, it's already gotten very popular very quickly, maybe more so than other Lisps. Hard data on the JVM's role in this comes from the poll posted not long ago showing that many Clojure programmers were former Java programmers.

I'm also working on a cross-language compiler and have a question about TCO, specifically tail recursion. Currently my language compiles a tail-recursive function's body into a while loop. Don't Scala and Clojure do the same, except using the actual bytecode?
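For illustration, here's a sketch (in Python, not the poster's language; `fact` is a made-up example) of the transformation being described: the self-call in tail position becomes parameter rebinding plus a jump back to the top of a loop, so the stack never grows.

```python
# Tail-recursive source form (conceptually):
#
#     def fact(n, acc=1):
#         if n <= 1:
#             return acc
#         return fact(n - 1, acc * n)   # self-call in tail position
#
# Compiled form: the self-call becomes parameter rebinding + a loop.
def fact(n, acc=1):
    while True:
        if n <= 1:
            return acc
        n, acc = n - 1, acc * n  # "jump" to the top instead of growing the stack
```

With the loop form, `fact(3000)` works even though the recursive form would blow past CPython's default recursion limit.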


> I'm also working on a cross-language compiler and have a question about TCO, specifically tail recursion. Currently my language compiles a tail-recursive function's body into a while loop. Don't Scala and Clojure do the same, except using the actual bytecode?

I know Scala does it at the function level (i.e. the function calls itself in tail position). I think Clojure has a 'recur' keyword to similar effect. The issue with the JVM is that if f() calls g() in tail position and g() calls f() in tail position, it can't be optimized away (at least not without an undue amount of work, at which point the advantage is lost). Clojure uses a trampoline[1]-based approach to handle that situation. I think Scala 2.8 also adds support for it. This runs in constant space; the only issue is that it's a performance hit, since it's not done by the VM.

[1] http://richhickey.github.com/clojure/clojure.core-api.html#c...
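A minimal Python sketch of the trampoline idea (the names `trampoline`, `is_even`, and `is_odd` are made up for illustration; Clojure's actual `trampoline` works analogously on fns):

```python
def trampoline(f, *args):
    """Keep invoking thunks until a non-callable value comes back."""
    result = f(*args)
    while callable(result):
        result = result()
    return result

def is_even(n):
    if n == 0:
        return True
    return lambda: is_odd(n - 1)   # return a thunk instead of calling

def is_odd(n):
    if n == 0:
        return False
    return lambda: is_even(n - 1)
```

`trampoline(is_even, 1000000)` runs in constant stack space even though `is_even` and `is_odd` call each other in tail position; the cost is a closure allocation and an indirect call per step, which is the performance hit mentioned above. (Like Clojure's version, this can't distinguish a thunk from a genuinely function-valued final result.)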


Compiling a self-recursive function into a loop catches many cases of tail recursion, but certainly not all. Real TCO requires some support from the VM to be efficient (if you don't care about efficiency, it's possible to fake it with exceptions).
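A sketch of the exception-based fake in Python, using a hypothetical `tail`/`run` pair (all names here are made up for illustration): each tail call raises an exception carrying the target and its arguments, which unwinds the current frame, and a driver loop catches it and reinvokes the target.

```python
class TailCall(Exception):
    """Carries the tail-called function and its arguments up the stack."""
    def __init__(self, fn, args):
        self.fn, self.args = fn, args

def tail(fn, *args):
    raise TailCall(fn, args)   # never returns; unwinds the current frame

def run(fn, *args):
    while True:
        try:
            return fn(*args)          # ordinary return ends the loop
        except TailCall as tc:
            fn, args = tc.fn, tc.args # resume with the tail-called target

def is_even(n):
    return True if n == 0 else tail(is_odd, n - 1)

def is_odd(n):
    return False if n == 0 else tail(is_even, n - 1)
```

This handles mutual recursion in constant stack space, but raising and catching an exception on every tail call is exactly why it's only an option when efficiency doesn't matter.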


I don't know, a new language will take you years anyway before it's useful. And it's a very cool hobby, so I'm not even interested in getting it done as quickly as possible. ;) I want to make it as good as I can.

And I get a warm fuzzy feeling from doing stuff myself (although I'll use the Boehm GC and compile to C for my next language).


How confident are you that a novice language designer will be able to do better cross-platform codegen than LLVM?


I would rather that language designers get basic stuff like lexical scope right, before they care about performance.

And LLVM is effectively a huge black box, which I would caution any new language implementor against using. Sure, it may get you off the ground easier, but that's because you'll no longer be standing on the ground: you'll be standing on LLVM, a massive codebase you understand nothing about.


That's going to be the case regardless. If you compile to x86 machine code you're standing on a huge black box, but this time it's also an ugly, platform specific one.

If you think people should worry about getting lexical scope right, why should they also worry about the low-level details? Sure, they have to have at least some idea of how things work at the lowest level, but they don't have to know all the stupid man-made details. Choosing LLVM over x86 assembly is good for both performance and productivity: performance with LLVM will be better unless you're going to spend an extraordinary amount of time building better low-level optimizations, register allocation and code generation than LLVM's.


I agree entirely that language designers should focus on getting language issues like scopes right — and that's why I don't think they should waste their time reinventing codegen over and over again unless there's a compelling need for it. I mean, if you're just playing around and don't really want to make a language, fine. But wasting your time on details that aren't useful is the best way to make sure your project never amounts to anything.


Speaking from the systems side of things: it's plainly obvious when you get a piece of software whose developers don't understand the system level at all - they only understand things at an abstract programming level and don't really grasp how their software is going to work in the real world. (The software will do what it's supposed to, and they may have implemented some fancy algorithms, but it will be a PITA to debug, a PITA to install, a PITA for every sysadmin who has to touch it, and a PITA to design systems to support.)

The point of doing the low-level projects is to learn for yourself, not to literally create the best new language (but you never know.)


Sure, like I said, if your goal is just idle curiosity and a desire to learn, that's fine. But it's kind of moving the goalposts to frame that as "creating a programming language" rather than "fruitlessly messing around with the science and techniques behind programming languages."



