Faster than "the single-threaded C version." I found this a bit shocking - on a 32-core machine, this is an achievement? Obviously it's not Clojure's fault as Scala and Java are in the same range, but it was a surprise.
It's not really a 32-core machine. It's an 8-core UltraSPARC T1, with each core supporting 4 hardware threads. Also, this is the sort of problem that C absolutely kills at. C can just mmap the file, run straight across the raw bytes, and take advantage of all kinds of cache tricks. For well-written C code the problem is basically IO-bound, and I was surprised I could catch up to it at all with the JVM.
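For anyone who hasn't seen the "mmap the file and run across the raw bytes" approach, here is a rough sketch in C. It is not the benchmark's actual C code: the function name `count_newlines_mmap` is made up for illustration, and counting newline bytes is only a stand-in for whatever per-byte work the real task does. It assumes a POSIX system.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the benchmark): count '\n' bytes in a
   file by mapping it into memory and scanning it linearly.
   Returns the count, or -1 on error. */
long count_newlines_mmap(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *data =
        mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    /* One sequential pass over the raw bytes: no copies into user
       buffers and no per-line allocation, so the access pattern is
       friendly to the cache and to hardware prefetching. */
    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            count++;

    munmap((void *)data, len);
    return count;
}
```

Calling `count_newlines_mmap("some.log")` on a file you have returns its newline count. The point is just that the inner loop touches each byte once, in order, which is why the C version ends up limited mostly by how fast the file can be read.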