Thanks. Yeah, fast code in any language tends not to be very pretty to look at, and I'm not a very experienced Lisper. But fortunately, in most real-world programs the performance-critical part tends to be very small compared to the rest of the program.
Oh, I didn't mean it as a criticism -- I'm afraid it has to be that way. My Common Lisp code that has been optimized looks similar (sprinkled with type declarations in various places).
Well, being a seasoned Clojurist, I have almost never dug into such low-level optimizations, but nonetheless I find his code OK and very readable.
The only function sacrificing some level of readability (and which I still find quite easy to read) is the new split function.
I think it is very much a question of being accustomed to the particular style of code Lisps enforce. On this topic, the only gripes I have with his functions on the style side are the heavy Java interop, which lengthens the whole thing quite a bit, and the big, bulky functions that would benefit from being cut into several pieces.
Faster than "the single-threaded C version." I found this a bit shocking: on a 32-core machine, this is an achievement? Obviously it's not Clojure's fault, as Scala and Java are in the same range, but it was a surprise.
It's not really a 32-core machine. It's an 8-core UltraSPARC T1, with each core supporting 4 hardware threads. Also, this is the sort of problem that C absolutely kills at. C can just mmap the file, run straight across the raw bytes, and take advantage of all kinds of cache tricks. For well-written C code the problem is basically IO bound, and I was surprised I could catch up to it at all with the JVM.
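To make the "mmap the file and run across the raw bytes" point concrete, here is a minimal sketch for a POSIX system. It is not the C version from the benchmark: the function name `count_newlines_mmap` is hypothetical, and counting newline bytes is only an assumed stand-in for whatever per-line work the real program does.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the benchmark): count '\n' bytes in a file
   by mapping it read-only and scanning the bytes in place.
   Returns the count, or -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    /* No read() calls and no copying into a user buffer: memchr walks
       the mapped pages directly. */
    long count = 0;
    const char *p = data;
    const char *end = data + len;
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        count++;
        p++;
    }

    munmap((void *)data, len);
    return count;
}
```

The loop touches each byte once and does no allocation, which is roughly why a scan like this ends up limited by how fast the file can be read rather than by the CPU.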
It also shows, unfortunately, that fast Lisp code isn't always pretty to look at.