It's impressively compliant, considering it's just a one-man project! Almost as fully featured as Boa, plus or minus a few things. And generally faster too - almost double the speed of Boa on some benchmarks.
First time seeing a few of the engines listed here - based on this table I'm surprised Samsung's Escargot hasn't gotten more attention. LGPL, 100% ES2016+ compliance, top-10 perf ranking, 25% the size of V8, and only 318 GitHub stars.
A quick HN search shows 0 comments for Escargot - is there some hidden problem with this engine not covered in this table?
Because it pretty much only makes sense for Samsung TVs and smart appliances, since it scores around 3% of V8 on the benchmarks.
It's too big for most embedded devices, too slow for general computing, and if you can run something 25% the size of V8, you can probably just run V8. If for some reason that size and speed profile does fit your niche and you aren't Samsung wanting to use their own software, then Facebook's Hermes looks better in terms of licensing, speed and binary size, and writing compatible JS for it isn't that hard.
Personally, I'm more impressed with https://github.com/Hans-Halverson/brimstone - it's faster, nearly as full-featured (almost full ES2025) and, last but not least, a single-person project.
Yeah! I found out about Brimstone just the other day! It's definitely interesting! One optimization they have that Boa needs to implement is ropes for our string type :)
A solver running at 50ms instead of 1ms is, I would say, practically imperceptible to most users, but I don't know what time spans those numbers are measuring.
$ time ./v8 /bench/yt-dlp.js | md5sum -
a730e32029941bf1f60f9587a6d9554f -
real 0m0.252s
user 0m0.386s
sys 0m0.074s
$ time ./quickjs /bench/yt-dlp.js | md5sum -
a730e32029941bf1f60f9587a6d9554f -
real 0m2.280s
user 0m2.507s
sys 0m0.031s
So about 10x slower for the current flavor of YouTube challenges: 0.2s -> 2.2s.
A few more results on the same input:
spidermonkey 0.334s
v8_jitless 1.096s => about the limit for JIT-less interpreters like quickjs
graaljs 2.396s
escargot 3.344s
libjs 4.501s
brimstone 6.328s
modernc-quickjs 12.767s (pure Go port of quickjs)
fastschema-qjs 1m22.801s (Wasm port of quickjs)
boa 1m28.070s
quickjs-ng 2m49.202s
node(v8) : 1.25s user 0.12s system 154% cpu 0.892 total
quickjs : 6.54s user 0.11s system 99% cpu 6.671 total
quickjs-ng: 545.55s user 202.67s system 99% cpu 12:32.28 total
A 5x slowdown for an interpreted C JS engine is pretty good, I think, considering all the time, code and effort put into V8 over the years!
It works incredibly well with Linux VMs, my daily driver. I plug in a USB keyboard and an external monitor, and it's Can't Believe It's Not Linux. Only occasionally, when I need to use the laptop screen/keyboard, does macOS bother me and remind me of its real self.
There's around a 10-15% performance penalty for VMs (assuming you use arm64 guests), but the whole system is just so much faster and better built than anything Intel-based to date that it more than compensates.
For Windows, it's lacking accelerated video drivers, but VMware Fusion is an OK free alternative - I can totally play AAA games from the last decade. Enjoy it until Broadcom kills it.
Funny you say that: as a long-term Linux user who was in the exact same boat as you, I actually find the M4 Mac my best Linux laptop purchase so far. I think what you're missing is its virtualization story. Put UTM on it, and you're back to a familiar environment, just on much nicer hardware. The first time I booted into my Linux desktop on it, I was blown away by how much snappier it felt compared to my ~5-year-old top-of-the-line PC build.
I'm as much of a fan of macOS as the next Linux user here, but it's a very decent hypervisor and Stuff Just Works out of the box, most of the time. No more screwing around with half-baked QEMU wrappers, VFIO, virgl and whatnot for me. And running stuff without virtualization is a non-starter for me - I was concerned about supply chain attacks before it became fashionable. Of course it would be even nicer if new Macs could run Linux natively, and I hope the Asahi project will succeed with that, but until then I'm pretty happy running a Linux desktop virtualized on it.
arm64 support is very decent across all the different OSes now; I hardly miss Intel. I can even reasonably play most AAA games up to maybe the mid-2010s on a Windows VM that's just a three-finger swipe away from my main Linux desktop.
They allow it, but Apple's policy is to lock down that ability pretty much just to Safari/WKWebView. If you can transpile/compile your program to JS or WASM and run it through one of those blessed options, it should get JITted.
> The initial naive application didn’t even yield much gains. Only after a bunch of optimizations that it really shines: a 30-46% speedup compared to the computed goto interpreter.
Looks like quite a lot of complexity for such a gain. 30-40% is roughly what context threading would buy you [1]. It takes relatively little code to implement - emit honest assembly only for jumps and conditional branches; for the other opcodes, just emit a call to the interpreter's handler. Reportedly, it took Apple just 4k LOC to ship the first JIT like that in JavaScriptCore [2].
Also, if you haven't seen it, musttail + preserve_none is a cool new dispatch technique to get more mileage out of plain C/C++ before turning to hand-coded assembly/JIT [3]. A step up from computed goto.
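To make that concrete, here's a minimal sketch of tail-call dispatch in C - not taken from any particular engine, the opcodes are made up, and the attribute spellings assume a recent clang (preserve_none needs clang 17+); on other compilers it degrades to ordinary calls:

    /* Minimal tail-call dispatch sketch (illustrative only). Each opcode
     * handler ends in a guaranteed tail call to the next handler, so the
     * interpreter "loop" lives entirely in the call chain. */
    #include <stdint.h>
    #include <stdio.h>

    #ifndef __has_attribute
    #  define __has_attribute(x) 0
    #endif
    #if __has_attribute(preserve_none)
    #  define PRESERVE_NONE __attribute__((preserve_none))   /* clang >= 17 */
    #else
    #  define PRESERVE_NONE
    #endif
    #if __has_attribute(musttail)
    #  define MUSTTAIL __attribute__((musttail))
    #else
    #  define MUSTTAIL   /* plain call fallback; long programs may grow the stack */
    #endif

    struct vm;
    /* All handlers share one signature so every dispatch can be a real tail call. */
    typedef PRESERVE_NONE void handler_fn(struct vm *vm, const uint8_t *pc, int64_t acc);

    struct vm {
        handler_fn *const *table;   /* one handler per opcode */
        int64_t result;
    };

    #define DISPATCH(vm, pc, acc) \
        MUSTTAIL return (vm)->table[*(pc)]((vm), (pc), (acc))

    static PRESERVE_NONE void op_inc(struct vm *vm, const uint8_t *pc, int64_t acc) {
        acc += 1;                   /* the opcode's actual work... */
        DISPATCH(vm, pc + 1, acc);  /* ...then tail-call the next handler */
    }

    static PRESERVE_NONE void op_halt(struct vm *vm, const uint8_t *pc, int64_t acc) {
        (void)pc;
        vm->result = acc;           /* returning here ends the whole "loop" */
    }

    int main(void) {
        static handler_fn *const table[] = { op_inc, op_halt };
        static const uint8_t code[] = { 0, 0, 0, 1 };   /* inc, inc, inc, halt */
        struct vm vm = { .table = table, .result = 0 };
        table[code[0]](&vm, code, 0);
        printf("%lld\n", (long long)vm.result);         /* prints 3 */
        return 0;
    }

musttail forces each dispatch to compile to a jump rather than a call, so the chain never grows the stack, and preserve_none removes the callee-saved register save/restore from every handler's prologue/epilogue.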
I wonder how tricks that rely on compiler extensions (e.g., computed goto, musttail, and preserve_none) compare against the weval transform? The weval transform involves a small language extension backed by a larger change to the compiler implementation.
I suppose the downside of the weval transform is that it is only helpful for interpreters, whereas the other extensions could have other use cases.
Well, runtime/warmup costs seem like one obvious downside to me - weval would add some non-trivial compilation overhead to your interpreter (unrolling of the interpreter loop, dead code elimination, optimizing across opcode boundaries - probably a major source of the speedup). Great if you have the time to precompile your script - you only have to pay those costs once. It also helps if your host language's runtime ships with an optimizing compiler/JIT you can piggyback on (the Wasm runtime in weval's paper, the JVM in Graal's case) - these things take space. But sometimes you might just have a huge pile of code that's not hot enough to be worth optimizing, and you'd be better off with a basic interpreter (which can benefit from computed gotos or tail-call dispatch with zero runtime overhead). Octane's CodeLoad or TypeScript benchmarks are examples of that - GraalJS does pretty poorly there.
Partial evaluation subsumes a lot of other compiler optimizations, like constant folding, inlining and dead code elimination, so it wouldn't just find application with interpreters.
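As a toy illustration of what that means for interpreters - this is hand-written to show the idea, not actual weval output - specializing an interpreter to a fixed program folds away the dispatch loop, the stack and the constant operands, leaving a straight-line residual function:

    /* Toy bytecode interpreter and its hand-specialized residual. */
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_MUL, OP_END };

    /* Generic interpreter: program and operands are all dynamic. */
    static int64_t interp(const int64_t *code, int64_t x) {
        int64_t stack[16], *sp = stack;
        *sp++ = x;
        for (size_t pc = 0;;) {
            switch (code[pc++]) {
            case OP_PUSH: *sp++ = code[pc++]; break;
            case OP_ADD:  sp--; sp[-1] += sp[0]; break;
            case OP_MUL:  sp--; sp[-1] *= sp[0]; break;
            case OP_END:  return sp[-1];
            }
        }
    }

    /* Fixed program: (x + 2) * 3 */
    static const int64_t prog[] = { OP_PUSH, 2, OP_ADD, OP_PUSH, 3, OP_MUL, OP_END };

    /* What partially evaluating interp() over prog can reduce to: the loop,
     * the stack and the opcode dispatch are gone, only the arithmetic remains. */
    static int64_t residual(int64_t x) {
        return (x + 2) * 3;
    }

    int main(void) {
        printf("%lld %lld\n",
               (long long)interp(prog, 5),    /* 21 */
               (long long)residual(5));       /* 21 */
        return 0;
    }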
My favorite trick: NaN boxing. NaNs aren't just for errors - you can also smuggle other data inside. For a double, you have a whopping 53 bits of payload, enough to cram in a pointer and maybe a type tag, and many JavaScript engines do (since JS numbers are doubles, after all).
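A minimal sketch of the trick in C - the tag layout here is made up for illustration and doesn't match any specific engine:

    /* Minimal NaN-boxing sketch: doubles are stored as their own bit pattern,
     * other values are tucked into the payload of a quiet NaN and marked with
     * a tag bit. The tag choice (bit 48) is arbitrary/illustrative. */
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    typedef uint64_t Value;

    /* Canonical quiet NaN: exponent all ones, top mantissa bit set. */
    #define QNAN    UINT64_C(0x7ff8000000000000)
    #define TAG_INT UINT64_C(0x0001000000000000)   /* made-up "this is an int" tag */

    static Value box_double(double d) {
        Value v;
        memcpy(&v, &d, sizeof v);      /* reinterpret the bits without UB */
        return v;
    }

    static double unbox_double(Value v) {
        double d;
        memcpy(&d, &v, sizeof d);
        return d;
    }

    static int is_double(Value v) {
        /* Anything that isn't one of our tagged NaN patterns is a plain double.
         * (Real engines also canonicalize genuine NaNs so they can't collide.) */
        return (v & QNAN) != QNAN || (v & TAG_INT) == 0;
    }

    static Value box_int(uint32_t i) {
        return QNAN | TAG_INT | i;     /* the integer lives in the low 32 bits */
    }

    static uint32_t unbox_int(Value v) {
        return (uint32_t)v;
    }

    int main(void) {
        Value a = box_double(3.14);
        Value b = box_int(42);
        assert(is_double(a) && !is_double(b));
        printf("%f %u\n", unbox_double(a), unbox_int(b));
        return 0;
    }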
It is also how RISC-V floating-point registers are required to store floats of smaller widths. E.g. if your CPU supports 64-bit floats (D extension), its FPU registers will be 64 bits wide. If you use an instruction to load a 16-bit float (Zfh extension) into such a register, it gets boxed into a negative quiet NaN with all bits above the lower 16 set to 1.
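For illustration, this is roughly the bit pattern that results if you construct it by hand (the half-precision value here is just an example):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint16_t half = 0x3c00;   /* 1.0 in IEEE binary16 */
        /* All bits above the low 16 set to 1: sign = 1, exponent all ones,
         * mantissa non-zero with the quiet bit set -> a negative quiet NaN. */
        uint64_t boxed = UINT64_C(0xffffffffffff0000) | half;
        printf("boxed: 0x%016llx\n", (unsigned long long)boxed);
        return 0;
    }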
You can put stuff in the sign bit too - that's how you get to 53. Yeah, the lower 52 bits can't all be zero (that'd be ±Inf), but the other 2^53-2 values are all yours to use.
It's possible for the sign bit of a NaN to be changed by a "non-arithmetic" operation that doesn't trap on the NaN, so don't put anything precious in there.