Hacker News

> I confess that somewhat surprises me, as I would expect getting machine code generated would be a main source of speed up for a JIT?

My understanding is that the basic copy-and-patch approach, without any other optimizations, doesn't actually buy you much. The difference between an interpreter running opcodes A, B, C and a JIT emitting machine code for the opcode sequence A, B, C is small: the CPU executes roughly the same instructions either way. The only real difference is that the JIT avoids the op dispatch between each op, and that dispatch is already not that expensive thanks to jump threading in the interpreter. Meanwhile, the JIT adds a possible extra cost of more work whenever you need to jump from JIT code back to the fallback interpreter.

What the JIT does enable is generating machine code for more specialized ops that wouldn't pay off in the interpreter (there, more and smaller ops are much worse for the icache and branch predictor). For example, the standard CPython interpreter's ops do very frequent refcount updates, while the JIT can relatively easily remove sequences where a refcount increment is immediately followed by a decrement in the next op.

Or maybe I misunderstood the question; to put it another way: copy-and-patch's code generation is in principle quite simple, and the real benefit comes from the optimized opcode stream you feed it, which wouldn't have worked as well in the interpreter.



Right, that is basically what I was asking. Essentially, I expected the machine code to be a bit of an unrolling of the interpreter over the opcodes that a piece of code is executing.

That my intuition is wrong here doesn't shock me, I should add. It was still a surprise, and it will get me to update my mental model of what the interpreter is doing.



