It wasn't too action-packed, seeing as a number of titles from that era, and even before that, used the SCUMM system. Running in a VM was SOP for point-and-click games.
There was a huge jump between the 8-bit and 16-bit hardware in particular: more memory to play with, a faster CPU, and better graphics hardware all at once. (Of course, not all the platforms were equal, but the general trend was evident.)
So while the 8-bit games generally needed micro-optimizations and completely unportable techniques just to do basic 2D rendering in real time, the 16-bit generation and later often had some "room to waste" for those kinds of games.
When looking at the level of optimization needed, it's not really what the platform can do so much as the combination of the platform and the desired kinds of processing. So Infocom was able to do a VM right from the start because their games were text-only (and later had a few static pictures), while today's AAA games still have to do near-metal optimization because they set the explicit goal of pushing the hardware near to its limits. Even so, they still manage to waste a lot, since the overall project scope is larger.
Working in Flash in 2011, I am able to support a complete in-game editing toolset, via a popup console that contains a Lisp-like REPL. The resources are there to do such things, and the amount of code needed to implement them is small, while the benefits are massive. It's a very different ball game.
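To give a rough idea of how little code a console like that can take, here is a minimal sketch in Python rather than ActionScript. Everything in it is an assumption made for illustration: the names `console_eval`, `game_state`, `player-speed` and the `set!` form are hypothetical and are not taken from the toolset described above. It also does no error handling, so an unbalanced parenthesis simply raises an exception.

```python
import operator

def tokenize(src):
    # Pad parentheses with spaces so a plain split() yields the tokens.
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    # Consume tokens from the front and build nested Python lists.
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    try:
        return int(tok)
    except ValueError:
        return tok  # anything that is not an integer is a symbol

def evaluate(expr, env):
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]  # symbol lookup in the live game state
    head, *args = expr
    if head == "set!":
        # (set! name value) rebinds a variable in the environment.
        name, value_expr = args
        env[name] = evaluate(value_expr, env)
        return env[name]
    fn = evaluate(head, env)
    return fn(*[evaluate(arg, env) for arg in args])

def console_eval(line, env):
    # One read-eval step; a real console would call this per input line.
    return evaluate(parse(tokenize(line)), env)

# Hypothetical game state exposed to the console.
game_state = {"+": operator.add, "*": operator.mul, "player-speed": 4}
console_eval("(set! player-speed (* player-speed 2))", game_state)
```

Because the environment is just the game's own state, typing an expression into the console changes the running game directly, which is where most of the benefit comes from.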
I have not looked at this program, but it is possible that this is not a matter of 'could take the hit', but of 'had to take the hit'. CPU cycles weren't the only thing that was scarce in those days, and an application-specific VM could be what made this fit on a floppy or in 640K of RAM.
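As a rough illustration of the space argument, here is a minimal sketch of a stack-based bytecode interpreter in Python. The opcodes (`PUSH`, `ADD`, `MUL`, `PRINT`, `HALT`) are made up for this example and are not taken from SCUMM, Infocom's Z-machine, or the program under discussion. The point is only that when the instruction set is tailored to the application, each operation costs a byte or two.

```python
# Hypothetical opcodes, invented for this sketch.
PUSH, ADD, MUL, PRINT, HALT = range(5)

def run(bytecode):
    """Interpret a tiny stack-machine program and return the values it prints."""
    stack = []
    output = []
    pc = 0  # program counter: index of the next byte to decode
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            # PUSH is followed by a one-byte immediate operand.
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            output.append(stack.pop())
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op} at offset {pc - 1}")
    return output

# (2 + 3) * 4, then print the result: ten bytes in total.
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT])
print(run(program), len(program))
```

The interpreter is paid for once, while every script in the game gets the compact encoding, so the trade can come out ahead on storage even though each operation runs slower than native code.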
There are a few huge base64-encoded data files, and the rest is just the bytecode and the work of converting the drawing functions to canvas.
I was surprised that "back in the day" you could take the hit of a VM. I always thought you had to get as close to the metal as possible!