Why would you need traditional bounds checking with Wasm?
Just use MMU hardware to insert a 2GB [0] no man's land below and above Wasm memory and only allow signed 32-bit indexing (-2^31 to 2^31-1).
This way the attacker can only read sandbox memory (and the useless 2 GB no man's land, mapped to pages full of zeroes or whatever).
When there's no speculation involved or the data is simply out of "speculative range", Spectre is toothless.
[0]: 2GB is just a basic example. More may be required if for example something like x86 SIB (Scale Index Base) is used for multiplying the index by 2, 4 or 8.
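To make the footnote concrete, here is a rough sketch of how far a signed 32-bit index can reach once a scale factor is applied. The helper name `guard_reach` and the addressing model (address = base + index * scale, with no extra displacement) are assumptions for illustration, not something taken from wasmjit.

```python
GIB = 1 << 30

def guard_reach(scale=1, access_width=1, index_bits=32):
    """Bytes reachable below and above the base for a signed index.

    Hypothetical helper: models address = base + index * scale,
    with no additional displacement.
    """
    lowest_index = -(1 << (index_bits - 1))
    highest_index = (1 << (index_bits - 1)) - 1
    below = -lowest_index * scale
    # The last byte touched is highest_index * scale + access_width - 1.
    above = highest_index * scale + access_width
    return below, above

for scale in (1, 2, 4, 8):
    below, above = guard_reach(scale, access_width=scale)
    print(f"scale {scale}: {below // GIB} GiB below, {above // GIB} GiB above")
```

Under this model a scale of 8 pushes the required no man's land from 2 GiB to 16 GiB on each side.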
"Just" is never a good word in a technical discussion, especially around security vulnerabilities like Spectre.
That's a good idea, but there are also several reasons it may not be appropriate:
- WASM explicitly says that it may be extended to 64-bit indexing (more than 4GB of addressable memory is definitely useful for some things)
- Spending 4GB of (hopefully, virtual) memory on every WASM instance may be undesirable or impossible (e.g. 32-bit processor)
That said, it's very reasonable to impose restrictions on things running in ring-0, and wasmjit could well require a 64-bit machine with 32-bit WASM indices (which I imagine would be okay assumptions for things one would do with it anyway).
> - WASM explicitly says that it may be extended to 64-bit indexing (more than 4GB of addressable memory is definitely useful for some things)
In that case, just fall back to bitwise AND index clamping. A small performance penalty, but nothing major.
> - Spending 4GB of (hopefully, virtual) memory on every WASM instance may be undesirable or impossible (e.g. 32-bit processor)
Just page table entries. Wasting physical memory for that would be pointless. If the entries need to be mapped, on x86-64 it'd incur 4 kB, 2 MB or 1 GB total "wasted" memory, depending on which page size granularity you want to use. Of course, you could also simultaneously use this "wasted" memory for any non-sensitive data.
Well, mapping 2x 2GB memory using 4kB pages does take up hmm... 8 MB of RAM for the PTEs. So perhaps 2 MB pages would be optimal.
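The PTE arithmetic can be checked with a few lines. This is only a back-of-the-envelope sketch: it assumes 8-byte x86-64 entries, counts leaf entries only (upper-level tables are ignored), and `leaf_pte_bytes` is a made-up helper name.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30
PTE_SIZE = 8  # bytes per x86-64 page table entry

def leaf_pte_bytes(region_bytes, page_bytes):
    """Memory taken by the leaf entries needed to map region_bytes."""
    entries = region_bytes // page_bytes
    return entries * PTE_SIZE

guard = 2 * (2 * GIB)  # 2 GB below plus 2 GB above
for name, page in (("4 kB", 4 * KIB), ("2 MB", 2 * MIB), ("1 GB", GIB)):
    print(f"{name} pages: {leaf_pte_bytes(guard, page)} bytes of PTEs")
```

With 4 kB pages this gives 8 MB of entries, while 2 MB pages need only 16 kB.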
> In that case, just fall back to bitwise AND index clamping. A small performance penalty, but nothing major.
Masking the index will break code that is actually using the larger address space: running true 64-bit WASM code (as in, using >4GB of space) won't work, which is what I was referring to.
> page table entries
Indeed, hence the reference to virtual memory. In any case, because both x86-64 and ARM64 only have 48 bits of actually addressable space, that 4GB of overhead (plus, up to 4GB of actual addressable memory) only allows for 65536 (or half that) WASM instances. That's definitely a large number, but not one that is out of reach.
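As a quick sanity check of those instance counts, here is a sketch that divides a 48-bit address space by the per-instance reservation. It is an upper bound only: it ignores the kernel/user split and everything else that has to live in the same address space, and `max_instances` is a hypothetical name.

```python
GIB = 1 << 30

def max_instances(per_instance_bytes, address_bits=48):
    """Upper bound on reservations that fit in the virtual address space."""
    return (1 << address_bits) // per_instance_bytes

print(max_instances(4 * GIB))            # guard regions only
print(max_instances(4 * GIB + 4 * GIB))  # guard regions plus 4GB of memory
```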
> Masking the index will break code that is actually using the larger address space: running true 64-bit WASM code (as in, using >4GB of space) won't work, which is what I was referring to.
You can also clamp for example at 33-37 bits, giving 8-128 GB array range.
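A minimal sketch of what clamping at a given bit width means, assuming the index is treated as an unsigned value and simply ANDed with a mask (`clamp_index` is an illustrative name, not an existing API):

```python
GIB = 1 << 30

def clamp_index(index, bits):
    """Keep only the low `bits` bits, so the result is always < 2**bits."""
    return index & ((1 << bits) - 1)

for bits in (33, 37):
    print(f"{bits} bits -> {(1 << bits) // GIB} GB addressable range")

# An out-of-range index wraps back into the permitted range
# instead of reaching memory outside it.
print(clamp_index((1 << 33) + 5, 33))
```

Note that the mask does not reject a bad index; it silently redirects it to some in-range address, which is enough to keep a speculative read inside the sandbox.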
> Just two extra instructions. No need to memory map or hard code the size of bounds.
Pretty neat idea! [Although the (register) dependency chain looks a bit nasty. 'and' will need 'sbb' to commit and 'sbb' will need to wait for 'cmp' to commit (flags register). But I guess the rare cases where this latency is really an issue can be dealt with on a case-by-case basis.]
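The comment being quoted is not reproduced in this thread, so the following is a guess at the pattern under discussion: the cmp/sbb/and sequence, similar in spirit to the index masking the Linux kernel uses on x86. The sketch models it with Python integers as 64-bit registers; the function names are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1

def index_mask(index, size):
    """Model of `cmp size, index` followed by `sbb mask, mask`."""
    # cmp sets the carry flag when index < size (unsigned comparison).
    carry = 1 if (index & MASK64) < (size & MASK64) else 0
    # sbb reg, reg computes reg - reg - carry: all ones if carry, else zero.
    return (0 - carry) & MASK64

def clamp(index, size):
    """Model of the final `and index, mask`: in-range stays, the rest become 0."""
    return index & MASK64 & index_mask(index, size)

print(clamp(5, 10))
print(clamp(10, 10))
print(clamp(-1, 10))
```

Each step consumes the result of the previous one, which is the serial dependency chain mentioned above.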
> No need to memory map
Well, using the MMU can have performance benefits: less repetitive bounds-checking code and better performance in most scenarios. Both solutions have their strengths and issues; there are no silver bullets.
Good point on the MMU performance advantage and the trade-offs involved. When everyone's heads were on fire, it made sense to indiscriminately mask off user-controlled pointers. Now that the dust has settled a bit, I imagine we'll see more usage of memory mapping tricks in performance-critical sections.
And now you're limited to 2048 WASM instances in a single address space, purely because of virtual memory overhead. To be clear, I think the idea is very neat, but, like most things, comes with a variety of trade-offs that should be reasoned about rather than papered over.