Why would you need traditional bounds checking with Wasm? Just use MMU hardware ...

dbaupp · on Sept 26, 2018

"Just" is never a good word in a technical discussion, especially around security vulnerabilities like Spectre.

That's a good idea, but there are also several reasons it may not be appropriate:

- WASM explicitly says that it may be extended to 64-bit indexing (more than 4GB of addressable memory is definitely useful for some things)

- Spending 4GB of (hopefully, virtual) memory on every WASM instance may be undesirable or impossible (e.g. 32-bit processor)

That said, it's very reasonable to impose restrictions on things running in ring-0, and wasmjit could well require a 64-bit machine with 32-bit WASM indices (which I imagine would be okay assumptions for things one would do with it anyway).

vardump · on Sept 26, 2018

> - WASM explicitly says that it may be extended to 64-bit indexing (more than 4GB of addressable memory is definitely useful for some things)

In that case, just fall back to bitwise AND index clamping. A small performance penalty, but nothing major.

> - Spending 4GB of (hopefully, virtual) memory on every WASM instance may be undesirable or impossible (e.g. 32-bit processor)

Just page table entries. Wasting physical memory for that would be pointless. If the entries need to be mapped, on x86-64 it'd incur 4 kB, 2 MB or 1 GB total "wasted" memory, depending on which page size granularity you want to use. Of course, you could also simultaneously use this "wasted" memory for any non-sensitive data.

Well, mapping 2x 2GB memory using 4kB pages does take up hmm... 8 MB of RAM for the PTEs. So perhaps 2 MB pages would be optimal.
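The arithmetic behind that 8 MB figure, as a rough sketch. It assumes 8-byte entries (as on x86-64) and counts only the last-level entries; `pte_bytes` is a made-up name:

```c
#include <stdint.h>

/* Bytes of last-level page table entries needed to map `span` bytes
 * using pages of `page_size` bytes. Assumes 8-byte entries and ignores
 * the upper levels of the page table tree. */
static uint64_t pte_bytes(uint64_t span, uint64_t page_size)
{
    const uint64_t entry_size = 8;                        /* bytes per entry */
    uint64_t pages = (span + page_size - 1) / page_size;  /* round up */
    return pages * entry_size;
}
```

With a 4 GiB span, 4 kB pages need 1,048,576 entries (8 MiB), while 2 MB pages need only 2,048 entries (16 KiB).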

dbaupp · on Sept 26, 2018

> In that case, just fall back to bitwise AND index clamping. A small performance penalty, but nothing major.

Masking the index will break code that is actually using the larger address space: running true 64-bit WASM code (as in, using >4GB of space) won't work, which is what I was referring to.

> page table entries

Indeed, hence the reference to virtual memory. In any case, because both x86-64 and ARM64 only have 48 bits of actually addressable space, that 4GB of overhead (plus, up to 4GB of actual addressable memory) only allows for 65536 (or half that) WASM instances. That's definitely a large number, but not one that is out of reach.
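The instance count is just a power-of-two division. A small sketch, ignoring the kernel/user split and everything else that has to live in the same address space; `max_instances` is a made-up name:

```c
#include <stdint.h>

/* Number of instances that fit in a virtual address space of `va_bits`
 * bits when each instance reserves 2^bits_per_instance bytes.
 * Assumes va_bits is below 64. */
static uint64_t max_instances(unsigned va_bits, unsigned bits_per_instance)
{
    if (bits_per_instance > va_bits)
        return 0;
    return (uint64_t)1 << (va_bits - bits_per_instance);
}
```

`max_instances(48, 32)` gives 65536 for a 4 GiB reservation per instance, and `max_instances(48, 33)` gives 32768 once the reservation grows to 8 GiB.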

vardump · on Sept 26, 2018

> Masking the index will break code that is actually using the larger address space: running true 64-bit WASM code (as in, using >4GB of space) won't work, which is what I was referring to.

You can also clamp for example at 33-37 bits, giving 8-128 GB array range.
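A minimal sketch of that kind of clamp, assuming the instance reserves a power-of-two window of 2^bits bytes; `clamp_index` is a made-up name:

```c
#include <stdint.h>

/* Clamp a 64-bit index to `bits` bits with a single AND. An
 * out-of-range index wraps back into the reserved window instead of
 * escaping it. Assumes 1 <= bits <= 63. */
static uint64_t clamp_index(uint64_t index, unsigned bits)
{
    uint64_t mask = ((uint64_t)1 << bits) - 1;
    return index & mask;
}
```

With `bits` set to 33 the result always stays below 8 GiB, and with 37 it stays below 128 GiB. Note that a wild index silently aliases some in-window address rather than trapping.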

comesee · on Sept 26, 2018

You can do masking the same way Linux does. It prevents "bounds check bypass" (Spectre variant 1) without using an explicit size:

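    cmp %bound, %ptr      # sets CF if ptr < bound (unsigned)
    jae bad_ptr           # architectural check: reject ptr >= bound
    sbb %mask, %mask      # mask = -CF: all ones if in bounds, else 0
    and %mask, %ptr       # ptr becomes 0 if the branch was mispredicted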

Just two extra instructions. No need to memory map or hard code the size of bounds.

See `array_index_mask_nospec` in https://github.com/torvalds/linux/blob/master/arch/x86/inclu...
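A portable C sketch of the same mask, loosely modeled on the kernel's generic (non-asm) fallback. The names `index_mask_nospec` and `load_clamped` are made up here. It assumes both values fit in 63 bits and that `>>` on a negative `int64_t` is an arithmetic shift, which holds for GCC and Clang but is implementation-defined in ISO C:

```c
#include <stdint.h>

/* Returns all ones if index < size, else 0, without a branch.
 * Assumes index and size are both at most INT64_MAX. */
static uint64_t index_mask_nospec(uint64_t index, uint64_t size)
{
    /* The top bit of (index | (size - 1 - index)) is set exactly when
     * index >= size; invert it and smear it across the whole word. */
    return (uint64_t)(~(int64_t)(index | (size - 1 - index)) >> 63);
}

/* Load array[index], forcing the index to 0 when it is out of bounds. */
static uint8_t load_clamped(const uint8_t *array, uint64_t size,
                            uint64_t index)
{
    if (index >= size)
        return 0;                              /* architectural check */
    index &= index_mask_nospec(index, size);   /* data-dependent clamp */
    return array[index];
}
```

An optimizing compiler is free to notice that the mask is redundant after the branch and drop it, which is why the kernel takes extra steps to hide the value from the optimizer. Treat this as an illustration of the arithmetic, not as a drop-in mitigation.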

vardump · on Sept 27, 2018

> Just two extra instructions. No need to memory map or hard code the size of bounds.

Pretty neat idea! [Although the (register) dependency chain looks a bit nasty: 'and' needs 'sbb' to complete, and 'sbb' needs to wait for 'cmp' to produce the flags register. But I guess the few cases where this latency is really an issue can be dealt with on a case-by-case basis.]

> No need to memory map

Well, using the MMU can have performance benefits: less repetitive bounds-checking code and better performance in most scenarios. Both solutions have their strengths and issues; there are no silver bullets.

comesee · on Sept 27, 2018

Good point on the MMU performance advantage and the trade-offs involved. When everyone's heads were on fire, it made sense to indiscriminately mask off user-controlled pointers. Now that the dust has settled a bit, I imagine we'll see more use of memory-mapping tricks in performance-critical sections.

dbaupp · on Sept 26, 2018

And now you're limited to 2048 WASM instances in a single address space, purely because of virtual memory overhead. To be clear, I think the idea is very neat, but, like most things, comes with a variety of trade-offs that should be reasoned about rather than papered over.