You're right to quote some text that addresses the idea that "microkernels are s...

zaarn · on Oct 28, 2019

IPC is a tad slower than not IPC.

However, stuff like WebAssembly and other sandboxing methods can be used to leverage two processes into the same address space. Your filesystem driver then simply lives as a module in address space and it's a normal process. The IPC turns into a simple jump using a pointer value provided by the kernel (which depending on the trust level can provide parameter validation or can be a plain pointer to the correct function).

hvidgaard · on Oct 28, 2019

How would that work? The entire premise of Microkernels is that everything runs in a dedicated processes such that if one crashes, the system can continue all other processes and only need to recover that single process.

The cost of switching processes are not a problem of address space, the MMU makes sure that is not a problem as shared memory is a reasonable tradeoff for certain domains where performance is necessary. Normally you'd rather want use messages but that is a different topic.

In either case, the cost of a process switch is handling the registers and cache - that will not go away no matter how you do it, which is why multicore implementation with messages can actually turn out being faster. Less switching and more locality.

eadan · on Oct 28, 2019

The experimental Singularity OS [0] from Microsoft implemented processes in the same address space using so-called "Software Isolated Processes" or SIPs [1]. Singularity was a microkernel design with the filesystem, networking, drivers implemented outside the kernel in a variant of C# called Sing#. It seems Sing# had ownership semantics similar to Rust which allowed SIPs to share memory by transferring ownership through lightweight message passing.

[0] https://en.wikipedia.org/wiki/Singularity_%28operating_syste...

[1] https://courses.cs.washington.edu/courses/cse551/15sp/papers...

zaarn · on Oct 28, 2019

Well, with webassembly the process can run under ring0 but still be isolated as if it was running in ring3. The crash mechanism isn't different; if the driver crashes then the kernel can kill it and deallocate it's resources, a device manager process can then restart the driver. The advantage is that you can include small, audited and verified binaries that can run code as ring 0 to remove abstraction from accessing the raw hardware.

The cost of switching processes is significantly reduced when you don't need to switch privilege levels and not having to invalidate the TLB as well as not having to change address space at all will make a context switch not significantly more expensive than a function call.

You get compile-time isolation and can still take advantage of the MMU when needed.

And using such an implementation does not prevent you from implementing your driver such that it can run on each core and take advantage of it or even passes messages. An ethernet driver could still, for example, pass a message when the "send data to tcp socket" function is called while allowing another program to use the same function without any message passing, depending on what is better for your use case.

hvidgaard · on Oct 28, 2019

I'm not sure I believe it's a good idea to give up on the hardware protection, as that leaves it to the software implementation to ensure it's secure. If you compromise an application in user mode, it will not get you far without another exploit in something that runs in supervisor mode. The hardware makes that certain, and it's well verified. We've seen time and time again that simple buffer overflows exploit things, and the more the runs in supervisor mode, the larger the attack surface is.

If a driver runs in user mode, an exploit needs to exploit the hardware as well - and that is for all intents and purposes something that we see very rarely.

If the same driver runs in "software user mode", but executing as supervisor (basically inside an VM environment), we need constant security checks in software, and an exploit now have the VM code to further exploit, if successful that will automatically grant it supervisor access.

In both cases it's assumed that neither implementation has access to more interfaces than necessary for it to do it's work. For instance, a driver for a mouse does not need access to the disks.

zaarn · on Oct 28, 2019

The thing is, you don't have to give up hardware protection. If you don't trust code, you can still run it in ring 3 with all the associated overhead. The point is being able to choose how close an application runs to "no overhead" until you're at a level where the driver is a function call away.

From my experience, a lot of hardware is terribly insecure against exploits. Not necessarily the CPU but stuff like your GPU or HBAs, ethernet cards, etc.

With software containment, the advantage is that you can set it up that drivers need to declare their interfaces and privileges beforehand. In an ELF or WASM you have to declare imported and linked functions, it should not be difficult to leverage that to determine what a driver can effectively do. With WASM you get the added benefit that doing anything but using declared interfaces results in a compile-time error.

A driver can be written so that a minimal, audited interface exists to talk to the hardware almost directly with some security checks and then the WASM part that handles the larger logic parts and provides the actual functionality.

WASM isn't a supervisor, so exploits on VM code aren't that relevant. Exploiting the WASM compiler/interpreter/JIT is more interesting but those are exposed to the daily internet shitstorm exploits, so I think they are fairly safe.

hvidgaard · on Oct 28, 2019

I suppose it remains to be seen if someone can make a PoC. I'm skeptical but ultimately I do not know enough to decide either way.

> it should not be difficult to ...

Famous last words.

geofft · on Oct 28, 2019

It works by defining "process" in a way that doesn't imply "registers and cache." If the concern is the ability for one process to survive despite arbitrary misbehavior from another process (and not, say, Spectre-style attacks), software-based fault isolation like NaCl or PittSFIeld or using a restricted language runtime like wasm or Lua demonstrably works fine for that, no hardware context switch required.

hvidgaard · on Oct 28, 2019

That sounds like a lot of overhead for a kernel. There is a reason it's normally written in a mix of C and ASM.

zaarn · on Oct 28, 2019

It wouldn't be that much overhead in reality, you can still write the kernel in a mix of C and ASM (or more modern; Rust and ASM). The kernel doesn't even need to know about these isolation mechanisms, since you run stuff in ring 0, you can hook the appropriate APIs and interfaces in a process. The kernel itself would be insanely small and thusly more easy to defend.

snvzz · on Oct 28, 2019

>cost of context switch

Page 135.

http://sigops.org/s/conferences/sosp/2013/papers/p133-elphin...

eadan · on Oct 28, 2019

Nebulet [0] was an interesting take on this -- a Rust microkernel compiled to WebAssembly. Unfortunately, the project seems to have run out of steam.

[0] https://github.com/nebulet/nebulet

targonca · on Oct 28, 2019

So, you'd like to expose the whole kernel to Spectre-type attacks. Wonderful!

zaarn · on Oct 28, 2019

Well, for trusted code that doesn't expose any mechanism to run foreign code (ie browsers), spectre is largely a non-issue.

So the trusted core part of the OS can run without any spectre prevention, though you can still enable the various hardware protections available in the chicken bits.

And if it's necessary to protect against spectre attacks, you can use shim layers or even isolation into ring3 to take preventative measures. This allows leveraging performance were important and security where necessary.

If it's in webassembly, you can even run two versions of a driver; one with spectre-mitigations compiled in and one without, sharing one memory space and the kernel can choose to invoke either one depending on the call chain.

zzzcpan · on Oct 28, 2019

Trusted code has to be free from vulnerabilities to be immune, so it's still an issue even for trusted code. And I'm pretty sure neither webassembly nor other sandboxing methods can fully mitigate speculative attacks on out-of-order CPUs within the same address space, you'd need a programming language with a compiler designed from scratch for it.

zaarn · on Oct 28, 2019

Well, it doesn't have to be free from vulnerabilities, not any more than any other OS code. The sandboxed code that is running trusted (ie without trampolines and spectre-defenses) would still hold the guarantees given by the sandbox (WASM), which are pretty much on par with what a modern browser can do for JS and WASM. And keep in mind that both WASM and JS now have spectre-defenses, so there is no need for a PL from scratch for this.

zzzcpan · on Oct 28, 2019

> And keep in mind that both WASM and JS now have spectre-defenses, so there is no need for a PL from scratch for this.

As far as I remember they weren't able to defend from side channel attacks within the same process completely and decided to rely on process isolation instead, estimating it would be too much work to address all known spectre class vulnerabilities on their existing compilers and too hard to ensure for defenses not to be broken later by compiler developers.

AlphaSite · on Oct 28, 2019

Another point is that macOS and iOS are moving towards a user space driver stack and away from Kernel space.

snvzz · on Oct 28, 2019

Because, save few exceptions (e.g. timers, early debug serial...) running drivers in supervisor mode isn't very smart.

Drivers tend to have high bug density, and supervisor mode means high potential for damage.

Much can be mitigated by running drivers in userspace[0][1], especially with some IOMMU help.

[0] https://www.cs.vu.nl/~herbertb/papers/minix3_dsn09.pdf

[1] https://wiki.minix3.org/doku.php?id=www:documentation:reliab...

pjmlp · on Oct 28, 2019

Android and Windows as well.

panpanna · on Oct 28, 2019

Linux has been moving towards hybrid drivers for certain components for a while now.

I think you can run large parts of the filesystem and network stack in user-land now.

pjmlp · on Oct 28, 2019

There is a big difference is that the other OSes are doing it no matter how many pitchforks come into their way, while with Linux you need to find a security conscious distribution.

Matthias247 · on Oct 28, 2019

I think it’s hard to say that without knowing what’s going on in that companies and how many internal disagreements had been there.

And even if Apple now talks about user-Mode drivers - do we even know what percentage they aim for?

pjmlp · on Oct 28, 2019

Yes we do, as they presented at WWDC, all of them.

They are following a two release steps, in release N, the user space drivers for a specific class get introduced and the respective kernel APIs are automatically deprecated. In release N + 1, those deprecated APIs will be removed.

BurningCycles · on Oct 28, 2019

I disagree, these are VERY old papers (~30 years old) of comparisons containing very little data from what I can see at a quick glance.

I saw a talk a couple of years ago by Tanenbaum where he said he would be ok with Minix being 20% slower than a monolithic kernel like Linux, indicating that it was currently slower than that.

Granted, Minix has not seen the type of optimizations that popular monolithic kernel has due to lack of manpower.

So, I really look forward to seeing benchmarks made between monolithic kernels and new micro kernels like Google's Zircon, and Redox once they've had sufficient time to mature.

naasking · on Oct 28, 2019

> Tanenbaum where he said he would be ok with Minix being 20% slower than a monolithic kernel like Linux, indicating that it was currently slower than that.

It doesn't indicate anything of the sort. He was making a comment that the reliability and security benefits of microkernels are simply more important than performance in this mind.

gnufx · on Oct 28, 2019

They may be old, but what major advances in monolithic kernels have there been to invalidate them now? There seem to have been significant advances in microkernels. (I was glad to have been using fast microkernel-ish systems daily in the 1980s, and not VAX/VMS.)

BurningCycles · on Oct 28, 2019

>but what major advances in monolithic kernels have there been to invalidate them now?

What are the significant advances in micro kernels that does not apply to monolithic kernels ?

Also they are not only VERY old, they seem intentionally vague when it comes to actual data about the systems they are comparing against. In short, I welcome the new micro kernels so that we can see a comparison between modern monolithic and modern micro kernels and actually get a good representation of what the performance difference is.

Because if not for performance, there is no reason not to use a micro kernel.

gnufx · on Oct 30, 2019

The L4 work is usually quoted for relatively recent advances. Presumably monolithic kernels don't benefit from performance work that's specific to micro-kernel message passing, if performance was all that mattered.

I don't remember the OS4000 context switching time but it was fast in comparison with other systems of the 1980s, and it was very fast in actual use (running real-time and interactive processes). The performance of L4Linux is quoted within the typical margin of speculative execution mitigations relative to Linux. However, it's a strange idea to me that speed is all that matters for an OS, and not reliability, trustworthiness, etc.

jacquesm · on Oct 28, 2019

It always was a lot less important than proponents of monolithic kernels made it seem, there were two things that were left out of such discussions:

- paging is a very efficient way to copy a block of data from one process to another.

- the perceived speed of an interactive system has everything to do with responsiveness and very little with actual throughput. And responsiveness is associated with near real-time operation which happens to be something micro kernel based systems excel at.

zzzcpan · on Oct 28, 2019

With the advent of multi core CPUs monolithic kernels jumping in on doing slow shared memory concurrency kind of killed the idea of monolithic kernels to be fast. They can't be fast unless they do actor model, but it goes against the idea of monoliths, hence they can't do better than microkernels.