It has always amazed me that databases don't go the other way.
Create an operating system specifically for the database and make it so you boot the database.
Databases seem to spend most of their time working around the operating system abstractions. So why not look at the OS, and streamline it for database use - dropping all the stuff a database will never need.
That is then a completely separate project, and one far easier to get started on than shoehorning the database into an operating system's thread model, which is already a hack on top of the process model.
I'm not sure what you mean by OS.
If you mean a whole new kernel, it will take decades, and it will only be able to support a small range of hardware. If you mean a specialized Linux distro, many companies already do that.
I don't see how that would make the process-based vs. thread-based problem any easier.
That was/is part of the promise of the whole unikernel thing, no?
https://mirage.io/ or similar could then let you boot your database. That said, it's not really taken off from what I can tell, so I'm guessing there's more to it than that.
Yeah indeed, that was my feeling on it as well. As much as Linux et al. might get in one's way at times, what we get for free by relying on them is too useful to ignore for most tasks, I think.
That said, perhaps at AWS or Google scale that would be different? I wonder if they've looked at this stuff internally.
You can get most of these speedups by using advanced APIs like io_uring and friends, while still benefiting from an OS that takes care of the messy and thankless task of hardware support.
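Python's standard library has no io_uring binding, so as a stand-in here is a minimal sketch of an older relative of the same idea: vectored I/O with `os.preadv`, which fills several buffers with a single syscall instead of issuing one `pread()` per buffer. It assumes a POSIX system (Linux, FreeBSD, or recent macOS) and Python 3.7+; the function name, page size and file contents are made up for illustration.

```python
import os
import tempfile


def read_pages_vectored(path, page_size, n_pages, offset=0):
    """Read n_pages consecutive pages with one preadv() syscall (hypothetical helper).

    A naive loop would make n_pages separate pread() calls, i.e. n_pages
    user/kernel round trips; preadv() fills every buffer in one.
    """
    buffers = [bytearray(page_size) for _ in range(n_pages)]
    fd = os.open(path, os.O_RDONLY)
    try:
        # Single kernel crossing; buffers are filled in order from `offset`.
        nread = os.preadv(fd, buffers, offset)
    finally:
        os.close(fd)
    return nread, [bytes(b) for b in buffers]


# Usage: a throwaway file with three 4 KiB "pages".
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * 4096 + b"B" * 4096 + b"C" * 4096)
    path = f.name
nread, pages = read_pages_vectored(path, 4096, 3)
os.unlink(path)
print(nread, [p[:1] for p in pages])  # 12288 [b'A', b'B', b'C']
```

io_uring goes further than this: requests are queued in memory shared with the kernel, so many operations (not just one vectored read) can be submitted per syscall.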
> Create an operating system specifically for the database and make it so you boot the database.
(Others downthread have pointed out unikernels and I agree with the criticisms)
This proposal is an excellent PhD project for someone like me :-)
It ticks all of the things I like to work on the most[1]:
Will involve writing low-level OS code
Get to hyper-focus on performance
Writing a language parser and executor
Implement scheduler, threads, processes, etc.
Implement the listening protocol in the kernel.
I have to say, though, it might be easier to start off with a rump kernel (NetBSD), then add raw disk access that bypasses the OS (requiring no syscalls, or fewer, to use it), and create a kernel module that accepts a limited set of task types and executes them in-kernel (avoiding a context switch on every syscall)[2].
Programs in userspace must have the lowest priority (using starvation-prevention mechanisms to ensure that user input would eventually get processed).
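A minimal sketch of one common starvation-prevention mechanism, priority aging, assuming a toy model where every task is always runnable and a lower number means higher priority. The task names, priorities and aging step are hypothetical; a real in-kernel scheduler would also have to deal with blocking, time slices and multiple CPUs.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    base_priority: int  # lower number = more urgent (assumed convention)
    waited: int = 0     # ticks spent runnable but not scheduled

    def effective_priority(self, aging_step: int) -> int:
        # Aging: every tick spent waiting makes the task look more urgent.
        return self.base_priority - self.waited * aging_step


def schedule(tasks, ticks, aging_step=1):
    """Return the name of the task picked on each tick (toy model)."""
    trace = []
    for _ in range(ticks):
        # Ties go to the earlier task in the list, because min() keeps the first.
        chosen = min(tasks, key=lambda t: t.effective_priority(aging_step))
        trace.append(chosen.name)
        for t in tasks:
            t.waited = 0 if t is chosen else t.waited + 1
    return trace


db = Task("db-query", base_priority=0)      # in-kernel database work
user = Task("user-shell", base_priority=5)  # userspace program, lowest priority
trace = schedule([db, user], ticks=14)
print([i for i, name in enumerate(trace) if name == "user-shell"])  # [6, 13]
```

With these numbers the userspace task gets one tick in seven: it waits until aging pushes its effective priority below the database task's. A larger `aging_step` makes user input more responsive at the cost of database throughput.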
I'd expect a non-trivial speedup from doing all the work in the kernel.
The way it is now,
userspace requests read() on a socket (context-switch to kernel),
gets data (context-switch to userspace),
parses a query,
requests a read from disk (multiple context switches to the kernel for open, stat, etc., and multiple switches back to userspace after each call completes). This latency is probably fairly well mitigated with mmap, though.
logs diagnostics (multiple context switches to and from the kernel),
requests a write on the client socket (context switches to the kernel and back until all data is written).
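To make the round trips concrete, here is a sketch of the conventional userspace path for one request, using a socketpair as the client connection and a temporary file as table storage. The wire format (`GET <offset> <length>`) and the function name are hypothetical, the "syscall" counts are just the explicit calls in this code rather than a measurement, and `os.pread` assumes a POSIX system. The mmap variant shows why mmap hides most of the disk-read cost: once the file is mapped, a read is a memory access rather than a syscall.

```python
import mmap
import os
import socket
import tempfile


def serve_one_request(conn, data_fd, mapped=None):
    """Handle one query on conn; return how many explicit syscalls it made."""
    syscalls = 0
    request = conn.recv(4096)   # recv(): user -> kernel -> user
    syscalls += 1               # (a real server would loop on partial reads)
    _, off, length = request.split()  # query parsing stays in userspace
    off, length = int(off), int(length)
    if mapped is not None:
        row = mapped[off:off + length]        # memory access, no syscall (page faults aside)
    else:
        row = os.pread(data_fd, length, off)  # pread(): another round trip
        syscalls += 1
    # Logging diagnostics here would cost further syscalls.
    conn.sendall(row)           # send(), repeated by sendall until all is written
    syscalls += 1
    return syscalls


with tempfile.TemporaryFile() as table:
    table.write(b"alice,42\nbob,17\n")
    table.flush()
    fd = table.fileno()
    client, server = socket.socketpair()
    # The mapping is created once, up front, as a long-running server would.
    with client, server, mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
        client.sendall(b"GET 9 6\n")
        n_plain = serve_one_request(server, fd)
        reply_plain = client.recv(4096)
        client.sendall(b"GET 9 6\n")
        n_mmap = serve_one_request(server, fd, mapped=m)
        reply_mmap = client.recv(4096)

print(n_plain, n_mmap, reply_plain, reply_mmap)  # 3 2 b'bob,17' b'bob,17'
```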
The goal of the DBOS would be to remove almost all the context-switching between userspace and kernel.
[1] My side projects include a bootable (but unfinished) x86 OS, various programming languages, and performant (or otherwise) C libraries.
[2] Similar to the way RealTime Linux calls work (the caller shares a memory buffer with the RT kernel module, populates the buffer and issues a call, and the kernel only returns when that task is complete). The BPF mechanism works similarly. It's the only way to reduce latency to the absolute physical minimum.
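The calling pattern in footnote [2] can be modeled in userspace, with a thread standing in for the kernel module: the caller writes a request into a shared buffer, rings a doorbell once, and blocks until the result has been written back into the same buffer. The buffer layout, the opcode and the class name are all hypothetical; this is not the RealTime Linux or BPF interface, just the shape of the protocol.

```python
import struct
import threading

# Hypothetical fixed layout of the shared buffer: opcode, argument, result.
LAYOUT = struct.Struct("<qqq")
OP_SQUARE = 1  # illustrative opcode, not from any real kernel module


class SharedCallChannel:
    """Toy model of 'populate a shared buffer, then make one blocking call'.

    In a real system the buffer would be memory shared with the kernel and
    the doorbell a single syscall; here a worker thread plays the kernel.
    """

    def __init__(self):
        self.buf = bytearray(LAYOUT.size)    # the shared memory region
        self._submitted = threading.Event()  # caller -> "kernel" doorbell
        self._completed = threading.Event()  # "kernel" -> caller completion
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            self._submitted.wait()
            self._submitted.clear()
            op, arg, _ = LAYOUT.unpack_from(self.buf)
            result = arg * arg if op == OP_SQUARE else -1
            LAYOUT.pack_into(self.buf, 0, op, arg, result)  # result written in place
            self._completed.set()

    def call(self, op, arg):
        LAYOUT.pack_into(self.buf, 0, op, arg, 0)  # populate the shared buffer
        self._completed.clear()
        self._submitted.set()       # the single "syscall"
        self._completed.wait()      # return only when the task is complete
        return LAYOUT.unpack_from(self.buf)[2]


chan = SharedCallChannel()
print(chan.call(OP_SQUARE, 12))  # 144
```

Packing several requests into the buffer before ringing the doorbell is what amortizes the kernel crossing; io_uring takes this further with separate submission and completion rings in shared memory.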
> Create an operating system specifically for the database and make it so you boot the database.
I have the impression that this is similar to the ad hoc filesystem idea: it seems very advantageous in principle (why employ two layers that do approximately the same thing on top of each other?), but in reality, when implemented (by Oracle), it led to only a minor improvement (a few percentage points, AFAIR).