
Why hasn’t someone made sqlitefs yet?


Because SQLite not being an FS is apparently the reason why it’s fast:

> The performance difference arises (we believe) because when working from an SQLite database, the open() and close() system calls are invoked only once, whereas open() and close() are invoked once for each blob when using blobs stored in individual files.
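
To make the difference concrete, here's a rough sketch of the two access patterns, assuming a hypothetical blobs(id INTEGER PRIMARY KEY, data BLOB) table and omitting error handling:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sqlite3.h>

    /* Per-file blobs: one open()/read()/close() round trip for every blob. */
    static void read_blob_files(char **paths, int n, char *buf, size_t bufsz) {
        for (int i = 0; i < n; i++) {
            int fd = open(paths[i], O_RDONLY);
            if (fd < 0) continue;
            ssize_t got = read(fd, buf, bufsz);
            (void)got;
            close(fd);
        }
    }

    /* SQLite blobs: the database file is opened once, up front; each blob is
       just a query against the already-open handle. */
    static void read_blob_rows(sqlite3 *db, int n) {
        sqlite3_stmt *stmt;
        sqlite3_prepare_v2(db, "SELECT data FROM blobs WHERE id = ?1", -1, &stmt, NULL);
        for (int i = 0; i < n; i++) {
            sqlite3_bind_int(stmt, 1, i);
            if (sqlite3_step(stmt) == SQLITE_ROW) {
                const void *data = sqlite3_column_blob(stmt, 0);
                (void)data;              /* consume the blob here */
            }
            sqlite3_reset(stmt);
        }
        sqlite3_finalize(stmt);
    }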


Additionally, it may be that accessing a single file allows the OS to efficiently retrieve (and, IIRC in the case of Windows, predict) the working set, reducing access times; that isn't the case if you open multiple files.


It's not so much the working set as the fact that access control only has to be checked once. Windows does a lot of stuff when you open a file, and creating a process is even worse.


Here you go:)

https://github.com/narumatt/sqlitefs

And it seems quite interesting:

"sqlite-fs allows Linux and MacOS to mount a sqlite database file as a normal filesystem."

"If a database file name isn't specified, sqlite-fs use in-memory-db instead of a file. All data will be deleted when the filesystem is closed."


Proxmox puts the VM configuration information in a SQLite database and exposes it through a FUSE file system. It even gets replicated across the cluster using their replication algorithm. It’s a bespoke implementation, but it’s a SQLite-backed filesystem.


I remember reading someone's comment about how, instead of databases using their own data serialisation formats for persistence and then optimizing writes and reads over that, they should just utilize the FS directly and let all of the optimisations built by FS authors be taken advantage of.

I wish I could find that comment, because my explanation doesn't do it justice. Very interesting idea; someone's probably going to explain how it was already tried in some old IBM database a long time ago and failed for whatever reason.

I still think it should be tried with newer technologies though, sounds like a very interesting idea.


> instead of databases using their own data serialisation formats for persistence and then optimizing writes and reads over that, they should just utilize the FS directly and let all of the optimisations built by FS authors be taken advantage of.

The original article effectively argues the opposite: if your use case matches a database, then that will be way faster. Because the filesystem is fully general, multi-process and multi-user, it has to be pessimistic about its concurrency.

This is why games, from the Doom WAD onwards, distribute their assets as massive blobs which are effectively filesystems: better, more predictable seek performance.
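
For a feel of why that helps: a pack file is basically a directory of (name, offset, size) entries followed by raw data, so any asset is a single seek away inside an already-open file. A minimal sketch of the idea (not the actual WAD on-disk format):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* One directory entry: where an asset lives inside the pack and how big it is. */
    struct pack_entry {
        char name[16];
        long offset;
        long size;
    };

    /* Load an asset by name: scan the in-memory directory, then do a single
       fseek()+fread() against the already-open pack file. */
    static void *pack_load(FILE *pack, const struct pack_entry *dir, int count,
                           const char *name, long *size_out) {
        for (int i = 0; i < count; i++) {
            if (strcmp(dir[i].name, name) != 0) continue;
            void *buf = malloc((size_t)dir[i].size);
            if (!buf) return NULL;
            fseek(pack, dir[i].offset, SEEK_SET);
            if (fread(buf, 1, (size_t)dir[i].size, pack) != (size_t)dir[i].size) {
                free(buf);
                return NULL;
            }
            *size_out = dir[i].size;
            return buf;
        }
        return NULL;   /* not found */
    }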

For an example of databases that use the file system, both the mbox and maildir systems for email probably count?


ReiserFS was built on the premise that doing a filesystem right could get us to a point where the filesystem is the database for a lot of use cases.

It's now "somewhat possible" in that modern filesystems are overall mostly less broken at handling large numbers of small (or at least moderately small) files than they used to be.

But databases are still far more optimized for handling small pieces of data in the ways we want to handle data we put into databases, which typically also includes a need to index etc.


As far as I can remember, MongoDB did not have any dedicated block caching mechanism in its earlier releases.

They basically mmap'ed the database file and argued that the OS page cache should do its job. Which makes sense, but I guess it did not perform as well as a fine-tuned caching mechanism.
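
The pattern was roughly: map the whole data file and let the kernel's page cache decide what stays resident. A bare-bones sketch of that approach (not MongoDB's actual code):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map a data file read-only and hand out pointers into it; what stays
       resident in memory is left entirely to the OS page cache. */
    static const unsigned char *map_datafile(const char *path, size_t *len_out) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return NULL;
        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return NULL; }
        void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);                   /* the mapping stays valid after close */
        if (base == MAP_FAILED) return NULL;
        *len_out = (size_t)st.st_size;
        return (const unsigned char *)base;
    }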


Early MongoDB design choices are probably not great to call out for anything other than ignorance. mmap is a very naive approach to working with data, and it falls over pretty hard in any situation where ensuring your data doesn't get corrupted is important. MongoDB has come a long way, but its early technical decisions were not based on technical insight.


> Why hasn’t someone made sqlitefs yet?

What do you expect the value proposition of something loosely described as a sqlitefs to be?

One of the main selling points of SQLite is that you can statically link it into a binary and no one needs to maintain anything between the OS and the client application. I'm talking about things like versioning.

What benefit would there be to replace a library with a full blown file system?


There's quite a number of sqlite FUSE implementations around, if you want to head in that direction.
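
For a flavour of what a minimal one looks like, here's a read-only sketch using the libfuse 3 high-level API and a hypothetical files(path TEXT PRIMARY KEY, data BLOB) table; most operations and all error handling are omitted:

    #define FUSE_USE_VERSION 31
    #include <fuse3/fuse.h>
    #include <sqlite3.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/stat.h>

    static sqlite3 *db;   /* opened in main() */

    /* Look up a row by path. With buf == NULL, returns the blob size (or -1 if
       the path is absent); otherwise copies up to bufsz bytes from offset. */
    static int fetch(const char *path, char *buf, size_t bufsz, off_t offset) {
        sqlite3_stmt *st;
        int result = -1;
        sqlite3_prepare_v2(db, "SELECT data FROM files WHERE path = ?1", -1, &st, NULL);
        sqlite3_bind_text(st, 1, path, -1, SQLITE_STATIC);
        if (sqlite3_step(st) == SQLITE_ROW) {
            int full = sqlite3_column_bytes(st, 0);
            if (!buf) {
                result = full;
            } else if (offset >= full) {
                result = 0;
            } else {
                size_t n = bufsz < (size_t)(full - offset) ? bufsz : (size_t)(full - offset);
                memcpy(buf, (const char *)sqlite3_column_blob(st, 0) + offset, n);
                result = (int)n;
            }
        }
        sqlite3_finalize(st);
        return result;
    }

    static int fs_getattr(const char *path, struct stat *st, struct fuse_file_info *fi) {
        (void)fi;
        memset(st, 0, sizeof(*st));
        if (strcmp(path, "/") == 0) { st->st_mode = S_IFDIR | 0755; st->st_nlink = 2; return 0; }
        int size = fetch(path, NULL, 0, 0);
        if (size < 0) return -ENOENT;
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = size;
        return 0;
    }

    static int fs_read(const char *path, char *buf, size_t size, off_t off,
                       struct fuse_file_info *fi) {
        (void)fi;
        int n = fetch(path, buf, size, off);
        return n < 0 ? -ENOENT : n;
    }

    static const struct fuse_operations ops = {
        .getattr = fs_getattr,
        .read    = fs_read,
    };

    int main(int argc, char *argv[]) {
        sqlite3_open("files.db", &db);     /* placeholder database path */
        return fuse_main(argc, argv, &ops, NULL);
    }

Every read goes userspace, kernel, FUSE daemon, SQLite and back, which is the indirection cost discussed further down the thread.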


No mention of how it performs when you need random access (seek) into files. Perhaps it underperforms the file system at that?


Probably because you wouldn't seek into a database row?

I guess querying by PK has some similarities but it is not as unstructured and random as a seek.

Also side effects such as sparse files do not mean much from a database interface standpoint.


SQLite does have low-level C APIs for that:

https://www.sqlite.org/c3ref/blob_open.html

https://www.sqlite.org/c3ref/blob_read.html

I've not seen performance numbers for those. Could make for an interesting micro-benchmark.
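
For reference, incremental blob I/O looks roughly like this (table and column names are made up):

    #include <sqlite3.h>

    /* Read n bytes starting at offset from one blob cell, without pulling the
       whole blob into memory; the moral equivalent of a seek+read. */
    static int read_blob_range(sqlite3 *db, sqlite3_int64 rowid,
                               void *buf, int n, int offset) {
        sqlite3_blob *blob;
        int rc = sqlite3_blob_open(db, "main", "files", "data", rowid, 0, &blob);
        if (rc != SQLITE_OK) return rc;
        rc = sqlite3_blob_read(blob, buf, n, offset);
        sqlite3_blob_close(blob);
        return rc;
    }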


POSIX interfaces (open, read, write, seek, close, etc.) are very challenging to implement in an efficient/reliable way.

Using SQLite lets you tailor your data access patterns in a much more rigorous way and sidestep the POSIX tarpit.


There are at least two for macOS. But they run into trouble nowadays because FUSE wants kernel extensions.


Would you put SQLite in the kernel? Or use something like FUSE?

It seems to me that all the extra indirection from using FUSE would lead to more than a 35% performance hit.

Statically linking SQLite into a kernel module and providing it with filesystem access seems non-trivial to me.


Could we expect performance gains from SQLite being in the kernel?


The idea of embedding SQLite in the kernel reminds me of IBM OS/400 (the operating system of the IBM AS/400, nowadays known as IBM i). It contains a built-in relational database, although exactly how deeply integrated it is isn't entirely clear, given the lack of public detail about its inner workings.

Putting a relational database in the OS kernel is an interesting violation of standard layering. It obviously has the potential to unleash a lot of issues, but it could also enable novel features.


just point it at your block device


I don't think SQLite can run on a block device out of the box: it needs locking primitives and a second file for the journal or WAL, plus a shared-memory file in WAL mode.


> I don't think SQLite can run on a block device out of the box: it needs locking primitives and a second file for the journal or WAL

It ships with an example VFS which shows you how to do this: https://www.sqlite.org/src/doc/trunk/src/test_onefile.c
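
Once a VFS like that is registered (via sqlite3_vfs_register()), you pick it by name when opening the database; "myvfs" below is a placeholder for whatever name the VFS registers itself under:

    #include <sqlite3.h>

    /* Open a database through a custom VFS that was registered elsewhere. */
    static int open_with_custom_vfs(sqlite3 **db) {
        return sqlite3_open_v2("blobs.db", db,
                               SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                               "myvfs");
    }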



