I’ve argued this before and I’ll argue it here now:
Modern computers are fast enough that in many cases “the only database you will ever need” can be files on the filesystem. For example “1 row = 1 file”.
It brings additional benefits as well: for low-write applications you can use git to get a history (plus transactions, if you store them in the log), backups are super easy, and replication is trivial. For higher-write applications it gets more complex, but you can still plan and implement most of the traditional DB scaling techniques (and even implement them one at a time as you grow).
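To make the "1 row = 1 file" idea concrete, here's a minimal sketch in Python. The directory name, file naming scheme, and JSON encoding are all my own assumptions, not anything prescribed; the one real trick is writing to a temp file and renaming it, since rename is atomic on POSIX filesystems, so readers never see a half-written row.

```python
import json
from pathlib import Path

DATA_DIR = Path("rows")  # hypothetical data directory: 1 row = 1 JSON file

def write_row(row_id: str, row: dict) -> None:
    """Persist one row as one file, atomically via write-temp-then-rename."""
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    tmp = DATA_DIR / f"{row_id}.json.tmp"
    tmp.write_text(json.dumps(row))
    tmp.rename(DATA_DIR / f"{row_id}.json")  # atomic on POSIX filesystems

def read_row(row_id: str) -> dict:
    return json.loads((DATA_DIR / f"{row_id}.json").read_text())

write_row("42", {"name": "Ada", "role": "admin"})
print(read_row("42"))  # → {'name': 'Ada', 'role': 'admin'}
```

Since every row is a plain text file, `git add rows/ && git commit` gives you the history and replication story for free.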
Computers are “stupid fast” now that we’ve gotten off platters.
> Name: Edward
I'm using grep and find and the Unix file system
> Name: Toni
Why do you need a database? I'm using CSV files!
> Name: Carlos
I came to your website after a long journey to seek enlightenment: am I fucked? It didn't answer that question clearly, so let me re-phrase it. I first search an LDAP directory, then I remotely execute a quota status check, then I query a PostgreSQL database, and then I generate a .txt file with the timestamp as its name or a .csv file with a hash as its name. Then I look up the files from a web page, load it all into a multi-dimensional array, and generate a nice report, re-loading the entire file every time the user wants to, say, sort by another field. Something this complex can't ever possibly fuck up, can it?
Transactions, really? How do you lock rows, how do you have relations, how do you do joins? In fact, ext3 can only handle about 50,000 files in one directory, so you'll have to split up your "primary key" into letters like abc/def/foo, like we do.
The bad but valid answer to locking rows and doing relations is that you write the logic into your application. The better answer, of course, is that if you're doing these specific kinds of complex things, a DB is a better fit.
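For the "write the locking into your application" route, advisory file locks are the usual tool on POSIX systems. A sketch using Python's `fcntl.flock` (the file name and the "credits" field are invented for illustration; `fcntl` is POSIX-only, so this won't run on Windows):

```python
import fcntl
import json
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def locked_row(path: Path):
    """Hold an advisory exclusive lock on one row file (POSIX flock)."""
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other locker holds it
        try:
            yield f
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

# Usage: a read-modify-write cycle no concurrent writer can interleave with,
# as long as every writer also goes through locked_row().
p = Path("user_42.json")  # hypothetical row file
p.write_text(json.dumps({"credits": 10}))
with locked_row(p) as f:
    row = json.loads(f.read())
    row["credits"] -= 1
    f.seek(0)
    f.truncate()
    f.write(json.dumps(row))
print(json.loads(p.read_text()))  # → {'credits': 9}
```

Note these locks are advisory: they only work if every process cooperates by taking the lock, which is exactly the "logic lives in your application" trade-off.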
Honestly, splitting into nested subfolders is no big deal anymore; it's a single function you can write even if you're having a ".10X" day.
Store the index on the filesystem and populate it on write.
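One way to do that, sketched below: keep a secondary index as its own file and update it in the same write path as the row. Everything here (the `rows` directory, the `email` field, the single-JSON-file index) is an assumption for illustration, and a real version would want the same locking and atomic-rename care as the row writes themselves.

```python
import json
from pathlib import Path

ROWS = Path("rows")                       # hypothetical row directory
INDEX = Path("index_by_email.json")       # hypothetical secondary index file

def write_row_indexed(row_id: str, row: dict) -> None:
    """Write the row file, then update the index mapping email -> row_id."""
    ROWS.mkdir(parents=True, exist_ok=True)
    (ROWS / f"{row_id}.json").write_text(json.dumps(row))
    index = json.loads(INDEX.read_text()) if INDEX.exists() else {}
    index[row["email"]] = row_id
    INDEX.write_text(json.dumps(index))

def find_by_email(email: str) -> dict:
    """Point lookup via the index: no scan over the row files."""
    index = json.loads(INDEX.read_text())
    return json.loads((ROWS / f"{index[email]}.json").read_text())

write_row_indexed("1", {"email": "toni@example.com", "name": "Toni"})
print(find_by_email("toni@example.com"))  # → {'email': 'toni@example.com', 'name': 'Toni'}
```

Because the index is populated on write, reads stay O(1)-ish; the cost is that every write now touches two files, which is the same trade a real DB makes for you.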
Not satire, though it's a bit sensationalistic to argue this is a solid solution that's usually overlooked because it's "too slow". I'm just pointing out it's not actually slow any more.
Back in the days when scaled applications ran on MySQL, DDR2 topped out around 3200MB/s, and people were happy when their DB was small enough to fit in RAM.
I feel like wasting disk space is not a real issue any more: if you're working with more than 1TB of data you have (or at least you'd better have) the resources to pay for bigger drives, which cost an almost trivial amount per TB.
Files in a single directory: it was discussed in another comment, but there's a tried and true solution to that: simply nest your items in folders. For example, with a UUID as the primary key you could have a folder structure of '(first 4 bytes)/(next 4 bytes)/(...and so on)/(full uuid)', where your nesting is deep enough that no directory holds more than 50,000 files. For smaller pools you can reduce the layers ('(first 2 bytes)/(full uuid)', for example, still gives you quite a few entries before any one folder reaches 50,000).
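The nesting scheme above really is a one-liner-ish function. A sketch (the two-hex-chars-per-level choice is just one reasonable parameterization; with it, each directory holds at most 256 subdirectories):

```python
from pathlib import Path

def shard_path(root: Path, key: str, levels: int = 2, width: int = 2) -> Path:
    """Map a hex key to root/ab/cd/<key> so no one directory grows too large."""
    parts = [key[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, key)

# e.g. the first 16 hex chars of a UUID as the key:
print(shard_path(Path("rows"), "deadbeef01234567"))  # → rows/de/ad/deadbeef01234567
```

Two levels of 256 entries each gives 65,536 leaf directories, so even tens of millions of rows stay well under the 50,000-files-per-directory ceiling mentioned above.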