>The only time you need to consider a client-server setup is: Where you have multiple physical machines accessing the same database server over a network. In this setup you have a shared database between multiple clients.
This caveat covers "most cases". If there's only a single machine, then any data stored is not durable.
Additionally, to my knowledge SQLite doesn't have a solution for durability other than asynchronous replication. Arguably, most applications can tolerate this, but I'd rather just use MySQL with semi-sync replication, and not have to think through all of the edge cases about data loss.
That's not what the parent means by durability, they mean having your data survive any one of your machines being instantly nuked from orbit at the most inconvenient possible time.
Just having sync replication is enough, doesn't have to be fancy like semi-sync.
'Durability' already has a well-established, rigorously defined meaning in this context, which is confusingly similar to PITR (point-in-time recovery) but definitely not the same thing.
The downside of sync replication, as I understand it, is that although your data will survive any one of your machines being instantly nuked from orbit, your entire service will go down; semi-sync avoids this problem.
But they’re using the other well-established meaning of durability a la how AWS and others describe their storage platforms. It’s pretty much the same thing but taken at whole system level. On that level an ACID database is as durable as the underlying storage medium which is sadly not very durable.
Well, it's sort of arbitrary that the standard definition of durability requires your data to survive machine checks and kernel panics and power outages but not disk failures, isn't it?
Especially since nowadays, in many data centers, disk failures are far more common.
The OP's point is that the single-process ACID semantics of SQLite don't provide a durability guarantee that includes replication.
Other databases have a commit level that makes sure logs have been shipped.
For me this is an edge case in just about everything except financial transactions (the performance penalty of distributed transactions is pretty high!) but it is correct to note it.
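As a concrete illustration of that kind of commit level (my own example, not from the thread): PostgreSQL can be told to hold a COMMIT until a synchronous standby has flushed the WAL. A minimal sketch, assuming psycopg2, a cluster that already has a synchronous standby configured, and a made-up DSN and table:

```python
# Sketch: per-transaction "wait until the WAL reached a standby" in PostgreSQL.
# Assumes synchronous_standby_names is already configured on the primary;
# the DSN and the payments table are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=app host=primary")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # With a synchronous standby configured, synchronous_commit = 'on' makes
    # COMMIT block until that standby has flushed this transaction's WAL,
    # so the write survives losing the primary machine.
    cur.execute("SET LOCAL synchronous_commit TO 'on'")
    cur.execute("INSERT INTO payments (amount_cents) VALUES (%s)", (10000,))
# leaving the `with conn` block issues the COMMIT that does the waiting
```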
Sounds like IMS runs on Z system mainframes with redundant hot-swappable CPUs and memory. They pay IBM a lot of money for the illusion of a single reliable machine, when a different OS would manage it as a small cluster.
We economize by using racks of cheap, flaky commodity hardware, but we have to be ready for one computer to die by failing each application over to another.
Does this practically improve the situation? The odds of two servers breaking at the same time for the same reasons seem very high. I actually can't think of a single example where the secondary server would keep running.
Regression via a code or dependency update? Full disk? DNS is down? Too much load? All of these would bring down both servers in quick succession.
I guess something like a "once every 2 days" race condition could buy you some time if you had a second server, but that's not a common error.
Zero-downtime upgrades, hardware faults, AWS deciding that specific instance needs to die. It also doesn't let you cheat statelessness very easily, so it's easier to scale horizontally.
Fair enough I guess. I don’t think you need two servers to do zero downtime upgrades. And the other issues are, imo, beyond the 0.99 uptime threshold that most services realistically have when you add in breakage due to upgrades.
I like your statelessness point. I suppose in your view it’s better to have the concentrated stateful core with stateless servers as opposed to just one stateful instance. Two instances mean you can’t easily store foo in memory and hope the server doesn’t die until it’s not needed there anymore. Counterpoint is that the extra layer of indirection is 10x slower and horizontal scaling won’t be needed as much if you don’t pay that price in the first place, but you are right, the temptation to store foo in memory would still be in its prime. The thing is, if one machine can scale, putting foo in memory isn’t actually bad. It’s only when things don’t scale that it’s bad.
> I don’t think you need two servers to do zero downtime upgrades
Absolutely not, and I can't understand why I keep hearing this argument. Doing zero-downtime upgrades on a single server has been simple since basically forever: run another process on another port, change the config, gracefully reload the front load balancer, and there you go.
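To make that concrete, a rough sketch of that single-server swap, assuming an nginx front proxy and a hypothetical ./app binary; the paths and ports are made up:

```python
# Sketch of a zero-downtime swap on one box: new process on a spare port,
# rewrite the proxy's upstream, graceful reload. Paths and ports are hypothetical.
import subprocess
import time

NEW_PORT = 8001

# 1. Start the new version alongside the old one.
subprocess.Popen(["./app", "--port", str(NEW_PORT)])
time.sleep(2)  # in practice, poll a health-check endpoint instead of sleeping

# 2. Point the front proxy at the new port (the main nginx config is assumed
#    to proxy_pass to this named upstream).
with open("/etc/nginx/conf.d/app_upstream.conf", "w") as f:
    f.write(f"upstream app {{ server 127.0.0.1:{NEW_PORT}; }}\n")

# 3. Graceful reload: old nginx workers finish their in-flight requests.
subprocess.run(["nginx", "-s", "reload"], check=True)

# 4. Once the old process has drained, stop it (e.g. via its pid file).
```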
We use 3-node MSSQL, and it happens all the time that the primary gets into a bad state (100% CPU, high latency, etc.) and simply failing over to another instance fully recovers.
It could be bad hardware, a bad query (left dangling/canceled on the old instance), bad statistics, disk fragmentation, etc.
I’m with you, but you could also make the case that most small web service businesses still run a single Postgres instance with no redundancy, just backups. So you have a single point of failure. You can get quite decent uptime out of a single VPS.
This project comes to mind: https://github.com/rqlite/rqlite but I've never used it, and I'm not sure if it would still count as "pure sqlite" like the OP advocated.
https://litestream.io/ does streaming replication to S3 (or similar service). With this, you probably have better data durability than a small database cluster.
My understanding is that it provides asynchronous replication, so you'd still lose some data if you lose a machine. This is documented here https://litestream.io/tips/#data-loss-window
I guess this is an interview level question.
1) Drain connections from your instance. Stop taking new connections and let all existing requests time out. This could be done by removing it from a load balancer or DNS. This ensures your litestream backup is "up-to-date".
2) Bring up the new deployment; it restores via litestream (a rough sketch follows after these steps). When the restore is complete, register it with the load balancer (if you are using one) or DNS.
3) Delete the old instance.
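A rough sketch of step 2, assuming Litestream's documented restore/replicate commands, an existing S3 replica, and a hypothetical load-balancer hook; the bucket and paths are placeholders:

```python
# Sketch: restore the SQLite file from the Litestream replica before the new
# instance starts serving, then keep replicating while the app runs.
import subprocess

DB_PATH = "/srv/app/data.db"
REPLICA_URL = "s3://my-bucket/data.db"  # assumption: an S3 replica already exists

def register_with_lb(instance: str) -> None:
    """Hypothetical hook for adding the instance to the load balancer / DNS."""
    print(f"registering {instance}")

# Pull the latest replicated state down to the fresh machine.
subprocess.run(["litestream", "restore", "-o", DB_PATH, REPLICA_URL], check=True)

# Keep shipping WAL changes to the replica in the background.
subprocess.Popen(["litestream", "replicate", DB_PATH, REPLICA_URL])

# Only now advertise the new instance (step 2's "register" part).
register_with_lb("new-instance")
```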
Yes... and all I see here is downtime. How do we do this without service failure? With a Postgres db, you can spin up and down EC2 instances to your heart's content. Every 1-4 years, you can upgrade your db by using a replica with no downtime.
Depends exactly what you mean by "durable". One machine with RAID 10 can be pretty durable and solves the most common problems with disk issues; other risks can be managed too.
Ah, that brings back memories. Had 2 RAID 10 MySQL servers run for a decade without rebooting. One had an app db, the other a stats db, and the two replicated to each other.
Spinning disks and all, I was terrified to reboot them and have the boot disk fail (which was not on RAID).
The main disks failed once or twice, which slowed the servers down considerably until the RAID rebuild finished. Very nervous time.
Durable in the database context refers to durability of transactions, i.e. your database does not lose a record of committed transactions. A good example is an ATM withdrawal.
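For what that means on a single node, a minimal sketch with Python's stdlib sqlite3 module (the account table and amounts are made up for illustration):

```python
# Sketch of single-node durability in SQLite: once COMMIT returns, the change
# survives a process crash or power loss on this machine -- though, as the
# thread notes, not the loss of the machine or disk itself.
import sqlite3

conn = sqlite3.connect("bank.db", isolation_level=None)  # autocommit; explicit BEGIN below
conn.execute("PRAGMA journal_mode=WAL")   # write-ahead logging
conn.execute("PRAGMA synchronous=FULL")   # fsync before COMMIT is acknowledged
conn.execute("CREATE TABLE IF NOT EXISTS account (id INTEGER PRIMARY KEY, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 20 WHERE id = 1")  # the ATM withdrawal
conn.execute("COMMIT")
```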