
>The only time you need to consider a client-server setup is: Where you have multiple physical machines accessing the same database server over a network. In this setup you have a shared database between multiple clients.

This caveat covers "most cases": if there's only a single machine, then any data stored on it is not durable.

Additionally, to my knowledge SQLite doesn't have a solution for durability other than asynchronous replication. Arguably, most applications can tolerate this, but I'd rather just use MySQL with semi-sync replication, and not have to think through all of the edge cases about data loss.
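
For reference, a rough sketch of turning semi-sync on from Python with mysql-connector-python is below. The plugin and variable names are the 8.0.26+ "source/replica" spellings (older versions use the rpl_semi_sync_master/slave names), and the host and credentials are placeholders, so treat it as illustrative rather than exact:

    import mysql.connector  # pip install mysql-connector-python

    conn = mysql.connector.connect(host="primary.internal", user="admin", password="...")
    cur = conn.cursor()

    # Load and enable the semi-sync plugin on the source (primary).
    cur.execute("INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so'")
    cur.execute("SET GLOBAL rpl_semi_sync_source_enabled = 1")
    # Fall back to async if no replica acks within 1s, so the primary keeps serving:
    cur.execute("SET GLOBAL rpl_semi_sync_source_timeout = 1000")
    # (the replica side needs its corresponding semi-sync plugin enabled as well)

    conn.close()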



people have been providing acid transaction semantics on single machines for 50 years

do you think ims/db ran on a cluster

the d in acid stands for durability

you're talking about pitr, which is what mysql semi-sync provides (and afaik you are correct that sqlite doesn't offer pitr)


That's not what the parent means by durability, they mean having your data survive any one of your machines being instantly nuked from orbit at the most inconvenient possible time.

Just having sync replication is enough; it doesn't have to be anything fancy like semi-sync.


i know that

i'm correcting their terminology

'durability' already has a well-established, rigorously-defined meaning in this context, which is confusingly similar to pitr but definitely not the same thing

the downside of sync replication, as i understand it, is that although your data will survive any one of your machines being instantly nuked from orbit, your entire service will go down; semi-sync avoids this problem


But they’re using the other well-established meaning of durability, a la how AWS and others describe their storage platforms. It’s pretty much the same thing but taken at the whole-system level. On that level an ACID database is only as durable as the underlying storage medium, which is sadly not very durable.


well, it's sort of arbitrary that the standard definition of durability requires your data to survive machine checks and kernel panics and power outages but not disk failures, isn't it

especially since nowadays in many data centers disk failures are far more common

(though full raid failures are less common)

but that is the standard definition


Michael, is that you?


The OP's point is that the single-process ACID semantics of SQLite don't provide a durability guarantee that includes replication.

Other databases have a commit level that makes sure logs have been shipped.

For me this is an edge case in just about everything except financial transactions (the performance penalty of distributed transactions is pretty high!) but it is correct to note it.
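
For example, in PostgreSQL (assuming synchronous_standby_names is configured on the server), the synchronous_commit setting controls how far the WAL has to travel before COMMIT returns. A rough psycopg2 sketch, with placeholder connection details and a hypothetical payments table:

    import psycopg2

    conn = psycopg2.connect("host=primary.internal dbname=app user=app")
    cur = conn.cursor()

    # 'on' waits for the standby to flush the WAL to disk before COMMIT returns;
    # 'remote_apply' additionally waits until the standby has applied it.
    cur.execute("SET synchronous_commit = 'on'")
    cur.execute("INSERT INTO payments (account, amount) VALUES (%s, %s)", (42, 100))
    conn.commit()  # blocks until the configured standby has acknowledged the WAL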


Sounds like IMS runs on Z system mainframes with redundant hot-swappable CPUs and memory. They pay IBM a lot of money for the illusion of a single reliable machine, when a different OS would manage it as a small cluster.

We economize by using racks of cheap, flaky commodity hardware, but we have to be ready for one computer to die by failing each application over to another.


in 01973 the s/360 did not have hot-swappable cpus or memory

even in the 01990s i don't think ibm had such an offering, though tandem did (but it couldn't run ims)


it didn't run ims but it ran NonStop SQL instead, which was an rdbms designed for their redundant hw architecture


right, i didn't mean to imply it didn't support acid


Our primary product is backed by sqlite, using BedrockDB to make it client/server and multinode. 10mil users, 2+ TB sqlite database, 6 database nodes.


My standard for any serious service is at least minimal redundancy for improved availability during failures. At least two webservers.


Does this practically improve the situation? The odds of two servers breaking at the same time for the same reasons seem very high. I actually can't think of a single example where the secondary server would keep running.

Regression via a code or dependency update? Full disk? DNS is down? Too much load? All of these would bring down both servers in quick succession.

I guess something like a "once every 2 days" race condition could buy you some time if you had a 2nd server. But that's not a common error.


Zero downtime upgrades, hardware fault, aws decides that specific instance needs to die. It also doesn't let you cheat statelessness very easily, so it's easier to scale horizontally.


Fair enough I guess. I don’t think you need two servers to do zero downtime upgrades. And the other issues are, imo, beyond the 0.99 uptime threshold that most services realistically have when you add in breakage due to upgrades.

I like your statelessness point. I suppose in your view it’s better to have the concentrated stateful core with stateless servers as opposed to just one stateful instance. Two instances mean you can’t easily store foo in memory and hope the server doesn’t die until it’s not needed there anymore.

Counterpoint is that the extra layer of indirection is 10x slower and horizontal scaling won’t be needed as much if you don’t pay that price in the first place, but you are right, the temptation to store foo in memory would still be in its prime.

The thing is, if one machine can scale, putting foo in memory isn’t actually bad. It’s only when things don’t scale that it’s bad.


> I don’t think you need two servers to do zero downtime upgrades

Absolutely not, and I can't understand why I keep hearing this argument. Doing zero-downtime upgrades on a single server has been simple since basically forever: run another process on another port, change the config, gracefully restart the front balancer, and there you go.
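
Roughly, assuming nginx as the front balancer and a hypothetical "./app --port N" binary (the ports, paths, and upstream include file are made up for illustration):

    import subprocess, time, urllib.request

    OLD_PORT, NEW_PORT = 8001, 8002
    UPSTREAM_CONF = "/etc/nginx/conf.d/app_upstream.conf"  # assumed include file

    # 1. Start the new release on a spare port.
    new_proc = subprocess.Popen(["./app", "--port", str(NEW_PORT)])

    # 2. Wait until it answers health checks.
    for _ in range(30):
        try:
            urllib.request.urlopen(f"http://127.0.0.1:{NEW_PORT}/healthz", timeout=1)
            break
        except Exception:
            time.sleep(1)
    else:
        raise SystemExit("new release never became healthy; aborting")

    # 3. Point nginx at the new port and reload gracefully (no dropped connections).
    with open(UPSTREAM_CONF, "w") as f:
        f.write(f"upstream app {{ server 127.0.0.1:{NEW_PORT}; }}\n")
    subprocess.run(["nginx", "-s", "reload"], check=True)

    # 4. Give in-flight requests on the old process time to drain, then stop it.
    time.sleep(30)
    subprocess.run(["pkill", "-f", f"./app --port {OLD_PORT}"], check=False)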


Sure, it can be done, but that alone isn't enough reason to give up redundancy.


We use a 3-node MSSQL setup, and it happens all the time that the primary gets into a bad state (100% CPU, high latency, etc.) and simply failing over to another instance fully recovers.

It could be bad hardware, it could be a bad query (left dangling/canceled on the old instance), it could be bad statistics or disk fragmentation, etc. etc.
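
For what it's worth, the manual failover itself is basically a one-liner run against the secondary you're failing over to (an Always On availability group is assumed here; the AG name, server, and driver string are placeholders):

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=secondary.internal;"
        "DATABASE=master;Trusted_Connection=yes",
        autocommit=True,
    )
    # Planned manual failover of the availability group to this replica.
    conn.execute("ALTER AVAILABILITY GROUP [MyAG] FAILOVER")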


I’m with you, but you could also make the case that most small web service businesses still run a single Postgres instance with no redundancy, just backups. So you have a single point of failure. You can get quite decent uptime out of a single VPS.


Yes, but are multiple single points of failure better than one?


This project comes to mind (https://github.com/rqlite/rqlite), but I've never used it, and I'm not sure it would still count as "pure sqlite" like the OP advocated.


More compelling options are https://dqlite.io/ and https://litestream.io/.


https://litestream.io/ does streaming replication to S3 (or a similar service). With this, you probably have better data durability than with a small database cluster.


My understanding is that it provides asynchronous replication, so you'd still lose some data if you lose a machine. This is documented here https://litestream.io/tips/#data-loss-window


even with litestream, how do you do deployments? do you just terminate the process and re-launch it on the same machine?


I guess this is an interview-level question.

1) Drain connections from your instance: stop taking new connections and let all existing requests finish or time out. This could be done by removing it from a load balancer or DNS. This ensures your litestream backup is up to date.

2) Bring up the new deployment; it restores from the litestream backup. When the restore is complete, register it with the load balancer (if you are using one) or DNS.

3) Delete the old instance.

Instance can be process, container or machine.
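
A rough sketch of that procedure, run from a control host over ssh. The hostnames, paths, systemd unit name, the deregister()/register() load-balancer helpers, and the exact litestream restore invocation are all assumptions for illustration, not any particular platform's API:

    import subprocess, time

    OLD_HOST, NEW_HOST = "app-old.internal", "app-new.internal"

    def ssh(host, cmd):
        subprocess.run(["ssh", host, cmd], check=True)

    def deregister(host): ...   # remove host from the load balancer / DNS (assumed helper)
    def register(host): ...     # add host to the load balancer / DNS (assumed helper)

    # 1) Drain the old instance so its litestream replica is up to date.
    deregister(OLD_HOST)
    time.sleep(60)                       # crude drain window; poll active connections in practice
    ssh(OLD_HOST, "systemctl stop app")  # stop the app; leave litestream running until the replica catches up

    # 2) Restore the database on the new instance from the replica, then start it.
    ssh(NEW_HOST, "litestream restore -o /srv/app/db.sqlite s3://my-bucket/db")
    ssh(NEW_HOST, "systemctl start app")
    register(NEW_HOST)

    # 3) The old instance can now be deleted.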


Yes... and all I see here is downtime. How do we do this without service failure? With a postgres db, you can spin up and down ec2 instances to your heart's content. Every 1-4 years, you can upgrade your db by using a replica with no downtime.


Depends on what you are doing. But a hobby project should definitely just do downtime, lol.

If the db does lots of work, I buy new machines, install them in the datacenter, and bring them up from backup.

BTW: you skipped a step for a busy database: warming it up with shadow traffic.

At a certain scale, just switching like you describe will cause just as much downtime.


Yeah if I were a user of this application I would consider this a very poor solution...


litestream is more for data recovery; for replication, LiteFS is better.


Came here to say this


Depends exactly what you mean by "durable". One machine with RAID 10 can be pretty durable and solves the most common disk-failure problems; other risks can be managed too.


Ah, that brings back memories. Had 2 RAID 10 MySQL servers run for a decade without rebooting. One had an app db, the other a stats db, and the two replicated to each other.

Spinning disks and all, I was terrified to reboot them and have the boot disk fail (which was not on RAID).

The main disks failed once or twice, which slowed the servers down considerably until the RAID rebuild finished. A very nervous time.


How did this situation come to an end? End of life for the service?


New machines with SSDs! Then took those guys out of service for good.


Durable in the database context refers to durability of transactions, i.e. your database does not lose a record of committed transactions. A good example is an ATM withdrawal.
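
Concretely, with Python's built-in sqlite3 module (the accounts table and amounts are just for illustration): once the commit returns with synchronous=FULL, the withdrawal has hit disk and survives a crash or power loss of that machine; whether it survives the machine itself is the replication question upthread.

    import sqlite3

    conn = sqlite3.connect("bank.db", isolation_level=None)  # manage transactions explicitly
    conn.execute("PRAGMA journal_mode=WAL")    # write-ahead log
    conn.execute("PRAGMA synchronous=FULL")    # fsync on commit, i.e. the D in ACID
    conn.execute("CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

    conn.execute("BEGIN IMMEDIATE")            # start the withdrawal transaction
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - 100 WHERE id = ? AND balance >= 100", (1,)
    )
    if cur.rowcount == 1:
        conn.execute("COMMIT")                 # after this returns, the change is durable on this disk
    else:
        conn.execute("ROLLBACK")               # insufficient funds: nothing is recorded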



