
>The only time you need to consider a client-server setup is: Where you have multiple physical machines accessing the same database server over a network. In this setup you have a shared database between multiple clients.

This caveat covers "most cases": if there's only a single machine, then any data stored on it is not durable.

Additionally, to my knowledge SQLite doesn't have a solution for durability other than asynchronous replication. Arguably, most applications can tolerate this, but I'd rather just use MySQL with semi-sync replication, and not have to think through all of the edge cases about data loss.
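
For reference, a rough sketch of turning semi-sync on from Python with mysql-connector-python is below. The plugin and variable names are the 8.0.26+ "source/replica" spellings (older versions use the rpl_semi_sync_master/slave names), and the host and credentials are placeholders, so treat it as illustrative rather than exact:

    import mysql.connector  # pip install mysql-connector-python

    conn = mysql.connector.connect(host="primary.internal", user="admin", password="...")
    cur = conn.cursor()

    # Load and enable the semi-sync plugin on the source (primary).
    cur.execute("INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so'")
    cur.execute("SET GLOBAL rpl_semi_sync_source_enabled = 1")
    # Fall back to async if no replica acks within 1s, so the primary keeps serving:
    cur.execute("SET GLOBAL rpl_semi_sync_source_timeout = 1000")
    # (the replica side needs its corresponding semi-sync plugin enabled as well)

    conn.close()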



people have been providing acid transaction semantics on single machines for 50 years

do you think ims/db ran on a cluster

the d in acid stands for durability

you're talking about pitr, which is what mysql semi-sync provides (and afaik you are correct that sqlite doesn't offer pitr)


That's not what the parent means by durability, they mean having your data survive any one of your machines being instantly nuked from orbit at the most inconvenient possible time.

Just having sync replication is enough; it doesn't have to be anything fancy like semi-sync.


i know that

i'm correcting their terminology

'durability' already has a well-established, rigorously-defined meaning in this context, which is confusingly similar to pitr but definitely not the same thing

the downside of sync replication, as i understand it, is that although your data will survive any one of your machines being instantly nuked from orbit, your entire service will go down; semi-sync avoids this problem


But they’re using the other well-established meaning of durability, a la how AWS and others describe their storage platforms. It’s pretty much the same thing but taken at the whole-system level. On that level an ACID database is only as durable as the underlying storage medium, which is sadly not very durable.


well, it's sort of arbitrary that the standard definition of durability requires your data to survive machine checks and kernel panics and power outages but not disk failures, isn't it

especially since nowadays in many data centers disk failures are far more common

(though full raid failures are less common)

but that is the standard definition


Michael, is that you?


The OP's point is that the single-process ACID semantics of SQLite don't provide a durability guarantee that includes replication.

Other databases have a commit level that makes sure logs have been shipped.

For me this is an edge case in just about everything except financial transactions (the performance penalty of distributed transactions is pretty high!) but it is correct to note it.
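
For example, in PostgreSQL (assuming synchronous_standby_names is configured on the server), the synchronous_commit setting controls how far the WAL has to travel before COMMIT returns. A rough psycopg2 sketch, with placeholder connection details and a hypothetical payments table:

    import psycopg2

    conn = psycopg2.connect("host=primary.internal dbname=app user=app")
    cur = conn.cursor()

    # 'on' waits for the standby to flush the WAL to disk before COMMIT returns;
    # 'remote_apply' additionally waits until the standby has applied it.
    cur.execute("SET synchronous_commit = 'on'")
    cur.execute("INSERT INTO payments (account, amount) VALUES (%s, %s)", (42, 100))
    conn.commit()  # blocks until the configured standby has acknowledged the WAL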


Sounds like IMS runs on Z system mainframes with redundant hot-swappable CPUs and memory. They pay IBM a lot of money for the illusion of a single reliable machine, when a different OS would manage it as a small cluster.

We economize by using racks of cheap, flaky commodity hardware, but we have to be ready for one computer to die by failing each application over to another.


in 01973 the s/360 did not have hot-swappable cpus or memory

even in the 01990s i don't think ibm had such an offering, though tandem did (but it couldn't run ims)


it didn't run ims but it ran NonStop SQL instead, which was an rdbms designed for their redundant hw architecture


right, i didn't mean to imply it didn't support acid


Our primary product is backed by sqlite, using BedrockDB to make it client/server and multinode. 10mil users, 2+ TB sqlite database, 6 database nodes.


My standard for any serious service is at least minimal redundancy for improved availability during failures. At least two webservers.


Does this practically improve the situation? The odds of two servers breaking at the same time for the same reasons seem very high. I actually can't think of a single example where the secondary server would keep running.

Regression via a code or dependency update? Full disk? DNS is down? Too much load? All of these would bring down both servers in quick succession.

I guess something like a "once every 2 days" race condition could buy you some time if you had a 2nd server. But that's not a common error.


Zero downtime upgrades, hardware fault, aws decides that specific instance needs to die. It also doesn't let you cheat statelessness very easily, so it's easier to scale horizontally.


Fair enough I guess. I don’t think you need two servers to do zero downtime upgrades. And the other issues are, imo, beyond the 0.99 uptime threshold that most services realistically have when you add in breakage due to upgrades.

I like your statelessness point. I suppose in your view it’s better to have the concentrated stateful core with stateless servers as opposed to just one stateful instance. Two instances mean you can’t easily store foo in memory and hope the server doesn’t die until it’s not needed there anymore.

Counterpoint is that the extra layer of indirection is 10x slower and horizontal scaling won’t be needed as much if you don’t pay that price in the first place, but you are right, the temptation to store foo in memory would still be in its prime.

The thing is, if one machine can scale, putting foo in memory isn’t actually bad. It’s only when things don’t scale that it’s bad.


> I don’t think you need two servers to do zero downtime upgrades

Absolutely not, and I can't understand why I keep hearing this argument. Doing zero-downtime upgrades on a single server has been simple since basically forever: run another process on another port, change the config, gracefully restart the front balancer, and there you go.
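
Roughly, assuming nginx as the front balancer and a hypothetical "./app --port N" binary (the ports, paths, and upstream include file are made up for illustration):

    import subprocess, time, urllib.request

    OLD_PORT, NEW_PORT = 8001, 8002
    UPSTREAM_CONF = "/etc/nginx/conf.d/app_upstream.conf"  # assumed include file

    # 1. Start the new release on a spare port.
    new_proc = subprocess.Popen(["./app", "--port", str(NEW_PORT)])

    # 2. Wait until it answers health checks.
    for _ in range(30):
        try:
            urllib.request.urlopen(f"http://127.0.0.1:{NEW_PORT}/healthz", timeout=1)
            break
        except Exception:
            time.sleep(1)
    else:
        raise SystemExit("new release never became healthy; aborting")

    # 3. Point nginx at the new port and reload gracefully (no dropped connections).
    with open(UPSTREAM_CONF, "w") as f:
        f.write(f"upstream app {{ server 127.0.0.1:{NEW_PORT}; }}\n")
    subprocess.run(["nginx", "-s", "reload"], check=True)

    # 4. Give in-flight requests on the old process time to drain, then stop it.
    time.sleep(30)
    subprocess.run(["pkill", "-f", f"./app --port {OLD_PORT}"], check=False)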


Sure, it can be done, but that alone isn't enough reason to give up redundancy.


We use a 3-node MSSQL setup, and it happens all the time that the primary gets into a bad state (100% CPU, high latency, etc.) and simply failing over to another instance fully recovers.

It could be bad hardware, it could be a bad query (left dangling/canceled on the old instance), it could be bad statistics or disk fragmentation, etc. etc.
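
For what it's worth, the manual failover itself is basically a one-liner run against the secondary you're failing over to (an Always On availability group is assumed here; the AG name, server, and driver string are placeholders):

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=secondary.internal;"
        "DATABASE=master;Trusted_Connection=yes",
        autocommit=True,
    )
    # Planned manual failover of the availability group to this replica.
    conn.execute("ALTER AVAILABILITY GROUP [MyAG] FAILOVER")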


I’m with you, but you could also make the case that most small web service businesses still run a single Postgres instance with no redundancy, just backups. So you have a single point of failure. You can get quite decent uptime out of a single VPS.


Yes, but are multiple single points of failure better than one?


This project comes to mind (https://github.com/rqlite/rqlite), but I've never used it, and I'm not sure it would still count as "pure sqlite" like the OP advocated.


More compelling options are https://dqlite.io/ and https://litestream.io/.


https://litestream.io/ does streaming replication to S3 (or a similar service). With this, you probably have better data durability than with a small database cluster.


My understanding is that it provides asynchronous replication, so you'd still lose some data if you lose a machine. This is documented here https://litestream.io/tips/#data-loss-window


even with litestream, how do you do deployments? do you just terminate the process and re-launch it on the same machine?


I guess this is an interview-level question.

1) Drain connections from your instance: stop taking new connections and let all existing requests finish or time out. This could be done by removing it from a load balancer or DNS. This ensures your litestream backup is up to date.

2) Bring up the new deployment; it restores from the litestream backup. When the restore is complete, register it with the load balancer (if you are using one) or DNS.

3) Delete the old instance.

Instance can be process, container or machine.
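
A rough sketch of that procedure, run from a control host over ssh. The hostnames, paths, systemd unit name, the deregister()/register() load-balancer helpers, and the exact litestream restore invocation are all assumptions for illustration, not any particular platform's API:

    import subprocess, time

    OLD_HOST, NEW_HOST = "app-old.internal", "app-new.internal"

    def ssh(host, cmd):
        subprocess.run(["ssh", host, cmd], check=True)

    def deregister(host): ...   # remove host from the load balancer / DNS (assumed helper)
    def register(host): ...     # add host to the load balancer / DNS (assumed helper)

    # 1) Drain the old instance so its litestream replica is up to date.
    deregister(OLD_HOST)
    time.sleep(60)                       # crude drain window; poll active connections in practice
    ssh(OLD_HOST, "systemctl stop app")  # stop the app; leave litestream running until the replica catches up

    # 2) Restore the database on the new instance from the replica, then start it.
    ssh(NEW_HOST, "litestream restore -o /srv/app/db.sqlite s3://my-bucket/db")
    ssh(NEW_HOST, "systemctl start app")
    register(NEW_HOST)

    # 3) The old instance can now be deleted.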


Yes... and all I see here is downtime. How do we do this without service failure? With a postgres db, you can spin up and down ec2 instances to your heart's content. Every 1-4 years, you can upgrade your db by using a replica with no downtime.


Depends on what you are doing. But a hobby project should definitely just do downtime, lol.

If the db does lots of work, I buy new machines, install them in the datacenter, and bring them up from backup.

BTW: you skipped a step for a busy database: warming it up with shadow traffic.

At a certain scale, just switching like you describe will cause just as much downtime.


Yeah if I were a user of this application I would consider this a very poor solution...


litestream is more for data recovery; for replication, LiteFS is better.


Came here to say this


Depends exactly what you mean by "durable". One machine with RAID 10 can be pretty durable and solves the most common disk-failure problems; other risks can be managed too.


Ah, that brings back memories. Had 2 RAID 10 MySQL servers run for a decade without rebooting. One had an app db, the other a stats db, and the two replicated to each other.

Spinning disks and all, I was terrified to reboot them and have the boot disk fail (which was not on RAID).

The main disks failed once or twice, which slowed the servers down considerably until the RAID rebuild finished. A very nervous time.


How did this situation come to an end? End of life for the service?


New machines with SSDs! Then took those guys out of service for good.


Durable in the database context refers to durability of transactions, i.e. your database does not lose a record of committed transactions. A good example is an ATM withdrawal.
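
Concretely, with Python's built-in sqlite3 module (the accounts table and amounts are just for illustration): once the commit returns with synchronous=FULL, the withdrawal has hit disk and survives a crash or power loss of that machine; whether it survives the machine itself is the replication question upthread.

    import sqlite3

    conn = sqlite3.connect("bank.db", isolation_level=None)  # manage transactions explicitly
    conn.execute("PRAGMA journal_mode=WAL")    # write-ahead log
    conn.execute("PRAGMA synchronous=FULL")    # fsync on commit, i.e. the D in ACID
    conn.execute("CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

    conn.execute("BEGIN IMMEDIATE")            # start the withdrawal transaction
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - 100 WHERE id = ? AND balance >= 100", (1,)
    )
    if cur.rowcount == 1:
        conn.execute("COMMIT")                 # after this returns, the change is durable on this disk
    else:
        conn.execute("ROLLBACK")               # insufficient funds: nothing is recorded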



