Maybe good to mention torn pages somewhere too? Both MySQL and Postgres jump thr...

eatonphil · on July 1, 2024

Thanks Adam! I think torn writes would still be caught via checksums, no? Although that may be later than you'd wish.

I'm not confident but from reading that page it seems that for Postgres at least, if it did do checksums it might not need to count on page-level atomic writes?

AdamProut · on July 1, 2024

Checksums can detect a torn page, but they can't always repair it. It's likely that a good part of the database page is gone (i.e., an amount of data that matches the disk / file system atomic write unit size is probably missing). Torn page writes are a pretty common scenario too, so databases need to be able to fully recover from them, not just detect them and report corruption (e.g., just pull the power plug from the machine during a heavy write workload and you're likely to get one; it doesn't require a solar ray to flip a bit :) ).
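
To make the detect-versus-repair distinction concrete, here is a minimal Python sketch of a torn write. It is a toy model, not how any particular database lays out pages: the 8192-byte page, the 4096-byte atomic write unit, the CRC32-in-the-header layout, and the helper names `make_page`, `is_valid`, and `torn_write` are all assumptions chosen for illustration.

```python
import zlib

PAGE_SIZE = 8192     # database page size (assumed for illustration)
ATOMIC_UNIT = 4096   # disk / file system atomic write unit (assumed)

def make_page(payload: bytes) -> bytes:
    """Build a page: 4-byte CRC32 header followed by the zero-padded body."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "big") + body

def is_valid(page: bytes) -> bool:
    """Return True when the stored checksum matches the page body."""
    return int.from_bytes(page[:4], "big") == zlib.crc32(page[4:])

def torn_write(old: bytes, new: bytes) -> bytes:
    """Simulate power loss after only the first atomic unit of `new` landed."""
    return new[:ATOMIC_UNIT] + old[ATOMIC_UNIT:]

old_page = make_page(b"A" * 8000)
new_page = make_page(b"B" * 8000)
on_disk = torn_write(old_page, new_page)

# The checksum flags the page as corrupt...
torn_detected = not is_valid(on_disk)

# ...but the page is a mix of old and new halves. Neither full version
# can be rebuilt from it alone; recovery needs a complete copy kept elsewhere.
matches_old = on_disk == old_page
matches_new = on_disk == new_page
```

In this model `torn_detected` comes out true while the on-disk page matches neither version, which is the gap a checksum alone can't close.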

eatonphil · on July 1, 2024

That's fair. In the post I did mention disk redundancy (and I guess I only implied recovery) as one additional level for safety. Which I think is what you're getting at too.

mattashii · on July 1, 2024

Disk redundancy won't guarantee torn page protection unless the writes across the redundant disks are coordinated so that one starts only after the other finishes, such that there is always one copy of the page that is not currently being written. So writing to a RAID1 array won't help here without knowledge of how that RAID1's writes work.
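
A rough Python sketch of that point, under heavy assumptions: two mirrors, an 8192-byte page with a CRC32 header, a 4096-byte atomic write unit, and a hypothetical `crash_during_write` helper that models power loss partway through a page write. It does not describe how any real RAID1 implementation schedules its writes.

```python
import zlib

PAGE_SIZE = 8192     # page size (assumed for illustration)
ATOMIC_UNIT = 4096   # atomic write unit of each disk (assumed)

def make_page(fill: bytes) -> bytes:
    """Build a page: 4-byte CRC32 header followed by a body of repeated `fill`."""
    body = fill * (PAGE_SIZE - 4)
    return zlib.crc32(body).to_bytes(4, "big") + body

def is_valid(page: bytes) -> bool:
    """Return True when the stored checksum matches the page body."""
    return int.from_bytes(page[:4], "big") == zlib.crc32(page[4:])

def crash_during_write(old: bytes, new: bytes, units_done: int) -> bytes:
    """Page contents if power fails after `units_done` atomic units landed."""
    cut = units_done * ATOMIC_UNIT
    return new[:cut] + old[cut:]

old, new = make_page(b"A"), make_page(b"B")

# Uncoordinated: both mirrors are mid-write when the power fails,
# so both copies of the page end up torn.
uncoordinated = [crash_during_write(old, new, 1),
                 crash_during_write(old, new, 1)]

# Coordinated: mirror 1 is not touched until mirror 0 has finished,
# so a crash during mirror 0's write leaves mirror 1 holding the old page.
coordinated = [crash_during_write(old, new, 1), old]

any_valid_uncoordinated = any(is_valid(p) for p in uncoordinated)
any_valid_coordinated = any(is_valid(p) for p in coordinated)
```

In this toy model only the coordinated ordering leaves an intact copy to recover from.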