Postgres is usually completely happy enough with UUIDv4. Overall architecture (s...

sgarland · on July 6, 2024

If your app isn’t working with billions of rows, you really don’t need to be worrying about distributed anything. Even then, I’d be suspicious.

I don’t think people grasp how far a single RDBMS server can take you. Hundreds of thousands of queries per second are well in reach of a well-configured MySQL or Postgres instance on modern hardware. This also has the terrific benefit of making reasoning about state and transactions much, much simpler.

Re: last bit of performance, it’s more than that. If you’re using Aurora, where you pay for every disk op, using UUIDv4 as PK in Postgres will approximately 7x your IOPS for SELECTs using them, and massively (I can’t quantify it on a general basis; it depends on the rest of the table, and your workload split) increase them for writes. That’s not free. On RDS, where you pay for disk performance upfront, you’re cutting into your available performance.

About the only place it effectively doesn’t matter except at insane scale is on native NVMe drives. If you saturate IOPS for one of those without first saturating the NIC, I would love to see your schema and queries.

fovc · on July 6, 2024

Scale isn’t the only reason to have distributed systems. You could very well have a tiny but distributed system

wibblewobble125 · on July 6, 2024

Sometimes distribution is not for performance but tenant isolation for regulatory or general isolation purposes. I work in such an industry.

sgarland · on July 6, 2024

Fair point. You can still use monotonic IDs with these, via either interleaving chunks to each DB, or with a central server that allocates them – the latter approach is how Slack handles it, for example.