hi sdairs, we did store the data on the worker nodes for the challenge, but not ...

otterley · 2025-10-24T19:39:15

“Linux may cache the filesystem data” means there’s a non-zero likelihood that the data was already in memory, unless you dropped caches right before you began the benchmark. You don’t have to explicitly load it into memory for this to be true. What’s more, unless you control how memory is used, the kernel makes its own decisions about what to cache and what to evict, which can make benchmarks unreproducible.

It’s important to know what you are benchmarking before you start and to control for extrinsic factors as explicitly as possible.
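A minimal sketch of controlling for the page cache, assuming a Linux host and root privileges. `drop_page_cache` and `bench` are hypothetical helper names, not part of any benchmark suite mentioned in this thread. Writing `3` to `/proc/sys/vm/drop_caches` asks the kernel to drop clean page-cache, dentry and inode entries, and calling `os.sync()` first flushes dirty pages so they can be dropped too.

```python
import os
import time
from typing import Callable


def drop_page_cache() -> None:
    """Flush dirty pages, then ask the Linux kernel to drop clean cached data.

    Hypothetical helper. Linux only, and writing to /proc/sys/vm/drop_caches
    requires root.
    """
    os.sync()  # write back dirty pages first so they become droppable
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")  # 1 = page cache, 2 = dentries/inodes, 3 = both


def bench(run: Callable[[], object], repeats: int = 5, cold: bool = False) -> list[float]:
    """Time `run` `repeats` times and return the wall-clock seconds per run.

    With cold=True the page cache is dropped before every run, so each timing
    starts from storage rather than from data the kernel already cached.
    """
    timings = []
    for _ in range(repeats):
        if cold:
            drop_page_cache()
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings
```

Running `bench(query, cold=True)` and `bench(query)` on the same query gives a cold vs. hot comparison. Dropping caches this way only clears the OS page cache; caches inside the database process or on the storage device itself are unaffected.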

sdairs · 2025-10-24T19:46:54

Thanks for clarifying; I'm not trying to take anything away from you. I work in the OLAP space too, so it's always good to see people pushing it forwards. It would be interesting to see a comparison of totally cold vs. hot caches.

Are you looking at distributed queries directly over S3? We did this in ClickHouse and can do instant virtual sharding over large data sets in S3. We call it parallel replicas: https://clickhouse.com/blog/clickhouse-parallel-replicas

tanelpoder · 2025-10-24T22:23:01

(I submitted this link.) My interest in this approach is generally about observability infra at scale: buffering detailed events, metrics and thread samples at the edge, filtering early there, and later extracting only the things of interest. I’m a SQL and database nerd, so this approach looks interesting.
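One way to picture that edge-buffering pattern, as a hypothetical sketch rather than the approach in the linked article: each host keeps a bounded ring buffer, drops obviously uninteresting events on ingest, and a later query pulls back only the matching events. `Event`, `EdgeBuffer`, and the field names are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    ts: float    # event timestamp in seconds (illustrative field)
    kind: str    # e.g. "metric", "thread_sample", "debug"
    value: float


class EdgeBuffer:
    """Hypothetical per-host buffer: early filtering on ingest, a fixed-size
    ring buffer for retention, and selective extraction later on."""

    def __init__(self, capacity: int, keep: Callable[[Event], bool]):
        self._events = deque(maxlen=capacity)  # oldest events evicted first
        self._keep = keep

    def ingest(self, events: Iterable[Event]) -> None:
        for e in events:
            if self._keep(e):  # early filter at the edge
                self._events.append(e)

    def extract(self, match: Callable[[Event], bool]) -> list[Event]:
        # Ship only the events of interest back to the central store.
        return [e for e in self._events if match(e)]


# Usage: keep everything except debug events, at most 3 per host.
buf = EdgeBuffer(capacity=3, keep=lambda e: e.kind != "debug")
buf.ingest([Event(1, "metric", 5), Event(2, "debug", 99), Event(3, "metric", 20),
            Event(4, "thread_sample", 30), Event(5, "metric", 40)])
interesting = buf.extract(lambda e: e.value > 25)
```

A `deque` with `maxlen` evicts the oldest entries automatically, so memory use on each edge host stays bounded regardless of the incoming event rate.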

jamesblonde · 2025-10-24T20:40:21

With 2 modern NVMe disks per host (about 15 GB/s each) and PCIe 5.0, it should take only about 15 s to read 30 TB into memory on 63 hosts.
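A quick check of that arithmetic, as a minimal sketch assuming the 30 TB is spread evenly over the 63 hosts, each disk sustains about 15 GB/s of sequential reads, and nothing else (PCIe lanes, memory bandwidth, filesystem overhead) is the bottleneck. `load_time_seconds` is a hypothetical helper.

```python
def load_time_seconds(total_tb: float, hosts: int, disks_per_host: int,
                      gb_per_s_per_disk: float) -> float:
    """Ideal time (s) to read total_tb terabytes when every disk on every
    host streams in parallel. Uses decimal units: 1 TB = 1000 GB."""
    per_host_gb = total_tb * 1000 / hosts
    per_host_gb_per_s = disks_per_host * gb_per_s_per_disk
    return per_host_gb / per_host_gb_per_s


t = load_time_seconds(total_tb=30, hosts=63, disks_per_host=2, gb_per_s_per_disk=15)
print(f"{t:.1f} s")  # prints "15.9 s"
```

If 15 GB/s were the total per host rather than per disk, the same calculation gives roughly 32 s, so the ~15 s figure relies on both disks streaming in parallel.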

You can find those disks on Hetzner. Not AWS, though.

jiggawatts · 2025-10-25T01:36:19

I don’t understand why both Azure and AWS have local SSDs that are an order of magnitude slower than what I can get in a laptop. If Hetzner can do it, surely so can they!

Not to mention that Azure now exposes local drives as raw NVMe devices mapped straight through to the guest with no virtualisation overheads.

jamesblonde · 2025-10-25T13:09:11

It would undercut all their higher level services - like DynamoDB, CosmosDB, etc.

Databases would suddenly go BRRR in the cloud and expose cloud-native (S3-based) databases as the high-latency services they are.