Thanks for the benchmarks! :)

Indeed, 14GB seems really high for a 400MB Parquet file; that's a 35x multiple on the base file size.

Of course, the data is compressed on disk, but even the uncompressed data isn't that large, so I believe quite a lot of optimisations are still possible.
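
If you want to sanity-check the compressed vs. uncompressed sizes, DuckDB can read them straight out of the Parquet footer. A minimal sketch (the file name is made up, and I'm assuming the parquet_metadata() table function with its total_compressed_size / total_uncompressed_size columns, which recent DuckDB versions expose):

    import duckdb

    con = duckdb.connect()
    # Sum the per-column-chunk sizes recorded in the Parquet footer to compare
    # the on-disk (compressed) size against the uncompressed size.
    compressed_mb, uncompressed_mb = con.execute("""
        SELECT
            sum(total_compressed_size)   / 1024.0 / 1024.0 AS compressed_mb,
            sum(total_uncompressed_size) / 1024.0 / 1024.0 AS uncompressed_mb
        FROM parquet_metadata('data.parquet')
    """).fetchone()
    print(f"compressed: {compressed_mb:.1f} MB, uncompressed: {uncompressed_mb:.1f} MB")

If the uncompressed total is still nowhere near 14GB, the overhead is coming from the query execution rather than the data itself.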



It’s also the aggregation operation itself: if there are many unique groups, the aggregation state can take a lot of memory.

Newer DuckDB versions handle out-of-core operations better. But in general, just because the data fits in memory doesn’t mean the operation will, and, as I said, 8GB is very limited memory, so it will entail spilling to disk.

https://duckdb.org/2024/03/29/external-aggregation.html
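
For reference, the knobs that control this are the memory limit and the temp directory for spill files. A rough sketch (the file name and the group-by columns are hypothetical; memory_limit and temp_directory are the settings DuckDB documents for this):

    import duckdb

    con = duckdb.connect()
    # Cap DuckDB's memory and point it at a scratch directory so the
    # aggregation can spill partitions to disk instead of running out of RAM.
    con.execute("SET memory_limit = '8GB'")
    con.execute("SET temp_directory = '/tmp/duckdb_spill'")

    result = con.execute("""
        SELECT some_key, count(*) AS n, avg(some_value) AS mean_value
        FROM read_parquet('data.parquet')
        GROUP BY some_key
    """).fetchall()

With the limit set, a high-cardinality GROUP BY runs slower but stays within the cap rather than ballooning to 14GB.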



