Indeed, 14GB seems really high for a 400MB Parquet file, that's a 35x multiple on the base file size.
Of course, the data is compressed on disk, but even the uncompressed data isn't that large, so I do believe quite a lot of optimisations are still possible.
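If anyone wants to check the compressed-vs-uncompressed gap for their own file, the Parquet footer records both sizes per row group, so pyarrow can give a rough estimate of the in-memory footprint. A minimal sketch (the file path is just a placeholder, not from this thread):

```python
import pyarrow.parquet as pq

# "data.parquet" is a placeholder path
meta = pq.ParquetFile("data.parquet").metadata

compressed = 0
uncompressed = 0
for rg in range(meta.num_row_groups):
    row_group = meta.row_group(rg)
    # total_byte_size is the uncompressed size of the row group
    uncompressed += row_group.total_byte_size
    for col in range(row_group.num_columns):
        compressed += row_group.column(col).total_compressed_size

print(f"compressed:   {compressed / 1e6:.0f} MB")
print(f"uncompressed: {uncompressed / 1e6:.0f} MB")
```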
It’s also the aggregation operation: if there are many unique groups, it can take a lot of memory.
Newer DuckDB versions handle out-of-core operations better. But in general, just because the data fits in memory doesn’t mean the operation will, and as I said, 8GB is very limited memory, so it will entail spilling to disk.
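As a rough sketch of what that looks like in practice: you can cap DuckDB's memory and point it at a spill directory before running the aggregation, so the hash table (one entry per unique group) spills instead of blowing up RAM. The limit, path, and column names below are illustrative only:

```python
import duckdb

con = duckdb.connect()
# Cap DuckDB's memory budget and give it somewhere to spill to disk
con.execute("SET memory_limit = '6GB'")
con.execute("SET temp_directory = '/tmp/duckdb_spill'")

# A GROUP BY over a high-cardinality column is where the memory goes:
# one hash-table entry per unique group. 'data.parquet' and 'user_id'
# are placeholders, not from the original post.
result = con.execute("""
    SELECT user_id, count(*) AS n
    FROM read_parquet('data.parquet')
    GROUP BY user_id
""").fetchdf()
```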