Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

SELECT COUNT(DISTINCT) has entered the challenge.


good point :) - we can re-aggregate HyperLogLog (HLL) sketches to get a pretty accurate NDV (Count Distinct) - see Query.farm's DataSketches DuckDB extension here: https://github.com/Query-farm/datasketches

We also have Bitmap aggregation capabilities for exact count distinct - something I worked with Oracle, Snowflake, Databricks, and DuckDB labs on implementing. It isn't as fast as HLL - but it is 100% accurate...


I remember BigQuery had Distinct with HLL accuracy 10 years ago but rather quickly replaced it with actual accuracy.

How would you compare this solution to BigQuery?




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: