Hybrid Hash Join – breaking the memory wall of streams join (timeplus.com)
2 points by tingfirst 24 days ago | 1 comment


All streaming processors face the same fundamental problem:

Streaming joins require maintaining state for both sides of the join

High-cardinality data (millions of unique keys) means huge state sizes

Traditional approach: keeping everything in memory, which eventually exhausts it
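To make the state problem concrete, here is a minimal sketch of an in-memory symmetric hash join, assuming unbounded streams with no windowing or TTL. The class and method names are made up for illustration and are not taken from Timeplus or any other engine.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Illustrative only: every row from both sides is retained forever."""

    def __init__(self):
        self.left = defaultdict(list)   # join key -> rows seen on the left stream
        self.right = defaultdict(list)  # join key -> rows seen on the right stream

    def on_left(self, key, row):
        # Store the row, then probe the opposite side's state for matches.
        self.left[key].append(row)
        return [(row, r) for r in self.right.get(key, [])]

    def on_right(self, key, row):
        self.right[key].append(row)
        return [(l, row) for l in self.left.get(key, [])]

    def state_size(self):
        # Total number of buffered rows across both sides.
        return sum(len(v) for v in self.left.values()) + sum(
            len(v) for v in self.right.values()
        )
```

Because a future row on either stream may match any earlier row on the other, nothing can be dropped, so state grows with the number of distinct keys and the rows per key.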

The high-cardinality join memory problem isn't unique to Timeplus. Apache Flink also uses hybrid hash joins that spill to disk (RocksDB) when memory fills. Materialize shares indexed state across multiple queries, but still requires keeping full datasets in memory. RisingWave stores state in cloud object storage (S3/GCS) with LRU caching for hot data. What makes Timeplus different is its purpose-built optimization for the Pareto principle, where a tiny fraction of the data generates the vast majority of activity: hot data stays in memory and cold data goes to disk, for dramatic memory savings.
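The hot/cold split can be sketched roughly as follows. This is not Timeplus's implementation: it assumes a per-key LRU policy, uses SQLite as a stand-in for the on-disk store, and `HotColdState` and its methods are hypothetical names.

```python
import os
import pickle
import sqlite3
import tempfile
from collections import OrderedDict


class HotColdState:
    """State for one join side: recently used keys in memory, the rest on disk."""

    def __init__(self, path, max_hot_keys):
        self.max_hot_keys = max_hot_keys
        self.hot = OrderedDict()  # key -> list of rows, oldest-used first
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS cold (k TEXT PRIMARY KEY, v BLOB)")

    def _admit(self, key, rows):
        # Put a key in memory, spilling least recently used keys if over budget.
        self.hot[key] = rows
        while len(self.hot) > self.max_hot_keys:
            old_key, old_rows = self.hot.popitem(last=False)
            self.db.execute(
                "INSERT OR REPLACE INTO cold VALUES (?, ?)",
                (old_key, pickle.dumps(old_rows)),
            )
        return rows

    def _promote(self, key):
        # Return the rows for a key, loading them from disk if needed; None if unseen.
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        rec = self.db.execute("SELECT v FROM cold WHERE k = ?", (key,)).fetchone()
        if rec is None:
            return None
        self.db.execute("DELETE FROM cold WHERE k = ?", (key,))
        return self._admit(key, pickle.loads(rec[0]))

    def append(self, key, row):
        rows = self._promote(key)
        if rows is None:
            rows = self._admit(key, [])
        rows.append(row)

    def get(self, key):
        rows = self._promote(key)
        return list(rows) if rows else []


# Usage: with room for two hot keys, the least recently used key spills to disk.
path = os.path.join(tempfile.mkdtemp(), "state.db")
state = HotColdState(path, max_hot_keys=2)
for key in ["a", "b", "a", "c"]:
    state.append(key, key.upper())
```

When key access is heavily skewed, most probes hit the in-memory map and only the long tail pays for a disk read, which is the intuition behind the Pareto argument in the comment.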



