
Data sources are usually in Kafka, or in operational databases like Postgres or MySQL.

1. Table A: fact events, high-throughput (10k~1M events/sec), high-cardinality.

2. Tables B, C, D: a couple of dimension tables (fast- or slow-changing).

The use case is straightforward: join/enrich/look up everything into one big flattened, analytics-friendly table in ClickHouse.

What’s the best pipeline approach to achieve this efficiently and in real time?
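
For context, the pattern we've been prototyping looks roughly like this (a minimal sketch; table names, schemas, and connection settings are all made up): a Kafka-engine source table, a dimension table exposed as a dictionary, and a materialized view that writes the flattened rows into a MergeTree target.

    -- Fact stream from Kafka (hypothetical topic/schema)
    CREATE TABLE events_queue (
        event_id UInt64,
        user_id  UInt64,
        ts       DateTime
    ) ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list  = 'events',
             kafka_group_name  = 'ch_events',
             kafka_format      = 'JSONEachRow';

    -- Dimension table pulled from Postgres and kept in memory for lookups
    CREATE DICTIONARY users_dict (
        user_id   UInt64,
        user_name String
    )
    PRIMARY KEY user_id
    SOURCE(POSTGRESQL(host 'pg' port 5432 db 'app' table 'users' user 'ch' password ''))
    LIFETIME(MIN 60 MAX 120)
    LAYOUT(HASHED());

    -- Flattened, analytics-friendly target
    CREATE TABLE events_flat (
        event_id  UInt64,
        user_id   UInt64,
        user_name String,
        ts        DateTime
    ) ENGINE = MergeTree
    ORDER BY (ts, event_id);

    -- The MV fires per consumed block and writes enriched rows into the target
    CREATE MATERIALIZED VIEW events_flat_mv TO events_flat AS
    SELECT
        event_id,
        user_id,
        dictGet('users_dict', 'user_name', user_id) AS user_name,
        ts
    FROM events_queue;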



We consistently hear that ClickHouse materialized views are quite limited and can't handle real-time pipelines efficiently enough. Would love to see more comments here.


There are some limitations, as far as I know:

1. Insert Performance Degradation

Users frequently complain that materialized views significantly slow down insert performance, especially when there are multiple MVs on a single source table.
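
For example (hypothetical names), with a setup like this every INSERT into the source table also runs both MV queries synchronously, so each insert pays for all of them:

    -- Both views run inside every INSERT into `events`
    CREATE MATERIALIZED VIEW mv_by_user TO events_by_user AS
        SELECT user_id, count() AS cnt
        FROM events
        GROUP BY user_id;

    CREATE MATERIALIZED VIEW mv_by_country TO events_by_country AS
        SELECT country, count() AS cnt
        FROM events
        GROUP BY country;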

2. Streaming Data Patterns

This is critical for ClickHouse materialized views. Streaming data often arrives in frequent, small batches, but ClickHouse performs best when ingesting larger batches. The materialized view's transformation query runs synchronously as part of every single INSERT, so the fixed per-insert overhead becomes disproportionately large for small batches.
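
One mitigation people use for small, frequent writes is server-side batching with async inserts, so the MV query runs per flushed buffer rather than per tiny client batch (setting names are from the ClickHouse docs; defaults and availability depend on version):

    SET async_insert = 1;                    -- buffer small inserts on the server
    SET wait_for_async_insert = 1;           -- acknowledge only after the buffer is flushed
    SET async_insert_busy_timeout_ms = 1000; -- flush the buffer at least once per second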

3. Block-Level Processing Limitations

MVs in ClickHouse operate only on the data blocks being inserted at that moment. When performing aggregation, a single group from the original dataset may have multiple entries in the target table since grouping is applied only to the current insert block.
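
The usual way to live with this is to store partial aggregate states and finish the aggregation at read time, e.g. (hypothetical schema):

    CREATE TABLE events_by_user (
        user_id UInt64,
        cnt     AggregateFunction(count),
        revenue AggregateFunction(sum, Float64)
    ) ENGINE = AggregatingMergeTree
    ORDER BY user_id;

    -- Each insert block contributes partial states...
    CREATE MATERIALIZED VIEW events_by_user_mv TO events_by_user AS
    SELECT
        user_id,
        countState()      AS cnt,
        sumState(revenue) AS revenue
    FROM events
    GROUP BY user_id;

    -- ...which are merged at query time (and gradually by background merges)
    SELECT user_id, countMerge(cnt) AS cnt, sumMerge(revenue) AS revenue
    FROM events_by_user
    GROUP BY user_id;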

4. JOIN Limitations and Memory Issues

Materialized views with JOINs are problematic because an MV only triggers on inserts into the left-most table. The JOIN itself is also inefficient: the right-hand table has to be re-read and hashed into a new hash table for every insert block, or else a large hash table has to be kept resident, consuming a lot of memory.
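
To make the trigger issue concrete (hypothetical names): this MV only fires on inserts into `events`; inserts into `users` never update the target, and the right-hand table is re-read and hashed for every insert block. Swapping the JOIN for a dictionary lookup (dictGet), as in the sketch further up the thread, is the usual workaround.

    CREATE MATERIALIZED VIEW events_join_mv TO events_flat AS
    SELECT
        e.event_id,
        e.user_id,
        u.user_name,   -- stale the moment `users` changes; the MV won't re-run for it
        e.ts
    FROM events AS e
    LEFT JOIN users AS u ON u.user_id = e.user_id;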

5. Reprocessing historical data is manual: an MV only sees new inserts, so backfilling its target table requires hand-written INSERT ... SELECT (or ALTER TABLE) operations.
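
A typical manual backfill looks like this (hypothetical names and cutoff date), pushing pre-existing rows through the same transformation into the MV's target table:

    INSERT INTO events_by_user
    SELECT
        user_id,
        countState()      AS cnt,
        sumState(revenue) AS revenue
    FROM events
    WHERE ts < '2024-01-01'   -- only rows from before the MV was created
    GROUP BY user_id;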

6. Each materialized view creates a new part from every block it runs over, potentially causing the "Too Many Parts" issue.
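
An easy way to watch for this is counting active parts per partition in system.parts; lots of small parts per partition is the warning sign (thresholds are your call):

    SELECT database, table, partition, count() AS active_parts
    FROM system.parts
    WHERE active
    GROUP BY database, table, partition
    ORDER BY active_parts DESC
    LIMIT 10;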



