500MB for such a complete product is tiny! The largest CDN in the world ships 10GB+ binaries. 1GB is common for large codebases if you link things statically. The bloat tends to come from transitive dependencies; most direct code is small in size.
Proton is a lightweight stream processing "add-on" for ClickHouse, and we are making these delta parts as standalone as possible. Meanwhile, contributing back to the ClickHouse community can also help a lot.
that's a little bit of a stretch. when you say "no shortage", outside of redpanda what product exists that actually competes in all deployment modes?
it's a misconception that redpanda is simply a better kafka. the way to think about it is that it is a new storage engine, built from scratch, that speaks the kafka protocol. similar to all of the pgsql companies in a different space, i.e.: bigtable's pgsql support is not a better postgres, it's fundamentally different tech. you can read the src and design here: https://github.com/redpanda-data/redpanda. or: an electric car is not the same as a combustion-engine car; they're similar only in that both are cars that take you from point a to point b.
this seems truthy but isn't in practice. a lot of the work my perf optimization team @ redpanda does (yes, we have a full team chasing tail latencies) is spent on CPU optimization, debouncing, amortizing costs, metadata lookups, hash tables, profilers, etc. so there is a lot of additional work beyond the IO layer, which a decent async eventing framework can get you, before you get good perf.
What I mean to say is that in my experience the time spent round-tripping to Kafka is more than the time it takes Kafka to do whatever I'm asking it to do. So at least for my use-cases a faster Kafka would be of no benefit.
Now if it means I can run fewer, smaller brokers, that's awesome.
And I guess the further idea is that just because you don't fsync after every write doesn't mean you haven't fsync-ed before responding to the user request saying the data is stored durably. I assume that you do actually guarantee the flush before returning a success to the user.
Yes, this is exactly how it works. It effectively batches the sync part of multiple user requests, so none of those user requests are ack-ed until the sync completes. It is a low-latency analogue to checkpointing storage write-backs to minimize the number of round-trips to the I/O syscalls.
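For anyone who wants to see the shape of it, here is a minimal group-commit sketch in C (my own illustration of the idea, not Redpanda's actual code; error handling elided, compile with -pthread): writers append without syncing and then block until a flush that covers their write completes, so a single fsync acks a whole batch of requests.

```c
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static int log_fd;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t synced = PTHREAD_COND_INITIALIZER;
static long appended = 0, durable = 0;  /* per-write sequence numbers */

/* called per user request: append without syncing, then block until
 * a flush covering this write completes. only then is it safe to ack
 * the request as durable. */
static void append_and_wait(const char *buf, size_t len) {
    pthread_mutex_lock(&m);
    write(log_fd, buf, len);            /* hypothetical: no error handling */
    long my_seq = ++appended;
    while (durable < my_seq)
        pthread_cond_wait(&synced, &m); /* releases m while waiting */
    pthread_mutex_unlock(&m);
}

/* flusher loop: one fsync makes every previously appended write
 * durable, so a whole batch of waiting requests gets acked for the
 * cost of a single sync. */
static void *flusher(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&m);
        long batch_end = appended;
        pthread_mutex_unlock(&m);
        if (batch_end > durable) {
            fsync(log_fd);              /* covers all writes <= batch_end */
            pthread_mutex_lock(&m);
            durable = batch_end;
            pthread_cond_broadcast(&synced);
            pthread_mutex_unlock(&m);
        } else {
            usleep(100);                /* toy idle wait for the sketch */
        }
    }
    return NULL;
}

int main(void) {
    log_fd = open("group_commit.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    pthread_t t;
    pthread_create(&t, NULL, flusher, NULL);
    append_and_wait("request-1\n", 10); /* these two requests can both */
    append_and_wait("request-2\n", 10); /* be covered by one fsync     */
    close(log_fd);
    return 0;
}
```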
Wrong. Use something like aio or io_uring to submit 4 asynchronous writes in a single system call and you'll get way better performance. The kernel has all kinds of infrastructure that tries to coalesce and batch things, which the write()+fsync() syscalls make horribly inefficient, and in the modern world of really fast NVMe drives, you want to make your calls into the device driver as efficient as possible. You'll burn far fewer CPU cycles by giving the kernel the whole set of I/Os in a single go, and burn less on synchronization. It really is better to avoid write() + fsync().
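As a concrete illustration (a hypothetical sketch with liburing, not any product's actual code; error handling elided, link with -luring): four writes plus an fsync are queued locally and then handed to the kernel in one io_uring_submit() call, with IOSQE_IO_DRAIN ordering the flush behind the writes.

```c
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);   /* ring with room for our 5 ops */

    int fd = open("log.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    const char *chunks[4] = {"one ", "two ", "three ", "four "};

    /* queue four writes at consecutive offsets; nothing has been
     * submitted to the kernel yet */
    off_t off = 0;
    for (int i = 0; i < 4; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, fd, chunks[i], strlen(chunks[i]), off);
        off += strlen(chunks[i]);
    }

    /* queue one fsync; IOSQE_IO_DRAIN makes it run only after every
     * previously queued op has completed */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_fsync(sqe, fd, 0);
    io_uring_sqe_set_flags(sqe, IOSQE_IO_DRAIN);

    /* one syscall submits all five operations together */
    io_uring_submit(&ring);

    /* reap the five completions */
    for (int i = 0; i < 5; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        if (cqe->res < 0)
            fprintf(stderr, "op %d failed: %d\n", i, cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}
```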
what makes you think that we haven't tested this? seastar's io engine defaults to io_uring... there are about 10 things here to comment on. on optimized kernels we disable block coalescing at the kernel level; second, we tell the kernel to use fifo scheduling, etc. this low-hanging fruit is something we've done for a very, very long time.
Right. But that line of thinking gets you to a place where one is like “how does anything work” haha.
In general, this was a response to Confluent attempting to dismiss fsync() as a neat trick rather than an actual safety problem, and it's why we showcased the numbers we did when we benchmarked.
the opposite should be true tho: opt in for unsafe. you are in the minority if you read the docs, let's be real :) most ppl never read the full docs. of the ppl i chat w/, it's more like 5%