Yeah, I've looked into PeerDB, but in terms of self-hosting it's not really light...

saisrirampur · on Nov 4, 2024

Sai from PeerDB here. Temporal has been very impactful for us and a major factor in our ability to build a production-grade product that supports large-scale workloads.

At a high level, CDC is a complex state machine. Temporal helps us build that state machine by taking care of auto-retries and idempotency at different failure points, and it also helps us manage and observe it. This is very useful for identifying root causes when issues arise.
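To make the retry/idempotency point concrete, here is a minimal Python sketch. It is hypothetical: it is not PeerDB's implementation and does not use Temporal's SDK. `Sink`, `apply_batch`, `with_retries` and the LSN-based dedup are illustrative assumptions. A step that applies a batch and then crashes before reporting success gets retried, and the sink skips the already-applied batch so nothing is written twice.

```python
# Hypothetical illustration only: not PeerDB code and not Temporal's API.
# Shows why idempotency matters when a step is retried automatically.
import time
from dataclasses import dataclass, field


@dataclass
class Sink:
    applied_lsn: int = 0                       # highest batch LSN already committed
    rows: list = field(default_factory=list)   # stand-in for the destination table

    def apply_batch(self, batch_lsn: int, changes: list) -> None:
        # Idempotent apply: a batch at or below the committed LSN is a no-op.
        if batch_lsn <= self.applied_lsn:
            return
        self.rows.extend(changes)
        self.applied_lsn = batch_lsn


def with_retries(fn, attempts: int = 3, backoff_s: float = 0.0):
    """Call fn, retrying on any exception (a crude stand-in for a retry policy)."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)


def demo() -> Sink:
    sink = Sink()
    calls = {"n": 0}

    def flaky_step():
        sink.apply_batch(42, ["insert id=1", "update id=7"])
        calls["n"] += 1
        if calls["n"] == 1:
            # Simulate a crash after writing but before progress is recorded.
            raise RuntimeError("crashed before recording progress")

    with_retries(flaky_step)
    return sink


if __name__ == "__main__":
    print(demo().rows)  # the batch appears once, not twice
```

The retry alone would duplicate rows; it is the LSN check in `apply_batch` that makes replaying the step safe.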

Managing Temporal shouldn’t be complex. They offer a well-maintained, mature Docker container, and from a user standpoint the software is intuitive and easy to understand. We package the Temporal Docker container in our own Docker setup and have it integrated into our Helm charts. Quite a few users are running both Enterprise (which we recently open-sourced) and the standard OSS version smoothly!

https://github.com/PeerDB-io/peerdb/blob/main/docker-compose...

https://github.com/PeerDB-io/peerdb-enterprise

Let me know if there are any questions!

iyn · on Nov 5, 2024

Thanks for reaching out! Just to be clear, from what I can tell, both PeerDB and Temporal are great (and I’ve been hoping to learn Temporal for a while). At some point I considered self-hosting PeerDB, but my impression was that it required multiple nodes to run properly, so it wasn’t budget-friendly. That impression is also based on your pricing plans: the cheapest one is $250, which suggests it’s not cheap to host (I’m trying to minimize costs until I have more customers). Please correct me if I’m wrong! Can you give me an example of a budget-friendly deployment, e.g. how many EC2 instances for PeerDB would I need for one of the smaller RDS instances?

Given the acquisition by ClickHouse (congrats!), what can we expect for CDC to sinks other than CH? Do you plan to keep supporting different targets, or should we expect a CH-only focus?

Edit: also, are there any plans to support e.g. SNS/SQS/NATS or similar?

saisrirampur · on Nov 6, 2024

Great question! PeerDB can run on just a single EC2 instance (using either Docker or Helm charts). A typical production-grade setup could use 4 vCores with 16GB RAM, and you can scale up or down based on your workload's resource usage.

To elaborate more on the architecture (https://docs.peerdb.io/architecture), the flow-worker does most of the heavy lifting (actual data movement during initial load and CDC), while the other components are fairly lightweight. Allocating around 70-80% of provisioned resources to the flow-worker is a good estimate. For Temporal, you could allocate 20-25% of resources and distribute the rest to other components.
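As a rough worked example of that split on the 4 vCore / 16GB instance, here is a small Python sketch. The exact fractions (75% flow-worker, 20% Temporal, the remaining 5% for the other components) are assumptions picked from inside the quoted ranges and read as shares of the whole instance; they are not PeerDB defaults.

```python
# Rule-of-thumb resource split for a single-instance deployment.
# The default fractions are assumptions chosen from the quoted ranges.

def split_resources(vcores: float, ram_gb: float,
                    flow_worker_frac: float = 0.75,
                    temporal_frac: float = 0.20) -> dict:
    """Return {component: (vCores, GB RAM)}; 'other' gets whatever is left."""
    other_frac = 1.0 - flow_worker_frac - temporal_frac
    if other_frac < 0:
        raise ValueError("flow-worker and Temporal fractions exceed 100%")
    fracs = {"flow-worker": flow_worker_frac,
             "temporal": temporal_frac,
             "other": other_frac}
    # Round to 2 decimals to hide floating-point noise in the printout.
    return {name: (round(vcores * f, 2), round(ram_gb * f, 2))
            for name, f in fracs.items()}


if __name__ == "__main__":
    for name, (cpu, mem) in split_resources(4, 16).items():
        print(f"{name:12} {cpu:5.2f} vCores {mem:6.2f} GB")
```

With these assumed fractions the flow-worker gets 3 vCores / 12GB, Temporal 0.8 vCores / 3.2GB, and the remaining components share 0.2 vCores / 0.8GB. Pushing both shares to the top of their ranges (80% + 25%) would exceed the instance, which the function reports as an error.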

Our open-source offering (https://github.com/PeerDB-io/peerdb) supports multiple connectors (CH and non-CH). Currently, there aren’t any plans to make changes on that front!

iyn · on Nov 6, 2024

Thanks for the guidance on sizing the instance(s); I'll experiment with that in the coming weeks.

merb · on Nov 4, 2024

1. I'm not sure whether the Helm chart can be used for the OSS version.
2. If a Helm chart needs sh files, it's already an absolute no-go, since that won't work well with GitOps.

assaxor · on Nov 5, 2024

Hi, the Helm chart uses the OSS PeerDB images. The sh files were created to bootstrap the values files for easier (and faster) POCs.

You can pass a `template` argument when running the scripts, which generates a set of values files that you can then modify as needed. There is a production guide for this, as we have customers in production using GitOps (ArgoCD) with the provided charts (https://github.com/PeerDB-io/peerdb-enterprise/blob/main/PRO...)

shayonj · on Nov 4, 2024

Yeah, totally fair.

Are you OK with a NATS dependency? Happy to work with you on supporting a new destination like ES.

I'm also looking to make NATS optional for smaller/simpler setups (https://github.com/shayonj/pg_flo/issues/21)

iyn · on Nov 4, 2024

Yes, I think NATS is reasonable. I don't have operational experience with it, but based on my earlier reading it seems it can be run on a smaller budget. Is this "regular" NATS or the JetStream variant?

shayonj · on Nov 4, 2024

Perfect! In testing on some of my staging workloads, the footprint isn't too high and I can get 5-6k messages/s, especially considering there is only one worker instance involved (for strict ordering).

Yes, it does use NATS JetStream.