
There are so many alternatives to Airflow nowadays that you really need to make sure Airflow is the best solution (or even a solution) for your use case. There are plenty of use cases better served by tools like Prefect or Dagster, but I suppose the inertia toward installing the tool everyone knows about is really big.

BTW, here's a list of almost 200 pipeline toolkits: https://github.com/pditommaso/awesome-pipeline



I've had a wonderful experience with Dagster so far. I love that it can deploy to Airflow, Celery, Dask, etc.; I love the Dagit server and UI, and that I can orchestrate pipelines over HTTP; I love the notebook integration via Papermill; I love that it's all free (looking at Prefect here...); and the team is extremely responsive on both Slack and GitHub.


Didn't Prefect open source their orchestration component recently, or am I mistaken? What part of Prefect is still closed?


Oh, I was saying it wasn't free. I think you're right, and it is fully open source.


It's not quite "open source"; it's better labelled as "source available".


I've gone through a large number of these, and I think that Airflow is the best on Kubernetes for managed orchestration. The things I like are:

  * Source control for workflows/DAGs (using git-sync)
  * Tracking/retries with SLAs
  * Jobs run in Kubernetes
  * Web UI for management
  * Fully open source
I also use Argo Workflows, because I like its native handling of Kubernetes objects (e.g. the ability to manage and update a deployment as one of the steps), but it just doesn't handle the orchestration/tracking side of things very well yet.
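To make the "tracking/retries" point concrete, here's a minimal stdlib-only sketch of what an orchestrator does at its core: run tasks in dependency order, retry failures up to a limit, and record every attempt. This is not Airflow's API; `Task`, `run_dag` and `max_retries` are hypothetical names, and SLAs, scheduling and Kubernetes execution are deliberately left out.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # stdlib, Python 3.9+
from typing import Callable


@dataclass
class Task:
    """Hypothetical task: a callable plus the names of the tasks it depends on."""
    name: str
    fn: Callable[[], object]
    upstream: list[str] = field(default_factory=list)
    max_retries: int = 2


def run_dag(tasks: list[Task]) -> dict[str, list[str]]:
    """Run tasks in dependency order, retrying failures; return an attempt log per task."""
    by_name = {t.name: t for t in tasks}
    graph = {t.name: set(t.upstream) for t in tasks}
    log: dict[str, list[str]] = {}
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        attempts = log.setdefault(name, [])
        for _ in range(task.max_retries + 1):
            try:
                task.fn()
                attempts.append("success")
                break
            except Exception as exc:
                attempts.append(f"failed: {exc}")
        else:
            # Every attempt failed: stop here, so downstream tasks never execute.
            raise RuntimeError(f"task {name!r} failed after {len(attempts)} attempts")
    return log


calls = {"n": 0}


def flaky() -> None:
    """Fails on the first call only, like a transient network error."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise ValueError("transient")


log = run_dag([
    Task("extract", lambda: None),
    Task("transform", flaky, upstream=["extract"]),
    Task("load", lambda: None, upstream=["transform"]),
])
print(log)
# {'extract': ['success'], 'transform': ['failed: transient', 'success'], 'load': ['success']}
```

The per-task attempt log is the kind of state a real orchestrator persists and shows in its web UI; here it only lives in memory.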


I absolutely love Cadence/Temporal. It can do anything the others can do (and you could implement any other engine with it), without any DSLs.


I'd also like to +1 on Cadence/Temporal. It's a really great mental model and framework for thinking about workflows.

This is more of a proof of concept, but it can also support DSLs (although we found the Go client easier to understand than DSLs): https://github.com/checkr/states-language-cadence
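As a rough illustration of why a code-first engine can host a DSL, here's a toy interpreter for a tiny subset of an Amazon States Language-style definition (only `Task` states with `Resource`, `Next` and `End`). It is an assumption-laden sketch, not how states-language-cadence is implemented; `REGISTRY` and `interpret` are hypothetical, and in Cadence the registered functions would be activities rather than local lambdas.

```python
import json
from typing import Callable

# A tiny States Language-style definition: two Task states chained with Next/End.
DEFINITION = json.loads("""
{
  "StartAt": "Fetch",
  "States": {
    "Fetch": {"Type": "Task", "Resource": "fetch", "Next": "Upper"},
    "Upper": {"Type": "Task", "Resource": "upper", "End": true}
  }
}
""")

# Hypothetical registry mapping Resource names to plain functions.
REGISTRY: dict[str, Callable] = {
    "fetch": lambda _: "hello",
    "upper": lambda s: s.upper(),
}


def interpret(definition: dict, registry: dict, payload=None):
    """Walk the state machine, feeding each Task's output into the next state."""
    state_name = definition["StartAt"]
    while True:
        state = definition["States"][state_name]
        if state["Type"] != "Task":
            raise ValueError(f"unsupported state type: {state['Type']}")
        payload = registry[state["Resource"]](payload)
        if state.get("End"):
            return payload
        state_name = state["Next"]


print(interpret(DEFINITION, REGISTRY))  # HELLO
```

The DSL is just data walked by ordinary code, so the same engine can run hand-written workflows and interpreted ones side by side.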


As a team lead of Cadence/Temporal, it is really amazing to hear such feedback. Thank you!

BTW, you can AMA about the technology here.


Thank you for your work! I believe once it's properly documented and marketed, it will be very popular.


I think this is the link to Cadence: https://github.com/uber/cadence

and Temporal: https://github.com/temporalio/temporal


Yep, this is it. They're developed by the same team; Temporal is still in beta, but I believe the production version is coming out in late June.

I can't find an easy way to explain everything it does, but it pretty much allows you to write naive functions with no error handling, with month-long sleeps, auto-retries on unreliable function calls, etc.
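For contrast, here's roughly the retry boilerplate you'd otherwise write around each unreliable call. It's a plain-Python sketch, not the Temporal SDK: `retry` and its parameters are hypothetical, and unlike Temporal (where, as described above, month-long sleeps are possible because the server persists workflow state) this in-process loop loses everything if the process dies.

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(max_attempts: int = 5, initial_interval: float = 0.1,
          backoff: float = 2.0, sleep: Callable[[float], None] = time.sleep):
    """Decorator: re-invoke the wrapped function on exception with exponential backoff."""
    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            interval = initial_interval
            attempt = 1
            while True:
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt >= max_attempts:
                        raise  # out of attempts: surface the last error
                    sleep(interval)
                    interval *= backoff
                    attempt += 1
        return wrapper
    return decorator


sleeps: list[float] = []
state = {"calls": 0}


@retry(max_attempts=4, sleep=sleeps.append)  # record delays instead of sleeping
def call_flaky_service() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("service unavailable")
    return "ok"


print(call_flaky_service(), state["calls"], sleeps)  # ok 3 [0.1, 0.2]
```

Injecting `sleep` keeps the sketch testable without real delays.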

It also gives you a web interface where you can inspect the running functions, and it allows external code (and other workflow functions) to signal/query the running workflows.
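The signal/query distinction can be sketched with stdlib threading: a signal is an external message that changes a running workflow's state, and a query is a read-only look at that state. `ApprovalWorkflow` and its methods are hypothetical; Temporal's SDKs have their own handler mechanisms and persist the waiting workflow durably, which this in-memory version does not.

```python
import threading
from typing import Optional


class ApprovalWorkflow:
    """Hypothetical long-running workflow that waits for an external 'approve' signal."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._status = "waiting_for_approval"
        self._approved_by: Optional[str] = None

    def run(self, timeout: float = 5.0) -> str:
        with self._cond:
            # Block until a signal arrives or the timeout expires.
            if not self._cond.wait_for(lambda: self._approved_by is not None, timeout):
                self._status = "timed_out"
                return self._status
            self._status = f"approved by {self._approved_by}"
            return self._status

    def signal_approve(self, user: str) -> None:
        """Signal: an external message that mutates workflow state."""
        with self._cond:
            self._approved_by = user
            self._cond.notify_all()

    def query_status(self) -> str:
        """Query: a read-only view of the current state."""
        with self._cond:
            return self._status


wf = ApprovalWorkflow()
result: dict = {}
worker = threading.Thread(target=lambda: result.update(final=wf.run()))
worker.start()
print(wf.query_status())  # waiting_for_approval
wf.signal_approve("alice")
worker.join()
print(result["final"])  # approved by alice
```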


Just found out about Temporal and it looks interesting. I'm eager to jump in, but our organization primarily uses Ruby. I know the big difference between Cadence and Temporal is that Temporal uses gRPC, which seems much easier to adopt.


I also find the API more consistent in Temporal (I began implementing a project in Cadence before finding out about Temporal).


Temporal has an externally contributed Ruby client. It is going to be open sourced very soon.


Very exciting news. Is there a way that I can subscribe to get info on when it'll be released?


There is #ruby-sdk channel in the Temporal Slack: https://join.slack.com/t/temporalio/shared_invite/zt-c1e99p8...


Moving off of Airflow and to Cadence/Temporal was the single biggest relief in terms of maintainability, operational ease and scalability. Also +1 on being free of any DSL.


I'm currently moving from a custom YAML DSL-based engine to Temporal, and it's the best architectural decision I've made in a long time. I researched a lot and couldn't find anything that even came close to the freedom it provides.


Curious about this. Can you elaborate more? Also happy to hop on a call/zoom if you don't want to share publicly (email me at saguziel@gmail.com). I'm working on something similar.


> A curated list of awesome pipeline toolkits

Is this "curated"? It seems like an exhaustive "dump" of toolkits.


What curation do you feel it lacks?


To curate a collection is to be an editor, determining what to include and what to exclude. From a curated selection of tools, I expect to see a selection of the best tools, chosen by a knowledgeable curator who has evaluated the tools in some way.

So for example, if you said "I want to start a library, donate any books you have at my house," that would not be a "curated" collection in my opinion. If you went through & evaluated the books, selecting only those that you'd personally recommend and discarding the rest, that would be a "curated" collection.

(Strictly speaking, any maintenance of a collection, even just "dump everything at my door and I'll put it on the pile," could be considered "curation," but to call a list of tools "curated" suggests there's some selection going on, and there does not appear to be any on this list.)

EDIT: In fact, all of the definitions of "curate" here[0] start with the word "select," for example "Select, organize, and look after the items in (a collection or exhibition)." This fits my meaning. This definition[1] of "curator" requires one to merely have "care and superintendence of something," so in that sense the list is "curated" insofar as someone looks at PRs and clicks the "merge" button.

The Merriam-Webster definition, however, speaks of curating a more general "something" rather than a "collection." If you "curate" a statue by cleaning & protecting it, fine, you don't need to select the statue. However, when a collection is curated, in my opinion, this necessarily implies selection, not just maintenance.

0: https://www.lexico.com/en/definition/curate
1: https://www.merriam-webster.com/dictionary/curator


One way to get around this is to use Kedro https://github.com/quantumblacklabs/kedro, which is the most minimal possible pipeline interface, yet allows you to export to other pipeline formats and/or build your own exporters.
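The idea of a minimal pipeline interface that can be exported elsewhere can be sketched without any framework: nodes are plain functions wired together by named datasets, execution order is derived from those names, and an "exporter" emits the same graph as task edges another scheduler could consume. This is only loosely modelled on Kedro's nodes-and-datasets idea; `Node`, `run` and `export_edges` are hypothetical, not Kedro's API.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # stdlib, Python 3.9+
from typing import Callable


@dataclass(frozen=True)
class Node:
    func: Callable
    inputs: tuple[str, ...]  # names of datasets the function reads
    output: str              # name of the dataset it produces
    name: str


def _graph(nodes: list[Node]) -> dict[str, set[str]]:
    """Map each node to the nodes producing its inputs (external inputs have no producer)."""
    producers = {n.output: n.name for n in nodes}
    return {n.name: {producers[i] for i in n.inputs if i in producers} for n in nodes}


def run(nodes: list[Node], catalog: dict) -> dict:
    """Execute nodes in data-dependency order, reading/writing a dict acting as the catalog."""
    by_name = {n.name: n for n in nodes}
    data = dict(catalog)
    for name in TopologicalSorter(_graph(nodes)).static_order():
        n = by_name[name]
        data[n.output] = n.func(*(data[i] for i in n.inputs))
    return data


def export_edges(nodes: list[Node]) -> list[tuple[str, str]]:
    """'Export' the pipeline as sorted (upstream, downstream) task edges."""
    return sorted((up, down) for down, ups in _graph(nodes).items() for up in ups)


pipeline = [
    Node(lambda raw: [x * 2 for x in raw], ("raw",), "doubled", "double"),
    Node(sum, ("doubled",), "total", "summarise"),
]
print(run(pipeline, {"raw": [1, 2, 3]})["total"])  # 12
print(export_edges(pipeline))  # [('double', 'summarise')]
```

Because the node functions know nothing about the runner, writing a new exporter only means translating the edge list into another tool's format.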


Interesting (and it's in the list). From reading the documentation, it looks like the only other format it can export to is Airflow; is that right?


Here's a nice podcast about Prefect that discusses its approach and Airflow's:

https://softwareengineeringdaily.com/2020/04/29/prefect-data...


We used Luigi because Airflow was too complicated to get an unsupportive IT department to install.



