There are so many alternatives to Airflow nowadays that you really need to make sure Airflow is the best solution (or even a solution) for your use case. There are plenty of use cases better served by tools like Prefect or Dagster, but I suppose the inertia to install the tool everyone knows about is really big.
I've had a wonderful experience with Dagster so far. I love that it can deploy to Airflow, Celery, Dask, etc., I love the Dagit server and UI and that I can orchestrate pipelines over HTTP, I love the notebook integration via Papermill, I love that it's all free (looking at Prefect here...), and the team is extremely responsive on both Slack and GitHub.
I've gone through a large number of these, and I think that Airflow is the best on Kubernetes for managed orchestration. The things I like are:
* Source control for workflows/DAGs (using git-sync)
* Tracking/retries with SLAs
* Jobs run in Kubernetes
* Web UI for management
* Fully open source
I also use Argo Workflows, because I like its native handling of Kubernetes objects (e.g. the ability to manage and update a deployment as one of the steps), but it just doesn't handle the orchestration/tracking side of things very well yet.
Yep, this is it. They're developed by the same team. Temporal is still in beta, but I believe the production version is coming out in late June.
I can't find an easy way to explain everything it does, but it pretty much allows you to write naive functions with no error handling, with month-long sleeps, auto-retries on unreliable function calls, etc.
It also gives you a web interface where you can inspect the running functions, and allows external code (and other workflow functions) to signal/query the running workflows.
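To make the "naive function plus auto-retries" idea a bit more concrete, here's a toy sketch in plain Python. To be clear, this is not the Temporal or Cadence SDK: `call_with_retries`, `FlakyService`, and `billing_workflow` are hypothetical names I made up for illustration, and everything runs in one process with nothing persisted. As I understand it, the real engines also store workflow state durably so that long sleeps and retries survive restarts, which this sketch does not attempt.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds or max_attempts is exhausted.

    Hypothetical helper: waits initial_delay seconds after the first
    failure and multiplies the wait by `backoff` after each later one.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= backoff
    raise ValueError("max_attempts must be at least 1")


class FlakyService:
    """Hypothetical unreliable dependency: fails the first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def charge(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("temporary outage")
        return "charged"


def billing_workflow(
    service: FlakyService,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    # The workflow body stays naive: no try/except here, because the
    # retry policy lives in the wrapper instead of the business logic.
    return call_with_retries(service.charge, sleep=sleep)
```

The `sleep` parameter is injectable only so the sketch can be exercised without actually waiting; passing `sleep=lambda seconds: None` makes the retries instantaneous.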
Just found out about Temporal and it looks interesting. I'm eager to jump in, but our organization primarily uses Ruby. I know the big difference between Cadence and Temporal is that Temporal uses gRPC, which seems much easier to adopt.
Moving off of Airflow and to Cadence/Temporal was the single biggest relief in terms of maintainability, operational ease and scalability. Also +1 on being free of any DSL.
I'm currently moving from a custom YAML DSL-based engine to Temporal and it's the best architectural decision I've made in a long time. I researched a lot and couldn't find anything that even came close to the freedom it provides.
Curious about this. Can you elaborate more? Also happy to hop on a call/zoom if you don't want to share publicly (email me at saguziel@gmail.com). I'm working on something similar.
To curate a collection is to be an editor, determining what to include and what to exclude. From a curated selection of tools, I expect to see a selection of the best tools, chosen by a knowledgeable curator who has evaluated the tools in some way.
So for example, if you said "I want to start a library, donate any books you have at my house" that would not be a "curated" collection in my opinion. If you went through & evaluated the books, selecting only those that you'd personally recommend and discarding the rest, that would be a "curated" collection.
(Strictly speaking any maintenance of a collection, even just "dump everything at my door and I'll put it on the pile," could be considered "curation," but to call a list of tools "curated" suggests there's some selection going on, and there does not appear to be any on this list.)
EDIT: In fact, all of the definitions of "curate" here[0] start with the word "select," for example "Select, organize, and look after the items in (a collection or exhibition)." This fits my meaning. This definition[1] of "curator" requires one to merely have "care and superintendence of something," so in that sense the list is "curated" insofar as someone looks at PRs and clicks the "merge" button.
The Merriam-Webster definition, however, speaks of curating a more general "something" rather than a "collection." If you "curate" a statue by cleaning & protecting it, fine, you don't need to select the statue. However, when a collection is curated, in my opinion, this necessarily implies selection, not just maintenance.
One way to get around this is to use Kedro https://github.com/quantumblacklabs/kedro, which is the most minimal possible pipeline interface, yet allows you to export to other pipeline formats and/or build your own exporters.
BTW, here's a list of almost 200 pipeline toolkits: https://github.com/pditommaso/awesome-pipeline