I'm strongly considering moving my fairly immature Airflow pipeline to Argo Workflows because:
* Airflow's DAG deployment and versioning are surprisingly primitive. The best option here seems to be to use the KubernetesPodOperator to version your steps, and if you're using k8s to execute, why not use it for the rest?
* the Airflow UI is pretty confusing to use; maybe this gets easier once you know your way around it.
* my team has k8s expertise and we don't know Airflow well yet; seems like less to learn running Argo Workflows, assuming you're already fluent in k8s.
* if you're already running k8s, it seems like you have to add fewer components to get Argo running; Airflow-on-k8s duplicates more of what k8s already provides.
On the other hand, being able to unit test / locally run your DAGs on your dev machine is a big plus for Airflow, whereas Argo Workflows seems to have a weaker testing story. And I'd rather write Python DAG files than YAML.
I'm on the Airflow PMC and would love to know a bit more about your comparison :).
1. Have you tried Airflow 2.0? We made some pretty big overhauls both in terms of UI and backend.
2. DAG versioning is currently problematic, but DAG versioning is a "when" and not an "if", so it should be in a future 2.x version :). That said, could you describe a bit more about your deployment issues? User stories like this help us improve the product.
3. Have you looked into using KEDA with the CeleryExecutor? You could create KEDA queues for a lot of commonly used workflows, and then you'd only need to use the Python or Bash operator to run those tasks instead of the KubernetesPodOperator.
4. Are you using the Airflow helm chart or did you custom roll a deployment?
Any feedback would be highly appreciated and I'm also glad to answer any questions you might have!
1. We ended up using GCP's hosted Composer to get started more quickly, which doesn't seem to have been updated to Airflow 2.0 yet. I'll put that on the list for evaluation.
2. A few use cases that I immediately hit complexity walls on:
A) Having a "staging" version of our pipelines so that we don't break the prod ETL; it was really difficult to find a canonical method for having common DAG code that's parameterizable per env. The fact that all of the DAGs live side-by-side in the same directory means I have to run the same job for a "prod push" as a "staging push" (i.e. if I get the staging deploy wrong I could break prod). Given that we deploy version vN+1 to staging, check it's working, and only then deploy vN+1 to prod, we ended up with some weird config injection code to let us have two folders containing copies of the same DAG scripts with different config. This just felt janky.
B) Managing Python dependencies between different apps was also painful; for example, we wanted to add Meltano, and that app brings in a bunch of deps, which broke our main DAGs when I naively updated the main Python environment to install the new Meltano requirement. Using the K8s operator lets us effectively have a venv per DAG, but the pattern of using one Python env across the whole Airflow install bit me very early on and seemed pretty unscalable.
3. I haven't looked at KEDA, I'll take a look.
4. We're using GCP Composer for now, though I looked at the Helm chart too.
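For what it's worth, here is a rough sketch of the "common DAG code, parameterized per env" idea from 2A, in plain Python so it runs without Airflow installed. Everything in it is hypothetical and for illustration only: `EnvConfig`, `ENVIRONMENTS`, `dag_kwargs`, the dataset names, and the cron string are not from our actual setup. The idea is that one module defines the pipeline once and each environment only supplies config, so staging and prod differ by data rather than by copied scripts.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class EnvConfig:
    """Per-environment settings (hypothetical fields, for illustration)."""
    name: str
    dataset: str
    schedule: Optional[str]  # None means "trigger manually only"


# Assumed environments; real values would come from your own deployment.
ENVIRONMENTS: Dict[str, EnvConfig] = {
    "staging": EnvConfig(name="staging", dataset="analytics_staging", schedule=None),
    "prod": EnvConfig(name="prod", dataset="analytics", schedule="0 3 * * *"),
}


def dag_kwargs(pipeline: str, env: str) -> Dict[str, Any]:
    """Build the keyword arguments for one pipeline in one environment.

    The result is meant to be splatted into a DAG constructor. The exact
    keyword for the schedule depends on the Airflow version, so treat the
    "schedule" key as an assumption.
    """
    cfg = ENVIRONMENTS[env]
    return {
        "dag_id": f"{pipeline}__{cfg.name}",  # distinct id per environment
        "schedule": cfg.schedule,
        "params": {"dataset": cfg.dataset},
        "tags": [cfg.name],
    }


if __name__ == "__main__":
    for env_name in ENVIRONMENTS:
        print(dag_kwargs("etl", env_name))
```

This doesn't fix the "all DAGs live in one directory" problem by itself, but it at least keeps a single copy of the DAG code, with the environment chosen at deploy time instead of by duplicating folders.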
If you prefer YAML over Python, or if you prefer installing one k8s app instead of managing all of the Airflow dependencies (scheduler, webserver, workers, etc.).
Why surprised? Before Europeans arrived, it's not like everybody lived happily ever after in a utopian pastoral lifestyle. They are people too and have fought, murdered, and slaughtered each other just as much as the Europeans did to themselves and to others. I frankly find it a little concerning that just as much emphasis isn't placed on the genocide of the tribes that no longer exist by those which remain.
I have no idea if this is one of the reasons your comment is grey, but the “it’s not like natives didn’t have conflict” is just as much projection as “magical natives” portrayals are bullshit. If you don’t recognize that colonization contorted indigenous peoples into conflict based on colonial prerogatives, you’re not telling yourself or anyone else the whole story.
It’s a shame, on both sides of this misinterpreted history, that people don’t acknowledge both that native peoples had real wars and conflicts and that colonizers instigated and coordinated other conflicts that were less likely, more brutal, or both, than they would’ve been otherwise.
Yes, that's absolutely true, and the brutal treatment natives around the world experienced as a result of colonialism shouldn't be downplayed. I'm not acting as a colonial apologist. What irks me is that the popular mindset does appear to be one of "surprise" at conflict amongst tribes, as the parent commenter suggested, unless I misinterpreted them.
Trying to read that comment charitably, I honestly can't ascribe a motive to the "surprise". I've often stated I was "surprised" to learn something that was particularly detailed/nuanced in an area I already generally had fairly deep knowledge, especially if it revealed a new history or layer of depth I could integrate into my base knowledge.
I'm not saying that's definitely how I interpret it, but it's definitely a reasonable possibility. And I guess I wish more people here would at least try to read others' comments more charitably. At worst, your assumptions are right but you gave an opportunity for the other person to be better understood and for your own frustration level to pause before rising.
The bulk of those indigenous deaths is due to the unwitting introduction of smallpox by the Europeans, which then spread from one indigenous people to another at a time when Europeans still had little knowledge of the interior of the Americas. Yes, in the wake of this sudden demographic shock Europeans did institute horrible policies of violence and submission (beyond the already bloody initial conquests), but the "tens or hundreds of millions" figure should not all be ascribed to intentional warfare.
This is interesting - I worked on a similar use case by parsing and tokenizing ZooKeeper logs, then converting logs to integer sequences and trying to determine whether or not services were going to experience a fault by training on said sequences, and thus determining what the cause of the fault was/would be. Wasn't too successful but definitely showed me how difficult it can be to work backwards from logs to root cause, esp. with limited data.
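To make the "logs to integer sequences" step concrete, here is a minimal sketch of that kind of preprocessing. It is not the code I actually used: the masking regex, the function names (`to_template`, `encode`, `windows`), and the sample log lines are all assumptions for illustration. Variable parts such as numbers, IPs, and hex ids are masked so that lines with the same shape share one template, each template gets an integer id, and fixed-size windows over the ids become the training sequences.

```python
import re
from typing import Dict, List

# Mask hex ids and (dotted) numbers so lines with the same shape collapse
# into one template. This pattern is a simplifying assumption.
_VARIABLE = re.compile(r"0x[0-9a-fA-F]+|\d+(?:\.\d+)*")


def to_template(line: str) -> str:
    """Replace variable tokens with a placeholder."""
    return _VARIABLE.sub("<*>", line).strip()


def encode(lines: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map each line to the integer id of its template.

    New templates get the next free id; ids start at 1 so that 0 stays
    available for padding.
    """
    ids = []
    for line in lines:
        template = to_template(line)
        ids.append(vocab.setdefault(template, len(vocab) + 1))
    return ids


def windows(ids: List[int], size: int) -> List[List[int]]:
    """Slide a fixed-size window over the id sequence."""
    return [ids[i:i + size] for i in range(len(ids) - size + 1)]


if __name__ == "__main__":
    sample = [
        "Accepted socket connection from 10.0.0.1:5000",
        "Accepted socket connection from 10.0.0.2:5001",
        "Closed session 0x1a2b",
    ]
    vocab: Dict[str, int] = {}
    print(windows(encode(sample, vocab), 2))
```

The hard part is everything after this: with limited data, very few of those windows end in a fault, so a model trained on them has little to learn from.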
This is cool! Have you guys considered going into modern board games too? Some of the rules can be quite complex (1-2 hours explaining rules, getting through the first couple of rounds), and there's quite high demand for people figuring them out on say, TTS. Think Splotter games, Brass, etc.