My policy is to never let pipeline DSLs contain any actual logic outside orchestration for the task, relying solely on one-liner build or test commands. If the task is more complicated than a one-liner, make a script for it in the repo to make it a one-liner. Doesn't matter if it's GitHub Actions, Jenkins, Azure DevOps (which has super cursed yaml), etc.
This in turn means that you can do what the pipeline does with a one-liner too, whether manually, from a vscode launch command, a git hook, etc.
This same approach can fix the mess of path-specific validation too - write a regular script (shell, python, JS, whatever you fancy) that checks what has changed and calls the appropriate validation script. The GitHub action is only used to run the script on PR and to prepare the CI container for whatever the script needs, and the same pipeline will always run.
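Roughly, that change-detection dispatcher can be as small as the sketch below (the script and directory names are made up for illustration):

```sh
#!/usr/bin/env bash
# Hypothetical ci/validate-changed.sh: dispatch validation based on changed paths.
# Assumes per-area validation scripts already exist in the repo.
set -euo pipefail

base="${BASE_REF:-origin/main}"          # CI passes the PR base; locally it defaults to main
changed=$(git diff --name-only "$base"...HEAD)

if grep -q '^frontend/' <<<"$changed"; then ./scripts/validate-frontend.sh; fi
if grep -q '^backend/'  <<<"$changed"; then ./scripts/validate-backend.sh;  fi
if grep -q '^infra/'    <<<"$changed"; then ./scripts/validate-infra.sh;    fi
```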
The reason why many CI configs devolve into such a mess isn't typically that they don't extract complicated logic into scripts, it's about all the interactions with the CI system itself. This includes caching, sharing of artifacts, generating reports, configuring permissions, ordering of jobs, deciding when which jobs will run, deciding what to do when jobs fail, etc. All of this can get quite messy in a large enough project.
It never becomes unbearably messy this way though.
The reason it gets unbearably messy is because most people google "how to do x in github actions" (e.g. send a slack message) and there is a way and it's almost always worse than scripting it yourself.
The reason it gets unbearably messy is that GitHub has constructed an ecosystem that encourages developers to write Turing complete imperative behavior into YAML without providing the same language constructs/tooling that a proper adult language provides to encourage code reuse and debugging.
Without tooling like this, any sufficiently complex system is guaranteed to evolve into a spaghetti mess, because there is no sane way to maintain such a system at scale without proper tooling, and that tooling would have to be hand-rolled against a giant, ever-changing, mostly undocumented, proprietary black box (GitHub Actions). Someone tried to do this; the project is called “act”. The results are described by the author in the article as “subpar”.
The only sane way to use GitHub Actions at scale is to take the subset of its features that you leverage to perform the execution (event triggers, runs-on, etc.) and only use those, and farm out all the rest of the work to something that is actually maintainable, e.g. Buildkit, Bazel, Gradle, etc.
Which I feel is a recurring lesson in CI in general. CI systems have to be scriptable to get their job done because they can't anticipate every build system. With even terrible scriptability comes Turing completeness, because it is hard to avoid, a natural law. Eventually someone figures out how to make it do wild things. Eventually those wild things become someone's requirements in their CI pipeline. Eventually someone is blog posting about how terrible that entire CI system is because of how baroque their pipeline has become and how many crazy scripts it has that are hard to test and harder to fix.
Caching and sharing artifacts is usually the main culprit. My company has been using https://nx.dev/ for that. It works locally as well as in CI, and it just works.
Our NX is pointed to store artifacts in GHA, but our GHA scripts don't do any caching directly, it is all handled by NX. It works so well I would even consider pulling a nodejs environment to run it in non-nodejs projects (although I haven't tried, probably would run into some problems).
It is somewhat heavy on configuration, but it just moves the complexity from CI configuration to NX configuration (which is nicer IMO). Our CI pipelines are super fast if you don't hit one of our slow-compiling parts of the codebase.
With NX your local dev environment can pull cached items that were built by previous CI runs or by other devs. We have some native C++ dependencies that are kind of a pain to build locally, and our dev machines can pull the binaries built by other devs (since all devs and CI share the same cache/artifact storage). So it makes developing locally a lot easier as well; I don't even remember the last time I had to build the native C++ stuff myself since I don't work on it.
Do you know the criteria used to pick nx.dev? That is, do you pay for their Cloud, or do you do some plumbing yourselves to make it work on GitHub and other things?
Looks interesting. We’ve picked tools based on time saved without too much extra knowledge or overhead required, so this may prove promising.
To be honest, I wasn't the one who added it and have only occasionally made some small changes to the NX configuration. I don't think we pay for their Cloud; I think all our stored artifacts are in the GHA caching system and we pull them using our GitHub SSH keys. Although I don't know exactly how that was set up. The fact that someone set it up and I just started using it and it just works is a testament to how well it works.
NX is good because it does the caching part of CI in a way that works both locally and on CI. But of course it doesn't really help at all with the other points raised by the OP.
One interesting thing about NX as well is that it helps you manage your own local build chain. Like in the example I mentioned above, when I run a project that requires the C++ native dependency, that project gets built automatically (or rather my computer pulls the built binaries from the remote cache).
For all of this to work you need to set up these dependency chains explicitly in your NX configuration, but that is formalizing an actual requirement instead of leaving it implicit (or in Makefiles or in scripts that only run in CI).
I do have to say that our NX configuration is quite long though, but I feel that once you start using NX it is just too tempting to split your project up into individual cacheable steps even if said steps are very fast to run and produce no artifacts. Although you don't have to.
For example we have separate steps for linting, typescript type-checking, code formatting, unit testing for each unique project in our mono-repo. In practice they could all be the same step because they all get invalidated at the same time (basically on any file change).
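For a rough idea of what formalizing those dependency chains and cacheable targets looks like, here's an nx.json sketch (exact keys vary between Nx versions, so treat this as illustrative rather than our actual config):

```json
{
  "targetDefaults": {
    "build": {
      "cache": true,
      "dependsOn": ["^build"],
      "outputs": ["{projectRoot}/dist"]
    },
    "lint": { "cache": true },
    "typecheck": { "cache": true },
    "test": { "cache": true, "dependsOn": ["build"] }
  }
}
```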
It works just fine - you have CI scripts for test-group-1, test-group-2, and so on. Your data collection will need to aggregate data from them all, but that is something most data collection systems support (and at least the ones I know of also allow individuals to upload their local test data, meaning you can test that upload locally). If you break those test groups up right, most developers will know which ones they should run as a result of their changes (if your tests are short enough that developers would run them all before pushing, then you shouldn't shard anyway, though it may be reasonable for your CI shards to still be longer than what a developer would run locally).
I have had many tests which manipulate the global environment (integration tests should do this, though I'm not convinced the distinction between integration and unit tests is valuable), so the ability to run the same tests as CI is very helpful in finding these issues and verifying you fixed them.
I’ll go so far as to say the massive add-on/plugin lists and featuritis of CI/CD tools are actively harmful to the sanity of your team.
The only functionality a CI tool should be providing is:
- starting and running an environment to build shit in
- accurately tracking success or failure
- accurate association of builds with artifacts
- telemetry (either their own or integration) and audit trails
- correlation with project planning software
- scheduled builds
- build chaining
That’s a lot, but it’s a lot less than any CI tool made in the last 15 years does, and that’s enough.
There’s a big difference for instance between having a tool that understands Maven information enough to present a build summary, and one with a Maven fetch/push task. The latter is a black box you can’t test locally, and your lead devs can’t either, so when it breaks, it triggers helplessness.
If the only answer to a build failure is to stare at config and wait for enlightenment, you fucked up.
100%. The ci/cd job should be nothing more than a wrapper around the actual logic which is code in your repo.
I write a script called `deploy.sh` which is my wrapper for my ci/cd jobs. It takes options and uses those options to find the piece of code to run.
The ci/cd job can be parameterized or matrixed. The eventually-run individual jobs have arguments, and those are passed to deploy.sh. Secrets/environment variables are set from the ci/cd system, also parameterized/matrixed (or alternately, a self-hosted runner can provide deploy.sh access to a vault).
End result: from my laptop I can run `deploy.sh deploy --env test --modules webserver` to deploy the webserver to test, and the CI/CD job also runs the same job the same way. The only thing I maintain that's CI/CD-specific is the GitHub Action-specific logic of how to get ready to run `deploy.sh`, which I write once and never change. Thus I could use 20 different CI/CD systems, but never have to refactor my actual deployment code, which also always works on my laptop. Vendor lock-in is impossible, thanks to a little abstraction.
(If you have ever worked with a team with 1,000 Jenkins jobs and the team has basically decided they can never move off of Jenkins because it would take too much work to rewrite all the jobs, you'll understand why I do it this way)
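A stripped-down sketch of what such a wrapper can look like (the option names and module layout here are illustrative, not my actual script):

```sh
#!/usr/bin/env bash
# Hypothetical deploy.sh: thin entry point shared by CI and laptops.
set -euo pipefail

action="$1"; shift                       # e.g. deploy, plan, destroy
env=""; modules=""

while [ $# -gt 0 ]; do
  case "$1" in
    --env)     env="$2";     shift 2 ;;
    --modules) modules="$2"; shift 2 ;;
    *) echo "unknown option: $1" >&2; exit 1 ;;
  esac
done

for module in ${modules//,/ }; do
  # Each module keeps its own logic in the repo; CI only sets env vars/secrets.
  "./modules/$module/$action.sh" --env "$env"
done
```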
Hey, if you’ve never heard of it, consider using just[0]; it’s a better makefile and supports shell scripting explicitly (so at least equivalent in power, though so is Make).
The shell also supports shell scripting! You don't need Just or Make
Especially for Github Actions, which is stateless. If you want to reuse computation within their VMs (i.e. not do a fresh build / test / whatever), you can't rely on Just or Make
A problem with Make is that it literally shells out, and the syntax collides. For example, the PID in Make is $$$$, because it's $$ in shell, and then you have to escape $ as $$ with Make.
I believe Just has similar syntax collisions. It's fine for simple things, but when it gets complex, now you have {{ just vars }} as well as $shell_vars.
It's simpler to "just" use shell vars, and to "just" use shell.
Shell already has a lot of footguns, and both Just and Make only add to that, because they add their own syntax on top, while also depending on shell.
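A tiny example of the collision being described (Make expands $ itself, so everything meant for the shell has to be doubled; recipe lines must be tab-indented):

```make
show-pid:
	@echo "make's own variable: $(CURDIR)"
	@echo "shell variable:      $$HOME"
	@echo "shell PID ($$ in shell, so $$$$ in make): $$$$"
```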
I don't typically use .PHONY as my targets aren't the same name as files and performance isn't an issue.
Here is an example of a "complex" Makefile I use to help manage Helm deployments (https://github.com/peterwwillis/devops-infrastructure/blob/m...). It uses canned recipes, functions (for loops), default targets, it includes targets and variables from other Makefiles, conditionally crafts argument lists, and more. (It's intended to be a "default" Makefile that is overridden by an additional Makefile.inc file)
I could absolutely rewrite that in a shell script, but I would need to add a ton of additional code to match the existing functionality. Lines of code (and complexity) correlate with bugs, so fewer lines of code = fewer bugs, so it's easier to maintain, even considering the Make-specific knowledge required.
They say "use the best tool for the job". As far as I've found, for a job like that, Make fits the best. If some day somebody completely re-writes all the functionality of Make in a less-obnoxious way, I'll use that.
> I don't typically use .PHONY as my targets aren't the same name as files and performance isn't an issue.
They are still phony at heart in this case, even if you don't declare them .PHONY.
Make really wants to produce files; if your targets don't produce the files they are named for you are going to run into trouble (or have to be rather careful to avoid the sharp edges).
> They say "use the best tool for the job". As far as I've found, for a job like that, Make fits the best. If some day somebody completely re-writes all the functionality of Make in a less-obnoxious way, I'll use that.
I discovered Just through a similar comment on Hacker News and I want to add my +1.
It is so much better to run scripts with Just than it is doing it with Make. And although I frankly tend to prefer using a bash script directly (much as described by the parent commenter), Just is much less terrible than Make.
Now the only problem is convincing teams to stop following the Make dogma, because it is so massively ingrained and it has so many problems and weirdnesses that just don't add anything if all you want is a command executor.
The PHONY stuff, the variable escaping, the every-line-is-a-separate-shell, and just a lot of stuff that doesn't help at all.
Make has a lot of features that you don't use at first, but do end up using eventually, that Just does not support (because it's trying to be simpler). If you learn it formally (go through the whole HTML manual) it's not hard to use, and you can always refer back to the manual for a forgotten detail.
I don't understand why this is not the evident approach for everyone writing GitHub Actions/GitLab CI/CD yaml etc....
I've struggled in some teams to explain why it's better to extract your commands into scripts (you can run ShellCheck on them, scripts are simple to run locally, etc.) instead of writing a Frankenstein of YAML and shell commands. I hope someday to find an authoritative guideline on writing pipelines that promotes this approach, so at least I can point to that link instead of defending myself for being a dinosaur!
In a previous job we had a team tasked with designing these "modern" CI/CD pipeline solutions, mostly meant for Kubernetes, but it was supposed to work for everything. They had such a hard-on for tools that would run each step as a separate isolated task and did not want pipelines to "devolve" into shell scripts.
Getting anything done in such environments is just a pain. You spend more time fighting the systems than you do actually solving problems. It is my opinion that a CI/CD system needs just the following features: triggers (source code repo, HTTP endpoints or manually triggered), secret management and shell script execution. That's it, you can build anything using that.
I think what they really wanted was something like Bazel. The only real benefit I can think of right now for not "devolving" into shell scripts is distributed caching with hermetic builds. It has very real benefits but it also requires real effort to work correctly.
I just joined as the enterprise architect for a company that has never had one. There is an existing devops team that is making everyone pull their hair out and I haven't had a single spare minute to dig in on their mess, but this sounds eerily familiar.
The job of senior people should mostly be to make sure the organisation runs smoothly.
If no one else is doing anything about the mess, then it falls to the senior person to sort it out.
As a rule of thumb:
- Ideally your people do the Right Thing by themselves by the magic of 'leadership'.
- Second best: you chase the people to do the Right Thing.
- Third: you as the senior person do the Right Thing.
- Least ideal: no one fixes the mess nor implements the proper way.
I guess some people can achieve the ideal outcome with pure charisma (or fear?) alone, but I find that occasionally getting your hands dirty (option 3) helps earn the respect to make the 'leadership' work. It can also help ground you in the reality of the day to day work.
However, you are right that a senior person shouldn't get bogged down with such work. You need to cut your losses at some point.
Where I work, which granted is a very large company, the enterprise architects focus on ERP processes, logistic flows, how prices flow from the system where they are managed to the places that need them, and so on. They are several levels removed from devops teams. DevOps concerns are maybe handled by tech leads, system architects or technical product managers.
Makes sense. From datavirtue's comment it sounded like they joined a much smaller outfit without much in terms of established _working_ procedures.
Mostly agreed, but (maybe orthogonal) IME, popular CI/CD vendors like TeamCity* can make even basic things like shell script execution problematic.
* TC offers sh, full stop. If you want to script something that depends on bash, it's a PITA and you end up with a kludge to run bash in sh in docker in docker.
Your "docker in docker" comment makes me wonder if you're conflating the image that you are using, that just happens to be run by TeamCity, versus some inherent limitation of TC. I believe a boatload of the Hashicorp images ship with dash, or busybox or worse, and practically anything named "slim" or "alpine" similarly
My "favorite" is when I see people go all in, writing thousands of lines of Jenkins-flavor Groovy that parses JSON build specifications of arbitrary complexity to sort out how to build that particular project.
"But then we can reuse the same pipeline for all our projects!"
> "But then we can reuse the same pipeline for all our projects!"
Oh god, just reading that gave me PTSD flashbacks.
At $priorGig there was the "omni-chart". It was a helm chart that was so complex it needed to be wrapped in terraform and used composable terraform modules w/ user var overrides as needed.
Debugging anything about it meant clearing your calendar for the day and probably the following day, too.
I think I can summarize it in a rough, general way.
CI/CD is a method to automate tasks in the background that you would otherwise run on your laptop. The output of the tasks is used as a quality gate for merging commits, and for deployments.
- Step 1. Your "laptop in the cloud" requires some configuration (credentials, installed software, cached artifacts) before a job can be run.
  - Requires logic specific to the CI/CD system
- Step 2. Running many jobs in parallel, passing data from step to step, etc. requires some instructions.
  - Requires logic specific to the CI/CD system
- Step 3. The job itself is the execution of a program (or programs), with some inputs and outputs.
  - Works the same on any computer (assuming the same software, environment, inputs, etc.)
  - Using a container in Step 1 makes this practical and easy
- Step 4. After the job finishes, artifacts need to be saved, results collected, and notifications sent.
  - Some steps are specific to the CI/CD system, others can be a reusable job

Step 3 does not need to be hard-coded into the config format of the CI/CD system. If it is instead just executable code in the repo, developers can use (and work on) that code locally without the CI/CD system being involved. It also allows moving to a different CI/CD system without ever rewriting all the jobs; the only things that need to be rewritten are the CI/CD-specific parts, which should be generic and apply to all jobs in pretty much the same way.

Moving the CI/CD-specific parts to a central library of configuration lets you write some code once and reuse it many times (making it DRY). CircleCI Orbs, GitHub Actions, Jenkins Shared Libraries/Groovy Libraries, etc. are examples of these. Write your code once, fix a bug once, reuse it everywhere.
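As a sketch of that split in GitHub Actions terms (the image name and script paths are hypothetical):

```yaml
# Hypothetical .github/workflows/ci.yml: only Steps 1, 2 and 4 know about GitHub.
name: ci
on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    container: ghcr.io/example/build-env:latest   # Step 1: prepared environment (made-up image)
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/test.sh                          # Step 3: plain executable code from the repo
      - uses: actions/upload-artifact@v4           # Step 4: CI-specific artifact handling
        with:
          name: test-reports
          path: reports/
```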
To make the thing actually fast at scale, a lot of the logic ends up being specific to the provider, requiring tokens, artifacts, etc. that aren't available locally. You end up with something that tries to detect if you're running locally or in CI, and then you end up in exactly the same situation.
You are right, and this is where a little bit of engineering comes in. Push as much of the logic as you can into scripts (shell or Python or whatever) that you can run locally. Perhaps in docker, whatever. All the tokens, variables, artifacts, etc. should act as inputs or parameters to your scripts. You have several mechanisms at your disposal: command line arguments, environment variables, config files, etc. Those are all well understood, universal, and language and environment agnostic, to an extent.
The trick is to NOT have your script depend on the specifics of the environment, but reverse the dependency. So replace all `If CI then Run X else if Local Run Y` with the ability to configure the script to run X or Y, then let the CI configure X and local configure Y. For example.
I'm not saying it is always easy and obvious. For bigger builds, you often really want caching and have shitloads of secrets and configurations going on. You want to only build what is needed, so you need something like a DAG. It can get complex fast. The trick is making it only as complex as it needs be, and only as reusable as and when it is actually re-used.
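A minimal sketch of that inversion, using a made-up notification hook as the example:

```sh
#!/usr/bin/env bash
# Hypothetical build.sh: the caller injects behavior; the script never sniffs "am I in CI?".
set -euo pipefail

# Whoever calls the script decides what "notify" means; the default is a no-op for local runs.
NOTIFY_CMD="${NOTIFY_CMD:-true}"

./scripts/compile.sh
./scripts/test.sh

$NOTIFY_CMD "build finished for ${GIT_SHA:-local}"

# CI might invoke it as:  NOTIFY_CMD=./scripts/notify-slack.sh GIT_SHA="$GITHUB_SHA" ./build.sh
# Locally you just run:   ./build.sh
```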
> The trick is to NOT have your script depend on the specifics of the environment, but reverse the dependency. So replace all `If CI then Run X else if Local Run Y` with the ability to configure the script to run X or Y, then let the CI configure X and local configure Y. For example.
> I'm not saying it is always easy and obvious. For bigger builds, you often really want caching and have shitloads of secrets and configurations going on.
Here's the thing. When you don't want secrets, or caching, or messaging, or conditional deploys, none of this matters. Your build can be make/mvn/go/cargo and it just works, and is easy. It only gets messy when you want to do "detect changes since last run and run tests on those components", or "don't build the moon and the stars, pull that dependency in CI, as local users have it built already and it won't change." And the way to handle those situations involves running different targets/scripts/whatever and not what is actually in your CI environment.
I've lost count of how many deploys have been marked as failed in my career because the shell script for posting updates to slack has an error, and that's not used in the local runs of CI.
What you _actually_ need is a staging branch of your CI and a dry-run flag on all of your state-modifying commands. Then, none of this matters.
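One way to get that dry-run behavior is a small wrapper around every state-modifying command; a sketch (the webhook and variables below are placeholders):

```sh
# Every state-modifying command goes through `run`; DRY_RUN=1 only prints it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

# SLACK_WEBHOOK, VERSION and APP are placeholder variables for the example.
run curl -fsS -X POST "$SLACK_WEBHOOK" -d "{\"text\": \"deployed $VERSION\"}"
run kubectl rollout restart "deployment/$APP"
```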
A shell script has many extremely sharp edges like dealing with stdin, stderr, stdout, subprocesses, exit codes, environmental variables, etc.
Most programmers have never written a shell script and writing CI files is already frustrating because sometimes you have to deploy, run, fix, deploy, run, fix, which means nobody is going to stop in the middle of that and try to learn shell scripting.
Instead, they copy commands from their terminal into the file and the CI runner takes care of all the rough edges.
I ALWAYS advise writing a shell script, but I know it's because I actually know how to write them. But I guess that's why some people are paid the big bux.
GitHub's CI YAML also accepts e.g. Python. (Or anything else, actually.)
That's generally a bit less convenient, i.e. it takes a few more lines, but it has significantly fewer sharp edges than your typical shell script. And more people have written Python scripts, I guess?
I find that Python scripts that deal with calling other programs have even more sharp edges because now you have to deal with stdin, stderr and stdout much more explicitly. Now you need to both know shell scripting AND Python.
Python’s subprocess has communicate(), check_output() and other helpers which takes care of a lot but (a) you need to know what method you should actually call and (b) if you need to do something outside of that, you have to use Popen directly and it’s much more work than just writing a shell script. All doable if you understand pipes and all that but if you don’t, you’ll be just throwing stuff at the wall until something sticks.
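For what it's worth, the common cases stay fairly compact once you know `subprocess.run`; a small illustration:

```python
import subprocess

# Capture output and raise on non-zero exit (roughly `set -e` plus command substitution).
result = subprocess.run(
    ["git", "diff", "--name-only", "origin/main...HEAD"],
    capture_output=True, text=True, check=True,
)
changed = result.stdout.splitlines()

# Stream a child's output straight to the CI log (no capture at all).
subprocess.run(["pytest", "-x"], check=True)

print(f"{len(changed)} files changed")
```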
1) If possible, don't run shell scripts with Python. Evaluate why you are trying to do that and don't.
2) Python has a bunch of infrastructure compared to shell, you can use it. Shell scripts don't.
3) Apply the same approach you used for the script to whatever it calls: CI calls a control script for the job, the script calls tools/libraries for the heavy lifting.
Often the shell script just calls a Python/Ruby/Rust executable anyway...
Shell scripts are for scripting...the shell. Nothing else.
Your average person will be blindsided either way; at least one way they have a much better set of tools to help them out once they are blindsided.
Yes, but at least that's all fairly obvious---you might not know how to solve the problem, but at least you know you have a problem that needs solving. Compare that to the hidden pitfalls of e.g. dealing with whitespace in filenames in shell scripts. Or misspelled variable names that accidentally refer to non-existent variables but get treated as if they were set to "".
This all reminds me of the systemd ini-like syntax vs shell scripts debate. Shell scripts are superior, of course, but they do require deeper knowledge of unix-like systems.
I've been working with Linux since I was 10 (I'm much older now), and I still don't think I "know Linux". The upper bound on understanding it is incredibly high. Where do you draw the line?
> [...] instead of writing a Frankenstein of YAML and shell commands.
The 'Frankenstein' approach isn't what makes it necessarily worse. Eg Makefiles work like that, too, and while I have my reservations about Make, it's not really because they embed shell scripts.
it can be quite hard to write proper scripts that work consistently... different shells have different behaviours, availability of local tools, paths, etc
and it feels like fighting against the flow when you're trying to make it reusable across many repos
If you aim for quicker turn-around (eg. just running a single test in <1s), you'll have to either aggressively optimize containers (which is pretty non-idiomatic with Docker containers in particular), or do away with them.
> I've rarely seen a feedback loop with containers that's not longer than 10s only due to containerization itself
Sounds like a skill issue tbh.
`time podman run --rm -it fedora:latest echo hello` will return in a few milliseconds, whatever delay you are complaining about would be from the application running in the container.
I am talking about either, because the GP post was about "containerizing a build environment": you need your project built to either run it in CI or locally.
Why would it be slow?
It needs to be rebuilt? (on a fast-moving project with a mid-sized or large team, you'll get dependency or Dockerfile changes frequently)
It needs to restart a bunch of dependent services?
Container itself is slow to initialize?
Caching of Docker layers is tricky, silly (you re-arrange a single command line and poof, it's invalidated, including all the layers after) and hard to make the most of.
If you can't get a single test running in <1s, you are never going to get a full test suite running in a couple of seconds, and never be able to do an urgent deploy in <30s.
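On the layer-caching point, the usual (and admittedly fragile) mitigation is ordering the Dockerfile so rarely-changing steps come first; a generic sketch, not tied to any particular project:

```dockerfile
FROM node:20-slim                      # example base; pin harder if you care about reproducibility

WORKDIR /app

# Dependency manifests change rarely; copying only them keeps this expensive layer cached.
COPY package.json package-lock.json ./
RUN npm ci

# Source changes on every commit, so only the layers from here on get rebuilt.
COPY . .
RUN npm run build
```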
That might be the case if Docker did in fact guarantee (or at least make it easy to guarantee) deterministic builds -- but it doesn't really even try:
1. Image tags ("latest", etc.) can change over time. Does any layer in your Dockerfile -- including inside transitive deps -- build on an existing layer identified by tag? If so, you never had reproducibility.
2. Plenty of Dockerfiles include things like "apt-get some-tool" or its moral equivalent, which will pull down whatever is the latest version of that tool.
It's currently common and considered normal to use these "features". Until that changes, Docker mostly adds only the impression of reproducibility, but genuine weight and pain.
The advantage that Docker brings isn't perfect guaranteed reusability, it's complete independence (or as close as you can easily get while not wasting resources on VMs) from the system on which the build is running, plus some reusability in practice within a certain period of time, and given other good practices.
Sure, if I try to rebuild a docker image from 5 years ago, it may fail, or produce something different, because it was pulling in some packages that have changed significantly in apt, or perhaps pip has changed encryption and no longer accepts TLS1.1 or whatever. And if you use ':latest' or if your team has a habit of reusing build numbers or custom tags, you may break the assumptions even sooner.
But even so, especially if using always-incremented build numbers as image tags, a docker image will work the same way on the Jenkins system, the GitHub Actions pipeline, and every coworker's local build, for a while.
1. Use a hash for the base images.
2. The meaning of “latest” is dependent on what base images you are using. Using UBI images for example means your versions are not going to change because redhat versions don’t really change.
But really containerizing the build environment is not related to deterministic builds as there’s a lot more work needed to guarantee that. Including possible changes in the application itself.
Lastly you don’t need to use docker or dockerfiles to build containers. You can use whatever tooling you want to create a rootfs and then create an image out of that. A nifty way of actually guaranteeing reproducibility is to use nix to build a container image.
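For reference, pinning by digest rather than tag looks like this (the digest below is a placeholder, not a real image):

```dockerfile
# The tag is informational; the digest is what actually gets pulled.
FROM ubuntu:24.04@sha256:0000000000000000000000000000000000000000000000000000000000000000
```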
They didn't say reproducibility, they said determinism.
If you use a container with the same apt-get commands and the same OS, you already separate out almost all the reproducibility issues. What you get will be a very similar environment.
But it's not deterministic as you point out, that takes a lot more effort.
You know, I'd love to run MSVC in Wine on a ubuntu container. I bet it would be quicker.
I've had the unfortunate pleasure of working with Windows Containers in the past. They are Containers in the sense that you can write a dockerfile for them, and they're somewhat isolated, but they're no better than a VM in my experience.
> And in any case, you can use VMs instead of containers.
They're not the same thing. If that were the case, running Linux AMIs on EC2 would be the same as containers.
It’s the same thing for the purposes of capturing the build environment.
It doesn’t really matter if you have to spin up a Linux instance and then run your build environment as a container there vs spinning up a Windows VM.
Only if your tech stack is bad (i.e. Python). My maven builds work anywhere with an even vaguely recent maven and JVM (and will fail-fast with a clear and simple error if you try to run them in something too old), no need to put an extra layer of wrapping around that.
It's trivial to control your Python stack with things like virtualenv (goes back to at least 2007) and has been for ages now (I don't really remember the time when it wasn't, and I've been using Python for 20+ years).
What in particular did you find "bad" with the Python tech stack?
(I've got my gripes with Python and the tooling, but it's not this — I've got bigger gripes with containers ;-))
> What in particular did you find "bad" with the Python tech stack?
Stateful virtualenvs with no way to check if they're clean (or undo mistakes), no locking of version resolution (much less deterministic resolution), only one-way pip freeze that only works for leaf projects (and poorly even then), no consistency/standards about how the project management works or even basic things like the directory layout, no structured unit tests, no way to manage any of this stuff because all the python tooling is written in python so it needs a python environment to run so even if you try to isolate pieces you always have bootstrap problems... and most frustrating of all, a community that's ok with all this and tries to gaslight you that the problems aren't actually problems.
Sounds a lot like nitpicking, and I'll demonstrate why.
With docker containers, you can shell into it, do a couple of changes and "docker commit" it afterwards: similarly stateful, right? You resolve both by recreating them from scratch (and you could easily chmod -w the entire virtualenv directory if you don't want it to change accidentally).
The pattern of using requirements.txt.in and a pip-freeze-generated requirements.txt has been around for a looong time, so it sounds like a non-idiomatic way to use pip if you've got problems with locking of versions or non-leaf projects.
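Roughly, that pattern is (using the file names from above):

```sh
# requirements.txt.in lists direct dependencies loosely, e.g. "requests>=2.31"
python -m venv .venv && . .venv/bin/activate
pip install -r requirements.txt.in
pip freeze > requirements.txt      # fully pinned, committed to the repo

# Everyone else (and CI) installs only from the pinned file:
pip install -r requirements.txt
```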
As for directory layout, it's pretty clear it's guided by Python import rules: those are tricky, but once you figure them out, you know what you can and should do.
Can you clarify what you mean by "structured unit tests"? Python does not really limit you in how you organize them, so I am really curious.
Sure, a bootstrapping problem does exist, but rarely do you need exactly a particular version of Python and any of the dev tools to be able to get a virtualenv off the ground, after which you can easily control all the deps in them (again a requirements-dev.txt.in + requirements-dev.txt pattern will help you).
And there's a bunch of new dev tools springing up recently that are written in Rust for Python, so even that points at a community that constantly works to improve the situation.
I am sorry that you see this as "gaslighting" instead of an opportunity to learn why someone did not have the same negative experience.
> With docker containers, you can shell into it, do a couple of changes and "docker commit" it afterwards: similarly stateful, right?
I guess theoretically you could, but I don't think that's part of anyone's normal workflow. Whereas it's extremely easy to run "pip install" from project A's directory with project B's virtualenv active (or vice versa). You might not even notice you've done it.
> You resolve both by recreating them from scratch
But with Docker you can wipe the container and start again from the image, which is fixed. You don't have to re-run the Dockerfile and potentially end up with different versions of everything, which is what you have to do with virtualenv (you run pip install and get something completely different from the virtualenv you deleted).
> you could easily chmod -w the entire virtualenv directory if you don't want it to change accidentally
But you have to undo it every time you want to add or update a dependency. In other ecosystems it's easy to keep my dependencies in line with what's in the equivalent of requirements.txt, but hard to install some random other unmanaged dependency. In the best ecosystems there's no need to "install" your dependencies at all, you just always have exactly the packages listed in the requirements.txt equivalent available at runtime when you run things.
> The pattern of using requirements.txt.in and a pip-freeze-generated requirements.txt has been around for a looong time, so it sounds like a non-idiomatic way to use pip if you've got problems with locking of versions or non-leaf projects.
I've literally never seen a project that does that. And even if you do that, it's still harder to work with because you can't upgrade one dependency without unlocking all of your dependencies, right?
> As for directory layout, it's pretty clear it's guided by Python import rules
I don't mean within my actual code, I mean like: where does source code go, where does test code go, where do non-code assets go.
> Can you clarify what you mean by "structured unit tests"?
I mean, like, if I'm looking at a particular module in the source code, where do I go to find the tests for that module? Where's the test-support code as distinct from the specific tests?
> rarely do you need exactly a particular version of Python and any of the dev tools to be able to get a virtualenv off the ground
Whether virtualenv is available is a relatively recent change, so you already have a fractal problem. Having an uncontrolled way of installing your build environment is another of those things that's fine until it isn't.
> And there's a bunch of new dev tools springing up recently that are written in Rust for Python
Yeah, that's the one thing that gives me some hope that there might be light at the end of the tunnel, since I hear they mostly ignore all this idiocy (and avoid e.g. having user-facing virtualenvs at all) and just do the right thing. Hopefully once they catch on we'll see Python start to be ok without containers too and maybe the container hype will die down. But it's certainly not the case that everything has been fine since 2007; quite the opposite.
except you need to install the correct Java version and maven (we really should be using gradle by now)
Also in many projects there are things other than code that need to be “built” (assets, textures, translations, etc). Adding custom build targets to maven’s build.xml is truly not ideal, and then there are people who actually try to write logic in there. That’s objectively worse than the YAML hell we were complaining about at the top of the thread.
> you need to install the correct Java version and maven
Like I said, any version from the last, like, 10+ years (Java and Maven are both serious about backward compatibility), and if you install an ancient version you at least get fail-fast with a reasonable error.
> we really should be using gradle by now
We really shouldn't.
> Adding custom build targets to maven’s build.xml is truly not ideal, and then there are people who actually try to write logic in there.
Maven doesn't have a build.xml, are you thinking of ant? With maven you write your custom build steps as build plugins, and they're written in plain old Java (or Kotlin, or Scala, or...) code, as plain old Maven modules, with the same kind of ordinary unit testing as your regular code; all your usual code standards apply (e.g. if you want to check your test coverage, you do it the same way as for your regular code - indeed, probably the same configuration you set up for your regular code is already getting applied). That's a lot better than YAML.
The reason for this is that nobody took the time to write a proper background document on Github Actions. The kind of information that you or I might convey if asked to explain it at the whiteboard to junior hires, or senior management.
This syndrome is very common these days. Things are explained differentially: it's like Circle CI but in the GH repo. Well that's no use if the audience wasn't around when Circle CI was first new and readily explained (It's like Jenkins but in the cloud...).
> My policy is to never let pipeline DSLs contain any actual logic outside orchestration for the task,
I call this “isomorphic CI” — ie: as long as you set the correct env vars, it should run identically on GitHub actions, Jenkins, your local machine, a VM etc
And yet, you would be surprised at the number of people who react like that's an ignorant statement ("not feasible in real world conditions"), a utopian goal ("too much time to implement"), an impossible feat ("automation makes human oversight difficult"), or, my favorite, the "this is beneath us" excuse ("see, we are special and this wouldn't work here").
Automation renders knowledge into a set of executable steps, which is much better than rendering knowledge into documentation, or leaving it to rot in people's minds. Compiling all rendered knowledge into a single step is the easiest way to ensure all elements around the build and deployment lifecycle work in unison and are guarded around failures.
Building a GitHub-specific CI pipeline similarly renders it into a set of executable steps.
The only difference is that you are now tied to a vendor for executing that logic, and the issue is really that this tooling is proprietary software (otherwise, you could just take their theoretical open source runner system and run it locally).
To me, this is mostly a question of using non-open-source development tools or not.
Yep. I remember at a previous company multiple teams had manually created steps in TeamCity (and it wasn't even being backed up in .xml files).
I just did my own thing and wrapped everything in deploy.sh and test.sh, and when the shift to another system came... well it was still kind of annoying, but at least I wasn't recreating the whole thing.
That’s usually very hard or impossible for many things. The AzDo yaml consists of a lot of steps that are specific to the CI environment (fetching secrets, running tests on multiple nodes, storing artifacts of various kinds).
Even if the ”meat” of the script is a single build.ps oneliner, I quickly end up with 200 line yaml scripts which have no chance of working locally.
Azure DevOps specifically has a very broken approach to YAML pipelines, because they effectively took their old graphical pipeline builder and just made a YAML representation of it.
The trick to working with this is that you don't need any of their custom Azure DevOps task types, and can use the shell type (which has a convenient shorthand) just as well as in any other CI environment. Even the installer tasks are redundant - in other CI systems, you either use a container image with what you need, or install stuff at the start, and Azure DevOps works with both of these strategies.
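Concretely, that shorthand means a pipeline can stay as thin as something like this sketch:

```yaml
# Hypothetical azure-pipelines.yml kept to plain script steps.
trigger:
  - main

pool:
  vmImage: ubuntu-latest

steps:
  - script: ./ci/build.sh
    displayName: Build
  - script: ./ci/test.sh
    displayName: Test
```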
So no, it's neither hard nor impossible, but Microsoft's half-assed approach to maintaining Azure DevOps and overall overcomplicated legacy design makes it a bit hard to realize that doing what their documentation suggests is a bad idea, and that you can use it in a modern way just fine. At least their docs do not recommend that you use the dedicated NPM-type task for `npm install` anymore...
(I could rant for ages about Azure DevOps and how broken and unloved it is from Microsoft's side. From what I can tell, they're just putting in the minimum effort to keep old Enterprise customers that have been there through every rename since Team Foundation Server from jumping ship - maybe just until Github's enterprise side has matured enough? Azure DevOps doesn't even integrate well with Azure, despite its name!)
It has been on life support for a long time AFAIK. I designed Visual Studio Online (the first launch of AzDO) - and every engineer, PM, and executive I worked with is either in leadership at GitHub or retired.
It feels clear from an outside perspective that all the work on AzDO Pipelines has shifted to focus on GitHub Actions and Actions is now like 3 or 4 versions ahead. Especially because the public Issue trackers for some of AzDO Pipelines "Roadmap" are still up (on GitHub, naturally) and haven't been updated since ~2020.
I wish Microsoft would just announce AzDO's time of death and save companies from increasingly crazy AzDO blinders and/or weird mixes of GitHub and AzDO, as GitHub-only is clearly the present/future.
Yeah feels like they should be able to converge actions and pipelines.
Keeping some separation between AzDo itself and GH also requires some balancing. But so far I’m pretty sure I could never sell our enterprise on a shift to GH. Simply not enough jira-esque features in GH with complex workflows, time reporting etc so I can’t see them doing the bigger GH/AzDo merger.
This month's rollout of sub-issues and issue types would be most of what my organization thinks it needs to shift to GH Issues, I believe, barring however long it would take to rewrite some sync up automation with ServiceNow based on those issue types. Of course it will take another 6 months to a year before those kinds of features make it to GitHub Enterprise, so it is still not happening any time soon. (Though that gets back to my "weird" mixes. I don't entirely know why my company is using AzDO SaaS for Issue Tracking but decided GHE over "normal" cloud GH for Repos. But that's not the weirdest mix I've seen.)
I definitely get the backwards compatibility thing and "don't break someone's automation", but at the same time, Microsoft could at least mark AzDO's official Roadmap as "Maintenance Only" and send the message that already feels obvious as a user: GitHub is getting far more attention than AzDO ever can. It is hard to convince management and infosec that a move to GitHub is not just "the future" but "the present" (and also maybe "the past", now, given AzDO seems to have been frozen around 2020).
This doesn’t seem to address the parent comment’s point at all, which was about required non-shell configuration such as for secrets, build parallelism, etc.
The actual subtle issue here is that sometimes you actually need CI features around caching and the like, so you are forced to engage with the format a bit.
You can, of course, chew it down to a bare minimum. But I really wish more CI systems would just show up with "you configure us with scripts" instead of the "declarative" nonsense.
CI that isn't running on your servers wants very deep understanding of how your process works so they can minimize their costs (this is true whether or not you pay for using CI)
Totally! It's a legitimate thing! I just wish that I had more tools for dynamically providing this information to CI so that it could work better but I could also write relatively general tooling with a general purpose language.
The ideal for me is (this is very silly and glib and a total category error) LSP but for CI. Tooling that is relatively normalized, letting me (for example) have a pytest plugin that "does sharding" cleanly across multiple CI operators.
There's some stuff and conventions already of course, but in particular caching and spinning up jobs dynamically are still not there.
I'm increasingly designing CI stuff around rake tasks. Then I run rake in the workflow.
But that caters only for each individual command... as you mention the orchestration is still coded in, and duplicated from what rake knows and would do.
So I'm currently trying stuff that has a pluggable output: one output (the default) is that it runs stuff, but with just a rake var set, instead of generating and then running commands it generates workflow content that ultimately gets merged into an ERB workflow template.
The model I like the most though is Nix-style distributed builds: it doesn't matter if you do `nix build foo#bar` (local) or `nix build -j0 foo#bar` (zero local jobs => use a remote builder†), the `foo#bar` "task" and its dependents gets "built" (a.k.a run).
† builders get picked matching target platform and label-like "features" constraints.
Ever since there has been gitlab-runner, I've wondered why the hell I can't just submit some job to a (list of) runner(s) - some of which could be local - without the whole push-to-repo + CI orchestrator step. I mean I don't think it would be out of this world to write a CLI command that locally parses whatever-ci.yml, creates jobs out of it, and submits them to a local runner.
I agree with wrapping things like build scripts to test locally.
Still, some actions or CI steps are also not meant to be run locally. Like when it publishes to a repo or needs any credentials that are used by more than one person.
Btw, Github actions and corresponding YAML are derived from Azure DevOps and are just as cursed.
The whole concept of Github CI is just pure misuse of containers when you need huge VM images - container is technically correct, but a far-fetched word for this - that have all kinds of preinstalled garbage to run typescript-wrapped code to call shell scripts.
I think it's mostly obvious how you could implement CI with "local-first" programs (scripts?), but systems like Github provide value on top by making some of the artifacts or steps first-class objects.
Not sure how much of that GitHub does, but it could parse test output (I know we used that feature in GitLab back when I was using that on a project), track flaky tests for you, or give you a link to the code of a failing test directly.
Or it could highlight releases on their "releases" page, with release notes prominently featured.
And they allow you to group pipelines by type and kind, filter by target environment and such.
On top of that, they provide a library of re-usable workflows like AWS login, or code checkout or similar.
With all that, the equation is not as clear cut: you need to figure out how to best leverage some of those (or switch to external tools providing them), and suddenly, with time pressure, just going with the flow is a more obvious choice.
This is the right way to use CI/CD systems, as dumb orchestrators without inherent knowledge of your software stack. But the problem is, everything from their documentation, templates, marketplace encourage you to do exactly the opposite and couple your build tightly with their system. It's poor product design imo, clearly optimising for vendor lock-in over usability.
Oh, yeah, I remember looking at that a while back. I don't recall how much it had implemented at the time but it seems that firecow took a vastly different approach than nektos/act did, going so far as to spend what must have been an enormous amount of time/energy to cook up https://github.com/firecow/gitlab-ci-local/blob/4.56.2/src/s... (and, of course, choosing a dynamically typed language versus golang)
> Lack of local development. It's a known thing that there is no way of running GitHub Actions locally.
This is one thing I really love about Buildkite[0] -- being able to run the agent locally. (Full disclosure: I also work for Buildkite.) The Buildkite agent runs as a normal process too (rather than as a Docker container), which makes the process of workflow development way simpler, IMO. I also keep a handful of agents running locally on my laptop for personal projects, which is nice. (Why run those processes on someone else's infra if I don't have to?)
> Reusability and YAML
This is another one that I believe is unique to Buildkite and that I also find super useful. You can write your workflows in YAML of course -- but you can also write them in your language of choice, and then serialize to YAML or JSON when you're ready to start the run (or even add onto the run as it's running if you need to). This lets you encapsulate and reuse (and test, etc.) workflow logic as you need. We have many large customers that do some amazing things with this capability.
Are you two talking about the same thing? I believe the grandparent is talking about running it locally on development machines, often for testing purposes.
Asking because Github Action also supports Self-Hosted runners [1].
Same thing, yeah, IIUC (i.e., running the agent/worker locally for testing). It's conceptually similar to self-hosted runners, yes, but also different in a few practical ways that may matter to you, depending on how you plan to run in production.
For one, with GitHub Actions, hosted and self-hosted runners are fundamentally different applications; hosted runners are fully configured container images, (with base OS, tools, etc., on board), whereas self-hosted runners are essentially thin, unconfigured shell scripts. This means that unless you're planning on using self-hosted runners in production (which some do of course, but most don't), it wouldn't make sense to dev/test with them locally, given how different they are. With Buildkite, there's only one "way" -- `buildkite-agent`, the single binary I linked to above.
The connection models are also different. While both GHA self-hosted runners and the Buildkite agent connect to a remote service to claim and run jobs, GHA runners must first be registered with a GitHub org or repository before you can use them, and then workflows must also be configured to use them (e.g., with `runs-on` params). With Buildkite, any `buildkite-agent` with a proper token can connect to a queue to run a job.
There are others, but hopefully that gives you an idea.
100% agree and that has been my experience too. It also makes testing the logic locally much easier, just run the script in the appropriate container.
The pipeline DSLs, because they are not full programming languages have to include lots of specific features and options and if you want something slightly outside of what they are designed for, you are out of luck. In a way it feels like how graphics were in the age of fixed-pipeline, when there had to be a complex API to cover all use cases yet it was not flexible enough.
This is my preferred way of doing things as well. Not being able to run the exact same thing that's running in CI easily locally is a bit of a red flag in my opinion. I think the only exception I've ever encountered to this is when working on client software for HSMs, which had some end-to-end tests that couldn't be run without actually connecting to the specific hardware that took some setup to be able to access when running tests locally.
That’s my policy too. I see way too many Jenkins/Actions scripts with big logic blocks jammed into YAML. If the entire build and test process is just a single script call, we can run it locally, in a GitHub workflow, or anywhere else. Makes it less painful to switch CI systems, and devs can debug easily without pushing blind commits. It’s surprising how many teams don’t realize local testing alone saves huge amounts of time.
When I automate my GitHub Actions I keep everything task-oriented, and if anything is pushing code or builds there is a user step to verify the automation's work. You approve and merge, and that kicks off promotional pipelines, not necessarily for deployment but to promote a build as stable through tagging.
While you're correct, environmental considerations are another advantage that testing locally SHOULD be able to provide (i.e. you can test your scripts or Make targets or whatever in the same runner that runs in the actual build system).
Of course you can, just specify a container image of your choice and run the same container for testing locally.
However, replicating environmental details is only relevant where the details are known to matter. A lot of effort has been wasted and workflows crippled by the idea that everything must be 100% identical irrespective of actual dependencies and real effects.
So for example, my action builds on 4 different platforms (win-64, linux-amd64, mac-intel, mac-arm); it does this in parallel, then gets the artifacts for all four and bundles them into a single package.
How would you suggest I do this following your advice?
In our environment (our product is a Windows desktop application) we use Packer to build a custom Windows Server 2022 image with all the required tools installed. Build agents run on an Azure VM scale set that uses the said image for the instance OS.
I suspect the author of the article could greatly simplify matters if they used a task running tool to orchestrate running tasks, for example. Pick whatever manner of decoupling you want really, most of the time this is the path to simplified CI actions. CI is best when its thought of as a way to stand up fresh copies of an environment to run things inside of.
I have never had the struggles that so many have had with CI as a result. Frankly, I'm consistently surprised at how overly complex people make their CI configurations. There are better tools for orchestration and dependency-dependent builds, which is not CI's purpose to begin with.
I generally agree with you, but I'd be interested to hear your take on what the purpose of CI _actually is_.
It seems to me that a big part of the problem here (which I have also seen/experienced) is that there's no one specific thing that something like GitHub Actions is uniquely suited for. Instead, people want "a bunch of stuff to happen" when somebody pushes a commit, and they imagine that the best way to trigger all of that is to have an incredibly complex - and also bespoke - system on the other end that does all of it.
It's like we learned the importance of modularity in the realm of software design, but never applied what we learned to the tools that we work with.
Standing up fresh images for validation and post-validation tasks.
CI shines for running tests against a clean environment for example.
Really, any task that benefits from a clean image being stood up before it runs.
The key though is to decouple the tasks from the CI. Complexity like pre-packaging artifacts is not a great fit for CI configuration, that is best pushed to a tool that doesn’t require waterfall logic to make it work.
There is a reason task runners are still very popular.
Although there are definitely merits in moving the complex logic outside of the CI/CD JSON/YAML DSL, especially when using monorepo setups that can become rather complex in their logic (the kind that made Google create Bazel; I can think of some interesting Borg/K8s analogies, btw), I also believe that modern CI/CD platforms have made several sensible steps in the right direction to handle these more complicated use cases.
(Disclaimer: I work at CircleCI)
At CircleCI for example, we have added valuable features like a VSCode extension[0] to validate and "dry-run" config from within your IDE, we have local runners[1] that you can use to test and run pipelines on your local machine and your own infra, we have dynamic config[2], a Javascript/Typescript SDK[3], a CLI that can validate and run workflows locally[4], and QoL additions like a no-op job type[5] and flexible requires, along with flexible when statements and expression based job filters[6].
And finally, it's of course also possible to combine different approaches into a "best of both worlds" approach, f.e. combining Dagger with CircleCI[7].