
If you're not containerizing your CI/CD, you're really lost.


That might be the case if Docker did in fact guarantee (or at least make it easy to guarantee) deterministic builds -- but it doesn't really even try:

1. Image tags ("latest", etc.) can change over time. Does any stage in your Dockerfile -- including inside transitive deps -- build on an existing image identified by tag? If so, you never had reproducibility.

2. Plenty of Dockerfiles include things like "apt-get install some-tool" or its moral equivalent, which will pull down whatever the latest version of that tool happens to be.

It's currently common and considered normal to use these "features". Until that changes, Docker mostly adds only the impression of reproducibility -- while the weight and pain are genuine.
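
A minimal sketch of what pinning both of those moving parts looks like (the base-image digest and apt version strings below are made-up placeholders, not real values):

    # Pin the base image by digest instead of a mutable tag
    # (this digest is a placeholder -- substitute the real one)
    FROM ubuntu@sha256:0000000000000000000000000000000000000000000000000000000000000000

    # Pin package versions instead of taking whatever is newest today
    # (version strings here are placeholders too)
    RUN apt-get update && \
        apt-get install -y --no-install-recommends \
            curl=8.5.0-2ubuntu10 \
            ca-certificates=20240203 && \
        rm -rf /var/lib/apt/lists/*

Even this isn't fully deterministic (old package versions eventually disappear from the mirrors), but it removes the two moving parts described above.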


The advantage that Docker brings isn't perfect, guaranteed reproducibility -- it's complete independence (or as close as you can easily get without wasting resources on VMs) from the system on which the build is running, plus a fair amount of reproducibility in practice, within a certain period of time and given other good practices.

Sure, if I try to rebuild a docker image from 5 years ago, it may fail, or produce something different, because it was pulling in some packages that have since changed significantly in apt, or perhaps pip has dropped support for TLS 1.1 or whatever. And if you use ':latest', or if your team has a habit of reusing build numbers or custom tags, you may break those assumptions even sooner.

But even so, especially if you use always-incrementing build numbers as image tags, a docker image will work the same way on the Jenkins system, the GitHub Actions pipeline, and every coworker's local build, at least for a while.
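
For example, something like this in CI (registry and image name are made up; BUILD_NUMBER is what Jenkins exposes, GitHub Actions has an equivalent run number):

    # tag with a monotonically increasing build number rather than :latest
    docker build -t registry.example.com/myapp:${BUILD_NUMBER} .
    docker push registry.example.com/myapp:${BUILD_NUMBER}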


I agree that Docker doesn't do the best job here, but you can still get a lot better reproducibility with it than without.

You can use specific image hashes to work around image tag changes.
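
For example, to resolve a tag to its immutable digest and pin to that instead (the tag here is just an example):

    docker pull ubuntu:24.04
    docker inspect --format '{{index .RepoDigests 0}}' ubuntu:24.04
    # prints something like ubuntu@sha256:... -- use that in FROM instead of the tag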

For problem 2, ideally use something like NixOS as a base, or at least the Nix package manager.

But IMO Nix is quite complicated, even though the working model is really good. So this is mostly worth it if you really need deterministic builds.

Docker already gets you 80% of the way with 20% of the effort.


1. Use a hash (digest) for the base images.

2. The meaning of “latest” depends on what base images you are using. Using UBI images, for example, means your versions are not going to change, because Red Hat versions don’t really change.

But really containerizing the build environment is not related to deterministic builds, as there’s a lot more work needed to guarantee that, including accounting for possible changes in the application itself.

Lastly, you don’t need to use docker or Dockerfiles to build containers. You can use whatever tooling you want to create a rootfs and then create an image out of that. A nifty way of actually guaranteeing reproducibility is to use Nix to build a container image.
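
A rough sketch of that last approach, assuming you already have a Nix expression describing the image (the image.nix file name is hypothetical; it assumes something built with pkgs.dockerTools.buildImage):

    # Build the image tarball with Nix -- no Dockerfile involved
    nix-build image.nix        # produces ./result, a docker-loadable tarball
    docker load < result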


>But really containerizing the build environment is not related to deterministic builds

What would you say the goal of containerising builds is, if not reproducibility?


They didn't say reproducibility, they said determinism.

If you use a container with the same apt-get commands and the same OS, you've already sidestepped almost all of the reproducibility issues. What you get will be a very similar environment.

But it's not deterministic, as you point out -- that takes a lot more effort.


How do I containerize building desktop apps for windows with MSVC?


Wine?

Less snarky: https://learn.microsoft.com/en-us/virtualization/windowscont... seems to be a thing?
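
For what it's worth, the documented pattern there is roughly a Dockerfile along these lines (base tag, install path handling, and workload selection are illustrative; check the linked docs for the exact installer invocation):

    # escape=`
    FROM mcr.microsoft.com/windows/servercore:ltsc2022
    SHELL ["cmd", "/S", "/C"]

    # Download and install the MSVC (C++) build tools workload
    ADD https://aka.ms/vs/17/release/vs_buildtools.exe C:\TEMP\vs_buildtools.exe
    RUN C:\TEMP\vs_buildtools.exe --quiet --wait --norestart --nocache `
        --add Microsoft.VisualStudio.Workload.VCTools --includeRecommended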

And in any case, you can use VMs instead of containers.


You know, I'd love to run MSVC in Wine in an Ubuntu container. I bet it would be quicker.

I've had the unfortunate pleasure of working with Windows Containers in the past. They are containers in the sense that you can write a Dockerfile for them, and they're somewhat isolated, but they're no better than a VM in my experience.

> And in any case, you can use VMs instead of containers.

They're not the same thing. If that were the case, running Linux AMIs on EC2 would be the same as containers.


It’s the same thing for the purposes of capturing the build environment.

It doesn’t really matter if you have to spin up a Linux instance and then run your build environment as a container there vs spinning up a Windows VM.


Only if your tech stack is bad (i.e. Python). My maven builds work anywhere with an even vaguely recent maven and JVM (and will fail-fast with a clear and simple error if you try to run them in something too old) -- no need to put an extra layer of wrapping around that.


It's trivial to control your Python stack with things like virtualenv (which goes back to at least 2007), and it has been for ages now (I don't really remember a time when it wasn't, and I've been using Python for 20+ years).
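
For example, the basic per-project isolation is just (requirements.txt being the usual convention for the dependency list):

    python -m venv .venv          # per-project interpreter + site-packages
    . .venv/bin/activate
    pip install -r requirements.txt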

What in particular did you find "bad" with the Python tech stack?

(I've got my gripes with Python and the tooling, but it's not this — I've got bigger gripes with containers ;-))


> What in particular did you find "bad" with the Python tech stack?

Stateful virtualenvs with no way to check if they're clean (or undo mistakes), no locking of version resolution (much less deterministic resolution), only one-way pip freeze that only works for leaf projects (and poorly even then), no consistency/standards about how the project management works or even basic things like the directory layout, no structured unit tests, no way to manage any of this stuff because all the python tooling is written in python so it needs a python environment to run so even if you try to isolate pieces you always have bootstrap problems... and most frustrating of all, a community that's ok with all this and tries to gaslight you that the problems aren't actually problems.


Sounds a lot like nitpicking, and I'll demonstrate why.

With docker containers, you can shell into it, do a couple of changes and "docker commit" it afterwards: similarly stateful, right? You resolve both by recreating them from scratch (and you could easily chmod -w the entire virtualenv directory if you don't want it to change accidentally).

The pattern of using requirements.txt.in and a pip freeze-generated requirements.txt has been around for a looong time, so it sounds like a non-idiomatic way to use pip if you've got problems with locking of versions or non-leaf projects.
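
Concretely, that pattern is something like this (the file names are just the convention being described):

    # requirements.txt.in: only the direct deps, loosely pinned
    # requirements.txt: the full frozen tree, committed to the repo
    pip install -r requirements.txt.in
    pip freeze > requirements.txt
    # later, and on CI / coworkers' machines:
    pip install -r requirements.txt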

As for directory layout, it's pretty clear it's guided by Python import rules: those are tricky, but once you figure them out, you know what you can and should do.

Can you clarify what you mean by "structured unit tests"? Python does not really limit how you organize them, so I am really curious.

Sure, a bootstrapping problem does exist, but rarely do you need exactly a particular version of Python and any of the dev tools to be able to get a virtualenv off the ground, after which you can easily control all the deps in it (again, a requirements-dev.txt.in + requirements-dev.txt pattern will help you).

And there's a bunch of new dev tools springing up recently that are written in Rust for Python, so even that points at a community that constantly works to improve the situation.

I am sorry that you see this as "gaslighting" instead of an opportunity to learn why someone did not have the same negative experience.


> With docker containers, you can shell into it, do a couple of changes and "docker commit" it afterwards: similarly stateful, right?

I guess theoretically you could, but I don't think that's part of anyone's normal workflow. Whereas it's extremely easy to run "pip install" from project A's directory with project B's virtualenv active (or vice versa). You might not even notice you've done it.

> You resolve both by recreating them from scratch

But with Docker you can wipe the container and start again from the image, which is fixed. You don't have to re-run the Dockerfile and potentially end up with different versions of everything, which is what you have to do with virtualenv (you run pip install and get something completely different from the virtualenv you deleted).

> you could easily chmod -w the entire virtualenv directory if you don't want it to change accidentally

But you have to undo it every time you want to add or update a dependency. In other ecosystems it's easy to keep my dependencies in line with what's in the equivalent of requirements.txt, but hard to install some random other unmanaged dependency. In the best ecosystems there's no need to "install" your dependencies at all, you just always have exactly the packages listed in the requirements.txt equivalent available at runtime when you run things.

> The pattern of using requirements.txt.in and a pip freeze-generated requirements.txt has been around for a looong time, so it sounds like a non-idiomatic way to use pip if you've got problems with locking of versions or non-leaf projects.

I've literally never seen a project that does that. And even if you do that, it's still harder to work with because you can't upgrade one dependency without unlocking all of your dependencies, right?

> As for directory layout, it's pretty clear it's guided by Python import rules

I don't mean within my actual code, I mean like: where does source code go, where does test code go, where do non-code assets go.

> Can you clarify what you mean by "structured unit tests"?

I mean, like, if I'm at looking at a particular module in the source code, where do I go to find the tests for that module? Where's the test-support code as distinct from the specific tests?

> rarely do you need exactly a particular version of Python and any of the dev tools to be able to get a virtualenv off the ground

Virtualenv being available out of the box is a relatively recent change, so you already have a fractal problem. Having an uncontrolled way of installing your build environment is another of those things that's fine until it isn't.

> And there's a bunch of new dev tools springing up recently that are written in Rust for Python

Yeah, that's the one thing that gives me some hope that there might be light at the end of the tunnel, since I hear they mostly ignore all this idiocy (and avoid e.g. having user-facing virtualenvs at all) and just do the right thing. Hopefully once they catch on we'll see Python start to be ok without containers too and maybe the container hype will die down. But it's certainly not the case that everything has been fine since 2007; quite the opposite.


except you need to install the correct Java version and maven (we really should be using gradle by now)

Also, in many projects there are things other than code that need to be “built” (assets, textures, translations, etc). Adding custom build targets to maven’s build.xml is truly not ideal, and then there’s people who actually try to write logic in there. That’s objectively worse than the YAML hell we were complaining about at the top of the thread.


> you need to install the correct Java version and maven

Like I said, any version from the last, like, 10+ years will do (Java and Maven are both serious about backward compatibility), and if you install an ancient version you at least get a fail-fast with a reasonable error.

> we really should be using gradle by now

We really shouldn't.

> Adding custom build targets to maven’s build.xml is truly not ideal, and then there’s people who actually try to write logic in there.

Maven doesn't have a build.xml -- are you thinking of Ant? With Maven you write your custom build steps as build plugins, and they're written in plain old Java (or Kotlin, or Scala, or...) code, as plain old Maven modules, with the same kind of ordinary unit testing as your regular code; all your usual code standards apply (e.g. if you want to check your test coverage, you do it the same way as for your regular code - indeed, probably the same configuration you set up for your regular code is already getting applied). That's a lot better than YAML.



