This false sense of reproducibility is why I funded https://docs.stablebuild.com/ some years ago. It lets you pin stuff in Dockerfiles that is normally unpinnable, like OS package repos, Docker Hub tags and random files on the internet. So you can go back to a project a year from now and actually get the same container back again.
Isn't this problem usually solved by building an actual image for your specific application, tagging that and pushing it to some Docker repo? At least that's how it's been at places I've worked that used Docker. What am I missing?
Build and tag internal base images on a regular cadence that individual projects then use in their FROM. You’ll have `company-debian-python:20250901` as a frozen-in-time version of all your system level dependencies, then the Dockerfile using it handles application-level dependencies with something that supports a lockfile (e.g. uv, npm). The application code itself is COPY’d into the image towards the end, such that everything before it is cached, but you’re not relying on the cache for reproducibility, since you’re starting from a frozen base image.
The base image building can be pretty easily automated, then individual projects using those base images can expect new base images on a regular basis, and test updating to the latest at their leisure without getting any surprise changes.
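A minimal sketch of what a project Dockerfile following that pattern might look like (the registry, image name, date tag and app module are made up, and uv is just one example of a lockfile-based installer, assumed here to be present in the base image):

    # Frozen-in-time internal base image: OS + system packages pinned by date tag.
    FROM registry.internal.example.com/company-debian-python:20250901

    WORKDIR /app

    # Application-level dependencies come from a lockfile, pinned independently
    # of the base image.
    COPY pyproject.toml uv.lock ./
    RUN uv sync --frozen --no-dev

    # Application code goes in last, so the layers above stay cached - but
    # reproducibility comes from the frozen base image, not from the cache.
    COPY . .
    CMD ["uv", "run", "python", "-m", "myapp"]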
At that point you're doing most of the work yourself, and the value add from Docker is pretty small (although not zero) - most of the gains are coming from using a decent language-level dependency manager.
It's not, but at that point you're giving up on most of the things Docker was supposed to get you. What about when you need to upgrade a library dependency (but not all of them, just that one)?
I'm not sure what the complication here is. If application code changes, or some dependency changes, you build a new Docker image as needed, possibly with an updated Dockerfile if that's required. The Dockerfile is part of the application repo and versioned just like everything else in it. CI/CD builds and pushes a new image on PRs or tag creation, just like it would for any other application package / artifact. Frequent building and pushing of Docker images does take up registry space over time, but you can handle that by periodically cleaning out old images once you can determine they're no longer needed.
Docker re-uses layers as needed and can detect when a new layer needs to be added. It's not like images grow in size without bound each time something is changed in the Dockerfile.
The removal (or moving) of the Bitnami images from Docker Hub is going to break a ton of systems that depend on them. I helped set up https://www.stablebuild.com/ some years ago to counter these types of issues; it provides (among other things) a transparent pull-through cache for Docker Hub which automatically caches image tags and makes them immutable - the underlying tag might be deleted or modified, but you'll get the exact same original image back.
This is exactly why so many of us have been advocating for private registries and copies of every image you run in production. Pulling straight from Docker Hub was always a risk.
FYI, I've helped set up StableBuild (https://www.stablebuild.com) to help pin stuff in Docker that's normally virtually impossible to pin (e.g. OS package repos, Docker base images, random files from the internet, etc.)
Not never. E.g. all the capital we as founders put into the business before we raised our seed round was converted into Series Seed Preferred shares with the same rights as the angels / seed VC. A small portion of total equity, but still.
Packages and versions can be deleted from PyPI, which can be a massive pain in the ass for anyone consuming these packages. You can have your whole Python dependency tree pinned => author pulls a package version => builds broken. As part of StableBuild (https://www.stablebuild.com) we create full daily snapshots of the PyPI registry, so I figured it would be nice to make an overview of deleted packages/versions and make the wheels available for download.
Reading through the comments in https://news.ycombinator.com/item?id=39720007 I saw a common misconception pop up again: Dockerfiles are not deterministic. But they _look_ like they are, and even are for a while: build a Dockerfile on your local machine, then build it again => most likely exactly the same container. Stuff starts to break down quickly though, so I did this writeup some time ago that should be informative for the wider community.
That's the interesting bit about Dockerfiles. They _look_ deterministic, and they even are for a while, while you're looking at them as a developer. I've done a detailed writeup of how they're not deterministic in https://docs.stablebuild.com/why-stablebuild
I've gone down the same path. I love deterministic builds, and I think Docker's biggest fault is that to the average developer a Dockerfile _looks_ deterministic - and it even is for a while (build a container twice in a row on the same machine => same output), but then packages get updated in the package manager, base images get updated w/ the same tag, and when you rebuild a month later you get something completely different. Do that times 40 (the number of containers my team manages) and now fixing containers is a significant part of your job.
So in theory Nix would be perfect. But it's not, because it's so different. Get a tool from a vendor => won't work on Nix. Get an error => impossible to quickly find a solution on the web.
Anyway, out of that frustration I've funded https://www.stablebuild.com. Deterministic builds w/ Docker, but with containers built on Ubuntu, Debian or Alpine. Currently it consists of an immutable Docker Hub pull-through cache, full daily copies of the Ubuntu/Debian/Alpine package registries, full daily copies of the most popular PPAs, daily copies of the PyPI index (we do a lot of ML), and an arbitrary immutable file/URL cache.
So far it's been the best of both worlds in my day job: easy to write, easy to debug, wide software compatibility, and we have seen 0 issues due to non-determinism in containers that we moved over to StableBuild.
I've worked many years on bare metal. We did (by requirement) acceptance tests, so we did need deterministic builds, before such a thing even had a name, or at least before it was mentioned as much as it is nowadays.
Red Hat has a lot of tooling around versioning of mirrors, channels, releases, updates, etc. But I'm so old that even Foreman and Spacewalk didn't exist yet, Red Hat Satellite was out of the budget, and the project was migrating from the first versions of CentOS to Debian.
What I did was simply use DNS + vhosts (dev, stage, prod + versions) for our own package mirrors, plus bash + rsync (and of course RAID + backups), with both CentOS and Debian (and our project's packages).
So we had repos like prod/v1.1.0, stage/v1.1.0, dev/v1.1.0, dev/v2.0.0, dev/v2.0.1, etc., allowing us to rebuild things without praying, backport bug fixes with confidence, and so on.
Feels old and simple, but I think it's the same problem people run into now when (re)building containers.
If you need to be able to produce the same output from the same input, you need the same input.
But also Nix solves more problems than Docker. For example if you need to use different versions of software for different projects. Nix lets you pick and choose the software that is visible in your current environment without having to build a new Docker image for every combination, which leads to a combinatorial explosion of images and is not practical.
But I also agree with all the flaws of Nix people are pointing out here.
I don't have any experience with Nix, but regarding stable builds with Docker: we ship a Java application and have all dependencies pinned to fixed versions, so when doing a release, if someone is not doing anything fishy (re-releasing a particular version, which is bad-bad-bad) you will get exactly the same binaries on top of the same image (again, assuming you are not using `:latest` or some such)...
Until someone overwrites or deletes the Docker base image (regularly happens), or when you depend on some packages installed through apt - as you'll get the latest version (impossible to pin those).
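To make the apt point concrete (the version number below is hypothetical): even if you pin an exact package version in the Dockerfile, the build only works until that version is dropped from the archive, because Debian/Ubuntu mirrors only keep the latest build of each package.

    FROM ubuntu:22.04

    # Looks pinned, but only builds while this exact version is still in the
    # archive; once it is superseded, apt fails with something like
    # "E: Version '7.81.0-1ubuntu1.15' for 'curl' was not found".
    RUN apt-get update && \
        apt-get install -y curl=7.81.0-1ubuntu1.15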
I am convinced that any sort of free public service is fundamentally incompatible with long-term reproducible builds. It is simply unfair to expect a free service to maintain archives forever and never clean them up, rename itself, or go out of business.
If you want reproducibility, the first step is to copy everything to storage you control. Luckily, this is pretty cheap nowadays.
> Until someone overwrites or deletes the Docker base image (regularly happens)
Any source for that claim?
> or when you depend on some packages installed through apt - as you'll get the latest version (impossible to pin those).
Well... please re-read my previous comment - we do the Java thing, so we use a JDK base image and then slap our distribution on top of it (which is mostly fixed-version jars).
Of course, if you are after perfection and require additional packages, then you can install them via dpkg or some such, but... do you really need that? What about the security implications?
You gave an example of Nvidia and not Ubuntu itself. What's more, you are referring to a devel(opment) version, i.e. "1.0-devel-ubuntu20.04", which seems like a nightly, so it's expected to be overwritten (akin to "-SNAPSHOT" for Java/Maven)?
Besides, if you really need the utmost stability you can use an image digest instead of a tag and you will always get exactly the same image...
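For reference, a minimal sketch of digest pinning (the sha256 value is a placeholder, not a real digest):

    # Mutable: a tag like ubuntu:22.04 can be re-pushed to point at different content.
    # Immutable: the digest always resolves to the exact same image content.
    FROM ubuntu@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

The trade-off is readability: a digest says nothing about what's inside, so people usually keep the human-readable tag in a comment next to it.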
Do you have an example that isn't Nvidia? They're infamous for terrible Linux support, so an egregious disregard for tag etiquette is entirely unsurprising.
> Anyway, out of that frustration I've funded https://www.stablebuild.com. Deterministic builds w/ Docker, but with containers built on Ubuntu, Debian or Alpine.
Another option for reproducible container images is https://github.com/reproducible-containers although you may need to cache package downloads yourself, depending on the distro you choose.
For Debian, Ubuntu, and Arch Linux there are official snapshots available so you don't need to cache package downloads yourself. For example, https://snapshot.debian.org/.
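A rough sketch of how that can be used from a Dockerfile (the snapshot timestamp and packages are arbitrary examples, and the exact apt sources file layout differs between Debian image versions):

    FROM debian:bookworm

    # Point apt at a fixed point-in-time snapshot instead of the live archive.
    # Check-Valid-Until is disabled because old snapshot Release files expire.
    RUN rm -f /etc/apt/sources.list.d/debian.sources && \
        echo 'deb https://snapshot.debian.org/archive/debian/20240901T000000Z bookworm main' > /etc/apt/sources.list && \
        apt-get -o Acquire::Check-Valid-Until=false update && \
        apt-get install -y --no-install-recommends ca-certificates curl && \
        rm -rf /var/lib/apt/lists/*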
Yes, fantastic work. Downside is that snapshot.debian.org is extremely slow, times out / errors out regularly - very annoying. See also e.g. https://github.com/spesmilo/electrum/issues/8496 for complaints (but it's pretty apparent once you integrate this in your builds).
Yeah, but it's impossible to properly pin w/o running your own mirrors. Anything you install via apt is unpinnable, as old versions get removed when a new version is released; pinning multi-arch Docker base images is impossible because you can only pin on a tag, which is not immutable (pinning on hashes is architecture-dependent); Docker base images might get deleted (e.g. the nvidia-cuda base images); pinning Python dependencies, even with a tool like Poetry, is impossible, because people delete packages/versions from PyPI (e.g. jaxlib 0.4.1 this week); GitHub repos get deleted; the list goes on. So you need to mirror every dependency.
> Anything you install via apt is unpinnable, as old versions get removed when a new version is released
Huh, I have never had this issue with apt (Debian/Ubuntu) but frequently with apk/Alpine: The package's latest version this week gets deleted next week.