Hacker News | janjongboom's comments

This false sense of reproducibility is why I funded https://docs.stablebuild.com/ some years ago. It lets you pin stuff in Dockerfiles that's normally unpinnable, like OS package repos, Docker Hub tags, and random files on the internet. So you can go back to a project a year from now and actually get the same container back again.


Isn't this problem usually solved by building an actual image for your specific application, tagging that and pushing it to some Docker repo? At least that's how it's been at places I've worked that used Docker. What am I missing?


What do you do when you then actually need to make a change to your application (e.g. a 1-liner fix)? Edit the binary image?


Build and tag internal base images on a regular cadence that individual projects then use in their FROM. You’ll have `company-debian-python:20250901` as a frozen-in-time version of all your system level dependencies, then the Dockerfile using it handles application-level dependencies with something that supports a lockfile (e.g. uv, npm). The application code itself is COPY’d into the image towards the end, such that everything before it is cached, but you’re not relying on the cache for reproducibility, since you’re starting from a frozen base image.

The base image building can be pretty easily automated, then individual projects using those base images can expect new base images on a regular basis, and test updating to the latest at their leisure without getting any surprise changes.
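
A minimal sketch of that layout, assuming a hypothetical internal registry, base image name, and module name, with uv standing in as the lockfile-based tool (any equivalent works):

    # Hypothetical frozen base image, built internally on a regular cadence
    # and never overwritten once pushed.
    FROM registry.example.internal/company-debian-python:20250901

    WORKDIR /app

    # Application-level dependencies come from a lockfile, so they're pinned
    # independently of when this image happens to be rebuilt.
    COPY pyproject.toml uv.lock ./
    RUN uv sync --frozen

    # Application code goes in last: everything above stays cached, but the
    # build doesn't rely on the cache for reproducibility.
    COPY . .

    CMD ["uv", "run", "python", "-m", "app"]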


At that point you're doing most of the work yourself, and the value add from Docker is pretty small (although not zero) - most of the gains are coming from using a decent language-level dependency manager.


You can always edit the file in the container and re-upload it with a different tag. That's not best practice, but it's not exactly sorcery.


It's not, but at that point you're giving up on most of the things Docker was supposed to get you. What about when you need to upgrade a library dependency (but not all of them, just that one)?


I'm not sure what the complication here is. If application code changes, or some dependency changes, you build a new Docker image as needed, possibly with an updated Dockerfile as well if that's required. The Dockerfile is part of the application repo and versioned just like everything else in the repo. CI/CD builds and pushes a new image on PRs or tag creation, just like you would with any application package / artifact. Frequent building and pushing of Docker images can take up space over time, of course, but you can take care of that by cleaning out old images from time to time once you can determine they're no longer needed.


You append it to the end of the Dockerfile so that the previous image is still valid with its cached build steps.
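
Roughly, with made-up names: the earlier steps are unchanged, so they resolve from the build cache, and only the appended line produces a new layer.

    # Original steps, unchanged: these resolve from the layer cache on rebuild.
    FROM ubuntu:22.04
    RUN apt-get update && apt-get install -y foo bar
    COPY app/ /app/

    # Appended one-line fix: the fixed file lives in hotfix/ rather than being
    # edited in app/, so the COPY above still caches and only this layer is new.
    COPY hotfix/app.py /app/app.py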


And just keep accreting new layers indefinitely?


Docker re-uses layers as needed and can detect when a new layer needs to be added. It's not like images grow in size without bound each time something is changed in the Dockerfile.


Perhaps more focused on docker-based development workflows than final deployment.


Builds typically aren’t retained forever.


The removal (or moving) of the Bitnami images from Docker Hub is going to break a ton of systems that depend on them. I helped set up https://www.stablebuild.com/ some years ago to counter these types of issues; it provides (among other things) a transparent cache for Docker Hub which automatically caches image tags and makes them immutable - the underlying tag might be deleted or modified, but you'll get the exact same original image back.


That's what the announcement said; there is a copy of everything at https://hub.docker.com/u/bitnamilegacy


Still gonna break everyone's CI until they manually update the tag. (And who guarantees that these tags will stay alive after they pull this?)


This is exactly why so many of us have been advocating for private registries and copies of every image you run in production. Pulling straight from Docker Hub was always a risk.


Seconding this - absolutely terrific content.


FYI, I've helped set up StableBuild (https://www.stablebuild.com) to help pin stuff in Docker that's normally virtually impossible to pin (e.g. OS package repos, Docker base images, random files from the internet, etc.)


Not never. E.g. all the capital we as founders put into the business before we raised our seed round was converted into Series Seed Preferred shares with the same rights as angels / seed VCs. A small portion of total equity, but still.


Packages and versions can be deleted from PyPI, which can be a massive pain in the ass for anyone consuming these packages. Can have your whole Python dependency tree pinned => author pulls a package version => builds broken. As part of StableBuild (https://www.stablebuild.com) we create full daily snapshots of the PyPI registry - so figured it would be nice to make an overview of deleted packages/versions and make the wheels available for download.

E.g. jaxlib 0.4.4 was removed a few days back: https://dashboard.stablebuild.com/pypi-deleted-packages/pkg/... => can download the wheels for free w/o registration


Apologies for editorializing the title a bit :-)

Reading through the comments in https://news.ycombinator.com/item?id=39720007 I saw a common misconception pop up again: Dockerfiles are not deterministic. But they _look_ like they are, and even are for a while: Build a Dockerfile on your local machine; then build it again => most likely exactly the same container. Stuff starts to break down quickly though; so I did this writeup some time ago that should be informative for the wider community.
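
As a rough illustration (package and repo names made up): every input below can change underneath you even though the file itself never changes.

    # The tag is mutable: ubuntu:22.04 gets re-pushed regularly.
    FROM ubuntu:22.04

    # apt resolves to whatever the package archive serves on build day; the
    # default archives drop old versions once they're superseded.
    RUN apt-get update && apt-get install -y git foo bar

    # Cloning a branch fetches whatever that branch points at today.
    RUN git clone https://github.com/example/tool.git /opt/tool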


And that assumes that `foo` and `bar` are not overwritten or deleted in your package repository, and that the git repository remains available.


That's the interesting bit about Dockerfiles. They _look_ deterministic, and they even are for a while, while you're looking at them as a developer. I've done a detailed writeup of how they're not deterministic in https://docs.stablebuild.com/why-stablebuild


I've gone down the same path. I love deterministic builds, and I think Docker's biggest fault is that to the average developer a Dockerfile _looks_ deterministic - and it even is for a while (build a container twice in a row on the same machine => same output), but then packages get updated in the package manager, base images get updated w/ the same tag, and when you rebuild a month later you get something completely different. Do that times 40 (the number of containers my team manages) and now fixing containers is a significant part of your job.

So in theory Nix would be perfect. But it's not, because it's so different. Get a tool from a vendor => won't work on Nix. Get an error => impossible to quickly find a solution on the web.

Anyway, out of that frustration I've funded https://www.stablebuild.com. Deterministic builds w/ Docker, but with containers built on Ubuntu, Debian or Alpine. Currently consists of an immutable Docker Hub pull-through cache, full daily copies of the Ubuntu/Debian/Alpine package registries, full daily copies of most popular PPAs, daily copies of the PyPI index (we do a lot of ML), and arbitrary immutable file/URL cache.

So far it's been the best of both worlds in my day job: easy to write, easy to debug, wide software compatibility, and we have seen 0 issues due to non-determinism in the containers we moved over to StableBuild.


I think this issue is not specific to containers.

I've worked many years on bare metal. We did (by requirement) acceptance tests, so we needed deterministic builds before such a thing even had a name, or at least before it was talked about as much as it is nowadays.

Red Hat has a lot of tooling around versioning of mirrors, channels, releases, updates, etc. But I'm so old that even Foreman and Spacewalk didn't exist yet, Red Hat Satellite was out of the budget, and the project was migrating from the first versions of CentOS to Debian.

What I did was simply use DNS + vhosts (dev, stage, prod + versions) for our own package mirrors, and bash + rsync (and of course, RAID + backups), with both CentOS and Debian (and our project packages).

So we had repos like prod/v1.1.0, stage/v1.1.0, dev/v1.1.0, dev/v2.0.0, dev/2.0.1, etc., allowing us to rebuild things without praying, backport bug fixes with confidence, etc.

Feels old and simple, but I think it's the same problem/issue that people run into now (re)building containers.

If you need to be able to produce the same output from the same input, you need the same input.

BTW about stablebuild: nice project!


But also Nix solves more problems than Docker. For example, if you need to use different versions of software for different projects, Nix lets you pick and choose the software that is visible in your current environment without having to build a new Docker image for every combination - which would lead to a combinatorial explosion of images and is not practical.

But I also agree with all the flaws of Nix people are pointing out here.


I don't have any experience with Nix, but regarding stable builds of Docker: we provide a Java application and have all dependencies at fixed versions, so when doing a release, if someone is not doing anything fishy (re-releasing a particular version, which is bad-bad-bad) you will get exactly the same binaries on top of the same image (again, assuming you are not using `:latest` or somesuch)...


Until someone overwrites or deletes the Docker base image (regularly happens), or when you depend on some packages installed through apt - as you'll get the latest version (impossible to pin those).


I am convinced that any sort of free public service is fundamentally incompatible with long-term reproducible builds. It is simply unfair to expect a free service to maintain archives forever and never clean them up, rename itself, or go out of business.

If you want reproducibility, the first step is to copy everything to storage you control. Luckily, this is pretty cheap nowadays.


> Until someone overwrites or deletes the Docker base image (regularly happens)

Any source of that claim?

> or when you depend on some packages installed through apt - as you'll get the latest version (impossible to pin those).

Well... please re-read my previous comment - we do a Java thing, so we use a JDK base image and then we slap our distribution on top of it (which is mostly fixed-version jars).

Of course, if you are after perfection and require additional packages, then you can install them via dpkg or somesuch, but... do you really need that? What about the security implications?


> Any source of that claim?

Any tag like ubuntu:20.04 -> this tag gets overwritten every time there's a new release (which is very often)

https://hub.docker.com/r/nvidia/cuda -> these get removed (see e.g. https://stackoverflow.com/questions/73513439/on-what-conditi...)


You gave an example of nvidia and not ubuntu itself. What's more, you are referring to a devel(opment) version, i.e. "1.0-devel-ubuntu20.04", which seems like a nightly, so it's expected to be overridden (akin to "-SNAPSHOT" for Java/Maven)?

Besides, if you really need utmost stability you can use an image digest instead of a tag, and you will always get exactly the same image...
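
For reference, a digest pin looks like the line below; the digest is a placeholder (the real value can be looked up with e.g. `docker buildx imagetools inspect ubuntu:20.04` or on Docker Hub), and pinning the multi-arch manifest-list digest rather than a per-platform digest keeps the same line working across architectures:

    # Placeholder digest - substitute the actual manifest digest for the tag.
    FROM ubuntu@sha256:0000000000000000000000000000000000000000000000000000000000000000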


Do you have an example that isn't Nvidia? They're infamous for terrible Linux support, so an egregious disregard for tag etiquette is entirely unsurprising.


> Anyway, out of that frustration I've funded https://www.stablebuild.com. Deterministic builds w/ Docker, but with containers built on Ubuntu, Debian or Alpine.

Very nice project!


Another option for reproducible container images is https://github.com/reproducible-containers although you may need to cache package downloads yourself, depending on the distro you choose.


Yeah, very similar approach. We did this before, see e.g. https://www.stablebuild.com/blog/create-a-historic-ubuntu-pa... - but then figured everyone needs exactly the same packages cached, so why not set up a generic service for that.


For Debian, Ubuntu, and Arch Linux there are official snapshots available so you don't need to cache package downloads yourself. For example, https://snapshot.debian.org/.
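
A rough sketch of using it from a Dockerfile (timestamp and package are illustrative, and the exact sources setup depends on the base image - e.g. bookworm ships a deb822 file under /etc/apt/sources.list.d/ that needs replacing):

    FROM debian:bookworm

    # Point apt at a fixed point-in-time snapshot instead of the moving archive.
    RUN rm -f /etc/apt/sources.list.d/debian.sources \
     && echo "deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20240101T000000Z bookworm main" \
          > /etc/apt/sources.list \
     && apt-get update \
     && apt-get install -y --no-install-recommends curl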


Yes, fantastic work. Downside is that snapshot.debian.org is extremely slow, times out / errors out regularly - very annoying. See also e.g. https://github.com/spesmilo/electrum/issues/8496 for complaints (but it's pretty apparent once you integrate this in your builds).


Ubuntu now has snapshot.ubuntu.com, see https://ubuntu.com/blog/ubuntu-snapshots-on-azure-ensuring-p...

Here's a related discussion about reproducible builds by the Docker people, where they provide some more details: https://github.com/docker-library/official-images/issues/160...


Just pin the dependencies and you're mostly fine, right?


Yeah, but it's impossible to properly pin w/o running your own mirrors. Anything you install via apt is unpinnable, as old versions get removed when a new version is released; pinning multi-arch Docker base images is impossible because you can only pin on a tag, which is not immutable (pinning on hashes is architecture dependent); Docker base images might get deleted (e.g. nvidia-cuda base images); pinning Python dependencies, even with a tool like Poetry, is impossible, because people delete packages / versions from PyPI (e.g. jaxlib 0.4.1 this week); GitHub repos get deleted; the list goes on. So you need to mirror every dependency.
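
To make the apt point concrete, this is the kind of pin that is syntactically fine but rots (package name and version invented for the example):

    FROM ubuntu:22.04

    # The exact-version pin works today, but the default archive only keeps
    # the current version of each package, so this line starts failing as
    # soon as 1.2.3-1 is superseded and removed.
    RUN apt-get update && apt-get install -y foo=1.2.3-1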


> Anything you install via apt is unpinnable, as old versions get removed when a new version is released

Huh, I have never had this issue with apt (Debian/Ubuntu) but frequently with apk/Alpine: The package's latest version this week gets deleted next week.


> apt is unpinnable, as old versions get removed

not necessarily, eg snapshot.debian.org

> pinning on hashes is architecture dependent

can't you pin the multi-arch manifest instead?

I still like StableBuild for protection against package deletion, and mirroring non-pinnable deps


The pricing page for StableBuild says

Free …

Number of Users 1

Number of Users 15GB

Is that a mistake, or if not, can you explain please?

https://www.stablebuild.com/pricing


Ah, yes, on mobile it shows the wrong pricing table... Copying here while I get it fixed:

Free => Access to all functionality, 1 user, 15GB traffic/month, 1GB of storage for files/URLs. $0

Pro => Unlimited users, 500GB traffic included (overage fees apply), 1TB of storage included. $199/mo

Enterprise => Unlimited users, 2,000GB traffic included (overage fees apply), 3TB of storage included, SAML/SSO. $499/mo


Are you associated with the project?


I’m an investor in StableBuild.


What is an efficient process to avoid using versions with known vulnerabilities for a long time when using a tool like StableBuild?

