
It is normally not a necessary feature of a compiler to be deterministic. A compiler should be correct against a specification. If the specification allows nondeterminism, a compiler should be able to exploit it. I remember the story of the Sather-K compiler that did things differently based on the phase of the moon.


It's technically correct that a language specification is rarely precise enough to require compiler output to be deterministic.

But it's pragmatically true that engineers will want to murder you if your compiler is non-deterministic. All sorts of build systems, benchmark harnesses, supply chain validation tools, and other bits of surrounding ecosystem will shit the bed if the compiler doesn't produce bitwise identical output on the same input and compiler flags.


Can vouch for this having fixed non-determinism bugs in a compiler. Nobody is happy if your builds aren't reproducible. You'll also suffer crazy performance problems as everything downstream rebuilds randomly and all your build caches randomly miss.
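To make the downstream damage concrete, here's a minimal sketch (a toy "compiler" and a hash-keyed cache, not any real build system) of why a single nondeterministic output byte cascades into cache misses:

```python
import hashlib, os

def compile_unit(source: str, deterministic: bool) -> bytes:
    # Toy "compiler": the nondeterministic mode embeds junk bytes,
    # standing in for a timestamp, temp path, or hash-map iteration order.
    stamp = b"" if deterministic else os.urandom(8)
    return b"OBJ:" + source.encode() + stamp

def cache_key(*inputs: bytes) -> str:
    # Content-addressed cache key, in the style of ccache/Bazel-like caches.
    h = hashlib.sha256()
    for part in inputs:
        h.update(part)
    return h.hexdigest()

src = "int main() { return 0; }"
# Deterministic compiler: the downstream (link) step's cache key is stable.
print(cache_key(compile_unit(src, True)) ==
      cache_key(compile_unit(src, True)))   # True
# Nondeterministic compiler: every rebuild produces a new key, so
# everything keyed on this object file misses and rebuilds too.
print(cache_key(compile_unit(src, False)) ==
      cache_key(compile_unit(src, False)))  # False
```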


NixOS with its nixpkgs [0] and cache [1] would also not work if compilers weren't reproducible. Though they won't use something like PGO or certain optimization flags, as these would very likely lead to unreproducible builds. For example, most distros ship a PGO-optimized build of Python while NixOS does not.

[0] https://github.com/nixos/nixpkgs

[1] https://cache.nixos.org/


PGO can be used in such situations, but the profile needs to be checked in. Same code + same profile -> same binary (assuming the compiler is deterministic, which is tested quite extensively).

There are several big projects that use PGO (like Chrome), and you can get a deterministic build at whatever revision using PGO as the profiles are checked in to the repository.


It’s called AutoFDO, although I’ve struggled to get it working well in Rust.


It's not called AutoFDO. AutoFDO refers to a specific sampling-based profiling technique out of Google (https://dl.acm.org/doi/abs/10.1145/2854038.2854044). Sometimes people will refer to that as PGO though (with PGO and FDO being somewhat synonymous, but PGO seeming to be the preferred term in the open source LLVM world). Chrome specifically uses instrumented PGO, which is very much not AutoFDO.

PGO works just fine in Rust and has support built into the compiler (https://doc.rust-lang.org/rustc/profile-guided-optimization....).


I wasn’t trying to conflate the two. PGO traditionally meant a trace build, but as a term it’s pretty generic, at least to me: the general concept of “you have profile information that replaces generically tuned heuristics that the compiler uses”. AutoFDO I’d classify as an extension of that concept to a more general PGO technique; kind of like ThinLTO vs LTO. Specifically, it generates the “same” information to supplant compiler heuristics, but is more flexible in that the samples can be fed back into “arbitrary” versions of the code using normal sampling techniques instead of an instrumented trace. The reason sampling is better is that it more easily fits into capturing data from production, which is much harder to accomplish for the tracing variant (due to perf overheads). Additionally, because it works across versions, the amortized compile cost drops from 2x to 1x, because you only need to reseed your profile data periodically.

I was under the impression they had switched to AutoFDO across the board but maybe that’s just for their cloud stuff and Chrome continues to run a representative workload since that path is more mature. I would guess that if it’s not being used already, they’re exploring how to make Chrome run AutoFDO for the same reason everyone started using ThinLTO - it brought most of the advantages while fixing the disadvantages that hampered adoption.

And yes, while PGO is available natively, AutoFDO isn’t quite as smooth.


I'm not sure where you're getting your information from.

Chrome (and many other performance-critical workloads) is using instrumented PGO because it gives better performance gains, not because it's a more mature path. AutoFDO is only used in situations where collecting data with an instrumented build is difficult.


Last I looked, AutoFDO builds were similar in performance to PGO, much as ThinLTO is to LTO. I’d say that collecting data with an instrumented Chrome build is extremely difficult - you’re relying on your synthetic benchmark environment, which is very, very different from the real world (e.g. extensions aren’t installed, the pattern of websites being browsed is not realistic, etc.). There’s also a 2x compile cost because you have to build Chrome twice in the exact same way, plus you have to run a synthetic benchmark on each build to generate the trace.

I’m just using an educated guess to say that at some point in the future Chrome will switch to AutoFDO, potentially using traces harvested from end user computers (potentially just from their employees even to avoid privacy complaints).


You can make the synthetic benchmarks relatively accurate, it just takes effort. The compile-time hit and additional effort is often worth it for the extra couple percent for important applications.

Performance is also pretty different on the scales that performance engineers are interested in for these sorts of production codes, but without the build system scalability problems that LTO has. The original AutoFDO paper shows an improvement of 10.5%->12.5% going from AutoFDO to instrumented PGO. That is pretty big. It's probably even bigger with newer instrumentation based techniques like CSPGO.

They also mention the exact reasons that AutoFDO will not perform as well, with issues in debug info and losing profile accuracy due to sampling inaccuracy.

I couldn't find any numbers for Chrome, but I am reasonably certain that they have tried both and continue to use instrumented PGO for the extra couple percent. There are other pieces of the Chrome ecosystem (specifically the ChromeOS kernel) that are already optimized using sampling-based profiling. It's been a while since I last talked to the Chromium toolchain people about this though. I also remember hearing them benchmark FEPGO vs IRPGO at some point and concluding that IRPGO was better.


Yeah, and last time I checked, nixpkgs also patches GCC/Clang to ensure determinism. Many compilers and toolchains by default want to, e.g., embed build information that may leak from the build env in a non-deterministic/non-reproducible manner.


Yup. Even so much as inserting the build timestamp into the generated executable (which is strangely common) causes havoc with build caching.


NVCC CUDA builds were nondeterministic last time I checked, it made certain things (trying to get very clever with generating patches) difficult. This was also hampered by certain libraries (maybe GTSAM?) wanting to write __DATE__ somewhere in the build output, creating endlessly changing builds.


In parallel computing you run into nondeterminism pretty quickly anyways - especially with CUDA because of undetermined execution order and floating point accuracy.


Yes, at runtime. Compiling CUDA doesn’t require a GPU, though, and doesn’t really use “parallel computing”. I think CUDA via clang gets this right and will produce the same build every time - it was purely an NVCC issue.


I’m amused by the possibility of a compiler having a flag to set a random seed (with a fixed default, of course).

If you hit a compiler bug, you could try a different seed to see what happens.

Or how about a code formatter with a random seed?

Tool developers could run unit tests with a different seed until they find a bug - or hide the problem by finding a lucky seed for which you have no provable bugs :)

Edit:

Or how about this: we write a compiler as a nondeterministic algorithm where every output is correct, but they are optimized differently depending on an input vector of choices. Then use machine learning techniques to find the picks that produce the best output.
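A toy sketch of that idea (all names invented; the "compiler" just reorders independent operations, so every output is correct, and only a made-up cost model differs across the choice vector):

```python
import random

def compile_with_choices(ops, seed):
    # Every schedule computes the same result (the ops are independent),
    # so every output is "correct"; only the fake latency cost differs.
    rng = random.Random(seed)      # the seed stands in for a choice vector
    schedule = ops[:]
    rng.shuffle(schedule)
    cost = sum(i * latency for i, latency in enumerate(schedule))
    return schedule, cost

ops = [5, 1, 4, 2, 3]  # per-op latencies
# Search the choice space for the cheapest correct schedule
# (an ML policy would propose seeds instead of enumerating them).
best_schedule, best_cost = min(
    (compile_with_choices(ops, seed) for seed in range(200)),
    key=lambda result: result[1],
)
# The search itself stays deterministic: choices are enumerated, not drawn.
again = min((compile_with_choices(ops, seed) for seed in range(200)),
            key=lambda result: result[1])
print((best_schedule, best_cost) == again)  # True
```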


Plus, these models are entirely black boxes. Even given weights, we don't know how to look at them and meaningfully tell what's happening - and not only that, but training these models is likely not cheap at all.

Stable output is how we can verify that attacks like the one described in Reflections on Trusting Trust[0] don't happen.

[0] https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...


> But it's pragmatically true that engineers will want to murder you if your compiler is non-deterministic. All sorts of build systems, benchmark harnesses, supply chain validation tools, and other bits of surrounding ecosystem will shit the bed if the compiler doesn't produce bitwise identical output on the same input and compiler flags.

I think that’s rather true nowadays, but hasn’t always been thus. Back in the 20th century, non-deterministic compiler output was very common - even if only due to the common practice of embedding the compilation timestamp in the resulting executable - and very few ever cared. Whereas nowadays, there is a much bigger culture of hermetic/reproducible build processes, in which stuff like embedding compilation timestamps in the executable or object files is viewed as an antipattern.
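The modern convention for squaring build timestamps with reproducibility is the reproducible-builds SOURCE_DATE_EPOCH environment variable; a minimal sketch of how a build tool honors it:

```python
import os, time

def build_timestamp() -> int:
    # Reproducible-builds convention: if SOURCE_DATE_EPOCH is set,
    # the pinned epoch wins over the wall clock.
    pinned = os.environ.get("SOURCE_DATE_EPOCH")
    return int(pinned) if pinned is not None else int(time.time())

os.environ["SOURCE_DATE_EPOCH"] = "1700000000"
print(build_timestamp())  # 1700000000, no matter when you build
```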


Just fix the random seed :)


LLMs can be deterministic if you set the random seed and pin it to a certain version of the weights.

My bigger concern would be bugs in the machine code would be very, very difficult to track down.


> It is normally not a necessary feature of a compiler to be determistic. A compiler should be correct against a specification.

That sounds like a nightmare. Optimizing code to play nicely with black-box heuristic compilers like V8's TurboFan is already, in fact, a continual maintenance nightmare.

If you don't care about performance, non-deterministic compilation is probably "good enough." See TurboFan.


LLMs are deterministic. We inject randomness after the fact, just because we don't like our text being deterministic. Turn temperature to 0 and you're good.


But temperature-0 LLMs don't exhibit the emergent phenomena we like, even in apparently non-creative tasks. The randomness is, in some sense, a cheap proxy for an infeasible search over all completion sequences, much like simulated annealing: at zero temperature it's a search for a local optimum, but adding randomness makes it explore globally and find more interesting possibilities.


Sure but you could add pseudo random noise instead and get the same behavior while retaining determinism.
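As a sketch of that point (toy logits, Python's seeded random as the pseudo-random source): temperature 0 is just argmax, and any other temperature is reproducible once the seed is fixed:

```python
import math, random

def sample(logits, temperature, rng):
    # Greedy decode when temperature is (effectively) zero: pure argmax.
    if temperature < 1e-6:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature, then a draw from the seeded RNG.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return rng.choices(range(len(logits)),
                       weights=[e / total for e in exps])[0]

logits = [2.0, 1.0, 0.5]
# Temperature 0: always token 0, the argmax; no randomness consumed.
print(sample(logits, 0.0, random.Random(0)))   # 0
# Same seed + same temperature -> the same "creative" output every time.
a = sample(logits, 1.2, random.Random(42))
b = sample(logits, 1.2, random.Random(42))
print(a == b)  # True
```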


Temperature is at ~1.2 in this thread, here's some 0.0:

- Yes, temperature 0.0 is less creative.

- Injecting pseudo-random noise to get deterministic creative outputs is "not even wrong", in the Wolfgang Pauli sense. It's fixing something that isn't broken, with something that can't fix it, that if it could, would be replicating the original behavior - more simply, it's proposing non-deterministic determinism.

- Temperature 0.0, in practice, is still an LLM. There aren't emergent phenomena missing, in the sense "emergent phenomena" is used with LLMs. Many, many, many applications use this.

- In simplistic scenarios, on very small models, 0.0 could get stuck literally repeating the same token.

- There's a whole other layer of ex. repeat penalties/frequency penalties and such that are used during inference to limit this. Only OpenAI and llama.cpp expose repeat/frequency.

- Temperature 0.0 is still non-deterministic on ex. OpenAI, though substantially the same, and even the same most of the time. It's hard to notice differences. (Reproducible builds require extra engineering effort, the same way ensuring temperature = 0.0 is truly deterministic requires engineering effort.)

- Pedantically, only temperature 0.0 at the same seed (initial state) is deterministic.


Even then, though, the output could change drastically based on a single change to the input (such as a comment).

That's not something you want in a compiler.


It is very important for a compiler to be deterministic. Otherwise you can't validate the integrity of binaries! We already have issues with reproducibility without adding this shit in the mix.


Reproducible builds are an edge case that requires deterministic compilation, for sure. But profile-based optimisation or linker address randomisation are sometimes also useful. Why rule out one thing for the other? Normally you can easily turn optimisation on and off depending on your need. Just use -O0 if you want determinism. But normally you should not rely on it (also at execution time).



