
Can vouch for this, having fixed non-determinism bugs in a compiler. Nobody is happy if your builds aren't reproducible. You'll also suffer crazy performance problems as everything downstream rebuilds randomly and all your build caches randomly miss.


NixOS with its nixpkgs [0] and cache [1] would also not work if compilers weren't reproducible. They do avoid things like PGO and certain optimization flags, though, as those would very likely lead to unreproducible builds. For example, most distros ship a PGO-optimized build of Python, while NixOS does not.

[0] https://github.com/nixos/nixpkgs

[1] https://cache.nixos.org/


PGO can be used in such situations, but the profile needs to be checked in. Same code + same profile -> same binary (assuming the compiler is deterministic, which is tested quite extensively).

There are several big projects that use PGO (Chrome, for example), and you can get a deterministic build at any given revision, since the profiles are checked in to the repository.
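As a sketch, a checked-in-profile workflow with clang's instrumented PGO looks roughly like this (file names and the workload are placeholders, not anything from a real project):

```shell
# Step 1: build an instrumented binary (done once per profile refresh).
clang -O2 -fprofile-instr-generate app.c -o app-instrumented

# Step 2: run a representative workload to collect raw profiles.
LLVM_PROFILE_FILE="app-%p.profraw" ./app-instrumented --bench

# Step 3: merge the raw profiles into one .profdata and check it in.
llvm-profdata merge -output=app.profdata app-*.profraw
git add app.profdata

# Step 4: all subsequent builds use the checked-in profile, so
# same source + same profile -> same binary (given a deterministic compiler).
clang -O2 -fprofile-instr-use=app.profdata app.c -o app
```

Only steps 1-3 need the benchmark environment; everyone else just runs step 4 against the committed profile.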


It’s called AutoFDO, although I’ve struggled to get it working well in Rust.


It's not called AutoFDO. AutoFDO refers to a specific sampling-based profiling technique out of Google (https://dl.acm.org/doi/abs/10.1145/2854038.2854044). Sometimes people will refer to that as PGO (with PGO and FDO being somewhat synonymous, though PGO seems to be the preferred term in the open-source LLVM world). Chrome specifically uses instrumented PGO, which is very much not AutoFDO.

PGO works just fine in Rust and has support built into the compiler (https://doc.rust-lang.org/rustc/profile-guided-optimization....).
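Following the rustc docs linked above, the Rust flow mirrors clang's; the binary name and profile directory here are arbitrary:

```shell
# 1. Build with instrumentation; raw profiles land in ./pgo-data at runtime.
RUSTFLAGS="-Cprofile-generate=$(pwd)/pgo-data" cargo build --release

# 2. Run the binary on a representative workload.
./target/release/myapp --bench

# 3. Merge the raw profiles (llvm-profdata ships with the
#    llvm-tools rustup component).
llvm-profdata merge -o merged.profdata ./pgo-data

# 4. Rebuild with the merged profile.
RUSTFLAGS="-Cprofile-use=$(pwd)/merged.profdata" cargo build --release
```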


I wasn’t trying to conflate the two. PGO traditionally meant an instrumented (trace-based) build, but as a term it’s pretty generic; to me it covers the general concept of “you have profile information that replaces the compiler’s generically tuned heuristics.” AutoFDO I’d classify as an extension of that concept into a more general PGO technique, kind of like ThinLTO vs LTO. Specifically, it generates the “same” information to supplant compiler heuristics, but is more flexible in that the samples can be fed back into “arbitrary” versions of the code using normal sampling techniques instead of an instrumented trace. Sampling is better because it more easily fits into capturing data from production, which is much harder to accomplish for the tracing variant (due to perf overheads). Additionally, because it works across versions, the amortized compile cost drops from 2x to 1x: you only need to reseed your profile data periodically.

I was under the impression they had switched to AutoFDO across the board but maybe that’s just for their cloud stuff and Chrome continues to run a representative workload since that path is more mature. I would guess that if it’s not being used already, they’re exploring how to make Chrome run AutoFDO for the same reason everyone started using ThinLTO - it brought most of the advantages while fixing the disadvantages that hampered adoption.

And yes, while PGO is available natively, AutoFDO isn’t quite as smooth.


I'm not sure where you're getting your information from.

Chrome (and many other performance-critical workloads) is using instrumented PGO because it gives better performance gains, not because it's a more mature path. AutoFDO is only used in situations where collecting data with an instrumented build is difficult.


Last I looked, AutoFDO builds were similar in performance to instrumented PGO, much as ThinLTO is to LTO. I’d say that collecting data with an instrumented Chrome build is extremely difficult - you’re relying on your synthetic benchmark environment, which is very different from the real world (e.g. extensions aren’t installed, the pattern of websites being browsed isn’t realistic, etc.). There’s also a 2x compile cost: you have to build Chrome twice in the exact same way, plus run a synthetic benchmark on each build to generate the trace.

I’m just using an educated guess to say that at some point in the future Chrome will switch to AutoFDO, potentially using traces harvested from end user computers (potentially just from their employees even to avoid privacy complaints).


You can make the synthetic benchmarks relatively accurate; it just takes effort. The compile-time hit and additional effort are often worth it for the extra couple percent on important applications.

The performance gap also matters at the scales performance engineers care about for these sorts of production codebases (unlike ThinLTO, which matched full LTO's performance while fixing its build-system scalability problems). The original AutoFDO paper shows the improvement going from 10.5% with AutoFDO to 12.5% with instrumented PGO. That is pretty big. It's probably even bigger with newer instrumentation-based techniques like CSPGO.

They also mention the exact reasons that AutoFDO will not perform as well, with issues in debug info and losing profile accuracy due to sampling inaccuracy.

I couldn't find any numbers for Chrome, but I am reasonably certain that they have tried both and continue to use instrumented PGO for the extra couple percent. There are other pieces of the Chrome ecosystem (specifically the ChromeOS kernel) that are already optimized using sampling-based profiling. It's been a while since I last talked to the Chromium toolchain people about this though. I also remember hearing them benchmark FEPGO vs IRPGO at some point and concluding that IRPGO was better.


Yeah, and nixpkgs also, last time I checked, patches GCC/clang to ensure determinism. Many compilers and toolchains by default embed build information (timestamps, absolute paths) that leaks from the build environment in a non-deterministic/non-reproducible manner.
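For reference, these are the knobs distros typically lean on to get determinism out of a stock GCC/clang (a sketch; exact behavior varies by compiler version):

```shell
# Pin the timestamp that GCC (and newer clang) substitutes for
# __DATE__/__TIME__, instead of the actual build time.
export SOURCE_DATE_EPOCH=0

# Strip absolute build paths from debug info and __FILE__ expansions,
# so the build directory doesn't leak into the binary.
gcc -O2 -g -ffile-prefix-map="$PWD"=. main.c -o main

# Fix the seed GCC uses for otherwise-randomized internal symbol names.
gcc -O2 -frandom-seed=myfixedseed main.c -o main

# Or just make the non-determinism loud: warn whenever source code
# uses __DATE__/__TIME__/__TIMESTAMP__ at all.
gcc -O2 -Wdate-time main.c -o main
```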


Yup. Even so much as inserting the build timestamp into the generated executable (which is strangely common) causes havoc with build caching.



