Automated Code Optimization with E-Graphs

ChrisRackauckas · on Jan 3, 2022

Awesome to see this here! A lot of this work was done by an undergrad as part of his thesis, and I summarized the work in this Twitter thread along with our planned directions for 2022 (https://twitter.com/ChrisRackauckas/status/14772748124604497...). Though the abstract also captures the idea quite well too:

Abstract: This thesis proposes an advanced, generic and high-level code rewriting and analysis system in the Julia programming language, providing applied equality saturation in the presence of multiple dispatch and metaprogramming. We show how our system can practically solve some challenging problems: Can programmers implement their own high-level compiler optimizations for their domain-specific scientific programs, without the requirement of them being compiler experts at all? Can these optimizers be implemented by users in the same language and inside the same programs they want to optimize, solving the two-language problem? Can these compiler optimizers be written in a high-level fashion, as equations, without the need to worry about the rewriting ordering? Thus, can symbolic mathematics do high-level compiler optimizations or vice-versa?

DannyBee · on Jan 3, 2022

For those who are not compiler experts, let me try to give a simplified (but not really wrong) explanation.

About 10 years ago, a paper came out on using something called equality saturation to do compiler optimization. This is a fancy term that really means "generate all equivalences in the IR rather than destructive IR updates".

IE if you have

a = 5 + 5

"Destructive" optimizers will rewrite this to:

a = 10

"Equality saturation" would simply annotate this with

<a = 10> a = 5 + 5

Rather than destroying the original program.

Each optimization generates sets of equivalences and equalities, rather than updating the IR. Over time, equality saturation has moved to more formally expressed rewrite systems as the optimizers.

Something eventually comes through and picks the most "optimal" set of equivalent code for the program.

Note that almost all compilers already do this in various phases for various reasons - LLVM and GCC both, for example, annotate the IR with inequalities and other equivalences prior to doing value optimizations, etc.

Equality saturation has it's niceties:

1. phase ordering is less of an issue- assuming you do this well, you will have no missed optimizations due to alternative representations that were not chosen.

2. more formalized rewrite systems are often verifiable, simpler, and have less bugs

It also has downsides:

1. lots of memory

2. very complicated exploration of the optimization space - none of the optimizations know which set of equivalences will be chosen in the end, so if they want the best code, they have to explore optimizing all the equivalences that have been generated before.

3. Most rewrite passes (formally expressed or not) individually are not optimal enough (and some can be proven impossible to make optimal) that you will still end up with phase ordering issues, just less of them, whether you express your passes as formal rewrite rules or not. In general, most of these problems end up as NP-hard constraint solving problems or equivalent.

Truthfully, it's an approach that is interesting (I went down this rabbit hole myself in LLVM at one point), and if you applied good ML or heuristics to it, i bet you could find a reasonable tradeoff of what to explore vs not, and get better results than the compiler gets now. Expressing them as formally verifiable rewrite rules rather than ad-hoc passes will also make for a lot less compiler bugs ;)

Note that the general notion of expressing optimizations in a equivalence/rewrite language and applying those iteratively, is not new (Stratego was an example of such a thing, there are many). Lots of them were destructive, there were some non-destructive ones, but none (IMHO) as well formalized as equality saturation or e-graphs. This is in part because the computational cost has historically been quite high.

In any case, fast forward 12 years, this paper generalizes equality saturation to a turing complete graph expression (the original paper used extended PEGs) and language for doing rewrites, and uses it to optimize Julia programs, and provide a way for folks to write their own rewrite/equality/etc rules.

This is awesome work. It still has the same issues (matching is NP-hard, etc).

That is, as i said, likely solvable in a practical way by better guiding matching/rewriting/equivalence finding. You can prove you can't get optimal answers anyway (it would require the programs to be statically decidable, and none are), so no point in trying.

The basic problem with both approaches, at least historically is that it's not better enough for the cost. Those who know me know I am a total sucker for abstractly nice optimization algorithms that produce really great results. I'm a big believer we've thrown away a lot of years of compiler theory in the pursuit of making "good enough" compilers. Lots of knowledge and theory is ignored when it would make things a lot better.

But even I have my limits, and pragmatically, like I said, these kinds of approaches have just never been better enough for the cost, nor have they caught on among compiler folks yet for their ability to have less buggy compilers (that part is sad, FWIW)

This is not a knock on the paper - it's a bachelors thesis, and a good one at that.

But in terms of practicality of these approaches in real compilers, i'll believe it when i see it :)

ChrisRackauckas · on Jan 3, 2022

This is a very charitable summary of the pros and cons to e-graph approaches to code rewriting. That said, the part that your analysis misses here is where it's being applied for this specific implementation. Standard optimization passes in languages like Julia/Numba/Jax need to be "general" in the sense that if they do things that violate IEEE semantics then all programs coded in that language could have hidden issues. There's usually a few knobs to tweak, like @fastmath in Julia, but those can be very heavy-handed and not domain-specific. So the question was, can we allow users of dynamic interactive JIT-compiled languages to write their own optimizing passes at runtime? The mechanism to do this is to use an e-graph specification because it can be very easy for domain scientists and engineers to describe what kind of pass they would want simply by defining equalities and a cost function (your description missed the cost function: it needs to be described as the equality saturation then tries to grab the element of the equivalence class that has the lowest cost). This gives a rather high-level yet robust mechanism for people who are not compiler engineers to then describe compiler transformations they would like to apply to their own code.

As the test case, Alessandro then applied this idea to ordinary differential equation (ODE) and differential-algebraic equation (DAE) models written in Julia's ModelingToolkit.jl symbolic system (the MetaTheory.jl and SymbolicUtils.jl symbolic rewrite systems are generic to the IR. Symbolics.jl defines a symbolic computing IR built on these tools, which then means it's a symbolic mathematics library with e-graph and traditional rule rewriting simplifiers, and ModelingToolkit.jl is a modeling framework built on this system. But note that you can directly apply this to Julia IR and LLVM IR, as shown in other examples not in this paper). The KUKA IIWA 14 robotic arm test case is a nice example because it uses rules like a*sin(x) -> 0 if a<tol, definitely a rule that you wouldn't want to generally apply, but is the kind of thing where when a scientist writes some generated code there might just be extra terms around taking a bunch of nonsense compute (this is a real-world example from a real-world user). A simple set of equalities with a cost function to minimize the floating point operations accelerates the simulation by 8.5x while changing the simulation result by ~1e-12 when the tolerance was 1e-11, that's a very useful problem-specific compilation pass that now has a 10 line syntax!

Thus we see the true application of this as not something to replace the "core" compilation passes that Julia or anything else relies on in its standard pipeline, instead we see this as a tool that scientists and application writers can use to selectively extend the compiler on the fly with domain-knowledge. This allows for compiler passes that are too specific and/or destructive to apply generically, but when applied smartly and selectively it can do some interesting things. Things like "fuse all a*b + c into fma(a,b,c)" is a rather trivial thing to write with this framework, so while a programming language may not make this the default behavior (since fma changes floating point results), this can make it easy to selectively enable such passes. That said, we still need to hook this into Julia's AbstractInterpreter so we can more easily apply it across Julia codes (I even wrote a Twitter thread saying our goal of 2022 should be to complete this https://twitter.com/ChrisRackauckas/status/14772748124604497...), there are many improvements to the e-graph saturation that could be implemented, there's many other applications to look into, etc. but I find it's a very promising start and I'm really impressed with how far Alessandro has come.

DannyBee · on Jan 3, 2022

I do get that, and I understand the application to the more general numerical/ML language space. I glossed over it because I think, in the short term, as a way of exploration/filling in shortcomings, it's a useful approach.

In the medium/long term, i hope that particular use case dies a horrible death (no offense!), and stays highly limited to a few national labs or others. In general, i think you will discover over time folks will not allow it in a production system in any way unless it was worth a billion dollars or enabled something fundamentally impossible otherwise.

That's why i glossed over it - this is not the first system i've seen try to allow normal users to add compiler optimizations for good reasons (I don't mean that sarcastically). I've even seen easy to use ones, and I've seen them generate the kinds of speedups you talk about, and be worth tons. All of them eventually stopped being used as soon as possible.

These are most useful when the underlying infrastructure/field is new/being fleshed out, like it is now. When the chaos starts to settle, they are incredibly hard to maintain, etc. It also becomes much easier to make something that serves users well enough over time.

As an aside, i'll also say that outside of a few domains, or very select applications, it is also incredibly hard to get people to care about performance, and in particular, to trade perceived (not necessarily real!) reliability or standardization for performance.

I think the TL;DR is basically - i think over time you will have a lot of trouble getting anyone writ large to want to allow or use those kind of extensions in production systems, and beyond that, the inability of the infrastructure to serve the majority of users without their help is a product failure ;)

Obviously, this is just my opinion from watching these sorts of spaces for decades and seeing how they develop, I know of others who have the exact opposite opinion (IE extensibility is more important than anything), so take it for whatever it's worth :)

ChrisRackauckas · on Jan 3, 2022

It doesn't just have to be for application-specific performance though. You can also define the cost function to be to minimize the floating point error of a specific set of codes, which is what egg-herbie does. I can see throwing a custom pass on code that says "make this be the most floating point stable version it can find" as a nice quick fix for cases where application scientists without a lot of numerical analysis experience hit cases which traditionally would've required a lot more thought and care.

Another application we have in mind is to do similar linear algebra transformations to XLA. For example, sequential matrix-vector operations (A*v1 + A*v2) can be more optimally be applied as a single BLAS3 matrix-matrix call (A*[v1;v2]). Those kinds of rules can be very easily structured as an E-graph, and doing it this way would make it very inviting to mathematicians to extend and maintain the rulesets.

Both of those are cases where you may not want those passes to always be running, but they are super helpful in many numerical applications. And there's a lot more where that came from. Forward-mode AD can be implemented quite naturally this way. And there's a few more complex examples I'll hold in my back pocket until they are completed.

jpsamaroo · on Jan 3, 2022

I think the OP's point is that running these optimizations in production as-is is dangerous because future code changes in the various places could accidentally impede the optimizer's ability to apply all the transformations that users of the codebase expect.

The obvious solution is to query the optimizer to get the final transformation as actual Julia code, and replace the pre-transform code with the post-transform optimized code, and disable any further optimization (aside from very trivial transforms that aren't worth directly including). This ensures that one doesn't accidentally lose the amazing benefits of this symbolic optimization approach on a given piece of code, and that production code always keeps its performance and correctness.

throwaway81523 · on Jan 3, 2022

GHC rewrite directives sound less powerful than this, but they have been available for years, and "ordinary" people use them. Although, Haskell tends to attract compiler nerds the way Julia attracts numerics nerds.

gugagore · on Jan 3, 2022

Here is a short (5 min) video explaining E-Graphs: https://www.youtube.com/watch?v=ap29SzDAzP

There is also a longer (30 min) video that goes into details.

I recommend watching them!

Also consider this blogpost: https://blog.sigplan.org/2021/04/06/equality-saturation-with...

minetest2048 · on Jan 3, 2022

> https://www.youtube.com/watch?v=ap29SzDAzP This video isn't available anymore

gugagore · on Jan 3, 2022

Oops, sorry: https://www.youtube.com/watch?v=ap29SzDAzP0