> Anyone waiting for 10x improvements is likely to be disappointed. I don’t see that happening, except for the occasional unusual program (such as the chess engine case mentioned above).
> The compiler is a large, mature project. It’s well over 10 years old, and hundreds of thousands of lines. Projects like that tend to not get quantum leaps in performance. Many small improvements is more typical.
And
> And it’s hard to radically change the data structures used in a mature compiler.
As the author describes, the compiler is likely fairly well optimized for its existing way of doing things. 10x improvements would come from architectural changes, and it sounds like the compiler is so big now that this is difficult. The other piece that makes it difficult is that LLVM is a separate, also very large, project.
An example of an architectural change I think hasn’t been investigated enough is in the space of PGO. While compilers use PGO to choose better heuristics, they don’t use it to control the optimization levels applied to different parts of the code. Most programs have performance-critical compute paths, but the rest of the program is nowhere near as hot. I feel like dropping expensive optimization passes for “cold” code would provide significant savings, and Rust already lets you annotate cold/hot functions. It would be nice if this were combined with call-flow analysis to build something like adaptive bitrate (ABR) streaming, but for optimization.
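Rust's existing annotations only gesture at this. A minimal sketch of what stable Rust offers today: `#[cold]` influences inlining and code placement, but it does not reduce the optimization effort spent on a function, which is the per-function knob wished for above.

```rust
// `#[cold]` marks a function as unlikely to run; the backend deprioritizes
// it for inlining and moves it out of the hot code path. It does NOT
// lower the optimization level applied to its body.
#[cold]
#[inline(never)]
fn cold_fallback(msg: &str) -> u64 {
    eprintln!("slow path: {}", msg);
    0
}

// `#[inline]` hints that this function is worth inlining into callers.
#[inline]
fn hot_path(x: u64) -> u64 {
    // stand-in for performance-critical arithmetic
    x.wrapping_mul(3).wrapping_add(1)
}

fn main() {
    let v = hot_path(42);
    if v == 0 {
        println!("{}", cold_fallback("unexpected zero"));
    } else {
        println!("{}", v); // prints 127
    }
}
```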
Anyway, such things are difficult research projects to attempt on existing large codebases. I believe the author that further meaningful performance wins are unlikely without deeper architectural changes. This is very similar to comments that Lemire makes in [1] and is well understood to be the same (often misunderstood) critique as Knuth made many decades ago about premature optimization. Architecture trumps code tuning.
Where can I read up on this? AFAIK PGO is used in lieu of various heuristics within a pass, but is not used to remove optimization passes on “dead code”.
Good question, I unfortunately don't have a link to give you, or fine details, but GCC does hot/cold partitioning and applies different optimization level to hot and cold code blocks. IIRC, it uses the level given on the command line for hot code, and -Os for cold code.
There are many proven ways to speed up a compiler (for example: Pascal), but the nasty truth is that most of them require changing the language.
That is why it would be impossible to fix Python (you can't assert that anything will keep its type), or in this case Rust (several features are "made" in a way that makes the compiler pipeline sad!).
You need a certain discipline in developing the language/syntax/semantics so the compiler can exploit them for speed.
---
However, I suspect it must be possible to speed up Rust and other languages massively, because an RDBMS can chew through gigabytes in seconds. I now wonder if a "compiler database engine" could help, so you could store the lexemes, sources, inferences, etc. and query them.
Also, a LOT of a compiler's work is just repeating the same steps over and over, and there is some research on saving pre-computed snippets to avoid that.
This doesn't feel like the same thing. The performance benefits of an RDBMS come from the way its storage is organized and (at least in typical RDBMS scenarios) the way queries are dynamically optimized to traverse that storage, based on the data it contains.
The thing rustc calls "queries" is more about incremental compilation: reusing work within and across compiler invocations, but without the sort of impact on memory layout and access patterns that you get from an RDBMS.
The closest thing I've seen in a compiler is rather Zig's AST memory layout work: https://github.com/ziglang/zig/pull/7920. But this is a very "hand-crafted" version of what an RDBMS does.
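To make the contrast concrete: rustc's query system is, at heart, memoization. A grossly simplified sketch of the pattern, with illustrative names (not rustc's real API), and without the dependency tracking and input hashing that make real incremental compilation sound:

```rust
use std::collections::HashMap;

// Hypothetical query context: each "query" computes a result once and
// caches it; later calls reuse the cached value.
struct Queries {
    type_of_cache: HashMap<u32, String>,
}

impl Queries {
    fn type_of(&mut self, def_id: u32) -> String {
        if let Some(t) = self.type_of_cache.get(&def_id) {
            return t.clone(); // cache hit: no recomputation
        }
        let computed = format!("ty#{}", def_id); // stand-in for real analysis
        self.type_of_cache.insert(def_id, computed.clone());
        computed
    }
}

fn main() {
    let mut q = Queries { type_of_cache: HashMap::new() };
    println!("{}", q.type_of(7)); // computed
    println!("{}", q.type_of(7)); // served from the cache
}
```

Note that none of this changes how the cached data is laid out in memory, which is where an RDBMS gets its speed.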
It's hard work. Small AST changes often require hundreds of changes to the code. The required changes usually make the AST less ergonomic to work with. And the perf benefits I obtained were very small, even after shrinking `ast::Expr` (by far the most common AST node kind) from over 100 bytes to 64 bytes on 64-bit platforms.
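The classic way to shrink a node like that is to box the rare, large variants. A sketch with hypothetical node types (not rustc's actual AST), showing the effect on 64-bit targets:

```rust
use std::mem::size_of;

// The large inline payload inflates EVERY value of this enum.
#[allow(dead_code)]
enum BigExpr {
    Lit(i64),
    Closure([u64; 8]), // 64-byte payload carried inline
}

// Boxing the big variant trades a pointer chase for a smaller node.
#[allow(dead_code)]
enum SmallExpr {
    Lit(i64),
    Closure(Box<[u64; 8]>), // 8-byte pointer instead of 64 bytes inline
}

fn main() {
    // Typically prints "72 16" on 64-bit targets.
    println!("{} {}", size_of::<BigExpr>(), size_of::<SmallExpr>());
}
```

This also illustrates the ergonomics cost mentioned above: every construction and match of the boxed variant now goes through the `Box`.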
The linked Zig PR has very impressive reductions in walltime/cycles/etc. but if you read closely it's restricted just to parsing, which is usually a very small slice of compile time, at least for rustc. My experience with these kinds of changes was disappointing. I concluded "I’d love to be proven wrong, but it doesn’t feel like this work is taking place in a good part of the effort/benefit curve."
Of course Zig is a very different language and its compiler handles a rather different workload. It's totally possible that their approach makes more sense in younger codebases, or with a different source language design, or whatever else. But I also don't think node size tells the whole story: there's a synergy between memory usage, memory layout, and memory access patterns. For example Cranelift gets a lot of mileage from tweaking their algorithms in combination with their data structures, e.g. the "half-move" design mentioned in https://cfallin.org/blog/2022/06/09/cranelift-regalloc2/#per...
Perf was a bit of a wash on this one but it means we can serialize most of the compiler's state with a single pwritev syscall. For a 300,000 line codebase, this data takes up 30 MiB and uses the same format on disk as in memory. On my laptop, 30 MiB can be written from memory to disk in 25ms. This is one puzzle piece for incremental compilation. More puzzle pieces are listed in the PR description here: https://github.com/ziglang/zig/pull/16917
I guess I'm in the thought camp (a vanishingly small camp, if HN is any measure) that compile times don't matter. If you want short compile times, use interpreted languages. I appreciate the checks, validation, and optimization a compiler gives, and Rust cuts down on total development time because of the extensive checks it performs.
I remember a time when C++ projects took hours to build that today take only a few minutes. Perhaps that perspective is an assist when 'dealing' with compilation times.
I also don't think tests need to be blazing fast. I'd rather have correct tests than fast tests, just like I'd rather have a thorough compilation than a fast one.
Anecdotally, even with a project that has an upwards of 50 dependencies, many of them *-sys crates, I've never thought that compilation times were that bad.
Speaking of interpreted languages, JavaScript projects have historically taken far longer to compile (looking at you, Meteor) and are rarely incremental.
That's not to say "don't optimize things", of course, but there are seemingly many people very upset about or critical of Rust's compile times, which I've never fully understood.
I use rust because of the value it brings for the reasons you say and for others, but it would bring me even more value if it built faster. Compile times absolutely matter; how they trade off against other things will depend on the particular project.
We could do day-to-day Rust/C++/any compiled language related development with an interpreter/JIT and then compile the code once a PR is ready and the code hits CI/CD or gets deployed. The "instant" runtime doesn't even need to have the exact same behavior -- it can be 99% accurate and still be very useful.
All mainstream build systems already support incremental builds, and in some cases it's even possible to leverage compiler cache to further shave down the time it takes to rebuild a project.
Breaking down a monolithic project into submodules is all it takes to shave off a big chunk of time from a build.
Imagine being able to set a breakpoint in a function, change something and ask the debugger to “travel” back in time and pretend the code was always what it’s been changed to. It’s something some languages give you but I’m not sure an incremental compiler is sufficient?
These kinds of few seconds/minutes shaved add up and can impact one’s productivity.
for that kind of thing I would imagine more late binding in the language is helpful
at the level Rust works, late binding is hard to do but I don't think it's impossible. Name mangling and calling conventions are a couple big obstacles.
I don't understand this take. Correct and fast are absolutely not mutually exclusive, whether it applies to tests or compilation. The Rust compiler is not slow because of all the checks it performs; it's slow because of the way those checks are implemented.
> If you want short compile times, use interpreted languages
Interpreted languages are not compiled, so this sentence doesn't make sense.
> Interpreted languages are not compiled, so this sentence doesn't make sense.
Usually it's a moot point, but in the context of the post it isn't. Python, Lua and JavaScript are compiled to bytecode / intermediate representations. That's compilation, but usually not mentioned, as it's so fast. However, in this context, that speed is the exact point. Use interpreted languages if compilation times (or lack thereof) are a concern!
The lines are blurry though. C# and Java are also interpreted languages (compiled to intermediate representation, then executed on the language runtime aka VM; possibly JITed to native machine code), but no one calls them that.
There is a very clear difference between compilation and interpretation: in scripting languages, a statement does not depend on the following ones to be executed, meaning that the code can just be read through. Compilation is different because the binary produced needs to include the whole program, therefore the whole code has to be read and analyzed before the bytecode can be produced. Compile literally means "group together".
So compilation time obviously only applies to compiled languages.
Saying that compilation time is not a concern to you and shouldn't be to anyone is also very short-sighted. I'm pretty confident that you would consider it to be a problem if a "Hello world" took a week to compile. Maybe you meant "Compilation is fast enough for me", which is fine, but there is no reason why your speed expectation should be more legitimate than anyone else's.
There is not a clear differentiation. For one thing just because the primary implementation is a compiler doesn't mean you can't also have an interpreter.
There is no requirement that a compiler has to produce a binary. Compilers can also produce byte code or JavaScript or C code.
Also very few languages are purely interpreted.
Python, Ruby, and Perl aren't (purely) interpreted. They're compiled
to bytecode, then that bytecode is interpreted. That's at least the primary implementations; there are also JVM/.NET versions of Python and Ruby, and PyPy (a JIT). Compilation happens at import/run time. Emscripten uses LLVM to compile C/C++/Rust into JavaScript.
Java and C# are compiled into bytecode which is then interpreted or JIT compiled or both.
> in scripting languages, a statement does not depend on the following ones to be executed, meaning that the code can just be read through
I'm pretty sure this is nonsense and is not a consistent way to distinguish languages that are considered scripting languages and languages considered compiled languages.
> Compile literally means "group together".
Maybe so, but as a technical term in computer science, it means translate from one language to another.
That said, I definitely agree that compile times matter.
> There is a very clear difference between compilation and interpretation: in scripting languages, a statement does not depend on the following ones to be executed, meaning that the code can just be read through.
But most scripting languages are usually compiled (even if not always AOT and not always to native code, but instead to bytecode of some kind), so how can a difference between compilation and interpretation, even if it were a valid one, be relevant here?
Bytecode encoding an interpreted language is only superficially similar to compilation.
After all, you can't perform meaningful optimizations when the environment your code is running in completely defines what the code does.
Sure, you don't know the details of how syscalls work, but you do know the exact process necessary to call a function in compiled languages. Interpreted languages tend to just emit the original names in the bytecode, because you don't know what the function `sin` is when that is environment-dependent.
> Bytecode encoding an interpreted language is only superficially similar to compilation.
No, its literally compilation, not something superficially similar to it.
> After all you can’t perform meaningful optimizations when the environment your code is running completely defines what the code does.
The environment your code is running on completely defines what the code does in all cases, and yet meaningful optimization is possible, and bytecode for abstract/virtual machines is common both for things that are commonly called scripting languages and for things that generally are not. (Sometimes, the implementation of an abstract/virtual machine may involve another, JIT, compiler to some other target, maybe native, maybe not.)
It’s true a lot of the languages called “scripting” languages have semantics that are structurally very difficult to optimize, but that has nothing to do with “interpreted vs. compiled”.
I beg you, please don't speak with such certainty about definitions which you're not certain of, particularly when you've already been corrected. Take the opportunity to learn something new. Dragonwriter is using exactly the definition that people who work in this area use.
My copy of the red dragon book defines a compiler as a program for translating from one language to another. A translator from source to bytecode clearly meets this definition. If you have a problem with that definition then you're on pretty shaky ground since you're essentially making the argument that you know more about this sort of thing than Aho, Lam, Sethi and Ullman.
The distinction between a compiler and an interpreter is most clearly illustrated by old school (pre-JIT) java, where javac is clearly a compiler which translates Java source to Java bytecode (language -> language) and the java executable itself which interprets bytecode.
That distinction still exists in most interpreted language implementations, though not all. Python being a good example. It has a compiler which translates source to bytecode and an interpreter which executes the bytecode. They just both exist in the same program but crucially it certainly includes a compiler.
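The Python arrangement described above, a compiler and an interpreter living in the same program, is easy to sketch. Here is a toy version in Rust (a made-up three-op bytecode, purely illustrative):

```rust
// Toy "compiler + interpreter in one program": compile an expression
// AST to stack-machine bytecode, then interpret that bytecode.
#[derive(Clone, Copy)]
enum Op { Push(i64), Add, Mul }

enum Expr { Num(i64), Add(Box<Expr>, Box<Expr>), Mul(Box<Expr>, Box<Expr>) }

// The compiler half: translate one language (the AST) into another
// (the bytecode).
fn compile(e: &Expr, out: &mut Vec<Op>) {
    match e {
        Expr::Num(n) => out.push(Op::Push(*n)),
        Expr::Add(a, b) => { compile(a, out); compile(b, out); out.push(Op::Add); }
        Expr::Mul(a, b) => { compile(a, out); compile(b, out); out.push(Op::Mul); }
    }
}

// The interpreter half: execute the bytecode on a stack machine.
fn interpret(code: &[Op]) -> i64 {
    let mut stack = Vec::new();
    for op in code {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Add => { let b = stack.pop().unwrap(); let a = stack.pop().unwrap(); stack.push(a + b); }
            Op::Mul => { let b = stack.pop().unwrap(); let a = stack.pop().unwrap(); stack.push(a * b); }
        }
    }
    stack.pop().unwrap()
}

fn main() {
    // (2 + 3) * 4
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(2)), Box::new(Expr::Num(3)))),
        Box::new(Expr::Num(4)),
    );
    let mut code = Vec::new();
    compile(&e, &mut code);
    println!("{}", interpret(&code)); // prints 20
}
```

Old-school Java simply split these two halves into separate executables: javac is the `compile` half, the pre-JIT java VM the `interpret` half.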
My point was that there needs to be a distinct target language.
Encoding the source language in a more compact form, without changing it more than superficially, isn't compilation.
Like a web server that compresses JavaScript isn't a compiler.
My point was that anything that outputs something that has to be interpreted is on sketchy ground: if you are interpreting it, how is that a distinct language from the original?
CPython can interpret py files or compile them into pyc.
CPython does not interpret the resulting bytecode. It executes that bytecode on its virtual machine.
Similarly Java and .NET languages are compiled into a bytecode which is executed on a virtual machine.
You don't interpret a compiled output.
The confusion comes from languages which support both like Python. We call Python interpreted as a shorthand for "can be interpreted" but modern Python often is compiled for the enormous performance benefits.
To reiterate you don't interpret bytecode you translate or execute it on a virtual machine.
No, I haven’t. Compilation is well-defined (unlike, really, "scripting language", though there are clearly languages that are frequently called that despite the absence of any coherent definition bounding the class), and it’s unquestionable that many implementations of languages usually called "scripting languages" use it.
> and are wondering why there is uncertainty…
No, I’m not “wondering” anything, and “uncertainty” isn’t even part of the discussion.
> in scripting languages, a statement does not depend on the following ones to be executed
I agree with most of your comment, but how do you explain languages with end-of-block markers? E.g. in Ruby, if you stop execution before the end of a block you will surely get an error: a `def` without its closing `end` is a syntax error, so the code can't just be read through statement by statement.
I wouldn't say compile time doesn't matter, just that it's highly non-linear. The breaking points are:
- You can restart the program as you're typing it. I've been looking at Elixir lately, and if I find a bug I can fix it and immediately look at a page that's got my change.
- You can compile it and it will be done before you switch to something else, so you aren't waiting long. Typically you also have a good idea as to whether it will compile.
- You hit compile, and you know you have to find something else to do. Sometimes it will break when you're doing this other thing and the time lost is even greater. I think this used to be normal in the punch-card era.
> Rust cuts down on total development time because of the extensive checks it performs
Plus, there's rust-analyzer/LSP. It works in real-time, and catches basically everything one might care about, no compiler needed. It's kind of like a static interpreter.
My use cases are modest, but so far, once r-a was happy, the compiler was as well. Mileage might vary when using crate features, possibly? Inactive features are (until Rust 1.72 at least) invisible to the compiler so issues might arise only at compile-time there.
> I'd rather have correct tests than fast tests, just like I'd rather have a thorough compilation than a fast one.
The first one is a false dichotomy though. It's not like people write fast but incorrect tests whenever given a keyboard.
> I also don't think tests need to be blazing fast. I'd rather have correct tests than fast tests, just like I'd rather have a thorough compilation than a fast one.
That's a false dilemma, and a poorly thought out strawman.
The goal is not to pick between fast tests and correct tests. The goal is to pick between correct tests that are fast and correct tests that are slow. There are absolutely zero reasons to prefer slow tests.
A tie between the Linux kernel and LLVM (specifically Clang). Also making game engines gets pretty hefty quite quickly. That's just open source work though. I've worked on larger systems at some Big Tech that were unwieldy, too.
And at no point were you bothered by the time wasted by C/C++ compiling (and especially, unavoidable linking!) in your work process, CI process, and general ability to quickly debug and iterate on a hard problem?
It depends on the problem. For some problems where you’re spending time mostly thinking, and relatively little time actually typing code, the compile times don’t really matter that much.
For LLVM for example, with clangd configured to do code completion and syntax checking in the editor, you don’t really have to do as many build/test cycles.
That said, I think fast compile times are important for the other use cases.
I am a bit surprised about this as well, since from my perspective, in a comparison of Rust to C++, one of the more noticeable issues is the extra verbosity rather than compile time.
You're right about switching to interpreted languages for this reason as well. Python is so incredibly simple to write, and doesn't have any compile time at all.
I think a better trade off is a language that has a long compile time, is very fast and secure, but has simpler syntax.
Effectively a Rust-based language with simpler syntax.
I recognize some of these exist, but they need to be popular enough to use as well.
I don't think the compile times of Go can be even compared with Rust or any major compiled language.
It does compile extremely quickly; Rust on the other end lies completely opposite on that spectrum IMO.
Although both languages are "modern", both have their pros & cons.
This may sound like asking for a pink elephant, but I would really want something that has the guard-rails, expressiveness & performance of Rust with the simplicity & compile-time speed of Go.
You are looking at only the language, you should also evaluate the toolchain and runtime. Go has a toolchain that most other languages (including Rust) can only dream of.
> This may sound like asking for a pink elephant, but I would really want something that has the guard-rails, expressiveness & performance of Rust with the simplicity & compile-time speed of Go.
> Compilers involve giant tree structures, with lots of allocations, pointer chasing, and data-dependent multi-way branches.
Tree structures and data-dependent multi-way branches do seem inevitable to me, but pointer chasing and allocations don't seem inevitable at all. There are lots of ways to avoid allocations or make allocations practically free. Similarly there are lots of ways to make pointer chasing less prevalent, especially getting down to single-indirection.
rustc may not be able to fix its allocation or pointer-chasing problems, but this is not endemic to the problem of compiling.
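Arenas are the standard example of making allocation practically free: nodes are bump-allocated into big chunks and freed all at once. A minimal sketch, not rustc's actual arena (which uses interior mutability so shared references can be handed out while allocating):

```rust
// Toy arena: allocation is amortized to a bounds check plus a push into
// a pre-sized chunk; nothing is freed until the whole arena is dropped.
struct Arena<T> {
    chunks: Vec<Vec<T>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { chunks: vec![Vec::with_capacity(1024)] }
    }

    fn alloc(&mut self, value: T) -> &T {
        // Start a fresh chunk instead of reallocating, so elements
        // already handed out never move.
        if self.chunks.last().unwrap().len() == self.chunks.last().unwrap().capacity() {
            self.chunks.push(Vec::with_capacity(1024));
        }
        let chunk = self.chunks.last_mut().unwrap();
        chunk.push(value);
        chunk.last().unwrap()
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = *arena.alloc(10u32);
    let b = *arena.alloc(32u32);
    println!("{}", a + b); // prints 42
}
```

Grouping nodes into contiguous chunks like this also helps the pointer-chasing side: neighboring nodes tend to land on the same cache lines.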
No, but it is endemic to how we have thought about compilers up to now, and especially to when the Rust compiler's architecture was largely developed, more than 10 years ago.
I recommend watching this for a recent take on trying to change how we think about these. It is an open research field.
I’m impressed that this is the original code base dating back to the first lines of self-compiling Rust. They’ve managed to rearchitect and refactor it multiple times instead of rewriting it.
Why would they use SipHash at all? That’s a significantly slower hash algorithm than, say, wyhash, and any additional DoS protection SipHash might have (assuming it does) seems irrelevant for this use case.
> Compilers involve giant tree structures, with lots of allocations, pointer chasing, and data-dependent multi-way branches. All the things that modern microarchitectures handle poorly. There is research into different designs that avoid these problems, but they involve data structures that can be highly non-ergonomic.
A lot of graph algorithms are being expressed in matrix form. Like the current ML stuff.
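A sketch of that idea: one frontier-expansion step of graph reachability written as a boolean matrix-vector product, so the work is dense bit operations rather than pointer chasing (illustrative only; real systems like GraphBLAS use sparse bit-packed matrices):

```rust
// One step of reachability: reach'[j] = reach[j] OR any(reach[i] AND adj[i][j]),
// i.e. a boolean matrix-vector "multiply" over a dense adjacency matrix.
fn step(adj: &[Vec<bool>], reach: &[bool]) -> Vec<bool> {
    let n = adj.len();
    (0..n)
        .map(|j| reach[j] || (0..n).any(|i| reach[i] && adj[i][j]))
        .collect()
}

fn main() {
    // 0 -> 1 -> 2
    let adj = vec![
        vec![false, true, false],
        vec![false, false, true],
        vec![false, false, false],
    ];
    let mut reach = vec![true, false, false]; // start at node 0
    for _ in 0..2 {
        reach = step(&adj, &reach);
    }
    println!("{:?}", reach); // prints [true, true, true]
}
```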
How much is slow compilation caused by the compiler, and how much is it caused by the crates ecosystem and the many, many, many dependencies a project gains?
I'm genuinely curious because a couple of projects I have written are not that large per se, but I don't like their compilation times too much. What I have observed is that I have ended with 100-200 (indirect) dependencies, and compilation slows down the closer it gets to the "leaf" of the process. In other words: by the time compilation reaches my own code, my code seems to take much longer to compile relative to the intermediate (larger) crates. And it's not link time (I've switched to mold).
Most of those are trivially parallelized, so it's not a problem. It gets slower towards the end because e.g. your crate needs all of its dependencies in place before it can be compiled.
This brings up a potentially huge optimization waiting to happen. When you have a Rust dependency, you compile the whole crate before you can even start compiling your code. But because of the depth of the ecosystem, there is a lot of code packaged into crates that may not need to be compiled: the crate might be used for only a single function. There are probably dozens of ways to do this, but a quick parse and analysis of the target code could eliminate a large amount of dependency compilation.
I do know there is vague / high-level interest in this kind of approach though I expect this would require a lot of work from the compiler, especially when it comes to figuring out what traits need to be compiled.
Already, rust splits crate compilation into two passes where dependents are only blocked on the first pass and linking on the second. Maybe an intermediate is to do the first pass everywhere and then use that to decide what can be dropped from the second pass.
traits or not, isn't it all compiled down to impls? and if some are not called, they can be dropped (dead code elimination basically)? of course this would require tracking the full call graph, right? (which the compiler might or might not do already; or might not be able to do due to memory constraints.)
> This is a fun one. @Dragon-Hatcher created a very unusual Rust program: a chess engine that computes things at compile time only using Rust’s Turing-complete trait system. (They also have a TypeScript version of the same program.) This causes vast numbers of trait obligations to be generated for the compiler’s trait solver. In this PR I made some small changes to a de-duplication step that runs on collections of these obligations, which reduced both the compile time and peak memory usage of this crate by almost 7x! It also gave icount wins of up to 4% across a number of secondary benchmarks.
Nothing in this blog post says this, or even implies it. Just because expecting a 10x improvement is unrealistic doesn't mean that it's impossible for the Rust compiler to get "much" faster for a more reasonable definition of "much" (say, 2x). He does say that there likely won't be some step change, but rather the result of lots of small improvements over a long period of time.
Yeah, just give the author a time machine so that they can post the same blog at the end of September.
Jokes aside, this is not a guide on how to tweak your compiler settings or stuff like that, but rather it's about changes in the compiler source code that just make it faster for everyone. You just need to wait 6-12 weeks for them to come to stable though.
I'm a member of Futurewei's Rust team, and I'm in the Compiler Contributors group.
I started these posts back in 2016 when I wasn't a member of either. It's been a long-running series, and there's never been much reason for them to go on the official Rust blog, because they get enough attention on my personal blog.
In the past I have posted links to Hacker News but usually they don't make the front page and get few if any comments, so I stopped bothering, but others sometimes do. I always post to /r/rust where the level of discussion tends to be higher-quality than HN, because there is a higher level of Rust knowledge.
[1] https://lemire.me/blog/2023/04/27/hotspot-performance-engine...