My 2 cents: just like Cursor's browser, it seems the AI attempted a really ambitious technical design, generally matching the bells and whistles of a true industrial strength compiler, with SSA optimization passes etc.
However, looking at the assembly, it's clear to me the optimization passes do not work, and I suspect the codebase contains large amounts of 'dead code' - places where the AI decided to bypass non-functioning modules.
If a human expert were to write a compiler, not necessarily designed to match GCC but to strike a really good balance of features against complexity, they'd be able to make something much simpler. There are some projects like this (QBE, MIR), which come with nice technical descriptions.
Likewise, there was a post about a browser made by a single dude + AI, which was about 20k lines and worked about as well as Cursor's claimed to. It had maybe 10% of the features, but everything that was there worked reasonably well.
So while I don't want to make predictions, it seems that for now the human-in-the-loop method of coding works much better (and cheaper!) than getting an AI to generate a million lines of code on its own.
> My 2 cents: just like Cursor's browser, it seems the AI attempted a really ambitious technical design, generally matching the bells and whistles of a true industrial strength compiler, with SSA optimization passes etc.
Per the article from the person who directed this, the user directed the AI to use SSA form.
> However, looking at the assembly, it's clear to me the optimization passes do not work, and I suspect the codebase contains large amounts of 'dead code' - places where the AI decided to bypass non-functioning modules.
That is quite possibly true, but it presumably at least in part reflects the fact that the project has been measured on completeness, not performance, so that is where the effort has gone. That doesn't mean it'd necessarily be successful at adding optimisation passes - we don't really know. I've done some experiments with this (a Ruby ahead-of-time compiler), and while Claude can do reasonably well with assembler now, it's by no means where it's strongest (it is, however, far better at operating gdb than I am...). But it can certainly do some of it.
> So while I don't want to make predictions, it seems that for now the human-in-the-loop method of coding works much better (and cheaper!) than getting an AI to generate a million lines of code on its own.
Yes, it absolutely does, but the point in both cases was to test the limits of what AI can do on its own, and you won't learn anything about that if you let a human intervene.
$20k in tokens to get a surprisingly functional compiler from agents working on their own is at a point where it's hard to assess how much money and time you'd save once you factor in the cleanup job you'd probably want done before "taking delivery". But had you offered me $20k to hand-write a working C compiler with multiple backends capable of compiling Linux, I'd have laughed at the funny joke.
But more importantly, even if you were prepared to pay me enough, delivering it as fast while writing it by hand would be a different matter. Now, if you factor in the time used to set up the harness, the calculation might look different.
But now that we know models can do this, efforts to make the harnesses easier to set up (for my personal projects, I'm experimenting with agents that automatically figure out suitable harnesses), and to add cleanup passes that review, simplify, and document the code, could well make projects like this far more viable very quickly - at the cost of more tokens, certainly, but even if you doubled the budget, this would be a bargain for many tasks.
I don't think we're anywhere near taking humans out of the loop for many things, but I do see us gradually moving up the abstraction levels, caring less about the code, at least at the early stages, and more about the harnesses, including acceptance tests and other quality gates.
You misunderstand me. First, almost all modern compilers (that I know of) use SSA, so that's not much of a point on its own. What I was saying is that, judging by the assembler, the generated code is totally unoptimized, even though it was mentioned that Claude implemented SSA optimization passes.
The generated code's quality is more in line with an 'undergrad course compiler backend': doing as little work in the backend as possible, and doing all of it conservatively.
Basic SSA optimizations such as constant propagation, copy propagation, and common subexpression elimination are clearly missing from the assembly. The register allocator is also pretty bad, even though there are simple algorithms for that sort of thing that perform decently.
So even though the generated code works, I feel like something's gone majorly wrong inside the compiler.
The 300k LoC thing isn't encouraging either; it's way too much for what the code actually does.
I just want to point out that I think a competent-ish dev (me?) could build something like this (a reasonably accurate C compiler) with a more human-in-the-loop workflow. The result would have much more reasonable code and design, be much shorter, and conform to sane engineering practices, and the codebase wouldn't be full of surprises like it is now.
Honestly, I would certainly prefer doing it that way over having the AI build it and then cleaning it up manually.
And it would be possible without these fancy agent orchestration frameworks, and without spending tens of thousands of dollars on API calls.
This is basically what went down with Cursor's agentic browser vs. the implementation recreated by just one guy in a week, with AI dev tools and a premium subscription.
There's no doubt that this is impressive, but I wouldn't say that agentic software engineering is here just yet.