But really in my experience the best way to get better at compilers (I can't claim to be a wizard) is to just build a goddamn compiler. Start by writing a parser (you can crib from Crafting Interpreters), writing the simplest possible typechecker for arithmetic (crib from Hindley Milner), then a code generator to whatever target you want. Code generation is tricky, but if you're generating arithmetic, it's really not that bad.
If at any point you get confused or have no clue what to do, take a deep breath and guess! Yep, just guess how to do it. I've done this quite a few times and later learned that my guess was really a dumbed down version of idk, solving the AST typing problem, or lambda lifting, or whatever. If that's too hard, scroll through other compilers and figure out what they do.
Once you get something working, celebrate! It's pretty cool seeing even arithmetic end up as code generated and run.
Then start adding other features! Add local variables. Add nested functions. Add whatever.
The secret is to treat a compiler as a challenging software development project. Because that's what it is. Start writing code and figure it out.
Granted this is not great advice if you want a usable compiler at the end. I'm just trying to learn, not make something for people to use.
Exactly. Just start from the beginning and work through each problem as you get to it. Keep the scope small and iterate. You'll probably find yourself wanting to start from scratch a lot once you learn how to do it better, that's ok.
I'm writing up a tutorial for my students right now. It is just a stripped down dialect of BASIC. I think seeing the entire compiler process from start to finish on a tiny language is what helped me the most.
Thanks! First time reading Byte mag. Kinda almost half century late :)
The article is written with amazing (these days) clarity. Kinda coincided with my current interest in LLVM. Nice to see this article also be coming from UofI Urbana-Champaign grads.
Reportedly the article series misses implementation of one component - run-time, which is needed in order to fully replicate it now. Well, anyway, reading on!
Type checkers are actually surprisingly easy to write. I recommend learning about unification (in the logic sense), maybe just by playing around with Prolog. After you understand that I think type checking is rather easy to figure out (it's literally just unification).
I think it depends on whether you have type inference or not. If you have type inference, then it's like unification. If you have explicit type annotations, it's even simpler than that.
Nominal sub typing is hard from trivial and calling it unsafe is non-sensical. Structural subtyping yields poor error messages and does not support encapsulation very well, especially if you also want separate compilation.
Imperative languages without type inference (for variables and functions, that is) work from the bottom up. Symbols (variables, functions) have declared types, constants have known types. Operators have overloads, and depending on the arguments a particular overload is chosen which best matches, and its return type is the type of the operator expression. Type conversions may need to occur along the way to make the arguments to operators or functions fit.
The hairiest thing is usually overload resolution (both operator and function, if implemented). Determining the set of applicable overloads may depend in the types of the arguments, scoping rules, or generic instantiation if it's in the language, and determining the best overload depends on weighing up coercions, subtyping and polymorphism. Simple rules may lead to rejection of overload selection as ambiguous in situations where programmers don't want to define more overloads.
I fully agree. I wrote my elements compilers this way. Slowly from a tokenizer, parser up to .net backend, Java backend. Bit by bit. The only way to learn is by doing.
I'd even go as far as saying the literature on the subject is overrated or out of date.
And not everything is set in stone. Not all compilers have a single symbol table. Sometimes it makes sense to have a type/global table and local table, and check the current class scope on the fly. Sometimes not. It doesn't really.
Abstraction doesn't make sense until you actually have multiple implementations. And you might have rewritten it four times by then.
As an alternative to Hindley-Milner, also consider a compositional type system (https://unsafePerform.IO/projects/talks/2016-06-compty/CompT... it should be a very good match for the kind of simple HM type systems that you're going to encounter while learning about compilers.
I second your recommendation of Crafting Interpreters. Not only is it free, but it's an excellent way to learn tokenization, parsing, and tree walking, it's a lot of fun to read.
* Lambda calculus evaluator
* Hindley Milner typechecker for lambda calculus
* Stack based calculator (extend with variables)
* Play around with macros
* Work through Crafting Interpreters
But really in my experience the best way to get better at compilers (I can't claim to be a wizard) is to just build a goddamn compiler. Start by writing a parser (you can crib from Crafting Interpreters), writing the simplest possible typechecker for arithmetic (crib from Hindley Milner), then a code generator to whatever target you want. Code generation is tricky, but if you're generating arithmetic, it's really not that bad.
If at any point you get confused or have no clue what to do, take a deep breath and guess! Yep, just guess how to do it. I've done this quite a few times and later learned that my guess was really a dumbed down version of idk, solving the AST typing problem, or lambda lifting, or whatever. If that's too hard, scroll through other compilers and figure out what they do.
Once you get something working, celebrate! It's pretty cool seeing even arithmetic end up as code generated and run.
Then start adding other features! Add local variables. Add nested functions. Add whatever.
The secret is to treat a compiler as a challenging software development project. Because that's what it is. Start writing code and figure it out.
Granted this is not great advice if you want a usable compiler at the end. I'm just trying to learn, not make something for people to use.