Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Some other good projects:

* Lambda calculus evaluator

* Hindley Milner typechecker for lambda calculus

* Stack based calculator (extend with variables)

* Play around with macros

* Work through Crafting Interpreters

But really in my experience the best way to get better at compilers (I can't claim to be a wizard) is to just build a goddamn compiler. Start by writing a parser (you can crib from Crafting Interpreters), writing the simplest possible typechecker for arithmetic (crib from Hindley Milner), then a code generator to whatever target you want. Code generation is tricky, but if you're generating arithmetic, it's really not that bad.

If at any point you get confused or have no clue what to do, take a deep breath and guess! Yep, just guess how to do it. I've done this quite a few times and later learned that my guess was really a dumbed down version of idk, solving the AST typing problem, or lambda lifting, or whatever. If that's too hard, scroll through other compilers and figure out what they do.

Once you get something working, celebrate! It's pretty cool seeing even arithmetic end up as code generated and run.

Then start adding other features! Add local variables. Add nested functions. Add whatever.

The secret is to treat a compiler as a challenging software development project. Because that's what it is. Start writing code and figure it out.

Granted this is not great advice if you want a usable compiler at the end. I'm just trying to learn, not make something for people to use.



> in my experience the best way to get better at compilers (I can't claim to be a wizard) is to just build a goddamn compiler.

Yup. In fact, the best way to start is to build a desk calculator, where you type in strings like:

    4 + 6
and it gives you back 10. There was one in the old K+R book. A compiler just scales that up.


For context, parent wrote a C++ compiler and invented the D programming language.


Exactly. Just start from the beginning and work through each problem as you get to it. Keep the scope small and iterate. You'll probably find yourself wanting to start from scratch a lot once you learn how to do it better, that's ok.

I'm writing up a tutorial for my students right now. It is just a stripped down dialect of BASIC. I think seeing the entire compiler process from start to finish on a tiny language is what helped me the most.


I got my start from the Tiny Pascal compiler listing from BYTE Magazine in the 1970's. It was a complete compiler and interpreter.



Thanks! First time reading Byte mag. Kinda almost half century late :)

The article is written with amazing (these days) clarity. Kinda coincided with my current interest in LLVM. Nice to see this article also be coming from UofI Urbana-Champaign grads.

Reportedly the article series misses implementation of one component - run-time, which is needed in order to fully replicate it now. Well, anyway, reading on!

Btw: HN thread on Tiny Pascal from 2018 https://news.ycombinator.com/item?id=17220507


Thank you. I was too lazy to look it up.


Always happy to lend context to these things; BYTE was so fantastic.


How do I learn to write a type checker? Got any suggested readings?


Type checkers are actually surprisingly easy to write. I recommend learning about unification (in the logic sense), maybe just by playing around with Prolog. After you understand that I think type checking is rather easy to figure out (it's literally just unification).


I think it depends on whether you have type inference or not. If you have type inference, then it's like unification. If you have explicit type annotations, it's even simpler than that.

https://eli.thegreenplace.net/2018/unification/

https://eli.thegreenplace.net/2018/type-inference/


And if you want nominal sub typing, F-bounded generics, and some inference, things get really fun.


Nominal subtyping is trivial and unsafe. The fun starts with structural subtyping.


Nominal sub typing is hard from trivial and calling it unsafe is non-sensical. Structural subtyping yields poor error messages and does not support encapsulation very well, especially if you also want separate compilation.


Imperative languages without type inference (for variables and functions, that is) work from the bottom up. Symbols (variables, functions) have declared types, constants have known types. Operators have overloads, and depending on the arguments a particular overload is chosen which best matches, and its return type is the type of the operator expression. Type conversions may need to occur along the way to make the arguments to operators or functions fit.

The hairiest thing is usually overload resolution (both operator and function, if implemented). Determining the set of applicable overloads may depend in the types of the arguments, scoping rules, or generic instantiation if it's in the language, and determining the best overload depends on weighing up coercions, subtyping and polymorphism. Simple rules may lead to rejection of overload selection as ambiguous in situations where programmers don't want to define more overloads.


I have a simple implementation of the Hindley Milner Type Inference algorithm in OCaml that should be easy to grok: https://github.com/prakhar1989/type-inference


I fully agree. I wrote my elements compilers this way. Slowly from a tokenizer, parser up to .net backend, Java backend. Bit by bit. The only way to learn is by doing.

I'd even go as far as saying the literature on the subject is overrated or out of date.

And not everything is set in stone. Not all compilers have a single symbol table. Sometimes it makes sense to have a type/global table and local table, and check the current class scope on the fly. Sometimes not. It doesn't really.

Abstraction doesn't make sense until you actually have multiple implementations. And you might have rewritten it four times by then.

My advice like above. Just start writing


As an alternative to Hindley-Milner, also consider a compositional type system (https://unsafePerform.IO/projects/talks/2016-06-compty/CompT... it should be a very good match for the kind of simple HM type systems that you're going to encounter while learning about compilers.

Here's code if you prefer that to slides: https://github.com/gergoerdi/hm-compo


I second your recommendation of Crafting Interpreters. Not only is it free, but it's an excellent way to learn tokenization, parsing, and tree walking, it's a lot of fun to read.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: