I'm building Mochi, a small programming language with a custom VM and a focus on querying structured data (CSV, JSON, and eventually graph) in a unified and lightweight way.
It started as an experiment in writing LINQ-style queries over real datasets and grew into a full language with:
- declarative queries built into the language
- a register-based VM designed for analysis and optimization
- an intermediate representation with liveness analysis, constant folding, and dead code elimination
- static type inference, inline tests, and golden snapshot support
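For readers who haven't met the IR passes named in the list, here is a rough Python sketch of constant folding followed by liveness-based dead code elimination on a tiny straight-line register IR. This is not Mochi's actual IR: the `Instr` type, the opcode names, and the `optimize` helper are all made up for illustration, and there is no control flow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    op: str              # hypothetical opcodes: "const", "add", "mul", "print"
    dst: Optional[str]   # destination register, None for "print"
    args: tuple          # operands: int literals or register names (str)


def fold_constants(code):
    """Replace arithmetic on known constants with a single const."""
    known = {}  # register name -> constant value currently held
    out = []
    for ins in code:
        # Substitute registers whose constant value is known.
        args = tuple(known.get(a, a) if isinstance(a, str) else a
                     for a in ins.args)
        if ins.op == "const":
            known[ins.dst] = args[0]
            out.append(Instr("const", ins.dst, args))
        elif ins.op in ("add", "mul") and all(isinstance(a, int) for a in args):
            value = args[0] + args[1] if ins.op == "add" else args[0] * args[1]
            known[ins.dst] = value
            out.append(Instr("const", ins.dst, (value,)))
        else:
            if ins.dst is not None:
                known.pop(ins.dst, None)  # value is no longer a known constant
            out.append(Instr(ins.op, ins.dst, args))
    return out


def eliminate_dead_code(code):
    """Walk backwards, keeping only instructions whose result is live."""
    live = set()
    kept = []
    for ins in reversed(code):
        has_side_effect = ins.op == "print"
        if has_side_effect or ins.dst in live:
            live.discard(ins.dst)
            live.update(a for a in ins.args if isinstance(a, str))
            kept.append(ins)
    kept.reverse()
    return kept


def optimize(code):
    return eliminate_dead_code(fold_constants(code))


program = [
    Instr("const", "r0", (2,)),
    Instr("const", "r1", (3,)),
    Instr("mul", "r2", ("r0", "r1")),
    Instr("add", "r3", ("r2", "r0")),  # result never used
    Instr("print", None, ("r2",)),
]
optimized = optimize(program)
```

In this toy program folding turns every arithmetic instruction into a constant, and the backward liveness walk then drops everything except a `print` of the literal 6.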
Example:
type Person {
  name: string
  age: int
}

let people = load "people.yaml" as Person

let adults = from p in people
             where p.age >= 18
             select { name: p.name, age: p.age }

for a in adults {
  print(a.name, "is", a.age)
}

save adults to "adults.json"
The long-term goal is to make a small, expressive language for data pipelines, querying, and agent logic, without reaching for Python, SQL, and a half-dozen libraries.
Happy to chat if you're into VMs, query engines, or DSLs.
The expected output would be a file named "adults.json" containing JSON data. I don't see much benefit in this specific use case, but I do sense a 'code smell' in having the language automagically determine the output structure for me.
It's been great for quickly filtering and transforming structured data like CSV and JSON. Optimizing the VM is fun too, though it sometimes comes at a cost: we once broke around 400 tests after adding peephole optimizations that changed how the IR handled control flow.
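To give a feel for how a peephole pass can interact with control flow, here is a small Python sketch. It is purely illustrative and not Mochi's code: the tuple-based instruction format, the opcode names, and the `peephole` function are assumptions. It deletes self-moves and then remaps absolute jump targets, which is the step that silently breaks branches if it is forgotten.

```python
def peephole(code):
    """Drop self-moves and remap absolute jump targets.

    Hypothetical instruction shapes:
      ("move", dst, src), ("jump", target), ("jump_if", reg, target),
      and anything else is passed through untouched.
    """
    # Decide which instructions survive: `move r, r` is a no-op.
    keep = [not (ins[0] == "move" and ins[1] == ins[2]) for ins in code]

    # new_index[i] is where old instruction i lands in the output.
    # A removed instruction maps to the next surviving one.
    new_index = []
    count = 0
    for k in keep:
        new_index.append(count)
        if k:
            count += 1
    new_index.append(count)  # a jump target may be one past the end

    out = []
    for ins, k in zip(code, keep):
        if not k:
            continue
        if ins[0] == "jump":
            out.append(("jump", new_index[ins[1]]))
        elif ins[0] == "jump_if":
            out.append(("jump_if", ins[1], new_index[ins[2]]))
        else:
            out.append(ins)
    return out


before = [
    ("const", "r0", 1),      # 0
    ("move", "r1", "r1"),    # 1: self-move, removed
    ("jump_if", "r0", 4),    # 2: targets the print at index 4
    ("move", "r2", "r2"),    # 3: self-move, removed
    ("print", "r0"),         # 4
]
after = peephole(before)
```

Without the remapping step, the `jump_if` would still point at index 4, which no longer exists once two instructions are gone. With it, the branch lands on the `print` at its new position.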
Interesting project. I'm quite interested in developing a small programming language myself, but am not sure where to start. What resources do you recommend?
Crafting Interpreters (https://craftinginterpreters.com) is a super friendly, step-by-step guide to building your own language and VM. Looking forward to seeing what kind of language you come up with too!
The concepts the OP mentions (liveness analysis, constant folding, dead code elimination), along with similar material on IR optimization, are explained in great detail in Nora Sandler's "Writing a C Compiler".