Tuesday, September 13, 2005

Compiling a language with multiple operations

I've now twice compiled languages with more than just an eval operation on expressions. For example, I wrote an LL(1) parser generator that generated for each expression in the grammar an object with a parse method (which you can think of as eval for grammars) as well as first, follow, and nullable? methods. I've also run into the same issue when working on the Topsl compiler.

But when you generate objects for expressions and you need to model binding in your language, you find an interesting little problem. Imagine compiling a simple functional language into a bunch of expression objects that have two methods: eval and analyze (for some hypothetical static analysis you might like to perform). Now how would you implement the rule for compiling let-expressions?
[[let x = e1 in e2]] =
{ analyze : λ().[[e1]].analyze() ∪ let x = ()
in [[e2]].analyze(),
eval : λ().let x = [[e1]].eval() in [[e2]].eval() }
By binding the variable x in both methods, it can be referred to in the compiled code for e2. In the static analysis, we can bind it to any dummy value, since the analysis presumably doesn't care about the dynamic result of evaluating the expression.

But this results in duplicating the expansion of e1 and e2, which is not just code bloat, it's exponential code bloat. A non-starter.

We could then lift out the generated let to surround the entire object creation, initialize x to a dummy value, and then mutate it within the body of eval. But this introduces brittle dependencies on the order in which the methods can be invoked.

Instead, we can generate one piece of code, specifically, a function from x-value to expression object, and then invoke this function differently from each method:
[[let x = e1 in e2]] =
let f = λx.[[e2]]
in { analyze : λ().[[e1]].analyze() ∪ (f ()).analyze(),
eval : λ().(f [[e1]].eval()).eval() }
Now there are no fragile method invocation interdependencies, and each method can give a different value for x, and each expression is only compiled once.

Update: I wasn't clear that the semantic brackets function [[--]] represents Scheme-style macro-expansion. So I assume all hygiene issues are handled automatically by the expansion process.

4 comments:

Sam TH said...

There's something confusing going on with binding here. When you do

let x = () in [[e1]]

there seems to be something strange happening, where binding x in the meta-language allows it to be used in the evaluation of an object language term (here e1). Is that really what you intended? Don't you think that would lead to insidious bugs (what if I want to evaluate this expression:

let eval = 7 in eval

)?

Dave Herman said...

Sorry, I wasn't very clear about my notation. The [[--]] function is for compilation, and the whole expression represents the output of compilation in the object language. So

let x = () in [[e1]]

is entirely in the object language. This is just a shorthand for what happens in a Scheme macro, where you would have something like

(syntax-rules ()
   [(_ ([x e1]) e2)
     <...> (let ([x #f]) e1) <...>])

Dave Herman said...

Just to clarify further, the compilation is producing a runtime object with two methods, i.e., generating code that at runtime can be used to retrieve an analysis of the program or evaluate the program.

You could certainly imagine another approach where you generate a meta-language object that instead of an eval method has a compile method. Then the analysis would only be usable at compile-time, and at runtime all you would have is the evaluation. But that's not what I'm describing in this post.

Dave Herman said...

is entirely in the object language

Ack! I meant exactly the opposite! It's entirely in the meta-language (i.e., target language), with the [[--]] expressions implicitly unquoted and spliced in.