Thursday, February 08, 2007

Lazy substitutions

Substitution is expensive. That's why we tend to use environments in efficient implementations of programming languages. They encode all the information we need to do substitutions, but we wait until as late as possible to perform the substitutions. The complication, however, arises when expressions containing free variables (i.e., variables that haven't yet been substituted) travel around to disparate parts of the program. This happens in the lambda calculus when functions get passed or returned as values, and that's why we have to introduce closures. Closures allow expressions with free variables to travel around with enough information to continue performing their lazy substitutions.

In macro systems, we have exactly the same problem: the substitutions we need to perform (among others) are fresh names for bound variables. We need to substitute these fresh names throughout the bodies of expressions, but it's expensive to revisit syntax. So we close expressions over their substitutions and rename lazily.

But in macro systems, these kinds of closures are even more complicated than in the lambda calculus, because each macro invocation produces more code and mixes it together with other code. What we want is for the renamings to occur only on the code as it appears at the particular time of the current macro invocation. So in Dybvig, Hieb and Bruggeman's macro-expansion algorithm, the solution is to produce a fresh "mark" or "time-stamp" distinguishing code that resulted from a particular macro invocation. This way, renamings are applied only to code that existed at the time that the eager substitution would have been applied. When different code gets mixed together from different macro invocations, these marks keep track of what renamings apply to what pieces of syntax.

There are two salient features required of these marks: 1) every distinct invocation of a macro should have a distinct mark associated with it, and 2) only new code produced by a macro invocation should be marked as new, not code that it received as input. The first requirement is satisfied simply by generating fresh marks on each invocation. The second is satisfied by giving the mark algebra a "toggling" feature: re-applying the same mark twice to a piece of syntax erases the mark. The expander marks both the input and output of each macro invocation, which has the effect of "staining" new syntax generated by the invocation.

No comments: