Wednesday, March 09, 2005

Shriram's Automaton Language

Shriram's educational pearl describing the construction of a macro for an automaton language has the answer to my question last week of whether binding abstractions are really an important property of macros.

Matthias has long promoted the four primary families of applications for macros as
  1. providing cosmetics
  2. introducing binding constructs
  3. implementing control operators (i.e., altering the order of evaluation)
  4. defining data languages
Shriram's paper mentions these as well, and I'm glad he does. Macros are so universally misunderstood, even by their proponents, that it's important to know just what is and isn't important about them. (In particular, inlining code for performance gain is not a killer app of macros!)

His paper deals particularly with the fourth use, defining data languages. And he makes the following insight:
The sub-terms of a macro invocation can contain arbitrary phrases, some of which may not even be legitimate Scheme code. In such cases, the macro construct is essentially creating an abstraction over an implementation choice. For instance, suppose a particular datum can be represented either as a procedure or as a structure.
When an embedded language is implemented as a macro, the user may in fact have no idea what parts are implemented as binding forms in the implementation. The automaton example is a perfect demonstration: you might choose to implement the names of states as symbols in an association list, or you might choose to implement them as binding mutually (tail-)recursive local procedures. In this example, your user shouldn't know what your implementation decision was, and certainly shouldn't have to know.

This is one place where the distinction between macros as language extensions and macros as embedded languages becomes significant. When you add a feature to Scheme, your user needs to know about its binding properties. By contrast, when you implement a different language in Scheme, your user thinks about the semantics of the embedded language, rather than the semantics of its implementation language.