Thursday, November 09, 2006

Skolem meets Quine

How's this for the crazy hack of the day? I was writing a little code generator for SML (because as a Schemer in ML-land, it was only a matter of time before I'd start trying to write macros) and I wanted to use quasiquotation for constructing code. (I'm sure Lisp was the first programming language with quasiquotation, although the concept originated from logic, at least dating back to Quine.)

SML/NJ includes a quasiquotation facility, but you have to write your own parser for the object language. In this case, my object language was ML itself. So I wanted to reuse the ML parsing libraries that come with SML/NJ. But a quasiquoted source expression is a list of "fragments"--either a (quoted) string or an (unquoted) term:
datatype 'a frag = QUOTE of string | ANTIQUOTE of 'a
So I needed a way to extend the existing parser to be able to parse not just raw ML source, but ML with holes in it.

My solution is to generate fresh variables at the holes (analogous to "Skolem constants" in a type inference algorithm) to 1) generate a single, parseable string; 2) parse that string using the existing parser; and then 3) replace instances of the fresh variables in the resulting AST with their original unquoted terms. Luckily the parser that comes with SML/NJ (from the MLRISC library) has a generic term-rewriting library so this last step was easy.

Here's the entire parser:
fun parseQuasiML frags =
let fun skolemize [] = ([], [])
| skolemize ((QUOTE s)::frags) =
let val (ss, skolems) = skolemize frags
in
(s::ss, skolems)
end
| skolemize ((ANTIQUOTE x)::frags) =
let val skolem = gensym "qqSkolem"
val (ss, skolems) = skolemize frags
in
(skolem::ss, (skolem, x)::skolems)
end
val (ss, skolems) = skolemize frags
fun exp _ (e as (IDexp (IDENT ([], x)))) =
(case lookup (x, skolems) of
NONE => e
| SOME e' => e')
| exp _ e = e
val NIL = MLAstRewriter.noRewrite
val rw = #exp(MLAstRewriter.rewrite
{exp=exp,pat=NIL,decl=NIL,ty=NIL,sexp=NIL})
in
rw (parseMLExp (String.concat ss))
end
The skolemize function traverses the list of fragments, generating fresh "skolem variables" for each unquoted term. The ML parser then parses the resulting string. The last part (the exp function and MLAstRewriter.rewrite) simply uses the rewriter module to replace any instances of IDexp (an AST representing an ML variable reference) that contain skolem variables with their original, unquoted terms, and otherwise leave the expressions intact.

Update: Bawden suggests that quasiquotation (or at least the term "quasiquotation") is indeed due to Quine.

3 comments:

Sam Tobin-Hochstadt said...

This has an interesting relationship to macros, since the places you can unquote in your system are precisely the places that you could write an expression macro in Scheme.

Dave Herman said...

Interestingly enough, there are two axes on which you can extend this, and you get something more expressive (loosely speaking) than Scheme expression macros: the types of holes and the types of fragments.

For the first axis, you create a datatype of holes with multiple variants representing multiple types of holes:

datatype hole = EXPhole of exp | PAThole of pat

Then you extend parseQuasiML to substitute unquoted EXPholes in expression positions and unquoted PATholes in pattern positions, raising a "dynamic type error" for ill-typed fragments (e.g., ANTIQUOTE (PAThole _) in an expression position).

For the second axis, you extend parseQuasiML to take an extra argument representing which parser you want to use: an expression parser, pattern parser, declaration parser, etc. That way you can quasiquote different types of program terms.

Anonymous said...

Thank you. Recently, I had to deal with quotation/antiquotaion and your posting helped me a lot. Cheers, Wonseok