Monday, May 05, 2008

A functional visitor pattern

When your data definitions get big enough, it becomes Tedious and Error-Prone (oh my!) to write multiple operations over the same datatype, so it's useful to define a generic traversal. Here's a pattern I used recently for the ECMAScript Edition 4 reference implementation.

Lots of languages have mutually recursive AST definitions. For example, take a language with mutually recursive expressions Ast.expr, statements Ast.stmt, and declarations Ast.decl. Define an interface VISITOR:
signature VISITOR = sig
datatype cmd = Stop | Skip | Cont

type ('a, 'z) method = 'a * 'z -> 'z * cmd

type 'z visitor = { visitExpr : (Ast.expr, 'z) method,
visitStmt : (Ast.stmt, 'z) method,
visitDecl : (Ast.decl, 'z) method }

val foldExpr : 'z visitor * Ast.expr * 'z -> 'z
val foldStmt : 'z visitor * Ast.stmt * 'z -> 'z
val foldDecl : 'z visitor * Ast.decl * 'z -> 'z
end;
A visitor is a record of operations, one for each variant of AST nodes. (Alternatively, we could have defined a single type datatype ast = Expr of ... | Stmt of ... | Decl of ...;. But there's no need for the extra indirection.) Each operation takes a node and an intermediate result, and produces a new intermediate result along with a "traversal command" cmd.

This simple traversal language instructs the fold either to continue (Cont), to skip all descendants of the current node (Skip), or to abort the traversal entirely (Stop).

The implementation of the fold uses a few auxiliary functions to compute a list of child nodes for each node:
type children = Ast.expr list * Ast.stmt list * Ast.decl list

fun exprChildren (e : Ast.expr) : children = ...
fun stmtChildren (e : Ast.stmt) : children = ...
fun declChildren (e : Ast.decl) : children = ...
The fold itself is (internally) in CPS. At each node, there are three potential continuations: the empty continuation for aborting, the given continuation for skipping the current node's children, or a new continuation that first traverses the current node's children and then uses the given continuation.
type 'z cont = 'z -> 'z

fun cont (result : 'z, cmd : cmd)
(skipk : 'z cont)
(contk : 'z cont)
: 'z =
case cmd of
Stop => result
| Skip => skipk result
| Cont => contk result

fun foldExprK (v as { visitExpr, ... }, expr, init) k =
cont (visitExpr (expr, init))
(fn x => foldAllK v (exprChildren expr) x k)
k

and foldStmtK (v as { visitStmt, ... }, stmt, init) k =
cont (visitStmt (stmt, init))
(fn x => foldAllK v (stmtChildren stmt) x k)
k

and foldDeclK (v as { visitDecl, ... }, decl, init) k =
cont (visitDecl (decl, init))
(fn x => foldAllK v (declChildren decl) x k)
k

and foldAllK (v : 'z visitor)
((exprs, stmts, decls) : children)
(init : 'z)
(k : 'z cont)
: 'z =
let
fun foldList f (v, asts, init) k =
case asts of
[] => k init
| [ast] => f (v, ast, init) k
| ast::asts => f (v, ast, init)
(fn x => foldList f (v, asts, x) k)
in
foldList foldExprK (v, exprs, init) (fn result =>
foldList foldStmtK (v, stmts, result) (fn result =>
foldList foldDeclK (v, decls, result) k))
end
I originally had foldList defined at the top-level, in the same mutual recursion group with foldExprK et al, and got smacked by SML because that would require polymorphic recursion. Luckily I was able to work around it by placing its definition inside the body of foldAllK, but since I'd never run into SML's lack of polymorphic recursion before, it took quite a while to decipher the type error (with help from #sml).

Finally, the top-level functions seed the CPS functions with an initial continuation:
fun id x = x

fun foldExpr x = foldExprK x id
fun foldStmt x = foldStmtK x id
fun foldDecl x = foldDeclK x id
(Note that because of the value restriction, there's no possibility of refactoring these into point-free val-declarations.)

5 comments:

Vesa Karvonen said...

The foldList function does not seem to be mutually recursive with the other fold functions. So, it should not be defined in the same "mutual recursion group". I have observed that ML newbies tend to overuse "and" (in both SML and OCaml) in both value and type definitions. I would prefer that ML compilers would (optionally) give warnings for such redundant uses of "and".

Note that typing the code does not really require polymorphic recursion. It would be sufficient to destructure a mutual recursion group into strongly connected components and type them independently. Specifying typing that way would be somewhat more complicated (than what the spec in the Definition of SML) and might cause complications with lexically scoped type variables.

I personally very much like the way ML (both SML and OCaml) force one to be (more) explicit about recursion (than most languages). I find that it helps to increase my understanding of the code. I even prefer OCaml's explicit "let rec" for defining recursive functions to SML's "fun", which is implicitly recursive. I like the explicitness, recursive functions are quite rare, so it is not a great burden, and having non-recursive bindings by default makes it more convenient to layer definitions.

I would also recommend looking at Neil Mitchell's Uniplate library (read the paper). It can be used to define folds and I've already translated a part of Uniplate to SML as a part of my generics library. Here is an example of using Uniplate in SML.

Kenn Knowles said...

Interesting! I'll ponder this. From the masses of "Scrap your boilerplate" and related work I'd point you to work by Jeremy Gibbons:

The Essence of the Iterator Pattern
and before that,
Design Patterns as Higher-Order Datatype-Generic Programs

Kenn Knowles said...

Oops, I overlooked
The Essence of the Visitor Pattern by Palsberg.

Dave Herman said...

Thanks guys. I'm relatively familiar with all the SYB stuff, but my sense was that I could have finer-grained control if I wrote my own custom visitor. I'm also familiar with Palsberg's paper.

This implementation was somewhat inspired by Buchlovsky and Thielecke's A Type-Theoretic Reconstruction of the Visitor Pattern. The short story isn't all that surprising (though it is beautiful!) -- the visitor pattern is a fold, as is the Boehm-Berarducci representation of algebraic datatypes.

rduht said...

if wow gold and maple story mesos wow gold