Monday, July 11, 2005

Deriving an Efficient Tree Traversal, Part II

In the last episode, we tried representing control with continuations. But continuations are in fact just one implementation of an abstract datatype of contexts. To understand the kinds of contexts we encounter in this problem, let's look at all the ways continuations get constructed in the CPS version:
(define (select/k tree path)
...
[(matches? tree (car path))
(select/k tree (cdr path)
(lambda (ts1)
(select*/k (children tree) path
(lambda (ts2)
(k (append ts1 ts2))))))]
[else (select*/k (children tree) path k)]))

(define (select*/k trees path k)
(cond
...
[else (select/k (car trees) path
(lambda (ts1)
(select*/k (cdr trees) path
(lambda (ts2)
(k (append ts1 ts2))))))]))
In essence, there are two types of context frames, describing the work we have left to do: frames containing a tree and a path, describing a future application of select/k, and frames containing a list of trees and a path, describing a future application of select*/k. The bolded parts of the continuations are the only parts that rely on dynamic data; the rest is just boilerplate, which can be abstracted into the datatype of contexts.

Abstract Datatype: Contexts

The abstract datatype of traversal contexts includes (as always) the initial context, as well as compound contexts containing a base context and a context frame. Context frames in turn are either tree frames, containing a tree and a path, or tree list frames, containing a list of trees and a path. That is:
context ::= [] | context-frame ⋅ context
context-frame ::= tree × path | (listof tree) × path
We need two operations on contexts: one to build compound contexts, and one to apply an existing context.1
cons-context : context-frame × context → context
apply-context : context → (listof tree)
Our latest version of the program can now operate on these datatypes independent of their representation:
;; select/c : tree × path × context → (listof tree)
(define (select/c tree path c)
(cond
[(null? path) (apply-context c (list tree))]
[(matches? tree (car path))
(select/c tree (cdr path)
(cons-context (make-context-frame (children tree) path)
c))]
[else (select*/c (children tree) path c)]))

;; select*/c : (listof tree) × path × context → (listof tree)
(define (select*/c trees path c)
(cond
[(null? trees) (apply-context c null)]
[else (select/c (car trees) path
(cons-context (make-context-frame (cdr trees) path)
c))]))

Final Algebra Representation

We can recover the CPS implementation by representing the constructors of the context datatype (cons-context, make-context-frame, initial-context) as functions, and the observer of the datatype (apply-context) as function application. This is the so-called final algebra representation of an abstract datatype: all the intelligence is in the constructors.
;; context-frame = (context → context)
;; context = (listof tree) → (listof tree)


;; make-context-frame : (union tree (listof tree)) × path → context-frame
(define (make-context-frame work path)
(cond
[(tree? work) (lambda (k)
(lambda (ts1)
(select/c work path
(lambda (ts2)
(k (append ts1 ts2))))))]
[else (lambda (k)
(lambda (ts1)
(select*/c work path
(lambda (ts2)
(k (append ts1 ts2))))))]))

;; cons-context : context-frame × context → context
(define (cons-context frame context)
(frame context))

;; apply-context : context × (listof tree) → (listof tree)
(define (apply-context k trees)
(k trees))

(define initial-context identity)

Initial Algebra Representation

Alternatively, we can represent the constructors of the abstract datatype using a simple disjoint union datatype, and put all the intelligence in the observer. This is the initial algebra representation. The new representations of contexts and frames are:
;; context = (listof context-frame)
;; context-frame = (union tree (listof tree)) × path


(define-struct context-frame (work path))

(define cons-context cons)

(define initial-context null)

;; apply-context : context × (listof tree) → (listof tree)
(define (apply-context ls trees)
(cond
[(null? ls) trees]
[(tree? (context-frame-work (car ls)))
(append trees
(select/c (context-frame-work (car ls))
(context-frame-path (car ls))
(cdr ls)))]
[else
(append trees
(select*/c (context-frame-work (car ls))
(context-frame-path (car ls))
(cdr ls)))]))
This is looking promising: now we can represent contexts with simple tuples and unions, and without the use of higher-order functions.

To be concluded...
1 In general, the return type of applying a context could be polymorphic, but for our purposes, all computations will return lists of tree nodes.

3 comments:

Felix said...

Going from a traversal over a datatype to the structure of the context frames reminds me a lot of Connor McBride's Derivatives of Regular Types.

See diff.ps.gz by Connor himself, or the later work as discussed on LtU.

Dave Herman said...

Thanks for making the connection. Another paper that seems related is Huet's Linear Contexts and the Sharing Functor (slides), which is very interesting but so far I haven't understood. Although it's not in general possible to decide when you can replace operations on a functional data structure with destructive updates (as I do in this derivation), I wonder if starting from a simple datatype, a compiler could derive efficient traversals by choosing to implement them linearly. This may be one of the topics in Huet's paper, but I'm not sure.

Felix said...

A "problem" with the analogy (which I found while I was trying to find the derivative of the datatype describing your DOM tree) is that his one-hole contexts may not exactly match up with (y)our evaluation contexts.

For example, his (deriv (w.r.t. X) of [Listof X]) yields a [Listof X] times [Listof X]. This is appropriate for Conor's context; the one-hole context for a list-element is a pair of lists (the prefix and the suffix). See page 7.

However, the evaluation context when traversing a [Listof X] is not a pair of lists, but instead just (isomorphic to) a [Listof X]; This is because in the traversal, the "suffix" is part of the redex, not the context.

Perhaps I am making a mistake in relating these particular two problems. Reducing a whole list is different from looking only at the individual elements (since reduction allows us to combine elements together, rather than looking at them in isolation).