Monday, July 11, 2005

Deriving an Efficient Tree Traversal, Part III

Previously:

Part I: Overview, specification, CPS
Part II: Contexts, data abstraction

A Small Refolding Simplification

We like our stack representation of contexts (after all, that's the well-known implementation technique we've been aiming for), so let's commit to that from now on. But there's a slight redundancy in the context implementation: notice that the select/c procedure takes a tree, a path, and a context, and the select*/c procedure takes a list of trees, a path, and a context. Those first two parameters for each procedure can also be encoded as context frames. Let's combine the two functions into a single function of just one parameter: the context.

(To distinguish this implementation from the previous version, and to emphasize our commitment to the initial algebra representation, we'll refer to contexts as "work lists" now, and context frames as "work items.")
;; item = (union tree (listof tree)) × path
(define-struct item (task path))

;; initial-work-list : tree × path → (listof item)
(define (initial-work-list tree path)
(list (make-item tree path)))

;; select/w : (listof item) → (listof tree)
(define (select/w items)
(if (null? items)
null
(let* ([item (car items)]
[task (item-task item)]
[path (item-path item)]
[items (cdr items)])
(cond
[(tree? task)
(cond
[(null? path) (cons task (select/w items))]
[(matches? task (car path))
(select/w (cons (make-item task (cdr path))
(cons (make-item (children task) path)
items)))]
[else
(select/w (cons (make-item (children task) path)
items))])]
[(null? task) (select/w items)]
[else (select/w (cons (make-item (car task) path)
(cons (make-item (cdr task) path)
items)))]))))
Now that we've folded both tree and (listof tree) arguments into the work list argument, there's just a single function. By now, we've completely gotten rid of all of the appends, so the implementation is linear.

(Update: Ryan found the removal of the appends unclear. The idea is that the context is only being applied to results that are either null or a single-element list, so we can optimize this away to a cons in one case and a simple tail-recursion in the other. Also, I claimed that this implementation is linear, but I don't think that's true: nodes can be visited multiple times since they can reappear in the work list.)

The only problems left are 1) that non-tail recursive call: (cons task (select/w items)), and 2) the use of lists instead of vectors. The first problem is easily solved with an accumulator. The second isn't really a problem, but imperative languages tend to be optimized for array manipulation, so in preparation we'll use an abstract stack datatype:
;; stack : a ... → (stackof a)
;; push : a × (stackof a) → (stackof a)
;; pop : (stackof a) → a × (stackof a)
;; stack-empty? : (stackof a) → boolean
The obvious implementation of stacks is as lists:
(define stack list)

(define (push x s)
(cons x s))

(define (pop s)
(values (car s) (cdr s)))

(define stack-empty? null?)

Work Lists with an Accumulator

The accumulator-passing-style implementation converts the recursive calls to tail calls:
(define (initial-work-stack tree path)
(stack (make-item tree path)))

;; select/a : (stackof item) → (stackof tree)
(define (select/a items)
(let loop ([items items] [result (stack)])
(if (stack-empty? items)
result
(let-values ([(item items) (pop items)])
(let ([task (item-task item)]
[path (item-path item)])
(cond
[(tree? task)
(cond
[(null? path) (loop items (push task result))]
[(matches? task (car path))
(loop (push (make-item task (cdr path))
(push (make-item (children task)
path)
items))
result)]
[else
(loop (push (make-item (children task) path)
items)
result)])]
[(null? task) (loop items result)]
[else (loop (push (make-item (car task) path)
(push (make-item (cdr task) path)
items))
result)]))))))
This version uses bounded control space, making only tail calls for any non-trivial control. It's easy to see how this would translate to a while-loop.

Imperative Implementation

At last, the final imperative implementation is easy to derive. Since each stack is used linearly in the previous version, we can replace it with a destructive update. So we change our stack representation to use a growable vector datatype (thanks to Jacob for the Scheme implementation): 1
(define stack gvector)

(define (pop v)
(let ([len (gvector-length v)])
(let ([x (gvector-ref v (- len 1))])
(gvector-remove! v (- len 1))
(values x v))))

(define (push x v)
(gvector-insert! v x)
v)

(define (stack-empty? v)
(zero? (gvector-length v)))
We'll represent paths as indices into a fixed path vector, and the child lists of tree nodes will be vectors as well. So we extend work list items to be triples with a task (either a tree or a vector of trees), an optional index into the tree vector (hidden invariants--yuck!), and a path index.
;; tree = symbol × (vectorof tree)
;; path = (vectorof symbol)
;; task = (union (tree × #f × nat) ((vectorof tree) × nat × nat))
And here it is, in all its horrifying glory, our final imperative implementation:
(define-struct item (task index path) #f)

(define (select/! tree path)
(let ([path-len (vector-length path)])
(let loop ([items (stack (make-item tree #f 0))]
[result (stack)])
(if (stack-empty? items)
result
(let-values ([(item items) (pop items)])
(let ([task (item-task item)]
[index (item-index item)]
[path-i (item-path item)])
(cond
[(tree? task)
(cond
[(>= path-i path-len)
(loop items (push task result))]
[(matches? task (vector-ref path path-i))
(loop (push (make-item task #f (+ path-i 1))
(push (make-item (children task)
0
path-i)
items))
result)]
[else
(loop (push (make-item (children task)
0
path-i)
items)
result)])]
[(>= index (vector-length task))
(loop items result)]
[else
(loop (push (make-item (vector-ref task index)
#f
path-i)
(push (make-item task
(+ index 1)
path-i)
items))
result)])))))))
Translation to your favorite imperative language is now trivial, and left as an exercise.
1 With the simple addition of a gvector-remove! operation.