Monday, November 27, 2006

Does type inference have to suck?

Type inference with tools like Haskell and SML obviously sucks. Hard. The error messages are useless. I have to pinpoint my errors manually, narrowing down the error by commenting out half the program at a time. Any time you're using binary search to find a bug, it's safe to say your debugging tools aren't helping you.

There's been a fair amount of pretty hardcore research on this. But does it really need to be hard? If I could just have some help from a modern IDE, it seems like type inference errors should be perfectly adequate. At the very least, the type inference engine could keep track of source location for every type constraint it acquires, and then when it finds conflicts, the GUI could simply draw an arrow between the two conflicting expressions. If it doesn't know which one is wrong, fine! I'll know which one is wrong, but it has to show me both!

SML/NJ's approach of picking one at random to blame is pure madness.

Another useful tool would be for the IDE to show me the inferred type of any expression in the program, so that if it's not manifest, I can at least make it manifest. Annotating one or two expressions at a time and recompiling is a colossal waste of time.

I've got plenty of criticisms for Eclipse, but if there's one thing we can learn from it, it's that compiled languages don't have to force the programmer into the edit-recompile cycle. It's perfectly possible these days to run the compiler incrementally and continuously. (Conversely, you could say that there's no reason why interpreted languages can't be incrementally compiled.)

I know I'm not the first to point this out. Why hasn't anyone in the statically-typed functional languages community done this? You don't need to reinvent the IDE, just build some plug-ins for emacs or Eclipse.

Wednesday, November 15, 2006

Dynamically optimizing dynamic dispatch

Robert O'Callahan just explained to me the basic optimization approach for dynamically typed languages that I've been hearing about lately. It's the approach used in the Strongtalk VM, recently mentioned by Avi Bryant and Gilad Bracha. At least, I think that's what they were talking about, but I didn't understand it until Robert explained it to me.

Performing a dynamic dispatch compiles into a branch deciding which class's method to invoke:
if (class == C) {
else if (class == B) {
else ...
With some runtime profiling, you can figure out which branch is most likely to occur and dynamically rewrite the branch so that the common case comes first in the branch.
if (class == B) {
else if (class == C) {
else ...
This then feeds into the hardware branch prediction.

With branch prediction, the hardware will keep some small history to help it predict which way to take. But if the particular branch in question is not in the history (say, because it's part of a loop with a large body whose branch history is bigger than the hardware branch history table), a conditional branch is an either-or; the branch predictor will still pick one. So as long as you structure the conditional so that the branch predictor will always guess the common path, even absent any branch history, it will still predict correctly in the common case.

This is something that you can't do with a vtable, because branch history tables are smaller for computed jumps (since each entry is a pointer rather than a single branch-taken/branch-not-taken bit), and because the hardware predictor can't guess right when it doesn't have a history.

You can then do even better by actually inlining the method body directly into the branching code, so it looks like
if (class == B) {
// B::foo method body
else ...
and then even better, by merging several inlined methods in sequence. With some simple dataflow analysis on locals (between mutations), you can eliminate redundant checks. That is, code that looks like:
if (x.class == B) {
// B::foo method body
else ...
if (x.class == B) {
// B::bar method body
else ...
gets optimized to:
if (x.class == B) {
// B::foo method body
// B::bar method body
else ...

Monday, November 13, 2006

Non-native shift/reset with exceptions

Filinski's implementation of shift and reset has undesirable behavior in a language with exceptions. Consider the following example:
(reset (lambda ()
(+ (shift (lambda (k)
(with-handlers ([symbol? (lambda (exn) 100)])
(k 1))))
(raise 'boo))))
You'd expect this program to evaluate to 100, but with Filinski's implementation, it raises the uncaught exception 'boo. The same example with MzScheme's native delimited continuations yields 100.

Sunday, November 12, 2006

Filinski's implementation of shift and reset

With the nice specification of shift and reset via local and global continuations, it's easier to understand Filinski's implementation in terms of a host language with undelimited continuations and mutable state. He uses a mutable cell to represent the global continuation:
(define meta-k (lambda (x) (raise 'missing-reset)))
The implementations of shift and reset push and pop representations of the local continuation onto this cell.
(define (reset thunk)
(call/cc (lambda (k)
(let ([meta-k0 meta-k])
(set! meta-k (lambda (r)
(set! meta-k meta-k0)
(k r)))
(let ([v (thunk)])
(meta-k v))))))
The reset function simulates pushing the local continuation onto the global one by updating it with a function that, when invoked, both pops the local continuation off (by simply mutating the global continuation back to its previous value) and jumps to the local continuation (by invoking the host language continuation captured by call/cc)--see the next-to-last eval rule. To ensure that the meta-continuation really does act like the global continuation, it is invoked on the result of the thunk; so even if the thunk never invokes a control effect, control eventually jumps to the initial global continuation.
(define (shift f)
(call/cc (lambda (k)
(let ([v (f (lambda (v)
(reset (lambda ()
(k v)))))])
(meta-k v)))))
The implementation of shift invokes its function argument and then forces a jump back to the global continuation afterwards--see the last eval rule. Even though the semantics suggests that the local continuation is immediately abandoned before evaluating the function argument, this implementation is morally equivalent; it doesn't bother to throw away the local continuation until after the function returns, but it still throws it away. The local continuation argument passed to the function argument is represented as a function that pushes the current local continuation onto the global continuation before jumping to the current local continuation--see the last cont1 rule.

Now, when taking into account proper tail recursion and space efficiency, this implementation is spectacularly wrong. Furthermore, stack reflection via language features like continuation marks will render this implementation inequivalent to native implementations of delimited control.

Shift and reset via two-continuations

I did a little pleasure-reading while I was out of town this weekend. Biernacki, Danvy, and Shan have a paper on delimited continuations with a nice formulation of shift and reset based on Filinski's specification. Filinski's approach is to evaluate every expression in two continuations. The first is the local continuation, representing the dynamic context up to the nearest delimiter. The second is the global continuation (which he calls the "meta-continuation"), representing everything above the delimiter. The nice thing about separating these two is that control effects are only able to affect the local continuation; the global continuation is completely off-limits to the user program.

Filinski uses a two-continuation version of CPS for his specification, but Biernacki et al give a nice CEK machine semantics (which I have a certain fondness for). The two continuations are represented as evaluation contexts, which themselves are represented as composable sequences of context frames.

The first rules specify the evaluation of a term. The interesting rules are the last two: installing a prompt promotes the current local continuation to "inviolable" status by placing it in the global continuation; and capturing the current local continuation removes it and stores it in the environment.
eval <x, e, C1, C2>     → cont1 <C1, e(x), C2>
eval <λx.t, e, C1, C2> → cont1 <C1, [x,t,e], C2>
eval <t0 t1, e, C1, C2> → eval <t0, e, C1 o (t1 [])^e, C2>
eval <#t, e, C1, C2> → eval <t, e, [], C2 o C1>
eval <Sk.t, e, C1, C2> → eval <t, e[k := C1], [], C2>
The second set of rules represent returning a value to the local continuation. The most interesting rules is the last one, which represents applying a captured continuation: it simulates "returning" the operand to the captured continuation and saves the current local continuation in the global continuation, so as to preserve the current state of the computation.
cont1 <[], v, C2>                → cont2 <C2, v>
cont1 <C1 o (t [])^e, v, C2> → eval <t, e, C1 o (v []), C2>
cont1 <C1 o ([x,t,e] []), v, C2> → eval <t, e[x := v], C1, C2>
cont1 <C1 o (C1' []), v, C2> → cont1 <C1', v, C1 o C2>
The last rules represent returning values to the global continuation. If it has some local computing to do, that gets placed back into the local continuation, and otherwise it returns the final value.
cont2 <C2 o C1, v> → cont1 <C1, v, C2>
cont2 <[], v> → v
This is a lovely presentation of delimited continuations. I'd also never thought of the connection between delimited continuations and two-continuation backtracking (having still never read Filinski's work).

Thursday, November 09, 2006

Skolem meets Quine

How's this for the crazy hack of the day? I was writing a little code generator for SML (because as a Schemer in ML-land, it was only a matter of time before I'd start trying to write macros) and I wanted to use quasiquotation for constructing code. (I'm sure Lisp was the first programming language with quasiquotation, although the concept originated from logic, at least dating back to Quine.)

SML/NJ includes a quasiquotation facility, but you have to write your own parser for the object language. In this case, my object language was ML itself. So I wanted to reuse the ML parsing libraries that come with SML/NJ. But a quasiquoted source expression is a list of "fragments"--either a (quoted) string or an (unquoted) term:
datatype 'a frag = QUOTE of string | ANTIQUOTE of 'a
So I needed a way to extend the existing parser to be able to parse not just raw ML source, but ML with holes in it.

My solution is to generate fresh variables at the holes (analogous to "Skolem constants" in a type inference algorithm) to 1) generate a single, parseable string; 2) parse that string using the existing parser; and then 3) replace instances of the fresh variables in the resulting AST with their original unquoted terms. Luckily the parser that comes with SML/NJ (from the MLRISC library) has a generic term-rewriting library so this last step was easy.

Here's the entire parser:
fun parseQuasiML frags =
let fun skolemize [] = ([], [])
| skolemize ((QUOTE s)::frags) =
let val (ss, skolems) = skolemize frags
(s::ss, skolems)
| skolemize ((ANTIQUOTE x)::frags) =
let val skolem = gensym "qqSkolem"
val (ss, skolems) = skolemize frags
(skolem::ss, (skolem, x)::skolems)
val (ss, skolems) = skolemize frags
fun exp _ (e as (IDexp (IDENT ([], x)))) =
(case lookup (x, skolems) of
NONE => e
| SOME e' => e')
| exp _ e = e
val NIL = MLAstRewriter.noRewrite
val rw = #exp(MLAstRewriter.rewrite
rw (parseMLExp (String.concat ss))
The skolemize function traverses the list of fragments, generating fresh "skolem variables" for each unquoted term. The ML parser then parses the resulting string. The last part (the exp function and MLAstRewriter.rewrite) simply uses the rewriter module to replace any instances of IDexp (an AST representing an ML variable reference) that contain skolem variables with their original, unquoted terms, and otherwise leave the expressions intact.

Update: Bawden suggests that quasiquotation (or at least the term "quasiquotation") is indeed due to Quine.

Temporarily violating invariants

Standard ML doesn't allow you to put anything on the right-hand side of a val rec declaration except for a lambda expression. This means that you can't create a cyclic data structure unless you use a ref cell with an option type inside it, temporarily create the cell to be empty, and then immediately mutate it to SOME of its contents. Unfortunately, this means that a) you always have to refer to that value by dereferencing it, and b) you can't enforce that the ref cell won't be assigned to later.

I found an old discussion on about this, where Xavier Leroy explains that you could in fact relax the restriction to allow not just syntactic functions but variable references, constants, and constructor applications on the right-hand side of a val rec, since none of these expression forms cause side effects. The underlying language implementation would temporarily create an empty data structure and mutate it, but this would never be observable to the program. Leroy claims you need all sorts of crazy math to prove it, but it should actually be simple to prove operationally.

Just to free associate for a moment, this also reminds me of dynamic code-rewriting techniques, such as JIT compilation or dynamic optimization based on runtime profiling. If you can prove that a dynamic drop-in replacement is observationally equivalent, you can push the temporary mutation down under the model/hood and make the object language safer.