Friday, May 15, 2009

Lexical syntax thought

We reserve the single-quote character (#\') in the Scheme lexical syntax so that it can't be used anywhere in an identifier. If you stick it in the middle of an identifier (e.g., foo'bar) it lexes as three tokens ("foo", "'", "bar"). But I don't recall ever seeing anyone make use of this fact; they always stick a space before the quote.
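To spell out what that means, here's a quick illustration of the current behavior using the PLT reader:

    ;; The embedded quote ends the identifier, so the list reads as the
    ;; symbol foo followed by (quote bar) -- three tokens, not one name.
    (read (open-input-string "(foo'bar)"))
    ;; => the two-element list (foo (quote bar)), i.e. (foo 'bar)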

I really miss my precious "prime" character in ML and Haskell identifiers (let val state' = munge state ...). Scheme prides itself on a permissive lexical syntax for identifiers. Why couldn't we allow quote characters in all positions of an identifier other than the first?

P.S. I know I can use the Unicode prime character (U+2032). Sorry, not till I get my Unicode keyboard.

Follow-up on effect masking and backtracking

A follow-up on yesterday's post about backtracking with state: an anonymous commenter reminded me that persistent data structures are a useful and lightweight approach (compared to, e.g., first-class stores). In fact, probably my favorite paper from ESOP 2008 was Conchon and Filliâtre's Semi-Persistent Data Structures, which addresses exactly this problem. Rather than introducing an entirely transactional semantics into the language, they provide just a data structure with a transactional interface.
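As a rough illustration of why persistent data helps here (a toy sketch, not the paper's semi-persistent structure):

    ;; With a persistent environment (here just an association list),
    ;; "undoing" tentative bindings means going back to the older value;
    ;; nothing has to be rolled back explicitly.
    (define (extend env var val) (cons (cons var val) env))

    (define env0 (extend '() 'x 1))    ; state before entering a branch
    (define env1 (extend env0 'y 2))   ; tentative binding inside the branch
    ;; if the branch fails, just resume with env0 -- env1 is simply dropped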

Continuation marks in exceptions

Rethrowing caught exceptions is an important idiom. In particular, there are lots of cases where the way to handle an exception conditionally is to catch it, inspect it, and rethrow it if it isn't one you want to handle. In ML, I believe this is the actual definition of pattern matching on particular exception types: the pattern matching is syntactic sugar for inspecting the caught exception and conditionally rethrowing it. In standard JavaScript it's the only way to catch particular exception types, because there is no standard conditional handling form.
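Written out in PLT Scheme terms (where with-handlers could of course dispatch on a predicate directly), the rethrowing idiom looks roughly like this; do-io and recover are placeholders:

    ;; Catch everything, handle the one kind of exception we care about,
    ;; and re-raise the rest.
    (with-handlers ([(lambda (e) #t)
                     (lambda (e)
                       (if (exn:fail:filesystem? e)
                           (recover e)    ; the case we actually handle
                           (raise e)))])  ; rethrow anything else
      (do-io))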

A useful feature of some exception systems, like Java's, is the association of a stack trace with an exception. But rethrowing an exception involves a choice: do you want the stack trace associated with the original continuation that threw the exception, or with the one that rethrew it? There are genuine use cases for both, yet in Java the implementers have to choose one or the other for you.

PLT Scheme gives you the flexibility to choose for yourself. Exceptions are not special values in Scheme; there's an orthogonal way of reifying the control state. The PLT runtime stores stack trace information in a special continuation mark, so if you grab the current continuation mark set, you've got your hands on a value that provides DrScheme's IDE with all the information it needs to visualize the continuation.

The standard exception types in PLT Scheme are records with a field for the current continuation mark set. So when you conditionally rethrow an exception, you can either keep its version of the continuation marks, or functionally update it with the current continuation marks and throw that instead.
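Concretely, a handler can go either way; keep-original-trace? and risky-operation are placeholders:

    ;; Either rethrow the exception as-is (keeping the marks captured when
    ;; it was raised), or rebuild it with the marks of the rethrow site.
    (with-handlers ([exn:fail? (lambda (e)
                                 (if (keep-original-trace?)
                                     (raise e)
                                     (raise (make-exn:fail
                                             (exn-message e)
                                             (current-continuation-marks)))))])
      (risky-operation))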

Afterthought: I think Java might have chosen something like a policy where an exception value saves its stack trace at the point where it's created. This gives you the ability to do both by either rethrowing the same exception (to keep the original stack trace) or wrapping the exception with a new one (to use the new stack trace). I still prefer the PLT Scheme approach, which makes the stack trace a first-class value and consequently more explicit.

Thursday, May 14, 2009

Effect masking vs. purity

I've been working for a while on an algorithm that involves two effects: mutation and backtracking. Implementation strategies for the two effects seem to have different economics. I'm working in the context of PLT Scheme here, but I suspect my observations would be similar in other non-pure languages.

For backtracking, I could implement the algorithm in a pure way using two continuations or the list monad. Or I could just use the built-in exceptions and exception handlers of PLT Scheme. For mutation, the choice is between store-passing or the built-in mutation of Scheme.

I've tried abstracting these two effects out into a single monad for a completely pure implementation. There are downsides to this approach, though. For example, debuggers and stack traces are completely useless since most of the program control flow goes through unit, bind, and the myriad intermediate closures created by such a higher-order style of programming. Besides, an excessively higher-order style imposes real readability costs on a program. So for implementing backtracking, I find exceptions to be a more straightforward approach.

But as it turns out, the algorithm also needs to roll back mutations when it backtracks. So I simply can't use Scheme's built-in mutation. I think this is one of the things that make mutation so unwieldy--there's typically no corresponding "masking" operation for a destructive update. Exceptions have handlers, continuations (sometimes) have delimiters--even some I/O features come with effect masks: for example, in PLT Scheme you can override the current-output-port with a string port and capture a program's output. All of these allow you to bound the effects a computation can have, to put a sort of dynamic safety net around it. Not so for set!, vector-set!, set-box!, or set-record-field!.
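The output-port example mentioned above, spelled out:

    ;; Masking an I/O effect: redirect current-output-port to a string
    ;; port for the extent of the computation and capture what it printed.
    (define captured-output
      (let ([out (open-output-string)])
        (parameterize ([current-output-port out])
          (display "output from the masked computation"))
        (get-output-string out)))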

So for mutation, I'm sticking with explicit store-passing--not in monadic style. That seems to be the reasonable middle ground for my purposes.
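By explicit store-passing I mean roughly this shape (a toy sketch, not the actual code):

    ;; The store is just a persistent association list threaded by hand.
    (define (store-lookup store key)
      (cond [(assq key store) => cdr]
            [else #f]))

    (define (store-extend store key val)
      (cons (cons key val) store))

    ;; Each step of the algorithm takes the store and returns both its
    ;; result and the store to continue with, so backtracking can resume
    ;; from any earlier store value.
    (define (step store)
      (values 'some-result (store-extend store 'x 1)))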

Monday, May 11, 2009

Representing failure

I'm working on an algorithm that uses backtracking implemented with exceptions and handlers (rather than, say, explicit continuations). I just noticed that, for debugging, it's nice to represent the failure exceptions as records that contain not just the data about the operation that failed but also an optional list of other failure exceptions. In the disjunction cases, I can save the intermediate failures and, if they all fail, include them in the resulting failure record. This provides me with a nice data structure representing the control flow of the failed search tree. It required modifying only about three lines of code, and the resulting data structure is easier to analyze than a manual debugging sequence. (I hate working with debuggers, even good ones.)
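The shape of the thing is roughly this (names made up for illustration, not the actual code):

    ;; A failure record carries the operation that failed plus the
    ;; sub-failures accumulated from a disjunction (if any).
    (define-struct failure-info (operation sub-failures))

    ;; Try each alternative in turn; if every one fails, raise a failure
    ;; that records all of the intermediate failures.
    (define (try-disjuncts operation alternatives)
      (let loop ([alts alternatives] [failures '()])
        (if (null? alts)
            (raise (make-failure-info operation (reverse failures)))
            (with-handlers ([failure-info? (lambda (f)
                                             (loop (cdr alts) (cons f failures)))])
              ((car alts))))))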