Monday, September 29, 2008

Representing the cost of control

Every once in a while I play around with this idea for the design of a web language (server-side).

The nice thing about using first-class continuations to implement a web library is that you can write your code in direct style, but a big down-side is that you still have to figure out how to manage the storage of those continuations. But the alternative of essentially representing the control with program data structures, i.e. with explicit representations of the state of the computation, is really painful.

So I've thought of taking a page from Haskell's playbook: what if you separated a web language into two portions? One language would be for writing the "driver" of the program, where you can call send/suspend and computations may be suspended and restarted an arbitrary number of times. The other language would be "pure" (well, let's say full Scheme, but no send/suspend). In other words, replace Haskell's IO monad with the Web monad. As with Haskell, code in the Web monad can call pure code, but not vice versa.

The benefit of this would be that you could still write in direct style, but you've made explicit the portions of the program where control will cost you. Then you could play with different ways of managing that cost--expiration of continuations, back-off for expiration, etc.--by playing with the design of the Web monad, which wouldn't affect the pure language.

JavaScript completion value: pros and cons

In JavaScript the completion value is the result value of a computation, which may result even from evaluating a statement. Simple statements like
2 + 2;
have result values (in this case 4), but so do complex statements like
for (var i = 0; i < n; i++)
Notice that if n is less than [edit: or equal to] 0 in this example, there won't be any computation at all, in which case the statement can't really produce any reasonable value.

I've just figured out how to articulate an intuition I've long felt about the completion value. The great thing is that it mitigates some of the awfulness of statements: even imperative code can still produce values! This means you can have some of the syntactic convenience of statements for writing effectful code, while still producing results. But the thing that always made me uncomfortable was how code like the for-loop above screws with the idea of a tail call. Because the for-loop may or may not produce a value, there's a sort of implicit return happening after the loop.

There might be tricks you could pull using something like continuation marks to allow tail position inside of loops, but I think a simpler answer in a JavaScript with proper tail calls would be to say that loops do not preserve tail position. So essentially tail position would include the last statement in a block, the arms of an if statement, and the arms of a conditional expression, and that's pretty much it. Inside a try-block, a loop, or a switch (because of the reliance on break) there simply wouldn't be any tail positions.

Note that a) JavaScript doesn't have tail calls, and b) the only places where you can even observe the completion value of a statement are at a REPL or with eval. But I think JavaScript could have proper tail calls (potentially), and I've always wondered whether the completion value should remain a hack isolated to eval or actually see its way into the design of other features for JavaScript.

Friday, September 26, 2008

Most productive conference trip ever

I can confidently say I've never accomplished so much actual work at a conference before. I actually solved a research problem during downtime while at ICFP this week. As a fledgling PhD student I used to take advantage of my boundless free time to think about problems for hours on end, often working into the night. As time goes by I have less and less uninterrupted free time, so I guess that's making me more efficient at using my spare cycles in very short bursts. I guess Butler Lampson would classify that under speculative execution.

Monday, September 22, 2008

Debugger as library

The great thing about printf-debugging is that it's got about as low a barrier to entry as possible (unless you're using Haskell--don't get me started). You have implicit access to the program's control flow and I/O and the function is essentially always there. When you want to use a debugger, you usually have to search for a good one, download and install it, learn how it works, figure out how to set breakpoints, learn the language of breakpoint conditions, figure out how to inspect locals and evaluate expressions, etc. etc.

What if debugging was just available as a library? What power would you need? I think PLT Scheme is uniquely suited to making this extremely practical. It's easy to implement a programmatic breakpoint: just call/cc and raise an exception with the captured continuation. You immediately have access to a nice language of boolean predicates for determining breakpoint conditions (Scheme), as well as a language to evaluate expressions with access to the locals (also Scheme). But PLT Scheme also provides continuation marks, which allow you to leave little bread-crumbs through the control flow of a program. What I'd really love to do is use this to allow the programmer to annotate interesting points in the control and connect this up to the DrScheme GUI libraries so that the programmer can do something like:
> (define bp (debug <expression>))
> bp
> (show-control bp)
and have a window pop open that displays a stack trace of the suspended program.

I'm working on a PLaneT library to make this just about as low-tech and seamless as printf-debugging. You'll just need to throw in one extra line somewhere in the module:
(require (planet dherman/debug))
and this will allow you to throw in breakpoints and debug programs much like you could with a full debugging IDE, only with the directness and intra-linguistic style of printf.

Sunday, September 21, 2008

Units of measure in F#

I'm at ICFP and just watched Andrew Kennedy give a delightful invited talk about the support for units of measure in F#. It was a very impressive example of work that's well-designed all the way through, from theory to practice. In essence, it allows you to say things like
let gravity = 9.8<m/s^2>
and the type system tracks not only the type float but also the unit of measure m/s^2 and prevents unit-mismatch errors in the program.

One of the questions I had was based on my brief experience several years ago with statistical natural language processing, where a lot of the computation we did was in logspace. In a situation like that you would like to keep track of both the ordinary units of measure as well as the fact that everything is in logspace.

Dan Grossman took this a step further and pointed out that you want your arithmetic operations to have different types when you're working in logspace. For example, while the + operator usually has a type like:
+: float<'u> -> float<'u> -> float<'u>
in logspace it should instead have the type:
+: float<log 'u> -> float<log 'v> -> float<log ('u · 'v)>
Now, I don't know how much support there is for "unit constructors," but the F# unit system does allow you to use units with custom abstract data types. According to Andrew, F# also supports operator overloading and ties it in with the unit system. So it might in fact be possible to express the logspace version of the + operator!

This integration between overloading and units could be a nice solution to the issue of different representations of numbers. Libraries like Java's BigInteger are a usability disaster, but we only have one syntax for numerals and operators. Tying a type-and-unit system in with overloading could allow you to use various machine representations (fixnums, flonums, IEEE754r decimal, bignums, exact rationals, ...) by annotating literals and variables, and type reconstruction could help with the operators. I'm not a big fan of unification-based inference, but in this case I wonder if simple structurally recursive reconstruction would be enough in practice.

Update: Ken Shan points out that only the implementation would use +, but you really would want to overload the * operator. Conceptually you're doing a multiplication.

Monday, September 15, 2008

True unions, revisited

I've spoken about true unions (or ad-hoc unions, or non-disjoint unions) sort of abstractly before, but Typed Scheme has 'em, and they're great.

In my recent experiences with Haskell and ML, the amount of unnecessary and distracting injection and projection that goes on starts to bloat code to the point of illegibility. Disjoint unions in ML and Haskell provide two orthogonal pieces of functionality in one feature. Typed Scheme separates these two: struct types give you disjointness, and unions give you, well, unions. It's up to you to guarantee that you don't create unions with overlap, but so far I haven't seen that cause any problems.

Then you can build disjoint unions on top of that, as I have in my new types.plt PLaneT package:
(define-datatype Expr
[Var ([name : Symbol])]
[Abs ([var : Symbol] [body : Expr])]
[App ([rator : Expr] [rand : Expr])])
macro-expands to:
(define-struct: Var ([name : Symbol]))
(define-struct: Abs ([var : Symbol] [body : Expr]))
(define-struct: App ([rator : Expr] [rand : Expr]))
(define-type-alias Expr (U Var Abs App)))

Scheme record syntax

I think if I were designing my own Scheme I'd tighten up some of the aspects of PLT's structs into a smaller syntactic footprint. In PLT, when you say
(define-struct thing (foo bar))
it binds thing to an identifier carrying static information about the struct type (which is needed for other macros such as match), struct:thing to a runtime value containing introspective information about the struct type, and make-thing to a constructor for the struct type.

I think I would just bind the single identifier thing for all three purposes: statically, it would carry the static information for match and friends; dynamically, it would be the constructor. (I never liked the imperative-sounding "make-" prefix.) For dynamic introspection, I would probably include a library form where you could say (struct foo) to reflect the struct type information into a dynamic value.