The Little Calculist: May 2006

Tuesday, May 30, 2006

Customized literals in ECMAScript 4

One neat little feature that has come up in the design of ECMAScript Edition 4 is the ability to define conversions between datatypes. Class-based OO is nice for some purposes, and is certainly popular, but the ability to construct data with syntactic literals is convenient, and it usually disappears when you "upgrade" to a bondage-and-discipline language like Java.

In ES4, it looks like you'll be able to define a sort of customized literal syntax for any class by means of an automatic conversion method. For example, imagine the following code:

var kleinBlue : Color = [ 0x00, 0x2F, 0xA7 ]
...
kleinBlue.toCMYK()
...

We can define a Color class with whatever methods we want (such as toCMYK) while still constructing one with a simple, lightweight array literal syntax.

We achieve this by defining a conversion method on the Color class:

class Color {
    ...
    type RGB = [ UInt, UInt, UInt ]
    function to(triple : RGB) {
        type switch (triple) {
            case ([ r, g, b ] : RGB) {
                return new Color(...)
            }
        }
    }
}

Now any context expecting the type Color can also accept an RGB triple and automatically convert it to an instance of Color.
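For anyone who wants to play with the idea in an existing engine, here is a minimal sketch in plain JavaScript. It is an emulation under stated assumptions, not the ES4 design: plain JavaScript has no type-annotated contexts, so the conversion (a hypothetical static method Color.to) has to be called explicitly, and the toCMYK formula is a naive one chosen only so the example has a method to call.

```javascript
// Plain-JavaScript sketch of the conversion idea. Color.to and the CMYK
// formula are illustrative assumptions, not the ES4 proposal's semantics.
class Color {
    constructor(r, g, b) {
        this.r = r;
        this.g = g;
        this.b = b;
    }

    // Hypothetical stand-in for the ES4 conversion method: accept either an
    // existing Color or an [r, g, b] triple of integers in 0..255.
    static to(value) {
        if (value instanceof Color) {
            return value;
        }
        const isByte = (n) => Number.isInteger(n) && n >= 0 && n <= 255;
        if (Array.isArray(value) && value.length === 3 && value.every(isByte)) {
            const [r, g, b] = value;
            return new Color(r, g, b);
        }
        throw new TypeError("cannot convert value to Color");
    }

    // Naive RGB-to-CMYK conversion, included only for illustration.
    toCMYK() {
        const max = Math.max(this.r, this.g, this.b) / 255;
        if (max === 0) {
            return [0, 0, 0, 1];
        }
        const k = 1 - max;
        const channel = (v) => (1 - v / 255 - k) / (1 - k);
        return [channel(this.r), channel(this.g), channel(this.b), k];
    }
}

// Without type-annotated contexts, the conversion is called explicitly.
const kleinBlue = Color.to([0x00, 0x2F, 0xA7]);
console.log(kleinBlue.toCMYK());
```

The point of the ES4 feature is that the call to Color.to would be implicit wherever the type Color is expected; the sketch only shows what the conversion itself might do.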

Friday, May 26, 2006

ECMAScript reference values

Try this in SpiderMonkey or Firefox:

function foo(flag) {
    if (flag) {
        print('hello') = 10
    }
    return 42
}

foo(false) // 42
foo(true)  // SyntaxError: invalid assignment left-hand side

Tuesday, May 23, 2006

Solving today's web problems

Gilad Bracha's provocative Will Continuations Continue? has stirred up some controversy this week. His objection to continuations stems from the observation that in Ajax applications, the back button is less important than in traditional web programs, sometimes even effectively disabled. For an architect deciding whether to implement continuations for a web framework or its underlying VM, this has to alter the equation. Are Ajax applications rendering continuations obsolete? [1]

I'm a little late to respond, thanks to several days of netlessness. Gilad's post has generated some thoughtful discussion: Tim Bray protests that the web's simple UI model is well understood by users, Don Box points out that the web is not necessarily the only killer app for continuations, and Avi Bryant explains a few of the different kinds of state in web applications and where continuations fit into the picture. Later, Ian Griffiths raises deeper technical complaints about persisting continuations on the server side, concerning resource management (what happens when the user walks away?) and thread migration; Ezra Cooper responds that these problems don't exist for Links, where continuations live on the client side.

What makes Gilad's argument unrealistic is that Ajax applications currently account for some minuscule percentage of the web. Even if interactive applications are the future, the web never forgets its past. It's as naive to think you can stop supporting the back button as it would be for browsers to stop supporting HTML 4. You could protest that only the browsers have to support these "legacy" web pages, whereas on the server side, web programmers should start writing Ajax apps right now. But I don't see amazon.com or hotwire.com throwing away their shopping carts in favor of a Ruby on Rails codebase. There are way more existing applications with huge codebases that could benefit from a backwards-compatible solution to the Orbitz problem than there are companies itching to throw their entire codebase away to make way for an all-Ajax site.

None of this changes the fact that adding continuations to the JVM isn't easy. It's still a design decision with the attendant trade-offs. But speculation that everyone will be writing programs that make no use of the back button, say, 10 years from now, is not a compelling excuse not to support those programs now.

[1] As a pedantic aside, continuations could no more become obsolete than variables, or functions, or algorithms could. Continuations are an abstract concept, not a programming construct. More precisely, the real question is whether Ajax applications are rendering obsolete the use of first-class continuations as an implementation strategy for web application frameworks. In particular, I think a number of people have the misconception that programmers have to understand and use call/cc in order to program with these frameworks. In fact, the frameworks only use continuations under the hood to allow the programmer to write their web programs in a natural style.

Friday, May 19, 2006

Late and early binding

It's taken me a long time to figure out what people mean when they talk about "late binding" and "early binding." Little by little, I was beginning to get a handle on it, when Brendan showed me this Python spec which used the terms to refer to order of expression evaluation in a syntactic form! According to the spec's terminology, evaluating an expression earlier is ostensibly "early binding." This freaked me out, but Brendan reassured me that this was probably just a misuse of terminology on the part of the Pythonistas.

At any rate, as I understand them, late and early binding describe the point in the lifetime of a computer program (usually, compile-time versus runtime) at which a particular variable reference is resolved, i.e., at what point its binding is discovered. A dynamically scoped Lisp doesn't resolve variable references until runtime, based on the dynamic context. Scheme variables, by contrast, can be early bound, since all variables' bindings are based on their lexical scope. Dynamic dispatch is another instance of late binding, where method names are resolved at runtime.

Dynamic scope is generally regarded as a liability, because it completely violates abstractions, exposing programmers' choice of variable names and making them vulnerable to accidental capture. Dynamic dispatch is a somewhat more restricted form of late binding, where the programmer has finer control over what names can be overridden by unknown external parties.
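To make the accidental-capture problem concrete, here is a small sketch in JavaScript. JavaScript variables are lexically scoped, so the dynamic half is emulated with an explicit stack of bindings; dynamicEnv, withBinding, and lookup are hypothetical helpers invented for this example, not part of any library.

```javascript
// Lexical scope: `greet` always sees the `name` that was in scope where it
// was defined.
function makeLexicalGreeter() {
    const name = "library";
    return function greet() {
        return "hello from " + name;
    };
}

// Emulated dynamic scope: free variables are looked up at call time in a
// stack of bindings (hypothetical helpers written for this sketch).
const dynamicEnv = [];

function withBinding(name, value, thunk) {
    dynamicEnv.push({ name, value });
    try {
        return thunk();
    } finally {
        dynamicEnv.pop();
    }
}

function lookup(name) {
    // Search from the most recent binding outward.
    for (let i = dynamicEnv.length - 1; i >= 0; i--) {
        if (dynamicEnv[i].name === name) {
            return dynamicEnv[i].value;
        }
    }
    throw new ReferenceError(name + " is not bound");
}

function dynamicGreet() {
    return "hello from " + lookup("name");
}

const lexicalGreet = makeLexicalGreeter();

// A caller that happens to choose the same variable name.
function client() {
    const name = "client"; // has no effect on lexicalGreet
    return withBinding("name", "library", () =>
        // An unrelated inner binding of "name" captures the dynamic lookup.
        withBinding("name", name, () => [lexicalGreet(), dynamicGreet()]));
}

console.log(client()); // [ 'hello from library', 'hello from client' ]
```

The lexical greeter is immune to the caller's choice of names; the dynamic one silently picks up whatever binding of "name" is most recent on the stack.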

Compiler writers like early binding because it's an opportunity for optimization--variable references can be compiled into direct pointer dereferences rather than map lookups. From the perspective of program design, early binding is good because it results in clearer programs that are easier to reason about. However, in certain restricted forms, late-bound variables can be useful as a way of leaving a program open to subsequent extension. This is the purpose behind dynamic dispatch, as well as implicit parameters such as in MzScheme or Haskell.

Resumable exceptions

One more thing I learned from Avi: there's a simpler way to implement resumable exceptions than the way I'd thought of. Having been perverted by my formative years as a Java programmer, I always assumed the meaning of throw was carved in stone tablets:

Thou shalt abandon thy current context.
Thou shalt unwind the stack, verily, until thou findest the first handler.

So my solution involves capturing the continuation at the point of the throw, to allow the handler to reinvoke the continuation, effectively resuming at the point where the exception occurred. But why waste time unwinding the stack, if you're only going to put it right back in place? The Smalltalk solution (and I believe the Common Lisp solution as well) is to leave it up to the handler to decide whether to unwind the stack. In other words, throw (I think it's called raise in Smalltalk) simply causes the runtime to crawl up the stack, which it leaves intact, to find the first handler, and lets the handler decide whether to unwind, i.e., to blow away the portion of the stack between the throw point and the catch point.

I suppose this means the handler runs on top of the portion of the stack where the raise happened, and there must be some provision for what to do if the handler finishes without a control operation; presumably this would have the same behavior as resume nil, or however you'd express "don't unwind the stack; resume where the exception occurred; but I don't have any useful result value for the failed operation."
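Here is a minimal sketch of that strategy in JavaScript, as I understand it. It is not how Smalltalk or Common Lisp actually implement it; handlers, withHandler, raise, and Unwind are hypothetical names for this example. The handler is called directly from the raise point, so the stack is still intact, and it either returns a value (resume) or calls unwind (abandon the context up to the handler's frame). For simplicity the handler stays installed while it runs.

```javascript
// Hypothetical names throughout: handlers, withHandler, raise, Unwind.
const handlers = [];

// Thrown only when a handler decides to abandon the raising context.
class Unwind {
    constructor(target, value) {
        this.target = target;
        this.value = value;
    }
}

// Run `body` with `handler` installed for the duration of the call.
function withHandler(handler, body) {
    const marker = { handler };
    handlers.push(marker);
    try {
        return body();
    } catch (e) {
        // Catch only unwinds aimed at this handler's own frame.
        if (e instanceof Unwind && e.target === marker) {
            return e.value;
        }
        throw e;
    } finally {
        handlers.pop();
    }
}

// Find the nearest handler and call it on top of the current stack.
function raise(condition) {
    if (handlers.length === 0) {
        throw new Error("unhandled condition: " + condition);
    }
    const marker = handlers[handlers.length - 1];
    const unwind = (value) => { throw new Unwind(marker, value); };
    // Whatever the handler returns becomes the result of the raise.
    return marker.handler(condition, unwind);
}

function parseAll(inputs) {
    return inputs.map((s) => {
        const n = Number(s);
        // On bad input, ask the handler what to use instead.
        return Number.isNaN(n) ? raise("bad number: " + s) : n;
    });
}

// Handler resumes: the failed operation yields 0 and the loop continues.
const resumed = withHandler(() => 0, () => parseAll(["1", "x", "3"]));

// Handler unwinds: the whole body is abandoned with a fallback value.
const unwound = withHandler((condition, unwind) => unwind([]),
                            () => parseAll(["1", "x", "3"]));

console.log(resumed, unwound); // [ 1, 0, 3 ] []
```

In this sketch a handler that finishes without a control operation returns undefined, which resumes the raise with no useful value, roughly the "resume nil" behavior guessed at above.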

My introduction to Smalltalk

Another highlight of my trip to Amsterdam this week was meeting Avi Bryant, of Seaside fame. We talked about Smalltalk, web programming, and his latest project, Dabble. In the past, I've had trouble knowing where to begin with Smalltalk, so he sat down with me and walked me through building a simple program. It was enough to give me a flavor of the language, and to get me started if I want to experiment with it at some point (no time these days!). It's an absolutely breathtaking language. I am still skeptical about excessive reflection. It's clearly extremely powerful, and attractive in its conceptual simplicity. But I generally favor a careful balance between expressive power and local reasoning, and Smalltalk seems to tip awfully far towards the former. Nevertheless, I'd need to learn more and build real systems in Smalltalk before making judgments.

The way Avi designed Seaside leads to a beautiful style of web programming. Of course, there's the fact that the control flow is both compatible with the back button and window cloning and written in a natural style, thanks to continuations under the hood--but that's no surprise. It's in fact roughly the same approach as send/suspend/dispatch (though with the rather more friendly method name renderContentOn). But what I really enjoyed was the object-oriented approach to modeling the components of a web page.

In Squeak, essentially every component in the entire VM--numbers, blocks (closures), windows, stack frames, metaclasses, you name it--is represented as a Smalltalk object, with some particular set of supported methods and various ways to view it. The "system browser" provides a useful view: it introspects the object for its properties and their values, its methods and their source code, and any associated documentation, and displays them in an informative GUI window.

In Seaside, web pages can be seen as simply a different view on objects (that have special methods for rendering on the web, of course). When you change properties of a web component, its web view changes accordingly. Usually, these changes in state are triggered by user actions via the web: clicking links or submitting forms. If you change the property through the Squeak UI, then you can just reload the web page to see the changes. Most impressively, if you change the source code of a method on the object--say, the code that responds to a user action--there's no need to reload whatsoever! The next action simply invokes the method, which now happens to have a new implementation.
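A rough analogue of that last trick can be sketched in JavaScript, where methods are also looked up on each call. This is only an illustration of late-bound method lookup; Counter and its methods are hypothetical and have nothing to do with Seaside's actual API.

```javascript
// Hypothetical component; the names are illustrative, not Seaside's API.
class Counter {
    constructor() {
        this.count = 0;
    }
    // Responds to a user action such as clicking a link.
    increment() {
        this.count += 1;
    }
    render() {
        return "<p>Count: " + this.count + "</p>";
    }
}

const counter = new Counter(); // a live object with existing state
counter.increment();
console.log(counter.render()); // <p>Count: 1</p>

// "Edit the source" of the method while the object is live.
Counter.prototype.increment = function () {
    this.count += 10;
};

// The next action simply invokes the new implementation; state is kept.
counter.increment();
console.log(counter.render()); // <p>Count: 11</p>
```

The existing object keeps its state across the change, because the method is resolved at the moment of each call rather than when the object was created.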

Exciting week!

I've had an exciting week in Amsterdam, attending XTech 2006 thanks to the generosity of Mozilla. On Thursday I visited Eelco Visser and Martin Bravenboer at the University of Utrecht. It was a fun and productive visit.

In the last couple of months I have started using Stratego/XT to model the semantics of ECMAScript. I believe that it's possible to scale the semantic frameworks that we use in research up to model whole, industry-sized programming languages. Now, these modeling frameworks are formal systems, just like programming languages themselves. So it should also be possible to express these models in a machine-executable form. There are several benefits to this approach:

The added rigor that comes from the level of detail and precision required to get a machine to understand the model.
The ability to test actual ECMAScript programs against the model.
A reference implementation against which implementations can be tested.
The eventual possibility of applying automated reasoning tools to the model (though not immediately!).

A particularly attractive feature of Stratego is its extremely customizable syntax. I have been able to use this to be very particular in defining the "syntax of the semantics" to maximize its readability.

Eelco and Martin liked my model--they found the CEKS machine representation of continuations lovely--and are glad to have an interesting application of Stratego. We talked about some of the things I could do better (and they already found a few subtle bugs in my implementation) as well as a couple of requests I had for Stratego. They're interested in what kinds of optimizations they could apply to their compiler that would provide the most benefit to applications like mine. For instance, since my abstract machine generally operates on only a few small forms that have fixed shapes, deforestation might be applicable.
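For readers who haven't seen this style of abstract machine, here is a minimal CEK machine (control, environment, continuation; no store) for a tiny lambda calculus with numbers, sketched in JavaScript. It is not my Stratego model of ECMAScript, and all of the names are made up for this example; it only shows the flavor of a machine whose states are a few small forms with fixed shapes, with the continuation represented as an explicit data structure.

```javascript
// Term constructors for a tiny lambda calculus with numbers.
const num = (value) => ({ tag: "num", value });
const ref = (name) => ({ tag: "var", name });
const lam = (param, body) => ({ tag: "lam", param, body });
const app = (fn, arg) => ({ tag: "app", fn, arg });

function run(term) {
    let control = term;         // C: the term or value in focus
    let env = new Map();        // E: variable name -> value
    let kont = { tag: "halt" }; // K: linked list of continuation frames

    for (;;) {
        if (control.tag === "var") {
            if (!env.has(control.name)) {
                throw new ReferenceError("unbound variable: " + control.name);
            }
            control = env.get(control.name);
        } else if (control.tag === "lam") {
            // A lambda evaluates to a closure over the current environment.
            control = { tag: "closure", lam: control, env };
        } else if (control.tag === "app") {
            // Evaluate the function first; remember the argument for later.
            kont = { tag: "evalArg", arg: control.arg, env, next: kont };
            control = control.fn;
        } else {
            // control is a value (num or closure): consult the continuation.
            const frame = kont;
            if (frame.tag === "halt") {
                return control;
            } else if (frame.tag === "evalArg") {
                // The function is evaluated; now evaluate its argument.
                kont = { tag: "callFn", fn: control, next: frame.next };
                env = frame.env;
                control = frame.arg;
            } else {
                // callFn: bind the parameter and enter the function body.
                const closure = frame.fn;
                if (closure.tag !== "closure") {
                    throw new TypeError("cannot call a non-function");
                }
                env = new Map(closure.env).set(closure.lam.param, control);
                control = closure.lam.body;
                kont = frame.next;
            }
        }
    }
}

// ((lambda (x) (lambda (y) x)) 1) applied to 2 evaluates to 1.
const konst = lam("x", lam("y", ref("x")));
console.log(run(app(app(konst, num(1)), num(2))).value); // 1
```

Every transition inspects only the tag of the control and the top continuation frame, which is the kind of regularity a compiler optimization could exploit.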

Martin is now working on implementing the ECMAScript grammar, which is notoriously difficult to express in traditional parser generation frameworks. The modular grammar formalism SDF, based on Tomita's generalized LR parsing algorithm and used in the Stratego/XT system, should actually allow us to define the syntax quite elegantly. This will allow us to parse actual ECMAScript programs, as well as to model eval realistically, since it relies on parsing strings as ECMAScript programs.

Wednesday, May 17, 2006

Syntactic capital

When you step outside your familiar community, you learn interesting things about your assumptions. Something I've learned on the ECMA committee is that in Scheme, new syntax is "cheap," but not so in other languages.

Because syntax definition is in the hands of Scheme user programmers, we expect to see new syntactic forms all the time. As a result, implementing a feature as a syntactic form is uncontroversial in Scheme. In other languages, however, a new syntactic form signals (at least intuitively) a global change to the language. Users know that a new API is something they can put off learning until they feel they need it. But if there's a syntactic form in a language that you don't understand, you feel as though you don't have a handle on the language.

In terms of pragmatics, it's easier for language implementors to implement a subset of an API than it is to implement a subset of the syntax; you can at least guarantee that all programs will parse the same. What this really comes down to is scoped syntactic extensions, i.e., macros. If you can modularize syntactic forms, you can signal a parse error that says "I'm sorry, I don't support this syntactic form," which is the same as "I don't support this API."
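As a toy illustration of that equivalence, here is a sketch in JavaScript of a mini-language whose syntactic forms are registered one by one, like entries in a library. Everything here is hypothetical (the nested-array program representation, coreForms, extensionForms, makeEvaluator); it is not a macro system, just a way to show an implementation rejecting a single form by name.

```javascript
// Hypothetical mini-language: programs are nested arrays such as
// ["if", cond, then, else]; each syntactic form is registered separately.
const coreForms = {
    "if": (args, ev) => (ev(args[0]) ? ev(args[1]) : ev(args[2])),
};
const extensionForms = {
    "unless": (args, ev) => (ev(args[0]) ? undefined : ev(args[1])),
};

// Build an evaluator that supports exactly the given set of forms.
function makeEvaluator(forms) {
    function ev(expr) {
        if (!Array.isArray(expr)) {
            return expr; // literals evaluate to themselves
        }
        const [head, ...args] = expr;
        if (!Object.prototype.hasOwnProperty.call(forms, head)) {
            // "I'm sorry, I don't support this syntactic form."
            throw new SyntaxError("unsupported syntactic form: " + head);
        }
        return forms[head](args, ev);
    }
    return ev;
}

const full = makeEvaluator({ ...coreForms, ...extensionForms });
const subset = makeEvaluator(coreForms);
const program = ["unless", false, ["if", true, "yes", "no"]];

console.log(full(program)); // yes
try {
    subset(program);
} catch (e) {
    console.log(e.message); // unsupported syntactic form: unless
}
```

The subset implementation fails in the same way a missing library function would: it names the one thing it lacks, and everything else still works.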

In terms of cognitive load, modularizing syntactic forms allows users to learn subsets of a language in exactly the same way they learn subsets of libraries.

Thursday, May 04, 2006

Lightweight

One possible definition: avoiding unnecessary abstraction.
