The Little Calculist

Wednesday, August 27, 2008

The death of namespaces

As promised in my post on ECMAScript Harmony, I want to talk about the problems with the proposed namespaces feature and why I'm glad it's gone.

In ES3, one of the problems for information hiding is that the language's primary datatype is a mutable table mapping transparent strings to values. As a result, sharing an object creates abstraction hazards: anyone can view -- or even modify! -- an object's internals. Namespaces were an attempt to facilitate hidden properties by generalizing objects in a backwards-compatible way.

So objects became mutable tables mapping namespace/string pairs to values. This has precedent in Common Lisp, and it seems natural. But here's where it started going awry: JavaScript has an ill-conceived history of specifying variable scope by way of a specification construct known as the variable object: a fictional JavaScript object that serves as a rib in the lexical environment. So JavaScript variables are conceptualized as being the same thing as properties. (I'm sure I know where this came from: implementors could use the same internal mechanism to lookup properties in an object inheritance chain as for looking up variables in the environment.) So now with namespaces, variable names were not just strings but also these (namespace × string) pairs.

But namespaces were first-class values. There were even forms for qualifying variable references with a dynamically computed namespace! Lexical scope was slipping away from JavaScript as an almost-was. Even more troubling was the concept of "open namespaces." For convenience and backwards-compatibility, there has to be a reasonable default namespace for looking up unqualified variable and field references. The proposed spec allowed multiple namespaces to be given as candidates for defaults, with a particular search order. Now, variable lookup is tough enough in JavaScript. With bizarre dynamic constructs like with, lexical eval, and the global object, there can be two dimensions of lookup: up the environment and up the inheritance chain of an object. Now with default namespaces there was a third dimension of lookup. For implementors, this creates all sorts of efficiency nightmares. For language users, it would likely be a usability nightmare.

There are ways to simplify namespaces; even Common Lisp's approach is simpler that what was originally on the table for ECMAScript. But when I heard that namespaces, which had been a part of the proposed standard since before I got involved, were on the chopping block, I was thrilled. It had never even occurred to me that they might be cut! I proposed a simpler solution: if we just add gensym to the operations on property names, then we can create private, unguessable properties. No property name pairs, no extra dimension of search, just opaque names. In an OO world, this is more likely to look like new Name() than gensym(), but same difference.

ECMAScript Harmony

There's been a lot of buzz about recent events in Ecma TC39 since Brendan Eich made his announcement of ECMAScript Harmony resulting from the seminal Oslo meeting last month. Some of the public discussion has been so misleading I won't even link to it. I've been getting a number of questions from friends and colleagues, asking if it's true that we've stopped working on ECMAScript Edition 4. I'm giving them all the same answer: don't believe the hype!

As an invited expert, I try to avoid politics. But from my perspective, the Harmony effort is promising to be a great development for the technical quality of the ECMAScript standard. Both sides of the split in the committee have good technical points to make and I've longed for more cooperation. Now we have much more cooperation, and the language design is improving for it.

As for the future of ECMAScript Edition 4: whatever name gets attached to it, the work on improving and standardizing ECMAScript continues apace. We'd already deferred static types for potential future work well before the Oslo meeting, and as part of the Harmony effort we dropped some of the more questionable aspects of the proposed ES4, most notably namespaces. These were not pure political concessions but in fact good technical decisions. (I'll blog about this separately.) The proposed language is changing all the time, but reports of ECMAScript's death have been grossly exaggerated.

Friday, August 08, 2008

Nirvana

I have achieved hygienic macro enlightenment.

Tuesday, June 24, 2008

Implicit homomorphism

One of the really nice features of macros and their mathematical first cousin notational definitions is that they leave their straightforward, recursive expansion implicit. When we write in math:

A ⇔ B = A ⇒ B ∧ B ⇒ A

what we really mean is that all occurrences of ⇔ should be recursively expanded within a term. But there's essentially an implicit structural recursion through all propositions in the logic (which is left open "until" all notational definitions have been provided) which expands any occurrences of notational definitions homomorphically through sub-propositions. We don't require anyone to go through the painstaking and virtually information-free process of explicitly marking all points of recursive expansion.

I would love a similar notational style for defining annotations. I often find myself defining an instrumentation function that inserts one or two kinds of extra pieces of information into a tree. Maybe term-rewriting would work pretty well for this. Say we're writing a function that annotates certain λ-expressions with their height. We could write

λ x . e → λ [ |e| ] x . e

and then leave the compatible, reflexive, and transitive closure of this rewriting rule implicit, since they're obvious.

Then I would really like some way to make this style of definition composable with other definitions, so for example I could define a type-checking-cum-instrumentation algorithm

Γ ⊢ e : τ → e'

where the instrumentation portion (→ e') is mostly left implicit.

Tuesday, June 17, 2008

How to impair a declarative

Declarative programming is a vague term but could loosely be interpreted to mean not imperative programming. It's also about defining a program by breaking it into separate, modular units of partial specification.

Of course, the language's semantics has to have some algorithmic means to combine these declarations into a complete program definition. And if the language admits recursive definitions, the combined program may contains cycles. So the implementation of a declarative language will often have an imperative flavor, combining recursive elements by incrementally updating the set of definitions.

The question is whether it is still possible for the user to understand the semantics without resorting to reasoning about its possibly-imperative implementation. Specifically, the user shouldn't have to worry about time. If the order in which definitions are combined matters, then the semantics becomes imperative. Worse, if the programming interface allows definitions in multiple files, the order of definition might not even be under the user's control--or worse, it might only be controllable through extra-linguistic means like compiler command-line arguments.

Take as an example a language with open recursion, the ability to define a recursive function in multiple, disparate pieces. If an open recursive function has overlapping cases in separate declarations, the language can handle this in one of several ways:

Let rules defined "later" take precedence over rules defined "earlier."
Define a total order on rule specificity, and let more specific rules take precedence over less specific rules.
Disallow overlapping cases statically.

Option #1 reintroduces the notion of time. The resulting semantics is less modular, because it requires understanding not just the individual modules, but subtle aspects of their composition. (If the algorithm for combining definitions is sophisticated--unification, for example--this order may even be a complicated dynamic notion, not just textual order.) The other two options eliminate the imperative flavor, defining time out of the semantics. This frees the programmer to reorder definitions without affecting the program's meaning, and making the program's meaning more modular.

ADTs in JavaScript

Sjoerd Visscher has written a neat post about implementing algebraic data types in JavaScript. There are lots of ways to do this, but this one looks interesting. I don't quite understand it but I thought I'd point it out.

Note the use of expression closures! Good stuff.

Thursday, June 12, 2008

Clumsy existential

This might be a dumb question: I often want to prove something like this:

(∃ v . e ⇒ v) ⇔ (∃ v′ ~ v . [[e]] ⇒ v′)

except that the scope doesn't work out: v is only in scope in the first parenthesized proposition, so it can't be referred to in the second one. Of course, it's generally sufficient to prove co-termination and just leave out the relationship between the final values, since that's usually easily derivable. But it's an important part of the intuition behind correctness, so I want the relationship between answers to be in the statement of the theorem.

An awkward way to make this work is to write big clunky conjunctions like:

(e⇓ ⇔ [[e]]⇓) ∧ (∀ v . e ⇒ v . ∃ v′ ~ v . [[e]] ⇒ v′)

But it would be nice if there were some way to write it more compactly like the first pseudo-proposition I wrote. I like that it reads almost like you'd say in English: "e and [[e]] always get the same answers."

Anyone have any suggestions?

Tuesday, May 20, 2008

The bat-signal operator

Starting now, I propose that every programming language strive to incorporate the bat-signal operator: ~@~

Friday, May 16, 2008

Fail-soft

Automatically repairing errors and continuing execution goes against the grain of software engineering, for good reason, but I've come to see that it's what makes the web go 'round. Robert O'Callahan has an interesting market-oriented take on error recovery:

...the economics of error recovery --- the fact that recovering from malformed input, instead of hard failure, is a competitive advantage for client software so in a competitive market it's sure to develop and you might as well put it in specifications...

In other words: ignore the errors, users will love you, and you win. We may not like it, but you fight economics at your own peril.

Friday, May 09, 2008

N-ary infix predicates

In languages with operator overloading, it's possible to define binary infix predicates such as (using Haskell here):

infix <:
(<:) :: Ty -> Ty -> Bool
x <: y = x `subtype` y -- for some auxiliary subtype function

With arithmetic operators that continue to return elements of the same type (does this make their types endofunctors? I'm no category theorist), so as long as you declare an associativity, the n-ary case is easy to parse. Predicates aren't closed over their input type, though, so in Haskell I can't take the above definition and write:

t1 <: t2 <: t3

But in math, we overload binary predicates to the n-ary case by assuming an implicit "and" between each pair. It would just take a little kludge (but one with the full weight of historical precedent) to let the operator overloading and type system conspire to determine that any binary predicate (i.e., with result type Bool) can be extended to the n-ary case using &&.

(You could maybe also extend this to any type Monad m => m a using >>= instead of &&?)

Thursday, May 08, 2008

Solving the wrong problem

I've been having a depressing couple of weeks trying to solve a really hard problem in my research. Occasionally I thought I had some promising leads, but usually they just led to more despair.

Then I discovered today that I've been tackling the entirely wrong problem all along.

I never knew feeling so stupid could feel so good.

Monday, May 05, 2008

Sigh

> (eqv? +NaN.0 +NaN.0)
#t
> (eq? +NaN.0 +NaN.0)
#t
> (= +NaN.0 +NaN.0)
#f
> (equal? +NaN.0 +NaN.0)
#t

A functional visitor pattern

When your data definitions get big enough, it becomes Tedious and Error-Prone (oh my!) to write multiple operations over the same datatype, so it's useful to define a generic traversal. Here's a pattern I used recently for the ECMAScript Edition 4 reference implementation.

Lots of languages have mutually recursive AST definitions. For example, take a language with mutually recursive expressions Ast.expr, statements Ast.stmt, and declarations Ast.decl. Define an interface VISITOR:

signature VISITOR = sig
    datatype cmd = Stop | Skip | Cont

    type ('a, 'z) method = 'a * 'z -> 'z * cmd

    type 'z visitor = { visitExpr : (Ast.expr, 'z) method,
                        visitStmt : (Ast.stmt, 'z) method,
                        visitDecl : (Ast.decl, 'z) method }

    val foldExpr : 'z visitor * Ast.expr * 'z -> 'z
    val foldStmt : 'z visitor * Ast.stmt * 'z -> 'z
    val foldDecl : 'z visitor * Ast.decl * 'z -> 'z
end;

A visitor is a record of operations, one for each variant of AST nodes. (Alternatively, we could have defined a single type datatype ast = Expr of ... | Stmt of ... | Decl of ...;. But there's no need for the extra indirection.) Each operation takes a node and an intermediate result, and produces a new intermediate result along with a "traversal command" cmd.

This simple traversal language instructs the fold either to continue (Cont), to skip all descendants of the current node (Skip), or to abort the traversal entirely (Stop).

The implementation of the fold uses a few auxiliary functions to compute a list of child nodes for each node:

type children = Ast.expr list * Ast.stmt list * Ast.decl list

fun exprChildren (e : Ast.expr) : children = ...
fun stmtChildren (e : Ast.stmt) : children = ...
fun declChildren (e : Ast.decl) : children = ...

The fold itself is (internally) in CPS. At each node, there are three potential continuations: the empty continuation for aborting, the given continuation for skipping the current node's children, or a new continuation that first traverses the current node's children and then uses the given continuation.

type 'z cont = 'z -> 'z

fun cont (result : 'z, cmd : cmd)
         (skipk : 'z cont)
         (contk : 'z cont)
    : 'z =
    case cmd of
        Stop => result
      | Skip => skipk result
      | Cont => contk result

fun foldExprK (v as { visitExpr, ... }, expr, init) k =
    cont (visitExpr (expr, init))
         (fn x => foldAllK v (exprChildren expr) x k)
         k

and foldStmtK (v as { visitStmt, ... }, stmt, init) k =
    cont (visitStmt (stmt, init))
         (fn x => foldAllK v (stmtChildren stmt) x k)
         k

and foldDeclK (v as { visitDecl, ... }, decl, init) k =
    cont (visitDecl (decl, init))
         (fn x => foldAllK v (declChildren decl) x k)
         k

and foldAllK (v : 'z visitor)
             ((exprs, stmts, decls) : children)
             (init : 'z)
             (k : 'z cont)
    : 'z =
    let
        fun foldList f (v, asts, init) k =
            case asts of
                [] => k init
              | [ast] => f (v, ast, init) k
              | ast::asts => f (v, ast, init)
                               (fn x => foldList f (v, asts, x) k)
    in
        foldList foldExprK (v, exprs, init) (fn result =>
            foldList foldStmtK (v, stmts, result) (fn result =>
                foldList foldDeclK (v, decls, result) k))
    end

I originally had foldList defined at the top-level, in the same mutual recursion group with foldExprK et al, and got smacked by SML because that would require polymorphic recursion. Luckily I was able to work around it by placing its definition inside the body of foldAllK, but since I'd never run into SML's lack of polymorphic recursion before, it took quite a while to decipher the type error (with help from #sml).

Finally, the top-level functions seed the CPS functions with an initial continuation:

fun id x = x

fun foldExpr x = foldExprK x id
fun foldStmt x = foldStmtK x id
fun foldDecl x = foldDeclK x id

(Note that because of the value restriction, there's no possibility of refactoring these into point-free val-declarations.)

Aren't auto-curried functions non-expansive?

The value restriction in ML is so limited. If I have an auto-curried function

> fun foo (k : int -> 'a) (x : int) = ...;

it should be the case that partial application of foo is always non-expansive, right? It's really annoying that partial applications of auto-curried functions often can't be given general types.

> val raisek (x : int) = raise (Fail (Int.toString x));
> val fooRaise = foo raisek;
Warning: type vars not generalized because of value restriction...

I hate when the type system interferes with standard, semantics-preserving transformations.

Wednesday, April 30, 2008

Dynamic languages need modules

A dynamic language starts as an implementation. And that implementation almost always includes some variation on a REPL. Look at Lisp, Scheme, Python, Ruby, JavaScript, Lua, etc. One of the things that makes these languages "dynamic" is that they're attached to some long-running process: an IDE, a user prompt, an embedding application, a web page. So these languages are simply implemented with some global table of definitions that gets updated through the lifetime of the host process. Easy enough to understand from the implementor's perspective, but what happens to the user of the embedded language?

The problem with the "top-level" (that ever-changing table of program definitions) is the creeping specter of dynamic scope. While static vs. dynamic typing may be a debate that civilization takes to its grave, the dust seems to have more or less settled on dynamic scoping. The fact is, even though many decisions a program makes must depend on its context, it's very hard to understand a program if you can't nail down its definitions.

To some degree, if a dynamic language has lambda, you can escape the top-level with a design pattern that simulates a poor man's module. The module pattern is almost always a variation of what Schemers call "left-left-lambda"--the immediate application of a function literal. Bindings inside the lambda are no longer floating in that airy and transient stratosphere of the top-level; they're nailed to the function that is being applied. And you know that function is being applied only once, because once you've created it, it's applied an discarded.

This pattern goes a long way, and if you have macros, you can create sugar to give the pattern linguistic status. But a module system it ain't.

Linking: Nothing in this pattern deals with the relationships between modules. There's no way to declare what a module's imports and exports are. In fact, if you want a module to communicate with any other modules, the top-level's poison tends to seep back in. To export a value, a lambda-module can mutate some global variable, or it can return a value--but where does the caller save the value? You can always nest these solutions within more lambda-modules, but ultimately there's a problem of infinite regress: in the end you have to have at least one special top-most module surrounding your program.

Separate development: And that's only if you have control over the whole program. If you want to create a library and share it, there needs to be some shared common framework in which people and organizations can share code without stomping on each other's invariants or polluting each other's global environments. To be sure, a built-in module system doesn't eliminate all such issues (you still need conventions for naming and registering modules within a common framework), but modules help to standardize on these issues, and they can provide helpful errors when modules step on each other's toes, rather than silently overwriting one another.

Loading: There's not much flexibility in the loading of an immediately applied function. If your language involves multiple stages of loading, the implementation may be able to be smarter about loading and linking multiple modules at once.

Scoping conveniences: Lexical scope is a widget in the programmer's UI toolkit, and for different scenarios, there are different appropriate designs. The tree shape of expressions makes the lexical scoping rule ("inner trumps outer") appropriate; it favors the local over the global. But, ignoring nested modules for the moment, modules aren't tree shaped; they're more like a global table. In a sense, all modules are peers. So when you import the same name from two different modules, which one should win? You could say that whichever you import later wins, but this is much more subtle than the obvious nesting structure of ordinary variable bindings. I find it's more helpful for the module system to give me an error if I import the same name from different sources (unless it's a diamond import). Other useful facilities are selective import, import with renaming, or import with common prefixes. These are subtle usability designs where modules differ from lambda.

Extensibility: In PLT Scheme, we've used the module system as the point for language design and extension. By allowing modules to be parameterized over their "language", we have a natural way for introducing modalities into PLT Scheme. As languages grow, these modalities are an inevitability (cf. "use strict" in Perl and ECMAScript Edition 4). Buried within pragmas or nested expressions, this makes the design of the language much harder. But within a module, bindings are sacrosanct and interactions with other modules are limited to imports and exports. This significantly cuts down on the space of possible interactions and interferences between the language's modalities. As an example, Sam Tobin-Hochstadt has made good use of this for the design of Typed Scheme, a statically typed modality of PLT Scheme that can still interact reliably with dynamically typed modules.

The unrestricted mutation of the top-level environment is a good thing thing for many purposes: interactive development, self-adjusting execution environments, etc. But it's terrible for nailing down program definitions. Modules are a way of circumscribing a portion of code and declaring it "finished". It can still be useful to have an environment where modules can be dynamically loaded and possibly even replaced, but it's critical for the language to provide the programmer with basic invariants about the definitions within a module.

All of this is stuff I wish I'd had a clearer head about earlier in the ES4 process. But I hope that, down the road, we'll consider a module system for ECMAScript.

Monday, April 28, 2008

Literate code

I used to find the notion of literate code attractive, but right now my dissertation prototype implementation is undergoing some serious redevelopment and the horribly out-of-date docs are dragging me down. Now I find myself wishing I'd kept them separate.

Tuesday, April 22, 2008

How to spell StopIteration

Some of the pseudo-constants in JavaScript are actually mutable global variables: undefined is a notorious example. Luckily, there's generally some reliable way to generate such a value without having to hope that no one has reassigned a global variable.

For undefined, it's the void operator, which, given any argument, produces the value that undefined is initially bound to. Conventionally, people write void(0).

In JavaScript 1.7 and later, there's a special constant called StopIteration that's thrown after an iterator runs out of values. This too is a mutable global. But this value is a little trickier to get at reliably. Here's the simplest expression I could come up with that produces the StopIteration value:

(function(){try{(function(){true||(yield)})().next()}catch(e){return e}})()

The inner function is a generator by virtue of textually containing the yield keyword, but it returns immediately, yielding no values, because of the short-circuited logical or. (Note that because of yield's finicky grammar, it has to be parenthesized.) So by calling next() on it, it immediately raises StopIteration, which the outer function catches and returns.

Sunday, April 20, 2008

Compilation, obfuscation, encryption

It's interesting that we tend to use compilation as a proxy for encryption. It seems to me it would help clarify the system design of a language semantics and architecture if you separate the concerns of code ownership with performance. Compilation is primarily used to cache the transformation of one semantics into another--a partial evaluation of an interpreter. Using this as a weak encryption mechanism is pretty silly. If you want to encrypt program source, why not use real encryption? That way you can use whatever representation of the code you want; source, byte-code, microcode, whatever.

Friday, April 11, 2008

A poem for the weekend

Listen to Mustn'ts, child, listen to the Don'ts.
Listen to the Shouldn'ts, the Impossibles, the Won'ts.
Listen to the Never Haves, then listen close to me.
Anything can happen, child, Anything can be.
--Shel Silverstein

Tuesday, April 08, 2008

Different perspectives on parameterized types

To the PL theorist, parameterized types are a necessity for the expressiveness of a statically typed language. Imagine Hindley-Milner without them: you'd have to duplicate definitions all over the place. But in practice, statically typed languages have loopholes--long before generics, Java always had the type Object so that you could essentially "turn off" type-checking when you couldn't type things precisely. This meant that you could always write generic operations and generic collections, but without type-checking.

So in practice, people don't look at parameterized types as increasing the expressiveness of Java at all; it just looks like a way to increase the amount of type-checking in the language, to make it stricter. And they're right.

So you have two correct perspectives on parameterized types: roughly, that they make the language less restrictive and more restrictive. It's not actually a contradiction, but it's enough to have caused me confusion in some conversations.