The Little Calculist: October 2008

Thursday, October 30, 2008

Maybe vs. exception vs. failure thunk in Scheme

It's pretty easy to switch back and forth between representing a function with a "maybe" result type as:

returning X or #f;
returning X or raising an exception; or
returning X and taking a failure thunk

In Scheme, sometimes X might include the value #f. In that case I think representation #3 is the best one, since raising an exception is clunky. You can also combine #2 and #3 and take an optional thunk that by default raises an exception.

This is one case where true unions do cause some problems, so maybe I should have tempered my earlier enthusiasm a little. Still, it's not insurmountable, and this cost needs to be weighed against the price of strictly disjoint unions.

Sweet

A Super Mario Bros. clone, implemented in Haskell.

Friday, October 24, 2008

Fun programming exercise

Off and on I work on my implementation of JavaScript in PLT Scheme. Implementing the standard library means translating the spec's low-level, imperative pseudocode (think GW-BASIC) to Scheme. It's a fun and mostly mindless exercise to take code with assignments and goto, translate that into mutually tail-calling basic blocks, eliminate the assignments by parameterization, and then finally polish it into a more idiomatic style. Sort of my version of a Sudoku puzzle or something.

Once you've already implemented it...

A common phenomenon in API design is to expose something simply because you've already put in all the effort required to implement it. For example, if your language runtime implements unbounded recursion, you've already gone a long way towards representing continuations as data. In that case, it's not much more work to offer first-class continuations as a data structure in the language. Similarly, languages often optimize their startup time by dumping the heap into a persistent data structure so they can cache the initialization of the standard library; once they've done this, heap-dumping is an easy feature to offer to users.

On a semantic level, first-class continuations and first-class stores are obviously similar kinds of constructs. But I'd never noticed till now how they can both just sort of crop up as natural consequences of incidental needs in language implementations.

Monday, October 20, 2008

Updated javascript.plt

I've just updated my implementation of JavaScript in PLT Scheme. The latest version fixes some of the subtle ways I'd deviated from the spec with respect to variable binding. Back in the day, I was wooed into thinking that JavaScript was so similar to Scheme that I could pretty easily just compile JavaScript variable bindings directly to similar Scheme variable bindings. For the most part, I got variable hoisting (i.e., the scope of var-declarations being hoisted to the top of their nearest enclosing function body) right. But the real subtlety comes in with the with (dynamically add a computed object value to the runtime environment as a new environment frame) and eval constructs.

I almost had with right: as soon as I detect that the programmer is using with, I enter a "stupid" mode, where I synthesize an explicit runtime environment and populate it with whatever's currently in the static environment. Within the body of the with, variable lookup is performed dynamically in this runtime environment. So outside of a with, you get normal, efficient lexical scope; inside a with, you get stupid, slow, dynamic scope. My only mistake there was forgetting to make the runtime environment entries aliases to the existing variables so that mutations persist after the with-block returns.

But eval is hard: Scheme's eval doesn't have access to the lexical environment where it is called, whereas JavaScript's does. And JavaScript's eval also inherits the ambient "variable object" (ECMA-262 10.1.3), which means that eval code can create variable bindings in the caller's scope. Getting all of this right means simulating the variable object similarly to the way I simulate the runtime environment, and switching into "stupid" mode whenever I detect a potential application of eval. (The standard allows you to refuse to run eval in situations other than a "direct" call to a variable reference spelled e-v-a-l, so you don't have to treat every single function call as a potential call to eval. It's messy, but it's better than forcing you to implement a dynamic environment everywhere.)

Wednesday, October 15, 2008

Unordered multi-arity binders

Cool: GHC knows that "∀" doesn't care what order you bind its type variables:

{-# LANGUAGE RankNTypes, TypeOperators #-}

f1 :: ∀ a b . a → b → b
f1 x y = y

f2 :: ∀ b a . a → b → b
f2 x y = y

h :: (∀ a b . a → b → b) → c → d → d
h f x y = f x y

a = h f1
b = h f2

Think about what this means for implementing a type-checker: to match one ∀-type against another, you have to find the right permutation of bound type variables. (A naive approach requires searching through all n! permutations!) Presumably the actual implementation incrementally unifies the type variables as it recurs into the bodies of the ∀-types.

Does anyone know of a paper that describes this aspect of implementing higher-rank polymorphism?

Tuesday, October 14, 2008

Today's JavaScript kookiness

One of the weirdest aspects of JavaScript's eval is that it not only has complete access to the lexical scope of its caller, but it even gets to add variable declarations to its caller's lexical frame. So I can write

(function() {
  eval("var x = 10")
  return x
})()

and get 10. Now, bearing in mind that var declares a binding for the entire containing function, eval combines in a truly spectacular way with lexical bindings. What does the following test function produce?

function test() {
  var x = 'outer'
  return (function() {
              {
                  let x = 'inner'
                  eval("var x = 'eval'")
              }
              return x
          })()
}

Clearly the let-binding of x seems to be in scope for the eval code. So we might expect the function to return the string 'outer', with the assumption that the eval code simply modified the inner binding. Then again, the eval code is creating a var-binding, so we might expect the function to return the string 'eval', with the assumption that the eval code is declaring a new variable x to be local to the inner function. So what does it really return? undefined.

What?

If I've got this right, what's happening is that the eval code is creating a var-binding for x in its containing function, but its initializer is mutating the current x in scope, which happens to be the let-bound x. So the eval code is actually creating a variable binding two lexical frames out. The evaled var-declaration and its initializer are referring to two completely different bindings of x.

Lest you think this is strictly an unfortunate interaction between the legacy var and eval semantics and the new (post-ES3) let declaration form, there are a couple of other local declaration forms in pure ES3, including catch. Consider:

function test() {
  var x = 'outer'
  return (function() {
              try { throw 'inner' }
              catch(x) { eval("var x = 'eval'") }
              return x
          })()
}

Monday, October 06, 2008

That's not supposed to happen...

When smoke comes out of your laptop, that's...bad, right?

Update: IDE->USB adapters are wonderful things.

Friday, October 03, 2008

Clojure metadata

I watched the video of Rich Hickey's talk about Clojure at the Boston Lisp group this week and was very impressed by his design. One of the features I've been mulling is Clojure's notion of metadata: every value can be associated with extra associative data. That data does not affect the value's behavior under the standard equality predicate, so in some sense it can be seen as "invisible." This is kind of like JavaScript objects' arbitrarily extensible property tables, except for the very important difference that it's immutable. So you can create a value that's more or less equivalent to another (it is distinguishable by identical?, but use of that predicate is discouraged) but with some extra information on the side.

The first use I immediately thought of for this is when you're writing a compiler or interpreter that has AST nodes and you want to save source location information. This information shouldn't affect AST equality. Clojure metadata sounds like it could be a good way to separate that concern.

I think it's confusing, at least for me, that Rich describes metadata as not being "part of the value"--where I come from, a value is the result of evaluating an expression. When you evaluate (with-meta v m) you do get a new value that is associated with metadata m. What Rich is saying is that the metadata is not contained within v. At first when he said it wasn't part of the value, I thought "that sounds brittle." But if I've understood it, a more accurate description might be that values in Clojure may carry with them an extra associative map that doesn't affect their printed representation or their equality.

Thursday, October 02, 2008

Quote of the day

"As goofy as Java is, the JVM is stunning technology."
-- Rich Hickey

Web inching towards actors?

I learned about the new HTML 5 postMessage API recently, and it struck me that we are ever so slowly inching towards actors on the web: now all windows in a browser (including separate tabs and iframes) are communicating sequential processes.

Now, if Alan Kay had had his way, every node in the DOM would be separately addressable (i.e., have its own URI) and would probably itself be an actor.

The Little Calculist