"You see the typographical limitations we're still saddled with? The mock-wounded :( and the actually-wounded :( aren't slated to have distinct code points until Unicode 17."
-- Jon Zeppieri
Sunday, December 21, 2008
I want my Unicode keyboard
Sign o' the times
"Who would have thought a discussion about lambda syntax in JavaScript
would go over 120 posts while a simultaneous thread about class syntax
has had little attention outside a handful of posters?
Would this have been reverse 10 years ago?
Sign of the paradigm shift? Maybe folks want an immutable cons cells too?"
-- Peter Michaux
Friday, December 19, 2008
A small peeve
Whenever I write a function with an external wrapper and an internal recursive function (say, to add an accumulator), I can never come up with a name for the internal function that I find satisfying. This is totally picking nits, but it bugs me for all the recursive calls to be to foo/aux or foo/helper or something like that. Come to think of it, somehow the prime symbol in Haskell and ML feels more natural.
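The pattern in question, as a JavaScript sketch (reverseList and reverseAux are made-up names for illustration): the wrapper adds an accumulator via an inner recursive function, and every candidate name for that inner function feels slightly off.

```javascript
// A wrapper plus an inner accumulator-passing helper: the recursive calls
// all go to the awkwardly named inner function.
function reverseList(xs) {
  function reverseAux(xs, acc) {   // "aux"? "helper"? "reverse2"?
    return xs.length === 0 ? acc : reverseAux(xs.slice(1), [xs[0], ...acc]);
  }
  return reverseAux(xs, []);
}
```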
Saturday, December 13, 2008
Hats off to Duff
From an LtU thread:
When I see the phrase "Wiki type systems", I immediately wonder what a typed Wiki could possibly be. (Monadic pages? Contribution combinators? Co-recursive spam?) Of course you mean "Wiki-type systems", systems that are Wiki-like. But I was excited for a minute.
-- Tom Duff (yes, the Tom Duff)
Thursday, December 11, 2008
Author's summary
I spent a few days visiting my alma mater last week and among the many engaging conversations I had were several with my dear friend Gregg Whitworth about incorporating research into a liberal arts undergraduate education. It's very tricky, but Gregg pointed out several aspects of the field of biology that help. Gregg mentioned that many biology papers include authors' summaries. This just sounds like an all-around great idea, and one that's easily implemented: simply add a brief summary to your publications list on your web site.
In some sense an abstract may be useful, but an author's summary could also be used to put work into broader context. Since it's not subject to the restrictions of the paper itself, it also gives authors more freedom to write whatever they like. And the benefit to students could be enormous: one of the hardest parts of breaking into a research field is understanding the larger conversation each paper participates in.
Thursday, October 30, 2008
Maybe vs. exception vs. failure thunk in Scheme
It's pretty easy to switch back and forth between representing a function with a "maybe" result type as:
- returning X or #f;
- returning X or raising an exception; or
- returning X and taking a failure thunk
This is one case where true unions do cause some problems, so maybe I should have tempered my earlier enthusiasm a little. Still, it's not insurmountable, and this cost needs to be weighed against the price of strictly disjoint unions.
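For concreteness, here are the three styles side by side in a JavaScript sketch (the lookup functions are hypothetical, not from any library):

```javascript
// Three variants of the same lookup.

// 1. Return the value or false (Scheme's X-or-#f style):
function lookupOrFalse(table, key) {
  return key in table ? table[key] : false;
}

// 2. Return the value or raise an exception:
function lookupOrThrow(table, key) {
  if (key in table) return table[key];
  throw new Error("key not found: " + key);
}

// 3. Take a failure thunk to call on a miss:
function lookupOrElse(table, key, fail) {
  return key in table ? table[key] : fail();
}
```

The true-union problem shows up in the first style: if false is itself a legitimate value in the table, X-or-false can't distinguish a miss from a hit, which is exactly the cost that a strictly disjoint option type avoids.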
Labels:
failure,
maybe type,
option type,
Scheme,
true unions
Friday, October 24, 2008
Fun programming exercise
Off and on I work on my implementation of JavaScript in PLT Scheme. Implementing the standard library means translating the spec's low-level, imperative pseudocode (think GW-BASIC) to Scheme. It's a fun and mostly mindless exercise to take code with assignments and goto, translate that into mutually tail-calling basic blocks, eliminate the assignments by parameterization, and then finally polish it into a more idiomatic style. Sort of my version of a Sudoku puzzle or something.
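The translation can be sketched in JavaScript (a made-up example, not from the spec): each numbered step of the pseudocode becomes a function, each goto becomes a tail call, and the assigned variables become parameters.

```javascript
// Spec-style pseudocode (hypothetical):
//   1. Let i = 0 and sum = 0.
//   2. If i >= n, go to step 5.
//   3. Set sum = sum + i and i = i + 1.
//   4. Go to step 2.
//   5. Return sum.
//
// Each step is a mutually tail-calling basic block; the assignments are
// eliminated by threading i and sum through as parameters.
function sumBelow(n) {
  function step2(i, sum) { return i >= n ? step5(sum) : step3(i, sum); }
  function step3(i, sum) { return step2(i + 1, sum + i); }
  function step5(sum) { return sum; }
  return step2(0, 0);
}
```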
Once you've already implemented it...
A common phenomenon in API design is to expose something simply because you've already put in all the effort required to implement it. For example, if your language runtime implements unbounded recursion, you've already gone a long way towards representing continuations as data. In that case, it's not much more work to offer first-class continuations as a data structure in the language. Similarly, languages often optimize their startup time by dumping the heap into a persistent data structure so they can cache the initialization of the standard library; once they've done this, heap-dumping is an easy feature to offer to users.
On a semantic level, first-class continuations and first-class stores are obviously similar kinds of constructs. But I'd never noticed till now how they can both just sort of crop up as natural consequences of incidental needs in language implementations.
Monday, October 20, 2008
Updated javascript.plt
I've just updated my implementation of JavaScript in PLT Scheme. The latest version fixes some of the subtle ways I'd deviated from the spec with respect to variable binding. Back in the day, I was wooed into thinking that JavaScript was so similar to Scheme that I could pretty easily just compile JavaScript variable bindings directly to similar Scheme variable bindings. For the most part, I got variable hoisting (i.e., the scope of var-declarations being hoisted to the top of their nearest enclosing function body) right. But the real subtlety comes in with the with (dynamically add a computed object value to the runtime environment as a new environment frame) and eval constructs.
I almost had with right: as soon as I detect that the programmer is using with, I enter a "stupid" mode, where I synthesize an explicit runtime environment and populate it with whatever's currently in the static environment. Within the body of the with, variable lookup is performed dynamically in this runtime environment. So outside of a with, you get normal, efficient lexical scope; inside a with, you get stupid, slow, dynamic scope. My only mistake there was forgetting to make the runtime environment entries aliases to the existing variables so that mutations persist after the with-block returns.
But eval is hard: Scheme's eval doesn't have access to the lexical environment where it is called, whereas JavaScript's does. And JavaScript's eval also inherits the ambient "variable object" (ECMA-262 10.1.3), which means that eval code can create variable bindings in the caller's scope. Getting all of this right means simulating the variable object similarly to the way I simulate the runtime environment, and switching into "stupid" mode whenever I detect a potential application of eval. (The standard allows you to refuse to run eval in situations other than a "direct" call to a variable reference spelled e-v-a-l, so you don't have to treat every single function call as a potential call to eval. It's messy, but it's better than forcing you to implement a dynamic environment everywhere.)
Labels:
compilation,
eval,
JavaScript,
lexical scope,
with
Wednesday, October 15, 2008
Unordered multi-arity binders
Cool: GHC knows that "∀" doesn't care what order you bind its type variables:
{-# LANGUAGE RankNTypes, TypeOperators #-}
f1 :: ∀ a b . a → b → b
f1 x y = y
f2 :: ∀ b a . a → b → b
f2 x y = y
h :: (∀ a b . a → b → b) → c → d → d
h f x y = f x y
a = h f1
b = h f2

Think about what this means for implementing a type-checker: to match one ∀-type against another, you have to find the right permutation of bound type variables. (A naive approach requires searching through all n! permutations!) Presumably the actual implementation incrementally unifies the type variables as it recurs into the bodies of the ∀-types.

Does anyone know of a paper that describes this aspect of implementing higher-rank polymorphism?
Tuesday, October 14, 2008
Today's JavaScript kookiness
One of the weirdest aspects of JavaScript's eval is that it not only has complete access to the lexical scope of its caller, but it even gets to add variable declarations to its caller's lexical frame. So I can write
(function() {
eval("var x = 10")
return x
})()

and get 10. Now, bearing in mind that var declares a binding for the entire containing function, eval combines in a truly spectacular way with lexical bindings. What does the following test function produce?

function test() {
var x = 'outer'
return (function() {
{
let x = 'inner'
eval("var x = 'eval'")
}
return x
})()
}

Clearly the let-binding of x seems to be in scope for the eval code. So we might expect the function to return the string 'outer', with the assumption that the eval code simply modified the inner binding. Then again, the eval code is creating a var-binding, so we might expect the function to return the string 'eval', with the assumption that the eval code is declaring a new variable x to be local to the inner function. So what does it really return? undefined.

What?

If I've got this right, what's happening is that the eval code is creating a var-binding for x in its containing function, but its initializer is mutating the current x in scope, which happens to be the let-bound x. So the eval code is actually creating a variable binding two lexical frames out. The evaled var-declaration and its initializer are referring to two completely different bindings of x.

Lest you think this is strictly an unfortunate interaction between the legacy var and eval semantics and the new (post-ES3) let declaration form, there are a couple of other local declaration forms in pure ES3, including catch. Consider:
function test() {
var x = 'outer'
return (function() {
try { throw 'inner' }
catch(x) { eval("var x = 'eval'") }
return x
})()
}
Monday, October 06, 2008
That's not supposed to happen...
Friday, October 03, 2008
Clojure metadata
I watched the video of Rich Hickey's talk about Clojure at the Boston Lisp group this week and was very impressed by his design. One of the features I've been mulling is Clojure's notion of metadata: every value can be associated with extra associative data. That data does not affect the value's behavior under the standard equality predicate, so in some sense it can be seen as "invisible." This is kind of like JavaScript objects' arbitrarily extensible property tables, except for the very important difference that it's immutable. So you can create a value that's more or less equivalent to another (it is distinguishable by identical?, but use of that predicate is discouraged) but with some extra information on the side.
The first use I immediately thought of for this is when you're writing a compiler or interpreter that has AST nodes and you want to save source location information. This information shouldn't affect AST equality. Clojure metadata sounds like it could be a good way to separate that concern.
I think it's confusing, at least for me, that Rich describes metadata as not being "part of the value"--where I come from, a value is the result of evaluating an expression. When you evaluate (with-meta v m) you do get a new value that is associated with metadata m. What Rich is saying is that the metadata is not contained within v. At first when he said it wasn't part of the value, I thought "that sounds brittle." But if I've understood it, a more accurate description might be that values in Clojure may carry with them an extra associative map that doesn't affect their printed representation or their equality.
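A rough analogue in JavaScript (withMeta and meta are hypothetical helpers, not Clojure's API): keep the metadata in a WeakMap on the side, so it never shows up in the value's own properties, its JSON representation, or its equality.

```javascript
// A side table keyed on object identity holds the "invisible" metadata.
const metaTable = new WeakMap();

function withMeta(value, meta) {
  metaTable.set(value, meta);
  return value;
}

function meta(value) {
  return metaTable.get(value);
}

// An AST node tagged with source-location metadata:
const node = withMeta({ op: "+", args: [1, 2] }, { line: 42, column: 7 });
```

Unlike Clojure's with-meta, which returns a new immutable value, this sketch mutates a side table; but it captures the key property: JSON.stringify(node) is unaffected, while meta(node) recovers the source location.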
Thursday, October 02, 2008
Web inching towards actors?
I learned about the new HTML 5 postMessage API recently, and it struck me that we are ever so slowly inching towards actors on the web: now all windows in a browser (including separate tabs and iframes) are communicating sequential processes.
Now, if Alan Kay had had his way, every node in the DOM would be separately addressable (i.e., have its own URI) and would probably itself be an actor.
Monday, September 29, 2008
Representing the cost of control
Every once in a while I play around with this idea for the design of a web language (server-side).
The nice thing about using first-class continuations to implement a web library is that you can write your code in direct style, but a big down-side is that you still have to figure out how to manage the storage of those continuations. But the alternative of essentially representing the control with program data structures, i.e. with explicit representations of the state of the computation, is really painful.
So I've thought of taking a page from Haskell's playbook: what if you separated a web language into two portions? One language would be for writing the "driver" of the program, where you can call send/suspend and computations may be suspended and restarted an arbitrary number of times. The other language would be "pure" (well, let's say full Scheme, but no send/suspend). In other words, replace Haskell's IO monad with the Web monad. As with Haskell, code in the Web monad can call pure code, but not vice versa.
The benefit of this would be that you could still write in direct style, but you've made explicit the portions of the program where control will cost you. Then you could play with different ways of managing that cost--expiration of continuations, back-off for expiration, etc.--by playing with the design of the Web monad, which wouldn't affect the pure language.
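One way to make the split concrete (a toy sketch in JavaScript using generators, not the design proposed above): the driver is a generator whose yields stand in for send/suspend, and the server's bookkeeping of live generators is exactly the storage cost the Web monad would make explicit.

```javascript
// The "driver" lives in the Web monad: it may suspend at each yield and be
// resumed with the user's response an arbitrary number of times.
function* driver() {
  const name = yield { page: "Who are you?" };                 // send/suspend
  const n = Number(yield { page: `Hi ${name}, pick a number` });
  return { page: `${name}, your number doubled is ${pureDouble(n)}` };
}

// "Pure" code: callable from the driver, but unable to suspend.
function pureDouble(n) { return n * 2; }

// The server manages suspended computations explicitly. Here a session is
// just a live generator, but this is precisely the seam where you could
// serialize, expire, or cap stored continuations.
function startSession() {
  const gen = driver();
  return { gen, current: gen.next() };
}

function respond(session, answer) {
  session.current = session.gen.next(answer);
  return session.current;
}
```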
Labels:
modal web programming,
web continuations,
web monad
JavaScript completion value: pros and cons
In JavaScript the completion value is the result value of a computation; even evaluating a statement produces one. Simple statements like
{
2 + 2;
}

have result values (in this case 4), but so do complex statements like

for (var i = 0; i < n; i++)
f(i);

Notice that if n is less than [edit: or equal to] 0 in this example, there won't be any computation at all, in which case the statement can't really produce any reasonable value.
I've just figured out how to articulate an intuition I've long felt about the completion value. The great thing is that it mitigates some of the awfulness of statements: even imperative code can still produce values! This means you can have some of the syntactic convenience of statements for writing effectful code, while still producing results. But the thing that always made me uncomfortable was how code like the for-loop above screws with the idea of a tail call. Because the for-loop may or may not produce a value, there's a sort of implicit return happening after the loop.
There might be tricks you could pull using something like continuation marks to allow tail position inside of loops, but I think a simpler answer in a JavaScript with proper tail calls would be to say that loops do not preserve tail position. So essentially tail position would include the last statement in a block, the arms of an if statement, and the arms of a conditional expression, and that's pretty much it. Inside a try-block, a loop, or a switch (because of the reliance on break) there simply wouldn't be any tail positions.
Note that a) JavaScript doesn't have tail calls, and b) the only places where you can even observe the completion value of a statement are at a REPL or with eval. But I think JavaScript could have proper tail calls (potentially), and I've always wondered whether the completion value should remain a hack isolated to eval or actually see its way into the design of other features for JavaScript.
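The observability claim can be checked directly through eval; these are standard JavaScript semantics:

```javascript
// The completion value is only visible from eval (or a REPL), but it's real:
// each of these statements evaluates to a value.
var four = eval("2 + 2;");                          // → 4
var arm  = eval("if (true) { 'a'; }");              // → 'a'
var last = eval("for (var i = 0; i < 3; i++) i;");  // → 2, the final iteration
var none = eval("for (var i = 0; i < 0; i++) i;");  // → undefined: no body ran
```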
Friday, September 26, 2008
Most productive conference trip ever
I can confidently say I've never accomplished so much actual work at a conference before. I actually solved a research problem during downtime while at ICFP this week. As a fledgling PhD student I used to take advantage of my boundless free time to think about problems for hours on end, often working into the night. As time goes by I have less and less uninterrupted free time, so I guess that's making me more efficient at using my spare cycles in very short bursts. I guess Butler Lampson would classify that under speculative execution.
Monday, September 22, 2008
Debugger as library
The great thing about printf-debugging is that it's got about as low a barrier to entry as possible (unless you're using Haskell--don't get me started). You have implicit access to the program's control flow and I/O and the function is essentially always there. When you want to use a debugger, you usually have to search for a good one, download and install it, learn how it works, figure out how to set breakpoints, learn the language of breakpoint conditions, figure out how to inspect locals and evaluate expressions, etc. etc.
What if debugging was just available as a library? What power would you need? I think PLT Scheme is uniquely suited to making this extremely practical. It's easy to implement a programmatic breakpoint: just call/cc and raise an exception with the captured continuation. You immediately have access to a nice language of boolean predicates for determining breakpoint conditions (Scheme), as well as a language to evaluate expressions with access to the locals (also Scheme). But PLT Scheme also provides continuation marks, which allow you to leave little bread-crumbs through the control flow of a program. What I'd really love to do is use this to allow the programmer to annotate interesting points in the control and connect this up to the DrScheme GUI libraries so that the programmer can do something like:
> (define bp (debug <expression>))
> bp
#<breakpoint>
> (show-control bp)

and have a window pop open that displays a stack trace of the suspended program.

I'm working on a PLaneT library to make this just about as low-tech and seamless as printf-debugging. You'll just need to throw in one extra line somewhere in the module:

(require (planet dherman/debug))

and this will allow you to throw in breakpoints and debug programs much like you could with a full debugging IDE, only with the directness and intra-linguistic style of printf.