Monday, April 28, 2008
Literate code
I used to find the notion of literate code attractive, but right now my dissertation prototype implementation is undergoing some serious redevelopment and the horribly out-of-date docs are dragging me down. Now I find myself wishing I'd kept them separate.
Tuesday, April 22, 2008
How to spell StopIteration
Some of the pseudo-constants in JavaScript are actually mutable global variables: undefined is a notorious example. Luckily, there's generally some reliable way to generate such a value without having to hope that no one has reassigned a global variable.
For undefined, it's the void operator, which, given any argument, produces the value that undefined is initially bound to. Conventionally, people write void(0).
In JavaScript 1.7 and later, there's a special constant called StopIteration that's thrown after an iterator runs out of values. This too is a mutable global. But this value is a little trickier to get at reliably. Here's the simplest expression I could come up with that produces the StopIteration value:
(function () {
    try {
        (function () { true || (yield) })().next();
    } catch (e) {
        return e;
    }
})()

The inner function is a generator by virtue of textually containing the yield keyword, but it returns immediately, yielding no values, because of the short-circuited logical or. (Note that because of yield's finicky grammar, it has to be parenthesized.) So by calling next() on it, it immediately raises StopIteration, which the outer function catches and returns.
Sunday, April 20, 2008
Compilation, obfuscation, encryption
It's interesting that we tend to use compilation as a proxy for encryption. It seems to me it would clarify the design of a language's semantics and architecture to separate the concern of code ownership from the concern of performance. Compilation is primarily used to cache the transformation of one semantics into another--a partial evaluation of an interpreter. Using this as a weak encryption mechanism is pretty silly. If you want to encrypt program source, why not use real encryption? That way you can use whatever representation of the code you want: source, byte-code, microcode, whatever.
Labels: compilation, encryption, Futamura projections, obfuscation
Friday, April 11, 2008
A poem for the weekend
Tuesday, April 08, 2008
Different perspectives on parameterized types
To the PL theorist, parameterized types are a necessity for the expressiveness of a statically typed language. Imagine Hindley-Milner without them: you'd have to duplicate definitions all over the place. But in practice, statically typed languages have loopholes--long before generics, Java always had the type Object so that you could essentially "turn off" type-checking when you couldn't type things precisely. This meant that you could always write generic operations and generic collections, but without type-checking.
So in practice, people don't look at parameterized types as increasing the expressiveness of Java at all; it just looks like a way to increase the amount of type-checking in the language, to make it stricter. And they're right.
So you have two correct perspectives on parameterized types: roughly, that they make the language less restrictive and more restrictive. It's not actually a contradiction, but it's enough to have caused me confusion in some conversations.
Labels:
generics,
parameterized types,
porous abstractions
Sunday, April 06, 2008
Keep your /tmp clean
For a couple years my laptop's cygwin installation has been dramatically, inexplicably slow to start up the first time after rebooting. I mean, like, 5 to 10 minutes to start up. (You can imagine how much more this makes me love Windows Update for forcing a reboot every time they patch the latest buffer overrun that authorizes Solitaire to initiate a nuclear first strike against North Dakota.)
Well, today I found the culprit. A program I haven't used in quite a long time had left thousands of orphaned temp files in /tmp. Don't ask me why, but this was enough to obliterate the performance of cygwin across the board, and especially on startup. Maybe /tmp is memory-mapped? No idea. Anyway, I wish I'd known to keep my /tmp clean. Now you know.
ESOP in a heartbeat

It's a long story, but it involves emergency passport renewal, an extra weekend in San Francisco, and very merciful conference organizers and attendees.
All of this means I got to spend a total of about 48 hours in lovely Budapest--about long enough to see the Danube and a few of its glorious monuments, experience cheap beer, try delicious Tokaji, present a paper, chat with a few friends, and get right back on a plane again.
What a shame I couldn't have stayed longer. But the talk went great, and I'm still very glad I got to go.
For the record, the San Francisco passport office is staffed exclusively with miracle workers.
Case exhaustiveness for interactivity
In functional programming we're very good at exhaustive case analysis of our data definitions. But I've found I'm pretty bad at reasoning about cases in interactive programs. (Embarrassingly, I recently lost a week's worth of data in an Ajax application I wrote due to a bad response to an unpredicted server failure.)
What are all the possible actions the user could perform? What are all the failure cases? What are all the possible interleavings of interactions? These are harder questions than enumerating variants of a disjoint union.
I think this is where dynamic safety (e.g., runtime tag checking in dynamically typed languages or case-dispatch catchalls in statically typed languages) is so critical: if you don't have a sound model for enumerating the space of possible behaviors, you have to make sure that there's some reasonable recovery programmed into the system for when you encounter an event you didn't predict. This reminds me of recovery-oriented computing and Erlang's "let it fail" philosophy.
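As a tiny illustration of the catchall idiom (a sketch of my own with hypothetical event types, not something from any real system):

-- Hypothetical event type for some interactive program.
data Event = Click | KeyPress Char | ServerResponse Int | Unknown String

handleEvent :: Event -> String
handleEvent Click                = "handle the click"
handleEvent (KeyPress c)         = "handle the key " ++ [c]
handleEvent (ServerResponse 200) = "handle the successful response"
-- The catchall: a recovery path for any event we failed to predict,
-- including the server failures we never enumerated.
handleEvent _                    = "log the event and recover gracefully"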
Wednesday, April 02, 2008
Monkey-patching evolves
Do you like the ability to dynamically mutate the definition of your program, but keep getting thwarted by sophisticated static analyses such as grep? Your prayers have been answered: ninja-patching is here.
Sunday, March 23, 2008
Makefile for HUnit tests
Here's a nice idiom for running HUnit tests from a Makefile. Jesse Tov gave me the idea that you can use make's dependencies to test only modules that have changed since the last time you ran the tests. Pick a standard name for each module's tests--tests, say--and make sure it's defined in the first column (since we'll be searching for it with a simple grep). Every time we test a module, we'll touch a dummy file of the same name in some hidden directory--call it .test. So we have two lists of files, the modules of the application and the test dummy files:
SOURCES := $(wildcard *.hs) $(wildcard *.lhs)
TESTS := $(foreach src,$(SOURCES),.test/$(src))

To run the tests, first we collect the list of modified modules and save the names of their tests (Foo.tests, Bar.tests, etc.) in a temporary file .test/args:

.test/%hs: %hs
	@if grep '^tests ' $< > /dev/null 2>&1 ; then \
	    touch $@ ; \
	    echo $* | sed -e 's/\..*/.tests/' >> .test/args ; \
	fi

Now every time we run the tests, we first make a fresh .test/args file, and then we run GHC with a command-line option to evaluate an expression that runs those tests:

test: teststart $(TESTS)
	@echo ghc Test -e "\"Test.run [ `xargs -a .test/args | tr ' ' ','` ]\""
	@ghc Test -e "Test.run [ `xargs -a .test/args | tr ' ' ','` ]"

teststart:
	@mkdir -p .test
	@$(RM) -f .test/args
	@touch .test/args

It's also useful to have a target that always runs all the tests:

retest: testclean test

testclean:
	$(RM) -rf .test

This assumes the existence of a simple test harness module:

module Test (run) where
import Test.HUnit
run :: [Test] -> IO ()
run ts = (runTestTT $ TestList ts) >> (return ())
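For example, a module's tests might look like this (a hypothetical module of my own, not from the original setup); note that tests is defined in the first column, so the grep in the pattern rule can find it:

module Foo where

import Test.HUnit

tests :: Test
tests = TestList
    [ "addition" ~: 2 + 2 ~?= (4 :: Int)
    , "reverse"  ~: reverse [1,2,3] ~?= [3,2,1 :: Int]
    ]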
Some tools just don't like each other
brit·tle, adj.
- Multi-line string literals in Haskell require so-called "string gaps": one '\' character to terminate a line and another '\' to start the next line (see the example below).
- With GHC, Haskell programs may be preprocessed with CPP, which coincidentally strips the "\ ... \" characters from the source, resulting in an illegal Haskell string literal.
- Mercifully, it also happens that CPP doesn't strip the characters if the first '\' character is followed by a space before the newline.
- But of course, a commonly used feature of emacs is to silently strip trailing whitespace at the end of lines on every save.
- Not that you can see the difference, given the well-known human limitations at visually distinguishing whitespace.
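For reference, here's what a string gap looks like (a minimal illustration of my own):

-- The trailing '\' ends the literal on this line; the leading '\'
-- on the next line resumes it. CPP strips the pair unless the first
-- '\' is followed by a space -- which emacs may then silently delete.
multiline :: String
multiline = "This string literal \
            \spans two source lines."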
Friday, March 21, 2008
When to use point-free style
I've often struggled with the question of when/whether to use point-free style. It can be dangerously addictive, especially in languages with syntactic support for it like Haskell. But it's notorious for creating dense and impenetrable code. Yesterday I linked to advice from the GHC community on when not to create needless abstractions, advice which could be applied when considering a point-free abstraction (man, does it ever take self-restraint not to go wild with the puns here).
The reason pointful style can be so helpful is that it allows us to think about the definition of a computation in terms of particular examples: "for any number, let's call it x, ..." This is also why set comprehensions in math, and consequently list comprehensions in programming languages, are so approachable: they describe a set by appeal to representative examples.
I think the key to successfully using point-free style is when you want to treat a function itself as a single data point. For example, if you're sorting a list with some comparison function, you don't want to have to drop down a level of abstraction to think about the individual pairs of elements being compared; you just want to think about the comparison function (like "numeric" or "reverse alphabetical") as the object of your attention.
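For instance, here's a minimal sketch of my own (not from the original post) contrasting the two styles: the point-free sort keeps the comparison function as the object of attention, while the pointful comprehension reasons by way of a representative element x.

import Data.List (sortBy)

-- Point-free: the comparison function itself is the thing being discussed.
sortDescending :: Ord a => [a] -> [a]
sortDescending = sortBy (flip compare)

-- Pointful: we name a representative element x and reason by example.
squaresOfEvens :: [Int] -> [Int]
squaresOfEvens xs = [ x * x | x <- xs, even x ]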
Wednesday, March 19, 2008
Well said
From the GHC wiki:
It's much better to write code that is transparent than to write code that is short.
Often it's better to write out the code longhand than to reuse a generic abstraction (not always, of course). Sometimes it's better to duplicate some similar code than to try to construct an elaborate generalisation with only two instances. Remember: other people have to be able to quickly understand what you've done, and overuse of abstractions just serves to obscure the really tricky stuff.
Labels: abstraction, coding style, superfluous abstraction
Friday, March 14, 2008
Not fade away
From the old-languages-never-die dept.:
Obsolete programming languages such as FORTRAN and COBOL are still widely used.
-- J.J. Duby, 1971
Monday, March 10, 2008
Another cute Haskell function
join :: Eq a => [(a,b)] -> [(a,c)] -> [(a,(b,c))]
join a1 a2 = [ (a,(b,c)) | (a,b) <- a1, c <- [ c | (a',c) <- a2, a == a' ] ]
infixr 5 ⋈
(⋈) :: Eq a => [(a,b)] -> [(a,c)] -> [(a,(b,c))]
(⋈) = join
-- for example:
[("a",1),("b",2),("c",3)] ⋈ [("a",'a'),("b",'b'),("c",'c')]
Friday, March 07, 2008
Crazy debugging feature idea
Here's a crazy idea. Closures are like objects with private fields, right? Specifically, outside of the function body, you're not allowed to refer to the variables in the closed environment. But one of the most annoying things that happen when I'm debugging is that I want to call some local function, but it's not available from the REPL:
(define (foo x y)
  (define (helper z)
    ... x ... y ...)
  ...)

What if, at the REPL (i.e., in debug mode only), closures were openable? You could use dot-notation to refer to them, and all you'd have to do is somehow provide the extra bindings needed to fill in their environment. Keyword arguments, maybe?

> ((close foo.helper [#:x 42] [#:y 'blah]) "value of z")
'result-value-ftw!
Wednesday, March 05, 2008
Xor
While I'm at it, let's continue the series of logical Maybe operators with xor:
infixr 2 ^^^
(^^^) :: Maybe a -> Maybe a -> Maybe a
Nothing ^^^ r@(Just _) = r
l@(Just _) ^^^ Nothing = l
_ ^^^ _ = Nothing
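-- for example:
-- Just 1 ^^^ Nothing => Just 1
-- Nothing ^^^ Just 2 => Just 2
-- Just 1 ^^^ Just 2  => Nothing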
My current favorite little Haskell function
Here's my pet Haskell combinator du jour:
infixr 2 |||
(|||) :: Maybe a -> Maybe a -> Maybe a
Nothing ||| x = x
x ||| _ = x

This gives you a concise notation for branching in the Maybe monad, so instead of having to write:

case x `lookup` env1 of
    Just v -> Just $ f1 v
    Nothing -> case x `lookup` env2 of
        Just v -> Just $ f2 v
        Nothing -> Nothing

you can write:

do { v <- x `lookup` env1; return $ f1 v } |||
do { v <- x `lookup` env2; return $ f2 v }

This gives you the same kind of idiom that Schemers use or for. Come to think of it, I guess you could also write the analog of and:

infixr 3 &&&
(&&&) :: Maybe a -> Maybe a -> Maybe a
Just _ &&& r@(Just _) = r
_ &&& _ = Nothing

I haven't used this one yet, but it seems natural.
Tuesday, March 04, 2008
Intentional capture
Procedural hygienic macro systems like the syntax-case system make it possible to write capturing macros--macros which, depending on your philosophy, you might call "non-hygienic." The classic example is the "anaphoric" conditional form if-it, which implicitly binds a variable it to the result of the test expression:
(if-it 42 (+ it 1) #f) ; => 43

The difficulty in getting such a macro right comes when you try to write another macro that expands into if-it. To quote the mzscheme manual's section on macros, "macros that expand into non-hygienic macros rarely work as intended."

Andre van Tonder's SRFI 72 document contains a perfect and concise example, due to Kent Dybvig, of two different ways a macro might expand into a capturing macro. On the one hand, we might want to write when-it, a simple "one-armed" conditional that implicitly binds it in the same way as if-it:

(when-it 42 (+ it 1)) ; => 43

On the other hand, we might want to use if-it to implement the hygienic or macro, which shouldn't capture any variables:

(let ([it 10]) (or #f it)) ; => 10

First, here's the implementation of if-it: we create an identifier for it with the same lexical context as the operator of the expression:

(define-syntax (if-it stx)
  (syntax-case stx ()
    [(op e1 e2 e3)
     (with-syntax ([it (datum->syntax #'op 'it)])
       #'(let ([it e1])
           (if it e2 e3)))]))

The references that will be captured by the introduced binding of it are the ones that were introduced into the program in the same expansion step as the occurrence of if-it in the macro call. In particular, if the occurrence of if-it was in the original program (i.e., written explicitly by the programmer), it captures references to it that were in the original program; if the occurrence of if-it is the result of a macro expansion, it captures only those references to it that were generated in that same expansion step.

This means that a hygienic macro that expands into if-it will work as expected:

(define-syntax or
  (syntax-rules ()
    [(op e1 e2)
     (if-it e1 it e2)]))

Since the reference to it appears in the same expansion step as the occurrence of if-it, that reference is captured, but no references to it within subexpressions e1 or e2 (which had to have already been there before this expansion step) are captured.

If you want to write another capturing macro that expands into if-it, it's a little more work. Essentially, you have to capture it all over again. Here we once again create an identifier with the same lexical context as the operator, and we bind it to the occurrence of it introduced by if-it:

(define-syntax (when-it stx)
  (syntax-case stx ()
    [(op e1 e2)
     (with-syntax ([it* (datum->syntax #'op 'it)])
       #'(if-it e1 (let ([it* it]) e2) (void)))]))

The moral of the story is that you always have to ask explicitly for a macro to capture an introduced identifier. These are good defaults for a hygienic macro system: it's easier to write hygienic macros but still possible (albeit a little harder) to write macros that capture. This is even true when you abstract over capturing macros: macros that expand into capturing macros are hygienic by default, but with a little more work again, you can create capturing macros that abstract over other capturing macros.
Friday, February 22, 2008
True unions
I am a Super Big Fan of disjoint union datatypes in programming languages, but there are places where they are really inconvenient. Having to inject and project data between related types can be prohibitively cumbersome, especially when dealing with large data definitions such as the AST definitions for realistic programming languages. I know of a couple of languages where "true" unions are used instead, including Typed Scheme and the draft ECMAScript Edition 4. In both cases, unions are being added to a language where the runtime is already paying the price for runtime type tags, so keeping the variants distinct where they don't overlap doesn't introduce any extra burden.
But I was thinking this morning, what would happen if you tried to add true unions to Hindley-Milner languages? For concreteness, let's imagine extending Haskell. So instead of writing

data Foo = I Int
         | S String

you could instead write

type Foo = Int | String

Now if you want to pattern match on such a construct, you have to explicitly mention the type; there's no other way to distinguish the variants. So you could imagine type-annotating the patterns in a match expression:

showFoo :: Foo -> String
showFoo foo = case foo of
    n :: Int -> show n
    s :: String -> s

Note that because there's nothing preventing overlap in the variants of a true union, the order in which you match the patterns is significant. Alternatively, you could write the definition of showFoo in pattern-definition style:

showFoo :: Foo -> String
showFoo (n :: Int) = show n
showFoo (s :: String) = s

Consider the possible drawbacks to such a feature:
Cost of tagging:
One of the benefits of (sound) static typing is the ability to compile to efficient runtimes that avoid tagging runtime objects with their datatype. The Foo type, by contrast, would require its Ints and Strings to be tagged. But this is the wrong way of looking at it; the language requires you to tag them anyway if you use the disjoint union type, so there's no additional cost over what you would already have been paying. For non-union types, you can still do the same amount of erasure.
Possible overlap:
You can express overlapping types like (Int | (Int | String)) with true unions, which makes the order of patterns significant, could result in surprising (order-dependent) logical relationships between case dispatches, and could generally lead to messy type definitions. Maybe a more principled way of looking at it is that a disjoint union can be thought of as a type abstraction, whereas with a true union you might have to know its full definition to use it effectively. But hey, the same is true of type definitions, but they're still useful; and besides, nothing's preventing you from using disjoint unions when they're more appropriate.
Case exhaustiveness:
Standard ML's pattern matching facility is carefully designed to allow the compiler to prove or disprove case exhaustiveness. I tend to think this is a bad trade-off; the language is significantly crippled to enable a very weak theorem prover to prove a theorem that's of limited utility. Sure, it has its uses, but when you know things the theorem prover doesn't, you have to abandon pattern matching entirely. Other language designers seem to agree with me, since Haskell and I think Ocaml also don't check for case exhaustiveness.
Inference problems:
I have no expertise in Hindley-Milner inference, but I'm sure true unions are not inferrable. But one thing recent Haskell research has demonstrated is that convenient but uninferrable extensions to Hindley-Milner type systems can often be safely added with the requirement that they be explicitly annotated. I bet you could add true unions, require explicit type annotations when they're used, and get along just fine.
Function types and polymorphic types:
This is where it gets tougher. How do you tag these kinds of types? Do you do runtime subtyping checks for things like the more-polymorphic-than relation? I don't have an answer. A simple one is to restrict unions to types with simple tags, although those kinds of arbitrary restrictions lead to lack of compositionality in language design. But perhaps it's possible to make this work with the entire type system.
Some of this design space has been explored with Ocaml's polymorphic variants. I should take a look at the literature on that.