Wednesday, April 27, 2005

If I were teaching macros

Yeah, if I were teaching macros I'd start without pattern matching. I would start with a programmed macro system like the one described by Dybvig et al, just without mentioning syntax-case. That way you'd have properly lexically scoped macros. And I wouldn't tell the students about datum->syntax-object so that they couldn't break hygiene. Here are some chapters in this hypothetical tutorial:

Lesson 1: Metaprogramming: programs that write programs.
Learn how to create syntax objects to create quoted code, by analogy to quote; talk about the phase distinction; write some simple macros that generate programs; make (over-)simplified analogy to cutting and pasting code.
Lesson 2: Macros are embedded metaprograms.
Demonstrate that macros can refer to identifiers that are in scope. Show that there's no danger of such identifiers getting captured when macros are referentially transparent; refine the idea of cutting and pasting code to being more than just textual replacement -- a variable reference identifies a binding and the generated code reproduces that binding, not necessarily the name.
Lesson 3: Macros can introduce identifiers.
Demonstrate how macros can introduce brand new identifiers; show that in hygienic macro systems those identifiers can never refer to existing bindings in any program, only to uses of the same identifier within a macro's templates; show how identifiers given as parameters can generate binding occurrences, bound occurrences, and quoted occurrences of the identifier.
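Lesson 3 can be demonstrated even without pattern matching. Here's a sketch in PLT Scheme of a programmed macro that destructures its input with syntax->list and builds its output with a quasisyntax template; the introduced identifier tmp can never capture a user's variable:

```scheme
;; a programmed swap! macro, written without syntax-case patterns
(define-syntax (swap! stx)
  (let* ([parts (syntax->list stx)]
         [a (cadr parts)]
         [b (caddr parts)])
    ;; tmp is introduced by the macro, so hygiene keeps it
    ;; distinct from any tmp at the invocation site
    #`(let ([tmp #,a])
        (set! #,a #,b)
        (set! #,b tmp))))

;; even swapping a variable actually named tmp works:
(let ([tmp 1] [other 2])
  (swap! tmp other)
  (list tmp other))  ; => (2 1)
```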
If I could get through all of that without pattern matching, that'd be cool, but somewhere along the line the code might just get too hairy without it. (Then again, Lisp defmacro hackers live without it, don't they?) Since pattern matching is orthogonal to the above lessons, I would love to put it off until after mastering metaprogramming, the phase separation, and issues of scope.

Some might claim that in a hygienic macro system you can write macros without understanding lexical scoping of macros, since it's just taken care of for you. That hope is pretty much doomed, because as a macro writer you need to understand the distinction between names and identifiers in order to know when you can and can't bind a particular identifier.

Friday, April 22, 2005

When to reserve keywords

As I've pointed out before, when a macro uses literal keywords, the macro writer should reserve the keywords by binding them to macros that simply raise syntax errors if the keywords are used outside the context of invocations of the macro.
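In PLT Scheme, reserving a keyword is a one-liner; here's a minimal sketch (the keyword name and error message are placeholders):

```scheme
;; reserve the keyword: any use outside a macro that treats it
;; as a literal is immediately a syntax error
(define-syntax my-keyword
  (lambda (stx)
    (raise-syntax-error #f "keyword used out of context" stx)))
```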

There are in fact two different ways that keywords get used in macros. The first we might describe as true syntactic sugar: in compound-unit/sig, there are always exactly three clauses, and their tags, import, link, and export, serve only to remind the reader of their purpose. Similarly, each (tag : signature) clause in the import clause only contains the : for readability. In this case, rebinding the keyword identifiers will just cause invocations of the macro to fail with a syntax error, because the use of the keyword at the invocation site doesn't have the same binding as the use of the keyword in the macro definition.

The second way that keywords get used, though, is as tags to distinguish different expression forms. In this case, if the keyword gets rebound, then a subexpression of the macro invocation will actually appear as a different kind of expression, which may very well succeed at matching some other case in the macro. The cond macro serves as an example:
(define-syntax cond
  (syntax-rules (else)
    [(_ [else a-exp]) a-exp]
    [(_ [q0-exp a0-exp] [q1-exp a1-exp] ...)
     (if q0-exp a0-exp (cond [q1-exp a1-exp] ...))]))
This macro essentially declares the "question" subexpression of each clause to have a kind of union type: a question is either an expression or the keyword else. If, at the macro invocation site, the identifier else isn't bound the same way as it was in the macro definition, it is still an expression, so it matches the second case of the macro:
(let ([else #f])
  (cond
    [else 23]))
; => #<void>
It's in this second case that reserving keywords is really important. In the first case, you get an error, but you find out right away and can fix it. In the second, you may not get an error at all, or the error may not appear until much later in the program.

Ask and ye shall receive my bugs

Naturally, the macro I posted yesterday is broken. (It was only a matter of time, right?) Sadly, not only did I have to rely on Ryan to fix it for me, but I don't even quite understand the fix. In fact, he doesn't even totally understand why the fix worked. Bleah.

Let's consider a hypothetical example to understand the problem better. The user invents a new language called my-lang with keywords foo and mumble, and a variant of syntax-case called my-syntax-case. The problem has to do with the fact that the definition of the my-syntax-case macro introduces foo and mumble into its output (because invocations of my-syntax-case don't have to mention the keywords). So they get fresh marks, which distinguish them from the original foo and mumble. The solution is to cancel the fresh marks with syntax-local-introduce.

I won't reproduce the entire corrected macro, but here's the relevant portion of the second of the two cases (the simpler one, where there is no parent language):
;; define the new language-specific syntax-case variant
(define-syntax (case stx)
  (syntax-case stx ()
    [(_ stx-exp (extra-kw (... ...)) clause (... ...))
     (with-syntax ([(k (... ...))
                    (map syntax-local-introduce
                         (syntax->list #'(kw ...)))])
       (syntax-case stx-exp (k (... ...) extra-kw (... ...))
         clause (... ...)))]))
Also, I should point out that it wasn't necessary to require at least one clause in the syntax-case variant, so I've corrected that, too.

Hygienic macros are generative

In normal functional programming, we can "substitute equals for equals" -- any two equivalent expressions are interchangeable, because they will always produce equivalent results. This allows for all sorts of refactoring, because you can pull out an expression from one lexical context, stick it in another context, and expect it to behave the same.

Any functional programmer who's been burned by programmed macros such as syntax-case will tell you that when they try to refactor their macros in simple ways -- in particular, separating out pieces of a complicated macro and moving them into helper functions -- the macros suddenly stop working!

This is because in a hygienic macro system, syntax transformers are not pure functions. The hygiene condition requires that, in order to avoid unwanted capture, every use of a syntax transformer must generate different binding occurrences of identifiers! This is analogous to applicativity vs. generativity in ML-style parameterized module systems: applicative functors generate the same output for the same input, whereas generative functors are guaranteed to produce different results every time you invoke them.

In a sense, this reminds me of the balance between expressive power vs. local reasoning, because hygiene introduces a non-local effect that ruins the ability to understand a macro in isolation. (Happily, there are limitations on these effects; for example, the "freshness monad" is known to be commutative, meaning that it doesn't matter what order you generate fresh names in, because they're all distinct anyway.) I wonder what kind of reasoning tools can help mitigate the problems hygiene causes with understanding macros.

Thursday, April 21, 2005

You asked for it, Felix

Well, I'm not sure if you can quite call it a macro-defining-macro-defining macro, but it's still meta enough to be entertaining. Here's the rough idea. I found I wanted to use a lot of macro keywords when defining an embedded language: the various constructors of the terms of the language are keywords recognized by the macro, which get compiled down to core Scheme. It turned out my compilation would best proceed in a couple of passes, so I borrowed an idea from Sarkar et al's paper at last year's ICFP: I have a define-lang macro that can define either a stand-alone embedded language or an extension to an existing embedded language. The define-lang macro produces:
  1. a static registry of the keywords associated with the language as well as a link to its parent language
  2. a specialized version of syntax-case tailored to the new language -- essentially a wrapper around syntax-case that automatically adds the language's keywords to the literals list
So a user of define-lang can declare a language with its keywords:
(define-lang calc-lang calc-syntax-case (<+> <-> <*> </>))
And this declaration will generate a new calc-syntax-case macro, which can be used to define macros on the calc-lang language:
(define-syntax (compile-calc-program stx)
  (calc-syntax-case stx ()
    [(<+> e1 e2) ————]
    [(<-> e1 e2) ————]
    [(<*> e1 e2) ————]
    [(</> e1 e2) ————]))
Notice that I didn't have to add the keywords to the literals list of calc-syntax-case, because it adds them already. (I left in the literals list argument to calc-syntax-case in case I need to add more keywords to a particular macro.) Now let's say I want to add an exponentiation operation to the calc-lang, which compiles down to the core calc-lang language. Then we would add the new operation with
(define-lang (extended-calc-lang calc-lang) extended-calc-syntax-case (<pow>))
So here's the approximate definition of define-lang:
(define-for-syntax (lang-keywords lang-name)
  ———— recursively collect all keywords from static registry ————)

(define-syntax (define-lang stx)
  (syntax-case stx ()
    [(_ (lang-name parent-name) case (kw ...))
     (with-syntax ([(parent-kw ...) (lang-keywords #'parent-name)])
       #'(begin
           ;; register the static information about the language
           (define-syntax lang-name
             (list #'parent-name
                   (list #'kw ...)))
           ;; define the language-specific syntax-case variant
           (define-syntax case
             (syntax-rules ()
               [(_ stx-exp (extra-kw (... ...))
                   clause1 clause2 (... ...))
                (syntax-case stx-exp
                    (parent-kw ... kw ... extra-kw (... ...))
                  clause1 clause2 (... ...))]))))]
    [(_ lang-name case (kw ...))
     #'(begin
         ;; register the static information about the language
         (define-syntax lang-name
           (list #f (list #'kw ...)))
         ;; define the new language-specific syntax-case variant
         (define-syntax case
           (syntax-rules ()
             [(_ stx-exp (extra-kw (... ...))
                 clause1 clause2 (... ...))
              (syntax-case stx-exp (kw ... extra-kw (... ...))
                clause1 clause2 (... ...))])))]))
Incidentally, since these macros produce a whole lot of keywords, it's important to reserve those keywords to get decent module-level scoping behavior.

Wednesday, April 20, 2005

Saturday, April 16, 2005

Good hackers work with what they've got

[File under squishy musings... -Ed.]

From Gian-Carlo Rota's Ten Lessons I Wish I Had Been Taught:
Richard Feynman was fond of giving the following advice on how to be a genius. You have to keep a dozen of your favorite problems constantly present in your mind, although by and large they will lay in a dormant state. Every time you hear or read a new trick or a new result, test it against each of your twelve problems to see whether it helps. Every once in a while there will be a hit, and people will say: "How did he do it? He must be a genius!"
I was thinking about database abstractions and how often people get in pissing matches over whether it's even possible or desirable to abstract over persistence. In teaching abstraction, we advocate information hiding as a way of building scalable software, but then there's always some detractor who can't wait to snipe at you just for trying: "no, you can't hide that detail, that's simply too prevalent an issue." Sometimes history proves them wrong, sometimes not.

It's a constant source of heartburn to computer scientists that, for example, the space of viable algorithms (say, the ones that run in polynomial time) is one tiny speck in the universe of algorithms we wish we could use. And lots of folks from Dijkstra to AOP advocates have tried vainly to abstract out efficiency as a separate concern. While we're at it, let's imagine some extreme violations of reality -- let's say I want to create a computer program that does my research for me, grows wings, and cooks me dinner in response to picking up on my brain wave patterns that indicate my desire for it to do so, but never mind the details of how it works.

I suppose this means that ultimately someone has to be responsible for the details. Maybe that's the key: you can only set up a reasonable interface to hide an implementation if the implementation can truly deliver on its contract with the client. Which means that deciding on the interface to begin with relies on a balance between hiding enough from the client on the one hand but being plausible for the provider to achieve on the other. Feynman suggests one way to go about solving this: set ridiculously high goals for yourself, rely on being a genius, and be on the constant lookout for possible solutions.

But I was also struck by Joel Webber's description of the technology behind Google Maps. In particular, when I read about their use of PNG alpha-channels to implement drop-shadows for the pushpins that get overlaid on a map, it occurred to me that this is an absolutely unnecessary feature, and yet an incredibly nice touch. Clearly there wasn't some suit at Google demanding, "dammit, drop shadows are a deal breaker. Get me drop shadows!" Rather, I imagine someone thought, hey, if we stick a semi-transparent PNG on top of an image we can get a nice shadow effect. I bet that would make us look cooler than Apple!

Maybe good hackers work from the bottom up, starting with their existing bag of tricks and figuring out what they can solve with them. Or maybe they work like Feynman, always on the lookout for solutions to the Big Problems. Chances are, good hackers work with what they've got, be it a tool set or a problem set.

Friday, April 15, 2005

The balance of expressiveness and reasonability

Yesterday during the question and answer period of a talk, Matthias pointed out that there is usually a balance involved in language design between the expressive power of the language and the ability to do local reasoning on programs in the language. This confirmed and crystallized an intuition I've been mulling for a while. It was always clear to me that an increase in expressivity is not inherently "good" (for example, in his expressivity paper, Matthias proves that set! increases the expressive power of a language, but he certainly wasn't advocating the gratuitous use of mutation). But I hadn't quite put my finger on it until Matthias's comment.

It's in fact a direct result of the idea that powerful language constructs require global transformations: when you use a construct that affects your whole program, then you have to reason globally about the changes it makes. Essentially this means you'll need more powerful reasoning tools to be able to tame the power -- which can be both constructive and destructive -- of the new language feature.

As a simple example, the other day I was writing some pattern-matching code in "substitution-passing style," where the matcher returns either a substitution or #f, the latter indicating a failure to match. But before I knew it, I ended up with loads of checks to see whether intermediate matches had failed. The real kicker was when I started writing helper functions (match-list) that themselves had to propagate these checks. I finally decided that there was no need to reimplement a poor man's exception mechanism when my implementation language already has one, so I changed the interface to have matchers raise a match-error exception when matching fails. But the upshot is that the code no longer locally indicates that such an exception can occur; this property then ends up in comments and in proof obligations on the clients.
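The shape of the problem can be sketched like this (match-pat is a hypothetical stand-in for the single-pattern matcher):

```scheme
;; substitution-passing style: every intermediate result must be
;; checked for #f before continuing, and helpers like match-list
;; have to propagate the checks themselves
(define (match-list pats exps subst)
  (cond
    [(null? pats) subst]
    [else
     (let ([s (match-pat (car pats) (car exps) subst)])
       (and s (match-list (cdr pats) (cdr exps) s)))]))
```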

Type and effect systems like Java's checked exceptions can mitigate these non-local changes by checking and maintaining invariants about the effects. This is precisely an example of the kinds of reasoning tools that help tame the power of expressive language features.

Another example is dynamic dispatch in OO languages. I was having a discussion last year with Shriram about local reasoning and AOP, and he made an analogy to dynamic dispatch. Programmers are completely accustomed to looking at a local piece of code and not knowing its behavior, he argued; in OO languages a method call could refer to any arbitrary reimplementation of the method. But this is par for the course, and in practice we assume that the reimplementations will be well-behaved.

Furthermore, we don't use this power with reckless abandon. I found a relevant quote from an aside made by Paul Wilson in the midst of an unrelated Usenet conversation:
It turns out that at most overtly polymorphic call sites in OO programs, the receiver is always of the same class. And of the minority of truly polymorphic call sites, there's only two or three classes of receiver. That makes it sound like OO doesn't do much work for you. But it does, really, because the 10% of sites where you need it, you need it bad.
Now, I was initially unhappy with these "pragmatic" justifications of language features that destroy local reasoning. They seemed to pit abstraction and local reasoning on one side against usefulness and practicality on the other, and I'm simply not prepared to give up on abstraction. I just don't see how anyone could survive a software project without it. But the underlying tension is this one between expressive power and local reasoning. So good language design requires a sweet spot between features that are just powerful enough to express the kinds of things you want to express while not being too much harder to reason about.1

I believe that the core principles of program design (should) stem from functional programming. But many important design principles require more powerful tools than pure functional programming offers. So more powerful language features are important. On the other hand, any time you introduce effects that can affect the behavior of arbitrary parts of a program, you need to be able to mitigate those changes with reasoning tools, whether they are proof techniques or automated reasoning software.

1As Moore's law allows lightweight formal methods to get closer to what used to be heavyweight formal methods, we should be able to make more powerful reasoning tools for our programs, which should allow us to slide the scale further in favor of more expressive language features. Which is all to say that I should really check out Shriram and Kathi's work on formal methods for AOP.

Monday, April 11, 2005

Reader macros

PLT has reader macros!

When I get around to it, I'd like to create a read syntax where you can annotate procedure definitions with documentation, and a Javadoc-like tool can easily extract the information. Semantically meaningful comments make my teeth itch, and in a language where Code Is Data™, code documentation ought to be something that can be extracted without having to write a custom parser.

Monday, April 04, 2005


When I took COMP210 from Matthias as a freshman at Rice, I didn't understand why he even bothered pointing out the notion of an accumulator, or why it had such a fancy name. If you need to give more information to another procedure you give it an extra argument -- duh. Why all the commotion?

There are many reasons why accumulators are so fundamental. First of all, accumulator-passing style--a "style" is what the FP cool kids call a design pattern--is a functional way to simulate mutation. It's used in semantics and it's used in functional programs to preserve referential transparency. The one-sentence justification for functional update is that it renders the dependencies of the function explicit, which makes the function's invariants clearer, and generally has the effect that you can't cheat and mistakenly call the procedure in an invalid state.

Once you become comfortable with accumulator-passing style, you start being able to write iterative loops that are more general and natural than for or while loops, and avoid nasty dependencies on statement order that arise from imperative update. This is one of those things that just drive you crazy after a while in C. What's really nice about the "named let" construct in Scheme is that you can easily write tail-recursive procedures that can be compiled to loops, which you can essentially think of as "while with arguments."
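For instance, the classic accumulator-style list reversal is just a named let loop:

```scheme
;; reverse a list iteratively: the named let is a
;; "while with arguments", and acc is the accumulator
(define (rev lst)
  (let loop ([lst lst] [acc '()])
    (if (null? lst)
        acc
        (loop (cdr lst) (cons (car lst) acc)))))

(rev '(1 2 3))  ; => (3 2 1)
```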

The next step, once you've mastered accumulator-passing style and named let, is to understand fold. I know, it makes Guido's head hurt. But never mind the haters. Accumulators are a general pattern of building up a result by successive approximations. And a subset of this pattern is when the construction of the result is driven by the deconstruction of an input, particularly a list. It happens that a whole bunch of the time, we build up some result by adding to it the result of computing something for every item in a list. For example, imagine a Scheme interpreter that adds a bunch of variable bindings to an environment: it starts with an initial environment, and one at a time it goes through the list of new bindings, each time generating a new environment with the added binding.
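That interpreter example is exactly a fold; here's a sketch, where extend-env is a hypothetical helper that adds a single binding to an environment:

```scheme
;; fold the list of bindings into the environment, producing a
;; new environment at each step
(define (extend-env* env bindings)
  (foldl (lambda (binding env)
           (extend-env env (car binding) (cdr binding)))
         env
         bindings))
```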

And here's the kicker--just for fun, once you've become a master of functional folds, you are primed to understand fixed-point constructions. The very idea of successive approximations to a result is the key to a fixed-point algorithm; the difference is what drives the recursion. In a fold, it's the simple deconstruction of a list that determines when to stop, whereas in a fixed-point recursion, it's when the approximations stop changing the result that the algorithm terminates.
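A minimal sketch of the difference: the recursion is driven not by deconstructing an input but by comparing successive approximations (this assumes f eventually reaches a fixed point):

```scheme
;; iterate f from an initial approximation until the result
;; stops changing
(define (fixed-point f x)
  (let ([x* (f x)])
    (if (equal? x* x)
        x
        (fixed-point f x*))))
```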

Finally, once you can write fixed-point algorithms while standing on your head and whistling Jerusalem, you're ready to understand the Y combinator. Remember how when you first learned to write recursive functions, you had to teach yourself not to try to imagine the entire nested computation, but just to trust the recursive call to do its job? Well, you'll never be able to imagine an infinite fixed-point construction, so here this mental laziness is crucial.
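For reference, here is the call-by-value Y combinator (often called Z) and a factorial built with it:

```scheme
;; the call-by-value Y combinator: the eta-expansion
;; (lambda (v) ((x x) v)) delays the infinite unfolding
(define Y
  (lambda (f)
    ((lambda (x) (f (lambda (v) ((x x) v))))
     (lambda (x) (f (lambda (v) ((x x) v)))))))

(define fact
  (Y (lambda (fact)
       (lambda (n)
         (if (zero? n) 1 (* n (fact (- n 1))))))))

(fact 5)  ; => 120
```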

When you understand what successive approximation has to do with all of these concepts, you will have achieved enlightenment.

Sunday, April 03, 2005

What contexts represent

One of my pet peeves is that people always pay lip service to explaining what contexts are, and they always end up saying the same thing, which means absolutely nothing to people who don't already know what contexts are. They always say, "a context is an expression with a hole." This is basically a useless introduction. If you expect your audience to know what a context is, don't waste their time re-explaining it. If you don't expect them to know, then for God's sake, give them a real explanation.

The notion of a program context can be used to describe a number of ideas. So you could approach it from a number of angles.
  1. Compatible closure - the β-reduction rule isn't useful until you extend it to operate within the context of a whole program.
  2. Observational equivalence - even non-PL programmers are familiar with the idea of "substituting equals for equals" -- Liskov made the substitution principle famous to OO programmers, for example. So you can talk about being able to place two different expressions into the same context and observing the results.
  3. Quasiquotation - a quasiquoted code template constructs a term with holes in it where you can splice in code.
  4. Macros - similarly, the template in a syntax-rules macro, for example, is an expression context with named holes for all the pattern variables.
  5. Evaluation order - this goes along with the idea of compatible closure; since understanding how β-reduction applies within the context of a whole program requires knowing how to find the next expression to reduce, the grammar of evaluation contexts is a specification for the algorithm that finds the redex.
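The quasiquotation angle is the easiest to demonstrate (a sketch; the generated code here is plain list data, not a hygienic syntax object):

```scheme
;; the quasiquoted template is a context; unquote marks its hole
(define (make-adder-code n)
  `(lambda (x) (+ x ,n)))

(make-adder-code 5)  ; => (lambda (x) (+ x 5))
```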
In On the Expressive Power of Programming Languages, Felleisen gives a very general definition of contexts (Def. 3.4), where you can have multiple holes with different names. This demonstrates the connection between contexts and macro-expressibility: a macro template is a context with several distinguished holes.

Although I found the phrase "an expression with a hole" opaque as a first year grad student, I do like the metaphor of a hole. But it's important to motivate it with an application. For PL students learning about semantics, it's probably useful to take the compatible closure angle. Otherwise, observational equivalence is probably the most accessible approach.

Saturday, April 02, 2005

Three ways to label expressions

Static analyses like CFA compute results for each subexpression of a program. Consequently the result of such an analysis is a map from individual subexpressions to answers. But consider the following program:
((lambda (x) x) (lambda (x) x))
Since a program may contain identical subexpressions, there needs to be some way to distinguish seemingly identical terms. But the program already contains enough information to distinguish all subexpressions: their distinction is precisely the program context in which they occur. So one way to label subexpressions is to make the result of the analysis a map from program contexts to results.
But comparison of two expressions by recursively comparing their entire program context is inefficient. We can optimize this by representing each position in the program as a unique label, and then implement these labels with a datatype that has a fast comparison operation (such as integers or symbols).
But there's another possible representation hack. Sometimes we happen to be working with a language whose expressions themselves contain information that distinguishes them from all other expressions. A program in A-normal form binds every subexpression to a local variable. If we make sure that each such variable has a unique name*, then our analysis can simply map the variable of each let-expression to a result.
*As they point out on p.142 of Principles of Program Analysis, there's not actually a requirement that the labels be unique, but an analysis that identifies two distinct expressions will potentially be less precise.

Garbage collecting blog posts

Interesting! I just fixed a typo in the title of a blog entry, and Blogger demonstrated a property of the semantics of blog archives.

All blog entries are archived permanently, and each post gets its own dedicated archive page. The file name of the page is derived from the words in the title of the post (and placed in a subdirectory based on the date, to lessen the likelihood of name clashes -- I'm sure they have conflict resolution rules, but this makes it less of an issue).

But Blogger also lets you go back and edit old posts, even changing their titles. This means that the archive file name can change. So what happens with the old archive file? My first guess was that it would disappear, but it didn't! Instead, the new archive file was created with the new name, and the old archive file remained as it was, with the contents of the original version of the post.

The reason why this is (at least almost) the right behavior is because links might exist to the old site. In fact, you can't ever garbage collect the old post because, even if no links existed originally, some jerk might link to the old post in the future, like me:
If my understanding is right, both these links should remain live.

The one part of this behavior that is arguable in my mind is what the contents of the old version should be. It could stay the same as it was, which is what Blogger does, it could get replaced with the new version, or it could contain some kind of message indicating that the page has been superseded by a new one, with a link or automatic redirect. As it is, there's no indication on the old page that there's another, newer version.

Dynamic dispatch

The beginning of the chapter on control flow analysis in Principles of Program Analysis describes dynamic dispatch as being when variables can denote procedures.

To clarify, we could say that dynamic dispatch is whenever the actual procedure used in an invocation could at runtime originate from more than one procedure definition in the program. In a first order system, the name uniquely identifies the piece of code that will be used for the procedure. In a higher-order language, the operator can be an arbitrary expression, so its procedure body could come from any number of places in the program. Somewhere in between these two poles are OO languages such as Java where there is a fixed but non-trivial number of implementations of the procedure named in the invocation.

Of course, static analysis such as CFA might collapse certain cases to determine when there is only a single possible source location where the procedure can originate from, but we could perhaps say that dynamic dispatch is when you require a whole program analysis to determine which procedure will be used, if you can determine it at all.
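In Scheme terms, the higher-order end of the spectrum looks like this: the operator position is an arbitrary expression, so the procedure body invoked at a call site can come from anywhere in the program:

```scheme
;; the call site (f (f x)) is truly polymorphic: which body runs
;; depends entirely on what flows into f at runtime
(define (apply-twice f x)
  (f (f x)))

(apply-twice add1 3)                  ; => 5
(apply-twice (lambda (n) (* n n)) 3)  ; => 81
```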

Friday, April 01, 2005

Resurrecting Java objects

The semantics of finalization in Java dictates that a finalizer can be run at most once. So if an object revives itself from within its finalizer, then the next time the garbage collector tries to collect it, the finalizer isn't run. Here's a class that refuses to die:
import java.util.Hashtable;

public class Immortal {

    /** tracks the number of times each finalizer is run */
    private static Hashtable<Integer,Integer> finalizeCounts =
        new Hashtable<Integer,Integer>();

    /** used to control object lifetimes */
    private static Hashtable<Integer,Immortal> pointers =
        new Hashtable<Integer,Immortal>();

    /** used to generate unique identifier codes */
    private static int unique = 0;

    /** deletes the object with the given id */
    public static void kill(int id) {
        int finalizeCount = finalizeCounts.get(id);
        // drop our reference so the object can be collected,
        // then wait for the finalizer to run
        pointers.remove(id);
        while (finalizeCounts.get(id) == finalizeCount) {
            System.out.println("trying to kill " + id + "...");
            System.gc();
        }
    }

    // The code from these two methods can't be inlined, because
    // we rely on their stack frame disappearing to prevent the
    // link to the tracked object from persisting.

    public static int makeTemporaryObject() {
        Immortal temp = new Immortal();
        return temp.id;
    }

    public static void doSomethingWith(int id) {
        Immortal temp = pointers.get(id);
        temp.sayHello();
    }

    /** identifier code */
    private int id;

    private Immortal() {
        id = unique++;
        System.out.println("creating instance " + id);
        finalizeCounts.put(id, 0);
        pointers.put(id, this);
    }

    public void sayHello() {
        System.out.println("hi, I'm number " + id);
    }

    public void finalize() {
        System.out.println("finalizing " + id + "...");
        finalizeCounts.put(id, finalizeCounts.get(id) + 1);
        // clear! *khzh-thump*
        pointers.put(id, this);
    }
}
And here's an example interaction with the class:
int id = Immortal.makeTemporaryObject();

// This causes the finalizer to run (in the GC thread.)
Immortal.kill(id);

// And yet, the object is still alive!
Immortal.doSomethingWith(id);

// This will now loop infinitely, since the finalizer
// will never be run a second time.
Immortal.kill(id);
The interaction loops infinitely, with the beginning of the output looking something like:
creating instance 0
trying to kill 0...
trying to kill 0...
finalizing 0...
trying to kill 0...
hi, I'm number 0
trying to kill 0...
trying to kill 0...
trying to kill 0...
trying to kill 0...
trying to kill 0...
trying to kill 0...
Notice that the output is interleaved since the GC (and hence the finalizer) runs in a separate thread from the main code.

Update: How embarrassing, I misspelled a word in the title of this post. I hope fixing it doesn't cause problems with the archived file name. Let's see if Blogger can handle it...