Monday, September 19, 2005

How do we automate?

Read this excellent and reasonable post from Jacob Matthews on the undergraduate curriculum in computer science.

Let me call attention to Jacob's characterization of CS: "computer science is fundamentally the study of how to best systematically solve problems." I think it's interesting to compare this to Knuth's definition of computer science (pp. 5-6) as the answer to Forsythe's question "what can be automated?" (With some quick googling I found Raatikainen's slight generalization of this formulation to the broader question "how do we automate?")

It's useful to place Jacob's definition in the context of our research culture. You could describe us as belonging to the academic diaspora of Matthias Felleisen, whose profound textbook How to Design Programs and its corresponding pedagogy permeate our approach to computer science. (The book's title is an homage to Pólya's How to Solve It, a landmark work in the history of mathematical pedagogy, which sought to lift math education out of the realm of black art by giving students a systematic process for solving problems rigorously.) The HtDP approach to teaching computer science is to give students a repeatable process for solving problems. This is systematic in two senses: students get a dependable process that they can reliably apply to new problems, and expressing a solution in terms that a computer can execute and evaluate is more formal and reliable than a rote hand-calculation.

Nevertheless, I like Knuth's and Raatikainen's definitions. Computer science is not always about problems--both in research and in industry, I think there's a place for exploration whose applications may not be discovered until later. The real common thread seems to be the ability to describe any process formally, so that it can be specified without unnecessary implementation detail and yet repeated reliably.

Wednesday, September 14, 2005

The language in my head

Felix points out how bizarre the hypothetical language I used in my previous post was. Very true! It combines JavaScript object-literal notation, ML expression syntax, semantic brackets to represent compilation, Scheme's hygienic expansion, Java (et al.) method invocation, and hey, let's give credit to the new guy: Fortress-style mathematical notation for set union.

Well, I know what I like!

Tuesday, September 13, 2005

Compiling a language with multiple operations

I've now twice compiled languages with more than just an eval operation on expressions. For example, I wrote an LL(1) parser generator that generated, for each expression in the grammar, an object with a parse method (which you can think of as eval for grammars) as well as first, follow, and nullable? methods. I've also run into the same issue while working on the Topsl compiler.
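In rough TypeScript terms, the generated objects implemented an interface along these lines. The method names come from what I described above; the types and the ParseResult shape are just guesses for illustration (nullable? becomes nullable, since TypeScript identifiers can't end in a question mark):

type Token = string;
type ParseResult = { tree: unknown; rest: Token[] };

interface GrammarExpr {
  parse(tokens: Token[]): ParseResult;  // "eval for grammars": consume a prefix of the input
  first(): Set<Token>;                  // tokens that can begin this expression
  follow(): Set<Token>;                 // tokens that can follow it
  nullable(): boolean;                  // can it derive the empty string?
}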

But when you generate objects for expressions and you need to model binding in your language, you find an interesting little problem. Imagine compiling a simple functional language into a bunch of expression objects that have two methods: eval and analyze (for some hypothetical static analysis you might like to perform). Now how would you implement the rule for compiling let-expressions?
[[let x = e1 in e2]] =
  { analyze : λ().[[e1]].analyze() ∪ (let x = () in [[e2]].analyze()),
    eval    : λ().let x = [[e1]].eval() in [[e2]].eval() }
By binding the variable x in both methods, it can be referred to in the compiled code for e2. In the static analysis, we can bind it to any dummy value, since the analysis presumably doesn't care about the dynamic result of evaluating the expression.
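To make the duplication concrete, here's roughly what that expansion could look like with TypeScript as the target language. Everything here is illustrative: the Expr interface, an analysis that just collects variable references, and the sample program let x = 40 + 2 in x.

interface Expr {
  eval(): number;
  analyze(): Set<string>;  // hypothetical analysis: collect variable references
}

// Naive expansion: the compiled objects for e1 (`40 + 2`) and e2 (`x`)
// are spliced in textually twice, once per method.
const compiled: Expr = {
  analyze: () => {
    const a1 = { eval: () => 40 + 2, analyze: () => new Set<string>() }.analyze(); // [[e1]], copy 1
    const x = 0;  // dummy value; the analysis never reads it
    const a2 = { eval: () => x, analyze: () => new Set(["x"]) }.analyze();         // [[e2]], copy 1
    return new Set([...a1, ...a2]);
  },
  eval: () => {
    const x = { eval: () => 40 + 2, analyze: () => new Set<string>() }.eval();     // [[e1]], copy 2
    return { eval: () => x, analyze: () => new Set(["x"]) }.eval();                // [[e2]], copy 2
  },
};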

But this duplicates the expansion of e1 and e2, which is not just code bloat, it's exponential code bloat: every nested let doubles the size of the generated code, so n nested lets expand to 2^n copies of the innermost expressions. A non-starter.

We could instead lift the generated let out to surround the entire object creation, initialize x to a dummy value, and then mutate it within the body of eval. But this introduces brittle dependencies on the order in which the methods may be invoked.
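In the same made-up TypeScript setting, that variant would look something like this:

// x is lifted out and assigned inside eval, so [[e1]] and [[e2]] each
// appear only once--but the two methods now share hidden mutable state.
function compiledLetMutating(): Expr {
  let x = 0;  // dummy initial value, clobbered by eval
  const e1: Expr = { eval: () => 40 + 2, analyze: () => new Set<string>() };
  const e2: Expr = { eval: () => x, analyze: () => new Set(["x"]) };
  return {
    analyze: () => new Set([...e1.analyze(), ...e2.analyze()]),
    eval: () => { x = e1.eval(); return e2.eval(); },
  };
}

It happens to work here only because this particular analysis never reads x; any method that did read it would see whatever the previous invocation left behind, and a value escaping from eval as a closure over x would see later mutations.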

Instead, we can expand each subexpression exactly once: bind the compiled e1 to a variable, and compile e2 to a single function from an x-value to an expression object. Each method then invokes that function with its own value for x:
[[let x = e1 in e2]] =
  let g = [[e1]] in
  let f = λx.[[e2]] in
  { analyze : λ().g.analyze() ∪ (f ()).analyze(),
    eval    : λ().(f (g.eval())).eval() }
Now there are no fragile method-invocation interdependencies, each method can supply a different value for x, and each subexpression is expanded only once.
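Rendered in the same made-up TypeScript setting:

// f plays the role of the generated λx.[[e2]]: a single compiled copy
// of e2 that each method instantiates with its own x.
function compiledLetLifted(): Expr {
  const g: Expr = { eval: () => 40 + 2, analyze: () => new Set<string>() };  // [[e1]], once
  const f = (x: number): Expr =>
    ({ eval: () => x, analyze: () => new Set(["x"]) });                      // [[e2]], once
  return {
    analyze: () => new Set([...g.analyze(), ...f(0).analyze()]),  // dummy value for x
    eval: () => f(g.eval()).eval(),
  };
}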

Update: I didn't make it clear that the semantic-brackets function [[--]] represents Scheme-style macro expansion. So I assume all hygiene issues are handled automatically by the expansion process.