Tuesday, March 28, 2006
How do you say "maybe"?
- Haskell, ML: Maybe T, T option
- Scheme: either T or #f
- Java, C#: either an ordinary T or null
- ML, Scheme, Java, C#: either a value or an exception
- Prolog: success or failure
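As a tiny sketch of the Scheme convention (a hypothetical lookup procedure, not code from any of these languages' libraries), here's a function that returns either the value it found or #f, where Haskell would return Just v or Nothing and ML would return SOME v or NONE:

;; Return the value associated with key, or #f for "nothing found".
(define (lookup key table)
  (cond ((assq key table) => cdr)
        (else #f)))

(lookup 'b '((a . 1) (b . 2)))   ; => 2
(lookup 'z '((a . 1) (b . 2)))   ; => #f

The cost of this convention, of course, is that #f itself can't be stored in the table without becoming ambiguous, which is exactly the ambiguity that Maybe T and T option avoid.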
Monday, March 27, 2006
C rots the brain
It's high time for a good rant. I'll warm up by quoting Cassandra:
I'm sorry to be politically incorrect, but for the ACM to then laud "C" and its inventors as a major advance in computer science has to rank right up there with Chamberlain's appeasement of Hitler.
One of our nastiest inheritances from C is a set of ingrained assumptions about the implementation of the control of any programming language.
Consider the pure lambda calculus. Application of a β-redex completely replaces the application node with the body of the function being applied. That is, there is never any space consumed in the representation of control for a procedure application. Rather, it is when one of these applications occurs in the context of some other computation that the program's control consumes space. To quote Guy Steele:
Intuitively, function calls do not "push control stack"; instead, it is argument evaluation which pushes control stack.
A continuation frame represents a kind of to-do list of what the program has left to do. It only needs to save this information in memory somewhere if it's going to do something else first, i.e., call another procedure.
The pernicious legacy of C is that we assume entry to a procedure body must always create a stack frame. But what is this frame for? It keeps a finger on our current point in the program, i.e., the program counter, and perhaps it stores some local data that we'll need in order to complete the evaluation of the function. If the procedure happens to be a leaf procedure, i.e., does not make any function calls, there's no need to save the current position, since we're not going anywhere else. Or maybe the procedure only conditionally calls another procedure. In that case, we might not need a continuation frame until we jump to the particular branch that is guaranteed to make the procedure call.
The assumption that procedure calls must push stack is so built into my head that I constantly have to remind myself: procedure entry and continuation frame allocation are not necessarily correlated!
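Here's a small Scheme sketch of the distinction (a made-up example, not anything from Steele): in the first version of length, space is consumed not by the recursive call itself but by the pending (+ 1 []) context waiting on its result; in the second, the call is the last thing left to do, so no continuation frame is needed at all.

;; The recursive call happens inside a pending (+ 1 []) computation, so
;; the control grows in proportion to the length of the list.
(define (len lst)
  (if (null? lst)
      0
      (+ 1 (len (cdr lst)))))

;; The recursive call is a tail call: nothing remains to be done after it,
;; so entering the procedure allocates no new continuation frame.
(define (len/acc lst acc)
  (if (null? lst)
      acc
      (len/acc (cdr lst) (+ acc 1))))

(len '(a b c))        ; => 3, with three pending additions at the deepest point
(len/acc '(a b c) 0)  ; => 3, with no pending work at any point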
But now the true folly is in the representation of the continuation as a simple stack. Gone are the days when we were told "if you want a string, just allocate a fixed buffer that's really, really big and hope you never need any more than that." (Well, gone should be the days.) Why then do we only get a fixed buffer size for the single most important part of every single program, the program's control itself? It's absurd to have your program tell you, "other than, like, those couple of gigabytes sitting over there, I'm totally out of space."
Why do these silly limitations persist? Because C rots the brain.
Update: Sam doesn't think it's fair for me to blame C. I guess C has been the drug that led to my personal brain damage, but it's perfectly possible for programmers of older generations to have been addled by Algol or Fortran, or newer generations by Java or C#. There's plenty of blame to go around...
Sunday, March 19, 2006
Why the CPS simulation of CBV works
Incidentally, it's exactly the common element of call-by-name and call-by-value I mentioned a moment ago that facilitates the CPS simulation of call-by-value in call-by-name. Both reduction rules require that the operator of a redex be a value. In either calculus, reduction relies on instantiating a bound variable within the body of a function; there's no obvious way you could know how to reduce if you don't actually know what the operator is. So both systems force evaluation of an operator in order to evaluate an application. The difference is whether you also force the operand.
The CPS transform just makes it so that all evaluation within an entire program occurs exclusively in operator positions. After the translation, all operands are values and therefore irreducible. This property is also preserved under reduction, so at every step, a CPS'ed program behaves the same in either calculus.
In other words, if every operand is always a value, there's no difference between the β-rule and the βv rule.
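Here's a rough hand-CPS'd sketch of what that means in Scheme (made-up procedure names; primitive arithmetic is left in direct style, as is common): once every procedure takes its continuation as an extra argument, every operand of every user-level application is a value, so a call-by-name evaluator and a call-by-value evaluator take exactly the same steps.

;; Direct style would be (f (g 3)), whose operand (g 3) is a non-value.
;; In CPS, argument evaluation happens in operator position instead.
(define (g-cps x k) (k (* x x)))   ; g(x) = x * x, taking a continuation k
(define (f-cps v k) (k (+ v 1)))   ; f(v) = v + 1, taking a continuation k

(define (main k)
  (g-cps 3 (lambda (v)       ; every operand here is a value:
             (f-cps v k))))  ; a constant, a variable, or a lambda

(main (lambda (answer) answer))    ; => 10 under either reduction strategy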
Observable differences between reduction strategies
It's easy to misunderstand the difference between lazy and eager evaluation. You might be tempted to think that 1) lazy evaluation does reductions from the "outside in," meaning that it finds the outermost redex and reduces it before reducing its subterms, and 2) eager evaluation, by contrast, reduces from the "inside out," not performing an outer reduction until it has reduced all its subterms.
But this isn't the case. The difference between lazy and eager evaluation is not a difference in evaluation strategy, but rather a difference in the definition of what is reducible. In ordinary evaluation (as opposed to, say, compiler optimizations using partial evaluation), we always use the exact same strategy when reducing terms--typically, always reduce the leftmost, outermost redex. The difference is in the definition of redexes.
In call-by-name lambda calculus, the definition of a reducible expression is an application whose operator is a value. The operand can be any arbitrary expression. As a result, non-values end up being passed as arguments to functions. In call-by-value, we likewise only perform reduction when the operator is a value, but we also require that the operand be a value as well.
The combination of an outermost reduction strategy with the call-by-value rule may appear to be the same as an innermost reduction strategy; after all, it forces the evaluation of subterms before reducing a compound term. But it doesn't force the evaluation of all subterms, specifically, underneath abstractions. This is the key distinction.
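A one-line illustration of my own, in the same notation as the traces below: λx.((λy.y) x) contains a redex, but under an outermost strategy, call-by-value and call-by-name alike treat the whole abstraction as a value and never touch it; only an innermost strategy reduces under the λ.

λx.((λy.y) x)
-> λx.x    [innermost strategy only; outermost call-by-value and call-by-name stop immediately]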
Here's an example that distinguishes the three semantics. Imagine an untyped lambda calculus with two effects: we can ask Alice a question, say by sending her computer a message over the network and receiving a response, or we can similarly ask Bob a question. Each of these operations results in a distinct observable effect. Now take the following simple program:
(λx.(ask bob)) (ask alice)
We'll annotate each reduction step with a note recording the side effect it produces, if any, so we can see when the effects happen.
In call-by-name lambda calculus with a leftmost-outermost reduction strategy:
(λx.(ask bob)) (ask alice)
-> (ask bob)
b! -> 22
In call-by-value lambda calculus with leftmost-outermost:
(λx.(ask bob)) (ask alice)
a! -> (λx.(ask bob)) 11
-> (ask bob)
b! -> 22
And finally, the weirdest of them all, an innermost reduction strategy:
(λx.(ask bob)) (ask alice)
b! -> (λx.22) (ask alice)
a! -> (λx.22) 44
-> 22
Intuitively, this program should always ask Bob a question and return his answer. Each of these evaluations preserves that meaning. But in outermost call-by-value it asks Alice a question first and then ignores her answer, in an innermost strategy it asks Alice a question second and then ignores her answer, and in call-by-name it never asks Alice at all.
I added a couple of simple effects to the lambda calculus to demonstrate the idea. But even in the pure lambda calculus we can observe differences between these systems, because non-termination acts as a kind of side effect: an expression that never terminates produces observably different behavior depending on when, and whether, the system decides to evaluate it.
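The classic example of this (not one of the traces above) uses Ω = (λx.x x) (λx.x x), a term that reduces only to itself:

(λx.λy.y) ((λx.x x) (λx.x x))
-> λy.y    [call-by-name: the operand is discarded without ever being evaluated]

(λx.λy.y) ((λx.x x) (λx.x x))
-> (λx.λy.y) ((λx.x x) (λx.x x))
-> ...     [call-by-value: the operand must be reduced to a value first, so the program never terminates]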
Tuesday, March 14, 2006
Proper tail recursion and space efficiency
I just read through Will Clinger's Proper Tail Recursion and Space Efficiency. I'd been scared of it for a few years because I had the impression it was a very technical and subtle paper. There are perhaps a few subtleties but I actually found it very clear and well-written. Here are some interesting points made in the paper:
1. Proper tail recursion is a property of the asymptotic space complexity of a language's runtime behavior. That is, in improperly tail recursive languages, control can consume unbounded amounts of space for programs that, when run in properly tail recursive languages, only require a constant amount of space.
2. In order to measure the space complexity of a language, you have to measure the entire system, not just, say, the stack. Otherwise, someone could always cheat and hide away its representation of the stack in the heap to make it look like the control is more space-efficient than it really is.
3. But when you measure everything including the heap, you have to deal with the fact that the language doesn't ever explicitly deallocate storage. This means that you have to add garbage collection into your model or else the model always increases its space usage unrealistically.
4. Proper tail recursion is totally natural in the lambda calculus. An application is always replaced by the body of the applied function, regardless of whether it's a tail call. It's just languages with that freaky return construct that accidentally make the mistake of leaving around residue of the application expression until it's completed evaluating. In order to model improper tail calls in Scheme, Will actually has to add a silly return context frame, whose sole purpose is to degrade the space efficiency of the language.
5. By leaving the expression language the same but changing the definition of the runtime system (in this case, the set of context frames), you can compare apples to apples and look at the same space consumption of the exact same program in one system or another, assuming the systems always produce the same results from the programs. Then we can actually construct space complexity classes to classify the different systems.
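To make point 1 concrete, here's a minimal Scheme sketch (my example, not Clinger's): under proper tail calls this mutual recursion runs in constant control space for any n, while an implementation that pushes a frame on every call needs space proportional to n. The same program therefore lands in different space complexity classes on the two systems.

;; Under proper tail recursion, each of these tail calls replaces the
;; current continuation instead of growing it.
(define (even-n? n) (if (zero? n) #t (odd-n? (- n 1))))
(define (odd-n? n) (if (zero? n) #f (even-n? (- n 1))))

(even-n? 1000000)   ; => #t, without accumulating a million frames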
Friday, March 10, 2006
The Rule of Least Power
Via Phil Wadler's blog, I see that Tim Berners-Lee has written a "W3C Finding" on a principle he terms the rule of least power. The idea is that you should use the least powerful language possible to solve any given problem, in order to achieve the greatest ability to reason about the program statically.
I've talked about this principle before as the balance of expressive power with reasoning power.
In working on the ECMAScript design, I've already seen this issue come up several times, for example with the problems caused by dynamic scoping.
Mom was right...
...hygiene really is important!
It took me hours the other day to catch a bug that came from an unintended variable capture in a macro I'd written.
In a reduction semantics I'm implementing with PLT Redex, I wrote a macro to abstract over certain common patterns in my reduction rules. The macro template introduced a temporary name via a call to the name macro in PLT Redex. I thought name was implemented with a Scheme binding and would therefore be hygienically renamed. This was my mistaken assumption; name is implemented with a quoted symbol.
Then later in a reduction rule, I was using my macro and by chance introducing another variable at the call site with the same name as the temporary variable in the macro definition. This silently caused a capture of the quoted name, with unexpected consequences for the meaning of the reduction rule. In this particular case, it caused the reduction rule not to match when it should.
It took hours to discover that it wasn't a bug in my reduction rule at all, but in the implementation of my macro. Imagine trying to debug this if I hadn't just spent years studying macros and hygiene!
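Here's a minimal sketch of the underlying difference in plain Scheme (hypothetical macro names, not my actual Redex code): an identifier introduced by a macro template gets renamed by the hygienic expander, but a quoted symbol is ordinary data with no binding identity, so nothing stops it from colliding with the same symbol written at a use site.

;; Hygienic: the template's t is renamed, so it cannot capture the
;; caller's t; this evaluates to the symbol caller.
(define-syntax with-template-binding
  (syntax-rules ()
    ((_ body) (let ((t 'macro)) body))))

(let ((t 'caller))
  (with-template-binding t))    ; => caller

;; Unhygienic by nature: a quoted symbol produced by the template is the
;; very same datum as one written by the caller, so name-based matching
;; (as in a Redex pattern) can capture it by accident.
(define-syntax template-symbol
  (syntax-rules ()
    ((_) 't)))

(eq? (template-symbol) 't)      ; => #t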
Monday, March 06, 2006
Redex is cool
I hadn't looked at PLT Redex since its pre-publication days, but yesterday I started using it to implement my draft model for Topsl, the programming language for implementing online surveys. I couldn't believe how easy it was! I expected it to take quite a while, but within an hour I had a working operational model of core Scheme, and half an hour later I had the entire grammar of Topsl and even a few nice abstractions. I haven't quite finished the implementation yet, but it's been surprisingly painless.
Of course, it helps to have a good understanding of macrological* pattern-matching and template instantiation, capture-avoiding substitution, and Felleisen-style context-sensitive reduction semantics, so you might say I'm uniquely suited to use the tool. Regardless, I'm very impressed.
* This is a very silly adjective. Please don't ever use it. Someone could get hurt.
Wednesday, March 01, 2006
The call stack is not a history
I had a fascinating conversation with John Clements yesterday about continuation marks, and he gave me a nice way of clarifying a common misconception about the call stack. There are two major pieces of information you might like to inspect about the control flow of a program: what the program has already done (its history), and what it has left to do (its future).
The call stack might appear to be giving you information about the history of the program execution, but in fact, it's an incomplete history. For example, any functions that have already returned are not recorded; they've already popped off the stack. Languages that implement proper tail calls also discard information about calls that have finished all of their own work but are still waiting on a tail call to return. Now, if this is information you need, it doesn't mean you can't record it through some other means. It just means the default behavior of the runtime is not to record it--and for good reason, too. Not accumulating that information affects the asymptotic space complexity of the runtime. You wouldn't want your runtime collecting complete information about every iteration of a for-loop by default, either, just on the off-chance that information might be useful!
The evaluation context, i.e. call stack, is not about the history of the program execution. It contains a complete record of the remainder of the program execution, the work the program has left to do. That's the whole reason for maintaining stack frames, so that when a function returns, the runtime knows what to do next. So stack inspection is fundamentally about inspecting the future of the computation, not about recording its history.
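A small Scheme sketch of the distinction (my example): by the time inner runs below, the call to middle has finished all of its own work except waiting on its tail call, so a properly tail-calling runtime keeps no frame for it. What the context records is exactly the pending work, the (+ 1 []) frame in outer: the future of the computation, not its history.

;; middle makes a tail call, so no frame for it survives; the only frame
;; left when inner runs is the pending addition in outer.
(define (inner x) (error "boom"))      ; imagine inspecting the context here
(define (middle x) (inner (+ x 1)))    ; tail call: leaves no trace of middle
(define (outer x) (+ 1 (middle x)))    ; non-tail call: (+ 1 []) is pending

;; (outer 5) raises the error with only outer's pending frame in the context.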