Sunday, March 23, 2008

Makefile for HUnit tests

Here's a nice idiom for running HUnit tests from a Makefile. Jesse Tov gave me the idea that you can use make's dependencies to test only modules that have changed since the last time you ran the tests. Pick a standard name for each module's tests--tests, say--and make sure it's defined in the first column (since we'll be searching for it with a simple grep). Every time we test a module, we'll touch a dummy file of the same name in some hidden directory--call it .test. So we have two lists of files, the modules of the application and the test dummy files:
SOURCES := $(wildcard *.hs) $(wildcard *.lhs)
TESTS := $(foreach src,$(SOURCES),.test/$(src))
To run the tests, first we collect the list of modified modules and save the names of their tests (Foo.tests, Bar.tests, etc.) in a temporary file .test/args.
.test/%hs: %hs
@if grep '^tests ' $< > /dev/null 2>&1 ; then \
touch $@ ; \
echo $* | sed -e 's/\..*/.tests/' >> .test/args ; \
fi
Now every time we run the tests, we first make a fresh .test/args file, and then we run GHC with a command-line option to evaluate an expression that runs those tests:
test: teststart $(TESTS)
@echo ghc Test -e "\"Test.run [ `xargs -a .test/args | tr ' ' ','` ]\""
@ghc Test -e "Test.run [ `xargs -a .test/args | tr ' ' ','` ]"

teststart:
@mkdir -p .test
@$(RM) -f .test/args
@touch .test/args
It's also useful to have a target that always runs all the tests:
retest: testclean test

testclean:
$(RM) -rf .test
This assumes the existence of a simple test harness module:
module Test (run) where
import Test.HUnit
run :: [Test] -> IO ()
run ts = (runTestTT $ TestList ts) >> (return ())

Some tools just don't like each other

brit·tle, adj.
  1. Multi-line string literals in Haskell require so-called "string gaps": one '\' character to terminate a line and another '\' to start the next line.
  2. With GHC, Haskell programs may be preprocessed with CPP, which coincidentally strips the "\ ... \" characters from the source, resulting in an illegal Haskell string literal.
  3. Mercifully, it also happens that CPP doesn't strip the characters if the first '\' character is followed by a space before the newline.
  4. But of course, a commonly used feature of emacs is to silently strip trailing whitespace at the end of lines on every save.
  5. Not that you can see the difference, given the well-known human limitations at visually distinguishing whitespace.
(Sensitive types should take this neither as a criticism of GHC nor as implicit condonement of using CPP for Haskell programs--I know it's shunned. You've gotta admit this is beautiful, though.)

Friday, March 21, 2008

When to use point-free style

I've often struggled with the question of when/whether to use point-free style. It can be dangerously addictive, especially in languages with syntactic support for it like Haskell. But it's notorious for creating dense and impenetrable code. Yesterday I linked to advice from the GHC community on when not to create needless abstractions, advice which could be applied when considering a point-free abstraction (man, does it ever take self-restraint not to go wild with the puns here).

The reason why pointful style can be so helpful is it allows us to think about the definition of a computation in terms of particular examples. For any number, let's call it x, ... This is also why set comprehensions in math, and consequently list comprehensions in programming languages, are so approachable: they describe a set by appeal to representative examples.

I think the key to successfully using point-free style is when you want to treat a function itself as a single data point. For example, if you're sorting a list with some comparison function, you don't want to have to drop down a level of abstraction to think about the individual pairs of elements being compared; you just want to think about the comparison function (like "numeric" or "reverse alphabetical") as the object of your attention.

Wednesday, March 19, 2008

Well said

From the GHC wiki:
It's much better to write code that is transparent than to write code that is short.

Often it's better to write out the code longhand than to reuse a generic abstraction (not always, of course). Sometimes it's better to duplicate some similar code than to try to construct an elaborate generalisation with only two instances. Remember: other people have to be able to quickly understand what you've done, and overuse of abstractions just serves to obscure the really tricky stuff.

Friday, March 14, 2008

Not fade away

From the old-languages-never-die dept.:
Obsolete programming languages such as FORTRAN and COBOL are still widely used.

-- J.J. Duby, 1971

Monday, March 10, 2008

Another cute Haskell function

join :: Eq a => [(a,b)] -> [(a,c)] -> [(a,(b,c))]
join a1 a2 = [ (a,(b,c)) | (a,b) <- a1, c <- [ c | (a',c) <- a2, a == a' ] ]

infixr 5 ⋈
(⋈) :: Eq a => [(a,b)] -> [(a,c)] -> [(a,(b,c))]
(⋈) = join

-- for example:
[("a",1),("b",2),("c",3)] ⋈ [("a",'a'),("b",'b'),("c",'c')]

Friday, March 07, 2008

Crazy debugging feature idea

Here's a crazy idea. Closures are like objects with private fields, right? Specifically, outside of the function body, you're not allowed to refer to the variables in the closed environment. But one of the most annoying things that happen when I'm debugging is that I want to call some local function, but it's not available from the REPL:
(define (foo x y)
(define (helper z)
... x ... y ...)
...)
What if, at the REPL (i.e., in debug mode only), closures were openable? You could use dot-notation to refer to them, and all you'd have to do is somehow provide the extra bindings needed to fill in their environment. Keyword arguments, maybe?
> ((close foo.helper [#:x 42] [#:y 'blah]) "value of z")
'result-value-ftw!

Wednesday, March 05, 2008

Xor

While I'm at it, let's continue the series of logical Maybe operators with xor:
infixr 2 ^^^

(^^^) :: Maybe a -> Maybe a -> Maybe a
Nothing ^^^ r@(Just _) = r
l@(Just _) ^^^ Nothing = l
_ ^^^ _ = Nothing

My current favorite little Haskell function

Here's my pet Haskell combinator du jour:
infixr 2 |||

(|||) :: Maybe a -> Maybe a -> Maybe a
Nothing ||| x = x
x ||| _ = x
This gives you a concise notation for branching in the Maybe monad, so instead of having to write:
case x `lookup` env1 of
Just v -> Just $ f1 v
Nothing -> case x `lookup` env2 of
Just v -> Just $ f2 v
Nothing -> Nothing
you can write:
do { v <- x `lookup` env1; return $ f1 v } |||
do { v <- x `lookup` env2; return $ f2 v }
This gives you the same kind of idiom that Schemers use or for. Come to think of it, I guess you could also write the analog of and:
infixr 3 &&&

(&&&) :: Maybe a -> Maybe a -> Maybe a
Just _ &&& r@(Just _) = r
_ &&& _ = Nothing
I haven't used this one yet, but it seems natural.

Tuesday, March 04, 2008

Intentional capture

Procedural hygienic macro systems like the syntax-case system make it possible to write capturing macros--macros which, depending on your philosophy, you might call "non-hygienic." The classic example is the "anaphoric" conditional form if-it, which implicitly binds a variable it to the result of the test expression:
(if-it 42 (+ it 1) #f) ; => 43
The difficulty in getting such a macro right comes when you try to write another macro that expands into if-it. To quote the mzscheme manual's section on macros, "macros that expand into non-hygienic macros rarely work as intended."

Andre van Tonder's SRFI 72 document contains a perfect and concise example, due to Kent Dybvig, of two different ways a macro might expand into a capturing macro. On the one hand, we might want to write when-it, a simple "one-armed" conditional that implicitly binds it in the same way as if-it:
(when-it 42 (+ it 1)) ; => 43
On the other hand, we might want to use if-it to implement the hygienic or macro, which shouldn't capture any variables.
(let ([it 10]) (or #f it)) ; => 10
First, here's the implementation of if-it: we create an identifier for it with the same lexical context as the operator of the expression:
(define-syntax (if-it stx)
(syntax-case stx ()
[(op e1 e2 e3)
(with-syntax ([it (datum->syntax #'op 'it)])
#'(let ([it e1])
(if it e2 e3)))]))
The references that will be captured by the introduced binding of it are the ones that were introduced into the program in the same expansion step as the occurrence of if-it in the macro call. In particular, if the occurrence of if-it was in the original program (i.e., written explicitly by the programmer), it captures references to it that were in the original program; if the occurrence of if-it is the result of a macro expansion, it captures only those references to it that were generated in that same expansion step.

This means that a hygienic macro that expands into if-it will work as expected:
(define-syntax or
(syntax-rules ()
[(op e1 e2)
(if-it e1 it e2)]))
Since the reference to it appears in the same expansion step as the occurrence of if-it, that reference is captured, but no references to it within subexpressions e1 or e2 (which had to have already been there before this expansion step) are captured.

If you want to write another capturing macro that expands into if-it, it's a little more work. Essentially, you have to capture it all over again. The moral of the story is that you always have to ask explicitly for a macro to capture an introduced identifier.
(define-syntax (when-it stx)
(syntax-case stx ()
[(op e1 e2)
(with-syntax ([it* (datum->syntax #'op 'it)])
#'(if-it e1 (let ([it* it]) e2) (void)))]))
Here we once again create an identifier with the same lexical context as the operator, and we bind it to the occurrence of it introduced by if-it.

These are good defaults for a hygienic macro system: it's easier to write hygienic macros but still possible (albeit a little harder) to write macros that capture. This is even true when you abstract over capturing macros: macros that expand into capturing macros are hygienic by default, but with a little more work again, you can create capturing macros that abstract over other capturing macros.