Monday, November 27, 2006

Does type inference have to suck?

Type inference with tools like Haskell and SML obviously sucks. Hard. The error messages are useless. I have to pinpoint my errors manually, narrowing down the error by commenting out half the program at a time. Any time you're using binary search to find a bug, it's safe to say your debugging tools aren't helping you.

There's been a fair amount of pretty hardcore research on this. But does it really need to be hard? If I could just have some help from a modern IDE, it seems like type inference errors should be perfectly adequate. At the very least, the type inference engine could keep track of source location for every type constraint it acquires, and then when it finds conflicts, the GUI could simply draw an arrow between the two conflicting expressions. If it doesn't know which one is wrong, fine! I'll know which one is wrong, but it has to show me both!

SML/NJ's approach of picking one at random to blame is pure madness.

Another useful tool would be for the IDE to show me the inferred type of any expression in the program, so that if it's not manifest, I can at least make it manifest. Annotating one or two expressions at a time and recompiling is a colossal waste of time.

I've got plenty of criticisms for Eclipse, but if there's one thing we can learn from it, it's that compiled languages don't have to force the programmer into the edit-recompile cycle. It's perfectly possible these days to run the compiler incrementally and continuously. (Conversely, you could say that there's no reason why interpreted languages can't be incrementally compiled.)

I know I'm not the first to point this out. Why hasn't anyone in the statically-typed functional languages community done this? You don't need to reinvent the IDE, just build some plug-ins for emacs or Eclipse.

15 comments:

Greg Buchholz said...

Maybe you could give us an example of some Haskell code with an unhelpful error message? The GHC community seems interested in producing better error messages. Why not submit a bug report or a feature request. Worst case scenario: It'll be ignored. But there's also a chance that it will get fixed.

Neel Krishnaswami said...

SML/NJ is just amazingly evil when it comes to error messages. The sorts of problems you describe aren't even the most frustrating!

For example, when you have an incomplete pattern match, SML/NJ will print the entire match expression and tell you it's missing a case. What it won't do is something useful, like printing an example of a value that doesn't match.

A lot of the SML programmers I know will use SML/NJ for the REPL, and then compile with MLton in order to get useful error messages out of the type checker. This is a crazy situation.

Anonymous said...

If type inference is the problem, stop using it. Problem solved.

As an added bonus, you'll learn the type system and realize when it's possible to omit explicit types without causing a problem.

Anonymous said...

Obviously this is a good point. I think the solution, is however, not so easy. Since type inference works in a bottom up fashion (they work from the facts (atomic statements) up the syntax tree) the effect is that the point at which the problem is noticed has nothing to do with the actual problem in the program. This makes making good diagnostics difficult if not impossible. Maybe different formulations of the problem (like terms with holes do for traces) could help in this regard.

Cale Gibbard said...

I haven't really used SML enough to be able to say much about it in this regard, but in the context of Haskell, I don't understand what your complaint is here. GHC generally gives extremely helpful error messages, which include the types that didn't match, the part of the source which caused the problems and exact line/column numbers. It also tends to generate multiple errors only for genuinely distinct problems.

If you're having problems with type errors occurring far away from where the real problem with your code was, it is quite likely that you haven't given the compiler enough information about your intentions for it to refine the location of the problem. That is, if you don't ever give things type signatures, then they might actually get valid types, but not the ones which you intended. Incremental programming helps to some extent here, but an easy fix is just to add type signatures to important top-level functions. If there is a type error in one of those then, you end up with a type error locally, rather than much later on when you go to apply it (or something which used it) and it has the wrong type. That will at least solve the problem of getting better source locations, though even without that, you get a fairly sharp look at the part of the source text where the types didn't match, and if you find that you expected them to match, then you immediately know that one of the functions or values involved is the problem.

Anonymous said...

Check out the helium haskell compiler. It concentrates on very informative error messages, particularly in relation to types.

Isaac Gouy said...

Do technical complaints have to suck?
What you've written is probably easy to understand for someone who has experienced the same issues - but in that case you aren't saying anything they don't already know.

Because there are no examples, what you've written is so generalized that it really doesn't help me understand the issue at all.

Taylor Venable said...

I think a lot of problems that people have with the strong static typing of languages like Haskell, SML, and OCaml come from two things: an over-reliance on type inference (for the wrong reasons) and a misunderstanding of the type system in use. They're related issues.

To the first issue, from personal experience I can say that giving everything an explicit type produces much more useful errors. The type inference system is best used to check your explicit type (what you think / want the function / data to be) against the actual type (what the computation / evaluation actual comes out to). Checking your work, not coming up with it in the first place, is what the type inference system is really useful for. I've noticed that this fact is emphasized in The Little MLer, which is pretty useful for getting to know SML's typing system. First think about the function as it's seen by the outside world -- what does it consume, what does it produce -- then write the code.

To the second issue, you have to realize there's a lot behind the scenes in terms of polymorphism and type classes. Sometimes you think that something is going to "just work" when using a generic type like [a] (list of a) in Haskell. Then the compiler gives you an error message like "No instance of (Ord a) arising from use of `<' at source:row:col" which, to one not acquainted with type classes, is very confusing. Like anything, one has to know the tools one is using before they can use them to their fullest potential.

Between these two issues, I don't think things are as bad as you make them seem. Type inference should be used for checking results against what you expect, and you have to know the system before you can use it most effectively. Maybe more IDE support would help programmers work through the problems "in the moment," but it won't help you understand why you're getting those errors, or (therefore) how to fix them non-specifically.

Tony Morris said...

Take a look at Chameleon http://www.comp.nus.edu.sg/~sulzmann/chameleon/download/haskell.html

Anonymous said...

Did you ever looked at the Visual Haskell project? It's done by some people at Microsoft Research, but its only a technical preview...

Jon Harrop said...

Check out OCaml and F#. Both have graphical throwback of inferred types.

I don't notice how helpful it is until I upgrade Debian and break it, at which point I have to fix it immediately!

Cheers,
Jon.

Ezra said...

Dave--

You're doing it wrong! Don't comment out half the program--just add type annotations at strategic places, starting near the error. This will move the error message around, like squeezing the toothpaste tube, and eventually you'll hit the hard lump that is the error. I find it usually gets me there in a few steps.

I've never used SML/NJ, but I've been quite happy with mosml (Moscow ML), FWIW.

Phil Armstrong said...

Having written a type checker for a functional language, I had a look at exactly this problem. Any type checking algorithm which works in a bottom-up fashion will only be able to give you a single type-inference failure: the problem is that the point of failure may be many kloc away from the actual error in the source; the original type error is propagated through the code by the type checker until it runs into an unresolvable type conflict.

I think that best that a type checker could do would be to record the chain of reasoning that led to the type error, so you could 'unwind the stack' looking for the cause of the problem.

This could still include the entirety of your source code unfortunately :(

As others have posted, you can use explicit type annotations to push the error around until it reaches the cause of the type conflict.

sigfpe said...

Though it'd be nice to be able to pick any object and display the type the compiler thinks it has, this can also fail to be enlightening.

I'd like to have the IDE be able to point to those places in the code that caused the compiler to make the inferences that it did. For example, if the compiler has inferred Num a, I'd like it to show me which application of (+) it was that caused it to make this inference.

frontline plus said...

could provide a rare example of an error in the Haskell and SML you mention? I would be very helpful

thanks