Sunday, September 21, 2008

Units of measure in F#

I'm at ICFP and just watched Andrew Kennedy give a delightful invited talk about the support for units of measure in F#. It was a very impressive example of work that's well-designed all the way through, from theory to practice. In essence, it allows you to say things like
let gravity = 9.8<m/s^2>
and the type system tracks not only the type float but also the unit of measure m/s^2 and prevents unit-mismatch errors in the program.

One of the questions I had was based on my brief experience several years ago with statistical natural language processing, where a lot of the computation we did was in logspace. In a situation like that you would like to keep track of both the ordinary units of measure as well as the fact that everything is in logspace.

Dan Grossman took this a step further and pointed out that you want your arithmetic operations to have different types when you're working in logspace. For example, while the + operator usually has a type like:
+: float<'u> -> float<'u> -> float<'u>
in logspace it should instead have the type:
+: float<log 'u> -> float<log 'v> -> float<log ('u · 'v)>
Now, I don't know how much support there is for "unit constructors," but the F# unit system does allow you to use units with custom abstract data types. According to Andrew, F# also supports operator overloading and ties it in with the unit system. So it might in fact be possible to express the logspace version of the + operator!

This integration between overloading and units could be a nice solution to the issue of different representations of numbers. Libraries like Java's BigInteger are a usability disaster, but we only have one syntax for numerals and operators. Tying a type-and-unit system in with overloading could allow you to use various machine representations (fixnums, flonums, IEEE754r decimal, bignums, exact rationals, ...) by annotating literals and variables, and type reconstruction could help with the operators. I'm not a big fan of unification-based inference, but in this case I wonder if simple structurally recursive reconstruction would be enough in practice.

Update: Ken Shan points out that only the implementation would use +, but you really would want to overload the * operator. Conceptually you're doing a multiplication.

No comments: