Monday, June 20, 2005

We have met the enemy, and it is... hidden invariants

A couple well-known encodings have a very nasty property that they are not bijective. For example, a list of pairs can always be transformed into a pair of lists, and a pair of lists can be transformed into a list of pairs, but if the two lists are of different length, you either have to fail, drop some elements, or throw in some dummy elements. (The latter may be impossible for a polymorphic function.)
unzip : [(t, u)] → ([t], [u])
zip : ([t], [u]) ⇀ [(t, u)]
That means that the "unzipped" representation, i.e., the pair of lists, contains a hidden invariant that's not expressed in the type: that the two lists have the same length. Even in a language with decidable dependent types like Dependent ML, this invariant can only be enforced so far; for example, if you run the filter function over the two lists, the resulting lists will have statically unknown length.

Similarly, the option type constructor doesn't distribute nicely over tuples. If you have two fields that must both either be present or absent, it's better to represent them as a single field that is an optional tuple, rather than separate optional fields. This is taking advantage of the type system to help keep your datatypes straight. Even without a type system, it's still better because it groups related elements, and it's one less property to document. We're talking datatype abstraction here, folks. It's a Good Thing.

I've noticed that languages that have a kind of built-in option always available--I'm looking at you, Java--encourage the proliferation of this family of hidden invariants. There are many cases where an object has multiple fields that must all be either null or non-null. I realize they needed null in the language because when you create an object, the fields need to have some well-defined value before they're initialized. (Of course, imperative constructors are a whole 'nother world of pain...) But the additional fact that Java discourages the creation of small datatypes deters people from abstracting their datatypes into separate classes, and the result is these extra invariants.

And we know what happens to hidden invariants: they hide away in the back of the programmer's brain until they get metally garbage collected, lost to posterity, and consequently violated.

No comments: