Thursday, February 23, 2006

Nancy typing

It's popular in scripting languages to perform lots of automatic conversions on data in an attempt to "do what I mean". In some cases the conversion heuristics are reasonably clear, but sometimes the language is so eager to find some way to make the operation succeed, rather than simply give an error and force the programmer to be more explicit, that extremely bizarre and unpredictable behavior ensues.

I call this Nancy typing, for a delicious reason that will soon become clear. What this has to do with duck typing, I'm not sure--I can't find a straightforward definition. According to Wikipedia's rather confused entry it's not the same thing as structural typing, but the jury seems to be out in this LtU thread. As far as I can tell, duck typing means one of two things: either performing conversions between separate datatypes in order to support an operation, or simply subtyping based on the set of supported operations. Actually, in the TAPL chapter on structural types, Pierce points out that structural type systems often include axioms that assert subtyping relationships between types whose runtime representations are completely different (such as float and int), and these coercions have to be implemented in the runtime by actual conversions. So maybe the distinction is artificial, and duck typing really is the same thing as structural subtyping.

Anyway, the typical example used to demonstrate the hazards of structural subtyping is a GraphicalObject class and Cowboy class that both implement draw operations with entirely different meanings. But I prefer the Nancy bug.
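For readers who haven't seen that example, here is a minimal sketch in Python. The method bodies and the render_all helper are made up for illustration; only the two class names and the draw operation come from the usual telling.

```python
class GraphicalObject:
    def draw(self):
        # "draw" in the sense of painting something
        return "renders a shape on the screen"


class Cowboy:
    def draw(self):
        # "draw" in the sense of a gunfight
        return "pulls a gun from its holster"


def render_all(items):
    # Hypothetical caller: it only requires that each item has a draw()
    # method, so nothing stops a Cowboy from being passed in.
    return [item.draw() for item in items]


results = render_all([GraphicalObject(), Cowboy()])
for line in results:
    print(line)
```

Both objects satisfy the "has a draw method" requirement, so the mismatch in meaning goes undetected.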

My friend Mike MacHenry told me this story. Mike's company has an expert system, written in Perl, that employs heuristics to determine whether a customer is likely to have filled out a form incorrectly. One of the heuristics is supposed to check that the names given in two different columns of a record are the same. Now, it turns out that the comparison was implemented incorrectly--the software was almost always reporting that the names were the same, even when they weren't--but because the expert system is only making a best guess, this passed unnoticed for a while as a false negative.

But every once in a while, the system reported a false positive, i.e., that the names were not the same when in fact they were. And it just so happened that in each one of these cases, the name in the record was "Nancy."

The explanation is that the programmer had accidentally used the numeric equality operator rather than the string equality operator. Perl silently converted the strings to numbers in order to make the operation succeed. Since in most cases the strings were not numeric, Perl parsed the longest numeric prefix of each string, which was the empty string. It so happens that the empty string happily converts to 0 in Perl, so most of the time both strings in a pair convert to 0, and the equality comparison evaluates to true.

But "Nan" is parseable in Perl as not-a-number, which is always considered unequal to any number, including itself. So "Nancy" happily converts to not-a-number, and then tests false for numeric equality to itself.

Tada! The Nancy bug.
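To make the mechanism concrete, here is a rough Python model of the coercion as described above. It is a sketch, not Perl's actual conversion routine: to_number and numeric_eq are hypothetical helpers, and the rule that any leading "nan" becomes not-a-number is an assumption taken from this story. Real Perl behavior varies by version and platform.

```python
import math
import re

# Longest numeric prefix: optional sign, digits with optional fraction,
# optional exponent. This is a simplification for illustration.
NUMERIC_PREFIX = re.compile(r"\s*[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def to_number(s):
    """Model of the string-to-number coercion described in the post."""
    # Assumption from the story: a leading "nan" (any case) parses as NaN.
    if s.lstrip()[:3].lower() == "nan":
        return math.nan
    match = NUMERIC_PREFIX.match(s)
    # No numeric prefix at all: the empty prefix converts to 0.
    return float(match.group(0)) if match else 0.0


def numeric_eq(a, b):
    """Model of comparing two strings with numeric equality."""
    return to_number(a) == to_number(b)


print(numeric_eq("Sluggo", "Sluggo"))  # both coerce to 0
print(numeric_eq("Alice", "Bob"))      # different names, both coerce to 0
print(numeric_eq("Nancy", "Nancy"))    # NaN is never equal to itself
```

Under this model, every pair of non-numeric names compares equal (the false negatives), while "Nancy" compares unequal even to itself (the false positive).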

7 comments:

Anonymous said...

I can't reproduce this problem. Do you have some example code?

Dave Herman said...

Yeah, I wanted to write something pithy like "Nancy" != "Nancy" but "Sluggo" == "Sluggo", but I couldn't actually verify it on my machine. It appears that the behavior of NaN depends on the particular Perl implementation and platform. Search man perlop for "NaN" and you'll see that not all platforms have the same behavior.

dskippy said...

We're using an ancient version of Perl, version 5.005_03. The code is just as Dave said.

mike.machenry said...

So you might be interested to know that this behavior was briefly changed in perl 5.8.7 but then went back to the way it was in 5.005_03 in the perl 5.8.8 release. My coworker suggests that the perl developers are rotating this behavior back and forth to make sure that perl programmers aren't depending on comparing strings as numbers.

jto said...

"simply subtyping based on the set of supported operations" That sounds definitely wrong to me. For example, in Python it's common to define an object with only a write() method (no close/seek/etc.) to serve where a file (which has all those operations) is expected. C++ templates support this kind of thing too.

Structural typing is an exact static analog of duck typing, as far as I can tell.

But note that just like any type-y concept from dynamic programming, supporting everything dynamic programmers actually do with duck typing probably isn't really possible. To start with you would need recursive structural types...

frontline plus said...

I also prefer Nancy BLUG. I previously had the opportunity to use class and Cowboy GraphicalObject but I worked as I expected

Anonymous said...

Use'ing warnings (preferably diagnostics as well) will warn about comparing strings as numbers, in any Perl release (at least since 4.0.0).
Never turn off warnings in production code just because they're nasty and there's less coding effort with 'no warnings;' -- warnings are there for good reason, and they usually help you code faster and more safely.