Friday, May 16, 2008

Fail-soft

Automatically repairing errors and continuing execution goes against the grain of software engineering, for good reason, but I've come to see that it's what makes the web go 'round. Robert O'Callahan has an interesting market-oriented take on error recovery:
...the economics of error recovery --- the fact that recovering from malformed input, instead of hard failure, is a competitive advantage for client software so in a competitive market it's sure to develop and you might as well put it in specifications...
In other words: ignore the errors, users will love you, and you win. We may not like it, but you fight economics at your own peril.

3 comments:

Daniel Yokomizo said...

The problem isn't that malformed content gets ignored, but that the handling is ad hoc and we can't tell whether the content was well-formed or not. The classic example of IE illustrates this: it's bad to accept malformed HTML because it encourages sloppy programs, but it's even worse to be unable to trigger a standards mode and separate the code that handles malformed content from the code that handles well-formed content. In Perl you can go as lenient as you want or use strict and get more safety. Ditto for turning compiler warnings into errors and such.

If your program accepts malformed content, it'll have to guess things, and if it keeps guessing wrong it will piss off its users, unless there's a way to override the guessing.

The Erlang way is a good example of doing this correctly. You have a fail-fast path (i.e. abort on errors), and the higher-level layer (i.e. a supervisor process) can decide to try a different approach (perhaps a lenient program), use heuristics to "correct" the input and try again, etc. Provided the approach is modular and predictable, it's just better engineering.
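
A minimal sketch of that layering, written in Python rather than Erlang and not using real supervisor processes. Everything here is hypothetical and only for illustration: the comma-separated integer format, the names `parse_strict`, `repair` and `supervise`, and the "drop bad fields" heuristic. The strict path aborts on the first error, and only the outer layer decides whether to repair and retry. It also reports whether a repair happened, so the caller always knows if the input was well-formed.

```python
class MalformedInput(Exception):
    """Raised by the strict path on the first error it sees."""


def parse_strict(text):
    """Fail-fast path: parse comma-separated integers or abort."""
    try:
        return [int(field) for field in text.split(",")]
    except ValueError as exc:
        raise MalformedInput(str(exc)) from exc


def repair(text):
    """Hypothetical heuristic: drop every field that is not an integer."""
    kept = []
    for field in text.split(","):
        try:
            int(field)
        except ValueError:
            continue  # discard the field we cannot make sense of
        kept.append(field)
    return ",".join(kept)


def supervise(text, lenient=True):
    """Higher-level layer: try the strict path first, then decide.

    Returns (values, was_repaired). With lenient=False the error from
    the strict path propagates, which is the override for the guessing.
    """
    try:
        return parse_strict(text), False
    except MalformedInput:
        if not lenient:
            raise
        # Retry through the same strict parser on the repaired input,
        # so the lenient behaviour lives in exactly one place.
        return parse_strict(repair(text)), True
```

Calling `supervise("1,x,3")` takes the repair path, while `supervise("1,x,3", lenient=False)` raises `MalformedInput`. The guessing stays in one small function that can be swapped or switched off without touching the parser.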

Anonymous said...

Users with no clear expectations from the tool don't tend to set high standards... This is even valid for compilers.

Anonymous said...

Automatically repairing errors and continuing execution goes against the grain of software engineering

I just finished Goal-Directed Reasoning for Specification-Based Data Structure Repair by Demsky and Rinard (IEEE Trans. Soft. Eng., Dec. 2006). They hedge a bit by pointing out that sometimes automated repair may not be appropriate, but I found their examples, including an air-traffic-control application, interesting demonstrations of automatic repair's value when it may be acceptable.