The Little Calculist: Garbage collecting blog posts

Saturday, April 02, 2005

Interesting! I just fixed a typo in the title of a blog entry, and Blogger demonstrated a property of the semantics of blog archives.

All blog entries are archived permanently, and each post gets its own dedicated archive page. The page's file name is derived from the words in the post's title, and the page is placed in a subdirectory based on the date, which makes name clashes less likely. (I'm sure Blogger has rules for resolving conflicts, but the date directories make them less of an issue.)
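Here is a rough sketch of that naming scheme in Python. I don't know Blogger's actual rules, which surely differ in the details (length limits, punctuation, clash handling), so `archive_path` and its slug rule are my own guesses, modeled only on the shape of the URLs further down. The assumed post title is likewise hypothetical.

```python
import re
from datetime import date

def archive_path(title: str, posted: date) -> str:
    # Hypothetical slug rule: lowercase the title, keep runs of letters
    # and digits, and join them with hyphens.
    words = re.findall(r"[a-z0-9]+", title.lower())
    slug = "-".join(words)
    # Date-based subdirectory (year/month) to reduce name clashes.
    # Only year and month end up in the path, so the day is irrelevant.
    return f"{posted.year:04d}/{posted.month:02d}/{slug}.html"

print(archive_path("Resurrecting Java objects", date(2005, 4, 2)))
# 2005/04/resurrecting-java-objects.html
```

The point to notice is that the path is a function of the title, so fixing a typo in the title (say, "Ressurecting" to "Resurrecting") necessarily yields a different path.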

But Blogger also lets you go back and edit old posts, even changing their titles, which means the archive file name can change. So what happens to the old archive file? My first guess was that it would disappear, but it didn't! Instead, a new archive file was created under the new name, and the old one remained as it was, with the contents of the original version of the post.

This is (at least almost) the right behavior, because links to the old page might exist. In fact, you can never garbage collect the old post: even if no links to it existed originally, some jerk might link to it in the future, like me:

http://calculist.blogspot.com/2005/04/ressurecting-java-objects.html
http://calculist.blogspot.com/2005/04/resurrecting-java-objects.html

If my understanding is right, both these links should remain live.

The one part of this behavior that I think is debatable is what the old page should contain. It could stay as it was, which is what Blogger does; it could be replaced with the new version; or it could contain some kind of message saying that the page has been superseded by a new one, with a link or an automatic redirect. As it is, nothing on the old page indicates that a newer version exists.
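To make the three options concrete, here is a toy model of an archive in Python. It's a sketch under my own assumptions, not how Blogger works: the `Archive` class, its `policy` setting, and its `rename` method are all hypothetical, and the "redirect" is just a message string rather than a real HTTP redirect. The one invariant it keeps in every mode is the one argued for above: renaming a post never deletes the old page.

```python
from dataclasses import dataclass, field

@dataclass
class Archive:
    # Maps archive path -> page contents. Entries are never deleted,
    # since any published URL might be linked to in the future.
    pages: dict[str, str] = field(default_factory=dict)
    # Hypothetical knob for what the old page holds after a rename:
    # "keep", "replace", or "redirect".
    policy: str = "keep"

    def publish(self, path: str, body: str) -> None:
        self.pages[path] = body

    def rename(self, old_path: str, new_path: str, new_body: str) -> None:
        # The new version always gets its own page under the new name.
        self.pages[new_path] = new_body
        if self.policy == "keep":
            pass  # old page keeps the original contents (Blogger's behavior)
        elif self.policy == "replace":
            self.pages[old_path] = new_body
        elif self.policy == "redirect":
            self.pages[old_path] = f"This post has moved to {new_path}"
        else:
            raise ValueError(f"unknown policy: {self.policy}")

old = "2005/04/ressurecting-java-objects.html"
new = "2005/04/resurrecting-java-objects.html"
archive = Archive(policy="redirect")
archive.publish(old, "original post")
archive.rename(old, new, "edited post")
print(archive.pages[old])  # This post has moved to 2005/04/resurrecting-java-objects.html
```

Whichever policy you pick, both URLs stay live; the policies differ only in what a reader following the old link gets to see.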