The Little Calculist: modules

Lexical scope in general and lambda in particular go a long way towards supporting modular, separate development. As I mentioned before, PLT Scheme builds a number of first-class, module-like component systems on top of lambda. But it also contains a more primitive module system in which modules are not first-class. This serves a number of purposes.

First of all, static modules provide systematic support for packaging, compiling, and deploying code. First-class modules are flexible and expressive, but they don't have anything to say about compilation and deployment. Somewhere along the line there has to be a notion of what the compiler takes as input, and when you have separate development, you need separate deployment and ideally separate compilation.

Another critical purpose of static modules is the ability to modularize static entities in a language. In ML, for example, modules can import and export types. Scheme is of course more [ed: dynamic] than ML, but it still has its own crucial compile-time abstractions: macros. With dynamic modules, there's no straightforward way to import and export static entities like macros. A secondary benefit of static modules is that you can import all the bindings from another module at once without having to spell them all out; this is admittedly less important but still very convenient.

Finally, PLT Scheme was designed to support multiple languages. For pedagogical purposes, this has allowed them to design multiple, concentric subsets of the language tailored to the How to Design Programs curriculum. This has also facilitated language research by making it easy to design and implement new languages (by macro-compiling them to Scheme) and to research language interactions in a multi-language environment. The relevant piece of the module system is a single hook at the beginning of a module definition: the grammar of a module is an S-expression containing the symbol module, a symbol naming the module, and an S-expression indicating the language of the body:

(module foo scheme body ...)

Typically the language chosen is the special built-in language scheme, as above, or the somewhat leaner scheme/base. But the language position works by simply importing another module that implements the required macros for compiling the body. In place of scheme, you can put in a module path for any module installed on the system.

It's also possible to specify a custom reader for a PLT language so it doesn't even have to be restricted to an S-expression syntax. The initial reader allows you to specify a language with the special shebang-like #lang syntax:

#lang scheme/base
body ...

From that point on, the language's reader has access to the input stream to parse it any way it likes.

Anyone can implement a language module, which means people have developed PLT implementations of Algol, Java, ML, and JavaScript, to name a few.

Update: Added in the missing word "dynamic" above. Also, this is a better link for module paths.

Also, Sam is right in the comments: the scheme language really isn't special or built-in in any significant way. It's simply another module provided in the standard PLT collections.

A dynamic language starts as an implementation. And that implementation almost always includes some variation on a REPL. Look at Lisp, Scheme, Python, Ruby, JavaScript, Lua, etc. One of the things that makes these languages "dynamic" is that they're attached to some long-running process: an IDE, a user prompt, an embedding application, a web page. So these languages are simply implemented with some global table of definitions that gets updated through the lifetime of the host process. Easy enough to understand from the implementor's perspective, but what happens to the user of the embedded language?

The problem with the "top-level" (that ever-changing table of program definitions) is the creeping specter of dynamic scope. While static vs. dynamic typing may be a debate that civilization takes to its grave, the dust seems to have more or less settled on dynamic scoping. The fact is, even though many decisions a program makes must depend on its context, it's very hard to understand a program if you can't nail down its definitions.

To some degree, if a dynamic language has lambda, you can escape the top-level with a design pattern that simulates a poor man's module. The module pattern is almost always a variation of what Schemers call "left-left-lambda"--the immediate application of a function literal. Bindings inside the lambda are no longer floating in that airy and transient stratosphere of the top-level; they're nailed to the function that is being applied. And you know that function is being applied only once, because once you've created it, it's applied an discarded.

This pattern goes a long way, and if you have macros, you can create sugar to give the pattern linguistic status. But a module system it ain't.

Linking: Nothing in this pattern deals with the relationships between modules. There's no way to declare what a module's imports and exports are. In fact, if you want a module to communicate with any other modules, the top-level's poison tends to seep back in. To export a value, a lambda-module can mutate some global variable, or it can return a value--but where does the caller save the value? You can always nest these solutions within more lambda-modules, but ultimately there's a problem of infinite regress: in the end you have to have at least one special top-most module surrounding your program.

Separate development: And that's only if you have control over the whole program. If you want to create a library and share it, there needs to be some shared common framework in which people and organizations can share code without stomping on each other's invariants or polluting each other's global environments. To be sure, a built-in module system doesn't eliminate all such issues (you still need conventions for naming and registering modules within a common framework), but modules help to standardize on these issues, and they can provide helpful errors when modules step on each other's toes, rather than silently overwriting one another.

Loading: There's not much flexibility in the loading of an immediately applied function. If your language involves multiple stages of loading, the implementation may be able to be smarter about loading and linking multiple modules at once.

Scoping conveniences: Lexical scope is a widget in the programmer's UI toolkit, and for different scenarios, there are different appropriate designs. The tree shape of expressions makes the lexical scoping rule ("inner trumps outer") appropriate; it favors the local over the global. But, ignoring nested modules for the moment, modules aren't tree shaped; they're more like a global table. In a sense, all modules are peers. So when you import the same name from two different modules, which one should win? You could say that whichever you import later wins, but this is much more subtle than the obvious nesting structure of ordinary variable bindings. I find it's more helpful for the module system to give me an error if I import the same name from different sources (unless it's a diamond import). Other useful facilities are selective import, import with renaming, or import with common prefixes. These are subtle usability designs where modules differ from lambda.

Extensibility: In PLT Scheme, we've used the module system as the point for language design and extension. By allowing modules to be parameterized over their "language", we have a natural way for introducing modalities into PLT Scheme. As languages grow, these modalities are an inevitability (cf. "use strict" in Perl and ECMAScript Edition 4). Buried within pragmas or nested expressions, this makes the design of the language much harder. But within a module, bindings are sacrosanct and interactions with other modules are limited to imports and exports. This significantly cuts down on the space of possible interactions and interferences between the language's modalities. As an example, Sam Tobin-Hochstadt has made good use of this for the design of Typed Scheme, a statically typed modality of PLT Scheme that can still interact reliably with dynamically typed modules.

The unrestricted mutation of the top-level environment is a good thing thing for many purposes: interactive development, self-adjusting execution environments, etc. But it's terrible for nailing down program definitions. Modules are a way of circumscribing a portion of code and declaring it "finished". It can still be useful to have an environment where modules can be dynamically loaded and possibly even replaced, but it's critical for the language to provide the programmer with basic invariants about the definitions within a module.

All of this is stuff I wish I'd had a clearer head about earlier in the ES4 process. But I hope that, down the road, we'll consider a module system for ECMAScript.

The Little Calculist

Thursday, February 19, 2009

PLT System Facilities (Software Components): Modules

Wednesday, April 30, 2008

Dynamic languages need modules

Blog Archive

Recommended Reading

Haunts

About Me