Wednesday, April 30, 2008

Dynamic languages need modules

A dynamic language starts as an implementation. And that implementation almost always includes some variation on a REPL. Look at Lisp, Scheme, Python, Ruby, JavaScript, Lua, etc. One of the things that makes these languages "dynamic" is that they're attached to some long-running process: an IDE, a user prompt, an embedding application, a web page. So these languages are simply implemented with some global table of definitions that gets updated through the lifetime of the host process. Easy enough to understand from the implementor's perspective, but what happens to the user of the embedded language?

The problem with the "top-level" (that ever-changing table of program definitions) is the creeping specter of dynamic scope. While static vs. dynamic typing may be a debate that civilization takes to its grave, the dust seems to have more or less settled on dynamic scoping. The fact is, even though many decisions a program makes must depend on its context, it's very hard to understand a program if you can't nail down its definitions.
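To make the hazard concrete, here is a minimal Python sketch that models the top-level as a single mutable table. The names `top_level`, `define`, and `call` are hypothetical, invented for this illustration; no real implementation is claimed to work exactly this way.

```python
# A toy "top-level": one mutable table of definitions shared by everything
# that runs in the host process. All names here are hypothetical.
top_level = {}

def define(name, value):
    """Add or silently overwrite a top-level definition."""
    top_level[name] = value

def call(name, *args):
    """Look the name up at call time, as a REPL-style top-level would."""
    return top_level[name](*args)

define("double", lambda x: 2 * x)
define("quadruple", lambda x: call("double", call("double", x)))

before = call("quadruple", 3)   # uses the original double

# Later, unrelated code loaded into the same process redefines double.
define("double", lambda x: x + x + 1)

after = call("quadruple", 3)    # same program text, different meaning
print(before, after)
```

The text of `quadruple` never changes, yet its result does, because the definition it depends on is only as stable as the table it lives in.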

To some degree, if a dynamic language has lambda, you can escape the top-level with a design pattern that simulates a poor man's module. The module pattern is almost always a variation of what Schemers call "left-left-lambda"--the immediate application of a function literal. Bindings inside the lambda are no longer floating in that airy and transient stratosphere of the top-level; they're nailed to the function that is being applied. And you know that function is being applied only once, because once you've created it, it's applied and discarded.
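A rough sketch of the pattern in Python follows. Python's `lambda` cannot hold statements, so this version assumes a named function that is called once and then deleted, which approximates the immediately applied function literal. The names `counter_module`, `count`, `increment`, and `current` are hypothetical.

```python
# Poor man's module: define a function and apply it immediately, exactly once.
# All names here are hypothetical, chosen only for illustration.
def _make_counter_module():
    count = 0                      # private: visible only inside this function

    def increment():
        nonlocal count
        count += 1
        return count

    def current():
        return count

    # The returned table plays the role of the module's exports.
    return {"increment": increment, "current": current}

counter_module = _make_counter_module()
del _make_counter_module           # applied once and discarded

counter_module["increment"]()
counter_module["increment"]()
print(counter_module["current"]())
```

Nothing outside the function can reach `count` directly; the only way in is through the two functions the "module" chose to return.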

This pattern goes a long way, and if you have macros, you can create sugar to give the pattern linguistic status. But a module system it ain't.

Linking: Nothing in this pattern deals with the relationships between modules. There's no way to declare what a module's imports and exports are. In fact, if you want a module to communicate with any other modules, the top-level's poison tends to seep back in. To export a value, a lambda-module can mutate some global variable, or it can return a value--but where does the caller save the value? You can always nest these solutions within more lambda-modules, but ultimately there's a problem of infinite regress: in the end you have to have at least one special top-most module surrounding your program.
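The two export routes described above can be sketched in Python as follows. This is an illustration under assumed names (`registry`, `_geometry`, `_report`, `area`, `describe` are all hypothetical), not a recommended design.

```python
# Two lambda-modules that must talk to each other. Hypothetical names throughout.
registry = {}                       # a global table: the top-level creeps back in

def _geometry():
    def area(w, h):
        return w * h
    registry["geometry"] = {"area": area}   # "export" by mutating a global

_geometry()

def _report():
    geometry = registry["geometry"]         # "import" by reading the same global
    def describe(w, h):
        return "area=" + str(geometry["area"](w, h))
    return {"describe": describe}           # "export" by returning a value...

report = _report()                  # ...which the caller must store in a global

first = report["describe"](2, 3)

# Nothing declares or protects the link: any code can replace the export.
registry["geometry"] = {"area": lambda w, h: 0}
second = _report()["describe"](2, 3)
print(first, second)
```

Either way, the link between the two pieces ends up as an ordinary mutable global, so the dependency is neither declared nor checked.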

Separate development: And that's only if you have control over the whole program. If you want to create a library and share it, there needs to be some common framework in which people and organizations can share code without stomping on each other's invariants or polluting each other's global environments. To be sure, a built-in module system doesn't eliminate all such issues (you still need conventions for naming and registering modules within a common framework), but modules help to standardize those conventions, and they can provide helpful errors when modules step on each other's toes, rather than silently overwriting one another.

Loading: There's not much flexibility in the loading of an immediately applied function. If your language involves multiple stages of loading, an implementation that knows about modules may be able to load and link several of them at once more intelligently.

Scoping conveniences: Lexical scope is a widget in the programmer's UI toolkit, and for different scenarios, there are different appropriate designs. The tree shape of expressions makes the lexical scoping rule ("inner trumps outer") appropriate; it favors the local over the global. But, ignoring nested modules for the moment, modules aren't tree shaped; they're more like a global table. In a sense, all modules are peers. So when you import the same name from two different modules, which one should win? You could say that whichever you import later wins, but this is much more subtle than the obvious nesting structure of ordinary variable bindings. I find it's more helpful for the module system to give me an error if I import the same name from different sources (unless it's a diamond import). Other useful facilities are selective import, import with renaming, or import with common prefixes. These are subtle usability designs where modules differ from lambda.
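The conflict rule I prefer can be sketched in a few lines of Python. The function `import_all` and the export tables below are hypothetical; the sketch treats an import as a diamond when both sources supply the very same binding, and it does not model any particular language's module system.

```python
# A sketch of import-conflict checking. import_all and the tables below
# are hypothetical names used only for illustration.
def import_all(*modules):
    """Merge export tables, rejecting a name bound to two different values."""
    env = {}
    for exports in modules:
        for name, value in exports.items():
            if name in env and env[name] is not value:
                raise NameError("conflicting imports for " + repr(name))
            env[name] = value
    return env

def shared_helper(x):
    return x + 1

# Diamond import: both modules re-export the very same binding.
left = {"helper": shared_helper, "left_only": 1}
right = {"helper": shared_helper, "right_only": 2}
env = import_all(left, right)           # accepted

# Genuine conflict: two different definitions under one name.
other = {"helper": lambda x: x - 1}
try:
    import_all(left, other)
    conflict = None
except NameError as err:
    conflict = str(err)
print(sorted(env), conflict)
```

Selective import and import with renaming would amount to filtering or rekeying each export table before the merge.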

Extensibility: In PLT Scheme, we've used the module system as the point for language design and extension. By allowing modules to be parameterized over their "language", we have a natural way of introducing modalities into PLT Scheme. As languages grow, these modalities are an inevitability (cf. "use strict" in Perl and ECMAScript Edition 4). When they are buried within pragmas or nested expressions, they make the design of the language much harder. But within a module, bindings are sacrosanct and interactions with other modules are limited to imports and exports. This significantly cuts down on the space of possible interactions and interferences between the language's modalities. As an example, Sam Tobin-Hochstadt has made good use of this for the design of Typed Scheme, a statically typed modality of PLT Scheme that can still interact reliably with dynamically typed modules.

The unrestricted mutation of the top-level environment is a good thing for many purposes: interactive development, self-adjusting execution environments, etc. But it's terrible for nailing down program definitions. Modules are a way of circumscribing a portion of code and declaring it "finished". It can still be useful to have an environment where modules can be dynamically loaded and possibly even replaced, but it's critical for the language to provide the programmer with basic invariants about the definitions within a module.

All of this is stuff I wish I'd had a clearer head about earlier in the ES4 process. But I hope that, down the road, we'll consider a module system for ECMAScript.

7 comments:

Anonymous said...

Not quite sure why you include Python in that list - the pollution of the top-level namespace that occurs in other languages generally wouldn't happen in Python. Any module you use must be explicitly imported in any other module that wishes to use it. Even when imported, it must still be qualified using the import name - unless you ask explicitly for some set of symbols from that module.

The REPL in Python is effectively considered a "module", and (with the exception of a few "builtin" methods) thus has access only to those modules and variables that it explicitly imports.

Maybe I'm missing your point, but I don't think this problem is a static vs. dynamic language thing...

Paul Steckler said...

In ghci, the Haskell REPL, you can query current bindings with ":show bindings". Using that helps avoid some surprises.

-- Paul

John Haugeland said...

It is important that you learn a well established language that has already successfully grappled with these problems before deciding on your own mechanism. Given the similarity in approach especially to Adobe's ES4 implementation, Erlang seems like a natural choice. And, since they had this stuff worked out 15 years ago, chances are there are a great number of other pieces of goodness waiting for you in their mechanism.

Dave Herman said...

thomas: I don't know very much about Python, so maybe it was hasty to put it in the list.

stonecypher: Care to give me any references on Erlang's module system? The reference manual doesn't have that much to say that isn't pretty straight-forward. (Also, I suspect Erlang's lack of mutation makes a lot of things easier.) But as you say, the Erlang community has lots of wisdom to offer so I'd be very interested in any papers or books you might suggest.

lmeyerov said...

In terms of loading, assuming introspective capabilities aren't excessive (*cough* toString of functions in JavaScript), static isn't necessarily the way to go. Doloto (http://research.microsoft.com/projects/doloto/) is very promising, and context-sensitive loading (perhaps influenced by concrete usage patterns) is, on average, even better.

I haven't thought as much about the other points yet, but it's pretty crucial to separate simple implementation concerns from true expressive impacts.

Anonymous said...

In es4, can some combination of namespaces, units and packages offer module functionality?

Dave Herman said...

Pkd--

First of all, units and packages have been dropped. There is a lightweight notion of compilation units implicitly in things like code loaded from separate files, but nothing explicit.

Second, none of these things is quite the same as a module. ES4 namespaces are cross-cutting, not lexical, so they don't give you a textual notion of a separable unit of code, nor do they help nail down the "definedness" of names. They simply allow you to carve up the universe of names into disjoint partitions. Packages were, IMO, just badly designed. Units are important but the committee seemed to sense they hadn't matured enough, so they were dropped for ES4.