Saturday, February 27, 2010

Generalizing Javadot

The idea of Javadot is to combine the rich lexical syntax of Lisp and Scheme with the elegance of the dot notation from the C tradition, simply by allowing scope to trump lexical splitting: if foo.bar is in scope as an identifier, then it parses as a variable reference; otherwise it parses as "foo" "." "bar". It's a simple compromise, it's easy to understand, it plays well with lexical scope, and (in an infix language) you can always circumvent it with whitespace. (I don't think it needs to complicate parsing too much, either, since you can do a post-hoc splitting of the lexeme rather than forcing the lexer to understand scope, the way you're forced to with the C typedef ambiguity.)
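A minimal sketch of that post-hoc split in Python, assuming scope analysis has already produced the set of bound names and the lexer has kept `foo.bar` together as one lexeme (`resolve_dotted` is a hypothetical name, not taken from any Javadot implementation). It applies the rule all-or-nothing: the whole lexeme is either a bound identifier or is split at every dot.

```python
def resolve_dotted(lexeme: str, scope: set[str]) -> list[str]:
    """Javadot-style rule: scope trumps lexical splitting.

    If the whole lexeme is bound, it is a single variable reference;
    otherwise it is split into its dot-separated parts (all-or-nothing).
    """
    if lexeme in scope:
        return [lexeme]
    tokens: list[str] = []
    for i, part in enumerate(lexeme.split(".")):
        if i > 0:
            tokens.append(".")  # re-insert the separator as its own token
        tokens.append(part)
    return tokens


scope = {"foo.bar", "foo"}
print(resolve_dotted("foo.bar", scope))  # ['foo.bar']
print(resolve_dotted("foo.baz", scope))  # ['foo', '.', 'baz']
```

Because the decision happens after lexing, the lexer itself never needs to know what is in scope.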

My question is: couldn't you use this idea in an infix language, and generalize it to work for all infix operators? This would allow the use of other common operators such as -, +, *, /, <, and >, all of which I've loved being able to use in identifier names in Scheme, and all of which I also really like being able to use as infix operators.
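Here is a sketch of the generalized rule in Python. It assumes single-character operators `+ - * / < > .`, a lexer that hands over each whitespace-free run as one lexeme, and a scope given as a plain set of bound names; `split_lexeme` is a hypothetical helper. Instead of a strict all-or-nothing test, it prefers the longest bound name at each position, so a partly bound lexeme still resolves sensibly.

```python
import re

# Assumed operator characters; each is treated as a one-character token.
OPERATOR_SPLIT = re.compile(r"([+\-*/<>.])")


def split_lexeme(lexeme: str, scope: set[str]) -> list[str]:
    """Split a whitespace-free lexeme, letting bound names win over operators."""
    # Finest split: runs of name characters and single operator characters.
    # The capturing group makes re.split keep the operators; drop empty strings.
    parts = [p for p in OPERATOR_SPLIT.split(lexeme) if p]
    tokens: list[str] = []
    k = 0
    while k < len(parts):
        # Look for the longest run of two or more parts, starting at k,
        # that is bound as a single identifier.
        for m in range(len(parts), k + 1, -1):
            candidate = "".join(parts[k:m])
            if candidate in scope:
                tokens.append(candidate)
                k = m
                break
        else:
            # Nothing longer is bound: emit this part on its own.
            tokens.append(parts[k])
            k += 1
    return tokens


scope = {"list->vector", "x-y", "z"}
print(split_lexeme("list->vector", scope))  # ['list->vector']
print(split_lexeme("x-y-z", scope))         # ['x-y', '-', 'z']
print(split_lexeme("a+b", scope))           # ['a', '+', 'b']
```

Multi-character operators such as `<=` would need a further rule; in this sketch an unbound `list->vector` simply splits into `list`, `-`, `>`, `vector`.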


Jay McCarthy said...

Do it! It's easy to imagine how to make it expand properly with #%top and #%variable-reference.

Sjoerd Visscher said...

I have this in my toy language. I love it, as it forces you to use white-space around infix operators, which just looks a lot nicer. In fact, in this language you can use any expression as an infix operator. For example, you can construct a binary tree like this: (leaf branch(1) leaf) branch(2) (leaf branch(3) leaf). Here branch is a constructor with 3 arguments.

Dave Herman said...

Jay: I meant it for infix languages, not for Scheme. I mean, Javadot makes sense in the context of Scheme, but once you start having multiple operators, wouldn't you have to write a general-purpose infix parser?

Sjoerd: When you say "forces" do you mean always? My thought was that it's a nicer compromise that you don't have to put whitespace in unless there's an ambiguity with an existing bound variable. This isn't expressible directly in BNF, but you could just lex it as a multi-part lexeme and let a later pass, after scope analysis, decide what to do with it.
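One way to picture the two passes, as a Python sketch over a hypothetical toy language whose only form is `let NAME = EXPR`: the lexer splits on whitespace alone and never consults scope, and a later pass, walking the bindings in order, decides whether each multi-part lexeme is a variable or an operator expression. The language, `lex`, `split_unless_bound`, and `resolve` are all illustrative assumptions.

```python
import re

OPERATOR_SPLIT = re.compile(r"([+\-*/<>.])")


def lex(line: str) -> list[str]:
    # Pass 1 needs no scope: whitespace is the only hard token boundary, so
    # "x-y" survives as one (possibly multi-part) lexeme.
    return line.split()


def split_unless_bound(lexeme: str, scope: set[str]) -> list[str]:
    # All-or-nothing: a bound lexeme is a variable, otherwise split at operators.
    if lexeme in scope:
        return [lexeme]
    return [p for p in OPERATOR_SPLIT.split(lexeme) if p]


def resolve(program: list[str]) -> list[tuple[str, list[str]]]:
    """Pass 2: walk `let NAME = EXPR` lines in order, growing the scope."""
    scope: set[str] = set()
    result = []
    for line in program:
        keyword, name, equals, *expr = lex(line)
        assert keyword == "let" and equals == "=", f"unsupported line: {line!r}"
        tokens = [t for lexeme in expr for t in split_unless_bound(lexeme, scope)]
        result.append((name, tokens))
        scope.add(name)  # a binding position is never split
    return result


program = [
    "let x = 1",
    "let y = 2",
    "let z = x-y",    # x-y is unbound here, so it is x - y
    "let x-y = 10",
    "let w = x-y",    # now x-y is a variable reference
]
for name, tokens in resolve(program):
    print(name, tokens)
```

The lexeme `x-y` resolves differently on the `z` and `w` lines only because the scope changed in between; the lexer's output is the same in both cases.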

Sjoerd Visscher said...

Yes, I mean always. It is really helpful to be able to read code without any knowledge of bindings. (Both for the parser and the human reader.)

Don't you hate code where there's no white-space around operators? I certainly do.

Dave Herman said...

Um, then you don't "have this in [your] toy language." You in fact disagree with my post.

Sjoerd Visscher said...

The point I picked up from your post was this: "This would allow the use of other common operators such as -, +, *, /, <, and >, all of which I've loved being able to use in identifier names in Scheme, and all of which I also really like being able to use as infix operators."

This is what my toy language has.

But I guess your point was "allowing scope to trump lexical splitting".

Dave Herman said...

Yeah. Maybe I'm being pedantic, but in my head the interesting design question was whether this hybrid approach is a sweet spot.

Certainly languages like C, where scope is required just to parse, tend to be pretty glorious botches. But in this case, you can still parse, you just have to do a little more elucidation at the point of lexical analysis.

Your point about readability is well-taken; readers would have to know that


could either be an identifier or a binary expression. OTOH, I suspect arithmetic expressions could get pretty unwieldy if you have to have spaces everywhere. But maybe not.
