The first part of this note postulates a set of requirements for a mechanism for handling pragmas. The requirements are rather restrictive, which vastly simplifies the problem. The second part describes three different designs we have been considering and recommends one of them for further exploration.
Syntax pragmas can appear only in certain grammatical contexts; they become part of the actual syntax tree. Three Java syntactic categories need to be expanded to allow syntax pragmas: member declarations (that is, class and interface members), statements, and modifiers for declarations of classes, members, and local variables. (When a declaration declares multiple entities at once, for example, "int x, y;", modifier pragmas may apply to all of the entities; there need not be syntax for separately modifying each entity in the declaration.)
A pragma may not be split across multiple comments. However, a single comment must be allowed to contain more than one syntactic pragma of the same category (for example, more than one modifier pragma).
Consider the following fragment:

    ...
    while (...)
      //@ assert ...
      x = x + 1;
    ...

If the parser parsed the while statement such that the while body consisted simply of the assert statement, then the take-out principle would be violated. Such a violation would surprise the user of a tool like ESC/Java, because the program being checked would differ from the program that gets compiled. (In this particular case, one solution would be for the parser to treat the fragment above as a syntax error, insisting that the programmer use curly braces when the body of a while statement contains an assert.) We believe the above requirements are sufficient for the current ESC/Java annotation language. They also seem adequate for Mr. Java Spidey and for a Javadoc-like tool.
We have also identified the following "nice-to-have" feature. Modifier pragmas should be allowed just before the ; terminating field and abstract method declarations; they should also be allowed just before the { of a method body. The idea is to put annotations for member declarations after the header of the declaration; otherwise, one can end up wading through many lines of modifiers before getting to see the name of a method (this is a problem with Javadoc, and could be for ESC method specs too).
The following set of rules could be used to associate pragmas with nodes. We illustrate them for three productions:

    Expr ::= Literal
    Expr ::= Expr OP Expr
    Stmt ::= 'if' Expr Stmt 'else' Stmt ';'

For a literal expression, all pragmas appearing after the token preceding the literal and before the literal itself are associated with the literal; pragmas appearing after the literal are left for the syntactic phrase that follows. For a binary expression, pragmas appearing before the left and right subexpressions are picked up by those subexpressions; pragmas appearing after the left subexpression but before the OP are associated with the binary expression itself. For an if statement, pragmas appearing before the condition expression and the substatements are associated with those subphrases; pragmas appearing before the if, the else, and the trailing ; are associated with the if statement itself.
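The rules for the two expression productions can be sketched as follows. This is a minimal, hypothetical Java sketch: the token kinds (`NUM`, `OP`, `PRAGMA`) and the classes `Tok`, `Node`, `Lit`, `Bin` and `DecoratingParser` are invented for illustration, pragmas are modelled as plain strings, and the grammar is restricted to a left-associative chain of literals so that the association is unambiguous. It shows a literal picking up the pragmas in front of it, a binary expression picking up those between its left operand and the OP, and trailing pragmas being left in the token stream for whatever phrase follows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token: kind is "NUM", "OP" or "PRAGMA".
record Tok(String kind, String text) {}

// Hypothetical parse-tree node, decorated with the pragmas associated with it.
abstract class Node {
    final List<String> pragmas = new ArrayList<>();
}

final class Lit extends Node {
    final String value;
    Lit(String value) { this.value = value; }
    @Override public String toString() { return value + pragmas; }
}

final class Bin extends Node {
    final Node left, right;
    final String op;
    Bin(Node left, String op, Node right) { this.left = left; this.op = op; this.right = right; }
    @Override public String toString() { return "(" + left + " " + op + pragmas + " " + right + ")"; }
}

final class DecoratingParser {
    private final List<Tok> toks;
    private int pos = 0;

    DecoratingParser(List<Tok> toks) { this.toks = toks; }

    int position() { return pos; }

    private boolean at(String kind) { return pos < toks.size() && toks.get(pos).kind().equals(kind); }

    // Consume the run of pragma tokens at the current position.
    private List<String> pragmas() {
        List<String> out = new ArrayList<>();
        while (at("PRAGMA")) out.add(toks.get(pos++).text());
        return out;
    }

    // Expr ::= Literal : the literal picks up the pragmas in front of it.
    private Node literal() {
        List<String> before = pragmas();
        if (!at("NUM")) throw new IllegalStateException("literal expected at token " + pos);
        Lit lit = new Lit(toks.get(pos++).text());
        lit.pragmas.addAll(before);
        return lit;
    }

    // Expr ::= Expr OP Expr, restricted here to a left-associative chain of literals.
    Node expr() {
        Node left = literal();
        while (true) {
            int save = pos;
            List<String> beforeOp = pragmas();
            if (!at("OP")) {
                pos = save;   // pragmas after the expression are left for the next phrase
                return left;
            }
            String op = toks.get(pos++).text();
            Bin bin = new Bin(left, op, literal());
            bin.pragmas.addAll(beforeOp);   // pragmas between the left operand and OP
            left = bin;
        }
    }
}
```

Parsing `PRAGMA(a) 1 PRAGMA(b) + PRAGMA(c) 2 PRAGMA(d)` with this sketch attaches `a` to `1`, `b` to the `+` node and `c` to `2`, and leaves `d` unconsumed for the following phrase.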
As these examples illustrate, it may not be too hard to define a complete set of rules for associating pragmas with parse-tree nodes by applying simple rules of thumb like "subphrases pick up the pragmas in front of them but leave behind pragmas after them." However, this approach seems overly general given the requirements listed above. Further, it would require a traversal of the parse tree to move syntactic pragmas from their initial positions as decorations to their final positions as parts of the syntax tree. As a result, this approach is currently out of favor.
So far, the behavior of the scanner and parser described above is completely conventional. Here is how pragma parsing differs from the convention. When the scanner creates the "pragma token," it includes in that token a pointer to a correlated input stream whose contents consist of the contents of the pragma comment. Inside the pragma-parsing method, this input stream is turned into a lexer object from which the pragma is parsed. (An alternative to introducing this input stream would be to have the original scanner continue lexing inside the pragma text. However, such a design is not very graceful when the lexical language inside pragmas is different from Java's lexical language.)
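The pragma-token mechanism can be illustrated with a small sketch. Everything here is hypothetical: `PragmaToken`, `MiniScanner`, `PragmaLexer` and `PragmaParsing` are invented names, pragma comments are assumed to begin with `//@`, and the pragma language's lexical rules are assumed to differ from Java's in one toy respect (`==>` is a single token). The scanner only wraps the comment text in a `java.io.StringReader`; the pragma-parsing method then builds its own lexer over that stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pragma token carrying a correlated stream over the comment's contents.
final class PragmaToken {
    final Reader body;
    PragmaToken(Reader body) { this.body = body; }
}

final class MiniScanner {
    // Hypothetical: turns a "//@ ..." comment into a pragma token; ordinary comments yield null.
    static PragmaToken scanComment(String comment) {
        if (!comment.startsWith("//@")) return null;
        return new PragmaToken(new StringReader(comment.substring(3)));
    }
}

// Hypothetical lexer for the pragma language; unlike Java, "==>" is one token.
final class PragmaLexer {
    private final String text;
    private int pos = 0;

    PragmaLexer(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int ch; (ch = in.read()) != -1; ) sb.append((char) ch);
        text = sb.toString();
    }

    // Returns the next token, or null at the end of the pragma text.
    String next() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos >= text.length()) return null;
        if (text.startsWith("==>", pos)) { pos += 3; return "==>"; }
        int start = pos;
        if (Character.isJavaIdentifierPart(text.charAt(pos))) {
            while (pos < text.length() && Character.isJavaIdentifierPart(text.charAt(pos))) pos++;
        } else {
            pos++;   // any other character is a one-character token
        }
        return text.substring(start, pos);
    }
}

final class PragmaParsing {
    // The pragma-parsing method turns the token's stream into a fresh lexer.
    static List<String> lexPragma(PragmaToken t) {
        try {
            PragmaLexer lx = new PragmaLexer(t.body);
            List<String> out = new ArrayList<>();
            for (String tok; (tok = lx.next()) != null; ) out.add(tok);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For `//@ requires a ==> b;` this yields the tokens `requires`, `a`, `==>`, `b` and `;`, whereas a Java lexer would split `==>` into `==` and `>`. Keeping the two lexers separate is what makes such differences easy to accommodate.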
When it comes to integrating the results of the pragma-parsing methods into the parse tree, a convenient mechanism already exists. Syntactic pragmas all appear in contexts where a sequence of them is always allowed (for example, a modifier pragma can be followed by another modifier pragma, and a member-declaration pragma can be followed by another member-declaration pragma). The parser handles sequences of phrases as follows: as each member of the sequence is parsed, it is pushed onto a stack; when the sequence is over, all members of the sequence are popped off the stack into an array. Thus, all the pragma-parsing methods need to do to integrate with the parser's tree-building mechanism is push the pragmas they parse onto this stack. No change to the existing parser is required for it to pick up these nodes and integrate them into the tree; the only change needed is to have the parser call the pragma-parsing methods when it encounters a pragma token.
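The following sketch shows why the pragma-parsing methods need only push onto the parser's sequence stack. It is a hypothetical illustration, not the parser's actual code: `SeqParser` and its methods are invented, tokens are plain strings, and a pragma token is modelled as a string beginning with `//@` in which each word is one modifier pragma. A single comment may therefore contribute several modifier pragmas, as the requirements demand.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of sequence parsing with a shared stack.
final class SeqParser {
    private static final Set<String> JAVA_MODIFIERS = Set.of("public", "private", "static", "final");
    private final Deque<String> stack = new ArrayDeque<>();   // shared sequence stack
    private final List<String> toks;
    private int pos = 0;

    SeqParser(List<String> toks) { this.toks = toks; }

    // Pragma tokens are modelled as strings starting with "//@".
    private static boolean isPragma(String t) { return t.startsWith("//@"); }

    // Pragma-parsing method: pushes one node per modifier pragma in the comment.
    private void parseModifierPragmas(String comment) {
        for (String p : comment.substring(3).trim().split("\\s+"))
            if (!p.isEmpty()) stack.push("pragma:" + p);
    }

    // Sequence parsing as in the existing parser; the only addition is the pragma branch.
    String[] parseModifiers() {
        int mark = stack.size();   // members of enclosing sequences stay below this mark
        while (pos < toks.size()) {
            String t = toks.get(pos);
            if (JAVA_MODIFIERS.contains(t)) stack.push(t);
            else if (isPragma(t)) parseModifierPragmas(t);
            else break;
            pos++;
        }
        // Pop this sequence's members into an array, restoring source order.
        String[] seq = new String[stack.size() - mark];
        for (int i = seq.length - 1; i >= 0; i--) seq[i] = stack.pop();
        return seq;
    }
}
```

Given the tokens `public`, `//@ non_null pure`, `static`, `int`, the sketch returns `public`, `pragma:non_null`, `pragma:pure`, `static` and stops at `int`. The code that pops the stack into an array is unaware that some members came from a pragma comment.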
Here are the detailed steps a programmer must take to add a pragma language to a front end under this approach:
When the scanner is created, it is given an instance of the PragmaParser interface, which it will use for parsing pragmas. When the scanner encounters a comment, it bundles the text of that comment into a correlated input stream and passes it to the PragmaParser instance. The lexer then calls a method on the PragmaParser to parse the first supertoken and returns the result to the parser. The next time the lexer is called, it first checks the PragmaParser to see whether there are more supertokens; if so, the next one is returned; if not, the lexer continues scanning the Java text that follows the pragma comment.
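A minimal sketch of the supertoken approach follows. The note names the `PragmaParser` interface but not its methods, so `startPragma` and `nextSupertoken` are assumptions, as are the classes `WordPragmaParser` (which turns each word of a pragma comment into one supertoken) and `SupertokenLexer`. For brevity, Java tokens are assumed to be whitespace-separated words, and only `/*@ ... */` comments are handed to the pragma parser.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// PragmaParser is named in the note; these two methods are assumed for illustration.
interface PragmaParser {
    void startPragma(Reader comment);   // receive the correlated stream for one comment
    String nextSupertoken();            // next supertoken, or null when the comment is exhausted
}

// Hypothetical pragma parser: each whitespace-separated word becomes one supertoken.
final class WordPragmaParser implements PragmaParser {
    private final List<String> pending = new ArrayList<>();

    public void startPragma(Reader comment) {
        StringBuilder sb = new StringBuilder();
        try {
            for (int ch; (ch = comment.read()) != -1; ) sb.append((char) ch);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pending.clear();
        for (String w : sb.toString().trim().split("\\s+"))
            if (!w.isEmpty()) pending.add("PRAGMA(" + w + ")");
    }

    public String nextSupertoken() { return pending.isEmpty() ? null : pending.remove(0); }
}

// Hypothetical lexer that interleaves Java tokens with supertokens from pragma comments.
final class SupertokenLexer {
    private final String src;
    private final PragmaParser pragmas;
    private int pos = 0;

    SupertokenLexer(String src, PragmaParser pragmas) { this.src = src; this.pragmas = pragmas; }

    // Returns the next token or supertoken, or null at end of input.
    String next() {
        // First check whether the current pragma comment has more supertokens.
        String st = pragmas.nextSupertoken();
        if (st != null) return st;
        while (true) {
            while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
            if (pos >= src.length()) return null;
            if (src.startsWith("/*@", pos)) {
                int end = src.indexOf("*/", pos + 3);
                if (end < 0) throw new IllegalStateException("unterminated pragma comment");
                pragmas.startPragma(new StringReader(src.substring(pos + 3, end)));
                pos = end + 2;
                st = pragmas.nextSupertoken();
                if (st != null) return st;
                continue;   // empty pragma comment: keep scanning the Java text
            }
            int start = pos;
            while (pos < src.length() && !Character.isWhitespace(src.charAt(pos))
                    && !src.startsWith("/*@", pos)) pos++;
            return src.substring(start, pos);
        }
    }
}
```

On the input `int x /*@ non_null pure */ ;` this lexer returns `int`, `x`, `PRAGMA(non_null)`, `PRAGMA(pure)` and `;`, then null. The parser sees the pragmas only as ordinary-looking tokens, which is the point of the supertoken design.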
The detailed steps a programmer must take to add a pragma language to a front end under the supertoken approach are largely the same as the steps under the conventional approach: