ESCJ 24: Astgen Manual

Compaq Confidential.

Last modified: September 22, 1998

The astgen tool reads in a file containing annotated, partial implementations of AST classes and writes full implementations of those classes, putting each in its own source file. It also outputs two auxiliary classes. Using the generator yields a description of the AST classes that is more manageable than the full implementation would be, because it is in a single file and because it is smaller by an order of magnitude. The generator also makes it easy to change an AST hierarchy and the code inside AST classes.

The input to astgen looks very much like a set of Java class declarations. These declarations are annotated with Java comments containing pragmas understood by astgen. The input must use Java's lexical structure and must conform to the following grammar:

 PackageDeclaration_opt ImportDeclarations_opt EndHeader ClassDeclaration*

where the non-terminals other than EndHeader are defined in the Java Language Specification. EndHeader is a Java single-line comment that starts with //#, followed by whitespace and then the keyword EndHeader (astgen treats the keyword case-sensitively). If a ClassDeclaration in an astgen input file has a superclass, the declaration of that superclass must appear earlier in the input file.

Given such an input file, astgen does the following:

All text (including comments and whitespace) before the EndHeader directive is read as the "generic header." It is meant to include a package declaration and imports that apply to every AST class specified in the input file.
For each ClassDeclaration named C, a file named C.java is created. The generic header is written to this file. Then all text of C, including whitespace and comments, is copied into this file. (This text includes everything up to and including the closing brace (}) of C, plus any whitespace after that brace up to and including the first newline.) Along the way, pragmas in C may be expanded, as described below. Also, a number of "boilerplate" members are generated into C.
After all class declarations are processed, some auxiliary files are generated, as described below.
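The first step, separating the generic header from the class declarations, can be sketched in Java as below. This is not astgen's source: the class HeaderSplitter and its method splitAtEndHeader are hypothetical, and the sketch assumes that the directive line itself is dropped and that only spaces or tabs may separate //# from the keyword.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits astgen-style input into the generic header
// and the class declarations that follow the //# EndHeader directive.
final class HeaderSplitter {
    // Optional indentation, "//#", spaces or tabs, then the case-sensitive keyword.
    private static final Pattern END_HEADER =
        Pattern.compile("^[ \\t]*//#[ \\t]+EndHeader\\b.*$", Pattern.MULTILINE);

    /** Returns {header, body}; the header is all text before the directive line. */
    static String[] splitAtEndHeader(String input) {
        Matcher m = END_HEADER.matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no //# EndHeader directive found");
        }
        String header = input.substring(0, m.start());
        // Skip the rest of the directive line, including its line terminator.
        int bodyStart = m.end();
        if (bodyStart < input.length() && input.charAt(bodyStart) == '\r') bodyStart++;
        if (bodyStart < input.length() && input.charAt(bodyStart) == '\n') bodyStart++;
        return new String[] { header, input.substring(bodyStart) };
    }
}
```

The header string would then be written at the top of every generated file, including the auxiliary ones.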

Output

The tool outputs one .java file for each ClassDeclaration in the input file. As discussed above, these per-class .java files consist mostly of the generic header appended to the text of the class declarations, plus some boilerplate methods generated automatically. "Child fields" of an AST node are declared using pragmas (pragmas are described in the next section). Child fields are public fields pointing to what should be the children of an AST node. The child fields of a class declaration play an important role in the generation of the boilerplate for the declaration.

In addition to the per-class .java files, the tool outputs a file called SubTagConstants.java and another called Visitor.java. These files support the boilerplate code generated for the per-class .java files.

The bullet points below describe the boilerplate methods generated plus the two support files for them.

Object construction. Classes generated by astgen are meant to be instantiated via static "maker" methods rather than through constructors. This convention allows one to intern certain AST nodes, for example, the node representing the type int. To keep clients from directly instantiating AST classes, astgen generates, for each non-abstract class, a protected constructor that takes no arguments.

For each non-abstract class C, astgen generates a public, static method named make. The default implementation of make returns a newly allocated instance of C whose child fields are filled in from the corresponding arguments. (This default can be overridden using pragmas.) The make method takes an argument for each child field of C, including child fields inherited from the superclass of C. The arguments are ordered as follows: superclass child fields come before subclass child fields; within a class, child-field arguments follow the order of the pragmas defining the child fields.
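For illustration, suppose the input declares an abstract class Expr containing the pragma //# int loc, and a subclass CallExpr containing //# Expr fn followed by //# Expr arg (all hypothetical names). The construction boilerplate for CallExpr might then look roughly like this sketch; astgen's literal output may differ, and the other generated members are omitted.

```java
// Hypothetical abstract superclass; its only child field comes from "//# int loc".
abstract class Expr {
    public int loc;
}

// Hypothetical subclass with child fields from "//# Expr fn" and "//# Expr arg".
class CallExpr extends Expr {
    public Expr fn;
    public Expr arg;

    // Protected, no-argument constructor: clients are expected to call make.
    protected CallExpr() {}

    // Inherited child fields come first, then local ones in pragma order.
    public static CallExpr make(int loc, Expr fn, Expr arg) {
        CallExpr result = new CallExpr();
        result.loc = loc;
        result.fn = fn;
        result.arg = arg;
        return result;
    }
}
```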

Tags. To support the use of switch statements to distinguish different AST node types, astgen outputs an instance method getTag in each non-abstract class and also outputs a constant field declaration in SubTagConstants.

In non-abstract class Name, the automatically-generated getTag method returns the constant TagConstants.NAME (note the change to all-caps). In SubTagConstants, a constant int field named NAME is also generated. The intent is for the (user-written) type TagConstants to extend the automatically-generated interface SubTagConstants.

The SubTagConstants interface is not public (i.e., it has package-level access). The generic header is also prepended to SubTagConstants.java, which puts the interface into the same package as the other classes generated for a given input file.
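A sketch of the tag machinery for two hypothetical node classes, Name and CallExpr, follows. The numeric tag values and the root class ASTNode are assumptions; this manual does not say how astgen assigns the values. Because the constants are compile-time constants, clients can switch on getTag().

```java
// Would be generated into SubTagConstants.java (package-level access).
// The values 1 and 2 are assumptions made for this sketch.
interface SubTagConstants {
    int NAME = 1;
    int CALLEXPR = 2;
}

// User-written; extends the generated interface and may add further tags.
interface TagConstants extends SubTagConstants {
}

// Hypothetical abstract root class.
abstract class ASTNode {
    public abstract int getTag();
}

class Name extends ASTNode {
    public int getTag() { return TagConstants.NAME; }      // generated
}

class CallExpr extends ASTNode {
    public int getTag() { return TagConstants.CALLEXPR; }  // generated
}

// Client code dispatching on node type with a switch statement.
class TagDemo {
    static String describe(ASTNode n) {
        switch (n.getTag()) {
            case TagConstants.NAME:     return "name";
            case TagConstants.CALLEXPR: return "call";
            default:                    return "other";
        }
    }
}
```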

Visitors. To support the use of visitors for traversing ASTs, the code generator outputs an instance method accept and also outputs an abstract class Visitor which is the superclass for all visitors.

The accept method for class Name looks like the following:

 public void accept(Visitor v) { v.visitName(this); }


For every class Name (abstract and non-abstract), a visitName method is generated in Visitor. This method takes an argument of type Name and returns void. If Name has no explicitly declared superclass, then visitName is an abstract method. If there is a superclass named Name2, then an implementation of visitName is given which looks like

 public void visitName(Name o) { visitName2(o); }
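The following sketch shows how the generated accept methods and the Visitor class fit together for a hypothetical hierarchy: ASTNode (no declared superclass), abstract Expr extends ASTNode, and Name extends Expr. In astgen's output Visitor would be in its own file. ExprCounter is a hypothetical user-written visitor that overrides only visitExpr and therefore also sees every Name, because visitName forwards to visitExpr.

```java
// Would be generated into Visitor.java.
abstract class Visitor {
    // ASTNode has no explicitly declared superclass, so this one is abstract.
    public abstract void visitASTNode(ASTNode o);
    // Each other visit method forwards to the visit method of the superclass.
    public void visitExpr(Expr o) { visitASTNode(o); }
    public void visitName(Name o) { visitExpr(o); }
}

// Abstract root class: gets an abstract version of accept.
abstract class ASTNode {
    public abstract void accept(Visitor v);
}

abstract class Expr extends ASTNode { }

class Name extends Expr {
    public void accept(Visitor v) { v.visitName(this); }  // generated
}

// Hypothetical user-written visitor that counts expressions.
class ExprCounter extends Visitor {
    int exprs = 0;
    public void visitASTNode(ASTNode o) { /* ignore non-expressions */ }
    public void visitExpr(Expr o) { exprs++; }
}
```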


Children. To help in traversing an AST, two methods, childCount and childAt, are generated into non-abstract classes for counting and extracting the (direct) children of a node.

As with the maker methods, "children" here includes children inherited from superclasses, and children are ordered the same way as the maker's arguments. However, there are two twists. First, children of primitive type, such as int, are not counted by childCount or returned by childAt. Second, the pragmas defining child fields allow for the specification of child fields that are really sets of children; we call these "vector children." The childCount method counts each member of a vector child as a separate child; similarly, the indices accepted by childAt number each member of a vector child as a separate child.
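To make the counting rules concrete, here is a sketch of childCount and childAt for a hypothetical class CallExpr whose children are declared as //# Expr fn and //# Expr* args, with an inherited primitive child int loc. ExprVec is a hypothetical minimal stand-in for a ???Vec class, and returning Object from childAt is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a ???Vec class.
final class ExprVec {
    private final List<Expr> elems = new ArrayList<>();
    ExprVec add(Expr e) { elems.add(e); return this; }
    int size() { return elems.size(); }
    Expr elementAt(int i) { return elems.get(i); }
}

abstract class Expr {
    public int loc;  // primitive child: neither counted nor returned
    public abstract int childCount();
    public abstract Object childAt(int index);
}

class Leaf extends Expr {
    public int childCount() { return 0; }
    public Object childAt(int index) { throw new IndexOutOfBoundsException("index: " + index); }
}

class CallExpr extends Expr {
    public Expr fn;       // "//# Expr fn"
    public ExprVec args;  // "//# Expr* args"

    // fn counts once; each element of args counts separately.
    public int childCount() {
        return 1 + (args == null ? 0 : args.size());
    }

    // Index 0 is fn; indices 1..n are the elements of args.
    public Object childAt(int index) {
        if (index == 0) return fn;
        int i = index - 1;
        if (i >= 0 && args != null && i < args.size()) return args.elementAt(i);
        throw new IndexOutOfBoundsException("index: " + index);
    }
}
```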

Invariant checking. For all classes (abstract and non-abstract), a method called check is generated for dynamically checking invariants of a node. This method takes no arguments and returns void. This method first calls super.check, then performs checks on each locally defined child field whose type is not a primitive type. The default per-field checks are (a) to ensure that the field is not null and (b) to call the check method on the object in the field. (These checks can be changed via pragmas; see below.) For vector children, the above checks are done on each member of the vector.
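A sketch of the default check method for a hypothetical CallExpr node follows. How a failed check is reported is not specified in this manual; the sketch assumes an IllegalStateException. It also assumes that the root class ends the super.check() chain and that a null vector is itself an error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a ???Vec class.
final class ExprVec {
    private final List<Expr> elems = new ArrayList<>();
    ExprVec add(Expr e) { elems.add(e); return this; }
    int size() { return elems.size(); }
    Expr elementAt(int i) { return elems.get(i); }
}

// Hypothetical root; assumed to end the super.check() chain.
class Expr {
    public int loc;                        // primitive: no checks generated
    public void check() { }
}

class Leaf extends Expr {
    public void check() { super.check(); } // no non-primitive local children
}

class CallExpr extends Expr {
    public Expr fn;       // "//# Expr fn"
    public ExprVec args;  // "//# Expr* args"

    public void check() {
        super.check();
        // Default checks for fn: (a) non-null, (b) recursive check.
        if (fn == null) throw new IllegalStateException("CallExpr.fn is null");
        fn.check();
        // Same checks applied to each member of the vector child.
        if (args == null) throw new IllegalStateException("CallExpr.args is null");
        for (int i = 0; i < args.size(); i++) {
            Expr e = args.elementAt(i);
            if (e == null) throw new IllegalStateException("CallExpr.args[" + i + "] is null");
            e.check();
        }
    }
}
```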
Debug presentation. For non-abstract classes, a toString instance method is generated that returns a String representation suitable for debugging output.

The discussion above suggests that many of the above methods are generated only in non-abstract classes. This is not exactly true. As mentioned earlier, if a ClassDeclaration in the input to the tool has a superclass, the declaration of that superclass must appear earlier in the input file. This implies that every input file declares a set of "root" classes that are superclasses of all the other, non-root classes declared in the file. If a class is both abstract and one of these root classes, then the tool generates into it abstract versions of the methods listed above (that is, versions without implementations). As a result, the methods described above can be called on any AST node, not just on instances of concrete classes.
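Because an abstract root class declares these methods, generic code can be written once against the root type. The sketch below, with hypothetical classes ASTNode, Leaf, and Pair and a hypothetical helper TreeUtil.size, counts the nodes of a tree using only childCount and childAt. Returning Object from childAt is again an assumption.

```java
// Hypothetical abstract root: only the child-access methods are shown.
abstract class ASTNode {
    public abstract int childCount();
    public abstract Object childAt(int index);
}

class Leaf extends ASTNode {
    protected Leaf() {}
    public static Leaf make() { return new Leaf(); }
    public int childCount() { return 0; }
    public Object childAt(int index) { throw new IndexOutOfBoundsException("index: " + index); }
}

class Pair extends ASTNode {
    public ASTNode left;
    public ASTNode right;

    protected Pair() {}
    public static Pair make(ASTNode left, ASTNode right) {
        Pair result = new Pair();
        result.left = left;
        result.right = right;
        return result;
    }

    public int childCount() { return 2; }
    public Object childAt(int index) {
        switch (index) {
            case 0: return left;
            case 1: return right;
            default: throw new IndexOutOfBoundsException("index: " + index);
        }
    }
}

// Generic traversal that works for any node type in the hierarchy.
final class TreeUtil {
    static int size(ASTNode n) {
        int total = 1;
        for (int i = 0; i < n.childCount(); i++) {
            Object child = n.childAt(i);
            if (child instanceof ASTNode) total += size((ASTNode) child);  // skips nulls
        }
        return total;
    }
}
```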

Pragmas

Inside a ClassDeclaration, between member declarations, astgen recognizes a number of pragmas which either generate member declarations or control the output of boilerplate members.

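The syntax of a pragma is a Java single-line comment on a line by itself. Pragma comments are distinguished by starting with //#. Inside class C, each pragma of the following form defines a child field of C:

"//# Type [*] Identifier {NullOK|NoCheck}*".

The optional * marks a vector child, that is, a child field that holds a set of children rather than a single child. For example, the pragmas

 //# Name1 id1
 //# Name2* id2

cause astgen to declare the fields

 public Name1 id1;
 public Name2Vec id2;

That is, a vector child whose element type is Name2 is declared with type Name2Vec. Classes of the form ???Vec are not among the files astgen outputs, so they must be supplied separately.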

The NullOK and NoCheck keywords control the checking that the check method performs for the field. The first suppresses the check that the child field is not null; the second suppresses the call to the child's check method.
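The effect of the two keywords on a generated check method can be sketched as follows, for a hypothetical class VarDecl declared with //# Type type NoCheck and //# Expr init NullOK. The sketch assumes that a NullOK child is still checked recursively when it is present, and it uses a checked flag only so the behavior can be observed.

```java
// Minimal stand-ins; each records whether its check method ran.
class Node {
    boolean checked = false;
    public void check() { checked = true; }
}
class Expr extends Node { }
class Type extends Node { }

// Hypothetical class with the child-field pragmas
//   //# Type type NoCheck
//   //# Expr init NullOK
class VarDecl extends Node {
    public Type type;
    public Expr init;

    public void check() {
        super.check();
        // NoCheck: the non-null check remains, but type.check() is not called.
        if (type == null) throw new IllegalStateException("VarDecl.type is null");
        // NullOK: no non-null check; init is still checked when present (assumed).
        if (init != null) init.check();
    }
}
```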

The following pragmas apply to a class as a whole rather than to individual fields. They control aspects of boilerplate generation for methods like make and check that are not specific to any one field. The syntax of these pragmas is again a single-line comment beginning with //# and containing a single keyword.

"//# NoMaker".

Inside class C, this declaration suppresses the generation of C's make method, allowing a custom maker to be written instead (or none at all).
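As an example of a custom maker, the sketch below interns nodes, which is the motivation given earlier for using makers instead of constructors. PrimitiveType and its int child field kind are hypothetical; the class is assumed to be declared with //# NoMaker so that the hand-written make replaces the generated one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class declared with "//# NoMaker" and "//# int kind".
class PrimitiveType {
    public int kind;   // child field

    protected PrimitiveType() {}   // still generated; NoMaker only drops make

    // One shared node per kind, so e.g. the type int is represented once.
    private static final Map<Integer, PrimitiveType> interned = new HashMap<>();

    // Hand-written maker used in place of the generated one.
    public static PrimitiveType make(int kind) {
        PrimitiveType t = interned.get(kind);
        if (t == null) {
            t = new PrimitiveType();
            t.kind = kind;
            interned.put(kind, t);
        }
        return t;
    }
}
```

A HashMap keeps the sketch short; a front-end that builds ASTs from several threads would need a concurrent map.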

"//# ManualTag".

Inside class C, this declaration suppresses the generation of C's getTag method, allowing a custom one to be written instead. In our Java front-end, we use this feature to allow us to return different tags for BinaryExpr depending on the expression's operation.
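A hand-written getTag for BinaryExpr might look like the sketch below, assuming (hypothetically) that the node stores its operator as an int child whose values are themselves tags. The tag names ADD and SUB and their values are assumptions.

```java
// Assumed tag values for this sketch only.
interface TagConstants {
    int ADD = 10;
    int SUB = 11;
}

class Expr { }

// Hypothetical input: BinaryExpr with "//# ManualTag", "//# int op",
// "//# Expr left" and "//# Expr right".
class BinaryExpr extends Expr {
    public int op;       // one of the operator tags, e.g. TagConstants.ADD
    public Expr left;
    public Expr right;

    protected BinaryExpr() {}

    public static BinaryExpr make(int op, Expr left, Expr right) {
        BinaryExpr result = new BinaryExpr();
        result.op = op;
        result.left = left;
        result.right = right;
        return result;
    }

    // Hand-written, since ManualTag suppresses the generated getTag:
    // the tag identifies the operator rather than just the class.
    public int getTag() { return op; }
}
```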

"//# PostMakeCall".

Inside class C, this declaration adds the following line to the end of C's automatically-generated make method:

 postMake();

No implementation of postMake is generated. The intent is for the user to write postMake themselves, giving them a hook to customize the initialization of nodes after the child fields are filled in using the arguments to make. In our Java front-end, we use this feature to allow us, in the maker for CompilationUnit, to set the parent pointer of the TypeDecl objects passed as arguments.
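A sketch of the CompilationUnit use case follows. It simplifies the real classes: a java.util.List stands in for a ???Vec vector child, TypeDecl has only a name and a parent field, and the sketch writes the added call as result.postMake(), assuming the hook is an instance method invoked on the node that make has just filled in.

```java
import java.util.List;

class TypeDecl {
    public String name;
    public CompilationUnit parent;   // back pointer, set by CompilationUnit.postMake

    protected TypeDecl() {}
    public static TypeDecl make(String name) {
        TypeDecl result = new TypeDecl();
        result.name = name;
        return result;
    }
}

// Hypothetical class declared with "//# PostMakeCall".
class CompilationUnit {
    public List<TypeDecl> elems;     // stands in for a vector child of TypeDecls

    protected CompilationUnit() {}

    public static CompilationUnit make(List<TypeDecl> elems) {
        CompilationUnit result = new CompilationUnit();
        result.elems = elems;
        result.postMake();           // the call added by PostMakeCall
        return result;
    }

    // Written by the user; astgen emits only the call above.
    private void postMake() {
        for (TypeDecl d : elems) {
            d.parent = this;
        }
    }
}
```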

"//# PostCheckCall".

Similar to PostMakeCall, this declaration adds the following line to the end of automatically-generated check methods:

 postCheck();

As with postMake, no implementation is generated for postCheck, allowing the user to provide their own checking code. In our Java front-end, we use this feature to ensure that a Name has at least one identifier in it.

In place of these class-wide pragmas, an alternative design would have been for astgen to change the code it outputs based on whether a ClassDeclaration contains certain methods. For example, instead of the ManualTag pragma, astgen could generate a getTag method only for classes that do not contain a manually defined getTag method. In the future, we may change to this design (such a change would be backward compatible).
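The Name check mentioned in the PostCheckCall description could be written as a postCheck hook along the lines of this sketch. The class shape is hypothetical: a List of Strings stands in for the vector of identifiers, so there is no recursive check on its elements, and failures are assumed to be reported with IllegalStateException.

```java
import java.util.List;

// Hypothetical class declared with "//# PostCheckCall" and a vector child of identifiers.
class Name {
    public List<String> ids;

    protected Name() {}
    public static Name make(List<String> ids) {
        Name result = new Name();
        result.ids = ids;
        return result;
    }

    public void check() {
        // Default per-field checks (sketch): the vector and its members are non-null.
        if (ids == null) throw new IllegalStateException("Name.ids is null");
        for (String id : ids) {
            if (id == null) throw new IllegalStateException("Name.ids has a null member");
        }
        postCheck();   // the line added by PostCheckCall
    }

    // Written by the user: an invariant the per-field checks cannot express.
    private void postCheck() {
        if (ids.isEmpty()) throw new IllegalStateException("Name must have at least one identifier");
    }
}
```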
