The astgen tool reads a file containing annotated, partial implementations of AST classes and writes full implementations of those classes, putting each in its own source file. It also outputs two auxiliary classes. Describing the AST classes through the generator is more manageable than maintaining the full implementation, because the description lives in a single file and is smaller by an order of magnitude. The generator also makes it easy to change an AST hierarchy and the code inside the AST classes.
The input to astgen looks very much like a set of Java class declarations. These declarations are annotated with Java comments containing pragmas understood by astgen. The input must use Java's lexical language and must conform to the following grammar:
    PackageDeclaration_opt ImportDeclarations_opt EndHeader ClassDeclaration*
where the non-terminals other than EndHeader are defined in the Java Language Specification. The EndHeader is a Java single-line comment starting with //#, followed by whitespace and then the keyword EndHeader (case is significant to astgen). If a ClassDeclaration in an astgen input file has a superclass, the declaration of that superclass must appear earlier in the input file.
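As an illustration of this grammar, a small input file might look like the sketch below. The package name, the import, and the class names (ASTNode, Expr, BinaryExpr) are hypothetical. The two child-field pragmas inside BinaryExpr follow the //# Type id form described later in this document.

```java
// Hypothetical astgen input; every name here is invented for illustration.
package demo.ast;

import java.util.List;

//# EndHeader

// A superclass must be declared before any class that extends it.
abstract class ASTNode {
}

abstract class Expr extends ASTNode {
}

class BinaryExpr extends Expr {
  //# Expr left
  //# Expr right
}
```

Everything above the EndHeader comment is the header; everything below it is the sequence of class declarations.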
Given such an input file, astgen does the following:
In addition to the per-class .java files, the tool outputs a file called SubTagConstants.java and another called Visitor.java. These files support the boilerplate code generated for the per-class .java files.
The bullet points below describe the boilerplate methods generated plus the two support files for them.
For each non-abstract class C, astgen generates a public, static method named make. The default implementation of make returns a newly allocated instance of C. (This default can be overridden using pragmas.) The make method takes an argument for each child field of C, including child fields inherited from the superclass of C. The order of the argument list is as follows: superclass child fields come before subclass child fields; within a class, child-field arguments are ordered according to the order of the pragmas defining the child fields.
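The argument ordering can be illustrated with a minimal sketch of what generated makers might look like. The classes Expr, Literal, and BinaryExpr, their fields, and the protected constructors are assumptions made for this example; the actual output of astgen may differ in detail.

```java
// Sketch of maker output; class names, fields, and constructors are hypothetical.
abstract class Expr {
    public int loc;              // child field declared in the superclass
}

class Literal extends Expr {
    public int value;            // the only child field declared in Literal

    protected Literal() {}

    public static Literal make(int loc, int value) {
        Literal result = new Literal();
        result.loc = loc;
        result.value = value;
        return result;
    }
}

class BinaryExpr extends Expr {
    public Expr left;            // defined by the first child pragma
    public Expr right;           // defined by the second child pragma

    protected BinaryExpr() {}

    // Superclass children come first, then this class's children in pragma order.
    public static BinaryExpr make(int loc, Expr left, Expr right) {
        BinaryExpr result = new BinaryExpr();
        result.loc = loc;
        result.left = left;
        result.right = right;
        return result;
    }
}
```

A caller would then build a tree bottom-up, for example BinaryExpr.make(3, Literal.make(1, 10), Literal.make(2, 20)).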
In non-abstract class Name, the automatically-generated getTag method returns the constant TagConstants.NAME (note the change to all-caps). In SubTagConstants, a constant int field named NAME is also generated. The intent is for the (user-written) type TagConstants to extend the automatically-generated interface SubTagConstants.
The SubTagConstants interface is not public (that is, it has "package" level protection). The generic header is also prepended to SubTagConstants.java, which puts it into the same package as the other classes generated for a given input file.
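The relationship between the tag types can be sketched as follows. The numeric values, the second class Literal, and the extra constant LAST_TAG are assumptions for illustration. TagConstants is shown as an interface, although the text above only calls it a type.

```java
// Generated by astgen in this sketch; the numeric values are assumptions.
interface SubTagConstants {
    int NAME = 0;
    int LITERAL = 1;
}

// User-written. Extending SubTagConstants makes NAME and LITERAL
// reachable as TagConstants.NAME and TagConstants.LITERAL.
interface TagConstants extends SubTagConstants {
    int LAST_TAG = LITERAL;      // hypothetical user-defined constant
}

class Name {
    public int getTag() { return TagConstants.NAME; }
}

class Literal {
    public int getTag() { return TagConstants.LITERAL; }
}
```

Because the constants are inherited, client code can switch on node.getTag() using TagConstants alone.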
The accept method for class Name looks like the following:
    public void accept(Visitor v) { v.visitName(this); }
An accept method is generated only for non-abstract classes.
For every class Name (abstract and non-abstract), a visitName method is generated in Visitor. This method takes an argument of type Name and returns void. If Name has no explicitly declared superclass, then visitName is an abstract method. If there is a superclass named Name2, then an implementation of visitName is given which looks like
    public void visitName(Name o) { visitName2(o); }
The Visitor class is public. The generic header is also prepended to Visitor.java, which puts it into the same package as the other classes generated for a given input file.
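To show how accept and the visit methods work together, here is a small sketch for a hypothetical three-class hierarchy (Node, Expr, Literal). CountingVisitor is a user-written visitor invented for this example. Visitor is left package-private so that the whole sketch fits in one file.

```java
// Sketch of the generated Visitor for a hypothetical hierarchy.
abstract class Visitor {
    // Node has no declared superclass, so its visit method is abstract.
    public abstract void visitNode(Node o);

    // Each other visit method defaults to the superclass's visit method.
    public void visitExpr(Expr o) { visitNode(o); }
    public void visitLiteral(Literal o) { visitExpr(o); }
}

abstract class Node {}

abstract class Expr extends Node {}

class Literal extends Expr {
    // accept is generated only for non-abstract classes.
    public void accept(Visitor v) { v.visitLiteral(this); }
}

// Hypothetical user-written visitor that counts what it sees.
class CountingVisitor extends Visitor {
    int nodes = 0;
    int exprs = 0;

    @Override
    public void visitNode(Node o) { nodes++; }

    @Override
    public void visitExpr(Expr o) {
        exprs++;
        super.visitExpr(o);      // continue up the chain to visitNode
    }
}
```

The default delegation means a visitor only needs to override the methods at the level of the hierarchy it cares about; a Literal that is not handled specifically falls through to visitExpr and then to visitNode.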
As in the case of the maker methods, "children" here includes children inherited from superclasses; also, the order of "children" is defined the same as for makers. However, there are two twists. First, children that are primitive types such as int are not counted by childCount or returned by childAt. Also, the pragmas defining child fields allow for the specification of child fields that are really sets of children; we call these "vector children." The childCount method counts each member of a vector child as a separate child; similarly, the numbering of children for the purpose of defining indices given to childAt also counts each member of a vector child as a separate child.
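The counting rules can be made concrete with a sketch. MethodCall, its fields, and the ExprVec helper are hypothetical. The Object return type of childAt and the exception thrown for a bad index are also assumptions, not details taken from astgen.

```java
import java.util.ArrayList;
import java.util.List;

class Expr {}                    // placeholder node type for this sketch

// Hypothetical user-supplied vector type.
class ExprVec {
    private final List<Expr> elems = new ArrayList<>();
    void addElement(Expr e) { elems.add(e); }
    int size() { return elems.size(); }
    Expr elementAt(int i) { return elems.get(i); }
}

class MethodCall extends Expr {
    public int loc;              // primitive child: skipped by childCount and childAt
    public Expr target;          // ordinary child: index 0
    public ExprVec args;         // vector child: indices 1 through args.size()

    public int childCount() {
        return 1 + args.size();  // each member of args counts separately
    }

    public Object childAt(int index) {
        if (index < 0 || index >= childCount())
            throw new IndexOutOfBoundsException("No child at index " + index);
        if (index == 0) return target;
        return args.elementAt(index - 1);
    }
}
```

With two arguments in args, childCount() is 3 and childAt(2) is the second argument.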
The syntax of a pragma is a Java single-line comment on a line by itself. Pragma comments are distinguished by starting with //#. Inside class C, the following pragma defines the child fields of C:
For example, the pragmas
    //# Name1 id1
    //# Name2* id2
would be translated into the following field declarations:
    public Name1 id1;
    public Name2Vec id2;
It is up to the astgen user to provide definitions of Name2Vec and other ???Vec types referenced by the output of astgen.
The NullOK and NoCheck pragmas control the checking done for the field by the check method. The first suppresses the check that a child field is not null; the second suppresses the call to the child's check method.
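The effect of the two pragmas on a generated check method might look like the sketch below. The classes, the use of IllegalStateException to report a failure, and the null guard around the NullOK child's check call are all assumptions. The timesChecked counter exists only to make the calls observable.

```java
abstract class Node {
    public abstract void check();
}

class Ident extends Node {
    public int timesChecked = 0; // hypothetical counter for this sketch

    @Override
    public void check() { timesChecked++; }
}

class VarDecl extends Node {
    public Ident name;           // ordinary child
    public Ident init;           // child marked NullOK
    public Ident type;           // child marked NoCheck

    @Override
    public void check() {
        // Ordinary child: must be non-null, and its check method is called.
        if (name == null) throw new IllegalStateException("name is null");
        name.check();

        // NullOK: the null check is suppressed; the child is checked if present.
        if (init != null) init.check();

        // NoCheck: the null check remains, but type.check() is not called.
        if (type == null) throw new IllegalStateException("type is null");
    }
}
```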
    postMake();
No implementation of postMake is generated. The intent is for the user to write postMake themselves, giving them a hook to customize the initialization of nodes after the child fields are filled in using the arguments to make. In our Java front-end, we use this feature to allow us, in the maker for CompilationUnit, to set the parent pointer of the TypeDecl objects passed as arguments.
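A minimal sketch of the CompilationUnit use case follows. The field names, the plain array standing in for a vector child type, and the exact placement of the postMake call inside make are assumptions for illustration.

```java
class TypeDecl {
    public CompilationUnit parent;   // filled in by postMake in this sketch
}

class CompilationUnit {
    public TypeDecl[] elems;         // array stands in for a vector child type

    protected CompilationUnit() {}

    // Shape of a maker that ends with a postMake() call.
    public static CompilationUnit make(TypeDecl[] elems) {
        CompilationUnit result = new CompilationUnit();
        result.elems = elems;
        result.postMake();
        return result;
    }

    // Written by the user: runs after the child fields are filled in.
    private void postMake() {
        for (TypeDecl d : elems) {
            d.parent = this;
        }
    }
}
```

Since postMake runs inside make, callers never see a CompilationUnit whose type declarations lack a parent pointer.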
    postCheck();
As with postMake, no implementation is generated for postCheck, allowing the user to provide their own checking code. In our Java front-end, we use this feature to ensure that a Name has at least one identifier in it.
In place of these class-wide pragmas, an alternative design would have been for astgen to change the code it outputs based on whether a ClassDeclaration contains certain methods. For example, instead of the ManualTag pragma, astgen could generate a getTag method only for classes that do not contain a manually-defined getTag method. In the future, we may change to this design (such a change would be backward compatible).
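Returning to the postCheck hook, the Name example could be sketched as follows. The String array used to hold the identifiers, the position of the postCheck call at the end of check, and the use of IllegalStateException are assumptions made for this example.

```java
class Name {
    public String[] ids;         // hypothetical representation of the identifiers

    // Shape of a check method that ends with a postCheck() call.
    public void check() {
        if (ids == null) throw new IllegalStateException("ids is null");
        postCheck();
    }

    // Written by the user: an invariant the generated checks cannot express.
    private void postCheck() {
        if (ids.length == 0)
            throw new IllegalStateException("a Name needs at least one identifier");
    }
}
```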