ESC/Java User Manual ESCJ 21--draft! 26 May 1998 -------------------- This document describes the programming tool ESC/Java, which attempts to statically find common run-time errors in Java programs. This document is supposed to be easy to read and aims at getting ESC/Java users up and running quickly. This means that common situations are described. For information beyond what is described here, please consult the ESC/Java Reference Manual or equivalent. Invoking ESC/Java ~~~~~~~~~~~~~~~~~ The command line interface to ESC/Java resembles that of the javac(5) compiler: escjava sourcefiles* For example, if you would compile a program "Cup.java" with the command line javac Cup.java then you would run ESC/Java on this file with the command line escjava Cup.java Overview ~~~~~~~~ ESC/Java tries to find program points where the given Java program may cause a "checked run-time error". Checked run-time errors include null dereferences, array access errors, division by zero, and type cast errors. Java specifies that the run-time system will detect such run-time errors and turn them into the throwing of an exception, which the Java program can catch at run-time. ESC/Java follows a stricter discipline and will complain if a program may cause an checked run-time error, even if the program actually catches the resulting exception. For example, ESC/Java would complain about a null dereference on the third line of the following program fragment: Object[] a = null; try { int n = a.length; // this will dereference null } catch (NullPointerException e) { } despite the fact that the program fragment includes an exception handler that catches the exception. In addition to checking for checked run-time errors, ESC/Java can be instructed to look for other errors, including concurrency errors like race conditions and deadlocks. ESC/Java is a modular checker. This means that it can check individual parts of a program and does not need global program information. For example, ESC/Java checks the implementation of a class without using the program text of present or future subclasses of that class. Similarly, it checks the implementation of a method without the code that calls the method and without the code that the implementation calls. These virtues of a modular checker can be likened to those of most compilers, which can generate code for a method implementation without knowing the code that calls the method or the code that the method implementation calls. How can a checker be modular? After all, the other parts of a program matter, too. A compiler requires that every method have a declared signature, which contains information like the number of parameters, the types of parameters, and the return type. In laying down target code for a call, for example, the compiler needs only the signature of the callee, not the implementation of the callee. And in an object-oriented program, there is not necessarily a one-to-one mapping from call sites to implementations; nevertheless, compilers can still cope because all possible callees for a given call site have the same signature. A modular checker relies on method specifications and other specifications, similar to the way a compiler uses signatures. A specification is a contract between a caller and a callee, or between a user of a data structure and its supplier. As such, specifications record programmer design decision, which usually cannot be inferred from the program text itself. Therefore, ESC/Java has an annotation language with which a programmer can record various design decisions. From the information provided by annotations, a checker may produce fewer spurious warnings. At the same time, ESC/Java will use the annotations to attempt to find places where the program violates the contracts spelled out. The ESC/Java annotation language has been designed to be simple and useful in revealing errors and determining non-errors in programs. ESC/Java does not catch every program error, nor does every warning it reports indicate an actual program error. The annotation language includes facilities for suppressing particular warning messages without introducing reciprocal checks. A programmer can (and a good engineer will) use these facilities when it appears that the effort involved in writing down further specifications would outweigh the benefit that such specifications will bring. ESC/Java annotations go beyond the type annotations required by Java, but have been designed so that the effort of writing them down is somewhat analogous to that of writing type annotations. In return, the ESC/Java checker can find errors that a type checker never could find. This explains the name ESC/Java: Extended Static Checker for Java. Format of annotations ~~~~~~~~~~~~~~~~~~~~~ ESC/Java annotations, or \pragmas/ as they are also called, are supplied in special Java comments. When the very first character after the "/*" or "//" that begins a Java comment is "@", ESC/Java parses the body of the comment as a sequence ESC/Java pragmas. Such a comment is called a pragma-containing comment. Pragmas fall into four categories: lexical pragmas, statement pragmas, declaration pragmas, and modifier pragmas. All pragmas contained in one pragma-containing comment must be of the same category. A pragma-containing comment with lexical pragmas can occur anywhere a Java comment can occur. A pragma-containing comment with statement pragmas can occur essentially wherever a Java statement can. A pragma-containing comment with declaration pragmas can occur wherever the declaration of a class or interface member can occur. A pragma-containing comment with modifier pragmas can occur anywhere a Java modifier (like "private" or "final") can occur. In addition, for style and readability, such a comment can occur just before the ";" that terminates a variable declaration and just before the "{" that opens a method body. For example, the two declarations /*@ monitored */ int x; and int x /*@ monitored */ ; are equivalent. Similarly, the two declarations /*@ requires p != null */ void m(Object p) { ... } and void m(Object p) /*@ requires p != null */ { ... } are equivalent. Note that the "p" in "p != null" refers to "m"'s parameter, regardless of whether the pragma is placed before or after the method's signature. The order and alternative placement of modifier pragmas is irrelevant. A modifier pragma applies to all identifiers introduced by the declaration. For example, /*@ non_null */ Object x, y, z; and equivalently Object x, y, z /*@ non_null */ ; declare all of "x", "y", and "z" to be "non_null". Note that a pragma-containing comment must be placed either before the type of the declaration or in the alternate position. For example, Object x, y /*@ non_null */, z; is not allowed. Annotations ~~~~~~~~~~~ This section describes the most common ESC/Java pragmas. ---- non_null ---- Perhaps the simplest way to get started with ESC/Java is to use the "non_null" modifier pragma, which can be applied to any program variable (static field, instance variable, local variable, or parameter). If a variable "x" is declared with /*@ non_null */ T x; then ESC/Java will check that every assignment to "x" assigns a non-null value. This also means that "x" can always be deferenced without any danger of causing a null-dereference error. ---- invariant ---- At the heart of ESC/Java is the invariant declaration pragma. By putting the declaration /*@ invariant J; */ in a class "T", where "J" is a boolean expression that can be written in the scope of "T". This declares an \object invariant/ "J". More specially, if "J" mentions (implicitly or explicitly) the special variable "this", then "J" is called an \instance invariant/; otherwise, "J" is called a \static invariant/. Roughly speaking, object invariants are supposed to hold on every method boundary, that is, on entry to and exit from every method. Some examples of useful object invariants are int f, g; /*@ invariant f < g; */ char[] a; /*@ invariant a == null || (0 <= n && n <= a.length); */ int n; --- Specification expressions The expression "J" in "invariant J" is boolean \specification expression/. A specification expression is essentially a Java expression that is free from subexpressions with possible side effects. In particular, a specification expression cannot contain any assignment expressions or method invocations. A specification expression can include some subexpressions that are not part of Java. These are described next. In addition to the Java's boolean operators (like "!", "&&", and "||"), a specification expression can use implication, written "==>". Implication has lower binding power than "!", "&&", and "||". Thus, one of the example invariants displayed above can alternatively be written as /*@ invariant a != null ==> 0 <= n && n < a.length; */ A specification can also contain a quantification. This is useful to express properties about the elements of an array. For example, int[] a; /*@ invariant a != null && (forall int i; 0 <= i && i < a.length ==> 0 <= a[i]); */ says that every element of "a" is a natural number. The existential quantification, which is used much less often, is written similarly: int[] a; /*@ invariant a != null && (exists int i; 0 <= i && i < a.length && a[i] < 0); */ One more specification expression feature is worth mentioning at this time. In addition to the Java expression E instanceof T which is "true" when "E" is null or when the dynamic type of "E" is a subtype of "T", a specification expression can refer more generally to the dynamic types of objects: the dynamic type of an object "E" is written "typeof(x)", a type "T" can be written (in places other than the second argument to "instanceof") as "type(T)", the element type of an array type "T" is written "elemType(T)", and types can be compared using "==" or the subtype relation "<:". For example, the invariant in class Node { Node next; /*@ invariant next == null || typeof(next) == type(Node); */ ... says that the dynamic type of any successor object must be exactly "T" (not a proper subtype of "T"). The invariant in class Node { Node next; /*@ invariant next == null || typeof(next) == typeof(this); */ ... says that lists chained by the "next" field are homogeneous. For example, if "o" is an object of a subtype "DoublyLinkedNode" of "Node", then "o.next" is (either null or) also an object of type "DoublyLinkedNode". (Preconditions of binary methods can easily be written in a similar way.) Finally, the invariant in T[] a; /* invariant a != null && type(U) <: elemType(typeof(a)); */ where "U" is some subtype of "T", implies that any expression whose static type is "U" can be assigned to an element of "a" without any danger of causing an "ArrayStoreException". This invariant can be written equivalently as T[] a; /* invariant a != null && type(U[]) <: typeof(a); */ --- Establishing invariants initially An instance invariant in a class "T" is established for a new object by a constructor of "T". The constructor bodies in Java run from superclasses (beginning with "Object") to subclasses (ending with the class that is the allocated type of the object). Therefore, the instance invariants of subclasses have not yet been established when a superclass constructor is executed. This means that constructors of any non-"final" class "T" should not "give away" the object being constructed ("this"), even if the object meets all instance invariants declared in class "T". ESC/Java will warn about any use of "this" in a constructor other than that of assigning to the fields of "this". Note that a method call on "this" counts as such a use. A class invariant is established initially by the static initializers and bodies of a class. Checking this code is tricky, because any cyclic dependencies between initialization code of different classes can easily cause the checker to miss important errors. ESC/Java takes some preventive measures by imposing some restrictions, but these restrictions are not described here. --- When an object invariant is checked to hold Object invariants need not hold after every assignment; they are checked only on method boundaries. In particular, they are assumed to hold upon entry to a method implementation and are checked to hold upon exit from a method implementation. Checking object invariants only on method boundaries allows a method to perform several assignments before an object invariant has been re-established. However, sometimes the re-establishing of an object invariant "J" may require a few method calls. In such a case, it would be too strict to insist that all object invariants, including "J", hold on every method boundary. ESC/Java attempts to remedy this problem by not requiring that \all/ object invariants hold on every method boundary: at a call site, ESC/Java checks that all static invariants hold, and that all instance invariants hold for objects stored in static fields and passed as parameters to the method being called. The rationale is that the instance invariants a callee is most likely to rely on are the instance invariants of the callee's parameters. ESC/Java assumes that a callee does not falsify any object invariant. ---- assert ---- ESC/Java features the well-known \assert/ statement as a statement pragma. It has the form /*@ assert P; */ where "P" is a boolean specification expression. ESC/Java will check that "P" holds at this program point. Asserts are useful because they provide some redundancy--a programmer can write down a condition believed to be true at a particular program point. ESC/Java will check that the program text and the assert are consistent. ---- nowarn ---- Many of ESC/Java's warning messages point out not real program errors but rather places where a programmer design decision has been made. The common response to such warnings is to add an annotation that documents the design decision. However, sometimes the correctness of a piece of code boils down to something that would require considerable effort to specify with annotations. In such a case, it is better to instruct ESC/Java to ignore the hazard, and move on. A crude way of telling ESC/Java to ignore a particular hazard is the "nowarn" pragma, which specifies a list of hazards for which the programmer will take responsibility. The pragma applies to the program line on which it is given, and applies to all places on that line where an error of the specified kind may possibly appear. For example, suppose ESC/Java outputs a message like Cup.java(221,7): Null dereference error (Null) for the program fragment 219: ... 220: if (x < y) { 221: z = o.f; 222: } This message warns about the possibility of a null dereference error on line 221, column 7, of the source file Cup.java. The parenthesized "(Null)" in the error message is a label that can be used with the "nowarn" pragma. If the reasoning behind why "x < y" implies that "o" is non-null is non-trivial, then perhaps this is a good application for the "nowarn" pragma. The following annotated program fragment will not produce the warning message: 219: ... 220: if (x < y) { 221: z = o.f; //@ nowarn Null 222: } ---- assume ---- While giving up on trying to further specify a program to avoid spurious warning messages is, at some point, a good idea, use of the "nowarn" pragma may seem undesirable since it does not document \why/ a programmer thinks a particular error won't occur. Also, the "nowarn" pragma applies to an entire source line, which may sometimes be too course grained. ESC/Java provides a different pragma that improves on the "nowarn" pragma in these two respects, the "assume" pragma. Both an "assert" and an "assume" state a condition that a programmer believes to hold at a particular program point. The difference is that ESC/Java checks the asserted condition, whereas it simply takes the assumed condition for granted. That is, the programmer takes responsibility for that the assumed condition will hold. The following annotated program fragment shows how to use an "assume" with the example presented with "nowarn" above: ... if (x < y) { /*@ assume o != null; */ z = o.f; } Regrettably, no checker knows every true fact of life. The "assume" pragma can be used here, too. For example, if the absence of checked run-time errors in your program depend on certain properties of multiplication, such as knowing that a square is never negative: if (x*x < a.length) { y = a[x*x]; } you can use an "assume" to make ESC/Java more informed: if (x*x < a.length) { /*@ assume 0 <= x*x; */ y = a[x*x]; } This assume pragma states that the square of the program variable "x" is non-negative. A more natural way to write the pragma may be if (x*x < a.length) { /*@ assume (forall int n; 0 <= n*n); */ y = a[x*x]; } which states the multiplication property more generally. Note that, since the programmer is responsible for anything written in an assume pragma, the checker may become less effective in finding real errors after an assume that mentions an untruth. For example, if the programmer would accidentally write /*@ assume (forall int n; n*n < 0); */ then the checking downstream from this pragma is suspect. Indeed, ESC/Java will do no checking downstream from a pragma /*@ assume false; */ The moral of this story is: Use "assume" pragmas, because you're likely to need them, but be careful when you do. ---- defined_if ---- A static field, instance variable, or local variable "x" of type "T" can be annotated with a pragma T x /*@ defined_if P */; where "P" is a boolean specification expression. This pragma causes ESC/Java to check that "P" is "true" any time the value of "x" is read. For example, suppose that a variable "n" keeps track of how many elements of an array "a" are in use, but that "a" may be null if it is not needed. Then, one can declare int[] a; int n /*@ defined_if a != null */; This annotation records the design decision that the value of "n" is irrelevant when "a" is null. In this case, a program is likely to be in error if it ever reads the value of "n" when "a" is null; the annotation will prompt ESC/Java to look for this error. A different design decision for the same problem can be recorded as int[] a /*@ defined_if 0 < n */; int n; Here, the design decision is that "n" can always be read, but unless "n" is positive, "a" should not be used. Thus, in the first example, a program would test "a != null && 0 < n" to find out if any items are stored in the array, whereas in the second example, a program would simply test "0 < n". Remark. For the picky, it may seem that "defined_only_if" is a better name for this pragma than "defined_if". Yet a more precise name would be "meaningful_only_if". (End of Remark.) Remark. Perhaps unfortunately, ESC/Java does not provide a way to say when particular array elements are defined. This could be useful since one could then express "a[i] is meaningful only if 0 <= i && i < n", which says that only the first "n" elements of "a" are in use. (End of Remark.) ---- uninitialized ---- Java defines a strict and often helpful rule that every local variable must have a \definitely assigned/ value when any access of its value occurs [JLS, ch.16]. This essentially means that a Java program won't compile if one of its methods contains a path between the declaration and use of a local variable with no intervening assignments to that variable, whether or not execution along that path is ever dynamically possible. This rule can catch common mistakes of forgetting to initialize a variable. However, the conservative nature of the test rules out some programs where nothing could go wrong at run-time. As an extreme example, any method containing the following program fragment will result in a compile-time error: int x; if (false) { int y = x; } A more common and useful example of a program fragment that won't compile because of Java's definite assignment rule is one of the form: int x; // declaration of "x" if (B) { x = E; // assignment to "x" } ...; // code that doesn't mention "x" if (C) { y = ... x ...; // use of "x" } If "C" is true only when "B" was true earlier, then this piece of code make sense. However, because there is no \definite/ assignment to "x" before its use (caused by the conservative rules that Java uses to define defintely assignment), this program fragment will cause a compile-time error. There is a simple workaround, however: provide a "dummy" initializer for "x": int x = 0; ... With this assignment to "x", the compiler will no longer complain. On the other hand, the compiler will not catch initialization errors either, so no benefit is reaped from Java's definte assignment rule. In particular, what if the programmer is wrong about thinking that "C" is will true only if "B" was? Then the use of "x" will return the dummy value (here, 0), which means that the programming error goes unnoticed, which is arguably worse. (Raymie Stata has often expressed a wish that Java's definite assignment rule would result in a run-time error, rather than a compile-time error.) ESC/Java to the rescue. By annotating a local variable declaration with the the pragma "uninitialized", ESC/Java will check that the initializing value is never actually used. Thus, one would write the code fragment above as: /*@ uninitialized */ int x = 0; ... The assignment to "x" will silence Java's conservative compile-time checks, and the "uninitialized" pragma will engage ESC/Java more precise compile-time checks. ---- unreachable ---- Programmers commonly write "assert false" in places where control is believed never to enter. ESC/Java provides a special annotation that expresses this fact in a less mysterious way, the statement pragma "unreachable": /*@ unreachable; */ ---- requires ---- As alluded to in the introduction, methods and constructors have specifications, which act as contracts between callers and implementations. TBW. ---- ensures and modifies ---- TBW. ---- loop_invariant ---- TBW. ---- monitored_by ---- TBW. ---- monitored ---- TBW. ---- axiom ---- TBW. Advanced comment on syntax. Pragmas whose syntax is more than a single keyword end with an optional semi-colon. In general, we advocate a annotation style where a terminating semi-colon is used whenever a corresponding Java construct would use a semi-colon. That is, statement pragmas and declaration pragmas look nice with a terminating semi-colon, whereas a modifier pragmas often doesn't. Without semi-colon, there may be ambiguities, however, because most ESC/Java "keywords" are actually parsed as identifiers which are at times compared with a set of supported "keywords". For example, /*@ requires (T) ensures modifies */ may parse as /*@ requires (T) ensures */ /*@ modifies */ which seems to specify a precondition expression "(T) ensures"--that is, an identifier "ensures" cast to a "T"--and an empty modifies clause, or may parse as /*@ requires (T) */ /*@ ensures modifies */ which seems to specify a precondition expression "T" and a postcondition expression consisting of an identifier "modifies". In these cases, ESC/Java will follow the rule of parsing a pragma for as long as possible. If this is not what the programmer intended, mysterious type errors may be reported. Therefore, a programmer may well want to take advantage of the fact that terminating semi-colons are allowed. For example, one may then write either /*@ requires (T) ensures ; modifies */ or /*@ requires (T) ; ensures modifies */ (End of Advanced comment on syntax.)