ESCJ 5: Resolving names in Java

Last modified: August 29, 1997

Names and declared entities

Classes, interfaces, members (that is, fields and methods), and local variables are declared entities referred to using names. Name resolution is the process of determining to which declaration a name refers.

Not all names refer to declared entities. For example, the names given in break statements refer to statements, which are not declared entities as defined in this document. Names can also refer to packages, which are likewise not considered to be declared entities in this document. (This is a deviation from the Java spec, which talks about packages as if they were declared entities and refers to the package statements that can occur at the top of source files as "package declarations." We discuss this deviation at the end of this document.) [Todo: give a complete enumeration of such names.]

Methods are a special case of declared entities in that the same method can have more than one declaration. Name resolution was defined in terms of finding declarations with this issue in mind. We resolve method names to textual declarations of methods. For example, consider the following code:

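  class C {
    int m() { return 1; }
    // m here resolves to the declaration of m() in C
    void test1() { m(); }
  }

  class D extends C {
    // a second declaration of the same method (overrides C's)
    int m() { return 2; }
    // m here resolves to the declaration of m() in D
    void test2() { m(); }
  }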

In test1, m refers to the declaration of m() that appears in class C, while, in test2, m refers to the declaration of m() that appears in class D. In the ESC project, we're used to thinking of names as referring to methods rather than to declarations of methods. In Java, thinking of names as referring to declarations is important when it comes to understanding method overloading.

Categories of names

Names in Java are classified along two dimensions.

The first dimension pertains to the internal form of names. In this dimension, names are categorized into simple names and qualified names. Simple names are atomic identifiers, such as x. Qualified names are sequences of simple names separated by dots ('.'), such as java.lang.String.

The second dimension pertains to how names are resolved. In this dimension, there are five categories: PackageName, TypeName, ExpressionName, MethodName, and AmbiguousName. Pages 90 and 91 of the Java spec explain how the parser should classify names in this dimension:

PackageName is used for names in package statements and in on-demand import declarations (both of which appear at the top of files).
TypeName is used for names in single-type import declarations, in extends and implements clauses, in the types of method signatures and local variables, and in new, instanceof, and cast expressions.
MethodName is used for method names in method invocations.
ExpressionName is used for all names appearing in expressions except for type names in new, instanceof, and cast expressions and for method names in method invocations.
AmbiguousName is used for parts of names, specifically, for names that appear to the left of a '.' in a qualified ExpressionName, MethodName, or another AmbiguousName. Thus, for example, if x.y.z is classified as a MethodName, then the x.y part of that name is an AmbiguousName.
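As an illustration, the sketch below annotates the names in a small compilation unit with the category that the rules above would assign. The class and member names (NameCategories, greeting, shout) are made up for this example, and the comments reflect our reading of the classification rules rather than anything a compiler reports.

```java
import java.util.*;           // java.util: PackageName (on-demand import)
import java.io.PrintStream;   // java.io.PrintStream: TypeName (single-type import)

class NameCategories {
    static String greeting = "hello";

    static String shout() {
        // String: TypeName (type of a local variable).
        // greeting.toUpperCase: MethodName; its left part, greeting, is an AmbiguousName.
        String s = greeting.toUpperCase();

        // PrintStream: TypeName.
        // System.out: ExpressionName; its left part, System, is an AmbiguousName.
        PrintStream out = System.out;

        // out.println: MethodName; out: AmbiguousName; s: ExpressionName.
        out.println(s);
        return s;
    }
}
```

A package statement, had the file contained one, would contribute one more PackageName.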

Resolving the meaning of a name is the process of determining the declaration to which the name refers. TypeNames, ExpressionNames, and MethodNames are each resolved using different rules. AmbiguousNames are resolved by first reclassifying them as one of the other kinds of names and then applying the rules applicable to the resulting kind of name.

PackageNames (and AmbiguousNames that are reclassified to PackageNames) do not refer to declarations and are not "resolved" in the same sense that the other kinds of names are. Instead, PackageNames are used in package queries. A package query returns the declaration of a class or interface with a given, simple name within a package of a given name. Package queries are in the rules for resolving Type-, Expression-, and MethodNames and also in the rules that reclassify AmbiguousNames. The exact semantics of package queries are left "host-system dependent" (see Section 7 (p 113) of the Java spec).

The next three sections give the rules for resolving TypeNames, ExpressionNames, and MethodNames. The following section gives the rules for reclassifying AmbiguousNames. The final section gives a semantics for package queries defined in terms of the file system and a "class path"; this semantics is meant to be similar to the rules used by Sun's command-line tools.

Resolving TypeNames

Given a TypeName N in compilation unit C declared to be part of package P, the following rules determine what N denotes (taken from p. 93 of the Java spec):

For a TypeName of the form I, a simple name:

if a class or interface named I is declared in C or is imported by C via a single-type import statement, then I denotes that class or interface;
otherwise, if package P declares a class or interface I, then I denotes this type (this is a "package query");
otherwise, if there exists exactly one on-demand import statement "import P2.*" in C such that package P2 declares a class or interface I, then I denotes that type (this is another "package query");
otherwise, if there exists more than one on-demand import statement "import P2.*" in C such that package P2 declares a type I (yet another "package query"), then I is ambiguous (a compile-time error);
otherwise I is undefined (a compile-time error).

For a TypeName of the form Q.I, a qualified name:

if package Q declares a type I, then Q.I denotes this type;
otherwise Q.I is undefined.
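The TypeName rules above can be sketched as a small resolver. This is only an illustration under stated assumptions: the class TypeNameResolver, its method names, and the in-memory map standing in for package queries are all hypothetical, and compile-time errors are modeled as exceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TypeNameResolver {
    // Hypothetical stand-in for package queries:
    // package name -> simple names of the types that package declares.
    private final Map<String, Set<String>> packages;

    TypeNameResolver(Map<String, Set<String>> packages) {
        this.packages = packages;
    }

    boolean declares(String pkg, String simpleName) {
        return packages.getOrDefault(pkg, Set.of()).contains(simpleName);
    }

    /** Resolves a simple TypeName I to a fully qualified name. */
    String resolveSimple(String i, String currentPackage,
                         Set<String> declaredInUnit,            // types declared in C
                         Map<String, String> singleTypeImports, // simple name -> qualified name
                         List<String> onDemandImports) {        // the P2 of each "import P2.*"
        if (declaredInUnit.contains(i)) return currentPackage + "." + i;
        if (singleTypeImports.containsKey(i)) return singleTypeImports.get(i);
        if (declares(currentPackage, i)) return currentPackage + "." + i;

        List<String> hits = new ArrayList<>();
        for (String p2 : onDemandImports) {
            if (declares(p2, i)) hits.add(p2);
        }
        if (hits.size() == 1) return hits.get(0) + "." + i;
        if (hits.size() > 1) throw new IllegalStateException(i + " is ambiguous: " + hits);
        throw new IllegalStateException(i + " is undefined");
    }

    /** Resolves a qualified TypeName Q.I. */
    String resolveQualified(String q, String i) {
        if (declares(q, i)) return q + "." + i;
        throw new IllegalStateException(q + "." + i + " is undefined");
    }
}
```

With both java.util and java.awt imported on demand, resolving List this way reports an ambiguity unless a single-type import or a local declaration settles it first.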

Resolving ExpressionNames

We describe the rules for resolving ExpressionNames in the larger context of rules for resolving variable access expressions. Variable access expressions have one of two forms: ExpressionNames or Primary.ExpressionName. Primaries are other forms of expression such as "a[10]" and "a.b(10)"; the exact grammar of Primaries is unimportant for this document.

The following description of the rules for resolving variable access expressions differs from the approach taken by the Java spec (see Sections 6.5.5 (p. 95) and 15.10 (p. 319) for that approach). The approach taken here is borrowed from the approach the Java spec takes for methods.

If the name to resolve is a simple name I that appears within the scope of a local variable declaration (including an argument declaration), then I denotes that declaration. Local variables cannot be shadowed, so there can be at most one local variable named I in scope.
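A minimal example of the local-variable rule, using a made-up class LocalFirst: inside readX the simple name x denotes the local declaration even though a field with the same name exists.

```java
class LocalFirst {
    static int x = 1;       // field named x

    static int readX() {
        int x = 2;          // local variable named x, in scope for the rest of the method
        return x;           // x denotes the local declaration, not the field
    }

    static int readField() {
        return x;           // no local x in scope, so x is resolved as a field
    }
}
```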

Otherwise, the name is resolved to a declaration of a field in a type according to the following, multi-step process:

Determine the field's simple name and a type to search.

For a simple name I, the class or interface to search is the one containing the expression and the simple name of the field is I.
For a compound variable access expression of the form Q.I, the simple name of the field is I, and the type to search is determined as follows:

if Q is a PackageName, then the variable access expression Q.I is ill-formed (a compile-time error);
if Q is a TypeName, then the type to search is the type to which Q resolves (in this case, the field selected must be static);
otherwise, Q is a Primary, and the type to search is the static type of Q.

Select from the type to search all field declarations that are accessible.

Given the field's simple name is I and the type to search is T, the next step is to determine the candidate declarations for this variable access expression, which are all field declarations for I in T that are accessible to the expression. These candidates can include declarations found locally in T and also declarations in supertypes of T. If the candidate set is empty, then the expression is undefined.

The accessibility of a field declaration to a variable access expression has to do with the access modifiers: public, protected, private, or none (the default). Whether a declaration is accessible to an expression depends, in the usual way, on the location of the variable access expression and on the access modifier of the declaration.

Select the accessible declaration from the least common type.

The final step is to select, out of the candidate declarations, the one whose declaring type is a subtype of the declaring types of all the other candidates. If no such declaration exists, then the access is ambiguous.

The expression "super.I" is treated as sugar for "((C)this).I", where C is the direct superclass of the class containing the expression.
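The following sketch shows these steps at work on a hidden field. The classes Base, Derived, and FieldDemo are hypothetical. In each access the type to search is fixed statically, and when both declarations of tag are candidates the one in the subtype is selected.

```java
class Base {
    String tag = "Base";
}

class Derived extends Base {
    String tag = "Derived";   // a second declaration of tag, in a subtype of Base

    // Type to search is Derived; candidates are Derived.tag and Base.tag,
    // and Derived.tag is chosen because Derived is a subtype of Base.
    String viaSimpleName() { return tag; }

    // super.tag is treated as ((Base) this).tag, so the type to search is Base.
    String viaSuper() { return super.tag; }

    String viaCast() { return ((Base) this).tag; }
}

class FieldDemo {
    static String viaStaticType() {
        Base b = new Derived();
        return b.tag;   // the static type of the Primary b is Base, so Base.tag is selected
    }
}
```

Unlike method invocation, the run-time class of the object plays no part here: b.tag yields "Base" even though b refers to a Derived.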

Resolving MethodNames


More than one method can have the same name, and the particular method denoted in an invocation expression depends on the type of arguments passed to the expression. Resolution for MethodNames must deal with this issue.

The rules given below are taken from Sections 6.5.6 (p. 98) and 15.11 (p. 324) of the Java spec.

Resolution of method designators is best understood as having multiple steps:

Determine the method's simple name and a type to search.

For a simple name I, the class or interface to search is the one containing the invocation and the simple name of the method is I.
For a compound method designator of the form Q.I, the simple name of the method is I, and the type to search is determined as follows:

if Q is a PackageName, then Q.I is an ill-formed method designator;
if Q is a TypeName, then the type to search is the type denoted by Q (in this case, the method selected must be static);
otherwise, Q is a Primary, and the type to search is the static type of Q.

The method designator "super.I" is resolved like "((C)this).I", where C is the direct superclass of the class containing the invocation.
Select from the type to search all method declarations that are accessible and applicable.

Given the method's simple name is I and the type to search is T, the next step is to determine the candidate declarations for this invocation, which are all method declarations for I in T that are both accessible and applicable to the invocation. These candidates can include declarations found locally in T and also in supertypes of T. (Multiple declarations for the same method -- some overriding the others -- may end up in this candidate set.) If the candidate set is empty, then the method designation is undefined.

The accessibility of a method declaration to a method invocation has to do with access modifiers such as public and protected. Whether a method declaration is accessible to a method invocation depends, in the usual way, on the location of the invocation and on the access modifier of the declaration.

The applicability of a method declaration to a method invocation has to do with the numbers and types of arguments. Specifically, a method declaration is applicable to an invocation if (a) the number of formals equals the number of actuals and (b) the type of each actual is "method-invocation compatible" with the type of the corresponding formal.

Method-invocation compatibility of types is defined as follows:

Reference type S is method-invocation compatible with reference type T if S is a subtype of T.
Primitive type P is method-invocation compatible with primitive type Q if the set of all values of P is a subset of the set of all values of Q. Primitive types include the numerical types int, double, etc., plus other built-ins such as boolean. Method-invocation compatibility on primitive types is reflexive. Beyond these reflexive relationships, it applies only to the numerical types; thus, the phrase "is a subset of" should be taken to mean a subset relationship on reals.
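A small example of applicability, with hypothetical methods f, g, and h in a made-up class Applicable. Each call compiles only because the type of the actual is method-invocation compatible with the type of the formal.

```java
class Applicable {
    static String f(long x)   { return "f(long)"; }
    static String g(Object o) { return "g(Object)"; }
    static String h(byte b)   { return "h(byte)"; }

    static String demo() {
        int i = 7;
        String a = f(i);          // int is compatible with long: every int value is a long value
        String b = g("text");     // String is a subtype of Object
        String c = h((byte) 12);  // h(12) would be rejected: int is not compatible with byte
        return a + " " + b + " " + c;
    }
}
```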

Select the most specific of the accessible, applicable methods.

The final step is to select the unique, maximally specific declaration out of the candidates determined in the previous step. If there is no single declaration in the candidate set that is more specific than all the others, then the invocation is ambiguous.

"More specific" in this context is a binary relation on method declarations that takes into account the names of methods, their argument signatures, and their position in the type hierarchy. More specifically, method declaration M is more specific than method declaration N if and only if:

their names are the same;
they both have the same number of formals;
M is declared in a type that is a subtype of the type in which N is declared ("subtype" here is reflexive);
the type of each formal of M is method-invocation compatible with the type of the corresponding formal of N.

The return type is irrelevant to the resolution of overloaded signatures (see example in Section 15.11.2.4 of the Java spec).
In determining method applicability and specificity, no "narrowing conversions" are applied to numerical constants that appear in actual arguments. Thus, for example, to pass the constant "12" where the formal type is byte, an explicit cast is needed.
It's mentioned above that multiple declarations for the same method can appear in the candidate set. Often, one declaration overrides the others; in this situation, the third rule for method specificity implies that the overriding declaration will be chosen. However, this is not always the case, for example, when an interface extends two other interfaces, both of which declare a method m with the same number and types of arguments. In this case, searching type T for this m will be ambiguous.
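To make the selection step concrete, here is a sketch with overloads declared in a single made-up class MostSpecific, so that only the formal types distinguish the candidates. The comments give our reading of which declarations are applicable and which one is most specific.

```java
class MostSpecific {
    static String m(Object o) { return "m(Object)"; }
    static String m(String s) { return "m(String)"; }

    static String n(long x)   { return "n(long)"; }
    static String n(int x)    { return "n(int)"; }

    static String demo() {
        // Both m declarations are applicable to a String actual;
        // m(String) is more specific because String is compatible with Object.
        String a = m("x");

        // Only m(Object) is applicable: the static type of the actual is Object.
        String b = m((Object) "x");

        // Both n declarations are applicable to an int; n(int) is more specific.
        String c = n(3);

        // Only n(long) is applicable to a long actual.
        String d = n(3L);

        return a + " " + b + " " + c + " " + d;
    }
}
```

By contrast, given p(int, long) and p(long, int), the call p(1, 1) has two applicable candidates and neither is more specific than the other, so the invocation is ambiguous.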

Reclassifying AmbiguousNames

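For N a simple name, appearing in class or interface T within compilation unit C, which is declared to be part of package P:

if N appears within the scope of a local variable named N, then N is an ExpressionName;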
otherwise, if T has one or more fields named N, then N is an ExpressionName;
otherwise, if C declares a class or interface named N or imports a type named N via a single-type import declaration, then N is a TypeName;
otherwise, if package P declares a class or interface named N, then N is a TypeName;
otherwise, if a type named N is declared by one or more on-demand import declarations of C, then if N is declared by exactly one of them then N is a TypeName, otherwise N is ambiguous;
otherwise, N is a PackageName.

For N equal to A.I, an ambiguous name followed by an identifier, recursively reclassify A, then:

if A is classified as a TypeName or ExpressionName, then A.I is an ExpressionName;
if A is classified as a PackageName and package A declares a class or interface named I, then A.I is a TypeName;
otherwise, A.I is a PackageName.
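The order of these rules has visible consequences, as the sketch below tries to show. The class Reclassify and its members are hypothetical; the variable names System and Integer were chosen deliberately to collide with well-known types.

```java
class Reclassify {
    static int viaLocal() {
        String System = "abc";     // local variable that happens to be named System
        // System is in the scope of a local variable, so it is an ExpressionName,
        // and length is looked up in String rather than in java.lang.System.
        return System.length();
    }

    static String Integer = "field";   // field that happens to be named Integer

    static int viaField() {
        // No local named Integer, but the enclosing class has a field of that name,
        // so Integer is an ExpressionName and the type Integer is never considered.
        return Integer.length();
    }

    static int viaType() {
        // java: no local, field, or type by that name, so it is a PackageName.
        // java.lang: package java declares no type lang, so it is a PackageName.
        // java.lang.Integer: package java.lang declares Integer, so it is a TypeName.
        return java.lang.Integer.parseInt("42");
    }
}
```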

Package queries

Before defining package queries, it's important to note that neither the dynamic nor the static semantics of Java depends on whether or not a particular package "exists" in the sense of being declared. Rather, the Java semantics depends only on whether or not a package with a given name contains a declaration of a type with a given name. Thus, we are concerned only with this latter question.

More specifically, this section gives a definition of package queries. A package query takes a PackageName P and a simple name I and returns either the class or interface named I in package P or an indication that no such class or interface exists.

Our rules assume that packages are represented as directories in a hierarchical file system and a "class path" variable that says where in the file system these directories are found.

The class path variable is a sequence of absolute paths in the file system. These paths name either directories in the file system or "zip" files, which, for the purpose of this document, can be viewed as equivalent to directories.

We assume the function R(P) maps PackageName P to a relative directory path in the obvious manner (the simple-name components of P are mapped, in order, to directory-path segments in R(P)). We assume the operator "+" takes an absolute path and a relative one and combines them into a new absolute path.

Assume the class path consists of the paths C1, C2, ..., Cn. The package query "find the type named I in package P" is answered as follows. Find the lowest i in 1..n such that Ci + R(P) contains a file named either I.ji, I.java, or I.class. If no such i exists, then indicate that there is no declaration of I in package P. Otherwise, pick I.ji, I.java, or I.class --- the first that exists --- and parse it for a declaration of I. If such a declaration exists, return it; otherwise, indicate that there is no declaration of I in P.
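The file-selection part of this query can be sketched as follows. This is a minimal illustration, not the tool's implementation: the class PackageQuery and its methods are hypothetical, zip files are not handled, and the final step of parsing the chosen file for a declaration of I is left out.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

class PackageQuery {
    // File extensions in the order of preference given in the text.
    static final String[] EXTENSIONS = { ".ji", ".java", ".class" };

    /** R(P): maps a package name such as "java.lang" to the relative path java/lang. */
    static Path relativeDir(String packageName) {
        return Path.of(packageName.replace('.', '/'));
    }

    /** Returns the file that would be parsed for type I in package P, if there is one. */
    static Optional<Path> find(List<Path> classPath, String packageName, String simpleName) {
        Path rel = relativeDir(packageName);
        for (Path entry : classPath) {         // C1, C2, ..., Cn in order, so the lowest i wins
            Path dir = entry.resolve(rel);     // Ci + R(P)
            for (String ext : EXTENSIONS) {
                Path candidate = dir.resolve(simpleName + ext);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();               // no declaration of I in package P
    }
}
```

Note that the search stops at the first class path entry containing any of the three files, even if a later entry holds a file with a preferred extension.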

Comparison to Sun's rules:

Sun does not look at I.ji files. ESC/Java has them as a way of supporting annotations apart from source code.
Sun's rules use file modification dates and also the source file attribute of class files to select a class to parse to find a declaration of I. We use a much simpler scheme.
Otherwise, our rules and Sun's should behave the same on "well structured" file hierarchies. "Well structured" means things like the class file for a class named I is found in a file named I.class (rather than K.class). We are writing a checker which checks a file hierarchy to ensure that it's well structured. For hierarchies that fail to meet our definition of well structured, the behavior of Sun's tools differs from ours (and is also hard to understand).

(There is one way in which the Java spec treats P1.P2 as related to P1: it explicitly disallows P1 from containing a class or interface whose simple name is P2. This is a restriction that our tool for checking class paths enforces. I don't think this rule is sufficient reason to confuse matters by introducing the concept of packages "containing" packages. Also, unlike other entities, the definition of a package is distributed over a file system, database, or network in a way that's explicitly left undefined by the language. Thus, I think name resolution is easier to explain when packages are treated differently from classes, interfaces, members, and local variables, which is why packages are not considered "declared entities" in this document.)
