ESCJ 5: Resolving names in Java
Last modified: August 29, 1997
Names and declared entities
Classes, interfaces, members (that is, fields and methods), and local variables
are declared entities referred to using names. Name resolution is
the process of determining to which declaration a name refers.
Not all names refer to declared entities. For example, the names given
in break statements refer to statements, which are not declared
entities as defined in this document. Names can also refer to packages,
which are likewise not considered to be declared entities in this document.
(This is a deviation from the Java spec, which talks about packages as
if they were declared entities and talks about the package statements
that can occur at the top of class files as "package declarations." We
discuss this deviation at the end of this document.) [Todo: give a complete
enumeration of such names.]
Methods are a special case of declared entities in that the same method
can have more than one declaration. Name resolution is defined in terms
of finding declarations with this issue in mind. We resolve method
names to textual declarations of methods. For example, consider the following
code:
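class C {
    int m() { return 1; }
    void test1() { m(); }  // m resolves to the declaration in C
};
class D extends C {
    int m() { return 2; }  // a second declaration of the same method
    void test2() { m(); }  // m resolves to the declaration in D
};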
In test1, m refers to the declaration of m()
that appears in class C, while, in test2, m
refers to the declaration of m() that appears in class D.
In the ESC project, we're used to thinking of names as referring to methods
rather than to declarations of methods. In Java, thinking of names as referring
to declarations is important when it comes to understanding method overloading.
Categories of names
Names in Java are classified along two dimensions.
The first dimension pertains to the internal form of names. In this
dimension, names are categorized into simple names and qualified
names. Simple names are atomic identifiers, such as x. Qualified
names are sequences of simple names separated by dots ('.'), such as java.lang.String.
The second dimension pertains to how names are resolved. In this dimension,
there are five categories: PackageName, TypeName, ExpressionName, MethodName,
and AmbiguousName. Pages 90 and 91 of the Java spec explain how the parser
should classify names in this dimension:
-
PackageName is used for names in package statements and in on-demand
import declarations (both of which appear at the top of files).
-
TypeName is used for names in single-type import declarations, in
extends and implements clauses, in the types of method signatures
and local variables, and in new, instanceof, and cast expressions.
-
MethodName is used for method names in method invocations.
-
ExpressionName is used for all names appearing in expressions except for
type names in new, instanceof, and cast expressions and for
method names in method invocations.
-
AmbiguousName is used for parts of names, specifically, for names
that appear to the left of a '.' in a qualified ExpressionName, MethodName,
or another AmbiguousName. Thus, for example, if x.y.z is classified
as a MethodName, then the x.y part of that name is an AmbiguousName.
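As an illustration of the second dimension, the following small example labels the names in an ordinary piece of Java with the category the parser would assign. The class NameKindsDemo and its members are made up for this example; the labels in the comments follow the classification rules listed above.

```java
import java.util.List;   // "java.util.List" is a TypeName (single-type import)
import java.util.*;      // "java.util" is a PackageName (on-demand import)

class NameKindsDemo {
    static int count = 3;

    // "String" and "Object" are TypeNames (types in a method signature).
    static String describe(Object o) {
        if (o instanceof List) {      // "List" is a TypeName (instanceof)
            return "list";
        }
        int local = count;            // "count" is an ExpressionName
        // "String.valueOf" is a MethodName, so its prefix "String" starts
        // out as an AmbiguousName and must be reclassified before use.
        return String.valueOf(local);
    }

    public static void main(String[] args) {
        // "System.out.println" is a MethodName; "System.out" is an AmbiguousName.
        System.out.println(describe("x"));
    }
}
```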
Resolving the meaning of a name is the process of determining the declaration
to which the name refers. TypeNames, ExpressionNames, and MethodNames are
each resolved using different rules. AmbiguousNames are resolved by first
reclassifying them as one of the other kinds of names and then applying the
rules applicable to the resulting kind of name.
PackageNames (and AmbiguousNames that are reclassified to PackageNames)
do not refer to declarations and are not "resolved" in the same sense that
the other kinds of names are. Instead, PackageNames are used in package
queries. A package query returns the declaration of a class or interface
with a given, simple name within a package of a given name. Package queries
are in the rules for resolving Type-, Expression-, and MethodNames and
also in the rules that reclassify AmbiguousNames. The exact semantics of
package queries are left "host-system dependent" (see Section 7 (p. 113)
of the Java spec).
The next three sections give the rules for resolving TypeNames, ExpressionNames,
and MethodNames. The following section gives the rules for reclassifying AmbiguousNames.
The final section gives a semantics for package queries defined in terms
of the file system and a "class path"; this semantics is meant to be similar
to the rules used by Sun's command-line tools.
Resolving TypeNames
Given a TypeName N in compilation unit C declared to be part
of package P, the following rules determine what N denotes
(taken from p. 93 of the Java spec):
-
For a TypeName of the form I, a simple name:
-
if a class or interface named I is declared in C or is imported
by C via a single-type import statement, then I denotes
that class or interface;
-
otherwise, if package P declares a class or interface I,
then I denotes this type (this is a "package query");
-
otherwise, if there exists exactly one on-demand import statement
"import P2.*" in C such that package P2
declares a class or interface I, then I denotes that type
(this is another "package query");
-
otherwise, if there exists more than one on-demand import statement
"import P2.*" in C such that package P2
declares a type I (yet another "package query"), then I is
ambiguous (a compile-time error);
-
otherwise I is undefined (a compile-time error).
(Package queries are highlighted above but are not in the rest of this
document. Also, we highlight errors as compile-time errors the first time
they appear, as in ambiguous and undefined names above, but not subsequently.)
-
For a TypeName of the form Q.I, a qualified name:
-
if package Q declares a type I, then Q.I denotes
this type;
-
otherwise Q.I is undefined.
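The rules for a simple TypeName can be sketched in code roughly as follows. This is only an illustration under assumptions, not ESC/Java's implementation: the class SimpleTypeNameSketch, the "world" map standing in for package queries, and the use of fully qualified name strings in place of declarations are all hypothetical. It also assumes the compilation unit belongs to a named package.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SimpleTypeNameSketch {
    /** Hypothetical package query: does package pkg declare a type simpleName? */
    static boolean declares(Map<String, Set<String>> world, String pkg, String simpleName) {
        Set<String> types = world.get(pkg);
        return types != null && types.contains(simpleName);
    }

    /** Resolves a simple TypeName to a fully qualified name, or throws. */
    static String resolve(String name,
                          Set<String> declaredInUnit,
                          Map<String, String> singleTypeImports,
                          String currentPackage,
                          List<String> onDemandImports,
                          Map<String, Set<String>> world) {
        // Rule 1: declared in the compilation unit or imported by a single-type import.
        if (declaredInUnit.contains(name)) return currentPackage + "." + name;
        String imported = singleTypeImports.get(name);
        if (imported != null) return imported;
        // Rule 2: declared elsewhere in the current package (a package query).
        if (declares(world, currentPackage, name)) return currentPackage + "." + name;
        // Rules 3 and 4: on-demand imports; repeated imports of one package count once.
        List<String> hits = new ArrayList<>();
        for (String pkg : new LinkedHashSet<>(onDemandImports)) {
            if (declares(world, pkg, name)) hits.add(pkg);
        }
        if (hits.size() == 1) return hits.get(0) + "." + name;
        if (hits.size() > 1) throw new IllegalStateException("ambiguous type name: " + name);
        // Rule 5: nothing matched.
        throw new IllegalStateException("undefined type name: " + name);
    }
}
```

Note that the order of the checks matters: a single-type import or a type in the current package hides a type of the same name that is only available through an on-demand import.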
Resolving ExpressionNames
We describe the rules for resolving ExpressionNames in the larger context
of rules for resolving variable access expressions. Variable access
expressions have one of two forms: ExpressionNames or Primary.ExpressionName.
Primaries are other forms of expression such as "a[10]" and "a.b(10)";
the exact grammar of Primaries is unimportant for this document.
The following description of the rules for resolving variable access expressions
differs from the approach taken by the Java spec (see Sections
6.5.5 (p. 95) and 15.10 (p. 319) for that approach). The approach taken
here is borrowed from the approach taken by the Java spec for methods.
If the name to resolve is a simple name I that appears within
the scope of a local variable declaration (including an argument declaration),
then I denotes that declaration. Local variables cannot be shadowed,
so there can be at most one local variable named I in scope.
Otherwise, the name is resolved to a declaration of a field in a type
according to the following, multi-step process:
-
Determine the field's simple name and a type to search.
-
For a simple name I, the class or interface to search is the one
containing the expression and the simple name of the field is I.
-
For a compound variable access expression of the form Q.I,
the simple name of the field is I, and the type to search is determined
as follows:
-
if Q is a PackageName, then the variable access expression Q.I
is ill-formed (a compile-time error);
-
if Q is a TypeName, then the type to search is the type to which
Q resolves (in this case, the field selected must be static);
-
otherwise, Q is a Primary, and the type to search is the static
type of Q.
-
Select from the type to search all field declarations that are accessible.
Given the field's simple name is I and the type to search is
T, the next step is to determine the candidate declarations for
this variable access expression, which are all field declarations for I
in T that are accessible to the expression. These candidates can
include declarations found locally in T and also declarations in
supertypes of T. If the candidate set is empty, then the expression
is undefined.
The accessibility of a field declaration to a variable access expression
has to do with the access modifier on the declaration: public, protected,
private, or none. Whether a declaration is accessible to an expression
depends, in the usual way, on the location of the variable access expression
and on the access modifier of the declaration.
-
Select the accessible declaration from the least common type.
The final step is to select out of the candidate declarations the one
declared in a type that is a subtype of the types declaring all the other
candidates. If no such declaration exists, then the access is ambiguous.
The expression "super.I" is treated as sugar
for "((C)this).I", where C is
the direct superclass of the class containing the expression.
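The following small example shows the consequence of searching the static type of the Primary: which field declaration is selected depends on the declared type of the expression to the left of the dot, not on the run-time class of the object. The class names FieldBase, FieldDerived, and FieldAccessDemo are made up for this illustration.

```java
class FieldBase {
    int f = 1;
    static int s = 10;
}

class FieldDerived extends FieldBase {
    int f = 2;                       // a second declaration of a field named f

    int superF() {
        return super.f;              // same as ((FieldBase) this).f
    }
}

class FieldAccessDemo {
    static int viaDerived() {
        FieldDerived d = new FieldDerived();
        return d.f;                  // type to search is FieldDerived: selects FieldDerived.f
    }

    static int viaBase() {
        FieldBase b = new FieldDerived();
        return b.f;                  // type to search is FieldBase: selects FieldBase.f
    }

    static int viaTypeName() {
        return FieldDerived.s;       // Q is a TypeName; s is found in a supertype and is static
    }

    public static void main(String[] args) {
        System.out.println(viaDerived() + " " + viaBase() + " "
                + new FieldDerived().superF() + " " + viaTypeName());
    }
}
```

In viaDerived, both declarations of f are candidates; the one in FieldDerived is chosen because FieldDerived is a subtype of FieldBase.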
Resolving MethodNames
As with ExpressionNames, we describe the rules for resolving MethodNames
in the larger context of rules for resolving the method designation part
of a method invocation. Like variable access expressions, method designations
have two forms, MethodName or Primary.MethodName, but they appear
in invocations, that is, to the left of a list of arguments surrounded by
parentheses.
More than one method can have the same name, and the particular method
denoted in an invocation expression depends on the type of arguments passed
to the expression. Resolution for MethodNames must deal with this issue.
The rules given below are taken from Sections 6.5.6 (p. 98) and 15.11
(p. 324) of the Java spec.
Resolution of method designators is best understood as having multiple
steps:
-
Determine the method's simple name and a type to search.
-
For a simple name I, the class or interface to search is the one
containing the invocation and the simple name of the method is I.
-
For a compound method designator of the form Q.I,
the simple name of the method is I, and the type to search is determined
as follows:
-
if Q is a PackageName, then Q.I is an ill-formed
method designator;
-
if Q is a TypeName, then the type to search is the type denoted
by Q (in this case, the method selected must be static);
-
otherwise, Q is a Primary, and the type to search is the static
type of Q.
The expression "super.I" is a special case. When it comes
to searching for a method declaration, "super.I" can be viewed
as sugar for "((C)this).I", where C
is the direct superclass of the class containing the expression. However,
unlike for field-access expressions, the dynamic semantics of "super.I"
as a method designator is different from the dynamic semantics of "((C)this).I"
(see Section 15.11.4.10 of the Java spec).
-
Select from the type to search all method declarations that are accessible
and applicable.
Given the method's simple name is I and the type to search is
T, the next step is to determine the candidate declarations for
this invocation, which are all method declarations for I in T
that are both accessible and applicable to the invocation. These candidates
can include declarations found locally in T and also in supertypes
of T. (Multiple declarations for the same method -- some overriding
the others -- may end up in this candidate set.) If the candidate set is
empty, then the method designation is undefined.
The accessibility of a method declaration to a method invocation has
to do with access modifiers such as public and protected.
Whether a method declaration is accessible to a method invocation depends,
in the usual way, on the location of the invocation and on the access modifier
of the declaration.
The applicability of a method declaration to a method invocation has
to do with the numbers and types of arguments. Specifically, a method declaration
is applicable to an invocation if (a) the numbers of formals and actuals
are equal and (b) the type of each actual is "method-invocation compatible"
with the type of the corresponding formal.
Method-invocation compatibility of types is defined as follows:
-
Reference type S is method-invocation compatible with reference
type T if S is a subtype of T.
-
Primitive type P is method-invocation compatible with primitive
type Q if the set of values of P is a subset of the set of values of Q.
Primitive types include numerical types int, double, etc.,
plus other built-ins such as boolean. Method-invocation compatibility
on primitive types is reflexive. Beyond these reflexive relationships,
it applies only to the numerical types; thus, the phrase "is a subset of"
should be taken to be a subset relationship on reals.
(For details, see Chapter 5 of the Java spec.)
-
Select the most specific of the accessible, applicable methods.
The final step is to select the unique, maximally specific declaration
out of the candidates determined in the previous step. If there is no single
declaration in the candidate set that is more specific than all the others,
then the invocation is ambiguous.
"More specific" in this context is a binary relation on method declarations
that takes into account the names of methods, their argument signatures,
and their position in the type hierarchy. More specifically, method declaration
M is more specific than method declaration N if and only
if:
-
their names are the same;
-
they both have the same number of formals;
-
M is found in a subtype of N ("subtype" here is reflexive);
-
the type of each formal of M is method-invocation compatible with
the type of the corresponding formal of N.
An implication of the third rule is that if a subclass overloads an inherited
method in a contravariant manner (that is, it declares a method with the
same name as an inherited method but with a more general signature than
the inherited method), then casting will be necessary to call either version
of the method from the subclass.
Notes:
-
The return type is irrelevant to the resolution of overloaded signatures
(see example in Section 15.11.2.4 of the Java spec).
-
In determining method applicability and specificity, no "narrowing conversions"
are applied to numerical constants that appear in actual arguments. Thus,
for example, to pass the constant "12" where the formal type is byte,
an explicit cast is needed.
-
It's mentioned above that multiple declarations for the same method can appear
in the candidate set. Often, one declaration overrides the others; in this
situation, the third rule for method specificity implies that the overriding
declaration will be chosen. However, this is not always the case, for example,
when an interface extends two other interfaces, both of which declare a
method m with the same number and type of arguments. In this case,
searching type T for this m will be ambiguous.
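The applicability and specificity steps can be seen at work in the following small example. The class OverloadDemo and its methods are invented for illustration. All overloads are declared in one class, so the example does not depend on the third condition of the "more specific" relation.

```java
class OverloadDemo {
    static String m(Object o) { return "m(Object)"; }
    static String m(String s) { return "m(String)"; }

    static String n(long x)   { return "n(long)"; }
    static String n(double x) { return "n(double)"; }

    static String b(byte x)   { return "b(byte)"; }

    static String callWithString() {
        // Both m(Object) and m(String) are applicable; m(String) is more
        // specific because String is a subtype of Object.
        return m("hi");
    }

    static String callWithObject() {
        Object o = "hi";
        // Only the static type of the actual matters: m(String) is not applicable.
        return m(o);
    }

    static String callWithInt() {
        // int is compatible with both long and double; n(long) is more
        // specific because long is compatible with double.
        return n(5);
    }

    static String callWithByte() {
        // b(12) would be rejected: no narrowing is applied to the constant.
        return b((byte) 12);
    }

    public static void main(String[] args) {
        System.out.println(callWithString() + " " + callWithObject() + " "
                + callWithInt() + " " + callWithByte());
    }
}
```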
Reclassifying AmbiguousNames
Assume that N is an AmbiguousName that appears in the declaration
of class or interface T of compilation-unit C of package
P. The following rules are used to reclassify AmbiguousNames (see
p. 91 of the Java spec):
-
When N is a simple identifier:
-
if N appears within the scope of a local variable named N,
then N is an ExpressionName;
-
otherwise, if T has one or more fields named N, then N
is an ExpressionName;
-
otherwise, if C declares a class or interface named N or
imports a type named N via a single-type import declaration,
then N is a TypeName;
-
otherwise, if package P declares a class or interface named N,
then N is a TypeName;
-
otherwise, if a type named N is declared by one or more on-demand
import declarations of C, then if N is declared by
exactly one of them, then N is a TypeName; otherwise N is ambiguous;
-
otherwise, N is a PackageName.
-
For N equal to A.I, an ambiguous name followed
by an identifier, recursively reclassify A, then:
-
if A is classified as a TypeName or ExpressionName, then A.I
is an ExpressionName;
-
if A is classified as a PackageName and package A declares
a class or interface named I, then A.I is
a TypeName;
-
otherwise, A.I is a PackageName.
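The following small example walks through three reclassifications. The class ReclassifyDemo and its nested class Holder are made up for this illustration; the comments trace the rules above for each AmbiguousName.

```java
class ReclassifyDemo {
    static class Holder {
        int length = 7;
    }

    static int localWins() {
        Holder String = new Holder();   // a local variable that happens to be named String
        // "String" is in the scope of a local variable of that name, so it is
        // reclassified as an ExpressionName, not as the type java.lang.String.
        return String.length;
    }

    static int typeName() {
        // No local variable or field is named Integer; the type is found
        // through the implicit on-demand import of java.lang, so "Integer"
        // is reclassified as a TypeName.
        return Integer.MAX_VALUE;
    }

    static int packageName() {
        // "java" matches no variable, field, or type: PackageName.
        // "java.lang": package java declares no type named lang: PackageName.
        // "java.lang.Integer": package java.lang declares Integer: TypeName.
        return java.lang.Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(localWins() + " " + typeName() + " " + packageName());
    }
}
```

The first case shows why the rules check variables before types: declaring a variable can change the meaning of a name that would otherwise denote a type.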
Package queries
The Java spec says that "[e]ach Java host determines how packages ... are
created and stored" (p. 115). By "Java host," they mean a set of Java tools
(I think), so ESC/Java would be a "host system." Thus, we have to define
(and publish) how packages work in ESC/Java.
Before doing this, it's important to note that neither the dynamic nor
static semantics of Java depend on whether or not a particular package
"exists" in the sense of being declared. Rather, the Java semantics depends
only on a definition of whether or not a package with a given name contains
a declaration of a type with a given name. Thus, we are concerned only
with this latter question.
More specifically, this section gives a definition of package queries.
A package query takes a PackageName P and a simple name I
and returns either the class or interface named I in package P
or an indication that no such class or interface exists.
Our rules assume that packages are represented as directories in a hierarchical
file system and a "class path" variable that says where in the file system
these directories are found.
The class path variable is a sequence of absolute paths in the file
system. These paths name either directories in the file system or "zip"
files, which, for the purpose of this document, can be viewed as equivalent
to directories.
We assume the function R(P) maps PackageName P to a relative
directory path in the obvious manner (the simple-name components of P
are mapped, in order, to directory-path segments in R(P)). We assume
the operator "+" takes an absolute path and a relative one and combines
them into a new absolute path.
Assume the class path consists of the paths C1, C2, ...,
Cn. The package query "find the type named I in package P"
is answered as follows. Find the lowest i in 1..n such that Ci
+ R(P) contains a file named either I.ji, I.java,
or I.class. If no such i exists, then indicate that
there is no declaration of I in package P. Otherwise, pick
I.ji, I.java, or I.class
--- the first that exists --- and parse it for a declaration of I.
If such a declaration exists, return it; otherwise, indicate that there
is no declaration of I in P.
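The file-selection part of this rule can be sketched as follows. This is a minimal illustration, not ESC/Java's code: the class PackageQuerySketch and its method names are hypothetical, zip files are not handled, and the final step of parsing the chosen file for a declaration of I is left out, so the method returns the file that would be parsed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

class PackageQuerySketch {
    // Extensions in the order they are preferred within one directory.
    static final String[] EXTENSIONS = { ".ji", ".java", ".class" };

    /** R(P): maps a PackageName such as "a.b" to the relative path "a/b". */
    static String relativeDir(String packageName) {
        return packageName.replace('.', '/');
    }

    /** Returns the file to parse for type I in package P, if any. */
    static Optional<Path> find(List<Path> classPath, String packageName, String simpleName) {
        for (Path entry : classPath) {                            // lowest i first
            Path dir = entry.resolve(relativeDir(packageName));   // Ci + R(P)
            for (String ext : EXTENSIONS) {                       // first that exists
                Path candidate = dir.resolve(simpleName + ext);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();   // no declaration of I in package P
    }
}
```

Once a class path entry contains any of the three files, later entries are never consulted, even if the chosen file turns out not to declare I.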
Comparison to Sun's rules:
-
Sun does not look at I.ji files. ESC/Java has them as a way of supporting
annotations apart from source code.
-
Sun's rules use file modification dates and also the source file attribute
of class files to select a class to parse to find a declaration of I.
We use a much simpler scheme.
-
Otherwise, our rules and Sun's should behave the same on "well structured"
file hierarchies. "Well structured" means things like the class file for
a class named I is found in a file named I.class
(rather than K.class). We are writing a checker which checks
a file hierarchy to ensure that it's well structured. For hierarchies that
fail to meet our definition of well structured, the behavior of Sun's tools
differs from ours (and is also hard to understand).
One more, nit-picky note. If P1.P2 is a PackageName, I don't think it's
particularly useful to think of P2 as a "subpackage" of "P1". There are
no special relationships between entities in P1 and P1.P2, for example,
P1.P2 doesn't have access to "hidden" entities of P1. P1 names a package
that's not relevant to package P1.P2. Thus, this document has not made
mention of any concept of "packages" "containing" packages, while the Java
spec does.
(There is one way in which the Java spec treats P1.P2 as related to
P1: it explicitly disallows P1 from containing a class or interface whose
simple name is P2. This is a restriction that our tool for checking class
paths enforces. I don't think this rule is sufficient reason to confuse
matters by introducing the concept of packages "containing" packages. Also,
unlike other entities, the definition of a package is distributed over
a file system, database, or network in a way that's explicitly left undefined
by the language. Thus, I think name resolution is easier to explain when
packages are treated differently from classes, interfaces, members, and
local variables, which is why packages are not considered "declared entities"
in this document.)