ESCJ 5: Resolving names in Java

Last modified: August 29, 1997

Names and declared entities

Classes, interfaces, members (that is, fields and methods), and local variables are declared entities referred to using names. Name resolution is the process of determining to which declaration a name refers.

Not all names refer to declared entities. For example, the labels named in break statements refer to statements, which are not declared entities as defined in this document. Names can also refer to packages, which this document likewise does not treat as declared entities. (This is a deviation from the Java spec, which talks about packages as if they were declared entities and calls the package statements that can occur at the top of source files "package declarations." We discuss this deviation at the end of this document.) [Todo: give a complete enumeration of such names.]

Methods are a special case of declared entities in that the same method can have more than one declaration (for example, when a subclass overrides an inherited method). We defined name resolution in terms of finding declarations with this issue in mind: method names resolve to textual declarations of methods. For example, consider the following code:

  class C {
    int m() { return 1; }
    void test1() { m(); }
  }

  class D extends C {
    int m() { return 2; }
    void test2() { m(); }
  }

In test1, m refers to the declaration of m() that appears in class C, while in test2, m refers to the declaration of m() that appears in class D. (At run time, a call to test1 on an instance of D still dispatches to D's m(); name resolution concerns only which declaration the name denotes.) In the ESC project, we're used to thinking of names as referring to methods rather than to declarations of methods. In Java, thinking of names as referring to declarations is important for understanding method overloading.

Categories of names

Names in Java are classified along two dimensions.

The first dimension pertains to the internal form of names. In this dimension, names are categorized into simple names and qualified names. Simple names are atomic identifiers, such as x. Qualified names are sequences of simple names separated by dots ('.'), such as java.lang.String.

The second dimension pertains to how names are resolved. In this dimension, there are five categories: PackageName, TypeName, ExpressionName, MethodName, and AmbiguousName. Pages 90 and 91 of the Java spec explain how the parser should classify names in this dimension:

Resolving the meaning of a name is the process of determining the declaration to which the name refers. TypeNames, ExpressionNames, and MethodNames are each resolved using different rules. AmbiguousNames are resolved by first reclassifying them as one of the other kinds of names and then applying the rules applicable to the resulting kind of name.
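To make the categories concrete, the hypothetical class below is annotated with the category each name receives from its syntactic context, according to our reading of pp. 90 and 91 of the Java spec. Only the annotations matter; `Cell` and `NameCategories` are made-up names, while `java.util.Date` and `System.out` are standard library names.

```java
import java.util.Date;               // "java.util.Date": TypeName (single-type import)
                                     // "java.util" and "java": PackageNames

class Cell {                         // hypothetical helper class
    int value;
}

class NameCategories {
    Cell cell = new Cell();          // "Cell" (twice): TypeName
    Date created = new Date(0L);     // "Date" (twice): TypeName

    int update(int v) {
        cell.value = v;              // "cell.value": ExpressionName (left side of "=")
                                     // "cell": AmbiguousName (left of "." in an ExpressionName)
                                     // "v": ExpressionName
        System.out.println(v);       // "System.out.println": MethodName (followed by "(")
                                     // "System.out" and "System": AmbiguousNames
        return cell.value;           // "cell.value": ExpressionName
    }
}
```

Reclassification later turns "System" into a TypeName, "System.out" into an ExpressionName, and "cell" into an ExpressionName.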

PackageNames (and AmbiguousNames that are reclassified to PackageNames) do not refer to declarations and are not "resolved" in the same sense that the other kinds of names are. Instead, PackageNames are used in package queries. A package query returns the declaration of a class or interface with a given simple name within a package of a given name. Package queries appear in the rules for resolving Type-, Expression-, and MethodNames and also in the rules that reclassify AmbiguousNames. The exact semantics of package queries is left "host-system dependent" (see Section 7 (p. 113) of the Java spec).

The next three sections give the rules for resolving TypeNames, ExpressionNames, and MethodNames. The following section gives the rules for reclassifying AmbiguousNames. The final section gives a semantics for package queries defined in terms of the file system and a "class path"; this semantics is meant to be similar to the rules used by Sun's command-line tools.

Resolving TypeNames

Given a TypeName N in compilation unit C declared to be part of package P, the following rules determine what N denotes (taken from p. 93 of the Java spec):
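As an illustration, here is a minimal sketch of how a resolver might apply these rules to a simple TypeName, following our reading of p. 93 of the Java spec: a type declared in C, or imported into C by a single-type import, comes first; then a type declared elsewhere in package P; then a type found through exactly one type-import-on-demand declaration (java.lang.* is always imported on demand); otherwise the name is ambiguous or undefined. For a qualified TypeName Q.Id, Q is a PackageName and the answer comes straight from a package query. The `Unit` and `TypeNames` classes are hypothetical, not ESC/Java's data structures, and the `packageHas` predicate stands in for a package query.

```java
import java.util.*;
import java.util.function.BiPredicate;

// Hypothetical model of a compilation unit C in package P (not ESC/Java's data structures).
final class Unit {
    final String pkg;                         // P, or "" for the unnamed package
    final Set<String> declaredTypes;          // simple names of classes/interfaces declared in C
    final Map<String, String> singleImports;  // "import a.b.T;" recorded as T -> a.b.T
    final Set<String> onDemandImports;        // "import a.b.*;" recorded as a.b

    Unit(String pkg, Set<String> declaredTypes, Map<String, String> singleImports,
         Collection<String> onDemandImports) {
        this.pkg = pkg;
        this.declaredTypes = declaredTypes;
        this.singleImports = singleImports;
        this.onDemandImports = new LinkedHashSet<>(onDemandImports);
        this.onDemandImports.add("java.lang");   // java.lang.* is imported implicitly
    }
}

final class TypeNames {
    static String qualify(String pkg, String id) {
        return pkg.isEmpty() ? id : pkg + "." + id;
    }

    // Resolves a simple TypeName n appearing in unit c to a fully qualified name.
    // packageHas.test(p, i) stands in for the package query "does p declare type i?".
    static String resolveSimple(String n, Unit c, BiPredicate<String, String> packageHas) {
        // Declared in C or single-type-imported into C (clashes are not checked here).
        if (c.declaredTypes.contains(n)) return qualify(c.pkg, n);
        if (c.singleImports.containsKey(n)) return c.singleImports.get(n);
        // Declared in another compilation unit of package P.
        if (packageHas.test(c.pkg, n)) return qualify(c.pkg, n);
        // Found through exactly one type-import-on-demand declaration.
        List<String> hits = new ArrayList<>();
        for (String p : c.onDemandImports) {
            if (packageHas.test(p, n)) hits.add(qualify(p, n));
        }
        if (hits.size() == 1) return hits.get(0);
        if (hits.size() > 1) throw new IllegalStateException("ambiguous TypeName " + n + ": " + hits);
        throw new IllegalStateException("undefined TypeName " + n);
    }

    // Resolves a qualified TypeName Q.Id, where Q must be a PackageName.
    static String resolveQualified(String name, BiPredicate<String, String> packageHas) {
        int dot = name.lastIndexOf('.');
        if (dot < 0) throw new IllegalArgumentException("not a qualified name: " + name);
        if (packageHas.test(name.substring(0, dot), name.substring(dot + 1))) return name;
        throw new IllegalStateException("undefined TypeName " + name);
    }
}
```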

Resolving ExpressionNames

We describe the rules for resolving ExpressionNames in the larger context of rules for resolving variable access expressions. A variable access expression is either an ExpressionName or of the form Primary.ExpressionName. Primaries are other forms of expression such as "a[10]" and "a.b(10)"; the exact grammar of Primaries is unimportant for this document.

The following rules for resolving variable access expressions differ from the approach taken by the Java spec (see Sections 6.5.5 (p. 95) and 15.10 (p. 319) for that approach). Instead, they borrow the approach the Java spec takes for methods.

If the name to resolve is a simple name I that appears within the scope of a local variable declaration (including an argument declaration), then I denotes that declaration. Local variables cannot be shadowed, so there can be at most one local variable named I in scope.

Otherwise, the name is resolved to a declaration of a field in a type according to the following multi-step process:

  1. Determine the field's simple name and a type to search.
  2. Select from the type to search all field declarations that are accessible.
  3. Select the accessible declaration from the lowest type in the hierarchy.

Given the field's simple name is I and the type to search is T, the second step determines the candidate declarations for this variable access expression, which are all field declarations for I in T that are accessible to the expression. These candidates can include declarations found locally in T and also declarations in supertypes of T. If the candidate set is empty, then the expression is undefined.

The accessibility of a field declaration to a variable access expression has to do with the access modifiers public, protected, none (package access), and private. Whether a declaration is accessible to an expression depends, in the usual way, on the location of the variable access expression and on the access modifier of the declaration.

The final step selects, out of the candidate declarations, the one declared in a type that is a subtype of the types declaring all the other candidates. If no such declaration exists, then the access is ambiguous.
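The steps above can be sketched over a toy type hierarchy. This is a minimal sketch under simplifying assumptions, not ESC/Java's implementation: the hypothetical `TypeDecl` class records only a package, a superclass, superinterfaces, and the access modifier of each field; the accessibility test is a simplification (Java's rules for protected are finer than shown); and the first step is assumed done, so `resolve` receives the simple name I, the type to search T, and the class in which the access occurs.

```java
import java.util.*;

// Hypothetical, simplified model of a class or interface declaration.
final class TypeDecl {
    final String name, pkg;
    final TypeDecl superclass;                           // null for Object and for interfaces
    final List<TypeDecl> interfaces;                     // direct superinterfaces
    final Map<String, String> fields = new HashMap<>();  // field -> "public", "protected", "", "private"

    TypeDecl(String name, String pkg, TypeDecl superclass, TypeDecl... interfaces) {
        this.name = name;
        this.pkg = pkg;
        this.superclass = superclass;
        this.interfaces = Arrays.asList(interfaces);
    }

    // Records a field declaration; returns this so declarations can be chained.
    TypeDecl field(String id, String access) {
        fields.put(id, access);
        return this;
    }

    // Reflexive, transitive subtype test over superclasses and superinterfaces.
    boolean isSubtypeOf(TypeDecl other) {
        if (this == other) return true;
        if (superclass != null && superclass.isSubtypeOf(other)) return true;
        for (TypeDecl i : interfaces) {
            if (i.isSubtypeOf(other)) return true;
        }
        return false;
    }

    // This type together with all of its supertypes, each listed once.
    Set<TypeDecl> selfAndSupertypes() {
        Set<TypeDecl> seen = new LinkedHashSet<>();
        Deque<TypeDecl> work = new ArrayDeque<>();
        work.push(this);
        while (!work.isEmpty()) {
            TypeDecl t = work.pop();
            if (!seen.add(t)) continue;
            if (t.superclass != null) work.push(t.superclass);
            for (TypeDecl i : t.interfaces) work.push(i);
        }
        return seen;
    }
}

final class FieldResolver {
    // Simplified accessibility of a field declared in 'owner' to code in class 'from'.
    static boolean accessible(String access, TypeDecl owner, TypeDecl from) {
        switch (access) {
            case "public":    return true;
            case "protected": return owner.pkg.equals(from.pkg) || from.isSubtypeOf(owner);
            case "":          return owner.pkg.equals(from.pkg);
            default:          return owner == from;            // "private"
        }
    }

    // Returns the type whose declaration of field i is selected when searching t
    // from code in class 'from'; throws if the access is undefined or ambiguous.
    static TypeDecl resolve(String i, TypeDecl t, TypeDecl from) {
        // Candidates: accessible declarations of i in t and its supertypes.
        List<TypeDecl> candidates = new ArrayList<>();
        for (TypeDecl s : t.selfAndSupertypes()) {
            String access = s.fields.get(i);
            if (access != null && accessible(access, s, from)) candidates.add(s);
        }
        if (candidates.isEmpty()) throw new IllegalStateException("undefined: " + i);
        // Final step: the candidate declared in a subtype of every other candidate's type.
        for (TypeDecl c : candidates) {
            boolean lowest = true;
            for (TypeDecl d : candidates) {
                if (!c.isSubtypeOf(d)) lowest = false;
            }
            if (lowest) return c;
        }
        throw new IllegalStateException("ambiguous: " + i);
    }
}
```

For instance, if A and interface K both declare x, a class that extends A and implements K gets an ambiguous access to x, because neither A nor K is a subtype of the other.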

The expression "super.I" is treated as sugar for "((C)this).I", where C is the direct superclass of the class containing the expression.
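The desugaring agrees with ordinary Java behavior for hidden fields, as the small program below shows: inside `Sub`, both `super.x` and `((Base) this).x` select `Base`'s declaration of x, because a field access is resolved against the static type of the qualifying expression. The class names are illustrative.

```java
class Base {
    int x = 1;
}

class Sub extends Base {
    int x = 2;                                   // hides Base's x

    int viaSuper() { return super.x; }           // Base's declaration of x
    int viaCast()  { return ((Base) this).x; }   // also Base's: resolved against static type Base
    int own()      { return x; }                 // Sub's declaration of x
}
```

The analogous rewriting does not hold for method invocations: `((Base) this).m()` still dispatches to an override of m in `Sub` at run time, whereas `super.m()` does not.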

Resolving MethodNames

As with ExpressionNames, we describe the rules for resolving MethodNames in the larger context of rules for resolving the method designation part of a method invocation. Like variable access expressions, method designations have two forms, MethodName or Primary.MethodName, but they appear in invocations, that is, to the left of a parenthesized list of arguments.

More than one method can have the same name, and the particular method denoted in an invocation expression depends on the types of the arguments passed in the invocation. Resolution for MethodNames must deal with this issue.

The rules given below are taken from Sections 6.5.6 (p. 98) and 15.11 (p. 324) of the Java spec.

Resolution of method designators is best understood as having multiple steps:

  1. Determine the method's simple name and a type to search.
  2. Select from the type to search all method declarations that are accessible and applicable.
  3. Select the most specific of the accessible, applicable methods.

Given the method's simple name is I and the type to search is T, the second step determines the candidate declarations for this invocation, which are all method declarations for I in T that are both accessible and applicable to the invocation. These candidates can include declarations found locally in T and also in supertypes of T. (Multiple declarations for the same method, some overriding the others, may end up in this candidate set.) If the candidate set is empty, then the method designation is undefined.

The accessibility of a method declaration to a method invocation has to do with access modifiers such as public and protected. Whether a method declaration is accessible to a method invocation depends, in the usual way, on the location of the invocation and on the access modifier of the declaration.

The applicability of a method declaration to a method invocation has to do with the number and types of the arguments. Specifically, a method declaration is applicable to an invocation if (a) the numbers of formals and actuals are equal and (b) the type of each actual is "method-invocation compatible" with the type of the corresponding formal.

Method-invocation compatibility of types is defined as follows: a type S is method-invocation compatible with a type T if S can be converted to T by an identity conversion, a widening primitive conversion (such as int to long), or a widening reference conversion (such as a class type to one of its superclasses or to an interface it implements). (For details, see Chapter 5 of the Java spec.)

The final step selects the unique, maximally specific declaration out of the candidates determined in the previous step. If there is no single declaration in the candidate set that is more specific than all the others, then the invocation is ambiguous.
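A sketch of method-invocation compatibility and of the applicability test, for a toy type system in which types are plain strings. The `Compat` class and its `SUPERCLASS_OF` table are hypothetical; interfaces, the null type, arrays, and accessibility are left out.

```java
import java.util.*;

final class Compat {
    // Widening primitive conversions (see Chapter 5 of the Java spec).
    static final Map<String, Set<String>> WIDER = Map.of(
        "byte",  Set.of("short", "int", "long", "float", "double"),
        "short", Set.of("int", "long", "float", "double"),
        "char",  Set.of("int", "long", "float", "double"),
        "int",   Set.of("long", "float", "double"),
        "long",  Set.of("float", "double"),
        "float", Set.of("double"));

    // Hypothetical class table: class name -> direct superclass ("Object" has none).
    static final Map<String, String> SUPERCLASS_OF =
        Map.of("String", "Object", "Integer", "Number", "Number", "Object");

    // Is an actual of type s method-invocation compatible with a formal of type t?
    static boolean compatible(String s, String t) {
        if (s.equals(t)) return true;                                  // identity conversion
        if (WIDER.getOrDefault(s, Set.of()).contains(t)) return true;  // widening primitive
        for (String c = SUPERCLASS_OF.get(s); c != null; c = SUPERCLASS_OF.get(c)) {
            if (c.equals(t)) return true;                              // widening reference
        }
        return false;
    }

    // Applicable: equal numbers of formals and actuals, and each actual compatible.
    static boolean applicable(List<String> formals, List<String> actuals) {
        if (formals.size() != actuals.size()) return false;
        for (int k = 0; k < formals.size(); k++) {
            if (!compatible(actuals.get(k), formals.get(k))) return false;
        }
        return true;
    }
}
```

As in the Java spec, this relation does not include the implicit narrowing of integer constants that assignment conversion allows, so an int actual (even a constant such as 3) is not compatible with a byte formal.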

"More specific" in this context is a binary relation on method declarations that takes into account the names of methods, their argument signatures, and their position in the type hierarchy. More precisely, method declaration M is more specific than method declaration N if and only if:

  1. their names are the same;
  2. they both have the same number of formals;
  3. M is declared in a subtype of the type that declares N ("subtype" here is reflexive);
  4. the type of each formal of M is method-invocation compatible with the type of the corresponding formal of N.

An implication of the third rule is that if a subclass overloads an inherited method in a contravariant manner (that is, it declares a method with the same name as an inherited method but with a more general signature than the inherited method), then an invocation from the subclass whose arguments suit both versions is ambiguous, so a cast will be necessary to call either version of the method from the subclass.
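The more-specific relation and the final selection step can be sketched as follows. `MethodDecl` and `MostSpecific` are hypothetical; the caller supplies the (reflexive) subtype test and the method-invocation compatibility test as predicates, so the sketch does not commit to any type representation. With a subtype test in which B extends A and String extends Object, selecting between A.m(String) and B.m(String) picks the overriding B.m(String), while selecting between A.m(String) and B.m(Object), the contravariant case described above, reports an ambiguity.

```java
import java.util.*;
import java.util.function.BiPredicate;

// Hypothetical model of a method declaration: its name, declaring type, and formal types.
final class MethodDecl {
    final String name, declaringType;
    final List<String> formals;

    MethodDecl(String name, String declaringType, String... formals) {
        this.name = name;
        this.declaringType = declaringType;
        this.formals = Collections.unmodifiableList(Arrays.asList(formals));
    }

    @Override public String toString() {
        return declaringType + "." + name + formals;
    }
}

final class MostSpecific {
    // Rules 1-4: is m more specific than n? isSubtype must be reflexive.
    static boolean moreSpecific(MethodDecl m, MethodDecl n,
                                BiPredicate<String, String> isSubtype,
                                BiPredicate<String, String> compatible) {
        if (!m.name.equals(n.name)) return false;                              // rule 1
        if (m.formals.size() != n.formals.size()) return false;                // rule 2
        if (!isSubtype.test(m.declaringType, n.declaringType)) return false;   // rule 3
        for (int k = 0; k < m.formals.size(); k++) {                           // rule 4
            if (!compatible.test(m.formals.get(k), n.formals.get(k))) return false;
        }
        return true;
    }

    // Final step: the candidate more specific than every candidate, else ambiguous.
    static MethodDecl select(List<MethodDecl> candidates,
                             BiPredicate<String, String> isSubtype,
                             BiPredicate<String, String> compatible) {
        if (candidates.isEmpty()) throw new IllegalStateException("undefined method designation");
        for (MethodDecl m : candidates) {
            boolean maximal = true;
            for (MethodDecl n : candidates) {
                if (!moreSpecific(m, n, isSubtype, compatible)) maximal = false;
            }
            if (maximal) return m;
        }
        throw new IllegalStateException("ambiguous invocation: " + candidates);
    }
}
```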
Notes:

Reclassifying AmbiguousNames

Assume that N is an AmbiguousName that appears in the declaration of class or interface T of compilation unit C of package P. The following rules are used to reclassify AmbiguousNames (see p. 91 of the Java spec):
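A minimal sketch of reclassification, following our reading of the rules on p. 91 of the Java spec. A simple name becomes an ExpressionName if a local variable, parameter, or field of that name is in scope, otherwise a TypeName if a type of that name is in scope, and otherwise a PackageName. For a qualified name Q.Id, Q is reclassified first: if Q is a PackageName, Q.Id becomes a TypeName when package Q contains a type named Id and a PackageName otherwise; if Q is a TypeName or an ExpressionName, Q.Id becomes an ExpressionName. The `Scope` interface is hypothetical: `typeInScope` stands in for the TypeName rules and `packageHas` for a package query, and error cases such as an ambiguous type-import-on-demand are left out.

```java
// Result categories of reclassification.
enum Kind { PACKAGE, TYPE, EXPRESSION }

// Hypothetical view of the declarations visible where the AmbiguousName appears.
interface Scope {
    boolean variableInScope(String id);          // local variable, parameter, or field named id
    boolean typeInScope(String id);              // a class or interface named id (TypeName rules)
    boolean packageHas(String pkg, String id);   // package query: does pkg declare type id?
}

final class Reclassify {
    // Reclassifies a simple or qualified AmbiguousName.
    static Kind reclassify(String name, Scope s) {
        int dot = name.lastIndexOf('.');
        if (dot < 0) {                                   // simple name
            if (s.variableInScope(name)) return Kind.EXPRESSION;
            if (s.typeInScope(name)) return Kind.TYPE;
            return Kind.PACKAGE;
        }
        String q = name.substring(0, dot);
        String id = name.substring(dot + 1);
        Kind left = reclassify(q, s);                    // reclassify Q first
        if (left == Kind.PACKAGE) {
            return s.packageHas(q, id) ? Kind.TYPE : Kind.PACKAGE;
        }
        return Kind.EXPRESSION;                          // Q is a TypeName or ExpressionName
    }
}
```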

Package queries

The Java spec says that "[e]ach Java host determines how packages ... are created and stored" (p. 115). By "Java host," they mean a set of Java tools (I think), so ESC/Java would be a "host system." Thus, we have to define (and publish) how packages work in ESC/Java.

Before doing this, it's important to note that neither the dynamic nor the static semantics of Java depends on whether or not a particular package "exists" in the sense of being declared. Rather, the Java semantics depends only on whether or not a package with a given name contains a declaration of a type with a given name. Thus, we are concerned only with this latter question.

More specifically, this section gives a definition of package queries. A package query takes a PackageName P and a simple name I and returns either the class or interface named I in package P or an indication that no such class or interface exists.

Our rules assume that packages are represented as directories in a hierarchical file system, and that a "class path" variable says where in the file system these directories are found.

The class path variable is a sequence of absolute paths in the file system. These paths name either directories in the file system or "zip" files, which, for the purpose of this document, can be viewed as equivalent to directories.

We assume the function R(P) maps PackageName P to a relative directory path in the obvious manner (the simple-name components of P are mapped, in order, to directory-path segments in R(P); for example, R(java.lang) is java/lang). We assume the operator "+" takes an absolute path and a relative one and combines them into a new absolute path.

Assume the class path consists of the paths C1, C2, ..., Cn. The package query "find the type named I in package P" is answered as follows. Find the lowest i in 1..n such that Ci + R(P) contains a file named I.ji, I.java, or I.class. If no such i exists, then indicate that there is no declaration of I in package P. Otherwise, pick the first of I.ji, I.java, and I.class (in that order) that exists in Ci + R(P) and parse it for a declaration of I. If such a declaration exists, return it; otherwise, indicate that there is no declaration of I in P.
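The query can be sketched with `java.nio.file`, with `Path.resolve` playing the role of "+" and `r` playing the role of R(P). This is a minimal sketch under stated assumptions: every class-path entry is treated as a directory (zip files are not handled), and parsing is left out, so `find` returns the file that would be parsed for a declaration of I rather than the declaration itself. `PackageQuery`, `r`, and `find` are hypothetical names.

```java
import java.nio.file.*;
import java.util.*;

final class PackageQuery {
    // R(P): maps a PackageName such as "java.lang" to the relative path java/lang.
    static Path r(String packageName) {
        String[] parts = packageName.split("\\.");
        return Paths.get(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
    }

    // Returns the file that would be parsed for a declaration of type i in package p,
    // or empty if no class-path entry Ci + R(P) holds I.ji, I.java, or I.class.
    static Optional<Path> find(List<Path> classPath, String p, String i) {
        for (Path ci : classPath) {                                  // lowest i in 1..n wins
            Path dir = ci.resolve(r(p));                             // Ci + R(P)
            for (String ext : List.of(".ji", ".java", ".class")) {   // first that exists
                Path f = dir.resolve(i + ext);
                if (Files.isRegularFile(f)) return Optional.of(f);
            }
        }
        return Optional.empty();
    }
}
```

The class-path order takes precedence over the file-type order: if C1 + R(P) holds only I.class and C2 + R(P) holds I.ji, the query parses the I.class in C1.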

Comparison to Sun's rules:

One more nit-picky note. If P1.P2 is a PackageName, I don't think it's particularly useful to think of P2 as a "subpackage" of P1. There are no special relationships between entities in P1 and P1.P2; for example, P1.P2 doesn't have access to "hidden" entities of P1. P1 names a package that's not relevant to package P1.P2. Thus, this document does not mention any concept of packages "containing" packages, while the Java spec does.

(There is one way in which the Java spec treats P1.P2 as related to P1: it explicitly disallows P1 from containing a class or interface whose simple name is P2. This is a restriction that our tool for checking class paths enforces. I don't think this rule is sufficient reason to confuse matters by introducing the concept of packages "containing" packages. Also, unlike other entities, the definition of a package is distributed over a file system, database, or network in a way that's explicitly left undefined by the language. Thus, I think name resolution is easier to explain when packages are treated differently from classes, interfaces, members, and local variables, which is why packages are not considered "declared entities" in this document.)
