Too Much Theory, Part 3

This is the 3rd and last part of this blog!  (thank heavens!), where I wax poetic about Lattices and Type Theory, and their applications to Compilers and in particular Java JITs.

Part 1: Too Much Theory

Part 2:  Too Much Theory, Part 2

A Quick Recap

We’re building a lattice, to allow us to do exciting optimizations on Java programs.  Our lattice has distinguished top and bottom types, and is Commutative, Associative, and Symmetric.  We understand pointers can be null, not_null, or unknown (e.g. bottom).  Because of symmetry across the lattice centerline, we also have the dual of not_null which we’ll call any_null.  For this recap, we’ll also have Java Classes (e.g. String), but we’re going to ignore subclassing for the moment.  Our lattice looks something like this:

Wow, that’s getting fancy…  lets revisit some of the elements in this lattice real quick:

  • bottom: All possible values, including null, as computed by running the program.  No constants.
  • String:bot: All possible Strings, including null, as computed by running the program.  No constants, no XMLNodes.  For brevity I will sometimes refer to this as String (no trailing :bot notation).
  • XMLNode:bot or plain XMLNode: All possible XMLNodes, including null, as computed by running the program.  No constants, no Strings.
  • bottom:not_null: All possible values, as computed by running the program.  No constants, no null.
  • String:not_null: All possible Strings, as computed by running the program.  No constants, no XMLNodes, no null.
  • String:hello_world: The constant String “hello world”, and nothing else.
  • null: The constant null.
  • String:any_null: All possible String constants, all at once.  No XMLNodes, nor null.  An impossible situation in a real program, but a useful mathematical notion when running Constant Propagation.
  • String:top: All possible String constants and the null constant, all at once.  No XMLNodes.
  • top: All possible constants, including all Strings and all XMLNodes and null, and all them all at once.

Notice the symmetry: ever Node below the centerline of constants also appears in dual form above the centerline.  The edges are also symmetric: e.g. top has 4 edges out and bottom has 4 edges in, String:top has 1 in, 2 out, and String:bot has 2 in, 1 out, etc. This lattice will let us find the constants in code like this example from last time:

final class LinkedListThing {
  LinkedListThing _next; // instance field
  int hashCode() {
    int hash = 0;
    Object x = this; // x is LinkListThing:not_null
    while( x != null ) {
      hash += x.hashCode(); // x is LinkListThing:not_null
      if( x instanceof LinkedListThing )
         x = ((LinkedListThing)x)._next; // x is LinkedListThing
      else // with Conditional Constant Propagation...
         x = "fail"; // ...this path is dead
      // x is LinkedListThing:bottom and not a String
    return hash;


And Now Gentle Readers, Let Us Ponder Subtypes

Java, of course, has subtypes. So does C++ and many other languages. Lets take a look at what it would take to add subtypes to our lattice. This tiny program snippet makes exactly a Hashtable:

  Hashtable x = new Hashtable(); y = x.get(); // Calls Hashtable.get

And this snippet makes a known subclass of Hashtable, but treats it like a Hashtable:

  Hashtable x = new Provider();  y = x.get(); // Calls Provider.get

And this snippet mentions an UNknown subclass of Hashtable:

  void foo( Hashtable x ) {      y = x.get(); // Calls ???

When does ‘Hashtable x’ refer to exactly a Hashtable, and when does it refer to some subclass of Hashtable? Knowing if some value is exactly a Hashtable versus a subclass is very useful: we can optimize calls made to exactly known classes. Example: What function is called when we call “x.get()”? Well, if x is exactly a Hashtable, we get Hashtable.get() … and we can inline “get()”. If x is a Hashtable-or-a-subclass, then “x.get()” might be Provider.get() or Hashtable.get() or some other user-specified derived version of “get()”, and we cannot inline. The job of figuring out if ‘x’ is exactly of class Hashtable, or some subclass of Hashtable falls to Constant Propagation – and that requires we represent this notion of ‘exact class’ in the lattice.

Lets add the notion ‘Hashtable:exact‘ to mean exactly a Hashtable, and NO subclasses, while plain ‘Hashtable‘ allows subclasses. This is an independent axis from allowing null, so we can still have e.g. Hashtable:exact:not_null.  Note that final classes like String cannot subclass and so are always String:exact; constants are always the exact class that they are, e.g. Hashtable:exact:0x1234.

I have another “Lattice” for you to ponder, with Hashtables instead of XMLNodes.  I’m still using bottom in places but I might as well use Object:bot or just plain Object.  Also, I ‘twisted’ the picture slightly: lattice elements in the middle all exclude null, and those on the outer edges all include null.  I had to duplicate the null element to make the graph lay flat visually, but really they are the same element.  I could have laid the graph ‘flat’ on a cylinder, but the web’s not up to 3-D visualization yet.


And Now The Punch Line

And now for the Trick Question: what happens when I end up mixing together (with Phi functions on loop backedges) Hashtable:top with String:exact:not_null?  (A different question is how I come to such a situation that I attempting to mix these types… but trust me I’ve seen this happen in QA from suitably complex programs!)  So back to my Question: I am look for the most precise answer possible, anything less and I’ll be losing type information for no good reason.  So for example bottom is a correct answer, but very conservative – we can do better than that!

How about bottom:not_null?  Since for Hashtable:top the compiler can pick any Hashtable it wants – we want it to pick a Hashtable, ANY Hashtable.  The result of mixing that with String:exact:not_null is … some random not-null Object: bottom:not_null.  This might let me remove a null-pointer check at compile time.

But we could also take just the null from Hashtable:top, since the compiler is allowed to pick that instead!  Remember: the top label means the compiler “gets to choose” during the course of Constant Propagation – and picking a more precise answer makes it more likely we’ll find a constant and have our choice remain correct.  So mixing null and String:exact:not_null yields String:exact.  This might, e.g., let me convert an unknown x.hashCode() into a known String:hashCode() call.

So which one do I pick?  bottom:not_null or String:exact?  Typically I have no idea during the course of CP which one will pan out better – or actually if either will lead to a better final answer.  The Real Trick here is: I should not be required to pick.  The Constant Propagation algorithm relies on having a lattice – and lattices are defined as having a unique lower-bound (and unique upper-bound) for any two elements.  This structure does NOT have a unique lower-bound, and so is no longer a lattice!  For some particular Java program and some particularly bad luck with our CP algorithm – the CP algorithm can hang indefinitely, or miss some constants on some runs – but find those constants on other runs for the same Java program.  Garbage-In Garbage-Out.  CP is technically wedged.

In practice HotSpot’s CP does not hang, although I suspect I can carefully arrange a Java program for which it would.  Instead, I end up triggering asserts in QA about how my lattice is lacking the proper symmetry (and hence losing constants for no other good reason, but only in weird programs).  I *did* construct 2 simple programs for which choosing bottom:not_null versus String:exact would find a constant (and enable an optimization) in one program and not the other… and vice-verse for reversing  choices.



As you might have guessed by finding this blog, I’m off to a new adventure – this time using a JVM instead of building one (well maybe: I’m very pragmatic; may the best language win!).  And while I’ll probably still be tinkering in HotSpot from time to time, I think my future blogs will mostly be about my new adventure!

— till next time, Cliff


4 thoughts on “Too Much Theory, Part 3

  1. A few random thoughts.

    First off, I wouldn’t have thought that Object is the same as bottom, because Object is semantically NOT a supertype of interfaces.

    Secondly, could you solve the problem by NOT making “null” the same element everywhere? The way I would imagine this working is to make “null” typed in the lattice, so that String:null and Hashtable:null are different elements.

    The null constant would be easy to handle: it is top:null. The complication would come when dealing with a case such as the following. You’d like to eliminate the cast here:

    Object o = new String(“foo”);
    String s = (String)o;

    and make this throw an exception without doing the class test:

    Object o = new String(“foo”);
    XMLNode n = (XMLNode)o;

    This, however, is valid, if silly:

    Object o = (String)null;
    XMLNode n = (XMLNode)o;

    But then, even if null is a single element in the lattice, you have the same problem:

    someMethod(XMLNode param)
    Object o = param;
    String s = (String)o; // Can’t optimise the cast away just in case param ls null.

    Perhaps this is all just evidence that null isn’t a mathematically sound concept?

    • In the Azul JVM version of the server compiler (NOT Oracle HotSpot), interfaces became a set-like attribute; any given class had its position in the classic Java hierarchy, and also a list of the set of interfaces that it had to implement. So interfaces became orthogonal to the class hierarchy.

      Actually, I like the notion of the null constant being top:null; I haven’t pondered that version deeply enough. Basically your solution claims null is all possible classes, which pretty much makes sense.


      • That’s an interesting way of doing it, but I think you’d lose quite a bit of precision for inferring interfaces. But I guess that would depend what you’re using the analysis for.

        I’ve wondered for a while if it would make sense for Java to have, possibly under the covers, an interface from which all interfaces are derived, akin to Object.

        What this indicates to me is that language designers really should think about this sort of thing up front. Would inheritance hierarchies make more sense if they were semi-lattices to begin with, with only a few pseudo-elements required to make it a (not necessarily symmetric) lattice?

        The sad truth is that for non-academic languages, semantics is usually an afterthought.

        • Actually I get a lot MORE precision about interfaces when I treat them as a set, because (again until the old setup) – interfaces would *replace* the class heirarchy – the server compiler would understand one or the other but not both. For Azul, we get BOTH.

Leave a Reply

Your email address will not be published. Required fields are marked *