More news from the Internet connected-tube-thingy:


Cool –

From: cmck….

I’m going to reference the blog on the landing page tomorrow. I know the readership will be more than pleased that we successfully poked the bear and got some insight from Cliff.


Also –


TheServerSide.com managed to raise the ire of Azul Systems’ Cliff Click, Jr., …


I’m not just a bear, I’m an irate bear!  


Just so’s you know – it takes a lot more than a casual question about “where’s my Java Optimized Hardware” to make me irate.  In fact, that’s a very good question – because the answer is not obvious (ok, it was obvious to me 15 years ago, but I was already both a serious compiler geek and an embedded systems guy then).  But it’s not-obvious enough that quite a few million $$$ have been spent trying to make a go of it.  Let me see if I can make the answer a little more bear-able:


Let’s compare a directly-executes-bytecodes CPU vs a classic RISC chip with a JIT.


The hardware guys like stuff simple – after all they deal with really hard problems like real physics (which is really analog except where it’s quantum) and electron-migration and power-vs-heat curves, etc., so the simpler the better.  Their plates are full already.  And if it’s simple, they can make it low power or fast (or gradually both) by adding complexity and ingenuity over time (at the hardware level).  If you compare the *spec* for a JVM, including all the bytecode behaviors, threading behaviors, GC, etc. vs the *spec* for a classic RISC – you’ll see that the RISC is hugely simpler.  The bytecode spec is *complex*; hundreds of pages long.  So complex that we know that the hardware guys are going to have to bail out in lots of corner cases (what happens on a ‘new’ when the heap is exhausted?  Does the hardware do a GC?).  The RISC chip *spec* has been made simple in a way which is known to allow it to be implemented fast (although that requires complexity), and we know we can JIT good code for it fairly easily.
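To make that ‘new’ corner case concrete, here’s a minimal sketch (my names and numbers, nothing Azul-specific) of the fast-path/slow-path split behind allocation: the common case is a two-instruction pointer bump, and the rare case is a full GC – which nobody wants to commit to silicon.

```java
// Hypothetical sketch of bump-pointer allocation with a software slow path.
public class BumpAllocator {
    private final byte[] heap;
    private int top;           // bump pointer: next free offset
    private final int end;     // end of the allocation region

    public BumpAllocator(int size) {
        heap = new byte[size];
        top = 0;
        end = size;
    }

    /** Fast path: bump a pointer.  Slow path: "GC" and retry. */
    public int allocate(int bytes) {
        if (top + bytes <= end) {       // fast path: it fits
            int obj = top;
            top += bytes;
            return obj;
        }
        return allocateSlow(bytes);     // rare path: punt to software
    }

    private int allocateSlow(int bytes) {
        // A real JVM runs a garbage collector here.  For the sketch,
        // pretend everything is dead and reset the bump pointer.
        top = 0;
        if (bytes > end) throw new OutOfMemoryError();
        int obj = top;
        top += bytes;
        return obj;
    }
}
```

The fast path is trivial to JIT (a compare and an add); the slow path is exactly the kind of corner case a bytecodes-in-hardware design has to bail out on anyway.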


When you compare the speed & power of a CPU executing bytecodes, you’ll see lots of hardware complexity around the basic execution issues (I’m skipping over lots of obvious examples, but here’s one: the stack layout sucks for wide-issue because of direct stack dependencies).  When you try to get the same job done using classic JIT’d RISC instructions the CPU is so much simpler – that it can be made better in lots of ways (faster, deep pipes, wide issue, lower power, etc).  Of course, you have to JIT first – but that’s obviously do-able with a compiler that itself runs on a RISC.
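To make the stack-dependency point concrete, here’s the same expression in both forms.  The stack bytecodes in the comment are what javac emits for this method; the register form is an illustrative three-address sketch, not any particular ISA:

```java
public class StackVsRegister {
    // Compiling (a*b) + (c*d):
    //
    // Stack bytecodes (javac output):     Register / 3-address form:
    //   iload_0   // push a                 mul t1, a, b   // independent...
    //   iload_1   // push b                 mul t2, c, d   // ...of this one
    //   imul      // pop 2, push a*b        add t3, t1, t2
    //   iload_2   // push c
    //   iload_3   // push d
    //   imul      // pop 2, push c*d
    //   iadd      // pop 2, push sum
    //   ireturn
    //
    // Every stack op reads and writes the top-of-stack, so a naive
    // wide-issue decoder sees one long dependence chain.  The register
    // form makes it explicit that the two multiplies are independent
    // and can issue in the same cycle.
    public static int eval(int a, int b, int c, int d) {
        return (a * b) + (c * d);
    }
}
```

A JIT recovers the register form from the stack form easily (abstract-interpret the stack); hardware has to do the same discovery on every execution.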


Now which is better (for the same silicon budget): JIT’ing+classic-RISC-executing or just plain execute-the-bytecodes?  Well… it all depends on the numbers.  For really short & small things, the JIT’ing loses so much that you’re better off just doing the bytecodes in hardware (but you can probably change source languages to something even more suited to lower power or a smaller form factor).  But for anything cell-phone sized and up, JIT’ing bytecodes is both a power and speed win.  Yes, you pay in time & power to JIT – but the resulting code runs so much faster that you get the job done sooner and can throttle the CPU down sooner – burning less overall power AND time.
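The trade-off above is just break-even arithmetic.  The constants below are made-up illustrative numbers, not measurements from any real chip or JVM – the shape of the argument is the point:

```java
// Back-of-envelope break-even for JIT'ing vs executing bytecodes directly.
// All constants are invented for illustration.
public class JitBreakEven {
    static final double JIT_COST_NS  = 1_000_000; // one-time cost to JIT a method
    static final double DIRECT_NS_OP = 20;        // per-bytecode cost, executed directly
    static final double JITTED_NS_OP = 2;         // per-bytecode cost, as JIT'd RISC code

    /** Bytecode executions after which JIT'ing has paid for itself. */
    static long breakEvenOps() {
        return (long) Math.ceil(JIT_COST_NS / (DIRECT_NS_OP - JITTED_NS_OP));
    }
}
```

With these (invented) numbers the JIT pays for itself after a few tens of thousands of bytecode executions – nothing for any long-running app, but a real loss for a tiny run-once workload, which is exactly the split described above.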


Hence the best Java-optimized hardware is something that makes an easy JIT target.  After that Big Decision is made, you can further tweak the hardware to be closer to the language spec (which is what Azul did) or your intended target audience (large heap large thread Java apps, hence lots of 64-bit cores).  We also targeted another Java feature – GC – with read & write barrier hardware.  But we always start with an easy JIT target…
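For a flavor of what barrier work looks like when the JIT must emit it in software, here’s a minimal sketch of a card-marking write barrier.  Card marking is just one common software scheme – the names and sizes are mine, and Azul’s hardware barriers are a different design – but it shows the per-store bookkeeping that barrier hardware can absorb:

```java
// Hypothetical card-marking write barrier: a few extra instructions
// a JIT emits on every reference store, so the GC can later find
// old-to-young pointers without scanning the whole heap.
public class CardTable {
    static final int CARD_SHIFT = 9;             // 512-byte cards (a common choice)
    final byte[] cards;

    CardTable(int heapBytes) {
        cards = new byte[(heapBytes >> CARD_SHIFT) + 1];
    }

    /** The extra work on every reference store: mark the card dirty. */
    void writeBarrier(int storeAddress) {
        cards[storeAddress >>> CARD_SHIFT] = 1;  // 1 = dirty
    }

    boolean isDirty(int address) {
        return cards[address >>> CARD_SHIFT] == 1;
    }
}
```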




8 thoughts on “Un-Bear-able”

  1. What is the underlying operating system in Azul machines?

    Yep, as a compiler writer it was immediately obvious to me that executing Java bytecode on a real CPU is a bad idea.

    I’m also curious, what is the underlying operating system in Azul machines? Do you use Linux (or other stock OS) or do you have your own kernel?

    PS: is there a way to play with Azul systems without paying $$$$$$$$?

    • It’s our own

      It’s our own micro-kernel-style OS.  No swap.  No device drivers.  Scheduler for 100’s of CPUs with guaranteed memory & CPU resources.





  2. Thanks for sharing the knowledge

    Reading your blog is always great. I learn so much and feel so stupid at the same time.

  3. No RSS feed link?!

    I may be crazy, but I can’t find an RSS feed link for your new blog. Is there one? If not… well… your readership is about to take a serious dive!


  4. “Reading your blog is always

    “Reading your blog is always great. I learn so much and feel so stupid at the same time.”

ha ha, I feel the same.  Dr. Click, always a pleasure to read your blog.

  5. VM bytecode design choices

Cliff: OK, so Java’s stack-based bytecodes do not map directly onto classic RISC CPUs, but nevertheless can be JITed quite nicely into them.

    My question is if you were designing a new language today, with all that you know now, would you design its bytecodes similar to Java’s?

    Are there lots of other advantages (e.g. in the bytecode compiler, or in the elegance of the spec, or something else) that make Java’s stack-based choice still good?

Or would you implement the bytecodes in a way that more easily maps to high performance hardware, so that custom CPUs could be made which directly execute the bytecodes if need be?

  6. I’d go with a register-based

I’d go with a register-based design, instead of a stack-based one. When Java was first taking off there was a lot of research done into alternate bytecode styles; some of those results looked quite impressive – but it’s “too late”. That horse left the barn long ago.

