A Non-Blocking HashTable

I’ve been wrestling with concurrent algorithms again.  This time, it’s a Non-Blocking (concurrent, lock-free) HashTable.  I’ve had it basically figured out for a few months now and I’m slowing widening the circle of people I expose it too.  This HashTable seems too good to be true, which is why I’m trying hard to vet it’s correctness before claiming victory.  It’s single-threaded performance appears to equal java.util.HashTable, and at high CPU count (750 cpus on a high-end Azul machine) it’s about twice as fast as Doug Lea’s marvelous java.util.concurrent.ConcurrentHashMap – with 4096-way striping.  This is for very high read-to-write ratios (99% reads, 1% table mods).  For higher write ratios the Non-Blocking table scales much better than ConcurrentHashMap.  Of course, if you lower the 4096-way striping to the default 16-way, then the Non-Blocking table becomes easily 100x faster.  Performance summary:

  1. As fast as java.util.HashTable for single threads
  2. As fast as ConcurrentHashMap for < 32 cpus and >95% read rates
  3. Faster for higher write rates than ConcurrentHashMap.  (2x to 8x faster, depending)
  4. Faster for more CPUs.  Much faster if not striping ConcurrentHashMap enough (easily 100x faster)
  5. Scales linearly for all read/write rates at least up 750 cpus.  Tested on 4-way X86, Niagra, 32-way UltraSparc2, 384-way Azul Vega1 and a 768-way Azul Vega2.

Ok, how do I do it?

 

By writing a State-Machine based algorithm, instead of the usual concurrent programming style.

 

From this idea I get a simple state machine, a short Java implementation of it, and a very fast very scalable algorithm. 

 

First the easy stuff: I assume you have written hash tables in the past (nearly every non-Java programmer does at one point or another; the Java guys get it easy).  A hash table is a collection of {key,value} pairs with fast random access (first access is via the hash function).  I’ve chosen a power-of-2 closed table – collisions cause re-probing into the table (contrast this to an open table where collisions turn into linked-list traversals).  I chose power-of-2 instead of a prime number because ‘AND’ is faster than ‘MOD’ by an amount larger than an L1-miss-L2-hit.  I re-probe by adding 1 and AND’ing to the table size (better cache behavior for repeated re-probes).  Other design decisions make sense here, depending on your exact usage pattern.

 

Now the fun stuff: How do I handle the {key,value} pairs?  This is where the state-machine kicks in.  I define states for each interesting key & value.  For keys the states are {null,K} – where K is any key.  A key-slot makes a 1-time transition from null to some K.  For values the states are {null,V/T} – where V is any value and T is a Tombstone, a special token that is not any valid value and represents a deleted key.  A value-slot makes a 1-time transition from null to some V, and can waffle about being different V’s or T (deleted) according to whatever was the last table modification.

 

This gives me 2 key-states and 3 value-states, so the {key,value} pair has 6 states.

  • {null,null} – Empty
  • {K,null} – Partially inserted key/value
  • {K,V} – Fully functional {key,value} pair
  • {K,T} – Previously inserted, now deleted Key
  • {null,V} – partially inserted K/V pair being read out-of-order
  • {null,T} – partially inserted K/T pair being read out-of-order

 

Now here’s the cool thing: it doesn’t matter what order I read or write these bits within each pair, I always get some sane state.  I can act on that state.  This means I do not need to reason about ordering when either inserting or looking up data in the table!  This also means I do not need any memory fencing or any locking!  I do need CAS when changing the table (but not double-CAS, since I can read out-of-order I can write out-of-order as well – hence I do not need to atomically update both words).  Here, then, is the ‘get(Object key)’ code:
 

&nbsp; Object get( Object K ) {
&nbsp; &nbsp; idx = hash = K.hash();
&nbsp; &nbsp; while( true ) {
&nbsp; &nbsp;&nbsp; &nbsp;idx &amp;= (size-1);
&nbsp; &nbsp;&nbsp; &nbsp;key = table[2*idx+0]; // key/val kept in even/odd pairs
&nbsp; &nbsp;&nbsp; &nbsp;val = table[2*idx+1]; // in the large array
&nbsp; &nbsp;&nbsp; &nbsp;if( key == null ) return null;&nbsp; // a miss
&nbsp; &nbsp;&nbsp; &nbsp;if( K == key || key.equals(K) ) // key hit?
&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; return val == T ? null : val; // a hit (unless deleted)
&nbsp; &nbsp;&nbsp; &nbsp;idx++;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp;&nbsp; &nbsp; // reprobe
&nbsp; &nbsp; }
&nbsp; }

I’ve left out some details (e.g. using the hash to avoid doing some key-compares that are doomed to fail), but this is the gist of it.  ‘get’ is extremely lightweight – just the basic hash, lookup and return value for a hit.  More next time on how I do ‘put()’ as that is where the state machine gets a real workout.  But this post is already too long.

 

Cliff

 

PS – Slides will be online soon, and the source code as well (need to get the license figured out)

Leave a Reply

Your email address will not be published. Required fields are marked *