Progress + Vacation

It’s been a freak’n month since I last blogged!  Where’s the time gone???

Mostly I’ve been furiously coding.  ‘wc *java’ of our ‘src’ directory now reports 31500 lines.  We’ve cleaned up and CSS’d the web interface.  We added LevelDB to handle zillions of small K/V pairs; larger ones go to the local file system directly, and of course we still handle S3 and HDFS natively (either using an existing hadoop install, or directly *being* a distributed hadoop).  We’re still 100% peer-to-peer, even for the direct HDFS stuff.  Last week I hacked a concurrent Patricia Trie (leaving the making of a *distributed* concurrent Trie for later, but now I know how to do it…). Then we ran all 36Gig of Wikipedia data through WordCount, using that Trie – it took less than an hour on 1 node.
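For readers who want a feel for the WordCount-over-a-Trie idea, here is a minimal sketch. It is an assumption-laden illustration, not our actual code: it is a plain character trie (no Patricia-style path compression), it leans on ConcurrentHashMap and LongAdder from the JDK for thread safety, and the names TrieWordCount, add and count are made up for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: a concurrent (uncompressed) trie used as a word counter.
final class TrieWordCount {
    // One node per character; children are created lazily and safely under contention.
    static final class Node {
        final ConcurrentHashMap<Character, Node> kids = new ConcurrentHashMap<>();
        final LongAdder count = new LongAdder(); // number of words ending at this node
    }

    private final Node root = new Node();

    // Walk (or build) the path for 'word', then bump the counter at its last node.
    public void add(String word) {
        Node n = root;
        for (int i = 0; i < word.length(); i++) {
            n = n.kids.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        n.count.increment();
    }

    // Number of times 'word' was added; 0 if the path does not exist.
    public long count(String word) {
        Node n = root;
        for (int i = 0; i < word.length() && n != null; i++) {
            n = n.kids.get(word.charAt(i));
        }
        return n == null ? 0 : n.count.sum();
    }

    public static void main(String[] args) {
        TrieWordCount t = new TrieWordCount();
        // Many threads may call add() at once; parallel() is just a cheap way to show it.
        java.util.Arrays.stream("the cat and the hat and the bat".split(" "))
                .parallel()
                .forEach(t::add);
        System.out.println("the=" + t.count("the") + " and=" + t.count("and"));
    }
}
```

A real Patricia Trie would collapse single-child chains into one edge, which matters a lot for memory at Wikipedia scale; this sketch skips that to keep the concurrency part visible.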

This week it’s about running a Linear Regression *distributed*, using distributed Fork/Join as the programming paradigm.  Also, integrating a HashMap-in-a-Value (so we can pass about & maintain the Map interface in the Value piece of our K/V store – think: distributed JS objects), plus the final bits of VectorClocks (all behind the scenes; the VCs will let us do atomic update and strong coherence of Keys but they’re a horrible API to expose).  We’re building a toolkit approach to solving the problem of building a reasonable database over the Cloud.  Either (distributed) Patricia Tries or (distributed) Concurrent Skip Lists for range queries, plus JS-like objects in Values, plus atomic (transactional) update of individual JS objects using a Compare-And-Swap-like approach (instead of locking: CAS is much faster under load, as threads can optimistically make progress).
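To make the CAS-instead-of-locking idea concrete, here is a small single-JVM sketch. It is only an illustration under assumptions: the "JS-like object" is modeled as an immutable Map held in a java.util.concurrent.atomic.AtomicReference, and the names CasValue and update are invented for this example. The distributed version (with VectorClocks behind the scenes) is a different beast.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch: optimistic, lock-free update of a map-shaped Value.
final class CasValue {
    // The current snapshot; never mutated in place, only swapped wholesale.
    private final AtomicReference<Map<String, Integer>> ref =
            new AtomicReference<>(Collections.<String, Integer>emptyMap());

    public Map<String, Integer> get() {
        return ref.get();
    }

    // Copy the current snapshot, apply the edit, and try to CAS it in.
    // If another thread won the race, throw the copy away and retry.
    public Map<String, Integer> update(Consumer<Map<String, Integer>> edit) {
        while (true) {
            Map<String, Integer> old = ref.get();
            Map<String, Integer> copy = new HashMap<>(old);
            edit.accept(copy);
            Map<String, Integer> next = Collections.unmodifiableMap(copy);
            if (ref.compareAndSet(old, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CasValue v = new CasValue();
        Runnable bump = () -> {
            for (int i = 0; i < 1000; i++) {
                v.update(m -> m.merge("hits", 1, Integer::sum));
            }
        };
        Thread a = new Thread(bump);
        Thread b = new Thread(bump);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(v.get()); // no increments are lost, despite no locks
    }
}
```

The edit function may run more than once if the CAS fails, so it should be free of side effects; that is the price of letting threads make progress optimistically.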

More on all of the above later this week – as we have a hard deadline to finally *open* our Open Source project.  Yeah, yeah, yeah, I’ve been hassled plenty about calling ourselves Open Source and not (yet) having any open source… we’ve been trying to get the basics done first… but the real news:  I’m finally going on Vacation!!!

Yes, Nessie, the 31′ 7-ton Class C RV of Doom is being prepared for our 7000 mile Epic Cross-Country Journey.  I’ve been wanting to do this for a decade now: take the entire clan (7 of us!) across country, touring all the junk tourist traps we can and visiting our scattered family as we go.  We’ve got family in Tucson AZ, San Antonio TX (well, Luling really), Houston, Atlanta, DC area, and Connecticut.  I’m giving an invited lecture at UIUC on our way back, and have been assured I can use that lecture as a reason to declare this a “business trip”, and deduct all the gas and mileage costs – I figure about $3500 in gas alone.  We’re stopping at Stone Mountain in GA over the 4th of July, visiting my brother and camping at the lakeside facing the mountain where we’ll watch the fireworks and laser show from the RV roof.  We’re going to visit Carlsbad Caverns.  We’ll pass through DC and maybe attempt the Smithsonian (not sure about that one; depends on the schedule and how badly I want to fight the RV through DC traffic).  We’re visiting my Uncle’s classic family farm in Connecticut where my 4 cousins live – all my age, all married with 3 to 4 kids each… all about the same age as my 4 kids.  We’re talking now about 15 to 20 nieces and nephews, plus Aunts & Uncles galore, and of course pigs and chickens and horses.  It’ll be a regular zoo.

So if you see a large white whale heading east on I10 with a frazzled Shelley or my excited 19yr-old at the helm, honk, wave Hi and give us a wide berth…



Quote(s) of the Month from Kevin Normoyle (Sun/Sparc & Azul L2 Cache Designer Extraordinaire, Cache Coherence Advisor to 0xdata):

Reminds me of CS101, on one of my first programs.  The grader wrote in big red letters over my big comment block:
“Don’t document your bugs, Fix them”

So I asked Kevin if I could quote him, and I got this response back:

ah that’s fine…I spout “Advice” left and right to everyone… Many dismiss it as “Rant”.  There’s always that fine line between being a Prophet, and just another crazy guy standing on the corner yelling.  One could argue that everyone who ever posts to Twitter is an “Advisor” of some sort, to the world.

Sound advice, from a (reluctant) adviser to the world.

4 thoughts on “Progress + Vacation”

    • We’ve made some JNI wrappers. We’re using LevelDB for handling persistence of zillions of tiny K/V pairs on a single node; we’re doing the distribution & repair logic.

  1. Are there any less mathematically oriented papers on VectorClocks or distributed computing in general? Something that can be used to code simple prototypes in Java.
