Holiday Reading: Spark, Shark and MapReduce


There I was, sitting in the shade in the Lake District enjoying 30C temperatures a few months ago (no, really), surfing the web on the iPad mini, as you do. Weather, check. Football, check. Investments, check. Nearby mountain bike trails, check.

I'm not sure how I got there, but I ended up on the AMPLab site at UC Berkeley (UCB), as you do. A bit of 'light holiday reading' on projects such as Spark and Shark filled a few hours. Interesting. No, make that *very* interesting.

I emailed one of the UCB researchers to alert him to a typo in one of the papers he'd co-authored. That's just the type of kind-hearted chap I am. Somewhat predictably, I got a swift response even though it was a Sunday. Geeks are never off duty - unless, of course, someone's PC is broken and they need your help (tell them to get a Mac).

We traded emails back and forth quite a few times. What really struck me was just how much these folks 'get' the space, and that they appeared to be addressing issues that are well worth attention.

The AMPLab folks are building Big Data solutions and making them available as open source Apache projects, which is very commendable. There is also a well-funded startup, Databricks, set up to capitalise on the fact that they developed Spark and Shark. It will be interesting, to say the least, to see how that plays out against MapReduce.

A few months ago we were in discussions with a client about some work we were doing to analyse web log data using AWS. We pitched the idea of using Spark on AWS. The suggestion was well received, so off we went. The results have been very encouraging: using a combination of Spark and Mahout, we've been able to generate high-quality personalised customer recommendations.

As a testament to the Spark vision, it is worth noting its interplay with Cloudera.

If you can spare the time, the first Spark Summit is planned for early December in San Francisco (where else?).

Summary: how we go about doing 'Big Data' is far from a done deal. Watch this space.