Team VLDB were delighted to attend the Pivotal UK launch event in London on Wednesday evening this week. The champagne, wine and beer certainly flowed. We're guessing not everybody made their morning appointments the next day. You know who you are!
Anyway, we were chatting away with the Pivotal folks in Shoreditch's award-winning McQueen Bar. A certain Mr Day reminded us that we first met in Vienna at an EMC conference 4 years earlier, on what was his first day at EMC Greenplum. He also pointed out that yours truly explained the concept of massively parallel processing (MPP) using a supermarket analogy. Cue much rolling of eyes from messrs Jackson and Agnew. Thanks chaps.
So, without further ado, let's move on to said analogy as an aid to understanding parallel processing.
The traditional data processing stack consists of a multi-core SMP server plus some storage - think Windows and SQL Server on a Dell box, or Linux and Oracle on a Sun box, for example. Although this is by far and away the most dominant setup in use today, and has been for many years, processing bottlenecks abound:
"The bottleneck in the scalability of SMP using buses or crossbar switches is the bandwidth and power consumption of the interconnect among the various processors, the memory, and the disk arrays." (source: http://en.wikipedia.org/wiki/Symmetric_multiprocessing).
When users complain about performance, or when requirements change, the answer is to add more 'stuff' to the SMP stack, or to upgrade the software and fiddle with settings. Adding more hardware in the form of CPU or RAM is know as 'scaling up' and leads to what is known as 'fat nodes', where a node is an SMP server.
The issue of SMP scalability, or lack thereof, and the scaling up response, is explained with the supermarket analogy:
The challenge is to retrieve a list of groceries - let's say 40 items - from a supermarket containing 20 aisles. We do not know where the items are to be found, so we set off and walk up and down each aisle until we have all 40 items in our basket. So far so good. However, what can we do if we took longer than expected? Well, we can run up and down the aisles - that should knock a bit of time off. Still too long? Run faster! Still too long? Ask a faster runner to take over. The point here is that there is a limit - which was built in right from the start - as to how fast we can run up and down the aisles in search of the required items. Once we have somehow managed to convince Usain Bolt to take over we simply can't go any faster. It's also going to be very expensive to hire Mr Bolt, eh Mr Branson?
So, what can we do?
Easy - sack Mr Bolt and get some helpers. We've been out into the car park and asked the kids on skateboards to come and help. In fact, we've enlisted the help of 10 of them - 1 per 2 aisles. All they have to do is walk up and down 2 aisles each and retrieve the items on the list that happen to be in their aisles. Some will find more items than others, but that doesn't matter. The important point is that each teenager has only had to walk up and down 2 aisles. They can manage that faster than Usain Bolt can run up and down 20 aisles, even whilst messaging their friends and tripping over their ill-fitting jeans. This is the power of parallelism - chopping a big problem up into more easily solvable smaller problems.
So, what if we want to go faster still?
Easy - get even more helpers. If we ask another 10 teenagers to help, we end up with 20, meaning each teenager only has to walk up and down a single aisle. We've increased the number of helpers by 2x, which will speed things up by 2x. This is known as linear scalability. The key difference here to the previous attempt to speed things up is that we don't have to
a) swap the teenagers for faster runners or
b) ask each teenager to stop messaging, pull up their jeans and run faster. We've achieved a 2x speed up by simply assigning less work to each teenager.
This has been achieved by 'scaling out'.
Let's hope that makes sense. Parallel processing is not an easy topic to digest.
I've always been a big analogy fan. Many thanks to Rick Day for the reminder about how supermarkets can be used to explain parallel processing.