Data Warehouse Expansion/Upgrade via SMP Scale-Up

Most data warehouses run on SMP servers with SAN/NAS as the storage.

How do we know this? Well there can't be many data warehouses running on a single CPU, surely? There aren't many running MPP - maybe a few thousand globally (Teradata - thousands, Netezza - hundreds)? That leaves the rest, probably tens or hundreds of thousands of systems running the traditional SMP stack. So what happens when Johnny User inevitably complains about query performance, and/or lack of  space to hold his wondrous creations? Fast is never fast enough and more space is never enough. This is hardly new news to most of us.

So, when the inevitable complaint reaches IT what happens? Well, most of the time an 'upgrade' is proposed. The sales guy promised it would be X times faster or Y% bigger, the users are footing the bill (ho, ho!) and the techies are more than happy noodling away adding better/faster/more 'stuff' to the system. An all round winner, surely?

This is the 'SMP scale-up' or 'fat node' approach in action - throwing more 'stuff' at the problem e.g CPUs, RAM, disks and making a fatter SMP node. But, crucially, it's a fatter single SMP node.

What are the benefits of this approach:

- uses 'commodity' x86 servers from HP, Dell, IBM etc

- uses existing SAN/NAS infrastructure

- DBMS skills abundant in-house

- OS skills abundant in-house

Even when there is compelling reason for a platform change, at least in the eyes of a 3rd party observer, this is often the last thing that will get discussed. Why? Well, 'turkeys don't vote for Xmas' and DBAs don't vote for change."Will this work?" asks users or management..."of course it will, after we've upgraded to version X and added more Y..." reply the techies.

So off they go and build a star schema with a plan to "add more stuff" as and when it's needed as the get-out-of-jail-card.SMP data warehouses use general purpose/OLTP databases such as SQL Server or Oracle to support BI, which is not what they were designed for. Fundementally though, the underlying single SMP 'fat node' does not scale and every upgrade costs more than the last - diminishing returns in action.

'Fat node' drawbacks include the following:

- poor load performance

- poor backup performance

- poor query performance

- poor query concurrency

- poor unload performance

- poor query optimisation

- impact on shared SAN/NAS users

It's also hard to know why it doesn't work when the stack consists of a general purpose DBMS running on SMP tin and SAN/NAS storage, which is often shared  ('not my fault' syndrome). Ironically, none of the components are normally broken, it's just resource contention inherent in the approach.

So, if the SMP 'fat node' stack is hard to tune and lacks scalability, why do most data  warehouses end up getting built this way? Well, at the outset it looks/feels like the right thing to do ("we have all the parts and the skills") and there is often a mistaken belief that it will work. The rest of the industry can't be wrong, after all.

Data volumes in the past may have meant that most sites could actually get away with this approach, but this is increasingly not the case. The evidence in support of this is the recent advent of Oracle Exadata and Microsoft SQL Server 2008 R2 Parallel Data Warehouse ("Madison").

Oracle and Microsoft clearly recognise the limits inherent in the SMP 'fat node' approach to data warehousing.The future is parallel ;-)

Footnote - I actually presented a paper on SMP and MPP scaling several years ago.