
Data Warehouse Appliances - Past, Present and Future

Data Warehouse Appliances

About 10 years ago, around 2003/04, we started to see real-world adoption of the ‘data warehouse appliance’, specifically the Netezza appliance. The Netezza co-founder and CTO, Foster Hinshaw, is credited with being the ‘father of the data warehouse appliance’. The world was seemingly going appliance crazy back then.

The data warehouse appliance concept, as per any appliance, simply means that the entire hardware/software stack - cabinet, servers, storage, OS, DBMS, networking - is delivered pre-configured as a self-contained unit by the vendor. The benefits are that it 'just works' and that there is a 'single throat to choke' when things, ahem, don't work as well as hoped.

Compare and contrast the appliance approach with the more common approach of a mixed bag of general purpose servers (Sun/Dell/HP/IBM), operating system (Windows/Unix/Linux), storage (EMC/Netapp) and database software (Oracle/SQL Server). Not having to spend forever tuning and optimising the traditional non-appliance stack is the main appliance value-add. Appliances are delivered pre-configured and optimised to perform a specific task, in this case data warehousing. As Foster liked to say: “You don’t tune your fridge for milk”. Well said that man.

Netezza's original line of 'Mustang' appliances was ported to IBM blade hardware, unsurprisingly, when IBM purchased the company. No doubt this work started before the purchase and also made the decision to join forces an easy one for both parties. The 'CPU plus FPGA' architecture remained, hence the Netezza 'TwinFin' (2 blades) moniker.

With the rise of Netezza, several folks started to mumble that Teradata had been around since the 1980s and that the Teradata offering was actually an appliance, in fact, an appliance that clearly pre-dated Netezza. We tended to agree. The difference is that from the 1980s until Netezza came on the scene in the early 2000s, Teradata never referred to its offerings as appliances. Teradata offered single node SMP systems, multi-node MPP systems, high capacity systems (high storage per CPU), balanced systems (balanced storage and CPU), high compute systems (low storage per CPU) - all pre-configured and delivered by Teradata, but never something they called an ‘appliance’.

Why did Teradata start to offer systems they actually called ‘Teradata appliances’?

Teradata launched a series of appliances alongside the long-established 'enterprise' systems not long after Netezza started to gain market traction with their ‘data warehouse appliance’ messaging. Teradata differentiated the new appliance offerings from the traditional enterprise offering in key areas:

- inability to scale out the appliance by adding extra SMP nodes

- TASM-lite workload management

- slower/cheaper disk sub-system

None of the above matters a great deal to the end users - a Teradata appliance is still Teradata and works just like the enterprise Teradata systems...for a lot less money!

Why do Teradata and Netezza sell pre-configured data warehouse appliances?

No doubt this is partly because 'engineered systems' (to use the current Oracle vernacular) are designed, configured, packaged and tested to 'just work'. This is a good thing from the perspective of both the vendor and the customer. There is also the significant benefit for the vendor that it makes support a lot easier.

There is another, perhaps less obvious, reason - data warehouse appliance offerings from both Teradata and Netezza contain proprietary hardware: in Teradata's case the BYNET interconnect, and in Netezza's case the FPGAs. The hardware required to run Teradata or Netezza simply does not exist in a commodity off-the-shelf (COTS) stack of server, storage and networking hardware.

So, it could be argued, quite reasonably, that both Teradata and Netezza have no choice but to deliver their respective offerings as a data warehouse appliance or 'engineered system'. They simply can't be put together out of commodity hardware.

Are there any downsides to the data warehouse appliance approach?

Well, with the benefit of 10 years' worth of observations of data warehouse appliances out in the real world, we can think of a few:

- the need to pick the right sized appliance

- large capital outlay to get started

- appliance hardware can't be re-deployed elsewhere

- appliances can't be upgraded easily (if at all)

- difficulty hosting non-standard data warehouse appliances in 3rd party data centres

- each non-production platform (development, UAT, QA, HA etc) requires another physical appliance

The main issue with the data warehouse appliance can be summarised quite simply as: inflexibility.

What does the future hold for data warehouse appliances?

Until relatively recently we'd have suggested that the market was shifting away from expensive 'enterprise data warehouse' class systems towards cheaper 'data warehouse appliance' systems. And quite reasonably so, in our opinion. The move towards COTS hardware has no doubt been fuelled by the Hadoop crowd and their non-reliance on expensive hardware.

However, what we've experienced over the last year or so is a strong preference for 'the cloud' - and this means public cloud - to be the answer. Along with 'Hadoop' and 'Big Data', the cloud is the other technology theme riding the hype cycle at the moment, so this should come as no surprise.

In fact, we've noted that the cloud is not just strongly preferred as the answer; it is almost always the number one choice, if not the only choice, in the minds of customers and prospects alike. Who are we to argue?

The adoption of public cloud as the preferred data warehouse deployment platform presents challenges for vendors whose current data warehouse offering is 'appliance only'. Vendors whose offerings depend on proprietary hardware face an even bigger challenge, since that hardware makes public cloud deployment a non-starter.

Does the cloud spell the end of the road for the data warehouse appliance?

Not in the short term, but there is no doubt that data warehouse users are now demanding a public cloud version of their preferred data warehouse DBMS.

The main motivation: flexibility.