Teradata Is Still The Market Leader
When it comes to Data Warehouse databases, there has been a clear market leader for a long, long time. No surprises which technology it is…wait for it…Teradata!
Teradata were first to market with their eponymous analytic database way back in the 1980s, when a Terabyte (TB) was a lot of data. Yours truly was even running queries on a Teradata DBC/1012 back then. The image shows my original Teradata Reference Cards, dated from 1988 onwards.

The Data Warehouse space has hotted up considerably in the last few years. There are now more choices than ever. Teradata might be the market leader, but not everybody needs Teradata.
So, if Teradata is still the best, what about the rest?
Well, for those that have been paying attention, we’ve been *big* fans of Greenplum for a long time.
Given that ‘sharing is caring’ (apparently), here are the top 10 reasons why ‘team VLDB’ are fans of Greenplum.
1. Greenplum Is ‘Built For Anywhere’ Not Just the Cloud
Legacy Data Warehouse systems are optimised for on-premise physical hardware. That’s not surprising as legacy systems pre-date the advent of public clouds such as AWS, Azure and Google.
In contrast, ‘modern Data Warehouse systems’ such as Redshift, BigQuery and Snowflake are ‘cloud only’. None of them offers a non-cloud deployment option. Redshift is AWS only and BigQuery is Google only. There’s not always a lot of choice – pick your database or your platform, but not both.
Greenplum is the Data Warehouse literally ‘built for anywhere’. Greenplum can run on physical hardware, private VMware clouds and public cloud platforms such as AWS, Azure or Google. To keep up with the kool kidz, Kubernetes (K8s) is also supported.
So long as your chosen platform supports Linux, with Greenplum you’re good to go.
2. Greenplum Is MPP
It is well established that general purpose SMP databases, such as Oracle, SQL Server & MySQL, can’t always scale up to meet the demands of a Data Warehouse system.
Greenplum, by contrast, is deployed as a clustered ‘Massively Parallel Processing’ (MPP) architecture. Because every node in the cluster brings its own compute, storage and network bandwidth, an MPP system scales out close to linearly: when more capacity is needed, add more nodes.
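To make the MPP idea concrete, here is a toy sketch in Python. It is not Greenplum code: the hash function, the `customer_id` key and the four-segment cluster are all assumptions chosen for illustration. It only shows the general pattern of spreading rows across segments by hashing a distribution key, letting each segment work on its own slice, and combining the partial results at the end.

```python
import zlib
from collections import defaultdict


def segment_for(key, num_segments):
    """Pick a segment for a distribution key by hashing it.

    crc32 is a stand-in here; a real MPP database uses its own hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments):
    """Spread rows across segments according to the hash of key_field."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def parallel_sum(segments, value_field):
    """Each segment sums its own rows; the partial sums are then combined."""
    partials = {
        seg: sum(row[value_field] for row in seg_rows)
        for seg, seg_rows in segments.items()
    }
    return sum(partials.values()), partials


# Hypothetical data: 100 customers spread over a 4-segment cluster.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 101)]
segments = distribute(rows, "customer_id", num_segments=4)
total, partials = parallel_sum(segments, "amount")
```

Adding segments in this model simply means each one holds and scans fewer rows, which is the intuition behind scaling out rather than scaling up.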

Thanks to our friends at Teradata the MPP architecture has a successful 30 year track record out in the real world.
Greenplum’s scalable MPP architecture allows you to start with a single node and scale the cluster as required to meet your changing capacity, throughput and performance demands.
Thanks to the proven MPP architecture you can always scale out a Greenplum cluster to meet your exact requirements.
3. Greenplum Is Parallel Postgres
The Data Warehouse is often the key reporting, analytic and decisioning system within the modern enterprise. It is important to trust such activities to systems with a demonstrable track record.
Greenplum is a ‘Massively Parallel Postgres’ system, and is the *only* Open Source MPP Data Warehouse system.
The Postgres database has a 30 year track record and is regarded as ‘The World’s Most Advanced Open Source Database’.

Parallel Postgres databases such as Greenplum and Netezza (RIP) have a 15 year track record, and also demonstrate the benefits of building an MPP platform based on Postgres.
Although originally forked from Postgres in 2005, Greenplum is targeted to merge back into the current version of Postgres during 2020.
Greenplum is the only parallel Postgres system with a near-term roadmap to deliver full Postgres alignment and take advantage of the new features developed by the global open source Postgres community.
4. Greenplum Is Backed by Pivotal
The core Greenplum database software is Open Source. It is mainly developed & promoted by Pivotal. Along with technology industry giants such as VMware, RSA and EMC, Pivotal is part of the Dell Technologies group of companies.
Similar to the Linux/RedHat model, paid-for support for Greenplum is provided by Pivotal.
The Pivotal supported version of Greenplum also includes additional value-added components such as the Greenplum Command Centre (GPCC).
Greenplum has the backing of Pivotal, part of Dell. Enterprise grade support is available for the open source Greenplum database.
5. No Evaluation Licence Costs
A ‘proof of concept’ project, or POC, is often an important step to proving value with any new technology. Data Warehouse systems are not an exception to this approach.
The Pivotal licensing model allows Greenplum to be used for free for a fixed period of time during a POC via an evaluation licence.
The features & capability of Greenplum can be evaluated on your chosen infrastructure with no initial Greenplum software licence cost.

Free evaluation licences mean the value of Greenplum can be demonstrated quickly with no wrangling over access to time-limited or feature-limited POC systems.
6. Simple CPU Core Based Pricing
Legacy Data Warehouse systems typically demand that a full hardware/software/storage stack is purchased. The legacy approach can involve significant up-front capital expenditure (‘CapEx’).
Modern cloud-only Data Warehouse systems are often priced on a consumption or ‘pay-as-you-go’ (PAYG) basis. Although the PAYG model avoids up-front CapEx, the consumption-based model can lead to unpredictable and unconstrained operational expenditure (OpEx).
Greenplum is licensed by Pivotal on a subscription basis with simple per CPU core pricing. Pivotal’s compute-based subscription model requires no up-front CapEx, and avoids the risk of unpredictable and unconstrained OpEx, which is often the case with modern cloud-only offerings.

With the same simple per CPU core licence, either deploy Greenplum on premise or via the public cloud – either way there are no further costs.
7. Greenplum Runs on SQL
The combination of a Relational Database Management System (RDBMS) and Structured Query Language (SQL) has been the bedrock of the data management world since the 1970s.
From embedded systems running SQLite, up to peta-scale Data Warehouse clusters, it is an inescapable fact that the data management world still runs on SQL. This is unlikely to change any time soon (while I remember, can someone tell the Hadoop crowd?).
As a clustered Postgres system, Greenplum runs on SQL. To the outside world it looks just like Postgres.
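As a small illustration of how portable plain SQL is, the sketch below runs an ordinary aggregate query from Python. It uses the standard library's SQLite driver purely so that it runs anywhere; the `sales` table and its columns are hypothetical. The assumption is that the same `run_report` function would work unchanged against Greenplum if it were handed a connection from a Postgres driver such as psycopg2, since both follow Python's DB-API.

```python
import sqlite3

# Standard SQL only: nothing here is specific to SQLite, Postgres or Greenplum.
REPORT_SQL = """
SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
FROM sales
GROUP BY region
ORDER BY region
"""


def run_report(conn):
    """Run the report on any DB-API connection and return the rows."""
    cur = conn.cursor()
    cur.execute(REPORT_SQL)
    return cur.fetchall()


# Hypothetical demo data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("south", 250), ("north", 50)],
)
report = run_report(conn)
```

The loading code is SQLite-flavoured (other drivers use different parameter placeholders), but the query itself is the part end users and BI tools care about, and that part does not change.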

There are no new programming languages to learn; no extensive ETL developer, BI developer or end-user training programmes to undertake; no new BI tools to purchase.
Greenplum runs on SQL, which is all you need. You already all know SQL, right?
8. Data Science Is Covered
SQL is fine for traditional Data Warehouse activities such as ELT style ETL, KPI reporting, BI tools and end user queries, but what about those pesky new-fangled ‘Data Scientists’?
The good news is that Data Science is also covered with Greenplum via R, Python and MADlib.
Greenplum supports procedural Python (PL/Python) and procedural R (PL/R).

Both Python and R can be used to create User Defined Functions (UDFs) to deliver scalable, in-database Data Science applications.
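For a flavour of what a Python UDF looks like, here is a minimal sketch. The `zscore` function is a made-up example, not something shipped with Greenplum. The Python function can be tested locally, and the SQL string shows how the same body might be wrapped as a PL/Python function. The language name is an assumption: it varies by release (for example `plpythonu` or `plpython3u`), so check what your installation provides.

```python
def zscore(value, mean, stddev):
    """Standardise a value; return 0.0 when there is no spread."""
    if stddev == 0:
        return 0.0
    return (value - mean) / stddev


# Hypothetical DDL: the function body is the same Python as above.
# The LANGUAGE name depends on the Greenplum/Postgres version in use.
CREATE_FUNCTION_SQL = """
CREATE FUNCTION zscore(value float8, mean float8, stddev float8)
RETURNS float8 AS $$
    if stddev == 0:
        return 0.0
    return (value - mean) / stddev
$$ LANGUAGE plpython3u;
"""

# Local check of the logic before deploying it in-database.
example = zscore(12.0, 10.0, 2.0)
```

Once created, such a function can be called from SQL like any built-in, so the calculation runs inside the database, next to the data, rather than in a separate tool.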
Apache MADlib is an open-source mathematical, statistical and machine-learning library that can be used with Greenplum against structured and unstructured data to deliver scalable in-database analytics.
SQL-based algorithms can be developed with MADlib with no need to transfer data between Greenplum and other analytic tools.

The combination of Python, R and SQL-powered MADlib can be used to develop and deploy in-database Data Science applications with Greenplum at no extra cost.
9. Greenplum Is Rated Highly By Gartner
The technology industry has put a lot of faith in analyst ratings for many decades. Gartner is almost certainly the most influential analyst firm in the technology field.
A Gartner vendor analysis published in March 2019 for the ‘Traditional Data Warehouse’ use case ranked each of the main Data Warehouse product/service vendors.
Unsurprisingly, Teradata was ranked in 1st place with a score of 3.73 (out of 5). Pivotal Greenplum was ranked a very close 3rd with a score of 3.49, behind Oracle Exadata in 2nd place with a score of 3.54.
According to Gartner, Greenplum ranked higher than SAP HANA (3.35), Google BigQuery (3.27), IBM DB2 (3.22), Snowflake (3.22), Amazon Redshift (3.16) and Microsoft Azure SQL Data Warehouse (3.15).
Perhaps unsurprisingly, given the Traditional Data Warehouse use case, the Hadoop vendors (MapR, Hortonworks and Cloudera) all scored below 3.0.

Greenplum is very highly rated by Gartner for the ‘Traditional Data Warehouse’ use case. Unless you want an Oracle Exadata system, only Teradata was ranked higher by Gartner.
10. Prime-Time Production Systems
Perhaps one of the great truisms in the technology industry is the fact that “everything works in PowerPoint”. Isn’t that just the case!
The POC is often how claims are verified before a commitment to purchase is made. POCs can no doubt add confidence, but there’s still a big leap of faith required to be sure that a Data Warehouse technology can cope with ‘prime time’.
Here at team VLDB we’re very much of the “show me, don’t tell me” persuasion. We’re fussy like that!
Greenplum’s prime-time production users include Morgan Stanley and Conversant Media.
The Morgan Stanley production Greenplum environment consists of hundreds of servers and supports 20PB of raw data (10PB compressed). On a similar scale, Conversant’s Greenplum system has individual tables that contain quintillions of rows (that’s a lot).
Why does this matter? Well, end users such as Morgan Stanley and Conversant provide all the evidence you could need that Greenplum is ready to handle your demanding production workloads.
Like the man said:
“Whatever use case we can dream up and whatever ways we can think of to better understand the user, Greenplum allows us to do it.”
John Conley, Vice President of Data Warehousing, Conversant
Greenplum Summary
So there you have it – the top 10 reasons we think Greenplum is the ‘best of the rest’, after Teradata, when it comes to Data Warehouse platforms.
We’ve been Greenplum users since the very early days – over 15 years in fact. As our confidence in the product, and the support, has grown over the years it has steadily become the default platform here at VLDB. Hopefully this article will highlight some of the reasons why.
Enjoy!

