Our job at VLDB is to ‘get stuff done’. As firm believers in SQL-powered scalable data platforms, we’ve been using Greenplum for nearly 15 years. It has been our default data platform for client engagements for most of that time.
We’re very well positioned to explain why we put our faith in Greenplum, so here goes…
TL;DR
- runs anywhere
- proven MPP architecture
- based on open source PostgreSQL
- well established & supported
- simple all-inclusive pricing model
- runs on SQL
- supports lake, lakehouse & warehouse architectures
- data science via MADlib, Python & R
- favourable analyst ratings
- prime-time production systems
- offline development via VMs & containers

Greenplum Is ‘Built For Anywhere’
Legacy Data Warehouse systems are optimised for on-prem physical hardware. These systems pre-date the adoption of public clouds such as AWS, Azure and Google Cloud.
So-called ‘modern data stack’ systems such as Redshift, BigQuery and Snowflake are ‘cloud only’. Most are ‘single cloud only’: Redshift only runs on AWS, BigQuery only runs on Google Cloud and Azure Synapse only runs on Azure.
As a software-only solution, Greenplum is the data platform ‘built for anywhere’. Greenplum can run on physical hardware, private VMware clouds and public cloud platforms such as AWS, Azure or Google Cloud. So long as your chosen platform runs a supported Linux distribution, with Greenplum you’re good to go.
Whether you are committed to on-prem, public cloud or private cloud, Greenplum can run in any suitable environment and can easily move from one environment to another.
Greenplum’s Architecture Is MPP
Unlike general-purpose databases, Greenplum is deployed as a clustered, or ‘Massively Parallel Processing’ (MPP), architecture.
The MPP architecture scales linearly by ‘scaling out’: more query processing power and storage are delivered, as required, by adding nodes to the cluster. The headroom is very large in practice, with production clusters running to hundreds of servers. As demand on the platform grows, the platform can grow to meet it.
MPP’s track record running analytics at the world’s biggest companies dates back to the 1980s. No other analytic architecture can make this bold claim.
Greenplum’s linearly scalable MPP architecture allows you to start with a single node and scale the cluster to meet your changing demands. You no longer need to agonise over ‘how much to buy’.
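As a rough illustration of what ‘linearly scalable’ means, the Python sketch below models the elapsed time of a full-table scan when the work is split evenly across nodes. It is an idealised back-of-the-envelope model, not a benchmark: it assumes perfectly even data distribution and ignores coordinator overhead, data skew and network costs. The table size and per-node throughput figures are hypothetical placeholders.

```python
def ideal_scan_seconds(data_gb: float, segments: int, gb_per_sec_per_segment: float) -> float:
    """Idealised MPP scan time: every segment scans an equal share in parallel.

    Assumes perfectly even data distribution and zero coordination overhead,
    so doubling the number of segments halves the elapsed time.
    """
    if segments < 1:
        raise ValueError("need at least one segment")
    share_gb = data_gb / segments              # each segment's slice of the table
    return share_gb / gb_per_sec_per_segment   # all slices are scanned concurrently


# Hypothetical workload: a 10,000 GB table, 0.5 GB/s of scan throughput per segment.
for n in (8, 16, 32):
    print(n, ideal_scan_seconds(10_000, n, 0.5))  # 8 2500.0 / 16 1250.0 / 32 625.0
```

Real clusters fall short of this ideal, but the shape of the curve is the point: capacity is added by adding nodes rather than by replacing the machine.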

Greenplum Is Open Source Parallel Postgres
Greenplum is an open source ‘Massively Parallel Postgres’ data platform that supports analytics, machine learning and AI.
The PostgreSQL database on which Greenplum is built has a 30-year track record and is regarded as ‘The World’s Most Advanced Open Source Relational Database’ (see postgresql.org).
Greenplum presents itself to the outside world as ‘just Postgres’. This means Postgres tools such as pgAdmin, psql and your favourite SQL desktop clients will generally work with Greenplum.
Starting with PostgreSQL to build a scale-out MPP data platform is a well-trodden path. Unlike many of the platforms built this way, Greenplum is open source and benefits from the ongoing development efforts of the Postgres community.

Greenplum Is Well Established & Backed
The core Greenplum database software was open sourced under the Apache License in 2015.
As with the Linux/Red Hat model, paid-for support for Greenplum is provided by VMware, which is also the main sponsor of Greenplum’s ongoing development.
For those who don’t need the comfort of a support agreement, Greenplum can be downloaded and used for free under the terms of the Apache License.
Greenplum has a lengthy track record and is backed by one of the biggest names in the tech world. Why take a chance on a start-up promoting a beta product with a poor support infrastructure?
Greenplum Has A Simple All-Inclusive Pricing Model
Greenplum can be evaluated free of charge, on your chosen infrastructure, under an evaluation license. Build the biggest system you like, load as much data as you like and run as many queries as you like, all at no software cost during the evaluation period.
Post-evaluation, Greenplum is licensed on a subscription basis with simple ‘per CPU core’ pricing. This compute-based subscription model avoids the risk of unpredictable and unconstrained data platform costs.
There are no bronze/silver/gold versions of Greenplum with different prices for each. All of Greenplum’s features are always included.
Alongside the ‘always free & unsupported’ version of Greenplum, the pricing model for the VMware-supported version offers free evaluations, all-inclusive features and simple per-CPU-core licenses.
Greenplum Runs on SQL
The combination of a Relational Database Management System (RDBMS) and Structured Query Language (SQL) has been the bedrock of the data management world since the 1970s.
As a clustered version of Postgres, Greenplum runs on SQL.
From the user’s point of view, the only significant difference between Greenplum and Postgres is that Greenplum shards each table’s data over all of the nodes in the cluster, typically by hashing a distribution key chosen when the table is created. Logically, Greenplum is just a ‘go-faster’ version of Postgres.
There are no new programming languages to learn, no extensive developer, data scientist or end-user training programmes to undertake, and no new BI tools to purchase.
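The sharding idea can be illustrated with a toy Python model. Greenplum typically assigns each row of a table to a segment (a data-holding node) by hashing the table’s distribution key; the sketch below mimics that with zlib.crc32 modulo the segment count, which is an assumption made purely for illustration and not Greenplum’s actual hash function. It also shows why sharding is transparent to SQL users: each segment computes a partial result over its own rows, and the partial results combine into the same answer a single Postgres instance would return. The ‘sales’ rows and the segment count are hypothetical.

```python
import zlib
from collections import defaultdict


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution-key value to a segment (toy hash, not Greenplum's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_column: str, num_segments: int):
    """Shard rows across segments by hashing the distribution key."""
    shards = defaultdict(list)
    for row in rows:
        shards[segment_for(row[key_column], num_segments)].append(row)
    return shards


def parallel_sum(shards, value_column: str) -> float:
    """Each segment sums its own rows; the coordinator adds up the partial sums."""
    partials = [sum(r[value_column] for r in shard) for shard in shards.values()]
    return sum(partials)


# Hypothetical 'sales' rows, distributed on customer_id across 4 segments.
sales = [{"customer_id": f"c{i}", "amount": float(i)} for i in range(1, 101)]
shards = distribute(sales, "customer_id", num_segments=4)
print({seg: len(rows) for seg, rows in sorted(shards.items())})  # rows per segment
print(parallel_sum(shards, "amount"))  # 5050.0, identical to a single-node sum
```

Because rows with equal keys always hash to the same segment, joins and aggregations on the distribution key can often run locally on each segment without moving data between nodes.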
Greenplum runs on SQL, which is the only language required.

Greenplum Supports All Data Architectures
Once upon a time there was the Data Warehouse, which was almost always supported by a relational database management system (RDBMS). The Data Warehouse lived happily all on its own for many a decade.
Then along came the Data Lake crowd, who were subsequently joined by the Data Lakehouse crowd.
Data can be stored natively within Greenplum as tables on traditional block storage. In addition, data can be stored & accessed externally on object storage or HDFS. Support for non-block storage is provided by the Platform Extension Framework (PXF).
Greenplum supports all data architectures including data lake, data lakehouse and data warehouse.
Data Science Is Covered
SQL is fine for ELT-style data pipelines, KPI reporting, BI tools and end-user queries, but what about those pesky ‘Data Scientists’? The good news is that Data Science is also covered with Greenplum.
Greenplum supports procedural Python (PL/Python) and procedural R (PL/R). Either can be used to create User Defined Functions (UDFs) to deliver scalable, in-database Data Science applications.
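Because the body of a PL/Python UDF is ordinary Python, its logic can be prototyped and unit-tested outside the database first. The zscore function below is a hypothetical example, not something shipped with Greenplum. Inside Greenplum, its body would be wrapped in a CREATE FUNCTION statement whose LANGUAGE clause names the installed PL/Python variant (for example plpythonu or plpython3u, depending on the Greenplum version), and it would then typically execute on the segments, row by row, as part of a query.

```python
from typing import Optional


def zscore(value: Optional[float], mean: Optional[float],
           stddev: Optional[float]) -> Optional[float]:
    """Standardise a value against a mean and standard deviation.

    Hypothetical UDF logic. PL/Python passes SQL NULL in as None, so any
    missing input is returned as None (NULL) rather than raising an error.
    """
    if value is None or mean is None or stddev is None:
        return None
    if stddev == 0:
        # No spread in the data: treat every value as sitting on the mean.
        return 0.0
    return (value - mean) / stddev


# Local test before deploying; in SQL, mean and stddev might come from aggregates.
print(zscore(130.0, 100.0, 15.0))  # 2.0
print(zscore(None, 100.0, 15.0))   # None
```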
In addition, Greenplum supports Apache MADlib, an open-source library of mathematical, statistical and machine-learning functions. MADlib can be used with Greenplum against structured and unstructured data to deliver scalable in-database analytics.
Scalable SQL-based Data Science algorithms can be developed with MADlib with no need to transfer data between Greenplum and other analytic tools.
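In-database analytics scales because many algorithms can be split into per-segment partial results that are cheap to combine. The Python sketch below shows that pattern for simple (one-variable) least-squares regression: each hypothetical segment accumulates five running sums over its own rows, and the coordinator merges them and solves for the slope and intercept. It illustrates the general ‘partial aggregate, then merge’ idea only; it is not MADlib’s code or API, and the data points are made up.

```python
from typing import Iterable, List, Tuple

# Sufficient statistics for simple linear regression: n, Σx, Σy, Σxy, Σx².
Sums = Tuple[int, float, float, float, float]


def partial_sums(points: Iterable[Tuple[float, float]]) -> Sums:
    """Per-segment pass: accumulate the statistics over local rows only."""
    n, sx, sy, sxy, sxx = 0, 0.0, 0.0, 0.0, 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxy += x * y
        sxx += x * x
    return n, sx, sy, sxy, sxx


def merge(parts: List[Sums]) -> Sums:
    """Coordinator step: the statistics from each segment simply add up."""
    n, sx, sy, sxy, sxx = (sum(col) for col in zip(*parts))
    return n, sx, sy, sxy, sxx


def fit(s: Sums) -> Tuple[float, float]:
    """Solve ordinary least squares y = intercept + slope * x from merged sums."""
    n, sx, sy, sxy, sxx = s
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical data following y = 3 + 2x, split across three 'segments'.
segments = [
    [(0.0, 3.0), (1.0, 5.0)],
    [(2.0, 7.0), (3.0, 9.0)],
    [(4.0, 11.0)],
]
intercept, slope = fit(merge([partial_sums(seg) for seg in segments]))
print(intercept, slope)  # 3.0 2.0
```

The textbook formula used here is fine for illustration but can lose precision on very large or badly scaled data; production libraries use more numerically careful methods.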
High Analyst Ratings
The technology industry has put a lot of faith in analyst ratings for many decades. Gartner is almost certainly the most influential analyst firm in the technology field.
A Gartner vendor analysis published in March 2019 for the ‘Traditional Data Warehouse’ use case ranked each of the main Data Warehouse vendors. Unsurprisingly, Teradata was ranked in 1st place. Pivotal (now VMware Tanzu) Greenplum was ranked a very close 3rd, behind Oracle Exadata in 2nd place.
According to Gartner, Greenplum ranked higher than SAP HANA, Google BigQuery, IBM DB2, Snowflake, Amazon Redshift and Microsoft Azure SQL Data Warehouse.
Don’t just take our word for it: Greenplum can hold its own against better-known competition.
Prime-Time Production Systems
Perhaps one of the great truisms in the technology industry is that “everything works in PowerPoint”. Isn’t that the truth!
Greenplum’s prime-time production users include Morgan Stanley and Conversant Media.
The Morgan Stanley production Greenplum environment consists of hundreds of servers and supports 20PB of raw data (10PB compressed). On a similar scale, Conversant’s Greenplum system has individual tables that contain quintillions of rows (that’s a lot).
“Whatever use case we can dream up and whatever ways we can think of to better understand the user, Greenplum allows us to do it.”
John Conley, Vice President of Data Warehousing, Conversant

Greenplum Summary
So there you have it – the reasons we choose Greenplum.
We’ve been Greenplum users since the very early days – over 15 years in fact. As our confidence in the product, and in the support, has grown over the years, it has steadily become the default platform here at VLDB.
Hopefully this article has highlighted some of the reasons why.
Author: Paul Johnson

