Greenlight DBaaS: Frequently Asked Questions
What is Greenlight?
Greenlight is a ‘Database as a Service’ (DBaaS) that leverages Greenplum, the world’s first open source ‘Massively Parallel Processing’ (MPP) database. Greenlight is provided by VLDB Solutions (‘VLDB’), experts in MPP database technologies such as Greenplum.
Why do I need Greenlight?
As a fully managed service, Greenlight takes the risk & complexity out of deploying and managing a data warehouse platform. The team at VLDB have decades of MPP expertise, including over 10 years working with Greenplum.
This way, you can leave the boring computer stuff to the experts so that you can get on with the funky BI, visualisation and data science.
If Greenplum is open source, what’s to stop us setting up our own Greenplum cluster?
Good luck — let us know how you got on!
So, unlike open source Greenplum, Greenlight is supported?
Yes. Greenlight is a service (not software) provided by VLDB, a Pivotal partner. VLDB pay Pivotal for Greenplum software support and value-added features that aren’t available in the open source version of Greenplum.
Greenlight users raise support tickets with VLDB. If required, VLDB raise support tickets with Pivotal.
So why not just buy supported Greenplum from Pivotal?
You can — once you’ve bought your database software licence, you’ll need to know how to design, configure, build and test a suitable cluster. Then there’s the documentation to understand and the DBA(s) to hire or train.
VLDB will take care of all of this for you in the Greenlight subscription.
What’s in Greenlight that’s not in open source Greenplum?
Well, support for starters. Greenlight also contains goodies such as the Greenplum Command Centre (GPCC), Greenplum Workload Manager and the S3 load connectors.
The support and extra stuff are worth having!
Where can I deploy Greenlight?
Greenlight can be deployed on virtually any compute infrastructure that supports Linux, including public cloud (AWS, Azure, Google), private cloud (VMware, Xen), third-party managed servers or even on premises (‘on-prem’) using your preferred server/storage hardware.
Due to the flexible deployment options, Greenlight can support a ‘hybrid’ architecture that mixes ‘on-prem’ (production) and cloud (dev/test/QA/HA).
We can get Greenlight up and running on your favourite cloud platform without asking the IT department for permission.
What does Greenlight manage for me?
Greenlight includes three levels of system and database administration:
- Bronze — monitoring, backup, patching, upgrades
- Silver — monitoring, backup, patching, upgrades & Mon-Fri 9-5 DBA
- Gold — monitoring, backup, patching, upgrades & 24x7 DBA
That’s an actual DBA that you can call and ask for stuff to be done on your behalf. You know, that boring stuff like creating databases, schemas, tables, views and users, checking table distributions, partitioning tables, investigating slow-running queries, etc.
That’s right, you won’t have to read tons of boring manuals or hire another techie or two to look after Greenlight.
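For the curious, here is roughly what ‘checking table distributions’ boils down to. This is a minimal sketch, not the DBA's actual tooling: the function name and the row counts are invented for illustration. On a Greenplum cluster the per-segment counts would typically come from grouping a table's rows by the gp_segment_id system column.

```python
def skew_ratio(rows_per_segment):
    """Ratio of the fullest segment's row count to the average.

    1.0 means the table is spread evenly; higher values mean one
    segment holds more than its share and will finish queries last.
    """
    average = sum(rows_per_segment) / len(rows_per_segment)
    return max(rows_per_segment) / average


# Hypothetical row counts for a table on a four-segment cluster.
even = skew_ratio([250, 250, 250, 250])
skewed = skew_ratio([700, 100, 100, 100])

print(f"even table:   {even:.1f}")
print(f"skewed table: {skewed:.1f}")
```

A ratio well above 1.0 usually points to a poorly chosen distribution key, which is exactly the kind of thing the DBA would sort out for you.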
How does Greenlight compare to general purpose databases for data warehousing and analytics?
Greenlight takes advantage of the Greenplum ‘Massively Parallel Processing’ (MPP) architecture to deliver fast query performance at any dataset size, from gigabytes (GB) to petabytes (PB).
Unlike general purpose SMP databases, MPP databases are optimised from the ground up for analytics. Greenlight automatically executes all SQL queries in parallel across all CPUs in the database cluster. Extra Greenlight query performance is delivered by adding more CPUs to the cluster. General purpose databases simply can’t compete on performance with the parallel query processing that defines the MPP architecture.
MPP has been around since the 1980s. It’s what the big banks, retailers and telecoms companies have been using for high-end analytics for a long time. They can’t all be wrong, surely?
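To make the parallel query idea a little more concrete, the sketch below mimics in plain Python how an MPP database might answer a simple SUM: rows are spread across segments by key, each segment adds up its own slice, and the partial results are merged at the end. The function names and the four-segment cluster are assumptions made up for this example, and the loop runs one segment at a time, whereas a real cluster works on all of them at once.

```python
from collections import defaultdict


def distribute(rows, num_segments):
    """Assign each (key, value) row to a segment based on its key."""
    segments = defaultdict(list)
    for key, value in rows:
        segments[key % num_segments].append((key, value))
    return segments


def segment_sum(segment_rows):
    """Work done locally by one segment: sum its own slice of the data."""
    return sum(value for _, value in segment_rows)


def parallel_sum(rows, num_segments):
    """Scatter rows to segments, sum each slice, then merge the partials."""
    segments = distribute(rows, num_segments)
    partials = [segment_sum(segments[i]) for i in range(num_segments)]
    return sum(partials)


# 100 hypothetical rows of (id, amount).
rows = [(i, i * 10) for i in range(1, 101)]
total = parallel_sum(rows, 4)
print(total)
```

Each segment only ever touches its own share of the rows, which is why adding segments reduces the work per segment.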
How long does it take to create a Greenlight data warehouse cluster?
As always, that depends. There are two main factors that determine the creation time for each Greenlight cluster: the deployment environment (cloud/on-prem) and the cluster size (number of database nodes). Small/medium Greenlight clusters (<10 nodes) can often be up and running via the public cloud in a matter of hours.
Don’t worry, we take care of all Greenlight setup/installation — you’ll be loading and querying your data in no time.
What does a Greenlight node consist of?
Like all analytic MPP databases, Greenlight nodes consist of multiple CPU cores, RAM and storage. The limiting factor for analytic databases is typically ‘input-output’ (IO) — the ability to read/write data to/from disk. With this in mind, Greenlight nodes default to fast SSD storage.
A typical Greenlight node consists of 8 x CPU cores, a minimum of 8GB RAM per core, and 2TB of SSD storage. This represents a ‘balanced node’ approach in which it is possible to read data from storage at sufficient speeds to keep the CPUs busy.
We’ve spent a lot of time making sure we know how to get this stuff up and running well, so you don’t have to.
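If you want to see what the typical node adds up to across a cluster, the arithmetic is simple. The sketch below assumes every node matches the typical figures quoted above; the function name is made up for this example, and the storage figure is raw SSD capacity, not usable database space.

```python
# Typical node figures taken from the answer above.
CORES_PER_NODE = 8
MIN_RAM_GB_PER_CORE = 8
SSD_TB_PER_NODE = 2


def cluster_resources(num_nodes):
    """Total cores, minimum RAM and raw SSD for a cluster of typical nodes."""
    cores = num_nodes * CORES_PER_NODE
    return {
        "cores": cores,
        "min_ram_gb": cores * MIN_RAM_GB_PER_CORE,
        "raw_ssd_tb": num_nodes * SSD_TB_PER_NODE,
    }


four_nodes = cluster_resources(4)
print(four_nodes)
```

So a four-node cluster of typical nodes works out at 32 cores, at least 256GB of RAM and 8TB of raw SSD.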
Is Greenlight storage persistent?
In a nutshell, yes. All of your Greenlight data survives a node/cluster restart or power down.
How does Greenlight performance scale?
The Greenlight MPP ‘scale out’ architecture delivers linear query scalability. Doubling the number of nodes in the Greenlight cluster delivers double the query performance.
Adding more nodes to the Greenlight cluster will always deliver improved query performance.
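As a back-of-the-envelope illustration of what linear scaling means, the sketch below estimates query time after resizing a cluster. It assumes ideal linear scaling, and the 120 second baseline on 4 nodes is an invented figure, not a benchmark result.

```python
def scaled_query_seconds(base_seconds, base_nodes, new_nodes):
    """Estimated query time after resizing, assuming ideal linear scaling."""
    return base_seconds * base_nodes / new_nodes


# Hypothetical baseline: a query that takes 120 seconds on 4 nodes.
on_8_nodes = scaled_query_seconds(120, 4, 8)
on_16_nodes = scaled_query_seconds(120, 4, 16)

print(on_8_nodes, on_16_nodes)
```

Under that assumption, doubling from 4 to 8 nodes halves the query time, and doubling again to 16 nodes halves it once more.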
What is the maximum number of nodes in a Greenlight cluster?
The maximum number of nodes in a Greenlight cluster depends on the chosen deployment environment. Don’t worry, it’s a big number!
For public cloud deployments, you are likely to run out of money before Greenlight runs out of nodes.
How do I query a Greenlight cluster?
Greenlight can be queried via ODBC/JDBC using your favourite SQL client, BI, analytics or visualisation tool.
Greenlight is ‘just a database’ like SQL Server, Oracle etc.
How is data loaded into Greenlight?
Greenlight supports data loading from a variety of sources such as Amazon S3, CSV files, JSON files, message queues, and data streams. ETL tools and good old-fashioned scripting are both supported.
Is it possible to access Greenlight cluster nodes?
Yes. SSH is supported to the Greenlight cluster’s master node. From the master node, it is possible to SSH to the segment servers via the cluster’s private network.
Is it possible to customise/harden the Greenlight Linux OS packages?
In short, yes, so long as the Greenplum database isn’t compromised by the OS changes.
How is Greenlight data backed up?
Each Greenlight cluster is automatically backed up with an agreed scope, frequency and retention period. Incremental backups ensure only new/changed data is processed during each backup cycle.
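The idea behind an incremental backup can be sketched in a few lines. This is a generic illustration, not a description of how Greenlight's backups are implemented: the function name, table names and version counters are all hypothetical. It compares what was recorded at the last backup with the current state and picks out only the tables that are new or have changed.

```python
def tables_to_back_up(previous, current):
    """Names of tables that are new or changed since the last backup.

    Both arguments map a table name to a change counter that goes up
    whenever the table's data is modified.
    """
    return sorted(
        name
        for name, version in current.items()
        if previous.get(name) != version
    )


# Hypothetical state recorded at the last backup, and the state now.
last_backup = {"sales": 41, "customers": 7, "products": 3}
now = {"sales": 42, "customers": 7, "products": 3, "returns": 1}

changed = tables_to_back_up(last_backup, now)
print(changed)
```

Here only the modified sales table and the new returns table are processed; the unchanged customers and products tables are skipped.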
How is Greenlight monitored?
The Greenplum Command Centre (GPCC) is a web interface that provides full visibility of resource utilisation and query execution within the Greenlight cluster. GPCC is accessed via your favourite web browser.
SSH access to all nodes in the Greenlight cluster also allows keen techies to run stuff like top, iostat, netstat, mpstat, sar etc.
What’s the Greenlight pricing model?
Greenlight is priced on a simple, all-inclusive, per-node, per-month basis, irrespective of where you choose to deploy. We take care of all underlying software, hardware and storage costs.
For each paid-for Greenlight cluster we’ll even include a free development environment.
Free dev environment, you say?
Yes, free as in beer. For each paid-for Greenlight cluster, we’ll throw in a single-node dev environment.
How do I get started with Greenlight?
Getting started with Greenlight is as simple as getting in touch with VLDB by phone, email or web contact form in either English or Welsh.
We’re based in Liverpool. We really like to chat. Go on, pick up the phone.