A couple of articles on Google and SQL versus NoSQL caught my eye recently and inspired me to go on a bit of a ramble, so here goes...
Google's Head Fake
Firstly, there's the splendidly titled "Did Google Send the Big Data Industry on a 10 Year Head Fake?" Apparently a 'head fake' is to trick your opponent as to your intentions. Probably akin to 'dropping the shoulder' in football.
So, the question is, did the Big G send us all the wrong way for a decade? Short answer: no. Longer answer: if you did march the wrong way for several years, have a word with yourself.
The article quite rightly points out that neither distributed file systems nor distributed computing can be attributed to Google. As Curt Monash succinctly stated on DBMS2 in 2010, 'popularization != invention'.
However, the thinking goes that Google's famous research papers gave rise to Doug Cutting's efforts at Yahoo, which led to Hadoop, whilst all the while the Big G was working on plain old SQL-based Spanner.
The Big G may have been working on other SQL-based stuff whilst the Hadoop crowd were beavering away, but does this amount to a head fake or a shoulder drop? No, not in the slightest.
Many considered the old-fangled SQL-based data warehouse to be under immediate threat due to the rise of the 'new' MapReduce-based SQL-free computing paradigm. Shiny & new always wins, right? Sadly, that can be the case. I've witnessed 'big data' POCs at first hand where the main motivation was to be seen to be 'doing big data'. I kid you not. Forget the impact on the users and existing toolsets, stop worrying about ROI, this stuff is free and cool, the vendor is saying all the right things, so it's definitely the way to go. Woo-hoo!!!
Let's never forget, there are armies of folks looking to monetise anything that moves in IT. The VCs, start-ups, analysts and conference organisers make their living out of getting mere mortals to believe they have the silver bullet that's been missing until now.
What worked for Google, Yahoo, Facebook and LinkedIn doesn't necessarily translate to the mainstream - this doesn't make it bad tech. Hadoop is a great example. Out in the real world, SQL is the answer for most analytic applications for most folks most of the time, plain and simple.
If you believed the hype and 'Hadoop' didn't turn out to be your silver bullet, don't blame the Big G for sharing their thoughts or others for building/promoting 'shiny & new'. No-one told you to put your finger in the fire, did they?
Why SQL Is Beating NoSQL
The article entitled 'Why SQL is beating NoSQL, and what this means for the future of data' really got my attention. The notion that SQL was 'left for dead' and is 'making a comeback' is somewhat at odds with reality, methinks.
There may have been a belief in certain quarters that the 'answer' was no longer SQL, but this has never been reflected out in the real world where regular folks run queries, and lots of them.
SQL is very deeply embedded in every single organisation I've encountered since the late 1980s. I don't see this changing any time soon. There is simply too much sunk cost and too little benefit in trying to ditch SQL.
Software developers may have 'cast aside SQL as a relic' but that certainly isn't what happened throughout corporate IT organisations or analyst communities.
It's clearly untrue that SQL 'couldn’t scale with these growing data volumes'. We've had scale-out Teradata MPP systems chomping on SQL since the 1980s, and newer MPP & SQL players like Netezza & Greenplum for over 15 years. A low industry profile doesn't mean something doesn't exist.
As the article states, the developers of SQL recognised that 'much of the success of the computer industry depends on developing a class of users other than trained computer specialists', which is partly why SQL has become the de facto standard for interacting with databases.
Apparently '...SQL was actually fine until another engineer showed up and invented the World Wide Web, in 1989.' I'll help out here. His name is Sir Tim Berners-Lee OM KBE FRS FREng FRSA FBCS. He's quite well known.
There is no doubt that mainstream general purpose databases (SQL Server, Oracle, MySQL etc.) struggled to cope with data volumes thrown out by all things digital in the new-fangled age of 'the web' and 'the net'.
Some of us were already scaling out (admittedly expensive) MPP systems running SQL databases as early as the 1980s. Big name retailers, banks and telcos have been running SQL on scale-out database systems for decades.
My first multi-billion row Teradata table is over 20 years old and still alive and well at a retail bank. A 100 billion row telco CDR table is over a decade old and runs on a parallel PostgreSQL system. How much scale do you want?
The rationale behind attempts to ditch legacy SQL databases is neatly summarised: 'NoSQL was new and shiny; it promised scale and power; it seemed like the fast path to engineering success. But then the problems started appearing.'
The likes of DeWitt & Stonebraker know a thing or two about this stuff and were early nay-sayers. Feel free to disagree, obviously, but dismiss their observations at your peril.
Most of the post-Teradata attempts to develop scale-out SQL-compliant databases have leveraged PostgreSQL. This approach dates back to Netezza over 15 years ago, and includes the mega-popular AWS Redshift.
The light-bulb moment and conversion to SQL is something Teradata went through over 30 years ago: 'One day we realized that building our own query language made no sense. That the key was to embrace SQL. And that was one of the best design decisions we have made. Immediately a whole new world opened up.' Robb Klopp covers Chuck McDevitt's 'SQL is the answer' light-bulb moment whilst at Teradata in 1985 in Chuck's obituary. RIP Chuck, without doubt owner of the biggest brain I ever met.
The King Is Dead. Long Live The King.
SQL didn't die. It didn't recede. SQL is far from perfect, but it isn't likely to go away any time soon.
Kool-Aid drinkers took themselves on a detour into the land of file processing via developer-only languages all of their own accord, whilst all the time mainstream IT organisations and user communities carried on quite happily with boring old legacy SQL. The same SQL that runs the world.
The first time I ran an SQL query on Teradata in the late 1980s I realised I was immediately free from the pain and suffering of developing COBOL for analytics. To this day I can clearly remember the relief (sorry Grace).
It seems a certain section of the IT community is going through a similar realisation: if you want to play with data, SQL was, is, and should be the default starting place.