Running SQL Queries With IBM BigSQL


There's no doubt that 'SQL on Hadoop' (or HDFS) is a hot topic at the moment. Almost all of the Hadoop POCs we encounter have a mission to offload SQL processing from the Enterprise Data Warehouse (EDW) to somewhere with a lower perceived TCO, which usually means Hadoop.

As most EDW developers, users and tools are SQL-centric, that means we're going to need some 'SQL on Hadoop' to run EDW workloads elsewhere, right?

In addition to the pesky Hadoop upstarts such as Cloudera, Hortonworks and MapR, the established big analytics players like IBM, Pivotal and Teradata also have Hadoop/Big Data offerings. No surprises there.

IBM's Big Data offering is called 'IBM InfoSphere BigInsights'. In addition to the proprietary GPFS file system, the SQL component is a proprietary IBM offering called 'Big SQL'.

OK, so let's roll up our sleeves and get started...

1. IBM InfoSphere BigInsights documentation is here.

2. Download the IBM InfoSphere BigInsights Quick Start demo software from here. The download is either native software or a virtual machine (VM). For this demo we went with the native software and a CentOS 6.5 x64 server.

3. Work through the pre-requisite tasks before the BigInsights software can be installed. This is checked by running the script as root:

# ./ -u root==============================================BigInsights Pre-Installation Check Script v1.0==============================================

[INFO] No cluster hostname list passed in; using localhost

[INFO] Verifying for single-node cluster.

[INFO] Running in QUICK_START mode.

[INFO] Dig not detected, using host.Verify root passwordless SSH                              

[  OK  ]Verify root passwordless SSH                              

[  OK  ]Verify root has NOPASSWD in /etc/sudoers   

[ OK ]Verify requiretty disabled for root in /etc/sudoers       

[  OK  ][INFO] Running subsequent tasks as rootVerify SSH using port 22                                  

[  OK  ]Verify root shell = bash                                  

[  OK  ]Verify consistent root uid/gid                            

[  OK  ]Verify home dir owner = root:root                         

[  OK  ]Verify same time zone on all nodes                        

[  OK  ]Verify clock drift <= 30 sec                           

[  OK  ]Verify pre-req checker tests enabled                      

[ OK ]Verify all cluster entries are valid IPs                  

[  OK  ]Verify all host names are RFC-1123 valid                  

[  OK  ]Verify localhost defined in etc/hosts                     

[  OK  ]Verify cluster hostname list is all FQDN or short name    

[  OK  ]Verify loopback defined in ifconfig                       

[  OK  ]Verify that Expect is installed                           

[  OK  ]Verify firewall stopped                                   

[  OK  ]Verify Secure Linux disabled                              

[  OK  ]Verify enough resources in /etc/security/limits.conf      

[  OK  ]Verify /etc/fstab uses UUID for data disk                 

[  OK  ]Verify IPv6 disabled                                      

[  OK  ]Verify boottime kernel.pid_max and port range             

[  OK  ]Verify runtime kernel.pid_max and port range              

[  OK  ]Verify /etc/issue contains OS info                        

[  OK  ]Verify install process not in background                  

[  OK  ]Verify port 8300 not in use                               

[  OK  ][INFO] For solutions to failed checks, see log:

[INFO] /home/download/bi-prechecker.20140211_184958.log

4. Once the pre-requisite tasks are complete we can start the Big Insights Installer web server.

First, run as admin:

$ ./start.shartifacts/ibm-java-sdk-6.0-12.0-linux-x86_64.tgzExtracting Java

....Verifying port 8300 availabilityport 8300 availableStarting BigInsights Installer

......Application server is up and running...BigInsights Installer started, please use a browser to access:http://<server>:8300/Install

After you are finished, run the following command to stop the installer web server: shutdown

Then, open http://<server>:8300/Install in a web browser and use the wizard to install the BigInsights software:       

Once the install is complete, run ' shutdown' as root (see above) to stop the BigInsights installation server:

# ./ shutdown


GERONIMO_HOME:   /home/download/biginsights-quickstart-linux64_b20130821_1818/installer-consoleUsing

GERONIMO_TMPDIR: var/tempUsing JRE_HOME:        /home/download/biginsights-quickstart-linux64_b20130821_1818/_jvm/ibm-java-x86_64-60/jrelog4j:

WARN No appenders could be found for logger (org.apache.geronimo.kernel.basic.BasicKernel).log4j:

WARN Please initialize the log4j system properly.log4j:

WARN See for more info.

Locating server on localhost:1319

... Server found.

Server shutdown started

Server shutdown completed

5. Start the IBM BigInsights console by running the script as admin:

$ /opt/ibm/biginsights/bin/

[INFO] DeployCmdline - [ IBM InfoSphere BigInsights QuickStart Edition ]

[INFO] Progress - Start zookeeper

[INFO] @localhost - zookeeper started, pid 2075

[INFO] Deployer - zookeeper service started

[INFO] Progress - 10%

[INFO] Progress - Start hadoop

[INFO] @localhost - namenode started, pid 2204

[INFO] @localhost - secondarynamenode started, pid 2416

[INFO] @localhost - datanode started, pid 2559

[INFO] Progress - 15%[INFO] Deployer - Waiting for Namenode to exit safe mode...

[INFO] Deployer - Wait namenode to exit safemode for another 5 seconds, please check namenode log for details

[INFO] Deployer - Wait namenode to exit safemode for another 5 seconds, please check namenode log for details

[INFO] Deployer - Wait namenode to exit safemode for another 10 seconds, please check namenode log for details

[INFO] Deployer - Wait namenode to exit safemode for another 10 seconds, please check namenode log for details

[INFO] Deployer - HDFS cluster started successfully

[INFO] @localhost - jobtracker started, pid 3101

[INFO] @localhost - tasktracker started, pid 3301

[INFO] Progress - 20%

[INFO] Deployer - MapReduce cluster started successfully

[INFO] Progress - Start derby

[INFO] @localhost - derby started, pid 3480

[INFO] Progress - 30%

[INFO] Progress - Start hive

[INFO] @localhost - derby already running, pid 3480

[INFO] Progress - 31%

[INFO] @localhost - hive-web-interface started, pid 3565

[INFO] @localhost - hive-server started, pid 3816

[INFO] Progress - 40%

[INFO] Progress - Start hbase

[INFO] Deployer - check zookeeper services, make sure zookeeper service is started before start hbase service

[INFO] @localhost - hbase-master(active) started

[INFO] @localhost - hbase-regionserver started

[INFO] Deployer - hbase service started

[INFO] Progress - 50%[INFO] Progress - Start bigsql

[INFO] @localhost - bigsql-server started, pid 4443

[INFO] Progress - 60%[INFO] Progress - Start oozie

[INFO] @localhost - oozie started, pid 4625

[INFO] Progress - 70%

[INFO] Progress - Start orchestrator

[INFO] @localhost - orchestrator started, pid 4892

[INFO] Progress - 80%[INFO] Progress - Start console

[INFO] Deployer - /opt/ibm/biginsights/console/wlp

[INFO] Deployer - starting pigserver, logging to /var/ibm/biginsights/sheets/logs/bigsheets-admin-pigserver.out

[INFO] Deployer - Server waslp-server started with process ID 5144.

[INFO] Deployer - [AUDIT   ] CWWKF0011I: The server waslp-server is ready to run a smarter planet.

[INFO] Deployer - BigInsights Management Console started, pid 5144

[INFO] Progress - 90%

[INFO] Progress - Start httpfs

[INFO] @localhost - httpfs-server started, pid 5276

[INFO] Deployer - httpfs service started

[INFO] Progress - 100%

INFO] DeployManager - Start; SUCCEEDED components: [zookeeper, hadoop, derby, hive, hbase, bigsql, oozie, orchestrator, console, httpfs]; Consumes : 137644ms

So far so good.

6. Once the system is started, we can login to the IBM Big Insights web console via "http://<server>:8080" :        

7. Click on 'Cluster Status' to see what services are running:        

8. Selecting "Run Big SQL Queries", unsurprisingly, takes us to the Big SQL interface:   OK, so we've got the Big SQL web interface working, but let's not get too carried away just yet.

To quote IBM:

"While Big SQL doesn't turn BigInsights into a relational DBMS, it does provide experienced SQL users with a familiar on-ramp to an increasingly popular environment for analyzing and storing big data."

That's all for now folks!