Apache Drill:Connecting and Querying HBase.

Apache Drill:Connecting and Querying HBase.

So This is the next tutorial in apace drill, so just for introduction sake Apache Drill is Apace drill is open source Implementation of Google's Dremel system used by Google BigQuery, Initially released on  second quarter of 2015 and also Drill  is one the top projects of Apace. Which allows to query nosql db without creating any schema to know more follow this url [https://blogs.ashrithgn.com/2019/09/20/apache-drill-why-and-how-basic-usage-features-drill-bit/]

Also you can follow the blog to connect apache drill with spring boot [https://blogs.ashrithgn.com/2019/09/20/apache-drill-running-sql-query-from-spring-boot/].

So lets get started by loading the data to hbase.

Setting Up Environment.

1. Create HBase table

create 'users','account','address'
create 'clicks','clickinfo','iteminfo'

2. Load data to HBase

put 'users','users1','account:name','Alice'
put 'users','users1','address:street','123 Ballmer Av'
put 'users','users1','address:zipcode','12345'
put 'users','users1','address:state','CA'
put 'users','users2','account:name','Bob'
put 'users','users2','address:street','1 Infinite Loop'
put 'users','users2','address:zipcode','12345'
put 'users','users2','address:state','CA'
put 'users','users3','account:name','Frank'
put 'users','users3','address:street','435 Walker Ct'
put 'users','users3','address:zipcode','12345'
put 'users','users3','address:state','CA'
put 'users','users4','account:name','Mary'
put 'users','users4','address:street','56 Southern Pkwy'
put 'users','users4','address:zipcode','12345'
put 'users','users4','address:state','CA'
put 'clicks','click1','clickinfo:usersid','users1'
put 'clicks','click1','clickinfo:url','http://www.google.com'
put 'clicks','click1','clickinfo:time','2014-01-01 12:01:01.0001'
put 'clicks','click1','iteminfo:itemtype','image'
put 'clicks','click1','iteminfo:quantity','1'
put 'clicks','click2','clickinfo:usersid','users1'
put 'clicks','click2','clickinfo:url','http://www.amazon.com'
put 'clicks','click2','clickinfo:time','2014-01-01 01:01:01.0001'
put 'clicks','click2','iteminfo:itemtype','image'
put 'clicks','click2','iteminfo:quantity','1'
put 'clicks','click3','clickinfo:usersid','users2'
put 'clicks','click3','clickinfo:url','http://www.google.com'
put 'clicks','click3','clickinfo:time','2014-01-01 01:02:01.0001'
put 'clicks','click3','iteminfo:itemtype','text'
put 'clicks','click3','iteminfo:quantity','2'
put 'clicks','click4','clickinfo:usersid','users2'
put 'clicks','click4','clickinfo:url','http://www.ask.com'
put 'clicks','click4','clickinfo:time','2013-02-01 12:01:01.0001'
put 'clicks','click4','iteminfo:itemtype','text'
put 'clicks','click4','iteminfo:quantity','5'
put 'clicks','click5','clickinfo:usersid','users2'
put 'clicks','click5','clickinfo:url','http://www.reuters.com'
put 'clicks','click5','clickinfo:time','2013-02-01 12:01:01.0001'
put 'clicks','click5','iteminfo:itemtype','text'
put 'clicks','click5','iteminfo:quantity','100'
put 'clicks','click6','clickinfo:usersid','users3'
put 'clicks','click6','clickinfo:url','http://www.google.com'
put 'clicks','click6','clickinfo:time','2013-02-01 12:01:01.0001'
put 'clicks','click6','iteminfo:itemtype','image'
put 'clicks','click6','iteminfo:quantity','1'
put 'clicks','click7','clickinfo:usersid','users3'
put 'clicks','click7','clickinfo:url','http://www.ask.com'
put 'clicks','click7','clickinfo:time','2013-02-01 12:45:01.0001'
put 'clicks','click7','iteminfo:itemtype','image'
put 'clicks','click7','iteminfo:quantity','10'
put 'clicks','click8','clickinfo:usersid','users4'
put 'clicks','click8','clickinfo:url','http://www.amazon.com'
put 'clicks','click8','clickinfo:time','2013-02-01 22:01:01.0001'
put 'clicks','click8','iteminfo:itemtype','image'
put 'clicks','click8','iteminfo:quantity','1'
put 'clicks','click9','clickinfo:usersid','users4'
put 'clicks','click9','clickinfo:url','http://www.amazon.com'
put 'clicks','click9','clickinfo:time','2013-02-01 22:01:01.0001'
put 'clicks','click9','iteminfo:itemtype','image'
put 'clicks','click9','iteminfo:quantity','10'

3. Start drill bit in embedded mode

/<drill-path>/bin/drill-embedded

4. Navigate to drill web UI and activate the Hbase storage plugin. default path will be http://localhost:8047/storage
5. After enabling press update and configure your plugin.
6. example config.

{
  "type": "hbase",
  "config": {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.clientPort": "2181"
  },
  "size.calculator.enabled": false,
  "enabled": false
}

8. proceed to drill bit shell.

Drill Basic Queries

  • Basic select Statement

SELECT * FROM hbase.users;

  • Flattening Data
 SELECT CONVERT_FROM(row_key, 'UTF8') AS userid, 
        CONVERT_FROM(users.account.name, 'UTF8') AS name, 
        CONVERT_FROM(users.address.state, 'UTF8') AS state, 
        CONVERT_FROM(users.address.street, 'UTF8') AS street, 
        CONVERT_FROM(users.address.zipcode, 'UTF8') AS zipcode 
 FROM students;
  • Joining Multiple table

SELECT * FROM 
(
SELECT CONVERT_FROM(row_key, 'UTF8') AS userid, 
        CONVERT_FROM(users.account.name, 'UTF8') AS name, 
        CONVERT_FROM(users.address.state, 'UTF8') AS state, 
        CONVERT_FROM(users.address.street, 'UTF8') AS street, 
        CONVERT_FROM(users.address.zipcode, 'UTF8') AS zipcode 
 FROM students;
) t LEFT JOIN 
(
    SELECT CONVERT_FROM(row_key, 'UTF8') AS clickid, 
       CONVERT_FROM(clicks.clickinfo.usersid, 'UTF8') AS userid, 
       CONVERT_FROM(clicks.clickinfo.`time`, 'UTF8') AS `time`,
       CONVERT_FROM(clicks.clickinfo.url, 'UTF8') AS url 
  FROM clicks
) c ON t.userid = c.userid;

Few points to remember

  1. CONVERT_FROM converts json to utf string.
  2. Drill down is always by table.columnfamily.columnQualifier.
  3. CAST(variable as type) to convert desired data type as data will be in string format.
  4. Use case statement to filter empty value.
  5. Use COALESCE to filter null value.
  6. When using where function use CONVERT_FROM(clicks.clickinfo.usersid, 'UTF8') where condition on alias wont work.