Apache Drill:Connecting and Querying HBase.
So This is the next tutorial in apace drill, so just for introduction sake Apache Drill is Apace drill is open source Implementation of Google's Dremel system used by Google BigQuery, Initially released on second quarter of 2015 and also Drill is one the top projects of Apace. Which allows to query nosql db without creating any schema to know more follow this url [https://blogs.ashrithgn.com/2019/09/20/apache-drill-why-and-how-basic-usage-features-drill-bit/]
Also you can follow the blog to connect apache drill with spring boot [https://blogs.ashrithgn.com/2019/09/20/apache-drill-running-sql-query-from-spring-boot/].
So lets get started by loading the data to hbase.
Setting Up Environment.
1. Create HBase table
create 'users','account','address'
create 'clicks','clickinfo','iteminfo'
2. Load data to HBase
put 'users','users1','account:name','Alice'
put 'users','users1','address:street','123 Ballmer Av'
put 'users','users1','address:zipcode','12345'
put 'users','users1','address:state','CA'
put 'users','users2','account:name','Bob'
put 'users','users2','address:street','1 Infinite Loop'
put 'users','users2','address:zipcode','12345'
put 'users','users2','address:state','CA'
put 'users','users3','account:name','Frank'
put 'users','users3','address:street','435 Walker Ct'
put 'users','users3','address:zipcode','12345'
put 'users','users3','address:state','CA'
put 'users','users4','account:name','Mary'
put 'users','users4','address:street','56 Southern Pkwy'
put 'users','users4','address:zipcode','12345'
put 'users','users4','address:state','CA'
put 'clicks','click1','clickinfo:usersid','users1'
put 'clicks','click1','clickinfo:url','http://www.google.com'
put 'clicks','click1','clickinfo:time','2014-01-01 12:01:01.0001'
put 'clicks','click1','iteminfo:itemtype','image'
put 'clicks','click1','iteminfo:quantity','1'
put 'clicks','click2','clickinfo:usersid','users1'
put 'clicks','click2','clickinfo:url','http://www.amazon.com'
put 'clicks','click2','clickinfo:time','2014-01-01 01:01:01.0001'
put 'clicks','click2','iteminfo:itemtype','image'
put 'clicks','click2','iteminfo:quantity','1'
put 'clicks','click3','clickinfo:usersid','users2'
put 'clicks','click3','clickinfo:url','http://www.google.com'
put 'clicks','click3','clickinfo:time','2014-01-01 01:02:01.0001'
put 'clicks','click3','iteminfo:itemtype','text'
put 'clicks','click3','iteminfo:quantity','2'
put 'clicks','click4','clickinfo:usersid','users2'
put 'clicks','click4','clickinfo:url','http://www.ask.com'
put 'clicks','click4','clickinfo:time','2013-02-01 12:01:01.0001'
put 'clicks','click4','iteminfo:itemtype','text'
put 'clicks','click4','iteminfo:quantity','5'
put 'clicks','click5','clickinfo:usersid','users2'
put 'clicks','click5','clickinfo:url','http://www.reuters.com'
put 'clicks','click5','clickinfo:time','2013-02-01 12:01:01.0001'
put 'clicks','click5','iteminfo:itemtype','text'
put 'clicks','click5','iteminfo:quantity','100'
put 'clicks','click6','clickinfo:usersid','users3'
put 'clicks','click6','clickinfo:url','http://www.google.com'
put 'clicks','click6','clickinfo:time','2013-02-01 12:01:01.0001'
put 'clicks','click6','iteminfo:itemtype','image'
put 'clicks','click6','iteminfo:quantity','1'
put 'clicks','click7','clickinfo:usersid','users3'
put 'clicks','click7','clickinfo:url','http://www.ask.com'
put 'clicks','click7','clickinfo:time','2013-02-01 12:45:01.0001'
put 'clicks','click7','iteminfo:itemtype','image'
put 'clicks','click7','iteminfo:quantity','10'
put 'clicks','click8','clickinfo:usersid','users4'
put 'clicks','click8','clickinfo:url','http://www.amazon.com'
put 'clicks','click8','clickinfo:time','2013-02-01 22:01:01.0001'
put 'clicks','click8','iteminfo:itemtype','image'
put 'clicks','click8','iteminfo:quantity','1'
put 'clicks','click9','clickinfo:usersid','users4'
put 'clicks','click9','clickinfo:url','http://www.amazon.com'
put 'clicks','click9','clickinfo:time','2013-02-01 22:01:01.0001'
put 'clicks','click9','iteminfo:itemtype','image'
put 'clicks','click9','iteminfo:quantity','10'
3. Start drill bit in embedded mode
/<drill-path>/bin/drill-embedded
4. Navigate to drill web UI and activate the Hbase storage plugin. default path will be http://localhost:8047/storage
5. After enabling press update and configure your plugin.
6. example config.
{
"type": "hbase",
"config": {
"hbase.zookeeper.quorum": "localhost",
"hbase.zookeeper.property.clientPort": "2181"
},
"size.calculator.enabled": false,
"enabled": false
}
8. proceed to drill bit shell.
Drill Basic Queries
- Basic select Statement
SELECT * FROM hbase.users;
- Flattening Data
SELECT CONVERT_FROM(row_key, 'UTF8') AS userid,
CONVERT_FROM(users.account.name, 'UTF8') AS name,
CONVERT_FROM(users.address.state, 'UTF8') AS state,
CONVERT_FROM(users.address.street, 'UTF8') AS street,
CONVERT_FROM(users.address.zipcode, 'UTF8') AS zipcode
FROM students;
- Joining Multiple table
SELECT * FROM
(
SELECT CONVERT_FROM(row_key, 'UTF8') AS userid,
CONVERT_FROM(users.account.name, 'UTF8') AS name,
CONVERT_FROM(users.address.state, 'UTF8') AS state,
CONVERT_FROM(users.address.street, 'UTF8') AS street,
CONVERT_FROM(users.address.zipcode, 'UTF8') AS zipcode
FROM students;
) t LEFT JOIN
(
SELECT CONVERT_FROM(row_key, 'UTF8') AS clickid,
CONVERT_FROM(clicks.clickinfo.usersid, 'UTF8') AS userid,
CONVERT_FROM(clicks.clickinfo.`time`, 'UTF8') AS `time`,
CONVERT_FROM(clicks.clickinfo.url, 'UTF8') AS url
FROM clicks
) c ON t.userid = c.userid;
Few points to remember
- CONVERT_FROM converts json to utf string.
- Drill down is always by table.columnfamily.columnQualifier.
- CAST(variable as type) to convert desired data type as data will be in string format.
- Use case statement to filter empty value.
- Use COALESCE to filter null value.
- When using where function use CONVERT_FROM(clicks.clickinfo.usersid, 'UTF8')
where condition on alias wont work.