Querying MongoDB from Apache Drill and tips on few basic configuration.
So Apache Drill provides storage plugin to connect to MongoDb and Query Mongo's Collection. To begin with The Apache drill is a Schema free SQL query engine for Hadoop, Json File, NO SQL and some cloud storage like AWS S3. Using drill you can query HBase, MongoDB or S3 by writing ANSI Standard SQL for more detailed introduction follow this URL [https://blogs.ashrithgn.com/2019/09/20/apache-drill-why-and-how-basic-usage-features-drill-bit/].
Apache Drill and MongoDB.
Apache Drill Connects to Mongo using Java driver which is included Drill's library,and connection configuration is stored in Storage plugin as Json format, Using Drill we can read data from collection but updating data is not possible yet..
Setting Up Mongo
- Download sample listing.csv
- Load CSV to MongoDB By this command
mongoimport -d <DB> -c listings --type csv --file locations.csv --headerline
Setting Up Apache Drill And Configuring MongoDB
- Start the Drill in Embedded mode
/<home>/<drill-path>/bin/drill-embedded
- Navigate to Web console http://localhost:8047/storage
- Enable mongo plugin
- Update mongo plugin with suitable configuration
{
"type": "mongo",
"connection": "mongodb://localhost:27017/",
"enabled": true
}
- If your MongoDb is secured by user name and password user name password can be provide in url
mongodb://<uname>:<password>localhost:27017/"
Querying SQl on MongoDB
So now lets us procced to Drill's Shell and do the Basic Query
SELECT * FROM mongo.<db>.listings
Casting Data to required datatype and using where Condition
select l.`_id`.`$oid` as id, l.host_name,COALESCE(CAST(l.`reviews_per_month` as decimal),0.0) as reviews from mongo.drill.listing l
WHERE COALESCE(CAST(l.`reviews_per_month` as decimal),0.0) > 0.0 order by reviews desc;
- COALESCE returns no null value from expression.
- CAST(column as DATATYPE) cast the data to desired data type.
- l.
_id
.$oid
will get string Id from mongos's BSON Object.
Now lets proceed with Aggreation Query And Group By function
SELECT SUM(COALESCE(CAST(l.`reviews_per_month` as decimal),0.0)) as reviews_per_month, SUM(COALESCE(CAST(l.number_of_reviews AS INT),0)) as reviews,l.neighbourhood from mongo.drill.listing l GROUP BY l.neighbourhood;
Please read complete article, i know sample file with example query provide the error
Tuning the Options
- Go to option tab in drills's web ui http://localhost:8047/options
- Search for
mongo.
in search bar - We might get these options
store.mongo.all_text_mode
,store.mongo.bson.record.reader
,
So lets Resolve the error of select statement of first example by changing Configuration
Error: INTERNAL_ERROR ERROR: You tried to write a VarChar type when you are using a ValueWriter of type NullableFloat8WriterImpl.
Fragment 0:0
Please, refer to logs for more information.
[Error Id: 197c6703-28c3-4a77-aa42-4a26bc037a04 on ashrith-costrategix:31010] (state=,code=0)
Set value store.mongo.all_text_mode
true and press update. by changing this config the drill will read all values as text, as we had imported from csv number null value had been replace to empty string, so dynamic type detection was providing error.
So after the first config update next error could be:
Failed toString() invocation on an object of type [org.apache.drill.exec.store.mongo.MongoRecordReader]
Reported exception:
org.bson.BsonInvalidOperationException: getCurrentName can only be called when State is VALUE, not when State is TYPE.
at org.bson.AbstractBsonReader.throwInvalidState(AbstractBsonReader.java:669)
at org.bson.AbstractBsonReader.getCurrentName(AbstractBsonReader.java:52)
at org.apache.drill.exec.store.bson.BsonRecordReader.toString(BsonRecordReader.java:378)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at org.apache.drill.exec.store.mongo.MongoRecordReader.toString(MongoRecordReader.java:222)
at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:299)
at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:271)
at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:233)
at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:173)
at ch.qos.logback.classic.spi.LoggingEvent.<init>(LoggingEvent.java:115)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:439)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:395)
at ch.qos.logback.classic.Logger.error(Logger.java:554)
at org.apache.drill.exec.physical.impl.ScanBatch.dump(ScanBatch.java:642)
at org.apache.drill.exec.physical.impl.BaseRootExec.dumpBatches(BaseRootExec.java:148)
at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:211)
at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:329)
at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
for that error to resolve update config store.mongo.bson.record.reader
to false this will instruct drill to read id as BSON Object as is not to convert to string And There is an empty string in column reviews per month drill.exec.functions.cast_empty_string_to_null
as true to avoid.
Error: SYSTEM ERROR: NumberFormatException
Fragment 0:0
Please, refer to logs for more information.
[Error Id: c818c6c9-f7f7-47de-8b96-b04a69007293 on ashrith-costrategix:31010] (state=,code=0)
Finally your options would be
- drill.exec.functions.cast_empty_string_to_null : true.
- store.mongo.all_text_mode : true.
- store.mongo.bson.record.reader : false.