Apache Drill: What, Why and How


Apache Drill is an open source implementation of Google's Dremel system, which is used by Google BigQuery. It was first released in the second quarter of 2015 and is one of Apache's top-level projects.

What is Apache Drill?

It's a schema-free SQL query engine for Hadoop, JSON files, NoSQL stores, and some cloud storage such as AWS S3. Using Drill, you can query HBase, MongoDB, or S3 by writing ANSI standard SQL.
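As a rough illustration of what schema-free querying looks like, the Python sketch below builds an ANSI SQL statement that reads a raw JSON file in place. The file path `/tmp/users.json`, its fields (`name`, `age`, `address.city`), and the helper name `json_file_query` are hypothetical. `dfs` is assumed to be the name of the file system storage plugin, as in a default install.

```python
def json_file_query(path: str, min_age: int) -> str:
    """Build a Drill SQL query over a raw JSON file.

    No table or schema is declared anywhere: Drill discovers the structure
    when it reads the file. The field names here are hypothetical.
    Nested fields are referenced through a table alias (here `t`).
    """
    if "`" in path:
        raise ValueError("path must not contain backticks")
    return (
        "SELECT t.name, t.address.city AS city "
        f"FROM dfs.`{path}` AS t "
        f"WHERE t.age > {int(min_age)}"
    )
```

Calling `json_file_query("/tmp/users.json", 30)` returns a statement that can be pasted into the Drill shell or web UI as-is.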

Why do we need Apache Drill?

Apache Drill connects to the data sources mentioned above and exposes a JDBC connection that other applications can use to connect and query. Just as with a SQL database, we can connect from our Spring application or use existing BI tools to generate reports. We can also join data from multiple data sources in a single query and get the result set quickly.
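To make the cross-source join concrete, here is a minimal Python sketch. It uses Drill's REST query endpoint (`POST /query.json`) rather than JDBC, simply because that needs nothing beyond the standard library. The MongoDB database `shop`, its `orders` collection, the file `/data/customers.json`, all column names, and the plugin names `mongo` and `dfs` are assumptions for illustration.

```python
import json
import urllib.request

# Hypothetical sources: a MongoDB collection `shop.orders` (through a storage
# plugin assumed to be named `mongo`) joined with a JSON file on the file
# system (through the `dfs` plugin).
JOIN_SQL = (
    "SELECT o.order_id, c.name "
    "FROM mongo.shop.`orders` AS o "
    "JOIN dfs.`/data/customers.json` AS c ON o.customer_id = c.id"
)


def build_request(sql: str, base_url: str = "http://localhost:8047") -> urllib.request.Request:
    """Wrap a SQL statement in a POST request for Drill's REST query endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, base_url: str = "http://localhost:8047") -> list:
    """Send the query to a running Drill instance and return the result rows."""
    with urllib.request.urlopen(build_request(sql, base_url)) as resp:
        return json.load(resp).get("rows", [])
```

With Drill running and both plugins enabled, `run_query(JOIN_SQL)` would return one dictionary per joined row. The same SQL string works unchanged over a JDBC connection.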

A few key highlights of Drill

  • It is distributed and can scale from a single node to thousands of nodes.
  • Supports complex and schema-free data. It uses a shredded, in-memory, columnar data representation.
  • Streams data in memory between operators and minimizes the use of disk unless it is needed to complete the query.
  • Point-and-query: unlike engines that require ingest -> schema -> query, Drill queries the data where it lives.
  • Supports complex data types such as arrays and JSON.
  • Anti-schema: no schema has to be defined before querying.
  • A low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data.
  • Also useful for short, interactive ad-hoc queries on large-scale datasets.
  • Provides a shell, web UI, JDBC/ODBC, and a C++ API for broader connectivity.
  • Supports controlling CPU utilization.
  • Supports partition pruning to query a subset of the data in files or Hive tables.
  • Custom functions can be added to Drill for extensibility.
  • Available as a Docker container.
  • Self-managed.
  • Decent documentation.

Getting Started

  1. Download Apache Drill from this URL
  2. Extract the content
  3. Navigate to /bin
  4. Execute ./drill-embedded.
  5. Navigate to http://localhost:8047/storage
  6. Activate the necessary storage plugins
  7. Start querying from the shell or the web UI.
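Once the embedded instance is up, the steps above can also be checked from a script. The sketch below is a hedged example, not an official client: it assumes the REST endpoints `/storage.json` and `/query.json` on the same port as the web UI, and it assumes each entry in the storage listing has a `name` and a `config.enabled` flag. The query reads `employee.json`, a sample file that ships on Drill's classpath (`cp`) plugin.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8047"  # same host and port as the web UI in step 5


def enabled_plugins(storage_listing: list) -> list:
    """Return the names of enabled storage plugins.

    Assumes each entry looks like {"name": ..., "config": {"enabled": bool, ...}}.
    """
    return sorted(
        p["name"] for p in storage_listing if p.get("config", {}).get("enabled")
    )


def fetch_json(path: str, payload=None):
    """GET a JSON document from Drill, or POST when a payload is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        BASE_URL + path, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def smoke_test() -> None:
    """Call this only after ./drill-embedded is running."""
    print("enabled plugins:", enabled_plugins(fetch_json("/storage.json")))
    result = fetch_json(
        "/query.json",
        {"queryType": "SQL", "query": "SELECT * FROM cp.`employee.json` LIMIT 3"},
    )
    for row in result.get("rows", []):
        print(row)
```

Running `smoke_test()` should list the plugins activated in step 6 and print a few sample rows. If the call fails to connect, the embedded Drill from step 4 is probably not running.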