Trino was formerly known as Presto.
Trino is a distributed SQL query engine. It is widely adopted for running analytical queries,
and you can connect to it from various business intelligence (BI) tools or through JDBC drivers.
If you're new to data science, you may be wondering why you would need a SQL query engine at all.
In the real world, we store data in many places, such as files, SQL databases, and NoSQL databases.
And not just databases: we might also use formats like plain text, CSV, or JSON. The problem is finding meaningful information and making good use of data from these different sources. One place where we turn data into information is BI tools. SQL is the globally adopted query language, and it is used in practically every BI tool.
Trino helps to aggregate data from these various sources and formats using connectors, and lets us query all of it using SQL. It is like an adapter for your learning curve :-P
Before doing anything else, we will set up the Trino server on an Ubuntu 20.04 LTS development machine.
Installing Java and JDK
sudo apt-get update
sudo apt install default-jre
sudo apt install default-jdk
Installing Python
sudo apt-get install python3
sudo apt-get install python-is-python3 # makes the python command point to Python 3
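Before moving on, it can help to confirm that both tools actually ended up on the PATH. The sketch below is one way to check that. The helper name `require_cmd` is made up for this post; it is not part of Trino or Ubuntu.

```shell
#!/bin/sh
# require_cmd is a hypothetical helper: it succeeds only if the named
# command can be found on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# This guide installs Java and Python, so check for both.
require_cmd java || echo "install Java first"
require_cmd python || echo "install Python first"
```

If either line reports a missing command, rerun the matching install step above before continuing.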
Download the Trino tar file from here: Trino's link. After downloading, extract it to the desired location.
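As a rough sketch, the extraction step could look like the snippet below. The helper name `extract_trino` and both paths in the example call are placeholders I made up; use the name of the file you actually downloaded.

```shell
#!/bin/sh
# extract_trino is a hypothetical helper: it unpacks a downloaded .tar.gz
# archive into a target directory, creating that directory if needed.
extract_trino() {
    tarball="$1"   # path to the downloaded archive
    dest="$2"      # directory to extract into
    mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest"
}

# Example call; both paths are placeholders, replace them with your own:
# extract_trino "$HOME/Downloads/<downloaded-file>.tar.gz" "$HOME/trino"
```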
Basic configuration to begin with:
1. Create an `etc` folder inside the extracted path.
2. Inside `etc`, create a file named `node.properties` and paste the contents below:
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
# specify somewhere outside the installation folder
node.data-dir=/tmp/
Make sure to replace the value of `node.data-dir` with the required path.
3. Create `log.properties` and add this line: `io.trino=INFO`
4. Create `config.properties` and add this content
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
5. Create `jvm.config` and paste the following content
-server
-Xmx16G
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
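If you would rather not create the files by hand, the steps above can be sketched as one small shell function. This is only an illustration: the name `write_trino_config` is made up, and the values are simply the ones from this post, so adjust `node.data-dir` and the memory settings for your machine.

```shell
#!/bin/sh
# write_trino_config is a hypothetical helper: it creates etc/ under the
# directory given as $1 and writes the four configuration files into it.
write_trino_config() {
    etc_dir="$1/etc"
    mkdir -p "$etc_dir" || return 1

    # Node identity and data directory (change node.data-dir to your own path).
    cat > "$etc_dir/node.properties" <<'EOF'
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/tmp/
EOF

    # Logging level, written to log.properties.
    echo "io.trino=INFO" > "$etc_dir/log.properties"

    # Single-node setup: this machine is both coordinator and worker.
    cat > "$etc_dir/config.properties" <<'EOF'
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
EOF

    # JVM options, one per line.
    cat > "$etc_dir/jvm.config" <<'EOF'
-server
-Xmx16G
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
EOF
}

# Example call; the path is a placeholder for your extracted Trino folder:
# write_trino_config "$HOME/trino/<extracted-folder>"
```

The heredocs use a quoted `'EOF'` marker so the shell writes every line literally, without expanding anything inside the file contents.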
To test the configuration, run `./launcher run` from inside the `bin` directory of Trino. This keeps the server in the foreground so you can watch the log output.
Once that works, use `./launcher start` to start the server in daemon mode.
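In daemon mode there is no log output on the terminal, so it is handy to check whether the server really came up. `./launcher status` reports whether the process is running, and `./launcher stop` shuts it down again. The sketch below goes one step further and polls the server's `/v1/info` endpoint over HTTP. The helper name `trino_is_up` is made up for this post, it assumes `curl` is installed, and the default URL assumes the port 8080 configured above.

```shell
#!/bin/sh
# trino_is_up is a hypothetical helper: it polls <url>/v1/info once per second
# until the server answers or the attempts run out.
trino_is_up() {
    url="${1:-http://localhost:8080}"   # assumes http-server.http.port=8080
    attempts="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl fail on HTTP errors; -sS keeps it quiet otherwise.
        if curl -fsS "$url/v1/info" >/dev/null 2>&1; then
            echo "Trino is answering at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "no answer from $url after $attempts attempts" >&2
    return 1
}

# Example: wait up to 30 seconds after ./launcher start
# trino_is_up http://localhost:8080 30
```

The server can take a little while to finish starting, which is why the check retries instead of giving up after the first failed request.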