Installing Trino (Presto) on Ubuntu 20.04 LTS
Trino was formerly known as Presto.
Trino is a distributed SQL query engine. It is widely adopted for running analytical queries,
and it can be reached from various business intelligence (BI) tools or through JDBC drivers.
If you're new to data science, you might be wondering why a SQL query engine is needed at all.
In the real world, data is stored in many places: files, SQL databases, and NoSQL databases.
Beyond databases, it also comes in formats such as plain text, CSV, or JSON. The problem is finding meaningful information and making good use of data spread across these sources. BI tools are one place where data gets turned into information, and SQL is the query language that virtually every BI tool speaks.
Trino aggregates data from these sources and formats through connectors and lets us query all of it with SQL. It is like an adapter that flattens your learning curve :-P
Before doing anything else, let's set up the Trino server on an Ubuntu 20.04 LTS development machine.
Prerequisites
Installing Java and JDK
sudo apt-get update
sudo apt install default-jre
sudo apt install default-jdk
Installing Python
sudo apt-get install python3
sudo apt-get install python-is-python3 # makes the `python` command point to Python 3
Download the Trino server tar file from here: Trino's link. After downloading, extract it to the desired location.
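As a rough sketch of the extraction step, assuming the download is a gzip-compressed tarball: the helper name `extract_trino` and the paths in the usage comment are placeholders made up for illustration, not something Trino ships.

```shell
#!/bin/sh
# Hypothetical helper: unpack a Trino server tarball into a chosen directory.
# $1 = path to the downloaded .tar.gz, $2 = directory to extract into.
extract_trino() {
    archive="$1"
    dest="$2"
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # -x extract, -z gunzip, -f archive file, -C switch to the target directory first
    tar -xzf "$archive" -C "$dest"
}

# Example call (file name and target path are placeholders):
# extract_trino ~/Downloads/trino-server-<version>.tar.gz ~/trino
```

Extracting into a directory you own (rather than a system path) keeps the later `etc` edits free of `sudo`.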
Basic Configuration to begin with:
1. Create an `etc` folder inside the extracted path.
2. Create a file named `node.properties` and paste the contents below:
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
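node.data-dir=/tmp/
Make sure to replace the `node.data-dir` value with the required path, somewhere outside the installation folder. Keep comments off the end of the line: in a properties file, a trailing `#` comment becomes part of the value.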
3. Create `log.properties` and add this line: `io.trino=INFO`
4. Create `config.properties` and add this content:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
5. Create `jvm.config` and paste the following content:
-server
-Xmx16G
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
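The first three steps above (the `etc` folder, `node.properties`, and the logging file) can also be scripted. The sketch below is one possible way to do it, not an official installer: `write_trino_etc` is a made-up helper name, both arguments are paths you choose, and it assumes either `uuidgen` or the Linux `/proc/sys/kernel/random/uuid` file is available to produce a unique `node.id`.

```shell
#!/bin/sh
# Hypothetical helper: create etc/ with node.properties and log.properties.
# $1 = extracted Trino directory, $2 = data directory outside that tree.
write_trino_etc() {
    trino_home="$1"
    data_dir="$2"
    mkdir -p "$trino_home/etc" "$data_dir" || return 1

    # Unique node id: prefer uuidgen, fall back to the Linux kernel's generator.
    node_id=$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid) || return 1

    # No trailing comments here: they would become part of the property value.
    cat > "$trino_home/etc/node.properties" <<EOF
node.environment=production
node.id=$node_id
node.data-dir=$data_dir
EOF

    echo "io.trino=INFO" > "$trino_home/etc/log.properties"
}

# Example call (both paths are placeholders):
# write_trino_etc ~/trino/trino-server-<version> ~/trino-data
```

Generating the id once and keeping it in the file matters because the node id is meant to stay the same across restarts of the same installation.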
To test the configuration, run `./launcher run` from inside Trino's `bin` directory; this keeps the server in the foreground and prints its logs to the terminal.
Use `./launcher start` to start the server in daemon mode.
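Once the daemon is started, it can take a little while before the server accepts queries. The sketch below polls the coordinator until it reports that startup has finished. It rests on a few assumptions worth checking against your version: that `curl` is installed, that the server listens on the port from `config.properties` (8080 here), and that the `/v1/info` endpoint returns JSON containing a `starting` field. The helper name `wait_for_trino` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: poll the coordinator until it reports it has finished starting.
# $1 = base URL (default matches http-server.http.port=8080), $2 = max attempts.
wait_for_trino() {
    url="${1:-http://localhost:8080}"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Assumed response shape: JSON where "starting": false means the server is ready.
        if curl -fsS --max-time 2 "$url/v1/info" 2>/dev/null \
            | grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'; then
            echo "Trino is ready at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "Trino did not become ready after $tries attempts" >&2
    return 1
}

# Example, after ./launcher start:
# ./launcher status
# wait_for_trino http://localhost:8080 60
```

`./launcher status` only tells you whether the process is running, so a check against the HTTP port is a useful complement when scripting the setup.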