Big Data

Installing Trino (Presto) on Ubuntu 20.04 LTS

Ashrith G N

Apr 5, 2021 • 2 min read

Trino was formerly known as Presto.
Trino is a distributed SQL query engine. It is widely adopted for running analytical queries,
and it can be reached from various business intelligence tools or through its JDBC driver.
So if you're new to data science, you may be wondering: why do we need a SQL query engine?

Traditionally, in the real world, we store data in various places such as files, SQL databases, and NoSQL databases.
And it's not just databases: we might also use formats like plain text, CSV, or JSON. The problem is finding meaningful information and making good use of data from these different sources. One place where we turn data into information is BI tools. SQL is the globally adopted query language, and practically every BI tool uses it.
Trino aggregates data from various sources and formats using connectors and lets us query all of it with SQL. It is like an adapter for your learning curve :-P
Before doing anything else, let's learn to set up the Trino server on an Ubuntu 20.04 LTS development machine.

Prerequisites

Installing Java and JDK

sudo apt-get update
sudo apt install default-jre
sudo apt install default-jdk

Installing Python (the launcher script needs it)

sudo apt-get install python3
sudo apt-get install python-is-python3 # makes the python command point to Python 3

Download the Trino tar file from here: Trino's link. After downloading, extract it to the desired location.

Basic Configuration to begin with:

1. Create an `etc` folder inside the extracted path.

2. Create a file named `node.properties` and paste the below contents

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
# specify somewhere outside the installation folder
node.data-dir=/tmp/

Make sure to replace the value of `node.data-dir` with the required path.

3. Create `log.properties` and add this line: `io.trino=INFO`

4. Create `config.properties` and add this content

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080

5. Create `jvm.config` and paste the following content

-server
-Xmx16G
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
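The `node.id` in step 2 is a placeholder, and every Trino node is expected to have its own unique ID. As a minimal sketch, the first two steps could be scripted so that a fresh ID is generated each time. The function name `write_node_properties` and its arguments are made up for this example, and the fallback to `/proc/sys/kernel/random/uuid` assumes a Linux machine.

```shell
# write_node_properties ETC_DIR DATA_DIR [ENVIRONMENT]
# Hypothetical helper: creates ETC_DIR and DATA_DIR, then writes
# ETC_DIR/node.properties with a freshly generated node.id.
write_node_properties() {
  etc_dir="$1"
  data_dir="$2"
  environment="${3:-production}"

  # Prefer uuidgen; fall back to the kernel's UUID source (Linux only).
  node_id="$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"

  mkdir -p "$etc_dir" "$data_dir"

  # Comments in a properties file must sit on their own line,
  # so only plain key=value pairs are written here.
  printf 'node.environment=%s\nnode.id=%s\nnode.data-dir=%s\n' \
    "$environment" "$node_id" "$data_dir" > "$etc_dir/node.properties"
}
```

For example, `write_node_properties ./etc /var/trino/data` run from the extracted folder would write `etc/node.properties`; the data path here is only an illustration.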

To test the configuration, run `./launcher run` from inside the `bin` directory of Trino. This keeps the server in the foreground.
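Before that first run, a quick check that the four files from this guide are in place can save a confusing startup error. This is only a sketch: `check_trino_etc` is a made-up helper name, and it checks nothing beyond the files existing.

```shell
# check_trino_etc [ETC_DIR]
# Hypothetical helper: reports which of the four config files from this
# guide are missing and returns non-zero if any of them is absent.
check_trino_etc() {
  etc_dir="${1:-etc}"
  missing=0
  for name in node.properties log.properties config.properties jvm.config; do
    if [ ! -f "$etc_dir/$name" ]; then
      echo "missing: $etc_dir/$name" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every file was found.
  [ "$missing" -eq 0 ]
}
```

From the `bin` directory this could be used as `check_trino_etc ../etc && ./launcher run`.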

Use `./launcher start` to start the server in daemon mode.
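In daemon mode the command returns right away, so it is not obvious when the server is reachable. One way to find out, sketched below, is to poll the coordinator's `/v1/info` endpoint on the port from `config.properties`. The helper name `wait_for_trino` and its default attempt count and delay are assumptions for this example, and it needs `curl` to be installed.

```shell
# wait_for_trino [URL] [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls the given URL until it answers with a
# successful HTTP status, or gives up after ATTEMPTS tries.
wait_for_trino() {
  url="${1:-http://localhost:8080/v1/info}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s hides progress output, -f turns HTTP errors into a non-zero exit
    if curl -sf --max-time 5 "$url" >/dev/null 2>&1; then
      echo "Trino answered after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Trino did not answer at $url" >&2
  return 1
}
```

A typical use would be `./launcher start && wait_for_trino`. An answer only shows that the HTTP server is up; the launcher's own `./launcher status` and `./launcher stop` commands report on and stop the daemon.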
