Trino

Send data from Trino to your entire stack.

Trino is a distributed SQL query engine for efficient, low-latency big data analytics.

RudderStack supports Trino as a source from which you can ingest data and route it to your desired downstream destinations.

Prerequisites: Trino server setup

Before you set up Trino as a source in RudderStack, make sure the following properties are set in your Trino configuration:

hive.allow-drop-table=true
hive.metastore.thrift.delete-files-on-drop=true

Granting permissions

RudderStack requires you to grant certain user permissions on your Trino instance to successfully access data from it.

Complete the following steps in the given order to grant the relevant permissions:

Step 1: Assigning read access to tables

This step gives RudderStack the necessary permissions to read the relevant table records in Trino.

info
RudderStack uses Trino's file-based access control mechanism for this integration.

To sync a table sample_table in user_schema for a user test, copy the JSON below into your access control config JSON file. Replace test with your RudderStack user name and catalog_name with the catalog you wish to sync:

{
  "tables": [{
    "user": "test",
    "catalog": "catalog_name",
    "schema": "user_schema",
    "table": "sample_table",
    "privileges": ["SELECT"]
  }]
}
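
As a minimal sketch, the same rule can also be generated programmatically, which helps keep stray whitespace out of the values. The helper name table_read_rule is hypothetical and not part of RudderStack or Trino; the field names are taken from the JSON above.

```python
import json


def table_read_rule(user, catalog, schema, table):
    """Build one access control table rule that grants SELECT.

    Hypothetical helper for illustration. Values are stripped so that
    accidental leading or trailing spaces do not end up in the rule.
    """
    return {
        "user": user.strip(),
        "catalog": catalog.strip(),
        "schema": schema.strip(),
        "table": table.strip(),
        "privileges": ["SELECT"],
    }


# Same placeholder names as in the example above.
rules = {
    "tables": [
        table_read_rule("test", "catalog_name", "user_schema", "sample_table ")
    ]
}
print(json.dumps(rules, indent=2))
```

The printed JSON can be pasted into the access control config file after you replace the placeholder names with your own.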

Step 2: Creating RudderStack schema and granting permissions

Run the following query to create the _rudderstack schema:

CREATE SCHEMA "_rudderstack"

To add this schema to a particular location, run the following query:

CREATE SCHEMA "_rudderstack" WITH (location = 's3://<your_location>/')
warning
Make sure to create the _rudderstack schema before syncing your data.
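
If you script this step, the statement can be assembled as in the sketch below. The helper create_schema_sql is hypothetical; it relies only on the standard SQL quoting rules that Trino follows, where identifiers take double quotes and string literals take single quotes.

```python
def create_schema_sql(schema="_rudderstack", location=None):
    """Return a CREATE SCHEMA statement, optionally with a location.

    Hypothetical helper for illustration. Quote characters inside the
    identifier or the literal are escaped by doubling them.
    """
    identifier = schema.replace('"', '""')
    statement = f'CREATE SCHEMA "{identifier}"'
    if location:
        literal = location.replace("'", "''")
        statement += f" WITH (location = '{literal}')"
    return statement


print(create_schema_sql())
print(create_schema_sql(location="s3://<your_location>/"))
```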

Step 3: Granting ownership to _rudderstack schema

Add the following rules to your access control config JSON file to give RudderStack the permissions it needs to perform relevant actions on the tables in the _rudderstack schema:

{
  "catalogs": [{
    "user": "test",
    "catalog": "catalog_name",
    "allow": "all"
  }],
  "schemas": [{
    "user": "test",
    "catalog": "catalog_name",
    "schema": "_rudderstack",
    "owner": true
  }],
  "tables": [{
    "user": "test",
    "catalog": "catalog_name",
    "schema": "_rudderstack",
    "privileges": ["SELECT",
      "INSERT",
      "DELETE",
      "UPDATE",
      "OWNERSHIP"
    ]
  }]
}
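
Before restarting Trino with the new rules, a quick sanity check of the rules file can catch typos such as a trailing space in a schema name. The sketch below uses a hypothetical checker, check_rudderstack_rules, that compares values literally; it does not evaluate them as patterns and is not how Trino or RudderStack validates the file.

```python
REQUIRED_PRIVILEGES = {"SELECT", "INSERT", "DELETE", "UPDATE", "OWNERSHIP"}


def check_rudderstack_rules(rules, user, catalog, schema="_rudderstack"):
    """Return a list of problems found in an access control rules dict."""
    problems = []

    def matching(section):
        # Rules in this section that name exactly this user and catalog.
        return [
            rule for rule in rules.get(section, [])
            if rule.get("user") == user and rule.get("catalog") == catalog
        ]

    if not any(r.get("allow") == "all" for r in matching("catalogs")):
        problems.append("no catalog rule with allow=all")

    if not any(
        r.get("schema") == schema and r.get("owner") is True
        for r in matching("schemas")
    ):
        problems.append("no schema rule with owner=true")

    table_rules = [r for r in matching("tables") if r.get("schema") == schema]
    if not any(
        REQUIRED_PRIVILEGES <= set(r.get("privileges", [])) for r in table_rules
    ):
        problems.append("no table rule with all required privileges")

    return problems


# A rules dict with a trailing space in the schema name of the schema rule.
rules = {
    "catalogs": [{"user": "test", "catalog": "catalog_name", "allow": "all"}],
    "schemas": [{"user": "test", "catalog": "catalog_name",
                 "schema": "_rudderstack ", "owner": True}],
    "tables": [{"user": "test", "catalog": "catalog_name",
                "schema": "_rudderstack",
                "privileges": sorted(REQUIRED_PRIVILEGES)}],
}
print(check_rudderstack_rules(rules, "test", "catalog_name"))
```

An empty list means all three rule types were found for the given user and catalog.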

Setting up Trino source in RudderStack

To set up Trino as a source in RudderStack, follow these steps:

  1. Log in to your RudderStack dashboard.
  2. From the left navigation bar, go to Source > New Source > Reverse ETL. Then, select Trino.
  3. Assign a name to your source and click Continue.

Configuring connection credentials

  1. From Source type, choose Table, Model, or Audience, depending on whether you want to sync data from a warehouse table, a model, or an audience.
info
If you have chosen the Model or Audience option, skip the next steps and refer to the Schedule settings section directly.
  2. Enter the relevant settings in the Connection Credentials section as listed below:
  • Host: Enter the host name or IP address of your Trino coordinator server.
warning
Make sure to enter only the host name and not the complete URL. Otherwise, you will encounter an error.

For example, if the URL is https://trino-server.example.com, the host name should be trino-server.example.com.

  • Catalog Name: Specify the catalog to use when RudderStack executes queries in Trino.
  • User: Enter the user that has the relevant access to the above catalog.
  • Password: Enter the password for the above user.
  • Port: Enter the port number of your Trino coordinator server. This setting is optional.
info
If you’ve configured Trino as a source before, you can select the existing credentials under the Use existing credentials option.
  3. Click Continue. RudderStack then validates your credentials.
info
For more information on these validation steps, refer to the FAQ section.
  4. Once verified, click Continue to proceed.
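
The host name rule for the Host setting above can be sketched as follows. The helper host_from_url is hypothetical and uses only the Python standard library; it reduces a full URL, or a host with a port, to the bare host name.

```python
from urllib.parse import urlparse


def host_from_url(value):
    """Return only the host name from a URL or a bare host string."""
    value = value.strip()
    # urlparse only recognizes the host when the string contains "//",
    # so prefix bare hosts such as "example.com:8080" with it.
    parsed = urlparse(value if "://" in value else f"//{value}")
    return parsed.hostname or ""


print(host_from_url("https://trino-server.example.com"))
```

For the URL in the example above, this prints trino-server.example.com, which is the value to enter in the Host field.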

Schedule settings

  1. Specify the Schedule Settings to schedule the data syncs from your Trino instance.
info
RudderStack lets you schedule data syncs for your Reverse ETL sources and specify how and when the syncs will run. For more information on the Basic, CRON, and Manual schedule types, see Sync Schedule Settings.
  2. After specifying the schedule type and run settings, click Continue to finish the setup.

Trino is now successfully configured as a source in your RudderStack dashboard. You can connect this source to your preferred destination by clicking the Add Destination button.
info
If you have already configured a destination in RudderStack, choose the Use Existing Destinations option, which takes you to the Schema tab in the source settings. To add a new destination from scratch, select the Create New Destination option, which takes you to the destination configuration page.

Specifying the data to import

While connecting a destination to your Reverse ETL source, you can use the default JSON mapping or the Visual Data Mapping feature.

info

Based on the option (Table or Model) you chose while setting up the Reverse ETL source, follow the relevant guide for detailed steps.

FAQ

What do the three validations under Verifying Credentials imply?

When setting up a Reverse ETL source, once you proceed after entering the connection credentials, you will see three validations under the Verifying Credentials option. These validations are explained below:

  • Verifying Connection: This option indicates that RudderStack is trying to connect to the warehouse with the information specified in the connection credentials.
warning
If this option gives an error, it means that one or more fields specified in the connection credentials are incorrect. Verify your credentials in this case.
  • Able to List Schema: This option checks if RudderStack is able to fetch all schema details using the provided credentials.
  • Able to Access RudderStack Schema: This option checks whether RudderStack can access the _rudderstack schema that you created by following the steps in the Granting permissions section.
warning
If this option gives an error, verify that you have created the _rudderstack schema and given RudderStack the required permissions to access it. For more information, refer to the Granting permissions section.

Which Trino connectors are supported for the Trino source integration?

The Trino source currently supports only the Apache Hive connector.

To use this connector, make sure to add the following configuration to the properties file of your Hive catalog:

hive.allow-drop-table=true
hive.metastore.thrift.delete-files-on-drop=true
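
To confirm that both properties are present, a small check along these lines may help. It is a sketch only: the helper names are hypothetical, the sample text is made up for illustration, and the parser handles just plain key=value lines.

```python
REQUIRED = {
    "hive.allow-drop-table": "true",
    "hive.metastore.thrift.delete-files-on-drop": "true",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_properties(text):
    """Return the required keys that are absent or not set to true."""
    props = parse_properties(text)
    return [
        key for key, expected in REQUIRED.items()
        if props.get(key, "").lower() != expected
    ]


# Illustrative file content in which the second property is missing.
sample = """
# Hive catalog
hive.allow-drop-table=true
"""
print(missing_properties(sample))
```

For this sample, the check reports hive.metastore.thrift.delete-files-on-drop as missing. An empty list means both properties are set to true.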

Which data types are supported for this integration?

The Trino source supports all data types listed in the Trino documentation except the Row data type.


