Databricks is a data analytics platform that lets you easily integrate with open source libraries. It offers a simple collaborative environment to run interactive and scheduled data analysis workloads.
RudderStack supports Databricks as a source from which you can ingest data and route it to your desired downstream destinations.
RudderStack requires you to grant certain user permissions on Databricks to successfully access data from it.
Complete the following steps in the exact order listed to grant these permissions:
- Add a new user (for example, email@example.com) by following the steps in the Databricks documentation.
- Create a dedicated schema
CREATE SCHEMA `_rudderstack`;
The `_rudderstack` schema is used by RudderStack for storing the state of each data sync. Do not change this name.
- Grant full access to the `_rudderstack` schema for the user created in step 1.
GRANT ALL PRIVILEGES ON SCHEMA `_rudderstack` TO `email@example.com`;
Replace `email@example.com` with the user created in step 1.
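The two statements above differ between setups only in the user name. As a minimal sketch, the helper below (a hypothetical function, not part of RudderStack or Databricks) builds both statements for a given user so that the schema name stays fixed and the address is substituted consistently. It only produces the SQL text; running it against Databricks is left to whichever SQL client you use.

```python
from typing import List


def rudderstack_grant_statements(user_email: str) -> List[str]:
    """Return the CREATE SCHEMA and GRANT statements for one Databricks user.

    Hypothetical helper for illustration. The schema name is fixed because
    the guide states that `_rudderstack` must not be changed.
    """
    user = user_email.strip()
    if not user or "`" in user:
        # A backtick would break the quoted identifier in the GRANT statement.
        raise ValueError("user_email must be non-empty and contain no backticks")
    return [
        "CREATE SCHEMA `_rudderstack`;",
        f"GRANT ALL PRIVILEGES ON SCHEMA `_rudderstack` TO `{user}`;",
    ]


if __name__ == "__main__":
    # Placeholder address; replace it with the user created in step 1.
    for statement in rudderstack_grant_statements("email@example.com"):
        print(statement)
```

Run the statements in the order returned, since the grant refers to the schema created by the first statement.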
To set up Databricks as a source in RudderStack, follow these steps:
- Log into your RudderStack dashboard.
- From the left panel, go to Source > New Source > Reverse ETL. Then, select Databricks.
- Assign a name to your source.
- Enter the relevant settings from Databricks in the Connection Credentials section:
  - Host - Enter the server hostname.
  - Port - Enter the port number.
  - Path - Enter the HTTP path.
  - Token - Enter the personal access token.
- Click Continue to proceed.
- Specify the Schedule Settings to schedule the data syncs from your Databricks source.
- After specifying the schedule type and run settings, click Continue to finish the setup.
Databricks is now successfully configured as a source in your RudderStack dashboard. You can connect this source to your preferred destination by clicking the Add Destination button.
While connecting a destination to your Databricks source, you can use the default JSON mapping feature.
To obtain the Host, Path, and Port number, go to your Databricks account and follow these steps:
- Go to the Compute tab and select your Databricks cluster.
- Click Advanced options and open the JDBC/ODBC tab to find the required settings.
To obtain the Token, go to Settings > User Settings in your Databricks account and generate a new personal access token.
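Before pasting the four values into the Connection Credentials section, it can help to sanity-check them locally. The sketch below is only an illustration: the class name, the specific checks, and the sample values are assumptions made here, not rules enforced by RudderStack or Databricks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatabricksSourceSettings:
    """The four connection fields named in the guide (hypothetical container)."""

    host: str   # server hostname from the JDBC/ODBC tab
    port: int   # port number from the JDBC/ODBC tab
    path: str   # HTTP path from the JDBC/ODBC tab
    token: str  # personal access token from User Settings

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means nothing obvious is wrong."""
        issues = []
        if not self.host.strip():
            issues.append("Host is empty")
        elif "://" in self.host:
            # Assumption: the field expects a bare hostname, not a URL.
            issues.append("Host should not include a scheme such as https://")
        if not 1 <= self.port <= 65535:
            issues.append("Port must be between 1 and 65535")
        if not self.path.strip():
            issues.append("Path is empty")
        if not self.token.strip():
            issues.append("Token is empty")
        return issues


if __name__ == "__main__":
    # All values below are placeholders, not real credentials.
    settings = DatabricksSourceSettings(
        host="example.cloud.databricks.com",
        port=443,
        path="sql/protocolv1/o/0/0000",
        token="placeholder-token",
    )
    print(settings.problems() or "No obvious problems found")
```

These checks catch only copy-and-paste mistakes; whether the credentials actually work is determined when RudderStack verifies them.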
When setting up a Reverse ETL source, after you enter the connection credentials and proceed, you will see the following three validations under the Verifying Credentials option:
- Verifying Connection: This option indicates that RudderStack is trying to connect to the warehouse with the information specified in the connection credentials.
- Able to List Schema: This option checks if RudderStack is able to fetch all the schema details using the provided credentials.
- Able to Access RudderStack Schema: This option implies that RudderStack is able to access the `_rudderstack` schema you created by successfully running all the commands in the Creating the RudderStack schema and granting permissions section.
If this check fails, make sure you have created the `_rudderstack` schema and given RudderStack the required permissions to access it. For more information, refer to the Creating the RudderStack schema and granting permissions section.
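To reproduce the three validations yourself while troubleshooting, a rough sketch like the one below may help. The SQL statements chosen for each check are assumptions made for illustration; the guide does not state which queries RudderStack actually runs. The `run_query` argument is a hypothetical callable that you would back with your own Databricks SQL client.

```python
from typing import Callable, Dict

# Check name from the guide, paired with an assumed statement that exercises it.
CHECKS = [
    ("Verifying Connection", "SELECT 1"),
    ("Able to List Schema", "SHOW SCHEMAS"),
    ("Able to Access RudderStack Schema", "SHOW TABLES IN `_rudderstack`"),
]


def run_checks(run_query: Callable[[str], object]) -> Dict[str, bool]:
    """Run the checks in order; once one fails, later checks are marked failed too."""
    results: Dict[str, bool] = {}
    failed = False
    for name, statement in CHECKS:
        if failed:
            results[name] = False
            continue
        try:
            run_query(statement)
            results[name] = True
        except Exception:
            results[name] = False
            failed = True
    return results


if __name__ == "__main__":
    def fake_runner(statement: str) -> list:
        # Stand-in for a real client: simulates a user without schema access.
        if "_rudderstack" in statement:
            raise PermissionError("no access to schema")
        return []

    for check, passed in run_checks(fake_runner).items():
        print(f"{check}: {'ok' if passed else 'failed'}")
```

If only the last check fails, as in the simulated run, the usual cause is a missing schema or a missing grant for the user whose token you entered.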