Databricks is a data analytics platform that lets you easily integrate with open source libraries. It offers a simple collaborative environment to run interactive and scheduled data analysis workloads.
RudderStack supports Databricks as a source from which you can ingest data and route it to your desired downstream destinations.
You can now ingest data into RudderStack by running queries on your Databricks cluster or SQL warehouse.
Granting permissions
RudderStack requires you to grant certain user permissions on Databricks to successfully access data from it.
Follow the steps listed in the following sections in the exact order to grant these permissions:
From the left panel, go to Collect > Sources > New source > Reverse ETL. Then, select Databricks.
Assign a name to your source and click Continue.
Configuring the connection credentials
Choose either the Table or Model option to sync data from a warehouse table or a model, respectively.
Enter the connection details of your Databricks cluster or SQL warehouse in the Connection Credentials section:
For most use cases, RudderStack recommends using a SQL warehouse over a cluster because SQL warehouses generally cost less and are faster to spin up. Clusters, in contrast, are suited to much larger operations that require more resources.
Host - Enter the server hostname.
Port - Enter the port number.
Path - Enter the HTTP path.
Token - Enter the personal access token.
Catalog - Enter the name of your Unity catalog. See Databricks documentation for more information on getting the catalog details.
Note the following:
See this FAQ for more information on getting the host, port, path, and token for your Databricks cluster.
See this FAQ for more information on getting the host, port, path, and token for your SQL warehouse.
If you’ve already configured Databricks as a source before, your existing credentials will automatically appear under Use Existing Credentials.
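Before entering the five fields above, you may find it useful to sanity-check them locally. The following is a minimal sketch and not part of RudderStack or Databricks: the `DatabricksCredentials` class and `find_problems` function are hypothetical names, all values are placeholders, and the rule that the host carries no `https://` prefix is an assumption based on the field asking for a server hostname.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatabricksCredentials:
    """Hypothetical container mirroring the Connection Credentials fields."""
    host: str     # server hostname
    port: int     # port number
    path: str     # HTTP path of the cluster or SQL warehouse
    token: str    # personal access token
    catalog: str  # Unity catalog name


def find_problems(creds: DatabricksCredentials) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not creds.host.strip():
        problems.append("Host is empty")
    elif "://" in creds.host:
        # Assumption: the field expects a bare hostname, not a full URL.
        problems.append("Host should not include a scheme such as https://")
    if not 1 <= creds.port <= 65535:
        problems.append("Port must be between 1 and 65535")
    for label, value in (
        ("Path", creds.path),
        ("Token", creds.token),
        ("Catalog", creds.catalog),
    ):
        if not value.strip():
            problems.append(f"{label} is empty")
    return problems


# Placeholder values only; substitute the details of your own workspace.
example = DatabricksCredentials(
    host="example.cloud.databricks.com",
    port=443,
    path="/sql/1.0/warehouses/placeholder",
    token="placeholder-token",
    catalog="main",
)
print(find_problems(example))
```

These checks only catch obviously malformed input. Whether the credentials actually work is determined by the Verifying Credentials step in the dashboard.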
Click Continue to proceed.
Schedule settings
Specify the Schedule Settings to schedule the data syncs from your Databricks source.
RudderStack lets you schedule data syncs for your Reverse ETL sources and specify how and when the syncs will run. For more information on the Basic, CRON, and Manual schedule types, refer to the Sync Schedule Settings guide.
After specifying the schedule type and run settings, click Continue to finish the setup.
Databricks is now successfully configured as a source in your RudderStack dashboard. You can further connect this source to your preferred destination by clicking the Add Destination button.
Specifying the data to import
While connecting a destination to your Reverse ETL source, you can use the default JSON mapping or the Visual Data Mapping feature.
Based on the option (Table or Model) you chose while setting up the Reverse ETL source, follow the relevant guide for detailed steps:
What do the three validations under Verifying Credentials imply?
When setting up a Reverse ETL source, after you enter the connection credentials and proceed, you will see the following three validations under the Verifying Credentials option:
These options are explained below:
Make sure your Databricks SQL warehouse/cluster is active when running the validations. Otherwise, the validations might fail.
Verifying Connection: This option indicates that RudderStack is trying to connect to the warehouse with the information specified in the connection credentials.
If this option gives an error, it means that one or more fields specified in the connection credentials are incorrect. Verify your credentials in this case.
Able to List Schema: This option checks if RudderStack is able to fetch all schema details using the provided credentials.
Able to Access RudderStack Schema: This option implies that RudderStack is able to access the _rudderstack schema you have created by successfully running all commands in the Creating the RudderStack schema and granting permissions section.
If this option gives an error, verify if you have successfully created the _rudderstack schema and given RudderStack the required permissions to access it. For more information, refer to the Creating the RudderStack schema and granting permissions section.
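To reproduce the spirit of these three validations yourself, one option is to run a comparable statement for each check. The sketch below is an assumption-heavy illustration: the SQL statements are stand-ins chosen for this example and not the queries RudderStack actually runs, and `run_validations` and `fake_execute` are hypothetical names. The `execute` argument is any callable that runs a SQL statement against your warehouse and raises an exception on failure.

```python
from typing import Callable, Dict

# Assumed stand-ins for the three checks; the real queries are not documented here.
CHECKS = (
    ("Verifying Connection", "SELECT 1"),
    ("Able to List Schema", "SHOW SCHEMAS"),
    ("Able to Access RudderStack Schema", "SHOW TABLES IN _rudderstack"),
)


def run_validations(execute: Callable[[str], object]) -> Dict[str, bool]:
    """Run each stand-in statement and record whether it succeeded."""
    results = {}
    for name, statement in CHECKS:
        try:
            execute(statement)
            results[name] = True
        except Exception:
            results[name] = False
    return results


def fake_execute(statement: str) -> None:
    """Simulates a warehouse where the _rudderstack schema does not exist."""
    if "_rudderstack" in statement:
        raise RuntimeError("schema not found")


for name, ok in run_validations(fake_execute).items():
    print(f"{name}: {'passed' if ok else 'failed'}")
```

With the simulated warehouse, the first two checks pass and the third fails, which mirrors the case where the _rudderstack schema has not been created yet. To try this against a real warehouse, replace `fake_execute` with a function that sends the statement through your own SQL client.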
Does my SQL warehouse/cluster need to be active when running the validations?