Databricks is a data analytics platform that lets you easily integrate with open source libraries. It offers a simple collaborative environment to run interactive and scheduled data analysis workloads.
RudderStack supports Databricks as a source from which you can ingest data and route it to your desired downstream destinations.
You can now ingest data into RudderStack by running queries on your Databricks cluster or SQL warehouse.
RudderStack requires you to grant certain user permissions on Databricks to successfully access data from it.
Follow the steps in the following sections, in the order given, to grant these permissions:
From the left panel, go to Collect > Sources > New source > Reverse ETL. Then, select Databricks.
Assign a name to your source and click Continue.
Configuring the connection credentials
Choose the Table or Model option to sync data from a warehouse table or a model, respectively.
For most use cases, RudderStack recommends using a SQL warehouse over a cluster, as SQL warehouses generally cost less and are faster to spin up. Clusters, in contrast, are suited to much larger operations that require more resources.
Enter the connection details of your Databricks cluster or SQL warehouse in the Connection Credentials section:
Host - Enter the server hostname.
Port - Enter the port number.
Path - Enter the HTTP path.
Token - Enter the personal access token.
Catalog - Enter the name of your Unity catalog. See Databricks documentation for more information on getting the catalog details.
Note the following:
See this FAQ for more information on getting the host, port, path, and token for your Databricks cluster.
See this FAQ for more information on getting the host, port, path, and token for your SQL warehouse.
If you’ve already configured Databricks as a source before, your existing credentials will automatically appear under Use Existing Credentials.
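Before pasting the values into the form, it can help to sanity-check them locally. The following is a minimal sketch, not part of RudderStack or Databricks: the `DatabricksCredentials` class and the `validate` function are hypothetical names introduced here for illustration, and the checks are assumptions about what well-formed values usually look like (a bare hostname, a valid TCP port, a non-empty HTTP path and token). All values in the usage example are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatabricksCredentials:
    """Hypothetical container mirroring the five Connection Credentials fields."""
    host: str      # server hostname
    port: int      # port number
    path: str      # HTTP path of the cluster or SQL warehouse
    token: str     # personal access token
    catalog: str   # Unity catalog name (not validated in this sketch)


def validate(creds: DatabricksCredentials) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    # Assumption: the Host field expects a bare hostname, not a full URL.
    if not creds.host or "://" in creds.host or "/" in creds.host:
        problems.append("host should be a bare server hostname, without a scheme or path")
    # Any valid TCP port is accepted here.
    if not 1 <= creds.port <= 65535:
        problems.append("port must be between 1 and 65535")
    # A path made only of slashes carries no information.
    if not creds.path.strip("/"):
        problems.append("path (HTTP path) must not be empty")
    if not creds.token:
        problems.append("token (personal access token) must not be empty")
    return problems


if __name__ == "__main__":
    # Placeholder values only; replace them with the details from your workspace.
    creds = DatabricksCredentials(
        host="example.cloud.databricks.com",
        port=443,
        path="/sql/1.0/warehouses/placeholder-id",
        token="placeholder-token",
        catalog="placeholder_catalog",
    )
    for problem in validate(creds):
        print(problem)
```

A check like this only catches formatting mistakes, such as a hostname pasted with `https://` in front of it. It does not confirm that the token is valid or that the cluster or SQL warehouse is reachable.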
Click Continue to proceed.
Specify the Schedule Settings to schedule the data syncs from your Databricks source.
RudderStack lets you schedule data syncs for your Reverse ETL sources and specify how and when the syncs will run. For more information on the Basic, CRON, and Manual schedule types, refer to the Sync Schedule Settings guide.
After specifying the schedule type and run settings, click Continue to finish the setup.
Databricks is now configured as a source in your RudderStack dashboard. To connect this source to your preferred destination, click the Add Destination button.
Specifying the data to import
While connecting a destination to your Reverse ETL source, you can use the default JSON mapping or the Visual Data Mapping feature.
Based on the option (Table or Model) you chose while setting up the Reverse ETL source, follow the relevant guide for detailed steps: