Schedule and Trigger Reverse ETL Syncs with Airflow Provider
Schedule and trigger Reverse ETL syncs programmatically with RudderStack’s Airflow provider.
RudderStack’s Airflow Provider lets you programmatically schedule and trigger your Reverse ETL syncs from outside RudderStack and integrate them with your existing Airflow workflows.
For more information on the codebase and sample implementation, see the GitHub Repository.
Prerequisites
To use the Airflow Provider, you must have a working Apache Airflow installation. For more information, see the Airflow documentation.
Follow the steps in the sections below to use the RudderStack Airflow Provider:
Run Airflow
Run Apache Airflow with the following command, which initializes the metadata database, creates an admin user, and starts all Airflow components:
airflow standalone
The Airflow standalone server is not meant for production use. It is highly recommended to use alternative methods to install and run Airflow in a production environment.
Install Airflow Provider
Install the RudderStack Airflow Provider by running the following command:
pip install rudderstack-airflow-provider
Create Airflow connection
To create a new Airflow connection, follow these steps:
In your Airflow dashboard, go to Admin > Connections.
Add a new connection by configuring the following details:
Connection ID: Specify a unique connection name. By default, RudderstackRETLOperator uses the connection named rudderstack_default. If you give your connection a different name, pass that name as a parameter to RudderstackRETLOperator.
Connection Type: Select HTTP from the list.
Host: Set the value for this field depending on your region:
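As an alternative to the dashboard form, Airflow can also read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID>, with the connection ID upper-cased; from Airflow 2.3 onward the value may be a JSON object. The following is a minimal sketch that only builds such a variable for the HTTP connection type and host described above. The host URL is a hypothetical placeholder, and any further fields the provider may expect are not covered here.

```python
import json
import os


def airflow_conn_env_var(conn_id: str, host: str) -> tuple:
    """Return (name, value) for an Airflow HTTP connection stored in an env var.

    Airflow resolves AIRFLOW_CONN_<CONN_ID> with the connection ID upper-cased;
    the JSON value format is accepted from Airflow 2.3 onward.
    """
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps({"conn_type": "http", "host": host})
    return name, value


if __name__ == "__main__":
    # Hypothetical placeholder: substitute the host value for your RudderStack region.
    env_name, env_value = airflow_conn_env_var(
        "rudderstack_default", "https://your-region-host.example"
    )
    # Setting os.environ only affects this process and the processes it starts.
    os.environ[env_name] = env_value
    print(f"export {env_name}='{env_value}'")
```

The variable has to be present in the environment of the Airflow scheduler and workers, so in practice you would export the printed line in the shell that starts Airflow rather than set it from a separate script.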
Next, define a DAG with the tasks your workflow requires.
The following code snippet highlights an Airflow DAG with one task named rs_trigger_sync for the Reverse ETL connection ID 20dQV6yuUDUw31peWA8f7xxgHdN:
For more information on obtaining the connection ID, see the FAQ section below.
Make sure the Airflow scheduler is running in the background. You must also enable the DAG in the Airflow dashboard.
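If you prefer scripting this step, the Airflow CLI command airflow dags unpause <dag_id> enables a DAG in the same way as the dashboard toggle. The sketch below wraps that command in Python; the DAG ID rudderstack_retl_sync is hypothetical and should be replaced with the dag_id from your own DAG file. With dry_run=True it only returns the command without running it.

```python
import shutil
import subprocess
from typing import List


def unpause_dag(dag_id: str, dry_run: bool = False) -> List[str]:
    """Enable a DAG via the Airflow CLI, like switching it on in the dashboard."""
    cmd = ["airflow", "dags", "unpause", dag_id]
    if not dry_run:
        # Fail early with a clear message if the Airflow CLI is not installed.
        if shutil.which("airflow") is None:
            raise RuntimeError("the airflow CLI was not found on PATH")
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical DAG ID: use the dag_id defined in your own DAG file.
    print(unpause_dag("rudderstack_retl_sync", dry_run=True))
```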
To trigger the DAG manually, click the play button on the right of the DAG's row and select Trigger DAG. Note that stopping the DAG does not cancel a sync that is already running.
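Besides the play button, a DAG run can also be started over HTTP, which is useful when the trigger comes from another system. The following is a minimal sketch assuming Airflow 2.x with its stable REST API (POST /api/v1/dags/{dag_id}/dagRuns) and the basic-auth API backend (airflow.api.auth.backend.basic_auth) enabled on the webserver; Airflow 3 uses a different API version and authentication scheme. The base URL, DAG ID, and credentials are hypothetical. The function only builds the request; urllib.request.urlopen would send it.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_trigger_request(
    base_url: str, dag_id: str, username: str, password: str, conf: Optional[dict] = None
) -> urllib.request.Request:
    """Build a POST to Airflow 2's stable REST API that creates a new DAG run."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    # Basic auth only works if the webserver enables the basic-auth API backend.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    # Hypothetical values: replace with your webserver URL, DAG ID, and credentials.
    req = build_trigger_request(
        "http://localhost:8080", "rudderstack_retl_sync", "admin", "<password>"
    )
    # urllib.request.urlopen(req) would send the request; omitted in this sketch.
    print(req.get_method(), req.full_url)
```

As with the play button, the request only starts a DAG run; it does not give you a way to cancel a Reverse ETL sync that is already in progress.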
FAQ
Where can I find the connection ID for my Reverse ETL connection?
The connection ID is a unique identifier for any Reverse ETL connection set up in RudderStack.
To obtain the connection ID, click the destination connected to your Reverse ETL source and go to the Settings tab.