Google BigQuery reverse ETL source

Send data from Google BigQuery to your entire stack.

Google BigQuery is an industry-leading, fully-managed cloud data warehouse that lets you store and analyze petabytes of data in no time.

RudderStack supports Google BigQuery as a source from which you can ingest data and route it to your desired downstream destinations.

Granting permissions

RudderStack requires you to grant certain user permissions on your BigQuery warehouse to successfully access data from it.

Perform the following steps in the exact order to grant these permissions:

Step 1: Creating a role and granting permissions

  1. Go to the Roles section of Google Cloud Platform dashboard and click CREATE ROLE.
Google Cloud Platform dashboard create role
  1. Fill in the details as shown:
GCP role details
  1. Click ADD PERMISSIONS and add the following permissions individually:
bigquery.datasets.get
bigquery.jobs.create
bigquery.jobs.list
bigquery.tables.create
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.tables.update
bigquery.tables.updateData
  1. After adding the permissions, click CREATE.
BigQuery role permissions

Step 2: Creating a service account and attaching the role to it

  1. Go to Service Accounts and select the project which has the dataset or the table that you want to use.
  2. Click CREATE SERVICE ACCOUNT.
Create service account in GCP
  1. Fill in the Service Account details as shown below, and click CREATE AND CONTINUE:
Service account role details
  1. Under Grant this service account access to project, select the role you created in Step 1: Creating a role and granting permissions section above.
Service account role connection
  1. Click DONE to move to the list of service accounts.
info

Note down the service account ID. You will need this ID while creating the RudderStack schema and granting the required permissions to it.

Service account ID

Step 3: Creating and downloading the JSON key

  1. Click the three dots icon under Actions in the service account that you just created and select Manage keys:
Managing keys in GCP
  1. Click ADD KEY, followed by Create new key:
GCP Adding a new key
  1. Select JSON and click CREATE.
Select Reverse ETL source in RudderStack

A JSON file will be downloaded on your system. This file is required while creating a BigQuery warehouse source in RudderStack, explained further in this document.

Step 4: Creating the RudderStack schema and granting permissions

  1. From your BigQuery SQL workspace, run the following command to create a dedicated schema rudderstack_.
danger
The rudderstack_ schema is used by RudderStack for storing the state of each data sync. This name should not be changed.
create schema rudderstack_;
warning

The rudderstack_ schema is created in the default region of your BigQuery instance.

If the GCP cloud storage bucket used as a staging bucket by RudderStack is in a different region, you will need to adjust the above SQL statement to match that region. For example, to create the schema in the europe-west-3 region, run the following statement:

create schema rudderstack_ OPTIONS (location = "europe-west3");
  1. Grant full access to the rudderstack_ schema for the RudderStack service account you created above. Replace <SERVICE_ACCOUNT_ID> with the service account ID you specified in Step 2: Creating a service account and attaching role to it.
info
The <SERVICE_ACCOUNT_ID> takes the form of name@your-gcp-project.iam.gserviceaccount.com. You can also find it in the client_email key of the service account credentials JSON file downloaded in Step 3: Creating and downloading the JSON key.
GRANT `roles/bigquery.dataOwner`
     ON SCHEMA rudderstack_
     TO "serviceAccount:<SERVICE_ACCOUNT_ID>";

Setting up the BigQuery source in RudderStack

To set up BigQuery as a source in RudderStack, follow these steps:

Naming the source

  1. Log into your RudderStack dashboard.
  2. From the left navigation bar, go to Source > New Source > Reverse ETL. Then, select BigQuery, as shown:
Select Reverse ETL source in RudderStack
  1. Assign a name to your source.

Configuring the connection credentials

  1. Choose the relevant option from Table or Model to use the source to sync data from either a warehouse table or a model.
info
For more information on the difference between the Table and Model options when creating a Reverse ETL source, refer to the FAQ section below.
info
If you have chosen the Model option, skip the next steps and refer to the Schedule settings section directly.
  1. Enter the relevant settings in the Connection Credentials section as listed below:
Add destination in RudderStack
info
If you have already configured a destination in RudderStack, choose the Use Existing Destination option which will take you to the Schema tab in the source settings. To add a new destination from scratch, select the Create New Destination option which will take you to the destination configuration page.

Specifying the data to import

While connecting a destination to your Reverse ETL source, you can use the default JSON mapping or the Visual Data Mapping feature.

info

Based on the option(Table/Model) you chose while setting up the Reverse ETL source, follow the relevant guide for detailed steps:

FAQ

What is the difference between the Table and Model options when creating a Reverse ETL source?

When creating a new Reverse ETL source, you are presented with the following two options from which RudderStack will sync the data:

  • When you choose Table, RudderStack imports all data associated with the specified table during the sync.
  • When you choose Model, RudderStack imports the data by running the query specified in the connected model, during the sync.

What do the three validations under Verifying Credentials imply?

When setting up a Reverse ETL source, once you proceed after entering the connection credentials, you will see the following three validations under the Verifying Credentials option:

These options are explained below:

  • Verifying Connection: This option indicates that RudderStack is trying to connect to the warehouse with the information specified in the connection credentials.
warning
If this option gives an error, it means that one or more fields specified in the connection credentials are incorrect. Verify your credentials in this case.
  • Able to List Schema: This option checks if RudderStack is able to fetch all schema details using the provided credentials.
  • Able to Access RudderStack Schema: This option implies that RudderStack is able to access the rudderstack_ schema that you have created by successfully running all commands in the Creating the RudderStack schema and granting permissions section.
warning
If this option gives an error, verify if you have successfully created the rudderstack_ schema and given RudderStack the required permissions to access it. For more information, refer to Creating the RudderStack schema and granting permissions section.

Questions? Contact us by email or on Slack