BigQuery Stream

Send your event data from RudderStack to BigQuery via Google’s streaming API.

Google BigQuery offers a streaming API that lets you insert data into BigQuery in near real time, making it available for analysis almost immediately.

Find the open source transformer code for this destination in the GitHub repository.

Prerequisites

Before you set up BigQuery Stream as a destination in the RudderStack dashboard, make sure to obtain the required access to write to BigQuery.

info
RudderStack recommends creating a service account in your Google Cloud Console with the BigQuery Data Editor role.

Connection compatibility

Destination Information
  • Status: Generally Available
  • Supported sources: Android, iOS, Web, Unity, AMP, Cloud, React Native, Flutter, Cordova, Warehouse, Shopify
  • Refer to it as BQSTREAM in the Integrations object.
Connection Modes
| Source | Cloud mode | Device mode | Hybrid mode |
| --- | --- | --- | --- |
| AMP | Supported | Not supported | Not supported |
| Android | Supported | Not supported | Not supported |
| Cloud | Supported | Not supported | Not supported |
| Cordova | Supported | Not supported | Not supported |
| Flutter | Supported | Not supported | Not supported |
| iOS | Supported | Not supported | Not supported |
| React Native | Supported | Not supported | Not supported |
| Shopify | Supported | Not supported | Not supported |
| Unity | Supported | Not supported | Not supported |
| Warehouse | Supported | Not supported | Not supported |
| Web | Supported | Not supported | Not supported |
Supported Message Types
| Source | Identify | Page | Track | Screen | Group | Alias | Record |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Supported sources (cloud mode) | Not supported | Not supported | Supported | Not supported | Not supported | Not supported | Not supported |

Get started

  1. In your RudderStack dashboard, set up a source.
  2. Go to the Overview tab of your source and select Add Destination > Create new destination.
Add new destination in RudderStack dashboard
  3. Select BigQuery Stream from the list of destinations. Then, click Continue.

Connection settings

| Setting | Description |
| --- | --- |
| Project ID | Enter your BigQuery project ID. |
| Dataset ID | Enter the ID of the dataset associated with the above project. |
| Table ID | Enter the ID of the table into which you want to stream the event data. |
| Insert ID | Optional. Enter the insertId Google uses to deduplicate the data sent to BigQuery. See the Deduplicate data section below for more information. |
| Credentials | Enter the contents of the credentials JSON you downloaded after creating your service account. |

The following screenshot shows the fields associated with the Project ID, Dataset ID, and Table ID settings listed above:

BigQuery Stream connection settings

Send events to BigQuery Stream

RudderStack supports sending only track events to BigQuery Stream.

warning

Note the following:

  • Make sure your track event payload format matches the table schema corresponding to the Table ID specified in the dashboard settings.
  • RudderStack does not support the templateSupportSuffix feature which creates a table schema during a streaming insert action.

Suppose you want to stream the events from your web source to BigQuery and the table schema in your BigQuery dataset is as follows:

BigQuery table schema

To successfully stream the events, the event tracked from your JavaScript SDK should look like the following:

rudderanalytics.track("event", {
  productId: 10,
  productName: "Product-10",
  count: 12
});

Note that the track properties in the above payload match the fields specified in your table schema. Once streamed, you can view this event in your BigQuery console by running the following SQL command:

BigQuery result

Deduplicate data

Google leverages the insertId to deduplicate the data sent to BigQuery. insertId is essentially an event property that uniquely identifies an event.

warning
RudderStack currently supports only numeric or string values as insertId.

For more information on the deduplication process in BigQuery, refer to the BigQuery documentation.
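
As background, BigQuery's legacy streaming API (`tabledata.insertAll`) accepts an optional insertId alongside each row, which is what RudderStack populates from the configured event property. The following request-body sketch (using the example event values from this page) shows where it sits:

```json
{
  "rows": [
    {
      "insertId": "212",
      "json": {
        "productId": 212,
        "productName": "my product",
        "count": 24
      }
    }
  ]
}
```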

Use case

Consider the following table schema:

BigQuery table schema

When sending an Insert Product event to BigQuery, you can use the productId field to uniquely identify the product. Upon setting productId as the insertId, BigQuery uses it to deduplicate the data.

Configure insertId dynamically

To configure insertId dynamically via the event payload, set it to the name of the column in your schema (which should also be a key in the event's properties object) that uniquely identifies the event.

Consider the following schema:

BigQuery table schema

Suppose you have a dynamic configuration like {{ message.uniqueId || "productId" }} for the above schema. There are three cases to consider here:

Case 1: Unique ID is sent as a value which is not a key in the event properties

Consider the following payload:

{
  "properties": {
    "productId": 212,
    "productName": "my product",
    "count": 24
  },
  ...,
  "uniqueId": <some_value>,
  ...
}

In this case, deduplication is not applicable, as <some_value> is not a key in the event's properties object.

Case 2: Unique ID is sent as a value which is a key in the event properties

Consider the following payload:

{
  "properties": {
    "productId": 212,
    "productName": "my product",
    "count": 24
  },
  ...,
  "uniqueId": "productId",
  ...
}

In this case, deduplication is applicable as RudderStack sends the productId value (212) as the insertId to Google.

Case 3: Unique ID is not sent in the event payload

Consider the following payload:

{
  "properties": {
    "productId": 212,
    "productName": "my product",
    "count": 24
  },
  ...
}

In this case, deduplication is applicable as RudderStack sends the productId value (212) as the insertId to Google.

If you use the dynamic destination configuration for insertId by passing a random value (for example, 1234) in the above payload, deduplication will not be applicable, as the properties object does not contain 1234 as a key.
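
The three cases above can be sketched as a small resolution function. This is an illustrative model of the `{{ message.uniqueId || "productId" }}` configuration, not RudderStack's actual implementation:

```javascript
// Illustrative model of resolving a dynamic insertId configuration
// such as {{ message.uniqueId || "productId" }} (not RudderStack internals).
function resolveInsertId(message, fallbackColumn) {
  // The configured expression picks message.uniqueId when present,
  // otherwise it falls back to the literal column name.
  const column = message.uniqueId || fallbackColumn;
  const value = (message.properties || {})[column];
  // Deduplication applies only when the resolved column is an actual
  // key in properties holding a numeric or string value.
  if (typeof value === "number" || typeof value === "string") {
    return String(value);
  }
  return undefined; // no insertId sent; deduplication not applicable
}

const properties = { productId: 212, productName: "my product", count: 24 };

resolveInsertId({ properties, uniqueId: "someRandomValue" }, "productId"); // undefined (case 1)
resolveInsertId({ properties, uniqueId: "productId" }, "productId");       // "212" (case 2)
resolveInsertId({ properties }, "productId");                              // "212" (case 3)
```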

Create a service account

To create a service account in your Google Cloud Console, follow these steps:

  1. In the left sidebar of your Google Cloud Console, go to APIs & Services > Credentials.
  2. Click CREATE CREDENTIALS > Service account:
Service account under Create Credentials
  3. Enter the service account details and click CREATE AND CONTINUE.
  4. In the Select a role field, search for and select the BigQuery Data Editor role, then click CONTINUE.
BigQuery Data Editor role
  5. Click DONE to finish the setup.
  6. Next, you need the service account credentials JSON that RudderStack requires to send data to BigQuery. To obtain it, go to your service account.
Service account
  7. Go to KEYS > ADD KEY > Create new key.
  8. Select JSON as the Key type and click CREATE.
Service account type

Your JSON key will be automatically downloaded. Copy and paste the contents of this JSON key in the Credentials field while configuring BigQuery Stream as a destination in RudderStack.
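
For reference, a downloaded service account key is a JSON document of roughly the following shape (values redacted here); paste the entire file's contents into the Credentials field:

```json
{
  "type": "service_account",
  "project_id": "<your-project-id>",
  "private_key_id": "<key-id>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "<service-account-name>@<your-project-id>.iam.gserviceaccount.com",
  "client_id": "<client-id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```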

Considerations

Note the following before sending your event data to the BigQuery Stream destination:

  • You must ensure that the instrumented event aligns with the BigQuery table schema. You can leverage Transformations to modify the event.

  • For fields defined as JSON and RECORD types in the BigQuery table, there are additional aspects to consider:

    • JSON: For fields defined as JSON, the element should be stringified before sending to BigQuery — you can leverage Transformations to do this.
    • RECORD: For fields defined as RECORD, the JSON object can be sent as-is. However, it should only include the fields explicitly defined within the RECORD schema.
  • During ingestion, RudderStack does not fetch or validate the schema — the payload is forwarded as received.
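
As an example of the JSON-type consideration above, a user Transformation can stringify an object property before the event is streamed. The property name `specifications` below is a hypothetical JSON-typed column; in the Transformations editor the function is declared with `export`:

```javascript
// Sketch of a RudderStack user Transformation that stringifies an
// object property destined for a JSON-typed BigQuery column.
// `specifications` is a hypothetical column name used for illustration.
function transformEvent(event, metadata) {
  const props = event.properties || {};
  if (props.specifications && typeof props.specifications === "object") {
    props.specifications = JSON.stringify(props.specifications);
  }
  return event;
}

// Example: the nested object becomes a JSON string, matching the column type.
const transformed = transformEvent({
  type: "track",
  event: "Insert Product",
  properties: { productId: 212, specifications: { color: "red" } }
});
console.log(transformed.properties.specifications); // '{"color":"red"}'
```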

Troubleshooting

See the BigQuery documentation for troubleshooting the different errors you might encounter while sending your event data to BigQuery Stream.

FAQ

How much time does it take for RudderStack (client) to see the new DDL changes?

RudderStack doesn’t fetch the schema. However, when the DDL of a BigQuery table is updated, it may take a few seconds for those changes to propagate in BigQuery.

During this window, clients may still encounter failures until the updated schema is fully applied and visible to the client.



Questions? Contact us by email or on Slack