JSON Column Support

Send event data that is not defined by a fixed schema.

With RudderStack’s JSON column support feature you can:

  • Ingest semi-structured event data not defined by a fixed schema.
  • Directly store a nested event payload in your warehouse columns without worrying about the length limit.
info

Note that:

How to use JSON column feature

You can use the JSON column feature using any of the following approaches:

1. At event level

You can use the SDK’s integrations: options parameter to specify the JSON columns at the event level. Follow these steps:

  1. Configure the SDK to load the options parameter by default. See the following SDK guides for more information:
  2. Include the column path from the event payload in the <destination>.options.jsonPaths list.
info

Note the following:

  • Replace <destination> with the destination-specific object name. See Warehouse names in integrations object for more information.
  • The column path from the event payload must be in the following format: <event_type>.<full_path_of_attribute>`. See Use case for more information.

Use case

Consider the following event payload:

{
  "type": "track",
  "context": {
    "ctestMap": {
      "cnestedMap": {
        "dynamic_property_1": "value_1",
        "dynamic_property_2": "value_2",
        "dynamic_property_n": "value_n"
      }
    }
  },
  "properties": {
    "testMap": {
      "nestedMap": {
          "dynamic_property_1": "value_1",
          "dynamic_property_2": "value_2",
          "dynamic_property_n": "value_n"
      }
    }
  }
}

The following example shows how to use the integrationOpts parameter of the JavaScript SDK to declare the JSON columns:

"integrations": {
  "All": true,
  "SNOWFLAKE": { // Use SNOWFLAKE for Snowflake, RS for Redshift, BQ for BigQuery, POSTGRES for PostgreSQL.
    "options": {
      "jsonPaths": ["track.properties.testMap.nestedMap", "track.context.ctestMap.cnestedMap"]
    }
  }
}

Here, nestedMap and cnestedMap are declared as JSON columns.

The resulting output in the warehouse is shown:

[{
  "data": {
    "anonymous_id": "e6ab2c5e-2cda-44a9-a962-e2f67df78bca",
    ...

    "context_app_name": "RudderLabs JavaScript SDK",
    "context_app_namespace": "com.rudderlabs.javascript",
    "context_app_version": "1.0.5",
    "context_ctest_map_cnested_map": "{\"n1\":\"context nested prop 1\"}",
    "context_ip": "0.0.0.0",
    "context_library_name": "RudderLabs JavaScript SDK",
    ...

    "test_map_nested_map": "{\"n1\":\"nested prop 1\"}",
    "timestamp": "2020-01-24T06:29:02.403Z",
    "user_id": "9bb5d4c2-a7aa-4a36-9efb-dd2b1aec5d33"
  },
  "metadata": {
    "columns": {
      "anonymous_id": "string",
      "context_ctest_map_cnested_map": "json",
      ...

      "id": "string",
      "original_timestamp": "datetime",
      "received_at": "datetime",
      "revenue": "int",
      "sent_at": "datetime",
      "test_map_nested_map": "json",
      "timestamp": "datetime",
      "user_id": "string",
      "uuid_ts": "datetime"
    },
    "receivedAt": "2020-01-24T11:59:02.403+05:30",
    "table": "tracks"
  }
}]

2. In transformation

You can also declare the JSON columns in a transformation.

info
You can override the JSON columns declared at the event level with the JSON columns declared using a transformation.

The following transformation applies the JSON column to only product_clicked type of track events coming from a particular sourceId:

export function transformEvent(event, metadata) {
  const meta = metadata(event);
  if (event.event === 'product_clicked' && meta.sourceId === "source_id" && meta.destinationType === "SNOWFLAKE") { // ex: BQ, POSTGRES, RS
    if (!event.integrations) event.integrations = {};
    event.integrations["SNOWFLAKE"] = {
      options: {
        jsonPaths: ["track.properties.testMap.nestedMap", "track.context.ctestMap.cnestedMap"]
      }
    }
  }
  return event;
}

As explained in the above use case, to declare nestedMap and cnestedMap as JSON columns, add track.properties.testMap.nestedMap and track.context.ctestMap.cnestedMap in the above jsonPaths list.

3. While configuring warehouse destination

While configuring your warehouse destination in RudderStack, pass the required JSON column paths using the dot notation in the JSON columns field. You can specify multiple JSON columns separated by a comma.

info
This setting applies to all track events sent to the warehouse destination.
JSON column setting in RudderStack dashboard

Warehouse names in integrations object

Use the following object names for the supported warehouse destinations while declaring the JSON columns:

Warehouse destinationObject name
Amazon RedshiftRS
Google BigQueryBQ
PostgreSQLPOSTGRES
SnowflakeSNOWFLAKE

Semi-structured data usage and limitations

For more information on working with semi-structured data and the associated limitations in different warehouses, see the following guides:

Warehouse
Data typeReference
RedshiftSUPERDocumentation
BigQuerySTRING

(More information on how RudderStack treats the JSON columns as strings in this FAQ)
Documentation
PostgreSQLJSONBDocumentation
SnowflakeVARIANTDocumentation

FAQ

Which events are supported in the JSON column feature?

The JSON column feature supports all the event types (identify, track, page, screen, group) and their associated attributes/properties. It also supports data from the Cloud Extract sources.

Which data types are supported in the JSON column feature?

The JSON data type includes String, Integers, Float, Arrays, Booleans, and Maps.

How can I use the JSON column feature in BigQuery?

Google BigQuery has released JSON support only as a preview feature. Hence, RudderStack treats the JSON columns as strings and inserts the JSON strings as values. You can then query the semi-structured data using the JSON functions.

For more information, see BigQuery documentation.

How does RudderStack determine the column data type?

RudderStack determines the column data type based on its value in the first event (during the first upload sync).

For example, suppose column_x is received with the value as 1. RudderStack then sets the data type of this column as int in the event table.

Can I change an existing data type for a column?

You can change the column data type in the warehouse at any point. However, RudderStack applies the changes to the events only from the next sync.

To set your preferred data type for a particular column:

  1. Create a column in the warehouse with a placeholder name and the required data type.
  2. Cast the data from the original column and load it into the placeholder column.
  3. Drop the original column.
  4. Rename the placeholder column to the original column name.
warning

Note that:

  • During steps 3 and 4, the tables will be in a locked state. This might impact realtime data uploads/syncs.
  • If the above steps take too long to complete, you can halt the warehouse operations in the interim. For more information, see this FAQ.

Questions? Contact us by email or on Slack