Data Graph YAML Reference

YAML schema reference for defining a Data Graph with the Rudder CLI — entities, events, and relationships.
Available Plans
  • growth
  • enterprise

Use this reference alongside the Rudder CLI to author, version-control, and sync data graph definitions as code.

File structure

A data graph YAML file has the following top-level structure:

version: "rudder/v1"
kind: "data-graph"
metadata:
  name: "ecommerce-data-graph"
spec:
  id: "ecommerce-data-graph"
  account_id: "<warehouse-account-id>"
  models:
    - ...

Top-level fields

| Field | Required | Type | Description |
|---|---|---|---|
| version | Required | String | Schema version. Use rudder/v1. |
| kind | Required | String | Resource kind. Must be data-graph. |
| metadata.name | Required | String | Human-readable name for the data graph. |
| spec.id | Required | String | Unique ID for the data graph. Used as its stable identifier across syncs. |
| spec.account_id | Required | String | The ID of the warehouse account the data graph reads from. |
| spec.models | Required | List | List of entity and event models that make up the data graph. See Models for more information. |

Models

The spec.models list contains all the entities and events the data graph exposes to the Audience Builder. Each model points at a warehouse table and optionally declares relationships to other models.

Model fields

| Field | Required | Type | Description |
|---|---|---|---|
| id | Required | String | Unique ID for the model within this data graph. Used as the target of relationships (see Relationships). |
| display_name | Required | String | Name shown in the Audience Builder UI (for example, Customers, Sales). |
| type | Required | String | Either entity (dimension-style table) or event (timestamped fact table). |
| table | Required | String | Fully qualified warehouse table name, for example, ECOMMERCE_DB.E_MART.DIM_CUSTOMERS. |
| description | Optional | String | Human-readable description of the model. Shown as a tooltip in the builder. |
| primary_id | Required for entities | String | Column that uniquely identifies a row in the table. Optional for events. |
| timestamp | Required for events | String | Column holding the event timestamp. Used for time-window filtering in the Audience Builder. Optional for entities. |
| relationships | Optional | List | List of relationships this model has to other models. See Relationships for more information. |

Entity vs. event

  • Entity: A dimension-like table representing a business object (Customers, Products, Stores). Use type: entity and set primary_id.
  • Event: A fact-like table where each row represents something that happened at a point in time (Sales, Customer Interactions, Loyalty Points). Use type: event and set timestamp. Events can be filtered with a time window in the Audience Builder.
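
As a minimal sketch of the distinction, the fragment below places one entity and one event side by side under spec.models. The IDs, table names, and column names are borrowed from this page's e-commerce example and are illustrative only; substitute your own warehouse objects.

```yaml
# Fragment of spec.models (illustrative values from the e-commerce example).
models:
  # Entity: one row per business object, identified by primary_id.
  - id: "customers"
    display_name: "Customers"
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_CUSTOMERS"
    primary_id: "CUSTOMER_KEY"

  # Event: one row per occurrence, located in time by timestamp.
  - id: "sales"
    display_name: "Sales"
    type: "event"
    table: "ECOMMERCE_DB.E_MART.FACT_SALES"
    timestamp: "CREATED_AT"
```

The entity omits timestamp and the event omits primary_id, since each field is optional for the other model type.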

Relationships

Relationships connect two models so marketers can filter one model using conditions on related records (for example, “customers with 3 or more orders”). Relationships are declared on the source model under its relationships list.

Relationship fields

| Field | Required | Type | Description |
|---|---|---|---|
| id | Required | String | Unique ID for the relationship within the source model. |
| display_name | Required | String | Name shown in the Audience Builder UI (for example, Has Sales, Belongs To Account). |
| cardinality | Required | String | One of one-to-many, many-to-one, or one-to-one. See Current limitations. |
| target | Required | String | Reference to the target model in the form #data-graph-model:<model-id>. |
| source_join_key | Required | String | Column on the source model used in the join. |
| target_join_key | Required | String | Column on the target model used in the join. |
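
To show how these fields fit together, here is an annotated sketch of a single relationship declared on a customers model. The values reuse this page's e-commerce example. The final comment about the join condition is an assumed reading of the two join key descriptions, not a statement of the exact SQL that RudderStack generates.

```yaml
# Fragment of one model in spec.models (illustrative values).
- id: "customers"
  # ...other model fields omitted...
  relationships:
    - id: "customer-has-sales"           # unique among this model's relationships
      display_name: "Has Sales"          # label shown in the Audience Builder
      cardinality: "one-to-many"         # one customer row relates to many sales rows
      target: "#data-graph-model:sales"  # reference to the model whose id is "sales"
      source_join_key: "CUSTOMER_KEY"    # column on customers (the source model)
      target_join_key: "CUSTOMER_KEY"    # column on sales (the target model)
      # Assumed reading: rows match when
      # customers.CUSTOMER_KEY = sales.CUSTOMER_KEY
```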

Target reference format

Relationship targets use the #data-graph-model:<model-id> reference format, where <model-id> is the id of another model in the same data graph. For example:

target: "#data-graph-model:sales"
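
The sketch below illustrates how a reference resolves: the part after the colon matches the id of the target model, not its display_name or table. The two models are taken from this page's e-commerce example and are illustrative only.

```yaml
# Fragment of spec.models (illustrative values).
models:
  - id: "accounts"                  # the <model-id> that references point to
    display_name: "Accounts"        # UI label; not used in references
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_ACCOUNTS"
    primary_id: "ACCOUNT_KEY"

  - id: "customers"
    display_name: "Customers"
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_CUSTOMERS"
    primary_id: "CUSTOMER_KEY"
    relationships:
      - id: "customer-belongs-to-account"
        display_name: "Belongs To Account"
        cardinality: "many-to-one"
        target: "#data-graph-model:accounts"  # matches id: "accounts" above
        source_join_key: "ACCOUNT_KEY"
        target_join_key: "ACCOUNT_KEY"
```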

Complete example

The following example defines a small e-commerce data graph with two entities (Customers, Accounts), one event (Sales), and the relationships between them:

version: "rudder/v1"
kind: "data-graph"
metadata:
  name: "ecommerce-data-graph"
spec:
  id: "ecommerce-data-graph"
  account_id: "<warehouse-account-id>" # RudderStack generates this ID when you connect a warehouse to your RudderStack workspace.
  models:
    # --- Customers (entity) ---
    - id: "customers"
      display_name: "Customers"
      type: "entity"
      table: "ECOMMERCE_DB.E_MART.DIM_CUSTOMERS"
      description: "Customers with demographics and loyalty info"
      primary_id: "CUSTOMER_KEY"
      relationships:
        - id: "customer-has-sales"
          display_name: "Has Sales"
          cardinality: "one-to-many"
          target: "#data-graph-model:sales"
          source_join_key: "CUSTOMER_KEY"
          target_join_key: "CUSTOMER_KEY"
        - id: "customer-belongs-to-account"
          display_name: "Belongs To Account"
          cardinality: "many-to-one"
          target: "#data-graph-model:accounts"
          source_join_key: "ACCOUNT_KEY"
          target_join_key: "ACCOUNT_KEY"

    # --- Accounts (entity) ---
    - id: "accounts"
      display_name: "Accounts"
      type: "entity"
      table: "ECOMMERCE_DB.E_MART.DIM_ACCOUNTS"
      description: "Customer account records for individual, household, and corporate grouping"
      primary_id: "ACCOUNT_KEY"

    # --- Sales (event) ---
    - id: "sales"
      display_name: "Sales"
      type: "event"
      table: "ECOMMERCE_DB.E_MART.FACT_SALES"
      description: "Sales transactions with amounts, status, and store/channel links"
      timestamp: "CREATED_AT"

Sync to your workspace

Once your data graph YAML is ready, use the Rudder CLI to validate and sync it to your workspace:

rudder-cli apply -f data-graph.yaml
