Data Graph YAML Reference
Beta
YAML schema reference for defining a data graph with the Rudder CLI, including entities, events, and relationships.
Use this reference alongside the CLI to author, version-control, and sync data graph definitions as code.
File structure
A data graph YAML file has the following top-level structure:
version: "rudder/v1"
kind: "data-graph"
metadata:
  name: "ecommerce-data-graph"
spec:
  id: "ecommerce-data-graph"
  account_id: "<warehouse-account-id>"
  models:
    - ...
Top-level fields
| Field | Type | Description |
|---|---|---|
| version (Required) | String | Schema version. Use rudder/v1. |
| kind (Required) | String | Resource kind. Must be data-graph. |
| metadata.name (Required) | String | Human-readable name for the data graph. |
| spec.id (Required) | String | Unique ID for the data graph. Used as its stable identifier across syncs. |
| spec.account_id (Required) | String | The ID of the warehouse account the data graph reads from. |
| spec.models (Required) | List | List of entity and event models that make up the data graph. See Models for more information. |
Models
The spec.models list contains all the entities and events the data graph exposes to the Audience Builder. Each model points at a warehouse table and optionally declares relationships to other models.
Model fields
| Field | Type | Description |
|---|---|---|
| id (Required) | String | Unique ID for the model within this data graph. Used as the target of relationships (see Relationships). |
| display_name (Required) | String | Name shown in the Audience Builder UI (for example, Customers, Sales). |
| type (Required) | String | Either entity (dimension-style table) or event (timestamped fact table). |
| table (Required) | String | Fully qualified warehouse table name, for example, ECOMMERCE_DB.E_MART.DIM_CUSTOMERS. |
| description | String | Human-readable description of the model. Shown as a tooltip in the builder. |
| primary_id (Required for entities) | String | Column that uniquely identifies a row in the table. Required when type: entity; optional for events. |
| timestamp (Required for events) | String | Column holding the event timestamp. Required when type: event; optional for entities. Used for time-window filtering in the Audience Builder. |
| relationships (Optional) | List | List of relationships this model has to other models. See Relationships for more information. |
Entity vs. event
- Entity: A dimension-like table representing a business object (Customers, Products, Stores). Use type: entity and set primary_id.
- Event: A fact-like table where each row represents something that happened at a point in time (Sales, Customer Interactions, Loyalty Points). Use type: event and set timestamp. Events can be filtered with a time window in the Audience Builder.
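As a rough illustration of the distinction, the fragment below sketches one entity and one event as they might appear under spec.models. The products and loyalty-points models, their table names, and the PRODUCT_KEY and AWARDED_AT columns are hypothetical placeholders chosen for this sketch; substitute the tables and columns from your own warehouse.

```yaml
# Fragment of spec.models. All table and column names here are assumed for illustration.
models:
  # Entity: one row per business object, identified by primary_id.
  - id: "products"
    display_name: "Products"
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_PRODUCTS"         # hypothetical table
    primary_id: "PRODUCT_KEY"                         # hypothetical column
  # Event: one row per occurrence, located in time by timestamp.
  - id: "loyalty-points"
    display_name: "Loyalty Points"
    type: "event"
    table: "ECOMMERCE_DB.E_MART.FACT_LOYALTY_POINTS"  # hypothetical table
    timestamp: "AWARDED_AT"                           # hypothetical column
```

The entity sets primary_id and omits timestamp, while the event does the reverse, which matches the conditional requirements in the model fields table.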
Relationships
Relationships connect two models so marketers can filter one model using conditions on related records (for example, “customers with 3 or more orders”). Relationships are declared on the source model under its relationships list.
Relationship fields
| Field | Type | Description |
|---|---|---|
| id (Required) | String | Unique ID for the relationship within the source model. |
| display_name (Required) | String | Name shown in the Audience Builder UI (for example, Has Sales, Belongs To Account). |
| cardinality (Required) | String | One of one-to-many, many-to-one, or one-to-one. See Current limitations. |
| target (Required) | String | Reference to the target model in the form #data-graph-model:<model-id>. |
| source_join_key (Required) | String | Column on the source model used in the join. |
| target_join_key (Required) | String | Column on the target model used in the join. |
Relationship targets use the #data-graph-model:<model-id> reference format, where <model-id> is the id of another model in the same data graph. For example:
target: "#data-graph-model:sales"
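Put together, a relationship entry on its source model might look like the following sketch. The stores model, its table, and the STORE_KEY column are hypothetical and used only for illustration; the sketch also assumes that a model with id sales is defined elsewhere in the same data graph and exposes a STORE_KEY column.

```yaml
# One entry of spec.models. The stores model and STORE_KEY column are assumed for illustration.
- id: "stores"
  display_name: "Stores"
  type: "entity"
  table: "ECOMMERCE_DB.E_MART.DIM_STORES"   # hypothetical table
  primary_id: "STORE_KEY"                   # hypothetical column
  relationships:
    - id: "store-has-sales"                 # unique within the stores model
      display_name: "Has Sales"
      cardinality: "one-to-many"            # one store row matches many sales rows
      target: "#data-graph-model:sales"     # must match the id of another model
      source_join_key: "STORE_KEY"          # column on stores
      target_join_key: "STORE_KEY"          # column on sales (assumed to exist)
```

The two join keys name columns on different tables, so they do not need to share a name; they only need to hold matching values.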
Complete example
The following example defines a small e-commerce data graph with two entities (Customers, Accounts), one event (Sales), and the relationships between them:
version: "rudder/v1"
kind: "data-graph"
metadata:
  name: "ecommerce-data-graph"
spec:
  id: "ecommerce-data-graph"
  account_id: "<warehouse-account-id>" # RudderStack generates this ID when you connect a warehouse to your RudderStack workspace.
  models:
    # --- Customers (entity) ---
    - id: "customers"
      display_name: "Customers"
      type: "entity"
      table: "ECOMMERCE_DB.E_MART.DIM_CUSTOMERS"
      description: "Customers with demographics and loyalty info"
      primary_id: "CUSTOMER_KEY"
      relationships:
        - id: "customer-has-sales"
          display_name: "Has Sales"
          cardinality: "one-to-many"
          target: "#data-graph-model:sales"
          source_join_key: "CUSTOMER_KEY"
          target_join_key: "CUSTOMER_KEY"
        - id: "customer-belongs-to-account"
          display_name: "Belongs To Account"
          cardinality: "many-to-one"
          target: "#data-graph-model:accounts"
          source_join_key: "ACCOUNT_KEY"
          target_join_key: "ACCOUNT_KEY"
    # --- Accounts (entity) ---
    - id: "accounts"
      display_name: "Accounts"
      type: "entity"
      table: "ECOMMERCE_DB.E_MART.DIM_ACCOUNTS"
      description: "Customer account records for individual, household, and corporate grouping"
      primary_id: "ACCOUNT_KEY"
    # --- Sales (event) ---
    - id: "sales"
      display_name: "Sales"
      type: "event"
      table: "ECOMMERCE_DB.E_MART.FACT_SALES"
      description: "Sales transactions with amounts, status, and store/channel links"
      timestamp: "CREATED_AT"
Sync to your workspace
Once your data graph YAML is ready, use the Rudder CLI to validate and sync it to your workspace:
rudder-cli apply -f data-graph.yaml