Data Graph YAML Reference
Private Beta
This reference documents the YAML schema for defining a data graph with the Rudder CLI, including entities, events, and relationships. Use it alongside the CLI to author, version-control, and sync data graph definitions as code.
File structure
A data graph YAML file has the following top-level structure:
```yaml
version: "rudder/v1"
kind: "data-graph"
metadata:
  name: "ecommerce-data-graph"
spec:
  id: "ecommerce-data-graph"
  account_id: "<warehouse-account-id>"
  models:
    - ...
```
Top-level fields
| Field | Type | Description |
|---|---|---|
| version (required) | String | Schema version. Use rudder/v1. |
| kind (required) | String | Resource kind. Must be data-graph. |
| metadata.name (required) | String | Human-readable name for the data graph. |
| spec.id (required) | String | Unique ID for the data graph. Used as its stable identifier across syncs. |
| spec.account_id (required) | String | The ID of the warehouse account the data graph reads from. |
| spec.models (required) | List | List of entity and event models that make up the data graph. See Models for more information. |
Models
The spec.models list contains all the entities and events the data graph exposes to the Audience Builder. Each model points at a warehouse table and optionally declares relationships to other models.
Model fields
| Field | Type | Description |
|---|---|---|
| id (required) | String | Unique ID for the model within this data graph. Used as the target of relationships (see Relationships). |
| display_name (required) | String | Name shown in the Audience Builder UI (for example, Customers, Sales). |
| type (required) | String | Either entity (dimension-style table) or event (timestamped fact table). |
| table (required) | String | Fully qualified warehouse table name, for example, ECOMMERCE_DB.E_MART.DIM_CUSTOMERS. |
| description | String | Human-readable description of the model. Shown as a tooltip in the builder. |
| primary_id (required for entities) | String | Column that uniquely identifies a row in the table. Optional for events. |
| timestamp (required for events) | String | Column holding the event timestamp. Used for time-window filtering in the Audience Builder. Optional for entities. |
| relationships (optional) | List | List of relationships this model has to other models. See Relationships for more information. |
Entity vs. event
- Entity: A dimension-like table representing a business object (Customers, Products, Stores). Use type: entity and set primary_id.
- Event: A fact-like table where each row represents something that happened at a point in time (Sales, Customer Interactions, Loyalty Points). Use type: event and set timestamp. Events can be filtered with a time window in the Audience Builder.
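As a minimal sketch of the two model types side by side, the fragment below sets only the fields that are required for each type. The model IDs, the DIM_PRODUCTS and FACT_INTERACTIONS tables, and the PRODUCT_KEY and OCCURRED_AT columns are hypothetical placeholders chosen for illustration; substitute the tables and columns from your own warehouse.

```yaml
# Fragment of spec.models. All table and column names are illustrative assumptions.
models:
  # Entity: one row per business object, identified by primary_id.
  - id: "products"
    display_name: "Products"
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_PRODUCTS"       # hypothetical table
    primary_id: "PRODUCT_KEY"                        # hypothetical column
  # Event: one row per occurrence, positioned in time by timestamp.
  - id: "interactions"
    display_name: "Customer Interactions"
    type: "event"
    table: "ECOMMERCE_DB.E_MART.FACT_INTERACTIONS"  # hypothetical table
    timestamp: "OCCURRED_AT"                         # hypothetical column
```

The event model omits primary_id because that field is optional for events, and the entity model omits timestamp for the same reason.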
Relationships
Relationships connect two models so marketers can filter one model using conditions on related records (for example, “customers with 3 or more orders”). Relationships are declared on the source model under its relationships list.
Relationship fields
| Field | Type | Description |
|---|---|---|
| id (required) | String | Unique ID for the relationship within the source model. |
| display_name (required) | String | Name shown in the Audience Builder UI (for example, Has Sales, Belongs To Account). |
| cardinality (required) | String | One of one-to-many, many-to-one, or one-to-one. See Current limitations. |
| target (required) | String | Reference to the target model in the form #data-graph-model:<model-id>. |
| source_join_key (required) | String | Column on the source model used in the join. |
| target_join_key (required) | String | Column on the target model used in the join. |
Relationship targets use the #data-graph-model:<model-id> reference format, where <model-id> is the id of another model in the same data graph. For example:
```yaml
target: "#data-graph-model:sales"
```
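To show how the reference resolves in context, the sketch below declares a relationship on a hypothetical stores entity that targets a sales event. The stores model, the DIM_STORES table, and the STORE_KEY column on both tables are assumptions made for illustration. The part after #data-graph-model: matches the id of the target model, and each join key names a column on its own side of the join.

```yaml
# Fragment of spec.models. The stores model and STORE_KEY columns are illustrative assumptions.
models:
  - id: "stores"                               # source model
    display_name: "Stores"
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_STORES"    # hypothetical table
    primary_id: "STORE_KEY"
    relationships:
      - id: "store-has-sales"
        display_name: "Has Sales"
        cardinality: "one-to-many"             # one store row relates to many sales rows
        target: "#data-graph-model:sales"      # points to the model whose id is "sales"
        source_join_key: "STORE_KEY"           # column on the source table (DIM_STORES)
        target_join_key: "STORE_KEY"           # column on the target table (assumed to exist on FACT_SALES)
  - id: "sales"                                # target model
    display_name: "Sales"
    type: "event"
    table: "ECOMMERCE_DB.E_MART.FACT_SALES"
    timestamp: "CREATED_AT"
```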
Complete example
The following example defines a small e-commerce data graph with two entities (Customers, Accounts), one event (Sales), and the relationships between them:
```yaml
version: "rudder/v1"
kind: "data-graph"
metadata:
  name: "ecommerce-data-graph"
spec:
  id: "ecommerce-data-graph"
  account_id: "<warehouse-account-id>" # RudderStack generates this ID when you connect a warehouse to your RudderStack workspace.
  models:
    # --- Customers (entity) ---
    - id: "customers"
      display_name: "Customers"
      type: "entity"
      table: "ECOMMERCE_DB.E_MART.DIM_CUSTOMERS"
      description: "Customers with demographics and loyalty info"
      primary_id: "CUSTOMER_KEY"
      relationships:
        - id: "customer-has-sales"
          display_name: "Has Sales"
          cardinality: "one-to-many"
          target: "#data-graph-model:sales"
          source_join_key: "CUSTOMER_KEY"
          target_join_key: "CUSTOMER_KEY"
        - id: "customer-belongs-to-account"
          display_name: "Belongs To Account"
          cardinality: "many-to-one"
          target: "#data-graph-model:accounts"
          source_join_key: "ACCOUNT_KEY"
          target_join_key: "ACCOUNT_KEY"
    # --- Accounts (entity) ---
    - id: "accounts"
      display_name: "Accounts"
      type: "entity"
      table: "ECOMMERCE_DB.E_MART.DIM_ACCOUNTS"
      description: "Customer account records for individual, household, and corporate grouping"
      primary_id: "ACCOUNT_KEY"
    # --- Sales (event) ---
    - id: "sales"
      display_name: "Sales"
      type: "event"
      table: "ECOMMERCE_DB.E_MART.FACT_SALES"
      description: "Sales transactions with amounts, status, and store/channel links"
      timestamp: "CREATED_AT"
```
Validate the data graph
Validate the data graph YAML file with the validate command before syncing it to your workspace:

```shell
rudder-cli validate -l data-graph.yaml
```

The command checks the data graph definition and reports any errors or warnings.
Sync to your workspace
Once your data graph YAML passes validation, use the apply command to sync it to your workspace:

```shell
rudder-cli apply -l data-graph.yaml
```