SQL Model Resources YAML Reference Alpha

Complete reference for defining your SQL model resources using YAML configuration files.
Available Plans
  • free
  • starter
  • growth
  • enterprise

This guide serves as a detailed reference for the CLI project YAML files that contain definitions of your Reverse ETL SQL model resources.

Overview

In the context of the Rudder CLI (rudder-cli) tool, SQL model resources are defined using YAML files that specify the configuration for your Reverse ETL SQL model sources. These YAML files enable you to manage your SQL models through a Git-based workflow with version control and collaboration features.

The location and naming of these YAML files is flexible, as you can store the YAML files anywhere within the project’s root directory or subdirectories.

success

You can also group multiple SQL model resources in the same file, allowing structures that can best serve your project’s requirements.

The Rudder CLI tool processes all valid YAML files within the project structure to recognize the defined resources.

The following sections detail the specific YAML formats and parameter definitions for SQL model resources.

SQL model resources

You can define one or more SQL model resources in the YAML file by setting kind to retl-source-sql-model.

The spec parameter of the YAML file has the following structure:

PropertyTypeDescription
id
Required
StringUnique identifier for the SQL model resource within the project. This parameter must be unique across all SQL model resources.
display_name
Required
StringDisplay name for the SQL model resource as shown in the RudderStack dashboard.
primary_key
Required
StringColumn name to use as the primary key for the model. This column must exist in your SQL query results.
source_definition
Required
StringWarehouse type identifier.

See Supported warehouse types for values that you can use.
enabled
Required
BooleanWhether the SQL model resource is active. Set to true to enable the resource.
account_id
Required
StringThe account ID for the data warehouse connection. You can obtain this using ./rudder-cli workspace accounts list.
descriptionStringOptional description for the SQL model resource.
sqlStringInline SQL query definition.

Use the YAML literal block scalar `
fileStringPath to an external SQL file containing the query. Path is relative to the YAML file location.
warning
You must use exactly one of sql or file — they are mutually exclusive. Use sql for inline queries or file to reference external SQL files.

Supported warehouse types

The source_definition field supports the following data warehouse types:

Warehousesource_definition value
PostgreSQLpostgres
MySQLmysql
Snowflakesnowflake
Amazon Redshiftredshift
Google BigQuerybigquery
Databricksdatabricks
Trinotrino

Example definitions

The following examples highlight the YAML configurations for SQL model resources defined using inline SQL and external SQL files. You can group multiple resources in the same YAML file to manage them together.

Best practices

When defining SQL model resources in YAML files, follow these best practices:

File organization

  • Logical grouping: Group related SQL models in the same YAML file when they share similar purposes or data sources.
  • Clear naming: Use descriptive file names that reflect the models’ purpose, such as user-analytics-models.yaml or ecommerce-models.yaml.
  • Directory structure: Organize SQL files in subdirectories (like ./queries/ or ./sql/) to keep your project structure clean.

Resource configuration

  • Unique IDs: Choose clear, descriptive IDs that reflect the model’s purpose and avoid conflicts.
  • Meaningful names: Use display names that clearly describe what the model does for dashboard users.
  • Documentation: Always include descriptions to help team members understand the model’s purpose.
  • External files: For complex queries, use the file option to keep SQL in separate files for better syntax highlighting and version control. When importing resources, use the --sql-location parameter if you prefer external SQL files.

Environment management

  • Variable substitution: Consider using variable substitutions for account_id values when working across multiple environments (development, staging, or production).
  • Environment-specific configurations: Maintain separate configuration files or use templating for environment-specific settings.

Query optimization

  • Primary key selection: Choose primary keys that are unique, stable, and efficiently indexed in your warehouse.
  • Performance considerations: Write efficient queries that minimize warehouse compute costs and execution time.
  • Data freshness: Include appropriate date filters to control data volume and ensure relevance.

See also


Questions? Contact us by Email or on Slack