How we crafted a domain-specific language for JSON transformation at RudderStack

RudderStack created a JSON Template Engine to simplify transformation of JSON data from one format to another, making it easier to manage and maintain complex integrations. This blog post will cover why we needed to craft our own Domain-Specific Language for JSON transformation and how we did it.
First, let’s look at the problem we were trying to solve and why it led us to create our own JSON Template Engine.
The challenge
RudderStack is the Warehouse Native CDP. We provide an integrated solution for data collection, unification in the warehouse, and activation. Our platform supports over 200 integrations and features a powerful Transformations tool. Traditionally, we used native JavaScript code for data transformation, which required significant effort and maintenance. Writing intricate JavaScript code for complex JSON transformations can be error-prone and time-consuming. Moreover, JavaScript’s general-purpose nature did not provide the level of abstraction and expressiveness needed to succinctly represent JSON transformation logic. Although JSONata offered a more efficient way to manipulate JSON data, we still encountered performance bottlenecks due to its parsing and interpretation overhead.
Our solution
The solution was to use a domain-specific language tailored specifically for JSON transformation. By designing a custom JSON template language, we can provide developers with a specialized syntax and semantics optimized for JSON manipulation tasks. Such a language would abstract away low-level JavaScript details, simplify complex transformation logic, and enhance readability and maintainability.
With that goal in mind, we developed our own JSON Template Engine. This engine generates optimized JavaScript code from transformation templates, reducing runtime overhead and significantly improving performance.
Steps to build a domain-specific language
Here’s how we crafted our custom JSON template language. You can follow a similar process to create a language for your problem domain.
1. Define the domain and requirements
Start by clearly defining the domain for which you’re building the DSL — in our case, JSON transformation. Identify the specific requirements and challenges within that domain, such as the need for concise syntax, support for complex data structures, and efficient execution.
2. Design language syntax and semantics
Based on the identified requirements, design the syntax and semantics of your DSL — in our case, the JSON template language. Define language constructs such as statements, expressions, and control flow mechanisms that enable users to express JSON transformation logic in a clear and concise manner.
3. Implement lexing (tokenization)
Lexical analysis involves breaking down the source code into tokens, the smallest units of meaningful characters in the language. Implement a lexer to tokenize the input JSON template code, identifying keywords, identifiers, operators, and other lexical elements.
To see how we approach tokenization, let’s look at the implementation of the descendant operator `..`, which searches for a specific key in all descendants of a property.
The first step is to recognize the descendant operator in the source text. We do this with a generic lexer function that is responsible for identifying the punctuators that contain dots.
JAVASCRIPT
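As a rough illustration of this step, here is a minimal sketch of a lexer that recognizes dot punctuators. It is not RudderStack's actual lexer: the names `TokenType`, `Lexer`, `scanDots`, `scanId`, and `tokenize` are assumptions made for this example, and whitespace, strings, and numbers are left out.

```javascript
// Hypothetical token type names; the engine's real identifiers may differ.
const TokenType = { PUNCT: 'punct', ID: 'id', EOT: 'eot' };

class Lexer {
  constructor(source) {
    this.source = source;
    this.idx = 0;
  }

  // Scan a punctuator made only of dots (up to three), taking the longest
  // run first so that `..` is not read as two separate `.` tokens.
  scanDots() {
    const start = this.idx;
    if (this.source[start] !== '.') return undefined;
    let end = start;
    while (end < this.source.length && this.source[end] === '.' && end - start < 3) {
      end += 1;
    }
    this.idx = end;
    return { type: TokenType.PUNCT, value: this.source.slice(start, end), range: [start, end] };
  }

  // Scan a simple identifier such as `employees`.
  scanId() {
    const match = /^[A-Za-z_$][\w$]*/.exec(this.source.slice(this.idx));
    if (!match) return undefined;
    const start = this.idx;
    this.idx += match[0].length;
    return { type: TokenType.ID, value: match[0], range: [start, this.idx] };
  }

  next() {
    if (this.idx >= this.source.length) {
      return { type: TokenType.EOT, range: [this.idx, this.idx] };
    }
    const token = this.scanDots() || this.scanId();
    if (!token) throw new Error(`Unexpected character at index ${this.idx}`);
    return token;
  }
}

function tokenize(source) {
  const lexer = new Lexer(source);
  const tokens = [];
  for (let token = lexer.next(); token.type !== TokenType.EOT; token = lexer.next()) {
    tokens.push(token);
  }
  return tokens;
}

console.log(tokenize('.employees..name').map((t) => t.value));
// [ '.', 'employees', '..', 'name' ]
```

Matching the longest run of dots first is what keeps the child selector `.` and the descendant selector `..` distinct at the token level.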
4. Implement parsing (syntax analysis)
Parsing involves constructing a parse tree (or Abstract Syntax Tree — AST) from the tokenized source code. Implement a parser to generate the AST according to the grammar rules defined for the language.
Having identified the descendant selector token in the previous step, we now combine it with the tokens around it to build an expression, or Abstract Syntax Tree (AST), that represents the selector in a structured form.
JAVASCRIPT
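A minimal sketch of this parsing step, under stated assumptions: the node type names in `SyntaxType`, the `TokenStream` stand-in for the lexer module, and the functions `parseSelector` and `parsePath` are all hypothetical and only illustrate the idea of turning selector tokens into a tree.

```javascript
// Hypothetical node type names, chosen for illustration only.
const SyntaxType = { PATH: 'path', SELECTOR: 'selector' };

// Minimal stand-in for the lexer module: a cursor over pre-built tokens.
class TokenStream {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
  }
  peek() {
    return this.tokens[this.pos];
  }
  next() {
    const token = this.tokens[this.pos];
    this.pos += 1;
    return token;
  }
  matchPunct(...values) {
    const token = this.peek();
    return token !== undefined && token.type === 'punct' && values.includes(token.value);
  }
}

// selector := ('.' | '..') (identifier | '*')
function parseSelector(stream) {
  const selector = stream.next().value;
  const prop = stream.next();
  const isWildcard = prop !== undefined && prop.type === 'punct' && prop.value === '*';
  if (prop === undefined || (prop.type !== 'id' && !isWildcard)) {
    throw new Error(`Expected a property name after "${selector}"`);
  }
  return { type: SyntaxType.SELECTOR, selector, prop: { type: prop.type, value: prop.value } };
}

// path := selector+
function parsePath(stream) {
  const parts = [];
  while (stream.matchPunct('.', '..')) {
    parts.push(parseSelector(stream));
  }
  if (parts.length === 0) throw new Error('Expected a path starting with "." or ".."');
  return { type: SyntaxType.PATH, parts };
}

// Tokens for `.employees..name`, as a lexer might produce them.
const tokens = [
  { type: 'punct', value: '.' },
  { type: 'id', value: 'employees' },
  { type: 'punct', value: '..' },
  { type: 'id', value: 'name' },
];
const ast = parsePath(new TokenStream(tokens));
console.log(JSON.stringify(ast, null, 2));
```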
The above functions work together to identify and parse different parts of a path, with a focus on recognizing selectors within the path structure. They rely on a separate lexer module that provides functionality for reading and identifying different token types in the input stream.
This is the Abstract Syntax Tree (AST) representation of the expression `.employees..name`:
JAVASCRIPT
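For illustration, a tree for `.employees..name` might look like the object below. The shape and field names (`type`, `parts`, `selector`, `prop`) are assumptions for this sketch, not the engine's actual AST format; the small `toSource` helper is also hypothetical and only shows that the tree keeps everything needed to rebuild the expression.

```javascript
// Hypothetical AST for `.employees..name`; field names are assumptions.
const ast = {
  type: 'path',
  parts: [
    // `.employees`: child selector
    { type: 'selector', selector: '.', prop: { type: 'id', value: 'employees' } },
    // `..name`: descendant selector
    { type: 'selector', selector: '..', prop: { type: 'id', value: 'name' } },
  ],
};

// Rebuild the source text from the tree to confirm nothing was lost.
function toSource(node) {
  return node.parts.map((part) => part.selector + part.prop.value).join('');
}

console.log(toSource(ast)); // .employees..name
```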
5. Implement code translation
Translate the parsed AST into executable code in a target language (e.g., JavaScript). This involves traversing the AST and generating code that performs the specified JSON transformations as defined by the DSL.
The final step converts the descendant selector expression (AST) into executable JavaScript code that can be used in the desired context.
JAVASCRIPT
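A minimal sketch of such a translator, assuming the hypothetical selector node shape `{ selector, prop: { value } }`. The function name `translateDescendantSelector` and the variable names in the emitted code are illustrative, not the engine's own. This sketch also checks the base context's own keys, which is an assumption about the operator's semantics.

```javascript
// Translate one descendant selector node into JavaScript source text.
// `dest` and `baseCtx` are names of variables in the generated code.
function translateDescendantSelector(expr, dest, baseCtx) {
  const prop = expr.prop.value;
  const lines = [
    `const ctxs = [${baseCtx}];`,
    'const result = [];',
    'while (ctxs.length > 0) {',
    '  const ctx = ctxs.shift();',
    '  if (Array.isArray(ctx)) { ctxs.push(...ctx); continue; }',
    "  if (ctx === null || typeof ctx !== 'object') continue;",
    '  const values = Object.values(ctx);',
    '  ctxs.push(...values);',
  ];
  if (prop === '*') {
    // Wildcard: every value of every visited object is part of the result.
    lines.push('  result.push(...values);');
  } else {
    // Named property: keep only the value stored under that key.
    const key = JSON.stringify(prop);
    lines.push(`  if (Object.prototype.hasOwnProperty.call(ctx, ${key})) {`);
    lines.push(`    result.push(ctx[${key}]);`);
    lines.push('  }');
  }
  lines.push('}');
  lines.push(`${dest} = result.flat();`);
  return lines.join('\n');
}

// Usage: wrap the generated statements in a function and run it.
const node = { type: 'selector', selector: '..', prop: { type: 'id', value: 'name' } };
const body = `let out;\n${translateDescendantSelector(node, 'out', 'input')}\nreturn out;`;
const findNames = new Function('input', body);

const data = { employees: [{ name: 'Ann', manager: { name: 'Bo' } }, { name: 'Cy' }] };
console.log(findNames(data)); // [ 'Ann', 'Cy', 'Bo' ]
```

The generated code walks the data breadth-first with an explicit queue, so results appear level by level rather than in document order.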
This code translates a selector expression containing the descendant operator (..) into executable JavaScript code. It iterates through a list of contexts, starting with a provided base context. For each context, it checks if it’s an array and recursively processes its elements. If it’s an object, it retrieves its property values and adds them to the context list for further processing. The code also considers a property specified in the selector: if it’s a wildcard (*), all object values are included in the result; otherwise, only the value for the specific property key is included. Finally, the code flattens the result array to remove any nested arrays and stores it in a designated variable.
Below is the code generated for the expression `.employees..name`. It has been modified from the original generated output for better readability.
JAVASCRIPT
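As an approximation only, code generated for `.employees..name` could resemble the hand-written function below. The function name `transform` and all variable names are assumptions for this sketch; the engine's real output may be structured differently.

```javascript
// Hand-written approximation of generated code for `.employees..name`.
function transform(input) {
  // `.employees`: plain child selection, tolerant of a missing input.
  const employees = input == null ? undefined : input.employees;

  // `..name`: walk every descendant of `employees`, breadth-first.
  const pending = [employees];
  const names = [];
  while (pending.length > 0) {
    const current = pending.shift();
    if (Array.isArray(current)) {
      // Arrays contribute their elements as further contexts.
      pending.push(...current);
    } else if (current !== null && typeof current === 'object') {
      // Objects contribute their values, plus `name` if they have it.
      pending.push(...Object.values(current));
      if (Object.prototype.hasOwnProperty.call(current, 'name')) {
        names.push(current.name);
      }
    }
  }
  // Flatten one level so array-valued matches are merged into the result.
  return names.flat();
}

const input = {
  employees: [{ name: 'Ann', reports: [{ name: 'Bo' }] }, { name: 'Cy' }],
};
console.log(transform(input)); // [ 'Ann', 'Cy', 'Bo' ]
```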
Conclusion
Building a DSL at RudderStack empowered our engineering team to simplify complex workflows and scale our efforts in building and managing hundreds of integrations for our Customer Data Platform. This guide covered the process we used to craft a domain-specific language (DSL) for JSON transformation and build a tailored solution to streamline data integration challenges.
We covered everything from understanding the need for a DSL to implementing lexing, parsing, and code translation. Following this guide, you can create your own custom DSLs to address specific domain requirements.
Published:
March 28, 2024
