April 22, 2020
RudderStack provides all your customer data pipelines in one platform, so you can securely collect and route your customer data to every tool in your stack. RudderStack is enterprise-ready, with a special focus on data privacy and security. In this post, we formally introduce one of our most loved features, RudderStack Transformations.
We started building RudderStack because, as data engineers ourselves, we found that existing CDI solutions lack crucial attributes that today’s data-driven enterprises need. Most solutions trade flexibility for simplicity, sacrificing extensibility and functionality on the altar of usability.
At RudderStack, we are building a data platform that offers the environment a data engineer needs. We aim to provide a platform that addresses the unique challenges of everyday business, together with operational peace of mind. That's why we're building RudderStack to:
- Be extensible and customizable. A data engineer should be able to deal with various issues and requirements
- Be easy to deploy, manage, and monitor
With these goals in mind, we've implemented data transformation as a core part of the platform. RudderStack Transformations is a mechanism that lets a data engineer define and deploy custom logic that runs on the stream of events flowing through RudderStack.
At RudderStack, we believe that being able to manipulate the event data is an integral part of any data platform, so we treat this functionality as a first-class citizen in our product. The architecture of the platform reflects this belief.
The above diagram shows a high-level architecture of RudderStack. The underlying mechanism on the backend, or Data Plane, is responsible for implementing the integrations with the destinations. This same mechanism is responsible for user transformations.
By implementing a single mechanism responsible for both executing user transformations and integrations with destinations, we manage to:
- Simplify the architecture
- Simplify implementing and debugging custom logic
- Make it easier to reason about performance, as there aren’t many moving parts that can add overhead
Deploying RudderStack Transformations
What you can do with RudderStack Transformations
Working with PII data
Developers often embed PII in events, either accidentally (e.g., developer error) or on purpose. However, there are often strong reasons to avoid sending PII to downstream cloud or on-prem destinations. Even for internal destinations like a data warehouse, storing PII can lead to security challenges. At the same time, it is crucial to be able to address any possible edge case in how the data appears and how it is stored. With RudderStack Transformations, it is possible to implement complex PII scanners and masks. You can also create a notification system that acts at the event level and notifies you when it detects something.
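As a rough illustration of the scanning and masking idea, here is a minimal Python sketch. It is not RudderStack's API: the function name `mask_pii`, the `SENSITIVE_KEYS` list, and the simplified email pattern are all assumptions made for this example, and a production scanner would need far broader coverage.

```python
import re
from typing import Any

# Assumption: a deliberately simple email matcher, not a full address parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumption: hypothetical field names that are treated as always sensitive.
SENSITIVE_KEYS = {"ssn", "password", "credit_card"}


def mask_pii(value: Any, key: str = "") -> Any:
    """Return a copy of `value` with sensitive fields and email-like strings masked."""
    if key.lower() in SENSITIVE_KEYS:
        # Mask the whole value, whatever its type, when the field name is sensitive.
        return "***"
    if isinstance(value, dict):
        return {k: mask_pii(v, str(k)) for k, v in value.items()}
    if isinstance(value, list):
        return [mask_pii(v) for v in value]
    if isinstance(value, str):
        # Replace anything that looks like an email address inside free text.
        return EMAIL_RE.sub("***@***", value)
    return value
```

The function walks nested dictionaries and lists, so it also catches PII that ends up in free-text fields deep inside an event, and it returns a new structure rather than modifying the original event.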
Event sampling and aggregation
Having access to all your events is useful, but not all applications need this level of data granularity. For certain applications, such as analytics tools, you can sample your data before delivering it. This functionality is even more useful when the services receiving the data charge you based on volume. Similarly, you might prefer to pre-aggregate your data before you deliver it to a visualization tool, which simplifies how that tool interacts with the data.
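One way to sample is to hash a stable identifier, so that a given user is either always kept or always dropped. The Python sketch below illustrates that approach along with a simple count-based pre-aggregation. The function names and the `userId`, `anonymousId`, and `event` field names are assumptions for the example, not a documented interface.

```python
import hashlib
from collections import Counter
from typing import Iterable, Optional


def sample_event(event: dict, rate: float) -> Optional[dict]:
    """Keep events for roughly `rate` of all users; return None to drop the event."""
    # Assumption: events carry a `userId` or `anonymousId` field.
    user_key = str(event.get("userId") or event.get("anonymousId") or "")
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return event if bucket < rate else None


def count_by_name(events: Iterable[dict]) -> Counter:
    """Pre-aggregate a batch of events into counts per event name."""
    # Assumption: the event name lives in an `event` field.
    return Counter(e.get("event", "unknown") for e in events)
```

Hashing the identifier instead of drawing a random number keeps each user's event history complete in the sampled destination, which usually matters for funnel and retention analysis.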
Hunting down the nulls
Errors in your data are inevitable: event schemas might change, developers introduce errors, or sometimes fields need decommissioning. Being able to detect data anomalies and react to them close to the backend can save a lot of time for a data engineer who’s trying to debug a broken data pipeline. With RudderStack Transformations, you can detect common errors such as null values and correct them early in the data pipeline.
Often, you want to transform data into a different representation. One system might have a single field for the name and the surname, while another expects two separate fields. You may also want to extract and break down UTM parameters from a URL. Being able to transform the schema of the events is very important in maintaining a realistic data infrastructure for any company, and RudderStack Transformations provides all the functionality needed to do that.
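To make null hunting concrete, the following Python sketch reports where null values occur in a nested event and fills in defaults for a flat set of properties. Both helper names and the dotted-path output format are assumptions chosen for this illustration.

```python
from typing import Any, Dict, List


def find_nulls(value: Any, path: str = "") -> List[str]:
    """Return the dotted path of every None value inside a nested structure."""
    if value is None:
        return [path]
    found: List[str] = []
    if isinstance(value, dict):
        for key, item in value.items():
            child = f"{path}.{key}" if path else str(key)
            found.extend(find_nulls(item, child))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            found.extend(find_nulls(item, f"{path}[{index}]"))
    return found


def fill_defaults(properties: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `properties` where missing or None keys take a default value."""
    fixed = dict(properties)
    for key, default in defaults.items():
        if fixed.get(key) is None:
            fixed[key] = default
    return fixed
```

The list of paths returned by `find_nulls` can be logged or attached to the event for debugging, while `fill_defaults` handles the cases where a safe default is known in advance.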
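Both schema changes mentioned here can be sketched in a few lines of Python using only the standard library. The function names and output field names (`first_name`, `last_name`) are assumptions for the example, and splitting on the first space is a simple heuristic that will not suit every naming convention.

```python
from typing import Dict
from urllib.parse import parse_qs, urlparse


def split_name(full_name: str) -> Dict[str, str]:
    """Split a full name into first and last name at the first space."""
    first, _, last = full_name.strip().partition(" ")
    return {"first_name": first, "last_name": last.strip()}


def extract_utm(url: str) -> Dict[str, str]:
    """Return the utm_* query parameters of a URL as a flat dictionary."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list per key; keep the first value of each UTM parameter.
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}
```

For example, `extract_utm` keeps `utm_source` and `utm_medium` from a campaign URL and ignores unrelated parameters, so the result can be merged directly into the event's properties.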
The above are just some everyday use cases that we have encountered so far, and they demonstrate the versatility of RudderStack Transformations.
We want you to contribute
As we said in the opening of this post, our goal at RudderStack is to build the next-generation data platform that can address all the challenges data engineers face today. Our core assumption is that, for such a complex and open problem, one can provide a robust solution only by having a strong community of people who share the same problems and vision.
Hence, instead of delivering every possible function as a new feature, we provide the infrastructure on which you can deploy and run functions that transform your data.
In the GitHub repository for RudderStack Transformations, you can find various templates. These are community-provided templates for common use cases like the ones we mentioned earlier in this post. We encourage you to participate, create issues, make pull requests, and share your knowledge.