RudderStack: An Open-Source Customer Data Platform

RudderStack, The Warehouse Native CDP, is the only open source customer data platform built from the ground up for data teams and is a leading alternative to Segment. With RudderStack, you can collect behavioral data across the entire customer journey, unify it in your warehouse to produce reliable, complete customer profiles, and activate it downstream by syncing it to business tools or making it available in real time via API to personalize product experiences. Here’s what makes RudderStack different from every other customer data platform:

  • Warehouse native – Our warehouse-native approach enables you to build the CDP in your own data warehouse, maximizing your infrastructure investment, eliminating data silos, and easing compliance burdens.
  • Built for data teams – Engineering-driven features and a robust integration library allow you to create foundational efficiencies so you can focus on helping every business function drive growth.
  • Flexible, open architecture – Our flexible architecture delivers ultimate optionality and enables you to scale with agility as you respond to changing business needs.

RudderStack is built on an open-source, Kubernetes-native engine. The majority of our 16+ SDKs and 200+ destination integrations are also open source. Read RudderStack’s licensing explained if you’d like to get all the details, and check out our GitHub organization to explore the repos – you’ll also find some of our data modeling repos for use cases like customer journey analysis and sessionization. With our OSS product, RudderStack Open Source, you can use our SDKs to easily collect event data from all of your websites and applications and send it to your entire data stack. Open Source users can even take advantage of our Transformations feature to transform data in flight with custom JavaScript or Python functions.

RudderStack Cloud – which features a generous free tier – offers an enterprise-ready, end-to-end solution for collecting, unifying, and activating data. Our docs provide a detailed breakdown of RudderStack Cloud vs. Open Source to help you determine which product is right for you.

Set up a demo with our open source CDP experts!
Get a demo of RudderStack to start collecting first-party event data from every source and sending it to any destination today.

RudderStack Architecture

RudderStack is a standalone system dependent only on a database (PostgreSQL). You can read about how, and why, we chose PostgreSQL here. Its backend is written in Go, with a rich UI written in React.js. RudderStack's architecture consists of 2 major components:

  • Control Plane: The control plane offers a UI to configure your event data sources and destinations.
  • Data Plane: This is the RudderStack backend – the core engine that collects, transforms, and routes your events to their specified destinations.

Here’s a broad visual representation of RudderStack’s architecture:

RudderStack's Open Source CDP Architecture

For more details on the architecture, check out our documentation.

How to set up RudderStack Open Source

The easiest and fastest way to get started with RudderStack Open Source is using the Docker setup. However, if you wish to use RudderStack in production environments, we strongly recommend using our Kubernetes Helm charts. The steps for setting up RudderStack using Docker are as follows:

First, you’ll set up your control plane:

  1. Sign up for RudderStack Open Source
  2. Follow the getting started checklist in your RudderStack dashboard to set up and configure your event data sources and destinations. RudderStack hosts these configurations for you and does not charge for them.
  3. In the dashboard, go to Settings > Workspace and copy the Workspace token. This token is required to set up your data plane.

Note: If you want to host your own source and destination configurations, you can use the open-source RudderStack Config Generator. However, this dashboard lacks features such as user-defined transformations and live event debugging, which are present in the RudderStack-hosted dashboard.

Next, you’ll set up your data plane:

  1. Download the rudder-docker.yml docker-compose file.
  2. Replace <your_workspace_token> in this file with your workspace token.
  3. In your terminal, navigate to the directory where you want to install RudderStack and run the following command:
docker compose -f rudder-docker.yml up -d

To verify if the setup is successful, you can send test events by following our verify installation guide.
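
As a rough illustration of what a test event might look like, here is a minimal Python sketch that builds a single track call for the data plane. It rests on several assumptions that are not stated in this post: that the data plane is reachable at http://localhost:8080, that it accepts track events at /v1/track, and that requests are authenticated with your source write key via HTTP Basic auth. The helper names build_track_request and send are hypothetical. Treat the verify installation guide as the authoritative reference.

```python
import base64
import json
import urllib.request
from datetime import datetime, timezone

# Assumptions (not from this post): the data plane listens on localhost:8080
# and accepts track calls at /v1/track, authenticated with the source write key.
DATA_PLANE_URL = "http://localhost:8080"


def build_track_request(write_key, user_id, event, properties=None):
    """Build (but do not send) a POST request carrying one track event."""
    payload = {
        "userId": user_id,
        "event": event,
        "properties": properties or {},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # HTTP Basic auth: write key as the username, empty password.
    token = base64.b64encode(f"{write_key}:".encode()).decode()
    return urllib.request.Request(
        url=f"{DATA_PLANE_URL}/v1/track",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

With the containers running, calling send(build_track_request("<your_write_key>", "test-user", "Test Event")) should return a success status if the assumptions above match your setup. Building the request separately from sending it makes the payload easy to inspect before anything goes over the wire.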

With RudderStack Open Source set up, you’re now ready to take advantage of our Event Stream pipeline to collect behavioral data from all of your websites and apps and route it to over 200 destinations in real time. You can also use our powerful Transformations feature to transform your data in flight.

Event Stream

RudderStack Event Stream allows you to easily ingest clean, first-party event data, send it across your data stack, and store it in your warehouse. You can collect data from every source, stream it directly to your data warehouse, and route events in real time to the tools used by your marketing, product, and customer success teams. Our robust event stream source library features web, mobile, and server-side SDKs, plus a webhook source you can use to ingest events from any source that supports webhooks. It also includes integrations with some third-party platforms like Looker, PostHog, and Customer.io. Read more about Event Stream in our docs.

Support for 200+ Destinations

RudderStack can send data to any tool in your stack, ensuring every system has the same data and making migrations a breeze. RudderStack reliably routes all the tracked customer events to your preferred destinations for activation use cases like analytics, attribution, marketing, CRM, and personalization. You can explore the entire integration directory here.

RudderStack Transformations

With Transformations, you can transform events in real time with Python or JavaScript before they are delivered to downstream destinations. You can use them to fix bad data, enrich events, mask PII, customize integrations, call external APIs, and more. Read more in our docs.
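
To make the PII-masking use case concrete, here is a minimal Python sketch of a transformation. It assumes the transformEvent(event, metadata) entry point and the convention that returning None drops an event; confirm both against the Transformations docs. The PII_FIELDS list, the redact helper, and the "Internal Test" event name are hypothetical and exist only for illustration.

```python
# Hypothetical list of field names to treat as PII in this sketch.
PII_FIELDS = ("email", "phone")


def redact(value):
    """Mask a value, keeping only the domain if it looks like an email."""
    text = str(value)
    if "@" in text:
        return "***@" + text.split("@", 1)[1]
    return "***"


def transformEvent(event, metadata):
    """Drop internal test events and mask PII fields before delivery."""
    # Assumption: returning None tells the pipeline to drop the event.
    if event.get("event") == "Internal Test":
        return None
    # Mask PII wherever it appears in track properties or identify traits.
    for section in ("properties", "traits"):
        fields = event.get(section)
        if isinstance(fields, dict):
            for name in PII_FIELDS:
                if name in fields:
                    fields[name] = redact(fields[name])
    return event
```

The sketch changes only the listed fields and leaves every other property untouched, so downstream destinations still receive the rest of the event as it was collected.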

Our Transformations API allows you to create, read, update and delete transformations and libraries programmatically by making HTTP calls.

Beyond open source: RudderStack Cloud

While our open source offering is a powerful tool for collecting and transforming event data, our cloud product delivers an enterprise-ready, fully featured Warehouse Native Customer Data Platform. In RudderStack Cloud, you get a RudderStack-hosted data plane and control plane, so you don’t need to worry about setup. You can get started on our free tier in less than 15 minutes with our Quickstart Guide, and if you start on the cloud free tier, it’s much easier to convert to a paid plan as your needs evolve over time.

In addition to Event Stream and Transformations, with RudderStack Cloud, you’ll have everything you need to collect, unify, and activate your customer data. RudderStack Cloud offers features for:

  • Data governance – Ensure data quality and compliance
  • Deployment and security – Scale and secure your RudderStack deployment
  • Monitoring and observability – Monitor your data pipelines and set up alerts
  • Audits and user management – Manage users and set access controls for various RudderStack features

RudderStack Cloud also includes our data unification product, Profiles, which enables you to power your business with reliable, complete customer profiles. With RudderStack Profiles, you can establish a comprehensive identity graph in your warehouse, build features on top to create a customer 360, then deliver complete customer profiles to power use cases for every team. For data activation, Profiles includes two features, Cohorts and Activations. Cohorts enables data teams to define core customer segments in the warehouse, which serve as a starting point for audience creation. Activations allows non-technical users to explore the customer 360, filter core cohorts into audiences, and sync them directly to their tools for tactical activation.

RudderStack Cloud also includes our Reverse ETL pipeline, which makes it easy to sync data from 8+ sources, including data warehouses, data lakes, and distributed query engines like Trino.

Reference our RudderStack Cloud vs. Open Source guide to get more details on the differences between the two offerings. If you’re looking for information on RudderStack as an open source alternative to Segment, you can check out our RudderStack vs. Segment comparison or read this post from Marketing Arsenal: An open source Segment alternative? Rudderstack vs Segment.

Turn your customer data into competitive advantage

RudderStack runs on an open source engine, and you can use RudderStack Open Source to collect data from every data source and route it to your entire customer data stack. While the open source tool is powerful, our cloud offering delivers a more fully featured, enterprise-ready product, and it runs on the same open source engine. When you start on the free tier, you get all the functionality of our Open Source offering plus Reverse ETL, and it’s easy to convert to a paid plan as your needs evolve. Sign up to test drive RudderStack today.

For an in-depth comparison of RudderStack vs. Segment, check out this post on Marketing Arsenal: An open source Segment alternative? Rudderstack vs Segment.

December 26, 2019
Soumyadeb Mitra

Founder and CEO of RudderStack