September 13, 2023
Apache Kafka’s open-source event streaming platform sets the standard for enterprise streaming engines. Many of our customers use Kafka, but they’ve told us they often struggle to effectively use their customer data from Kafka to drive value downstream because of the custom integration and pipeline work required.
Those challenges are now a thing of the past. Today we’re excited to announce Kafka as a source in RudderStack. The new integration makes it easy for data teams to get the most value out of their Kafka implementation by automatically forwarding streams to key business tools, and standardizing schemas for data that is used in identity resolution and customer 360 projects.
Challenges to activating data from Kafka
Real-time streaming data from Kafka unlocks powerful business use cases like personalized in-app experiences and marketing campaigns. But activating Kafka data to realize business outcomes takes a lot of engineering work. Loading data into Kafka is just the first step. The data then needs additional processing to be cleaned, enhanced, and distributed downstream.
Kafka leaves the implementation of this additional processing – a nontrivial amount of work – up to you. In our conversations with customers, we’ve identified several resulting pain points:
- Data quality and compliance issues: Without a strong data governance framework to enforce correctness rules on customer data, teams end up with messy data that can wreak havoc downstream.
- Building and maintaining custom connectors: Because Kafka doesn’t provide off-the-shelf integrations with business tools, teams must build and maintain these on their own to get data out of Kafka to the tools and teams that need it.
- Identity stitching: Consumers typically engage with a brand across many devices and channels, creating different identifiers that must be stitched together into a single record for each user. This step, called identity stitching, cannot be achieved in a streaming engine like Kafka.
- Customer 360: After identity stitching, companies typically build a customer 360 full of known, computed, and predictive user features like customer_lifetime_value, average_order_value, last_seen, and first_seen. Because Kafka is agnostic to the events flowing through it, this customer 360 must be implemented downstream in a data warehouse or data lake.
- Machine learning: Predictive analytics and machine learning features are some of the most valuable uses for first-party event data, but Kafka doesn’t natively support ML, so the heavy lifting falls on engineering teams.
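To make the identity stitching pain point concrete, here is a minimal sketch of the work teams end up writing themselves. It assumes events carry an `anonymous_id` and an optional `email` (hypothetical field names), and it uses a union-find structure as one common way to group identifiers. It is not how Kafka or RudderStack implements stitching, only an illustration of why the step needs state that spans many events.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over identifiers (hypothetical helper, not a Kafka or RudderStack API)."""

    def __init__(self):
        self.parent = {}

    def find(self, identifier):
        # Register unseen identifiers as their own root.
        self.parent.setdefault(identifier, identifier)
        # Walk up to the root, shortening the path along the way.
        while self.parent[identifier] != identifier:
            self.parent[identifier] = self.parent[self.parent[identifier]]
            identifier = self.parent[identifier]
        return identifier

    def link(self, a, b):
        # Merge the groups that contain a and b.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def profiles(self):
        # Group every known identifier under its root.
        groups = defaultdict(set)
        for identifier in list(self.parent):
            groups[self.find(identifier)].add(identifier)
        return list(groups.values())


def stitch(events):
    """Link all identifiers that appear together on the same event."""
    graph = IdentityGraph()
    for event in events:
        ids = [f"{key}:{value}" for key, value in sorted(event.items()) if value]
        for identifier in ids:
            graph.find(identifier)
        for other in ids[1:]:
            graph.link(ids[0], other)
    return graph.profiles()


# Assumed sample data: two devices that later share an email, plus one unrelated visitor.
events = [
    {"anonymous_id": "a1", "email": None},
    {"anonymous_id": "a1", "email": "kim@example.com"},
    {"anonymous_id": "b7", "email": "kim@example.com"},
    {"anonymous_id": "c3", "email": None},
]
profiles = stitch(events)
```

In this sample, devices `a1` and `b7` end up in one profile because they share an email, while `c3` stays on its own. The grouping can only be resolved by looking across the full event history, which is why this work usually lands in a warehouse instead of in the stream.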
Use RudderStack to do more with your Kafka data
Kafka is a low-level tool purposely designed for ultimate flexibility and huge scalability. It’s powerful, but it’s extremely unopinionated and doesn’t directly address business concerns. That’s why RudderStack and Kafka make a great pair. When you combine the power of Kafka with RudderStack’s domain-specific tooling for customer data, it’s easy to drive business value with the data in your Kafka pipelines.
RudderStack elegantly solves the challenges we covered above in a flexible, user-friendly platform that complements Kafka’s core data transport strengths.
- Data quality & compliance – With RudderStack Transformations, you can route events to downstream tools based on users, event types, and other rules, minimizing engineering overhead. RudderStack also automatically shapes and structures data to meet the specific input requirements of each destination system. Additionally, RudderStack’s event audit API helps you diagnose inconsistencies in your data. You can combine it with RudderStack’s tracking plan features, which allow proactive monitoring and resolution of non-compliant data.
- Native real-time/streaming integrations – Our large integration library gives you off-the-shelf connectors to invoke downstream tools in real-time based on configured audiences for use cases like triggering messages or personalization.
- Identity resolution – RudderStack Profiles automatically associates your incoming events with unique customer profiles across web, mobile, and other systems.
- Customer 360 profiles – Profiles also brings together events, profile attributes, and model outputs to build enriched, accurate customer records, creating a customer 360 in your data warehouse.
- Machine learning (ML) – RudderStack Profiles lets you use ML to add predictive traits, such as churn likelihood, to your customer profiles. Segmenting users on these traits lets you build powerful personalized campaigns that reduce customer churn.
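As an illustration of what the computed features in a customer 360 look like, the sketch below derives customer_lifetime_value, average_order_value, first_seen, and last_seen from a list of events. The event shape (`user_id`, `timestamp`, `order_value`) is an assumption made for the example, lifetime value is simplified to total historical spend, and the code is not how Profiles builds these features. In practice this logic would typically run as a model in your warehouse.

```python
from collections import defaultdict
from datetime import datetime


def build_features(events):
    """Aggregate per-user features from events.

    Each event is a dict with user_id, timestamp (ISO 8601), and order_value,
    where order_value is None for events that are not purchases.
    """
    by_user = defaultdict(list)
    for event in events:
        by_user[event["user_id"]].append(event)

    features = {}
    for user_id, user_events in by_user.items():
        timestamps = [datetime.fromisoformat(e["timestamp"]) for e in user_events]
        orders = [e["order_value"] for e in user_events if e["order_value"] is not None]
        total = sum(orders)
        features[user_id] = {
            "first_seen": min(timestamps),
            "last_seen": max(timestamps),
            # Simplification: lifetime value as total historical spend.
            "customer_lifetime_value": total,
            "average_order_value": total / len(orders) if orders else None,
        }
    return features


# Assumed sample data; user_id is taken to be the already-stitched profile ID.
events = [
    {"user_id": "u1", "timestamp": "2023-09-01T10:00:00", "order_value": None},
    {"user_id": "u1", "timestamp": "2023-09-03T12:30:00", "order_value": 40.0},
    {"user_id": "u1", "timestamp": "2023-09-10T09:15:00", "order_value": 60.0},
    {"user_id": "u2", "timestamp": "2023-09-05T08:00:00", "order_value": None},
]
features = build_features(events)
```

Note that the features are keyed by a stitched user ID, which is why identity resolution has to come first.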
How RudderStack and Kafka work together
Our Kafka source connector enables a streamlined workflow between Kafka and RudderStack’s Warehouse Native CDP. Here’s how it works.
Existing producers across your technology landscape can write event data to Kafka topics via APIs or message queues. After this raw event data gets captured in Kafka in a durable and highly available fashion, it can be made ready for processing.
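For readers who want a picture of the producer side, here is a minimal sketch of preparing an event for a Kafka topic. The field names (`type`, `userId`, `event`, `properties`, `timestamp`) are assumptions modeled on a track-style event, not a confirmed schema for the Kafka source, and the `build_message` helper is hypothetical. The actual send is left to whichever Kafka client library you already use.

```python
import json
from datetime import datetime, timezone


def build_message(user_id, event_name, properties):
    """Return a (key, value) pair of bytes suitable for a Kafka producer."""
    payload = {
        "type": "track",  # assumed event type
        "userId": user_id,
        "event": event_name,
        "properties": properties,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Keying by user ID lets the producer's default partitioner send a user's
    # events to the same partition, which preserves their relative order.
    key = user_id.encode("utf-8")
    value = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return key, value


key, value = build_message("u1", "Order Completed", {"total": 40.0})
```

The resulting `key` and `value` can be passed to your producer's send call for the topic you designate for customer events.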
RudderStack integrates directly with your Kafka cluster to tap into these events. It does so securely over SSH without needing direct access to Kafka itself. You simply whitelist the RudderStack IP addresses that will be reading the events. RudderStack then reads the event data from designated topics in real time as the data is generated.
Once RudderStack receives this raw event data, you can apply Transformations for real-time data enrichment and hygiene. You can also use RudderStack Profiles to stitch identities across devices and tie events to user profiles for analysis in your warehouse and activation in your downstream tools.
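A hygiene step of this kind might look like the sketch below. It is modeled on the `transformEvent(event, metadata)` function shape used for user transformations, but treat the signature, the drop-by-returning-`None` behavior, and every field and event name here as assumptions to check against the current RudderStack documentation.

```python
BLOCKED_EVENTS = {"Debug Ping"}  # hypothetical event names to filter out


def transformEvent(event, metadata=None):
    """Drop unwanted events, normalize email, and mask a sensitive field."""
    if event.get("event") in BLOCKED_EVENTS:
        return None  # assumed convention: returning None drops the event

    # Normalize the email trait so downstream tools see one canonical form.
    traits = event.get("context", {}).get("traits", {})
    email = traits.get("email")
    if isinstance(email, str):
        traits["email"] = email.strip().lower()

    # Mask all but the last four characters of a hypothetical sensitive property.
    properties = event.get("properties", {})
    if "card_number" in properties:
        properties["card_number"] = "****" + str(properties["card_number"])[-4:]

    return event


sample = {
    "event": "Order Completed",
    "context": {"traits": {"email": "  Kim@Example.COM "}},
    "properties": {"card_number": "4111111111111111"},
}
cleaned = transformEvent(sample)
dropped = transformEvent({"event": "Debug Ping"})
```

Because the function works on one event at a time, it suits filtering, masking, and normalization. Cross-event work such as identity stitching belongs in Profiles.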
You can then use RudderStack to activate this clean, validated event data in real time through marketing and product tools, or via your warehouse to your downstream tools and systems. RudderStack leverages 200+ pre-built integrations to distribute your enriched customer events to diverse destinations for various use cases.
Get higher ROI from your Kafka infrastructure: All you have to do is write event data to Kafka topics once. RudderStack then handles the complexities of securely extracting, enriching, and transporting the data across your entire customer data stack, giving you a simplified, single channel for your customer event data.
Kafka’s scalable data collection combined with RudderStack’s Warehouse Native Customer Data Platform enables you to achieve powerful, real-time customer data use cases without all the low-level engineering work.
Ensure data quality for every downstream application with our data governance features, resolve identities and build a customer 360 with Profiles, and send your data downstream to every tool in your stack with our large library of integrations.
Sign up for a demo today to request early access to our Kafka source and learn how RudderStack can help you drive more value from the data in your Kafka pipelines.
Head of Product Marketing