From Homegrown Data Infrastructure to AI-Ready Foundation with Bol.com



50B+ monthly events
13 million active customers
50,000+ retail partners
As the largest eCommerce platform in the Netherlands and Belgium, bol.com serves more than 13 million customers and supports over 50,000 retail partners across their marketplace. What began as an online bookstore evolved into a complex digital ecosystem that now powers online shopping, third-party sellers, and advertising.
Bol.com's DIY data infrastructure, built more than a decade earlier, served them well in their early stage, but it was not designed for this level of volume, velocity, or organizational complexity. As bol.com scaled, data volumes surged and use cases multiplied, and the system began to strain under demands it was never built to meet. The company's growth into marketplace and advertising dramatically increased its dependence on high-quality data and behavioral signals, making them foundational to advertising measurement, personalization, experimentation, and attribution.
The team turned to RudderStack's enterprise-grade customer data infrastructure to bring reliable event streaming and proactive governance to their data platform.
When in-house infrastructure stops scaling
bol.com processes more than one billion customer events every day, from clicks and product views to ad impressions and checkout activity. Reliability matters because even brief outages could result in the loss of critical customer data, with revenue impacts reaching hundreds of thousands of euros. At that volume, the gaps in their homegrown system were no longer something they could absorb.
bol.com's in-house infrastructure left data and ownership fragmented across teams. Event schemas were loosely defined and owned by no one. Different teams created and stored definitions in their own systems, with no shared standard. Downstream users were left to interpret inconsistent payloads and make assumptions about their meaning. Poor observability compounded the problem: with minimal monitoring or alerting, broken events could go unnoticed for weeks, sometimes with direct business consequences. Over time, confidence in the data eroded, and as the business continued to grow, these cracks became impossible to ignore.
These risks became especially acute as bol.com launched and scaled their retail media business. Advertising events, impressions, clicks, and attribution require strong guarantees around accuracy, integrity, and security, standards that their existing system couldn't meet.
Ultimately, the bol.com team recognized that continuing to evolve their in-house platform would require significant ongoing investment in people and operational overhead, while slowing progress at a time when speed, reliability, and trust in data were most critical. Rather than committing to a costly internal rebuild, bol.com launched an evaluation of customer data infrastructure platforms built for the AI era.
Choosing a platform built for scale and control
As bol.com evaluated options, the team looked beyond basic data collection. The solution would need to support:
- An event collection infrastructure that could operate reliably at massive scale and hold up when traffic spiked
- SDK maintenance across web and mobile platforms
- Built-in governance capabilities that enforce standards without slowing teams down
RudderStack was evaluated under real-world conditions and successfully handled spike scenarios of up to 150,000 events per second without any data loss. Beyond performance, RudderStack delivered capabilities that bol.com had not fully developed internally:
- end-to-end visibility into every event’s lifecycle
- granular bot mitigation
Performance cleared the bar. But what convinced the engineering team was the governance model.
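To put the 150,000 events-per-second spike test in context, the short calculation below compares it with the average rate implied by one billion events a day. It is a rough sketch, not a capacity model: it assumes exactly one billion daily events (the text says "more than") and treats traffic as evenly spread across the day, which real eCommerce traffic is not.

```python
# Back-of-the-envelope comparison of the tested spike rate with the
# average rate implied by the daily volume quoted in the text.
SECONDS_PER_DAY = 24 * 60 * 60

daily_events = 1_000_000_000        # lower bound: "more than one billion" per day
peak_events_per_second = 150_000    # spike scenario from the evaluation

average_events_per_second = daily_events / SECONDS_PER_DAY
headroom = peak_events_per_second / average_events_per_second

print(f"average: {average_events_per_second:,.0f} events/s")
print(f"tested peak is {headroom:.1f}x the daily average")
```

Under these assumptions the average works out to roughly 11,574 events per second, so the tested spike is about 13 times the daily mean.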
Proactive governance for the AI era
When you build your own data infrastructure, governance doesn't come included. Every capability (schema validation, monitoring, alerting, access controls) is a project your team has to scope, build, and maintain on top of everything else. RudderStack replaces that backlog with purpose-built tooling that works out of the box: tracking plans defined in YAML, validated through CI/CD pipelines, and enforced within Git-based workflows. Data governance lives where engineers already work, and data quality standards become part of the development process, not a downstream cleanup effort.
bol.com's engineering teams integrated RudderStack into their existing workflows. Tracking plans were versioned and validated using the RudderStack CLI, while GitLab CI pipelines automated schema checks and deployments across staging and production environments. This reduced manual intervention, minimized deployment risk, and gave teams confidence to ship changes independently. The result: cleaner, more reliable data that can be trusted at scale.
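The idea behind a tracking-plan check can be illustrated with a minimal sketch. This is not RudderStack's CLI or bol.com's pipeline: the plan structure, the `Product Viewed` event, its properties, and the `validate_event` helper are all hypothetical, and the plan is written as a Python dictionary rather than YAML to keep the example self-contained.

```python
from typing import Any

# Hypothetical tracking plan. In a Git-based workflow this would be a
# versioned YAML file; a dict stands in for it here.
TRACKING_PLAN: dict[str, dict[str, dict[str, Any]]] = {
    "Product Viewed": {
        "product_id": {"type": str, "required": True},
        "price": {"type": float, "required": True},
        "category": {"type": str, "required": False},
    },
}


def validate_event(event: dict[str, Any],
                   plan: dict[str, dict[str, dict[str, Any]]] = TRACKING_PLAN) -> list[str]:
    """Return a list of violations; an empty list means the event conforms."""
    name = event.get("event")
    rules = plan.get(name)
    if rules is None:
        return [f"unplanned event: {name!r}"]

    props = event.get("properties", {})
    errors: list[str] = []

    # Check every planned property for presence and type.
    for key, rule in rules.items():
        if key not in props:
            if rule["required"]:
                errors.append(f"missing required property: {key}")
            continue
        if not isinstance(props[key], rule["type"]):
            errors.append(f"wrong type for {key}: expected {rule['type'].__name__}")

    # Flag properties the plan does not know about.
    for key in props:
        if key not in rules:
            errors.append(f"unplanned property: {key}")

    return errors


good = {"event": "Product Viewed", "properties": {"product_id": "123", "price": 19.99}}
bad = {"event": "Product Viewed", "properties": {"price": "19.99", "colour": "red"}}

print(validate_event(good))
print(validate_event(bad))
```

A CI job could run a check like this against sample payloads and fail the pipeline whenever the returned list is non-empty, which is the general pattern of catching schema problems before deployment instead of downstream.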
RudderStack also met bol.com's internal security standards. Sensitive advertising and event data could be collected, processed, and stored without introducing new compliance risk. The team didn't have to choose between vendor capability and security posture.
"We use RudderStack as if it’s part of our own team. From load testing to transformation isolation, they’ve been all-in on building the right platform with us."
Koen Lijnkamp, Senior Engineering Manager at bol.com
For a team managing event collection at bol.com's scale, this was the difference between sustainable infrastructure and technical debt disguised as a platform. In an era where AI systems are only as good as the data feeding them, that distinction compounds. Clean, governed data at the point of collection means your team isn't just building better dashboards; they're building infrastructure that's ready for whatever activation layer comes next, including agentic AI.
A foundation for trust, speed, and growth
With RudderStack in place, bol.com fundamentally upgraded their data foundation. The impact was immediate:
- Reliable attribution and consistent metrics - Teams now work from a single, trusted data foundation, eliminating conflicting numbers across marketing, product, and analytics.
- Improved data quality - Event data is governed by clear, enforceable schemas, ensuring only validated events flow downstream and reducing noise at the source.
- Real-time data observability - Immediate visibility into pipeline health and automated alerting dramatically reduce the risk of silent failures going undetected.
- Stronger security and compliance posture - Governance controls are now built into the data infrastructure, not bolted on after the fact.
- Increased engineering velocity - Teams deploy instrumentation updates and manage schemas through Git-based workflows, reducing coordination overhead and enabling faster iteration across the org.
The business case for getting this right shows up across the organization. Advertising teams rely on trusted, real-time performance metrics to measure campaign ROI. Analysts have a complete, trustworthy picture of how customers actually move through the product. Product and data science teams use clean behavioral data to power recommendations and experimentation. Leadership is confident that the underlying data infrastructure will support continued growth.
Bol.com replaced years of in-house technical debt with a data foundation that actually scales - one built for engineers, trusted by the business, and ready to power personalization, advertising, analytics, and AI at the speed their growth demands.
Bol.com processes over 70 million customer interactions with 50,000 partners across their platform. Their data infrastructure needs to be boring, reliable, scalable, and trustworthy. RudderStack helps them achieve that without sacrificing control or locking data away.


