From Homegrown Data Infrastructure to AI-Ready Foundation with Bol.com
The Problem: When Your Foundation Becomes Your Ceiling
For a decade, Bol.com ran on homegrown data infrastructure. Built by smart engineers solving real problems, it powered experimentation, recommendations, and eventually retail media for the Netherlands' largest e-commerce platform.
Then it stopped working.
Not catastrophically. Gradually. The way technical debt always does.
The symptoms:
- Marketing teams using the same data arrived at attribution numbers ranging from 80% to 300%
- Analysts spent more time reasoning backwards about what actually happened than analyzing behavior
- Every schema change affected everyone; almost no change benefited everyone
- The system collected basket states instead of basket events, forcing every analyst to reconstruct intent from snapshots
The Build vs. Buy Decision That Actually Happened
Two years ago, Bol.com ran an audit. They already knew the answer: their platform couldn't scale for what was coming. But confirmation matters when you're about to make a fundamental infrastructure change.
The decision framework:
- Keep building: They're good at data engineering. They have over 1,000 people in product and tech. Why not iterate?
- Buy traditional CDP: Vendor lock-in. Data leaves the warehouse. Marketing-focused limitations.
- Partner with RudderStack: Warehouse-native. Data stays theirs. Built for engineers.
The turning point: They realized collection, quality, and governance weren't their core competency. Moving data around? They excel at that. But ensuring every event has consistent meaning across web, Android, iOS, and downstream teams? That's specialized infrastructure work.
The Technical Reality: 150,000 Events Per Second
Bol.com processes 1 billion events daily. Every click, view, impression, add-to-basket, and wishlist action is captured, cleaned, and governed.
Why that matters:
- E-commerce platforms face massive bot traffic that can spike without warning
- If ingestion fails during a spike, you lose actual customer data
- Being down for a few hours costs hundreds of thousands of euros
- Traditional CDPs weren't built for this scale while maintaining data quality
The test RudderStack had to pass: handle 150,000 events per second. Not as a theoretical limit. As a realistic spike scenario when bots hit the platform.
They passed. But more importantly, the partnership provided capabilities Bol.com had not fully developed:
- Granular bot management
- Event-level observability throughout the lifecycle
- Governance frameworks that work across mobile and web at scale
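A quick back-of-the-envelope check puts the two headline numbers side by side. The sketch below uses only the figures quoted in this section; the even spread of traffic over the day is a simplifying assumption, since real traffic peaks and dips.

```python
# Figures quoted in the article.
EVENTS_PER_DAY = 1_000_000_000
SPIKE_EVENTS_PER_SECOND = 150_000

SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: traffic is spread evenly over the day. It is not in practice,
# so treat this as a baseline rather than a measured rate.
average_rate = EVENTS_PER_DAY / SECONDS_PER_DAY
spike_multiple = SPIKE_EVENTS_PER_SECOND / average_rate

print(f"average: {average_rate:,.0f} events/s")        # average: 11,574 events/s
print(f"spike is {spike_multiple:.1f}x the average")   # spike is 13.0x the average
```

Under that assumption, the spike test asks the pipeline to absorb roughly thirteen times its daily average rate, which is why it was treated as a realistic scenario rather than a theoretical ceiling.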
What Changed: From States to Events
Old approach: Collect the entire basket state every time anything changes. Let analysts figure out what actually happened.
New approach: Capture atomic facts. "Product added to basket." "Product clicked." Self-contained events with all context needed for interpretation.
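To see why snapshots push analysts into reasoning backwards, consider what reconstructing a change from two basket states might look like. This is a minimal sketch, not Bol.com's actual code: the snapshot shape (a product ID mapped to a quantity) and the `diff_snapshots` helper are hypothetical.

```python
def diff_snapshots(before: dict[str, int], after: dict[str, int]) -> list[tuple[str, str, int]]:
    """Infer (action, product_id, quantity) changes between two basket states."""
    changes = []
    for product_id in sorted(set(before) | set(after)):
        delta = after.get(product_id, 0) - before.get(product_id, 0)
        if delta > 0:
            changes.append(("added", product_id, delta))
        elif delta < 0:
            changes.append(("removed", product_id, -delta))
    return changes


# Hypothetical snapshots: product ID -> quantity in the basket.
before = {"sku-1": 1}
after = {"sku-1": 1, "sku-2": 2}

inferred = diff_snapshots(before, after)
print(inferred)  # [('added', 'sku-2', 2)]
```

The diff recovers what changed, not why or where it happened. And if a snapshot is dropped or arrives late, two separate changes collapse into one. An atomic event carries that context itself, so nobody downstream has to infer it.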
The new data structure:
- Each event is factual, not inferred
- Every event is atomic and self-describing
- Tracking plans define required vs. optional fields
- Data types and enum values are enforced at collection
- Consistency across web, Android, iOS is guaranteed, not hoped for
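The rules in this list can be sketched as a small validator. This illustrates the idea only; it is not RudderStack's tracking plan format or API. The event name, the field names, and the `FieldRule` and `validate` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    type_: type
    required: bool = True
    allowed: frozenset = frozenset()  # empty set means no enum restriction


# Hypothetical tracking plan: one event, with required and optional fields.
TRACKING_PLAN = {
    "Product Added To Basket": {
        "product_id": FieldRule(str),
        "quantity": FieldRule(int),
        "platform": FieldRule(str, allowed=frozenset({"web", "android", "ios"})),
        "list_position": FieldRule(int, required=False),
    },
}


def validate(event_name: str, properties: dict[str, object]) -> list[str]:
    """Return a list of violations; an empty list means the event conforms."""
    rules = TRACKING_PLAN.get(event_name)
    if rules is None:
        return [f"unknown event: {event_name}"]
    errors = []
    for name, rule in rules.items():
        if name not in properties:
            if rule.required:
                errors.append(f"missing required field: {name}")
            continue
        value = properties[name]
        # Exact type match, so that True/False is not accepted as an int.
        if type(value) is not rule.type_:
            errors.append(f"wrong type for {name}: expected {rule.type_.__name__}")
        elif rule.allowed and value not in rule.allowed:
            errors.append(f"invalid value for {name}: {value!r}")
    return errors


good = validate("Product Added To Basket",
                {"product_id": "sku-2", "quantity": 1, "platform": "web"})
bad = validate("Product Added To Basket",
               {"product_id": "sku-2", "quantity": "1", "platform": "desktop"})
print(good)  # []
print(bad)   # ['wrong type for quantity: expected int', "invalid value for platform: 'desktop'"]
```

Running the same check on every platform at collection time is what turns consistency from a hope into a guarantee: a malformed event is caught before it reaches the warehouse rather than discovered later by an analyst.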
The Implementation: Two Weeks to Re-Tag Everything
Signed contract: End of January 2024
The goal: Re-instrument every core event across web, Android, and iOS.
The timeline: Two weeks.
They pulled it off. Not because the technology was magic. Because the approach was clear:
- Frontend engineers understood the tracking plan
- Events had obvious meanings
- The SDK didn't fight their architecture
- Governance was built in, not bolted on
What This Actually Enables
Short term: Better attribution. Consistent metrics across teams. Analysts spending time on insights instead of data archaeology.
Medium term: Every consumer using the new insights platform. Teams making decisions on data they trust.
Long term: AI use cases that require clean, interpretable data. More personalized shopping. Recommendations that work because the foundation is solid.
They're not betting on AI promises. They're building the foundation that makes AI possible, if and when it makes sense.
The Build vs. Buy Reality
Bol.com didn't buy RudderStack because building was impossible. They bought it because collection, quality, and governance at scale aren't their differentiator.
What they partnered on:
- Event collection infrastructure
- SDK maintenance across platforms
- Governance frameworks
- Scale management
The Bottom Line
The problem: Homegrown data infrastructure that couldn't scale with business needs
The decision: Partner on collection, own orchestration
The timeline: Contract to re-instrumentation in weeks, not quarters
The scale: 1 billion events daily, 150K events/second spike handling
The outcome: Foundation for better decisions now, AI capabilities later
The lesson: Data infrastructure isn't about having all the answers. It's about having the right questions. And having a partner who's answered them before.
Bol.com processes over 70 million customer interactions with 50,000 partners across their platform. Their data infrastructure needs to be boring, reliable, scalable, trustworthy. RudderStack helps them achieve that without sacrificing control or locking data away.







