The future of personalization: From matrix factorization to prompt-personalized LLMs

Personalization is undergoing a fundamental shift.
For years, the core question was: “What item should we show this user?”
Increasingly, the question is: “What should this system do or say for this specific person, right now?”
This mirrors what happened in search a decade ago, moving from pure retrieval toward systems that reason about intent, context, and constraints before responding. The shift matters because personalization is no longer confined to ranked lists. It now shows up inside AI agents, onboarding copilots, customer support workflows, sales assistants, and dynamically generated content.
Traditional recommender systems still matter. They are fast, scalable, and measurable. But they were never designed to reason, converse, or operate across messy, multi-step workflows. Large Language Models (LLMs) change that substrate entirely.
This post compares classical personalization approaches like matrix factorization with LLM-based approaches, and argues that the future is hybrid: classical models for scale and efficiency, LLMs for reasoning and synthesis, and a well-designed interface between the two.
What traditional personalization actually optimizes
Most production personalization systems today follow a familiar loop:
- Collect interaction data (views, clicks, purchases, ratings)
- Learn representations of users and items
- Score candidate items for each user
- Rank and serve results
- Measure outcomes like CTR, conversion, or retention
Matrix factorization (MF) and its descendants—collaborative filtering, two-tower retrieval, learning-to-rank—optimize for a very specific problem: predicting which items a user is most likely to engage with, based on past behavior patterns of other users.
Conceptually, MF clusters users who behaved similarly in the past and items consumed by similar users. At serving time, personalization becomes a fast scoring and ranking problem.
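To make the serving step concrete, here is a minimal sketch of MF-style scoring. The factor matrices are random stand-ins for what would actually be learned from interaction data; the point is that personalization at serving time reduces to a dot product and a sort.

```python
import numpy as np

# Hypothetical learned factors; in production these come from training
# on the user-item interaction matrix, not from a random generator.
rng = np.random.default_rng(0)
n_users, n_items, k = 100, 500, 16
user_factors = rng.normal(size=(n_users, k))
item_factors = rng.normal(size=(n_items, k))

def top_n(user_id: int, n: int = 10) -> list[int]:
    """Score every item for one user with a dot product, return the best N."""
    scores = item_factors @ user_factors[user_id]
    return np.argsort(-scores)[:n].tolist()

recommendations = top_n(user_id=42, n=5)  # five item indices, best first
```

This is why the "extreme scalability" point below holds: scoring an entire catalog is one matrix-vector product, which is trivially batched and cached.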
Why this worked so well
- Extreme scalability: Embeddings + dot products are cheap
- Clear metrics: Improvements are easy to A/B test
- Predictable behavior: Incremental model changes yield incremental effects
- Strong retrieval: “People like you also liked X” often works
Where it breaks down
- Cold start: New users and items lack signal
- Feature poverty: Rich context (intent, constraints, conversation history) is hard to encode
- Single-objective focus: Typically optimizes engagement in one surface
- No reasoning or explanation: Cannot ask clarifying questions or explain trade-offs
Traditional personalization excels when the surface area is a ranked list. But modern personalization increasingly lives inside conversations and workflows, not feeds.
What LLMs enable: Personalization as reasoning and generation
LLMs fundamentally change what personalization can mean.
Rather than predicting a score, LLMs can:
- Interpret natural-language intent (“I need something warm but not bulky for Iceland”)
- Respect constraints (budget, eligibility, compliance)
- Operate in multi-turn interactions, asking clarifying questions
- Combine heterogeneous data (events, CRM records, tickets, catalogs, policies)
- Generate outputs: explanations, emails, plans, not just item IDs
Personalization shifts from ranking to reasoning + generation.
LLMs as a new primitive
In most LLM-based personalization systems, what changes is the prompt and retrieved context.
Traditional MF
- Personalization = learned embeddings from interaction data
- Serving = score and rank
LLM-based personalization
- Personalization = assemble context + instructions
- Serving = reason and generate a response
The injected context may include:
- Stable user attributes (preferences, locale, plan tier)
- Recent behavioral signals
- Conversation state
- Business rules and eligibility constraints
- Relevant product or content snippets
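A minimal sketch of what "assemble context + instructions" can look like in practice. All names here (field names, the character budget standing in for a token budget, the sensitive-field list) are illustrative assumptions, not a prescribed schema:

```python
import json

MAX_CONTEXT_CHARS = 2000                 # stand-in for a real token budget
SENSITIVE = {"email", "payment_method"}  # fields redacted before prompting

def assemble_prompt(profile: dict, recent_events: list[str], rules: list[str]) -> str:
    """Serialize governed context into a stable schema, redact, and truncate."""
    safe_profile = {k: v for k, v in profile.items() if k not in SENSITIVE}
    context = json.dumps(
        {"profile": safe_profile, "recent": recent_events[-5:], "rules": rules},
        indent=2,
    )[:MAX_CONTEXT_CHARS]                # enforce the budget after redaction
    return f"Use only this context. Say 'unknown' if a fact is missing.\n{context}"

prompt = assemble_prompt(
    {"plan_tier": "pro", "locale": "en-IS", "email": "x@example.com"},
    ["viewed: trail shoes", "searched: waterproof jacket"],
    ["Never recommend out-of-stock items"],
)
```

The key design choice is that redaction and budgeting happen in deterministic code, outside the model, so they are testable and auditable.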
Example
- MF-style: “Recommend top-N running shoes.”
- LLM-style: “I’m training for a half marathon, have shin splints, prefer wide toe boxes, and it’s winter.”
Now personalization includes:
- Deciding whether to ask a clarifying question
- Selecting relevant candidates
- Explaining tradeoffs
- Remembering confirmed preferences
This is well outside the scope of classical recommenders.
Architecture: Personalization as a context service
If MF-era personalization was “build a recommender,” LLM-era personalization is “build a context pipeline.”
Core components
1. Signals & Profiles: Behavioral events, identity resolution, computed traits, eligibility flags—typically powered by a CDP like RudderStack.
2. Retrieval (RAG for personalization): Fetch relevant user context, item content, policies, and prior interaction history.
3. Prompt assembly (policy-aware): Serialize context into stable schemas, enforce token budgets, redact sensitive fields, and attach rules.
4. Generation + Tools: LLMs generate responses and invoke tools (inventory search, eligibility checks, pricing, profile fetches).
5. Memory (short- vs long-term): Session memory vs confirmed preferences. Long-term memory should never be written by the LLM without validation.
6. Observability & Evaluation: Log prompts, retrieved context, tool calls, and outcomes. Measure not just clicks, but also helpfulness, safety, and task completion.
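The memory rule in component 5 (never let the LLM write long-term memory without validation) can be enforced with a thin gate. This is a hypothetical sketch; the allowed-key set and confirmation flag are illustrative:

```python
from dataclasses import dataclass

# Explicit schema: only these preference keys may ever be persisted.
ALLOWED_KEYS = {"shoe_width", "budget_max", "preferred_language"}

@dataclass
class PreferenceUpdate:
    key: str
    value: str
    confirmed_by_user: bool  # the user said yes, not just the model

def write_preference(store: dict, update: PreferenceUpdate) -> bool:
    """Persist only schema-valid, user-confirmed preferences."""
    if update.key not in ALLOWED_KEYS or not update.confirmed_by_user:
        return False
    store[update.key] = update.value
    return True

profile: dict = {}
write_preference(profile, PreferenceUpdate("shoe_width", "wide", True))   # persisted
write_preference(profile, PreferenceUpdate("vibe", "cozy", True))         # rejected: not in schema
write_preference(profile, PreferenceUpdate("budget_max", "150", False))   # rejected: unconfirmed
```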
Why LLMs don’t replace classical models
LLMs are powerful. But they are also slower and more expensive per request. Classical models remain essential.
Candidate generation vs decisioning
- Use MF / two-tower models to generate 200–2000 candidates cheaply.
- Use LLMs to reason over the final set.
This mirrors modern search: retrieval → ranking → response generation.
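The retrieval → reasoning split can be sketched as follows. The embedding retrieval is real (a dot product over the catalog); the reranker is a mock placeholder where an LLM call with constraints would go:

```python
import numpy as np

rng = np.random.default_rng(1)
item_vecs = rng.normal(size=(5000, 32))  # hypothetical catalog embeddings

def retrieve(user_vec: np.ndarray, k: int = 200) -> list[int]:
    """Cheap stage: narrow 5000 items to k candidates via dot products."""
    scores = item_vecs @ user_vec
    return np.argsort(-scores)[:k].tolist()

def llm_rerank(candidates: list[int], constraints: dict) -> list[int]:
    """Placeholder for the expensive stage: an LLM reasoning over a small
    candidate set against constraints. Here, a mock filter stands in."""
    return [c for c in candidates if c % 2 == 0][:10]

final = llm_rerank(retrieve(rng.normal(size=32)), {"budget": 120})
```

The economics follow directly: the LLM only ever sees a few hundred candidates, never the full catalog.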
Cold start and sparse history
LLMs excel with:
- a single message
- a few clarifying questions
- sparse profile attributes
MF typically needs historical data to perform well.
Hybrid architectures: Where MF and LLMs meet
The winning systems are hybrid.
Retrieval → Re-ranking → Generation
- Classical retrieval for scale
- LLM-based rankers for precision
- LLMs for explanation, decisioning, and next-best action
LLMs as feature generators
An increasingly important interface is upstream:
- Extract structured preferences from conversations
- Normalize messy text into auditable features
- Summarize recent behavior into compact signals
- Generate embeddings aligned for retrieval and generation
These features feed classical models that are cheaper, auditable, and predictable at scale.
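As a toy illustration of this upstream interface, here is a rule-based extractor producing auditable features from conversation text. In practice an LLM with a strict output schema would fill these fields; the field names are illustrative assumptions:

```python
import re

def extract_preferences(message: str) -> dict:
    """Toy extractor: messy text in, normalized features out.
    A production version would be an LLM call validated against a schema."""
    prefs = {}
    if m := re.search(r"under \$?(\d+)", message):
        prefs["budget_max"] = int(m.group(1))
    if "wide" in message.lower():
        prefs["shoe_width"] = "wide"
    return prefs

features = extract_preferences("Something under $120, wide toe box please")
# features -> {"budget_max": 120, "shoe_width": "wide"}
```

Because the output is a flat, typed dictionary, it can be logged, audited, and fed straight into a classical model as features.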
New failure modes (and why they matter)
LLM-based personalization introduces new risks:
- Hallucinated personalization: Mitigate with strict grounding and “unknown” behaviors.
- Prompt injection & data leakage: Mitigate with data minimization, policy separation, and tool-based access.
- Preference drift: Use explicit schemas and promote preferences only after confirmation.
- Harder evaluation: Move beyond clicks to layered evaluation: offline tests, online guardrails, outcome metrics.
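The first mitigation, strict grounding with an explicit "unknown" behavior, can be enforced outside the model. A hypothetical sketch: every fact the response relies on must appear in the retrieved context, or the system falls back to a safe refusal:

```python
def grounded_answer(claimed_facts: dict, retrieved_context: dict) -> str:
    """Allow personalization only on facts present in retrieved context."""
    unsupported = [k for k, v in claimed_facts.items()
                   if retrieved_context.get(k) != v]
    if unsupported:
        # Explicit "unknown" behavior instead of hallucinated personalization
        return "I don't have confirmed information about: " + ", ".join(unsupported)
    return "Personalized response using: " + ", ".join(claimed_facts)

ctx = {"plan_tier": "pro"}
grounded_answer({"plan_tier": "pro"}, ctx)                       # grounded: proceeds
grounded_answer({"plan_tier": "pro", "city": "Reykjavik"}, ctx)  # ungrounded: refuses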
Closing: Where personalization is heading
Personalization is converging toward systems that orchestrate actions, not just rank items; that adapt to situational intent, not static segments; and that derive advantage less from model choice and more from fresh, governed, real-time context.
The future belongs to hybrid systems:
- Classical models for scale, efficiency, and control
- LLMs for reasoning, synthesis, and interaction
- Strong interfaces between the two
Matrix factorization will remain the engine for what could work.
LLMs will increasingly decide what should happen next, how to say it, and why. And those decisions will be grounded in context, constrained by rules, and optimized for outcomes, not just clicks.
FAQs
What is matrix factorization, and why is it still used for personalization?
Matrix factorization is a collaborative filtering method that learns compact user and item representations (embeddings) from interaction data, then ranks items via fast similarity scoring. It is still widely used because it is scalable, stable, and easy to evaluate with A/B tests, CTR, and conversion metrics.
What is LLM-based personalization?
LLM-based personalization is the use of a large language model to tailor responses or actions using retrieved user context, recent behavior, and business rules. Instead of only producing a ranked list, the LLM can reason about intent and constraints, ask clarifying questions, and generate explanations or next-best actions.
Do LLMs replace recommender systems?
Usually, no. LLMs tend to be slower and more expensive than classical retrieval models. Many high-performing systems use traditional recommenders for candidate generation and then use LLMs for reranking, explanation, and workflow-oriented decisioning over a smaller candidate set.
What does a hybrid personalization architecture look like in practice?
A common pattern is retrieval → reranking → generation. Retrieval uses embeddings (MF or two-tower) to produce a few hundred to a few thousand candidates cheaply. Reranking applies richer criteria (constraints, policies, diversity). Generation uses the LLM to explain tradeoffs, confirm preferences, and choose next steps with tool calls.
How do you keep LLM personalization safe from prompt injection and data leakage?
Treat external inputs as untrusted, separate system rules from user-provided content, minimize the context you expose, and use tool-based access controls for sensitive operations. Add monitoring for prompt injection attempts and leakage signals, and design “unknown” and refusal behaviors when the model lacks grounded evidence.
What data do you need for real-time LLM personalization?
You need a reliable identity and profile layer (who the user is), fresh behavioral signals (what just happened), and governed business context (eligibility, policies, inventory, pricing). The key is not collecting more data, but delivering the right, policy-safe context at low latency to the model.
Published:
December 22, 2025