The future of personalization: From matrix factorization to prompt-personalized LLMs

Personalization is undergoing a fundamental shift.
For years, the core question was: “What item should we show this user?”
Increasingly, the question is: “What should this system do or say for this specific person, right now?”
This mirrors what happened in search a decade ago, moving from pure retrieval toward systems that reason about intent, context, and constraints before responding. The shift matters because personalization is no longer confined to ranked lists. It now shows up inside AI agents, onboarding copilots, customer support workflows, sales assistants, and dynamically generated content.
Traditional recommender systems still matter. They are fast, scalable, and measurable. But they were never designed to reason, converse, or operate across messy, multi-step workflows. Large Language Models (LLMs) change that substrate entirely.
This post compares classical personalization approaches like matrix factorization with LLM-based approaches, and argues that the future is hybrid: classical models for scale and efficiency, LLMs for reasoning and synthesis, and a well-designed interface between the two.
What traditional personalization actually optimizes
Most production personalization systems today follow a familiar loop:
- Collect interaction data (views, clicks, purchases, ratings)
- Learn representations of users and items
- Score candidate items for each user
- Rank and serve results
- Measure outcomes like CTR, conversion, or retention
Matrix factorization (MF) and its descendants—collaborative filtering, two-tower retrieval, learning-to-rank—optimize for a very specific problem: predicting which items a user is most likely to engage with, based on past behavior patterns of other users.
Conceptually, MF clusters users who behaved similarly in the past and items consumed by similar users. At serving time, personalization becomes a fast scoring and ranking problem.
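To make the serving step concrete, here is a minimal sketch of MF-style scoring. The factor matrices are random stand-ins for what would actually be learned from interaction data; the point is that personalization at serving time reduces to a dot product and a sort.

```python
import numpy as np

# Hypothetical learned factors; in production these come from training
# on the user-item interaction matrix, not from a random generator.
rng = np.random.default_rng(0)
n_users, n_items, k = 100, 500, 16
user_factors = rng.normal(size=(n_users, k))
item_factors = rng.normal(size=(n_items, k))

def top_n(user_id: int, n: int = 10) -> list[int]:
    """Score every item for one user with a dot product, return the best N."""
    scores = item_factors @ user_factors[user_id]
    return np.argsort(-scores)[:n].tolist()

recommendations = top_n(user_id=42, n=5)  # five item indices, best first
```

This is why the "extreme scalability" point below holds: scoring an entire catalog is one matrix-vector product, which is trivially batched and cached.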
Why this worked so well
- Extreme scalability: Embeddings + dot products are cheap
- Clear metrics: Improvements are easy to A/B test
- Predictable behavior: Incremental model changes yield incremental effects
- Strong retrieval: “People like you also liked X” often works
Where it breaks down
- Cold start: New users and items lack signal
- Feature poverty: Rich context (intent, constraints, conversation history) is hard to encode
- Single-objective focus: Typically optimizes engagement in one surface
- No reasoning or explanation: Cannot ask clarifying questions or explain trade-offs
Traditional personalization excels when the surface area is a ranked list. But modern personalization increasingly lives inside conversations and workflows, not feeds.
What LLMs enable: Personalization as reasoning and generation
LLMs fundamentally change what personalization can mean.
Rather than predicting a score, LLMs can:
- Interpret natural-language intent (“I need something warm but not bulky for Iceland”)
- Respect constraints (budget, eligibility, compliance)
- Operate in multi-turn interactions, asking clarifying questions
- Combine heterogeneous data (events, CRM records, tickets, catalogs, policies)
- Generate outputs: explanations, emails, plans, not just item IDs
Personalization shifts from ranking to reasoning + generation.
LLMs as a new primitive
In most LLM-based personalization systems, what changes is the prompt and retrieved context.
Traditional MF
- Personalization = learned embeddings from interaction data
- Serving = score and rank
LLM-based personalization
- Personalization = assemble context + instructions
- Serving = reason and generate a response
The injected context may include:
- Stable user attributes (preferences, locale, plan tier)
- Recent behavioral signals
- Conversation state
- Business rules and eligibility constraints
- Relevant product or content snippets
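A minimal sketch of what "assemble context + instructions" can look like in practice. All names here (field names, the character budget standing in for a token budget, the sensitive-field list) are illustrative assumptions, not a prescribed schema:

```python
import json

MAX_CONTEXT_CHARS = 2000                 # stand-in for a real token budget
SENSITIVE = {"email", "payment_method"}  # fields redacted before prompting

def assemble_prompt(profile: dict, recent_events: list[str], rules: list[str]) -> str:
    """Serialize governed context into a stable schema, redact, and truncate."""
    safe_profile = {k: v for k, v in profile.items() if k not in SENSITIVE}
    context = json.dumps(
        {"profile": safe_profile, "recent": recent_events[-5:], "rules": rules},
        indent=2,
    )[:MAX_CONTEXT_CHARS]                # enforce the budget after redaction
    return f"Use only this context. Say 'unknown' if a fact is missing.\n{context}"

prompt = assemble_prompt(
    {"plan_tier": "pro", "locale": "en-IS", "email": "x@example.com"},
    ["viewed: trail shoes", "searched: waterproof jacket"],
    ["Never recommend out-of-stock items"],
)
```

The key design choice is that redaction and budgeting happen in deterministic code, outside the model, so they are testable and auditable.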
Example
- MF-style: “Recommend top-N running shoes.”
- LLM-style: “I’m training for a half marathon, have shin splints, prefer wide toe boxes, and it’s winter.”
Now personalization includes:
- Deciding whether to ask a clarifying question
- Selecting relevant candidates
- Explaining tradeoffs
- Remembering confirmed preferences
This is well outside the scope of classical recommenders.
Architecture: Personalization as a context service
If MF-era personalization was “build a recommender,” LLM-era personalization is “build a context pipeline.”
Core components
1. Signals & Profiles: Behavioral events, identity resolution, computed traits, eligibility flags—typically powered by a CDP like RudderStack.
2. Retrieval (RAG for personalization): Fetch relevant user context, item content, policies, and prior interaction history.
3. Prompt assembly (policy-aware): Serialize context into stable schemas, enforce token budgets, redact sensitive fields, and attach rules.
4. Generation + Tools: LLMs generate responses and invoke tools (inventory search, eligibility checks, pricing, profile fetches).
5. Memory (short- vs long-term): Session memory vs confirmed preferences. Long-term memory should never be written by the LLM without validation.
6. Observability & Evaluation: Log prompts, retrieved context, tool calls, and outcomes. Measure not just clicks, but also helpfulness, safety, and task completion.
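The memory rule in component 5 (never let the LLM write long-term memory without validation) can be enforced with a thin gate. This is a hypothetical sketch; the allowed-key set and confirmation flag are illustrative:

```python
from dataclasses import dataclass

# Explicit schema: only these preference keys may ever be persisted.
ALLOWED_KEYS = {"shoe_width", "budget_max", "preferred_language"}

@dataclass
class PreferenceUpdate:
    key: str
    value: str
    confirmed_by_user: bool  # the user said yes, not just the model

def write_preference(store: dict, update: PreferenceUpdate) -> bool:
    """Persist only schema-valid, user-confirmed preferences."""
    if update.key not in ALLOWED_KEYS or not update.confirmed_by_user:
        return False
    store[update.key] = update.value
    return True

profile: dict = {}
write_preference(profile, PreferenceUpdate("shoe_width", "wide", True))   # persisted
write_preference(profile, PreferenceUpdate("vibe", "cozy", True))         # rejected: not in schema
write_preference(profile, PreferenceUpdate("budget_max", "150", False))   # rejected: unconfirmed
```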
Why LLMs don’t replace classical models
LLMs are powerful. But they are also slower and more expensive per request. Classical models remain essential.
Candidate generation vs decisioning
- Use MF / two-tower models to generate 200–2000 candidates cheaply.
- Use LLMs to reason over the final set.
This mirrors modern search: retrieval → ranking → response generation.
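The retrieval → reasoning split can be sketched as follows. The embedding retrieval is real (a dot product over the catalog); the reranker is a mock placeholder where an LLM call with constraints would go:

```python
import numpy as np

rng = np.random.default_rng(1)
item_vecs = rng.normal(size=(5000, 32))  # hypothetical catalog embeddings

def retrieve(user_vec: np.ndarray, k: int = 200) -> list[int]:
    """Cheap stage: narrow 5000 items to k candidates via dot products."""
    scores = item_vecs @ user_vec
    return np.argsort(-scores)[:k].tolist()

def llm_rerank(candidates: list[int], constraints: dict) -> list[int]:
    """Placeholder for the expensive stage: an LLM reasoning over a small
    candidate set against constraints. Here, a mock filter stands in."""
    return [c for c in candidates if c % 2 == 0][:10]

final = llm_rerank(retrieve(rng.normal(size=32)), {"budget": 120})
```

The economics follow directly: the LLM only ever sees a few hundred candidates, never the full catalog.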
Cold start and sparse history
LLMs excel with:
- a single message
- a few clarifying questions
- sparse profile attributes
MF typically needs historical data to perform well.
Hybrid architectures: Where MF and LLMs meet
The winning systems are hybrid.
Retrieval → Re-ranking → Generation
- Classical retrieval for scale
- LLM-based rankers for precision
- LLMs for explanation, decisioning, and next-best action
LLMs as feature generators
An increasingly important interface is upstream:
- Extract structured preferences from conversations
- Normalize messy text into auditable features
- Summarize recent behavior into compact signals
- Generate embeddings aligned for retrieval and generation
These features feed classical models that are cheaper, auditable, and predictable at scale.
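As a toy illustration of this upstream interface, here is a rule-based extractor producing auditable features from conversation text. In practice an LLM with a strict output schema would fill these fields; the field names are illustrative assumptions:

```python
import re

def extract_preferences(message: str) -> dict:
    """Toy extractor: messy text in, normalized features out.
    A production version would be an LLM call validated against a schema."""
    prefs = {}
    if m := re.search(r"under \$?(\d+)", message):
        prefs["budget_max"] = int(m.group(1))
    if "wide" in message.lower():
        prefs["shoe_width"] = "wide"
    return prefs

features = extract_preferences("Something under $120, wide toe box please")
# features -> {"budget_max": 120, "shoe_width": "wide"}
```

Because the output is a flat, typed dictionary, it can be logged, audited, and fed straight into a classical model as features.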
New failure modes (and why they matter)
LLM-based personalization introduces new risks:
- Hallucinated personalization: Mitigate with strict grounding and “unknown” behaviors.
- Prompt injection & data leakage: Mitigate with data minimization, policy separation, and tool-based access.
- Preference drift: Use explicit schemas and promote preferences only after confirmation.
- Harder evaluation: Move beyond clicks to layered evaluation: offline tests, online guardrails, outcome metrics.
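The first mitigation, strict grounding with an explicit "unknown" behavior, can be enforced outside the model. A hypothetical sketch: every fact the response relies on must appear in the retrieved context, or the system falls back to a safe refusal:

```python
def grounded_answer(claimed_facts: dict, retrieved_context: dict) -> str:
    """Allow personalization only on facts present in retrieved context."""
    unsupported = [k for k, v in claimed_facts.items()
                   if retrieved_context.get(k) != v]
    if unsupported:
        # Explicit "unknown" behavior instead of hallucinated personalization
        return "I don't have confirmed information about: " + ", ".join(unsupported)
    return "Personalized response using: " + ", ".join(claimed_facts)

ctx = {"plan_tier": "pro"}
grounded_answer({"plan_tier": "pro"}, ctx)                       # grounded: proceeds
grounded_answer({"plan_tier": "pro", "city": "Reykjavik"}, ctx)  # ungrounded: refuses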
Closing: Where personalization is heading
Personalization is converging toward systems that orchestrate actions, not just rank items; that adapt to situational intent, not static segments; and that derive advantage less from model choice and more from fresh, governed, real-time context.
The future belongs to hybrid systems:
- Classical models for scale, efficiency, and control
- LLMs for reasoning, synthesis, and interaction
- Strong interfaces between the two
Matrix factorization will remain the engine for what could work.
LLMs will increasingly decide what should happen next, how to say it, and why. And those decisions will be grounded in context, constrained by rules, and optimized for outcomes, not just clicks.
FAQs
What is matrix factorization, and why is it still used for personalization?
Matrix factorization is a collaborative filtering method that learns compact user and item representations (embeddings) from interaction data, then ranks items via fast similarity scoring. It is still widely used because it is scalable, stable, and easy to evaluate with A/B tests, CTR, and conversion metrics.
What is LLM-based personalization?
LLM-based personalization is the use of a large language model to tailor responses or actions using retrieved user context, recent behavior, and business rules. Instead of only producing a ranked list, the LLM can reason about intent and constraints, ask clarifying questions, and generate explanations or next-best actions.
Do LLMs replace recommender systems?
Usually, no. LLMs tend to be slower and more expensive than classical retrieval models. Many high-performing systems use traditional recommenders for candidate generation and then use LLMs for reranking, explanation, and workflow-oriented decisioning over a smaller candidate set.
What does a hybrid personalization architecture look like in practice?
A common pattern is retrieval → reranking → generation. Retrieval uses embeddings (MF or two-tower) to produce a few hundred to a few thousand candidates cheaply. Reranking applies richer criteria (constraints, policies, diversity). Generation uses the LLM to explain tradeoffs, confirm preferences, and choose next steps with tool calls.
How do you keep LLM personalization safe from prompt injection and data leakage?
Treat external inputs as untrusted, separate system rules from user-provided content, minimize the context you expose, and use tool-based access controls for sensitive operations. Add monitoring for prompt injection attempts and leakage signals, and design “unknown” and refusal behaviors when the model lacks grounded evidence.
What data do you need for real-time LLM personalization?
You need a reliable identity and profile layer (who the user is), fresh behavioral signals (what just happened), and governed business context (eligibility, policies, inventory, pricing). The key is not collecting more data, but delivering the right, policy-safe context at low latency to the model.
Published:
December 22, 2025