How to track AI product usage without exposing sensitive data

Teams deploying AI features—chatbots, assistants, copilots—face a common challenge: measuring their actual impact. Without proper instrumentation, it's difficult to answer fundamental questions such as who's using these features, what users are trying to accomplish, whether the AI is actually helping, and where to focus improvement efforts. While API calls and costs are visible, connecting AI interactions to user behavior and business outcomes requires additional, structured tracking.
The first problem is that AI interactions are conversational and unstructured, making them hard to analyze at scale. Second, raw user prompts often contain sensitive data that can't be stored directly in analytics systems. This creates a measurement gap: teams know their AI features are technically operational but lack the data infrastructure to optimize them, prove value, or even understand basic usage patterns.
This guide presents a standardized event schema for tracking AI interactions that integrates with existing data warehouses and customer data platforms. We define three core events that capture AI interactions, plus an approach to intent classification that preserves privacy.
By the end of this guide, you should be able to adapt our implementation, code examples, and best practices to quickly set up analytics for your AI product.
A standard schema for tracking AI features
Without a standard approach, different teams instrument AI features differently, using inconsistent event names, properties, and structures. Data analysts end up writing custom queries for each AI product, dashboards can't show unified metrics, and cross-product comparison becomes nearly impossible.
Similar to RudderStack’s E-commerce Spec and Video Spec, we address this challenge with an AI Product Spec. This specification defines a standard approach for tracking AI products. It consists of three core events plus optional intent classification for privacy-preserving analytics, described below:
| Event name | Description |
|---|---|
| ai_user_prompt_created | Tracks when a user submits a prompt to your AI system |
| ai_llm_response_received | Captures AI system responses and performance metrics |
| ai_user_action | Measures user interactions with AI responses |
Event 1: ai_user_prompt_created (capture prompts)
Tracks when a user submits a prompt (query) to your AI system.
When to track: Immediately after the user submits their query
Implementation example:
Note: This example uses the RudderStack JavaScript SDK. You can use any event tracking library that follows a comparable event spec (e.g., Segment's) and lets you add these new properties on top of it.
```javascript
rudderanalytics.track('ai_user_prompt_created', {
  conversation_id: 'conv_123', // Unique identifier for the conversation
  prompt_text: 'What are your return policies?',
  input_method: 'text', // 'text', 'voice', 'button'
  // Add any custom properties relevant to your product
});
```
Key Properties:
- conversation_id: Required. Links all events in a conversation
- prompt_text: The actual user input (consider privacy implications)
Event 2: ai_llm_response_received (track LLM output)
Captures AI system responses and performance metrics.
When to track: After receiving a response from the LLM
Implementation example:
```javascript
rudderanalytics.track('ai_llm_response_received', {
  conversation_id: 'conv_123',
  response_text: 'Our return policy allows returns within 30 days...',
  response_status: 'success', // 'success', 'error', 'timeout'
  latency_ms: 1250,
  token_count: { // Optional
    prompt_tokens: 150,
    completion_tokens: 200,
    total_tokens: 350
  },
  cost: 0.021,
  model_used: 'gpt-4',
  // Optional: Add if using intent classification
  classified_intent: 'support_returns'
});
```
Key Properties:
- conversation_id: Must match the prompt event
- latency_ms: Time from request to response
- token_count: For usage tracking and cost analysis
- response_status: Track failures and timeouts
Event 3: ai_user_action (measure user feedback)
Measures user interactions with AI responses, such as upvoting, copying the response, or sharing.
When to track: When the user takes any action on an LLM response
Implementation example:
```javascript
rudderanalytics.track('ai_user_action', {
  conversation_id: 'conv_123',
  action_type: 'feedback_given', // 'feedback_given', 'shared', 'reported', 'copied_response', 'regenerated', etc.
  action_details: {
    // For feedback_given:
    feedback_type: 'rating',
    feedback_value: 4,
    feedback_text: 'Helpful but could be more specific'
    // For other actions, include the relevant details
  },
  classified_intent: 'support_returns'
});
```
Key Properties:
- action_type: Type of user action
- action_details: Properties for the action, for example:
  - feedback_type: The mechanism of feedback
  - feedback_value: Quantifiable feedback metric
Sessions vs conversations in AI interactions
It is important to distinguish between sessions and conversations in AI interactions. A session represents the automatic browser or app session, which typically spans from when a user opens your application until they close it or remain inactive for a certain period. Session tracking is standard functionality that almost every product implements using tools such as the RudderStack SDKs.
A conversation, on the other hand, is a specific AI interaction thread that you manage explicitly through the `conversation_id` parameter. Traditional analytics specifications do not have this concept.
A single session may contain multiple conversations. For example, a user might start by asking about product features, complete that conversation, then later in the same session start a new conversation about pricing. Each would have its own `conversation_id` while sharing the same `session_id`. This distinction allows you to analyze both the broader user journey (session level) and specific AI interaction patterns (conversation level).
How to link AI product events within a conversation
As mentioned earlier, conversation_id is critical for connecting related events within a conversation. Here's how to implement it: use the same conversation_id for every event in the thread.
```javascript
// Start of conversation
const conversationId = generateUUID();

// Track all events with the same ID
rudderanalytics.track('ai_user_prompt_created', {
  conversation_id: conversationId,
  // ...
});

rudderanalytics.track('ai_llm_response_received', {
  conversation_id: conversationId,
  // ...
});

// Multiple turns in the same conversation
rudderanalytics.track('ai_user_prompt_created', {
  conversation_id: conversationId,
  prompt_number: 2,
  // ...
});
```
Classify user intent without storing raw prompts
We recommend implementing intent classification for privacy-preserving analytics. In other words, instead of sending raw prompts to all downstream tools, classify prompts into business intents and forward only the classifications.
Why intent classification matters for privacy and analytics
In a traditional system, the user’s intent is generally inferred from the button they click or the page URL they visit. In AI-powered features, you need the user’s chat message to understand their intent.
But can you send this raw user message to your downstream analytics systems to understand the intent?
Raw prompts often contain sensitive information that shouldn't be stored in warehouses or sent to analytics tools. Intent classification provides a solution.
For example, instead of storing "I need help resetting my password for john.doe@company.com," you store the intent "account_management." This gives you actionable analytics while protecting user privacy.
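As a minimal sketch of this idea (the `redactWithIntent` helper is hypothetical, and the classification step itself is stubbed out), the event that reaches your analytics tools keeps the intent but drops the raw text:

```javascript
// Hypothetical helper: return a copy of the event with the raw prompt
// replaced by its classified intent. The original event is not mutated.
function redactWithIntent(event, intent) {
  const { prompt_text, ...rest } = event.properties;
  return {
    ...event,
    properties: { ...rest, classified_intent: intent },
  };
}

const raw = {
  event: 'ai_user_prompt_created',
  properties: {
    conversation_id: 'conv_123',
    prompt_text: 'I need help resetting my password for john.doe@company.com',
  },
};

// Classification would happen here; we assume it returned 'account_management'
const safe = redactWithIntent(raw, 'account_management');
console.log(safe.properties.classified_intent); // 'account_management'
console.log('prompt_text' in safe.properties);  // false
```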
To remain compliant, you'll need to balance analytics depth with privacy requirements, and intent classification provides exactly that balance. It transforms unstructured conversation data into actionable business intelligence while protecting user privacy: you track structured intents that enable powerful analytics like user journey mapping, conversion funnels, and cohort analysis.
This approach dramatically reduces data volume and query complexity. Write simple SQL against intent fields instead of parsing millions of text prompts. Most importantly, intent patterns reveal what your users are trying to accomplish, helping you identify feature gaps, optimize AI responses, and prioritize product development based on actual usage patterns rather than guesswork.
How to implement intent classification
In this section, we walk through setting up intent classification in a few quick steps. In this example, we use OpenRouter for the LLM API call that extracts intent from the user message, and RudderStack Transformations to make that API request and enrich the incoming event with the intent before delivering it to the warehouse and other downstream tools. You can follow a similar strategy with other tools.
1. Plan and document intents
- Define 5-10 mutually exclusive intents
- Include an "other" category
- Write clear descriptions for each
2. Get OpenRouter API key
- Sign up at openrouter.ai
- Add credits to your account
- Copy your API key
3. Choose the right model: Intent classification is an extremely simple task for modern LLMs - even basic models can classify intents accurately. Using GPT-4 or Claude for classification is like using a Ferrari for grocery shopping. Check OpenRouter's pricing page and start with cheaper models like openai/gpt-5-nano ($0.05/1M tokens), mistralai/mistral-7b-instruct ($0.25/1M tokens), or even meta-llama/llama-3-8b-instruct hosted on your own infrastructure. These smaller models can handle classification just as well as flagship models at a fraction of the cost. For such a simple task, you don't even need an LLM; a fine-tuned DistilBERT model will work. Test different models with your actual prompts to find the sweet spot between accuracy and cost.
4. Experiment with your classification prompt: The prompt you use significantly impacts accuracy. You can use tools like the OpenAI Playground to test your prompt without writing code. When you move into code, try adding different kinds of business context, but keep in mind that too much context might confuse smaller models. Some models work better with examples, others with just descriptions. You may also want to maintain different versions of the prompt in your code:
```javascript
// Test different prompt structures
const promptV1 = `Classify into: ${intents.join(', ')}`;
const promptV2 = `Given these intents and descriptions: ${intentDescriptions}, classify the user message`;
const promptV3 = `You are a classifier for ${businessContext}. Choose the best matching intent...`;
// Track accuracy for each version with sample data
```
5. Enrich AI events with classified intent using Transformations
With RudderStack Transformations, you can use simple JavaScript code to process incoming customer event data before they are delivered to the warehouse or other downstream tools. So, you may leverage this functionality to make the LLM API call for the intent extraction and enrich the event data with the intent. Here’s how:
- Go to Transformations in RudderStack
- Create new transformation with the code below
- Add your OpenRouter API key in the transformation code and customize intents
- Connect the transformation to the sensitive destinations where you plan to send this event data
- Keep raw data for development/debugging environments
Want to see Transformations in action, or do a RudderStack deep dive? Book a demo
Transformation code
Use this RudderStack transformation to add intent classification:
```javascript
// Add as a transformation in RudderStack
// Attach to destinations where you want classified data
export async function transformEvent(event, metadata) {
  // Only process prompt events
  if (event.event !== 'ai_user_prompt_created') {
    return event;
  }
  if (!event.properties?.prompt_text) {
    return event;
  }

  // Define your business-specific intents
  const intents = defineIntents();
  const businessContext = getBusinessContext();

  try {
    // Classify the prompt
    const classified_intent = await classifyPrompt(
      event.properties.prompt_text,
      intents,
      businessContext
    );

    // Add classification to the event
    event.properties.classified_intent = classified_intent;

    // Remove raw prompt for privacy (optional)
    if (shouldRemovePrompt(metadata.destination)) {
      delete event.properties.prompt_text;
      delete event.properties.response_text;
    }
  } catch (error) {
    log('Classification failed:', error);
    event.properties.classified_intent = 'classification_error';
  }

  return event;
}

function defineIntents() {
  // Customize these for your business
  return [
    { name: 'product_inquiry', description: 'Questions about products' },
    { name: 'support_shipping', description: 'Shipping and delivery' },
    { name: 'support_returns', description: 'Returns and refunds' },
    { name: 'technical_support', description: 'Technical issues' },
    { name: 'account_management', description: 'Account related' },
    { name: 'other', description: 'Uncategorized' }
  ];
}

function getBusinessContext() {
  // Generate using a tool like Firecrawl
  return "E-commerce platform specializing in consumer electronics";
}

async function classifyPrompt(prompt, intents, context) {
  const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_OPENROUTER_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'openai/gpt-3.5-turbo',
      messages: [
        {
          role: 'system',
          content: `Context: ${context}\nClassify this prompt into ONLY one of these intents:\n${intents.map(i => `${i.name}: ${i.description}`).join('\n')}\nRespond with just the intent name.`
        },
        {
          role: 'user',
          content: prompt
        }
      ],
      max_tokens: 20,
      temperature: 0.3
    })
  });

  const data = await response.json();
  return data.choices[0].message.content.trim().toLowerCase().replace(/\s+/g, '_');
}

function shouldRemovePrompt(destination) {
  // Remove prompts for the warehouse - write your own logic here
  const sensitiveDestinations = ['warehouse'];
  return sensitiveDestinations.includes(destination?.type);
}
```
⚠️ Important
Before rolling out to all users, calculate your expected costs. For example: if you have 1,000 users generating 100 queries/day at 200 tokens per classification, that is 1,000 × 100 × 200 = 20M tokens/day. With openai/gpt-5-nano ($0.05/1M tokens), that costs $1/day. Cached hits are 10 times cheaper, so the actual cost can be considerably lower; for budgeting purposes, treat the uncached figure as the upper bound under these assumptions. It is recommended to set up expense limits and cost monitoring alerts.
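The same arithmetic can be wrapped in a small helper for budget planning (the function name and inputs below are illustrative, not part of the spec):

```javascript
// Back-of-the-envelope daily cost estimate for intent classification.
// All inputs are illustrative; plug in your own traffic and pricing numbers.
function estimateDailyClassificationCost({ users, queriesPerUserPerDay, tokensPerClassification, pricePerMillionTokens }) {
  const tokensPerDay = users * queriesPerUserPerDay * tokensPerClassification;
  return {
    tokensPerDay,
    costPerDay: (tokensPerDay / 1_000_000) * pricePerMillionTokens,
  };
}

const estimate = estimateDailyClassificationCost({
  users: 1000,
  queriesPerUserPerDay: 100,
  tokensPerClassification: 200,
  pricePerMillionTokens: 0.05, // gpt-5-nano example rate from above
});
console.log(estimate.tokensPerDay); // 20000000 tokens/day
console.log(estimate.costPerDay);   // 1, i.e. $1/day
```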
SQL queries to analyze AI feature usage
In this section, we provide example SQL queries for some of the key metrics you may want to track. If you follow the AI Product Spec and store your analytics data in a warehouse, you can use these queries with little or no modification. RudderStack SDKs already follow the spec, so RudderStack users can use these queries directly.
💡 Tip
If you’re in doubt, reach out to the RudderStack support community on Slack.
Track AI usage: Engagement metrics
```sql
-- Conversations per user
SELECT
  user_id,
  COUNT(DISTINCT conversation_id) as total_conversations,
  AVG(prompts_per_conversation) as avg_conversation_length
FROM ai_events
GROUP BY user_id;

-- Intent distribution
SELECT
  classified_intent,
  COUNT(*) as count,
  COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () as percentage
FROM ai_user_prompt_created
GROUP BY classified_intent
ORDER BY count DESC;
```
Track AI performance: Latency and cost
```sql
-- Average latency by intent
SELECT
  classified_intent,
  AVG(latency_ms) as avg_latency,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) as p95_latency
FROM ai_llm_response_received
GROUP BY classified_intent;

-- Cost analysis
SELECT
  DATE_TRUNC('day', timestamp) as day,
  SUM(cost) as total_cost,
  SUM(token_count.total_tokens) as total_tokens
FROM ai_llm_response_received
GROUP BY day;
```
Track AI quality: Ratings and feedback
```sql
-- Satisfaction by intent
SELECT
  classified_intent,
  AVG(CASE WHEN feedback_type = 'rating' THEN feedback_value END) as avg_rating,
  SUM(CASE WHEN feedback_type = 'thumbs' AND feedback_value = true THEN 1 ELSE 0 END) * 100.0 /
    COUNT(CASE WHEN feedback_type = 'thumbs' THEN 1 END) as thumbs_up_rate
FROM ai_user_action
GROUP BY classified_intent;
```
What to include in your AI analytics dashboard
For real-time monitoring, build dashboards that track active conversations, response latency compared to historical baselines, and error rates broken down by model. This helps you quickly identify and respond to performance issues as they happen.
To understand user behavior, create visualizations showing intent distribution through charts, conversation flows using sankey diagrams to see how users move between intents, and identify peak usage times to optimize resource allocation and support coverage.
Cost management becomes critical at scale, so track token usage by different models, calculate cost per conversation to understand unit economics, and segment costs by user type to identify which segments drive the most AI usage and ensure pricing aligns with value delivered.
Here’s a sample dashboard we created using Claude Artifacts:

Best practices for rolling out AI tracking
Start with a simple implementation by tracking the three core events first. Once you understand your usage patterns and see what questions you need to answer, you can add intent classification and expand the event properties based on actual analytical needs rather than trying to anticipate everything upfront.
Track errors to improve LLM reliability
Error handling is crucial for maintaining data quality. Always track both successful and failed AI interactions to understand your true system performance:
```javascript
try {
  const response = await callLLM(prompt);
  rudderanalytics.track('ai_llm_response_received', {
    conversation_id: conversationId,
    response_status: 'success',
    // ...
  });
} catch (error) {
  rudderanalytics.track('ai_llm_response_received', {
    conversation_id: conversationId,
    response_status: 'error',
    error_type: error.name,
    error_message: error.message,
    // ...
  });
}
```
Privacy in production, debugging in development
Privacy considerations should be built in from the start, not added later. Decide early what data you'll track versus what you'll classify into intents, implement intent classification for sensitive industries like healthcare or finance, and clearly document your data retention policies for both internal teams and compliance purposes.
During development and testing, use environment-specific configurations to control data sensitivity. This allows you to debug with full data in development while protecting user privacy in production:
```javascript
// Debug mode for development
const DEBUG = process.env.NODE_ENV === 'development';

rudderanalytics.track('ai_user_prompt_created', {
  conversation_id: conversationId,
  prompt_text: DEBUG ? prompt : '[redacted]',
  debug_mode: DEBUG,
  // ...
});
```
Use sampling to keep costs and latency in check
Intent classification at high volumes requires a strategic sampling approach. While classifying every prompt might work initially, as your AI feature gains adoption and volumes increase, the cost and latency of calling LLMs for every single event becomes prohibitive. Implement sampling strategies that balance comprehensive insights with operational efficiency.
Consider sampling by user ID to get complete conversation flows for a subset of users rather than fragmented data across all users. You might classify all prompts for users where hash(userId) % 100 < 10 to get a consistent 10% sample. Alternatively, prioritize classification for specific cohorts that matter most to your business - new users in their first week to understand onboarding patterns, high-value enterprise customers who need detailed analytics, or users in a specific geographic region you're expanding into.
Here's how to implement intelligent sampling in your transformation:
```javascript
function shouldClassifyIntent(event, metadata) {
  // Assumed helper: looks up the user's profile (plan, signup date)
  const profile = fetchUserProfile(event.userId);

  // Always classify for high-value customers
  if (profile?.plan === 'enterprise') {
    return true;
  }

  // Classify all events for new users (first 7 days)
  const daysSinceSignup = (Date.now() - profile?.created_at) / (1000 * 60 * 60 * 24);
  if (daysSinceSignup <= 7) {
    return true;
  }

  // Sample 10% of other users based on a consistent hash
  const userIdHash = simpleHash(event.userId);
  if (userIdHash % 100 < 10) {
    return true;
  }

  // Skip classification for this event
  return false;
}

// Simple deterministic string hash used for consistent per-user sampling
function simpleHash(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0;
  }
  return Math.abs(hash);
}

// In your transformation
export async function transformEvent(event, metadata) {
  if (event.event !== 'ai_user_prompt_created') {
    return event;
  }

  // Add the sampling decision to the event for analysis
  event.properties.intent_sampled = shouldClassifyIntent(event, metadata);

  if (!event.properties.intent_sampled) {
    event.properties.classified_intent = 'not_sampled';
    return event;
  }

  // Continue with classification (see the transformation code above)...
  return event;
}
```
Track your sampling rate and adjust based on business needs. Start with a higher sampling rate (maybe 50%) to establish baseline patterns, then reduce as patterns stabilize.
Always maintain 100% classification for critical user segments and consider time-based sampling during peak hours versus off-hours to optimize costs while maintaining visibility when it matters most.
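A self-contained sketch of such a tunable sampler (the hash and `ratePercent` parameter are illustrative): because the hash is deterministic per user, lowering the rate from 50% to 10% keeps the 10% cohort a strict subset of the 50% cohort, so sampled users retain complete conversation histories.

```javascript
// Deterministic 32-bit string hash: the same userId always maps to the
// same bucket, so a user is consistently in or out of the sample.
function simpleHash(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0;
  }
  return Math.abs(hash);
}

// Sample users at a configurable rate (0-100). Users in low buckets stay
// sampled as the rate is reduced, preserving their full history.
function isSampled(userId, ratePercent) {
  return simpleHash(userId) % 100 < ratePercent;
}

// The decision is stable across calls for a given user and rate
console.log(isSampled('user_42', 50) === isSampled('user_42', 50)); // true
```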
Your framework for privacy-safe AI analytics
In this guide, we introduced a standard schema for AI product analytics events. We leveraged this standard to implement analytics for AI products using the RudderStack JavaScript SDK and Transformations. Our implementation used an LLM to classify intent, ensuring the tracking remains privacy-compliant. We provided code examples and best practices that you can adapt for your use case. Whether you’re building chatbots, AI assistants, or LLM-powered products, this guide provides a framework for tracking and analyzing AI interactions while maintaining user privacy.
This guide is part of RudderStack’s early alpha program for AI Product Analytics. We’re actively developing native platform support for intent classification and would love your feedback on this specification.
Published:
October 9, 2025
