Testing Your Agent#
Before deploying your agent to production, thoroughly test it to ensure accurate, helpful responses. This guide covers testing strategies, common issues, and how to improve your agent's performance.
Why Testing Matters#
Proper testing ensures:
- ✅ Accurate answers - Agent retrieves correct information
- ✅ Appropriate tone - Matches your brand voice
- ✅ Complete responses - Doesn't miss important details
- ✅ Edge-case handling - Responds gracefully to unusual questions
- ✅ Stays on-topic - Doesn't hallucinate or make up answers
Test before launch: Catch issues early before customers interact with your agent.
Testing Methods#
1. Manual Chat Testing#
Best for: Quick verification, exploratory testing
How to test:
- Go to your agent's Settings page
- Click Test Chat or Preview button
- Ask sample questions
- Review responses for accuracy
What to test:
- Common customer questions
- Edge cases and tricky queries
- Different phrasings of the same question
- Questions outside your knowledge base
Example test conversation:
You: How much does the premium plan cost?
Agent: Our premium plan is ₱999/month... ✅
You: Can you ship to Cebu?
Agent: Yes! We ship nationwide via LBC... ✅
You: What's the weather today?
Agent: I don't have information about weather... ✅
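Manual spot checks like the conversation above can be scripted once you have programmatic access to your agent. A minimal sketch, assuming a hypothetical `ask_agent` function (replace the stub with your platform's actual chat API or SDK call):

```python
# Hypothetical stand-in for a call to your agent's chat API.
# Replace with a real request to your platform before using.
def ask_agent(question: str) -> str:
    canned = {
        "How much does the premium plan cost?": "Our premium plan is ₱999/month.",
        "Can you ship to Cebu?": "Yes! We ship nationwide via LBC.",
    }
    return canned.get(question, "I don't have information about that.")

# Each test pairs a question with keywords the answer must contain.
TEST_CASES = [
    ("How much does the premium plan cost?", ["₱999"]),
    ("Can you ship to Cebu?", ["ship"]),
    ("What's the weather today?", ["don't have information"]),
]

def run_tests():
    results = []
    for question, expected_keywords in TEST_CASES:
        answer = ask_agent(question)
        passed = all(kw.lower() in answer.lower() for kw in expected_keywords)
        results.append((question, passed))
    return results

if __name__ == "__main__":
    for question, passed in run_tests():
        print(("PASS" if passed else "FAIL"), question)
```

Keyword checks are deliberately loose: they catch missing facts without failing on harmless rephrasing by the model.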
2. Test Cases Checklist#
Knowledge Base Coverage:
- Basic product information
- Pricing and payment options
- Shipping and delivery
- Returns and refunds
- Technical support questions
Tone & Style:
- Uses appropriate formality
- Code-switches correctly (if applicable)
- Emoji usage matches brand
- Response length is appropriate
Edge Cases:
- Questions outside knowledge base
- Misspelled queries
- Multiple questions in one message
- Follow-up questions requiring context
Language Mixing (Filipino Businesses):
- English questions get English responses
- Tagalog questions get Tagalog responses
- Mixed questions handled naturally
3. Retrieval Testing#
Best for: Debugging why specific content isn't being retrieved
How to test:
- Ask a question that should match a specific source
- Check which sources were retrieved (if your UI shows this)
- Verify the answer matches the source content
Example:
Source: "Our business hours are 9am-5pm Mon-Fri"
Query: "What are your hours?"
Expected: Agent should cite business hours from source
If retrieval fails: See Issue 2 under Common Issues & Solutions below.
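You can reason about which source *should* win a query with a rough stand-in for the platform's search. This sketch uses simple word-overlap (Jaccard) similarity; real platforms rank chunks with embedding-based semantic search, so treat this only as a sanity-check model, and the example sources are hypothetical:

```python
import re

def tokens(text: str) -> set:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Toy knowledge sources standing in for your trained chunks.
SOURCES = [
    "Our business hours are 9am-5pm Mon-Fri",
    "We ship nationwide via LBC within 3-5 days",
    "The premium plan costs 999 pesos per month",
]

def retrieve(query: str, top_k: int = 1):
    """Return the top_k sources most similar to the query."""
    ranked = sorted(SOURCES, key=lambda s: jaccard(query, s), reverse=True)
    return ranked[:top_k]
```

If even this crude overlap model can't connect your query to the intended source (no shared words at all), an embedding search may also struggle; that's a hint to add query-variant Q&A pairs.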
4. Stress Testing#
Best for: Ensuring agent handles high load and unusual inputs
Test scenarios:
- Long questions (500+ words)
- Rapid-fire questions (10 in a row)
- Non-English languages (if applicable)
- Special characters and emojis
- Copy-pasted technical jargon
What to watch for:
- Timeouts or errors
- Degraded response quality
- System crashes
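A rapid-fire scenario can be simulated with a small concurrent harness. This is a sketch: `ask_agent` is a hypothetical stub (swap in your real API call), and the 5-second threshold mirrors the response-time target used later in this guide's checklist:

```python
import time
from concurrent.futures import ThreadPoolExecutor

TIMEOUT_SECONDS = 5.0  # response-time target from the launch checklist

# Hypothetical stand-in; replace with your platform's chat API call.
def ask_agent(question: str) -> str:
    time.sleep(0.01)  # simulate network/model latency
    return f"Answer to: {question}"

def stress_test(questions, max_workers=10):
    """Fire questions concurrently and report per-request timings."""
    def timed(q):
        start = time.perf_counter()
        ask_agent(q)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        timings = list(pool.map(timed, questions))
    slow = [t for t in timings if t > TIMEOUT_SECONDS]
    return {"count": len(timings), "max_seconds": max(timings), "slow": len(slow)}
```

Run it with 10+ questions at once and watch `slow` and `max_seconds`; a rising maximum under concurrency is the first sign of the degradation described above.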
5. User Acceptance Testing (UAT)#
Best for: Final validation before launch
How to run:
- Invite team members or beta testers
- Provide test scenarios or let them explore
- Collect feedback via survey/form
- Iterate based on feedback
Feedback questions:
- Was the agent helpful?
- Were answers accurate?
- Was the tone appropriate?
- Did you encounter any errors?
- What would you improve?
Evaluating Agent Responses#
Good Response Checklist#
A good response should be:
1. Accurate#
- ✅ Information matches knowledge sources
- ✅ No hallucinations or made-up facts
- ✅ Cites correct data (prices, dates, etc.)
2. Relevant#
- ✅ Directly answers the question
- ✅ Doesn't go off-topic
- ✅ Appropriate level of detail
3. Complete#
- ✅ Includes all necessary information
- ✅ Anticipates follow-up questions
- ✅ Provides next steps if applicable
4. Clear#
- ✅ Easy to understand
- ✅ Well-structured (paragraphs, lists)
- ✅ No jargon (unless appropriate)
5. On-Brand#
- ✅ Matches company tone
- ✅ Uses correct terminology
- ✅ Appropriate formality level
Response Quality Scale#
Rate each response:
| Rating | Description | Action |
|---|---|---|
| ⭐⭐⭐⭐⭐ | Perfect - accurate, helpful, on-brand | No changes needed |
| ⭐⭐⭐⭐ | Good - minor improvements possible | Optional tweaks |
| ⭐⭐⭐ | Acceptable - usable but needs work | Improve sources/prompt |
| ⭐⭐ | Poor - significant issues | Fix immediately |
| ⭐ | Unusable - wrong, misleading, broken | Debug urgently |
Goal: 80%+ of responses should be 4-5 stars.
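The 80% goal is easy to check mechanically once you've rated a batch of responses. A minimal sketch:

```python
def quality_share(ratings, threshold=4):
    """Fraction of responses rated at or above the threshold (1-5 stars)."""
    if not ratings:
        return 0.0
    return sum(1 for r in ratings if r >= threshold) / len(ratings)

# Example batch: 8 of 10 responses rated 4+ stars meets the 80% goal.
sample = [5, 4, 4, 5, 3, 4, 5, 2, 4, 5]
```

Anything below the threshold share tells you how much source, prompt, or retrieval work remains before launch.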
Common Issues & Solutions#
Issue 1: "Agent doesn't know the answer"#
Symptoms:
- "I don't have information about that"
- Generic, unhelpful responses
- Wrong information provided
Diagnosis: Check if the information exists in your knowledge base:
- Search your sources for relevant keywords
- Verify source status is "Trained"
- Check if content is in correct namespace
Solutions:
- Missing content: Add new source with the information
- Content exists: Improve retrieval (see below)
- Wrong namespace: Adjust namespace priorities
Issue 2: "Retrieval returns wrong content"#
Symptoms:
- Agent cites unrelated sources
- Mixes up similar topics
- Provides outdated information
Diagnosis: Test with specific queries matching your sources exactly.
Solutions:
1. Improve source quality:
- Remove duplicate content
- Delete outdated sources
- Make content more specific
2. Adjust chunking:
- Increase chunk size for more context
- Decrease chunk size for more precision
3. Add Q&A pairs:
- Create explicit Q&A for important questions
- Q&A namespace has higher priority than docs
4. Cover multiple phrasings:
Semantic search matches meaning rather than exact wording, but explicit variants still help.
Instead of training only on: "What's the price?"
Also train with: "How much does it cost?", "What's the cost?", "Pricing?"
Issue 3: "Agent's tone is wrong"#
Symptoms:
- Too formal or too casual
- Doesn't code-switch correctly
- No emojis (or too many emojis)
Solutions:
1. Edit System Prompt: Go to Agent Settings → System Prompt and adjust tone instructions:
Example:
"You are a friendly customer support agent.
Use a warm, conversational tone.
Mix English and Tagalog naturally.
Use emojis occasionally to show friendliness 😊"
2. Update Persona (Facebook Import):
- Re-import Facebook data with persona generation
- Review and edit generated persona
- Test if agent now matches your style
3. Add example responses: Create Q&A sources with desired tone:
Q: How can I help?
A: Hi! 😊 Kumusta! What can I help you with today?
Issue 4: "Agent is too slow"#
Symptoms:
- Responses take > 10 seconds
- Timeout errors
- Users complain about wait time
Diagnosis: Check agent logs for bottlenecks.
Solutions:
1. Reduce retrieval chunks:
- Lower max chunks from 20 → 10
- Faster search, slightly less context
2. Use a faster AI model:
- Switch to a budget model such as Grok 4.1 or Gemini Flash (1 credit each)
- These respond faster and cost less, with a small quality trade-off
3. Optimize sources:
- Remove very large files (> 100 pages)
- Split into smaller, focused sources
4. Upgrade plan:
- Higher-tier plans may have better performance
- Contact support for optimization help
Issue 5: "Agent makes up information"#
Symptoms:
- Provides facts not in knowledge base
- Hallucinates details (fake prices, dates)
- Confidently gives wrong answers
Solutions:
1. Strengthen system prompt:
IMPORTANT: Only provide information from your knowledge base.
If you don't know the answer, say "I don't have information about that."
Never make up facts, prices, or dates.
2. Use stricter retrieval:
- Increase retrieval threshold
- Require higher similarity scores
3. Add explicit boundaries: Create Q&A pairs for common out-of-scope questions:
Q: What's the weather today?
A: I don't have weather information. I can help with [your business topics].
Testing Checklist#
Before launching your agent, verify:
Core Functionality#
- Agent responds to basic queries
- Retrieves correct information from sources
- Handles "I don't know" gracefully
- Maintains conversation context
- Response time is acceptable (< 5 seconds)
Knowledge Coverage#
- All key topics have sources
- Common questions are answered
- Edge cases handled appropriately
- No major gaps in knowledge
Tone & Brand#
- Responses match brand voice
- Formality level is appropriate
- Language mixing works (if applicable)
- Emoji usage aligns with brand
Edge Cases#
- Handles misspellings and typos
- Responds to off-topic questions
- Manages long/complex queries
- Works with multiple languages (if needed)
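For the misspelling item above, you don't have to invent typos by hand. A small sketch that generates adjacent-character-swap typos for any test query (assumes queries are at least two characters long):

```python
import random

def typo_variants(query: str, n: int = 3, seed: int = 0):
    """Generate n simple adjacent-swap typos of a query for robustness testing."""
    rng = random.Random(seed)  # seeded so test runs are reproducible
    variants = []
    for _ in range(n):
        chars = list(query)
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        variants.append("".join(chars))
    return variants
```

Feed each variant to your agent and check it still retrieves the same answer as the clean query.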
Integration (If Applicable)#
- Embedded chat widget loads correctly
- Facebook Messenger integration works
- Instagram DM integration works
- Webhooks fire correctly
- Lead forms capture data
Performance#
- No timeout errors
- Handles concurrent users
- Maintains quality under load
Iterative Improvement Process#
Testing is ongoing. Follow this cycle:
1. Collect Feedback#
- Review conversation logs
- Analyze user satisfaction ratings
- Identify common complaints
2. Identify Issues#
- What questions are poorly answered?
- What information is missing?
- What tone adjustments are needed?
3. Make Improvements#
- Add new sources for missing topics
- Refine existing Q&A pairs
- Adjust system prompt
- Update retrieval settings
4. Re-Test#
- Verify improvements work
- Check if new issues appeared
- Test edge cases again
5. Deploy & Monitor#
- Push changes to production
- Watch conversation logs closely
- Be ready to roll back if needed
Frequency: Review every 1-2 weeks initially, then monthly.
Advanced Testing Tools#
Conversation Logs#
Review actual user conversations:
- Go to Conversations tab
- Filter by date, rating, or keyword
- Identify patterns in poor responses
What to look for:
- Repeated questions with bad answers
- Users rephrasing the same question
- High drop-off rate on specific topics
Analytics (If Available)#
Track agent performance metrics:
- Response rate: % of questions answered
- Accuracy rate: % of correct answers (based on user feedback)
- Satisfaction score: Average user rating
- Resolution rate: % of conversations resolved without human help
Goal: 80%+ response rate, 4+ stars average satisfaction
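If your platform lets you export conversation logs, these metrics are straightforward to compute yourself. The field names below are illustrative, not a real export schema; adapt them to whatever your platform provides:

```python
# Hypothetical log format: one dict per conversation.
LOGS = [
    {"answered": True, "rating": 5, "resolved": True},
    {"answered": True, "rating": 4, "resolved": True},
    {"answered": False, "rating": 2, "resolved": False},
    {"answered": True, "rating": 5, "resolved": True},
]

def metrics(logs):
    """Response rate, average satisfaction, and resolution rate from logs."""
    n = len(logs)
    rated = [c["rating"] for c in logs if c.get("rating") is not None]
    return {
        "response_rate": sum(c["answered"] for c in logs) / n,
        "satisfaction": sum(rated) / len(rated) if rated else None,
        "resolution_rate": sum(c["resolved"] for c in logs) / n,
    }
```

Tracking these weekly gives you the trend line that the iterative improvement cycle above depends on.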
A/B Testing#
Test different configurations:
- Create two agent versions with different settings
- Route 50% of users to each version
- Compare performance metrics
- Keep the better-performing version
What to test:
- Different system prompts
- Different AI models
- Different retrieval settings
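When comparing the two versions, raw percentages can mislead on small samples. A sketch of a two-proportion z-test (a standard significance check, not a feature of any particular platform) applied to, say, resolution rates:

```python
import math

def ab_compare(success_a, total_a, success_b, total_b):
    """Two-proportion z-test comparing success rates of versions A and B."""
    pa, pb = success_a / total_a, success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (pb - pa) / se if se else 0.0
    # |z| > 1.96 corresponds to significance at the 95% confidence level.
    return {"rate_a": pa, "rate_b": pb, "z": z, "significant": abs(z) > 1.96}
```

For example, 400/500 resolved on version A versus 440/500 on version B is a significant difference; 40/50 versus 44/50 with the same rates is not, so keep collecting traffic before declaring a winner.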
Next Steps#
- Embedding & Deploying - Add agent to your website
- Developer Documentation - Integrate with other tools
- Integrations - Connect to Facebook, Zapier, etc.