Testing Your Agent

Strategies and checklists for testing your AI agent before deploying to production

Before deploying your agent to production, thoroughly test it to ensure accurate, helpful responses. This guide covers testing strategies, common issues, and how to improve your agent's performance.


Why Testing Matters

Proper testing ensures:

  • Accurate answers -- Agent retrieves correct information from your knowledge base
  • Appropriate tone -- Matches your brand voice and communication style
  • Complete responses -- Does not miss important details
  • Handles edge cases -- Gracefully manages unusual or unexpected questions
  • Stays on-topic -- Does not fabricate answers or go off-topic

Test before launch: Catch issues before customers start interacting with your agent.


Testing Methods

1. Manual Chat Testing

Best for: Quick verification and exploratory testing

How to test:

  1. Go to your agent's Settings page
  2. Click the Test Chat or Preview button
  3. Ask sample questions
  4. Review responses for accuracy

What to test:

  • Common customer questions
  • Edge cases and tricky queries
  • Different phrasings of the same question
  • Questions outside your knowledge base

Example test conversation:

Code
You: How much does the premium plan cost?
Agent: Our premium plan is P999/month... [correct]

You: Can you ship to Cebu?
Agent: Yes! We ship nationwide via LBC... [correct]

You: What's the weather today?
Agent: I don't have information about weather... [correct]
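If you ask the same questions after every change, writing them down as data makes each round repeatable. The Python sketch below is hypothetical. This guide does not describe a programmatic chat API, so `ask_agent` is a stub that returns the replies from the example conversation above. In practice you might paste in answers copied from Test Chat. `ChatTest` and `run_tests` are illustrative names, and each test lists the phrases a correct answer should contain.

```python
from dataclasses import dataclass


@dataclass
class ChatTest:
    question: str
    must_contain: list[str]  # phrases a correct answer should include (case-insensitive)


def ask_agent(question: str) -> str:
    """Hypothetical stub with canned replies; not a real platform API."""
    canned = {
        "How much does the premium plan cost?": "Our premium plan is P999/month.",
        "Can you ship to Cebu?": "Yes! We ship nationwide via LBC.",
        "What's the weather today?": "I don't have information about weather.",
    }
    return canned.get(question, "I don't have information about that.")


def run_tests(tests: list[ChatTest]) -> list[tuple[str, bool]]:
    """Return (question, passed) for each test."""
    results = []
    for test in tests:
        answer = ask_agent(test.question).lower()
        passed = all(phrase.lower() in answer for phrase in test.must_contain)
        results.append((test.question, passed))
    return results


tests = [
    ChatTest("How much does the premium plan cost?", ["P999"]),
    ChatTest("Can you ship to Cebu?", ["ship", "nationwide"]),
    ChatTest("What's the weather today?", ["don't have information"]),
]

for question, passed in run_tests(tests):
    print("PASS" if passed else "FAIL", question)
```

Phrase checks only catch missing facts. Tone and overall helpfulness still need a human read.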

2. Test Cases Checklist

Knowledge Base Coverage:

  • Basic product information
  • Pricing and payment options
  • Shipping and delivery
  • Returns and refunds
  • Technical support questions

Tone and Style:

  • Uses appropriate formality
  • Code-switches correctly (if applicable)
  • Emoji usage matches brand
  • Response length is appropriate

Edge Cases:

  • Questions outside knowledge base
  • Misspelled queries
  • Multiple questions in one message
  • Follow-up questions requiring context

Language Mixing (Filipino Businesses):

  • English questions get English responses
  • Tagalog questions get Tagalog responses
  • Mixed questions handled naturally
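Language matching is easy to eyeball for a few replies but tedious across many. The sketch below uses a rough, hypothetical heuristic; it is not a real language detector. It scores the share of common Tagalog words in a message and flags replies that land on the other side of a threshold from the question. The word list and the 0.2 threshold are assumptions you would need to tune. Heavily mixed Taglish messages will land near the threshold, so review those by hand.

```python
# Hypothetical, deliberately small set of common Tagalog words; tune it for your audience.
TAGALOG_MARKERS = {
    "ang", "ng", "mga", "po", "opo", "ba", "sa", "ay", "ako", "kami",
    "kayo", "hindi", "magkano", "saan", "kailan", "salamat", "kumusta",
}


def tagalog_share(text: str) -> float:
    """Fraction of words in `text` found in TAGALOG_MARKERS (punctuation stripped)."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in TAGALOG_MARKERS for w in words) / len(words)


def language_matches(question: str, answer: str, threshold: float = 0.2) -> bool:
    """True when question and answer fall on the same side of the threshold."""
    q_tagalog = tagalog_share(question) >= threshold
    a_tagalog = tagalog_share(answer) >= threshold
    return q_tagalog == a_tagalog


# Question: "How much is shipping to Cebu?"; first answer: "Yes, we ship to Cebu."
print(language_matches("Magkano po ang shipping sa Cebu?", "Opo, nagpapadala kami sa Cebu."))  # prints True
print(language_matches("Magkano po ang shipping sa Cebu?", "Yes, we ship to Cebu."))  # prints False
```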

3. Retrieval Testing

Best for: Debugging why specific content is not being retrieved

How to test:

  1. Ask a question that should match a specific source
  2. Check which sources were retrieved (visible in the AI response details)
  3. Verify the answer matches the source content

Example:

Code
Source: "Our business hours are 9am-5pm Mon-Fri"
Query: "What are your hours?"
Expected: Agent should cite business hours from source

If retrieval fails: See Troubleshooting Retrieval below.
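When you repeat retrieval checks, recording them in a fixed shape makes patterns easier to spot. This minimal sketch assumes you copy the retrieved source names from the AI response details by hand. `check_retrieval`, `RetrievalResult`, and the source names are hypothetical. The check separates two failure modes: the right source was not retrieved, or it was retrieved but the answer dropped key facts.

```python
from dataclasses import dataclass


@dataclass
class RetrievalResult:
    source_retrieved: bool    # was the expected source among the retrieved ones?
    missing_facts: list[str]  # key facts from the source that the answer left out


def check_retrieval(expected_source: str, retrieved_sources: list[str],
                    answer: str, key_facts: list[str]) -> RetrievalResult:
    answer_lower = answer.lower()
    missing = [fact for fact in key_facts if fact.lower() not in answer_lower]
    return RetrievalResult(expected_source in retrieved_sources, missing)


# Values typed in by hand from the AI response details; source names are hypothetical.
result = check_retrieval(
    expected_source="Business Hours",
    retrieved_sources=["Business Hours", "Shipping FAQ"],
    answer="We're open 9am-5pm, Monday to Friday.",
    key_facts=["9am", "5pm"],
)
print(result)
```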

4. Stress Testing

Best for: Ensuring the agent handles unusual inputs gracefully

Test scenarios:

  • Long questions (500+ words)
  • Rapid-fire questions (10 in a row)
  • Non-English languages (if applicable)
  • Special characters and emojis
  • Copy-pasted technical jargon

What to watch for:

  • Timeouts or errors
  • Degraded response quality
  • Inconsistent answers
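The stress scenarios above can be turned into a reusable input set. This sketch assumes you have some way to send a message to your agent and get text back. `ask_agent` is a hypothetical stub, and `make_stress_inputs` and `run_stress` are illustrative names. The script flags errors, empty replies, and replies slower than 10 seconds, the threshold used under "Agent is too slow". Degraded quality and inconsistent answers still have to be judged by reading the responses.

```python
import time
from typing import Callable


def ask_agent(question: str) -> str:
    """Hypothetical stub; replace with however you actually reach your agent."""
    return "Thanks for your question! Here's what I found."


def make_stress_inputs() -> dict[str, str]:
    """Build the stress scenarios listed above as named inputs."""
    inputs = {
        # 9 words repeated 60 times = 540 words, past the 500-word mark
        "long_question": " ".join(["Can you explain your return policy for damaged items?"] * 60),
        # Tagalog: "How much is shipping to Cebu?"
        "non_english": "Magkano po ang shipping papuntang Cebu?",
        "special_chars": "¿¡@#$%^&*()_+ 🙂🚚📦 Where is my order?!",
        "jargon": "Getting HTTP 403 and a TLS handshake timeout at checkout, pls fix",
    }
    for i in range(1, 11):  # rapid-fire: the same question 10 times in a row
        inputs[f"rapid_fire_{i}"] = "Do you ship to Cebu?"
    return inputs


def run_stress(ask: Callable[[str], str], inputs: dict[str, str],
               slow_after: float = 10.0) -> list[dict]:
    """Send each input once and record timing, empty replies, and errors."""
    report = []
    for name, text in inputs.items():
        start = time.perf_counter()
        try:
            answer, error = ask(text), None
        except Exception as exc:  # record the failure instead of stopping the run
            answer, error = "", repr(exc)
        elapsed = time.perf_counter() - start
        report.append({
            "case": name,
            "seconds": round(elapsed, 2),
            "slow": elapsed > slow_after,
            "empty": not answer.strip(),
            "error": error,
        })
    return report


for row in run_stress(ask_agent, make_stress_inputs()):
    if row["slow"] or row["empty"] or row["error"]:
        print("CHECK:", row)
```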

5. User Acceptance Testing (UAT)

Best for: Final validation before launch

How to run:

  1. Invite team members or beta testers
  2. Provide test scenarios or let them explore freely
  3. Collect feedback via survey or form
  4. Iterate based on feedback

Feedback questions:

  • Was the agent helpful?
  • Were answers accurate?
  • Was the tone appropriate?
  • Did you encounter any errors?
  • What would you improve?
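If you collect UAT feedback through a form, the yes/no questions above can be tallied automatically. A minimal sketch follows. It assumes each tester's response is exported as a dictionary with the hypothetical keys `helpful`, `accurate`, `tone_ok`, `had_errors`, and `improve`. For `had_errors`, a higher share is worse.

```python
def summarize_feedback(responses: list[dict]) -> dict:
    """Share of testers answering "yes" to each yes/no question, plus free-text suggestions."""
    yes_no = ["helpful", "accurate", "tone_ok", "had_errors"]  # had_errors: higher is worse
    summary = {}
    for question in yes_no:
        answers = [r[question] for r in responses if question in r]
        summary[question] = sum(answers) / len(answers) if answers else None
    summary["suggestions"] = [r["improve"] for r in responses if r.get("improve")]
    return summary


# Made-up responses from three testers, for illustration only.
responses = [
    {"helpful": True, "accurate": True, "tone_ok": True, "had_errors": False,
     "improve": "Mention delivery times for Mindanao"},
    {"helpful": True, "accurate": False, "tone_ok": True, "had_errors": False},
    {"helpful": False, "accurate": True, "tone_ok": False, "had_errors": True,
     "improve": "Too formal for our customers"},
]

for key, value in summarize_feedback(responses).items():
    print(key, value)
```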

Evaluating Agent Responses

Good Response Checklist

A good response should be:

1. Accurate

  • Information matches knowledge sources
  • No fabricated facts
  • Cites correct data (prices, dates, etc.)

2. Relevant

  • Directly answers the question
  • Does not go off-topic
  • Appropriate level of detail

3. Complete

  • Includes all necessary information
  • Anticipates follow-up questions
  • Provides next steps if applicable

4. Clear

  • Easy to understand
  • Well-structured (paragraphs, lists)
  • No jargon (unless appropriate for the audience)

5. On-Brand

  • Matches company tone
  • Uses correct terminology
  • Appropriate formality level

Response Quality Scale

Rate each response during testing:

| Rating  | Description                              | Action                    |
|---------|------------------------------------------|---------------------------|
| 5 stars | Perfect -- accurate, helpful, on-brand   | No changes needed         |
| 4 stars | Good -- minor improvements possible      | Optional tweaks           |
| 3 stars | Acceptable -- usable but needs work      | Improve sources or prompt |
| 2 stars | Poor -- significant issues               | Fix immediately           |
| 1 star  | Unusable -- wrong, misleading, or broken | Debug urgently            |

Goal: 80% or more of responses should be rated 4-5 stars.
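To check the 80% goal, count how many ratings are 4 or 5 stars and divide by the total number of ratings. The Python sketch below does this; the function name and the ratings are illustrative.

```python
def share_rated_good(ratings: list[int], good_min: int = 4) -> float:
    """Fraction of ratings at or above `good_min` stars (4-5 stars by default)."""
    if not ratings:
        raise ValueError("no ratings recorded yet")
    if any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be whole stars from 1 to 5")
    return sum(r >= good_min for r in ratings) / len(ratings)


# Example ratings from one test session (illustrative values).
ratings = [5, 4, 4, 3, 5, 5, 2, 4, 5, 4]
share = share_rated_good(ratings)
print(f"{share:.0%} rated 4-5 stars; goal met: {share >= 0.8}")
# prints: 80% rated 4-5 stars; goal met: True
```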


Common Issues and Solutions

Issue 1: "Agent doesn't know the answer"

Symptoms:

  • "I don't have information about that"
  • Generic, unhelpful responses
  • Wrong information provided

Diagnosis: Check if the information exists in your knowledge base:

  1. Search your sources for relevant keywords
  2. Verify source status is "Trained"
  3. Check if content is in the correct category

Solutions:

  • Missing content: Add a new source with the information
  • Content exists but not retrieved: Improve the source quality or add a Q&A pair
  • Wrong category: Adjust category priorities
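The diagnosis steps and solutions above form a simple decision path. The sketch below encodes that path. It assumes you can list each source's name, status, and text, for example by exporting or copying them. `Source`, `diagnose`, and any status other than "Trained" are hypothetical. The code does not inspect categories, so that check stays manual.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    status: str  # "Trained" is the status this guide mentions; others here are assumed
    text: str


def diagnose(keywords: list[str], sources: list[Source]) -> str:
    """Map a keyword search over your sources onto the diagnosis steps above."""
    hits = [s for s in sources
            if any(k.lower() in s.text.lower() for k in keywords)]
    if not hits:
        return "Missing content: add a new source with the information"
    untrained = [s.name for s in hits if s.status != "Trained"]
    if untrained:
        return "Not trained yet: " + ", ".join(untrained)
    return ("Content exists but was not retrieved: improve the source "
            "or add a Q&A pair (also check its category)")


# Illustrative sources; "Processing" is a hypothetical non-trained status.
sources = [
    Source("Shipping FAQ", "Trained", "We ship nationwide via LBC."),
    Source("Returns Policy", "Processing", "Returns are accepted for damaged items."),
]
print(diagnose(["return"], sources))
print(diagnose(["warranty"], sources))
```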

Issue 2: "Retrieval returns wrong content"

Symptoms:

  • Agent cites unrelated sources
  • Mixes up similar topics
  • Provides outdated information

Diagnosis: Test with specific queries matching your sources exactly.

Solutions:

1. Improve source quality:

  • Remove duplicate content
  • Delete outdated sources
  • Make content more specific and focused

2. Add Q&A pairs:

  • Create explicit Q&A for important questions
  • Q&A responses are given higher priority than general documents

3. Test with varied phrasing:

Code
Instead of only: "What's the price?"
Also try: "How much does it cost?", "Pricing?", "Magkano?"
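Checking every phrasing by hand gets slow as the list grows. This sketch returns the phrasings whose answer does not mention an expected fact, here the P999 price from the earlier example. It uses a hypothetical stub `ask_agent` in which "Pricing?" fails on purpose, to show what a gap looks like.

```python
from typing import Callable


def ask_agent(question: str) -> str:
    """Hypothetical stub; "Pricing?" deliberately fails to show what a gap looks like."""
    if question == "Pricing?":
        return "I don't have information about that."
    return "Our premium plan is P999/month."


def phrasing_gaps(ask: Callable[[str], str], phrasings: list[str], key_fact: str) -> list[str]:
    """Return the phrasings whose answer does not mention `key_fact`."""
    return [p for p in phrasings if key_fact.lower() not in ask(p).lower()]


# "Magkano?" is Tagalog for "How much?"
phrasings = ["What's the price?", "How much does it cost?", "Pricing?", "Magkano?"]
print(phrasing_gaps(ask_agent, phrasings, "P999"))  # prints ['Pricing?']
```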

Issue 3: "Agent's tone is wrong"

Symptoms:

  • Too formal or too casual
  • Does not code-switch correctly
  • Inappropriate emoji usage

Solutions:

1. Edit System Prompt: Go to Agent Settings and adjust tone instructions:

Code
Example:
"You are a friendly customer support agent.
Use a warm, conversational tone.
Mix English and Tagalog naturally.
Use emojis occasionally to show friendliness."

2. Update Persona (Facebook Import):

  • Re-import Facebook data with persona generation
  • Review and edit the generated persona
  • Test if the agent now matches your style

3. Add example responses: Create Q&A sources with your desired tone:

Code
Q: How can I help?
A: Hi! Kumusta! What can I help you with today?

Issue 4: "Agent is too slow"

Symptoms:

  • Responses take longer than 10 seconds
  • Timeout errors
  • Users complain about wait time

Solutions:

1. Use a faster AI model:

  • Switch to a budget model (1 credit) for faster responses
  • Check the model selector for available options

2. Optimize sources:

  • Avoid very large files (over 100 pages)
  • Split large documents into smaller, focused sources

3. Upgrade plan:

  • Higher-tier plans may offer better performance
  • Contact support for optimization help
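Before and after applying these fixes, summarize response times instead of relying on impressions. The sketch below assumes you have timed a batch of test questions yourself; the timings shown are made-up values. It reports the 50th and 95th percentiles using the nearest-rank method. It also counts how many replies exceeded the 10-second mark from the symptoms list. A high 95th percentile with a normal 50th suggests occasional slow replies rather than a uniformly slow agent.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    if not values:
        raise ValueError("no timings recorded")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_summary(seconds: list[float], slow_after: float = 10.0) -> dict:
    return {
        "p50_s": percentile(seconds, 50),
        "p95_s": percentile(seconds, 95),
        "over_limit": sum(s > slow_after for s in seconds),
    }


# Response times in seconds, timed by hand during testing (made-up example values).
timings = [2.1, 3.4, 2.8, 11.2, 4.0, 3.1, 2.5, 9.8, 3.3, 2.9]
print(latency_summary(timings))
# prints {'p50_s': 3.1, 'p95_s': 11.2, 'over_limit': 1}
```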

Issue 5: "Agent makes up information"

Symptoms:

  • Provides facts not in the knowledge base
  • Fabricates details (fake prices, dates)
  • Confidently gives wrong answers

Solutions:

1. Strengthen system prompt:

Code
IMPORTANT: Only provide information from your knowledge base.
If you don't know the answer, say "I don't have information about that."
Never make up facts, prices, or dates.

2. Add explicit boundaries: Create Q&A pairs for common out-of-scope questions:

Code
Q: What's the weather today?
A: I don't have weather information. I can help with [your business topics].
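Invented prices are among the easiest fabrications to catch automatically. The sketch below assumes prices are written as `P` or `₱` followed by digits, as in this guide's examples. It lists any amount in an answer that appears in none of your source texts. `prices_in` and `unsupported_prices` are illustrative names. The check only covers prices; invented dates or policies still need a human reviewer.

```python
import re

# Matches amounts like "P999", "₱1,299" or "P 49.50"; (?<!\w) stops "MP3" from matching.
PRICE = re.compile(r"(?<!\w)[P₱]\s?(\d[\d,]*(?:\.\d+)?)")


def prices_in(text: str) -> set[str]:
    """Return the amounts in `text` with thousands separators removed."""
    return {amount.replace(",", "") for amount in PRICE.findall(text)}


def unsupported_prices(answer: str, sources: list[str]) -> set[str]:
    """Amounts the answer mentions that no source mentions."""
    known: set[str] = set()
    for source in sources:
        known |= prices_in(source)
    return prices_in(answer) - known


sources = ["Our premium plan is P999/month.", "The basic plan is P499/month."]
# The "family plan" price is the kind of invented detail this check should catch.
answer = "The premium plan is P999/month and the family plan is P1,299/month."
print(unsupported_prices(answer, sources))  # prints {'1299'}
```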

Troubleshooting Retrieval

If the AI is not retrieving the right content:

  1. Verify training -- Check that the source status is "Trained" (green)
  2. Test exact phrases -- Ask a question using exact wording from your source
  3. Check source quality -- Ensure the source content is clear and well-structured
  4. Add Q&A pairs -- For critical questions, add explicit Q&A pairs as a reliable fallback
  5. Review source priorities -- Higher-priority sources are retrieved first

Pre-Launch Testing Checklist

Before launching your agent, verify:

Core Functionality

  • Agent responds to basic queries
  • Retrieves correct information from sources
  • Handles "I don't know" gracefully
  • Maintains conversation context
  • Response time is acceptable (under 5 seconds)

Knowledge Coverage

  • All key topics have sources
  • Common questions are answered accurately
  • Edge cases handled appropriately
  • No major gaps in knowledge

Tone and Brand

  • Responses match brand voice
  • Formality level is appropriate
  • Language mixing works (if applicable)
  • Emoji usage aligns with brand guidelines

Edge Cases

  • Handles misspellings and typos
  • Responds appropriately to off-topic questions
  • Manages long or complex queries
  • Works with multiple languages (if needed)

Integration

  • Embedded chat widget loads correctly
  • Facebook Messenger integration works
  • Instagram DM integration works
  • Other connected channels respond correctly

Performance

  • No timeout errors
  • Handles concurrent users
  • Maintains quality under load
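Concurrent-user checks are hard to do by hand. The sketch below sends the same question from several threads at once, using Python's standard `concurrent.futures.ThreadPoolExecutor`. `ask_agent` is a hypothetical stub that only sleeps briefly, and `load_test` is an illustrative name. Pointing the harness at a live channel is up to you. Each model call may consume credits, so start with a small number of users. The 5-second limit mirrors the Core Functionality item above. The check covers only failures and timing; quality under load still needs reading.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def ask_agent(question: str) -> str:
    """Hypothetical stub: sleeps to imitate response latency, then replies."""
    time.sleep(0.05)
    return "Yes! We ship nationwide via LBC."


def load_test(ask: Callable[[str], str], question: str,
              users: int = 20, slow_after: float = 5.0) -> dict:
    def one_call(_: int) -> tuple[bool, float]:
        start = time.perf_counter()
        try:
            ok = bool(ask(question).strip())  # an empty reply counts as a failure
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    # One worker per simulated user, so all requests are in flight together.
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(one_call, range(users)))

    slowest = max(elapsed for _, elapsed in results)
    return {
        "users": users,
        "failures": sum(not ok for ok, _ in results),
        "slowest_s": round(slowest, 2),
        "too_slow": slowest > slow_after,
    }


print(load_test(ask_agent, "Do you ship to Cebu?", users=10))
```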

Iterative Improvement Process

Testing is ongoing, not a one-time activity. Follow this cycle:

1. Collect Feedback

  • Review conversation logs regularly
  • Analyze user satisfaction ratings
  • Identify common complaints or confusion points

2. Identify Issues

  • What questions are poorly answered?
  • What information is missing?
  • What tone adjustments are needed?

3. Make Improvements

  • Add new sources for missing topics
  • Refine existing Q&A pairs
  • Adjust the system prompt
  • Update retrieval settings

4. Re-Test

  • Verify improvements work as expected
  • Check that new changes did not create new issues
  • Test edge cases again
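Checking that new changes did not create new issues is easiest when you keep pass/fail results from each round. The sketch below assumes you store each run as a mapping from test question to pass/fail. The function name and the data are hypothetical.

```python
def compare_runs(before: dict[str, bool], after: dict[str, bool]) -> dict[str, list[str]]:
    """Sort test questions into fixed, regressed, and newly added failures."""
    return {
        "fixed": sorted(q for q, ok in after.items() if ok and before.get(q) is False),
        "regressed": sorted(q for q, ok in after.items() if not ok and before.get(q) is True),
        "new_failures": sorted(q for q, ok in after.items() if not ok and q not in before),
    }


before = {"Pricing?": False, "Can you ship to Cebu?": True, "What are your hours?": True}
after = {"Pricing?": True, "Can you ship to Cebu?": False,
         "What are your hours?": True, "Magkano?": False}

for label, questions in compare_runs(before, after).items():
    print(label, questions)
```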

5. Deploy and Monitor

  • Push changes to production
  • Watch conversation logs closely for the first few days
  • Be ready to make quick adjustments if needed

Frequency: Review every 1-2 weeks initially, then monthly once performance stabilizes.


Next Steps