Testing Your Agent#
Before deploying your agent to production, thoroughly test it to ensure accurate, helpful responses. This guide covers testing strategies, common issues, and how to improve your agent's performance.
Why Testing Matters#
Proper testing ensures:
- ✅ Accurate answers - Agent retrieves correct information
- ✅ Appropriate tone - Matches your brand voice
- ✅ Complete responses - Doesn't miss important details
- ✅ Edge-case handling - Responds gracefully to unusual questions
- ✅ Stays on-topic - Doesn't hallucinate or make up answers
Test before launch: Catch issues early before customers interact with your agent.
Testing Methods#
1. Manual Chat Testing#
Best for: Quick verification, exploratory testing
How to test:
- Go to your agent's Settings page
- Click Test Chat or Preview button
- Ask sample questions
- Review responses for accuracy
What to test:
- Common customer questions
- Edge cases and tricky queries
- Different phrasings of the same question
- Questions outside your knowledge base
Example test conversation:
You: How much does the premium plan cost?
Agent: Our premium plan is ₱999/month... ✅
You: Can you ship to Cebu?
Agent: Yes! We ship nationwide via LBC... ✅
You: What's the weather today?
Agent: I don't have information about weather... ✅
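Manual spot checks like the conversation above can be scripted once you have programmatic access to your agent. A minimal sketch, assuming a hypothetical `ask_agent` function (replace the stub with your platform's actual chat API or SDK call):

```python
# Hypothetical stand-in for a call to your agent's chat API.
# Replace with a real request to your platform before using.
def ask_agent(question: str) -> str:
    canned = {
        "How much does the premium plan cost?": "Our premium plan is ₱999/month.",
        "Can you ship to Cebu?": "Yes! We ship nationwide via LBC.",
    }
    return canned.get(question, "I don't have information about that.")

# Each test pairs a question with keywords the answer must contain.
TEST_CASES = [
    ("How much does the premium plan cost?", ["₱999"]),
    ("Can you ship to Cebu?", ["ship"]),
    ("What's the weather today?", ["don't have information"]),
]

def run_tests():
    results = []
    for question, expected_keywords in TEST_CASES:
        answer = ask_agent(question)
        passed = all(kw.lower() in answer.lower() for kw in expected_keywords)
        results.append((question, passed))
    return results

if __name__ == "__main__":
    for question, passed in run_tests():
        print(("PASS" if passed else "FAIL"), question)
```

Keyword checks are deliberately loose: they catch missing facts without failing on harmless rephrasing by the model.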
2. Test Cases Checklist#
Knowledge Base Coverage:
- Basic product information
- Pricing and payment options
- Shipping and delivery
- Returns and refunds
- Technical support questions
Tone & Style:
- Uses appropriate formality
- Code-switches correctly (if applicable)
- Emoji usage matches brand
- Response length is appropriate
Edge Cases:
- Questions outside knowledge base
- Misspelled queries
- Multiple questions in one message
- Follow-up questions requiring context
Language Mixing (Filipino Businesses):
- English questions get English responses
- Tagalog questions get Tagalog responses
- Mixed questions handled naturally
3. Retrieval Testing#
Best for: Debugging why specific content isn't being retrieved
How to test:
- Ask a question that should match a specific source
- Check which sources were retrieved (if your UI shows this)
- Verify the answer matches the source content
Example:
Source: "Our business hours are 9am-5pm Mon-Fri"
Query: "What are your hours?"
Expected: Agent should cite business hours from source
If retrieval fails: See Issue 2 under Common Issues & Solutions below.
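You can reason about which source *should* win a query with a rough stand-in for the platform's search. This sketch uses simple word-overlap (Jaccard) similarity; real platforms rank chunks with embedding-based semantic search, so treat this only as a sanity-check model, and the example sources are hypothetical:

```python
import re

def tokens(text: str) -> set:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Toy knowledge sources standing in for your trained chunks.
SOURCES = [
    "Our business hours are 9am-5pm Mon-Fri",
    "We ship nationwide via LBC within 3-5 days",
    "The premium plan costs 999 pesos per month",
]

def retrieve(query: str, top_k: int = 1):
    """Return the top_k sources most similar to the query."""
    ranked = sorted(SOURCES, key=lambda s: jaccard(query, s), reverse=True)
    return ranked[:top_k]
```

If even this crude overlap model can't connect your query to the intended source (no shared words at all), an embedding search may also struggle; that's a hint to add query-variant Q&A pairs.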
4. Stress Testing#
Best for: Ensuring agent handles high load and unusual inputs
Test scenarios:
- Long questions (500+ words)
- Rapid-fire questions (10 in a row)
- Non-English languages (if applicable)
- Special characters and emojis
- Copy-pasted technical jargon
What to watch for:
- Timeouts or errors
- Degraded response quality
- System crashes
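A rapid-fire scenario can be simulated with a small concurrent harness. This is a sketch: `ask_agent` is a hypothetical stub (swap in your real API call), and the 5-second threshold mirrors the response-time target used later in this guide's checklist:

```python
import time
from concurrent.futures import ThreadPoolExecutor

TIMEOUT_SECONDS = 5.0  # response-time target from the launch checklist

# Hypothetical stand-in; replace with your platform's chat API call.
def ask_agent(question: str) -> str:
    time.sleep(0.01)  # simulate network/model latency
    return f"Answer to: {question}"

def stress_test(questions, max_workers=10):
    """Fire questions concurrently and report per-request timings."""
    def timed(q):
        start = time.perf_counter()
        ask_agent(q)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        timings = list(pool.map(timed, questions))
    slow = [t for t in timings if t > TIMEOUT_SECONDS]
    return {"count": len(timings), "max_seconds": max(timings), "slow": len(slow)}
```

Run it with 10+ questions at once and watch `slow` and `max_seconds`; a rising maximum under concurrency is the first sign of the degradation described above.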
5. User Acceptance Testing (UAT)#
Best for: Final validation before launch
How to run:
- Invite team members or beta testers
- Provide test scenarios or let them explore
- Collect feedback via survey/form
- Iterate based on feedback
Feedback questions:
- Was the agent helpful?
- Were answers accurate?
- Was the tone appropriate?
- Did you encounter any errors?
- What would you improve?
Evaluating Agent Responses#
Good Response Checklist#
A good response should be:
1. Accurate#
- ✅ Information matches knowledge sources
- ✅ No hallucinations or made-up facts
- ✅ Cites correct data (prices, dates, etc.)
2. Relevant#
- ✅ Directly answers the question
- ✅ Doesn't go off-topic
- ✅ Appropriate level of detail
3. Complete#
- ✅ Includes all necessary information
- ✅ Anticipates follow-up questions
- ✅ Provides next steps if applicable
4. Clear#
- ✅ Easy to understand
- ✅ Well-structured (paragraphs, lists)
- ✅ No jargon (unless appropriate)
5. On-Brand#
- ✅ Matches company tone
- ✅ Uses correct terminology
- ✅ Appropriate formality level
Response Quality Scale#
Rate each response:
| Rating | Description | Action |
|---|---|---|
| ⭐⭐⭐⭐⭐ | Perfect - accurate, helpful, on-brand | No changes needed |
| ⭐⭐⭐⭐ | Good - minor improvements possible | Optional tweaks |
| ⭐⭐⭐ | Acceptable - usable but needs work | Improve sources/prompt |
| ⭐⭐ | Poor - significant issues | Fix immediately |
| ⭐ | Unusable - wrong, misleading, broken | Debug urgently |
Goal: 80%+ of responses should be 4-5 stars.
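The 80% goal is easy to check mechanically once you've rated a batch of responses. A minimal sketch:

```python
def quality_share(ratings, threshold=4):
    """Fraction of responses rated at or above the threshold (1-5 stars)."""
    if not ratings:
        return 0.0
    return sum(1 for r in ratings if r >= threshold) / len(ratings)

# Example batch: 8 of 10 responses rated 4+ stars meets the 80% goal.
sample = [5, 4, 4, 5, 3, 4, 5, 2, 4, 5]
```

Anything below the threshold share tells you how much source, prompt, or retrieval work remains before launch.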
Common Issues & Solutions#
Issue 1: "Agent doesn't know the answer"#
Symptoms:
- "I don't have information about that"
- Generic, unhelpful responses
- Wrong information provided
Diagnosis: Check if the information exists in your knowledge base:
- Search your sources for relevant keywords
- Verify source status is "Trained"
- Check if content is in correct namespace
Solutions:
- Missing content: Add new source with the information
- Content exists: Improve retrieval (see below)
- Wrong namespace: Adjust namespace priorities
Issue 2: "Retrieval returns wrong content"#
Symptoms:
- Agent cites unrelated sources
- Mixes up similar topics
- Provides outdated information
Diagnosis: Test with specific queries matching your sources exactly.
Solutions:
1. Improve source quality:
- Remove duplicate content
- Delete outdated sources
- Make content more specific
2. Adjust chunking:
- Increase chunk size for more context
- Decrease chunk size for more precision
3. Add Q&A pairs:
- Create explicit Q&A for important questions
- Q&A namespace has higher priority than docs
4. Cover multiple phrasings:
Semantic search matches meaning rather than exact wording, but explicit variants still help.
Instead of training only on: "What's the price?"
Also train with: "How much does it cost?", "What's the cost?", "Pricing?"
Issue 3: "Agent's tone is wrong"#
Symptoms:
- Too formal or too casual
- Doesn't code-switch correctly
- No emojis (or too many emojis)
Solutions:
1. Edit System Prompt: Go to Agent Settings → System Prompt and adjust tone instructions:
Example:
"You are a friendly customer support agent.
Use a warm, conversational tone.
Mix English and Tagalog naturally.
Use emojis occasionally to show friendliness 😊"
2. Update Persona (Facebook Import):
- Re-import Facebook data with persona generation
- Review and edit generated persona
- Test if agent now matches your style
3. Add example responses: Create Q&A sources with desired tone:
Q: How can I help?
A: Hi! 😊 Kumusta! What can I help you with today?
Issue 4: "Agent is too slow"#
Symptoms:
- Responses take > 10 seconds
- Timeout errors
- Users complain about wait time
Diagnosis: Check agent logs for bottlenecks.
Solutions:
1. Reduce retrieval chunks:
- Lower max chunks from 20 → 10
- Faster search, slightly less context
2. Use a faster AI model:
- Switch to a budget model such as Grok 4.1 or Gemini Flash (1 credit each)
- These respond faster and cost less, with a small quality trade-off
3. Optimize sources:
- Remove very large files (> 100 pages)
- Split into smaller, focused sources
4. Upgrade plan:
- Higher-tier plans may have better performance
- Contact support for optimization help
Issue 5: "Agent makes up information"#
Symptoms:
- Provides facts not in knowledge base
- Hallucinates details (fake prices, dates)
- Confidently gives wrong answers
Solutions:
1. Strengthen system prompt:
IMPORTANT: Only provide information from your knowledge base.
If you don't know the answer, say "I don't have information about that."
Never make up facts, prices, or dates.
2. Use stricter retrieval:
- Increase retrieval threshold
- Require higher similarity scores
3. Add explicit boundaries: Create Q&A pairs for common out-of-scope questions:
Q: What's the weather today?
A: I don't have weather information. I can help with [your business topics].
Testing Checklist#
Before launching your agent, verify:
Core Functionality#
- Agent responds to basic queries
- Retrieves correct information from sources
- Handles "I don't know" gracefully
- Maintains conversation context
- Response time is acceptable (< 5 seconds)
Knowledge Coverage#
- All key topics have sources
- Common questions are answered
- Edge cases handled appropriately
- No major gaps in knowledge
Tone & Brand#
- Responses match brand voice
- Formality level is appropriate
- Language mixing works (if applicable)
- Emoji usage aligns with brand
Edge Cases#
- Handles misspellings and typos
- Responds to off-topic questions
- Manages long/complex queries
- Works with multiple languages (if needed)
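For the misspelling item above, you don't have to invent typos by hand. A small sketch that generates adjacent-character-swap typos for any test query (assumes queries are at least two characters long):

```python
import random

def typo_variants(query: str, n: int = 3, seed: int = 0):
    """Generate n simple adjacent-swap typos of a query for robustness testing."""
    rng = random.Random(seed)  # seeded so test runs are reproducible
    variants = []
    for _ in range(n):
        chars = list(query)
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        variants.append("".join(chars))
    return variants
```

Feed each variant to your agent and check it still retrieves the same answer as the clean query.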
Integration (If Applicable)#
- Embedded chat widget loads correctly
- Facebook Messenger integration works
- Instagram DM integration works
- Webhooks fire correctly
- Lead forms capture data
Performance#
- No timeout errors
- Handles concurrent users
- Maintains quality under load
Iterative Improvement Process#
Testing is ongoing. Follow this cycle:
1. Collect Feedback#
- Review conversation logs
- Analyze user satisfaction ratings
- Identify common complaints
2. Identify Issues#
- What questions are poorly answered?
- What information is missing?
- What tone adjustments are needed?
3. Make Improvements#
- Add new sources for missing topics
- Refine existing Q&A pairs
- Adjust system prompt
- Update retrieval settings
4. Re-Test#
- Verify improvements work
- Check if new issues appeared
- Test edge cases again
5. Deploy & Monitor#
- Push changes to production
- Watch conversation logs closely
- Be ready to roll back if needed
Frequency: Review every 1-2 weeks initially, then monthly.
Advanced Testing Tools#
Conversation Logs#
Review actual user conversations:
- Go to Conversations tab
- Filter by date, rating, or keyword
- Identify patterns in poor responses
What to look for:
- Repeated questions with bad answers
- Users rephrasing the same question
- High drop-off rate on specific topics
Analytics (If Available)#
Track agent performance metrics:
- Response rate: % of questions answered
- Accuracy rate: % of correct answers (based on user feedback)
- Satisfaction score: Average user rating
- Resolution rate: % of conversations resolved without human help
Goal: 80%+ response rate, 4+ stars average satisfaction
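If your platform lets you export conversation logs, these metrics are straightforward to compute yourself. The field names below are illustrative, not a real export schema; adapt them to whatever your platform provides:

```python
# Hypothetical log format: one dict per conversation.
LOGS = [
    {"answered": True, "rating": 5, "resolved": True},
    {"answered": True, "rating": 4, "resolved": True},
    {"answered": False, "rating": 2, "resolved": False},
    {"answered": True, "rating": 5, "resolved": True},
]

def metrics(logs):
    """Response rate, average satisfaction, and resolution rate from logs."""
    n = len(logs)
    rated = [c["rating"] for c in logs if c.get("rating") is not None]
    return {
        "response_rate": sum(c["answered"] for c in logs) / n,
        "satisfaction": sum(rated) / len(rated) if rated else None,
        "resolution_rate": sum(c["resolved"] for c in logs) / n,
    }
```

Tracking these weekly gives you the trend line that the iterative improvement cycle above depends on.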
A/B Testing#
Test different configurations:
- Create two agent versions with different settings
- Route 50% of users to each version
- Compare performance metrics
- Keep the better-performing version
What to test:
- Different system prompts
- Different AI models
- Different retrieval settings
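When comparing the two versions, raw percentages can mislead on small samples. A sketch of a two-proportion z-test (a standard significance check, not a feature of any particular platform) applied to, say, resolution rates:

```python
import math

def ab_compare(success_a, total_a, success_b, total_b):
    """Two-proportion z-test comparing success rates of versions A and B."""
    pa, pb = success_a / total_a, success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (pb - pa) / se if se else 0.0
    # |z| > 1.96 corresponds to significance at the 95% confidence level.
    return {"rate_a": pa, "rate_b": pb, "z": z, "significant": abs(z) > 1.96}
```

For example, 400/500 resolved on version A versus 440/500 on version B is a significant difference; 40/50 versus 44/50 with the same rates is not, so keep collecting traffic before declaring a winner.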
Next Steps#
- Embedding & Deploying - Add agent to your website
- Developer Documentation - Integrate with other tools
- Integrations - Connect to Facebook, Zapier, etc.