Understanding Agents#
Learn how AlonChat agents work under the hood and how to optimize them for your use case.
How Agents Work#
AlonChat agents use Retrieval-Augmented Generation (RAG) to provide accurate, contextual responses:
```
User Question
    ↓
1. Intent Detection (pricing, scheduling, general, etc.)
    ↓
2. Knowledge Retrieval (find relevant chunks from your sources)
    ↓
3. Hybrid Re-Ranking (combine semantic + keyword + metadata signals)
    ↓
4. Context Assembly (build prompt with retrieved knowledge)
    ↓
5. AI Generation (GPT-5.2, Claude, Gemini, or Grok generates the response)
    ↓
Agent Response
```
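The five steps above can be sketched in a few lines of Python. Everything here is illustrative: the function names and the toy word-overlap scoring are assumptions for the sketch, not AlonChat's actual internals.

```python
# Illustrative sketch of the five-step RAG flow; all names and the toy
# word-overlap scoring are hypothetical, not AlonChat internals.

def detect_intent(question: str) -> str:
    # Step 1: toy keyword-based intent detection.
    q = question.lower()
    return "pricing" if ("price" in q or "plan" in q) else "general"

def retrieve(question: str, kb: list[str]) -> list[str]:
    # Step 2: keep chunks that share at least one word with the question.
    words = set(question.lower().split())
    return [c for c in kb if words & set(c.lower().split())]

def rerank(question: str, chunks: list[str]) -> list[str]:
    # Step 3: stand-in for hybrid re-ranking (word overlap only here).
    words = set(question.lower().split())
    return sorted(chunks, key=lambda c: -len(words & set(c.lower().split())))

def assemble_prompt(question: str, chunks: list[str]) -> str:
    # Step 4: build the final prompt from the top-ranked chunks.
    return "Context:\n" + "\n".join(chunks[:3]) + f"\n\nQuestion: {question}"

def answer(question: str, kb: list[str], llm) -> str:
    detect_intent(question)  # intent can steer retrieval (see namespaces below)
    chunks = rerank(question, retrieve(question, kb))
    return llm(assemble_prompt(question, chunks))  # Step 5: generation
```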
This architecture ensures your agent:
- ✅ Answers accurately using YOUR knowledge (not generic AI knowledge)
- ✅ Cites specific sources when possible
- ✅ Stays on-topic and relevant
- ✅ Handles complex, multi-part questions
The Knowledge Base System#
What is a Knowledge Base?#
Your agent's knowledge base consists of all the sources you add:
- Files (PDF, Word, Excel, etc.)
- Text (direct text input)
- Q&A pairs (explicit question-answer mappings)
- Websites (crawled pages)
- Facebook conversations (imported Messenger history)
How Knowledge is Processed#
When you add a source and train your agent:
- Content Extraction: Text is extracted from files, websites, etc.
- Chunking: Content is split into ~2KB chunks (configurable)
- Embedding Generation: Each chunk is converted to a vector embedding (mathematical representation)
- Storage: Chunks and embeddings are stored in the database
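As a rough illustration of the chunking step, a fixed-size splitter might look like the following. The ~2KB figure comes from the list above; the real chunker presumably respects sentence or paragraph boundaries rather than cutting mid-word.

```python
def chunk(text: str, size: int = 2048) -> list[str]:
    """Naive fixed-size chunking: split text into pieces of at most
    `size` characters (~2KB for ASCII text)."""
    return [text[i:i + size] for i in range(0, len(text), size)]
```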
The 4-Namespace System#
AlonChat organizes knowledge into 4 namespaces for intelligent retrieval:
| Namespace | Content | Use Case |
|---|---|---|
| persona | AI personality/style | How the agent communicates (tone, language mixing) |
| docs | Documentation, policies | General knowledge, product info, guides |
| examples | Q&A pairs | Specific question-answer mappings |
| conversation | Chat history | Historical context, relationship patterns |
Each namespace has a configurable priority weight. The system dynamically adjusts which namespaces are prioritized based on the type of question being asked: for example, pricing questions pull more from documentation, while style-related queries lean on persona and conversation history.
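One way to picture the dynamic weighting is a per-intent boost table. The weight values and boosts below are invented for illustration; the real ones are configurable in AlonChat.

```python
BASE_WEIGHTS = {"persona": 1.0, "docs": 1.0, "examples": 1.0, "conversation": 1.0}

# Hypothetical per-intent boosts; actual values are configuration-dependent.
INTENT_BOOSTS = {
    "pricing": {"docs": 1.5, "examples": 1.3},
    "style": {"persona": 1.5, "conversation": 1.3},
}

def namespace_weights(intent: str) -> dict[str, float]:
    """Scale the base namespace weights by the boosts for the detected intent."""
    weights = dict(BASE_WEIGHTS)
    for ns, boost in INTENT_BOOSTS.get(intent, {}).items():
        weights[ns] *= boost
    return weights
```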
Source Priority System#
Sources can have different priorities:
- High Priority: Critical information (pricing, legal, policies) - most likely to be retrieved
- Normal Priority: Standard information - balanced retrieval weight
- Low Priority: Background context (archived docs, conversation history) - retrieved only when highly relevant
How to Set Priority:
- Go to Knowledge Base → Select source → Edit → Set priority
- Use "Is Price" toggle for pricing information (automatically sets high priority)
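Conceptually, priority acts as a retrieval multiplier on a chunk's relevance score, something like the sketch below. The multiplier values are illustrative placeholders, not AlonChat's actual numbers.

```python
# Hypothetical priority multipliers, for illustration only.
PRIORITY_WEIGHT = {"high": 1.5, "normal": 1.0, "low": 0.5}

def weighted_score(base_score: float, priority: str) -> float:
    """Boost or dampen a chunk's relevance score by its source priority."""
    return base_score * PRIORITY_WEIGHT.get(priority, 1.0)
```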
Modern RAG Features#
AlonChat uses modern retrieval techniques for superior accuracy:
1. Multi-Query Expansion#
Impact: +30-50% recall (finds more relevant information)
When you ask "What are your plans?", AlonChat generates:
- Original: "What are your plans?"
- Variant 1: "What subscription options do you offer?"
- Variant 2: "Tell me about your pricing tiers"
All three queries search the knowledge base → more comprehensive results.
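The mechanism can be sketched as follows. Here variant generation uses a fixed lookup table and `search` is a toy corpus lookup; in practice a language model produces the paraphrases and each variant hits the real vector index.

```python
def expand(query: str) -> list[str]:
    """Return the original query plus paraphrased variants (toy table here;
    in practice an LLM generates the paraphrases)."""
    variants = {
        "what are your plans?": [
            "What subscription options do you offer?",
            "Tell me about your pricing tiers",
        ],
    }
    return [query] + variants.get(query.lower().strip(), [])

def multi_query_search(query: str, search) -> set[str]:
    """Run every variant through `search` and union the results."""
    hits: set[str] = set()
    for q in expand(query):
        hits |= set(search(q))
    return hits

# Toy corpus search for demonstration.
def search(q: str) -> list[str]:
    q = q.lower()
    found = []
    if "plans" in q:
        found.append("plans-doc")
    if "subscription" in q:
        found.append("subscriptions-doc")
    if "pricing" in q:
        found.append("pricing-doc")
    return found
```

Searching with only the original query would match one document; the union over all three variants recovers all of them.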
2. Hybrid Re-Ranking#
Combines multiple signals for better result quality:
- Semantic similarity: AI-powered understanding of meaning
- Keyword matching: Exact word and phrase matches
- Metadata signals: Recency, priority, source type
These signals are combined using a proprietary ranking algorithm to surface the most relevant content.
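A linear blend is the simplest version of such a scheme. The weights below are placeholders, since the actual ranking algorithm and its weighting are proprietary.

```python
def hybrid_score(semantic: float, keyword: float, metadata: float,
                 weights: tuple[float, float, float] = (0.6, 0.3, 0.1)) -> float:
    """Blend three normalized (0-1) signals into one relevance score."""
    ws, wk, wm = weights
    return ws * semantic + wk * keyword + wm * metadata
```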
3. Smooth Recency Decay#
Recent sources get a natural boost that decays smoothly over time, ensuring up-to-date information is prioritized without ignoring older content.
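Exponential decay with a half-life is one common way to implement this kind of smooth boost. The curve and default half-life below are assumptions for illustration; AlonChat does not document its exact formula.

```python
def recency_boost(age_days: float, half_life_days: float = 30.0) -> float:
    """Multiplier in (0, 1]: 1.0 for brand-new content, halving every
    `half_life_days`, so old content fades but never drops to zero."""
    return 0.5 ** (age_days / half_life_days)
```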
4. Per-Namespace Timeouts#
Each namespace retrieves independently with its own timeout. If one namespace is slow, the others still return results, ensuring your agent always responds quickly.
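The pattern can be sketched with Python threads. The timeout value and structure here are illustrative, not AlonChat's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def retrieve_all(retrievers, timeout_s: float = 0.5) -> dict:
    """Run each namespace's retriever concurrently; a namespace that
    misses its timeout contributes an empty result instead of blocking."""
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {ns: pool.submit(fn) for ns, fn in retrievers.items()}
        for ns, future in futures.items():
            try:
                results[ns] = future.result(timeout=timeout_s)
            except TimeoutError:
                results[ns] = []  # degrade gracefully, keep responding
    return results

# Example: one fast namespace and one that is too slow.
def fast() -> list[str]:
    return ["doc-chunk"]

def slow() -> list[str]:
    time.sleep(1.0)  # simulates a slow vector search
    return ["late-chunk"]

results = retrieve_all({"docs": fast, "conversation": slow}, timeout_s=0.2)
```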
5. Embedding Cache#
Common query embeddings are cached for faster repeated lookups. Frequently asked questions benefit from significantly reduced response times.
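In Python terms, the idea is essentially memoization of the embedding call. The toy `cached_embed` below is a stand-in for a real embedding model; only the caching pattern is the point.

```python
from functools import lru_cache

calls = {"embed": 0}  # track how often the "model" is actually invoked

@lru_cache(maxsize=1024)
def cached_embed(text: str) -> tuple[float, ...]:
    """Return a cached embedding; compute (here: a toy vector) on a miss."""
    calls["embed"] += 1
    return tuple(float(ord(ch)) for ch in text.lower()[:8])
```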
Agent Configuration Options#
AI Model Selection#
Factors to Consider:

- Credit Cost (per message)
  - Budget (1 credit): Grok 4.1, Gemini Flash, OpenRouter
  - Mid-range (5-10 credits): Claude Haiku, GPT-5.2, Gemini Pro
  - Premium (15-25 credits): Claude Sonnet, Claude Opus
- Quality
  - Best: Claude Opus 4.5, GPT-5.2
  - Great: Claude Sonnet 4.5, Gemini 3 Pro
  - Good: Claude Haiku 4.5, Grok 4.1
- Speed
  - Fastest: Grok 4.1, Gemini Flash (~1-2 seconds)
  - Fast: Claude Haiku, GPT-5.2 (~2-3 seconds)
  - Moderate: Claude Opus (~3-5 seconds)

Recommendation:

- Budget/Testing: Grok 4.1 or Gemini Flash (1 credit)
- Balanced: Claude Haiku 4.5 (5 credits)
- Premium Quality: GPT-5.2 (9 credits) or Claude Sonnet (15 credits)
Temperature Settings#
What is Temperature?
Temperature controls randomness in responses (0.0-1.0):
- 0.0-0.2: Deterministic, consistent
  - Same question → almost identical answer every time
  - Use for: Customer support, FAQ bots, factual Q&A
- 0.3-0.5: Slightly varied
  - Same question → similar but not identical answers
  - Use for: General chatbots, conversational agents
- 0.6-0.8: Creative, diverse
  - Same question → noticeably different answers
  - Use for: Brainstorming, creative writing
- 0.9-1.0: Highly creative (not recommended)
  - Unpredictable, may hallucinate facts
  - Use for: Creative content only
Best Practice: Start at 0.2 for factual bots, 0.5 for conversational bots.
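Mechanically, temperature rescales token probabilities before sampling, which is why 0.0 is deterministic. The toy sampler below illustrates the idea; it is not any real model's code.

```python
import math
import random

def sample(logits: dict[str, float], temperature: float,
           rng: random.Random) -> str:
    """Toy token sampler: greedy at temperature 0, otherwise draw from
    the temperature-scaled softmax distribution."""
    if temperature <= 0:
        return max(logits, key=logits.get)  # deterministic: top token wins
    weights = {tok: math.exp(lg / temperature) for tok, lg in logits.items()}
    r = rng.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point edge case: fall back to the last token
```

Lowering the temperature sharpens the distribution toward the most likely token, so repeated runs converge on the same answer.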
System Prompts#
The system prompt defines your agent's behavior. Good prompts:
✅ Are specific and clear

```
You are a customer support agent for Acme Corp.
Answer questions about our products using the knowledge base.
Be friendly, concise, and accurate.
```

❌ Are vague and generic

```
You are a helpful assistant.
```

✅ Include fallback behavior

```
If you don't know the answer, say:
"I don't have that information. Let me connect you with
a human agent: support@acmecorp.com"
```

❌ Leave gaps in coverage

```
Answer user questions.
```

✅ Define tone and style

```
Use a friendly, casual tone. It's okay to use emojis
occasionally. Keep responses under 3 sentences when possible.
```
Context Window#
What is Context?
Context is the conversation history sent to the AI model:
- Small (5 messages): Fast, cheap, but forgets quickly
- Medium (10 messages): Balanced (recommended for most cases)
- Large (20+ messages): Remembers more, but slower and more expensive
When to Increase Context:
- Long, complex conversations
- User asks follow-up questions referencing earlier messages
- Appointment scheduling (needs to remember multiple details)
When to Decrease Context:
- Simple FAQ bots
- One-off questions (no conversation continuity)
- Budget constraints
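A sliding window over the message history is the standard implementation of this trade-off; the sketch below shows the idea in a few lines.

```python
def build_context(history: list[dict], window: int = 10) -> list[dict]:
    """Send only the most recent `window` messages to the model;
    older messages are dropped to save tokens and latency."""
    return history[-window:] if window > 0 else []
```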
Performance Optimization#
Improving Response Accuracy#
If your agent gives wrong answers:
- Check knowledge coverage
  - Does your knowledge base contain the answer?
  - Add more sources if needed
- Verify source priority
  - Is critical information marked high priority?
  - Use "Is Price" for pricing info
- Review system prompt
  - Are instructions clear?
  - Does it emphasize accuracy?
- Lower temperature
  - Try 0.1-0.2 for maximum consistency
- Add Q&A sources
  - For common questions, add explicit Q&A pairs
  - Q&A sources have higher retrieval priority
Improving Response Speed#
If your agent is slow:
- Switch to a faster model
  - Grok 4.1 or Gemini Flash are fastest (1-2 seconds)
- Reduce the context window
  - Fewer messages = faster processing
- Reduce max tokens
  - Shorter responses generate faster
- Optimize the knowledge base
  - Remove duplicate sources
  - Archive old, unused sources
Reducing Costs#
If your agent is expensive:
- Use budget-friendly models (1 credit)
  - Grok 4.1, Gemini Flash, OpenRouter for most queries
- Use premium models strategically
  - Claude Opus/Sonnet only for complex queries
  - Mix models based on query complexity
- Reduce the context window
  - Fewer messages = lower token usage
- Reduce max tokens
  - Shorter responses = lower costs
- Cache common questions
  - AlonChat automatically caches frequently asked queries for faster responses
Best Practices#
Knowledge Base Management#
- Keep sources organized
  - Use clear, descriptive names
  - Archive outdated sources (don't delete; you might need them later)
- Update regularly
  - Re-train after adding/updating sources
  - Review sources monthly for accuracy
- Use the right source type
  - Files: Documentation, manuals, catalogs
  - Text: Quick notes, policies, single-page content
  - Q&A: Common questions with specific answers
  - Website: Product pages, blog posts, help centers
  - Facebook: Conversation history, brand personality
- Set appropriate priorities
  - High priority: Pricing, legal, critical policies
  - Normal priority: General info, documentation
  - Low priority: Archived content, conversation history
Agent Configuration#
- Start simple, iterate
  - Basic agent → test → add complexity
  - Don't over-configure on day one
- Test with real questions
  - Use actual customer questions, not made-up ones
  - Ask colleagues to test
- Monitor and improve
  - Review chat logs monthly
  - Identify gaps in knowledge
  - Add sources to cover missing topics
- Use feedback
  - Enable thumbs up/down feedback
  - Review negative feedback to improve
Common Mistakes#
❌ Not training after adding sources
- Adding sources doesn't automatically train the agent
- Always click "Train Agent" after changes
❌ Using high temperature for factual Q&A
- Factual bots need low temperature (0.0-0.3)
- High temperature causes inconsistent, inaccurate answers
❌ Vague system prompts
- "You are a helpful assistant" is too generic
- Be specific about role, tone, and behavior
❌ Ignoring source priority
- All sources have equal weight by default
- Mark important info as high priority
❌ Not testing before deployment
- Always test in the chat playground first
- Catch issues before users encounter them
❌ Creating multiple agents for the same purpose
- One agent with comprehensive knowledge > multiple specialized agents
- Easier to manage, more consistent behavior