understanding-agents

Understanding Agents#

Learn how AlonChat agents work under the hood and how to optimize them for your use case.

How Agents Work#

AlonChat agents use Retrieval-Augmented Generation (RAG) to provide accurate, contextual responses:

Code
User Question
    ↓
1. Intent Detection (pricing, scheduling, general, etc.)
    ↓
2. Knowledge Retrieval (find relevant chunks from your sources)
    ↓
3. Hybrid Re-Ranking (combine semantic + keyword + metadata signals)
    ↓
4. Context Assembly (build prompt with retrieved knowledge)
    ↓
5. AI Generation (a model such as GPT-5.2, Claude, Gemini, or Grok generates the response)
    ↓
Agent Response

This architecture ensures your agent:

  • ✅ Answers accurately using YOUR knowledge (not generic AI knowledge)
  • ✅ Cites specific sources when possible
  • ✅ Stays on-topic and relevant
  • ✅ Handles complex, multi-part questions
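
The pipeline above can be sketched in a few lines of Python. Every helper name here is an illustrative stand-in, not AlonChat's actual API:

```python
# Illustrative sketch of the RAG pipeline; nothing here is AlonChat's real internals.

def detect_intent(question: str) -> str:
    """Step 1: classify the question (pricing, scheduling, general, ...)."""
    lowered = question.lower()
    if "price" in lowered or "cost" in lowered or "plan" in lowered:
        return "pricing"
    return "general"

def answer(question: str, knowledge_base, model) -> str:
    """Run steps 1-5 for a single user question."""
    intent = detect_intent(question)                    # 1. intent detection
    chunks = knowledge_base.retrieve(question, intent)  # 2. knowledge retrieval
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)  # 3. re-ranking
    context = "\n\n".join(c.text for c in ranked[:5])   # 4. context assembly
    return model.generate(question, context)            # 5. AI generation
```

The `knowledge_base` and `model` objects are placeholders for whichever retrieval store and LLM provider are configured.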

The Knowledge Base System#

What is a Knowledge Base?#

Your agent's knowledge base consists of all the sources you add:

  • Files (PDF, Word, Excel, etc.)
  • Text (direct text input)
  • Q&A pairs (explicit question-answer mappings)
  • Websites (crawled pages)
  • Facebook conversations (imported Messenger history)

How Knowledge is Processed#

When you add a source and train your agent:

  1. Content Extraction: Text is extracted from files, websites, etc.
  2. Chunking: Content is split into ~2KB chunks (configurable)
  3. Embedding Generation: Each chunk is converted to a vector embedding (mathematical representation)
  4. Storage: Chunks and embeddings are stored in the database
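
Step 2 (chunking) can be sketched as a paragraph-aware splitter, assuming "~2KB" means roughly 2048 characters. Production chunkers also handle oversized paragraphs and overlap between chunks; this is a minimal version:

```python
def chunk_text(text: str, chunk_size: int = 2048) -> list[str]:
    """Split extracted text into ~2KB chunks, breaking on paragraph
    boundaries where possible (simplified: a single paragraph longer
    than chunk_size is kept whole)."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > chunk_size:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```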

The 4-Namespace System#

AlonChat organizes knowledge into 4 namespaces for intelligent retrieval:

Namespace      Content                   Use Case
persona        AI personality/style      How the agent communicates (tone, language mixing)
docs           Documentation, policies   General knowledge, product info, guides
examples       Q&A pairs                 Specific question-answer mappings
conversation   Chat history              Historical context, relationship patterns

Each namespace has a configurable priority weight. The system dynamically adjusts which namespaces are prioritized based on the type of question being asked - for example, pricing questions pull more from documentation, while style-related queries lean on persona and conversation history.
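
One way to model intent-based weighting: scale each namespace's base weight by an intent-specific boost, then renormalize. The weights and boost factors below are invented for illustration; AlonChat's real values are configurable:

```python
# Hypothetical weights, for illustration only.
BASE_WEIGHTS = {"persona": 0.2, "docs": 0.4, "examples": 0.3, "conversation": 0.1}

INTENT_BOOSTS = {
    "pricing": {"docs": 1.5, "examples": 1.2},
    "style":   {"persona": 1.5, "conversation": 1.3},
}

def namespace_weights(intent: str) -> dict[str, float]:
    """Scale base namespace weights by intent, then renormalize to sum to 1."""
    boosts = INTENT_BOOSTS.get(intent, {})
    raw = {ns: w * boosts.get(ns, 1.0) for ns, w in BASE_WEIGHTS.items()}
    total = sum(raw.values())
    return {ns: w / total for ns, w in raw.items()}
```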

Source Priority System#

Sources can have different priorities:

  • High Priority: Critical information (pricing, legal, policies) - most likely to be retrieved
  • Normal Priority: Standard information - balanced retrieval weight
  • Low Priority: Background context (archived docs, conversation history) - retrieved only when highly relevant

How to Set Priority:

  • Go to Knowledge Base → Select source → Edit → Set priority
  • Use "Is Price" toggle for pricing information (automatically sets high priority)

Modern RAG Features#

AlonChat uses modern retrieval techniques for superior accuracy:

1. Multi-Query Expansion#

Impact: +30-50% recall (finds more relevant information)

When you ask "What are your plans?", AlonChat generates:

  • Original: "What are your plans?"
  • Variant 1: "What subscription options do you offer?"
  • Variant 2: "Tell me about your pricing tiers"

All three queries search the knowledge base → more comprehensive results.
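
The mechanics can be sketched as: expand the query, search once per variant, and merge results with deduplication. In production the variants come from an LLM; the hand-written table here is a stand-in:

```python
def expand_query(question: str) -> list[str]:
    """Return the original query plus paraphrased variants.
    Real variants are LLM-generated; this table is illustrative."""
    variants = {
        "what are your plans?": [
            "What subscription options do you offer?",
            "Tell me about your pricing tiers",
        ],
    }
    return [question] + variants.get(question.lower(), [])

def multi_query_search(question: str, search) -> list[str]:
    """Run every variant through `search` and merge the result lists,
    deduplicating by chunk id while preserving first-seen order."""
    seen, merged = set(), []
    for q in expand_query(question):
        for chunk_id in search(q):
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged
```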

2. Hybrid Re-Ranking#

Combines multiple signals for better result quality:

  • Semantic similarity: AI-powered understanding of meaning
  • Keyword matching: Exact word and phrase matches
  • Metadata signals: Recency, priority, source type

These signals are combined using a proprietary ranking algorithm to surface the most relevant content.

3. Smooth Recency Decay#

Recent sources get a natural boost that decays smoothly over time, ensuring up-to-date information is prioritized without ignoring older content.
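
A smooth decay is typically exponential: a brand-new source gets the full boost, which halves after each half-life. The 30-day half-life below is an assumed example value:

```python
def recency_boost(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential recency decay: 1.0 for a brand-new source,
    halving every `half_life_days` (half-life value is illustrative)."""
    return 0.5 ** (age_days / half_life_days)
```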

4. Per-Namespace Timeouts#

Each namespace retrieves independently with its own timeout. If one namespace is slow, others still return results - ensuring your agent always responds quickly.

5. Embedding Cache#

Common query embeddings are cached for faster repeated lookups. Frequently asked questions benefit from significantly reduced response times.
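
The caching pattern itself is simple; in Python it can be as small as an `lru_cache` wrapper around the embedding call. The vector returned below is a dummy stand-in for a real embedding model:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def embed(text: str) -> tuple[float, ...]:
    """Cache embeddings for repeated queries. The character-code
    'vector' below is a placeholder for a real embedding model call."""
    return tuple(float(ord(c)) for c in text[:8])
```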

Agent Configuration Options#

AI Model Selection#

Factors to Consider:

  1. Credit Cost (per message)

    • Budget (1 credit): Grok 4.1, Gemini Flash, OpenRouter
    • Mid-range (5-10 credits): Claude Haiku, GPT-5.2, Gemini Pro
    • Premium (15-25 credits): Claude Sonnet, Claude Opus
  2. Quality

    • Best: Claude Opus 4.5, GPT-5.2
    • Great: Claude Sonnet 4.5, Gemini 3 Pro
    • Good: Claude Haiku 4.5, Grok 4.1
  3. Speed

    • Fastest: Grok 4.1, Gemini Flash (~1-2 seconds)
    • Fast: Claude Haiku, GPT-5.2 (~2-3 seconds)
    • Moderate: Claude Opus (~3-5 seconds)

Recommendation:

  • Budget/Testing: Grok 4.1 or Gemini Flash (1 credit)
  • Balanced: Claude Haiku 4.5 (5 credits)
  • Premium Quality: GPT-5.2 (9 credits) or Claude Sonnet (15 credits)

Temperature Settings#

What is Temperature?

Temperature controls randomness in responses (0.0-1.0):

  • 0.0-0.2: Deterministic, consistent

    • Same question → almost identical answer every time
    • Use for: Customer support, FAQ bots, factual Q&A
  • 0.3-0.5: Slightly varied

    • Same question → similar but not identical answers
    • Use for: General chatbots, conversational agents
  • 0.6-0.8: Creative, diverse

    • Same question → noticeably different answers
    • Use for: Brainstorming, creative writing
  • 0.9-1.0: Highly creative (not recommended)

    • Unpredictable, may hallucinate facts
    • Use for: Creative content only

Best Practice: Start at 0.2 for factual bots, 0.5 for conversational bots.
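
Under the hood, temperature divides the model's logits before the softmax step. A minimal sketch shows why low values are near-deterministic and high values are varied:

```python
import math

def sample_distribution(logits: list[float], temperature: float) -> list[float]:
    """Softmax with temperature: low values concentrate probability on
    the top token (consistent output), high values flatten it (varied)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```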

System Prompts#

The system prompt defines your agent's behavior. Strong prompts (✅) share a few traits, while weak ones (❌) fall into common pitfalls:

✅ Are specific and clear

Code
You are a customer support agent for Acme Corp.
Answer questions about our products using the knowledge base.
Be friendly, concise, and accurate.

❌ Are vague and generic

Code
You are a helpful assistant.

✅ Include fallback behavior

Code
If you don't know the answer, say:
"I don't have that information. Let me connect you with
a human agent: support@acmecorp.com"

❌ Leave gaps in coverage

Code
Answer user questions.

✅ Define tone and style

Code
Use a friendly, casual tone. It's okay to use emojis
occasionally. Keep responses under 3 sentences when possible.

Context Window#

What is Context?

Context is the conversation history sent to the AI model:

  • Small (5 messages): Fast, cheap, but forgets quickly
  • Medium (10 messages): Balanced (recommended for most cases)
  • Large (20+ messages): Remembers more, but slower and more expensive
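
Trimming the context window is just keeping the most recent slice of the conversation history, as in this sketch:

```python
def build_context(history: list[dict], max_messages: int = 10) -> list[dict]:
    """Keep only the most recent messages, preserving order, so token
    usage stays bounded (10 is the balanced medium setting)."""
    return history[-max_messages:]
```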

When to Increase Context:

  • Long, complex conversations
  • User asks follow-up questions referencing earlier messages
  • Appointment scheduling (needs to remember multiple details)

When to Decrease Context:

  • Simple FAQ bots
  • One-off questions (no conversation continuity)
  • Budget constraints

Performance Optimization#

Improving Response Accuracy#

If your agent gives wrong answers:

  1. Check knowledge coverage

    • Does your knowledge base contain the answer?
    • Add more sources if needed
  2. Verify source priority

    • Is critical information marked high priority?
    • Use "Is Price" for pricing info
  3. Review system prompt

    • Are instructions clear?
    • Does it emphasize accuracy?
  4. Lower temperature

    • Try 0.1-0.2 for maximum consistency
  5. Add Q&A sources

    • For common questions, add explicit Q&A pairs
    • Q&A sources have higher retrieval priority

Improving Response Speed#

If your agent is slow:

  1. Switch to faster model

    • Grok 4.1 or Gemini Flash are fastest (1-2 seconds)
  2. Reduce context window

    • Fewer messages = faster processing
  3. Reduce max tokens

    • Shorter responses generate faster
  4. Optimize knowledge base

    • Remove duplicate sources
    • Archive old, unused sources

Reducing Costs#

If your agent is expensive:

  1. Use budget-friendly models (1 credit)

    • Grok 4.1, Gemini Flash, OpenRouter for most queries
  2. Use premium models strategically

    • Claude Opus/Sonnet only for complex queries
    • Mix models based on query complexity
  3. Reduce context window

    • Fewer messages = lower token usage
  4. Reduce max tokens

    • Shorter responses = lower costs
  5. Cache common questions

    • AlonChat automatically caches frequently asked queries for faster responses

Best Practices#

Knowledge Base Management#

  1. Keep sources organized

    • Use clear, descriptive names
    • Archive outdated sources (don't delete - you might need them later)
  2. Update regularly

    • Re-train after adding/updating sources
    • Review sources monthly for accuracy
  3. Use the right source type

    • Files: Documentation, manuals, catalogs
    • Text: Quick notes, policies, single-page content
    • Q&A: Common questions with specific answers
    • Website: Product pages, blog posts, help centers
    • Facebook: Conversation history, brand personality
  4. Set appropriate priorities

    • High priority: Pricing, legal, critical policies
    • Normal priority: General info, documentation
    • Low priority: Archived content, conversation history

Agent Configuration#

  1. Start simple, iterate

    • Basic agent → test → add complexity
    • Don't over-configure on day one
  2. Test with real questions

    • Use actual customer questions, not made-up ones
    • Ask colleagues to test
  3. Monitor and improve

    • Review chat logs monthly
    • Identify gaps in knowledge
    • Add sources to cover missing topics
  4. Use feedback

    • Enable thumbs up/down feedback
    • Review negative feedback to improve

Common Mistakes#

Not training after adding sources

  • Adding sources doesn't automatically train the agent
  • Always click "Train Agent" after changes

Using high temperature for factual Q&A

  • Factual bots need low temperature (0.0-0.3)
  • High temperature causes inconsistent, inaccurate answers

Vague system prompts

  • "You are a helpful assistant" is too generic
  • Be specific about role, tone, and behavior

Ignoring source priority

  • All sources have equal weight by default
  • Mark important info as high priority

Not testing before deployment

  • Always test in the chat playground first
  • Catch issues before users encounter them

Creating multiple agents for the same purpose

  • One agent with comprehensive knowledge > multiple specialized agents
  • Easier to manage, more consistent behavior