Training Your Agent

Process and index your knowledge sources so your AI agent can answer questions accurately

After adding knowledge sources (files, websites, Q&A pairs, and more), you need to train your agent to process and index the content. Training converts raw content into searchable, AI-ready knowledge.


What Happens During Training

Training involves multiple automated steps that run in the background.

1. Content Extraction

Content is extracted from each source type:

  • Files -- Text from PDFs, Word docs, Excel spreadsheets, and more
  • Websites -- Downloaded and cleaned HTML pages
  • Google Drive -- Synced and extracted from linked documents
  • Facebook/Instagram -- Parsed imported message history
  • Q&A -- Formatted question-answer pairs
  • Structured Data -- Processed records from Google Sheets

2. Intelligent Chunking

Content is split into small, searchable pieces. The system respects natural boundaries such as paragraphs and sentences to preserve context and meaning.
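
The boundary-respecting split described above can be sketched as follows. This is a minimal illustration, not the product's actual chunker: the function name `chunk_text`, the `max_chars` limit, and the regex-based sentence detection are all assumptions made for the example.

```python
import re


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks, breaking only at paragraph or sentence boundaries.

    Hypothetical sketch: a single sentence longer than max_chars is kept whole
    rather than cut mid-sentence.
    """
    chunks: list[str] = []
    # Paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)
            continue
        # The paragraph is too long: pack whole sentences into chunks.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

With `max_chars=40`, the text `"Short intro.\n\nFirst sentence here. Second sentence here. Third one."` yields three chunks: the short paragraph on its own, then the long paragraph split after its first sentence.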

3. Content Organization

Chunks are organized into categories based on content type so the AI can prioritize the right information for each question:

| Category | Contains | Example |
| --- | --- | --- |
| Documents | Files, websites, text, Google Drive | Product descriptions, policies, guides |
| Q&A | Question-answer pairs | "Q: What is the price? A: P500" (answered verbatim) |
| Business Data | Structured records | Product catalogs, services from Sheets |
| Time-Sensitive | Promos, events | "Summer sale ends June 30" |

Each category has a configurable priority that determines how likely its content is to be retrieved. The system automatically prioritizes the right category based on the question type.
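
One way to picture category priority is as a weight applied to each chunk's match score. The sketch below is an assumption for illustration only: the weight values, the `Candidate` type, and the multiply-then-sort approach are hypothetical and not taken from the product.

```python
from dataclasses import dataclass

# Hypothetical priority weights; real values are set in the agent's configuration.
CATEGORY_PRIORITY = {
    "qa": 1.5,
    "time_sensitive": 1.3,
    "business_data": 1.2,
    "documents": 1.0,
}


@dataclass
class Candidate:
    text: str
    category: str
    similarity: float  # semantic match score between 0 and 1


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by similarity scaled by their category's priority."""
    return sorted(
        candidates,
        key=lambda c: c.similarity * CATEGORY_PRIORITY.get(c.category, 1.0),
        reverse=True,
    )
```

Under these assumed weights, a Q&A pair with a similarity of 0.60 would rank above a document chunk with a similarity of 0.80, because its weighted score is higher.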

4. Embedding Generation

Each chunk is converted into an embedding, a numeric vector that captures its meaning, so the AI can search by meaning rather than by exact wording. For example, "How much?" will match content about "price" even though the words are different.
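
Semantic matching of this kind is commonly done by comparing vectors with cosine similarity. The sketch below assumes that approach and uses tiny hand-made vectors in place of a real embedding model; `TOY_EMBEDDINGS` and `best_match` are hypothetical names for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors, made up by hand. A real embedding model
# produces vectors with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "How much?": [0.9, 0.1, 0.0],
    "The price is P500": [0.8, 0.2, 0.1],
    "We ship within 3 days": [0.1, 0.9, 0.2],
}


def best_match(query: str, chunks: list[str]) -> str:
    """Return the chunk whose vector is closest in direction to the query's."""
    query_vector = TOY_EMBEDDINGS[query]
    return max(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, TOY_EMBEDDINGS[chunk]),
    )
```

Here `best_match("How much?", ["The price is P500", "We ship within 3 days"])` selects the price chunk, even though the query and the chunk share no words.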

5. Indexing

Processed content is stored and indexed with metadata (source, position, priority, recency) so it can be retrieved quickly during conversations.
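
As a rough picture of what an indexed chunk might carry, the sketch below stores the four metadata fields named above next to the chunk text. The class and field names are hypothetical, and a production index would use a dedicated search store rather than an in-memory dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    text: str
    source: str      # file name or URL the chunk came from
    position: int    # order of the chunk within its source
    priority: float  # category priority at indexing time
    updated_at: str  # ISO date, usable as a recency signal


class ChunkIndex:
    """In-memory index keyed by source (illustrative only)."""

    def __init__(self) -> None:
        self._by_source: dict[str, list[IndexedChunk]] = defaultdict(list)

    def add(self, chunk: IndexedChunk) -> None:
        self._by_source[chunk.source].append(chunk)

    def chunks_for(self, source: str) -> list[IndexedChunk]:
        """Return a source's chunks in their original order."""
        return sorted(self._by_source.get(source, []), key=lambda c: c.position)

    def remove_source(self, source: str) -> int:
        """Drop every chunk of a source and return how many were removed."""
        return len(self._by_source.pop(source, []))
```

Keeping the source on every chunk is what would make it cheap to drop or replace a single source without touching the rest of the index.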


How to Train Your Agent

Step 1: Add Knowledge Sources

Before training, add at least one source:

  • Upload files (PDF, Word, etc.)
  • Add a website to crawl
  • Sync Google Drive documents
  • Import Facebook or Instagram data
  • Create Q&A pairs
  • Paste text directly
  • Add structured data from Google Sheets
  • Create time-sensitive promos or events

Tip: Add multiple sources at once before training. Batch processing is more efficient than training one source at a time.

Step 2: Click "Train Agent"

  1. Go to your agent dashboard
  2. Click Sources in the sidebar
  3. Click the Train Agent button (top-right)
  4. Confirm to start training

Step 3: Monitor Progress

Training runs in the background:

  • Real-time progress bar showing chunks processed
  • Step indicators showing the current stage (extraction, chunking, embedding)
  • Time estimate showing approximate remaining time

Typical processing times:

  • 10 pages: approximately 1-2 minutes
  • 100 pages: approximately 5-10 minutes
  • 1,000 pages: approximately 30-60 minutes
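
For page counts between the listed figures, a rough estimate can be interpolated. The sketch below assumes the three data points above and interpolates on a log-log scale. The function name and the interpolation method are illustrative choices, and actual times will vary with content and load.

```python
import math

# (pages, low_minutes, high_minutes), taken from the typical times listed above.
REFERENCE_POINTS = [(10, 1, 2), (100, 5, 10), (1000, 30, 60)]


def estimate_minutes(pages: int) -> tuple[float, float]:
    """Return a rough (low, high) estimate in minutes for a given page count.

    Interpolates between the nearest reference points on a log-log scale and
    extrapolates along the nearest segment outside the 10-1,000 page range.
    """
    if pages < 1:
        raise ValueError("pages must be at least 1")
    if pages <= REFERENCE_POINTS[1][0]:
        start, end = REFERENCE_POINTS[0], REFERENCE_POINTS[1]
    else:
        start, end = REFERENCE_POINTS[1], REFERENCE_POINTS[2]
    (p0, low0, high0), (p1, low1, high1) = start, end
    # t is 0 at the segment's first point and 1 at its second point.
    t = math.log(pages / p0) / math.log(p1 / p0)
    low = low0 * (low1 / low0) ** t
    high = high0 * (high1 / high0) ** t
    return low, high
```

Calling `estimate_minutes(100)` reproduces the listed 5-10 minute range, and values such as 300 pages fall between the 100-page and 1,000-page figures.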

Step 4: Training Complete

When finished, you will see:

  • "Training Complete" status
  • Total chunks created (number of searchable pieces)
  • Sources trained (count of successfully processed sources)

Your agent is now ready to answer questions using the trained content.


When to Re-Train

Re-train your agent after any of the following changes:

Adding New Sources

  • Uploaded new files
  • Added website pages
  • Imported updated Facebook or Instagram data
  • Created new Q&A pairs

Editing Existing Sources

  • Modified Q&A answers
  • Updated text sources
  • Re-crawled a website (new content)

Deleting Sources

  • Removed outdated files
  • Deleted old Q&A pairs
  • Archived unused sources

Changing Configuration

  • Adjusted retrieval settings
  • Changed priority scores

Note: Training is incremental. Only new or changed sources are reprocessed, so re-training is faster after the initial run.
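
One common way to detect new or changed sources is to compare a content fingerprint against the one recorded at the last training run. The sketch below assumes that approach. How the product actually detects changes is not documented here, and `fingerprint` and `plan_training` are hypothetical names.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Return a stable SHA-256 hash of a source's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_training(
    sources: dict[str, str], trained: dict[str, str]
) -> dict[str, list[str]]:
    """Decide which sources need work.

    sources: source name -> current content
    trained: source name -> fingerprint recorded at the last training run
    """
    # New sources have no recorded fingerprint; edited ones have a different one.
    to_process = sorted(
        name
        for name, content in sources.items()
        if trained.get(name) != fingerprint(content)
    )
    # Sources that were trained before but no longer exist must leave the index.
    to_remove = sorted(name for name in trained if name not in sources)
    return {"process": to_process, "remove": to_remove}
```

Unchanged sources appear in neither list, which is why a re-run after a small edit would have far less to do than the initial run.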


Understanding Training Status

Sources have different status indicators:

| Status | Meaning | Action Needed |
| --- | --- | --- |
| Trained (green) | Successfully processed and indexed | None -- ready to use |
| Pending (yellow) | Waiting for training | Click "Train Agent" |
| Processing (blue) | Currently being trained | Wait for completion |
| Failed (red) | Training encountered an error | Check the error message and retry |
| Paused | Training paused by user | Resume or cancel |

Training Performance

Optimization Tips

1. Batch Training

  • Add multiple sources at once, then train once
  • Much faster than training sources individually

2. Use Smaller Files

  • Split very large PDFs into smaller sections
  • Faster processing and easier to update specific sections later

3. Pre-Clean Content

  • Remove unnecessary pages (covers, table of contents, indexes)
  • Delete duplicate content across sources
  • Fix formatting issues before upload
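
Duplicate content across sources can be spotted before upload with a short script. The sketch below is a minimal example that assumes sources are available as plain text and treats paragraphs as the unit of comparison; the function name is hypothetical.

```python
from collections import defaultdict


def find_duplicate_paragraphs(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each repeated paragraph to the sources that contain it.

    Paragraphs are compared after lowercasing and collapsing whitespace,
    so trivial formatting differences do not hide a duplicate.
    """
    locations: dict[str, set[str]] = defaultdict(set)
    for name, content in sources.items():
        for paragraph in content.split("\n\n"):
            normalized = " ".join(paragraph.lower().split())
            if normalized:
                locations[normalized].add(name)
    return {
        text: sorted(names)
        for text, names in locations.items()
        if len(names) > 1
    }
```

Any paragraph the function reports appears in more than one source and is a candidate for removal from all but one of them.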

4. Schedule Large Imports

  • Run large imports (1,000+ pages) during low-traffic periods
  • Avoid peak hours for faster processing

Troubleshooting Training Issues

"Training failed"

Common causes:

  1. File corrupted -- Re-upload the file
  2. Website unreachable -- Check the URL and try again
  3. Timeout -- The file is too large; split it into smaller files

Solution: Click "Retry Training" on the failed source.

"Training stuck"

Possible issues:

  • Large file still processing (be patient)
  • Worker overload (try again later)
  • Network issue (check your connection)

Fix: Wait 10 minutes, refresh the page, and check the status.

"Content not appearing in chat"

Checklist:

  • Training completed successfully (status shows "Trained")
  • Content matches query (test with exact phrases from the source)
  • Source is enabled and not archived

Debug: Test in the playground with exact text from your source to verify retrieval.

"Too many chunks created"

Symptoms:

  • Source created a very large number of chunks
  • Irrelevant content being returned

Solutions:

  1. Remove duplicate content from the source
  2. Split the source into smaller, focused files
  3. Use URL pattern filtering for website sources

"Training uses too many credits"

Reduce costs:

  1. Remove duplicate sources
  2. Pre-process files to remove unnecessary content
  3. Split large files into focused sections covering specific topics

Training Best Practices

1. Quality Over Quantity

  • A few high-quality sources typically outperform many low-quality ones
  • Focus on relevant, accurate, up-to-date content
  • Remove outdated information regularly

2. Organize by Topic

  • Group related sources (e.g., "Pricing", "Shipping", "Returns")
  • Use consistent naming conventions
  • Consistent grouping and naming make retrieval issues easier to debug

3. Test After Training

  • Ask questions to verify the AI retrieves the right content
  • Check that answers are accurate and complete
  • Adjust sources if anything is missing or incorrect

4. Keep Sources Updated

  • Re-train monthly for static content
  • Re-train weekly for frequently changing content
  • Set reminders for website re-crawls
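
A reminder for the cadence above can be computed with a few lines of code. The sketch below assumes "monthly" means every 30 days and "weekly" means every 7 days; the names `RETRAIN_INTERVAL`, `next_retrain`, and `is_due` are hypothetical.

```python
from datetime import date, timedelta

# Assumed intervals: "monthly" is approximated as 30 days.
RETRAIN_INTERVAL = {
    "static": timedelta(days=30),
    "frequent": timedelta(days=7),
}


def next_retrain(last_trained: date, content_kind: str) -> date:
    """Return the date on which the next re-train is due."""
    return last_trained + RETRAIN_INTERVAL[content_kind]


def is_due(last_trained: date, content_kind: str, today: date) -> bool:
    """Return True once the re-train date has been reached."""
    return today >= next_retrain(last_trained, content_kind)
```

A source last trained on June 1 and marked `"frequent"` would come due on June 8 under these assumptions.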

5. Monitor Training Logs

  • Check for failed sources after each training run
  • Review warning messages
  • Fix issues promptly before they affect conversations

Next Steps