Training Your Agent
Process and index your knowledge sources so your AI agent can answer questions accurately
After adding knowledge sources (files, websites, Q&A pairs, and more), you need to train your agent to process and index the content. Training converts raw content into searchable, AI-ready knowledge.
What Happens During Training#
Training involves multiple automated steps that run in the background.
1. Content Extraction#
Content is extracted from each source type:
- Files -- Text from PDFs, Word docs, Excel spreadsheets, and more
- Websites -- Downloaded and cleaned HTML pages
- Google Drive -- Synced and extracted from linked documents
- Facebook/Instagram -- Parsed imported message history
- Q&A -- Formatted question-answer pairs
- Structured Data -- Processed records from Google Sheets
2. Intelligent Chunking#
Content is split into small, searchable pieces along natural boundaries such as paragraphs and sentences, so each piece keeps its context and meaning.
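As a rough illustration of boundary-aware splitting, here is a minimal sketch. The function name `chunk_text` and the `max_chars` limit are assumptions made for this example; the platform's actual chunking rules and sizes are not documented here.

```python
import re


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of roughly max_chars, breaking at paragraph
    boundaries first and at sentence boundaries for long paragraphs."""
    chunks: list[str] = []
    current = ""
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Keep short paragraphs whole; split long ones after ., ! or ?
        if len(paragraph) <= max_chars:
            pieces = [paragraph]
        else:
            pieces = re.split(r"(?<=[.!?])\s+", paragraph)
        for piece in pieces:
            if current and len(current) + 1 + len(piece) > max_chars:
                chunks.append(current)  # current chunk is full; start a new one
                current = piece
            else:
                current = f"{current} {piece}".strip()
    if current:
        chunks.append(current)
    return chunks


sample = "First sentence. Second sentence.\n\nAnother paragraph here."
print(chunk_text(sample, max_chars=40))
# ['First sentence. Second sentence.', 'Another paragraph here.']
```

In this sketch a single sentence longer than `max_chars` is kept whole rather than cut mid-sentence, so chunk sizes are approximate.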
3. Content Organization#
Chunks are organized into categories based on content type so the AI can prioritize the right information for each question:
| Category | Contains | Example |
|---|---|---|
| Documents | Files, websites, text, Google Drive | Product descriptions, policies, guides |
| Q&A | Question-answer pairs | "Q: What is the price? A: P500" (answered verbatim) |
| Business Data | Structured records | Product catalogs, services from Sheets |
| Time-Sensitive | Promos, events | "Summer sale ends June 30" |
Each category has a configurable priority that determines how likely its content is to be retrieved. The system automatically prioritizes the right category based on the question type.
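One simple way to picture category priority is as a weight applied to each candidate's relevance score. The sketch below assumes static weights; the category keys, the weight values, and the `rank_candidates` helper are all hypothetical, and the platform may adjust priorities per question in ways not shown here.

```python
# Hypothetical priority weights, invented for illustration only.
CATEGORY_PRIORITY = {
    "qa": 1.5,
    "time_sensitive": 1.3,
    "business_data": 1.2,
    "documents": 1.0,
}


def rank_candidates(candidates: list[dict]) -> list[dict]:
    """Order candidates by similarity weighted by their category priority."""
    return sorted(
        candidates,
        key=lambda c: c["similarity"] * CATEGORY_PRIORITY[c["category"]],
        reverse=True,
    )


candidates = [
    {"text": "Pricing guide excerpt", "category": "documents", "similarity": 0.80},
    {"text": "Q: What is the price? A: P500", "category": "qa", "similarity": 0.70},
]

for candidate in rank_candidates(candidates):
    print(candidate["category"], candidate["text"])
```

With these made-up weights, the Q&A pair ranks first even though its raw similarity is lower, which is the effect a higher category priority is meant to have.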
4. Embedding Generation#
Each chunk is converted into an embedding, a numeric representation that lets the AI search by meaning rather than by exact wording. For example, "How much?" will match content about "price" even though the words are different.
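Semantic matching is commonly done by comparing vectors with cosine similarity. The sketch below uses tiny, hand-made 3-dimensional vectors purely for illustration; real embeddings come from a model and have far more dimensions, and the similarity measure the platform uses is an assumption here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, made up for this example (not real model output).
query = [0.9, 0.1, 0.0]           # "How much?"
price_chunk = [0.8, 0.2, 0.1]     # content about price
shipping_chunk = [0.1, 0.1, 0.9]  # content about shipping

print(cosine_similarity(query, price_chunk) > cosine_similarity(query, shipping_chunk))
# True
```

The query and the price chunk share no words, but their vectors point in nearly the same direction, so the price chunk scores higher.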
5. Indexing#
Processed content is stored and indexed with metadata (source, position, priority, recency) so it can be retrieved quickly during conversations.
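To make the metadata concrete, an indexed chunk could be modeled as a small record like the one below. The `IndexedChunk` class, its field names, and the `build_index` helper are hypothetical; the actual storage format is not described in this guide.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexedChunk:
    text: str
    source: str      # where the chunk came from
    position: int    # order of the chunk within its source
    priority: float  # priority inherited from the chunk's category
    updated: date    # recency of the underlying content


def build_index(chunks: list[IndexedChunk]) -> dict[str, list[IndexedChunk]]:
    """Group chunks by source, ordered by their position in that source."""
    by_source: dict[str, list[IndexedChunk]] = defaultdict(list)
    for chunk in chunks:
        by_source[chunk.source].append(chunk)
    for items in by_source.values():
        items.sort(key=lambda c: c.position)
    return dict(by_source)


chunks = [
    IndexedChunk("Returns are accepted.", "policies.pdf", 1, 1.0, date(2024, 1, 5)),
    IndexedChunk("Shipping takes 3 days.", "policies.pdf", 0, 1.0, date(2024, 1, 5)),
]
index = build_index(chunks)
print([c.position for c in index["policies.pdf"]])
# [0, 1]
```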
How to Train Your Agent#
Step 1: Add Knowledge Sources#
Before training, add at least one source:
- Upload files (PDF, Word, etc.)
- Add a website to crawl
- Sync Google Drive documents
- Import Facebook or Instagram data
- Create Q&A pairs
- Paste text directly
- Add structured data from Google Sheets
- Create time-sensitive promos or events
Tip: Add multiple sources at once before training. Batch processing is more efficient than training one source at a time.
Step 2: Click "Train Agent"#
1. Go to your agent dashboard
2. Click Sources in the sidebar
3. Click the Train Agent button (top-right)
4. Confirm to start training
Step 3: Monitor Progress#
Training runs in the background. While it runs, you will see:
- Real-time progress bar showing chunks processed
- Step indicators showing the current stage (extraction, chunking, embedding)
- Time estimate showing approximate remaining time
Typical processing times:
- 10 pages: approximately 1-2 minutes
- 100 pages: approximately 5-10 minutes
- 1,000 pages: approximately 30-60 minutes
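For planning purposes, the typical times above can be turned into a rough estimate for other page counts. The sketch below interpolates linearly between the three listed data points; the `estimate_minutes` helper and the choice of linear interpolation are assumptions, and actual times depend on content and load.

```python
# (pages, low_minutes, high_minutes) taken from the typical times listed above
TYPICAL_TIMES = [(10, 1, 2), (100, 5, 10), (1000, 30, 60)]


def estimate_minutes(pages: int) -> tuple[float, float]:
    """Return a rough (low, high) estimate in minutes for a page count."""
    first, last = TYPICAL_TIMES[0], TYPICAL_TIMES[-1]
    if pages <= first[0]:
        return float(first[1]), float(first[2])
    if pages >= last[0]:
        # No data beyond the last point: treat its range as a lower bound.
        return float(last[1]), float(last[2])
    for (p0, lo0, hi0), (p1, lo1, hi1) in zip(TYPICAL_TIMES, TYPICAL_TIMES[1:]):
        if p0 <= pages <= p1:
            t = (pages - p0) / (p1 - p0)  # fraction of the way from p0 to p1
            return lo0 + t * (lo1 - lo0), hi0 + t * (hi1 - hi0)
    raise ValueError("unreachable for a sorted TYPICAL_TIMES table")


print(estimate_minutes(55))
# (3.0, 6.0)
```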
Step 4: Training Complete#
When finished, you will see:
- "Training Complete" status
- Total chunks created (number of searchable pieces)
- Sources trained (count of successfully processed sources)
Your agent is now ready to answer questions using the trained content.
When to Re-Train#
Re-train your agent after any of the following changes.
Adding New Sources#
- Uploaded new files
- Added website pages
- Imported updated Facebook or Instagram data
- Created new Q&A pairs
Editing Existing Sources#
- Modified Q&A answers
- Updated text sources
- Re-crawled a website (new content)
Deleting Sources#
- Removed outdated files
- Deleted old Q&A pairs
- Archived unused sources
Changing Configuration#
- Adjusted retrieval settings
- Changed priority scores
Note: Training is incremental. Only new or changed sources are reprocessed, so re-training is faster after the initial run.
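A common way to implement incremental processing is to store a content hash per source and reprocess only the sources whose hash has changed. The sketch below shows that idea; how the platform actually detects changes is not stated in this guide, and the function names are hypothetical.

```python
import hashlib


def content_hash(content: str) -> str:
    """Return a stable fingerprint of a source's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def sources_to_reprocess(
    sources: dict[str, str], trained_hashes: dict[str, str]
) -> list[str]:
    """Return names of sources that are new or changed since the last run."""
    return [
        name
        for name, content in sources.items()
        if trained_hashes.get(name) != content_hash(content)
    ]


# Hashes recorded after the previous training run (example data).
trained = {
    "faq": content_hash("Q: What is the price? A: P500"),
    "promo": content_hash("Summer sale ends June 30"),
}
current = {
    "faq": "Q: What is the price? A: P500",  # unchanged
    "promo": "Summer sale ends July 15",     # edited
    "guide": "How to place an order",        # newly added
}
print(sources_to_reprocess(current, trained))
# ['promo', 'guide']
```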
Understanding Training Status#
Sources have different status indicators:
| Status | Meaning | Action Needed |
|---|---|---|
| Trained (green) | Successfully processed and indexed | None -- ready to use |
| Pending (yellow) | Waiting for training | Click "Train Agent" |
| Processing (blue) | Currently being trained | Wait for completion |
| Failed (red) | Training encountered an error | Check the error message and retry |
| Paused | Training paused by user | Resume or cancel |
Training Performance#
Optimization Tips#
1. Batch Training
- Add multiple sources at once, then train once
- Faster than training sources one at a time
2. Use Smaller Files
- Split very large PDFs into smaller sections
- Faster processing and easier to update specific sections later
3. Pre-Clean Content
- Remove unnecessary pages (covers, table of contents, indexes)
- Delete duplicate content across sources
- Fix formatting issues before upload
4. Schedule Large Imports
- Run large imports (1,000+ pages) during low-traffic periods
- Avoid peak hours for faster processing
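Duplicate content across sources can be spotted before upload with a short script. The sketch below is one possible approach, assuming your sources are available as plain text; it compares paragraphs after normalizing case and whitespace, and the helper names are hypothetical.

```python
import hashlib
import re


def normalize(paragraph: str) -> str:
    """Collapse whitespace and lowercase so trivial differences are ignored."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def find_duplicate_paragraphs(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each paragraph found in more than one source to those sources."""
    first_text: dict[str, str] = {}
    owners: dict[str, list[str]] = {}
    for name, text in sources.items():
        for paragraph in re.split(r"\n\s*\n", text):
            norm = normalize(paragraph)
            if not norm:
                continue
            key = hashlib.sha256(norm.encode("utf-8")).hexdigest()
            first_text.setdefault(key, paragraph.strip())
            if name not in owners.setdefault(key, []):
                owners[key].append(name)
    return {first_text[k]: names for k, names in owners.items() if len(names) > 1}


sources = {
    "faq.txt": "Free shipping over P500.\n\nReturns within 7 days.",
    "policy.txt": "free  shipping over P500.\n\nWarranty lasts 1 year.",
}
print(find_duplicate_paragraphs(sources))
# {'Free shipping over P500.': ['faq.txt', 'policy.txt']}
```

This only catches paragraphs that are identical after normalization; reworded duplicates would need a fuzzier comparison.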
Troubleshooting Training Issues#
"Training failed"#
Common causes:
- File corrupted -- Re-upload the file
- Website unreachable -- Check the URL and try again
- Timeout -- The file is too large; split it into smaller files
Solution: Fix the underlying cause, then click "Retry Training" on the failed source.
"Training stuck"#
Possible issues:
- Large file still processing (be patient)
- Worker overload (try again later)
- Network issue (check your connection)
Fix: Wait 10 minutes, refresh the page, and check the status.
"Content not appearing in chat"#
Checklist:
- Training completed successfully (status shows "Trained")
- Content matches query (test with exact phrases from the source)
- Source is enabled and not archived
Debug: Test in the playground with exact text from your source to verify retrieval.
"Too many chunks created"#
Symptoms:
- Source created a very large number of chunks
- Irrelevant content being returned
Solutions:
- Remove duplicate content from the source
- Split the source into smaller, focused files
- Use URL pattern filtering for website sources
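To preview which pages a URL pattern would keep, you can test patterns against a list of URLs before crawling. The sketch below assumes glob-style patterns (`*` matches any run of characters); the pattern syntax the platform accepts is not documented here, and `filter_urls` is a hypothetical helper.

```python
from collections.abc import Sequence
from fnmatch import fnmatchcase


def filter_urls(
    urls: Sequence[str], include: Sequence[str], exclude: Sequence[str] = ()
) -> list[str]:
    """Keep URLs matching any include pattern and no exclude pattern."""
    return [
        url
        for url in urls
        if any(fnmatchcase(url, pattern) for pattern in include)
        and not any(fnmatchcase(url, pattern) for pattern in exclude)
    ]


urls = [
    "https://example.com/docs/pricing",
    "https://example.com/blog/2020/old-post",
    "https://example.com/docs/archive/v1",
]
kept = filter_urls(
    urls,
    include=["https://example.com/docs/*"],
    exclude=["*/archive/*"],
)
print(kept)
# ['https://example.com/docs/pricing']
```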
"Training uses too many credits"#
Reduce costs:
- Remove duplicate sources
- Pre-process files to remove unnecessary content
- Split large files into focused sections covering specific topics
Training Best Practices#
1. Quality Over Quantity#
- A few high-quality sources tend to outperform many low-quality ones
- Focus on relevant, accurate, up-to-date content
- Remove outdated information regularly
2. Organize by Topic#
- Group related sources (e.g., "Pricing", "Shipping", "Returns")
- Use consistent naming conventions
- Consistent organization makes retrieval issues easier to debug
3. Test After Training#
- Ask questions to verify the AI retrieves the right content
- Check that answers are accurate and complete
- Adjust sources if anything is missing or incorrect
4. Keep Sources Updated#
- Re-train monthly for static content
- Re-train weekly for frequently changing content
- Set reminders for website re-crawls
5. Monitor Training Logs#
- Check for failed sources after each training run
- Review warning messages
- Fix issues promptly before they affect conversations
Next Steps#
- Testing Your Agent -- Verify training results with thorough testing
- Web Widget -- Add your agent to your website
- Understanding Agents -- Learn how retrieval and AI responses work