Training Your Agent#
After adding knowledge sources (files, websites, Q&A, etc.), you need to train your agent to process and index the content. Training converts raw content into searchable, AI-ready knowledge.
What Happens During Training#
Training involves multiple automated steps:
1. Content Extraction#
- Files: Extract text from PDFs, Word docs, Excel, etc.
- Websites: Download and clean HTML pages
- Facebook: Unzip and parse message history
- Q&A: Format question-answer pairs
2. Chunking#
Content is split into small, searchable pieces (chunks):
- Default size: 2,000 characters per chunk
- Overlap: 200 characters between chunks (preserves context)
- Smart splitting: Respects paragraphs and sentences
Example:
Source: 10-page PDF (25,000 characters)
Result: ~14 chunks of ~2,000 chars each (the 200-char overlap adds roughly one extra chunk beyond 25,000 ÷ 2,000)
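The split-with-overlap behavior can be sketched in a few lines of Python. This is a simplified illustration, not AlonChat's actual splitter: the defaults mirror the values listed above, but a real splitter also respects paragraph and sentence boundaries.

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping fixed-size chunks (simplified sketch)."""
    chunks = []
    step = chunk_size - overlap  # each chunk starts 1,800 chars after the last
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "x" * 25_000      # roughly a 10-page PDF worth of text
chunks = chunk_text(doc)
print(len(chunks))       # 14 overlapping chunks
```

Each chunk shares its first 200 characters with the tail of the previous chunk, which is what preserves context across the split points.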
3. Namespace Assignment#
Chunks are organized into 4 namespaces based on content type:
| Namespace | Contains | Example |
|---|---|---|
| persona | AI communication style | "Mix English-Tagalog, use emojis" |
| examples | Q&A pairs | "Q: Price? A: ₱500" |
| docs | Files, websites, text | Product descriptions, policies |
| conversation | Chat history | Past conversations |
Each namespace has a configurable priority that determines how likely its content is to be retrieved. The system automatically prioritizes the right namespace based on the question type.
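The routing step can be pictured as a simple lookup from source type to namespace. The mapping keys below are illustrative assumptions; AlonChat's internal source-type identifiers may differ.

```python
# Hypothetical source-type -> namespace routing (for illustration only).
NAMESPACE_BY_SOURCE = {
    "persona": "persona",       # AI communication style
    "qa": "examples",           # Q&A pairs
    "file": "docs",             # uploaded files
    "website": "docs",          # crawled pages
    "text": "docs",             # pasted text
    "facebook": "conversation", # imported chat history
}

def assign_namespace(source_type: str) -> str:
    # Fall back to "docs" for unrecognized source types.
    return NAMESPACE_BY_SOURCE.get(source_type, "docs")

print(assign_namespace("qa"))       # examples
print(assign_namespace("website"))  # docs
```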
4. Embedding Generation#
Each chunk is converted to a vector embedding using AI:
- Purpose: Enables semantic search ("How much?" matches "price")
- Embeddings capture the meaning of text, not just keywords
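The idea behind semantic matching can be shown with cosine similarity over toy vectors. Real embeddings come from an AI model and have hundreds or thousands of dimensions; the hand-made 3-D vectors below simply illustrate that similar meanings land close together while keyword overlap is irrelevant.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-D "embeddings": nearby meanings get nearby vectors.
query = [0.9, 0.1, 0.0]  # "How much?"
price = [0.8, 0.2, 0.1]  # "Our price is ₱500"
hours = [0.1, 0.9, 0.2]  # "Open 9am-6pm daily"

# "How much?" shares no keywords with the price chunk, yet it matches.
print(cosine_similarity(query, price) > cosine_similarity(query, hours))  # True
```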
5. Indexing#
Vectors are stored in the database with metadata:
- Source ID, position, namespace
- Priority score, recency timestamp
- Original text content
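A stored chunk record might look like the following sketch. The field names are illustrative assumptions based on the metadata listed above, not AlonChat's actual schema.

```python
from dataclasses import dataclass, field
import time

@dataclass
class ChunkRecord:
    """Illustrative shape of an indexed chunk (field names are assumptions)."""
    source_id: str                 # which knowledge source produced it
    position: int                  # chunk index within that source
    namespace: str                 # persona / examples / docs / conversation
    priority: float                # retrieval priority score
    indexed_at: float              # recency timestamp
    text: str                      # original chunk text
    embedding: list[float] = field(default_factory=list)  # vector from step 4

record = ChunkRecord("src-123", 0, "docs", 1.0, time.time(),
                     "Shipping takes 3-5 business days.")
print(record.namespace)  # docs
```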
How to Train Your Agent#
Step 1: Add Knowledge Sources#
Before training, add at least one source:
- Upload files (PDF, Word, etc.)
- Add website to crawl
- Import Facebook data
- Create Q&A pairs
- Paste text directly
Tip: Add multiple sources at once before training (more efficient).
Step 2: Click "Train Agent"#
- Go to your agent's Settings page
- Click Knowledge Base tab
- Click Train Agent button (top-right)
- Confirm training start
Step 3: Monitor Progress#
Training runs in the background:
- Real-time progress bar - Shows chunks processed
- Step indicators - Current stage (extraction → chunking → embedding)
- Time estimate - Remaining time (approx.)
Processing time:
- 10 pages: ~1-2 minutes
- 100 pages: ~5-10 minutes
- 1,000 pages: ~30-60 minutes
Step 4: Training Complete#
When done, you'll see:
- ✅ "Training Complete" status
- Total chunks: Number of searchable pieces created
- Sources trained: Count of successfully processed sources
What's next: Your agent is now ready to chat!
When to Re-Train#
Re-train your agent when:
1. Adding New Sources#
- Uploaded new files
- Added website pages
- Imported updated Facebook data
- Created new Q&A pairs
2. Editing Existing Sources#
- Modified Q&A answers
- Updated text sources
- Re-crawled website (new content)
3. Deleting Sources#
- Removed outdated files
- Deleted old Q&A pairs
- Archived unused sources
4. Changing Configuration#
- Modified chunking settings
- Adjusted retrieval weights
- Changed priority scores
Note: Training is incremental - only new/changed sources are reprocessed.
Understanding Training Status#
Sources have different status indicators:
| Status | Meaning | Action Needed |
|---|---|---|
| 🟢 Trained | Successfully processed and indexed | None - ready to use |
| 🟡 Pending | Waiting for training | Click "Train Agent" |
| 🔄 Processing | Currently being trained | Wait for completion |
| 🔴 Failed | Training encountered error | Check error message, retry |
| ⏸️ Paused | Training paused by user | Resume or cancel |
Training Options (Advanced)#
Chunking Settings#
Control how content is split (Admin only):
Text Chunk Size (default: 2,000 chars)
- Smaller (1,000): More precise, slower retrieval
- Larger (4,000): More context, less precise
Q&A Chunk Size (default: 10,000 chars)
- Q&A pairs are larger to preserve full answers
Website Chunk Size (default: 2,000 chars)
- Optimized for web content structure
Chunk Overlap (default: 200 chars)
- Preserves context across chunk boundaries
Embedding Model#
The embedding model controls the quality and speed of semantic search. AlonChat uses an optimized default, but advanced users can adjust this in the admin settings.
Namespace Priority#
Adjust retrieval priorities for each namespace (Admin only). Higher priority means content from that namespace is more likely to be retrieved during conversations. The persona namespace is always included by default.
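One way to picture priority weighting: each candidate chunk's similarity score is scaled by its namespace's weight, so equally relevant chunks from higher-priority namespaces rank first. The weights and formula below are illustrative assumptions, not AlonChat's exact algorithm.

```python
# Hypothetical per-namespace weights; higher = retrieved more often.
NAMESPACE_WEIGHTS = {
    "persona": 1.0,       # always included by default
    "examples": 0.9,
    "docs": 0.7,
    "conversation": 0.5,
}

def weighted_score(similarity: float, namespace: str) -> float:
    """Scale a raw similarity score by the namespace's retrieval weight."""
    return similarity * NAMESPACE_WEIGHTS.get(namespace, 0.5)

# Two chunks with equal raw similarity: the "examples" chunk ranks higher.
print(round(weighted_score(0.8, "examples"), 2))  # 0.72
print(round(weighted_score(0.8, "docs"), 2))      # 0.56
```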
Training Performance#
Optimization Tips#
1. Batch Training
- Add multiple sources at once
- Train once instead of individually
- Saves time: 10 sources × 1 min = 10 min vs. one batch ≈ 3 min
2. Use Smaller Files
- Split 100-page PDFs into 10-page sections
- Faster processing, better organization
- Easier to update specific sections
3. Pre-Clean Content
- Remove unnecessary pages (covers, indexes)
- Delete duplicate content
- Fix formatting issues before upload
4. Schedule Off-Peak Training
- Large imports (1,000+ pages) during low traffic
- Avoid peak hours for faster processing
Monitoring System Load#
Training uses background workers:
- Concurrency: 2-3 sources processed simultaneously
- Queue: Remaining sources wait in line
- Priority: Manual training > automatic re-crawls
Check worker status: Admin dashboard shows active jobs
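The worker model can be sketched with a bounded thread pool: a fixed number of workers process sources concurrently while the rest wait in the queue. This illustrates the concurrency pattern only, not the platform's actual job runner.

```python
from concurrent.futures import ThreadPoolExecutor

def train_source(name: str) -> str:
    # Placeholder for the real pipeline: extract -> chunk -> embed -> index.
    return f"{name}: trained"

sources = ["pricing.pdf", "faq.txt", "site-crawl", "policies.docx"]

# max_workers=3 caps concurrency; remaining sources queue until a worker frees up.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(train_source, sources))

print(results)
```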
Troubleshooting Training Issues#
"Training failed"#
Common causes:
- File corrupted - Re-upload file
- Website unreachable - Check URL, try again
- Embedding API error - API key issue, contact support
- Timeout - File too large, split into smaller files
Solution: Click "Retry Training" on failed source
"Training stuck at 50%"#
Possible issues:
- Large file processing (patience needed)
- Worker overload (try again later)
- Network issue (check internet connection)
Fix: Wait 10 minutes, refresh page, check status
"Chunks not appearing in chat"#
Checklist:
- ✅ Training completed successfully
- ✅ Source status shows "Trained"
- ✅ Content matches query (test with exact phrases)
- ✅ Namespace enabled in agent config
Debug: Test chat with exact text from source
"Too many chunks created"#
Symptoms:
- Source created 1,000+ chunks
- Retrieval is slow
- Irrelevant content returned
Solutions:
- Increase chunk size (2,000 → 4,000 chars)
- Remove duplicate content
- Split source into smaller files
- Use pattern filtering for websites
"Training uses too many credits"#
Reduce costs:
- Remove duplicate sources
- Delete unused training runs
- Pre-process files to remove unnecessary content
- Split large files into focused sections
Training Best Practices#
1. Quality Over Quantity#
- 10 high-quality sources > 100 low-quality sources
- Focus on relevant, accurate content
- Remove outdated information regularly
2. Organize by Topic#
- Group related sources (e.g., "Pricing", "Shipping")
- Use consistent naming
- Makes debugging easier
3. Test After Training#
- Ask questions to verify retrieval
- Check if answers are accurate
- Adjust sources if needed
4. Keep Sources Updated#
- Re-train monthly for static content
- Re-train weekly for frequently changing content
- Set reminders for website re-crawls
5. Monitor Training Logs#
- Check for failed sources
- Review warning messages
- Fix issues promptly
Next Steps#
- Testing Your Agent - Verify training results
- Embedding & Deploying - Add agent to website
- Understanding Agents - Learn about RAG retrieval