
RAG Configuration

Advanced configuration options for Retrieval-Augmented Generation (RAG) in Cognipeer AI.

Overview

RAG (Retrieval-Augmented Generation) lets your AI Peers answer questions from your own data sources while maintaining accuracy and providing source attribution. This guide covers all RAG configuration options and best practices.

Core Settings

Enable RAG Pipeline

Control whether the RAG pipeline is active for your Peer.

javascript
{
  "enableRagPipeline": true
}

Default: true

When to disable:

  • Pure conversational Peers without data requirements
  • Testing general LLM behavior
  • Reducing latency for simple queries

Include Data Sources as Tools

Expose data sources as callable tools in agentic workflows.

javascript
{
  "includeDataSourcesAsTools": true
}

Default: true

Benefits:

  • AI can explicitly query specific data sources
  • Better control over when retrieval happens
  • Improved reasoning about which source to use

Metadata Enrichment

Include Metadata

Add rich contextual metadata to retrieved documents.

javascript
{
  "ragIncludeMetadata": true
}

Default: true

What gets included:

  • Dataset items: Item identifiers, dataset names, custom fields
  • Documents: Filename, author, upload date, file type
  • External sources: URL, crawl date, page structure
  • Collections: Custom metadata from your schema

Example Output:

markdown
Context: "The Q4 revenue was $2.5M"

Without metadata:
Answer: "Q4 revenue was $2.5M."

With metadata:
Answer: "According to the Financial Report 2024 (uploaded by Jane Smith on Oct 15, 2025), 
the Q4 revenue was $2.5M."

Include Conversation Sources

Add references to previous conversation context in RAG results.

javascript
{
  "ragIncludeConversationSources": true
}

Default: true

Use cases:

  • Multi-turn conversations requiring previous context
  • Follow-up questions referencing earlier topics
  • Maintaining conversation coherence

Answer Validation

Final Answer Validation

Enable dual-LLM validation to prevent hallucinations.

javascript
{
  "ragValidateFinalAnswer": true,
  "ragValidateFinalAnswerInstructions": "Ensure all medical information is explicitly stated in verified sources. Never extrapolate symptoms or treatments."
}

Default: false (due to latency/cost impact)

How it works:

  1. Primary LLM generates answer from RAG context
  2. Validator LLM receives:
    • Original question
    • Generated answer
    • RAG context
    • Evidence sources
    • Custom validation instructions
  3. Validator checks answer against evidence
  4. Returns original or revised answer
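The four steps above can be sketched as a dual-LLM loop. `callLLM` and both prompt shapes are hypothetical stand-ins for your model client, not the platform's actual internals.

```javascript
// Hypothetical sketch of the dual-LLM validation flow described above.
// callLLM() stands in for whatever model client you use.
async function answerWithValidation(callLLM, question, ragContext, instructions) {
  // Step 1: primary LLM answers from the retrieved context
  const draft = await callLLM({
    system: "Answer strictly from the provided context.",
    user: `Context:\n${ragContext}\n\nQuestion: ${question}`,
  });

  // Steps 2-3: validator LLM checks the draft against the evidence
  const verdict = await callLLM({
    system:
      `You are a validator. Rules:\n${instructions}\n` +
      "Return the answer unchanged if supported, otherwise a corrected one.",
    user: `Question: ${question}\nAnswer: ${draft}\nEvidence:\n${ragContext}`,
  });

  // Step 4: the validator's output is what the user sees
  return verdict;
}
```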

Performance impact:

  • Adds 500-2000ms latency (depends on model)
  • Doubles LLM API calls
  • Recommended for: Healthcare, legal, financial domains

Custom validation instructions:

Provide domain-specific rules:

javascript
{
  "ragValidateFinalAnswerInstructions": `
    - Only cite information from documents dated within last 6 months
    - All numerical claims must have explicit source citations
    - If multiple sources conflict, acknowledge the discrepancy
    - Technical specifications must match exact product documentation
  `
}

Strict Knowledge Base Mode

Force the AI to answer only from indexed data.

javascript
{
  "ragStrictKnowledgebaseAnswers": true
}

Default: false

Behavior:

  • AI will not use general knowledge
  • Missing information → "I don't know" responses
  • All answers must be verifiable from sources

System prompt injection:

CRITICAL: You must ONLY provide information that is explicitly 
present in the retrieved context. If the context does not contain 
the answer, you MUST state that you don't have that information. 
DO NOT use general knowledge, DO NOT make assumptions, and 
DO NOT extrapolate beyond what is explicitly stated.

Best practices:

Combine with helpful fallback messages:

javascript
{
  "ragStrictKnowledgebaseAnswers": true,
  "additionalPrompt": `If the information is not found in your knowledge base, respond:
  "I don't have that information in my current knowledge base. 
  However, I can help with [list available topics]. 
  For additional assistance, please contact support@company.com"`
}

Query Controls

Maximum Results

Control how many document chunks to retrieve.

javascript
{
  "ragMaxResults": 10
}

Default: null (system determines based on query)

Recommendations by use case:

javascript
// FAQ Chatbot - Simple, focused answers
{ "ragMaxResults": 3 }

// Research Assistant - Comprehensive analysis  
{ "ragMaxResults": 20 }

// Technical Documentation - Precise examples
{ "ragMaxResults": 5 }

// General Q&A - Balanced
{ "ragMaxResults": 10 }

Trade-offs:

  • Too low (1-3): May miss relevant information
  • Optimal (5-10): Balance relevance and cost
  • Too high (20+): Token waste, slower responses, more noise

Score Threshold

Set minimum similarity score for retrieved documents.

javascript
{
  "ragScoreThreshold": 0.75
}

Default: null (system default, typically 0.7)

Score interpretation:

1.0 - 0.9: Exact or near-exact matches
0.9 - 0.8: Highly relevant, same topic
0.8 - 0.7: Related, conceptually similar
0.7 - 0.6: Loosely related
< 0.6:     Different topics (usually filtered)
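As a mental model, the threshold simply prunes retrieved chunks before they reach the prompt. A minimal sketch (the result shape and names are hypothetical):

```javascript
// Illustrative filter showing how ragScoreThreshold prunes retrieved
// chunks before they are passed to the LLM.
function applyThreshold(results, threshold = 0.7) {
  return results.filter((r) => r.score >= threshold);
}

const retrieved = [
  { id: "refund-policy", score: 0.91 },
  { id: "shipping-faq", score: 0.74 },
  { id: "press-release", score: 0.58 },
];
console.log(applyThreshold(retrieved, 0.75)); // keeps only "refund-policy"
```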

Tuning guide:

javascript
// High Precision (Strict relevance)
{ 
  "ragScoreThreshold": 0.85,
  "ragMaxResults": 3
}
// Use: Medical, legal, financial

// Balanced (Default)
{
  "ragScoreThreshold": 0.7,
  "ragMaxResults": 10  
}
// Use: General Q&A, customer support

// High Recall (Broad coverage)
{
  "ragScoreThreshold": 0.6,
  "ragMaxResults": 15
}
// Use: Research, exploratory queries

Search Mode

Choose retrieval strategy.

javascript
{
  "ragAllItemMode": "hybrid"
}

Options:

  • "semantic": Vector similarity search (default)
  • "keyword": Full-text keyword search
  • "hybrid": Combined semantic + keyword with RRF ranking

Semantic search:

javascript
{ "ragAllItemMode": "semantic" }
  • Best for: Natural language queries, conceptual searches
  • Pros: Understands synonyms, handles typos, multilingual
  • Cons: May miss exact terms or IDs

Keyword search:

javascript
{ "ragAllItemMode": "keyword" }
  • Best for: Exact phrase matching, product SKUs, code search
  • Pros: Precise term matching, faster
  • Cons: No semantic understanding, strict matching

Hybrid search (Recommended):

javascript
{ "ragAllItemMode": "hybrid" }
  • Best for: Technical documentation, mixed content
  • Pros: Combines strengths of both approaches
  • How it works: Parallel retrieval → RRF merge → Re-ranking

Hybrid search architecture:

Query: "How do I authenticate with JWT?"

        ↓ (parallel)
   ┌────────────────┐
   │                │
Semantic        Keyword
Search          Search
   │                │
   ├─ "Auth Guide"  ├─ "JWT Validation"
   ├─ "OAuth Setup" ├─ "Authentication"
   ├─ "Sessions"    ├─ "API Security"
   │                │
   └────────┬───────┘

    RRF Merge & Rerank

    ├─ JWT Validation (0.93) ← Best!
    ├─ Authentication (0.87)
    ├─ OAuth Setup (0.76)
    └─ API Security (0.74)
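The RRF merge step in the diagram can be sketched in a few lines. This is the standard Reciprocal Rank Fusion formula (`score = Σ 1/(k + rank)`); the constant `k = 60` is a common default, and the platform's internal ranking may differ.

```javascript
// Sketch of Reciprocal Rank Fusion (RRF) for hybrid search.
// Each result list is an array of document IDs ordered by relevance.
function rrfMerge(resultLists, k = 60) {
  const scores = new Map();
  for (const list of resultLists) {
    list.forEach((docId, rank) => {
      // RRF score: 1 / (k + rank); ranks are 1-based in the formula
      scores.set(docId, (scores.get(docId) || 0) + 1 / (k + rank + 1));
    });
  }
  // Sort documents by fused score, highest first
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

// Example: semantic and keyword result lists, as in the diagram above
const semantic = ["auth-guide", "oauth-setup", "sessions"];
const keyword = ["jwt-validation", "auth-guide", "api-security"];
console.log(rrfMerge([semantic, keyword]));
// "auth-guide" ranks first because it appears in both lists
```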

Complete Configuration Examples

High-Stakes Application (Healthcare)

javascript
{
  "enableRagPipeline": true,
  "ragIncludeMetadata": true,
  "ragIncludeConversationSources": true,
  "ragValidateFinalAnswer": true,
  "ragValidateFinalAnswerInstructions": "Only provide medical information from peer-reviewed sources dated within last 2 years. Never diagnose or prescribe. Always recommend consulting healthcare professionals.",
  "ragStrictKnowledgebaseAnswers": true,
  "ragMaxResults": 5,
  "ragScoreThreshold": 0.85,
  "ragAllItemMode": "hybrid",
  "additionalPrompt": "If medical information is incomplete or outdated, clearly state limitations and recommend consulting with healthcare provider."
}

Customer Support Bot

javascript
{
  "enableRagPipeline": true,
  "ragIncludeMetadata": true,
  "ragIncludeConversationSources": true,
  "ragValidateFinalAnswer": false,
  "ragStrictKnowledgebaseAnswers": false,
  "ragMaxResults": 8,
  "ragScoreThreshold": 0.7,
  "ragAllItemMode": "hybrid",
  "includeDataSourcesAsTools": true
}

Research Assistant

javascript
{
  "enableRagPipeline": true,
  "ragIncludeMetadata": true,
  "ragIncludeConversationSources": true,
  "ragValidateFinalAnswer": false,
  "ragStrictKnowledgebaseAnswers": false,
  "ragMaxResults": 20,
  "ragScoreThreshold": 0.65,
  "ragAllItemMode": "semantic",
  "additionalPrompt": "Provide comprehensive analysis with citations. When multiple sources present different viewpoints, acknowledge all perspectives."
}

Technical Documentation Assistant

javascript
{
  "enableRagPipeline": true,
  "ragIncludeMetadata": true,
  "ragIncludeConversationSources": false,
  "ragValidateFinalAnswer": false,
  "ragStrictKnowledgebaseAnswers": true,
  "ragMaxResults": 5,
  "ragScoreThreshold": 0.75,
  "ragAllItemMode": "hybrid",
  "additionalPrompt": "Always include code examples when available. Link to full documentation. If examples are incomplete, clearly indicate what's missing."
}

Monitoring and Optimization

Key Metrics to Track

Retrieval Quality:

  • Average similarity score
  • Number of results per query
  • Cache hit rate
  • Retrieval latency

Answer Quality:

  • Validation pass rate
  • User satisfaction scores
  • Source attribution percentage
  • "I don't know" response rate

Performance:

  • End-to-end latency
  • Token usage (prompt + completion)
  • API costs per query
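Most of these metrics reduce to simple aggregations over your query logs. For example, the "I don't know" rate can be computed like this (the log shape and fallback phrasing are illustrative; match whatever fallback message your Peer is configured to use):

```javascript
// Sketch: "I don't know" response rate over a batch of logged answers.
function idkRate(answers) {
  const idk = answers.filter((a) => /don't (know|have that information)/i.test(a));
  return answers.length ? idk.length / answers.length : 0;
}

const logged = [
  "I don't have that information in my current knowledge base.",
  "Q4 revenue was $2.5M.",
  "I don't know.",
];
console.log(idkRate(logged)); // 2 of 3 answers are fallbacks
```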

A/B Testing

Test different configurations:

javascript
// Configuration A: Strict
const configA = {
  ragStrictKnowledgebaseAnswers: true,
  ragValidateFinalAnswer: true,
  ragScoreThreshold: 0.8,
  ragMaxResults: 5
};

// Configuration B: Lenient
const configB = {
  ragStrictKnowledgebaseAnswers: false,
  ragValidateFinalAnswer: false,
  ragScoreThreshold: 0.65,
  ragMaxResults: 10
};

// Monitor:
// - Answer rate: % of queries answered
// - Accuracy: % of correct answers
// - Latency: Average response time
// - User satisfaction: Explicit feedback
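To keep the comparison clean, each user should see the same configuration across sessions. A deterministic hash-based split is one way to do that (a sketch; the hash function and bucket labels are arbitrary choices):

```javascript
// Deterministic A/B assignment: hash the user ID into one of two buckets
// so each user always receives the same configuration.
function assignConfig(userId, configA, configB) {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // unsigned 32-bit rolling hash
  }
  return hash % 2 === 0
    ? { variant: "A", ...configA }
    : { variant: "B", ...configB };
}

const strict = { ragScoreThreshold: 0.8, ragMaxResults: 5 };
const lenient = { ragScoreThreshold: 0.65, ragMaxResults: 10 };
console.log(assignConfig("user-123", strict, lenient).variant);
```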

Optimization Process

  1. Start with defaults and baseline metrics
  2. Identify problems:
    • Too many "I don't know" → Lower threshold, increase max results
    • Hallucinations → Enable validation, strict mode
    • Slow responses → Reduce max results, disable validation
    • Irrelevant answers → Increase threshold, try hybrid search
  3. Test changes with evaluation dataset
  4. Monitor production metrics
  5. Iterate based on data

API Configuration

Via Dashboard

Navigate to: Peer Settings → RAG Configuration

All options available in UI with real-time preview.

Via API

javascript
const peer = await cognipeer.peer.update(peerId, {
  enableRagPipeline: true,
  ragIncludeMetadata: true,
  ragValidateFinalAnswer: true,
  ragValidateFinalAnswerInstructions: "Custom validation rules...",
  ragStrictKnowledgebaseAnswers: true,
  ragMaxResults: 10,
  ragScoreThreshold: 0.75,
  ragAllItemMode: "hybrid",
  ragIncludeConversationSources: true
});

Per-Request Override

Override Peer defaults for specific queries:

javascript
const response = await cognipeer.peer.ask(peerId, {
  message: "What's the return policy?",
  overrides: {
    ragMaxResults: 5,
    ragScoreThreshold: 0.8
  }
});

Common Issues and Solutions

Issue: "I don't have that information" for known content

Causes:

  • Threshold too high
  • Wording mismatch (semantic search limitation)
  • Content not properly indexed

Solutions:

javascript
// Lower threshold
{ "ragScoreThreshold": 0.65 }

// Use hybrid search
{ "ragAllItemMode": "hybrid" }

// Increase results
{ "ragMaxResults": 15 }

// Re-index data source
await cognipeer.datasource.reindex(datasourceId);

Issue: Hallucinations in responses

Causes:

  • Strict mode disabled
  • No answer validation
  • Low-quality source data

Solutions:

javascript
// Enable protections
{
  "ragValidateFinalAnswer": true,
  "ragStrictKnowledgebaseAnswers": true,
  "ragScoreThreshold": 0.8
}

// Add explicit instructions
{
  "additionalPrompt": "Only provide information explicitly stated in sources. If unsure, say 'I don't know.'"
}

Issue: Slow response times

Causes:

  • Too many retrieved chunks
  • Answer validation enabled
  • Large metadata payloads

Solutions:

javascript
// Reduce retrieval
{
  "ragMaxResults": 5,
  "ragScoreThreshold": 0.75
}

// Disable validation for non-critical use
{
  "ragValidateFinalAnswer": false
}

// Trim metadata if excessive
{
  "ragIncludeMetadata": false
}

Issue: Missing context from conversations

Causes:

  • Conversation sources disabled
  • Short conversation history limit

Solutions:

javascript
// Enable conversation context
{
  "ragIncludeConversationSources": true
}

// Increase history in Channel settings
{
  "messageHistoryLimit": 20
}

Best Practices

DO:

  • Start with default settings and measure
  • Enable metadata for production systems
  • Use strict mode for compliance/regulated domains
  • Validate answers in high-stakes applications
  • Monitor metrics and iterate
  • Test with evaluation datasets before production
  • Document your configuration choices

DON'T:

  • Blindly copy configurations without testing
  • Disable all safeguards for speed
  • Set thresholds without measuring impact
  • Forget to re-index after source updates
  • Ignore user feedback
  • Skip A/B testing for critical changes

Support

Questions about RAG configuration? Reach out to the Cognipeer support team.
