AI-Powered Analysis & Optimization

Cognipeer AI's AI Analysis feature uses advanced language models to automatically analyze your peer's evaluation results and provide actionable improvement suggestions. This feature helps you quickly identify issues and optimize your peer's configuration for better performance.

Overview

The AI Analysis feature:

  • Analyzes Evaluation Results: Reviews failed and low-scoring questions
  • Identifies Patterns: Finds common issues across multiple failures
  • Generates Suggestions: Provides specific, actionable recommendations
  • Explains Reasoning: Details why each suggestion will help
  • Enables Quick Application: Apply changes with one click

How It Works

Analysis Process

  1. Data Collection: Gathers evaluation results, peer configuration, and context
  2. Pattern Recognition: Identifies recurring problems and failure patterns
  3. Root Cause Analysis: Determines underlying issues
  4. Suggestion Generation: Creates specific improvement recommendations
  5. Prioritization: Orders suggestions by potential impact

AI Model

The analysis uses GPT-4o to ensure high-quality, contextual recommendations. The model considers:

  • Question-answer pairs and their scores
  • Peer's current configuration (prompt, tools, datasources)
  • Evaluation metrics and failure patterns
  • Best practices for AI peer design

Using AI Analysis

Starting an Analysis

From Evaluation Results

  1. Navigate to an evaluation run's results page
  2. Click the "Analyze with AI" button
  3. Wait for analysis to complete (10-30 seconds)
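
If you prefer to trigger the same analysis programmatically, a minimal sketch using the suggest-improvements endpoint from the API Reference below might look like this (the base URL and bearer-token header here are assumptions, not confirmed defaults):

javascript
// Sketch: request AI analysis for a completed evaluation run.
// Endpoint per the API Reference section; base URL and auth scheme are assumed.
const runId = 'run_123'; // ID of a completed evaluation run

const response = await fetch(
  `https://api.cognipeer.com/api/v1/evaluation/${runId}/suggest-improvements`, // assumed base URL
  {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.COGNIPEER_API_KEY}`, // assumed auth header
      'Content-Type': 'application/json'
    }
  }
);

const { analysis } = await response.json();
console.log(analysis.summary);                              // high-level assessment
console.log(`${analysis.suggestions.length} suggestions returned`);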

Auto-Analysis

Enable automatic analysis after each evaluation run:

json
{
  "autoAnalyze": true,
  "autoAnalyzeThreshold": 0.7
}

Analysis runs automatically whenever the overall score falls below the configured threshold.

Viewing Analysis Results

The analysis panel shows:

Summary Section

  • Overall Assessment: High-level evaluation of peer performance
  • Key Issues Identified: Main problems affecting scores
  • Improvement Potential: Expected score increase if suggestions are applied

Detailed Suggestions

Each suggestion includes:

  1. Category: Type of improvement (Prompt, Tools, Settings, Data)
  2. Priority: High, Medium, or Low
  3. Title: Brief description
  4. Detailed Explanation: Why this change helps
  5. Specific Changes: Exact modifications to make
  6. Expected Impact: Predicted score improvement
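
Programmatically, each suggestion is returned as a JSON object with matching fields (see the API Reference at the end of this page). The example below mirrors the first suggestion from the sample analysis that follows; the inner shape of changes varies by category and is shown here only as an illustrative assumption:

json
{
  "id": "sugg_1",
  "category": "prompt",
  "priority": "high",
  "title": "Enhance System Prompt for Product Support",
  "description": "The peer lacks specific instructions for handling product-related queries.",
  "changes": { "prompt": "When discussing products: 1. Always reference official specifications..." },
  "expectedImpact": "+12%"
}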

Example Analysis Output

markdown
## Overall Assessment

Your peer is struggling with customer support scenarios, particularly 
around product specifications and troubleshooting. The main issues are:
- Insufficient context about product features
- Missing tools for real-time data access
- Overly creative responses (high temperature)

Expected improvement: +15-20% overall score

## Suggestions

### 1. Enhance System Prompt for Product Support (Priority: High)

**Current Issue:**
The peer lacks specific instructions for handling product-related queries.

**Recommendation:**
Add product support guidelines to the system prompt:

"When discussing products:
1. Always reference official specifications
2. Provide step-by-step troubleshooting
3. Offer to escalate complex technical issues
4. Use clear, non-technical language for customers"

**Expected Impact:** +12% on product-related questions

### 2. Enable Knowledge Base Tool (Priority: High)

**Current Issue:**
The peer cannot access product documentation during conversations.

**Recommendation:**
Enable the "Product Documentation" datasource tool to allow 
real-time access to specifications and guides.

**Expected Impact:** +18% on specification questions

### 3. Lower Temperature Setting (Priority: Medium)

**Current Issue:**
Temperature of 0.8 leads to inconsistent, creative responses 
where accuracy is needed.

**Recommendation:**
Reduce temperature to 0.3 for more consistent, factual responses.

**Expected Impact:** +8% on factual questions

Applying Suggestions

Individual Application

Apply suggestions one at a time:

  1. Review the suggestion details
  2. Click "Preview Changes" to see exact modifications
  3. Click "Apply" to implement the change
  4. The peer configuration updates immediately

Bulk Application

Apply multiple or all suggestions:

  1. Select suggestions using checkboxes
  2. Click "Apply Selected" or "Apply All"
  3. Review combined changes in preview modal
  4. Confirm to apply all changes
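
The same preview-then-apply flow is available through the apply-improvements endpoint documented in the API Reference below. A minimal sketch follows; the base URL, auth header, and the assumption that preview: true returns the computed changes without modifying the peer are mine, not confirmed defaults:

javascript
// Sketch: preview a set of suggestions, then apply them via the API.
// Endpoint per the API Reference; base URL and auth scheme are assumed.
const peerId = 'peer_123'; // target peer ID
const url = `https://api.cognipeer.com/api/v1/peer/${peerId}/apply-improvements`; // assumed base URL
const headers = {
  Authorization: `Bearer ${process.env.COGNIPEER_API_KEY}`, // assumed auth header
  'Content-Type': 'application/json'
};

// 1. Preview the combined changes (assumed: preview: true does not modify the peer)
const preview = await fetch(url, {
  method: 'POST',
  headers,
  body: JSON.stringify({ suggestions: ['sugg_1', 'sugg_2'], preview: true })
}).then(r => r.json());
console.log(preview.changes);

// 2. Apply for real once the preview looks right
const result = await fetch(url, {
  method: 'POST',
  headers,
  body: JSON.stringify({ suggestions: ['sugg_1', 'sugg_2'], preview: false })
}).then(r => r.json());
console.log(`Applied ${result.applied} suggestions`);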

Custom Modifications

Edit suggestions before applying:

  1. Click "Edit" on any suggestion
  2. Modify the proposed changes
  3. Save your custom version
  4. Apply the modified suggestion

Suggestion Categories

1. Prompt Improvements

Examples:

  • Add specific instructions for handling edge cases
  • Include examples of good responses
  • Clarify tone and style requirements
  • Add constraints or guidelines

Impact: Often the highest-impact category, since prompt changes affect every response

2. Tool Recommendations

Examples:

  • Enable relevant datasource tools
  • Add integration tools (API calls, database access)
  • Configure web search for real-time information
  • Enable file processing tools

Impact: Enables new capabilities, fills knowledge gaps

3. Settings Adjustments

Examples:

  • Temperature modifications
  • Model selection changes
  • Max tokens adjustments
  • Reasoning settings

Impact: Fine-tunes response quality and consistency

4. Data Source Updates

Examples:

  • Add missing documentation to knowledge base
  • Update outdated information
  • Improve data organization
  • Add contextual metadata

Impact: Improves factual accuracy and completeness

Best Practices

Before Requesting Analysis

  1. Run Complete Evaluation: Ensure you have sufficient test coverage (20+ questions)
  2. Diverse Question Set: Include various scenarios and difficulty levels
  3. Clear Expected Answers: Provide realistic, well-defined expected responses
  4. Baseline Configuration: Start with a reasonable baseline setup

Reviewing Suggestions

  1. Understand the Reasoning: Read the full explanation for each suggestion
  2. Check for Conflicts: Ensure suggestions don't contradict each other
  3. Validate Priority: Higher priority doesn't always mean apply first
  4. Test Incrementally: Apply and test suggestions one at a time when possible

After Applying Changes

  1. Re-run Evaluation: Verify improvements with the same test suite
  2. Compare Scores: Check if actual improvement matches prediction
  3. Monitor Edge Cases: Ensure changes don't hurt other areas
  4. Iterate: Request new analysis if scores still need improvement
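
Using the SDK-style calls shown in the Integration Examples below (their exact interface is assumed here), a simple before/after comparison might look like:

javascript
// Sketch: re-run the same suite after applying suggestions and compare scores.
// Uses the SDK-style calls from the Integration Examples; interface is assumed.
const suiteId = 'suite_123'; // the suite used for the baseline run

const baselineRun = await evaluation.execute(suiteId);
// ... review and apply suggestions here ...
const newRun = await evaluation.execute(suiteId);

const delta = newRun.averageScore - baselineRun.averageScore;
console.log(`Score change: ${(delta * 100).toFixed(1)} percentage points`);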

Advanced Features

Custom Analysis Criteria

Specify what to focus on:

javascript
POST /api/v1/evaluation/:runId/suggest-improvements
{
  "focus": "accuracy",  // or "speed", "cost", "tone"
  "constraints": {
    "maxTemperature": 0.5,
    "preferredTools": ["datasource"],
    "maintainTone": true
  }
}

Analysis History

Track all analyses and applied suggestions:

  • View historical recommendations
  • See which suggestions were applied
  • Track improvement trends over time
  • Revert to previous configurations
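
A quick sketch of pulling this history through the analysis-history endpoint from the API Reference below (base URL and auth header are assumptions):

javascript
// Sketch: summarize past analyses for a peer using the analysis-history endpoint.
// Endpoint per the API Reference; base URL and auth scheme are assumed.
const peerId = 'peer_123';

const history = await fetch(
  `https://api.cognipeer.com/api/v1/peer/${peerId}/analysis-history`, // assumed base URL
  { headers: { Authorization: `Bearer ${process.env.COGNIPEER_API_KEY}` } } // assumed auth header
).then(r => r.json());

for (const a of history.analyses) {
  console.log(`${a.timestamp}: applied ${a.appliedCount}/${a.suggestionsCount}, improvement ${a.scoreImprovement}`);
}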

A/B Testing Support

Test suggestions before full deployment:

  1. Clone your peer
  2. Apply suggestions to the clone
  3. Run evaluations on both versions
  4. Compare results
  5. Deploy the winner
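
A sketch of that flow with the SDK-style calls from the Integration Examples; peer.clone is a hypothetical helper (substitute whatever mechanism you use to duplicate a peer), and the suite/peer wiring is assumed:

javascript
// Sketch of the A/B flow. `peer.clone` is hypothetical; the rest follows the
// SDK-style calls used in the Integration Examples (interface assumed).
const originalPeerId = 'peer_123';
const runId = 'run_123';               // evaluation run that was analyzed
const originalSuiteId = 'suite_orig';  // suite targeting the original peer
const cloneSuiteId = 'suite_clone';    // suite targeting the clone (assumed setup)

const clone = await peer.clone(originalPeerId);               // hypothetical helper
const analysis = await evaluation.suggestImprovements(runId);

// Apply the high-priority suggestions to the clone only
await peer.applyImprovements(clone.id, {
  suggestions: analysis.suggestions
    .filter(s => s.priority === 'high')
    .map(s => s.id)
});

// Evaluate both versions and compare
const originalRun = await evaluation.execute(originalSuiteId);
const cloneRun = await evaluation.execute(cloneSuiteId);
console.log('Original:', originalRun.averageScore, 'Clone:', cloneRun.averageScore);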

Common Scenarios

Scenario 1: Low Overall Scores

Symptoms:

  • Overall score < 60%
  • Failing across multiple question types
  • Inconsistent responses

Typical Suggestions:

  1. Major prompt restructuring
  2. Enable essential tools
  3. Add comprehensive knowledge base
  4. Adjust temperature down

Approach:

  • Apply high-priority prompt changes first
  • Add tools that fill knowledge gaps
  • Re-evaluate after each major change

Scenario 2: Specific Topic Failures

Symptoms:

  • High scores overall
  • Consistent failures in one category
  • Missing domain knowledge

Typical Suggestions:

  1. Add topic-specific instructions to prompt
  2. Enable specialized tools or datasources
  3. Add targeted knowledge base content

Approach:

  • Apply category-specific suggestions
  • Test with focused evaluation suite
  • Expand to related categories

Scenario 3: Inconsistent Quality

Symptoms:

  • High variance in scores
  • Same question gets different answers
  • Unpredictable behavior

Typical Suggestions:

  1. Lower temperature setting
  2. Add more explicit constraints to prompt
  3. Use more deterministic model
  4. Add examples to prompt

Approach:

  • Apply temperature changes first
  • Add constraints incrementally
  • Test for consistency improvement

Limitations & Considerations

What AI Analysis Can Do

✅ Identify patterns in evaluation failures
✅ Suggest configuration improvements
✅ Recommend relevant tools and datasources
✅ Propose prompt enhancements
✅ Prioritize changes by expected impact

What AI Analysis Cannot Do

❌ Guarantee specific score improvements
❌ Fix fundamental knowledge gaps without data
❌ Optimize for contradictory goals simultaneously
❌ Predict all edge case behaviors
❌ Replace human judgment and testing

Important Notes

  1. Suggestions are Recommendations: Always review before applying
  2. Context Matters: AI doesn't know your full business context
  3. Test After Changes: Always validate with fresh evaluation runs
  4. Iterative Process: Multiple rounds may be needed for optimal results
  5. Data Quality: Analysis quality depends on evaluation quality

Troubleshooting

Analysis Taking Too Long

Problem: Analysis doesn't complete or times out

Solutions:

  • Ensure evaluation run completed successfully
  • Reduce number of questions if >100
  • Try again after a few minutes
  • Check API rate limits

Suggestions Don't Improve Scores

Problem: Applied suggestions but scores didn't increase

Possible Causes:

  1. Expected answers may be unrealistic: Review and adjust expectations
  2. Conflicting suggestions applied: Try one at a time
  3. Insufficient test coverage: Add more diverse questions
  4. Fundamental capability gap: May need different model or architecture

Generic or Unhelpful Suggestions

Problem: Suggestions are too vague or not actionable

Solutions:

  • Ensure evaluation has detailed failure data
  • Add more context to questions and expected answers
  • Run evaluation with multiple evaluators
  • Provide more specific peer description and purpose

API Reference

Request Analysis

javascript
POST /api/v1/evaluation/:runId/suggest-improvements

Response:
{
  "analysis": {
    "summary": "Overall assessment...",
    "suggestions": [
      {
        "id": "sugg_1",
        "category": "prompt",
        "priority": "high",
        "title": "Enhance system prompt...",
        "description": "Detailed explanation...",
        "changes": { /* specific modifications */ },
        "expectedImpact": "+12%"
      }
    ],
    "overallImprovement": "+15-20%"
  }
}

Apply Suggestions

javascript
POST /api/v1/peer/:peerId/apply-improvements
{
  "suggestions": ["sugg_1", "sugg_2"],
  "preview": false
}

Response:
{
  "applied": 2,
  "changes": { /* actual modifications made */ },
  "backup": { /* previous configuration */ }
}

Get Analysis History

javascript
GET /api/v1/peer/:peerId/analysis-history

Response:
{
  "analyses": [
    {
      "id": "analysis_1",
      "timestamp": "2025-10-20T10:30:00Z",
      "evaluationRunId": "run_123",
      "suggestionsCount": 5,
      "appliedCount": 3,
      "scoreImprovement": "+14%"
    }
  ]
}

Integration Examples

Automated Optimization Workflow

javascript
// 1. Run evaluation
const run = await evaluation.execute(suiteId);

// 2. Auto-analyze if score is low
if (run.averageScore < 0.7) {
  const analysis = await evaluation.suggestImprovements(run.id);
  
  // 3. Auto-apply high-priority suggestions
  const highPriority = analysis.suggestions
    .filter(s => s.priority === 'high');
  
  await peer.applyImprovements(peerId, {
    suggestions: highPriority.map(s => s.id)
  });
  
  // 4. Re-run evaluation
  const newRun = await evaluation.execute(suiteId);
  
  console.log(`Improvement: ${newRun.averageScore - run.averageScore}`);
}

Scheduled Optimization

javascript
// Run weekly optimization
cron.schedule('0 2 * * 0', async () => {
  const peers = await peer.list({ autoOptimize: true });
  
  for (const p of peers) {
    // Run evaluation
    const run = await evaluation.execute(p.defaultSuiteId);
    
    // Get suggestions
    const analysis = await evaluation.suggestImprovements(run.id);
    
    // Notify admin
    await notifications.send({
      to: 'admin@company.com',
      subject: `${p.name} Optimization Report`,
      body: analysis.summary,
      suggestions: analysis.suggestions
    });
  }
});

Summary

AI-Powered Analysis accelerates the optimization process by automatically identifying issues and providing expert-level recommendations. By combining evaluation data with AI insights, you can systematically improve your peers' performance with minimal manual effort.

Best Workflow:

  1. Create comprehensive evaluation suite
  2. Run initial evaluation (baseline)
  3. Request AI analysis
  4. Review and apply suggestions
  5. Re-run evaluation (measure improvement)
  6. Iterate until target performance is reached
  7. Schedule regular evaluations for monitoring

This continuous improvement loop ensures your peers maintain high quality and adapt to changing requirements over time.
