
Optimizing Peers with AI-Powered Analysis

You've built your AI peer, run evaluations, and identified areas for improvement. But what comes next? Manually analyzing results and figuring out the right changes can be time-consuming and requires deep expertise.

Enter AI-Powered Analysis: a feature that uses GPT-4o to automatically analyze your peer's evaluation results and generate specific, actionable improvement suggestions. It's like having an expert consultant review your peer's performance and recommend exactly what to fix.

The Challenge of Manual Optimization

Improving an AI peer traditionally requires:

  1. Analyzing Failed Questions: Understanding why specific questions failed
  2. Identifying Patterns: Finding common themes across failures
  3. Root Cause Analysis: Determining underlying configuration issues
  4. Generating Solutions: Deciding what changes to make
  5. Implementation: Actually making the changes
  6. Validation: Testing to ensure improvements worked

This process can take hours or even days, especially for complex peers with large evaluation suites.

How AI Analysis Works

AI-Powered Analysis automates this entire workflow:

Evaluation Results → AI Analysis → Actionable Suggestions → One-Click Application

Here's what happens behind the scenes:

1. Data Collection

The system gathers:

  • Failed and low-scoring questions
  • Peer's current configuration (prompt, tools, settings)
  • Evaluation metrics and patterns
  • Peer's purpose and context

2. Pattern Recognition

GPT-4o identifies:

  • Common failure types
  • Missing capabilities
  • Configuration issues
  • Knowledge gaps

3. Recommendation Generation

The AI creates:

  • Specific, concrete suggestions
  • Expected impact estimates
  • Priority rankings
  • Implementation details

4. Application

You can:

  • Preview each suggestion
  • Apply individually or in bulk
  • Edit suggestions before applying
  • Revert changes if needed
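
To make this concrete, here's a sketch of what a returned suggestion might look like and how you could preview and apply it programmatically. The field names and the `applySuggestion` call are illustrative assumptions consistent with the examples later in this guide, not a documented schema:

```javascript
// Hypothetical shape of a single suggestion returned by the analysis
// (field names are illustrative, not a documented schema)
const suggestion = {
  id: 'sugg_01',
  title: 'Add Product Documentation Datasource',
  priority: 'high',            // high | medium | low
  expectedImpact: 0.18,        // estimated score gain on affected questions
  changes: { datasources: ['product-knowledge-base'] }
};

// Preview the proposed changes before committing them
console.log(suggestion.changes);

// Apply once you're satisfied (assumed API, mirroring later examples)
await peer.applySuggestion(suggestion.id);
```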

Real-World Example

Let's see AI Analysis in action with a real customer support peer.

The Scenario

Support Bot 3000 helps customers with product questions, but evaluation shows poor performance:

Overall Score: 68%
Pass Rate: 64%
Failed Questions: 18/50

Sample Failures:
❌ "What's the warranty period?" - Generic answer, no specifics
❌ "How do I activate my product?" - Missing step-by-step instructions
❌ "Is product X compatible with Y?" - Incorrect information

Running AI Analysis

After evaluation completes, click "Analyze with AI". Within 30 seconds, you receive:

Analysis Summary

```markdown
## Overall Assessment

Your peer is struggling primarily with product-specific questions due to:

1. **Missing Product Knowledge**: No access to product documentation
2. **Vague System Prompt**: Lacks specific instructions for support scenarios
3. **High Temperature (0.8)**: Causing creative but inaccurate responses
4. **Missing Tools**: Cannot access real-time product data

Expected Improvement: +22-28% overall score
```

Suggested Improvements

Suggestion 1: Add Product Documentation Datasource (Priority: HIGH)

Current Issue:

The peer cannot answer product-specific questions about warranties, 
specifications, compatibility, or features because it has no access 
to product documentation.

Recommendation:

Enable the "Product Knowledge Base" datasource that contains:
- Product specifications
- Warranty information
- Compatibility charts
- User manuals
- Troubleshooting guides

This will allow the peer to retrieve accurate, up-to-date information 
when answering product questions.

Expected Impact: +18% on product-related questions

Implementation:

  • One-click to enable datasource
  • No prompt changes needed
  • Automatic integration with peer
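
If you prefer to script this rather than click through the UI, enabling a datasource might look like the following one-liner. The `enableDatasource` method and the datasource identifier are assumptions for illustration:

```javascript
// Assumed helper for attaching a datasource to a peer;
// the method name and datasource ID are hypothetical
await peer.enableDatasource('product-knowledge-base');
```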

Suggestion 2: Enhance System Prompt (Priority: HIGH)

Current Issue:

The peer's system prompt lacks specific guidance for handling support 
scenarios, leading to vague and unhelpful responses.

Recommendation:

Add these instructions to the system prompt:

"When answering customer support questions:

1. **Product Questions**: Always check the product knowledge base first
2. **Step-by-Step**: Provide clear, numbered instructions for how-to questions
3. **Warranty/Policy**: Reference exact policy terms, don't paraphrase
4. **Unknown Information**: Admit when you don't know and offer to escalate
5. **Empathy First**: Acknowledge the customer's issue before solving it

Example Response Format:
'I understand you're having trouble with [issue]. Here's how to resolve it:
1. [First step]
2. [Second step]
...
Let me know if you need any clarification!'"

Expected Impact: +12% on support interaction quality

Preview Changes:

```diff
  You are a helpful customer support assistant.
+ When answering customer support questions:
+ 1. Product Questions: Always check product knowledge base first
+ 2. Step-by-Step: Provide clear, numbered instructions
+ ...
```

Suggestion 3: Lower Temperature Setting (Priority: MEDIUM)

Current Issue:

Temperature of 0.8 encourages creative responses, but support 
scenarios require accuracy and consistency over creativity.

Recommendation:

Reduce temperature from 0.8 to 0.3

This will:
- Reduce hallucinations and made-up information
- Increase consistency across similar questions
- Make responses more factual and deterministic
- Improve accuracy on policy/procedure questions

Expected Impact: +6% on factual accuracy

Implementation:

  • One-click apply
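
Applied via the API, this is a single setting change. The `update` method and `temperature` field are assumed names:

```javascript
// Assumed configuration update; method and field names are illustrative
await peer.update({ temperature: 0.3 }); // was 0.8
```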

Suggestion 4: Enable Search Tool (Priority: MEDIUM)

Current Issue:

For questions about current events, pricing changes, or new 
product releases, the peer cannot access real-time information.

Recommendation:

Enable web search tool for queries about:
- Current promotions and pricing
- Recent product announcements
- Competitor comparisons
- Industry news affecting products

Configure with these restrictions:
- Only use for explicitly time-sensitive queries
- Always cite sources
- Verify information before presenting to user

Expected Impact: +4% on time-sensitive questions
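
Expressed as configuration, those restrictions might look like this sketch. The tool name and option keys are assumptions, not a documented schema:

```javascript
// Hypothetical web-search tool configuration with the suggested guardrails
await peer.enableTool('web-search', {
  useFor: 'time-sensitive-queries-only', // skip searches for static product facts
  citeSources: true,                     // always attach source references
  verifyBeforeAnswering: true            // cross-check results before responding
});
```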

Applying Suggestions

You have several options:

Option 1: Apply All

Click "Apply All Suggestions" to implement all recommendations at once:

✓ Added Product Knowledge Base datasource
✓ Updated system prompt with support guidelines
✓ Changed temperature from 0.8 to 0.3
✓ Enabled web search tool with restrictions

Changes saved. Ready to re-evaluate.
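
The same bulk action is available programmatically; `applyAllSuggestions` here is an assumed counterpart to the per-suggestion call shown in Option 2:

```javascript
// Assumed bulk-apply helper mirroring the "Apply All Suggestions" button
await peer.applyAllSuggestions(analysis.id);
```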

Option 2: Selective Application

Review and apply suggestions individually:

```javascript
// Preview changes
const preview = suggestions[0].changes;

// Apply if satisfied
await peer.applySuggestion(suggestions[0].id);
```

Option 3: Edit Before Applying

Customize suggestions to fit your needs:

Original Suggestion:
"Lower temperature to 0.3"

Your Edit:
"Lower temperature to 0.4"
(Reasoning: We still want some creativity for greeting messages)
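
In code, editing before applying could amount to overriding a field in the suggestion payload. The `overrides` parameter is an assumption for illustration:

```javascript
// Apply a suggestion but override one of its proposed values
// (the second `overrides` argument is hypothetical)
await peer.applySuggestion(suggestion.id, {
  overrides: { temperature: 0.4 } // keep some creativity for greeting messages
});
```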

Results After Implementation

Re-run the evaluation:

Overall Score: 91% (+23%)
Pass Rate: 94% (+30%)
Failed Questions: 3/50 (-15)

Improvements:
✓ Product questions now accurate and detailed
✓ Step-by-step instructions clear and complete
✓ Consistent professional tone
✓ No more hallucinations

ROI: 30 minutes of work for 23% performance improvement!

Advanced AI Analysis Features

Custom Analysis Focus

Optimize for specific goals:

```http
POST /api/v1/evaluation/:runId/suggest-improvements
{
  "focus": "accuracy",  // or "speed", "cost", "tone"
  "constraints": {
    "maxTemperature": 0.5,
    "preferredTools": ["datasource"],
    "budgetLimit": 1000
  }
}
```

Focus Options:

  • Accuracy: Prioritize correctness over speed
  • Speed: Optimize response time
  • Cost: Minimize credit/token usage
  • Tone: Improve communication style

Iterative Optimization

Run multiple analysis rounds:

Round 1: Basic improvements → 68% to 85%
Round 2: Fine-tuning → 85% to 91%
Round 3: Edge case handling → 91% to 94%
Round 4: Optimization → 94% to 96%

Each round focuses on progressively smaller issues.
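
A simple loop captures this iterative process. The sketch below reuses the assumed `evaluation` and `peer` helpers from the other examples in this guide and stops once a target score is reached or improvement plateaus:

```javascript
// Iterative optimization loop: evaluate, analyze, apply, repeat.
// evaluation.execute / evaluation.analyze / peer.applySuggestions are
// assumed APIs, consistent with the other sketches in this guide.
let run = await evaluation.execute(suiteId);

while (run.averageScore < 0.95) {
  const analysis = await evaluation.analyze(run.id);
  if (analysis.suggestions.length === 0) break; // nothing left to improve

  await peer.applySuggestions(peerId, analysis.suggestions);
  const next = await evaluation.execute(suiteId);

  if (next.averageScore <= run.averageScore) break; // plateaued
  run = next;
}
```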

A/B Testing with AI Suggestions

Compare suggestion effectiveness:

```javascript
// Create two peer variants
const peerA = await peer.clone();
const peerB = await peer.clone();

// Apply different suggestion combinations
await peerA.applySuggestions([1, 2]);    // Datasource + prompt
await peerB.applySuggestions([1, 3, 4]); // Datasource + temperature + search

// Evaluate both
const scoreA = await evaluate(peerA);
const scoreB = await evaluate(peerB);

// Deploy winner
await deploy(scoreB.score > scoreA.score ? peerB : peerA);
```

Historical Tracking

View all past analyses:

Analysis History:

Oct 15: +18% (added datasources)
Oct 12: +8% (improved prompt)
Oct 8: +5% (adjusted temperature)
Oct 1: +12% (enabled tools)

Total Improvement: +43% over two weeks

Best Practices

1. Start with Comprehensive Evaluation

AI Analysis quality depends on evaluation quality:

❌ Bad: 10 questions, narrow scope
✅ Good: 50+ questions, diverse scenarios

2. Run Analysis After Each Major Change

Track incremental improvements:

Change → Evaluate → Analyze → Apply → Repeat

3. Don't Apply Everything Blindly

Review suggestions for your specific context:

  • Some suggestions may not fit your use case
  • Business requirements might override AI recommendations
  • Test critical changes in staging first

4. Combine with Human Expertise

AI analysis + human judgment = best results:

AI Suggestion: "Add web search for pricing questions"
Your Decision: "Good idea, but restrict to official sources only"

5. Monitor After Application

Ensure suggestions actually improved performance:

```javascript
const beforeScore = 0.68;
await applySuggestions();
const afterScore = await evaluate();

if (afterScore < beforeScore) {
  console.log('Suggestions made things worse!');
  await revert();
}
```

6. Document Your Changes

Keep track of what works:

```markdown
## Optimization Log

### 2025-10-15: Added Product Datasource
- Suggestion: AI Analysis #12
- Expected: +18%
- Actual: +21%
- Notes: Exceeded expectations, huge impact

### 2025-10-12: Updated System Prompt
- Suggestion: AI Analysis #11
- Expected: +12%
- Actual: +9%
- Notes: Good but slightly lower than predicted
```

Common Patterns in AI Suggestions

Through thousands of analyses, we've identified common patterns:

Pattern 1: Missing Knowledge

Symptom: Low scores on factual questions

AI Suggests:

  • Add relevant datasources
  • Enable web search
  • Expand knowledge base

Success Rate: 85%

Pattern 2: Poor Instruction Following

Symptom: Correct information, wrong format

AI Suggests:

  • Add explicit formatting instructions
  • Include examples in prompt
  • Adjust temperature

Success Rate: 78%

Pattern 3: Inconsistent Responses

Symptom: Same question, different answers

AI Suggests:

  • Lower temperature
  • Add deterministic constraints
  • Use more specific prompts

Success Rate: 92%

Pattern 4: Scope Creep

Symptom: Answering outside intended domain

AI Suggests:

  • Add scope restrictions to prompt
  • Enable guardrails
  • Define clear boundaries

Success Rate: 88%

Limitations and Considerations

What AI Analysis Can't Do

  • Fix fundamental design flaws: Wrong model, wrong approach
  • Create missing data: Can suggest datasources, can't create content
  • Override business logic: Can't change your policies
  • Guarantee specific scores: Estimates are educated guesses

When to Ignore Suggestions

  • Conflicts with business requirements: Your rules take priority
  • Suggests proprietary tools you don't have: Skip or find alternatives
  • Recommends massive prompt changes: Iterate gradually instead
  • Pushes changes you've already tried: Trust your experience

Privacy and Security

AI Analysis:

  • ✅ Never shares your data externally
  • ✅ Uses only evaluation results and config
  • ✅ Doesn't store sensitive customer data
  • ✅ Complies with data retention policies

Measuring ROI

Track the value of AI Analysis:

Time Savings

Manual Analysis: 2-3 hours per iteration
AI Analysis: 30 seconds + 15 minutes review
Time Saved: ~90%

Performance Improvement

Average Improvement: +15-25% per analysis round
Time to 90%+ Score:
- Manual: 2-3 weeks
- With AI: 3-5 days

Cost Efficiency

AI Analysis Cost: ~10 credits
Value of 20% Performance Improvement: Priceless
ROI: Immediate

Integration Examples

Automated Optimization Pipeline

```javascript
// Nightly optimization job
cron.schedule('0 2 * * *', async () => {
  const peers = await peer.listAll();

  for (const p of peers) {
    // Run evaluation
    const run = await evaluation.execute(p.defaultSuite);

    // If score below threshold, get suggestions
    if (run.averageScore < 0.85) {
      const analysis = await evaluation.analyze(run.id);

      // Auto-apply low-risk suggestions
      const safeChanges = analysis.suggestions
        .filter(s => s.risk === 'low' && s.priority === 'high');

      await peer.applySuggestions(p.id, safeChanges);

      // Notify team
      await slack.notify({
        message: `${p.name} auto-optimized: ${safeChanges.length} changes applied`
      });
    }
  }
});
```

Continuous Improvement Dashboard

```javascript
// Track optimization trends
const dashboard = {
  totalAnalyses: 47,
  totalSuggestionsApplied: 156,
  averageImprovement: 0.18,
  topSuggestion: 'Add datasources (45% of cases)',
  lastOptimization: '2 hours ago',
  currentScore: 0.94
};
```

What's Next?

AI-Powered Analysis is just one tool in your optimization toolkit. Combine it with:

  • Regular Evaluation: Keep testing continuously
  • User Feedback: Listen to real users
  • A/B Testing: Validate improvements scientifically
  • Manual Review: Apply human judgment

Coming soon:

  • Multi-variant optimization: Test multiple suggestions simultaneously
  • Automated A/B testing: Deploy and compare automatically
  • Predictive analysis: Suggest improvements before problems occur
  • Custom optimization goals: Define your own success metrics

Try It Yourself

Ready to optimize your peers with AI?

  1. Run an evaluation on your peer
  2. Click "Analyze with AI" on the results page
  3. Review the suggestions
  4. Apply what makes sense for your use case
  5. Re-evaluate to measure improvement
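
If you'd rather drive the whole flow from code, the five steps map onto the assumed APIs used throughout this guide:

```javascript
// End-to-end sketch of the optimization flow (assumed APIs)
const run = await evaluation.execute(suiteId);        // 1. evaluate
const analysis = await evaluation.analyze(run.id);    // 2. analyze with AI
console.log(analysis.suggestions);                    // 3. review
await peer.applySuggestions(peerId, analysis.suggestions
  .filter(s => s.priority === 'high'));               // 4. apply what fits
const rerun = await evaluation.execute(suiteId);      // 5. re-evaluate
console.log(`Score: ${run.averageScore} -> ${rerun.averageScore}`);
```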

Most users see significant improvements within their first analysis session!

Conclusion

AI-Powered Analysis transforms peer optimization from an art into a science. Instead of guessing what might improve performance, you get:

✅ Data-driven recommendations
✅ Concrete implementation steps
✅ Expected impact estimates
✅ One-click application

The result? Better peers, faster optimization, and more time to focus on what matters: building great AI experiences.


Questions? Join our community forum or reach out to our team.

Read more: Testing Your AI Peers with the Evaluation System
