# CodeWave Advanced Features

Deep dive into CodeWave's advanced analysis capabilities: Developer Overview generation, Developer Growth Profiles & OKRs, Convergence Detection, Multi-Round Agent Discussion, and Agent RAG Query Tracking.

## Table of Contents

- [Developer Overview](#developer-overview) - AI-generated commit summaries
- [Developer Growth Profiles & OKRs](#developer-growth-profiles--okrs) - AI-generated growth plans
- [Convergence Detection](#convergence-detection) - Agent consensus measurement
- [Multi-Round Agent Discussion](#multi-round-agent-discussion) - Structured agent collaboration
- [Understanding Evaluation Depth](#understanding-evaluation-depth)
- [Advanced Usage](#advanced-usage)
- [Troubleshooting](#troubleshooting)
- [Agent RAG Query Tracking](#agent-rag-query-tracking) - Query monitoring and verification

---

## Developer Overview

Intelligent, AI-generated summary of code changes created automatically before agent evaluation.

### What is Developer Overview?

A concise, human-readable summary of what changed in a commit, generated by analyzing the diff:

```
Summary: Added actual estimation as a separate step

Details:
Introduced actual time estimation alongside ideal time in PR analysis
for better accuracy.

Key Changes:
- Implemented IActualTimeEstimator interface
- Created ActualTimeRunnable for estimation
- Merged actual time with PR lifecycle data
```
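
If the overview is stored as plain text in the layout shown above, it can be split into its parts for downstream tooling. The sketch below assumes exactly that layout (`Summary:`, `Details:`, and `Key Changes:` followed by `- ` bullets); `parseOverview` and `ParsedOverview` are hypothetical names and not part of CodeWave.

```typescript
interface ParsedOverview {
  summary: string;
  details: string;
  keyChanges: string[];
}

type Section = 'none' | 'details' | 'keyChanges';

// Splits an overview in the assumed "Summary / Details / Key Changes" layout.
function parseOverview(text: string): ParsedOverview {
  const result: ParsedOverview = { summary: '', details: '', keyChanges: [] };
  const detailLines: string[] = [];
  let section = 'none' as Section;

  for (const raw of text.split('\n')) {
    const line = raw.trim();
    if (line.startsWith('Summary:')) {
      result.summary = line.slice('Summary:'.length).trim();
      section = 'none';
    } else if (line === 'Details:') {
      section = 'details';
    } else if (line === 'Key Changes:') {
      section = 'keyChanges';
    } else if (line === '') {
      continue; // blank lines only separate sections
    } else if (section === 'details') {
      detailLines.push(line);
    } else if (section === 'keyChanges' && line.startsWith('- ')) {
      result.keyChanges.push(line.slice(2));
    }
  }

  // Wrapped detail lines are joined back into a single paragraph.
  result.details = detailLines.join(' ');
  return result;
}
```

If the real field turns out to be structured JSON rather than text, a parser like this is unnecessary and the fields can be read directly.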

### How It Works

1. **Automatic Generation**: Runs as the first step in the evaluation pipeline
2. **Multi-Stage Process**:
   - Extract key changes from diff
   - Identify files modified and their purpose
   - Generate concise technical summary
   - Format for readability

3. **No Agent Opinion**: This is factual analysis, not evaluation
   - Not influenced by agent assessments
   - Same output regardless of agent disagreements
   - Based purely on code diff

4. **Available Everywhere**:
   - HTML report card (top section)
   - results.json file (`developerOverview` field)
   - Used as context for all agents

### When Generation Might Fail

**Common Causes:**

- Very large diffs (>500KB) - may time out
- Binary files in the commit - cannot be analyzed
- Corrupted file content
- Network issues during generation

**What Happens:**

- Report shows: "Overview generation failed"
- Agents still evaluate using raw diff
- Report remains complete and useful
- Check console logs with `--verbose` for details
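
Two of the causes above can be screened for before generation is attempted. The following pre-check is a sketch under assumptions, not CodeWave's own logic: `checkDiffForOverview` is a hypothetical helper, the 500KB limit is read as 500 × 1024 characters, and binary changes are detected through the `Binary files ... differ` and `GIT binary patch` marker lines that `git diff` emits.

```typescript
interface DiffCheck {
  ok: boolean;
  warnings: string[];
}

// Assumed limit: the ">500KB" threshold, read as 500 * 1024.
const MAX_DIFF_SIZE = 500 * 1024;

function checkDiffForOverview(diff: string): DiffCheck {
  const warnings: string[] = [];

  // String length approximates byte size for mostly-ASCII diffs.
  if (diff.length > MAX_DIFF_SIZE) {
    warnings.push('Diff exceeds 500KB; overview generation may time out');
  }

  // git marks binary changes with one of these lines in its diff output.
  const hasBinary = diff
    .split('\n')
    .some((line) => /^Binary files .* differ$/.test(line) || line === 'GIT binary patch');
  if (hasBinary) {
    warnings.push('Diff contains binary files that cannot be analyzed');
  }

  return { ok: warnings.length === 0, warnings };
}
```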

### Using Developer Overview

**For Team Leads:**

```
Quick commit assessment without reading full diff
1. Scan Developer Overview summary (10 seconds)
2. Check Functional Impact score
3. Decide if detailed review needed
```

**For CI/CD Pipelines:**

```bash
# Extract overview
jq '.developerOverview' results.json

# Parse for specific info
jq '.developerOverview' results.json | grep -i "breaking change"
```

**For Documentation:**

```bash
# Generate change log
for file in .evaluated-commits/*/results.json; do
  echo "## $(jq -r '.metadata.commitMessage' "$file")"
  jq -r '.developerOverview' "$file"
  echo ""
done > CHANGELOG_GENERATED.md
```

---

## Developer Growth Profiles & OKRs

CodeWave aggregates the metrics from a developer's evaluated commits to generate personalized growth profiles and OKRs.

### How It Works

1.  **Data Aggregation**: The system aggregates metric data from all evaluated commits for a specific author over a configurable timeframe (default: 3 months).
2.  **Pattern Recognition**: It identifies recurring patterns, such as:
    - Consistently high code quality but low test coverage.
    - Tendency to introduce technical debt in complex features.
    - High impact but frequent estimation misses.
3.  **LLM Analysis**: A specialized LLM agent analyzes this aggregated data to construct a "Growth Profile" and a set of actionable OKRs.
4.  **OKR Generation**: The system generates 3-5 Objectives, each with 3-5 Key Results. These are:
    - **Actionable**: Specific steps to take.
    - **Measurable**: Tied to CodeWave metrics (e.g., "Maintain Code Quality > 8").
    - **Time-bound**: Designed for the next quarter.
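
The aggregation and pattern-recognition steps above can be pictured with a small sketch. Everything named here is hypothetical: the `CommitMetrics` shape, the `aggregateForAuthor` and `detectPatterns` helpers, and the thresholds (8 for high quality, 5 for low coverage, 1.5× for estimation misses) are illustrative assumptions rather than CodeWave's actual rules.

```typescript
interface CommitMetrics {
  author: string;
  date: string; // ISO 8601 timestamp
  codeQuality: number; // 1-10
  testCoverage: number; // 1-10
  idealTimeHours: number;
  actualTimeHours: number;
}

interface AuthorAggregate {
  commits: number;
  avgCodeQuality: number;
  avgTestCoverage: number;
  actualToIdealRatio: number;
}

function aggregateForAuthor(
  all: CommitMetrics[],
  author: string,
  now: Date,
  months = 3
): AuthorAggregate | null {
  // Lookback window: `months` calendar months before `now` (UTC).
  const cutoff = new Date(now.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - months);

  const mine = all.filter(
    (c) => c.author === author && new Date(c.date).getTime() >= cutoff.getTime()
  );
  if (mine.length === 0) return null;

  const sum = (pick: (c: CommitMetrics) => number) => mine.reduce((s, c) => s + pick(c), 0);
  const ideal = sum((c) => c.idealTimeHours);
  return {
    commits: mine.length,
    avgCodeQuality: sum((c) => c.codeQuality) / mine.length,
    avgTestCoverage: sum((c) => c.testCoverage) / mine.length,
    actualToIdealRatio: ideal > 0 ? sum((c) => c.actualTimeHours) / ideal : 1,
  };
}

// Hypothetical rules matching two of the example patterns listed above.
function detectPatterns(a: AuthorAggregate): string[] {
  const patterns: string[] = [];
  if (a.avgCodeQuality >= 8 && a.avgTestCoverage < 5) {
    patterns.push('High code quality but low test coverage');
  }
  if (a.actualToIdealRatio > 1.5) {
    patterns.push('Frequent estimation misses');
  }
  return patterns;
}
```

In the described pipeline, an aggregate like this would be the input handed to the LLM agent in step 3; the rule-based `detectPatterns` only stands in for that analysis.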

### Configuration

You can customize the OKR generation process:

```bash
# Set the lookback period
codewave generate-okr --months 6

# Filter by specific authors
codewave generate-okr --authors "Alice,Bob"

# Adjust LLM parameters (in config)
# "llm": { "maxTokens": 16000 } # Ensure sufficient tokens for detailed output
```

### Integration with Batch Evaluation

When running a batch evaluation, you can automatically generate OKRs at the end:

```bash
codewave batch --count 50 --generate-okr
```

This ensures that the OKRs are based on the most up-to-date analysis of the recent commits.

---

## Convergence Detection

An algorithm that measures consensus among agents and uses it to decide how many evaluation rounds to run.

### What is Convergence?

**Definition**: How closely agents' final scores agree with each other

**Score Range**: 0.0 to 1.0 (higher = better consensus)

**Interpretation**:

- **0.9+**: Excellent consensus, very reliable evaluation
- **0.7-0.9**: Good consensus, minor disagreements
- **0.5-0.7**: Moderate agreement, some conflicting views
- **<0.5**: Low consensus, significant disagreement
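
These bands can be applied in code when post-processing results. This is a minimal sketch: `interpretConvergence` is a hypothetical helper, and it assumes each band extends up to the next threshold so that every score falls into exactly one band.

```typescript
type ConsensusLevel = 'excellent' | 'good' | 'moderate' | 'low';

function interpretConvergence(score: number): ConsensusLevel {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError('Convergence score must be between 0.0 and 1.0');
  }
  if (score >= 0.9) return 'excellent';
  if (score >= 0.7) return 'good';
  if (score >= 0.5) return 'moderate';
  return 'low';
}
```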

### How Convergence Detection Works

**Phase 1: Calculate Metric Variance**

```
For each 7-pillar metric:
1. Get final score from each agent
2. Calculate standard deviation
3. Normalize to 0-1 scale (0 = all same, 1 = max variance)

Example (illustrative values):
Code Quality scores: [7, 7, 8, 6, 7] → normalized variance = 0.2 (low)
Complexity scores: [3, 5, 7, 4, 6] → normalized variance = 0.8 (high)
```

**Phase 2: Weighted Aggregation**

```
Convergence = 1 - (weighted_average_variance)

Weights:
- Code Quality: 2x (critical for quality gate)
- Test Coverage: 2x (critical for reliability)
- Functional Impact: 1.5x (important)
- Others: 1x (supporting metrics)
```
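
Phases 1 and 2 can be sketched in a few lines. This is an illustration under stated assumptions, not CodeWave's implementation: all metrics are treated as 1-10 scores, the standard deviation is normalized by its largest possible value on that scale (4.5, reached when scores split evenly between 1 and 10), and the names `normalizedSpread`, `convergence`, and `WEIGHTS` are hypothetical. Because the actual normalization is not specified here, the sketch is not expected to reproduce the example figures given for Phase 1.

```typescript
// Weights from Phase 2; metrics not listed default to 1.
const WEIGHTS: Record<string, number> = {
  codeQuality: 2,
  testCoverage: 2,
  functionalImpact: 1.5,
};

// Phase 1: population standard deviation scaled to 0-1.
function normalizedSpread(scores: number[], min = 1, max = 10): number {
  if (scores.length === 0) return 0;
  const mean = scores.reduce((s, x) => s + x, 0) / scores.length;
  const variance = scores.reduce((s, x) => s + (x - mean) * (x - mean), 0) / scores.length;
  const maxStdDev = (max - min) / 2; // assumed normalization constant
  return Math.min(1, Math.sqrt(variance) / maxStdDev);
}

// Phase 2: convergence = 1 - weighted average of the per-metric spreads.
function convergence(metrics: Record<string, number[]>): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const name of Object.keys(metrics)) {
    const weight = WEIGHTS[name] ?? 1;
    weightedSum += weight * normalizedSpread(metrics[name]);
    totalWeight += weight;
  }
  return totalWeight === 0 ? 1 : 1 - weightedSum / totalWeight;
}
```

Weighting Code Quality and Test Coverage at 2x means disagreement on those metrics lowers the convergence score twice as fast as disagreement on a supporting metric.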

**Phase 3: Dynamic Round Optimization**

```
Target: Convergence >= 0.75 (configurable)

If convergence < target:
  → Continue to next round
  → Agents see previous assessments
  → Can adjust scores with reasoning

Maximum rounds: 3 (avoid infinite loops)

When to stop:
✓ Convergence reached target
✓ No new gaps identified
✓ Maximum rounds reached
```
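
The stopping rules can be expressed as a small control loop. This is a sketch only: `runRounds`, `RoundOutcome`, and the `runRound` callback (standing in for a full agent round) are hypothetical, and the defaults mirror the 0.75 target and the three-round cap described above.

```typescript
type StopReason = 'target-reached' | 'no-new-gaps' | 'max-rounds';

interface RoundOutcome {
  convergence: number; // 0.0 to 1.0
  newGaps: number; // open concerns identified in this round
}

interface DiscussionResult {
  rounds: number;
  convergence: number;
  reason: StopReason;
}

function runRounds(
  runRound: (round: number) => RoundOutcome,
  target = 0.75,
  maxRounds = 3
): DiscussionResult {
  let rounds = 0;
  let convergence = 0;

  while (rounds < maxRounds) {
    rounds += 1;
    const outcome = runRound(rounds);
    convergence = outcome.convergence;

    if (convergence >= target) {
      return { rounds, convergence, reason: 'target-reached' };
    }
    if (outcome.newGaps === 0) {
      // Nothing left to discuss, so another round would not change scores.
      return { rounds, convergence, reason: 'no-new-gaps' };
    }
  }
  // The hard cap guarantees termination even if agents never agree.
  return { rounds, convergence, reason: 'max-rounds' };
}
```

With convergence values of 0.35, 0.62, and 0.78 and open gaps in the first two rounds, this loop stops after round 3 with `target-reached`.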

### Convergence in Action

**Low Convergence Example:**

```
Round 1 - Initial Assessments:
- Code Quality: [7, 6, 8] → Convergence: 0.45 (low)
- Variance: Reviewer scores 7, Architect scores 6; their perspectives differ

Round 2 - Concerns Raised:
- Architect: "Code is well-written but tightly coupled"
- Reviewer: "Coupling is acceptable for this context"
- Revised scores: [7, 7, 8] → Convergence: 0.65 (improving)

Round 3 - Validation:
- All agents review feedback
- Final scores: [7, 7, 7] → Convergence: 0.92 (excellent)
```

**High Convergence Example:**

```
Round 1 - Initial Assessment:
- All agents strongly agree on scores
- Code Quality: [8, 8, 8] → Convergence: 0.98
- No disagreements to discuss
- Stop after Round 1 ✓
```

### Using Convergence Scores

**For Quality Assurance:**

```bash
# Find unreliable evaluations (low convergence)
jq '.[] | select(.convergenceScore < 0.6)' \
  .evaluated-commits/*/history.json
```

**For Statistical Analysis:**

```bash
# Average convergence across all evaluations of all commits
jq -s '[.[][].convergenceScore] | add / length' \
  .evaluated-commits/*/history.json
```

**For Flagging Problem Commits:**

```bash
# Very low convergence suggests something unusual about the commit
jq -r '.[] | select(.convergenceScore < 0.5) | .evaluationNumber' \
  .evaluated-commits/*/history.json
```

### Convergence Insights

**What Low Convergence Tells You:**

1. Commit is unusual or controversial
2. Different aspects are in tension (e.g., high impact but high risk)
3. Worth extra review and discussion
4. Disagreement is valuable - multiple perspectives on real trade-offs

**What High Convergence Tells You:**

1. Straightforward change
2. Clear quality assessment
3. Reliable evaluation
4. Agents quickly reached consensus

---

## Multi-Round Agent Discussion

The core mechanism that makes CodeWave's evaluation sophisticated: agents discuss, challenge, and refine their assessments.

### How Multi-Round Discussion Works

#### Round 1: Initial Assessment

**Agent Activity:**

- Each agent evaluates independently
- Uses only: diff, metadata, developer overview
- No knowledge of other agents' scores
- Each provides: summary, details, metrics

**Example Output:**

```
Business Analyst 🎯
Summary: High-impact feature for user workflows
Metrics: Impact=8, Ideal Time=6 hours

SDET 🧪
Summary: Test coverage needs improvement for edge cases
Metrics: Test Coverage=5 (no impact on other scores)

Developer Reviewer 🔍
Summary: Code well-structured but naming unclear in helpers
Metrics: Code Quality=6
```

**Purpose of Round 1:**

- Establish baseline perspectives
- Identify areas of agreement/disagreement
- Set context for discussion

#### Round 2: Concerns & Refinement

**What Happens:**

1. System collects all Round 1 scores
2. Agents see **only their own assessment** plus the **aggregated consensus**
3. System formulates concerns: "Impact scored 8 but Code Quality only 6 - why?"
4. Agents review concerns and can revise scores
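
Steps 1 to 3 could look roughly like the following. The sketch is hypothetical throughout: `Assessment`, `buildRoundTwoViews`, and the rule that a concern is raised when an agent's score differs from the average by at least one point are assumptions made for illustration.

```typescript
interface Assessment {
  agent: string;
  metric: string;
  score: number;
}

interface RoundTwoView {
  agent: string;
  metric: string;
  ownScore: number;
  consensus: number; // average across all agents for this metric
  concern: string | null;
}

function buildRoundTwoViews(roundOne: Assessment[], minGap = 1): RoundTwoView[] {
  // Step 1: collect Round 1 scores per metric.
  const byMetric = new Map<string, number[]>();
  for (const a of roundOne) {
    const scores = byMetric.get(a.metric) ?? [];
    scores.push(a.score);
    byMetric.set(a.metric, scores);
  }

  // Steps 2-3: each agent sees only its own score plus the aggregate,
  // and a question is formulated when the two differ noticeably.
  return roundOne.map((a) => {
    const scores = byMetric.get(a.metric) ?? [];
    const consensus = scores.reduce((s, x) => s + x, 0) / scores.length;
    const gap = a.score - consensus;
    const concern =
      Math.abs(gap) >= minGap
        ? `Why is ${a.metric} ${gap < 0 ? 'higher' : 'lower'} in consensus?`
        : null;
    return { agent: a.agent, metric: a.metric, ownScore: a.score, consensus, concern };
  });
}
```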

**Agent Activity in Round 2:**

```
Architect sees:
- Their score: Complexity = 4
- Average from all: Complexity = 5.2
- Question: "Why is architectural complexity higher in consensus?"

Architect reviews feedback:
"Developer Reviewer noted tight coupling in data layer"

Architect revises:
"I was too optimistic. Coupling does increase complexity.
Revising: Complexity = 6 (was 4), Tech Debt = +3 hours"
```

**Purpose of Round 2:**

- Challenge assumptions
- Incorporate peer feedback
- Move toward consensus
- Highlight real trade-offs

#### Round 3: Validation & Consensus

**What Happens:**

1. System shows updated scores from Round 2
2. Agents review changes and final metrics
3. Final opportunity to adjust or validate
4. System calculates convergence

**Agent Activity in Round 3:**

```
All agents review updated scores
Are new assessments reasonable? Yes ✓
Do they reflect actual concerns? Yes ✓
Final validation accepted
Convergence Score: 0.78 ✓ (meets target)
```

**Purpose of Round 3:**

- Final validation
- Ensure changes are justified
- Build consensus
- Prepare final report

### What Each Agent Evaluates

#### Round 1-3: Business Analyst 🎯

**Primary Metrics:**

- Functional Impact (1-10): User/business value
- Ideal Time (hours): Perfect-case implementation time

**Questions in Discussions:**

- "Is this business value real or perceived?"
- "Are requirements clear enough for accurate time estimation?"
- "Does scope match the effort?"

**Concerns Raised:**

- Scope creep: "Impact high but ideal time seems short"
- Unclear requirements: "Can't estimate if requirements are vague"
- Misalignment: "High complexity but low impact seems risky"

---

#### Round 1-3: Developer Author 👨‍💻

**Primary Metrics:**

- Actual Time (hours): Real time spent
- Tech Debt (hours): Debt added (+) or removed (-)

**Questions in Discussions:**

- "What obstacles were encountered?"
- "Is the implementation approach sustainable?"
- "Did architectural choices increase debt?"

**Concerns Raised:**

- Time overrun: "Actual time 3x ideal - significant complexity"
- Debt accumulation: "Implementation created architectural liabilities"
- Maintainability: "Technical choices make future changes harder"

---

#### Round 1-3: Developer Reviewer 🔍

**Primary Metrics:**

- Code Quality (1-10): Correctness, design, readability
- Secondary: Test coverage adequacy

**Questions in Discussions:**

- "Does the code work correctly?"
- "Are there design improvements?"
- "Is it maintainable?"

**Concerns Raised:**

- Design issues: "Violates DRY principle, creates duplication"
- Readability: "Complex logic needs comments"
- Testing: "Core logic lacks test coverage"

---

#### Round 1-3: Senior Architect 🏛️

**Primary Metrics:**

- Complexity (1-10, reversed): How simple/complex is the architecture?
- Tech Debt (hours): Architectural liabilities

**Questions in Discussions:**

- "Is this scalable?"
- "Does it follow SOLID principles?"
- "What long-term consequences?"

**Concerns Raised:**

- Scalability: "Direct database access won't scale to production"
- Coupling: "Too tightly coupled to external service"
- Patterns: "Violates established architectural patterns"

---

#### Round 1-3: SDET 🧪

**Primary Metrics:**

- Test Coverage (1-10): Comprehensiveness of testing
- Quality: Automation framework quality

**Questions in Discussions:**

- "Are we confident this works?"
- "What scenarios aren't tested?"
- "Is test code maintainable?"

**Concerns Raised:**

- Coverage gaps: "Happy path tested but error handling not"
- Fragile tests: "Tests too tightly coupled to implementation"
- Infrastructure: "Test framework doesn't support edge case scenarios"

---

### Why Multi-Round Discussion Improves Evaluation

**Problem with Single-Round Evaluation:**

```
Agent 1 sees: "This is high impact"
Agent 2 sees: "This has high complexity"
Agent 3 sees: "Test coverage is low"

Without discussion:
→ High impact wins, low quality issues missed
→ Recommendation: "Ship it" (wrong!)
```

**With Multi-Round Discussion:**

```
Round 1: All agents submit assessments
Round 2: System highlights: "High impact but low quality - contradiction?"
Round 3: Agents discuss and agree: "High impact, but ship only with test improvements"
→ Recommendation: "Ship with test coverage requirement" (better!)
```

### Multi-Round Discussion in HTML Report

**View the Timeline:**

1. Open HTML report
2. Click "Conversation Timeline" section
3. See each round's assessments
4. Understand how agents refined their thinking

**What to Look For:**

- **Consensus**: When all agents agree (stronger signals)
- **Debate**: Different perspectives highlight real trade-offs
- **Reasoning**: Why scores changed between rounds

### Practical Examples

**Example 1: Quality vs. Speed Trade-off**

```
Round 1:
- Business Analyst: "High impact, implement quickly" (Impact=8, Ideal=4)
- SDET: "Test coverage is poor" (Tests=3)
- Architect: "Code is brittle" (Quality=4)

Round 2 Discussion:
- Analyst: "Should we delay for quality improvements?"
- Architect: "Yes, technical debt isn't worth 4-hour savings"

Round 3 Result:
- Recommendation: Add 2 more hours for quality
- Impact remains 8 (value is there)
- Quality improves to 7
- Ideal Time revised to 6 (was 4)
```

**Example 2: Framework Decision**

```
Round 1:
- Architect: "New framework adds complexity" (Complexity=3)
- SDET: "Framework lacks test support" (Tests=4)
- Developer: "Learning curve cost 6 hours" (Actual=10)

Round 2 Discussion:
- Architect: "Long-term benefits reduce complexity later?"
- Developer: "Yes, but current evaluation shows short-term pain"
- SDET: "Maturity improves next quarter"

Round 3 Result:
- Acknowledged: Current project is learning investment
- Tech Debt: +3 hours (but will reduce as team learns)
- Recommendation: Accept trade-off for strategic benefits
```

---

## Understanding Evaluation Depth

### Shallow Evaluation (Converges Quickly)

```
Round 1 Scores: [8, 8, 8, 8, 8]
Convergence: 0.98 ✓

Typical: Simple bug fixes, obvious improvements
Time: Fast (single round)
Reliability: Very high (unanimous)
```

### Deep Evaluation (Multiple Rounds)

```
Round 1: [8, 5, 6, 4, 7] → Convergence: 0.35 (low)
Round 2: [7, 6, 6, 5, 7] → Convergence: 0.62 (improving)
Round 3: [7, 6, 7, 6, 7] → Convergence: 0.78 (good) ✓

Typical: Complex features, trade-offs
Time: Longer (multiple rounds)
Reliability: Refined through discussion
Detail: Rich with nuance
```

### Key Insight

**More rounds ≠ worse evaluation**

Sometimes high-complexity changes need multiple rounds to properly assess trade-offs. The convergence score tells you if agents reached meaningful consensus.

---

## Advanced Usage

### Analyzing Discussion Quality

```bash
# Find the five evaluations with the most agent disagreement
jq -s 'add | sort_by(.convergenceScore) | .[0:5]' \
  .evaluated-commits/*/history.json
```

### Identifying Red Flags

```bash
# Low convergence suggests need for discussion
jq '.[] | select(.convergenceScore < 0.6) | .evaluationNumber' \
  .evaluated-commits/*/history.json
```

### Tracking Consensus Over Time

```bash
# List each commit's latest convergence score, highest first
for eval in .evaluated-commits/*/history.json; do
  echo "$(basename "$(dirname "$eval")"): $(jq '.[-1].convergenceScore' "$eval")"
done | sort -t: -k2 -rn
```
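
The same analysis can be done in TypeScript once the history files are loaded. The sketch below assumes, as the `jq` examples do, that each `history.json` is an array of entries with a `convergenceScore` field and that the last entry is the most recent evaluation; `summarizeConvergence` and its input shape are hypothetical.

```typescript
interface HistoryEntry {
  evaluationNumber: number;
  convergenceScore: number;
}

interface ConvergenceSummary {
  averageLatest: number;
  lowConsensusCommits: string[]; // lowest convergence first
}

// `histories` maps a commit directory name to its parsed history.json.
function summarizeConvergence(
  histories: Record<string, HistoryEntry[]>,
  threshold = 0.6
): ConvergenceSummary {
  // Keep only the most recent evaluation of each commit.
  const latest = Object.keys(histories)
    .filter((commit) => histories[commit].length > 0)
    .map((commit) => {
      const entries = histories[commit];
      return { commit, score: entries[entries.length - 1].convergenceScore };
    });

  const total = latest.reduce((s, e) => s + e.score, 0);
  return {
    averageLatest: latest.length > 0 ? total / latest.length : 0,
    lowConsensusCommits: latest
      .filter((e) => e.score < threshold)
      .sort((a, b) => a.score - b.score)
      .map((e) => e.commit),
  };
}
```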

---

## Troubleshooting

### Convergence Very Low (<0.3)

**Possible Causes:**

1. Highly ambiguous commit
2. Major architectural disagreement
3. Unclear requirements

**What to Do:**

- Review agent feedback carefully
- Look for specific concerns in conversation
- Consider follow-up PR to address issues

### Convergence Stays Low Across Multiple Evaluations

**Possible Causes:**

1. Commit is genuinely controversial
2. Multiple valid interpretations
3. System may need recalibration

**What to Do:**

- This is a valuable signal: agents keep finding issues
- Treat it as "needs team discussion"
- Low convergence is an honest evaluation
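
The two troubleshooting cases can be told apart programmatically. In this sketch the 0.3 cut-off comes from the first heading in this section, while the 0.6 cut-off, the two-evaluation minimum, and the `triageConvergence` name are assumptions chosen for illustration.

```typescript
type Triage = 'ok' | 'very-low' | 'persistently-low';

// `scores` holds one commit's convergence scores, oldest evaluation first.
function triageConvergence(scores: number[], lowCutoff = 0.6, veryLowCutoff = 0.3): Triage {
  if (scores.length === 0) return 'ok';

  // Low in every one of several evaluations: treat as "needs team discussion".
  if (scores.length >= 2 && scores.every((s) => s < lowCutoff)) {
    return 'persistently-low';
  }
  // Latest evaluation is far below any useful consensus.
  if (scores[scores.length - 1] < veryLowCutoff) {
    return 'very-low';
  }
  return 'ok';
}
```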

---

## Agent RAG Query Tracking

Advanced monitoring system that tracks and reports which agents queried the documentation vector store, what queries were executed, and how successful those queries were.

### What is Agent RAG Tracking?

Agent RAG Tracking is an automatic system that monitors all Retrieval-Augmented Generation (RAG) queries made by agents during evaluation. It captures:

- **Which agent** made each query
- **What query text** was used
- **Which stores** were searched (diff, documentation, or both)
- **How many results** were found
- **Relevance scores** for results
- **Files referenced** in the results
- **Timestamp** of execution

### Why Track Agent Queries?

**Quality Assurance:**

- Verify that agents actually used documentation during evaluation
- Ensure agents checked architecture before making assessments
- Validate that RAG integration is working correctly

**Performance Analysis:**

- Measure documentation effectiveness
- Identify gaps in documentation
- Track query patterns over time

**Compliance & Audit:**

- Document which resources agents consulted
- Create audit trails for reproducibility
- Verify evaluation consistency

### How It Works

#### Automatic Tracking

Tracking happens automatically without any manual intervention:

```typescript
// When an agent creates a RAG helper:
const rag = new CombinedRAGHelper(diffStore, docStore);

// Set agent name for tracking (required)
rag.setAgentName('Senior Architect');

// When queries are executed (tracking happens automatically):
await rag.queryMultiple([
  { q: 'What design patterns are documented?', store: 'docs' },
  { q: 'Are there performance guidelines?', store: 'docs' },
]);
// ✓ Both queries are automatically tracked
```

#### Data Captured

For each query, the system captures:

```typescript
{
  agentName: 'Senior Architect',           // Which agent ran it
  query: 'What design patterns are used?', // The query text
  timestamp: 1730944523000,                // When it ran
  storeQueried: 'docs',                    // Which stores: 'diff' | 'docs' | 'both'
  resultCount: 3,                          // Total results found
  docResultCount: 3,                       // Results from documentation store
  relevantFiles: ['ARCHITECTURE.md'],      // Files referenced
  averageRelevanceScore: 0.87,             // Quality of matches (0-1)
  foundResults: true                       // Whether results were found
}
```
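
A typed view of this record makes the summary metrics easier to reason about. The interface below mirrors the fields shown above; the two helper functions are hypothetical illustrations of how "agents that checked docs" and "documentation coverage" could be derived from such records, and may differ from what `CombinedRAGHelper` actually computes.

```typescript
interface RAGQueryRecord {
  agentName: string;
  query: string;
  timestamp: number;
  storeQueried: 'diff' | 'docs' | 'both';
  resultCount: number;
  docResultCount: number;
  relevantFiles: string[];
  averageRelevanceScore: number;
  foundResults: boolean;
}

// Agents with at least one query that touched the documentation store.
function agentsThatQueriedDocs(records: RAGQueryRecord[]): string[] {
  const names = records.filter((r) => r.storeQueried !== 'diff').map((r) => r.agentName);
  return Array.from(new Set(names)); // de-duplicate, keep first-seen order
}

// Percentage of all queries that returned at least one documentation result.
function documentationCoverage(records: RAGQueryRecord[]): number {
  if (records.length === 0) return 0;
  const withDocs = records.filter((r) => r.docResultCount > 0).length;
  return (withDocs / records.length) * 100;
}
```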

### Accessing Tracking Data

After evaluation completes, access tracking data with static methods:

#### Quick Check: Which Agents Used Documentation?

```typescript
import { CombinedRAGHelper } from './src/utils/combined-rag-helper';

// Get list of agents that checked documentation
const agentsUsedDocs = CombinedRAGHelper.getAgentsCheckedDocs();
console.log('Agents that verified documentation:', agentsUsedDocs);
// Output: ['Senior Architect', 'Business Analyst', 'SDET (Test Automation Engineer)', ...]
```

#### Coverage Metric: Documentation Usage Percentage

```typescript
// Get percentage of queries that used documentation
const coverage = CombinedRAGHelper.getDocumentationCoverage();
console.log(`Documentation coverage: ${coverage.toFixed(1)}%`);
// Output: Documentation coverage: 85.5%

if (coverage > 80) {
  console.log('✅ Excellent documentation usage');
} else {
  console.log('⚠️  Low documentation usage');
}
```

#### Detailed Report: Full Analysis

```typescript
// Get comprehensive markdown report with all metrics
const report = CombinedRAGHelper.getTrackingReport();
console.log(report);

// Shows:
// - Total queries executed
// - Per-agent breakdown
// - Query success rates
// - Relevance score statistics
// - File reference analysis
```

#### Raw Export: For Further Analysis

```typescript
import * as fs from 'fs';

// Export all tracking data as JSON
const data = CombinedRAGHelper.exportTrackingData();

// Data contains:
// - timestamp: When export was created
// - totalQueries: How many queries total
// - agents: Per-agent statistics
// - allQueries: Full query history

// Save for analysis or persistence
fs.writeFileSync('rag-tracking.json', JSON.stringify(data, null, 2));
```

### Verification Patterns

#### Pattern 1: Verify All Agents Checked Documentation

```typescript
const agentsUsedDocs = CombinedRAGHelper.getAgentsCheckedDocs();
const expectedAgents = [
  'Senior Architect',
  'Business Analyst',
  'SDET (Test Automation Engineer)',
  'Developer (Author)',
  'Developer Reviewer',
];

const allChecked = expectedAgents.every((agent) => agentsUsedDocs.includes(agent));

if (allChecked) {
  console.log('✅ All agents checked documentation');
} else {
  const missing = expectedAgents.filter((a) => !agentsUsedDocs.includes(a));
  console.log('❌ These agents skipped documentation:', missing);
}
```

#### Pattern 2: Check Effectiveness of Documentation

```typescript
const coverage = CombinedRAGHelper.getDocumentationCoverage();

if (coverage >= 80) {
  console.log('✅ Great documentation coverage:', coverage.toFixed(1) + '%');
  console.log('   Agents are actively using documentation');
} else if (coverage >= 50) {
  console.log('⚠️  Moderate documentation coverage:', coverage.toFixed(1) + '%');
  console.log('   Some agents may not be checking docs');
} else {
  console.log('❌ Low documentation coverage:', coverage.toFixed(1) + '%');
  console.log('   Most queries are not using documentation');
}
```

#### Pattern 3: Analyze Query Details

```typescript
const report = CombinedRAGHelper.getTrackingReport();

// Find specific sections:
// 1. Per-Agent Breakdown - See each agent's performance
// 2. Documentation Coverage Analysis - What files were referenced
// 3. Query Success Metrics - How well queries performed
```

#### Pattern 4: Export and Trend Analysis

```typescript
const data = CombinedRAGHelper.exportTrackingData();

// Analyze patterns:
// - Which files are most referenced?
// - Do agents consistently find results?
// - Are relevance scores improving over time?
// - Which agents use docs most effectively?

// Summarize per-agent statistics for comparison across evaluations
const agentStats = data.agents.map((agent) => ({
  name: agent.agentName,
  queriesWithDocs: agent.queriesWithDocResults,
  // Guard against agents that executed no queries (avoids NaN)
  avgDocResults:
    agent.queriesExecuted > 0 ? (agent.totalDocResults / agent.queriesExecuted).toFixed(1) : 'n/a',
  filesReferenced: agent.relevantDocFiles.length,
}));

console.table(agentStats);
```

### Metrics Explained

| Metric                     | Meaning                                     | Good Range          | Examples                               |
| -------------------------- | ------------------------------------------- | ------------------- | -------------------------------------- |
| **Agents Checked Docs**    | Agents that executed at least one doc query | All 5               | `['Senior Architect', ...]`            |
| **Documentation Coverage** | % of queries that found documentation       | 70-100%             | 85.5% means 85.5% of queries used docs |
| **Query Success Rate**     | % of queries that returned results          | 80%+                | Indicates doc quality                  |
| **Avg Relevance Score**    | Average quality of matches (0-1)            | 0.7+                | 0.87 = good matches                    |
| **Files Referenced**       | How many doc files were used                | Depends on codebase | Higher = more comprehensive            |
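
For the last three rows of the table, a sketch of the arithmetic may help. It assumes query records with the `foundResults`, `averageRelevanceScore`, and `relevantFiles` fields shown under Data Captured; `summarizeQueries` is a hypothetical helper, and averaging relevance only over queries that found results is a design choice of this sketch rather than documented behavior.

```typescript
interface QueryRecord {
  foundResults: boolean;
  averageRelevanceScore: number; // 0-1
  relevantFiles: string[];
}

interface QuerySummary {
  successRate: number; // % of queries that returned results
  avgRelevance: number; // mean score over successful queries
  filesReferenced: number; // count of distinct files
}

function summarizeQueries(records: QueryRecord[]): QuerySummary {
  const successful = records.filter((r) => r.foundResults);

  // Count each referenced file once, however many queries returned it.
  const files = new Set<string>();
  for (const r of records) {
    for (const f of r.relevantFiles) files.add(f);
  }

  const relevanceSum = successful.reduce((s, r) => s + r.averageRelevanceScore, 0);
  return {
    successRate: records.length > 0 ? (successful.length / records.length) * 100 : 0,
    avgRelevance: successful.length > 0 ? relevanceSum / successful.length : 0,
    filesReferenced: files.size,
  };
}
```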

### Practical Use Cases

#### Use Case 1: Quality Gate

```typescript
// Ensure evaluation met documentation standards
const agentsChecked = CombinedRAGHelper.getAgentsCheckedDocs().length;
const coverage = CombinedRAGHelper.getDocumentationCoverage();

const qualityGatePass = agentsChecked === 5 && coverage > 75;

if (!qualityGatePass) {
  console.warn('⚠️  Evaluation did not meet quality standards');
  console.warn('  - Agents using docs:', agentsChecked + '/5');
  console.warn('  - Documentation coverage:', coverage.toFixed(1) + '%');
  process.exit(1);
}
```

#### Use Case 2: Continuous Improvement

```typescript
// Track documentation effectiveness over evaluations
const data = CombinedRAGHelper.exportTrackingData();

console.log('📊 Documentation Effectiveness Report');
console.log('=====================================');
console.log('Total queries:', data.totalQueries);
console.log('Agents using docs:', data.agents.length);
if (data.agents.length > 0) {
  console.log('Avg queries per agent:', (data.totalQueries / data.agents.length).toFixed(1));
}

// Identify weak areas (agents with no queries are skipped to avoid dividing by zero)
const lowPerformers = data.agents.filter(
  (a) => a.queriesExecuted > 0 && a.queriesWithDocResults / a.queriesExecuted < 0.5
);

if (lowPerformers.length > 0) {
  console.log('⚠️  Agents with low doc usage:');
  lowPerformers.forEach((a) => {
    console.log(
      `   - ${a.agentName}: ${((a.queriesWithDocResults / a.queriesExecuted) * 100).toFixed(1)}%`
    );
  });
}
```

#### Use Case 3: Debugging Failed Queries

```typescript
// Find which queries aren't getting results
const data = CombinedRAGHelper.exportTrackingData();

const failedQueries = data.allQueries.filter((q) => !q.foundResults);

if (failedQueries.length > 0) {
  console.log('❌ Queries with no results:');
  failedQueries.forEach((q) => {
    console.log(`   Agent: ${q.agentName}`);
    console.log(`   Query: "${q.query}"`);
    console.log(`   Store: ${q.storeQueried}`);
  });
  console.log('💡 Tip: These queries might indicate documentation gaps');
}
```

### Clearing Tracking Data

Reset tracking between evaluations:

```typescript
// Clear all tracking data
CombinedRAGHelper.clearTracking();

// Useful for:
// - Starting fresh evaluations
// - Clearing test data
// - Resetting for new batch
```

### Implementation Details

#### Tracking Service

The `AgentRAGTracker` singleton in `src/services/agent-rag-tracker.service.ts` manages all query tracking:

- **Automatic**: Happens transparently during `queryMultiple()`
- **Efficient**: Minimal performance overhead
- **Persistent**: Data persists across function calls within the same evaluation
- **Accessible**: Static methods on `CombinedRAGHelper`
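
The singleton pattern behind this can be illustrated with a minimal sketch. This is not the code in `agent-rag-tracker.service.ts`: the `QueryTracker` class, its method names, and the `TrackedQuery` shape are hypothetical and show only the general idea of one shared store that survives across calls until it is cleared.

```typescript
interface TrackedQuery {
  agentName: string;
  query: string;
  foundResults: boolean;
}

class QueryTracker {
  private static instance: QueryTracker | null = null;
  private queries: TrackedQuery[] = [];

  // Private constructor forces callers through getInstance().
  private constructor() {}

  static getInstance(): QueryTracker {
    if (QueryTracker.instance === null) {
      QueryTracker.instance = new QueryTracker();
    }
    return QueryTracker.instance;
  }

  record(entry: TrackedQuery): void {
    this.queries.push(entry);
  }

  // Return a copy so callers cannot mutate the shared history.
  getAll(): TrackedQuery[] {
    return this.queries.slice();
  }

  clear(): void {
    this.queries = [];
  }
}
```

Because every caller receives the same instance, a query recorded by one agent is visible to any later reader, which is what allows static accessors to report on the whole evaluation.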

#### Agent Integration

All 5 agents automatically support tracking by calling `setAgentName()`:

```typescript
// In each agent's RAG initialization:
rag.setAgentName('Agent Name');

// Supported agents:
// - 'Senior Architect'
// - 'Business Analyst'
// - 'SDET (Test Automation Engineer)'
// - 'Developer (Author)'
// - 'Developer Reviewer'
```

### Verification Script

A test verification script demonstrates the full tracking system:

```bash
npm run build
npx ts-node test/verify-agent-rag-tracking.ts
```

This script:

1. Initializes documentation vector store
2. Simulates multiple agents querying
3. Displays comprehensive tracking report
4. Exports data to JSON file
5. Shows summary statistics

---

## Next Steps

- **[HTML_REPORT_GUIDE.md](./HTML_REPORT_GUIDE.md)** - Learn to read the report
- **[AGENTS.md](./AGENTS.md)** - Detailed agent specifications
- **[ARCHITECTURE.md](./ARCHITECTURE.md)** - System implementation details
