# Issue Classification Guide

> **Version:** 2.0.0  
> **Last Updated:** 2025-11-01  
> **Related:** [AI Issue Classification Spec](../memory-bank/requirements/specs/ai-issue-classification.md)

## Overview

This guide describes how to classify GitHub issues on a **1-10 scale** based on AI capability and human oversight requirements. The classification system enables Theory of Constraints (TOC) optimization to protect expert capacity and maximize development throughput.

## Classification Zones

Issues are classified into **4 zones** based on the level of human involvement required:

| Zone | Classification | AI Capability | Human Role | Best For |
|------|----------------|---------------|------------|----------|
| **ai-solo** | 1-3 | AI-autonomous | Reactive only | File ops, docs, simple changes |
| **ai-led** | 4-6 | AI-led | Validates | Bug fixes, isolated features, API work |
| **ai-assisted** | 7-8 | AI assists | Leads implementation | Refactoring, novel features, optimization |
| **ai-limited** | 9-10 | AI provides context | Owns implementation | Security, breaking changes, architecture |

## Theory of Constraints (TOC) Context

### Why Classification Matters

The classification system applies TOC principles to software development:

1. **Identify the Constraint**: Expert engineers (limited availability, highest skill level)
2. **Exploit the Constraint**: Experts work only on classification 7-10 tasks
3. **Subordinate Everything Else**: Junior engineers (4-6) and AI (1-3) handle other work
4. **Elevate Performance**: Optimize work sequencing to minimize expert wait time

### Resource Allocation

```
Classification 1-3 (ai-solo):     → AI + Junior Review
Classification 4-6 (ai-led):      → Junior Engineer + AI
Classification 7-8 (ai-assisted): → Senior Engineer + AI
Classification 9-10 (ai-limited): → Senior Engineer Only
```
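
The mapping above can be sketched in TypeScript. This is a minimal illustration, not the project's implementation: the names `zoneForClassification`, `STAFFING`, and `staffingFor` are hypothetical, while the zone boundaries and staffing strings are taken from the tables in this guide.

```typescript
// Zone names come from the Classification Zones table; the function and
// constant names below are hypothetical, chosen for illustration.
type Zone = "ai-solo" | "ai-led" | "ai-assisted" | "ai-limited";

// Map a 1-10 classification to its zone.
function zoneForClassification(classification: number): Zone {
  // Reject non-integers, NaN, and out-of-range values.
  if (classification % 1 !== 0 || classification < 1 || classification > 10) {
    throw new RangeError("classification must be an integer from 1 to 10");
  }
  if (classification <= 3) return "ai-solo";
  if (classification <= 6) return "ai-led";
  if (classification <= 8) return "ai-assisted";
  return "ai-limited";
}

// Staffing per zone, copied from the Resource Allocation block.
const STAFFING: Record<Zone, string> = {
  "ai-solo": "AI + Junior Review",
  "ai-led": "Junior Engineer + AI",
  "ai-assisted": "Senior Engineer + AI",
  "ai-limited": "Senior Engineer Only",
};

// Look up who should work on an issue with the given classification.
function staffingFor(classification: number): string {
  return STAFFING[zoneForClassification(classification)];
}
```

For example, `staffingFor(7)` resolves to the ai-assisted row, so the issue goes to a senior engineer working with AI.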

## Classification Levels (1-10)

### Zone 1: ai-solo (1-3)

**AI-autonomous, human reactive only**

#### Classification 1: Fully Automated
- **Characteristics**: Zero ambiguity, well-defined rules, no edge cases
- **Human Role**: None (unless AI fails)
- **Examples**:
  - Run linting and auto-fix formatting
  - Update dependency versions in package.json
  - Rename files using standard naming conventions
  - Delete deprecated files per migration plan
  - Add missing semicolons per style guide

**When to use**: Mechanical changes with zero decision-making required.

---

#### Classification 2: Quick Inspection
- **Characteristics**: Low ambiguity, minimal context needed, quick human review
- **Human Role**: Quick glance to verify correctness (< 2 minutes)
- **Examples**:
  - Update README with new command usage
  - Fix typos in documentation
  - Add simple text content to markdown files
  - Update copyright year in headers
  - Change placeholder text in templates

**When to use**: Text changes that need minimal validation.

---

#### Classification 3: Light Review
- **Characteristics**: Some context needed, straightforward logic, low risk
- **Human Role**: Review for correctness and context fit (5-10 minutes)
- **Examples**:
  - Add logging statements to existing functions
  - Create simple helper functions (e.g., `formatDate()`)
  - Update error messages for clarity
  - Add JSDoc comments to functions
  - Modify template structure (add/remove sections)

**When to use**: Simple code additions that follow established patterns.

---

### Zone 2: ai-led (4-6)

**AI-led, human validates**

#### Classification 4: Manual Verification
- **Characteristics**: Requires testing, multiple paths, some edge cases
- **Human Role**: Manual testing and verification (15-30 minutes)
- **Examples**:
  - Fix simple bugs (off-by-one errors, null checks)
  - Add validation to function parameters
  - Create isolated utility functions
  - Update configuration files with new options
  - Implement simple CLI flags

**When to use**: Bug fixes and small features that need manual testing.

---

#### Classification 5: Scenario Testing
- **Characteristics**: Multiple code paths, API interactions, moderate complexity
- **Human Role**: Test multiple scenarios, validate edge cases (30-60 minutes)
- **Examples**:
  - Add new API endpoint with error handling
  - Implement feature with 3-5 acceptance criteria
  - Refactor function with dependencies (up to 3 files)
  - Add GitHub CLI command integration
  - Create command-line interface for script

**When to use**: Features requiring scenario testing and validation.

---

#### Classification 6: Integration Testing
- **Characteristics**: Cross-system impact, database changes, workflow modifications
- **Human Role**: Integration testing, verify system behavior (1-2 hours)
- **Examples**:
  - Add database migration with schema changes
  - Update CI/CD workflow with new steps
  - Implement multi-file refactoring (3-7 files)
  - Add authentication middleware
  - Modify template generation workflow

**When to use**: Changes affecting multiple systems or requiring integration validation.

---

### Zone 3: ai-assisted (7-8)

**Human-led, AI assists**

#### Classification 7: Detailed Guidance Needed
- **Characteristics**: Complex dependencies, requires design decisions, risk of regression
- **Human Role**: Design approach, guide implementation, review carefully (2-4 hours)
- **Examples**:
  - Refactor core module with multiple dependents
  - Update detection utilities with new heuristics
  - Implement feature requiring architectural decisions
  - Add caching layer to existing system
  - Migrate from one library to another

**When to use**: Complex work requiring human design and AI execution.

---

#### Classification 8: Human Does Core Work
- **Characteristics**: Novel implementation, performance critical, high complexity
- **Human Role**: Write core logic, AI handles boilerplate and tests (4-8 hours)
- **Examples**:
  - Implement novel algorithm or data structure
  - Optimize performance bottleneck
  - Design and implement new architectural pattern
  - Create complex state machine
  - Build new workflow automation system

**When to use**: Novel features where human writes core logic, AI assists with tests/docs.

---

### Zone 4: ai-limited (9-10)

**Human-owned**

#### Classification 9: Human Primary
- **Characteristics**: Security-critical, breaking changes, expert knowledge required
- **Human Role**: Design and implement, AI provides context (8-16 hours)
- **Examples**:
  - Implement authentication/authorization system
  - Make breaking API changes with migration path
  - Fix critical security vulnerability
  - Design database schema for new feature
  - Implement complex business logic with edge cases

**When to use**: Critical changes requiring expert judgment and deep domain knowledge.

---

#### Classification 10: Human-Only
- **Characteristics**: Architectural redesign, novel algorithms, strategic decisions
- **Human Role**: Full ownership, AI for research/reference only (16+ hours)
- **Examples**:
  - Redesign system architecture
  - Invent novel algorithm for unique problem
  - Make strategic technical decisions (language, framework)
  - Design roadmap automation platform
  - Create foundational abstractions

**When to use**: Strategic work requiring expert vision and judgment.

---

## Sizing Constraints

### Epic Constraints

Epics have structural and effort limits to prevent overwhelming `/autopilot` processing and maintain code quality:

| Constraint | Limit | Notes |
|---|---|---|
| **Minimum issues** | 2 | 1 issue stays as standalone, 2+ justifies epic |
| **Maximum issues** | 25 | Can increase to ~50 if needed; 25 is current practical limit |
| **Maximum effort** | 40 hours | Can increase to 50 if needed; prevents context window exhaustion |
| **Variance tolerance** | ±5% | Up to 26 issues (25 × 1.05 = 26.25) or 42 hours allowed before hard block |

**Why these limits?**
- Context window exhaustion: `/autopilot` processes better with focused scopes
- Implementation quality: Smaller epics = easier to review and integrate
- Token costs: Larger epics consume more tokens per run
- Error reduction: Smaller scopes = fewer edge cases to miss
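
A sizing check along these lines might look like the sketch below. The limits come from the table above. The `checkEpicSize` function, the `EpicStatus` values, and the choice to treat the band between a limit and limit + 5% as a warning rather than a block are assumptions made for illustration.

```typescript
// Limits from the Epic Constraints table. The names here are hypothetical.
const EPIC_LIMITS = {
  minIssues: 2,
  maxIssues: 25,
  maxHours: 40,
  tolerancePercent: 5,
};

type EpicStatus = "standalone" | "ok" | "warn" | "block";

// Apply the tolerance with integer arithmetic to avoid floating-point drift.
function withTolerance(limit: number): number {
  return (limit * (100 + EPIC_LIMITS.tolerancePercent)) / 100;
}

function checkEpicSize(issueCount: number, totalHours: number): EpicStatus {
  // Fewer than 2 issues does not justify an epic.
  if (issueCount < EPIC_LIMITS.minIssues) return "standalone";

  // Beyond limit + 5%: hard block.
  if (
    issueCount > withTolerance(EPIC_LIMITS.maxIssues) ||
    totalHours > withTolerance(EPIC_LIMITS.maxHours)
  ) {
    return "block";
  }

  // Assumption: over the limit but within tolerance is a warning only.
  if (issueCount > EPIC_LIMITS.maxIssues || totalHours > EPIC_LIMITS.maxHours) {
    return "warn";
  }
  return "ok";
}
```

Under these assumptions an epic with 26 issues is allowed with a warning, while 27 issues or 43 hours is blocked.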

### Issue Constraints

Individual issues have classification-based effort limits to catch misclassification and maintain manageability:

| Classification Zone | Classification | Max Hours | Rationale |
|---|---|---|---|
| **ai-solo** | 1-3 | **4 hours** | Fully automated work; >4 hrs suggests misclassification |
| **ai-led** | 4-6 | **12 hours** | Manual testing required; ~1-1.5 day chunks |
| **ai-assisted** | 7-8 | **16 hours** | Human-designed work; 2-day max for focus |
| **ai-limited** | 9-10 | **24 hours** | Expert-owned; larger acceptable for strategic work |

**General Limits (across all classifications):**
- **Minimum**: 2 hours per issue (suggest bundling if < 2 hours)
- **Maximum**: 14 hours per issue (hard limit; can increase to 16 if needed)
- **Variance tolerance**: ±5% on all limits

**Why classification-based limits?**
1. **Catches misclassification**: A Classification 1 issue at 8 hours should trigger review
2. **Protects expert time**: Forces large issues to be expert-owned (9-10) where capacity is tracked
3. **Enforces zone principles**: ai-solo should be quick; ai-limited can be substantial
4. **Provides guidance**: Developers see recommendations early, before getting blocked
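
The per-zone limits could be enforced with a lookup like the one below. This is a sketch only: `ZONE_MAX_HOURS`, `maxHoursFor`, and `reviewEffort` are hypothetical names, and the message wording is illustrative. The hour limits are the ones in the table above.

```typescript
// Max hours per zone, from the Issue Constraints table.
// Field and function names are hypothetical.
const ZONE_MAX_HOURS = [
  { zone: "ai-solo", maxClassification: 3, maxHours: 4 },
  { zone: "ai-led", maxClassification: 6, maxHours: 12 },
  { zone: "ai-assisted", maxClassification: 8, maxHours: 16 },
  { zone: "ai-limited", maxClassification: 10, maxHours: 24 },
];

// Return the effort ceiling for a 1-10 classification.
function maxHoursFor(classification: number): number {
  if (classification >= 1) {
    for (const row of ZONE_MAX_HOURS) {
      if (classification <= row.maxClassification) return row.maxHours;
    }
  }
  throw new RangeError("classification must be between 1 and 10");
}

// Return a review message when the estimate exceeds the zone limit,
// or null when the estimate fits.
function reviewEffort(classification: number, estimatedHours: number): string | null {
  const limit = maxHoursFor(classification);
  if (estimatedHours <= limit) return null;
  return (
    `Classification ${classification} estimated at ${estimatedHours}h ` +
    `exceeds the ${limit}h zone limit: split or reclassify`
  );
}
```

With this check, the Classification 1 issue estimated at 8 hours produces a review message, while a Classification 9 issue at 20 hours passes.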

### Bundling Recommendation

Issues estimated at under 2 hours should typically be bundled with related work. For example, "Add log statement" (15 min) + "Update error message" (20 min) → 1 issue (35 min). Bundling has these benefits:
- Creates fewer GitHub issues
- Reduces notification noise
- Faster review cycles
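
As a rough sketch of the bundling rule, the helper below merges a set of small tasks into one issue. The `SmallTask` shape and the `bundle` function are hypothetical, and deciding which tasks count as "related" is left to the caller.

```typescript
// Hypothetical shape for a small piece of work.
interface SmallTask {
  title: string;
  minutes: number;
}

const MIN_ISSUE_MINUTES = 120; // 2-hour minimum per issue

// Combine tasks into one issue when every task is under the 2-hour minimum.
// Returns null when there is nothing to bundle.
function bundle(tasks: SmallTask[]): SmallTask | null {
  if (tasks.length < 2) return null;
  let total = 0;
  const titles: string[] = [];
  for (const task of tasks) {
    // A task at or above the minimum is large enough to stand alone.
    if (task.minutes >= MIN_ISSUE_MINUTES) return null;
    total += task.minutes;
    titles.push(task.title);
  }
  return { title: titles.join(" + "), minutes: total };
}
```

Passing the two tasks from the example above yields a single issue titled "Add log statement + Update error message" with a 35-minute estimate.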

### When to Split/Increase

**Split an issue if:**
- Exceeds max hours for its classification
- Combines different classification levels (e.g., "Fix bug (4) + Refactor system (7)")
- Has independent work streams that could be done in parallel
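
The mixed-level rule can be checked mechanically. The sketch below groups the parts of a proposed issue by classification level. `WorkPart` and `splitByClassification` are hypothetical names, and the per-part classifications are assumed to be supplied by whoever drafted the issue.

```typescript
// Hypothetical shape for one part of a proposed issue.
interface WorkPart {
  description: string;
  classification: number; // 1-10
}

// Group parts by classification level, lowest level first.
// More than one group means the issue mixes levels and should be split.
function splitByClassification(parts: WorkPart[]): WorkPart[][] {
  const groups: { [level: number]: WorkPart[] } = {};
  for (const part of parts) {
    const existing = groups[part.classification];
    if (existing) {
      existing.push(part);
    } else {
      groups[part.classification] = [part];
    }
  }
  const result: WorkPart[][] = [];
  for (let level = 1; level <= 10; level++) {
    const group = groups[level];
    if (group) result.push(group);
  }
  return result;
}
```

For "Fix bug (4) + Refactor system (7)" this returns two groups, one per proposed issue.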

**Increase limits if:**
- Expert judgment indicates larger scope is necessary
- Issue is architectural/foundational (classification 9-10)
- Requires contiguous focus to avoid context switching

When increasing a limit, document the justification in the issue body for the audit trail.

---

## Decision Tree

Use this flowchart to classify issues:

```
START: What are you building?
│
├─ Is it mechanical/automated? (linting, renaming, deletion)
│  └─ YES → Classification 1
│
├─ Is it text/documentation only?
│  └─ YES → Classification 2
│
├─ Is it a simple code addition following existing patterns?
│  └─ YES → Classification 3
│
├─ Does it require manual testing?
│  ├─ Simple bug fix or isolated feature?
│  │  └─ YES → Classification 4
│  ├─ API work or multi-path feature?
│  │  └─ YES → Classification 5
│  └─ Database changes or integration testing needed?
│     └─ YES → Classification 6
│
├─ Does it require human design decisions?
│  ├─ Refactoring with dependencies or detection logic?
│  │  └─ YES → Classification 7
│  └─ Novel feature or performance optimization?
│     └─ YES → Classification 8
│
└─ Is it security-critical, breaking, or architectural?
   ├─ Breaking changes or security fixes?
   │  └─ YES → Classification 9
   └─ Architectural redesign or novel algorithms?
      └─ YES → Classification 10
```
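
The flowchart can also be read as an ordered series of checks. The sketch below assumes the branches are evaluated top to bottom and that the first match wins, which is one plausible reading of the tree. The `IssueTraits` fields and the `classify` function are hypothetical.

```typescript
// Answers to the decision-tree questions. Field names are hypothetical.
interface IssueTraits {
  mechanical?: boolean; // linting, renaming, deletion
  docsOnly?: boolean; // text/documentation only
  followsExistingPattern?: boolean; // simple code addition
  manualTesting?: "simple-fix" | "api-or-multipath" | "database-or-integration";
  humanDesign?: "refactor-or-detection" | "novel-or-performance";
  critical?: "breaking-or-security" | "architecture-or-novel-algorithm";
}

// Walk the tree in the order it is drawn. Returns null when no branch matches.
function classify(traits: IssueTraits): number | null {
  if (traits.mechanical) return 1;
  if (traits.docsOnly) return 2;
  if (traits.followsExistingPattern) return 3;

  if (traits.manualTesting === "simple-fix") return 4;
  if (traits.manualTesting === "api-or-multipath") return 5;
  if (traits.manualTesting === "database-or-integration") return 6;

  if (traits.humanDesign === "refactor-or-detection") return 7;
  if (traits.humanDesign === "novel-or-performance") return 8;

  if (traits.critical === "breaking-or-security") return 9;
  if (traits.critical === "architecture-or-novel-algorithm") return 10;

  return null;
}
```

A `null` result signals that none of the questions applied and the issue needs a manual decision.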

## Real-World Examples from Roadcrew

### Classification 1 Examples

1. **Run ESLint auto-fix on all TypeScript files**
   - Why: Fully automated, zero decision-making
   - Zone: ai-solo

2. **Delete deprecated scripts/old-template.ts file**
   - Why: Mechanical file deletion per documented plan
   - Zone: ai-solo

3. **Update package.json Node version from 20 to 22**
   - Why: Simple version bump, well-defined
   - Zone: ai-solo

### Classification 2 Examples

1. **Fix typos in README.md**
   - Why: Text-only change, quick inspection
   - Zone: ai-solo

2. **Update COMMANDS.md with new /analyze-epic usage**
   - Why: Documentation update, minimal context
   - Zone: ai-solo

3. **Change placeholder text in epic.template.md**
   - Why: Template text update, straightforward
   - Zone: ai-solo

### Classification 3 Examples

1. **Add console.log statements to debug workflow**
   - Why: Simple logging, follows pattern
   - Zone: ai-solo

2. **Create formatIssueTitle() helper function**
   - Why: Simple utility, established pattern
   - Zone: ai-solo

3. **Update error messages in scope-release for clarity**
   - Why: Text improvements, light code review
   - Zone: ai-solo

### Classification 4 Examples

1. **Fix off-by-one error in issue numbering**
   - Why: Simple bug fix, needs manual test
   - Zone: ai-led

2. **Add validation for missing GitHub token**
   - Why: Isolated feature, manual verification
   - Zone: ai-led

3. **Implement --dry-run flag for scope-release**
   - Why: Simple feature, test scenarios
   - Zone: ai-led

### Classification 5 Examples

1. **Add GitHub API endpoint to fetch epic child issues**
   - Why: API integration, multiple scenarios
   - Zone: ai-led

2. **Implement /sync-roadmap command (5 acceptance criteria)**
   - Why: Multi-path feature, scenario testing
   - Zone: ai-led

3. **Refactor template parser (3 files affected)**
   - Why: Moderate refactoring, cross-file changes
   - Zone: ai-led

### Classification 6 Examples

1. **Add Prisma migration for new ClassificationLevel field**
   - Why: Database schema change, integration test
   - Zone: ai-led

2. **Update CI workflow to run classification validation**
   - Why: CI/CD modification, system integration
   - Zone: ai-led

3. **Modify template generation to support classification**
   - Why: Workflow change, multi-system impact
   - Zone: ai-led

### Classification 7 Examples

1. **Refactor issue-classification.ts with new algorithm**
   - Why: Core module, multiple dependents
   - Zone: ai-assisted

2. **Update deployment detection with Azure heuristics**
   - Why: Detection logic, design decisions
   - Zone: ai-assisted

3. **Implement TOC sequencing for epic dependencies**
   - Why: Complex algorithm, architectural decision
   - Zone: ai-assisted

### Classification 8 Examples

1. **Optimize template parsing for 10x performance**
   - Why: Performance critical, novel optimization
   - Zone: ai-assisted

2. **Design and implement classification validation system**
   - Why: New system, architectural design
   - Zone: ai-assisted

3. **Create workflow automation platform (roadcrew core)**
   - Why: Novel feature, complex state management
   - Zone: ai-assisted

### Classification 9 Examples

1. **Implement GitHub token security validation**
   - Why: Security-critical feature
   - Zone: ai-limited

2. **Make breaking changes to template variable syntax**
   - Why: Breaking change, migration path needed
   - Zone: ai-limited

3. **Fix critical bug in release versioning logic**
   - Why: Critical fix, expert knowledge required
   - Zone: ai-limited

### Classification 10 Examples

1. **Redesign roadcrew architecture for multi-repo support**
   - Why: Architectural redesign, strategic decision
   - Zone: ai-limited

2. **Invent novel algorithm for automatic epic dependency detection**
   - Why: Novel algorithm, expert-level work
   - Zone: ai-limited

3. **Design classification system itself (this system)**
   - Why: Foundational design, strategic vision
   - Zone: ai-limited

## Scoring Guidelines

### Classification Factors

When assigning classification, consider:

1. **Complexity**: How complex is the logic?
2. **Risk**: What's the impact of failure?
3. **Novelty**: Is this well-established or new?
4. **Dependencies**: How many systems/files affected?
5. **Testing**: How much testing is required?
6. **Reversibility**: How easy to rollback?

### Common Mistakes

❌ **Don't classify based on time alone**
- A 1-hour security fix is still classification 9
- A 4-hour documentation update is still classification 2

❌ **Don't mix classification zones in one issue**
- Split issues if parts have different classifications
- Example: "Add field (classification 3) + optimize query (classification 8)" → Split into 2 issues

❌ **Don't underestimate integration complexity**
- Multi-file changes often need higher classification
- Database changes usually classification 6+

✅ **Do consider expert capacity**
- If issue ties up expert for days → likely classification 8-10
- If junior can handle with light review → likely classification 4-6

## Validation

Use this checklist to validate your classification:

### For ai-solo (1-3)
- [ ] Can AI complete this autonomously?
- [ ] Is human review ≤ 10 minutes?
- [ ] Are there established patterns to follow?
- [ ] Is the risk of failure low?

### For ai-led (4-6)
- [ ] Does AI need human validation?
- [ ] Is testing required (manual or automated)?
- [ ] Are there multiple code paths or edge cases?
- [ ] Is the impact contained to ≤ 7 files?

### For ai-assisted (7-8)
- [ ] Does human need to design the approach?
- [ ] Is this novel or performance-critical?
- [ ] Are there complex dependencies?
- [ ] Is expert judgment required?

### For ai-limited (9-10)
- [ ] Is this security-critical or breaking?
- [ ] Does it require architectural decisions?
- [ ] Is deep domain expertise necessary?
- [ ] Would failure have severe consequences?

## Assignment Rules

Based on `config/assignment-rules.yml`:

```yaml
assignments:
  high-risk: "delphimon"       # Classification 7-10
  medium-risk: "delphimon"     # Classification 7-10
  low-risk: "samuelhenry"      # Classification 1-6
```

**Updated for classification system:**

```yaml
assignments:
  ai-solo: "samuelhenry"       # Classification 1-3 (AI + junior)
  ai-led: "samuelhenry"        # Classification 4-6 (junior + AI)
  ai-assisted: "delphimon"     # Classification 7-8 (senior + AI)
  ai-limited: "delphimon"      # Classification 9-10 (senior only)
```

## FAQs

### Q: What if an issue spans multiple classification levels?

**A:** Split the issue into multiple smaller issues, each with a single classification level.

Example: "Implement auth system (9) + update docs (2)" → Create 2 issues

---

### Q: Should I round up or down when uncertain?

**A:** Round **up** for safety. It's better to assign a senior dev to a classification 6 issue than assign a junior dev to a classification 7 issue.

---

### Q: Can classification change during implementation?

**A:** Yes! If you discover the issue is more complex than estimated, update the classification and reassign if needed.

---

### Q: How do I classify bug fixes?

**A:** Consider the **complexity of the fix**, not the severity of the bug:
- Simple null check fix: Classification 4
- Security vulnerability fix: Classification 9
- Performance regression: Classification 8

---

### Q: What about spikes or research tasks?

**A:** Research is typically classification 7-10 because it requires expert judgment. Create a time-boxed spike issue with clear acceptance criteria.

---

## TaskProfile Dimensions

> **New in v2.0:** The classification system now uses objective dimension-based scoring instead of subjective 1-10 estimates. This section describes the four dimensions that compose a TaskProfile.

The TaskProfile system replaces single-score classification with four objective dimensions. Each dimension is scored 1-10, and these scores are combined using a weighted formula to derive the final classification (1-10) and zone (ai-solo, ai-led, ai-assisted, ai-limited).

### Determinism

**Measures:** Whether a task is rule-based or requires creative judgment (a higher score means less deterministic)

**Scale:**
- **1 (Mechanical)**: Completely rule-based, zero ambiguity
  - Example: "Add import statement for existing module"
  - Example: "Find and replace exact string pattern"
  - Example: "Run linter auto-fix on all files"
  
- **5 (Moderate)**: Some interpretation needed, clear patterns exist
  - Example: "Refactor function for clarity following established patterns"
  - Example: "Add validation following existing validation patterns"
  - Example: "Update API endpoint using existing route structure"
  
- **10 (Creative Judgment)**: Requires creative problem-solving, no clear rules
  - Example: "Design new architecture from scratch"
  - Example: "Invent novel algorithm for unique problem"
  - Example: "Make strategic framework choice with no precedent"

**Formula Weight:** 30% (highest weight - determinism is the strongest predictor of AI capability)

---

### Context Breadth

**Measures:** How many files/systems a task touches

**Scale:**
- **1 (Single File)**: Change affects one file only
  - Example: "Fix typo in README.md"
  - Example: "Add helper function to utils.ts"
  - Example: "Update error message in single file"
  
- **5 (Multiple Related Files)**: Changes span related files/modules
  - Example: "Update API endpoint + tests + documentation"
  - Example: "Refactor utility module and all its consumers (3-5 files)"
  - Example: "Add feature requiring model + controller + tests"
  
- **10 (System-Wide)**: Changes affect entire system architecture
  - Example: "Migrate entire database schema across all tables"
  - Example: "Refactor authentication system affecting all routes"
  - Example: "Change build system affecting all packages"

**Formula Weight:** 25%

---

### Verification Cost

**Measures:** How expensive or time-consuming testing and verification are

**Scale:**
- **1 (Instant Feedback)**: Immediate verification available
  - Example: "TypeScript compilation passes"
  - Example: "Linter passes with no errors"
  - Example: "Unit test suite runs in < 5 seconds"
  
- **5 (Manual Testing Required)**: Requires human verification or staging deployment
  - Example: "Manual QA testing in staging environment"
  - Example: "Integration test suite takes 10-30 minutes"
  - Example: "Requires deploying to test environment for validation"
  
- **10 (Production-Only Verification)**: Can only be verified in production, high cost
  - Example: "A/B test in production for 1 week"
  - Example: "Performance impact only visible under production load"
  - Example: "Security audit requires external penetration testing"

**Formula Weight:** 25%

---

### Domain Knowledge

**Measures:** How specialized the expertise required is

**Scale:**
- **1 (Generic Programming)**: Standard programming knowledge suffices
  - Example: "Add standard REST API endpoint"
  - Example: "Update logging statements"
  - Example: "Fix common JavaScript syntax error"
  
- **5 (Framework-Specific)**: Requires knowledge of specific frameworks/tools
  - Example: "Optimize React component using hooks correctly"
  - Example: "Fix Jest test configuration for ES modules"
  - Example: "Implement GitHub Actions workflow with proper secrets"
  
- **10 (Domain Expertise)**: Requires deep specialized knowledge
  - Example: "Design HIPAA-compliant data pipeline"
  - Example: "Implement financial compliance system (PCI-DSS)"
  - Example: "Optimize distributed system for high-frequency trading"

**Formula Weight:** 20% (lowest weight - domain knowledge can be learned/researched)

---

### Classification Formula

The four dimensions are combined using this weighted formula:

```
Classification = (Determinism × 0.30) + 
                 (Context Breadth × 0.25) + 
                 (Verification Cost × 0.25) + 
                 (Domain Knowledge × 0.20)
```

The result is rounded to the nearest integer (1-10), then mapped to zones:
- **1-3**: ai-solo
- **4-6**: ai-led  
- **7-8**: ai-assisted
- **9-10**: ai-limited

### Example Profile Calculation

**Task:** "Add new GitHub API endpoint with authentication"

- Determinism: 5 (follows existing API patterns)
- Context Breadth: 5 (endpoint + tests + docs)
- Verification Cost: 5 (manual testing in staging)
- Domain Knowledge: 4 (GitHub API knowledge needed)

**Calculation:**
```
(5 × 0.30) + (5 × 0.25) + (5 × 0.25) + (4 × 0.20)
= 1.5 + 1.25 + 1.25 + 0.8
= 4.8
→ Classification: 5 (ai-led zone)
```
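
The formula and the worked example above can be reproduced with a short TypeScript sketch. The weights are the ones stated in this section. The `TaskProfile` field names, `weightedScore`, and `classificationFor` are assumptions for illustration and may not match the actual implementation. Note that `Math.round` rounds halves up, which is one interpretation of "nearest integer".

```typescript
// The four dimensions, each scored 1-10. Field names are hypothetical.
interface TaskProfile {
  determinism: number;
  contextBreadth: number;
  verificationCost: number;
  domainKnowledge: number;
}

// Weights from the Classification Formula (they sum to 1.0).
const WEIGHTS: TaskProfile = {
  determinism: 0.3,
  contextBreadth: 0.25,
  verificationCost: 0.25,
  domainKnowledge: 0.2,
};

// Weighted sum before rounding.
function weightedScore(p: TaskProfile): number {
  return (
    p.determinism * WEIGHTS.determinism +
    p.contextBreadth * WEIGHTS.contextBreadth +
    p.verificationCost * WEIGHTS.verificationCost +
    p.domainKnowledge * WEIGHTS.domainKnowledge
  );
}

// Final classification: the weighted sum rounded to the nearest integer.
function classificationFor(p: TaskProfile): number {
  const scores = [p.determinism, p.contextBreadth, p.verificationCost, p.domainKnowledge];
  for (const score of scores) {
    // Also rejects NaN, since every comparison with NaN is false.
    if (!(score >= 1 && score <= 10)) {
      throw new RangeError("each dimension must be scored from 1 to 10");
    }
  }
  return Math.round(weightedScore(p));
}

// The "Add new GitHub API endpoint with authentication" profile from above.
const exampleProfile: TaskProfile = {
  determinism: 5,
  contextBreadth: 5,
  verificationCost: 5,
  domainKnowledge: 4,
};
const exampleClassification = classificationFor(exampleProfile);
```

For the example profile the weighted sum is approximately 4.8, so `exampleClassification` is 5, matching the ai-led result in the calculation above.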

---

## Related Documentation

- [Objective Task Complexity Framework Spec](../specs/complexity-spec.md) - Complete technical specification for TaskProfile system
- [AI Issue Classification Spec](../memory-bank/requirements/specs/ai-issue-classification.md) - Legacy classification system (still supported)
- [Current Release](../memory-bank/releases/current-release.md) - Classification system implementation plan
- [Vision PRD](../memory-bank/activeContext.md) (legacy: `context/vision.md`) - Product context for classification
- [COMMANDS.md](COMMANDS.md) - /analyze-epic and /scope-release commands

---

**Last Updated:** 2025-11-01  
**Version:** 2.0.0  
**Epic:** #1015 - Objective Task Complexity Framework