---
roadcrew_template_name: "minimize-cost.md"
roadcrew_template_type: "command"
roadcrew_template_version: "v1.0"
roadcrew_last_updated: "2025-10-25"
roadcrew_min_version: "1.5.0"
roadcrew_license: "See LICENSE file in .roadcrew folder"
roadcrew_copyright: "Copyright (c) 2025 North Star Holdings, LLC"
spdx_license_identifier: "LicenseRef-RoadcrewLicense-1.0"
execution_mode: "auto-execute"
---

# Cursor Command: minimize-cost

**Temperature: 0.3-0.5** (Balanced creativity and precision)

> **Execution:** Runs immediately. No approval needed between analysis phases.

## Purpose
Analyze API usage, AI communication patterns, CI/CD consumption, and resource usage to minimize costs in roadcrew workflows. The goal is to maximize functionality while staying within free tiers through smart optimization of GitHub API calls, GitHub Actions minutes, and Claude token usage.

## Activation
User says: "minimize cost" or "optimize API usage" or "stay within free tier" or "reduce verbosity"

## Analysis Framework

### 1. Map All Resource Usage Points
Identify every point where the system consumes:
- **GitHub API calls** (issue creation, PR management, milestone queries)
- **GitHub Actions minutes** (CI/CD workflows, automated testing)
- **Claude/AI tokens** (input tokens from user/context, output tokens in responses)
- **Google Drive API calls** (feedback processing, file uploads)
- **Compute time** (script execution, TypeScript compilation)
- **Network bandwidth** (git operations, npm installs)

### 2. Identify Free Tier Limits

**GitHub API (Authenticated)**
- 5,000 requests per hour
- Rate limit resets hourly
- Secondary rate limits also apply (e.g., caps on concurrent requests and on rapid content creation such as issues and comments)
- Includes: issues, PRs, comments, milestones, labels

**GitHub Actions (Free Tier)**
- Public repos: Unlimited minutes
- Private repos: 2,000 minutes/month
- Storage: 500 MB
- Minutes multiplier: Linux (1×), Windows (2×), macOS (10×)
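
The multiplier arithmetic above can be sketched as a small helper. This is an illustration only: `billedMinutes` and `MINUTE_MULTIPLIER` are hypothetical names, the multipliers are the ones listed above (treat them as a snapshot), and rounding partial minutes up is an assumption of the sketch.

```typescript
// Runner multipliers as listed above (hypothetical constant for illustration).
type RunnerOs = "linux" | "windows" | "macos";

const MINUTE_MULTIPLIER: Record<RunnerOs, number> = {
  linux: 1,
  windows: 2,
  macos: 10,
};

// Convert wall-clock job minutes into minutes counted against the quota.
// Assumption: partial minutes are rounded up before the multiplier applies.
function billedMinutes(wallClockMinutes: number, os: RunnerOs): number {
  return Math.ceil(wallClockMinutes) * MINUTE_MULTIPLIER[os];
}

console.log(billedMinutes(10, "linux")); // 10
console.log(billedMinutes(10, "macos")); // 100
```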

**Claude API / Cursor (Token-Based)**
- Input tokens: Context sent to AI (files, chat history)
- Output tokens: AI responses (charged at higher rate ~3-5× input)
- Verbose responses use significantly more output tokens
- Long context windows increase input token costs

**Google Drive API**
- 1 billion requests per day (practically unlimited)
- 10 requests per second per user
- Batch operations: 100 requests per batch call

### 3. Classify Usage Patterns

**A. Essential (Cannot Reduce)**
- User-requested issue creation
- CI/CD for actual deployments
- Security/compliance requirements
- Critical API operations

**B. Optimizable (Can Reduce Calls)**
- Redundant GitHub API calls (checking same issue multiple times)
- Verbose AI responses (excessive detail in explanations)
- Over-verbose logging and summaries
- Inefficient batch operations (creating issues one-by-one vs batching)
- Repeated file reads during script execution

**C. Eliminable (Can Avoid Entirely)**
- Duplicate API checks (caching issue states)
- Unnecessary CI/CD runs (skip on docs-only changes)
- Redundant explanations in AI responses
- Debug logging in production scripts
- Unnecessary TypeScript rebuilds (already handled by ensure-built.js)

### 4. Optimization Strategies

#### Strategy 1: Optimize Claude/AI Communication (Reduce Token Usage)
**Pattern**: Request concise responses, avoid redundant explanations
```
BEFORE: 
  User: "Create a release"
  AI: "I'll help you create a release. First, let me explain what 
       a release is and why it's important. Releases help organize
       work into milestones... [300 tokens of explanation]
       
       Now, let me walk through each step:
       1. First I'll read the current-release.md file
       2. Then I'll parse it to extract epics
       3. Next I'll validate the structure...
       [200 tokens explaining every step]
       
       ✅ Created 5 epics, 23 issues
       
       Let me summarize what we accomplished:
       - We created 5 epics for the release...
       [200 tokens of summary]"
  
  Total: ~700 output tokens

AFTER:
  User: "Create a release"
  AI: "Creating release from current-release.md...
       
       ✅ Created 5 epics, 23 issues
       🔗 View milestone: https://github.com/owner/repo/milestone/12
       
       Next steps:
       - Review issues on GitHub
       - Run npm run validate-release"
  
  Total: ~100 output tokens (85% reduction)
```

**Implementation Guidelines:**
- **Skip explanatory preambles** - Don't explain what you're about to do
- **No play-by-play narration** - Just do it, report results
- **Concise summaries** - Bullet points, not paragraphs
- **Use emojis as visual markers** - Replace verbose labels
- **Skip redundant confirmations** - "I understand" / "Sure!" → just act
- **Avoid restating the ask** - Don't repeat what user just said

**Example transformation:**
```
❌ VERBOSE (150 tokens):
"I understand you want to create GitHub issues from the spec. 
Let me help you with that. I'll start by reading the spec file
to understand its contents, then I'll parse the structure to
identify the epics and issues we need to create. After that,
I'll use the GitHub API to create them one by one."

✅ CONCISE (15 tokens):
"Reading spec and creating issues..."
[performs actions]
"✅ Created 3 epics, 12 issues"
```

**Key elements:**
- Action-first communication (do → report, don't announce → do)
- Terminal-style output (concise status lines)
- Links over explanations (show, don't tell)
- Results-focused (what was accomplished, not how)

#### Strategy 2: Batch GitHub API Operations
**Pattern**: Use GraphQL or batch multiple operations when possible
```
BEFORE (REST API - Multiple Calls): 
  For each epic:
    - Create epic issue (1 API call)
    - For each child issue:
      - Create issue (1 API call per issue)
      - Add comment with epic link (1 API call per issue)
  
  5 epics × 5 issues each = 55 API calls
  (5 epic + 25 issues + 25 comments)

AFTER (Optimized):
  - Create epic (1 API call)
  - Create issue with epic reference in body (1 API call)
    (No separate comment needed - epic link in body)
  
  5 epics + 25 issues = 30 API calls (45% reduction)
```

**Roadcrew Implementation:**
```typescript
// CURRENT: Create issue, then add comment linking to epic
const { data: epicIssue } = await octokit.issues.create({...}); // 1 call
const { data: childIssue } = await octokit.issues.create({...}); // 1 call
await octokit.issues.createComment({
  body: `Part of epic #${epicIssue.number}`
}); // 1 call = 3 calls total

// OPTIMIZED: Include epic link in issue body
// (responses wrap the issue in `data`, so destructure it to read `number`)
const { data: epicIssue } = await octokit.issues.create({...}); // 1 call
const { data: childIssue } = await octokit.issues.create({
  body: `Part of epic #${epicIssue.number}\n\n${issueBody}`
}); // 1 call = 2 calls total (33% reduction)
```

**Key elements:**
- Embed references in issue/PR bodies (not separate comments)
- Use GraphQL for multi-resource queries (when applicable)
- Cache commonly-accessed data (repo info, milestones)
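
The call counts in the pattern above can be reproduced with a short calculation. This is a sketch for estimating, not part of roadcrew: `ReleasePlan`, `callsWithComments`, and `callsWithEmbeddedRefs` are hypothetical names.

```typescript
interface ReleasePlan {
  epics: number;
  issuesPerEpic: number;
}

// One call per epic, one per child issue, plus one comment call per child issue.
function callsWithComments(plan: ReleasePlan): number {
  const issues = plan.epics * plan.issuesPerEpic;
  return plan.epics + issues + issues;
}

// One call per epic and one per child issue; the epic link lives in the body.
function callsWithEmbeddedRefs(plan: ReleasePlan): number {
  return plan.epics + plan.epics * plan.issuesPerEpic;
}

// Whole-number percentage of calls saved.
function percentFewer(callsBefore: number, callsAfter: number): number {
  return Math.round(((callsBefore - callsAfter) / callsBefore) * 100);
}

const releasePlan: ReleasePlan = { epics: 5, issuesPerEpic: 5 };
const callsBefore = callsWithComments(releasePlan); // 55
const callsAfter = callsWithEmbeddedRefs(releasePlan); // 30
console.log(`${callsBefore} -> ${callsAfter} calls (${percentFewer(callsBefore, callsAfter)}% fewer)`);
```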

#### Strategy 3: Optimize Context Window Usage
**Pattern**: Minimize token usage in AI conversations without sacrificing quality
```
BEFORE (Large Context): 
  User: "Create issues from spec"
  AI receives in context:
  - Full spec file (5000 tokens)
  - All templates (3000 tokens)
  - Recent chat history (2000 tokens)
  - Open file contents (4000 tokens)
  Total input: 14,000 tokens

AFTER (Focused Context):
  User: "Create issues from spec" (attaches only spec file)
  AI receives in context:
  - Spec file (5000 tokens)
  - Relevant template (500 tokens)
  - Minimal chat history (500 tokens)
  Total input: 6,000 tokens (57% reduction)
```

**Implementation Guidelines:**
- **Attach specific files only** - Don't include entire codebase
- **Close unrelated files** - Reduce auto-included context
- **Clear chat history** - Start new conversations for new topics
- **Reference docs by name** - "Use epic.template.md" (let AI read if needed)

**Roadcrew Example:**
```
❌ WASTEFUL:
User has open:
- vision.md (10,000 tokens)
- 5 spec files (25,000 tokens)
- 3 release files (15,000 tokens)
- README.md (5,000 tokens)
All sent as context on every message = 55,000 input tokens

✅ EFFICIENT:
User closes unrelated files
Only keeps spec.md open (5,000 tokens)
Attaches epic.template.md when needed
= 5,000-6,000 input tokens (90% reduction)
```
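
A rough way to compare the two setups above is to total the estimated tokens for what is open versus what the task needs. The sketch below uses the figures from the example; `ContextItem`, `sumTokens`, and `keepRelevant` are hypothetical names, and the token counts are estimates rather than measurements.

```typescript
interface ContextItem {
  label: string;
  tokens: number; // estimated size, not a measured value
  relevant: boolean; // does the current task need this item?
}

function sumTokens(items: ContextItem[]): number {
  return items.reduce((total, item) => total + item.tokens, 0);
}

function keepRelevant(items: ContextItem[]): ContextItem[] {
  return items.filter((item) => item.relevant);
}

const openItems: ContextItem[] = [
  { label: "vision.md", tokens: 10000, relevant: false },
  { label: "spec.md", tokens: 5000, relevant: true },
  { label: "4 other spec files", tokens: 20000, relevant: false },
  { label: "3 release files", tokens: 15000, relevant: false },
  { label: "README.md", tokens: 5000, relevant: false },
];

const wastefulTokens = sumTokens(openItems); // 55000
const focusedTokens = sumTokens(keepRelevant(openItems)); // 5000
console.log(`${wastefulTokens} -> ${focusedTokens} input tokens per message`);
```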

#### Strategy 4: Cache GitHub API Responses
**Pattern**: Don't fetch the same data repeatedly
```
BEFORE: 
  Every script run:
  - Fetch repo info (1 API call)
  - Fetch milestone list (1 API call)
  - Fetch open issues (1-5 API calls for pagination)
  
  Run 5 scripts in a workflow = 15-35 API calls

AFTER:
  First script:
  - Fetch and cache repo info
  - Fetch and cache milestone list
  - Fetch and cache issues
  
  Subsequent scripts: Read from cache
  Run 5 scripts = 3-7 API calls (80% reduction)
```

**Roadcrew Implementation:**
```typescript
// Add caching layer for expensive GitHub operations
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes
const cache = new Map<string, {data: any, timestamp: number}>();

async function getCachedMilestones(octokit: Octokit) {
  const key = 'milestones';
  const cached = cache.get(key);
  
  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.data; // Use cached data
  }
  
  // Fetch fresh data
  const { data } = await octokit.issues.listMilestones({...}); // unwrap the response payload
  cache.set(key, {data, timestamp: Date.now()});
  return data;
}

// Use in all scripts that need milestone data
```

**Key elements:**
- Cache frequently-accessed resources (repo, milestones, labels)
- Use short TTL (5-10 minutes) to balance freshness
- Share cache across script runs in same workflow (persist it to disk or a workflow cache; an in-memory `Map` lives only for one process)
- Invalidate cache after create/update operations
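
The TTL and invalidation points above could be combined in one small helper. This is a minimal sketch under stated assumptions: `TtlCache` and `cachedFetch` are hypothetical names, the cache is in-memory only, and `fetcher` stands in for whatever GitHub call is being cached.

```typescript
// Hypothetical helper: a small TTL cache with explicit invalidation.
class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; storedAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if missing or older than the TTL.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }

  // Call after create/update operations so stale data is not served.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

// Fetch through the cache; `fetcher` is a placeholder for any API call.
async function cachedFetch<T>(
  cache: TtlCache<T>,
  key: string,
  fetcher: () => Promise<T>,
): Promise<T> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const fresh = await fetcher();
  cache.set(key, fresh);
  return fresh;
}
```

A script that creates a milestone would call `invalidate('milestones')` right after the create call, so the next read fetches fresh data.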

#### Strategy 5: Optimize GitHub Actions Usage
**Pattern**: Minimize CI/CD minutes consumption
```
BEFORE: 
  Every push triggers:
  - Lint (2 minutes)
  - Type check (3 minutes)
  - Build (5 minutes)
  - Deploy preview (10 minutes)
  Total: 20 minutes per push
  
  10 pushes/day = 200 minutes/day = 6000 minutes/month
  (Exceeds 2000 free minutes for private repos)

AFTER:
  Skip CI on docs-only changes:
  - Docs changes: No CI (0 minutes)
  - Code changes: Full CI (20 minutes)
  
  5 code pushes + 5 docs pushes = 100 minutes/day = 3000 minutes/month
  (Still above the 2000-minute limit, so more savings are needed)
  
  + Skip deploy preview on draft PRs
  + Cache node_modules (save 2-3 minutes per run)
  
  Optimized: ~1800 minutes/month ✅ (estimate; depends on how many runs skip the preview)
```

**Roadcrew Implementation:**
```yaml
# .github/workflows/ci.yml
name: CI
on:
  push:
    # Skip CI for docs-only changes
    paths-ignore:
      - 'docs/**'
      - '**.md'
      - 'templates/**'
      - 'context/**'
  pull_request:
    paths-ignore:
      - 'docs/**'
      - '**.md'

jobs:
  test:
    runs-on: ubuntu-latest # Linux: 1x minutes (cheapest)
    steps:
      - uses: actions/checkout@v4
      
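      # Cache dependencies (saves 2-3 min per run)
      - uses: actions/cache@v4
        id: npm-cache
        with:
          path: node_modules
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
      
      # Install dependencies only when the cache missed
      - if: steps.npm-cache.outputs.cache-hit != 'true'
        run: npm ci
      
      - run: npm run lint
      - run: npm run type-check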
```

**Key elements:**
- Use `paths-ignore` to skip docs/template changes
- Cache `node_modules` to speed up installs
- Use Linux runners (1× multiplier vs Windows 2×, macOS 10×)
- Skip preview deploys on draft PRs
- Combine jobs when possible (lint + typecheck in one job)
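
To check whether a workflow fits the monthly quota, the before/after arithmetic above can be put into a small projection. This is a sketch only: `CiProfile` and `projectMonthlyMinutes` are hypothetical names, and the 30-day month and 50% docs-only share are the assumptions used in the example.

```typescript
interface CiProfile {
  pushesPerDay: number;
  docsOnlyShare: number; // fraction of pushes skipped via paths-ignore (0..1)
  minutesPerRun: number; // billed minutes for one full CI run
  daysPerMonth: number;
}

// Projected billed minutes per month for pushes that still trigger CI.
function projectMonthlyMinutes(profile: CiProfile): number {
  const ciRunsPerDay = profile.pushesPerDay * (1 - profile.docsOnlyShare);
  return ciRunsPerDay * profile.minutesPerRun * profile.daysPerMonth;
}

const baseline: CiProfile = { pushesPerDay: 10, docsOnlyShare: 0, minutesPerRun: 20, daysPerMonth: 30 };
const withPathsIgnore: CiProfile = { ...baseline, docsOnlyShare: 0.5 };
const withCaching: CiProfile = { ...withPathsIgnore, minutesPerRun: 17 }; // 3 min saved per run

console.log(projectMonthlyMinutes(baseline)); // 6000
console.log(projectMonthlyMinutes(withPathsIgnore)); // 3000
console.log(projectMonthlyMinutes(withCaching)); // 2550
```

With these inputs, `paths-ignore` and caching alone still leave the total above 2,000 minutes, which is why skipping preview deploys on draft PRs also matters.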

#### Strategy 6: Monitor and Respect GitHub Rate Limits
**Pattern**: Check remaining quota before large operations
```
BEFORE: 
  Create 100 issues without checking rate limit
  Hit 5000/hour limit at issue 80
  Remaining 20 issues fail → Manual retry needed
  Wastes time, inconsistent state

AFTER:
  Check rate limit: remaining quota covers ~80 issues
  Process 80 issues (stay under limit)
  Wait for rate limit reset
  Process remaining 20
  = Graceful handling, no failures
```

**Roadcrew Implementation:**
```typescript
// Check rate limit before large batch operation
async function createIssuesWithRateLimit(epics: Epic[]) {
  const octokit = getOctokit();
  
  // Check remaining quota
  const { data: rateLimit } = await octokit.rateLimit.get();
  const remaining = rateLimit.rate.remaining;
  const resetTime = new Date(rateLimit.rate.reset * 1000);
  
  const callsPerEpic = 6; // ~6 calls per epic avg
  const buffer = 100; // reserve quota for other operations
  const estimatedCalls = epics.length * callsPerEpic;
  
  if (remaining < estimatedCalls + buffer) {
    console.warn(`⚠️  Rate limit: ${remaining} calls remaining`);
    console.log(`⏰ Resets at: ${resetTime.toLocaleString()}`);
    
    // Process only what fits now; never let the count go negative
    const safe = Math.max(0, Math.floor((remaining - buffer) / callsPerEpic));
    console.log(`📊 Can safely create ${safe} epics now`);
    await createEpics(epics.slice(0, safe));
    return epics.slice(safe); // caller retries these after resetTime
  }
  
  // Enough quota: create everything
  await createEpics(epics);
  return [];
}
```

**Key elements:**
- Check `octokit.rateLimit.get()` before bulk operations
- Reserve buffer (100-200 calls) for safety
- Show clear warnings when approaching limit
- Provide actionable guidance (wait time, batch size)
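
The batch-size and wait-time guidance above can be isolated into pure functions that are easy to test without calling the API. This is a sketch: `planBatch` and `msUntilReset` are hypothetical helpers, and the default buffer of 100 calls follows the reserve suggested above.

```typescript
interface BatchPlan {
  processNow: number; // items that fit in the current quota
  deferred: number; // items to process after the limit resets
}

// Decide how many items fit now, keeping `buffer` calls in reserve.
function planBatch(
  totalItems: number,
  callsPerItem: number,
  remaining: number,
  buffer: number = 100,
): BatchPlan {
  const usable = Math.max(0, remaining - buffer);
  const fits = Math.floor(usable / callsPerItem);
  const processNow = Math.min(totalItems, fits);
  return { processNow, deferred: totalItems - processNow };
}

// Milliseconds to wait until the reset time (given in epoch seconds).
function msUntilReset(resetEpochSeconds: number, nowMs: number): number {
  return Math.max(0, resetEpochSeconds * 1000 - nowMs);
}

console.log(planBatch(100, 6, 580)); // { processNow: 80, deferred: 20 }
console.log(planBatch(100, 6, 4200)); // { processNow: 100, deferred: 0 }
```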

#### Strategy 7: Reduce Unnecessary AI Explanations
**Pattern**: Skip verbose acknowledgments and summaries
```
BEFORE (Verbose Response): 
  "I understand you'd like me to analyze the spec. Let me break
   this down into steps. First, I'll read the spec file to understand
   its structure. Then I'll extract the key components. Finally, I'll
   provide my analysis. Let me start by reading the file..."
  
  [performs action]
  
  "Great! I've successfully read the spec file. It contains 3 epics
   and 15 issues. Now let me analyze each epic in detail. The first
   epic is about authentication, which includes OAuth2 support and
   session management..."
  
  [300+ tokens of detailed summary]
  
  Total: ~500 tokens

AFTER (Concise Response):
  [immediately reads and analyzes]
  
  "📊 Spec analysis:
   - 3 epics, 15 issues
   - Dependencies: Auth → Dashboard → API
   - Estimated effort: 2-3 weeks
   
   Ready to create GitHub issues?"
  
  Total: ~50 tokens (90% reduction)
```

**Communication Optimization Checklist:**
- ❌ Don't announce actions before taking them
- ❌ Don't restate what the user just asked
- ❌ Don't explain basic concepts unless asked
- ❌ Don't provide play-by-play narration
- ❌ Don't summarize what you just did
- ✅ Do take action immediately
- ✅ Do report results concisely
- ✅ Do use structured output (bullets, tables)
- ✅ Do provide links over prose descriptions

#### Strategy 8: Use Git Operations Instead of API When Possible
**Pattern**: Local git operations are free, API calls cost quota
```
BEFORE: 
  Use GitHub API to:
  - Check if file exists (1 API call)
  - Read file contents (1 API call)
  - List files in directory (1 API call)
  
  = 3 API calls

AFTER:
  Use local git operations:
  - git ls-files (local, free)
  - read from filesystem (local, free)
  - git show (local, free)
  
  = 0 API calls
```

**Roadcrew Implementation:**
```typescript
// BEFORE: Fetch file via API
const { data } = await octokit.repos.getContent({
  owner, repo, path: 'memory-bank/releases/current-release.md'
}); // 1 API call

// AFTER: Read from local git repo
import { readFileSync } from 'fs';
const content = readFileSync('memory-bank/releases/current-release.md', 'utf8'); 
// 0 API calls
```

**Key elements:**
- Read files locally instead of via GitHub API
- Use git commands for repo inspection
- Only use API for operations that modify GitHub state
- Reserve API quota for issue creation, PR management

### 5. Cost Calculation Framework

**Step 1: Calculate Current GitHub API Usage**
```
Typical Release Creation:
- List milestones: 1 call
- Create milestone: 1 call  
- Create 5 epics: 5 calls
- Create 25 issues: 25 calls
- Check for duplicates: 30 calls (pagination)
Total: ~62 API calls per release

GitHub Limit: 5000/hour
Usage: 62 ÷ 5000 = 1.2% of hourly quota ✅
```

**Step 2: Project Scaling**
```
If scaled to 10 releases/month:
Monthly API Calls: 10 × 62 = 620 calls
Hourly Average: Minimal (< 1% quota)

If scaled to 100 releases/month:
Monthly API Calls: 100 × 62 = 6,200 calls  
Peak usage (5 releases in 1 hour): 310 calls = 6% of quota ✅
```

**Step 3: Identify Cost Thresholds**
```
GitHub API (authenticated):
Safe Operating Zone: < 4000 calls/hour (80%)
Warning Zone: 4000-4500 calls/hour
Critical Zone: > 4500 calls/hour (risk hitting limit)

GitHub Actions (private repos):
Safe Operating Zone: < 1600 minutes/month (80% of 2000)
Warning Zone: 1600-1900 minutes/month
Critical Zone: > 1900 minutes/month
```
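
The zones above map directly onto a threshold check. The sketch below is illustrative: `classifyUsage`, `apiZone`, and `actionsZone` are hypothetical names, and the boundaries are the ones listed above (values on the upper boundary count as warning, anything above it as critical).

```typescript
type Zone = "safe" | "warning" | "critical";

// Classify a usage figure against a warning and a critical threshold.
function classifyUsage(used: number, warnAt: number, criticalAt: number): Zone {
  if (used > criticalAt) return "critical";
  if (used >= warnAt) return "warning";
  return "safe";
}

// GitHub API calls per hour: warning from 4000, critical above 4500.
const apiZone = (callsPerHour: number): Zone => classifyUsage(callsPerHour, 4000, 4500);

// Actions minutes per month (private repos): warning from 1600, critical above 1900.
const actionsZone = (minutesPerMonth: number): Zone => classifyUsage(minutesPerMonth, 1600, 1900);

console.log(apiZone(310)); // safe
console.log(actionsZone(1800)); // warning
console.log(actionsZone(6000)); // critical
```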

**Step 4: Calculate Token Usage (Claude/Cursor)**
```
BEFORE Optimization (Verbose):
Input tokens per request: 8,000 (context) + 500 (user message) = 8,500
Output tokens per request: 800 (verbose response)
Cost ratio: Output is ~3-5× input token cost

10 requests:
- Input: 85,000 tokens
- Output: 8,000 tokens (billed at 3-5× = ~24,000-40,000 effective)
- Total effective: ~109,000-125,000 token units

AFTER Optimization (Concise + Focused Context):
Input tokens per request: 3,000 (context) + 500 (user message) = 3,500
Output tokens per request: 150 (concise response)
Cost ratio: Same (3-5×)

10 requests:
- Input: 35,000 tokens (59% reduction)
- Output: 1,500 tokens (billed at 3-5× = ~4,500-7,500 effective) (81% reduction)
- Total effective: ~39,500-42,500 token units

Total savings: ~64-66% token cost reduction ✅
```
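
The effective-token arithmetic above can be recomputed for any workload with a short helper. This is a sketch under the same assumptions as the worked example: output tokens are weighted at 3× to 5× input, and `TokenUsage`, `effectiveTokenUnits`, and `savingsPercent` are hypothetical names.

```typescript
interface TokenUsage {
  inputPerRequest: number;
  outputPerRequest: number;
  requests: number;
}

// Weight output tokens by `outputMultiplier` to get comparable cost units.
function effectiveTokenUnits(usage: TokenUsage, outputMultiplier: number): number {
  return usage.requests * (usage.inputPerRequest + usage.outputPerRequest * outputMultiplier);
}

function savingsPercent(unitsBefore: number, unitsAfter: number): number {
  return (1 - unitsAfter / unitsBefore) * 100;
}

const verboseUsage: TokenUsage = { inputPerRequest: 8500, outputPerRequest: 800, requests: 10 };
const conciseUsage: TokenUsage = { inputPerRequest: 3500, outputPerRequest: 150, requests: 10 };

for (const multiplier of [3, 5]) {
  const unitsBefore = effectiveTokenUnits(verboseUsage, multiplier);
  const unitsAfter = effectiveTokenUnits(conciseUsage, multiplier);
  const saved = savingsPercent(unitsBefore, unitsAfter).toFixed(1);
  console.log(`${multiplier}x: ${unitsBefore} -> ${unitsAfter} units (${saved}% saved)`);
}
```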

### 6. Red Flags (Anti-Patterns)

❌ **Using GitHub API to read files that exist locally**
- Wastes API calls on data already available
- Solution: Read from filesystem, use API only for GitHub operations

❌ **Verbose AI responses with excessive explanations**
- Uses 5-10× more output tokens than necessary
- Solution: Action-first communication, concise summaries

❌ **Running CI/CD on every push (including docs)**
- Wastes Actions minutes on non-code changes
- Solution: Use `paths-ignore` to skip CI for docs/templates

❌ **Not caching GitHub API responses**
- Re-fetches same data repeatedly in workflows
- Solution: Cache milestones, repo info with 5-10 min TTL

❌ **Keeping unrelated files open in Cursor**
- Sends unnecessary context on every AI request
- Solution: Close files not relevant to current task

❌ **No rate limit checking before bulk operations**
- Hits limits mid-operation, creates inconsistent state
- Solution: Check `octokit.rateLimit.get()` before large batches

❌ **Using macOS runners for GitHub Actions**
- 10× minute multiplier (10 min = 100 billed minutes)
- Solution: Use Linux runners (1× multiplier) when possible

❌ **Explaining every step before/during/after action**
- "I'll do X... [does X]... I did X" pattern
- Solution: Just do X and report results

### 7. Success Metrics

**Before Optimization:**
- GitHub API calls per operation
- Token usage per AI conversation
- GitHub Actions minutes per month
- Average response verbosity (tokens)

**After Optimization:**
- ✅ GitHub API calls reduced by >40%
- ✅ Token usage reduced by >65%
- ✅ Actions minutes reduced by >50%
- ✅ Response verbosity reduced by >80%

**Roadcrew Examples:**

*Release Creation Workflow:*
```
BEFORE:
- 62 GitHub API calls per release
- 30 duplicate checks (unnecessary pagination)
- 5 separate comment operations

AFTER:
- 30 GitHub API calls per release (52% reduction)
- Cached repo data (skip repeated fetches)
- Embed references in bodies (skip comment API calls)
```

*AI Communication:*
```
BEFORE:
- 800 output tokens per response
- Lengthy explanations and summaries
- Redundant confirmations

AFTER:
- 150 output tokens per response (81% reduction)
- Action-first, concise summaries
- Direct task execution
```

*CI/CD Usage (Private Repo):*
```
BEFORE:
- 20 minutes per run × 10 pushes/day = 200 min/day
- 6000 minutes/month (exceeds 2000 limit)

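AFTER:
- Skip docs-only changes (5 of 10 pushes)
- Cache dependencies (save 3 min/run)
- ~14 minutes × 5 runs/day = 70 min/day (assumes per-run savings beyond caching)
- 70 min/day × 30 days = 2100 minutes/month (just over the limit)
- With deploy previews skipped on draft PRs: ~1600 min/month (estimate) ✅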
```

### 8. Implementation Checklist

When applying this command to roadcrew workflows:

**GitHub API Optimization:**
- [ ] Calculate current API usage (calls per release, per script)
- [ ] Implement caching for frequently-accessed resources (milestones, repo info)
- [ ] Use local file reads instead of API for existing files
- [ ] Add rate limit checks before bulk operations
- [ ] Embed references in bodies (avoid separate comment calls)

**AI Communication Optimization:**
- [ ] Identify verbose response patterns in current workflows
- [ ] Request concise, action-first responses
- [ ] Close unrelated files before AI interactions
- [ ] Use @ mentions for specific file attachments only
- [ ] Measure token usage before/after optimization

**CI/CD Optimization:**
- [ ] Add `paths-ignore` for docs/templates to workflow files
- [ ] Implement dependency caching (node_modules)
- [ ] Use Linux runners (avoid macOS 10× multiplier)
- [ ] Combine jobs where possible
- [ ] Calculate monthly minutes usage and optimize

**Workflow Consolidation:**
- [ ] Identify multi-command sequences that could be scripted
- [ ] Create wrapper scripts for common workflows
- [ ] Measure manual steps before/after optimization

### 9. Scaling Scenarios

**Scenario 1: Solo Developer (< 5 releases/month)**
- GitHub API: Well within limits (< 5% quota)
- Actions: Minimal usage (< 500 min/month)
- Token usage: Low (< 100k tokens/month)
- Strategy: No optimization needed
- Risk: None

**Scenario 2: Small Team (5-20 releases/month)**
- GitHub API: Low usage (< 20% quota)
- Actions: Moderate (500-1500 min/month)
- Token usage: Medium (100-500k tokens/month)
- Strategy: Add caching, optimize CI paths
- Risk: Low

**Scenario 3: Active Team (20-50 releases/month)**
- GitHub API: Medium usage (20-50% quota)
- Actions: High risk of exceeding 2000 min/month
- Token usage: High (500k-1M tokens/month)
- Strategy: All optimizations needed
  - Cache GitHub API responses
  - Aggressive `paths-ignore` for CI
  - Concise AI communication enforced
- Risk: Medium (CI minutes may exceed free tier)

**Scenario 4: Enterprise (> 50 releases/month)**
- GitHub API: May hit rate limits during peak hours
- Actions: Will exceed free tier (need paid plan)
- Token usage: Very high (> 1M tokens/month)
- Strategy: Maximum optimization + paid plans
  - All caching strategies
  - Self-hosted runners (avoid Actions costs)
  - Template-based responses (minimal AI usage)
- Risk: High (likely need paid GitHub plan)

### 10. Cost vs Feature Trade-offs

**Trade-off 1: AI Assistance vs Token Cost**
```
Maximum AI: Detailed explanations, verbose responses (expensive)
Balanced: Concise responses, action-first (optimal)
Minimal AI: Template-based only (cheap, less flexible)
```
**Recommendation**: Balanced approach (concise AI, 65% token savings)

**Trade-off 2: CI Coverage vs Actions Minutes**
```
Full CI: Run on every push (expensive, catches all issues)
Selective CI: Skip docs/templates (balanced, 50% savings)
Minimal CI: Manual testing only (cheap, risky)
```
**Recommendation**: Selective CI (stay within free tier)

**Trade-off 3: Real-time Sync vs API Calls**
```
Real-time: Sync on every change (expensive, always current)
Scheduled: Sync hourly/daily (balanced, slightly stale)
Manual: Sync on demand (cheap, potentially outdated)
```
**Recommendation**: Manual sync with smart caching (good enough freshness)

## Example Usage

```
User: "I'm creating 20 releases per month and our GitHub Actions are hitting
       the 2000 minute limit. Also, your responses are really long. How can 
       we minimize costs?"

AI Response:

📊 Current Usage Analysis:

GitHub API:
- 20 releases × 62 calls = 1,240 calls/month
- Peak: ~6% of hourly quota (safe) ✅

GitHub Actions:
- 20 minutes/run × 10 pushes/day = 200 min/day
- 6,000 minutes/month (exceeds 2000 limit) ❌

Token Usage:
- ~800 tokens/response average
- High verbosity detected

🎯 Optimizations:

1. GitHub Actions (Critical):
   - Add paths-ignore for docs/templates
   - Cache node_modules (save 3 min/run)
   - Estimated 50% of pushes skip CI
   - New: 14 min × 5 runs/day = 70 min/day ≈ 2,100 min/month
   - Skip deploy previews on draft PRs to drop below 2,000

2. AI Communication:
   - Switch to concise responses
   - Reduce from 800 → 150 tokens/response
   - 81% token cost reduction

3. GitHub API:
   - Add caching for milestone/repo data
   - Embed epic refs in bodies (not comments)
   - Reduce to ~30 calls per release

📈 Result:
- Actions: 6000 → ~2100 min/month, below 2000 once draft-PR previews are skipped ✅
- Tokens: ~64-66% cost reduction ✅
- API: 52% fewer calls per release ✅
```

## Quick Reference: Cost Optimization Strategies

1. **Concise Communication** - Action-first, skip explanations
2. **Close Unused Files** - Reduce context sent to AI  
3. **Cache API Responses** - Don't refetch milestone/repo data
4. **Use Local Files** - Read from filesystem, not GitHub API
5. **Optimize CI Paths** - Skip Actions on docs/template changes
6. **Check Rate Limits** - Before bulk operations
7. **Batch Operations** - Embed refs in bodies vs separate calls
8. **Linux Runners** - Avoid macOS 10× minute multiplier