# ⚙️ CI/CD Pipeline Optimization – Technical Specification {#spec-ci-opt-001}

> This document defines **how** to implement the feature. All "what/why" lives in the PRD.

## 1. Mapping: PRD → Spec

| PRD Requirement  | Satisfied In Spec Section |
| ---------------- | ------------------------ |
| R1: Eliminate duplicate CI runs | See: 4.4, 8 (Issue 1.2) |
| R2: Optimize pipeline speed via caching & parallelization | See: 4.4, 8 (Issues 1.3, 1.5) |
| R3: Implement fail-fast mechanisms | See: 4.4, 8 (Issue 1.5) |
| R4: Test impact analysis for monorepo | See: 4.1, 8 (Issues 1.1, 1.4) |
| R5: Progressive CI stages | See: 4.4, 8 (Issue 1.5) |
| R6: Smart test selection strategies | See: 4.4, 8 (Issue 1.6) |

## 2. Technical Requirements

### 2.1 Functional Requirements (max 6 bullets—must map PRD 1:1)
- CI triggers on pull_request events for feature branches and on push events for the main branch only
- Single dependency install job uploads node_modules artifact; all stages download and reuse
- Minimal git history fetching (fetch-depth: 0 only in setup job, shallow clones elsewhere)
- Test impact analysis identifies and runs only affected tests on feature branches
- Progressive CI stages with fail-fast between stages (lint → unit → integration → e2e)
- Smart test selection skips e2e tests on draft PRs, requires full suite on ready-for-review

### 2.2 Non-Functional Requirements (≤20 words each)
- **Performance:** CI runtime reduced by 60% through caching, parallelization, and test impact analysis
- **Security:** No credentials stored in CI config; all secrets via environment variables
- **Reliability:** Full test suite runs on main branch; affected-test selection is at least 95% accurate
- **Efficiency:** GitHub Actions token consumption reduced by 40% via artifact reuse and minimal git fetches

### 2.3 Out of Scope (3 bullets max, ≤12 words each)
- Docker containerization or container-based CI runners
- Infrastructure as Code (IaC) migrations or cloud provider changes
- End-to-end monitoring dashboards or alerting beyond CI metrics

## 3. Design Overview (≤100 words)

Replace all-branch CI triggers with selective pull_request and main-branch push events. Implement Nx for monorepo dependency graph analysis. Create initial setup job that performs full git fetch, runs affected analysis, installs dependencies once, and uploads artifacts. Subsequent stages download dependency artifacts and use shallow git clones (fetch-depth: 1), eliminating redundant installs and history fetches. Restructure pipeline into four progressive stages with fail-fast: lint/format (2 min), unit tests (5 min), integration tests (10 min), e2e tests (20 min). Implement smart triggers: skip e2e on draft PRs, require full suite when ready-for-review. Expected 40% reduction in GitHub Actions token consumption.

## 4. Architecture & Implementation

### 4.1 New/Modified Services (bullets, each ≤20 words)

- **Setup Job:** New initial job performs git fetch, affected analysis, dependency install, artifact upload
- **CI Stages:** Modified to download dependency artifacts instead of running npm ci independently
- **Nx Workspace:** Add Nx for dependency graph analysis and affected project detection
- **Test Runner:** Modify to accept affected project list from setup job artifacts

> ⚠️ **PREREQUISITE:** Issue CI-OPT.1 (Nx workspace configuration) must be completed before implementing the YAML in section 4.4. The examples below assume nx.json is configured and all projects have lint, format-check, test, integration, and e2e targets defined.

### 4.2 Data Model Changes (table: field, type, description, notes; optional YAML block if schema)

N/A - No database schema changes required

### 4.3 API Changes (table: method, endpoint, purpose, affected systems)

N/A - Internal CI pipeline changes only

### 4.4 Env/Config (table: var, default, description, required)

| Variable        | Default | Description           | Required |
| --------------- | ------- | --------------------- | -------- |
| NX_CLOUD_ACCESS_TOKEN | null | Nx Cloud access token for distributed caching | No |
| SKIP_E2E        | false   | Override to skip e2e tests on feature branches | No |
| AFFECTED_BASE   | origin/dev | Base branch for affected analysis | Yes |
| ARTIFACT_RETENTION | 1     | Days to retain workflow artifacts | Yes |
| FETCH_DEPTH_SETUP | 0     | Git history depth for setup job (0=full) | Yes |
| FETCH_DEPTH_STAGES | 1    | Git history depth for stage jobs (1=shallow) | Yes |

**CI Configuration (GitHub Actions YAML):**
```yaml
name: CI Pipeline

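on:
  pull_request:
    branches: [dev, main]
    # ready_for_review is not a default activity type; without it, marking a
    # draft PR ready would not re-run CI with the full suite.
    types: [opened, synchronize, reopened, ready_for_review]
  push:
    branches: [main]

env:
  AFFECTED_BASE: origin/dev  # Base ref for Nx affected analysis (see table above)

concurrency:
  # One active run per PR (newer commits cancel older runs); pushes to main
  # get a unique group so every main commit is tested.
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}
  cancel-in-progress: true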

jobs:
  setup:
    runs-on: ubuntu-latest
    outputs:
      affected-projects: ${{ steps.affected.outputs.projects }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history only in setup job
      
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      
      - name: Install dependencies
        run: npm ci
      
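      - name: Determine affected projects
        id: affected
        run: |
          # Full project list on pushes to main; affected projects only on PRs.
          if [ "${{ github.event_name }}" = "push" ]; then
            npx nx show projects --json > affected-projects.json
          else
            npx nx show projects --affected --base=${{ env.AFFECTED_BASE }} --json > affected-projects.json
          fi
          # The file is uploaded below; the compact JSON also becomes a job output.
          echo "projects=$(jq -c . affected-projects.json)" >> "$GITHUB_OUTPUT"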
      
      - name: Upload node_modules
        uses: actions/upload-artifact@v4
        with:
          name: node-modules
          path: node_modules
          retention-days: 1
      
      - name: Upload affected list
        uses: actions/upload-artifact@v4
        with:
          name: affected-projects
          path: affected-projects.json
          retention-days: 1

  stage-1-lint:
    needs: setup
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1  # Shallow clone for efficiency
      
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      
      - name: Download dependencies
        uses: actions/download-artifact@v4
        with:
          name: node-modules
          path: node_modules
      
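      # Shallow clones (fetch-depth: 1) lack the base ref that `nx affected`
      # diffs against, so reuse the project list computed by the setup job.
      - name: Run affected lint
        if: needs.setup.outputs.affected-projects != '[]'
        env:
          AFFECTED: ${{ needs.setup.outputs.affected-projects }}
        run: |
          npx nx run-many --target=lint --projects="$(echo "$AFFECTED" | jq -r 'join(",")')"

      - name: Run affected format check
        if: needs.setup.outputs.affected-projects != '[]'
        env:
          AFFECTED: ${{ needs.setup.outputs.affected-projects }}
        run: |
          npx nx run-many --target=format-check --projects="$(echo "$AFFECTED" | jq -r 'join(",")')"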

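  stage-2-unit:
    needs: [setup, stage-1-lint]  # setup is listed so its outputs are readable here
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Download dependencies
        uses: actions/download-artifact@v4
        with:
          name: node-modules
          path: node_modules

      # Shallow clones lack the base ref for `nx affected`; use the setup job's list.
      - name: Run affected unit tests
        if: needs.setup.outputs.affected-projects != '[]'
        env:
          AFFECTED: ${{ needs.setup.outputs.affected-projects }}
        run: |
          npx nx run-many --target=test --projects="$(echo "$AFFECTED" | jq -r 'join(",")')"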

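  stage-3-integration:
    needs: [setup, stage-2-unit]  # setup is listed so its outputs are readable here
    runs-on: ubuntu-latest
    if: github.event.pull_request.draft == false && github.base_ref != 'dev'
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Download dependencies
        uses: actions/download-artifact@v4
        with:
          name: node-modules
          path: node_modules

      # Shallow clones lack the base ref for `nx affected`; use the setup job's list.
      - name: Run affected integration tests
        if: needs.setup.outputs.affected-projects != '[]'
        env:
          AFFECTED: ${{ needs.setup.outputs.affected-projects }}
        run: |
          npx nx run-many --target=integration --projects="$(echo "$AFFECTED" | jq -r 'join(",")')"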

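  stage-4-e2e:
    needs: [setup, stage-3-integration]  # setup is listed so its outputs are readable here
    runs-on: ubuntu-latest
    if: |
      github.event.pull_request.draft == false &&
      (github.base_ref == 'main' || github.ref == 'refs/heads/main')
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Download dependencies
        uses: actions/download-artifact@v4
        with:
          name: node-modules
          path: node_modules

      # Shallow clones lack the base ref for `nx affected`; use the setup job's list.
      - name: Run affected e2e tests
        if: needs.setup.outputs.affected-projects != '[]'
        env:
          AFFECTED: ${{ needs.setup.outputs.affected-projects }}
        run: |
          npx nx run-many --target=e2e --projects="$(echo "$AFFECTED" | jq -r 'join(",")')"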
```

## 5. Execution Plan & Issues

### 5.1 Implementation Plan (≤60 words each phase)

#### 5.1.1 Current State & Migration Path

**Current CI Implementation:** The existing pipeline (`.github/workflows/ci.yml`) runs 3+ full `npm ci` installs (one per job), performs multiple full git history fetches (fetch-depth: 0 set explicitly; the actions/checkout default is 1), has no affected project filtering, and triggers on both push and pull_request events for dev/main branches (duplicate runs). No Nx integration exists.

**Migration Strategy:** 
1. Install and configure Nx workspace (Issue CI-OPT.1) without modifying CI
2. Create new workflow file or feature branch for new CI pipeline
3. Validate new pipeline on feature branches for 1 sprint
4. Gradually migrate all branches to new workflow by updating `.github/workflows/ci.yml`
5. Deprecate old pipeline steps incrementally as each issue completes

---

- **Step 1:** Install and configure Nx workspace, define project dependencies in nx.json, create target configurations for lint, test, integration, e2e. Validate affected command accuracy manually on sample PRs.

- **Step 2:** Create setup job that performs full git fetch, runs affected analysis, installs dependencies once, uploads node_modules and affected list as artifacts. Update trigger configuration to pull_request and main push only.

- **Step 3:** Implement progressive stages with fail-fast. Configure each stage to download dependency artifacts and use shallow git clones (fetch-depth: 1). Add conditional logic for draft PRs and main branch requirements.

- **Step 4:** Deploy optimized pipeline to feature branches. Monitor GitHub Actions token consumption and CI runtime for 2 weeks. Validate 40% token reduction and 60% runtime improvement. Adjust artifact retention and fetch depth as needed.

### 5.2 Issues (reference anchors)

- [Issue CI-OPT.1](#issue-1-1): Configure Nx workspace and dependency graph
- [Issue CI-OPT.2](#issue-1-2): Eliminate duplicate CI runs via trigger optimization
- [Issue CI-OPT.3](#issue-1-3): Create setup job with dependency artifact upload
- [Issue CI-OPT.4](#issue-1-4): Add test impact analysis with Nx affected
- [Issue CI-OPT.5](#issue-1-5): Restructure pipeline into progressive stages with artifact downloads
- [Issue CI-OPT.6](#issue-1-6): Implement smart test selection and conditional execution
- [Issue CI-OPT.7](#issue-1-7): Optimize git fetch depth and monitor token consumption

### 5.3 Risks & Mitigations (simple table)

| Risk           | Mitigation         |
|----------------|-------------------|
| Test impact analysis misses dependencies | Run full suite on main; validate affected accuracy with monitoring |
| Artifact upload/download failures | Implement retry logic; fall back to npm ci on artifact failure |
| Large node_modules artifact size | Use artifact compression; consider workspace-specific artifacts |
| Parallel jobs exceed runner limits | Remove unnecessary parallelization; rely on Nx built-in parallelism |
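| Shallow git clones break history-dependent tools (e.g. `nx affected`) | Run history-dependent analysis only in the setup job (fetch-depth: 0); stages use fetch-depth: 1 and consume its affected list |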

## 6. Testing & Validation

### 6.1 Acceptance Criteria (1 line per acceptance, pass/fail)

- CI triggers only on pull_request to dev/main and push to main (no dev push triggers)
- Average CI runtime reduced by 50-60% for feature branch PRs (measure across 20+ representative runs)
- GitHub Actions token consumption reduced by 40% via artifact reuse (validate over 2-week period post-deployment)
- Setup job performs single dependency install; all stages download artifacts successfully (verify in logs)
- Git fetch-depth: 0 used only in setup job; all other jobs use fetch-depth: 1 (audit all checkout actions)
- Nx affected accurately identifies impacted projects with 95%+ precision (test on 10+ real PRs with varied change types)
- Progressive stages fail-fast (stage 1 failure blocks stages 2+; no downstream execution after first failure)
- Draft PRs skip integration and e2e tests; ready-for-review runs full suite (test with PR state toggle)
- Full test suite runs on all main branch commits (validate with git push to main)
- PR label "full-ci" overrides draft/branch filtering (test with label added to draft PR)

### 6.2 Test Plan (≤100 words OR checklist bullets)

- [ ] Create feature branch with single-project change; verify only affected tests run
- [ ] Verify setup job uploads node_modules artifact; all stages download successfully
- [ ] Confirm fetch-depth: 0 used only in setup job; other jobs use fetch-depth: 1
- [ ] Push to dev branch directly; verify no CI trigger (only PR triggers)
- [ ] Open draft PR; verify e2e tests skipped
- [ ] Mark PR ready for review to main; verify full test suite runs
- [ ] Introduce lint failure; verify stage-2+ jobs do not run (fail-fast)
- [ ] Measure GitHub Actions minutes before/after for 20 PRs; validate 40% token reduction
- [ ] Test artifact download failure; verify fallback to npm ci works
- [ ] Validate parallel job execution removed; confirm Nx handles parallelism
- [ ] Merge PR to main; verify full suite runs on push event

## 7. Dependencies & Links

- **Depends On:** Nx CLI, CI provider with caching support (GitHub Actions/GitLab CI)
- **Blocks:** Future Docker containerization, advanced monitoring dashboards
- **Related:** Monorepo architecture docs, team CI/CD runbook, test strategy guidelines

## 8. Issue Breakdown

## Epic 1: CI/CD Pipeline Optimization

Replace all-branch CI triggers with selective pull_request/main-push events. Implement Nx for monorepo dependency analysis, unified dependency install via artifacts, progressive stages with fail-fast, and smart test selection.

**Goals:**
- Reduce CI runtime by 50-60% for feature branch PRs
- Reduce GitHub Actions token consumption by 40%
- Maintain Nx affected test accuracy at 95%+
- Achieve zero regressions in test coverage or quality gates

**Success Criteria:**
- [ ] CI runtime reduced by 50-60% for feature branch PRs (20+ representative runs)
- [ ] GitHub Actions token consumption reduced by 40% (measured over 2 weeks)
- [ ] Nx affected accuracy maintained at 95%+
- [ ] Zero regressions in test coverage
- [ ] Full test suite runs on main branch; affected only on feature branches
- [ ] All 7 implementation issues completed and tested

### Issue 1.1: Configure Nx workspace and dependency graph {#issue-1-1}

- **Overview:** Install Nx, migrate monorepo to Nx workspace, define project boundaries and dependencies in nx.json, map implicit dependencies, and configure targets (lint, test, integration, e2e) for affected command support.
- **Accepts:**
  - [ ] Nx CLI installed via `npx nx@latest init`
  - [ ] nx.json configured with all projects and dependencies
  - [ ] All projects have lint, test, integration, e2e targets defined
  - [ ] Dependency graph visualization validates accuracy
  - [ ] Affected command validated on 5 sample changes

- **Tech Approach:**
  - Run `npx nx@latest init` to add Nx to existing monorepo
  - Define projects in nx.json or use auto-discovery
  - Map implicit dependencies (shared configs, root package.json)
  - Configure target defaults and executor options
  - Create project.json or package.json nx configs per project
  - Validate dependency graph with `nx graph` visualization

- **Files/Components:** nx.json, project.json (per project), package.json (root), .nxignore
- **Dependencies:** None (foundational issue)
- **Classification:** 8
- **Size/Complexity:** L/High

---

### Issue 1.2: Eliminate duplicate CI runs via trigger optimization {#issue-1-2}

- **Overview:** Modify CI workflow triggers to run only on pull_request events for feature branches targeting dev/main, and push events exclusively for main branch. Remove push triggers for dev branch to eliminate duplicate runs.
- **Accepts:**
  - [ ] CI workflow triggers defined as pull_request to dev/main only
  - [ ] Push to dev branch does not trigger CI
  - [ ] Push to main branch after merge triggers CI
  - [ ] Feature branch PRs trigger CI exactly once
  - [ ] No duplicate runs for same commit

- **Tech Approach:**
  - Update GitHub Actions workflow `on:` section with selective triggers
  - Remove dev from push branches array
  - Add pull_request with branches filter for dev and main
  - Test with sample commits to dev (verify no run) and PR to dev (verify runs)
  - Document trigger strategy in CI runbook

- **Files/Components:** .github/workflows/ci.yml
- **Dependencies:** None
- **Classification:** 3
- **Size/Complexity:** S/Simple

---

### Issue 1.3: Create setup job with dependency artifact upload {#issue-1-3}

- **Overview:** Create initial setup job that performs full git fetch (depth: 0), runs Nx affected analysis, installs dependencies once, and uploads node_modules and affected project list as workflow artifacts for downstream stages.
- **Accepts:**
  - [ ] Setup job performs fetch-depth: 0 and affected analysis
  - [ ] Dependencies installed once via npm ci in setup job only
  - [ ] node_modules artifact uploaded successfully
  - [ ] Affected projects list generated and uploaded as JSON
  - [ ] Artifact retention set to 1 day

- **Tech Approach:**
  - Create new setup job as first job in workflow with outputs
  - Add fetch-depth: 0 only to setup job checkout
  - Use the npm cache built into actions/setup-node (`cache: 'npm'`) during npm ci
  - Run `nx show projects --affected` to generate affected list
  - Upload artifacts with 1-day retention
  - Add error handling for artifact upload failures

- **Files/Components:** .github/workflows/ci.yml, affected-projects.json
- **Dependencies:** Issue 1.1 (Nx configuration)
- **Classification:** 6
- **Size/Complexity:** M/Moderate
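
One caveat for the upload step above: the actions/upload-artifact documentation notes that file permissions are not preserved, and newer v4 releases skip hidden files unless `include-hidden-files` is set. Both can affect `node_modules/.bin`. The fragment below is a minimal sketch of a tar-based round trip that avoids both issues. It reuses the `node-modules` artifact name from section 4.4; the tarball name `node_modules.tar` is an assumption for illustration, not part of the spec.

```yaml
# Fragment only: triggers and the remaining stages are omitted.
jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      # tar keeps executable bits and dot-directories such as node_modules/.bin
      - name: Pack node_modules
        run: tar -cf node_modules.tar node_modules   # hypothetical tarball name
      - name: Upload node_modules
        uses: actions/upload-artifact@v4
        with:
          name: node-modules
          path: node_modules.tar
          retention-days: 1
          if-no-files-found: error   # fail the setup job instead of only warning

  stage-1-lint:
    needs: setup
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1
      - name: Download dependencies
        uses: actions/download-artifact@v4
        with:
          name: node-modules          # extracted into the workspace root
      - name: Unpack node_modules
        run: tar -xf node_modules.tar
```

Setting `if-no-files-found: error` is one way to cover the "error handling for artifact upload failures" bullet: a missing tarball then fails the setup job rather than surfacing later as a download error.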

---

### Issue 1.4: Add test impact analysis with Nx affected {#issue-1-4}

- **Overview:** Integrate Nx affected commands into CI pipeline to run only tests for projects impacted by code changes. Configure base branch comparison (origin/dev), validate accuracy at 95%+ precision, implement on feature branches only.
- **Accepts:**
  - [ ] Nx affected runs only impacted project tests on feature branches
  - [ ] Base branch set to origin/dev for affected comparison
  - [ ] Affected test accuracy validated at 95%+
  - [ ] Full test suite still runs on main branch
  - [ ] Cross-project dependency changes trigger all dependent tests

- **Tech Approach:**
  - Replace static test commands with `nx affected --target=test --base=origin/dev`
  - Configure AFFECTED_BASE environment variable (default: origin/dev)
  - Add conditional: full suite on main, affected on feature branches
  - Validate affected accuracy with manual testing on 10 real PRs
  - Document affected command usage in team guidelines

- **Files/Components:** .github/workflows/ci.yml, nx.json (affected config)
- **Dependencies:** Issue 1.1 (Nx configuration)
- **Classification:** 6
- **Size/Complexity:** M/Moderate

---

### Issue 1.5: Restructure pipeline into progressive stages with artifact downloads {#issue-1-5}

- **Overview:** Refactor CI workflow into four sequential stages with fail-fast: lint/format, unit tests, integration tests, e2e tests. Each stage downloads dependency artifacts and uses shallow git clones (fetch-depth: 1) instead of reinstalling dependencies.
- **Accepts:**
  - [ ] Four distinct CI stages defined with needs dependencies
  - [ ] Each stage downloads node_modules artifact via actions/download-artifact
  - [ ] All stages use fetch-depth: 1 (not 0)
  - [ ] Stage 1 failure prevents stage 2+ execution (fail-fast)
  - [ ] Time-to-first-failure under 3 minutes

- **Tech Approach:**
  - Create separate jobs for each stage with needs keyword dependencies
  - Add actions/download-artifact step to each stage job
  - Set fetch-depth: 1 in checkout for all stage jobs
  - Remove npm ci commands from stage jobs
  - Add fallback: if artifact fails, conditionally run npm ci
  - Add job-level timeouts to prevent hung jobs

- **Files/Components:** .github/workflows/ci.yml
- **Dependencies:** Issue 1.3 (setup job and artifacts)
- **Classification:** 6
- **Size/Complexity:** M/Moderate
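
The fallback and timeout bullets above could be sketched as follows. This is an illustrative fragment, not the final job definition: the step id `deps` and the 10-minute ceiling are assumptions, and the test command is a placeholder for the stage's real command from section 4.4.

```yaml
# Fragment only: shows one stage; the same pattern would repeat per stage.
jobs:
  stage-2-unit:
    needs: stage-1-lint
    runs-on: ubuntu-latest
    timeout-minutes: 10   # hypothetical ceiling; tune to observed runtimes
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Download dependencies
        id: deps                   # hypothetical step id
        continue-on-error: true    # keep the job alive so the fallback can run
        uses: actions/download-artifact@v4
        with:
          name: node-modules
          path: node_modules
      - name: Fallback install
        # outcome reflects the result before continue-on-error is applied
        if: steps.deps.outcome == 'failure'
        run: npm ci
      - name: Run unit tests
        run: echo "stage command from section 4.4 goes here"
```

The fallback trades the artifact's speed benefit for reliability, so it may be worth counting fallback runs when validating the runtime target.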

---

### Issue 1.6: Implement smart test selection and conditional execution {#issue-1-6}

- **Overview:** Add conditional logic to skip e2e tests on draft PRs, require full test suite for ready-for-review PRs to main, skip integration tests on non-main PRs. Implement override via environment variable and PR labels.
- **Accepts:**
  - [ ] Draft PRs skip integration and e2e stages
  - [ ] Ready-for-review PRs to main run full suite
  - [ ] PRs to dev skip e2e tests
  - [ ] SKIP_E2E environment variable overrides behavior
  - [ ] PR label "full-ci" forces full suite

- **Tech Approach:**
  - Add if conditions to stage jobs using `github.event.pull_request.draft`
  - Check `github.base_ref` to determine target branch
  - Use SKIP_E2E for manual override (job-level `if` cannot read the `env` context; expose it via `vars` or a setup job output)
  - Add PR label check for "full-ci": `contains(github.event.pull_request.labels.*.name, 'full-ci')`
  - Update stage-3 condition: `if: github.event.pull_request.draft == false && github.base_ref != 'dev'`
  - Document smart test selection in PR template

- **Files/Components:** .github/workflows/ci.yml, PR template
- **Dependencies:** Issue 1.5 (progressive stages)
- **Classification:** 5
- **Size/Complexity:** M/Moderate
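
A possible shape for the conditions described above, offered as a sketch under stated assumptions. It assumes SKIP_E2E is stored as a repository configuration variable and read through the `vars` context, because the `env` context is not available in job-level `if` expressions. It also assumes the "full-ci" label should take precedence over SKIP_E2E, which the spec does not state.

```yaml
# Fragment only: step bodies are placeholders for the steps in section 4.4.
jobs:
  stage-3-integration:
    needs: stage-2-unit
    runs-on: ubuntu-latest
    # The label is checked here too: if this stage is skipped, stage-4-e2e
    # is skipped as well because it depends on this job.
    if: |
      contains(github.event.pull_request.labels.*.name, 'full-ci') ||
      (github.event.pull_request.draft == false && github.base_ref != 'dev')
    steps:
      - run: echo "integration steps as in section 4.4"

  stage-4-e2e:
    needs: stage-3-integration
    runs-on: ubuntu-latest
    if: |
      contains(github.event.pull_request.labels.*.name, 'full-ci') ||
      (
        vars.SKIP_E2E != 'true' &&
        github.event.pull_request.draft == false &&
        (github.base_ref == 'main' || github.ref == 'refs/heads/main')
      )
    steps:
      - run: echo "e2e steps as in section 4.4"
```

Labels are read from the event payload, so a label added after the run started only takes effect on the next run, unless the `labeled` activity type is added to the pull_request trigger.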

---

### Issue 1.7: Optimize git fetch depth and monitor token consumption {#issue-1-7}

- **Overview:** Ensure fetch-depth: 0 used only in setup job for affected analysis; all other stage jobs use fetch-depth: 1 for shallow clones. Implement monitoring to validate 40% GitHub Actions token reduction target.
- **Accepts:**
  - [ ] Setup job uses fetch-depth: 0; all other jobs use fetch-depth: 1
  - [ ] Monitoring script tracks GitHub Actions minutes per run
  - [ ] 40% token consumption reduction validated over 2 weeks
  - [ ] Dashboard shows before/after metrics
  - [ ] Documentation updated with fetch-depth strategy

- **Tech Approach:**
  - Audit all checkout actions; set fetch-depth: 1 except setup job
  - Create monitoring script using GitHub API to fetch metrics
  - Calculate token consumption per run (minutes × runner multiplier)
  - Generate weekly comparison reports (before vs after)
  - Document fetch-depth strategy in CI runbook
  - Add alerts if consumption exceeds baseline

- **Files/Components:** .github/workflows/ci.yml, scripts/monitor-ci-tokens.sh, docs/ci-optimization.md
- **Dependencies:** Issue 1.3 (setup job), Issue 1.5 (stage structure)
- **Classification:** 5
- **Size/Complexity:** M/Moderate
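
The checkout audit above could also be enforced automatically. The job below is a hypothetical guard, not part of the spec: the job name `audit-fetch-depth` is an assumption, and it only inspects `.github/workflows/ci.yml`.

```yaml
# Fragment only: a guard job that could sit alongside the CI stages.
jobs:
  audit-fetch-depth:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1
      - name: Allow exactly one full-history checkout
        run: |
          # Count lines that set fetch-depth to 0, ignoring mentions in comments.
          # grep -c exits non-zero on zero matches, so '|| true' keeps the step alive.
          FULL=$(grep -cE '^[[:space:]]*fetch-depth:[[:space:]]*0([[:space:]]|$)' .github/workflows/ci.yml || true)
          if [ "$FULL" -ne 1 ]; then
            echo "Expected exactly 1 checkout with fetch-depth: 0, found $FULL" >&2
            exit 1
          fi
```

A checkout step that omits `fetch-depth` is also shallow, since the action defaults to a depth of 1, so the guard only needs to count the explicit zero.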

---

### Issue CI-OPT.7B: Token monitoring dashboard and alerting {#issue-ci-opt-7b}

> **Note:** This is a separate, post-implementation task. Complete CI-OPT.1 through CI-OPT.7 first, validate 40% reduction target, then create this deliverable if monitoring infrastructure is needed long-term.

- **Overview:** Build GitHub API-based monitoring script to track CI token consumption before/after optimization. Generate weekly reports and alerting for token budget overages.
- **Accepts:**
  - [ ] Monitoring script queries GitHub API for workflow run metrics
  - [ ] Baseline measured before optimization; weekly comparisons tracked
  - [ ] Token reduction target (40%) validated and documented
  - [ ] Alerts trigger if weekly consumption exceeds baseline + 20% threshold
  - [ ] Dashboard/report generated as markdown for team review

- **Tech Approach:**
  - Use GitHub CLI or Octokit to fetch workflow metrics programmatically
  - Store baseline metrics in .github/metrics/ directory
  - Create weekly cron job to calculate and report token usage
  - Generate markdown report with before/after comparison tables
  - Implement Slack webhook for alert notifications if needed

- **Files/Components:** scripts/monitor-ci-tokens.sh, .github/workflows/monitor-tokens.yml, docs/ci-metrics.md
- **Dependencies:** CI-OPT.1 through CI-OPT.7 (full implementation complete)
- **Classification:** 4 (monitoring and reporting, non-critical path)
- **Size/Complexity:** S/Small
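
As a starting point for `.github/workflows/monitor-tokens.yml`, the weekly cron job could look roughly like this. It is a sketch under several assumptions: the workflow name, the schedule, and the 500-run limit are placeholders, and wall-clock duration (`updatedAt` minus `createdAt`) is used as a proxy that includes queue time and ignores the runner multiplier.

```yaml
name: CI Token Monitor   # hypothetical workflow name

on:
  schedule:
    - cron: '0 6 * * 1'  # Mondays 06:00 UTC; the schedule is an assumption
  workflow_dispatch:

permissions:
  actions: read

jobs:
  weekly-report:
    runs-on: ubuntu-latest
    steps:
      - name: Summarize CI runs from the last 7 days
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          SINCE=$(date -u -d '7 days ago' +%F)
          # Sum per-run wall-clock seconds and append one line to the job summary.
          gh run list --repo "$GITHUB_REPOSITORY" --workflow ci.yml \
            --created ">=$SINCE" --limit 500 \
            --json createdAt,updatedAt \
            --jq 'map((.updatedAt | fromdateiso8601) - (.createdAt | fromdateiso8601))
                  | "Runs: \(length), wall-clock minutes: \((add // 0) / 60 | floor)"' \
            >> "$GITHUB_STEP_SUMMARY"
```

Comparing this weekly figure against the stored baseline, and applying the baseline + 20% alert threshold, would then be a follow-up step in the same job or in `scripts/monitor-ci-tokens.sh`.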

## Editor Instructions:
- Write as atomic, non-narrative items wherever possible.
- Never exceed field length limits.
- All mapping tables must match PRD keys exactly.
- Issue breakdowns must map 1:1 to enhancement.template.md structure.
- Anchors enable deep linking: `#issue-{{EPIC}}-{{N}}`
- Classification (1-10) drives assignment in scope-release.
- Any deviation, ambiguity, or overrun invalidates this doc for automation.

<!--
DOWNSTREAM TEMPLATE MAPPING:

templates/enhancement.template.md requirements:
- Overview: ≤50 words → Issue "Overview" field
- Acceptance Criteria: ≤5 bullets → Issue "Accepts" field
- Technical Implementation: ≤6 bullets → Issue "Tech Approach" field
- Classification: 1-10 scale → Issue "Classification" field
- Size/Complexity: S/M/L → Issue "Size/Complexity" field

templates/epic.template.md requirements:
- Overview: ≤30 words → Epic-level summary (derive from all issues)
- Key Goals: ≤5 bullets → Epic-level objectives

templates/release.template.md requirements:
- Target State: ≤30 words → Release-level summary (derive from all epics)

.cursor/commands/implement-epic.md requirements:
- Dependencies: Explicit issue dependencies for ordering
- Test Plan: Validation steps for each issue
- Metrics: Duration, tokens, files modified
-->