# [Epic] GitHub Actions Log Collection via API

**Overview**: Implement reusable utility to download and analyze GitHub Actions workflow logs via API, enabling automated failure analysis for PR triage, auto-fix-lint, and QA workflows.

## Child Issues

### Issue 1: Core Log Collection Utility

**Type**: Enhancement

**Overview & User Story**

As a developer, I want to programmatically download GitHub Actions workflow logs via API, so that I can automate failure analysis without manually accessing the GitHub UI.

**Acceptance Criteria**

- [ ] Function `downloadWorkflowLogs()` downloads workflow run logs via GitHub API
- [ ] Function `downloadJobLogs()` downloads individual job logs via GitHub API  
- [ ] Handles API redirect (302 response) correctly and follows redirect URL
- [ ] Extracts ZIP archives automatically (supports unzip command and adm-zip fallback)
- [ ] Supports custom output directory and extraction options
- [ ] Uses existing `getOctokit()` and `getRepoInfo()` utilities for consistency
- [ ] Includes comprehensive error handling for API failures, expired URLs, and extraction errors

**Technical Implementation**

- Create `scripts/utils/roadcrew/github-logs.ts` utility module
- Implement API calls using Octokit with `redirect: 'manual'` to capture redirect URL
- Handle HTTP redirect following manually (API returns 302 with Location header)
- Download ZIP from signed URL (expires after 1 minute)
- Support ZIP extraction via:
  - System `unzip` command (Unix/macOS) - primary method
  - `adm-zip` npm package fallback (optional dependency)
- Export interfaces: `LogDownloadOptions`, `ParsedLogError`, `LogParseResult`
- Export functions: `downloadWorkflowLogs()`, `downloadJobLogs()`, `readLogFiles()`, `extractZip()`
- Follow existing code patterns (JSDoc, explicit types, error handling)

**Dependencies**

None

**Estimated Effort**

**Classification:** 4 (ai-led)

---

### Issue 2: Log Error Parsing and Analysis

**Type**: Enhancement

**Overview & User Story**

As a developer, I want to automatically parse errors from workflow logs, so that I can quickly identify root causes without manually reading through log files.

**Acceptance Criteria**

- [ ] Function `parseLogErrors()` extracts errors from log content
- [ ] Categorizes errors into types: test-failure, lint-error, type-error, build-error, unknown
- [ ] Extracts file paths and line numbers when present in error messages
- [ ] Includes context lines (5 before/after) for each error
- [ ] Returns structured `LogParseResult` with summary statistics
- [ ] Detects common error patterns:
  - Test failures: `FAIL`, `AssertionError`, `Expected...but got`
  - Lint errors: `eslint`, `prettier`, `Parsing error`
  - Type errors: `TS\d{4}`, `Type '.*' is not assignable`
  - Build errors: `npm ERR`, `Build.*failed`, `Compilation.*error`

**Technical Implementation**

- Add `parseLogErrors()` function to `github-logs.ts`
- Implement regex patterns for each error type
- Extract file/line information from error messages
- Include context extraction (surrounding lines)
- Return structured data: `{ errors: ParsedLogError[], summary: {...}, hasFailures: boolean }`
- Function `readLogFiles()` concatenates all log files from extracted directory
- Handle edge cases: empty logs, malformed errors, very long lines

**Dependencies**

- Depends on: Issue 1 (Core Log Collection Utility)

**Estimated Effort**

**Classification:** 3 (ai-solo)

---

### Issue 3: PR Log Collection Helper

**Type**: Enhancement

**Overview & User Story**

As a developer working on PR triage, I want to automatically collect all failed workflow logs for a PR, so that I can analyze failures without manually downloading each run.

**Acceptance Criteria**

- [ ] Function `collectPRLogs()` automatically finds all failed workflow runs for a PR
- [ ] Downloads logs for all failed jobs in each run
- [ ] Optionally parses errors from all collected logs
- [ ] Returns structured `PRLogCollectionResult` with summary statistics
- [ ] Function `analyzePRFailures()` provides actionable recommendations
- [ ] Identifies most common error types across all runs
- [ ] Detects if failures appear related (same error patterns)

**Technical Implementation**

- Create `scripts/utils/roadcrew/collect-pr-logs.ts` helper module
- Use GitHub API to list workflow runs for PR head SHA
- Filter to failed/cancelled runs only
- Use `getFailedJobs()` to get failed jobs per run
- Batch download logs using core utility (Issue 1)
- Aggregate error analysis across all runs
- Generate recommendations based on error types found
- Export interfaces: `PRLogCollectionResult`, `PRLogCollectionOptions`
- Export functions: `collectPRLogs()`, `analyzePRFailures()`

**Dependencies**

- Depends on: Issue 1 (Core Log Collection Utility)
- Depends on: Issue 2 (Log Error Parsing and Analysis)

**Estimated Effort**

**Classification:** 4 (ai-led)

---

### Issue 4: CLI Tool for Manual Log Collection

**Type**: Enhancement

**Overview & User Story**

As a developer, I want a CLI command to manually download workflow logs, so that I can inspect failures outside of automated workflows.

**Acceptance Criteria**

- [ ] CLI script `download-workflow-logs.ts` accepts `--run-id` or `--job-id` arguments
- [ ] Downloads logs and optionally extracts ZIP
- [ ] Supports `--parse-errors` flag to display error summary
- [ ] Supports `--output-dir` for custom output location
- [ ] Supports `--no-extract` and `--keep-zip` flags
- [ ] Provides helpful error messages and usage instructions
- [ ] Added to `package.json` scripts as `download-logs`

**Technical Implementation**

- Create `scripts/download-workflow-logs.ts` CLI script
- Parse command-line arguments
- Use core utility functions (Issue 1, Issue 2)
- Display progress and results
- Format error summaries in readable format
- Add npm script: `"download-logs": "npm run ensure-built && node dist/scripts/download-workflow-logs.js"`
- Include `--help` flag with usage examples

**Dependencies**

- Depends on: Issue 1 (Core Log Collection Utility)
- Depends on: Issue 2 (Log Error Parsing and Analysis)

**Estimated Effort**

**Classification:** 2 (ai-solo)

---

### Issue 5: Integration with PR Triage Command

**Type**: Enhancement

**Overview & User Story**

As a developer using `/triage-pr`, I want automatic log collection when CI fails, so that I get detailed error analysis without manual `gh run view --log` commands.

**Acceptance Criteria**

- [ ] `/triage-pr` command uses `collectPRLogs()` instead of manual `gh run view --log`
- [ ] Automatically downloads logs for all failed runs
- [ ] Parses errors and includes in triage analysis
- [ ] Displays error summary in findings table
- [ ] Uses parsed errors to improve root cause identification
- [ ] Reduces manual steps in triage workflow

**Technical Implementation**

- Update `commands/roadcrew/06-testing/triage-pr.md` command template
- Replace manual log collection steps with `collectPRLogs()` call
- Integrate parsed errors into analysis phase
- Include error statistics in findings table
- Use error categorization to improve root cause detection
- Maintain backward compatibility (fallback if log collection fails)

**Dependencies**

- Depends on: Issue 3 (PR Log Collection Helper)

**Estimated Effort**

**Classification:** 3 (ai-solo)

---

### Issue 6: Integration with Auto-Fix-Lint

**Type**: Enhancement

**Overview & User Story**

As a developer using auto-fix-lint, I want better error detection from actual workflow logs, so that auto-fix can target specific lint errors more accurately.

**Acceptance Criteria**

- [ ] `auto-fix-lint.ts` uses log collection when lint job fails
- [ ] Extracts specific lint errors from logs
- [ ] Uses parsed errors to improve auto-fix targeting
- [ ] Handles cases where logs unavailable gracefully
- [ ] Provides better error messages with file/line context

**Technical Implementation**

- Update `scripts/utils/roadcrew/auto-fix-lint.ts`
- Add optional log collection step when job fails
- Filter parsed errors to lint-error type only
- Use file/line information from parsed errors
- Enhance auto-fix logic with specific error details
- Add error handling for log collection failures

**Dependencies**

- Depends on: Issue 1 (Core Log Collection Utility)
- Depends on: Issue 2 (Log Error Parsing and Analysis)

**Estimated Effort**

**Classification:** 3 (ai-solo)

---

## Dependencies

- **Depends on:** None
- **Blocks:** Future enhancements to PR triage, QA workflows, CI failure analysis

## Definition of Done

- [ ] All child issues completed and merged
- [ ] Utility tested with real workflow runs
- [ ] Documentation added to `docs/GITHUB-LOGS-UTILITY.md`
- [ ] Integration points tested (triage-pr, auto-fix-lint)
- [ ] Error handling validated for edge cases

---

## API Reference Details

### GitHub Actions Logs API

**Workflow Run Logs:**
```
GET /repos/{owner}/{repo}/actions/runs/{run_id}/logs
```

**Response:** 302 redirect to signed URL (expires after 1 minute)

**Job Logs:**
```
GET /repos/{owner}/{repo}/actions/jobs/{job_id}/logs
```

**Response:** 302 redirect to signed URL (expires after 1 minute)

**Key Implementation Notes:**

1. **Redirect Handling**: API returns `302 Found` with `Location` header. Must use `redirect: 'manual'` in Octokit request to capture redirect URL, then manually follow redirect.

2. **Authentication**: Redirect URLs are signed and require same authentication token. Include `Authorization: Bearer <token>` header when downloading.

3. **Expiration**: Signed URLs expire after 1 minute. Download immediately after receiving redirect.

4. **ZIP Format**: Logs are provided as ZIP archives containing:
   - `{job_id}/` directories for each job
   - `{job_id}/log.txt` files with actual log content
   - Nested structure for workflow runs with multiple jobs

5. **Rate Limits**: Subject to GitHub API rate limits. For private repos, requires `repo` scope token.

6. **Error Cases**:
   - Logs not available (expired, deleted, insufficient permissions)
   - Network failures during download
   - ZIP extraction failures
   - Malformed log content

**Example Octokit Usage:**
```typescript
const { headers } = await octokit.request(
  'GET /repos/{owner}/{repo}/actions/runs/{run_id}/logs',
  {
    owner,
    repo,
    run_id: runId,
    request: {
      redirect: 'manual' // Don't follow redirect automatically
    }
  }
);

const redirectUrl = headers.location; // Get signed URL
// Then download from redirectUrl with token authentication
```

---

## Implementation Strategy

### Phase 1: Core Infrastructure (Issues 1-2)
Build the foundational utility with log download and parsing capabilities.

### Phase 2: Helpers and CLI (Issues 3-4)
Add convenience functions for PR-specific use cases and CLI tooling.

### Phase 3: Integration (Issues 5-6)
Integrate into existing workflows and commands.

### Testing Strategy

- Unit tests for log parsing (mock log content)
- Integration tests with real GitHub API (use test repo)
- Manual testing with actual failed workflow runs
- Edge case testing: expired URLs, missing logs, extraction failures

### Documentation

- API reference in utility module (JSDoc)
- Usage guide in `docs/GITHUB-LOGS-UTILITY.md`
- Integration examples for each use case
- Troubleshooting guide for common issues

---

**Related Spec:** This epic addresses the need identified in discussion about GitHub Actions log collection via API. No formal spec document exists yet - this epic serves as the specification.

