# Automated Log Collection & Notification Options

This document outlines automated ways to collect and surface GitHub Actions logs when a workflow fails.

## 1. GitHub Actions `workflow_run` Event Handler

### Pattern
Create a workflow that triggers whenever one of the listed workflows completes, then downloads logs and takes action only if the run failed.

### Implementation
```yaml
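name: Failure Analysis & Notification

on:
  workflow_run:
    workflows: ["CI", "QA Security Audit", "QA Performance Audit"]
    types: [completed]

permissions:
  contents: read        # checkout
  actions: read         # read logs of the triggering run
  pull-requests: write  # post the PR comment

jobs:
  analyze-failure:
    if: github.event.workflow_run.conclusion == 'failure'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download logs
        run: |
          npm ci
          npm run download-logs -- --run-id ${{ github.event.workflow_run.id }} --parse-errors

      - name: Post PR comment with error summary
        if: github.event.workflow_run.event == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const parsed = JSON.parse(fs.readFileSync('config/reports/logs/errors.json', 'utf8'));

            // pull_requests is empty when the run comes from a fork
            const pr = context.payload.workflow_run.pull_requests[0];
            if (!pr) {
              core.info('No associated pull request; skipping comment');
              return;
            }

            // Assumes errors.json holds entries with `type` and `message` fields
            const errors = Array.isArray(parsed) ? parsed : (parsed.errors || []);
            const formatErrors = (list) =>
              list.map(e => `- **${e.type}**: ${e.message}`).join('\n');

            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: pr.number,
              body: `## ❌ CI Failure Analysis\n\n${formatErrors(errors)}`
            });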
```

### Use Cases
- **Any workflow failure** → Automatic log collection
- **PR failures** → Comment with error summary
- **Main branch failures** → Create issue for tracking

---

## 2. PR Comments with Error Summaries

### Pattern
Post rich markdown comments to PRs with parsed error details, file locations, and fix suggestions.

### Implementation
```typescript
// In workflow or script
const logs = await collectPRLogs(octokit, owner, repo, prNumber);
const analysis = analyzePRFailures(logs);

await octokit.rest.issues.createComment({
  owner,
  repo,
  issue_number: prNumber,
  body: `
## 🔍 CI Failure Analysis

**Failed Runs:** ${analysis.summary.totalFailedRuns}
**Failed Jobs:** ${analysis.summary.totalFailedJobs}
**Total Errors:** ${analysis.summary.totalErrors}

### Top Errors
${analysis.topErrors.map(e => `- **${e.type}** (${e.count}): ${e.examples[0]}`).join('\n')}

### Recommendations
${analysis.recommendations.map(r => `- ${r}`).join('\n')}

### Detailed Logs
Download logs: \`npm run download-logs -- --run-id <id>\`
  `
});
```

### Use Cases
- **Immediate visibility** → Developers see errors without leaving PR
- **Actionable guidance** → Recommendations help fix issues faster
- **Historical tracking** → Comments provide audit trail
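
The `analyzePRFailures` helper is not shown above. As a rough sketch of how its `topErrors` list might be derived, the function below groups parsed errors by type and keeps a few example messages per group. The `ParsedError` shape and the `summarizeTopErrors` name are assumptions made for illustration, not part of the existing scripts.

```typescript
// Hypothetical shape of one parsed log error; the real parser output may differ.
interface ParsedError {
  type: string;
  message: string;
}

interface TopError {
  type: string;
  count: number;
  examples: string[];
}

// Group errors by type, keep up to `maxExamples` distinct messages per type,
// and return the `limit` most frequent types, most frequent first.
function summarizeTopErrors(
  errors: ParsedError[],
  limit: number = 5,
  maxExamples: number = 3
): TopError[] {
  const byType = new Map<string, TopError>();
  for (const { type, message } of errors) {
    const entry = byType.get(type) ?? { type, count: 0, examples: [] };
    entry.count += 1;
    if (entry.examples.length < maxExamples && entry.examples.indexOf(message) === -1) {
      entry.examples.push(message);
    }
    byType.set(type, entry);
  }
  return Array.from(byType.values())
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}
```

Each entry exposes `type`, `count`, and `examples`, which matches how the comment template above reads `analysis.topErrors`.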

---

## 3. GitHub Check Runs with Detailed Output

### Pattern
Create custom status checks with rich output (annotations, images, links) visible in PR checks tab.

### Implementation
```typescript
// Create check run with detailed output
await octokit.rest.checks.create({
  owner,
  repo,
  name: 'Failure Analysis',
  head_sha: pr.head.sha,
  status: 'completed',
  conclusion: 'failure',
  output: {
    title: `${errors.length} errors found`,
    summary: errorSummary,
    text: fullErrorDetails,
    // The Checks API accepts at most 50 annotations per request
    annotations: errors.slice(0, 50).map(error => ({
      path: error.file,
      start_line: error.line,
      end_line: error.line,
      annotation_level: 'failure',
      message: error.message,
      title: `${error.type}: ${error.file}`
    }))
  }
});
```

### Benefits
- **Rich annotations** → Clickable file links, line numbers
- **Visual indicators** → Check marks show in PR status
- **Detailed output** → Expandable sections in GitHub UI
- **Can block merge** → Required checks prevent bad merges

### Use Cases
- **Blocking failures** → Prevent merge until fixed
- **Visual feedback** → See all errors at a glance
- **File-level context** → Annotations point to exact locations
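
The Checks API accepts at most 50 annotations per request, so a run with more errors needs several calls: the first batch goes out with the check run creation, and later batches are appended through check run updates. A minimal sketch of the batching step follows; `chunkAnnotations` is a hypothetical helper name.

```typescript
// The Checks API accepts at most 50 annotations per request.
const MAX_ANNOTATIONS_PER_REQUEST = 50;

// Split a list into consecutive batches of at most `size` items.
function chunkAnnotations<T>(
  items: T[],
  size: number = MAX_ANNOTATIONS_PER_REQUEST
): T[][] {
  if (size < 1) {
    throw new RangeError('size must be at least 1');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With the batches in hand, the first one can be sent with `checks.create` and the rest with `checks.update` on the same check run, since annotations sent in updates are appended to the existing ones.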

---

## 4. Automatic Issue Creation for Recurring Failures

### Pattern
Detect patterns (same error across multiple runs) and create GitHub issues automatically.

### Implementation
```typescript
// In scheduled workflow or failure handler
const recentFailures = await getRecentFailures(octokit, owner, repo, { days: 7 });
const patterns = detectFailurePatterns(recentFailures);

for (const pattern of patterns) {
  if (pattern.count >= 3) { // Same error 3+ times
    await octokit.rest.issues.create({
      owner,
      repo,
      title: `🔴 Recurring Failure: ${pattern.errorType}`,
      body: `
## Recurring Failure Detected

**Error Type:** ${pattern.errorType}
**Occurrences:** ${pattern.count} times in last 7 days
**First Seen:** ${pattern.firstSeen}
**Latest:** ${pattern.latest}

### Error Pattern
\`\`\`
${pattern.errorMessage}
\`\`\`

### Affected Files
${pattern.files.map(f => `- ${f}`).join('\n')}

### Recommendations
${pattern.recommendations.map(r => `- ${r}`).join('\n')}

### Related Runs
${pattern.runIds.map(id => `- Run #${id}`).join('\n')}
      `,
      labels: ['bug', 'ci-failure', 'automated']
    });
  }
}
```

### Use Cases
- **Proactive tracking** → Catch recurring issues before they become blockers
- **Pattern detection** → Identify systemic problems
- **Historical context** → Issues track failure history
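
`detectFailurePatterns` is not defined in this document. One way it could work is to normalize volatile details in each error message and group failures that share the same normalized text. The sketch below assumes a `FailureRecord` shape and function names that are purely illustrative, and it covers only a subset of the pattern fields used in the issue template.

```typescript
// Hypothetical record of one failure; field names are assumptions.
interface FailureRecord {
  runId: number;
  errorType: string;
  errorMessage: string;
  timestamp: string; // ISO 8601 in UTC, so string comparison orders by time
}

interface FailurePattern {
  errorType: string;
  errorMessage: string;
  count: number;
  firstSeen: string;
  latest: string;
  runIds: number[];
}

// Replace digit runs so messages that differ only in numbers share a key.
function normalizeMessage(message: string): string {
  return message.replace(/\d+/g, '<n>').trim();
}

// Group failures by error type and normalized message, most frequent first.
function groupFailurePatterns(failures: FailureRecord[]): FailurePattern[] {
  const patterns = new Map<string, FailurePattern>();
  for (const f of failures) {
    const normalized = normalizeMessage(f.errorMessage);
    const key = `${f.errorType}::${normalized}`;
    const existing = patterns.get(key);
    if (!existing) {
      patterns.set(key, {
        errorType: f.errorType,
        errorMessage: normalized,
        count: 1,
        firstSeen: f.timestamp,
        latest: f.timestamp,
        runIds: [f.runId],
      });
      continue;
    }
    existing.count += 1;
    if (f.timestamp < existing.firstSeen) existing.firstSeen = f.timestamp;
    if (f.timestamp > existing.latest) existing.latest = f.timestamp;
    if (existing.runIds.indexOf(f.runId) === -1) existing.runIds.push(f.runId);
  }
  return Array.from(patterns.values()).sort((a, b) => b.count - a.count);
}
```

Normalizing only digits is a deliberately simple choice; messages that embed paths or hashes would need additional rules before they group together.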

---

## 5. Artifact Uploads for Later Analysis

### Pattern
Upload logs as workflow artifacts so they stay downloadable from the GitHub UI for a configurable retention period, without re-requesting short-lived log download URLs.

### Implementation
```yaml
- name: Upload failure logs
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: failure-logs-run-${{ github.run_id }}
    path: config/reports/logs/
    retention-days: 30
    if-no-files-found: ignore

- name: Upload error analysis
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: error-analysis-run-${{ github.run_id }}
    path: config/reports/error-analysis.json
    retention-days: 30
```

### Benefits
- **Persistent storage** → Logs stay available after the API's log download URL expires (1 minute)
- **Easy access** → Download from GitHub Actions UI
- **Long retention** → Keep logs for the configured `retention-days` (30 here), up to the repository's retention limit
- **Shareable** → Team members can download and analyze

### Use Cases
- **Forensic analysis** → Deep dive into failures later
- **Team collaboration** → Share logs for debugging
- **Historical reference** → Keep logs for pattern analysis

---

## 6. GitHub Discussions for Team Communication

### Pattern
Post failure summaries to GitHub Discussions for team visibility and async discussion.

### Implementation
```typescript
// Add a comment to an existing discussion.
// GitHub Discussions is exposed through the GraphQL API only;
// discussionId is the discussion's GraphQL node ID.
await octokit.graphql(
  `
  mutation($discussionId: ID!, $body: String!) {
    addDiscussionComment(input: { discussionId: $discussionId, body: $body }) {
      comment { url }
    }
  }
  `,
  {
    discussionId,
    body: `
## 🚨 CI Failure Alert

**Workflow:** ${workflowName}
**Run:** [${runId}](${runUrl})
**PR:** #${prNumber}

### Summary
- **Errors:** ${errorCount}
- **Failed Jobs:** ${failedJobs.length}
- **Error Types:** ${errorTypes.join(', ')}

### Quick Fix
Run: \`npm run download-logs -- --run-id ${runId} --parse-errors\`
  `
  }
);
```

### Use Cases
- **Team visibility** → Notify team without creating issues
- **Async discussion** → Team can discuss failures asynchronously
- **Cross-team awareness** → Keep stakeholders informed

---

## 7. Webhook Notifications (Slack, Discord, Email)

### Pattern
Send parsed error summaries to external services via webhooks.

### Implementation
```typescript
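// Send to Slack via an incoming webhook
const webhookUrl = process.env.SLACK_WEBHOOK_URL;
if (!webhookUrl) {
  throw new Error('SLACK_WEBHOOK_URL is not set');
}
await fetch(webhookUrl, {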
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: `🚨 CI Failure: ${workflowName}`,
    blocks: [
      {
        type: 'section',
        text: {
          type: 'mrkdwn',
          text: `*Workflow:* ${workflowName}\n*Run:* <${runUrl}|${runId}>\n*Errors:* ${errorCount}`
        }
      },
      {
        type: 'section',
        text: {
          type: 'mrkdwn',
          text: `\`\`\`${topErrors.join('\n')}\`\`\``
        }
      }
    ]
  })
});
```

### Use Cases
- **Real-time alerts** → Instant notifications to team channels
- **Mobile access** → Team sees failures on mobile
- **Integration** → Works with existing notification systems
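
Slack limits the text of a section block to 3000 characters, so a long error list can cause the webhook request to be rejected. A minimal sketch of a builder that truncates the error block to fit is shown below; `truncateText` and `buildErrorSection` are hypothetical helper names.

```typescript
// Slack limits the text of a section block to 3000 characters.
const SLACK_SECTION_TEXT_LIMIT = 3000;
const CODE_FENCE = '```';

// Shorten text to at most `limit` characters, marking the cut with a suffix.
// Assumes `limit` is larger than the suffix itself.
function truncateText(text: string, limit: number): string {
  const suffix = '\n... (truncated)';
  if (text.length <= limit) {
    return text;
  }
  return text.slice(0, Math.max(0, limit - suffix.length)) + suffix;
}

// Build a mrkdwn section that shows the errors as a code block within the limit.
function buildErrorSection(topErrors: string[]) {
  const overhead = 2 * (CODE_FENCE.length + 1); // two fences plus two newlines
  const body = truncateText(topErrors.join('\n'), SLACK_SECTION_TEXT_LIMIT - overhead);
  return {
    type: 'section',
    text: { type: 'mrkdwn', text: `${CODE_FENCE}\n${body}\n${CODE_FENCE}` },
  };
}
```

The returned object can take the place of the second entry in the `blocks` array of the webhook payload.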

---

## 8. GitHub Status Checks with Rich Output

### Pattern
Create status checks that show detailed error information inline in PR checks.

### Implementation
```yaml
- name: Create failure status check
  if: failure()
  uses: actions/github-script@v7
  with:
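    script: |
      // Requires `checks: write` permission for the job.
      // List this run's jobs and keep those with at least one failed step.
      const jobs = await github.paginate(github.rest.actions.listJobsForWorkflowRun, {
        owner: context.repo.owner,
        repo: context.repo.repo,
        run_id: context.runId
      });
      const failedJobs = jobs
        .map(j => ({
          name: j.name,
          failedSteps: (j.steps || []).filter(s => s.conclusion === 'failure')
        }))
        .filter(j => j.failedSteps.length > 0);

      // On pull_request events context.sha is the merge commit, so prefer the PR head
      const headSha = context.payload.pull_request?.head.sha ?? context.sha;

      await github.rest.checks.create({
        owner: context.repo.owner,
        repo: context.repo.repo,
        name: 'Failure Analysis',
        head_sha: headSha,
        status: 'completed',
        conclusion: 'failure',
        output: {
          title: 'CI failures detected',
          summary: `${failedJobs.length} job(s) with failed steps`,
          text: `## Failed Jobs\n\n${failedJobs.map(j => `- ${j.name}: ${j.failedSteps.map(s => s.name).join(', ')}`).join('\n')}`
        }
      });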
```

### Use Cases
- **Inline visibility** → See errors directly in PR checks
- **No navigation** → No need to click through to logs
- **Status integration** → Works with branch protection rules

---

## 9. Scheduled Failure Tracking (Nightly QA)

### Pattern
Run a nightly job that analyzes all failures from the previous 24 hours and creates a summary report.

### Implementation
```yaml
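name: Nightly Failure Analysis

on:
  schedule:
    - cron: '0 2 * * *'  # 02:00 UTC daily
  workflow_dispatch:

permissions:
  contents: read
  actions: read
  issues: write

jobs:
  analyze-daily-failures:
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # used by the gh CLI
    steps:
      - uses: actions/checkout@v4

      - name: Get failed runs from the last 24h
        run: |
          SINCE=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)
          FAILED_RUNS=$(gh run list --limit 200 --json databaseId,conclusion,createdAt \
            | jq -r --arg since "$SINCE" \
              '[.[] | select(.conclusion == "failure" and .createdAt >= $since) | .databaseId | tostring] | join(" ")')
          # Shell variables do not survive across steps, so persist them in GITHUB_ENV
          echo "FAILED_RUNS=$FAILED_RUNS" >> "$GITHUB_ENV"
          echo "REPORT_DATE=$(date -u +%Y-%m-%d)" >> "$GITHUB_ENV"

      - name: Download and analyze logs
        run: |
          npm ci
          for RUN_ID in $FAILED_RUNS; do
            npm run download-logs -- --run-id "$RUN_ID" --parse-errors
          done

      - name: Generate summary report
        run: |
          # Aggregate all errors
          # Create markdown report
          # Save to config/reports/daily-failure-summary-$REPORT_DATE.md

      - name: Create summary issue
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const date = process.env.REPORT_DATE;
            // Read the report as a string; the issue body must not be a Buffer
            const report = fs.readFileSync(`config/reports/daily-failure-summary-${date}.md`, 'utf8');
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `📊 Daily Failure Summary - ${date}`,
              body: report,
              labels: ['automated', 'ci-failure-tracking']
            });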
```

### Use Cases
- **Trend analysis** → Track failure patterns over time
- **Proactive alerts** → Catch issues before they escalate
- **Historical tracking** → Build failure history
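
The "Generate summary report" step above is left as comments. As one possible shape for it, the sketch below renders a markdown report with a per-workflow table from a list of failed runs. The `DailyFailure` record and the `renderDailySummary` name are assumptions for illustration.

```typescript
// Hypothetical record for one failed run; field names are assumptions.
interface DailyFailure {
  workflow: string;
  runId: number;
  errorCount: number;
}

// Render a markdown report with one table row per workflow,
// sorted by number of failed runs (descending).
function renderDailySummary(date: string, failures: DailyFailure[]): string {
  const perWorkflow = new Map<string, { runs: number; errors: number }>();
  for (const f of failures) {
    const row = perWorkflow.get(f.workflow) ?? { runs: 0, errors: 0 };
    row.runs += 1;
    row.errors += f.errorCount;
    perWorkflow.set(f.workflow, row);
  }

  const rows = Array.from(perWorkflow.entries())
    .sort((a, b) => b[1].runs - a[1].runs)
    .map(([workflow, r]) => `| ${workflow} | ${r.runs} | ${r.errors} |`);

  const header = [
    `# Daily Failure Summary - ${date}`,
    '',
    `**Failed runs:** ${failures.length}`,
    '',
    '| Workflow | Failed runs | Errors |',
    '| --- | --- | --- |',
  ];
  return header.concat(rows).join('\n');
}
```

The resulting string could be written to the dated report file and then used as the issue body in the final step.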

---

## 10. Security Failure Alerts

### Pattern
Special handling for security-related failures with enhanced notifications.

### Implementation
```typescript
// In security-gate workflow
if (securityFailures.length > 0) {
  // Create security advisory or high-priority issue
  await octokit.rest.issues.create({
    owner,
    repo,
    title: `🔒 Security Gate Failure: PR #${prNumber}`,
    body: `
## Security Issues Detected

**Severity:** ${highestSeverity}
**Issues Found:** ${securityFailures.length}

### Critical Findings
${securityFailures.filter(f => f.severity === 'CRITICAL').map(f => 
  `- ${f.file}:${f.line} - ${f.message}`
).join('\n')}

### Logs
Download detailed logs: \`npm run download-logs -- --run-id ${runId}\`
    `,
    labels: ['security', 'critical', 'automated']
  });
  
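  // Optional: create a draft repository security advisory.
  // The endpoint also requires a `vulnerabilities` array; `affectedPackages`
  // is a placeholder for entries describing the affected package(s).
  await octokit.rest.securityAdvisories.createRepositoryAdvisory({
    owner,
    repo,
    summary: `Security issues detected in PR #${prNumber}`,
    description: securityFailures.map(f => f.message).join('\n'),
    vulnerabilities: affectedPackages
  });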
}
```

### Use Cases
- **Security alerts** → Immediate notification for security failures
- **Compliance** → Track security issues for audits
- **Escalation** → High-priority issues get attention faster
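
The issue template reads a `highestSeverity` value that is not computed in the snippet. A minimal sketch of deriving it from the findings is shown below. The severity scale is an assumption (only `CRITICAL` appears above), as are the `SecurityFinding` shape and the `getHighestSeverity` name.

```typescript
// Assumed severity scale, lowest to highest; only 'CRITICAL' appears in the snippet above.
const SEVERITY_ORDER = ['LOW', 'MEDIUM', 'HIGH', 'CRITICAL'] as const;
type Severity = (typeof SEVERITY_ORDER)[number];

// Hypothetical shape of one security finding.
interface SecurityFinding {
  file: string;
  line: number;
  message: string;
  severity: Severity;
}

// Return the highest severity present, or undefined for an empty list.
function getHighestSeverity(findings: SecurityFinding[]): Severity | undefined {
  let best = -1;
  for (const finding of findings) {
    best = Math.max(best, SEVERITY_ORDER.indexOf(finding.severity));
  }
  return best >= 0 ? SEVERITY_ORDER[best] : undefined;
}
```

The same ranking could also decide whether the `critical` label is applied, rather than attaching it to every security issue.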

---

## 11. Performance Regression Tracking

### Pattern
Track performance failures and compare against baselines.

### Implementation
```typescript
// In performance-gate workflow
const performanceLogs = await downloadWorkflowLogs(runId);
const parsed = parseLogErrors(performanceLogs);
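// Parentheses matter here: without them, any message containing 'slow' matches regardless of type
const performanceErrors = parsed.errors.filter(e => e.type === 'build-error' &&
  (e.message.includes('performance') || e.message.includes('slow')));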

if (performanceErrors.length > 0) {
  const baseline = await getBaselineMetrics();
  const regression = calculateRegression(currentMetrics, baseline);
  
  await octokit.rest.issues.createComment({
    owner,
    repo,
    issue_number: prNumber,
    body: `
## ⚡ Performance Regression Detected

**Regression:** ${regression.percentage}% slower than baseline
**Baseline:** ${baseline.duration}ms
**Current:** ${currentMetrics.duration}ms

### Slow Operations
${performanceErrors.map(e => `- ${e.message}`).join('\n')}

### Recommendations
- Review performance bottlenecks
- Check for N+1 queries
- Profile slow operations
    `
  });
}
```

### Use Cases
- **Performance monitoring** → Catch regressions early
- **Baseline comparison** → Track against known good performance
- **Optimization guidance** → Suggest performance improvements
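
`calculateRegression` is called above but not defined. A plausible sketch, assuming both metrics objects expose a `duration` in milliseconds, computes the relative slowdown against the baseline. The `thresholdPercent` parameter and the `isRegression` flag are additions made for illustration.

```typescript
// Assumed metrics shape: a single duration in milliseconds.
interface Metrics {
  duration: number;
}

interface Regression {
  percentage: number;    // positive means slower than baseline
  isRegression: boolean; // true when the slowdown exceeds the threshold
}

// Compare current metrics against a baseline and report the relative change.
function calculateRegression(
  current: Metrics,
  baseline: Metrics,
  thresholdPercent: number = 10
): Regression {
  if (baseline.duration <= 0) {
    throw new RangeError('baseline.duration must be positive');
  }
  const raw = ((current.duration - baseline.duration) / baseline.duration) * 100;
  return {
    percentage: Math.round(raw * 10) / 10, // one decimal place for display
    isRegression: raw > thresholdPercent,
  };
}
```

For example, a baseline of 1000 ms and a current duration of 1250 ms gives a `percentage` of 25, which exceeds the default 10% threshold.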

---

## 12. Integration with Project Boards

### Pattern
Automatically add failure-tracking issues to project boards with appropriate status.

### Implementation
```typescript
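// After creating issue from failure
const { data: issue } = await octokit.rest.issues.create({...});

// Add the issue to a Projects (V2) board; contentId is the issue's GraphQL node ID
const added = await octokit.graphql<any>(`
  mutation($projectId: ID!, $contentId: ID!) {
    addProjectV2ItemById(input: { projectId: $projectId, contentId: $contentId }) {
      item { id }
    }
  }
`, { projectId, contentId: issue.node_id });
const itemId = added.addProjectV2ItemById.item.id;

// Set single-select field values, one mutation per field.
// inProgressOptionId and highPriorityOptionId are the option IDs
// taken from the project's field configuration.
for (const [fieldId, optionId] of [
  [statusFieldId, inProgressOptionId],
  [priorityFieldId, highPriorityOptionId]
]) {
  await octokit.graphql(`
    mutation($projectId: ID!, $itemId: ID!, $fieldId: ID!, $optionId: String!) {
      updateProjectV2ItemFieldValue(input: {
        projectId: $projectId
        itemId: $itemId
        fieldId: $fieldId
        value: { singleSelectOptionId: $optionId }
      }) {
        projectV2Item { id }
      }
    }
  `, { projectId, itemId, fieldId, optionId });
}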
```

### Use Cases
- **Workflow integration** → Failures tracked in project management
- **Visibility** → Team sees failures in kanban boards
- **Prioritization** → Auto-prioritize based on failure severity

---

## Recommended Implementation Priority

### Phase 1: High-Value, Low-Effort
1. **PR Comments** (Issue #1066 enhancement) - Immediate visibility
2. **Artifact Uploads** - Persistent log storage
3. **Status Checks** - Rich PR feedback

### Phase 2: Proactive Tracking
4. **Automatic Issue Creation** - For recurring failures
5. **Nightly Failure Analysis** - Trend tracking
6. **Failure Pattern Detection** - Identify systemic issues

### Phase 3: Integration & Notifications
7. **Webhook Notifications** - External alerts
8. **Project Board Integration** - Workflow management
9. **Security/Performance Alerts** - Specialized handling

---

## Example: Complete Failure Handler Workflow

```yaml
name: Comprehensive Failure Handler

on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]

jobs:
  handle-failure:
    if: github.event.workflow_run.conclusion == 'failure'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Download logs
        run: npm run download-logs -- --run-id ${{ github.event.workflow_run.id }} --parse-errors
      
      - name: Upload logs as artifact
        uses: actions/upload-artifact@v4
        with:
          name: failure-logs-${{ github.event.workflow_run.id }}
          path: config/reports/logs/
      
      - name: Post PR comment
        if: github.event.workflow_run.event == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            // Post error summary to PR
            
      - name: Create status check
        uses: actions/github-script@v7
        with:
          script: |
            // Create detailed check run
            
      - name: Check for recurring pattern
        uses: actions/github-script@v7
        with:
          script: |
            // Detect if same error happened before
            // Create issue if pattern detected
```

---

This provides multiple automated pathways for log collection and notification, allowing you to choose the right level of automation for different failure scenarios.

