# ShowRunner System Design

Low-level documentation for the core systems that power workflow capture.

---

## Table of Contents

1. [Recording Pipeline](#recording-pipeline)
2. [Event Capture System](#event-capture-system)
3. [Session Storage](#session-storage)
4. [Window Management](#window-management)
5. [Python Sidecar](#python-sidecar)
6. [AI Analysis Pipeline](#ai-analysis-pipeline)
7. [Editor Heuristics & Export UX](#editor-heuristics--export-ux)

---

## Recording Pipeline

### Overview

Recording uses macOS native tools rather than direct FFmpeg capture, which proved more reliable.

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ screencapture│ ──► │   .mov      │ ──► │   ffmpeg    │ ──► .mp4
│  (native)   │     │  (raw)      │     │  (convert)  │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
                    ┌─────────────┐     ┌─────────────┐
                    │  audio.m4a  │ ──► │   ffmpeg    │ ──► final.mp4
                    │   (mic)     │     │   (mux)     │
                    └─────────────┘     └─────────────┘
```

### Screen Capture (`src-tauri/src/recording.rs`)

**Why screencapture over FFmpeg avfoundation?**

- FFmpeg avfoundation has linking issues on some macOS versions
- screencapture is more reliable and handles permissions better
- Output is .mov (QuickTime native), converted to .mp4 for web compatibility

**Capture Modes:**

| Mode    | Flag            | Description                            |
| ------- | --------------- | -------------------------------------- |
| Display | (default)       | Captures primary display               |
| Window  | `-l <windowid>` | Captures specific window by CGWindowID |
| Region  | `-R x,y,w,h`    | Captures rectangular region            |

### Region Capture (macOS): Coordinate Spaces + Persistent Frame

Region capture is easy to get subtly wrong on macOS (especially on Retina / mixed-DPI setups) because
different layers report coordinates in different units.

**Coordinate spaces involved**

- **Region selector UI (React)**: mouse events are **CSS pixels** (logical points).
- **CoreGraphics (Rust)**:
    - `CGEventTap` click locations are **global display pixels**.
    - `CGDisplay` bounds/coordinates are in the global CoreGraphics space (treat as pixels for capture math).
- **`screencapture`** expects consistent coordinates; mixing points vs pixels produces **offset crops** and
  can yield **black/blank videos** (cropping into the wrong part of the captured display).

**Rule: use physical pixels end-to-end**

We normalize to **global physical pixels** for the entire region pipeline:

1. `src/views/RegionSelectorView.tsx` converts from CSS pixels to physical pixels using
   `window.scaleFactor()` and `window.innerPosition()`, then emits `{ x, y, width, height }` in global pixels.
2. `src-tauri/src/commands.rs` resolves the chosen region to a `screencapture -D <index>` display and stores
   `metadata.crop_region` as a local-to-display pixel crop rectangle.
3. Recording captures the full display via `screencapture -D <index>`, then FFmpeg applies the crop during
   MOV -> MP4 conversion:

```bash
ffmpeg -i recording.mov -vf "crop=w:h:x:y" ... recording.mp4
```

This avoids `screencapture -R` inconsistencies across multi-monitor setups and makes the final MP4
deterministic.

**Persistent region frame (UX)**

The region selector window is kept visible as a frame + outside-dim mask so users can continue working and
still see what will be recorded:

- The window is made click-through (`set_ignore_cursor_events`) after confirming the region.
- The blue frame is drawn using an **outline outside** the crop area so it does not get baked into the
  cropped recording.

If you ever see region recordings come out offset or black again, the first thing to check is whether any
part of the pipeline has slipped back to mixing logical/CSS points with physical pixels.

**screencapture flags used:**

```bash
screencapture -v -C -k [-l windowid | -R x,y,w,h] output.mov
```

- `-v` — Video recording mode
- `-C` — Capture cursor
- `-k` — Don't show click animation (we render our own)
- `-l` — Window capture by ID
- `-R` — Region capture with bounds

**MOV → MP4 Conversion:**

```bash
ffmpeg -i recording.mov \
  -c:v libx264 -preset fast -crf 23 \
  -pix_fmt yuv420p -movflags +faststart \
  -y recording.mp4
```

**Fallback:** If FFmpeg conversion fails, the .mov is renamed to .mp4 (browsers can usually play it).

### Audio Capture

**Microphone recording** (parallel to screen capture):

```bash
ffmpeg -f avfoundation -i :0 \
  -acodec aac -b:a 128k -ar 44100 -ac 1 \
  -y audio.m4a
```

**Audio muxing** (post-recording):

```bash
ffmpeg -i recording.mp4 -i audio.m4a \
  -map 0:v:0 -map 1:a:0 \
  -c:v copy -c:a aac \
  -movflags +faststart \
  -y final.mp4
```

**Duration validation:** Muxed output is validated — if duration is <80% of original video, mux is rejected and original video preserved.

---

## Event Capture System

Events are captured from multiple sources and written to `events.ndjson`.

### Architecture

```
┌────────────────────┐     ┌────────────────────┐     ┌──────────────────────────┐
│   CGEventTap       │     │   PyObjC Quartz    │     │   Chrome Extension (MV3)  │
│   (Rust/clicks)    │     │   (Python/a11y)    │     │   (DOM semantics)         │
└─────────┬──────────┘     └─────────┬──────────┘     └────────────┬─────────────┘
          │                          │                             │
          ▼                          ▼                             ▼
   clicks.json              events.ndjson                 /api/events/append
   (session dir)            (session dir)                       │
                                                                 ▼
                                                          events.ndjson
```

### Browser DOM Semantics (Chrome Extension)

For browser workflows, ShowRunner can ingest **DOM-level semantic events** via a lightweight
Chrome extension that forwards `dom_*` events to the sidecar `/api/events/append` endpoint while
recording is active. See:

- `docs/SHOWRUNNER_DOM_CAPTURE_EXTENSION.md`

### Click Capture (`src-tauri/src/click_capture.rs`)

**Technology:** CGEventTap (Core Graphics event tap)

**How it works:**

1. Creates a CGEventTap listening for `LeftMouseDown` and `RightMouseDown`
2. Runs on a dedicated thread with its own CFRunLoop
3. Captures click position as percentage of recording dimensions (0-100)
4. Stores clicks with millisecond timestamps relative to recording start

**Event tap setup:**

```rust
CGEventTapCreate(
    CGEventTapLocation::HID,           // Hardware level
    CGEventTapPlacement::HeadInsertEventTap,
    CGEventTapOptions::ListenOnly,     // Passive observer
    event_mask,                        // LeftMouseDown | RightMouseDown
    callback,
    user_info
)
```

**Click event format:**

```json
{
    "timestamp_ms": 1234,
    "x": 45.2,
    "y": 67.8,
    "button": "left"
}
```

**Permissions:** Requires Accessibility permission. If tap creation fails, capture continues without click tracking.

### Keyboard/Accessibility Capture (`python_sidecar/capture/accessibility.py`)

**Technology:** PyObjC (NSEvent global monitor via Quartz)

**EventTapWatcher class:**

- Runs on daemon thread with Quartz event tap
- Captures keyboard events with modifier keys
- Polls frontmost window context every 80ms
- Emits `app_focus` events on window changes

**Modifier flags:**

```python
MODIFIER_FLAGS = {
    "cmd": 1 << 20,   # NSEventModifierFlagCommand
    "alt": 1 << 19,   # NSEventModifierFlagOption
    "ctrl": 1 << 18,  # NSEventModifierFlagControl
    "shift": 1 << 17, # NSEventModifierFlagShift
}
```

**Special keycodes:**

```python
SPECIAL_KEYCODES = {
    36: "enter", 48: "tab", 49: "space",
    51: "backspace", 53: "escape",
    123: "left", 124: "right", 125: "down", 126: "up"
}
```

**Window context detection:**
Uses both `NSWorkspace.frontmostApplication()` and `CGWindowListCopyWindowInfo()` for reliability. Quartz window ordering is preferred as it reflects actual topmost window.

**Event format:**

```json
{
    "type": "keypress",
    "ts_ms": 1234,
    "key": "<redacted>",
    "key_type": "alphanumeric",
    "redacted": true,
    "modifiers": ["cmd", "shift"],
    "app": "Visual Studio Code",
    "window": "main.tsx - showrunner"
}
```

### Privacy defaults

- Keystrokes are metadata-first by default.
- Alphanumeric key values are redacted as `"<redacted>"` unless explicitly configured otherwise.
- Special keys (e.g. `enter`, `backspace`, arrow keys) and modifier combos are still recorded for workflow reconstruction.

---

## Session Storage

### Directory Structure

```
~/Library/Application Support/dev.showrunner.app/sessions/
└── {uuid}/
    ├── metadata.json      # Session info, hotspots
    ├── recording.mp4      # Screen recording (immutable)
    ├── events.ndjson      # Append-only event log
    ├── clicks.json        # Click positions (from Rust)
    ├── steps.json         # Structured workflow steps
    ├── edits.json         # Non-destructive edit patches
    ├── audio.m4a          # Mic recording (optional)
    └── terminal.cast      # Terminal recording (optional)
```

### metadata.json

```json
{
    "session_id": "abc123",
    "created_at": "2024-01-15T10:30:00Z",
    "display": {
        "id": "1",
        "resolution": "2560x1440",
        "refresh_rate": 60
    },
    "duration_ms": 45000,
    "event_count": 127,
    "recording_path": "recording.mp4",
    "events_path": "events.ndjson",
    "audio_path": "audio.m4a",
    "title": "Deploy to production",
    "has_terminal": false,
    "git_branch": "main",
    "git_commit": "abc1234",
    "hotspots": []
}
```

### events.ndjson Format

**Line-delimited JSON** — each line is a standalone event:

```json
{"type": "app_focus", "ts_ms": 0, "app": "Terminal", "window": "zsh"}
{"type": "keypress", "ts_ms": 1234, "key": "<redacted>", "key_type": "alphanumeric", "redacted": true, "modifiers": ["cmd"], "app": "Finder", "window": "Downloads"}
{"type": "click", "ts_ms": 2456, "x": 45.2, "y": 67.8, "button": "left"}
{"type": "marker", "ts_ms": 5000, "label": "Open settings"}
{"type": "file_change", "ts_ms": 6789, "path": "src/main.ts", "action": "modified"}
{"type": "shortcut", "ts_ms": 7021, "key": "<redacted>", "modifiers": ["shift"], "redacted": true}
{"type": "mouse_scroll", "ts_ms": 7444, "direction": "down", "dx": 0, "dy": -2}
```

**Event types:**

| Type             | Source           | Fields                         |
| ---------------- | ---------------- | ------------------------------ |
| `app_focus`      | Python sidecar   | app, window                    |
| `keypress`       | Python sidecar   | key, modifiers, app, window    |
| `shortcut`       | Python sidecar   | key, modifiers, app, window    |
| `mouse_scroll`   | Python sidecar   | direction, dx, dy              |
| `click`          | Rust CGEventTap  | x, y, button                   |
| `dom_navigation` | Chrome extension | url, title, transition         |
| `dom_action`     | Chrome extension | action, url, target, value_len |
| `marker`         | User action      | label                          |
| `file_change`    | Python watchdog  | path, action                   |
| `command`        | Terminal capture | text, cwd                      |

### steps.json Format

```json
{
    "steps": [
        {
            "id": "step-1",
            "title": "Open Terminal",
            "start_ms": 0,
            "end_ms": 3000,
            "description": "Launch Terminal application",
            "actions": [{ "type": "click", "target": "Dock", "label": "Terminal" }]
        }
    ]
}
```

### Hotspot Schema

```typescript
interface Hotspot {
    id: string;
    frameStart: number; // Start frame
    frameEnd: number; // End frame
    x: number; // % of video width (0-100)
    y: number; // % of video height (0-100)
    width: number; // % of video width
    height: number; // % of video height
    style: "highlight" | "pulse" | "arrow";
    action:
        | { type: "tooltip"; text: string }
        | { type: "seek"; targetFrame: number }
        | { type: "url"; href: string; newTab: boolean };
}
```

---

## Window Management

### Window Labels

| Label               | Purpose                        | URL                           |
| ------------------- | ------------------------------ | ----------------------------- |
| `main`              | Library, Editor, RecordingView | `/`, `/editor/:id`, `/record` |
| `recording-overlay` | Floating recording controls    | `/overlay`                    |
| `region-selector`   | Fullscreen region selection    | `/region-selector`            |

### Overlay Window (`src-tauri/src/overlay.rs`)

**Characteristics:**

- Transparent background
- Always on top
- No title bar or decorations
- Positioned at top-center of primary display

**Sizing:**

| State         | Size (logical) |
| ------------- | -------------- |
| Pre-recording | 700 × 80       |
| Recording     | 340 × 64       |

**Positioning:**

```rust
let x = (screen_width - 700.0) / 2.0;  // Center horizontally
let y = 40.0;                           // 40px from top
window.set_position(LogicalPosition::new(x, y));
```

### Window Communication

Windows communicate via Tauri events:

```typescript
// From overlay to main window
import { emitTo } from "@tauri-apps/api/event";

await emitTo("main", "navigate-to-editor", { sessionId: "abc123" });
```

```typescript
// Main window listens
import { listen } from "@tauri-apps/api/event";

await listen("navigate-to-editor", async (event) => {
    const mainWindow = await WebviewWindow.getByLabel("main");
    await mainWindow?.show();
    navigate(`/editor/${event.payload.sessionId}`);
});
```

**Important:** Always use `WebviewWindow.getByLabel("main")` instead of `getCurrentWindow()` when targeting the main window from event handlers. `getCurrentWindow()` returns the window where the code executes, which may be the overlay.

---

## Python Sidecar

### Lifecycle

```
Tauri app starts
      │
      ▼
sidecar::start_sidecar()
      │
      ├── Find python_sidecar/ directory
      ├── Locate Python binary (venv or system)
      ├── Verify runtime deps (fastapi, uvicorn, pydantic)
      │
      ▼
spawn: python -m python_sidecar.main
      │
      ├── Stdout/stderr → /tmp/showrunner-sidecar.log
      │
      ▼
Wait for health check: GET http://127.0.0.1:8765/health
      │
      ▼
Sidecar ready
```

### Python Binary Resolution

Priority order:

1. `python_sidecar/.venv/bin/python` (project venv)
2. System `python3`

Before spawning, validates runtime dependencies:

```python
python -c "import fastapi,uvicorn,pydantic"
```

### API Endpoints

| Endpoint                | Method | Purpose                  |
| ----------------------- | ------ | ------------------------ |
| `/health`               | GET    | Health check             |
| `/api/events/start`     | POST   | Start event capture      |
| `/api/events/stop`      | POST   | Stop event capture       |
| `/api/export/markdown`  | POST   | Generate markdown docs   |
| `/api/export/mp4`       | POST   | Render MP4 with overlays |
| `/api/analysis/analyze` | POST   | AI step detection        |
| `/api/terminal/start`   | POST   | Start terminal capture   |
| `/api/terminal/stop`    | POST   | Stop terminal capture    |

### Event Capture Flow

```
POST /api/events/start
  body: { session_dir, project_path?, recording_start_ms? }
      │
      ▼
Open events.ndjson for append
      │
      ├── Start EventTapWatcher (keyboard/window context)
      ├── Start FSWatcher (file changes) [optional]
      │
      ▼
Events written to file with normalized timestamps
      │
      ▼
POST /api/events/stop
      │
      ▼
Close watchers, return event count
```

### Timestamp Normalization

All events are normalized to relative timestamps:

```python
raw_ts = event.get("timestamp_ms", 0)
normalized["ts_ms"] = max(0, raw_ts - recording_start_ms)
```

---

## AI Analysis Pipeline

### Overview

```
events.ndjson
      │
      ▼
RecordingAnalyzer.analyze_recording()
      │
      ├── Load events (truncate if >200)
      ├── Format with STEP_DETECTION_PROMPT
      │
      ▼
Claude API (claude-3-5-sonnet)
      │
      ▼
JSON response with detected steps
      │
      ▼
steps.json
```

### RecordingAnalyzer (`python_sidecar/ai/analyzer.py`)

```python
class RecordingAnalyzer:
    def analyze_recording(self, events_path: str):
        events = load_ndjson(events_path)

        # Truncate to first 100 + last 100 if too many
        if len(events) > 200:
            events = events[:100] + events[-100:]

        prompt = STEP_DETECTION_PROMPT.format(events_json=events)

        response = self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4000,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )

        return json.loads(response.content[0].text)
```

### Step Detection Output

```json
{
    "steps": [
        {
            "id": "step-1",
            "title": "Open terminal",
            "start_ms": 0,
            "end_ms": 2500,
            "description": "User opens Terminal from Dock",
            "confidence": 0.92
        },
        {
            "id": "step-2",
            "title": "Navigate to project",
            "start_ms": 2500,
            "end_ms": 5000,
            "description": "cd into project directory",
            "confidence": 0.88
        }
    ],
    "summary": "This workflow demonstrates deploying a Node.js application"
}
```

### Future: Action Typing

Planned enhancement to enrich events with semantic targets:

```json
{
    "type": "click",
    "ts_ms": 1234,
    "x": 45.2,
    "y": 67.8,
    "target": {
        "role": "button",
        "label": "Submit",
        "bounds": [390, 210, 460, 260],
        "confidence": 0.72
    }
}
```

This requires accessibility API integration (app-dependent).

---

## Error Handling

### Recording Errors

```rust
pub enum AppError {
    RecordingCrashed { exit_code: i32, stderr: String },
    SessionNotFound { session_id: String },
    // ...
}
```

### Sidecar Errors

If Python sidecar fails to start:

1. Logs error to console
2. App continues in degraded mode (no sidecar-derived key/app/scroll events)
3. Recording still works (video + clicks from Rust)
4. Session is marked as capture-limited in metadata (`capture_limited`, `capture_limit_reason`)

### Graceful Degradation

| Component Failure   | Impact                                                                     |
| ------------------- | -------------------------------------------------------------------------- |
| Python sidecar down | No sidecar keyboard/app-focus/scroll events; video + clicks still captured |
| CGEventTap denied   | No click capture (video still works)                                       |
| FFmpeg missing      | MOV renamed to MP4 (usually still playable)                                |
| Anthropic API error | AI analysis fails, manual step creation still works                        |

---

## Editor Heuristics & Export UX

### Auto Steps / Generate Steps from Activity

`Auto Steps` in the editor is a **client-side heuristic pipeline** over loaded `events.ndjson` + `clicks.json`. It does not call backend analysis for this button.

Flow:

1. Load normalized markers and clicks for the session.
2. Infer activity segments from DOM semantics (when present), app focus, click density, scroll patterns, and optional OCR click hints.
3. Write generated `steps` into in-memory editor state (and persisted `steps.json` via normal autosave).

Operational note:

- You should not expect Rust-side terminal logs when pressing `Auto Steps`; it runs in the frontend.
- Editor now emits explicit UI feedback for Auto Steps success/failure (including generated counts).

### Export UX behavior

Exports are Tauri commands:

- `export_markdown(session_id, dest_path?)` -> creates `ShowRunner-<title>/README.md` (+ screenshots folder when present)
- `export_mp4(session_id, dest_path, options)` -> copies or renders chapter-card MP4

Editor UX:

- Export shows persistent success/failure feedback.
- Success includes concrete output path.
- Actions include copy path and reveal in file manager.
