# Phase 4 Plan: Assisted Replay Targeting (macOS, Local-Only)

## Scope

**Goal:** Make assisted replay viable by attaching **robust, re-identifiable targets** to user actions (especially clicks), using **UI semantics first** and **vision/OCR as fallback**.

**Constraints**

- macOS-only for Phase 4
- Local-only (no remote/GPU services)

**Non-goals (Phase 4)**

- Full autonomous replay of arbitrary workflows
- Cross-platform support (Windows/Linux)
- General "object detection" over arbitrary UI as the primary strategy

## Definitions

**Coordinate contract (critical):** all screen-space rectangles/points stored for replay must be in **global physical pixels**.

- Browser/UI coordinates (CSS pixels / points) must be converted to physical pixels by multiplying by the display's `scaleFactor` (recorded as `scale_factor`) before storage.
- Store provenance needed to debug transforms: `display_index`, `scale_factor`, `video_resolution`, `crop_region`, etc.
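
As an illustration of the contract, a conversion helper could look like the sketch below. It assumes the input rect is relative to its display's top-left corner in logical units; `display_origin_px` is a hypothetical provenance field (not listed above) holding that display's origin in global physical pixels.

```python
from dataclasses import dataclass
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h)


@dataclass(frozen=True)
class Provenance:
    display_index: int
    scale_factor: float
    # Assumption: top-left of this display in global physical pixels.
    display_origin_px: Tuple[int, int]


def logical_rect_to_global_px(rect, prov: Provenance) -> Rect:
    """Convert a display-relative logical rect (points / CSS px) to global physical pixels."""
    x, y, w, h = rect
    ox, oy = prov.display_origin_px
    s = prov.scale_factor
    # Scale first, then offset by the display origin.
    return (ox + round(x * s), oy + round(y * s), round(w * s), round(h * s))


# Example displays (values are illustrative only).
retina = Provenance(display_index=0, scale_factor=2.0, display_origin_px=(0, 0))
external = Provenance(display_index=1, scale_factor=1.0, display_origin_px=(2880, 0))

rect_on_retina = logical_rect_to_global_px((100, 50, 80, 24), retina)
rect_on_external = logical_rect_to_global_px((100, 50, 80, 24), external)
```

Storing the `Provenance` next to the converted rect keeps the transform reversible when debugging a misaligned target.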

## Target Bundle (Proposed)

Each click should be able to carry multiple identifiers, so replay has several independent ways to re-find the target and can try them in a fixed order:

```json
{
  "target": {
    "strategy": ["dom", "a11y", "ocr", "vision"],
    "dom": { "selector": "...", "role": "button", "name": "Save", "rect_px": [x, y, w, h] },
    "a11y": { "role": "AXButton", "name": "Save", "path": ["Window", "Toolbar", "Save"], "rect_px": [x, y, w, h] },
    "ocr": { "text": "Save" },
    "vision": { "type": "button", "rect_px": [x, y, w, h], "confidence": 0.72 }
  }
}
```

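Replay resolution order: `dom -> a11y -> ocr -> vision`. The resolver tries each strategy present in the bundle in this order and uses the first one that resolves.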
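
A minimal sketch of that ordering, assuming each strategy is backed by a callable that returns a rect in global physical pixels or `None`. The function name `resolve_in_order`, the resolver callables, and the `#save` selector are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

RESOLUTION_ORDER = ("dom", "a11y", "ocr", "vision")

Resolver = Callable[[Dict[str, Any]], Optional[List[int]]]


def resolve_in_order(
    target: Dict[str, Any], resolvers: Dict[str, Resolver]
) -> Optional[Dict[str, Any]]:
    """Try each strategy in the fixed order; return the first successful result."""
    for strategy in RESOLUTION_ORDER:
        payload = target.get(strategy)
        resolver = resolvers.get(strategy)
        if payload is None or resolver is None:
            continue  # strategy not captured, or no resolver available
        rect_px = resolver(payload)
        if rect_px is not None:
            return {"method": strategy, "rect_px": rect_px}
    return None


target = {
    "dom": {"selector": "#save"},
    "a11y": {"role": "AXButton", "name": "Save"},
}
resolvers = {
    "dom": lambda payload: None,  # pretend the selector no longer matches
    "a11y": lambda payload: [600, 340, 80, 40],
}
result = resolve_in_order(target, resolvers)
```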

---

## Task 1: Define Target Schema + Coordinate Contract

**User Story**

As a developer, I want a single "target bundle" schema that can represent DOM/a11y/OCR/vision targets in **global physical pixels**, so downstream systems (editor, resolver, replay) do not diverge.

**Acceptance Criteria (Gherkin)**

```gherkin
Scenario: Target bundle schema is available across the stack
  Given a click event is captured
  When the event is serialized into events.ndjson
  Then the event may include a "target" object
  And any "rect_px" or point fields are expressed in global physical pixels

Scenario: Coordinate contract is documented and enforced
  Given a UI component emits logical/CSS pixel coordinates
  When those coordinates are used for capture or replay
  Then they are converted to global physical pixels using scaleFactor before storage
  And the stored payload includes enough provenance to debug transforms
```
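
One way to enforce these criteria at write time is sketched below. The event field names (`type`, `t_ms`, `point_px`) and both function names are assumptions for illustration; only `target` and `rect_px` come from the schema above.

```python
import json
from typing import Any, Dict, Optional, Sequence


def validate_rects(target: Dict[str, Any]) -> None:
    """Raise ValueError unless every rect_px is four integers with non-negative size."""
    for strategy, payload in target.items():
        if not isinstance(payload, dict) or "rect_px" not in payload:
            continue  # e.g. the "strategy" list, or an OCR anchor without a rect
        rect = payload["rect_px"]
        ok = (
            isinstance(rect, (list, tuple))
            and len(rect) == 4
            and all(isinstance(v, int) and not isinstance(v, bool) for v in rect)
            and rect[2] >= 0
            and rect[3] >= 0
        )
        if not ok:
            raise ValueError(f"{strategy}.rect_px must be integer [x, y, w, h]: {rect!r}")


def serialize_click(
    t_ms: int, point_px: Sequence[int], target: Optional[Dict[str, Any]] = None
) -> str:
    """Return one NDJSON line for a click; the target object is optional."""
    event: Dict[str, Any] = {"type": "click", "t_ms": t_ms, "point_px": list(point_px)}
    if target is not None:
        validate_rects(target)
        event["target"] = target
    return json.dumps(event, separators=(",", ":"))


line = serialize_click(
    1200,
    (640, 360),
    {"a11y": {"role": "AXButton", "name": "Save", "rect_px": [600, 340, 80, 40]}},
)
```

Rejecting non-integer rects at serialization time catches logical-unit values that skipped the conversion step.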

**Testing Strategy**

- Automated
    - TypeScript: add/extend types so `target` is strongly typed (compile-time enforcement).
    - Rust/Python: lightweight unit tests for any pure "coordinate conversion / provenance" helpers.
- Manual
    - N/A (schema-only change).

**Commit Message**

- `feat(targeting): define target bundle schema and global-pixel coordinate contract`

---

## Task 2: Capture macOS Accessibility (AX) Target on Click (Sidecar)

**User Story**

As a user, when I click in a desktop app, I want ShowRunner to capture the **accessibility element identity** (role/name/bounds/path), so assisted replay can re-find the same control even if layout shifts.

**Acceptance Criteria (Gherkin)**

```gherkin
Scenario: A11y target is attached to click events
  Given the sidecar event tap is active
  When the user clicks on an accessibility-exposed control
  Then the emitted click event includes target.a11y with role and name
  And target.a11y includes a bounding rect in global physical pixels

Scenario: Capture degrades gracefully when AX is unavailable
  Given the sidecar does not have Accessibility permission
  When the user clicks
  Then the click event is still emitted
  And the event does not include target.a11y
  And the capture session is marked as limited with a reason
```
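
The two scenarios can be sketched as a pure function that takes an already-captured descriptor, so it is testable without AX calls. The `ax_trusted` flag is assumed to come from a permission check in the sidecar (for example macOS `AXIsProcessTrusted`); `limited_reasons` and the reason string are hypothetical names.

```python
from typing import Any, Dict, Optional, Sequence

REASON_NO_AX = "accessibility_permission_missing"  # hypothetical reason code


def build_click_event(
    point_px: Sequence[int],
    ax_trusted: bool,
    ax_descriptor: Optional[Dict[str, Any]],
    session: Dict[str, Any],
) -> Dict[str, Any]:
    """Build a click event, attaching target.a11y only when AX data is available."""
    event: Dict[str, Any] = {"type": "click", "point_px": list(point_px)}
    if not ax_trusted:
        # Degrade gracefully: keep the click, mark the session as limited once.
        reasons = session.setdefault("limited_reasons", [])
        if REASON_NO_AX not in reasons:
            reasons.append(REASON_NO_AX)
        return event
    if ax_descriptor is not None:
        event["target"] = {
            "a11y": {
                "role": ax_descriptor["role"],
                "name": ax_descriptor.get("name", ""),
                "path": list(ax_descriptor.get("path", [])),
                "rect_px": list(ax_descriptor["rect_px"]),
            }
        }
    return event


session: Dict[str, Any] = {}
mock_descriptor = {
    "role": "AXButton",
    "name": "Save",
    "path": ["Window", "Toolbar", "Save"],
    "rect_px": [600, 340, 80, 40],
}
with_ax = build_click_event((640, 360), True, mock_descriptor, session)
without_ax = build_click_event((640, 360), False, None, session)
```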

**Testing Strategy**

- Automated
    - Unit test for the shape of the emitted event payload (no AX calls; mock the descriptor).
    - Lint/type checks for sidecar code.
- Manual (macOS)
    - Grant Accessibility permission, record clicks in: Finder, Safari, VS Code.
    - Confirm `events.ndjson` click lines include `target.a11y.*`.
    - Revoke permission, confirm capture continues and "limited" reason is recorded.

**Commit Message**

- `feat(sidecar): attach AXUIElement target metadata to click events`

---

## Task 3: Normalize DOM Targets Into the Same Bundle (Browser Capture)

**User Story**

As a user, when I click in the browser and DOM semantics are available, I want the captured target to include selector/role/name so replay prefers DOM over fallback strategies.

**Acceptance Criteria (Gherkin)**

```gherkin
Scenario: DOM events map into target.dom
  Given a dom_action event is ingested from the extension
  When it is written to events.ndjson
  Then it includes a target.dom payload (selector/role/name when present)

Scenario: DOM targets are preferred during resolution
  Given an action has both target.dom and target.a11y
  When the resolver searches for the target
  Then it attempts DOM resolution first
```
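
A sketch of the normalization step, assuming the extension sends a flat payload with optional `selector`, `role`, and `name` keys plus an `action` field. That input shape is a guess for illustration, not the extension's actual format.

```python
from typing import Any, Dict

DOM_FIELDS = ("selector", "role", "name")


def normalize_dom_action(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw dom_action payload into an event carrying target.dom."""
    # Copy only the semantic fields that are present and non-empty.
    dom = {key: raw[key] for key in DOM_FIELDS if raw.get(key)}
    event: Dict[str, Any] = {"type": "dom_action", "action": raw.get("action", "click")}
    if dom:
        event["target"] = {"dom": dom}
    return event


full = normalize_dom_action(
    {"action": "click", "selector": "button.save", "role": "button", "name": "Save"}
)
partial = normalize_dom_action({"action": "click", "selector": "div.row", "name": ""})
empty = normalize_dom_action({"action": "click"})
```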

**Testing Strategy**

- Automated
    - Unit test for DOM event normalization/mapping.
- Manual
    - Record a browser workflow with the extension enabled.
    - Inspect `events.ndjson` to confirm `target.dom` fields appear.

**Commit Message**

- `feat(dom): normalize browser semantics into target bundle`

---

## Task 4: Editor Surfacing (Badges + Inspector)

**User Story**

As a user, I want to see whether a click has a robust target (`DOM`, `A11Y`, `OCR`) so I can trust replay and quickly diagnose when capture was "coordinate-only".

**Acceptance Criteria (Gherkin)**

```gherkin
Scenario: Editor displays targeting method badge per action
  Given a session contains click actions with target metadata
  When I view the editor timeline/steps
  Then each action shows a badge indicating which strategies are available (DOM/A11Y/OCR)

Scenario: Editor provides a target inspector for debugging
  Given a click action has target metadata
  When I open the inspector
  Then I can view the target bundle and coordinate provenance
```
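
The badge logic is simple enough to test as a pure function. The sketch below derives badges from the payload; the `COORD` label for coordinate-only clicks is a hypothetical name.

```python
from typing import Any, Dict, List

# Payload key -> badge label, in display order.
BADGE_LABELS = (("dom", "DOM"), ("a11y", "A11Y"), ("ocr", "OCR"))


def badges_for_action(action: Dict[str, Any]) -> List[str]:
    """Return the badges to render for one action in the timeline."""
    target = action.get("target") or {}
    badges = [label for key, label in BADGE_LABELS if target.get(key)]
    # No semantic target at all: the click was captured by coordinates only.
    return badges or ["COORD"]


rich = badges_for_action(
    {"type": "click", "target": {"a11y": {"role": "AXButton"}, "ocr": {"text": "Save"}}}
)
bare = badges_for_action({"type": "click", "point_px": [640, 360]})
```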

**Testing Strategy**

- Automated
    - Component test (or unit test) for badge rendering based on payload.
    - TypeScript build.
- Manual
    - Record a session with browser + a11y targets and verify badges/inspector.

**Commit Message**

- `feat(editor): show target strategy badges and target inspector`

---

## Task 5: OCR Fallback (Local) for "Text-Anchor" Targets

**User Story**

As a user, when a11y/DOM are missing or weak, I want ShowRunner to derive a usable text anchor (OCR) around click/step moments so assisted replay can still guide me.

**Acceptance Criteria (Gherkin)**

```gherkin
Scenario: OCR enrichment is available for clicks/markers
  Given a session has recorded video
  When OCR enrichment runs for sampled timestamps
  Then the system produces an OCR text anchor for those moments
  And the editor can show OCR-derived labels/targets when available

Scenario: OCR is optional and does not break the editor
  Given OCR tooling is not installed/available
  When the user opens the editor
  Then the editor still loads the session
  And OCR enrichment is skipped with a clear debug signal
```
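
The sketch below shows one possible shape for the pure normalization step and the "unavailable" path. It assumes the OCR tool yields boxes with `text`, `rect_px`, and `confidence`; that shape, the `run_ocr` callable, and the `ocr_unavailable` signal are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def pick_text_anchor(
    ocr_boxes: List[Dict[str, Any]], click_px: Sequence[int], min_confidence: float = 0.5
) -> Optional[Dict[str, str]]:
    """Pick the confident OCR box closest to the click (distance 0 if it contains it)."""
    cx, cy = click_px
    best_text, best_dist = None, None
    for box in ocr_boxes:
        text = box.get("text", "").strip()
        if not text or box.get("confidence", 0.0) < min_confidence:
            continue
        x, y, w, h = box["rect_px"]
        dx = max(x - cx, 0, cx - (x + w))
        dy = max(y - cy, 0, cy - (y + h))
        dist = dx * dx + dy * dy
        if best_dist is None or dist < best_dist:
            best_text, best_dist = text, dist
    return None if best_text is None else {"text": best_text}


def enrich_with_ocr(
    frame: Any, click_px: Sequence[int], run_ocr: Optional[Callable[[Any], list]] = None
) -> Tuple[Optional[Dict[str, str]], Optional[str]]:
    """Return (anchor, debug_signal); skip cleanly when no OCR backend is present."""
    if run_ocr is None:
        return None, "ocr_unavailable"
    return pick_text_anchor(run_ocr(frame), click_px), None


boxes = [
    {"text": "File", "rect_px": [0, 0, 40, 20], "confidence": 0.95},
    {"text": "Save", "rect_px": [100, 100, 60, 24], "confidence": 0.9},
    {"text": "S@ve", "rect_px": [110, 105, 20, 10], "confidence": 0.2},
]
anchor = pick_text_anchor(boxes, (120, 110))
```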

**Testing Strategy**

- Automated
    - Unit test for OCR result parsing/normalization (pure functions).
- Manual
    - With OCR installed: verify OCR anchors appear for an Electron app with weak a11y labels.
    - Without OCR installed: verify no crashes and a graceful "unavailable" path.

**Commit Message**

- `feat(ocr): add local OCR fallback anchors for click targets`

---

## Task 6: Global-Pixel -> Video-Pixel Projection (for Overlays/Validation)

**User Story**

As a user, I want click/target overlays to line up with the recorded video (including region crops), so the editor and ghost overlay can highlight the right place.

**Acceptance Criteria (Gherkin)**

```gherkin
Scenario: Click targets project correctly into video coordinates
  Given a region-recorded session includes crop_region metadata
  When projecting a global pixel point into the recorded video
  Then the computed video pixel coordinate falls within the video bounds
  And the overlay renders aligned with the content

Scenario: Projection is consistent across DPI and multi-display
  Given a Retina display or mixed-DPI setup
  When projecting points/rects
  Then the overlay aligns without systematic offset
```
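
The projection can be sketched as follows, assuming `crop_region` is `(x, y, w, h)` in global physical pixels and `video_resolution` is `(width, height)` of the recorded video. For a full-display recording, the crop region is taken to be the display's bounds. Function names are hypothetical.

```python
from typing import Sequence, Tuple


def project_point(
    point_px: Sequence[float], crop_region: Sequence[float], video_resolution: Sequence[float]
) -> Tuple[float, float, bool]:
    """Map a global physical pixel point to video pixels; also report if it is in bounds."""
    px, py = point_px
    cx, cy, cw, ch = crop_region
    vw, vh = video_resolution
    if cw <= 0 or ch <= 0:
        raise ValueError("crop_region must have positive width and height")
    # Translate into crop-local coordinates, then scale to the video size.
    vx = (px - cx) * vw / cw
    vy = (py - cy) * vh / ch
    inside = 0 <= vx < vw and 0 <= vy < vh
    return vx, vy, inside


def project_rect(
    rect_px: Sequence[float], crop_region: Sequence[float], video_resolution: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Project a global rect: move the origin like a point, scale the size only."""
    x, y, w, h = rect_px
    vx, vy, _ = project_point((x, y), crop_region, video_resolution)
    _, _, cw, ch = crop_region
    vw, vh = video_resolution
    return vx, vy, w * vw / cw, h * vh / ch


# Region recording: an 800x600 crop encoded as a 400x300 video.
region = project_point((600, 400), (200, 100, 800, 600), (400, 300))
# Full-display recording: a 2880x1800 display encoded at 1440x900.
full = project_point((1440, 900), (0, 0, 2880, 1800), (1440, 900))
```

A point outside the crop region yields `inside == False`, which an overlay can use to skip drawing instead of clamping to the edge.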

**Testing Strategy**

- Automated
    - Unit tests for projection math (pure functions) using fixtures:
        - full-display recording
        - region recording with crop_region
        - multi-display bounds
- Manual
    - Record region + full-display sessions; confirm click ripples/target boxes align.

**Commit Message**

- `feat(targeting): project global pixel targets into video space using crop metadata`

---

## Task 7: Local Target Resolver API (A11Y-first, OCR fallback)

**User Story**

As a user, I want ShowRunner to re-find the target control on my live desktop so assisted replay can draw a ghost overlay and validate actions.

**Acceptance Criteria (Gherkin)**

```gherkin
Scenario: Resolver returns a bounding box for an a11y target
  Given a click includes target.a11y
  When the resolver is asked to resolve it on the current desktop
  Then it returns a bbox in global physical pixels
  And it returns a confidence score and the method used ("a11y")

Scenario: Resolver falls back to OCR when AX resolution fails
  Given a click includes an OCR text anchor
  And a11y resolution fails
  When the resolver runs
  Then it attempts OCR-based resolution
  And it returns a bbox and the method ("ocr") if the anchor is found
```
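
A sketch of the matching and fallback logic against mocked "screen state" lists. The scoring weights, the `min_score` threshold, and the result keys (`bbox_px`, `confidence`, `method`) are illustrative assumptions, not a specified API.

```python
from typing import Any, Dict, List, Optional


def score_a11y(target: Dict[str, Any], candidate: Dict[str, Any]) -> float:
    """Score a live AX element against the recorded one (weights are arbitrary)."""
    if candidate.get("role") != target.get("role"):
        return 0.0
    points = 5  # role matches
    if candidate.get("name") == target.get("name"):
        points += 3
    if candidate.get("path") == target.get("path"):
        points += 2
    return points / 10


def resolve_on_desktop(
    target: Dict[str, Any],
    ax_elements: List[Dict[str, Any]],
    ocr_boxes: List[Dict[str, Any]],
    min_score: float = 0.6,
) -> Optional[Dict[str, Any]]:
    """Try a11y matching first; fall back to an exact (case-insensitive) OCR text match."""
    a11y = target.get("a11y")
    if a11y:
        best = max(ax_elements, key=lambda c: score_a11y(a11y, c), default=None)
        if best is not None and score_a11y(a11y, best) >= min_score:
            return {
                "bbox_px": best["rect_px"],
                "confidence": score_a11y(a11y, best),
                "method": "a11y",
            }
    ocr = target.get("ocr")
    if ocr:
        wanted = ocr["text"].strip().lower()
        for box in ocr_boxes:
            if box["text"].strip().lower() == wanted:
                return {
                    "bbox_px": box["rect_px"],
                    "confidence": box.get("confidence", 0.0),
                    "method": "ocr",
                }
    return None


recorded = {
    "a11y": {"role": "AXButton", "name": "Save", "path": ["Window", "Toolbar", "Save"]},
    "ocr": {"text": "Save"},
}
live_ax = [
    {"role": "AXButton", "name": "Cancel", "path": ["Window", "Toolbar", "Cancel"], "rect_px": [700, 90, 80, 30]},
    {"role": "AXButton", "name": "Save", "path": ["Window", "Toolbar", "Save"], "rect_px": [800, 90, 80, 30]},
]
live_ocr = [{"text": "save", "rect_px": [805, 95, 40, 16], "confidence": 0.88}]
```

Because matching uses role, name, and path rather than the recorded rect, a moved or resized window does not affect the a11y result.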

**Testing Strategy**

- Automated
    - Unit tests for resolver scoring/matching (mocked "screen state" providers).
- Manual (macOS)
    - Resolve targets in Finder/Safari/VS Code.
    - Move/resize windows and confirm resolver still finds targets by semantics.

**Commit Message**

- `feat(replay): add local target resolver (a11y-first, ocr fallback)`

---

## Task 8: Ghost Overlay + Step Validation (Phase 4 deliverable)

**User Story**

As a user, I want an on-screen highlight of "what to do next" and validation that I clicked the expected control, so assisted replay can guide me reliably.

**Acceptance Criteria (Gherkin)**

```gherkin
Scenario: Ghost overlay highlights resolved target
  Given a step contains a resolvable target
  When assisted replay mode is enabled
  Then a ghost overlay highlights the target bbox on screen

Scenario: Validation flags incorrect clicks
  Given the ghost overlay highlights a target
  When the user clicks outside the target bbox
  Then the system marks the step as not completed
  And provides a user-visible hint to retry
```
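
The overlay state machine can be exercised without any system capture. The sketch below assumes each step has already been resolved to a bbox in global physical pixels; the class name `AssistedReplay` and the hint text are hypothetical.

```python
from typing import List, Optional, Sequence


def point_in_rect(point_px: Sequence[int], rect_px: Sequence[int]) -> bool:
    """True if the point lies inside the [x, y, w, h] rect (right/bottom edges exclusive)."""
    px, py = point_px
    x, y, w, h = rect_px
    return x <= px < x + w and y <= py < y + h


class AssistedReplay:
    """Tracks the current step, the bbox to highlight, and the retry hint."""

    def __init__(self, step_bboxes: List[Sequence[int]]) -> None:
        self.step_bboxes = list(step_bboxes)
        self.index = 0
        self.hint: Optional[str] = None

    @property
    def done(self) -> bool:
        return self.index >= len(self.step_bboxes)

    def highlight(self) -> Optional[Sequence[int]]:
        """Bbox the ghost overlay should draw, or None when all steps are complete."""
        return None if self.done else self.step_bboxes[self.index]

    def on_click(self, point_px: Sequence[int]) -> bool:
        """Validate a click; advance on success, otherwise keep the step and set a hint."""
        bbox = self.highlight()
        if bbox is None:
            return False
        if point_in_rect(point_px, bbox):
            self.index += 1
            self.hint = None
            return True
        self.hint = "Click inside the highlighted area to continue."
        return False


replay = AssistedReplay([[600, 340, 80, 40], [100, 100, 60, 24]])
missed = replay.on_click((10, 10))      # outside the first bbox
completed = replay.on_click((640, 360))  # inside the first bbox
```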

**Testing Strategy**

- Automated
    - UI unit tests for overlay state machine (no system capture).
- Manual (macOS)
    - End-to-end walkthrough with 5-10 steps across 2-3 apps.
    - Confirm highlight appears, updates per step, and validates click correctness.

**Commit Message**

- `feat(replay): implement ghost overlay highlighting and click validation`
