# Current upstream support matrix

Related docs:
- [`../README.md`](../README.md)
- [`../AGENTS.md`](../AGENTS.md) (rebaselining and verification stack)
- [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md)
- [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
- [`ELECTRON.md`](ELECTRON.md)
- [`RELEASE.md`](RELEASE.md)
- [`REQUIREMENTS.md`](REQUIREMENTS.md)

## Purpose

This is the durable release-readiness checklist for the targeted upstream version (`agent-browser` version in `CAPABILITY_BASELINE.targetVersion`). It maps the canonical capability baseline in [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) to documentation, runtime handling, tests, and validation evidence. Update it whenever the baseline version or inventory changes.

## Maintainer refresh checklist

When upstream ships a new `agent-browser` or the inventory changes:

1. Edit [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) (`targetVersion`, `helpCommands`, `inventorySections`) using real `--help` output from the binary you intend to target (the file never shells out to `agent-browser`).
2. Align human prose and required tokens in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) outside the generated HTML-comment blocks.
3. Regenerate bounded blocks with `npm run docs -- command-reference write`, then run `npm run docs` (or `npm run docs -- command-reference check`).
4. Update the **Baseline checklist by inventory section** table below so each `CAPABILITY_BASELINE.inventorySections[].id` row still points at the right docs, code, tests, and status notes.
5. Re-run the gates in **Verification evidence** on a machine that matches release expectations (`pi`, `tmux`, model config for lifecycle) and replace the dated status cells with fresh outcomes.

## Audit result

- Target upstream: `agent-browser 0.27.0` (must match `CAPABILITY_BASELINE.targetVersion` in [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs)).
- Source of truth: `CAPABILITY_BASELINE.inventorySections` in the same file (stable `id` keys: `skills`, `core-commands`, `state-tabs-frames-dialogs`, `network-storage-artifacts-diagnostics`, `batch-auth-setup-ai`, `options-and-env`).
- Status: supported for the current wrapper contract.
- High-priority support gaps: none identified in the baseline audit.
- Post-`v0.2.29` review state: commits `eb55320` through `86abbfb` add browser guidance/smoke coverage plus `RQ-0086` click-probe reduction, `RQ-0087` same-snapshot form fill batching, `RQ-0088` current-ref fallback on locator misses, `RQ-0089` direct-upstream click mutation investigation, and `RQ-0090` stop-boundary/artifact-path guidance. Verification gates below were rerun on 2026-05-18 after those tasks landed. Constrained `job` (`RQ-0064`), the lightweight `qa` preset (`RQ-0065`), the experimental `sourceLookup` helper (`RQ-0066`), and the experimental `networkSourceLookup` helper (`RQ-0067`) are implemented; see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup), and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup). Reusable browser recipes (`RQ-0068`) are intentionally not adopted as a runtime surface; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).

## Verification evidence

Re-run the gates below before each release; this table records what the closure audit exercised.

| Gate | Evidence | Status |
| --- | --- | --- |
| Default local gate | `npm run verify` checks generated playbook drift, `tsc --noEmit`, unit/fake tests, generated command-reference blocks, and live command-reference sampling. | Pass on 2026-05-18 (`npm run verify`, `agent-browser 0.27.0` on `PATH`). |
| Real upstream contract | `npm run verify -- real-upstream` runs the localhost fixture matrix against the real installed `agent-browser` matching the baseline. | Pass on 2026-05-18 (`npm run verify -- real-upstream`). |
| Packaged Pi smoke | `npm run verify -- package-pi` validates package contents, loads exactly one packaged `agent_browser` tool, and executes fake-upstream `--version`. | Pass on 2026-05-18 (`npm run verify -- package-pi`). |
| `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with packaged Pi smoke (`verifySteps` `release` in [`scripts/project.mjs`](../scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits lifecycle, real-upstream, and benchmark modes—see [`RELEASE.md`](RELEASE.md#pre-release-checks). | Pass on 2026-05-18 (`npm run verify -- release`). `prepublishOnly` still needs a fresh run during actual publish. |
| Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, restart, `/resume`, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), and persisted spill reachability with a fake upstream on `PATH`. Default Pi model is `zai/glm-5.1`; default per-step wait is **180000 ms** (`DEFAULT_TIMEOUT_MS`); override model with `--model <id>` and waits with `--timeout-ms <ms>`. Passthrough flags in [`scripts/project.mjs`](../scripts/project.mjs): `--keep-artifacts`, `--model`, `--verbose`, and `--timeout-ms` plus a value (for example `npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --keep-artifacts --verbose --timeout-ms 600000`). | Pass on 2026-05-18 (`npm run verify -- lifecycle`). Treat any future unexplained red lifecycle gate as a release blocker. |
| Quick isolated Pi smoke | `pi --no-extensions -e .` from repo root; native `agent_browser` only. | Pass on 2026-05-18 for a fresh interactive tmux smoke: the agent opened `https://example.com`, waited for `Example Domain`, saved `/tmp/piab-isolated-smoke.png` with verified `image/png` artifact metadata, closed the browser session, and reported PASS. Broader historical coverage also includes version/help/skills, open/snapshot/click, eval stdin, batch stdin, screenshot, explicit session, `sessionMode: "fresh"`, network requests, console/errors, diff snapshot, stream status/disable, dashboard start/stop, and chat credential-failure pass-through during RQ-0055. |

## Baseline checklist by inventory section

| Baseline section | Baseline items | Documentation | Runtime handling | Test coverage | Validation status |
| --- | --- | --- | --- | --- | --- |
| Built-in skills | `skills list`, `skills get core`, `skills get core --full`, `skills get <name>`, `skills get electron`, `skills get slack`, `skills get dogfood`, `skills get vercel-sandbox`, `skills get agentcore`, `skills path [name]` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#built-in-skills), generated baseline block, README proof section, release docs. | `isStatelessInspectionCommand` keeps read-only `skills list` / `skills get` / `skills path` JSON inspection stateless while preserving thin upstream passthrough. | `test/agent-browser.runtime.test.ts`; `test/agent-browser.extension-validation.test.ts` skills/provider matrix; real-upstream inspection/skills group. | Supported. Real upstream covers `skills list`, `skills get core --full`, `skills path core`; fake matrix covers specialized skills. |
| Core page, element, navigation, and extraction commands | `open <url>`, `click <sel>`, `dblclick <sel>`, `type <sel> <text>`, `fill <sel> <text>`, `press <key>`, `keyboard type <text>`, `keyboard inserttext <text>`, `keydown Shift`, `keyup Shift`, `hover <sel>`, `focus <sel>`, `check <sel>`, `uncheck <sel>`, `select <sel> <val...>`, `drag <src> <dst>`, `upload <sel> <files...>`, `download <sel> <path>`, `scroll <dir> [px]`, `scrollintoview <sel>`, `wait <sel|ms>`, `screenshot [path]`, `screenshot --full`, `screenshot --annotate`, `pdf <path>`, `snapshot`, `eval <js>`, `connect <port|url>`, `close [--all]`, `back`, `forward`, `reload`, `pushstate <url>`, `get <what> [selector]`, `is <what> <selector>`, `find <locator> <value> <action>`, `mouse <action> [args]`, `set <setting> [value]` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#core-page-and-element-commands), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md), README quick start. | Thin upstream passthrough with wrapper-owned `--json`, managed-session planning, stale-ref guidance, artifact verification, page-change summaries, no-op scroll diagnostics, focused-combobox diagnostics, first-class `semanticAction` / `job` compile paths for upstream `select`, and redaction. | Real-upstream core matrix covers representative interactions/navigation/extraction/artifacts plus raw/semantic/job `select`; fake core matrix covers additional passthrough and ordering plus no-op scroll, combobox-focus diagnostics, and select compiler validation. | Supported. Some upstream semantics remain upstream-owned; wrapper contract and artifact metadata are tested. |
| Sessions, state, tabs, frames, dialogs, and windows | `session`, `session list`, `state save <path>`, `state load <path>`, `tab list`, `tab new --label <name> [url]`, `tab <t<N>|label>`, `frame <selector|main>`, `dialog accept [text]`, `dialog dismiss`, `dialog status`, `window new` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#session-state-frames-dialogs-windows-and-inspection-commands) (session/state/tabs/frames/dialogs/windows), stateful workflow notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Stateful presentation summaries/redaction; state save artifact handling; explicit/implicit session restore; tab target pinning; frame/dialog/window passthrough. | `test/agent-browser.extension-validation.test.ts` stateful matrix; runtime session/resume tests; presentation stateful redaction tests; lifecycle harness for reload/resume. | Supported. External profile/auth state remains operator-owned and documented. |
| Network, storage, artifacts, diagnostics, and performance | `network <action>`, `network route <url> [--abort|--body <json>] [--resource-type <csv>]`, `network request <requestId>`, `cookies [get|set|clear]`, `cookies set --curl <file>`, `storage <local|session>`, `diff snapshot`, `diff screenshot --baseline`, `diff url <u1> <u2>`, `trace start|stop [path]`, `profiler start|stop [path]`, `record start <path> [url]`, `record restart <path> [url]`, `record stop`, `console [--clear]`, `errors [--clear]`, `highlight <sel>`, `inspect`, `clipboard <op> [text]`, `stream enable [--port <n>]`, `stream disable`, `stream status`, `react tree`, `react inspect <id>`, `react renders start`, `react renders stop [--json]`, `react suspense [--only-dynamic] [--json]`, `vitals [url] [--json]`, `removeinitscript <id>` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage) and diagnostic sections; [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Thin passthrough plus command-specific compact diagnostic summaries, artifact metadata for HAR/diff/trace/profile/record, early missing-ffmpeg recording warnings, sensitive-data redaction, timeout bounds, and cleanup-pair guidance. | Fake non-core matrix covers network/diff/trace/profiler/record/console/errors/highlight/inspect/clipboard/stream/dashboard/chat JSON shapes and redaction; real-upstream covers safe network requests/HAR, diff, trace/profiler, console/errors/highlight, stream, vitals, and React missing-renderer. | Supported. Browser-opening or environment-sensitive operations (`inspect`, OS clipboard, full React app inspection) are delegated thinly and documented as needing suitable local/browser state. |
| Batch, auth, confirmations, setup, dashboard, and AI commands | `batch [--bail]`, `auth save <name>`, `auth save <name> --password-stdin`, `auth login <name>`, `auth list`, `auth show <name>`, `auth delete <name>`, `confirm <id>`, `deny <id>`, `chat <message>`, `dashboard start --port <n>`, `dashboard stop`, `install`, `install --with-deps`, `upgrade`, `doctor [--fix]`, `doctor --offline --quick`, `doctor --json`, `profiles` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#batch-auth-confirmations-sessions-chat-dashboard-and-setup), README security notes, release docs. | Batch stdin is native-tool-only; top-level `job`, `qa`, and experimental `sourceLookup` / `networkSourceLookup` compile to `batch` with generated stdin (caller `stdin` rejected for those modes); job `select` compiles to upstream `select <selector> <value...>`; auth/confirmation details are redacted; dashboard/chat/setup/doctor are passed through thinly with timeout/cleanup guidance; package doctor remains separate and read-only. | Unit/fake tests cover batch, auth password stdin, confirmations, dashboard/chat summaries, and doctor diagnostics; extension-validation covers `job`, `qa`, `sourceLookup`, and `networkSourceLookup` compilation plus `details.sourceLookup` / `details.networkSourceLookup` evidence, including job `select`; [`scripts/agent-browser-efficiency-benchmark.mjs`](../scripts/agent-browser-efficiency-benchmark.mjs) includes `source-lookup-visible-element` and `network-source-lookup-failed-request` scenarios; quick isolated Pi smoke covered dashboard start/stop and chat credential-failure pass-through. | Supported. `install`, `upgrade`, `doctor --fix`, and interactive auth/chat/setup flows are upstream-owned and should be run only when the operator intends those side effects. |
| Global flags, config, providers, policy, and environment | `--profile <name|path>`, `AGENT_BROWSER_PROFILE`, `--session <name>`, `AGENT_BROWSER_SESSION`, `--session-name <name>`, `AGENT_BROWSER_SESSION_NAME`, `--state <path>`, `AGENT_BROWSER_STATE`, `--auto-connect`, `AGENT_BROWSER_AUTO_CONNECT`, `--headers <json>`, `--init-script <path>`, `AGENT_BROWSER_INIT_SCRIPTS`, `--enable <feature>`, `AGENT_BROWSER_ENABLE`, `--executable-path <path>`, `AGENT_BROWSER_EXECUTABLE_PATH`, `--extension <path>`, `AGENT_BROWSER_EXTENSIONS`, `--args <args>`, `AGENT_BROWSER_ARGS`, `--user-agent <ua>`, `AGENT_BROWSER_USER_AGENT`, `--proxy <server>`, `AGENT_BROWSER_PROXY`, `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `--proxy-bypass <hosts>`, `AGENT_BROWSER_PROXY_BYPASS`, `NO_PROXY`, `--ignore-https-errors`, `AGENT_BROWSER_IGNORE_HTTPS_ERRORS`, `--allow-file-access`, `AGENT_BROWSER_ALLOW_FILE_ACCESS`, `--headed`, `AGENT_BROWSER_HEADED`, `--cdp <port>`, `--color-scheme <scheme>`, `AGENT_BROWSER_COLOR_SCHEME`, `--download-path <path>`, `AGENT_BROWSER_DOWNLOAD_PATH`, `--engine <name>`, `AGENT_BROWSER_ENGINE`, `--no-auto-dialog`, `AGENT_BROWSER_NO_AUTO_DIALOG`, `--json`, `AGENT_BROWSER_JSON`, `--annotate`, `AGENT_BROWSER_ANNOTATE`, `--screenshot-dir <path>`, `AGENT_BROWSER_SCREENSHOT_DIR`, `--screenshot-quality <n>`, `AGENT_BROWSER_SCREENSHOT_QUALITY`, `--screenshot-format <fmt>`, `AGENT_BROWSER_SCREENSHOT_FORMAT`, `--content-boundaries`, `AGENT_BROWSER_CONTENT_BOUNDARIES`, `--max-output <chars>`, `AGENT_BROWSER_MAX_OUTPUT`, `--allowed-domains <list>`, `AGENT_BROWSER_ALLOWED_DOMAINS`, `--action-policy <path>`, `AGENT_BROWSER_ACTION_POLICY`, `--confirm-actions <list>`, `AGENT_BROWSER_CONFIRM_ACTIONS`, `--confirm-interactive`, `AGENT_BROWSER_CONFIRM_INTERACTIVE`, `-p, --provider <name>`, `AGENT_BROWSER_PROVIDER`, `browserbase`, `kernel`, `browseruse`, `browserless`, `agentcore`, `--device <name>`, `AGENT_BROWSER_IOS_DEVICE`, `agent-browser -p ios device list`, `agent-browser -p ios swipe up`, `agent-browser -p ios tap @e1`, `--model <name>`, `AI_GATEWAY_MODEL`, `-v, --verbose`, `-q, --quiet`, `--debug`, `AGENT_BROWSER_DEBUG`, `AGENT_BROWSER_CONFIG`, `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGENT_BROWSER_STREAM_PORT`, `AGENT_BROWSER_IDLE_TIMEOUT_MS`, `AGENT_BROWSER_ENCRYPTION_KEY`, `AGENT_BROWSER_STATE_EXPIRE_DAYS`, `AGENT_BROWSER_IOS_UDID`, `AI_GATEWAY_URL`, `AI_GATEWAY_API_KEY` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment), README provider/setup notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode), architecture/runtime docs. | Runtime handles value flags, launch-scoped flags, redacted invocation echoes, `sessionMode: "fresh"` recovery hints, explicit sessions, and provider/device launch-scoping. Process env forwards a curated allowlist/prefix set for upstream/provider credentials without cloning the whole parent env. | Runtime tests cover launch-scoped flags, provider/device planning, redaction, stateless inspections, and explicit/fresh sessions. Process tests cover provider env prefixes. Fake provider/specialized-skill matrix covers provider argv/env passthrough. Package doctor checks version/source drift. | Supported. Provider clouds, iOS/Appium, Browserbase/Kernel/BrowserUse/Browserless/AgentCore, proxies, profiles, and credentials require external setup; the wrapper documents and forwards them thinly rather than emulating provider behavior. |

## Follow-up decision after closure

Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLookup`, and first-class Electron lifecycle/probe support are shipped.

`RQ-0066` shipped as the bounded evidence model in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup): it compiles to upstream `batch` steps (`is visible`, `get html`, `react inspect`, `react tree` as applicable), merges `details.sourceLookup` into the tool `details` alongside batch presentation, and never reclassifies an upstream-successful batch to failed solely because no candidates were found (unlike `qa` diagnostic reclassification). Wrapper-tracked packaged Electron no-candidate results now add bounded `workspaceRoot` / `electronContext` when available, limitations that the scan only covers the Pi cwd and does not unpack installed app resources or `app.asar`, and live Electron `snapshot` / `probe` / `tab list` next actions. Fake coverage: `agentBrowserExtension explains packaged Electron sourceLookup no-candidate boundaries` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0067` shipped as the failed-request correlation experiment in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup): it compiles to upstream `batch` steps (`network request …` and/or `network requests --filter …`), merges `details.networkSourceLookup` after scanning batch JSON for failed requests and optional workspace URL literals, redacts query strings and credentials in model-visible surfaces, and never reclassifies an upstream-successful batch to failed solely because no candidates were found.

`RQ-0093` keeps network diagnostics read-only for wrapper page/ref state: standalone `network request …` results and generated `networkSourceLookup` batch rows may contain API/request URLs, but those URLs are not promoted to `details.sessionTabTarget` and do not stale the latest app-page `details.refSnapshot`. The prior session target is preserved until a real page/navigation/snapshot result updates it. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); fake coverage: `agentBrowserExtension keeps network request diagnostics from replacing the active page target` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0095` adds bounded machine follow-ups for compact `network requests` output: `extensions/agent-browser/lib/results/presentation.ts` selects at most one safe request ID (actionable failed row first, then API/fetch-like row, benign failed row, or first safe ID) and appends `details.nextActions` for exact `network request <id>`, optional `networkSourceLookup` on actionable failed rows, path filtering with `network requests --filter <path>`, and `network har start` before a repro. Request-detail/filter/HAR argv preserve the current `--session` prefix when known, source lookup nextActions carry `networkSourceLookup.session` when known, and URL queries plus sensitive-looking IDs/paths are omitted from action params. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) network diagnostics note and README source-lookup section; fake coverage: `buildToolPresentation formats redacted network payload, response, and error previews` and `buildToolPresentation returns bounded network request next actions for benign and successful API rows` in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).

`RQ-0092` adds first-class native select support to the wrapper shorthand surfaces without adding a recipe layer: `semanticAction.action = "select"` requires `selector` plus `value` or `values` and compiles to upstream `select <selector> <value...>`; constrained `job` supports the same `select` step inside generated `batch` stdin. Role/name/label dropdown selection is deliberately not hidden behind `find … select` because upstream `find` has no verified select action; agents should use a stable selector or a current `@ref` for native selects and reserve visible option refs for custom comboboxes after a fresh snapshot. Stale-ref retries remain limited to compiled `find` semantic actions, so `select @e…` failures return refresh guidance rather than blind retry. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job); fake coverage: semanticAction/job select compile and stale-ref assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts); real-upstream coverage: raw, semanticAction, and job select against the localhost native `<select>` fixture in [`test/agent-browser.real-upstream-contract.test.ts`](../test/agent-browser.real-upstream-contract.test.ts).

`RQ-0091` keeps advanced release smoke tests focused on extension behavior instead of external skill routing: the Sauce Demo smoke in [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt) now launches with `--no-skills`, restricts tools to `agent_browser`, and uses bounded release-smoke wording rather than dogfood/exploratory QA language. Runtime guidance remains the concise stop-boundary and exact-artifact-path contract from `extensions/agent-browser/lib/playbook.ts`; no site-specific automation or recipe layer was added. Evidence from the failed high/low local-shop runs showed skill/report drift (`dogfood-output` substitution) and reasoning complexity, not a wrapper command defect, so skill-enabled dogfood remains a separate validation mode. Human workflow: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt), [`AGENTS.md`](../AGENTS.md#preferred-testing-workflow), and [`REQUIREMENTS.md`](REQUIREMENTS.md#testing-guidance).

`RQ-0096` ships first-class Electron desktop-app support without adding a generic recipe runtime: top-level `electron` covers wrapper-owned `list`, isolated `launch` with snapshot/tabs/connect handoff, `status`, `cleanup`, and compact current-session or launch-scoped `probe`; `qa.attached` extends the existing QA preset for attached Electron/CDP sessions without introducing `electron.qa`. `launch.handoff` still defaults to `"snapshot"`, while `handoff: "tabs"` is documented as the safer diagnostic starting point when refs/content capture is not needed yet. Host install discovery (`discoverElectronApps`) is macOS/Linux-only today: on Windows `electron.list` reports `platform: "unsupported"` with an empty catalog and name/bundle targets cannot resolve from scans—use `executablePath` (or a host path to the Electron binary) for Windows launch targeting. Discovery adds non-blocking likely-sensitive app annotations plus visible isolated-profile/auth-state warnings; launch output and `details.electron.profileIsolation` state that wrapper launches do not reuse existing signed-in app profiles or attach to already-running authenticated apps, and point agents to the host debug-port launch plus raw `connect` path when signed-in local app state is the goal; launch timeout failures include PID/profile/DevToolsActivePort/timing diagnostics; status/probe add launch/session identifiers, liveness, mismatch/reattach next actions, and dead-launch context for `about:blank`; post-mutation Electron death is upgraded to `tab-drift` with `details.electronPostCommandHealth`; Electron fills can add `details.fillVerification`; Electron `@e…` mutations can add same-URL ref freshness guidance; broad Electron `get text` selectors add scope warnings; cleanup ownership is bounded to wrapper-created launch records and temp profiles; externally launched debug ports stay on the manual `args: ["connect", "<port-or-url>"]` path and remain host-owned. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron) plus [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) for `qa.attached`; human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps) and README common calls; implementation: `extensions/agent-browser/index.ts` and `extensions/agent-browser/lib/electron/`; deterministic efficiency evidence: `electron-lifecycle` and `electron-probe` in `scripts/agent-browser-efficiency-benchmark.mjs`; fake coverage includes Electron schema/probe/mismatch/post-command-health/fill-verification/broad-text/discovery-sensitivity and packaged-sourceLookup cases in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts). This plan is the `RQ-0068` revisit evidence for Electron specifically: [`docs/plans/electron-extension-2026-05-20.md`](plans/electron-extension-2026-05-20.md) documents repeated failure-prone discover/launch/attach/cleanup and multi-call state-probe sequences, plus bounded owner/versioning/test/docs artifacts.

`RQ-0068` remains closed with a no-adopt decision for a reusable named browser recipe runtime. The Electron evidence above justified a narrow typed shorthand and compact probe, not an open-ended recipe layer; future reusable recipes still require concrete repeated workflow evidence and a defined owner/versioning/test plan.

`RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label`. Other locator/action pairs omit this block; `semanticAction` `select` now uses explicit `selector` plus `value`/`values` and compiles to upstream `select`, not to unverified `find … select`. Active-session role/name click/check/uncheck shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; the original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `agentBrowserExtension returns semantic locator candidates when semanticAction misses` and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find` or `select`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries for compiled `find` actions copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0088` adds current-snapshot ref fallback for selector misses: when raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, `extensions/agent-browser/index.ts` may take one fresh session-scoped `snapshot -i`, look for exact normalized role/name matches for the failed target, emit `details.visibleRefFallback` plus visible `Current snapshot ref fallback`, and append bounded direct-ref next actions (`try-current-visible-ref` / `try-current-visible-ref-N`). The matcher is intentionally narrow: role locators require `--name`; text-click maps only to exact-name `button`/`link` refs; label/placeholder fill maps only to exact-name textbox/searchbox-style refs; prefixes/fuzzy matches are ignored, and duplicate exact matches carry ambiguity safety copy. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`visibleRefFallback`, nextActions); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) selector strategy and README pitfalls; fake coverage: `agentBrowserExtension suggests current snapshot refs when raw find role locators miss` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/index.ts` replays per-session snapshots from the transcript on reload/resume, clears them on successful `close`, rejects mutation-prone ref argv before spawn when the tab URL diverges or a ref id is missing from the latest snapshot, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension blocks page-scoped ref reuse…`, `…blocks stale refs after page-changing steps inside a batch`, `…allows same-snapshot form fills before a batch click`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0087` keeps the RQ-0072 guard but removes `fill` from the batch invalidation latch: `fill @e…` rows remain guarded against stale/missing refs, yet multiple same-snapshot form fills can run before the first click/submit/navigation step in one upstream `batch`. A later guarded ref after `click`, `open`, `reload`, or other invalidating rows still fails before spawn unless the batch includes a fresh `snapshot` step first. This improves login/checkout efficiency without permitting likely post-navigation ref reuse. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`Batch stdin ordering`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) ref notes; fake coverage: `agentBrowserExtension allows same-snapshot form fills before a batch click` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, gathered by one read-only `eval` summary when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction or about-blank mismatch recovery, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i`, scans `refs` for strong modal context (`dialog` / `alertdialog`) plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Page-wide privacy/sign-in/banner text without a dialog role is deliberately ignored to avoid warnings after ordinary same-page clicks. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces likely overlay blockers after a no-op click` and `agentBrowserExtension does not report overlay blockers from unrelated page chrome after a successful same-page click` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0086` reduces wrapper-induced click fragility found during Sauce Demo smokes: navigation-summary enrichment for click/back/forward/reload/dblclick now uses one read-only `eval` (`({ title: document.title, url: location.href })`) instead of serial `get title` plus `get url` probes, including tab-pinned batch wrappers. Tab pinning/post-command tab correction now runs only after the wrapper has evidence of tab-drift risk (profile restore correction, overlapping stale opens, or restored session state), so ordinary same-session clicks no longer get repeated `tab list` probes. This keeps `details.navigationSummary`, overlay blocker checks, and drift recovery intact while avoiding the upstream `agent-browser 0.27.0` sequence that could report later clicks as successful without dispatching pointer/click events after repeated getter/tab/snapshot probes. Fake coverage: `agentBrowserExtension enriches click results with a post-navigation title and url summary`, `agentBrowserExtension pins the intended tab inside a follow-up command when reconnect drift would otherwise steal focus`, and about-blank/tab overlap assertions in [`test/agent-browser.extension-tabs.test.ts`](../test/agent-browser.extension-tabs.test.ts); manual validation source: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt).

`RQ-0089` investigated remaining Sauce Demo no-op clicks after RQ-0086. Minimal direct-upstream probes against `agent-browser 0.27.0` reproduced the residual `Finish` behavior without the wrapper: both CSS `click [data-test="finish"]` and `find role button click --name Finish` returned success, but a page-level click listener recorded no click event and the URL stayed on `checkout-step-two.html` after a 1s wait; a separate cart-link check showed normal trusted click events and navigation when the app handled the event. Conclusion: the residual issue is upstream/site interaction rather than wrapper post-click probes. Runtime behavior stays thin/no site-specific DOM-click fallback; docs now state that click success is attempted-action evidence only and agents must verify important mutations with URL/text/state checks or fresh snapshots before continuing. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`pageChangeSummary`, `nextActions`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) click verification notes; manual evidence: direct-upstream RQ-0089 development probe plus [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt) smoke caveats.

`RQ-0074` warns when `get text <selector>` may read hidden or tabbed DOM content: for non-ref CSS selectors, `extensions/agent-browser/index.ts` runs a read-only `eval --stdin` visibility probe after successful text reads, emits `details.selectorTextVisibility` plus visible warning text when the first match is hidden while visible matches exist or when multiple matches make the upstream first-match choice ambiguous, preserves multiple batched warnings in `details.selectorTextVisibilityAll`, and appends `inspect-visible-text-candidates` next actions. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`selectorTextVisibility`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README pitfalls; fake coverage: `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0075` classifies QA and diagnostic network failures by likely impact: `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts` split rows that already count as failed (`isFailedNetworkRequest`) into actionable versus benign low-impact browser icon asset misses (`isBenignAssetFailure`: favicon/apple-touch-icon basename patterns, 404/`failed`/string `error` signals, and image-like `resourceType`/`mimeType` when present). `analyzeQaPresetResults` fails `qa` only for actionable network failures while preserving benign rows in `qaPreset.warnings`, and network request presentation adds a compact actionable/benign summary plus per-row impact tags, ordered with actionable/benign failed rows before successful rows so late failures are visible even in capped previews. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) QA and network diagnostic notes; fake coverage: `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus network presentation assertions in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).

`RQ-0076` adds best-effort timeout recovery when the wrapper watchdog kills a stuck upstream process: `extensions/agent-browser/index.ts` calls `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` to build `details.timeoutPartialProgress` from the compiled `job` or `qa` step list or parsed caller `batch` stdin, session-scoped `get url` / `get title` (plus optional planned-URL fallback from `open`/`navigate`/`pushstate` steps), and declared artifact paths (`screenshot`, `pdf`, `download`, `wait --download`) with existence/size checks, then appends a visible `Timeout partial progress` block with redacted URLs/paths. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) wrapper timeout note and README job section; fake coverage: `agentBrowserExtension reports partial progress and artifacts after job timeout` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0077` reports managed-session outcomes after managed-session process execution: `extensions/agent-browser/index.ts` builds `details.managedSessionOutcome` (`buildManagedSessionOutcome`), recording `status` values such as `preserved` (previous managed session remains current) or `abandoned` (no managed session became current), plus previous/current/attempted session names, optional `replacedSessionName`, and active-before/after booleans. Visible `Managed session outcome: …` text (`formatManagedSessionOutcomeText`) is appended only when `sessionMode` is `"fresh"` and the outcome’s `succeeded` is false—covering launch failures, missing-binary on a fresh plan, and post-batch failures such as **`qa`** reclassification where `succeeded` is realigned after the fact. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) session-mode notes and README session section; fake coverage: `agentBrowserExtension reports managed-session outcomes after failed fresh launches` and the managed-session slice of `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).

`RQ-0078` improves getter/eval discoverability: `extensions/agent-browser/lib/results/presentation.ts` matches upstream failure text containing `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) when the failed command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`, then adds grouped-`get` prose; only `title` / `url` also emit read-only `nextActions` (`use-get-title` / `use-get-url`, with `--session` when the failed call named a session). The getter block is skipped when selector recovery already injected an `Agent-browser hint:` line into the same error string. `extensions/agent-browser/index.ts` adds `details.evalStdinHint` plus visible `Eval stdin hint` when `looksLikeFunctionEvalStdin` matches trimmed stdin and upstream JSON carries a plain empty-object `data.result`; empty arrays such as `[]` are valid eval results and are not warned. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`nextActions`, `evalStdinHint`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README quick start; fake coverage: `buildToolPresentation suggests grouped getter commands for common unknown getter shortcuts` and `agentBrowserExtension warns when eval stdin returns an empty object from a function-shaped snippet`.

`RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/index.ts` adds `details.artifactCleanup` and visible `Artifact lifecycle` copy on successful `close` when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close does not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct existing `explicit-path` manifest paths after a filesystem existence check, skipping stale paths already removed by host tools (possibly empty when the recent window has no existing explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close`.

`RQ-0080` adds no-op scroll recovery for dense dashboards and nested panes: for successful top-level `scroll`, `extensions/agent-browser/index.ts` samples viewport and prominent scroll-container positions before and after execution with read-only session-scoped `eval --stdin` probes. If no sampled position changes, it emits `details.scrollNoop`, appends visible `Scroll diagnostic: no observed scroll movement`, appends exact `inspect-after-noop-scroll` / `verify-noop-scroll-visually` next actions, and updates `pageChangeSummary.nextActionIds` so agents can branch without parsing prose. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`scrollNoop`, `nextActions`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) scroll note; fake coverage: `agentBrowserExtension reports no-op scroll diagnostics with recovery next actions`.

`RQ-0081` adds focused-combobox recovery for dense dashboard controls: after successful explicit combobox-targeted actions (for example `semanticAction` role `combobox` click), `extensions/agent-browser/index.ts` runs a read-only focused-element probe and emits `details.comboboxFocus` plus visible `Combobox diagnostic` text when a combobox-like control is focused, has explicit `aria-expanded` state, and no visible listbox/options are open. It appends exact `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter` next actions, all session-prefixed when applicable. The probe is gated to explicit combobox targets to avoid ordinary-click false positives and preserves the original combobox semantic target even when active-session visible-ref resolution rewrites execution to `click @ref`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`comboboxFocus`, `nextActions`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) combobox note; fake coverage: `agentBrowserExtension reports focused combobox diagnostics with option-opening next actions` and `agentBrowserExtension preserves combobox diagnostics after semanticAction visible-ref resolution`.

`RQ-0082` adds early recording dependency warnings: after successful `record start` / `record restart`, `extensions/agent-browser/index.ts` checks whether executable `ffmpeg` is visible on the Pi process `PATH`. If not, it emits non-blocking `details.recordingDependencyWarning` plus visible `Recording dependency warning: ffmpeg not found on PATH` text so agents can install `ffmpeg` or fix PATH before `record stop` needs to encode the WebM. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`recordingDependencyWarning`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) recording notes and README dependency table; fake coverage: `agentBrowserExtension warns after record start when ffmpeg is missing`.

`RQ-0083` documents a repeatable public Grafana stress checklist in [`RELEASE.md`](RELEASE.md#public-grafana-stress-checklist) instead of bundling private dogfood/VFR skills or adding a recipe runtime. The checklist uses Grafana Play Node Exporter Full to manually exercise dense snapshots, no-op scroll diagnostics, combobox recovery, screenshots/artifacts, optional short recording, network/console/error summaries, and cleanup. Treat known Grafana Play noise (analytics/Sentry requests, public-demo 403s, console errors) as site noise unless the wrapper leaks secrets, hides actionable rows, mishandles artifacts, or suggests unsafe follow-ups. Evidence should be a short release note or CueLoop task, not committed `.dogfood/` outputs, raw HARs, videos, or private scripts. Validation on 2026-05-15 used the native tool against Grafana Play: fresh open, dense `snapshot -i`, scroll, combobox semantic click, screenshots with verified artifacts, `network requests`, `console`, `close`, and host cleanup of `/tmp/pi-agent-browser-grafana-rq0083*.png`; observed 11 public-demo 403 request rows and Grafana console noise as expected site noise.