---
name: ironbee-verification
description: >
  MANDATORY verification after code changes. Activates when implementing features, fixing
  bugs, modifying UI components, API endpoints, styles, refactoring, or any task that
  changes application behavior. Verification runs in cycles activated by file-pattern
  match — which cycles are wired up for this project is shown in the platform sections
  near the bottom of this file. After every code edit you MUST verify the affected
  cycle(s) through real tools and submit a single verdict (pass or fail) before
  reporting task completion. If verification fails, submit the fail verdict first,
  then fix.
---

# IronBee Verification

## Rule
No task is complete until changes are verified — through **real tools**, not by reading code or inferring behavior. After verification, you MUST submit a verdict (pass or fail) before doing anything else. If verification fails, submit the fail verdict first, then fix.

## Cycles

IronBee runs verification in **cycles**. A single stop hook can drive multiple cycles in parallel — every active cycle must pass for your task to complete.

You don't choose which cycle runs — the file pattern decides. A single edited file can match multiple cycles' patterns and activate them all. Cycles always run in parallel within a single stop run. Each cycle has its own tools, flow steps, and verdict fields.

**See the platform sections near the bottom of this file** for which cycles are active for this project, the tools they expose, and the per-cycle verdict fields you must include.

## Application lifecycle (your responsibility)

For every active cycle you manage the running application:
- **Build** if needed (`npm run build`, `docker compose build`, …)
- **Start** before navigating/connecting (`npm run dev`, `docker compose up -d`, …)
- **Stop** when verification is complete

If the application is already running, skip the start step. If the build fails, fix it before proceeding.

**Don't guess ports.** After starting, check the actual port via `docker compose ps`, process output, or config files.
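
As a minimal sketch of reading the real port rather than guessing it, the helper below extracts the host port from the kind of binding that `docker compose port` prints. The helper name `host_port` is hypothetical and not part of IronBee.

```shell
# Hypothetical helper: extract the host port from a binding such as
# "0.0.0.0:49153" or "[::]:49153" (the form `docker compose port` prints).
host_port() {
  # Strip everything up to and including the last colon.
  printf '%s\n' "${1##*:}"
}
```

Assuming a Compose service named `web` that listens on container port 3000 (both are placeholders), `host_port "$(docker compose port web 3000)"` would print the published host port to navigate or connect to.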

## Verify end-to-end — trace the blast radius (don't stop at the edited file)

A change's defect most often surfaces not on the edited file's own surface but in a **downstream consumer** of what the change produces — wherever its output is read back, stored, rendered, or acted on. Before driving tools, spend ONE quick pass reading/searching the code to map the blast radius: identify what the change produces and which other surfaces consume it, then exercise the FULL flow from where the change is produced through to where its effect is observable — not only the surface the edited file owns. A feature that works at its source but breaks in a downstream consumer is a **FAIL**.

This holds even when the consumer was not itself edited: the place you should have updated but didn't never appears in the changed-files list, so don't let that list bound your verification — **follow the data, not the diff.** Keep the mapping quick (a focused scan, not a full audit) so it doesn't eat the speed budget.
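
The quick mapping pass can be as simple as a recursive text search for the name the change produces. The sketch below assumes the produced thing has a greppable identifier; the function name `find_consumers` and the excluded directories are illustrative choices, not part of IronBee.

```shell
# Hypothetical helper: list every file that mentions a symbol, edited or not.
# Usage: find_consumers <symbol> [dir]
find_consumers() {
  symbol=$1
  dir=${2:-.}
  # -r recurse, -l print file names only, -F literal match, -w whole word.
  # Excluding .git and node_modules is an assumption about the project layout.
  grep -rlFw --exclude-dir=.git --exclude-dir=node_modules -- "$symbol" "$dir" | sort
}
```

Any file in the output that is not in the changed-files list is a candidate downstream consumer to exercise during verification.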

## Universal flow

1. Implement your changes (write/edit code).
2. **Start verification** (one verification run covers every active cycle; each active cycle's flow runs within that same run):
   ```
   echo '{"session_id":"<your-session-id>"}' | ironbee hook verification-start
   ```
   The devtools tools stay blocked until this command runs.
3. Build and start the application if not already running.
4. **Run the per-cycle flows for every active cycle.** See the platform sections near the bottom of this file: each enabled cycle's section has its own flow steps and mandatory tools. All active cycles must be exercised within this one verification run.
5. Stop the dev server when verification is complete (after every verification attempt, including the final one).
6. **Honor any cycle-specific teardown** noted in the platform sections (e.g. recording stop) BEFORE submitting your verdict.
7. **Submit your verdict immediately** — do NOT edit any code first:
   ```
   echo '<verdict-json>' | ironbee hook submit-verdict
   ```
   - Verdict shape is platform-agnostic: `status`, `checks`, optionally `issues` / `fixes`. One verdict regardless of how many cycles ran.
   - Pass → `{ "session_id": "...", "status": "pass", "checks": [...] }`
   - Fail → add `"issues": [...]` describing what failed.
   - Pass after a previous fail → add `"fixes": [...]` describing what was repaired.
   - **A FALSE failure is a FAIL — not "verified failure handling".** When you exercise a negative path, separate an EXPECTED negative test (you deliberately fed invalid input — bad card, missing auth, malformed payload — and it correctly failed → supports a `pass`) from a FALSE failure (a VALID, in-scope operation that SHOULD succeed but errors out → a DEFECT). Report a false failure as `status: "fail"` (or at minimum non-empty `issues`), never as a passing "failure path verified". Passing a run whose own evidence shows a legitimate operation breaking is a false pass.
   - **Nothing to verify? Use N/A; never fake evidence.** Use this when the change has no runtime surface (type-only edit, behavior-neutral refactor, or config/docs that still tripped a cycle):
     - Global N/A: `{ "session_id": "...", "status": "not_applicable", "reason": ["why there's no runtime surface"] }` (no `checks`).
     - Per-platform N/A on a pass/fail verdict: add `"not_applicable_cycles": ["browser"], "reason": ["server-only change"]` to exempt some cycles while verifying the others.
     - `reason` is REQUIRED (it is recorded and observable). Strict mode rejects N/A.
     - Base "nothing to verify" on the FULL change set. The change is often already COMMITTED, so check `git diff HEAD~1 HEAD --stat`, not just a clean `git status`, before declaring N/A.
   - **The stop hook enforces that you called the required tools for every active (non-exempt) cycle and that a pass/fail verdict carries non-empty `checks`.**
8. If failed → fix → rebuild → go back to step 2 → repeat until pass.
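
The verdict shapes from step 7 can be sketched as small shell helpers that print JSON for piping into `ironbee hook submit-verdict`. This is a minimal sketch under assumptions: the helper names are hypothetical, the entries of `checks`, `issues`, `fixes`, and `reason` are assumed to be plain strings with illustrative wording, and any per-cycle verdict fields required by the platform sections are left out.

```shell
# Hypothetical helpers that print verdict JSON on stdout.
# Assumption: entries of checks/issues/fixes/reason are plain strings.
# Arguments are inserted verbatim, so they must not contain quotes or backslashes.

# pass_verdict <session-id> <check>
pass_verdict() {
  printf '{"session_id":"%s","status":"pass","checks":["%s"]}\n' "$1" "$2"
}

# fail_verdict <session-id> <check> <issue>
fail_verdict() {
  printf '{"session_id":"%s","status":"fail","checks":["%s"],"issues":["%s"]}\n' "$1" "$2" "$3"
}

# fixed_verdict <session-id> <check> <fix>: a pass after a previous fail
fixed_verdict() {
  printf '{"session_id":"%s","status":"pass","checks":["%s"],"fixes":["%s"]}\n' "$1" "$2" "$3"
}

# na_verdict <session-id> <reason>: global N/A, carries no checks
na_verdict() {
  printf '{"session_id":"%s","status":"not_applicable","reason":["%s"]}\n' "$1" "$2"
}
```

For example, `fail_verdict "<your-session-id>" "checkout with a valid card" "valid card returns 500" | ironbee hook submit-verdict` would report a false failure as a fail rather than as a verified failure path.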

## Speed — batch your tool calls (fewer LLM round-trips)

Each tool call is a separate LLM round-trip, and that round-trip — not the tool's execution
— is the dominant cost of a verification. Drive the tools in as few turns as you can:

- **Batch a scope's work into ONE `MCP:*_execute` call.** Each cycle exposes a batch tool
  (`MCP:bdt_execute` / `MCP:ndt_execute` / `MCP:bedt_execute` / `MCP:adt_execute`) that runs
  many steps in one turn; nest each step as a `callTool('<tool>', { … })`. A batch nests only
  that cycle's own tools (you can't mix servers in one `*_execute`). It's a JS sandbox, so a
  later step can reuse a value an earlier `callTool` returned
  (`const r = callTool(…); callTool(…, { /* a field from r */ })`). `*_execute` STOPS at
  the first failing nested call, so the remaining steps don't run. Nested calls are credited
  to the gate like standalone calls, but authoring the batch is not the work: read each
  result and confirm real evidence came back (a batch whose interaction failed has no
  screenshot/snapshot behind it). See each platform section for that cycle's concrete batch
  shape, including any cycle-specific screenshot or recording handling.
- **Discovery stays standalone — you can't batch what you haven't seen.** The step that
  reveals what to do (navigate / connect / snapshot) runs first and on its own; you read its
  result, THEN batch the actions it told you to take.

<!--IRONBEE:PLATFORM:browser-->
<!--/IRONBEE:PLATFORM:browser-->

<!--IRONBEE:PLATFORM:node-->
<!--/IRONBEE:PLATFORM:node-->

<!--IRONBEE:PLATFORM:backend-->
<!--/IRONBEE:PLATFORM:backend-->

<!--IRONBEE:PLATFORM:android-->
<!--/IRONBEE:PLATFORM:android-->

<!--IRONBEE:PLATFORM:terminal-->
<!--/IRONBEE:PLATFORM:terminal-->

## Important
- **Always submit a verdict after every verification attempt** — both pass AND fail. Fail verdicts are tracked for analytics.
- The stop hook checks that the required tools were used for every active cycle and that the verdict carries non-empty `checks`.
- Submit verdicts via `ironbee hook submit-verdict`; never write `verdict.json` directly.
- Every file edit automatically clears your session's verdict.
- After 3 failed verification attempts, you may complete the task but must report the unresolved issues.

## Sub-agent teams
- Sub-agents (spawned via the `Task` tool) focus on implementation only — do NOT verify.
- The main orchestrator agent verifies ALL changes after sub-agents complete.
- Each session's verification is isolated via session-specific verdict files.