---
name: ironbee-verification
description: >
  MANDATORY verification after code changes. Activates when implementing features, fixing
  bugs, modifying UI components, API endpoints, styles, refactoring, or any task that changes
  application behavior. After editing code you MUST verify the affected cycle(s) through real
  tools and submit a single verdict (pass or fail) before reporting task completion. You drive
  the verification tools yourself — they are wired into this session.
---

# IronBee Verification

> **You verify INLINE.** This project runs IronBee in **main-agent** mode: the devtools tools
> (`mcp__browser-devtools__*` / `mcp__node-devtools__*` / `mcp__backend-devtools__*` /
> `mcp__android-devtools__*`) are wired into THIS session — you drive them yourself. There is no
> verifier sub-agent to delegate to.

## Rule
No task is complete until your changes are verified through **real tools**, not by reading code or
inferring behavior. After verifying, you MUST submit a verdict (pass or fail) before doing
anything else. If verification fails, submit the fail verdict first, then fix and re-verify.

## Session id
IronBee's SessionStart hook injects your **Session ID** into your context at the top of the
session (a line `Session ID: <sid>`). Author it in every `ironbee hook` command's JSON.

## Cycles
IronBee runs verification in **cycles**. A single Stop hook can drive multiple cycles in
parallel — every active cycle must pass for your task to complete. You don't choose which cycle
runs — the edited file's path decides (pattern match). **See the platform sections near the
bottom of this file** for which cycles are active for this project, the tools they expose, and
the per-cycle flow.

## Universal flow

1. Finish your code edits (`apply_patch`).
2. **Start verification — run this ALONE first**, before any devtools call:
   ```
   echo '{"session_id":"<your-session-id>"}' | ironbee hook verification-start
   ```
   Devtools tools are blocked until this opens the cycle. **Codex runs shell commands and MCP
   tools in separate lanes**, so a same-message ordering of this shell command before a devtools
   call is NOT guaranteed — a devtools call that lands first is blocked. Once the cycle is open,
   independent MCP calls can ride one message.
3. Build and start the application **only if it isn't already running** (check `docker compose ps`
   / process output / config — don't guess ports). Track whether YOU started it.
4. **Run the per-cycle flows for every active cycle** (see the platform sections below). All
   active cycles must be exercised within this one verification cycle.
5. **Teardown** — stop ONLY what you started (the dev server / container you launched for this
   verification); never stop a server that was already running. Honor any cycle-specific teardown
   (e.g. stop an active screen recording) BEFORE the verdict.
6. **Submit your verdict immediately** — do NOT edit any code first:
   ```
   echo '<verdict-json>' | ironbee hook submit-verdict
   ```
   - Platform-agnostic shape: `session_id`, `status`, `checks`, optionally `issues` / `fixes`.
     One verdict regardless of how many cycles ran.
   - Pass → `{ "session_id": "...", "status": "pass", "checks": [...] }`
   - Fail → add `"issues": [...]` describing what failed.
   - Pass after a previous fail → add `"fixes": [...]` describing what was repaired.
   - **A FALSE failure is a FAIL — not "verified failure handling".** Separate an EXPECTED
     negative test (you deliberately fed invalid input and it correctly failed → supports a
     `pass`) from a FALSE failure (a VALID, in-scope operation that SHOULD succeed but errors out
     → a DEFECT → `status: "fail"`).
   - **Nothing to verify? Use N/A — never fake evidence.** When the change has no runtime surface
     (type-only edit, behavior-neutral refactor, config/docs that still tripped a cycle): global
     `{ "session_id": "...", "status": "not_applicable", "reason": ["why"] }` (no `checks`), or
     per-platform on a pass/fail verdict `"not_applicable_cycles": ["browser"], "reason": ["..."]`
     to exempt some cycles while verifying others. `reason` is REQUIRED. Strict mode rejects N/A.
     Base "nothing to verify" on the FULL change set (the change is often already COMMITTED) —
     check `git diff HEAD~1 HEAD --stat`, not just a clean `git status`.
7. If failed → fix → rebuild → go back to step 2 → repeat until pass (up to the retry cap).

## Speed — batch your tool calls (fewer LLM round-trips)

Each tool call is a separate LLM round-trip, and that round-trip — not the tool's execution — is
the dominant cost. Drive the tools in as few turns as you can:

- **Batch a scope's work into ONE `*_execute` call.** Each cycle exposes a batch tool
  (`mcp__browser-devtools__bdt_execute` / `mcp__node-devtools__ndt_execute` /
  `mcp__backend-devtools__bedt_execute` / `mcp__android-devtools__adt_execute`) that runs many
  steps in one turn — nest each as a `callTool('<tool>', { … })`. A batch nests only that cycle's
  own tools. It's a JS sandbox, so a later step can reuse a value an earlier `callTool` returned;
  `*_execute` STOPS at the first failing nested call. Authoring the batch is not the work — read
  each result and confirm real evidence came back.
- **Discovery stays standalone** — the step that reveals what to do (navigate / connect / snapshot)
  runs first and on its own; THEN batch the actions it told you to take.

<!--IRONBEE:PLATFORM:browser-->
<!--/IRONBEE:PLATFORM:browser-->

<!--IRONBEE:PLATFORM:node-->
<!--/IRONBEE:PLATFORM:node-->

<!--IRONBEE:PLATFORM:backend-->
<!--/IRONBEE:PLATFORM:backend-->

<!--IRONBEE:PLATFORM:android-->
<!--/IRONBEE:PLATFORM:android-->

<!--IRONBEE:PLATFORM:terminal-->
<!--/IRONBEE:PLATFORM:terminal-->

## Important
- **Always submit a verdict after every verification attempt** — both pass AND fail.
- Submit verdicts via `ironbee hook submit-verdict`, never write `verdict.json` directly.
- Every `apply_patch` edit automatically clears your session's verdict — re-verify after edits.
- After the retry cap, you may complete but must report unresolved issues.

## BANNED
- Reporting a task complete without verifying your changes through the real tools.
- Claiming verification passed based on code reading, assumptions, or prior knowledge.
- Submitting `pass` when your own evidence shows a legitimate operation breaking.
