---
name: ts-find-bad-type
goal: Locate the single deliberate TypeScript type error in the fixture and cite file + line + the nature of the mismatch.
setup: "./fixtures/ts-bad-type"
prompt: |
  This repository has exactly one TypeScript type error, and it lives in
  src/app.ts. Read that file, identify the offending line, and report
  the file path, the line number, and a one-sentence explanation of why
  the assignment is a type error. Keep the total report to two sentences.
responses:
  - blocks:
      - { kind: thinking, thinking: "The user pinned the file to src/app.ts. Read it before answering." }
      - { kind: toolCall, name: read, arguments: { path: "src/app.ts" } }
    stopReason: toolUse
  - blocks:
      - { kind: text, text: "The type error is in src/app.ts on line 2: `export const userName: string = 42` assigns the number literal 42 to a variable whose declared type is `string`. The right-hand side must be a string (or the declared type must be `number`) for the file to type-check." }
    stopReason: stop
success:
  - { type: toolCallCount, name: read, min: 1, max: 1 }
  - { type: outputContains, substring: "src/app.ts" }
  - { type: outputContains, substring: "line 2" }
  - { type: outputContains, substring: "number" }
  - { type: outputContains, substring: "string" }
---

# Notes

Primary bench task for `oh-my-pi-typescript`. A two-step scripted
exchange:

1. The assistant reads `src/app.ts` via the built-in `read` tool. We
   pin `toolCallCount: min=1, max=1` so a future change that accidentally
   replays the file into the prompt (skipping the tool) fails loudly.
   The point of the measurement is to exercise the tool-use path.
2. The assistant emits a two-sentence diagnosis. Four `outputContains`
   rules force the diagnosis to cite the file path, the line number,
   and both sides of the type mismatch (`number` and `string`), so a
   vague answer cannot pass.
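
The two checks above can be sketched as a small rule evaluator. This is only an illustration: the `Rule` and `Transcript` shapes and the `checkRule` / `passes` helpers are hypothetical names, not the runner's actual types, and we assume `outputContains` is a plain case-sensitive substring match.

```typescript
// Hypothetical shapes; the real runner's types are not shown in this file.
type Rule =
  | { type: "toolCallCount"; name: string; min: number; max: number }
  | { type: "outputContains"; substring: string };

interface Transcript {
  toolCalls: { name: string }[]; // every tool call made during the run
  output: string;                // final assistant text
}

// Evaluate one rule against a finished run.
function checkRule(rule: Rule, t: Transcript): boolean {
  switch (rule.type) {
    case "toolCallCount": {
      const n = t.toolCalls.filter((c) => c.name === rule.name).length;
      return n >= rule.min && n <= rule.max;
    }
    case "outputContains":
      // Assumption: plain, case-sensitive substring match.
      return t.output.includes(rule.substring);
  }
}

// A run passes only if every rule holds.
function passes(rules: Rule[], t: Transcript): boolean {
  return rules.every((r) => checkRule(r, t));
}

// The five rules from the front matter above.
const rules: Rule[] = [
  { type: "toolCallCount", name: "read", min: 1, max: 1 },
  { type: "outputContains", substring: "src/app.ts" },
  { type: "outputContains", substring: "line 2" },
  { type: "outputContains", substring: "number" },
  { type: "outputContains", substring: "string" },
];

const diagnosis =
  "The type error is in src/app.ts on line 2: the number literal 42 " +
  "is assigned to a variable whose declared type is string.";

// One `read` call plus a precise diagnosis: passes.
const scripted: Transcript = { toolCalls: [{ name: "read" }], output: diagnosis };

// Same text, but the tool was skipped: fails on toolCallCount.
const skippedTool: Transcript = { toolCalls: [], output: diagnosis };

console.log(passes(rules, scripted), passes(rules, skippedTool));
```

Under these assumptions, a run that produces the right text without calling `read` still fails, which is the regression the `min=1, max=1` pin is meant to catch.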

`setup:` points at the adjacent `fixtures/ts-bad-type` directory, which
contains a minimal `src/app.ts` with one deliberate type error. The
runner copies the fixture into a fresh temp cwd for each run, so tool
side effects can never leak between runs and `fileExists`-style judges
always see a clean tree.
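
A minimal sketch of that per-run isolation, assuming a Node.js runtime with `fs.cpSync` available. The helper name `freshCwd` and the `bench-` prefix are hypothetical; the runner's real copy logic is not shown here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: copy a fixture directory into a brand-new temp
// directory and return the path to use as the run's cwd.
function freshCwd(fixtureDir: string): string {
  // mkdtempSync appends random characters, so each call gets its own directory.
  const cwd = fs.mkdtempSync(path.join(os.tmpdir(), "bench-"));
  // Recursive copy so nested paths such as src/app.ts come along.
  fs.cpSync(fixtureDir, cwd, { recursive: true });
  return cwd;
}
```

Calling `freshCwd("./fixtures/ts-bad-type")` once per run would give every run its own copy of `src/app.ts`, so a file written during one run is invisible to the next.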

Reproducibility gate (concept-v0 §4.1): the faux provider replays a
fixed script, so the coefficient of variation,
`stddev(totalTokens) / mean`, should sit well under the 5% threshold
across three runs. Running `omp bench --write` stamps
`pi.ohmypi.bench.lastResult` with the mean.
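
For reference, the gate's arithmetic can be sketched as follows. The function names are hypothetical, and we assume the population standard deviation (dividing by `n`); the actual gate may use the sample form instead.

```typescript
// Coefficient of variation: stddev / mean.
// Assumption: population standard deviation (divide by n, not n - 1).
function coefficientOfVariation(samples: number[]): number {
  if (samples.length === 0) throw new Error("need at least one sample");
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  if (mean === 0) throw new Error("mean is zero; ratio is undefined");
  const variance =
    samples.reduce((acc, x) => acc + (x - mean) ** 2, 0) / samples.length;
  return Math.sqrt(variance) / mean;
}

// Hypothetical gate check against the 5% threshold from the note above.
function withinGate(totalTokens: number[], threshold = 0.05): boolean {
  return coefficientOfVariation(totalTokens) < threshold;
}

// Illustrative token counts only, not measured results.
console.log(withinGate([1200, 1200, 1200])); // identical runs: ratio is 0
console.log(withinGate([1000, 1100, 900]));  // ratio is about 0.082, over 5%
```

With a fully scripted provider, every run should report the same `totalTokens`, so the ratio is expected to be zero or very close to it.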
