# Timebox — Specification

A session-budget extension for AI coding agents. It enforces a wall-clock time limit, a user-turn limit, or both: it soft-nudges the agent as a limit approaches, stops the agent for the current turn once the budget is exhausted, and persists state across reloads.

The spec is split into:

- **Core** (§2–§5) — host-agnostic logic. No host SDK imports. Drives all tests.
- **Host adapter contract** (§6) — interface every adapter implements.
- **pi-coding-agent adapter** (§7) — v1 release target.
- **Other adapters** (§8) — Claude Code is planned next; see [FUTURE-CLAUDE.md](./FUTURE-CLAUDE.md).

## 1. Goals

1. Help the user cap how long / how many turns they spend on a task.
2. Give the agent a chance to wrap up gracefully (soft nudge) before it gets cut off.
3. Survive reload/resume without losing the budget.
4. Stay portable across host agents — pi today, Claude Code next.

## 1.1 Project conventions (non-negotiable)

- **No single-file extension.** Code lives under `src/` split into a host-agnostic `core/` and one folder per host adapter under `src/adapters/<host>/`. One responsibility per file.
- **Linting & formatting** via [Biome](https://biomejs.dev/) (tab indentation at width 2, line width 120, recommended rules). Biome runs on `npm run check` and in CI.
- **`.editorconfig`** at the repo root so editors honor the same indent / EOL / charset rules without relying on Biome.
- **Tests** use Vitest (matches pi-mono). Target: **as close to 100% line + branch coverage as possible without invoking a real LLM**. Coverage is enforced via `vitest --coverage` thresholds; a CI run that drops below the threshold fails. The agent runtime, the network, and any LLM provider are fully faked — the test suite must be hermetic and run in <5s on a laptop.
- **TypeScript strict mode**, ESM-only, Node ≥20 (matches pi-mono).

## 2. Core data model

```ts
interface TimeboxBudget {
  timeLimitMs:   number | null;   // null = no time limit
  turnLimit:     number | null;   // null = no turn limit
  startTime:     number;          // Date.now() at /timebox set
  startTurn:     number;          // user-message count at /timebox set
  softNudgeSent: boolean;
  active:        boolean;
  onStopCommand: string | null;   // shell command to run on hard stop, or null
}
```

A **turn** is one user message in the session transcript. Active-budget turn count = `(total user messages) - startTurn`, clamped ≥0.

**Elapsed** = `Date.now() - startTime`. No pause/resume.
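The two derived values above can be sketched as small pure functions. This is a minimal illustration, not the shipped code: the helper names `usedTurns` and `elapsedMs` and the narrowed `BudgetClockFields` type are assumptions made for the example.

```ts
// Only the fields of TimeboxBudget that the derived values need.
interface BudgetClockFields {
  startTime: number; // Date.now() when the budget was set
  startTurn: number; // user-message count when the budget was set
}

// Turns used under the active budget, clamped so it never goes negative.
function usedTurns(budget: BudgetClockFields, totalUserMessages: number): number {
  return Math.max(0, totalUserMessages - budget.startTurn);
}

// Wall-clock milliseconds since the budget was set. There is no pause/resume,
// so this is a plain subtraction against the caller-supplied clock value.
function elapsedMs(budget: BudgetClockFields, now: number): number {
  return now - budget.startTime;
}
```

Passing `now` in, rather than calling `Date.now()` inside the helper, keeps the functions deterministic under test.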

## 3. Core parsing rules

- Time tokens: `^\d+(\.\d+)?\s*(s|m|h)?$` (case-insensitive); a missing unit defaults to minutes.
- Turn token: `^turns:\s*\d+$` (case-insensitive).
- Multiple tokens are space-separated; later tokens override earlier ones for the same dimension.
- An optional `--` delimiter splits the input: tokens before it form the budget, and the literal string after it is the on-stop command. A trailing `--` with no command is ignored. A command with no budget is `invalid`.
- Sub-commands `off|disable|cancel|status` short-circuit parsing.
- If neither time nor turn token parses, return a usage error and do not modify state.

Pure helpers (`parseTimeToMs`, `parseTurns`, `formatTime`) live in `core/parse.ts` and are independently unit-tested.
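As a rough sketch of how the rules above could translate into code, the following shows `parseTimeToMs` and `parseTurns` built directly on the two regular expressions. The bodies are assumptions, not a copy of `core/parse.ts`, and `splitOnStopCommand` is a hypothetical helper name introduced only to illustrate the `--` rule.

```ts
// Patterns from the parsing rules; the "i" flag makes units and "turns:" case-insensitive.
const TIME_TOKEN = /^(\d+(?:\.\d+)?)\s*(s|m|h)?$/i;
const TURN_TOKEN = /^turns:\s*(\d+)$/i;

const UNIT_MS = { s: 1_000, m: 60_000, h: 3_600_000 } as const;

// Returns milliseconds, or null when the token is not a valid time token.
function parseTimeToMs(token: string): number | null {
  const match = TIME_TOKEN.exec(token.trim());
  if (match === null) return null;
  // A bare number has no unit group, so it falls back to minutes.
  const unit = (match[2] ?? "m").toLowerCase() as keyof typeof UNIT_MS;
  return Number(match[1]) * UNIT_MS[unit];
}

// Returns the turn limit, or null when the token is not a valid turn token.
function parseTurns(token: string): number | null {
  const match = TURN_TOKEN.exec(token.trim());
  return match === null ? null : Number(match[1]);
}

// Hypothetical helper: splits the input at the first standalone "--".
// Everything after the delimiter is kept verbatim (trimmed) as the command.
function splitOnStopCommand(input: string): { budgetPart: string; onStopCommand: string | null } {
  const match = /(^|\s)--(\s|$)/.exec(input);
  if (match === null) return { budgetPart: input.trim(), onStopCommand: null };
  const budgetPart = input.slice(0, match.index).trim();
  const command = input.slice(match.index + match[0].length).trim();
  // A trailing "--" with nothing after it is ignored.
  return { budgetPart, onStopCommand: command === "" ? null : command };
}
```

In this sketch the caller would treat an empty `budgetPart` combined with a non-null `onStopCommand` as the `invalid` case.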

## 4. Core decisions

These functions take a `TimeboxBudget` plus the current clock + turn count, and return a decision. They have no side effects.

```ts
type Decision =
  | { kind: "ok" }
  | { kind: "nudge"; urgency: "warning" | "critical"; detail: string }
  | { kind: "stop"; usedTurns: number; elapsedMs: number; onStopCommand: string | null };
```

Rules:

- `stop` fires when **either** dimension has reached its limit (`elapsed >= timeLimitMs` or `turns >= turnLimit`).
- `nudge` fires when **either** ratio (elapsed/limit, turns/limit) is ≥0.8 and `softNudgeSent` is false. Urgency is `critical` at ≥0.95, else `warning`.
- Otherwise `ok`.

The status string ("12m left (15m budget) | 3 turns left (2/5)") is also produced by a pure formatter so it's testable without a host.
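The decision rules can be read as the following sketch. The function name `decide` appears in the test plan, but its exact signature, the wording of `detail`, and the choice to return `ok` for an inactive budget are assumptions made for this example.

```ts
interface TimeboxBudget {
  timeLimitMs: number | null;
  turnLimit: number | null;
  startTime: number;
  startTurn: number;
  softNudgeSent: boolean;
  active: boolean;
  onStopCommand: string | null;
}

type Decision =
  | { kind: "ok" }
  | { kind: "nudge"; urgency: "warning" | "critical"; detail: string }
  | { kind: "stop"; usedTurns: number; elapsedMs: number; onStopCommand: string | null };

function decide(budget: TimeboxBudget, now: number, totalUserMessages: number): Decision {
  // Assumption: an inactive budget never nudges or stops.
  if (!budget.active) return { kind: "ok" };

  const elapsedMs = now - budget.startTime;
  const usedTurns = Math.max(0, totalUserMessages - budget.startTurn);

  // Stop as soon as either dimension has reached its limit.
  const timeUp = budget.timeLimitMs !== null && elapsedMs >= budget.timeLimitMs;
  const turnsUp = budget.turnLimit !== null && usedTurns >= budget.turnLimit;
  if (timeUp || turnsUp) {
    return { kind: "stop", usedTurns, elapsedMs, onStopCommand: budget.onStopCommand };
  }

  // An unset dimension contributes a ratio of 0. A zero limit cannot reach
  // this point because it would already have produced a stop above.
  const timeRatio = budget.timeLimitMs !== null ? elapsedMs / budget.timeLimitMs : 0;
  const turnRatio = budget.turnLimit !== null ? usedTurns / budget.turnLimit : 0;
  const worst = Math.max(timeRatio, turnRatio);

  if (worst >= 0.8 && !budget.softNudgeSent) {
    const urgency = worst >= 0.95 ? "critical" : "warning";
    // The detail text is illustrative only.
    return { kind: "nudge", urgency, detail: `${Math.round(worst * 100)}% of budget used` };
  }
  return { kind: "ok" };
}
```

Taking the worse of the two ratios means a combined budget reports `critical` whenever either dimension is at or above 0.95.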

## 5. Core persistence shape

The core does not perform I/O. It produces and consumes serialisable records:

```ts
type TimeboxRecord =
  | { kind: "active"; budget: TimeboxBudget }
  | { kind: "off";    disabledAt: number };
```

Restore rule (newest-first scan):

1. First match `off` → do not restore.
2. First match `active`:
   - If `timeLimitMs !== null && elapsed >= timeLimitMs` → return `{ status: "expired" }`.
   - Otherwise → re-derive `softNudgeSent` from elapsed/turns and return `{ status: "restored", budget }`.

The adapter is responsible for *where* records live (custom session entries, JSON file, etc.).
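A possible shape for the restore rule is sketched below. The `restore` name and the `none` / `expired` / `restored` statuses come from the test plan; the `RestoreResult` type name, the extra `now` and `totalUserMessages` parameters, and the decision to look only at the newest record are assumptions. Because every record in this sketch is either `off` or `active`, the newest-first scan always ends at the last element.

```ts
interface TimeboxBudget {
  timeLimitMs: number | null;
  turnLimit: number | null;
  startTime: number;
  startTurn: number;
  softNudgeSent: boolean;
  active: boolean;
  onStopCommand: string | null;
}

type TimeboxRecord =
  | { kind: "active"; budget: TimeboxBudget }
  | { kind: "off"; disabledAt: number };

type RestoreResult =
  | { status: "none" }
  | { status: "expired" }
  | { status: "restored"; budget: TimeboxBudget };

// `records` is in chronological order, so the newest record is the last one.
function restore(records: TimeboxRecord[], now: number, totalUserMessages: number): RestoreResult {
  const newest = records[records.length - 1];
  if (newest === undefined || newest.kind === "off") return { status: "none" };

  const budget = newest.budget;
  const elapsed = now - budget.startTime;
  if (budget.timeLimitMs !== null && elapsed >= budget.timeLimitMs) {
    return { status: "expired" };
  }

  // Re-derive the nudge flag from current usage instead of trusting the stored value.
  const usedTurns = Math.max(0, totalUserMessages - budget.startTurn);
  const timeRatio = budget.timeLimitMs !== null ? elapsed / budget.timeLimitMs : 0;
  const turnRatio = budget.turnLimit !== null && budget.turnLimit > 0 ? usedTurns / budget.turnLimit : 0;

  return {
    status: "restored",
    budget: {
      ...budget,
      // Legacy records may lack this field at runtime; normalise it to null.
      onStopCommand: budget.onStopCommand ?? null,
      softNudgeSent: Math.max(timeRatio, turnRatio) >= 0.8,
    },
  };
}
```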

## 6. Host adapter contract

Every adapter implements this surface so the same core drives all hosts:

```ts
interface BudgetStore {
  appendActive(budget: TimeboxBudget): Promise<void> | void;
  appendOff(disabledAt: number): Promise<void> | void;
  loadRecords(): Promise<TimeboxRecord[]> | TimeboxRecord[]; // chronological order
}

interface UIBridge {
  notify(message: string, level: "info" | "warning" | "error"): void;
  setStatus(text: string | undefined): void;
}

interface TranscriptBridge {
  countUserMessages(): number;          // total user-message turns in session
}

interface AgentBridge {
  abortCurrentTurn(): void;             // best-effort; cancels in-flight agent run
  appendSystemPrompt(extra: string): void; // applied to next agent start only
}
```

Plus an event surface the adapter wires to host events:

| Core event       | Adapter triggers when…                                                |
| ---------------- | --------------------------------------------------------------------- |
| `onSessionStart` | Session loads (cold start or reload).                                 |
| `onTurnStart`    | A new user turn begins, before the agent runs.                        |
| `onAgentStart`   | The agent is about to be invoked; adapter may inject extra prompt.    |
| `onShutdown`     | Session ends.                                                         |

The adapter also registers the `/timebox` command (or its host equivalent) and routes its parsed input through the core.

Notification levels are restricted to `"info" | "warning" | "error"` because that's the lowest common denominator across hosts (pi and Claude Code both support these; pi rejects `"success"`).
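To make the contract concrete, here is a minimal in-memory `BudgetStore`, of the kind a test or a host without session storage might use. The class name `InMemoryBudgetStore` is hypothetical; the interfaces are repeated from above so the sketch stands alone.

```ts
interface TimeboxBudget {
  timeLimitMs: number | null;
  turnLimit: number | null;
  startTime: number;
  startTurn: number;
  softNudgeSent: boolean;
  active: boolean;
  onStopCommand: string | null;
}

type TimeboxRecord =
  | { kind: "active"; budget: TimeboxBudget }
  | { kind: "off"; disabledAt: number };

interface BudgetStore {
  appendActive(budget: TimeboxBudget): Promise<void> | void;
  appendOff(disabledAt: number): Promise<void> | void;
  loadRecords(): Promise<TimeboxRecord[]> | TimeboxRecord[];
}

// Hypothetical store that keeps records in append (chronological) order.
class InMemoryBudgetStore implements BudgetStore {
  private readonly records: TimeboxRecord[] = [];

  appendActive(budget: TimeboxBudget): void {
    // Store a copy so later mutation of the live budget does not rewrite history.
    this.records.push({ kind: "active", budget: { ...budget } });
  }

  appendOff(disabledAt: number): void {
    this.records.push({ kind: "off", disabledAt });
  }

  loadRecords(): TimeboxRecord[] {
    // Return a copy so callers cannot mutate the stored list.
    return [...this.records];
  }
}
```

The synchronous return types satisfy the `Promise<void> | void` unions in the contract, so the same core code can `await` either kind of store.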

## 7. pi-coding-agent adapter (v1)

Maps the contract above onto `@mariozechner/pi-coding-agent` (verified against `pi-mono/packages/coding-agent/src/core/extensions/types.ts` and `session-manager.ts`).

| Contract piece              | pi binding                                                                 |
| --------------------------- | -------------------------------------------------------------------------- |
| `BudgetStore.appendActive`  | `pi.appendEntry("timebox-active", budget)`                                 |
| `BudgetStore.appendOff`     | `pi.appendEntry("timebox-off", { disabledAt })`                            |
| `BudgetStore.loadRecords`   | `ctx.sessionManager.getEntries()` filtered by `entry.type === "custom"` and matching `customType` |
| `UIBridge.notify`           | `ctx.ui.notify`                                                            |
| `UIBridge.setStatus`        | `ctx.ui.setStatus("timebox", text)`                                        |
| `TranscriptBridge.countUserMessages` | `ctx.sessionManager.getEntries()` filtered by `type === "message" && message.role === "user"` |
| `AgentBridge.abortCurrentTurn` | `ctx.abort()`                                                           |
| `AgentBridge.appendSystemPrompt` | return `{ systemPrompt: ctx.getSystemPrompt() + extra }` from `before_agent_start` |
| `onSessionStart`            | `pi.on("session_start", …)`                                                |
| `onTurnStart`               | `pi.on("turn_start", …)`                                                   |
| `onAgentStart`              | `pi.on("before_agent_start", …)`                                           |
| `onShutdown`                | `pi.on("session_shutdown", …)`                                             |
| `/timebox` command          | `pi.registerCommand("timebox", { description, getArgumentCompletions, handler })` |

Argument completions are returned as `{ value, label }[]` (matches pi's `AutocompleteItem`).

### 7.1 User-facing surface (pi)

Command syntax (parsed by core):

| Invocation                  | Behavior                                                       |
| --------------------------- | -------------------------------------------------------------- |
| `/timebox 15m`              | 15-minute time budget.                                         |
| `/timebox 30s` / `2h` / `90`| Time budget. Bare number = minutes.                            |
| `/timebox turns:5`          | 5-turn budget.                                                 |
| `/timebox 15m turns:3`      | Combined budget.                                               |
| `/timebox 15m -- lmk done`  | Run `lmk done` as a detached shell command on hard stop.       |
| `/timebox status`           | Show current budget status (notification).                     |
| `/timebox off`              | Disable. Aliases: `disable`, `cancel`.                         |
| `/timebox` (no args / bad)  | Usage hint, leave existing budget untouched.                   |

Static completions: `off`, `status`, `15m`, `30m`, `1h`, `turns:3`, `turns:5`, `turns:10`.
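One way to serve these static completions is a prefix filter over the fixed list. The sketch below is an assumption throughout: `completeTimeboxArgs` and `CompletionItem` are hypothetical names, the prefix matching is a guess at the desired behavior, and the function is not wired to pi's actual `getArgumentCompletions` signature.

```ts
// Local stand-in for the { value, label } item shape described above.
interface CompletionItem {
  value: string;
  label: string;
}

const STATIC_COMPLETIONS = ["off", "status", "15m", "30m", "1h", "turns:3", "turns:5", "turns:10"];

// Returns every static completion that starts with what the user has typed so far.
function completeTimeboxArgs(prefix: string): CompletionItem[] {
  const needle = prefix.trim().toLowerCase();
  return STATIC_COMPLETIONS
    .filter((candidate) => candidate.startsWith(needle))
    .map((candidate) => ({ value: candidate, label: candidate }));
}
```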

Status bar string under the `timebox` key, refreshed every 1s while a budget is active:

```
Timebox: 12m 34s left (15m budget) | 3 turns left (2/5)
```

Single-dimension budgets show `no time limit` / `no turn limit` for the unset dimension.
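The status line can be produced by a pure formatter along these lines. This is a sketch under assumptions: `formatStatus` and `StatusInput` are hypothetical names, the duration formatter is injected because the exact label style (for example `15m` versus `15m 0s`) is not pinned down here, and the layout for a single-dimension budget is a guess based on the sentence above.

```ts
interface StatusInput {
  timeLimitMs: number | null;
  turnLimit: number | null;
  elapsedMs: number;
  usedTurns: number;
}

// `fmt` turns milliseconds into a human-readable duration; it is supplied by the caller.
function formatStatus(input: StatusInput, fmt: (ms: number) => string): string {
  const timePart =
    input.timeLimitMs === null
      ? "no time limit"
      : `${fmt(Math.max(0, input.timeLimitMs - input.elapsedMs))} left (${fmt(input.timeLimitMs)} budget)`;

  const turnPart =
    input.turnLimit === null
      ? "no turn limit"
      : `${Math.max(0, input.turnLimit - input.usedTurns)} turns left (${input.usedTurns}/${input.turnLimit})`;

  return `Timebox: ${timePart} | ${turnPart}`;
}
```

Remaining values are clamped at zero so the status never shows a negative amount in the moment before the hard stop fires.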

### 7.2 Notifications

- `info`: budget set; `/timebox status` output; disabled; "no active timebox".
- `warning`: 80% threshold crossed; invalid arguments; restored budget already expired.
- `error`: budget spent (agent stops for this turn; chat continues).

### 7.3 Soft nudge & hard stop

Soft nudge (one-shot per budget):

1. Warning notification.
2. `softNudgeSent = true`.
3. On every subsequent `before_agent_start`, append a budget-warning block to the system prompt:
   - `IMPORTANT TIMEBOX WARNING` if worst ratio ≥0.8, `CRITICAL TIMEBOX WARNING` if ≥0.95.
   - Remaining time and/or turns.
   - Instruction to wrap up and stop.

Hard stop (at `turn_start` once limit is reached):

1. Error notification: `Timebox budget spent. Used N turns, Tm Ss. The agent stops for this turn. The chat continues.`
2. `budget.active = false`; status timer stops.
3. If `onStopCommand` is set, the adapter spawns it via the injected `runOnStop` runner. Default impl: detached `/bin/sh -c <cmd>` with stdio ignored, then `unref()` so it survives pi shutdown.
4. `ctx.abort()` cancels the in-flight turn.

The disabled budget is **not** auto-cleared; the user must run `/timebox off` or set a new one.
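The two sequences above can be sketched with injected dependencies so that no host is required. All names here (`buildWarningBlock`, `performHardStop`, `HardStopDeps`, `StoppableBudget`) are hypothetical, and the wording of the prompt block beyond its heading is an assumption.

```ts
// Builds the block appended to the system prompt after the soft nudge.
function buildWarningBlock(worstRatio: number, remaining: string): string {
  const heading = worstRatio >= 0.95 ? "CRITICAL TIMEBOX WARNING" : "IMPORTANT TIMEBOX WARNING";
  return `\n\n${heading}\nRemaining: ${remaining}\nWrap up your current work and stop.`;
}

// Side effects the hard stop needs, injected so tests can record them.
interface HardStopDeps {
  notifyError(message: string): void;
  stopStatusTimer(): void;
  runOnStop(command: string): void;
  abort(): void;
}

interface StoppableBudget {
  active: boolean;
  onStopCommand: string | null;
}

// `usage` is a preformatted summary such as "5 turns, 15m 2s".
function performHardStop(budget: StoppableBudget, usage: string, deps: HardStopDeps): void {
  // Guard so the hard stop fires at most once per budget.
  if (!budget.active) return;

  deps.notifyError(`Timebox budget spent. Used ${usage}. The agent stops for this turn. The chat continues.`);
  budget.active = false;
  deps.stopStatusTimer();
  if (budget.onStopCommand !== null) deps.runOnStop(budget.onStopCommand);
  deps.abort();
}
```

Checking `budget.active` first is what makes a repeated `turn_start` after the stop a no-op in this sketch.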

### 7.4 Edge cases (pi adapter)

1. New `/timebox` while one is active replaces it without confirmation. (Future: `ctx.ui.confirm`.)
2. `/timebox off` with no active budget: notify info, do not write a `timebox-off` record.
3. `/timebox status` with no active budget: usage hint.
4. Reload after expiry: warning notify, do not restore.
5. Reload after explicit off: do not restore.
6. `session_shutdown`: clear in-memory state, stop timer, clear status. Do not write records.
7. Hard stop fires at most once per budget.
8. Soft nudge fires at most once per budget.
9. Combined budget: nudge / stop fires when *either* dimension hits its threshold.
10. Status timer is idempotent (`startStatusTimer` clears any prior interval).
11. `onStopCommand` is persisted with the active record; `restore` normalises a missing field on legacy records to `null`.
12. The on-stop runner is fired once per hard stop. It does not fire on the soft nudge or on `/timebox off`.

## 8. Other adapters

Claude Code is next after the v1 pi release. See [FUTURE-CLAUDE.md](./FUTURE-CLAUDE.md) for the design sketch.

## 9. Implementation status

The pi adapter is implemented under `src/adapters/pi/` and matches §7 of this spec. Resolved during the rewrite:

- Notifications use only `"info" | "warning" | "error"` (host-supported levels).
- Parsing dead-code removed; bare numbers default to minutes only when not prefixed with `turns:`.
- Hard-stop messaging fires at the turn that breaches the limit; the abort prevents that turn from running.

## 10. Test plan

Tests use **Vitest** (matches pi-mono's runner) and run on Node 20 ESM. Implemented under `tests/`; current coverage is 100% lines / functions / statements and 96.95% branches across `src/`.

### 10.1 Core unit tests (no host)

`core/parse.test.ts`:

- `parseTimeToMs`
  - `15m` → 900_000; `30s` → 30_000; `2h` → 7_200_000; `1.5h` → 5_400_000.
  - `90` (no unit) → 5_400_000 (minutes default).
  - `15M` (uppercase) → 900_000.
  - `abc`, `""`, `15x` → `null`.
- `parseTurns`
  - `turns:5` → 5; `TURNS: 10` → 10.
  - `turns:` / `turns:abc` → `null`.
- `formatTime`
  - `0` → `0s`; `59_000` → `59s`; `60_000` → `1m 0s`; `3_600_000` → `1h 0m`; `3_661_000` → `1h 1m`.
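A formatter consistent with the `formatTime` cases above might look like the following. It is a sketch, not the code in `core/parse.ts`; rounding down to whole seconds and clamping negative input to zero are assumptions that the listed cases do not settle.

```ts
// Shows the two most significant units: "Xh Ym", "Xm Ys", or "Xs".
function formatTime(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1_000));
  const hours = Math.floor(totalSeconds / 3_600);
  const minutes = Math.floor((totalSeconds % 3_600) / 60);
  const seconds = totalSeconds % 60;

  if (hours > 0) return `${hours}h ${minutes}m`;
  if (minutes > 0) return `${minutes}m ${seconds}s`;
  return `${seconds}s`;
}
```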

`core/decide.test.ts`:

- `decide()` returns `ok` below 80%.
- Returns `nudge:warning` at exactly 80% time; `nudge:critical` at 95% turns.
- Returns `stop` at 100% time; at `turns >= turnLimit`.
- Combined budget: critical wins over warning; whichever ratio is worst sets urgency.

`core/persist.test.ts`:

- `restore([])` → `{ status: "none" }`.
- `restore([active])` with `now < startTime + timeLimitMs` → `{ status: "restored", budget }`.
- `restore([active])` past time limit → `{ status: "expired" }`.
- `restore([active, off])` → `{ status: "none" }` (off wins).
- `restore([off, active])` (active is newer) → `{ status: "restored" }`.
- `restore([active])` past 80% but under 100% → `softNudgeSent` re-derived to `true`.

### 10.2 Adapter behavioral tests (pi adapter, with fakes)

A fake `ExtensionAPI`/`ExtensionContext` records `notify`, `setStatus`, `appendEntry`, `abort`, exposes appended entries via `sessionManager.getEntries()`, lets the test fire `session_start`/`turn_start`/`before_agent_start`/`session_shutdown`, and uses an injectable clock (`Date.now` factory) so no test sleeps.
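The injectable clock and the recording UI fake might be as small as the sketch below. The factory names `createFakeClock` and `createFakeUI` are hypothetical, and the real fakes also cover `appendEntry`, `abort`, and the event hooks, which are omitted here.

```ts
type Level = "info" | "warning" | "error";

// Manual clock: tests advance time explicitly instead of sleeping.
function createFakeClock(start = 0) {
  let current = start;
  return {
    now: (): number => current,
    advance: (ms: number): void => {
      current += ms;
    },
  };
}

// Recording fake with the same notify/setStatus shape as the UIBridge in §6.
function createFakeUI() {
  const notifications: Array<{ message: string; level: Level }> = [];
  let status: string | undefined;
  return {
    notifications,
    notify(message: string, level: Level): void {
      notifications.push({ message, level });
    },
    setStatus(text: string | undefined): void {
      status = text;
    },
    currentStatus: (): string | undefined => status,
  };
}
```

A soft-nudge test would then set a 15-minute budget, call `clock.advance(12 * 60_000)`, fire `turn_start`, and assert on `ui.notifications`.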

Cases:

1. `/timebox 15m` → one `appendEntry("timebox-active", …)`, info notify, status set.
2. `/timebox turns:5` → same, `turnLimit=5`.
3. `/timebox 15m turns:3` → both limits set.
4. `/timebox 15x` → warning notify, no `appendEntry`, no state change.
5. `/timebox status` before and after setting.
6. `/timebox off` with active budget → `appendEntry("timebox-off", …)`, status cleared, info notify.
7. `/timebox off` with no budget → info notify, no `appendEntry`.
8. Soft nudge at 80% time: advance clock, fire `turn_start` → warning notify, `softNudgeSent=true`. Second `turn_start` does not re-warn.
9. Soft nudge at 80% turns: append 4 user messages on `turns:5`, fire `turn_start` → warning notify.
10. Prompt injection after nudge: `before_agent_start` returns `{ systemPrompt }` containing `IMPORTANT TIMEBOX WARNING`, time/turn detail.
11. At ≥95% the injected prompt contains `CRITICAL TIMEBOX WARNING`.
12. Hard stop on time: clock past limit → error notify, `abort` called, `active=false`, status timer stopped.
13. Hard stop on turns: append (limit+1) user messages → same as #12.
14. Restore active budget on reload: pre-seed `timebox-active`, `now < expiry` → state restored, status set, info notify.
15. Restore expired budget: `now > expiry` → warning notify, no state restored.
16. Restore respects most recent off: `[active, off]` → no restore.
17. Restore re-derives nudge at 0.85 elapsed → `softNudgeSent=true`.
18. Shutdown clears state.
19. Idempotent timer start (mock `setInterval` count).
20. Replacing budget while active resets `softNudgeSent` to false.

### 10.3 Integration smoke (manual)

Load extension into pi, set `/timebox turns:2`, send three messages, observe nudge after #2 and stop on #3. Reload mid-budget, observe restoration notification.

## 11. Out of scope (v1)

- Pause / resume.
- `/timebox extend 5m`, `/timebox add turns:3`.
- Per-tool-call budgeting.
- Token budgets.
- Confirm dialog when overwriting an active budget.
- Cross-session aggregate budgets.
