# Llamafiles Extension — Specification

> Status: **DRAFT v1** — to be reviewed before writing the implementation plan.

This document specifies the behavior of the `llamafiles` pi extension. It is the single source of truth for what the extension must and must not do. The implementation plan and the code derive from this document.

---

## 1. Purpose

[Llamafile](https://github.com/Mozilla-Ocho/llamafile) distributes local LLMs as self-contained executables that, when run with `--server`, expose an OpenAI-compatible HTTP API. Pi can already talk to OpenAI-compatible servers; what it cannot do today is **start, stop, and supervise** the local process that serves the model.

The `llamafiles` extension adds that lifecycle layer:

- The user declares llamafile-served models in `~/.pi/agent/models.json` under a `llamafiles` provider, including the executable and arguments.
- Selecting one of those models via `/model` starts the corresponding process and waits until its HTTP server is ready.
- Switching to another model — llamafile or otherwise — stops the running process.
- Quitting pi stops the running process.

The extension does **not** modify pi's request path: once the server is up, pi talks to it through the normal OpenAI-completions code path.

---

## 2. Goals and non-goals

### Goals

- G1. Process supervision: one running llamafile process per pi session at most, owned by this extension.
- G2. Declarative configuration in `~/.pi/agent/models.json`, alongside other providers.
- G3. Custom executables: support a llamafile binary run directly, a generic `llamafile` server binary fed a `.gguf` model file via `args`, or any other command the user wants.
- G4. Per-model HTTP port, defaulting to `8080`.
- G5. Readiness check before pi tries to use the server.
- G6. Adopt an already-running compatible server instead of failing.
- G7. Clean shutdown (SIGTERM, then SIGKILL after a grace period).
- G8. Visible status via a `/llamafiles` command and footer status messages.

### Non-goals

- N1. Running multiple llamafile processes concurrently. At most one is alive per pi session. Separate pi sessions can each run their own llamafile, as long as they use different ports.
- N2. Running models on remote machines (SSH, containers, etc.), at least for now. All processes are local.
- N3. Cross-session coordination. Two pi processes pointing at the same port collide; we do not arbitrate.
- N4. Downloading or installing llamafile binaries, at least for now.
- N5. Log rotation. Logs append indefinitely; users manage them out-of-band.
- N6. Automatic crash recovery / restart. If a llamafile dies after becoming ready, we surface it and stop tracking it. The user can `/model` again to restart.
- N7. GPU scheduling, memory budgeting, or model-size warnings.

---

## 3. Configuration

### 3.1 Location

- File: `~/.pi/agent/models.json` (the same file pi uses for custom models).
- Provider key: `llamafiles`.
- The extension reads this file **synchronously at extension load time** (and on `/reload`). It does not watch for changes; edits between loads do not take effect until reload.

### 3.2 Schema

Under `providers.llamafiles`:

| Field | Required | Default | Description |
|---|---|---|---|
| `baseUrl` | No | `http://localhost:8080/v1` | Display fallback only. Every model has a `port` (default `8080`), and the actual URL per model is always derived from that `port`. |
| `api` | No | `"openai-completions"` | Forwarded to pi. Other API types are not tested. |
| `apiKey` | No | `"llamafiles-not-needed"` | Forwarded to pi. Llamafile servers ignore the key. |
| `compat` | No | `{ supportsDeveloperRole: false, supportsReasoningEffort: false, maxTokensField: "max_tokens" }` | Forwarded to pi. |
| `models` | Yes | — | Array of model entries; see below. |

Each entry in `models[]`:

| Field | Required | Default | Description |
|---|---|---|---|
| `id` | Yes | — | Unique model identifier. Passed to the OpenAI API as the `model` field. Must match what `GET /v1/models` on the running server returns (see §4.3). |
| `name` | No | `id` | Human-readable label shown in `/model`. |
| `command` | Yes | — | Executable to run. Either an absolute path or a name resolved via `PATH`. May be a llamafile binary itself, the generic `llamafile` server binary, `sh`, etc. |
| `args` | No | `[]` | Argument array passed to `command`. The extension does **not** inject `--server`, model paths, or any other argument — the user supplies them explicitly. **Template substitution**: any `args` element containing the literal token `{{port}}` is replaced at spawn time with the value of the `port` field (see §5.5.1). No other tokens are substituted in v1. |
| `port` | No | `8080` | Canonical TCP port for this model. Single source of truth: used to build pi's base URL (`http://localhost:<port>/v1`), to probe for adoption, and to substitute into `{{port}}` tokens in `args`. If the binary needs a port argument, the user references `{{port}}` in `args` — they do **not** repeat the literal number. |
| `env` | No | `{}` | Extra environment variables merged on top of pi's environment for the spawned process. Existing keys are overridden by this map. |
| `cwd` | No | the pi process's `cwd` | Working directory for the spawned process. |
| `reasoning` | No | `false` | Forwarded to pi. |
| `input` | No | `["text"]` | Forwarded to pi (`["text"]` or `["text", "image"]`). |
| `contextWindow` | No | `131072` | Forwarded to pi. |
| `maxTokens` | No | `8192` | Forwarded to pi. |
| `cost` | No | all zeros | Forwarded to pi. |

Fields not listed above are passed through to pi unchanged.
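
As an illustration of the per-model table, the extension-owned fields and their defaults could be modelled as below. This is a sketch, not the extension's actual types: the names `LlamafileModelConfig`, `ResolvedModelConfig`, and `withDefaults` are hypothetical, and the example paths and the `-m` flag are illustrative only.

```typescript
// Hypothetical type names; field names and defaults follow the tables in section 3.2.
interface LlamafileModelConfig {
  id: string;
  name?: string;
  command: string;
  args?: string[];
  port?: number;
  env?: Record<string, string>;
  cwd?: string;
}

interface ResolvedModelConfig {
  id: string;
  name: string;
  command: string;
  args: string[];
  port: number;
  env: Record<string, string>;
  cwd: string;
}

const DEFAULT_PORT = 8080;

// Fill in the extension-owned defaults. `piCwd` is the pi process's cwd.
function withDefaults(model: LlamafileModelConfig, piCwd: string): ResolvedModelConfig {
  return {
    id: model.id,
    name: model.name ?? model.id,
    command: model.command,
    args: model.args ?? [],
    port: model.port ?? DEFAULT_PORT,
    env: model.env ?? {},
    cwd: model.cwd ?? piCwd,
  };
}

// Example entry: a generic server binary fed a .gguf file (illustrative paths and flags).
const example: LlamafileModelConfig = {
  id: "my-model",
  command: "llamafile",
  args: ["--server", "-m", "/models/my-model.gguf", "--port", "{{port}}"],
  port: 11434,
};
```

Note that the port appears once as a number and once as the `{{port}}` token, so changing `port` is enough to move the server.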

### 3.3 Validation rules

- The extension **does not** validate that `command` is executable or that `args` make sense; it relies on the spawn failing loudly.
- If `models.json` is missing, unparseable, or has no `providers.llamafiles.models`, the extension logs a warning and exits its factory cleanly (does not register the provider). Pi still starts normally.
- Two models with the same `id` are a configuration error. The extension keeps the first occurrence and warns about subsequent duplicates.
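
The validation rules above can be sketched as a loader that never throws. The function names (`dedupeById`, `loadModels`) and the warning texts are hypothetical; only the behavior (warn, return nothing, keep the first duplicate) comes from this section.

```typescript
import { readFileSync } from "node:fs";

type Warn = (message: string) => void;

// Keep the first occurrence of each id and warn about later duplicates.
function dedupeById<T extends { id: string }>(models: T[], warn: Warn): T[] {
  const seen = new Set<string>();
  const kept: T[] = [];
  for (const model of models) {
    if (seen.has(model.id)) {
      warn(`llamafiles: duplicate model id "${model.id}" ignored`);
      continue;
    }
    seen.add(model.id);
    kept.push(model);
  }
  return kept;
}

// Returns null (after warning) when the file is missing, unparseable, or has
// no providers.llamafiles.models. The caller then skips provider registration.
function loadModels(path: string, warn: Warn): { id: string }[] | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(readFileSync(path, "utf8"));
  } catch (err) {
    warn(`llamafiles: cannot read ${path}: ${String(err)}`);
    return null;
  }
  const models = (parsed as { providers?: { llamafiles?: { models?: unknown } } } | null)
    ?.providers?.llamafiles?.models;
  if (!Array.isArray(models) || models.length === 0) {
    warn("llamafiles: no providers.llamafiles.models found");
    return null;
  }
  return dedupeById(models as { id: string }[], warn);
}
```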

---

## 4. Provider registration

### 4.1 What gets registered

On load, the extension calls `pi.registerProvider("llamafiles", { ... })` with:

- `baseUrl`: `http://localhost:8080/v1` (display default; real URL per model below).
- `apiKey`, `api`, `compat`: from §3.2 defaults (or user override).
- `models`: one entry per configured model, with **per-model `baseUrl`** set to `http://localhost:<port>/v1`.

Custom fields (`command`, `args`, `port`, `env`, `cwd`) are kept in the extension's in-memory model map. They are **not** forwarded to pi's provider registration, because pi does not understand them.

### 4.2 Per-model base URLs

When a model has a non-default `port`, the registered model entry must override `baseUrl` so pi sends requests to the correct port. If pi's `registerProvider` accepts per-model `baseUrl` (as documented in `models.md`), the extension uses that. If during implementation we discover it does not, we surface the conflict and revisit this spec.
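
A minimal sketch of how the registration payload might be built, assuming `registerProvider` accepts a per-model `baseUrl` as §4.2 expects. The helper names (`toRegisteredModel`, `toProviderRegistration`) and the `ModelEntry` type are hypothetical; the defaults are the ones listed in §3.2.

```typescript
interface ModelEntry {
  id: string;
  name?: string;
  port?: number;
  command: string;
  args?: string[];
  env?: Record<string, string>;
  cwd?: string;
  [passthrough: string]: unknown;
}

// Fields the extension keeps to itself (section 4.1).
const EXTENSION_ONLY = ["command", "args", "port", "env", "cwd"] as const;

// Strip extension-only fields and attach the per-model baseUrl.
function toRegisteredModel(entry: ModelEntry): Record<string, unknown> {
  const forwarded: Record<string, unknown> = { ...entry };
  for (const key of EXTENSION_ONLY) delete forwarded[key];
  forwarded["name"] = entry.name ?? entry.id;
  forwarded["baseUrl"] = `http://localhost:${entry.port ?? 8080}/v1`;
  return forwarded;
}

function toProviderRegistration(models: ModelEntry[]) {
  return {
    baseUrl: "http://localhost:8080/v1", // display default only
    api: "openai-completions",
    apiKey: "llamafiles-not-needed",
    compat: {
      supportsDeveloperRole: false,
      supportsReasoningEffort: false,
      maxTokensField: "max_tokens",
    },
    models: models.map(toRegisteredModel),
  };
}
```

The extension would then pass the result to `pi.registerProvider("llamafiles", ...)`. Whether pi accepts exactly this shape is the open question §4.2 already flags.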

### 4.3 Model id ↔ server contract

The `id` declared in `models.json` is the id pi sends in the OpenAI-style `model` field of each request. The running llamafile server determines what model id it advertises in `GET /v1/models`. For adoption (§5.4 step 3) the extension matches on this id. **If they differ, adoption fails: the extension treats the server as a foreign server on the port and does not spawn a new process until the port is free.** Users are responsible for making them match.

---

## 5. Process lifecycle

### 5.1 State machine

The extension keeps at most one `running` record:

```
{ modelId, port, proc, logStream, startedAt, adopted, status: "starting" | "ready" | "exited" }
```

States:

- `null` (no process owned).
- `starting`: spawn issued, readiness not yet confirmed.
- `ready`: readiness confirmed at least once.
- `exited`: the process emitted `exit` after being `starting` or `ready`. Cleaned to `null` after notification.

### 5.2 Triggering events

The extension reacts to:

- `session_start` (any `reason`). Used to start/adopt a llamafile model that is already the active model at session boot (§5.3).
- `model_select` with `event.source ∈ {"set", "cycle", "restore"}`. All three trigger the same handler (§5.4).
- `session_shutdown` (for any `reason`). See §5.6.

### 5.3 Session start

On `session_start`, the extension inspects the currently active model via `ctx.model` (or equivalent). If `ctx.model?.provider === "llamafiles"`, it runs the same flow as model selection (§5.4) against `ctx.model.id`. This covers the "pi launched with a llamafile model preselected" case, including:

- Fresh launch with the last-used model restored from settings.
- `/resume` or `/fork` that restores a llamafile model.

If the active model is not a llamafiles model, this handler does nothing.

If `model_select` fires immediately after `session_start` for the same model (because pi also emits it on restore), the handler in §5.4 is a no-op (running record matches and a health probe succeeds).

### 5.4 Selecting a llamafile model

On `model_select` where `event.model.provider === "llamafiles"`:

1. Look up the model config by `event.model.id`. If not found → notify error, return.
2. If `running` is non-null:
   - If `running.modelId === modelId` and a fresh health probe succeeds → notify "already running", return.
   - Otherwise → stop the running process (§5.6.1) before continuing. The mid-session stop does not prompt (we own it); see §5.6.1 for adopted-process rules.
3. **Adoption probe**: `GET http://localhost:<port>/v1/models` with a 1-second timeout.
   - On success, parse the JSON and look for an entry whose `id === modelId`.
     - If found → set `running = { adopted: true, modelId, port, proc: null, ... }`, mark `ready`, notify "adopted existing server on port `<port>`", return.
     - If not found → a foreign server is on the port. Prompt the user (`ctx.ui.confirm`, default **No**): *"Port `<port>` is in use by a different server (advertises `<otherId>`). Stop it and start `<name>`?"*
       - If the user declines, or `ctx.hasUI` is false (print/JSON mode): notify error, do not kill, return.
       - If the user accepts: we still cannot send signals to a process we did not spawn through pi. We cannot reliably identify and kill an unknown PID listening on a port without additional permissions and platform-specific lookups (`lsof`, `ss`, `netstat`), and killing the wrong process is dangerous. Therefore the prompt's positive branch is **also a no-op in v1**: it surfaces an error explaining the user must free the port manually (e.g., `kill $(lsof -t -i:<port>)`) and re-select the model. The prompt exists so the user understands the situation; it does not authorize an opaque kill.
   - On connection refused / timeout → continue to spawn (§5.5).
4. Spawn (§5.5), wait for readiness (§5.7). Both work together: §5.5 launches the process, §5.7 polls until ready or rejects.
5. On success: notify "ready", set footer status.
6. On failure: clear footer status, notify error including log path.
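
The adoption probe in step 3 can be sketched as a three-way classification. The names `ProbeOutcome`, `classifyModelsBody`, and `probeForAdoption` are hypothetical. Treating a non-2xx or non-JSON answer as a foreign server is an assumption of this sketch; the steps above only define the success and connection-failure cases.

```typescript
type ProbeOutcome =
  | { kind: "adopt" }
  | { kind: "foreign"; otherIds: string[] }
  | { kind: "free" };

// Classify a parsed /v1/models body ({ data: [{ id }] }) against the configured id.
function classifyModelsBody(body: unknown, modelId: string): ProbeOutcome {
  const data = (body as { data?: unknown } | null)?.data;
  const ids = Array.isArray(data)
    ? data
        .map((entry) => (entry as { id?: unknown } | null)?.id)
        .filter((id): id is string => typeof id === "string")
    : [];
  return ids.includes(modelId) ? { kind: "adopt" } : { kind: "foreign", otherIds: ids };
}

async function probeForAdoption(port: number, modelId: string): Promise<ProbeOutcome> {
  let res: Response;
  try {
    res = await fetch(`http://localhost:${port}/v1/models`, {
      signal: AbortSignal.timeout(1000), // 1-second timeout from step 3
    });
  } catch {
    // Connection refused or timeout: nothing is listening, continue to spawn.
    return { kind: "free" };
  }
  try {
    if (!res.ok) return { kind: "foreign", otherIds: [] }; // assumption, see lead-in
    return classifyModelsBody(await res.json(), modelId);
  } catch {
    return { kind: "foreign", otherIds: [] }; // answered, but not with JSON
  }
}
```

Keeping the classification pure makes the "same id", "different id", and "garbage body" branches easy to unit-test without opening a socket.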

### 5.5 Spawning

#### 5.5.1 Argument templating

Before spawning, the extension transforms `config.args` into a `resolvedArgs` array:

- Each element is processed independently.
- Within each element, every literal occurrence of the substring `{{port}}` is replaced with `String(config.port)`. Substitution is by exact match on `{{port}}` (no whitespace tolerance, no escaping syntax) and is applied to the whole string element, so both the single element `--port={{port}}` and the two-element pair `--port`, `{{port}}` work.
- Elements that do not contain `{{port}}` are left unchanged.
- No other tokens (`{{cwd}}`, `{{modelId}}`, etc.) are substituted in v1. An args element containing an unrecognized `{{…}}` is **not** an error — it is passed through verbatim.

The extension does not inspect `resolvedArgs` further. In particular, it does not verify that the binary will actually listen on `config.port`. Mismatches between the declared `port` and what the binary ends up listening on surface naturally as readiness-probe failures.
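
The templating rules above fit in a few lines. This is a sketch with a hypothetical function name (`resolveArgs`); it uses `split`/`join` so that every occurrence in an element is replaced without any regular-expression escaping.

```typescript
const PORT_TOKEN = "{{port}}";

// Replace every literal "{{port}}" in every element; leave everything else verbatim.
function resolveArgs(args: readonly string[], port: number): string[] {
  return args.map((arg) => arg.split(PORT_TOKEN).join(String(port)));
}
```

For example, `resolveArgs(["--server", "--port", "{{port}}"], 11434)` yields `["--server", "--port", "11434"]`, while `"{{ port }}"` and `"{{cwd}}"` pass through unchanged.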

#### 5.5.2 Spawn

- Use `node:child_process.spawn(config.command, resolvedArgs, { cwd, env, stdio: ["ignore", "pipe", "pipe"] })`.
- `env` = `{ ...process.env, ...config.env }`.
- `cwd` defaults to the current pi cwd if the config omits it.
- The `proc.stdout` and `proc.stderr` streams are piped to the log file (§7).
- The extension stores `running = { modelId, port, proc, logStream, startedAt: Date.now(), status: "starting", adopted: false }`.
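
A minimal sketch of the spawn step, assuming the caller has already resolved `args` (§5.5.1) and chosen the log directory (§7). The function name `startProcess` and the `SpawnConfig` type are hypothetical, and the log header line is omitted for brevity.

```typescript
import { spawn } from "node:child_process";
import { createWriteStream, mkdirSync } from "node:fs";
import { join } from "node:path";

interface SpawnConfig {
  id: string;
  command: string;
  port: number;
  env: Record<string, string>;
  cwd?: string;
}

function startProcess(config: SpawnConfig, resolvedArgs: string[], logDir: string) {
  mkdirSync(logDir, { recursive: true });
  // Append mode: each start adds to the same <modelId>.log file.
  const logStream = createWriteStream(join(logDir, `${config.id}.log`), { flags: "a" });

  const proc = spawn(config.command, resolvedArgs, {
    cwd: config.cwd ?? process.cwd(),
    env: { ...process.env, ...config.env }, // config.env wins on conflicts
    stdio: ["ignore", "pipe", "pipe"],
  });
  // Keep the log open after either stream ends so both can be interleaved.
  proc.stdout.pipe(logStream, { end: false });
  proc.stderr.pipe(logStream, { end: false });
  // A failed spawn (e.g. command not found) surfaces as an `error` event.
  proc.once("error", (err) => logStream.write(`spawn error: ${err.message}\n`));

  return {
    modelId: config.id,
    port: config.port,
    proc,
    logStream,
    startedAt: Date.now(),
    status: "starting" as const,
    adopted: false,
  };
}
```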

### 5.6 Stopping

#### 5.6.1 Mid-session stops (switching model, or replacing a stale record)

Triggered when:

- A different model (any provider) is selected and `running` is non-null.
- A new `model_select` for the same llamafiles provider but a different `modelId`.
- A new start needs to replace a `running` record whose process has exited.

No prompt. The user signalled intent by changing the model.

1. If `running.adopted === true`: just clear `running`. We did not start it; we will not stop it.
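2. Else if `proc.exitCode !== null` or `proc.signalCode !== null`: the process has already exited; clear `running`. (`proc.killed` is not a reliable check here: in Node it only records that a signal was sent, not that the process has terminated.)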
3. Else:
   1. Send `SIGTERM`.
   2. Wait for `exit` up to **5 seconds**.
   3. If still alive, send `SIGKILL` and resolve regardless.
4. After exit, the log stream is flushed and closed.
5. `running` is set to `null`.
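
Step 3 of the procedure above (SIGTERM, grace period, SIGKILL) could look like the sketch below. `StoppableProcess` is a hypothetical structural subset of Node's `ChildProcess`, chosen so the sequence can be tested with a fake; the function name and the outcome labels are assumptions too.

```typescript
interface StoppableProcess {
  exitCode: number | null;
  signalCode: string | null;
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  once(event: "exit", listener: () => void): unknown;
}

type StopOutcome = "already-exited" | "graceful" | "forced";

// Send SIGTERM, wait up to graceMs for `exit`, then escalate to SIGKILL.
function stopProcess(proc: StoppableProcess, graceMs = 5000): Promise<StopOutcome> {
  if (proc.exitCode !== null || proc.signalCode !== null) {
    return Promise.resolve<StopOutcome>("already-exited");
  }
  return new Promise<StopOutcome>((resolve) => {
    const timer = setTimeout(() => {
      proc.kill("SIGKILL");
      resolve("forced"); // resolve regardless of whether `exit` ever arrives
    }, graceMs);
    proc.once("exit", () => {
      clearTimeout(timer);
      resolve("graceful"); // no-op if the promise already resolved as "forced"
    });
    proc.kill("SIGTERM");
  });
}
```

The `"forced"` outcome is what the quit handler in §5.6.2 would use to choose between the two final stderr lines.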

#### 5.6.2 `session_shutdown`

Triggered when pi is exiting or replacing its extension runtime (`reason` ∈ `{"quit","reload","new","resume","fork"}`). Behavior depends on `reason`.

**`reason === "quit"`** — the user is leaving pi:

1. If `running` is null: nothing to do.
2. If `running.adopted === true`: clear tracking silently. We never stop processes we adopted; the user manages those out-of-band.
3. Else, **always stop** the owned process. There is no prompt. Pi's interactive mode tears down the TUI **before** firing `session_shutdown` (see [NOTES.md](NOTES.md) for the exact code path), so any `ctx.ui.confirm()` call here has no rendering surface and cannot interact with the user. A prompt is therefore not possible.

   To give the user visibility into the wait, the handler writes a progress line to `process.stderr` immediately before stopping, and a confirmation line after. `stderr` is used so RPC mode's JSONL `stdout` channel is not polluted:

   ```
   Stopping llamafile "<name>" (pid <pid>)...
   Stopped llamafile "<name>".
   ```

   If the stop sequence escalates from `SIGTERM` to `SIGKILL` (§5.6.1 step 3), the final line reflects that:

   ```
   Stopped llamafile "<name>" (forced after 5s).
   ```

   No progress line is printed for `adopted` records.

**`reason` ∈ `{"reload","new","resume","fork"}`** — the extension runtime is being torn down but pi continues:

1. Do **not** stop the process.
2. Close the log stream and clear in-memory tracking. The OS process keeps running.
3. The freshly loaded extension instance picks the process back up on its next `session_start` via the adoption fast-path (§5.4 step 3, "same id" branch), so the user transparently keeps their model warm across `/reload`, `/new`, `/resume`, and `/fork`.

Rationale: these reasons are transient teardown events from pi's perspective, not "the user is done with this model" signals. Stopping the process at every `/reload` would be a UX regression. Adoption on the other side closes the loop.

> **Why no confirm?** This spec previously defined a "Stop llamafile? [Y/n]" prompt on quit. Implementation found it impossible to render: pi calls `ui.stop()` (TUI teardown) *before* emitting `session_shutdown`, so any dialog is invisible and the only thing the user perceives is a long silent pause. The visible-stderr-progress design preserves the original intent ("the user knows what is happening") and is honest about the constraint we cannot work around without upstream changes to pi.
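
The branching in this subsection reduces to a small decision function plus three fixed stderr lines. The sketch below uses hypothetical names (`shutdownAction`, `stoppingLine`, `stoppedLine`); the reasons and the line formats are the ones specified above.

```typescript
type ShutdownReason = "quit" | "reload" | "new" | "resume" | "fork";
type ShutdownAction = "nothing" | "clear-tracking" | "detach" | "stop";

interface Tracked {
  adopted: boolean;
}

function shutdownAction(reason: ShutdownReason, running: Tracked | null): ShutdownAction {
  if (running === null) return "nothing";
  // Non-quit reasons: close the log stream, forget the record, leave the process alive.
  if (reason !== "quit") return "detach";
  // Quit: never stop adopted servers, always stop owned ones.
  return running.adopted ? "clear-tracking" : "stop";
}

function stoppingLine(name: string, pid: number): string {
  return `Stopping llamafile "${name}" (pid ${pid})...\n`;
}

function stoppedLine(name: string, forced: boolean): string {
  return forced
    ? `Stopped llamafile "${name}" (forced after 5s).\n`
    : `Stopped llamafile "${name}".\n`;
}
```

A handler would write `stoppingLine(...)` to `process.stderr`, run the stop sequence from §5.6.1, then write `stoppedLine(...)`.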

### 5.7 Readiness check

After spawning, the extension polls `GET http://localhost:<port>/v1/models` every **1 second** until one of:

- The response is HTTP 2xx → mark `ready`, resolve.
- The process emits `exit`, or `error` when the spawn itself fails (e.g., `command` not found), before ready → reject with exit code/signal or error message, and log path.
- 120 seconds elapse → kill the process and reject with a timeout error.

The readiness path is fixed (`/v1/models`); we are not making it configurable in v1.
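
A sketch of the poll loop, with the HTTP probe and the exit check injected so the loop itself stays testable. The names (`waitForReady`, `WaitOptions`, `probeModels`) are hypothetical. Killing the process on timeout is left to the caller, and the optional `signal` covers the abort case in §5.9.

```typescript
interface WaitOptions {
  probe: () => Promise<boolean>; // true when GET /v1/models returned 2xx
  hasExited: () => boolean;      // true once the child emitted `exit` or `error`
  intervalMs?: number;           // spec default: 1000
  timeoutMs?: number;            // spec default: 120000
  signal?: AbortSignal;          // ctx.signal, when pi provides one
}

async function waitForReady(opts: WaitOptions): Promise<void> {
  const intervalMs = opts.intervalMs ?? 1000;
  const deadline = Date.now() + (opts.timeoutMs ?? 120_000);
  for (;;) {
    if (opts.signal?.aborted) throw new Error("startup aborted");
    if (opts.hasExited()) throw new Error("process exited before becoming ready");
    if (await opts.probe()) return;
    if (Date.now() >= deadline) throw new Error("timed out waiting for readiness");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// One possible probe: any 2xx on /v1/models counts as ready.
async function probeModels(port: number): Promise<boolean> {
  try {
    const res = await fetch(`http://localhost:${port}/v1/models`);
    return res.ok;
  } catch {
    return false; // not listening yet
  }
}
```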

### 5.8 Crash after ready

If `proc` emits `exit` while `running.status === "ready"`:

- Notify error including exit code/signal and log path.
- Set `running = null`.
- Do **not** auto-restart. The user must `/model` again.

### 5.9 Abort signals

If pi cancels (e.g., user presses Esc during a startup that runs inside `model_select`):

- The readiness poll loop should observe `ctx.signal` when available and abort.
- The spawned process is killed via the mid-session stop procedure (§5.6.1).
- The handler returns without notifying success.

---

## 6. UI surface

### 6.1 Notifications

- `info`: "Already running: `<name>`" when we already track this model and a fresh health probe succeeds.
- `info`: "Adopted existing `<name>` on port `<port>`" when an external server advertises the same id.
- `success`: "`<name>` is ready" after a fresh start.
- `error`: any failure (config lookup, foreign server on port, spawn error, timeout, post-ready crash). Include the log path when one exists.

### 6.2 Dialogs

- Foreign server on port (§5.4 step 3, "not found" branch): `confirm("Port already in use", "Port <port> is in use by a different server (advertises <otherId>). Stop it and start <name>?")`. Default **No**. v1 outcome regardless of answer: notify the user to free the port manually; do not kill.
- **No dialogs on shutdown.** See §5.6.2 for why a confirm cannot render here. The handler writes progress lines to `process.stderr` instead.

#### 6.2.1 Stderr progress (shutdown only)

The shutdown handler is the one place in the extension that writes to `process.stderr` directly rather than through `ctx.ui`. The reason is §5.6.2: the TUI is already gone. Exact lines:

- Before stopping: `Stopping llamafile "<name>" (pid <pid>)...\n`
- After graceful stop: `Stopped llamafile "<name>".\n`
- After SIGKILL escalation: `Stopped llamafile "<name>" (forced after 5s).\n`

No other lifecycle event writes to `process.stderr` from this extension.

### 6.3 Footer status

- While `starting`: `Starting <name>...`
- While `ready`: `<name> running`
- On any failure or after stop: cleared.

### 6.4 `/llamafiles` command

Prints one of:

- `No llamafile server is running`.
- `Running: <name> (id=<modelId>, port=<port>, pid=<pid|adopted>, status=<starting|ready>)`.

No arguments. No subcommands in v1.
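
The two output forms can be produced by one formatter. This is a sketch: `StatusRecord` and `formatStatus` are hypothetical names, and representing an adopted server as `pid: null` is an assumption consistent with `proc: null` in §5.4.

```typescript
interface StatusRecord {
  name: string;
  modelId: string;
  port: number;
  pid: number | null; // null for adopted servers
  status: "starting" | "ready";
}

function formatStatus(running: StatusRecord | null): string {
  if (running === null) return "No llamafile server is running";
  const pid = running.pid === null ? "adopted" : String(running.pid);
  return `Running: ${running.name} (id=${running.modelId}, port=${running.port}, pid=${pid}, status=${running.status})`;
}
```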

---

## 7. Logging

- Directory: `~/.pi/llamafile_logs/` (created on first write, recursive `mkdir`).
- File: `<modelId>.log`, opened with append mode each start.
- Header line per start: `--- <ISO timestamp> Starting <name> ---`.
- Footer line on exit: `--- <ISO timestamp> Process exited (code=<code>, signal=<signal>) ---`.
- Both `stdout` and `stderr` of the spawned process are interleaved into the log.
- No rotation, no size cap. The user is expected to clear them manually.
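
For reference, the header and footer lines could be produced as below. The helper names are hypothetical; rendering a missing exit code or signal as `null` is an assumption, since the format above does not say how absent values appear.

```typescript
// Header written each time a process is started.
function logHeader(name: string, at: Date): string {
  return `--- ${at.toISOString()} Starting ${name} ---\n`;
}

// Footer written when the process emits `exit`.
function logFooter(code: number | null, signal: string | null, at: Date): string {
  return `--- ${at.toISOString()} Process exited (code=${code}, signal=${signal}) ---\n`;
}
```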

---

## 8. Errors and edge cases

| Case | Behavior |
|---|---|
| `models.json` missing / invalid | Warn, do not register the provider, do not crash pi. |
| `providers.llamafiles.models` empty/missing | Same as above. |
| Duplicate `id` within models | Keep first, warn on duplicates. |
| `command` not found / not executable | Spawn raises `error` event; readiness rejects; notify with log path. |
| Port already in use, different model | Adoption probe sees mismatched `id`; prompt user (default No); v1 surfaces an error and never kills the port owner. |
| Port already in use, same model | Adoption succeeds; we treat it as `ready` and do not own the process. |
| Process never becomes ready (120s) | Kill, notify timeout, log path. |
| Process exits during startup | Reject readiness, notify, clear `running`. |
| Process exits after ready | Notify, clear `running`, no auto-restart. |
| User selects same model again | Health probe; if healthy, no-op with `info`. |
| User selects non-llamafile model while a llamafile is running | Stop the running process (no prompt mid-session); no other state changes. |
| Pi `quit` while we own a running process | Always stop. Print "Stopping llamafile ..." and "Stopped llamafile ..." to stderr. No prompt (pi's TUI is gone before the handler runs). |
| Pi `quit` while we have an adopted record | Clear tracking silently, never stop. |
| `/reload`, `/new`, `/resume`, `/fork` while we own a running process | No prompt. Detach silently; the new extension instance re-adopts via `session_start`. |
| `session_shutdown` (`quit`) while `starting` | Treat as "we own it"; print stderr progress and stop. May interrupt a still-loading model. |
| `session_shutdown` (non-`quit`) while `starting` | Detach silently. The process keeps loading. The next instance will adopt it once readiness completes. |
| Two pi instances with the same port | Out of scope; behavior is undefined and noisy. |
| Session start with a llamafile model preselected | Run the start/adopt flow (§5.3). |
| Session start with a llamafile model whose process is already healthy on the right port | Adoption fast-path (§5.4 step 3). |

---

## 9. Dependencies and packaging

- Imports from `@earendil-works/pi-coding-agent` (correct current package name). The existing file uses the older `@mariozechner/...` name and must be updated.
- Node built-ins only (`node:child_process`, `node:fs`, `node:os`, `node:path`).
- No new npm dependencies.
- `package.json` declares `@earendil-works/pi-coding-agent` as a peer dependency.

---

## 10. Out of scope (v1)

- Configurable readiness path or method (e.g., POST).
- Multiple concurrent llamafile processes.
- Remote (SSH) llamafile hosts.
- Hot-reload of `models.json` without `/reload`.
- Auto-restart on crash.
- Log rotation.
- Progress indication during readiness wait beyond the footer status (no spinner, no estimated time).
- `keepWarm: true` per-model flag.
- Killing the foreign process that occupies a model's port. We surface the conflict and instruct the user; we do not run `lsof`/`kill` for them.

---

## 11. Acceptance criteria

The extension is considered complete for v1 when, with a `models.json` containing two `llamafiles` models on different ports:

1. `pi --list-models` shows both models under the `llamafiles` provider.
2. Selecting model A via `/model` starts the process, footer reads `Starting A...` then `A running`, and pi can complete a chat turn against it.
3. Selecting model B mid-session stops A's process (no prompt), starts B's, and the chat continues against B.
4. Selecting `anthropic/<something>` mid-session stops B's process; `/llamafiles` reports nothing running.
5. Quitting pi (`Ctrl+D` or `/quit`) while model B is running prints `Stopping llamafile "B" (pid N)...` and `Stopped llamafile "B".` to stderr, then exits with no orphaned process.
5a. Running `/reload` while model B is running does **not** prompt; B keeps running; after the reload, `/llamafiles` again reports B as running (adopted).
6. Launching pi with model A preselected (e.g., previous session) auto-starts A on session boot, without an explicit `/model`.
7. Launching pi with model A preselected while A's server is already running externally results in adoption — no second process is spawned, and pi can complete a chat turn.
8. Manually starting model A's server outside pi, then selecting A inside pi, also results in adoption.
9. Starting an unrelated server on model A's port, then selecting A, prompts the user; v1 always declines to kill and surfaces a clear "free the port" error regardless of the answer.
10. Quitting pi while only an adopted server is tracked does not stop that server and writes nothing to stderr.
11. A bad `command` results in an error notification with a log path, `/llamafiles` reports nothing running, and pi remains usable.
12. A model declared with `port: 11434` and `args: ["...", "--server", "--port", "{{port}}"]` is spawned with `--port 11434`, becomes ready on `http://localhost:11434/v1`, and pi can chat with it. Changing the model's declared `port` to `11500` without editing `args` is sufficient to make the next start use port `11500` end-to-end.

---

## 12. Testing

This section defines how we mechanically verify §11. It is part of the spec because untested acceptance criteria are aspirational, not enforceable.

### 12.1 Goals

- Every numbered criterion in §11 is covered by at least one automated test.
- Tests run without a real llamafile binary, without a GPU, and without network access beyond `localhost`.
- The full suite (unit + integration) completes in under ~30 seconds on a developer machine.

### 12.2 Three layers

**Unit (pure logic, in-process, fast).** vitest. No pi runtime, no spawn, no network. Targets:
- `template.test.ts` — `{{port}}` substitution: present, absent, repeated, multiple per element, unknown `{{…}}` passes through, whitespace-sensitive match.
- `config.test.ts` — `models.json` parsing: missing file, malformed JSON, empty/missing `providers.llamafiles.models`, duplicate `id` (keep first + warn), default-filling.
- `stop-sequence.test.ts` — SIGTERM → 5s grace → SIGKILL, state transitions on each branch. Use vitest fake timers + a spied process.

**Integration (real extension + real pi + fake server).** vitest. Each test loads the extension into a live pi session and drives a synthetic llamafile.

**Manual smoke (one-shot checklist).** `tests/MANUAL.md`. A short list of things automation can't reach: a real llamafile binary actually serving a chat, terminal rendering of footer status and confirm wording.

### 12.3 Fake server

`tests/helpers/fake-server.ts` — a Node.js script (~80 lines) we spawn instead of a llamafile binary. It is the only piece of the harness that opens a socket.

CLI:

```
fake-server --port <n> --id <s>
            [--startup-delay-ms <n>]    # simulate slow boot
            [--never-listen]            # never bind (test 120s timeout)
            [--exit-on-startup <code>]  # spawn-then-die before listening
            [--die-after-ms <n>]        # crash after becoming ready
```

Endpoints:
- `GET /v1/models` → `{ "data": [{ "id": <id> }] }`
- `POST /v1/chat/completions` → minimal valid OpenAI Chat Completions response (single `assistant` message, fixed text).

Signals: clean exit on SIGTERM with `process.exit(0)`; no special SIGKILL handling (the OS forces it).
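
A reduced sketch of the fake server's core, covering only the two endpoints and the SIGTERM behavior. CLI parsing and the delay/crash flags are omitted. The names `route` and `startFakeServer` and the fixed response fields are illustrative, not the actual helper.

```typescript
import { createServer } from "node:http";

interface FakeResponse {
  status: number;
  body: unknown;
}

// Pure routing so the endpoint contract can be tested without a socket.
function route(method: string, url: string, id: string): FakeResponse {
  if (method === "GET" && url === "/v1/models") {
    return { status: 200, body: { data: [{ id }] } };
  }
  if (method === "POST" && url === "/v1/chat/completions") {
    return {
      status: 200,
      body: {
        id: "chatcmpl-fake",
        object: "chat.completion",
        created: 0,
        model: id,
        choices: [
          { index: 0, message: { role: "assistant", content: "ok" }, finish_reason: "stop" },
        ],
      },
    };
  }
  return { status: 404, body: { error: "not found" } };
}

function startFakeServer(port: number, id: string) {
  const server = createServer((req, res) => {
    req.resume(); // the request body is ignored
    const { status, body } = route(req.method ?? "", req.url ?? "", id);
    res.writeHead(status, { "content-type": "application/json" });
    res.end(JSON.stringify(body));
  });
  server.listen(port, "localhost");
  process.once("SIGTERM", () => process.exit(0)); // clean exit, as specified
  return server;
}
```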

### 12.4 Harness split: SDK vs RPC

Two automated modes were planned; v1 uses only the SDK harness. The split is by which path we want to exercise:

| Mode | When used | Why |
|---|---|---|
| **SDK** (`createAgentSession`) | All integration tests in v1 | In-process, fast, dialogs answered by overriding the UI host. Direct access to the registered provider/models, notification stream, and lifecycle. |
| ~~**RPC**~~ | Originally planned for §11.5/§11.5a; **dropped in implementation** | Spawning `pi --mode rpc` inside vitest's worker pool reliably produced spurious "Worker exited unexpectedly" errors from tinypool. The behavior actually under test is the extension's `session_shutdown` handler, not pi's CLI signal plumbing, so SDK is sufficient. |
| **CLI shellout** | Reserved for manual smoke (see `tests/MANUAL.md`) | Verifies pi's `--list-models` output literally, not the API surface. Equivalent automated coverage is in `tests/integration/list-models.test.ts`. |

#### 12.4.1 SDK harness

`tests/helpers/pi-session.ts` wraps the boilerplate:

1. Creates a temp HOME directory (`HOME=<tmp>` for the duration of the test).
2. Writes `<tmp>/.pi/agent/models.json` referencing `fake-server` with a free port allocated from the `30000–40000` range (retry on `EADDRINUSE`).
3. Calls `createAgentSession({ extensions: [<path-to-extension>], sessionManager: SessionManager.inMemory(), ... })`.
4. Replaces the UI host so `confirm`/`notify`/`setStatus` are captured into an in-memory log the test can assert against, and so `confirm` answers are scripted per test (`harness.scriptConfirm({ "Port already in use": "yes" })`).
5. Exposes assertions: `harness.notifications`, `harness.probe(port)`, `harness.processAlive(pid)`, `harness.activeModel()`.
6. Cleanup hook: kill any process still listening on the test port; remove temp HOME.
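
Step 2's port allocation might be sketched as follows. The helper names are hypothetical. The check is inherently racy (another process can grab the port between the check and the fake server's own bind), which is why the fake server still confirms its bound port.

```typescript
import { createServer } from "node:net";

// Resolve true if we could bind the port, false on EADDRINUSE.
function tryListen(port: number): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const server = createServer();
    server.once("error", (err: Error & { code?: string }) => {
      if (err.code === "EADDRINUSE") resolve(false);
      else reject(err);
    });
    server.listen(port, "127.0.0.1", () => {
      server.close(() => resolve(true));
    });
  });
}

// Pick random ports in [min, max] until one binds, up to `attempts` tries.
async function findFreePort(min = 30000, max = 40000, attempts = 20): Promise<number> {
  for (let i = 0; i < attempts; i++) {
    const port = min + Math.floor(Math.random() * (max - min + 1));
    if (await tryListen(port)) return port;
  }
  throw new Error(`no free port found in ${min}-${max} after ${attempts} attempts`);
}
```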

#### 12.4.2 RPC harness (planned, not used in v1)

`tests/helpers/pi-rpc.ts` was designed to spawn `pi --mode rpc` against the same temp-HOME `models.json` and speak JSONL on stdin/stdout. It is not part of the v1 suite, for the reason given in the table in §12.4. As designed, it would:
- Send commands (`set_model`, `prompt`, etc.).
- Read events, including extension-UI confirm requests, and answer them per the test script.
- Capture all event types into a typed buffer the test asserts on.

### 12.5 Test inventory

`tests/unit/` — three files listed in §12.2.

`tests/integration/` — one file per acceptance criterion, driver selected per row:

| File | §11 criterion | Driver |
|---|---|---|
| `list-models.test.ts` | 1 | SDK |
| `start-fresh.test.ts` | 2 | SDK |
| `switch-llamafile.test.ts` | 3 | SDK |
| `switch-away.test.ts` | 4 | SDK |
| `quit-prompt.test.ts` | 5 | SDK |
| `reload-no-prompt.test.ts` | 5a | SDK |
| `preselect-start.test.ts` | 6 | SDK |
| `preselect-adopt.test.ts` | 7 | SDK |
| `adopt-running.test.ts` | 8 | SDK |
| `foreign-port.test.ts` | 9 | SDK |
| `adopt-no-stop-on-quit.test.ts` | 10 | SDK |
| `bad-command.test.ts` | 11 | SDK |
| `template-port.test.ts` | 12 | SDK |

All thirteen integration tests use the SDK driver. The shutdown and reload tests directly call into the agent runtime to emit the relevant session events; verifying that pi's CLI subsequently calls those code paths is left to the manual smoke checklist (since it's pi's contract, not this extension's).

Each integration test asserts on three observables: the notification stream, an HTTP probe of the expected port, and OS process liveness (`process.kill(pid, 0)`).
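
The process-liveness observable relies on signal `0`, which performs the existence check without delivering a signal. A minimal sketch, with a hypothetical helper name:

```typescript
// True if a process with this pid exists. Signal 0 only performs the check.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we may not signal it. ESRCH: no such process.
    return (err as { code?: string }).code === "EPERM";
  }
}
```

A pid can be reused by the OS after the original process exits, so tests should check liveness soon after the event they assert on.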

### 12.6 Tooling

- `vitest` ≥ 1.x, the same test runner pi's monorepo uses.
- `package.json` additions:
  - `"test": "vitest run"`
  - `"test:watch": "vitest"`
  - Existing `"dev": "pi -e ."` preserved for manual exercising.
- `tsconfig.json` for the extension if not present.
- `vitest.config.ts` with a 30s per-test timeout (integration tests need it; startup polls can take a couple of seconds).

### 12.7 CI considerations

- No real llamafile, no GPU, no internet.
- Free ports allocated from `30000–40000` with retry on `EADDRINUSE`. The fake server prints its bound port on stdout for the test to confirm.
- Each test must clean up: kill any orphan fake server, remove temp HOME. Use vitest `afterEach` and a process-level `beforeExit` safety net.
- The suite is the source of truth for §11 and is meant to run in CI on every PR. Wiring up CI itself is out of scope here.


---

## 13. Schema compatibility with pi's `models.json` parser

This extension extends pi's `models.json` with five custom per-model fields (`command`, `args`, `port`, `env`, `cwd`). This section documents how that interacts with pi's built-in `models.json` parser, why it works today, and what could break it in the future.

### 13.1 Where pi's schema lives

Pi defines the schema in `packages/coding-agent/src/core/model-registry.ts`:

- **`ModelDefinitionSchema`** (~line 140): per-model fields pi knows about — `id`, `name?`, `api?`, `baseUrl?`, `reasoning?`, `thinkingLevelMap?`, `input?`, `cost?`, `contextWindow?`, `maxTokens?`, `headers?`, `compat?`.
- **`ProviderConfigSchema`** (~line 184): per-provider fields — `name?`, `baseUrl?`, `apiKey?`, `api?`, `headers?`, `compat?`, `authHeader?`, `models?`, `modelOverrides?`.
- **`ModelsConfigSchema`** (~line 196): the top-level `{ providers: Record<string, ProviderConfigSchema> }`.

All three are TypeBox schemas. Validation is `Compile(ModelsConfigSchema).Check(parsed)` (~line 463), and a failure rejects the **entire** `models.json` with an error — every provider in the file becomes unavailable.

### 13.2 Why our extra fields pass today

TypeBox's `Type.Object({...})` is **permissive by default**: it does not emit `additionalProperties: false`, so unknown keys are silently accepted. Our `command`, `args`, `port`, `env`, `cwd` ride through validation unmolested.

After validation, pi's `parseModels()` (~line 558) constructs each `Model` object by **explicitly reading only the known fields**. Our extras are not copied into the registered models — they exist only in the in-memory parse tree and are then garbage-collected. Pi's registry never sees them.
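
The stripping effect can be illustrated with a small sketch. It is not pi's `parseModels()`: `KnownModelFields` is a hypothetical subset of the fields listed in 13.1, and `toRegisteredModel` is an invented name. The point is only that building an object from named fields drops everything else.

```typescript
// Hypothetical subset of the per-model fields pi knows about (see 13.1).
interface KnownModelFields {
  id: string;
  name?: string;
  baseUrl?: string;
  contextWindow?: number;
}

// Build a fresh object from named fields only; unknown keys are never copied.
function toRegisteredModel(raw: Record<string, unknown>): KnownModelFields {
  const model: KnownModelFields = { id: String(raw.id) };
  if (typeof raw.name === "string") model.name = raw.name;
  if (typeof raw.baseUrl === "string") model.baseUrl = raw.baseUrl;
  if (typeof raw.contextWindow === "number") model.contextWindow = raw.contextWindow;
  return model;
}

// A models.json entry carrying our extras (values are made up for illustration).
const rawEntry = {
  id: "example-model",
  name: "Example",
  command: "/path/to/model.llamafile",
  args: ["--server"],
  port: 8081,
};

// Only `id` and `name` survive; `command`, `args`, and `port` are left behind.
const registered = toRegisteredModel(rawEntry);
```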

### 13.3 How `pi.registerProvider("llamafiles", ...)` reconciles

When pi loads `models.json` at startup, it registers a `llamafiles` provider from the user's config. Because `parseModels()` ignores our custom fields, any model it registers takes its `baseUrl` from the single `providerConfig.baseUrl`, so every model points at the same address. That registration is briefly "wrong": it has no per-model port awareness.

Then our extension's factory runs and calls `pi.registerProvider("llamafiles", { models: [...] })`. `applyProviderConfig` (~lines 880-883) treats this as a **full replacement**: it filters out existing `llamafiles` models from the registry and installs ours with the correct per-model `baseUrl`. From that point on, the registry holds only our authoritative entries.

So the lifecycle is:

```
pi startup
  ├─ load models.json (TypeBox validates, extras pass)
  ├─ parseModels() → register `llamafiles` with stripped fields  (state A)
  └─ load extensions
       └─ our factory → pi.registerProvider("llamafiles", ...)   (state B replaces A)
```

State A is invisible to the user because nothing queries the registry between A and B.
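
The A-to-B transition can be modeled with a toy registry. This is a sketch of the replacement semantics only, not pi's `applyProviderConfig`: the `Registry` class is hypothetical, and the `http://127.0.0.1:<port>/v1` URL shape is an assumption made for illustration.

```typescript
interface RegisteredModel {
  provider: string;
  id: string;
  baseUrl: string;
}

// Hypothetical registry whose registerProvider replaces a provider's models wholesale.
class Registry {
  private models: RegisteredModel[] = [];

  registerProvider(provider: string, models: Array<{ id: string; baseUrl: string }>): void {
    // Drop every existing model for this provider, then install the new set.
    this.models = this.models.filter((m) => m.provider !== provider);
    for (const m of models) this.models.push({ provider, ...m });
  }

  list(provider: string): RegisteredModel[] {
    return this.models.filter((m) => m.provider === provider);
  }
}

const registry = new Registry();

// State A: both models share the single provider-level baseUrl.
registry.registerProvider("llamafiles", [
  { id: "model-a", baseUrl: "http://127.0.0.1:8080/v1" },
  { id: "model-b", baseUrl: "http://127.0.0.1:8080/v1" },
]);

// State B: the extension re-registers with a baseUrl derived from each model's port.
const declared = [
  { id: "model-a", port: 8080 },
  { id: "model-b", port: 8081 },
];
registry.registerProvider(
  "llamafiles",
  declared.map(({ id, port }) => ({ id, baseUrl: `http://127.0.0.1:${port}/v1` })),
);
```

After the second call the registry holds exactly two `llamafiles` entries, and `model-b` no longer shares `model-a`'s address.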

### 13.4 Graceful degradation when the extension is absent

If the user has a `llamafiles` provider in `models.json` but the extension is not installed (or fails to load), pi's `parseModels()` walks the models and looks for a `baseUrl` along the resolution path (model > provider > built-in defaults). When neither the model nor the provider sets `baseUrl`, it finds none: `llamafiles` is not a built-in provider, and the per-model `baseUrl` is derived only by our extension. The models are silently skipped. No crash, no confusion, no half-broken registrations.

This is intentional: a user without the extension sees no `llamafiles` models in `pi --list-models`, exactly as if they had configured an unknown provider.
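
The resolution order described above (model, then provider, then built-in default) can be sketched like this. The names `resolveBaseUrl`, `registrableModelIds`, and `BUILT_IN_BASE_URLS` are hypothetical stand-ins, not pi's real identifiers, and the built-in table holds a placeholder entry.

```typescript
interface ModelEntry {
  id: string;
  baseUrl?: string;
}

interface ProviderEntry {
  baseUrl?: string;
  models?: ModelEntry[];
}

// Hypothetical stand-in for pi's built-in provider defaults.
const BUILT_IN_BASE_URLS: Record<string, string | undefined> = {
  "some-built-in": "https://api.example.invalid/v1",
};

// Model-level baseUrl wins, then provider-level, then the built-in default.
function resolveBaseUrl(
  providerName: string,
  provider: ProviderEntry,
  model: ModelEntry,
): string | undefined {
  return model.baseUrl ?? provider.baseUrl ?? BUILT_IN_BASE_URLS[providerName];
}

// Models with no resolvable baseUrl are skipped rather than treated as errors.
function registrableModelIds(providerName: string, provider: ProviderEntry): string[] {
  return (provider.models ?? [])
    .filter((m) => resolveBaseUrl(providerName, provider, m) !== undefined)
    .map((m) => m.id);
}

// Without the extension and without any explicit baseUrl, nothing is registered.
const withoutExtension = registrableModelIds("llamafiles", {
  models: [{ id: "model-a" }, { id: "model-b" }],
});
```

Note that a provider-level `baseUrl` would make both models resolvable in this sketch, so the skip behavior depends on the config not setting one.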

### 13.5 Compatibility risk: if pi switches to strict schemas

We depend on TypeBox's permissive default. If pi ever adds `additionalProperties: false` to `ModelDefinitionSchema` or `ProviderConfigSchema`, every `models.json` containing our extra fields will fail validation, and pi will reject the **entire** file — taking down every other provider (`lm-studio`, `anthropic`, etc.) alongside `llamafiles`.

Indicators that this has happened:

- `pi --list-models` shows zero custom models.
- Pi prints an error like `Invalid models.json schema: providers.llamafiles.models.0.command: must NOT have additional properties`.
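
To see which keys a strict schema would reject, a hand-rolled check is enough. The sketch below does not use TypeBox; it only models the effect of `additionalProperties: false` against the per-model key list from 13.1. `keysRejectedByStrictSchema` is a hypothetical helper name.

```typescript
// Per-model keys pi's schema declares, copied from the list in 13.1.
const KNOWN_MODEL_KEYS = new Set([
  "id", "name", "api", "baseUrl", "reasoning", "thinkingLevelMap",
  "input", "cost", "contextWindow", "maxTokens", "headers", "compat",
]);

// Returns the keys that a strict (additionalProperties: false) schema would reject.
function keysRejectedByStrictSchema(model: Record<string, unknown>): string[] {
  return Object.keys(model).filter((key) => !KNOWN_MODEL_KEYS.has(key));
}

// Example entry with made-up values.
const entry = {
  id: "model-a",
  command: "./model.llamafile",
  args: ["--server"],
  port: 8081,
};

// rejected is ["command", "args", "port"]
const rejected = keysRejectedByStrictSchema(entry);
```

A single rejected key on a single model would be enough to fail the whole file, which is why the failure mode is so broad.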

### 13.6 Mitigation if that day comes

Two options, ordered by preference:

**A. Move our custom fields into a separate config file.**

Read `~/.pi/agent/llamafiles.json` (or similar) instead of riding on top of `models.json`. The `llamafiles` provider in `models.json` would then carry only standard fields, and the extension would consult its own file for `command`/`args`/`port`/`env`/`cwd`. Cost: two files to keep in sync; users need to redo their config. Benefit: zero coupling to pi's schema evolution.
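
The "two files to keep in sync" cost can be made concrete with a join that reports mismatches. This is a sketch under stated assumptions: the separate file's layout (launch settings keyed by model id) is invented here, as are the names `LaunchFile` and `joinConfigs`.

```typescript
interface LaunchSettings {
  command: string;
  args?: string[];
  port?: number;
  env?: Record<string, string>;
  cwd?: string;
}

// Hypothetical layout for the separate file: launch settings keyed by model id.
type LaunchFile = Record<string, LaunchSettings>;

interface JoinResult {
  launchable: Array<{ id: string; settings: LaunchSettings }>;
  missingLaunchSettings: string[]; // declared in models.json, absent from the launch file
  orphanedLaunchSettings: string[]; // present in the launch file, absent from models.json
}

// Match model ids from models.json against the launch file and report drift in both directions.
function joinConfigs(modelIds: string[], launchFile: LaunchFile): JoinResult {
  const byId = new Map(Object.entries(launchFile));
  const result: JoinResult = { launchable: [], missingLaunchSettings: [], orphanedLaunchSettings: [] };
  for (const id of modelIds) {
    const settings = byId.get(id);
    if (settings) result.launchable.push({ id, settings });
    else result.missingLaunchSettings.push(id);
  }
  const declared = new Set(modelIds);
  for (const id of byId.keys()) {
    if (!declared.has(id)) result.orphanedLaunchSettings.push(id);
  }
  return result;
}

const joined = joinConfigs(["model-a", "model-b"], {
  "model-a": { command: "./a.llamafile", port: 8080 },
  "model-c": { command: "./c.llamafile", port: 8082 },
});
```

Here `model-a` is launchable, `model-b` has no launch settings, and `model-c` is an orphan: the kind of drift a `/llamafiles` status view would need to surface under this option.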

**B. Petition upstream pi for a documented `extensions` escape hatch.**

Ask the pi maintainers to make `ModelDefinitionSchema` and `ProviderConfigSchema` accept an `extensions: Record<string, unknown>` field that extensions are free to namespace under. Our fields (`command`, `args`, `port`, `env`, `cwd`) would then live in `models.json` under `providers.llamafiles.models[i].extensions.llamafiles`. Cost: depends on upstream timeline. Benefit: stays in one file, sanctioned by pi.
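
Reading from such a namespace might look like the sketch below. The `extensions` field does not exist in pi today, so everything here is hypothetical, including `readLaunchSettings`; the `8080` default comes from G4. Only `command`, `args`, and `port` are handled to keep the example short.

```typescript
const DEFAULT_PORT = 8080; // per G4

interface ModelWithExtensions {
  id: string;
  extensions?: Record<string, unknown>; // hypothetical escape hatch, not in pi today
}

interface ResolvedLaunch {
  command: string;
  args: string[];
  port: number;
}

// Pull our settings out of extensions.llamafiles, tolerating absent or malformed data.
function readLaunchSettings(model: ModelWithExtensions): ResolvedLaunch | undefined {
  const ns = model.extensions?.["llamafiles"];
  if (typeof ns !== "object" || ns === null) return undefined;
  const { command, args, port } = ns as Record<string, unknown>;
  if (typeof command !== "string") return undefined; // command is the one required field
  return {
    command,
    args: Array.isArray(args) ? args.map(String) : [],
    port: typeof port === "number" ? port : DEFAULT_PORT,
  };
}

const launch = readLaunchSettings({
  id: "model-a",
  extensions: { llamafiles: { command: "./model.llamafile", args: ["--server"] } },
});
```

Because the namespace is typed as `unknown`, the extension has to validate its own subtree; pi's schema would only guarantee that the `extensions` key is allowed.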

Until either is needed, the current design is the right one: simplest config UX for users, single file, no extra wiring.

### 13.7 Watching for breakage

If pi releases a version that strict-validates `models.json`, this extension will break loudly (whole-file rejection). To catch this early:

- The integration test `list-models.test.ts` exercises the full pi-load → extension-register pipeline. It will fail if pi rejects our test config.
- The unit test `config.test.ts` is isolated from pi and would not catch this; it tests **our** loader, which always accepts our extras by definition.

So `npm test` will catch a strict-validation regression in pi, provided `list-models.test.ts` stays in the suite and runs against the pi version in use.
