# Changelog

All notable changes to Alvin Bot are documented here.

## [5.44.1] — 2026-06-08

### Laptops no longer restart the bot after waking from sleep

On a laptop, the crash-backstop could needlessly restart a perfectly healthy bot every time the machine woke from sleep — while asleep the bot can't refresh its liveness signal, and that quiet gap looked like a freeze. It now tells sleep apart from a real freeze by checking how long the machine has actually been awake, so a healthy bot is left alone. A genuinely stuck bot is still recovered exactly as before, and always-on desktops were never affected. The fix self-applies on the next start — nothing to do.

## [5.44.0] — 2026-06-05

### /provider and /model now do different things

The two commands used to overlap. Now they form a clear hierarchy: `/provider` chooses the AI service, `/model` chooses the model within it.

- `/provider` — pick which AI service answers you (Claude, Groq, OpenAI, Gemini, NVIDIA, OpenRouter, Ollama) and add its key or log in.
- `/model` — pick which model of the currently-active service: Claude shows Opus / Sonnet / Haiku; key-based providers show their live model list with a recommended pick; Ollama shows your local models. The choice applies immediately, no restart.

## [5.43.2] — 2026-06-05

### Clearer Claude sign-in from the dashboard

Signing in to Claude from the web Providers card now shows you the exact one-line command to run on the machine running Alvin and waits until you're signed in — a clearer, more reliable flow than before. The terminal `alvin-bot provider login` is unchanged.

## [5.43.1] — 2026-06-05

### Terminal provider changes apply to the running bot again

A quick fix to the provider switching introduced in 5.43.0: changing your provider from the terminal (`alvin-bot provider use` / `provider key`) now takes effect on the already-running bot immediately, instead of only on the next restart.

## [5.43.0] — 2026-06-05

### Change your AI provider anytime — no reinstall, no re-setup

You can now switch your primary AI provider whenever you like — from chat, the web dashboard, or the terminal — and add a new provider's key or log in right there, without re-running setup.

- New `/provider` command (Telegram, Slack, Discord, WhatsApp): see every provider with a ready/not-ready mark and switch with one tap. Not set up yet? Add the key in a quick owner-only DM (the message is deleted instantly) or open the dashboard.
- The web dashboard gets a Providers card: switch, paste an API key, or log in to Claude through a guided link — all live, with no restart.
- Switching now checks the provider is actually ready first, so you never end up "switched" to something that quietly falls back to another provider.
- In the terminal: `alvin-bot provider use <name>`, `provider key <name>`, and `provider login <name>` — applied live to the running bot when it's up.

## [5.42.1] — 2026-06-04

### Workspaces can now use maximum thinking depth

A workspace can now set its thinking depth all the way to `max` — the deepest reasoning level, the same one you get from the global `/effort max` command.

- Until now a workspace's effort setting only went up to `high`. Setting it to `max` was silently ignored and the workspace quietly fell back to your global setting. Now a workspace pins maximum thinking depth on its own, with no fallback.
- The global `/effort` command was unaffected and always supported `max` — this only closes the gap for per-workspace overrides.

## [5.42.0] — 2026-06-04

### Starting the bot on a Mac just works now

`alvin-bot start` now sets things up the right way on its own.

- On macOS, the bot is started inside your login session automatically, so anything that needs your Mac's secure storage — signing in to Claude, saving your `/setup sudo` password — just works, with no extra step.
- Switching an already-running setup over is safe: it checks the new way is actually up and quietly restores the old one if anything goes wrong, so your bot is never left stopped.
- A headless Mac (nobody logged in at the screen), and every non-Mac machine, keeps the previous background runner exactly as before — and `alvin-bot start --pm2` forces it any time. Updating never changes how your bot is run.

Nothing to do — your existing setup is left untouched.

## [5.41.0] — 2026-06-03

### Pick the right model the smart way — live, recommended, everywhere

Choosing which AI model to run is now effortless, in the terminal and the web setup alike.

- When you set up a cloud provider, Alvin fetches that provider's current models **live** and pre-selects a sensible recommended one — no more outdated, hardcoded model names that quietly stop working over time.
- The hardware-aware local-model picker is now in the **web setup** too, not just the terminal: it reads your machine's memory, suggests the models that comfortably fit, and can download your pick in the background with a progress bar.
- Whichever way you set up — terminal or browser — the experience matches, your choice is remembered, and more cloud providers work as your primary out of the box.

Nothing to do if you're happy with your current model — your setup is left untouched.

## [5.40.0] — 2026-06-03

### Background tasks you can trust — Alvin spots interruptions and stays aware

When Alvin runs work in the background, he now keeps a clear, honest picture of it.

- If a background task is cut short or runs into trouble partway through, Alvin recognises it as **unfinished** instead of treating it as done — and tells you plainly, so half-finished work never slips by as "complete".
- He keeps track of his background tasks throughout the conversation: what's still running, what just finished, and what was interrupted — even many messages later, so he never loses the thread when you follow up.
- When a task ends, his follow-up reflects what actually happened — a clean result, or an honest "this one didn't finish, here's how I'll carry on".

Nothing to do on your side — this just works in the background.

## [5.39.0] — 2026-06-02

### A smoother first hour — and local models that actually fit your machine

A big release for getting started and for running offline.

**Getting started**
- Setup remembers what you've already configured and asks only for what's missing — re-running it never overwrites your settings, with a review step to change anything before it saves.
- Add or switch AI providers and set a failover order from one command.
- The dashboard and terminal chat recognise your existing setup and connect cleanly; self-update is reliable on every install type, including brand-new machines with a self-contained runtime.
- The right background service per OS — launchd on macOS (survives reboots, keeps your login keychain), pm2 elsewhere.

**Local / offline models**
- Setup now reads your machine's memory and offers the local models that fit — Gemma (recommended) and Qwen alongside it — each marked with its size and whether it runs comfortably, so you never pick one that's too big to load. Your choice stays entirely yours.
- Local models can use tools (shell, files, web) now, not just chat — a real agent offline, whenever the model supports it.
- Setup and update keep the local runtime up to date so current models load reliably.

Nothing to do if you're happy as you are — your configuration is left untouched.

## [5.37.2] — 2026-06-01

### One-line install, rock-solid on any fresh machine

A final round of polish on the one-line installer so it's dependable from the
very first command on a brand-new computer — set up and pointed straight at the
next step in one go. Already running? Nothing changes.

## [5.37.1] — 2026-06-01

### Even smoother first run on a brand-new machine

More polish on the one-line install — getting started is even more reliable on a
freshly set-up computer. Already up and running? Nothing changes.

## [5.37.0] — 2026-06-01

### One-line install on a brand-new machine — no setup hassle

Getting started no longer assumes you already have developer tools installed.

- **Install with a single command.** On a fresh Mac or Linux box you can now run
  one line in the terminal and be up and running — even with nothing
  preinstalled (no Homebrew, no Xcode tools, no system Node, no admin password):

  ```
  curl -fsSL https://unpkg.com/alvin-bot/install.sh | bash
  ```

  If a recent Node is already there it's reused; otherwise the installer fetches
  a self-contained Node into your home folder and installs Alvin into a
  user-owned location. Nothing system-wide is touched, and it never asks for
  `sudo`. When it finishes it launches the setup wizard, where you pick your
  provider and paste your token — your choice of provider is unchanged.

- **WhatsApp is now optional and installed only if you want it.** The WhatsApp
  connector pulled in heavy dependencies that every user paid for in install
  size and that needed extra developer tools to build. It's now an opt-in
  add-on: the base install is smaller and simpler, and Telegram, Slack, Discord,
  the terminal and the web UI all work exactly as before. Want WhatsApp? Enable
  it from the Web UI (Platforms → Install Dependencies) or with a one-line
  install — the bot tells you exactly how when you first try to use it.

## [5.36.0] — 2026-06-01

### No more "subsystem restart" spam after a laptop wakes

The last bit of sleep noise: when a laptop wakes, the dropped Telegram
connection is re-established automatically (correct behaviour) — but the bot was
sending you a "🔄 Subsystem-Restart: telegram-poll" notice every single time,
which adds up fast on a machine that sleeps often. The bot now recognises a
reconnect that's just collateral of a wake (the connection was stale for the
whole sleep duration, or a resume was just detected) and reconnects silently. A
real runner failure still notifies you. The reconnect itself is unchanged.

## [5.35.0] — 2026-06-01

### Reliable self-update + a calmer bot on sleeping laptops

Two related reliability fixes, especially for laptops that sleep:

- **`/update` works again on npm installs.** The self-update was silently
  failing for anyone who installed via `npm install -g` — a flaky Chromium
  download (a transitive dependency the bot never even uses) aborted the whole
  install. It's now skipped, so updates go through. And a dropped network during
  the update (common right after a laptop wakes) no longer aborts it or, worse,
  installs the new version without restarting — the restart always happens.
- **Fewer false alarms and restarts after sleep.** Building on the v5.33 wake
  detection: the watchdog now gives a freshly-woken machine a grace window to
  reconnect before considering a restart (so a slow post-wake reconnect doesn't
  bounce the bot), and the trend monitor recognises a frequently-sleeping
  machine and stops flagging the expected sleep/reconnect noise as a "crash
  loop." A genuine fault still triggers a restart and still alerts.

## [5.34.0] — 2026-05-30

### Fix: the daily budget cap no longer blocks you out of the blue

The optional `MAX_BUDGET_USD` daily spend cap is now **off by default** — the bot
is never stopped on budget unless you deliberately turn it on. Previously the
shipped example config set `MAX_BUDGET_USD=5.0`, which used to be ignored; once
the cap started being enforced, anyone who'd copied that example suddenly hit a
surprise "$5.00 spent today" wall after a normal day's use. Sorry about that.

Now: no budget blocking happens unless you set **both** a limit and
`BUDGET_ENFORCE=1`. If you were affected, just update — or remove the
`MAX_BUDGET_USD` line from your `.env`. The cap is still available as an opt-in
runaway-cost brake for anyone who wants one.
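
The "both settings required" rule can be pictured with a small sketch. This is an illustration only, not the bot's actual code: the helper name `budgetLimitUsd` is hypothetical, and treating a non-numeric or non-positive limit as "no cap" is an assumption.

```typescript
// Hypothetical helper: returns the active daily cap in USD, or null when
// no budget blocking should happen.
type Env = Record<string, string | undefined>;

function budgetLimitUsd(env: Env): number | null {
  const limit = Number(env.MAX_BUDGET_USD);
  const enforced = env.BUDGET_ENFORCE === "1";
  // Assumption: an unset, non-numeric, or non-positive limit means "no cap".
  if (!enforced || !Number.isFinite(limit) || limit <= 0) return null;
  return limit;
}
```

With this shape, a leftover `MAX_BUDGET_USD=5.0` from the old example config yields `null` unless `BUDGET_ENFORCE=1` is also present.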

## [5.33.0] — 2026-05-29

### A sleeping laptop no longer triggers a restart loop

If you run Alvin on a laptop that goes to sleep (or any machine that suspends),
waking it up used to look like a hang to the health watchdog: the whole process
had been frozen, so every subsystem appeared "stale" and the bot restarted
itself — over and over on each wake, spamming false "crash loop" alerts. The
watchdog now recognises a large wall-clock jump between its checks as a
resume-from-sleep, re-arms its health beacons, and skips that one check instead
of restarting. A genuine hang is still caught on the next normal check. No more
sleep-induced restart loops or false crash alerts on laptops.
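
A rough sketch of the wall-clock check described above. The function name `classifyTick`, the 30-second interval, and the threshold factor are assumptions for illustration; the watchdog's real values are not stated in this changelog.

```typescript
// Hypothetical sketch: tell a resume-from-sleep apart from a normal tick.
const CHECK_INTERVAL_MS = 30_000;                  // assumed check interval
const RESUME_THRESHOLD_MS = CHECK_INTERVAL_MS * 3; // assumed tolerance

type TickKind = "normal" | "resumed";

// A timer that should fire every CHECK_INTERVAL_MS but sees a much larger
// wall-clock gap was most likely frozen together with the whole machine.
function classifyTick(lastTickMs: number, nowMs: number): TickKind {
  const elapsed = nowMs - lastTickMs;
  return elapsed > RESUME_THRESHOLD_MS ? "resumed" : "normal";
}
```

On a `"resumed"` tick the caller would refresh its health timestamps and skip the staleness check once, leaving a genuine hang to be caught by the next `"normal"` tick.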

## [5.32.0] — 2026-05-29

### Internal cleanup — smaller, more maintainable code modules

A maintainability pass with no change to how anything behaves:

- The large command file was split further — the tools/extensions/memory
  commands now live in their own module (joining the cron and sub-agent ones).
- The web dashboard's single 3,300-line script was split into three focused
  files (core, panels, views), loaded in order with identical behavior.

Verified byte-for-byte equivalent; nothing for you to do.

## [5.31.0] — 2026-05-29

### Deeper defense-in-depth for tools and untrusted content

A round of security hardening that strengthens the safe defaults without
changing how you use the bot:

- **Protected-path write guard.** The file-writing tools now refuse to write or
  edit credential, persistence, and system paths — your `.env`, `~/.ssh`,
  `authorized_keys`, the bot's own internal state, shell startup files,
  `LaunchAgents`, `/etc`, and the like. Normal file output is unaffected; this
  is an always-on backstop against both mistakes and prompt injection.
- **Secret-safe tool servers.** External tool servers (MCP) no longer inherit
  the bot's full environment — your bot token and API keys are no longer handed
  to third-party tool processes. If a server genuinely needs a value, pass it
  explicitly in that server's `env` config.
- **Untrusted-content fencing.** Content fetched from the web, web-search
  results, and the output of external tool servers are now wrapped in a clear
  "treat this as data, not instructions" fence before the model sees them, with
  a heuristic that flags likely prompt-injection attempts in your logs.
- **Documented threat model.** `SECURITY.md` now spells out the trust boundary
  and the layered defenses.

Nothing to configure — normal usage is unchanged.
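
To illustrate the protected-path idea, here is a minimal sketch. It is not the bot's guard: `isProtectedPath`, the shortened deny-lists, and resolving relative paths against the home directory are all assumptions made for the example.

```typescript
import * as path from "node:path";

// Illustrative deny-lists only; the real guard covers more locations.
const PROTECTED_DIRS = ["/etc", "~/.ssh", "~/Library/LaunchAgents"];
const PROTECTED_BASENAMES = new Set([".env", "authorized_keys", ".bashrc", ".zshrc"]);

// Expand a leading "~" to the given home directory.
function expandHome(p: string, home: string): string {
  return p.replace(/^~(?=\/|$)/, () => home);
}

function isProtectedPath(target: string, home: string): boolean {
  // Resolve first so "../" segments cannot slip past the prefix checks.
  // Assumption: relative targets are resolved against `home`.
  const resolved = path.posix.resolve(home, expandHome(target, home));
  if (PROTECTED_BASENAMES.has(path.posix.basename(resolved))) return true;
  return PROTECTED_DIRS.some((dir) => {
    const abs = expandHome(dir, home);
    return resolved === abs || resolved.startsWith(abs + "/");
  });
}
```

A production guard would also need to account for symlinks, which this sketch ignores.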

## [5.30.0] — 2026-05-29

### Faster local development with a watch mode

For anyone who runs Alvin from source, there are now `npm run dev:watch` and
`npm run build:watch` — the bot (or the compiler) restarts automatically on every
file change, so you don't have to stop and start it by hand while hacking on it.

Under the hood, the largest command file was split into smaller modules (the cron
and sub-agent commands now live in their own files) with no change to how any
command behaves — purely a maintainability cleanup.

## [5.29.0] — 2026-05-29

### /subagents list now shows which background agents will auto-continue

When you ask Alvin to do something after a background task finishes, that
follow-up (the auto-continuation from the last release) now has a visible marker
in `/subagents list`: agents with a follow-up queued show a small ↪️ next to
them, so you can tell at a glance which ones will keep working on their own once
they're done. Detached background agents were already listed; this just makes
their pending follow-up visible.

## [5.28.0] — 2026-05-29

### Stability — budget cap, clearer setup checks, self-healing MCP connections

Three reliability improvements under the hood:

- **The daily spend cap actually works now.** If you set `MAX_BUDGET_USD`, Alvin
  stops making paid model calls once that much has been spent today and tells you
  clearly instead of quietly running up a bill. It's opt-in — if you don't set the
  variable, nothing changes (so flat-rate subscription users are never blocked).
- **Misconfigurations get a clear message at startup.** Common setup mistakes —
  an empty or malformed bot token, a chosen AI provider whose API key is missing,
  an invalid allow-list or budget value, or exposing the web UI without a password —
  now produce one plain-English line on startup instead of a cryptic crash later.
- **Dropped tool connections heal themselves.** If an external tool server
  (configured via `mcp.json`) crashes or disconnects, Alvin now notices within
  ~30 seconds and reconnects it automatically with backoff — no restart needed.

All three are internal hardening; normal usage is unaffected.
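
"Reconnects with backoff" could look something like the following. The changelog does not say which backoff schedule is used, so the exponential doubling, the one-second base, and the one-minute cap are assumptions, as is the name `backoffDelayMs`.

```typescript
// Hypothetical backoff schedule: 1 s, 2 s, 4 s, ... capped at 60 s.
function backoffDelayMs(attempt: number, baseMs = 1_000, capMs = 60_000): number {
  // attempt 0 waits baseMs; each further attempt doubles the wait up to capMs.
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

The cap keeps a long outage from pushing the retry interval so far out that a recovered server goes unnoticed.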

## [5.27.0] — 2026-05-29

### Security hardening — stricter defaults for the web UI and shell tool

A round of defense-in-depth so the safe defaults stay safe even in unusual setups:

- **Web UI access is tighter.** When the web UI is reachable from outside your own
  machine, it now consistently requires a password across *every* channel — including
  the live chat connection — instead of just the page itself. With no password set, it
  stays locked to your local machine (`127.0.0.1`), as before.
- **A clear warning when traffic isn't encrypted.** If you expose the web UI over plain
  HTTP, Alvin now tells you on startup that the password and chat travel unencrypted, and
  suggests putting a TLS proxy/tunnel in front (Tailscale Serve, Caddy, nginx).
- **The shell tool's safety guard is smarter.** The guard that blocks catastrophic
  commands was rewritten from a short fixed list to pattern-based detection, so the
  obviously destructive cases are caught more reliably while everyday development commands
  keep working untouched.

These are hardening changes — nothing you need to do, and normal usage is unaffected.
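
As a rough picture of pattern-based detection versus a fixed list, consider the sketch below. The three patterns and the name `isCatastrophic` are purely illustrative assumptions; the bot's real rule set is not shown in this changelog.

```typescript
// Illustrative patterns only, far from a complete rule set.
const CATASTROPHIC_PATTERNS: RegExp[] = [
  /\brm\s+(?:-\S+\s+)*\/(?:\s|$)/,        // rm with "/" itself as a target
  /\bmkfs(?:\.\w+)?\s/,                   // formatting a filesystem
  /\bdd\b.*\bof=\/dev\/(?:sd|disk|nvme)/, // writing straight onto a disk device
];

// True when any pattern matches somewhere in the command line.
function isCatastrophic(command: string): boolean {
  return CATASTROPHIC_PATTERNS.some((pattern) => pattern.test(command));
}
```

Because the `rm` pattern tolerates any combination of flags, `rm -rf /` and `rm -r -f /` are both caught, while `rm -rf ./build` and `rm -rf /tmp/cache` pass through.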

## [5.26.0] — 2026-05-29

### Alvin finishes the job — automatic follow-up after a background task

When Alvin kicks off a background sub-agent and says "I'll do X once it's done", he now
actually does it — automatically, without you having to nudge him again. When the
sub-agent finishes, the result is delivered as before AND Alvin runs one follow-up turn
to carry out what he promised, posted as a reply right under the original message.

- **Opt-in per task:** only fires when Alvin attached a follow-up instruction; otherwise
  nothing changes (result delivered, no auto-turn).
- **New `/continuation on|off` toggle** (default on) — control it per chat, also from your phone.
- **Safety first:** a depth limit caps chained follow-ups (no runaway loops), and a chat
  that's already busy is never interrupted.
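
The three conditions above combine into a single gate, sketched here. The interface, the function name `shouldAutoContinue`, and the depth limit of 3 are hypothetical; the changelog does not state the actual limit.

```typescript
// Hypothetical shape of the decision made when a sub-agent finishes.
interface ContinuationContext {
  enabled: boolean;     // the per-chat /continuation toggle
  hasFollowUp: boolean; // did Alvin attach a follow-up instruction?
  depth: number;        // how many chained follow-ups led to this one
  chatBusy: boolean;    // is another turn already running in this chat?
}

const MAX_CONTINUATION_DEPTH = 3; // assumed value

function shouldAutoContinue(ctx: ContinuationContext): boolean {
  return (
    ctx.enabled &&
    ctx.hasFollowUp &&
    !ctx.chatBusy &&
    ctx.depth < MAX_CONTINUATION_DEPTH
  );
}
```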

## [5.25.2] — 2026-05-29

### Background-task results no longer post a duplicate "done" banner

When a finished background task produced a long result (over Telegram's message
limit), the bot tried to send it as a single message, that send failed, and the
fallback path then posted a *second* "✅ … completed" banner — so the same task
looked like it had finished twice. Long results are now cleanly split across
messages (or attached as a file), so every task reports exactly once.
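
Splitting a long result can be sketched as below. Telegram caps a text message at 4096 characters; the function name `splitMessage` and the "prefer a newline boundary" strategy are assumptions, not necessarily what the bot does.

```typescript
const TELEGRAM_MAX_CHARS = 4096;

// Split text into chunks of at most `limit` characters, breaking at the
// last newline inside each window when there is one.
function splitMessage(text: string, limit: number = TELEGRAM_MAX_CHARS): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    const newline = rest.lastIndexOf("\n", limit);
    const cut = newline > 0 ? newline : limit; // no usable newline: cut hard
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^\n/, ""); // drop the newline we broke on
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

A hard cut can land in the middle of a multi-unit character such as an emoji, so a production version would want to handle that case as well.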

## [5.25.1] — 2026-05-29

### /status polish — clearer reasoning mode, stacked providers

Small follow-ups to the new dashboard:

- **Reasoning depth gets its own line** (🧠 Reasoning: Medium) and is shown
  **only for providers that actually use it**. API-key providers with no
  reasoning concept no longer display a misleading "thinking" mode.
- **Provider / fallback status is listed one per line** instead of crammed
  onto a single inline row — easier to scan when more than one is configured.

## [5.25.0] — 2026-05-29

### A clearer, friendlier /status — readable at a glance

`/status` used to be a wall of technical numbers. It's now a clean dashboard that
opens with one plain-language line summarising how things are — "Running smoothly —
plenty of room", or "Context filling up (78%) — a /new soon is a good idea" — so you
never have to decode raw figures to know the state.

- **Progress bars** for the context window and daily usage, instead of bare percentages.
- **Plain-language labels** ("this chat", "actions", "notes") — friendly for everyone,
  not just developers.
- **Provider-aware:** real rate limits are shown for API-key providers that report them;
  flat-rate plans show honest own-usage rather than a meaningless limit bar.
- **Progressive disclosure:** technical telemetry (compactions, checkpoint hints, …) only
  appears when it's actually relevant, keeping the everyday view clean.
- Fully localized (English + German; other languages fall back to English).

## [5.24.7] — 2026-05-28

### The bot now heals itself from a stuck conversation instead of looping

Very rarely, a long, heavily-used conversation could end up in a state the
underlying model refused to continue. Because the bot kept trying to resume
that exact same conversation on every new message, the error then repeated on
every turn — and even came back after a restart. The only way out was to
manually start a brand-new chat, which is exactly what the bot is meant to
spare you.

The bot now recognises this specific condition and recovers on its own:

- It quietly drops the stuck conversation anchor and starts a fresh model
  session on your next message, while **keeping your conversation history as
  context** — so the thread continues seamlessly, no manual reset needed.
- You may see one short "something slipped, just send that again" notice.
  Resend your message and it carries straight on.
- When this happens, the bot records a small private diagnostic snapshot so
  the underlying trigger can be pinned down and prevented at the source in a
  later update.

## [5.24.6] — 2026-05-28

### Hotfix — `/restart` no longer briefly starts a second copy of itself

v5.24.5 made `/restart` (and `/update`) actively relaunch the bot after a
clean exit, so it comes back even when the OS service manager has stopped
respawning it on long-running machines. But on setups where the service
manager still works normally — most of them — that meant *two* things tried
to bring the bot back at once: the service manager's own respawn **and** the
new explicit relaunch. The two instances then competed for the same Telegram
connection (only one updates-poll is allowed per token), so `/restart` could
spin in a reconnect loop instead of settling.

The explicit relaunch is now a **fallback**: after a clean exit the bot
waits a short grace window to see whether the service manager brings it back
on its own, and only relaunches itself if it doesn't. Healthy setups get a
single clean restart; setups whose service manager has stopped respawning
still recover. No self-preservation was removed — the OS respawn, the
frozen-process watchdog, and the crash-loop brake all behave exactly as
before.

## [5.24.5] — 2026-05-28

### `/restart` now relaunches itself, instead of waiting on the OS

A clean shutdown is only half of a restart — the bot also has to come
back. Until now it exited and trusted the OS service manager to relaunch
it. On a machine that has been running for a long time, that automatic
relaunch can quietly stall, so a `/restart` or `/update` could leave the
bot down even though it had shut down perfectly.

- **Self-restarts now schedule their own relaunch.** After a clean exit
  the bot explicitly asks the service manager to start it again, rather
  than waiting for the automatic trigger that can stall on long-lived
  sessions. No-op where another supervisor already handles respawns.
- **The crash-backstop watcher now self-installs on boot.** The optional
  watcher that revives a frozen bot used to be set up only by the
  one-time installer, so setups that only ever updated in place never had
  it. It now installs itself on startup when it's missing.

## [5.24.4] — 2026-05-28

### `/restart` now shuts down cleanly — fixes the "bot doesn't come back" failure

The 2026-05-28 incident: on a Mac Mini running the bot under launchd
with WhatsApp enabled, `/restart` exited the process but the bot never
came back. Same `/restart` on a Test MacBook (no WhatsApp) worked
fine. Root cause: the `/restart` command — and the `/update` restart —
used a bare `setTimeout(() => process.exit(0), 500)` instead of the
hardened `scheduleGracefulRestart()` path. That bare exit is a relic
of the PM2 era (the stale code comment literally still said
"PM2-managed restart"); PM2 respawns aggressively on any exit, so it
never mattered. Under **launchd**, a hard `process.exit(0)` skips the
orderly shutdown: `runnerHandle.stop()` (which commits the Telegram
update-offset AND closes the single-allowed getUpdates long-poll),
`stopWebServer()` (which releases port 3100), `unloadPlugins()`, and
the platform teardowns (WhatsApp socket close) never ran. On the
WhatsApp-enabled Mac Mini, a lingering socket/port left from the
unclean exit prevented the launchd-respawned process from cleanly
re-acquiring its resources, and the bot stayed down. The Test MacBook
has no WhatsApp, so its exit was clean enough to respawn — which is
exactly why the bug only surfaced on one machine.

- **`/restart` and `/update` now route through
  `scheduleGracefulRestart()`** so the registered shutdown function
  runs its full orderly teardown before exit. This eliminates the
  lingering-resource class of respawn failures, logs "Graceful
  self-restart initiated" so any recurrence is visible, and applies
  the restart-storm brake (a braked `/restart` now replies with a
  clear message instead of silently doing nothing).
- **Stale "PM2-managed restart" comments corrected** to reflect the
  launchd reality (the bot moved off PM2 in 2026-04).

### Bonus — restart-storm.json test-pollution closed at the source

`src/services/restart-storm-brake.ts` resolved its state-file path
from a module-load-time `DATA_DIR` const. A test
(`test/restart-storm-brake.test.ts`) that imported it before setting
`ALVIN_DATA_DIR` therefore wrote its fixed-clock fixture values (1e9,
2e9, 6e9 ms) into the user's real `~/.alvin-bot/state/restart-storm.json`
— the recurring "1970 timestamps in restart-storm.json" the
maintainer had to hand-clean twice. (Those ancient values were always
pruned by the 10-min window so they never actually braked a restart,
but they were the same const-at-load pollution class fixed for the
async-agent-watcher in v5.24.0.) The path is now resolved lazily via
`getStateFile()`, and the test sets `ALVIN_DATA_DIR` to a tmpdir
before the import. Verified: a full `npm test` run leaves
`restart-storm.json` byte-identical.
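
The difference between the two resolution styles is easiest to see side by side. This is a sketch of the pattern, not the module's actual source: only `getStateFile()`, `ALVIN_DATA_DIR`, and the file location come from the entry above, and the fallback logic is an assumption.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Const-at-load: evaluated once at import time, so a test that sets
// ALVIN_DATA_DIR after importing the module still gets the real directory.
const EAGER_STATE_FILE = path.join(
  process.env.ALVIN_DATA_DIR ?? path.join(os.homedir(), ".alvin-bot"),
  "state",
  "restart-storm.json",
);

// Lazy: the environment is read on every call, so a later override wins.
function getStateFile(): string {
  const dataDir = process.env.ALVIN_DATA_DIR ?? path.join(os.homedir(), ".alvin-bot");
  return path.join(dataDir, "state", "restart-storm.json");
}
```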

## [5.24.3] — 2026-05-25

### Quieter, smarter, and a free Ollama for everyone

Two long-running log-noise sources silenced. Neither was breaking
anything — both produced the kind of "looks like a bug but isn't"
warning that erodes user trust over time.

- **Trends-AI debounce.** When the bot restarted several times in a
  short window — the canonical shape of a bug-fix session — each
  restart fired a fresh `dailyTask()` 60 s after boot, each one
  called the trend-anomaly AI, and each one produced an identical
  to-be-suppressed verdict. The 2026-05-24 logs had 8+ "Trends AI:
  suppressed WARN ... — version-churn" lines in 30 minutes, each one
  an AI call that burned tokens for nothing. v5.24.3 caches the
  signature `{version × errors_24h × crashes_24h × rss-50MB-bucket}`
  of each AI call's snapshot set in
  `~/.alvin-bot/state/trends-ai-cache.json`; subsequent calls within
  4 h with an unchanged signature are skipped with a single
  "skipped (no signal change)" log line.
- **Ollama preload is silent when the daemon isn't running.** Users
  who install Alvin without an Ollama daemon used to see a
  permanent `[ollama] preload warning: ... operation was aborted due
  to timeout` red flag on every boot — alarming, but expected: the
  fallback chain routes around the absent daemon. v5.24.3 demotes
  that case to a plain-language info line: `[ollama] preload
  skipped: daemon not running — Ollama is optional, fallback chain
  will route around it`. Real preload failures (daemon up but the
  model is missing, OOM, etc.) still surface as warnings.
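
The debounce described in the first bullet can be sketched like this. The field and function names are hypothetical; the signature components, the 50 MB bucket, and the 4-hour window come from the entry above.

```typescript
interface TrendSnapshot {
  version: string;
  errors24h: number;
  crashes24h: number;
  rssMb: number;
}

interface CachedCall {
  signature: string;
  at: number; // epoch milliseconds of the last AI call
}

const DEBOUNCE_MS = 4 * 60 * 60 * 1000; // 4 h

function trendSignature(s: TrendSnapshot): string {
  const rssBucket = Math.floor(s.rssMb / 50); // 50 MB buckets absorb small memory jitter
  return `${s.version}|${s.errors24h}|${s.crashes24h}|${rssBucket}`;
}

// Call the AI unless the same signature was already evaluated within the window.
function shouldCallAi(prev: CachedCall | null, current: TrendSnapshot, now: number): boolean {
  if (prev === null) return true;
  const unchanged = prev.signature === trendSignature(current);
  return !(unchanged && now - prev.at < DEBOUNCE_MS);
}
```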

Bonus: the new trends-ai-cache file is resolved through a lazy
`getTrendsAiCachePath()` getter that honours `ALVIN_DATA_DIR`, so
tests cannot accidentally pollute prod with it. The v5.24.1
pollution sentinel also adds `state/trends-ai-cache.json` to its
SENTINELED_PATHS so any future regression is caught at CI.

## [5.24.2] — 2026-05-25

### Files born private — no more recurring permission-repair churn

Every uptime report on the maintainer's bot contained the line
`🔒 file-permissions: repaired N sensitive file(s) to 0o600` because
`delivery-queue.json`, `cron-jobs.json`, and the daily-log
`memory/YYYY-MM-DD.md` files were being created with the default
mode of 0o644 (world-readable, the result of the usual 0o022 umask), and the startup audit had to repair
each one back to 0o600 on every boot. The data wasn't actually
leaking — the audit ran early enough that no other process saw the
file in 0o644 — but the cosmetic noise was a constant reminder that
"the bot writes things with the wrong mode."

v5.24.2 passes `mode: 0o600` explicitly at every write site so the
files are born private. The startup audit becomes a no-op when the
permissions are already correct, and the log line stops appearing
in normal operation. If you ever DO see the audit fire post-v5.24.2,
it means a foreign tool touched the file outside the bot — a real
signal worth investigating, not the cosmetic false alarm it used to
be.

Affected writers:
- `src/services/delivery-queue.ts` — `writeQueue()` now passes
  `mode: 0o600` to `fs.writeFileSync` plus a defensive `chmodSync`
  (covers the overwrite case where `writeFileSync`'s mode arg is
  ignored).
- `src/services/cron.ts` — `saveCronJobs()` passes
  `{ mode: 0o600 }` to `atomicWriteJson`.
- `src/services/memory.ts` — `appendDailyLog()` writes with
  `mode: 0o600` plus defensive `chmodSync`. Daily logs hold full
  conversation history; they were the largest single source of
  audit-repair entries because every new day added a fresh file.

`restart-storm-brake.ts` already passed `mode: 0o600` (v5.18.0
hardening) — no change.
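
The write pattern used at these sites looks roughly like the following sketch. The helper name `writePrivateFile` is hypothetical; the two `fs` calls are standard Node.js APIs.

```typescript
import * as fs from "node:fs";

function writePrivateFile(file: string, data: string): void {
  // `mode` only takes effect when the file is newly created...
  fs.writeFileSync(file, data, { mode: 0o600 });
  // ...so an already-existing file is tightened explicitly.
  fs.chmodSync(file, 0o600);
}
```

The second call is what covers the overwrite case: without it, a file that already exists as 0o644 keeps that mode after being rewritten.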

## [5.24.1] — 2026-05-25

### Production-state pollution sentinel — the v5.24.0 fix is now physically guaranteed

v5.24.0 patched the two known test files that wrote into the user's
real `~/.alvin-bot/state/` and made the async-agent-watcher's paths
lazily resolved, so a future polluter would be auto-isolated. That closes the
specific known leak — but the same pattern lives in roughly twenty
other service modules (session-persistence, cron, delivery-queue,
trends, fallback-order, …). Any one of them could be the next silent
polluter if a test imports it carelessly.

v5.24.1 adds a vitest `globalSetup` sentinel that runs once before
any test file is loaded, snapshots a curated list of production
state files in `~/.alvin-bot/`, and re-checks them after the entire
suite finishes. If the SHA-256 of any file changed, `npm test`
exits non-zero with a precise diff and a pointer to the
test-isolation pattern. Any future regression (anyone's, in any test
file) is caught at CI before merge instead of after release.

The sentinel intentionally EXCLUDES files the live bot mutates
autonomously (watchdog heartbeat, trends snapshot, uptime peak,
restart-storm) so a launchd daemon ticking during `npm test` on the
maintainer's machine doesn't false-fire. The sentineled set covers
the state-machine files (sessions, async agents, cron, delivery
queue, fallback order, custom models, soul/tools/env config) that
are the realistic pollution targets.

Verified end-to-end: an intentionally-polluting test trips the
sentinel with the expected diff. A clean run passes 1266/1266.
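
The before/after comparison at the heart of the sentinel can be sketched as follows. The function names are hypothetical and the real sentinel's file list and diff output are not reproduced here; missing files are recorded as `null` so that a file appearing during the run also counts as a change.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";

type Snapshot = Map<string, string | null>;

// SHA-256 of the file's bytes, or null when the file does not exist.
function hashFile(file: string): string | null {
  if (!fs.existsSync(file)) return null;
  return createHash("sha256").update(fs.readFileSync(file)).digest("hex");
}

function takeSnapshot(files: string[]): Snapshot {
  return new Map(files.map((f): [string, string | null] => [f, hashFile(f)]));
}

// Files whose current hash differs from the one recorded in `before`.
function findChanged(before: Snapshot): string[] {
  return Array.from(before.keys()).filter((f) => before.get(f) !== hashFile(f));
}
```

In a vitest `globalSetup` module, `takeSnapshot` would plausibly run in the exported `setup` function and `findChanged` in `teardown`, failing the run when the returned list is non-empty.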

## [5.24.0] — 2026-05-25

### Beast Mode — five forensically-traced regressions silenced for good

Combined effect: every code path that lets a long, multi-sub-agent
Telegram session drift into a "hang" or "lost the thread" or "told me
to run a command that does not exist on this machine" is closed at the
root. The bot now compacts itself transparently on every long
conversation regardless of provider, the bridge never injects past the
wrong anchor, the trend monitor never sends placeholder-noise WARN
messages, and the test suite physically cannot corrupt production
state any more.

- **lastSdkHistoryIndex reindexed after every compaction.** When
  compactSession rewrote `session.history` from N entries to
  `[summary, ...last KEEP_LAST]`, the positional anchor pointed beyond
  the shrunken array; `Math.min(anchor, length-1)` then clamped it,
  the bridge computed `gapStart > gapEnd`, and a fresh-SDK turn
  injected NOTHING — leaving Claude blind on the very next turn. The
  reset to `-1` ensures the next bridge inject re-briefs the SDK on
  the renumbered history.
- **SDK-path compaction enabled.** The pre-v5.24 `if (!isSDK)` guard
  meant Anthropic-server sessions never benefited from local
  compaction. Two real costs: the local `session.history` shadow grew
  to the MAX_HISTORY=100 cap, so every bridge-inject path
  (bypass-resume, empty-stream recovery, failover) paid a giant
  prompt, AND the compaction-summary pipeline (daily-log →
  memory-extractor → MEMORY.md → ambient-memory) never enriched the
  bot's long-term memory from long SDK sessions. Compaction now runs
  for every provider, gated on `history.length ≥ 25` so short chats
  pay nothing.
- **Trend-WARN noise floor raised: no more `(no description)` /
  `(no suggestion)`.** When the trend-anomaly AI returned a malformed
  3-line response, the parser used to substitute placeholder strings
  and emit the WARN anyway — a remote Alvin user on v5.13.0 received
  a literal-placeholder-WARN that looked like a bot bug. Now both
  fields must be non-empty; a malformed AI response is treated as
  "no anomaly" and the WARN is suppressed entirely.
- **Trend suggestions are platform-aware.** The same incident also
  emitted `journalctl -u alvin-bot ...` as the operator action — on a
  Mac install where journalctl does not exist. The AI prompt now
  injects `process.platform` and explicit do-not-use-journalctl-on-
  darwin instructions; the emit path defensively strips `journalctl` /
  `systemctl` references on darwin/win32 and falls back to a platform-
  neutral log-pointer. When a v5.13-class trend is flagged on a stale
  install, a `⚠️ try /update first` hint is prepended (≥ 5 minor
  versions behind = the canonical "just run the newer version" case).
- **Sanitizer-residue logic extracted to a unit-testable helper +
  threshold widened.** The 2026-05-25 incident shape (107-char residue
  out of a 283-char original = 38 % kept) did not trigger the v5.23.0
  detector because that release defaulted `maxKeptRatio` to 0.25. The
  new default is 0.5: residues up to half the raw size are caught, but
  genuine short replies (typically > 60-70 % kept once the sanitizer
  strips only the `<system-reminder>` envelope itself) are preserved.
  Detection lives in `src/util/sanitizer-residue.ts` with 10 direct
  unit tests pinning every gate.
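
The anchor reset from the first bullet can be illustrated with a small sketch. It assumes a simplified `Session` shape, a `KEEP_LAST` of 10 and hypothetical `summarize` and `bridgeGap` helpers; only `compactSession`, `session.history`, `lastSdkHistoryIndex` and the reset to `-1` come from the notes above.

```typescript
interface Session {
  history: string[];
  // Index of the last history entry the SDK has already seen; -1 means "none".
  lastSdkHistoryIndex: number;
}

const KEEP_LAST = 10; // assumed value, for illustration only

function compactSession(
  session: Session,
  summarize: (turns: string[]) => string, // hypothetical summarizer
): void {
  if (session.history.length <= KEEP_LAST) return;
  const older = session.history.slice(0, -KEEP_LAST);
  const recent = session.history.slice(-KEEP_LAST);
  session.history = [summarize(older), ...recent];
  // The old anchor indexes the pre-compaction array, so it is reset rather
  // than clamped; the next bridge inject then covers the whole new history.
  session.lastSdkHistoryIndex = -1;
}

// Entries the SDK has not seen yet: everything after the anchor.
function bridgeGap(session: Session): string[] {
  return session.history.slice(session.lastSdkHistoryIndex + 1);
}
```

With the reset in place, a 30-entry history compacts to 11 entries and the gap contains all 11, so a fresh SDK turn is re-briefed instead of receiving nothing.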

### Bonus — test pollution closed

Two test files imported `async-agent-watcher` before setting
`ALVIN_DATA_DIR`. Since `src/paths.ts` resolves `DATA_DIR` at
module-load time as a plain const, every `registerPendingAgent()`
those tests called wrote into the user's REAL
`~/.alvin-bot/state/async-agents.json`. The 2026-05-25 `/subagents
list says "keine" but JSON has entries` bug ("keine" means "none")
was exactly this: a stale `real-agent-3` entry from a test suite was
loaded by the live bot on
boot. The watcher now resolves its two state paths through lazy
getters that re-read `process.env.ALVIN_DATA_DIR` on every read/write,
and the two polluting tests now set the env BEFORE the watcher import.
Belt-and-braces: even if a future test forgets the setup, the lazy
getter falls back to the default — but no test in this release can
corrupt prod data.
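
The difference between a load-time constant and a lazy getter can be sketched like this. The `env` object stands in for `process.env`, and `makeEagerPaths`, `makeLazyPaths` and the `state/async-agents.json` layout under `ALVIN_DATA_DIR` are assumptions made for the example.

```typescript
// Stand-in for process.env so the sketch stays self-contained.
type Env = Record<string, string | undefined>;

const DEFAULT_DATA_DIR = "~/.alvin-bot";

// Eager: the directory is captured once, when the paths object is built
// (the equivalent of a module-level const evaluated at import time).
function makeEagerPaths(env: Env) {
  const dataDir = env["ALVIN_DATA_DIR"] ?? DEFAULT_DATA_DIR;
  return { asyncAgentsFile: () => `${dataDir}/state/async-agents.json` };
}

// Lazy: the environment is re-read on every call, so a value set after
// import still takes effect.
function makeLazyPaths(env: Env) {
  return {
    asyncAgentsFile: () =>
      `${env["ALVIN_DATA_DIR"] ?? DEFAULT_DATA_DIR}/state/async-agents.json`,
  };
}
```

A test that sets `ALVIN_DATA_DIR` after the import still redirects the lazy variant, while the eager variant keeps pointing at the default directory.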

Total suite 1241 → 1266 (+25 new tests: stalled-output,
sanitizer-residue, compaction-reindex, trends-version-helper). Hardcore
17-scenario stress test on the Test-MacBook: all green.

## [5.23.0] — 2026-05-25

### No more mid-conversation hangs and the bot never loses the thread

Three forensically-traced regressions from the past two days, fixed at
the root instead of patched. The combined effect: a long Telegram
conversation with multiple background sub-agents in flight feels as
fast and as context-aware on turn 50 as it did on turn 1.

- **Stalled-output guard for the async-agent watcher.** Pre-v5.23.0 the
  watcher only knew two failure modes: the output file never appeared
  ("missing"), or the subprocess wrote a clean terminal marker. The
  silent crash in between (subprocess re-fork orphans the parent pid,
  child dies from OOM, parent bot restart leaves a child behind) had no
  detector. The agent stayed "pending" until the 12 h give-up ceiling,
  and `session.pendingBackgroundCount` stayed > 0 the whole time,
  pinning the bot in bypass-bridge mode — every user turn paid 5-15 s
  TTFT because each one rebuilt a fresh SDK session and re-injected the
  entire conversation. v5.23.0 now declares an agent failed if its
  output file goes 5 min without a new byte (configurable via
  `ALVIN_STALLED_OUTPUT_FAILURE_MS`). One symptom, three downstream
  cures.
- **Pre-bypass compaction.** Even when the bypass path is legitimately
  active (a real sub-agent is running), the bridge inject used to ship
  every raw turn of `session.history` to a fresh Anthropic session,
  costing both a large prompt and a guaranteed prompt-cache miss. v5.23.0
  runs a compaction pass *before* the bridge is built, so the LLM sees
  a stable narrative summary plus the most recent 10 turns. Result: the
  injected prompt shrinks by an order of magnitude, the summary becomes
  a byte-stable prefix that Anthropic's prompt-cache hits on the second
  and subsequent bypass-turns, and older context survives as a coherent
  summary instead of getting silently truncated at 32 KB. A
  `COMPACT_MIN_DELTA` gate prevents the no-op summarizer churn that
  would otherwise fire on every turn once a session crosses 80 k
  lifetime input-tokens.
- **Sanitizer-residue guard against role-confused history entries.**
  When the `<system-reminder>` sanitizer fires on a leaked block, what
  remains is occasionally a tiny quote-string fragment (e.g. 107 chars
  out of an original 283-char block) that is not a coherent answer but
  *is* truthy — so the empty-reply fallback didn't catch it, and the
  fragment landed in history as `role: assistant`. The next turn read
  that fragment back as "the bot's last reply" and got confused. v5.23.0
  widens the empty-reply fallback: when a leak fired in this stream
  AND what's left is < 150 chars AND it represents < 25 % of the
  pre-sanitize text, treat it as residue and surface the same
  transparent "I got tangled — please retry" message used for fully
  empty outputs. The diagnostic log line records both lengths so this
  failure mode is observable.
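
The three gates of the residue check can be written as one small predicate. This is a sketch under stated assumptions: the function name, the options object and the `maxResidueChars` name are hypothetical; the 150-char and 25 % thresholds are the ones described above.

```typescript
interface ResidueOptions {
  maxResidueChars?: number; // hypothetical option name
  maxKeptRatio?: number;
}

function isSanitizerResidue(
  rawLength: number,
  cleanedLength: number,
  leakFired: boolean,
  options: ResidueOptions = {},
): boolean {
  const maxResidueChars = options.maxResidueChars ?? 150;
  const maxKeptRatio = options.maxKeptRatio ?? 0.25;
  // Without a leak in this stream there is nothing to classify.
  if (!leakFired || rawLength === 0) return false;
  return (
    cleanedLength < maxResidueChars &&
    cleanedLength / rawLength < maxKeptRatio
  );
}
```

For example, a 40-char remainder of a 283-char block is classified as residue, while a 107-char remainder of the same block keeps about 38 % and passes the 25 % gate untouched.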

The async-agent stall guard ships with 5 dedicated tests. Total suite
1245 + 5 new = 1250 green; the long-running `public-mindset.test.ts`
default-provider behavioural check remains the single pre-existing
failure unrelated to this release.
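
The stall rule itself is simple enough to sketch as two pure functions. The names `resolveStallThresholdMs` and `checkStall` are hypothetical, as is the fallback to the default on an unparsable value; the 5-minute default and the `ALVIN_STALLED_OUTPUT_FAILURE_MS` override come from the notes above.

```typescript
const DEFAULT_STALLED_OUTPUT_FAILURE_MS = 5 * 60 * 1000;

// Parses the raw value of ALVIN_STALLED_OUTPUT_FAILURE_MS; anything that is
// not a positive finite number falls back to the 5-minute default (assumed).
function resolveStallThresholdMs(raw: string | undefined): number {
  const parsed = Number(raw);
  return raw !== undefined && isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_STALLED_OUTPUT_FAILURE_MS;
}

type AgentStatus = "pending" | "failed";

// An agent counts as failed once its output file has not grown for the
// whole threshold period.
function checkStall(
  lastOutputChangeMs: number,
  nowMs: number,
  thresholdMs: number,
): AgentStatus {
  return nowMs - lastOutputChangeMs >= thresholdMs ? "failed" : "pending";
}
```

A watcher loop would call `checkStall` with the output file's last modification time on each tick and stop counting the agent as pending once it returns `"failed"`.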

## [5.22.5] — 2026-05-25

### Defense-in-depth for the 1M-token context window

Compaction already keeps the active prompt small enough that the
1M-token Claude window is almost never hit in normal use. But "almost"
isn't "never" — a single huge upload, a runaway tool-result, or a
session that's been alive for weeks with very dense turns can in
theory blow the limit before Compaction has a chance to run. v5.22.5
adds three layers so even that edge case stays graceful.

- **Friendly bilingual error instead of the raw "prompt is too long"
  trace.** `claude-sdk-provider.ts` now detects context-length-exceeded
  errors via `isContextLengthExceededError()` covering all wording
  variants Anthropic has used (`prompt is too long`,
  `context_length_exceeded`, `context window exceeded`,
  `max_tokens_exceeded`, the descriptive "maximum tokens exceeded"
  form). The user sees a clear German hint with a retry suggestion +
  `/new` as the safe fallback.
- **Emergency Compaction on detection.** When the provider yields an
  error chunk with `contextLengthExceeded: true`, the message handler
  immediately calls `compactSession(session)` so the very next user
  message starts from a compacted history. Best-effort: a compaction
  failure never poisons the error path.
- **Soft warning at 70 % of the 1M window** (700 k input-tokens). Logs
  a `[context-window]` heads-up so a human monitoring the bot sees the
  pattern. Compaction is already running on every turn at this point;
  the warning is observability, not action.
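
A detector along these lines could look like the sketch below. The function name `isContextLengthExceededError` and the five wordings are taken from the first bullet; the regex list, the case-insensitive matching and the `shouldWarnContextWindow` helper are assumptions for illustration.

```typescript
// One pattern per wording variant listed in the release notes.
const CONTEXT_LENGTH_PATTERNS: RegExp[] = [
  /prompt is too long/i,
  /context_length_exceeded/i,
  /context window exceeded/i,
  /max_tokens_exceeded/i,
  /maximum tokens exceeded/i,
];

function isContextLengthExceededError(message: string): boolean {
  return CONTEXT_LENGTH_PATTERNS.some((pattern) => pattern.test(message));
}

// 70 % of the 1M-token window, as stated in the release notes.
const SOFT_WARNING_TOKENS = 700_000;

// Hypothetical helper: true once the soft-warning threshold is reached.
function shouldWarnContextWindow(inputTokens: number): boolean {
  return inputTokens >= SOFT_WARNING_TOKENS;
}
```

Keeping the wordings in one list means a new variant only needs one more entry plus a test that pins it.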

The Compaction layer remains the primary defence — these three layers
only matter at the absolute extreme. With Alvin's typical Compaction
threshold (80 k input tokens), reaching 700 k means Compaction has
already run 8+ times and the conversation is genuinely long.

7 new tests pin all the detection-pattern variants + emergency-compact
trigger path. Total suite 1241/1241 green. Hardcore stresstest on the
Test-MacBook: 10/10 scenarios green in 300 ms (context-length, 529
retry, supertonic timeout, sanitizer, atomic-write, 100-turn bridge,
dedup storm, ambient memory, state-file health, restart-detect).

## [5.22.4] — 2026-05-25

### Auto-retry on Anthropic 529 + About-Alvin in /help

Two operator-requested follow-ups.

- **Auto-retry on transient HTTP 529 from Anthropic.** A regional
  capacity hiccup that took down a single voice call (or worse, a
  6 a.m. cron job that nobody is watching) now triggers an internal
  retry loop instead of bouncing the error back. Max 3 attempts with
  exponential backoff 3s → 8s → 20s, suppressed entirely once any text
  chunk has flowed to the caller (no double-content risk). If all 3
  retries still fail, the registry's existing fallback layer takes
  over and switches to the next configured provider (Groq / Gemini /
  NIM / Ollama), with the usual `⚡ claude-sdk unavailable — switching
  to <next>` notification to the user. A cron job under transient
  Anthropic overload now silently succeeds on attempt 2 or 3 instead
  of failing.

- **`/help` includes an About-Alvin section.** Links to
  https://alvin.alev-b.com (landing), the npm package, and the
  changelog. Link previews are disabled so the help message stays
  compact in the chat.

5 new tests pin the retry routing (clean stream passes through; retry
silent on early 529; final 529 forwarded after max attempts; no
retry once text has flowed; non-529 errors are NOT retried). Total
suite 1233/1233 green.
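
The routing rules those tests pin can be condensed into a single decision function. This is a sketch, not the provider code: `nextRetryDelayMs` and `isOverloadedError` are hypothetical names, the notes are read as allowing three retries after the initial attempt, and the `\b529\b|overloaded` pattern is borrowed from the v5.22.3 entry with a case-insensitive flag added as an assumption.

```typescript
// Backoff schedule from the release notes: 3 s, 8 s, 20 s.
const RETRY_BACKOFF_MS = [3_000, 8_000, 20_000];

function isOverloadedError(message: string): boolean {
  return /\b529\b|overloaded/i.test(message);
}

// Returns the delay before the next retry, or null when the error should be
// forwarded to the caller (and from there to the fallback layer).
function nextRetryDelayMs(
  errorMessage: string,
  retriesSoFar: number,
  textFlowed: boolean,
): number | null {
  if (textFlowed) return null; // content already reached the caller
  if (!isOverloadedError(errorMessage)) return null; // only 529s are retried
  const delay = RETRY_BACKOFF_MS[retriesSoFar];
  return delay === undefined ? null : delay; // schedule exhausted
}
```

A caller would sleep for the returned delay and start a fresh stream, or give up and forward the error when it gets `null`.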

## [5.22.3] — 2026-05-25

### Anthropic 529 Overloaded no longer leaks the raw error to the chat

Operator-spotted: a transient HTTP 529 ("Overloaded") from the
Anthropic API was reaching the Telegram chat verbatim — `API Error:
529 Overloaded. This is a server-side issue, usually temporary —
try again in a moment. If it persists, check status.claude.com.`
That looks like the bot is broken even though it's a 1-2-second
upstream hiccup.

Two changes:

- **Friendly message in place of the raw error.** The SDK catch-block
  in `claude-sdk-provider.ts` now matches `\b529\b|overloaded` on the
  thrown error message and replaces it with a one-line German message
  pointing at status.claude.com and suggesting a ~30s retry.
- **Availability cache invalidation.** On 529 the SDK provider also
  fires `invalidateAvailabilityCache()` so the registry's 5-min
  heartbeat re-checks the provider. If Anthropic stays overloaded the
  failover layer routes the next message to the next configured
  provider (Groq / Gemini / NIM / Ollama) — the user isn't bounced
  with the same error every time.

No retry is added at the SDK-stream level (a 529 happens mid-stream
and the async-generator is already torn down by the time we catch it;
a true retry needs a fresh `query()` call and is deferred to a future
release).

## [5.22.2] — 2026-05-25

### Voice replies no longer silently fall back to Edge TTS on cold-start

Operator-spotted 2026-05-24 23:51: after activating voice in Telegram
the next reply came back via Edge TTS even though `TTS_PROVIDER` was
explicitly set to `supertonic` and the detect check reported
`ready: true`. Forensics found `[voice] Supertonic TTS failed,
falling back to Edge: supertonic: python timed out after 56121ms` in
the err log.

Root cause: every `synthesize()` call spawns a fresh Python
subprocess that has to `import torch` + load the ONNX models. On a
Mac mini whose OS page-cache went cold (long quiet period since the
last voice call, OR fresh boot), that single-call cold-start can take
30-60 s. The pre-v5.22.2 floor of 30 s with 5 s startupMs killed the
call mid-load and silently fell back to Edge.

Two-part fix:

- **Pre-warm at boot.** Five seconds after the bot is up, a fire-and-
  forget `prewarmSupertonic()` runs a 1-char synthesis so torch +
  models land in OS page-cache + the HuggingFace cache. The first
  real voice call hits a warm pipeline and runs in 1-3 s instead of
  30-60.
- **More tolerant timeout floor.** `computeSynthTimeout()`'s
  `startupMs` raised 5 s → 60 s, min-floor raised 30 s → 90 s. A
  real cold-start (no pre-warm available) gets enough room to load
  the models before being killed. The 240 s upper cap is unchanged.
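
How the new floor and cap interact can be shown with a sketch of the timeout calculation. Only `startupMs` (60 s), the 90 s floor and the 240 s cap come from the notes above; the per-character cost and the exact formula are assumptions, so the real `computeSynthTimeout()` may differ.

```typescript
const STARTUP_MS = 60_000; // raised from 5 s
const MIN_TIMEOUT_MS = 90_000; // raised from 30 s
const MAX_TIMEOUT_MS = 240_000; // unchanged upper cap
const PER_CHAR_MS = 50; // hypothetical per-character synthesis cost

function computeSynthTimeout(textLength: number): number {
  const estimate = STARTUP_MS + textLength * PER_CHAR_MS;
  // Clamp the estimate between the floor and the cap.
  return Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, estimate));
}
```

Under these assumptions a one-character pre-warm call gets the 90 s floor, which leaves room for a 30-60 s cold start.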

`/tts status` and the WebUI Reliability panel both now reflect
`supertonic` as the active provider correctly after a normal boot.

1 test contract updated to the new floor; total suite 1228/1228 green.

## [5.22.1] — 2026-05-25

### Three quiet hardening passes — fully green tests, atomic state writes, /btw persisted

Follow-up to v5.22.0 that closes the three smaller items I knew were
still open: a fully-green test suite, atomic writes for the last two
state-files that weren't, and `/btw` notes persisted to session history.

- **Fully green test suite.** The two long-standing exec-guard test
  failures (`config defaults to allowlist…` and `metachar payload is
  blocked by default`) were a side-effect of `config.ts` re-loading
  `~/.alvin-bot/.env` on every import — a maintainer with
  `EXEC_SECURITY=full` in that file had the `delete process.env.EXEC_SECURITY`
  in those tests immediately undone. Tests now explicitly set the
  variable instead of relying on absence. 1228/1228 now green for the
  first time since the 5.x line started.

- **Atomic writes for cron-jobs.json + restart-storm.json.** The last
  two state-files that used direct `fs.writeFileSync` now go through
  a shared `atomicWrite()` helper (tmp + rename, POSIX-atomic). A
  concurrent reader can no longer catch the file mid-write, and an
  OOM/kernel-panic during write leaves the OLD file intact instead of
  a half-JSON garbage that the boot-time state-file-health check
  would quarantine. The `atomic-write.ts` util is intentionally small
  + tested (10 cases): all writes that mutate user data (sessions,
  cron jobs, delivery queue, storm counter) now share the same
  contract.

- **/btw notes persisted to session.history.** Before v5.22.1, a `/btw`
  steering note only lived in the in-memory `SteerChannel` buffer; if
  the bot restarted mid-task the note was lost from the conversation
  record. The assistant's response (folded with the steering) still
  landed via addToHistory on finalize, but the user's actual nudge was
  a black hole. After a successful push, `/btw` now also writes
  `[/btw] <note>` as a user-turn in history. A later reply-quote or
  memory-search can find the steering text, and a post-restart
  history-bridge can replay it for the LLM.
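
The tmp-plus-rename pattern behind `atomicWrite()` can be sketched as below. This is an illustration of the technique rather than the contents of `atomic-write.ts`; the temp-file naming scheme and the cleanup on failure are assumptions.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

function atomicWrite(filePath: string, data: string): void {
  // The temp file lives in the same directory so the rename stays on one
  // filesystem, which is what makes it atomic on POSIX.
  const tmpPath = path.join(
    path.dirname(filePath),
    `.${path.basename(filePath)}.${process.pid}.tmp`,
  );
  try {
    fs.writeFileSync(tmpPath, data, "utf8");
    // Readers see either the old file or the new one, never a partial write.
    fs.renameSync(tmpPath, filePath);
  } catch (err) {
    fs.rmSync(tmpPath, { force: true }); // do not leave a stray temp file
    throw err;
  }
}
```

If the process dies before the rename, the target file still holds its previous content and only the temp file is incomplete.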

13 new tests; total suite 1228/1228 green.

No action needed. New defaults take effect on next `/update`.

## [5.22.0] — 2026-05-25

### Three-module foundation: dedup + ambient memory + privacy hygiene

A clean foundation release that pins the three root causes behind the
"the bot lost the thread" reports of the last 48 hours. Built from a
hardcore code-review across four dimensions (sub-agent lifecycle,
context management, privacy workflow, state-file integrity).

- **Sub-agent duplicate-spawn guard.** A second `spawnSubAgent()` call with
  the same name + prompt-prefix-hash inside a 60s window is now
  rejected with a clear "already running" error instead of starting a
  duplicate. The "6 background agents pending in parallel" incident is
  the exact shape this guard prevents.

- **Unified cancel.** `/cancel` and `/stopall` now cover the detached
  sub-agent registry too, not just the in-process one. A soft cancel
  tombstones pending watcher entries; a hard cancel additionally kills
  the subprocess. Detached agents loaded from disk after a bot restart
  are killable
  again — the `thisBootAgentIds`-only guard was replaced with a cheap
  `ps -p <pid>` command-shape check.

- **Ambient Memory.** Semantic memory search now runs on EVERY turn,
  not just the first SDK turn. First turns + non-SDK providers keep
  the generous budget (top-3, 400-char preview); non-first SDK turns
  get a tighter one (top-2, 250-char preview) with a header that tells
  the model where to find the rest. Structural fix for "Mein Context
  zeigt mir nur deine letzten 10 Nachrichten" ("my context only shows
  me your last 10 messages"): the USP is now a code fact, not a
  marketing claim.

- **Privacy hygiene.** `scripts/privacy-check.sh` now also scans
  git-tracked source files (src/, docs/) in addition to the tarball,
  closing the gap where source comments could pass the tarball check
  but still land in git history. Pattern file extended with two more
  project-name regexes; old example references in CHANGELOG.md +
  bin/cli.js were replaced with generic placeholders. Internal
  planning docs under `docs/plans/` removed from git tracking + added
  to `.gitignore`.
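
The duplicate-spawn rule from the first bullet can be sketched as a small guard class. `SpawnGuard`, `tryRegister` and the 200-char prefix length are hypothetical, and the sketch keys on the raw prompt prefix where the notes describe a hash of it.

```typescript
const DEDUP_WINDOW_MS = 60_000;
const PROMPT_PREFIX_LENGTH = 200; // hypothetical prefix length

class SpawnGuard {
  private readonly lastSpawn = new Map<string, number>();

  // Returns true if the spawn may proceed, false if an identical spawn was
  // registered less than DEDUP_WINDOW_MS ago.
  tryRegister(name: string, prompt: string, nowMs: number): boolean {
    const key = `${name}\u0000${prompt.slice(0, PROMPT_PREFIX_LENGTH)}`;
    const previous = this.lastSpawn.get(key);
    if (previous !== undefined && nowMs - previous < DEDUP_WINDOW_MS) {
      return false;
    }
    this.lastSpawn.set(key, nowMs);
    return true;
  }
}
```

A caller would turn a `false` result into the "already running" error instead of starting a second agent.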

10 new tests pin the new behaviours; existing tests on the three
changed surfaces were updated to the new contracts. Total suite
1213/1215 green (the two pre-existing exec-guard test failures are
unchanged).

No action needed. New defaults take effect on next `/update`. No
existing data structures changed.

## [5.21.0] — 2026-05-24

### No more "the bot lost the thread" mid-conversation — Alvin's USP restored

Closes the most visible context-loss regression of the day. Late
Saturday evening Alvin appeared to "go senile" after a few minutes of
back-and-forth: the model would drift to a completely different topic
(a parallel sub-agent task) when the user just typed "Go :)", and at
one point literally said _"Mein Context zeigt mir nur deine letzten
10 Nachrichten — tipp mir die Punkte nochmal kurz"_.

Three layered fixes:

- **The bypass-bridge no longer caps at 10 turns.** The hard cap was
  destroying Alvin's USP ("no `/new` needed, always knows what's going
  on"): each bypass-restart (triggered by a pending background agent)
  re-fed the LLM with only the last 10 turns of history. Sessions with
  ~30+ turns lost the thread the moment a background agent kicked the
  bypass path. The cap is gone; `BRIDGE_MAX_CHARS` is now the sole
  truncation knob, raised 2.5 k → 32 k so a normal 30-turn detailed
  conversation survives bridge-in-progress intact. Per-message char cap
  also raised 500 → 4 000 so individual assistant turns are no longer
  truncated into fragments.

- **Background-agent results carry a structural attribution marker now.**
  v5.20.0 wrote sub-agent results into the parent session's history so
  Telegram reply-quotes would have context. That fix was right, but
  it had an unintended side-effect: a parallel sub-agent result became
  the chronologically-latest "assistant turn" in history, and the LLM
  treated the user's next reply as a continuation of THAT topic. Each
  result entry now opens with `⚙️ [BACKGROUND-AGENT RESULT — async
  side-channel, not a direct reply to the user. Only act on this if the
  user explicitly references it.]`. Concrete: prevents the "Go :)" →
  "starting the LinkedIn task" misfire from the 2026-05-24 evening
  incident.

- **CLAUDE.md USP-rule pinned.** When the model notices its context
  feels incomplete, it must check `~/.alvin-bot/memory/YYYY-MM-DD.md`
  and `sessions.json` BEFORE asking the user to repeat. Asking the user
  burns time and breaks the USP. The standing rule is loaded into every
  query via `settingSources: ["project"]`.

4 new tests pin a 40-turn session surviving bridge intact, the size cap
truncating from the OLDEST end for runaway 200-turn sessions, no
truncation at all for normal ≤20-turn working sessions, and the
30-turn pre-fix breaking-point now passes cleanly. Total suite 1199/1201
green (the two pre-existing exec-guard test failures are unchanged).
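
The truncation behaviour those tests pin can be sketched as follows. `selectBridgeTurns` and the `Turn` shape are hypothetical, and 32 000 is assumed as the exact value behind "32 k"; the two caps and the oldest-first truncation come from the notes above.

```typescript
interface Turn {
  role: "user" | "assistant";
  text: string;
}

const BRIDGE_MAX_CHARS = 32_000; // "32 k" in the notes; exact value assumed
const PER_MESSAGE_MAX_CHARS = 4_000;

function selectBridgeTurns(history: Turn[]): Turn[] {
  const selected: Turn[] = [];
  let usedChars = 0;
  // Walk newest to oldest so that, when the budget runs out, it is the
  // oldest turns that get dropped.
  for (const turn of history.slice().reverse()) {
    const text = turn.text.slice(0, PER_MESSAGE_MAX_CHARS);
    if (usedChars + text.length > BRIDGE_MAX_CHARS) break;
    usedChars += text.length;
    selected.unshift({ role: turn.role, text }); // restore chronological order
  }
  return selected;
}
```

With no turn-count cap, a 40-turn session of short messages passes through whole, and only a runaway session loses its oldest turns.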

No action needed. After the upgrade, no more `/new`-needed-mid-session
moments. If you ever see Alvin lose context again, that's a real bug —
report it with the timestamp and an incident dump from
`~/.alvin-bot/incidents/`.

## [5.20.2] — 2026-05-24

### Restart classification follow-up — actually wires the planned-flag through

v5.20.1 wrote the planned-flag to the watchdog beacon at shutdown, but
the post-restart-ping was still reading from disk AFTER the watchdog's
own boot-time beacon rewrite had overwritten the flag. Net effect:
every restart still came back as "unerwarteter Exit" ("unexpected
exit") even though the write itself was working.

Fix: the watchdog already captures the previous beacon's
`expectedRestart` flag at startup (via `bootExpectedRestart`, since 5.0).
The post-restart-ping now threads that value in directly from
`bootWasExpectedRestart()` instead of re-reading the (already
overwritten) file. Planned restarts now show "Neustart nach …"
("restart after …"), unplanned ones "unerwarteter Exit vor …"
("unexpected exit before …").

## [5.20.1] — 2026-05-24

### Restart confirmation no longer cries "crash" when nothing crashed

Two small wording + classification fixes on the v5.18.0 post-restart
ping. Operator-spotted: a normal `/restart` from Telegram was coming
back as "🦊 Wieder da nach crash — …" because the graceful-restart
path forgot to flag the imminent exit as planned in the watchdog
beacon, so the next boot read `expectedRestart=false` and assumed the
worst.

- `scheduleGracefulRestart()` now calls `markExpectedRestart()` on the
  watchdog beacon just before shutdown. Every self-initiated restart
  (`/restart`, `/tts` switch, `/update`, the launchctl-self-kill route
  added in 5.18.0) is correctly classified as planned on the next boot.
- The unplanned-branch wording is softened too — "🦊 Wieder da — v…
  (unerwarteter Exit vor …)" instead of the alarming "nach crash …".
  A real recovered crash still gets a distinct phrasing from a planned
  restart, but neither shouts at the operator.

Tests in `test/post-restart-ping.test.ts` updated to pin the new
wording + assert the "crash" string never appears.

## [5.20.0] — 2026-05-24

### Reply on a background-agent result and the bot now remembers the result

Closes a quiet UX bug observed earlier today: when a background sub-agent
finished and delivered its final result to Telegram, the message landed
in your chat but never made it into the parent session's `history`
array. If you then used Telegram's reply-quote on that result and asked
a follow-up, the LLM saw the quoted block but had nothing in its own
turn-list to connect it to — and answered generically ("Was genau
meinst du? A / B / C", German for "What exactly do you mean?"). You had to
copy-paste the whole thing back.

Every Telegram delivery path the sub-agent layer takes now also writes
the result into the session's history:

- **Inline single message** (≤3800 chars) — written 1:1 with a banner
  prefix so the model recognises the entry as background-sourced.
- **Multi-chunk inline** (1–2 messages, up to 7600 chars) — same
  treatment; the body is kept intact, capped at 4 kB in history.
- **File-delivered** (>7600 chars → `.md` attachment) — Telegram still
  gets the full file; history gets a 4 kB preview with a clear
  `…[truncated for history at 4000 chars; the full result was
  delivered as <name>.md to the chat]` marker so the model knows where
  the rest lives.

Implementation:

- New optional `parentUserId` field on `SubAgentConfig` / `SubAgentInfo`.
  Propagated from `DispatchInput.userId` through `registerPendingAgent`
  → `entry.userId` → async-agent-watcher → the delivery layer. Cron
  sub-agents target a Telegram DM so `parentUserId = parentChatId`.
- `subagent-delivery.ts` calls a new `writeSubagentHistory()` helper at
  the success edge of every tier. Best-effort — never blocks delivery,
  never throws. Silently skipped on older boots that don't propagate
  `parentUserId` (back-compat).
- Capped at 4 kB per history entry so a chatty agent can't blow up
  `sessions.json` over time.
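
The capping and marker logic can be sketched as one formatting function. `toHistoryEntry` and the banner text are hypothetical (the real `writeSubagentHistory()` helper and its banner are not shown here); the 4000-char cap and the truncation marker wording follow the notes above.

```typescript
const HISTORY_CAP_CHARS = 4_000;
const BANNER = "[background sub-agent result]"; // hypothetical banner text

function toHistoryEntry(result: string, deliveredFileName?: string): string {
  if (result.length <= HISTORY_CAP_CHARS) {
    return `${BANNER}\n${result}`;
  }
  const preview = result.slice(0, HISTORY_CAP_CHARS);
  // Tell the model where the rest lives when the result went out as a file.
  const where = deliveredFileName
    ? `; the full result was delivered as ${deliveredFileName} to the chat`
    : "";
  return `${BANNER}\n${preview}\n…[truncated for history at ${HISTORY_CAP_CHARS} chars${where}]`;
}
```

Short results are stored whole behind the banner, while a file-delivered result keeps only the preview plus a pointer to the attachment.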

6 new tests pinning every tier + the back-compat skip path + the
"never throws even when the session layer would" guarantee. Total
suite 1195/1197 (the two pre-existing exec-guard test failures are
unchanged by this release).

No action needed. Existing sub-agents created before the upgrade are
unaffected; new agents from this point forward populate history.

## [5.19.0] — 2026-05-24

### Reliability now has a face — WebUI panel + live TTS switch + visible subsystem-restarts

The v5.18.0 reliability foundation was all backend. This release surfaces
it: a dedicated Reliability panel in the WebUI, a live TTS-provider
switch you can click, and the previously-silent in-process subsystem
restarts now DM you when they happen.

- **WebUI Reliability panel.** A new `Reliability` entry in the sidebar
  opens a dashboard that polls `/api/reliability` + `/api/tts` every
  8 seconds. Sections: restart-storm counter vs threshold, state-file
  health snapshot, recent incident dumps (under
  `~/.alvin-bot/incidents/`), `EXEC_SECURITY` mode, and a live TTS
  provider switcher with a Supertonic-ready badge. Responsive grid so
  it stays readable on a phone over Tailscale.

- **Live TTS switch from the WebUI.** A new `POST /api/tts` endpoint
  mirrors the Telegram `/tts` command — writes `TTS_PROVIDER` to `.env`
  atomically, refuses Supertonic when the local install isn't ready,
  refuses ElevenLabs without an API key, then triggers a graceful
  self-restart. The Reliability panel's dropdown calls it; you'll see a
  toast in the WebUI and a "🦊 Wieder da" DM on Telegram when the new
  provider is active.

- **Subsystem restarts are no longer silent.** When the runner-supervisor
  restarts the `telegram-poll` subsystem in-process (after a runner
  failure or a wedged poll), the owner now gets a one-line Telegram DM
  with the attempt number and reason. The in-process restart still
  beats the full-process route — the difference is you see it happen.

- **State-quarantine Owner-DM.** If the boot-time state-file health
  check quarantined any of `sessions.json` / `cron-jobs.json` /
  `delivery-queue.json` to a `.corrupt.<timestamp>` sibling, that's now
  surfaced to the owner via Telegram once the bot is online (separate
  message from the restart-ping so both stay readable on a phone).

- **Incident dumps auto-prune.** `~/.alvin-bot/incidents/` is now
  trimmed at boot — JSON dumps older than 30 days are removed
  automatically, so the directory no longer grows without bound across
  restarts.

- **Operator-facing endpoints.** `/api/reliability` returns the full
  drilldown (attempts array, state-file paths with reasons, recent
  incident filenames). `/api/tts` now supports both GET (status) and
  POST (switch). The `/api/status` reliability subset stays for the
  light-weight dashboard ping.

4 new tests on top of v5.18.0's 45 (1189/1191 total green; the two
pre-existing exec-guard test failures are unchanged by this release).

No action needed. Existing installs pick up the panel + endpoints on
the next `/update`. The boot-time incident-prune runs once per restart.

## [5.18.0] — 2026-05-24

### Cron jobs the bot creates now just run, and restart loops are stopped at the source

The 2026-05-24 reliability cycle. Five independent surface areas were
hardened so the day-to-day "I created a job and it ran for months" feel
comes back even after the v4.12.2 exec-allowlist tightening.

- **Cron auto-fix at the source.** When the bot creates a shell-type
  cron job whose binary isn't on the allowlist, it silently appends the
  basename (atomic write, 0600). When the command contains pipes or
  other shell metacharacters, it silently wraps the command in a
  generated `~/.alvin-bot/workspaces/<job-slug>/run.sh` (with
  `set -euo pipefail`), chmod 0755, rewrites the job command to
  `bash <wrapper>`, and allowlists the wrapper basename. Both add paths
  (the internal tool path and the `scripts/cron-manage.js` CLI) share
  the same helper. The runtime exec-guard at execution time stays in
  place as the second line of defence for hand-edited jobs.

- **Self-kill restart-loop prevented at the detector.** Before today,
  `isSelfRestartCommand()` only matched `pm2 restart/reload`. When the
  bot ran `launchctl kickstart -k com.alvinbot.app` (or `pkill`, or a
  direct `kill <our-pid>`) the OS supervisor SIGKILLed the bot mid-stream and
  launchd respawned it, a self-reinforcing loop that fired four times in
  17 minutes earlier today. The detector now covers all four shapes and
  routes every match through `scheduleGracefulRestart()` for a clean
  Telegram-offset commit before exit.

- **Restart-storm-brake.** A sliding-window counter records every
  self-restart attempt. The 4th+ attempt in a 10-minute window is
  refused — the bot stays up, the operator gets a single DM with the
  reason and a pointer to the err log, and the counter clears itself
  after 30 min of quiet. Independent of the watchdog crash-counter,
  which tracks unintentional exits.

- **Post-restart "🦊 Wieder da" ping.** Every restart that isn't a
  cold-boot (> 1 h since the last beacon write) now sends the operator
  a one-line confirmation DM after the pre-flight passes. No more
  staring at a typing indicator wondering whether the `/update` is
  through. Distinguishes a planned restart from a recovered crash in
  the message text.

- **State-file health check at boot.** Before any service touches them,
  `sessions.json`, `cron-jobs.json`, and `delivery-queue.json` are
  validated for parseable JSON, plausible root shape, and a sane size
  cap. Unparseable or oversized files are quarantined to a
  `<file>.corrupt.<timestamp>` sibling and replaced with a clean
  default so the bot starts cleanly instead of crashing on a poisoned
  state file.

- **Power-Mode opt-in in `alvin-bot setup`.** New wizard step asks the
  operator whether to enable `EXEC_SECURITY=full` for unrestricted
  shell access (recommended for solo / homelab installs). Default
  stays `allowlist` — which, combined with the auto-fix above, no
  longer requires manual maintenance for routine Owner-driven jobs.

- **Pre-Release Hardcore-Audit-Pass** is now part of `CLAUDE.md`:
  every version-bump runs a 7-point cross-module adversarial review
  (where-else / public-user edges / cross-module ripple / races /
  state-persistence / failure-modes / owner-vs-public-user asymmetry)
  before publish. The Owner-routine-ops-without-friction rule is also
  pinned: no Y/N inline-button approvals for cron-add, allowlist-add,
  or auto-wrap — silent do-the-right-thing plus a one-line notify.

- **WebUI reliability surface.** `/api/status` now embeds a compact
  reliability snapshot (restart-storm counter, state-file health,
  recent incidents, exec-security mode, TTS provider) and a new
  `/api/reliability` endpoint exposes the full detail for a future
  dashboard panel. `/api/tts` exposes the current TTS provider and
  Supertonic install state.
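
The sliding-window part of the restart-storm brake can be sketched as follows. `RestartStormBrake` and `allow` are hypothetical names; the sketch models only the "4th+ attempt in a 10-minute window is refused" rule and leaves out the operator DM and the 30-minute quiet reset.

```typescript
const WINDOW_MS = 10 * 60 * 1000;
const MAX_RESTARTS_IN_WINDOW = 3; // the 4th attempt in the window is refused

class RestartStormBrake {
  private attempts: number[] = [];

  // Records the attempt and returns true if the restart may proceed.
  allow(nowMs: number): boolean {
    // Forget attempts that have slid out of the window.
    this.attempts = this.attempts.filter((t) => nowMs - t < WINDOW_MS);
    // Refused attempts are recorded too, so a storm keeps the brake engaged.
    this.attempts.push(nowMs);
    return this.attempts.length <= MAX_RESTARTS_IN_WINDOW;
  }
}
```

Because refused attempts also count, a loop that keeps retrying stays blocked until enough of its attempts age out of the window.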

This release adds 45 new tests on top of the 5.17.2 baseline (1185
green total; the two pre-existing exec-guard test failures are
unchanged by this release).

No action needed for existing installs — `/update` pulls the new
defaults and the auto-fix helper retroactively repairs the allowlist
the next time the bot creates a shell job. Restart-storm and
state-health are live from boot.
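
The boot-time state-file check can be pictured with a small sketch. Everything named here is an assumption: `checkStateFile`, the size cap value, the "object or array" root-shape rule and the numeric timestamp in the quarantine name are illustrative stand-ins for the three checks described above.

```typescript
type Health =
  | { ok: true }
  | { ok: false; reason: "oversized" | "unparseable" | "bad-shape" };

const MAX_STATE_FILE_CHARS = 50_000_000; // hypothetical size cap

function checkStateFile(raw: string, maxChars = MAX_STATE_FILE_CHARS): Health {
  if (raw.length > maxChars) return { ok: false, reason: "oversized" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "unparseable" };
  }
  // Assumed shape rule: the root must be an object or an array.
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, reason: "bad-shape" };
  }
  return { ok: true };
}

// Builds the sibling name a failing file would be moved to.
function quarantinePath(filePath: string, timestamp: number): string {
  return `${filePath}.corrupt.${timestamp}`;
}
```

A boot routine would move any failing file to its quarantine path and write a clean default in its place before other services read it.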

## [5.17.2] — 2026-05-24

### Bot no longer echoes internal scaffolding back into your chat

Hardens the message pipeline against a class of confused-model failures
where the underlying LLM would emit its own context-window scaffolding
(`<system-reminder>` blocks the model usually only sees in its prompt)
as if it were a normal assistant reply — and the bot DM'd it verbatim.
Worse, the poisoned turn was persisted, so the bridge that catches the
SDK up after a mid-task restart re-fed the scaffolding into the next
prompt, locking the session into a self-reinforcing loop.

Four independent layers now block it:

- The Claude SDK stream strips `<system-reminder>` blocks (and the stray
  `]` artifact observed alongside them) before yielding any text. If
  the entire chunk was scaffolding it is skipped, never shown.
- The handler chunk loop re-strips on the way out, so any future
  provider that doesn't pre-sanitize is still safe.
- The history-write boundary in `services/session.ts` sanitizes once
  more — a poisoned turn can never reach disk.
- The bridge-message builder sanitizes assistant turns before they are
  rendered back into the next prompt, breaking the self-poisoning loop
  at the input boundary. The preamble was also reframed so the prompt
  no longer ends in a stray `]` that primed the `]` echo.
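
A minimal sketch of the kind of stripping these layers perform. The function name `stripScaffolding`, the regular expression, and the way the stray `]` is handled are illustrative assumptions, not the bot's actual code.

```typescript
// Hypothetical sanitizer: the name and patterns are assumptions for illustration.
const REMINDER_BLOCK = /<system-reminder>[\s\S]*?<\/system-reminder>/g;

interface SanitizeResult {
  text: string;      // cleaned text, possibly empty
  stripped: boolean; // true if a scaffolding block was removed
}

function stripScaffolding(chunk: string): SanitizeResult {
  // Remove complete <system-reminder> blocks (non-greedy, spans newlines).
  let text = chunk.replace(REMINDER_BLOCK, "");
  const stripped = text !== chunk;
  if (stripped) {
    // Drop lines that consist solely of a stray "]" left next to a removed block.
    text = text
      .split("\n")
      .filter((line) => line.trim() !== "]")
      .join("\n");
  }
  return { text: text.trim(), stripped };
}
```

A caller would skip the chunk when `text` comes back empty. Running the function on its own output changes nothing, which is the property the bridge self-poisoning test relies on.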

When the sanitizer fires it now logs a warning AND dumps raw+cleaned
samples to `~/.alvin-bot/incidents/<timestamp>-system-reminder-leak.json`
so the trigger can be reproduced. If the model's entire reply was
scaffolding (very rare), the bot sends a transparent fallback line
instead of going silent under a `👍` reaction.

Regression test re-runs the real incident payload from a poisoned
`sessions.json` through every layer; the bridge self-poisoning test
asserts the same poisoned turn rendered twice is identical and never
grows scaffolding back in. 19 new tests, total suite 1138/1140 (the
two pre-existing exec-guard failures are unchanged by this release).

No action needed. Live sessions are not affected by the upgrade.

## [5.17.1] — 2026-05-23

### Local TTS no longer falls back to Edge mid-conversation

Patches the v5.17.0 Supertonic path against two real-world failures the
maintainer hit within hours of the release. Both produced the same
visible symptom — voice replies that were Supertonic one moment and
Edge the next — because the bot was silently falling back to Edge on
every Supertonic crash.

First fix: the Python step crashed with `ValueError: Found N
unsupported character(s)` on extremely common inputs — German opening
quotes (`„`), variation-selector emoji modifiers (the U+FE0F that
arrives invisibly with hearts and ticks), keycap modifiers (the U+20E3
that turns `1` into `1️⃣`), and every regular emoji. The bot now
normalises orthographic variants to their ASCII equivalents (smart
quotes → `"`/`'`, em-dash → hyphen, ellipsis → `...`, non-breaking
space → space) and strips the Unicode ranges Supertonic refuses,
before the Python call ever sees the text.
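
As a rough sketch of that pre-processing step: the function name `normaliseForTts` and the exact character ranges below are assumptions, and the emoji match is a deliberately coarse approximation rather than the ranges the bot really strips.

```typescript
// Hypothetical normaliser; ranges are illustrative, not the bot's real list.
function normaliseForTts(input: string): string {
  return input
    .replace(/[\u201C\u201D\u201E\u201F]/g, '"') // double smart quotes, incl. German opening quote
    .replace(/[\u2018\u2019\u201A\u201B]/g, "'") // single smart quotes
    .replace(/\u2014/g, "-")                     // em-dash to hyphen
    .replace(/\u2026/g, "...")                   // ellipsis to three dots
    .replace(/\u00A0/g, " ")                     // non-breaking space to space
    .replace(/[\uFE0F\u20E3]/g, "")              // variation selector, keycap modifier
    // Coarse emoji match: symbols/dingbats plus surrogate pairs in U+1F000..U+1FFFF.
    .replace(/[\u2600-\u27BF]|[\uD83C-\uD83E][\uDC00-\uDFFF]/g, "")
    .replace(/ {2,}/g, " ")                      // collapse gaps left by removals
    .trim();
}
```

With this sketch a keycap sequence such as `1️⃣` keeps its base digit `1`, because only the two modifier code points are removed.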

Second fix: the per-call timeout was a hard 60 s. A 600-word humanize
reply takes well over a minute to synthesise. Replies that long now
get a synthesis budget scaled to text length (capped at 240 s), so the
slow ones reach completion instead of being killed and falling back.
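
One way such a length-scaled budget could look. Only the 240 s cap and the former 60 s limit come from this note; using 60 s as a floor and the per-character rate are assumptions made for the example.

```typescript
const BASE_TIMEOUT_MS = 60000;  // former hard limit, used here as a floor (assumption)
const MAX_TIMEOUT_MS = 240000;  // cap stated in this release note
const MS_PER_CHAR = 50;         // assumed synthesis rate, purely illustrative

function synthesisTimeoutMs(text: string): number {
  const scaled = text.length * MS_PER_CHAR;
  // Never below the old limit, never above the cap.
  return Math.min(MAX_TIMEOUT_MS, Math.max(BASE_TIMEOUT_MS, scaled));
}
```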

Verified on the maintainer machine against six representative inputs
(DE quotes, multiple emoji + keycap, 2000-char paragraph, plain ASCII
control) — all produce Supertonic audio now, none fall back. Tests
cover each of the originally-failing character classes.

No action needed.

## [5.17.0] — 2026-05-23

### Local on-device TTS with Supertonic — no API key, no Cloud round-trip

A new voice path that runs entirely on the operator's machine. After
this release the bot prefers a local Supertonic synth (ONNX, 31
languages, default voice M2) over the existing Edge / ElevenLabs
options whenever its Python venv is present: no API key, no
per-character cost, no audio leaving the box.

How it lands on existing installs: on the first bot start after the
upgrade, if `python3` is on PATH and Supertonic hasn't been set up
yet, the bot kicks off a background install (~30-60 s wall-clock) of
the supertonic pip package into `~/.alvin-bot/supertonic/venv`. The
HuggingFace model cache (~150 MB) is pulled lazily on the first
actual TTS call so the install step stays small. As soon as the
install lands and the next TTS call happens, the new path takes over
automatically — no `/restart` needed. If Python isn't available, or
pip can't reach the registry, or anything else blocks the install,
the bot stays on Edge TTS and records the reason in
`~/.alvin-bot/supertonic/.install-state.json`. Run
`alvin-bot supertonic doctor` any time to see the current state, or
`alvin-bot supertonic install` to retry manually.

A user who explicitly set `TTS_PROVIDER=edge` or `=elevenlabs` keeps
their choice — the auto-promote only fires when the value came from
the default-resolution path.
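
The precedence described above could be sketched as follows. The function name and parameters are hypothetical; only the rule itself (an explicit `edge` or `elevenlabs` wins, otherwise Supertonic when installed, else Edge) is taken from this note.

```typescript
type TtsProvider = "supertonic" | "edge" | "elevenlabs";

// Hypothetical resolver, not the bot's actual function.
function resolveTtsProvider(
  configured: string | undefined, // raw TTS_PROVIDER value, if the user set one
  supertonicReady: boolean,       // whether the local Supertonic venv is installed
): TtsProvider {
  // An explicit user choice always wins over auto-promotion.
  if (configured === "edge" || configured === "elevenlabs") {
    return configured;
  }
  // Default-resolution path: prefer the local synth, otherwise stay on Edge.
  return supertonicReady ? "supertonic" : "edge";
}
```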

Also in this release: ffmpeg is now bundled as an optional
dependency (`ffmpeg-static`), so the WAV→OGG conversion the Telegram
voice-message path needs works even on machines without a system
ffmpeg install.

No action needed. The default path is silent and degrades gracefully.

## [5.16.1] — 2026-05-23

### Bot keeps full credential visibility for the operator

A polish follow-up to v5.16.0's outbound scrubber. The redactor was
running unconditionally, so even when the bot DM'd the operator with
contents of their own .env or with logs that included keys, those
keys came back as `⟨redacted⟩` — useless when the whole reason for
the ask was "show me what's in there". The owner DM is now exempt:
when the bot sends to the configured operator's own chat the scrubber
steps aside; for every other recipient (group chats, anyone in
AUTH_MODE=open / pairing-approved users) the scrubber stays on, so a
non-owner cannot accidentally end up with a token in their hand.
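
A small sketch of the owner exemption. The `scrub` stand-in covers a single KEY=value rule only, and all names (`prepareOutbound`, `ownerChatId`) are assumptions for illustration.

```typescript
const REDACTED = "⟨redacted⟩";

// Stand-in for the real scrubber: one KEY=value rule, for illustration only.
function scrub(text: string): string {
  return text.replace(/\b([A-Z][A-Z0-9_]*(?:KEY|TOKEN|SECRET))=\S+/g, `$1=${REDACTED}`);
}

// Hypothetical send-path hook: the owner's own chat bypasses the scrubber.
function prepareOutbound(text: string, chatId: number, ownerChatId: number): string {
  return chatId === ownerChatId ? text : scrub(text);
}
```

For example, `prepareOutbound("OPENAI_API_KEY=sk-abc123", groupId, ownerId)` would return the line with the value replaced, while the same call with the owner's chat id returns it untouched.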

Also: the cron-restart-resilience tests no longer pin a specific
timezone in their date fixtures, so the cross-platform CI matrix
passes under any runner timezone (verified Europe/Berlin, UTC,
America/Los_Angeles, Asia/Tokyo). The TZ env workaround in the
workflow is gone.

No action needed.

## [5.16.0] — 2026-05-23

### The bot now restarts itself when a frozen polling worker would otherwise leave it stuck

A multi-front stability release. The biggest visible change is that the bot can
no longer silently sit in a "looks alive but isn't" state. Each critical
sub-system (the Telegram polling worker, the cron scheduler, the daily trends
snapshot) now stamps a heartbeat into the watchdog beacon every time it
completes a unit of work. When a sub-system goes quiet past its expected
cadence — say, the polling worker stops fetching updates after a network blip
but the process itself is still running — the watchdog notices, exits the
process, and lets the OS supervisor restart it. The polling worker has its own
in-process supervisor too: a single timeout or 409 conflict no longer takes
the bot off the air, the supervisor retries with exponential backoff and only
exits to the OS if every retry has failed. Bot startup itself now has a 20 s
deadline, so an unreachable Telegram during boot can no longer hang the bot
indefinitely either.
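
To make the heartbeat idea concrete, here is a minimal sketch. The beacon shape, the field names, and the backoff values are assumptions; the changelog does not show the real format.

```typescript
// Hypothetical beacon entry; field names are illustrative.
interface Heartbeat {
  name: string;         // e.g. "polling", "cron", "trends"
  lastBeatMs: number;   // timestamp of the last completed unit of work
  maxSilenceMs: number; // expected cadence plus tolerance
}

// Returns the sub-systems that have been quiet for longer than allowed.
function staleSubsystems(beats: Heartbeat[], nowMs: number): string[] {
  return beats
    .filter((b) => nowMs - b.lastBeatMs > b.maxSilenceMs)
    .map((b) => b.name);
}

// Exponential backoff for the in-process polling supervisor (values assumed).
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

A watchdog loop in this style would exit the process as soon as `staleSubsystems` returns a non-empty list, leaving the restart to the OS supervisor.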

Background tasks that you triggered are now reported back in your own
language. When an asynchronous sub-agent finishes, the bot sends a short
status line ("✅ name · duration · tokens") followed by a plain-language
summary as a thread reply to the message that started the task. Cron jobs are
unaffected — they keep the existing detailed banner. Background-task
descriptions are also more honest now: the bot stopped saying "running in the
background, you'll get a report" unless it actually started a background task.

`/btw` mid-task steering is more reliable. Two issues are fixed: a race
window between the steer channel being created and the SDK query handle
landing made some early `/btw` notes look unsteerable when they were actually
fine, and a missing prompt instruction meant the model sometimes acknowledged
a `/btw` addition but didn't integrate it into the reply. Both are closed.

`/verbosity short` and `/verbosity full` actually take effect now. The change
was being persisted but the running SDK session kept the old prompt; the new
verbosity now starts a fresh context on the next turn and the model gets
clearer instructions.

Hardened the surface that ships out to Telegram. A new outbound scrubber
redacts well-known credential patterns (bot tokens, AI-provider keys, GitHub
PATs, AWS keys, JWTs, embedded-URL passwords, KEY=value secret assignments)
before any text reaches the chat. Normal prose, e-mail addresses, IBANs and
identifiers like cron-job ids pass through untouched. A new input-sanitiser
silently drops Telegram updates whose payload is more than 2× the documented
limit (defence in depth against a custom Bot-API client). A per-user
token-bucket rate-limit (30/minute for non-owners, owner is exempt) protects
the bot from a shared chat suddenly going noisy.
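
A token bucket with the limits named above (30 per minute, owner exempt) might look like this. The class, the `allowMessage` helper, and the in-memory `Map` are assumptions for the sketch, not the bot's implementation.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerMs: number;

  constructor(capacity: number, refillPerMs: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMs;
    this.tokens = capacity; // start full so a short burst is allowed
    this.lastRefillMs = nowMs;
  }

  tryConsume(nowMs: number): boolean {
    // Refill proportionally to elapsed time, never above capacity.
    const elapsed = Math.max(0, nowMs - this.lastRefillMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const buckets = new Map<number, TokenBucket>();

// Hypothetical gate: 30 messages per minute for non-owners, owner exempt.
function allowMessage(userId: number, ownerId: number, nowMs: number): boolean {
  if (userId === ownerId) return true;
  let bucket = buckets.get(userId);
  if (!bucket) {
    bucket = new TokenBucket(30, 30 / 60000, nowMs);
    buckets.set(userId, bucket);
  }
  return bucket.tryConsume(nowMs);
}
```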

Cross-platform groundwork: `which claude` is now `where claude` on Windows so
the SDK can find the native binary there too, and a GitHub Actions matrix
builds and tests the bot on macOS, Linux and Windows on every PR.

No action needed on update. Everything described here switches on by default
and is fully backwards compatible.

## [5.15.2] — 2026-05-22

### Self-update never gets stuck on a post-update notification

Hardens the update flow against a cosmetic message error blocking the restart.
After a successful update the bot sends a short "what's new" note built from the
changelog; that text can contain characters Telegram treats as formatting, and
when the send was rejected the error skipped the restart — so the new version
was installed but the old one kept running until the next restart. The note is
now plain text (so it cannot be rejected) and, more importantly, sending it is
fully best-effort: whether or not the note goes through, the update now always
restarts to apply the new version. No action needed.
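
The "best-effort note, unconditional restart" ordering can be sketched like this. Both callbacks are hypothetical stand-ins for the bot's real send and restart routines.

```typescript
// Hypothetical helper: sendNote and restart are injected stand-ins.
async function finishUpdate(
  sendNote: () => Promise<void>,
  restart: () => void,
): Promise<boolean> {
  let noteSent = false;
  try {
    await sendNote();
    noteSent = true;
  } catch {
    // Best-effort: a rejected note must never block the restart.
  } finally {
    restart(); // runs whether or not the note went through
  }
  return noteSent;
}
```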

## [5.15.1] — 2026-05-22

### Self-update no longer fails on machines that run with NODE_ENV=production

Fixes two ways `/update` (and the 6-hour auto-update) could fail on a
source/git install. If the bot runs under `NODE_ENV=production` — common for a
launchd or systemd service — the update step that reinstalls dependencies was
quietly skipping the dev tooling the build needs, so the rebuild failed with a
"could not find a declaration file" error. The update now always installs that
tooling for the build regardless of environment. Separately, a lockfile that a
previous install had rewritten could make the update's fast-forward pull abort;
the update now clears that automatically before pulling. Both paths are covered
by tests. No action needed — your next update just works.

## [5.15.0] — 2026-05-22

### "/btw" now tells you *why* it can't steer — and every task logs a one-line start

Two clarity improvements. First, when `/btw` can't steer the running task, the
reply now names the actual reason instead of a vague catch-all: which provider
is active (live steering needs the Claude SDK), whether steering is switched
off, or that the task came from a file, photo or document — those run on a path
that doesn't support mid-task steering yet. So you immediately know whether to
just resend as a normal message, rather than guessing something broke. Second,
the bot now prints one short line when a task starts — its source (text, file,
photo, voice, video) and the provider handling it — so a quiet, healthy bot is
still easy to follow in the logs.

## [5.14.0] — 2026-05-22

### "/btw" now actually nudges a running task mid-stream (live steering works)

You can drop a quick `/btw <note>` while a task is running and it now
reaches the task *as it works*, without restarting it — the way it was
always meant to. Previously the bot processed updates strictly
one-at-a-time, so a mid-task message wasn't even received until the
current task finished, which made steering (and a mid-task `/stop`)
silently do nothing. Updates are now processed concurrently, while normal
messages are still handled in order per chat (no surprises). Live
steering uses the Claude-SDK provider.
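
One common way to get "concurrent across chats, ordered within a chat" is a promise chain per chat id. This is a sketch of that general technique under assumed names, not necessarily how the bot does it, and it never evicts idle chats from the map.

```typescript
const chatQueues = new Map<number, Promise<void>>();

// Hypothetical helper: tasks for one chat run in order, different chats overlap.
function enqueuePerChat(chatId: number, task: () => Promise<void>): Promise<void> {
  const previous = chatQueues.get(chatId) ?? Promise.resolve();
  // The stored tail never rejects, so one failed task cannot block the chat.
  const next = previous.then(task);
  chatQueues.set(chatId, next.catch(() => undefined));
  return next;
}
```

A steering note such as `/btw` would bypass this queue entirely so it can reach the task that is already running.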

## [5.13.0] — 2026-05-19

### Even models without tool-calling can run background sub-agents now (opt-in)

Completes provider-agnostic background sub-agents. Set
`SUBAGENT_PROMPTED_TOOLS=1` and a model that has no native
function-calling can still use the safe tools (shell, file read/write,
web) inside a background sub-agent via a strict prompted protocol —
capped, and with an honest "best-effort" label if the model can't follow
it reliably (no endless loops, no pretending it worked). It runs through
the exact same execution guardrails. The flag is **off by default**, so
nothing changes for anyone who doesn't opt in.

## [5.12.0] — 2026-05-19

### Background sub-agents now run on non-Claude providers too

Background sub-agents are no longer Claude-only. If you run a non-Claude
provider that supports function-calling — Groq, OpenRouter, NVIDIA NIM,
Gemini, or a capable Ollama model — Alvin now spawns a detached worker
that runs the same safe tool loop (shell, file read/write, web) behind
the same guardrails, and delivers the result through the normal path
(instant push, restart-safe reconciliation) exactly like the Claude
path. If you run the Claude-SDK provider, **nothing changes** — that
path is byte-for-byte identical. Models without function-calling still
get the clear graceful message from the previous release; full support
for them is the opt-in follow-up.

## [5.11.0] — 2026-05-19

### Background tasks fail gracefully instead of erroring when the Claude CLI is missing

Groundwork toward provider-agnostic background sub-agents. Internally,
how a background task is executed is now chosen through a clean seam
(more execution backends land in follow-up releases). User-visible part:
on an install without the `claude` CLI, asking for a background task no
longer surfaces a raw error — it returns a clear message through the
normal result path explaining that background sub-agents currently need
the Claude-SDK provider, while every other command keeps working. If you
run the Claude-SDK provider, **nothing changes** — the existing
background-agent path is byte-for-byte identical.

## [5.10.0] — 2026-05-19

### You can now set how detailed Alvin's answers are, and "/btw" is a real command

Two everyday-control improvements:

- **`/verbosity short | medium | full`** — pick how much detail you want
  back. *Short* gives you just the outcome (and, if it failed, why) in plain
  words — ideal when you only need "did it work". *Full* is thorough.
  *Medium* is the balanced default and is **exactly the current behaviour**,
  so nothing changes unless you ask. Works the same across every AI
  provider, and on every messenger.
- **`/btw <note>` is now an actual command.** Before, typing `/btw …` did
  nothing (it was never wired up). Now it's a real, discoverable command on
  Telegram and the other messengers: while a Claude-SDK task is running it
  nudges that task without restarting it; when it can't live-steer it tells
  you so plainly instead of silently swallowing the message.

## [5.9.0] — 2026-05-19

### Health alerts stay quiet unless something is genuinely wrong

The self-monitoring trend alert used to cry wolf after every update: a
burst of restart-related log noise made it report "errors climbing" even
though nothing was actually broken. It is now disciplined — it ignores
the expected churn around a version change, only speaks up when a real,
recurring, actionable error keeps happening on a stable version, and
never repeats the same alert within a day. Two more fixes round out the
release:

- **macOS sudo/keychain setup now works.** The setup wizard step that
  saves your account password to the Keychain previously hung and failed
  with a confusing "exit null" on macOS. It now stores reliably
  (including passwords with special characters).
- **Old browser screenshots clean themselves up.** Screenshot/temp files
  are now auto-removed after 7 days (was 30), and the cleanup finally
  covers the temp folder where command-line screenshots actually land
  (previously missed). A new `/cleanup` command runs it on demand and
  reports how much was freed and from where — so an external cleanup cron
  is no longer needed.

## [5.8.1] — 2026-05-19

### Clearer package page and version-pinned changelog links

Documentation-only release — no code, configuration, data, or behavior
change. The npm package page (README) is refreshed and now renders
correctly on npm itself: the architecture overview is plain text instead
of a diagram format npm cannot display, and the "What's new" line and
roadmap reflect the current 5.x line. Separately, every release on the
project site now links to its own exact version-pinned changelog instead
of a generic package page.

## [5.8.0] — 2026-05-19

### Smaller, hardened distribution build

The published npm package now ships a hardened, build-optimized
distribution instead of raw compiled files. The bot behaves exactly as
before — same commands, same configuration, same data, same
performance — but the shipped code is leaner and more tamper-resistant.
Local development is completely untouched: `npm run build` still
produces the normal readable build for contributors, and an internal
debug build remains available for troubleshooting. As always, verified
with a fresh global install on a clean separate machine.

## [5.7.0] — 2026-05-19

### Background-task results now arrive the instant the task finishes, and survive a restart

Detached background-task results were delivered by a 15-second polling
loop whose in-memory state could diverge across a bot restart, so a task
that finished around a restart could keep its result undelivered with
nothing shown in chat. Delivery is now pushed the moment the task's
process exits — through an always-on local callback guarded by a
per-boot token — and a startup reconciliation pass drains anything that
completed while the bot was down. An atomic deliver-once marker
guarantees the push, the polling backstop, and reconciliation never
double-deliver, and a cancelled task can no longer be resurrected. The
polling loop is kept only as a backstop for timeouts and stalled tasks.
No configuration is required and nothing changes for existing setups.
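
An atomic deliver-once marker can be built on exclusive file creation. The sketch below assumes a marker file per task and the name `claimDelivery`; the bot's real marker format is not described here.

```typescript
import { closeSync, mkdtempSync, openSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: returns true for exactly one caller per task id.
function claimDelivery(markerDir: string, taskId: string): boolean {
  try {
    // "wx" creates the file and fails with EEXIST if it already exists.
    closeSync(openSync(join(markerDir, `${taskId}.delivered`), "wx"));
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err;
  }
}

// Usage: push, polling backstop and reconciliation would all call this first.
const markerDir = mkdtempSync(join(tmpdir(), "alvin-markers-"));
const first = claimDelivery(markerDir, "task-1");  // true: this caller delivers
const second = claimDelivery(markerDir, "task-1"); // false: already delivered
```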

## [5.6.2] — 2026-05-19

### Long background-task results now reliably arrive in chat

A background task that produced a long final answer could finish
successfully and yet never be delivered — you would see nothing and
have to ask for the status by hand. Alvin now recognises a finished
background task no matter how long its result is, so the answer always
lands in your chat the moment the task completes.

## [5.6.1] — 2026-05-18

### Background-task results stay in the chat

Results from scheduled and background tasks now appear directly in
the chat as before. Only an output long enough to span more than two
messages comes as a single attached file instead — keeping your chat
tidy without ever splitting a result across a wall of messages. No
"shortened" notices on normal-sized results; you stay in control of
when something gets saved as a file.

As always, verified with a fresh-install + stress test on a clean
separate machine.

## [5.6.0] — 2026-05-18

### Background-task reports are now clean and to the point

When a scheduled or background task finishes, Alvin now sends you
just the result — a tight header (what ran, how long, tokens, success)
and the actual answer — instead of a wall of its working notes. If a
result is unusually long, the chat message stays short and the
complete output comes attached as a file, so you never lose anything
and never have to scroll through a transcript.

### A clear confirmation when you stop something

Press ⛔ Stop (or use /cancel) while Alvin is genuinely working and
you now get a short, plain confirmation in your language that the work
was halted — not just a fleeting button flash. If nothing was running,
Alvin still tells you that honestly instead of pretending it stopped
something.

### Health alerts that don't cry wolf

Alvin's self-monitoring now judges its health on recent activity, so a
one-off rough patch no longer keeps it flagging a problem for weeks. A
real issue still raises a flag promptly; a quiet, healthy bot stays
quiet.

As always, this shipped after a full multi-pass review and a
fresh-install + stress verification on a clean separate machine.

## [5.5.0] — 2026-05-18

### The ⛔ Stop button now responds instantly — and honestly

Stopping a task is now crisp and truthful. The moment a task finishes,
the Stop button disappears, so you're never tapping a control for
something that's already done. And the feedback always matches reality:
if you tap Stop while Alvin is genuinely working, it stops and says so;
if the task had already completed, Alvin tells you that plainly instead
of implying it cut something short. If you hit Stop in that brief moment
while an answer is being prepared, that answer is now held back — "I
stopped it" means nothing more arrives. Anything Alvin had already
shown you stays exactly as it was.

### Fewer false alerts — smarter health monitoring

Alvin's self-monitoring got a lot more trustworthy. A planned restart
or an update is no longer mistaken for a problem, and the daily health
summary only raises a flag when there's real evidence something is
actually wrong — so the alerts you do get are ones worth reading.
Routine background housekeeping no longer shows up as noise.

As always, this shipped after a full multi-pass review and a
fresh-install + stress verification on a clean separate machine.

## [5.4.0] — 2026-05-18

### Smoother background tasks — and Alvin always tells you the truth

When you ask Alvin to go off and do something longer — research, a
multi-step job — it now reliably hands control straight back to you so
you can keep chatting while it works, then delivers the result as its
own message. And if a task does need to run inline for a moment,
Alvin says so honestly instead of implying you're free when you're
not. Talking to Alvin now feels exactly like working with a colleague
who's already on it: you're never left waiting or guessing.

### Safer out of the box — with your full power one setting away

Alvin now ships with sensible, safe defaults so a fresh install is
solid for everyone, including people who just want to try it quickly.
Nothing about Alvin's capabilities has been taken away: if you want
the full, unrestricted superadmin experience it's a single documented
setting — your machine, your rules, your call. The new `.env.example`
spells out every option, including the "power" switches, in plain
language. You stay completely in control.

### Reliability & robustness across the board

A broad pass to make Alvin steadier on long-running setups: no more
duplicate messages under load, cleaner interplay between stopping,
steering and background work, more accurate scheduling for custom
cron expressions, and tighter handling of edge cases throughout.
Verified end-to-end with a stress test on a clean separate machine.

### A leaner, tidier install

Roughly 20 MB lighter to install, a calmer first-run experience
(optional features that aren't configured no longer look like
errors), better behavior on Windows and for non-German voice notes,
and a zero-config friendly default so a minimal setup just works.

As always, this shipped only after a full multi-pass review and a
fresh-install + stress verification on a clean second machine.

## [5.3.0] — 2026-05-18

### Talk to Alvin while it's working — no more interrupting yourself

Until now, a message you sent while Alvin was busy had only two
outcomes: it waited in line until the current task finished, or it
threw the task away and started over. Now there's a third, much
better one. Drop a quick *"btw, also check the other folder"* or
*"actually, use the live data not the test data"* mid-task and Alvin
takes it in **while keeping everything it has already done** — exactly
like leaning over to a colleague who's already working and adding one
more thing. No restart, no lost progress.

### A quiet 📨 so you know it landed

When your mid-task note is picked up, Alvin reacts with a 📨 on your
message and, the first time per task, adds one short line so you know
it was taken on board without derailing what it's doing. After that
it's just the reaction — no chatter, no spam while it works.

### Stop still always wins

Steering never overrides stopping. The ⛔ Stop button, `/cancel` and
`/stopall` behave exactly as before and always take precedence — a
mid-task note can never bring back a task you've stopped. If you'd
rather keep the old "queue it until done" behaviour, you can switch
steering off with a single setting; it's on by default for everyone.

Live steering works with the Claude engine; with other AI providers
your message safely falls back to the previous queue behaviour, so
nothing breaks. As always, this shipped only after a full end-to-end
verification and a stress test on a clean separate machine.

## [5.2.0] — 2026-05-17

### Stop now actually means stop — instantly

You could always type `/cancel`, but it rarely felt like anything
happened: the bot kept "thinking" for a while and the answer often
still arrived. That's fixed. When you stop a task, the bot now bails
out immediately instead of quietly trying its backup brains one after
another in the background. Press stop — it stops.

### A one-tap ⛔ Stop button on every running task

No more remembering or typing a command mid-thought. While Alvin is
working you'll see a ⛔ Stop button right there on the status message.
One tap, the task ends, the button flips to "⛔ Gestoppt" ("Stopped"). Mistyped
your request or changed your mind? It's one thumb away — and it works
the same in Telegram, Slack, Discord and WhatsApp.

### New `/stopall` — pull the plug on everything

`/cancel` (and the button) stop the task you're watching but let
already-running background helpers finish and report back later. When
you want a hard reset — *"forget all of it"* — use the new `/stopall`:
it stops the current task, terminates the background sub-agents it
spawned, and clears anything queued behind it. Nothing comes back to
surprise you afterwards.

Under the hood this was a careful, well-tested hardening of the whole
stop path — verified end-to-end on a clean second machine before
shipping. No setup or config needed; it just works.

## [5.1.8] — 2026-05-17

### Interrupted jobs auto-resume after a controlled restart

If an auto-update, `/update` or `/restart` interrupted a job while it
was running, the bot used to give up on that run and ask you to
re-trigger it manually. Now, when the interruption was a controlled
restart (not a crash) and it happened within the last 15 minutes, the
job is re-run automatically on the next tick after the bot is back.
Crash-loops are deliberately excluded — a job that keeps crashing the
bot will never resume itself — and each interrupted run is resumed at
most once. You can opt a single job out with `autoResume: false`.
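
The eligibility rules above fit in one small predicate. The record shape and the function name are assumptions; the 15-minute window, the crash exclusion, the resume-once rule, and the `autoResume: false` opt-out come from this note.

```typescript
// Hypothetical record of an interrupted run; field names are illustrative.
interface InterruptedRun {
  interruptedAtMs: number;
  controlledRestart: boolean; // false means the bot crashed
  alreadyResumed: boolean;
  autoResume?: boolean;       // per-job opt-out
}

const RESUME_WINDOW_MS = 15 * 60 * 1000;

function shouldAutoResume(run: InterruptedRun, nowMs: number): boolean {
  if (run.autoResume === false) return false; // job opted out
  if (!run.controlledRestart) return false;   // crashes never auto-resume
  if (run.alreadyResumed) return false;       // at most once per interrupted run
  return nowMs - run.interruptedAtMs <= RESUME_WINDOW_MS;
}
```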

### Sub-agents report a result, not a play-by-play

Finished sub-agents (background tasks, cron jobs) no longer dump a long
step-by-step recap of how they thought and what tools they called. You
get the compact status line (success/failed · duration · tokens) plus
the actual result — nothing more. Cron reports still arrive in full;
only the meta-narration is gone, since the orchestrator processes the
real output anyway.

## [5.1.7] — 2026-05-17

### Scheduled jobs no longer run twice after a restart

If two bot instances were briefly alive at the same time — for example
right after an auto-update or a restart, while the old process was still
shutting down — a scheduled job could fire twice within the same minute.
One real case: a weekly report job sent its email, then sent an empty
duplicate 30 seconds later. The old overlap guard only worked inside a
single process, so a second instance never saw the first one's claim.

Jobs are now claimed with a small cross-process lock before they run, so
only one instance can execute a given job for a given slot. A crashed
run can't wedge the lock — it is reclaimed automatically once the owning
process is gone. Manual `/cron run` honours the same lock. No
configuration changes; existing jobs just stop double-firing.
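
A cross-process claim with automatic reclaim could be sketched with a PID lock file. The file layout and function names are assumptions, and the sketch ignores the small race between two processes reclaiming the same stale lock.

```typescript
import { mkdtempSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 only tests for existence
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Hypothetical claim: one lock file per job and schedule slot, holding the owner PID.
function claimJobSlot(lockDir: string, jobId: string, slot: string, pid = process.pid): boolean {
  const lockPath = join(lockDir, `${jobId}.${slot}.lock`);
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      writeFileSync(lockPath, String(pid), { flag: "wx" }); // fails if the lock exists
      return true;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
      const owner = Number(readFileSync(lockPath, "utf8"));
      if (isProcessAlive(owner)) return false; // another instance holds the slot
      unlinkSync(lockPath); // owner is gone: reclaim and retry once
    }
  }
  return false;
}

// Usage: the second claim for the same slot loses while the owner is alive.
const lockDir = mkdtempSync(join(tmpdir(), "alvin-locks-"));
const won = claimJobSlot(lockDir, "weekly-report", "2026-05-17T0900");
const again = claimJobSlot(lockDir, "weekly-report", "2026-05-17T0900");
```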

## [5.1.6] — 2026-05-15

### Planned restarts really stop counting as crashes now

v5.1.5 added a flag that marks self-updates and `/update` / `/restart`
as intentional so they don't inflate the crash counter. Half of it
didn't actually work: the code that reads the saved state back on the
next boot rebuilt it field by field and silently dropped that very
flag, so the crash detector never saw it and planned restarts were
still scored as crashes. (The other half of v5.1.5 — not counting
benign log lines as errors — was unaffected and has been working.)

This release makes the read path preserve the flag, so a planned
restart is now genuinely treated as a clean exit. The state-parsing
logic was pulled into a tested pure function so the read-back round
trip can't silently regress like this again.
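
A pure parser of this kind is easy to test for round-trip safety. The state shape and field names below are invented for the example; only the idea (rebuild field by field without losing the planned-restart flag) reflects the fix.

```typescript
// Hypothetical persisted state; field names are illustrative.
interface WatchdogState {
  crashCount: number;
  lastExitAtMs: number;
  plannedRestart: boolean; // the kind of flag that was dropped on read-back
}

function parseWatchdogState(raw: string): WatchdogState {
  let data: Record<string, unknown> = {};
  try {
    const parsed: unknown = JSON.parse(raw);
    if (parsed !== null && typeof parsed === "object") {
      data = parsed as Record<string, unknown>;
    }
  } catch {
    // Unparseable input falls through to defaults.
  }
  const crashCount = data["crashCount"];
  const lastExitAtMs = data["lastExitAtMs"];
  return {
    crashCount: typeof crashCount === "number" ? crashCount : 0,
    lastExitAtMs: typeof lastExitAtMs === "number" ? lastExitAtMs : 0,
    plannedRestart: data["plannedRestart"] === true, // preserved explicitly
  };
}
```

A round-trip test then only needs to assert that `parseWatchdogState(JSON.stringify(state))` equals `state` for every field.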

### What this means for you

If you updated to 5.1.5 and still saw the crash count tick up by one
each time the bot updated itself, that stops now. The error-trend
half of the 5.1.5 fix already worked; this completes the crash half.

## [5.1.5] — 2026-05-15

### Health monitor no longer cries wolf about its own log lines

The bot watches its own error and crash counts over 24 hours and warns you if they keep climbing. It turned out that monitor was largely measuring itself. v5.1.4 fixed one harmless thing being mislabelled as an error; this release fixes the root cause so the whole class of false alarms stops.

Two things were wrong:

- **"Errors" counted every line written to the error log — even harmless ones.** Some parts of the bot deliberately write benign status notes there (a self-healing message-format retry; the alert system logging whether its own notification went out). Every one of those inflated the error count, including the alert system flagging its own output — a loop that kept the warning alive no matter what. The error count now ignores these known-harmless notes and still counts every real error, including ones it has never seen before.
- **Every intentional restart was counted as a crash.** When the bot updates itself or you run `/update` or `/restart`, it exits on purpose and is immediately relaunched. The crash detector saw the quick exit and scored it as a crash, so simply shipping updates made the crash graph creep upward and trip the alarm. Planned restarts are now recognised as planned and no longer counted as crashes.
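
The error-count half of this fix amounts to a denylist: known-harmless notes are skipped and everything else still counts. A minimal sketch, with made-up patterns standing in for the real benign notes:

```typescript
// Assumed patterns for illustration; the real list of benign notes is not shown here.
const BENIGN_PATTERNS: RegExp[] = [/retrying as plain text/i, /alert notification sent/i];

function countRealErrors(errorLogLines: string[]): number {
  // Anything not explicitly known as harmless counts, including unseen errors.
  return errorLogLines.filter(
    (line) => !BENIGN_PATTERNS.some((pattern) => pattern.test(line)),
  ).length;
}
```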

### What this means for you

If you saw a "trend anomaly: errors/crashes steadily climbing" alert shortly after updating, that was the monitor reacting to the update itself, not a real regression. After this release the trend reflects reality. No action needed — update as usual.

## [5.1.4] — 2026-05-15

### No more false "errors are climbing" health alerts

The bot has a self-health monitor that watches how many error lines show up in its logs over the last 24 hours and warns you if the count keeps creeping up. One harmless, self-correcting event was confusing it: when a message can't be sent with rich formatting, the bot quietly retries it as plain text and the message still gets delivered. That retry was being written to the error log, so the health monitor counted every single one as a real error — producing a slow, fake "your error rate is rising" alarm even though nothing was actually wrong.

That benign retry is now logged as normal activity instead of as an error, so the 24-hour error trend reflects reality. Message delivery itself is unchanged — this only fixes where the event is recorded.

## [5.1.3] — 2026-05-13

### Stability hardening — runtime-header fallbacks

Small robustness pass on the runtime-header block that the bot uses to introduce itself. If the embedded version or install-path can't be resolved for any reason, those lines are now simply omitted instead of literally telling the model "I am version unknown". Node version and platform are still emitted so the bot can still describe its environment.
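
A sketch of the omit-when-unresolved behaviour. The function name, the input shape, and the exact header wording are assumptions.

```typescript
// Hypothetical input; version and installPath may fail to resolve.
interface RuntimeInfo {
  version?: string;
  installPath?: string;
  nodeVersion: string;
  platform: string;
}

function buildRuntimeHeader(info: RuntimeInfo): string {
  const lines: string[] = [];
  // Unresolved values are omitted rather than printed as "unknown".
  if (info.version) lines.push(`Version: ${info.version}`);
  if (info.installPath) lines.push(`Install path: ${info.installPath}`);
  lines.push(`Node: ${info.nodeVersion}`);
  lines.push(`Platform: ${info.platform}`);
  return lines.join("\n");
}
```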

### Cleaner release notes for global users

Polished the release notes that surface inside the Telegram `/update` summary so non-technical users see a clear, English summary of what changed. No behaviour changes.

## [5.1.2] — 2026-05-13

### Bot now reports its real version after every update

If you asked the bot "what version are you?", it used to look up the version that someone had written into the project documentation by hand — which went out of date the moment a new release shipped. The bot would happily say "I'm v4.19.2" even though the running process was already on something newer.

Now the bot reads its version straight from the running code on disk, every single turn, regardless of provider (Claude SDK, Codex CLI, Groq, Gemini, OpenAI, Ollama, OpenRouter, NVIDIA NIM). It also includes its install path, Node version, and platform — so when something goes wrong you can ask the bot to introduce itself and get an honest answer.

### What this means for you

- Update via Telegram `/update` and the bot will pick up the new version on the next restart.
- Ask the bot in chat "which version are you running?" — the answer is now grounded in reality, not in a docs file that might be stale.

## [5.1.1] — 2026-05-13

### Audit baseline cleanup — 16 → 6 vulnerabilities via safe fixes

Ran `npm audit fix` (no `--force`) on the lockfile. Cleared 10 of 16 findings, including axios pollution, basic-ftp CRLF, fast-uri path traversal, xmldom DoS, hono jsx/cache, ip-address XSS, postcss XSS, follow-redirects header leak, and the top-level protobufjs vulnerability at the package root (but **not** the nested copy inside `libsignal-node`). No source-code changes; build + privacy clean; vitest passes 542/543 (one pre-existing flaky port-binding test, unrelated to this change).

### Remaining 6 — documented as known/deferred baseline

These are tracked, not blocking, with an honest paper trail in `.github/workflows/security-audit.yml`:

| Sev | Package | Why deferred |
|---|---|---|
| critical | `protobufjs@6.8.8` (nested) | Pinned by `@whiskeysockets/libsignal-node` (custom git fork). Forcing an `npm override` to newer protobufjs would touch Signal-Protocol parsing → high breakage risk. Awaits whiskeysockets upstream. **Only reachable with `WHATSAPP_ENABLED=true`** — dead code for the typical user. |
| high | `electron <=39.8.4` | devDependency only — affects the DMG-build path, not the npm-CLI runtime users. Major bump 35→42 = breaking, scheduled separately when DMG release is next prepared. |
| moderate × 4 | derived | `@anthropic-ai/sdk` + `claude-agent-sdk` await Anthropic minor; `baileys` + `libsignal-node` carry the protobufjs upstream lag. |

### CI audit workflow — documents the baseline

`.github/workflows/security-audit.yml` keeps `continue-on-error: true` on the audit step (PRs not blocked on the documented baseline), but the workflow header now spells out exactly what's tracked, why each is deferred, and what would unblock removal of the soft-fail flag.

### Stale Dependabot PRs closed

The 8 open Dependabot PRs (#10–#17) were opened against the pre-audit-fix lockfile state. Closed all with a redirect comment; Dependabot will reconsider against the cleaned-up baseline at its next scheduled Monday run. Closing them avoids tedious merge-conflict resolution on each.

### What didn't change

- No source code edits — pure dependency-tree pruning
- No `npm audit fix --force`; every applied bump is within existing semver ranges
- Bot runtime behavior identical to 5.1.0; verified on .75 (Pre-Flight green, all permission detection works, bot online via launchd)

## [5.1.0] — 2026-05-13

### Permissions Wizard — guided one-and-done macOS setup

macOS' TCC framework architecturally **refuses** to let any app grant Full Disk Access / Automation / Accessibility programmatically — only the user can flip those switches in System Settings. There is no API, no AppleScript trick, no sudo bypass for a normal Node CLI.

What we can do — and what 5.1.0 does — is make the toggling experience painless:

1. **Detect** every permission's current state (sudo + FDA + Automation + Accessibility)
2. For each missing one: **open the exact right Settings pane**
3. **Poll for the toggle** every 2 s, up to 60 s per permission
4. **Verify** and move to the next
5. End with a clear summary of granted / skipped / still-missing
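
The polling in step 3 can be sketched as a small loop. This is an illustration under assumptions: `waitForGrant` and the injected `sleep` are hypothetical names, and the real wizard may structure its loop differently.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `isGranted` every `intervalMs` until it returns true or `timeoutMs`
// has elapsed. Defaults mirror the 2 s / 60 s values described above.
async function waitForGrant(
  isGranted: () => boolean | Promise<boolean>,
  intervalMs = 2_000,
  timeoutMs = 60_000,
  sleep: Sleep = realSleep,
): Promise<boolean> {
  const attempts = Math.floor(timeoutMs / intervalMs);
  for (let i = 0; i < attempts; i++) {
    if (await isGranted()) return true;
    await sleep(intervalMs);
  }
  return await isGranted(); // one last check at the deadline
}
```

Passing `sleep` as a parameter keeps the loop testable without waiting a full minute.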

```bash
alvin-bot permissions status           # quick state-of-the-world
alvin-bot permissions wizard           # interactive guided setup
alvin-bot permissions open <id>        # open one Settings pane
```

The wizard also bundles **sudo-password storage** (Keychain on macOS, encrypted file on Linux) as the first step, so users get a single upfront onboarding flow instead of separate prompts later. Run-once intent: no more piecemeal permission requests at runtime.

### New detections

- **Automation (Apple Events)** — probes via `osascript -e 'tell application "System Events" to ...'`, catches error code 1743 / "Not authorized" to distinguish denied vs granted. Used by Apple Mail, Apple Notes, Calendar skills.
- **Accessibility** — now distinguishes "cliclick not installed" from "permission denied", so the wizard suggests the actually-correct fix (install via brew vs. toggle in Settings).
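
A hedged sketch of how the Automation probe result might be classified. `classifyAutomationProbe` is a hypothetical helper, and matching on the stderr text is an assumption about what `osascript` prints when the request is denied.

```typescript
type AutomationState = "granted" | "denied" | "unknown";

// Classify the outcome of an `osascript` probe from its exit status and stderr.
// Only the 1743 code and the "Not authorized" wording are treated as a denial;
// every other failure is reported as unknown rather than guessed.
function classifyAutomationProbe(exitCode: number, stderr: string): AutomationState {
  if (exitCode === 0) return "granted";
  if (/-?1743\b/.test(stderr) || /not authori[sz]ed/i.test(stderr)) return "denied";
  return "unknown";
}
```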

### Doctor integration

`alvin-bot doctor` now uses the same wizard service for its macOS-permissions section: it shows all 4 permissions instead of just FDA and points to the wizard for any missing ones.

### Telegram `/setup` integration

`/setup` keyboard gets a new **🛡️ Permissions Wizard (Mac)** button — surfaces the current 4-permission status and the CLI / WebUI commands to actually run the wizard. Mobile UX is intentionally read-only: the wizard needs to drive System Settings panes on the host, which only the local CLI/WebUI can do.

### README — comprehensive CLI section refresh

Documented commands added: `tools`, `provider`, `permissions`, `browser`, `status`. Plus the full env-var opt-out table for Self-Preservation Phase 1 + 2 (`ALVIN_DISABLE_*`, `ALVIN_DEADMAN_THRESHOLD_SEC`, etc.). The README CLI section was missing six commands shipped in 4.23 — 5.0; now matches reality.

### Honest about what we CAN'T do

The README and wizard are explicit about this: macOS TCC permissions cannot be granted by an app. The wizard opens panes and verifies after; the user toggles. No tricks, no entitlement bypass attempts.

## [5.0.0] — 2026-05-13

### Self-Preservation Phase 2 — the bot now reasons about its own failures

The bot now uses **its own AI provider** (whichever one you have configured — claude-sdk, codex-cli, groq, gemini, openai, ollama/gemma4, openrouter, nvidia-nim) to analyze why it failed and where it's heading. Two new features, both event-driven, both opt-out.

The major-version bump reflects a conceptual shift: AI is now part of the bot's **operational loop** about itself, not just the user-facing chat. The feature surface is backwards-compatible — existing setups keep running unchanged; everything new is additive and opt-out via env vars.

#### AI-driven Self-Diagnosis on bundles (feature 3I)

When the watchdog brake fires and 2F writes a forensic bundle, 3I picks up the analysis at **the next successful bot start**:

1. Bot starts → Pre-Flight runs → 3I scans `~/.alvin-bot/diagnostics/` for unanalyzed bundles
2. For each bundle without a `.analysis.md` sidecar, send it (clipped to ~12 KB, head+tail) to the active AI provider via `provider.query()`
3. AI returns a structured 5-line response — `HYPOTHESIS / ROOT_CAUSE_CATEGORY / REMEDIATION / CONFIDENCE / EXPLANATION`
4. Result is written as `.analysis.md` sidecar AND delivered to the operator via 1D Telegram DM
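
The head+tail clipping in step 2 could look roughly like this. It is a sketch only: `clipHeadTail`, the marker text, and the even head/tail split are assumptions, and size is counted in characters rather than bytes for simplicity.

```typescript
// Keep the first and last parts of a bundle so it fits a size budget.
// The budget is measured in UTF-16 code units, a simplification of "~12 KB".
function clipHeadTail(text: string, maxChars = 12_000): string {
  if (text.length <= maxChars) return text;
  const marker = "\n[... middle clipped ...]\n";
  const keep = Math.max(0, maxChars - marker.length);
  const headLen = Math.ceil(keep / 2);
  const tailLen = keep - headLen;
  return text.slice(0, headLen) + marker + text.slice(text.length - tailLen);
}
```

Keeping both ends preserves the event detail at the top of a bundle and the most recent log lines at the bottom.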

The 5-line plain-text format was chosen over JSON because **JSON parsing reliability is uneven across providers**, especially with smaller models. The format is hard to mess up — and we parse it with a tolerant regex.
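
To make the "tolerant regex" idea concrete, here is one possible parser for the five fields. `parseDiagnosis` and the exact pattern are illustrative assumptions, not the shipped implementation.

```typescript
const DIAGNOSIS_FIELDS = [
  "HYPOTHESIS",
  "ROOT_CAUSE_CATEGORY",
  "REMEDIATION",
  "CONFIDENCE",
  "EXPLANATION",
] as const;
type DiagnosisField = (typeof DIAGNOSIS_FIELDS)[number];
type Diagnosis = Partial<Record<DiagnosisField, string>>;

// Extract each field from the reply. The pattern tolerates leading bullets or
// Markdown emphasis, mixed case, and ":", "=" or "-" as the separator.
function parseDiagnosis(reply: string): Diagnosis {
  const result: Diagnosis = {};
  for (const field of DIAGNOSIS_FIELDS) {
    const re = new RegExp(`^[\\s*#>-]*${field}[\\s*]*[:=-][ \\t]*(.+)$`, "im");
    const m = re.exec(reply);
    if (m) result[field] = (m[1] ?? "").trim();
  }
  return result; // fields the model left out are simply absent
}
```

A missing field leaves a gap in the result instead of failing the whole parse, which is the main practical advantage over strict JSON.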

Live verified on Apple Silicon with `claude-sdk`: bundle from the actual brake earlier that day was analyzed correctly — AI identified "skills-reload triggered repeated graceful restarts that tripped the brake", suggested the documented recovery command (`rm crash-loop.alert && alvin-bot launchd install`), all within ~9 s.

**Safety policy v1**: the AI's suggested remediation is shown to the operator but **NEVER auto-applied**. This is intentional — we want a track record of accurate suggestions before granting the bot any self-modifying power.

#### Predictive Maintenance via Trends (feature 3J)

A second daily timer writes a one-line JSON snapshot of bot health to `~/.alvin-bot/state/trends.jsonl`:

```jsonl
{"ts":"2026-05-13T...","uptime_s":86400,"rss_mb":105,"heap_mb":33,"crashes_24h":0,"diag_24h":0,"errors_24h":3,"provider":"claude-sdk","version":"5.0.0"}
```

After 7 days of data accumulate, every daily snapshot also triggers an AI **anomaly-detection pass** over the last 30 days. Output is a strict 3-line format — `ANOMALY: ... / SEVERITY: warn|critical / SUGGESTION: ...`, or just `ANOMALY: NONE` when nothing's concerning.
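
A sketch of the two pieces described above: the history gate and the 3-line reply parser. `hasEnoughHistory`, `parseAnomalyReply`, and the `Anomaly` type are hypothetical names, and the gate assumes "7 days of data" means the oldest snapshot is at least 7 days old.

```typescript
type Anomaly =
  | { kind: "none" }
  | { kind: "anomaly"; description: string; severity: "warn" | "critical"; suggestion: string };

const DAY_MS = 24 * 60 * 60 * 1000;

// True once the oldest snapshot timestamp (ISO 8601) is at least `minDays` old.
function hasEnoughHistory(timestamps: string[], now: Date, minDays = 7): boolean {
  if (timestamps.length === 0) return false;
  const oldest = Math.min(...timestamps.map((t) => Date.parse(t)));
  return now.getTime() - oldest >= minDays * DAY_MS;
}

// Parse the strict 3-line reply. Returns null when the reply is malformed.
function parseAnomalyReply(reply: string): Anomaly | null {
  const get = (key: string): string | undefined => {
    const m = new RegExp(`^[ \\t]*${key}[ \\t]*:[ \\t]*(.+)$`, "im").exec(reply);
    return m ? (m[1] ?? "").trim() : undefined;
  };
  const description = get("ANOMALY");
  if (!description) return null;
  if (/^none$/i.test(description)) return { kind: "none" };
  const raw = get("SEVERITY")?.toLowerCase();
  let severity: "warn" | "critical";
  if (raw === "warn") severity = "warn";
  else if (raw === "critical") severity = "critical";
  else return null;
  const suggestion = get("SUGGESTION");
  if (!suggestion) return null;
  return { kind: "anomaly", description, severity, suggestion };
}
```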

Live verified with synthetic 30-day memory-leak data (RSS climbing 100 → 220 MB linearly): `claude-sdk` correctly identified the leak, classified as `critical`, suggested heap-snapshot capture via `kill -USR2`. Confirmed end-to-end with file flag + Telegram DM delivered via 1D.

#### Provider-agnostic by design

Both 3I and 3J use the existing `provider.query()` abstraction — the same code path the bot uses for normal chat. Switching provider via `alvin-bot provider switch <key>` (added in 4.24.0) automatically retargets 3I + 3J as well. No provider-specific code in either feature.

**Tested provider**: `claude-sdk` (the "B1" test path). The `offline-gemma4` test path (B4) — stress-test of prompt design against a small-context local model — is deferred to a follow-up session; the deferral and its acceptance criteria are documented in the (gitignored) project `BACKLOG.md`.

#### Performance budget held

All new code runs detached, on long timers, or at startup-only. Steady-state cost: zero. The startup analyzer (3I) only runs if unanalyzed bundles exist — typically 0 on a healthy run. The trends collector (3J) runs once every 24 h with a 60 s warmup after startup.

Measured on Apple Silicon (vs. 4.26.0 baseline):
- RSS idle: **+0 MB** (modules loaded lazily via dynamic `import()`)
- Cold-start ready: **unchanged** (both modules load post-startup, fire-and-forget)
- 3I per-bundle latency on claude-sdk: ~9 s
- 3J per-analysis latency on claude-sdk: ~10 s

#### New env vars

```
ALVIN_DISABLE_SELF_DIAGNOSIS=true        # disable 3I
ALVIN_DISABLE_TRENDS=true                # disable 3J
ALVIN_TRENDS_INTERVAL_HOURS=24           # default
ALVIN_TRENDS_AI_AFTER_DAYS=7             # min history before AI kicks in
```

(All Phase-1 env vars from 4.26.0 continue to work — `ALVIN_DISABLE_SELF_PRESERVATION=true` still kills everything.)
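
How these switches might combine, as a minimal sketch. `resolveSelfPreservationConfig` and the result shape are hypothetical; treating only the value `true` (case-insensitive) as "disabled" and falling back to defaults on invalid numbers are assumptions.

```typescript
type Env = Record<string, string | undefined>;

interface SelfPreservationConfig {
  selfDiagnosis: boolean;      // feature 3I enabled?
  trends: boolean;             // feature 3J enabled?
  trendsIntervalHours: number;
  trendsAiAfterDays: number;
}

const isTrue = (v: string | undefined): boolean => v?.trim().toLowerCase() === "true";

// Parse a positive number from the environment, falling back to a default.
function positiveNumber(v: string | undefined, fallback: number): number {
  const n = Number(v);
  return v !== undefined && Number.isFinite(n) && n > 0 ? n : fallback;
}

function resolveSelfPreservationConfig(env: Env): SelfPreservationConfig {
  // The master switch wins over every per-feature switch.
  const allOff = isTrue(env["ALVIN_DISABLE_SELF_PRESERVATION"]);
  return {
    selfDiagnosis: !allOff && !isTrue(env["ALVIN_DISABLE_SELF_DIAGNOSIS"]),
    trends: !allOff && !isTrue(env["ALVIN_DISABLE_TRENDS"]),
    trendsIntervalHours: positiveNumber(env["ALVIN_TRENDS_INTERVAL_HOURS"], 24),
    trendsAiAfterDays: positiveNumber(env["ALVIN_TRENDS_AI_AFTER_DAYS"], 7),
  };
}
```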

#### Why a major version bump

Semantically, the bot is now **closing a loop on itself**: it observes its own forensics, asks an AI to interpret them, and reports back. That's a conceptual line worth marking. Nothing breaks — existing users update with `npm install -g alvin-bot@latest` and the new behaviour just appears in their next failure analysis.

#### Files added

- `src/services/self-diagnosis.ts` — 3I startup analyzer + analyzeBundle()
- `src/services/trends.ts` — 3J snapshot collector + analyzeTrends()
- `src/index.ts` — two fire-and-forget dynamic imports after Pre-Flight

## [4.26.0] — 2026-05-13

### Self-Preservation Phase 1 — four new resilience features, zero hot-path cost

Bot now **survives more failure modes** and **alerts you when it can't survive them**. All four features run event-driven or on low-frequency timers — no hot-path overhead, measured RSS +4 MB / cold-start +81 ms vs baseline on a real Apple Silicon Mac (within the +5 MB / +2000 ms tolerance budget).

#### Pre-Flight Sanity Check at startup (feature 1A)

In parallel at boot, the bot now checks: (1) Telegram `getMe`, (2) AI provider `isAvailable()` — provider-agnostic via the existing Provider interface, works equally for `claude-sdk` / `codex-cli` / `groq` / `gemini` / `offline-gemma4` / etc., (3) SQLite `PRAGMA quick_check` on the embeddings DB, (4) Disk space ≥ 1 GB. Fire-and-forget — startup is **not** delayed; results land ~1 s after `Alvin Bot started` with severity-tagged output:

```
🩺 ✅ Pre-Flight: all checks ok — 986ms total
   ✓ telegram     bot=@AlvinMBAM4_bot (405ms)
   ✓ ai-provider  claude-sdk reachable (922ms)
   ✓ sqlite       embeddings DB integrity ok (43ms)
   ✓ disk         53.28 GB free (37ms)
```

Per-check timeouts (3 s / 5 s / 10 s / 2 s) bound the cost. Critical findings will feed Phase 2's auto-diagnostic (already wired). Opt-out: `ALVIN_DISABLE_PREFLIGHT=true`.
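
The "parallel, each with its own timeout" pattern can be sketched as follows. `Check`, `withTimeout`, and `runPreflight` are illustrative names; the real checks (Telegram, provider, SQLite, disk) are represented here only by the `run` callback.

```typescript
interface Check {
  name: string;
  timeoutMs: number;
  run: () => Promise<string>; // resolves with a short detail string
}

interface CheckResult {
  name: string;
  ok: boolean;
  detail: string;
  ms: number;
}

// Reject if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Run all checks concurrently. A failing or hanging check never rejects the
// whole batch; it just produces a result with ok=false.
async function runPreflight(checks: Check[]): Promise<CheckResult[]> {
  return Promise.all(
    checks.map(async (c): Promise<CheckResult> => {
      const started = Date.now();
      try {
        const detail = await withTimeout(c.run(), c.timeoutMs);
        return { name: c.name, ok: true, detail, ms: Date.now() - started };
      } catch (err) {
        const detail = err instanceof Error ? err.message : String(err);
        return { name: c.name, ok: false, detail, ms: Date.now() - started };
      }
    }),
  );
}
```

Calling `runPreflight(...)` without awaiting it at startup gives the fire-and-forget behaviour: the total cost is bounded by the slowest timeout, not the sum.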

#### Critical-Event Cross-Channel Notify (feature 1D)

When the bot hits a state it can't recover from on its own — watchdog crash-loop brake engaged, repeated Telegram 409s, all providers dead, disk critically low — it now alerts the operator through a **fallback chain that doesn't depend on the bot's own platform being healthy**:

1. **`~/.alvin-bot/CRITICAL.log`** — durable audit trail, always written first. Plain text, dated, machine-readable.
2. **macOS native notification** via `osascript` — visible immediately on the user's desktop.
3. **Telegram DM to admin** via `curl` — synchronous in exit-imminent contexts so the alert lands before `process.exit()` kills any pending I/O.

The synchronous-vs-detached distinction matters: detached child processes get killed by macOS+launchd before they finish their fork-and-exec when the parent exits within a few ms. The watchdog brake explicitly uses `blockTelegram: true` to spawnSync the curl POST and confirm the HTTP response code. Plain-text body (not Markdown) so shell-command `suggestedAction`s with `"`, `&&`, etc. don't trigger Telegram's `Bad Request: can't parse entities` error. Opt-out: `ALVIN_DISABLE_CRITICAL_NOTIFY=true`.

#### Zombie Dead-Man-Switch (feature 2E)

Bot writes a unix-timestamp heartbeat to `~/.alvin-bot/heartbeat.txt` every 60 s. A **separate, tiny launchd LaunchAgent** (`com.alvinbot.deadman`) wakes every 5 min and checks the heartbeat — if older than 10 min, the watcher fires `launchctl kickstart -k gui/$UID/com.alvinbot.app` to force-restart.

Catches the failure mode the in-process watchdog **cannot** see: process is alive but frozen (event-loop deadlock, blocked I/O, native-binding hang). The in-process watchdog can't detect its own death — that's a contradiction in terms — so the external observer is the only architecturally sound solution.

Threshold overridable for testing: `ALVIN_DEADMAN_THRESHOLD_SEC=60` (default 600). End-to-end verified on a real Mac: `kill -STOP` froze the bot at PID X, watcher detected stale heartbeat 700 s old, kickstart fired, fresh PID Y came up within 8 s. CPU cost of the watcher: 0.017 %.
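
The watcher's core decision is a single comparison. A minimal sketch, assuming the heartbeat file holds unix seconds as text; `isHeartbeatStale` is a hypothetical name, and treating an unreadable heartbeat as stale is an assumption.

```typescript
// Decide whether a heartbeat is older than the threshold.
// `heartbeatText` is the raw file content, `nowSec` the current unix time.
function isHeartbeatStale(heartbeatText: string, nowSec: number, thresholdSec = 600): boolean {
  const beat = Number.parseInt(heartbeatText.trim(), 10);
  if (!Number.isFinite(beat)) return true; // unreadable heartbeat (assumption: treat as stale)
  return nowSec - beat > thresholdSec;
}
```

With the default threshold, the 700 s old heartbeat from the test above is stale, while a bot that wrote its heartbeat within the last 10 minutes is left alone.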

#### Auto-Diagnostic Logs-Collector (feature 2F)

On any critical failure, the bot now writes a structured forensic Markdown bundle to `~/.alvin-bot/diagnostics/<timestamp>-<category>.md` containing:

1. Event detail + suggested action
2. Process state (PID, RSS, heap, uptime, node version, platform, argv)
3. Non-secret environment vars (PATH, PRIMARY_PROVIDER, FALLBACK_PROVIDERS, WEB_*, …)
4. Last 200 lines of `alvin-bot.err.log`
5. Last 200 lines of `alvin-bot.out.log`
6. Watchdog state (`~/.alvin-bot/state/watchdog.json`)
7. System tool inventory (`node`, `npm`, `brew`, `pm2`, `codex`, `claude`, `yt-dlp`, `ffmpeg`, `wacli`, `agent-browser`)
8. Disk space (`df -h ~/.alvin-bot`)
9. PM2 status (if PM2 installed — the same kind of state that bit us in 4.25.1)

Bundles are ~18 KB each, capped at 50 retained files (oldest pruned automatically). The Telegram DM from feature 1D now includes the bundle path so the operator can immediately `cat` or scp it.
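
The retention cap can be sketched as a pure function over file names. `bundlesToPrune` is a hypothetical helper, and it assumes the `<timestamp>` prefix sorts chronologically as plain text.

```typescript
// Given bundle file names that start with a sortable timestamp, return the
// names to delete so that only the newest `keep` bundles remain.
function bundlesToPrune(fileNames: string[], keep = 50): string[] {
  const sorted = [...fileNames].sort(); // oldest first under the assumption above
  return sorted.slice(0, Math.max(0, sorted.length - keep));
}
```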

This is also the data input the 5.0.0 AI-Self-Diagnosis (feature 3I) will feed to a sub-agent for automated analysis. As a 4.26.0 deliverable it stands on its own as "human-readable forensic dump".

Opt-out: `ALVIN_DISABLE_AUTO_DIAGNOSTIC=true`.

### Bundle wacli (WhatsApp CLI) with conditional opt-in

`wacli` (https://wacli.sh, brew tap `steipete/tap`, v0.8.1, ~25 MB Go binary) is now part of `BOOTSTRAP_TOOLS` — but with a **hybrid install condition** that avoids forcing it onto users who don't use WhatsApp:

- **If `wacli` is already installed** → bootstrap runs `brew upgrade wacli` (treated like any other bundled tool).
- **If `WHATSAPP_ENABLED=true` is set in `.env`** → bootstrap installs via `brew install steipete/tap/wacli`.
- **Otherwise** → silent skip with dimmer `·` icon: `· wacli (WhatsApp CLI) skipped (not opted in)`.

License: see https://wacli.sh — alvin-bot does not bundle wacli, only invokes the user's brew, the user remains the licensee. macOS only (no Linux build upstream; bootstrap skips on Linux automatically).
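
The hybrid condition boils down to a small decision function. This is a sketch; `decideWacliAction` and its option names are invented for illustration.

```typescript
type WacliAction = "upgrade" | "install" | "skip";

interface WacliContext {
  platform: string;          // e.g. process.platform
  alreadyInstalled: boolean; // is `wacli` on PATH?
  whatsappEnabled: boolean;  // WHATSAPP_ENABLED=true in .env?
}

// Decide what the bootstrap step should do for wacli.
function decideWacliAction(ctx: WacliContext): WacliAction {
  if (ctx.platform !== "darwin") return "skip"; // macOS only
  if (ctx.alreadyInstalled) return "upgrade";   // treated like any bundled tool
  if (ctx.whatsappEnabled) return "install";    // explicit opt-in
  return "skip";                                // not opted in
}
```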

### Opt-out env vars summary

For users who want minimal footprint:

```
ALVIN_DISABLE_SELF_PRESERVATION=true   # skip ALL Phase-1 features
ALVIN_DISABLE_PREFLIGHT=true           # skip Pre-Flight only
ALVIN_DISABLE_CRITICAL_NOTIFY=true     # skip cross-channel notify
ALVIN_DISABLE_DEAD_MAN=true            # skip heartbeat writer
ALVIN_DISABLE_AUTO_DIAGNOSTIC=true     # skip diagnostic bundles
ALVIN_DEADMAN_THRESHOLD_SEC=600        # tune dead-man threshold (default 10 min)
```

### Performance budget verified on real hardware

End-to-end measurements on Apple Silicon Mac (.75 test box):

| Metric | Baseline 4.25.1 | 4.26.0 | Δ | Tolerance |
|---|---|---|---|---|
| Cold-start ready (median, throttled) | 5023 ms | 5104 ms | +81 ms | +2000 ms |
| Cold-start ready (unthrottled, 1st run) | 2189 ms | 2170 ms | -19 ms | +2000 ms |
| RSS idle steady-state | ~102 MB | 106.4 MB | +4.4 MB | +5 MB |
| CPU idle | 0.0 % | 0.0 % | 0 | +0.1 % |
| Log dir growth | stable | stable | n/a | <1 KB/s |

All five metrics within tolerance.

## [4.25.1] — 2026-05-13

### Fixed: `alvin-bot launchd install` now persists the PM2 cleanup

The launchd installer correctly called `pm2 delete alvin-bot` when migrating a previously-PM2-managed bot to launchd — but `pm2 delete` only mutates the live process list, **not** the persisted `~/.pm2/dump.pm2`. On the next login, PM2's startup script would call `pm2 resurrect`, see alvin-bot in the stale dump, and re-spawn it. The resurrected pm2-managed instance and the new launchd-managed instance then both polled Telegram with the same `BOT_TOKEN`, racking up `409 Conflict: terminated by other getUpdates request` errors until one of them crashed.

The fix adds `pm2 save --force` immediately after `pm2 delete`, so the dump file reflects reality. `--force` is needed because plain `pm2 save` refuses to persist an empty list with a warning. Other PM2-managed processes (e.g. unrelated projects in the user's PM2) are preserved correctly — the save only mutates entries for alvin-bot.

### How it surfaced

A maintainer's local Mac that had been running alvin-bot under PM2 *before* the v4.x migration to launchd hit `409 Conflict` errors out of nowhere. It turned out PM2 had been silently resurrecting alvin-bot on every reboot for months (106,000+ restart counter): each invocation without args printed help and exited, PM2 restarted it instantly, and the cycle continued. Today the cycle happened to coincide with a Telegram getUpdates call, causing the conflict. Cleanup steps:

```bash
pm2 delete another-pm2-project                      # any other PM2 entries
pm2 save --force                         # empty dump
launchctl unload ~/Library/LaunchAgents/pm2.youruser.plist 2>/dev/null
rm -f ~/Library/LaunchAgents/pm2.youruser.plist
pm2 kill
npm uninstall -g pm2
rm -rf ~/.pm2
```

After this patch, future users migrating from PM2 to launchd via `alvin-bot launchd install` won't accumulate the stale-dump time bomb — the cleanup is persistent immediately.

### End-to-end verified on a real Apple Silicon Mac (.75)

1. Reproduced the broken pre-condition: install PM2, add alvin-bot to its dump via `pm2 start … && pm2 save`. `grep -c alvin-bot ~/.pm2/dump.pm2` → 7 hits.
2. Ran `alvin-bot launchd install` with the fix applied.
3. Verified `grep -c alvin-bot ~/.pm2/dump.pm2` → **0 hits**. PM2 live list is also empty.

The installer's output now ends with `🧹 Removed alvin-bot from pm2 and persisted (other pm2 projects left intact).` and includes a `💡 pm2 now has zero managed processes. You can remove it entirely:` hint when no other pm2 projects remain.

## [4.25.0] — 2026-05-13

### Auto-bootstrap media tools (yt-dlp + ffmpeg) on every setup and update

`yt-dlp` and `ffmpeg` are now installed (or upgraded) automatically whenever the user runs `alvin-bot setup` or `alvin-bot update` — silently, one line of status per tool, non-fatal on failure. These are the two foundational media tools used by the transcribe / download / video-generate / voice skills and they break or need patches often enough that "whatever brew shipped last week" is materially worse than "current latest" for ordinary users.

**No bundling. No redistribution.** alvin-bot shells out to the user's own system package manager (Homebrew on macOS, `pipx`/`apt` on Linux). The user remains the licensee under each tool's upstream license. License notes are recorded in `BOOTSTRAP_TOOLS` for transparency:

- **yt-dlp** — Unlicense (public-domain equivalent)
- **ffmpeg** — LGPL-2.1+ (Brew default build; no GPL-only codecs are pulled in by the default formula)

### Behaviour

| User action | What happens |
|---|---|
| `alvin-bot setup` (interactive) | At the very start, before any prompt: tries `brew install yt-dlp` + `brew install ffmpeg` (or `pipx`/`apt` on Linux) if missing, runs `brew upgrade` if already installed. One status line per tool. |
| `alvin-bot update` | After `npm install -g` / `git pull` completes, the same bootstrap step runs — so an update of the bot also refreshes media tools. |
| `alvin-bot setup --no-bootstrap-tools` | Skips the bootstrap step entirely. For users who want full manual control. |
| Failure (no brew, no network, etc.) | Setup/update continues; user sees a `⚠` warning line; bot still works for everything that doesn't need media tools. |

### Implementation detail that matters

On macOS, `brew upgrade <pkg>` does **not** automatically refresh the formula database before checking versions — meaning a user on stale brew (haven't run `brew update` in weeks) would see "already at latest" even when newer versions exist. The bootstrap step runs `brew update --quiet` **once per session** before any brew upgrade, memoized so multiple tools share the refresh cost. Cost: 5-30 s on first invocation, zero on subsequent calls.
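
The once-per-session refresh is a plain memoization. A minimal sketch with the command runner injected; `createBrewBootstrap` and `Runner` are hypothetical names, and not retrying a failed refresh within the same session is an assumption.

```typescript
type Runner = (command: string, args: string[]) => void;

// Wrap a command runner so `brew update --quiet` runs at most once,
// no matter how many tools are upgraded afterwards.
function createBrewBootstrap(run: Runner) {
  let refreshed = false;
  const ensureFresh = (): void => {
    if (refreshed) return;
    refreshed = true; // set first so a failed refresh is not repeated per tool
    try {
      run("brew", ["update", "--quiet"]);
    } catch {
      // non-fatal: continue with whatever formula data is available
    }
  };
  return {
    upgrade(pkg: string): void {
      ensureFresh();
      run("brew", ["upgrade", pkg]);
    },
  };
}
```

In real use, `run` could be a thin wrapper around `child_process.spawnSync`; injecting it keeps the memoization logic independent of Homebrew being installed.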

### End-to-end verified on a real Apple Silicon Mac

Two scenarios run manually after the code change:

1. **Both tools present, brew formulas stale.** Bootstrap ran `brew update --quiet` + `brew upgrade yt-dlp/ffmpeg`. yt-dlp's brew-listed and upstream latest both confirmed at `2026.03.17` (verified via `brew info yt-dlp` and `yt-dlp -U`'s self-check) — code correctly reported "up-to-date". ffmpeg's pre-existing dylib-link failure (`libx265.215.dylib not found`) was incidentally fixed by `brew upgrade ffmpeg`, which is the exact kind of bit-rot the bootstrap is meant to catch.

2. **yt-dlp deliberately uninstalled, run `alvin-bot update` again.** Bootstrap correctly detected `command -v yt-dlp` → not found, ran `brew install yt-dlp`, reported "installed (2026.03.17)" (note the differentiated wording vs "up-to-date" — same versioning convention as `npm install` vs `npm update`). Post-state: `/opt/homebrew/bin/yt-dlp` back and working.

## [4.24.0] — 2026-05-12

### Switch AI providers cleanly after install — new `alvin-bot provider` command

End-to-end UX gap closed: until now, the only way to change AI provider after the initial install was to re-run the full setup wizard (Telegram, AI provider, tools menu, …) or hand-edit `~/.alvin-bot/.env` without guidance. Both options leak past the user's actual intent — "I just want to change the AI provider, please."

```bash
alvin-bot provider list                 # show all providers + per-provider install/key status
alvin-bot provider show                 # detailed info on the current provider
alvin-bot provider switch <key>         # interactive setup + targeted .env merge
alvin-bot provider doctor               # validate current provider's auth/key against its API
```

`<key>` accepts both canonical slugs (`claude-sdk`, `groq`, `gemini-2.5-flash`, …) and short aliases (`claude`, `codex`, `gemini`, `nvidia`, `gpt`, `gemma`).

### Provider-specific guided setup, reused from the setup wizard

The same install-the-CLI / run-OAuth-login / prompt-for-API-key logic that the setup wizard already uses for first-time configuration is now reusable. Refactored into three small primitives:

- **`configureClaudeSdkInline()`** — detects native `claude` binary, offers `curl -fsSL https://claude.ai/install.sh | sh` if missing, runs `claude auth login --claudeai` if not authenticated, re-validates.
- **`configureCodexCliInline()`** — new. Detects `codex` binary, offers `npm install -g @openai/codex` if missing, runs `codex login` (ChatGPT OAuth) if not authenticated.
- **`configureApiKeyInline(provider, opts)`** — for groq / nvidia / gemini / openai / openrouter. Tests an existing key in `.env` first, prompts for a new one if invalid, retries once, "save anyway" path for offline first-time setup.

### Safe `.env` merge — never destroys, never duplicates

The new `setEnvKey()` + `commentEnvKey()` helpers do **byte-preserving** in-place updates of `~/.alvin-bot/.env`:

- `PRIMARY_PROVIDER` line is **updated in place**, not appended. No duplicates.
- The new provider's API key (if any) is set the same way — also handles "previously commented out by an earlier switch" → uncomment + update.
- The previous provider's API key is **parked, not deleted**: e.g. `# GROQ_API_KEY=...  # parked 2026-05-12: was for groq, kept for rollback`. Rollback by un-commenting; secret never lost.
- Everything else (`BOT_TOKEN`, `ALLOWED_USERS`, `FALLBACK_PROVIDERS`, `WEB_*`, custom comments, blank lines, custom additions) is preserved byte-for-byte.
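
To illustrate the in-place merge, here is one way such helpers could work on the file content as a string. The function names come from this release, but the signatures and bodies below are assumptions written for this example, not the shipped code.

```typescript
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Set KEY=value in place. An active line is updated; failing that, a
// commented-out line is revived; only if neither exists is a line appended.
function setEnvKey(content: string, key: string, value: string): string {
  const k = escapeRegExp(key);
  const active = new RegExp(`^[ \\t]*${k}=[^\\r\\n]*`, "m");
  const parked = new RegExp(`^[ \\t]*#[ \\t]*${k}=[^\\r\\n]*`, "m");
  const target = active.test(content) ? active : parked.test(content) ? parked : null;
  if (target) return content.replace(target, () => `${key}=${value}`);
  const sep = content === "" || content.endsWith("\n") ? "" : "\n";
  return `${content}${sep}${key}=${value}\n`;
}

// Comment out an active KEY=... line and stamp it with a note.
// Every other line is left untouched.
function commentEnvKey(content: string, key: string, note: string): string {
  const active = new RegExp(`^[ \\t]*(${escapeRegExp(key)}=[^\\r\\n]*)`, "m");
  return content.replace(active, (_m, assignment: string) => `# ${assignment}  # ${note}`);
}
```

Replacing only the matched line, with a function as the replacement so `$` in a secret is never interpreted, is what keeps the rest of the file identical.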

### Auto-restart under launchd

After a successful switch, on macOS with `~/Library/LaunchAgents/com.alvinbot.app.plist` present, the command offers to run `launchctl kickstart -k gui/$UID/com.alvinbot.app` to apply the new provider without manual intervention. Default is Yes; press Enter to restart, `n` to defer.

### Robustness: `ask()` no longer crashes on stdin EOF

`readline.question` throws `ERR_USE_AFTER_CLOSE` once stdin emits 'close' (piped input drained, terminal closed). The `ask()` helper now catches this and returns `""` — same semantic as "user pressed Enter on a prompt", so the calling wizard step picks its default. Fixes a class of crashes for users running setup wizards over `expect` / SSH-piped install scripts / CI.
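
A sketch of that guard. `Questioner` is a hypothetical minimal interface (Node's `readline.Interface` has a compatible `question` method), and rejecting on any other error is an assumption.

```typescript
interface Questioner {
  question(prompt: string, callback: (answer: string) => void): void;
}

// Ask a question; if the underlying interface is already closed, behave as if
// the user had pressed Enter so the caller falls back to its default.
function ask(rl: Questioner, prompt: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    try {
      rl.question(prompt, (answer) => resolve(answer));
    } catch (err) {
      const code = (err as { code?: unknown } | null)?.code;
      if (code === "ERR_USE_AFTER_CLOSE") resolve("");
      else reject(err);
    }
  });
}
```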

### Manual end-to-end test on a fresh-ish Apple Silicon Mac (.75)

- `provider list` shows all 8 providers with per-provider status (CLI installed, API key present, OAuth-authenticated)
- `provider show` reports current = `claude-sdk` with model + key var + fallbacks
- `provider doctor` validates auth/key against the live API
- `provider switch groq` with a deliberately invalid key: API returns 401, "Save anyway" path is taken, `.env` is mutated correctly: `PRIMARY_PROVIDER=groq`, `GROQ_API_KEY=…` set, all other vars + comments preserved byte-for-byte
- `provider switch claude-sdk` (rollback): existing claude binary detected, auth-status warning surfaced, login prompt declined, `.env` mutated: `PRIMARY_PROVIDER=claude-sdk` back, `GROQ_API_KEY` line **commented and stamped** `# GROQ_API_KEY=...  # parked 2026-05-12: was for groq, kept for rollback`, every other line untouched

## [4.23.1] — 2026-05-12

### Fixed: install.sh now installs via npm (no more git clone collision)

The `install.sh` one-line installer was broken on two counts, both pre-existing and surfaced when end-to-end testing the 4.23.0 install path on a fresh machine:

1. **Data-dir collision with `~/.alvin-bot/`.** The installer cloned the alvin-bot source into `~/.alvin-bot/`, but that path is the per-user **data directory** in 4.x (env, memory, logs, cron-jobs.json), created and owned by the bot itself. On any second run, `git pull` in the data dir failed (`fatal: not a git repository`).

2. **`npm install --omit=dev` + `npm run build` doesn't work.** TypeScript needs `@types/better-sqlite3` (a `devDependency`), so building from source with prod-only deps fails with 6 `TS7016` errors.

Both bugs only mattered on the `install.sh` path; the canonical `npm install -g alvin-bot` was unaffected.

**Fix:** `install.sh` now does exactly what the README's Quick Start documents — bootstraps Node (the new code from 4.23.0) and then `npm install -g alvin-bot`. Dropped: `git clone`, `npm run build`, the symlink dance into `/usr/local/bin`, and the `INSTALL_DIR`/`REPO_URL`/`BIN_LINK` machinery. The canonical install lives at `/opt/homebrew/lib/node_modules/alvin-bot/` (Apple Silicon) or `/usr/local/lib/node_modules/alvin-bot/` (Intel/Linux); `~/.alvin-bot/` is reserved exclusively for the bot's own data files.

The new flow also:
- Retries with `sudo` if a non-sudo `npm install -g` fails with EACCES (common on Linux where node lives in `/usr/lib`).
- Skips the auto-launch of `alvin-bot setup` in non-interactive shells (i.e. raw `curl … | bash`) so we don't dump an interactive wizard into a pipe.
- Adds a `BASH_SOURCE` guard so the file can be `source`d for unit-testing helpers without firing `main`.

End-to-end verified on a fresh Apple Silicon Mac (macOS 26.4.1) with node and brew uninstalled: `curl … | bash` bootstraps Node via brew prompt, installs alvin-bot via npm, lands the binary in `/opt/homebrew/bin/alvin-bot` v4.23.0 — without ever touching `~/.alvin-bot/`.

## [4.23.0] — 2026-05-12

### Installer: bootstrap Node automatically + optional capability tools

The first-time install path no longer assumes Node (or Homebrew, or anything beyond a normal Mac/Linux shell) is already in place, and the setup wizard now offers to install a curated set of universally useful CLIs in one step.

- **`install.sh` — auto-bootstraps Node.** New `ensure_brew()` + `ensure_node()` helpers replace the old "fail with manual install instructions" behaviour. On macOS, if Homebrew is missing, the installer offers to install it via the official `curl|bash` (with `NONINTERACTIVE=1` so it doesn't pause for ENTER). On Debian/Ubuntu, offers Node 22 via the NodeSource repo (with sudo confirmation). Non-interactive shells fall back to clear manual-install messages. Skipping the prompts is always an option. `eval "$(brew shellenv)"` is invoked after a fresh brew install so the rest of the script sees brew on PATH.

- **`alvin-bot setup` — optional tools step.** After Telegram + AI provider config, the wizard now lists eight commonly useful tools and lets the user install just the ones they want with a comma-separated picklist (or `a` for all missing, `n` to skip). Already-installed tools are marked `✓` and skipped automatically. Install failures don't block bot setup — the user gets a working bot regardless.

- **New `alvin-bot tools` command.** Re-runnable later: `alvin-bot tools list` shows what's installed, `alvin-bot tools install` opens the same interactive menu the setup wizard uses.
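
How the picklist answer might be interpreted, as a hedged sketch. `parseToolSelection` is a hypothetical helper; 1-based numbering of the menu and treating an empty answer as "skip" are assumptions.

```typescript
interface Tool {
  id: string;
  installed: boolean;
}

// Turn the user's answer into the list of tool ids to install.
// "a" selects every missing tool, "n" (or empty) skips, otherwise a
// comma-separated list of 1-based menu numbers is expected.
function parseToolSelection(answer: string, tools: Tool[]): string[] {
  const input = answer.trim().toLowerCase();
  if (input === "" || input === "n") return [];
  if (input === "a") return tools.filter((t) => !t.installed).map((t) => t.id);
  const picked = new Set<number>();
  for (const part of input.split(",")) {
    const n = Number.parseInt(part.trim(), 10);
    if (Number.isInteger(n) && n >= 1 && n <= tools.length) picked.add(n - 1);
  }
  // Already-installed tools are skipped even if the user picked them.
  return tools.filter((t, i) => picked.has(i) && !t.installed).map((t) => t.id);
}
```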

Tools offered (curated, all generic — none require maintainer-specific creds or workflows):

| Tool | What it unlocks |
|---|---|
| Playwright + Chromium | Tier 1 stealth browser automation (web-research, social-fetch, browser-manager) |
| agent-browser CLI | Tier 1.5 token-efficient web automation (Vercel Labs) |
| ffmpeg | Audio/video processing for media-transcribe, video-generate, voice features |
| ImageMagick | Image conversion & manipulation for image-generate and visual skills |
| Pandoc | Markdown ↔ PDF/DOCX/HTML conversion for document-creation skill |
| ripgrep | Fast file/code search (`alvin-bot search`, code-aware skills) |
| jq | JSON parsing for helper scripts |
| himalaya | Multi-account IMAP/SMTP email CLI (configure after install) |

## [4.22.3] — 2026-05-12

### Fixed

- **Codex CLI provider:** close stdin after spawn so `codex exec` returns immediately. The provider opened stdin as a pipe but never sent EOF — `codex exec` then printed `Reading additional input from stdin...` and blocked until the 120 s spawn timeout, which surfaced on the chat side as "no reply" / empty Telegram messages whenever a user picked Codex CLI as their AI provider. Single-line fix (`proc.stdin.end()`) plus an explanatory comment. ([#1](https://github.com/alvbln/Alvin-Bot/pull/1))

### macOS UX: surface Full Disk Access gaps under launchd ([#2](https://github.com/alvbln/Alvin-Bot/pull/2))

When the bot runs as a LaunchAgent, macOS TCC binds permissions to the `node` binary's real Cellar path. Anything the bot spawns (Codex CLI, file-reading skills, plugins touching `~/Documents`/`~/Desktop`) inherits that identity, and without Full Disk Access on `node` those reads silently fail — no dialog appears under launchd. Fresh public users were hitting this without knowing what was going on; the failure mode (spawned tools producing empty output) looked like a bot bug.

- **`alvin-bot launchd install`** now detects FDA after a successful install and either confirms it's granted or prints a prominent warning with: the exact Cellar path to add (resolved via `realpathSync`), the `open "x-apple.systempreferences:..."` command for the right pane, the `launchctl kickstart` command to apply the grant, and a heads-up that `brew upgrade node` invalidates the grant (TCC binds to the versioned Cellar path).
- **`alvin-bot doctor`** gains a "macOS permissions" section that shows FDA status and how to fix — useful after `brew upgrade node` rolls forward to a new Cellar path and the old grant becomes stale.
- **README** "A note on permission prompts" is extended with the launchd/FDA caveat, linking to the macOS Setup Guide PDF.

## [4.22.2] — 2026-05-11

### Docs

- **README:** add a short, calm note in Quick Start explaining that permission prompts during first-time tool use come from Alvin himself (via the underlying agent runtime), not a third party — and that approving them expands what the bot can do autonomously while the user stays in control. Aimed at less experienced users who might otherwise dismiss legitimate prompts as suspicious.

## [4.22.1] — 2026-05-09

### Fixed

- **Cron scheduler:** prevent duplicate catch-up runs for already-attempted slots. When a daily/scheduled job fired but crashed before completion, the boot-time catch-up would re-fire it hours later (e.g. an `0 8 * * *` job retried at 11:00 after a 10:51 reboot). The catch-up now skips jobs whose current schedule slot has already been attempted: for cron expressions, `lastAttemptAt >= mostRecentPastTrigger` short-circuits the rewind; for interval schedules, `now - lastAttemptAt < intervalMs` does the same. Crashed runs from *previous* slots within the 6h grace window still catch up as before. New `prevCronRun()` helper in `cron-scheduling.ts`. ([#cron-catchup-bug](src/services/cron-scheduling.ts))
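
The slot check can be sketched roughly as follows. This is an illustration under assumptions, not the code in `cron-scheduling.ts`: the `Schedule` and `JobState` shapes and the name `alreadyAttemptedCurrentSlot` are hypothetical, and `mostRecentPastTrigger` is assumed to come from something like `prevCronRun()`.

```typescript
// Hypothetical shapes for illustration; the real scheduler types may differ.
type Schedule =
  | { kind: "cron"; mostRecentPastTrigger: number } // epoch ms, e.g. from prevCronRun()
  | { kind: "interval"; intervalMs: number };

interface JobState {
  schedule: Schedule;
  lastAttemptAt: number | null; // epoch ms of the last attempt, successful or not
}

// Returns true when the current schedule slot was already attempted,
// in which case the boot-time catch-up should skip the job.
function alreadyAttemptedCurrentSlot(job: JobState, now: number): boolean {
  if (job.lastAttemptAt === null) return false;
  const schedule = job.schedule;
  if (schedule.kind === "cron") {
    return job.lastAttemptAt >= schedule.mostRecentPastTrigger;
  }
  return now - job.lastAttemptAt < schedule.intervalMs;
}
```

With the example from the entry above, a job that fired at 08:00 and crashed has `lastAttemptAt` at or after the 08:00 trigger, so a reboot at 10:51 no longer re-fires it.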

## [4.22.0] — 2026-05-05

### 🧠 Memory architecture overhaul: pluggable providers + smart inject

Public users without `GOOGLE_API_KEY` (the v4.20–v4.21 default for embeddings) now get a working indexed memory store out of the box. The embeddings layer is refactored behind a provider interface with four backends auto-detected at startup:

| Tier | Provider | Setup | Cost | Dim |
|---|---|---|---|---|
| 1 | Gemini (`gemini-embedding-001`) | `GOOGLE_API_KEY` | free tier | 3072 |
| 2 | OpenAI (`text-embedding-3-small`) | `OPENAI_API_KEY` | ~$0.02 / 1M tokens | 1536 |
| 3 | Ollama (default `nomic-embed-text`) | `ollama pull nomic-embed-text` | free, local, private | 768 |
| 4 | **FTS5 (BM25 keyword)** | nothing | free | n/a |

The FTS5 fallback is the headline: SQLite's built-in full-text-search virtual table with BM25 ranking. No API key, no network, no setup. Indexes the same chunks as the vector providers (`MEMORY.md`, daily logs, project files, hub memory, asset index) and ranks matches by relevance. Excellent for proper-noun and exact-term lookups (project names, commands, error messages); weaker than vector search for synonyms and conceptual paraphrase queries — but available everywhere.

**Upgrade path.** A user starts on FTS5 (no keys needed). Later they set `GOOGLE_API_KEY` in their `.env` → next bot start detects the schema mismatch via `meta.embedding_model`, drops the FTS5 table, initialises the vector schema, and reindexes. Same in reverse. All seamless, no manual steps.

Override the auto-detection with `EMBEDDINGS_PROVIDER=gemini|openai|ollama|fts5|auto` (default `auto`).
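
As a rough sketch of how the override and the tier order could combine: the function name `pickEmbeddingsProvider` and the `ollamaReachable` flag are assumptions for illustration, and the sketch assumes the tier order in the table is also the detection order. The actual logic in `auto-detect.ts` may differ.

```typescript
type ProviderName = "gemini" | "openai" | "ollama" | "fts5";

const KNOWN_PROVIDERS: ProviderName[] = ["gemini", "openai", "ollama", "fts5"];

function pickEmbeddingsProvider(
  env: Record<string, string | undefined>,
  ollamaReachable: boolean, // hypothetical: how the bot probes Ollama is not shown here
): ProviderName {
  // An explicit override wins; "auto" (or anything unrecognised) falls through.
  const override = (env["EMBEDDINGS_PROVIDER"] ?? "auto").toLowerCase();
  const forced = KNOWN_PROVIDERS.find((p) => p === override);
  if (forced) return forced;

  // Assumed detection order: tier 1 to tier 4.
  if (env["GOOGLE_API_KEY"]) return "gemini";
  if (env["OPENAI_API_KEY"]) return "openai";
  if (ollamaReachable) return "ollama";
  return "fts5"; // always available: no key, no network
}
```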

### ✂️ MEMORY.md no longer bulk-injected into every system prompt (when SQLite is populated)

Pre-v4.22, `MEMORY.md` (typically tens of KB of curated long-term knowledge) and the last two daily logs were plain-text-injected into the system prompt on **every turn**. With a populated SQLite store, the same content is available via the smaller, query-targeted `searchMemory()` retrieval — much smaller prompts, much more relevant context.

New `MEMORY_INJECT_MODE` env var:

- `auto` (default) — sqlite when the store has indexed entries, else legacy
- `legacy` — pre-v4.22 behaviour, full plain-text inject every turn
- `sqlite` — never plain-text-inject `MEMORY.md` or daily logs (force smart mode regardless of store state)

Always plain-text injected regardless of mode: `identity.md` (L0) and `preferences.md` (L1) — these are tiny by design and contain always-on facts that semantic search may miss for short or generic queries. Recommended pattern: keep critical "never X" / "always Y" rules in `preferences.md`, let the bulk knowledge live in `MEMORY.md` and be retrieved on demand.

For users still on the legacy monolithic `MEMORY.md` setup (no `identity.md`, no `preferences.md`), auto mode kicks in only after the SQLite store is populated — until then, plain-text injection of `MEMORY.md` continues to work as before. Zero-touch upgrade.
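
The three modes reduce to a small resolver. The sketch below is illustrative only: `resolveInjectMode` and the `indexedEntries` parameter are hypothetical names, and treating an unrecognised value like `auto` is an assumption rather than documented behaviour.

```typescript
type InjectMode = "legacy" | "sqlite";

// Resolves the effective inject mode for one turn.
// `indexedEntries` is assumed to be the number of entries in the SQLite store.
function resolveInjectMode(
  env: Record<string, string | undefined>,
  indexedEntries: number,
): InjectMode {
  const raw = (env["MEMORY_INJECT_MODE"] ?? "auto").toLowerCase();
  if (raw === "legacy") return "legacy";
  if (raw === "sqlite") return "sqlite";
  // "auto": smart mode only once the store actually has indexed entries.
  return indexedEntries > 0 ? "sqlite" : "legacy";
}
```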

### 🔇 Quieter logs for missing keys

The `⚠️ Embeddings init failed: Google API key not configured` warning is gone — that startup line is now `ℹ️ Memory provider: fts5-bm25 (keyword-local). Initial index will run on first use.` Public users without Gemini no longer see a scary warning that suggested the bot was broken when in fact it was working correctly.

### 🩺 `alvin-bot doctor` Memory section expanded

Reports the active provider, dimension, indexed entry/file counts, last-reindex timestamp, and effective inject mode. For not-yet-initialised stores it predicts which provider will run on first start so users can confirm the auto-detection picked what they expected.

```
  Memory:
  ✅ Provider: gemini-embedding-001 (vector-cloud, 3072-dim)
     3827 entries / 316 files indexed, 48.8 MB on disk
     Last reindex: 25 h ago
     Inject mode: sqlite (auto)
```

### Architecture

- New: `src/services/embeddings/` directory — `provider.ts` (interface), `vector-base.ts` (shared vector logic), `gemini.ts`, `openai.ts`, `ollama.ts`, `fts5.ts`, `auto-detect.ts`, `index.ts` (facade)
- New: `src/services/memory-inject-mode.ts` — env resolver
- Updated: `src/services/memory-layers.ts`, `src/services/memory.ts` — gate plain-text injection on inject mode
- `src/services/embeddings.ts` is now a thin re-export shim — all existing imports keep working

### Tests

- 24 new tests across FTS5 provider, auto-detection, and inject-mode resolver
- All 535 existing tests still pass (one pre-existing port-binding flake in `web-server-integration.test.ts` is unrelated)

## [4.21.0] — 2026-05-04

### 🌐 New skill: Agent Browser (Tier-1.5)

Adds a new bundled skill, `skills/agent-browser/SKILL.md`, that teaches the bot to use the `agent-browser` CLI when it's available. Agent Browser is a [Vercel Labs](https://github.com/vercel-labs/agent-browser) tool that exposes pages as accessibility-tree snapshots with `@e1`, `@e2`, … refs — interactions cost ~200–400 tokens per turn instead of parsing rendered HTML, which is roughly 90 % cheaper than a Playwright/Puppeteer-driven flow.

The skill is **opt-in by install, not by config**: it only activates when `command -v agent-browser` succeeds. No new dependency in `package.json`, no postinstall hook, no extra disk on a fresh install. Existing browser strategies (Tier 1 Stealth, Tier 2 CDP, Tier 3 Extension) keep working untouched and remain the right tool for stealth scraping, logged-in personal accounts, and watch-along flows.

The bundled `Browser Automation` skill (`skills/browse/SKILL.md`) was updated to route the bot to the Agent Browser skill first when the binary is on the PATH and the task is interactive (click/fill/extract on cooperative pages).

`alvin-bot doctor` shows a new `Browser tools:` section reporting whether agent-browser is installed, and gives the one-liner install command if not:

```
npm i -g agent-browser && agent-browser install
```

The first command pulls the Node CLI; the second downloads a private Chrome-for-Testing build into `~/.agent-browser/`. Together about 240 MB — that's why we don't bundle it.

No code changes in the bot's core pipeline. Existing users notice nothing unless they install the CLI.

## [4.20.2] — 2026-05-04

### 🛡️ Security: Web UI loopback by default + Slack caller allowlist

Two real attack surfaces closed.

**Web UI binds to 127.0.0.1 by default.** Previous versions called `server.listen(port)` with no host argument, which Node interprets as "listen on all interfaces". Combined with an empty `WEB_PASSWORD` (which the login route silently treats as "anyone can log in"), this meant any device on the same LAN could log into the bot's Web UI and reach every authenticated endpoint — user list, memory contents, model switch, the WebSocket chat, etc. New default: bind to `127.0.0.1`. To restore LAN access, set `WEB_HOST=0.0.0.0` explicitly in `.env`. If both `WEB_HOST=0.0.0.0` and an empty `WEB_PASSWORD` are present, the bot logs a loud warning on startup.
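
A minimal sketch of the new default and the insecure-combination check, assuming the host is resolved once at startup. `resolveWebBind` and the `WebBind` shape are hypothetical, and the warning text is a placeholder, not the bot's actual log line.

```typescript
interface WebBind {
  host: string;
  warning: string | null;
}

function resolveWebBind(env: Record<string, string | undefined>): WebBind {
  // Loopback unless the operator opts in to another interface.
  const host = env["WEB_HOST"] || "127.0.0.1";
  const passwordEmpty = !env["WEB_PASSWORD"];
  const exposed = host === "0.0.0.0";
  return {
    host,
    warning:
      exposed && passwordEmpty
        ? "Web UI is reachable from the network without a password"
        : null,
  };
}
```

The resolved host would then be passed explicitly, as in `server.listen(port, host)`, instead of relying on Node's all-interfaces default.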

**Slack caller allowlist.** New `SLACK_ALLOWED_USERS` env var: comma-separated list of Slack user IDs allowed to talk to the bot (DMs, @mentions, slash commands). An empty list keeps the legacy behaviour: any workspace member can interact, which is safe only if the workspace is private to the operator. To find your Slack user ID: open your profile in Slack → "..." → "Copy member ID", or just message the bot once and read the line `[slack] caller discovered: user=U… — to lock the bot to specific users, add to .env: SLACK_ALLOWED_USERS=U…` from the logs (we log each unique caller once when the allowlist is empty).
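
The allowlist semantics can be sketched as below. Both function names are hypothetical and the user IDs in the usage note are made up; the point is only that an empty list means "no restriction".

```typescript
// Parses the comma-separated env value into a set of Slack user IDs.
function parseAllowedUsers(raw: string | undefined): Set<string> {
  const ids = (raw ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  return new Set(ids);
}

function isSlackCallerAllowed(userId: string, allowed: Set<string>): boolean {
  // Empty allowlist keeps the legacy behaviour: any workspace member may interact.
  return allowed.size === 0 || allowed.has(userId);
}
```

For example, with `SLACK_ALLOWED_USERS=U111, U222` a caller `U333` would be rejected, while with the variable unset every caller passes.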

**`alvin-bot doctor` now reports both.** New `Web UI:` and `Slack:` sections flag insecure combos and show whether an allowlist is active.

No schema or behaviour changes for users who already have `WEB_PASSWORD` set or only use the bot via Telegram. Telegram allowlist (`ALLOWED_USERS`) is unchanged.

## [4.20.1] — 2026-05-03

### 🛡️ Hardening for the v4.20.0 SQLite migration

The v4.20 migration is fully automatic on first start, but a few things could go wrong on user installations that the maintainer instance never hits. v4.20.1 plugs each of them.

- **Lazy native binary load.** `better-sqlite3` is now `require()`-d inside `embeddings.ts`, not at module import time. If the prebuilt isn't available for the user's platform and a build-from-source fails (exotic Node version, missing toolchain, glibc mismatch), the bot logs a single clear warning with the exact rebuild command, and **keeps running** — only semantic memory search is disabled until the user fixes their install. Previously this would have crashed bot startup.
- **Pre-flight disk-space check.** Migration refuses to start unless the volume holding `~/.alvin-bot/memory/` has at least 2× the source JSON's size free (covers source + target + WAL during the transaction). Skipped migration leaves the JSON intact for retry on the next boot once space is free.
- **Progress logging.** On indexes larger than ~5 000 entries, the migration logs `…migrated N / M entries (P %)` every 5 000 rows so the user can see it isn't stuck.
- **Corrupt JSON recovery.** If `JSON.parse` of `.embeddings.json` throws, the file is moved aside to `.embeddings.json.broken.<timestamp>` and the next bot start treats this as a fresh install (rebuild-from-source on first search). No more boot-loop on a damaged index.
- **`alvin-bot doctor` shows memory health.** New "Memory:" section reports: native binary loadable, vector-store entry count + size, or — for not-yet-migrated installs — the legacy JSON's size and a hint that the next start will migrate.
- **Cleanup on failed migration.** WAL/SHM sidecars are removed alongside the half-written `.embeddings.db` so the next attempt starts from a clean slate.

No schema or API changes — drop-in over v4.20.0.

## [4.20.0] — 2026-05-03

### 🚀 Embeddings: JSON → SQLite

**Why.** The vector index `~/.alvin-bot/memory/.embeddings.json` had grown to **146 MB**. Every bot start parsed the whole file (slow boot, large heap), and every reindex iteration rewrote the entire 146 MB blob to disk. With ~3 800 entries the corpus is still small enough that linear-scan cosine similarity is fine, but the JSON serialisation overhead and per-write full-file rewrite were the real cost.

**Change.** New SQLite-backed store at `~/.alvin-bot/memory/.embeddings.db` (table `entries(id, source, text, vector BLOB, indexed_at)` + index on `source`). Vectors live as raw `Float32Array` BLOBs (4 B × 3072 dims = 12 KB each) instead of JSON-encoded Float64 arrays (≈ 24 KB each). Reindexing is per-chunk INSERT/UPDATE inside a single transaction — no full-file rewrite. WAL mode + 256 MB mmap, `synchronous = NORMAL`.
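
The size arithmetic and the BLOB round trip can be illustrated with plain typed arrays. This is a minimal sketch, not the code in `embeddings.ts`: `vectorToBlob`, `blobToVector`, and `cosineSimilarity` are hypothetical names, and the database write itself is left out.

```typescript
// Packs a vector into the raw bytes that would go into a BLOB column.
// 3072 dims x 4 bytes = 12288 bytes.
function vectorToBlob(vector: number[]): Uint8Array {
  const f32 = Float32Array.from(vector);
  return new Uint8Array(f32.buffer);
}

// Reads the bytes back as float32 values.
function blobToVector(blob: Uint8Array): Float32Array {
  // Copy first so the view is 4-byte aligned regardless of the blob's byteOffset.
  const copy = blob.slice();
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

// Linear-scan scoring primitive: cosine similarity of two equal-length vectors.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Storing float32 halves the per-value size compared with float64 and skips text encoding entirely, at the cost of some precision that rarely matters for similarity ranking.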

**Migration.** `src/services/embeddings-migration.ts` runs once on boot if `.embeddings.json` exists but `.embeddings.db` does not. Source JSON is renamed to `.embeddings.json.bak-pre-sqlite` after a successful entry-count match (idempotent, safe to re-run). On the maintainer's instance: 146 MB → 49 MB, 3 799 entries copied in 660 ms.

**Files touched.** `src/paths.ts` (new `EMBEDDINGS_DB`), `src/services/embeddings.ts` (full rewrite, drop-in same public surface), `src/services/embeddings-migration.ts` (new), `src/index.ts` (boot hook), `package.json` (deps `better-sqlite3@^12`, `@types/better-sqlite3` dev). Public API unchanged: `searchMemory`, `reindexMemory`, `initEmbeddings`, `getIndexStats` keep their signatures so callers in `engine.ts`, `web-server.ts` etc. don't change.

**Wins.** ~66 % smaller on disk. Bot boot no longer parses a 146 MB JSON. Reindex of a single file is O(log n) DELETE-by-source + transactional INSERTs instead of `JSON.stringify` + `writeFileSync` of the whole index.

## [4.19.2] — 2026-04-24

### 🐛 Fix: workspace switch produced "(no response)" format cascade; added empty-stream diagnostics

**Symptom.** After v4.19.1 shipped, a workspace/dir switch still produced a broken response, but this time NOT an empty stream. Claude replied with literal text like `"(no response)\n\nUser: Hallo"`, then on the next turn `"(no response)\n\nUser: wie viele tools hast du…"`: a format cascade where every response got worse.

**Root cause.** v4.19.1's cwd-change reset set `session.lastSdkHistoryIndex = -1`. That value is consumed by `buildBridgeMessage()` in `handlers/message.ts`, which is designed for the Ollama-fallback path — its preamble frames past turns as *"the following N message(s) were exchanged with a fallback model"*. When the reset runs on a workspace switch, the ENTIRE conversation history (dozens of turns in a long-running session) gets packaged under that framing and prepended to the next prompt. If the history contains Telegram fallback artifacts (`(Keine Antwort)`, `(no response)`), Claude reads those as the "fallback model's response format" and imitates it. Each imitation lands back in history, poisoning the next bridge. Cascade.

Workspace switch is not a fallback event — it's *"new persona, new task"*. The old conversation belongs to the old workspace and must not be reframed and re-injected.

**Fix.** `handlers/message.ts`, `handlers/platform-message.ts`, and `handlers/commands.ts` (`/dir`) now set `session.lastSdkHistoryIndex = session.history.length - 1` on cwd change. `buildBridgeMessage()` returns empty for the next turn, Claude starts the new workspace with a clean slate — persona, cwd, system prompt, but no inherited conversation.
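
Why the two index values behave so differently can be shown with a small model. It assumes the bridge packages every turn after `lastSdkHistoryIndex`; `turnsToBridge` is a hypothetical stand-in for `buildBridgeMessage()`, and the `Turn` and `Session` shapes are simplified.

```typescript
interface Turn {
  role: "user" | "assistant";
  text: string;
}

interface Session {
  history: Turn[];
  lastSdkHistoryIndex: number;
}

// Assumed behaviour: everything after the index counts as "not yet seen by the SDK"
// and is prepended to the next prompt.
function turnsToBridge(session: Session): Turn[] {
  return session.history.slice(session.lastSdkHistoryIndex + 1);
}

// The v4.19.2 reset: mark the whole history as already handled.
function onCwdChange(session: Session): void {
  session.lastSdkHistoryIndex = session.history.length - 1;
}
```

With `-1` the slice starts at 0 and returns the entire history; with `history.length - 1` it starts past the end and returns nothing.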

**Additionally — empty-stream diagnostics.** `src/providers/claude-sdk-provider.ts` now logs a structured JSON dump on empty-stream detection: SDK result `subtype`/`is_error`/`num_turns`/`duration_ms`, the `usage` object, the `session_id` Claude returned vs. the one we passed, model override, cwd, effort, prompt/systemPrompt/history sizes, allowedTools count, and MCP state. Lets future empty-stream events be triaged in one log line instead of guessing.

**Net effect.** `/workspace <name>` → message → clean response (no fallback-framed preamble, no format cascade). `/dir <path>` → same. The next empty-stream event will come with actionable diagnostic output instead of a silent symptom.

## [4.19.1] — 2026-04-24

### 🐛 Critical fix: workspace/dir switch no longer produces empty-stream loop

**Problem:** After `/workspace <name>` (or `/dir <path>`), every subsequent SDK message returned `⚠️ Claude antwortete mit leerem Stream …` — and even switching back to the previous workspace did not recover. The v4.18.5 auto-reset only masked the symptom; the underlying cause survived the recovery attempt.

**Root cause — a two-part bug:**

1. **Prevention layer missing.** The Claude Agent SDK's `resume: <sessionId>` is bound to the cwd the session was created in: session files live under `~/.claude/projects/<cwd-hash>/<session-id>.jsonl`. When a workspace switch changes `session.workingDir`, the stored `session.sessionId` points at a file that no longer exists in the new project folder. The CLI silently returns an empty stream.
2. **Recovery layer broken.** v4.18.5's empty-stream detector correctly cleared `session.sessionId = null` on the `text` chunk — but the very next `done` chunk of the same stream carried `sessionId: resultMsg.session_id || capturedSessionId`, and the handler's `if (chunk.sessionId) session.sessionId = chunk.sessionId;` restored it. The "reset" was immediately undone by the trailing done chunk, so the next turn resumed the same dead session. Loop.

**Fix (defense in depth, three layers):**

- **Prevention** (root cause): `handlers/message.ts`, `handlers/platform-message.ts`, and `handlers/commands.ts` (`/dir`) now detect `session.workingDir !== workspace.cwd` (or, for `/dir`, a changed directory) BEFORE the query and clear `session.sessionId = null` + `session.lastSdkHistoryIndex = -1`. The next SDK turn starts fresh in the new project folder. `markSessionDirty()` is called so the clear persists across restarts.
- **Recovery**: both handlers now track a local `sessionResetInStream` flag. When the provider signals `sessionResetRequested` on a text chunk, the flag is set, and the subsequent `done` chunk's sessionId is ignored (the original resume token or the CLI's fresh-but-wrong-project fallback — neither is safe).
- **Hygiene**: `markSessionDirty()` is also called from the empty-stream reset path so the cleared sessionId is persisted immediately rather than waiting for the next trackProviderUsage debounce.
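
The recovery guard can be sketched as follows. The real handlers consume an async stream; this simplified version iterates over an array, and `consumeStream` plus the trimmed `StreamChunk` shape are assumptions for illustration.

```typescript
interface StreamChunk {
  type: "text" | "done";
  sessionId?: string;
  sessionResetRequested?: boolean;
}

interface Session {
  sessionId: string | null;
}

function consumeStream(session: Session, chunks: StreamChunk[]): void {
  let sessionResetInStream = false;
  for (const chunk of chunks) {
    if (chunk.type === "text" && chunk.sessionResetRequested) {
      session.sessionId = null;
      sessionResetInStream = true;
      continue;
    }
    // Without the flag check, the trailing done chunk would restore the dead id.
    if (chunk.type === "done" && chunk.sessionId && !sessionResetInStream) {
      session.sessionId = chunk.sessionId;
    }
  }
}
```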

**Net effect:** `/workspace <name>` → message → works. `/workspace default` → message → works. `/dir ~/Projects/foo` → message → works. No manual `/new` needed, no credit burn, no recovery retry.

**Files:**
- `src/handlers/message.ts` — cwd-change detection, sessionResetInStream flag, done-chunk guard, markSessionDirty import
- `src/handlers/platform-message.ts` — same set of changes for non-Telegram platforms (Slack, Discord, WhatsApp)
- `src/handlers/commands.ts` — `/dir` now invalidates SDK resume anchor on cwd change

## [4.19.0] — 2026-04-24

### 🧭 Feature: per-workspace runtime overrides (effort · provider · voice · temperature · toolset)

Workspaces could already override `model` and `cwd`. v4.19.0 extends the YAML frontmatter with five more runtime fields that take effect automatically the moment a user runs `/workspace <name>`.

**New workspace frontmatter fields** (`~/.alvin-bot/workspaces/<name>.md`):

```yaml
---
purpose: my-project
cwd: ~/path/to/workdir
model: opus              # already existed
effort: high             # NEW — low | medium | high
provider: claude-sdk     # NEW — registry key; fallback chain still applies
voice: iP95p4xoKVk53GoZ742B   # NEW — ElevenLabs voice ID, or Edge-TTS voice name (e.g. "en-US-JennyNeural")
temperature: 0.3         # NEW — 0–2 sampling temperature
toolset: research        # NEW — full (default) | readonly | research
---
Persona body continues here...
```

**Toolset presets** (map to concrete `allowedTools` lists via the new exported `toolsetToAllowedTools` helper):
- `full` — provider default (`Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, Task` + MCP)
- `readonly` — `Read, Glob, Grep, WebSearch, WebFetch` (no Write/Edit/Bash)
- `research` — `Read, WebSearch, WebFetch, Grep` (pure research mode)
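
As a sketch, the presets amount to a lookup like the one below. `allowedToolsFor` is a hypothetical name, not the exported `toolsetToAllowedTools` helper, and returning `undefined` for `full` (meaning "leave the provider default untouched") is an assumption.

```typescript
type Toolset = "full" | "readonly" | "research";

// Maps a preset to a concrete tool list; lists copied from the presets above.
function allowedToolsFor(toolset: Toolset): string[] | undefined {
  if (toolset === "readonly") {
    return ["Read", "Glob", "Grep", "WebSearch", "WebFetch"];
  }
  if (toolset === "research") {
    return ["Read", "WebSearch", "WebFetch", "Grep"];
  }
  // "full": no restriction is applied, so the provider default stays in effect.
  return undefined;
}
```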

**Implementation:**

- `src/services/workspaces.ts` — `Workspace` interface + `parseFrontmatter` extended; numeric parsing + enum validation so a malformed value is silently dropped and the session-default wins.
- `src/handlers/message.ts` + `src/handlers/platform-message.ts` — at query-assembly time, every workspace-set field overrides the equivalent session/registry default exactly for this one query. Nothing sticky leaks across workspace switches.
- `src/providers/registry.ts` — `queryWithFallback()` gains optional `providerOverride`. When supplied AND registered, it becomes primary for that query; fallback chain still applies, and the globally active provider joins the chain as a last-resort backup so availability drops still degrade gracefully.
- `src/providers/claude-sdk-provider.ts` — passes `options.temperature` through to the Agent SDK when set.
- `src/services/voice.ts` — `textToSpeech(text, voice?)` — optional second arg; picked up from `workspace.voice` in Telegram handler's voice-reply path. Works for both ElevenLabs (Voice ID) and Edge TTS (Voice Name like `de-DE-ConradNeural`).
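
The "malformed value is silently dropped" rule for `effort` and `temperature` could look roughly like this. `parseOverrides` and the `Overrides` shape are hypothetical; the real `parseFrontmatter` covers more fields and may validate differently.

```typescript
type Effort = "low" | "medium" | "high";

interface Overrides {
  effort?: Effort;
  temperature?: number;
}

const EFFORTS: Effort[] = ["low", "medium", "high"];

// Takes raw frontmatter strings and keeps only values that validate.
function parseOverrides(raw: Record<string, string | undefined>): Overrides {
  const out: Overrides = {};

  // Enum validation: unknown values are dropped, so the session default wins.
  const effort = EFFORTS.find((e) => e === raw["effort"]);
  if (effort) out.effort = effort;

  // Numeric parsing with the documented 0 to 2 range.
  const rawTemp = raw["temperature"];
  if (rawTemp !== undefined && rawTemp.trim() !== "") {
    const t = Number(rawTemp);
    if (Number.isFinite(t) && t >= 0 && t <= 2) out.temperature = t;
  }
  return out;
}
```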

**Net effect:** Each workspace becomes a self-contained runtime profile. For example:
- A `prep` workspace with `model: opus · effort: high · temperature: 0.3 · voice: en-US-JennyNeural` for polished long-form work;
- A `research` workspace with `toolset: research · model: haiku · effort: low · temperature: 0.7` for cheap-and-fast web spelunking;
- A `sensitive` workspace with `toolset: readonly · provider: claude-sdk` so the agent cannot accidentally `Write` or `Bash` inside the cwd.

No data migration required — existing workspace files without the new fields keep working identically.

## [4.18.5] — 2026-04-23

### 🐛 Fix: auto-reset stale SDK sessionId on empty-stream detection

**Problem:** After a round of failed queries (common trigger: token rotation, quota exhaustion mid-turn, Claude backend dropping the session silently), the stored `session.sessionId` references a conversation that the Claude backend has already discarded. Every subsequent query passes `resume: session.sessionId` into the SDK, the backend can't find the session, and the stream terminates with zero text chunks. The v4.18.3 empty-stream detector only invalidated the availability cache — the sessionId stayed stale, so the next retry resumed the same dead session. A burn-credits loop that required a manual `/new` to escape.

**Fix:** the provider now sets a new `sessionResetRequested: true` flag on the empty-stream text chunk. Both message handlers (`handlers/message.ts` and `handlers/platform-message.ts`) listen for it and clear `session.sessionId` + `session.lastSdkHistoryIndex` immediately, so the very next user message starts a fresh SDK session instead of resuming the dead one.

**Net effect:** after a single empty-stream, the bot self-heals. One resend from the user is enough; no manual `/new`, no bot restart.

## [4.18.4] — 2026-04-23

### 🐛 Critical fix: detect Anthropic quota-exhausted responses

**Problem:** When a Claude Max subscription runs out of its weekly limit or extra-usage credits, Anthropic's gateway responds to every query with a short text chunk like *"You're out of extra usage · resets 9pm (Europe/Berlin)"*, reported with `output_tokens=0`. The SDK surfaces it as a normal assistant text message. The bot has no way to distinguish it from a real Claude response, so one of two things happens:

1. The text passes through unchanged and the user sees the raw quota message as if it were Claude's reply.
2. The text is filtered downstream (some legacy paths) and the user sees `"(Keine Antwort)"` with zero explanation.

Both outcomes hide the real cause (credits) and every retry attempt wastes more credits on nothing.

**Symptoms observed on 2026-04-23:**
- User activates `/extra-usage`, sends query → `(Keine Antwort)` or raw limit text.
- Assumes bot / workspace / token is broken, spends hours debugging.
- Actual cause: extra-usage quota silently exhausted mid-debug-session.

**Fix** (`src/providers/claude-sdk-provider.ts`):

- New `isQuotaLimitOutput(text)` detects the Anthropic-gateway quota signatures (multiple English/German variants: "out of extra usage", "weekly usage limit", "rate limit exceeded", "quota exceeded", etc.).
- In the SDK stream loop: when the first text chunk matches this pattern, rewrite it as a clear actionable hint (*"⚠️ …Top up the plan or wait for the reset…"*) AND invalidate the availability cache so the next heartbeat re-probes — but do NOT yield an `error` chunk (that would trigger fallback-cascade to Ollama and waste more credits on retries).
- In `isAvailable()`: the heartbeat probe now treats quota-exhausted output as "unavailable" in the same way it treats auth errors. Provider is marked unhealthy, bot stops trying until the next probe succeeds.
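
A detector in the spirit of `isQuotaLimitOutput(text)` might look like the sketch below. The name `looksLikeQuotaLimit` is hypothetical, and the pattern list contains only the English signatures quoted above; the German variants are not shown in this entry, so they are left out rather than guessed.

```typescript
// Signatures taken from the examples listed above (English only).
const QUOTA_SIGNATURES: RegExp[] = [
  /out of extra usage/i,
  /weekly usage limit/i,
  /rate limit exceeded/i,
  /quota exceeded/i,
];

// Returns true when the text looks like a gateway quota notice.
function looksLikeQuotaLimit(text: string): boolean {
  return QUOTA_SIGNATURES.some((re) => re.test(text));
}
```

Checking only the first text chunk, as the fix does, limits false positives from genuine answers that merely mention rate limits later in a reply.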

**Net effect:** bot no longer silently wastes credits after a quota limit is hit. Users see a plain, actionable message pointing at the right fix.

## [4.18.3] — 2026-04-23

### 🐛 Hotfix: 4.18.2 triggered unwanted failover to Ollama

**Bug in 4.18.2:** The empty-stream detector yielded an `error` chunk, which the registry's `queryWithFallback()` interprets as "primary provider failed" and immediately switches to the fallback (Ollama/Gemma 4). User saw `⚡ Claude (Agent SDK) unavailable — switching to Gemma 4 E4B` after every token rotation — the opposite of the intended behavior.

**Fix:** yield a `text` chunk instead of `error`. Same user-visible message, same cache-invalidation, but no failover cascade. The next CLI subprocess spawns with the fresh Keychain token automatically, and claude-sdk stays selected.
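
The difference comes down to which chunk type the detector produces. In the sketch below, `handleResult`, the `Chunk` union, and the callback are hypothetical, and the message text is a placeholder rather than the bot's actual wording.

```typescript
type Chunk =
  | { type: "text"; text: string }
  | { type: "error"; error: string };

// Called when the SDK stream ends; returns a chunk to yield, or null if the stream was fine.
function handleResult(
  accumulatedText: string,
  outputTokens: number,
  invalidateAvailabilityCache: () => void,
): Chunk | null {
  const emptyStream = accumulatedText === "" && outputTokens === 0;
  if (!emptyStream) return null;

  invalidateAvailabilityCache(); // next heartbeat probe starts from scratch
  // A text chunk is shown to the user but does not trip the registry's failover.
  return { type: "text", text: "Empty response from Claude. Please resend your message." };
}
```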

## [4.18.2] — 2026-04-23

### 🐛 Fix: silent empty-stream after OAuth-token rotation

**Problem:** After running `/extra-usage`, `/login`, or any other flow that rotates the Claude OAuth token in the macOS Keychain, Alvin Bot silently broke for long-lived sessions. The in-memory Claude SDK client held the old token, the CLI subprocess emitted no text chunks (the 401 was swallowed upstream), the stream terminated normally with zero output tokens, and the user saw the fallback `"(Keine Antwort)"` with no indication that a token refresh was needed.

**Fix** (`src/providers/claude-sdk-provider.ts`): In the `result` branch of the SDK stream loop, detect the empty-stream signature (`accumulatedText === ""` and `outputTokens === 0`). When that fires:

1. Invalidate the `isAvailable()` cache so the next heartbeat probe spawns a fresh CLI subprocess that reads the current Keychain entry.
2. Yield an explicit `error` chunk with actionable text so the user sees *"…token rotation — please resend your message"* instead of a silent `"(Keine Antwort)"`.

**Applies to:** every token-rotation flow: extra-usage activation, extra-usage expiry, and manual `claude login`. The weekly reset does not rotate the token and is unaffected.

**Net effect:** Bot self-heals after token changes. A single resend on the user side is enough; no manual restart required.

## [4.18.1] — 2026-04-20

### 🔒 Privacy-Guard: pre-publish check blocks PII leaks in shipped files

Adds an automated gate that runs on every `npm publish` and prevents personal information from accidentally shipping. Following the 4.18.0 privacy sanitization, the gate keeps new leaks from slipping into a release.

**New:**
- `scripts/privacy-check.sh` — scans the exact file list that `npm pack` would ship. Case-insensitive regex match against a patterns file. Any hit fails the publish.
- `scripts/privacy-patterns.default.txt` — bundled, contains only generic patterns (email shape, IP addresses, postal codes, personal task phrasings). No project or person names — so safe to ship.
- `package.json` `prepublishOnly` hook — runs the check automatically.
- `npm run privacy-check` — manual run anytime.

**Maintainer-local overrides:** Create `~/.alvin-bot/privacy-patterns.txt` containing personal/project-specific patterns. That file is gitignored, never leaves your machine, and takes precedence over the bundled defaults.

**CI override:** Set `$ALVIN_PRIVACY_PATTERNS` to an absolute path; takes top precedence over both files above.

**Hardening: `.npmignore`** — added `test/` and `vitest.config.ts` to the ignore list. Previously the full test suite shipped with every npm tarball, adding ~2 MB and exposing test fixtures that sometimes referenced internal project names.

**CLAUDE.md** — documents the rule and the patterns-file lookup order so future maintenance sessions catch new cases proactively.

## [4.18.0] — 2026-04-20

### ⚡ Performance + Hardening: medium-priority cleanups from the stability audit

Completes the audit work started in 4.17.0 by addressing the remaining medium-severity findings.

**Performance (hot path):**
- **User profiles now cached in memory** (`src/services/users.ts`). Previously `touchProfile` — called on every inbound message — did a sync `readFileSync` + `writeFileSync` on disk. Now it updates an in-memory cache and schedules a debounced flush (2s batch window). A final flush runs on graceful shutdown so nothing is lost. Drops 2 blocking fs operations per message.
- **Embeddings index now cached** (`src/services/embeddings.ts`). Semantic search previously re-read + re-parsed the full on-disk index on every query (100+ MB for large memories). It is now cached in memory with mtime-based invalidation, so changes written by external reindexers are still picked up without a restart.
- **Skills no longer force-reload every 5 minutes** (`src/services/skills.ts`). `getSkills()` used to re-scan the disk after 5 minutes even though `fs.watch` already triggers hot-reload on change. The cache is now authoritative.

**Hardening (unbounded growth):**
- **Sub-agents map capped at 1000** (`src/services/subagents.ts`). On overflow, the map is trimmed back to a 90% target, evicting the oldest delivered/terminated entries first. Running agents are never evicted.
- **Async-agent pending map capped at 500** (`src/services/async-agent-watcher.ts`). Same LRU strategy for orphaned `registerPending` entries.
- **Browser gateway + MCP subprocess stderr now have error handlers** (`browser-manager.ts`, `mcp.ts`). Previously a stream error would throw unhandled and could crash the node process.
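
The capped-map eviction can be sketched as below. It reads the 90% target as "trim to 90% of the cap", which is an assumption; `evictOverflow`, the `Entry` shape, and the `createdAt` field are hypothetical names for illustration.

```typescript
type Status = "running" | "delivered" | "terminated";

interface Entry {
  status: Status;
  createdAt: number; // epoch ms
}

// Trims the map once it exceeds `cap`; returns how many entries were removed.
function evictOverflow(map: Map<string, Entry>, cap: number): number {
  if (map.size <= cap) return 0;
  const target = Math.floor(cap * 0.9);

  // Only finished entries are candidates, oldest first. Running agents stay.
  const evictable = Array.from(map.entries())
    .filter(([, entry]) => entry.status !== "running")
    .sort((a, b) => a[1].createdAt - b[1].createdAt);

  let removed = 0;
  for (const [id] of evictable) {
    if (map.size <= target) break;
    map.delete(id);
    removed++;
  }
  return removed;
}
```

Trimming below the cap, rather than exactly to it, avoids re-running the eviction scan on every single insert once the map is full.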

**Net effect:** message path now does zero blocking fs reads/writes on the profile/skills/embeddings side. Long-running installs can't grow the in-memory state beyond the caps. No API changes.

## [4.17.0] — 2026-04-20

### 🛡️ Hardening: long-running stability audit + leak fixes

Ran a full audit of leak/stability hazards for 24/7 operation. Fixed the critical findings and added a disk-cleanup service so the bot stays lean over months of uptime.

**Fixes:**
- **WhatsApp event-listener leak on reconnect** (`src/platforms/whatsapp.ts`): Before every new socket, the previous socket's listeners are now removed and the old socket is ended. Without this, every reconnect stacked new listeners on top of old ones — causing memory growth and duplicate message processing after long sessions.
- **CDP file-descriptor leak** (`src/services/cdp-bootstrap.ts`): The log-file fd passed to the detached Chromium spawn is now closed in the parent after the child inherits it. Previously leaked one fd per browser bootstrap.
- **Heartbeat + auto-update timers now `.unref()`'d** and explicitly stopped in the shutdown handler. Prevents timers from keeping the process alive during graceful exit.

### 🧹 Feature: disk-cleanup service

New service (`src/services/disk-cleanup.ts`) that runs automatically once a day. Deletes transient files that grow without bound on long-running installs:
- Bot log rotation (>100 MB by default)
- Browser screenshots (>30 days)
- Subagent output streams (>30 days)
- `/tmp/alvin-bot/` media (>7 days)
- WhatsApp media cache (>30 days)
- CDP log file

**NEVER touched:** memory, assets, workspaces, cron-jobs, .env, session-store, delivery-queue. Memory is protected.

**Configuration via env:** `CLEANUP_LOG_MAX_MB`, `CLEANUP_SCREENSHOTS_DAYS`, `CLEANUP_SUBAGENTS_DAYS`, `CLEANUP_TMP_DAYS`, `CLEANUP_WA_MEDIA_DAYS`. Set any to `0` to disable that category.
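
How the env overrides could be read, as a rough sketch. The variable names and the "`0` disables" rule come from this entry; `loadCleanupPolicy`, `readLimit`, and treating the listed thresholds as defaults for every category are assumptions.

```typescript
interface CleanupPolicy {
  logMaxMb: number | null; // null = category disabled
  screenshotsDays: number | null;
  subagentsDays: number | null;
  tmpDays: number | null;
  waMediaDays: number | null;
}

// Hypothetical parser: unset or unparsable values fall back to the default,
// and an explicit 0 turns the category off.
function readLimit(raw: string | undefined, fallback: number): number | null {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed < 0) return fallback;
  return parsed === 0 ? null : parsed;
}

function loadCleanupPolicy(env: Record<string, string | undefined>): CleanupPolicy {
  return {
    logMaxMb: readLimit(env["CLEANUP_LOG_MAX_MB"], 100),
    screenshotsDays: readLimit(env["CLEANUP_SCREENSHOTS_DAYS"], 30),
    subagentsDays: readLimit(env["CLEANUP_SUBAGENTS_DAYS"], 30),
    tmpDays: readLimit(env["CLEANUP_TMP_DAYS"], 7),
    waMediaDays: readLimit(env["CLEANUP_WA_MEDIA_DAYS"], 30),
  };
}
```

For example, `CLEANUP_TMP_DAYS=0` would yield `tmpDays: null`, so a cleanup pass can skip that category entirely.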

**Telegram command:**
- `/cleanup` — show current policy + protected paths
- `/cleanup run` — trigger manual pass, get stats back

## [4.16.1] — 2026-04-20

### 🆕 Feature: /update shows release highlights

After a successful `/update`, the bot now sends a second short message with a bullet-point summary of what actually changed in the newly installed version. Pulled from the CHANGELOG entry matching the version string in the update result.

**Implementation:**
- New module `src/services/release-highlights.ts` parses the CHANGELOG block for a given version and returns at most 5 bullet points, ≤500 chars total.
- Strategy: prefer `### ` subsection headlines (feature/fix titles); fall back to first non-empty paragraph lines.
- Telegram-friendly output: plain bullets (`• ...`), no tables, no code blocks, truncates gracefully with an ellipsis line if too long.
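
The extraction strategy above, sketched under assumptions: `extractHighlights` is a hypothetical stand-in for whatever `release-highlights.ts` exports, and it does not strip emoji from headlines.

```typescript
const MAX_BULLETS = 5;
const MAX_CHARS = 500;

// Finds the "## [version]" block and turns it into a few plain bullets.
function extractHighlights(changelog: string, version: string): string[] {
  const lines = changelog.split(/\r?\n/);
  const start = lines.findIndex((l) => l.startsWith(`## [${version}]`));
  if (start === -1) return [];

  // The block ends at the next "## " version heading.
  const block: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("## ")) break;
    block.push(line.trim());
  }

  // Prefer "### " subsection headlines, else fall back to paragraph lines.
  const headlines = block
    .filter((l) => l.startsWith("### "))
    .map((l) => l.slice(4).trim());
  const paragraphs = block.filter((l) => l !== "" && !l.startsWith("#"));
  const picked = (headlines.length > 0 ? headlines : paragraphs).slice(0, MAX_BULLETS);

  const bullets: string[] = [];
  let used = 0;
  for (const text of picked) {
    const bullet = `• ${text}`;
    if (used + bullet.length > MAX_CHARS) {
      bullets.push("…"); // ellipsis line marks the truncation
      break;
    }
    bullets.push(bullet);
    used += bullet.length + 1; // +1 for the newline when joined
  }
  return bullets;
}
```

Joining the result with `\n` gives a message body that needs no Markdown parsing on the Telegram side.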

**Result format in chat:**
```
✅ Installed v4.16.1 (was v4.16.0). Restarting...
📝 What's new in v4.16.1

• Feature: /update shows release highlights
```

## [4.16.0] — 2026-04-20

### 🚀 Feature: bot-owned CDP Chromium — no more hub dependency

**Problem for new users:** The bot's CDP strategy and the `browse` / `social-fetch` skills referenced `~/.claude/hub/SCRIPTS/browser.sh`, a private tooling setup that only the maintainer has. New npm installs silently lacked a working CDP path; the skill-documented commands errored with "file not found". A second failure mode: when a user followed any online guide to start Chrome with `--remote-debugging-port` while their daily Chrome was already running, macOS LaunchServices silently routed the call to the existing instance without applying the flag (log: "Wird in einer aktuellen Browsersitzung geöffnet", i.e. "Opening in existing browser session"), and no CDP endpoint came up.

**Fix — three additions:**

1. **`src/services/cdp-bootstrap.ts` (new):** Spawns Playwright's bundled *Google Chrome for Testing* binary with a distinct bundle ID — zero conflict with the user's daily Chrome. Dynamic binary resolution walks the latest `chromium-NNNN/` cache directory; cross-platform (macOS arm64/x64, Linux, Windows). Idempotent `ensureRunning()` — safe to call from multiple concurrent code paths, serialized via a single-flight lock. Cleans stale PID files, verifies liveness via both process signal and CDP `/json/version` probe, captures Chromium stderr to `~/.alvin-bot/browser/chrome-cdp.log` for diagnosis.

2. **`alvin-bot browser` CLI subcommand (new):** Stable shell interface that works on every install — `start`, `stop`, `status`, `goto`, `shot`, `eval`, `tabs`, `doctor`. Wraps the bootstrap so agents in skills have a single, documented command. Screenshots default to `~/.alvin-bot/browser/screenshots/`.

3. **`browser-manager` rewired:** The `cdp` strategy now calls `cdp-bootstrap.ensureRunning()` first (works for every install), and only falls back to the hub script if present (maintainer-only dev convenience). The whole cascade still works with no hub at all.
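
The single-flight behaviour of `ensureRunning()` can be illustrated with a small wrapper. This is a sketch of the general pattern, not the bootstrap module itself; `singleFlight` and the `start` callback are hypothetical names.

```typescript
// Concurrent callers share one in-flight promise instead of each
// spawning their own browser.
function singleFlight<T>(start: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;

  return () => {
    if (inFlight === null) {
      const current = start();
      inFlight = current;
      // Clear the slot on success or failure so a later call can retry.
      const clear = (): void => {
        if (inFlight === current) inFlight = null;
      };
      current.then(clear, clear);
    }
    return inFlight;
  };
}
```

Wrapping the spawn-and-probe routine this way means ten simultaneous `browser shot` requests produce one launch attempt, and a failed attempt does not poison later calls.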

**Skills updated:**
- `skills/browse/SKILL.md` — rewritten to use `alvin-bot browser ...` commands; hub-script references removed (kept as "if present" note for dev environments).
- `skills/social-fetch/SKILL.md` — CDP fallback line uses `alvin-bot browser goto/shot`.

**Docs:**
- `CLAUDE.md` — browser automation section switched to `alvin-bot browser` everywhere. Tier 0 (curl/WebFetch) now explicit as the cheapest path. Tier 1 example uses inline `node -e` + Playwright (no hub dependency).
- `src/paths.ts` — `HUB_BROWSER_SH` annotated as dev-only optional. New paths: `CDP_PROFILE_DIR`, `CDP_SCREENSHOTS_DIR`, `CDP_PID_FILE`, `CDP_LOG_FILE` under `~/.alvin-bot/browser/`.

**First-run setup (one-time):**
```bash
npx playwright install chromium
```

**Verified on 2026-04-20 with user's daily Chrome running:**
- `alvin-bot browser start` → PID + endpoint, no LaunchServices hijack
- `alvin-bot browser stop` + immediate `alvin-bot browser shot <url>` → CDP auto-starts, screenshot written (15 KB PNG in `~/.alvin-bot/browser/screenshots/`)
- `alvin-bot browser doctor` → all 4 checks green (binary, endpoint, PID, profile lock)
- `npm test` → 504/504 tests passing

## [4.15.2] — 2026-04-17

### 🐛 Fix: sleep-aware heartbeat prevents false failover after macOS wake

**Problem:** When the Mac goes to sleep, Node.js' `setInterval` pauses completely. After waking up, the first heartbeat probe runs against a CLI + network stack that's still warming up (OAuth token refresh, DNS cache cold, TCP connections stale). The 5s `isAvailable()` timeout is too tight for post-wake latency → probe fails → 2 consecutive failures (the heartbeat fires its backlog) → auto-failover to Ollama → the bot silently answers via Gemma4 instead of Claude, sometimes for hours.

**Evidence:** Logs showed a 7-hour gap (02:02–09:14 UTC) with zero heartbeat activity — the Mac was asleep. Immediately after wake, `claude-sdk: failure 1/2` → `unhealthy` → Ollama boot. The auto-recovery logic was correct but had no chance to fire before a manual restart.

**Fix — three mechanisms in `heartbeat.ts`:**

1. **Sleep detection via wall-clock drift:** If `now - lastHeartbeatRanAt > 2× interval`, the machine was suspended. On detection:
   - 60s grace period where probe failures don't count toward the fail threshold
   - All stale failure counters reset to zero (pre-sleep failures are meaningless)
   - `isAvailable()` caches invalidated (a 7-hour-old "available: false" cache must not survive wake)

2. **Quick recovery probe:** After every failover, schedule an extra heartbeat after 60s (not 5 min). If the primary is already back, recovery happens in ≤60s instead of up to 5 minutes.

3. **Cache invalidation API:** `ClaudeSDKProvider.invalidateAvailabilityCache()` exposed so the heartbeat can clear stale results after sleep.

**Typical post-sleep flow with fix:**
```
[wake]   → 💓 😴 Sleep detected (~420min gap). Grace period 60s
         → reset claude-sdk to healthy, invalidate caches
[+0s]    → 💓 😴 claude-sdk: probe failed during grace period — not counting
[+60s]   → grace expired → normal probe → claude-sdk healthy ✅
```
Without the fix, the same scenario triggered failover at +0s.
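
The wall-clock drift check and grace period can be sketched as below. The `2× interval` rule and the 60s grace come from this entry; `HeartbeatClock`, `noteHeartbeat`, and `countsTowardThreshold` are illustrative names, not the ones in `heartbeat.ts`.

```typescript
const GRACE_MS = 60_000;

// Hypothetical state kept between heartbeat ticks.
interface HeartbeatClock {
  lastRanAt: number;  // epoch ms of the previous tick
  graceUntil: number; // probe failures before this time are ignored
}

// Call at the start of every tick. Returns true when the gap since the
// previous tick is large enough that the machine must have been suspended.
function noteHeartbeat(clock: HeartbeatClock, now: number, intervalMs: number): boolean {
  const slept = now - clock.lastRanAt > 2 * intervalMs;
  if (slept) clock.graceUntil = now + GRACE_MS;
  clock.lastRanAt = now;
  return slept;
}

// A failed probe only counts once the grace period has expired.
function countsTowardThreshold(clock: HeartbeatClock, now: number): boolean {
  return now >= clock.graceUntil;
}
```

When `noteHeartbeat` returns true, the caller would also reset failure counters and invalidate availability caches, as listed in mechanism 1.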

---

## [4.15.1] — 2026-04-16

### 🐛 Patch: suppress `fallbackModel` when primary is Haiku

v4.15.0 unconditionally set `fallbackModel: "haiku"` on every Agent SDK call as a rate-limit safety net. When the user switched to `claude-haiku` (via `/model claude-haiku` or a workspace `model: haiku`), the SDK rejected the request:

> *Fallback model cannot be the same as the main model. Please specify a different model for fallbackModel option.*

The provider registry treated this as a normal failure and cascaded to the next fallback — Ollama — which then had to cold-boot (~45 s for `gemma4:e4b`). Visible symptom: a sudden multi-second latency spike immediately after picking Haiku, followed by the bot answering via the local model instead of Claude.

**Fix:** `src/providers/claude-sdk-provider.ts` now checks whether the resolved primary model contains `"haiku"` and omits `fallbackModel` in that case. Opus / Sonnet / `inherit` still get Haiku as fallback. No other provider paths affected.
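
A minimal sketch of the check, assuming a helper named `pickFallbackModel` (hypothetical) and a case-insensitive match (also an assumption; the entry only says the model name "contains" the word).

```typescript
// Returns the fallback to pass to the SDK, or undefined when the primary
// is already a Haiku model and the option must be omitted.
function pickFallbackModel(primaryModel: string): string | undefined {
  return primaryModel.toLowerCase().includes("haiku") ? undefined : "haiku";
}
```

The caller would only set `fallbackModel` on the request when the return value is defined, so a Haiku primary never triggers the "same as the main model" rejection.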

### Commits

- `ec205b5` — fix(providers): v4.15.1 — don't set fallbackModel when primary is Haiku

---

## [4.15.0] — 2026-04-16

### ✨ Feature: auto-latest Claude model selection + per-workspace overrides

Alvin now picks up new Claude models (e.g. Opus 4.7 on Max subscription) automatically, and users can switch between Opus / Sonnet / Haiku tiers directly from Telegram — or pin a specific tier per workspace.

#### What's new

**`/model` now lists four Claude entries** (plus any configured custom providers + Ollama):
- `Claude (Agent SDK)` — CLI default (= whatever Anthropic ships as current, currently Opus 4.7)
- `Claude Opus (auto-latest)` — forwards `model: "opus"` to the Agent SDK → latest Opus tier
- `Claude Sonnet (auto-latest)` — same pattern with Sonnet
- `Claude Haiku (auto-latest)` — same pattern with Haiku

The three aliased entries all route through `ClaudeSDKProvider` with different `model:` values. Switching persists to `~/.alvin-bot/.env` (`PRIMARY_PROVIDER=…`), so the choice survives bot restarts.

**Workspaces can pin a model** via an optional YAML frontmatter field:

```yaml
---
purpose: my-project
cwd: ~/Projects/my-project
model: sonnet           # opus | sonnet | haiku | claude-opus-4-7 | ...
---
```

When `model:` is omitted (the default for all existing workspaces), the globally active `/model` choice is used — no behaviour change.

**Fallback on rate limits:** the Agent SDK is now always called with `fallbackModel: "haiku"`. Keeps the bot responsive when the primary tier is throttled.

#### Why this matters

Before v4.15, `claude-opus-4-6` was hardcoded in six places. When Anthropic released Opus 4.7 on the Max plan, the CLI picked it up automatically — but Alvin's `/status` still claimed `claude-opus-4-6`, and there was no way to force a specific tier from Telegram. The Agent SDK's `query()` call wasn't even receiving a `model:` parameter, so whatever lived in `config.model` was dead metadata.

Now:
- The default `"inherit"` means "don't pass model: — let the CLI pick its current default." Fresh installs on Max plans get Opus 4.7 automatically.
- Aliases (`opus` / `sonnet` / `haiku`) resolve to the latest tier each release cycle without any code change.
- Pinning a specific ID (e.g. `claude-opus-4-7`) is supported for reproducibility.

#### Implementation

- `src/providers/claude-sdk-provider.ts` — forwards `model:` and sets `fallbackModel: "haiku"` on every `query()` call. Resolution order: per-query `options.model` → provider `this.config.model` → `"inherit"` (= no model passed).
- `src/providers/registry.ts` — registers three virtual entries (`claude-opus`, `claude-sonnet`, `claude-haiku`) as additional keys all backed by `ClaudeSDKProvider` with different `model:` values.
- `src/services/env-file.ts` — new module extracting the `readEnv` / `writeEnvVar` / `removeEnvVar` helpers from `setup-api.ts` so Telegram command handlers can persist runtime choices.
- `src/handlers/commands.ts` — `switchProviderWithLifecycle` now calls `writeEnvVar("PRIMARY_PROVIDER", targetKey)` on every switch, not just Web UI changes.
- `src/services/workspaces.ts` — `Workspace` type gets optional `model?: string`, the YAML parser picks it up from frontmatter.
- `src/providers/types.ts` — `QueryOptions` gets optional `model?: string` for per-query overrides.
- `src/handlers/message.ts` + `src/handlers/platform-message.ts` — both forward `workspace.model` into `queryOpts` when the active workspace has one defined.
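
The resolution order from the first bullet can be sketched like this. `resolveModel` and `toSdkModelOption` are hypothetical helper names; only the order (per-query, then provider config, then `"inherit"`) and the meaning of `"inherit"` are taken from this entry.

```typescript
interface ModelSource {
  model?: string;
}

// Per-query override wins, then the provider's configured model,
// then "inherit" as the final default.
function resolveModel(queryOptions: ModelSource, providerConfig: ModelSource): string {
  return queryOptions.model ?? providerConfig.model ?? "inherit";
}

// "inherit" means: do not pass a model at all and let the CLI decide.
function toSdkModelOption(resolved: string): { model?: string } {
  return resolved === "inherit" ? {} : { model: resolved };
}
```

A workspace pinned to `model: sonnet` therefore overrides the global `/model` choice for its own queries, while unpinned workspaces keep the provider default.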

#### Backward compatibility

- Default provider config is `"inherit"` — identical to pre-v4.15 behaviour (no `model:` passed to the Agent SDK, CLI default wins).
- Workspaces without a `model:` field behave exactly as before.
- Stale presets `claude-sonnet-4-20250514` → `claude-sonnet-4-6` and `claude-3-5-haiku-20241022` → `claude-haiku-4-5` updated (previously unused — only affected the REST-API code paths, which nobody referenced).

#### Docs

Workspace guides updated (`docs/install/workspaces-de.html` + `workspaces-en.html`) — the YAML-field reference table now documents the new optional `model:` entry.

### 🐛 Bonus: stale model-ID cleanup

Four hardcoded Claude model IDs replaced with current strings: `claude-sonnet-4-20250514` → `claude-sonnet-4-6`, `claude-3-5-haiku-20241022` → `claude-haiku-4-5`, openai-compat fallback `claude-opus-4` → `claude-opus-4-6`, setup-API defaults likewise. None of these were on active code paths, but they would have shipped confusing display names if anyone had referenced them.

### Commits

- `fed4b91` — feat(providers): v4.15 — auto-latest Claude model selection via /model
- `b2a6e1f` — feat(workspaces): v4.15 — optional per-workspace model override

---

## [4.14.2] — 2026-04-16

### 🐛 Patch: watcher zombie-entry fix (missing outputFile > 10 min = failed)

**Edge case the maintainer caught today:** a pending async-agent entry stuck in `/subagents list` for 3+ hours showing "running" — but the underlying `alvin_dispatch_agent` subprocess had already died (its output file was gone). The entry would have continued haunting the list until the 12-hour `giveUpAt` ceiling fired.

**Root cause:** `async-agent-watcher`'s `pollOnce` handled four states from `parseOutputFileStatus` — `completed` / `failed` / `running` / `missing`. For `missing` (file doesn't exist or is empty), the watcher just kept polling forever, on the assumption that a slow subprocess might eventually write. If the subprocess crashed before writing ANY output, the file never appeared, and we polled for 12 hours before timing out.

**Fix:** when `status.state === "missing"` AND `now - entry.startedAt > MISSING_FILE_FAILURE_MS` (default 10 min, configurable via `ALVIN_MISSING_FILE_FAILURE_MS` env var), deliver as failed with an explicit message:

> *Dispatched subprocess never wrote its output file (N m after start). Likely crashed before initializing, or the file was removed externally.*

10 minutes is well above any legitimate `claude -p` startup variance (normal first-write latency is seconds) and well below the 12-hour hard ceiling.
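
The zombie condition boils down to a small predicate. The sketch below is an approximation: `isZombieEntry` is a made-up name, and this `getMissingFileFailureMs` takes the environment as a parameter for testability, which may differ from the real helper.

```typescript
type OutputState = "completed" | "failed" | "running" | "missing";

const DEFAULT_MISSING_FILE_FAILURE_MS = 10 * 60 * 1000;

// Reads the threshold from the environment, falling back to 10 minutes
// when the variable is unset or not a positive number.
function getMissingFileFailureMs(env: Record<string, string | undefined>): number {
  const parsed = Number(env["ALVIN_MISSING_FILE_FAILURE_MS"]);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MISSING_FILE_FAILURE_MS;
}

// True only for entries whose output file never appeared within the threshold.
function isZombieEntry(
  state: OutputState,
  startedAt: number,
  now: number,
  thresholdMs: number,
): boolean {
  return state === "missing" && now - startedAt > thresholdMs;
}
```

Because the predicate requires `state === "missing"`, entries with any file content stay on the existing running or partial-delivery paths.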

### What's preserved (regression-guard tested)

- Running agents (file has content but no `end_turn`/`result` yet) are untouched by this path — they still keep polling as before.
- Completed agents (clean `end_turn` or `stream-json result` event) still deliver normally.
- Explicit `failed` state from the parser (if ever used) still delivers error normally.
- v4.12.4's "file is stale but has text → deliver partial" path takes precedence over the new zombie check (the file has content, so not "missing").
- 12-hour `giveUpAt` hard ceiling still applies as the ultimate safety net.
- Session's `pendingBackgroundCount` decrement fires on zombie failure, same as every other delivery path.

### Testing

- **Baseline**: 498 tests (v4.14.1)
- **New**: `test/watcher-zombie-fix.test.ts` — 6 tests:
  - Young missing file (<threshold) stays pending
  - Old missing file (>threshold) delivers failed + removes from pending
  - Default threshold is 10 min when env var unset
  - Running file (has content) is unaffected by zombie check
  - Completed file delivers as completed (regression guard)
  - Session's `pendingBackgroundCount` decrements on zombie delivery
- **Total**: 504 tests, all green, TSC clean

### Files changed

- **Modified**: `src/services/async-agent-watcher.ts` (new `getMissingFileFailureMs()` + zombie branch in `pollOnce`)
- **NEW tests**: `test/watcher-zombie-fix.test.ts`
- **Version**: `package.json` 4.14.1 → 4.14.2

---

## [4.14.1] — 2026-04-16

### 🐛 Patch: `/subagents list` now shows v4.13+ dispatch agents too

**Bug the maintainer caught:** typing `/subagents list` in Telegram while an `alvin_dispatch_agent` sub-agent was actively running returned "no agents running", even though the user could see the agent finish and deliver a result shortly after. Cross-platform effect too: the `/alvin` slash command on Slack had the same display gap.

**Root cause:** two separate registries for sub-agents:
- `src/services/subagents.ts` `activeAgents` Map — used since v4.0.0 for bot-level sub-agents (cron spawns, implicit Task tool children, `/sub-agents spawn` CLI)
- `src/services/async-agent-watcher.ts` `pending` Map — used since v4.13 for detached `alvin_dispatch_agent` subprocesses

`/subagents list` only read from the first map. The entire v4.13+ dispatch path was invisible in the listing.

**Fix:** new `listActiveSubAgents()` helper in subagents.ts that merges both registries. Pending async-agent-watcher entries get synthesized into `SubAgentInfo` shape (status="running", source="cron", depth=0, platform preserved). The `/subagents list` handler and the default-render path both switch to the merged helper. The old `listSubAgents()` function stays pure (unchanged behavior) — cancel/result paths still use it because detached subprocess PIDs aren't tracked.

### Technical details

- `listActiveSubAgents()` is async (lazy dynamic import of the watcher module to keep subagents.ts load order clean) — existing `listSubAgents()` remains sync for the v4.0.0 consumers
- Synthesis mapping: `PendingAsyncAgent.agentId → SubAgentInfo.id`, `description → name`, `startedAt → startedAt`, always `status="running"` (pending by definition), `source="cron"` (matches watcher's delivery banner), `depth=0`
- Platform field preserved so the renderer can show cross-platform context if desired later
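
The synthesis mapping can be sketched as a pure function. The two interfaces below are trimmed-down assumptions containing only the fields named in this entry, and the sync `mergeActiveSketch` ignores the lazy dynamic import the real async helper uses.

```typescript
type Platform = "telegram" | "slack" | "discord" | "whatsapp";

// Reduced, hypothetical shapes: the real types have more fields.
interface PendingAsyncAgent {
  agentId: string;
  description: string;
  startedAt: number;
  platform?: Platform;
}

interface SubAgentInfo {
  id: string;
  name: string;
  startedAt: number;
  status: string;
  source: string;
  depth: number;
  platform?: Platform;
}

// Pending watcher entries are "running" by definition.
function toSubAgentInfo(pending: PendingAsyncAgent): SubAgentInfo {
  return {
    id: pending.agentId,
    name: pending.description,
    startedAt: pending.startedAt,
    status: "running",
    source: "cron",
    depth: 0,
    ...(pending.platform ? { platform: pending.platform } : {}),
  };
}

function mergeActiveSketch(active: SubAgentInfo[], pending: PendingAsyncAgent[]): SubAgentInfo[] {
  return active.concat(pending.map(toSubAgentInfo));
}
```

Neither input array is mutated, which is what lets the old registry view stay untouched for cancel/result paths.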

### Testing

- **Baseline**: 492 tests (v4.14.0)
- **New**: `test/list-subagents-merged.test.ts` — 6 tests (empty state, single slack agent, multi-platform merge, timestamp preservation, source tag, listSubAgents purity guard)
- **Total**: 498 tests, all green, TSC clean

### Files changed

- **Modified**: `src/services/subagents.ts` (new listActiveSubAgents helper), `src/handlers/commands.ts` (both /subagents list paths switch to merged view)
- **NEW tests**: `test/list-subagents-merged.test.ts`
- **Version**: `package.json` 4.14.0 → 4.14.1

---

## [4.14.0] — 2026-04-16

### ✨ Sub-agent dispatch on Slack, Discord, WhatsApp (Telegram unchanged)

v4.13.0 shipped truly-detached sub-agents via the `mcp__alvin__dispatch_agent` MCP tool, but only Telegram passed the required `alvinDispatchContext` to the provider. Slack/Discord/WhatsApp users couldn't trigger background sub-agents — the tool was visible to Claude but effectively unreachable.

v4.14 wires the same dispatch path through the non-Telegram handler (`src/handlers/platform-message.ts`) and adds a platform-aware delivery router so results come back on the same platform they were dispatched from.

**Telegram is untouched.** The v4.13.0 Telegram pipeline (message.ts → Claude SDK → alvin_dispatch_agent → watcher → grammy-api delivery) is bit-for-bit identical. Only the types widened (`chatId: number | string`, `platform?: ...`), and the new code paths activate only when `platform !== "telegram"`.

### Technical details

**Type widening** (`src/services/async-agent-watcher.ts`, `src/services/alvin-dispatch.ts`, `src/services/alvin-mcp-tools.ts`, `src/providers/types.ts`, `src/services/subagents.ts`):
- `PendingAsyncAgent.chatId` / `userId`: `number` → `number | string`
- `PendingAsyncAgent.platform?: "telegram" | "slack" | "discord" | "whatsapp"` (optional, undefined = telegram)
- `SubAgentInfo.parentChatId`: same widening
- `SubAgentInfo.platform?: ...` new field
- `DispatchInput`, `AlvinDispatchContext`, `QueryOptions.alvinDispatchContext`: same widening + `platform` field

Pre-v4.14 persisted `async-agents.json` entries keep working — missing `platform` field defaults to `telegram`, numeric `chatId` still routes through grammy.

**New module** `src/services/delivery-registry.ts`:
- `registerDeliveryAdapter({ platform, sendText, sendDocument? })` — called by each platform module at startup
- `getDeliveryAdapter(platform)` — watcher lookup
- Tiny surface: sendText + optional sendDocument, string | number chatId, no Markdown or live-stream
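
A registry with this surface could look roughly like the sketch below. The two function names come from this entry; the exact `sendText` / `sendDocument` signatures are assumptions.

```typescript
type ChatId = string | number;

// Assumed adapter shape: plain text plus an optional file upload.
interface DeliveryAdapter {
  platform: string;
  sendText(chatId: ChatId, text: string): Promise<void>;
  sendDocument?(chatId: ChatId, filePath: string): Promise<void>;
}

const deliveryAdapters = new Map<string, DeliveryAdapter>();

// Re-registering a platform replaces the previous adapter.
function registerDeliveryAdapter(adapter: DeliveryAdapter): void {
  deliveryAdapters.set(adapter.platform, adapter);
}

// Returns null for platforms that never registered.
function getDeliveryAdapter(platform: string): DeliveryAdapter | null {
  return deliveryAdapters.get(platform) ?? null;
}
```

Keeping the registry as a plain module-level map means the watcher needs no knowledge of Slack, Discord, or WhatsApp client libraries.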

**Delivery router** `src/services/subagent-delivery.ts` `deliverSubAgentResult()`:
- Branches on `info.platform ?? "telegram"`:
  - `telegram` → existing grammy path (unchanged Markdown parsing, file uploads, 3800-char chunking)
  - `slack`/`discord`/`whatsapp` → new `deliverViaRegistry()` path — plain text (no Markdown), 3800-char chunks, optional file upload via adapter.sendDocument
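
The branching and chunking can be sketched as two small helpers. `pickDeliveryRoute` and `chunkText` are hypothetical names, and the naive slicing here may differ from how the real router splits text (for instance, it can cut in the middle of a word).

```typescript
const CHUNK_SIZE = 3800;

// Missing platform means a pre-v4.14 entry, which is always Telegram.
function pickDeliveryRoute(platform?: string): "grammy" | "registry" {
  return (platform ?? "telegram") === "telegram" ? "grammy" : "registry";
}

// Splits long output into fixed-size pieces.
function chunkText(text: string, size: number = CHUNK_SIZE): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}
```

Each chunk would then be sent in order, either through the grammy path or through the registered adapter's `sendText`.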

**Adapter registration** in `src/platforms/slack.ts`, `src/platforms/discord.ts`, `src/platforms/whatsapp.ts`:
- Each platform's `start()` now calls `registerDeliveryAdapter` at the end
- The adapter's `sendText` wraps the existing platform `sendText` (no duplicate code)

**Handler wiring** `src/handlers/platform-message.ts`:
- When the active provider is SDK, `alvinDispatchContext: { chatId, userId, sessionKey, platform }` is passed in queryOpts — mirrors the Telegram handler's v4.13.0 behavior
- Claude sees the same `mcp__alvin__dispatch_agent` tool and uses it the same way

### Testing

- **Baseline**: 483 tests (v4.13.2)
- **New**:
  - `test/delivery-registry.test.ts` — 4 tests (register/get roundtrip, unregistered returns null, re-register replaces, per-platform isolation)
  - `test/subagent-delivery-platform-routing.test.ts` — 5 tests (slack routes via registry not grammy, telegram defaults still use grammy, discord routes correctly, orphan platform skips gracefully, long output chunks on non-telegram adapters)
- **Total**: 492 tests, all green, TSC clean
- **Telegram regression guard**: the routing test explicitly verifies `info.platform=undefined` still hits grammy, and `info.platform='slack'` never touches grammy. That's the load-bearing invariant.

### Files changed

- **NEW**: `src/services/delivery-registry.ts`, `test/delivery-registry.test.ts`, `test/subagent-delivery-platform-routing.test.ts`
- **Modified**: `src/services/async-agent-watcher.ts` (chatId widening + platform field), `src/services/subagent-delivery.ts` (platform router + plain-text banner variant), `src/services/alvin-dispatch.ts` (type widening), `src/services/alvin-mcp-tools.ts` (context pass-through), `src/services/subagents.ts` (SubAgentInfo.platform + widened parentChatId), `src/providers/types.ts` (QueryOptions.alvinDispatchContext extended), `src/handlers/platform-message.ts` (dispatch context), `src/platforms/slack.ts` / `discord.ts` / `whatsapp.ts` (adapter registration)
- **Version**: `package.json` 4.13.2 → 4.14.0 (minor bump — new public surface: delivery-registry, platform field)

### Known limitations

- **Slack slash command context**: when a user invokes `/alvin <prompt>` in Slack, dispatch works (same codepath), but the sub-agent result delivery lands as a persistent channel message, not an ephemeral slash-command response. If you want ephemeral replies, use DM.
- **Discord/WhatsApp not smoke-tested**: the code paths match Slack, and the adapter registration is symmetric, but I only end-to-end tested Slack. YMMV until you run a real test.

---

## [4.13.2] — 2026-04-16

### ✨ Slack: `/alvin` slash commands + rewritten setup guide

**Bug (carried over from v4.13.1):** Slash commands didn't work on Slack. When a user typed `/status` in a DM with the bot, Slack either hit its built-in `/status` (user status setter) or showed "Not a valid command" — nothing reached the bot. The Slack adapter only registered `message` + `app_mention` event handlers, no `command` handler; the manifest declared no slash commands.

**Why it was a gotcha**: Slack treats slash commands as a separate event type (`command`), not as message text. Apps must explicitly register each command in their manifest AND add an `app.command(...)` handler to receive the events. None of this had been set up.

**Fix**: v4.13.2 introduces a single namespaced command `/alvin` that takes a subcommand argument. Users type `/alvin status`, `/alvin new`, `/alvin effort high`, `/alvin help` — the Slack adapter parses the subcommand from `command.text` and forwards it as a `/status`/`/new`/etc. message through the existing `handlePlatformCommand` pipeline. Unknown subcommands fall through to normal LLM handling so `/alvin what's the weather` also works as a free-form query.

### Technical details

**New parser** `src/platforms/slack-slash-parser.ts`: pure `parseSlackSlashCommand(text)` helper. Empty text → `/help`. Single word → `/<word>`. Word + args → `/<word> <args>`. Lowercases subcommand, preserves arg capitalization, strips defensive leading slash, collapses extra whitespace. 8 unit tests.
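
The parsing rules above can be approximated as follows. This is a reimplementation sketch named `parseSlashSketch` to keep it distinct from the shipped `parseSlackSlashCommand`; mapping a bare `/` to `/help` is an assumption.

```typescript
// Translates the text after "/alvin" into a "/<subcommand> <args>" message.
function parseSlashSketch(text: string): string {
  // Trim and collapse runs of whitespace into single spaces.
  const cleaned = text.trim().replace(/\s+/g, " ");
  if (cleaned === "") return "/help";

  const spaceAt = cleaned.indexOf(" ");
  const word = spaceAt === -1 ? cleaned : cleaned.slice(0, spaceAt);
  const args = spaceAt === -1 ? "" : cleaned.slice(spaceAt + 1);

  // Strip a defensive leading slash and lowercase only the subcommand.
  const sub = word.replace(/^\/+/, "").toLowerCase();
  if (sub === "") return "/help";

  return args === "" ? `/${sub}` : `/${sub} ${args}`;
}
```

Unknown subcommands are not rejected here; the result is simply forwarded, which is what lets free-form queries fall through to normal LLM handling.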

**Adapter change** `src/platforms/slack.ts`: new `app.command("/alvin", ...)` registration in `start()` (guarded with `typeof app.command === "function"` for test-mock compat). `ack()` fires immediately to meet Slack's 3-second requirement. New `handleSlashCommand(command)` method synthesizes an `IncomingMessage` with the translated `text` and the command's `channel_id`/`user_id` and forwards to the same `this.handler(...)` path as regular DMs. Response goes back via `chat.postMessage` (persistent, visible in channel history) rather than slash-command-native `respond()` (ephemeral) — matches DM behavior.

**Slack app manifest**: requires a new `features.slash_commands` entry declaring `/alvin` and a new `commands` OAuth scope. Both are in the manifest JSON the setup guide pastes in — no manual per-field config. Existing installations need a one-time re-install to pick up the new `commands` scope (Slack shows a yellow banner after manifest save).

**Setup guide rewrite** `src/web/setup-api.ts` Slack `setupSteps[]`: replaces the old 7-step "click-through every section" sequence with a 9-step manifest-paste flow that actually matches how the bot is currently set up (Messages Tab, Events, Socket Mode, slash commands — all covered in one JSON paste). Includes the full manifest JSON inline. New users get a working Slack app in ~2 minutes instead of hunting through the Slack API UI.

### Testing

- **Baseline**: 475 tests (v4.13.1)
- **New**: `test/slack-slash-command.test.ts` — 8 tests (empty → /help, single word, args preservation, whitespace collapse, case insensitivity on subcommand, case preservation on args, defensive leading slash handling)
- **Total**: 483 tests, all green, TSC clean
- **Live smoke verification**: manifest pushed via Chrome browser automation, reinstall completed, Slack adapter re-registered with `app.command("/alvin")`. Live test of `/alvin status` pending user confirmation.

### Files changed

- **NEW**: `src/platforms/slack-slash-parser.ts`, `test/slack-slash-command.test.ts`
- **Modified**: `src/platforms/slack.ts` (command registration + handler), `src/web/setup-api.ts` (slack setupSteps rewrite), `package.json` (4.13.1 → 4.13.2)

### Known limitations

- **One command namespace only**: we register `/alvin` rather than individual `/status`/`/new` etc. because `/status` conflicts with Slack's built-in command. Side effect: slightly more typing for users (`/alvin status` vs `/status`). The alternative considered, registering several prefixed commands such as `/alvin-status`, would also work but requires more manifest boilerplate; deferred unless users complain.
- **Channel responses are public**: when `/alvin status` is invoked in a channel, the bot's response is a normal `chat.postMessage` visible to the whole channel. If you want private responses there, use DM or switch the sendText call to use Slack's `response_url` (ephemeral). Deferred as enhancement — DM is the primary use case.

---

## [4.13.1] — 2026-04-16

### 🐛 Patch: Slack Test Connection + PM2 → launchd migration for Maintenance UI

Two latent UI bugs surfaced during live Slack setup:

**Bug 1 — `/api/platforms/test-connection` returned "Unknown platform" for Slack.** The handler in `setup-api.ts` only knew about telegram/discord/signal/whatsapp. Users who entered a valid Bot Token (`xoxb-…`) + App Token (`xapp-…`) and clicked Test Connection got a confusing "Unknown platform" error — couldn't tell if their tokens were wrong or the feature was broken.

**Fix:** New `slack` case in the handler. Validates Bot Token via `https://slack.com/api/auth.test` (cheap, ~100ms). For App Token, checks the `xapp-` prefix as the quickest sanity check (Socket Mode can't actually be "pinged" without opening a persistent WebSocket). Returns the authenticated bot user + team name on success, or Slack's own `auth.test` error (e.g. `invalid_auth`, `token_expired`) on failure. Warns if App Token is missing or has wrong prefix even when Bot Token is valid — helps users notice they only configured half the pair.
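
How the two checks combine into one result, as a sketch. The `ok` / `user` / `team` / `error` fields follow Slack's `auth.test` response; `summarizeSlackCheck`, the `ConnectionCheck` shape, and the message wording are assumptions, and the HTTP call itself is left out.

```typescript
// Subset of the auth.test response that the check needs.
interface SlackAuthTestResponse {
  ok: boolean;
  user?: string;
  team?: string;
  error?: string;
}

interface ConnectionCheck {
  ok: boolean;
  message: string;
  warnings: string[];
}

function summarizeSlackCheck(
  auth: SlackAuthTestResponse,
  appToken: string | undefined,
): ConnectionCheck {
  // App Token problems are warnings: they do not invalidate the Bot Token.
  const warnings: string[] = [];
  if (!appToken) {
    warnings.push("App Token is missing");
  } else if (!appToken.startsWith("xapp-")) {
    warnings.push("App Token should start with xapp-");
  }

  if (!auth.ok) {
    return { ok: false, message: `Slack error: ${auth.error ?? "unknown_error"}`, warnings };
  }
  return {
    ok: true,
    message: `Authenticated as ${auth.user ?? "unknown"} in ${auth.team ?? "unknown"}`,
    warnings,
  };
}
```

Returning warnings alongside a successful result is what surfaces the "only half the pair configured" case.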

**Bug 2 — Maintenance section's buttons were broken on macOS launchd installs.** Since v4.8 the macOS install runs under `launchd` (`com.alvinbot.app.plist`), not PM2. But `doctor-api.ts` kept calling `pm2 jlist`/`pm2 restart`/`pm2 stop`/`pm2 logs`. Results: status endpoint returned stale data from ghost PM2 entries (uptime/memory/cpu/restarts all wrong), Stop/Start buttons silently failed, log viewer was empty. The Restart button accidentally worked because it used `scheduleGracefulRestart` (launchd's `KeepAlive` auto-brings-back on exit).

**Fix:** New `src/services/process-manager.ts` abstraction that auto-detects the active supervisor per request:
- **launchd** (macOS) if `launchctl print gui/$UID/com.alvinbot.app` succeeds
- **pm2** (VPS / legacy installs) if `pm2 jlist` lists our process
- **standalone** if neither (fallback — only Restart works, since there's no supervisor to bring the process back)

Each manager implements `getStatus()`, `stop()`, `start()`, `getLogs()` with the right tooling:
- launchd: `launchctl print` + `ps -p <pid> -o %cpu=,%mem=,rss=,etime=` for resource stats, `launchctl bootout` / `bootstrap` for stop/start, `tail` on the known log paths for logs
- pm2: unchanged — `pm2 jlist` / `pm2 stop` / `pm2 start` / `pm2 logs`
- standalone: `process.uptime()` / `process.memoryUsage()` / manual log tailing
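
The detection order can be sketched with injected probes. `detectManagerKind` and `SupervisorProbes` are hypothetical names; in the real module the probes would shell out to `launchctl print` and `pm2 jlist` as described above.

```typescript
type ManagerKind = "launchd" | "pm2" | "standalone";

// Each probe answers one question; both are assumptions standing in for
// the actual launchctl / pm2 invocations.
interface SupervisorProbes {
  launchdServiceLoaded(): boolean;
  pm2ListsProcess(): boolean;
}

// launchd is checked first, then pm2; standalone is the fallback.
function detectManagerKind(probes: SupervisorProbes): ManagerKind {
  if (probes.launchdServiceLoaded()) return "launchd";
  if (probes.pm2ListsProcess()) return "pm2";
  return "standalone";
}
```

Checking launchd first matters on machines with leftover PM2 entries: the ghost process list is never consulted once the launchd service is found.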

The WebUI routes (`/api/pm2/status`, `/api/pm2/action`, `/api/pm2/logs`) keep their names for compat but now dispatch via `detectProcessManager()`. Real-world verified against the running bot: detection returned `launchd`, PID/uptime/memory all correct from the actual launchd-managed process (not a stale PM2 ghost).

### Testing

- **Baseline**: 460 tests (v4.13.0)
- **New**:
  - `test/slack-test-connection.test.ts` — 5 tests (no tokens set, auth.test accepts, auth.test rejects, App Token format warning, unknown platform regression)
  - `test/process-manager.test.ts` — 10 tests (detection order, each manager's status parsing, stop/start command dispatch)
- **Total**: 475 tests, all green, TSC clean
- **Live verification**: ran `detectProcessManager().getStatus()` against the actual running bot → returned `launchd`, PID 4767 (matches `launchctl print pid = 4767`), uptime 655s, memory 76MB — all real data, not stale PM2 cache

### Files changed

- **NEW**: `src/services/process-manager.ts`, `test/slack-test-connection.test.ts`, `test/process-manager.test.ts`
- **Modified**: `src/web/setup-api.ts` (+slack case in test-connection), `src/web/doctor-api.ts` (routes use process-manager abstraction), `package.json` (4.13.0 → 4.13.1)

### Known limitations (deferred to v4.14)

- **Slack subagent support**: v4.13.0's `mcp__alvin__dispatch_agent` tool only activates on the Telegram handler (passes `alvinDispatchContext`). Slack users can receive normal replies but can't trigger background sub-agents yet. Requires extending `PendingAsyncAgent.chatId` to `number | string`, adding `platform` to the watcher's pending record, and making `subagent-delivery.ts` platform-aware. Tracked for v4.14.

---

## [4.13.0] — 2026-04-16

### ✨ Major: truly detached sub-agent dispatch via `alvin_dispatch_agent` MCP tool

**Background.** v4.12.1 → v4.12.3 tried three progressively more complex fixes for the "bot freezes while sub-agent runs" problem, all of which depended on Claude Agent SDK's built-in `Task(run_in_background: true)` tool. All three iterations missed the same architectural reality: the SDK's background task stays tied to the parent SDK subprocess lifecycle. When v4.12.3's bypass path aborted the parent to unblock the user, the abort cascaded into killing the in-flight sub-agent mid-work. v4.12.4 worked around this at the delivery layer (recovering partial output after a 5-min staleness window), but the fundamental architecture was still wrong.

v4.13 fixes the architecture. Instead of using the SDK's built-in Task tool for background work, we register our own MCP tool — `mcp__alvin__dispatch_agent` — which spawns a **completely independent** `claude -p` subprocess (its own PID, its own process group, unreferenced from the parent's event loop). Aborting the parent has zero effect on the dispatched subprocess. It continues to write its stream-json output to its own file and runs to completion. The async-agent-watcher polls the output file and delivers the result as a separate message when ready.

Empirically verified with a standalone survival test (`scripts/smoke-test-abort-survival.mjs`): dispatch an agent that needs 20+ seconds of work, kill the parent Node process 100ms later, watch the subprocess keep writing to its output file and complete cleanly with the expected result.

### What changed for the user

- **Before v4.13** (with Task tool): the bot shows "typing…" for the entire duration of the sub-agent's work (5, 20, 60 minutes). New messages sit in a queue and don't get processed. If the user interrupts via v4.12.3's bypass, the sub-agent dies mid-work and hours later the user gets a `720m timeout · (empty output)` message.
- **After v4.13** (with `alvin_dispatch_agent`): the bot's turn completes within seconds of dispatch. The user sees "🤖 Dispatched 2 background agents — I'll send the results when ready." and can immediately chat about anything else. The background subprocesses finish cleanly and deliver their full results as separate messages.

This matches the OpenClaw experience the user was asking about — except it's built natively into Claude Agent SDK's MCP-tool mechanism, not a wholesale replacement.

### Technical details

**New module** `src/services/alvin-dispatch.ts`
- `dispatchDetachedAgent(input)` — spawns `claude -p <prompt> --output-format stream-json` via `child_process.spawn({ detached: true, stdio: ["ignore", outFd, errFd] })` + `.unref()`
- Synchronous return: `{ agentId, outputFile, spawned: true }`
- Side effects: registers with `async-agent-watcher`, increments `session.pendingBackgroundCount`
- Unique agent IDs via `crypto.randomBytes(12).toString("hex")` (collision-safe for parallel dispatch)
- Cleans `CLAUDECODE`/`CLAUDE_CODE_ENTRYPOINT` from env to prevent nested-session errors
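
The spawn pattern listed above can be sketched as follows. This is a minimal illustration rather than the contents of `alvin-dispatch.ts`: `spawnDetached`, `newAgentId`, `cleanEnv`, and the `.err` file suffix are hypothetical names chosen for the example, and watcher registration and the session counter are left out.

```typescript
import { spawn } from "node:child_process";
import { randomBytes } from "node:crypto";
import { closeSync, openSync } from "node:fs";

type Env = Record<string, string | undefined>;

// 12 random bytes rendered as 24 hex characters.
function newAgentId(): string {
  return randomBytes(12).toString("hex");
}

// Copy the environment without the variables that mark a nested Claude session.
function cleanEnv(env: Env): Env {
  const copy: Env = { ...env };
  delete copy["CLAUDECODE"];
  delete copy["CLAUDE_CODE_ENTRYPOINT"];
  return copy;
}

// Hypothetical wrapper: start `command` so that it can outlive the parent.
function spawnDetached(command: string, args: string[], outputFile: string): number | undefined {
  const outFd = openSync(outputFile, "a");
  const errFd = openSync(`${outputFile}.err`, "a");
  try {
    const child = spawn(command, args, {
      detached: true, // own process group, not tied to the parent's
      stdio: ["ignore", outFd, errFd], // files instead of pipes back to the parent
      env: cleanEnv(process.env),
    });
    child.on("error", (err) => console.error("spawn failed:", err.message));
    child.unref(); // do not keep the parent's event loop alive
    return child.pid;
  } finally {
    // The child holds its own copies of the descriptors.
    closeSync(outFd);
    closeSync(errFd);
  }
}
```

Redirecting stdout and stderr to files instead of pipes is what lets the child keep writing after the parent exits; with pipes, the child would lose its output target as soon as the parent goes away.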

**New module** `src/services/alvin-mcp-tools.ts`
- `buildAlvinMcpServer(ctx)` — creates an SDK MCP server bound to this turn's `{ chatId, userId, sessionKey }` context via closure
- Exposes `dispatch_agent` tool (zod-validated input: `{ prompt: string, description: string }`)
- Tool handler calls `dispatchDetachedAgent` and returns `agentId + outputFile` to Claude
- Uses SDK's `createSdkMcpServer` + `tool` builders (the SDK's native inline-tool API — no separate MCP server process needed)

**Provider integration** (`src/providers/claude-sdk-provider.ts`)
- New `QueryOptions.alvinDispatchContext` field — when set, provider registers `mcpServers: { alvin: buildAlvinMcpServer(ctx) }` + appends `mcp__alvin__dispatch_agent` to the default `allowedTools` list
- When unset, the MCP server is not registered and Claude falls back to the built-in Task tool only
- Non-SDK providers ignore the new field entirely

**Handler integration** (`src/handlers/message.ts`)
- Passes `alvinDispatchContext: { chatId, userId, sessionKey }` on every SDK turn
- No other handler changes — the bypass path, the staleness parser, and the pending-count decrement are all reused from v4.12.3/v4.12.4

**Parser extension** (`src/services/async-agent-parser.ts`)
- New first-pass scan for `{"type":"result"}` events — the completion marker used by `claude -p --output-format stream-json` (different from the SDK-internal sub-agent format that uses `message.stop_reason: "end_turn"`)
- When found, uses the `result.result` field as authoritative output when present, falls back to aggregating all assistant text blocks
- Preserves backward compat with the existing `end_turn`-based path (tested by the old test suite)
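
A simplified sketch of the first-pass scan described above. The event shapes are assumptions reduced to the fields this example needs, and `parseStreamJson` is a hypothetical name; the real `parseOutputFileStatus` also handles the `end_turn` path and token extraction.

```typescript
type ParsedStatus =
  | { state: "running" }
  | { state: "completed"; output: string };

// Assumed event shapes; real stream-json events carry more fields.
interface StreamEvent {
  type?: string;
  result?: string;
  message?: { content?: Array<{ type: string; text?: string }> };
}

function parseStreamJson(fileContent: string): ParsedStatus {
  const events: StreamEvent[] = [];
  for (const line of fileContent.split("\n")) {
    if (!line.trim()) continue;
    try {
      events.push(JSON.parse(line) as StreamEvent);
    } catch {
      // A partially written last line is expected while the agent is running.
    }
  }

  const resultEvent = events.find((e) => e.type === "result");
  if (!resultEvent) return { state: "running" };

  // Prefer the `result` field of the result event when it is present.
  if (typeof resultEvent.result === "string" && resultEvent.result.length > 0) {
    return { state: "completed", output: resultEvent.result };
  }

  // Otherwise fall back to every assistant text block, in order.
  const texts = events
    .filter((e) => e.type === "assistant")
    .flatMap((e) => e.message?.content ?? [])
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text as string);
  return { state: "completed", output: texts.join("\n\n") };
}
```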

**System prompt update** (`src/services/personality.ts`)
- `BACKGROUND_SUBAGENT_HINT` rewritten to strongly prefer `mcp__alvin__dispatch_agent` over `Task(run_in_background: true)` on Telegram/WhatsApp/Slack/Discord
- Explicit decision tree, concrete example prompts, parallel-dispatch guidance
- Built-in Task tool remains available but deprecated for long-running work; reserved for the rare case where Claude needs a result in the same turn

### Known limitations

- **First-turn only for now**: the MCP server is bound to `{ chatId, userId, sessionKey }` at query construction time. If the session's underlying SDK session ID changes mid-conversation (rare), the tool context goes stale for the rest of that turn. As a safeguard, a new MCP server is built on each handler invocation, so the next turn picks up the correct context.
- **Non-Telegram platforms**: `src/handlers/platform-message.ts` (Slack/Discord/WhatsApp) doesn't pass `alvinDispatchContext` yet. Deferred to follow-up — the Telegram path is the primary use case and the one the user explicitly requested.
- **Parallel dispatch not smoke-tested**: the system prompt guides Claude to call `dispatch_agent` multiple times in one turn for parallel work, but I only end-to-end tested single dispatch. Should work (no shared state in the handler), but YMMV until battle-tested.

### Testing

- **Baseline**: 447 tests (v4.12.4)
- **New**:
  - `test/alvin-dispatch.test.ts` — 6 tests (spawn flags, unique IDs, watcher registration, session counter, stdio redirect, env cleanup)
  - `test/async-agent-parser-streamjson.test.ts` — 7 tests (result-event detection, token extraction, error state, running state, multi-text aggregation, `result.result` precedence, minimal fields)
- **Total**: 460 tests, all green, TSC clean
- **Real-world smoke tests** (NOT in CI — run via `node scripts/smoke-test-dispatch.mjs` and `node scripts/smoke-test-abort-survival.mjs`):
  - `smoke-test-dispatch`: dispatches a real `claude -p` subprocess, polls to completion (~10s), verifies exact output `"SMOKE_TEST_OK_v4.13"`. **PASS**.
  - `smoke-test-abort-survival`: dispatches a subprocess that needs ~25s of work, kills the parent Node process ~100ms later, polls the output file. Subprocess survives and completes cleanly. **PASS**.

### Files changed

- **NEW**: `src/services/alvin-dispatch.ts`, `src/services/alvin-mcp-tools.ts`, `scripts/smoke-test-dispatch.mjs`, `scripts/smoke-test-abort-survival.mjs`
- **NEW tests**: `test/alvin-dispatch.test.ts`, `test/async-agent-parser-streamjson.test.ts`
- **Modified**: `src/paths.ts` (SUBAGENTS_DIR), `src/services/async-agent-parser.ts` (stream-json detection), `src/providers/claude-sdk-provider.ts` (MCP server registration + allowedTools), `src/providers/types.ts` (QueryOptions.alvinDispatchContext), `src/handlers/message.ts` (pass dispatch context), `src/services/personality.ts` (BACKGROUND_SUBAGENT_HINT rewrite)
- **Version**: `package.json` 4.12.4 → 4.13.0 (minor bump — new public surface: MCP tool)

---

## [4.12.4] — 2026-04-16

### 🐛 Patch: recover partial output from interrupted background sub-agents

**The bug the maintainer saw:** Two Telegram messages appeared hours apart: `⏱️ Background agent a5bf8c74 timeout · 720m 3s · 0 in / 0 out` and `... ab9372d4 timeout · 720m 1s · 0 in / 0 out`, both with `(empty output)`. Three more agents were still pending, all interrupted mid-execution with hundreds of KB of real work sitting on disk.

**Root cause:** v4.12.3's bypass-abort calls `session.abortController.abort()`, which propagates through `claude-sdk-provider.ts`'s `internalAbortController` into the SDK's CLI subprocess, which in turn propagates into any in-flight `Agent(run_in_background: true)` tool executions. Evidence from the disk:

- `agent-a03ce829...jsonl`: 116 lines, last event = literally `"[Request interrupted by user for tool use]"` mid-Bash-tool-use
- `agent-af61fa6e...jsonl`: 81 lines, last assistant text = `"Ich habe jetzt genug Daten für den vollständigen Audit. Hier ist der Report:"` — interrupted while streaming the final report
- `agent-ac47c4a2...jsonl`: 131 lines, last assistant text = `"## Audit Report — Ergebnis\n### Kritische Bugs"` — interrupted a few words into the payoff

None of them reached `stop_reason: "end_turn"`. The pre-v4.12.4 `parseOutputFileStatus` only recognized `end_turn` as a completion signal, so these agents sat in the pending list for 12h until `giveUpAt` elapsed, then got delivered as `(empty output)` while their real work was still on disk.

**The fix:** `parseOutputFileStatus` now has a staleness fallback. When no `end_turn` is present BUT the outputFile hasn't been written to in `stalenessMs` (default 5 min, configurable via `ALVIN_SUBAGENT_STALENESS_MS`) AND there is usable assistant text content in the tail, the parser:

1. Aggregates ALL text blocks across all assistant turns in the tail (not just the last one — bias toward delivering more context)
2. Prepends a clear banner: `⚠️ _Sub-Agent wurde unterbrochen — hier ist der partielle Output:_`
3. Returns `state: "completed"` so the watcher delivers it instead of continuing to poll

Result: on the next `pollOnce()` after v4.12.4 ships, the three stuck agents get delivered with their real partial output (combined ~1.2MB of text across the three). Future interrupts recover within 5 minutes instead of hanging 12 hours.
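
The decision logic of the staleness fallback can be sketched as a pure function. This is an illustration under assumptions, not the code in `async-agent-parser.ts`: the event shape is reduced to what the example needs, `statusWithStaleness` and `StalenessInput` are hypothetical names, and the banner is passed in as a parameter so the sketch stays independent of the exact wording.

```typescript
interface AssistantEvent {
  type?: string;
  message?: {
    stop_reason?: string | null;
    content?: Array<{ type: string; text?: string }>;
  };
}

type Status = { state: "running" } | { state: "completed"; output: string };

interface StalenessInput {
  events: AssistantEvent[]; // already-parsed tail of the output file
  lastWriteMs: number; // mtime of the output file
  nowMs: number;
  stalenessMs: number; // 0 disables the fallback
  banner: string; // prefix marking the output as partial
}

// Join all text blocks; thinking and tool_use blocks are dropped.
function collectText(events: AssistantEvent[]): string {
  return events
    .filter((e) => e.type === "assistant")
    .flatMap((e) => e.message?.content ?? [])
    .filter((b) => b.type === "text" && !!b.text)
    .map((b) => b.text as string)
    .join("\n\n");
}

function statusWithStaleness(input: StalenessInput): Status {
  const { events, lastWriteMs, nowMs, stalenessMs, banner } = input;

  // Strict path first: a clean end_turn always wins.
  const finalEvent = events.find((e) => e.message?.stop_reason === "end_turn");
  if (finalEvent) return { state: "completed", output: collectText([finalEvent]) };

  // Fallback: the file has gone quiet and there is text worth delivering.
  const stale = stalenessMs > 0 && nowMs - lastWriteMs >= stalenessMs;
  const partial = collectText(events);
  if (stale && partial.length > 0) {
    return { state: "completed", output: `${banner}\n\n${partial}` };
  }
  return { state: "running" };
}
```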

### Behavioral notes

- **Clean `end_turn` sub-agents are unchanged** — the staleness fallback is a *fallback only*. The existing strict path runs first and takes precedence.
- **`stalenessMs: 0` disables the fallback entirely** — strict end_turn-only mode for callers that prefer it.
- **Thinking blocks are still filtered out** of the partial delivery — same as with clean completion.
- **Files with no assistant text at all** (only tool_use) stay in `running` state — nothing useful to deliver.
- **Tokens are surfaced when available** — the last assistant event's `usage.input_tokens`/`output_tokens` flow through to the delivery banner.

### Known limitations (carried over from v4.12.3, deferred to v4.13)

- The bypass-abort mechanism in `message.ts` still propagates to the SDK subprocess and kills in-flight sub-agents. v4.12.4 works around this at the delivery layer (recovering partial output); a true fix requires either architectural replacement of the SDK's `Task` tool with our own detached-subprocess dispatch, or SDK support for per-task-branch abort signals. Tracked for v4.13.
- Users may still experience the bot's "typing…" indicator when Claude is thinking in the main turn (before dispatching any background agent). Bypass only fires once `pendingBackgroundCount > 0`. For interrupt before dispatch, use `/cancel`.

### Testing

- **Baseline**: 436 tests (v4.12.3)
- **New**: `test/async-agent-parser-staleness.test.ts` — 11 tests covering: clean `end_turn` still wins over staleness, fresh-interrupted file stays running, stale-interrupted file delivers partial with banner, no-text file stays running, `stalenessMs: 0` disables, aggregation across multiple turns, thinking-block filtering, token extraction, interrupt-only file with no useful content, and ordering preservation.
- **Total**: 447 tests, all green, TSC clean.

### Files changed

- **Modified**: `src/services/async-agent-parser.ts` — staleness fallback in `parseOutputFileStatus`, `DEFAULT_STALENESS_MS` constant, `INTERRUPTED_BANNER` prefix.
- **NEW tests**: `test/async-agent-parser-staleness.test.ts`.
- **Version**: `package.json` 4.12.3 → 4.12.4.

---

## [4.12.3] — 2026-04-15

### 🐛 Patch: Background sub-agent no longer blocks the main Telegram session

**The bug the maintainer reported:** After launching an async sub-agent (`run_in_background: true`), sending any follow-up message to the bot silently stalled for 2+ minutes before being processed. v4.12.1/v4.12.2 attempted a prompt-hint mitigation but did NOT address the architectural root cause.

**Root cause (re-diagnosed with live SDK event logs):** The Claude Agent SDK's CLI subprocess stays alive for the full duration of a background task so it can inject the `<task-notification>` inline into the NEXT assistant turn. While that subprocess idles, Alvin's query iterator is still being drained, `session.isProcessing` stays `true`, and every new user message gets pushed into the 3-slot queue — which doesn't auto-drain. From the user's perspective: send "A" → nothing happens for 2 minutes.

**The fix (architectural workaround):** New session field `pendingBackgroundCount` tracks the number of background agents currently in-flight. When a new message arrives while `isProcessing=true` AND the counter is `>0`, the handler:

1. **Aborts the blocked query** instead of queueing. The old SDK subprocess dies; the background task's own detached subprocess keeps writing to its `output_file`.
2. **Starts a fresh SDK session** (`resume: null`) for the new message so it doesn't inherit the block. Recent conversation history is carried forward via the bridge preamble so Claude retains context.
3. **Relies on the existing `async-agent-watcher` (v4.10.0)** to poll the background task's `output_file` and deliver the result as a separate Telegram message via `subagent-delivery.ts`. The watcher decrements the counter when it delivers, so subsequent messages go back to normal SDK-resume behavior.

**Net effect:** Sending "A" during a 5-minute research task now gets processed in ~200ms instead of after 5 minutes. The background research still delivers its result via a separate message when ready.

### Technical details

**New module** `src/handlers/background-bypass.ts` — pure state-machine helpers:
- `shouldBypassQueue(state)` — returns true when `isProcessing=true`, `pendingBackgroundCount>0`, and an unaborted `abortController` exists
- `shouldBypassSdkResume(state)` — returns true when `pendingBackgroundCount>0`, signalling the next query should pass `sessionId=null`
- `waitUntilProcessingFalse(session, timeoutMs, tickMs)` — poll-waits for the old handler's `finally` block to flip the flag before the new query starts
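
The three helpers are small enough to sketch in full. The state shape and the boolean return value of `waitUntilProcessingFalse` are assumptions made for this example; the source only states what each helper checks.

```typescript
// Assumed subset of the session state that the helpers inspect.
interface BypassState {
  isProcessing: boolean;
  pendingBackgroundCount: number;
  abortController: AbortController | null;
}

function shouldBypassQueue(state: BypassState): boolean {
  return (
    state.isProcessing &&
    state.pendingBackgroundCount > 0 &&
    state.abortController !== null &&
    !state.abortController.signal.aborted
  );
}

function shouldBypassSdkResume(state: Pick<BypassState, "pendingBackgroundCount">): boolean {
  return state.pendingBackgroundCount > 0;
}

// Resolves true once the flag clears, false if the timeout elapses first.
async function waitUntilProcessingFalse(
  session: { isProcessing: boolean },
  timeoutMs: number,
  tickMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (session.isProcessing) {
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, tickMs));
  }
  return true;
}
```

Because the first two helpers are pure functions of the session state, they can be tested without a Telegram or SDK mock.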

**`src/services/session.ts`** — new field `pendingBackgroundCount: number` (default 0, reset on `/new`). Not persisted across restarts — the watcher re-hydrates its own state file and delivery still works, and starting a fresh counter after restart avoids stale drift.

**`src/services/async-agent-watcher.ts`** — `PendingAsyncAgent` gets an optional `sessionKey` field. On every delivery path (completed/failed/timeout), a new `decrementPendingCount(sessionKey)` helper clamps the counter at 0 using `Math.max`. Missing/unknown session keys are a no-op (backwards compatible with pre-v4.12.3 persisted state files).
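
A minimal sketch of the clamped decrement, assuming a simple in-memory map of sessions. The `sessions` map is hypothetical and stands in for the real session store.

```typescript
// Hypothetical in-memory session store keyed by sessionKey.
const sessions = new Map<string, { pendingBackgroundCount: number }>();

function decrementPendingCount(sessionKey: string | undefined): void {
  if (!sessionKey) return; // records persisted before v4.12.3 carry no key
  const session = sessions.get(sessionKey);
  if (!session) return; // unknown session: nothing to update
  // Clamp at 0 so a duplicate delivery can never drive the counter negative.
  session.pendingBackgroundCount = Math.max(0, session.pendingBackgroundCount - 1);
}
```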

**`src/handlers/async-agent-chunk-handler.ts`** — `TurnContext` gets `sessionKey`. When `registerPendingAgent` is called, the counter is incremented in the same function.

**`src/handlers/message.ts`** (Telegram):
- Computes `sessionKey` once at the top of the handler and passes it everywhere
- `if (session.isProcessing)` branch now checks `shouldBypassQueue` first — if true, aborts + waits for cleanup + falls through to process the new message. If false, queues as before.
- When queueing, the handler now sends a text reply (`"⏳ Eine Anfrage läuft gerade. Deine Nachricht ist in der Warteschlange..."`) in addition to the 📝 reaction, so the user sees what happened (reactions alone were too subtle)
- New `bypassResume` variable controls whether `queryOpts.sessionId` is `null` (fresh session) or `session.sessionId` (normal resume)
- Bridge preamble now has two modes: the existing "SDK recovery" mode that bridges fallback turns, plus a new "bypass" mode that bridges the last 10 turns when starting a fresh session mid-conversation
- New `_bypassAbortFired` session flag + `bypassAborted` local flag ensure that the old handler silently absorbs the abort error instead of showing a confusing "request cancelled" reply, and the fresh handler's finalize/broadcast/👍 reaction path is skipped for the aborted turn

### Known limitations

- **Platform coverage**: bypass path is Telegram-only in v4.12.3. Slack/Discord/WhatsApp handlers (`src/handlers/platform-message.ts`) don't currently handle `tool_result` chunks at all, so async agents can't be registered on those platforms. That's a pre-existing limitation that will be fixed in a future release.
- **SDK behavior dependency**: the fix assumes the background task's own subprocess is detached from the parent SDK query's `AbortController`. Empirically this holds (the watcher delivers results even after bypass-abort), but if a future SDK release changes this we'd need to either stop using `run_in_background` and rely on a pure Alvin-side background dispatch (bigger change) or add a targeted `process.kill` for the parent only, keeping the child alive.
- **Restart mid-flight**: if the bot restarts while a background agent is pending, the session's counter starts at 0 on restart. The watcher re-hydrates its own state file and still delivers the result correctly, but the session's "is this blocked?" signal is lost, so the first post-restart message might use SDK resume on the old (possibly blocked) session ID. This is a minor cosmetic issue and does not cause data loss.

### Testing

- **Baseline**: 396 tests (v4.12.2)
- **New tests**: +40
  - `test/session-pending-background.test.ts` — 4 tests (counter wiring, reset, clamp)
  - `test/watcher-pending-count.test.ts` — 6 tests (decrement on delivery/timeout/failure, missing sessionKey, multi-agent)
  - `test/async-agent-chunk-flow.test.ts` — +3 tests (sessionKey propagation, counter stacking, non-async no-op)
  - `test/background-bypass.test.ts` — 12 tests (pure helpers: shouldBypassQueue, shouldBypassSdkResume, waitUntilProcessingFalse)
  - `test/background-bypass-integration.test.ts` — 6 tests (full lifecycle, stress, session isolation)
  - `test/background-bypass-stress.test.ts` — 9 tests (100 parallel sessions, 200 churn cycles, extreme drift, /new during pending, ephemeral session, mixed rollout, timing edge cases, high load 50×4 agents)
- **Total**: 436 tests, all green, TSC clean

### Files changed

- **NEW**: `src/handlers/background-bypass.ts`
- **NEW tests**: `test/session-pending-background.test.ts`, `test/watcher-pending-count.test.ts`, `test/background-bypass.test.ts`, `test/background-bypass-integration.test.ts`, `test/background-bypass-stress.test.ts`
- **Modified**: `src/handlers/message.ts` (bypass wiring + visible queue reply), `src/handlers/async-agent-chunk-handler.ts` (sessionKey + counter increment), `src/services/async-agent-watcher.ts` (sessionKey in PendingAsyncAgent + decrement on delivery), `src/services/session.ts` (pendingBackgroundCount field + _bypassAbortFired flag), `src/services/session-persistence.ts` (counter not persisted — reset on restart), `test/async-agent-chunk-flow.test.ts` (new assertions)
- **Version**: `package.json` 4.12.2 → 4.12.3

---

## [4.12.2] — 2026-04-15

### 🔒 Security patch: file permissions, ALLOWED_USERS hard-fail, exec-guard hardening, CVE updates

This is the first **formal security release** of Alvin Bot, motivated by a comprehensive audit after v4.12.1 production deployment. The audit surfaced real issues that needed fixing before the bot could be safely installed on multi-user dev servers or shared by external users. All fixes are additive and backwards-compatible — existing single-user installs see no behavior change except improved security.

#### CRITICAL CVE — axios 1.14.0 → 1.15.0 (CVSS 10.0)

Transitive dependency via `@slack/bolt`. Two CVEs closed:
- GHSA-fvcv-3m26-pcqx — Cloud Metadata Exfiltration via Header Injection Chain (CVSS 10.0)
- GHSA-3p68-rc4w-qgx5 — NO_PROXY Hostname Normalization Bypass → SSRF

Fix: `npm update @slack/bolt` (4.6.0 → 4.7.0) + `package.json overrides: axios ^1.15.0` to force transitive updates in `@slack/web-api` and `@whiskeysockets/baileys`. Post-fix `npm audit` shows **0 critical, 2 high remaining** (`basic-ftp` HIGH — never invoked by Alvin, `electron` HIGH — devDep only, tracked as Phase 18).

Also updated `@anthropic-ai/claude-agent-sdk` 0.2.97 → 0.2.109 (MODERATE: GHSA-5474-4w2j-mq4c Path Validation Sandbox Escape).

#### CRITICAL — File permissions on sensitive files (0o600)

Before v4.12.2, `~/.alvin-bot/.env`, `state/sessions.json`, memory logs, and `cron-jobs.json` were written with default permissions, typically 0o644 on Linux/macOS under the usual umask. Any other user on the same machine could therefore read BOT_TOKEN and all API keys, the full conversation history, cron prompts, and encrypted sudo credentials.

**Fix**: new `src/services/file-permissions.ts` with `writeSecure()`, `ensureSecureMode()`, `auditSensitiveFiles()`. All `.env` writes in setup-api, doctor-api, server, fallback-order, session-persistence now use `writeSecure()`. Startup audit in `index.ts` chmod-repairs the full sensitive-file list idempotently on every boot.
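
A minimal sketch of what such helpers might look like, using only Node's `fs` API. The signatures and the boolean return value of `ensureSecureMode` are assumptions for illustration, not the actual contents of `file-permissions.ts`.

```typescript
import { chmodSync, mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const SECURE_MODE = 0o600; // owner read/write only

function writeSecure(path: string, data: string): void {
  // The `mode` option only applies when the file is created...
  writeFileSync(path, data, { mode: SECURE_MODE });
  // ...so chmod explicitly to tighten files that already existed.
  chmodSync(path, SECURE_MODE);
}

// Returns true when the mode had to be repaired.
function ensureSecureMode(path: string): boolean {
  const current = statSync(path).mode & 0o777;
  if (current === SECURE_MODE) return false;
  chmodSync(path, SECURE_MODE);
  return true;
}

// Usage: write into a throwaway directory and print the resulting mode in octal.
const demoFile = join(mkdtempSync(join(tmpdir(), "alvin-demo-")), ".env");
writeSecure(demoFile, "BOT_TOKEN=example\n");
console.log((statSync(demoFile).mode & 0o777).toString(8));
```

The explicit `chmodSync` matters for upgrades: files created by earlier versions keep their old mode unless something repairs them.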

#### CRITICAL — ALLOWED_USERS startup hard-fail

Before v4.12.2, Alvin would start with BOT_TOKEN set and ALLOWED_USERS empty, emitting only a `console.warn`. That left the bot "configured but unguarded".

**Fix**: new pure gate function `src/services/allowed-users-gate.ts`. `src/index.ts` refuses to start with a clear error message. Two explicit escape hatches: `AUTH_MODE=open` or `ALVIN_INSECURE_ACKNOWLEDGED=1`.
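
The gate logic could look roughly like this. The function name, the result shape, and the assumption that `ALLOWED_USERS` is a comma-separated list are illustrative; only the environment variable names come from the notes above.

```typescript
type GateResult = { ok: true } | { ok: false; reason: string };

function checkAllowedUsersGate(env: Record<string, string | undefined>): GateResult {
  const hasToken = (env["BOT_TOKEN"] ?? "").trim() !== "";
  const allowedUsers = (env["ALLOWED_USERS"] ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id !== "");

  // Nothing to guard without a token; a non-empty allowlist is the normal case.
  if (!hasToken || allowedUsers.length > 0) return { ok: true };

  // Explicit escape hatches for operators who really want an open bot.
  if (env["AUTH_MODE"] === "open" || env["ALVIN_INSECURE_ACKNOWLEDGED"] === "1") {
    return { ok: true };
  }

  return {
    ok: false,
    reason: "BOT_TOKEN is set but ALLOWED_USERS is empty; refusing to start unguarded.",
  };
}
```

Taking the environment as a parameter instead of reading `process.env` directly is what keeps the function pure and easy to test.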

#### HIGH — Webhook bearer token timing-safe comparison

`POST /api/webhook` in `src/web/server.ts` previously used a naive `authHeader !== "Bearer " + token` check, which leaks the position of the first mismatching character through a timing side channel.

**Fix**: new `src/services/timing-safe-bearer.ts` wraps `crypto.timingSafeEqual` with strict "Bearer <token>" format, empty-expected rejection, length-mismatch dummy comparison.
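
A sketch of the comparison described above. `isValidBearer` is a hypothetical name; `crypto.timingSafeEqual` is the real Node API, which throws when the two buffers differ in length, hence the separate length branch.

```typescript
import { timingSafeEqual } from "node:crypto";

function isValidBearer(authHeader: string | undefined, expectedToken: string): boolean {
  if (expectedToken.length === 0) return false; // never accept an unset secret

  const prefix = "Bearer ";
  if (!authHeader || !authHeader.startsWith(prefix)) return false;

  const given = Buffer.from(authHeader.slice(prefix.length), "utf8");
  const expected = Buffer.from(expectedToken, "utf8");

  if (given.length !== expected.length) {
    // Compare the expected value against itself so the rejection path
    // still performs a comparison, then reject.
    timingSafeEqual(expected, expected);
    return false;
  }
  return timingSafeEqual(given, expected);
}
```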

#### HIGH — Exec-guard shell metacharacter rejection

`checkExecAllowed()` only inspected the first word — `echo safe; rm -rf /` passed as "echo". Trivially bypassable via `&&`, `|`, `` ` ``, `$(...)`, redirects.

**Fix**: allowlist mode rejects any command containing `;`, `&`, `|`, `` ` ``, `$(...)`, `{...}`, `<`, `>`. Operators who need shell pipelines set `EXEC_SECURITY=full` explicitly.
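
The metacharacter check can be sketched with a single regular expression. The signature shown here (an explicit allowlist argument and a result object) is an assumption for the example and may differ from the real `checkExecAllowed()`.

```typescript
// Characters and sequences treated as shell metacharacters in this sketch.
const SHELL_METACHARS = /[;&|`<>{}]|\$\(/;

type ExecDecision = { allowed: true } | { allowed: false; reason: string };

function checkExecAllowed(command: string, allowlist: ReadonlySet<string>): ExecDecision {
  // Reject before looking at the binary: anything after a metacharacter
  // would otherwise ride along unchecked.
  if (SHELL_METACHARS.test(command)) {
    return { allowed: false, reason: "shell metacharacters are not permitted in allowlist mode" };
  }
  const binary = command.trim().split(/\s+/)[0] ?? "";
  if (!allowlist.has(binary)) {
    return { allowed: false, reason: `"${binary}" is not on the allowlist` };
  }
  return { allowed: true };
}
```

With an allowlist containing `echo`, the command `echo safe` passes while `echo safe; rm -rf /` is rejected at the metacharacter check.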

#### HIGH — Cron shell-job execGuard integration

Before v4.12.2, cron jobs with `type: "shell"` bypassed the exec-guard entirely. **Fix**: the `"shell"` case in `cron.ts` now calls `checkExecAllowed()` before `execSync()` and sends a blocked notification when the command is denied.

#### MEDIUM — Sub-agent toolset allowlist (readonly, research)

`SubAgentConfig.toolset` widened from `"full"` to `"full" | "readonly" | "research"`:
- `readonly` → Read, Glob, Grep only (no write, shell, network)
- `research` → readonly + WebSearch, WebFetch
- `full` → unchanged default

New `QueryOptions.allowedTools?: string[]` honored by `claude-sdk-provider`. Other providers ignore it.
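
One way to map the presets onto `allowedTools`, shown as a sketch. `allowedToolsFor` is a hypothetical helper, and returning `undefined` for `full` (so the provider keeps its default list) is an assumption about how the default is expressed.

```typescript
type Toolset = "full" | "readonly" | "research";

const READONLY_TOOLS = ["Read", "Glob", "Grep"] as const;
const RESEARCH_TOOLS = [...READONLY_TOOLS, "WebSearch", "WebFetch"] as const;

// Returns undefined for "full" so the provider keeps its default tool list.
function allowedToolsFor(toolset: Toolset): string[] | undefined {
  switch (toolset) {
    case "readonly":
      return [...READONLY_TOOLS];
    case "research":
      return [...RESEARCH_TOOLS];
    case "full":
      return undefined;
  }
}
```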

#### NEW — `docs/security.md` threat model + hardening guide (279 lines)

First formal security documentation covering: TL;DR safety table, capability surface, attacker model, trust boundaries, hardening step-by-step, shell execution policy, file permissions list, sub-agent presets, prompt injection honesty section, Phase 18 pending work, security issue reporting, incident response playbook. Public doc, shipped with the repo.

#### NEW — README Security section rewrite

Replaced thin bullet list with a boxed warning ("Alvin has full shell + filesystem access") and four sub-sections: access control, execution hardening, data hardening, known limitations. Links to docs/security.md.

#### Testing

**396 tests total** (350 baseline from v4.12.1 + 46 new). All green. Build clean.

- 10 `test/file-permissions.test.ts`
- 7 `test/allowed-users-gate.test.ts`
- 10 `test/timing-safe-bearer.test.ts`
- 13 `test/exec-guard-metachars.test.ts`
- 4 `test/subagent-toolset-allowlist.test.ts`
- 2 extended `test/subagents-toolset.test.ts` (readonly + research)

#### Phase 18 (deferred, tracked in README Roadmap)

- Electron 35 → 41+ upgrade (Desktop build, 6 CVEs)
- Prompt injection defense strategy (design debate, not code filter)
- TypeScript 5 → 6 upgrade
- MCP plugin sandboxing (architectural v5.0)

---

## [4.12.1] — 2026-04-15

### 🐛 Patch: Sync sub-agent timeout + workspace command menu

Three issues from v4.12.0 production use, fixed:

- **Fix (Bug 1)**: `Task`/`Agent` tool calls without `run_in_background: true` were false-aborted after 10 minutes. The Claude Agent SDK runs synchronous sub-agents entirely inside the tool call — the parent stream emits no intermediate chunks during that time, so the flat 10-minute stuck-timer fired on legitimate long-running work. The new task-aware stuck timer detects sync Task/Agent tool calls (tracked by `toolUseId`) and automatically escalates the idle timeout to 120 minutes (configurable via `ALVIN_SYNC_AGENT_IDLE_TIMEOUT_MINUTES`). Once the matching `tool_result` arrives, the timer reverts to the normal 10-minute idle detection for genuine SDK hangs.

- **Mitigation (Bug 2)**: The `BACKGROUND_SUBAGENT_HINT` in `src/services/personality.ts` was rewritten with `⚠️ CRITICAL` framing, a concrete decision-tree structure, an aggressive ~30 second threshold (down from "2 minutes"), and an explicit warning about the Telegram session-blocking consequence. The goal is to get Claude to reliably set `run_in_background: true` when sub-agents will take more than a few seconds, so the main Telegram session doesn't stay blocked while the sub-agent works. This is defense-in-depth on top of the Bug 1 fix — the timer prevents false aborts regardless of Claude's compliance; the strengthened hint reduces how often main-session blocking happens in the first place. Compliance is monitored empirically via logs.

- **Fix (Bug 3)**: `/workspace` and `/workspaces` were registered as Telegram command handlers in v4.12.0 but not added to the `bot.api.setMyCommands` array, so they didn't appear in Telegram's auto-complete menu (the list that pops up when you type `/`). Added both, plus a new "🧭 Workspaces" block in the `/help` text.

#### Architecture details

**NEW `src/handlers/stuck-timer.ts`**: Pure state machine `createStuckTimer({normalMs, extendedMs, onTimeout})` returning `{reset, enterSync, exitSync, cancel}`. Testable in isolation without grammy/session/provider mocks via `vi.useFakeTimers()`. 8 unit tests cover normal fire, enterSync extends, exitSync returns, multi-pending, unknown-id no-op, cancel, reset-while-extended, idempotent enterSync.
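
A compact sketch of a timer with this interface, assuming that `enterSync`/`exitSync` only update the pending set and that the caller re-arms the timer with `reset()` afterwards. It is an illustration of the idea, not the shipped module.

```typescript
interface StuckTimerOptions {
  normalMs: number;
  extendedMs: number;
  onTimeout: () => void;
}

function createStuckTimer({ normalMs, extendedMs, onTimeout }: StuckTimerOptions) {
  const pendingSync = new Set<string>(); // toolUseIds of running sync tasks
  let handle: ReturnType<typeof setTimeout> | undefined;

  const cancel = (): void => {
    if (handle !== undefined) clearTimeout(handle);
    handle = undefined;
  };

  const reset = (): void => {
    cancel();
    // Use the extended window while at least one sync Task/Agent call is pending.
    handle = setTimeout(onTimeout, pendingSync.size > 0 ? extendedMs : normalMs);
  };

  return {
    reset,
    cancel,
    // A Set makes repeated enterSync calls idempotent and unknown ids a no-op.
    enterSync: (toolUseId: string): void => {
      pendingSync.add(toolUseId);
    },
    exitSync: (toolUseId: string): void => {
      pendingSync.delete(toolUseId);
    },
  };
}
```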

**Protocol change in `src/providers/types.ts` + `claude-sdk-provider.ts`**: `StreamChunk` gains a new additive optional field `runInBackground?: boolean`. The provider extracts it from `block.input.run_in_background` **before** the existing 500-char JSON truncation on `toolInput` — this is load-bearing because for long prompts the serialized input can exceed 500 chars, and naive post-truncation parsing would lose the flag and misclassify sync tasks as async. `toolUseId` is now also yielded on `tool_use` chunks (previously only on `tool_result`) so the consumer can correlate tool_use → tool_result for sync tracking. 4 contract-pin tests mock `@anthropic-ai/claude-agent-sdk` with scripted assistant messages to verify the extraction logic.
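
The ordering issue is easy to reproduce in isolation. In this sketch the block and chunk shapes are simplified assumptions and `toToolUseChunk` is a hypothetical name; the point is only that the flag is read from the structured input before the serialized form is cut at 500 characters.

```typescript
const MAX_TOOL_INPUT_CHARS = 500;

// Simplified stand-ins for the SDK block and the provider's stream chunk.
interface ToolUseBlock {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface ToolUseChunk {
  type: "tool_use";
  toolUseId: string;
  toolName: string;
  toolInput: string; // truncated JSON, for display and logging only
  runInBackground?: boolean;
}

function toToolUseChunk(block: ToolUseBlock): ToolUseChunk {
  // Read the flag from the structured input first...
  const flag = block.input["run_in_background"];
  const runInBackground = typeof flag === "boolean" ? flag : undefined;
  // ...because the truncated string may no longer be valid JSON.
  const toolInput = JSON.stringify(block.input).slice(0, MAX_TOOL_INPUT_CHARS);
  return { type: "tool_use", toolUseId: block.id, toolName: block.name, toolInput, runInBackground };
}
```

With a 2,000-character prompt, the serialized `run_in_background` key falls outside the first 500 characters, so parsing `toolInput` after truncation would never see it.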

**Critical ordering in `message.ts`**: State mutation of the pending-sync-task set (`stuckTimer.enterSync` / `stuckTimer.exitSync`) happens **before** `stuckTimer.reset()` in the for-await loop, so the timer arms with the post-mutation state. Inline comment added documenting this invariant.

#### Known limitation (not fixed in v4.12.1)

A nanosecond-scale race in which the stuck timer fires at the same moment a `tool_result` arrives (fundamentally unfixable without `check-before-fire` semantics in `setTimeout`). With the 120-minute extended window, the race requires the `tool_result` to arrive at exactly 120:00:00.000, which is practically irrelevant. A proper fix would require rewriting the timer as a state machine with a pre-fire check; this is deferred to v4.13.0 if it ever matters.

#### Testing

**350 tests total** (330 baseline from v4.12.0 + 20 new). All green, TSC clean.

- 8 `test/stuck-timer.test.ts` — pure state-machine unit tests
- 4 `test/claude-sdk-tool-use-id.test.ts` — contract pins for `toolUseId` + `runInBackground` on tool_use chunks
- 3 new assertions in `test/system-prompt-background-hint.test.ts` (CRITICAL framing, Telegram blocking, 30-second threshold)
- 5 `test/sync-task-timeout.test.ts` — integration tests over realistic timing scales + regression guard for the pre-fix flat-timeout behavior

Live verification after release: local bot restart, Telegram `/` auto-complete shows `/workspace` + `/workspaces`, `curl https://api.telegram.org/bot$TOKEN/getMyCommands` returns the new entries.

#### Files changed

- **NEW**: `src/handlers/stuck-timer.ts`
- **NEW tests**: `test/stuck-timer.test.ts`, `test/claude-sdk-tool-use-id.test.ts`, `test/sync-task-timeout.test.ts`
- **Modified**: `src/providers/types.ts` (`StreamChunk.runInBackground`), `src/providers/claude-sdk-provider.ts` (extract `runInBackground` before truncation, yield `toolUseId` on tool_use), `src/handlers/message.ts` (`createStuckTimer` integration + task-aware flow), `src/services/personality.ts` (`BACKGROUND_SUBAGENT_HINT` rewrite), `src/handlers/commands.ts` (setMyCommands + `/help`), `test/system-prompt-background-hint.test.ts` (3 new assertions)

---

## [4.12.0] — 2026-04-13

### 🧭 Multi-Session + Slack Interface — parallel contexts, per-channel workspaces

A colleague's feature request the same day v4.11.0 shipped (translated from German): *"Multiple sessions and an interface via Slack, like in OpenClaw. You have several parallel sessions that don't know each other's context but each have a specific context and purpose of their own. They had access to the entire knowledge base (skills + memory) and could start their own agents when needed."*

The ultra-analysis revealed Alvin was already ~80% built for this: the Slack adapter existed (355 LOC with `@slack/bolt@4.6.0`), the platform abstraction was clean, `buildSessionKey()` already supported `per-channel` mode, `session.workingDir` was already per-session, sub-agents were already async and session-isolated (v4.10.0), and memory/skills were already globally shared. **The single blocker: one line in `platform-message.ts` that bypassed `buildSessionKey` with a naive `hashUserId(userId)`, collapsing every non-Telegram channel from the same user into one session.**

This release adds a thin workspace layer on top plus Slack polish. **No breaking changes** — if no workspaces are configured, pre-v4.12 behavior is preserved exactly.

#### P0 #1 — Session-Key Fix (`src/handlers/platform-message.ts`)

`handlePlatformMessage` now routes through `buildSessionKey(msg.platform, msg.chatId, msg.userId)` instead of `hashUserId(msg.userId)`. On Slack with `SESSION_MODE=per-channel`, each channel gets its own session. Cross-channel isolation is automatic.

`buildSessionKey` signature widened from `userId: number` to `userId: string | number` so Slack user IDs (`U01ABC...`) pass through unchanged.

**6 unit tests** covering per-channel / per-channel-peer / per-user modes, cross-channel isolation, cross-platform isolation, and backwards compat with numeric Telegram user IDs.
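
To illustrate how the session mode changes isolation, here is a sketch with a made-up key format. The real `buildSessionKey` takes `(platform, chatId, userId)` and presumably reads the mode from configuration; `sketchSessionKey`, the explicit `mode` parameter, the key layout, and the reading of `per-channel-peer` as "channel plus user" are all assumptions for this example.

```typescript
type SessionMode = "per-user" | "per-channel" | "per-channel-peer";

// Hypothetical key layout; only the isolation behaviour is the point here.
function sketchSessionKey(
  mode: SessionMode,
  platform: string,
  chatId: string | number,
  userId: string | number,
): string {
  switch (mode) {
    case "per-user":
      return `${platform}:user:${userId}`;
    case "per-channel":
      return `${platform}:chat:${chatId}`;
    case "per-channel-peer":
      return `${platform}:chat:${chatId}:user:${userId}`;
  }
}
```

Under `per-channel`, the same Slack user in two channels yields two different keys, which is the isolation the old `hashUserId(userId)` call collapsed.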

#### P0 #2 — Workspace Registry (`src/services/workspaces.ts`, NEW)

Loads `~/.alvin-bot/workspaces/*.md` markdown files with YAML frontmatter. Each workspace has: `name`, `purpose`, `cwd`, optional `color`/`emoji`, explicit `channels: []` array for ID-based mapping, and a markdown body that becomes the system prompt override.

Hot-reload via `fs.watch()` with 500 ms debounce — same pattern as `src/services/skills.ts`. Changes to workspace files are picked up without a bot restart.

Public API: `loadWorkspaces`, `reloadWorkspaces`, `listWorkspaces`, `getWorkspace`, `getDefaultWorkspace`, `matchWorkspaceForChannel`, `resolveWorkspaceOrDefault`, `initWorkspaces`, `startWorkspaceWatcher`, `stopWorkspaceWatcher`.

**13 unit tests** covering default fallback, single/multi-workspace load, `~` expansion in cwd, channel-ID match, channel-name match, hot-reload, non-`.md` file skipping, malformed frontmatter resilience, missing directory graceful handling.
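The lookup order implied by the tests (explicit channel ID first, channel name as a fallback, default workspace last) can be sketched as follows. The `Workspace` shape and the extra `workspaces` / `globalCwd` parameters are assumptions; the real functions read from the loaded registry.

```typescript
// Hypothetical sketch: field names follow the frontmatter keys listed above.
interface Workspace {
  name: string;
  purpose: string;
  cwd: string;
  channels: string[]; // explicit channel IDs
  persona: string; // markdown body, used as the system prompt override
}

function matchWorkspaceForChannel(
  workspaces: Workspace[],
  channelId: string,
  channelName?: string,
): Workspace | null {
  // 1. An explicit ID mapping wins.
  const byId = workspaces.find((w) => w.channels.includes(channelId));
  if (byId) return byId;
  // 2. Fall back to the channel name ("#my-project" matches workspace "my-project").
  if (channelName) {
    const normalized = channelName.replace(/^#/, "").toLowerCase();
    return workspaces.find((w) => w.name.toLowerCase() === normalized) ?? null;
  }
  return null;
}

function resolveWorkspaceOrDefault(
  workspaces: Workspace[],
  channelId: string,
  channelName: string | undefined,
  globalCwd: string,
): Workspace {
  // The default workspace has an empty persona and the global cwd.
  return (
    matchWorkspaceForChannel(workspaces, channelId, channelName) ?? {
      name: "default",
      purpose: "",
      cwd: globalCwd,
      channels: [],
      persona: "",
    }
  );
}
```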

#### P0 #3 — Workspace Resolver Integration (`src/handlers/platform-message.ts`, `src/handlers/message.ts`)

Both the platform handler (Slack/Discord/WhatsApp) and the Telegram main handler now resolve the incoming message to a workspace before building the system prompt. If the session's `workspaceName` changed vs. the previous turn, `workingDir` is updated and persisted via `session-persistence` (v4.11.0).

`buildSystemPrompt` and `buildSmartSystemPrompt` gained a new optional `workspacePersona` parameter that injects a `## Workspace Persona` section into the system prompt. Empty string = no-op (default workspace).

`UserSession` gained a new `workspaceName: string | null` field. Persisted across restarts via the new v2 envelope format in `sessions.json` (backwards compatible with v4.11 flat format — the loader auto-detects).

#### P0 #4 — Slack Setup Documentation (`docs/install/slack-setup.md`, `docs/install/slack-manifest.json`)

Step-by-step guide: create Slack App from manifest → Socket Mode → App-Level Token → Bot Token → `~/.alvin-bot/.env` → restart → invite bot → create workspace files. Covers troubleshooting for common issues. The `slack-manifest.json` is copy-paste-ready: pre-configured bot user, all required scopes, event subscriptions, Socket Mode enabled. Both files are gitignored (the maintainer's docs/install/ convention) and ship via GitHub Release assets.

#### P1 #1 — Slack Progress Ticker (`src/platforms/slack.ts`)

`SlackAdapter.sendText()` now returns the message `ts` so callers can hold on to it. New `SlackAdapter.editMessage(chatId, messageId, newText)` wraps `chat.update`. Fail-silent: if Slack API errors, the ticker degrades gracefully and the full message still arrives at query end.

`PlatformAdapter` interface: `sendText` return type widened from `void` to `string | void`, optional `editMessage` method added. Existing adapters (Telegram, WhatsApp, Discord, Signal) that don't implement `editMessage` are unaffected.

**3 unit tests** with mocked `@slack/bolt` covering `chat.update` call, `sendText` ts return, and graceful failure handling.

#### P1 #2 — Slack Typing Status + Channel Name Resolution (`src/platforms/slack.ts`)

`SlackAdapter.setTyping()` now calls `assistant.threads.setStatus` so Slack shows "Alvin is thinking…" under the message during long queries. Silently no-ops in channels where the assistant scope isn't granted.

New `SlackAdapter.getChannelName(channelId)` resolves + caches channel names via `conversations.info`. `platform-message.ts` detects this helper via duck-typing on the adapter and passes the resolved name to `resolveWorkspaceOrDefault` — enabling channel-name matching (`#my-project` → `workspaces/my-project.md`) without hardcoding the Slack type in the platform handler.

#### P1 #3 — Telegram `/workspace` + `/workspaces` Commands

Feature parity for Telegram. `/workspaces` lists all configured workspaces with emojis, purposes, and the active one marked ✅. `/workspace <name>` switches the active workspace for the Telegram user; next message uses the new persona and cwd. `/workspace default` resets.

New `session.ts` exports: `getTelegramWorkspace(userId)` / `setTelegramWorkspace(userId, name)` + a module-level `telegramWorkspaces` map persisted via a new v2 envelope format in `sessions.json` (backwards compatible with v4.11 flat format).

**5 new unit tests** covering getter/setter/null-clear, persistence roundtrip, and v4.11 flat-format backwards compat.

#### P1 #4 — Per-Workspace Cost Aggregation (`src/services/session.ts`)

New `getCostByWorkspace()` helper aggregates `session.totalCost` by `session.workspaceName` across all active sessions in memory. Returns per-workspace totals for cost, session count, message count, and tool use count. Used by the Web UI workspace cards.

Sessions with `workspaceName === null` aggregate under `"default"` in the breakdown.
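As a rough illustration of the aggregation, assuming each session exposes `totalCost`, `messageCount`, and `toolUseCount` (only `totalCost` and `workspaceName` are named above; the other field names and the explicit `sessions` argument are hypothetical):

```typescript
// Hypothetical sketch: the real helper reads the in-memory sessions Map directly.
interface SessionLike {
  workspaceName: string | null;
  totalCost: number;
  messageCount: number; // assumed field name
  toolUseCount: number; // assumed field name
}

interface WorkspaceCost {
  cost: number;
  sessions: number;
  messages: number;
  toolUses: number;
}

function getCostByWorkspace(sessions: SessionLike[]): Record<string, WorkspaceCost> {
  const totals: Record<string, WorkspaceCost> = {};
  for (const s of sessions) {
    // Sessions without a workspace are counted under "default".
    const key = s.workspaceName ?? "default";
    let entry = totals[key];
    if (!entry) {
      entry = { cost: 0, sessions: 0, messages: 0, toolUses: 0 };
      totals[key] = entry;
    }
    entry.cost += s.totalCost;
    entry.sessions += 1;
    entry.messages += s.messageCount;
    entry.toolUses += s.toolUseCount;
  }
  return totals;
}
```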

#### P1 #5 — Web UI Workspace Cards (`src/web/server.ts`, `web/public/index.html`, `web/public/js/app.js`)

New `GET /api/workspaces` endpoint returns the workspace registry merged with `getCostByWorkspace()`. Dashboard SPA gains a "🧭 Workspaces" page in the Data section of the sidebar (between Sessions and Files). Cards show emoji, name, purpose, cwd, channel mappings, session count, message count, and cumulative cost — color-coded via workspace frontmatter `color` field.

Default workspace is always included even when no user configs exist, so the UI always shows at least one card.

#### Architecture Decisions

- **Workspace is channel-scoped, not thread-scoped.** Slack channel = workspace. Threads within a channel are continuations of the same session.
- **Memory stays global.** All workspaces share `MEMORY.md`, the Hub memory, and the embeddings index.
- **Provider stays global.** Per-workspace provider override deferred to v4.13.
- **`@slack/bolt@^4.6.0`** is a regular dep, already in `package.json` from a previous branch.
- **Backwards compat is absolute.** If no workspaces exist, `resolveWorkspaceOrDefault` returns the default workspace with empty persona + global cwd. v4.11 flat-format `sessions.json` files still load without migration.
- **v2 envelope format**: `sessions.json` is now `{ version: 2, sessions: {...}, telegramWorkspaces: {...} }`. Loader auto-detects and handles both legacy flat format and new envelope.
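The auto-detection between the legacy flat file and the v2 envelope could look roughly like this. It is a sketch under the assumption that the flat format is a plain object keyed by session key; `normalizeSessionsFile` is a hypothetical name, not the loader's real one.

```typescript
// Hypothetical sketch of envelope auto-detection for sessions.json.
interface SessionsEnvelope {
  version: 2;
  sessions: Record<string, unknown>;
  telegramWorkspaces: Record<string, unknown>;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function normalizeSessionsFile(raw: unknown): SessionsEnvelope {
  // Null roots, arrays, and primitives load as zero sessions.
  if (!isPlainObject(raw)) {
    return { version: 2, sessions: {}, telegramWorkspaces: {} };
  }
  // New envelope: version marker plus a nested sessions object.
  if (raw["version"] === 2 && isPlainObject(raw["sessions"])) {
    const tw = raw["telegramWorkspaces"];
    return {
      version: 2,
      sessions: raw["sessions"],
      telegramWorkspaces: isPlainObject(tw) ? tw : {},
    };
  }
  // Legacy flat format: the whole object is the sessions map.
  return { version: 2, sessions: raw, telegramWorkspaces: {} };
}
```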

#### Testing

**330 tests total** (292 baseline from v4.11 + 38 new). All green. TSC clean.

- 6 platform-session-key unit tests
- 14 workspaces unit + integration tests
- 3 slack-progress-ticker tests (mocked @slack/bolt)
- 5 telegram-workspace-command tests
- 10 multi-session end-to-end stress tests

**Live verified** via `tmp/live-multi-session.mjs` probe against the real `dist/`: 5 parallel workspaces, 5 simulated Slack channels, full persistence roundtrip with v2 envelope, cost aggregation, hot-reload picking up new workspace files, channel-name fallback, telegramWorkspaces map persistence. **All 7 phases passed.**

#### Files changed

- **NEW code:** `src/services/workspaces.ts`
- **NEW tests:** `test/platform-session-key.test.ts`, `test/workspaces.test.ts`, `test/slack-progress-ticker.test.ts`, `test/telegram-workspace-command.test.ts`, `test/multi-session-stress.test.ts`
- **NEW docs (gitignored, in Release assets):** `docs/install/slack-setup.md`, `docs/install/slack-manifest.json`
- **Modified:** `src/handlers/platform-message.ts`, `src/handlers/message.ts`, `src/handlers/commands.ts`, `src/platforms/slack.ts`, `src/platforms/types.ts`, `src/services/session.ts`, `src/services/session-persistence.ts`, `src/services/personality.ts`, `src/paths.ts`, `src/index.ts`, `src/web/server.ts`, `web/public/index.html`, `web/public/js/app.js`
- **Plan:** `docs/superpowers/plans/2026-04-13-multi-session-slack.md`

---

## [4.11.0] — 2026-04-13

### 🧠 Memory Persistence + Smart Loading — sessions survive restart, memory is layered

A colleague asked the same day v4.10.0 shipped: *"Memory after session restart is also a bit fiddly. I installed mempalace as a workaround — maybe build something like that natively."* He was right. Alvin had a hand-curated `MEMORY.md`, a 128 MB embeddings vector index, and an AI-powered compaction service — but **the in-memory `sessions Map` was wiped on every bot restart**. Claude SDK then started a fresh conversation on the next user message, behaving like a goldfish despite all that memory infrastructure on disk.

This release fixes that with **five complementary tasks**, all bundled into v4.11.0. Three core fixes (P0) plus two structural improvements (P1) inspired by mempalace's L0–L3 stack and Mem0's auto-extraction pattern.

#### P0 #1 — Session Persistence (`src/services/session-persistence.ts`, NEW)

The core fix. The `sessions Map` in `src/services/session.ts` was in-memory only; every `launchctl kickstart` wiped every user's `sessionId`, history, language, effort, voiceReply, and tracking counters.

- **Debounced flush** (1.5 s coalesce window) writes a sanitized snapshot of `getAllSessions()` to `~/.alvin-bot/state/sessions.json` via atomic tmp+rename.
- **`loadPersistedSessions()`** rehydrates the Map at bot startup; `flushSessions()` flushes synchronously on graceful shutdown (SIGINT/SIGTERM).
- **`attachPersistHook()` / `markSessionDirty()`** in `session.ts` give handlers a callback to trigger persist after direct mutations (`/lang`, `/effort`, `/voice`). `addToHistory()` and `trackProviderUsage()` trigger it automatically.
- History is capped at `MAX_PERSISTED_HISTORY = 50` per session so the file stays small.
- Runtime-only fields (`abortController`, `isProcessing`, `messageQueue`) are stripped before persisting.
- Schema drift is handled: missing fields fall back to defaults; corrupt JSON loads zero sessions; null root rejected gracefully.
- **9 unit tests** + **18 stress tests** covering 100-session burst, 1000-mutate debounce coalescing, unicode (RTL/ZWJ/astral plane), atomic write recovery from stale `.tmp`, schema drift, hostile JSON, read-only filesystem, simulated bot restart.
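The sanitizing step before each flush (strip runtime-only fields, cap the history) might be sketched like this. `sanitizeForPersist` is a hypothetical name, and keeping the most recent 50 entries rather than the oldest is an assumption.

```typescript
// Hypothetical sketch of the snapshot sanitizer; not the actual implementation.
const MAX_PERSISTED_HISTORY = 50;
const RUNTIME_ONLY_FIELDS = ["abortController", "isProcessing", "messageQueue"];

function sanitizeForPersist(session: Record<string, unknown>): Record<string, unknown> {
  // Shallow copy so the live session object is never mutated.
  const copy: Record<string, unknown> = { ...session };
  for (const field of RUNTIME_ONLY_FIELDS) {
    delete copy[field];
  }
  const history = copy["history"];
  if (Array.isArray(history)) {
    // Assumption: the newest entries are the ones worth keeping.
    copy["history"] = history.slice(-MAX_PERSISTED_HISTORY);
  }
  return copy;
}
```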

#### P0 #2 — MEMORY.md Auto-Inject for SDK (`src/services/personality.ts`)

Before v4.11.0, only non-SDK providers (Groq, Gemini, NVIDIA) got `buildMemoryContext()` injected into their system prompt. The Claude SDK was *expected* to read memory files via tools, but in practice rarely did unless the user's first message specifically prompted it.

- Drops the `!isSDK` guard around `buildMemoryContext()` and asset-index injection.
- SDK now gets the same compact memory context (MEMORY.md + today + yesterday daily logs) at every turn — the same context non-SDK providers had since 4.0.
- **3 unit tests** verifying SDK includes the memory section, non-SDK regression, and graceful behavior when MEMORY.md is missing.

#### P0 #3 — Semantic Recall on SDK First Turn (`src/services/personality.ts`, `src/handlers/message.ts`, `src/handlers/platform-message.ts`)

`buildSmartSystemPrompt()` now accepts an `isFirstTurn` flag. For SDK providers it runs the embeddings-based `searchMemory()` only on the first turn (`session.sessionId === null` — meaning Claude hasn't given us a resume token yet for this session). After the first turn Claude carries the recalled context inside the SDK session via resume, so spamming the embeddings API on every subsequent turn is wasted work. Non-SDK providers still run the search on every turn (no resume mechanism).

- `handlers/message.ts` and `handlers/platform-message.ts` updated to compute `isFirstSDKTurn = isSDK && session.sessionId === null` and pass it through.
- The bare `buildSystemPrompt` calls on the SDK paths are gone — `buildSmartSystemPrompt` is the single entry point.
- **5 mocked-search tests** covering call-count semantics for SDK first/later turns, non-SDK every turn, missing `userMessage` skip, and graceful failure when `searchMemory` throws.
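The call-count rule those tests pin down reduces to a small predicate. The function name below is hypothetical; the logic follows the description above.

```typescript
// Hypothetical helper: decides whether the embeddings search should run this turn.
function shouldRunSemanticSearch(
  isSDK: boolean,
  sessionId: string | null,
  userMessage?: string,
): boolean {
  // Nothing to search for without a user message.
  if (!userMessage || userMessage.trim() === "") return false;
  // Non-SDK providers have no resume mechanism, so they search on every turn.
  if (!isSDK) return true;
  // SDK providers search only before a resume token exists (first turn).
  return sessionId === null;
}
```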

#### P1 #4 — Layered Memory Loader (`src/services/memory-layers.ts`, NEW)

Inspired by mempalace's L0–L3 stack. Replaces the monolithic `MEMORY.md → System Prompt` injection with a structured, token-budgeted layered loader:

- **L0** `~/.alvin-bot/memory/identity.md` — always loaded, ~200 tokens (core user facts: name, location, family, contact)
- **L1** `~/.alvin-bot/memory/preferences.md` — always loaded (communication style, do's and don'ts)
- **L1** `~/.alvin-bot/memory/MEMORY.md` — backwards-compat: existing curated knowledge (full content if no split files exist; truncated to 1500 chars when split files coexist)
- **L2** `~/.alvin-bot/memory/projects/*.md` — loaded only when the user's incoming query mentions the project topic (substring or first-200-char keyword overlap)
- **L3** daily logs — still handled by `embeddings.ts` vector search (unchanged)

The split is **opt-in**: if `identity.md` and `preferences.md` don't exist, the loader falls back to monolithic MEMORY.md exactly like before. No migration required for existing users. Users who want the cleaner layout can split MEMORY.md manually and the loader picks it up automatically. Token budget: L0+L1 capped at 5000 chars (~1300 tokens), L2 capped at 3000 chars total (~750 tokens, max 1500 per matched project file). New `query` parameter on `buildSystemPrompt()` and `buildMemoryContext()` propagates the user message all the way through. **9 unit tests** + 2 layered-context stress tests.

#### P1 #5 — Auto-Fact-Extraction in Compaction (`src/services/memory-extractor.ts`, NEW)

Inspired by Mem0's auto-extraction. When `compactSession()` archives old messages, it now runs an additional extraction pass that pulls structured facts (`user_facts`, `preferences`, `decisions`) out of the archived chunk via the active AI provider and appends them to MEMORY.md.

- **`parseExtractedFacts(text)`** — tolerates JSON wrapped in markdown code fences, surrounding prose, null/undefined fields, non-string entries.
- **`appendFactsToMemoryFile(facts)`** — exact-string dedup against existing MEMORY.md content, structured under `## Auto-extracted (YYYY-MM-DD)` header with `### User Facts` / `### Preferences` / `### Decisions` sub-sections.
- **`extractAndStoreFacts(chunk)`** — safe wrapper, never throws. Opt-out via `MEMORY_EXTRACTION_DISABLED=1` env var. Uses effort=low for cost minimization. Skips short input (<50 chars). Provider failures are swallowed; compaction always continues.
- Wired into `compactSession()` after the daily-log flush, before the AI summary generation.
- Marked **experimental** in v4.11.0. Semantic dedup (vs current exact-string match) deferred to v4.12+.
- **11 unit tests** covering JSON parsing edge cases, dedup, opt-out, short-input skip, garbage input, non-string filtering, graceful provider-failure handling.
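A tolerant parser along the lines of `parseExtractedFacts` can be sketched by slicing from the first `{` to the last `}` and filtering each field. This is one plausible approach under those assumptions, not necessarily the one the module uses.

```typescript
// Sketch only: shows one way to tolerate code fences and surrounding prose.
interface ExtractedFacts {
  user_facts: string[];
  preferences: string[];
  decisions: string[];
}

function onlyStrings(value: unknown): string[] {
  // Null, undefined, and non-array fields become empty lists; non-string entries are dropped.
  if (!Array.isArray(value)) return [];
  return value.filter((x): x is string => typeof x === "string" && x.trim() !== "");
}

function parseExtractedFacts(text: string): ExtractedFacts {
  const empty: ExtractedFacts = { user_facts: [], preferences: [], decisions: [] };
  // Ignore anything outside the outermost braces (fences, prose).
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return empty;
  let parsed: unknown;
  try {
    parsed = JSON.parse(text.slice(start, end + 1));
  } catch {
    return empty; // garbage input yields no facts instead of throwing
  }
  if (typeof parsed !== "object" || parsed === null) return empty;
  const obj = parsed as Record<string, unknown>;
  return {
    user_facts: onlyStrings(obj["user_facts"]),
    preferences: onlyStrings(obj["preferences"]),
    decisions: onlyStrings(obj["decisions"]),
  };
}
```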

#### Architecture decisions

- **mempalace as MCP server: rejected.** Considered installing mempalace as a Python MCP service. Rejected because (1) Alvin is all-TypeScript and adding a 2nd Python service to launchd is operational complexity, (2) Alvin already has an embeddings vector index — mempalace would be a parallel duplicate, (3) mempalace's MCP tools are only consumed by the SDK; cron jobs, sub-agents, and non-SDK providers wouldn't see them. Conclusion: **adopt the patterns natively** (L0–L3 layering, AAAK-style structured extraction) rather than running a second service.
- **SQLite migration deferred.** The 128 MB JSON embeddings index is a known performance issue and is already noted in `~/.claude/projects/-Users-alvin-de/memory/project_alvinbot_sqlite_migration.md` for v4.12+. Orthogonal to the "fiddly after restart" UX problem this release targets.
- **Multi-user isolation deferred.** Memories are still global per data dir. Single-user use case, not a privacy concern for the maintainer's setup.
- **Decay/aging deferred.** Daily logs grow monotonically. Will be addressed alongside SQLite migration.

#### Testing

**292 tests total** (237 baseline + 55 new). All green. TSC clean.

- 9 session-persistence unit tests
- 8 SDK memory-injection tests (3 base + 5 smart-prompt mocked-search)
- 9 memory-layers tests (loader + topic match + token budget)
- 11 memory-extractor tests (parse + append + extract pipeline)
- 18 stress tests (100 sessions, schema drift, unicode, atomic recovery, hostile JSON, simulated restart)

**Live verification:**
- `tmp/live-stress-memory.mjs` — 50 fake sessions against the built `dist/`, real ~/.alvin-bot/memory/MEMORY.md as the L1 source, simulated restart via Map clear + reload. Result: 215 KB state file, 1 ms flush, 1 ms reload, 50/50 perfect round-trip.
- `tmp/live-edge-cases.mjs` — 7 hostile scenarios: all-null fields, 1000-burst debounce (2 ms), 20 concurrent flushes, extreme unicode (RTL + ZWJ + astral plane), 4-layer memory with project topic match, atomic write recovery from stale .tmp, empty project file skipping. All passed.

#### Files changed

- **NEW:** `src/services/session-persistence.ts`, `src/services/memory-layers.ts`, `src/services/memory-extractor.ts`
- **NEW tests:** `test/session-persistence.test.ts`, `test/memory-sdk-injection.test.ts`, `test/memory-layers.test.ts`, `test/memory-extractor.test.ts`, `test/memory-stress-restart.test.ts`
- **Modified:** `src/services/session.ts` (persist hook), `src/services/personality.ts` (SDK injection + isFirstTurn), `src/services/memory.ts` (use layered loader), `src/services/compaction.ts` (extractor hook), `src/handlers/message.ts` + `src/handlers/platform-message.ts` (smart prompt wiring), `src/handlers/commands.ts` (`markSessionDirty` calls), `src/index.ts` (load + flush wiring), `src/paths.ts` (4 new constants)
- **Plan:** `docs/superpowers/plans/2026-04-13-memory-persistence.md`

---

## [4.10.0] — 2026-04-13

### 🚀 Async sub-agents — main session no longer blocks during long tasks

The big architecture upgrade: Claude can now delegate long-running work (SEO audits, multi-page research, full-repo analyses) to **background** sub-agents. The main Telegram session ends quickly, the user can keep chatting, and the sub-agent's final report arrives as a separate message when ready.

A colleague flagged the underlying problem on 2026-04-13 via WhatsApp voice note: *"It's weird that the main routine crashes when the sub-agents are still running. It should just run in the background, and that should have zero impact on the main routine."* He was right. OpenClaw had this years ago because back then the SDK didn't support async; today's `@anthropic-ai/claude-agent-sdk@0.2.97` already ships `run_in_background: true` on the Agent tool — Alvin just wasn't using it.

This release closes that gap in two complementary stages, both bundled into the same v4.10.0:

#### Stage 1 — System prompt teaches Claude when to use `run_in_background`

- New `BACKGROUND_SUBAGENT_HINT` constant in `src/services/personality.ts`, injected only into SDK sessions (non-SDK providers don't have an Agent tool).
- The hint tells Claude: for audits / multi-page research / >2 min tasks → ALWAYS set `run_in_background: true`. After launching, end the turn promptly. The bot delivers the result automatically when done.
- Net effect: Claude's main turn ends in ~5 s instead of 10+ minutes. `session.isProcessing` flips to `false` quickly so the user can keep chatting.

#### Stage 2 — Async-agent watcher polls and delivers

The hard part. Three new pure modules + one new wired-up service:

- **`src/services/async-agent-parser.ts`** (NEW, pure) — two helpers:
  - `parseAsyncLaunchedToolResult(text)` extracts `agentId` + `output_file` from the SDK's plain-text `Async agent launched successfully…` tool-result. **Important**: the `.d.ts` type in the SDK package claims this is a JSON object with `outputFile: string`. The runtime actually emits plain text with `output_file` (snake_case). Captured live via probe — see the parser test fixtures.
  - `parseOutputFileStatus(path)` tail-reads (64 KB) the JSONL `output_file` and detects completion by finding the most-recent `assistant` message with `stop_reason: "end_turn"`. Concatenates `content[].text` blocks for the final answer. Token usage extracted from the `usage` field. Survives partial last lines, garbage lines, and tail-cuts on huge files. **19 unit tests** including a 200 KB tail-test.
- **`src/services/async-agent-watcher.ts`** (NEW) — the polling service. `Map<agentId, PendingAsyncAgent>` in memory, persisted to `~/.alvin-bot/state/async-agents.json` for restart catch-up (same pattern as v4.9.0 cron scheduler). Public API: `startWatcher` / `stopWatcher` / `registerPendingAgent` / `pollOnce` / `listPendingAgents`. Polls every 15 s, gives up after 12 h per-agent (timeout banner). On completion → builds a `SubAgentInfo + SubAgentResult` and hands off to the existing `subagent-delivery.ts` from v4.9.x. **7 integration tests** including bot-restart catch-up.
- **`src/handlers/async-agent-chunk-handler.ts`** (NEW) — bridge between provider stream chunks and the watcher. Inspects `tool_result` chunks for the async_launched payload, extracts the `description` from the immediately preceding `tool_use` chunk, registers with the watcher. **4 unit tests**.
- **`src/providers/claude-sdk-provider.ts`** — extended to surface `tool_result` blocks from SDK `user` messages as a new `tool_result` chunk type. Previously the provider only emitted `text` and `tool_use` chunks.
- **`src/providers/types.ts`** — `StreamChunk` gets two new optional fields: `toolUseId` and `toolResultContent`.
- **`src/handlers/message.ts`** — captures `lastAgentToolUseInput` from each `tool_use` chunk and consumes it on the immediately-following `tool_result` chunk. Tool-name match also extended from `"Task"` → `"Task" | "Agent"` (the SDK renamed it in v2.1.63).
- **`src/index.ts`** — `startAsyncAgentWatcher()` after the cron scheduler, `stopAsyncAgentWatcher()` in the shutdown handler.
- **`src/paths.ts`** — new `ASYNC_AGENTS_STATE_FILE` constant under `~/.alvin-bot/state/`.
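The completion check in `parseOutputFileStatus` can be illustrated on an in-memory tail string. The JSONL entry shape (`type`, nested `message.stop_reason`, `message.content[].text`) is an assumption based on the description above, as is the choice to let the most recent parseable `assistant` entry decide; `detectCompletion` is a hypothetical name.

```typescript
// Sketch only: the entry shape is assumed, not taken from the SDK's documented format.
interface OutputStatus {
  done: boolean;
  finalText: string;
}

interface AssistantEntry {
  type: "assistant";
  message: {
    stop_reason?: string | null;
    content?: Array<{ type?: string; text?: string } | null>;
  };
}

function isAssistantEntry(value: unknown): value is AssistantEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { type?: unknown; message?: unknown };
  return v.type === "assistant" && typeof v.message === "object" && v.message !== null;
}

function detectCompletion(jsonlTail: string): OutputStatus {
  // Walk from the newest line backwards.
  const lines = jsonlTail.split("\n").reverse();
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // partial last line, tail-cut first line, or garbage
    }
    if (!isAssistantEntry(entry)) continue;
    // The most recent assistant message decides: end_turn means the agent finished.
    if (entry.message.stop_reason !== "end_turn") return { done: false, finalText: "" };
    const blocks = Array.isArray(entry.message.content) ? entry.message.content : [];
    const finalText = blocks
      .map((b) => (b && typeof b.text === "string" ? b.text : ""))
      .join("");
    return { done: true, finalText };
  }
  return { done: false, finalText: "" };
}
```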

#### Investigation artifacts (gitignored, maintainer-local)

- `docs/superpowers/plans/2026-04-13-async-subagents.md` — full TDD plan
- `docs/superpowers/specs/sdk-async-agent-outputfile-format.md` — live-captured SDK format spec; documents the `.d.ts` mismatch that ate ~30 minutes of debugging time

#### Testing

**237 tests total** (201 baseline + 36 new). All green. TSC clean.

- 6 system-prompt-hint tests (Stage 1)
- 19 parser tests (8 plain-text format + 11 JSONL format including 200 KB tail-test)
- 7 watcher integration tests (register, deliver, persistence, restart catch-up, timeout, concurrent agents)
- 4 chunk-handler unit tests

Live-verified via isolated SDK probe (`node sdk-probe.mjs` inside the repo) which confirmed the real `output_file` path and JSONL format match the parser's expectations.

#### What you'll see as a user

Send: *"Make a SEO audit of example.com and example.com in parallel"*

- **0 s** — Claude responds: *"Starting both audits in the background — I'll send the reports when done."* Main session **unlocks**.
- **1–10 min later** — You can chat about anything else. The bot answers immediately.
- **~13 min** (when each agent finishes) — Two separate banner messages arrive: *"✅ SEO audit example.com completed · 13m 17s · 2.6M in / 28k out"* + the full report body, delivered via the v4.9.3 Markdown→plain-text fallback path.

#### Non-goals

- No session-mutex refactor (Stage 3 from the analysis, out of scope here)
- No replacement for Alvin's existing cron `spawnSubAgent` system (different use case)
- No SDK upgrade beyond `0.2.97`

#### Compatibility

- `CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1` in `.env` disables background mode at the SDK level → Stage 1 hint becomes inert, watcher idles; foreground behavior is restored

## [4.9.4] — 2026-04-13

### 🔌 Web UI fully decoupled from main bot — port conflicts no longer crash anything

Colleague feedback (WhatsApp voice note, 2026-04-13):
> *"The gateway binds to port 3100 like OpenClaw. When the bot restarts,
> the port is often still held → catastrophic crash. I ended up
> decoupling the gateway process completely, because the actual bot
> runs independently of the gateway — it can still answer Telegram
> even if the web endpoint isn't reachable yet. It's weird that the
> main routine crashes when the port is busy. It should just run in
> the background, watch for the port to become free, and connect
> then. Zero impact on the main routine."*

He was right. My v4.9.0 `stopWebServer()` fix was *prevention* — it stopped the bot itself from holding 3100 across restarts. But it didn't cover the *resilience* side: a foreign process holding 3100 (another dev server, an OpenClaw-style orphan, a TIME_WAIT race after SIGKILL) still crashed the boot, because `startWebServer()` was synchronous and the `uncaught exception` from `server.listen()` escaped to the main event loop.

**Complete rewrite of the bind loop:**

- **`src/web/bind-strategy.ts` (new) — pure decision helper.** `decideNextBindAction(err, attempt, opts)` returns either `{type: "retry-port", port, attempt}` (climb the ladder) or `{type: "retry-background", delayMs, port}` (back off, retry the original port in 30 s). EADDRINUSE with attempts remaining → ladder. EADDRINUSE exhausted → background. Any other error → background. 8 unit tests covering every branch + purity.

- **`src/web/server.ts` startWebServer — non-blocking, fresh-server-per-attempt.** Returns `void` synchronously, NEVER throws, NEVER blocks on bind. Each attempt creates a new `http.Server` (no state-recycling bugs) and attaches its own error handler. On failure, cleans up and calls `decideNextBindAction` to decide the next move. If the ladder is exhausted, schedules a 30 s background retry at the original port — the Telegram bot keeps running the whole time, the web UI just isn't reachable yet.

- **`src/web/server.ts` WebSocketServer attached POST-bind.** The `ws` library's `WebSocketServer` constructor installs its own event plumbing on the underlying `http.Server` and, crucially, causes EADDRINUSE errors to escape as uncaught exceptions when attached pre-listen. Debugging this took an hour on 2026-04-13. Fix: only call `new WebSocketServer({ server })` AFTER `listen()` has fired its callback. The integration test `"when the primary port is taken"` in `test/web-server-integration.test.ts` pins this behaviour.

- **`src/web/server.ts` error handler: `on` not `once`.** The previous version used `.once("error", handler)`, but in a Node edge case a single bind failure emits TWO error events, which left the second one uncaught. The handler is now registered with `on` plus a `handled` guard, so it is idempotent; on success, a quiet post-bind logger replaces it.

- **`src/web/server.ts` defensive try/catch around `server.listen()`.** In the wild, Node sometimes throws synchronously for edge-case binds (already-listening, invalid backlog, kernel race). The catch funnels sync throws through the same `handleBindFailure` path as async error events.

- **`src/web/server.ts` `closeHttpServerGracefully(server)` + `stopWebServer()`.** The old `stopWebServer(server)` took an explicit server arg; it's been split into a low-level helper (`closeHttpServerGracefully(server)`, exported for tests) and a stateful top-level (`stopWebServer()`, no args, cleans up `currentServer` + `wsServerRef` + `bindRetryTimer`). Safe to call before start, safe to call twice, cancels pending background retries.

- **`src/index.ts` call sites adjusted.** `const webServer = startWebServer()` → `startWebServer()`. `stopWebServer(webServer)` → `stopWebServer()`. The comment above the call explains the decoupling so nobody accidentally re-couples it in a future "clean up" refactor.
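The branch logic of `decideNextBindAction` described in the first bullet can be sketched as a pure function. The option names (`basePort`, `maxLadderAttempts`, `backgroundDelayMs`) and the convention that attempt 0 is the first try on the primary port are assumptions for illustration.

```typescript
// Hypothetical sketch: option names and attempt numbering are assumed.
type BindAction =
  | { type: "retry-port"; port: number; attempt: number }
  | { type: "retry-background"; delayMs: number; port: number };

interface BindOptions {
  basePort: number; // the primary port, e.g. 3100
  maxLadderAttempts: number; // how many ports above basePort to try
  backgroundDelayMs: number; // e.g. 30_000
}

function decideNextBindAction(
  err: { code?: string },
  attempt: number,
  opts: BindOptions,
): BindAction {
  // Port busy and rungs left: climb the ladder to the next port.
  if (err.code === "EADDRINUSE" && attempt < opts.maxLadderAttempts) {
    const next = attempt + 1;
    return { type: "retry-port", port: opts.basePort + next, attempt: next };
  }
  // Ladder exhausted or any other error: back off and retry the original port later.
  return { type: "retry-background", delayMs: opts.backgroundDelayMs, port: opts.basePort };
}
```

Keeping the decision free of timers and sockets is what makes every branch testable without binding a real port.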

**Testing: 186 → 201 (+15 new).**

- `test/web-server-resilience.test.ts` — 8 unit tests for `decideNextBindAction`
- `test/web-server-integration.test.ts` — 7 real-server integration tests: startWebServer returns void, binds, stops, is idempotent, survives primary-port conflict by climbing the ladder, closes servers with hanging sockets.
- **Live-verified on the maintainer's machine**: `launchctl unload` + dual-stack Node hog on port 3100 + `launchctl load` → bot booted cleanly → out.log contained `[web] port 3100 busy (EADDRINUSE) — trying 3101` → `🌐 Web UI: http://localhost:3101   (Port 3100 was busy, using 3101 instead)` → Telegram responsive throughout. Exactly what the colleague described.

**Non-goals / intentionally unchanged:**
- Timeouts stay unlimited (v4.8.8 behaviour preserved).
- The primary port is still `WEB_PORT || 3100` — no config schema change.
- When the bot binds on a non-primary port (e.g. 3101), the README permalink still points at 3100. Users hitting a ladder-climbed bot should check the startup log; this is rare and temporary.

## [4.9.3] — 2026-04-11

### 🛠 Two UX bugs found in production after v4.9.2 — now closed

The maintainer triggered `/cron run Daily Job Alert` after the v4.9.2 deploy and saw 13 minutes of chat silence followed by nothing. Forensics on the live bot revealed two distinct problems on top of an already-successful run:

**1. `subagent-delivery` has been silently dropping every banner for days.** Err.log: `GrammyError: Call to 'sendMessage' failed! (400: Bad Request: can't parse entities: Can't find end of the entity starting at byte offset 2636)`. The daily-job-alert sub-agent produces markdown-dense output (`|` tables, `**bold**`, `\|` escapes, mixed asterisks). Telegram's Markdown parser refuses it, `api.sendMessage(..., parse_mode: "Markdown")` throws, and the bare try/catch in `deliverSubAgentResult` logs + bails. **Result: the user has never seen a sub-agent-delivery banner, even when the underlying run succeeded perfectly and emailed the HTML report correctly.**

Fix in `src/services/subagent-delivery.ts`: new `sendWithMarkdownFallback()` helper that detects the "can't parse entities" pattern and retries the SAME text without `parse_mode`. All three code paths (file-upload case, single-message case, chunked case) now flow through the helper. 3 new tests drive the happy path, non-parse errors, and the chunked path.
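The retry pattern can be sketched independently of grammY by injecting the send function. The `SendFn` type and `isParseEntitiesError` helper are hypothetical stand-ins; only the "can't parse entities" message and the retry without `parse_mode` come from the description above.

```typescript
// Sketch only: SendFn stands in for a wrapper around the Telegram sendMessage call.
type SendFn = (text: string, parseMode?: "Markdown") => Promise<void>;

function isParseEntitiesError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return message.includes("can't parse entities");
}

async function sendWithMarkdownFallback(
  send: SendFn,
  text: string,
): Promise<"markdown" | "plain"> {
  try {
    await send(text, "Markdown");
    return "markdown";
  } catch (err) {
    // Only Markdown parse failures are retried; everything else propagates.
    if (!isParseEntitiesError(err)) throw err;
    // Retry the SAME text without parse_mode so the user still gets the report.
    await send(text);
    return "plain";
  }
}
```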

**2. `/cron run` had zero proof-of-life for 13 minutes.** The handler used to `await runJobNow(...)` synchronously and reply only when finished. Telegram's typing indicator expires after 5s. Users saw: command sent → typing indicator blip → nothing → nothing → (much later, if at all) result. For cron jobs that take 10-15 min (daily-job-alert, trading-bot-health, p-and-l-summary), this is indistinguishable from a dead bot.

Fix — new handler flow:

```
bot:  🚀 Started *Daily Job Alert* — working…          ← instant ack
bot:  🔄 Running *Daily Job Alert* · 1m 0s elapsed…    ← edit every 60s
bot:  🔄 Running *Daily Job Alert* · 2m 0s elapsed…    ← edit
...
bot:  ✅ Done — *Daily Job Alert* · 13m 17s             ← final edit
bot:  ✅ *Daily Job Alert* completed · 13m · 2.6M/28k  ← subagent-delivery
       [full report body, Markdown-safe with plain-text fallback]
```

The ticker uses a single `editMessageText` call per minute on the same message — zero notification spam, clean visual progress. Every edit is wrapped with `isHarmlessTelegramError` so the inevitable "message is not modified" races stay silent. The ack itself falls back to plain text if the first `reply` hits a parse error, and the final edit falls back to a fresh plain message if the edit fails.

New module: `src/handlers/cron-progress.ts` with pure helpers — `formatElapsed`, `escapeMarkdown`, `buildTickerText`, `buildDoneText`. 8 tests cover the formatting rules and markdown-safety escapes so future cron jobs with weird names (`weird_job*name`) can't break the ticker.
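A minimal sketch of three of these helpers, matching the ticker output shown earlier. The exact escaping set is an assumption (the four characters that Telegram's legacy Markdown mode treats as special), and the sub-minute format is a guess.

```typescript
// Sketch only: output format mirrors the ticker example, details are assumed.
function formatElapsed(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  // Assumption: runs under a minute show seconds only.
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

function escapeMarkdown(text: string): string {
  // Backslash-escape the characters legacy Markdown treats as entity markers.
  return text.replace(/[_*`\[]/g, "\\$&");
}

function buildTickerText(jobName: string, elapsedMs: number): string {
  return `🔄 Running *${escapeMarkdown(jobName)}* · ${formatElapsed(elapsedMs)} elapsed…`;
}
```

Escaping the job name before wrapping it in `*…*` keeps a name such as `weird_job*name` from opening an unterminated entity.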

**186 tests total** (+11 new). All green. Timeouts remain unlimited.

**What you see after this upgrade:**
- Instant "🚀 Started" ack on `/cron run`
- Live elapsed-time ticker every minute
- Final "✅ Done" when the sub-agent finishes
- A separate banner+body message with the full report — **this time actually delivered**, even when the body contains broken Markdown

## [4.9.2] — 2026-04-11

### 🔍 Post-review polish: three edge cases from the strict audit

A self-audit of the v4.9.0 + v4.9.1 batch surfaced three real-but-rare edge cases. None of them are user-visible on the happy path, but all three are two-line defensive fixes that make the stability story airtight. Verified under a live stress test: 4 back-to-back `launchctl kickstart -k` restarts produced clean beacon accounting (`crashCount=3/10, daily=5/20`), zero EADDRINUSE, zero false brake, 3.8 ms Web UI response after every boot. **175 tests total (9 new stress scenarios).**

**Issue A — watchdog brake must always halt the boot, even if `writeAlert` silently fails**
`src/services/watchdog.ts`. The old brake path called `writeAlert(...)` then `checkCrashLoopBrake()`, and the latter only exits if the alert file exists. If `writeAlert` hit a disk-full or permission error, the alert file wasn't created, `checkCrashLoopBrake` returned as a no-op, and the startup code continued past the brake — exactly the wrong behaviour for the one code path where we know the bot is in a bad state. Added an unconditional `process.exit(3)` after `checkCrashLoopBrake` so the brake is now a hard guarantee.

**Issue B — `bot.stop()` must be awaited so Telegram offset-commits actually fire**
`src/index.ts`. The shutdown handler called `if (bot) bot.stop();` without `await`, then ran `stopWebServer` in parallel and called `process.exit(0)`. Grammy's `bot.stop()` commits the pending Telegram update-offset before resolving, so without the await the next boot could reprocess the last batch of messages. Now awaited with a catch-and-log wrapper so shutdown doesn't hang on a grammy-internal error either.

**Issue C — `runJobNow` defensive belt around `executeJob`**
`src/services/cron.ts`. `executeJob` has its own try/catch that converts every error into `{output, error}`, so in practice `runJobNow` never sees a throw. But a future refactor could remove that inner catch, and a leaked throw here would skip `runningJobs.delete` and permanently wedge the guard for that job. Added an inner try/catch in `runJobNow` that catches any thrown `executeJob` error and surfaces it as `{status: "ran", error}`, preserving the typed contract the `commands.ts` handler relies on. Two new tests (`cron-runjobnow-throw.test.ts`) verify both the error-propagation and the guard-cleanup invariants.

**Stress scenarios added** (`test/stress-scenarios.test.ts`, 9 tests):
1. **Port churn** — 20 open/close cycles with 5 hanging clients each, all <2s, port reusable afterward.
2. **Scheduler catchup chain** — 50-job mixed list (10 interrupted, 10 completed, 10 stale, 10 disabled, 10 fresh). `handleStartupCatchup` rewinds exactly the 10 interrupted, no false positives.
3. **Watchdog daily-cap escalation** — 19 crashes spaced 70 min apart (outside short window, inside 24h). The 20th crash trips the daily brake even though the short window is clean.
4. **Concurrent runJobNow guard** — 5 parallel async calls → 1 "ran" + 4 "already-running", never double-fire.
5. **Telegram error filter cross-check** — 7 benign patterns + 10 real errors, no false positives / false negatives, grammy `description` field handled.
6. **Cron resolver ambiguity** — exact-case wins over CI collision, ID wins over name collision, mixed case with 2 CI matches returns null.

## [4.9.1] — 2026-04-11

### 🐛 `/cron run <name>` accepts the job name, not just the opaque ID

Reported via screenshot: `/cron run Daily Job Alert` replied with `❌ Job not found.` because `runJobNow(id)` only matched against `job.id`, the random base-36 string (`mn90rrsndzto`) that nobody types. Worse, when Claude tried to trigger the same job through a natural-language request in an earlier session, it retried with different variants until one happened to succeed, and the absence of a re-entry guard in `runJobNow` meant the retries sometimes spawned a second parallel sub-agent, producing the "ups… wurde doppelt ausgeführt" ("oops… was executed twice") message.

**Fix — pure resolver + guard, wired into the public API:**

- **`src/services/cron-resolver.ts` (new).** Two pure helpers:
  - `resolveJobByNameOrId(jobs, query)` — priority: exact ID > exact name > unique case-insensitive name > `null` on miss/ambiguous.
  - `runJobNowGuard(id, isRunning, run)` — higher-order re-entry check, testable without the scheduler loop.
- **`src/services/cron.ts` runJobNow**. Now returns a typed outcome (`not-found` | `already-running` | `ran`), consults the `runningJobs` set (previously only the scheduler loop did), and — when it actually runs — persists `lastAttemptAt` / `lastRunAt` / `runCount` / `lastResult` / `lastError` exactly like the scheduler path, so manual triggers show up in the timeline instead of vanishing.
- **`src/handlers/commands.ts /cron run`**. Matches against name OR ID, prints a helpful "Available:" list on miss, and announces the already-running case instead of silently double-firing.
- **10 new tests** (`test/cron-run-resolver.test.ts`) covering exact ID, exact name, case-insensitive, trimmed input, miss, ambiguity, ID-over-name preference, and both guard branches. **164 tests total.**
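
The resolution priority described above can be sketched as follows. This is a minimal illustration, not the shipped `cron-resolver.ts`; the `CronJobRef` interface is a hypothetical stand-in for the real job type.

```typescript
// Hypothetical minimal job shape; the real CronJob type has more fields.
interface CronJobRef {
  id: string;
  name: string;
}

/**
 * Resolve a user-supplied query to a job.
 * Priority: exact ID > exact name > unique case-insensitive name > null.
 */
function resolveJobByNameOrId<T extends CronJobRef>(jobs: T[], query: string): T | null {
  const q = query.trim();
  if (q === "") return null;

  const byId = jobs.find((job) => job.id === q);
  if (byId) return byId;

  const byExactName = jobs.find((job) => job.name === q);
  if (byExactName) return byExactName;

  // Case-insensitive match only counts when it is unambiguous.
  const lower = q.toLowerCase();
  const [only, ...rest] = jobs.filter((job) => job.name.toLowerCase() === lower);
  return only !== undefined && rest.length === 0 ? only : null;
}
```

Returning `null` for both a miss and an ambiguous match keeps the caller simple: the command handler can print the "Available:" list in either case.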

**What this also quietly fixes:** natural-language triggers ("Alvin, run the daily job alert"). When Claude invokes `/cron run Daily Job Alert` via its own turn, the command now succeeds on the first try — no retry cascade, no double execution.

## [4.9.0] — 2026-04-11

### 🛡 Stability batch: crash-loop eliminated, cron jobs restart-resistant, cleaner logs

Production users reported a daily-job-alert that "kept crashing": the cron job triggered at 08:00, died mid-execution, and the next scheduled run silently disappeared until the next day. Root cause was not a single bug but a chain of four: the HTTP Web UI never released its port on shutdown → `EADDRINUSE :::3100` uncaught crash-loop → the cron scheduler persisted `nextRunAt = null` pre-execution → restart rewrote it to "tomorrow 08:00" → the run was lost. In parallel, sub-agents that ended on a tool call reported "completed" with only the pre-tool text as output, and grammy's "message is not modified" races leaked into Telegram replies as `Fehler: Call to 'editMessageText' failed!` (`Fehler` is German for "error").

This release closes the whole chain, adds the Tier 0 of the browser fallback, and installs timestamped logs so future forensics don't need timestamp-free grep archaeology.

**Pure functions extracted for isolated testing** (36 new tests, 154 total):

- `src/services/cron-scheduling.ts` — `prepareForExecution(job, now)` and `handleStartupCatchup(jobs, now, graceMs)`. The old scheduler set `nextRunAt = null` before `await executeJob(job)` and only recomputed after completion. A crash mid-execution left `nextRunAt = null`; the next boot recomputed it from the current time → always landed on tomorrow's trigger. Now `prepareForExecution` persists the NEXT regular trigger BEFORE running, and stamps `lastAttemptAt`. If `lastAttemptAt > lastRunAt` at boot and the attempt is ≤ 6 h old, `handleStartupCatchup` rewinds `nextRunAt` to `now` so the next tick picks it up. New `CronJob.lastAttemptAt` field (`number | null`).
- `src/services/watchdog-brake.ts` — `decideBrakeAction(prev, now, opts)` and `shouldResetCrashCounter(uptimeMs, opts)`. The old brake reset `crashCount` after 5 minutes of clean uptime, which was shorter than the typical sub-agent lifetime — chronic crashes with 5–10 min gaps passed the brake indefinitely. New policy: **1 h clean uptime required for reset**, plus a hard **20 crashes / 24 h** daily cap alongside the existing 10 crashes / 10 min short-window cap. Both counters persist in the beacon.
- `src/util/debounce.ts` — trailing-edge debounce for fs.watch coalescing.
- `src/util/console-formatter.ts` — `installConsoleFormatter()`: monkey-patches `console.log/warn/info/error` to prefix every line with an ISO timestamp, and drops libsignal "Closing session" multi-line SessionEntry dumps + `[claude] Native binary` spam that were pushing tens of KB per day into `alvin-bot.out.log` / `alvin-bot.err.log`.
- `src/util/telegram-error-filter.ts` — `isHarmlessTelegramError(err)`: single source of truth for benign grammy races (`message is not modified`, `query is too old`, `message to edit not found`, `MESSAGE_ID_INVALID`, …).
- `src/services/browser-webfetch.ts` — `webfetchNavigate(url, opts)` + `parseTitle(html)` + `WebfetchFailed`: Tier 0 of the browser fallback chain. Plain `fetch()` instead of Playwright for static pages.
- `src/platforms/whatsapp-auth-helpers.ts` — `makeResilientSaveCreds(authDir, inner)`: wraps baileys' `saveCreds` so an ENOENT from a vanished auth dir transparently recreates the directory and retries once.
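
To make the restart-resistance idea from the `cron-scheduling.ts` entry concrete, here is a minimal sketch of the two steps. It is an illustration under assumptions, not the shipped code: the `CronJobState` shape is reduced to the fields mentioned above, `prepareForExecution` takes the next regular trigger as an explicit third argument (the real signature is `prepareForExecution(job, now)`), and skipping disabled jobs is assumed.

```typescript
// Hypothetical reduced job shape with only the scheduling fields named above.
interface CronJobState {
  id: string;
  enabled: boolean;
  nextRunAt: number | null;
  lastRunAt: number | null;
  lastAttemptAt: number | null;
}

/** Persist the NEXT regular trigger and stamp the attempt BEFORE the job runs. */
function prepareForExecution(job: CronJobState, now: number, nextRegularTrigger: number): CronJobState {
  return { ...job, lastAttemptAt: now, nextRunAt: nextRegularTrigger };
}

/** At boot, rewind jobs whose last attempt never completed and is still within the grace period. */
function handleStartupCatchup(jobs: CronJobState[], now: number, graceMs: number): CronJobState[] {
  return jobs.map((job) => {
    if (!job.enabled || job.lastAttemptAt === null) return job;
    const interrupted = job.lastAttemptAt > (job.lastRunAt ?? 0);
    const recent = now - job.lastAttemptAt <= graceMs;
    return interrupted && recent ? { ...job, nextRunAt: now } : job;
  });
}
```

With a 6 h grace period, a job that crashed an hour ago is picked up on the next tick, while an attempt from yesterday is left on its regular schedule.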

**Fixes wired into the existing modules:**

- **`src/web/server.ts` — new `stopWebServer(server)`.** Closes WebSocket clients, calls `closeIdleConnections()` + `closeAllConnections()` (Node 18.2+) so long-poll clients can't stall the shutdown, then awaits `server.close()`. Called from `shutdown()` in `src/index.ts`. Before this fix, launchd restarted the bot → new process tried `server.listen(3100)` → `EADDRINUSE` → uncaught exception → exit → launchd again. Classic crash-loop. **This single fix stops the chain.**
- **`src/services/cron.ts`** — scheduler rewired to call `prepareForExecution` pre-execution and `handleStartupCatchup` at boot. `lastResult` truncation bumped from 500 → 4000 chars so post-mortem is possible without running the job again.
- **`src/services/watchdog.ts`** — beacon schema extended with `dailyCrashCount` + `dailyCrashWindowStart`; `startWatchdog` now delegates the brake decision to the pure `decideBrakeAction`. Recovery timer still fires, but only resets the counter if `shouldResetCrashCounter` agrees (≥ 1 h uptime).
- **`src/services/subagents.ts`** — `runSubAgent` now reads `finalText` from the `done` chunk as the authoritative final output (was ignored before), preserves buffered text when the stream emits an `error` chunk, and — most importantly — keeps `finalText` when the catch handler fires (was `output: ""`, throwing away multi-minute runs). Variable scope moved outside the try block. New `error` status branch for mid-stream provider failures.
- **`src/services/subagent-delivery.ts`** — `buildBanner` now renders `⚠️ completed · empty output` for the "successful run with zero text" case so truncated runs are immediately visible instead of hiding behind a green tick.
- **`src/services/skills.ts`** — `fs.watch` callbacks wrapped in `debounce(…, 300)` so macOS FSEvents duplicates coalesce into one reload.
- **`src/services/browser-manager.ts`** — new `webfetch` tier added as default for non-interactive tasks. `resolveStrategy` cascade is now `webfetch → hub-stealth → cdp → gateway → cli`. `navigate()` has an error-based fallback: if `webfetch` throws (403, 5xx, content-type mismatch), it transparently upgrades to `hub-stealth` then `cli` before giving up.
- **`src/platforms/whatsapp.ts`** — `saveCreds` wrapped in `makeResilientSaveCreds` so a vanished auth dir self-heals instead of becoming an unhandled rejection.
- **`src/handlers/message.ts`, `src/services/telegram.ts`, `src/index.ts` (bot.catch + streaming finalize)** — all three call sites that used to ship the raw grammy error to users now route through `isHarmlessTelegramError`. The `Fehler: Call to 'editMessageText' failed!` noise that 2-3 users per day were seeing is gone.
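
The two-cap brake policy (10 crashes / 10 min, 20 crashes / 24 h, reset only after 1 h of clean uptime) can be sketched as a pure function. This is a guess at the shape of `decideBrakeAction`, not its actual implementation; the option names and the return type are assumptions, while the counter field names follow the beacon schema above.

```typescript
// Counter fields as named in the beacon schema above.
interface BrakeCounters {
  crashCount: number;
  crashWindowStart: number;
  dailyCrashCount: number;
  dailyCrashWindowStart: number;
}

// Hypothetical option names; the values come from the policy described above.
interface BrakeOptions {
  shortWindowMs: number;
  shortCap: number;
  dailyWindowMs: number;
  dailyCap: number;
  resetUptimeMs: number;
}

const DEFAULT_BRAKE_OPTIONS: BrakeOptions = {
  shortWindowMs: 10 * 60 * 1000,
  shortCap: 10,
  dailyWindowMs: 24 * 60 * 60 * 1000,
  dailyCap: 20,
  resetUptimeMs: 60 * 60 * 1000,
};

/** Only a full hour of clean uptime resets the short-window counter. */
function shouldResetCrashCounter(uptimeMs: number, opts: BrakeOptions = DEFAULT_BRAKE_OPTIONS): boolean {
  return uptimeMs >= opts.resetUptimeMs;
}

/** Record one crash and decide whether either cap has been reached. */
function decideBrakeAction(
  prev: BrakeCounters,
  now: number,
  opts: BrakeOptions = DEFAULT_BRAKE_OPTIONS,
): { next: BrakeCounters; brake: boolean } {
  const shortExpired = now - prev.crashWindowStart > opts.shortWindowMs;
  const dailyExpired = now - prev.dailyCrashWindowStart > opts.dailyWindowMs;
  const next: BrakeCounters = {
    crashCount: shortExpired ? 1 : prev.crashCount + 1,
    crashWindowStart: shortExpired ? now : prev.crashWindowStart,
    dailyCrashCount: dailyExpired ? 1 : prev.dailyCrashCount + 1,
    dailyCrashWindowStart: dailyExpired ? now : prev.dailyCrashWindowStart,
  };
  const brake = next.crashCount >= opts.shortCap || next.dailyCrashCount >= opts.dailyCap;
  return { next, brake };
}
```

The daily counter is what catches slow chronic crashing: crashes spaced more than 10 minutes apart keep restarting the short window, but they still accumulate toward the 24 h cap.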

**What is NOT changed:**

- **Timeouts.** The v4.8.8 `defaultTimeoutMs = -1` (unlimited) behavior is preserved. Sub-agents and cron jobs can still run as long as they need.
- **The cron job `payload.prompt`s.** Users' existing cron definitions keep working unchanged.
- **The beacon file format back-compat.** Old beacons without the daily counters are read correctly; the new fields are seeded to 0/now on first boot.

**How to verify after update:**

1. `launchctl unload ~/Library/LaunchAgents/com.alvinbot.app.plist && launchctl load ~/Library/LaunchAgents/com.alvinbot.app.plist`
2. Tail `~/.alvin-bot/logs/alvin-bot.out.log` — every line should now carry an ISO timestamp and libsignal SessionEntry dumps should be gone.
3. Check `~/.alvin-bot/state/watchdog.json` — should contain `dailyCrashCount` / `dailyCrashWindowStart` within a minute.
4. Send `/cron run Daily Job Alert` — subagent-delivery banner should render fully, `~/.alvin-bot/cron-jobs.json` should show `lastAttemptAt` and a post-execution `lastRunAt`.
5. Trigger a deliberate edit race (double-tap an inline button quickly) — no `Fehler: Call to 'editMessageText' failed!` reply should land in the chat.

## [4.8.9] — 2026-04-11

### 🐛 Browser automation: dead `browse-server.cjs` path removed, 3-tier router now the source of truth

The `browse` skill used to instruct the agent to start `node scripts/browse-server.cjs` on port 3800 for every browser task. That file was deleted in an earlier cleanup (see `20283c9` for the original 577-line version — now gone), but `skills/browse/SKILL.md` was never updated. Result: any browser-related user message on Telegram — or any cron job that hit the skill — got a system-prompt injection telling it to call a gateway that didn't exist, producing half-failed runs like the "Daily Job Alert" cron that couldn't load LinkedIn or StepStone.

**What changed:**

- **`skills/browse/SKILL.md` — full rewrite.** Now documents the hub 3-tier router at `~/.claude/hub/SCRIPTS/browser.sh`:
  - **Tier 0** — WebFetch / `curl` for static pages and APIs
  - **Tier 1** — `browser.sh stealth <url>` (Playwright + stealth plugin, headless, Cloudflare-masking)
  - **Tier 2** — `browser.sh cdp {start|goto|shot|tabs|stop}` (real Chrome with persistent profile at `~/.claude/hub/BROWSER/profile/`, login cookies survive restarts)
  - **Tier 3** — Claude-in-Chrome extension via MCP tools (interactive CLI only)
  - Explicit escalation ladder (WebFetch → stealth → CDP → ask the maintainer to log in) and a `NIEMALS browse-server.cjs nutzen` ("never use browse-server.cjs") anti-rule.
  - Concrete working targets (StepStone ✅, Michael Page ✅, LinkedIn ✅ with login, Indeed ❌) so the agent knows what to try where.

- **`src/services/browser-manager.ts` — hardened fallback chain.** The multi-strategy manager already had the right *shape* (`gateway → cdp → hub-stealth → cli`) but several ops silently broke or hung:
  - **`gatewayRequest` now has a 15 s timeout** (`req.destroy` on elapse). Previously a hung gateway would wedge the caller forever.
  - **CDP fallback for interactive ops.** `click`, `fill`, `type`, `press`, `scroll`, `evaluate`, `info`, and `getTree` used to hard-throw `"requires gateway"` when `browse-server.cjs` wasn't running. They now try the gateway first, then a short-lived `chromium.connectOverCDP()` via a new `withCdpPage()` helper that reuses the maintainer's live Chrome on port 9222. Refs are interpreted as CSS selectors when gateway is absent.
  - **Explicit PNG extension** on auto-generated screenshot filenames (`shot_<ts>.png`) so Playwright's format inference is unambiguous.
  - **Better error messages** — every "needs interactive" throw now includes the exact command to start CDP Chrome (`~/.claude/hub/SCRIPTS/browser.sh cdp start headless`).

- **`src/paths.ts` — `HUB_BROWSER_SH` constant.** New absolute path to `~/.claude/hub/SCRIPTS/browser.sh` so the manager can shell out without hard-coding `os.homedir()` inline.

**Why this matters:** `browser-manager.ts` is still not wired into any bot code path (it's future-proofing), so the production fix for user-interactive flows is `SKILL.md`. The manager hardening ensures that when it does eventually get wired into a sub-agent tool, it won't hang on missing gateways or lose all interactive capability when only CDP is available.

**Testing:** Tier 1 stealth end-to-end against `stepstone.de/jobs/it-delivery-director` → 1.2 MB HTML, title parsed. Module-level integration test: `navigate('https://example.com')` via auto-selected hub-stealth → correct title/URL. `resolveStrategy('gateway')` → cascades to CDP with visible warning. `info()` via CDP fallback → returns live Chrome state without throwing. Skills reload picks up the new SKILL.md (5977 chars), `matchSkills("browse linkedin")` hits the browse skill, `buildSkillContext("open stepstone.de")` injects the 3-tier guidance block.

## [4.8.8] — 2026-04-11

### ✨ Unlimited sub-agent & cron timeouts (user-configurable)

Sub-agents and `ai-query` cron jobs used to hard-cap at 5 minutes (`SUBAGENT_TIMEOUT=300000` default), and `shell` cron jobs at 60 s. Long-running research, deep-dive audits, or anything that crossed the threshold got killed mid-stream with `status: "timeout"`. 4.8.8 flips the default to **unlimited** and lets the user override both globally and per job.

**What changed:**

- **Default is now infinite.** `src/config.ts` seeds `subAgentTimeout` from `SUBAGENT_TIMEOUT` env or falls back to `-1` (unlimited). The runtime value lives in `~/.alvin-bot/sub-agents.json` as `defaultTimeoutMs` and is changeable at runtime without restart.
- **New `/subagents timeout` command.** `/subagents timeout` shows the current value; `/subagents timeout 3600` sets 1 h; `/subagents timeout off` (or `-1`, `0`, `unlimited`, `infinite`) disables the cap entirely. The default-status output now includes a `⏱ Timeout` line.
- **Per-job override on cron.** `/cron add 1h ai-query "deep audit" --timeout off` gives this one job no timeout. `/cron add 5m shell "pm2 ls" --timeout 30` caps this shell at 30 s. Omitting `--timeout` inherits the current global default. Same flag exists on `scripts/cron-manage.js add --timeout <sec|off>`.
- **`CronJob.timeoutMs` field.** Optional number in `cron-jobs.json`. Undefined = inherit global default. Value ≤ 0 = unlimited.
- **Semantics.** `spawnSubAgent` now only arms the `setTimeout(abort)` when `timeout > 0`. At ≤ 0, no abort timer is created, existing `if (timeoutId) clearTimeout(…)` call sites are null-safe, and the agent runs until it finishes, is cancelled via `/subagents cancel`, or the process dies.
- **Shell cron jobs are no longer capped at 60 s.** If the shell job has no `timeoutMs`, `execSync` is called without a `timeout` option, which Node treats as no timeout. The old hard-coded 60 s cap made that impossible.
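
The timeout rules above can be illustrated with a small sketch. The function names (`parseTimeoutArg`, `resolveTimeoutMs`, `armAbortTimer`) are hypothetical and not taken from the codebase; only the accepted argument values and the "≤ 0 means unlimited" rule come from the notes above.

```typescript
const UNLIMITED = -1;

/** Parse a `--timeout` / `/subagents timeout` argument given in seconds. Returns ms, -1 for unlimited, null if invalid. */
function parseTimeoutArg(arg: string): number | null {
  const value = arg.trim().toLowerCase();
  if (value === "off" || value === "unlimited" || value === "infinite") return UNLIMITED;
  if (!/^-?\d+$/.test(value)) return null; // not a whole number
  const seconds = Number(value);
  return seconds <= 0 ? UNLIMITED : seconds * 1000;
}

/** A job without its own `timeoutMs` inherits the global default. */
function resolveTimeoutMs(jobTimeoutMs: number | undefined, globalDefaultMs: number): number {
  return jobTimeoutMs === undefined ? globalDefaultMs : jobTimeoutMs;
}

/** Arm the abort timer only for positive timeouts; at <= 0 no timer exists at all. */
function armAbortTimer(timeoutMs: number, abort: () => void): ReturnType<typeof setTimeout> | undefined {
  return timeoutMs > 0 ? setTimeout(abort, timeoutMs) : undefined;
}
```

Returning `undefined` instead of a dummy timer is what keeps the existing `if (timeoutId) clearTimeout(…)` call sites safe without changes.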

**ENV var still works but is seed-only.** `SUBAGENT_TIMEOUT=600000` at startup still seeds the config on first load, but the persisted value in `sub-agents.json` wins after that.

### 🐛 Silenced harmless `message is not modified` Telegram errors

Occasionally the maintainer would see a red banner at the bottom of an Alvin message:

> Error: Call to 'editMessageText' failed! (400: Bad Request: message is not modified: specified new message content and reply markup are exactly the same as a current content and reply markup of the message)

It never broke anything, but it polluted logs and showed up as an "internal error" reply to the user. Root cause: Telegram's Bot API refuses `editMessageText` when the new content + reply markup are byte-identical to the existing message. This happens legitimately in callback handlers — e.g. tapping a cron-toggle button twice, re-rendering a sudo/keys/platforms menu, language-switch callbacks that render the same content, or stream flushes where the throttled partial hasn't changed since the last edit.

**Fix**: `bot.catch()` in `src/index.ts` now filters out this specific error early. Two regex patterns (`/message is not modified/i` and `/specified new message content.*exactly the same/i`) cover both variants Telegram sends. Real errors (network, SDK, provider failures) still log and still surface the "internal error" reply to the user — only this one harmless class gets dropped.
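
A minimal sketch of such a filter, using the two patterns quoted above. The function name `isMessageNotModified` and the way the error text is collected are assumptions for illustration; the check reads both `message` and `description`, since grammy errors carry Telegram's text in a `description` field.

```typescript
// The two patterns named in the fix above.
const NOT_MODIFIED_PATTERNS: RegExp[] = [
  /message is not modified/i,
  /specified new message content.*exactly the same/i,
];

/** True only for the harmless "message is not modified" class of Telegram errors. */
function isMessageNotModified(err: unknown): boolean {
  const candidate = err as { message?: unknown; description?: unknown } | null | undefined;
  const text = [candidate?.message, candidate?.description, err]
    .filter((part): part is string => typeof part === "string")
    .join(" ");
  return NOT_MODIFIED_PATTERNS.some((pattern) => pattern.test(text));
}
```

Anything that does not match falls through to the normal error path, so network and provider failures still produce a log line and a user-facing reply.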

### 📝 CLAUDE.md: PM2 references updated to launchd

The project `CLAUDE.md` still said *"PM2: `alvin-bot` Prozess, Config in `ecosystem.config.cjs`"* (German for "PM2: `alvin-bot` process, config in `ecosystem.config.cjs`"), which has been outdated since the 4.8.6 switch to launchd. Updated to reflect the actual process manager (`~/Library/LaunchAgents/com.alvinbot.app.plist`, `KeepAlive=true`, `RunAtLoad=true`), the log paths, and a note that `watchdog.ts` only brakes process crash-loops; it does **not** kill long-running sessions or sub-agents. `ecosystem.config.cjs` is now labelled legacy.

The global `~/.claude/CLAUDE.md` was also corrected: `alvin-bot` was removed from the VPS PM2-process list (it runs locally, not on the VPS) and the cron-hub note now correctly says "als **launchd LaunchAgent**" ("as a launchd LaunchAgent").

## [4.8.7] — 2026-04-11

### 🐛 `/update` now detects stale-runtime (rebuild without restart)

Caught immediately after publishing 4.8.6 on the Mac mini: `/update` reported "Already up to date — no new commits" even though the running process was on **v4.8.5** while the disk was already built at **v4.8.6**. The user could see the version mismatch in `/status` (v4.8.5) but `/update` refused to acknowledge it.

**Root cause**: The updater only compared **git commits** (or **npm registry version**) against the local install. It never checked whether the **running process's in-memory version** was older than the **on-disk built version**. This is the dev/CI loop scenario:

1. You edit src/, bump package.json, commit + push
2. `npm run build` regenerates dist/ at the new version
3. The running process has the OLD code in memory
4. You run `/update` in Telegram
5. git: HEAD == origin/main (just pushed) → 0 commits behind → "up to date"
6. Process never restarts → keeps running OLD code

**Fix**: New `isRuntimeStale()` check at the very start of `runUpdate()`. Compares `BOT_VERSION` (in-memory at process start) against `package.json.version` from disk via the existing semver compare. If disk is newer, returns success with `requiresRestart=true` immediately — skip the git/npm fetch entirely, just signal a restart so the fresh code takes effect.

After 4.8.7, running `/update` after a manual rebuild will correctly say *"Disk is already built at vX, running vY. Restarting to pick up the new code..."* and trigger the restart.
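
The stale-runtime comparison boils down to a version compare between two strings. Below is a minimal sketch, assuming plain `major.minor.patch` versions; the real check reuses the project's existing semver compare, and this `compareSemver` is a simplified stand-in that ignores pre-release tags.

```typescript
/** Compare two "major.minor.patch" versions. Returns 1, 0 or -1. Pre-release tags are ignored (simplification). */
function compareSemver(a: string, b: string): number {
  const parse = (v: string): number[] =>
    v.replace(/^v/, "").split(".").map((part) => parseInt(part, 10) || 0);
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff > 0 ? 1 : -1;
  }
  return 0;
}

/** The runtime is stale when the version built on disk is newer than the one loaded in memory. */
function isRuntimeStale(runningVersion: string, diskVersion: string): boolean {
  return compareSemver(diskVersion, runningVersion) > 0;
}
```

Comparing numerically per component matters here: a string comparison would rank `4.10.0` below `4.9.2`.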

### ✨ Internal watchdog with crash-loop brake (`src/services/watchdog.ts`)

The maintainer asked for "derbe persistent" (German slang, roughly "seriously persistent"). The bot was already 95% there with `KeepAlive: true` from 4.8.6, but the missing piece was a brake to stop the bot from infinite-restart-looping if a deterministic crash happens (corrupt state file, missing dependency, broken upgrade).

**New module**: `src/services/watchdog.ts`. Four parts:

**1. Liveness beacon**. Every 30 s the bot writes `~/.alvin-bot/state/watchdog.json` with `{lastBeat, pid, bootTime, crashCount, crashWindowStart, version}`. Fast disk write, no I/O blocking.

**2. Crash-loop brake**. On every fresh boot, the watchdog reads the previous beacon:

- If the previous beacon is **less than 90 s old** → the previous process exited very recently → that's a crash (or a deliberate restart, treated the same way for the brake's purpose). Increment `crashCount`.
- If the previous beacon is **older than 90 s** → previous process had clean uptime → reset counter to 0.
- The crash window is **10 minutes**. Crashes within this window accumulate; older ones don't count.
- If `crashCount` reaches **10**, the brake engages:
  - Writes `~/.alvin-bot/state/crash-loop.alert` with the timestamp, version, error log path, and recovery steps
  - Tries to `launchctl unload -w` its own LaunchAgent so launchd stops retrying (otherwise `KeepAlive: true` would keep burning CPU forever)
  - Exits with code 3

**3. Recovery**. After **5 minutes of clean uptime**, the watchdog auto-resets the crash counter to 0. So a healthy bot that occasionally has a transient hiccup doesn't slowly accumulate toward the brake over days.

**4. Brake check at startup**. `checkCrashLoopBrake()` runs in `index.ts` **before** any expensive init — if the alert file already exists, the bot exits cleanly with code 3 and tries to unload itself again. This prevents launchd from spinning the bot up just to write the same alert over and over.
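
The boot-time decision described in part 2 can be sketched as a pure function over the previous beacon. This is an illustrative reading of the rules above (90 s beacon age, 10-minute window, cap of 10), not the module's actual code; `evaluateBoot` and its return shape are hypothetical.

```typescript
// Subset of the beacon fields listed above.
interface Beacon {
  lastBeat: number;
  crashCount: number;
  crashWindowStart: number;
}

const CRASH_GAP_MS = 90 * 1000; // a beacon younger than this means the last process just died
const CRASH_WINDOW_MS = 10 * 60 * 1000; // crashes older than this no longer count
const CRASH_CAP = 10;

interface BootDecision {
  crashCount: number;
  crashWindowStart: number;
  engageBrake: boolean;
}

/** Decide on a fresh boot whether the previous exit counts as a crash and whether to brake. */
function evaluateBoot(prev: Beacon | null, now: number): BootDecision {
  // No beacon, or the last beat is old: treat as a clean start and reset the counter.
  if (prev === null || now - prev.lastBeat >= CRASH_GAP_MS) {
    return { crashCount: 0, crashWindowStart: now, engageBrake: false };
  }
  const windowExpired = now - prev.crashWindowStart > CRASH_WINDOW_MS;
  const crashCount = windowExpired ? 1 : prev.crashCount + 1;
  const crashWindowStart = windowExpired ? now : prev.crashWindowStart;
  return { crashCount, crashWindowStart, engageBrake: crashCount >= CRASH_CAP };
}
```

When `engageBrake` is true, the caller would perform the side effects listed above: write the alert file, try to unload the LaunchAgent, and exit with code 3.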

**Recovery from a tripped brake**:

```bash
# 1. Investigate the error log
cat ~/.alvin-bot/logs/alvin-bot.err.log

# 2. Fix whatever was wrong
# 3. Remove the alert file
rm ~/.alvin-bot/state/crash-loop.alert

# 4. Reload the LaunchAgent
alvin-bot launchd install
```

**What this catches**:

- Process crashes (segfault, OOM kill) → exit non-zero → brake increments
- `process.exit()` from unhandled rejection → similar
- Tight crash loops → brake engages at 10 within 10 min
- Corrupted state files that crash on read → brake engages eventually

**What this does NOT catch (yet)**:

- Event-loop deadlocks where the process is alive but completely frozen. The watchdog beacon needs the event loop to be alive, so it can't detect a freeze. A future release will add an external sister LaunchAgent (`com.alvinbot.watchdog`) that runs every 2 minutes via `StartInterval` and kills the main bot if its beacon file is too stale. Tracked as a follow-up.

**Telemetry surface**: `alvin-bot status` could read the beacon file in a future release to show "crash count: X in last Y minutes" — for now, the alert file is the main user-facing signal.

### 🛡 LaunchAgent: ProcessType + LimitLoadToSessionType

Two small plist hardening tweaks:

- **`ProcessType: Background`** — explicit hint to launchd that this is a long-running background service. macOS treats Background processes with friendlier scheduling and is less likely to kill them under memory pressure (vs `Standard` which is the default for unlabeled jobs).
- **`LimitLoadToSessionType: Aqua`** — only loads in user GUI sessions. Prevents the LaunchAgent from accidentally loading in non-GUI contexts (e.g. SSH login session) where it would not have Keychain access. Defensive: matches our existing assumption that the bot needs the GUI keychain unlocked for Claude SDK OAuth.

These don't change behaviour for normal use, but they're explicit about our intent. macOS will treat the bot as a proper background service rather than a generic foreground job.

### Tests

87 still passing — no test changes (the stale-runtime check is a fast-path branch that doesn't disturb the existing git/npm logic).

## [4.8.6] — 2026-04-11

### 🐛 LaunchAgent: `/restart` left the bot down forever

Caught on the Mac mini production bot: running `/restart` in Telegram killed the bot cleanly but the process never came back, leaving the bot dead until manual intervention.

**Root cause**: The 4.6.0 LaunchAgent plist template hardcoded a conditional `KeepAlive`:

```xml
<key>KeepAlive</key>
<dict>
    <key>SuccessfulExit</key>
    <false/>   <!-- don't restart on normal exit -->
    <key>Crashed</key>
    <true/>    <!-- only restart on crash -->
</dict>
```

That meant launchd would only auto-restart on **crashes**, not on normal exits. But `/restart` (and `/update`) work by calling `process.exit(0)`, a deliberate clean exit, and relying on the process manager to bring the bot back up. With pm2 this always worked because pm2's default is "restart on any exit". With launchd's conditional KeepAlive, exit code 0 was the ONE exit code that guaranteed the bot stayed down.

**Fix**: Plist template now uses `<key>KeepAlive</key><true/>` — unconditional restart on any exit. Matches pm2's default behavior. `ThrottleInterval` dropped from 10s to 5s so recovery is quicker.

**Migration for existing installs**: re-run `alvin-bot launchd install` to get the new plist. The install script unloads the old plist, writes the new one, and reloads it — existing data and running state are preserved.

Also removed the stale "(PM2)" suffix from the `/restart` Telegram command description — it's just "Restart the bot" now, since the command works identically with both pm2 and launchd.

## [4.8.5] — 2026-04-11

### 🐛 `/update` now works for npm-global installs

Caught on the test MacBook: `/update` reported *"Already up to date — no new commits"* even though npm had a newer version published. Root cause was two separate bugs feeding into each other.

**Bug 1 — false git-repo detection**. `isGitRepo()` used `git rev-parse --is-inside-work-tree` which walks up the directory tree looking for any ancestor `.git` folder. On the test MacBook, `alvin-bot` was installed at `/opt/homebrew/lib/node_modules/alvin-bot/` which has no `.git` itself — but Homebrew stores its formula tree at `/opt/homebrew/` as a git repo. So `git rev-parse` walked up, found Homebrew's `.git`, and returned `true`. The updater then dutifully fetched Homebrew's upstream (which was up-to-date), found 0 new commits, and reported "Already up to date" — about the wrong repository.

**Fix**: `isOwnGitRepo()` now does a strict check for `PROJECT_ROOT/.git` directly, no directory walk. False positives from ancestor git repos are impossible.

**Bug 2 — no update path for npm-global installs**. Even with a correct `isGitRepo()` check, the updater would return *"Not in a git repo — update only supported for source installs."* for npm-global installs. That meant you could never update an npm-installed alvin-bot from within the bot itself.

**Fix**: New `runNpmUpdate()` path that kicks in when `PROJECT_ROOT` looks like a `node_modules/alvin-bot` install (covers Homebrew node, plain npm, nvm, volta). It:

1. Reads the local version from `package.json`
2. Queries `npm view alvin-bot version` for the latest published version
3. Compares via a tiny semver compare
4. If newer: runs `npm install -g alvin-bot@latest --no-audit --no-fund` (5 minute timeout)
5. Signals the caller to restart so the new code takes effect
6. Detects `EACCES` and suggests `sudo` explicitly instead of a cryptic error
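
How the updater might choose between the git path and the npm path can be sketched as below. The function names and the precedence (own `.git` first, then the `node_modules/alvin-bot` path check) are assumptions for illustration; `hasOwnGitDir` stands for the strict `PROJECT_ROOT/.git` existence check described under Bug 1.

```typescript
type UpdatePath = "git" | "npm" | "unsupported";

/** True when the install directory is `<something>/node_modules/alvin-bot`. */
function looksLikeNpmInstall(projectRoot: string): boolean {
  // Normalise Windows separators and trailing slashes before matching.
  const normalized = projectRoot.replace(/\\/g, "/").replace(/\/+$/, "");
  return /(^|\/)node_modules\/alvin-bot$/.test(normalized);
}

/** Pick the update strategy for this install. */
function chooseUpdatePath(projectRoot: string, hasOwnGitDir: boolean): UpdatePath {
  if (hasOwnGitDir) return "git";
  if (looksLikeNpmInstall(projectRoot)) return "npm";
  return "unsupported";
}
```

Because the decision only looks at the install's own directory, an ancestor repository such as Homebrew's no longer influences which path is taken.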

Also improved the git path: falls back to `npm install` + `npm run build` when `pnpm-lock.yaml` doesn't exist (previously hard-coded pnpm).

After 4.8.5, `/update` on the test MacBook will correctly detect the npm install, see that v4.8.4 is the latest, fetch it, and restart. No more false-positive "up to date" when a newer release is out.

## [4.8.4] — 2026-04-11

### 🐛 WhatsApp self-chat detection for the new `@lid` identity format

The maintainer reported that the WhatsApp bot wasn't responding to "Hi" in his self-chat even after enabling both `Self-chat only` and `Reply to private messages` in the Web UI. Debug logging showed the bot receiving the message correctly and detecting `fromMe=true`, but then hitting the "skip: own message in group/DM" branch because `isSelfChat()` was returning `false`.

**Root cause**: WhatsApp has rolled out a new privacy feature that replaces phone-number JIDs in self-chats (and some groups) with a **LID — Linked Identity**. Instead of `4917661236656@s.whatsapp.net`, messages in a self-chat now arrive with `jid = "162805718225143@lid"` — a completely opaque identifier that looks nothing like the phone number.

Our `isSelfChat(jid)` compared the incoming JID against `sock.user.id` (the traditional phone-number format `4917661236656:22@s.whatsapp.net`), stripped the device suffix, and compared the bare numbers. But the LID has a completely different number (`162805718225143`), so the match failed and every self-chat message fell through to the "own message in DM" skip branch.

**Fix**: `isSelfChat()` now checks **both** identity formats:

- **Traditional phone JID** via `sock.user.id` (legacy path, still matches on older WhatsApp clients)
- **LID** via `sock.user.lid` (baileys ≥ 6.7 exposes this) with `@lid` suffix matching

Either match wins. The check short-circuits on groups (`@g.us`) so the new code never misclassifies a group as self-chat.
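
A minimal sketch of the dual-format check, using the example JIDs from above. The `SelfIdentity` shape and the `bareUser` helper are hypothetical; in the bot the two values come from `sock.user.id` and `sock.user.lid`.

```typescript
// Hypothetical stand-in for the relevant part of `sock.user`.
interface SelfIdentity {
  id?: string; // e.g. "4917661236656:22@s.whatsapp.net"
  lid?: string; // e.g. "162805718225143@lid"
}

/** Strip the device suffix and the domain, leaving only the bare user part. */
function bareUser(jid: string): string {
  return jid.replace(/[:@].*$/, "");
}

/** A chat is the self-chat when its JID matches our phone JID or our LID. */
function isSelfChat(jid: string, self: SelfIdentity): boolean {
  if (/@g\.us$/.test(jid)) return false; // groups are never the self-chat
  const incoming = bareUser(jid);
  const matchesPhone = self.id !== undefined && bareUser(self.id) === incoming;
  const matchesLid = /@lid$/.test(jid) && self.lid !== undefined && bareUser(self.lid) === incoming;
  return matchesPhone || matchesLid;
}
```

The group check runs first so that neither comparison can ever classify a `@g.us` chat as the self-chat.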

Caught on the Mac mini production bot after midnight — WhatsApp connected, QR scanned, user sending "Hi", bot silent. Debug logging revealed the actual incoming JID (`162805718225143@lid`) which immediately pointed at the LID format as the culprit.

### 🧹 Dual-bot session collision (root cause of WhatsApp reconnect flapping)

While debugging the `@lid` issue above, the test revealed a deeper problem: two `node dist/index.js` processes were running simultaneously on the Mac mini (PID 47744 from an earlier `launchctl kickstart` that didn't cleanly kill the old instance, plus PID 49153 from a new `launchd install`). Both processes were trying to hold the same WhatsApp Multi-Device session at the same time, causing:

- WhatsApp `Reconnecting in 3s` every few seconds (each process would claim the session, the other would be kicked)
- Baileys `Closing session` dumps to the log
- Signal session state corruption → "Warte auf diese Nachricht" (waiting-to-decrypt) messages appearing spontaneously in the self-chat

**Short-term workaround**: explicit `pkill -9 -f 'node.*alvin-bot/dist/index'` before `launchctl kickstart` to ensure only one process is running.

**Session wipe procedure** (when the corruption is already baked in):

1. `launchctl unload -w ~/Library/LaunchAgents/com.alvinbot.app.plist`
2. `pkill -9 -f "node.*alvin-bot/dist/index"`
3. `rm -rf ~/.alvin-bot/data/whatsapp-auth`
4. Remove the zombie linked device on your phone (WhatsApp → Settings → Linked Devices → remove all "Alvin Bot" entries)
5. `launchctl load -w ~/Library/LaunchAgents/com.alvinbot.app.plist`
6. Re-scan the QR code

A future release should add a proper `alvin-bot wa reset` command to automate this and a startup check that refuses to boot if another instance is already running.

## [4.8.3] — 2026-04-11

### 🐛 Critical: Claude SDK heartbeat false-positive "unavailable"

Caught in production on the Mac mini: the heartbeat monitor was marking `claude-sdk` as unhealthy every 5 minutes, triggering failover to Ollama, even though `claude -p "ping"` from the same user's terminal worked perfectly. After 9 consecutive heartbeat failures, the main Telegram bot was stuck serving responses via Gemma 4 instead of Claude Max.

**Root cause**: `isAvailable()` in the Claude SDK provider used `claude -p "ping" --output-format text` as an auth probe. That command spawns a full SDK query and takes **6-10 seconds warm** (longer on cold starts), while the probe timeout was only **10 seconds**. Under load or on cold starts the probe crossed the timeout, Node killed the child process, and `execFileAsync` rejected. The outer try/catch caught the rejection and cached the provider as "unavailable" for 60 seconds, and the next heartbeat re-probed and failed the same way.

**Fix**: Replaced the `-p "ping"` probe with `claude auth status`. This is a purpose-built Claude CLI command that:

- Completes in ~150 ms (vs 6-10 s)
- Returns structured JSON with an explicit `loggedIn` boolean
- Consumes zero tokens
- Doesn't touch the SDK or model init path

The new code parses the JSON and returns `true` only when `loggedIn === true`. A fallback path keeps the old `-p "ping"` sniff for older Claude CLI versions that don't support `auth status` as JSON.
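
The decision logic can be sketched roughly as follows. This is an illustration under assumptions, not the provider's actual code: `parseAuthStatus`, `runAuthStatus`, and `legacyProbe` are hypothetical names, and the command runners are injected so the sketch makes no claims about CLI flags or output beyond the `loggedIn` field mentioned above.

```typescript
// Returns true/false when the output is a JSON object, or undefined when it is
// not usable JSON (the signal to fall back to the legacy probe).
function parseAuthStatus(stdout: string): boolean | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return undefined;
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  // Only an explicit boolean `true` counts as logged in.
  return (parsed as { loggedIn?: unknown }).loggedIn === true;
}

async function isAvailable(
  runAuthStatus: () => Promise<string>, // hypothetical: resolves with the command's stdout
  legacyProbe: () => Promise<boolean>,  // hypothetical: the old `-p "ping"` check
): Promise<boolean> {
  try {
    const verdict = parseAuthStatus(await runAuthStatus());
    if (verdict !== undefined) return verdict;
  } catch {
    // Older CLI: the command itself failed, so fall through to the legacy probe.
  }
  return legacyProbe();
}
```

Treating anything other than `loggedIn === true` as "not available" keeps a malformed or partial response from being mistaken for a valid login.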

Before/after the fix:

```
Before: 6800ms warm probe, 10s timeout, consumed tokens,
        failed under load → 9 consecutive false-positive "unavailable"
After:  150ms probe, 5s timeout, no tokens, structured JSON check
```

### ✨ New CLI command: `alvin-bot status`

Offline-friendly status command — no running bot required. Prints:

- **Version**: `Alvin Bot vX.Y.Z` + Node version + platform/arch
- **Data dir**: path + whether `.env` exists + configured `PRIMARY_PROVIDER`
- **Runtime state**:
  - On macOS: LaunchAgent plist installed? PID from `launchctl list`?
  - On Linux/Windows: `pm2 jlist` check for the `alvin-bot` process
- **Live info** (when the bot is running with the web UI on :3100): Uptime, active model

Answers the maintainer's request (translated from German): *"`alvin-bot status` in the terminal should also show the version"*. The command prints the version at the very top so it's the first thing you see.

Example:

```
🤖 Alvin Bot v4.8.3
   Node v25.9.0 · darwin/arm64

📁 Data dir:  ~/.alvin-bot
   .env:      ✅ present
   Provider:  claude-sdk

🚀 LaunchAgent: installed
   Running:    ✅ yes (PID 43589)
   Uptime:     0h 55m
   Model:      Gemma 4 E4B (Ollama)
```

### Tests

2 new test cases in `test/claude-sdk-provider.test.ts` cover the new flow:

- `claude auth status` returning `{loggedIn: true}` → `isAvailable()` returns `true`
- `claude auth status` returning `{loggedIn: false}` → `isAvailable()` returns `false`
- Older CLI where `auth status` throws → fall back to `-p "ping"` path (preserves old behavior)

87 tests passing (up from 85).

## [4.8.2] — 2026-04-11

### 🐛 Offline setup: wait long enough for Ollama's first-run init

Second follow-up to 4.8.0's offline-gemma4 wizard. The 4.8.1 brew path successfully installs Ollama, but the subsequent `ensureOllamaServe()` was reporting "Could not start Ollama daemon" because it only waited **2 seconds** after spawning the server.

What actually happens on first run:

1. `nohup ollama serve &` spawns the server process
2. Server generates a fresh SSH keypair at `~/.ollama/id_ed25519` (~1 s)
3. Server discovers GPUs — on Apple Silicon this initializes Metal (~5 s)
4. Server starts the runner subprocess (~1 s)
5. Server begins listening on `127.0.0.1:11434`

Total cold-start time: **5–15 seconds**. The old 2-second wait was racing ahead of GPU discovery and failing the next `ollama list` call.

Fix: `ensureOllamaServe()` now polls `ollama list` every second for up to **30 seconds**. On success it reports which attempt worked (for visibility). On failure it dumps the last 15 lines of `/tmp/ollama-setup.log` so users can see what Ollama itself said.
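
The polling loop can be sketched like this. It is a generic illustration, not the code in `bin/cli.js`: `pollUntilReady` and its parameters are assumed names, and the readiness check is injected (in the wizard it would wrap the `ollama list` call).

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `check` up to `maxAttempts` times, pausing `intervalMs` after each failure.
// Resolves with the 1-based attempt that succeeded, or null if none did.
async function pollUntilReady(
  check: () => Promise<boolean>,
  maxAttempts = 30,
  intervalMs = 1000,
): Promise<number | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A throwing check (e.g. connection refused) counts as "not ready yet".
    const ready = await check().catch(() => false);
    if (ready) return attempt;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  return null;
}
```

Returning the attempt number makes it easy to report which attempt worked, and a `null` result is the point at which the caller would print the log tail.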

Caught during the second run of the setup wizard on the fresh test MacBook — brew install succeeded, daemon was actually running (PID confirmed via pgrep), but the wizard bailed out anyway because it gave up too soon.

## [4.8.1] — 2026-04-11

### 🐛 Offline setup: Homebrew preferred on macOS

Caught during the first real run of the new offline setup wizard on a fresh test MacBook: the official Ollama `install.sh` script on macOS wants to drop `Ollama.app` into `/Applications` and start it as a GUI app. That requires a real user session with sudo and completely breaks over SSH or any non-interactive context. The install downloads the 25 MB .app, then fails at `Unable to find application named 'Ollama'` and drops the wizard back to the fallback provider picker.

Fix in `bin/cli.js` `installOllama()`:

- **macOS preferred path**: if Homebrew is available (`brew --version` succeeds), use `brew install ollama`. Brew installs `/opt/homebrew/bin/ollama` as a CLI binary with no sudo prompt, no /Applications drop, no GUI dependency — works over SSH and in any CI/non-interactive context.
- **Fallback**: if brew is not installed or `brew install` itself fails, fall through to the official `install.sh` with an explicit heads-up that the installer may prompt for admin password and may only work in a local terminal.
- **Better error messaging**: on macOS install failure, suggest `brew install ollama` or the `.dmg` from ollama.com/download as alternatives. On Linux, unchanged.

Linux always uses `install.sh` — systemd user units work non-interactively there.

## [4.8.0] — 2026-04-11

### ✨ Offline mode — Gemma 4 E4B via Ollama in the setup wizard

Fresh installs on a machine without any AI-provider key can now pick **Offline mode** as the first option in the setup wizard. It runs **Google Gemma 4 E4B** locally via Ollama — no API key, zero running cost, works 100% offline once downloaded.

New in `bin/cli.js`:

- `PROVIDERS[0]` is now `offline-gemma4`, labeled prominently with the `~10 GB one-time download` so users can't miss the size.
- `setupOfflineGemma4()` helper walks the user through:
  1. **Warning** about download size (15–70 min depending on connection) and on-disk footprint (~10 GB in `~/.ollama/models`)
  2. **Confirmation prompt** — if the user declines, the wizard loops back to the normal provider picker (no dead ends)
  3. **Ollama install** via the official `curl -fsSL https://ollama.com/install.sh | sh` if the `ollama` binary is missing
  4. **Daemon check** — ensures Ollama is listening, spawns it in the background if not
  5. **Cache check** — if `gemma4:e4b` is already pulled, skips the download
  6. **Model pull** with a second confirmation before the 10 GB actually starts, streaming progress output so the user sees every layer land
- `.env` gets `PRIMARY_PROVIDER=ollama`. The registry's Ollama preset in `src/providers/types.ts` already defaults to `gemma4:e4b`, so no extra environment variable is needed.

macOS + Linux only. Windows users get pointed at https://ollama.com/download.

### ✨ `/version` command + version display in `/status`

- New `/version` command in both **Telegram** and **TUI**. Shows `Alvin Bot vX.Y.Z · Node vN · platform/arch`. Registered in `setMyCommands` so Telegram shows it in the autocomplete menu.
- `/status` header on Telegram now reads `🤖 Alvin Bot vX.Y.Z` instead of just `Alvin Bot Status`.
- TUI `/status` header also carries the version.
- **Bug fix**: `/api/status` used to hard-code `version: "3.0.0"` (a leftover from v3). It now reads `BOT_VERSION` dynamically, so the TUI and Web UI see the actual running version.

Implementation: new `src/version.ts` module reads `package.json` once at module load, exports `BOT_VERSION` as a const. Path resolution uses `import.meta.url` so the cwd can't break it.
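
A rough sketch of cwd-independent path resolution. The helper names and the `../package.json` layout (module one directory below the package root) are assumptions for illustration; the real `src/version.ts` may differ.

```typescript
import { readFileSync } from "node:fs";
import { fileURLToPath } from "node:url";

// Resolve package.json relative to the module's own URL, not process.cwd().
function packageJsonPath(moduleUrl: string): string {
  return fileURLToPath(new URL("../package.json", moduleUrl));
}

// Read the version field, with a placeholder if it is missing.
function readVersion(pkgPath: string): string {
  const pkg = JSON.parse(readFileSync(pkgPath, "utf8")) as { version?: string };
  return pkg.version ?? "unknown";
}

// In an ES module this would run once at load time, for example:
//   export const BOT_VERSION = readVersion(packageJsonPath(import.meta.url));
```

Because the lookup starts from the module URL, starting the bot from a different working directory cannot change which `package.json` is read.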

### 🐛 `alvin-bot launchd install` preserves other pm2 projects

The initial 4.7.0 release called `pm2 kill` during `launchd install` to stop the pm2 daemon. That's wrong for users who have **other** pm2-managed projects (e.g. `another-pm2-project`) alongside `alvin-bot` — their other work would go down with the switch.

New behavior in `bin/cli.js`:

- Parse `pm2 jlist` JSON to detect (a) whether `alvin-bot` is pm2-managed and (b) whether any other pm2 projects exist.
- Only run `pm2 delete alvin-bot` — never `pm2 kill`. The daemon keeps running for the other projects.
- Post-install hint is smarter:
  - **pm2 now empty** → *"pm2 now has zero managed processes. Remove it with: `npm uninstall -g pm2`"*
  - **pm2 still has other projects** → *"pm2 still has other projects running — leaving it installed."*
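
The detection step can be sketched as a pure function over the `pm2 jlist` output. `summarizePm2` and `Pm2Summary` are hypothetical names; the sketch only relies on each entry in the JSON array carrying a `name` field.

```typescript
interface Pm2Summary {
  alvinManaged: boolean;   // is our own process registered in pm2?
  otherProjects: string[]; // names of everything else pm2 manages
}

function summarizePm2(jlistOutput: string, ownName = "alvin-bot"): Pm2Summary {
  let entries: unknown;
  try {
    entries = JSON.parse(jlistOutput);
  } catch {
    entries = []; // unparsable output is treated as "nothing managed"
  }
  const names: string[] = Array.isArray(entries)
    ? entries
        .map((entry) => (entry as { name?: unknown } | null)?.name)
        .filter((name): name is string => typeof name === "string")
    : [];
  return {
    alvinManaged: names.includes(ownName),
    otherProjects: names.filter((name) => name !== ownName),
  };
}
```

With this summary, the installer would run `pm2 delete alvin-bot` only when `alvinManaged` is true and choose the post-install hint from whether `otherProjects` is empty.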

Caught immediately after 4.7.0 shipped when the maintainer pointed out his Mac mini has `another-pm2-project` in pm2 alongside `alvin-bot` and didn't want it touched.

## [4.7.0] — 2026-04-11

### ✨ Sub-Agents Stufe 2 — live-stream, bounded queue, 24h stats

Stufe 2 (stage 2) of the sub-agents refinement spec lands alongside the same-day 4.6.0 release. Everything here builds on the Stufe 1 foundation and is fully unit-tested (85 passing tests).

#### A4 Live-Stream for user-spawns

`/subagents visibility live` enables a new delivery mode where user-spawned sub-agents stream their text incrementally into a single Telegram message, then post a completion banner as a separate message.

Implementation in `src/services/subagent-delivery.ts`:

- `LiveStream` class with `start()` / `update()` / `finalize()`
- `start()` posts an initial `⏳ <name> thinking…` placeholder and records its `message_id`
- `update()` is called on every text chunk from the agent's generator; it coalesces rapid updates via a throttle window of **800 ms** so we never exceed Telegram's edit rate limit. Multiple `update()` calls within the window collapse into a single edit with the latest accumulated text.
- `finalize()` flushes any pending text, replaces the `thinking…` header with the final body, then sends a new banner message so the user gets a completion notification (edits don't trigger push notifications).
- The live-stream message uses **plain text** (no `parse_mode`) so half-formed markdown during streaming can never cause an edit to be rejected. The final banner does use markdown.
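
The coalescing behaviour can be illustrated with a small sketch. It is a simplification, not the real `LiveStream` class: `EditCoalescer` is a hypothetical name, and time is passed in explicitly instead of using timers so the logic stays easy to follow.

```typescript
class EditCoalescer {
  private pending: string | null = null;
  private lastEditAt = Number.NEGATIVE_INFINITY;
  private readonly edit: (text: string) => void;
  private readonly windowMs: number;

  constructor(edit: (text: string) => void, windowMs = 800) {
    this.edit = edit;
    this.windowMs = windowMs;
  }

  // Remember the latest accumulated text; edit right away only if the window has elapsed.
  update(text: string, now: number): void {
    this.pending = text;
    if (now - this.lastEditAt >= this.windowMs) this.flush(now);
  }

  // Send whatever is pending. A real implementation would also call this from a
  // trailing timer and from finalize().
  flush(now: number): void {
    if (this.pending === null) return;
    const text = this.pending;
    this.pending = null;
    this.lastEditAt = now;
    this.edit(text);
  }
}
```

Because only the latest text is kept, any number of `update()` calls inside one window result in at most one edit carrying the newest content.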

Wiring in `runSubAgent`:

- Detects `effectiveVisibility === "live"` AND `source === "user"` AND `parentChatId`. Cron and implicit spawns are never live-streamed — cron because there's no interactive watcher, implicit because the parent Claude stream already shows everything inline.
- Creates the `LiveStream` via `createLiveStream()` before the for-await loop.
- Calls `liveStream.update(chunk.text)` on every text chunk.
- Calls `liveStream.finalize(info, result)` after the loop and marks `entry.delivered = true` so `spawnSubAgent.finally()` skips the regular `deliverSubAgentResult` path. If finalize fails, the `delivered` flag stays false and the normal banner delivery fires as a fallback.
- Falls back to `"banner"` mode transparently if the bot API doesn't support `editMessageText` (e.g. during tests or if `attachBotApi` was never called).

Tests added in `test/subagent-delivery.test.ts`:

- `start` posts an initial placeholder and stores the message_id
- `update` coalesces rapid calls into a single throttled edit within the 800 ms window
- `finalize` posts a banner as a new message
- `createLiveStream` returns `null` when `editMessageText` is missing

#### D3 Bounded priority queue

Previously, hitting `maxParallel` returned a hard reject. Now spawn requests that don't fit are placed in a **bounded priority queue**:

- Default cap: **20** slots (configurable via `/subagents queue <n>`, clamped to 0–200)
- Setting cap to 0 disables the queue entirely and restores the old reject-on-full behavior
- Priority order on drain: **user > cron > implicit**
- FIFO within each priority class
- Drains automatically when a running agent finishes: `runSubAgent.finally()` now calls `drainQueue()` after cleanup
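
The queue semantics listed above can be sketched as follows. This is a standalone illustration, not the code in `subagents.ts`: `BoundedPriorityQueue` and `clampQueueCap` are hypothetical names, and treating `NaN` as 0 is an assumption.

```typescript
type Source = "user" | "cron" | "implicit";

// Drain order from highest to lowest priority.
const DRAIN_ORDER: readonly Source[] = ["user", "cron", "implicit"];

// Clamp to 0..200 and floor fractional values.
function clampQueueCap(n: number): number {
  if (Number.isNaN(n)) return 0;
  return Math.min(200, Math.max(0, Math.floor(n)));
}

class BoundedPriorityQueue<T> {
  private readonly lanes: Record<Source, T[]> = { user: [], cron: [], implicit: [] };
  private readonly cap: number;

  constructor(cap = 20) {
    this.cap = clampQueueCap(cap);
  }

  get size(): number {
    return DRAIN_ORDER.reduce((total, source) => total + this.lanes[source].length, 0);
  }

  // Returns false when the queue is full (a cap of 0 always rejects).
  enqueue(source: Source, item: T): boolean {
    if (this.size >= this.cap) return false;
    this.lanes[source].push(item);
    return true;
  }

  // Highest-priority lane first, FIFO within a lane.
  pop(): T | undefined {
    for (const source of DRAIN_ORDER) {
      const lane = this.lanes[source];
      if (lane.length > 0) return lane.shift();
    }
    return undefined;
  }
}
```

One lane per priority class keeps both the FIFO guarantee and the priority scan trivial, at the cost of summing three lengths to check the cap.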

New fields:

- `SubAgentsConfig.queueCap: number` — persisted in `~/.alvin-bot/sub-agents.json`
- `SubAgentInfo.status: "queued"` — new valid state
- `SubAgentInfo.queuePosition?: number` — 1-based position in the queue, shown in `/subagents list` as `#N`

Functions in `subagents.ts`:

- `getQueueCap()` / `setQueueCap(n)` — public config accessors
- `drainQueue()` — called from `runSubAgent.finally()`, pops in priority order and transitions entries from `queued` to `running`
- `popHighestPriorityQueued()` — internal FIFO-per-priority scan
- `reindexQueue()` — keeps `SubAgentInfo.queuePosition` in sync after pop/cancel
- `cancelSubAgent()` now handles queued entries by removing them from the queue without starting `runSubAgent` at all
- `cancelAllSubAgents()` clears the pending queue before cancelling running agents, so shutdown doesn't spawn anything new
- `spawnSubAgent()` is split: queue decision first (run immediately vs queue vs reject), then `startRun()` helper starts the background loop

Reject messages stay priority-aware (D4) but now mention queue saturation:

- `user` spawn + pool full + cron/implicit in pool + queue full → *"Alle Slots belegt (N/M), davon X cron/implicit im Hintergrund. Queue voll (Q/C). /subagents list für Details …"*
- `user` spawn + pool full + user in pool + queue full → *"Alle Slots belegt (N/M) mit eigenen user-Spawns. Queue voll (Q/C). /subagents cancel <name> oder warten."*
- Non-user spawns + pool + queue full → *"Sub-agent limit reached (N running, Q/C queued). Wait for a running agent to finish or cancel one."*

Tests added in `test/subagents-queue.test.ts`:

- Default cap is 20
- Clamping (negative → 0, above 200 → 200, fractional floors)
- Round-trip through disk
- Third spawn at full pool lands as `status: "queued"` with `queuePosition: 1`
- Queue drains automatically when a running agent finishes
- Priority order: user spawns drain before cron at the same moment
- `cancelSubAgent` removes a queued entry

The existing priority-reject tests now explicitly set `queueCap = 0` to test the old reject path, and a new "queue enabled" test fills both pool and queue before asserting the reject message.

#### H3 24-hour run stats

New module `src/services/subagent-stats.ts` — a simple append-only JSON ring buffer persisted to `~/.alvin-bot/subagent-stats.json`. Each completed sub-agent run appends one entry:

```ts
{
  completedAt: number;
  name: string;
  source: "user" | "cron" | "implicit";
  status: "completed" | "timeout" | "error" | "cancelled";
  durationMs: number;
  inputTokens: number;
  outputTokens: number;
}
```

On every load or append, entries older than 24 hours are pruned. A hard cap of 5000 entries protects against unbounded growth on high-frequency bots.
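
The pruning rule might look roughly like this. `pruneRuns` is a hypothetical helper; it assumes entries are stored in append (chronological) order, so dropping from the front removes the oldest, and it keeps an entry that is exactly 24 hours old.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_ENTRIES = 5000;

// Drop entries older than 24 hours, then enforce the hard cap by keeping the newest.
function pruneRuns<T extends { completedAt: number }>(entries: T[], now: number): T[] {
  const fresh = entries.filter((entry) => now - entry.completedAt <= DAY_MS);
  return fresh.length > MAX_ENTRIES ? fresh.slice(fresh.length - MAX_ENTRIES) : fresh;
}
```

Running the same function on both load and append means the file can never hold stale data for longer than one write cycle.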

Accessors:

- `recordSubAgentRun(info, result)` — called from `runSubAgent.finally()` as a non-blocking side effect. Errors are logged but don't affect delivery.
- `getSubAgentStats()` — returns a `StatsSummary` with totals, per-source breakdown, and per-status counts.

New Telegram command **`/subagents stats`** renders the summary:

```
📊 Sub-Agent Stats — last 24h

Total: 44 runs · 165k in / 89k out · 12m

By source:
  👤 user:     12 runs · 45k in / 22k out
  ⏰ cron:      8 runs · 31k in / 15k out
  🔗 implicit: 24 runs · 89k in / 52k out

By status:
  ✅ completed: 42
  ⚠️ cancelled: 1
  ⏱️ timeout:   0
  ❌ error:     1
```

The JSON backing file is a deliberate short-term choice. When the SQLite migration lands (already scoped in a separate memory entry as `project_alvinbot_sqlite_migration.md`), we swap the backend without touching `getSubAgentStats()` or `recordSubAgentRun()` — both are designed as a narrow interface.

Tests added in `test/subagent-stats.test.ts`:

- Fresh install returns zeros
- Recording 3 runs updates totals + per-source breakdown
- Persistence + reload round-trip
- Entries older than 24h are pruned on load
- `byStatus` tracks cancelled/error/timeout separately

### 🖥 CLI: `alvin-bot start` / `stop` now auto-detect LaunchAgent

The `start` and `stop` commands previously always went through pm2. That created a conflict after `alvin-bot launchd install`: the LaunchAgent ran the bot, but `alvin-bot start` would happily spawn a second instance via pm2, and `alvin-bot stop` would try to stop a pm2 process that didn't exist.

Now both commands check for `~/Library/LaunchAgents/com.alvinbot.app.plist` on macOS and switch transparently:

- **`alvin-bot start`** with a LaunchAgent present → `launchctl kickstart -k gui/$UID/com.alvinbot.app` (or `launchctl load -w` if not loaded yet). No pm2 involvement.
- **`alvin-bot stop`** with a LaunchAgent present → `launchctl unload -w` (doesn't remove the plist, just stops the daemon).
- **`alvin-bot start`** on macOS without a LaunchAgent → pm2 path + a helpful tip: *"💡 Tip: on macOS with Claude Code, switch to launchd for automatic Keychain access: alvin-bot launchd install"*.

Linux and Windows users are unaffected — they always get the pm2 path.

### 🐛 Other

- `/subagents queue` is registered in the usage string for en/de/es/fr.
- `/subagents stats` is registered in the usage string for en/de/es/fr.
- `/subagents visibility` usage now lists `live` as a valid mode.
- Removed the leftover `alvin-bot-4.5.1.tgz` from the repo root.

## [4.6.0] — 2026-04-11

### ✨ Sub-Agents Stufe 1 — context-aware delivery, name-first addressing, shutdown notifications

**The big one.** Stufe 1 (stage 1) of the SubAgents refinement spec (9 design axes, two-stage rollout) is complete. Everything here is live-validated on a remote test MacBook via `@Alvin_testbot_bot` over Telegram with Claude Agent SDK + Max OAuth.

#### A4 + I3 — Source-aware delivery router

New module `src/services/subagent-delivery.ts`. Every completed sub-agent routes through a single entry point that picks its delivery path based on `SubAgentInfo.source`:

- `implicit` (Main-Claude calling the SDK `Task` tool) → **no-op**, the parent stream already shows the result.
- `user` (explicit user spawn) → **banner + final** to `parentChatId` in the originating chat.
- `cron` (scheduled job) → **banner + final** to the `chatId` from the cron job's target.

The banner format is fixed: `{icon} *{name}* {status} · {duration} · {input_tokens} in / {output_tokens} out` followed by the agent output. Status icons: ✅ completed, ⚠️ cancelled, ⏱️ timeout, ❌ error. Duration is human-formatted (`42s`, `3m 12s`). Token counts collapse at 1000 (`4.2k`).
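
A sketch of how such a banner could be assembled. The function names are illustrative, and the rounding choices (whole seconds, one decimal for thousands with a trailing `.0` dropped) are assumptions consistent with the examples above rather than confirmed behavior.

```typescript
const ICONS = { completed: "✅", cancelled: "⚠️", timeout: "⏱️", error: "❌" } as const;
type Status = keyof typeof ICONS;

// 42000 -> "42s", 192000 -> "3m 12s"
function formatDuration(ms: number): string {
  const totalSeconds = Math.round(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes === 0 ? `${seconds}s` : `${minutes}m ${seconds}s`;
}

// 999 -> "999", 4200 -> "4.2k", 45000 -> "45k"
function formatTokens(count: number): string {
  if (count < 1000) return String(count);
  return `${(count / 1000).toFixed(1).replace(/\.0$/, "")}k`;
}

function formatBanner(
  name: string,
  status: Status,
  durationMs: number,
  inputTokens: number,
  outputTokens: number,
): string {
  return (
    `${ICONS[status]} *${name}* ${status} · ${formatDuration(durationMs)} · ` +
    `${formatTokens(inputTokens)} in / ${formatTokens(outputTokens)} out`
  );
}
```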

Output chunking:
- ≤3800 chars → single message `banner + body`
- 3800–20000 chars → banner alone, then body chunks of 3800 chars each
- \>20000 chars → banner + the body as a `.md` file upload (via `grammy`'s `InputFile`)

The bot API is attached lazily at startup via `attachBotApi()` so `subagent-delivery.ts` stays free of a circular import on `index.ts`. Test hook `__setBotApiForTest()` lets Vitest inject a fake.
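
The three output-chunking tiers listed above can be sketched as a pure planning step. `planDelivery` and `DeliveryPlan` are hypothetical names, and measuring the thresholds against the body length alone (without the banner) is an assumption.

```typescript
type DeliveryPlan =
  | { kind: "single"; text: string }
  | { kind: "chunks"; banner: string; chunks: string[] }
  | { kind: "file"; banner: string; fileBody: string };

const CHUNK_SIZE = 3800;
const FILE_THRESHOLD = 20000;

function planDelivery(banner: string, body: string): DeliveryPlan {
  // Short output: banner and body travel in one message.
  if (body.length <= CHUNK_SIZE) {
    return { kind: "single", text: `${banner}\n\n${body}` };
  }
  // Very long output: banner plus the body as a file upload.
  if (body.length > FILE_THRESHOLD) {
    return { kind: "file", banner, fileBody: body };
  }
  // In between: banner alone, then fixed-size body chunks.
  const chunks: string[] = [];
  for (let offset = 0; offset < body.length; offset += CHUNK_SIZE) {
    chunks.push(body.slice(offset, offset + CHUNK_SIZE));
  }
  return { kind: "chunks", banner, chunks };
}
```

Slicing by string length can split a multi-unit character (such as an emoji) across two chunks; a production version might prefer to break at line boundaries.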

#### New command: `/subagents visibility <auto|banner|silent>`

Per-install persistent visibility setting, written to `~/.alvin-bot/sub-agents.json`. `silent` suppresses the delivery entirely — the result is still stored in the `activeAgents` map and pullable via `/subagents result <name>`. `auto` is the default and falls through to the source-based routing described above.

#### B2 — Name-first addressing with automatic `#N` collision suffixes

`/subagents cancel <name|id>` and `/subagents result <name|id>` now accept names, not just UUIDs. When a new spawn collides with an existing name, the resolver appends `#2`, `#3`, … using the smallest free index. Example: three parallel `review` spawns appear as `review`, `review#2`, `review#3` in `/subagents list`.
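
The smallest-free-index rule can be sketched in a few lines. `allocateName` is a hypothetical helper, not the resolver's real API.

```typescript
// Returns `base` if it is free, otherwise `base#N` with the smallest free N >= 2.
function allocateName(base: string, taken: ReadonlySet<string>): string {
  if (!taken.has(base)) return base;
  for (let n = 2; ; n++) {
    const candidate = `${base}#${n}`;
    if (!taken.has(candidate)) return candidate;
  }
}
```

Using the smallest free index means a finished `review#2` frees its slot for the next spawn instead of the suffix growing without bound.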

Resolution order:
1. Explicit `#N` suffix (e.g. `review#2`) → exact match wins, never falls through to ambiguity
2. Base name with a single sibling → that sibling
3. Base name with multiple siblings **and** `ambiguousAsList: true` opt-in → disambiguation reply listing all candidates
4. Base name with multiple siblings, no opt-in → first sibling
5. No name match → UUID prefix (back-compat)

#### C3 — Parent inheritance

Sub-agents now inherit `workingDir` (with `inheritCwd: false` opt-out), `CLAUDE.md` (via `settingSources: ["project"]`), and the registry's provider/model. Conversation history is **not** inherited — the sub-agent reads only its own prompt, which forces clean, self-describing spawn requests and keeps parallel agents from colliding on shared context.

#### D4 — Priority-aware reject messages

Pool is still strictly capped (no preemption), but the error message when it's full now depends on who holds the slots:
- User spawn + background (cron/implicit) hold slots → message points at `/subagents list` so the user knows the pool isn't stuck on another interactive task
- User spawn + other user spawns → suggests cancel-or-wait with command hints
- Cron/implicit rejects → generic "limit reached" (those callers handle retry themselves)

#### E2 — Shutdown notifications

`cancelAllSubAgents(notify: true)` is now async and sends a delivery for each still-running agent before the process exits. Each notification is a synthetic `cancelled` result with the body `⚠️ Agent wurde durch Bot-Restart unterbrochen. Bitte neu triggern.` ("Agent was interrupted by a bot restart. Please trigger it again.") and routes through the normal I3 delivery path. The total delivery phase is capped at 5s so a hanging Telegram send can't block shutdown.

The shutdown hook in `src/index.ts` now `await`s `cancelAllSubAgents(true)` before stopping the grammy bot and tearing down plugins.

#### F2 — Depth cap (hard limit = 2)

`SubAgentConfig.depth` is a new optional field (defaults to 0 = root). `spawnSubAgent` rejects any depth > 2 with a clear error. The depth shows in `/subagents list` as `d0` / `d1` / `d2` with 2-space indentation per level, so nested scatter-gather runs are visually nested.

#### G1 — Toolset preset infrastructure

New `SubAgentConfig.toolset` field with a single valid value `"full"`. Runtime validation rejects any other string. This is purely infrastructure for future `"readonly"` / `"research"` presets — no behavior change today, but adding a preset later is a one-line diff.

#### H2 — Per-run token accounting in the banner

Every completed sub-agent's banner carries the input/output token counts it actually consumed. No aggregation (H3) — that comes later with the SQLite migration. For now, you can see "this agent cost me 4.2k/2.1k" right next to the result.

#### Tests

67 passing Vitest tests across 12 files. New test files added for this release:
- `test/claude-sdk-provider.test.ts` — auth probe + `isAuthErrorOutput` helper
- `test/subagents-depth.test.ts` — depth cap (F2)
- `test/subagents-inheritance.test.ts` — cwd inheritance (C3)
- `test/subagents-toolset.test.ts` — toolset literal (G1)
- `test/subagents-name-resolver.test.ts` — `findSubAgentByName` including regression for exact-match vs ambiguity
- `test/subagents-commands.test.ts` — `cancelSubAgentByName`/`getSubAgentResultByName` helpers
- `test/subagent-delivery.test.ts` — I3 delivery router (all 5 source/visibility paths)
- `test/subagents-shutdown.test.ts` — E2 notify=true / notify=false + regression for shutdown double-delivery
- `test/subagents-priority-reject.test.ts` — D4 priority-aware reject messages
- `test/subagents-config.test.ts` — expanded with visibility config round-trip

### 🖥 New CLI: `alvin-bot launchd install|uninstall|status` (macOS only)

**Why this matters.** Claude Code 2.x stores the Max-subscription OAuth token in the macOS Keychain, service `"Claude Code-credentials"`. Accessing the token requires:
1. A Keychain ACL that permits the `claude` binary (granted via the "Always Allow" dialog on first GUI invocation)
2. An *unlocked* Keychain in the calling process's security context

Processes started via SSH, pm2, or `nohup` run in a detached launchd session that does **not** inherit the GUI user's unlocked-Keychain state. Even a manual `security unlock-keychain -p '...'` only unlocks the current SSH session — the pm2 daemon running in its own context stays locked out. Result: the Bot saw `Not logged in · Please run /login` on every sub-agent query, and the fix in 4.6.0's Phase 0 exposes that as a clean error instead of leaking it as chat text.

**The fix**: run the bot as a **launchd user agent**. LaunchAgents run inside the GUI login session and inherit the unlocked Keychain automatically. No SSH dance, no pm2 drama, no manual unlocks on every restart.

```
alvin-bot launchd install    — Write ~/Library/LaunchAgents/com.alvinbot.app.plist,
                                unload any existing instance, launchctl load -w.
alvin-bot launchd uninstall  — Unload and rm the plist.
alvin-bot launchd status     — Plist existence, PID from `launchctl list`,
                                tail of ~/.alvin-bot/logs/alvin-bot.{out,err}.log.
```

Plist details:
- `KeepAlive` → auto-restart on crash, not on successful exit
- `RunAtLoad` → starts on login
- `ThrottleInterval 10` → prevents rapid restart loops
- `PATH` covers `~/.local/bin`, `/opt/homebrew/bin` (Apple Silicon), `/usr/local/bin` (Intel Homebrew)
- stdout → `~/.alvin-bot/logs/alvin-bot.out.log`
- stderr → `~/.alvin-bot/logs/alvin-bot.err.log`

macOS users should migrate from `alvin-bot start` (pm2) to `alvin-bot launchd install`. Pm2 still works and remains the Linux/Windows default.

### 🐛 Bug fixes

- **`ClaudeSDKProvider.isAvailable()` now actually probes authentication.** The old check only ran `claude --version`, which succeeds whether or not the CLI has a valid OAuth token. A locked-out CLI would be reported as available, and the `Not logged in` response would leak into the chat as a normal assistant message. New behavior: `claude --version` for the binary check, then `claude -p "ping"` to verify auth. If the output matches the "Not logged in" pattern, the provider reports `false` and the registry falls through to the next provider.

- **`ClaudeSDKProvider.query()` surfaces `Not logged in` as an error chunk.** Even in code paths where `isAvailable()` returned stale cache, a runtime failure during the stream would emit `Not logged in · Please run /login` as text. The query loop now detects the auth pattern on the first text chunk and yields a typed `error` chunk with a clear "Run `claude login`" message, instead of pretending it's a normal response.

- **`/subagents cancel|result <name#N>` now hits the exact entry.** Regression caught during the remote test: asking for `test-ping#2` returned the "Mehrdeutig — welchen meinst du?" ("Ambiguous: which one do you mean?") ambiguity reply instead of the specific `#2` entry, because `findSubAgentByName` checked base-name siblings before the exact-name match when `ambiguousAsList: true` was set. Explicit `#N` queries now always win.

- **Shutdown double-delivery race fixed.** If the bot received SIGTERM while a sub-agent was mid-stream, Telegram saw two messages: a "completed · (empty output)" banner from `runSubAgent.finally()` (because the test generator exited gracefully after the abort), followed by the "cancelled · Bot-Restart" banner from `cancelAllSubAgents`. Fixed with a `delivered: boolean` flag on each `activeAgents` entry — whoever posts first sets it, the other skips.

- **`providerKeyMap` alignment in `src/index.ts`.** The pre-flight provider-key warning used `gemini-2.5-flash` as the map key, but the registry registers Google Gemini under `google`. Users who set `PRIMARY_PROVIDER=google` never saw the "GOOGLE_API_KEY missing" warning. Fixed by canonical `google → GOOGLE_API_KEY`; legacy custom-model aliases stay for rollback safety.

- **`cron.ts` ai-query triple-notification cleanup.** A single failed ai-query cron job was sending three legacy error messages (`slow-fox: cancelled — cancelled`, `AI-Query Error (slow-fox)`, `Cron Error (slow-fox)`) because the failure path fired `notifyCallback` in the inner `if`, the inner `catch`, and the outer `catch`. The I3 delivery router already posts the cancellation banner for ai-query jobs, so all three legacy notify calls are now skipped and ai-query errors propagate via the outer catch for bookkeeping only. Other job types (reminder, shell, http, message) keep the legacy notify path.

- **`/subagents` now shows up in Telegram's command autocomplete.** The grammy handler was registered from v4.0.0 but `setMyCommands` never listed it, so users had to know the exact spelling. Added.

### 📚 Documentation

- New English-language handbook at `docs/HANDBOOK.md` — covers installation, architecture, all providers, the sub-agents system, cron jobs, platform adapters, security audit, and the web UI. Written to be readable standalone without cross-referencing the README.
- README.md updated with a pointer to the handbook and the new `launchd` command.

## [4.5.1] — 2026-04-09

### 🐛 TUI Header Rendering Hotfix

**The header was appearing inline in the middle of the conversation after scrolling** — a follow-up bug to the 4.5.0 TUI fix. Reported from a live 4.5.0 Test MacBook session where the header popped up right after a long bot response.

**Root cause**: `redrawHeader()` in 4.5.0 used `\x1b[H` (move to top-left) + `\x1b[s`/`\x1b[u` (save/restore cursor) to update the header in place when cost/model/target changed. But `\x1b[H` resolves to the **current viewport top**, not the document top — and once the terminal has scrolled past the original header, the "viewport top" is somewhere in the middle of the conversation. So the header got re-rendered inline in the middle of the bot's output.

**Fix**: removed all `redrawHeader()` calls from mid-session code paths:
- `ws.on("open")` (connect): no redraw, header was already drawn at startup
- `ws.on("close")` (disconnect): no redraw, just the error message
- `case "done"` (after each bot response): no redraw (this was the primary bug site — it fired after every message)
- `case "model"` (model switch): no redraw, just a success info line
- `case "target tui|telegram"` (target switch): no redraw, just an info line
- `process.stdout.on("resize")`: no redraw, just re-renders the prompt line

The only remaining `redrawHeader()` call is inside `/clear`, which calls `console.clear()` first to wipe the whole buffer — the only context where an in-place redraw is safe.

The trade-off: the header no longer reflects live cost/model/target updates mid-session. You'll see the up-to-date values after the next `/clear` or on the next TUI start. In exchange, the conversation flow stays clean. A future release could add a proper status-line region using terminal scrolling regions if this becomes annoying.

## [4.5.0] — 2026-04-09

### 🐛 TUI Bug Fixes (critical — the old TUI was effectively unusable)

**Double-character echo fixed** — Every keystroke in `alvin-bot tui` appeared twice (typing `hello` showed as `hheelllloo`). Root cause: `src/tui/index.ts` called `process.stdin.setRawMode(false)` alongside `readline.createInterface({ terminal: true })`. readline with `terminal: true` already controls the tty mode for its own line editor; forcing cooked mode on top of that makes both the terminal AND readline echo every keystroke. Removed the explicit `setRawMode(false)` call and let readline manage the tty state itself.

**Cursor chaos fixed** — The old `redrawHeader()` function wrote `\x1b7` / `\x1b8` (save/restore cursor) escape sequences that raced with readline's internal cursor tracking, producing garbled output mid-stream. The header redraw is now a no-op during active streaming and uses readline's own `cursorTo`/`clearLine` helpers otherwise — cooperating with readline instead of fighting it.

**Prompt state machine consolidated** — `showPrompt()` was called at ~7 different places, each re-rendering the prompt at potentially racy moments. It is now the single source of truth and no-ops during streaming. Every helper (`printUser`, `printAssistantStart`, `printInfo`, `printError`, `printSuccess`, `printTool`) calls `clearCurrentLine()` first, so the input line is always cleanly wiped before output is written above it.

**Terminal resize handling** — Added `process.stdout.on("resize", …)` so the header redraws correctly when the user resizes the window (when safe).

### ✨ New Feature: Parallel Observation + Session Routing

**The big one.** Before 4.5.0, the TUI/Web-UI shared the exact same session as the Telegram bot (both keyed to `config.allowedUsers[0]`). That meant `/new` in the TUI wiped the Telegram history, and the TUI had no visibility into live Telegram activity. This release cleanly separates the two and adds live mirroring in both directions.

#### New in-process broadcast bus — `src/services/broadcast.ts`

A tiny typed `EventEmitter` with four event types:
- `user_msg` — a user sent a message on a platform (Telegram, WhatsApp, etc.)
- `response_start` — the bot started generating a response
- `response_delta` — a streaming text chunk
- `response_done` — the response is complete

Fire-and-forget, zero backpressure, no history retention. The Telegram handler (`src/handlers/message.ts`) emits these events around its normal processing. The web server subscribes once at module load and fans each event out to every connected WebSocket client as a `mirror:*` message. The event signature is platform-agnostic, so WhatsApp/Signal can plug in later without architectural changes.
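
A minimal sketch of what such a bus could look like. Only the four event names come from this release; the payload fields, the `BroadcastBus` class name, and the hand-rolled listener table (used here instead of Node's `EventEmitter` to keep the snippet dependency-free) are assumptions for illustration.

```typescript
// Payload fields are hypothetical; only the event names are from the changelog.
type BroadcastEvents = {
  user_msg: { platform: string; userId: string; text: string };
  response_start: { platform: string; userId: string };
  response_delta: { platform: string; userId: string; delta: string };
  response_done: { platform: string; userId: string };
};

type AnyListener = (payload: unknown) => void;

class BroadcastBus {
  private readonly listeners: { [event: string]: AnyListener[] } = {};

  // Register a listener for one event type.
  on<K extends keyof BroadcastEvents>(
    event: K,
    listener: (payload: BroadcastEvents[K]) => void,
  ): void {
    const list = this.listeners[event] ?? (this.listeners[event] = []);
    list.push(listener as AnyListener);
  }

  // Fire-and-forget: call every listener synchronously, keep no history.
  emit<K extends keyof BroadcastEvents>(event: K, payload: BroadcastEvents[K]): void {
    (this.listeners[event] ?? []).forEach((listener) => listener(payload));
  }
}

// Usage: a subscriber that turns bus events into `mirror:*` messages.
const bus = new BroadcastBus();
const outbox: string[] = [];
bus.on("response_delta", (e) => {
  outbox.push(JSON.stringify({ type: "mirror:response_delta", delta: e.delta }));
});
bus.emit("response_delta", { platform: "telegram", userId: "42", delta: "Hel" });
```

Events with no subscriber are simply dropped, which is what "no history retention" implies for a late-connecting client.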

#### TUI session isolation

The TUI now owns its own session key, completely separate from the Telegram user's session:

- **Default**: `alvin-bot tui` → fresh ephemeral session `tui:ephemeral:<timestamp>`. Every TUI start is a clean slate.
- **Persistent**: `alvin-bot tui --resume` → resumes `tui:local`, a long-lived session that survives TUI restarts.

Your Telegram conversation and your TUI conversation no longer overwrite each other's history. `/new` in the TUI only resets the TUI session.
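
The key selection can be sketched as a small pure function. The two key shapes are taken from the list above; the function name, the argument handling, and the use of a millisecond epoch for `<timestamp>` are assumptions.

```typescript
// Session-key names come from the changelog; the flag parsing and the
// millisecond timestamp format are assumptions.
function resolveTuiSessionKey(args: string[], now: number = Date.now()): string {
  const resume = args.indexOf("--resume") !== -1;
  return resume ? "tui:local" : `tui:ephemeral:${now}`;
}
```

Because the ephemeral key embeds the start time, two TUI launches would not normally collide, while `--resume` always maps to the same long-lived key.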

#### `/target tui|telegram` — remote-control the Telegram session from TUI

New TUI command to switch where your typed messages go:

- **`/target tui`** (default) — Your messages go into the TUI's isolated session. Responses are rendered in the TUI only.
- **`/target telegram`** — Your messages enter the Telegram session (shared memory with whoever messages your bot on Telegram). The bot responds **both** in the TUI (via the open WebSocket) **and** in the actual Telegram chat (via the existing delivery queue). The active target is shown in the header as `→ Telegram` or `TUI session`.

Note: The Telegram Bot API does not allow bots to forge user messages, so your original prompt stays in the TUI and only the bot's response lands in Telegram. This is the closest possible equivalent to "remote typing into Telegram".
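
As a rough illustration of the routing rule, the sketch below picks a session key from an incoming WebSocket message. The `target` field and the fallback for clients that omit it follow this release's notes; the message shape and the function name are hypothetical.

```typescript
type Target = "tui" | "telegram";

// Hypothetical message shape; only the optional `target` field is from the changelog.
interface IncomingTuiMessage {
  text: string;
  target?: Target;
}

// `primaryTelegramKey` stands for the session keyed to `config.allowedUsers[0]`.
function pickSessionKey(
  msg: IncomingTuiMessage,
  tuiSessionKey: string,
  primaryTelegramKey: string,
): string {
  // Old clients send no target and keep using the primary Telegram session.
  return msg.target === "tui" ? tuiSessionKey : primaryTelegramKey;
}
```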

#### `/observe on|off` — mirror Telegram activity into the TUI

When observer mode is on (default), every Telegram user message and streaming bot response is mirrored into the TUI with distinct dim + `📱 Tel` styling. You can watch a live conversation happen from the TUI while running your own independent session in parallel. Toggle off with `/observe off` if the mirror noise gets in the way.

### 🧠 Architecture / Design Note

This feature deliberately does **not** go through the Claude Agent SDK or touch the `pathToClaudeCodeExecutable` flow. The broadcast bus is a pure observation layer in alvin-bot's own process, and session routing is just a different `sessionKey` lookup in the existing `getSession()` map. The bot's first-party auth behavior (CLI-backed session routing) is preserved exactly as before.

### 📦 Compatibility

This is a minor release (new feature), not a patch. No breaking changes to existing commands, existing behavior, or existing API endpoints. Old clients that don't send a `target` field continue to work exactly as before (falling back to the primary Telegram user's session).

```bash
npm update -g alvin-bot
alvin-bot tui               # fresh TUI session, observer on by default
alvin-bot tui --resume      # resume persistent tui:local session
```

Once inside the TUI:
- `/target telegram` — route your messages into the Telegram session (responses land in both TUI and Telegram chat)
- `/target tui`      — switch back to isolated TUI session (default)
- `/observe off`     — stop mirroring Telegram activity
- `/observe on`      — resume mirroring

## [4.4.7] — 2026-04-09

### 🔐 Security / Dependencies

**6 of 9 npm audit vulnerabilities fixed (non-breaking)** — Ran `npm audit fix` to patch the transitive `@xmldom/xmldom` XML injection, `basic-ftp` CRLF command injection, and `brace-expansion` DoS vulnerabilities. Also upgraded the direct dependency `@anthropic-ai/claude-agent-sdk` from `0.2.92` to `0.2.97` (latest, non-breaking patch release with no changes to the `query()` API surface Alvin-Bot uses).

Remaining unaddressed by design (each requires a breaking upgrade or an override):
- `@anthropic-ai/sdk` Memory Tool sandbox escape — **not exploitable** in Alvin-Bot because the `Memory` tool is not listed in `allowedTools` (we only use `Read`, `Write`, `Edit`, `Bash`, `Glob`, `Grep`, `WebSearch`, `WebFetch`, `Task`).
- `electron` (17 advisories) — waiting for a planned breaking upgrade to `electron@41.x`.

### ✨ Stability Improvements

**Session memory hygiene (`src/services/session.ts`)** — The in-memory `sessions` Map grew unbounded: every user that ever messaged the bot kept a full session object (including conversation history, cost breakdown, abort controller) forever. On a single-user bot like the maintainer's this is a non-issue; on any multi-user deployment it's a steady leak.

New behavior:
- **Conservative 7-day TTL**: a session is only eligible for cleanup after 7 full days of complete inactivity. Configurable via `ALVIN_SESSION_TTL_DAYS` env var.
- **Never touches active sessions**: the cleanup loop explicitly skips any session with `isProcessing === true`.
- **`lastActivity` touched on every `getSession()` call**: any interaction at all resets the inactivity clock, so a session in regular use is never purged.
- **Orphaned `abortController` cleanup** before removal (defensive).
- Runs hourly; logs a message when it actually purges something.

This is memory hygiene only — it cannot reduce Alvin-Bot's capabilities, permissions, or responsiveness. Active users see zero behavioral change.
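
The cleanup rules above can be sketched roughly as follows. This is not the code in `src/services/session.ts`: the `Session` shape is reduced to the fields the rules mention, a plain record stands in for the `sessions` Map, and the TTL is passed in rather than read from `ALVIN_SESSION_TTL_DAYS`.

```typescript
// Reduced, hypothetical session shape: only the fields the cleanup rules need.
interface Session {
  lastActivity: number; // epoch milliseconds of the last access
  isProcessing: boolean;
  abortController?: { abort(): void };
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Removes stale sessions in place and returns the purged keys for logging.
function purgeStaleSessions(
  sessions: { [key: string]: Session },
  ttlDays: number,
  now: number,
): string[] {
  const purged: string[] = [];
  Object.keys(sessions).forEach((key) => {
    const session = sessions[key];
    if (!session || session.isProcessing) return;               // never touch active sessions
    if (now - session.lastActivity < ttlDays * DAY_MS) return;  // still within the TTL
    session.abortController?.abort();                           // defensive cleanup
    delete sessions[key];
    purged.push(key);
  });
  return purged;
}
```

An hourly timer would call this with the configured TTL and log only when the returned list is non-empty.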

**MAX_BUDGET_USD tracking (`src/services/session.ts:trackProviderUsage`)** — The `MAX_BUDGET_USD` config was declared but never read anywhere. Now it's tracked as a **soft warning** (never a block):
- When a session's cumulative cost crosses 80% of the configured budget, a `⚠️  Session budget 80% consumed` message is logged.
- When it crosses 100%, a `💸 Session budget exceeded … bot continues (no hard limit enforced)` message is logged.
- **The bot never blocks** — the warnings exist purely as operator signals. `/new` resets the warning flags so subsequent sessions get fresh thresholds.
- `session.totalCost` is now correctly incremented (previously declared in the interface but never written to).
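
A hedged sketch of the soft-warning logic described above. The state shape, the function name, the log wording, and the choice of `>=` for "crosses" are all assumptions; the real logic lives in `trackProviderUsage`.

```typescript
// Hypothetical per-session budget state.
interface BudgetState {
  totalCost: number; // cumulative USD for the session
  warned80: boolean;
  warned100: boolean;
}

function trackSessionCost(
  state: BudgetState,
  costUsd: number,
  maxBudgetUsd: number,
  log: (message: string) => void,
): void {
  state.totalCost += costUsd;
  if (maxBudgetUsd <= 0) return; // assumption: zero or negative means "no budget set"

  const used = state.totalCost / maxBudgetUsd;
  if (used >= 1 && !state.warned100) {
    state.warned80 = true; // skip the 80% notice if both thresholds pass at once
    state.warned100 = true;
    log("session budget exceeded, bot continues");
  } else if (used >= 0.8 && !state.warned80) {
    state.warned80 = true;
    log("session budget 80% consumed");
  }
  // No return value and no throw: callers are never blocked.
}
```

Each threshold logs at most once per session because the flags stay set until something (such as `/new`) resets them.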

### 📦 Compatibility

No breaking changes. User-facing behavior is identical — same commands, same permissions, same response patterns. The only visible change is new log messages for cleanup events and budget thresholds.

```bash
npm update -g alvin-bot
```

## [4.4.6] — 2026-04-09

### 🐛 Bug Fixes

**`alvin-bot audit` now reads `.env` from `DATA_DIR`** — Before this release, `audit` was a subprocess that never loaded the bot's config: it only inspected `process.env`, which for an ad-hoc CLI invocation is the shell environment, not the bot's actual runtime state. Result: `ALLOWED_USERS` and `WEB_PASSWORD` were always reported as "not set" even when the bot was correctly configured and running. `audit` now calls `dotenv.config({ path: ENV_FILE })` at the start of `runAudit()` so its output matches `alvin-bot doctor` and the actual engine.

**`alvin-bot doctor` no longer hangs indefinitely on missing `.env`** — The CLI's `readline` interface was created eagerly at module load, which made `stdin` readable for the entire process lifetime. Commands like `doctor`, `audit`, `version` that have no interactive prompts would therefore never terminate — even though the `doctor()` function correctly early-returned when `.env` was missing, `node` refused to exit because the event loop still saw stdin as a live resource. Readline is now lazy-created only when `ask()` is actually called. Measured improvement: **doctor with missing .env terminates in 82 ms** (previously: 20+ second hang, often requiring Ctrl+C).
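
The lazy-creation pattern can be sketched like this. `LineReader` is a stand-in for the part of a readline interface the CLI needs, and `createReader` is a hypothetical factory; in the real CLI it would presumably wrap `readline.createInterface(...)`.

```typescript
// Minimal stand-in for the readline surface used by ask().
interface LineReader {
  question(prompt: string, callback: (answer: string) => void): void;
  close(): void;
}

function makeAsk(createReader: () => LineReader) {
  let reader: LineReader | null = null;
  return {
    // The reader is created on the first prompt, not at module load.
    ask(prompt: string, callback: (answer: string) => void): void {
      if (reader === null) reader = createReader();
      reader.question(prompt, callback);
    },
    // Closing is a no-op for commands that never prompted.
    close(): void {
      reader?.close();
      reader = null;
    },
  };
}
```

Commands that never call `ask()` never open stdin, so nothing keeps the event loop alive after they return.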

**`validateProviderKey("claude-sdk", …)` no longer false-negatives on Agent SDK auth** — The CLI's Claude check ran `claude auth status` and hard-failed on `loggedIn: false`. But the Claude Agent SDK has multiple auth paths that the CLI doesn't see: `ANTHROPIC_API_KEY` env var, Claude Code IDE sessions, and native-binary session cookies. Real-world example: a bot that was actively answering Telegram messages correctly was reported as "❌ Claude CLI not authenticated" by `doctor`. The validation is now:
- `ANTHROPIC_API_KEY` set → `ok: true` (immediate pass, CLI irrelevant)
- `claude` binary present + `auth status: loggedIn: true` → `ok: true`
- `claude` binary present + `auth status: loggedIn: false` → `ok: true` with a **warning** (the Agent SDK may still work via session / env var; user is advised to run `claude auth login` only if the bot fails to respond)
- `claude` binary missing → `ok: false` (hard error with install hint)

`doctor` now renders the warning as ⚠️ instead of ❌, making the output match actual behavior.
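
The four cases above reduce to a small decision function. The sketch below takes the probe results as plain booleans; the type names, field names, and message texts are illustrative, not the actual `validateProviderKey` signature.

```typescript
// Hypothetical inputs: the results of checking the env var and the CLI.
interface ClaudeAuthProbe {
  anthropicApiKeySet: boolean;
  claudeBinaryFound: boolean;
  cliLoggedIn: boolean;
}

interface ValidationResult {
  ok: boolean;
  warning?: string;
  error?: string;
}

function validateClaudeSdk(probe: ClaudeAuthProbe): ValidationResult {
  if (probe.anthropicApiKeySet) return { ok: true }; // CLI state is irrelevant
  if (!probe.claudeBinaryFound) {
    return { ok: false, error: "claude binary not found" };
  }
  if (probe.cliLoggedIn) return { ok: true };
  // Logged out in the CLI, but the Agent SDK may still authenticate another way.
  return { ok: true, warning: "CLI not logged in; run `claude auth login` if the bot stops responding" };
}
```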

### ✨ New Feature

**`alvin-bot setup --non-interactive` for CI, Docker, and scripted installs** — The interactive setup wizard was the only way to write `~/.alvin-bot/.env`, which blocked automated provisioning. Now supports flag-driven, non-interactive setup:

```bash
alvin-bot setup --non-interactive \
  --bot-token=123456789:AAE... \
  --allowed-users=12345,67890 \
  --primary-provider=claude-sdk \
  --fallback-providers=ollama \
  --groq-key=gsk_... \
  --google-key=AIza... \
  --openai-key=sk-... \
  --nvidia-key=nvapi-... \
  --anthropic-key=sk-ant-... \
  --openrouter-key=sk-or-... \
  --web-password=... \
  --platform=telegram \
  --skip-validation   # optional, skips the live Telegram getMe call
```

- Refuses to overwrite an existing `.env` (exits 1 with a clear message).
- Writes with mode `0600`.
- Validates `--bot-token` format and `--allowed-users` numeric format before writing.
- Pings Telegram `getMe` to verify the token, unless `--skip-validation` is passed.

`-y` and `--yes` work as aliases for `--non-interactive`.
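
For illustration, the pre-write format checks might look like the sketch below. The changelog only states that the two formats are validated; both regular expressions, the function name, and the error texts are assumptions.

```typescript
// Both patterns are assumptions about what "valid format" means here.
const BOT_TOKEN_PATTERN = /^\d+:[A-Za-z0-9_-]+$/; // "<numeric id>:<secret>"
const ALLOWED_USERS_PATTERN = /^\d+(,\d+)*$/;     // comma-separated numeric IDs

// Returns a list of problems; an empty list means the flags look well-formed.
function validateSetupFlags(flags: { botToken: string; allowedUsers: string }): string[] {
  const errors: string[] = [];
  if (!BOT_TOKEN_PATTERN.test(flags.botToken)) {
    errors.push("--bot-token has an unexpected format");
  }
  if (!ALLOWED_USERS_PATTERN.test(flags.allowedUsers)) {
    errors.push("--allowed-users must be comma-separated numeric IDs");
  }
  return errors;
}
```

Running checks like these before touching the filesystem means a malformed flag never produces a half-written `.env`.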

## [4.4.5] — 2026-04-09

### 🔐 Security / Information Disclosure

**`BACKLOG.md` removed from published tarball** — The project's internal roadmap was listed in `.gitignore` but not in `.npmignore`, so every `npm install -g alvin-bot` shipped an 8.7 KB file containing the full list of open P0/P1 issues, including known-but-unpatched security weaknesses (WebSocket auth gap, tool-executor sandbox gaps, Web UI HTTP-only, etc.). A published backlog of known vulnerabilities is effectively an attack roadmap for anyone inspecting the package.

`BACKLOG.md` is now listed in `.npmignore` alongside `CLAUDE.md`, `SOUL.md`, and `TOOLS.md`. Verified with `npm pack --dry-run`: the file no longer appears in the tarball.

Users on `4.4.4` or earlier should update:
```bash
npm update -g alvin-bot
```

## [4.4.4] — 2026-04-09

### 🔐 Security / Data Layout

**`.env` now lives only in `DATA_DIR`** — The `ENV_FILE` path constant in `src/paths.ts` has been moved from `BOT_ROOT/.env` to `DATA_DIR/.env` (e.g. `~/.alvin-bot/.env`). This fixes a latent drift bug affecting 6 code sites in `web/server.ts`, `web/setup-api.ts`, `web/doctor-api.ts`, and `services/fallback-order.ts`: before this release, the Web UI's Settings tab, the setup wizard, the doctor repair flow, and the `/fallback` sync were all writing to `BOT_ROOT/.env`, while the bot's config loader in `src/config.ts` reads from `DATA_DIR/.env` first. Changes made through any of those tools were silently written to a file the bot never reads (for globally-installed users, `BOT_ROOT` is inside `node_modules/alvin-bot/` and gets wiped on `npm update -g`).

Why this also matters for security: keeping `.env` inside the code repo is weak defense in depth. `.gitignore` can be edited, editors create swap files (`.env.swp`, `.env~`), `git add -f` bypasses ignores, backup tools sync whole project folders, and screensharing shows project directories. Secrets belong physically outside the repo.

**Automatic migration for legacy installs** — `src/migrate.ts` now copies a legacy `BOT_ROOT/.env` to `DATA_DIR/.env` on first run (only if the destination doesn't exist) and enforces `0600` mode regardless of the source permissions. `hasLegacyData()` now recognizes a stray `BOT_ROOT/.env` as a migration trigger. No action is required from existing users — the bot migrates itself.

### 📦 Compatibility

No breaking changes. Existing installs upgrade in place and are auto-migrated.

```bash
npm update -g alvin-bot
```

## [4.4.3] — 2026-04-09

### 🔐 Security
- **Sudo password storage** — Fixed a vulnerability where the sudo password was passed to `/usr/bin/security` as a command-line argument, making it briefly visible in `ps aux` output during keychain writes. Password is now piped via stdin using the documented `-w` prompt mode (must be the last option, and the password is supplied twice for the interactive prompt + confirmation). Byte-exact round-trip verified for arbitrary special characters.

### 🛠 Providers
- **Gemini auto-registration narrowed** — The Google Gemini chat provider is no longer registered automatically just because `GOOGLE_API_KEY` is set. It is now registered only when `google` is referenced as the primary provider or in the fallback chain. The environment variable is still used for other Google-powered features (e.g. `/imagine` image generation) without forcing Gemini onto the chat provider list.
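
The narrowed rule can be expressed as a predicate. This is a sketch only: the function name and parameters are hypothetical, and it additionally assumes that a key must be present for registration to make sense.

```typescript
// Register Gemini as a chat provider only when it is explicitly referenced.
function shouldRegisterGemini(
  googleApiKey: string | undefined,
  primaryProvider: string,
  fallbackProviders: string[],
): boolean {
  const referenced =
    primaryProvider === "google" || fallbackProviders.indexOf("google") !== -1;
  // Assumption: without a key there is nothing to register, even if referenced.
  return Boolean(googleApiKey) && referenced;
}
```

With a key set but `google` absent from both places, the predicate is false, which matches the case where the key exists only for features like `/imagine`.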

### 🧰 Tooling
- `package-lock.json` now tracks `package.json` version correctly.

## [2.2.0] — 2026-02-24

### 🔐 Security
- **Group approval system** — New groups must be approved by admin before bot responds
- `/groups` — Manage all groups with approve/block inline buttons
- `/security` — Toggle forwarded messages, auto-approve settings
- Blocked groups completely ignored (zero response)
- `data/access.json` persists approvals (gitignored)

### 🤖 Multi-Model
- **Provider abstraction layer** with unified interface
- **Fallback chain**: Claude SDK → Kimi K2.5 → Llama 3.3 70B (the two fallback models via NVIDIA NIM)
- `/model` — Switch models with inline keyboard buttons
- **Cost tracking per provider** in `/status`
- **Fallback notifications** — User sees ⚡ when provider switches

### 🧠 Memory
- **SOUL.md** — Customizable personality file, hot-reloadable via `/reload`
- **Memory service** — Auto-writes session summaries to daily logs on `/new`
- Non-SDK providers get memory context injected into system prompt
- `/memory` — View memory stats

### 🎨 Rich Interactions
- **Emoji reactions**: 🤔 thinking, 🎧 listening, 👀 looking, 👍 done, 👎 error
- **Inline keyboards** for `/model`, `/effort`, `/lang`
- **Document handling** — PDFs, Word, Excel, code files, CSV, JSON (30+ types)
- **Image generation** — `/imagine` via Gemini API
- **Reply threading** — Bot responses are replies to the original message
- **Reply context** — Quoted messages included as context
- **Forward handling** — Forwarded messages analyzed with sender context
- **Group chat** — Responds to @mentions and replies only

### 📦 Tools & Commands
- `/help` — Complete command overview
- `/web` — DuckDuckGo instant search
- `/remind` — Set, list, cancel reminders
- `/export` — Download conversation as markdown
- `/system` — System info (OS, CPU, RAM, Node)
- `/lang` — Switch DE/EN with inline buttons
- `/ping` — Health check with latency
- `/status` — Enhanced with provider stats, memory, uptime

### 🛠 Infrastructure
- **Dockerfile** + `docker-compose.yml` for containerized deployment
- **CLI**: `npx alvin-bot setup` (wizard), `doctor`, `update`, `version`
- **Markdown sanitizer** — Fixes unbalanced markers for Telegram
- **Graceful shutdown** with 5s grace period
- **Error resilience** — Uncaught exceptions logged, not crashed
- `alvin-bot.config.example.json` for all configurable options

## [2.0.0] — 2026-02-24

### Initial Release
- grammY + Claude Agent SDK integration
- Streaming responses with live message editing
- Voice (Groq Whisper STT + Edge TTS)
- Photo analysis (Claude vision)
- Session management (in-memory)
- PM2 ecosystem config
