# Code Map

Code Map is a per-project structural index of your JavaScript / TypeScript / JSX /
TSX, Python, PHP, and Java codebase. It parses each supported file with Tree-sitter and records the
symbols it defines (functions, classes, types, interfaces, React components and
hooks), what it imports and exports, and how files relate — then answers fast
"what should I read before I edit this?" questions for both human operators and
coding agents.

Code Map is a **discovery aid**, not a build artifact. It never changes your
code, never blocks `bclaw_work`, and degrades gracefully: if the index is
missing or stale, every command says so via a freshness badge instead of
returning silently wrong answers.

The index lives under `.brainclaw/code/` (one JSONL shard per file, plus
named symbol/import indexes and a manifest). It is safe to delete; a refresh
rebuilds it.

## When to use it

- **Before editing** an unfamiliar area: `code-map brief <symbol-or-path>` (or
  `bclaw_code_brief`) returns a ranked list of files to read plus related
  brainclaw memory.
- **To locate** a function/class/component/hook by name without grepping:
  `code-map find <query>` (or `bclaw_code_find`).
- **To check coverage / staleness**: `code-map status` (or `bclaw_code_status`).
- **After pulling changes or doing work**: `code-map refresh` to bring the index
  back to `fresh`.

## CLI

All commands are available as `brainclaw code-map …` or `bclaw code-map …`, and
honor the global options (`--cwd`, `--verbose`, `--debug`). Every command also
accepts `--json` for machine-readable output, and prints a `Freshness:` line.

### `brainclaw code-map status`

Read-only. Reports whether the store exists, the freshness badge, and index
stats (files indexed, nodes, edges). Never refreshes.

```bash
brainclaw code-map status
```

```
Code Map status
  Store:    present
  Freshness: fresh
  Files:    142
  Nodes:    1873
  Edges:    2410
```

### `brainclaw code-map refresh [--changed | --all]`

Rebuilds the index behind a per-project lock. Defaults to `--changed`.

| Flag | Behavior |
|---|---|
| `--changed` (default) | Re-parses files whose **content** changed (git status + file-hash diff) **and** any shard whose stored extractor-config / grammar / engine hashes no longer match the current ones (i.e. `stale_extractor` / `stale_grammar`). A config or grammar bump is therefore healed by this cheap path — not only by `--all`. Compaction is limited to git-proven deletes. |
| `--all` | Enumerates every supported file, re-parses, and performs full orphan compaction (drops shards whose file is gone or now ignored). |

If a live writer already holds the project lock, `refresh` **fails fast** with a
clear status rather than blocking — it never stalls `bclaw_work`.

```bash
brainclaw code-map refresh            # changed (cheap, default)
brainclaw code-map refresh --all      # full rebuild + compaction
```

### `brainclaw code-map find <query>`

Read-only. Searches the symbol index for a name/token and returns ranked matches
with path and score. A `missing_index` badge means you should run `refresh`
first.

```bash
brainclaw code-map find useAuth
```

```
Code Map find: "useAuth"
  Freshness: fresh
  [9.0] useAuth hook — src/hooks/useAuth.ts
```

### `brainclaw code-map brief <symbol-or-path>`

Read-only. Builds a reading brief for a symbol or file: a ranked
`suggested files to read` list (capped at 12) plus related brainclaw memory
(decisions / constraints / traps, capped at 5). Use it before editing.

```bash
brainclaw code-map brief App
```

## MCP tools

Capable agents should prefer the MCP surface. The four tools mirror the CLI and
all return a `freshness_badge`:

| Tool | Kind | Purpose |
|---|---|---|
| `bclaw_code_status` | read | Store presence, freshness badge, index stats. Never refreshes. |
| `bclaw_code_find` | read | Ranked symbol-index search (`query`, optional `limit`). Never refreshes. |
| `bclaw_code_brief` | read | Reading brief for a symbol/path (`target`, optional `limit`, files capped at 12). Never refreshes. |
| `bclaw_code_refresh` | write | Rebuild the index. `scope` = `"changed"` (default) or `"all"`. Fails fast on a live lock. |

The read tools never trigger a parse — if `bclaw_code_status` /
`bclaw_code_find` / `bclaw_code_brief` report `missing_index` or a stale badge,
call `bclaw_code_refresh` and retry.

## Freshness badge model

Every Code Map response carries a freshness badge so a stale index is always
visible rather than silently misleading. The status is one of:

| Status | Meaning | Fix |
|---|---|---|
| `fresh` | Index matches the working tree, the extractor config, and the parser binaries. | — |
| `stale_changed_files` | One or more indexed files have changed on disk since they were parsed. | `refresh --changed` |
| `stale_extractor` | The extractor configuration (ignore rules, size caps, supported extensions, query budget, or active language set) changed since these shards were produced. | `refresh --changed` (heals on the cheap path) |
| `stale_grammar` | A Tree-sitter grammar (or the engine glue) binary changed since these shards were produced. | `refresh --changed` (heals on the cheap path) |
| `partial` | The index could not be fully read/built this pass (e.g. the project lock was held by a live writer). | retry |
| `missing_index` | No index exists yet for this project. | `refresh --all` |

Staleness reasons are kept separate on purpose: a content change
(`stale_changed_files`) is independent from a config change (`stale_extractor`)
which is independent from a parser-binary change (`stale_grammar`). The badge
surfaces the dominant reason; `--json` output and the manifest carry the per-file
counts.
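The "Fix" column of the table can be read as a small decision function. The sketch below is illustrative only: `nextStep` and the `NextStep` type are hypothetical names, not part of the brainclaw API, and the mapping simply restates the table.

```typescript
// Status values copied from the freshness table.
type FreshnessStatus =
  | "fresh"
  | "stale_changed_files"
  | "stale_extractor"
  | "stale_grammar"
  | "partial"
  | "missing_index";

// Hypothetical result type: what a caller might do next for a given status.
type NextStep =
  | { kind: "use" }
  | { kind: "retry" }
  | { kind: "refresh"; scope: "changed" | "all" };

function nextStep(status: FreshnessStatus): NextStep {
  switch (status) {
    case "fresh":
      return { kind: "use" };
    case "stale_changed_files":
    case "stale_extractor":
    case "stale_grammar":
      // All three stale reasons heal on the cheap path.
      return { kind: "refresh", scope: "changed" };
    case "partial":
      return { kind: "retry" };
    case "missing_index":
      return { kind: "refresh", scope: "all" };
  }
}
```

An agent could call a helper like this on the badge returned by a read tool and then decide whether to trust the result, retry, or refresh first.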

**Index freshness vs this call's spot-check.** `bclaw_code_status` reports the
*index* freshness (the manifest state). `bclaw_code_find` / `bclaw_code_brief`
additionally run a bounded, per-query *spot-check* of the files they actually
touch — so a single call can read `stale_changed_files` (a file it looked at
changed on disk) or `partial` (the spot-check hit its budget) even while the index
itself is `fresh`. When the call-level status diverges from the index, the badge
carries an `index_status` detail so the two are not confused. For example,
`{ status: "partial", details: { index_status: "fresh", partial_reason: "lazy_check_budget_exhausted" } }`
reads as *"index fresh, this call's spot-check incomplete (budget)"*, which does not
contradict a `fresh` result from `code-map status`.
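A caller could keep the two levels apart with a small formatter. This is a minimal sketch: the `FreshnessBadge` interface is inferred from the example badge in the paragraph above and may not match the full shape, and `describeBadge` is a hypothetical helper.

```typescript
// Badge shape inferred from the example badge; only these three fields are assumed.
interface FreshnessBadge {
  status: string;
  details?: { index_status?: string; partial_reason?: string };
}

function describeBadge(badge: FreshnessBadge): string {
  const indexStatus = badge.details?.index_status;
  // No divergence recorded: the call-level status is also the index status.
  if (indexStatus === undefined || indexStatus === badge.status) {
    return `index ${badge.status}`;
  }
  const reason = badge.details?.partial_reason;
  const suffix = reason ? ` (${reason})` : "";
  return `index ${indexStatus}, this call: ${badge.status}${suffix}`;
}
```

For the example badge this yields `index fresh, this call: partial (lazy_check_budget_exhausted)`, which makes it clear that a refresh is not needed and a retry is enough.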

## Lifecycle — pull-based, no daemon

Code Map never runs in the background and never auto-reindexes. The model is lazy
reconciliation on the read path:

1. You edit or pull code — the index does not change.
2. The next `status` / `find` / `brief` recomputes a freshness badge (git status +
   file-hash diff vs the stored shards), so a stale index is always *visible*,
   never silently wrong.
3. `refresh --changed` re-parses only the changed files (incremental); `--all` does
   a full rebuild + orphan compaction.
4. `bclaw_work` nudges a refresh when the badge is `missing_index` or stale, so an
   agent knows to reconcile before trusting the map.

Code Map never blocks `bclaw_work` (a held lock fails fast), so the worst case of a
stale index is a one-line "run refresh" hint, not a wrong answer.
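The file-hash diff in step 2 can be pictured as a comparison between the hashes stored with the shards and the hashes of the files currently on disk. The sketch below assumes both sides are already available as path-to-hash records; the hash algorithm, the shard layout, and the names `HashByPath`, `HashDiff`, and `diffHashes` are all assumptions made for illustration.

```typescript
// Project-relative path -> content hash (how the hash is computed is not shown).
type HashByPath = Record<string, string>;

interface HashDiff {
  changed: string[]; // indexed and still on disk, but the content hash differs
  deleted: string[]; // indexed, no longer on disk
  added: string[];   // on disk, not yet indexed
}

function has(map: HashByPath, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(map, path);
}

function diffHashes(stored: HashByPath, current: HashByPath): HashDiff {
  const diff: HashDiff = { changed: [], deleted: [], added: [] };
  for (const path of Object.keys(stored)) {
    if (!has(current, path)) diff.deleted.push(path);
    else if (current[path] !== stored[path]) diff.changed.push(path);
  }
  for (const path of Object.keys(current)) {
    if (!has(stored, path)) diff.added.push(path);
  }
  return diff;
}
```

Under this sketch, a non-empty `changed` list corresponds to a `stale_changed_files` badge, and an incremental refresh would only need to re-parse the paths in the diff.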

## Monorepos and nested projects

Code Map is **per project**: the index lives at `<project>/.brainclaw/code/`, and
`refresh` indexes the source tree under the project root it runs in — descending
into subdirectories but skipping `node_modules`, `dist`, `.git`, `.brainclaw`,
`vendor`, `target`, … at any depth.

By default there is no nested-project *boundary*, so a plain (non-cascade) command
is scoped by **where you run it**:

| You run refresh / find / brief … | … against |
|---|---|
| at the monorepo root (plain) | one index covering the whole tree (every child project's source) |
| inside a child project (e.g. `apps/api`) | that child's own index, at `apps/api/.brainclaw/code/` |

When an agent works inside a child project, brainclaw's project resolution routes
Code Map to **that child** (the same per-project scoping that powers `bclaw_work`
/ `bclaw_switch`), so each project gets its own clean map without manual `--cwd`
juggling. A submodule that is itself an application (e.g. one under `apps/`) is
indexed like any other directory.

### Cascading a multi-project workspace (`--cascade`)

In a `project_mode: multi-project` workspace, one refresh at the root can index
the whole monorepo **per project** instead of building one monolithic root index:

```bash
brainclaw code-map refresh --all --cascade     # CLI
# bclaw_code_refresh(scope="all", cascade=true) # MCP
```

This refreshes **every nested brainclaw project** into its own
`<child>/.brainclaw/code/` store, and refreshes the **root** store *scoped to the
files no child owns*. The rule is that each file is indexed by exactly one project,
the most specific brainclaw project that contains it, so there is **zero
double-indexing**, even when projects nest inside one another. `--cascade` is
opt-in; without it, the root refresh keeps its single-tree behaviour (above), and
single-project repos ignore the flag entirely.
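The ownership rule amounts to a longest-prefix match over project roots. The following is a minimal sketch, not the actual resolver: `owningProject` is a hypothetical name, paths are assumed to be POSIX-style and relative to the workspace root, child roots are assumed to carry no trailing slash, and the empty string stands for the root project.

```typescript
// Returns the most specific child root containing `file`, or "" when no child
// owns it (the file then belongs to the root store).
function owningProject(file: string, childRoots: string[]): string {
  let owner = "";
  for (const root of childRoots) {
    // Match whole path segments so "apps/api" does not claim "apps/api-gateway/...".
    const prefix = root + "/";
    if (file.indexOf(prefix) === 0 && root.length > owner.length) {
      owner = root;
    }
  }
  return owner;
}
```

With hypothetical child roots `apps/api`, `apps/api/plugins/auth`, and `apps/web`, the file `apps/api/plugins/auth/src/index.ts` is owned by `apps/api/plugins/auth` rather than `apps/api`, and `scripts/build.ts` falls through to the root. Because every file resolves to exactly one owner, no file is indexed twice.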

`status --cascade` (or `bclaw_code_status(cascade=true)`) adds a per-child recap —
which nested projects have a built index vs `missing_index`, plus an aggregate
count — so you can see workspace-wide freshness from the root.

**Not yet supported** (roadmap):

- A single **federated query** at the root that fans out across the per-child
  indexes and merges the results (today, `--cascade` builds the per-child indexes;
  `find` / `brief` still run against one store at a time).
- **Cross-service edges** — e.g. linking an API call to the route that defines it in
  another service. Code Map indexes language *symbols* and *module imports*, not
  framework routes or runtime HTTP calls, so it does not (today) map "service A calls
  endpoint X defined in service B".

## WASM bundling note

The parser is [Tree-sitter](https://tree-sitter.github.io/) compiled to
WebAssembly. The engine glue (`web-tree-sitter`) and the prebuilt grammar `.wasm`
files (JavaScript / TypeScript / JSX / TSX, Python, PHP, Java) are **bundled into the package** during the
build (`scripts/copy-code-map-wasm.mjs` copies them into `dist/wasm/` and vendors
the engine glue into `dist/vendor/web-tree-sitter/`).

Two properties matter for packaging:

1. **Lazy load on first parse only.** The WASM engine is loaded via a dynamic
   import the first time a file is actually parsed. Nothing in the CLI / MCP
   module-load graph statically imports the parser, so `--version`,
   `code-map status`, `code-map find`, and `code-map brief` all work even if the
   engine is absent — only `refresh` needs it.
2. **Self-contained at runtime.** Because the glue and grammars are vendored into
   `dist/`, parsing works from the published package without the build-time
   dev dependencies. WASM assets are resolved relative to the module
   (`import.meta.url`), never the current working directory, so the loader is
   safe inside git worktrees.
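The lazy-load property can be sketched as a memoized loader that does nothing until first use. This is an illustration under stated assumptions, not the package's loader: `lazyOnce` and `getEngine` are hypothetical names, and the stand-in loader below returns a plain object where real code would perform the dynamic import of the engine (in which case the cached value would be a promise).

```typescript
// Wraps a loader so it runs at most once, and only when first requested.
function lazyOnce<T>(load: () => T): () => T {
  let loaded = false;
  let value: T | undefined;
  return () => {
    if (!loaded) {
      value = load(); // if this throws, the next call tries again
      loaded = true;
    }
    return value as T;
  };
}

// Stand-in loader; counts how often the "engine" is actually loaded.
let loadCount = 0;
const getEngine = lazyOnce(() => {
  loadCount += 1;
  return { name: "stand-in for the parser engine" };
});
```

Nothing is loaded when the module is imported; only a code path that calls `getEngine()` (the equivalent of `refresh` parsing a file) pays the cost, and repeated calls reuse the same instance.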
