# AI Kit

[![npm version](https://img.shields.io/npm/v/%40vpxa%2Faikit?label=npm)](https://www.npmjs.com/package/@vpxa/aikit)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/anvpx/kb/blob/main/knowledge-base/LICENSE)
[![Node >=18](https://img.shields.io/badge/node-%3E%3D18-339933)](https://nodejs.org/)
[![MCP Server](https://img.shields.io/badge/MCP-local--first-blue)](https://modelcontextprotocol.io/)

Your agent starts every task half-blind. AI Kit fixes that locally.

AI Kit (`@vpxa/aikit`) is a local-first MCP server for coding agents that need real codebase awareness, not just prompt glue. It gives VS Code Copilot, Cursor, Claude Code, Windsurf, Codex, and other MCP-compatible clients a searchable code graph, token-efficient context tools, guided workflows, and persistent memory, all without sending your code to a cloud backend.

It also turns one overloaded coding agent into a small team of specialists that can plan, research, review, and implement together.

Today that means **65 MCP tools**, **49 CLI commands**, **5 built-in tool profiles**, a persistent curated knowledge system, local ONNX embeddings, and an embedded SQLite-vec store that runs on your machine.

## The Problem

Without an MCP layer built for codebases, most agents behave like this:

- They start each task from zero and re-learn the repo every session.
- They burn tokens on full-file reads when they only needed one function or one symbol path.
- They miss dependencies, side effects, and import edges before making changes.
- They lose decisions, debugging context, and task state the moment the session ends.
- They load too many tools too early because discovering tools on demand is itself expensive.
- They use a single agent for everything: planning, coding, reviewing, and debugging, with no second opinion.

With AI Kit, the loop changes:

| Without AI Kit | With AI Kit |
|---|---|
| Blind repo exploration | Hybrid search, symbol resolution, scope maps |
| Full-file token burn | `file_summary`, `compact`, `digest`, STRATUM cards |
| Risky edits | `audit`, `blast_radius`, `graph`, `check`, `test_run` |
| Lost context between sessions | Curated memory, checkpoints, stash, session digests |
| Tool overload | Profiles and meta-tool discovery |
| One agent doing every job | 17 specialized agents with parallel dispatch and multi-model review |

## How AI Kit Helps

AI Kit sits between your agent and your repo and keeps the useful state around.

```text
Codebase
  |
  +--> Tree-sitter chunking + symbol extraction + graph build
  |
  +--> Local embeddings + SQLite-vec index + curated memory
  |
AI Kit MCP Server
  |
  +--> Search what matters
  +--> Compress context before reading
  +--> Analyze impact before editing
  +--> Remember what was learned
  |
Agent
  |
  +--> Faster orientation
  +--> Safer changes
  +--> Lower token spend
  +--> Better continuity across sessions
```

Everything runs local by default: search, embeddings, graph queries, memory, and workflow guidance.

## What Your Agent Gets

These are the same **65 tools**, grouped by what they let an agent do, not by internal package layout.

### 🔎 Find anything instantly

- Hybrid search runs semantic, keyword, and BM25 retrieval and merges the rankings with Reciprocal Rank Fusion (RRF), so agents can find both concepts and exact strings in one call.
- Symbol resolution finds definitions, references, and imports across files without making the agent grep the repo manually.
- Scope maps generate task-focused reading plans so the agent reads the smallest useful slice first.
- Dead symbol detection surfaces unused exports before they turn into maintenance drag.
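Reciprocal Rank Fusion is what lets results from very different retrievers share one list. Each document scores the sum of 1 / (k + rank) over every ranked list it appears in, so raw scores never need to be comparable. The sketch below is a minimal, standalone version of that formula. It assumes each retriever returns an ordered list of file paths and uses k = 60, the constant commonly paired with RRF. AI Kit's actual retrievers, weighting, and constant are not documented here.

```ts
// Standard Reciprocal Rank Fusion over several ranked lists of IDs.
// k = 60 is the commonly used RRF constant (an assumption for AI Kit).
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based, so the top hit in a list contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries(), ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Hypothetical results from three retrievers for "retry with backoff".
const semantic = ["src/retry.ts", "src/http/client.ts", "src/queue.ts"];
const keyword = ["src/http/client.ts", "src/retry.ts"];
const bm25 = ["src/retry.ts", "src/backoff.test.ts"];

console.log(reciprocalRankFusion([semantic, keyword, bm25]).map((r) => r.id).join(", "));
// src/retry.ts, src/http/client.ts, src/backoff.test.ts, src/queue.ts
```

A file ranked near the top by several retrievers, like `src/retry.ts` here, beats one that only a single retriever ranks highly.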

### 🧭 Understand before changing

- `audit` rolls structure, dependency, symbol, pattern, entry-point, and health analysis into one call.
- `blast_radius` estimates what a change can break before the edit lands.
- `graph` exposes an auto-built knowledge graph of modules, symbols, and imports with traversal and neighbor queries.
- Mermaid diagram generation turns architecture into something an agent can explain quickly.
- Entry point discovery finds handlers, package exports, tests, and runtime surfaces fast.
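At its core, a blast-radius estimate is a reachability question over reversed import edges. The sketch below shows one common approach, a breadth-first walk, assuming the graph is a plain map from each file to the files it imports. It is not AI Kit's `blast_radius` implementation, and it omits the risk scoring and hotspot detection the real tool reports. `ImportGraph` and `blastRadius` are hypothetical names.

```ts
// Hypothetical graph shape: each file maps to the files it imports.
type ImportGraph = Record<string, string[]>;

function blastRadius(graph: ImportGraph, changed: string[]): { direct: string[]; indirect: string[] } {
  // Reverse the edges so we can ask "which files import this one?"
  const importers = new Map<string, string[]>();
  for (const file of Object.keys(graph)) {
    for (const dep of graph[file]) {
      const list = importers.get(dep) ?? [];
      list.push(file);
      importers.set(dep, list);
    }
  }

  const changedSet = new Set<string>(changed);
  const direct = new Set<string>();
  for (const file of changed) {
    for (const importer of importers.get(file) ?? []) {
      if (!changedSet.has(importer)) direct.add(importer);
    }
  }

  // Breadth-first walk outward from the direct dependents.
  const seen = new Set<string>(changed);
  direct.forEach((file) => seen.add(file));
  const queue = Array.from(direct);
  const indirect = new Set<string>();
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const importer of importers.get(current) ?? []) {
      if (!seen.has(importer)) {
        seen.add(importer);
        indirect.add(importer);
        queue.push(importer);
      }
    }
  }
  return { direct: Array.from(direct).sort(), indirect: Array.from(indirect).sort() };
}

const graph: ImportGraph = {
  "src/payments/ledger.ts": ["src/db.ts"],
  "src/payments/reconcile.ts": ["src/payments/ledger.ts"],
  "src/api/routes/payments.ts": ["src/payments/reconcile.ts"],
  "src/api/server.ts": ["src/api/routes/payments.ts"],
};
console.log(blastRadius(graph, ["src/payments/ledger.ts"]));
// direct: src/payments/reconcile.ts; indirect: src/api/routes/payments.ts, src/api/server.ts
```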

### ✂️ Stop burning tokens

- `file_summary` gives file structure in about **50 tokens** instead of a **~1500-token** full read.
- `compact` extracts only the relevant sections, usually cutting context by **5x to 20x**.
- `digest` compresses multiple files or handoff notes into a token-budgeted summary.
- STRATUM cards are reusable context cards that deliver a **10x to 100x** reduction on repeat lookups.
- Tool profiles let you expose only the tools a task actually needs: `full`, `safe`, `research`, `minimal`, `discovery`.
- Meta-tool discovery cuts startup overhead by using `list_tools` → `search_tools` → `describe_tool` instead of loading every tool description up front.
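The meta-tool pattern works because tool names are cheap and full descriptions are not. The sketch below models profile filtering plus the list, search, describe sequence over a tiny in-memory registry. The descriptor shape, the `rename` and `codemod` tool names, and the rule that `safe` exposes only read-only tools are illustrative assumptions, not AI Kit's real tool metadata.

```ts
// Hypothetical tool descriptors; AI Kit's real metadata may differ.
interface ToolInfo {
  name: string;
  description: string;
  readOnly: boolean;
}

const registry: ToolInfo[] = [
  { name: "search", description: "Hybrid semantic and keyword code search", readOnly: true },
  { name: "file_summary", description: "File structure without a full read", readOnly: true },
  { name: "blast_radius", description: "Estimate dependents affected by a change", readOnly: true },
  { name: "rename", description: "Rename a symbol across files", readOnly: false }, // hypothetical name
  { name: "codemod", description: "Apply a regex migration across files", readOnly: false }, // hypothetical name
];

// Illustrative profile rules: "full" exposes everything, "safe" only read-only tools.
const profiles: Record<string, (tool: ToolInfo) => boolean> = {
  full: () => true,
  safe: (tool) => tool.readOnly,
};

function toolsFor(profile: string): ToolInfo[] {
  const allowed = profiles[profile];
  if (!allowed) throw new Error(`unknown profile: ${profile}`);
  return registry.filter(allowed);
}

// Step 1: names only, a few tokens per tool.
function listTools(profile: string): string[] {
  return toolsFor(profile).map((tool) => tool.name);
}

// Step 2: naive keyword match over names and descriptions.
function searchTools(profile: string, query: string): string[] {
  const terms = query.toLowerCase().split(/\s+/).filter((term) => term.length > 0);
  return toolsFor(profile)
    .filter((tool) => {
      const text = `${tool.name} ${tool.description}`.toLowerCase();
      return terms.some((term) => text.includes(term));
    })
    .map((tool) => tool.name);
}

// Step 3: pay for the full description only for the tool actually chosen.
function describeTool(name: string): string | undefined {
  return registry.find((tool) => tool.name === name)?.description;
}

console.log(listTools("safe").join(", ")); // search, file_summary, blast_radius
console.log(searchTools("full", "rename symbol").join(", ")); // rename
```

The agent only pays for a full description after `searchTools` has narrowed the candidates, which is where the savings come from.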

### 🧠 Never lose context

- Curated knowledge survives across sessions with `remember`, `read`, `update`, `forget`, and `list`.
- Categories keep memory usable: `conventions`, `decisions`, `patterns`, `context`, `session`.
- Checkpoints save and restore task state when work spans multiple sessions.
- `stash` acts as a lightweight key-value store for intermediate results.
- `session_digest` compresses session activity for handoff or restart.
- Background knowledge workflows can keep durable project knowledge in sync instead of relying on chat history alone.
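To make the memory model concrete, the sketch below keeps curated entries under the five categories listed above and round-trips them through JSON. The JSON stands in for the on-disk persistence that lets a later session reload them. The `CuratedMemory` class, its method signatures, and the JSON format are hypothetical. They mirror the `remember`, `read`, `update`, `forget`, and `list` operations in spirit only.

```ts
// Categories as listed in the README.
type Category = "conventions" | "decisions" | "patterns" | "context" | "session";

interface Entry {
  title: string;
  category: Category;
  content: string;
}

// Hypothetical in-memory store keyed by title; JSON serialization models
// persistence so a later session can reload it. Not AI Kit's storage format.
class CuratedMemory {
  private entries = new Map<string, Entry>();

  remember(title: string, category: Category, content: string): void {
    this.entries.set(title, { title, category, content });
  }

  read(title: string): Entry | undefined {
    return this.entries.get(title);
  }

  update(title: string, content: string): boolean {
    const entry = this.entries.get(title);
    if (!entry) return false;
    this.entries.set(title, { ...entry, content });
    return true;
  }

  forget(title: string): boolean {
    return this.entries.delete(title);
  }

  list(category?: Category): string[] {
    return Array.from(this.entries.values())
      .filter((entry) => category === undefined || entry.category === category)
      .map((entry) => entry.title);
  }

  serialize(): string {
    return JSON.stringify(Array.from(this.entries.values()));
  }

  static deserialize(json: string): CuratedMemory {
    const memory = new CuratedMemory();
    for (const entry of JSON.parse(json) as Entry[]) {
      memory.remember(entry.title, entry.category, entry.content);
    }
    return memory;
  }
}

// Session 1 records a decision; session 2 reloads it instead of rediscovering it.
const session1 = new CuratedMemory();
session1.remember("permissions-bug", "decisions", "Check role inheritance before ACL lookup.");
const saved = session1.serialize();

const session2 = CuratedMemory.deserialize(saved);
console.log(session2.list("decisions").join(", ")); // permissions-bug
```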

### 🛠️ Change code and validate it

- Smart rename handles symbol changes across files instead of search-and-replace roulette.
- Regex codemods make repetitive migrations scriptable.
- Sandboxed JS/TS eval lets agents test snippets without leaving the tool surface.
- `check` runs incremental typecheck and lint with structured output.
- `test_run` returns structured test results instead of noisy raw logs.
- JQ-like JSON transforms help agents reshape tool output instead of writing throwaway parsing code.
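Regex codemods are only as safe as their boundaries. The sketch below uses a hypothetical `renameIdentifier` helper that replaces whole identifiers only. It relies on lookarounds because `\b` does not treat `$` as part of a JavaScript identifier. Even so, a regex cannot tell a variable from the same word inside a string or comment, which is the gap a symbol-aware rename is meant to close.

```ts
// Escape regex metacharacters so the identifier is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace whole-identifier occurrences only. JS identifiers may contain "$",
// which \b does not count as a word character, so use explicit lookarounds.
function renameIdentifier(source: string, from: string, to: string): string {
  const pattern = new RegExp(`(?<![\\w$])${escapeRegExp(from)}(?![\\w$])`, "g");
  // A replacer function keeps "$" sequences in `to` from being interpreted.
  return source.replace(pattern, () => to);
}

const before = "const user = getUser(); const userId = user.id;";
console.log(renameIdentifier(before, "user", "account"));
// const account = getUser(); const userId = account.id;
```

A plain `split("user").join("account")` would also have turned `userId` into `accountId`.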

### 🧱 Work in guided flows, not chaos

- Built-in `basic` and `advanced` flows turn fuzzy work into explicit steps.
- FORGE adds quality gates with Floor, Standard, and Critical tiers plus evidence tracking.
- `guide` recommends which tools to use and in what order for a given task.
- `onboard` runs a full codebase analysis pass in one command.
- Verified lanes give isolated file copies for safe parallel exploration.

### 🤝 Coordinate multiple agents

- AI Kit can scaffold agent instructions, skills, and IDE integration files for Copilot, Claude, Cursor, and related clients.
- Skills document tool categories, operating patterns, and session protocols so agents start with the right behavior.
- Structured handoff artifacts make it easier to resume work without re-explaining the repo.
- Multi-workspace search lets one agent query across more than one enrolled project when needed.

### 🏗️ A team, not a solo act

AI Kit is built around the idea that one agent should not have to do everything in a long serial chain. Instead, work gets split across specialists with smaller scopes, and the pieces that do not depend on each other can run at the same time.

That matters because most real tasks are a mix of planning, implementation, review, debugging, and documentation. When those jobs are separated and parallelized, you spend less time waiting on one overloaded agent and more time moving the whole task forward.

- **Specialization:**
  - AI Kit includes 17 agents with focused roles, so planning, coding, debugging, reviewing, documenting, and architecture work are handled by agents built for those jobs instead of one generalist trying to do all of it.
  - The Orchestrator conducts the work, the Planner researches and shapes the approach, the Implementer writes code, the Debugger traces failures, the Security agent audits risk, the Frontend agent handles UI, the Refactor agent cleans up structure, the Documenter writes docs, and the Explorer maps unfamiliar codebases.
  - Each agent loads only the context and skills it needs for its slice of work, which keeps prompts tighter and cuts the noise of dragging an entire conversation into every task.
  - Different agents can run on different AI models, so the model that is strongest for planning does not have to be the one used for implementation, review, or architectural analysis.
- **Parallel dispatch:**
  - The Orchestrator breaks a feature request into independent sub-tasks, such as implementing an API endpoint, writing tests, updating docs, and reviewing the design, instead of treating the whole request as one indivisible job.
  - Those sub-tasks are dispatched to multiple agents at the same time, not one after another, so progress does not wait on a single thread of back-and-forth.
  - Read-only work such as research, analysis, and review can run fully in parallel, so the system can gather findings from several angles without blocking on file edits.
  - File-modifying agents can also run in parallel as long as they work in different files, with up to four active editing lanes at once when the work can be safely separated.
  - In practice, a feature that might take one agent 20 rounds of context switching and follow-up can often be finished in 3 or 4 coordinated parallel batches.
  - Every sub-agent gets fresh, scoped context with just the code and instructions needed for its assignment, so no participant has to inherit the full history of the task.
- **Multi-model decisions:**
  - When there is a hard technical choice, such as which database to use, which pattern fits best, or how an API should be structured, four Researcher variants can investigate the same question in parallel.
  - Each Researcher runs on a different AI model, so the recommendation reflects multiple perspectives rather than one model's habits, blind spots, or default biases.
  - The Orchestrator then compares where the researchers agree, where they disagree, and what tradeoffs they surfaced, and turns that into a more balanced recommendation.
- **Dual-perspective review:**
  - Two Code Reviewers, each running on a different AI model, examine every change, which improves the odds of catching logic issues, regressions, or weak assumptions that a single reviewer might miss.
  - Two Architect Reviewers do the same for higher-level design choices, so structural decisions are challenged from more than one angle before they harden into the codebase.
  - It works more like having two independent reviewers on a pull request than asking one agent to rubber-stamp its own work.
- **Quality and coordination:**
  - The Orchestrator never writes code itself. Its job is to delegate work, compare results, keep the pieces aligned, and decide when the overall task is actually ready.
  - Every change stays attached to evidence about what was checked, so completion is based on validation and review instead of confidence alone.
  - If a sub-agent gets stuck, the Orchestrator can diagnose the failure, re-delegate the task with a better scope, or escalate the issue instead of letting the whole workflow stall.
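The scheduling rules described for parallel dispatch can be sketched as a small batch planner: read-only work runs freely, and editing agents run in parallel only on disjoint files, with at most four editing lanes. `SubTask`, `planBatches`, and `dispatch` are hypothetical names, and the greedy packing is one simple strategy among many. Task dependencies are not modeled, so treat this as an illustration rather than the Orchestrator's actual code.

```ts
// Hypothetical sub-task shape: `files` lists what the task may modify,
// and an empty list marks read-only work such as research or review.
interface SubTask {
  id: string;
  files: string[];
  run: () => Promise<string>;
}

// Greedily pack editing tasks into batches of at most `maxLanes` tasks
// where no file is touched by two tasks in the same batch.
function planBatches(tasks: SubTask[], maxLanes = 4): SubTask[][] {
  const batches: Array<{ tasks: SubTask[]; files: Set<string> }> = [];
  for (const task of tasks) {
    let target = batches.find(
      (batch) => batch.tasks.length < maxLanes && task.files.every((file) => !batch.files.has(file)),
    );
    if (target === undefined) {
      target = { tasks: [], files: new Set<string>() };
      batches.push(target);
    }
    target.tasks.push(task);
    for (const file of task.files) target.files.add(file);
  }
  return batches.map((batch) => batch.tasks);
}

// Read-only tasks start at once; editing batches run in sequence,
// with every task inside a batch running concurrently.
async function dispatch(tasks: SubTask[], maxLanes = 4): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  const runTask = async (task: SubTask): Promise<void> => {
    results.set(task.id, await task.run());
  };
  const readOnly = Promise.all(tasks.filter((t) => t.files.length === 0).map(runTask));
  const editing = (async () => {
    for (const batch of planBatches(tasks.filter((t) => t.files.length > 0), maxLanes)) {
      await Promise.all(batch.map(runTask));
    }
  })();
  await Promise.all([readOnly, editing]);
  return results;
}

// Example: stub tasks that resolve immediately.
const stub = (id: string) => async () => `${id} done`;
const tasks: SubTask[] = [
  { id: "research-db", files: [], run: stub("research-db") },
  { id: "api-endpoint", files: ["src/api/orders.ts"], run: stub("api-endpoint") },
  { id: "api-tests", files: ["test/orders.test.ts"], run: stub("api-tests") },
  { id: "docs", files: ["docs/orders.md"], run: stub("docs") },
  { id: "route-wiring", files: ["src/api/orders.ts", "src/api/index.ts"], run: stub("route-wiring") },
];
const editingTasks = tasks.filter((t) => t.files.length > 0);
console.log(planBatches(editingTasks).map((batch) => batch.map((t) => t.id).join(" + ")).join(" | "));
// api-endpoint + api-tests + docs | route-wiring
```

Here `route-wiring` lands in a second batch because it touches `src/api/orders.ts`, which `api-endpoint` already owns in the first batch.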

### 🌐 Bring useful extras with you

- Web fetch converts pages into clean markdown for LLM use.
- Web search uses DuckDuckGo without requiring an API key.
- The HTTP client is there for endpoint checks and API debugging.
- Utilities cover regex testing, encoding, schema validation, code metrics, changelog generation, and timezone math.

## Token Savings (Real Numbers)

AI Kit exists partly because raw reading is one of the fastest ways to waste context.

| Task | Naive approach | AI Kit approach | Typical reduction |
|---|---|---|---|
| Understand a file | Full file read: ~1500 tokens | `file_summary`: ~50 tokens | ~30x |
| Pull one relevant section | Scan the full file manually | `compact` | 5x to 20x |
| Carry context across files | Read N files separately | `digest` with token budget | Significant multi-file compression |
| Reuse context repeatedly | Re-read source over and over | STRATUM cards | 10x to 100x |
| Discover tools | Load every description: ~3000 tokens | `list_tools` + `search_tools` + `describe_tool`: ~200 tokens | ~15x |

If your agent keeps re-reading files, re-learning commands, or reloading tool docs, you are paying for the same context multiple times.
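To see how the per-call figures compound, here is the back-of-the-envelope arithmetic for one hypothetical task that orients on 20 files and discovers tools once. It uses the approximate numbers from the table, and `estimateTokens` is a hypothetical helper.

```ts
// Rough per-task token estimate using the table's approximate figures:
// ~1500 tokens per full file read vs ~50 per file_summary, and ~3000 vs ~200
// tokens for tool discovery. The 20-file task size is a hypothetical example.
function estimateTokens(files: number, tokensPerFile: number, discoveryTokens: number): number {
  return files * tokensPerFile + discoveryTokens;
}

const naive = estimateTokens(20, 1500, 3000); // 30000 + 3000
const withAikit = estimateTokens(20, 50, 200); // 1000 + 200
console.log(naive, withAikit, (naive / withAikit).toFixed(1)); // 33000 1200 27.5
```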

## Quick Start

```bash
npx @vpxa/aikit init --user
```

That one command is all you need: it installs the server once, at the user level.

## Editor Support

AI Kit is built for MCP clients. Some editors get a scaffolded setup; others use manual MCP configuration.

| Editor | Status | Setup | Notes |
|---|---|---|---|
| VS Code Copilot | First-class | `npx @vpxa/aikit init --user` | Scaffold + MCP config workflow supported |
| Cursor | First-class | `npx @vpxa/aikit init --user` | Supported by user-level install flow |
| Claude Code | First-class | `npx @vpxa/aikit init --user` | `.mcp.json` workflow supported |
| Windsurf | First-class | `npx @vpxa/aikit init --user` | Included in user-level install targets |
| Codex | MCP-compatible | Manual config | Use AI Kit as a local MCP server where Codex MCP integration is available |

At the boundary, AI Kit is just an MCP server, so any client that speaks MCP can integrate with it even if AI Kit does not yet scaffold setup for that editor.

## Where It Fits Best

AI Kit is a strong fit when your agent is doing real repository work instead of toy demos.

- Large or medium codebases where naive search stops being enough.
- Refactors where symbol resolution and blast radius matter.
- Long-running work where decisions need to survive beyond one chat.
- Multi-agent setups where another agent may need to continue the task later.
- Teams that want local-first indexing and memory instead of wiring together hosted services.

If the repo is tiny and the work is one-shot, you may not need the full surface. That is also why profiles and the minimal path exist.

## Tool Highlights

These are short examples of the kind of interactions AI Kit is built for.

### 1. Find the right implementation first

```ts
search({ query: "retry with backoff", limit: 3 })
```

```json
{
  "results": [
    {
      "path": "src/retry.ts",
      "score": 0.92,
      "preview": "export async function retryWithBackoff(...)"
    },
    {
      "path": "src/http/client.ts",
      "score": 0.81,
      "preview": "uses exponential backoff for 429 responses"
    }
  ],
  "_next": "Use symbol() or compact() on the top result"
}
```

The point is not just search. It is pointing the agent at the next best action right away.

### 2. Understand a file without paying for the whole file

```ts
file_summary({ path: "src/auth/guard.ts" })
```

```json
{
  "exports": ["requireAuth", "resolvePrincipal"],
  "imports": ["jwt", "errors", "session-store"],
  "functions": ["requireAuth", "resolvePrincipal", "parseBearerToken"],
  "summary": "Auth gate, token parsing, principal resolution"
}
```

That is usually enough to decide whether deeper reading is worth the tokens.

### 3. Check risk before a refactor

```ts
blast_radius({ changed_files: ["src/payments/ledger.ts"] })
```

```json
{
  "directDependents": 6,
  "indirectDependents": 14,
  "hotspots": [
    "src/payments/reconcile.ts",
    "src/api/routes/payments.ts"
  ],
  "risk": "medium"
}
```

That is the difference between changing a file and changing a system.

### 4. Audit the repo surface in one shot

```ts
audit({ path: ".", detail: "summary" })
```

```json
{
  "score": 83,
  "checks": ["structure", "dependencies", "patterns", "health", "entry_points"],
  "topRecommendations": [
    "add tests around server startup config",
    "review circular dependency warnings"
  ]
}
```

Instead of asking an agent to stitch several separate analyses together, you hand it one synthesized baseline.

### 5. Resume tomorrow without losing the thread

```ts
session_digest({ persist: true, focus: "auth migration" })
```

```json
{
  "summary": "Completed middleware rename, verified 14 tests, open question on refresh token expiry.",
  "artifacts": ["stash", "checkpoint", "replay"],
  "saved": true
}
```

That turns a long task into something a new session can actually pick up.

## Tool Profiles (Control What's Loaded)

AI Kit ships with **5 built-in profiles** so you do not have to expose the full surface to every task.

| Profile | What it optimizes for | Best use |
|---|---|---|
| `full` | Everything enabled | General development and orchestration |
| `safe` | Read-only analysis | Reviews, audits, architecture inspection |
| `research` | Search, analysis, knowledge, web | Investigation and documentation |
| `minimal` | Essential low-overhead set | Simple tasks and low-token sessions |
| `discovery` | Guided exploration + meta-tools | Onboarding, tool learning, capability discovery |

The `discovery` profile is especially useful when you want the agent to learn the tool surface incrementally instead of front-loading all descriptions.

## Development Workflows

AI Kit does more than expose tools. It gives agents an operating model.

### Guided flows

- `aikit:basic` is built for bug fixes, config work, and focused implementation.
- `aikit:advanced` adds spec, planning, execution, and verification stages for larger changes.
- Custom flows let teams define their own repeatable delivery process.

### FORGE quality gates

- **Floor** for low-risk tasks.
- **Standard** for normal implementation work.
- **Critical** for changes with bigger blast radius or contract risk.
- Evidence tracking keeps claims marked as verified, assumed, or unknown instead of blending them together.
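One way to picture evidence tracking is as a ledger of claims, each tagged verified, assumed, or unknown, evaluated against the rules of the chosen tier. The tier rules in this sketch are assumptions invented for illustration. This README does not spell out FORGE's actual pass criteria, and `Claim`, `tierRules`, and `gatePasses` are hypothetical names.

```ts
type Tier = "Floor" | "Standard" | "Critical";
type Evidence = "verified" | "assumed" | "unknown";

interface Claim {
  statement: string;
  evidence: Evidence;
}

// Hypothetical gate rules for illustration only; not FORGE's real criteria.
const tierRules: Record<Tier, (claims: Claim[]) => boolean> = {
  // Low-risk work: at least one claim has actually been verified.
  Floor: (claims) => claims.some((claim) => claim.evidence === "verified"),
  // Normal work: assumptions are allowed, open unknowns are not.
  Standard: (claims) => claims.every((claim) => claim.evidence !== "unknown"),
  // High-risk work: every claim must be verified.
  Critical: (claims) => claims.every((claim) => claim.evidence === "verified"),
};

function gatePasses(tier: Tier, claims: Claim[]): boolean {
  // With no recorded evidence, no tier should pass.
  return claims.length > 0 && tierRules[tier](claims);
}

const claims: Claim[] = [
  { statement: "Unit tests pass", evidence: "verified" },
  { statement: "No callers rely on the old error code", evidence: "assumed" },
];
console.log(gatePasses("Standard", claims), gatePasses("Critical", claims)); // true false
```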

### One-command orientation

```bash
aikit onboard
aikit guide "fix a failing test"
aikit audit
```

That is the difference between an agent improvising and an agent following a playbook.

## Custom Flows (Your Workflow, Your Rules)

Every team has its own delivery rhythm. AI Kit does not lock you into a single way of working. Its flow system is pluggable, so you can shape guidance around the process your team already trusts.

### Built-in flows

- `aikit:basic` and `aikit:advanced` are available out of the box.
- Your own flows live alongside them, so built-in guidance and team-specific workflows can coexist cleanly.

### Your own flows

- A flow is a sequence of steps, and each step is a set of plain markdown instructions.
- Those steps can point to any skill, tool, or agent that makes sense for that part of the job.
- Teams can create flows like `deploy`, `code-review`, or `incident-response` to match real work.
- Flows are not hardcoded. They are runtime data you can add, remove, update, and refine as your process changes.

### Flow tools

- `flow({ action: 'list' })`, `flow({ action: 'info' })`, and `flow({ action: 'status' })` show what flows exist and where work stands.
- `flow({ action: 'start' })`, `flow({ action: 'step' })`, and `flow({ action: 'read' })` help the agent move through the current workflow.
- `flow({ action: 'reset' })` gives you a clean restart when needed.
- `flow({ action: 'add' })`, `flow({ action: 'remove' })`, and `flow({ action: 'update' })` let you manage custom flows over time.

That means AI Kit can adapt to your workflow instead of asking your workflow to adapt to AI Kit.
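As a mental model for the flow actions, here is a minimal runner that mirrors `add`, `remove`, `list`, `start`, `read`, `step`, `status`, and `reset`. The `Flow` shape, the `FlowRunner` class, and the `deploy` steps are hypothetical. AI Kit stores flows and progress in its own format, which is not described here.

```ts
// Hypothetical shapes: each step carries markdown instructions for the agent.
interface FlowStep {
  name: string;
  instructions: string;
}

interface Flow {
  name: string;
  steps: FlowStep[];
}

class FlowRunner {
  private flows = new Map<string, Flow>();
  private active: { flow: Flow; index: number } | null = null;

  add(flow: Flow): void {
    this.flows.set(flow.name, flow);
  }

  remove(name: string): boolean {
    return this.flows.delete(name);
  }

  list(): string[] {
    return Array.from(this.flows.keys());
  }

  start(name: string): FlowStep {
    const flow = this.flows.get(name);
    if (flow === undefined || flow.steps.length === 0) throw new Error(`no runnable flow: ${name}`);
    this.active = { flow, index: 0 };
    return flow.steps[0];
  }

  // Markdown instructions for the current step, if a flow is in progress.
  read(): string | undefined {
    if (this.active === null) return undefined;
    const current = this.active.flow.steps[this.active.index];
    return current === undefined ? undefined : current.instructions;
  }

  // Advance one step; returns undefined once every step is complete.
  step(): FlowStep | undefined {
    if (this.active === null) throw new Error("no active flow");
    this.active.index = Math.min(this.active.index + 1, this.active.flow.steps.length);
    return this.active.flow.steps[this.active.index];
  }

  status(): string {
    if (this.active === null) return "idle";
    const { flow, index } = this.active;
    if (index >= flow.steps.length) return `${flow.name}: done`;
    return `${flow.name}: step ${index + 1}/${flow.steps.length} (${flow.steps[index].name})`;
  }

  reset(): void {
    this.active = null;
  }
}

const runner = new FlowRunner();
runner.add({
  name: "deploy",
  steps: [
    { name: "build", instructions: "Run the production build and fix any errors." },
    { name: "test", instructions: "Run the full test suite; stop on failure." },
    { name: "release", instructions: "Tag the release and publish." },
  ],
});
runner.start("deploy");
runner.step();
console.log(runner.status()); // deploy: step 2/3 (test)
```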

## Persistent Memory

Most agent workflows break because context disappears between sessions. AI Kit treats memory as a first-class system.

### Session 1

- The agent debugs a permissions bug.
- It stores the finding with `remember()` under `decisions`.
- It saves a checkpoint before stopping.

### Session 2

- The agent runs `search({ query: "SESSION CHECKPOINT" })`.
- It reads the previous decision instead of rediscovering it.
- It resumes from the saved task state and open question.

### Session 3

- Another agent joins the work.
- `session_digest` summarizes what happened.
- The handoff is short because the memory is already structured.

This is the practical value: less re-explaining, less drift, fewer repeated mistakes.

## Privacy & Security

AI Kit is built to keep your code local.

- Embeddings run with **local ONNX models**. No API key required.
- Vector storage uses **SQLite-vec** embedded on disk.
- Tree-sitter chunking and graph extraction run locally against your source.
- Search, memory, tool discovery, and session digests do not require a cloud backend.
- You decide when web fetch or web search is used.

If you work in air-gapped or sensitive environments, that local-first architecture is not a bonus feature. It is the baseline.

## Tech Stack

| Layer | What AI Kit uses |
|---|---|
| Package | `@vpxa/aikit` |
| Protocol | Model Context Protocol (MCP) |
| Runtime | Node.js 18+ |
| Language | TypeScript |
| Embeddings | Local ONNX (`mixedbread-ai/mxbai-embed-large-v1`, 1024 dims) |
| Vector store | SQLite-vec |
| Chunking | Tree-sitter AST + regex fallback |
| Graph | Auto-populated module/symbol/import knowledge graph from TS/JS source |
| Search | Semantic + keyword + BM25 + Reciprocal Rank Fusion |
| Output surfaces | MCP server, CLI, scaffolded editor integration |

## Why Teams Evaluate It

AI Kit is useful when you need one or more of these at the same time:

- Better code search than plain text grep.
- Lower token cost without throwing away relevant context.
- Safer changes through blast radius and workflow gates.
- Cross-session memory that survives the chat window.
- A local-first setup that does not depend on remote embedding or vector services.
- A single toolkit that works across editor agents instead of being trapped in one client.
- A multi-agent team that splits work across specialized roles instead of overloading one agent.

It is not trying to be a hosted coding platform. It is the local intelligence layer that makes coding agents less forgetful and less reckless.

---

If your agent already writes code well but still wastes time re-reading files, missing dependencies, forgetting yesterday's conclusions, and relying on one model to plan, code, review, and decide everything, AI Kit is the missing layer.