---
title: RAG (Retrieval-Augmented Generation)
description: How Smithers chunks documents, embeds them, stores vectors, and retrieves context at query time.
---

Your agent needs to answer questions about a codebase, a set of design docs, or a knowledge base. The model's training data does not cover your private documents. You could paste everything into the prompt, but that blows up the context window and costs a fortune. RAG solves this by fetching only the relevant pieces at query time.

## The Pipeline

RAG in Smithers is a four-step pipeline:

1. **Chunk** -- split documents into small, overlapping pieces
2. **Embed** -- convert each chunk into a vector using an embedding model
3. **Store** -- persist vectors in a SQLite table alongside the original text
4. **Retrieve** -- embed the query, find the closest vectors, return the matching chunks

```
Document ──▶ Chunker ──▶ Embedder ──▶ Vector Store
                                            │
                                            ▼
Query ──▶ Embedder ─────────────────▶ Similarity Search ──▶ Ranked Results
```

Each step is a plain function. You can use them individually or wire them together with `createRagPipeline`.

## Chunking Strategies

A document rarely fits in a single embedding. Chunking breaks it into pieces that are small enough to embed and specific enough to be useful when retrieved.

Smithers ships five strategies:

| Strategy | Splits on | Best for |
|-----------|-----------|----------|
| `recursive` | Paragraphs, then lines, then words, then characters | General text (default) |
| `character` | Fixed character count | Uniform chunk sizes |
| `sentence` | Sentence boundaries | Prose, articles |
| `markdown` | Headings and sections | Documentation, READMEs |
| `token` | Approximate token count (~4 chars/token) | Token-budget-aware splitting |

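Every strategy accepts `size` (max characters per chunk) and `overlap` (characters shared between adjacent chunks). Overlap reduces information loss at chunk boundaries. The end of one chunk is repeated at the start of the next, so a sentence cut by a boundary still appears with some of its surrounding context. The `character` strategy also accepts `separator` to override its default `"\n\n"` split boundary.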

```ts
import { chunk, createDocument } from "smithers-orchestrator/rag";

const doc = createDocument("Your long document text here...");
const chunks = chunk(doc, { strategy: "recursive", size: 1000, overlap: 200 });

// Character strategy with a custom separator
const csvChunks = chunk(doc, { strategy: "character", size: 500, overlap: 50, separator: "\n" });
```
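The effect of `size` and `overlap` is easiest to see with a plain fixed-width window. The sketch below uses a hypothetical helper (`slidingWindowChunks` is not exported by `smithers-orchestrator/rag`) that ignores separators entirely and advances by `size - overlap` characters, so each chunk repeats the last `overlap` characters of the previous one. The built-in strategies additionally split on the boundaries listed in the table.

```typescript
// Hypothetical illustration of size/overlap; not the Smithers implementation.
// Splits purely by character offset and ignores paragraph or sentence boundaries.
function slidingWindowChunks(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new RangeError("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be >= 0 and smaller than size");
  }
  const step = size - overlap; // how far each window advances
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a window reaches the end so the tail is not emitted again.
    if (start + size >= text.length) break;
  }
  return chunks;
}

const pieces = slidingWindowChunks("abcdefghij", 4, 2);
console.log(pieces); // [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

With `size: 1000, overlap: 200`, this window advances 800 characters per step, so a 10,000-character string produces 13 chunks, the last one only 400 characters long.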

## Embedding

Smithers wraps the Vercel AI SDK's `embed()` and `embedMany()`. You can bring any embedding model the AI SDK supports, such as OpenAI, Google, Mistral, or Cohere.

```ts
import { embedChunks, embedQuery } from "smithers-orchestrator/rag";
import { openai } from "@ai-sdk/openai";

const model = openai.embedding("text-embedding-3-small");
const embedded = await embedChunks(chunks, model);
const queryVector = await embedQuery("How does caching work?", model);
```

The embedder is intentionally thin. It bridges Smithers chunk types to the AI SDK and adds structured logging. No custom vector math.
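Because the AI SDK's `embedMany()` returns embeddings in the same order as the input values, the bridging step is essentially a zip of chunks and vectors. A minimal sketch, where `TextChunk`, `EmbeddedChunk`, and `attachEmbeddings` are hypothetical names rather than Smithers types:

```typescript
// Hypothetical types; Smithers' own chunk types may carry more fields.
type TextChunk = { id: string; text: string; metadata: Record<string, unknown> };
type EmbeddedChunk = TextChunk & { embedding: number[] };

// Pair each chunk with the embedding at the same index.
// embedMany() returns embeddings in input order, so embeddings[i] belongs to chunks[i].
function attachEmbeddings(chunks: TextChunk[], embeddings: number[][]): EmbeddedChunk[] {
  if (chunks.length !== embeddings.length) {
    throw new Error(`expected ${chunks.length} embeddings, got ${embeddings.length}`);
  }
  return chunks.map((chunk, i) => ({ ...chunk, embedding: embeddings[i] }));
}

// In real code the vectors would come from something like:
//   const { embeddings } = await embedMany({ model, values: chunks.map((c) => c.text) });
const chunks: TextChunk[] = [
  { id: "c0", text: "Caching uses an LRU policy.", metadata: { source: "cache.md" } },
  { id: "c1", text: "Entries expire after 10 minutes.", metadata: { source: "cache.md" } },
];
const embedded = attachEmbeddings(chunks, [[0.1, 0.2], [0.3, 0.4]]);
console.log(embedded[1].embedding); // [ 0.3, 0.4 ]
```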

## Vector Store

Vectors live in SQLite. No external database required. The `_smithers_vectors` table stores each chunk's text, embedding (as a Float32 BLOB), dimensions, and metadata (as JSON). Document metadata set via `createDocument` is propagated to every chunk and persisted in `metadata_json`, so it survives round-trips through the store. Queries do a full-table scan with JavaScript cosine similarity using the AI SDK's `cosineSimilarity()`.

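This is fast enough for typical RAG workloads (hundreds to low thousands of chunks). Because each query scores every row it scans, query time grows linearly with the number of stored chunks. If you outgrow it, swap in a different store implementation.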

```ts
import { createSqliteVectorStore } from "smithers-orchestrator/rag";

const store = createSqliteVectorStore(workflow.db);
await store.upsert(embedded);
const results = await store.query(queryVector, { topK: 5 });
```
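Conceptually, a query against this store decodes each stored BLOB back into floats, scores it against the query vector, and keeps the best `topK`. The sketch below reproduces that loop in memory as an illustration only: `StoredRow`, `toBlob`, `fromBlob`, and `bruteForceTopK` are hypothetical names, and the hand-written `cosine` stands in for the AI SDK's `cosineSimilarity()`.

```typescript
// Hypothetical in-memory model of a full-scan vector query; not the Smithers source.
type StoredRow = { id: string; text: string; embedding: Uint8Array };

// Encode a vector as a Float32 BLOB (4 bytes per dimension).
function toBlob(vector: number[]): Uint8Array {
  return new Uint8Array(Float32Array.from(vector).buffer);
}

// Decode a Float32 BLOB. Copy first so the new buffer starts at byte offset 0.
function fromBlob(blob: Uint8Array): number[] {
  const bytes = new Uint8Array(blob); // copies into a fresh, exactly-sized buffer
  return Array.from(new Float32Array(bytes.buffer));
}

// Stand-in for the AI SDK's cosineSimilarity().
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every row, sort by similarity (highest first), keep the first topK.
function bruteForceTopK(rows: StoredRow[], query: number[], topK: number) {
  return rows
    .map((row) => ({ id: row.id, text: row.text, score: cosine(fromBlob(row.embedding), query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const rows: StoredRow[] = [
  { id: "a", text: "caching", embedding: toBlob([1, 0, 0]) },
  { id: "b", text: "routing", embedding: toBlob([0, 1, 0]) },
  { id: "c", text: "cache eviction", embedding: toBlob([0.9, 0.1, 0]) },
];
const top = bruteForceTopK(rows, [1, 0, 0], 2);
console.log(top.map((r) => r.id)); // [ 'a', 'c' ]
```

Copying with `new Uint8Array(blob)` before reinterpreting the bytes is deliberate: a SQLite driver may hand back a view into a larger buffer, and a `Float32Array` view requires a byte offset that is a multiple of 4.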

## The RAG Pipeline

`createRagPipeline` wires all four steps together:

```ts
import { createRagPipeline } from "smithers-orchestrator/rag";

const pipeline = createRagPipeline({
  vectorStore: store,
  embeddingModel: model,
  chunkOptions: { strategy: "markdown", size: 1000, overlap: 200 },
  topK: 10, // default for every retrieve() call (10 if omitted)
  namespace: "docs", // optional: scope to a namespace
});

// Ingest
await pipeline.ingest([doc1, doc2]);
await pipeline.ingestFile("./docs/architecture.md");

// Retrieve — per-call topK overrides the pipeline default
const results = await pipeline.retrieve("How does the scheduler work?", { topK: 5 });
```

## The RAG Tool

Agents can search the knowledge base themselves. `createRagTool` exposes the pipeline as a tool:

```ts
import { createRagTool } from "smithers-orchestrator/rag";

const searchKnowledge = createRagTool(pipeline, {
  name: "search_knowledge",
  description: "Search the project knowledge base",
  defaultTopK: 5, // default results returned when agent omits topK
});
```

Hand this tool to any agent. When the agent calls it, Smithers embeds the query, searches the vector store, and returns the top results with relevance scores and metadata. The agent's `topK` parameter accepts 1-50; when omitted, `defaultTopK` (default 5) is used.
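The `topK` rules above amount to a small resolution step. This hypothetical `resolveToolTopK` sketch assumes that non-integer or out-of-range values are rejected rather than clamped; the text only says the parameter accepts 1-50, so check the tool's schema for the actual behavior.

```typescript
// Hypothetical sketch of how the tool's topK argument is resolved.
// Assumption: values outside 1-50 (or non-integers) are rejected, not clamped.
const MIN_TOP_K = 1;
const MAX_TOP_K = 50;

function resolveToolTopK(requested: number | undefined, defaultTopK = 5): number {
  // The agent omitted topK: fall back to the tool's default.
  const topK = requested ?? defaultTopK;
  if (!Number.isInteger(topK) || topK < MIN_TOP_K || topK > MAX_TOP_K) {
    throw new RangeError(`topK must be an integer between ${MIN_TOP_K} and ${MAX_TOP_K}, got ${topK}`);
  }
  return topK;
}

console.log(resolveToolTopK(undefined)); // 5
console.log(resolveToolTopK(undefined, 8)); // 8
console.log(resolveToolTopK(20)); // 20
```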

## Namespaces

A single vector store can hold multiple document collections. Pass a `namespace` to keep them separate:

```ts
const pipeline = createRagPipeline({
  vectorStore: store,
  embeddingModel: model,
  chunkOptions: { strategy: "recursive" },
  namespace: "api-docs",
});
```

All namespaces share the same SQLite table, but a query searches only within its own namespace.
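One way to picture this: every row carries a namespace, and a query only ever visits rows in its own namespace before scoring. The in-memory `NamespacedStore` below is a hypothetical sketch of that behavior, not the SQLite store's code. Using the literal `"default"` for the implicit namespace is also an assumption.

```typescript
// Hypothetical sketch of namespace scoping; not the Smithers vector store.
type VectorRow = { id: string; text: string; embedding: number[] };

class NamespacedStore {
  // One shared collection, partitioned by namespace (like one table with a namespace column).
  private rows = new Map<string, Map<string, VectorRow>>();

  upsert(rows: VectorRow[], namespace = "default"): void {
    let bucket = this.rows.get(namespace);
    if (!bucket) {
      bucket = new Map();
      this.rows.set(namespace, bucket);
    }
    for (const row of rows) bucket.set(row.id, row); // same id replaces the old row
  }

  // The rows a query in this namespace would score; other namespaces are never visited.
  candidates(namespace = "default"): VectorRow[] {
    return Array.from(this.rows.get(namespace)?.values() ?? []);
  }

  count(namespace = "default"): number {
    return this.rows.get(namespace)?.size ?? 0;
  }
}

const demo = new NamespacedStore();
demo.upsert([{ id: "1", text: "GET /users", embedding: [1, 0] }], "api-docs");
demo.upsert([{ id: "2", text: "Scheduler design", embedding: [0, 1] }], "docs");
console.log(demo.candidates("api-docs").map((r) => r.text)); // [ 'GET /users' ]
console.log(demo.count("docs")); // 1
```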

## Document Format Detection

When you call `createDocument(content)`, Smithers auto-detects the format so the chunker can split along the document's structure:

| Detection rule | Assigned format |
|---------------|-----------------|
| Content starts with `{` or `[` and is valid JSON | `json` |
| Content starts with `<!` or `<html` | `html` |
| Content has a line starting with one to six `#` characters | `markdown` |
| Everything else | `text` |

You can override auto-detection by passing `format` explicitly:

```ts
const doc = createDocument(content, { format: "markdown" });
```

`loadDocument(path)` uses the file extension (`.md`, `.mdx`, `.html`, `.htm`, `.json`) as a hint before inspecting the content. A Markdown file therefore gets heading-aware splitting even when its content would not match the heading rule above.
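The detection table can be read as an ordered series of checks. Here is a hypothetical `detectFormat` that applies them top to bottom, plus an optional extension hint in the spirit of `loadDocument`. It is a sketch, not Smithers' code: the regular expression, the literal (untrimmed) prefix checks, and letting a known extension decide outright are all assumptions.

```typescript
// Hypothetical re-creation of the detection table; not the Smithers source.
type DocFormat = "json" | "html" | "markdown" | "text";

// Assumption: a recognised extension decides the format outright.
const EXTENSION_HINTS: Record<string, DocFormat> = {
  ".md": "markdown",
  ".mdx": "markdown",
  ".html": "html",
  ".htm": "html",
  ".json": "json",
};

function detectFormat(content: string, path?: string): DocFormat {
  if (path !== undefined) {
    const dot = path.lastIndexOf(".");
    const hint = dot >= 0 ? EXTENSION_HINTS[path.slice(dot).toLowerCase()] : undefined;
    if (hint !== undefined) return hint;
  }
  // Rule 1: starts with { or [ and parses as JSON.
  if (content.startsWith("{") || content.startsWith("[")) {
    try {
      JSON.parse(content);
      return "json";
    } catch {
      // Not valid JSON: fall through to the next rules.
    }
  }
  // Rule 2: starts with <! or <html.
  if (content.startsWith("<!") || content.startsWith("<html")) return "html";
  // Rule 3: some line starts with one to six # characters (not seven or more).
  if (/^#{1,6}(?!#)/m.test(content)) return "markdown";
  // Rule 4: everything else.
  return "text";
}

console.log(detectFormat('{"a": 1}')); // json
console.log(detectFormat("{ not json")); // text
console.log(detectFormat("Intro\n## Setup")); // markdown
console.log(detectFormat("plain notes", "./docs/notes.md")); // markdown
```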

## Deleting Vectors

Remove specific chunks from the vector store by ID:

```ts
await store.delete(["chunk-id-1", "chunk-id-2"]);
```

Passing an empty array is a no-op. Use this to keep the store current when source documents are updated or removed.
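When a source document changes, the chunks to remove are the ones it produced last time but no longer produces. The hypothetical `staleChunkIds` helper below computes that set difference. It assumes you recorded the chunk IDs produced for each document at ingest time, which this page does not describe, and the `doc-1#0` ID format is made up for the example.

```typescript
// Hypothetical helper: IDs present in the previous ingest but not in the current one.
// Assumes you kept the chunk IDs produced for each source document.
function staleChunkIds(previousIds: string[], currentIds: string[]): string[] {
  const current = new Set(currentIds);
  return previousIds.filter((id) => !current.has(id));
}

const stale = staleChunkIds(["doc-1#0", "doc-1#1", "doc-1#2"], ["doc-1#0", "doc-1#1"]);
console.log(stale); // [ 'doc-1#2' ]
// Safe to call unconditionally, since delete([]) is a no-op:
// await store.delete(stale);
```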

## Counting Vectors

Check how many chunks are stored in a namespace:

```ts
const total = await store.count();            // default namespace
const apiDocs = await store.count("api-docs"); // specific namespace
```

Useful for verifying that ingestion completed and for monitoring store growth over time.

## Query Filters

`VectorQueryOptions` accepts an optional `filter` map that is passed through to the store implementation. The SQLite store does not use it: it scores all rows and sorts them by similarity. Custom store implementations can use `filter` to pre-select rows before scoring:

```ts
const results = await store.query(queryVector, {
  topK: 5,
  namespace: "api-docs",
  filter: { source: "architecture.md" },
});
```

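When using the default SQLite store, put the fields you want to filter on in the document's metadata (it is propagated to every chunk) and filter the returned results in application code. Because this filtering happens after the top-K cut, request a larger `topK` than you need so that enough matching results survive the filter.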
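A minimal sketch of that application-side filter, assuming each result exposes its chunk metadata as a plain object. `RetrievedChunk` and `filterByMetadata` are hypothetical names, not the store's actual result type.

```typescript
// Hypothetical result shape; the real store's result type may differ.
type RetrievedChunk = {
  id: string;
  score: number;
  metadata: Record<string, unknown>;
};

// Keep results whose metadata matches every key/value pair in `filter` (exact equality).
function filterByMetadata(results: RetrievedChunk[], filter: Record<string, unknown>): RetrievedChunk[] {
  return results.filter((r) =>
    Object.entries(filter).every(([key, value]) => r.metadata[key] === value),
  );
}

const results: RetrievedChunk[] = [
  { id: "c1", score: 0.91, metadata: { source: "architecture.md" } },
  { id: "c2", score: 0.88, metadata: { source: "api.md" } },
  { id: "c3", score: 0.75, metadata: { source: "architecture.md" } },
];

// Filter the over-fetched results, then trim to the number actually needed.
const wanted = filterByMetadata(results, { source: "architecture.md" }).slice(0, 5);
console.log(wanted.map((r) => r.id)); // [ 'c1', 'c3' ]
```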

## Effect Service Layer

For Effect-native workflows, a `RagService` Effect layer wraps the pipeline:

```ts
import { RagService, createRagServiceLayer, retrieve, ingest } from "smithers-orchestrator/rag";
import { Effect } from "effect";

const layer = createRagServiceLayer({
  vectorStore: store,
  embeddingModel: model,
  chunkOptions: { strategy: "markdown" },
});

const program = Effect.gen(function* () {
  yield* ingest([doc]);
  const results = yield* retrieve("How does auth work?", 5);
  return results;
}).pipe(Effect.provide(layer));

// Run the program to get the retrieved chunks
const retrieved = await Effect.runPromise(program);
```

`ingest` and `retrieve` are convenience functions that pull `RagService` from Effect context automatically.

Lower-level Effect wrappers are also exported for direct use outside the service layer: `embedChunksEffect`, `embedQueryEffect`, `ingestEffect`, `retrieveEffect`, `upsertEffect`, and `queryEffect`. These give you Effect-typed versions of each pipeline step without requiring the full `RagService` context.

## Observability Metrics

RAG operations export four metrics:

| Metric | Type | Description |
|--------|------|-------------|
| `smithers.rag.ingest_total` | counter | Total documents ingested (incremented per `ingest` call by document count) |
| `smithers.rag.retrieve_total` | counter | Total retrieval queries executed |
| `smithers.rag.retrieve_duration_ms` | histogram | End-to-end retrieval latency (embed + query) |
| `smithers.rag.embed_duration_ms` | histogram | Time to embed a batch of chunks |

These integrate with the standard Smithers observability pipeline and appear in Prometheus exports and OpenTelemetry metric exports.

## CLI

Ingest files and query from the command line:

```bash
# Ingest a file
smithers rag ingest ./docs/api.md --workflow my-workflow.tsx

# Query the knowledge base
smithers rag query "How does authentication work?" --workflow my-workflow.tsx --top-k 5
```
