# Observational Memory

**Added in:** `@mastra/memory@1.1.0`

Observational Memory (OM) is Mastra's memory system for long-running, long-context agents. Two background agents maintain an observation log that replaces raw message history as it grows: an **Observer** that watches conversations and creates observations, and a **Reflector** that restructures those observations by combining related items, reflecting on overarching patterns, and condensing where possible.

## Usage

```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: true,
    },
  }),
})
```

## Configuration

The `observationalMemory` option accepts `true`, a configuration object, or `false`. Setting `true` enables OM with `google/gemini-2.5-flash` as the default model. When passing a config object, a `model` must be explicitly set — either at the top level, or on `observation.model` and/or `reflection.model`.
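The model rules in this section (a top-level `model` is exclusive with `observation.model`/`reflection.model`, each per-agent model falls back to the other, and a config object needs at least one of them) can be sketched as a small resolver. This is only an illustration of the documented rules: `OmModelConfig` and `resolveOmModels` are hypothetical names, not Mastra exports, and the sketch only handles string model IDs.

```typescript
// Illustrative only: these types and this function are hypothetical, not part of Mastra.
type ModelId = string

interface OmModelConfig {
  model?: ModelId
  observation?: { model?: ModelId }
  reflection?: { model?: ModelId }
}

const DEFAULT_OM_MODEL: ModelId = 'google/gemini-2.5-flash'

function resolveOmModels(config: true | OmModelConfig): { observer: ModelId; reflector: ModelId } {
  // `observationalMemory: true` uses the documented default for both agents.
  if (config === true) {
    return { observer: DEFAULT_OM_MODEL, reflector: DEFAULT_OM_MODEL }
  }

  const topLevel = config.model
  const observer = config.observation?.model
  const reflector = config.reflection?.model

  // A top-level model cannot be combined with per-agent models.
  if (topLevel !== undefined && (observer !== undefined || reflector !== undefined)) {
    throw new Error('Set either `model` or `observation.model`/`reflection.model`, not both')
  }

  if (topLevel !== undefined) {
    // "default" explicitly selects the default model.
    const resolved = topLevel === 'default' ? DEFAULT_OM_MODEL : topLevel
    return { observer: resolved, reflector: resolved }
  }

  // Each per-agent model falls back to the other when only one is set.
  const resolvedObserver = observer ?? reflector
  const resolvedReflector = reflector ?? observer
  if (resolvedObserver === undefined || resolvedReflector === undefined) {
    throw new Error('A config object requires an explicit model')
  }
  return { observer: resolvedObserver, reflector: resolvedReflector }
}
```

Under these assumptions, a config that sets only `observation.model` resolves both agents to that model, while an empty config object throws.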

Observer input is multimodal-aware. OM keeps text placeholders like `[Image #1: screenshot.png]` in the transcript it builds for the Observer, and also sends the underlying image parts when possible. This applies to both single-thread observation and batched multi-thread observation. Non-image files appear as placeholders only.

OM performs thresholding with fast local token estimation. Text uses `tokenx`, and image-like inputs use provider-aware heuristics plus deterministic fallbacks when metadata is incomplete.

**enabled** (`boolean`): Enable or disable Observational Memory. When omitted from a config object, defaults to `true`. Only `enabled: false` explicitly disables it. (Default: `true`)

**model** (`string | LanguageModel | DynamicModel | ModelByInputTokens | ModelWithRetries[]`): Model used by both the Observer and Reflector agents. Cannot be combined with `observation.model` or `reflection.model`; an error is thrown if both are set. When using `observationalMemory: true`, defaults to `google/gemini-2.5-flash`. When passing a config object, this or `observation.model`/`reflection.model` must be set. Use `"default"` to explicitly use the default model (`google/gemini-2.5-flash`). (Default: `'google/gemini-2.5-flash' (when using observationalMemory: true)`)

**scope** (`'resource' | 'thread'`): Memory scope for observations. `'thread'` keeps observations per-thread. `'resource'` (experimental) shares observations across all threads for a resource, enabling cross-conversation memory. (Default: `'thread'`)

**shareTokenBudget** (`boolean`): Share the token budget between messages and observations. When enabled, the total budget is `observation.messageTokens + reflection.observationTokens`. Messages can use more space when observations are small, and vice versa. This maximizes context usage through flexible allocation. `shareTokenBudget` is not yet compatible with async buffering, so you must set `observation: { bufferTokens: false }` when using this option (a temporary limitation). (Default: `false`)

**retrieval** (`boolean | { vector?: boolean; scope?: 'thread' | 'resource' }`): **Experimental.** Enable retrieval-mode observation groups as durable pointers to raw message history. `true` enables cross-thread browsing by default. `{ vector: true }` also enables semantic search using Memory's vector store and embedder. `{ scope: 'thread' }` restricts the recall tool to the current thread only. Default scope is `'resource'`. (Default: `false`)

**observation** (`ObservationalMemoryObservationConfig`): Configuration for the observation step. Controls when the Observer agent runs and how it behaves.

**observation.model** (`string | LanguageModel | DynamicModel | ModelByInputTokens | ModelWithRetries[]`): Model for the Observer agent. Cannot be set if a top-level `model` is also provided. If neither this nor the top-level `model` is set, falls back to `reflection.model`.

**observation.instruction** (`string`): Custom instruction appended to the Observer's system prompt. Use this to customize what the Observer focuses on, such as domain-specific preferences or priorities.

**observation.threadTitle** (`boolean`): When `true`, the Observer suggests short thread titles and updates the thread title when the conversation topic meaningfully changes. This is opt-in and defaults to disabled.

**observation.messageTokens** (`number`): Token count of unobserved messages that triggers observation. When unobserved message tokens exceed this threshold, the Observer agent is called. Text is estimated locally with `tokenx`. Image parts are included with model-aware heuristics when possible, with deterministic fallbacks when image metadata is incomplete. Image-like `file` parts are counted the same way when uploads are normalized as files.

**observation.maxTokensPerBatch** (`number`): Maximum tokens per batch when observing multiple threads in resource scope. Threads are chunked into batches of this size and processed in parallel. Lower values mean more parallelism but more API calls.

**observation.modelSettings** (`ObservationalMemoryModelSettings`): Model settings for the Observer agent.

**observation.modelSettings.temperature** (`number`): Temperature for generation. Lower values produce more consistent output.

**observation.modelSettings.maxOutputTokens** (`number`): Maximum output tokens. Set high to prevent truncation of observations.

**observation.bufferTokens** (`number | false`): Token interval for async background observation buffering. Can be an absolute token count (e.g. `5000`) or a fraction of `messageTokens` (e.g. `0.25` = buffer every 25% of threshold). When set, observations run in the background at this interval, storing results in a buffer. When the main `messageTokens` threshold is reached, buffered observations activate instantly without a blocking LLM call. Must resolve to less than `messageTokens`. Set to `false` to explicitly disable all async buffering (both observation and reflection).

**observation.bufferActivation** (`number`): Controls how much of the message window to retain after activation. Accepts a ratio (0-1) or an absolute token count (≥ 1000). For example, `0.8` means: activate enough buffers to remove 80% of `messageTokens` and leave 20% as active message history. An absolute token count like `4000` aims to keep ~4k message tokens remaining after activation. With a ratio, higher values remove more message history per activation. With a token count, higher values keep more message history.

**observation.blockAfter** (`number`): Token threshold above which synchronous (blocking) observation is forced. Between `messageTokens` and `blockAfter`, only async buffering/activation is used. Above `blockAfter`, a synchronous observation runs as a last resort, while buffered activation still preserves a minimum remaining context (min(1000, retention floor)). Accepts a multiplier (1 < value < 2, multiplied by `messageTokens`) or an absolute token count (≥ 2, must be greater than `messageTokens`). Only relevant when `bufferTokens` is set. Defaults to `1.2` when async buffering is enabled.

**observation.previousObserverTokens** (`number | false`): Optional token budget for the observer's previous-observations context. When set to a number, the observations passed to the Observer agent are tail-truncated to fit within this budget while keeping the newest observations and preserving highlighted 🔴 items when possible. When a buffered reflection is pending, the already-reflected observation lines are automatically replaced with the reflection summary before truncation. Set to `0` to omit previous observations entirely, or `false` to disable truncation explicitly.

**reflection** (`ObservationalMemoryReflectionConfig`): Configuration for the reflection step. Controls when the Reflector agent runs and how it behaves.

**reflection.model** (`string | LanguageModel | DynamicModel | ModelByInputTokens | ModelWithRetries[]`): Model for the Reflector agent. Cannot be set if a top-level `model` is also provided. If neither this nor the top-level `model` is set, falls back to `observation.model`.

**reflection.instruction** (`string`): Custom instruction appended to the Reflector's system prompt. Use this to customize how the Reflector consolidates observations, such as prioritizing certain types of information.

**reflection.observationTokens** (`number`): Token count of observations that triggers reflection. When observation tokens exceed this threshold, the Reflector agent is called to condense them.

**reflection.modelSettings** (`ObservationalMemoryModelSettings`): Model settings for the Reflector agent.

**reflection.modelSettings.temperature** (`number`): Temperature for generation. Lower values produce more consistent output.

**reflection.modelSettings.maxOutputTokens** (`number`): Maximum output tokens. Set high to prevent truncation of observations.

**reflection.bufferActivation** (`number`): Ratio (0-1) controlling when async reflection buffering starts. When observation tokens reach `observationTokens * bufferActivation`, reflection runs in the background. On activation at the full threshold, the buffered reflection replaces the observations it covers, preserving any new observations appended after that range.

**reflection.blockAfter** (`number`): Token threshold above which synchronous (blocking) reflection is forced. Between `observationTokens` and `blockAfter`, only async buffering/activation is used. Above `blockAfter`, a synchronous reflection runs as a last resort. Accepts a multiplier (1 < value < 2, multiplied by `observationTokens`) or an absolute token count (≥ 2, must be greater than `observationTokens`). Only relevant when `bufferActivation` is set. Defaults to `1.2` when async reflection is enabled.
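Several of the options above accept either a ratio or an absolute token count. The sketch below shows one way those values could resolve to concrete token numbers. The helper names are hypothetical, and the "values below 1 are fractions" rule for `bufferTokens` is an assumption, since the reference only gives `0.25` and `5000` as examples. Results are rounded to whole tokens for convenience.

```typescript
// Hypothetical helpers for illustration; not part of the Mastra API.

/** Resolve `observation.bufferTokens` to an absolute interval, or null when buffering is disabled. */
function resolveBufferTokens(bufferTokens: number | false, messageTokens: number): number | null {
  if (bufferTokens === false) return null
  // Assumption: values below 1 are fractions of messageTokens; anything else is an absolute count.
  const absolute = bufferTokens < 1 ? Math.round(messageTokens * bufferTokens) : bufferTokens
  if (absolute >= messageTokens) {
    throw new Error('bufferTokens must resolve to less than messageTokens')
  }
  return absolute
}

/** Resolve `blockAfter` against its base threshold (`messageTokens` or `observationTokens`). */
function resolveBlockAfter(blockAfter: number, threshold: number): number {
  // Documented: 1 < value < 2 is a multiplier; value >= 2 is an absolute count above the threshold.
  if (blockAfter > 1 && blockAfter < 2) return Math.round(threshold * blockAfter)
  if (blockAfter >= 2 && blockAfter > threshold) return blockAfter
  throw new Error('blockAfter must be a multiplier in (1, 2) or a token count above the threshold')
}

/** Message tokens that `observation.bufferActivation` aims to keep after activation. */
function retainedAfterActivation(bufferActivation: number, messageTokens: number): number {
  // Documented: a 0-1 ratio is the share to remove; a value >= 1000 is the count to keep.
  if (bufferActivation >= 1000) return bufferActivation
  return Math.round(messageTokens * (1 - bufferActivation))
}
```

With a 30k `messageTokens` threshold, `resolveBufferTokens(0.2, 30_000)` gives `6000`, `resolveBlockAfter(1.2, 30_000)` gives `36000`, and `retainedAfterActivation(0.8, 30_000)` gives `6000`.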

### Token estimate metadata cache

OM persists token payload estimates so repeated counting can reuse prior token estimation work.

- Part-level cache: `part.providerMetadata.mastra`.
- String-content fallback cache: message-level metadata when no parts exist.
- Cache entries are ignored and recomputed if cache version/tokenizer source doesn't match.
- Per-message and per-conversation overhead are always recomputed at runtime and aren't cached.
- `data-*` and `reasoning` parts are skipped and don't receive cache entries.
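The invalidation rule above (reuse an entry only when the cache version and tokenizer source both match) can be pictured with a small sketch. The entry shape and field names here are assumptions made for illustration; the reference does not specify the stored format.

```typescript
// Hypothetical shapes: the field names are assumptions, not the stored Mastra format.
interface TokenEstimateCacheEntry {
  version: number // cache format version
  source: string // identifies the tokenizer/estimator that produced the count
  tokens: number
}

interface EstimatorInfo {
  version: number
  source: string
}

function cachedOrRecompute(
  text: string,
  cached: TokenEstimateCacheEntry | undefined,
  current: EstimatorInfo,
  estimate: (text: string) => number,
): TokenEstimateCacheEntry {
  // Reuse the entry only when both the cache version and the tokenizer source match.
  if (cached && cached.version === current.version && cached.source === current.source) {
    return cached
  }
  // Otherwise ignore the stale entry and recompute with the current estimator.
  return { version: current.version, source: current.source, tokens: estimate(text) }
}
```

In this sketch a stale entry is never patched in place; it is replaced wholesale by a fresh estimate.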

## Examples

### Resource scope with custom thresholds (experimental)

```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash',
        scope: 'resource',
        observation: {
          messageTokens: 20_000,
        },
        reflection: {
          observationTokens: 60_000,
        },
      },
    },
  }),
})
```

### Shared token budget

When `shareTokenBudget` is enabled, the total budget is `observation.messageTokens + reflection.observationTokens` (100k in this example). If observations only use 30k tokens, messages can expand to use up to 70k. If messages are short, observations have more room before triggering reflection.

```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash', // a config object requires an explicit model
        shareTokenBudget: true,
        observation: {
          messageTokens: 20_000,
          bufferTokens: false, // required when using shareTokenBudget (temporary limitation)
        },
        reflection: {
          observationTokens: 80_000,
        },
      },
    },
  }),
})
```
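The arithmetic behind the shared budget can be sketched as follows. This assumes each window's effective limit is the total budget minus whatever the other window currently uses, which matches the 30k/70k example above but is not confirmed as the exact internal rule. `sharedBudgetThresholds` is a hypothetical helper.

```typescript
// Hypothetical helper; assumes each window may use whatever the other window leaves free.
interface SharedBudgetInput {
  messageTokens: number // observation.messageTokens
  observationTokens: number // reflection.observationTokens
  currentMessageTokens: number
  currentObservationTokens: number
}

function sharedBudgetThresholds(input: SharedBudgetInput) {
  const total = input.messageTokens + input.observationTokens
  return {
    total,
    // Messages can grow into the space observations are not using.
    messageThreshold: Math.max(0, total - input.currentObservationTokens),
    // Observations can grow into the space messages are not using.
    observationThreshold: Math.max(0, total - input.currentMessageTokens),
  }
}

const budget = sharedBudgetThresholds({
  messageTokens: 20_000,
  observationTokens: 80_000,
  currentMessageTokens: 5_000,
  currentObservationTokens: 30_000,
})
// budget.total === 100000, budget.messageThreshold === 70000
```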

### Custom model

By passing a `model` in the config, you can use any model from Mastra's model router.

```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'openai/gpt-5-mini',
      },
    },
  }),
})
```

### Different models for the Observer and Reflector

```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    options: {
      observationalMemory: {
        observation: {
          model: 'google/gemini-2.5-flash',
        },
        reflection: {
          model: 'openai/gpt-5-mini',
        },
      },
    },
  }),
})
```

### Custom instructions

Customize what the Observer and Reflector focus on by providing custom instructions:

```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'health-assistant',
  instructions: 'You are a health and wellness assistant.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash',
        observation: {
          // Focus observations on health-related preferences and goals
          instruction:
            'Prioritize capturing user health goals, dietary restrictions, exercise preferences, and medical considerations. Avoid capturing general chit-chat.',
        },
        reflection: {
          // Guide reflection to consolidate health patterns
          instruction:
            'When consolidating, group related health information together. Preserve specific metrics, dates, and medical details.',
        },
      },
    },
  }),
})
```

### Async buffering

Async buffering is **enabled by default**. It pre-computes observations in the background as the conversation grows — when the `messageTokens` threshold is reached, buffered observations activate instantly with no blocking LLM call.

The lifecycle is: **buffer → activate → remove messages → repeat**. Background Observer calls run at `bufferTokens` intervals, each producing a chunk of observations. At threshold, chunks activate: observations move into the log, raw messages are removed from context. The `blockAfter` threshold forces a synchronous fallback if buffering can't keep up.

Default settings:

- `observation.bufferTokens: 0.2`: buffer every 20% of `messageTokens` (e.g. every ~6k tokens with a 30k threshold)
- `observation.bufferActivation: 0.8`: on activation, remove enough messages to keep only 20% of the threshold remaining
- `reflection.bufferActivation: 0.5`: start background reflection at 50% of the observation threshold
- Buffered observations include continuation hints (`suggestedResponse`, `currentTask`) that survive activation to maintain conversational continuity
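As a rough mental model of the buffer, activate, and block stages, the decision at each step can be sketched as a pure function over already-resolved token thresholds. This is a simplification under stated assumptions: the names are hypothetical, the inputs are absolute token counts, and the real scheduling inside OM may differ.

```typescript
// Hypothetical decision helper; a simplified model, not Mastra's implementation.
type ObservationAction = 'wait' | 'buffer' | 'activate' | 'block'

interface ResolvedObservationThresholds {
  messageTokens: number // observation threshold
  bufferTokens: number // absolute buffering interval
  blockAfter: number // absolute token count that forces a synchronous run
}

interface ObservationState {
  unobservedTokens: number // message tokens not yet covered by active observations
  unbufferedTokens: number // message tokens added since the last buffered chunk
  bufferedChunks: number // chunks staged for activation
}

function nextObservationAction(
  state: ObservationState,
  thresholds: ResolvedObservationThresholds,
): ObservationAction {
  // Above blockAfter, buffering has fallen behind: observe synchronously.
  if (state.unobservedTokens > thresholds.blockAfter) return 'block'
  // At the threshold, staged chunks activate without a blocking LLM call.
  if (state.unobservedTokens >= thresholds.messageTokens && state.bufferedChunks > 0) {
    return 'activate'
  }
  // Otherwise buffer in the background once an interval's worth of tokens has accumulated.
  if (state.unbufferedTokens >= thresholds.bufferTokens) return 'buffer'
  return 'wait'
}
```

With thresholds of 30k (`messageTokens`), 6k (`bufferTokens`), and 36k (`blockAfter`), a state with 31k unobserved tokens and staged chunks yields `'activate'`, while 40k unobserved tokens yields `'block'`.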

To customize:

```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash',
        observation: {
          messageTokens: 30_000,
          // Buffer every 5k tokens (runs in background)
          bufferTokens: 5_000,
          // Activate to retain 30% of threshold
          bufferActivation: 0.7,
          // Force synchronous observation at 1.5x threshold
          blockAfter: 1.5,
        },
        reflection: {
          observationTokens: 60_000,
          // Start background reflection at 50% of threshold
          bufferActivation: 0.5,
          // Force synchronous reflection at 1.2x threshold
          blockAfter: 1.2,
        },
      },
    },
  }),
})
```

To disable async buffering entirely:

```typescript
observationalMemory: {
  model: "google/gemini-2.5-flash",
  observation: {
    bufferTokens: false,
  },
}
```

Setting `bufferTokens: false` disables both observation and reflection async buffering. Observations and reflections will run synchronously when their thresholds are reached.

> **Note:** Async buffering isn't supported with `scope: 'resource'` and is automatically disabled in resource scope.

## Streaming data parts

Observational Memory emits typed data parts during agent execution that clients can use for real-time UI feedback. These are streamed alongside the agent's response.

### `data-om-status`

Emitted once per agent loop step, before model generation. Provides a snapshot of the current memory state, including token usage for both context windows and the state of any async buffered content.

```typescript
interface DataOmStatusPart {
  type: 'data-om-status'
  data: {
    windows: {
      active: {
        /** Unobserved message tokens and the threshold that triggers observation */
        messages: { tokens: number; threshold: number }
        /** Observation tokens and the threshold that triggers reflection */
        observations: { tokens: number; threshold: number }
      }
      buffered: {
        observations: {
          /** Number of buffered chunks staged for activation */
          chunks: number
          /** Total message tokens across all buffered chunks */
          messageTokens: number
          /** Projected message tokens that would be removed if activation happened now (based on bufferActivation ratio and chunk boundaries) */
          projectedMessageRemoval: number
          /** Observation tokens that will be added on activation */
          observationTokens: number
          /** idle: no buffering in progress. running: background observer is working. complete: chunks are ready for activation. */
          status: 'idle' | 'running' | 'complete'
        }
        reflection: {
          /** Observation tokens that were fed into the reflector (pre-compression size) */
          inputObservationTokens: number
          /** Observation tokens the reflection will produce on activation (post-compression size) */
          observationTokens: number
          /** idle: no reflection buffered. running: background reflector is working. complete: reflection is ready for activation. */
          status: 'idle' | 'running' | 'complete'
        }
      }
    }
    recordId: string
    threadId: string
    stepNumber: number
    /** Increments each time the Reflector creates a new generation */
    generationCount: number
  }
}
```

`buffered.reflection.inputObservationTokens` is the size of the observations that were sent to the Reflector. `buffered.reflection.observationTokens` is the compressed result — the size of what will replace those observations when the reflection activates. A client can use these two values to show a compression ratio.

Clients can derive percentages and post-activation estimates from the raw values:

```typescript
// Message window usage %
const msgPercent =
  (status.windows.active.messages.tokens / status.windows.active.messages.threshold) * 100

// Observation window usage %
const obsPercent =
  (status.windows.active.observations.tokens / status.windows.active.observations.threshold) * 100

// Projected message tokens after buffered observations activate
// Uses projectedMessageRemoval which accounts for bufferActivation ratio and chunk boundaries
const postActivation =
  status.windows.active.messages.tokens -
  status.windows.buffered.observations.projectedMessageRemoval

// Reflection compression ratio (when buffered reflection exists)
const { inputObservationTokens, observationTokens } = status.windows.buffered.reflection
if (inputObservationTokens > 0) {
  const compressionRatio = observationTokens / inputObservationTokens
}
```

### `data-om-observation-start`

Emitted when the Observer or Reflector agent begins processing.

**cycleId** (`string`): Unique ID for this cycle — shared between start/end/failed markers.

**operationType** (`'observation' | 'reflection'`): Whether this is an observation or reflection operation.

**startedAt** (`string`): ISO timestamp when processing started.

**tokensToObserve** (`number`): Message tokens (input) being processed in this batch.

**recordId** (`string`): The OM record ID.

**threadId** (`string`): This thread's ID.

**threadIds** (`string[]`): All thread IDs in this batch (for resource-scoped).

**config** (`ObservationMarkerConfig`): Snapshot of `messageTokens`, `observationTokens`, and `scope` at observation time.

### `data-om-observation-end`

Emitted when observation or reflection completes successfully.

**cycleId** (`string`): Matches the corresponding `start` marker.

**operationType** (`'observation' | 'reflection'`): Type of operation that completed.

**completedAt** (`string`): ISO timestamp when processing completed.

**durationMs** (`number`): Duration in milliseconds.

**tokensObserved** (`number`): Message tokens (input) that were processed.

**observationTokens** (`number`): Resulting observation tokens (output) after the Observer compressed them.

**observations** (`string`): The generated observations text.

**currentTask** (`string`): Current task extracted by the Observer.

**suggestedResponse** (`string`): Suggested response extracted by the Observer.

**recordId** (`string`): The OM record ID.

**threadId** (`string`): This thread's ID.

### `data-om-observation-failed`

Emitted when observation or reflection fails. The system falls back to synchronous processing.

**cycleId** (`string`): Matches the corresponding `start` marker.

**operationType** (`'observation' | 'reflection'`): Type of operation that failed.

**failedAt** (`string`): ISO timestamp when the failure occurred.

**durationMs** (`number`): Duration until failure in milliseconds.

**tokensAttempted** (`number`): Message tokens (input) that were attempted.

**error** (`string`): Error message.

**observations** (`string`): Any partial content available for display.

**recordId** (`string`): The OM record ID.

**threadId** (`string`): This thread's ID.

### `data-om-buffering-start`

Emitted when async buffering begins in the background. Buffering pre-computes observations or reflections before the main threshold is reached.

**cycleId** (`string`): Unique ID for this buffering cycle.

**operationType** (`'observation' | 'reflection'`): Type of operation being buffered.

**startedAt** (`string`): ISO timestamp when buffering started.

**tokensToBuffer** (`number`): Message tokens (input) being buffered in this cycle.

**recordId** (`string`): The OM record ID.

**threadId** (`string`): This thread's ID.

**threadIds** (`string[]`): All thread IDs being buffered (for resource-scoped).

**config** (`ObservationMarkerConfig`): Snapshot of config at buffering time.

### `data-om-buffering-end`

Emitted when async buffering completes. The content is stored but not yet activated in the main context.

**cycleId** (`string`): Matches the corresponding `buffering-start` marker.

**operationType** (`'observation' | 'reflection'`): Type of operation that was buffered.

**completedAt** (`string`): ISO timestamp when buffering completed.

**durationMs** (`number`): Duration in milliseconds.

**tokensBuffered** (`number`): Message tokens (input) that were buffered.

**bufferedTokens** (`number`): Observation tokens (output) after the Observer compressed them.

**observations** (`string`): The buffered content.

**recordId** (`string`): The OM record ID.

**threadId** (`string`): This thread's ID.

### `data-om-buffering-failed`

Emitted when async buffering fails. The system falls back to synchronous processing when the threshold is reached.

**cycleId** (`string`): Matches the corresponding `buffering-start` marker.

**operationType** (`'observation' | 'reflection'`): Type of operation that failed.

**failedAt** (`string`): ISO timestamp when the failure occurred.

**durationMs** (`number`): Duration until failure in milliseconds.

**tokensAttempted** (`number`): Message tokens (input) that were attempted to buffer.

**error** (`string`): Error message.

**observations** (`string`): Any partial content.

**recordId** (`string`): The OM record ID.

**threadId** (`string`): This thread's ID.

### `data-om-activation`

Emitted when buffered observations or reflections are activated (moved into the active context window). This is an instant operation — no LLM call is involved.

**cycleId** (`string`): Unique ID for this activation event.

**operationType** (`'observation' | 'reflection'`): Type of content activated.

**activatedAt** (`string`): ISO timestamp when activation occurred.

**chunksActivated** (`number`): Number of buffered chunks activated.

**tokensActivated** (`number`): Message tokens (input) from activated chunks. For observation activation, these are removed from the message window. For reflection activation, this is the observation tokens that were compressed.

**observationTokens** (`number`): Resulting observation tokens after activation.

**messagesActivated** (`number`): Number of messages that were observed via activation.

**generationCount** (`number`): Current reflection generation count.

**observations** (`string`): The activated observations text.

**recordId** (`string`): The OM record ID.

**threadId** (`string`): This thread's ID.

**config** (`ObservationMarkerConfig`): Snapshot of config at activation time.
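Because start, end, and failed markers share a `cycleId`, a client can fold the stream into a per-cycle status map. The sketch below is a minimal example for the observation markers only. It assumes the listed fields sit under a `data` property, as they do for `data-om-status`, and declares only the subset of fields it uses. `OmCyclePart`, `CycleState`, and `reduceCycles` are hypothetical names.

```typescript
// Hypothetical client-side types: a subset of the documented fields, assumed to live under `data`.
type OperationType = 'observation' | 'reflection'

type OmCyclePart =
  | { type: 'data-om-observation-start'; data: { cycleId: string; operationType: OperationType; startedAt: string } }
  | { type: 'data-om-observation-end'; data: { cycleId: string; operationType: OperationType; durationMs: number } }
  | {
      type: 'data-om-observation-failed'
      data: { cycleId: string; operationType: OperationType; durationMs: number; error: string }
    }

interface CycleState {
  operationType: OperationType
  status: 'running' | 'done' | 'failed'
  durationMs?: number
  error?: string
}

function reduceCycles(parts: OmCyclePart[]): Map<string, CycleState> {
  const cycles = new Map<string, CycleState>()
  for (const part of parts) {
    const { cycleId, operationType } = part.data
    switch (part.type) {
      case 'data-om-observation-start':
        cycles.set(cycleId, { operationType, status: 'running' })
        break
      case 'data-om-observation-end':
        cycles.set(cycleId, { operationType, status: 'done', durationMs: part.data.durationMs })
        break
      case 'data-om-observation-failed':
        cycles.set(cycleId, {
          operationType,
          status: 'failed',
          durationMs: part.data.durationMs,
          error: part.data.error,
        })
        break
    }
  }
  return cycles
}
```

The same pattern would extend to the `data-om-buffering-*` markers, which also pair up by `cycleId`.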

## Standalone usage

Most users should use the `Memory` class above. Using `ObservationalMemory` directly is mainly useful for benchmarking, experimentation, or when you need to control processor ordering with other processors (like [guardrails](https://mastra.ai/docs/agents/guardrails)).

```typescript
import { ObservationalMemory } from '@mastra/memory/processors'
import { Agent } from '@mastra/core/agent'
import { LibSQLStore } from '@mastra/libsql'

const storage = new LibSQLStore({
  id: 'my-storage',
  url: 'file:./memory.db',
})

const om = new ObservationalMemory({
  storage: storage.stores.memory,
  model: 'google/gemini-2.5-flash',
  scope: 'resource',
  observation: {
    messageTokens: 20_000,
  },
  reflection: {
    observationTokens: 60_000,
  },
})

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  inputProcessors: [om],
  outputProcessors: [om],
})
```

### Standalone config

The standalone `ObservationalMemory` class accepts all the same options as the `observationalMemory` config object above, plus the following:

**storage** (`MemoryStorage`): Storage adapter for persisting observations. Must be a MemoryStorage instance (from `MastraStorage.stores.memory`).

**onDebugEvent** (`(event: ObservationDebugEvent) => void`): Debug callback for observation events. Called whenever observation-related events occur. Useful for debugging and understanding the observation flow.

**obscureThreadIds** (`boolean`): When enabled, thread IDs are hashed before being included in observation context. This prevents the LLM from recognizing patterns in thread identifiers. Automatically enabled when using resource scope through the Memory class. (Default: `false`)

## Recall tool

When `retrieval` is set (any truthy value), a `recall` tool is registered so the agent can page through raw messages behind observation group ranges. By default (scope `'resource'`), the tool supports listing threads (`mode: "threads"`), browsing other threads (`threadId`), and cross-thread search. With `retrieval: { vector: true }`, semantic search is available (`mode: "search"`). Set `scope: 'thread'` to restrict the tool to the current thread only. The tool is automatically added to the agent's tool list — no manual registration is needed.

### Parameters

**mode** (`'messages' | 'threads' | 'search'`): What to retrieve. `"messages"` (default) pages through message history. `"threads"` lists all threads for the current user. `"search"` finds messages by semantic similarity across all threads (requires vector store and embedder). (Default: `'messages'`)

**query** (`string`): Search query for `mode: "search"`. Finds messages semantically similar to this text across all threads for the current user.

**cursor** (`string`): A message ID to anchor the recall query. Required for `mode: "messages"` when browsing the current thread. Extract the start or end ID from an observation group range (e.g. from `` _range: `startId:endId`_ ``, use either `startId` or `endId`). If a range string is passed directly, the tool returns a hint explaining how to extract the correct ID. Can be omitted when `threadId` is provided to start reading from the beginning of that thread.

**threadId** (`string`): Browse a different thread by its ID. Use `mode: "threads"` first to discover thread IDs. When provided without a `cursor`, reading starts from the beginning of the thread.

**page** (`number`): Pagination offset. For messages: positive values page forward from cursor, negative values page backward. For threads: page number (0-indexed). `0` is treated as `1` for messages. (Default: `1`)

**limit** (`number`): Maximum number of items to return per page. (Default: `20`)

**detail** (`'low' | 'high'`): Controls how much content is shown per message part. `'low'` shows truncated text and tool names with positional indices (`[p0]`, `[p1]`). `'high'` shows full content including tool arguments and results, clamped to one part per call with continuation hints. (Default: `'low'`)

**partIndex** (`number`): Fetch a single message part at full detail by its positional index. Use this when a low-detail recall shows an interesting part at `[p1]`; call again with `partIndex: 1` to see the full content without loading every part.

**before** (`string`): For `mode: "threads"` only. Filter to threads created before this date. Accepts ISO 8601 format (e.g. `"2026-03-15"`, `"2026-03-10T00:00:00Z"`).

**after** (`string`): For `mode: "threads"` only. Filter to threads created after this date. Accepts ISO 8601 format (e.g. `"2026-03-01"`, `"2026-03-10T00:00:00Z"`).
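The `cursor` parameter expects a single message ID, not the whole `startId:endId` range. A minimal sketch of pulling one end out of a range string and building recall arguments is shown below. `cursorFromRange`, `RecallArgs`, and the IDs `msg_001`/`msg_042` are hypothetical, and the sketch assumes message IDs do not themselves contain a colon.

```typescript
// Hypothetical helper; assumes message IDs contain no ':' so the first colon is the separator.
function cursorFromRange(range: string, end: 'start' | 'end' = 'start'): string {
  const separator = range.indexOf(':')
  if (separator <= 0 || separator === range.length - 1) {
    throw new Error(`Expected a range like "startId:endId", got "${range}"`)
  }
  return end === 'start' ? range.slice(0, separator) : range.slice(separator + 1)
}

// Subset of the documented recall parameters, declared locally for the example.
interface RecallArgs {
  mode?: 'messages' | 'threads' | 'search'
  cursor?: string
  page?: number
  limit?: number
  detail?: 'low' | 'high'
}

// Start reading forward from the first message of an observation group.
const recallArgs: RecallArgs = {
  mode: 'messages',
  cursor: cursorFromRange('msg_001:msg_042'), // 'msg_001'
  page: 1,
  limit: 20,
  detail: 'low',
}
```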

### Returns (messages mode)

**messages** (`string`): Formatted message content. Format depends on the `detail` level.

**count** (`number`): Number of messages in this page.

**cursor** (`string`): The cursor message ID used for this query.

**page** (`number`): The page number returned.

**limit** (`number`): The limit used for this query.

**hasNextPage** (`boolean`): Whether more messages exist after this page.

**hasPrevPage** (`boolean`): Whether more messages exist before this page.

**truncated** (`boolean`): Present and `true` when the output was capped by the token budget. The agent can paginate or use `partIndex` to access remaining content.

**tokenOffset** (`number`): Approximate number of tokens that were trimmed when `truncated` is true.
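The pagination fields above are enough to derive the arguments for the following page. The sketch below assumes forward paging (positive `page` values) from the same cursor. The tool is normally invoked by the agent rather than by application code, so `RecallMessagesResult` and `nextPageArgs` are hypothetical names used only to illustrate how the fields relate.

```typescript
// Hypothetical local type mirroring the documented messages-mode return fields.
interface RecallMessagesResult {
  messages: string
  count: number
  cursor: string
  page: number
  limit: number
  hasNextPage: boolean
  hasPrevPage: boolean
  truncated?: boolean
  tokenOffset?: number
}

/** Arguments for the next forward page, or null when there is nothing left to read. */
function nextPageArgs(
  result: RecallMessagesResult,
): { mode: 'messages'; cursor: string; page: number; limit: number } | null {
  if (!result.hasNextPage) return null
  // Keep the same anchor and limit; only the page offset advances.
  return { mode: 'messages', cursor: result.cursor, page: result.page + 1, limit: result.limit }
}
```

When `truncated` is `true`, re-requesting a specific part with `partIndex` may be a better next step than advancing the page.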

### Returns (threads mode)

**threads** (`string`): Formatted thread listing. Each thread shows its title, ID, and dates. The current thread is marked with `← current`.

**count** (`number`): Number of threads returned.

**page** (`number`): The page number returned.

**hasMore** (`boolean`): Whether more threads exist on the next page.

### Returns (search mode)

**results** (`string`): Formatted search results grouped by thread. Each result shows the thread title, thread ID, relevance score, message preview, and a cursor ID for browsing into that thread.

**count** (`number`): Number of matching messages found.

## ModelByInputTokens

`ModelByInputTokens` selects a model based on the input token count. It chooses the model for the smallest threshold that covers the actual input size.

#### Constructor

```typescript
new ModelByInputTokens(config)
```

Where `config` is an object with `upTo` keys that map token thresholds (numbers) to model targets.

#### Example

```typescript
import { ModelByInputTokens } from '@mastra/memory'

const selector = new ModelByInputTokens({
  upTo: {
    10_000: 'google/gemini-2.5-flash', // Fast for small inputs
    40_000: 'openai/gpt-5.4-mini', // Stronger for medium inputs
    1_000_000: 'openai/gpt-5.4', // Most capable for large inputs
  },
})
```

#### Behavior

- Thresholds are sorted internally, so the order in the config object doesn't matter.
- `inputTokens ≤ smallest threshold` → uses that threshold's model
- `inputTokens > largest threshold` → `resolve()` throws an error. If this happens during an OM Observer or Reflector run, OM aborts via TripWire, so callers receive an empty `text` result or streamed `tripwire` instead of a normal assistant response.
- OM computes the input token count for the Observer or Reflector call and resolves the matching model tier directly
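The selection rule described above (sort the thresholds, pick the smallest one that covers the input, throw past the largest) can be sketched as a standalone function. This is an illustrative reimplementation of the documented behavior, not the library's source; `pickModelByInputTokens` is a hypothetical name.

```typescript
// Illustrative reimplementation of the documented rule; not the library's source.
function pickModelByInputTokens(upTo: Record<number, string>, inputTokens: number): string {
  // Sort thresholds ascending so the order of keys in the config does not matter.
  const tiers = Object.entries(upTo)
    .map(([threshold, model]) => ({ threshold: Number(threshold), model }))
    .sort((a, b) => a.threshold - b.threshold)

  // Choose the smallest threshold that still covers the input size.
  for (const tier of tiers) {
    if (inputTokens <= tier.threshold) return tier.model
  }
  throw new Error(`Input of ${inputTokens} tokens exceeds the largest configured threshold`)
}

const tiers = {
  1_000_000: 'openai/gpt-5.4',
  10_000: 'google/gemini-2.5-flash',
  40_000: 'openai/gpt-5.4-mini',
}

pickModelByInputTokens(tiers, 10_000) // 'google/gemini-2.5-flash'
pickModelByInputTokens(tiers, 10_001) // 'openai/gpt-5.4-mini'
```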

#### Methods

**resolve** (`(inputTokens: number) => MastraModelConfig`): Returns the model for the given input token count. Throws if inputTokens exceeds the largest configured threshold. When this happens during an OM run, callers receive a TripWire/empty-text outcome instead of a normal assistant response.

**getThresholds** (`() => number[]`): Returns the configured thresholds in ascending order. Useful for introspection.

## Related

- [Observational Memory](https://mastra.ai/docs/memory/observational-memory)
- [Memory Overview](https://mastra.ai/docs/memory/overview)
- [Memory Class](https://mastra.ai/reference/memory/memory-class)
- [Memory Processors](https://mastra.ai/docs/memory/memory-processors)
- [Processors](https://mastra.ai/docs/agents/processors)