# TokenLimiterProcessor

The `TokenLimiterProcessor` limits the number of tokens in messages. It can be used as an input, per-step input, or output processor:

- **Input processor** (`processInput`): Filters historical messages to fit within the context window before the agentic loop starts, prioritizing recent messages
- **Per-step input processor** (`processInputStep`): Prunes messages at each step of a multi-step agent workflow, preventing unbounded token growth when tools trigger additional LLM calls
- **Output processor**: Limits generated response tokens in both streaming and non-streaming modes, with configurable strategies for handling exceeded limits

## Usage example

```typescript
import { TokenLimiterProcessor } from '@mastra/core/processors'

const processor = new TokenLimiterProcessor({
  limit: 1000,
  strategy: 'truncate',
  countMode: 'cumulative',
})
```

## Constructor parameters

**options** (`number | Options`): Either a number specifying the token limit, or a configuration options object

**options.limit** (`number`): Maximum number of tokens to allow in the response

**options.encoding** (`TiktokenBPE`): Optional encoding to use. Defaults to `o200k_base`, which is used by gpt-5.1

**options.strategy** (`'truncate' | 'abort'`): Strategy to apply when the token limit is reached: 'truncate' stops emitting chunks, 'abort' calls `abort()` to stop the stream

**options.countMode** (`'cumulative' | 'part'`): Whether to count tokens from the beginning of the stream or only in the current part: 'cumulative' counts all tokens from the start of the stream, 'part' counts only the tokens in the current part

**options.trimMode** (`'best-fit' | 'contiguous'`): Controls how messages are trimmed when exceeding the token limit: 'best-fit' keeps as many messages as possible (may create gaps), 'contiguous' stops at the first message that does not fit, ensuring a continuous suffix of conversation history
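
The difference between the two trim modes can be illustrated with a simplified sketch in plain TypeScript (a toy model with precomputed token counts, not the library's implementation). Both scan the history from newest to oldest; 'best-fit' skips messages that don't fit and keeps scanning, while 'contiguous' stops at the first message that doesn't fit:

```typescript
type Msg = { text: string; tokens: number }

// 'best-fit': keep any message that still fits, scanning newest to oldest
function trimBestFit(messages: Msg[], limit: number): Msg[] {
  const kept: Msg[] = []
  let used = 0
  for (let i = messages.length - 1; i >= 0; i--) {
    if (used + messages[i].tokens <= limit) {
      kept.unshift(messages[i])
      used += messages[i].tokens
    }
    // A message that doesn't fit is skipped, but scanning continues,
    // so an older, smaller message may still be kept (creating a gap)
  }
  return kept
}

// 'contiguous': stop at the first message that doesn't fit,
// guaranteeing an unbroken suffix of the conversation
function trimContiguous(messages: Msg[], limit: number): Msg[] {
  const kept: Msg[] = []
  let used = 0
  for (let i = messages.length - 1; i >= 0; i--) {
    if (used + messages[i].tokens > limit) break
    kept.unshift(messages[i])
    used += messages[i].tokens
  }
  return kept
}

const history: Msg[] = [
  { text: 'small old message', tokens: 10 },
  { text: 'huge middle message', tokens: 90 },
  { text: 'recent message', tokens: 20 },
]

trimBestFit(history, 40).map(m => m.text)    // ['small old message', 'recent message']
trimContiguous(history, 40).map(m => m.text) // ['recent message']
```

'contiguous' trades away some context budget to guarantee the model never sees a conversation with holes in it.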

## Returns

**id** (`string`): Processor identifier set to 'token-limiter'

**name** (`string`): Optional processor display name

**processInput** (`(args: { messages: MastraDBMessage[]; abort: (reason?: string) => never }) => Promise<MastraDBMessage[]>`): Filters input messages to fit within token limit before the agentic loop starts, prioritizing recent messages while preserving system messages

**processInputStep** (`(args: ProcessInputStepArgs) => Promise<void>`): Prunes messages at each step of the agentic loop (including tool call continuations) to keep the conversation within the token limit. Mutates the messageList directly by removing oldest messages first while preserving system messages.

**processOutputStream** (`(args: { part: ChunkType; streamParts: ChunkType[]; state: Record<string, any>; abort: (reason?: string) => never }) => Promise<ChunkType | null>`): Processes streaming output parts to limit token count during streaming

**processOutputResult** (`(args: { messages: MastraDBMessage[]; abort: (reason?: string) => never }) => Promise<MastraDBMessage[]>`): Processes final output results to limit token count in non-streaming scenarios

**getMaxTokens** (`() => number`): Get the maximum token limit

## Error behavior

When used as an input processor (both `processInput` and `processInputStep`), `TokenLimiterProcessor` throws a `TripWire` error in the following cases:

- **Empty messages**: If there are no messages to process, a TripWire is thrown because you can't send an LLM request with no messages.
- **System messages exceed limit**: If system messages alone exceed the token limit, a TripWire is thrown because you can't send an LLM request with only system messages and no user/assistant messages.

```typescript
import { TripWire } from '@mastra/core/agent'

try {
  // Assumes `agent` is configured with TokenLimiterProcessor as an input processor
  await agent.generate('Hello')
} catch (error) {
  if (error instanceof TripWire) {
    console.log('Token limit error:', error.message)
  }
}
```

## Extended usage example

### As an input processor (limit context window)

Use `inputProcessors` to limit historical messages sent to the model, which helps stay within context window limits:

```typescript
import { Agent } from '@mastra/core/agent'
import { Memory } from '@mastra/memory'
import { TokenLimiterProcessor } from '@mastra/core/processors'

export const agent = new Agent({
  name: 'context-limited-agent',
  instructions: 'You are a helpful assistant',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    /* ... */
  }),
  inputProcessors: [
    new TokenLimiterProcessor({ limit: 4000 }), // Limits historical messages to ~4000 tokens
  ],
})
```

### As a per-step input processor (limit multi-step token growth)

When an agent uses tools across multiple steps (e.g. `maxSteps > 1`), each step accumulates conversation history from all previous steps. Use `inputProcessors` to also limit tokens at each step of the agentic loop — the `TokenLimiterProcessor` automatically applies to both the initial input and every subsequent step:

```typescript
import { Agent } from '@mastra/core/agent'
import { TokenLimiterProcessor } from '@mastra/core/processors'

export const agent = new Agent({
  name: 'multi-step-agent',
  instructions: 'You are a helpful research assistant with access to tools',
  model: 'openai/gpt-5.4',
  inputProcessors: [
    new TokenLimiterProcessor({ limit: 8000 }), // Applied at every step
  ],
})

// Each tool call step will be limited to ~8000 input tokens
const result = await agent.generate('Research this topic using your tools', {
  maxSteps: 10,
})
```
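
Conceptually, the per-step pruning behaves like this simplified sketch (plain TypeScript with toy token counts, not the library's implementation): the oldest non-system messages are dropped until the history fits, and system messages are always preserved:

```typescript
type Role = 'system' | 'user' | 'assistant' | 'tool'
type Msg = { role: Role; tokens: number }

// Drop oldest non-system messages until the history fits the limit
function pruneForStep(messages: Msg[], limit: number): Msg[] {
  const pruned = [...messages]
  let total = pruned.reduce((sum, m) => sum + m.tokens, 0)
  let i = 0
  while (total > limit && i < pruned.length) {
    if (pruned[i].role === 'system') {
      i++ // system messages are preserved
      continue
    }
    total -= pruned[i].tokens
    pruned.splice(i, 1)
  }
  return pruned
}

pruneForStep(
  [
    { role: 'system', tokens: 50 },
    { role: 'user', tokens: 100 },
    { role: 'assistant', tokens: 100 },
    { role: 'user', tokens: 40 },
  ],
  200,
)
// The oldest user message is dropped; the system message survives
```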

### As an output processor (limit response length)

Use `outputProcessors` to limit the length of generated responses:

```typescript
import { Agent } from '@mastra/core/agent'
import { TokenLimiterProcessor } from '@mastra/core/processors'

export const agent = new Agent({
  name: 'response-limited-agent',
  instructions: 'You are a helpful assistant',
  model: 'openai/gpt-5.4',
  outputProcessors: [
    new TokenLimiterProcessor({
      limit: 1000,
      strategy: 'truncate',
      countMode: 'cumulative',
    }),
  ],
})
```
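
The interplay of `strategy` and `countMode` can be sketched in plain TypeScript (a simplified model with a crude word-based token count, not the library's streaming pipeline). With `countMode: 'cumulative'` the counter carries across chunks; once the limit is exceeded, 'truncate' silently stops emitting while 'abort' throws:

```typescript
type Strategy = 'truncate' | 'abort'

// Crude token estimate for the sketch: one token per whitespace-separated word
function countTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length
}

// Simulate processing a stream of text chunks under a cumulative token limit
function processStream(chunks: string[], limit: number, strategy: Strategy): string[] {
  const emitted: string[] = []
  let total = 0 // cumulative count across the whole stream
  for (const chunk of chunks) {
    total += countTokens(chunk)
    if (total > limit) {
      if (strategy === 'abort') throw new Error(`Token limit of ${limit} exceeded`)
      continue // 'truncate': stop emitting, but drain the stream quietly
    }
    emitted.push(chunk)
  }
  return emitted
}

processStream(['one two', 'three four', 'five six'], 4, 'truncate')
// → ['one two', 'three four']  (third chunk pushes the total to 6 and is dropped)
```

With `countMode: 'part'`, the comparison would instead be against `countTokens(chunk)` alone, so any individual chunk under the limit passes through regardless of how long the stream runs.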

## Related

- [Guardrails](https://mastra.ai/docs/agents/guardrails)