# Voice

Mastra agents can be enhanced with voice capabilities, allowing them to speak responses and listen to user input. You can configure an agent to use either a single voice provider or combine multiple providers for different operations.

## Basic usage

The simplest way to add voice to an agent is to use a single provider for both speaking and listening:

```typescript
import { Agent } from '@mastra/core/agent'
import { OpenAIVoice } from '@mastra/voice-openai'
import { playAudio } from '@mastra/node-audio'

// Initialize the voice provider with default settings
const voice = new OpenAIVoice()

// Create an agent with voice capabilities
export const agent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: 'openai/gpt-5.4',
  voice,
})

// The agent can now use voice for interaction
const audioStream = await agent.voice.speak("Hello, I'm your AI assistant!", {
  filetype: 'm4a',
})

playAudio(audioStream!)

try {
  const transcription = await agent.voice.listen(audioStream)
  console.log(transcription)
} catch (error) {
  console.error('Error transcribing audio:', error)
}
```

## Working with audio streams

The `speak()` and `listen()` methods work with Node.js streams. Here's how to save and load audio files:

### Saving Speech Output

The `speak` method returns a stream that you can pipe to a file or speaker.

```typescript
import { createWriteStream } from 'fs'
import path from 'path'

// Generate speech and save to file
const audio = await agent.voice.speak('Hello, World!')
const filePath = path.join(process.cwd(), 'agent.mp3')
const writer = createWriteStream(filePath)

audio.pipe(writer)

await new Promise<void>((resolve, reject) => {
  writer.on('finish', () => resolve())
  writer.on('error', reject)
})
```
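Node's `stream/promises` helper can replace the manual Promise wrapper. This stdlib-only sketch (the helper name `writeStreamToFile` is ours, not part of Mastra) pipes any readable stream to a file and resolves once the write completes:

```typescript
import { createWriteStream } from 'fs'
import { pipeline } from 'stream/promises'

// Pipe a readable stream to a file; resolves on finish, rejects on error
export const writeStreamToFile = async (
  audio: NodeJS.ReadableStream,
  filePath: string,
): Promise<void> => {
  await pipeline(audio, createWriteStream(filePath))
}
```

Unlike the manual `pipe` + event-listener version, `pipeline` also destroys both streams if either side fails, so a write error won't leave the audio stream dangling.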

### Transcribing Audio Input

The `listen` method expects a stream of audio data from a microphone or file.

```typescript
import { createReadStream } from 'fs'
import path from 'path'

// Read audio file and transcribe
const audioFilePath = path.join(process.cwd(), 'agent.m4a')
const audioStream = createReadStream(audioFilePath)

try {
  console.log('Transcribing audio file...')
  const transcription = await agent.voice.listen(audioStream, {
    filetype: 'm4a',
  })
  console.log('Transcription:', transcription)
} catch (error) {
  console.error('Error transcribing audio:', error)
}
```
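If you need the full audio payload up front rather than a stream (for example, to inspect its size or hand it to an API that takes a `Buffer`), a small stdlib-only helper can collect the stream first. The helper name `streamToBuffer` is ours, not part of Mastra:

```typescript
// Collect every chunk of a readable stream into a single Buffer
export const streamToBuffer = (stream: NodeJS.ReadableStream): Promise<Buffer> => {
  const chunks: Buffer[] = []
  return new Promise((resolve, reject) => {
    stream.on('data', chunk => chunks.push(Buffer.from(chunk)))
    stream.on('error', reject)
    stream.on('end', () => resolve(Buffer.concat(chunks)))
  })
}
```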

## Speech-to-speech voice interactions

For more dynamic and interactive voice experiences, you can use real-time voice providers that support speech-to-speech capabilities:

```typescript
import { Agent } from '@mastra/core/agent'
import { getMicrophoneStream } from '@mastra/node-audio'
import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
import { search, calculate } from '../tools'

// Initialize the realtime voice provider
const voice = new OpenAIRealtimeVoice({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'gpt-5.1-realtime',
  speaker: 'alloy',
})

// Create an agent with speech-to-speech voice capabilities
export const agent = new Agent({
  id: 'speech-to-speech-agent',
  name: 'Speech-to-Speech Agent',
  instructions: `You are a helpful assistant with speech-to-speech capabilities.`,
  model: 'openai/gpt-5.4',
  tools: {
    // Tools configured on Agent are passed to voice provider
    search,
    calculate,
  },
  voice,
})

// Establish a WebSocket connection
await agent.voice.connect()

// Start a conversation
await agent.voice.speak("Hello, I'm your AI assistant!")

// Stream audio from a microphone
const microphoneStream = getMicrophoneStream()
await agent.voice.send(microphoneStream)

// When done with the conversation
agent.voice.close()
```

### Event System

The realtime voice provider emits several events you can listen for:

```typescript
// Listen for speech audio data sent from voice provider
agent.voice.on('speaking', ({ audio }) => {
  // audio contains ReadableStream or Int16Array audio data
})

// Listen for transcribed text sent from both voice provider and user
agent.voice.on('writing', ({ text, role }) => {
  console.log(`${role} said: ${text}`)
})

// Listen for errors
agent.voice.on('error', error => {
  console.error('Voice error:', error)
})
```
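A common use of the event system is assembling a running transcript from `writing` events. The sketch below substitutes a plain `EventEmitter` for `agent.voice` so the accumulation logic stays self-contained; with a real agent you would attach the same listener to `agent.voice` instead:

```typescript
import { EventEmitter } from 'events'

type WritingEvent = { text: string; role: string }

// Append each transcribed turn, in arrival order, to a shared transcript
export const collectTranscript = (source: EventEmitter): string[] => {
  const transcript: string[] = []
  source.on('writing', ({ text, role }: WritingEvent) => {
    transcript.push(`${role}: ${text}`)
  })
  return transcript
}

// Stand-in for agent.voice in this sketch
const voice = new EventEmitter()
const transcript = collectTranscript(voice)

voice.emit('writing', { text: 'Hello!', role: 'user' })
voice.emit('writing', { text: 'Hi, how can I help?', role: 'assistant' })
```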

## Examples

### End-to-end voice interaction

This example demonstrates a voice interaction between two agents. The hybrid voice agent, which uses multiple providers, speaks a question, which is saved as an audio file. The unified voice agent listens to that file, processes the question, generates a response, and speaks it back. Both audio outputs are saved to the `audio` directory.

The following files are created:

- **hybrid-question.mp3** – Hybrid agent's spoken question.
- **unified-response.mp3** – Unified agent's spoken response.

```typescript
import 'dotenv/config'

import path from 'path'
import fs, { createReadStream, createWriteStream } from 'fs'
import { Agent } from '@mastra/core/agent'
import { CompositeVoice } from '@mastra/core/voice'
import { OpenAIVoice } from '@mastra/voice-openai'
import { Mastra } from '@mastra/core'

// Saves an audio stream to a file in the audio directory, creating the directory if it doesn't exist.
export const saveAudioToFile = async (
  audio: NodeJS.ReadableStream,
  filename: string,
): Promise<void> => {
  const audioDir = path.join(process.cwd(), 'audio')
  const filePath = path.join(audioDir, filename)

  await fs.promises.mkdir(audioDir, { recursive: true })

  const writer = createWriteStream(filePath)
  audio.pipe(writer)
  return new Promise((resolve, reject) => {
    writer.on('finish', resolve)
    writer.on('error', reject)
  })
}

// Converts listen() output to a string, buffering the stream if one is returned.
export const convertToText = async (input: string | NodeJS.ReadableStream): Promise<string> => {
  if (typeof input === 'string') {
    return input
  }

  const chunks: Buffer[] = []
  return new Promise((resolve, reject) => {
    input.on('data', chunk => chunks.push(Buffer.from(chunk)))
    input.on('error', reject)
    input.on('end', () => resolve(Buffer.concat(chunks).toString('utf-8')))
  })
}

export const hybridVoiceAgent = new Agent({
  id: 'hybrid-voice-agent',
  name: 'Hybrid Voice Agent',
  model: 'openai/gpt-5.4',
  instructions: 'You can speak and listen using different providers.',
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new OpenAIVoice(),
  }),
})

export const unifiedVoiceAgent = new Agent({
  id: 'unified-voice-agent',
  name: 'Unified Voice Agent',
  instructions: 'You are an agent with both STT and TTS capabilities.',
  model: 'openai/gpt-5.4',
  voice: new OpenAIVoice(),
})

export const mastra = new Mastra({
  agents: { hybridVoiceAgent, unifiedVoiceAgent },
})

const question = 'What is the meaning of life in one sentence?'

const hybridSpoken = await hybridVoiceAgent.voice.speak(question)

await saveAudioToFile(hybridSpoken!, 'hybrid-question.mp3')

const audioStream = createReadStream(path.join(process.cwd(), 'audio', 'hybrid-question.mp3'))
const unifiedHeard = await unifiedVoiceAgent.voice.listen(audioStream)

const inputText = await convertToText(unifiedHeard!)

const unifiedResponse = await unifiedVoiceAgent.generate(inputText)
const unifiedSpoken = await unifiedVoiceAgent.voice.speak(unifiedResponse.text)

await saveAudioToFile(unifiedSpoken!, 'unified-response.mp3')
```

### Using Multiple Providers

For more flexibility, you can use different providers for speaking and listening using the CompositeVoice class:

```typescript
import { Agent } from '@mastra/core/agent'
import { CompositeVoice } from '@mastra/core/voice'
import { OpenAIVoice } from '@mastra/voice-openai'
import { PlayAIVoice } from '@mastra/voice-playai'

export const agent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: 'openai/gpt-5.4',

  // Create a composite voice using OpenAI for listening and PlayAI for speaking
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new PlayAIVoice(),
  }),
})
```

### Using AI SDK

Mastra supports using AI SDK's transcription and speech models directly in `CompositeVoice`, giving you access to a wide range of providers through the AI SDK ecosystem:

```typescript
import { Agent } from '@mastra/core/agent'
import { CompositeVoice } from '@mastra/core/voice'
import { openai } from '@ai-sdk/openai'
import { elevenlabs } from '@ai-sdk/elevenlabs'
import { groq } from '@ai-sdk/groq'

export const agent = new Agent({
  id: 'aisdk-voice-agent',
  name: 'AI SDK Voice Agent',
  instructions: `You are a helpful assistant with voice capabilities.`,
  model: 'openai/gpt-5.4',

  // Pass AI SDK models directly to CompositeVoice
  voice: new CompositeVoice({
    input: openai.transcription('whisper-1'), // AI SDK transcription model
    output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech model
  }),
})

// Use voice capabilities as usual
const audioStream = await agent.voice.speak('Hello!')
const transcribedText = await agent.voice.listen(audioStream)
```

#### Mix and Match Providers

You can mix AI SDK models with Mastra voice providers:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { PlayAIVoice } from '@mastra/voice-playai'
import { openai } from '@ai-sdk/openai'

// Use AI SDK for transcription and Mastra provider for speech
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK
  output: new PlayAIVoice(), // Mastra provider
})
```

For the complete list of supported AI SDK providers and their capabilities:

- [Transcription](https://ai-sdk.dev/docs/providers/openai/transcription)
- [Speech](https://ai-sdk.dev/docs/providers/elevenlabs/speech)

## Supported voice providers

Mastra supports multiple voice providers for text-to-speech (TTS) and speech-to-text (STT) capabilities:

| Provider        | Package                         | Features                  | Reference                                                          |
| --------------- | ------------------------------- | ------------------------- | ------------------------------------------------------------------ |
| OpenAI          | `@mastra/voice-openai`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/openai)          |
| OpenAI Realtime | `@mastra/voice-openai-realtime` | Realtime speech-to-speech | [Documentation](https://mastra.ai/reference/voice/openai-realtime) |
| ElevenLabs      | `@mastra/voice-elevenlabs`      | High-quality TTS          | [Documentation](https://mastra.ai/reference/voice/elevenlabs)      |
| PlayAI          | `@mastra/voice-playai`          | TTS                       | [Documentation](https://mastra.ai/reference/voice/playai)          |
| Google          | `@mastra/voice-google`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/google)          |
| Deepgram        | `@mastra/voice-deepgram`        | STT                       | [Documentation](https://mastra.ai/reference/voice/deepgram)        |
| Murf            | `@mastra/voice-murf`            | TTS                       | [Documentation](https://mastra.ai/reference/voice/murf)            |
| Speechify       | `@mastra/voice-speechify`       | TTS                       | [Documentation](https://mastra.ai/reference/voice/speechify)       |
| Sarvam          | `@mastra/voice-sarvam`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/sarvam)          |
| Azure           | `@mastra/voice-azure`           | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/mastra-voice)    |
| Cloudflare      | `@mastra/voice-cloudflare`      | TTS                       | [Documentation](https://mastra.ai/reference/voice/mastra-voice)    |

## Next steps

- [Voice API Reference](https://mastra.ai/reference/voice/mastra-voice): Detailed API documentation for voice capabilities
- [Text to Speech Examples](https://github.com/mastra-ai/voice-examples/tree/main/text-to-speech): Interactive story generator and other TTS implementations
- [Speech to Text Examples](https://github.com/mastra-ai/voice-examples/tree/main/speech-to-text): Voice memo app and other STT implementations
- [Speech to Speech Examples](https://github.com/mastra-ai/voice-examples/tree/main/speech-to-speech): Real-time voice conversation with call analysis