# Voice in Mastra

Mastra's Voice system provides a unified interface for voice interactions, enabling text-to-speech (TTS), speech-to-text (STT), and real-time speech-to-speech (STS) capabilities in your applications.

## Adding voice to agents

To learn how to integrate voice capabilities into your agents, check out the [Adding Voice to Agents](https://mastra.ai/docs/agents/adding-voice) documentation. This section covers how to use both single and multiple voice providers, as well as real-time interactions.

```typescript
import { Agent } from '@mastra/core/agent'
import { OpenAIVoice } from '@mastra/voice-openai'

// Create an agent with OpenAI as its voice provider
const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new OpenAIVoice(),
})
```

You can then use the following voice capabilities:

### Text to Speech (TTS)

Turn your agent's responses into natural-sounding speech using Mastra's TTS capabilities. Choose from multiple providers like OpenAI, ElevenLabs, and more.

For detailed configuration options and advanced features, check out our [Text-to-Speech guide](https://mastra.ai/docs/voice/text-to-speech).

**OpenAI**:

```typescript
import { Agent } from '@mastra/core/agent'
import { OpenAIVoice } from '@mastra/voice-openai'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new OpenAIVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert the text to speech, returning an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
  responseFormat: 'wav', // Optional: specify a response format
})

playAudio(audioStream)
```

Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.

**Azure**:

```typescript
import { Agent } from '@mastra/core/agent'
import { AzureVoice } from '@mastra/voice-azure'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new AzureVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert the text to speech, returning an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'en-US-JennyNeural', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.

**ElevenLabs**:

```typescript
import { Agent } from '@mastra/core/agent'
import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new ElevenLabsVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert the text to speech, returning an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.

**PlayAI**:

```typescript
import { Agent } from '@mastra/core/agent'
import { PlayAIVoice } from '@mastra/voice-playai'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new PlayAIVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert the text to speech, returning an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.

**Google**:

```typescript
import { Agent } from '@mastra/core/agent'
import { GoogleVoice } from '@mastra/voice-google'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new GoogleVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert the text to speech, returning an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'en-US-Studio-O', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.

**Cloudflare**:

```typescript
import { Agent } from '@mastra/core/agent'
import { CloudflareVoice } from '@mastra/voice-cloudflare'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new CloudflareVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert the text to speech, returning an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.

**Deepgram**:

```typescript
import { Agent } from '@mastra/core/agent'
import { DeepgramVoice } from '@mastra/voice-deepgram'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new DeepgramVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert the text to speech, returning an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'aura-english-us', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.

**Speechify**:

```typescript
import { Agent } from '@mastra/core/agent'
import { SpeechifyVoice } from '@mastra/voice-speechify'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new SpeechifyVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert the text to speech, returning an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'matthew', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.

**Sarvam**:

```typescript
import { Agent } from '@mastra/core/agent'
import { SarvamVoice } from '@mastra/voice-sarvam'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new SarvamVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert the text to speech, returning an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'shubh', // Optional: specify a bulbul:v3 speaker
})

playAudio(audioStream)
```

Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.

**Murf**:

```typescript
import { Agent } from '@mastra/core/agent'
import { MurfVoice } from '@mastra/voice-murf'
import { playAudio } from '@mastra/node-audio'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new MurfVoice(),
})

const { text } = await voiceAgent.generate('What color is the sky?')

// Convert the text to speech, returning an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: 'default', // Optional: specify a speaker
})

playAudio(audioStream)
```

Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.

### Speech to Text (STT)

Transcribe spoken content using various providers like OpenAI, ElevenLabs, and more. For detailed configuration options and advanced features, check out [Speech to Text](https://mastra.ai/docs/voice/speech-to-text).

You can download a sample audio file from [here](https://github.com/mastra-ai/realtime-voice-demo/raw/refs/heads/main/how_can_i_help_you.mp3).


**OpenAI**:

```typescript
import { Agent } from '@mastra/core/agent'
import { OpenAIVoice } from '@mastra/voice-openai'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new OpenAIVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.

**Azure**:

```typescript
import { Agent } from '@mastra/core/agent'
import { AzureVoice } from '@mastra/voice-azure'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new AzureVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.

**ElevenLabs**:

```typescript
import { Agent } from '@mastra/core/agent'
import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new ElevenLabsVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.

**Google**:

```typescript
import { Agent } from '@mastra/core/agent'
import { GoogleVoice } from '@mastra/voice-google'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new GoogleVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.

**Cloudflare**:

```typescript
import { Agent } from '@mastra/core/agent'
import { CloudflareVoice } from '@mastra/voice-cloudflare'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new CloudflareVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.

**Deepgram**:

```typescript
import { Agent } from '@mastra/core/agent'
import { DeepgramVoice } from '@mastra/voice-deepgram'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new DeepgramVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.

**Sarvam**:

```typescript
import { Agent } from '@mastra/core/agent'
import { SarvamVoice } from '@mastra/voice-sarvam'
import { createReadStream } from 'fs'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new SarvamVoice(),
})

// Read the sample audio file from disk
const audioStream = createReadStream('./how_can_i_help_you.mp3')

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream)
console.log(`User said: ${transcript}`)

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript)
```

Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.

### Speech to Speech (STS)

Create conversational experiences with speech-to-speech capabilities. The unified API enables real-time voice interactions between users and AI agents. For detailed configuration options and advanced features, check out [Speech to Speech](https://mastra.ai/docs/voice/speech-to-speech).

**OpenAI**:

```typescript
import { Agent } from '@mastra/core/agent'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new OpenAIRealtimeVoice(),
})

// Listen for agent audio responses
voiceAgent.voice.on('speaker', ({ audio }) => {
  playAudio(audio)
})

// Initiate the conversation
await voiceAgent.voice.speak('How can I help you today?')

// Send continuous audio from the microphone
const micStream = getMicrophoneStream()
await voiceAgent.voice.send(micStream)
```

Visit the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime) for more information on the OpenAI realtime voice provider.

**Google**:

```typescript
import { Agent } from '@mastra/core/agent'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
import { GeminiLiveVoice } from '@mastra/voice-google-gemini-live'

const voiceAgent = new Agent({
  id: 'voice-agent',
  name: 'Voice Agent',
  instructions: 'You are a voice assistant that can help users with their tasks.',
  model: 'openai/gpt-5.4',
  voice: new GeminiLiveVoice({
    // Live API mode
    apiKey: process.env.GOOGLE_API_KEY,
    model: 'gemini-2.0-flash-exp',
    speaker: 'Puck',
    debug: true,
    // Vertex AI alternative:
    // vertexAI: true,
    // project: 'your-gcp-project',
    // location: 'us-central1',
    // serviceAccountKeyFile: '/path/to/service-account.json',
  }),
})

// Connect before using speak/send
await voiceAgent.voice.connect()

// Listen for agent audio responses
voiceAgent.voice.on('speaker', ({ audio }) => {
  playAudio(audio)
})

// Listen for text responses and transcriptions
voiceAgent.voice.on('writing', ({ text, role }) => {
  console.log(`${role}: ${text}`)
})

// Initiate the conversation
await voiceAgent.voice.speak('How can I help you today?')

// Send continuous audio from the microphone
const micStream = getMicrophoneStream()
await voiceAgent.voice.send(micStream)
```

Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.

## Voice configuration

Each voice provider can be configured with different models and options. Below are example configurations for the supported providers:

**OpenAI**:

```typescript
// OpenAI Voice Configuration
const voice = new OpenAIVoice({
  speechModel: {
    name: 'tts-1', // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    language: 'en-US', // Language code
    voiceType: 'neural', // Type of voice model
  },
  listeningModel: {
    name: 'whisper-1', // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    language: 'en-US', // Language code
    format: 'wav', // Audio format
  },
  speaker: 'alloy', // Example speaker name
})
```

Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.

**Azure**:

```typescript
// Azure Voice Configuration
const voice = new AzureVoice({
  speechModel: {
    name: 'en-US-JennyNeural', // Example model name
    apiKey: process.env.AZURE_SPEECH_KEY,
    region: process.env.AZURE_SPEECH_REGION,
    language: 'en-US', // Language code
    style: 'cheerful', // Voice style
    pitch: '+0Hz', // Pitch adjustment
    rate: '1.0', // Speech rate
  },
  listeningModel: {
    name: 'en-US', // Example model name
    apiKey: process.env.AZURE_SPEECH_KEY,
    region: process.env.AZURE_SPEECH_REGION,
    format: 'simple', // Output format
  },
})
```

Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.

**ElevenLabs**:

```typescript
// ElevenLabs Voice Configuration
const voice = new ElevenLabsVoice({
  speechModel: {
    voiceId: 'your-voice-id', // Example voice ID
    model: 'eleven_multilingual_v2', // Example model name
    apiKey: process.env.ELEVENLABS_API_KEY,
    language: 'en', // Language code
    emotion: 'neutral', // Emotion setting
  },
  // ElevenLabs may not have a separate listening model
})
```

Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.

**PlayAI**:

```typescript
// PlayAI Voice Configuration
const voice = new PlayAIVoice({
  speechModel: {
    name: 'playai-voice', // Example model name
    speaker: 'emma', // Example speaker name
    apiKey: process.env.PLAYAI_API_KEY,
    language: 'en-US', // Language code
    speed: 1.0, // Speech speed
  },
  // PlayAI may not have a separate listening model
})
```

Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.

**Google**:

```typescript
// Google Voice Configuration
const voice = new GoogleVoice({
  speechModel: {
    name: 'en-US-Studio-O', // Example model name
    apiKey: process.env.GOOGLE_API_KEY,
    languageCode: 'en-US', // Language code
    gender: 'FEMALE', // Voice gender
    speakingRate: 1.0, // Speaking rate
  },
  listeningModel: {
    name: 'en-US', // Example model name
    sampleRateHertz: 16000, // Sample rate
  },
})
```

Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.

**Cloudflare**:

```typescript
// Cloudflare Voice Configuration
const voice = new CloudflareVoice({
  speechModel: {
    name: 'cloudflare-voice', // Example model name
    accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
    apiToken: process.env.CLOUDFLARE_API_TOKEN,
    language: 'en-US', // Language code
    format: 'mp3', // Audio format
  },
  // Cloudflare may not have a separate listening model
})
```

Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.

**Deepgram**:

```typescript
// Deepgram Voice Configuration
const voice = new DeepgramVoice({
  speechModel: {
    name: 'nova-2', // Example model name
    speaker: 'aura-english-us', // Example speaker name
    apiKey: process.env.DEEPGRAM_API_KEY,
    language: 'en-US', // Language code
    tone: 'formal', // Tone setting
  },
  listeningModel: {
    name: 'nova-2', // Example model name
    format: 'flac', // Audio format
  },
})
```

Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.

**Speechify**:

```typescript
// Speechify Voice Configuration
const voice = new SpeechifyVoice({
  speechModel: {
    name: 'speechify-voice', // Example model name
    speaker: 'matthew', // Example speaker name
    apiKey: process.env.SPEECHIFY_API_KEY,
    language: 'en-US', // Language code
    speed: 1.0, // Speech speed
  },
  // Speechify may not have a separate listening model
})
```

Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.

**Sarvam**:

```typescript
// Sarvam Voice Configuration
const voice = new SarvamVoice({
  speechModel: {
    model: 'bulbul:v3', // TTS model (bulbul:v2 or bulbul:v3)
    apiKey: process.env.SARVAM_API_KEY,
    language: 'en-IN', // BCP-47 language code
  },
  listeningModel: {
    model: 'saarika:v2.5', // STT model (saarika:v2.5 or saaras:v3)
    apiKey: process.env.SARVAM_API_KEY,
  },
  speaker: 'shubh', // Default bulbul:v3 speaker
})
```

Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.

**Murf**:

```typescript
// Murf Voice Configuration
const voice = new MurfVoice({
  speechModel: {
    name: 'murf-voice', // Example model name
    apiKey: process.env.MURF_API_KEY,
    language: 'en-US', // Language code
    emotion: 'happy', // Emotion setting
  },
  // Murf may not have a separate listening model
})
```

Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.

**OpenAI Realtime**:

```typescript
// OpenAI Realtime Voice Configuration
const voice = new OpenAIRealtimeVoice({
  speechModel: {
    name: 'gpt-4o-realtime-preview', // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    language: 'en-US', // Language code
  },
  listeningModel: {
    name: 'whisper-1', // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    format: 'ogg', // Audio format
  },
  speaker: 'alloy', // Example speaker name
})
```

For more information on the OpenAI Realtime voice provider, refer to the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime).

**Google Gemini Live**:

```typescript
// Google Gemini Live Voice Configuration
const voice = new GeminiLiveVoice({
  speechModel: {
    name: 'gemini-2.0-flash-exp', // Example model name
    apiKey: process.env.GOOGLE_API_KEY,
  },
  speaker: 'Puck', // Example speaker name
  // Google Gemini Live is a realtime bidirectional API without separate speech and listening models
})
```

Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.

**AI SDK**:

```typescript
// AI SDK Voice Configuration
import { CompositeVoice } from '@mastra/core/voice'
import { openai } from '@ai-sdk/openai'
import { elevenlabs } from '@ai-sdk/elevenlabs'

// Use AI SDK models directly - no separate Mastra voice packages needed
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK transcription
  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
})

// Works seamlessly with your agent
const voiceAgent = new Agent({
  id: 'aisdk-voice-agent',
  name: 'AI SDK Voice Agent',
  instructions: 'You are a helpful assistant with voice capabilities.',
  model: 'openai/gpt-5.4',
  voice,
})
```

### Using Multiple Voice Providers

This example demonstrates how to create and use two different voice providers in Mastra: OpenAI for speech-to-text (STT) and PlayAI for text-to-speech (TTS).

Start by creating instances of the voice providers with any necessary configuration.

```typescript
import { OpenAIVoice } from '@mastra/voice-openai'
import { PlayAIVoice } from '@mastra/voice-playai'
import { CompositeVoice } from '@mastra/core/voice'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'

// Initialize OpenAI voice for STT
const input = new OpenAIVoice({
  listeningModel: {
    name: 'whisper-1',
    apiKey: process.env.OPENAI_API_KEY,
  },
})

// Initialize PlayAI voice for TTS
const output = new PlayAIVoice({
  speechModel: {
    name: 'playai-voice',
    apiKey: process.env.PLAYAI_API_KEY,
  },
})

// Combine the providers using CompositeVoice
const voice = new CompositeVoice({
  input,
  output,
})

// Implement voice interactions using the combined voice provider
const audioStream = getMicrophoneStream() // Assume this function gets audio input
const transcript = await voice.listen(audioStream)

// Log the transcribed text
console.log('Transcribed text:', transcript)

// Convert text to speech
const responseAudio = await voice.speak(`You said: ${transcript}`, {
  speaker: 'default', // Optional: specify a speaker
  responseFormat: 'wav', // Optional: specify a response format
})

// Play the audio response
playAudio(responseAudio)
```

### Using AI SDK Model Providers

You can also use AI SDK models directly with `CompositeVoice`:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { openai } from '@ai-sdk/openai'
import { elevenlabs } from '@ai-sdk/elevenlabs'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'

// Use AI SDK models directly - no Mastra voice provider packages needed
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK transcription
  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
})

// Works the same way as Mastra providers
const audioStream = getMicrophoneStream()
const transcript = await voice.listen(audioStream)

console.log('Transcribed text:', transcript)

// Convert text to speech
const responseAudio = await voice.speak(`You said: ${transcript}`, {
  speaker: 'Rachel', // ElevenLabs voice
})

playAudio(responseAudio)
```

You can also mix AI SDK models with Mastra providers:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { PlayAIVoice } from '@mastra/voice-playai'
import { groq } from '@ai-sdk/groq'

const voice = new CompositeVoice({
  input: groq.transcription('whisper-large-v3'), // AI SDK for STT
  output: new PlayAIVoice(), // Mastra provider for TTS
})
```

For more information on the CompositeVoice, refer to the [CompositeVoice Reference](https://mastra.ai/reference/voice/composite-voice).

## More resources

- [CompositeVoice](https://mastra.ai/reference/voice/composite-voice)
- [MastraVoice](https://mastra.ai/reference/voice/mastra-voice)
- [OpenAI Voice](https://mastra.ai/reference/voice/openai)
- [OpenAI Realtime Voice](https://mastra.ai/reference/voice/openai-realtime)
- [Azure Voice](https://mastra.ai/reference/voice/azure)
- [Google Voice](https://mastra.ai/reference/voice/google)
- [Google Gemini Live Voice](https://mastra.ai/reference/voice/google-gemini-live)
- [Deepgram Voice](https://mastra.ai/reference/voice/deepgram)
- [PlayAI Voice](https://mastra.ai/reference/voice/playai)
- [Voice Examples](https://github.com/mastra-ai/voice-examples)