---
title: Voice Quickstart
description: Set up text-to-speech and speech-to-text in a Smithers workflow.
---

## Prerequisites

Smithers ships with voice support built in. You need:

- An OpenAI API key (or another AI SDK-supported provider)
- `smithers-orchestrator` version 0.12.8 or later
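
Both prerequisites can be checked at startup so a misconfigured environment fails early rather than mid-workflow. The following is a minimal sketch, not part of `smithers-orchestrator`: `requireEnv` and `meetsMinVersion` are hypothetical helper names, and the version check handles only plain dotted numeric versions.

```ts
// Hypothetical helpers (not exported by smithers-orchestrator).
type Env = Record<string, string | undefined>;

// Return the trimmed value of an environment variable, or throw if it is unset or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Compare dotted numeric versions such as "0.12.8".
// Assumption: anything after a "-" (a pre-release tag) is ignored.
function meetsMinVersion(installed: string, minimum: string): boolean {
  const parse = (v: string): number[] =>
    (v.split("-")[0] ?? "").split(".").map((part) => Number.parseInt(part, 10) || 0);
  const a = parse(installed);
  const b = parse(minimum);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true; // versions are equal
}
```

In a Node.js project this might be called as `requireEnv(process.env, "OPENAI_API_KEY")`, since the `@ai-sdk/openai` provider reads that variable by default.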

## Install

No extra packages are required. `smithers-orchestrator` already includes the `ai` and `@ai-sdk/openai` dependencies.

## Create a Voice Provider

The simplest provider wraps AI SDK models for batch text-to-speech (TTS) and speech-to-text (STT):

```ts
import { createAiSdkVoice } from "smithers-orchestrator/voice";
import { openai } from "@ai-sdk/openai";

const voice = createAiSdkVoice({
  speechModel: openai.speech("tts-1"),
  transcriptionModel: openai.transcription("whisper-1"),
});
```

## Add Voice to a Workflow

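Wrap tasks with the `<Voice>` component. The example assumes `voice` is the provider created above and `myAgent` is an agent defined elsewhere in your project: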

```tsx
import { Workflow, Task, Voice, createSmithers } from "smithers-orchestrator";
import { z } from "zod";

const { outputs, workflow } = createSmithers({
  transcript: z.object({ text: z.string() }),
  summary: z.object({ content: z.string() }),
});

export default (
  <Workflow>
    <Voice provider={voice} speaker="alloy">
      <Task id="transcribe" output={outputs.transcript} agent={myAgent}>
        Transcribe the audio input and return the text.
      </Task>
      <Task id="summarize" output={outputs.summary} agent={myAgent} dependsOn={["transcribe"]}>
        Summarize the transcript.
      </Task>
    </Voice>
  </Workflow>
);
```

## Use Composite Voice

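A composite voice routes input (speech-to-text) and output (text-to-speech) to separate providers. This example uses OpenAI for both sides, but each side can come from a different provider: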

```ts
import { createCompositeVoice, createAiSdkVoice } from "smithers-orchestrator/voice";
import { openai } from "@ai-sdk/openai";

const stt = createAiSdkVoice({
  transcriptionModel: openai.transcription("whisper-1"),
});

const tts = createAiSdkVoice({
  speechModel: openai.speech("tts-1"),
});

const voice = createCompositeVoice({
  input: stt,
  output: tts,
});
```
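
Conceptually, a composite forwards each operation to whichever provider owns that direction. The sketch below illustrates that delegation pattern only. It is not the implementation of `createCompositeVoice`, and the `SpeechInput` and `SpeechOutput` interfaces, their method signatures, and `composeVoice` are assumptions made for illustration.

```ts
// Hypothetical interfaces; the real provider types in smithers-orchestrator may differ.
interface SpeechInput {
  listen(audio: Uint8Array): Promise<string>;
}

interface SpeechOutput {
  speak(text: string): Promise<Uint8Array>;
}

// Build one object that delegates listening to `input` and speaking to `output`.
function composeVoice(parts: {
  input: SpeechInput;
  output: SpeechOutput;
}): SpeechInput & SpeechOutput {
  return {
    listen: (audio) => parts.input.listen(audio),
    speak: (text) => parts.output.speak(text),
  };
}
```

Because each side depends only on its own interface, either one can be swapped (for example, replaced with a test double) without touching the other.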

## Use Realtime Voice

For low-latency bidirectional audio, use the OpenAI Realtime provider:

```ts
import { createOpenAIRealtimeVoice } from "smithers-orchestrator/voice";

const realtime = createOpenAIRealtimeVoice({
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-4o-mini-realtime-preview-2024-12-17",
  speaker: "alloy",
});

// Connect before use
await realtime.connect();

// Listen for events
realtime.on("speaking", (data) => {
  // handle audio output
});

realtime.on("writing", (data) => {
  // handle text transcription
});

// Send audio (audioStream is an audio source you provide)
await realtime.send(audioStream);

// Disconnect when done
realtime.close();
```
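
If the code between `connect()` and `close()` throws, the session in the example above stays open. One way to guard against that is a small wrapper that always closes the session. This is a sketch under stated assumptions: `RealtimeSession` is a hypothetical minimal shape modeled on the calls shown above, and `withRealtime` is not a library function.

```ts
// Hypothetical minimal shape, modeled on the connect()/close() calls in the example above.
interface RealtimeSession {
  connect(): Promise<void>;
  close(): void;
}

// Connect, run `use`, and close the session whether `use` resolves or throws.
async function withRealtime<S extends RealtimeSession, T>(
  session: S,
  use: (session: S) => Promise<T>,
): Promise<T> {
  await session.connect();
  try {
    return await use(session);
  } finally {
    session.close();
  }
}
```

With the provider above, usage might look like `await withRealtime(realtime, (r) => r.send(audioStream))`, assuming the object returned by `createOpenAIRealtimeVoice` satisfies this shape.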

## Voice with Effect.ts

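Use the Effect service layer for typed voice operations. Here `voice` is a provider created as in the sections above, and `audioStream` is an audio source you supply: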

```ts
import { VoiceService, speak, listen } from "smithers-orchestrator/voice";
import { Effect } from "effect";

const program = Effect.gen(function* () {
  const text = yield* listen(audioStream);
  const audio = yield* speak(`The transcript says: ${text}`);
  return { text, audio };
}).pipe(Effect.provideService(VoiceService, voice));
```

## Supported Providers

Any provider that exposes speech or transcription models through the AI SDK works with `createAiSdkVoice`. For example:

| Provider | TTS | STT |
| --- | --- | --- |
| OpenAI | `openai.speech("tts-1")` | `openai.transcription("whisper-1")` |
| ElevenLabs | `elevenlabs.speech(...)` | `elevenlabs.transcription(...)` |
| Deepgram | -- | `deepgram.transcription("nova-3")` |
| Google | `google.speech(...)` | `google.transcription(...)` |

For realtime speech-to-speech, use `createOpenAIRealtimeVoice` directly.
