---
summary: "Azure AI Speech text-to-speech for OpenClaw replies"
read_when:
  - You want Azure Speech synthesis for outbound replies
  - You need native Ogg Opus voice-note output from Azure Speech
title: "Azure Speech"
---

Azure Speech is a text-to-speech provider backed by Azure AI Speech. In OpenClaw it synthesizes outbound reply audio as MP3 by default, as native Ogg/Opus for voice notes, and as 8 kHz mu-law audio for telephony channels such as Voice Call. OpenClaw calls the Azure Speech REST API directly, sends the reply text as SSML, and requests the configured output format through the `X-Microsoft-OutputFormat` header.

| Detail                  | Value                                                                                                          |
| ----------------------- | -------------------------------------------------------------------------------------------------------------- |
| Website                 | [Azure AI Speech](https://azure.microsoft.com/products/ai-services/ai-speech)                                  |
| Docs                    | [Speech REST text-to-speech](https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech) |
| Auth                    | `AZURE_SPEECH_KEY` plus `AZURE_SPEECH_REGION`                                                                  |
| Default voice           | `en-US-JennyNeural`                                                                                            |
| Default file output     | `audio-24khz-48kbitrate-mono-mp3`                                                                              |
| Default voice-note file | `ogg-24khz-16bit-mono-opus`                                                                                    |

## Getting started

In the Azure portal, create a Speech resource. From **Resource Management > Keys and Endpoint**, copy **KEY 1** and the resource location (for example, `eastus`), then set both in the environment:

```
AZURE_SPEECH_KEY=
AZURE_SPEECH_REGION=eastus
```

Select Azure Speech as the TTS provider:

```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "azure-speech",
      providers: {
        "azure-speech": {
          voice: "en-US-JennyNeural",
          lang: "en-US",
        },
      },
    },
  },
}
```

Send a reply through any connected channel. OpenClaw synthesizes the audio with Azure Speech and delivers MP3 for standard audio, or Ogg/Opus when the channel expects a voice note.
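The `voice` and `lang` settings end up in the SSML body of each request. The sketch below shows roughly what that body looks like. It follows the SSML shape in Azure's public documentation, and it is an illustration rather than OpenClaw's code: `buildSsml` and `escapeXml` are hypothetical names, and the exact markup OpenClaw emits may differ.

```typescript
// Hypothetical sketch (not OpenClaw's implementation): wrap reply text in
// the SSML envelope that the Azure Speech REST API accepts.

// Escape the five XML special characters so reply text cannot break the SSML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSsml(
  text: string,
  voice = "en-US-JennyNeural", // documented default voice ShortName
  lang = "en-US", // documented default SSML language
): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

console.log(buildSsml("Hello & welcome"));
```

Escaping matters because reply text is user-facing chat content. A stray `&` or `<` would otherwise produce invalid SSML, and Azure would reject the request.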
## Configuration options

| Option                  | Path                                                        | Description                                                                                           |
| ----------------------- | ----------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
| `apiKey`                | `messages.tts.providers.azure-speech.apiKey`                | Azure Speech resource key. Falls back to `AZURE_SPEECH_KEY`, `AZURE_SPEECH_API_KEY`, or `SPEECH_KEY`. |
| `region`                | `messages.tts.providers.azure-speech.region`                | Azure Speech resource region. Falls back to `AZURE_SPEECH_REGION` or `SPEECH_REGION`.                 |
| `endpoint`              | `messages.tts.providers.azure-speech.endpoint`              | Optional Azure Speech endpoint/base URL override.                                                     |
| `baseUrl`               | `messages.tts.providers.azure-speech.baseUrl`               | Optional Azure Speech base URL override.                                                              |
| `voice`                 | `messages.tts.providers.azure-speech.voice`                 | Azure voice ShortName (default `en-US-JennyNeural`).                                                  |
| `lang`                  | `messages.tts.providers.azure-speech.lang`                  | SSML language code (default `en-US`).                                                                 |
| `outputFormat`          | `messages.tts.providers.azure-speech.outputFormat`          | Audio-file output format (default `audio-24khz-48kbitrate-mono-mp3`).                                 |
| `voiceNoteOutputFormat` | `messages.tts.providers.azure-speech.voiceNoteOutputFormat` | Voice-note output format (default `ogg-24khz-16bit-mono-opus`).                                       |

## Notes

Azure Speech uses a Speech resource key, not an Azure OpenAI key. OpenClaw sends the key in the `Ocp-Apim-Subscription-Key` header and derives the endpoint `https://<region>.tts.speech.microsoft.com` from `region` unless you set `endpoint` or `baseUrl`.

Set `voice` to the Azure Speech voice `ShortName`, for example `en-US-JennyNeural`. The bundled provider can list voices through the same Speech resource and filters out voices marked as deprecated or retired.

Azure accepts output formats such as `audio-24khz-48kbitrate-mono-mp3`, `ogg-24khz-16bit-mono-opus`, and `riff-24khz-16bit-mono-pcm`. OpenClaw requests Ogg/Opus for `voice-note` targets so channels can send native voice bubbles without an extra MP3 conversion.
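The key and region fallbacks, the endpoint derivation, and the per-target output format can be summarized as a small resolver. This is a sketch under stated assumptions, not OpenClaw's code. `resolveRequest`, `AzureSpeechConfig`, and the `"audio-file" | "voice-note"` target names are hypothetical. Three behaviors are also assumed rather than confirmed by this page: `endpoint` taking precedence over `baseUrl`, an override being a bare host with no path, and the `/cognitiveservices/v1` synthesis path, which comes from Azure's public REST reference.

```typescript
// Hypothetical sketch: resolve the documented options and environment
// fallbacks into the URL and headers of an Azure Speech REST request.

interface AzureSpeechConfig {
  apiKey?: string;
  region?: string;
  endpoint?: string;
  baseUrl?: string;
  outputFormat?: string;
  voiceNoteOutputFormat?: string;
}

type Target = "audio-file" | "voice-note"; // hypothetical target names
type Env = Record<string, string | undefined>;

function resolveRequest(cfg: AzureSpeechConfig, env: Env, target: Target) {
  // Config first, then the documented env fallbacks in order.
  // `||` (not `??`) so an empty `AZURE_SPEECH_KEY=` still falls through.
  const key = cfg.apiKey || env.AZURE_SPEECH_KEY || env.AZURE_SPEECH_API_KEY || env.SPEECH_KEY;
  const region = cfg.region || env.AZURE_SPEECH_REGION || env.SPEECH_REGION;
  if (!key) throw new Error("Azure Speech key is not configured");

  // Assumption: endpoint wins over baseUrl; otherwise derive the host from region.
  const base =
    cfg.endpoint || cfg.baseUrl || (region ? `https://${region}.tts.speech.microsoft.com` : undefined);
  if (!base) throw new Error("Azure Speech region or endpoint is required");

  // Voice notes get native Ogg/Opus; everything else gets the file format.
  const format =
    target === "voice-note"
      ? cfg.voiceNoteOutputFormat || "ogg-24khz-16bit-mono-opus"
      : cfg.outputFormat || "audio-24khz-48kbitrate-mono-mp3";

  return {
    // Assumption: synthesis path taken from Azure's REST docs; trailing slashes trimmed.
    url: `${base.replace(/\/+$/, "")}/cognitiveservices/v1`,
    headers: {
      "Ocp-Apim-Subscription-Key": key,
      "Content-Type": "application/ssml+xml",
      "X-Microsoft-OutputFormat": format,
    },
  };
}

const req = resolveRequest({}, { AZURE_SPEECH_KEY: "k", AZURE_SPEECH_REGION: "eastus" }, "voice-note");
console.log(req.url); // https://eastus.tts.speech.microsoft.com/cognitiveservices/v1
```

In Node you would pass `process.env` as `env`. Telephony output (8 kHz mu-law) is left out because this page does not name the exact format OpenClaw requests for it.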
`azure` is accepted as a provider alias for existing PRs and user config, but new config should use `azure-speech` to avoid confusion with Azure OpenAI model providers.

## Related

- TTS overview, providers, and `messages.tts` config.
- Full config reference, including `messages.tts` settings.
- All bundled OpenClaw providers.
- Common issues and debugging steps.