Spracherkennung für: .md vermutete Sprache: Unknown {[0] [0] [0]} [Methode: Schwerpunktbildung, einfache Gewichte, sechs Dimensionen]
---
summary: "Use ElevenLabs speech, Scribe STT, and realtime transcription with OpenClaw"
read_when:
- You want ElevenLabs text-to-speech in OpenClaw
- You want ElevenLabs Scribe speech-to-text for audio attachments
- You want ElevenLabs realtime transcription for Voice Call
title: "ElevenLabs"
---
OpenClaw uses ElevenLabs for text-to-speech, batch speech-to-text with Scribe
v2, and Voice Call streaming STT with Scribe v2 Realtime.
| Capability | OpenClaw surface | Default |
| ------------------------ | --------------------------------------------- | ------------------------ |
| Text-to-speech | `messages.tts` / `talk` | `eleven_multilingual_v2` |
| Batch speech-to-text | `tools.media.audio` | `scribe_v2` |
| Streaming speech-to-text | Voice Call `streaming.provider: "elevenlabs"` | `scribe_v2_realtime` |
## Authentication
Set `ELEVENLABS_API_KEY` in the environment. `XI_API_KEY` is also accepted for
compatibility with existing ElevenLabs tooling.
```bash
export ELEVENLABS_API_KEY="..."
```
## Text-to-speech
```json5
{
messages: {
tts: {
providers: {
elevenlabs: {
apiKey: "${ELEVENLABS_API_KEY}",
voiceId: "pMsXgVXv3BLzUgSXRplE",
modelId: "eleven_multilingual_v2",
},
},
},
},
}
```
## Speech-to-text
Use Scribe v2 for inbound audio attachments and short recorded voice segments:
```json5
{
tools: {
media: {
audio: {
enabled: true,
models: [{ provider: "elevenlabs", model: "scribe_v2" }],
},
},
},
}
```
OpenClaw sends multipart audio to ElevenLabs `/v1/speech-to-text` with
`model_id: "scribe_v2"`. Language hints map to `language_code` when present.
## Voice Call streaming STT
The bundled `elevenlabs` plugin registers Scribe v2 Realtime for Voice Call
streaming transcription.
| Setting | Config path | Default |
| --------------- | ------------------------------------------------------------------------- | ------------------------------------------------- |
| API key | `plugins.entries.voice-call.config.streaming.providers.elevenlabs.apiKey` | Falls back to `ELEVENLABS_API_KEY` / `XI_API_KEY` |
| Model | `...elevenlabs.modelId` | `scribe_v2_realtime` |
| Audio format | `...elevenlabs.audioFormat` | `ulaw_8000` |
| Sample rate | `...elevenlabs.sampleRate` | `8000` |
| Commit strategy | `...elevenlabs.commitStrategy` | `vad` |
| Language | `...elevenlabs.languageCode` | (unset) |
```json5
{
plugins: {
entries: {
"voice-call": {
config: {
streaming: {
enabled: true,
provider: "elevenlabs",
providers: {
elevenlabs: {
apiKey: "${ELEVENLABS_API_KEY}",
audioFormat: "ulaw_8000",
commitStrategy: "vad",
languageCode: "en",
},
},
},
},
},
},
},
}
```
<Note>
Voice Call receives Twilio media as 8 kHz G.711 u-law. The ElevenLabs realtime
provider defaults to `ulaw_8000`, so telephony frames can be forwarded without
transcoding.
</Note>
## Related
- [Text-to-speech](/tools/tts)
- [Model selection](/concepts/model-providers)
[Dauer der Verarbeitung: 0.14 Sekunden, vorverarbeitet 2026-04-27]