WebSocket

WebSocket channel for real-time text-to-speech (TTS) synthesis.

**Note:** This API reference page is for informational purposes only. The Try It playground may not give the best experience for streaming audio. For optimal streaming performance, use the SDK or implement your own WebSocket client.

**Model-specific notes:**
- **bulbul:v2:** Supports pitch, loudness, and pace (0.3-3.0). Default sample rate: 22050 Hz.
- **bulbul:v3-beta:** Does NOT support pitch or loudness. Pace range: 0.5-2.0. Supports the temperature parameter. Default sample rate: 24000 Hz. Preprocessing is always enabled.
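The model-specific notes above can be encoded as a small client-side check that catches unsupported options before a configuration is sent. This is a minimal sketch: the option names `pace`, `pitch`, `loudness`, and `temperature` come from the notes, but the `MODEL_SPECS` table and the `validate_options` helper are hypothetical. Treating `temperature` as unsupported on bulbul:v2 is an assumption, based only on the notes listing it for bulbul:v3-beta.

```python
from typing import Any, Mapping

# Per-model capabilities transcribed from the model-specific notes.
# MODEL_SPECS and validate_options are hypothetical helpers, not part of the API.
MODEL_SPECS: dict[str, dict[str, Any]] = {
    "bulbul:v2": {
        "pace_range": (0.3, 3.0),
        "supports_pitch_loudness": True,
        "supports_temperature": False,  # assumption: notes mention temperature only for v3-beta
        "default_sample_rate": 22050,
    },
    "bulbul:v3-beta": {
        "pace_range": (0.5, 2.0),
        "supports_pitch_loudness": False,
        "supports_temperature": True,
        "default_sample_rate": 24000,
    },
}


def validate_options(model: str, options: Mapping[str, Any]) -> list[str]:
    """Return a list of problems with `options` for `model` (empty if none)."""
    if model not in MODEL_SPECS:
        return [f"unknown model: {model}"]
    spec = MODEL_SPECS[model]
    problems: list[str] = []

    # Pace must fall inside the model's documented range.
    if "pace" in options:
        lo, hi = spec["pace_range"]
        if not lo <= options["pace"] <= hi:
            problems.append(f"pace {options['pace']} outside {lo}-{hi} for {model}")

    # bulbul:v3-beta rejects pitch and loudness entirely.
    if not spec["supports_pitch_loudness"]:
        for key in ("pitch", "loudness"):
            if key in options:
                problems.append(f"{key} is not supported by {model}")

    if "temperature" in options and not spec["supports_temperature"]:
        problems.append(f"temperature is not supported by {model}")
    return problems
```

For example, `validate_options("bulbul:v3-beta", {"pace": 2.5, "pitch": 0.1})` reports two problems: the pace is above 2.0 and pitch is unsupported.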

Handshake

WSS
wss://api.sarvam.ai/text-to-speech/ws

Headers

Api-Subscription-Key (string, required)
API subscription key used for authentication.

Query parameters

model (enum, optional, defaults to bulbul:v2)
Text-to-speech model to use:
- **bulbul:v2** (default): standard TTS model with pitch and loudness support.
- **bulbul:v3-beta**: advanced model with temperature control (no pitch or loudness).

Allowed values: bulbul:v2, bulbul:v3-beta
send_completion_event (enum, optional, defaults to true)
Enables a completion event notification when TTS generation finishes. When set to true, an event message is sent after the final audio chunk has been generated.
Allowed values: true, false
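Opening the connection combines the endpoint, the `Api-Subscription-Key` header, and the two query parameters. The sketch below uses only the Python standard library to build the URL and headers. The `build_handshake` helper is hypothetical, and encoding `send_completion_event` as the lowercase strings `true`/`false` is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.sarvam.ai/text-to-speech/ws"


def build_handshake(api_key: str, model: str = "bulbul:v2",
                    send_completion_event: bool = True) -> tuple[str, dict[str, str]]:
    """Return the (url, headers) pair for opening the TTS WebSocket.

    Hypothetical helper; the "true"/"false" query encoding is an assumption.
    """
    params = {
        "model": model,
        "send_completion_event": "true" if send_completion_event else "false",
    }
    # urlencode percent-encodes reserved characters, so ":" becomes "%3A".
    url = f"{BASE_URL}?{urlencode(params)}"
    headers = {"Api-Subscription-Key": api_key}
    return url, headers
```

Pass the returned URL and headers to your WebSocket client's connect call. The option name for extra handshake headers varies between client libraries.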

Send

Configure Connection (object, required)
Send the initial configuration for text-to-speech streaming.

OR
Send Text (object, required)
Send a text chunk for speech synthesis.
OR
Flush Signal (object, required)
Send a signal to end text streaming.
OR
Ping Signal (object, required)
Send a ping signal to keep the TTS WebSocket connection alive.
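A session sends the configuration first, then any number of text chunks, then a flush. Pings can be sent in between to keep an idle connection open. This page does not show the JSON payloads, so the `type`/`data` envelope and field names below are hypothetical placeholders to be replaced with the real message schema.

```python
import json
from typing import Any, Iterable, Iterator

# HYPOTHETICAL message shapes: this page does not document the JSON schema,
# so the "type"/"data" envelope and the field names are illustrative only.


def config_message(**options: Any) -> str:
    return json.dumps({"type": "config", "data": options})


def text_message(text: str) -> str:
    return json.dumps({"type": "text", "data": {"text": text}})


def flush_message() -> str:
    return json.dumps({"type": "flush"})


def ping_message() -> str:
    return json.dumps({"type": "ping"})


def session_messages(chunks: Iterable[str], **options: Any) -> Iterator[str]:
    """Yield one session's outgoing frames: config first, then text, then flush."""
    yield config_message(**options)
    for chunk in chunks:
        if chunk:  # skip empty chunks rather than sending empty text frames
            yield text_message(chunk)
    yield flush_message()
```

Generating the frames separately from the transport keeps the ordering rule (config, text, flush) testable without a live connection.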

Receive

Audio Output (object, required)
Receive audio chunks from the TTS WebSocket.
OR
Event Notification (object, required)
Receive completion event notifications from the TTS WebSocket (only if send_completion_event is enabled).

OR
Error Response (object, required)
Receive error messages from the TTS WebSocket.
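On the receiving side, a client typically accumulates audio chunks until a completion event or an error arrives. The sketch assumes a hypothetical JSON envelope with a `type` of `audio`, `event`, or `error`, and base64-encoded audio under `data["audio"]`. Neither the field names nor the audio encoding are specified on this page. The `SessionResult` and `collect` names are also hypothetical.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class SessionResult:
    audio: bytearray = field(default_factory=bytearray)
    completed: bool = False
    error: Optional[str] = None


def collect(frames: Iterable[str]) -> SessionResult:
    """Accumulate audio until a completion event or an error arrives.

    HYPOTHETICAL schema: each frame is assumed to be JSON with a "type" of
    "audio", "event", or "error", with base64 audio under data["audio"].
    """
    result = SessionResult()
    for raw in frames:
        msg = json.loads(raw)
        kind = msg.get("type")
        data = msg.get("data", {})
        if kind == "audio":
            result.audio += base64.b64decode(data["audio"])
        elif kind == "event":
            # Sent after the final audio chunk when send_completion_event is enabled.
            result.completed = True
            break
        elif kind == "error":
            result.error = data.get("message", "unknown error")
            break
    return result
```

If send_completion_event is disabled, no event message arrives, so the loop ends only when the frame source is exhausted (for example, when the connection closes).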