Speech To Text Translate WebSocket

WebSocket channel for real-time speech to text streaming with English translation.

Note: This API Reference page is provided for informational purposes only. The Try It playground may not provide the best experience for streaming audio. For optimal streaming performance, please use the SDK or implement your own WebSocket client.

Handshake

WSS
wss://api.sarvam.ai/speech-to-text-translate/ws

Headers

Api-Subscription-Key (string, required)
API subscription key for authentication

Query parameters

model (enum, optional, defaults to saaras:v2.5)

Speech to text model to use (defaults to “saaras:v2.5” if not specified)

Allowed values:
input_audio_codec (enum, optional)

Audio codec/format of the input file. Our API automatically detects all codec formats, but for PCM files specifically (pcm_s16le, pcm_l16, pcm_raw), you must pass this parameter. PCM files support sample rates of 16000 Hz and 8000 Hz.

sample_rate (enum, optional)
Audio sample rate for the WebSocket connection. When specified as a connection parameter, only 16kHz and 8kHz are supported. 8kHz is only available via this connection parameter. If not specified, defaults to 16kHz.
Allowed values:
high_vad_sensitivity (enum, optional)

Enable high VAD (Voice Activity Detection) sensitivity

Allowed values:
vad_signals (enum, optional)
Enable VAD signals in response
Allowed values:
flush_signal (enum, optional)
Signal to flush the audio buffer and finalize transcription and translation
Allowed values:
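
As a rough illustration of how the handshake inputs fit together, the sketch below builds the connection URL and the authentication header using only the Python standard library. The helper names `build_ws_url` and `build_headers` are hypothetical and not part of any SDK. Only parameters whose values appear on this page (`model`, `input_audio_codec`, `sample_rate`) are covered, and the sketch assumes the server accepts standard percent-encoding in the query string.

```python
from urllib.parse import urlencode

# Endpoint as listed on this page.
BASE_URL = "wss://api.sarvam.ai/speech-to-text-translate/ws"


def build_ws_url(model="saaras:v2.5", input_audio_codec=None, sample_rate=None):
    """Hypothetical helper: assemble the WebSocket URL with query parameters."""
    params = {"model": model}
    if input_audio_codec is not None:
        # Per this page, required for PCM input (pcm_s16le, pcm_l16, pcm_raw).
        params["input_audio_codec"] = input_audio_codec
    if sample_rate is not None:
        # Only 16 kHz and 8 kHz are supported as a connection parameter.
        if sample_rate not in (8000, 16000):
            raise ValueError("sample_rate must be 8000 or 16000")
        params["sample_rate"] = str(sample_rate)
    return f"{BASE_URL}?{urlencode(params)}"


def build_headers(api_key):
    """Hypothetical helper: header carrying the API subscription key."""
    return {"Api-Subscription-Key": api_key}


if __name__ == "__main__":
    print(build_ws_url(input_audio_codec="pcm_s16le", sample_rate=8000))
    print(build_headers("YOUR_API_KEY"))
```

The resulting URL and headers can be passed to whichever WebSocket client you use. How that client accepts custom headers varies by library, so that step is left out here.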

Send

Audio Translation Message (object, required)

Send audio data for real-time speech to text streaming with translation

OR
Translation Config Message (object, required)
Send configuration for speech to text streaming with translation
OR
Speech Translate Flush Signal (object, required)
Send signal to flush audio buffer and finalize transcription and translation
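
This page does not list the fields of the message objects, so the sketch below covers only the step that comes before sending: splitting raw PCM audio into fixed-duration chunks for streaming. It assumes mono 16-bit little-endian PCM (pcm_s16le, 2 bytes per sample). The function name `pcm_chunks` and the 100 ms default are illustrative choices, not values taken from the API. How each chunk is wrapped into an Audio Translation Message is deliberately not shown.

```python
def pcm_chunks(pcm, sample_rate=16000, chunk_ms=100):
    """Yield successive chunks of raw mono 16-bit PCM, each about chunk_ms long.

    Hypothetical helper; the last chunk may be shorter than the rest.
    """
    bytes_per_sample = 2  # assumption: pcm_s16le, mono
    chunk_size = sample_rate * bytes_per_sample * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


if __name__ == "__main__":
    one_second_of_silence = bytes(16000 * 2)  # 1 s at 16 kHz, 16-bit mono
    chunks = list(pcm_chunks(one_second_of_silence))
    print(len(chunks), len(chunks[0]))
```

Smaller chunks reduce latency at the cost of more messages. The right size depends on your application and is not specified on this page.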

Receive

Translation (object, required)

Receive real-time transcription and translation results from the WebSocket