Speech To Text WebSocket
WebSocket channel for real-time speech-to-text streaming.
Note: This API Reference page is provided for informational purposes only. The Try It playground may not provide the best experience for streaming audio. For optimal streaming performance, please use the SDK or implement your own WebSocket client.
Handshake
WSS
wss://api.sarvam.ai/speech-to-text/ws
Headers
Api-Subscription-Key
API subscription key for authentication
Query parameters
language-code
Language code for speech recognition
model
Speech-to-text model to use
Allowed values:
input_audio_codec
Codec/format of the input audio. The API detects the codec automatically, except for raw PCM audio (pcm_s16le, pcm_l16, pcm_raw), for which this parameter must be set. PCM input supports sample rates of 16000 Hz and 8000 Hz.
sample_rate
Sample rate of the audio sent over the WebSocket connection. As a connection parameter, only 16 kHz and 8 kHz are supported; 8 kHz is available only through this parameter. Defaults to 16 kHz if not specified.
Allowed values:
high_vad_sensitivity
Enable high VAD (Voice Activity Detection) sensitivity
Allowed values:
vad_signals
Enable VAD signals in response
Allowed values:
flush_signal
Signal to flush the audio buffer and finalize the transcription
Allowed values:
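The handshake parameters above can be assembled into a connection URL and header set with the Python standard library alone, as in this minimal sketch. The endpoint, header name, and parameter names come from this page; the helper name `build_handshake`, the `"true"`/`"false"` encoding of the boolean flags, and the example language code `hi-IN` are assumptions, so check them against the SDK before relying on them.

```python
from urllib.parse import urlencode

# Endpoint as listed on this page.
WS_URL = "wss://api.sarvam.ai/speech-to-text/ws"

# Sample rates accepted as a connection parameter (also the PCM-supported rates).
SUPPORTED_RATES = {8000, 16000}


def build_handshake(api_key, language_code, model=None, sample_rate=None,
                    input_audio_codec=None, high_vad_sensitivity=None,
                    vad_signals=None, flush_signal=None):
    """Return (url, headers) for the WebSocket handshake.

    Hypothetical helper: parameter names mirror this page, but the
    "true"/"false" encoding of boolean flags is an assumption.
    """
    if sample_rate is not None and sample_rate not in SUPPORTED_RATES:
        raise ValueError("sample_rate must be 8000 or 16000")

    params = {"language-code": language_code}
    optional = {
        "model": model,
        "sample_rate": sample_rate,
        "input_audio_codec": input_audio_codec,  # required for raw PCM input
        "high_vad_sensitivity": high_vad_sensitivity,
        "vad_signals": vad_signals,
        "flush_signal": flush_signal,
    }
    for name, value in optional.items():
        if value is None:
            continue  # omit unset parameters so server-side defaults apply
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean encoding
        params[name] = value

    url = f"{WS_URL}?{urlencode(params)}"
    headers = {"Api-Subscription-Key": api_key}
    return url, headers


url, headers = build_handshake("YOUR_KEY", "hi-IN", sample_rate=8000,
                               input_audio_codec="pcm_s16le", vad_signals=True)
```

The resulting URL and headers can be passed to any WebSocket client library that allows custom handshake headers. Unset parameters are left out of the query string rather than sent empty, so the server applies its own defaults (for example, 16 kHz for `sample_rate`).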
Send
Audio Transcription Message
Send audio data for real-time speech-to-text streaming
OR
Speech Flush Signal
Send a signal to flush the audio buffer and finalize the transcription
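Both message types travel as frames on the open connection, but this page does not spell out their fields. The Python sketch below therefore assumes a JSON layout in which audio is base64-encoded inside an `audio` object and the flush signal is `{"type": "flush"}`. These field names, the helpers `audio_message`, `flush_message`, and `chunk_audio`, and the 100 ms chunk size are illustrative assumptions, not the documented schema.

```python
import base64
import json


def audio_message(chunk: bytes, sample_rate: int = 16000,
                  encoding: str = "audio/wav") -> str:
    """Wrap one audio chunk as a JSON text frame.

    ASSUMED schema: {"audio": {"data": <base64>, "sample_rate": ..., "encoding": ...}}.
    Verify the field names against the SDK before use.
    """
    payload = {
        "audio": {
            # Binary audio is base64-encoded so it fits in a JSON text frame.
            "data": base64.b64encode(chunk).decode("ascii"),
            "sample_rate": sample_rate,
            "encoding": encoding,
        }
    }
    return json.dumps(payload)


def flush_message() -> str:
    """ASSUMED flush frame asking the server to finalize buffered audio."""
    return json.dumps({"type": "flush"})


def chunk_audio(audio: bytes, sample_rate: int = 16000,
                chunk_ms: int = 100, bytes_per_sample: int = 2):
    """Split 16-bit mono PCM into fixed-duration chunks (size is illustrative)."""
    step = sample_rate * chunk_ms // 1000 * bytes_per_sample
    for start in range(0, len(audio), step):
        yield audio[start:start + step]
```

In a sending loop, each chunk from `chunk_audio` would be sent as `audio_message(chunk, rate)`, followed by `flush_message()` once the speaker stops and a final transcript is wanted. Short, fixed-duration chunks keep latency low while avoiding a flood of tiny frames.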
Receive
Transcription
Receive real-time transcription results from the WebSocket
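Received frames can be dispatched by type. This Python sketch assumes JSON frames with a `type` field (`data` for transcripts, `events` for VAD signals when `vad_signals` is enabled, `error` for failures) and a `data` object holding `transcript` or `signal_type`. That schema, and the names `StreamEvent` and `parse_message`, are assumptions to verify against the SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamEvent:
    kind: str                     # "transcript", "vad", "error", or "unknown"
    text: Optional[str] = None    # transcript text, if any
    signal: Optional[str] = None  # VAD signal name, if any
    raw: Optional[dict] = None    # the full decoded frame


def parse_message(frame: str) -> StreamEvent:
    """Classify one received text frame.

    ASSUMED schema (verify against the SDK):
      {"type": "data",   "data": {"transcript": "..."}}
      {"type": "events", "data": {"signal_type": "START_SPEECH"}}
      {"type": "error",  "data": {...}}
    """
    msg = json.loads(frame)
    kind = msg.get("type")
    data = msg.get("data") or {}
    if kind == "data":
        return StreamEvent("transcript", text=data.get("transcript"), raw=msg)
    if kind == "events":
        return StreamEvent("vad", signal=data.get("signal_type"), raw=msg)
    if kind == "error":
        return StreamEvent("error", raw=msg)
    # Unrecognized frames are returned, not raised, so the client keeps running.
    return StreamEvent("unknown", raw=msg)
```

Returning an `unknown` event instead of raising lets a client log unexpected frames and continue streaming if the actual schema differs from this assumption.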