Speech to Text
Transcribes audio input to text in the same language, using Sarvam speech-to-text models. Suitable for use cases that need transcripts in the speaker's original language.
For best results, keep input audio under 5 minutes.
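A quick pre-upload check can flag clips that exceed the recommended length. This is an illustrative sketch, not part of the API: it assumes a WAV input (Python's standard `wave` module cannot read MP3) and treats the 5-minute figure as a soft limit. The helper names are hypothetical.

```python
import wave

# Soft limit from the 5-minute guidance; treated as advisory, not a hard API constraint.
MAX_IDEAL_SECONDS = 5 * 60


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_ideal_length(path: str) -> bool:
    """Return True if the clip is shorter than the recommended 5-minute ceiling."""
    return wav_duration_seconds(path) < MAX_IDEAL_SECONDS
```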
POST
Headers
Your subscription key
Body
multipart/form-data
The audio file to transcribe. Supported formats are WAV (.wav) and MP3 (.mp3). Works best at a 16 kHz sample rate. Multichannel audio is merged into a single channel.
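The format and sample-rate guidance can be checked locally before uploading. This is a minimal sketch with hypothetical helper names: it inspects only WAV headers (via the standard `wave` module) and, for MP3, checks nothing beyond the file extension. The warnings are advisory, since the service merges multiple channels on its own.

```python
import os
import wave

SUPPORTED_EXTENSIONS = {".wav", ".mp3"}
RECOMMENDED_RATE_HZ = 16_000


def check_audio(path: str) -> list[str]:
    """Return human-readable warnings about an audio file before upload."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return [f"unsupported format {ext!r}; use .wav or .mp3"]
    if ext == ".mp3":
        # The standard library cannot parse MP3 headers, so only the extension is checked.
        return []
    warnings = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != RECOMMENDED_RATE_HZ:
        warnings.append(f"sample rate is {rate} Hz; 16 kHz works best")
    if channels > 1:
        warnings.append(f"{channels} channels; the service merges them into one")
    return warnings
```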
Language code
Available options:
hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN
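Client code can reject unsupported codes before making a request. The set below mirrors the listed options; the helper name is hypothetical, and exact-case matching is an assumption about how the service compares codes. Note that Odia appears as `od-IN` rather than the ISO 639-1 code `or`, so use the value exactly as listed.

```python
# Language codes copied from the list of available options.
SUPPORTED_LANGUAGE_CODES = frozenset({
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN",
    "od-IN", "pa-IN", "ta-IN", "te-IN", "gu-IN",
})


def validate_language_code(code: str) -> str:
    """Return the code unchanged if it is listed, else raise ValueError."""
    if code not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(
            f"unsupported language code {code!r}; expected one of "
            + ", ".join(sorted(SUPPORTED_LANGUAGE_CODES))
        )
    return code
```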
Model to use for speech-to-text.
Available options:
saarika:v1
Set this to enable word-level timestamps.
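The sketch below assembles a complete multipart/form-data request using only Python's standard library. This page does not show the endpoint path, the header name, or the form field names, so `api-subscription-key`, `file`, `language_code`, `model`, and `with_timestamps` are hypothetical placeholders, as is sending the timestamp flag as the string `"true"`/`"false"`; check them against the full API reference. The URL is a parameter for the same reason.

```python
import os
import urllib.request
import uuid

# HYPOTHETICAL names: this page does not show the header or form field names.
API_KEY_HEADER = "api-subscription-key"
FIELD_FILE = "file"
FIELD_LANGUAGE = "language_code"
FIELD_MODEL = "model"
FIELD_TIMESTAMPS = "with_timestamps"

# Content types for the two supported formats.
AUDIO_CONTENT_TYPES = {".wav": "audio/wav", ".mp3": "audio/mpeg"}


def encode_multipart(fields: dict[str, str], file_field: str,
                     filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file part as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    ext = os.path.splitext(filename)[1].lower()
    file_type = AUDIO_CONTENT_TYPES.get(ext, "application/octet-stream")
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode())
    # The file part carries a filename and its own Content-Type.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode() + file_bytes + b"\r\n")
    # The closing delimiter ends the body.
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_stt_request(url: str, api_key: str, audio_path: str,
                      language_code: str, model: str = "saarika:v1",
                      with_timestamps: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for the speech-to-text endpoint."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    fields = {
        FIELD_LANGUAGE: language_code,
        FIELD_MODEL: model,
        # HYPOTHETICAL encoding: the boolean is sent as a lowercase string.
        FIELD_TIMESTAMPS: "true" if with_timestamps else "false",
    }
    body, content_type = encode_multipart(
        fields, FIELD_FILE, os.path.basename(audio_path), audio)
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={API_KEY_HEADER: api_key, "Content-Type": content_type},
    )
```

The function only builds the request; sending it would be `urllib.request.urlopen(req)`, and the response format is not described on this page. Hand-rolling the multipart body keeps the sketch dependency-free; a third-party HTTP client would do the same encoding for you.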