Endpoints
Speech to Text
Transcribes audio input to text in the same language using Sarvam speech-to-text models. Suitable for use cases that require transcripts in the native language.
Input audio should ideally be shorter than 5 minutes.
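The length guidance above, together with the 16 kHz recommendation given for the `file` parameter, can be checked locally before uploading. The following is a minimal sketch using only Python's standard `wave` module, so it covers .wav input only. The function name `inspect_wav` and the returned dictionary keys are illustrative and not part of the API.

```python
import wave

MAX_SECONDS = 5 * 60      # "below 5 mins" from the guidance above
PREFERRED_RATE = 16_000   # "Works best at 16kHz" from the file parameter


def inspect_wav(path):
    """Return basic properties of a WAV file and whether it fits the guidance."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    return {
        "sample_rate": rate,
        "channels": channels,  # multiple channels are merged by the service
        "seconds": seconds,
        "under_five_minutes": seconds < MAX_SECONDS,
        "preferred_rate": rate == PREFERRED_RATE,
    }
```

A file that fails either check may still be accepted, since the documentation describes these values as ideal rather than as hard limits.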
POST /speech-to-text
Headers
api-subscription-key
string
default: Your subscription key
Body
multipart/form-data
file
file
required. The audio file to transcribe. Supported formats are WAV (.wav) and MP3 (.mp3). Works best at 16 kHz. Multiple channels will be merged.
language_code
enum<string>
required. Language code.
Available options: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN
model
enum<string>
Model to be used for speech to text.
Available options: saarika:v1
with_timestamps
boolean
default: false. Set to true to enable word-level timestamps.
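The header and body fields above could be assembled into a request roughly as follows. This is a sketch under stated assumptions, not an official client: the host in `BASE_URL` is a placeholder (this section gives only the path), the helper name `build_request` is hypothetical, and sending the boolean as the string "true" or "false" is an assumed encoding. The function builds the request without sending it.

```python
import uuid
import urllib.request

# Assumption: placeholder host; this section documents only the path.
BASE_URL = "https://api.example.com"

# Language codes listed under language_code above.
LANGUAGE_CODES = {
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN",
    "od-IN", "pa-IN", "ta-IN", "te-IN", "gu-IN",
}


def build_request(audio_bytes, filename, language_code, api_key,
                  model="saarika:v1", with_timestamps=False):
    """Build (but do not send) a multipart/form-data POST to /speech-to-text."""
    if language_code not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language_code: {language_code}")

    boundary = uuid.uuid4().hex
    text_fields = {
        "language_code": language_code,
        "model": model,
        # Assumption: booleans are sent as the strings "true" / "false".
        "with_timestamps": "true" if with_timestamps else "false",
    }

    parts = []
    for name, value in text_fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )

    # Pick a content type from the file extension (.mp3 or .wav).
    is_mp3 = filename.lower().endswith(".mp3")
    content_type = "audio/mpeg" if is_mp3 else "audio/wav"
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        BASE_URL + "/speech-to-text",
        data=b"".join(parts),
        method="POST",
        headers={
            "api-subscription-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once `BASE_URL` is replaced with the real host, the returned object can be passed to `urllib.request.urlopen` to perform the call.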
Response
200 - application/json
transcript
string
required. Transcript of the provided speech.
timestamps
object | null
Timestamps of words in the transcript.
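A 200 response could be read along these lines. This is a minimal sketch: the helper name `parse_response` is hypothetical, and because the internal layout of the `timestamps` object is not specified here, the code passes it through unchanged rather than assuming any field names.

```python
import json


def parse_response(raw):
    """Extract the transcript and optional timestamps from a JSON response body."""
    payload = json.loads(raw)
    transcript = payload["transcript"]       # required string
    timestamps = payload.get("timestamps")   # object, or None when null/absent
    return transcript, timestamps
```

Callers should handle `timestamps` being `None`, since the schema declares it as `object | null`.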