Speech to Text
Transcribes audio input to text using Sarvam speech-to-text models. Suited to audio files shorter than 30 seconds. For longer audio files, use the speech-to-text-translate endpoint.
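The 30-second guidance above can be applied on the client before uploading. The following is a minimal sketch that uses only the standard library `wave` module, so it handles WAV input only. Measuring MP3 duration would need a third-party decoder. The helper names (`wav_duration_seconds`, `choose_endpoint`, `make_silent_wav`) are hypothetical and not part of any Sarvam SDK.

```python
import io
import wave

# Limit stated in this reference: /speech-to-text suits clips under 30 seconds.
MAX_SECONDS = 30.0

def wav_duration_seconds(data: bytes) -> float:
    """Return the duration in seconds of a WAV file given as raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def choose_endpoint(data: bytes) -> str:
    """Hypothetical helper: pick the endpoint path based on clip length."""
    if wav_duration_seconds(data) < MAX_SECONDS:
        return "/speech-to-text"
    return "/speech-to-text-translate"

def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory, for trying the helpers."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()

print(choose_endpoint(make_silent_wav(2)))  # /speech-to-text
```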
POST /speech-to-text
Headers
api-subscription-key
string
Your API subscription key.
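The subscription key goes in the `api-subscription-key` request header. One possible pattern is sketched below: it reads the key from the environment instead of hard-coding it. The variable name `SARVAM_API_KEY` and the helper `auth_headers` are assumptions for illustration. The API only defines the header name.

```python
import os

# Hypothetical environment variable name; the API itself only defines the header.
API_KEY_ENV = "SARVAM_API_KEY"

def auth_headers() -> dict[str, str]:
    """Build the authentication header, failing early if no key is configured."""
    key = os.environ.get(API_KEY_ENV)
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV} to your subscription key")
    return {"api-subscription-key": key}
```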
Body
multipart/form-data
file
file
required
The audio file to transcribe. Supported formats are WAV (.wav) and MP3 (.mp3). Works best with audio sampled at 16 kHz. Multi-channel audio is merged into a single channel.
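The format and sample-rate notes for `file` can be turned into a pre-upload check. The sketch below is an illustration under stated assumptions, not an official validator. It accepts only the two listed extensions and inspects WAV headers with the standard `wave` module. It does not look inside MP3 files. `check_audio` is a hypothetical name, and it returns warnings rather than rejecting input. A rate other than 16 kHz is only "less than ideal" according to this reference, not an error.

```python
import io
import os
import wave

SUPPORTED_EXTENSIONS = {".wav", ".mp3"}
RECOMMENDED_RATE_HZ = 16_000

def check_audio(filename: str, data: bytes) -> list[str]:
    """Hypothetical helper: return human-readable warnings before upload."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return [f"unsupported format {ext!r}; use .wav or .mp3"]
    if ext != ".wav":
        return []  # MP3 is not inspected here; that needs a third-party decoder
    warnings = []
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getframerate() != RECOMMENDED_RATE_HZ:
            warnings.append(f"sample rate is {wav.getframerate()} Hz; 16 kHz works best")
        if wav.getnchannels() > 1:
            warnings.append(f"{wav.getnchannels()} channels will be merged by the service")
    return warnings
```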
language_code
enum<string>
required
Language code of the speech in the audio file.
Available options: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN
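Because `language_code` is an enum, you can reject bad values on the client before making a request. The sketch below mirrors the options listed above. Note that Odia appears as `od-IN` here rather than the BCP 47 tag `or-IN`. A client that derives codes from locale tags may therefore need to map it. `validate_language_code` is a hypothetical helper.

```python
# Options listed for the language_code field of /speech-to-text.
SUPPORTED_LANGUAGE_CODES = frozenset({
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN",
    "od-IN", "pa-IN", "ta-IN", "te-IN", "gu-IN",
})

def validate_language_code(code: str) -> str:
    """Return code unchanged if it is a listed option, else raise ValueError."""
    if code not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"unsupported language_code {code!r}")
    return code
```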
model
enum<string>
default: saarika:v1
Model to use for speech-to-text.
Available options: saarika:v1
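Putting the header and body fields together, a request can be assembled with the standard library alone. This is a minimal sketch that builds, but does not send, a `multipart/form-data` POST with the `file`, `language_code` and `model` fields. The base URL `https://api.sarvam.ai` is an assumption, since this reference gives only the path. The helper name and the per-part MIME types (`audio/wav`, `audio/mpeg`) are illustrative choices.

```python
import urllib.request
import uuid

# Assumed base URL; this reference only documents the path.
BASE_URL = "https://api.sarvam.ai"

def build_stt_request(api_key: str, filename: str, audio: bytes,
                      language_code: str, model: str = "saarika:v1") -> urllib.request.Request:
    """Hypothetical helper: build (not send) a multipart POST to /speech-to-text."""
    boundary = uuid.uuid4().hex
    content_type = "audio/wav" if filename.lower().endswith(".wav") else "audio/mpeg"
    parts = []
    # Plain text form fields.
    for name, value in (("language_code", language_code), ("model", model)):
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    # The audio file part.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode() + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        f"{BASE_URL}/speech-to-text",
        data=b"".join(parts),
        headers={
            "api-subscription-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen` and read the JSON body. A library such as `requests` would build the multipart body for you. The manual encoding here only avoids a third-party dependency.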
Response
200 - application/json
transcript
string
required
Transcript of the provided speech.
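A successful response is a JSON object whose required `transcript` field is a string. The sketch below extracts that field defensively. `parse_transcript` is a hypothetical helper, and it raises if the field is missing or has the wrong type rather than returning an empty value.

```python
import json
from typing import Union

def parse_transcript(body: Union[str, bytes]) -> str:
    """Hypothetical helper: extract `transcript` from a 200 response body."""
    payload = json.loads(body)
    transcript = payload.get("transcript") if isinstance(payload, dict) else None
    if not isinstance(transcript, str):
        raise ValueError("response has no string 'transcript' field")
    return transcript
```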