Speech To Text Translate
A single model that automatically detects the spoken language and translates the speech directly into English text. Use this model if you are building voice-based LLM applications.
Input audio should ideally be shorter than 5 minutes.
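A pre-upload check against the 5-minute guideline can be sketched as follows. This is a minimal sketch using only Python's standard `wave` module, so it covers uncompressed .wav input only; the helper names `inspect_wav` and `audio_warnings` are hypothetical and not part of the API. It also flags the sample rate and channel count, since the file parameter described under Body works best at 16 kHz and merges multiple channels.

```python
import wave

MAX_SECONDS = 5 * 60        # guideline from this page: below 5 minutes
PREFERRED_RATE_HZ = 16_000  # guideline from this page: works best at 16 kHz


def inspect_wav(path):
    """Return (duration_seconds, sample_rate_hz, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return frames / rate, rate, channels


def audio_warnings(path):
    """Collect human-readable warnings; an empty list means no concerns found."""
    duration, rate, channels = inspect_wav(path)
    warnings = []
    if duration >= MAX_SECONDS:
        warnings.append(f"audio is {duration:.0f} s long; ideal input is below {MAX_SECONDS} s")
    if rate != PREFERRED_RATE_HZ:
        warnings.append(f"sample rate is {rate} Hz; the model works best at {PREFERRED_RATE_HZ} Hz")
    if channels > 1:
        warnings.append(f"audio has {channels} channels; they will be merged")
    return warnings
```

These are advisory checks only: the page describes ideal input, not hard limits, so the sketch reports warnings instead of rejecting the file.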
Headers
Your unique subscription key for authenticating requests to the Sarvam AI Speech-to-Text API.
Body
The audio file to transcribe. Supported formats are WAV (.wav) and MP3 (.mp3). The model works best with audio sampled at 16 kHz. Multi-channel audio will be merged.
The model used to convert speech to text in the target language.
saaras:v1, saaras:v2
Conversation context can be passed as a prompt to improve model accuracy. This feature is still experimental, and its prompt handling does not match that of large language models.
Include diarization in the output.
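The header and body parameters above could be assembled into a multipart request roughly as follows. This is a sketch under stated assumptions: this page does not give the endpoint URL, the header name, or the form field names, so `api-subscription-key`, `file`, `model`, `prompt`, and `with_diarization` are assumed names to check against the official reference, and the URL is left as a caller-supplied argument. Sending uses the third-party `requests` library.

```python
ALLOWED_MODELS = {"saaras:v1", "saaras:v2"}  # values listed on this page

# Assumed names: none of these identifiers appear on this page.
HEADER_NAME = "api-subscription-key"
FILE_FIELD = "file"


def build_headers(api_key):
    """Build the authentication header from the subscription key."""
    if not api_key:
        raise ValueError("a subscription key is required")
    return {HEADER_NAME: api_key}


def build_form_fields(model, prompt=None, with_diarization=False):
    """Build the non-file form fields, validating the model name."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    fields = {"model": model}
    if prompt:
        fields["prompt"] = prompt  # optional conversation context
    if with_diarization:
        fields["with_diarization"] = "true"
    return fields


def send_request(url, api_key, audio_path, model, **options):
    """POST the audio file and return the decoded JSON response."""
    import requests  # third-party; imported here so the builders work without it

    with open(audio_path, "rb") as audio:
        response = requests.post(
            url,
            headers=build_headers(api_key),
            data=build_form_fields(model, **options),
            files={FILE_FIELD: audio},
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

Keeping the builders separate from the network call lets the model name and optional fields be validated before any audio is uploaded.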
Response
The BCP-47 code of the language spoken in the input. If multiple languages are detected, this is the code of the most predominant spoken language. If no language is detected, this is null.
hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN
Transcript of the provided speech
Diarized transcript of the provided speech
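A response with these three fields could be read defensively as sketched below. The JSON key names `language_code`, `transcript`, and `diarized_transcript` are assumptions, since this page lists only the field descriptions; the structure of the diarized transcript is not documented here, so the sketch passes it through untouched. `TranslationResult` and `parse_response` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Language codes listed on this page.
SUPPORTED_LANGUAGE_CODES = {
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN", "od-IN",
    "pa-IN", "ta-IN", "te-IN", "gu-IN", "en-IN",
}


@dataclass
class TranslationResult:
    language_code: Optional[str]     # None when no language was detected
    transcript: str                  # English transcript of the speech
    diarized_transcript: Any = None  # opaque; structure not documented here


def parse_response(payload):
    """Convert a decoded JSON response (a dict) into a TranslationResult."""
    code = payload.get("language_code")  # assumed key name
    if code is not None and code not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"unexpected language code: {code!r}")
    return TranslationResult(
        language_code=code,
        transcript=payload.get("transcript", ""),
        diarized_transcript=payload.get("diarized_transcript"),
    )
```

Callers should treat a `language_code` of `None` as a normal outcome, because the page states that the value is null when no language is detected.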