This API transcribes speech to text in multiple Indian languages and English, and supports transcription for interactive applications.
Input audio codec: the codec/format of the input file. PCM files are supported only at a 16kHz sample rate.
Returns the BCP-47 code of the language spoken in the input. If multiple languages are detected, this is the code of the most predominant spoken language. If no language is detected, this is null.
Contains timestamps for the transcribed text. This field is included only if with_timestamps is set to true.
Float value (0.0 to 1.0) indicating the probability of the detected language being correct. Higher values indicate higher confidence.
When it returns a value:
language_code is not provided in the request, or
language_code is set to unknown.
When it returns null:
language_code is provided (language detection is skipped).
The parameter is always present in the response.
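The null/non-null rule above can be encoded as a small client-side check. This is a minimal sketch, not part of the API: the function names `expects_language_probability` and `check_probability` are hypothetical, and the code only takes the request's language_code and the returned probability value as plain arguments, so it makes no assumption about the JSON field name used for the probability.

```python
from typing import Optional


def expects_language_probability(request_language_code: Optional[str]) -> bool:
    """Return True if detection runs, so a probability value may be returned.

    Encodes the documented rule: detection runs when language_code is omitted
    or set to "unknown"; when a specific language_code is supplied, detection
    is skipped and the probability is null.
    """
    return request_language_code is None or request_language_code == "unknown"


def check_probability(request_language_code: Optional[str],
                      probability: Optional[float]) -> None:
    """Raise ValueError if a returned probability contradicts the documented rule."""
    if expects_language_probability(request_language_code):
        # Detection ran: any value present must lie in the documented 0.0 to 1.0 range.
        if probability is not None and not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range: {probability}")
    elif probability is not None:
        # A specific language was requested, so detection was skipped.
        raise ValueError("expected null probability when language_code is provided")
```

For example, `check_probability("hi-IN", None)` passes silently, while `check_probability("hi-IN", 0.5)` raises because detection should have been skipped.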
The audio file to transcribe. Supported formats include WAV, MP3, AAC, AIFF, OGG, OPUS, FLAC, MP4/M4A, AMR, WMA, WebM, and PCM. The API automatically detects most codecs, but for PCM files (pcm_s16le, pcm_l16, pcm_raw) you must specify the input_audio_codec parameter. PCM files are supported only at a 16kHz sample rate, and the API works best with audio sampled at 16kHz. Multi-channel audio is merged into a single channel.
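Because raw PCM has no header the API can inspect, a client may want to check the codec and sample rate before uploading. The sketch below is illustrative only: `check_pcm_upload` and `wav_info` are hypothetical helpers, and the caller is assumed to know whether the file is raw PCM. For WAV files it uses Python's standard `wave` module to read the sample rate and channel count, which shows whether the file is already at the recommended 16kHz.

```python
import io
import wave
from typing import Optional

# Codec names listed in this documentation for PCM input.
PCM_CODECS = {"pcm_s16le", "pcm_l16", "pcm_raw"}
PCM_RATE_HZ = 16_000


def check_pcm_upload(is_raw_pcm: bool, input_audio_codec: Optional[str],
                     sample_rate_hz: int) -> None:
    """Raise ValueError if a raw PCM upload would violate the documented limits."""
    if not is_raw_pcm:
        return  # Other formats are auto-detected by the API.
    if input_audio_codec not in PCM_CODECS:
        raise ValueError(f"raw PCM needs input_audio_codec set to one of {sorted(PCM_CODECS)}")
    if sample_rate_hz != PCM_RATE_HZ:
        raise ValueError(f"PCM is supported only at {PCM_RATE_HZ} Hz, got {sample_rate_hz}")


def wav_info(data: bytes) -> tuple[int, int]:
    """Return (sample_rate_hz, channels) read from a WAV file's header."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getframerate(), w.getnchannels()
```

A stereo WAV reported as `(16000, 2)` is already at the recommended rate; per the description above, the API merges its two channels itself, so no client-side downmix is required.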
Specifies the model to use for speech-to-text conversion.
saarika:v2.5 (default): Transcribes audio in the spoken language.
saaras:v3: State-of-the-art model with flexible output formats. Supports multiple modes via the mode parameter: transcribe, translate, verbatim, translit, codemix.
Mode of operation. Only applicable when using the saaras:v3 model.
Example audio: ‘मेरा फोन नंबर है 9840950950’
transcribe (default): Standard transcription in the original language with proper formatting and number normalization.
Output: मेरा फोन नंबर है 9840950950
translate: Translates speech from any supported Indic language to English.
Output: My phone number is 9840950950
verbatim: Exact word-for-word transcription without normalization, preserving filler words and spoken numbers as-is.
Output: मेरा फोन नंबर है नौ आठ चार zero नौ पांच zero नौ पांच zero
translit: Romanization; transliterates speech to Latin/Roman script only.
Output: mera phone number hai 9840950950
codemix: Code-mixed text with English words in English and Indic words in native script.
Output: मेरा phone number है 9840950950
Specifies the language of the input audio in BCP-47 format.
Note: This parameter is optional for the saarika:v2.5 model.
Available Options:
unknown: Use when the language is not known; the API will auto-detect.
hi-IN: Hindi
bn-IN: Bengali
kn-IN: Kannada
ml-IN: Malayalam
mr-IN: Marathi
od-IN: Odia
pa-IN: Punjabi
ta-IN: Tamil
te-IN: Telugu
en-IN: English
gu-IN: Gujarati
Additional Options (saaras:v3 only):
as-IN: Assamese
ur-IN: Urdu
ne-IN: Nepali
kok-IN: Konkani
ks-IN: Kashmiri
sd-IN: Sindhi
sa-IN: Sanskrit
sat-IN: Santali
mni-IN: Manipuri
brx-IN: Bodo
mai-IN: Maithili
doi-IN: Dogri
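The model, mode, and language options above interact: mode applies only to saaras:v3, and the additional languages are available only on saaras:v3. The sketch below checks those combinations before a request is sent. It is a hypothetical client-side helper (`build_params` is not part of the API). It returns a plain dict of parameters and does not show how they are transmitted. It also leaves language_code optional for both models; for saaras:v3 that is an assumption, since the note above mentions only saarika:v2.5.

```python
from typing import Optional

MODELS = {"saarika:v2.5", "saaras:v3"}
MODES = {"transcribe", "translate", "verbatim", "translit", "codemix"}
BASE_LANGUAGES = {"unknown", "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN", "od-IN",
                  "pa-IN", "ta-IN", "te-IN", "en-IN", "gu-IN"}
SAARAS_V3_EXTRA = {"as-IN", "ur-IN", "ne-IN", "kok-IN", "ks-IN", "sd-IN", "sa-IN",
                   "sat-IN", "mni-IN", "brx-IN", "mai-IN", "doi-IN"}


def build_params(model: str = "saarika:v2.5", mode: Optional[str] = None,
                 language_code: Optional[str] = None) -> dict:
    """Return request parameters after checking the documented model/mode/language rules."""
    if model not in MODELS:
        raise ValueError(f"unsupported model: {model}")
    params = {"model": model}
    if mode is not None:
        # mode is only applicable when using saaras:v3.
        if model != "saaras:v3":
            raise ValueError("mode is only applicable with saaras:v3")
        if mode not in MODES:
            raise ValueError(f"unsupported mode: {mode}")
        params["mode"] = mode
    if language_code is not None:
        # The additional languages are offered only on saaras:v3.
        allowed = BASE_LANGUAGES | (SAARAS_V3_EXTRA if model == "saaras:v3" else set())
        if language_code not in allowed:
            raise ValueError(f"{language_code} is not available for {model}")
        params["language_code"] = language_code
    return params
```

For instance, `build_params("saaras:v3", "codemix", "hi-IN")` succeeds, whereas `build_params("saarika:v2.5", language_code="ur-IN")` raises because Urdu is listed only for saaras:v3.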