Speech to Text API
This API transcribes speech to text in multiple Indian languages and English. It is intended for transcription in interactive applications.
Available Options:
- REST API (current endpoint): For short audio (under 30 seconds), with results returned immediately
- Batch API: For longer audio files; see the Batch API documentation
- The Batch API also supports diarization (speaker identification)
Note:
- Pricing differs for REST and Batch APIs
- Diarization is only available in Batch API with separate pricing
- Refer to the pricing page for detailed pricing information
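The choice between the two options can be sketched as a small helper. The function name, the return labels, and treating 30 seconds as a hard cutoff on audio duration are illustrative assumptions, not part of the API.

```python
def choose_api(duration_seconds: float, needs_diarization: bool = False) -> str:
    """Pick "rest" or "batch" from the guidance above (hypothetical helper)."""
    # Diarization is only available in the Batch API.
    if needs_diarization:
        return "batch"
    # Assumption: the REST API is meant for audio shorter than 30 seconds.
    if duration_seconds < 30:
        return "rest"
    # Anything longer goes to the Batch API.
    return "batch"
```

For example, `choose_api(12.5)` selects the REST API, while `choose_api(12.5, needs_diarization=True)` selects the Batch API.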
Headers
Request
The audio file to transcribe. Supported formats are WAV (.wav) and MP3 (.mp3).
The API works best with audio files sampled at 16kHz. If the audio contains multiple channels, they will be merged into a single channel.
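Before uploading a WAV file, it can be useful to check its sample rate and channel count locally. The sketch below uses only Python's standard `wave` module; the function name and the keys of the returned dictionary are illustrative choices, and the check applies to WAV input only (the `wave` module does not read MP3).

```python
import wave


def describe_wav(path: str) -> dict:
    """Report the properties of a WAV file that matter for transcription."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate_hz": sample_rate,
        "channels": channels,
        # Duration in seconds, guarded against a malformed zero sample rate.
        "duration_seconds": frames / sample_rate if sample_rate else 0.0,
        # The API is documented to work best at 16kHz.
        "is_16khz": sample_rate == 16000,
        # Multi-channel audio is merged into a single channel by the API.
        "will_be_merged_to_mono": channels > 1,
    }
```

A file that reports `is_16khz` as false is not necessarily rejected; the documentation only states that 16kHz works best.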
Specifies the model to use for speech-to-text conversion.
Note: The default model is saarika:v2.5.
Specifies the language of the input audio. For the saarika:v2.5 model, it is optional.
unknown: Use this when the language is not known; the API will detect it automatically.
The audio codec or format of the input file. PCM files are supported only at a 16kHz sample rate.
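The request parameters described above can be assembled into form fields as in the following sketch. The field names `model`, `language_code`, and `input_audio_codec` are assumptions, since this reference lists the parameters by description only; the helper name is hypothetical, and the audio file itself would be attached separately as a file upload.

```python
from typing import Optional

# Default model named in this reference.
DEFAULT_MODEL = "saarika:v2.5"


def build_form_fields(
    model: str = DEFAULT_MODEL,
    language_code: Optional[str] = None,
    input_audio_codec: Optional[str] = None,
) -> dict:
    """Build the non-file form fields for a transcription request (field names assumed)."""
    fields = {"model": model}
    # The language is optional for saarika:v2.5; pass "unknown" to request auto-detection.
    if language_code is not None:
        fields["language_code"] = language_code
    # Only include the codec hint when the caller provides one.
    if input_audio_codec is not None:
        fields["input_audio_codec"] = input_audio_codec
    return fields
```

Calling `build_form_fields(language_code="unknown")` produces fields that rely on the default model and ask the API to detect the language.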
Response
Contains timestamps for the transcribed text. This field is included only if with_timestamps is set to true.
Returns the BCP-47 code of the language spoken in the input. If multiple languages are detected, this is the code of the most predominant spoken language. If no language is detected, this is null.
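Client code should handle both optional parts of the response: the timestamps, which may be absent, and the language code, which may be null. The sketch below assumes the parsed JSON response is a dictionary with keys named `timestamps` and `language_code`; those key names and the helper name are assumptions, not confirmed by this reference.

```python
def summarize_response(response: dict) -> str:
    """Summarize the optional fields of a parsed response (key names assumed)."""
    # JSON null is parsed as None, which signals that no language was detected.
    language = response.get("language_code")
    language_label = language if language is not None else "undetected"
    # Timestamps are present only when with_timestamps was set to true.
    has_timestamps = response.get("timestamps") is not None
    timestamps_label = "yes" if has_timestamps else "no"
    return f"language={language_label}, timestamps={timestamps_label}"
```

Using `.get()` rather than direct indexing keeps the helper safe when the timestamps field is omitted entirely.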