Speech-to-Text REST API
Synchronous Processing
Process short audio files and receive the result in the same response. Best for quick transcriptions and testing; audio must be at most 30 seconds long.
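The 30-second limit can be checked on the client before a request is sent. The sketch below covers WAV input only, using Python's standard wave module. The helper names are hypothetical, and the server applies its own limit regardless of any client-side check.

```python
import io
import wave

# Maximum duration stated in the documentation for synchronous requests.
MAX_SYNC_SECONDS = 30.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of a WAV file given its raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(data: bytes, limit: float = MAX_SYNC_SECONDS) -> bool:
    """True if the WAV audio is short enough for a synchronous request."""
    return wav_duration_seconds(data) <= limit


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono, 16-bit, silent WAV in memory (for trying the check)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(fits_sync_limit(make_silent_wav(10)))  # True
    print(fits_sync_limit(make_silent_wav(45)))  # False
```

For compressed formats such as MP3, a decoder or a tool like ffprobe would be needed to read the duration.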
Saarika: Speech to Text Transcription Model
Saarika is a speech-to-text transcription model built for multi-speaker audio, mixed-language speech, and conference recordings. It handles code-mixed speech (for example, Hindi and English in the same sentence) automatically and offers enhanced multilingual support, which makes it suitable for a wide range of applications.
Automatic Language Detection: Set language_code to "unknown" to enable automatic language detection. The API will identify the spoken language and return the transcript along with the detected language code.
The input_audio_codec parameter is optional. The API detects the codec of encoded audio automatically, so you usually do not need to pass it. Raw PCM audio (pcm_s16le, pcm_l16, pcm_raw), however, has no header the API can inspect, so for PCM files you must pass input_audio_codec. PCM files are supported only at a 16 kHz sample rate.
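A minimal sketch of the request-field rules above. It sets language_code to "unknown" for automatic detection and adds input_audio_codec only for the PCM codecs, enforcing the 16 kHz requirement. The helper build_form_fields, the "model" field name, and the "saarika:v2" model identifier in the tests are assumptions for illustration; check the API Reference for the exact names.

```python
from typing import Dict, Optional

# Raw PCM codecs that, per the documentation, must be named explicitly.
PCM_CODECS = {"pcm_s16le", "pcm_l16", "pcm_raw"}
PCM_SAMPLE_RATE_HZ = 16000  # the only PCM sample rate the documentation lists


def build_form_fields(
    model: str,
    audio_codec: str,
    sample_rate_hz: Optional[int] = None,
    language_code: str = "unknown",
) -> Dict[str, str]:
    """Assemble the non-file form fields for a transcription request.

    audio_codec describes the file being uploaded. The "model" field name
    is an assumption; language_code="unknown" requests automatic
    language detection.
    """
    fields = {"model": model, "language_code": language_code}
    if audio_codec in PCM_CODECS:
        # Raw PCM has no header to inspect, so the codec must be stated.
        if sample_rate_hz != PCM_SAMPLE_RATE_HZ:
            raise ValueError("PCM audio is supported only at 16 kHz")
        fields["input_audio_codec"] = audio_codec
    # For any other codec, input_audio_codec is omitted so the API detects it.
    return fields
```

Keeping the PCM check on the client turns an unsupported sample rate into an immediate local error instead of a failed request.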
Code Examples for Speech to Text Transcription
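A standard-library Python sketch of a synchronous transcription call. It assumes the endpoint https://api.sarvam.ai/speech-to-text, an api-subscription-key auth header, a multipart form with "file", "model", and "language_code" fields, and the model id "saarika:v2"; all of these should be confirmed in the API Reference. make_transcription_request and transcribe are hypothetical helper names.

```python
import json
import urllib.request
import uuid
from typing import Dict, Tuple

# Assumed endpoint and auth header name; confirm both in the API Reference.
STT_URL = "https://api.sarvam.ai/speech-to-text"
AUTH_HEADER = "api-subscription-key"


def encode_multipart(fields: Dict[str, str], filename: str, audio: bytes,
                     mime_type: str) -> Tuple[bytes, str]:
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        part = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n")
        parts.append(part.encode("utf-8"))
    # The audio goes in a part named "file" (assumed field name).
    header = (f"--{boundary}\r\n"
              f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
              f"Content-Type: {mime_type}\r\n\r\n")
    parts.append(header.encode("utf-8") + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def make_transcription_request(api_key: str, audio: bytes, filename: str,
                               model: str, language_code: str = "unknown",
                               mime_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) the POST request."""
    body, content_type = encode_multipart(
        {"model": model, "language_code": language_code},
        filename, audio, mime_type)
    headers = {AUTH_HEADER: api_key, "Content-Type": content_type}
    return urllib.request.Request(STT_URL, data=body, headers=headers, method="POST")


def transcribe(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a valid key and network access, `transcribe(make_transcription_request(key, audio_bytes, "clip.wav", "saarika:v2"))` would return the parsed JSON response. Building the request separately from sending it keeps the encoding step testable offline.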
See the API Reference for Speech to Text Transcription for all available parameters and options.
Saaras: State-of-the-Art Speech to Text Translation Model
Saaras is a domain-aware model that translates speech into English text. It offers enhanced telephony support and intelligent entity preservation, and it is designed to handle complex language variation and domain-specific content, which makes it well suited to call center and telephony applications.
As with Saarika, input_audio_codec is optional because the API detects the codec of encoded audio automatically. For raw PCM files (pcm_s16le, pcm_l16, pcm_raw) you must pass it, and PCM files are supported only at a 16 kHz sample rate.
Code Examples for Speech to Text Translation
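A compact standard-library Python sketch that builds a speech-to-text translation request. It assumes the endpoint https://api.sarvam.ai/speech-to-text-translate, an api-subscription-key auth header, multipart "file" and "model" fields, and the model id "saaras:v2"; verify each against the API Reference. make_translation_request is a hypothetical helper.

```python
import urllib.request
import uuid

# Assumed endpoint, header name, and default model id; confirm in the API Reference.
STT_TRANSLATE_URL = "https://api.sarvam.ai/speech-to-text-translate"
AUTH_HEADER = "api-subscription-key"


def make_translation_request(api_key: str, audio: bytes, filename: str,
                             model: str = "saaras:v2",
                             mime_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a multipart POST for speech-to-text translation."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    )
    tail = f"\r\n--{boundary}--\r\n"
    body = head.encode("utf-8") + audio + tail.encode("utf-8")
    headers = {
        AUTH_HEADER: api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(STT_TRANSLATE_URL, data=body,
                                  headers=headers, method="POST")
```

Sending it with `urllib.request.urlopen(req)` and decoding the body as JSON would yield the translate response described under API Response Format.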
See the API Reference for Speech to Text Translation for all available parameters and options.
API Response Format
Speech to Text Response
Request ID: Unique identifier for the request.
Transcript: The text transcribed from the provided audio file.
Example: "नमस्ते, आप कैसे हैं?" ("Hello, how are you?")
Language code: The BCP-47 code of the language spoken in the input. If multiple languages are detected, this is the code of the predominant language; it is null if no language is detected.
Example: "hi-IN"
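A sketch of reading this response in Python. The JSON field names request_id, transcript, and language_code are assumptions, since the schema above gives descriptions only; a JSON null language code becomes None. The request ID "abc123" in the tests is made up.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionResult:
    request_id: Optional[str]
    transcript: str
    language_code: Optional[str]  # None when no language was detected


def parse_transcription(raw: str) -> TranscriptionResult:
    """Parse a speech-to-text response body (assumed field names)."""
    data = json.loads(raw)
    return TranscriptionResult(
        request_id=data.get("request_id"),
        transcript=data["transcript"],
        language_code=data.get("language_code"),  # JSON null -> None
    )
```

Treating language_code as optional means callers must handle the case where detection fails rather than assuming a code is always present.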
Speech to Text Translate Response
Request ID: Unique identifier for the request.
Transcript: The English translation of the speech in the provided audio file.
Language code: The BCP-47 code of the language spoken in the input. If multiple languages are detected, this is the code of the predominant language.
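Supported languages: hi-IN (Hindi), bn-IN (Bengali), kn-IN (Kannada), ml-IN (Malayalam), mr-IN (Marathi), od-IN (Odia), pa-IN (Punjabi), ta-IN (Tamil), te-IN (Telugu), gu-IN (Gujarati), en-IN (English)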
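A similar sketch for the translation response, which also flags whether the detected source language is one of the supported codes listed above. The field names transcript and language_code, and the helper parse_translation, are assumptions for illustration.

```python
import json
from typing import Optional

# Language codes listed in the documentation for speech-to-text translation.
SUPPORTED_LANGUAGES = frozenset({
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN", "od-IN",
    "pa-IN", "ta-IN", "te-IN", "gu-IN", "en-IN",
})


def parse_translation(raw: str) -> dict:
    """Return the English text and the detected source language.

    Assumes the response uses the field names "transcript" and
    "language_code".
    """
    data = json.loads(raw)
    source: Optional[str] = data.get("language_code")
    return {
        "english_text": data["transcript"],
        "source_language": source,
        # False for codes outside the list, and for a missing/None code.
        "source_supported": source in SUPPORTED_LANGUAGES,
    }
```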
Next Steps
Need help? Contact us on Discord for guidance.