Speech-to-Text REST API
Synchronous Processing
Process short audio files and get the result back in the same response. Best for quick transcriptions and testing; audio can be at most 30 seconds long.
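Because the synchronous endpoint is limited to 30 seconds of audio, it can help to check clip length on the client before uploading. The sketch below is a minimal example assuming uncompressed WAV input (Python's standard wave module cannot read MP3 or other compressed formats). The helper names are hypothetical, and make_silent_wav exists only to produce test audio.

```python
import io
import wave

# Maximum clip length stated for synchronous processing.
MAX_SYNC_SECONDS = 30.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an uncompressed WAV file held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(data: bytes) -> bool:
    """True if the clip is short enough for the synchronous endpoint."""
    return wav_duration_seconds(data) <= MAX_SYNC_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono, 16-bit silent WAV clip (demonstration data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

For example, fits_sync_limit(make_silent_wav(10)) is True and fits_sync_limit(make_silent_wav(45)) is False; longer clips need a different processing path.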
Saaras v3: State-of-the-Art Speech Recognition (Recommended)
Saaras v3 is our latest state-of-the-art speech recognition model with flexible output formats. It supports multiple modes for different use cases: transcribe, translate, verbatim, transliterate, and codemix.
Saaras v3 is recommended for new integrations: it offers improved accuracy and flexible output modes. Learn more about Saaras v3.
Output Modes
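A client can reject an unknown mode before making a request. This sketch only checks names against the five modes listed above; it does not describe what each mode returns. Lowercasing and trimming the input is a client-side convenience assumed here, not documented API behavior.

```python
# The five Saaras v3 modes listed in this document.
SAARAS_V3_MODES = ("transcribe", "translate", "verbatim", "transliterate", "codemix")


def validate_mode(mode: str) -> str:
    """Normalize a mode name and raise ValueError if it is not a known mode."""
    normalized = mode.strip().lower()
    if normalized not in SAARAS_V3_MODES:
        raise ValueError(
            f"unsupported mode {mode!r}; expected one of: {', '.join(SAARAS_V3_MODES)}"
        )
    return normalized
```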
Code Examples for Saaras v3
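The original code samples are not reproduced here. As a rough stand-in, the sketch below builds (but does not send) a multipart/form-data POST using only the standard library. Several details are assumptions to verify against the API Reference: the endpoint URL, the api-subscription-key header name, the model identifier "saaras:v3", and the form field names file and model. Only the mode parameter and its values come from this page.

```python
import urllib.request
import uuid

# ASSUMPTIONS: endpoint, header name, model ID, and form field names are
# illustrative and must be checked against the API Reference.
API_URL = "https://api.sarvam.ai/speech-to-text"
API_KEY_HEADER = "api-subscription-key"


def build_stt_request(audio: bytes, filename: str, api_key: str,
                      model: str = "saaras:v3",
                      mode: str = "transcribe") -> urllib.request.Request:
    """Build a multipart/form-data POST request without sending it."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in (("model", model), ("mode", mode)):
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file part.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n".encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            API_KEY_HEADER: api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually send the request, pass it to urllib.request.urlopen and decode the JSON body. An HTTP client library with built-in multipart support would shorten this considerably; the stdlib version is shown only to keep the sketch dependency-free.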
See the API Reference for all available request parameters.
Legacy Models (Deprecated Soon)
The following models will be deprecated soon. Use Saaras v3 for new integrations and plan to migrate existing ones.
Saarika v2.5: Speech to Text Transcription
Saarika is a speech-to-text transcription model that excels at handling multi-speaker content, mixed-language content, and conference recordings.
Deprecation Notice: Saarika v2.5 will be deprecated soon. Use Saaras v3 with mode="transcribe" instead.
Saaras v2.5: Speech to Text Translation
Saaras v2.5 is available in the Speech-to-Text Translate endpoint for translating speech directly to English.
Deprecation Notice: Saaras v2.5 will be deprecated soon. Use Saaras v3 with mode="translate" instead.
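The two deprecation notices amount to a simple mapping: Saarika v2.5 becomes Saaras v3 with mode="transcribe", and Saaras v2.5 becomes Saaras v3 with mode="translate". The sketch below applies that mapping to a dictionary of request parameters. The model identifier strings are hypothetical, and the sketch does not handle endpoint changes (Saaras v2.5 used the separate Translate endpoint), so check which endpoint Saaras v3 translation requests should use.

```python
# HYPOTHETICAL model identifiers; confirm exact strings in the API Reference.
LEGACY_TO_V3 = {
    "saarika:v2.5": ("saaras:v3", "transcribe"),
    "saaras:v2.5": ("saaras:v3", "translate"),
}


def migrate_params(params: dict) -> dict:
    """Return a copy of the params with a legacy model replaced by Saaras v3."""
    migrated = dict(params)
    target = LEGACY_TO_V3.get(params.get("model", ""))
    if target is not None:
        migrated["model"], migrated["mode"] = target
    return migrated
```

Parameters that already target Saaras v3 pass through unchanged, so the function is safe to apply to every request during a gradual migration.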
API Response Format
Speech to Text Transcription Response
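The response schema itself is not reproduced on this page. Assuming the body is JSON with a transcript field plus optional request_id and language_code fields (field names to verify in the API Reference), a typed parser might look like this sketch.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionResult:
    request_id: Optional[str]
    transcript: str
    language_code: Optional[str]


def parse_transcription(body: str) -> TranscriptionResult:
    """Parse a transcription response body (ASSUMED field names)."""
    data = json.loads(body)
    return TranscriptionResult(
        request_id=data.get("request_id"),
        transcript=data["transcript"],  # treated as required; KeyError if absent
        language_code=data.get("language_code"),
    )
```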
Speech to Text Translation Response
Supported source languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN
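The supported source languages above can be checked locally before a translation request is sent. The parser below assumes the translation response carries the English text in a transcript field and the source language in a language_code field; both names are assumptions to confirm against the API Reference.

```python
import json
from typing import Optional, Tuple

# Source languages listed for speech-to-text translation.
SUPPORTED_SOURCE_LANGUAGES = frozenset({
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN", "od-IN",
    "pa-IN", "ta-IN", "te-IN", "gu-IN", "en-IN",
})


def check_source_language(code: str) -> str:
    """Raise ValueError if the code is not a listed source language."""
    if code not in SUPPORTED_SOURCE_LANGUAGES:
        raise ValueError(f"unsupported source language: {code}")
    return code


def parse_translation(body: str) -> Tuple[str, Optional[str]]:
    """Return (english_text, source_language) from an ASSUMED response shape."""
    data = json.loads(body)
    return data["transcript"], data.get("language_code")
```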
Error Responses
Every error response body is a JSON object with an error field that describes what went wrong.
Error Response Structure
Error Codes Reference
Example Error Response
Error Handling Code Example
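The original example is not reproduced here. The sketch below turns an HTTP status and error body into an exception, handling an error field that is either an object (with assumed message and code keys) or a plain string, and falling back to the raw body when it is not JSON. The ApiError class is hypothetical, and treating 429 and 5xx as retryable is a common HTTP convention, not a documented policy of this API.

```python
import json
from typing import Optional

# Common HTTP convention for transient failures; not a documented API policy.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class ApiError(Exception):
    """Hypothetical client-side exception for failed API calls."""

    def __init__(self, status: int, message: str, code: Optional[str] = None):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.code = code
        self.retryable = status in RETRYABLE_STATUS


def error_from_response(status: int, body: str) -> ApiError:
    """Build an ApiError from a body shaped like {"error": ...}."""
    try:
        error = json.loads(body).get("error")
    except (json.JSONDecodeError, AttributeError):
        error = None  # body was not a JSON object
    if isinstance(error, dict):
        # ASSUMED keys inside the error object.
        return ApiError(status, str(error.get("message", error)), error.get("code"))
    if error is not None:
        return ApiError(status, str(error))
    return ApiError(status, body or "unknown error")
```

A caller can raise the returned exception and, when retryable is True, retry after a backoff delay; non-retryable errors such as malformed requests should be surfaced immediately.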
Next Steps
Need help? Contact us on Discord for guidance.