Batch Speech-to-Text (STT) API Tutorial Using the Saarika Model
Overview
This guide demonstrates how to use Sarvam AI’s Batch Speech-to-Text (STT) API for transcribing audio files at scale. You’ll learn both synchronous and asynchronous usage patterns, understand key parameters, and see how to upload files, poll for job completion, and download results.
1. Installation
Install the Sarvam AI Python SDK (published on PyPI as sarvamai): pip install sarvamai
2. API Key Setup
- Get your API key: Sign up at the Sarvam AI Dashboard to obtain your API key.
- Set your API key: Replace "YOUR_API_KEY_HERE" in your code with your actual key.
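Rather than hard-coding the key in a script, you may prefer to read it from an environment variable. A minimal sketch, assuming a variable named `SARVAM_API_KEY` (a name chosen here for illustration, not one the SDK is known to read on its own):

```python
import os


def load_api_key(env_var: str = "SARVAM_API_KEY") -> str:
    """Return the API key stored in env_var.

    SARVAM_API_KEY is a hypothetical variable name chosen for this sketch.
    Raises RuntimeError if the variable is unset, empty, or still holds the
    "YOUR_API_KEY_HERE" placeholder.
    """
    key = os.environ.get(env_var, "").strip()
    if not key or key == "YOUR_API_KEY_HERE":
        raise RuntimeError(
            f"Set {env_var} to the API key from the Sarvam AI Dashboard."
        )
    return key
```

Setting the variable in your shell before running the script keeps the key out of your source files and version control.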
3. STT Parameters
Job Parameters
Sets up the job configuration for the STT batch process. Parameters:
- language_code: Language code of the input audio (e.g., "en-IN" for Indian English)
- model: Transcription model (e.g., "saarika:v2.5" for the latest general-purpose STT)
- with_timestamps: If True, includes chunk-level timestamps
- with_diarization: If True, enables speaker diarization
- num_speakers: Number of speakers (used with diarization)
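The job parameters can be gathered and sanity-checked in one place before a job is created. `STTJobParams` below is a hypothetical helper, not part of the SDK; the rule that `num_speakers` is only set alongside `with_diarization=True` follows the parameter notes, and the default values are this sketch's own choices rather than documented API defaults.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class STTJobParams:
    """Hypothetical container mirroring the STT job parameters."""

    language_code: str = "en-IN"
    model: str = "saarika:v2.5"
    with_timestamps: bool = False
    with_diarization: bool = False
    num_speakers: Optional[int] = None

    def validate(self) -> None:
        # num_speakers is only meaningful when diarization is enabled.
        if self.num_speakers is not None:
            if not self.with_diarization:
                raise ValueError("num_speakers requires with_diarization=True")
            if self.num_speakers < 1:
                raise ValueError("num_speakers must be a positive integer")

    def to_kwargs(self) -> dict:
        # Validate, then drop unset optional values so only explicit
        # settings are passed on.
        self.validate()
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Unpacking `**params.to_kwargs()` into whichever job-creation call your SDK version exposes keeps the configuration in a single object; check the SDK reference for the exact method name.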
4. Synchronous STT Batch Example
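In the synchronous pattern, the client blocks while the job runs, checking its status at a fixed interval until the job finishes or a timeout expires (the SDK's wait_until_complete step handles this for you). The generic sketch below shows what such a loop does; `get_status` is a hypothetical callable, and the status strings `"Completed"` and `"Failed"` are placeholders rather than documented SDK values.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until a terminal state or until timeout seconds pass.

    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        # "Completed"/"Failed" are placeholder terminal states for this sketch.
        if status in ("Completed", "Failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(poll_interval)
```

A longer `poll_interval` means fewer status requests; a longer `timeout` suits large files or slow networks.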
5. Asynchronous STT Batch Example
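The asynchronous pattern performs the same waiting without blocking the event loop, so several jobs can be polled concurrently, for example with `asyncio.gather`. This is again a generic sketch: `get_status` is a hypothetical coroutine function standing in for however your SDK version reports job state, and the status strings are placeholders.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_for_job_async(
    get_status: Callable[[], Awaitable[str]],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Await get_status() repeatedly without blocking the event loop."""

    async def poll() -> str:
        while True:
            status = await get_status()
            # Placeholder terminal states for this sketch.
            if status in ("Completed", "Failed"):
                return status
            # Yield control to other tasks between checks.
            await asyncio.sleep(poll_interval)

    # asyncio.wait_for cancels the polling task and raises
    # asyncio.TimeoutError if it runs longer than timeout seconds.
    return await asyncio.wait_for(poll(), timeout=timeout)
```

To wait on several jobs at once, pass one `wait_for_job_async(...)` call per job to `asyncio.gather`.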
6. Tips & Best Practices
- Audio Quality: Use clear audio for best results.
- Diarization: Set with_diarization=True and specify num_speakers for multi-speaker audio.
- Polling: Adjust poll_interval and timeout based on expected job duration and file size.
- Output: Results are saved in the specified output_dir.
- API Key Security: Keep your API key confidential.
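Once a job finishes, a quick way to confirm what was written to `output_dir` is to list its contents. The helper below is a hypothetical convenience, not an SDK function, and deliberately assumes nothing about the output file format or naming.

```python
from pathlib import Path
from typing import List


def list_outputs(output_dir: str) -> List[Path]:
    """Return every file under output_dir (recursively), sorted by path.

    Makes no assumption about file format; it only gathers what was written.
    """
    root = Path(output_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"output_dir {output_dir!r} does not exist")
    return sorted(p for p in root.rglob("*") if p.is_file())
```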
7. Error Handling
You may encounter these errors while using the API:
- 403 Forbidden (invalid_api_key_error)
  - Cause: Invalid API key.
  - Solution: Use a valid API key from the Sarvam AI Dashboard.
- 429 Too Many Requests (insufficient_quota_error)
  - Cause: Exceeded API quota.
  - Solution: Check your usage, upgrade if needed, or implement exponential backoff when retrying.
- 500 Internal Server Error (internal_server_error)
  - Cause: An issue on our servers.
  - Solution: Try again later. If the problem persists, contact support.
- 400 Bad Request (invalid_request_error)
  - Cause: Incorrect request formatting.
  - Solution: Verify your request structure and parameters.
- 422 Unprocessable Entity (unprocessable_entity_error)
  - Cause: Unable to detect the language of the input audio.
  - Solution: Explicitly pass the language_code parameter with a supported language.
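The 429 and 500 remedies both amount to retrying later. A minimal sketch of exponential backoff with jitter follows; `APIError` is a hypothetical stand-in for whatever exception your client raises (check the SDK for its actual error types), and the retry limits are illustrative defaults. Only 429 and 500 are retried, since 400, 403 and 422 point to a problem with the key or the request that a retry cannot fix.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Status codes treated as transient in this sketch (quota and server errors).
RETRYABLE = {429, 500}


class APIError(Exception):
    """Hypothetical exception type carrying the HTTP status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying 429/500 errors with jittered exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except APIError as err:
            # Non-retryable errors, or the last attempt, are re-raised as-is.
            if err.status_code not in RETRYABLE or attempt == max_retries:
                raise
            # The upper bound doubles each attempt (1, 2, 4, ... s, capped at
            # max_delay); the actual wait is drawn uniformly from [0, bound]
            # ("full jitter") so many clients do not retry in lockstep.
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, bound))
    # Only reached when the loop never runs, i.e. max_retries < 0.
    raise ValueError("max_retries must be >= 0")
```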
8. Additional Resources
For more details, refer to our official documentation, and join our Discord server for support and help:
- Documentation: docs.sarvam.ai
- Community: Join the Discord Community
9. Final Notes
- Keep your API key secure.
- Use clear audio for best results.
- Check audio quality and supported formats.
- Increase timeout for large files or slow networks.
Keep Building! 🚀