Batch Speech-to-Text API
Process long audio files (up to 1 hour) using synchronous or asynchronous methods. Ideal for meetings, interviews, call center recordings, and large-scale content processing pipelines.
- Supports files up to 1 hour long
- Advanced transcription and translation
- Speaker diarization and timestamp support
Note: You can upload up to 20 audio files per job.
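If you have more than 20 recordings, one way to stay within the per-job limit is to split the file list into batches before creating jobs. The sketch below is plain Python and does not call the SDK; `batch_files` and `MAX_FILES_PER_JOB` are illustrative names, not part of the Sarvam AI API.

```python
from typing import Iterator, List, Sequence

# Per-job limit stated in the note above.
MAX_FILES_PER_JOB = 20


def batch_files(
    paths: Sequence[str], batch_size: int = MAX_FILES_PER_JOB
) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each holding at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


if __name__ == "__main__":
    # 45 hypothetical recordings are split into jobs of 20, 20, and 5 files.
    recordings = [f"call_{i:03d}.wav" for i in range(45)]
    for job_number, batch in enumerate(batch_files(recordings), start=1):
        print(f"job {job_number}: {len(batch)} files")
```

Each yielded batch can then be uploaded to its own job.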
Features
- Supports up to 1 hour audio
- Synchronous and asynchronous job-based API
- Upload multiple files per job
- Indian languages and English
- Automatic language detection
- Diarization and timestamp support
- Chunk-level timestamp support
  - Useful for subtitle alignment and audio navigation
  - Provides start and end times for each segment of text
- Identify multiple speakers
  - Output includes speaker labels (SPEAKER_00, etc.)
  - Ideal for meetings and interviews
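To illustrate how segment timestamps and speaker labels can drive subtitle alignment, here is a minimal sketch that converts segments into SRT text. The segment shape used here (`start` and `end` in seconds, `text`, and an optional `speaker`) is an assumption made for the example; check the actual job output for the real field names.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Build SRT text from segments shaped like
    {"start": 0.0, "end": 2.4, "text": "...", "speaker": "SPEAKER_00"}.
    This shape is hypothetical, not the documented response format.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        speaker = seg.get("speaker")
        # Prefix the line with the speaker label when diarization is available.
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.4, "text": "Hello, thanks for calling.", "speaker": "SPEAKER_00"},
        {"start": 2.4, "end": 5.0, "text": "Hi, I have a question.", "speaker": "SPEAKER_01"},
    ]
    print(segments_to_srt(demo))
```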
The SarvamAI SDK supports both synchronous and asynchronous programming styles in Python. The distinction describes how your code calls the SDK (blocking calls versus `async`/`await`), not how the server processes the request.
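The client-side difference can be sketched without the SDK at all. In the example below, `fake_request_sync` and `fake_request_async` are hypothetical stand-ins for a network call; they are not SDK functions. The synchronous version waits for each call in turn, while the asynchronous version lets the waits overlap. In both cases the simulated server-side work is identical.

```python
import asyncio
import time
from typing import List

DELAY_SECONDS = 0.05  # simulated network latency per request


def fake_request_sync(name: str) -> str:
    """Stand-in for a blocking call: the caller waits until it returns."""
    time.sleep(DELAY_SECONDS)
    return f"{name}: done"


async def fake_request_async(name: str) -> str:
    """Stand-in for an awaitable call: other tasks can run while it waits."""
    await asyncio.sleep(DELAY_SECONDS)
    return f"{name}: done"


def run_sync(names: List[str]) -> List[str]:
    # Requests run one after another; total time grows with len(names).
    return [fake_request_sync(n) for n in names]


async def run_async(names: List[str]) -> List[str]:
    # Requests wait concurrently; results come back in input order.
    return list(await asyncio.gather(*(fake_request_async(n) for n in names)))


if __name__ == "__main__":
    jobs = ["job-a", "job-b", "job-c"]
    print(run_sync(jobs))
    print(asyncio.run(run_async(jobs)))
```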
Code Examples
Saarika Model: Batch Speech-to-Text Transcription
Synchronous (Python)
Asynchronous (Python)
JavaScript
Saaras Model: Batch Speech-to-Text Translation
Synchronous (Python)
Asynchronous (Python)
JavaScript
Webhook Support
For long-running batch jobs, you can use webhooks to receive notifications when jobs complete instead of polling for status updates.
Setting Up Webhooks
When creating a job, include a callback parameter with your webhook URL and authentication token:
Python (Async)
Python (Sync)
Webhook Payload
When a job completes, Sarvam AI will send a POST request to your webhook URL with the following payload:
Webhook Server Example
Here’s a simple FastAPI server to handle webhook callbacks:
Your webhook server must respond with a 200 status code within 30 seconds. Make sure your webhook URL is publicly accessible and uses HTTPS in production.
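One way to stay well inside the 30-second window is to validate the request, put the payload on a queue, and return 200 immediately, leaving any slow work (downloading outputs, database writes) to a separate worker. The following is a framework-free sketch using only the Python standard library rather than FastAPI. The header name `X-Callback-Token`, the token value, and the payload contents are assumptions for illustration; use whatever header and fields the real callback carries.

```python
import hmac
import json
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Mapping, Tuple

EXPECTED_TOKEN = "replace-with-your-secret"  # hypothetical shared secret
TOKEN_HEADER = "X-Callback-Token"            # hypothetical header name

# Payloads wait here for a background worker, so the HTTP reply is never delayed.
completed_jobs: "queue.Queue" = queue.Queue()


def handle_callback(
    headers: Mapping[str, str], body: bytes, expected_token: str = EXPECTED_TOKEN
) -> Tuple[int, dict]:
    """Validate a callback and enqueue it; return (HTTP status, JSON response)."""
    supplied = headers.get(TOKEN_HEADER, "")
    # Constant-time comparison avoids leaking token contents through timing.
    if not hmac.compare_digest(supplied.encode(), expected_token.encode()):
        return 401, {"error": "invalid token"}
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body is not valid JSON"}
    completed_jobs.put(payload)
    return 200, {"status": "received"}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_callback(self.headers, self.rfile.read(length))
        data = json.dumps(response).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Run the webhook server until interrupted (place it behind HTTPS in production)."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. A separate thread or process can drain `completed_jobs` and do the heavier processing.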
Next Steps
Need help choosing the right API? Contact us on Discord for guidance.