Batch Speech-to-Text API

Process long audio files (up to 1 hour) using synchronous or asynchronous methods. Ideal for meetings, interviews, call center recordings, and large-scale content processing pipelines.

Supports files up to 1 hour long
Advanced transcription and translation
Speaker diarization and timestamp support

Note: You can upload up to 20 audio files per job.
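If you have more than 20 files, one way to stay under the per-job limit is to split the list into batches and create one job per batch. The sketch below covers only the splitting step. `batch_paths` and `MAX_FILES_PER_JOB` are hypothetical names, and the value 20 comes from the note above.

```python
from typing import Iterator, List

MAX_FILES_PER_JOB = 20  # per-job upload limit stated in the note above


def batch_paths(paths: List[str], size: int = MAX_FILES_PER_JOB) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


# 45 files become batches of 20, 20 and 5; each batch would be uploaded to its own job
files = [f"audio_{i}.mp3" for i in range(45)]
print([len(batch) for batch in batch_paths(files)])  # [20, 20, 5]
```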

Features

Processing

Supports audio up to 1 hour long
Synchronous and asynchronous job-based API
Upload multiple files per job

Audio & Language Support

Indian languages and English
Automatic language detection
Diarization and timestamp support

Timestamps

Chunk-level timestamp support
Useful for subtitle alignment and audio navigation
Provides start and end times for each segment of text
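For subtitle alignment, segment start and end times can be turned into an SRT subtitle file. The sketch below assumes you have already extracted each segment as a `(start_seconds, end_seconds, text)` tuple from the job output; the tuple shape and the helper names are illustrative assumptions, not part of the API.

```python
from typing import List, Tuple


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Build SRT subtitle text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
    # A blank line separates consecutive subtitle blocks
    return "\n".join(blocks)


print(segments_to_srt([(0.0, 2.5, "Hello everyone"), (2.5, 61.25, "Let's begin")]))
```

The second cue above is rendered as `00:00:02,500 --> 00:01:01,250`.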

Speaker Diarization

Identify multiple speakers
Output includes speaker labels (SPEAKER_00, etc.)
Ideal for meetings and interviews

Code Examples

Saarika Model: Batch Speech-to-Text Transcription

Synchronous (Python)

```python
from sarvamai import SarvamAI

def main():
    client = SarvamAI(api_subscription_key="YOUR_API_KEY")

    # Create and configure batch STT job
    job = client.speech_to_text_job.create_job(
        language_code="en-IN",
        model="saarika:v2.5",
        with_diarization=True,
        num_speakers=2
    )

    # Upload and process files
    audio_paths = ["path/to/audio1.mp3", "path/to/audio2.mp3"]
    job.upload_files(file_paths=audio_paths)
    job.start()

    # Wait for completion
    job.wait_until_complete()

    # Check file-level results
    file_results = job.get_file_results()

    print(f"\nSuccessful: {len(file_results['successful'])}")
    for f in file_results['successful']:
        print(f"  ✓ {f['file_name']}")

    print(f"\nFailed: {len(file_results['failed'])}")
    for f in file_results['failed']:
        print(f"  ✗ {f['file_name']}: {f['error_message']}")

    # Handle all files failed
    if len(file_results['successful']) == 0:
        print("\nAll files failed.")
        return

    # Download outputs for successful files
    output_dir = "./output"
    job.download_outputs(output_dir=output_dir)
    print(f"\nDownloaded {len(file_results['successful'])} file(s) to: {output_dir}")

if __name__ == "__main__":
    main()

# --- Notebook/Colab usage ---
# main()
```

Saaras Model: Batch Speech-to-Text Translation

Synchronous (Python)

```python
from sarvamai import SarvamAI

def main():
    client = SarvamAI(api_subscription_key="YOUR_API_KEY")

    # Create and configure batch speech-to-text-translate (STTT) job
    job = client.speech_to_text_translate_job.create_job(
        model="saaras:v2.5",
        with_diarization=True,
        num_speakers=2,
        prompt="Official meeting"
    )

    # Upload and process files
    audio_paths = ["path/to/audio1.mp3", "path/to/audio2.mp3"]
    job.upload_files(file_paths=audio_paths)
    job.start()

    # Wait for completion
    job.wait_until_complete()

    # Check file-level results
    file_results = job.get_file_results()

    print(f"\nSuccessful: {len(file_results['successful'])}")
    for f in file_results['successful']:
        print(f"  ✓ {f['file_name']}")

    print(f"\nFailed: {len(file_results['failed'])}")
    for f in file_results['failed']:
        print(f"  ✗ {f['file_name']}: {f['error_message']}")

    # Handle all files failed
    if len(file_results['successful']) == 0:
        print("\nAll files failed.")
        return

    # Download outputs for successful files
    output_dir = "./output"
    job.download_outputs(output_dir=output_dir)
    print(f"\nDownloaded {len(file_results['successful'])} file(s) to: {output_dir}")

if __name__ == "__main__":
    main()

# --- Notebook/Colab usage ---
# main()
```

Speaker Diarization

Speaker diarization automatically identifies and separates different speakers in an audio recording. This feature is ideal for meetings, interviews, and multi-speaker conversations where you need to know who said what.

Capabilities

Identify multiple speakers in a single audio file
Assign unique speaker IDs (speaker 1, speaker 2, etc.)
Provide timestamps for each speaker segment
Works with up to 8 speakers per audio file

Output Format

When with_diarization=True is passed when creating the job, the response includes a diarized_transcript field with detailed speaker information:

```json
{
  "diarized_transcript": {
    "entries": [
      {
        "transcript": "hello",
        "start_time_seconds": 42,
        "end_time_seconds": 42,
        "speaker_id": "speaker 1"
      }
    ]
  }
}
```

Each entry contains:

transcript: The text spoken by the speaker
start_time_seconds: Start time of the segment, in seconds
end_time_seconds: End time of the segment, in seconds
speaker_id: Unique identifier for the speaker
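As an illustration of consuming this structure, the sketch below totals speaking time per speaker_id and renders a simple "speaker: text" view. It assumes the response has already been parsed into a dict with the field names shown above; the helper names and the sample values are made up for the example.

```python
import json
from collections import defaultdict
from typing import Dict


def speaker_talk_time(response: dict) -> Dict[str, float]:
    """Sum (end - start) seconds per speaker_id across diarized entries."""
    totals: Dict[str, float] = defaultdict(float)
    for entry in response["diarized_transcript"]["entries"]:
        duration = entry["end_time_seconds"] - entry["start_time_seconds"]
        totals[entry["speaker_id"]] += max(duration, 0.0)  # ignore negative spans
    return dict(totals)


def speaker_lines(response: dict) -> str:
    """Render entries in order as 'speaker_id: transcript' lines."""
    return "\n".join(
        f"{e['speaker_id']}: {e['transcript']}"
        for e in response["diarized_transcript"]["entries"]
    )


# Made-up sample values, not real API output
sample = json.loads("""
{"diarized_transcript": {"entries": [
  {"transcript": "hello", "start_time_seconds": 0.0, "end_time_seconds": 1.5, "speaker_id": "speaker 1"},
  {"transcript": "hi there", "start_time_seconds": 1.5, "end_time_seconds": 3.0, "speaker_id": "speaker 2"},
  {"transcript": "shall we start?", "start_time_seconds": 3.0, "end_time_seconds": 5.0, "speaker_id": "speaker 1"}
]}}
""")
print(speaker_talk_time(sample))  # {'speaker 1': 3.5, 'speaker 2': 1.5}
print(speaker_lines(sample))
```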

The SarvamAI Python SDK provides both a synchronous client (SarvamAI) and an asynchronous client (AsyncSarvamAI). The choice affects only how your code calls the SDK; batch jobs are processed on the server the same way in either case.

Webhook Support

For long-running batch jobs, you can use webhooks to receive notifications when jobs complete instead of polling for status updates.
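For comparison, this is roughly the kind of polling loop that a webhook replaces. The SDK's wait_until_complete() already handles waiting for you; the sketch below is only illustrative. `get_state` is a hypothetical callable returning the current job state, and the terminal states are the job_state values shown in the webhook payload later on this page.

```python
import time
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "FAILED"}  # job_state values shown in the webhook payload


def poll_until_done(
    get_state: Callable[[], str],
    initial_delay: float = 5.0,
    max_delay: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_state() with exponential backoff until it returns a terminal state.

    `timeout` bounds the total time spent sleeping between checks.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited + delay > timeout:
            raise TimeoutError(f"job still {state!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off, up to max_delay


# "RUNNING" is a placeholder intermediate state; sleeping is disabled for the demo
states = iter(["RUNNING", "RUNNING", "COMPLETED"])
print(poll_until_done(lambda: next(states), sleep=lambda s: None))  # COMPLETED
```

Each check costs a request and adds up to one backoff interval of latency, which is why webhooks are preferable for long jobs.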

Setting Up Webhooks

When creating a job, include a callback parameter with your webhook URL and authentication token:

Python (Async)

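```python
import asyncio

from sarvamai import AsyncSarvamAI, BulkJobCallbackParams


async def main():
    client = AsyncSarvamAI(api_subscription_key="YOUR_API_KEY")

    # Register a webhook so Sarvam AI notifies your server when the job finishes
    job = await client.speech_to_text_job.create_job(
        model="saarika:v2.5",
        with_diarization=True,
        callback=BulkJobCallbackParams(
            url="https://your-server.com/webhook-endpoint",
            auth_token="your-secret-token",
        ),
    )
    # Upload files and start the job as usual
    return job


if __name__ == "__main__":
    asyncio.run(main())
```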
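The example above hard-codes auth_token for readability. A minimal sketch of generating a strong shared secret with Python's standard secrets module and reading it from an environment variable instead; the helper names and the SARVAM_WEBHOOK_TOKEN variable name are hypothetical choices, not part of the SarvamAI SDK.

```python
import os
import secrets


def generate_webhook_token(num_bytes: int = 32) -> str:
    """Return a random URL-safe token suitable as a shared webhook secret."""
    return secrets.token_urlsafe(num_bytes)


def load_webhook_token(env_var: str = "SARVAM_WEBHOOK_TOKEN") -> str:
    """Read the shared secret from the environment instead of hard-coding it."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"set {env_var} to the auth_token passed in the job callback")
    return token


if __name__ == "__main__":
    # Run once, store the value securely, then use the same value as auth_token
    # when creating the job and as the expected token in your webhook server.
    print(generate_webhook_token())
```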

Webhook Payload

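When a job finishes, whether it succeeds or fails, Sarvam AI sends a POST request to your webhook URL with the following payload, passing your auth_token in the X-SARVAM-JOB-CALLBACK-TOKEN header (as checked in the server example below):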

```json
{
  "job_id": "job_12345",
  "job_state": "COMPLETED",
  "results": {
    "transcripts": [...],
    "metadata": {...}
  },
  "error_message": null
}
```
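Before acting on a callback it can help to check the top-level fields and wrap them in a typed object. The sketch below validates only the fields visible in the sample payload; because the contents of results are elided above, that field is left untyped. `JobCallback` and `parse_callback` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class JobCallback:
    job_id: str
    job_state: str
    results: Optional[Any]
    error_message: Optional[str]

    @property
    def succeeded(self) -> bool:
        return self.job_state == "COMPLETED"


def parse_callback(payload: dict) -> JobCallback:
    """Validate the required top-level fields of a webhook payload."""
    missing = [key for key in ("job_id", "job_state") if key not in payload]
    if missing:
        raise ValueError(f"webhook payload missing fields: {missing}")
    return JobCallback(
        job_id=str(payload["job_id"]),
        job_state=str(payload["job_state"]),
        results=payload.get("results"),
        error_message=payload.get("error_message"),
    )


event = parse_callback(
    {"job_id": "job_12345", "job_state": "COMPLETED", "results": {}, "error_message": None}
)
print(event.succeeded)  # True
```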

Webhook Server Example

Here’s a simple FastAPI server to handle webhook callbacks:

```python
import hmac

import uvicorn
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()
VALID_TOKEN = "your-secret-token"  # must match the auth_token passed in the job callback


@app.post("/webhook-endpoint")
async def handle_webhook(request: Request):
    # Validate authentication; compare_digest avoids leaking timing information
    token = request.headers.get("X-SARVAM-JOB-CALLBACK-TOKEN", "")
    if not hmac.compare_digest(token.encode(), VALID_TOKEN.encode()):
        raise HTTPException(status_code=403, detail="Invalid token")

    # Process the webhook data
    data = await request.json()
    job_id = data.get("job_id")
    job_state = data.get("job_state")

    if job_state == "COMPLETED":
        print(f"Job {job_id} completed successfully!")
        # Handle successful completion
    elif job_state == "FAILED":
        print(f"Job {job_id} failed!")
        # Handle failure (details in data.get("error_message"))

    return {"status": "success"}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Your webhook server must respond with a 200 status code within 30 seconds. Make sure your webhook URL is publicly accessible and uses HTTPS in production.
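Because the handler must answer within 30 seconds, one common pattern is to acknowledge the callback immediately and do slow work (downloading outputs, writing to a database) elsewhere. The sketch below uses only the standard library: a hypothetical CallbackWorker class hands payloads to a background thread, so a handler would call worker.submit(data) and return right away. FastAPI's BackgroundTasks is an alternative. An in-memory queue loses pending payloads if the process restarts, so a durable queue is safer in production.

```python
import queue
import threading
from typing import Callable, Optional


class CallbackWorker:
    """Queue webhook payloads so the HTTP handler can return immediately."""

    def __init__(self, process: Callable[[dict], None]) -> None:
        self._queue: "queue.Queue[Optional[dict]]" = queue.Queue()
        self._process = process
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, payload: dict) -> None:
        """Enqueue a payload; does not wait for processing."""
        self._queue.put(payload)

    def stop(self) -> None:
        """Finish everything already queued, then stop the worker thread."""
        self._queue.put(None)  # sentinel marks the end of the queue
        self._thread.join()

    def _run(self) -> None:
        while True:
            payload = self._queue.get()
            if payload is None:
                break
            try:
                self._process(payload)
            except Exception as exc:  # keep the worker alive if one payload fails
                print(f"failed to process {payload.get('job_id')}: {exc}")


handled = []
worker = CallbackWorker(lambda p: handled.append(p["job_id"]))
worker.submit({"job_id": "job_12345", "job_state": "COMPLETED"})
worker.stop()
print(handled)  # ['job_12345']
```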

Next Steps

Choose Your API

Select the appropriate API type based on your use case.

Get API Key

Go Live

Deploy your integration and monitor usage in the dashboard.

Need help choosing the right API? Contact us on Discord for guidance.