Batch Speech-to-Text API

Process long audio files (up to 1 hour) using synchronous or asynchronous methods. Ideal for meetings, interviews, call center recordings, and large-scale content processing pipelines.

  • Supports files up to 1 hour long
  • Advanced transcription and translation
  • Speaker diarization and timestamp support

Note: You can upload up to 20 audio files per job.

Model Availability: The Batch API supports Saaras v3 (recommended), which offers multiple output modes via the mode parameter. The legacy models Saarika v2.5 and Saaras v2.5 remain available, but we recommend switching to Saaras v3 for the best accuracy and features.

Supported Modes (Saaras v3)

| Mode | Description | Output |
|------|-------------|--------|
| transcribe | Standard transcription in the original language | Text in source language |
| translate | Transcribe and translate to English | English text |
| verbatim | Word-for-word transcription including filler words and repetitions | Verbatim text in source language |
| translit | Transcribe and transliterate to Roman script | Romanized text |
| codemix | Transcribe code-mixed speech (e.g., Hindi-English) naturally | Code-mixed text |

Features

Processing
  • Supports up to 1 hour audio
  • Synchronous and asynchronous job-based API
  • Upload multiple files per job
Audio & Language Support
  • Indian languages and English
  • Automatic language detection
  • Diarization and timestamp support
Timestamps
  • Chunk-level timestamp support
  • Useful for subtitle alignment and audio navigation
  • Provides start and end times for each segment of text
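Chunk-level timestamps map directly onto subtitle cues. As a minimal sketch (assuming the parallel "words", "start_time_seconds", and "end_time_seconds" lists shown in the response example later on this page), the segments can be rendered as SRT cues like so:

```python
def to_srt(timestamps):
    """Convert chunk-level timestamps into SRT subtitle cues.

    Assumes parallel lists: "words" (text chunks), "start_time_seconds",
    and "end_time_seconds".
    """
    def fmt(seconds):
        # SRT uses HH:MM:SS,mmm
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    chunks = zip(timestamps["words"],
                 timestamps["start_time_seconds"],
                 timestamps["end_time_seconds"])
    cues = [f"{i}\n{fmt(start)} --> {fmt(end)}\n{text}"
            for i, (text, start, end) in enumerate(chunks, start=1)]
    return "\n\n".join(cues)
```

Writing the returned string to a .srt file gives a subtitle track aligned to the original audio.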
Speaker Diarization
  • Identify multiple speakers
  • Output includes per-entry speaker labels (speaker_id values such as "0", "1")
  • Ideal for meetings and interviews

Code Examples

Choosing a Mode

To switch between modes, simply change the mode parameter in your job creation call. The rest of the workflow (upload, start, wait, download) remains the same.

Transcribe audio in the original language.

job = client.speech_to_text_job.create_job(
    model="saaras:v3",
    mode="transcribe",  # Standard transcription
    language_code="hi-IN",
    with_diarization=True,
    num_speakers=2,
)
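Since only the mode parameter varies between runs, one pattern is to centralize the job parameters in a small helper. This is a hypothetical convenience function, not part of the SDK; build_job_params and SAARAS_V3_MODES are names introduced here for illustration:

```python
# Modes documented for Saaras v3 on this page.
SAARAS_V3_MODES = {"transcribe", "translate", "verbatim", "translit", "codemix"}

def build_job_params(mode, language_code="hi-IN",
                     with_diarization=False, num_speakers=None):
    """Build the keyword arguments for create_job, validating the mode."""
    if mode not in SAARAS_V3_MODES:
        raise ValueError(f"Unsupported mode: {mode!r}")
    params = {"model": "saaras:v3", "mode": mode, "language_code": language_code}
    if with_diarization:
        params["with_diarization"] = True
        if num_speakers is not None:
            params["num_speakers"] = num_speakers
    return params
```

You would then call, for example, client.speech_to_text_job.create_job(**build_job_params("translate")) and keep the rest of the workflow unchanged.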

Full Example

Once you’ve created a job with your chosen mode, the upload, processing, and download workflow is the same for all modes:

from sarvamai import SarvamAI

def main():
    client = SarvamAI(api_subscription_key="YOUR_API_KEY")

    # Create batch job — change mode as needed
    job = client.speech_to_text_job.create_job(
        model="saaras:v3",
        mode="transcribe",
        language_code="en-IN",
        with_diarization=True,
        num_speakers=2
    )

    # Upload and process files
    audio_paths = ["path/to/audio1.mp3", "path/to/audio2.mp3"]
    job.upload_files(file_paths=audio_paths)
    job.start()

    # Wait for completion
    job.wait_until_complete()

    # Check file-level results
    file_results = job.get_file_results()

    print(f"\nSuccessful: {len(file_results['successful'])}")
    for f in file_results['successful']:
        print(f"  ✓ {f['file_name']}")

    print(f"\nFailed: {len(file_results['failed'])}")
    for f in file_results['failed']:
        print(f"  ✗ {f['file_name']}: {f['error_message']}")

    # Download outputs for successful files
    if file_results['successful']:
        job.download_outputs(output_dir="./output")
        print(f"\nDownloaded {len(file_results['successful'])} file(s) to: ./output")

if __name__ == "__main__":
    main()
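If you want to report on or retry failed files programmatically, the result-inspection step above can be factored into a small pure function. This sketch assumes the same {"successful": [...], "failed": [...]} shape that get_file_results() returns in the example; summarize_results itself is a helper introduced here, not an SDK method:

```python
def summarize_results(file_results):
    """Summarize get_file_results() output.

    Returns a one-line summary string, the list of successful file names,
    and a {file_name: error_message} map for failed files.
    """
    ok = [f["file_name"] for f in file_results.get("successful", [])]
    failed = {f["file_name"]: f["error_message"]
              for f in file_results.get("failed", [])}
    summary = f"{len(ok)} succeeded, {len(failed)} failed"
    return summary, ok, failed
```

The failed-files map is a convenient input for a retry loop: re-upload just those paths in a fresh job.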

Speaker Diarization

Speaker diarization automatically identifies and separates different speakers in an audio recording. This feature is ideal for meetings, interviews, and multi-speaker conversations where you need to know who said what.

Capabilities

  • Identify multiple speakers in a single audio file
  • Assign unique speaker IDs ("0", "1", etc.)
  • Provide timestamps for each speaker segment
  • Works with up to 8 speakers per audio file

Output Format

When with_diarization=True is passed in the request, the response includes a diarized_transcript field with detailed speaker information:

{
  "request_id": "20260130_d8d2c0e6-1eb6-4982-8045-b267d5165c44",
  "transcript": "Full transcript text...",
  "timestamps": {
    "words": ["Hello, how can I help you today?", "I have a question."],
    "start_time_seconds": [0.01, 2.8],
    "end_time_seconds": [2.5, 4.2]
  },
  "diarized_transcript": {
    "entries": [
      {
        "transcript": "Hello, how can I help you today?",
        "start_time_seconds": 0.01,
        "end_time_seconds": 2.5,
        "speaker_id": "0"
      },
      {
        "transcript": "I have a question.",
        "start_time_seconds": 2.8,
        "end_time_seconds": 4.2,
        "speaker_id": "1"
      }
    ]
  },
  "language_code": "en-IN"
}

Each entry contains:

  • transcript: The text spoken by the speaker
  • start_time_seconds: When the speaker started speaking (float)
  • end_time_seconds: When the speaker stopped speaking (float)
  • speaker_id: Unique identifier for the speaker (e.g., “0”, “1”)
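To answer "who said what", the entries can be flattened into a readable script. A minimal sketch, assuming the diarized_transcript structure shown above (format_diarized is a helper introduced here for illustration):

```python
def format_diarized(diarized_transcript):
    """Render diarized entries as '[start-end] Speaker N: text' lines."""
    lines = []
    for e in diarized_transcript["entries"]:
        start = e["start_time_seconds"]
        end = e["end_time_seconds"]
        lines.append(f"[{start:.2f}-{end:.2f}] Speaker {e['speaker_id']}: "
                     f"{e['transcript']}")
    return "\n".join(lines)
```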

The SarvamAI SDK supports both synchronous and asynchronous programming in Python. This refers to how your code interacts with the SDK, not how the server handles the processing of requests.

Webhook Support

For long-running batch jobs, you can use webhooks to receive notifications when jobs complete instead of polling for status updates.

Setting Up Webhooks

When creating a job, include a callback parameter with your webhook URL and authentication token:

from sarvamai import AsyncSarvamAI, BulkJobCallbackParams

client = AsyncSarvamAI(api_subscription_key="YOUR_API_KEY")

job = await client.speech_to_text_job.create_job(
    model="saaras:v3",
    mode="transcribe",
    with_diarization=True,
    callback=BulkJobCallbackParams(
        url="https://your-server.com/webhook-endpoint",
        auth_token="your-secret-token"
    )
)

Webhook Payload

When a job completes, Sarvam AI will send a POST request to your webhook URL with the following payload:

{
  "job_id": "job_12345",
  "job_state": "COMPLETED",
  "results": {
    "transcripts": [...],
    "metadata": {...}
  },
  "error_message": null
}

Webhook Server Example

Here’s a simple FastAPI server to handle webhook callbacks:

from fastapi import FastAPI, Request, HTTPException
import uvicorn

app = FastAPI()
VALID_TOKEN = "your-secret-token"

@app.post("/webhook-endpoint")
async def handle_webhook(request: Request):
    # Validate authentication
    token = request.headers.get("X-SARVAM-JOB-CALLBACK-TOKEN")
    if token != VALID_TOKEN:
        raise HTTPException(status_code=403, detail="Invalid token")

    # Process the webhook data
    data = await request.json()
    job_id = data.get("job_id")
    job_state = data.get("job_state")

    if job_state == "COMPLETED":
        print(f"Job {job_id} completed successfully!")
        # Handle successful completion
    elif job_state == "FAILED":
        print(f"Job {job_id} failed!")
        # Handle failure

    return {"status": "success"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

Your webhook server must respond with a 200 status code within 30 seconds. Make sure your webhook URL is publicly accessible and uses HTTPS in production.

Next Steps

1. Choose Your API: Select the appropriate API type based on your use case.

2. Get API Key: Sign up and get your API key from the dashboard.

3. Go Live: Deploy your integration and monitor usage in the dashboard.

Need help choosing the right API? Contact us on Discord for guidance.