Batch Speech-to-Text (STT) API Tutorial Using Saarika Model

Overview

This guide demonstrates how to use Sarvam AI’s Batch Speech-to-Text (STT) API for transcribing audio files at scale. You’ll learn both synchronous and asynchronous usage patterns, understand key parameters, and see how to upload files, poll for job completion, and download results.

1. Installation

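Install the Sarvam AI Python SDK (the leading ! runs the command from a Jupyter notebook cell; omit it when installing from a terminal):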

!pip install -U sarvamai

2. API Key Setup

  1. Get your API key: Sign up at the Sarvam AI Dashboard to obtain your API key.
  2. Set your API key: Replace "YOUR_API_KEY_HERE" in the code below with your actual key.
API_KEY = "YOUR_API_KEY_HERE"
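Hardcoding the key is convenient for a quick test but easy to commit by accident. A minimal alternative is to read it from an environment variable at runtime. SARVAM_API_KEY below is a hypothetical variable name chosen for this sketch; the SDK is not assumed to read it on its own, so the value is still passed explicitly as api_subscription_key.

```python
import os


def load_api_key(var_name: str = "SARVAM_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    SARVAM_API_KEY is a hypothetical name for this example; use whatever
    variable your environment actually sets.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of an authentication error later
        raise RuntimeError(f"Set the {var_name} environment variable to your Sarvam AI API key.")
    return key
```

With the variable exported, write API_KEY = load_api_key() in place of the hardcoded string; the rest of the examples stay the same.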

3. STT Parameters

The following parameters configure the batch STT job; the examples below pass them as keyword arguments to create_job:

  • language_code: Language code of input audio (e.g., “en-IN” for Indian English)
  • model: Transcription model (e.g., “saarika:v2.5”, the latest general-purpose STT model)
  • with_timestamps: If True, includes chunk-level timestamps
  • with_diarization: If True, enables speaker diarization
  • num_speakers: Number of speakers (used with diarization)
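To keep these settings in one place, they can be collected into a dictionary and unpacked into create_job. The helper below, build_stt_job_config, is hypothetical and not part of the SDK. Its defaults are its own rather than the API's, and the rule that num_speakers is only sent together with diarization is this sketch's reading of the parameter list above, not a documented API constraint.

```python
from typing import Any, Dict, Optional


def build_stt_job_config(
    language_code: str = "en-IN",
    model: str = "saarika:v2.5",
    with_timestamps: bool = False,
    with_diarization: bool = False,
    num_speakers: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect batch STT job parameters into keyword arguments for create_job."""
    if num_speakers is not None:
        if not with_diarization:
            # num_speakers is described as being used with diarization only
            raise ValueError("num_speakers requires with_diarization=True")
        if num_speakers < 1:
            raise ValueError("num_speakers must be a positive integer")

    config: Dict[str, Any] = {
        "language_code": language_code,
        "model": model,
        "with_timestamps": with_timestamps,
        "with_diarization": with_diarization,
    }
    # Only include num_speakers when it was actually provided
    if num_speakers is not None:
        config["num_speakers"] = num_speakers
    return config
```

Usage: config = build_stt_job_config(with_timestamps=True, with_diarization=True, num_speakers=2), then job = client.speech_to_text_job.create_job(**config), which sends the same arguments as the examples below.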

4. Synchronous STT Batch Example

from pathlib import Path
from sarvamai import SarvamAI

API_KEY = "YOUR_API_KEY_HERE"
audio_files = ["/path/to/your/audio1.mp3", "/path/to/your/audio2.mp3"]  # Update with your file paths
# Relative to the working directory; "/output" would require write access to the filesystem root
output_dir = Path("output")
output_dir.mkdir(parents=True, exist_ok=True)


def run_stt_sync():
    client = SarvamAI(api_subscription_key=API_KEY)

    # Create and configure the batch STT job
    job = client.speech_to_text_job.create_job(
        model="saarika:v2.5",
        with_diarization=True,
        with_timestamps=True,
        language_code="en-IN",
        num_speakers=2,
    )
    print(f"Job created: {job._job_id}")

    # Upload the audio files, then start processing
    job.upload_files(file_paths=audio_files, timeout=120.0)
    job.start()
    print("Transcription started...")

    # Poll every 5 seconds, giving up after 600 seconds
    job.wait_until_complete(poll_interval=5, timeout=600)

    # Check per-file results
    file_results = job.get_file_results()

    print(f"\nSuccessful: {len(file_results['successful'])}")
    for f in file_results['successful']:
        print(f" ✓ {f['file_name']}")

    print(f"\nFailed: {len(file_results['failed'])}")
    for f in file_results['failed']:
        print(f" ✗ {f['file_name']}: {f['error_message']}")

    # Stop early if every file failed
    if len(file_results['successful']) == 0:
        print("\nAll files failed.")
        return

    # Download outputs for the successful files
    job.download_outputs(output_dir=str(output_dir))
    print(f"\nDownloaded {len(file_results['successful'])} file(s) to: {output_dir}")


run_stt_sync()
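The tutorial does not describe the format of the downloaded files. Assuming they are JSON objects with a top-level "transcript" field (an assumption; open one of your files to confirm the actual keys, including how timestamps and diarized segments are stored), a small hypothetical helper can gather the transcripts for further processing:

```python
import json
from pathlib import Path
from typing import Dict


def load_transcripts(output_dir: Path) -> Dict[str, str]:
    """Map each downloaded .json file name to its transcript text.

    Assumes a top-level "transcript" key, which is an assumption about the
    output schema; files without it map to an empty string.
    """
    transcripts: Dict[str, str] = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Fall back to "" if the file is not an object or lacks the assumed key
        transcripts[path.name] = data.get("transcript", "") if isinstance(data, dict) else ""
    return transcripts
```

Call it with the same output_dir that was passed to download_outputs, then print or post-process the returned dictionary.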

5. Asynchronous STT Batch Example

import asyncio
from pathlib import Path
from sarvamai import AsyncSarvamAI

API_KEY = "YOUR_API_KEY_HERE"
audio_files = ["/path/to/your/audio1.mp3", "/path/to/your/audio2.mp3"]  # Update with your file paths
# Relative to the working directory; "/output" would require write access to the filesystem root
output_dir = Path("output")
output_dir.mkdir(parents=True, exist_ok=True)


async def run_stt_async_job():
    client = AsyncSarvamAI(api_subscription_key=API_KEY)

    # Create and configure the batch STT job
    job = await client.speech_to_text_job.create_job(
        model="saarika:v2.5",
        with_diarization=True,
        with_timestamps=True,
        language_code="en-IN",
        num_speakers=2,
    )
    print(f"Job created: {job._job_id}")

    # Upload the audio files, then start processing
    await job.upload_files(file_paths=audio_files, timeout=120.0)
    await job.start()
    print("Transcription started...")

    # Poll every 5 seconds, giving up after 600 seconds
    await job.wait_until_complete(poll_interval=5, timeout=600)

    # Check per-file results
    file_results = await job.get_file_results()

    print(f"\nSuccessful: {len(file_results['successful'])}")
    for f in file_results['successful']:
        print(f" ✓ {f['file_name']}")

    print(f"\nFailed: {len(file_results['failed'])}")
    for f in file_results['failed']:
        print(f" ✗ {f['file_name']}: {f['error_message']}")

    # Stop early if every file failed
    if len(file_results['successful']) == 0:
        print("\nAll files failed.")
        return

    # Download outputs for the successful files
    await job.download_outputs(output_dir=str(output_dir))
    print(f"\nDownloaded {len(file_results['successful'])} file(s) to: {output_dir}")


# Jupyter notebooks already run an event loop, so the coroutine can be awaited directly:
await run_stt_async_job()

# In a standalone Python script, replace the line above with:
# asyncio.run(run_stt_async_job())
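The async client is most useful when several independent jobs can be in flight at once, for example when a long list of files is split across multiple jobs. The helper below is a generic asyncio sketch, not part of the SDK: it runs coroutine factories concurrently while a semaphore caps how many run at the same time, and returns results in input order.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_with_limit(factories: List[Callable[[], Awaitable[T]]], limit: int = 2) -> List[T]:
    """Run coroutine factories concurrently, at most `limit` at a time.

    Results come back in the same order as `factories`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory: Callable[[], Awaitable[T]]) -> T:
        # Wait for a free slot before creating and awaiting the coroutine
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories))
```

Usage, assuming a hypothetical run_batch(file_paths) coroutine refactored from run_stt_async_job to take its own file list: results = await run_with_limit([lambda b=b: run_batch(b) for b in batches], limit=2). Passing factories instead of ready-made coroutines means each job's coroutine is only created once it gets a slot, and a small limit also lowers the chance of running into the 429 quota errors described under Error Handling.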

6. Tips & Best Practices

  • Audio Quality: Use clear audio for best results.
  • Diarization: Set with_diarization=True and specify num_speakers for multi-speaker audio.
  • Polling: Adjust poll_interval and timeout based on expected job duration and file size.
  • Output: Results for successfully transcribed files are downloaded to the specified output_dir.
  • API Key Security: Keep your API key confidential.
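wait_until_complete(poll_interval=5, timeout=600) does the polling for you. To make the trade-off behind those two numbers concrete, here is a generic polling loop; it is an illustrative sketch, not the SDK's implementation. poll_interval sets how often the status is checked, and timeout caps the total wait.

```python
import time
from typing import Callable, TypeVar

S = TypeVar("S")


def poll_until(
    check: Callable[[], S],
    is_done: Callable[[S], bool],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> S:
    """Call `check` every `poll_interval` seconds until `is_done` accepts its result.

    Raises TimeoutError once `timeout` seconds have passed without completion.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if is_done(result):
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"not finished after {timeout} seconds")
        # Never sleep past the deadline
        sleep(min(poll_interval, remaining))
```

For example, poll_until(fetch_status, lambda s: s in {"Completed", "Failed"}), where fetch_status and both status strings are hypothetical placeholders. A shorter poll_interval notices completion sooner at the cost of more status requests, while timeout should exceed the longest job you expect, which grows with the amount of audio submitted.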

7. Error Handling

You may encounter these errors while using the API:

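  • 403 Forbidden (invalid_api_key_error)

    • Cause: Invalid or missing API key.
    • Solution: Check that the key passed as api_subscription_key matches the one shown in your Sarvam AI Dashboard.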
  • 429 Too Many Requests (insufficient_quota_error)

    • Cause: Exceeded API quota.
    • Solution: Check your usage, upgrade if needed, or implement exponential backoff when retrying.
  • 500 Internal Server Error (internal_server_error)

    • Cause: Issue on our servers.
    • Solution: Try again later. If persistent, contact support.
  • 400 Bad Request (invalid_request_error)

    • Cause: Incorrect request formatting.
    • Solution: Verify your request structure and parameters.
  • 422 Unprocessable Entity (unprocessable_entity_error)

    • Cause: Unable to detect the language of the input audio.
    • Solution: Explicitly pass the language_code parameter with a supported language.
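The 429 solution above suggests exponential backoff, and transient 500 errors can be retried the same way. A minimal sketch, assuming the exception raised by the failing call exposes the HTTP status as a status_code attribute; that attribute lookup is an assumption, so check the SDK's exception classes and adapt that one line if needed.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Errors from the list above that are worth retrying; 400, 403 and 422 are not
RETRYABLE_STATUS_CODES = frozenset({429, 500})


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying 429/500 errors with exponential backoff and jitter."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Assumed attribute name; adapt to the SDK's actual exception type
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUS_CODES or attempt == max_retries:
                raise  # not retryable, or out of attempts
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay,
            # with random jitter so parallel clients do not retry in lockstep
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")
```

Usage: call_with_backoff(lambda: client.speech_to_text_job.create_job(...)). Retrying re-sends the call, so consider whether repeating a step such as job creation could leave extra jobs behind before wrapping it.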

8. Additional Resources

For more details, refer to our official documentation. For help and support, join our Discord server.

9. Final Notes

  • Keep your API key secure.
  • Use clear audio for best results.
  • Check audio quality and supported formats.
  • Increase the timeouts (the upload timeout and the wait_until_complete timeout) for large files or slow networks.
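Checking inputs before upload catches simple mistakes early. The hypothetical helper below inspects only file paths, sizes, and extensions (not audio quality), and the set of allowed extensions is something you supply from the official list of supported formats.

```python
from pathlib import Path
from typing import Iterable, List


def check_audio_files(paths: Iterable[str], allowed_suffixes: Iterable[str]) -> List[str]:
    """Return a list of problems found before upload; an empty list means all passed."""
    allowed = {s.lower() for s in allowed_suffixes}
    problems: List[str] = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            problems.append(f"{raw}: file not found")
        elif path.stat().st_size == 0:
            problems.append(f"{raw}: file is empty")
        elif path.suffix.lower() not in allowed:
            # Compare extensions case-insensitively (".MP3" and ".mp3" are treated alike)
            problems.append(f"{raw}: unsupported extension {path.suffix or '(none)'}")
    return problems
```

Usage: problems = check_audio_files(audio_files, {".mp3"}), where ".mp3" only mirrors the sample paths in the examples; extend the set with the formats the documentation lists. If problems is non-empty, fix those files before creating a job.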

Keep Building! 🚀