Batch Speech-to-Text API

Process long audio files (up to 1 hour) using synchronous or asynchronous methods. Ideal for meetings, interviews, call center recordings, and large-scale content processing pipelines.

  • Supports files up to 1 hour long
  • Advanced transcription and translation
  • Speaker diarization and timestamp support

Note: You can upload up to 20 audio files per job.
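A single job accepts at most 20 files, so a longer list of recordings has to be split across several jobs. The sketch below shows one way to split file paths into job-sized batches. `chunk_paths` is a hypothetical helper and is not part of the SDK. Each batch it yields would go to a separate job's `upload_files` call.

```python
from typing import Iterator, List, Sequence

MAX_FILES_PER_JOB = 20  # per-job limit stated in the note above


def chunk_paths(paths: Sequence[str], size: int = MAX_FILES_PER_JOB) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


if __name__ == "__main__":
    # Hypothetical file names, used only to show the batching.
    files = [f"call_{i:03d}.mp3" for i in range(45)]
    for batch in chunk_paths(files):
        print(len(batch), batch[0], batch[-1])
```

With 45 files this produces three batches of 20, 20 and 5 paths. Each batch would then need its own job.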

Features

Processing
  • Supports audio up to 1 hour long
  • Synchronous and asynchronous job-based API
  • Upload multiple files per job
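You can check the 1-hour limit locally before uploading. The sketch below uses only the standard-library `wave` module, so it assumes PCM WAV input. MP3 and other compressed formats would need a different tool to read their duration. `wav_duration_seconds` and `within_batch_limit` are hypothetical helpers.

```python
import wave

MAX_DURATION_SECONDS = 60 * 60  # 1-hour limit stated above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def within_batch_limit(path: str, limit: float = MAX_DURATION_SECONDS) -> bool:
    """True if the WAV file is no longer than `limit` seconds (assumes WAV input)."""
    return wav_duration_seconds(path) <= limit
```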
Audio & Language Support
  • Indian languages and English
  • Automatic language detection
  • Diarization and timestamp support
Timestamps
  • Chunk-level timestamp support
  • Useful for subtitle alignment and audio navigation
  • Provides start and end times for each segment of text
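For subtitle alignment, each segment's start and end times must be written in a subtitle format such as SRT. The sketch below assumes that segment times are given in seconds. The field names in the actual output are not covered here, and `to_srt_timestamp` and `to_srt_cue` are hypothetical helpers.

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    if seconds < 0:
        raise ValueError("timestamps cannot be negative")
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue from a segment's start/end times (seconds) and its text."""
    return f"{index}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}\n"
```

For example, `to_srt_timestamp(3725.5)` returns `"01:02:05,500"`. To build a complete `.srt` file, join the cues with blank lines.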
Speaker Diarization
  • Identify multiple speakers
  • Output includes speaker labels (SPEAKER_00, etc.)
  • Ideal for meetings and interviews
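Once each segment has a speaker label, you can run meeting analytics such as total talk time per speaker. The sketch below assumes a hypothetical segment shape of `(speaker_label, start_seconds, end_seconds, text)`. Map the real output fields onto it before use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical segment shape: (speaker_label, start_seconds, end_seconds, text).
Segment = Tuple[str, float, float, str]


def talk_time_by_speaker(segments: Iterable[Segment]) -> Dict[str, float]:
    """Sum segment durations per speaker label (e.g. SPEAKER_00)."""
    totals: Dict[str, float] = defaultdict(float)
    for speaker, start, end, _text in segments:
        totals[speaker] += max(0.0, end - start)  # ignore malformed negative spans
    return dict(totals)
```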

The SarvamAI SDK supports both synchronous and asynchronous programming in Python.
This distinction refers to how your code calls the SDK (blocking calls versus async/await), not to how the server processes requests.
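The sketch below uses a generic standard-library pattern. It is not the SDK's own asynchronous client, so check the SDK reference for that. The pattern runs a blocking workflow, such as `main()` in the examples below, in a worker thread so that an asyncio application stays responsive. `run_blocking` and `run_all` are hypothetical helpers.

```python
import asyncio
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def run_blocking(fn: Callable[[], T]) -> T:
    """Run a blocking callable in a worker thread without blocking the event loop."""
    return await asyncio.to_thread(fn)  # requires Python 3.9+


async def run_all(fns: Sequence[Callable[[], T]]) -> List[T]:
    """Run several blocking callables concurrently; results keep the input order."""
    return list(await asyncio.gather(*(run_blocking(fn) for fn in fns)))


if __name__ == "__main__":
    import time

    def fake_job(name: str) -> Callable[[], str]:
        # Stand-in for a blocking create/upload/start/wait/download workflow.
        def run() -> str:
            time.sleep(0.1)
            return f"{name}: done"
        return run

    print(asyncio.run(run_all([fake_job("a"), fake_job("b")])))
```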

Code Examples

Saarika Model: Batch Speech-to-Text Transcription

from sarvamai import SarvamAI

def main():
    client = SarvamAI(api_subscription_key="YOUR_API_KEY")

    # Create a batch transcription job with the Saarika model.
    job = client.speech_to_text_job.create_job(
        language_code="en-IN",
        model="saarika:v2.5",
        with_timestamps=True,
        with_diarization=True,
        num_speakers=2
    )

    # Attach the audio files to the job (up to 20 per job).
    audio_paths = ["path/to/audio.mp3"]
    job.upload_files(file_paths=audio_paths)

    # Start processing, then block until the job finishes.
    job.start()

    final_status = job.wait_until_complete()

    if job.is_failed():
        print("STT job failed.")
        return

    # Save the transcription results locally.
    output_dir = "./output"
    job.download_outputs(output_dir=output_dir)
    print(f"Output downloaded to: {output_dir}")

if __name__ == "__main__":
    main()

# --- Notebook/Colab usage ---
# main()
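After `download_outputs` finishes, you can load the results from the output directory. The sketch below assumes that the downloaded outputs are JSON files in `./output` and does not assume any particular field names. It prints the top-level structure so you can inspect it before relying on specific keys. `load_outputs` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_outputs(output_dir: str = "./output") -> Dict[str, Any]:
    """Parse every *.json file in output_dir, keyed by file name (assumes JSON outputs)."""
    results: Dict[str, Any] = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            results[path.name] = json.load(fh)
    return results


if __name__ == "__main__":
    for name, data in load_outputs().items():
        # Inspect the top-level structure before relying on specific fields.
        summary = list(data) if isinstance(data, dict) else type(data).__name__
        print(name, summary)
```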

Saaras Model: Batch Speech-to-Text Translation

from sarvamai import SarvamAI

def main():
    client = SarvamAI(api_subscription_key="YOUR_API_KEY")

    # Create a batch translation job with the Saaras model.
    job = client.speech_to_text_translate_job.create_job(
        model="saaras:v2.5",
        with_diarization=True,
        num_speakers=2,
        prompt="Official meeting"
    )

    # Attach the audio files to the job (up to 20 per job).
    audio_paths = ["path/to/audio.mp3"]
    job.upload_files(file_paths=audio_paths)

    # Start processing, then block until the job finishes.
    job.start()

    final_status = job.wait_until_complete()

    if job.is_failed():
        print("STT translation job failed.")
        return

    # Save the translation results locally.
    output_dir = "./output"
    job.download_outputs(output_dir=output_dir)
    print(f"Output downloaded to: {output_dir}")

if __name__ == "__main__":
    main()

# --- Notebook/Colab usage ---
# main()
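The transcription and translation examples differ only in how the job is created. After that, both call `upload_files`, `start`, `wait_until_complete`, `is_failed` and `download_outputs` in the same order. The sketch below factors those shared steps into one function that accepts a job created by either `create_job` call. `run_batch_job` is a hypothetical helper, and it uses only the job methods shown in the examples above.

```python
from typing import Any, Sequence


def run_batch_job(job: Any, audio_paths: Sequence[str], output_dir: str = "./output") -> bool:
    """Upload, start, wait for, and download a batch job.

    Returns True if outputs were downloaded, False if the job failed.
    """
    job.upload_files(file_paths=list(audio_paths))
    job.start()
    job.wait_until_complete()
    if job.is_failed():
        return False
    job.download_outputs(output_dir=output_dir)
    return True
```

For example, call `run_batch_job(job, ["path/to/audio.mp3"])` where `job` comes from `client.speech_to_text_translate_job.create_job(...)` as in the example above.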

Next Steps

1. Choose Your API: Select the appropriate API type based on your use case.
2. Get API Key: Sign up and get your API key from the dashboard.
3. Go Live: Deploy your integration and monitor usage in the dashboard.

Need help choosing the right API? Contact us on Discord for guidance.