How to enable speaker diarization

Batch API only: Speaker diarization is available through the Batch API, not through the REST or Streaming APIs.

Speaker diarization identifies and labels different speakers in your audio, making it easy to know “who said what.” This is ideal for meetings, interviews, podcasts, and call center recordings.

Key Features

  • Automatic speaker detection
  • Support for up to 10 speakers
  • Speaker-wise transcription with timestamps

Parameters

| Parameter | Type | Description |
|---|---|---|
| with_diarization | boolean | Enable speaker diarization (default: false) |
| num_speakers | integer | Expected number of speakers (optional, 1-10) |

If you don’t specify num_speakers, the model will automatically detect the number of speakers.

Example Code

Basic Diarization

```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Create batch job with diarization
job = client.speech_to_text_job.create_job(
    model="saaras:v3",
    language_code="hi-IN",
    mode="transcribe",
    with_diarization=True
)

# Upload audio files
job.upload_files(file_paths=["meeting_recording.mp3"])

# Start processing
job.start()

# Wait for completion
job.wait_until_complete()

# Download results
job.download_outputs(output_dir="./output")
```
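When the number of speakers is known in advance, num_speakers can be supplied alongside with_diarization. The sketch below is a hypothetical helper, not part of the SDK: it only assembles and validates the job parameters listed in the Parameters section. Passing num_speakers as a keyword argument to create_job is an assumption based on that table, so check it against the Batch API reference before relying on it.

```python
from typing import Any, Dict, Optional

MAX_SPEAKERS = 10  # upper bound given in the Parameters table


def build_diarization_params(
    language_code: str = "hi-IN",
    num_speakers: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for a diarized batch job.

    Leaving num_speakers as None omits it, so the speaker count is
    detected automatically.
    """
    params: Dict[str, Any] = {
        "model": "saaras:v3",
        "language_code": language_code,
        "mode": "transcribe",
        "with_diarization": True,
    }
    if num_speakers is not None:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(num_speakers, bool) or not isinstance(num_speakers, int):
            raise TypeError("num_speakers must be an integer")
        if not 1 <= num_speakers <= MAX_SPEAKERS:
            raise ValueError(f"num_speakers must be between 1 and {MAX_SPEAKERS}")
        params["num_speakers"] = num_speakers
    return params


params = build_diarization_params(num_speakers=2)
print(params)

# Assumed usage with the client from the basic example (not executed here):
# job = client.speech_to_text_job.create_job(**params)
```

Validating the range locally catches an out-of-range count before any audio is uploaded.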

Output Format

When with_diarization=True is passed, the response includes a diarized_transcript field with speaker information:

```json
{
  "request_id": "20260130_d8d2c0e6-1eb6-4982-8045-b267d5165c44",
  "transcript": "Full transcript text...",
  "timestamps": {
    "words": ["Hello, how can I help you today?", "I have a question about my order."],
    "start_time_seconds": [0.01, 2.8],
    "end_time_seconds": [2.5, 5.2]
  },
  "diarized_transcript": {
    "entries": [
      {
        "transcript": "Hello, how can I help you today?",
        "start_time_seconds": 0.01,
        "end_time_seconds": 2.5,
        "speaker_id": "0"
      },
      {
        "transcript": "I have a question about my order.",
        "start_time_seconds": 2.8,
        "end_time_seconds": 5.2,
        "speaker_id": "1"
      }
    ]
  },
  "language_code": "en-IN"
}
```

Each entry contains:

  • transcript: The text spoken by the speaker
  • start_time_seconds: When the speaker started speaking (float)
  • end_time_seconds: When the speaker stopped speaking (float)
  • speaker_id: Unique identifier for the speaker (e.g., “0”, “1”)
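As a minimal sketch of how these fields might be consumed, the snippet below turns the entries into speaker-labeled lines and totals the speaking time per speaker. The function names format_dialogue and talk_time_by_speaker are hypothetical, and the sample data is copied from the example output above. In practice you would load the JSON file written by download_outputs; its file name is not specified here, so the loading step is left out.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Sample data copied from the example output; a real result would be
# loaded with json.load() from a file in the download directory.
result: Dict[str, Any] = {
    "diarized_transcript": {
        "entries": [
            {
                "transcript": "Hello, how can I help you today?",
                "start_time_seconds": 0.01,
                "end_time_seconds": 2.5,
                "speaker_id": "0",
            },
            {
                "transcript": "I have a question about my order.",
                "start_time_seconds": 2.8,
                "end_time_seconds": 5.2,
                "speaker_id": "1",
            },
        ]
    }
}


def get_entries(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return diarized entries sorted by start time (empty if absent)."""
    entries = result.get("diarized_transcript", {}).get("entries", [])
    return sorted(entries, key=lambda e: e["start_time_seconds"])


def format_dialogue(result: Dict[str, Any]) -> List[str]:
    """Render each entry as '[start-end] Speaker <id>: <text>'."""
    return [
        f"[{e['start_time_seconds']:.2f}-{e['end_time_seconds']:.2f}] "
        f"Speaker {e['speaker_id']}: {e['transcript']}"
        for e in get_entries(result)
    ]


def talk_time_by_speaker(result: Dict[str, Any]) -> Dict[str, float]:
    """Sum speaking time in seconds per speaker_id, rounded to 2 decimals."""
    totals: Dict[str, float] = defaultdict(float)
    for e in get_entries(result):
        totals[e["speaker_id"]] += e["end_time_seconds"] - e["start_time_seconds"]
    return {speaker: round(seconds, 2) for speaker, seconds in totals.items()}


for line in format_dialogue(result):
    print(line)
print(talk_time_by_speaker(result))
```

Note that speaker_id is a string, so it is used directly as a dictionary key rather than converted to an integer.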

Use Cases

| Use Case | Recommended Settings |
|---|---|
| Call center recordings | num_speakers=2 |
| Meetings | Let the model auto-detect |
| Interviews | Set num_speakers to the exact count |
| Podcasts | num_speakers between 2 and 4 |

Speaker diarization is available via the Batch API and has separate pricing. For detailed pricing information, visit dashboard.sarvam.ai.

→ Full Batch API Documentation