Batch API only: Speaker diarization is available through the Batch API, not the REST or Streaming APIs.
Speaker diarization identifies and labels different speakers in your audio, making it easy to know “who said what.” This is ideal for meetings, interviews, podcasts, and call center recordings.
If you don’t specify num_speakers, the model will automatically detect the number of speakers.
When with_diarization=True is passed, the response includes a diarized_transcript field that holds one entry per speaker segment.
Each entry contains:
transcript: The text spoken by the speaker
start_time_seconds: When the speaker started speaking (float)
end_time_seconds: When the speaker stopped speaking (float)
speaker_id: Unique identifier for the speaker (e.g., “0”, “1”)

Speaker diarization is available via the Batch API and is priced separately. For detailed pricing information, visit dashboard.sarvam.ai.
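
As a minimal sketch of how the entry fields described above might be consumed, the Python snippet below turns diarized entries into a readable "who said what" dialogue and totals the speaking time per speaker. It makes no API calls. The sample data is hand-written for illustration, and the assumption that diarized_transcript is a plain list of entries is ours; the actual response may nest the entries differently. The helper names format_dialogue and speaking_time are hypothetical.

```python
from collections import defaultdict

# Hand-written sample for illustration only; not real API output.
# Assumption: diarized_transcript is a plain list of entries carrying
# the four fields documented above.
sample_response = {
    "diarized_transcript": [
        {"transcript": "Hello, thanks for calling.",
         "start_time_seconds": 0.0, "end_time_seconds": 2.5, "speaker_id": "0"},
        {"transcript": "Hi, I have a billing question.",
         "start_time_seconds": 2.5, "end_time_seconds": 5.0, "speaker_id": "1"},
        {"transcript": "Sure, go ahead.",
         "start_time_seconds": 5.0, "end_time_seconds": 6.5, "speaker_id": "0"},
    ]
}


def format_dialogue(entries):
    """Return one timestamped line per entry, ordered by start time."""
    ordered = sorted(entries, key=lambda e: e["start_time_seconds"])
    return [
        f"[{e['start_time_seconds']:.1f}s-{e['end_time_seconds']:.1f}s] "
        f"Speaker {e['speaker_id']}: {e['transcript']}"
        for e in ordered
    ]


def speaking_time(entries):
    """Return total seconds spoken, keyed by speaker_id."""
    totals = defaultdict(float)
    for e in entries:
        totals[e["speaker_id"]] += e["end_time_seconds"] - e["start_time_seconds"]
    return dict(totals)


if __name__ == "__main__":
    entries = sample_response["diarized_transcript"]
    for line in format_dialogue(entries):
        print(line)
    print(speaking_time(entries))
```

Sorting by start_time_seconds keeps the dialogue in chronological order even if entries arrive grouped by speaker, and keying totals on speaker_id avoids any assumption about how many speakers were detected.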