How to enable speaker diarization
Batch API only: Speaker diarization is only available through the Batch API, not the REST or Streaming APIs.
Speaker diarization identifies and labels different speakers in your audio, making it easy to know “who said what.” This is ideal for meetings, interviews, podcasts, and call center recordings.
Key Features
- Automatic speaker detection
- Support for up to 10 speakers
- Speaker-wise transcription with timestamps
Parameters
If you don’t specify num_speakers, the model will automatically detect the number of speakers.
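As a rough illustration of how the two options relate, the sketch below assembles a parameter dictionary for a diarization job. The helper name `build_diarization_params` is hypothetical and not part of any SDK. Only `with_diarization`, `num_speakers`, and the 10-speaker limit come from this page. The lower bound of 1 is an assumption.

```python
from typing import Any, Dict, Optional

MAX_SPEAKERS = 10  # upper limit stated under Key Features


def build_diarization_params(num_speakers: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: assemble diarization options for a Batch API job."""
    params: Dict[str, Any] = {"with_diarization": True}
    if num_speakers is None:
        # Leave num_speakers out so the model detects the speaker count itself.
        return params
    # Assumption: a valid count is at least 1 and at most MAX_SPEAKERS.
    if not 1 <= num_speakers <= MAX_SPEAKERS:
        raise ValueError(f"num_speakers must be between 1 and {MAX_SPEAKERS}")
    params["num_speakers"] = num_speakers
    return params
```

Calling `build_diarization_params()` leaves the speaker count to automatic detection, while `build_diarization_params(2)` pins it to two speakers. How the resulting options are passed to a job depends on the client you use.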
Example Code
Basic Diarization
With Speaker Count
Output Format
When with_diarization=True is passed, the response includes a diarized_transcript field with speaker information:
Each entry contains:
- transcript: The text spoken by the speaker
- start_time_seconds: When the speaker started speaking (float)
- end_time_seconds: When the speaker stopped speaking (float)
- speaker_id: Unique identifier for the speaker (e.g., “0”, “1”)
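The fields above can be consumed with a few lines of Python. The following is a minimal sketch, assuming the job output has already been parsed into a dictionary and that `diarized_transcript` holds the entries either directly as a list or nested under an `entries` key (the nesting is an assumption, so check your actual output). The function names and the sample data are illustrative only.

```python
from collections import defaultdict
from typing import Any, Dict, List


def get_entries(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the diarized entries; the "entries" nesting is an assumption."""
    diarized = response.get("diarized_transcript") or []
    if isinstance(diarized, dict):
        diarized = diarized.get("entries", [])
    return list(diarized)


def format_dialogue(entries: List[Dict[str, Any]]) -> List[str]:
    """Render one "who said what" line per entry, ordered by start time."""
    ordered = sorted(entries, key=lambda e: e["start_time_seconds"])
    return [
        f'[{e["start_time_seconds"]:.2f}-{e["end_time_seconds"]:.2f}] '
        f'Speaker {e["speaker_id"]}: {e["transcript"]}'
        for e in ordered
    ]


def talk_time_by_speaker(entries: List[Dict[str, Any]]) -> Dict[str, float]:
    """Sum the speaking duration in seconds for each speaker_id."""
    totals: Dict[str, float] = defaultdict(float)
    for e in entries:
        totals[e["speaker_id"]] += e["end_time_seconds"] - e["start_time_seconds"]
    return dict(totals)


# Made-up sample for illustration; not real API output.
sample_response = {
    "diarized_transcript": {
        "entries": [
            {"transcript": "Hello, thanks for joining.", "start_time_seconds": 0.0,
             "end_time_seconds": 2.5, "speaker_id": "0"},
            {"transcript": "Happy to be here.", "start_time_seconds": 2.5,
             "end_time_seconds": 4.0, "speaker_id": "1"},
            {"transcript": "Let's get started.", "start_time_seconds": 4.0,
             "end_time_seconds": 7.0, "speaker_id": "0"},
        ]
    }
}

if __name__ == "__main__":
    entries = get_entries(sample_response)
    for line in format_dialogue(entries):
        print(line)
    print(talk_time_by_speaker(entries))
```

Sorting by `start_time_seconds` keeps the dialogue in chronological order even if entries arrive grouped by speaker, and the per-speaker totals are a simple starting point for meeting or call analytics.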
Use Cases
Speaker diarization has separate pricing. For detailed pricing information, visit dashboard.sarvam.ai.