# How to enable speaker diarization

> Identify and distinguish between multiple speakers in audio using the Batch API.

**Batch API only:** Speaker diarization is only available through the Batch API, not the REST or Streaming APIs.

Speaker diarization identifies and labels different speakers in your audio, making it easy to know "who said what." This is ideal for meetings, interviews, podcasts, and call center recordings.

### Key Features

* Automatic speaker detection
* Support for up to 10 speakers
* Speaker-wise transcription with timestamps

### Parameters

| Parameter          | Type    | Description                                   |
| ------------------ | ------- | --------------------------------------------- |
| `with_diarization` | boolean | Enable speaker diarization (default: `false`) |
| `num_speakers`     | integer | Expected number of speakers (optional, 1-10)  |

If you don't specify `num_speakers`, the model will automatically detect the number of speakers.
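
The parameter rules above can be checked client-side before a job is created. The following is a minimal sketch, not part of the SDK: `build_diarization_params` and `MAX_SPEAKERS` are hypothetical names, and the 1-10 bound is taken from the table above.

```python
from typing import Any, Dict, Optional

MAX_SPEAKERS = 10  # upper bound taken from the parameter table (assumption: inclusive)


def build_diarization_params(num_speakers: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: return keyword arguments that enable diarization.

    Leaving num_speakers as None omits the field, so the model
    auto-detects the number of speakers.
    """
    params: Dict[str, Any] = {"with_diarization": True}
    if num_speakers is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(num_speakers, bool) or not isinstance(num_speakers, int):
            raise TypeError("num_speakers must be an integer")
        if not 1 <= num_speakers <= MAX_SPEAKERS:
            raise ValueError(f"num_speakers must be between 1 and {MAX_SPEAKERS}")
        params["num_speakers"] = num_speakers
    return params


print(build_diarization_params())   # auto-detect
print(build_diarization_params(3))  # fixed speaker count
```

The returned dictionary could then be unpacked into the job call used in the examples on this page, for instance `create_job(model="saaras:v3", language_code="hi-IN", mode="transcribe", **build_diarization_params(3))`.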

### Example Code

```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Create batch job with diarization
job = client.speech_to_text_job.create_job(
    model="saaras:v3",
    language_code="hi-IN",
    mode="transcribe",
    with_diarization=True
)

# Upload audio files
job.upload_files(file_paths=["meeting_recording.mp3"])

# Start processing
job.start()

# Wait for completion
job.wait_until_complete()

# Download results
job.download_outputs(output_dir="./output")
```

```javascript
import { SarvamAIClient } from "sarvamai";

const client = new SarvamAIClient({
    apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

// Create batch job with diarization
const job = await client.speechToTextJob.createJob({
    model: "saaras:v3",
    languageCode: "hi-IN",
    mode: "transcribe",
    withDiarization: true
});

// Upload audio files
await job.uploadFiles({ filePaths: ["meeting_recording.mp3"] });

// Start processing
await job.start();

// Wait for completion
await job.waitUntilComplete();

// Download results
await job.downloadOutputs({ outputDir: "./output" });
```

```bash
# Step 1: Create job
curl -X POST https://api.sarvam.ai/speech-to-text/job \
  -H "api-subscription-key: <YOUR_SARVAM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "saaras:v3",
    "language_code": "hi-IN",
    "mode": "transcribe",
    "with_diarization": true
  }'

# Step 2: Upload files using job_id from response
# Step 3: Start job and poll for completion
```

If you know the number of speakers in advance, pass `num_speakers` as well:

```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Create batch job with known speaker count
job = client.speech_to_text_job.create_job(
    model="saaras:v3",
    language_code="en-IN",
    mode="transcribe",
    with_diarization=True,
    num_speakers=3  # Interview with 3 participants
)

job.upload_files(file_paths=["interview.mp3"])
job.start()
job.wait_until_complete()
job.download_outputs(output_dir="./output")
```

```javascript
import { SarvamAIClient } from "sarvamai";

const client = new SarvamAIClient({
    apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

// Create batch job with known speaker count
const job = await client.speechToTextJob.createJob({
    model: "saaras:v3",
    languageCode: "en-IN",
    mode: "transcribe",
    withDiarization: true,
    numSpeakers: 3  // Interview with 3 participants
});

await job.uploadFiles({ filePaths: ["interview.mp3"] });
await job.start();
await job.waitUntilComplete();
await job.downloadOutputs({ outputDir: "./output" });
```

```bash
curl -X POST https://api.sarvam.ai/speech-to-text/job \
  -H "api-subscription-key: <YOUR_SARVAM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "saaras:v3",
    "language_code": "en-IN",
    "mode": "transcribe",
    "with_diarization": true,
    "num_speakers": 3
  }'
```

### Output Format

When diarization is enabled, the output includes a `diarized_transcript` field with per-speaker segments alongside the full transcript:

```json
{
  "request_id": "20260130_d8d2c0e6-1eb6-4982-8045-b267d5165c44",
  "transcript": "Full transcript text...",
  "timestamps": {
    "words": ["Hello, how can I help you today?", "I have a question about my order."],
    "start_time_seconds": [0.01, 2.8],
    "end_time_seconds": [2.5, 5.2]
  },
  "diarized_transcript": {
    "entries": [
      {
        "transcript": "Hello, how can I help you today?",
        "start_time_seconds": 0.01,
        "end_time_seconds": 2.5,
        "speaker_id": "0"
      },
      {
        "transcript": "I have a question about my order.",
        "start_time_seconds": 2.8,
        "end_time_seconds": 5.2,
        "speaker_id": "1"
      }
    ]
  },
  "language_code": "en-IN"
}
```

Each entry contains:

* `transcript`: The text spoken in this segment
* `start_time_seconds`: Start of the segment, in seconds from the beginning of the audio (float)
* `end_time_seconds`: End of the segment, in seconds from the beginning of the audio (float)
* `speaker_id`: Identifier for the speaker, returned as a string (e.g., "0", "1")
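
As an illustration of how these fields can be consumed, here is a minimal sketch that turns the `entries` list into "who said what" lines and sums talk time per speaker. It assumes the output has already been loaded into a dictionary shaped like the sample above; `format_turns` and `talk_time_by_speaker` are hypothetical helper names, not SDK functions.

```python
from collections import defaultdict
from typing import Any, Dict, List


def _entries(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the diarized entries, or an empty list if diarization was off."""
    return (result.get("diarized_transcript") or {}).get("entries") or []


def format_turns(result: Dict[str, Any]) -> List[str]:
    """Render each entry as '[start-end] Speaker N: text'."""
    return [
        f"[{e['start_time_seconds']:.2f}-{e['end_time_seconds']:.2f}] "
        f"Speaker {e['speaker_id']}: {e['transcript']}"
        for e in _entries(result)
    ]


def talk_time_by_speaker(result: Dict[str, Any]) -> Dict[str, float]:
    """Sum segment durations (in seconds) for each speaker_id."""
    totals: Dict[str, float] = defaultdict(float)
    for e in _entries(result):
        totals[e["speaker_id"]] += e["end_time_seconds"] - e["start_time_seconds"]
    return dict(totals)


# Sample data copied from the output shown above.
sample = {
    "diarized_transcript": {
        "entries": [
            {"transcript": "Hello, how can I help you today?",
             "start_time_seconds": 0.01, "end_time_seconds": 2.5, "speaker_id": "0"},
            {"transcript": "I have a question about my order.",
             "start_time_seconds": 2.8, "end_time_seconds": 5.2, "speaker_id": "1"},
        ]
    }
}

for line in format_turns(sample):
    print(line)
print(talk_time_by_speaker(sample))
```

Because `speaker_id` is a string in the sample output, the sketch uses it as a dictionary key as-is rather than converting it to an integer.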

### Use Cases

| Use Case               | Recommended Settings  |
| ---------------------- | --------------------- |
| Call center recordings | `num_speakers=2`      |
| Meetings               | Let model auto-detect |
| Interviews             | Specify exact count   |
| Podcasts               | `num_speakers` of 2 to 4 |

Speaker diarization is priced separately from standard Batch API transcription. For detailed pricing information, visit [dashboard.sarvam.ai](https://dashboard.sarvam.ai).

→ [Full Batch API Documentation](/api-reference-docs/api-guides-tutorials/speech-to-text/batch-api)