Frequently Asked Questions

# FAQs

> Frequently asked questions about Sarvam AI speech-to-text services. Get answers about models, pricing, language support, audio formats, and implementation best practices.

<p>
  Find answers to common questions about our speech-to-text services
</p>

## General Questions

**REST and Batch APIs** support a wide range of audio formats including:

* WAV
* MP3
* M4A
* AAC
* OGG
* FLAC
* WebM
* PCM (pcm\_s16le, pcm\_l16, pcm\_raw)

**WebSocket/Streaming APIs** only support:

* WAV
* Raw PCM (pcm\_s16le, pcm\_l16, pcm\_raw)

For optimal results, we recommend:

* Sample rate: 16 kHz or higher
* Bit depth: 16-bit
* Channels: Mono or Stereo
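
A quick local check can catch files that fall short of these recommendations before you upload them. The sketch below uses Python's standard `wave` module, so it only covers WAV files; the function name `check_wav` and the warning messages are illustrative and not part of any Sarvam SDK.

```python
import wave


def check_wav(source):
    """Return a list of warnings for a WAV file (path or binary file object)."""
    warnings = []
    with wave.open(source, "rb") as wav:
        sample_rate = wav.getframerate()    # samples per second
        bit_depth = wav.getsampwidth() * 8  # bytes per sample -> bits
        channels = wav.getnchannels()
        frames = wav.getnframes()

    if frames == 0:
        warnings.append("file contains no audio frames")
    if sample_rate < 16000:
        warnings.append(f"sample rate is {sample_rate} Hz; 16000 Hz or higher is recommended")
    if bit_depth != 16:
        warnings.append(f"bit depth is {bit_depth}-bit; 16-bit is recommended")
    if channels not in (1, 2):
        warnings.append(f"{channels} channels found; mono or stereo is recommended")
    return warnings
```

An empty list means the file matches the recommendations above. A non-empty list is a hint to resample or re-export, not proof that the API will reject the file.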

Our models support multiple Indian and global languages:

### Indian Languages

* Hindi
* English (Indian)
* Bengali
* Tamil
* Telugu
* Kannada
* Malayalam
* Marathi
* Gujarati
* Punjabi

### Global Languages

* English (US, UK, AU)
* French
* German
* Spanish
* Japanese

Check our [models page](/api-reference-docs/getting-started/models) for the complete list and specific model capabilities.

The limits vary by API endpoint:

### REST API

* Maximum duration: 30 seconds per request

### Batch API

* Maximum duration: 2 hours per file
* Maximum files per job: 20

### WebSocket API (Streaming)

* Continuous streaming with chunked audio — no duration limit
* Concurrency limits apply per plan (see [Rate Limits](/api-reference-docs/ratelimits))

For audio longer than 30 seconds, use the [Batch API](/api-reference-docs/api-guides-tutorials/speech-to-text/batch-api). For files longer than 2 hours, we recommend:

1. Splitting into smaller segments
2. Contacting support for custom solutions
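
The routing rule above can be sketched in a few lines. The function names `choose_api` and `split_plan` are hypothetical helpers, and the fixed-offset split is an assumption made for simplicity; in practice you would probably cut at silences so that words are not split across segments.

```python
REST_MAX_SECONDS = 30            # REST API: 30 seconds per request
BATCH_MAX_SECONDS = 2 * 60 * 60  # Batch API: 2 hours per file


def choose_api(duration_seconds):
    """Pick an endpoint family for a pre-recorded file of the given length."""
    if duration_seconds <= REST_MAX_SECONDS:
        return "rest"
    if duration_seconds <= BATCH_MAX_SECONDS:
        return "batch"
    return "split"  # too long for one Batch file: split it or contact support


def split_plan(duration_seconds, max_segment_seconds=BATCH_MAX_SECONDS):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    segments = []
    start = 0
    while start < duration_seconds:
        end = min(start + max_segment_seconds, duration_seconds)
        segments.append((start, end))
        start = end
    return segments
```

For example, a 5-hour recording (18,000 seconds) yields three segments: `split_plan(18000)` returns `[(0, 7200), (7200, 14400), (14400, 18000)]`.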

Accuracy varies based on several factors:

### Typical Accuracy Rates

* Clear speech, minimal background noise: 95-98%
* Multiple speakers, moderate noise: 90-95%
* Heavy accent or background noise: 85-90%

Factors affecting accuracy:

* Audio quality
* Background noise
* Speaker accent
* Speaking speed
* Domain-specific terminology

Use our [interactive API reference](/api-reference-docs/speech-to-text/transcribe) to test with your specific audio.

## Technical Questions

Speaker diarization identifies and labels different speakers in the audio:

1. **Process**:
   * Voice activity detection
   * Speaker segmentation
   * Speaker clustering
   * Speaker labeling

2. **Usage** (via Batch API):

```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Speaker diarization is available through the Batch API
# See: https://docs.sarvam.ai/api-reference-docs/speech-to-text/batch
job = client.speech_to_text_job.create_job(
    model="saaras:v3",
    mode="transcribe",
    with_diarization=True,
)
job.upload_files(file_paths=["audio.mp3"])
job.start()
job.wait_until_complete()
job.download_outputs(output_dir="./output")
```

3. **Output**:
   ```json
   {
     "segments": [
       {
         "speaker": "Speaker 1",
         "text": "Hello, how are you?",
         "start": 0.0,
         "end": 1.5
       },
       {
         "speaker": "Speaker 2",
         "text": "I'm doing well, thanks!",
         "start": 1.8,
         "end": 3.2
       }
     ]
   }
   ```
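
Once the job output is downloaded, the segments can be turned into a readable transcript. This is a minimal sketch that assumes the output file has the `segments` shape shown above; check your actual output files, since field names may differ. `format_transcript` is an illustrative helper, not an SDK function.

```python
import json

# Sample output copied from the example above.
sample = json.loads("""
{
  "segments": [
    {"speaker": "Speaker 1", "text": "Hello, how are you?", "start": 0.0, "end": 1.5},
    {"speaker": "Speaker 2", "text": "I'm doing well, thanks!", "start": 1.8, "end": 3.2}
  ]
}
""")


def format_transcript(result):
    """Render diarized segments as one readable line per speaker turn."""
    lines = []
    for seg in sorted(result["segments"], key=lambda s: s["start"]):
        lines.append(
            f'[{seg["start"]:.1f}s to {seg["end"]:.1f}s] {seg["speaker"]}: {seg["text"]}'
        )
    return lines


for line in format_transcript(sample):
    print(line)
```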

Rate limits are applied per account based on your subscription plan:

| Plan           | Rate Limit         |
| -------------- | ------------------ |
| **Starter**    | 60 requests/min    |
| **Pro**        | 200 requests/min   |
| **Business**   | 1,000 requests/min |
| **Enterprise** | Custom limits      |

### Duration Limits

* **REST API**: Max 30 seconds of audio per request
* **Batch API**: Up to 2 hours per file, 20 files per job
* **Streaming API**: Continuous (chunked) streaming; concurrency limits per plan

For batch endpoints, implement a minimum 5ms delay between status polling requests.

View the full [Credits & Rate Limits](/api-reference-docs/ratelimits) page for details on HTTP headers, error handling, and upgrade paths.

Common errors and solutions:

### 1. Authentication Errors (403)

```json
{
  "error": {
    "code": "invalid_api_key_error",
    "message": "API key is invalid or expired"
  }
}
```

Solution: Confirm that the API key is valid, has not expired, and is sent in the `api-subscription-key` header. **Note:** Sarvam returns HTTP `403` (not `401`) for invalid or missing API keys; see the [Authentication](/api-reference-docs/authentication) page.

### 2. Rate Limit / Quota Errors (429)

```json
{
  "error": {
    "code": "insufficient_quota_error",
    "message": "API quota exceeded"
  }
}
```

Solution: The fix depends on the error code. A `429` with `rate_limit_exceeded_error` means too many requests, so retry with exponential backoff. A `429` with `insufficient_quota_error` means credits are exhausted, so retrying will not help; add credits or upgrade your plan. See [Errors & Troubleshooting](/api-reference-docs/errors-troubleshooting).
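
A retry wrapper along these lines keeps the two `429` cases apart. It is a sketch under stated assumptions: `ApiError` is a hypothetical exception that your own HTTP layer would raise with the status and error code, and the delay values are placeholders rather than recommended settings.

```python
import random
import time


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status and Sarvam error code."""

    def __init__(self, status, code):
        super().__init__(f"{status}: {code}")
        self.status = status
        self.code = code


def call_with_backoff(request_fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call request_fn, retrying only on rate-limit 429s with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except ApiError as err:
            retryable = err.status == 429 and err.code == "rate_limit_exceeded_error"
            if not retryable or attempt == max_retries:
                raise  # quota exhaustion and other errors will not fix themselves
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use, leave it at the default.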

### 3. Invalid Input (400)

```json
{
  "error": {
    "code": "invalid_request_error",
    "message": "Unsupported audio format"
  }
}
```

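Solution: Confirm that the file is in a format supported by the API you are calling (the WebSocket/Streaming APIs accept only WAV and raw PCM) and that it stays within the duration limits.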

### 4. Failed to read the file (400)

```json
{
  "error": {
    "message": "Failed to read the file, please check the audio format.",
    "code": "invalid_request_error"
  }
}
```

This almost always means the uploaded bytes are not a readable audio file — not that the format is unsupported. Common causes:

* **Empty or zero-length file** — the upload contains no bytes, or a buffer of all zeros
* **Empty WebM blob from a browser recorder** — `MediaRecorder` produced a header with no audio frames (see "How do I record audio in the browser?" below)
* **Junk or placeholder bytes** — the payload isn't a real audio container
* **Truncated or incomplete container** — the file was cut off during recording, download, or copy
* **Passing a filename string instead of a file object** — use `file=open("audio.wav", "rb")` in Python, not `file="audio.wav"`

Solution: before uploading, verify the file exists, its size is greater than 0, and you're passing a file handle/stream (not a path string).
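
Those checks can be wrapped in a small helper. The name `open_audio_for_upload` is illustrative. The sketch covers only the missing-file, empty-file, and path-string cases; it cannot detect a truncated container or a buffer of zeros.

```python
import os


def open_audio_for_upload(path):
    """Validate a local audio file and return an open binary handle for upload."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such audio file: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"Audio file is empty: {path}")
    return open(path, "rb")  # pass this handle as file=..., not the path string
```

Use it as a context manager (`with open_audio_for_upload("audio.wav") as f:`) so the handle is closed after the request, and pass `f` as the `file` argument.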

See our [error handling guide](/api-reference-docs/errors-troubleshooting) for more details.

**How do I record audio in the browser?**

All the Node.js examples in these docs read audio with `fs.createReadStream(...)`, which doesn't exist in the browser. To transcribe microphone audio from a web page, record with `MediaRecorder` and upload the resulting blob.

The most common mistake is uploading an **empty WebM blob** (a container header with no audio frames), which the API rejects with `"Failed to read the file, please check the audio format."` The recipe below avoids that by stopping the recorder cleanly, waiting for the final `dataavailable` event, and checking `blob.size > 0` before uploading:

```javascript
async function recordAndTranscribe(durationMs = 5000) {
  // 1. Capture the microphone
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // 2. Pick a supported mimeType (WebM/Opus is widely supported and accepted by the API)
  const mimeType = MediaRecorder.isTypeSupported("audio/webm;codecs=opus")
    ? "audio/webm;codecs=opus"
    : "audio/webm";

  const recorder = new MediaRecorder(stream, { mimeType });
  const chunks = [];
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  };

  // 3. Stop cleanly: the final dataavailable fires before "stop" resolves
  const stopped = new Promise((resolve) => (recorder.onstop = resolve));
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;
  stream.getTracks().forEach((track) => track.stop());

  // 4. Never upload an empty recording
  const blob = new Blob(chunks, { type: mimeType });
  if (blob.size === 0) {
    throw new Error("Recording is empty — no audio frames were captured.");
  }

  // 5. Upload to the Speech-to-Text API
  const formData = new FormData();
  formData.append("file", blob, "recording.webm");
  formData.append("model", "saaras:v3");
  formData.append("mode", "transcribe");

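  // SARVAM_API_KEY must be defined by the caller (see the note below about keeping it server-side)
  const response = await fetch("https://api.sarvam.ai/speech-to-text", {
    method: "POST",
    headers: { "api-subscription-key": SARVAM_API_KEY },
    body: formData,
  });
  if (!response.ok) {
    // Surface API errors (for example 400 or 403) instead of returning them as a transcript
    throw new Error(`Speech-to-text request failed with HTTP ${response.status}`);
  }
  return await response.json();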
}
```

Pre-flight checklist before any upload:

* The recording/file exists and `size > 0`
* You're sending the blob/file object, not a path or filename string
* Audio longer than 30 seconds goes to the [Batch API](/api-reference-docs/api-guides-tutorials/speech-to-text/batch-api) instead of the sync REST endpoint

Don't ship your API key in client-side code. In production, upload the recording to your own backend and call the Sarvam API from there.

Tips for optimal real-time performance:

1. **Audio Settings**

```javascript
const config = {
  sampleRate: 16000,
  encoding: 'LINEAR16',
  channels: 1
}
```

2. **Chunk Size**

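* Recommended: 100 ms to 500 ms of audio per chunk
* Smaller chunks lower latency; larger chunks give the model more context, so choose a size that balances latency against accuracy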

3. **WebSocket Connection**

```javascript
const ws = new WebSocket('wss://api.sarvam.ai/v1/stt/stream')
ws.binaryType = 'arraybuffer'
```

4. **Error Handling**

```javascript
ws.onerror = (error) => {
  console.error('WebSocket Error:', error)
  // Implement reconnection logic
}
```

View our [real-time streaming guide](/api-reference-docs/speech-to-text/apis/streaming) for detailed examples.
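
To turn the chunk durations above into buffer sizes, multiply out the audio settings: at 16 kHz, 16-bit, mono, one second of raw PCM is 16000 × 2 × 1 = 32,000 bytes, so a 100 ms chunk is 3,200 bytes and a 500 ms chunk is 16,000 bytes. The helpers below are an illustrative sketch; `chunk_size_bytes` and `iter_chunks` are hypothetical names, not SDK functions.

```python
def chunk_size_bytes(sample_rate=16000, bit_depth=16, channels=1, chunk_ms=100):
    """Bytes of raw PCM audio in one chunk of chunk_ms milliseconds."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return bytes_per_second * chunk_ms // 1000


def iter_chunks(pcm_bytes, size):
    """Yield consecutive slices of raw PCM data; the last one may be shorter."""
    for offset in range(0, len(pcm_bytes), size):
        yield pcm_bytes[offset:offset + size]
```

Keep the chunk size a multiple of the frame size (2 bytes for 16-bit mono) so that no sample is split across two chunks.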

## Billing & Support

Usage is calculated based on:

1. **Audio Duration**

* Rounded up to the nearest second
* Minimum charge: 1 second

2. **Features Used**

* Base transcription
* Speaker diarization (+20%)
* Language detection (+10%)
* Word timestamps (+10%)

3. **Model Type**

* Saarika: Base rate
* Saaras: Premium rate

Example calculation:

```
5 minutes audio × Base rate
+ Speaker diarization (20%)
+ Word timestamps (10%)
= Total cost
```
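
The same calculation as a sketch in code. Two things here are assumptions: the surcharges are treated as additive percentages of the base cost (not compounded), and the per-second rate is a placeholder that you would replace with the current price for your model.

```python
import math
from decimal import Decimal

# Surcharges from the list above, as fractions of the base cost.
SURCHARGES = {
    "diarization": Decimal("0.20"),
    "language_detection": Decimal("0.10"),
    "word_timestamps": Decimal("0.10"),
}


def estimate_cost(duration_seconds, rate_per_second, features=()):
    """Estimate cost, assuming surcharges add to (not compound on) the base cost."""
    billed_seconds = max(1, math.ceil(duration_seconds))  # round up, 1 s minimum
    base = billed_seconds * Decimal(str(rate_per_second))
    multiplier = 1 + sum(SURCHARGES[name] for name in features)
    return base * multiplier
```

With a placeholder rate of 0.01 per second, 5 minutes of audio with diarization and word timestamps comes to 300 × 0.01 × 1.30 = 3.90, in whatever currency unit the rate uses.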

Multiple support channels are available:

1. **Documentation**

* [API Reference](/api-reference-docs/speech-to-text/transcribe)
* [Guides](/api-reference-docs/getting-started/quickstart)
* [Examples](https://github.com/sarvamai/sarvam-ai-cookbook)

2. **Community**

* [Discord Community](https://discord.com/invite/5rAsykttcs)

3. **Direct Support**

* Email: [developer@sarvam.ai](mailto:developer@sarvam.ai)
* Enterprise: Dedicated support manager

## Still Have Questions?

<h3>
  Can't find what you're looking for?
</h3>

<p>
  Our team is here to help! Reach out through any of our support channels.
</p>

[Join Discord](https://discord.com/invite/5rAsykttcs)

[Email Support](mailto:developer@sarvam.ai)