# Saarika

> Saarika v2.5 - High-accuracy speech recognition model for Indian languages with superior multi-speaker handling, telephony optimization, and automatic code-mixing support.

Saarika v2.5 is our flagship speech recognition model, designed specifically for Indian languages and accents. It always transcribes audio in the language in which it was spoken. It handles complex multi-speaker conversations, telephony audio, and code-mixed speech across 11 languages.

**Deprecation Notice:** Saarika v2.5 will be deprecated soon. For transcription features, we recommend using [**Saaras v3**](/api-reference-docs/getting-started/models/saaras) with `mode="transcribe"`, which offers improved accuracy and additional output modes.

## Key Features

* Optimized for 8 kHz telephony audio, with enhanced noise handling and multi-speaker recognition.
* Preserves proper nouns and entities accurately across languages, maintaining context and meaning in transcriptions.
* Optional automatic language identification (LID): set `language_code="unknown"` when the language is not known, and the detected language is returned in the response.
* Provides diarized outputs with timestamps for multi-speaker conversations through the Batch API.
* Handles mid-sentence language switches in code-mixed speech, which is common in India's multilingual conversations.
* Supports 11 Indian languages, including mixed-language audio.

## Language Support

Saarika supports the 11 languages listed below, with dialect and accent coverage, code-mixed audio support, and proper noun preservation.

| Language  | Language Code |
| --------- | ------------- |
| English   | `en-IN`       |
| Hindi     | `hi-IN`       |
| Bengali   | `bn-IN`       |
| Tamil     | `ta-IN`       |
| Telugu    | `te-IN`       |
| Gujarati  | `gu-IN`       |
| Kannada   | `kn-IN`       |
| Malayalam | `ml-IN`       |
| Marathi   | `mr-IN`       |
| Punjabi   | `pa-IN`       |
| Odia      | `od-IN`       |

For automatic language detection, use `language_code="unknown"`. The model will automatically identify the spoken language and return it in the response.
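
A minimal sketch of validating a language code before sending a request, assuming you want unset values to fall back to automatic detection. `resolve_language_code` and the constant names are hypothetical helpers for illustration and are not part of the SDK; the codes are copied from the table above.

```python
# Language codes copied from the table above.
SUPPORTED_LANGUAGE_CODES = {
    "en-IN", "hi-IN", "bn-IN", "ta-IN", "te-IN", "gu-IN",
    "kn-IN", "ml-IN", "mr-IN", "pa-IN", "od-IN",
}
# Special value that requests automatic language detection.
AUTO_DETECT = "unknown"


def resolve_language_code(code=None):
    """Return a value suitable for `language_code` (hypothetical helper).

    None or a blank string falls back to automatic detection;
    any code outside the table raises ValueError.
    """
    normalized = (code or "").strip()
    if not normalized:
        return AUTO_DETECT
    if normalized == AUTO_DETECT or normalized in SUPPORTED_LANGUAGE_CODES:
        return normalized
    raise ValueError(f"Unsupported language code: {code!r}")
```

The returned value can then be passed as `language_code` in any of the requests shown on this page.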

## Performance Benchmarks

Saarika delivers exceptional accuracy across all supported languages, as measured on the VISTAAR Benchmark.

### CER (Character Error Rate) Scores

*Lower is better. Measured on the VISTAAR Benchmark.*

* **Across 11 Languages: 4.96%**
* **English: 4.45%**
* **Hindi: 4.42%**
* **9 Other languages: 5.07%**

### WER (Word Error Rate) Scores

*Lower is better. Measured on the VISTAAR Benchmark.*

* **Across 11 Languages: 18.32%**
* **English: 8.26%**
* **Hindi: 11.81%**
* **9 Other languages: 20.15%**

### Detailed CER Performance by Language

CER (Character Error Rate) is the percentage of characters in a transcription that differ from the reference text. Lower scores are better; 0% is a perfect transcription.
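
As a rough illustration of how these metrics are typically computed, the sketch below derives CER and WER from the Levenshtein (edit) distance between a reference and a hypothesis transcript. This is the textbook definition, not necessarily the exact scoring procedure behind the VISTAAR numbers above: text normalization is omitted, and characters are counted as Unicode code points, which may differ from how a benchmark treats Indic combining marks. All function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate in percent, using whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(cer("hello", "hallo"))                        # 20.0 (1 substitution over 5 characters)
print(round(wer("the cat sat", "the cat sit"), 2))  # 33.33 (1 wrong word out of 3)
```

Because a single wrong character makes the whole word count as an error, WER is usually noticeably higher than CER on the same transcript, which is consistent with the scores reported above.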

## Key Capabilities

**Legacy Model:** Saarika v2.5 is a legacy model. We recommend using **Saaras v3** (`model="saaras:v3"`) with the `mode` parameter for the best accuracy and features. See the [Saaras documentation](/api-reference-docs/models/saaras) for details.
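
A minimal migration sketch, assuming your existing code passes the same keyword arguments shown in the examples on this page. The helper `to_saaras_v3` is hypothetical and only rewrites request parameters; it does not call the API. It uses the `model="saaras:v3"` and `mode="transcribe"` values stated in the notices on this page, and you should confirm against the Saaras documentation that your SDK version accepts `mode`.

```python
def to_saaras_v3(params):
    """Return a copy of Saarika request parameters rewritten for Saaras v3.

    Hypothetical helper for illustration: parameters for other models
    are returned unchanged, and the input dict is never mutated.
    """
    migrated = dict(params)
    model = str(migrated.get("model") or "")
    if model.startswith("saarika:"):
        migrated["model"] = "saaras:v3"
        # Keep an explicitly chosen mode; otherwise default to transcription.
        migrated.setdefault("mode", "transcribe")
    return migrated


old_params = {"model": "saarika:v2.5", "language_code": "hi-IN"}
print(to_saaras_v3(old_params))
```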

Basic transcription with a specified language code. Best suited to single-language content with clear audio.

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY"
)

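# Open the file in a context manager so the handle is closed after the request
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5",
        language_code="hi-IN"
    )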

print(response)
```

```javascript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const API_KEY = "YOUR_SARVAM_API_KEY";
const FILE_PATH = "/path/to/audio.wav"; // or .mp3

async function main() {
  const client = new SarvamAIClient({ apiSubscriptionKey: API_KEY });

  const response = await client.speechToText.transcribe({
    file: fs.createReadStream(FILE_PATH),
    model: "saarika:v2.5",
    language_code: "hi-IN",
  });

  console.log(response);
}

main();
```

```bash
curl -X POST https://api.sarvam.ai/speech-to-text \
  -H "api-subscription-key: <YOUR_SARVAM_API_KEY>" \
  -H "Content-Type: multipart/form-data" \
  -F model="saarika:v2.5" \
  -F language_code="hi-IN" \
  -F file=@file.wav
```

Handles mixed-language content, automatically detecting language switches within sentences. The examples below omit `language_code`. Suited to natural Indian conversations that mix multiple languages.

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY"
)

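# Open the file in a context manager so the handle is closed after the request
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5"
    )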

print(response)

# Example Output:
# {
#   "request_id": "20250430_b7cbeb34-3ff2-4730-abaf-90d23fca9827",
#   "transcript": "मैंने apply किया but rejected हो गया",
#   "language_code": "en-IN"
# }
```

```javascript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const API_KEY = "YOUR_SARVAM_API_KEY";
const FILE_PATH = "/path/to/audio.wav"; // or .mp3

async function main() {
  const client = new SarvamAIClient({ apiSubscriptionKey: API_KEY });

  const response = await client.speechToText.transcribe({
    file: fs.createReadStream(FILE_PATH),
    model: "saarika:v2.5",
  });

  console.log(response);
}

main();

// Example Output:
// {
//  "request_id": "20250430_b7cbeb34-3ff2-4730-abaf-90d23fca9827",
//  "transcript": "मैंने apply किया but rejected हो गया",
//  "language_code": "en-IN"
// }
```

```bash
curl -X POST https://api.sarvam.ai/speech-to-text \
  -H "api-subscription-key: <YOUR_SARVAM_API_KEY>" \
  -H "Content-Type: multipart/form-data" \
  -F model="saarika:v2.5" \
  -F file=@file.wav
```

Let Saarika automatically detect the spoken language by setting `language_code="unknown"`. Useful when the input language is not known in advance or when handling multi-language content.

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY"
)

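# Open the file in a context manager so the handle is closed after the request
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5",
        language_code="unknown"  # Enables automatic language detection
    )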

print(response)

# Example Output:
# {
#   "request_id": "20250430_78730d0e-532c-4d1c-949a-a0469f86f932",
#   "transcript": "என் பெயர் வியான். எனது குரல் நம்பகமானதாகவும் பலத்துறையிலும் பயன்படும் வகையிலும் இருக்கும்.",
#   "language_code": "ta-IN"
# }
```

```javascript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const API_KEY = "YOUR_SARVAM_API_KEY";
const FILE_PATH = "/path/to/audio.wav"; // Can be .mp3 or .wav

async function main() {
  const client = new SarvamAIClient({ apiSubscriptionKey: API_KEY });

  const response = await client.speechToText.transcribe({
    file: fs.createReadStream(FILE_PATH),
    model: "saarika:v2.5",
    language_code: "unknown",
  });

  console.log(response);
}

main();

// Example Output:
// {
//  "request_id": "20250430_78730d0e-532c-4d1c-949a-a0469f86f932",
//  "transcript": "என் பெயர் வியான். எனது குரல் நம்பகமானதாகவும் பலத்துறையிலும் பயன்படும் வகையிலும் இருக்கும்.",
//  "language_code": "ta-IN"
// }
```

```bash
curl -X POST https://api.sarvam.ai/speech-to-text \
  -H "api-subscription-key: <YOUR_SARVAM_API_KEY>" \
  -H "Content-Type: multipart/form-data" \
  -F model="saarika:v2.5" \
  -F language_code="unknown" \
  -F file=@file.wav
```

## Limits

| Limit                                             | Value                                                                                                           |
| ------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| Max audio duration (real-time REST)               | 30 seconds                                                                                                      |
| Supported formats                                 | WAV, MP3, AAC, AIFF, OGG, OPUS, FLAC, MP4, AMR, WMA, WebM (auto-detected)                                       |
| Raw PCM input (`pcm_s16le`, `pcm_l16`, `pcm_raw`) | Requires `input_audio_codec`; must be 16 kHz                                                                    |
| Longer audio                                      | Use the [Batch API](/api-reference-docs/api-guides-tutorials/speech-to-text/batch-api) (up to 2 hours per file) |
| Rate limits                                       | See [Rate Limits](/api-reference-docs/ratelimits)                                                               |
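
A minimal pre-flight sketch, assuming the input is a PCM WAV file: it reads the duration from the WAV header with Python's standard `wave` module and chooses between the real-time endpoint and the Batch API using the limits in the table above. The function and constant names are hypothetical; other formats (MP3, OGG, and so on) would need a different way to measure duration.

```python
import wave

REALTIME_MAX_SECONDS = 30        # real-time REST limit from the table above
BATCH_MAX_SECONDS = 2 * 60 * 60  # Batch API limit (2 hours) from the table above


def wav_duration_seconds(path_or_file):
    """Duration of a PCM WAV file in seconds, computed from its header."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_endpoint(duration_seconds):
    """Return "realtime" or "batch"; raise if the audio exceeds both limits."""
    if duration_seconds <= REALTIME_MAX_SECONDS:
        return "realtime"
    if duration_seconds <= BATCH_MAX_SECONDS:
        return "batch"
    raise ValueError("Audio is longer than the 2-hour Batch API limit")
```

For example, `choose_endpoint(wav_duration_seconds("audio.wav"))` returns `"realtime"` for a clip of 30 seconds or less and `"batch"` for anything longer, up to 2 hours.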

## Known Limitations

| Limitation                          | Detail                                                                            | Workaround                                                                                                                            |
| ----------------------------------- | --------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| **30-second cap on real-time REST** | The real-time `/speech-to-text` endpoint only accepts audio up to 30 seconds long | Use the [Batch API](/api-reference-docs/api-guides-tutorials/speech-to-text/batch-api) for longer recordings (up to 2 hours per file) |

## Next Steps

* Learn how to integrate the Saarika API into your application.
* Complete API documentation for the speech-to-text endpoints.
* Step-by-step tutorial for speech-to-text transcription.