
# Speech-to-Text REST API

> Process short audio files synchronously and get an immediate response. Supports transcription and translation across multiple audio formats.

<h3>
  Synchronous Processing
</h3>

<p>
  Process short audio files and receive the result in the same response.
  Best for quick transcriptions and testing. Audio can be at most 30 seconds long.
</p>
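
Because synchronous requests are limited to 30 seconds of audio, it can help to check a clip's length before uploading it. The following is a minimal sketch using Python's standard `wave` module, which only reads uncompressed WAV files. The names `MAX_SYNC_SECONDS`, `wav_duration_seconds`, and `fits_sync_limit` are illustrative and not part of the SDK.

```python
import wave

MAX_SYNC_SECONDS = 30.0  # limit stated above for synchronous requests


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the number of frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(path: str) -> bool:
    """True if the clip is short enough for the synchronous endpoint."""
    return wav_duration_seconds(path) <= MAX_SYNC_SECONDS
```

Other formats such as MP3 or FLAC would need a different way to read the duration.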

## Saaras v3: State-of-the-Art Speech Recognition (Recommended)

Saaras v3 is our latest speech recognition model. It supports five output modes for different use cases: `transcribe`, `translate`, `verbatim`, `translit`, and `codemix`.

**Recommended for new integrations.** Saaras v3 offers improved accuracy and flexible output modes. [Learn more about Saaras v3](/api-reference-docs/getting-started/models/saaras).

### Output Modes

| Mode                   | Description                                     |
| ---------------------- | ----------------------------------------------- |
| `transcribe` (default) | Standard transcription in the original language |
| `translate`            | Translates speech to English                    |
| `verbatim`             | Exact word-for-word transcription               |
| `translit`             | Romanization to Latin script                    |
| `codemix`              | Code-mixed text output                          |
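
Since the mode is passed as a plain string, a typo such as `transliterate` instead of `translit` is easy to make. The sketch below validates the mode locally before a request is built. `VALID_MODES` and `build_form_fields` are hypothetical helper names, and the returned dictionary only mirrors the `model` and `mode` fields used in the examples on this page.

```python
# The five mode values listed in the table above.
VALID_MODES = ("transcribe", "translate", "verbatim", "translit", "codemix")


def build_form_fields(mode: str = "transcribe", model: str = "saaras:v3") -> dict:
    """Build the non-file form fields, rejecting unknown modes early."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {VALID_MODES}")
    return {"model": model, "mode": mode}
```

Calling `build_form_fields("translit")` returns the fields for a romanized transcript, while an unknown value raises `ValueError` before any network call is made.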

### Code Examples for Saaras v3

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Transcribe mode (default). The context manager closes the file after the request.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        mode="transcribe",  # or "translate", "verbatim", "translit", "codemix"
    )

print(response)
```

```javascript
import {SarvamAIClient} from "sarvamai";
import fs from 'fs';

const client = new SarvamAIClient({
    apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

const audioFile = fs.createReadStream("audio.wav");

const response = await client.speechToText.transcribe({
    file: audioFile,
    model: "saaras:v3",
    mode: "transcribe"  // or "translate", "verbatim", "translit", "codemix"
});

console.log(response);
```

```bash
curl -X POST https://api.sarvam.ai/speech-to-text \
  -H "api-subscription-key: YOUR_SARVAM_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F model="saaras:v3" \
  -F mode="transcribe" \
  -F file=@file.wav
```

See the [API Reference](/api-reference-docs/speech-to-text/transcribe)
for all available options.

***

## Legacy Models (Deprecated Soon)

The following models will be deprecated soon. We recommend using **Saaras v3** for new integrations and migrating existing ones to it.

### Saarika v2.5: Speech to Text Transcription

Saarika is a speech-to-text transcription model that excels at handling multi-speaker content, mixed-language content, and conference recordings.

**Deprecation Notice:** Saarika v2.5 will be deprecated soon. Use [Saaras v3](/api-reference-docs/getting-started/models/saaras) with `mode="transcribe"` instead. The examples below show the equivalent Saaras v3 request.

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# The context manager closes the file after the request.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        mode="transcribe",
        language_code="hi-IN",
    )

print(response)
```

```bash
curl -X POST https://api.sarvam.ai/speech-to-text \
  -H "api-subscription-key: YOUR_SARVAM_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F model="saaras:v3" \
  -F mode="transcribe" \
  -F language_code="hi-IN" \
  -F file=@file.wav
```

### Saaras v2.5: Speech to Text Translation

Saaras v2.5 is available through the Speech-to-Text Translate endpoint, which translates speech directly to English.

**Deprecation Notice:** Saaras v2.5 will be deprecated soon. Use [Saaras v3](/api-reference-docs/getting-started/models/saaras) with `mode="translate"` instead. The examples below show the equivalent Saaras v3 request.

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# The context manager closes the file after the request.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v3",
        mode="translate",
    )

print(response)
```

```bash
curl -X POST https://api.sarvam.ai/speech-to-text-translate \
  -H "api-subscription-key: YOUR_SARVAM_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file=@audio.wav \
  -F model="saaras:v3" \
  -F mode="translate"
```

## API Response Format

### Speech to Text Transcription Response

| Field           | Type   | Description                                                                                       |
| --------------- | ------ | ------------------------------------------------------------------------------------------------- |
| `request_id`    | string | Unique identifier for the request                                                                 |
| `transcript`    | string | The transcribed text from the audio file                                                          |
| `language_code` | string | BCP-47 code of the detected language (e.g., `hi-IN`), or `null` if no language is detected        |

```json
{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "transcript": "नमस्ते, आप कैसे हैं?",
  "language_code": "hi-IN"
}
```
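
When working with the raw JSON rather than the SDK's response object, the three fields can be loaded into a small typed structure. This is a minimal sketch: the `Transcription` class and `parse_transcription` function are illustrative names, and `language_code` is typed as optional because the table above says it may be `null`.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    request_id: str
    transcript: str
    language_code: Optional[str]  # None when the API returns null


def parse_transcription(raw: str) -> Transcription:
    """Parse a response body with the fields documented above."""
    data = json.loads(raw)
    return Transcription(
        request_id=data["request_id"],
        transcript=data["transcript"],
        language_code=data.get("language_code"),
    )
```

Treating a missing or `null` `language_code` as `None` lets calling code branch on it explicitly instead of failing on an absent key.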

### Speech to Text Translation Response

| Field           | Type   | Description                                 |
| --------------- | ------ | ------------------------------------------- |
| `request_id`    | string | Unique identifier for the request           |
| `transcript`    | string | Translated text in English                  |
| `language_code` | string | BCP-47 code of the detected source language |

**Supported source languages:** `hi-IN`, `bn-IN`, `kn-IN`, `ml-IN`, `mr-IN`, `od-IN`, `pa-IN`, `ta-IN`, `te-IN`, `gu-IN`, `en-IN`

```json
{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "transcript": "Hello, how are you?",
  "language_code": "hi-IN"
}
```

## Error Responses

All errors return a JSON object with an `error` field containing details about what went wrong.

### Error Response Structure

```json
{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
```
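
For clients that read the HTTP response body directly, the envelope above can be unpacked defensively. The sketch below assumes only the three fields shown. `ApiErrorInfo` and `parse_error` are hypothetical names, and the function returns `None` for bodies that do not match the documented shape instead of raising.

```python
import json
from typing import NamedTuple, Optional


class ApiErrorInfo(NamedTuple):
    message: str
    code: str
    request_id: str


def parse_error(raw: str) -> Optional[ApiErrorInfo]:
    """Extract the `error` object from a response body, if present."""
    try:
        data = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    err = data.get("error") if isinstance(data, dict) else None
    if not isinstance(err, dict):
        return None
    return ApiErrorInfo(
        message=str(err.get("message", "")),
        code=str(err.get("code", "")),
        request_id=str(err.get("request_id", "")),
    )
```

Logging `request_id` alongside `code` makes it easier to reference a specific failed request when contacting support.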

### Error Codes Reference

| HTTP Status | Error Code                   | When This Happens                                | What To Do                                       |
| ----------- | ---------------------------- | ------------------------------------------------ | ------------------------------------------------ |
| `400`       | `invalid_request_error`      | Missing required parameters or malformed request | Check request format and required fields         |
| `403`       | `invalid_api_key_error`      | API key is invalid, missing, or expired          | Verify your API key in the dashboard             |
| `422`       | `unprocessable_entity_error` | Invalid audio format or file too large           | Use supported formats: WAV, MP3, AAC, FLAC, OGG  |
| `429`       | `insufficient_quota_error`   | API quota or rate limit exceeded                 | Wait for reset or upgrade your plan              |
| `500`       | `internal_server_error`      | Unexpected server error                          | Retry the request; contact support if persistent |
| `503`       | `rate_limit_exceeded_error`  | Service temporarily overloaded                   | Retry with exponential backoff                   |

### Example Error Response

```json
{
  "error": {
    "message": "Unsupported audio format. Supported formats: WAV, MP3, AAC, FLAC, OGG",
    "code": "unprocessable_entity_error",
    "request_id": "20241115_abc12345"
  }
}
```

With the Python SDK, catch `ApiError` and branch on `status_code`:

```python
from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

try:
    # The context manager closes the file even if the request fails.
    with open("audio.wav", "rb") as audio_file:
        response = client.speech_to_text.transcribe(
            file=audio_file,
            model="saaras:v3",
            mode="transcribe",
        )
    print(response.transcript)
except ApiError as e:
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
    elif e.status_code == 403:
        print("Invalid API key. Check your credentials.")
    elif e.status_code == 429:
        print("Rate limit exceeded. Wait and retry.")
    elif e.status_code == 503:
        print("Service overloaded. Retry with backoff.")
    else:
        print(f"Error {e.status_code}: {e.body}")
```

## Next Steps

Sign up and get your API key from the
[dashboard](https://dashboard.sarvam.ai).

Try the API with sample audio files.

Deploy your integration and monitor usage.

Need help? Contact us on [Discord](https://discord.com/invite/5rAsykttcs) for
guidance.