# Speech-to-Text APIs

> Overview of the Sarvam AI Speech-to-Text APIs, covering the REST, Batch, and Streaming options. Process audio with the Saaras and Saarika models for high-accuracy transcription.

Sarvam AI offers two speech recognition models:

* [Saaras v3](/api-reference-docs/getting-started/models/saaras) (recommended): state-of-the-art ASR model with flexible output modes (transcribe, translate, verbatim, transliterate, and codemix). Best choice for new integrations.
* [Saarika v2.5](/api-reference-docs/getting-started/models/saarika) (legacy): ASR model that transcribes Indian-language speech into the same spoken language. It will be deprecated soon, so migrate to Saaras v3.

## API Types

Three API types are available:

* [REST API](/api-reference-docs/api-guides-tutorials/speech-to-text/rest-api): synchronous processing for files under 30 seconds.
* [Batch API](/api-reference-docs/api-guides-tutorials/speech-to-text/batch-api): asynchronous processing for files up to 1 hour.
* [Streaming API](/api-reference-docs/api-guides-tutorials/speech-to-text/streaming-api): real-time audio streaming with instant results.

Not sure which one fits your audio length and latency needs? See [Which Speech-to-Text API to Use](/api-reference-docs/api-guides-tutorials/speech-to-text/which-api-to-use) for a side-by-side comparison of REST, WebSocket, and Batch.
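
The length limits above can be captured in a small selection helper. This is a minimal sketch: the function name `choose_stt_api` and its return labels are hypothetical and not part of any Sarvam SDK, and routing audio of exactly 30 seconds to Batch is an assumption, since the guide only says "under 30 seconds".

```python
from typing import Literal

ApiType = Literal["rest", "batch", "streaming"]

REST_MAX_SECONDS = 30        # REST: files under 30 seconds
BATCH_MAX_SECONDS = 60 * 60  # Batch: files up to 1 hour


def choose_stt_api(duration_seconds: float, live: bool = False) -> ApiType:
    """Pick an API type from the audio length and whether the audio is live.

    Hypothetical helper: the labels are illustrative, not SDK constants.
    """
    if live:
        # Real-time audio goes to the Streaming API regardless of length.
        return "streaming"
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if duration_seconds < REST_MAX_SECONDS:
        return "rest"
    if duration_seconds <= BATCH_MAX_SECONDS:
        return "batch"
    raise ValueError("audio exceeds the documented 1-hour Batch limit")
```

For example, `choose_stt_api(12.5)` selects the REST API, while a 10-minute recording (`choose_stt_api(600)`) selects the Batch API.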

## Supported Audio Formats & MIME Types

The REST and Batch APIs for STT and STTT support more than 10 major audio formats, along with their MIME type variants, as listed below:

| Format Group                  | Supported MIME Types                        |
| ----------------------------- | ------------------------------------------- |
| **MP3 Variants**              | `mpeg`, `mp3`, `mpeg3`, `x-mpeg-3`, `x-mp3` |
| **WAV Variants**              | `wav`, `x-wav`, `wave`                      |
| **AAC Variants**              | `aac`, `x-aac`                              |
| **AIFF Variants**             | `aiff`, `x-aiff`                            |
| **OGG / Opus Formats**        | `ogg`, `opus`                               |
| **FLAC Variants (Lossless)**  | `flac`, `x-flac`                            |
| **MP4 / M4A Audio**           | `mp4`, `x-m4a`                              |
| **AMR (Narrowband)**          | `amr`                                       |
| **WMA (Windows Media Audio)** | `x-ms-wma`                                  |
| **WEBM (Audio & Video)**      | `webm`                                      |
| **PCM Formats**               | `pcm_s16le`, `pcm_l16`, `pcm_raw`           |

For most audio formats, the API detects the codec automatically. For PCM
formats (`pcm_s16le`, `pcm_l16`, `pcm_raw`), however, you must set the
`input_audio_codec` parameter explicitly. PCM audio is supported only at a
16 kHz sample rate.
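
As an illustration of the PCM rule, the sketch below adds the extra request field only when it is needed. `input_audio_codec` is the parameter named above; the helper name `codec_fields`, the dictionary shape, and the way the sample rate is passed in are assumptions for illustration, not the SDK's interface.

```python
from typing import Dict, Optional

PCM_CODECS = frozenset({"pcm_s16le", "pcm_l16", "pcm_raw"})
PCM_SAMPLE_RATE_HZ = 16_000  # PCM is supported only at 16 kHz


def codec_fields(audio_format: str,
                 sample_rate_hz: Optional[int] = None) -> Dict[str, str]:
    """Return the extra request fields required for the given format.

    Hypothetical helper: non-PCM formats need no extra field because the
    codec is detected automatically.
    """
    fmt = audio_format.strip().lower()
    if fmt not in PCM_CODECS:
        return {}
    if sample_rate_hz != PCM_SAMPLE_RATE_HZ:
        raise ValueError(
            f"PCM audio must be sampled at {PCM_SAMPLE_RATE_HZ} Hz, "
            f"got {sample_rate_hz!r}"
        )
    return {"input_audio_codec": fmt}
```

The returned dictionary can be merged into whatever request parameters your client already sends, so non-PCM uploads are left unchanged.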

**WebSocket/Streaming APIs:** The STT and STTT WebSocket streaming APIs only support **WAV** and **raw PCM** formats (`wav`, `pcm_s16le`, `pcm_l16`, `pcm_raw`). Other audio formats are not supported for real-time streaming.
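
A client can check a format against these lists before uploading a file or opening a stream. The sketch below copies the format names from the table and the streaming note above; the function name `is_format_supported` and the `"rest"`, `"batch"`, and `"streaming"` labels are hypothetical.

```python
REST_BATCH_FORMATS = frozenset({
    "mpeg", "mp3", "mpeg3", "x-mpeg-3", "x-mp3",  # MP3 variants
    "wav", "x-wav", "wave",                       # WAV variants
    "aac", "x-aac",                               # AAC variants
    "aiff", "x-aiff",                             # AIFF variants
    "ogg", "opus",                                # OGG / Opus
    "flac", "x-flac",                             # FLAC variants
    "mp4", "x-m4a",                               # MP4 / M4A audio
    "amr",                                        # AMR
    "x-ms-wma",                                   # WMA
    "webm",                                       # WEBM
    "pcm_s16le", "pcm_l16", "pcm_raw",            # PCM formats
})

# Streaming accepts only WAV and raw PCM.
STREAMING_FORMATS = frozenset({"wav", "pcm_s16le", "pcm_l16", "pcm_raw"})


def is_format_supported(audio_format: str, api: str) -> bool:
    """Check a format name against the documented list for an API type."""
    fmt = audio_format.strip().lower()
    if api == "streaming":
        return fmt in STREAMING_FORMATS
    if api in ("rest", "batch"):
        return fmt in REST_BATCH_FORMATS
    raise ValueError(f"unknown API type: {api!r}")
```

With this check, an MP3 file passes for `"rest"` and `"batch"` but is rejected for `"streaming"`, which signals that it would need converting to WAV or raw PCM first.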

***

## Technical Capabilities

* 22 Indian languages (Saaras v3)
* Automatic language detection
* Code-mixing support
* Multi-speaker handling
* Speaker diarization (Batch API)
* Timestamp generation
* Entity preservation
* Telephony optimization

## Next Steps

1. Select the appropriate API type based on your use case.
2. Sign up and get your API key from the [dashboard](https://dashboard.sarvam.ai).
3. Deploy your integration and monitor usage in the dashboard.

Need help choosing the right API? Contact us on
[Discord](https://discord.com/invite/5rAsykttcs) for guidance.