Speech-to-Text APIs

Sarvam AI offers two powerful speech models:

View our pricing page for detailed information about model-specific pricing and usage tiers.

Supported Audio Formats & MIME Types

The STT and STTT APIs support more than 10 audio formats and MIME type variants, listed below by format group:

Format Group                 Supported MIME Types
MP3 Variants                 mpeg, mp3, mpeg3, x-mpeg-3, x-mp3
WAV Variants                 wav, x-wav, wave
AAC Variants                 aac, x-aac
AIFF Variants                aiff, x-aiff
OGG / Opus Formats           ogg, opus
FLAC Variants (Lossless)     flac, x-flac
MP4 / M4A Audio              mp4, x-m4a
AMR (Narrowband)             amr
WMA (Windows Media Audio)    x-ms-wma
WEBM (Audio & Video)         webm
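
A client can check a file's MIME type against the table above before uploading. The following is a minimal sketch, not part of the Sarvam AI SDK: the function name `format_group` is hypothetical, and the `audio/` and `video/` prefixes are an assumption, since the table lists subtype names only.

```python
from typing import Optional

# Subtype names copied from the table above. The "audio/" and "video/"
# prefixes handled below are an assumption; the table lists subtypes only.
SUPPORTED_SUBTYPES = {
    "MP3": {"mpeg", "mp3", "mpeg3", "x-mpeg-3", "x-mp3"},
    "WAV": {"wav", "x-wav", "wave"},
    "AAC": {"aac", "x-aac"},
    "AIFF": {"aiff", "x-aiff"},
    "OGG/Opus": {"ogg", "opus"},
    "FLAC": {"flac", "x-flac"},
    "MP4/M4A": {"mp4", "x-m4a"},
    "AMR": {"amr"},
    "WMA": {"x-ms-wma"},
    "WEBM": {"webm"},
}


def format_group(mime_type: str) -> Optional[str]:
    """Return the format group for a MIME type, or None if it is not listed."""
    main, _, subtype = mime_type.strip().lower().partition("/")
    # Drop parameters such as "; codecs=opus".
    subtype = subtype.split(";", 1)[0].strip()
    if main not in {"audio", "video"} or not subtype:
        return None
    # Assumption: only WEBM is accepted with a video/ prefix, because it is
    # the only group the table describes as "Audio & Video".
    if main == "video" and subtype != "webm":
        return None
    for group, subtypes in SUPPORTED_SUBTYPES.items():
        if subtype in subtypes:
            return group
    return None
```

For example, `format_group("audio/x-wav")` returns `"WAV"`, while an unlisted type such as `audio/midi` returns `None`, so the caller can reject the file before sending it.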

API Features

Language Support
  • Support for multiple Indian languages and English
  • Automatic language detection
  • High-accuracy transcription
API Types
  • Real-Time API for audio under 30 seconds
  • Batch API for longer files
Advanced Features
  • Speaker diarization (Batch API only)
  • Separate pricing for diarization
  • Real-time transcription
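
The choice between the two API types can be expressed as a small routing rule. The sketch below is illustrative only: `choose_api` and its return values are hypothetical names, and treating audio of exactly 30 seconds as a Batch job is an assumption drawn from the "under 30 seconds" wording.

```python
REAL_TIME_MAX_SECONDS = 30.0  # "under 30 seconds", per the feature list


def choose_api(duration_seconds: float, needs_diarization: bool = False) -> str:
    """Pick "real-time" or "batch" for a clip of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Speaker diarization is listed as Batch API only.
    if needs_diarization:
        return "batch"
    # Assumption: the limit is exclusive, so exactly 30 s goes to Batch.
    if duration_seconds < REAL_TIME_MAX_SECONDS:
        return "real-time"
    return "batch"
```

A 12.5-second clip routes to the Real-Time API unless diarization is requested, in which case it goes to the Batch API regardless of length.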

Next Steps

1. Choose Your API

Select the appropriate API type based on your use case.

2. Get API Key

Sign up and get your API key from the dashboard.

3. Go Live

Deploy your integration and monitor usage in the dashboard.
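
Once you have a key from the dashboard, a common pattern is to keep it out of source code and read it from the environment at startup. This is a minimal sketch under stated assumptions: the variable name `SARVAM_API_KEY` and the helper `load_api_key` are hypothetical, not names defined by Sarvam AI.

```python
import os
from typing import Mapping, Optional

API_KEY_ENV_VAR = "SARVAM_API_KEY"  # hypothetical variable name


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, failing early if it is unset."""
    source = os.environ if env is None else env
    key = source.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {API_KEY_ENV_VAR} to the API key from your dashboard."
        )
    return key
```

Failing at startup, rather than on the first request, makes a missing key easier to spot before going live.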

Need help choosing the right API? Contact us on Discord for guidance.