Speech-to-Text APIs
Sarvam AI offers two powerful speech models:
View our pricing page for detailed information about model-specific pricing and usage tiers.
API Types
- Real-Time API: Process short audio files synchronously with immediate response. Best for files under 1 minute.
- Batch API: Handle large audio files asynchronously. Ideal for long recordings.
- Streaming: Real-time audio streaming with instant results. Perfect for live transcription.
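The choice between the three API types can be sketched as a small decision helper. This is a minimal illustration, not part of any Sarvam SDK: the function name `choose_api`, its return labels, and the `sync_limit_seconds` parameter are all hypothetical. The synchronous limit is a parameter because this page mentions both "under 1 minute" and "under 30 seconds"; the default below assumes the stricter value. The diarization rule follows the feature list, which marks it as Batch API only.

```python
def choose_api(
    duration_seconds: float,
    live: bool = False,
    diarization: bool = False,
    sync_limit_seconds: float = 30.0,  # assumption: the stricter of the two limits on this page
) -> str:
    """Return a hypothetical label for the API type that fits the job."""
    if live:
        # Live audio goes to streaming; diarization is listed as Batch API only.
        if diarization:
            raise ValueError("speaker diarization is only available in the Batch API")
        return "streaming"
    if diarization or duration_seconds >= sync_limit_seconds:
        # Long recordings, or any job needing speaker labels, go to batch.
        return "batch"
    # Short files can be processed synchronously.
    return "real-time"
```

For example, `choose_api(12.0)` selects the synchronous path, while `choose_api(12.0, diarization=True)` selects batch even though the file is short.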
Supported Audio Formats & MIME Types
The STT and STTT APIs support over 10 major audio formats and MIME type variants. Supported formats and MIME types are listed below:
For most audio formats, our API automatically detects the codec. However, when using PCM formats (pcm_s16le, pcm_l16, pcm_raw), you must explicitly specify the input_audio_codec parameter. PCM files are only supported at 16kHz sample rate.
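The PCM rule above can be sketched as a small pre-flight check that runs before a request is sent. This is an illustrative helper only: the function name `codec_fields` is hypothetical, and how the returned field is attached to a request (form field, query parameter, or SDK argument) is an assumption to confirm against the API reference. Only the parameter name `input_audio_codec`, the three codec values, and the 16kHz restriction come from the text.

```python
from typing import Dict, Optional

# Codec values named in the documentation above.
PCM_CODECS = {"pcm_s16le", "pcm_l16", "pcm_raw"}
PCM_SAMPLE_RATE_HZ = 16_000  # PCM is only supported at 16kHz


def codec_fields(
    pcm_codec: Optional[str] = None,
    sample_rate_hz: Optional[int] = None,
) -> Dict[str, str]:
    """Build the extra request fields needed for the given input (hypothetical helper)."""
    if pcm_codec is None:
        # Non-PCM formats: the codec is detected automatically, so send nothing extra.
        return {}
    if pcm_codec not in PCM_CODECS:
        raise ValueError(f"unknown PCM codec: {pcm_codec!r}")
    if sample_rate_hz != PCM_SAMPLE_RATE_HZ:
        raise ValueError("PCM input must be sampled at 16kHz")
    # PCM formats: the codec must be stated explicitly.
    return {"input_audio_codec": pcm_codec}
```

Calling `codec_fields("pcm_s16le", 16000)` yields the one field to add to the request, while `codec_fields()` returns an empty dict for formats that rely on automatic detection.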
API Features
- Multiple Indian languages and English support
- Automatic language detection
- High accuracy transcription
- Real-Time API (under 30 seconds)
- Batch API for longer files
- Speaker diarization (Batch API only)
- Separate pricing for diarization
- Real-time transcription
Next Steps
Need help choosing the right API? Contact us on Discord for guidance.