Speech-to-Text APIs
Sarvam AI offers two speech models: one for speech-to-text transcription (STT) and one for speech-to-text translation (STTT).
View our pricing page for detailed information about model-specific pricing and usage tiers.
API Types
Real-time API
Process short audio files synchronously and receive the transcript directly in the response. Best for files under 1 minute.
Batch API
Handle large audio files asynchronously. Ideal for long recordings.
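Because the Batch API processes files asynchronously, a client typically submits a job and then checks its status until the job finishes. The sketch below shows that generic polling pattern with capped exponential backoff. The `check_status` callable and the status strings `"completed"` and `"failed"` are hypothetical placeholders, not Sarvam's actual API; substitute the real job-status call and values from the Batch API reference.

```python
import time
from typing import Callable


def wait_for_job(
    check_status: Callable[[], str],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a hypothetical batch job until it reaches a terminal state.

    `check_status` is a placeholder for whatever call returns the job's
    current status; "completed" and "failed" are assumed terminal values.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        status = check_status()
        if status in ("completed", "failed"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f} s")
        sleep(delay)
        waited += delay
        # Exponential backoff, capped so long jobs are still checked regularly.
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    # Simulated job that finishes on the third check; sleeping is skipped.
    statuses = iter(["pending", "running", "completed"])
    print(wait_for_job(lambda: next(statuses), sleep=lambda s: None))
```

Injecting `sleep` keeps the helper testable without real waiting; in production the default `time.sleep` is used.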
Streaming API
Stream audio in real time and receive transcription results with low latency. Perfect for live transcription.
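Streaming clients usually send audio in small, fixed-duration pieces rather than as one file. As a minimal sketch, assuming raw 16-bit mono PCM at 16 kHz and 100 ms chunks (illustrative defaults, not documented Sarvam requirements), the generator below slices a PCM byte buffer into chunks that a streaming client could send one at a time. The transport (connection setup, message format) is not shown.

```python
from typing import Iterator


def pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16_000,
    sample_width: int = 2,
    channels: int = 1,
    chunk_ms: int = 100,
) -> Iterator[bytes]:
    """Yield successive chunk_ms-long slices of raw PCM audio.

    Defaults (16 kHz, 16-bit, mono, 100 ms) are illustrative assumptions.
    The final chunk may be shorter than chunk_ms.
    """
    frame_size = sample_width * channels  # bytes per sample frame
    bytes_per_chunk = sample_rate * frame_size * chunk_ms // 1000
    # Keep chunk boundaries aligned to whole sample frames.
    bytes_per_chunk -= bytes_per_chunk % frame_size
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]


if __name__ == "__main__":
    one_second_of_silence = bytes(32_000)  # 16 kHz * 2 bytes * 1 s
    for chunk in pcm_chunks(one_second_of_silence):
        print(len(chunk))  # each chunk holds 100 ms of audio
```

Aligning chunks to whole sample frames avoids splitting a 16-bit sample across two messages.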
Supported Audio Formats & MIME Types
The STT and STTT APIs support over 10 major audio formats and MIME type variants. Supported formats and MIME types are listed below:
API Features
Language Support
- Support for multiple Indian languages and English
- Automatic language detection
- High-accuracy transcription
API Types
- Real-time API (under 30 seconds)
- Batch API for longer files
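The choice between the file-based APIs mostly comes down to audio duration. The sketch below reads a WAV file's duration with Python's standard `wave` module and suggests an API type. The 30-second default follows the Real-time API limit listed here; adjust `realtime_limit_s` if your model or plan documents a different limit. The function names, the `live` flag, and the returned labels are illustrative, and formats other than PCM WAV would need a different duration reader.

```python
import io
import wave
from typing import BinaryIO, Union


def wav_duration_seconds(source: Union[str, BinaryIO]) -> float:
    """Return the duration of a PCM WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_stt_api(
    duration_s: float,
    live: bool = False,
    realtime_limit_s: float = 30.0,
) -> str:
    """Suggest which API type fits a piece of audio (illustrative labels)."""
    if live:
        return "streaming"  # audio is still being captured
    if duration_s < realtime_limit_s:
        return "real-time"  # short file: synchronous request
    return "batch"  # long recording: asynchronous job


if __name__ == "__main__":
    # Build 45 s of silent 16 kHz mono 16-bit audio in memory for the demo.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16_000)
        out.writeframes(bytes(2 * 16_000 * 45))
    buf.seek(0)
    print(choose_stt_api(wav_duration_seconds(buf)))  # batch
```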
Advanced Features
- Speaker diarization (Batch API only)
- Diarization is priced separately
- Real-time transcription
Next Steps
Need help choosing the right API? Contact us on Discord for guidance.