Text-to-Speech Overview
Sarvam AI offers a powerful text-to-speech model:
API Types
REST API
Generate speech from short text and receive the audio in the immediate response. Best for quick conversions of up to 2,500 characters.
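Because the REST API is aimed at inputs of up to 2,500 characters, longer text has to be split before it is sent. The sketch below shows one way to do that client-side, preferring sentence boundaries. The helper name `split_for_rest` and the splitting strategy are illustrative assumptions, not part of the Sarvam API; each resulting chunk would then be sent as its own request.

```python
import re

MAX_CHARS = 2500  # character limit quoted for the REST API above


def split_for_rest(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: breaks on sentence-ending punctuation where
    possible and hard-splits any single sentence longer than `limit`.
    """
    # Split after ., !, ? or the Devanagari danda, consuming the whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single oversized sentence is cut into fixed-size pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries keeps each request a natural unit of speech, which tends to avoid audible breaks mid-sentence when the resulting audio clips are played back to back.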
Streaming API
Stream audio in real time — via a single HTTP POST for simple pipelines, or a persistent WebSocket connection for interactive voice agents.
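For the single HTTP POST variant, the client's job is mostly to consume the response body incrementally instead of waiting for it to finish. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the URL, headers, and request payload are left as caller-supplied placeholders because this page does not specify them.

```python
import urllib.request
from typing import BinaryIO, Iterator


def iter_audio_chunks(response: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes from a file-like response as they become available."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        yield chunk


def stream_to_file(response: BinaryIO, out: BinaryIO) -> int:
    """Copy a streamed response to `out` chunk by chunk; return bytes written."""
    total = 0
    for chunk in iter_audio_chunks(response):
        out.write(chunk)  # a player or audio pipeline could consume the chunk here
        total += len(chunk)
    return total


def open_tts_stream(url: str, headers: dict[str, str], payload: bytes):
    """Open a streaming POST request.

    Assumption: `url`, `headers`, and `payload` are placeholders to be
    filled in from the official API reference.
    """
    request = urllib.request.Request(url, data=payload, headers=headers, method="POST")
    return urllib.request.urlopen(request)
```

In use, the object returned by `open_tts_stream` would be passed to `stream_to_file` together with an open binary file. The WebSocket variant follows the same consume-as-it-arrives pattern but keeps the connection open across multiple text inputs, which is why it suits interactive voice agents.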
Supported Audio Formats & MIME Types
The TTS API supports more than 8 major audio formats and their MIME type variants, listed below:
Experience the voices: Head to dashboard.sarvam.ai to explore 30+ speaker voices, test different languages, and generate audio samples with custom input.
Next Steps
Need help choosing the right API? Contact us on Discord for guidance.