Text-to-Speech Overview
Sarvam AI offers a powerful text-to-speech model: Bulbul V3 — advanced TTS with 30+ voices and high-quality natural speech synthesis for Indian languages.
API Types
Two API types are available:
REST API: Generate speech for short text with an immediate response. Best for quick conversions of up to 2500 characters.
Streaming API: Stream audio in real time, either via a single HTTP POST for simple pipelines or over a persistent WebSocket connection for interactive voice agents.
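To make the choice between the two API types concrete, here is a minimal Python sketch that routes text by the 2500-character limit and prepares a REST request without sending it. Only the 2500-character limit and the Bulbul V3 model come from this page. The endpoint URL, the header name, the JSON field names, the model identifier string, and the default language code are assumptions made for illustration, so check them against the API reference before use. The helper names choose_api and build_rest_request are hypothetical.

```python
import json
import urllib.request

# Limit stated in this overview for the REST API.
REST_CHAR_LIMIT = 2500

# ASSUMPTIONS (not confirmed by this page): endpoint URL, header name,
# JSON field names, and the model identifier string.
TTS_URL = "https://api.sarvam.ai/text-to-speech"
API_KEY_HEADER = "api-subscription-key"
MODEL_ID = "bulbul:v3"


def choose_api(text: str, realtime: bool = False) -> str:
    """One possible routing rule: REST for short text, streaming otherwise."""
    if realtime or len(text) > REST_CHAR_LIMIT:
        return "streaming"
    return "rest"


def build_rest_request(text: str, api_key: str,
                       language_code: str = "hi-IN") -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the REST API."""
    if len(text) > REST_CHAR_LIMIT:
        raise ValueError(
            f"REST API is meant for up to {REST_CHAR_LIMIT} characters; "
            f"got {len(text)}"
        )
    payload = {
        "text": text,                           # assumed field name
        "target_language_code": language_code,  # assumed field name
        "model": MODEL_ID,                      # assumed identifier
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            API_KEY_HEADER: api_key,
        },
        method="POST",
    )


if __name__ == "__main__":
    sample = "Namaste! Welcome to the text-to-speech demo."
    print(choose_api(sample))
    request = build_rest_request(sample, api_key="YOUR_API_KEY")
    print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would perform the actual call. The sketch stops short of that because the response format is not described on this page.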
Supported Audio Formats & MIME Types
The TTS API supports more than 8 audio formats and MIME type variants. Supported formats and MIME types are listed below:
Experience the voices: Head to dashboard.sarvam.ai to explore 30+ speaker voices, test different languages, and generate audio samples with custom input.
Next Steps
Need help choosing the right API? Contact us on Discord for guidance.