Text-to-Speech Overview
Sarvam AI offers a powerful text-to-speech model: Bulbul V3 — advanced TTS with 30+ voices and high-quality natural speech synthesis for Indian languages.
Available API types:
REST API: Generate speech for short text with an immediate response. Best for quick conversions up to 2500 characters.
Streaming API: Stream audio in real time, either via a single HTTP POST for simple pipelines or over a persistent WebSocket connection for interactive voice agents.
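Because the REST API is described as best for conversions up to 2500 characters, longer input has to be split before it is sent. The sketch below shows one way to do that on the client side, preferring sentence boundaries so each chunk is synthesized as natural speech. The helper name `split_for_rest` and the constant `MAX_CHARS` are hypothetical and not part of any Sarvam SDK. The code assumes the limit is counted in characters, and it only prepares text; it does not call the API.

```python
import re

# Assumption: the 2500-character REST limit from this page, counted in characters.
MAX_CHARS = 2500


def split_for_rest(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: prefers sentence boundaries and falls back to a
    hard cut only when a single sentence is longer than `limit`.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")

    # Split after ., !, ? or the Devanagari danda when followed by whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single oversized sentence is cut into limit-sized pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    long_text = "Namaste. " * 400  # 3600 characters of short sentences
    for i, chunk in enumerate(split_for_rest(long_text)):
        print(i, len(chunk))
```

Each returned chunk can then be sent as its own REST request and the resulting audio concatenated in order. For input that is routinely longer than the limit, the Streaming API described above may be the simpler fit.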
Not sure which one fits your latency and interactivity needs? See Which Text-to-Speech API to Use for a side-by-side comparison of REST, HTTP streaming, and WebSocket.
The TTS API supports over 8 major audio formats and MIME type variants. Supported formats and MIME types are listed below:
Experience the voices: Head to dashboard.sarvam.ai to explore 30+ speaker voices, test different languages, and generate audio samples with custom input.
Need help choosing the right API? Contact us on Discord for guidance.