Text-to-Speech Overview

Sarvam AI offers a text-to-speech model, Bulbul V3, which provides high-quality, natural speech synthesis for Indian languages with 30+ voices.

API Types

Two API types are available: a REST API for quick conversions of up to 2,500 characters, and a Streaming API for real-time audio over an HTTP stream or WebSocket.

Not sure which one fits your latency and interactivity needs? See Which Text-to-Speech API to Use for a side-by-side comparison of REST, HTTP streaming, and WebSocket.

Supported Audio Formats & MIME Types

The TTS API supports the eight audio formats listed below:

| Format Group | Supported MIME Types |
| --- | --- |
| MP3 Variants | mp3 |
| WAV Variants | wav |
| AAC Variants | aac |
| OPUS Format | opus |
| FLAC Variants (Lossless) | flac |
| PCM LINEAR16 | pcm |
| MULAW (μ-law) | mulaw |
| ALAW (A-law) | alaw |
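
As a rough illustration, a client could check a requested format against the table above before sending anything. This is a minimal sketch: the helper name `normalize_format` is hypothetical, and accepting mixed-case input is a convenience of this example, not documented API behavior.

```python
# Format values copied from the table above.
SUPPORTED_FORMATS = frozenset(
    {"mp3", "wav", "aac", "opus", "flac", "pcm", "mulaw", "alaw"}
)


def normalize_format(name: str) -> str:
    """Return the lowercase format value, or raise if it is not in the table.

    Hypothetical client-side helper; the service may validate differently.
    """
    fmt = name.strip().lower()
    if fmt not in SUPPORTED_FORMATS:
        allowed = ", ".join(sorted(SUPPORTED_FORMATS))
        raise ValueError(f"unsupported format {name!r}; expected one of: {allowed}")
    return fmt
```

Failing early on the client avoids spending a request on a format the table does not list.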

Experience the voices: Head to dashboard.sarvam.ai to explore 30+ speaker voices, test different languages, and generate audio samples with custom input.

Limits

| Limit | Value |
| --- | --- |
| REST API: max characters per request | 2,500 |
| HTTP streaming: max characters per request | 3,500 |
| WebSocket: max characters per message | 2,500 (recommended under 500 for lowest latency) |
| Sample rates | 8000 / 16000 / 22050 / 24000 Hz on all surfaces; 32000 / 44100 / 48000 Hz on REST and WebSocket only |
| Rate limits | Per plan (see Rate Limits) |
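
The limits above can be enforced on the client before any request is made. The sketch below is one possible approach, not part of the Sarvam SDK: the surface labels (`rest`, `http_stream`, `websocket`) and the helpers `sample_rate_allowed` and `split_text` are names invented for this example. It checks a sample rate against the table and splits long input into chunks that fit the per-request character limit, breaking at sentence ends where possible.

```python
import re

# Limits copied from the table above; the surface labels are this sketch's own.
MAX_CHARS = {"rest": 2500, "http_stream": 3500, "websocket": 2500}
COMMON_RATES = {8000, 16000, 22050, 24000}   # all surfaces
EXTENDED_RATES = {32000, 44100, 48000}       # REST and WebSocket only


def sample_rate_allowed(surface: str, rate_hz: int) -> bool:
    """Return True if the table lists rate_hz for the given surface."""
    if surface not in MAX_CHARS:
        raise ValueError(f"unknown surface: {surface!r}")
    if rate_hz in COMMON_RATES:
        return True
    return rate_hz in EXTENDED_RATES and surface in ("rest", "websocket")


def split_text(text: str, surface: str = "rest") -> list[str]:
    """Split text into chunks no longer than the surface's character limit.

    Breaks after sentence-ending punctuation where possible and hard-splits
    any single sentence that is itself longer than the limit.
    """
    limit = MAX_CHARS[surface]
    # Sentence boundary: whitespace after ., !, ? or the Devanagari danda.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A sentence longer than the limit is cut into limit-sized pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request or message. For WebSocket use, where the table recommends staying under 500 characters for lowest latency, a smaller limit could be substituted in `MAX_CHARS`.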

Next Steps

1. Choose Your API: Select the appropriate API type based on your use case.

2. Get API Key: Sign up and get your API key from the dashboard.

3. Go Live: Deploy your integration and monitor usage in the dashboard.

Need help choosing the right API? Contact us on Discord for guidance.