Text-to-Speech Overview | Sarvam API Docs

Sarvam AI offers an advanced text-to-speech model with multiple voices, code-mixing support, and high-quality, natural speech synthesis for Indian languages.

View our pricing page for detailed information about model-specific pricing and usage tiers.

API Types

Real Time API

Generate speech from short text with an immediate response. Best for quick conversions of up to 1000 characters.
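If the input is longer than the 1000-character guideline, one option is to split it into smaller pieces before sending each piece separately. The following is a minimal sketch of such a splitter. The function name `split_for_real_time` and the sentence-boundary heuristic are illustrative assumptions, not part of the Sarvam API, and the code makes no API calls.

```python
import re

MAX_CHARS = 1000  # character guideline for the Real Time API, taken from the text above


def split_for_real_time(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper for illustration only.
    """
    # Split after sentence-ending punctuation, including the Devanagari danda.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own request. For long or continuously arriving text, the Streaming API described next is likely the better fit.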

Streaming API

Stream long or live text into speech with low latency. Ideal for real-time playback, WebSocket-based async use, and efficient resource handling.
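The choice between the two API types can be sketched as a small routing function. This is only an illustration of the guidance above: the function name and the returned labels `"real_time"` and `"streaming"` are assumptions made for this example, not identifiers defined by Sarvam.

```python
REAL_TIME_MAX_CHARS = 1000  # guideline from the Real Time API description


def choose_api_type(text: str, is_live: bool = False) -> str:
    """Return a hypothetical label for the API type that suits the input.

    is_live marks text that arrives incrementally (for example, from a chat
    or a live transcript), which the Streaming API is meant to handle.
    """
    if is_live or len(text) > REAL_TIME_MAX_CHARS:
        return "streaming"
    return "real_time"
```

A short notification message would route to the Real Time API, while a full article or live captions would route to the Streaming API.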

Supported Audio Formats & MIME Types

The TTS API supports 8 major audio formats, along with MIME type variants. The supported formats and MIME types are listed below:

Format Group	Supported MIME Types
MP3 Variants	`mp3`
WAV Variants	`wav`
AAC Variants	`aac`
OPUS Format	`opus`
FLAC Variants (Lossless)	`flac`
PCM LINEAR16	`pcm`
MULAW (μ-law)	`mulaw`
ALAW (A-law)	`alaw`
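A client may want to validate a requested output format before sending a request. The sketch below checks a value against the identifiers in the table above. The helper name `normalize_audio_format` is hypothetical, and how the format is actually passed to the API is not covered here.

```python
# Format identifiers copied from the table above.
SUPPORTED_FORMATS = frozenset(
    {"mp3", "wav", "aac", "opus", "flac", "pcm", "mulaw", "alaw"}
)


def normalize_audio_format(name: str) -> str:
    """Return the lowercase format identifier, or raise ValueError if unsupported.

    Hypothetical helper: accepts inputs such as "MP3" or ".wav".
    """
    key = name.strip().lower().lstrip(".")
    if key not in SUPPORTED_FORMATS:
        supported = ", ".join(sorted(SUPPORTED_FORMATS))
        raise ValueError(f"Unsupported audio format {name!r}; expected one of: {supported}")
    return key
```

Failing early on the client side avoids a round trip for a request that would be rejected anyway.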

Voice Samples

Female Speakers

Anushka

Vidya

Manisha

Arya

Anushka – Clear and Professional

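Audio Text: सरवम एआई की टेक्स्ट-टू-स्पीच सेवा 11 भारतीय भाषाओं में प्राकृतिक और पेशेवर आवाज़ें प्रदान करती है, जो विविध उपयोग मामलों के लिए उपयुक्त हैं। (English: Sarvam AI's text-to-speech service provides natural and professional voices in 11 Indian languages, suitable for a variety of use cases.)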

Best Used For: Audiobooks, Professional Narration, Corporate Training

Male Speakers

Abhilash

Karun

Hitesh

Abhilash – Deep and Authoritative

Audio Text: Warning. Unusual activity detected in Zone 7. Immediate verification is required to maintain system integrity. Proceed with caution.

Best Used For: Security Systems, Announcements, Documentaries

Try it yourself: You can explore different speakers, languages, and styles directly on the Sarvam Dashboard. Generate your own audio samples and experiment with custom input!

Next Steps

Choose Your API

Select the appropriate API type based on your use case.

Get API Key

Go Live

Deploy your integration and monitor usage in the dashboard.

Need help choosing the right API? Contact us on Discord for guidance.