Text-to-Speech Overview | Sarvam API Docs

Sarvam AI offers a powerful text-to-speech model:

Advanced text-to-speech model with multiple voices, code-mixing support, and high-quality natural speech synthesis for Indian languages.

View our pricing page for detailed information about model-specific pricing and usage tiers.

API Types

Generate speech for short text with immediate response. Best for quick conversions up to 1000 characters.

Stream long or live text into speech with low latency. Ideal for real-time playback, WebSocket-based async use, and efficient resource handling.
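As a rough illustration of the choice between the two API types, the sketch below routes text by length and liveness, and splits longer text into pieces that fit the short-text limit. The 1000-character limit comes from the description above. The function names `choose_api_type` and `split_into_chunks` and the labels `"rest"` and `"streaming"` are hypothetical and are not part of any Sarvam SDK.

```python
# Assumption: 1000 characters is the ceiling for the immediate-response API,
# as stated in this overview. All names here are illustrative only.
SHORT_TEXT_LIMIT = 1000


def choose_api_type(text: str, is_live: bool = False) -> str:
    """Return "rest" for short one-off text, "streaming" for long or live text."""
    if is_live or len(text) > SHORT_TEXT_LIMIT:
        return "streaming"
    return "rest"


def split_into_chunks(text: str, limit: int = SHORT_TEXT_LIMIT) -> list[str]:
    """Split text into pieces of at most `limit` characters, breaking at spaces when possible."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Look for the last space that keeps the chunk within the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            # No space in the window: fall back to a hard cut.
            cut = limit
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Chunking is only one option for long input. The streaming API described above is likely the better fit when playback should begin before the full text is available.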

The TTS API supports more than 8 major audio formats, along with their MIME type variants.

Check out our detailed API Reference to explore Text To Speech Generation and all available options.

Select the appropriate API type based on your use case.

Deploy your integration and monitor usage in the dashboard.

Need help choosing the right API? Contact us on Discord for guidance.