Text-to-Speech Quickstart

Sarvam AI offers one text-to-speech model, Bulbul.

View our pricing page for detailed information about model-specific pricing and usage tiers.

Bulbul: Our Text to Speech Model

Bulbul is our state-of-the-art text-to-speech model that excels in generating natural-sounding speech with support for multiple Indian languages, code-mixing, and various voice options.

Text to Speech Features

Basic Text to Speech Synthesis

Convert text to natural-sounding speech with high quality. Features include:

  • Multiple voice options
  • Support for Indian languages
  • Natural prosody and intonation
  • High-quality audio output

    from sarvamai import SarvamAI

    # Create a client authenticated with your API subscription key
    client = SarvamAI(
        api_subscription_key="YOUR_API_SUBSCRIPTION_KEY"
    )

    # Synthesize speech for one input text with the Bulbul v2 model
    response = client.text_to_speech.convert(
        inputs=["Welcome to Sarvam AI!"],
        model="bulbul:v2",
        speaker="anushka"
    )

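
The call above returns a response object, but this section does not show how to turn it into an audio file. The following is a minimal sketch, assuming the response carries one base64-encoded audio string per input (for example in a field such as `audios`) and that the decoded bytes are WAV data. Both points are assumptions to check against the API reference. The helper name `save_base64_audios` is hypothetical and not part of the SDK.

```python
import base64
from pathlib import Path
from typing import Iterable, List


def save_base64_audios(
    audios: Iterable[str],
    out_dir: str = ".",
    prefix: str = "output",
) -> List[Path]:
    """Decode base64 audio strings and write each one to its own file.

    Assumption: each string decodes to WAV bytes, hence the .wav suffix.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    written: List[Path] = []
    for index, encoded in enumerate(audios):
        path = target / f"{prefix}_{index}.wav"
        path.write_bytes(base64.b64decode(encoded))
        written.append(path)
    return written
```

If the response does expose such a list, usage might look like `save_base64_audios(response.audios, out_dir="tts_out")`, where `response.audios` is the assumed field name.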
Key Considerations

  • Text length limit: 500 characters per input
  • Maximum 3 texts per API call
  • For numbers with more than 4 digits, use commas (e.g., ‘10,000’)
  • Enable preprocessing for better mixed-language handling
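
The length, batch-size, and number-formatting limits above can be handled before calling the API. The sketch below is one possible approach, not an official utility: the helper names (`add_thousands_separators`, `split_text`, `batch_inputs`) are hypothetical, the comma grouping follows the Western thousands style as in the ‘10,000’ example, and splitting happens at whitespace, which may not match sentence boundaries.

```python
import re
from typing import Iterator, List

MAX_CHARS_PER_INPUT = 500  # per-input character limit stated above
MAX_INPUTS_PER_CALL = 3    # per-call input limit stated above

# Whole numbers with 5+ digits, skipping decimal parts and leading zeros
_LONG_NUMBER = re.compile(r"(?<![\d.,])[1-9]\d{4,}(?!\d)")


def add_thousands_separators(text: str) -> str:
    """Insert commas into numbers with more than 4 digits."""
    return _LONG_NUMBER.sub(lambda m: f"{int(m.group()):,}", text)


def split_text(text: str, limit: int = MAX_CHARS_PER_INPUT) -> List[str]:
    """Split text at whitespace into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into hard slices.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def batch_inputs(
    chunks: List[str], size: int = MAX_INPUTS_PER_CALL
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` chunks, one per API call."""
    for start in range(0, len(chunks), size):
        yield chunks[start:start + size]
```

Each list yielded by `batch_inputs(split_text(add_thousands_separators(text)))` could then be passed as the `inputs` argument of `client.text_to_speech.convert`. Note that the number helper also rewrites long digit runs such as phone numbers or IDs, so apply it only where the digits represent quantities.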