Text-to-Speech Quickstart

Sarvam AI offers one text-to-speech model, Bulbul, which is described below.

View our pricing page for detailed information about model-specific pricing and usage tiers.

Bulbul: Our Text to Speech Model

Bulbul is our text-to-speech model. It generates natural-sounding speech and supports multiple Indian languages, code-mixed text (input that switches between languages), and a choice of voices.

Text to Speech Features

Basic Text to Speech Synthesis

Convert text to natural-sounding speech with high quality. Features include:

  • Multiple voice options
  • Support for Indian languages
  • Natural prosody and intonation
  • High-quality audio output

from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_API_SUBSCRIPTION_KEY")

# Convert text to speech
audio = client.text_to_speech.convert(
    target_language_code="en-IN",
    text="Welcome to Sarvam AI!",
    model="bulbul:v2",
    speaker="anushka"
)
save(audio, "output1.wav")

Key Considerations

  • Text length limit: 500 characters per input
  • Maximum 3 texts per API call
  • For numbers > 4 digits, use commas (e.g., ‘10,000’)
  • Enable preprocessing for better mixed-language handling
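
The limits above can be handled on the client side before calling the API. The following is a minimal sketch in plain Python, not part of the Sarvam SDK: the helper names (`add_thousands_separators`, `split_text`, `group_for_calls`) are hypothetical, the comma grouping assumes the Western style shown in the ‘10,000’ example, and splitting on whitespace is only one possible chunking strategy.

```python
import re

MAX_CHARS = 500          # per-input character limit stated above
MAX_TEXTS_PER_CALL = 3   # per-call text limit stated above


def add_thousands_separators(text: str) -> str:
    """Insert commas into bare integers of 5 or more digits.

    Assumption: such digit runs are quantities. Phone numbers, IDs, and
    values with leading zeros should be excluded by the caller.
    """
    pattern = r"(?<![\d.,])\d{5,}(?!\d)"
    return re.sub(pattern, lambda m: f"{int(m.group()):,}", text)


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of at most `limit` characters."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into hard slices.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def group_for_calls(texts: list[str], size: int = MAX_TEXTS_PER_CALL) -> list[list[str]]:
    """Group texts so that no group exceeds the per-call limit."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]
```

A typical flow would be to run `add_thousands_separators` on the full text, pass the result to `split_text`, and then send each chunk (or each group from `group_for_calls`) in its own request. Splitting at sentence boundaries instead of word boundaries would likely give more natural prosody at chunk edges, at the cost of a slightly more involved splitter.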