Text-to-Speech REST API

Synchronous Processing

Convert text to speech and receive the audio in the same response. Best suited to quick conversions and testing. Features include:

  • Instant audio generation
  • Multiple voice options
  • Support for SSML
  • Various audio formats

API Features

Key Features
  • Support for code-mixed text
  • Multiple speaker voices
  • Adjustable speech parameters
  • High-quality audio output
Output Format
  • WAV file format
  • Base64 encoded string
  • Configurable sample rates
  • Multiple quality options
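
Since the audio is returned as a Base64 encoded string containing WAV data, a client has to decode it before saving or playing it. The following is a minimal sketch using only the Python standard library. The helper names (`decode_wav`, `wav_sample_rate`, `make_silent_wav_b64`) are hypothetical and not part of any SDK, and the silent clip merely stands in for a real API response.

```python
import base64
import io
import wave


def decode_wav(audio_b64: str) -> bytes:
    """Decode a Base64 string into WAV bytes and sanity-check the header."""
    wav_bytes = base64.b64decode(audio_b64)
    # RIFF/WAVE files start with b"RIFF" and carry b"WAVE" at byte offset 8.
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("decoded payload is not a WAV file")
    return wav_bytes


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate from the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getframerate()


def make_silent_wav_b64(sample_rate: int = 22050, n_frames: int = 100) -> str:
    """Build a tiny silent mono WAV in memory (a stand-in for a response)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * n_frames)
    return base64.b64encode(buffer.getvalue()).decode("ascii")


if __name__ == "__main__":
    encoded = make_silent_wav_b64(sample_rate=16000)
    wav_bytes = decode_wav(encoded)
    with open("decoded.wav", "wb") as out_file:
        out_file.write(wav_bytes)
    print("sample rate:", wav_sample_rate(wav_bytes))
```

Checking the header before writing the file catches the case where an error message, rather than audio, was decoded.
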
Speech Parameters
  • Pitch control
  • Speech rate adjustment
  • Volume control
  • Language selection
Integration
  • Simple REST API
  • Multiple language SDKs
  • Comprehensive documentation
  • Easy-to-follow examples
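
As a rough illustration of calling the REST API directly, the sketch below builds (but does not send) a POST request with the speech parameters listed above. The endpoint URL, the `api-subscription-key` header name, the JSON field names, and the neutral default values are all assumptions modeled on the SDK call shown later in this document; confirm them against the API reference before relying on them. `build_tts_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed endpoint; verify against the official API reference.
TTS_URL = "https://api.sarvam.ai/text-to-speech"


def build_tts_request(api_key, text, language_code="en-IN", speaker="anushka",
                      pitch=0.0, pace=1.0, loudness=1.0):
    """Build a POST request carrying the text and speech parameters as JSON."""
    payload = {
        # Field names below are assumptions mirroring the SDK arguments.
        "text": text,
        "target_language_code": language_code,
        "speaker": speaker,
        "model": "bulbul:v2",
        "pitch": pitch,        # assumed neutral default
        "pace": pace,          # assumed neutral default
        "loudness": loudness,  # assumed neutral default
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,  # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_tts_request("YOUR_API_SUBSCRIPTION_KEY", "Welcome to Sarvam AI!")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the payload easy to inspect and test offline.
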

Model Information

Bulbul v2

Our flagship text-to-speech model designed for Indian languages and accents.

Key Features:

  • Natural-sounding speech with human-like prosody
  • Multiple voice personalities
  • Multi-language and code-mixed text support
  • Real-time synthesis capabilities
  • Fine-grained control over pitch, pace, and loudness
Language Support

Supports 11 Indian languages, identified by BCP-47 codes:

  • English (en-IN)
  • Hindi (hi-IN)
  • Bengali (bn-IN)
  • Tamil (ta-IN)
  • Telugu (te-IN)
  • Kannada (kn-IN)
  • Malayalam (ml-IN)
  • Marathi (mr-IN)
  • Gujarati (gu-IN)
  • Punjabi (pa-IN)
  • Odia (or-IN)
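
A client can check a language code locally before making a request. The sketch below copies the codes verbatim from the list above; `SUPPORTED_LANGUAGES` and `resolve_language_code` are hypothetical names, and the values the API actually accepts should be checked against its reference.

```python
# Codes copied from the supported-languages list above.
SUPPORTED_LANGUAGES = {
    "en-IN": "English",
    "hi-IN": "Hindi",
    "bn-IN": "Bengali",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "mr-IN": "Marathi",
    "gu-IN": "Gujarati",
    "pa-IN": "Punjabi",
    "or-IN": "Odia",
}


def resolve_language_code(code: str) -> str:
    """Normalize casing (e.g. 'hi-in' -> 'hi-IN') and reject unknown codes."""
    parts = code.strip().split("-")
    if len(parts) != 2:
        raise ValueError(f"expected a code like 'hi-IN', got {code!r}")
    normalized = f"{parts[0].lower()}-{parts[1].upper()}"
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized


if __name__ == "__main__":
    code = resolve_language_code("ta-in")
    print(code, SUPPORTED_LANGUAGES[code])
```
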

Bulbul: Our Text to Speech Model

Bulbul is our state-of-the-art text-to-speech model that excels in generating natural-sounding speech with support for multiple Indian languages, code-mixing, and various voice options.

Text to Speech Features

Basic Text to Speech Synthesis

Convert text to natural-sounding speech with high quality. Features include:

  • Multiple voice options
  • Support for Indian languages
  • Natural prosody and intonation
  • High-quality audio output

from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_API_SUBSCRIPTION_KEY")

# Convert text to speech
audio = client.text_to_speech.convert(
    target_language_code="en-IN",
    text="Welcome to Sarvam AI!",
    model="bulbul:v2",
    speaker="anushka",
)
save(audio, "output1.wav")

Key Considerations
  • Text length limit: 500 characters per input
  • Maximum 3 texts per API call
  • For numbers with more than 4 digits, use commas (e.g., ‘10,000’)
  • Enable preprocessing for better mixed-language handling
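
The limits above can be handled on the client side before any request is made. The following is a minimal sketch, not an official utility: it adds commas to bare integers of five or more digits, splits long text on whitespace into chunks of at most 500 characters, and groups the chunks three per call. All function names are hypothetical, and Western-style digit grouping (as in the ‘10,000’ example) is an assumption.

```python
import re

MAX_CHARS = 500  # per-input limit stated above
MAX_TEXTS = 3    # per-call limit stated above


def add_thousands_separators(text: str) -> str:
    """Rewrite bare integers of 5+ digits with commas, e.g. 10000 -> 10,000."""
    # Skip digits that follow a digit, '.', or ',' and numbers with a leading 0.
    pattern = r"(?<![\d.,])[1-9]\d{4,}(?!\d)"
    return re.sub(pattern, lambda m: f"{int(m.group()):,}", text)


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split on whitespace into chunks of at most `limit` characters."""
    chunks, current = [], ""
    for word in text.split():
        # Hard-split any single word that exceeds the limit on its own.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def batch(chunks: list[str], size: int = MAX_TEXTS) -> list[list[str]]:
    """Group chunks so that each API call carries at most `size` texts."""
    return [chunks[i:i + size] for i in range(0, len(chunks), size)]


if __name__ == "__main__":
    prepared = add_thousands_separators("The prize is 250000 rupees. " * 40)
    for call_number, texts in enumerate(batch(split_text(prepared)), start=1):
        print(call_number, [len(t) for t in texts])
```

Splitting on whitespace keeps words intact, but it can still break a sentence in the middle; splitting on sentence boundaries first would likely give more natural prosody at the cost of a little more code.
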