Bulbul

Bulbul v3 is our latest text-to-speech model, designed for Indian languages and accents. It offers improved audio quality, 30+ speaker voices, and support for up to 2,500 characters per request.

At a Glance

Model ID: bulbul:v3 (legacy: bulbul:v2)
What it does: Text-to-speech with 30+ natural speaker voices and pace control
Languages: 11 (10 Indian + English); full list under Language Support
APIs: REST, HTTP streaming, WebSocket
Input limits: 2,500 characters per REST request; sample rates up to 48 kHz (REST/WebSocket); all limits under Limits
Pricing: see the Pricing page
Best for: Voice agents, IVR prompts, narration, and audio content
Known limitations: see Known Limitations below

Key Features

30+ Speaker Voices

Wide selection of natural-sounding voices including Shubh, Aditya, Ritu, Simran, Anand, Roopa, Priya, and more.

Extended Character Limit

Support for up to 2500 characters per request for longer content generation.

Sample Rate Options

Multiple sample rates: 8 kHz, 16 kHz, 22.05 kHz, and 24 kHz (default). Higher rates (32 kHz, 44.1 kHz, 48 kHz) are available with bulbul:v3 on the REST and WebSocket APIs only.

Language Support

Support for 11 languages (10 Indian + English) with BCP-47 codes. The target language code is primarily used by the pre-TTS text normalization model.

Natural Prosody

Human-like speech patterns with natural intonation and emotional expression.

Pace Control

Adjustable speech speed from 0.5x to 2.0x for customized delivery.

Language Support

Bulbul v3 supports the following 11 languages (10 Indian languages plus English):

Hindi (hi-IN), Bengali (bn-IN), Tamil (ta-IN), Telugu (te-IN), Gujarati (gu-IN), Kannada (kn-IN), Malayalam (ml-IN), Marathi (mr-IN), Punjabi (pa-IN), Odia (od-IN), English (en-IN)
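As a convenience, the list above can be kept as a small lookup table so that application code passes a valid code as `target_language_code`. This is only a sketch: `LANGUAGE_CODES` and `language_code` are hypothetical names introduced here, not part of the SDK.

```python
# Language names and codes copied from the list above.
LANGUAGE_CODES = {
    "Hindi": "hi-IN",
    "Bengali": "bn-IN",
    "Tamil": "ta-IN",
    "Telugu": "te-IN",
    "Gujarati": "gu-IN",
    "Kannada": "kn-IN",
    "Malayalam": "ml-IN",
    "Marathi": "mr-IN",
    "Punjabi": "pa-IN",
    "Odia": "od-IN",
    "English": "en-IN",
}


def language_code(name: str) -> str:
    """Return the code for a language name (hypothetical helper, case-insensitive)."""
    wanted = name.strip().lower()
    for known, code in LANGUAGE_CODES.items():
        if known.lower() == wanted:
            return code
    raise ValueError(f"Unsupported language: {name!r}")
```

The returned string can then be used wherever the basic example passes `target_language_code`.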

Available Speakers

Bulbul v3 offers 30+ speaker voices:

Speakers: Shubh (default), Aditya, Ritu, Priya, Neha, Rahul, Pooja, Rohan, Simran, Kavya, Amit, Dev, Ishita, Shreya, Ratan, Varun, Manan, Sumit, Roopa, Kabir, Aayan, Ashutosh, Advait, Anand, Tanya, Tarun, Sunny, Mani, Gokul, Vijay, Shruti, Suhani, Mohit, Kavitha, Rehan, Soham, Rupali

Use the speaker parameter to select specific voices for your use case. Each speaker has unique characteristics suitable for different applications.

Key Capabilities

Convert text to speech with default settings. This is the simplest way to get started with Bulbul v3.

from sarvamai import SarvamAI
from sarvamai.play import play, save

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY"
)

response = client.text_to_speech.convert(
    text="Hello, how are you today?",
    target_language_code="en-IN",
    model="bulbul:v3"
)

# Play the audio
play(response)

# Save the response to a file
save(response, "output.wav")

Limits

Max characters per request (REST): 2,500
pace: 0.5–2.0 (bulbul:v3); 0.3–3.0 (bulbul:v2)
pitch: -1.0 to 1.0; suitable range -0.75 to 0.75 (bulbul:v2 only)
loudness: 0.1–3.0 (bulbul:v2 only)
speech_sample_rate: 8000, 16000, 22050, or 24000 Hz; plus 32000, 44100, or 48000 Hz (REST and WebSocket only). Default: 24000 (v3), 22050 (v2)
Rate limits: see Rate Limits
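The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: `check_request` and the `api` labels ("rest", "websocket", "http_streaming") are names invented for this example, and it assumes that the character limit counts Python string length and that the high sample rates require bulbul:v3.

```python
# Values copied from the Limits section above.
PACE_RANGE = {"bulbul:v3": (0.5, 2.0), "bulbul:v2": (0.3, 3.0)}
BASE_SAMPLE_RATES = {8000, 16000, 22050, 24000}
HIGH_SAMPLE_RATES = {32000, 44100, 48000}  # REST and WebSocket only
MAX_CHARS_REST = 2500


def check_request(text, pace, sample_rate, model="bulbul:v3", api="rest"):
    """Return a list of problems; an empty list means the request is within limits.

    Hypothetical helper. `api` is "rest", "websocket", or "http_streaming".
    """
    if model not in PACE_RANGE:
        return [f"unknown model: {model}"]

    problems = []
    low, high = PACE_RANGE[model]
    if not low <= pace <= high:
        problems.append(f"pace {pace} is outside {low}-{high} for {model}")

    if sample_rate in HIGH_SAMPLE_RATES:
        # Assumption: high rates need bulbul:v3 on REST or WebSocket.
        if model != "bulbul:v3" or api not in ("rest", "websocket"):
            problems.append(f"{sample_rate} Hz is not available for {model} on {api}")
    elif sample_rate not in BASE_SAMPLE_RATES:
        problems.append(f"unsupported sample rate: {sample_rate}")

    # The 2,500-character limit is documented for REST requests only.
    if api == "rest" and len(text) > MAX_CHARS_REST:
        problems.append(f"text has {len(text)} characters; REST limit is {MAX_CHARS_REST}")

    return problems
```

For example, `check_request("Hello", pace=1.0, sample_rate=48000, api="http_streaming")` reports one problem, because HTTP streaming is capped at 24 kHz.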

Known Limitations

No SSML support
Detail: Bulbul does not support SSML tags for fine-grained prosody control.
Workaround: Use pace for coarse control; split text at natural pause points.

Romanised Indic input degrades quality
Detail: Transliterated input (e.g., "Aapka order confirm ho gaya hai") significantly reduces output quality.
Workaround: Always use native script for Indic words (e.g., "आपका order confirm हो गया है").

High sample rates not available on HTTP streaming
Detail: 32 kHz, 44.1 kHz, and 48 kHz are available via the REST and WebSocket APIs with bulbul:v3; HTTP streaming is capped at 24 kHz.
Workaround: Use ≤ 24 kHz for HTTP streaming.
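Splitting text at natural pause points, while staying under the 2,500-character REST limit, might look like the sketch below. `split_text` is a hypothetical helper, not an SDK function. It treats `.`, `!`, `?`, and the danda (`।`) as sentence ends, which is an assumption about what counts as a natural pause, and it falls back to a hard cut for any single sentence longer than the limit.

```python
import re

MAX_CHARS = 2500  # REST per-request limit from the Limits section


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A sentence longer than the limit is cut at the character limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # start a new chunk with this sentence
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent as its own `client.text_to_speech.convert` call, as in the basic example; joining the resulting audio is left out of this sketch.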

Next Steps