Bulbul

Bulbul-v2 is our flagship text-to-speech model, designed specifically for Indian languages and accents. It produces natural-sounding speech with human-like prosody, offers multiple voice personalities, and handles both multilingual and code-mixed text.

Key Features

Voice Control

Fine-grained control over pitch (-0.75 to 0.75), pace (0.3 to 3), and loudness (0.1 to 3) for customized voice output.
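As a sketch of how these ranges might be enforced on the client side, the hypothetical helper below (validate_voice_controls is not part of the sarvamai SDK) rejects out-of-range values before any request is sent. The defaults of 0.0 for pitch and 1.0 for pace and loudness are assumed neutral values, not confirmed API defaults.

```python
# Documented voice-control ranges for Bulbul-v2 (inclusive bounds).
VOICE_CONTROL_RANGES = {
    "pitch": (-0.75, 0.75),
    "pace": (0.3, 3.0),
    "loudness": (0.1, 3.0),
}


def validate_voice_controls(pitch: float = 0.0, pace: float = 1.0, loudness: float = 1.0) -> dict:
    """Check each control against its documented range and return them as a dict.

    Hypothetical helper. The defaults are assumed neutral values, not confirmed
    API defaults. Raises ValueError naming the first out-of-range control.
    """
    values = {"pitch": pitch, "pace": pace, "loudness": loudness}
    for name, value in values.items():
        low, high = VOICE_CONTROL_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the supported range [{low}, {high}]")
    return values
```

For example, validate_voice_controls(pitch=0.2, pace=1.1) returns the three values as a dict, while validate_voice_controls(pace=3.5) raises ValueError.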

Sample Rate Options

Supported sample rates: 8 kHz, 16 kHz, 22.05 kHz, and 24 kHz, to suit different quality requirements.
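Higher sample rates give better fidelity but larger payloads. The hypothetical helper below estimates the raw audio size for each supported rate; it assumes uncompressed 16-bit mono PCM, which is an assumption about the output format rather than something stated on this page.

```python
# Sample rates listed for Bulbul-v2, in Hz.
SUPPORTED_SAMPLE_RATES_HZ = (8000, 16000, 22050, 24000)


def pcm_size_bytes(duration_s: float, sample_rate_hz: int,
                   bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Estimate the raw PCM payload size in bytes.

    Hypothetical helper; the defaults assume 16-bit (2-byte) mono audio.
    """
    if sample_rate_hz not in SUPPORTED_SAMPLE_RATES_HZ:
        raise ValueError(f"Unsupported sample rate: {sample_rate_hz} Hz")
    # size = seconds * samples per second * bytes per sample * channels
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)
```

For example, pcm_size_bytes(1, 16000) returns 32000, so under these assumptions one second at 16 kHz is twice the size of one second at 8 kHz.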

Text Preprocessing

Smart normalization of numbers, dates, and mixed-language text for improved pronunciation.

Language Support

Support for 11 languages (10 Indian languages plus Indian English), each identified by a BCP-47-style language code, with natural accent handling.

Natural Prosody

Human-like speech patterns with natural intonation and emotional expression.

Real-time Synthesis

Fast, real-time text-to-speech conversion suitable for live applications.

Language Support

Bulbul supports the following Indian languages:

Languages (Code):

Hindi (hi-IN), Bengali (bn-IN), Tamil (ta-IN), Telugu (te-IN), Gujarati (gu-IN), Kannada (kn-IN), Malayalam (ml-IN), Marathi (mr-IN), Punjabi (pa-IN), Odia (od-IN), English (en-IN)
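If your application stores language names rather than codes, a small lookup table helps avoid typos in target_language_code. The sketch below is a hypothetical helper built from the list above; it is not part of the sarvamai SDK.

```python
# Language names and codes as listed for Bulbul-v2.
# Odia uses "od-IN" as listed here, not the ISO 639-1 based "or-IN".
BULBUL_LANGUAGE_CODES = {
    "Hindi": "hi-IN",
    "Bengali": "bn-IN",
    "Tamil": "ta-IN",
    "Telugu": "te-IN",
    "Gujarati": "gu-IN",
    "Kannada": "kn-IN",
    "Malayalam": "ml-IN",
    "Marathi": "mr-IN",
    "Punjabi": "pa-IN",
    "Odia": "od-IN",
    "English": "en-IN",
}


def language_code(name: str) -> str:
    """Return the Bulbul language code for a language name (case-insensitive)."""
    lookup = {lang.lower(): code for lang, code in BULBUL_LANGUAGE_CODES.items()}
    try:
        return lookup[name.strip().lower()]
    except KeyError:
        supported = ", ".join(BULBUL_LANGUAGE_CODES)
        raise ValueError(f"Unsupported language {name!r}; choose one of: {supported}") from None
```

For example, language_code("tamil") returns "ta-IN".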

Each supported language offers multiple speaker voices with different characteristics. Use the speaker parameter to select the voice that best fits your use case.

Key Capabilities

Convert text to speech with the default voice settings (with text preprocessing enabled). This is the simplest way to get started with the Bulbul API.

from sarvamai import SarvamAI
from sarvamai.play import play, save

# Create a client with your API subscription key
client = SarvamAI(
    api_subscription_key="YOUR_API_SUBSCRIPTION_KEY"
)

# Synthesize speech; enable_preprocessing normalizes numbers, dates, and mixed-language text
response = client.text_to_speech.convert(
    text="Hello, how are you today?",
    target_language_code="en-IN",
    enable_preprocessing=True
)

# Play the audio
play(response)

# Save the response to a file
save(response, "output.wav")
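The example above relies on default voice settings. To choose a voice with the speaker parameter or adjust the voice controls, one approach is to assemble the keyword arguments first and pass them to client.text_to_speech.convert. The sketch below is a hypothetical helper (build_tts_kwargs is not part of the sarvamai SDK). It assumes the convert call accepts keyword arguments named speaker, pitch, pace, loudness, and speech_sample_rate, so check those names against the API reference before relying on them. Speaker names are not listed on this page, so "your-speaker-name" is a placeholder.

```python
from typing import Optional


def build_tts_kwargs(
    text: str,
    target_language_code: str,
    speaker: Optional[str] = None,
    pitch: Optional[float] = None,
    pace: Optional[float] = None,
    loudness: Optional[float] = None,
    speech_sample_rate: Optional[int] = None,
    enable_preprocessing: bool = True,
) -> dict:
    """Assemble keyword arguments for client.text_to_speech.convert.

    Hypothetical helper. The names speaker, pitch, pace, loudness, and
    speech_sample_rate are assumed to match the API's parameters. Settings
    left as None are omitted so the API's own defaults apply.
    """
    kwargs = {
        "text": text,
        "target_language_code": target_language_code,
        "enable_preprocessing": enable_preprocessing,
    }
    optional = {
        "speaker": speaker,
        "pitch": pitch,
        "pace": pace,
        "loudness": loudness,
        "speech_sample_rate": speech_sample_rate,
    }
    # Forward only the settings the caller actually chose.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs


# Usage with the client from the example above (makes a network call, so shown as a comment):
# response = client.text_to_speech.convert(**build_tts_kwargs(
#     "नमस्ते, आप कैसे हैं?", "hi-IN", speaker="your-speaker-name",
#     pace=1.1, speech_sample_rate=16000))
```

Omitting unset values, rather than sending explicit defaults, keeps the request minimal and avoids hard-coding default values that this page does not state.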

Next Steps