Bulbul

Model Variants

Bulbul-v2

Overview

Bulbul-v2 is our flagship text-to-speech model, specifically designed for Indian languages and accents. It excels in:

  • Natural-sounding speech with human-like prosody
  • Multiple voice personalities
  • Multi-language and code-mixed text support
  • Real-time synthesis capabilities
  • Fine-grained control over pitch, pace, and loudness

Features

Voice Control

Fine-grained control over pitch (-1 to 1), pace (0.3 to 3), and loudness (0.1 to 3)
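
The ranges above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Sarvam SDK: the names `VOICE_CONTROL_RANGES` and `validate_voice_controls` are hypothetical, the bounds are copied from the ranges listed above and assumed to be inclusive, and the default values (0.0, 1.0, 1.0) are assumed neutral settings rather than documented defaults.

```python
# Hypothetical helper, not part of the sarvamai SDK.
# Bounds come from the documented ranges and are assumed to be inclusive.
VOICE_CONTROL_RANGES = {
    "pitch": (-1.0, 1.0),
    "pace": (0.3, 3.0),
    "loudness": (0.1, 3.0),
}


def validate_voice_controls(pitch=0.0, pace=1.0, loudness=1.0):
    """Return the settings as a dict, or raise ValueError if any is out of range.

    The default values are assumptions chosen for illustration.
    """
    settings = {"pitch": pitch, "pace": pace, "loudness": loudness}
    for name, value in settings.items():
        low, high = VOICE_CONTROL_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the range {low} to {high}")
    return settings
```

Validating locally gives a readable error message without spending a network round trip on a request that would likely be rejected.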

Sample Rate Options

Multiple sample rates: 8kHz, 16kHz, 22.05kHz, 24kHz
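
When audio from another pipeline uses a rate that is not in this list, one option is to pick the closest listed rate. This is a small sketch under that assumption: `SUPPORTED_SAMPLE_RATES_HZ` and `nearest_supported_rate` are hypothetical names, and the values are the four rates above converted to Hz.

```python
# The four documented sample rates, expressed in Hz.
SUPPORTED_SAMPLE_RATES_HZ = (8000, 16000, 22050, 24000)


def nearest_supported_rate(requested_hz):
    """Return the documented sample rate closest to requested_hz.

    Hypothetical helper, not part of the sarvamai SDK. If two rates are
    equally close, the lower one is returned because it comes first in the tuple.
    """
    return min(SUPPORTED_SAMPLE_RATES_HZ, key=lambda rate: abs(rate - requested_hz))
```

For example, `nearest_supported_rate(44100)` selects 24000, the highest rate in the list.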

Text Preprocessing

Smart normalization of numbers, dates, and mixed-language text

Language Support

Support for 11 Indian languages with BCP-47 codes
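
A BCP-47 code of the kind used on this page, such as `en-IN`, joins a language subtag and a region subtag with a hyphen. The sketch below checks only that shape and normalizes the casing. It does not check a code against the 11 supported languages, which this section does not list, and the name `split_language_code` is hypothetical.

```python
import re

# Simplified pattern: a 2-3 letter language subtag, a hyphen, a 2-letter region subtag.
# Full BCP-47 allows more forms (scripts, variants); this covers only tags shaped like "en-IN".
_TAG_PATTERN = re.compile(r"([A-Za-z]{2,3})-([A-Za-z]{2})")


def split_language_code(code):
    """Return (language, region) in conventional casing, or raise ValueError.

    Hypothetical helper, not part of the sarvamai SDK.
    """
    match = _TAG_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a language-region tag: {code!r}")
    language, region = match.groups()
    return language.lower(), region.upper()
```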

Key Capabilities

Convert text to speech with default settings. This is the simplest way to get started with the Bulbul API.

from sarvamai import SarvamAI
from sarvamai.play import play, save

client = SarvamAI(
    api_subscription_key="YOUR_API_SUBSCRIPTION_KEY"
)

response = client.text_to_speech.convert(
    inputs=["Hello, how are you today?"],
    target_language_code="en-IN",
    enable_preprocessing=True
)
play(response)

# Save the response to a file
save(response, "output.wav")