Text-to-Speech Conversion using Bulbul Model

Overview

This guide demonstrates how to convert text into speech using the Sarvam AI Text-to-Speech API. The resulting audio files are saved as .wav files.

🛠 Prerequisites

Before running this, ensure you have:

  • Python 3.7 or higher
  • Python packages: sarvamai

Install the required package using pip:

$ pip install sarvamai

📦 Import Required Libraries

from sarvamai import SarvamAI
from sarvamai.play import play, save

🔑 Set Up Your API Key

To use the TTS Bulbul API:

  1. Sign up at Sarvam AI Dashboard to get your API key.
  2. Replace the placeholder key in the code.

SARVAM_API_KEY = "YOUR_SARVAM_API_KEY"
client = SarvamAI(api_subscription_key=SARVAM_API_KEY)
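
If you would rather not paste the key into source code, one option is to read it from an environment variable. The snippet below is a minimal sketch of that idea: `load_api_key` is a hypothetical helper (not part of the sarvamai package), and the variable name `SARVAM_API_KEY` is an assumption you can change.

```python
import os


def load_api_key(env=os.environ, name="SARVAM_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing or blank.

    Hypothetical helper for illustration; `env` can be any mapping,
    which makes the function easy to test.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key


# Usage with the client from this guide (requires the sarvamai package):
# client = SarvamAI(api_subscription_key=load_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error returned later by the API.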

📄 Example Text Input

text = """
Netaji Subhash Marg से Dayanand Road की तरफ, south की तरफ़ जाने से शुरू करें।
Dayanand Road पर पहुँचने के बाद, बाएँ मुड़ जाएँ। 350 meters तक सीधा चलते रहें।
आपको बायें तरफ़, United Bank of India ATM दिखेगा।
Dayanand School के दाएँ तरफ़ से गुजरने के बाद, बाएँ मुड़ें।
120 meters के बाद, Ghata Masjid Road पर, right turn करें।
280 meters तक चलते रहें।
Mahatma Gandhi Marg पे रहें और, 2.9 kilometers तक Old Delhi की तरफ जाएँ।
फिर, HC Sen Marg पर continue करें, और Paranthe Wali Gali तक drive करें।
"""

⚙️ API Parameters

Parameter              Description
target_language_code   Language of the input text (e.g., hi-IN)
speaker                Voice used: Female - Anushka, Manisha, Vidya, Arya; Male - Abhilash, Karun, Hitesh
pitch                  Pitch adjustment: -0.75 to 0.75 (default: 0.0)
pace                   Speed control: 0.5 to 2.0 (default: 1.0)
loudness               Volume: 0.3 to 3.0 (default: 1.0)
speech_sample_rate     Output sample rate: 8000, 16000, 22050, or 24000 Hz
enable_preprocessing   Normalize English/numeric entities (default: false)
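
The ranges listed above can be checked locally before a request is sent. The following is a minimal sketch, not part of the sarvamai package: `build_tts_options` is a hypothetical helper that validates the numeric parameters against the documented ranges and returns them as keyword arguments. It assumes the SDK's convert call accepts these parameters under the same names as in the list above.

```python
# Sample rates listed in the parameter table above.
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000}


def build_tts_options(pitch=0.0, pace=1.0, loudness=1.0, speech_sample_rate=None):
    """Validate optional TTS settings and return them as a dict of keyword arguments.

    Hypothetical helper for illustration; the ranges come from the table above.
    """
    if not -0.75 <= pitch <= 0.75:
        raise ValueError("pitch must be between -0.75 and 0.75")
    if not 0.5 <= pace <= 2.0:
        raise ValueError("pace must be between 0.5 and 2.0")
    if not 0.3 <= loudness <= 3.0:
        raise ValueError("loudness must be between 0.3 and 3.0")

    options = {"pitch": pitch, "pace": pace, "loudness": loudness}

    # Only send a sample rate when the caller chose one explicitly.
    if speech_sample_rate is not None:
        if speech_sample_rate not in ALLOWED_SAMPLE_RATES:
            raise ValueError("speech_sample_rate must be 8000, 16000, 22050, or 24000")
        options["speech_sample_rate"] = speech_sample_rate
    return options


# Usage (requires the sarvamai package and a configured client):
# response = client.text_to_speech.convert(
#     text=text,
#     target_language_code="hi-IN",
#     speaker="anushka",
#     **build_tts_options(pace=1.2, speech_sample_rate=16000),
# )
```

Validating on the client side catches out-of-range values before a network round trip; the API remains the final authority on what it accepts.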

🔁 Convert Text to Speech

response = client.text_to_speech.convert(
    text=text,  # the example text defined above
    target_language_code="hi-IN",
    speaker="anushka",
    enable_preprocessing=True,
)

▶️ Play or 💾 Save Audio

To play the output:

play(response)

To save the output:

save(response, "output.wav")

📤 Output

Running the above code saves an output.wav file containing the speech.
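
To confirm that the saved file looks as expected, you can inspect it with Python's standard `wave` module. This is a minimal sketch that assumes output.wav is an uncompressed PCM WAV file (the only kind `wave` reads); `describe_wav` is a hypothetical helper name.

```python
import wave


def describe_wav(path):
    """Return the channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Usage, once output.wav exists:
# print(describe_wav("output.wav"))
```

The reported sample rate should match the speech_sample_rate you requested, if you set one.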

✅ Conclusion

This guide showed how to use Sarvam AI’s TTS API to convert Hindi text into lifelike speech. Customize the text, language, voice, and parameters to suit your application.


🛡️ Note: Keep your API key safe and avoid committing it in public repositories.

🚀 Keep Building!