Text-to-Speech Conversion using Sarvam AI API

This notebook demonstrates how to convert text into speech using the Sarvam AI Text-to-Speech API. The resulting audio is saved as .wav files.

Prerequisites

Before running this notebook, ensure you have the following installed:

  • Python 3.7 or higher
  • Required Python package: requests (base64 and wave ship with the Python standard library and need no installation)

You can install the required packages using pip:

!pip install requests

Import Required Libraries

First, let’s import all the necessary libraries.

import requests
import base64
import wave

Set Up the API Key

To use the Sarvam AI Text-to-Speech API, you need an API subscription key. Follow these steps to set up your API key:

  1. Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
  2. Replace the placeholder key: In the code below, replace “YOUR_SARVAM_AI_API_KEY” with your actual API key.

SARVAM_AI_API = "YOUR_SARVAM_AI_API_KEY"
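
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable. The variable name SARVAM_API_KEY and the helper load_api_key are assumptions made for illustration; they are not part of the Sarvam AI API.

```python
import os


def load_api_key(env_var="SARVAM_API_KEY", environ=None):
    """Return the API key stored in env_var, or raise if it is unset or empty.

    env_var is a hypothetical variable name chosen for this example.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

With the variable set in your shell, SARVAM_AI_API = load_api_key() would replace the hardcoded assignment.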

Set Up the API Endpoint and Headers

This section defines the API endpoint and the headers for the text-to-speech request. The headers carry your API key; the payload itself is built for each chunk of text in a later step.

# API endpoint and headers
url = "https://api.sarvam.ai/text-to-speech"
headers = {
    "Content-Type": "application/json",
    "api-subscription-key": SARVAM_AI_API  # Replace with your valid API key
}

Text to be converted into speech

text = """
Netaji Subhash Marg से Dayanand Road की तरफ, south की तरफ़ जाने से शुरू करें। Dayanand Road पर पहुँचने के बाद, बाएँ मुड़ जाएँ। 350 meters तक सीधा चलते रहें।आपको बायें तरफ़, United Bank of India ATM दिखेगा। Dayanand School के दाएँ तरफ़ से गुजरने के बाद, बाएँ मुड़ें।
120 meters के बाद, Ghata Masjid Road पर, right turn करें।
280 meters तक चलते रहें।
Mahatma Gandhi Marg पे रहें और, 2.9 kilometers तक Old Delhi की तरफ जाएँ।
फिर, HC Sen Marg पर continue करें, और Paranthe Wali Gali तक drive करें।
"""

Split Text into Chunks

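The Sarvam AI API may have a limit on the number of characters per request. To handle this, we split the text into chunks of 500 characters or less. Note that this simple slicing goes by character count only, so a chunk boundary can fall in the middle of a word or sentence.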

# Split the text into chunks of 500 characters or less
chunk_size = 500
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Print the number of chunks
print(f"Total chunks: {len(chunks)}")
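
Fixed-width slicing can cut a word or sentence in half, which may produce an audible glitch at the chunk boundary. The sketch below shows one possible alternative that prefers to break at sentence ends and falls back to a hard split only for sentences longer than the limit. The function name split_into_chunks is hypothetical, and the 500-character default simply mirrors the value used above rather than a documented API limit.

```python
import re


def split_into_chunks(text, max_chars=500):
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    # Split after sentence-ending punctuation (including the Devanagari danda) or at newlines.
    pieces = re.split(r"(?<=[.!?।])\s+|\n+", text)
    sentences = [s.strip() for s in pieces if s and s.strip()]

    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]

        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks
```

Calling chunks = split_into_chunks(text) would then stand in for the list comprehension, at the cost of chunks that are no longer all the same length.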

Process Each Chunk

Iterate over each chunk, send it to the Sarvam AI API, and save the resulting audio as a .wav file.

# Iterate over each chunk and make the API call
for i, chunk in enumerate(chunks):
    # Prepare the payload for the API request
    payload = {
        "inputs": [chunk],
        "target_language_code": "hi-IN",  # Target language code (Hindi, matching the sample text)
        "speaker": "neel",  # Speaker voice
        "model": "bulbul:v1",  # Model to use
        "pitch": 0,  # Pitch adjustment
        "pace": 1.0,  # Speed of speech
        "loudness": 1.0,  # Volume adjustment
        "enable_preprocessing": True,  # Enable text preprocessing
    }

    # Make the API request
    response = requests.post(url, json=payload, headers=headers)
    print(response.json())
    # Check if the request was successful
    if response.status_code == 200:
        # Decode the base64-encoded audio data
        audio = response.json()["audios"][0]

        audio = base64.b64decode(audio)

        # Save the audio as a .wav file
        with wave.open(f"output{i}.wav", "wb") as wav_file:
            # Set the parameters for the .wav file
            wav_file.setnchannels(1)  # Mono audio
            wav_file.setsampwidth(2)  # 2 bytes per sample
            wav_file.setframerate(22050)  # Sample rate of 22050 Hz

            # Write the audio data to the file
            wav_file.writeframes(audio)

        print(f"Audio file {i} saved successfully as 'output{i}.wav'!")
    else:
        # Handle errors
        print(f"Error for chunk {i}: {response.status_code}")
        print(response.json())
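
The request-and-decode step in the loop above can also be pulled into a small helper that sets a timeout and fails loudly when the response carries no audio. This is only a sketch: the name synthesize_chunk and the 30-second default timeout are assumptions, and the post argument is expected to be requests.post or any callable with the same interface.

```python
import base64


def synthesize_chunk(post, url, headers, payload, timeout=30):
    """Send one text-to-speech request and return the decoded audio bytes.

    post is any callable with the same interface as requests.post
    (hypothetical injection point, used here so the helper is easy to test).
    """
    response = post(url, json=payload, headers=headers, timeout=timeout)
    if response.status_code != 200:
        raise RuntimeError(
            f"Request failed with status {response.status_code}: {response.text}"
        )
    # The loop above reads the audio from the first entry of "audios".
    audios = response.json().get("audios") or []
    if not audios:
        raise RuntimeError("Response contained no audio data.")
    return base64.b64decode(audios[0])
```

Inside the loop, audio = synthesize_chunk(requests.post, url, headers, payload) would replace the request, status check, and decode lines, with errors surfacing as exceptions instead of printed messages.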

Output

After running the notebook, you will have one .wav file per chunk (e.g., output0.wav, output1.wav, etc.), each containing the speech for that chunk of text.
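
If a single audio file is more convenient than one file per chunk, the per-chunk files can be concatenated with the standard-library wave module. The sketch below assumes all inputs share the same channel count, sample width, and sample rate, as they do when written by the loop above; the name merge_wav_files and the output file name in the usage note are hypothetical.

```python
import wave


def merge_wav_files(input_paths, output_path):
    """Concatenate WAV files that share the same channels, sample width, and sample rate."""
    params = None
    frames = []
    for path in input_paths:
        with wave.open(path, "rb") as wav_in:
            current = (wav_in.getnchannels(), wav_in.getsampwidth(), wav_in.getframerate())
            if params is None:
                params = current
            elif current != params:
                raise ValueError(f"{path} has different audio parameters: {current} != {params}")
            frames.append(wav_in.readframes(wav_in.getnframes()))

    if params is None:
        raise ValueError("No input files given.")

    with wave.open(output_path, "wb") as wav_out:
        wav_out.setnchannels(params[0])
        wav_out.setsampwidth(params[1])
        wav_out.setframerate(params[2])
        wav_out.writeframes(b"".join(frames))
```

For example, merge_wav_files([f"output{i}.wav" for i in range(len(chunks))], "combined.wav") would join the chunk files in order.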

Conclusion

This notebook provides a step-by-step guide to converting text into speech using the Sarvam AI API. You can modify the text, language, and other parameters to suit your specific needs.

Additional Resources

For more details, refer to our official documentation. We are also available to support and help you on our Discord server.


Final Notes

  • Keep your API key secure.
  • Use clean, well-punctuated input text for best results.

Keep Building! 🚀