Text-to-Speech REST API

Provides a synchronous REST endpoint: send a POST request containing text and receive base64-encoded audio in the response.

Common use cases:

  • Story narration — Generate expressive audio for audiobooks and narratives
  • Podcast generation — Create natural-sounding voiceovers for episodes at scale
  • Content creation — Add voice to blogs, articles, and social media posts
  • E-learning — Build multilingual course material with clear pronunciation

What You Can Do

30+ Voices

Pick from male and female speakers, each with a distinct tone and style.
Set the speaker parameter to switch voices.

11 Indian Languages

Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Gujarati, Punjabi, Odia, and English (Indian accent).
Set via target_language_code.

Up to 2500 Characters

Send long-form text of up to 2500 characters in a single request (v3). Input within this limit does not need to be chunked or paginated.
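Text longer than 2500 characters still has to be split before sending. The sketch below shows one way to do that, assuming that sentence-ending punctuation (including the Devanagari danda `।`) is a safe place to break your content. `chunk_text` is a hypothetical helper, not part of the sarvamai SDK. A single sentence longer than the limit is cut at the character limit, possibly mid-word.

```python
import re

MAX_CHARS_V3 = 2500  # per-request limit for bulbul:v3 stated on this page


def chunk_text(text: str, limit: int = MAX_CHARS_V3) -> list[str]:
    """Split text into pieces of at most `limit` characters,
    preferring to break after sentence-ending punctuation."""
    # Split after ., !, ? or the Devanagari danda, consuming the following whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit is hard-split into limit-sized pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request, and the resulting audio clips can be played or concatenated in order.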

Pace Control

Speed up or slow down speech with the pace parameter — range 0.5 to 2.0 for v3.

Flexible Sample Rates

8kHz to 48kHz output. Higher rates (32k, 44.1k, 48k) available in v3 REST API only. Default: 24kHz.
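A minimal client-side check of the pace and sample-rate limits listed above, so that an out-of-range value fails before any request is made. `check_tts_options` is a hypothetical helper. It covers only the ranges stated on this page: it does not verify that a rate between 8 kHz and 48 kHz is one of the discrete rates the API actually accepts, and it assumes the 0.5 to 2.0 pace range applies to bulbul:v3. Consult the API Reference for the exact accepted values.

```python
# Limits as stated on this page; the exact set of accepted sample rates
# is not checked here (see the API Reference).
PACE_RANGE_V3 = (0.5, 2.0)
V3_ONLY_RATES = {32000, 44100, 48000}  # 32k, 44.1k, 48k: v3 REST API only
DEFAULT_SAMPLE_RATE = 24000            # documented default


def check_tts_options(model: str, pace: float = 1.0,
                      sample_rate: int = DEFAULT_SAMPLE_RATE) -> None:
    """Raise ValueError if pace or sample_rate falls outside the documented limits.

    pace=1.0 (normal speed) is an assumed default, not taken from this page.
    """
    if model == "bulbul:v3":
        low, high = PACE_RANGE_V3
        if not low <= pace <= high:
            raise ValueError(f"pace must be between {low} and {high} for {model}")
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sample_rate must be between 8000 and 48000 Hz")
    if sample_rate in V3_ONLY_RATES and model != "bulbul:v3":
        raise ValueError(f"{sample_rate} Hz is only available with bulbul:v3")
```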

Multiple Audio Formats

Audio in the response is base64-encoded. Supported formats: WAV, MP3, Linear16, Mulaw, Alaw, Opus, FLAC, and AAC.

Model: Bulbul v3

Bulbul v3 is purpose-built for Indian languages and accents. It handles code-mixed text (e.g., Hinglish), number normalization, and natural prosody out of the box — with minimal preprocessing needed.

Text to Speech Features

Basic Text to Speech Synthesis

Convert text to high-quality, natural-sounding speech. Features include:

  • Multiple voice options
  • Support for Indian languages
  • Natural prosody and intonation
  • High-quality audio output

```python
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Convert text to speech
audio = client.text_to_speech.convert(
    target_language_code="en-IN",
    text="Welcome to Sarvam AI!",
    model="bulbul:v3",
    speaker="shubh",
)

# Write the synthesized audio to a WAV file
save(audio, "output1.wav")
```
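If you would rather not use the SDK, the same call can be made over plain HTTP. This sketch uses only the Python standard library and assumes three things not confirmed on this page: the endpoint URL `https://api.sarvam.ai/text-to-speech`, the `api-subscription-key` header, and JSON field names that mirror the SDK arguments above. Verify all three in the API Reference before relying on them. `build_tts_request` and `synthesize` are hypothetical helpers.

```python
import json
import urllib.request

# Assumed endpoint; confirm in the API Reference.
TTS_URL = "https://api.sarvam.ai/text-to-speech"


def build_tts_request(api_key: str, text: str, language: str = "en-IN",
                      speaker: str = "shubh",
                      model: str = "bulbul:v3") -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the TTS endpoint."""
    # Field names are assumed to match the SDK's keyword arguments.
    payload = {
        "text": text,
        "target_language_code": language,
        "speaker": speaker,
        "model": model,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,  # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)
```

For example, `synthesize(build_tts_request("YOUR_SARVAM_API_KEY", "Welcome to Sarvam AI!"))["audios"][0]` would give the first base64-encoded clip, following the response format described in the next section.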

API Response Format

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique identifier for the request |
| audios | array | Base64-encoded audio files; each element corresponds to one input text |

Supported audio formats: WAV (default), MP3, Linear16, Mulaw, Alaw, Opus, FLAC, AAC

```json
{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "audios": [
    "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAAB9AAACABAAZGF0YQAAAAA..."
  ]
}
```

Python:

```python
import base64

# `response` is the object returned by client.text_to_speech.convert(...)
audio_base64 = response.audios[0]
audio_bytes = base64.b64decode(audio_base64)

with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```

JavaScript:

```javascript
import fs from "fs";

// `response` is the parsed JSON body returned by the endpoint
const audioBase64 = response.audios[0];
const audioBuffer = Buffer.from(audioBase64, "base64");
fs.writeFileSync("output.wav", audioBuffer);
```
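Because `audios` is an array, a response can carry more than one clip. This minimal sketch decodes every element, writes each to its own file, and reads the WAV header with Python's standard `wave` module to report each clip's duration. It assumes the default WAV output with PCM samples; other formats (MP3, Opus, and so on) cannot be opened with `wave`. `save_audios` is a hypothetical helper.

```python
import base64
import io
import wave


def save_audios(audios: list[str], prefix: str = "output") -> list[float]:
    """Decode every base64 WAV in `audios`, write it to <prefix>_<i>.wav,
    and return each clip's duration in seconds."""
    durations = []
    for i, audio_b64 in enumerate(audios):
        audio_bytes = base64.b64decode(audio_b64)
        with open(f"{prefix}_{i}.wav", "wb") as f:
            f.write(audio_bytes)
        # Read the WAV header from memory to get frame count and sample rate.
        with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
            durations.append(wav.getnframes() / wav.getframerate())
    return durations
```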

Error Responses

All errors return a JSON object with an error field containing details about what went wrong.

Error Response Structure

```json
{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
```
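Since the `code` field is meant for programmatic handling, it is usually safer to branch on it than on the human-readable `message`. This small sketch pulls the three documented fields out of a raw JSON error body; `TTSError` and `parse_error` are illustrative names, not part of the sarvamai SDK.

```python
import json
from typing import NamedTuple, Optional


class TTSError(NamedTuple):
    message: str
    code: str
    request_id: Optional[str]


def parse_error(body: str) -> TTSError:
    """Extract the documented fields from a JSON error response body."""
    error = json.loads(body)["error"]
    return TTSError(
        message=error.get("message", ""),
        code=error.get("code", ""),
        request_id=error.get("request_id"),
    )
```

Logging `request_id` alongside the code makes it easier to reference a specific failed call when contacting support.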

Error Codes Reference

| HTTP Status | Error Code | When This Happens | What To Do |
|---|---|---|---|
| 400 | invalid_request_error | Missing required parameters or malformed request | Check the text and target_language_code fields |
| 403 | invalid_api_key_error | API key is invalid, missing, or expired | Verify your API key in the dashboard |
| 422 | unprocessable_entity_error | Text too long or invalid speaker/model | Keep text within 1500 characters (v2) or 2500 characters (v3) |
| 429 | insufficient_quota_error | API quota or rate limit exceeded | Wait for the limit to reset or upgrade your plan |
| 500 | internal_server_error | Unexpected server error | Retry the request; contact support if the error persists |

Example Error Response

```json
{
  "error": {
    "message": "Text exceeds maximum length of 2500 characters for bulbul:v3",
    "code": "unprocessable_entity_error",
    "request_id": "20241115_abc12345"
  }
}
```

Handling errors with the Python SDK:

```python
from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

try:
    response = client.text_to_speech.convert(
        text="Welcome to Sarvam AI!",
        target_language_code="en-IN",
        speaker="shubh",
        model="bulbul:v3",
    )
    # Process audio...
except ApiError as e:
    # Branch on the HTTP status codes listed in the error table
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
    elif e.status_code == 403:
        print("Invalid API key. Check your credentials.")
    elif e.status_code == 422:
        print(f"Invalid parameters: {e.body}")
    elif e.status_code == 429:
        print("Rate limit exceeded. Wait and retry.")
    else:
        print(f"Error {e.status_code}: {e.body}")
```
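The `except` branch above only reports the error. For the two statuses the error table suggests retrying (429 and 500), a wrapper with exponential backoff can retry automatically. This is a sketch, not an SDK feature: `with_retries` is a hypothetical helper that reads the status from a `status_code` attribute, as the `ApiError` handling above does, and the attempt count and delays are arbitrary defaults. A 429 caused by an exhausted quota will not clear by retrying; backoff only helps with short-term rate limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Statuses the error table suggests retrying: 429 (wait) and 500 (retry).
# 400, 403 and 422 need a change to the request or the key instead.
RETRYABLE_STATUSES = {429, 500}


def with_retries(call: Callable[[], T], max_attempts: int = 4,
                 base_delay: float = 1.0) -> T:
    """Run `call`, retrying retryable errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Works with any exception exposing status_code, e.g. the SDK's ApiError.
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            # Wait base_delay * 1, 2, 4, ... plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Wrap the SDK call in a lambda, for example `response = with_retries(lambda: client.text_to_speech.convert(text="Welcome to Sarvam AI!", target_language_code="en-IN", speaker="shubh", model="bulbul:v3"))`. Non-retryable errors such as 403 are re-raised on the first attempt, so an outer `except ApiError` still sees them unchanged.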

Check out our detailed API Reference to explore Text to Speech and all available options.

Need help? Contact us on Discord for guidance.