Text-to-Speech REST API

Provides a synchronous REST endpoint: send a POST request containing text, and the response returns the synthesized speech as base64-encoded audio.
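To make the request/response shape concrete without the SDK, here is a minimal sketch that builds (but does not send) the POST request using only the Python standard library. It assumes the endpoint is https://api.sarvam.ai/text-to-speech and that the key travels in an api-subscription-key header. The JSON field names mirror the SDK keyword arguments used later on this page. Treat all three as assumptions to confirm in the API Reference.

```python
import json
import urllib.request

# ASSUMPTION: endpoint URL and header name; verify both in the API Reference.
TTS_URL = "https://api.sarvam.ai/text-to-speech"


def build_tts_request(api_key: str, text: str, language: str,
                      model: str = "bulbul:v3",
                      speaker: str = "shubh") -> urllib.request.Request:
    """Build (but do not send) a POST request for the text-to-speech endpoint."""
    # ASSUMPTION: body fields mirror the SDK's convert() keyword arguments.
    body = {
        "text": text,
        "target_language_code": language,
        "model": model,
        "speaker": speaker,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,  # assumed auth header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tts_request("YOUR_SARVAM_API_KEY", "Welcome to Sarvam AI!", "en-IN")
# Sending it with urllib.request.urlopen(req) should return the JSON body
# described under "API Response Format".
```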

Common use cases:

  • Story narration — Generate expressive audio for audiobooks and narratives
  • Podcast generation — Create natural-sounding voiceovers for episodes at scale
  • Content creation — Add voice to blogs, articles, and social media posts
  • E-learning — Build multilingual course material with clear pronunciation

What You Can Do

30+ Voices

Pick from male and female speakers — each with distinct tone and style.
Switch voices by passing a different value in the speaker parameter.

11 Languages (10 Indian + English)

Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Gujarati, Punjabi, Odia, and English (Indian accent).
Set via target_language_code.
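A small lookup from language name to target_language_code can keep calling code readable. Only en-IN appears on this page; the other codes below are assumptions that follow the same language-IN pattern (Odia in particular may use a different code), so check them against the API Reference before relying on them.

```python
# ASSUMPTION: codes other than "en-IN" are inferred from the "xx-IN" pattern.
SUPPORTED_LANGUAGES = {
    "Hindi": "hi-IN",
    "Bengali": "bn-IN",
    "Tamil": "ta-IN",
    "Telugu": "te-IN",
    "Kannada": "kn-IN",
    "Malayalam": "ml-IN",
    "Marathi": "mr-IN",
    "Gujarati": "gu-IN",
    "Punjabi": "pa-IN",
    "Odia": "od-IN",
    "English": "en-IN",
}


def language_code(name: str) -> str:
    """Return the target_language_code for a language name (case-insensitive)."""
    wanted = name.strip().lower()
    for lang, code in SUPPORTED_LANGUAGES.items():
        if lang.lower() == wanted:
            return code
    raise ValueError(f"Unsupported language: {name!r}")


print(language_code("tamil"))  # ta-IN
```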

Up to 2500 Characters

Send long-form text in a single request (v3). No need to chunk or paginate your input.
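Text within 2500 characters fits in one v3 request. For longer input you would need to split it client-side and send one request per piece. The sketch below is a hypothetical helper (not part of the SDK) that prefers sentence boundaries, including the Devanagari danda (।), and hard-splits any single sentence that exceeds the limit. Joining the resulting audio clips is left out.

```python
import re

MAX_CHARS_V3 = 2500  # per-request limit for bulbul:v3 stated above


def chunk_text(text: str, limit: int = MAX_CHARS_V3) -> list[str]:
    """Split text into pieces of at most `limit` characters, preferring sentence ends."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed as the text of its own request.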

Pace Control

Speed up or slow down speech with the pace parameter — range 0.5 to 2.0 for v3.
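If pace comes from user input, one option is to clamp it into the documented v3 range before sending, rather than letting the API reject it. This is a minimal sketch; raising an error instead of clamping is an equally valid design choice.

```python
PACE_MIN, PACE_MAX = 0.5, 2.0  # bulbul:v3 range stated above


def clamp_pace(pace: float) -> float:
    """Clamp a user-supplied pace into the range bulbul:v3 accepts."""
    return max(PACE_MIN, min(PACE_MAX, pace))
```

The result would then be passed as pace=clamp_pace(user_pace), assuming the SDK exposes the REST pace parameter under the same keyword name.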

Flexible Sample Rates

8 kHz to 48 kHz output. Higher rates (32 kHz, 44.1 kHz, 48 kHz) are available in the v3 REST API only. Default: 24 kHz.
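The rules above (24 kHz default, 32/44.1/48 kHz only with v3 over REST) can be checked before a request is sent. A sketch, assuming 8000, 16000, and 22050 Hz are the accepted rates below 24 kHz; this page only states the 8 kHz lower bound, so confirm the exact list in the API Reference.

```python
from typing import Optional

DEFAULT_SAMPLE_RATE = 24000
V3_ONLY_RATES = {32000, 44100, 48000}  # REST API with bulbul:v3 only
# ASSUMPTION: the lower rates listed here are not confirmed by this page.
KNOWN_RATES = {8000, 16000, 22050, 24000} | V3_ONLY_RATES


def pick_sample_rate(model: str, requested: Optional[int] = None) -> int:
    """Return a sample rate valid for the given model, defaulting to 24 kHz."""
    if requested is None:
        return DEFAULT_SAMPLE_RATE
    if requested not in KNOWN_RATES:
        raise ValueError(f"Unsupported sample rate: {requested} Hz")
    if requested in V3_ONLY_RATES and model != "bulbul:v3":
        raise ValueError(f"{requested} Hz requires bulbul:v3 over the REST API")
    return requested
```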

Multiple Audio Formats

Audio in the response is base64-encoded. Supported formats: WAV, MP3, Linear16, Mulaw, Alaw, Opus, FLAC, and AAC.
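Because the format is not labelled inside the base64 string, it can help to check what was actually returned before choosing a file extension. This sketch decodes the string and inspects well-known file signatures; for example, the sample payload under API Response Format begins with UklGR, which decodes to the RIFF header of a WAV file. Raw Linear16, Mulaw, and Alaw output may carry no header at all, so those fall through to "unknown".

```python
import base64


# Standard container signatures (magic bytes at the start of the file).
SIGNATURES = [
    (b"RIFF", "wav"),
    (b"fLaC", "flac"),
    (b"OggS", "ogg/opus"),
    (b"ID3", "mp3"),
]


def sniff_format(audio_b64: str) -> str:
    """Decode a base64 audio string and guess its format from leading bytes."""
    data = base64.b64decode(audio_b64)
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    if len(data) >= 2 and data[0] == 0xFF:
        # ADTS (AAC): 12-bit sync 0xFFF with layer bits 00.
        if (data[1] & 0xF6) == 0xF0:
            return "aac (adts)"
        # MPEG audio frame (MP3 without ID3 tag): 11-bit sync 0xFFE.
        if (data[1] & 0xE0) == 0xE0:
            return "mp3"
    return "unknown"
```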

Model: Bulbul v3

Bulbul v3 is purpose-built for Indian languages and accents. It handles code-mixed text (e.g., Hinglish), number normalization, and natural prosody out of the box — with minimal preprocessing needed.

Text to Speech Features

Basic Text to Speech Synthesis

Convert text into natural-sounding, high-quality speech. Features include:

  • Multiple voice options
  • Support for Indian languages
  • Natural prosody and intonation
  • High-quality audio output
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Convert text to speech
audio = client.text_to_speech.convert(
    target_language_code="en-IN",
    text="Welcome to Sarvam AI!",
    model="bulbul:v3",
    speaker="shubh",
)
save(audio, "output1.wav")

API Response Format

| Field      | Type   | Description                                          |
|------------|--------|------------------------------------------------------|
| request_id | string | Unique identifier for the request                    |
| audios     | array  | Base64-encoded audio files, one element per input text |

Supported audio formats: WAV (default), MP3, Linear16, Mulaw, Alaw, Opus, FLAC, AAC

{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "audios": [
    "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAAB9AAACABAAZGF0YQAAAAA..."
  ]
}

Python:

import base64

# `response` is the object returned by client.text_to_speech.convert(...)
audio_base64 = response.audios[0]
audio_bytes = base64.b64decode(audio_base64)

with open("output.wav", "wb") as f:
    f.write(audio_bytes)

JavaScript:

import fs from "fs";

// `response` is the parsed JSON body returned by the endpoint
const audioBase64 = response.audios[0];
const audioBuffer = Buffer.from(audioBase64, "base64");
fs.writeFileSync("output.wav", audioBuffer);
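Since audios is an array with one element per input text, a response may hold more than one clip. A sketch that decodes every entry from the raw JSON body and writes numbered files; the output_<index> naming and the default wav extension are illustrative choices, not API behavior.

```python
import base64
import json
from pathlib import Path


def save_all_audios(payload: str, out_dir: str = ".", ext: str = "wav") -> list[Path]:
    """Decode every entry in a raw JSON response's `audios` list and write it to disk."""
    data = json.loads(payload)
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, audio_b64 in enumerate(data["audios"]):
        path = target / f"output_{i}.{ext}"  # hypothetical naming scheme
        path.write_bytes(base64.b64decode(audio_b64))
        paths.append(path)
    return paths
```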

Error Responses

All errors return a JSON object with an error field containing details about what went wrong.

Error Response Structure

{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
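When calling the endpoint directly, the error body above can be turned into a typed object for logging and branching. A minimal sketch; TTSError and parse_error are hypothetical names, and the fallbacks cover bodies that are not JSON (for example, a proxy's plain-text 500 page).

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TTSError:
    status: int
    code: str
    message: str
    request_id: Optional[str]


def parse_error(status: int, body: str) -> TTSError:
    """Turn an error response body into a TTSError, tolerating missing fields."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}  # non-JSON or unexpected shape: keep the raw body as the message
    return TTSError(
        status=status,
        code=err.get("code", "unknown_error"),
        message=err.get("message", body),
        request_id=err.get("request_id"),
    )
```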

Error Codes Reference

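| HTTP Status | Error Code                 | When This Happens                                | What To Do                                                                                   |
|-------------|----------------------------|--------------------------------------------------|----------------------------------------------------------------------------------------------|
| 400         | invalid_request_error      | Missing required parameters or malformed request | Check the text and target_language_code fields                                               |
| 403         | invalid_api_key_error      | API key is invalid, missing, or expired          | Verify your API key in the dashboard                                                         |
| 422         | unprocessable_entity_error | Text too long, or invalid speaker or model       | Keep text within 1500 characters (v2) or 2500 characters (v3); use a valid speaker and model |
| 429         | insufficient_quota_error   | API quota or rate limit exceeded                 | Wait for the reset or upgrade your plan                                                      |
| 500         | internal_server_error      | Unexpected server error                          | Retry the request; contact support if the error persists                                     |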
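The table separates errors worth retrying (429, 500) from ones that need a code or configuration fix (400, 403, 422). The sketch below applies exponential backoff with jitter to the retryable ones only. It is duck-typed on a status_code attribute, which the SDK's ApiError exposes in the Python example further down; the attempt count and delays are arbitrary defaults. Note that if a 429 comes from an exhausted quota rather than a rate limit, retrying will not help.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Statuses the table suggests waiting on or retrying; others need a fix instead.
RETRYABLE_STATUSES = {429, 500}


def with_retries(call: Callable[[], T], max_attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying with exponential backoff on retryable HTTP statuses."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Works with any exception that carries a `status_code` attribute.
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            # Wait base_delay * 2^(attempt - 1) seconds plus random jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
    raise ValueError("max_attempts must be at least 1")
```

With the SDK client, wrap the call in a lambda: with_retries(lambda: client.text_to_speech.convert(...)).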

Example Error Response

{
  "error": {
    "message": "Text exceeds maximum length of 2500 characters for bulbul:v3",
    "code": "unprocessable_entity_error",
    "request_id": "20241115_abc12345"
  }
}
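Many 400 and 422 errors like the one above can be caught before the request leaves your machine. A minimal pre-flight check, assuming the API counts characters the way Python's len() does (Unicode code points); check_request is a hypothetical helper, and it cannot validate speaker names because this page lists only one.

```python
MAX_CHARS = {"bulbul:v2": 1500, "bulbul:v3": 2500}  # limits from the error table


def check_request(text: str, target_language_code: str,
                  model: str = "bulbul:v3") -> None:
    """Raise ValueError for problems the API would reject with a 400 or 422."""
    if not text.strip():
        raise ValueError("text is required")  # API would return 400
    if not target_language_code:
        raise ValueError("target_language_code is required")  # API would return 400
    if model not in MAX_CHARS:
        raise ValueError(f"unknown model: {model}")  # API would return 422
    if len(text) > MAX_CHARS[model]:
        raise ValueError(  # API would return 422
            f"text is {len(text)} characters; {model} allows {MAX_CHARS[model]}"
        )
```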
Handling these errors with the Python SDK:

from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

try:
    response = client.text_to_speech.convert(
        text="Welcome to Sarvam AI!",
        target_language_code="en-IN",
        speaker="shubh",
        model="bulbul:v3",
    )
    # Process audio...
except ApiError as e:
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
    elif e.status_code == 403:
        print("Invalid API key. Check your credentials.")
    elif e.status_code == 422:
        print(f"Invalid parameters: {e.body}")
    elif e.status_code == 429:
        print("Quota or rate limit exceeded. Wait and retry.")
    else:
        print(f"Error {e.status_code}: {e.body}")

Check out our detailed API Reference to explore Text to Speech and all available options.

Need help? Contact us on Discord for guidance.