Text-to-Speech REST API | Sarvam API Docs

The Text-to-Speech API provides a synchronous REST endpoint: a POST request containing text returns base64-encoded audio in the response.

Common use cases:

Story narration — Generate expressive audio for audiobooks and narratives
Podcast generation — Create natural-sounding voiceovers for episodes at scale
Content creation — Add voice to blogs, articles, and social media posts
E-learning — Build multilingual course material with clear pronunciation

What You Can Do

30+ Voices

Pick from male and female speakers — each with distinct tone and style.
Pass the speaker param to switch instantly.

11 Languages (10 Indian + English)

Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Gujarati, Punjabi, Odia, and English (Indian accent).
Set via target_language_code.

Up to 2500 Characters

Send long-form text in a single request (v3). No need to chunk or paginate your input.
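Input longer than the per-request limit still has to be split by the caller. The following is a minimal sketch of one way to do that, assuming that breaking at sentence boundaries is acceptable for your content; `split_text` and `MAX_CHARS` are hypothetical names, not part of the SDK.

```python
import re

MAX_CHARS = 2500  # per-request limit for v3 stated above


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after sentence-ending punctuation (including the Devanagari danda).
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio concatenated in order.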

Pace Control

Speed up or slow down speech with the pace parameter — range 0.5 to 2.0 for v3.

Flexible Sample Rates

Output from 8 kHz to 48 kHz. The higher rates (32 kHz, 44.1 kHz, 48 kHz) are available in the v3 REST API only. Default: 24 kHz.
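Checking these ranges on the client side avoids a round trip that would end in a validation error. This is a rough sketch only: `check_options` is a hypothetical helper, and the `speech_sample_rate` key is an assumed parameter name that should be confirmed against the API Reference. It checks only the outer bounds given above, not the exact list of supported rates.

```python
PACE_RANGE = (0.5, 2.0)            # v3 pace range stated above
SAMPLE_RATE_RANGE = (8000, 48000)  # 8 kHz to 48 kHz, expressed in Hz


def check_options(pace: float, sample_rate_hz: int = 24000) -> dict:
    """Validate pace and sample rate, returning them as request options."""
    if not PACE_RANGE[0] <= pace <= PACE_RANGE[1]:
        raise ValueError(f"pace must be between {PACE_RANGE[0]} and {PACE_RANGE[1]}")
    if not SAMPLE_RATE_RANGE[0] <= sample_rate_hz <= SAMPLE_RATE_RANGE[1]:
        raise ValueError("sample rate must be between 8000 and 48000 Hz")
    # "speech_sample_rate" is an assumed parameter name, not taken from this page.
    return {"pace": pace, "speech_sample_rate": sample_rate_hz}
```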

Multiple Audio Formats

Response is base64-encoded. Supports WAV, MP3, Linear16, Mulaw, Alaw, Opus, FLAC, and AAC.

Model: Bulbul v3

Bulbul v3 is purpose-built for Indian languages and accents. It handles code-mixed text (e.g., Hinglish), number normalization, and natural prosody out of the box — with minimal preprocessing needed.

Text to Speech Features

Basic Text to Speech Synthesis

Convert text to natural-sounding speech with high quality. Features include:

Multiple voice options
Support for Indian languages
Natural prosody and intonation
High-quality audio output

from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")
# Convert text to speech
audio = client.text_to_speech.convert(
    target_language_code="en-IN",
    text="Welcome to Sarvam AI!",
    model="bulbul:v3",
    speaker="shubh"
)
save(audio, "output1.wav")
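The same request can also be made without the SDK, since the endpoint is a plain POST that returns JSON. The sketch below uses only the standard library. The URL and the `api-subscription-key` header name are assumptions that should be checked against the API Reference, and `build_tts_request` and `synthesize` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint; confirm it against the API Reference before use.
TTS_URL = "https://api.sarvam.ai/text-to-speech"


def build_tts_request(api_key: str, text: str, language: str, speaker: str,
                      model: str = "bulbul:v3") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the synthesis parameters."""
    payload = {
        "text": text,
        "target_language_code": language,
        "speaker": speaker,
        "model": model,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,  # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.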

API Response Format

Field	Type	Description
`request_id`	string	Unique identifier for the request
`audios`	array	Base64-encoded audio files. Each element corresponds to an input text

Supported audio formats: WAV (default), MP3, Linear16, Mulaw, Alaw, Opus, FLAC, AAC

{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "audios": [
    "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAAB9AAACABAAZGF0YQAAAAA..."
  ]
}
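Because `audios` is an array, a response may carry more than one clip. As a small sketch, assuming you are working with the raw JSON body rather than an SDK object, every element can be decoded in one pass; `decode_audios` is a hypothetical helper name.

```python
import base64
import json


def decode_audios(response_body: str) -> list[bytes]:
    """Decode every base64 entry in the `audios` array into raw audio bytes."""
    data = json.loads(response_body)
    return [base64.b64decode(item) for item in data["audios"]]
```

The returned list keeps the order of the array, so the first element matches the first input text.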

Decoding Audio Examples

Python:

import base64

# `response` is the object returned by client.text_to_speech.convert(...)
audio_base64 = response.audios[0]
audio_bytes = base64.b64decode(audio_base64)

with open("output.wav", "wb") as f:
    f.write(audio_bytes)

JavaScript:

import fs from "fs";

const audioBase64 = response.audios[0];
const audioBuffer = Buffer.from(audioBase64, "base64");
fs.writeFileSync("output.wav", audioBuffer);
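After decoding, it can be useful to confirm that the audio has the sample rate you asked for. The sketch below uses Python's standard `wave` module and therefore only applies to PCM WAV output, not to the other formats; `describe_wav` is a hypothetical helper name.

```python
import io
import wave


def describe_wav(audio_bytes: bytes) -> dict:
    """Read basic properties from decoded WAV bytes without touching disk."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

For example, `describe_wav(audio_bytes)["sample_rate_hz"]` should match the rate requested for the call.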

Error Responses

All errors return a JSON object with an error field containing details about what went wrong.

Error Response Structure

{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
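When calling the endpoint directly, the error envelope can be unpacked into a small typed object for logging or branching. This is an illustrative sketch based only on the structure shown above; `TTSError` and `parse_error` are hypothetical names, not SDK types.

```python
import json
from dataclasses import dataclass


@dataclass
class TTSError:
    message: str
    code: str
    request_id: str


def parse_error(body: str) -> TTSError:
    """Extract the fields of the `error` object from a raw JSON error body."""
    err = json.loads(body)["error"]
    # Fall back to empty strings so a partially filled envelope still parses.
    return TTSError(
        message=err.get("message", ""),
        code=err.get("code", ""),
        request_id=err.get("request_id", ""),
    )
```

Including `request_id` in your logs makes it easier to refer to a specific failed call when contacting support.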

Error Codes Reference

HTTP Status	Error Code	When This Happens	What To Do
`400`	`invalid_request_error`	Missing required parameters or malformed request	Check `text` and `target_language_code` fields
`403`	`invalid_api_key_error`	API key is invalid, missing, or expired	Verify your API key in the dashboard
`422`	`unprocessable_entity_error`	Text too long or invalid speaker/model	Keep text within 1500 characters (v2) or 2500 characters (v3); check the `speaker` and `model` values
`429`	`insufficient_quota_error`	API quota or rate limit exceeded	Wait for reset or upgrade your plan
`500`	`internal_server_error`	Unexpected server error	Retry the request; contact support if persistent
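Of the codes in the table, only `429` and `500` are candidates for an automatic retry; the others indicate a problem with the request itself. One possible approach is exponential backoff, sketched below under the assumption that the raised exception exposes a `status_code` attribute. `call_with_retry` is a hypothetical helper, and the attempt count and delays are illustrative defaults rather than recommended values.

```python
import time

RETRYABLE_STATUSES = {429, 500}


def call_with_retry(call, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Run `call()`, retrying with exponential backoff on retryable status codes."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            last_try = attempt == attempts - 1
            # Re-raise anything that is not retryable, or once attempts run out.
            if status not in RETRYABLE_STATUSES or last_try:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
```

A call is wrapped as `call_with_retry(lambda: client.text_to_speech.convert(...))`. Note that a `429` caused by an exhausted quota, rather than a short-term rate limit, will not clear by waiting a few seconds.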

Example Error Response

{
  "error": {
    "message": "Text exceeds maximum length of 2500 characters for bulbul:v3",
    "code": "unprocessable_entity_error",
    "request_id": "20241115_abc12345"
  }
}

Error Handling Code Example

from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

try:
    response = client.text_to_speech.convert(
        text="Welcome to Sarvam AI!",
        target_language_code="en-IN",
        speaker="shubh",
        model="bulbul:v3"
    )
    # Process audio...
except ApiError as e:
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
    elif e.status_code == 403:
        print("Invalid API key. Check your credentials.")
    elif e.status_code == 422:
        print(f"Invalid parameters: {e.body}")
    elif e.status_code == 429:
        print("Rate limit exceeded. Wait and retry.")
    else:
        print(f"Error {e.status_code}: {e.body}")

Check out our detailed API Reference to explore Text to Speech and all available options.

Need help? Reach out to us on Discord for guidance.