# Text-to-Speech REST API

> Real-time conversion of text into speech using customizable voices. Instant audio generation with multiple voice options and various audio formats for Indian languages.

The API exposes a synchronous REST endpoint: send a POST request containing text, and the response returns base64-encoded audio.

The JSON response contains an `audios` array of **base64-encoded WAV strings**, not raw binary. Decode before saving or playing:

```python
import base64

combined = "".join(response.audios)
wav_bytes = base64.b64decode(combined)
with open("output.wav", "wb") as f:
    f.write(wav_bytes)
```

See [TTS best practices](/api-reference-docs/api-guides-tutorials/text-to-speech/best-practices) for JavaScript and streaming examples.

**Common use cases:**

* **Story narration** — Generate expressive audio for audiobooks and narratives
* **Podcast generation** — Create natural-sounding voiceovers for episodes at scale
* **Content creation** — Add voice to blogs, articles, and social media posts
* **E-learning** — Build multilingual course material with clear pronunciation

## What You Can Do

* **Voices:** Pick from male and female speakers, each with a distinct tone and style. Pass the `speaker` parameter to switch between them.
* **Languages:** Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Gujarati, Punjabi, Odia, and English (Indian accent). Set via `target_language_code`.
* **Long-form text:** Send long-form text in a single request (v3), with no need to chunk or paginate your input.
* **Pace control:** Speed up or slow down speech with the `pace` parameter, which ranges from `0.5` to `2.0` for v3.
* **Sample rate:** 8 kHz to 48 kHz output. The higher rates (32 kHz, 44.1 kHz, 48 kHz) are available in the **v3 REST API only**. Default: 24 kHz.
* **Output formats:** The response is base64-encoded. Supported formats are WAV, MP3, Linear16, Mulaw, Alaw, Opus, FLAC, and AAC.

## Model: Bulbul v3

Bulbul v3 is purpose-built for Indian languages and accents. It handles code-mixed text (e.g., Hinglish), number normalization, and natural prosody out of the box — with minimal preprocessing needed.

## Text to Speech Features

<h3>
  Basic Text to Speech Synthesis
</h3>

<p>
  Convert text to natural-sounding speech with high quality. Features include:
</p>

<ul>
  <li>
    Multiple voice options
  </li>

  <li>
    Support for Indian languages
  </li>

  <li>
    Natural prosody and intonation
  </li>

  <li>
    High-quality audio output
  </li>
</ul>

```python
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")
# Convert text to speech
audio = client.text_to_speech.convert(
    target_language_code="en-IN",
    text="Welcome to Sarvam AI!",
    model="bulbul:v3",
    speaker="shubh"
)
save(audio, "output1.wav")
```

```javascript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const client = new SarvamAIClient({
  apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

const response = await client.textToSpeech.convert({
  text: "Welcome to Sarvam AI!",
  model: "bulbul:v3",
  speaker: "shubh",
  target_language_code: "en-IN"
});

const audio = Buffer.from(response.audios.join(""), "base64");
fs.writeFileSync("output.wav", audio);
```

```bash
curl -X POST https://api.sarvam.ai/text-to-speech \
  -H "api-subscription-key: <YOUR_SARVAM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Sarvam AI!",
    "target_language_code": "en-IN",
    "speaker": "shubh",
    "model": "bulbul:v3"
  }'
```

<h3>
  Available Voices
</h3>

<p>
  Choose from 30+ natural-sounding voices for different use cases and languages.
</p>

Voices for `bulbul:v3`:

**Male:** Shubh (default), Aditya, Rahul, Rohan, Amit, Dev, Ratan, Varun, Manan, Sumit, Kabir, Aayan, Ashutosh, Advait, Anand, Tarun, Sunny, Mani, Gokul, Vijay, Mohit, Rehan, Soham

**Female:** Ritu, Priya, Neha, Pooja, Simran, Kavya, Ishita, Shreya, Roopa, Tanya, Shruti, Suhani, Kavitha, Rupali

Voices for `bulbul:v2`:

**Female:** Anushka (default), Manisha, Vidya, Arya

**Male:** Abhilash, Karun, Hitesh

```python
import base64
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

response = client.text_to_speech.convert(
    text="Welcome to Sarvam AI!",
    model="bulbul:v3",
    target_language_code="en-IN",
    speaker="shubh"
)

with open("output.wav", "wb") as f:
    f.write(base64.b64decode("".join(response.audios)))
```

```javascript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const client = new SarvamAIClient({
  apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

const response = await client.textToSpeech.convert({
  text: "Welcome to Sarvam AI!",
  target_language_code: "en-IN",
  model: "bulbul:v3",
  speaker: "shubh"
});

const audio = Buffer.from(response.audios.join(""), "base64");
fs.writeFileSync("output.wav", audio);
```

```bash
# Generate speech with Shubh's voice
curl -X POST https://api.sarvam.ai/text-to-speech \
  -H "api-subscription-key: <YOUR_SARVAM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Sarvam AI!",
    "target_language_code": "en-IN",
    "speaker": "shubh",
    "model": "bulbul:v3"
  }'

# Generate speech with Priya's voice
curl -X POST https://api.sarvam.ai/text-to-speech \
  -H "api-subscription-key: <YOUR_SARVAM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Sarvam AI!",
    "target_language_code": "en-IN",
    "speaker": "priya",
    "model": "bulbul:v3"
  }'

# Generate speech with Roopa's voice
curl -X POST https://api.sarvam.ai/text-to-speech \
  -H "api-subscription-key: <YOUR_SARVAM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Sarvam AI!",
    "target_language_code": "en-IN",
    "speaker": "roopa",
    "model": "bulbul:v3"
  }'
```
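To compare several voices on the same sentence, you can generate one request body per speaker. The sketch below only builds the payloads. `payloads_for_speakers` is a hypothetical helper, and lowercasing the display name to get the `speaker` value is an assumption based on the examples on this page (`shubh`, `priya`, `roopa`).

```python
def payloads_for_speakers(text, speakers, language="en-IN", model="bulbul:v3"):
    """Return one request body per speaker, keyed by the lowercase speaker ID."""
    payloads = {}
    for name in speakers:
        speaker_id = name.strip().lower()  # assumption: ID is the lowercase name
        payloads[speaker_id] = {
            "text": text,
            "target_language_code": language,
            "speaker": speaker_id,
            "model": model,
        }
    return payloads


if __name__ == "__main__":
    bodies = payloads_for_speakers("Welcome to Sarvam AI!", ["Shubh", "Priya", "Roopa"])
    for speaker_id, body in bodies.items():
        # Each body can be sent as the JSON payload of a POST request.
        print(speaker_id, body)
```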

<h3>
  Speech Customization
</h3>

<p>
  Fine-tune the speech output with various parameters:
</p>

<ul>
  <li>
    Adjust speech rate (pace)
  </li>

  <li>
    Configure audio quality (sample rate)
  </li>
</ul>

```python
import base64
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY"
)

audio = client.text_to_speech.convert(
    text="Welcome to Sarvam AI!",
    model="bulbul:v3",
    target_language_code="en-IN",
    speaker="shubh",
    pace=1.2,
    speech_sample_rate=24000
)

# Join the base64 strings, then decode them into raw WAV bytes
combined_audio = "".join(audio.audios)
wav_bytes = base64.b64decode(combined_audio)

with open("output1.wav", "wb") as f:
    f.write(wav_bytes)
```

```javascript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const client = new SarvamAIClient({
  apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

const response = await client.textToSpeech.convert({
  text: "Welcome to Sarvam AI!",
  model: "bulbul:v3",
  target_language_code: "en-IN",
  speaker: "shubh",
  pace: 1.2,
  speech_sample_rate: 24000
});

const audio = Buffer.from(response.audios.join(""), "base64");
fs.writeFileSync("output.wav", audio);
```

```bash
curl -X POST https://api.sarvam.ai/text-to-speech \
  -H "api-subscription-key: <YOUR_SARVAM_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Sarvam AI!",
    "model": "bulbul:v3",
    "speaker": "shubh",
    "pace": 1.2,
    "target_language_code": "en-IN",
    "speech_sample_rate": 24000
  }'
```
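Checking `pace` and `speech_sample_rate` on the client before sending can save a round trip that would otherwise end in an error. The following sketch assumes the limits stated on this page (pace `0.5` to `2.0` for v3, sample rates between 8 kHz and 48 kHz). `build_tts_payload` is a hypothetical helper, its defaults are choices made for this example, and the API may accept only specific rates within that range.

```python
def build_tts_payload(text, target_language_code, speaker="shubh",
                      model="bulbul:v3", pace=1.0, speech_sample_rate=24000):
    """Validate customization parameters and return the JSON request body."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    if not 0.5 <= pace <= 2.0:
        raise ValueError(f"pace must be between 0.5 and 2.0, got {pace}")
    if not 8000 <= speech_sample_rate <= 48000:
        raise ValueError(
            f"speech_sample_rate must be between 8000 and 48000, got {speech_sample_rate}"
        )
    return {
        "text": text,
        "target_language_code": target_language_code,
        "speaker": speaker,
        "model": model,
        "pace": pace,
        "speech_sample_rate": speech_sample_rate,
    }


if __name__ == "__main__":
    print(build_tts_payload("Welcome to Sarvam AI!", "en-IN", pace=1.2))
```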

## API Response Format

| Field        | Type   | Description                                                           |
| ------------ | ------ | --------------------------------------------------------------------- |
| `request_id` | string | Unique identifier for the request                                     |
| `audios`     | array  | Base64-encoded audio files. Each element corresponds to an input text |

**Supported audio formats:** WAV (default), MP3, Linear16, Mulaw, Alaw, Opus, FLAC, AAC

```json
{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "audios": [
    "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAAB9AAACABAAZGF0YQAAAAA..."
  ]
}
```

**Python:**

```python
import base64

audio_base64 = response.audios[0]
audio_bytes = base64.b64decode(audio_base64)

with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```

**JavaScript:**

```javascript
import fs from "fs";

const audioBase64 = response.audios[0];
const audioBuffer = Buffer.from(audioBase64, 'base64');
fs.writeFileSync('output.wav', audioBuffer);
```
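When a response contains more than one entry in `audios`, you may want to decode each entry separately and sanity-check it before writing files. The sketch below assumes WAV output and works on the parsed JSON body as a plain dictionary. `decode_audios` is a hypothetical helper, not an SDK function.

```python
import base64


def decode_audios(response_body: dict) -> list[bytes]:
    """Decode every entry of `audios`, keeping one bytes object per entry."""
    decoded = []
    for index, audio_b64 in enumerate(response_body.get("audios", [])):
        audio_bytes = base64.b64decode(audio_b64)
        # A WAV file starts with b"RIFF" and carries b"WAVE" at byte offset 8.
        if audio_bytes[:4] != b"RIFF" or audio_bytes[8:12] != b"WAVE":
            raise ValueError(f"audios[{index}] does not look like a WAV file")
        decoded.append(audio_bytes)
    return decoded


if __name__ == "__main__":
    # Stand-in body with a tiny WAV-like header, used here instead of a live response.
    sample = base64.b64encode(b"RIFF\x24\x00\x00\x00WAVEfmt ").decode("ascii")
    body = {"request_id": "example", "audios": [sample]}
    for index, audio in enumerate(decode_audios(body)):
        with open(f"output_{index}.wav", "wb") as f:
            f.write(audio)
```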

## Error Responses

All errors return a JSON object with an `error` field containing details about what went wrong.

### Error Response Structure

```json
{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
```
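If you call the endpoint without the SDK, you can turn this structure into a small typed object for logging and branching. The sketch below is one possible approach: `TtsError` and `parse_error` are hypothetical names, and the `unknown_error` fallback is a local convention of this example, not an API error code.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtsError:
    code: str
    message: str
    request_id: Optional[str]


def parse_error(body: str) -> TtsError:
    """Extract the `error` object from a raw JSON error response."""
    data = json.loads(body)
    error = data.get("error") or {}
    return TtsError(
        code=error.get("code", "unknown_error"),  # local fallback, not an API code
        message=error.get("message", ""),
        request_id=error.get("request_id"),
    )


if __name__ == "__main__":
    raw = json.dumps({
        "error": {
            "message": "Human-readable error description",
            "code": "invalid_request_error",
            "request_id": "unique_request_identifier",
        }
    })
    print(parse_error(raw))
```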

### Error Codes Reference

| HTTP Status | Error Code                   | When This Happens                                | What To Do                                         |
| ----------- | ---------------------------- | ------------------------------------------------ | -------------------------------------------------- |
| `400`       | `invalid_request_error`      | Missing required parameters or malformed request | Check `text` and `target_language_code` fields     |
| `403`       | `invalid_api_key_error`      | API key is invalid, missing, or expired          | Verify your API key in the dashboard               |
| `422`       | `unprocessable_entity_error` | Text too long or invalid speaker/model           | Keep text under 1500 chars (v2) or 2500 chars (v3) |
| `429`       | `insufficient_quota_error`   | API quota or rate limit exceeded                 | Wait for reset or upgrade your plan                |
| `500`       | `internal_server_error`      | Unexpected server error                          | Retry the request; contact support if persistent   |
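The table suggests waiting or retrying for `429` and `500`. One common way to do that is exponential backoff, sketched below. Everything here is illustrative: `HttpStatusError` is a hypothetical stand-in exception (with the SDK you would catch `ApiError` and read its `status_code`), and the attempt count and delays are arbitrary choices. Retrying will not help if a `429` means your quota is exhausted rather than a temporary rate limit.

```python
import time

RETRYABLE_STATUSES = {429, 500}


class HttpStatusError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `send()`; retry on 429/500 with delays of 1s, 2s, 4s, and so on."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except HttpStatusError as exc:
            # Give up on non-retryable statuses or once attempts run out.
            if exc.status_code not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_send():
        # Simulates two rate-limited calls followed by a success.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise HttpStatusError(429)
        return "ok"

    print(call_with_retry(flaky_send, sleep=lambda seconds: None))
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without real waiting.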

### Example Error Response

```json
{
  "error": {
    "message": "Text exceeds maximum length of 2500 characters for bulbul:v3",
    "code": "unprocessable_entity_error",
    "request_id": "20241115_abc12345"
  }
}
```
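If your input can exceed the character limit listed in the error codes table (2500 characters for v3), one way to avoid this error is to split the text at sentence boundaries and send each part as its own request. The sketch below is an assumption-laden example rather than an official recommendation: `split_text` is a hypothetical helper, it treats `.`, `!`, `?`, and the Devanagari danda `।` as sentence endings, and it hard-splits any single sentence that is still too long.

```python
import re


def split_text(text: str, max_chars: int = 2500) -> list[str]:
    """Split text into chunks of at most `max_chars`, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    long_text = "Welcome to Sarvam AI! " * 200
    parts = split_text(long_text)
    print(len(parts), [len(part) for part in parts])
```

Each chunk produces a separate audio result, so you would need to play or concatenate the decoded audio in order.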

```python
from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

try:
    response = client.text_to_speech.convert(
        text="Welcome to Sarvam AI!",
        target_language_code="en-IN",
        speaker="shubh",
        model="bulbul:v3"
    )
    # Process audio...
except ApiError as e:
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
    elif e.status_code == 403:
        print("Invalid API key. Check your credentials.")
    elif e.status_code == 422:
        print(f"Invalid parameters: {e.body}")
    elif e.status_code == 429:
        print("Rate limit exceeded. Wait and retry.")
    else:
        print(f"Error {e.status_code}: {e.body}")
```

Check out our detailed [API Reference](/api-reference-docs/text-to-speech/convert)
to explore Text to Speech and all available options.

Need help? Contact us on [Discord](https://discord.com/invite/5rAsykttcs) for
guidance.