Text-to-Speech REST API

Synchronous Processing

Convert text to speech and receive the audio in the same response. Best for quick conversions and testing. Features include:

  • Instant audio generation
  • Multiple voice options
  • Customizable speech parameters
  • Various audio formats

API Features

Key Features

  • Multiple speaker voices
  • Adjustable speech parameters
  • High-quality audio output
  • Natural prosody and intonation

Output Format

  • Multiple audio file formats
  • Base64 encoded string
  • Configurable sample rates

Speech Parameters

  • Pitch control
  • Speech rate adjustment
  • Language selection

Model Information

Bulbul v2

Our flagship text-to-speech model designed for Indian languages and accents.

Key Features:

  • Natural-sounding speech with human-like prosody
  • Multiple voice personalities
  • Multi-language support
  • Real-time synthesis capabilities
  • Fine-grained control over pitch, pace, and loudness
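
The controls listed above would typically be sent as request parameters. The following is a minimal sketch of assembling a request body as a plain dict. The field names `text`, `target_language_code`, `speaker`, and `model` appear in the examples on this page. The names `pitch`, `pace`, and `loudness` and the values shown are assumptions made for illustration, and `build_tts_request` is a hypothetical helper, not part of the SDK. Check the API Reference for the actual names and ranges.

```python
import json


def build_tts_request(text, target_language_code, speaker="anushka",
                      model="bulbul:v2", **voice_controls):
    """Assemble a text-to-speech request body as a plain dict.

    `voice_controls` carries optional tuning values. The key names used in
    the example call below (pitch, pace, loudness) are assumed.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    body = {
        "text": text,
        "target_language_code": target_language_code,
        "speaker": speaker,
        "model": model,
    }
    # Include only the controls the caller set, so server defaults apply otherwise.
    body.update({k: v for k, v in voice_controls.items() if v is not None})
    return body


request_body = build_tts_request(
    "Welcome to Sarvam AI!",
    "en-IN",
    pitch=0.0, pace=1.0, loudness=1.0,  # assumed parameter names and values
)
print(json.dumps(request_body, indent=2))
```

Leaving unset controls out of the body keeps the request small and avoids overriding whatever defaults the service applies.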

Language Support

Supports 11 languages (Indian English plus 10 Indian languages), each identified by a BCP-47 code:

  • English (en-IN)
  • Hindi (hi-IN)
  • Bengali (bn-IN)
  • Tamil (ta-IN)
  • Telugu (te-IN)
  • Kannada (kn-IN)
  • Malayalam (ml-IN)
  • Marathi (mr-IN)
  • Gujarati (gu-IN)
  • Punjabi (pa-IN)
  • Odia (od-IN)
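
The list above can be kept client-side as a lookup table, so that an unsupported language is caught before a request is sent. This is a minimal sketch. The codes are copied from the list, while `language_code` and `is_supported_code` are hypothetical helper names, not SDK functions.

```python
SUPPORTED_LANGUAGES = {
    "English": "en-IN",
    "Hindi": "hi-IN",
    "Bengali": "bn-IN",
    "Tamil": "ta-IN",
    "Telugu": "te-IN",
    "Kannada": "kn-IN",
    "Malayalam": "ml-IN",
    "Marathi": "mr-IN",
    "Gujarati": "gu-IN",
    "Punjabi": "pa-IN",
    "Odia": "od-IN",
}


def language_code(name):
    """Return the code for a language name, ignoring case and outer spaces."""
    lookup = {key.lower(): value for key, value in SUPPORTED_LANGUAGES.items()}
    try:
        return lookup[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None


def is_supported_code(code):
    """Check whether a code such as 'hi-IN' is in the supported list."""
    return code in SUPPORTED_LANGUAGES.values()


print(language_code("Hindi"))
print(is_supported_code("fr-FR"))
```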

Bulbul: Our Text to Speech Model

Bulbul is our state-of-the-art text-to-speech model that excels in generating natural-sounding speech with support for multiple Indian languages and various voice options.

Text to Speech Features

Basic Text to Speech Synthesis

Convert text to high-quality, natural-sounding speech. Features include:

  • Multiple voice options
  • Support for Indian languages
  • Natural prosody and intonation
  • High-quality audio output

from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Convert text to speech
audio = client.text_to_speech.convert(
    target_language_code="en-IN",
    text="Welcome to Sarvam AI!",
    model="bulbul:v2",
    speaker="anushka",
)
save(audio, "output1.wav")

Key Considerations

  • For numbers with more than 4 digits, use commas (e.g., ‘10,000’)
  • Enable preprocessing for better handling of numbers and dates
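
The comma guideline above can be applied automatically before text is sent. This is a minimal sketch using a regular expression. `add_thousands_separators` is a hypothetical helper, and it groups digits in threes (as in the ‘10,000’ example), which is an assumption for longer numbers. It cannot tell amounts apart from phone numbers or postal codes, so apply it only where the numbers are known to be quantities.

```python
import re

# A run of 5+ digits that does not start with 0 and is not part of a longer
# number, an already comma-grouped number, or a decimal fraction.
_LONG_NUMBER = re.compile(r"(?<![\d.,])[1-9]\d{4,}(?!\d)")


def add_thousands_separators(text):
    """Insert commas into plain integers that have more than 4 digits."""
    return _LONG_NUMBER.sub(lambda match: f"{int(match.group()):,}", text)


print(add_thousands_separators("The prize is 2500000 rupees, announced in 2024."))
# prints: The prize is 2,500,000 rupees, announced in 2024.
```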

API Response Format

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique identifier for the request |
| audios | array | Base64-encoded audio files. Each element corresponds to an input text |

Supported audio formats: WAV (default), MP3, Linear16, Mulaw, Alaw, Opus, FLAC, AAC

{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "audios": [
    "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAAB9AAACABAAZGF0YQAAAAA..."
  ]
}

Python:

import base64

# `response` is the result of an earlier text-to-speech request
audio_base64 = response.audios[0]
audio_bytes = base64.b64decode(audio_base64)

with open("output.wav", "wb") as f:
    f.write(audio_bytes)

JavaScript:

import fs from "fs";

// `response` is the parsed JSON body of an earlier text-to-speech request
const audioBase64 = response.audios[0];
const audioBuffer = Buffer.from(audioBase64, 'base64');
fs.writeFileSync('output.wav', audioBuffer);
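
Because each element of `audios` corresponds to one input text, a response can carry several clips. The sketch below decodes every element and writes one file per clip, with a cheap header check for the default WAV format. It works on the JSON body as a plain dict. The helper names are hypothetical, and the sample response is built from stand-in bytes rather than real audio.

```python
import base64
import tempfile
from pathlib import Path


def decode_audios(response_body):
    """Decode every base64 entry in the `audios` array into raw bytes."""
    return [base64.b64decode(item) for item in response_body["audios"]]


def looks_like_wav(data):
    """Sanity check: a WAV file starts with a RIFF header tagged WAVE."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def save_audios(response_body, directory, prefix="output"):
    """Write each decoded clip to <directory>/<prefix><n>.wav and return the paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, data in enumerate(decode_audios(response_body), start=1):
        path = directory / f"{prefix}{index}.wav"
        path.write_bytes(data)
        paths.append(path)
    return paths


# Stand-in payload: only a WAV-style header, not playable audio.
fake_wav = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE" + b"fmt "
sample_response = {
    "request_id": "example",
    "audios": [base64.b64encode(fake_wav).decode("ascii")],
}

decoded = decode_audios(sample_response)
with tempfile.TemporaryDirectory() as tmp:
    saved_names = [path.name for path in save_audios(sample_response, tmp)]
print(saved_names, looks_like_wav(decoded[0]))
```

The header check only applies when the requested format is WAV. For other formats, skip it and choose a matching file extension.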

Error Responses

All errors return a JSON object with an error field containing details about what went wrong.

Error Response Structure

{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
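
For programmatic handling, the structure above can be parsed into a small typed object. This is a minimal sketch. `TtsError` and `parse_error` are hypothetical names, and the fallback values used when a field is missing are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsError:
    message: str
    code: str
    request_id: str


def parse_error(body):
    """Parse an error body given as JSON text or as an already-decoded dict."""
    payload = json.loads(body) if isinstance(body, (str, bytes)) else body
    error = payload.get("error", {})
    return TtsError(
        message=error.get("message", ""),
        code=error.get("code", "unknown"),  # assumed fallback when absent
        request_id=error.get("request_id", ""),
    )


raw_body = """
{
  "error": {
    "message": "Human-readable error description",
    "code": "invalid_request_error",
    "request_id": "unique_request_identifier"
  }
}
"""
parsed = parse_error(raw_body)
print(parsed.code, "-", parsed.message)
```

Logging `request_id` alongside the code makes it easier to reference a specific failure when contacting support.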

Error Codes Reference

| HTTP Status | Error Code | When This Happens | What To Do |
|---|---|---|---|
| 400 | invalid_request_error | Missing required parameters or malformed request | Check text and target_language_code fields |
| 403 | invalid_api_key_error | API key is invalid, missing, or expired | Verify your API key in the dashboard |
| 422 | unprocessable_entity_error | Text too long or invalid speaker/model | Keep text under 1500 chars (v2) or 2500 chars (v3) |
| 429 | insufficient_quota_error | API quota or rate limit exceeded | Wait for reset or upgrade your plan |
| 500 | internal_server_error | Unexpected server error | Retry the request; contact support if persistent |
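
One way to avoid the 422 length error is to split long input into chunks below the limit and send each chunk as its own request. This is a minimal sketch, not an official recommendation. `split_text` is a hypothetical helper that prefers sentence boundaries and hard-wraps only when a single sentence exceeds the limit. The 1500 default matches the v2 figure in the table, and a different limit can be passed as `max_chars`.

```python
import re


def split_text(text, max_chars=1500):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation, including the Devanagari danda.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself over the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_text = "This is a sentence. " * 200  # about 4000 characters
chunks = split_text(long_text)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Splitting at sentence ends keeps prosody natural within each clip. The resulting audio files would then need to be concatenated on the client side.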

Example Error Response

{
  "error": {
    "message": "Text exceeds maximum length of 1500 characters for bulbul:v2",
    "code": "unprocessable_entity_error",
    "request_id": "20241115_abc12345"
  }
}

Python:

from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

try:
    response = client.text_to_speech.convert(
        text="Welcome to Sarvam AI!",
        target_language_code="en-IN",
        speaker="anushka",
        model="bulbul:v2"
    )
    # Process audio...
except ApiError as e:
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
    elif e.status_code == 403:
        print("Invalid API key. Check your credentials.")
    elif e.status_code == 422:
        print(f"Invalid parameters: {e.body}")
    elif e.status_code == 429:
        print("Rate limit exceeded. Wait and retry.")
    else:
        print(f"Error {e.status_code}: {e.body}")
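
The advice to wait and retry on 429 and 500 can be wrapped in a small helper with exponential backoff. This is a sketch under stated assumptions: `call_with_retry` is a hypothetical helper, the attempt count and delays are arbitrary choices, and `DemoApiError` is a stand-in for the SDK's `ApiError` so the example runs offline. The helper relies only on the exception exposing a `status_code` attribute.

```python
import time

RETRYABLE_STATUS = {429, 500}


def call_with_retry(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request(), retrying on retryable statuses with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception as error:
            status = getattr(error, "status_code", None)
            # Re-raise errors that retrying cannot fix, or when out of attempts.
            if status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


class DemoApiError(Exception):
    """Stand-in for the SDK's ApiError so the sketch runs without a network."""

    def __init__(self, status_code):
        super().__init__(f"status {status_code}")
        self.status_code = status_code


attempts = []


def flaky_request():
    """Fail twice with 429, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise DemoApiError(429)
    return "audio"


delays = []
result = call_with_retry(flaky_request, sleep=delays.append)
print(result, delays)
```

With the real client, the call would be passed as a callable, for example `call_with_retry(lambda: client.text_to_speech.convert(...))`. Errors such as 400, 403, and 422 are re-raised immediately because repeating the same request would fail again.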

Check out our detailed API Reference to explore Text to Speech and all available options.

Need help? Contact us on Discord for guidance.