Text-to-Speech REST API
The Text-to-Speech REST API is a synchronous endpoint: send a POST request containing text and receive base64-encoded audio in the response.
The JSON response contains an audios array of base64-encoded audio strings (WAV by default), not raw binary. Decode them before saving or playing.
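A minimal sketch of that decoding step in Python, assuming the response body has already been read as a JSON string and that the audio is WAV. The function name `save_audios` and the `output_N.wav` file names are illustrative, not part of the API.

```python
import base64
import json
from pathlib import Path


def save_audios(response_body: str, out_dir: str = ".") -> list[Path]:
    """Decode each base64 string in the `audios` array and write it to disk."""
    payload = json.loads(response_body)
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for index, encoded in enumerate(payload["audios"]):
        # Each entry is base64 text, so decode it back to raw audio bytes.
        audio_bytes = base64.b64decode(encoded)
        target = out_path / f"output_{index}.wav"  # hypothetical file name
        target.write_bytes(audio_bytes)
        written.append(target)
    return written
```

If a different output format was requested, the file extension would need to change to match.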
See TTS best practices for JavaScript and streaming examples.
Common use cases:
Pick from male and female speakers, each with a distinct tone and style.
Pass the speaker parameter to switch between them.
Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Gujarati, Punjabi, Odia, and English (Indian accent).
Set the language via target_language_code.
Send long-form text in a single request (v3). There is no need to chunk or paginate your input.
Speed up or slow down speech with the pace parameter. The range is 0.5 to 2.0 for v3.
Output sample rates range from 8 kHz to 48 kHz. The higher rates (32 kHz, 44.1 kHz, 48 kHz) are available in the v3 REST API only. Default: 24 kHz.
The response is base64-encoded. Supported formats: WAV, MP3, Linear16, Mulaw, Alaw, Opus, FLAC, and AAC.
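The parameters above can be gathered into a request body. The following is a sketch under stated assumptions: `speaker`, `target_language_code`, and `pace` are named in this page, while the `text` field name, the `hi-IN` style of language code, and the speaker value used in the usage note are illustrative guesses that should be checked against the API Reference.

```python
def build_tts_payload(text, target_language_code, speaker=None, pace=None):
    """Build a request body dict, validating pace against the v3 range."""
    if not text:
        raise ValueError("text must be non-empty")
    payload = {
        "text": text,  # assumed field name for the input text
        "target_language_code": target_language_code,
    }
    if speaker is not None:
        payload["speaker"] = speaker
    if pace is not None:
        # The documented v3 range for pace is 0.5 to 2.0.
        if not 0.5 <= pace <= 2.0:
            raise ValueError("pace must be between 0.5 and 2.0 for v3")
        payload["pace"] = pace
    return payload
```

For example, `build_tts_payload("Namaste", "hi-IN", speaker="example_speaker", pace=1.2)` returns a dict ready to be serialized as JSON, where `example_speaker` is a placeholder rather than a real speaker name.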
Bulbul v3 is purpose-built for Indian languages and accents. It handles code-mixed text (e.g., Hinglish), number normalization, and natural prosody out of the box, so input needs minimal preprocessing.
Convert text to high-quality, natural-sounding speech. Features include:
Supported audio formats: WAV (default), MP3, Linear16, Mulaw, Alaw, Opus, FLAC, AAC
Python:
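A minimal sketch using only the standard library, not an official client. The endpoint URL below is a placeholder, and the `api-subscription-key` header name and the `text` field name are assumptions; take the real values from the API Reference.

```python
import json
import urllib.request

API_URL = "https://api.example.com/text-to-speech"  # placeholder, not the real endpoint


def make_tts_request(api_key, text, target_language_code):
    """Build (but do not send) a POST request carrying a JSON body."""
    body = json.dumps(
        {"text": text, "target_language_code": target_language_code}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "api-subscription-key": api_key,  # assumed auth header name
        },
    )


def synthesize(api_key, text, target_language_code):
    """Send the request and return the list of base64-encoded audio strings."""
    request = make_tts_request(api_key, text, target_language_code)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["audios"]
```

Splitting request construction from sending keeps the payload easy to inspect before any network call is made.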
JavaScript:
All errors return a JSON object with an error field containing details about what went wrong.
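One way to surface that `error` field in Python is sketched below. The `TTSError` class and `parse_tts_response` function are hypothetical helpers, and because this page does not specify the shape of the error details, they are passed through as serialized JSON.

```python
import json


class TTSError(Exception):
    """Raised when a response body carries an `error` field."""


def parse_tts_response(body: str) -> list:
    """Return the `audios` list, or raise TTSError if the body reports an error."""
    data = json.loads(body)
    if "error" in data:
        # The structure of the error details is not documented here,
        # so keep them intact rather than picking out specific keys.
        raise TTSError(json.dumps(data["error"]))
    return data["audios"]
```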
Check out our detailed API Reference to explore Text to Speech and all available options.
Need help? Contact us on Discord for guidance.