REST

Convert text into spoken audio. The output is a WAV file encoded as a base64 string.

**Available Models:**
- **bulbul:v2** (default): Supports pitch, loudness, and pace controls
- **bulbul:v3-beta**: Newer model with temperature control and improved quality

**Important Notes for bulbul:v3-beta:**
- Pitch and loudness parameters are NOT supported
- Pace must be between 0.5 and 2.0
- Preprocessing is automatically enabled
- Default sample rate is 24000 Hz
- Temperature parameter available (0.01-1.0, default 0.6)

Authentication

api-subscription-key string
API Key authentication via header
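
As a minimal sketch of header-based authentication, the snippet below builds (without sending) a POST request that carries the key in the `api-subscription-key` header. The endpoint URL, the `build_tts_request` helper, and the `en-IN` language code are assumptions made for illustration; this reference does not state them, so substitute the real values.

```python
import json
import urllib.request

# Assumption: the endpoint URL is not given in this reference; replace it.
TTS_URL = "https://api.example.com/text-to-speech"


def build_tts_request(api_key: str, text: str, target_language_code: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries the API key header."""
    payload = {"text": text, "target_language_code": target_language_code}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "en-IN" is a hypothetical example of a BCP-47 code.
request = build_tts_request("YOUR_API_KEY", "Hello, world", "en-IN")
# To send it, pass the object to urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the key handling easy to inspect and test without network access.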

Request

This endpoint expects an object.
text string Required
The text(s) to be converted into speech.

**Features:**
- Supports code-mixed text (English and Indic languages)

**Model-specific limits:**
- **bulbul:v2:** Max 1500 characters
- **bulbul:v3-beta:** Max 2500 characters

**Important Note:**
- For numbers larger than 4 digits, use commas (e.g., '10,000' instead of '10000')
- This ensures proper pronunciation as a whole number
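
The limits and the comma rule above could be applied on the client before sending, roughly as sketched here. The helper names are hypothetical. The sketch assumes Western three-digit grouping (as in the '10,000' example), skips decimals and numbers with leading zeros, and checks the length after commas are added; whether the service counts characters the same way is not stated in this reference.

```python
import re

# Character limits per model, as listed above.
MAX_CHARS = {"bulbul:v2": 1500, "bulbul:v3-beta": 2500}

# Standalone integers of five or more digits that do not start with 0
# and are not part of a decimal or an already-grouped number.
_LONG_NUMBER = re.compile(r"(?<![\d.,])[1-9]\d{4,}(?!\d|[.,]\d)")


def add_thousands_separators(text: str) -> str:
    """Insert commas into long integers, e.g. '10000' becomes '10,000'."""
    return _LONG_NUMBER.sub(lambda m: f"{int(m.group()):,}", text)


def prepare_text(text: str, model: str = "bulbul:v2") -> str:
    """Format long numbers, then enforce the model's character limit."""
    prepared = add_thousands_separators(text)
    limit = MAX_CHARS[model]
    if len(prepared) > limit:
        raise ValueError(f"{model} accepts at most {limit} characters, got {len(prepared)}")
    return prepared
```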
target_language_code enum Required

The language of the text, in BCP-47 format.

speaker enum or null Optional
The speaker voice to be used for the output audio.

**Default:** Anushka (for bulbul:v2), Aditya (for bulbul:v3-beta)

**Model Compatibility (Speakers compatible with respective model):**
- **bulbul:v2:**
  - Female: Anushka, Manisha, Vidya, Arya
  - Male: Abhilash, Karun, Hitesh
- **bulbul:v3-beta:**
  - Aditya, Ritu, Priya, Neha, Rahul, Pooja, Rohan, Simran, Kavya, Amit, Dev, Ishita, Shreya, Ratan, Varun, Manan, Sumit, Roopa, Kabir, Aayan, Shubh, Ashutosh, Advait, Amelia, Sophia

**Note:** Speaker selection must match the chosen model version.
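
Because a speaker must match the chosen model, a client-side lookup like the following sketch can catch mismatches before a request is sent. The `is_speaker_compatible` helper is hypothetical, and the comparison ignores letter case because this reference does not state the exact casing the enum expects.

```python
from typing import Optional

# Speaker lists copied from the compatibility notes above.
SPEAKERS = {
    "bulbul:v2": {
        "Anushka", "Manisha", "Vidya", "Arya",  # female
        "Abhilash", "Karun", "Hitesh",          # male
    },
    "bulbul:v3-beta": {
        "Aditya", "Ritu", "Priya", "Neha", "Rahul", "Pooja", "Rohan",
        "Simran", "Kavya", "Amit", "Dev", "Ishita", "Shreya", "Ratan",
        "Varun", "Manan", "Sumit", "Roopa", "Kabir", "Aayan", "Shubh",
        "Ashutosh", "Advait", "Amelia", "Sophia",
    },
}


def is_speaker_compatible(model: str, speaker: Optional[str]) -> bool:
    """Return True if the speaker is listed for the model (case-insensitive)."""
    if speaker is None:
        # No speaker given: the service falls back to the model's default.
        return True
    known = {name.casefold() for name in SPEAKERS[model]}
    return speaker.casefold() in known
```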
pitch double or null Optional Defaults to 0

Controls the pitch of the audio. Lower values result in a deeper voice, while higher values make it sharper. The suitable range is between -0.75 and 0.75. Default is 0.0.

Note: This parameter is only supported for bulbul:v2. It is NOT supported for bulbul:v3-beta and will cause a validation error if provided.

pace double or null Optional 0.3-3 Defaults to 1
Controls the speed of the audio. Lower values result in slower speech, while higher values make it faster. Default is 1.0.

**Model-specific ranges:**
- **bulbul:v2:** 0.3 to 3.0
- **bulbul:v3-beta:** 0.5 to 2.0
loudness double or null Optional 0.1-3 Defaults to 1

Controls the loudness of the audio. Lower values result in quieter audio, while higher values make it louder. The suitable range is between 0.3 and 3.0. Default is 1.0.

Note: This parameter is only supported for bulbul:v2. It is NOT supported for bulbul:v3-beta and will cause a validation error if provided.

speech_sample_rate enum or null Optional
Specifies the sample rate of the output audio. Supported values are 8000, 16000, 22050, 24000 Hz.

**Model-specific defaults:**
- **bulbul:v2:** Default is 22050 Hz
- **bulbul:v3-beta:** Default is 24000 Hz
enable_preprocessing boolean Optional Defaults to false
Controls whether normalization of English words and numeric entities (e.g., numbers, dates) is performed. Set to true for better handling of mixed-language text.

**Model-specific behavior:**
- **bulbul:v2:** Default is false
- **bulbul:v3-beta:** Automatically enabled (true) and cannot be disabled
model enum Optional
Specifies the model to use for text-to-speech conversion.

**Available models:**
- **bulbul:v2:** Default model with pitch and loudness controls
- **bulbul:v3-beta:** Newer model with temperature control and improved quality
output_audio_codec enum or null Optional
Specifies the audio codec for the output audio file. Different codecs offer various compression and quality characteristics.
temperature double or null Optional 0.01-1 Defaults to 0.6

Controls the randomness of the output. Lower values make the output more focused and deterministic, while higher values make it more random. The suitable range is between 0.01 and 1.0. Default is 0.6.

Note: This parameter is only supported for bulbul:v3-beta. It has no effect on bulbul:v2.
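
The model-specific rules for pitch, pace, loudness, and temperature can be checked before a request is sent. The following is a sketch under stated assumptions: `validate_params` is a hypothetical helper, it covers only the constraints spelled out above, and it leaves the "suitable range" hints for pitch and loudness to the server.

```python
from typing import List, Optional

# Pace limits per model, as documented above.
PACE_RANGE = {"bulbul:v2": (0.3, 3.0), "bulbul:v3-beta": (0.5, 2.0)}


def validate_params(
    model: str = "bulbul:v2",
    pitch: Optional[float] = None,
    pace: Optional[float] = None,
    loudness: Optional[float] = None,
    temperature: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means no documented rule is broken."""
    if model not in PACE_RANGE:
        return [f"unknown model: {model}"]

    problems = []
    if model == "bulbul:v3-beta":
        # These two parameters cause a validation error on v3-beta.
        if pitch is not None:
            problems.append("pitch is not supported by bulbul:v3-beta")
        if loudness is not None:
            problems.append("loudness is not supported by bulbul:v3-beta")

    low, high = PACE_RANGE[model]
    if pace is not None and not low <= pace <= high:
        problems.append(f"pace must be between {low} and {high} for {model}")

    # Temperature only affects v3-beta, but the documented range is checked anyway.
    if temperature is not None and not 0.01 <= temperature <= 1.0:
        problems.append("temperature must be between 0.01 and 1.0")
    return problems
```

For example, `validate_params("bulbul:v3-beta", pitch=0.2, pace=2.5)` reports two problems: the unsupported pitch and the out-of-range pace.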

Response

Successful Response
request_id string or null
audios list of strings
The output audio files in WAV format, encoded as base64 strings. Each string corresponds to one of the input texts.
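
Since each entry in `audios` is a base64 string, a client has to decode it before the audio can be played or stored. The sketch below writes each entry to its own `.wav` file. The `save_audios` helper and the file naming are hypothetical, the response used is a stand-in rather than real API output, and the `.wav` extension assumes the default WAV output described above.

```python
import base64
import tempfile
from pathlib import Path
from typing import List


def save_audios(response_body: dict, out_dir: str, stem: str = "speech") -> List[Path]:
    """Decode every base64 entry in 'audios' and write it to <stem>_<index>.wav."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, encoded in enumerate(response_body["audios"]):
        path = directory / f"{stem}_{index}.wav"
        path.write_bytes(base64.b64decode(encoded))
        paths.append(path)
    return paths


# Usage with a stand-in response; real responses come from the API.
fake_wav = b"RIFF\x00\x00\x00\x00WAVE"
fake_response = {
    "request_id": None,
    "audios": [base64.b64encode(fake_wav).decode("ascii")],
}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_audios(fake_response, tmp)
    first_bytes = saved[0].read_bytes()
```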

Errors