Text To Speech

Converts text into spoken audio. The output is a WAV file encoded as a base64 string.

Headers

api-subscription-key string Required

Request

This endpoint expects an object.

text string Required

The text(s) to be converted into speech.

Features:

  • Each text should be no longer than 1500 characters
  • Supports code-mixed text (English and Indic languages)

Important Note:

  • For numbers with more than 4 digits, use commas (e.g., ‘10,000’ instead of ‘10000’)
  • This ensures the number is pronounced as a whole number
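
The two constraints above (the 1500-character limit and comma grouping for long numbers) can be applied on the client before a request is built. The following is a minimal sketch, not part of the API: the helper names `add_thousands_commas` and `prepare_text` are hypothetical, and the Western three-digit grouping is an assumption based on the single ‘10,000’ example given above.

```python
import re

MAX_TEXT_CHARS = 1500  # per-text limit stated in the Features list

# Runs of five or more digits that are not attached to other digits, '.' or ','.
_LONG_NUMBER = re.compile(r"(?<![\d.,])\d{5,}(?!\d)")


def add_thousands_commas(text: str) -> str:
    """Rewrite bare numbers such as '10000' as '10,000'."""

    def _group(match):
        digits = match.group()
        if digits.startswith("0"):
            # Probably an identifier or phone number; leave it untouched.
            return digits
        # Assumption: three-digit grouping, as in the '10,000' example.
        return f"{int(digits):,}"

    return _LONG_NUMBER.sub(_group, text)


def prepare_text(text: str) -> str:
    """Apply comma grouping, then enforce the 1500-character limit."""
    prepared = add_thousands_commas(text)
    if len(prepared) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text is {len(prepared)} characters; the limit is {MAX_TEXT_CHARS}"
        )
    return prepared
```

The length check runs after comma insertion because the added commas count toward the character limit.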

target_language_code enum Required

The language of the text, in BCP-47 format.

speaker enum or null Optional

The speaker voice to be used for the output audio.

Default: Anushka

Model Compatibility (Speakers compatible with respective model):

  • bulbul:v2:
    • Female: Anushka, Manisha, Vidya, Arya
    • Male: Abhilash, Karun, Hitesh

Note: Speaker selection must match the chosen model version.
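
Because the speaker must match the model version, a client can check the pairing before sending a request. The sketch below encodes only the compatibility list given above; the names `SPEAKERS_BY_MODEL` and `is_compatible` are hypothetical, and the case-insensitive comparison is an assumption, since the exact casing the API expects is not stated here.

```python
# Speaker names exactly as listed in the compatibility section.
SPEAKERS_BY_MODEL = {
    "bulbul:v2": {
        "female": ["Anushka", "Manisha", "Vidya", "Arya"],
        "male": ["Abhilash", "Karun", "Hitesh"],
    },
}

DEFAULT_SPEAKER = "Anushka"


def is_compatible(speaker: str, model: str = "bulbul:v2") -> bool:
    """Return True if `speaker` is listed for `model` (case-insensitive)."""
    voices = SPEAKERS_BY_MODEL.get(model, {})
    wanted = speaker.casefold()
    return any(
        wanted == name.casefold()
        for names in voices.values()
        for name in names
    )
```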

pitch double or null Optional Defaults to 0

Controls the pitch of the audio. Lower values result in a deeper voice, while higher values make it sharper. The suitable range is between -0.75 and 0.75. Default is 0.0.

pace double or null Optional >=0.3 <=3 Defaults to 1

Controls the speed of the audio. Lower values result in slower speech, while higher values make it faster. The suitable range is between 0.5 and 2.0. Default is 1.0.

loudness double or null Optional >=0.1 <=3 Defaults to 1

Controls the loudness of the audio. Lower values result in quieter audio, while higher values make it louder. The suitable range is between 0.3 and 3.0. Default is 1.0.
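
The field signatures for pace and loudness give hard bounds (>=0.3 <=3 and >=0.1 <=3), while the descriptions give narrower "suitable" ranges. The sketch below keeps the two apart: it rejects values outside the hard bounds and only reports whether a value falls in the suitable range. The names `HARD_LIMITS`, `SUITABLE`, and `check_setting` are hypothetical, and no hard bound is assumed for pitch because none is stated.

```python
from typing import NamedTuple


class Range(NamedTuple):
    low: float
    high: float


# Hard bounds from the field signatures; none is stated for pitch.
HARD_LIMITS = {
    "pace": Range(0.3, 3.0),
    "loudness": Range(0.1, 3.0),
}

# "Suitable" ranges from the field descriptions.
SUITABLE = {
    "pitch": Range(-0.75, 0.75),
    "pace": Range(0.5, 2.0),
    "loudness": Range(0.3, 3.0),
}


def check_setting(name: str, value: float) -> bool:
    """Raise if `value` violates a hard bound; return True if it is in the suitable range."""
    hard = HARD_LIMITS.get(name)
    if hard is not None and not (hard.low <= value <= hard.high):
        raise ValueError(f"{name}={value} is outside [{hard.low}, {hard.high}]")
    suitable = SUITABLE[name]
    return suitable.low <= value <= suitable.high
```

For example, `check_setting("pace", 0.4)` returns False: the value is accepted by the stated bounds but lies below the suitable range.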

speech_sample_rate enum or null Optional

Specifies the sample rate of the output audio. Supported values are 8000, 16000, 22050, and 24000 Hz. If not provided, the default is 22050 Hz.

enable_preprocessing boolean Optional Defaults to false

Controls whether normalization of English words and numeric entities (e.g., numbers, dates) is performed. Set to true for better handling of mixed-language text. Default is false.

model enum Optional

Specifies the model to use for text-to-speech conversion. Default is bulbul:v2.
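
Putting the request fields together, the sketch below assembles the header and a JSON body from the parameters described in this section. It deliberately stops short of sending anything, because the endpoint URL is not given here. The function name `build_tts_request` is hypothetical, and the `Content-Type: application/json` header is an assumption based on the endpoint expecting an object.

```python
import json
from typing import Any, Dict, Optional, Tuple

SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000)


def build_tts_request(
    api_key: str,
    text: str,
    target_language_code: str,
    *,
    speaker: Optional[str] = None,
    pitch: Optional[float] = None,
    pace: Optional[float] = None,
    loudness: Optional[float] = None,
    speech_sample_rate: Optional[int] = None,
    enable_preprocessing: Optional[bool] = None,
    model: Optional[str] = None,
) -> Tuple[Dict[str, str], str]:
    """Return (headers, JSON body) for a text-to-speech request."""
    if (
        speech_sample_rate is not None
        and speech_sample_rate not in SUPPORTED_SAMPLE_RATES
    ):
        raise ValueError(f"unsupported sample rate: {speech_sample_rate}")

    headers = {
        "api-subscription-key": api_key,
        "Content-Type": "application/json",  # assumption: the body is JSON
    }
    body: Dict[str, Any] = {
        "text": text,
        "target_language_code": target_language_code,
    }
    optional = {
        "speaker": speaker,
        "pitch": pitch,
        "pace": pace,
        "loudness": loudness,
        "speech_sample_rate": speech_sample_rate,
        "enable_preprocessing": enable_preprocessing,
        "model": model,
    }
    # Omit unset fields so the documented defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return headers, json.dumps(body, ensure_ascii=False)
```

A call such as `build_tts_request(key, "Hello", "hi-IN", speaker="Anushka")` uses `hi-IN` purely as an example of a BCP-47 tag; which language codes the endpoint accepts is not listed in this section.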


Response

Successful Response

request_id string or null

audios list of strings

The output audio files in WAV format, encoded as base64 strings. Each string corresponds to one of the input texts.
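
Since each entry in `audios` is a base64-encoded WAV file, turning a response into playable files is a matter of decoding each string and writing the bytes to disk. The following is a minimal sketch that assumes the response has already been parsed into a Python dict; the function name `save_audios` and the output file naming are hypothetical.

```python
import base64
from pathlib import Path
from typing import List


def save_audios(response: dict, out_dir: str, stem: str = "tts") -> List[Path]:
    """Decode every base64 string in response['audios'] and write it as a .wav file."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    paths = []
    for index, encoded in enumerate(response["audios"]):
        wav_bytes = base64.b64decode(encoded)
        path = target / f"{stem}_{index}.wav"
        path.write_bytes(wav_bytes)
        paths.append(path)
    return paths
```

The files are numbered in the order the strings appear in `audios`.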

Errors