Text to Speech
Converts text into spoken audio. The output is a WAV file encoded as a base64 string.
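Because the audio arrives as a base64 string rather than raw bytes, it has to be decoded before it can be played or saved. The following is a minimal sketch using only the Python standard library. The helper name `save_wav_from_base64` is hypothetical, and the sketch assumes you have already extracted the base64 string from the response, whose exact structure is not described on this page.

```python
import base64
import io
import wave


def save_wav_from_base64(audio_b64: str, path: str) -> int:
    """Decode a base64-encoded WAV payload, write it to `path`,
    and return the number of audio frames it contains."""
    raw = base64.b64decode(audio_b64)

    # Parse the header first so a non-WAV payload fails before anything is written.
    with wave.open(io.BytesIO(raw), "rb") as wav:
        n_frames = wav.getnframes()

    with open(path, "wb") as f:
        f.write(raw)
    return n_frames
```

Parsing the header with `wave` before writing is optional; it is there only as a cheap sanity check that the decoded bytes really are WAV data.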
Headers
Your unique subscription key for authenticating requests to the Sarvam AI Text-to-Speech API. Here are the steps to get your API key
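As an illustration of how the subscription key travels with a request, here is a sketch that prepares (but does not send) a POST request with `urllib`. The endpoint URL and the header name `api-subscription-key` are assumptions, since this page does not spell them out; confirm both against the official API reference before use. `make_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed values: neither appears on this page, verify them in the API reference.
TTS_URL = "https://api.sarvam.ai/text-to-speech"
KEY_HEADER = "api-subscription-key"


def make_tts_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request that carries the subscription key as a header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={KEY_HEADER: api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keep the key out of source control, for example by reading it from an environment variable.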
Body
The text(s) to be converted into speech. Each text should be no longer than 500 characters. You can send up to 3 texts in a single API call. The text can be code-mixed, combining English and Indic languages.
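The two limits above (at most 500 characters per text, at most 3 texts per call) can be enforced on the client before any request is made. The sketch below groups a longer list of texts into call-sized batches. `batch_texts` is a hypothetical helper, and it assumes that "characters" means Unicode code points as counted by Python's `len`; the service may count differently.

```python
MAX_CHARS_PER_TEXT = 500  # per-text limit stated above
MAX_TEXTS_PER_CALL = 3    # per-call limit stated above


def batch_texts(texts):
    """Split `texts` into lists of at most MAX_TEXTS_PER_CALL items,
    rejecting any text longer than MAX_CHARS_PER_TEXT."""
    for i, text in enumerate(texts):
        if len(text) > MAX_CHARS_PER_TEXT:
            raise ValueError(
                f"text {i} has {len(text)} characters; limit is {MAX_CHARS_PER_TEXT}"
            )
    return [
        texts[start:start + MAX_TEXTS_PER_CALL]
        for start in range(0, len(texts), MAX_TEXTS_PER_CALL)
    ]
```

Each returned batch can then be sent as the text list of one API call.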
The language of the text, in BCP-47 format.
hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, en-IN, gu-IN
Controls whether normalization of English words and numeric entities (e.g., numbers, dates) is performed. Set to true for better handling of mixed-language text. Default is false.
Weight for interpolating with the English speaker at the encoder.
Controls the loudness of the audio. Lower values result in quieter audio, while higher values make it louder. The suitable range is between 0.3 and 3.0.
0 < x < 3
Specifies the model to use for text-to-speech conversion.
bulbul:v1
Overrides the default speaker triplets.
Controls the speed of the audio. Lower values result in slower speech, while higher values make it faster. The suitable range is between 0.5 and 2.0. Default is 1.0.
0.3 < x < 3
Controls the pitch of the audio. Lower values result in a deeper voice, while higher values make it sharper. The suitable range is between -0.75 and 0.75.
-1 < x < 1
The speaker to be used for the output audio. If not provided, Meera will be used as default.
meera, pavithra, maitreyi, arvind, amol, amartya, diya, neel, misha, vian, arjun, maya
Specifies the sample rate of the output audio. Supported values are 8000, 16000, and 22050 Hz. If not provided, the default is 22050 Hz.
8000, 16000, 22050
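The allowed values and numeric ranges listed on this page can be checked locally before a request is sent. The sketch below builds a request body and validates it against those constraints. The JSON field names (`inputs`, `target_language_code`, `speaker`, `speech_sample_rate`, `pitch`, `pace`, `loudness`) are assumptions, because this page lists the parameter descriptions without their names; check them against the API reference. `build_payload` is a hypothetical helper.

```python
LANGUAGES = {"hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN", "od-IN",
             "pa-IN", "ta-IN", "te-IN", "en-IN", "gu-IN"}
SPEAKERS = {"meera", "pavithra", "maitreyi", "arvind", "amol", "amartya",
            "diya", "neel", "misha", "vian", "arjun", "maya"}
SAMPLE_RATES = {8000, 16000, 22050}

# Open intervals exactly as listed on this page: low < x < high.
RANGES = {"pitch": (-1, 1), "pace": (0.3, 3), "loudness": (0, 3)}


def build_payload(texts, language, speaker="meera", sample_rate=22050,
                  pitch=None, pace=None, loudness=None):
    """Return a request body dict, raising ValueError on out-of-spec values."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language code: {language}")
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")

    # Field names below are assumed, not taken from this page.
    payload = {
        "inputs": list(texts),
        "target_language_code": language,
        "speaker": speaker,
        "speech_sample_rate": sample_rate,
    }
    for name, value in (("pitch", pitch), ("pace", pace), ("loudness", loudness)):
        if value is None:
            continue  # omit the field so the service applies its own default
        low, high = RANGES[name]
        if not low < value < high:
            raise ValueError(f"{name} must satisfy {low} < x < {high}, got {value}")
        payload[name] = value
    return payload
```

For example, `build_payload(["Hello"], "en-IN", pace=1.2)` returns a body with the default speaker and sample rate and no `pitch` or `loudness` field. The checks use the required ranges; the narrower "suitable" ranges in the descriptions are recommendations and are not enforced here.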