# REST

POST https://api.sarvam.ai/text-to-speech
Content-Type: application/json

Convert text into spoken audio. The output is a base64-encoded audio string that must be decoded before use.
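
Decoding is a small step that is easy to overlook. The following is a minimal sketch, assuming the JSON response has already been parsed into a dictionary with the `audios` array described in the response schema; the helper name `decode_audios` is illustrative and not part of any SDK.

```python
import base64


def decode_audios(response_json: dict) -> list[bytes]:
    """Decode every base64 string in the `audios` array into raw audio bytes."""
    return [base64.b64decode(encoded) for encoded in response_json["audios"]]
```

Each returned `bytes` object can then be written to disk, for example with `pathlib.Path("output.wav").write_bytes(clips[0])`.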

**Available Models:**
- **bulbul:v3**: Latest model with improved quality, 30+ voices, and temperature control
- **bulbul:v2**: Legacy model with pitch and loudness controls

**Important Notes for bulbul:v3:**
- Pitch and loudness parameters are NOT supported
- Pace range: 0.5 to 2.0
- Preprocessing is automatically enabled
- Default sample rate is 24000 Hz
- Supports sample rates: 8000, 16000, 22050, 24000 Hz (REST API also supports 32000, 44100, 48000 Hz)
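
These constraints can be checked on the client before a request is sent. The sketch below is one possible approach, not an official validator: the function and constant names are hypothetical, the field names follow the request schema in the specification, and the sample rate is sent as an integer (the schema lists the enum values as strings, so adjust this if the API rejects it).

```python
# Limits copied from the bulbul:v3 notes above; all names here are illustrative.
V3_PACE_RANGE = (0.5, 2.0)
V3_SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}  # REST API set
V3_UNSUPPORTED = {"pitch", "loudness"}


def build_v3_payload(text, target_language_code, pace=1.0,
                     speech_sample_rate=24000, **extra):
    """Return a request body for bulbul:v3, or raise ValueError on a bad value."""
    low, high = V3_PACE_RANGE
    if not low <= pace <= high:
        raise ValueError(f"pace must be between {low} and {high}, got {pace}")
    if speech_sample_rate not in V3_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {speech_sample_rate}")
    rejected = V3_UNSUPPORTED.intersection(extra)
    if rejected:
        raise ValueError(f"not supported by bulbul:v3: {sorted(rejected)}")
    return {
        "text": text,
        "target_language_code": target_language_code,
        "model": "bulbul:v3",
        "pace": pace,
        "speech_sample_rate": speech_sample_rate,
        **extra,
    }
```

Failing early on the client gives a clearer message than waiting for a `400` or `422` response from the server.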

Reference: https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert

## OpenAPI Specification

```yaml
openapi: 3.1.0
info:
  title: ''
  version: 1.0.0
paths:
  /text-to-speech:
    post:
      operationId: convert
      summary: Text to Speech
      description: >-
        Convert text into spoken audio. The output is a base64-encoded audio
        string that must be decoded before use.


        **Available Models:**

        - **bulbul:v3**: Latest model with improved quality, 30+ voices, and
        temperature control

        - **bulbul:v2**: Legacy model with pitch and loudness controls


        **Important Notes for bulbul:v3:**

        - Pitch and loudness parameters are NOT supported

        - Pace range: 0.5 to 2.0

        - Preprocessing is automatically enabled

        - Default sample rate is 24000 Hz

        - Supports sample rates: 8000, 16000, 22050, 24000 Hz (REST API also
        supports 32000, 44100, 48000 Hz)
      tags:
        - subpackage_textToSpeech
      parameters:
        - name: api-subscription-key
          in: header
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_TextToSpeechResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
        '422':
          description: Unprocessable Entity
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
        '429':
          description: Quota Exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Sarvam_Model_API_TextToSpeechRequest'
servers:
  - url: https://api.sarvam.ai
components:
  schemas:
    SarvamModelApiTextToSpeechRequestText:
      oneOf:
        - type: string
      description: >-
        The text(s) to be converted into speech.


        **Features:**

        - Supports code-mixed text (English and Indic languages)


        **Model-specific limits:**

        - **bulbul:v3:** Max 2500 characters

        - **bulbul:v2:** Max 1500 characters


        **Important Note:**

        - For numbers with more than 4 digits, use commas (e.g., '10,000'
        instead of '10000')

        - This ensures proper pronunciation as a whole number
      title: SarvamModelApiTextToSpeechRequestText
    Sarvam_Model_API_TextToSpeechLanguage:
      type: string
      enum:
        - bn-IN
        - en-IN
        - gu-IN
        - hi-IN
        - kn-IN
        - ml-IN
        - mr-IN
        - od-IN
        - pa-IN
        - ta-IN
        - te-IN
      title: Sarvam_Model_API_TextToSpeechLanguage
    Sarvam_Model_API_TextToSpeechSpeaker:
      type: string
      enum:
        - anushka
        - abhilash
        - manisha
        - vidya
        - arya
        - karun
        - hitesh
        - aditya
        - ritu
        - priya
        - neha
        - rahul
        - pooja
        - rohan
        - simran
        - kavya
        - amit
        - dev
        - ishita
        - shreya
        - ratan
        - varun
        - manan
        - sumit
        - roopa
        - kabir
        - aayan
        - shubh
        - ashutosh
        - advait
        - anand
        - tanya
        - tarun
        - sunny
        - mani
        - gokul
        - vijay
        - shruti
        - suhani
        - mohit
        - kavitha
        - rehan
        - soham
        - rupali
      title: Sarvam_Model_API_TextToSpeechSpeaker
    Sarvam_Model_API_SpeechSampleRate:
      type: string
      enum:
        - '8000'
        - '16000'
        - '22050'
        - '24000'
        - '32000'
        - '44100'
        - '48000'
      title: Sarvam_Model_API_SpeechSampleRate
    Sarvam_Model_API_TextToSpeechModel:
      type: string
      enum:
        - bulbul:v2
        - bulbul:v3
      title: Sarvam_Model_API_TextToSpeechModel
    TextToSpeechOutputAudioCodec:
      type: string
      enum:
        - mp3
        - linear16
        - mulaw
        - alaw
        - opus
        - flac
        - aac
        - wav
      description: Audio codec options for the non-streaming /text-to-speech endpoint
      title: TextToSpeechOutputAudioCodec
    Sarvam_Model_API_TextToSpeechRequest:
      type: object
      properties:
        text:
          $ref: '#/components/schemas/SarvamModelApiTextToSpeechRequestText'
          description: >-
            The text(s) to be converted into speech.


            **Features:**

            - Supports code-mixed text (English and Indic languages)


            **Model-specific limits:**

            - **bulbul:v3:** Max 2500 characters

            - **bulbul:v2:** Max 1500 characters


            **Important Note:**

            - For numbers with more than 4 digits, use commas (e.g., '10,000'
            instead of '10000')

            - This ensures proper pronunciation as a whole number
        target_language_code:
          $ref: '#/components/schemas/Sarvam_Model_API_TextToSpeechLanguage'
          description: The language code in BCP-47 format.
        speaker:
          oneOf:
            - $ref: '#/components/schemas/Sarvam_Model_API_TextToSpeechSpeaker'
            - type: 'null'
          default: shubh
          description: >-
            The speaker voice to be used for the output audio.


            **Default:** shubh (for bulbul:v3), anushka (for bulbul:v2)


            **Model Compatibility (Speakers compatible with respective model):**

            - **bulbul:v3:**
              - shubh (default), aditya, ritu, priya, neha, rahul, pooja, rohan, simran, kavya, amit, dev, ishita, shreya, ratan, varun, manan, sumit, roopa, kabir, aayan, ashutosh, advait, anand, tanya, tarun, sunny, mani, gokul, vijay, shruti, suhani, mohit, kavitha, rehan, soham, rupali
            - **bulbul:v2:**
              - Female: anushka, manisha, vidya, arya
              - Male: abhilash, karun, hitesh

            **Note:** Speaker selection must match the chosen model version.


            **Important:** Speaker names are case-sensitive and must be
            lowercase (e.g., `ritu` not `Ritu`).
        pitch:
          type:
            - number
            - 'null'
          format: double
          description: >-
            Controls the pitch of the audio. Lower values result in a deeper
            voice, while higher values make it sharper. The suitable range is
            between -0.75 and 0.75. Default is 0.0.


            **Note:** This parameter is only supported for bulbul:v2. It is NOT
            supported for bulbul:v3.
        pace:
          type:
            - number
            - 'null'
          format: double
          default: 1
          description: >-
            Controls the speed of the audio. Lower values result in slower
            speech, while higher values make it faster. Default is 1.0.


            **Model-specific ranges:**

            - **bulbul:v3:** 0.5 to 2.0

            - **bulbul:v2:** 0.3 to 3.0
        loudness:
          type:
            - number
            - 'null'
          format: double
          description: >-
            Controls the loudness of the audio. Lower values result in quieter
            audio, while higher values make it louder. The suitable range is
            between 0.3 and 3.0. Default is 1.0.


            **Note:** This parameter is only supported for bulbul:v2. It is NOT
            supported for bulbul:v3.
        speech_sample_rate:
          oneOf:
            - $ref: '#/components/schemas/Sarvam_Model_API_SpeechSampleRate'
            - type: 'null'
          default: 22050
          description: >-
            Specifies the sample rate of the output audio. Supported values are
            8000, 16000, 22050, 24000, 32000, 44100, 48000 Hz.


            **Note:** Higher sample rates (32000, 44100, 48000 Hz) are only
            available with bulbul:v3 via the REST API, not in streaming mode.


            **Default:** 24000 Hz for bulbul:v3
        enable_preprocessing:
          type: boolean
          default: false
          description: >-
            Controls whether normalization of English words and numeric entities
            (e.g., numbers, dates) is performed. Set to true for better handling
            of mixed-language text.


            **Model-specific behavior:**

            - **bulbul:v3:** Not supported (preprocessing is always enabled)

            - **bulbul:v2:** Default is false
        model:
          $ref: '#/components/schemas/Sarvam_Model_API_TextToSpeechModel'
          description: >-
            Specifies the model to use for text-to-speech conversion.


            **Available models:**

            - **bulbul:v3:** Latest model with improved quality, 30+ voices,
            pace, and temperature control

            - **bulbul:v2:** Legacy model with pitch, loudness, and pace
            controls
        output_audio_codec:
          oneOf:
            - $ref: '#/components/schemas/TextToSpeechOutputAudioCodec'
            - type: 'null'
          default: wav
          description: >-
            Specifies the audio codec for the output audio file. Different
            codecs offer various compression and quality characteristics.
        temperature:
          type:
            - number
            - 'null'
          format: double
          default: 0.6
          description: >-
            Temperature controls how much randomness and expressiveness the TTS
            model uses while generating speech.


            Lower values produce more stable and consistent output, while higher
            values sound more expressive but may introduce artifacts or errors.
            The suitable range is between 0.01 and 2.0. Default is 0.6.


            **Note:** This parameter is only supported for bulbul:v3. It has no
            effect on bulbul:v2.
        dict_id:
          type:
            - string
            - 'null'
          description: >-
            The ID of a pronunciation dictionary to apply during synthesis. When
            provided, matching words in the input text will be replaced with
            their custom pronunciations before generating speech.


            Create and manage dictionaries via the [Pronunciation Dictionary
            API](https://docs.sarvam.ai/api-reference-docs/pronunciation-dictionary/create).
            Only supported by **bulbul:v3**.
        enable_cached_responses:
          type: boolean
          default: false
          description: >-
            Enable caching for the request. When enabled, identical requests
            will return cached audio instead of regenerating. Default is false.


            **Note:** Currently in beta and only available for bulbul:v1 and
            bulbul:v2 models.
      required:
        - text
        - target_language_code
      title: Sarvam_Model_API_TextToSpeechRequest
    Sarvam_Model_API_TextToSpeechResponse:
      type: object
      properties:
        request_id:
          type:
            - string
            - 'null'
        audios:
          type: array
          items:
            type: string
          description: 'The output audio files in WAV format, encoded as base64 strings. Each string corresponds to one of the input texts.'
      required:
        - request_id
        - audios
      title: Sarvam_Model_API_TextToSpeechResponse
    Sarvam_Model_API_ErrorCode:
      type: string
      enum:
        - invalid_request_error
        - internal_server_error
        - unprocessable_entity_error
        - insufficient_quota_error
        - invalid_api_key_error
        - authentication_error
        - not_found_error
        - rate_limit_exceeded_error
      title: Sarvam_Model_API_ErrorCode
    Sarvam_Model_API_ErrorDetails:
      type: object
      properties:
        request_id:
          type:
            - string
            - 'null'
        message:
          type: string
          description: Message describing the error
        code:
          $ref: '#/components/schemas/Sarvam_Model_API_ErrorCode'
          description: >-
            Error code for the specific error that has occurred. Refer to the
            error code documentation for more details.
      required:
        - request_id
        - message
        - code
      title: Sarvam_Model_API_ErrorDetails
    Sarvam_Model_API_ErrorMessage:
      type: object
      properties:
        error:
          $ref: '#/components/schemas/Sarvam_Model_API_ErrorDetails'
          description: Error details
      required:
        - error
      title: Sarvam_Model_API_ErrorMessage
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: api-subscription-key

```
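
The `text` description in the specification recommends writing long numbers with commas so they are pronounced as whole numbers. A possible pre-processing step is sketched below. The helper name is hypothetical, and the three-digit grouping is an assumption based on the single `10,000` example in the specification; the documentation does not say which grouping style is expected for longer numbers.

```python
import re

# Runs of five or more digits that do not start with 0 and are not preceded by
# a letter, digit, underscore, or decimal point (so fractional parts, zero-padded
# codes, and identifiers such as "abc12345" are left alone).
_LONG_NUMBER = re.compile(r"(?<![\w.])[1-9]\d{4,}")


def add_thousands_separators(text: str) -> str:
    """Insert commas into plain integers that have more than four digits."""
    return _LONG_NUMBER.sub(lambda match: f"{int(match.group()):,}", text)
```

For example, `add_thousands_separators("Pay 10000 rupees")` returns `"Pay 10,000 rupees"`, while four-digit values such as years are left unchanged.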

## SDK Code Examples

```typescript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const client = new SarvamAIClient({
  apiSubscriptionKey: process.env.SARVAM_API_KEY,
});

const response = await client.textToSpeech.convert({
  text: "Welcome to Sarvam AI!",
  target_language_code: "hi-IN",
  model: "bulbul:v3",
  speaker: "shubh",
});

const audio = Buffer.from(response.audios.join(""), "base64");
fs.writeFileSync("output.wav", audio);

```
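
The example above pairs `bulbul:v3` with the speaker `shubh`. Because the specification states that the speaker must match the model and that names are case-sensitive, a client-side check can catch mismatches early. This is a sketch only: the speaker sets are copied from the `speaker` description in the specification, and `resolve_speaker` is a hypothetical helper.

```python
# Speaker lists copied from the `speaker` field description in the specification.
SPEAKERS = {
    "bulbul:v2": frozenset(
        "anushka manisha vidya arya abhilash karun hitesh".split()
    ),
    "bulbul:v3": frozenset(
        "shubh aditya ritu priya neha rahul pooja rohan simran kavya amit dev "
        "ishita shreya ratan varun manan sumit roopa kabir aayan ashutosh "
        "advait anand tanya tarun sunny mani gokul vijay shruti suhani mohit "
        "kavitha rehan soham rupali".split()
    ),
}
DEFAULT_SPEAKER = {"bulbul:v3": "shubh", "bulbul:v2": "anushka"}


def resolve_speaker(model: str, speaker=None) -> str:
    """Return a speaker valid for `model`, falling back to the model's default."""
    if model not in SPEAKERS:
        raise ValueError(f"unknown model: {model}")
    if speaker is None:
        return DEFAULT_SPEAKER[model]
    # Names are case-sensitive, so "Ritu" is rejected rather than lowercased.
    if speaker not in SPEAKERS[model]:
        raise ValueError(f"speaker {speaker!r} is not available for {model}")
    return speaker
```

Rejecting a mixed-case name instead of silently lowercasing it is a design choice; it keeps the helper's behavior aligned with what the API is documented to accept.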

```typescript
import { SarvamAIClient } from "sarvamai";

async function main() {
    const client = new SarvamAIClient({
        apiSubscriptionKey: "YOUR_API_KEY_HERE",
    });
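    // The response contains base64-encoded audio strings in `audios`.
    const response = await client.textToSpeech.convert({
        text: "স্বাগতম Sarvam AI-তে! আজকের আবহাওয়া খুব সুন্দর।",
        target_language_code: "bn-IN",
    });
    console.log(`Received ${response.audios.length} audio clip(s)`);
}
// Surface request failures instead of leaving the promise rejection unhandled.
main().catch(console.error);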

```

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_API_KEY_HERE",
)

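# The response contains base64-encoded audio strings in `audios`.
response = client.text_to_speech.convert(
    text="স্বাগতম Sarvam AI-তে! আজকের আবহাওয়া খুব সুন্দর।",
    target_language_code="bn-IN",
)
print(f"Received {len(response.audios)} audio clip(s)")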

```
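
For callers who prefer not to use the SDK, the endpoint can also be reached with the Python standard library. The sketch below assumes error bodies follow the `Sarvam_Model_API_ErrorMessage` schema from the specification; `describe_error` and `text_to_speech` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.sarvam.ai/text-to-speech"


def describe_error(body: bytes) -> str:
    """Summarize an error body shaped like {"error": {"code", "message", "request_id"}}."""
    try:
        err = json.loads(body)["error"]
        return f"{err['code']}: {err['message']} (request_id={err.get('request_id')})"
    except (ValueError, KeyError, TypeError):
        # Not the documented JSON shape; fall back to the raw text.
        return body.decode("utf-8", errors="replace")


def text_to_speech(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST `payload` to the endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as exc:
        raise RuntimeError(f"HTTP {exc.code}: {describe_error(exc.read())}") from exc
```

A call such as `text_to_speech(key, {"text": "Hello", "target_language_code": "en-IN"})` returns a dictionary with `request_id` and `audios` on success, and raises `RuntimeError` carrying the error code and message otherwise.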

```go
package main

import (
	"fmt"
	"strings"
	"net/http"
	"io"
)

func main() {

	url := "https://api.sarvam.ai/text-to-speech"

	payload := strings.NewReader("{\n  \"text\": \"স্বাগতম Sarvam AI-তে! আজকের আবহাওয়া খুব সুন্দর।\",\n  \"target_language_code\": \"bn-IN\"\n}")

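	req, err := http.NewRequest("POST", url, payload)
	if err != nil {
		panic(err)
	}

	req.Header.Add("api-subscription-key", "<apiSubscriptionKey>")
	req.Header.Add("Content-Type", "application/json")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		// res is nil when the request fails, so stop before touching res.Body.
		panic(err)
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		panic(err)
	}

	fmt.Println(res)
	fmt.Println(string(body))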

}
```

```ruby
require 'uri'
require 'net/http'

url = URI("https://api.sarvam.ai/text-to-speech")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Post.new(url)
request["api-subscription-key"] = '<apiSubscriptionKey>'
request["Content-Type"] = 'application/json'
request.body = "{\n  \"text\": \"স্বাগতম Sarvam AI-তে! আজকের আবহাওয়া খুব সুন্দর।\",\n  \"target_language_code\": \"bn-IN\"\n}"

response = http.request(request)
puts response.read_body
```

```java
import com.mashape.unirest.http.HttpResponse;
import com.mashape.unirest.http.Unirest;

HttpResponse<String> response = Unirest.post("https://api.sarvam.ai/text-to-speech")
  .header("api-subscription-key", "<apiSubscriptionKey>")
  .header("Content-Type", "application/json")
  .body("{\n  \"text\": \"স্বাগতম Sarvam AI-তে! আজকের আবহাওয়া খুব সুন্দর।\",\n  \"target_language_code\": \"bn-IN\"\n}")
  .asString();
```

```php
<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

$response = $client->request('POST', 'https://api.sarvam.ai/text-to-speech', [
  'body' => '{
  "text": "স্বাগতম Sarvam AI-তে! আজকের আবহাওয়া খুব সুন্দর।",
  "target_language_code": "bn-IN"
}',
  'headers' => [
    'Content-Type' => 'application/json',
    'api-subscription-key' => '<apiSubscriptionKey>',
  ],
]);

echo $response->getBody();
```

```csharp
using RestSharp;

var client = new RestClient("https://api.sarvam.ai/text-to-speech");
var request = new RestRequest(Method.POST);
request.AddHeader("api-subscription-key", "<apiSubscriptionKey>");
request.AddHeader("Content-Type", "application/json");
request.AddParameter("application/json", "{\n  \"text\": \"স্বাগতম Sarvam AI-তে! আজকের আবহাওয়া খুব সুন্দর।\",\n  \"target_language_code\": \"bn-IN\"\n}", ParameterType.RequestBody);
IRestResponse response = client.Execute(request);
```

```swift
import Foundation

let headers = [
  "api-subscription-key": "<apiSubscriptionKey>",
  "Content-Type": "application/json"
]
let parameters = [
  "text": "স্বাগতম Sarvam AI-তে! আজকের আবহাওয়া খুব সুন্দর।",
  "target_language_code": "bn-IN"
] as [String : Any]

// JSONSerialization.data(withJSONObject:options:) can throw, so the call needs `try`.
let postData = try JSONSerialization.data(withJSONObject: parameters, options: [])

let request = NSMutableURLRequest(url: NSURL(string: "https://api.sarvam.ai/text-to-speech")! as URL,
                                        cachePolicy: .useProtocolCachePolicy,
                                    timeoutInterval: 10.0)
request.httpMethod = "POST"
request.allHTTPHeaderFields = headers
request.httpBody = postData as Data

let session = URLSession.shared
let dataTask = session.dataTask(with: request as URLRequest, completionHandler: { (data, response, error) -> Void in
  if let error = error {
    print(error)
  } else if let httpResponse = response as? HTTPURLResponse {
    // Unwrap the optional so the output is not wrapped in "Optional(...)".
    print(httpResponse)
  }
})

dataTask.resume()
```