# Pronunciation Dictionary

> Teach Bulbul v3 how to say specific words — brand names, abbreviations, regional terms — exactly the way you want, across all 11 supported languages.

Bulbul v3 handles most text well out of the box — code-mixed Hinglish, numbers, common abbreviations. But some words need explicit guidance: your company name, niche acronyms, or terms borrowed from another language. That's what pronunciation dictionaries solve.

You upload a JSON file with `"word" → "how to say it"` mappings, get back a dictionary ID (returned as `dictionary_id`), and pass it as `dict_id` in any TTS call. The engine swaps matching words before synthesis, so there is no model retraining and no prompt engineering.

***

## When Do You Need This?

TTS models do a great job with everyday language. But they can stumble on words they haven't seen before — abbreviations specific to your industry, brand names with unusual spellings, or acronyms that should be spelled out rather than read as a word.

For example, if your app says:

```
NAIC policy number check karein aur B2B portal pe login karein
```

The model might try to read "NAIC" as a single word (like "naik") instead of spelling it out, or pronounce "B2B" literally. A pronunciation dictionary tells the model exactly what to do:

| Input text | Without dictionary          | With dictionary |
| ---------- | --------------------------- | --------------- |
| `NAIC`     | might say "naik" or "na-ic" | says "N A I C"  |
| `B2B`      | might say "b-दो-b"          | says "B to B"   |

Everything else in the sentence stays the same — the dictionary only touches exact matches.
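As a mental model, the replacement step can be pictured as a whole-word find-and-replace that runs before synthesis. The sketch below is only an illustration: `apply_dictionary` is a hypothetical local helper, and the case-sensitive, whole-word matching it uses is an assumption, not documented server behaviour.

```python
import re

def apply_dictionary(text, entries):
    """Replace whole-word, case-sensitive matches of each key in `entries`.

    Hypothetical local helper: it illustrates swapping exact matches
    before synthesis, not the server's real matching rules.
    """
    if not entries:
        return text
    # Longest keys first so a longer entry wins over a shorter one.
    keys = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    # A single pass over the text, so replacements are never re-scanned.
    return pattern.sub(lambda m: entries[m.group(0)], text)

hindi_entries = {"NAIC": "N A I C", "B2B": "B to B"}
sentence = "NAIC policy number check karein aur B2B portal pe login karein"
print(apply_dictionary(sentence, hindi_entries))
```

Running a helper like this over your own scripts can be a quick way to preview which words a dictionary would touch before you upload it.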

***

## Dictionary Format

A single JSON file. The top-level key `pronunciations` maps language codes to word → replacement pairs:

```json
{
  "pronunciations": {
    "hi-IN": {
      "B2B": "B to B",
      "NAIC": "N A I C",
      "Sarvam": "सारवम"
    },
    "en-IN": {
      "Sarvam": "Saar-vum",
      "HDFC": "H D F C"
    },
    "ta-IN": {
      "EMI": "இ எம் ஐ"
    }
  }
}
```

Save this as a `.json` file. That's it — no XML, no special phoneme notation, no markup. Just plain text replacements.

Matching is **language-aware**. When `target_language_code` is `hi-IN`, only the `hi-IN` block applies. This means the same word (like "Sarvam") can have different spoken forms in Hindi vs English.

***

## Getting Started

**1. Create the dictionary JSON file**

You can create the file manually, or use this helper function:

```python
import json

def create_dictionary_file(pronunciations, filename="dict.json"):
    dictionary = {"pronunciations": pronunciations}
    # Write UTF-8 explicitly so non-ASCII replacements survive on every platform.
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(dictionary, f, ensure_ascii=False, indent=2)
    return filename

create_dictionary_file({
    "hi-IN": {
        "B2B": "B to B",
        "NAIC": "N A I C",
        "CIBIL": "सिबिल"
    },
    "en-IN": {
        "Sarvam": "Saar-vum"
    }
})
```

```javascript
import fs from "fs";

function createDictionaryFile(pronunciations, filename = "dict.json") {
  const dictionary = { pronunciations };
  fs.writeFileSync(filename, JSON.stringify(dictionary, null, 2));
  return filename;
}

createDictionaryFile({
  "hi-IN": {
    "B2B": "B to B",
    "NAIC": "N A I C",
    "CIBIL": "सिबिल"
  },
  "en-IN": {
    "Sarvam": "Saar-vum"
  }
});
```

**2. Upload it**

```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

with open("dict.json", "rb") as f:
    result = client.pronunciation_dictionary.create(file=f)

print(result.dictionary_id)  # e.g. "p_5cb7faa6"
```

```javascript
import fs from "fs";

const buf = fs.readFileSync("dict.json");
const fd = new FormData();
fd.append(
  "file",
  new Blob([buf], { type: "application/json" }),
  "dict.json"
);

const res = await fetch(
  "https://api.sarvam.ai/text-to-speech/pronunciation-dictionary",
  {
    method: "POST",
    headers: { "api-subscription-key": process.env.SARVAM_API_KEY },
    body: fd,
  }
);

const result = await res.json();
console.log(result.dictionary_id); // e.g. "p_5cb7faa6"
```

The Node SDK currently does not set `Content-Type: application/json` on the multipart file part, so `client.pronunciationDictionary.create({ file })` is rejected with `Invalid content type 'application/octet-stream'`. The raw `fetch` + `FormData` workaround above sets the correct part content-type explicitly.

```bash
curl -X POST https://api.sarvam.ai/text-to-speech/pronunciation-dictionary \
  -H "api-subscription-key: YOUR_SARVAM_API_KEY" \
  -F "file=@dict.json;type=application/json"

# → {"dictionary_id": "p_5cb7faa6"}
```

The `;type=application/json` suffix is required. Without it, `curl` defaults to `application/octet-stream` and the API responds with `400: Invalid content type 'application/octet-stream'. Only application/json is accepted.`
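If you upload from Python without the SDK, the same per-part content type has to be set by hand. The sketch below builds the multipart request with only the standard library, mirroring the `curl` command. `build_upload_request` is a hypothetical helper, and the request is constructed but not sent, so treat it as a starting point, not a tested client.

```python
import json
import urllib.request
import uuid

UPLOAD_URL = "https://api.sarvam.ai/text-to-speech/pronunciation-dictionary"

def build_upload_request(file_bytes, filename, api_key):
    """Build (but do not send) a multipart upload with an explicit part type."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        # Per-part type: the equivalent of curl's ";type=application/json".
        "Content-Type: application/json\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        UPLOAD_URL,
        data=head + file_bytes + tail,
        method="POST",
        headers={
            "api-subscription-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )

payload = json.dumps(
    {"pronunciations": {"hi-IN": {"B2B": "B to B"}}}, ensure_ascii=False
).encode("utf-8")
request = build_upload_request(payload, "dict.json", "YOUR_SARVAM_API_KEY")
# To actually send it: urllib.request.urlopen(request)
```

Note that the request carries two content types: `multipart/form-data` on the request itself and `application/json` on the file part. The error quoted above concerns the second.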

**3. Pass `dict_id` in your TTS call**

```python
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

audio = client.text_to_speech.convert(
    text="NAIC policy check karein aur B2B portal pe login karein",
    target_language_code="hi-IN",
    speaker="shubh",
    model="bulbul:v3",
    dict_id="p_5cb7faa6",
)

save(audio, "output.wav")
```

```javascript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const client = new SarvamAIClient({
  apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

const response = await client.textToSpeech.convert({
  text: "NAIC policy check karein aur B2B portal pe login karein",
  target_language_code: "hi-IN",
  speaker: "shubh",
  model: "bulbul:v3",
  dict_id: "p_5cb7faa6",
});

const audio = Buffer.from(response.audios[0], "base64");
fs.writeFileSync("output.wav", audio);
```

```bash
curl -X POST https://api.sarvam.ai/text-to-speech \
  -H "api-subscription-key: YOUR_SARVAM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "NAIC policy check karein aur B2B portal pe login karein",
    "target_language_code": "hi-IN",
    "speaker": "shubh",
    "model": "bulbul:v3",
    "dict_id": "p_5cb7faa6"
  }'
```

That's the core flow. The same `dict_id` works across REST, HTTP Stream, and WebSocket — just pass it as a parameter.

***

## Per-Language Matching

This is the key design choice: pronunciations are scoped to language codes. A single dictionary can hold mappings for multiple languages, and only the entries matching your `target_language_code` are applied at synthesis time.

```json
{
  "pronunciations": {
    "hi-IN": { "EMI": "ई एम आई",  "SIP": "सिप" },
    "en-IN": { "EMI": "E M I",     "SIP": "S I P" },
    "ta-IN": { "EMI": "இ எம் ஐ" },
    "te-IN": { "EMI": "ఇ ఎం ఐ" }
  }
}
```

When you call TTS with `target_language_code="ta-IN"`, only the Tamil entries are used. The Hindi and English entries are ignored for that request.
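The scoping rule can be pictured as a simple lookup by language code. This is a local sketch only: `entries_for` is a hypothetical helper, and returning an empty mapping for a language with no block reflects the assumption that such requests pass through with no replacements.

```python
dictionary = {
    "pronunciations": {
        "hi-IN": {"EMI": "ई एम आई", "SIP": "सिप"},
        "en-IN": {"EMI": "E M I", "SIP": "S I P"},
        "ta-IN": {"EMI": "இ எம் ஐ"},
    }
}

def entries_for(dictionary, target_language_code):
    """Return only the mappings scoped to one language (empty if none)."""
    return dictionary["pronunciations"].get(target_language_code, {})

print(entries_for(dictionary, "ta-IN"))  # only the Tamil entry for EMI
print(entries_for(dictionary, "bn-IN"))  # no Bengali block in this dictionary
```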

### Supported Language Codes

`hi-IN` `bn-IN` `ta-IN` `te-IN` `kn-IN` `ml-IN` `mr-IN` `gu-IN` `pa-IN` `od-IN` `en-IN`
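A small pre-upload check can catch a mistyped language code or a malformed block before the API does. The sketch below is an assumption-laden example: `find_format_problems` is a hypothetical helper, the non-empty replacement rule is a guess at sensible behaviour, and the API may enforce different or additional rules.

```python
SUPPORTED_LANGUAGE_CODES = {
    "hi-IN", "bn-IN", "ta-IN", "te-IN", "kn-IN", "ml-IN",
    "mr-IN", "gu-IN", "pa-IN", "od-IN", "en-IN",
}

def find_format_problems(dictionary):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    blocks = dictionary.get("pronunciations")
    if not isinstance(blocks, dict):
        return ['missing top-level "pronunciations" object']
    for code, entries in blocks.items():
        if code not in SUPPORTED_LANGUAGE_CODES:
            problems.append(f"unsupported language code: {code}")
        if not isinstance(entries, dict):
            problems.append(f"{code}: expected an object of word -> replacement")
            continue
        for word, spoken in entries.items():
            # Assumption: an empty replacement is treated as a mistake.
            if not isinstance(spoken, str) or not spoken.strip():
                problems.append(f"{code}/{word}: replacement must be non-empty text")
    return problems

print(find_format_problems({"pronunciations": {"hi-IN": {"EMI": "ई एम आई"}}}))
print(find_format_problems({"pronunciations": {"hi": {"EMI": ""}}}))
```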

***

## Managing Dictionaries

**List all dictionaries**

```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

all_dicts = client.pronunciation_dictionary.list()
print(all_dicts.dictionary_count)   # number of dictionaries
print(all_dicts.dictionaries)       # list of dictionary IDs
```

```javascript
import { SarvamAIClient } from "sarvamai";

const client = new SarvamAIClient({
  apiSubscriptionKey: "YOUR_SARVAM_API_KEY",
});

const allDicts = await client.pronunciationDictionary.list();
console.log(allDicts.dictionary_count);
console.log(allDicts.dictionaries);
```

```bash
curl https://api.sarvam.ai/text-to-speech/pronunciation-dictionary \
  -H "api-subscription-key: YOUR_SARVAM_API_KEY"
```

**Retrieve a dictionary**

```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

data = client.pronunciation_dictionary.get(dict_id="p_5cb7faa6")
print(data.pronunciations)
# {"hi-IN": {"B2B": "B to B", ...}, "en-IN": {...}}
```

```javascript
// Reuses the `client` created in the list example above.
const data = await client.pronunciationDictionary.get("p_5cb7faa6");
console.log(data.pronunciations);
```

```bash
curl https://api.sarvam.ai/text-to-speech/pronunciation-dictionary/p_5cb7faa6 \
  -H "api-subscription-key: YOUR_SARVAM_API_KEY"
```

**Update a dictionary**

Upload a JSON file to update the dictionary. You can add new words, change existing pronunciations, or do both; existing entries that aren't in the uploaded file remain unchanged. The `dict_id` stays the same, so your TTS integrations keep working.

```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

with open("updated_dict.json", "rb") as f:
    result = client.pronunciation_dictionary.update(
        dict_id="p_5cb7faa6", file=f
    )
print(result.updated_pronunciations)
```

```javascript
import fs from "fs";

const buf = fs.readFileSync("updated_dict.json");
const fd = new FormData();
fd.append(
  "file",
  new Blob([buf], { type: "application/json" }),
  "updated_dict.json"
);

const res = await fetch(
  "https://api.sarvam.ai/text-to-speech/pronunciation-dictionary?dict_id=p_5cb7faa6",
  {
    method: "PUT",
    headers: { "api-subscription-key": process.env.SARVAM_API_KEY },
    body: fd,
  }
);

const result = await res.json();
console.log(result.updated_pronunciations);
```

```bash
curl -X PUT "https://api.sarvam.ai/text-to-speech/pronunciation-dictionary?dict_id=p_5cb7faa6" \
  -H "api-subscription-key: YOUR_SARVAM_API_KEY" \
  -F "file=@updated_dict.json;type=application/json"
```
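Because an update merges into the stored dictionary instead of replacing it, it can help to preview the result locally before uploading. The sketch below models that merge under the assumption that it works word by word within each language block. `preview_update` is a hypothetical helper and does not call the API.

```python
import copy

def preview_update(existing, uploaded):
    """Model the merge: uploaded entries add to or overwrite existing ones,
    and entries absent from the upload are kept."""
    merged = copy.deepcopy(existing)
    for code, entries in uploaded["pronunciations"].items():
        merged["pronunciations"].setdefault(code, {}).update(entries)
    return merged

existing = {"pronunciations": {"hi-IN": {"B2B": "B to B", "CIBIL": "सिबिल"}}}
uploaded = {
    "pronunciations": {
        "hi-IN": {"B2B": "बी टू बी"},     # changes an existing entry
        "en-IN": {"Sarvam": "Saar-vum"},  # adds a new language block
    }
}
print(preview_update(existing, uploaded))
```

In this model, `CIBIL` survives the update even though the uploaded file does not mention it, which matches the behaviour described for the update endpoint.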

**Delete a dictionary**

```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

result = client.pronunciation_dictionary.delete(dict_id="p_5cb7faa6")
print(result.message)
```

```javascript
// Reuses the `client` created in the list example above.
const result = await client.pronunciationDictionary.delete({
  dict_id: "p_5cb7faa6"
});
console.log(result);
```

```bash
curl -X DELETE "https://api.sarvam.ai/text-to-speech/pronunciation-dictionary?dict_id=p_5cb7faa6" \
  -H "api-subscription-key: YOUR_SARVAM_API_KEY"
```

***

## Limits

|                          | Limit            |
| ------------------------ | ---------------- |
| Dictionaries per user    | 10               |
| Words per dictionary     | 100              |
| File size                | 1 MB             |
| Model support            | `bulbul:v3` only |
| Dictionaries per request | 1                |
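The word and file-size limits can be checked locally before an upload. This is a rough sketch with two stated assumptions: the 100-word limit is taken to count every entry across all language blocks, and 1 MB is read conservatively as 1,000,000 bytes. `check_limits` is a hypothetical helper.

```python
import json

MAX_WORDS = 100             # "Words per dictionary" in the table above
MAX_FILE_BYTES = 1_000_000  # "File size": 1 MB, read as decimal megabytes

def check_limits(dictionary):
    """Return (word_count, size_in_bytes, ok) for a dictionary object."""
    # Assumption: the word limit counts entries across all languages.
    word_count = sum(len(e) for e in dictionary["pronunciations"].values())
    serialized = json.dumps(dictionary, ensure_ascii=False, indent=2)
    size = len(serialized.encode("utf-8"))
    return word_count, size, word_count <= MAX_WORDS and size <= MAX_FILE_BYTES

sample = {
    "pronunciations": {
        "hi-IN": {"NEFT": "एन ई एफ टी", "KYC": "के वाई सी"},
        "en-IN": {"HDFC": "H D F C"},
    }
}
print(check_limits(sample))
```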

***

## Common Patterns

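Here are some real-world patterns that work well with Sarvam pronunciation dictionaries. Each snippet shows only the language blocks; nest them under the top-level `pronunciations` key before uploading.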

**Financial services (IVR / voice bots)**

```json
{
  "hi-IN": {
    "NEFT": "एन ई एफ टी",
    "RTGS": "आर टी जी एस",
    "KYC": "के वाई सी",
    "EMI": "ई एम आई",
    "CIBIL": "सिबिल"
  }
}
```

**Healthcare**

```json
{
  "hi-IN": {
    "OPD": "ओ पी डी",
    "ICU": "आई सी यू",
    "MRI": "एम आर आई",
    "BP": "बी पी"
  }
}
```

**Brand and product names**

```json
{
  "hi-IN": {
    "Sarvam": "सारवम",
    "PhonePe": "फ़ोन पे",
    "Zerodha": "ज़ीरोधा"
  },
  "en-IN": {
    "Sarvam": "Saar-vum"
  }
}
```

***

## Tips

* **Only add words that actually mispronounce.** Bulbul v3 already handles common English words, numbers, and Hinglish well. Test without a dictionary first.
* **One dictionary per request.** If you need entries from multiple dictionaries, merge them into one (you have 100 words to work with).
* **Update preserves the ID.** When pronunciations change, use the update endpoint rather than delete + recreate. Your existing TTS integrations keep working.
* **Test with your production voice.** Different speakers may handle certain words differently — always verify with the voice you'll deploy.
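Since only one dictionary applies per request, separate word lists have to be merged before upload. The sketch below shows one way to do that locally, refusing to continue when two sources disagree about the same word. `merge_dictionaries` is a hypothetical helper, and counting the 100-word limit across all languages is an assumption.

```python
def merge_dictionaries(*dictionaries, max_words=100):
    """Combine several dictionary objects into one, refusing conflicts."""
    merged = {}
    for dictionary in dictionaries:
        for code, entries in dictionary["pronunciations"].items():
            block = merged.setdefault(code, {})
            for word, spoken in entries.items():
                # Same word, same language, different replacement: stop.
                if block.get(word, spoken) != spoken:
                    raise ValueError(f"conflicting entries for {code}/{word}")
                block[word] = spoken
    total = sum(len(block) for block in merged.values())
    if total > max_words:
        raise ValueError(f"{total} entries exceed the {max_words}-word limit")
    return {"pronunciations": merged}

finance = {"pronunciations": {"hi-IN": {"KYC": "के वाई सी", "EMI": "ई एम आई"}}}
brands = {
    "pronunciations": {
        "hi-IN": {"Sarvam": "सारवम"},
        "en-IN": {"Sarvam": "Saar-vum"},
    }
}
combined = merge_dictionaries(finance, brands)
print(sorted(combined["pronunciations"]))
```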

***

## Using Pronunciation Dictionary with All TTS APIs

The `dict_id` parameter works across REST, HTTP Stream, and WebSocket. Here are complete examples for each:

**REST**

```python
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

audio = client.text_to_speech.convert(
    text="NEFT transfer karein aur KYC complete karein",
    target_language_code="hi-IN",
    speaker="shubh",
    model="bulbul:v3",
    dict_id="p_5cb7faa6",
)

save(audio, "output.wav")
```

```javascript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const client = new SarvamAIClient({
  apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

const response = await client.textToSpeech.convert({
  text: "NEFT transfer karein aur KYC complete karein",
  target_language_code: "hi-IN",
  speaker: "shubh",
  model: "bulbul:v3",
  dict_id: "p_5cb7faa6",
});

const audio = Buffer.from(response.audios[0], "base64");
fs.writeFileSync("output.wav", audio);
```

**HTTP Stream**

```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

with open("output.mp3", "wb") as f:
    for chunk in client.text_to_speech.convert_stream(
        text="NEFT transfer karein aur KYC complete karein",
        target_language_code="hi-IN",
        speaker="shubh",
        model="bulbul:v3",
        dict_id="p_5cb7faa6",
        output_audio_codec="mp3",
    ):
        f.write(chunk)
```

```javascript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const client = new SarvamAIClient({
  apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

const response = await client.textToSpeech.convertStream({
  text: "NEFT transfer karein aur KYC complete karein",
  target_language_code: "hi-IN",
  speaker: "shubh",
  model: "bulbul:v3",
  dict_id: "p_5cb7faa6",
  output_audio_codec: "mp3",
});

const audio = Buffer.from(await response.arrayBuffer());
fs.writeFileSync("output.mp3", audio);
```

**WebSocket**

```python
import asyncio
import base64
from sarvamai import AsyncSarvamAI, AudioOutput, EventResponse

async def tts_with_dict():
    client = AsyncSarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

    async with client.text_to_speech_streaming.connect(
        model="bulbul:v3", send_completion_event=True
    ) as ws:
        await ws.configure(
            target_language_code="hi-IN",
            speaker="shubh",
            output_audio_codec="mp3",
            dict_id="p_5cb7faa6",
        )

        await ws.convert("NEFT transfer karein aur KYC complete karein")
        await ws.flush()

        with open("output.mp3", "wb") as f:
            async for message in ws:
                if isinstance(message, AudioOutput):
                    f.write(base64.b64decode(message.data.audio))
                elif isinstance(message, EventResponse):
                    if message.data.event_type == "final":
                        break

asyncio.run(tts_with_dict())
```

```javascript
import { SarvamAIClient } from "sarvamai";
import fs from "fs";

const client = new SarvamAIClient({
  apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

const socket = await client.textToSpeechStreaming.connect({
  model: "bulbul:v3",
  send_completion_event: "true",
});

const outputStream = fs.createWriteStream("output.mp3");

socket.on("open", () => {
  socket.configureConnection({
    type: "config",
    data: {
      speaker: "shubh",
      target_language_code: "hi-IN",
      output_audio_codec: "mp3",
      dict_id: "p_5cb7faa6",
    },
  });

  socket.convert("NEFT transfer karein aur KYC complete karein");
});

socket.on("message", (message) => {
  if (message.type === "audio") {
    outputStream.write(Buffer.from(message.data.audio, "base64"));
  } else if (message.type === "event" && message.data.event_type === "final") {
    outputStream.end();
    socket.close();
  }
});

await socket.waitForOpen();
```

Full endpoint specs, request/response schemas, and error codes are in the [API Reference](/api-reference-docs/pronunciation-dictionary/create).

Need help? Reach out on [Discord](https://discord.com/invite/5rAsykttcs).