Best Practices for Writing Text for TTS

A guide to writing text that produces natural-sounding speech output.

1. Punctuation for Pauses

Punctuation	Effect	Example
`,` (comma)	Short pause	“हाँ, मैं समझ गया”
`.` (full stop)	Medium pause, sentence end	“यह Very good है।”
`!` (exclamation)	Emphasis + pause	“नमस्ते!”
`…` (ellipsis)	Hesitation / trailing off	“मुझे लगता है… शायद हम try कर सकते हैं”
Line break	Natural pause between paragraphs	See below

Tip: Use … (ellipsis) to create a hesitation or trailing-off effect; it signals that the speaker is thinking or pausing mid-thought. Use it sparingly for natural results.

Tip: Use line breaks between paragraphs for natural breathing pauses:

हमारी technology सबको समझती है।
हमारा mission है कि हर Indian अपनी mother tongue में technology use कर सके।

2. Fillers & Hesitations for Natural Speech

Add fillers and hesitation markers to make speech sound conversational:

Filler	Effect	Example
`um`	Thinking pause	“चाहे आप um Hindi बोलते हों”
`uh`	Short hesitation	“uh, मुझे एक second दो”
`hmm`	Contemplation	“hmm, यह interesting है”
`like...`	Casual filler	“या like… कोई भी Indian language”
`basically...`	Starting explanation	“So basically… हम India की हर language को voice देते हैं”
`actually...`	Adding emphasis	“हमारी technology actually… सबको समझती है”
`you know...`	Conversational connector	“you know… यह बहुत simple है”
`I mean...`	Self-correction	“I mean… दूसरा option भी है”

Combining fillers with ellipsis for natural hesitation:

So basically… हमारा goal है कि um हर Indian language को support करें।
I mean... यह easy नहीं है... but we're getting there.

3. Code-Mixing (Hinglish)

For natural Indian speech, mix in English words where they’re commonly used. This is how most urban Indians speak, and the model handles it well.

Rule: Write English words in English script, Hindi words in Devanagari:

✅ “Sarvam AI में आपका स्वागत है”
❌ “सरवम एआई में आपका स्वागत है”

Common code-mixed categories:

Category	Examples
Tech terms	technology, app, website, download, update, AI
Everyday words	basically, actually, like, amazing, simple
Social expressions	thank you, sorry, please, welcome
Business	meeting, deadline, budget, report, feedback

Full code-mixed examples:

So basically... हम India की हर language को voice देते हैं।
चाहे आप um Hindi बोलते हों, Tamil, Telugu, Bengali या like... कोई भी Indian language।
अगर आपको कोई doubt है तो please हमें contact करें।
Meeting actually postpone हो गई है, I mean... tomorrow रखते हैं।

Keep Hindi sentence structure, swap key nouns/verbs with English:

“हर Indian अपनी mother tongue में technology use कर सके”
“आज का weather actually बहुत pleasant है”
“यह app basically आपकी daily life को simple बना देगा”

4. Avoid These

Avoid	Why	Fix
Overusing `...`	Too many ellipses sound choppy	Use `…` sparingly for hesitation; prefer `,` or line breaks for regular pauses
Complex Sanskrit words	The model may mispronounce them	Use simpler Hindi
Very long sentences	Unnatural breathing	Break into shorter sentences
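
The "very long sentences" row can be checked mechanically before text is sent for synthesis. The following is a minimal sketch: the helper names and the 20-word threshold are assumptions made for illustration, since this guide does not give a numeric limit. It splits on `।`, `.`, `!` and `?`, but does not split after a three-dot ellipsis.

```python
import re

# Split after "।", "!" or "?", or after a single "." that is not part of "...".
SENTENCE_END = re.compile(r"(?:(?<=[।!?])|(?<=[^.]\.))\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences on Hindi and English end marks."""
    return [part.strip() for part in SENTENCE_END.split(text) if part.strip()]


def long_sentences(text: str, max_words: int = 20) -> list[str]:
    """Return sentences with more than max_words whitespace-separated words.

    The default of 20 words is an arbitrary, hypothetical threshold.
    """
    return [s for s in split_sentences(text) if len(s.split()) > max_words]


if __name__ == "__main__":
    sample = "हमारी technology सबको समझती है। I mean... यह easy नहीं है. ठीक है!"
    for sentence in long_sentences(sample, max_words=5):
        print("Consider splitting:", sentence)
```

Any sentence this reports is a candidate for breaking into two or more shorter sentences.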

5. Language-Specific Tips

Sentence-ending punctuation

If a sentence ends in Hindi or a regional language, use ।: "हमारी technology सबको समझती है।"
If a sentence ends in English, use . : "Plan simple है, just execute."
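
The punctuation rule above can be applied automatically by looking at the script of the last letter in the sentence. This is a rough sketch with hypothetical helper names. It assumes that a final ASCII letter means the sentence ends in English and that any other final letter means Hindi or a regional language. Sentences ending in an ellipsis, `!` or `?` are left untouched.

```python
def sentence_terminator(sentence: str) -> str:
    """Pick "." if the last letter is ASCII (English), otherwise "।"."""
    for ch in reversed(sentence):
        if ch.isalpha():  # skips spaces, digits, punctuation and vowel signs
            return "." if ch.isascii() else "।"
    return "."  # no letters at all: fall back to a full stop


def fix_terminator(sentence: str) -> str:
    """Replace a trailing "." or "।" with the mark that matches the last word."""
    body = sentence.rstrip()
    if body.endswith(("...", "…", "!", "?")):
        return body  # leave ellipses and other end marks alone
    if body.endswith((".", "।")):
        body = body[:-1].rstrip()
    return body + sentence_terminator(body)


if __name__ == "__main__":
    print(fix_terminator("हमारी technology सबको समझती है."))
    print(fix_terminator("Plan simple है, just execute।"))
```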

Writing Conventions

Write language names in English: Tamil, Telugu, Bengali (not तमिल, तेलुगु)
Keep brand names in English: Sarvam AI, Google, WhatsApp

6. Target Language Code

The target_language_code parameter is required for every TTS request. It mainly controls language-specific processing of numbers, abbreviations, and special characters.

Supported Languages

Language	Code
English	`en-IN`
Hindi	`hi-IN`
Bengali	`bn-IN`
Tamil	`ta-IN`
Telugu	`te-IN`
Kannada	`kn-IN`
Malayalam	`ml-IN`
Marathi	`mr-IN`
Gujarati	`gu-IN`
Punjabi	`pa-IN`
Odia	`od-IN`

Example

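from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

audio = client.text_to_speech.convert(
    text="नमस्ते! Sarvam AI में आपका स्वागत है।",
    model="bulbul:v3",
    target_language_code="hi-IN",  # language used when reading numbers, abbreviations, symbols
    speaker="shubh"
)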

If your text mixes languages (e.g., Hinglish), set target_language_code to the language in which entities such as numbers should be spoken.
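
Because the parameter is required, it can help to validate the code on the client side before making a request. The sketch below copies the table above into a dictionary. The names `SUPPORTED_LANGUAGE_CODES` and `language_code` are hypothetical helpers, not part of the Sarvam SDK.

```python
# Language name -> code, copied from the "Supported Languages" table.
SUPPORTED_LANGUAGE_CODES = {
    "English": "en-IN",
    "Hindi": "hi-IN",
    "Bengali": "bn-IN",
    "Tamil": "ta-IN",
    "Telugu": "te-IN",
    "Kannada": "kn-IN",
    "Malayalam": "ml-IN",
    "Marathi": "mr-IN",
    "Gujarati": "gu-IN",
    "Punjabi": "pa-IN",
    "Odia": "od-IN",
}


def language_code(name: str) -> str:
    """Return the target_language_code for a language name (case-insensitive)."""
    key = name.strip().title()
    if key not in SUPPORTED_LANGUAGE_CODES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGE_CODES))
        raise ValueError(f"Unsupported language {name!r}; supported: {supported}")
    return SUPPORTED_LANGUAGE_CODES[key]


if __name__ == "__main__":
    print(language_code("hindi"))
```

For Hinglish text where numbers should be read in Hindi, `language_code("Hindi")` gives the value to pass as target_language_code.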

7. Understanding the Audio Output (Base64)

The TTS API returns audio data as a base64-encoded string. You must decode this string before saving or playing the audio file.

REST API Response

The REST API returns a response with an audios field — an array of base64-encoded audio strings. You need to decode them:

Python

import base64
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

audio = client.text_to_speech.convert(
    text="नमस्ते! Sarvam AI में आपका स्वागत है।",
    model="bulbul:v3",
    target_language_code="hi-IN",
    speaker="shubh"
)

# The response contains base64-encoded audio in the 'audios' field.
# Decode each chunk separately, then concatenate the bytes: decoding one
# joined string can stop early at the first chunk's '=' padding.
audio_bytes = b"".join(base64.b64decode(chunk) for chunk in audio.audios)

with open("output.wav", "wb") as f:
    f.write(audio_bytes)

Streaming API Response

For the streaming (WebSocket) API, each chunk arrives as a base64-encoded audio string. Decode each chunk as it arrives:

import asyncio
import base64
from sarvamai import AsyncSarvamAI, AudioOutput

async def tts_stream():
    client = AsyncSarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        await ws.configure(
            target_language_code="hi-IN",
            speaker="shubh"
        )

        await ws.convert("नमस्ते! Sarvam AI में आपका स्वागत है।")
        await ws.flush()

        with open("output.wav", "wb") as f:
            async for message in ws:
                if isinstance(message, AudioOutput):
                    # Each chunk is base64-encoded, so decode before writing
                    audio_chunk = base64.b64decode(message.data.audio)
                    f.write(audio_chunk)

asyncio.run(tts_stream())

Do not write the raw base64 string directly to a file. The audio will be corrupted and unplayable. Always decode with base64.b64decode() (Python) or Buffer.from(data, "base64") (JavaScript) first.
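
One quick way to catch this mistake is to inspect the first bytes before saving. The sketch below assumes the decoded audio is a WAV file, as the output.wav filename in the examples suggests. Other output formats would need a different check. `looks_like_wav` and `make_silent_wav` are hypothetical helpers, and the silent clip only stands in for real API output.

```python
import base64
import io
import wave


def looks_like_wav(data: bytes) -> bool:
    """Return True if data starts with a RIFF/WAVE container header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def make_silent_wav(num_frames: int = 160, sample_rate: int = 16000) -> bytes:
    """Build a tiny mono 16-bit WAV in memory as a stand-in for API output."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)
    return buffer.getvalue()


if __name__ == "__main__":
    encoded = base64.b64encode(make_silent_wav())  # what the API would return
    print("raw base64 is WAV:", looks_like_wav(encoded))
    print("decoded is WAV:", looks_like_wav(base64.b64decode(encoded)))
```

If the check fails on the bytes about to be written, the data is most likely still base64 text or in a different audio format.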

8. Key Considerations

For numbers with more than four digits, use commas as separators (e.g., 10,000 instead of 10000) so that they are pronounced correctly.
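
This rule can be applied as a preprocessing step. The following sketch inserts commas into standalone runs of five or more ASCII digits. It assumes three-digit grouping, the only style shown in this guide, and it skips decimals and numbers with a leading zero so that codes and identifiers are not altered. The function name is hypothetical.

```python
import re

# Five or more ASCII digits, not starting with 0, and not part of a decimal
# or of a number that already contains separators.
LONG_NUMBER = re.compile(r"(?<![\d.,])[1-9]\d{4,}(?!\d|[.,]\d)", re.ASCII)


def add_thousands_separators(text: str) -> str:
    """Rewrite digit runs such as 10000 as 10,000 (three-digit grouping)."""
    return LONG_NUMBER.sub(lambda match: f"{int(match.group()):,}", text)


if __name__ == "__main__":
    print(add_thousands_separators("कीमत 10000 रुपये है"))
```

Phone numbers, PIN codes and similar digit strings are usually not meant to be read as quantities, so it is safer to run this only on text where long digit runs represent amounts.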