Best Practices for Writing Text for TTS

A guide to writing text that produces natural-sounding speech output.


1. Punctuation for Pauses

| Punctuation | Effect | Example |
|---|---|---|
| , (comma) | Short pause | “हाँ, मैं समझ गया” |
| . (full stop) | Medium pause, sentence end | “यह Very good है।” |
| ! (exclamation) | Emphasis + pause | “नमस्ते!” |
| … (ellipsis) | Hesitation / trailing off | “मुझे लगता है… शायद हम try कर सकते हैं” |
| Line break | Natural pause between paragraphs | See below |

Tip: Use … (ellipsis) to create a hesitation or trailing-off effect; it signals that the speaker is thinking or pausing mid-thought. Use it sparingly for natural results.

Tip: Use line breaks between paragraphs for natural breathing pauses:

हमारी technology सबको समझती है।
हमारा mission है कि हर Indian अपनी mother tongue में technology use कर सके।

2. Fillers & Hesitations for Natural Speech

Add fillers and hesitation markers to make speech sound conversational:

| Filler | Effect | Example |
|---|---|---|
| um | Thinking pause | “चाहे आप um Hindi बोलते हों” |
| uh | Short hesitation | “uh, मुझे एक second दो” |
| hmm | Contemplation | “hmm, यह interesting है” |
| like... | Casual filler | “या like… कोई भी Indian language” |
| basically... | Starting explanation | “So basically… हम India की हर language को voice देते हैं” |
| actually... | Adding emphasis | “हमारी technology actually… सबको समझती है” |
| you know... | Conversational connector | “you know… यह बहुत simple है” |
| I mean... | Self-correction | “I mean… दूसरा option भी है” |

Combining fillers with ellipsis for natural hesitation:

So basically… हमारा goal है कि um हर Indian language को support करें।
I mean... यह easy नहीं है... but we're getting there.

3. Code-Mixing (Hinglish)

For natural Indian speech, mix English words where they’re commonly used. This is how most urban Indians speak — the model handles it well.

Rule: Write English words in English script, Hindi words in Devanagari:

  • ✅ “Sarvam AI में आपका स्वागत है”
  • ❌ “सरवम एआई में आपका स्वागत है”

Common code-mixed categories:

| Category | Examples |
|---|---|
| Tech terms | technology, app, website, download, update, AI |
| Everyday words | basically, actually, like, amazing, simple |
| Social expressions | thank you, sorry, please, welcome |
| Business | meeting, deadline, budget, report, feedback |

Full code-mixed examples:

So basically... हम India की हर language को voice देते हैं।
चाहे आप um Hindi बोलते हों, Tamil, Telugu, Bengali या like... कोई भी Indian language।
अगर आपको कोई doubt है तो please हमें contact करें।
Meeting actually postpone हो गई है, I mean... tomorrow रखते हैं।

Keep Hindi sentence structure, swap key nouns/verbs with English:

  • “हर Indian अपनी mother tongue में technology use कर सके”
  • “आज का weather actually बहुत pleasant है”
  • “यह app basically आपकी daily life को simple बना देगा”

4. Avoid These

| Avoid | Why | Fix |
|---|---|---|
| Overusing ... | Too many ellipses sound choppy | Use sparingly for hesitation; prefer commas or line breaks for regular pauses |
| Complex Sanskrit words | May be mispronounced | Use simpler Hindi |
| Very long sentences | Unnatural breathing | Break into shorter sentences |
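
The checks in this table can be partly automated before text is sent for synthesis. The following is a minimal sketch, not part of the TTS API: the function name `lint_tts_text` and both thresholds are hypothetical values chosen for illustration and should be tuned by listening to the output.

```python
import re

# Hypothetical thresholds for illustration only; tune them by ear.
MAX_ELLIPSES = 2
MAX_WORDS_PER_SENTENCE = 25

ELLIPSIS = re.compile(r"\.{3}|…")


def lint_tts_text(text: str) -> list[str]:
    """Return warnings for patterns that tend to make speech sound unnatural."""
    warnings = []

    # Count both the three-dot form and the single-character ellipsis.
    ellipses = len(ELLIPSIS.findall(text))
    if ellipses > MAX_ELLIPSES:
        warnings.append(f"{ellipses} ellipses found; prefer commas or line breaks")

    # Treat ellipses as in-sentence pauses, then split on ., !, ?, the danda, or a newline.
    flattened = ELLIPSIS.sub(" ", text)
    for sentence in re.split(r"[.!?।\n]+", flattened):
        word_count = len(sentence.split())
        if word_count > MAX_WORDS_PER_SENTENCE:
            warnings.append(f"sentence of {word_count} words; consider splitting it")

    return warnings


print(lint_tts_text("I mean... यह... easy नहीं है... but we're getting there."))
```

Word counts are only a rough proxy for how long a sentence takes to speak, so treat the warnings as prompts for review rather than hard limits.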

5. Language-Specific Tips

Sentence-ending punctuation

  • If a sentence ends in Hindi or a regional language, use । (danda): "हमारी technology सबको समझती है।"
  • If a sentence ends in English, use . (full stop): "प्लान simple है, just execute."
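
The rule above can be applied mechanically by looking at the script of the last letter in the sentence. This is a rough sketch under stated assumptions: `sentence_terminator` and `terminate` are hypothetical helper names, and "Hindi or a regional language" is approximated as any character in the Unicode range from the Devanagari block through the Malayalam block (U+0900 to U+0D7F).

```python
# Assumption: Indic scripts are approximated by the Unicode range
# U+0900 (start of Devanagari) through U+0D7F (end of Malayalam).
INDIC_START, INDIC_END = "\u0900", "\u0D7F"


def is_indic(ch: str) -> bool:
    return INDIC_START <= ch <= INDIC_END


def sentence_terminator(sentence: str) -> str:
    """Return "।" if the last letter is in an Indic script, otherwise "."."""
    for ch in reversed(sentence):
        if is_indic(ch):  # checked first: vowel signs are not alphabetic
            return "।"
        if ch.isalpha():
            return "."
    return "."


def terminate(sentence: str) -> str:
    """Normalize the sentence-ending punctuation of a single sentence."""
    stripped = sentence.rstrip()
    # Leave emphasis, questions, and trailing-off ellipses untouched.
    if stripped.endswith(("!", "?", "...", "…")):
        return stripped
    stripped = stripped.rstrip(" .।")
    return stripped + sentence_terminator(stripped)


print(terminate("हमारी technology सबको समझती है"))
print(terminate("प्लान simple है, just execute"))
```

The helper works on one sentence at a time; splitting a paragraph into sentences is left to the caller.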

Writing Conventions

  • Write language names in English: Tamil, Telugu, Bengali (not तमिल, तेलुगु)
  • Keep brand names in English: Sarvam AI, Google, WhatsApp

6. Target Language Code

The target_language_code parameter is required for every TTS request. It mainly controls language-specific processing of numbers, abbreviations, and special characters.

Supported Languages

| Language | Code |
|---|---|
| English | en-IN |
| Hindi | hi-IN |
| Bengali | bn-IN |
| Tamil | ta-IN |
| Telugu | te-IN |
| Kannada | kn-IN |
| Malayalam | ml-IN |
| Marathi | mr-IN |
| Gujarati | gu-IN |
| Punjabi | pa-IN |
| Odia | od-IN |
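
Because the parameter is required, it can help to validate the code on the client side before making a request. The sketch below simply copies the table into a dictionary; the names `LANGUAGE_CODES` and `language_code` are hypothetical and not part of the SDK.

```python
# Codes copied from the table above; the helper itself is illustrative.
LANGUAGE_CODES = {
    "english": "en-IN",
    "hindi": "hi-IN",
    "bengali": "bn-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
    "kannada": "kn-IN",
    "malayalam": "ml-IN",
    "marathi": "mr-IN",
    "gujarati": "gu-IN",
    "punjabi": "pa-IN",
    "odia": "od-IN",
}


def language_code(name: str) -> str:
    """Look up the target_language_code for a language name (case-insensitive)."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        supported = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(f"unsupported language {name!r}; expected one of: {supported}")
    return LANGUAGE_CODES[key]


print(language_code("Hindi"))
```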

Example

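from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# target_language_code is set to Hindi, so numbers and abbreviations are read in Hindi
audio = client.text_to_speech.convert(
    text="नमस्ते! Sarvam AI में आपका स्वागत है।",
    model="bulbul:v3",
    target_language_code="hi-IN",
    speaker="shubh"
)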

If your text contains mixed languages (e.g., Hinglish), set target_language_code to the language in which entities such as numbers should be spoken.


7. Understanding the Audio Output (Base64)

The TTS API returns audio data as a base64-encoded string. You must decode this string before saving or playing the audio file.

REST API Response

The REST API returns a response with an audios field — an array of base64-encoded audio strings. You need to decode them:

import base64
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

audio = client.text_to_speech.convert(
    text="नमस्ते! Sarvam AI में आपका स्वागत है।",
    model="bulbul:v3",
    target_language_code="hi-IN",
    speaker="shubh"
)

# The response contains base64-encoded audio in the 'audios' field.
# Decode each chunk separately, then join the bytes: padding at the end
# of one chunk would otherwise cut the decode short.
audio_bytes = b"".join(base64.b64decode(chunk) for chunk in audio.audios)

with open("output.wav", "wb") as f:
    f.write(audio_bytes)

Streaming API Response

For the streaming (WebSocket) API, each chunk arrives as a base64-encoded audio string. Decode each chunk as it arrives:

import asyncio
import base64
from sarvamai import AsyncSarvamAI, AudioOutput

async def tts_stream():
    client = AsyncSarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        await ws.configure(
            target_language_code="hi-IN",
            speaker="shubh"
        )

        await ws.convert("नमस्ते! Sarvam AI में आपका स्वागत है।")
        await ws.flush()

        with open("output.wav", "wb") as f:
            async for message in ws:
                if isinstance(message, AudioOutput):
                    # Each chunk is base64-encoded; decode before writing
                    audio_chunk = base64.b64decode(message.data.audio)
                    f.write(audio_chunk)

asyncio.run(tts_stream())

Do not write the raw base64 string directly to a file. The audio will be corrupted and unplayable. Always decode with base64.b64decode() (Python) or Buffer.from(data, "base64") (JavaScript) first.
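
One way to catch this mistake early is to check the decoded bytes for a WAV header before writing them to disk. The following is a sketch under the assumption that the response is WAV audio; `decode_wav` is a hypothetical helper, and `silent_wav_base64` only fabricates a short silent clip as a stand-in for a real API response.

```python
import base64
import io
import wave


def decode_wav(b64_audio: str) -> bytes:
    """Decode a base64 string and verify that the result looks like a WAV file."""
    audio_bytes = base64.b64decode(b64_audio)
    # A WAV file starts with "RIFF", four size bytes, then "WAVE".
    if audio_bytes[:4] != b"RIFF" or audio_bytes[8:12] != b"WAVE":
        raise ValueError("decoded data does not have a WAV header")
    return audio_bytes


def silent_wav_base64() -> str:
    """Stand-in for an API response: 100 frames of silence, base64-encoded."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(22050)
        wav_file.writeframes(b"\x00\x00" * 100)
    return base64.b64encode(buffer.getvalue()).decode("ascii")


encoded = silent_wav_base64()
print(encoded[:4])              # the raw base64 text, not audio bytes
print(decode_wav(encoded)[:4])  # the decoded bytes start with the RIFF marker
```

If the API is configured to return a different audio format, the header check would need to change accordingly.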


9. Key Considerations

  • For numbers with more than four digits, use commas (e.g., 10,000 instead of 10000) for correct pronunciation.
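
The comma rule can be applied as a preprocessing step. This is a minimal sketch, assuming Western three-digit grouping (as in the 10,000 example) rather than Indian lakh/crore grouping; `add_thousands_separators` is a hypothetical helper name. It would also reformat long digit strings such as phone numbers or IDs, so apply it only to text where bare digit runs are quantities.

```python
import re

# Runs of five or more ASCII digits that are not already part of a
# comma-grouped or decimal number.
LONG_NUMBER = re.compile(r"(?<![0-9.,])[0-9]{5,}(?![0-9])")


def add_thousands_separators(text: str) -> str:
    """Insert commas into standalone integers with five or more digits."""

    def _format(match: re.Match) -> str:
        digits = match.group()
        if digits.startswith("0"):
            return digits  # leave codes with leading zeros untouched
        return f"{int(digits):,}"

    return LONG_NUMBER.sub(_format, text)


print(add_thousands_separators("इस app के 1500000 users हैं।"))
```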