Best Practices for Writing Text for TTS
A guide to writing text that produces natural-sounding speech output.
1. Punctuation for Pauses
Tip: Use … (ellipsis) to create a hesitation or trailing-off effect — it signals the speaker is thinking or pausing mid-thought. Use sparingly for natural results.
Tip: Use line breaks between paragraphs for natural breathing pauses:
2. Fillers & Hesitations for Natural Speech
Add fillers and hesitation markers to make speech sound conversational:
Combining fillers with ellipsis for natural hesitation:
3. Code-Mixing (Hinglish)
For natural Indian speech, mix English words where they’re commonly used. This is how most urban Indians speak — the model handles it well.
Rule: Write English words in English script, Hindi words in Devanagari:
- ✅ “Sarvam AI में आपका स्वागत है”
- ❌ “सरवम एआई में आपका स्वागत है”
Common code-mixed categories:
Full code-mixed examples:
Keep Hindi sentence structure, swap key nouns/verbs with English:
- “हर Indian अपनी mother tongue में technology use कर सके”
- “आज का weather actually बहुत pleasant है”
- “यह app basically आपकी daily life को simple बना देगा”
4. Avoid These
5. Language-Specific Tips
Sentence-ending punctuation
- If a sentence ends in Hindi or a regional language, use ।: "हमारी technology सबको समझती है।"
- If a sentence ends in English, use .: "प्लान simple है, just execute."
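The punctuation rule above can be applied automatically when text is generated programmatically. The following is a minimal sketch, not part of the API: `end_sentence` is a hypothetical helper, and it assumes Hindi text is written in Devanagari (Unicode range U+0900 to U+097F). Other regional scripts are not handled here.

```python
DANDA = "\u0964"  # Devanagari danda "।"


def _is_devanagari(ch: str) -> bool:
    # Assumption: only the Devanagari block counts as "Hindi" in this sketch.
    return "\u0900" <= ch <= "\u097f"


def end_sentence(sentence: str) -> str:
    """Terminate a sentence with "।" or "." based on the script it ends in."""
    body = sentence.rstrip()
    # Leave questions and exclamations untouched.
    if body.endswith(("?", "!")):
        return body
    # Drop any existing terminator so it can be replaced with the right one.
    body = body.rstrip("." + DANDA).rstrip()
    # Walk backwards to the last letter and pick the terminator from its script.
    for ch in reversed(body):
        if _is_devanagari(ch):
            return body + DANDA
        if ch.isalpha():
            return body + "."
    return body
```

For example, `end_sentence("प्लान simple है, just execute")` appends a period because the last word is English, while a sentence ending in "है" receives a danda.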
Writing Conventions
- Write language names in English: Tamil, Telugu, Bengali (not तमिल, तेलुगु)
- Keep brand names in English: Sarvam AI, Google, WhatsApp
6. Target Language Code
The target_language_code parameter is required for every TTS request. Its main effect is on language-specific processing of numbers, abbreviations, and special characters.
Supported Languages
Example
If your text contains mixed languages (e.g., Hinglish), set target_language_code to the language in which entities such as numbers should be spoken.
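As an illustration, here is a minimal sketch of building a request body for Hinglish text. Only the `target_language_code` parameter name comes from this guide. The `build_tts_payload` helper, the `text` field name, and the code values `hi-IN` and `en-IN` are assumptions for the example, so check them against the Supported Languages list and the API reference.

```python
def build_tts_payload(text: str, target_language_code: str) -> dict:
    """Build a TTS request body (field names other than
    target_language_code are assumed for illustration)."""
    if not target_language_code:
        # The guide states this parameter is required on every request.
        raise ValueError("target_language_code is required for every TTS request")
    return {
        "text": text,  # assumed field name
        "target_language_code": target_language_code,
    }


hinglish = "आपका order total 2,500 rupees है।"

# Assumed code value: ask for numbers to be read out in Hindi.
payload_hi = build_tts_payload(hinglish, "hi-IN")

# Assumed code value: same text, numbers read out in English.
payload_en = build_tts_payload(hinglish, "en-IN")
```

The text is identical in both payloads. Only the language used for entities such as "2,500" is expected to change.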
7. Understanding the Audio Output (Base64)
The TTS API returns audio data as a base64-encoded string. You must decode this string before saving or playing the audio file.
REST API Response
The REST API returns a response with an audios field — an array of base64-encoded audio strings. You need to decode them:
Python
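A minimal sketch of the decoding step, assuming the REST response has already been parsed into a dict with an `audios` list as described above. The helper names and the `.wav` extension are assumptions. Use whichever extension matches the audio format you requested.

```python
import base64
from pathlib import Path


def decode_audios(response_json: dict) -> list[bytes]:
    """Decode every base64 string in the `audios` field into raw bytes."""
    return [base64.b64decode(item) for item in response_json["audios"]]


def save_audios(response_json: dict, out_dir: str = ".") -> list[Path]:
    """Write each decoded clip to its own file and return the paths."""
    paths = []
    for index, audio_bytes in enumerate(decode_audios(response_json)):
        # Assumed naming scheme and extension, for illustration only.
        path = Path(out_dir) / f"output_{index}.wav"
        path.write_bytes(audio_bytes)  # binary write of decoded bytes
        paths.append(path)
    return paths
```

Note that the files are written from the decoded bytes, never from the base64 text itself.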
JavaScript
cURL
Streaming API Response
For the streaming (WebSocket) API, each chunk arrives as a base64-encoded audio string. Decode each chunk as it arrives:
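The per-chunk decoding could look like the sketch below. It assumes the caller has already pulled the base64 string out of each WebSocket message; the message envelope is not described in this section, so that step is left out. `collect_stream_audio` is a hypothetical helper name.

```python
import base64
from typing import Iterable


def collect_stream_audio(chunks: Iterable[str]) -> bytes:
    """Decode base64 audio chunks one by one and join the raw bytes."""
    buffer = bytearray()
    for chunk_b64 in chunks:
        # Decode each chunk on its own: every chunk may carry its own
        # padding, so joining the base64 strings first is not safe.
        buffer.extend(base64.b64decode(chunk_b64))
    return bytes(buffer)
```

Because each chunk is decoded as it arrives, the same loop body can feed an audio player incrementally instead of accumulating into a buffer.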
Do not write the raw base64 string directly to a file. The audio will be corrupted and unplayable. Always decode with base64.b64decode() (Python) or Buffer.from(data, "base64") (JavaScript) first.
9. Key Considerations
- For numbers with more than 4 digits, use commas (e.g., 10,000 instead of 10000) for correct pronunciation.
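If the input text is generated by code, the commas can be added in a preprocessing pass. This is a rough sketch under stated assumptions: `add_thousands_commas` is a hypothetical helper, it uses three-digit grouping as in the 10,000 example, and it skips decimals, already-grouped numbers, and digit strings with a leading zero (which are more likely phone numbers or IDs than quantities).

```python
import re

# Runs of 5+ digits that are not part of a decimal or an already-grouped number.
_LONG_INT = re.compile(r"(?<![\d.,])\d{5,}(?!\d|[.,]\d)")


def add_thousands_commas(text: str) -> str:
    """Insert thousands separators into plain integers of 5 or more digits."""

    def _group(match: re.Match) -> str:
        digits = match.group()
        if digits.startswith("0"):
            # Leading zero: leave untouched rather than alter an identifier.
            return digits
        return f"{int(digits):,}"

    return _LONG_INT.sub(_group, text)
```

For example, `add_thousands_commas("Price is 10000 rupees")` returns `"Price is 10,000 rupees"`, while four-digit numbers such as 1234 are left as they are.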