
How to set buffer size to start processing in Streaming TTS with min_buffer_size


The min_buffer_size parameter sets the minimum number of characters that must accumulate in the buffer before the TTS engine begins converting text into audio. Incoming text is held until it reaches this threshold; only then do processing and chunking begin.

This parameter is available only in the WebSocket Streaming API. It is not supported in the REST API.

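Tuning min_buffer_size helps balance latency against natural sentence completion during real-time TTS streaming: a lower value starts audio sooner, while a higher value lets more of the sentence arrive before processing begins.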

Parameter Details

  • Type: Integer
  • Range: 30 to 200
  • Default: 50
  • Purpose: Minimum character length that triggers buffer flushing and TTS processing.
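Because the accepted range is narrow, it can be useful to check the value on the client before opening a connection. The following is a minimal sketch, not part of the SDK: validate_min_buffer_size and its constants are hypothetical names, and the bounds are assumed to be inclusive.

```python
# Hypothetical client-side check; bounds and default are taken from the list above.
# Assumption: both ends of the 30 to 200 range are inclusive.
MIN_BUFFER_SIZE_LOW = 30
MIN_BUFFER_SIZE_HIGH = 200
MIN_BUFFER_SIZE_DEFAULT = 50


def validate_min_buffer_size(value=None):
    """Return a usable min_buffer_size, or raise before any request is sent."""
    if value is None:
        return MIN_BUFFER_SIZE_DEFAULT
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("min_buffer_size must be an integer")
    if not MIN_BUFFER_SIZE_LOW <= value <= MIN_BUFFER_SIZE_HIGH:
        raise ValueError(
            f"min_buffer_size must be between {MIN_BUFFER_SIZE_LOW} "
            f"and {MIN_BUFFER_SIZE_HIGH}, got {value}"
        )
    return value
```

The returned value can then be passed as min_buffer_size when configuring the stream.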

How It Works

  • When the buffer reaches min_buffer_size, the text is automatically processed and streamed as audio.
  • If the buffer does not reach the threshold, the text is held until:
    • More characters arrive, or
    • A flush command is sent.
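The rules above can be modelled locally with a small class. This is a simplified sketch for reasoning about the behaviour, not the service's implementation: BufferModel is a hypothetical name, it assumes the whole buffer is released once the threshold is reached, and it counts characters with Python's len(), which may differ from how the service counts them.

```python
class BufferModel:
    """Toy model of a text buffer gated by min_buffer_size (illustrative only)."""

    def __init__(self, min_buffer_size=50):
        self.min_buffer_size = min_buffer_size
        self._pending = ""

    def push(self, text):
        """Add text; return the released text, or None if it is still held."""
        self._pending += text
        if len(self._pending) >= self.min_buffer_size:
            return self._drain()
        return None

    def flush(self):
        """Release whatever is buffered, regardless of the threshold."""
        return self._drain() if self._pending else None

    def _drain(self):
        ready, self._pending = self._pending, ""
        return ready


model = BufferModel(min_buffer_size=50)
first = model.push("a" * 30)   # 30 characters buffered: held
second = model.push("b" * 30)  # buffer reaches 60 characters: released
third = model.push("c" * 10)   # 10 characters buffered: held
rest = model.flush()           # released by the flush
```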

Manual Flush Option

A flush command forces the TTS engine to immediately process the current buffer, even if it hasn’t reached the min_buffer_size.


Practical Example

Suppose min_buffer_size = 50, and you send an 80-character sentence in two parts:

Input Chunk | Characters | Result
First part  | 60         | Processed immediately
Second part | 20         | Held in buffer

To process the remaining 20 characters, send a flush message.
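The arithmetic in this example can be replayed with a few lines of Python. This is a sketch under the same simplifying assumption as the table (a chunk that brings the buffer to the threshold releases the entire buffer); trace_chunks is a hypothetical helper, not an SDK function.

```python
def trace_chunks(chunks, min_buffer_size):
    """Return (rows, held): one (length, result) row per chunk, plus the
    number of characters still waiting for a flush at the end."""
    rows = []
    pending = 0  # characters currently held in the modelled buffer
    for chunk in chunks:
        pending += len(chunk)
        if pending >= min_buffer_size:
            rows.append((len(chunk), "Processed immediately"))
            pending = 0
        else:
            rows.append((len(chunk), "Held in buffer"))
    return rows, pending


sentence = "x" * 80  # stand-in for an 80-character sentence
rows, held = trace_chunks([sentence[:60], sentence[60:]], min_buffer_size=50)
for length, result in rows:
    print(length, result)
print("characters waiting for flush:", held)
# prints:
# 60 Processed immediately
# 20 Held in buffer
# characters waiting for flush: 20
```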


Example Streaming API code

import asyncio
import base64
from sarvamai import AsyncSarvamAI, AudioOutput


async def tts_stream():
    client = AsyncSarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        # Hold incoming text until at least 80 characters have accumulated.
        await ws.configure(
            target_language_code="hi-IN",
            speaker="shubh",
            min_buffer_size=80,
        )
        print("Sent configuration")

        text = (
            "भारत की संस्कृति विश्व की सबसे प्राचीन और समृद्ध संस्कृतियों में से एक है। "
            "यह विविधता, सहिष्णुता और परंपराओं का अद्भुत संगम है, "
            "जिसमें विभिन्न धर्म, भाषाएं, त्योहार, संगीत, नृत्य, वास्तुकला और जीवनशैली शामिल हैं।"
        )

        await ws.convert(text)
        print("Sent text message")

        # Force processing of any text still below min_buffer_size.
        await ws.flush()
        print("Flushed buffer")

        chunk_count = 0
        with open("output.mp3", "wb") as f:
            async for message in ws:
                if isinstance(message, AudioOutput):
                    chunk_count += 1
                    # Audio arrives base64-encoded; decode before writing.
                    audio_chunk = base64.b64decode(message.data.audio)
                    f.write(audio_chunk)
                    f.flush()

        print(f"All {chunk_count} chunks saved to output.mp3")
        print("Audio generation complete")

        # Close the underlying socket if it is still open.
        if hasattr(ws, "_websocket") and not ws._websocket.closed:
            await ws._websocket.close()
            print("WebSocket connection closed.")


if __name__ == "__main__":
    asyncio.run(tts_stream())

# --- Notebook/Colab usage ---
# await tts_stream()