How to set the buffer size that starts processing in Streaming TTS with min_buffer_size

The min_buffer_size parameter sets the minimum number of characters that must accumulate in the buffer before the TTS engine begins converting text into audio. Incoming text is buffered until it reaches this threshold, at which point processing and chunking begin.

This helps balance latency with natural sentence completion during real-time TTS streaming.

Parameter Details

  • Type: Integer
  • Range: 30 to 200 characters
  • Default: 50 characters
  • Purpose: Minimum number of characters that triggers processing of the buffered text into audio.
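
You can see where the parameter fits by looking at how a streaming connection is configured. The sketch below is a trimmed version of the full example at the end of this page; YOUR_API_KEY is a placeholder and configure_stream is just an illustrative function name.

import asyncio
from sarvamai import AsyncSarvamAI

async def configure_stream():
    client = AsyncSarvamAI(api_subscription_key="YOUR_API_KEY")  # placeholder key
    async with client.text_to_speech_streaming.connect(model="bulbul:v2") as ws:
        # Text sent to this connection is buffered until it reaches 80 characters.
        await ws.configure(
            target_language_code="hi-IN",
            speaker="anushka",
            min_buffer_size=80,
        )

asyncio.run(configure_stream())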

How It Works

  • When the buffer reaches min_buffer_size, the text is automatically processed and streamed as audio (see the sketch after this list).
  • If the buffer does not reach the threshold, the text is held until:
    • More characters arrive, or
    • A flush command is sent.
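
The buffering rule can be pictured with a small plain-Python sketch. This is only an illustration of the behaviour described above, not the engine's implementation; process() is a stand-in for handing text to the synthesizer.

MIN_BUFFER_SIZE = 50  # default value of min_buffer_size
buffer = ""

def process(text: str) -> None:
    # Stand-in for chunking the buffered text and converting it to audio.
    print(f"synthesizing {len(text)} characters")

def on_text(chunk: str) -> None:
    # Accumulate incoming text; process only once the threshold is reached.
    global buffer
    buffer += chunk
    if len(buffer) >= MIN_BUFFER_SIZE:
        process(buffer)
        buffer = ""

def on_flush() -> None:
    # A flush (next section) processes whatever is buffered, even below the threshold.
    global buffer
    if buffer:
        process(buffer)
        buffer = ""

on_text("A" * 60)  # 60 >= 50, processed immediately
on_text("B" * 20)  # 20 < 50, held in the buffer
on_flush()         # forces the remaining 20 characters out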

Manual Flush Option

A flush command forces the TTS engine to process the current buffer immediately, even if it has not reached min_buffer_size (see the practical example below).


Practical Example

Suppose min_buffer_size = 50, and you send an 80-character sentence in two parts:

Input Chunk      Characters    Result
First part       60            Processed immediately
Second part      20            Held in buffer

To process the remaining 20 characters, send a flush message.
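
The same scenario in the streaming API looks roughly like the sketch below. It reuses the configure, convert, and flush calls from the full example that follows; the two input strings are placeholders chosen only for their character counts, and YOUR_API_KEY must be replaced with a real key.

import asyncio
from sarvamai import AsyncSarvamAI, AudioOutput

async def two_part_send():
    client = AsyncSarvamAI(api_subscription_key="YOUR_API_KEY")  # placeholder key
    async with client.text_to_speech_streaming.connect(model="bulbul:v2") as ws:
        await ws.configure(
            target_language_code="hi-IN",
            speaker="anushka",
            min_buffer_size=50,
        )

        await ws.convert("क" * 60)  # 60 characters: crosses the threshold, processed immediately
        await ws.convert("ख" * 20)  # 20 characters: below the threshold, held in the buffer
        await ws.flush()            # forces the remaining 20 characters to be synthesized

        async for message in ws:
            if isinstance(message, AudioOutput):
                pass  # handle audio chunks as shown in the full example below

asyncio.run(two_part_send())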


Example Streaming API Code

import asyncio
import base64
from sarvamai import AsyncSarvamAI, AudioOutput


async def tts_stream():
    client = AsyncSarvamAI(api_subscription_key="YOUR_API_KEY")

    async with client.text_to_speech_streaming.connect(model="bulbul:v2") as ws:
        # Buffer up to 80 characters before synthesis starts.
        await ws.configure(
            target_language_code="hi-IN",
            speaker="anushka",
            min_buffer_size=80,
        )
        print("Sent configuration")

        text = (
            "भारत की संस्कृति विश्व की सबसे प्राचीन और समृद्ध संस्कृतियों में से एक है। "
            "यह विविधता, सहिष्णुता और परंपराओं का अद्भुत संगम है, "
            "जिसमें विभिन्न धर्म, भाषाएं, त्योहार, संगीत, नृत्य, वास्तुकला और जीवनशैली शामिल हैं।"
        )

        await ws.convert(text)
        print("Sent text message")

        # Force synthesis of any text still held below the min_buffer_size threshold.
        await ws.flush()
        print("Flushed buffer")

        # Collect the audio chunks as they stream back and write them to a file.
        chunk_count = 0
        with open("output.mp3", "wb") as f:
            async for message in ws:
                if isinstance(message, AudioOutput):
                    chunk_count += 1
                    audio_chunk = base64.b64decode(message.data.audio)
                    f.write(audio_chunk)
                    f.flush()

        print(f"All {chunk_count} chunks saved to output.mp3")
        print("Audio generation complete")

        # Defensive close of the underlying socket; the context manager also closes it on exit.
        if hasattr(ws, "_websocket") and not ws._websocket.closed:
            await ws._websocket.close()
            print("WebSocket connection closed.")


if __name__ == "__main__":
    asyncio.run(tts_stream())

# --- Notebook/Colab usage ---
# await tts_stream()