How to set the buffer size that starts processing in Streaming TTS with min_buffer_size

The min_buffer_size parameter sets the minimum number of characters that must accumulate in the buffer before the TTS engine begins converting text into audio. Incoming text is buffered until it reaches this threshold, at which point processing and chunking begin.

This helps balance latency with natural sentence completion during real-time TTS streaming.

Parameter Details

  • Type: Integer
  • Range: 30 to 200 characters
  • Default: 50 characters
  • Purpose: Minimum number of characters that triggers processing of the buffered text into audio.
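
You can see where the parameter fits by looking at how a streaming connection is configured. The sketch below is a trimmed version of the full example at the end of this page; YOUR_API_KEY is a placeholder and configure_stream is just an illustrative function name.

import asyncio
from sarvamai import AsyncSarvamAI

async def configure_stream():
    client = AsyncSarvamAI(api_subscription_key="YOUR_API_KEY")  # placeholder key
    async with client.text_to_speech_streaming.connect(model="bulbul:v2") as ws:
        # Text sent to this connection is buffered until it reaches 80 characters.
        await ws.configure(
            target_language_code="hi-IN",
            speaker="anushka",
            min_buffer_size=80,
        )

asyncio.run(configure_stream())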

How It Works

  • When the buffer reaches min_buffer_size, the text is automatically processed and streamed as audio (see the sketch after this list).
  • If the buffer does not reach the threshold, the text is held until:
    • More characters arrive, or
    • A flush command is sent.
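
The buffering rule can be pictured with a small plain-Python sketch. This is only an illustration of the behaviour described above, not the engine's implementation; process() is a stand-in for handing text to the synthesizer.

MIN_BUFFER_SIZE = 50  # default value of min_buffer_size
buffer = ""

def process(text: str) -> None:
    # Stand-in for chunking the buffered text and converting it to audio.
    print(f"synthesizing {len(text)} characters")

def on_text(chunk: str) -> None:
    # Accumulate incoming text; process only once the threshold is reached.
    global buffer
    buffer += chunk
    if len(buffer) >= MIN_BUFFER_SIZE:
        process(buffer)
        buffer = ""

def on_flush() -> None:
    # A flush (next section) processes whatever is buffered, even below the threshold.
    global buffer
    if buffer:
        process(buffer)
        buffer = ""

on_text("A" * 60)  # 60 >= 50, processed immediately
on_text("B" * 20)  # 20 < 50, held in the buffer
on_flush()         # forces the remaining 20 characters out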

Manual Flush Option

A flush command forces the TTS engine to process the current buffer immediately, even if it has not reached min_buffer_size (see the practical example below).


Practical Example

Suppose min_buffer_size = 50, and you send an 80-character sentence in two parts:

Input Chunk      Characters    Result
First part       60            Processed immediately
Second part      20            Held in buffer

To process the remaining 20 characters, send a flush message.
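
The same scenario in the streaming API looks roughly like the sketch below. It reuses the configure, convert, and flush calls from the full example that follows; the two input strings are placeholders chosen only for their character counts, and YOUR_API_KEY must be replaced with a real key.

import asyncio
from sarvamai import AsyncSarvamAI, AudioOutput

async def two_part_send():
    client = AsyncSarvamAI(api_subscription_key="YOUR_API_KEY")  # placeholder key
    async with client.text_to_speech_streaming.connect(model="bulbul:v2") as ws:
        await ws.configure(
            target_language_code="hi-IN",
            speaker="anushka",
            min_buffer_size=50,
        )

        await ws.convert("क" * 60)  # 60 characters: crosses the threshold, processed immediately
        await ws.convert("ख" * 20)  # 20 characters: below the threshold, held in the buffer
        await ws.flush()            # forces the remaining 20 characters to be synthesized

        async for message in ws:
            if isinstance(message, AudioOutput):
                pass  # handle audio chunks as shown in the full example below

asyncio.run(two_part_send())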


Example Streaming API Code

import asyncio
import base64
from sarvamai import AsyncSarvamAI, AudioOutput


async def tts_stream():
    client = AsyncSarvamAI(api_subscription_key="YOUR_API_KEY")

    async with client.text_to_speech_streaming.connect(model="bulbul:v2") as ws:
        # Buffer up to 80 characters before synthesis starts.
        await ws.configure(
            target_language_code="hi-IN",
            speaker="anushka",
            min_buffer_size=80,
        )
        print("Sent configuration")

        text = (
            "भारत की संस्कृति विश्व की सबसे प्राचीन और समृद्ध संस्कृतियों में से एक है। "
            "यह विविधता, सहिष्णुता और परंपराओं का अद्भुत संगम है, "
            "जिसमें विभिन्न धर्म, भाषाएं, त्योहार, संगीत, नृत्य, वास्तुकला और जीवनशैली शामिल हैं।"
        )

        await ws.convert(text)
        print("Sent text message")

        # Force synthesis of any text still held below the min_buffer_size threshold.
        await ws.flush()
        print("Flushed buffer")

        # Collect the audio chunks as they stream back and write them to a file.
        chunk_count = 0
        with open("output.mp3", "wb") as f:
            async for message in ws:
                if isinstance(message, AudioOutput):
                    chunk_count += 1
                    audio_chunk = base64.b64decode(message.data.audio)
                    f.write(audio_chunk)
                    f.flush()

        print(f"All {chunk_count} chunks saved to output.mp3")
        print("Audio generation complete")

        # Defensive close of the underlying socket; the context manager also closes it on exit.
        if hasattr(ws, "_websocket") and not ws._websocket.closed:
            await ws._websocket.close()
            print("WebSocket connection closed.")


if __name__ == "__main__":
    asyncio.run(tts_stream())

# --- Notebook/Colab usage ---
# await tts_stream()