Streaming Text-to-Speech API
Real-time Processing
Converts text into spoken audio in real time: the audio is generated and played back progressively as the text is processed.
- Efficient for long texts
- Real-time conversion
- Handles multiple requests easily
- Low-latency audio generation and faster responses
Features
Low Latency Playback
- Audio starts playing immediately as the text is processed
- Speaks dynamic or live content as it comes in
Language Support
- Supports multiple Indian languages and English
- Language code specification (e.g., “kn-IN” for Kannada)
- High accuracy transcription
Efficient Resource Usage
- Streams small chunks of audio instead of generating everything at once
- Uses less memory and keeps performance stable even with long texts
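The memory benefit of chunked streaming can be illustrated with a minimal sketch. It does not call the real API: `fake_audio_stream` is a hypothetical stand-in for the audio stream, and the chunk size of 4096 bytes is an arbitrary illustrative value. The point is only that the consumer holds one chunk at a time rather than the full audio.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Tuple


def fake_audio_stream(total_bytes: int, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a TTS stream: yields audio in small chunks."""
    sent = 0
    while sent < total_bytes:
        size = min(chunk_size, total_bytes - sent)
        yield b"\x00" * size  # placeholder bytes, not real audio
        sent += size


def consume_stream(chunks: Iterable[bytes], sink: BinaryIO) -> Tuple[int, int]:
    """Write each chunk to the sink as soon as it arrives.

    Returns (total bytes written, size of the largest single chunk seen).
    """
    total = 0
    largest = 0
    for chunk in chunks:
        sink.write(chunk)  # a real player would hand this to an audio device
        total += len(chunk)
        largest = max(largest, len(chunk))
    return total, largest


sink = io.BytesIO()
total, largest = consume_stream(fake_audio_stream(10_000, 4096), sink)
```

In this sketch the consumer never holds more than one chunk at a time, regardless of how long the input text is.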
Integration
- Python and JavaScript SDKs with async support
- WebSocket connections
- Easy-to-use API interface
Code Examples
Best Practices
- Always send the config message first
- Use flush messages strategically to ensure complete text processing
- Send ping messages to maintain long-running connections
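The ordering described above (config first, then text, then a flush) can be sketched as a small helper that builds the messages for one session. The JSON shape used here (`type` and `data` keys, and the `target_language_code` field) is an assumption for illustration only and is not taken from the API reference; check the official message schema before relying on it.

```python
import json
from typing import Any, Dict, List


def build_session_messages(config: Dict[str, Any], texts: List[str]) -> List[str]:
    """Return JSON messages in the recommended order: config, text..., flush.

    The message layout is hypothetical; only the ordering reflects the
    best practices listed above.
    """
    messages: List[Dict[str, Any]] = [{"type": "config", "data": config}]
    for text in texts:
        messages.append({"type": "text", "data": {"text": text}})
    # A trailing flush asks the server to process any text still buffered.
    messages.append({"type": "flush"})
    return [json.dumps(m, ensure_ascii=False) for m in messages]


def ping_message() -> str:
    """Hypothetical keep-alive message for long-running connections."""
    return json.dumps({"type": "ping"})


session = build_session_messages(
    {"target_language_code": "kn-IN"},  # assumed field name
    ["Hello", "world"],
)
```

A client would send each string in `session` over the WebSocket in order, and send `ping_message()` periodically while the connection is idle.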
End of Speech Signal
The TTS streaming API now supports an end of speech signal that allows for clean stream termination when speech generation is complete.
Using send_completion_event
When you set send_completion_event=True in the connection, the API will send a completion event when speech generation ends, allowing your application to handle stream termination gracefully.
Python
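A minimal sketch of how a client might react to the completion event follows. It does not use the SDK: `fake_server_messages` is a hypothetical stand-in for messages received over the connection, and the message layout (`type`, `data`, `event_type`) is assumed for illustration. The loop collects audio until a completion event arrives, then stops reading.

```python
import asyncio
import json
from typing import AsyncIterator, List


async def fake_server_messages() -> AsyncIterator[str]:
    """Hypothetical stand-in for messages arriving on the WebSocket."""
    yield json.dumps({"type": "audio", "data": {"audio": "AAAA"}})
    yield json.dumps({"type": "audio", "data": {"audio": "BBBB"}})
    # Assumed shape of the completion event sent when speech generation ends.
    yield json.dumps({"type": "event", "data": {"event_type": "final"}})
    yield json.dumps({"type": "audio", "data": {"audio": "never read"}})


async def collect_until_complete(messages: AsyncIterator[str]) -> List[str]:
    """Gather audio payloads and stop cleanly at the completion event."""
    chunks: List[str] = []
    async for raw in messages:
        msg = json.loads(raw)
        if msg["type"] == "audio":
            chunks.append(msg["data"]["audio"])
        elif msg["type"] == "event":
            break  # speech generation finished; stop reading the stream
    return chunks


chunks = asyncio.run(collect_until_complete(fake_server_messages()))
```

Without such a signal, a client typically has to guess when the stream is finished, for example by waiting for a timeout.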
Streaming TTS WebSocket – Integration Guide
Easily convert text to speech in real time using Sarvam’s low-latency WebSocket-based TTS API.
Input Message Types
- Config Message
- Text Message
- Flush Message
- Ping Message
The config message sets up voice parameters and must be the first message sent after the connection opens. Parameters:
- min_buffer_size: Minimum character length that triggers buffer flushing for TTS model processing
- max_chunk_length: Maximum length for sentence splitting (adjust based on content length)
- output_audio_codec: Supports multiple formats: mp3, wav, aac, opus, flac, pcm (LINEAR16), mulaw (μ-law), and alaw (A-law)
- output_audio_bitrate: Choose from 5 supported bitrate options
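The parameters above could be assembled and checked on the client side before sending, as in the following sketch. The helper `make_config` is hypothetical, and the numeric values in the usage line are arbitrary examples rather than recommended defaults. `output_audio_bitrate` is left out because the supported bitrate options are not listed here.

```python
from typing import Any, Dict

# Codec names as listed in the parameter description above.
SUPPORTED_CODECS = {"mp3", "wav", "aac", "opus", "flac", "pcm", "mulaw", "alaw"}


def make_config(
    min_buffer_size: int,
    max_chunk_length: int,
    output_audio_codec: str = "mp3",
) -> Dict[str, Any]:
    """Build a config payload, rejecting values the API would not accept."""
    if output_audio_codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {output_audio_codec!r}")
    if min_buffer_size <= 0 or max_chunk_length <= 0:
        raise ValueError("buffer size and chunk length must be positive")
    return {
        "min_buffer_size": min_buffer_size,
        "max_chunk_length": max_chunk_length,
        "output_audio_codec": output_audio_codec,
    }


# Illustrative values only.
config = make_config(min_buffer_size=50, max_chunk_length=200, output_audio_codec="wav")
```

Validating locally surfaces a mistyped codec name before the connection is opened instead of after the server rejects the config message.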