This guide demonstrates how to build a real-time voice agent that can listen, understand, and respond naturally, using LiveKit for real-time communication and Sarvam AI for speech processing. It is well suited to voice assistants, customer support bots, and conversational AI applications for Indian languages.
A voice agent that can:
pip install livekit-agents[sarvam,openai,silero] python-dotenv
.env file with your API keys
python agent.py dev
python agent.py console

Create a file named .env in your project folder and add your API keys:
Replace the values with your actual API keys.
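A missing or empty key is the most common setup problem, so it can help to check the environment before starting the agent. The sketch below is only an illustration: the variable names (LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, SARVAM_API_KEY, OPENAI_API_KEY) are assumed to be the conventional names read by the LiveKit, Sarvam, and OpenAI plugins, and the helper names are hypothetical.

```python
import os
from typing import List, Mapping

# Assumed variable names; adjust them to match the keys in your .env file.
REQUIRED_KEYS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "SARVAM_API_KEY",
    "OPENAI_API_KEY",
)


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def check_environment() -> None:
    """Load .env if python-dotenv is available, then fail fast on gaps."""
    try:
        from dotenv import load_dotenv  # installed with python-dotenv
        load_dotenv()
    except ImportError:
        pass  # fall back to variables already set in the shell
    missing = missing_keys(os.environ)
    if missing:
        raise SystemExit("Missing keys in .env: " + ", ".join(missing))
```

Calling check_environment() at the top of agent.py would stop the script with a readable message instead of a later authentication error.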
Create agent.py:
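A minimal sketch of what agent.py might contain is shown here, assuming the LiveKit Agents 1.x API and the Sarvam and OpenAI plugins installed earlier. The model identifiers ("saaras:v3", "bulbul:v3", "gpt-4o"), the "transcribe" mode name, the prompt text, and the keyword-argument names are assumptions and should be checked against the plugin versions you have installed.

```python
"""agent.py: minimal voice agent sketch (model names and prompt are assumptions)."""

INSTRUCTIONS = "You are a helpful voice assistant. Keep replies short and conversational."

# Keyword arguments for the plugins. "unknown" asks the STT to auto-detect
# the spoken language; the model identifiers below are assumed, not confirmed.
STT_KWARGS = {"language": "unknown", "model": "saaras:v3", "mode": "transcribe"}
TTS_KWARGS = {"target_language_code": "en-IN", "model": "bulbul:v3", "speaker": "shubh"}
LLM_KWARGS = {"model": "gpt-4o"}


async def entrypoint(ctx):
    """Called by the LiveKit worker for each job (room) it is assigned."""
    # SDK imports are deferred so the constants above can be inspected
    # even where the LiveKit packages are not installed.
    from livekit.agents import Agent, AgentSession
    from livekit.plugins import openai, sarvam

    await ctx.connect()
    agent = Agent(
        instructions=INSTRUCTIONS,
        stt=sarvam.STT(**STT_KWARGS),   # speech -> text
        llm=openai.LLM(**LLM_KWARGS),   # text -> reply
        tts=sarvam.TTS(**TTS_KWARGS),   # reply -> speech
    )
    session = AgentSession()
    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="Greet the user and offer your help.")


def main():
    """Load API keys from .env and hand control to the LiveKit CLI."""
    from dotenv import load_dotenv
    from livekit.agents import WorkerOptions, cli

    load_dotenv()
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

Calling main() at the bottom of the file under an if __name__ == "__main__": guard lets python agent.py dev and python agent.py console start the worker.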
In a new terminal, run:
That’s it! You’ve built your first voice agent!
Difference: Saaras v3 handles both transcription (same-language output) and translation (English output) via the mode parameter. Use mode="translate" when the user speaks an Indian language but you want to process or respond in English.
Note: Saaras v3 with mode="translate" automatically detects the source language (Hindi, Tamil, etc.) and translates spoken content directly to English text, making Indian language speech comprehensible to English-based LLMs.
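The choice between the two modes can be reduced to a single flag. The helper below is a hypothetical sketch: it assumes the model identifier "saaras:v3", that the non-translating mode is named "transcribe", and that the Sarvam STT plugin accepts these keyword arguments.

```python
def stt_options(respond_in_english: bool) -> dict:
    """Return keyword arguments for the Sarvam STT plugin (names assumed)."""
    return {
        "model": "saaras:v3",   # assumed model identifier
        "language": "unknown",  # let the service detect the spoken language
        # "translate" yields English text; "transcribe" (assumed name)
        # keeps the text in the language that was spoken.
        "mode": "translate" if respond_in_english else "transcribe",
    }
```

With the plugin installed, the result could be passed on as sarvam.STT(**stt_options(True)) when the LLM should receive English text.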
Male (23): Shubh (default), Aditya, Rahul, Rohan, Amit, Dev, Ratan, Varun, Manan, Sumit, Kabir, Aayan, Ashutosh, Advait, Anand, Tarun, Sunny, Mani, Gokul, Vijay, Mohit, Rehan, Soham
Female (14): Ritu, Priya, Neha, Pooja, Simran, Kavya, Ishita, Shreya, Roopa, Tanya, Shruti, Suhani, Kavitha, Rupali
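A typo in the speaker name is easy to make with this many voices, so a small lookup can catch it before the agent starts. This is a sketch under one assumption worth checking: that the TTS plugin expects the speaker as a lowercase identifier such as "shubh". The helper name resolve_speaker is hypothetical.

```python
from typing import Optional

MALE_VOICES = (
    "Shubh", "Aditya", "Rahul", "Rohan", "Amit", "Dev", "Ratan", "Varun",
    "Manan", "Sumit", "Kabir", "Aayan", "Ashutosh", "Advait", "Anand",
    "Tarun", "Sunny", "Mani", "Gokul", "Vijay", "Mohit", "Rehan", "Soham",
)
FEMALE_VOICES = (
    "Ritu", "Priya", "Neha", "Pooja", "Simran", "Kavya", "Ishita",
    "Shreya", "Roopa", "Tanya", "Shruti", "Suhani", "Kavitha", "Rupali",
)
DEFAULT_VOICE = "shubh"
_KNOWN = {name.lower() for name in MALE_VOICES + FEMALE_VOICES}


def resolve_speaker(name: Optional[str] = None) -> str:
    """Return a lowercase speaker id, or raise ValueError if it is unknown."""
    if name is None:
        return DEFAULT_VOICE
    speaker = name.strip().lower()
    if speaker not in _KNOWN:
        raise ValueError(f"Unknown speaker: {name!r}")
    return speaker
```

For example, resolve_speaker("Kavya") returns "kavya", which could then be passed as the speaker argument of the TTS configuration.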
Use language="unknown" to automatically detect the language. This is useful for multilingual scenarios.

When using Sarvam AI plugins with LiveKit, follow these recommendations for optimal performance:
The vad parameter should not be passed to AgentSession as Voice Activity Detection is handled internally by the Sarvam plugin.
Add flush_signal=True to the STT configuration. This enables the plugin to emit start and end of speech events, which is essential for proper turn-taking.
Add turn_detection="stt" to the AgentSession configuration. This ensures turn detection is handled by the Sarvam plugin, which emits start and end of speech signals.
Set min_endpointing_delay=0.07 in your AgentSession. The Sarvam STT plugin has a processing latency of approximately 70ms. This setting ensures the agent transitions to the next pipeline step (LLM) as soon as STT finishes processing, minimizing response delay.
Here’s a complete example incorporating all best practices:
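One way the four recommendations could fit together is sketched below, assuming the LiveKit Agents 1.x AgentSession constructor and the Sarvam and OpenAI plugins. The model identifiers, the speaker, and the helper name build_session are assumptions; only flush_signal, turn_detection, min_endpointing_delay, and the absence of vad come from the recommendations above.

```python
# STT settings: flush_signal=True lets the plugin emit start and end of
# speech events. The model identifier is an assumption.
STT_KWARGS = {"language": "unknown", "model": "saaras:v3", "flush_signal": True}

# Session settings: turn detection is driven by the STT, the endpointing
# delay matches the roughly 70 ms STT latency, and no "vad" key is passed.
SESSION_KWARGS = {"turn_detection": "stt", "min_endpointing_delay": 0.07}


def build_session():
    """Create an AgentSession that follows the recommendations above."""
    # Imported lazily so the settings can be inspected without the SDK.
    from livekit.agents import AgentSession
    from livekit.plugins import openai, sarvam

    return AgentSession(
        stt=sarvam.STT(**STT_KWARGS),
        llm=openai.LLM(model="gpt-4o"),  # assumed model name
        tts=sarvam.TTS(
            target_language_code="en-IN",
            model="bulbul:v3",           # assumed model identifier
            speaker="shubh",
        ),
        **SESSION_KWARGS,
    )
```

Inside the agent's entrypoint, the returned session would be started in the usual way with session.start(agent=..., room=ctx.room).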
API key errors: Check that all keys are in your .env file and the file is in the same directory as your script.
Module not found: Run the installation command again based on your operating system (see Step 2 above).
Poor transcription: Try language="unknown" for auto-detection, or specify the correct language code (en-IN, hi-IN, etc.).
Happy Building!