Build Your First Voice Agent using LiveKit
Overview
This guide demonstrates how to build a real-time voice agent that can listen, understand, and respond naturally, using LiveKit for real-time communication and Sarvam AI for speech processing. It's a great fit for voice assistants, customer support bots, and conversational AI applications in Indian languages.
What You’ll Build
A voice agent that can:
- Listen to users speaking (in multiple Indian languages!)
- Understand and process their requests
- Respond back in natural-sounding voices
Quick Overview
- Get API keys (LiveKit, Sarvam, OpenAI)
- Install packages: `pip install "livekit-agents[sarvam,openai,silero]" python-dotenv`
- Create a `.env` file with your API keys
- Write ~40 lines of Python code
- Run: `python agent.py dev`
- Test: `python agent.py console`
Quick Start
1. Prerequisites
- Python 3.9 or higher
- API keys from:
- LiveKit Cloud (free account)
- Sarvam AI (get API key from dashboard)
- OpenAI (create new secret key)
2. Install Dependencies
macOS/Linux
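A typical setup on macOS/Linux. The package list comes from the Quick Overview above; the virtual-environment steps are optional but recommended:

```shell
# Create and activate an isolated environment
python3 -m venv venv
source venv/bin/activate

# Install the LiveKit agents framework with the Sarvam, OpenAI, and Silero plugins
pip install "livekit-agents[sarvam,openai,silero]" python-dotenv
```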
Windows
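The equivalent setup on Windows (PowerShell or Command Prompt):

```shell
# Create and activate an isolated environment
python -m venv venv
venv\Scripts\activate

# Install the LiveKit agents framework with the Sarvam, OpenAI, and Silero plugins
pip install "livekit-agents[sarvam,openai,silero]" python-dotenv
```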
3. Create Environment File
Create a file named .env in your project folder and add your API keys:
Replace the values with your actual API keys.
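The file should look something like this. The `LIVEKIT_*` variable names follow LiveKit's standard conventions; each value below is a placeholder for your own key:

```
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-api-secret
SARVAM_API_KEY=your-sarvam-api-key
OPENAI_API_KEY=your-openai-api-key
```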
4. Write Your Agent
Create agent.py:
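A minimal sketch of what `agent.py` can look like. It assumes the current `livekit-agents` plugin API; parameter names such as `sarvam.STT(language=...)` and `sarvam.TTS(target_language_code=..., speaker=...)` should be verified against the plugin documentation for your installed version:

```python
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai, sarvam, silero

# Load LIVEKIT_*, SARVAM_API_KEY, and OPENAI_API_KEY from .env
load_dotenv()


async def entrypoint(ctx: agents.JobContext):
    # Speech in and out via Sarvam, reasoning via an OpenAI model
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=sarvam.STT(language="en-IN"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=sarvam.TTS(target_language_code="en-IN", speaker="anushka"),
    )

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a friendly, concise voice assistant."),
    )

    # Speak first so the user knows the agent is ready
    await session.generate_reply(instructions="Greet the user and offer to help.")


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

This cannot run without valid API keys and a LiveKit project, so treat it as a starting template rather than a drop-in file.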
5. Run Your Agent
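Start the agent in development mode (this is the `dev` command from the Quick Overview):

```shell
python agent.py dev
```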
6. Test Your Agent
In a new terminal, run:
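Console mode (from the Quick Overview) lets you talk to the agent directly from your terminal:

```shell
python agent.py console
```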
That’s it! You’ve built your first voice agent!
Customization Examples
Example 1: Hindi Voice Agent
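Switching the agent to Hindi is a matter of changing the language parameters in the session setup. This fragment assumes the plugin parameter names used in Step 4 (`language`, `target_language_code`, `speaker`):

```python
session = AgentSession(
    vad=silero.VAD.load(),
    stt=sarvam.STT(language="hi-IN"),            # transcribe Hindi speech
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=sarvam.TTS(target_language_code="hi-IN", # speak Hindi back
                   speaker="anushka"),
)
```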
Example 2: Tamil Voice Agent
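The same pattern works for Tamil; only the language codes change (any Bulbul v2 speaker listed below should work here):

```python
session = AgentSession(
    vad=silero.VAD.load(),
    stt=sarvam.STT(language="ta-IN"),            # transcribe Tamil speech
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=sarvam.TTS(target_language_code="ta-IN", # speak Tamil back
                   speaker="vidya"),
)
```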
Example 3: Multilingual Agent (Auto-detect)
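For a multilingual agent, pass `language="unknown"` to the STT (as noted in Pro Tips below, this enables automatic language detection):

```python
session = AgentSession(
    vad=silero.VAD.load(),
    stt=sarvam.STT(language="unknown"),          # auto-detect the spoken language
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=sarvam.TTS(target_language_code="hi-IN", # pick a default TTS language
                   speaker="anushka"),
)
```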
Example 4: Speech-to-English Agent (Saaras)
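A sketch of selecting Saaras instead of Saarika, assuming the plugin exposes the choice through the STT `model` parameter (check the plugin documentation for the exact model identifier):

```python
session = AgentSession(
    vad=silero.VAD.load(),
    stt=sarvam.STT(model="saaras:v2.5"),          # translate any spoken Indian language to English text
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=sarvam.TTS(target_language_code="en-IN",  # respond in English
                   speaker="anushka"),
)
```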
Difference: Saarika transcribes speech to text in the same language, while Saaras translates speech directly to English text. Use Saaras when users speak Indian languages but you want to process and respond in English.
Note: Saaras automatically detects the source language (Hindi, Tamil, etc.) and translates spoken content directly to English text, making Indian language speech comprehensible to English-based LLMs.
Available Options
Language Codes
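The Sarvam plugins use BCP-47-style codes. Commonly supported codes include the following (check the Sarvam documentation for the authoritative list):

- `en-IN` (English), `hi-IN` (Hindi), `bn-IN` (Bengali), `gu-IN` (Gujarati), `kn-IN` (Kannada), `ml-IN` (Malayalam), `mr-IN` (Marathi), `od-IN` (Odia), `pa-IN` (Punjabi), `ta-IN` (Tamil), `te-IN` (Telugu)
- `unknown` - automatic language detection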
Speaker Voices (Bulbul v2)
Female Voices:
- `anushka` - Clear and professional (default)
- `manisha` - Warm and friendly
- `vidya` - Articulate and precise
- `arya` - Young and energetic
Male Voices:
- `abhilash` - Deep and authoritative
- `karun` - Natural and conversational
- `hitesh` - Professional and engaging
Pro Tips
- Use `language="unknown"` to automatically detect the language. Great for multilingual scenarios!
- Sarvam's models understand code-mixing, so your agent can naturally handle Hinglish, Tanglish, and other mixed languages.
Best Practices
When using Sarvam AI plugins with LiveKit, follow these recommendations for optimal performance:
1. Do Not Pass VAD to AgentSession
The vad parameter should not be passed to AgentSession as Voice Activity Detection is handled internally by the Sarvam plugin.
2. Enable Flush Signal in STT
Add flush_signal=True to the STT configuration. This enables the plugin to emit start and end of speech events, which is essential for proper turn-taking.
3. Set Turn Detection to STT
Add turn_detection="stt" to the AgentSession configuration. This ensures turn detection is handled by the Sarvam plugin, which emits start and end of speech signals.
4. Configure Min Endpointing Delay
Set min_endpointing_delay=0.07 in your AgentSession. The Sarvam STT plugin has a processing latency of approximately 70ms. This setting ensures the agent transitions to the next pipeline step (LLM) as soon as STT finishes processing, minimizing response delay.
Complete Optimized Example
Here’s a complete example incorporating all best practices:
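A sketch of the fully optimized agent, applying the four recommendations above (no `vad`, `flush_signal=True`, `turn_detection="stt"`, `min_endpointing_delay=0.07`). As with the earlier example, the plugin parameter names are assumptions to verify against your installed version:

```python
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai, sarvam

# Load LIVEKIT_*, SARVAM_API_KEY, and OPENAI_API_KEY from .env
load_dotenv()


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        # Best practice 1: no vad= here; the Sarvam plugin handles VAD internally
        stt=sarvam.STT(
            language="hi-IN",
            flush_signal=True,  # Best practice 2: emit start/end-of-speech events
        ),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=sarvam.TTS(target_language_code="hi-IN", speaker="anushka"),
        turn_detection="stt",        # Best practice 3: Sarvam STT drives turn-taking
        min_endpointing_delay=0.07,  # Best practice 4: ~70 ms STT processing latency
    )

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a friendly, concise voice assistant."),
    )

    await session.generate_reply(instructions="Greet the user and offer to help.")


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

Like the basic example, this requires valid API keys and a LiveKit project to actually run.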
Troubleshooting
API key errors: Check that all keys are in your .env file and the file is in the same directory as your script.
Module not found: Run the installation command again based on your operating system (see Step 2 above).
Poor transcription: Try language="unknown" for auto-detection, or specify the correct language code (en-IN, hi-IN, etc.).
Additional Resources
Need Help?
- Sarvam Support: developer@sarvam.ai
- Community: Join the Discord Community
Happy Building!