Build Your First Voice Agent using LiveKit

Overview

This guide shows how to build a real-time voice agent that can listen, understand, and respond naturally, using LiveKit for real-time communication and Sarvam AI for speech processing. It is a good fit for voice assistants, customer support bots, and other conversational AI applications in Indian languages.

What You’ll Build

A voice agent that can:

Listen to users speaking (in multiple Indian languages!)
Understand and process their requests
Respond back in natural-sounding voices

Quick Overview

Get API keys (LiveKit, Sarvam, OpenAI)
Install packages: pip install "livekit-agents[sarvam,openai,silero]" python-dotenv
Create .env file with your API keys
Write a short Python script (agent.py)
Run: python agent.py dev
Test: python agent.py console (talk to the agent from your terminal)

Quick Start

1. Prerequisites

Python 3.9 or higher
API keys from:
- LiveKit Cloud (free account)
- Sarvam AI (get an API key from the dashboard)
- OpenAI (create a new secret key)

2. Install Dependencies

$ pip install "livekit-agents[sarvam,openai,silero]" python-dotenv

The quotes stop shells such as zsh from treating the square brackets as a glob pattern.
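If you want to confirm the install worked before going further, a small standard-library script can ask Python which distributions are present. This is a sketch that assumes the extras above pull in distributions named livekit-plugins-sarvam, livekit-plugins-openai, and livekit-plugins-silero; adjust the list if your pip output shows different names.

```python
"""Check that the packages from the install step are present (stdlib only)."""
from importlib import metadata

# Assumed distribution names installed by
# pip install "livekit-agents[sarvam,openai,silero]" python-dotenv
REQUIRED = [
    "livekit-agents",
    "livekit-plugins-sarvam",
    "livekit-plugins-openai",
    "livekit-plugins-silero",
    "python-dotenv",
]


def missing_packages(names):
    """Return the subset of `names` that is not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_packages(REQUIRED)
    if gaps:
        print("Missing:", ", ".join(gaps))
    else:
        print("All packages installed.")
```

Run it with the same Python interpreter (or virtual environment) you will use for agent.py; otherwise it checks the wrong environment.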

3. Create Environment File

Create a file named .env in your project folder and add your API keys:

```
LIVEKIT_URL=wss://your-project-xxxxx.livekit.cloud
LIVEKIT_API_KEY=APIxxxxxxxxxxxxx
LIVEKIT_API_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
SARVAM_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxx
```

Replace the values with your actual API keys.
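A common mistake is leaving one of the placeholder values in place. The sketch below reads .env with a deliberately minimal KEY=VALUE parser (much simpler than python-dotenv: no multi-line values or `export` prefixes) and flags keys that are missing, empty, or still contain the `xxxx` placeholder pattern used in the template. The placeholder check is a heuristic assumption, not something LiveKit or Sarvam enforce.

```python
"""Sanity-check a .env file before starting the agent (stdlib only)."""
from pathlib import Path

REQUIRED_KEYS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "SARVAM_API_KEY",
    "OPENAI_API_KEY",
)


def parse_env(text):
    """Minimal KEY=VALUE parser: skips blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def env_problems(values):
    """List keys that are missing, empty, or still hold an xxxx placeholder."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing")
        elif "xxxx" in value:
            problems.append(f"{key} still looks like a placeholder")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print(".env not found in", Path.cwd())
    else:
        for problem in env_problems(parse_env(env_file.read_text())) or ["OK"]:
            print(problem)
```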

4. Write Your Agent

Create agent.py:

```python
import logging

from dotenv import load_dotenv
from livekit.agents import JobContext, WorkerOptions, cli
from livekit.agents.voice import Agent, AgentSession
from livekit.plugins import openai, sarvam, silero

# Load API keys from .env into environment variables
load_dotenv()

# Set up logging
logger = logging.getLogger("voice-agent")
logger.setLevel(logging.INFO)


class VoiceAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            # Your agent's personality and instructions
            instructions="""
                You are a helpful voice assistant.
                Be friendly, concise, and conversational.
                Speak naturally as if you're having a real conversation.
            """,

            # Saarika STT: converts the user's speech to text
            stt=sarvam.STT(
                language="unknown",  # Auto-detect language, or use "en-IN", "hi-IN", etc.
                model="saarika:v2.5",
            ),

            # OpenAI LLM: the "brain" that processes requests and generates responses
            llm=openai.LLM(model="gpt-4o"),

            # Bulbul TTS: converts the response text to speech
            tts=sarvam.TTS(
                target_language_code="en-IN",
                model="bulbul:v2",
                speaker="anushka",  # Female: anushka, manisha, vidya, arya | Male: abhilash, karun, hitesh
            ),
        )

    async def on_enter(self):
        """Called when this agent becomes active in the session; the agent speaks first."""
        self.session.generate_reply()


async def entrypoint(ctx: JobContext):
    """Main entry point: LiveKit calls this when the agent is dispatched to a room."""
    await ctx.connect()
    logger.info(f"Agent connected to room: {ctx.room.name}")

    # Silero VAD (installed via the [silero] extra) detects when the user
    # starts and stops speaking, so speech can be split into turns for STT
    session = AgentSession(vad=silero.VAD.load())
    await session.start(
        agent=VoiceAgent(),
        room=ctx.room,
    )


if __name__ == "__main__":
    # Start the worker; the CLI provides the dev and console subcommands
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
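Conceptually, every conversational turn in agent.py flows through three stages in order: speech to text (Saarika), text to a reply (the LLM), and reply to speech (Bulbul). The toy model below makes that data flow explicit. The three stage functions are hypothetical stubs, not LiveKit or Sarvam APIs; the real AgentSession streams audio, handles interruptions, and manages turn-taking for you.

```python
"""Toy model of one voice-agent turn: STT -> LLM -> TTS (hypothetical stubs)."""
import asyncio


async def fake_stt(audio: bytes) -> str:
    # Stand-in for Saarika: pretend the audio bytes are UTF-8 text.
    return audio.decode("utf-8")


async def fake_llm(instructions: str, user_text: str) -> str:
    # Stand-in for the LLM: a real model would use `instructions` as its
    # system prompt; this stub just echoes the user's words.
    return f"You said: {user_text}"


async def fake_tts(text: str) -> bytes:
    # Stand-in for Bulbul: "synthesize" by encoding the text as bytes.
    return text.encode("utf-8")


async def run_turn(audio: bytes, instructions: str) -> bytes:
    """Pass one utterance through the three stages in order."""
    transcript = await fake_stt(audio)
    reply = await fake_llm(instructions, transcript)
    return await fake_tts(reply)


if __name__ == "__main__":
    out = asyncio.run(run_turn(b"namaste", "You are a helpful voice assistant."))
    print(out.decode("utf-8"))
```

Because each stage only depends on the previous stage's output, you can swap one component (for example, a different LLM) without touching the other two, which is why the customization examples only change individual constructor arguments.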

5. Run Your Agent

$ python agent.py dev

6. Test Your Agent

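Console mode runs the agent directly in your terminal, using your computer's microphone and speakers. In a new terminal, run: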

$ python agent.py console

That’s it! You’ve built your first voice agent!

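Customization Examples

Each example below shows replacement values for the stt and tts arguments (and, in Example 4, llm) that VoiceAgent passes to super().__init__().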

Example 1: Hindi Voice Agent

```python
stt=sarvam.STT(
    language="hi-IN",  # Hindi
    model="saarika:v2.5",
),
tts=sarvam.TTS(
    target_language_code="hi-IN",
    model="bulbul:v2",
    speaker="manisha",  # Or: anushka, vidya, arya, abhilash, karun, hitesh
),
```

Example 2: Tamil Voice Agent

```python
stt=sarvam.STT(language="ta-IN", model="saarika:v2.5"),
tts=sarvam.TTS(
    target_language_code="ta-IN",
    model="bulbul:v2",
    speaker="anushka",
),
```

Example 3: Multilingual Agent (Auto-detect)

```python
stt=sarvam.STT(language="unknown", model="saarika:v2.5"),  # Auto-detects the user's language
tts=sarvam.TTS(target_language_code="en-IN", model="bulbul:v2", speaker="karun"),  # TTS language stays fixed at en-IN
```

Example 4: Speech-to-English Agent (Saaras)

Difference: Saarika transcribes speech to text in the same language, while Saaras translates speech directly into English text. Use Saaras when users speak an Indian language but you want the LLM to process and respond in English.

```python
# User speaks Hindi → Saaras converts to English → LLM processes → Responds in English

stt=sarvam.STT(model="saaras:v2.5"),  # Speech-to-English translation
llm=openai.LLM(model="gpt-4o"),
tts=sarvam.TTS(target_language_code="en-IN", model="bulbul:v2", speaker="abhilash"),
```

Note: Saaras automatically detects the source language (Hindi, Tamil, etc.) and translates the spoken content directly into English text, so the LLM always receives English input even when the user speaks an Indian language.
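The choice between Saarika and Saaras can be captured in one place. The helper below is hypothetical (it is not part of the Sarvam plugin); it simply returns the keyword arguments shown in the examples above, so you could write `stt=sarvam.STT(**stt_settings(translate_to_english=True))` in VoiceAgent.

```python
"""Hypothetical helper: choose Saarika or Saaras STT settings."""


def stt_settings(translate_to_english: bool, language: str = "unknown") -> dict:
    """Return keyword arguments for sarvam.STT(...), mirroring this guide's examples.

    - Saarika (saarika:v2.5) transcribes in the speaker's own language;
      `language` is a code such as "hi-IN", or "unknown" to auto-detect.
    - Saaras (saaras:v2.5) translates speech into English text and detects
      the source language itself, so no language argument is passed.
    """
    if translate_to_english:
        return {"model": "saaras:v2.5"}
    return {"model": "saarika:v2.5", "language": language}
```

Keeping the decision in a single function makes it easy to switch a deployment between "respond in the user's language" and "always work in English" from configuration.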

Available Options

Language Codes

| Language | Code |
|---|---|
| English (India) | `en-IN` |
| Hindi | `hi-IN` |
| Bengali | `bn-IN` |
| Tamil | `ta-IN` |
| Telugu | `te-IN` |
| Gujarati | `gu-IN` |
| Kannada | `kn-IN` |
| Malayalam | `ml-IN` |
| Marathi | `mr-IN` |
| Punjabi | `pa-IN` |
| Odia | `od-IN` |
| Auto-detect | `unknown` |
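A typo in a language code tends to show up only as poor transcriptions at runtime. A small lookup built from the codes in the table above lets you accept either a language name or a code and fail fast on anything else. The function name and the "auto" alias are illustrative choices, not part of the Sarvam plugin.

```python
"""Resolve a language name or code to a Sarvam code (codes from the table above)."""

LANGUAGE_CODES = {
    "english": "en-IN",
    "hindi": "hi-IN",
    "bengali": "bn-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
    "gujarati": "gu-IN",
    "kannada": "kn-IN",
    "malayalam": "ml-IN",
    "marathi": "mr-IN",
    "punjabi": "pa-IN",
    "odia": "od-IN",
    "auto": "unknown",  # illustrative alias for auto-detect
}


def language_code(name_or_code: str) -> str:
    """Accept a name ("Tamil") or an exact code ("ta-IN") and return the code.

    Raises ValueError for anything not in the table, so mistakes surface
    at startup instead of as poor transcriptions later.
    """
    key = name_or_code.strip()
    if key in LANGUAGE_CODES.values():
        return key
    try:
        return LANGUAGE_CODES[key.lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {name_or_code!r}") from None
```

For example, `sarvam.STT(language=language_code("Tamil"), model="saarika:v2.5")` would pass `ta-IN`.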

Speaker Voices (Bulbul v2)

Female Voices:

anushka - Clear and professional (default)
manisha - Warm and friendly
vidya - Articulate and precise
arya - Young and energetic

Male Voices:

abhilash - Deep and authoritative
karun - Natural and conversational
hitesh - Professional and engaging
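Speaker names are plain strings, so a misspelling is easy to miss. The sketch below validates a name against the Bulbul v2 voices listed above, assuming the API expects the lowercase names exactly as written in this guide; the helper itself is hypothetical.

```python
"""Validate a Bulbul v2 speaker name (names copied from the lists above)."""

BULBUL_V2_SPEAKERS = {
    "female": ("anushka", "manisha", "vidya", "arya"),
    "male": ("abhilash", "karun", "hitesh"),
}


def check_speaker(name: str) -> str:
    """Return the lowercase speaker name, or raise ValueError if it is unknown."""
    speaker = name.strip().lower()
    for voices in BULBUL_V2_SPEAKERS.values():
        if speaker in voices:
            return speaker
    known = ", ".join(v for group in BULBUL_V2_SPEAKERS.values() for v in group)
    raise ValueError(f"Unknown Bulbul v2 speaker {name!r}; choose one of: {known}")
```

You could then write `speaker=check_speaker("Anushka")` when constructing sarvam.TTS.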

Pro Tips

Use language="unknown" to automatically detect the language. Great for multilingual scenarios!
Sarvam’s models understand code-mixing, so your agent can naturally handle Hinglish, Tanglish, and other mixed-language speech.

Troubleshooting

API key errors: Check that all keys are in your .env file and the file is in the same directory as your script.

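Module not found: Run pip install "livekit-agents[sarvam,openai,silero]" python-dotenv again, in the same Python environment (for example, the same virtual environment) you use to run agent.py.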

Poor transcription: Try language="unknown" for auto-detection, or specify the correct language code (en-IN, hi-IN, etc.).

Need Help?

Sarvam Support: support@sarvam.ai
Community: Join the Discord Community

Happy Building!