Build Your First Voice Agent using LiveKit

Overview

This guide shows how to build a real-time voice agent that listens, understands, and responds naturally, using LiveKit for real-time communication and Sarvam AI for speech processing. The same approach works for voice assistants, customer support bots, and conversational AI applications in Indian languages.

What You’ll Build

A voice agent that can:

  • Listen to users speaking (in multiple Indian languages!)
  • Understand and process their requests
  • Respond back in natural-sounding voices

Quick Overview

  1. Get API keys (LiveKit, Sarvam, OpenAI)
  2. Install packages: pip install "livekit-agents[sarvam,openai,silero]" python-dotenv
  3. Create a .env file with your API keys
  4. Write ~60 lines of Python code
  5. Run: python agent.py dev
  6. Test: python agent.py console

Quick Start

1. Prerequisites

You need API keys for LiveKit, Sarvam, and OpenAI, plus a working Python environment.

2. Install Dependencies

$ pip install "livekit-agents[sarvam,openai,silero]" python-dotenv

3. Create Environment File

Create a file named .env in your project folder and add your API keys:

LIVEKIT_URL=wss://your-project-xxxxx.livekit.cloud
LIVEKIT_API_KEY=APIxxxxxxxxxxxxx
LIVEKIT_API_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
SARVAM_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxx

Replace the values with your actual API keys.
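
A missing or blank key usually surfaces only later as an authentication error, so it can help to check the environment up front. The following is a minimal sketch, not part of LiveKit or Sarvam: the helper name missing_env_vars is hypothetical, and the variable names are the ones listed in the .env file above.

```python
import os

# Variable names taken from the .env file shown above.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "SARVAM_API_KEY",
    "OPENAI_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

In agent.py, a check like this would run after load_dotenv(), so that values from the .env file are already in the environment.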

4. Write Your Agent

Create agent.py:

import logging
from dotenv import load_dotenv
from livekit.agents import JobContext, WorkerOptions, cli
from livekit.agents.voice import Agent, AgentSession
from livekit.plugins import openai, sarvam

# Load environment variables
load_dotenv()

# Set up logging
logger = logging.getLogger("voice-agent")
logger.setLevel(logging.INFO)


class VoiceAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            # Your agent's personality and instructions
            instructions="""
                You are a helpful voice assistant.
                Be friendly, concise, and conversational.
                Speak naturally as if you're having a real conversation.
            """,

            # Saarika STT - Converts speech to text
            stt=sarvam.STT(
                language="unknown",  # Auto-detect language, or use "en-IN", "hi-IN", etc.
                model="saarika:v2.5"
            ),

            # OpenAI LLM - The "brain" that processes and generates responses
            llm=openai.LLM(model="gpt-4o"),

            # Bulbul TTS - Converts text to speech
            tts=sarvam.TTS(
                target_language_code="en-IN",
                model="bulbul:v2",
                speaker="anushka"  # Female: anushka, manisha, vidya, arya | Male: abhilash, karun, hitesh
            ),
        )

    async def on_enter(self):
        """Called when user joins - agent starts the conversation"""
        self.session.generate_reply()


async def entrypoint(ctx: JobContext):
    """Main entry point - LiveKit calls this when a user connects"""
    logger.info(f"User connected to room: {ctx.room.name}")

    # Create and start the agent session
    session = AgentSession()
    await session.start(
        agent=VoiceAgent(),
        room=ctx.room
    )


if __name__ == "__main__":
    # Run the agent
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

5. Run Your Agent

$ python agent.py dev

6. Test Your Agent

In a new terminal, run:

$ python agent.py console

That’s it! You’ve built your first voice agent!


Customization Examples

Example 1: Hindi Voice Agent

stt=sarvam.STT(
    language="hi-IN",  # Hindi
    model="saarika:v2.5"
),
tts=sarvam.TTS(
    target_language_code="hi-IN",
    model="bulbul:v2",
    speaker="manisha"  # Or: anushka, vidya, arya, abhilash, karun, hitesh
)

Example 2: Tamil Voice Agent

stt=sarvam.STT(language="ta-IN", model="saarika:v2.5"),
tts=sarvam.TTS(
    target_language_code="ta-IN",
    model="bulbul:v2",
    speaker="anushka"
)

Example 3: Multilingual Agent (Auto-detect)

stt=sarvam.STT(language="unknown", model="saarika:v2.5"),  # Auto-detects language
tts=sarvam.TTS(target_language_code="en-IN", model="bulbul:v2", speaker="karun")

Example 4: Speech-to-English Agent (Saaras)

Difference: Saarika transcribes speech to text in the same language, while Saaras translates speech directly to English text. Use Saaras when the user speaks an Indian language but you want to process and respond in English.

# User speaks Hindi → Saaras converts to English → LLM processes → Responds in English

stt=sarvam.STT(model="saaras:v2.5"),  # Speech-to-English translation
llm=openai.LLM(model="gpt-4o"),
tts=sarvam.TTS(target_language_code="en-IN", model="bulbul:v2", speaker="abhilash")

Note: Saaras automatically detects the source language (Hindi, Tamil, etc.) and translates spoken content directly to English text, making Indian language speech comprehensible to English-based LLMs.
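
The choice between Saarika and Saaras comes down to a single decision: should the transcript stay in the spoken language or arrive in English? The sketch below captures that decision as a plain function returning keyword arguments. The helper name stt_settings is hypothetical; the model names and arguments are copied from the examples above, and the function does not call the Sarvam plugin itself.

```python
def stt_settings(translate_to_english, language="unknown"):
    """Return keyword arguments for sarvam.STT, mirroring the examples above.

    Hypothetical helper for illustration only.
    """
    if translate_to_english:
        # Saaras: speech in an Indian language -> English text.
        # Example 4 passes only the model, so no language is set here.
        return {"model": "saaras:v2.5"}
    # Saarika: speech -> text in the same language ("unknown" = auto-detect).
    return {"language": language, "model": "saarika:v2.5"}


# Intended use inside the agent (requires the Sarvam plugin):
#   stt=sarvam.STT(**stt_settings(translate_to_english=True))
print(stt_settings(True))
print(stt_settings(False, language="hi-IN"))
```

Keeping the decision in one place means the LLM prompt and the TTS language can be switched alongside it, for example responding in en-IN whenever Saaras is selected.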


Available Options

Language Codes

Language          Code
English (India)   en-IN
Hindi             hi-IN
Bengali           bn-IN
Tamil             ta-IN
Telugu            te-IN
Gujarati          gu-IN
Kannada           kn-IN
Malayalam         ml-IN
Marathi           mr-IN
Punjabi           pa-IN
Odia              od-IN
Auto-detect       unknown

Speaker Voices (Bulbul v2)

Female Voices:

  • anushka - Clear and professional (default)
  • manisha - Warm and friendly
  • vidya - Articulate and precise
  • arya - Young and energetic

Male Voices:

  • abhilash - Deep and authoritative
  • karun - Natural and conversational
  • hitesh - Professional and engaging
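
A typo in a language code or speaker name is easy to make and may only show up once the agent tries to speak. As a rough sketch, the options listed above can be checked before building the TTS. The helper tts_settings is hypothetical, and the values are copied from the tables in this section. It assumes "unknown" applies to speech recognition only, since every TTS example in this guide uses a concrete language code.

```python
# Values copied from the Language Codes and Speaker Voices lists above.
# Assumption: "unknown" (auto-detect) is left out because it is only used
# for STT in this guide's examples.
TTS_LANGUAGE_CODES = (
    "en-IN", "hi-IN", "bn-IN", "ta-IN", "te-IN", "gu-IN",
    "kn-IN", "ml-IN", "mr-IN", "pa-IN", "od-IN",
)
FEMALE_VOICES = ("anushka", "manisha", "vidya", "arya")
MALE_VOICES = ("abhilash", "karun", "hitesh")


def tts_settings(language_code, speaker="anushka"):
    """Validate the options and return keyword arguments for sarvam.TTS.

    Hypothetical helper for illustration only.
    """
    if language_code not in TTS_LANGUAGE_CODES:
        raise ValueError(f"Unsupported TTS language code: {language_code!r}")
    if speaker not in FEMALE_VOICES + MALE_VOICES:
        raise ValueError(f"Unknown Bulbul v2 speaker: {speaker!r}")
    return {
        "target_language_code": language_code,
        "model": "bulbul:v2",
        "speaker": speaker,
    }


# Intended use inside the agent (requires the Sarvam plugin):
#   tts=sarvam.TTS(**tts_settings("hi-IN", "manisha"))
print(tts_settings("hi-IN", "manisha"))
```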

Pro Tips

  • Use language="unknown" to automatically detect the language. Great for multilingual scenarios!
  • Sarvam’s models understand code-mixing - your agent can naturally handle Hinglish, Tanglish, and other mixed languages.

Troubleshooting

API key errors: Check that all keys are in your .env file and the file is in the same directory as your script.

Module not found: Run pip install "livekit-agents[sarvam,openai,silero]" python-dotenv again.

Poor transcription: Try language="unknown" for auto-detection, or specify the correct language code (en-IN, hi-IN, etc.).
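
The first two checks above can be automated with the standard library. This is a minimal sketch with a hypothetical helper name, diagnose. It assumes the packages from the install command are importable as livekit and dotenv, and that the .env file should sit next to agent.py.

```python
import importlib.util
from pathlib import Path


def diagnose(script_dir, modules=("livekit", "dotenv")):
    """Return a list of human-readable problems found in the setup.

    Hypothetical helper for illustration; only top-level module names
    should be passed in `modules`.
    """
    problems = []
    env_file = Path(script_dir) / ".env"
    if not env_file.is_file():
        problems.append(f"No .env file found at {env_file}")
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Module not installed: {name}")
    return problems


if __name__ == "__main__":
    # Check the current working directory, where agent.py is expected to live.
    for problem in diagnose(Path.cwd()):
        print(problem)
```

Run it from the project folder; an empty result means the .env file is present and both packages can be imported.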


Happy Building!