This guide demonstrates how to build a real-time voice agent that can listen, understand, and respond naturally, using Pipecat for real-time communication and Sarvam AI for speech processing. The same setup suits voice assistants, customer support bots, and conversational AI applications for Indian languages.
A voice agent that can:
pip install pipecat-ai[daily,openai,sarvam] python-dotenv

.env file with your API keys

Create a file named .env in your project folder and add your API keys:
Replace the values with your actual API keys.
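Before starting the agent, it can help to confirm that every key is actually set. The sketch below uses only the standard library; the variable names `SARVAM_API_KEY`, `OPENAI_API_KEY`, and `DAILY_API_KEY` are assumptions, so match them to whatever names your own .env file uses.

```python
import os

# Assumed variable names; change them to match the keys in your .env file.
REQUIRED_KEYS = ("SARVAM_API_KEY", "OPENAI_API_KEY", "DAILY_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or blank in env."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]


def describe_missing(missing):
    """Build a short, readable message for the keys that were not found."""
    if not missing:
        return "All API keys found."
    return "Missing API keys: " + ", ".join(missing)
```

One possible use: call `load_dotenv()` from python-dotenv at the top of your script so the values from .env are loaded into the environment, then print `describe_missing(missing_keys())` and stop early if anything is reported.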
Create agent.py:
For Daily transport:
The agent will create a Daily room and provide you with a URL to join.
Open the provided Daily room URL in your browser and start speaking. Your voice agent will listen and respond!
Difference: Saarika transcribes speech to text in the same language, while Saaras translates speech directly to English text. Use Saaras when the user speaks an Indian language but you want to process and respond in English.
Note: Saaras automatically detects the source language (Hindi, Tamil, etc.) and translates spoken content directly to English text, making Indian language speech comprehensible to English-based LLMs.
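The choice between the two models comes down to one question: should the text that reaches the LLM be in English? The helper below is a hypothetical illustration of that decision. The returned strings are model family labels only, not exact model identifiers, so check the Sarvam documentation for the identifier and version to pass to the service.

```python
def pick_stt_family(respond_in_english: bool) -> str:
    """Pick the Sarvam speech model family for a conversation.

    "saarika" keeps the transcript in the spoken language.
    "saaras" translates the speech to English text.
    """
    return "saaras" if respond_in_english else "saarika"


# A Hindi speaker talking to an English-only LLM prompt.
print(pick_stt_family(respond_in_english=True))   # saaras

# A Hindi speaker who should be answered in Hindi.
print(pick_stt_family(respond_in_english=False))  # saarika
```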
Male (23): Shubh (default), Aditya, Rahul, Rohan, Amit, Dev, Ratan, Varun, Manan, Sumit, Kabir, Aayan, Ashutosh, Advait, Anand, Tarun, Sunny, Mani, Gokul, Vijay, Mohit, Rehan, Soham
Female (14): Ritu, Priya, Neha, Pooja, Simran, Kavya, Ishita, Shreya, Roopa, Tanya, Shruti, Suhani, Kavitha, Rupali
You can customize the TTS service with additional parameters:
Pipecat uses a pipeline architecture where data flows through a series of processors:
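The idea of data flowing through a series of processors can be sketched in plain Python. This is a conceptual toy, not Pipecat's API: the real pipeline is asynchronous and passes frames between processors, and the three `fake_*` functions are hypothetical stand-ins for the speech-to-text, LLM, and text-to-speech stages.

```python
from typing import Callable, List

# A processor takes one item and returns the transformed item.
Processor = Callable[[str], str]


def run_pipeline(processors: List[Processor], item: str) -> str:
    """Pass item through each processor in order and return the result."""
    for process in processors:
        item = process(item)
    return item


# Hypothetical stand-ins for the real stages.
def fake_stt(audio: str) -> str:
    # Pretend to transcribe: relabel the audio payload as text.
    return audio.replace("audio:", "text:", 1)


def fake_llm(text: str) -> str:
    # Pretend to generate a reply to the transcript.
    return text + " -> reply"


def fake_tts(text: str) -> str:
    # Pretend to synthesize speech from the reply.
    return "speech(" + text + ")"


result = run_pipeline([fake_stt, fake_llm, fake_tts], "audio:hello")
print(result)  # speech(text:hello -> reply)
```

The order of the list is the order of processing, which is why swapping a stage (for example, a different speech-to-text model) does not require changes to the stages around it.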
Use language="unknown" to automatically detect the language. Great for multilingual scenarios!
Adjust pace to customize the voice delivery speed.
Use gpt-4o-mini for faster responses, or gpt-4o for more complex conversations.

API key errors: Check that all keys are in your .env file and the file is in the same directory as your script.
Module not found: Run the installation command again based on your operating system (see Step 2 above).
Poor transcription: Try language="unknown" for auto-detection, or specify the correct language code (en-IN, hi-IN, etc.).
Connection issues: Ensure you have a stable internet connection and the transport is properly configured.
Happy Building!