Tutor Agent using Pipecat
Overview
This guide demonstrates how to build a voice-based tutor agent that can teach, explain concepts, and help students with various subjects using Pipecat for real-time communication and Sarvam AI for speech processing. Perfect for EdTech platforms, online tutoring, and educational applications serving Indian students.
What You’ll Build
A tutor agent that can:
- Explain concepts in simple, student-friendly language
- Help students solve problems step by step
- Answer questions across various subjects
- Adapt explanations to the student’s level of understanding
- Communicate in multiple Indian languages
Quick Overview
- Get API keys (Sarvam, OpenAI)
- Install packages
- Create
.envfile with your API keys - Write the agent code
- Run with appropriate transport
Quick Start
1. Prerequisites
- Python 3.9 or higher
- API keys from:
2. Install Dependencies
macOS/Linux
Windows
3. Create Environment File
Create a file named .env in your project folder and add your API keys:
Replace the values with your actual API keys.
4. Write Your Agent
Create tutor_agent.py:
5. Run Your Agent
The agent will create a Daily room and provide you with a URL to join.
6. Test Your Agent
Open the provided Daily room URL in your browser and start speaking. Your tutor will listen and respond!
Customization Examples
Example 1: Hindi Tutor
For Hindi-medium students:
Example 2: Tamil Tutor
Example 3: Multilingual Tutor (Auto-detect)
For diverse student populations:
Example 4: Speech-to-English Tutor (Saaras)
When students speak in regional languages but you want English processing:
Available Options
Language Codes
Speaker Voices (Bulbul v2)
Female Voices:
anushka- Clear and professional (default)manisha- Warm and friendlyvidya- Articulate and precise (recommended for teaching)arya- Young and energetic
Male Voices:
abhilash- Deep and authoritativekarun- Natural and conversationalhitesh- Professional and engaging
TTS Additional Parameters
Customize the voice for better teaching experience:
Understanding the Pipeline
Pipecat uses a pipeline architecture where data flows through a series of processors:
- Transport Input: Receives audio from the student
- STT (Speech-to-Text): Converts audio to text using Sarvam’s Saarika
- Context Aggregator (User): Adds student’s question to conversation context
- LLM: Generates educational response using OpenAI
- TTS (Text-to-Speech): Converts response to audio using Sarvam’s Bulbul
- Transport Output: Sends audio back to the student
- Context Aggregator (Assistant): Saves tutor’s response to context
Pro Tips
- Use
language="unknown"to support students who code-mix (Hinglish, Tanglish, etc.) - Use a clear, articulate voice like
vidyafor teaching - Set a slightly slower pace (0.9) for complex explanations
- Use
gpt-4ofor better reasoning on complex problems - Encourage students to ask follow-up questions
Troubleshooting
API key errors: Check that all keys are in your .env file and the file is in the same directory as your script.
Module not found: Run the installation command again based on your operating system.
Poor transcription: Try language="unknown" for auto-detection, or specify the correct language code.
Connection issues: Ensure you have a stable internet connection and the transport is properly configured.
Additional Resources
Need Help?
- Sarvam Support: developer@sarvam.ai
- Community: Join the Discord Community
Happy Building!