Tutor Agent using Pipecat

Overview

This guide shows how to build a voice-based tutor agent that teaches, explains concepts, and helps students across subjects, using Pipecat for real-time communication and Sarvam AI for speech processing. It is suited to EdTech platforms, online tutoring services, and educational applications serving Indian students.

What You’ll Build

A tutor agent that can:

- Explain concepts in simple, student-friendly language
- Help students solve problems step by step
- Answer questions across various subjects
- Adapt explanations to the student’s level of understanding
- Communicate in multiple Indian languages

Quick Overview

1. Get API keys (Sarvam, OpenAI)
2. Install packages
3. Create a .env file with your API keys
4. Write the agent code
5. Run with the appropriate transport

Quick Start

1. Prerequisites

- Python 3.10 or higher (recent Pipecat releases no longer support Python 3.9)
- API keys from:
  - Sarvam AI (get an API key from the dashboard)
  - OpenAI (create a new secret key)

2. Install Dependencies

$ pip install "pipecat-ai[daily,openai]" python-dotenv loguru

3. Create Environment File

Create a file named .env in your project folder and add your API keys:

SARVAM_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxx

Replace the values with your actual API keys.
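
A missing or blank key usually shows up only later, as an authentication error from the service. As a minimal sketch, a startup check like the one below can fail fast instead. The helper name find_missing_keys is hypothetical (it is not part of Pipecat or python-dotenv), and it takes the environment as a plain mapping so it can be tried without a real .env file.

```python
import os
from typing import Iterable, List, Mapping

# Keys this guide's .env file is expected to define.
REQUIRED_KEYS = ("SARVAM_API_KEY", "OPENAI_API_KEY")


def find_missing_keys(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in required if not env.get(key, "").strip()]


if __name__ == "__main__":
    # In the agent, run this after load_dotenv() so .env values are visible.
    missing = find_missing_keys(os.environ)
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Calling the check at the top of the bot function, before any service is created, keeps the error message close to its cause.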

4. Write Your Agent

Create tutor_agent.py:

import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.sarvam.stt import SarvamSTTService
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.daily.transport import DailyParams

load_dotenv(override=True)

async def bot(runner_args: RunnerArguments):
    """Main bot entry point."""

    # Create transport (supports both Daily and WebRTC)
    transport = await create_transport(
        runner_args,
        {
            "daily": lambda: DailyParams(audio_in_enabled=True, audio_out_enabled=True),
            "webrtc": lambda: TransportParams(
                audio_in_enabled=True, audio_out_enabled=True
            ),
        },
    )

    # Initialize AI services
    stt = SarvamSTTService(
        api_key=os.getenv("SARVAM_API_KEY"),
        language="unknown",  # Auto-detect for multilingual students
        model="saarika:v2.5"
    )

    tts = SarvamTTSService(
        api_key=os.getenv("SARVAM_API_KEY"),
        target_language_code="en-IN",
        model="bulbul:v2",
        speaker="vidya"  # Clear and articulate voice for teaching
    )

    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

    # Set up conversation context with tutor personality
    messages = [
        {
            "role": "system",
            "content": """You are an expert tutor designed to help students understand and excel in their studies.

Your teaching expertise covers multiple subjects:

**Mathematics:**
- Arithmetic, Algebra, Geometry, Trigonometry
- Calculus, Statistics, Probability
- Problem-solving techniques

**Science:**
- Physics: Mechanics, Electricity, Optics, Thermodynamics
- Chemistry: Elements, Reactions, Organic Chemistry
- Biology: Cell Biology, Human Anatomy, Ecology

**Languages:**
- English Grammar and Composition
- Hindi Grammar and Literature
- Reading Comprehension

**Social Studies:**
- History, Geography, Civics
- Economics basics

Teaching approach:
- Start with the basics and build up to complex concepts
- Use real-world examples and analogies to explain abstract concepts
- Break down complex problems into smaller, manageable steps
- Encourage students and praise their efforts
- Ask questions to check understanding
- Adapt your explanations based on the student's level
- Use simple language and avoid overwhelming with jargon
- When solving numerical problems, show each step clearly

Communication style:
- Be patient, encouraging, and supportive
- Speak clearly and at a moderate pace
- Celebrate small victories and correct mistakes gently
- If a student is struggling, try a different explanation approach
- Make learning interesting by connecting it to everyday life

Start by greeting the student warmly and asking what subject or topic they'd like to learn or what problem they need help with.""",
        },
    ]
    context = LLMContext(messages)
    context_aggregator = LLMContextAggregatorPair(context)

    # Build pipeline
    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            context_aggregator.user(),
            llm,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(pipeline)

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Student connected")
        messages.append(
            {"role": "system", "content": "Greet the student warmly and ask what subject or topic they'd like to learn today."}
        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info("Student disconnected")
        await task.cancel()

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
    await runner.run(task)

if __name__ == "__main__":
    from pipecat.runner.run import main
    main()

5. Run Your Agent

$ python tutor_agent.py

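With the Daily transport, the agent creates a Daily room and prints a URL to join. Depending on your Pipecat version, you may need to select that transport explicitly (python tutor_agent.py -t daily) and add a DAILY_API_KEY to your .env file; otherwise the runner may start a local WebRTC session and print a localhost URL instead.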

6. Test Your Agent

Open the URL printed by the agent in your browser, allow microphone access, and start speaking. Your tutor will listen and respond!

Customization Examples

Example 1: Hindi Tutor

For Hindi-medium students:

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="hi-IN",  # Hindi
    model="saarika:v2.5"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="hi-IN",
    model="bulbul:v2",
    speaker="manisha"  # Warm and friendly teacher voice
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

Example 2: Tamil Tutor

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="ta-IN",  # Tamil
    model="saarika:v2.5"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="ta-IN",
    model="bulbul:v2",
    speaker="vidya"
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

Example 3: Multilingual Tutor (Auto-detect)

For diverse student populations:

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="unknown",  # Auto-detects language
    model="saarika:v2.5"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v2",
    speaker="vidya"
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

Example 4: Speech-to-English Tutor (Saaras)

When students speak in regional languages but you want English processing:

# Student speaks Hindi/Tamil/etc. → Saaras converts to English → LLM processes

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    model="saaras:v2.5"  # Speech-to-English translation
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v2",
    speaker="vidya"
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
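
The four examples differ only in a handful of keyword arguments, so one way to switch between them is a small table of presets. The sketch below copies its values from the examples above; the names PRESETS and service_kwargs are hypothetical helpers for illustration, not part of Pipecat or the Sarvam SDK.

```python
from typing import Dict, Tuple

# Settings copied from the four examples above. The "speech_to_english"
# preset has no STT "language" entry, mirroring Example 4 (Saaras).
PRESETS: Dict[str, Dict[str, Dict[str, str]]] = {
    "hindi": {
        "stt": {"language": "hi-IN", "model": "saarika:v2.5"},
        "tts": {"target_language_code": "hi-IN", "model": "bulbul:v2", "speaker": "manisha"},
    },
    "tamil": {
        "stt": {"language": "ta-IN", "model": "saarika:v2.5"},
        "tts": {"target_language_code": "ta-IN", "model": "bulbul:v2", "speaker": "vidya"},
    },
    "multilingual": {
        "stt": {"language": "unknown", "model": "saarika:v2.5"},
        "tts": {"target_language_code": "en-IN", "model": "bulbul:v2", "speaker": "vidya"},
    },
    "speech_to_english": {
        "stt": {"model": "saaras:v2.5"},
        "tts": {"target_language_code": "en-IN", "model": "bulbul:v2", "speaker": "vidya"},
    },
}


def service_kwargs(preset: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return copies of the (stt_kwargs, tts_kwargs) for a named preset."""
    if preset not in PRESETS:
        raise ValueError(f"Unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    config = PRESETS[preset]
    # Copy so callers can tweak the result without mutating the table.
    return dict(config["stt"]), dict(config["tts"])
```

The returned dictionaries can then be unpacked into the constructors used in the examples, for instance SarvamSTTService(api_key=..., **stt_kwargs) and SarvamTTSService(api_key=..., **tts_kwargs).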

Available Options

Language Codes

Language	Code
English (India)	`en-IN`
Hindi	`hi-IN`
Bengali	`bn-IN`
Tamil	`ta-IN`
Telugu	`te-IN`
Gujarati	`gu-IN`
Kannada	`kn-IN`
Malayalam	`ml-IN`
Marathi	`mr-IN`
Punjabi	`pa-IN`
Odia	`od-IN`
Auto-detect	`unknown`

Speaker Voices (Bulbul v2)

Female Voices:

- anushka - Clear and professional (default)
- manisha - Warm and friendly
- vidya - Articulate and precise (recommended for teaching)
- arya - Young and energetic

Male Voices:

- abhilash - Deep and authoritative
- karun - Natural and conversational
- hitesh - Professional and engaging

TTS Additional Parameters

Customize the voice for a better teaching experience:

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v2",
    speaker="vidya",
    pitch=0.0,           # Range: -1.0 to 1.0
    pace=0.9,            # Slightly slower for better understanding
    loudness=1.5,        # Range: 0.5 to 2.0
    speech_sample_rate=16000  # 8000, 16000, or 24000 Hz
)
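
If these values come from user settings or a config file, it can help to validate them before constructing the service. The sketch below checks only the ranges stated in the comments above; those ranges are taken from this guide and should be confirmed against Sarvam's API reference. pace is left out because no range is given for it here, and check_tts_params is a hypothetical helper name.

```python
from typing import Dict, Union

Number = Union[int, float]

# Ranges as stated in the comments of the snippet above (assumed, not verified).
PITCH_RANGE = (-1.0, 1.0)
LOUDNESS_RANGE = (0.5, 2.0)
SAMPLE_RATES = (8000, 16000, 24000)


def check_tts_params(
    pitch: float, loudness: float, speech_sample_rate: int
) -> Dict[str, Number]:
    """Validate voice settings and return them as keyword arguments."""
    if not PITCH_RANGE[0] <= pitch <= PITCH_RANGE[1]:
        raise ValueError(f"pitch {pitch} is outside {PITCH_RANGE}")
    if not LOUDNESS_RANGE[0] <= loudness <= LOUDNESS_RANGE[1]:
        raise ValueError(f"loudness {loudness} is outside {LOUDNESS_RANGE}")
    if speech_sample_rate not in SAMPLE_RATES:
        raise ValueError(f"speech_sample_rate must be one of {SAMPLE_RATES}")
    return {
        "pitch": pitch,
        "loudness": loudness,
        "speech_sample_rate": speech_sample_rate,
    }
```

Raising an error rather than silently clamping makes a bad configuration visible before the first student connects.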

Understanding the Pipeline

Pipecat uses a pipeline architecture where data flows through a series of processors:

Student Audio → STT → Context Aggregator → LLM → TTS → Audio Output

1. Transport Input: Receives audio from the student
2. STT (Speech-to-Text): Converts audio to text using Sarvam’s Saarika
3. Context Aggregator (User): Adds the student’s question to the conversation context
4. LLM: Generates an educational response using OpenAI
5. TTS (Text-to-Speech): Converts the response to audio using Sarvam’s Bulbul
6. Transport Output: Sends audio back to the student
7. Context Aggregator (Assistant): Saves the tutor’s response to the context
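
To make the ordering concrete, here is a toy model of a single conversational turn. It is a deliberate simplification and not how Pipecat is implemented: the real pipeline streams frames asynchronously, whereas this sketch calls stand-in functions one after another. All names in it (run_turn, fake_stt, fake_llm, fake_tts) are hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_turn(
    audio: str,
    context: List[Message],
    stt: Callable[[str], str],
    llm: Callable[[List[Message]], str],
    tts: Callable[[str], str],
) -> str:
    """Push one student utterance through the stages in pipeline order."""
    text = stt(audio)                                         # STT
    context.append({"role": "user", "content": text})         # user aggregator
    reply = llm(context)                                      # LLM sees full context
    audio_out = tts(reply)                                    # TTS
    context.append({"role": "assistant", "content": reply})   # assistant aggregator
    return audio_out


# Stand-in stages; the real ones call Sarvam and OpenAI over the network.
def fake_stt(audio: str) -> str:
    return audio[len("audio:"):] if audio.startswith("audio:") else audio


def fake_llm(context: List[Message]) -> str:
    return "You asked: " + context[-1]["content"]


def fake_tts(text: str) -> str:
    return "audio:" + text


if __name__ == "__main__":
    history: List[Message] = [{"role": "system", "content": "You are a tutor."}]
    print(run_turn("audio:What is a prime number?", history, fake_stt, fake_llm, fake_tts))
    print([message["role"] for message in history])
```

Because both aggregators write to the same context, the LLM sees the whole conversation on the next turn, which is what lets the tutor adapt to earlier answers.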

Pro Tips

- Use language="unknown" to support students who code-mix (Hinglish, Tanglish, etc.)
- Use a clear, articulate voice like vidya for teaching
- Set a slightly slower pace (0.9) for complex explanations
- Use gpt-4o for better reasoning on complex problems
- Encourage students to ask follow-up questions

Troubleshooting

API key errors: Check that all keys are in your .env file and the file is in the same directory as your script.

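Module not found: Re-run the installation command from step 2 in the same Python environment (or virtual environment) you use to run the script. If the missing module is Sarvam-related, your Pipecat version may also need its Sarvam extra, for example pip install "pipecat-ai[sarvam]"; check the Pipecat installation docs to confirm.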

Poor transcription: Try language="unknown" for auto-detection, or specify the correct language code.

Connection issues: Ensure you have a stable internet connection and the transport is properly configured.
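
For the "Module not found" case, a quick way to see which packages are missing is to ask Python whether each top-level module can be located. This is a minimal sketch using only the standard library; find_missing_modules is a hypothetical helper, and the module list is simply the top-level names imported by tutor_agent.py.

```python
import importlib.util
from typing import Iterable, List

# Top-level module names imported by tutor_agent.py.
AGENT_MODULES = ("pipecat", "dotenv", "loguru")


def find_missing_modules(names: Iterable[str] = AGENT_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All modules found.")
```

Run it with the same interpreter you use for the agent; a module that is reported missing here but installed elsewhere usually points to a different virtual environment.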

Need Help?

Sarvam Support: developer@sarvam.ai
Community: Join the Discord Community

Happy Building!