Tutor Agent using Pipecat

Overview

This guide shows how to build a voice-based tutor agent that teaches, explains concepts, and helps students across subjects, using Pipecat for real-time communication and Sarvam AI for speech processing. It is suited to EdTech platforms, online tutoring, and educational applications serving Indian students.

What You’ll Build

A tutor agent that can:

  • Explain concepts in simple, student-friendly language
  • Help students solve problems step by step
  • Answer questions across various subjects
  • Adapt explanations to the student’s level of understanding
  • Communicate in multiple Indian languages

Quick Overview

  1. Get API keys (Sarvam, OpenAI)
  2. Install packages
  3. Create .env file with your API keys
  4. Write the agent code
  5. Run with appropriate transport

Quick Start

1. Prerequisites

  • Python 3.10 or higher (recent Pipecat releases require it)
  • API keys from Sarvam AI and OpenAI

2. Install Dependencies

$ pip install "pipecat-ai[daily,openai]" python-dotenv loguru

3. Create Environment File

Create a file named .env in your project folder and add your API keys:

SARVAM_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxx

Replace the values with your actual API keys.
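
A missing or blank key usually only shows up later as an authentication error from the service. As a minimal sketch, you could check for both keys before starting the agent. The helper name missing_keys is hypothetical and is not part of Pipecat or python-dotenv; it only reads the two variable names used in the .env file above.

```python
import os

# The two variable names used in the .env file above.
REQUIRED_KEYS = ("SARVAM_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are absent or blank in `env`.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [key for key in required if not str(env.get(key, "")).strip()]
```

Calling missing_keys() after the environment file has been loaded returns an empty list when both keys are present, so the agent can stop early with a clear message otherwise.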

4. Write Your Agent

Create tutor_agent.py:

import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.sarvam.stt import SarvamSTTService
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.daily.transport import DailyParams

load_dotenv(override=True)

async def bot(runner_args: RunnerArguments):
    """Main bot entry point."""

    # Create transport (supports both Daily and WebRTC)
    transport = await create_transport(
        runner_args,
        {
            "daily": lambda: DailyParams(audio_in_enabled=True, audio_out_enabled=True),
            "webrtc": lambda: TransportParams(
                audio_in_enabled=True, audio_out_enabled=True
            ),
        },
    )

    # Initialize AI services
    stt = SarvamSTTService(
        api_key=os.getenv("SARVAM_API_KEY"),
        language="unknown",  # Auto-detect for multilingual students
        model="saarika:v2.5"
    )

    tts = SarvamTTSService(
        api_key=os.getenv("SARVAM_API_KEY"),
        target_language_code="en-IN",
        model="bulbul:v2",
        speaker="vidya"  # Clear and articulate voice for teaching
    )

    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

    # Set up conversation context with tutor personality
    messages = [
        {
            "role": "system",
            "content": """You are an expert tutor designed to help students understand and excel in their studies.

Your teaching expertise covers multiple subjects:

**Mathematics:**
- Arithmetic, Algebra, Geometry, Trigonometry
- Calculus, Statistics, Probability
- Problem-solving techniques

**Science:**
- Physics: Mechanics, Electricity, Optics, Thermodynamics
- Chemistry: Elements, Reactions, Organic Chemistry
- Biology: Cell Biology, Human Anatomy, Ecology

**Languages:**
- English Grammar and Composition
- Hindi Grammar and Literature
- Reading Comprehension

**Social Studies:**
- History, Geography, Civics
- Economics basics

Teaching approach:
- Start with the basics and build up to complex concepts
- Use real-world examples and analogies to explain abstract concepts
- Break down complex problems into smaller, manageable steps
- Encourage students and praise their efforts
- Ask questions to check understanding
- Adapt your explanations based on the student's level
- Use simple language and avoid overwhelming with jargon
- When solving numerical problems, show each step clearly

Communication style:
- Be patient, encouraging, and supportive
- Speak clearly and at a moderate pace
- Celebrate small victories and correct mistakes gently
- If a student is struggling, try a different explanation approach
- Make learning interesting by connecting it to everyday life

Start by greeting the student warmly and asking what subject or topic they'd like to learn or what problem they need help with.""",
        },
    ]
    context = LLMContext(messages)
    context_aggregator = LLMContextAggregatorPair(context)

    # Build pipeline: each processor hands its output to the next one in the list
    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            context_aggregator.user(),
            llm,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(pipeline)

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Student connected")
        # Ask the LLM to open the conversation as soon as the student joins
        messages.append(
            {"role": "system", "content": "Greet the student warmly and ask what subject or topic they'd like to learn today."}
        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info("Student disconnected")
        await task.cancel()

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
    await runner.run(task)

if __name__ == "__main__":
    from pipecat.runner.run import main
    main()
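
The system prompt in the agent is fixed. One way to tailor it per student is to build the message list from a subject and a level before creating the context. The sketch below is only an assumption about how you might structure this: build_tutor_messages, subject, and level are hypothetical names, and the prompt text is a shortened stand-in for the full prompt above.

```python
def build_tutor_messages(subject="any subject", level="beginner"):
    """Return a one-item message list in the role/content format used above.

    Hypothetical helper: the wording is a shortened stand-in for the full prompt.
    """
    prompt = (
        "You are an expert tutor designed to help students understand "
        "and excel in their studies.\n"
        f"Focus on: {subject}.\n"
        f"The student's level is: {level}. Adapt your explanations to it.\n"
        "Break down complex problems into smaller, manageable steps, "
        "use simple language, and ask questions to check understanding."
    )
    return [{"role": "system", "content": prompt}]


# Example: a prompt aimed at a class 8 student working on algebra.
messages = build_tutor_messages(subject="Algebra", level="class 8")
```

The returned list has the same shape as the messages list in tutor_agent.py, so it could be used in its place.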

5. Run Your Agent

$ python tutor_agent.py

The agent will create a Daily room and provide you with a URL to join.

6. Test Your Agent

Open the provided Daily room URL in your browser and start speaking. Your tutor will listen and respond!


Customization Examples

Example 1: Hindi Tutor

For Hindi-medium students:

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="hi-IN",  # Hindi
    model="saarika:v2.5"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="hi-IN",
    model="bulbul:v2",
    speaker="manisha"  # Warm and friendly teacher voice
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

Example 2: Tamil Tutor

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="ta-IN",  # Tamil
    model="saarika:v2.5"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="ta-IN",
    model="bulbul:v2",
    speaker="vidya"
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

Example 3: Multilingual Tutor (Auto-detect)

For diverse student populations:

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="unknown",  # Auto-detects language
    model="saarika:v2.5"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v2",
    speaker="vidya"
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

Example 4: Speech-to-English Tutor (Saaras)

When students speak in regional languages but you want English processing:

# Student speaks Hindi/Tamil/etc. → Saaras converts to English → LLM processes

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    model="saaras:v2.5"  # Speech-to-English translation
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v2",
    speaker="vidya"
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
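
Examples 1 to 4 differ only in a few keyword arguments, so one option is to keep them as presets and pick one at startup. This is a sketch under that assumption: the TUTOR_MODES layout, the mode names, and the tutor_settings helper are hypothetical, while the values are copied from the examples above.

```python
# Values copied from Examples 1-4; the dict layout and mode names are assumptions.
ENGLISH_TTS = {"target_language_code": "en-IN", "model": "bulbul:v2", "speaker": "vidya"}

TUTOR_MODES = {
    "hindi": {
        "stt": {"language": "hi-IN", "model": "saarika:v2.5"},
        "tts": {"target_language_code": "hi-IN", "model": "bulbul:v2", "speaker": "manisha"},
    },
    "tamil": {
        "stt": {"language": "ta-IN", "model": "saarika:v2.5"},
        "tts": {"target_language_code": "ta-IN", "model": "bulbul:v2", "speaker": "vidya"},
    },
    "multilingual": {
        "stt": {"language": "unknown", "model": "saarika:v2.5"},
        "tts": ENGLISH_TTS,
    },
    "speech_to_english": {
        "stt": {"model": "saaras:v2.5"},  # Example 4 passes no language argument
        "tts": ENGLISH_TTS,
    },
}


def tutor_settings(mode):
    """Return copies of the (STT, TTS) keyword arguments for `mode`."""
    if mode not in TUTOR_MODES:
        raise ValueError(f"Unknown mode {mode!r}; choose from {sorted(TUTOR_MODES)}")
    preset = TUTOR_MODES[mode]
    return dict(preset["stt"]), dict(preset["tts"])


stt_kwargs, tts_kwargs = tutor_settings("hindi")
# In the agent these could then be unpacked into the service constructors, e.g.
# SarvamSTTService(api_key=os.getenv("SARVAM_API_KEY"), **stt_kwargs)
```

Returning copies keeps the presets unchanged if a caller later edits the returned dictionaries.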

Available Options

Language Codes

Language          Code
English (India)   en-IN
Hindi             hi-IN
Bengali           bn-IN
Tamil             ta-IN
Telugu            te-IN
Gujarati          gu-IN
Kannada           kn-IN
Malayalam         ml-IN
Marathi           mr-IN
Punjabi           pa-IN
Odia              od-IN
Auto-detect       unknown
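
If the student's language comes from a profile or sign-up form, a small lookup can turn its name into one of the codes listed here. The sketch below is illustrative only: language_code is a hypothetical helper, and falling back to auto-detect for unlisted languages is a design assumption rather than documented behavior.

```python
# Codes taken from the language table above.
LANGUAGE_CODES = {
    "english": "en-IN",
    "hindi": "hi-IN",
    "bengali": "bn-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
    "gujarati": "gu-IN",
    "kannada": "kn-IN",
    "malayalam": "ml-IN",
    "marathi": "mr-IN",
    "punjabi": "pa-IN",
    "odia": "od-IN",
}
AUTO_DETECT = "unknown"


def language_code(name):
    """Map a language name to its code, falling back to auto-detect."""
    return LANGUAGE_CODES.get(name.strip().lower(), AUTO_DETECT)
```

The lookup ignores case and surrounding whitespace, so "Tamil" and " tamil " resolve to the same code.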

Speaker Voices (Bulbul v2)

Female Voices:

  • anushka - Clear and professional (default)
  • manisha - Warm and friendly
  • vidya - Articulate and precise (recommended for teaching)
  • arya - Young and energetic

Male Voices:

  • abhilash - Deep and authoritative
  • karun - Natural and conversational
  • hitesh - Professional and engaging

TTS Additional Parameters

Customize the voice for a better teaching experience:

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v2",
    speaker="vidya",
    pitch=0.0,  # Range: -1.0 to 1.0
    pace=0.9,  # Slightly slower for better understanding
    loudness=1.5,  # Range: 0.5 to 2.0
    speech_sample_rate=16000  # 8000, 16000, or 24000 Hz
)
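
Out-of-range values are easy to introduce when these settings come from configuration. As a rough sketch, the ranges quoted in the comments above can be checked before the service is created. check_tts_params is a hypothetical helper, the default loudness of 1.0 is an assumption, and pace is left unchecked because no range is given for it here.

```python
# Sample rates listed in the comment above.
ALLOWED_SAMPLE_RATES = (8000, 16000, 24000)


def check_tts_params(pitch=0.0, loudness=1.0, speech_sample_rate=16000):
    """Return a list of problems found, using the ranges quoted above.

    Hypothetical helper; an empty list means every value is within range.
    """
    problems = []
    if not -1.0 <= pitch <= 1.0:
        problems.append(f"pitch {pitch} is outside -1.0 to 1.0")
    if not 0.5 <= loudness <= 2.0:
        problems.append(f"loudness {loudness} is outside 0.5 to 2.0")
    if speech_sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(
            f"speech_sample_rate {speech_sample_rate} is not one of {ALLOWED_SAMPLE_RATES}"
        )
    return problems


# The values used in the example above pass the check.
issues = check_tts_params(pitch=0.0, loudness=1.5, speech_sample_rate=16000)
```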

Understanding the Pipeline

Pipecat uses a pipeline architecture where data flows through a series of processors:

Student Audio → STT → Context Aggregator → LLM → TTS → Audio Output

  1. Transport Input: Receives audio from the student
  2. STT (Speech-to-Text): Converts audio to text using Sarvam’s Saarika
  3. Context Aggregator (User): Adds student’s question to conversation context
  4. LLM: Generates educational response using OpenAI
  5. TTS (Text-to-Speech): Converts response to audio using Sarvam’s Bulbul
  6. Transport Output: Sends audio back to the student
  7. Context Aggregator (Assistant): Saves tutor’s response to context
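
The flow above can be pictured with a toy model in plain Python. This is not Pipecat's API: every function below is a hypothetical stand-in, the transport steps are left out, and real processors exchange streaming frames rather than single values. It only illustrates the ordering and how the context grows by one user and one assistant message per turn.

```python
def run_turn(audio, context, stages):
    """Pass one student turn through each stage in order."""
    data = audio
    for stage in stages:
        data = stage(data, context)
    return data


# Stand-in stages: each takes (data, context) and returns data for the next stage.
def fake_stt(audio, context):
    return audio["spoken_text"]  # audio in, text out


def user_aggregator(text, context):
    context.append({"role": "user", "content": text})
    return context  # the LLM sees the whole conversation


def fake_llm(conversation, context):
    return f"Let's work through: {conversation[-1]['content']}"


def fake_tts(text, context):
    return {"audio_for": text}  # text in, audio out


def assistant_aggregator(audio, context):
    context.append({"role": "assistant", "content": audio["audio_for"]})
    return audio


context = [{"role": "system", "content": "You are an expert tutor."}]
stages = [fake_stt, user_aggregator, fake_llm, fake_tts, assistant_aggregator]
reply = run_turn({"spoken_text": "What is a prime number?"}, context, stages)
```

After one turn the context holds the system, user, and assistant messages in that order, which is what lets the next answer build on earlier ones.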

Pro Tips

  • Use language="unknown" to support students who code-mix (Hinglish, Tanglish, etc.)
  • Use a clear, articulate voice like vidya for teaching
  • Set a slightly slower pace (0.9) for complex explanations
  • Use gpt-4o for better reasoning on complex problems
  • Encourage students to ask follow-up questions

Troubleshooting

API key errors: Check that all keys are in your .env file and the file is in the same directory as your script.

Module not found: Run the installation command from step 2 again, in the same Python environment you use to run the script.

Poor transcription: Try language="unknown" for auto-detection, or specify the correct language code.

Connection issues: Ensure you have a stable internet connection and the transport is properly configured.


Happy Building!