Tutor Agent using Pipecat


Overview

This guide demonstrates how to build a voice-based tutor agent that can teach, explain concepts, and help students with various subjects using Pipecat for real-time communication and Sarvam AI for speech processing. Perfect for EdTech platforms, online tutoring, and educational applications serving Indian students.

What You’ll Build

A tutor agent that can:

  • Explain concepts in simple, student-friendly language
  • Help students solve problems step by step
  • Answer questions across various subjects
  • Adapt explanations to the student’s level of understanding
  • Communicate in multiple Indian languages

Quick Overview

  1. Get API keys (Sarvam)
  2. Install packages
  3. Create .env file with your API keys
  4. Write the agent code
  5. Run with the appropriate transport

Quick Start

1. Prerequisites

  • Python 3.9 or higher
  • A Sarvam AI API key

2. Install Dependencies

$ pip install "pipecat-ai[daily,sarvam]" python-dotenv loguru

3. Create Environment File

Create a file named .env in your project folder and add your API keys:

SARVAM_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxx

Replace the placeholder value with your actual Sarvam API key.
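Before starting the agent, it can help to fail fast when the key is missing. The following is a minimal sketch using only the standard library; the helper name `missing_keys` and the `REQUIRED_KEYS` list are hypothetical and not part of Pipecat or Sarvam.

```python
import os

# Names the agent expects to find in the environment (from the .env file above).
REQUIRED_KEYS = ["SARVAM_API_KEY"]


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # In tutor_agent.py, call load_dotenv() first so the .env values are visible here.
    missing = missing_keys(os.environ)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required keys are set.")
```

Running a check like this at the top of the script turns a confusing authentication failure later on into an immediate, readable message.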

4. Write Your Agent

Create tutor_agent.py:

import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.sarvam.stt import SarvamSTTService
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.services.sarvam.llm import SarvamLLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.daily.transport import DailyParams

load_dotenv(override=True)

async def bot(runner_args: RunnerArguments):
    """Main bot entry point."""

    # Create transport (supports both Daily and WebRTC)
    transport = await create_transport(
        runner_args,
        {
            "daily": lambda: DailyParams(audio_in_enabled=True, audio_out_enabled=True),
            "webrtc": lambda: TransportParams(
                audio_in_enabled=True, audio_out_enabled=True
            ),
        },
    )

    # Initialize AI services
    stt = SarvamSTTService(
        api_key=os.getenv("SARVAM_API_KEY"),
        language="unknown",  # Auto-detect for multilingual students
        model="saaras:v3",
        mode="transcribe"
    )

    tts = SarvamTTSService(
        api_key=os.getenv("SARVAM_API_KEY"),
        target_language_code="en-IN",
        model="bulbul:v3",
        speaker="ishita"  # Clear and articulate voice for teaching
    )

    llm = SarvamLLMService(
        api_key=os.getenv("SARVAM_API_KEY"),
        settings=SarvamLLMService.Settings(model="sarvam-105b"),
    )

    # Set up conversation context with tutor personality
    messages = [
        {
            "role": "system",
            "content": """You are an expert tutor designed to help students understand and excel in their studies.

Your teaching expertise covers multiple subjects:

**Mathematics:**
- Arithmetic, Algebra, Geometry, Trigonometry
- Calculus, Statistics, Probability
- Problem-solving techniques

**Science:**
- Physics: Mechanics, Electricity, Optics, Thermodynamics
- Chemistry: Elements, Reactions, Organic Chemistry
- Biology: Cell Biology, Human Anatomy, Ecology

**Languages:**
- English Grammar and Composition
- Hindi Grammar and Literature
- Reading Comprehension

**Social Studies:**
- History, Geography, Civics
- Economics basics

Teaching approach:
- Start with the basics and build up to complex concepts
- Use real-world examples and analogies to explain abstract concepts
- Break down complex problems into smaller, manageable steps
- Encourage students and praise their efforts
- Ask questions to check understanding
- Adapt your explanations based on the student's level
- Use simple language and avoid overwhelming with jargon
- When solving numerical problems, show each step clearly

Communication style:
- Be patient, encouraging, and supportive
- Speak clearly and at a moderate pace
- Celebrate small victories and correct mistakes gently
- If a student is struggling, try a different explanation approach
- Make learning interesting by connecting it to everyday life

Start by greeting the student warmly and asking what subject or topic they'd like to learn or what problem they need help with.""",
        },
    ]
    context = LLMContext(messages)
    context_aggregator = LLMContextAggregatorPair(context)

    # Build pipeline
    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            context_aggregator.user(),
            llm,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(pipeline)

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Student connected")
        messages.append(
            {
                "role": "system",
                "content": "Greet the student warmly and ask what subject or topic they'd like to learn today.",
            }
        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info("Student disconnected")
        await task.cancel()

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
    await runner.run(task)

if __name__ == "__main__":
    from pipecat.runner.run import main
    main()

5. Run Your Agent

$ python tutor_agent.py

The agent will create a Daily room and provide you with a URL to join.

6. Test Your Agent

Open the provided Daily room URL in your browser and start speaking. Your tutor will listen and respond!


Customization Examples

Example 1: Hindi Tutor

For Hindi-medium students:

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="hi-IN",  # Hindi
    model="saaras:v3",
    mode="transcribe"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="simran"  # Warm and friendly teacher voice
)

llm = SarvamLLMService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamLLMService.Settings(model="sarvam-105b"),
)

Example 2: Tamil Tutor

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="ta-IN",
    model="saaras:v3",
    mode="transcribe"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="ta-IN",
    model="bulbul:v3",
    speaker="ishita"
)

llm = SarvamLLMService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamLLMService.Settings(model="sarvam-105b"),
)

Example 3: Multilingual Tutor (Auto-detect)

For diverse student populations:

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="unknown",  # Auto-detects language
    model="saaras:v3",
    mode="transcribe"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v3",
    speaker="ishita"
)

llm = SarvamLLMService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamLLMService.Settings(model="sarvam-105b"),
)

Example 4: Speech-to-English Tutor (Saaras)

When students speak in regional languages but you want English processing:

# Student speaks Hindi/Tamil/etc. → Saaras converts to English → LLM processes

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    model="saaras:v3",  # Speech-to-English translation
    mode="translate"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v3",
    speaker="ishita"
)

llm = SarvamLLMService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamLLMService.Settings(model="sarvam-105b"),
)
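The four examples differ only in a handful of keyword arguments, so one way to keep them manageable is a small lookup table. The sketch below is an assumption about how you might organize this; the profile names, `PROFILES`, and `service_kwargs` are hypothetical and not part of Pipecat. It only builds dictionaries, which you would then unpack into `SarvamSTTService(...)` and `SarvamTTSService(...)` together with `api_key`.

```python
# Hypothetical profiles mirroring Examples 1-4. Values are taken from those examples.
PROFILES = {
    "hindi": {
        "stt": {"language": "hi-IN", "mode": "transcribe"},
        "tts": {"target_language_code": "hi-IN", "speaker": "simran"},
    },
    "tamil": {
        "stt": {"language": "ta-IN", "mode": "transcribe"},
        "tts": {"target_language_code": "ta-IN", "speaker": "ishita"},
    },
    "multilingual": {
        "stt": {"language": "unknown", "mode": "transcribe"},
        "tts": {"target_language_code": "en-IN", "speaker": "ishita"},
    },
    # Example 4 passes no language to STT and relies on mode="translate".
    "speech_to_english": {
        "stt": {"mode": "translate"},
        "tts": {"target_language_code": "en-IN", "speaker": "ishita"},
    },
}


def service_kwargs(profile):
    """Return (stt_kwargs, tts_kwargs) for a profile, without the api_key."""
    if profile not in PROFILES:
        raise ValueError(f"Unknown profile: {profile!r}")
    chosen = PROFILES[profile]
    stt_kwargs = {"model": "saaras:v3", **chosen["stt"]}
    tts_kwargs = {"model": "bulbul:v3", **chosen["tts"]}
    return stt_kwargs, tts_kwargs
```

With this in place, switching the tutor's language becomes a one-word change, for example `stt_kwargs, tts_kwargs = service_kwargs("hindi")`.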

Available Options

Language Codes

Language          Code
English (India)   en-IN
Hindi             hi-IN
Bengali           bn-IN
Tamil             ta-IN
Telugu            te-IN
Gujarati          gu-IN
Kannada           kn-IN
Malayalam         ml-IN
Marathi           mr-IN
Punjabi           pa-IN
Odia              od-IN
Auto-detect       unknown
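If the language comes from user input or a student profile, a lookup that rejects unsupported names avoids passing a bad code to the speech services. This is a minimal sketch built from the codes listed above; `LANGUAGE_CODES` and `language_code` are hypothetical names chosen for illustration.

```python
# Language names and codes copied from the Language Codes list above.
LANGUAGE_CODES = {
    "english (india)": "en-IN",
    "hindi": "hi-IN",
    "bengali": "bn-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
    "gujarati": "gu-IN",
    "kannada": "kn-IN",
    "malayalam": "ml-IN",
    "marathi": "mr-IN",
    "punjabi": "pa-IN",
    "odia": "od-IN",
    "auto-detect": "unknown",
}


def language_code(name):
    """Map a language name (case-insensitive) to its code, or raise ValueError."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {name!r}")
    return LANGUAGE_CODES[key]
```

For example, `language_code("Tamil")` gives the value to pass as `language` for STT or `target_language_code` for TTS.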

Speaker Voices (Bulbul v3)

Male (23): Shubh (default), Aditya, Rahul, Rohan, Amit, Dev, Ratan, Varun, Manan, Sumit, Kabir, Aayan, Ashutosh, Advait, Anand, Tarun, Sunny, Mani, Gokul, Vijay, Mohit, Rehan, Soham

Female (14): Ritu, Priya, Neha, Pooja, Simran, Kavya, Ishita, Shreya, Roopa, Tanya, Shruti, Suhani, Kavitha, Rupali

TTS Additional Parameters

Customize the voice for a better teaching experience:

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v3",
    speaker="ishita",
    pace=0.9,  # Slightly slower for better understanding
    # 8000, 16000, 22050, 24000 Hz (default). v3 REST API also supports 32000, 44100, 48000 Hz
    speech_sample_rate=24000
)
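A small validation step can catch an unsupported sample rate before the service is created. The sketch below assumes the first four rates mentioned in the comment above are the safe set for this service and treats the REST-only rates as out of scope; `tts_options` and `COMMON_SAMPLE_RATES` are hypothetical names, and the only check on `pace` is that it is positive, since the source does not state its valid range.

```python
# Rates listed in the comment above as generally supported (24000 is the default).
COMMON_SAMPLE_RATES = (8000, 16000, 22050, 24000)


def tts_options(pace=0.9, speech_sample_rate=24000):
    """Return validated pace and sample-rate keyword arguments for the TTS service."""
    if speech_sample_rate not in COMMON_SAMPLE_RATES:
        raise ValueError(
            f"speech_sample_rate must be one of {COMMON_SAMPLE_RATES}, "
            f"got {speech_sample_rate}"
        )
    if pace <= 0:
        raise ValueError(f"pace must be positive, got {pace}")
    return {"pace": pace, "speech_sample_rate": speech_sample_rate}
```

The returned dictionary can be unpacked alongside the other arguments, as in `SarvamTTSService(api_key=..., model="bulbul:v3", **tts_options())`.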

Understanding the Pipeline

Pipecat uses a pipeline architecture where data flows through a series of processors:

Student Audio → STT → Context Aggregator → LLM → TTS → Audio Output

  1. Transport Input: Receives audio from the student
  2. STT (Speech-to-Text): Converts audio to text using Sarvam’s Saaras v3 (transcription via mode="transcribe", or translation to English via mode="translate")
  3. Context Aggregator (User): Adds student’s question to conversation context
  4. LLM: Generates educational response using Sarvam
  5. TTS (Text-to-Speech): Converts response to audio using Sarvam’s Bulbul
  6. Transport Output: Sends audio back to the student
  7. Context Aggregator (Assistant): Saves tutor’s response to context
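To make the order of the steps above concrete, here is a toy, synchronous walk-through of a single conversational turn. It does not use Pipecat at all: `fake_stt`, `fake_llm`, `fake_tts`, and `run_turn` are stand-ins invented for this sketch, and the real pipeline streams frames asynchronously rather than calling functions in sequence.

```python
def fake_stt(audio):
    """Stand-in for speech-to-text: the 'audio' already carries its transcript."""
    return audio["spoken_text"]


def fake_llm(context):
    """Stand-in for the LLM: reply to the most recent user message."""
    last_user = [m for m in context if m["role"] == "user"][-1]
    return "Let's work through this together: " + last_user["content"]


def fake_tts(text):
    """Stand-in for text-to-speech: wrap the text as if it were audio."""
    return {"audio_for": text}


def run_turn(context, audio):
    """Run one student turn through the stages in pipeline order."""
    text = fake_stt(audio)                                   # 2. STT
    context.append({"role": "user", "content": text})        # 3. user aggregator
    reply = fake_llm(context)                                 # 4. LLM
    speech = fake_tts(reply)                                  # 5. TTS
    context.append({"role": "assistant", "content": reply})  # 7. assistant aggregator
    return speech                                             # 6. sent to the student


if __name__ == "__main__":
    context = [{"role": "system", "content": "You are a patient tutor."}]
    run_turn(context, {"spoken_text": "What is a prime number?"})
    print([message["role"] for message in context])
```

The point of the sketch is that both aggregators write to the same context, so each new turn sees the full history of questions and answers.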

Pro Tips

  • Use language="unknown" to support students who code-mix (Hinglish, Tanglish, etc.)
  • Use a clear, articulate voice like ishita for teaching
  • Set a slightly slower pace (0.9) for complex explanations
  • Use sarvam-105b for better reasoning on complex problems
  • Encourage students to ask follow-up questions

Troubleshooting

API key errors: Check that SARVAM_API_KEY is set in your .env file and that the file is in the same directory as your script.

Module not found: Re-run the installation command from step 2, and make sure you are using the same Python environment that runs the script.

Poor transcription: Try language="unknown" for auto-detection, or specify the correct language code.

Connection issues: Ensure you have a stable internet connection and the transport is properly configured.
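For the "Module not found" case, a quick check of which packages the current interpreter can actually see often narrows the problem down to a wrong virtual environment. This is a minimal standard-library sketch; `missing_modules` is a hypothetical helper, and the names checked are the top-level import names of the packages installed in step 2.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pipecat-ai installs "pipecat", python-dotenv installs "dotenv".
    missing = missing_modules(["pipecat", "dotenv", "loguru"])
    if missing:
        print("Not installed in this environment: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it with the same `python` executable you use for `tutor_agent.py`; if it reports missing packages there, the install likely went into a different environment.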


Happy Building!