# Tutor Agent using Pipecat

> Build a voice-based tutor agent that teaches students in multiple Indian languages using Pipecat and Sarvam AI. Perfect for EdTech applications.

## Overview

This guide shows how to build a **voice-based tutor agent** that teaches, explains concepts, and helps students across subjects. It uses **Pipecat** for real-time audio transport and pipeline orchestration, **Sarvam AI** for speech-to-text and text-to-speech, and OpenAI for generating responses. It suits EdTech platforms, online tutoring, and educational applications serving Indian students.

## What You'll Build

A tutor agent that can:

* Explain concepts in simple, student-friendly language
* Help students solve problems step by step
* Answer questions across various subjects
* Adapt explanations to the student's level of understanding
* Communicate in multiple Indian languages

## Quick Overview

1. Get API keys (Sarvam, OpenAI)
2. Install packages
3. Create `.env` file with your API keys
4. Write the agent code
5. Run the agent with the appropriate transport

***

## Quick Start

### 1. Prerequisites

* Python 3.10 or higher (the minimum version supported by current Pipecat releases)
* API keys from:
  * [Sarvam AI](https://dashboard.sarvam.ai) (get API key from dashboard)
  * [OpenAI](https://platform.openai.com/api-keys) (create new secret key)

### 2. Install Dependencies

```bash
pip install "pipecat-ai[daily,openai]" python-dotenv loguru
```

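In shells that do not treat square brackets specially (for example, Windows Command Prompt), the quotes are optional:

```bash
pip install pipecat-ai[daily,openai] python-dotenv loguru
```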

### 3. Create Environment File

Create a file named `.env` in your project folder and add your API keys:

```env
SARVAM_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxx
```

Replace the values with your actual API keys.
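
A small pre-flight check can catch a missing key before the agent starts. The sketch below is not part of Pipecat or the Sarvam SDK: `missing_keys` is a hypothetical helper that uses only the standard library and the two variable names from the `.env` file above.

```python
import os

# Variable names taken from the .env file in this guide.
REQUIRED_KEYS = ("SARVAM_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not (env.get(key) or "").strip()]


# Example with an explicit mapping instead of the real environment.
print(missing_keys({"SARVAM_API_KEY": "sk_example"}))
```

In the agent script, one option is to call `missing_keys()` right after `load_dotenv(override=True)` and exit with a clear message if the returned list is not empty.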

### 4. Write Your Agent

Create `tutor_agent.py`:

```python
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.sarvam.stt import SarvamSTTService
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.daily.transport import DailyParams

load_dotenv(override=True)

async def bot(runner_args: RunnerArguments):
    """Main bot entry point."""
    
    # Create transport (supports both Daily and WebRTC)
    transport = await create_transport(
        runner_args,
        {
            "daily": lambda: DailyParams(audio_in_enabled=True, audio_out_enabled=True),
            "webrtc": lambda: TransportParams(
                audio_in_enabled=True, audio_out_enabled=True
            ),
        },
    )

    # Initialize AI services
    stt = SarvamSTTService(
        api_key=os.getenv("SARVAM_API_KEY"),
        language="unknown",  # Auto-detect for multilingual students
        model="saaras:v3",
        mode="transcribe"
    )
    
    tts = SarvamTTSService(
        api_key=os.getenv("SARVAM_API_KEY"),
        target_language_code="en-IN",
        model="bulbul:v3",
        speaker="ishita"  # Clear and articulate voice for teaching
    )
    
    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

    # Set up conversation context with tutor personality
    messages = [
        {
            "role": "system",
            "content": """You are an expert tutor designed to help students understand and excel in their studies.

Your teaching expertise covers multiple subjects:

**Mathematics:**
- Arithmetic, Algebra, Geometry, Trigonometry
- Calculus, Statistics, Probability
- Problem-solving techniques

**Science:**
- Physics: Mechanics, Electricity, Optics, Thermodynamics
- Chemistry: Elements, Reactions, Organic Chemistry
- Biology: Cell Biology, Human Anatomy, Ecology

**Languages:**
- English Grammar and Composition
- Hindi Grammar and Literature
- Reading Comprehension

**Social Studies:**
- History, Geography, Civics
- Economics basics

Teaching approach:
- Start with the basics and build up to complex concepts
- Use real-world examples and analogies to explain abstract concepts
- Break down complex problems into smaller, manageable steps
- Encourage students and praise their efforts
- Ask questions to check understanding
- Adapt your explanations based on the student's level
- Use simple language and avoid overwhelming with jargon
- When solving numerical problems, show each step clearly

Communication style:
- Be patient, encouraging, and supportive
- Speak clearly and at a moderate pace
- Celebrate small victories and correct mistakes gently
- If a student is struggling, try a different explanation approach
- Make learning interesting by connecting it to everyday life

Start by greeting the student warmly and asking what subject or topic they'd like to learn or what problem they need help with.""",
        },
    ]
    context = LLMContext(messages)
    context_aggregator = LLMContextAggregatorPair(context)

    # Build pipeline
    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            context_aggregator.user(),
            llm,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ]
    )

    task = PipelineTask(pipeline)

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Student connected")
        messages.append(
            {"role": "system", "content": "Greet the student warmly and ask what subject or topic they'd like to learn today."}
        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info("Student disconnected")
        await task.cancel()

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
    await runner.run(task)

if __name__ == "__main__":
    from pipecat.runner.run import main
    main()
```
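
The long system prompt in the script above can also be assembled from data, which makes it easier to change subjects or the student's level per deployment. This is a minimal sketch under that assumption: `build_system_prompt` and its `student_level` parameter are hypothetical names, not part of Pipecat, and the wording is a shortened version of the prompt in this guide.

```python
def build_system_prompt(subjects, student_level=None):
    """Compose a tutor system prompt from a {subject: [topics]} mapping."""
    lines = [
        "You are an expert tutor designed to help students understand "
        "and excel in their studies.",
        "",
        "Your teaching expertise covers:",
    ]
    for subject, topics in subjects.items():
        lines.append(f"- {subject}: {', '.join(topics)}")
    if student_level:
        # Optional hint so the LLM adapts its explanations.
        lines += ["", f"Pitch every explanation at {student_level} level."]
    lines += ["", "Start by greeting the student warmly and asking what they'd like to learn."]
    return "\n".join(lines)


# Same message shape as the `messages` list in tutor_agent.py.
messages = [
    {
        "role": "system",
        "content": build_system_prompt(
            {"Mathematics": ["Algebra", "Geometry"], "Science": ["Physics"]},
            student_level="Class 8",
        ),
    }
]
```

The resulting `messages` list has the same shape as the one passed to `LLMContext` in the script, so it can stand in for the hard-coded prompt.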

### 5. Run Your Agent

```bash
python tutor_agent.py
```

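The Pipecat development runner starts the agent and prints a URL to join. It selects the transport from the `-t` flag (for example, `python tutor_agent.py -t daily`); the Daily transport also needs Daily credentials, such as a `DAILY_API_KEY`, in your environment.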

### 6. Test Your Agent

Open the URL printed by the runner in your browser, allow microphone access, and start speaking. The tutor greets you first, then listens and responds.

***

## Customization Examples

### Example 1: Hindi Tutor

For Hindi-medium students:

```python
stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="hi-IN",  # Hindi
    model="saaras:v3",
    mode="transcribe"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="simran"  # Warm and friendly teacher voice
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
```

### Example 2: Tamil Tutor

```python
stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="ta-IN",
    model="saaras:v3",
    mode="transcribe"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="ta-IN",
    model="bulbul:v3",
    speaker="ishita"
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
```

### Example 3: Multilingual Tutor (Auto-detect)

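For diverse student populations. Speech recognition auto-detects the student's language; replies are still spoken with the `en-IN` voice set in `target_language_code`: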

```python
stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    language="unknown",  # Auto-detects language
    model="saaras:v3",
    mode="transcribe"
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v3",
    speaker="ishita"
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
```

### Example 4: Speech-to-English Tutor (Saaras)

When students speak in regional languages but the LLM should receive English text:

```python
# Student speaks Hindi/Tamil/etc. → Saaras converts to English → LLM processes

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    model="saaras:v3",
    mode="translate"  # Translate speech directly to English text
)

tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v3",
    speaker="ishita"
)

llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
```
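
The four examples differ only in a few STT arguments, so they can be derived from one function. The sketch below is a convenience wrapper, not a Pipecat API: `stt_settings` is a hypothetical helper, and the keyword names (`language`, `model`, `mode`) are copied from the examples above.

```python
def stt_settings(language=None, translate=False):
    """Return STT keyword arguments in the shape used by the examples above."""
    settings = {
        "model": "saaras:v3",
        "mode": "translate" if translate else "transcribe",
    }
    if not translate:
        # Example 4 omits `language` in translate mode, so it is only set here.
        settings["language"] = language or "unknown"
    return settings


hindi = stt_settings("hi-IN")              # Example 1
auto_detect = stt_settings()               # Example 3
to_english = stt_settings(translate=True)  # Example 4
```

The result could then be unpacked into the constructor, for example `SarvamSTTService(api_key=os.getenv("SARVAM_API_KEY"), **stt_settings("ta-IN"))`.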

***

## Available Options

### Language Codes

| Language        | Code      |
| --------------- | --------- |
| English (India) | `en-IN`   |
| Hindi           | `hi-IN`   |
| Bengali         | `bn-IN`   |
| Tamil           | `ta-IN`   |
| Telugu          | `te-IN`   |
| Gujarati        | `gu-IN`   |
| Kannada         | `kn-IN`   |
| Malayalam       | `ml-IN`   |
| Marathi         | `mr-IN`   |
| Punjabi         | `pa-IN`   |
| Odia            | `od-IN`   |
| Auto-detect     | `unknown` |
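
If the language comes from user input (a signup form, for instance), a lookup with a safe fallback avoids passing an invalid code. This is a sketch using only the codes in the table above; `LANGUAGE_CODES` and `language_code` are hypothetical names.

```python
# Codes copied from the table above.
LANGUAGE_CODES = {
    "english": "en-IN",
    "hindi": "hi-IN",
    "bengali": "bn-IN",
    "tamil": "ta-IN",
    "telugu": "te-IN",
    "gujarati": "gu-IN",
    "kannada": "kn-IN",
    "malayalam": "ml-IN",
    "marathi": "mr-IN",
    "punjabi": "pa-IN",
    "odia": "od-IN",
}


def language_code(name):
    """Look up a code by language name, falling back to auto-detect."""
    return LANGUAGE_CODES.get(name.strip().lower(), "unknown")


print(language_code("Tamil"))
print(language_code("French"))
```

Falling back to `unknown` suits the STT `language` argument. The examples in this guide only pass concrete codes as `target_language_code`, so a TTS caller would likely want its own default such as `en-IN`.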

### Speaker Voices (Bulbul v3)

**Male (23):** Shubh (default), Aditya, Rahul, Rohan, Amit, Dev, Ratan, Varun, Manan, Sumit, Kabir, Aayan, Ashutosh, Advait, Anand, Tarun, Sunny, Mani, Gokul, Vijay, Mohit, Rehan, Soham

**Female (14):** Ritu, Priya, Neha, Pooja, Simran, Kavya, Ishita, Shreya, Roopa, Tanya, Shruti, Suhani, Kavitha, Rupali

### TTS Additional Parameters

Customize the voice for a better teaching experience:

```python
tts = SarvamTTSService(
    api_key=os.getenv("SARVAM_API_KEY"),
    target_language_code="en-IN",
    model="bulbul:v3",
    speaker="ishita",
    pace=0.9,  # Slightly slower for better understanding
    # Supported rates: 8000, 16000, 22050, 24000 Hz (default).
    # The v3 REST API also supports 32000, 44100, 48000 Hz.
    speech_sample_rate=24000
)
```
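
A typo in a speaker name or sample rate is easier to catch before the service is constructed. The sketch below validates options against the values listed in this guide; `tts_settings` is a hypothetical helper, not a Pipecat or Sarvam function, and it only checks the four sample rates listed as generally supported.

```python
# Voice names copied from the Bulbul v3 lists above, lowercased as in the examples.
MALE_VOICES = {
    "shubh", "aditya", "rahul", "rohan", "amit", "dev", "ratan", "varun",
    "manan", "sumit", "kabir", "aayan", "ashutosh", "advait", "anand",
    "tarun", "sunny", "mani", "gokul", "vijay", "mohit", "rehan", "soham",
}
FEMALE_VOICES = {
    "ritu", "priya", "neha", "pooja", "simran", "kavya", "ishita", "shreya",
    "roopa", "tanya", "shruti", "suhani", "kavitha", "rupali",
}
SAMPLE_RATES = {8000, 16000, 22050, 24000}


def tts_settings(language_code, speaker="shubh", pace=None, speech_sample_rate=24000):
    """Validate TTS options and return them as keyword arguments."""
    voice = speaker.strip().lower()
    if voice not in MALE_VOICES | FEMALE_VOICES:
        raise ValueError(f"Unknown Bulbul v3 speaker: {speaker!r}")
    if speech_sample_rate not in SAMPLE_RATES:
        raise ValueError(f"Unsupported sample rate: {speech_sample_rate}")
    settings = {
        "target_language_code": language_code,
        "model": "bulbul:v3",
        "speaker": voice,
        "speech_sample_rate": speech_sample_rate,
    }
    if pace is not None:
        # Only include pace when the caller sets it explicitly.
        settings["pace"] = pace
    return settings


settings = tts_settings("hi-IN", speaker="Simran", pace=0.9)
```

The returned dictionary uses the same keyword names as the `SarvamTTSService` call above, so it could be unpacked with `**settings` alongside `api_key`.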

***

## Understanding the Pipeline

Pipecat uses a **pipeline architecture** where data flows through a series of processors:

```
Transport Input → STT → Context Aggregator (User) → LLM → TTS → Transport Output → Context Aggregator (Assistant)
```

1. **Transport Input**: Receives audio from the student
2. **STT (Speech-to-Text)**: Converts audio to text using Sarvam's Saaras v3 (transcription via `mode="transcribe"`, or translation to English via `mode="translate"`)
3. **Context Aggregator (User)**: Adds student's question to conversation context
4. **LLM**: Generates educational response using OpenAI
5. **TTS (Text-to-Speech)**: Converts response to audio using Sarvam's Bulbul
6. **Transport Output**: Sends audio back to the student
7. **Context Aggregator (Assistant)**: Saves tutor's response to context
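
The ordering of the stages can be illustrated with a toy model in plain Python. This is not how Pipecat is implemented (real processors exchange asynchronous frames, and the transport stages are left out here); every function below is a stand-in written for this illustration.

```python
def run_pipeline(processors, frame):
    """Pass a frame through each processor in order."""
    for processor in processors:
        frame = processor(frame)
    return frame


context = []  # Shared conversation history


def stt(frame):
    # Stand-in for speech recognition: the "audio" is already a string here.
    return {**frame, "text": frame["audio"]}


def user_aggregator(frame):
    # Record the student's question before the LLM runs.
    context.append({"role": "user", "content": frame["text"]})
    return frame


def llm(frame):
    # Stand-in for the model call: reply based on the latest user message.
    question = context[-1]["content"]
    return {**frame, "reply": f"Let's work through: {question}"}


def tts(frame):
    # Stand-in for speech synthesis.
    return {**frame, "speech": f"<audio:{frame['reply']}>"}


def assistant_aggregator(frame):
    # Record the tutor's reply after it has been produced.
    context.append({"role": "assistant", "content": frame["reply"]})
    return frame


result = run_pipeline(
    [stt, user_aggregator, llm, tts, assistant_aggregator],
    {"audio": "What is a prime number?"},
)
```

After the run, `context` holds one user message followed by one assistant message, which mirrors why the user aggregator sits before the LLM and the assistant aggregator sits at the end of the pipeline.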

***

## Pro Tips

* Use `language="unknown"` to support students who code-mix (Hinglish, Tanglish, etc.)
* Use a clear, articulate voice like `ishita` for teaching
* Set a slightly slower pace (0.9) for complex explanations
* Use `gpt-4o` for better reasoning on complex problems
* Encourage students to ask follow-up questions

***

## Troubleshooting

**API key errors**: Check that all keys are in your `.env` file and the file is in the same directory as your script.

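**Module not found**: Re-run the installation command from step 2, using the form that matches your shell and operating system, and confirm it ran in the same Python environment you use to start the agent.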

**Poor transcription**: Try `language="unknown"` for auto-detection, or specify the correct language code.

**Connection issues**: Ensure you have a stable internet connection and the transport is properly configured.

***

## Additional Resources

* [Sarvam AI Documentation](https://docs.sarvam.ai)
* [Pipecat Documentation](https://docs.pipecat.ai)
* [Pipecat GitHub Repository](https://github.com/pipecat-ai/pipecat)
* [Daily.co Documentation](https://docs.daily.co)

***

## Need Help?

* Sarvam Support: [developer@sarvam.ai](mailto:developer@sarvam.ai)
* Community: [Join the Discord Community](https://discord.com/invite/5rAsykttcs)

***

**Happy Building!**