
# Sarvam-30B

> Sarvam-30B is a 30B-parameter multilingual language model optimized for Indian languages, with strong reasoning, coding, and conversational capabilities.

**Sarvam-30B (Chat LLM)**

Sarvam-30B is a 30B-parameter Mixture-of-Experts reasoning model trained from scratch and optimized for Indian languages, with only 2.4B parameters active per token. It delivers strong reasoning, coding, and conversational capabilities while remaining efficient to deploy.

**Highlights:**

* **30B total parameters, 2.4B active** — efficient MoE architecture with Grouped Query Attention
* Pre-trained on **16 trillion tokens** across code, math, multilingual, and web data
* State-of-the-art Indian language performance across native and romanized scripts
* Optimized inference for H100, L40S, and Apple Silicon (MXFP4)
* OpenAI-compatible chat completions API

## At a Glance

|                       |                                                                                                                                                        |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Model ID**          | `sarvam-30b`                                                                                                                                           |
| **What it does**      | Chat LLM — reasoning, coding, and conversation with 30B total / 2.4B active parameters (MoE)                                                           |
| **Languages**         | 10 most-spoken Indian languages + English; native script, romanized, and code-mixed input                                                              |
| **APIs**              | [Chat Completions](/api-reference-docs/api-guides-tutorials/chat-completion/overview) (`/v1/chat/completions`, OpenAI-compatible, streaming supported) |
| **Input limits**      | 64K-token context window — [all limits](#limits)                                                                                                       |
| **Benchmarks**        | [Key Features](#key-features) and the [model blog](https://www.sarvam.ai/blogs/sarvam-30b-105b)                                                        |
| **Pricing**           | [Pricing page](/api-reference-docs/pricing)                                                                                                            |
| **Best for**          | Voice-agent pipelines, interactive chat, high-throughput workloads                                                                                     |
| **Known limitations** | [See below](#known-limitations)                                                                                                                        |

## Key Features

Sarvam-30B is trained on the 10 most-spoken Indian languages and supports native-script, romanized, and code-mixed input. It wins 89% of pairwise comparisons on Indian language benchmarks and 87% on STEM, math, and coding.

The model is a Mixture-of-Experts Transformer with 128 sparse experts and only 2.4B active parameters per token. This enables high throughput, with 3x–6x gains on H100, and local execution on Apple Silicon via MXFP4.

It achieves 97.0 on Math500, 92.1 on HumanEval, 92.7 on MBPP, and 88.3 on AIME 25 (96.7 with tools), exceeding typical expectations for models with similar active compute.

The model supports native tool calling, with strong performance on BrowseComp (35.5) and Tau2 (45.7) for web-search-driven tasks, planning, retrieval, and multi-step task execution.

## Learn More

For detailed information on architecture, training methodology, performance benchmarks, and inference optimizations, visit [our blog](https://www.sarvam.ai/blogs/sarvam-30b-105b).

## Model Specifications

* Model ID: `sarvam-30b`
* Total parameters: 30B (2.4B active per token)
* Architecture: MoE Transformer with GQA and 128 sparse experts
* Pre-training data: 16T tokens
* Temperature range: 0 to 2
* Top-p range: 0 to 1
* Supports streaming and non-streaming responses
* OpenAI-compatible chat completions format
* License: Apache 2.0

## Key Capabilities

In a single-turn interaction, the user asks a question and the model replies with a single, direct response.

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

response = client.chat.completions(
    model="sarvam-30b",
    messages=[
        {"role": "user", "content": "Explain the significance of the Indian monsoon season."}
    ],
    temperature=0.5,
    top_p=1,
    max_tokens=1000,
)

print(response.choices[0].message.content)
```

```javascript
import { SarvamAIClient } from "sarvamai";

const client = new SarvamAIClient({
    apiSubscriptionKey: "YOUR_SARVAM_API_KEY",
});

async function main() {
    const response = await client.chat.completions({
        model: "sarvam-30b",
        messages: [
            {
                role: "user",
                content: "Explain the significance of the Indian monsoon season.",
            },
        ],
        temperature: 0.5,
        top_p: 1,
        max_tokens: 1000,
    });

    console.log(response.choices[0].message.content);
}

main();
```

```bash
curl -X POST https://api.sarvam.ai/v1/chat/completions \
  -H "api-subscription-key: $SARVAM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain the significance of the Indian monsoon season."}
    ],
    "model": "sarvam-30b",
    "temperature": 0.5,
    "top_p": 1,
    "max_tokens": 1000
  }'
```
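For environments where the SDK is not installed, the curl call above can be mirrored with the Python standard library. The following is a minimal sketch, not an official client: the endpoint URL and the `api-subscription-key` header are taken from the curl example, while the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response shape is assumed to match what the SDK examples read (`choices[0].message.content`).

```python
import json
import urllib.request

# Endpoint taken from the curl example in this section.
API_URL = "https://api.sarvam.ai/v1/chat/completions"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a single-turn chat completion request."""
    payload = {
        "model": "sarvam-30b",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.5,
        "top_p": 1,
        "max_tokens": 1000,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=60) as http_response:
        body = json.load(http_response)
    # Assumption: the JSON body mirrors the SDK object used above,
    # i.e. choices[0].message.content holds the reply.
    return body["choices"][0]["message"]["content"]


# Usage (performs a real network call, so it is left commented out):
#   import os
#   request = build_chat_request(
#       "Explain the significance of the Indian monsoon season.",
#       os.environ["SARVAM_API_KEY"],
#   )
#   print(send_chat_request(request))
```

Reading the key from the `SARVAM_API_KEY` environment variable, as the curl example does, keeps the secret out of source files.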

A multi-turn conversation involves multiple exchanges between the system, user, and assistant. The full message history is sent with each request, so context is maintained across all turns for coherent and relevant responses.

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

response = client.chat.completions(
    model="sarvam-30b",
    messages=[
        {"role": "system", "content": "You are a knowledgeable assistant specializing in Indian history."},
        {"role": "user", "content": "Tell me about the Mughal Empire."},
        {"role": "assistant", "content": "The Mughal Empire was one of the largest and most powerful empires in Indian history, spanning from 1526 to 1857."},
        {"role": "user", "content": "Who was the greatest Mughal emperor and why?"}
    ],
    temperature=0.7,
    top_p=1,
    max_tokens=1000
)

print(response.choices[0].message.content)
```

```javascript
import { SarvamAIClient } from "sarvamai";

const client = new SarvamAIClient({
    apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

async function main() {
    const response = await client.chat.completions({
        model: "sarvam-30b",
        messages: [
            { role: "system", content: "You are a knowledgeable assistant specializing in Indian history." },
            { role: "user", content: "Tell me about the Mughal Empire." },
            { role: "assistant", content: "The Mughal Empire was one of the largest and most powerful empires in Indian history, spanning from 1526 to 1857." },
            { role: "user", content: "Who was the greatest Mughal emperor and why?" }
        ],
        temperature: 0.7,
        top_p: 1,
        max_tokens: 1000
    });
    
    console.log(response.choices[0].message.content);
}

main();
```

```bash
curl -X POST https://api.sarvam.ai/v1/chat/completions \
  -H "api-subscription-key: $SARVAM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a knowledgeable assistant specializing in Indian history."},
      {"role": "user", "content": "Tell me about the Mughal Empire."},
      {"role": "assistant", "content": "The Mughal Empire was one of the largest and most powerful empires in Indian history, spanning from 1526 to 1857."},
      {"role": "user", "content": "Who was the greatest Mughal emperor and why?"}
    ],
    "model": "sarvam-30b",
    "temperature": 0.7,
    "top_p": 1,
    "max_tokens": 1000
  }'
```
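The examples above hard-code the earlier turns. In an application, the history usually grows as the conversation proceeds. One way to manage that is sketched below, assuming a hypothetical `Conversation` helper and a caller-supplied `complete` function that wraps the actual API call; neither is part of the SDK.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the running message list that is resent on every turn."""

    def __init__(self, system_prompt: str) -> None:
        self.messages: List[Message] = [
            {"role": "system", "content": system_prompt}
        ]

    def ask(self, user_text: str, complete: Callable[[List[Message]], str]) -> str:
        # `complete` is a hypothetical callable: it receives the full
        # history and returns the assistant's reply as plain text.
        self.messages.append({"role": "user", "content": user_text})
        reply = complete(list(self.messages))
        # Store the reply so the next turn carries it as context.
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# With the SDK client from the examples above, `complete` could look like:
#   def complete(messages):
#       response = client.chat.completions(model="sarvam-30b", messages=messages)
#       return response.choices[0].message.content
```

Because every request carries the whole history, long conversations eventually approach the context window, so older turns may need to be trimmed or summarized.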

Stream responses token by token for real-time output. Streaming is ideal for chat interfaces and applications that require low time-to-first-token.

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

for chunk in client.chat.completions(
    model="sarvam-30b",
    messages=[
        {"role": "user", "content": "Write a short poem about the rivers of India."}
    ],
    temperature=0.7,
    max_tokens=500,
    stream=True,
):
    if chunk.choices:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)

print()
```

```javascript
import { SarvamAIClient } from "sarvamai";

const client = new SarvamAIClient({
    apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

async function main() {
    const stream = await client.chat.completions({
        model: "sarvam-30b",
        messages: [
            { role: "user", content: "Write a short poem about the rivers of India." }
        ],
        temperature: 0.7,
        max_tokens: 500,
        stream: true,
    });

    for await (const chunk of stream) {
        if (chunk.choices?.[0]?.delta?.content) {
            process.stdout.write(chunk.choices[0].delta.content);
        }
    }
    console.log();
}

main();
```

```bash
curl -X POST https://api.sarvam.ai/v1/chat/completions \
  -H "api-subscription-key: $SARVAM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a short poem about the rivers of India."}
    ],
    "model": "sarvam-30b",
    "temperature": 0.7,
    "max_tokens": 500,
    "stream": true
  }'
```
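When calling the endpoint over raw HTTP, as in the curl example, the client has to split the streamed body into chunks itself. The sketch below assumes the stream follows the OpenAI-style server-sent events framing (`data: {json}` lines, terminated by `data: [DONE]`). That framing is an assumption based on the API being OpenAI-compatible and is not confirmed on this page. The function name `iter_stream_text` is hypothetical, and the sample lines are hand-written for illustration, not captured output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        # Same field the SDK examples read: choices[0].delta.content
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content


# Hand-written sample lines in the assumed format.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Ganga "}}]}',
    'data: {"choices": [{"delta": {"content": "flows"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Ganga flows"
```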

## Limits

| Limit                                    | Value                                                                                          |
| ---------------------------------------- | ---------------------------------------------------------------------------------------------- |
| Context window                           | 64K tokens                                                                                     |
| `max_tokens`                             | Starter 4096 / Pro 8192 / Business 64000                                                       |
| `temperature`                            | 0–2 (default 0.5 with reasoning enabled, which is the default; 0.2 with reasoning disabled)    |
| `top_p`                                  | 0–1                                                                                            |
| `n` (completions per request)            | 1–128                                                                                          |
| `frequency_penalty` / `presence_penalty` | -2 to 2                                                                                        |
| `stop`                                   | Up to 4 sequences                                                                              |
| Rate limits                              | See [Rate Limits](/api-reference-docs/ratelimits)                                              |
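
Checking parameters on the client side before sending a request can surface limit violations early. The following is a minimal sketch that encodes the values from the table above. The function name `validate_params` and the lowercase plan keys are hypothetical, and the API's own validation remains authoritative.

```python
# max_tokens caps per plan, copied from the limits table.
MAX_TOKENS_BY_PLAN = {"starter": 4096, "pro": 8192, "business": 64000}


def validate_params(params: dict, plan: str = "starter") -> list:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []

    def check_range(name, low, high):
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside {low} to {high}")

    check_range("temperature", 0, 2)
    check_range("top_p", 0, 1)
    check_range("n", 1, 128)
    check_range("frequency_penalty", -2, 2)
    check_range("presence_penalty", -2, 2)

    max_tokens = params.get("max_tokens")
    cap = MAX_TOKENS_BY_PLAN[plan]
    if max_tokens is not None and max_tokens > cap:
        problems.append(f"max_tokens={max_tokens} exceeds the {plan} cap of {cap}")

    stop = params.get("stop")
    if isinstance(stop, (list, tuple)) and len(stop) > 4:
        problems.append(f"stop has {len(stop)} sequences; at most 4 are allowed")

    return problems


print(validate_params({"temperature": 2.5, "max_tokens": 5000}))
```

An empty list means the parameters fit the documented limits for the chosen plan.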

## Known Limitations

| Limitation                         | Detail                                                                                                                                                                                | Workaround                                                                                                    |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| **Thinking mode is on by default** | Reasoning (`reasoning_effort`, default `low`) is enabled by default, and reasoning tokens count toward completion tokens — a small `max_tokens` can be consumed entirely by reasoning | Increase `max_tokens` to leave room for the visible answer, or disable reasoning with `reasoning_effort=None` |
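
Both workarounds can be expressed as keyword arguments for `client.chat.completions`. The sketch below is illustrative only: `completion_options` is a hypothetical helper, and the default `reasoning_headroom` of 1024 tokens is an arbitrary placeholder, not a measured or recommended value.

```python
def completion_options(
    answer_tokens: int,
    reasoning_headroom: int = 1024,  # arbitrary illustrative default
    disable_reasoning: bool = False,
) -> dict:
    """Build token-budget keyword arguments for a chat completion call."""
    if disable_reasoning:
        # Workaround 2: turn reasoning off so the whole budget goes to the answer.
        return {"max_tokens": answer_tokens, "reasoning_effort": None}
    # Workaround 1: reserve extra tokens on top of the visible answer,
    # since reasoning tokens count toward completion tokens.
    return {"max_tokens": answer_tokens + reasoning_headroom}


print(completion_options(500))
print(completion_options(500, disable_reasoning=True))

# Usage with the SDK client from the examples above:
#   response = client.chat.completions(
#       model="sarvam-30b",
#       messages=[{"role": "user", "content": "..."}],
#       **completion_options(500, disable_reasoning=True),
#   )
```

The resulting `max_tokens` still has to stay within the plan cap listed under Limits.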

## Next Steps

* Learn how to integrate chat completion into your application.
* Read the complete API documentation for the chat completion endpoints.