Sarvam-30B (Chat LLM)
A 30B-parameter Mixture-of-Experts reasoning model trained from scratch and optimized for Indian languages, with only 2.4B active parameters per token. It delivers strong reasoning, coding, and conversational capabilities while remaining efficient to deploy.
Highlights:
- 30B total parameters, 2.4B active — efficient MoE architecture with Grouped Query Attention
- Pre-trained on 16 trillion tokens across code, math, multilingual, and web data
- State-of-the-art Indian language performance across native and romanized scripts
- Optimized inference for H100, L40S, and Apple Silicon (MXFP4)
- OpenAI-compatible chat completions API
Key Features
Trained on the 10 most-spoken Indian languages with support for native script, romanized, and code-mixed inputs. Wins 89% of pairwise comparisons on Indian language benchmarks and 87% on STEM, math, and coding.
Mixture-of-Experts Transformer with 128 sparse experts and only 2.4B active parameters per token, enabling high throughput with 3x–6x gains on H100 and local execution on Apple Silicon via MXFP4.
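The sparse-expert idea above can be illustrated with a toy routing sketch: a router scores every expert and only the highest-scoring few run for each token, so per-token compute scales with the number of experts selected rather than the total. The scores and the top-k value here are illustrative assumptions, not Sarvam-30B's actual router.

```python
# Toy Mixture-of-Experts routing sketch. The random scores stand in for
# router logits; top_k=4 is an assumption for illustration, not the
# model's real configuration.
import random


def route_token(num_experts: int = 128, top_k: int = 4, seed: int = 0):
    """Pick the top_k highest-scoring experts for one token."""
    rng = random.Random(seed)
    scores = [rng.random() for _ in range(num_experts)]  # stand-in for router logits
    # Sort expert indices by score, keep the best top_k.
    return sorted(range(num_experts), key=lambda i: scores[i], reverse=True)[:top_k]


experts = route_token()
# Only the selected experts (plus shared layers) execute for this token,
# which is why a 30B-total model can run with ~2.4B active parameters.
```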
Achieves 97.0 on Math500, 92.1 on HumanEval, 92.7 on MBPP, and 88.3 on AIME 25 (96.7 with tools) — exceeding typical expectations for models with similar active compute.
Native tool calling with strong performance on BrowseComp (35.5) and Tau2 (45.7) for web-search-driven tasks, planning, retrieval, and multi-step task execution.
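Since the model exposes native tool calling through the OpenAI-compatible format, a request that offers the model a tool can be sketched as below. The `get_weather` tool is a hypothetical example; only the message and tool schema follow the OpenAI chat-completions convention the model card advertises.

```python
# Sketch of an OpenAI-compatible tool-calling request body for sarvam-30b.
# The get_weather tool is illustrative, not a built-in capability.
import json

payload = {
    "model": "sarvam-30b",
    "messages": [{"role": "user", "content": "What's the weather in Mumbai?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
body = json.dumps(payload)  # serialized request body to POST to the API
```

When the model decides to use the tool, the response carries a `tool_calls` entry whose arguments the client executes before sending the result back as a `tool` role message.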
Learn More
For detailed information on architecture, training methodology, performance benchmarks, and inference optimizations, visit our blog.
Model Specifications
- Model ID: sarvam-30b
- Total Parameters: 30B (2.4B active per token)
- Architecture: MoE Transformer with GQA and 128 sparse experts
- Pre-training Data: 16T tokens
- Temperature range: 0 to 2
- Top-p range: 0 to 1
- Supports streaming and non-streaming responses
- OpenAI-compatible chat completions format
- License: Apache 2.0
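The spec list above notes streaming support in the OpenAI-compatible format, where the server sends `data: {...}` event lines ending with a `[DONE]` sentinel. A minimal sketch of reassembling the streamed content, using hard-coded sample lines in place of a live HTTP response:

```python
# Parse OpenAI-style server-sent-event lines from a streaming chat
# completion. The sample lines below stand in for a real response body.
import json


def iter_stream_content(lines):
    """Yield content deltas from 'data: {...}' SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta


sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_content(sample)))  # prints "Hello"
```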
Key Capabilities
- Basic Chat Completion: a simple, single-turn interaction where the user asks a question and the model replies with a single, direct response.
- Multi-turn Conversation
- Streaming
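A basic chat completion can be sketched as a single POST in the OpenAI-compatible format. The base URL and API key below are placeholder assumptions; substitute your actual deployment. Temperature and top-p stay inside the documented 0–2 and 0–1 ranges.

```python
# Minimal non-streaming chat-completion request for sarvam-30b.
# BASE_URL and API_KEY are placeholders, not real credentials.
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "sarvam-30b",
    "messages": [
        {"role": "user", "content": "Namaste! Introduce yourself in one sentence."}
    ],
    "temperature": 0.7,  # within the documented 0-2 range
    "top_p": 0.9,        # within the documented 0-1 range
    "stream": False,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# Uncomment to send against a real deployment:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```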