Sarvam-30B (Chat LLM)

A 30B parameter Mixture-of-Experts reasoning model trained from scratch, optimized for Indian languages with only 2.4B active parameters per token. Delivers strong reasoning, coding, and conversational capabilities while remaining efficient to deploy.

Highlights:

  • 30B total parameters, 2.4B active — efficient MoE architecture with Grouped Query Attention
  • Pre-trained on 16 trillion tokens across code, math, multilingual, and web data
  • State-of-the-art Indian language performance across native and romanized scripts
  • Optimized inference for H100, L40S, and Apple Silicon (MXFP4)
  • OpenAI-compatible chat completions API
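
Because the endpoint follows the OpenAI chat-completions format, a request body built for any OpenAI-compatible client works unchanged. A minimal sketch of assembling such a payload (the helper name and default values below are illustrative, not part of the SDK):

```python
def build_chat_request(user_text, temperature=0.5, max_tokens=256):
    """Build an OpenAI-style chat-completions payload for sarvam-30b.

    Illustrative helper: the field names follow the standard
    chat-completions schema the docs say is supported.
    """
    return {
        "model": "sarvam-30b",
        "messages": [{"role": "user", "content": user_text}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Namaste! Aap kaise hain?")
print(payload["model"])  # sarvam-30b
```

The same payload can then be POSTed to the chat completions endpoint or passed through the official SDK shown later in this page.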

Key Features

Strong Indian Language Support

Trained on the 10 most-spoken Indian languages with support for native script, romanized, and code-mixed inputs. Wins 89% of pairwise comparisons on Indian language benchmarks and 87% on STEM, math, and coding.
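
For example, the same single-turn question can be phrased in native script, romanized Hindi, or code-mixed Hinglish, and all three are sent through the identical messages format (the phrasings below are illustrative):

```python
# One question, three input styles the model accepts.
prompts = {
    "native":     "भारतीय मानसून का महत्व समझाइए।",           # Devanagari script
    "romanized":  "Bharatiya mansoon ka mahatva samjhaiye.",    # romanized Hindi
    "code_mixed": "Monsoon season India ke liye important kyun hai?",  # Hinglish
}

def as_messages(text):
    """Wrap a prompt as a single-turn chat message list."""
    return [{"role": "user", "content": text}]

# Each variant would be sent the same way, e.g.:
# client.chat.completions(model="sarvam-30b",
#                         messages=as_messages(prompts["romanized"]))
for form, text in prompts.items():
    print(form, len(as_messages(text)))  # each produces one user message
```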

Efficient MoE Architecture

Mixture-of-Experts Transformer with 128 sparse experts and only 2.4B active parameters per token, enabling high throughput with 3x–6x gains on H100 and local execution on Apple Silicon via MXFP4.
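
To illustrate why only a fraction of the parameters is active per token, here is a toy top-k routing sketch in pure Python. The value of k and the logits are illustrative; the real router configuration that yields 2.4B active parameters is not documented here.

```python
import math

NUM_EXPERTS = 128  # matches the documented expert count

def route(router_logits, k=8):
    """Pick the top-k experts by router score and renormalize their gates.

    Only these k experts run for the token, so only their parameters
    count as "active" compute.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    exps = {i: math.exp(router_logits[i]) for i in top}
    z = sum(exps.values())
    return {i: w / z for i, w in exps.items()}  # gate weights sum to 1

# Synthetic router scores for one token (a permutation, so all distinct).
logits = [0.01 * ((i * 37) % NUM_EXPERTS) for i in range(NUM_EXPERTS)]
gates = route(logits, k=8)
print(len(gates))                      # 8 experts active out of 128
print(round(sum(gates.values()), 6))   # 1.0
```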

Reasoning & Coding

Achieves 97.0 on Math500, 92.1 on HumanEval, 92.7 on MBPP, and 88.3 on AIME 25 (96.7 with tools) — exceeding typical expectations for models with similar active compute.

Agentic Capabilities

Native tool calling with strong performance on BrowseComp (35.5) and Tau2 (45.7) for web-search-driven tasks, planning, retrieval, and multi-step task execution.
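
As a sketch of what native tool calling could look like, the snippet below defines a hypothetical `web_search` tool using the common OpenAI-style `tools` schema; the exact schema the SDK accepts should be confirmed against the official docs.

```python
def web_search_tool():
    """Hypothetical tool definition in the OpenAI-style function schema."""
    return {
        "type": "function",
        "function": {
            "name": "web_search",  # hypothetical tool name
            "description": "Search the web and return top results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string",
                              "description": "The search query."},
                },
                "required": ["query"],
            },
        },
    }

tools = [web_search_tool()]
# A request would pass these alongside the messages, e.g.:
# client.chat.completions(model="sarvam-30b", messages=..., tools=tools)
print(tools[0]["function"]["name"])  # web_search
```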

Learn More

For detailed information on architecture, training methodology, performance benchmarks, and inference optimizations, visit our blog.

Model Specifications

Key Considerations
  • Model ID: sarvam-30b
  • Total Parameters: 30B (2.4B active per token)
  • Architecture: MoE Transformer with GQA and 128 sparse experts
  • Pre-training Data: 16T tokens
  • Temperature range: 0 to 2
  • Top-p range: 0 to 1
  • Supports streaming and non-streaming responses
  • OpenAI-compatible chat completions format
  • License: Apache 2.0
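
The documented sampling ranges can be enforced client-side before a request is built. A small sketch (the helper is illustrative, not an SDK function):

```python
def sampling_params(temperature=0.5, top_p=1.0):
    """Validate sampling parameters against the documented ranges:
    temperature in [0, 2], top_p in [0, 1]."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be in [0, 1]")
    return {"temperature": temperature, "top_p": top_p}

print(sampling_params(temperature=0.7))  # {'temperature': 0.7, 'top_p': 1.0}
```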

Key Capabilities

A simple, one-turn interaction: the user asks a question and the model replies with a single, direct response.

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

response = client.chat.completions(
    model="sarvam-30b",
    messages=[
        {"role": "user", "content": "Explain the significance of the Indian monsoon season."}
    ],
    temperature=0.5,
    top_p=1,
    max_tokens=1000,
)

print(response.choices[0].message.content)
```

Next Steps