Sarvam-105B (Flagship Chat LLM)
Sarvam AI’s flagship Mixture-of-Experts reasoning model, trained from scratch with Multi-head Latent Attention (MLA) for efficient long-context inference. It matches or outperforms most open- and closed-source frontier models of its class across knowledge, reasoning, and agentic benchmarks.
Highlights:
- 105B+ total parameters — our most capable MoE model with Multi-head Latent Attention
- Pre-trained on 12 trillion tokens across code, math, multilingual, and web data
- 98.6 on Math500, 88.3 on AIME 25 (96.7 with tools), 49.5 on BrowseComp
- State-of-the-art Indian language performance: wins 90% of pairwise comparisons
- Powers Indus, Sarvam’s AI assistant for complex reasoning and agentic workflows
- OpenAI-compatible chat completions API | Apache 2.0 open-source
Key Features
- Indian language performance: Wins 90% of pairwise comparisons across Indian language benchmarks and 84% on STEM, math, and coding. Trained extensively on native-script, romanized, and code-mixed inputs across the 10 most-spoken Indian languages.
- Math and reasoning: 98.6 on Math500, 88.3 on AIME 25 (96.7 with tools), 85.8 on HMMT, and 69.1 on Beyond AIME, reflecting deep multi-step reasoning and complex mathematical problem solving.
- Agentic and tool use: 49.5 on BrowseComp and 68.3 on Tau2 (avg.), the highest among the compared models. Optimized for tool use, long-horizon reasoning, and environment interaction in real-world workflows.
- Architecture: Mixture-of-Experts Transformer with 128 sparse experts and Multi-head Latent Attention (MLA), a compressed attention formulation that reduces memory requirements for long-context inference (see the sketch below).
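To make the MLA idea concrete, here is a toy NumPy sketch of latent key-value compression: only a low-rank latent is cached per token, and full per-head keys and values are reconstructed from it at attention time. All dimensions and the resulting cache ratio are illustrative assumptions, not Sarvam-105B's actual hyperparameters, and the sketch omits details such as rotary-embedding handling and the query path.

```python
# Toy sketch of MLA-style latent KV compression (all dimensions are
# illustrative assumptions, not Sarvam-105B's actual hyperparameters).
import numpy as np

d_model, d_latent, n_heads, d_head, seq_len = 1024, 128, 8, 64, 16
rng = np.random.default_rng(0)

# Down-projection to a shared latent, plus per-head up-projections.
W_dkv = rng.standard_normal((d_model, d_latent)) * 0.02
W_uk = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02
W_uv = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02

h = rng.standard_normal((seq_len, d_model))  # hidden states for one sequence

# Only this compressed latent needs to be cached per token ...
c_kv = h @ W_dkv  # (seq_len, d_latent)

# ... and full keys/values are reconstructed at attention time.
k = (c_kv @ W_uk).reshape(seq_len, n_heads, d_head)
v = (c_kv @ W_uv).reshape(seq_len, n_heads, d_head)

full_cache = 2 * n_heads * d_head  # standard MHA caches K and V per head
print(f"floats cached per token: {d_latent} (MLA) vs {full_cache} (MHA)")
```

The payoff is a KV cache that scales with the latent width rather than with the number of heads, which is what makes long-context inference cheaper.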
Learn More
For detailed information on architecture, training methodology, performance benchmarks, and inference optimizations, visit our blog.
Model Specifications
- Model ID: sarvam-105b
- Total Parameters: 105B+ with MoE architecture and 128 sparse experts
- Attention: Multi-head Latent Attention (MLA)
- Pre-training Data: 12T tokens
- Temperature range: 0 to 2
- Top-p range: 0 to 1
- Supports streaming and non-streaming responses
- OpenAI-compatible chat completions format (see the request sketch after this list)
- License: Apache 2.0
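For reference, here is a hedged sketch of what a raw request in the OpenAI-compatible format might look like with the parameters above. The endpoint URL, auth header, and environment-variable name are placeholder assumptions, not confirmed values; consult the API reference for the real ones.

```python
# Hedged sketch of a raw OpenAI-compatible request; the endpoint URL and
# env-var name are placeholder assumptions, not confirmed values.
import os

import requests

resp = requests.post(
    "https://api.sarvam.ai/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": f"Bearer {os.environ['SARVAM_API_KEY']}"},
    json={
        "model": "sarvam-105b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.7,  # supported range: 0 to 2
        "top_p": 0.9,        # supported range: 0 to 1
        "stream": False,     # both streaming and non-streaming work
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```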
Choosing Between Sarvam Models
Choose Sarvam-30B for a balanced performance-to-cost ratio and real-time conversational workloads, and Sarvam-105B when you need the highest-quality outputs for complex reasoning and agentic tasks. Sarvam-M (24B) is still available as a legacy model.
Key Capabilities
Basic Chat Completion
Simple, single-turn interaction where the user asks a question and the model replies with its highest-quality response, drawing on its 105B-parameter knowledge.
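A minimal sketch using the official openai Python SDK pointed at an OpenAI-compatible endpoint. The base URL and environment-variable name are placeholder assumptions; substitute the values for your account.

```python
# Single-turn completion via the openai SDK; base_url and env-var name
# are placeholder assumptions -- substitute your actual endpoint and key.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sarvam.ai/v1",   # placeholder endpoint
    api_key=os.environ["SARVAM_API_KEY"],  # assumed env-var name
)

response = client.chat.completions.create(
    model="sarvam-105b",
    messages=[
        {"role": "user", "content": "Explain mixture-of-experts in two sentences."}
    ],
    temperature=0.7,  # supported range: 0 to 2
    top_p=0.9,        # supported range: 0 to 1
)
print(response.choices[0].message.content)
```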
Multi-turn Conversation
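Carry context across turns by resending the accumulated message history; the model conditions each reply on the full conversation so far. A minimal sketch under the same placeholder endpoint assumptions:

```python
# Multi-turn sketch: conversation state lives client-side in `messages`,
# and the full history is resent on every call. Same placeholder
# endpoint/key assumptions as the single-turn example.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sarvam.ai/v1",   # placeholder endpoint
    api_key=os.environ["SARVAM_API_KEY"],  # assumed env-var name
)

messages = [{"role": "user", "content": "Summarize the plot of the Ramayana."}]
first = client.chat.completions.create(model="sarvam-105b", messages=messages)
messages.append(
    {"role": "assistant", "content": first.choices[0].message.content}
)

# The follow-up can use pronouns; the model resolves them from history.
messages.append({"role": "user", "content": "Now retell it as a haiku."})
second = client.chat.completions.create(model="sarvam-105b", messages=messages)
print(second.choices[0].message.content)
```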
Streaming
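Receive tokens incrementally as they are generated, which reduces perceived latency for interactive use. A minimal sketch, again with placeholder endpoint details:

```python
# Streaming sketch: stream=True makes the SDK yield incremental deltas.
# Same placeholder endpoint/key assumptions as the earlier examples.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sarvam.ai/v1",   # placeholder endpoint
    api_key=os.environ["SARVAM_API_KEY"],  # assumed env-var name
)

stream = client.chat.completions.create(
    model="sarvam-105b",
    messages=[{"role": "user", "content": "Write a short poem about the monsoon."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. the final one) may carry no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```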