Sarvam-M (Reasoning LLM)
Multilingual, hybrid-reasoning, text-only model built on Mistral-Small.
Post-trained for superior reasoning and Indic language support.
Performance Improvements:
- +20% on Indian language benchmarks
- +21.6% on math benchmarks
- +17.6% on programming benchmarks
- +86% on romanized Indian language GSM-8K benchmarks
Key Features:
- Hybrid Thinking Mode: Switch between "think" (reasoning, coding, math) and "non-think" (fast conversations).
- Advanced Indic Skills: Authentically trained on Indian languages & cultural contexts.
- Superior Reasoning: Outperforms similar-sized models on coding & math.
- Seamless Chat: Works across Indic scripts & romanized text.
Feature Details
Trained in 11 major Indic languages with support for native-script, romanized, and code-mixed inputs, tailored to everyday and formal Indian use cases.
Supports both "think" and "non-think" modes, excelling in math, logic, and code-related tasks with special training for improved reasoning and direct answers.
Uses compression to make responses faster, works well even on lower-cost hardware setups, and can handle many users at once without slowing down.
Looks up facts from Wikipedia when needed, gives more accurate answers for current or detailed topics, and works across English and Indian languages.
Outperforms leading models including Mistral Small 3, Gemma 3, and Llama models across Indian language benchmarks.
Maintains context across long conversations with 8192 token context length and intelligent reasoning capabilities.
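Chat completions can be sketched with nothing but the Python standard library. Note that the endpoint URL, model identifier, header names, and the `SARVAM_API_KEY` environment variable below are assumptions following OpenAI-style conventions; check Sarvam's API reference for the exact values. The recommended temperatures (0.2 non-thinking, 0.5 thinking) come from the specification section below.

```python
import json
import os
import urllib.request

# Hypothetical endpoint and model name (assumptions, not confirmed by this page).
API_URL = "https://api.sarvam.ai/v1/chat/completions"
MODEL = "sarvam-m"

def build_chat_request(user_message, thinking=False):
    """Build a chat-completions payload for a single-turn question."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        # Recommended temperatures: 0.2 for non-thinking, 0.5 for thinking mode.
        "temperature": 0.5 if thinking else 0.2,
    }
    if thinking:
        # Per the specs below, setting any reasoning_effort value
        # enables thinking mode.
        payload["reasoning_effort"] = "medium"
    return payload

def send(payload, api_key):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_request("What is the capital of Maharashtra?")
print(json.dumps(payload, indent=2))

api_key = os.environ.get("SARVAM_API_KEY")  # hypothetical variable name
if api_key:
    reply = send(payload, api_key)
    print(reply["choices"][0]["message"]["content"])
```

Because the request is a plain JSON chat-completions payload, the same sketch works with any OpenAI-compatible client library.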
Performance Benchmarks
Figure: Indic Vibe Check Benchmark results (chart not reproduced here).
Model Specifications
- Maximum context length: 8192 tokens
- Temperature range: 0 to 2
  - Non-thinking mode: 0.2 (recommended)
  - Thinking mode: 0.5 (recommended)
- Top-p range: 0 to 1
- Reasoning effort options: low, medium, high
  - Setting any value enables thinking mode
  - Higher values increase reasoning depth
- Enable wiki_grounding for factual queries
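The documented ranges above can be captured in a small helper that validates sampling parameters before a request is sent. The parameter names (`temperature`, `top_p`, `reasoning_effort`, `wiki_grounding`) follow the spec list; how they are nested in an actual request body is an assumption to verify against the API reference.

```python
def make_sampling_params(reasoning_effort=None, temperature=None,
                         top_p=1.0, wiki_grounding=False):
    """Assemble sampling parameters within the documented ranges."""
    # Setting any reasoning_effort value enables thinking mode.
    thinking = reasoning_effort is not None
    if thinking and reasoning_effort not in ("low", "medium", "high"):
        raise ValueError("reasoning_effort must be low, medium, or high")
    # Default to the recommended temperature for the active mode.
    if temperature is None:
        temperature = 0.5 if thinking else 0.2
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    params = {"temperature": temperature, "top_p": top_p}
    if thinking:
        params["reasoning_effort"] = reasoning_effort
    if wiki_grounding:
        # Recommended for factual queries.
        params["wiki_grounding"] = True
    return params

print(make_sampling_params())
# -> {'temperature': 0.2, 'top_p': 1.0}
print(make_sampling_params(reasoning_effort="high", wiki_grounding=True))
```

Keeping the mode-dependent temperature defaults in one place avoids accidentally running thinking mode at the lower non-thinking temperature.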
Key Capabilities
Basic Chat Completion
Simple, one-turn interaction where the user asks a question and the model replies with a single, direct response.
Multi-turn Conversation
Carries context across successive user and assistant turns within the 8192-token window.
Wiki Grounding
Looks up facts from Wikipedia so that answers to factual queries are grounded in retrieved content.
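The multi-turn capability listed above comes down to resending the accumulated message history on every turn, since chat-completions APIs are stateless. A minimal sketch, with `send_chat()` as a stand-in for a real API call:

```python
# Multi-turn loop: the full history is resent each turn so the model keeps
# context (up to the 8192-token limit). send_chat() is a placeholder; swap
# in an actual chat-completions request.

def send_chat(messages):
    """Stand-in for a real API call; returns a canned reply."""
    return f"(model reply to: {messages[-1]['content']})"

history = [{"role": "system", "content": "You are a helpful assistant."}]

for question in ["Who wrote Godaan?", "Summarise it in one line."]:
    history.append({"role": "user", "content": question})
    answer = send_chat(history)
    # Appending the assistant turn preserves context for follow-ups.
    history.append({"role": "assistant", "content": answer})

print(len(history))  # 5: one system turn + two user + two assistant
```

For long sessions, older turns must eventually be truncated or summarised to stay within the 8192-token context limit.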