Sarvam-M (Reasoning LLM)
Multilingual, hybrid-reasoning, text-only model built on Mistral-Small.
Post-trained for superior reasoning and Indic language support.
Performance Improvements:
- +20% on Indian language benchmarks
- +21.6% on math benchmarks
- +17.6% on programming benchmarks
- +86% on romanized Indian language GSM-8K benchmarks
Key Features:
- Hybrid Thinking Mode: Switch between "think" (reasoning, coding, math) and "non-think" (fast conversations).
- Advanced Indic Skills: Authentically trained on Indian languages & cultural contexts.
- Superior Reasoning: Outperforms similar-sized models on coding & math.
- Seamless Chat: Works across Indic scripts & romanized text.
Feature Details
Trained in 11 major Indic languages with support for native-script, romanized, and code-mixed inputs, tailored to everyday and formal Indian use cases.
Supports both "think" and "non-think" modes, excelling in math, logic, and code-related tasks with special training for improved reasoning and direct answers.
Uses compression to make responses faster, works well even on lower-cost hardware setups, and can handle many users at once without slowing down.
Looks up facts from Wikipedia when needed, gives more accurate answers for current or detailed topics, and works across English and Indian languages.
Outperforms leading models including Mistral Small 3, Gemma 3, and Llama models across Indian language benchmarks.
Maintains context across long conversations with 8192 token context length and intelligent reasoning capabilities.
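Chat completions can be sketched with nothing but the Python standard library. Note that the endpoint URL, model identifier, header names, and the `SARVAM_API_KEY` environment variable below are assumptions following OpenAI-style conventions; check Sarvam's API reference for the exact values. The recommended temperatures (0.2 non-thinking, 0.5 thinking) come from the specification section below.

```python
import json
import os
import urllib.request

# Hypothetical endpoint and model name (assumptions, not confirmed by this page).
API_URL = "https://api.sarvam.ai/v1/chat/completions"
MODEL = "sarvam-m"

def build_chat_request(user_message, thinking=False):
    """Build a chat-completions payload for a single-turn question."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        # Recommended temperatures: 0.2 for non-thinking, 0.5 for thinking mode.
        "temperature": 0.5 if thinking else 0.2,
    }
    if thinking:
        # Per the specs below, setting any reasoning_effort value
        # enables thinking mode.
        payload["reasoning_effort"] = "medium"
    return payload

def send(payload, api_key):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_request("What is the capital of Maharashtra?")
print(json.dumps(payload, indent=2))

api_key = os.environ.get("SARVAM_API_KEY")  # hypothetical variable name
if api_key:
    reply = send(payload, api_key)
    print(reply["choices"][0]["message"]["content"])
```

Because the request is a plain JSON chat-completions payload, the same sketch works with any OpenAI-compatible client library.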
Performance Benchmarks
Figure: Indic Vibe Check Benchmark results (chart not reproduced here).
Model Specifications
- Maximum context length: 8192 tokens
- Temperature range: 0 to 2
  - Non-thinking mode: 0.2 (recommended)
  - Thinking mode: 0.5 (recommended)
- Top-p range: 0 to 1
- Reasoning effort options: low, medium, high
  - Setting any value enables thinking mode
  - Higher values increase reasoning depth
- Enable wiki_grounding for factual queries
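The documented ranges above can be captured in a small helper that validates sampling parameters before a request is sent. The parameter names (`temperature`, `top_p`, `reasoning_effort`, `wiki_grounding`) follow the spec list; how they are nested in an actual request body is an assumption to verify against the API reference.

```python
def make_sampling_params(reasoning_effort=None, temperature=None,
                         top_p=1.0, wiki_grounding=False):
    """Assemble sampling parameters within the documented ranges."""
    # Setting any reasoning_effort value enables thinking mode.
    thinking = reasoning_effort is not None
    if thinking and reasoning_effort not in ("low", "medium", "high"):
        raise ValueError("reasoning_effort must be low, medium, or high")
    # Default to the recommended temperature for the active mode.
    if temperature is None:
        temperature = 0.5 if thinking else 0.2
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    params = {"temperature": temperature, "top_p": top_p}
    if thinking:
        params["reasoning_effort"] = reasoning_effort
    if wiki_grounding:
        # Recommended for factual queries.
        params["wiki_grounding"] = True
    return params

print(make_sampling_params())
# -> {'temperature': 0.2, 'top_p': 1.0}
print(make_sampling_params(reasoning_effort="high", wiki_grounding=True))
```

Keeping the mode-dependent temperature defaults in one place avoids accidentally running thinking mode at the lower non-thinking temperature.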
Key Capabilities
Basic Chat Completion
Simple, one-turn interaction where the user asks a question and the model replies with a single, direct response.
Multi-turn Conversation
Carries context across successive user and assistant turns within the 8192-token window.
Wiki Grounding
Looks up facts from Wikipedia so that answers to factual queries are grounded in retrieved content.
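The multi-turn capability listed above comes down to resending the accumulated message history on every turn, since chat-completions APIs are stateless. A minimal sketch, with `send_chat()` as a stand-in for a real API call:

```python
# Multi-turn loop: the full history is resent each turn so the model keeps
# context (up to the 8192-token limit). send_chat() is a placeholder; swap
# in an actual chat-completions request.

def send_chat(messages):
    """Stand-in for a real API call; returns a canned reply."""
    return f"(model reply to: {messages[-1]['content']})"

history = [{"role": "system", "content": "You are a helpful assistant."}]

for question in ["Who wrote Godaan?", "Summarise it in one line."]:
    history.append({"role": "user", "content": question})
    answer = send_chat(history)
    # Appending the assistant turn preserves context for follow-ups.
    history.append({"role": "assistant", "content": answer})

print(len(history))  # 5: one system turn + two user + two assistant
```

For long sessions, older turns must eventually be truncated or summarised to stay within the 8192-token context limit.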