Chat Completions Overview


Sarvam AI provides powerful chat completion APIs designed to build intelligent conversational AI experiences, with native support for Indian languages and deep contextual reasoning.

Our Chat Completion APIs support two chat models: sarvam-30b and sarvam-105b.

Choosing a Model

| | sarvam-30b | sarvam-105b |
|---|---|---|
| Context length | 64K tokens | 128K tokens |
| Latency | Lower: faster time-to-first-token, well suited for voice agents and interactive chat | Higher: prioritizes output quality over speed |
| Cost | Lower per token | Higher per token |
| Quality | Strong reasoning and Indic language support for everyday tasks | Highest quality for complex reasoning, coding, and long-form generation |
| Best for | Standard conversations, Q&A, voice-agent pipelines, high-throughput workloads | Complex multi-step reasoning, code generation, document analysis over long contexts |

Simply pass the model name as the model parameter (e.g., model="sarvam-105b").

Token budgeting: the context length covers everything — your messages, any reasoning_content the model produces in think mode, and the generated reply (capped by max_tokens, default 2048). Reasoning tokens are billed as completion tokens, so high reasoning_effort increases both latency and cost. For long conversations, trim or summarize older turns instead of resending the full history.
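As a rough illustration of trimming older turns, the sketch below keeps any system messages plus the most recent turns that fit a token budget. The helpers estimate_tokens and trim_history are hypothetical, and the four-characters-per-token estimate is a crude assumption rather than the model's tokenizer; it will likely undercount for Indic scripts, so treat usage.prompt_tokens from real responses as the source of truth.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep system messages plus the newest turns that fit within `budget` tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m.get("content") or "") for m in system)
    kept: list[dict] = []
    # Walk backwards so the newest turns are considered first.
    for message in reversed(turns):
        cost = estimate_tokens(message.get("content") or "")
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    # Restore chronological order before sending.
    return system + list(reversed(kept))
```

Leave headroom in the budget for max_tokens and for reasoning tokens, since both share the same context window.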

Authentication: like every Sarvam API, this endpoint uses the api-subscription-key header. It additionally accepts Authorization: Bearer <key> for OpenAI-compatible tooling — see Authentication for details.
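The two header styles can be compared with a small standard-library sketch. It builds a request without sending it; build_request is a hypothetical helper, and the URL is the one used in the cURL example on this page.

```python
import json
import urllib.request

API_URL = "https://api.sarvam.ai/v1/chat/completions"


def build_request(api_key: str, payload: dict, use_bearer: bool = False) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with either auth style."""
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        # OpenAI-compatible style
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        # Standard Sarvam style
        headers["api-subscription-key"] = api_key
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

Passing the returned object to urllib.request.urlopen would send it; in practice the SDK handles this for you.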

Sarvam-M (24B) has been deprecated and is no longer available through the Chat Completions API. Please migrate to Sarvam-30B or Sarvam-105B for improved performance.

Features

Hybrid Thinking Mode
  • Supports both “think” and “non-think” modes
  • Think mode for complex logical reasoning
  • Non-think mode for efficient conversations
  • Ideal for mathematical and coding tasks
Advanced Indic Skills
  • Post-trained on Indian languages
  • Native English proficiency
  • Authentic Indian cultural values
  • Rich understanding of local context
Superior Reasoning Capabilities
  • Outperforms similar-sized models
  • Strong performance on coding tasks
  • Excellent mathematical reasoning
  • Advanced problem-solving abilities
Seamless Chatting Experience
  • Full Indic script support
  • Romanized language support
  • Multilingual conversation handling
  • Natural language understanding

Code Examples

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)
response = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {"role": "user", "content": "Hey, what is the capital of India?"}
    ],
)
print(response)

Key Considerations
  • Reasoning effort options: low, medium, high

    • Thinking mode is on by default (low); pass reasoning_effort=None (Python) / reasoning_effort: null (JS, cURL) to disable it
    • Higher values increase reasoning depth
    • Reasoning tokens (returned as reasoning_content) count toward your completion tokens and bill — use lower effort or disable reasoning for latency- and cost-sensitive paths
  • Output length is capped by max_tokens (default 2048) — raise it for long-form generation

Because thinking mode is on by default, a low max_tokens (e.g. under a few hundred) can be consumed entirely by reasoning — you’ll get finish_reason: "length" with an empty content and only reasoning_content populated. Either keep max_tokens generous or disable reasoning with reasoning_effort=None for short replies.
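One way to catch this case is to check finish_reason and content together before using the reply. The sketch below assumes the response has been decoded into a plain dict with the shape shown under Success Response Structure; extract_reply is a hypothetical helper, and SDK response objects would use attribute access instead.

```python
def extract_reply(response: dict) -> str:
    """Return the reply text, or raise if the token budget ran out before any content."""
    choice = response["choices"][0]
    content = choice["message"].get("content") or ""
    if choice.get("finish_reason") == "length" and not content:
        raise ValueError(
            "Reply was cut off before any content was produced; "
            "raise max_tokens or disable reasoning."
        )
    return content
```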

Streaming

Set stream: true to receive the response incrementally over server-sent events instead of waiting for the full completion. This is essential for responsive chat UIs and voice-agent pipelines, where you want to start rendering (or speaking) the reply as soon as the first tokens arrive.

Both SDKs return an iterator of chat.completion.chunk objects. Each chunk carries a delta with the new portion of the message — delta.content for the reply text and, when reasoning is enabled, delta.reasoning_content for thinking tokens.

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

stream = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {"role": "user", "content": "Write a short poem about the monsoon."}
    ],
    stream=True,
)

for chunk in stream:
    # The final chunk reports usage and has no choices, so guard before indexing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Over raw HTTP, each event is a data: line containing a chat.completion.chunk JSON object. The final data chunk carries usage (with an empty choices array), and the stream ends with data: [DONE]:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"sarvam-105b","choices":[{"index":0,"delta":{"role":"assistant","content":"The"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"sarvam-105b","choices":[{"index":0,"delta":{"content":" rains"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"sarvam-105b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699000000,"model":"sarvam-105b","choices":[],"usage":{"prompt_tokens":19,"completion_tokens":3,"total_tokens":22}}
data: [DONE]

When reasoning is enabled (the default), thinking tokens stream first via delta.reasoning_content, followed by the reply via delta.content. Check both fields if you display reasoning to users.
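If you consume the raw event stream yourself, the accumulation logic might look like the following sketch. It takes an iterable of already-decoded text lines (for example from an HTTP client's line iterator) and collects reasoning, reply text, and usage separately; collect_stream is a hypothetical helper, not part of the SDK.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict:
    """Accumulate reasoning, reply text, and usage from raw `data:` lines."""
    reasoning, content, usage = [], [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        # The usage chunk has an empty choices array, so this loop is skipped for it.
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            reasoning.append(delta.get("reasoning_content") or "")
            content.append(delta.get("content") or "")
    return {"reasoning": "".join(reasoning), "content": "".join(content), "usage": usage}
```

A UI would typically render each delta as it arrives instead of joining at the end; the joined form is shown here only to keep the example short.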

Tool Calling (Function Calling)

Describe functions your application exposes with the tools parameter, and the model will decide when to call them — returning the function name and JSON arguments instead of (or alongside) a text reply. You execute the function yourself, append the result as a tool message, and call the API again so the model can produce its final answer.

The flow is:

  1. Send the conversation plus tools definitions.
  2. If the model wants a tool, the response has finish_reason: "tool_calls" and message.tool_calls with the function name and stringified JSON arguments.
  3. Run the function, append the assistant message and a {"role": "tool", "tool_call_id": ..., "content": ...} message with the result.
  4. Call the API again — the model answers using the tool output.

import json
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for an Indian city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. Mumbai"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Mumbai right now?"}]

response = client.chat.completions(
    model="sarvam-105b",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message

if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Run your actual function here
    weather = {"city": args["city"], "temperature": 31, "condition": "Humid"}

    messages.append(
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": tool_call.id,
                    "type": "function",
                    "function": {
                        "name": tool_call.function.name,
                        "arguments": tool_call.function.arguments,
                    },
                }
            ],
        }
    )
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(weather),
        }
    )

    final = client.chat.completions(
        model="sarvam-105b",
        messages=messages,
        tools=tools,
    )
    print(final.choices[0].message.content)

A tool-call response looks like:

{
  "choices": [
    {
      "index": 0,
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"Mumbai\", \"unit\": \"celsius\"}"
            }
          }
        ]
      }
    }
  ]
}

Controlling tool use with tool_choice

| Value | Behavior |
|---|---|
| "auto" (default when tools are provided) | The model decides whether to call a tool or reply directly |
| "none" | The model never calls a tool; tools are ignored |
| "required" | The model must call at least one tool |
| {"type": "function", "function": {"name": "get_weather"}} | Forces the model to call the named function |

function.arguments is a JSON string, not an object — always parse it (and validate against your schema) before executing the function.
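A minimal sketch of that parse-and-validate step is shown below. The registry layout, run_tool_call, and the stand-in get_weather are hypothetical; the checks cover only required and unexpected keys, so a fuller JSON Schema validator may suit production use better.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stand-in implementation; replace with a real lookup."""
    return {"city": city, "unit": unit, "temperature": 31}


# Hypothetical registry: tool name -> (callable, required keys, allowed keys)
TOOLS = {"get_weather": (get_weather, {"city"}, {"city", "unit"})}


def run_tool_call(name: str, arguments: str) -> str:
    """Parse and check the model's arguments, run the tool, and return a JSON string."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    func, required, allowed = TOOLS[name]
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON arguments: {exc}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    missing = required - set(args)
    unexpected = set(args) - allowed
    if missing or unexpected:
        return json.dumps(
            {"error": "bad arguments", "missing": sorted(missing), "unexpected": sorted(unexpected)}
        )
    return json.dumps(func(**args))
```

Returning the error as the tool message content, instead of raising, is one design option: it gives the model a chance to correct its arguments on the next turn.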

Structured Outputs (JSON)

The Chat Completions API supports the OpenAI-compatible response_format parameter for getting reliably structured JSON:

| response_format | Behavior |
|---|---|
| {"type": "json_schema", "json_schema": {...}} | Structured Outputs: output is constrained to match the JSON Schema you supply (recommended) |
| {"type": "json_object"} | JSON mode: output is guaranteed to be valid JSON, but not a specific schema |
| {"type": "text"} (default) | Plain text output |

Structured Outputs with json_schema

Pass a JSON Schema under json_schema.schema, and set "strict": true to enforce adherence. The structured reply arrives as a JSON string in message.content — parse it before use.

import json
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

response = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {
            "role": "user",
            "content": "Order: 2 masala dosas and 1 filter coffee to Koramangala, Bengaluru.",
        }
    ],
    request_options={
        "additional_body_parameters": {
            "response_format": {
                "type": "json_schema",
                "json_schema": {
                    "name": "food_order",
                    "strict": True,
                    "schema": {
                        "type": "object",
                        "properties": {
                            "items": {
                                "type": "array",
                                "items": {
                                    "type": "object",
                                    "properties": {
                                        "name": {"type": "string"},
                                        "quantity": {"type": "integer"},
                                    },
                                    "required": ["name", "quantity"],
                                    "additionalProperties": False,
                                },
                            },
                            "city": {"type": "string"},
                        },
                        "required": ["items", "city"],
                        "additionalProperties": False,
                    },
                },
            }
        }
    },
)

order = json.loads(response.choices[0].message.content)
print(order)
# {'items': [{'name': 'masala dosa', 'quantity': 2}, {'name': 'filter coffee', 'quantity': 1}], 'city': 'Bengaluru'}

In the current Python SDK, pass response_format through request_options={"additional_body_parameters": {...}} as shown above. The JavaScript SDK forwards response_format from the request object as-is.

The json_schema object accepts:

| Field | Type | Description |
|---|---|---|
| name | string (required) | Name of the response format. Alphanumeric characters, underscores and dashes only |
| schema | object | The output structure, described as a JSON Schema object |
| strict | boolean | Enable strict schema adherence when generating the output (default false) |
| description | string | What the format is for; helps the model decide how to respond |

JSON mode with json_object

When you only need valid JSON without enforcing a specific structure, use {"type": "json_object"} and describe the desired shape in your prompt:

curl -X POST https://api.sarvam.ai/v1/chat/completions \
  -H "api-subscription-key: $SARVAM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sarvam-105b",
    "messages": [
      {"role": "system", "content": "Reply with a JSON object: {\"sentiment\": \"positive\" | \"negative\" | \"neutral\", \"confidence\": number}"},
      {"role": "user", "content": "यह फिल्म शानदार थी!"}
    ],
    "response_format": {"type": "json_object"}
  }'

Even with Structured Outputs, validate the parsed JSON against your expected schema (e.g. with pydantic or zod) before acting on it — the schema constrains the model’s output shape, but your application logic may have stricter requirements (value ranges, business rules, etc.).
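As an illustration of such application-level checks, here is a dependency-free sketch for the food order structure used on this page. validate_order is a hypothetical helper and the 1 to 50 quantity range is an invented business rule; a pydantic model would express the same checks more compactly.

```python
def validate_order(order: dict) -> list[str]:
    """Return a list of problems; an empty list means the order passed these checks."""
    problems = []
    items = order.get("items")
    if not isinstance(items, list) or not items:
        problems.append("items must be a non-empty list")
        items = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"items[{i}] must be an object")
            continue
        name = item.get("name")
        if not isinstance(name, str) or not name.strip():
            problems.append(f"items[{i}].name must be a non-empty string")
        quantity = item.get("quantity")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(quantity, bool) or not isinstance(quantity, int) or not 1 <= quantity <= 50:
            problems.append(f"items[{i}].quantity must be an integer from 1 to 50")
    city = order.get("city")
    if not isinstance(city, str) or not city.strip():
        problems.append("city must be a non-empty string")
    return problems
```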

Alternative: Tool calling as a JSON schema

If your workflow is already built around tool calling, you can also get structured output by defining a single tool whose parameters schema describes the structure you want, and forcing it with tool_choice. The model’s arguments are then constrained to the schema.

import json
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

response = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {
            "role": "user",
            "content": "Order: 2 masala dosas and 1 filter coffee to Koramangala, Bengaluru.",
        }
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "extract_order",
                "description": "Extract a structured food order",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "items": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {"type": "string"},
                                    "quantity": {"type": "integer"},
                                },
                                "required": ["name", "quantity"],
                            },
                        },
                        "delivery_area": {"type": "string"},
                        "city": {"type": "string"},
                    },
                    "required": ["items", "city"],
                },
            },
        }
    ],
    tool_choice={"type": "function", "function": {"name": "extract_order"}},
)

arguments = response.choices[0].message.tool_calls[0].function.arguments
order = json.loads(arguments)
print(order)
# {'items': [{'name': 'masala dosa', 'quantity': 2}, {'name': 'filter coffee', 'quantity': 1}], 'delivery_area': 'Koramangala', 'city': 'Bengaluru'}

Alternative: Prompt-based JSON

For simple cases, you can also instruct the model to reply with JSON only, set a low temperature, and validate the output before using it (consider JSON mode instead, which guarantees valid JSON):

import json
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

response = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {
            "role": "system",
            "content": (
                "Reply with a single JSON object only: no prose, no markdown fences. "
                'Schema: {"sentiment": "positive" | "negative" | "neutral", "confidence": number}'
            ),
        },
        {"role": "user", "content": "यह फिल्म शानदार थी!"},
    ],
    temperature=0.1,
)

raw = response.choices[0].message.content
try:
    result = json.loads(raw)
except json.JSONDecodeError:
    # Retry, or strip markdown fences / extra text before parsing
    raise
print(result)

Always validate model-produced JSON against your expected schema (e.g. with pydantic or zod) and add a retry path — prompt-based JSON is good, but not guaranteed.
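A retry path with fence stripping could be sketched as follows. Both helpers are hypothetical; ask stands for any zero-argument callable that performs the API call and returns message.content as a string.

```python
import json
from typing import Callable

FENCE = "`" * 3  # markdown code-fence marker


def parse_json_reply(raw: str) -> dict:
    """Parse a reply as a JSON object, tolerating a surrounding markdown fence."""
    text = raw.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        # Drop the opening fence line (with any language tag) and the closing fence.
        first_newline = text.find("\n")
        text = text[first_newline + 1 : -len(FENCE)] if first_newline != -1 else ""
    value = json.loads(text)
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object")
    return value


def json_with_retry(ask: Callable[[], str], attempts: int = 3) -> dict:
    """Call `ask` until its reply parses as a JSON object, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return parse_json_reply(ask())
        except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
            last_error = exc
    raise RuntimeError(f"no valid JSON after {attempts} attempts") from last_error
```

Each retry is a billed request, so a small attempts value is usually enough.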

API Response Format

Success Response Structure

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "sarvam-105b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of India is New Delhi. It has been the capital since 1931."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 25,
    "total_tokens": 40
  }
}

Response Fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion request |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp when the completion was created |
| model | string | The model used for completion |
| choices[].index | integer | Index of the choice in the list |
| choices[].message.role | string | Always "assistant" |
| choices[].message.content | string | The generated text response (null when the model calls a tool) |
| choices[].message.reasoning_content | string | Thinking steps (only when reasoning_effort is set) |
| choices[].message.tool_calls | array | Tool invocations requested by the model (only when using tool calling) |
| choices[].finish_reason | string | Why generation stopped: "stop", "length", "tool_calls", "content_filter" |
| usage.prompt_tokens | integer | Tokens in the input prompt |
| usage.completion_tokens | integer | Tokens in the generated response |
| usage.total_tokens | integer | Total tokens used (prompt + completion) |

Error Responses

All errors return a JSON object with an error field (message, code, request_id). The full error-code table, retry guidance, and SDK exception reference live on the central Errors & Troubleshooting page.

Errors specific to this endpoint:

| HTTP Status | Error Code | When This Happens | What To Do |
|---|---|---|---|
| 400 | invalid_request_error | Missing messages array or missing model field | Include both model and a valid messages array with role/content |
| 422 | unprocessable_entity_error | Invalid model name or parameter values | Check temperature (0-2), model name, etc. |

from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

try:
    response = client.chat.completions(
        model="sarvam-105b",
        messages=[
            {"role": "user", "content": "What is the capital of India?"}
        ],
    )
    print(response.choices[0].message.content)
except ApiError as e:
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
    elif e.status_code == 403:
        print("Invalid API key. Check your credentials.")
    elif e.status_code == 422:
        print(f"Invalid parameters: {e.body}")
    elif e.status_code == 429:
        print("Rate limit exceeded. Wait and retry.")
    else:
        print(f"Error {e.status_code}: {e.body}")
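
For the 429 case, "wait and retry" can be automated with exponential backoff. The sketch below is a generic wrapper, not part of the SDK: call_with_backoff is a hypothetical helper that retries any exception exposing a status_code attribute (as ApiError does in the example above), and treating only 429 as retryable is an assumption you may want to widen.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRYABLE_STATUS = {429}  # assumption: only rate-limit errors are retried


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == max_attempts - 1:
                raise  # not retryable, or out of attempts
            # Delays grow roughly as 1s, 2s, 4s, with up to 0.5s of random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise RuntimeError("max_attempts must be at least 1")
```

Usage would wrap the request in a lambda, for example call_with_backoff(lambda: client.chat.completions(model="sarvam-105b", messages=messages)).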

Limits

| Limit | Value |
|---|---|
| Context window | 64K tokens (sarvam-30b) / 128K tokens (sarvam-105b) |
| max_tokens | sarvam-30b: Starter 4096 / Pro 8192 / Business 64000; sarvam-105b: Starter 4096 / Pro 16384 / Business 128000 (reasoning tokens count toward completion tokens) |
| temperature | 0–2 (default 0.5 when reasoning is enabled, which is the default, and 0.2 when reasoning is disabled) |
| top_p | 0–1 |
| n (completions per request) | 1–128 |
| frequency_penalty / presence_penalty | -2 to 2 |
| stop | Up to 4 sequences |
| Rate limits | See Rate Limits |

Check out our detailed API Reference to explore Chat Completion and all available options.