How to adjust the model’s thinking level with reasoning_effort

The reasoning_effort parameter controls how much effort the model puts into reasoning and planning its response.

  • Higher effort → more thoughtful, step-by-step, or structured answers
  • Lower effort → faster, simpler replies

Allowed values:

| Value | Behavior |
|-------|----------|
| "low" | Quick, simple replies |
| "medium" | Balanced depth and speed (default value) |
| "high" | More detailed reasoning and structured answers |
| None | Disables reasoning completely |
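
The allowed values above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the SDK: `build_chat_kwargs`, `ALLOWED_EFFORTS`, and the `_UNSET` sentinel are hypothetical names introduced here for illustration. It assumes that omitting the parameter gives the default ("medium") and that an explicit None disables reasoning, as the table states.

```python
from typing import Any, Dict, List

# The four values listed in the table above; None disables reasoning.
ALLOWED_EFFORTS = ("low", "medium", "high", None)

# Hypothetical sentinel meaning "the caller did not pass reasoning_effort".
# It keeps "omitted" (default, "medium") distinct from an explicit None.
_UNSET = object()


def build_chat_kwargs(
    messages: List[Dict[str, str]],
    model: str = "sarvam-105b",
    reasoning_effort: Any = _UNSET,
) -> Dict[str, Any]:
    """Build keyword arguments for a chat completion call (illustrative helper)."""
    kwargs: Dict[str, Any] = {"model": model, "messages": messages}
    if reasoning_effort is _UNSET:
        # Leave the parameter out entirely so the default applies.
        return kwargs
    if reasoning_effort not in ALLOWED_EFFORTS:
        raise ValueError(
            f"reasoning_effort must be one of {ALLOWED_EFFORTS}, got {reasoning_effort!r}"
        )
    kwargs["reasoning_effort"] = reasoning_effort
    return kwargs
```

With a client created as in the examples on this page, the result could be passed along as `client.chat.completions(**build_chat_kwargs(messages, reasoning_effort="low"))`.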

💡 Tips:

  • Use "low" when you want short, direct responses.
  • Use "medium" for balanced performance (this is the default).
  • Use "high" for tasks such as explanations, problem solving, and multi-step reasoning.
  • Use None to completely disable reasoning when you want the fastest possible responses.
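
The tips above can be captured in a small lookup so that application code picks a value consistently. This is only a sketch: the task labels and the `pick_reasoning_effort` helper are hypothetical and not part of the API; only the four returned values come from this page.

```python
from typing import Optional

# Hypothetical task labels mapped to the values the tips recommend.
EFFORT_BY_TASK = {
    "lookup": None,            # fastest possible response, reasoning disabled
    "short_answer": "low",     # short, direct responses
    "general": "medium",       # balanced performance (the default)
    "explanation": "high",     # explanations
    "problem_solving": "high", # problem solving and reasoning
}


def pick_reasoning_effort(task: str) -> Optional[str]:
    """Return a suggested reasoning_effort; unknown labels fall back to "medium"."""
    return EFFORT_BY_TASK.get(task, "medium")
```

The fallback to "medium" mirrors the documented default, so an unrecognised label behaves the same as leaving the parameter unset.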

Note:

  • A higher reasoning effort may increase response time slightly, because the model spends more time reasoning before it answers.

First, install the SDK:

$ pip install -Uqq sarvamai

Then use the following Python code:

from sarvamai import SarvamAI

# Initialize the SarvamAI client with your API key
client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Example 1: reasoning_effort not specified, so it defaults to "medium"
response = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the story of the Ramayana."},
    ],
)

# Print the assistant's reply
print(response.choices[0].message.content)

from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Example 2: reasoning_effort="high" for a more detailed, thoughtful response
response = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the story of the Ramayana."},
    ],
    reasoning_effort="high",
)

# Print the assistant's reply
print(response.choices[0].message.content)

from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Example 3: reasoning_effort=None disables reasoning completely
response = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of India?"},
    ],
    reasoning_effort=None,  # Disables reasoning for the fastest response
)

# Print the assistant's reply
print(response.choices[0].message.content)
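
To see how much the reasoning level affects response time for your own prompts, the calls can be timed side by side. The sketch below is an illustration under assumptions: `time_efforts` and `make_sender` are hypothetical helpers, not SDK functions, and the only SDK call used is `client.chat.completions(...)` with the same arguments as in the examples on this page. Timings include network latency, so treat a single run as a rough indication only.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Optional


def time_efforts(
    send: Callable[[Optional[str]], Any],
    efforts: Iterable[Optional[str]] = ("low", "medium", "high"),
) -> Dict[Optional[str], float]:
    """Call send(effort) once per effort level and return elapsed seconds for each."""
    timings: Dict[Optional[str], float] = {}
    for effort in efforts:
        start = time.perf_counter()
        send(effort)
        timings[effort] = time.perf_counter() - start
    return timings


def make_sender(
    client: Any,
    messages: List[Dict[str, str]],
    model: str = "sarvam-105b",
) -> Callable[[Optional[str]], Any]:
    """Wrap a client so that it can be passed to time_efforts (illustrative helper)."""

    def send(effort: Optional[str]) -> Any:
        # Same call shape as the examples above; only reasoning_effort varies.
        return client.chat.completions(
            model=model,
            messages=messages,
            reasoning_effort=effort,
        )

    return send
```

Given a `client` and a `messages` list like those in the examples, `time_efforts(make_sender(client, messages))` returns a dictionary keyed by effort level. Passing `efforts=("low", "medium", "high", None)` also times a request with reasoning disabled.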