How to control response diversity with top_p


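The top_p parameter controls how much of the probability distribution the model draws from when selecting the next token. This technique is called nucleus sampling: the model samples only from the smallest set of most likely tokens whose cumulative probability reaches top_p.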

Range: 0 to 1
Default: 1.0

  • Lower top_p → model chooses from a smaller set of highly likely tokens → more focused
  • Higher top_p → model chooses from a broader set of tokens → more diverse
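
To make the effect of top_p concrete, here is a minimal sketch of nucleus filtering in plain Python. It illustrates the general technique and is not Sarvam's actual implementation; the function name `nucleus_filter` and the token probabilities are hypothetical values chosen for the example.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize that set."""
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0 and 1")

    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete

    # Renormalize so the kept probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token probabilities (illustrative only).
probs = {"Paris": 0.70, "Lyon": 0.15, "Nice": 0.10, "Rome": 0.05}

print(list(nucleus_filter(probs, 0.1)))  # ['Paris']
print(list(nucleus_filter(probs, 0.8)))  # ['Paris', 'Lyon']
print(list(nucleus_filter(probs, 1.0)))  # ['Paris', 'Lyon', 'Nice', 'Rome']
```

Note that the cutoff applies to cumulative probability, not to a fraction of the vocabulary: with a sharply peaked distribution like this one, even top_p = 0.8 leaves only two candidate tokens.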

When to use:

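| top_p value | Behavior |
|---|---|
| 0.1 | Very focused; only the most likely tokens covering the top 10% of probability mass are used |
| 0.3 | Controlled diversity |
| 0.5 | Balanced creativity and accuracy |
| 0.8 - 1.0 | Very creative, open-ended responses |
| 1.0 (default) | Full probability space used |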

First, install the SDK:

$ pip install -Uqq sarvamai

Then use the following Python code:

from sarvamai import SarvamAI

# Initialize the SarvamAI client with your API key
client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Example 1: Using default top_p (1.0), full probability space (diverse response)
response = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    # top_p is not specified → uses default 1.0
)

print(response.choices[0].message.content)

from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Example 2: Using top_p = 0.3, more focused, controlled response
response = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {"role": "system", "content": "You are a creative storyteller."},
        {"role": "user", "content": "Tell me a story about a magical tiger."}
    ],
    top_p=0.3
)

# Receive assistant's reply as output
print(response.choices[0].message.content)
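
To see how top_p changes the output in practice, it can help to send the same prompt at several settings and read the replies side by side. The helper below is a rough sketch: `compare_top_p` is a hypothetical name, not part of the SDK, and it assumes a client object that exposes the same `chat.completions(...)` call and response shape used in the examples above.

```python
def compare_top_p(client, prompt, top_p_values, model="sarvam-105b"):
    """Send the same prompt once per top_p value and collect the replies.

    `client` is assumed to behave like the SarvamAI client shown earlier:
    chat.completions(model=..., messages=..., top_p=...) returns an object
    whose reply text is at response.choices[0].message.content.
    """
    replies = {}
    for top_p in top_p_values:
        response = client.chat.completions(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            top_p=top_p,
        )
        replies[top_p] = response.choices[0].message.content
    return replies
```

With the client from the earlier examples, calling `compare_top_p(client, "Tell me a story about a magical tiger.", [0.1, 0.5, 1.0])` would return a dictionary mapping each top_p value to its reply. Because sampling is random, a single run per setting is only a rough indication of the difference.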