Chat Completion API Using Sarvam Model

Overview

This notebook demonstrates how to use the Sarvam Chat Completion API to generate context-aware conversational responses. We will explore how to send messages, customize parameters like temperature and reasoning effort, and handle multi-turn conversations to build intelligent chat applications.

1. Installation

Before you begin, ensure the sarvamai Python package is installed. Run the following in a notebook cell (the leading ! runs pip as a shell command):

!pip install -Uqq sarvamai
from sarvamai import SarvamAI

2. Set Up the API Key

To use the Sarvam API, you need an API key. Follow these steps to set up your API key:

  1. Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
  2. Replace the placeholder key: In the code below, replace "YOUR_API_KEY_HERE" with your actual API key.

SARVAM_API_KEY = "YOUR_API_KEY_HERE"
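
Hard-coding a key makes it easy to leak when the notebook is shared. As an alternative, here is a minimal sketch that reads the key from an environment variable. The variable name SARVAM_API_KEY and the helper load_api_key are illustrative choices for this example, not something the SDK requires.

```python
import os


def load_api_key(env=None, var_name="SARVAM_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `var_name` is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    # Reject a missing key and the tutorial's placeholder value alike.
    if not key or key == "YOUR_API_KEY_HERE":
        raise RuntimeError(f"Set the {var_name} environment variable to your API key.")
    return key


# Usage (assumes the variable was exported before the notebook started):
# SARVAM_API_KEY = load_api_key()
```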

2.1 Initialize the Client

Create a Sarvam client instance using your API key. This client will be used to interact with the Chat Completion API.

client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

3. Making the API Request

3.1 Define Your Chat Messages

The messages parameter you send to the chat.completions() method must be a list of message objects, each with a role and content.

The role defines who is “speaking” at each turn of the conversation. There are three possible roles:

1. "system" (Optional)

  • Used to set the behavior, tone, or instructions for the assistant.

2. "user" (Required)

  • Represents what the user asks or says.
  • You can have one or more user messages, especially in a multi-turn conversation.

3. "assistant" (Optional, only for context in multi-turn)

  • Represents what the assistant previously said.
  • Used when maintaining context in multi-turn conversations.

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of India?"},
]
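
The role rules above can be checked locally before a request is sent. The following is a minimal sketch; validate_messages is a hypothetical helper written for this tutorial, not part of the sarvamai SDK, and the API may apply further rules that it does not cover.

```python
VALID_ROLES = ("system", "user", "assistant")


def validate_messages(messages):
    """Raise ValueError if `messages` does not match the structure described above."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"message {i} must be a dict")
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"message {i} has an invalid role: {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i} needs a string 'content'")
    # The "user" role is the only required one.
    if not any(msg["role"] == "user" for msg in messages):
        raise ValueError("at least one 'user' message is required")
    return messages


validate_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of India?"},
])
```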

3.2 Send the Request and Display the Response

Use the SDK’s chat.completions() method to send your messages and receive the assistant’s reply.

response = client.chat.completions(
    messages=messages,
)

# Extract and print the assistant's reply
reply = response.choices[0].message.content
print("Response:", reply)
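
Indexing choices[0] directly raises an IndexError if the list is empty. A slightly more defensive version might look like the sketch below. extract_reply is a hypothetical helper, and the stand-in object only mimics the response.choices[0].message.content path used above; it is not a real API response.

```python
from types import SimpleNamespace


def extract_reply(response):
    """Return the first choice's text, or raise if the response has no choices."""
    choices = getattr(response, "choices", None)
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0].message.content


# Stand-in object with the same attribute path as the response used above.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="New Delhi"))]
)
print("Response:", extract_reply(fake_response))
```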

4. Essential Parameters

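| Parameter        | Type            | Required | Default | Description                                                  |
|------------------|-----------------|----------|---------|--------------------------------------------------------------|
| model            | string (enum)   | Yes      | -       | Model ID to use, e.g., sarvam-m.                             |
| messages         | list of objects | Yes      | -       | Conversation messages with roles (system, user, assistant).  |
| temperature      | float           | No       | 0.2     | Controls randomness (0 to 2). Higher = more random output.   |
| top_p            | float           | No       | 1       | Nucleus sampling (0 to 1). Alternative to temperature.       |
| reasoning_effort | enum            | No       | -       | Depth of reasoning: low, medium, high.                       |
| wiki_grounding   | boolean         | No       | false   | Enables retrieval from Wikipedia for factual answers.        |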

Key Considerations

  • Maximum context length: 4096 or 8192 tokens (depending on model).
  • Temperature range: 0 to 2
    • Non-thinking mode: 0.2 (recommended for straightforward responses)
    • Thinking mode: 0.5 or higher (recommended for deeper reasoning)
  • Top-p range: 0 to 1 (use either temperature or top_p, not both).
  • Reasoning effort: Setting any value enables thinking mode. Higher values increase reasoning depth.
  • Enable wiki_grounding for factual queries requiring Wikipedia-based references.
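
The ranges and recommendations above can be collected into one small helper that builds the optional keyword arguments. This is a sketch under stated assumptions: build_chat_params is a hypothetical helper, not an SDK function. It reads the "temperature or top_p" guidance as "do not add a default temperature when top_p is given", and it checks only the limits listed in this section.

```python
EFFORT_LEVELS = ("low", "medium", "high")


def build_chat_params(temperature=None, top_p=None, reasoning_effort=None,
                      wiki_grounding=False):
    """Validate the optional parameters and return them as keyword arguments."""
    params = {}
    if reasoning_effort is not None:
        if reasoning_effort not in EFFORT_LEVELS:
            raise ValueError("reasoning_effort must be 'low', 'medium', or 'high'")
        params["reasoning_effort"] = reasoning_effort
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        params["top_p"] = top_p
    elif temperature is None:
        # Recommended starting points: 0.5 in thinking mode, 0.2 otherwise.
        temperature = 0.5 if reasoning_effort is not None else 0.2
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        params["temperature"] = temperature
    if wiki_grounding:
        params["wiki_grounding"] = True
    return params


print(build_chat_params(reasoning_effort="high"))
```

The result can be unpacked into the call, for example client.chat.completions(messages=messages, **build_chat_params(reasoning_effort="high")).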

5. Code Examples

5.1 Basic Chat Completion

This example demonstrates a simple single-turn chat completion where the user asks a question and the model responds.

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)
response = client.chat.completions(messages=[
    {"role": "user", "content": "Hey, what is the capital of India?"}
])
print(response)

5.2 Multi-turn Conversation

This example shows how to maintain context by including previous messages in a multi-turn conversation.

from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

response = client.chat.completions(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Indian classical music."},
        {"role": "assistant", "content": "Indian classical music is one of the oldest musical traditions in the world..."},
        {"role": "user", "content": "What are the two main styles?"}
    ],
    temperature=0.7,
    reasoning_effort="high"
)
print(response.choices[0].message.content)
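
In a real application the history is usually built up turn by turn rather than written out by hand. The sketch below shows one way to do that. The Conversation class and the echo_send stand-in are hypothetical names for this example; echo_send replaces the network call so that the block runs offline.

```python
class Conversation:
    """Keeps the running message list for a multi-turn chat."""

    def __init__(self, send, system_prompt=None):
        # `send` is any callable that takes a list of messages and returns reply text.
        self._send = send
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, user_text):
        """Send the full history plus the new user turn, then record the reply."""
        self.messages.append({"role": "user", "content": user_text})
        reply = self._send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def echo_send(messages):
    # Stand-in for the real call. With the SDK it could be:
    # client.chat.completions(messages=messages).choices[0].message.content
    return f"(reply to: {messages[-1]['content']})"


chat = Conversation(echo_send, system_prompt="You are a helpful assistant.")
chat.ask("Tell me about Indian classical music.")
chat.ask("What are the two main styles?")
print([m["role"] for m in chat.messages])
# prints ['system', 'user', 'assistant', 'user', 'assistant']
```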

5.3 Wikipedia Grounded Query

This example demonstrates enabling wiki grounding to fetch fact-based answers using Wikipedia references.

from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

response = client.chat.completions(
    messages=[
        {"role": "user", "content": "What is the history of the Taj Mahal?"}
    ],
    temperature=0.2,
    top_p=1,
    wiki_grounding=True
)
print(response.choices[0].message.content)

6. Error Handling

You may encounter these errors while using the API:

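  • 403 Forbidden (invalid_api_key_error)

    • Cause: Invalid API key.
    • Solution: Verify that the key passed as api_subscription_key is correct.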
  • 429 Too Many Requests (insufficient_quota_error)

    • Cause: Exceeded API quota.
    • Solution: Check your usage, upgrade if needed, or implement exponential backoff when retrying.
  • 500 Internal Server Error (internal_server_error)

    • Cause: Issue on our servers.
    • Solution: Try again later. If persistent, contact support.
  • 400 Bad Request (invalid_request_error)

    • Cause: Incorrect request formatting.
    • Solution: Verify your request structure and parameters.
  • 422 Unprocessable Entity (unprocessable_entity_error)

    • Cause: Unable to detect the language of the input text.
    • Solution: Explicitly pass the source_language_code parameter with a supported language.
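
The exponential backoff suggested for 429 responses can be sketched as a small wrapper around any call. This is an illustration, not the SDK's own retry mechanism: call_with_backoff is a hypothetical helper, and it assumes the raised exception exposes the HTTP status as a status_code attribute, which should be checked against the SDK version in use.

```python
import random
import time

RETRYABLE_STATUS = (429, 500)


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying retryable errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Assumption: the exception carries the HTTP status as `status_code`.
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == max_attempts - 1:
                raise
            # Delay doubles on each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Usage with the client from this notebook:
# response = call_with_backoff(lambda: client.chat.completions(messages=messages))
```

Errors such as 400 or 403 are re-raised immediately, since retrying an invalid request or key cannot succeed.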

7. Additional Resources

For more details, refer to our official documentation. For support and help, join us on our Discord server.

8. Final Notes

  • Keep your API key secure.
  • Keep conversations within the model's maximum context length for best results.
  • Explore advanced features like reasoning_effort and wiki_grounding.

Keep Building! 🚀