Chat Completion API Using Sarvam Model
Overview
This notebook demonstrates how to use the Sarvam Chat Completion API to generate context-aware conversational responses. We will explore how to send messages, customize parameters like temperature and reasoning effort, and handle multi-turn conversations to build intelligent chat applications.
1. Installation
Before you begin, ensure you have the necessary Python libraries installed. Run the following command in your terminal:
2. Set Up the API Endpoint and Payload
To use the Sarvam API, you need an API key. Follow these steps to set up your API key:
- Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
- Replace the placeholder key: In the code below, replace
"YOUR_API_KEY_HERE"
with your actual API key.
2.1 Initialize the Client
Create a Sarvam client instance using your API key. This client will be used to interact with the Chat Completion API.
3. Making the API Request
3.1 Define Your Chat Messages
The messages
parameter you send to the chat.completions()
method must be a list of message objects, each with a role
and content
.
The role defines who is “speaking” at each turn of the conversation. There are three possible roles:
1. "system"
(Optional, but Recommended)
- Used to set the behavior, tone, or instructions for the assistant.
2. "user"
(Required)
- Represents what the user asks or says.
- You can have one or more user messages, especially in a multi-turn conversation.
3. "assistant"
(Optional, only for context in multi-turn)
- Represents what the assistant previously said.
- Used when maintaining context in multi-turn conversations.
3.2 Send the Request and Display the Response
Use the SDK’s chat.completions()
method to send your messages and receive the assistant’s reply.
4. Essential Parameters
Key Considerations
- Maximum context length: 4096 or 8192 tokens (depending on model).
- Temperature range: 0 to 2
- Non-thinking mode: 0.2 (recommended for straightforward responses)
- Thinking mode: 0.5 or higher (recommended for deeper reasoning)
- Top-p range: 0 to 1 (use either
temperature
ortop_p
, not both). - Reasoning effort: Setting any value enables thinking mode. Higher values increase reasoning depth.
- Enable
wiki_grounding
for factual queries requiring Wikipedia-based references.
5. Example Codes
5.1: Basic Chat Completion
This example demonstrates a simple single-turn chat completion where the user asks a question and the model responds.
5.2: Multi-turn Conversation
This example shows how to maintain context by including previous messages in a multi-turn conversation.
5.3: Wikipedia Grounded Query
This example demonstrates enabling wiki grounding to fetch fact-based answers using Wikipedia references.
6. Error Handling
You may encounter these errors while using the API:
-
403 Forbidden (
invalid_api_key_error
)- Cause: Invalid API key.
- Solution: Use a valid API key from the Sarvam AI Dashboard.
-
429 Too Many Requests (
insufficient_quota_error
)- Cause: Exceeded API quota.
- Solution: Check your usage, upgrade if needed, or implement exponential backoff when retrying.
-
500 Internal Server Error (
internal_server_error
)- Cause: Issue on our servers.
- Solution: Try again later. If persistent, contact support.
-
400 Bad Request (
invalid_request_error
)- Cause: Incorrect request formatting.
- Solution: Verify your request structure, and parameters.
-
422 Unprocessable Entity Request (
unprocessable_entity_error
)- Cause: Unable to detect the language of the input text.
- Solution: Explicitly pass the source_language_code parameter with a supported language.
7. Additional Resources
For more details, refer to the our official documentation and we are always there to support and help you on our Discord Server:
- Documentation: docs.sarvam.ai
- Community: Join the Discord Community
8. Final Notes
- Keep your API key secure.
- Use clear audio for best results.
- Explore advanced features like diarization and translation.
Keep Building! 🚀