Sarvam Translate API Tutorial

This tutorial demonstrates how to use the Sarvam Translate API to translate text from one language to another. The API also supports translation modes, a speaker gender setting, numeral formatting, and an output_script option for transliteration (converting text from one script to another while preserving pronunciation).

Table of Contents

  1. Installation
  2. Authentication
  3. Basic Usage
  4. Translation Modes
  5. Advanced Features
  6. Error Handling
  7. Additional Resources
  8. Final Notes

1. Installation

Before you begin, make sure the requests library is installed:

pip install requests

Then import it in your Python script:

import requests

2. Authentication

To use the Sarvam API, you need an API subscription key. Follow these steps to set up your API key:

  1. Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
  2. Replace the placeholder key: In the code below, replace “YOUR_SARVAM_API_KEY” with your actual API key.

SARVAM_API_KEY = "YOUR_SARVAM_API_KEY"
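
Hard-coding a key makes it easy to leak through version control. As a minimal sketch, the key could instead be read from an environment variable. The helper name load_api_key and the variable name SARVAM_API_KEY are assumptions made for illustration, not something the Sarvam API requires.

```python
import os


def load_api_key(env_var="SARVAM_API_KEY"):
    """Return the API key stored in an environment variable, or raise if unset."""
    # env_var is a hypothetical name; use whatever your deployment defines
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage: SARVAM_API_KEY = load_api_key()
```

Raising early keeps a missing key from surfacing later as a 403 response.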

3. Basic Usage

3.1. Read the Document

We have two sample documents under the data folder:

  • sample1.txt contains an essay on The Impact of Artificial Intelligence on Society in English.
  • sample2.txt contains an essay on The Impact of Artificial Intelligence on Society in Hindi.

def read_file(file_path, lang_name):
    """Read a UTF-8 text file, print a short preview, and return the full text."""
    try:
        with open(file_path, "r", encoding="utf-8") as file:
            full_doc = file.read()

        # Preview the first 5 lines (fewer if the file is shorter)
        preview = "".join(full_doc.splitlines(keepends=True)[:5])
        print(f"=== {lang_name} Text (First Few Lines) ===")
        print(preview)

        # Count total characters
        print(f"\nTotal number of characters in {lang_name} file:", len(full_doc))

        return full_doc
    except FileNotFoundError:
        print(f"Error: {file_path} not found.")
        return None
    except Exception as e:
        print(f"An error occurred while reading {file_path}: {e}")
        return None

# Read English and Hindi documents
english_doc = read_file("data/sample1.txt", "English")
hindi_doc = read_file("data/sample2.txt", "Hindi")

3.2. Split the text into chunks

Since the API has a restriction of 1000 characters per request, we need to split the text accordingly.

def chunk_text(text, max_length=1000):
    """Splits text into chunks of at most max_length characters while preserving word boundaries."""
    chunks = []

    while len(text) > max_length:
        split_index = text.rfind(" ", 0, max_length)  # Find the last space within limit
        if split_index <= 0:
            split_index = max_length  # No usable space found, force split at max_length

        chunks.append(text[:split_index].strip())  # Trim spaces before adding
        text = text[split_index:].lstrip()  # Remove leading spaces for the next chunk

    if text:
        chunks.append(text.strip())  # Add the last chunk

    return chunks

# Split the text
english_text_chunks = chunk_text(english_doc)
hindi_text_chunks = chunk_text(hindi_doc)  # Used by the Indic translation examples below

# Display chunk info
print(f"Total Chunks: {len(english_text_chunks)}")
for i, chunk in enumerate(english_text_chunks[:3], 1):  # Show only first 3 chunks for preview
    print(f"\n=== Chunk {i} (Length: {len(chunk)}) ===\n{chunk}")
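
Splitting at word boundaries can cut a sentence in half, which may hurt translation quality. One possible alternative, sketched here, packs whole sentences into each chunk and only hard-splits a sentence that is longer than the limit on its own. The function name chunk_by_sentence and the sentence-ending pattern (which includes the Devanagari danda "।") are illustrative assumptions, not part of the API.

```python
import re


def chunk_by_sentence(text, max_length=1000):
    """Greedily pack whole sentences into chunks of at most max_length characters."""
    # Split after ., !, ? or the Devanagari danda when followed by whitespace
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is split by length
        while len(sentence) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_length])
            sentence = sentence[max_length:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_length:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_by_sentence("One. Two. Three.", max_length=10))
# → ['One. Two.', 'Three.']
```

The trade-off is that chunks are often shorter than the limit, so a long document may need a few more requests.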

3.3. Setting up the API Endpoint

There are three main types of translations supported:

  1. English to Indic - Translating from English to Indian languages
  2. Indic to English - Converting Indian languages to English
  3. Indic to Indic - Translating between Indian languages

Indic to English Translation

# Define API request details
url = "https://api.sarvam.ai/translate"
headers = {
    "api-subscription-key": SARVAM_API_KEY,
    "Content-Type": "application/json"
}

# Send requests for each chunk
translated_texts = []
for idx, chunk in enumerate(hindi_text_chunks):
    payload = {
        "source_language_code": "hi-IN",
        "target_language_code": "en-IN",
        "speaker_gender": "Male",
        "mode": "classic-colloquial",
        "model": "mayura:v1",
        "enable_preprocessing": False,
        "input": chunk
    }

    response = requests.post(url, json=payload, headers=headers)

    if response.status_code == 200:
        translated_text = response.json().get("translated_text", "Translation not available")
        translated_texts.append(translated_text)
        print(f"\n=== Translated Chunk {idx + 1} ===\n{translated_text}\n")
    else:
        print(f"Error: {response.status_code}, {response.text}")

# Combine all translated chunks
final_translation = "\n".join(translated_texts)
print("\n=== Final Translated Text ===")
print(final_translation)

Indic to Indic Translation

# Define API request details
url = "https://api.sarvam.ai/translate"
headers = {
    "api-subscription-key": SARVAM_API_KEY,
    "Content-Type": "application/json"
}

# Send requests for each chunk
translated_texts = []
for idx, chunk in enumerate(hindi_text_chunks):
    payload = {
        "source_language_code": "hi-IN",
        "target_language_code": "bn-IN",
        "speaker_gender": "Male",
        "mode": "classic-colloquial",
        "model": "mayura:v1",
        "enable_preprocessing": False,
        "input": chunk
    }

    response = requests.post(url, json=payload, headers=headers)

    if response.status_code == 200:
        translated_text = response.json().get("translated_text", "Translation not available")
        translated_texts.append(translated_text)
        print(f"\n=== Translated Chunk {idx + 1} ===\n{translated_text}\n")
    else:
        print(f"Error: {response.status_code}, {response.text}")

# Combine all translated chunks
final_translation = "\n".join(translated_texts)
print("\n=== Final Translated Text ===")
print(final_translation)

English to Indic Translation

# Define API request details
url = "https://api.sarvam.ai/translate"
headers = {
    "api-subscription-key": SARVAM_API_KEY,
    "Content-Type": "application/json"
}

# Send requests for each chunk
translated_texts = []
for idx, chunk in enumerate(english_text_chunks):
    payload = {
        "source_language_code": "en-IN",
        "target_language_code": "pa-IN",
        "speaker_gender": "Male",
        "mode": "classic-colloquial",
        "model": "mayura:v1",
        "enable_preprocessing": False,
        "input": chunk
    }

    response = requests.post(url, json=payload, headers=headers)

    if response.status_code == 200:
        translated_text = response.json().get("translated_text", "Translation not available")
        translated_texts.append(translated_text)
        print(f"\n=== Translated Chunk {idx + 1} ===\n{translated_text}\n")
    else:
        print(f"Error: {response.status_code}, {response.text}")

# Combine all translated chunks
final_translation = "\n".join(translated_texts)
print("\n=== Final Translated Text ===")
print(final_translation)
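
The three loops above differ only in their language codes and input chunks, so the shared parts could be factored into helpers. The sketch below is one way to do that. The names build_payload and translate_chunks are hypothetical, and send stands for any callable that posts a payload and returns the translated string (for example, a thin wrapper around requests.post). Keeping the network call behind send lets the chunk handling be tested without hitting the API.

```python
def build_payload(text, source, target, mode="classic-colloquial",
                  speaker_gender="Male", **extra):
    """Return a request body with the same fields used in the examples above."""
    payload = {
        "source_language_code": source,
        "target_language_code": target,
        "speaker_gender": speaker_gender,
        "mode": mode,
        "model": "mayura:v1",
        "enable_preprocessing": False,
        "input": text,
    }
    payload.update(extra)  # e.g. numerals_format or output_script
    return payload


def translate_chunks(chunks, send, source, target, **options):
    """Translate each chunk via send(payload) -> str and join the results."""
    translated = [send(build_payload(c, source, target, **options)) for c in chunks]
    return "\n".join(translated)


# Usage sketch, assuming `requests`, `url`, and `headers` from the tutorial:
# def send(payload):
#     response = requests.post(url, json=payload, headers=headers)
#     response.raise_for_status()
#     return response.json().get("translated_text", "")
# print(translate_chunks(hindi_text_chunks, send, "hi-IN", "en-IN"))
```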

4. Translation Modes

Translation Modes & Differences

  1. Formal – Highly professional, uses pure Hindi (e.g., “कुल राशि”, “देय है”). Suitable for official documents, legal papers, and corporate communication.
  2. Classic-Colloquial – Balanced mix of Hindi & English, slightly informal (e.g., “कुल जोड़”, “देना होगा”). Ideal for business emails, customer support, and semi-formal communication.
  3. Modern-Colloquial – Hinglish, casual, and direct (e.g., “Invoice total”, “due है”, “contact करो”). Best for chatbots, social media, and casual conversations.

Rule of Thumb:

  • Use Formal for official content
  • Use Classic-Colloquial for general communication
  • Use Modern-Colloquial for everyday conversations

# To highlight the difference between the modes, let's use the example below.
full_text = (
    "The invoice total is $3,450.75, due by 15th March 2025. Contact us at support@example.com for queries. "
    "Order #987654321 was placed on 02/29/2024. Your tracking ID is TRK12345678."
)

# Define API request details
url = "https://api.sarvam.ai/translate"
headers = {
    "api-subscription-key": SARVAM_API_KEY,
    "Content-Type": "application/json"
}

4.1. Classic Colloquial

# Create the request payload
payload = {
    "source_language_code": "en-IN",
    "target_language_code": "hi-IN",
    "speaker_gender": "Male",
    "mode": "classic-colloquial",
    "model": "mayura:v1",
    "enable_preprocessing": False,
    "input": full_text
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

# Check the response
if response.status_code == 200:
    translated_text = response.json().get("translated_text", "Translation not available")
    print("\n=== Translated Text ===\n", translated_text)
else:
    print(f"Error: {response.status_code}, {response.text}")

4.2. Formal

# Create the request payload
payload = {
    "source_language_code": "en-IN",
    "target_language_code": "hi-IN",
    "speaker_gender": "Male",
    "mode": "formal",
    "model": "mayura:v1",
    "enable_preprocessing": False,
    "input": full_text
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

# Check the response
if response.status_code == 200:
    translated_text = response.json().get("translated_text", "Translation not available")
    print("\n=== Translated Text ===\n", translated_text)
else:
    print(f"Error: {response.status_code}, {response.text}")

4.3. Modern Colloquial

# Create the request payload
payload = {
    "source_language_code": "en-IN",
    "target_language_code": "hi-IN",
    "speaker_gender": "Male",
    "mode": "modern-colloquial",
    "model": "mayura:v1",
    "enable_preprocessing": False,
    "input": full_text
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

# Check the response
if response.status_code == 200:
    translated_text = response.json().get("translated_text", "Translation not available")
    print("\n=== Translated Text ===\n", translated_text)
else:
    print(f"Error: {response.status_code}, {response.text}")
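
To compare the three modes side by side, the per-mode payloads can be generated in one place instead of being written out three times. This is a minimal sketch. The helper name payloads_by_mode is hypothetical, and the mode strings are the ones used in the examples above.

```python
MODES = ("formal", "classic-colloquial", "modern-colloquial")


def payloads_by_mode(text, source="en-IN", target="hi-IN"):
    """Build one request body per translation mode for the same input text."""
    base = {
        "source_language_code": source,
        "target_language_code": target,
        "speaker_gender": "Male",
        "model": "mayura:v1",
        "enable_preprocessing": False,
        "input": text,
    }
    # Copy the shared fields and vary only the mode
    return {mode: {**base, "mode": mode} for mode in MODES}


# Usage sketch, assuming `requests`, `url`, `headers`, and `full_text` from the tutorial:
# for mode, payload in payloads_by_mode(full_text).items():
#     response = requests.post(url, json=payload, headers=headers)
#     print(mode, "->", response.json().get("translated_text"))
```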

5. Advanced Features

5.1. Speaker Gender

The translation model supports Male and Female speaker options, which impact the tone and style of the output.

Female

payload = {
    "source_language_code": "en-IN",
    "target_language_code": "hi-IN",
    "speaker_gender": "Female",
    "mode": "modern-colloquial",
    "model": "mayura:v1",
    "enable_preprocessing": False,
    "input": full_text
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

# Check the response
if response.status_code == 200:
    translated_text = response.json().get("translated_text", "Translation not available")
    print("\n=== Translated Text ===\n", translated_text)
else:
    print(f"Error: {response.status_code}, {response.text}")

Male

payload = {
    "source_language_code": "en-IN",
    "target_language_code": "hi-IN",
    "speaker_gender": "Male",
    "mode": "modern-colloquial",
    "model": "mayura:v1",
    "enable_preprocessing": False,
    "input": full_text
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

# Check the response
if response.status_code == 200:
    translated_text = response.json().get("translated_text", "Translation not available")
    print("\n=== Translated Text ===\n", translated_text)
else:
    print(f"Error: {response.status_code}, {response.text}")

5.2. Numerals Format Feature

The numerals_format parameter controls how numbers appear in the translation. It has two options:

  1. International (Default) - Uses standard 0-9 numerals. Example: “मेरा phone number है: 9840950950.” Best for universally understood content, technical documents, and modern usage.

  2. Native - Uses language-specific numerals. Example: “मेरा phone number है: ९८४०९५०९५०.” Ideal for traditional texts, cultural adaptation, and regional content.

When to Use What?

  • Use International for wider readability and digital content
  • Use Native for localized, heritage-focused, and print media content
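
To make the difference concrete, the snippet below converts international digits to Devanagari digits locally. It is only an illustration of what native numerals look like for Hindi. It is not how the API produces them, and the function name to_devanagari_digits is hypothetical. In real requests the numerals_format parameter does this work.

```python
# Map ASCII digits 0-9 to Devanagari digits ०-९ (U+0966 to U+096F)
_TO_DEVANAGARI = str.maketrans("0123456789", "०१२३४५६७८९")


def to_devanagari_digits(text):
    """Replace international digits with Devanagari digits, leaving other text unchanged."""
    return text.translate(_TO_DEVANAGARI)


print(to_devanagari_digits("9840950950"))
# → ९८४०९५०९५०
```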

Native

# Create the request payload
payload = {
    "source_language_code": "en-IN",
    "target_language_code": "hi-IN",
    "speaker_gender": "Female",
    "mode": "modern-colloquial",
    "model": "mayura:v1",
    "enable_preprocessing": False,
    "numerals_format": "native",
    "input": full_text
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

# Check the response
if response.status_code == 200:
    translated_text = response.json().get("translated_text", "Translation not available")
    print("\n=== Translated Text ===\n", translated_text)
else:
    print(f"Error: {response.status_code}, {response.text}")

International

# Create the request payload
payload = {
    "source_language_code": "en-IN",
    "target_language_code": "hi-IN",
    "speaker_gender": "Female",
    "mode": "modern-colloquial",
    "model": "mayura:v1",
    "enable_preprocessing": False,
    "numerals_format": "international",
    "input": full_text
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

# Check the response
if response.status_code == 200:
    translated_text = response.json().get("translated_text", "Translation not available")
    print("\n=== Translated Text ===\n", translated_text)
else:
    print(f"Error: {response.status_code}, {response.text}")

5.3. Output Script Feature

The output_script parameter controls how the translated text is transliterated, i.e., how it appears in different scripts while keeping pronunciation intact.

Transliteration Options:

  1. Default (null) – No transliteration applied. Example: “आपका Rs. 3000 का EMI pending है।” Best for modern, mixed-language content.

  2. Roman – Converts the output into Romanized Hindi. Example: “aapka Rs. 3000 ka EMI pending hai.” Ideal for users who can speak but not read native scripts.

  3. Fully-Native – Uses formal native script transliteration. Example: “आपका रु. 3000 का ई.एम.ऐ. पेंडिंग है।” Best for official documents and structured writing.

  4. Spoken-Form-in-Native – Uses native script but mimics spoken style. Example: “आपका थ्री थाउजेंड रूपीस का ईएमअइ पेंडिंग है।” Ideal for voice assistants, conversational AI, and informal speech.

When to Use What?

  • Default – For natural, mixed-language modern writing
  • Roman – For users unfamiliar with native scripts
  • Fully-Native – For formal, structured translations
  • Spoken-Form-in-Native – For casual speech and voice applications

Roman

# Create the request payload
payload = {
    "source_language_code": "en-IN",
    "target_language_code": "hi-IN",
    "speaker_gender": "Female",
    "mode": "modern-colloquial",
    "model": "mayura:v1",
    "enable_preprocessing": False,
    "output_script": "roman",
    "numerals_format": "international",
    "input": full_text
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

# Check the response
if response.status_code == 200:
    translated_text = response.json().get("translated_text", "Translation not available")
    print("\n=== Translated Text ===\n", translated_text)
else:
    print(f"Error: {response.status_code}, {response.text}")

Spoken-Form-in-Native

# Create the request payload
payload = {
    "source_language_code": "en-IN",
    "target_language_code": "hi-IN",
    "speaker_gender": "Female",
    "mode": "modern-colloquial",
    "model": "mayura:v1",
    "enable_preprocessing": False,
    "output_script": "spoken-form-in-native",
    "numerals_format": "international",
    "input": full_text
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

# Check the response
if response.status_code == 200:
    translated_text = response.json().get("translated_text", "Translation not available")
    print("\n=== Translated Text ===\n", translated_text)
else:
    print(f"Error: {response.status_code}, {response.text}")

Fully-Native

# Create the request payload
payload = {
    "source_language_code": "en-IN",
    "target_language_code": "hi-IN",
    "speaker_gender": "Female",
    "mode": "modern-colloquial",
    "model": "mayura:v1",
    "enable_preprocessing": False,
    "output_script": "fully-native",
    "numerals_format": "international",
    "input": full_text
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

# Check the response
if response.status_code == 200:
    translated_text = response.json().get("translated_text", "Translation not available")
    print("\n=== Translated Text ===\n", translated_text)
else:
    print(f"Error: {response.status_code}, {response.text}")
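
Because output_script is a free-form string in the payload, a typo would only show up as an API error. A small guard like the one sketched below can catch it earlier. The helper name with_output_script is hypothetical, the accepted values are simply the ones used in this section, and treating None as "leave the field out" is an assumption about how the default is requested.

```python
# Values taken from the examples in this section; None stands for the default
OUTPUT_SCRIPTS = (None, "roman", "fully-native", "spoken-form-in-native")


def with_output_script(payload, script):
    """Return a copy of payload with output_script set, rejecting unknown values."""
    if script not in OUTPUT_SCRIPTS:
        raise ValueError(f"Unsupported output_script: {script!r}")
    updated = dict(payload)  # leave the caller's dict untouched
    if script is None:
        updated.pop("output_script", None)  # assumed to select the API default
    else:
        updated["output_script"] = script
    return updated


# Usage: payload = with_output_script(payload, "roman")
```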

6. Error Handling

You may encounter these errors while using the API:

  • 403 Forbidden (invalid_api_key_error)

    • Cause: Missing or invalid API key.
    • Solution: Check that the api-subscription-key header contains a valid key.
  • 429 Too Many Requests (insufficient_quota_error)

    • Cause: Exceeded API quota.
    • Solution: Check your usage, upgrade if needed, or implement exponential backoff when retrying.
  • 500 Internal Server Error (internal_server_error)

    • Cause: Issue on our servers.
    • Solution: Try again later. If persistent, contact support.
  • 400 Bad Request (invalid_request_error)

    • Cause: Incorrect request formatting.
    • Solution: Verify your request structure and parameters.
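
The exponential backoff suggested for 429 responses can be sketched as follows. This is an illustrative helper, not part of the Sarvam API: the name call_with_backoff, the retry count, and the delays are assumptions, and send stands for any zero-argument callable that performs the request and returns a (status_code, body) pair. 400 and 403 responses are returned immediately because retrying will not fix them.

```python
import time

RETRYABLE = {429, 500}  # quota and server errors from the list above


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Double the wait after every failed attempt: 1s, 2s, 4s, ...
        sleep(base_delay * (2 ** attempt))


# Usage sketch, assuming `requests`, `url`, `headers`, and `payload` from the tutorial:
# def send():
#     response = requests.post(url, json=payload, headers=headers)
#     return response.status_code, response.text
# status, body = call_with_backoff(send)
```

Passing sleep as a parameter makes the retry schedule easy to check in tests without actually waiting.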

7. Additional Resources

For more details, refer to our official documentation. We are also available to help on our Discord server.

8. Final Notes

  • Keep your API key secure.
  • Keep each request within the 1000-character limit by chunking long text.
  • Explore advanced features such as translation modes, numerals_format, and output_script.

Keep Building! 🚀