Translate API using Mayura and Sarvam-Translate Models

Overview

This tutorial demonstrates how to use the Sarvam Translate API to translate text and paragraphs from one language to another. The API also supports additional features such as transliteration (converting text from one script to another by mapping letters) through the output_script parameter, and a speaker_gender parameter.

1. Installation

Before you begin, ensure you have the necessary Python libraries installed. Run the following commands to install the required packages:

!pip install -Uqq sarvamai

from sarvamai import SarvamAI

2. Authentication

To use the API, you need an API subscription key. Follow these steps to set up your API key:

  1. Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
  2. Replace the placeholder key: In the code below, replace “YOUR_SARVAM_API_KEY” with your actual API key.

SARVAM_API_KEY = "YOUR_SARVAM_API_KEY"
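Hard-coding the key works for a quick test, but it is easy to leak when a notebook is shared. A minimal sketch of an alternative, assuming you have exported the key in an environment variable named SARVAM_API_KEY (both the variable name and the load_api_key helper are illustrative choices, not part of the SDK):

```python
import os


def load_api_key(env_var="SARVAM_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch; use whatever
    name your environment already defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

You could then write SARVAM_API_KEY = load_api_key() in place of the literal string.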

3. Understanding the Parameters

🔹 The API takes several key parameters:

| Parameter | Description | Mayura:v1 | Sarvam-Translate:v1 |
|---|---|---|---|
| input | The text to translate (max character limit) | 1000 characters | 2000 characters |
| source_language_code | Language of the input text | Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu | All Mayura:v1 languages + Assamese, Bodo, Dogri, Konkani, Kashmiri, Maithili, Manipuri, Nepali, Sanskrit, Santali, Sindhi, Urdu |
| target_language_code | Target language for translation | Same as source | Same as source |
| speaker_gender | Gender of the speaker for better contextual translation | Supported | Supported |
| mode | Tone or style of translation | formal, classic-colloquial, modern-colloquial | formal only |
| numerals_format | Format for numerals in translation | international (0-9) or native (e.g., ०-९) | international (0-9) or native (e.g., ०-९) |

language_code (String) – Codes for the languages newly added in Sarvam-Translate:v1. Supported values:

  • "as-IN" (Assamese - India)
  • "brx-IN" (Bodo - India)
  • "doi-IN" (Dogri - India)
  • "kok-IN" (Konkani - India)
  • "ks-IN" (Kashmiri - India)
  • "mai-IN" (Maithili - India)
  • "mni-IN" (Manipuri (Meiteilon) - India)
  • "ne-IN" (Nepali - India)
  • "sa-IN" (Sanskrit - India)
  • "sat-IN" (Santali - India)
  • "sd-IN" (Sindhi - India)
  • "ur-IN" (Urdu - India)
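Because these codes are only available in Sarvam-Translate:v1, it can help to check a language pair before choosing a model. The sketch below is a local lookup built from the list above; the dictionary and the needs_sarvam_translate helper are hypothetical names used for illustration and are not part of the SDK.

```python
# Language codes copied from the list above (Sarvam-Translate:v1 additions).
SARVAM_TRANSLATE_ONLY = {
    "as-IN": "Assamese",
    "brx-IN": "Bodo",
    "doi-IN": "Dogri",
    "kok-IN": "Konkani",
    "ks-IN": "Kashmiri",
    "mai-IN": "Maithili",
    "mni-IN": "Manipuri (Meiteilon)",
    "ne-IN": "Nepali",
    "sa-IN": "Sanskrit",
    "sat-IN": "Santali",
    "sd-IN": "Sindhi",
    "ur-IN": "Urdu",
}


def needs_sarvam_translate(source_language_code, target_language_code):
    """Return True if either language is one of the codes listed above."""
    return (
        source_language_code in SARVAM_TRANSLATE_ONLY
        or target_language_code in SARVAM_TRANSLATE_ONLY
    )


print(needs_sarvam_translate("hi-IN", "sa-IN"))  # True
print(needs_sarvam_translate("en-IN", "hi-IN"))  # False
```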

4. Basic Usage

4.1: Read the Document

We have two sample documents under the data folder:

def read_file(file_path, lang_name):
    """Read a UTF-8 text file, print a short preview, and return its full content."""
    try:
        with open(file_path, "r", encoding="utf-8") as file:
            full_doc = file.read()
    except FileNotFoundError:
        print(f"Error: {file_path} not found.")
        return None
    except Exception as e:
        print(f"An error occurred while reading {file_path}: {e}")
        return None

    # Preview up to the first 5 lines (also works when the file is shorter)
    preview = "".join(full_doc.splitlines(keepends=True)[:5])
    print(f"=== {lang_name} Text (First Few Lines) ===")
    print(preview)

    # Count total characters
    print(f"\nTotal number of characters in {lang_name} file:", len(full_doc))
    return full_doc

# Read English and Hindi documents
english_doc = read_file("sample1.txt", "English")
hindi_doc = read_file("sample2.txt", "Hindi")

4.2: Split the input text into chunks based on model limits

Each model limits the number of characters per request: 1000 for Mayura:v1 and 2000 for Sarvam-Translate:v1. Longer text must be split into chunks that fit within the limit of the model you use.

def chunk_text(text, max_length=2000):
    """Splits text into chunks of at most max_length characters while preserving word boundaries."""
    chunks = []

    while len(text) > max_length:
        split_index = text.rfind(" ", 0, max_length)  # Find the last space within limit
        if split_index <= 0:
            split_index = max_length  # No usable space found, force split at max_length

        chunks.append(text[:split_index].strip())  # Trim spaces before adding
        text = text[split_index:].lstrip()  # Remove leading spaces for the next chunk

    if text:
        chunks.append(text.strip())  # Add the last chunk

    return chunks

# Split the English text
english_text_chunks = chunk_text(english_doc)

# Display chunk info
print(f"Total Chunks: {len(english_text_chunks)}")
for i, chunk in enumerate(english_text_chunks[:3], 1):  # Show only first 3 chunks for preview
    print(f"\n=== Chunk {i} (Length: {len(chunk)}) ===\n{chunk}")

# Split the Hindi text
hindi_text_chunks = chunk_text(hindi_doc)

# Display chunk info
print(f"Total Chunks: {len(hindi_text_chunks)}")
for i, chunk in enumerate(hindi_text_chunks[:3], 1):  # Show only first 3 chunks for preview
    print(f"\n=== Chunk {i} (Length: {len(chunk)}) ===\n{chunk}")
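
Since the two models have different limits, one option is to look up the limit from the model name instead of passing a number by hand. The following is a self-contained sketch under that assumption; MODEL_CHAR_LIMITS and split_for_model are hypothetical names introduced here, with the limits taken from the parameter table in section 3.

```python
# Character limits per request, as stated in the parameter table.
MODEL_CHAR_LIMITS = {"mayura:v1": 1000, "sarvam-translate:v1": 2000}


def split_for_model(text, model):
    """Split text at spaces so that every chunk fits the given model's limit."""
    limit = MODEL_CHAR_LIMITS[model]
    chunks = []
    text = text.strip()
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit)
        if cut <= 0:
            cut = limit  # no space to break on, so hard-split at the limit
        chunks.append(text[:cut].strip())
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks


sample = "word " * 500  # dummy text, longer than either limit
for model, limit in MODEL_CHAR_LIMITS.items():
    chunks = split_for_model(sample, model)
    # Sanity check: no chunk may exceed the model's limit
    assert all(len(chunk) <= limit for chunk in chunks)
    print(model, "->", len(chunks), "chunks")
```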

4.3: Sample Hindi to Sanskrit Translation using Sarvam-Translate:v1

sarvam-translate:v1: Supports all 22 scheduled languages of India, formal mode only.

# Initialize the client (SarvamAI was imported in the Installation step)
client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

# Sample Hindi text (can be up to 2000 characters per chunk for sarvam-translate:v1)
hindi_text = "भारत एक महान देश है। इसकी संस्कृति बहुत पुरानी और समृद्ध है।"

# Simple chunking for demonstration (no chunk exceeds 2000 characters)
hindi_text_chunks = [hindi_text]  # In real cases, you would split longer text here

# Loop through each chunk and translate
for idx, chunk in enumerate(hindi_text_chunks):
    response = client.text.translate(
        input=chunk,
        source_language_code="hi-IN",
        target_language_code="sa-IN",
        speaker_gender="Male",
        mode="formal",
        model="sarvam-translate:v1",
    )

    # Print the translated output
    print(f"Chunk {idx + 1} Translation:\n{response.translated_text}\n")

4.4: Setting up the API Endpoint using Sarvam-Translate model

There are three main types of translations we support:

1️⃣ English to Indic 🏛 → Translating from English to Indian languages (e.g., “Invoice total is 3,450.75.” → “इनवॉइस कुल राशि 3,450.75 है।”)

2️⃣ Indic to English 🌍 → Converting Indian languages to English (e.g., “आपका ऑर्डर सफलतापूर्वक दर्ज किया गया है।” → “Your order has been successfully placed.”)

3️⃣ Indic to Indic 🔄 → Translating between Indian languages (e.g., Hindi → Tamil, Bengali → Marathi).

# Initialize SarvamAI
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

English to Indic Translation

translated_texts = []
for idx, chunk in enumerate(english_text_chunks):
    response = client.text.translate(
        input=chunk,
        source_language_code="en-IN",
        target_language_code="sa-IN",
        speaker_gender="Male",
        mode="formal",
        model="sarvam-translate:v1",
        enable_preprocessing=False,
    )

    translated_text = response.translated_text
    print(f"\n=== Translated Chunk {idx + 1} ===\n{translated_text}\n")
    translated_texts.append(translated_text)

# Combine all translated chunks
final_translation = "\n".join(translated_texts)
print("\n=== Final Translated Text in Sanskrit ===")
print(final_translation)

Indic to English Translation

translated_texts = []
for idx, chunk in enumerate(hindi_text_chunks):
    response = client.text.translate(
        input=chunk,
        source_language_code="hi-IN",
        target_language_code="en-IN",  # English target, matching this section
        speaker_gender="Male",
        mode="formal",
        model="sarvam-translate:v1",
        enable_preprocessing=False,
    )

    translated_text = response.translated_text
    print(f"\n=== Translated Chunk {idx + 1} ===\n{translated_text}\n")
    translated_texts.append(translated_text)

# Combine all translated chunks
final_translation = "\n".join(translated_texts)
print("\n=== Final Translated Text in English ===")
print(final_translation)

Indic to Indic Translation

translated_texts = []
for idx, chunk in enumerate(hindi_text_chunks):
    response = client.text.translate(
        input=chunk,
        source_language_code="hi-IN",
        target_language_code="bn-IN",
        speaker_gender="Male",
        mode="formal",
        model="sarvam-translate:v1",
        enable_preprocessing=False,
    )

    translated_text = response.translated_text
    print(f"\n=== Translated Chunk {idx + 1} ===\n{translated_text}\n")
    translated_texts.append(translated_text)

# Combine all translated chunks
final_translation = "\n".join(translated_texts)
print("\n=== Translated Text Chunks in Bengali ===")
print(final_translation)
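
The three loops above differ only in their language codes, so the pattern can be folded into one helper. This is a sketch, not an SDK function: translate_chunks is a hypothetical name, and it assumes only what the loops above already rely on, namely that the client exposes client.text.translate(...) and that the response has a translated_text attribute.

```python
def translate_chunks(client, chunks, source_language_code, target_language_code,
                     model="sarvam-translate:v1", **extra):
    """Translate each chunk in order and return the results joined by newlines.

    Any extra keyword arguments (for example mode or speaker_gender) are
    forwarded unchanged to client.text.translate.
    """
    translated = []
    for chunk in chunks:
        response = client.text.translate(
            input=chunk,
            source_language_code=source_language_code,
            target_language_code=target_language_code,
            model=model,
            **extra,
        )
        translated.append(response.translated_text)
    return "\n".join(translated)
```

With the client from this section, a call might look like translate_chunks(client, hindi_text_chunks, "hi-IN", "bn-IN", mode="formal", speaker_gender="Male").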

5. Advanced Features

5.1: Translation Modes & Differences

1️⃣ Formal – Highly professional, uses pure Hindi (e.g., “कुल राशि”, “देय है”). Suitable for official documents, legal papers, and corporate communication.

2️⃣ Classic-Colloquial – Balanced mix of Hindi & English, slightly informal (e.g., “कुल जोड़”, “देना होगा”). Ideal for business emails, customer support, and semi-formal communication.

3️⃣ Modern-Colloquial – Hinglish, casual, and direct (e.g., “Invoice total”, “due है”, “contact करो”). Best for chatbots, social media, and casual conversations.

📌 Rule of Thumb:

  • Use Formal for official content 🏛
  • Use Classic-Colloquial for general communication 💬
  • Use Modern-Colloquial for everyday conversations 🚀

# To highlight the difference between the modes, let's use the example below.
full_text = (
    "The invoice total is $3,450.75, due by 15th March 2025. Contact us at support@example.com for queries. "
    "Order #987654321 was placed on 02/29/2024. Your tracking ID is TRK12345678."
)

Formal

Supported by both Mayura:v1 and Sarvam-Translate:v1.

response = client.text.translate(
    input=full_text,
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Male",
    mode="formal",
    model="sarvam-translate:v1",
    enable_preprocessing=False,
)
translated_text = response.translated_text
print("\n=== Translated Text ===\n", translated_text)

Classic Colloquial

Supported only by Mayura:v1 model.

response = client.text.translate(
    input=full_text,
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Male",
    mode="classic-colloquial",
    model="mayura:v1",
    enable_preprocessing=False,
)
translated_text = response.translated_text
print("\n=== Translated Text ===\n", translated_text)

Modern Colloquial

Supported only by Mayura:v1 model.

response = client.text.translate(
    input=full_text,
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Male",
    mode="modern-colloquial",
    model="mayura:v1",
    enable_preprocessing=False,
)
translated_text = response.translated_text
print("\n=== Translated Text ===\n", translated_text)
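
Because the colloquial modes are available only on Mayura:v1, a small local check can catch a mismatched model and mode before a request is sent. The sketch below encodes the support matrix described in this section; SUPPORTED_MODES and check_mode are hypothetical names for illustration, not part of the SDK.

```python
# Modes each model accepts, as described in this tutorial.
SUPPORTED_MODES = {
    "mayura:v1": {"formal", "classic-colloquial", "modern-colloquial"},
    "sarvam-translate:v1": {"formal"},
}


def check_mode(model, mode):
    """Return mode if the model supports it, otherwise raise ValueError."""
    allowed = SUPPORTED_MODES.get(model)
    if allowed is None:
        raise ValueError(f"Unknown model: {model}")
    if mode not in allowed:
        raise ValueError(
            f"{model} does not support mode '{mode}'; choose from {sorted(allowed)}"
        )
    return mode
```

For example, check_mode("mayura:v1", "modern-colloquial") returns the mode unchanged, while check_mode("sarvam-translate:v1", "modern-colloquial") raises a ValueError.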

5.2: Speaker Gender

The translation model supports Male and Female speaker options, which impact the tone and style of the output.

1️⃣ Male Voice 🔵

2️⃣ Female Voice 🔴

Female

response = client.text.translate(
    input=full_text,
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Female",
    mode="formal",
    model="sarvam-translate:v1",
    enable_preprocessing=False,
)
translated_text = response.translated_text
print("\n=== Translated Text ===\n", translated_text)

Male

response = client.text.translate(
    input=full_text,
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Male",
    mode="formal",
    model="sarvam-translate:v1",
    enable_preprocessing=False,
)
translated_text = response.translated_text
print("\n=== Translated Text ===\n", translated_text)

5.3: Numerals Format Feature

The numerals_format parameter controls how numbers appear in the translation. It has two options:

1️⃣ International (Default) 🌍 → Uses standard 0-9 numerals.
✅ Example: “मेरा phone number है: 9840950950.”
✅ Best for universally understood content, technical documents, and modern usage.

2️⃣ Native 🔡 → Uses language-specific numerals.
✅ Example: “मेरा phone number है: ९८४०९५०९५०.”
✅ Ideal for traditional texts, cultural adaptation, and regional content.

📌 When to Use What?

  • Use International for wider readability and digital content 📱
  • Use Native for localized, heritage-focused, and print media content 📖
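
To make the difference concrete, the snippet below converts international digits to Devanagari digits locally with plain Python. It only illustrates what the native option looks like for Hindi; it is not how the API produces its output, and to_native_digits is a hypothetical helper name.

```python
# Devanagari digits zero through nine (Unicode U+0966 to U+096F).
TO_DEVANAGARI = str.maketrans("0123456789", "०१२३४५६७८९")


def to_native_digits(text):
    """Replace every ASCII digit in text with its Devanagari counterpart."""
    return text.translate(TO_DEVANAGARI)


print(to_native_digits("9840950950"))  # ९८४०९५०९५०
```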

Native

response = client.text.translate(
    input=full_text,
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Male",
    mode="formal",
    model="sarvam-translate:v1",
    enable_preprocessing=False,
    numerals_format="native",
)
translated_text = response.translated_text
print("\n=== Translated Text ===\n", translated_text)

International

response = client.text.translate(
    input=full_text,
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Male",
    mode="formal",
    model="sarvam-translate:v1",
    enable_preprocessing=False,
    numerals_format="international",
)
translated_text = response.translated_text
print("\n=== Translated Text ===\n", translated_text)

5.4: Output Script Feature

The output_script parameter controls how the translated text is transliterated, i.e., how it appears in different scripts while keeping pronunciation intact.

Transliteration Options for Mayura:

1️⃣ Default (null) – No transliteration applied.
✅ Example: “आपका Rs. 3000 का EMI pending है।”
✅ Best for modern, mixed-language content.

2️⃣ Roman – Converts the output into Romanized Hindi.
✅ Example: “aapka Rs. 3000 ka EMI pending hai.”
✅ Ideal for users who can speak but not read native scripts.

3️⃣ Fully-Native – Uses formal native script transliteration.
✅ Example: “आपका रु. 3000 का ई.एम.ऐ. पेंडिंग है।”
✅ Best for official documents and structured writing.

4️⃣ Spoken-Form-in-Native – Uses native script but mimics spoken style.
✅ Example: “आपका थ्री थाउजेंड रूपीस का ईएमअइ पेंडिंग है।”
✅ Ideal for voice assistants, conversational AI, and informal speech.

📌 When to Use What?

  • Default – For natural, mixed-language modern writing ✍️
  • Roman – For users unfamiliar with native scripts 🔤
  • Fully-Native – For formal, structured translations 🏛
  • Spoken-Form-in-Native – For casual speech and voice applications 🎙

# Roman
response = client.text.translate(
    input=full_text,
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Male",
    mode="modern-colloquial",
    model="mayura:v1",
    enable_preprocessing=False,
    output_script="roman",
    numerals_format="international",
)
translated_text = response.translated_text
print("\n=== Translated Text ===\n", translated_text)

# Spoken-Form-in-Native
response = client.text.translate(
    input=full_text,
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Male",
    mode="modern-colloquial",
    model="mayura:v1",
    enable_preprocessing=False,
    output_script="spoken-form-in-native",
    numerals_format="international",
)
translated_text = response.translated_text
print("\n=== Translated Text ===\n", translated_text)

# Fully-Native
response = client.text.translate(
    input=full_text,
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Male",
    mode="modern-colloquial",
    model="mayura:v1",
    enable_preprocessing=False,
    output_script="fully-native",
    numerals_format="international",
)
translated_text = response.translated_text
print("\n=== Translated Text ===\n", translated_text)

🚫 Note: For sarvam-translate:v1 - Transliteration is not supported

6. Error Handling

You may encounter these errors while using the API:

  • 403 Forbidden (invalid_api_key_error)

    • Cause: The API key is invalid.
    • Solution: Check that the key you pass to the client is correct.
  • 429 Too Many Requests (insufficient_quota_error)

    • Cause: Exceeded API quota.
    • Solution: Check your usage, upgrade if needed, or implement exponential backoff when retrying.
  • 500 Internal Server Error (internal_server_error)

    • Cause: Issue on our servers.
    • Solution: Try again later. If persistent, contact support.
  • 400 Bad Request (invalid_request_error)

    • Cause: Incorrect request formatting.
    • Solution: Verify your request structure and parameters.
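
The exponential backoff suggested for 429 errors can be sketched as a generic retry wrapper. Everything here is an illustrative assumption: call_with_backoff is a hypothetical helper, and the caller supplies the is_retryable predicate because this tutorial does not show which exception type the SDK raises.

```python
import random
import time


def call_with_backoff(func, is_retryable, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call func(), retrying with exponential backoff while is_retryable(exc) is true."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Wait roughly 1s, 2s, 4s, ... plus a little jitter so that
            # many clients do not retry at exactly the same moment.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

One possible predicate, assuming the raised exception exposes a status_code attribute (an assumption to verify against the SDK you have installed), is lambda exc: getattr(exc, "status_code", None) in (429, 500). A 400 or 403 should not be retried, since repeating the same request will not fix it.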

7. Additional Resources

For more details, refer to our official documentation. We are also available to help on our Discord server.

8. Final Notes

  • Keep your API key secure.
  • Keep each request within the character limit of the model you use.
  • Explore advanced features like translation modes, numerals format, and output script.

Keep Building! 🚀