LID API Tutorial

🔗 Overview

This notebook demonstrates how to use the Language Identification (LID) API to detect the language and script of a piece of text. It also shows how translation and transliteration requests can use language identification to detect the source language automatically and apply the appropriate transformation.

1. Installation

Before you begin, install the Sarvam AI Python SDK and import the client class:

!pip install -Uqq sarvamai
from sarvamai import SarvamAI

2. Authentication

To use the API, you need an API subscription key. Follow these steps to set up your API key:

  1. Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
  2. Replace the placeholder key: In the code below, replace "YOUR_SARVAM_API_KEY" with your actual API key.

SARVAM_API_KEY = "YOUR_SARVAM_API_KEY"
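
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, assuming the key has been exported in an environment variable. The variable name `SARVAM_API_KEY` and the helper `load_api_key` are choices made for this example, not something the SDK requires.

```python
import os

PLACEHOLDER = "YOUR_SARVAM_API_KEY"


def load_api_key(env_var: str = "SARVAM_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    `env_var` is a name chosen for this sketch, not one mandated by the SDK.
    """
    key = os.environ.get(env_var, "").strip()
    # Reject both a missing value and the untouched placeholder string.
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

With this in place, the assignment above could become `SARVAM_API_KEY = load_api_key()`.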

3. Basic Usage

The API requires a single input parameter:

| ✅ Parameter | 🔍 Description |
|---|---|
| input | The text for which the language and script codes need to be detected. |

⚠️ Note: If the API cannot confidently detect the language or script, it will return null for one or both fields.

Response Parameters

  • language_code (String) – The detected language in BCP-47 format. Supported values:

    • "en-IN" (English - India)
    • "en-US" (English - United States)
    • "bn-IN" (Bengali - India)
    • "gu-IN" (Gujarati - India)
    • "hi-IN" (Hindi - India)
    • "kn-IN" (Kannada - India)
    • "ml-IN" (Malayalam - India)
    • "mr-IN" (Marathi - India)
    • "od-IN" (Odia - India)
    • "pa-IN" (Punjabi - India)
    • "ta-IN" (Tamil - India)
    • "te-IN" (Telugu - India)
    • "ur-IN" (Urdu - India)
  • script_code (String) – The detected writing script in ISO 15924 format. Supported values:

    • "Latn" → Latin (Roman script)
    • "Beng" → Bengali script
    • "Gujr" → Gujarati script
    • "Deva" → Devanagari script
    • "Knda" → Kannada script
    • "Mlym" → Malayalam script
    • "Orya" → Odia script
    • "Guru" → Gurmukhi (Punjabi) script
    • "Taml" → Tamil script
    • "Telu" → Telugu script
    • "Arab" → Arabic script
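
The codes above are compact but not very readable in logs or user interfaces. The sketch below turns a detection result into a readable summary and treats a missing (null) field explicitly. The lookup tables are copied from the lists above. The helper names `describe_detection` and `_label` are hypothetical and are not part of the SDK.

```python
from typing import Optional

# Display names copied from the supported-value lists in this section.
LANGUAGE_NAMES = {
    "en-IN": "English (India)", "en-US": "English (United States)",
    "bn-IN": "Bengali (India)", "gu-IN": "Gujarati (India)",
    "hi-IN": "Hindi (India)", "kn-IN": "Kannada (India)",
    "ml-IN": "Malayalam (India)", "mr-IN": "Marathi (India)",
    "od-IN": "Odia (India)", "pa-IN": "Punjabi (India)",
    "ta-IN": "Tamil (India)", "te-IN": "Telugu (India)",
    "ur-IN": "Urdu (India)",
}
SCRIPT_NAMES = {
    "Latn": "Latin (Roman)", "Beng": "Bengali", "Gujr": "Gujarati",
    "Deva": "Devanagari", "Knda": "Kannada", "Mlym": "Malayalam",
    "Orya": "Odia", "Guru": "Gurmukhi (Punjabi)", "Taml": "Tamil",
    "Telu": "Telugu", "Arab": "Arabic",
}


def _label(code: Optional[str], names: dict) -> str:
    """Map a code to a display name; None means the field was not detected."""
    if code is None:
        return "not detected"
    return names.get(code, f"unrecognized code {code!r}")


def describe_detection(language_code: Optional[str], script_code: Optional[str]) -> str:
    """Build a one-line summary of a detection result."""
    return (
        f"language: {_label(language_code, LANGUAGE_NAMES)}, "
        f"script: {_label(script_code, SCRIPT_NAMES)}"
    )
```

For example, `describe_detection("hi-IN", "Deva")` returns `"language: Hindi (India), script: Devanagari"`, and `describe_detection(None, "Latn")` returns `"language: not detected, script: Latin (Roman)"`.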

4. Language Detection Usage

1. Initialize the Client

client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

2. Define Input Text

example_text = "hey, what is your name?"

3. Detect Language and Script

response = client.text.identify_language(input=example_text)
language_code = response.language_code
script_code = response.script_code

print("\n=== Detection Results ===")
print(f"Detected Language Code: {language_code}")
print(f"Detected Script Code: {script_code}\n")

4. Try Another Input

example_text = "A'in jun aatinob'aal li maare ink'a' neketaw ru."

response = client.text.identify_language(input=example_text)
language_code = response.language_code
script_code = response.script_code

print("\n=== Detection Results ===")
print(f"Detected Language Code: {language_code}")
print(f"Detected Script Code: {script_code}\n")
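
The two cells above repeat the same call-and-unpack steps. A small wrapper can run detection over several inputs at once. This is a sketch, assuming only that `client` exposes `text.identify_language(input=...)` and that the response has `language_code` and `script_code` attributes, as used in this tutorial. The names `Detection` and `detect_all` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Detection(NamedTuple):
    """One detection result; either code may be None if the API was not confident."""
    text: str
    language_code: Optional[str]
    script_code: Optional[str]


def detect_all(client, texts) -> List[Detection]:
    """Run language identification on each text and collect the results."""
    results = []
    for text in texts:
        response = client.text.identify_language(input=text)
        results.append(Detection(text, response.language_code, response.script_code))
    return results
```

A call such as `detect_all(client, ["hey, what is your name?", example_text])` would issue one request per input, so the quota considerations in the error-handling section still apply.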

5. Auto Detection

To enable automatic language detection, pass "auto" as the source_language_code.

The API will return the transliterated/translated text along with the detected source language code.

🚫 Note: In case of detection failure, manually specify the source_language_code with one of the supported language codes.

If the API is unable to detect the language, the response will include an error message:

{
  "error": {
    "message": "Unable to detect the language of the input text. Please explicitly pass the `source_language_code` parameter with a supported language.",
    "code": "unprocessable_entity_error"
  }
}
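
To decide programmatically when to fall back to an explicit `source_language_code`, the error body can be inspected once it has been parsed. The sketch below works on a plain dictionary with the shape shown above. It assumes the `code` field alone is enough to recognize this case. How the parsed body is obtained from the SDK's exception is not covered here, and the helper name `is_detection_failure` is hypothetical.

```python
import json

DETECTION_FAILURE_CODE = "unprocessable_entity_error"


def is_detection_failure(error_body) -> bool:
    """Return True if a parsed error body matches the detection-failure shape."""
    if not isinstance(error_body, dict):
        return False
    error = error_body.get("error")
    # Assumption: the error code alone identifies a failed auto detection.
    return isinstance(error, dict) and error.get("code") == DETECTION_FAILURE_CODE


raw = (
    '{"error": {"message": "Unable to detect the language of the input text.", '
    '"code": "unprocessable_entity_error"}}'
)
print(is_detection_failure(json.loads(raw)))  # prints True
```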

Auto Detection in Transliterate

response = client.text.transliterate(
    input="मुझे कल 9:30am को appointment है",
    source_language_code="auto",
    target_language_code="hi-IN",
    spoken_form=True,
)
transliterated_text = response.transliterated_text
source_language_code = response.source_language_code

print(f"✅ Transliteration Successful!\n🔤 Transliterated Text: {transliterated_text}")
print(f"🌍 Detected Source Language: {source_language_code}")

response = client.text.transliterate(
    input="'আমার কাল সকাল ৭টায় ডাক্তার এর অ্যাপয়েন্টমেন্ট আছে",
    source_language_code="auto",
    target_language_code="en-IN",
    spoken_form=True,
)
transliterated_text = response.transliterated_text
source_language_code = response.source_language_code

print(f"✅ Transliteration Successful!\n🔤 Transliterated Text: {transliterated_text}")
print(f"🌍 Detected Source Language: {source_language_code}")

Auto Detection in Translate

response = client.text.translate(
    source_language_code="auto",
    target_language_code="bn-IN",
    speaker_gender="Male",
    mode="classic-colloquial",
    model="mayura:v1",
    input="मुझे कल 9:30am को appointment है",
)

translated_text = response.translated_text
source_language_code = response.source_language_code

print(
    f"✅ Translation Successful!\n🌍 Detected Source Language: {source_language_code}"
)
print(f"📝 Translated Text: {translated_text}")
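
As an alternative to passing "auto", detection can be run as a separate first step, which makes it possible to skip the translation call when the text is already in the target language. The following is a sketch under stated assumptions: `client` exposes `text.identify_language` and `text.translate` with the keyword arguments used in this tutorial, and `translate` accepts a call without the optional-looking `speaker_gender`, `mode`, and `model` arguments (not verified here). It also assumes every detected code is accepted by `translate`. The function name and the fallback parameter are hypothetical.

```python
def translate_with_detection(client, text, target_language_code, fallback_source_language_code):
    """Detect the source language first, then translate only when needed.

    Returns a (source_language_code, output_text) tuple. The fallback code is
    used when detection returns None for the language.
    """
    detected = client.text.identify_language(input=text).language_code
    source = detected or fallback_source_language_code
    if source == target_language_code:
        # Already in the target language: return the input unchanged.
        return source, text
    response = client.text.translate(
        input=text,
        source_language_code=source,
        target_language_code=target_language_code,
    )
    return source, response.translated_text
```

The trade-off is one extra request per input in exchange for explicit control over the fallback language and the chance to avoid unnecessary translation calls.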

6. Error Handling

You may encounter these errors while using the API:

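  • 403 Forbidden (invalid_api_key_error)

    • Cause: Invalid API key.
    • Solution: Check that the key passed as api_subscription_key matches the one shown on your dashboard.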
  • 429 Too Many Requests (insufficient_quota_error)

    • Cause: Exceeded API quota.
    • Solution: Check your usage, upgrade if needed, or implement exponential backoff when retrying.
  • 500 Internal Server Error (internal_server_error)

    • Cause: Issue on our servers.
    • Solution: Try again later. If persistent, contact support.
  • 400 Bad Request (invalid_request_error)

    • Cause: Incorrect request formatting.
    • Solution: Verify your request structure and parameters.
  • 422 Unprocessable Entity (unprocessable_entity_error)

    • Cause: Unable to detect the language of the input text.
    • Solution: Explicitly pass the source_language_code parameter with a supported language.
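
The exponential backoff suggested for 429 responses can be sketched as follows. This is a generic illustration, not the SDK's own retry mechanism. `RetryableError` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and the default attempt count and delays are arbitrary values chosen for the example.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical stand-in for the exception raised on HTTP 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on RetryableError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Delay ceiling doubles each attempt: base, 2*base, 4*base, ... capped.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, delay))
```

In practice, `fn` would be a zero-argument callable that wraps the API request and converts the client's rate-limit error into `RetryableError`. The `sleep` parameter is injectable so the helper can be exercised in tests without actually waiting.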

7. Additional Resources

For more details, refer to our official documentation. For help and support, join us on our Discord server.

8. Final Notes

  • Keep your API key secure.
  • Use clean, unambiguous input text for best results.
  • Explore related features such as translation and transliteration with automatic language detection.

Keep Building! 🚀