Language Identification API Tutorial

📖 Language Identification API: A Hands-on Guide

🔗 Overview

This notebook demonstrates how to use the Language Identification API to detect the language code and script code of a piece of text. We will also see how to use language identification in the translate and transliterate APIs to auto-detect the source language before performing the respective transformation.

Table of Contents

  1. Setup & Installation
  2. Authentication
  3. Basic Usage
  4. Auto Detection
  5. Error Handling
  6. Conclusion

1️⃣ Setup & Installation

Before you begin, ensure you have the necessary Python library installed. Run the following command to install the required package:

!pip install requests
import requests

2️⃣ Authentication

To use the API, you need an API subscription key. Follow these steps to set up your API key:

  1. Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
  2. Replace the placeholder key: In the code below, replace “YOUR_SARVAM_AI_API_KEY” with your actual API key.
SARVAM_API_KEY = "YOUR_SARVAM_AI_API_KEY"
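Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming you export the key beforehand in an environment variable; the variable name `SARVAM_API_KEY` and the helper `load_api_key` are our own illustrative choices, not something the API requires:

```python
import os


def load_api_key(env_var: str = "SARVAM_API_KEY") -> str:
    """Read the API key from an environment variable instead of hardcoding it.

    The default variable name is a hypothetical convention for this tutorial.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an unauthenticated request
        raise RuntimeError(f"Set the {env_var} environment variable to your Sarvam AI API key.")
    return key


# Usage (assumes the variable was exported before starting the notebook):
# SARVAM_API_KEY = load_api_key()
```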

3️⃣ Basic Usage

The API takes a single required parameter:

input – The text for which the language code and script code need to be detected.

🚫 Note: If the API is unable to detect the language or script, it will return null for both fields.

Response Parameters

  • language_code (String) – The detected language in BCP-47 format. Supported values:

    • "en-IN" (English - India)
    • "en-US" (English - United States)
    • "bn-IN" (Bengali - India)
    • "gu-IN" (Gujarati - India)
    • "hi-IN" (Hindi - India)
    • "kn-IN" (Kannada - India)
    • "ml-IN" (Malayalam - India)
    • "mr-IN" (Marathi - India)
    • "od-IN" (Odia - India)
    • "pa-IN" (Punjabi - India)
    • "ta-IN" (Tamil - India)
    • "te-IN" (Telugu - India)
    • "ur-IN" (Urdu - India)
  • script_code (String) – The detected writing script in ISO 15924 format. Supported values:

    • "Latn" → Latin (Roman script)
    • "Beng" → Bengali script
    • "Gujr" → Gujarati script
    • "Deva" → Devanagari script
    • "Knda" → Kannada script
    • "Mlym" → Malayalam script
    • "Orya" → Odia script
    • "Guru" → Gurmukhi (Punjabi) script
    • "Taml" → Tamil script
    • "Telu" → Telugu script
    • "Arab" → Arabic script
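The two code lists above can be turned into lookup tables for printing friendlier results. One detail matters here: because the API returns null for both fields when detection fails, `dict.get(key, default)` will not apply its default (the key exists, its value is just `None`). A minimal sketch; `describe_detection` and the name tables are hypothetical helpers built from the lists above:

```python
# Human-readable names taken from the supported-value lists above
LANGUAGE_NAMES = {
    "en-IN": "English (India)",
    "en-US": "English (United States)",
    "bn-IN": "Bengali",
    "gu-IN": "Gujarati",
    "hi-IN": "Hindi",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "mr-IN": "Marathi",
    "od-IN": "Odia",
    "pa-IN": "Punjabi",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "ur-IN": "Urdu",
}

SCRIPT_NAMES = {
    "Latn": "Latin",
    "Beng": "Bengali",
    "Gujr": "Gujarati",
    "Deva": "Devanagari",
    "Knda": "Kannada",
    "Mlym": "Malayalam",
    "Orya": "Odia",
    "Guru": "Gurmukhi",
    "Taml": "Tamil",
    "Telu": "Telugu",
    "Arab": "Arabic",
}


def describe_detection(data: dict) -> str:
    """Turn a language identification response body into a readable summary.

    Both fields may be null (None in Python), so check for None explicitly.
    """
    lang = data.get("language_code")
    script = data.get("script_code")
    if lang is None and script is None:
        return "Language and script not detected"
    # Fall back to the raw code for unknown values, or a placeholder for None
    lang_name = LANGUAGE_NAMES.get(lang, lang or "unknown language")
    script_name = SCRIPT_NAMES.get(script, script or "unknown")
    return f"{lang_name} in {script_name} script"


# Usage: print(describe_detection(response.json()))
```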
import requests

url = "https://api.sarvam.ai/text-lid"
headers = {
    "api-subscription-key": SARVAM_API_KEY,
    "Content-Type": "application/json"
}

example_text = "hey, what is your name?"
payload = {
    "input": example_text
}

# Send API request
response = requests.post(url, json=payload, headers=headers)

# Process response
if response.status_code == 200:
    data = response.json()
    # Both fields are null (None) when detection fails; a .get() default
    # only covers missing keys, so fall back with `or` instead
    language_code = data.get("language_code") or "Language not detected"
    script_code = data.get("script_code") or "Script not detected"

    print("\n=== Detection Results ===")
    print(f"Detected Language Code: {language_code}")
    print(f"Detected Script Code: {script_code}\n")
else:
    print(f"Error: {response.status_code}, {response.text}")
# Text in a language that is not in the supported list
example_text = "A'in jun aatinob'aal li maare ink'a' neketaw ru."

payload = {
    "input": example_text
}

# Send API request
response = requests.post(url, json=payload, headers=headers)

# Process response
if response.status_code == 200:
    data = response.json()
    # Both fields are null (None) when detection fails
    language_code = data.get("language_code") or "Language not detected"
    script_code = data.get("script_code") or "Script not detected"

    print("\n=== Detection Results ===")
    print(f"Detected Language Code: {language_code}")
    print(f"Detected Script Code: {script_code}\n")
else:
    print(f"Error: {response.status_code}, {response.text}")
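The two cells above repeat the same request-and-parse logic. A minimal sketch of wrapping it in a reusable function; `detect_language`, the `post` parameter (present only so the function can be tested with a stub) and the 30-second timeout are our own illustrative choices, not part of the API:

```python
from typing import Callable, Optional

LID_URL = "https://api.sarvam.ai/text-lid"


def detect_language(text: str, api_key: str, post: Optional[Callable] = None) -> dict:
    """Call the language identification endpoint once and return both detected codes.

    Either value may be None when the API cannot detect the language or script.
    """
    if post is None:
        import requests  # imported lazily so the helper can be exercised with a stub
        post = requests.post

    response = post(
        LID_URL,
        json={"input": text},
        headers={"api-subscription-key": api_key, "Content-Type": "application/json"},
        timeout=30,  # illustrative value; avoids hanging forever on a stalled connection
    )
    if response.status_code != 200:
        raise RuntimeError(f"Error {response.status_code}: {response.text}")

    data = response.json()
    return {
        "language_code": data.get("language_code"),
        "script_code": data.get("script_code"),
    }


# Usage:
# for text in ["hey, what is your name?", "A'in jun aatinob'aal li maare ink'a' neketaw ru."]:
#     print(text, "->", detect_language(text, SARVAM_API_KEY))
```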

4️⃣ Auto Detection

To enable automatic language detection, pass "auto" as the source_language_code. The API will return the transliterated/translated text along with the detected source language code.

🚫 Note: In case of detection failure, manually specify the source_language_code with one of the supported language codes.

If the API is unable to detect the language, it responds with HTTP 422 and an error body like this:

{
  "error": {
    "message": "Unable to detect the language of the input text. Please explicitly pass the `source_language_code` parameter with a supported language.",
    "code": "unprocessable_entity_error"
  }
}
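The note above suggests retrying with an explicit `source_language_code` when auto-detection fails. A minimal sketch of that fallback, assuming the failure arrives as HTTP 422 with the `unprocessable_entity_error` code shown above; `call_with_auto_fallback` and `fallback_language` are hypothetical names, and which fallback language makes sense depends on your data:

```python
from typing import Callable, Optional


def call_with_auto_fallback(url: str, payload: dict, api_key: str,
                            fallback_language: str,
                            post: Optional[Callable] = None) -> dict:
    """Send the request with source_language_code="auto"; if the API reports that it
    could not detect the language, retry once with an explicit fallback code."""
    if post is None:
        import requests  # imported lazily so the helper can be tested with a stub
        post = requests.post
    headers = {"api-subscription-key": api_key, "Content-Type": "application/json"}

    # First attempt: let the API detect the source language
    response = post(url, json={**payload, "source_language_code": "auto"},
                    headers=headers, timeout=30)

    if response.status_code == 422:
        try:
            error = response.json().get("error", {})
        except ValueError:  # body was not valid JSON
            error = {}
        if error.get("code") == "unprocessable_entity_error":
            # Detection failed: retry with the caller's explicit language code
            response = post(url, json={**payload, "source_language_code": fallback_language},
                            headers=headers, timeout=30)

    if response.status_code != 200:
        raise RuntimeError(f"Error {response.status_code}: {response.text}")
    return response.json()


# Usage (transliterate endpoint from the next section):
# result = call_with_auto_fallback(
#     "https://api.sarvam.ai/transliterate",
#     {"input": "...", "target_language_code": "hi-IN", "spoken_form": True},
#     SARVAM_API_KEY,
#     fallback_language="en-IN",
# )
```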

Auto Detection in Transliterate

import requests

# Define API request details
url = "https://api.sarvam.ai/transliterate"
headers = {
    "api-subscription-key": SARVAM_API_KEY,
    "Content-Type": "application/json"
}
payload = {
    "input": "मुझे कल 9:30am को appointment है",
    "source_language_code": "auto",
    "target_language_code": "hi-IN",
    "spoken_form": True,
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

if response.status_code == 200:
    response_data = response.json()
    transliterated_text = response_data.get("transliterated_text", "Transliteration not available")
    source_language_code = response_data.get("source_language_code")

    print(f"✅ Transliteration Successful!\n🔤 Transliterated Text: {transliterated_text}")
    print(f"🌍 Detected Source Language: {source_language_code}")
else:
    print(f"❌ Error {response.status_code}: {response.text}")
# Input in a language outside the supported list; auto-detection may fail here
payload = {
    "input": "A'in jun aatinob'aal li maare ink'a' neketaw ru.",
    "source_language_code": "auto",
    "target_language_code": "hi-IN",
    "spoken_form": True,
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

if response.status_code == 200:
    response_data = response.json()
    transliterated_text = response_data.get("transliterated_text", "Transliteration not available")
    source_language_code = response_data.get("source_language_code")

    print(f"✅ Transliteration Successful!\n🔤 Transliterated Text: {transliterated_text}")
    print(f"🌍 Detected Source Language: {source_language_code}")
else:
    print(f"❌ Error {response.status_code}: {response.text}")

Auto Detection in Translate

import requests

# Define API request details
url = "https://api.sarvam.ai/translate"
headers = {
    "api-subscription-key": SARVAM_API_KEY,
    "Content-Type": "application/json"
}
payload = {
    "source_language_code": "auto",
    "target_language_code": "bn-IN",
    "speaker_gender": "Male",
    "mode": "classic-colloquial",
    "model": "mayura:v1",
    "input": "मुझे कल 9:30am को appointment है"
}

# Send the request
response = requests.post(url, json=payload, headers=headers)

if response.status_code == 200:
    response_data = response.json()
    translated_text = response_data.get("translated_text", "Translation not available")
    source_language_code = response_data.get("source_language_code", "Unknown")

    print(f"✅ Translation Successful!\n🌍 Detected Source Language: {source_language_code}")
    print(f"📝 Translated Text: {translated_text}")
else:
    print(f"❌ Error {response.status_code}: {response.text}")

5️⃣ Error Handling

You may encounter these errors while using the API:

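  • 403 Forbidden (invalid_api_key_error)

    • Cause: Invalid or missing API key.
    • Solution: Check that the api-subscription-key header contains your valid API key.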
  • 429 Too Many Requests (insufficient_quota_error)

    • Cause: Exceeded API quota.
    • Solution: Check your usage, upgrade if needed, or implement exponential backoff when retrying.
  • 500 Internal Server Error (internal_server_error)

    • Cause: Issue on our servers.
    • Solution: Try again later. If persistent, contact support.
  • 400 Bad Request (invalid_request_error)

    • Cause: Incorrect request formatting.
    • Solution: Verify your request structure and parameters.
  • 422 Unprocessable Entity (unprocessable_entity_error)

    • Cause: Unable to detect the language of the input text.
    • Solution: Explicitly pass the source_language_code parameter with a supported language.
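For 429 and 500 responses, the suggestion above is to retry with exponential backoff. A minimal sketch, assuming those two status codes are the ones worth retrying; `post_with_backoff`, its retry count, base delay and 10% jitter are our own illustrative choices, not values recommended by the API:

```python
import random
import time
from typing import Callable

# Quota/rate-limit and server errors from the list above
RETRYABLE_STATUS = {429, 500}


def post_with_backoff(post: Callable, url: str, *, max_retries: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep, **kwargs):
    """Call post(url, **kwargs), retrying 429/500 responses with exponential backoff.

    Waits base_delay * 2**attempt seconds plus up to 10% random jitter between
    attempts, and returns the last response whatever its status.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        response = post(url, **kwargs)
        if response.status_code not in RETRYABLE_STATUS or attempt == max_retries:
            return response
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))


# Usage with the requests library:
# response = post_with_backoff(requests.post, url, json=payload, headers=headers, timeout=30)
```

Jitter spreads out retries from many clients that hit the limit at the same moment; the `sleep` parameter exists only so the delays can be inspected in tests.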

6️⃣ Conclusion

For more details, refer to our official documentation. If you need help, we are always happy to support you on our Discord server.

Final Notes

  • Keep your API key secure.
  • Explore related features such as translation and transliteration.

Keep Building! 🚀