How to specify language codes

The language_code parameter tells the STT model which language to expect in the audio. Using the correct language code improves transcription accuracy.

Supported Languages (Saaras v3)

Saaras v3 supports 22 Indian languages plus English (en-IN), each identified by a BCP-47 format code:

Language     Code      Language     Code
Hindi        hi-IN     Assamese     as-IN
Bengali      bn-IN     Urdu         ur-IN
Kannada      kn-IN     Nepali       ne-IN
Malayalam    ml-IN     Konkani      kok-IN
Marathi      mr-IN     Kashmiri     ks-IN
Odia         od-IN     Sindhi       sd-IN
Punjabi      pa-IN     Sanskrit     sa-IN
Tamil        ta-IN     Santali      sat-IN
Telugu       te-IN     Manipuri     mni-IN
English      en-IN     Bodo         brx-IN
Gujarati     gu-IN     Maithili     mai-IN
Dogri        doi-IN
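
If an application accepts language names or codes from users, it can help to validate them before sending a request. The following is a minimal sketch that copies the table above into a dictionary. The names SUPPORTED_LANGUAGE_CODES, code_for, and is_supported_code are hypothetical helpers written for this illustration and are not part of the sarvamai SDK.

```python
# Language names and codes copied from the table above.
# This mapping and the helpers below are illustrative, not part of the SDK.
SUPPORTED_LANGUAGE_CODES = {
    "Hindi": "hi-IN",
    "Bengali": "bn-IN",
    "Kannada": "kn-IN",
    "Malayalam": "ml-IN",
    "Marathi": "mr-IN",
    "Odia": "od-IN",
    "Punjabi": "pa-IN",
    "Tamil": "ta-IN",
    "Telugu": "te-IN",
    "English": "en-IN",
    "Gujarati": "gu-IN",
    "Assamese": "as-IN",
    "Urdu": "ur-IN",
    "Nepali": "ne-IN",
    "Konkani": "kok-IN",
    "Kashmiri": "ks-IN",
    "Sindhi": "sd-IN",
    "Sanskrit": "sa-IN",
    "Santali": "sat-IN",
    "Manipuri": "mni-IN",
    "Bodo": "brx-IN",
    "Maithili": "mai-IN",
    "Dogri": "doi-IN",
}


def code_for(language_name: str) -> str:
    """Return the code for a language name, ignoring case and outer whitespace."""
    wanted = language_name.strip().lower()
    for name, code in SUPPORTED_LANGUAGE_CODES.items():
        if name.lower() == wanted:
            return code
    raise ValueError(f"Unsupported language: {language_name!r}")


def is_supported_code(code: str) -> bool:
    """Return True if the code appears in the table above."""
    return code in SUPPORTED_LANGUAGE_CODES.values()
```

For example, code_for("tamil") returns "ta-IN", and is_supported_code("fr-FR") returns False because French is not in the table.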

Automatic Language Detection

To enable automatic language detection, set the language_code parameter to "unknown". The model then detects the language from the audio itself.

Best Practice: Always specify the language code when you know the language of the audio. This improves accuracy and reduces processing time. Use unknown only when the language is truly unknown.
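
One way to apply this practice is to fall back to "unknown" only when no code is available. The sketch below assumes the same client call shown in the Example Code section. The names choose_language_code and transcribe_file are hypothetical wrappers written for this illustration, not SDK functions.

```python
from typing import Optional

AUTO_DETECT = "unknown"  # value that enables automatic language detection


def choose_language_code(known_code: Optional[str]) -> str:
    """Return the given code if one is provided, otherwise fall back to auto-detection."""
    if known_code and known_code.strip():
        return known_code.strip()
    return AUTO_DETECT


def transcribe_file(path: str, known_code: Optional[str] = None):
    """Transcribe an audio file, using auto-detection only when no code is given."""
    # Imported here so choose_language_code can be used without the SDK installed.
    from sarvamai import SarvamAI

    client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")
    with open(path, "rb") as audio_file:
        return client.speech_to_text.transcribe(
            file=audio_file,
            model="saaras:v3",
            language_code=choose_language_code(known_code),
            mode="transcribe",
        )
```

Calling transcribe_file("audio.wav", "ta-IN") sends the Tamil code, while transcribe_file("audio.wav") sends "unknown" and leaves detection to the model.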

Example Code

from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Specify the language for better accuracy
with open("audio.wav", "rb") as audio_file:  # the file is closed once the request completes
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        language_code="ta-IN",  # Tamil
        mode="transcribe",
    )

print(response.transcript)