How to specify language codes

The language_code parameter tells the STT model which language to expect in the audio. Using the correct language code improves transcription accuracy.

Supported Languages (Saaras v3)

Saaras v3 supports 22 Indian languages plus English, each identified by a BCP-47 style code:

Language	Code	Language	Code
Hindi	`hi-IN`	Assamese	`as-IN`
Bengali	`bn-IN`	Urdu	`ur-IN`
Kannada	`kn-IN`	Nepali	`ne-IN`
Malayalam	`ml-IN`	Konkani	`kok-IN`
Marathi	`mr-IN`	Kashmiri	`ks-IN`
Odia	`od-IN`	Sindhi	`sd-IN`
Punjabi	`pa-IN`	Sanskrit	`sa-IN`
Tamil	`ta-IN`	Santali	`sat-IN`
Telugu	`te-IN`	Manipuri	`mni-IN`
English	`en-IN`	Bodo	`brx-IN`
Gujarati	`gu-IN`	Maithili	`mai-IN`
		Dogri	`doi-IN`

Automatic Language Detection

To enable automatic language detection, pass `unknown` as the `language_code` parameter. The model then detects the language from the audio.

Best Practice: Always specify the language code when you know the language of the audio. This improves accuracy and reduces processing time. Use `unknown` only when the language is truly unknown.
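
One way to apply this practice in application code is to resolve the code from a language name and fall back to auto detection only when no language is given. The following is a minimal sketch: the codes are copied from the table above, while `LANGUAGE_CODES`, `AUTO_DETECT`, and `language_code_for` are hypothetical names chosen for illustration and are not part of the Sarvam SDK.

```python
from typing import Optional

# Codes copied from the Saaras v3 table above.
LANGUAGE_CODES = {
    "hindi": "hi-IN", "assamese": "as-IN",
    "bengali": "bn-IN", "urdu": "ur-IN",
    "kannada": "kn-IN", "nepali": "ne-IN",
    "malayalam": "ml-IN", "konkani": "kok-IN",
    "marathi": "mr-IN", "kashmiri": "ks-IN",
    "odia": "od-IN", "sindhi": "sd-IN",
    "punjabi": "pa-IN", "sanskrit": "sa-IN",
    "tamil": "ta-IN", "santali": "sat-IN",
    "telugu": "te-IN", "manipuri": "mni-IN",
    "english": "en-IN", "bodo": "brx-IN",
    "gujarati": "gu-IN", "maithili": "mai-IN",
    "dogri": "doi-IN",
}

# Value that enables automatic language detection.
AUTO_DETECT = "unknown"


def language_code_for(language: Optional[str]) -> str:
    """Return the code for a language name, or "unknown" when no language is given."""
    if language is None:
        return AUTO_DETECT
    key = language.strip().lower()
    if key not in LANGUAGE_CODES:
        # A misspelled or unsupported name is an error, not a reason to auto-detect.
        raise ValueError(f"Unsupported language: {language!r}")
    return LANGUAGE_CODES[key]


print(language_code_for("Tamil"))  # ta-IN
print(language_code_for(None))     # unknown
```

Raising on an unrecognized name, rather than silently returning `unknown`, keeps auto detection reserved for audio whose language is genuinely not known.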

Example Code

With Language Code

from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Specify the language for better accuracy.
# The with-block closes the audio file once the request completes.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        language_code="ta-IN",  # Tamil
        mode="transcribe",
    )

print(response.transcript)
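
Auto detection can be sketched the same way by passing `unknown` instead of a specific code. The helper below is a minimal illustration, not an official SDK function: `transcribe_with_auto_detection` is a hypothetical name, and it assumes a `client` created as in the example above.

```python
def transcribe_with_auto_detection(client, audio_path):
    """Transcribe an audio file and let the model detect the language."""
    with open(audio_path, "rb") as audio_file:
        return client.speech_to_text.transcribe(
            file=audio_file,
            model="saaras:v3",
            language_code="unknown",  # enables automatic language detection
            mode="transcribe",
        )
```

Calling `transcribe_with_auto_detection(client, "audio.wav")` returns the same kind of response as the example above, so the text is available as `response.transcript`.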