Saaras

Saaras-v2.5 is our flagship domain-aware speech recognition model, designed for production environments requiring high accuracy and robust performance. It specializes in speech-to-text translation, converting spoken content directly into English text while preserving context and meaning.

Key Features

Domain-Aware Translation

Advanced prompting system for domain-specific translation and hotword retention, ensuring accurate context preservation.

Superior Telephony Performance

Optimized for 8 kHz telephony audio with enhanced multi-speaker recognition capabilities.

Intelligent Entity Preservation

Preserves proper nouns and entities accurately across languages, maintaining context and meaning.

Automatic Language Detection

Built-in Language Identification (LID) with confidence scores for automatic language detection.

Speaker Diarization

Provides diarized outputs with precise timestamps for multi-speaker conversations through the batch API.

Direct Translation

Converts speech directly to English text, eliminating the need for separate transcription and translation steps.
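Because the telephony feature above targets 8 kHz audio, it can be useful to check a recording's sample rate before sending it. The following is a minimal sketch using only Python's standard `wave` module; the helper names `wav_sample_rate` and `make_silent_wav` are illustrative and not part of the Sarvam SDK, and nothing here implies that resampling is required.

```python
import io
import wave


def wav_sample_rate(source) -> int:
    """Return the sample rate in Hz of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def make_silent_wav(sample_rate: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short mono 16-bit silent WAV in memory (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    buffer.seek(0)                # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    rate = wav_sample_rate(make_silent_wav(8000))
    print(f"{rate} Hz", "(telephony-grade)" if rate == 8000 else "")
```

In practice you would pass the path of your own recording, for example `wav_sample_rate("audio.wav")`.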

Language Support

Saaras can translate speech from the following Indian languages to English:

Languages (Code):

Hindi (hi-IN), Bengali (bn-IN), Tamil (ta-IN), Telugu (te-IN), Gujarati (gu-IN), Kannada (kn-IN), Malayalam (ml-IN), Marathi (mr-IN), Punjabi (pa-IN), Odia (od-IN), English (en-IN)

All of the above are supported for speech-to-English translation.

Saaras automatically detects the source language and translates the speech to English, so you do not need to specify the source language in the request.
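When working with detected languages, a small lookup table of the codes listed above can make logs and UI labels easier to read. This is a sketch only: the dictionary mirrors the list in this section, the helper `language_name` is hypothetical, and it assumes the detected language is reported as one of these code strings.

```python
# Language codes copied from the supported-language list above.
SUPPORTED_LANGUAGES = {
    "hi-IN": "Hindi",
    "bn-IN": "Bengali",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "gu-IN": "Gujarati",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "mr-IN": "Marathi",
    "pa-IN": "Punjabi",
    "od-IN": "Odia",
    "en-IN": "English",
}


def language_name(code: str) -> str:
    """Return the readable name for a supported code, or raise ValueError."""
    try:
        return SUPPORTED_LANGUAGES[code]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None


if __name__ == "__main__":
    print(language_name("ta-IN"))
```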

Key Capabilities

The basic use case is speech-to-text translation with automatic language detection, which converts Indian-language speech directly to English text.

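from sarvamai import SarvamAI

# Create a client with your API subscription key
client = SarvamAI(
    api_subscription_key="YOUR_API_SUBSCRIPTION_KEY"
)

# Send the audio file to Saaras for speech-to-English translation;
# the with-block closes the file once the request completes
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v2.5"
    )

print(response)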
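If you translate many files, the call above can be wrapped in a small helper that handles opening and closing the audio file. The sketch below assumes only the `client.speech_to_text.translate(file=..., model=...)` call shown in this section; the function name `translate_audio` and the `extra` pass-through are illustrative, and any additional keyword arguments should be checked against the SDK reference before use.

```python
DEFAULT_MODEL = "saaras:v2.5"


def translate_audio(client, path, model=DEFAULT_MODEL, **extra):
    """Translate one audio file to English text using the given client.

    `client` is expected to expose `speech_to_text.translate(file=..., model=...)`,
    as in the example above. `extra` is forwarded unchanged (assumption: only
    pass arguments the SDK actually documents).
    """
    # The with-block guarantees the file handle is closed even if the call fails.
    with open(path, "rb") as audio_file:
        return client.speech_to_text.translate(
            file=audio_file,
            model=model,
            **extra,
        )
```

With a real client this would be called as `translate_audio(client, "audio.wav")`, where `client` is the `SarvamAI` instance created earlier.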

Next Steps