Saaras
Saaras-v2.5 is our flagship domain-aware speech recognition model, designed for production environments requiring high accuracy and robust performance. It specializes in speech-to-text translation, converting spoken content directly into English text while preserving context and meaning.
Key Features
Advanced prompting system for domain-specific translation and hotword retention, ensuring accurate context preservation.
Optimized for 8 kHz telephony audio with enhanced multi-speaker recognition capabilities.
Preserves proper nouns and entities accurately across languages, maintaining context and meaning.
Built-in Language Identification (LID) with confidence scores for automatic language detection.
Provides diarized outputs with precise timestamps for multi-speaker conversations through batch API.
Converts speech directly to English text, eliminating the need for separate transcription and translation steps.
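As an illustration of how diarized output with timestamps might be consumed, the sketch below merges consecutive segments from the same speaker into turns and prints them as a conversation. The `Segment` fields (`speaker_id`, `start`, `end`, `text`) are hypothetical names chosen for this example; the actual batch API response schema is not described here and should be taken from the API reference.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One diarized segment. Field names are hypothetical, not the real API schema."""
    speaker_id: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str     # English text for this segment


def merge_turns(segments: List[Segment]) -> List[Segment]:
    """Merge consecutive segments spoken by the same speaker into a single turn."""
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker_id == seg.speaker_id:
            last = turns[-1]
            # Extend the previous turn instead of starting a new one.
            turns[-1] = Segment(
                last.speaker_id,
                last.start,
                max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            turns.append(seg)
    return turns


def format_turns(turns: List[Segment]) -> List[str]:
    """Render each turn as '[start-end] speaker: text'."""
    return [f"[{t.start:.2f}-{t.end:.2f}] {t.speaker_id}: {t.text}" for t in turns]


if __name__ == "__main__":
    demo = [
        Segment("SPEAKER_1", 0.0, 1.5, "Hello"),
        Segment("SPEAKER_1", 1.5, 2.0, "there"),
        Segment("SPEAKER_2", 2.1, 3.0, "Hi"),
    ]
    for line in format_turns(merge_turns(demo)):
        print(line)
```

Merging by speaker keeps the timestamps of the first and last segment in each turn, which is usually enough for call-center style transcripts.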
Language Support
Saaras can translate speech from the following Indian languages to English:
Languages (Code): Hindi (hi-IN), Bengali (bn-IN), Tamil (ta-IN), Telugu (te-IN), Gujarati (gu-IN), Kannada (kn-IN), Malayalam (ml-IN), Marathi (mr-IN), Punjabi (pa-IN), Odia (od-IN), English (en-IN)
All of the above are supported for speech-to-English translation.
Saaras automatically detects the source language and translates the speech to English, so there is no need to specify the source language in the request.
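Because the language is detected rather than supplied, client code typically only needs to interpret the detected code afterwards. The sketch below maps the codes listed above to language names and flags low-confidence detections. The function name, the idea that the response exposes a language code together with a numeric confidence, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import Optional

# Language codes as listed in this document.
SUPPORTED_LANGUAGES = {
    "hi-IN": "Hindi",
    "bn-IN": "Bengali",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "gu-IN": "Gujarati",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "mr-IN": "Marathi",
    "pa-IN": "Punjabi",
    "od-IN": "Odia",
    "en-IN": "English",
}


def describe_detection(
    language_code: Optional[str],
    confidence: Optional[float] = None,
    min_confidence: float = 0.5,  # hypothetical threshold, tune for your data
) -> str:
    """Turn a detected language code and confidence into a readable label."""
    if language_code not in SUPPORTED_LANGUAGES:
        return "unknown language"
    label = f"{SUPPORTED_LANGUAGES[language_code]} ({language_code})"
    if confidence is not None and confidence < min_confidence:
        label += ", low confidence"
    return label


if __name__ == "__main__":
    print(describe_detection("ta-IN", 0.93))
    print(describe_detection("hi-IN", 0.31))
    print(describe_detection(None))
```

A low-confidence label could be used to route a recording for manual review instead of trusting the detected language.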
Key Capabilities
Saaras covers three usage scenarios: basic usage, code-mixed speech, and domain prompting.
Basic Usage
Basic speech-to-text translation with automatic language detection. Perfect for converting Indian language speech directly to English text.
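A minimal sketch of what a basic request might look like is shown below. It only assembles the request pieces and does not send anything. The URL is a placeholder, and the header name `api-subscription-key`, the form field names `model` and `prompt`, and the model identifier `saaras:v2.5` are assumptions for illustration; confirm all of them against the API reference before use. No source-language field is set, since the model detects the language itself.

```python
from typing import Dict, Optional

# Placeholder endpoint; replace with the URL from the API reference.
API_URL = "https://api.example.com/speech-to-text-translate"


def build_translate_request(
    api_key: str,
    audio_path: str,
    prompt: Optional[str] = None,
) -> Dict[str, object]:
    """Assemble the parts of a speech-to-English translation request.

    All header and field names here are assumed, not confirmed.
    """
    if not api_key:
        raise ValueError("api_key is required")
    if not audio_path:
        raise ValueError("audio_path is required")

    fields = {"model": "saaras:v2.5"}  # assumed model identifier
    if prompt:
        # Optional domain prompt, e.g. to help retain product names or hotwords.
        fields["prompt"] = prompt

    return {
        "url": API_URL,
        "headers": {"api-subscription-key": api_key},  # assumed header name
        "data": fields,
        "file_path": audio_path,  # audio to upload as multipart form data
    }


if __name__ == "__main__":
    request = build_translate_request("YOUR_API_KEY", "call_recording.wav")
    print(request["url"])
    print(request["data"])
```

The returned dictionary can be passed to any HTTP client that supports multipart uploads. Keeping request construction separate from sending makes the parameters easy to inspect and test without network access.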