Saaras
Saaras is a high-accuracy, real-time speech recognition service optimized for a wide variety of audio inputs. It automatically detects the input language, transcribes the speech, and translates the transcript to English. Saaras is built to make Indic languages comprehensible to LLMs, offering accurate English translations of speech in 10 major Indian languages, plus English itself.
Saaras-v2.5 is our flagship domain-aware speech recognition model, designed for production environments requiring high accuracy and robust performance. It specializes in speech-to-text translation, converting spoken content directly into English text while preserving context and meaning.
Key Features
Advanced prompting system for domain-specific translation and hotword retention, ensuring accurate context preservation.
Optimized for 8 kHz telephony audio with enhanced multi-speaker recognition capabilities.
Preserves proper nouns and entities accurately across languages, maintaining context and meaning.
Built-in Language Identification (LID) with confidence scores for automatic language detection.
Provides diarized outputs with precise timestamps for multi-speaker conversations through the batch API.
Converts speech directly to English text, eliminating the need for separate transcription and translation steps.
Language Support
Saaras supports 11 languages: English, Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Gujarati, Punjabi, and Odia.
Languages (Code): Hindi (hi-IN), Bengali (bn-IN), Tamil (ta-IN), Telugu (te-IN), Gujarati (gu-IN), Kannada (kn-IN), Malayalam (ml-IN), Marathi (mr-IN), Punjabi (pa-IN), Odia (od-IN), English (en-IN)
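For client code that needs to label or validate a language code, the list above can be captured as a small lookup table. This is a minimal sketch: the names `SAARAS_LANGUAGES` and `language_name` are hypothetical helpers written for illustration, not part of any Saaras SDK, and the codes are copied exactly as listed above.

```python
# Language codes copied from the list above; the table and helper are
# illustrative (hypothetical names), not part of an official SDK.
SAARAS_LANGUAGES = {
    "hi-IN": "Hindi",
    "bn-IN": "Bengali",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "gu-IN": "Gujarati",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "mr-IN": "Marathi",
    "pa-IN": "Punjabi",
    "od-IN": "Odia",
    "en-IN": "English",
}


def language_name(code: str) -> str:
    """Return the language name for a listed code, or raise ValueError."""
    try:
        return SAARAS_LANGUAGES[code]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None
```

A lookup like this is mainly useful for displaying a detected language to users or for rejecting unexpected codes early.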
Additional Capabilities:
- Includes dialects and accents of the above languages
- Code-mixed audio support
- Intelligent Proper Noun and Entity Preservation to ensure proper nouns, regional names, and entities are recognized and retained accurately during transcription
All of the above are supported for speech-to-English translation.
Translation Quality
COMET is a metric for evaluating machine translation that assesses semantic accuracy, fluency, and contextual relevance. Saaras scores 89.3% across 11 languages on the VISTAAR + IndicVoices benchmark, a dataset curated from diverse Indian language audio sources, including code-mixed content, noisy environments, and regional accents.
COMET Score Performance:
- Across 11 Languages: 89.3%
- English: 94.62%
- Hindi: 91.83%
- 9 other languages: 88.41%
*Higher is better; compared on the VISTAAR + IndicVoices benchmark.
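As a quick consistency check, the overall figure can be reproduced from the per-language figures. The sketch below assumes the overall score is an unweighted mean over the 11 languages and that the "9 other languages" figure is itself a mean over those nine; the source does not state how the aggregate was computed, so treat this as a plausibility check only.

```python
# Scores copied from the list above, in percent.
english = 94.62
hindi = 91.83
other_mean = 88.41  # reported figure for the 9 other languages
other_count = 9

# Assumption: the overall score is an unweighted mean over all 11 languages.
overall = (english + hindi + other_count * other_mean) / (2 + other_count)

print(f"{overall:.2f}")  # 89.29, which rounds to the reported 89.3
```

Under that assumption the three breakdown figures agree with the reported 89.3% across 11 languages.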
Why COMET? It evaluates not only lexical accuracy but also how well the translation captures meaning and context, critical for Indic languages with complex structures.
Dataset Description: Contains real-world, multi-accented speech samples that cover 10 major Indic languages, ensuring representation of India’s linguistic diversity. Includes code-mixed phrases, domain-specific vocabulary, and colloquial expressions.
Saaras automatically detects the source language and translates the speech to English, so you do not need to specify the source language in the request.
Key Capabilities
Basic Usage
Code-Mixed Speech
Domain Prompting
Basic speech-to-text translation with automatic language detection. Perfect for converting Indian language speech directly to English text.
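A basic request might look roughly like the following sketch. Everything specific here is an assumption to check against the official API reference: the endpoint URL, the `api-subscription-key` header, the `saaras:v2.5` model string, the `file`, `model`, and `prompt` form fields, and the `transcript` and `language_code` response fields. The function names are hypothetical. Note that no source language is sent, in line with the automatic language detection described above.

```python
from typing import Optional

# Assumed values; confirm against the official API reference before use.
API_URL = "https://api.sarvam.ai/speech-to-text-translate"
MODEL = "saaras:v2.5"


def build_form_fields(prompt: Optional[str] = None) -> dict:
    """Form fields for one request. No source language is included."""
    fields = {"model": MODEL}
    if prompt:
        # Optional domain prompt (assumed field name).
        fields["prompt"] = prompt
    return fields


def extract_result(payload: dict) -> tuple:
    """Return (English transcript, detected language code) from decoded JSON.

    The keys used here are assumed response field names.
    """
    return payload.get("transcript", ""), payload.get("language_code")


def translate_audio(path: str, api_key: str, prompt: Optional[str] = None) -> tuple:
    """Upload one audio file and return (transcript, language_code)."""
    import requests  # third-party; imported lazily so the helpers above stay stdlib-only

    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"api-subscription-key": api_key},  # assumed header name
            data=build_form_fields(prompt),
            files={"file": audio},
            timeout=60,
        )
    response.raise_for_status()
    return extract_result(response.json())
```

Keeping request construction and response parsing in separate small functions makes both easy to check without network access; only `translate_audio` touches the network.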