Saaras
Saaras is a high-accuracy, real-time speech recognition service optimized for a wide variety of audio inputs. It automatically detects the input language, transcribes the speech, and translates the transcript into English. Saaras is built to make Indic languages comprehensible to LLMs, producing accurate English-translated transcripts from speech in 10 major Indian languages.
Key Features
Advanced prompting system for domain-specific translation and hotword retention, ensuring accurate context preservation.
Optimized for 8 kHz telephony audio, with enhanced multi-speaker recognition.
Preserves proper nouns and entities accurately across languages, maintaining context and meaning.
Built-in Language Identification (LID) for automatic language detection.
Provides diarized outputs with precise timestamps for multi-speaker conversations through the batch API.
Converts speech directly to English text, eliminating the need for separate transcription and translation steps.
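Saaras is described as optimized for 8 kHz telephony audio. Before uploading a recording, a client may want to confirm its sample rate and channel count. The following is a minimal sketch using only Python's standard `wave` module. The helper names (`wav_info`, `is_telephony_rate`, `make_silent_wav`) are hypothetical. The check does not imply that Saaras rejects other sample rates; it only reports what the file contains.

```python
import io
import wave
from typing import Tuple


def wav_info(data: bytes) -> Tuple[int, int]:
    """Return (sample_rate_hz, channel_count) for an in-memory WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getframerate(), wf.getnchannels()


def is_telephony_rate(data: bytes, expected_hz: int = 8000) -> bool:
    """True if the WAV is sampled at the telephony rate (8 kHz by default)."""
    rate, _channels = wav_info(data)
    return rate == expected_hz


def make_silent_wav(rate: int, seconds: float = 0.1) -> bytes:
    """Hypothetical test helper: build a short mono, 16-bit, silent WAV."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


print(wav_info(make_silent_wav(8000)))  # (8000, 1)
```

In practice you would read the bytes of a real recording, for example from a call-center export, instead of generating silence.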
Language Support
Saaras supports 11 languages (10 major Indian languages plus English), with comprehensive dialect and accent coverage, support for code-mixed audio, and intelligent proper noun preservation for speech-to-English translation.
Saaras automatically detects the spoken language, so no language codes are needed. It identifies the input language on its own and returns a clean English transcript, which simplifies your pipeline and reduces friction for end users.
Additional Capabilities:
- Includes dialects and accents of the supported languages
- Code-mixed audio support
- Intelligent Proper Noun and Entity Preservation to ensure proper nouns, regional names, and entities are recognized and retained accurately during transcription
Translation Quality
COMET is a robust metric for evaluating machine translation that assesses semantic accuracy, fluency, and contextual relevance. Saaras performs strongly on the Vistaar + IndicVoices benchmark, a dataset curated from diverse Indian-language audio sources that includes code-mixed content, noisy environments, and regional accents.
COMET Score Performance:
- Across 11 Languages: 89.3%
- English: 94.62%
- Hindi: 91.83%
- 9 Other languages: 88.41%
*Higher is better; scores measured on the Vistaar + IndicVoices benchmark.
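The headline 11-language figure is consistent with an unweighted average over languages, if the "9 Other languages" figure is read as the mean of those nine. This reading is an assumption, because the page does not state how the aggregate is computed. The arithmetic can be checked quickly:

```python
def overall_comet(english: float, hindi: float, other_nine_mean: float) -> float:
    """Unweighted mean over 11 languages (ASSUMPTION: equal weight per language)."""
    return (english + hindi + 9 * other_nine_mean) / 11


# Per-language COMET scores reported above (%).
overall = overall_comet(english=94.62, hindi=91.83, other_nine_mean=88.41)
print(round(overall, 1))  # 89.3
```

Here (94.62 + 91.83 + 9 × 88.41) / 11 = 982.14 / 11 ≈ 89.29, which rounds to the reported 89.3%.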
Why COMET? It evaluates not only lexical accuracy but also how well the translation captures meaning and context, which is critical for Indic languages with complex structures.
Dataset Description: Contains real-world, multi-accented speech samples covering 10 major Indic languages, ensuring representation of India’s linguistic diversity. Includes code-mixed phrases, domain-specific vocabulary, and colloquial expressions.
Saaras automatically detects the source language and translates it to English. You do not need to specify the source language; the model handles language identification automatically.
Key Capabilities
Basic Usage
Code-Mixed Speech
Domain Prompting
Basic speech-to-text translation with automatic language detection, for converting Indian-language speech directly to English text.
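A basic call uploads an audio file and receives English text back. Because Saaras identifies the language itself, no source-language field is sent. The following is a minimal sketch that builds, but does not send, such a request with Python's standard library. The endpoint URL, the `api-subscription-key` header, and the `file`/`model`/`prompt` form fields are assumptions that this page does not confirm. The function name `build_translate_request` is hypothetical. Check the official API reference before relying on any of them.

```python
import urllib.request
import uuid
from typing import Optional

# ASSUMPTIONS, not confirmed by this page: the endpoint URL, the
# "api-subscription-key" header, and the "file"/"model"/"prompt" form fields.
API_URL = "https://api.sarvam.ai/speech-to-text-translate"  # assumed endpoint


def build_translate_request(
    audio: bytes,
    filename: str,
    api_key: str,
    model: Optional[str] = None,
    prompt: Optional[str] = None,
) -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST for speech-to-English.

    No source-language field is included: Saaras identifies the language itself.
    """
    boundary = uuid.uuid4().hex
    parts = []

    def add_text_field(name: str, value: str) -> None:
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )

    if model is not None:
        add_text_field("model", model)  # hypothetical model selector
    if prompt is not None:
        add_text_field("prompt", prompt)  # hypothetical domain/hotword prompt

    # File part: part headers, then the raw audio bytes (assumes WAV input).
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: audio/wav\r\n\r\n"
        ).encode("utf-8")
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "api-subscription-key": api_key,  # assumed auth header name
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually send the request, pass it to `urllib.request.urlopen(req)`. This page does not describe the response schema, so parse the body according to the official reference. Building the request separately from sending it keeps the sketch testable offline.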