Saarika
Saarika v2.5 is our flagship speech recognition model, designed specifically for Indian languages and accents. It always transcribes audio in the language in which it was spoken. It handles complex multi-speaker conversations, telephony audio, and code-mixed speech with high accuracy across 11 languages.
Deprecation Notice: Saarika v2.5 will be deprecated soon. For transcription features, we recommend using Saaras v3 with mode="transcribe", which offers improved accuracy and additional output modes.
Key Features
Optimized for 8 kHz telephony audio with enhanced noise handling and superior multi-speaker recognition capabilities.
Preserves proper nouns and entities accurately across languages, maintaining context and meaning in transcriptions.
Optional automatic language identification with LID output. Set language_code to “unknown” when the spoken language is not known in advance, and the model will detect it automatically.
Provides diarized outputs with precise timestamps for multi-speaker conversations through batch API processing.
Intelligently handles mid-sentence language switches in code-mixed speech, perfect for India’s multilingual conversations.
Comprehensive support for Indian languages with high accuracy in mixed-language environments.
Language Support
Saarika supports 11 languages with comprehensive dialect and accent coverage, including code-mixed audio support and intelligent proper noun preservation.
For automatic language detection, use language_code="unknown". The model will automatically identify the spoken language and return it in the response.
Performance Benchmarks
Saarika delivers exceptional accuracy across all supported languages, as measured on the VISTAAR Benchmark.
CER (Character Error Rate) Scores
Lower is better - Compared on VISTAAR Benchmark
- Across 11 Languages: 4.96%
- English: 4.45%
- Hindi: 4.42%
- 9 Other languages: 5.07%
WER (Word Error Rate) Scores
Lower is better - Compared on VISTAAR Benchmark
- Across 11 Languages: 18.32%
- English: 8.26%
- Hindi: 11.81%
- 9 Other languages: 20.15%
Detailed CER Performance by Language
CER (Character Error Rate) measures the percentage of characters that are wrong in a transcription. Lower scores are better, with 0% being perfect.
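As an illustration of how these metrics are usually computed, the sketch below assumes the standard edit-distance definition: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. The function names are hypothetical, and the text normalization the benchmark applies before scoring is not described on this page, so scores from this sketch are not expected to reproduce the published numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference unit
                curr[j - 1] + 1,     # insertion of a hypothesis unit
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance as a percentage of the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def cer(reference, hypothesis):
    """Character Error Rate: units are individual characters."""
    return error_rate(list(reference), list(hypothesis))


def wer(reference, hypothesis):
    """Word Error Rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER on the same transcripts, which matches the gap between the two sets of scores above.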
Key Capabilities
Legacy Model: Saarika v2.5 is a legacy model and code examples have been removed to avoid new integrations against it. We recommend using Saaras v3 (model="saaras:v3") with the mode parameter for the best accuracy and features. See the Saaras documentation for up-to-date code examples covering basic transcription, code-mixed speech, and automatic language detection.
Saarika supports the same core transcription capabilities now available in Saaras v3:
- Basic transcription with a specified language_code for single-language content.
- Code-mixed speech with automatic detection of language switches within a sentence.
- Automatic language detection via language_code="unknown".
For implementation, use Saaras v3 with mode="transcribe" — see the Saaras documentation.
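The parameter values named on this page can be collected as in the minimal sketch below. The helper build_transcription_params is hypothetical and is not part of any SDK. It only assembles the three fields mentioned here (model, mode, language_code) and does not send a request. That Saaras v3 accepts language_code in this form, and the "hi-IN" code format used in the usage note, are assumptions to check against the Saaras documentation.

```python
from typing import Optional


def build_transcription_params(language_code: Optional[str] = None) -> dict:
    """Hypothetical helper: gather the request fields named on this page."""
    return {
        "model": "saaras:v3",
        "mode": "transcribe",
        # Fall back to "unknown" so the service detects the language itself.
        "language_code": language_code if language_code else "unknown",
    }
```

Calling build_transcription_params() leaves language detection to the model, while build_transcription_params("hi-IN") pins the language for single-language content.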