Saarika

Saarika v2.5 is our flagship speech recognition model, designed specifically for Indian languages and accents. It always transcribes audio in the language in which it was spoken. It handles complex multi-speaker conversations, telephony audio, and code-mixed speech with high accuracy across 11 languages.

Deprecation Notice: Saarika v2.5 will be deprecated soon. For transcription features, we recommend using Saaras v3 with mode="transcribe", which offers improved accuracy and additional output modes.

Key Features

Superior Telephony Performance

Optimized for 8 kHz telephony audio, with enhanced noise handling and multi-speaker recognition.
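Since the model is tuned for 8 kHz telephony audio, it can be useful to confirm a recording's sample rate before sending it. The following is a minimal sketch using Python's standard `wave` module. The helper names are illustrative and not part of the Sarvam SDK, and the check only covers WAV files.

```python
import wave


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_telephony_rate(path: str, expected_hz: int = 8000) -> bool:
    """Return True when the file is sampled at the telephony rate (8 kHz by default)."""
    return wav_sample_rate(path) == expected_hz
```

A file that fails this check is not necessarily unusable; the check only tells you whether the audio matches the telephony rate named above.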

Intelligent Entity Preservation

Preserves proper nouns and entities accurately across languages, maintaining context and meaning in transcriptions.

Automatic Language Detection

Optional automatic language identification (LID), with the detected language returned in the output. Set language_code to "unknown" when the spoken language is not known in advance.

Speaker Diarization

Provides diarized outputs with precise timestamps for multi-speaker conversations through batch API processing.

Automatic Code Mixing

Intelligently handles mid-sentence language switches in code-mixed speech, perfect for India’s multilingual conversations.

Multi-Language Support

Comprehensive support for Indian languages with high accuracy in mixed-language environments.

Language Support

Saarika supports 11 languages with comprehensive dialect and accent coverage, including code-mixed audio support and intelligent proper noun preservation.

Language     Language Code
English      en-IN
Hindi        hi-IN
Bengali      bn-IN
Tamil        ta-IN
Telugu       te-IN
Gujarati     gu-IN
Kannada      kn-IN
Malayalam    ml-IN
Marathi      mr-IN
Punjabi      pa-IN
Odia         od-IN

For automatic language detection, use language_code="unknown". The model will automatically identify the spoken language and return it in the response.
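As a rough sketch of how automatic detection might be wired up, the helper below validates a language code against the table above and defaults to "unknown". The function name and the validation step are illustrative, not part of the SDK. It also assumes the detected language is exposed as a `language_code` attribute on the response, which this page does not confirm.

```python
# Language codes from the table above; "unknown" requests automatic detection.
SUPPORTED_LANGUAGE_CODES = {
    "en-IN", "hi-IN", "bn-IN", "ta-IN", "te-IN", "gu-IN",
    "kn-IN", "ml-IN", "mr-IN", "pa-IN", "od-IN",
}
AUTO_DETECT = "unknown"


def transcribe_with_detection(client, audio_path, language_code=AUTO_DETECT):
    """Transcribe a file, letting the model detect the language by default.

    `client` is expected to be a sarvamai.SarvamAI instance.
    Returns (response, detected_language_or_None).
    """
    if language_code != AUTO_DETECT and language_code not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"Unsupported language code: {language_code}")

    # The context manager closes the audio file once the request completes.
    with open(audio_path, "rb") as audio_file:
        response = client.speech_to_text.transcribe(
            file=audio_file,
            model="saarika:v2.5",
            language_code=language_code,
        )

    # Assumption: the detected language is exposed as `language_code` on the response.
    detected = getattr(response, "language_code", None)
    return response, detected
```

With a `SarvamAI` client, a call such as `transcribe_with_detection(client, "audio.wav")` would send the request with language_code="unknown". Checking the code locally catches typos before a request is made.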

Performance Benchmarks

Saarika delivers exceptional accuracy across all supported languages, as measured on the VISTAAR Benchmark.

CER (Character Error Rate) Scores

Lower is better - Compared on VISTAAR Benchmark

  • Across 11 Languages: 4.96%
  • English: 4.45%
  • Hindi: 4.42%
  • 9 Other languages: 5.07%

WER (Word Error Rate) Scores

Lower is better - Compared on VISTAAR Benchmark

  • Across 11 Languages: 18.32%
  • English: 8.26%
  • Hindi: 11.81%
  • 9 Other languages: 20.15%

Detailed CER Performance by Language

CER (Character Error Rate) measures the percentage of characters that are wrong in a transcription. Lower scores are better, with 0% being perfect.
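To make the metric concrete, here is a small sketch of the conventional definition: the edit distance between the reference and the transcription, divided by the reference length. The same routine applied to word lists gives WER. This page does not describe the benchmark's text normalization (casing, punctuation, whitespace), so treat this as the textbook formula rather than the exact scoring used for the numbers above.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion from the reference
                current[j - 1] + 1,      # insertion into the reference
                previous[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a percentage of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a percentage, splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For example, `cer("abcd", "abxd")` is 25.0: one substitution over four reference characters. Because insertions count as errors, both rates can exceed 100% for very poor transcriptions.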

CER (Character Error Rate) by language:

  • Bengali: 4.80%
  • English: 4.45%
  • Gujarati: 5.92%
  • Hindi: 4.42%
  • Kannada: 4.27%
  • Malayalam: 5.05%
  • Marathi: 4.58%
  • Odia: 5.07%
  • Punjabi: 4.72%
  • Tamil: 5.79%
  • Telugu: 5.47%
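The per-language figures can be checked against the aggregate CER scores reported earlier. The sketch below takes an unweighted mean. The page does not say how the aggregates were computed, so the unweighted mean is an assumption that happens to reproduce the published numbers.

```python
# Per-language CER values (percent) as shown in the chart on this page.
cer_by_language = {
    "Bengali": 4.80, "English": 4.45, "Gujarati": 5.92, "Hindi": 4.42,
    "Kannada": 4.27, "Malayalam": 5.05, "Marathi": 4.58, "Odia": 5.07,
    "Punjabi": 4.72, "Tamil": 5.79, "Telugu": 5.47,
}


def mean(values):
    """Unweighted arithmetic mean of a non-empty collection of numbers."""
    values = list(values)
    return sum(values) / len(values)


overall_cer = mean(cer_by_language.values())
other_cer = mean(
    score for language, score in cer_by_language.items()
    if language not in ("English", "Hindi")
)

print(round(overall_cer, 2))  # 4.96, matching "Across 11 Languages"
print(round(other_cer, 2))    # 5.07, matching "9 Other languages"
```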

Key Capabilities

Legacy Model: Saarika v2.5 is a legacy model. We recommend using Saaras v3 (model="saaras:v3") with the mode parameter for the best accuracy and features. See the Saaras documentation for details.

Basic Usage

Basic transcription with a specified language code. Best suited to single-language content with clear audio.

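from sarvamai import SarvamAI

# Replace the placeholder with your own API subscription key.
client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY"
)

# Open the audio in binary mode; the context manager closes the file afterwards.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5",
        language_code="hi-IN"
    )

print(response)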
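The legacy notice above recommends Saaras v3 with mode="transcribe". A migration might look like the sketch below. It assumes that the same `speech_to_text.transcribe` method accepts a `mode` keyword alongside model="saaras:v3", which this page implies but does not show. The helper name is illustrative, so check the Saaras documentation before relying on it.

```python
def transcribe_with_saaras(client, audio_path, language_code=None):
    """Transcribe a file with Saaras v3 in transcription mode.

    `client` is expected to be a sarvamai.SarvamAI instance.
    Assumption: `mode` is passed as a keyword argument to the same
    transcribe method used for Saarika.
    """
    request = {"model": "saaras:v3", "mode": "transcribe"}
    # Only send a language code when the caller supplies one.
    if language_code is not None:
        request["language_code"] = language_code

    with open(audio_path, "rb") as audio_file:
        return client.speech_to_text.transcribe(file=audio_file, **request)
```

Keeping the model and mode in one helper means the rest of an application can switch from Saarika to Saaras by changing a single call site.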

Next Steps

Developer quickstart

Learn how to integrate the Saarika API within your application.

API Reference

Complete API documentation for speech to text endpoints.

Cookbook

Step-by-step tutorial for speech-to-text transcription.