Saarika-v2

Overview

Saarika-v2 is our flagship speech recognition model, built for Indian languages and accents. It handles multi-speaker conversations, telephony audio, and code-mixed speech with high accuracy.

Key Features

Superior Telephony Performance

Optimized for 8 kHz telephony audio, with enhanced noise handling and multi-speaker recognition.
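Before sending a call recording, it can help to confirm that it really is 8 kHz telephony audio. The sketch below uses only Python's standard `wave` module to read the sample rate, channel count, and duration of a PCM WAV file. The helper names `wav_info` and `write_silence` are illustrative and not part of the Sarvam SDK, and the check is a local convenience rather than a documented API requirement.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def write_silence(path, rate=8000, seconds=1.0):
    """Write a mono 16-bit WAV of silence, useful for trying the check locally."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))


if __name__ == "__main__":
    write_silence("sample_8k.wav", rate=8000, seconds=1.0)
    rate, channels, duration = wav_info("sample_8k.wav")
    print(f"{rate} Hz, {channels} channel(s), {duration:.2f} s")
```

The check only inspects the file header; it does not resample or otherwise modify the audio.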

Intelligent Entity Preservation

Preserves proper nouns and entities accurately across languages, maintaining context and meaning in transcriptions.

Automatic Language Detection

Optional automatic language identification (LID), with the detected language included in the output. Pass “unknown” as the language code when the language is not known, and the model detects it automatically.
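A minimal sketch of how a caller might fall back to automatic detection when no language is supplied. It reuses the `client.speech_to_text.transcribe` call and the `saarika:v2` model name from the basic transcription example in this page. The helper names `transcribe_kwargs` and `transcribe`, and the `SARVAM_API_KEY` environment variable, are assumptions made for illustration and are not part of the SDK.

```python
import os


def transcribe_kwargs(audio_file, language_code=None):
    """Build keyword arguments for the transcribe call.

    Falls back to "unknown" so the model detects the language itself
    when the caller does not know it.
    """
    return {
        "file": audio_file,
        "model": "saarika:v2",
        "language_code": language_code or "unknown",
    }


def transcribe(path, language_code=None):
    """Transcribe a file, detecting the language if none is given."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from sarvamai import SarvamAI

    # SARVAM_API_KEY is a hypothetical variable name chosen for this sketch.
    client = SarvamAI(api_subscription_key=os.environ["SARVAM_API_KEY"])
    with open(path, "rb") as audio_file:
        return client.speech_to_text.transcribe(
            **transcribe_kwargs(audio_file, language_code)
        )
```

Calling `transcribe("audio.wav")` sends `language_code="unknown"`, while `transcribe("audio.wav", "hi-IN")` pins the language to Hindi.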

Speaker Diarization

Provides diarized output with precise timestamps for multi-speaker conversations. Diarization is available through the batch API.

Automatic Code Mixing

Handles mid-sentence language switches in code-mixed speech, which are common in India’s multilingual conversations.

Multi-Language Support

Comprehensive support for Indian languages with high accuracy in mixed-language environments.

Key Capabilities

Basic transcription with a specified language code. This mode suits single-language content with clear audio.

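from sarvamai import SarvamAI

# Create a client with your API subscription key
client = SarvamAI(
    api_subscription_key="YOUR_API_SUBSCRIPTION_KEY"
)

# Transcribe a Hindi recording; the context manager closes the file afterwards
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2",
        language_code="hi-IN"
    )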