Saaras | Sarvam API Docs

Saaras-v2.5 is our flagship domain-aware speech recognition model, designed for production environments requiring high accuracy and robust performance. It specializes in speech-to-text translation, converting spoken content directly into English text while preserving context and meaning.

Key Features

Domain-Aware Translation

Advanced prompting system for domain-specific translation and hotword retention, ensuring accurate context preservation.

Superior Telephony Performance

Optimized for 8 kHz telephony audio with enhanced multi-speaker recognition capabilities.

Intelligent Entity Preservation

Preserves proper nouns and entities accurately across languages, maintaining context and meaning.

Automatic Language Detection

Built-in Language Identification (LID) with confidence scores for automatic language detection.

Speaker Diarization

Provides diarized outputs with precise timestamps for multi-speaker conversations through the batch API.

Direct Translation

Converts speech directly to English text, eliminating the need for separate transcription and translation steps.
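
The telephony feature above refers to 8 kHz audio. As a minimal sketch, the sample rate of a WAV file can be checked locally with Python's standard `wave` module before the file is submitted. The helper names `wav_sample_rate` and `is_telephony_wav` are hypothetical and not part of the Sarvam SDK, and this page does not state whether any resampling is required.

```python
import wave

TELEPHONY_RATE_HZ = 8000  # 8 kHz, the telephony rate mentioned above


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_telephony_wav(path: str) -> bool:
    """Return True if the WAV file is sampled at 8 kHz."""
    return wav_sample_rate(path) == TELEPHONY_RATE_HZ
```

A check like this only inspects the file header; it says nothing about audio quality or the number of speakers.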

Language Support

Saaras can translate speech from the following Indian languages to English:

Languages (Code):

Hindi (hi-IN), Bengali (bn-IN), Tamil (ta-IN), Telugu (te-IN), Gujarati (gu-IN), Kannada (kn-IN), Malayalam (ml-IN), Marathi (mr-IN), Punjabi (pa-IN), Odia (od-IN), English (en-IN)

All of the above are supported for speech-to-English translation.

Saaras automatically detects the source language and translates the speech to English, so you do not need to specify the source language yourself.
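
Because the source language is detected automatically, client code may want to turn a detected code back into a readable name. The sketch below builds a lookup from the codes listed above. It assumes the response object exposes the detected code as a `language_code` attribute; that attribute name is an assumption here and should be checked against the API Reference. The helpers `language_name` and `detected_language` are hypothetical.

```python
# Language codes copied from the list above.
SUPPORTED_LANGUAGES = {
    "hi-IN": "Hindi",
    "bn-IN": "Bengali",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "gu-IN": "Gujarati",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "mr-IN": "Marathi",
    "pa-IN": "Punjabi",
    "od-IN": "Odia",
    "en-IN": "English",
}


def language_name(code):
    """Map a language code to its name, or flag it as unknown."""
    return SUPPORTED_LANGUAGES.get(code, f"Unknown ({code})")


def detected_language(response):
    """Read the detected language from a response object.

    Assumes a `language_code` attribute (not confirmed by this page);
    returns None if the attribute is missing or empty.
    """
    code = getattr(response, "language_code", None)
    return language_name(code) if code else None
```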

Key Capabilities

Basic Usage

Basic speech-to-text translation with automatic language detection, suited to converting Indian-language speech directly into English text.

Python

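from sarvamai import SarvamAI

# Authenticate with your Sarvam API subscription key
client = SarvamAI(
    api_subscription_key="YOUR_API_SUBSCRIPTION_KEY"
)

# Open the audio in binary mode; the context manager closes the file afterwards
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v2.5"
    )

# Print the API response for the translated speech
print(response)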
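
The domain-aware prompting described under Key Features could be sketched as a thin wrapper around the same call. This assumes `translate` accepts a `prompt` keyword argument carrying the domain context; that parameter name is an assumption not confirmed by this page, so verify it in the API Reference. The function `translate_with_prompt` is a hypothetical helper, and `client` is expected to be a `SarvamAI` instance created as in the basic example.

```python
def translate_with_prompt(client, audio_path, prompt):
    """Translate speech to English, passing domain context to the model.

    `client` is a SarvamAI client; `prompt` is free-text domain context.
    """
    with open(audio_path, "rb") as audio_file:
        return client.speech_to_text.translate(
            file=audio_file,
            model="saaras:v2.5",
            prompt=prompt,  # assumed parameter name, see the lead-in
        )


# Example call (requires the sarvamai package and a valid key):
# response = translate_with_prompt(
#     client, "audio.wav", "Customer support call about a bank loan"
# )
# print(response)
```

Keeping the client as an argument makes the wrapper easy to exercise with a stub object, without network access.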

Next Steps

Developer quickstart

Learn how to integrate speech-to-text translation into your application.

API Reference

Complete API documentation for the speech-to-text translation endpoints.