Speech To Text Quickstart Guide
Sarvam AI offers two speech models: Saarika, for speech-to-text transcription, and Saaras, for speech-to-text translation.
View our pricing page for detailed information about model-specific pricing and usage tiers.
Saarika: Our Speech to Text Transcription Model
Saarika is a speech-to-text transcription model that excels at multi-speaker content, mixed-language content, and conference recordings. It handles code-mixed speech automatically and offers enhanced multilingual support, making it suitable for a wide range of applications.
Speech to Text Features
- Basic transcription
- Transcription with diarization
Basic Speech to Text Transcription
Convert speech to text with high accuracy. Supports multiple Indian languages and accents. Features include:
- Multi-language support
- Automatic language detection
- High-quality noise filtering
- Support for various audio formats
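As a rough illustration of basic transcription, the sketch below builds and sends a multipart upload using only the Python standard library. The endpoint URL, the `api-subscription-key` header, the `model` and `language_code` field names, and the `saarika:v2` and `hi-IN` values are assumptions that this guide does not confirm, so check them against the API Reference before relying on them.

```python
import json
import urllib.request
import uuid

# Assumed values, not confirmed by this guide: verify in the API Reference.
STT_URL = "https://api.sarvam.ai/speech-to-text"
API_KEY_HEADER = "api-subscription-key"


def encode_multipart(fields, file_name, file_bytes, boundary):
    """Encode text fields plus one audio file as multipart/form-data."""
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: audio/wav\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts)


def build_transcription_request(audio_bytes, api_key, language_code="hi-IN",
                                model="saarika:v2", file_name="audio.wav"):
    """Build (but do not send) the POST request for one audio clip."""
    boundary = uuid.uuid4().hex
    body = encode_multipart(
        {"model": model, "language_code": language_code},  # assumed field names
        file_name, audio_bytes, boundary,
    )
    return urllib.request.Request(
        STT_URL,
        data=body,
        method="POST",
        headers={
            API_KEY_HEADER: api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path, api_key, **options):
    """Send the audio file at `path` and return the parsed JSON response."""
    with open(path, "rb") as audio_file:
        request = build_transcription_request(audio_file.read(), api_key, **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `transcribe("meeting.wav", "YOUR_API_KEY")` would upload the file and return whatever JSON the service sends back. Keeping request construction separate from sending makes it possible to inspect the payload without making a network call.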
Saaras: Our State-of-the-Art Speech to Text Translation Model
Saaras is a domain-aware translation model with enhanced telephony support and intelligent entity preservation. It is designed to handle complex language variations and domain-specific content, making it ideal for call center and telephony applications.
Translation Features
- Basic translation
- Translation with diarization
Speech to Text Translation
Directly translate speech from one language to another. Ideal for content localization and international communication. Features include:
- Support for major Indian languages
- High-quality translations
- Preservation of context and tone
- Real-time translation capability
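A translation request can be sketched in much the same way as a transcription upload. The example below is a minimal standard-library version under stated assumptions: the endpoint URL, the `api-subscription-key` header, the `model` field, and the `saaras:v2` value are not confirmed by this guide, and the sketch assumes the service detects the source language itself, so no language field is sent. Verify all of these in the API Reference.

```python
import json
import urllib.request
import uuid

# Assumed values, not confirmed by this guide: verify in the API Reference.
TRANSLATE_URL = "https://api.sarvam.ai/speech-to-text-translate"
API_KEY_HEADER = "api-subscription-key"


def build_translation_request(audio_bytes, api_key, model="saaras:v2",
                              file_name="call.wav"):
    """Build (but do not send) a multipart POST with the model name and audio."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text part: which model to use (assumed field name).
        f"--{boundary}\r\n".encode("utf-8"),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode("utf-8") + b"\r\n",
        # File part: the raw audio bytes.
        f"--{boundary}\r\n".encode("utf-8"),
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        .encode("utf-8"),
        b"Content-Type: audio/wav\r\n\r\n",
        audio_bytes + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode("utf-8"),
    ])
    return urllib.request.Request(
        TRANSLATE_URL,
        data=body,
        method="POST",
        headers={
            API_KEY_HEADER: api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def translate_speech(path, api_key):
    """Send the audio file at `path` and return the parsed JSON response."""
    with open(path, "rb") as audio_file:
        request = build_translation_request(audio_file.read(), api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The shape of the returned JSON is not described in this guide, so the sketch returns it unparsed beyond JSON decoding rather than guessing at field names.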
For detailed API documentation and advanced usage, visit our API Reference.