Speech-to-Text REST API
Synchronous Processing
Process short audio files and receive the result in the same response. Best for quick transcriptions and testing.
Features
- Instant results
- Simple integration
- Maximum duration: 30 seconds
- Multiple audio formats
- High-accuracy transcription
- Support for multiple Indian languages and English
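Because synchronous requests are limited to 30 seconds of audio, it can help to check a file's duration before uploading it. The following is a minimal sketch for WAV files only, using Python's standard `wave` module. The helper names `wav_duration_seconds` and `fits_sync_limit` are hypothetical and not part of the API. Other audio formats would need a different way to measure duration.

```python
import wave

# Limit taken from the feature list above (maximum duration: 30 seconds).
MAX_SYNC_SECONDS = 30.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(source, limit: float = MAX_SYNC_SECONDS) -> bool:
    """Return True if the WAV audio is short enough for a synchronous request."""
    return wav_duration_seconds(source) <= limit
```

A caller could run `fits_sync_limit("clip.wav")` before sending the request and fall back to another processing mode, or split the audio, when it returns `False`.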
Code Examples
Saarika: Our Speech to Text Transcription Model
Saarika is a speech-to-text transcription model that excels in handling multi-speaker content, mixed language content, and conference recordings. It offers automatic code-mixing and enhanced multilingual support, making it ideal for a wide range of applications.
Speech to Text Features
Basic Speech to Text Transcription
Convert speech to text with high accuracy. Supports multiple Indian languages and accents. Features include:
- Multi-language support
- Automatic language detection
- High-quality noise filtering
- Support for various audio formats
The input_audio_codec parameter is optional. The API detects the codec automatically for most formats, so you usually do not need to pass it. For PCM files (pcm_s16le, pcm_l16, pcm_raw), however, you must pass this parameter. Note that PCM files are supported only at a 16 kHz sample rate.
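The rule for input_audio_codec can be sketched as a small helper that decides which form field, if any, to add to a request. This is an illustration only: the function name `codec_form_fields` and the `sample_rate_hz` argument are hypothetical, and the helper assumes the caller already knows the sample rate of its PCM data. Only the codec names and the 16 kHz requirement come from the note above.

```python
from typing import Dict, Optional

# Codec names and sample rate as stated in the note above.
PCM_CODECS = {"pcm_s16le", "pcm_l16", "pcm_raw"}
PCM_SAMPLE_RATE_HZ = 16_000


def codec_form_fields(
    input_audio_codec: Optional[str] = None,
    sample_rate_hz: Optional[int] = None,
) -> Dict[str, str]:
    """Return the codec-related form fields for a request.

    With no codec given, nothing is added and the API's automatic
    detection is relied on. For PCM codecs, the sample rate must be 16 kHz.
    """
    if input_audio_codec is None:
        return {}  # non-PCM audio: let the API detect the codec
    if input_audio_codec in PCM_CODECS and sample_rate_hz != PCM_SAMPLE_RATE_HZ:
        raise ValueError(
            f"PCM audio must be sampled at {PCM_SAMPLE_RATE_HZ} Hz, "
            f"got {sample_rate_hz}"
        )
    return {"input_audio_codec": input_audio_codec}
```

For example, `codec_form_fields("pcm_s16le", 16000)` yields a single `input_audio_codec` field, while `codec_form_fields()` yields an empty dictionary, so nothing extra is sent for formats the API detects on its own.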
Check out our detailed API Reference to explore Speech To Text Transcription and all available options.
Saaras Model: Our SOTA Speech to Text Translation Model
Saaras is a domain-aware translation model with enhanced telephony support and intelligent entity preservation. It is designed to handle complex language variations and domain-specific content, making it ideal for call center and telephony applications.
Translation Features
Speech to Text Translation
Translate speech from any supported Indian language directly into English. Ideal for content localization and international communication. Features include:
- Support for major Indian languages
- High-quality translations
- Preservation of context and tone
- Real-time translation capability
The input_audio_codec parameter is optional. The API detects the codec automatically for most formats, so you usually do not need to pass it. For PCM files (pcm_s16le, pcm_l16, pcm_raw), however, you must pass this parameter. Note that PCM files are supported only at a 16 kHz sample rate.
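As a rough illustration of how a translation request with raw PCM audio might be assembled, the sketch below builds a multipart upload using only the Python standard library and does not send it. Several details are assumptions to verify against the API Reference: the endpoint URL is a placeholder, and the `api-subscription-key` header name and the `file` form field name are not confirmed by this page. The function name `build_multipart_request` is hypothetical.

```python
import urllib.request
import uuid
from typing import Dict, Optional


def build_multipart_request(
    url: str,
    api_key: str,
    audio_bytes: bytes,
    filename: str,
    fields: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields, e.g. input_audio_codec for PCM audio.
    for name, value in (fields or {}).items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The audio payload. The field name "file" is an assumption.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
    )
    parts.append(audio_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={
            # Header name is an assumption; check the API Reference.
            "api-subscription-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

A request built this way, for instance with `fields={"input_audio_codec": "pcm_s16le"}` for 16 kHz PCM audio, could then be sent with `urllib.request.urlopen(request)`. The response format is described in the API Reference.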
Next Steps
Need help? Contact us on Discord for guidance.