FAQs
Frequently Asked Questions
Find answers to common questions about our speech-to-text services
General Questions
What audio formats are supported?
The REST and Batch APIs support the following audio formats:
- WAV
- MP3
- M4A
- AAC
- OGG
- FLAC
- WebM
- PCM (pcm_s16le, pcm_l16, pcm_raw)
The WebSocket (Streaming) APIs support only:
- WAV
- Raw PCM (pcm_s16le, pcm_l16, pcm_raw)
For optimal results, we recommend:
- Sample rate: 16 kHz or higher
- Bit depth: 16-bit
- Channels: Mono or Stereo
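The recommended settings above can be checked locally before uploading. The following is a minimal sketch that uses only Python's standard `wave` module; the function name `check_wav` is our own illustration and is not part of any SDK, and it only handles WAV files.

```python
import wave


def check_wav(path):
    """Return a list of warnings for a WAV file, based on the recommended settings."""
    warnings = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample; 2 bytes means 16-bit
        channels = wav.getnchannels()
    if rate < 16000:
        warnings.append(f"sample rate {rate} Hz is below 16 kHz")
    if width != 2:
        warnings.append(f"bit depth is {width * 8}-bit, not 16-bit")
    if channels not in (1, 2):
        warnings.append(f"{channels} channels; expected mono or stereo")
    return warnings
```

An empty list means the file matches all three recommendations.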
What languages are supported?
Our models support multiple Indian and global languages:
Indian Languages
- Hindi
- English (Indian)
- Bengali
- Tamil
- Telugu
- Kannada
- Malayalam
- Marathi
- Gujarati
- Punjabi
Global Languages
- English (US, UK, AU)
- French
- German
- Spanish
- Japanese
Check our models page for the complete list and specific model capabilities.
What is the maximum file size and duration?
The limits vary by API endpoint:
REST API
- Maximum file size: 1 GB
- Maximum duration: 4 hours
WebSocket API (Streaming)
- No file size limit
- Maximum continuous stream duration: 8 hours
For longer audio files, we recommend:
- Splitting into smaller segments
- Using batch processing
- Contacting support for custom solutions
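As an illustration of the first recommendation, the sketch below plans segment boundaries so that no segment exceeds the 4-hour REST limit. `plan_segments` is a hypothetical helper of ours, not an API function. It only computes offsets; cutting the audio itself is left to your audio tooling, and fixed offsets may split a word across two segments.

```python
def plan_segments(total_seconds, max_segment_seconds=4 * 60 * 60):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if total_seconds <= 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be positive")
    segments = []
    start = 0
    while start < total_seconds:
        # The last segment is shortened to end exactly at the recording's end.
        end = min(start + max_segment_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments
```

For a 10-hour recording, `plan_segments(10 * 3600)` yields three segments: two of 4 hours and one of 2 hours.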
How accurate is the transcription?
Accuracy varies based on several factors:
Typical Accuracy Rates
- Clear speech, minimal background noise: 95-98%
- Multiple speakers, moderate noise: 90-95%
- Heavy accent or background noise: 85-90%
Factors affecting accuracy:
- Audio quality
- Background noise
- Speaker accent
- Speaking speed
- Domain-specific terminology
Use our playground to test with your specific audio.
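If you have a reference transcript for your own audio, one common way to quantify accuracy is word error rate (WER). This FAQ does not state how the percentages above are measured, so treating accuracy as roughly `1 - WER` is an assumption on our part. The sketch below is a plain word-level edit distance, and `word_error_rate` is our own helper name.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing "the cat sat on the mat" with "the cat sat on a mat" gives one substitution in six reference words.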
Technical Questions
How does speaker diarization work?
Speaker diarization identifies and labels different speakers in the audio:
Process:
- Voice activity detection
- Speaker segmentation
- Speaker clustering
- Speaker labeling
Usage: diarization is available via the Batch API; the output labels the different speakers in the audio.
What are the rate limits?
Rate limits are applied per account based on your subscription plan:
File Size Limits
- REST API: Maximum file size 1 GB, up to 4 hours duration
- Streaming API: No file size limit, up to 8 hours continuous
For batch endpoints, wait at least 5 ms between status polling requests.
View the full Credits & Rate Limits page for details on HTTP headers, error handling, and upgrade paths.
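A polling loop that respects the minimum delay might look like the sketch below. `get_status` stands in for whatever call you use to fetch the job status, and the status strings `"completed"` and `"failed"` are placeholders; neither is taken from the API reference. The delay is a parameter, so a longer interval can be passed in.

```python
import time


def poll_until_done(get_status, min_delay_s=0.005, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it reports a terminal state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("completed", "failed"):  # hypothetical terminal states
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        # Never poll more often than the minimum delay (5 ms by default).
        sleep(min_delay_s)
```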
How do I handle errors?
Common errors and solutions:
1. Authentication Errors (401)
Solution: Check API key validity and proper configuration
2. Rate Limit Errors (429)
Solution: Implement exponential backoff or upgrade plan
3. Invalid Input (400)
Solution: Check supported formats and requirements
See our error handling guide for more details.
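The exponential backoff suggested for 429 errors can be sketched as follows. `ApiError` and the `request` callable are hypothetical stand-ins for however your HTTP client surfaces status codes. Only 429 is retried here, because retrying a 401 or 400 unchanged would fail the same way.

```python
import random
import time


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_backoff(request, max_retries=5, base_delay_s=1.0, sleep=time.sleep):
    """Run request(); on a 429, wait 1 s, 2 s, 4 s, ... (plus jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except ApiError as err:
            # Give up on non-429 errors and once the retries are exhausted.
            if err.status_code != 429 or attempt == max_retries:
                raise
            delay = base_delay_s * (2 ** attempt)
            # Up to 10% random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```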
How do I optimize for real-time transcription?
Tips for optimal real-time performance:
- Audio Settings
- Chunk Size
  - Optimal: 100 ms to 500 ms chunks
  - Balance between latency and accuracy
- WebSocket Connection
- Error Handling
View our real-time guide for detailed examples.
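To make the chunk-size guidance concrete, the sketch below converts a chunk duration into a byte count for raw PCM and slices a buffer accordingly. The defaults (16 kHz, 16-bit, mono) follow the recommended audio settings elsewhere in this FAQ; the helper names are ours, and sending the chunks over the WebSocket is not shown.

```python
def chunk_bytes(chunk_ms, sample_rate=16000, sample_width_bytes=2, channels=1):
    """Number of bytes of raw PCM that make up chunk_ms milliseconds of audio."""
    frames = sample_rate * chunk_ms // 1000
    return frames * sample_width_bytes * channels


def split_pcm(pcm, chunk_ms=200, **audio_format):
    """Slice a raw PCM byte buffer into chunks of roughly chunk_ms each."""
    size = chunk_bytes(chunk_ms, **audio_format)
    # The final chunk may be shorter than the others.
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]
```

With these defaults, a 100 ms chunk is 3200 bytes and a 500 ms chunk is 16000 bytes.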
Billing & Support
How is usage calculated?
Usage is calculated based on:
- Audio Duration
  - Rounded up to the nearest second
  - Minimum charge: 1 second
- Features Used
  - Base transcription
  - Speaker diarization (+20%)
  - Language detection (+10%)
  - Word timestamps (+10%)
- Model Type
  - Saarika: Base rate
  - Saaras: Premium rate
Example calculation:
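The rules above can be worked through with the sketch below. It makes two assumptions that this FAQ does not confirm: the feature surcharges are added together and applied to the model's per-second rate, and the rate passed in is a placeholder, not an actual price. Check the pricing page for real figures.

```python
import math

# Surcharges from the list above, expressed as fractions of the base rate.
FEATURE_SURCHARGE = {
    "diarization": 0.20,
    "language_detection": 0.10,
    "word_timestamps": 0.10,
}


def billable_seconds(duration_s):
    """Round up to the nearest second, with a minimum charge of 1 second."""
    return max(1, math.ceil(duration_s))


def estimate_cost(duration_s, rate_per_second, features=()):
    """Estimate cost, assuming surcharges add up (an assumption, not confirmed)."""
    multiplier = 1.0 + sum(FEATURE_SURCHARGE[name] for name in features)
    return billable_seconds(duration_s) * rate_per_second * multiplier
```

For example, a 90.4-second file is billed as 91 seconds. With diarization and word timestamps enabled, the assumed multiplier is 1.3 times the model's per-second rate.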
How do I get support?
Multiple support channels are available:
- Documentation
- Community
- Direct Support
  - Email: developer@sarvam.ai
  - Enterprise: Dedicated support manager