Real-Time Speech to Text API
This API transcribes speech to text in multiple Indian languages and English. Supports real-time transcription for interactive applications.
Available Options:
- Real-Time API (current endpoint): for short audio clips (under 30 seconds) with immediate results
- Batch API: for longer audio files; requires following a notebook script - View Notebook
- Diarization (speaker identification) is supported (Batch API only; see note below)
Note:
- Pricing differs for Real-Time and Batch APIs
- Diarization is only available in Batch API with separate pricing
- Please refer to dashboard.sarvam.ai for detailed pricing information
Headers
Your unique subscription key for authenticating requests to the Sarvam AI Speech-to-Text API. See dashboard.sarvam.ai for the steps to get your API key.
Body
The audio file to transcribe. Supported formats are WAV (.wav) and MP3 (.mp3).
The API works best with audio files sampled at 16kHz. If the audio contains multiple channels, they will be merged into a single channel.
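Since the API works best with 16 kHz single-channel audio, it can help to check a file locally before uploading. The sketch below uses only Python's standard-library wave module; the function name is illustrative, not part of the API.

```python
import io
import wave

def check_wav(path_or_file):
    """Report a WAV file's sample rate and channel count.

    The API performs best at 16 kHz and merges multi-channel audio
    to mono, so checking locally avoids surprises.
    """
    with wave.open(path_or_file, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
    return {
        "sample_rate": rate,
        "channels": channels,
        "is_16khz_mono": rate == 16000 and channels == 1,
    }
```

If the file is not 16 kHz mono, a tool such as ffmpeg (e.g. `ffmpeg -i in.mp3 -ar 16000 -ac 1 out.wav`) can resample it before upload.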
Specifies the model to use for speech-to-text conversion.
Note: the default model is saarika:v2.
Allowed values: saarika:v1, saarika:v2, saarika:flash
Specifies the language of the input audio. This parameter is required to ensure accurate transcription.
For the saarika:v1 model, this parameter is mandatory.
For the saarika:v2 model, it is optional.
unknown: use this when the language is not known; the API will detect it automatically.
Note: the saarika:v1 model does not support the unknown language code.
Allowed values: unknown, hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, en-IN, gu-IN
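The model/language rules above can be enforced client-side before sending a request. This is a minimal sketch; the helper name and the error messages are illustrative, and the language list mirrors the allowed values above.

```python
# Language codes accepted by the API, per the allowed values above.
SUPPORTED_LANGUAGE_CODES = {
    "unknown", "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN",
    "od-IN", "pa-IN", "ta-IN", "te-IN", "en-IN", "gu-IN",
}

def validate_language_code(model, language_code=None):
    """Apply the documented rules: saarika:v1 requires an explicit
    language and rejects 'unknown'; saarika:v2 may omit it and
    auto-detect the language."""
    if model == "saarika:v1":
        if language_code is None or language_code == "unknown":
            raise ValueError("saarika:v1 requires an explicit language_code")
    if language_code is not None and language_code not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError("unsupported language_code: %s" % language_code)
    # Treat a missing code as auto-detect for models that support it.
    return language_code or "unknown"
```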
Enables timestamps in the response. If set to true, the response will include timestamps in the transcript.
Enables speaker diarization, which identifies and separates different speakers in the audio. When set to true, the API will provide speaker-specific segments in the response. Note: This parameter is currently in Beta mode.
Number of speakers to be detected in the audio. This is used when with_diarization is set to true.
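Putting the body parameters together, a request is a multipart form upload. The sketch below assumes the endpoint URL, the api-subscription-key header name, and lowercase-string booleans in the form fields; verify all three against the official API reference before relying on them.

```python
API_URL = "https://api.sarvam.ai/speech-to-text"  # assumed endpoint path

def build_form_fields(model="saarika:v2", language_code="unknown",
                      with_timestamps=False, with_diarization=False,
                      num_speakers=None):
    """Assemble the multipart form fields described above.

    Booleans are sent as lowercase strings, a common multipart-form
    convention (an assumption worth verifying).
    """
    fields = {
        "model": model,
        "language_code": language_code,
        "with_timestamps": str(with_timestamps).lower(),
        "with_diarization": str(with_diarization).lower(),
    }
    if num_speakers is not None:
        fields["num_speakers"] = str(num_speakers)
    return fields

def transcribe(api_key, audio_path, **kwargs):
    # Deferred import so the helper above works without requests installed.
    import requests
    with open(audio_path, "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"api-subscription-key": api_key},  # assumed header name
            data=build_form_fields(**kwargs),
            files={"file": (audio_path, f, "audio/wav")},
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()
```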
Response
The transcribed text from the provided audio file.
"नमस्ते, आप कैसे हैं?"
Contains timestamps for the transcribed text. This field is included only if with_timestamps is set to true.
{
  "timestamps": {
    "end_time_seconds": [16.27],
    "start_time_seconds": [0],
    "words": [
      "Good afternoon, this is Naveen from Sarvam."
    ]
  }
}
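Because the response stores segments and their times in parallel arrays, client code typically zips them back together. A minimal sketch, using the response shape shown above (the helper name is illustrative):

```python
def align_words(timestamps):
    """Pair each transcript segment with its (start, end) times in
    seconds, matching the parallel-array layout of the timestamps
    field shown above."""
    return list(zip(
        timestamps["words"],
        timestamps["start_time_seconds"],
        timestamps["end_time_seconds"],
    ))
```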
Diarized transcript of the provided speech.
This will return the BCP-47 code of the language spoken in the input. If multiple languages are detected, the code of the most predominant spoken language is returned. If no language is detected, this will be null.