Speech-to-Text REST API | Sarvam API Docs

Synchronous Processing

Process short audio files and get an immediate response. Best for quick transcriptions and testing. Features include:

Instant results
Simple integration
Support for multiple audio formats
Maximum duration: 30 seconds
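
Because synchronous requests are limited to 30 seconds of audio, it can help to check a clip's length before uploading it. The following is a minimal sketch that assumes the input is an uncompressed WAV file readable by Python's standard `wave` module; the helper names `wav_duration_seconds` and `fits_sync_limit` are illustrative and not part of the Sarvam SDK.

```python
import wave

# Limit stated in this guide for synchronous processing.
MAX_SYNC_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def fits_sync_limit(path: str, limit: float = MAX_SYNC_SECONDS) -> bool:
    """Return True when the clip is short enough for a synchronous request."""
    return wav_duration_seconds(path) <= limit
```

Other formats such as MP3 would need a different decoder; this check only covers WAV input.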

Features

Audio Support

Multiple audio formats
High accuracy transcription
Multiple Indian languages and English support

Code Examples

Saarika: Our Speech to Text Transcription Model

Saarika is a speech-to-text transcription model that excels in handling multi-speaker content, mixed language content, and conference recordings. It offers automatic code-mixing and enhanced multilingual support, making it ideal for a wide range of applications.

Speech to Text Features

Basic Transcription

Basic Speech to Text Transcription

Convert speech to text with high accuracy. Supports multiple Indian languages and accents. Features include:

Multi-language support
Automatic language detection
High-quality noise filtering
Support for various audio formats

Python

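from sarvamai import SarvamAI

# Create a client authenticated with your API subscription key.
client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Open the audio in binary mode; the with-block closes the file afterwards.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5",
        language_code="gu-IN",  # Gujarati (India)
    )

print(response)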

Check out our detailed API Reference to explore Speech To Text Transcription and all available options.
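
When transcribing more than one file, the call can be wrapped in a small helper. The sketch below is only an illustration: `transcribe_file` is a hypothetical name, and the client is passed in as a parameter so the helper relies on nothing beyond the `speech_to_text.transcribe` call and its `file`, `model`, and `language_code` arguments as used in the example on this page.

```python
from typing import Any


def transcribe_file(client: Any, path: str, language_code: str,
                    model: str = "saarika:v2.5") -> Any:
    """Transcribe one audio file, closing the file handle when done."""
    with open(path, "rb") as audio_file:
        return client.speech_to_text.transcribe(
            file=audio_file,
            model=model,
            language_code=language_code,
        )


# Example usage (requires the sarvamai package and a valid key):
# from sarvamai import SarvamAI
# client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")
# print(transcribe_file(client, "audio.wav", "gu-IN"))
```

Passing the client in keeps the helper easy to test with a stand-in object and avoids creating a new client for every file.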

Saaras: Our State-of-the-Art Speech to Text Translation Model

Saaras is a domain-aware translation model with enhanced telephony support and intelligent entity preservation. It is designed to handle complex language variations and domain-specific content, making it ideal for call center and telephony applications.

Translation Features

Basic Translation

Speech to Text Translation

Translate speech from any supported Indian language directly into English. Ideal for content localization and international communication. Features include:

Support for major Indian languages
High-quality translations
Preservation of context and tone
Real-time translation capability

Python

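from sarvamai import SarvamAI

# Create a client authenticated with your API subscription key.
client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Open the audio in binary mode; the with-block closes the file afterwards.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v2.5",
    )

print(response)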
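
To translate several recordings in one run, the same call can be placed in a loop that records failures instead of stopping at the first one. This is a sketch under stated assumptions: `translate_files` is a hypothetical helper, the client is supplied by the caller, and exceptions are caught broadly because no specific SDK error types are assumed here.

```python
from typing import Any, Dict, Iterable, Tuple


def translate_files(
    client: Any, paths: Iterable[str], model: str = "saaras:v2.5"
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Translate each file to English; return (results, errors) keyed by path."""
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    for path in paths:
        try:
            with open(path, "rb") as audio_file:
                results[path] = client.speech_to_text.translate(
                    file=audio_file,
                    model=model,
                )
        except Exception as exc:  # broad on purpose: SDK error types are not assumed
            errors[path] = exc
    return results, errors


# Example usage (requires the sarvamai package and a valid key):
# from sarvamai import SarvamAI
# client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")
# results, errors = translate_files(client, ["call1.wav", "call2.wav"])
```

Returning errors alongside results lets the caller retry or report the failed files without losing the translations that succeeded.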

Next Steps

Get API Key

Test Integration

Try the API with sample audio files.

Go Live

Deploy your integration and monitor usage.

Need help? Contact us on Discord for guidance.