Speech To Text Quickstart Guide

Sarvam AI offers two speech models:

  • Saarika: speech-to-text transcription
  • Saaras: speech-to-text translation into English

View our pricing page for detailed information about model-specific pricing and usage tiers.

Supported Audio Formats & MIME Types

The Speech to Text (STT) and Speech to Text Translation (STTT) APIs support 10+ major audio formats and their MIME type variants.

Below is a complete list of supported formats and MIME types:

| Format Group              | Supported MIME Types                                               |
|---------------------------|--------------------------------------------------------------------|
| MP3 Variants              | audio/mpeg, audio/mp3, audio/mpeg3, audio/x-mpeg-3, audio/x-mp3    |
| WAV Variants              | audio/wav, audio/x-wav, audio/wave                                 |
| AAC Variants              | audio/aac, audio/x-aac                                             |
| AIFF Variants             | audio/aiff, audio/x-aiff                                           |
| OGG / Opus Formats        | audio/ogg, audio/opus                                              |
| FLAC Variants (Lossless)  | audio/flac, audio/x-flac                                           |
| MP4 / M4A Audio           | audio/mp4, audio/x-m4a                                             |
| AMR (Narrowband)          | audio/amr                                                          |
| WMA (Windows Media Audio) | audio/x-ms-wma                                                     |
| WEBM (Audio & Video)      | audio/webm, video/webm                                             |
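
As a rough illustration of how the list above might be used, the sketch below checks a MIME type against it before uploading. The helper names `is_supported_mime` and `is_supported_file` are hypothetical and not part of the Sarvam SDK. The file check relies on Python's standard `mimetypes` module, which guesses from the file extension only, so treat it as a convenience pre-check rather than a guarantee of what the API will accept.

```python
import mimetypes

# MIME types copied from the supported-formats list in this guide.
SUPPORTED_MIME_TYPES = frozenset({
    "audio/mpeg", "audio/mp3", "audio/mpeg3", "audio/x-mpeg-3", "audio/x-mp3",
    "audio/wav", "audio/x-wav", "audio/wave",
    "audio/aac", "audio/x-aac",
    "audio/aiff", "audio/x-aiff",
    "audio/ogg", "audio/opus",
    "audio/flac", "audio/x-flac",
    "audio/mp4", "audio/x-m4a",
    "audio/amr",
    "audio/x-ms-wma",
    "audio/webm", "video/webm",
})


def is_supported_mime(mime_type: str) -> bool:
    """Return True if mime_type is in the supported list (case-insensitive)."""
    return mime_type.strip().lower() in SUPPORTED_MIME_TYPES


def is_supported_file(path: str) -> bool:
    """Guess a MIME type from the file extension and check it against the list.

    The guess comes from Python's extension table, not the file contents,
    so an unknown extension returns False.
    """
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed is not None and is_supported_mime(guessed)
```

For example, `is_supported_mime("audio/x-wav")` is true, while `is_supported_file("notes.txt")` is false because the guessed type is not an audio type from the list.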

Saarika: Our Speech to Text Transcription Model

Saarika is a speech-to-text transcription model that excels at multi-speaker content, mixed-language content, and conference recordings. It handles code-mixed speech automatically and offers enhanced multilingual support, making it suitable for a wide range of applications.

Speech to Text Features

Basic Speech to Text Transcription

Convert speech to text with high accuracy across multiple Indian languages and accents. Features include:

  • Multi-language support
  • Automatic language detection
  • High-quality noise filtering
  • Support for various audio formats
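
from sarvamai import SarvamAI

# Create a client authenticated with your API subscription key
client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Transcribe a Gujarati (gu-IN) recording with the Saarika model;
# the with-block closes the file once the request returns
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5",
        language_code="gu-IN",
    )

print(response)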

Check out our detailed API Reference to explore Speech To Text Transcription and all available options.
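
The single-file call above can be extended to several recordings. The following is a minimal sketch, assuming a client that exposes the same `speech_to_text.transcribe(file=..., model=..., language_code=...)` call used in this guide. The helper name `transcribe_files` is hypothetical, and the response objects are stored as returned, without assuming anything about their fields.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def transcribe_files(
    client: Any,
    paths: Iterable[str],
    model: str = "saarika:v2.5",
    language_code: str = "gu-IN",
) -> Dict[str, Any]:
    """Transcribe each file in turn and return a {path: response} mapping.

    `client` is assumed to be a SarvamAI instance; only the
    speech_to_text.transcribe call from the quickstart is used.
    """
    results: Dict[str, Any] = {}
    for path in paths:
        # Open each file in binary mode and close it once the call returns.
        with Path(path).open("rb") as audio_file:
            results[str(path)] = client.speech_to_text.transcribe(
                file=audio_file,
                model=model,
                language_code=language_code,
            )
    return results
```

Passing the client in as an argument keeps the helper independent of how the key is configured and makes it easy to exercise with a stand-in client during testing.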

Saaras: Our State-of-the-Art Speech to Text Translation Model

Saaras is a domain-aware translation model with enhanced telephony support and intelligent entity preservation. It is designed to handle complex language variations and domain-specific content, making it ideal for call center and telephony applications.

Translation Features

Speech to Text Translation

Translate speech from any supported Indian language directly into English. Ideal for content localization and international communication. Features include:

  • Support for major Indian languages
  • High-quality translations
  • Preservation of context and tone
  • Real-time translation capability
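
from sarvamai import SarvamAI

# Create a client authenticated with your API subscription key
client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Translate the speech in the recording into English with the Saaras model;
# the with-block closes the file once the request returns
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v2.5",
    )

print(response)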

Check out our detailed API Reference to explore Speech To Text Translation and all available options.
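
The examples in this guide pass the API key as a string literal. One common alternative is to read it from an environment variable so the key stays out of source code. The sketch below assumes a variable named `SARVAM_API_KEY`; both that name and the `load_api_key` helper are illustrative choices, not something the SDK defines.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "SARVAM_API_KEY",  # hypothetical variable name
) -> str:
    """Return the key stored under `name`, or raise if it is missing or blank."""
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable to your API subscription key"
        )
    return key
```

The client can then be created with `SarvamAI(api_subscription_key=load_api_key())`, using the same constructor argument shown in the examples above.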