Speech-to-Text REST API | Sarvam API Docs

Synchronous Processing

Process short audio files with an immediate response. Best for quick transcriptions and testing. Audio can be at most 30 seconds long.
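As a rough illustration of the 30-second limit, the sketch below checks the duration of a WAV file locally before it is uploaded. It uses only Python's standard `wave` module. The helper names and the `MAX_SYNC_SECONDS` constant are illustrative and not part of the Sarvam SDK, and the check only works for WAV input.

```python
import wave

MAX_SYNC_SECONDS = 30.0  # limit stated for synchronous processing


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(path: str) -> bool:
    """True if the WAV file is short enough for synchronous processing."""
    return wav_duration_seconds(path) <= MAX_SYNC_SECONDS
```

Running a check like this before the request avoids sending a file that is too long for synchronous processing.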

Saarika: Speech to Text Transcription Model

Saarika is a speech-to-text transcription model that excels in handling multi-speaker content, mixed language content, and conference recordings. It offers automatic code-mixing and enhanced multilingual support, making it ideal for a wide range of applications.

Automatic Language Detection: Set language_code to "unknown" to enable automatic language detection. The API will identify the spoken language and return the transcript along with the detected language code.
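A minimal sketch of automatic language detection, assuming `client` is a `sarvamai.SarvamAI` instance and that the SDK response exposes the documented `transcript` and `language_code` fields as attributes. The function name `transcribe_with_detection` is hypothetical.

```python
from typing import Any, BinaryIO, Optional, Tuple


def transcribe_with_detection(client: Any, audio: BinaryIO) -> Tuple[str, Optional[str]]:
    """Transcribe audio and let the API detect the spoken language.

    `client` is expected to be a sarvamai.SarvamAI instance.
    """
    response = client.speech_to_text.transcribe(
        file=audio,
        model="saarika:v2.5",
        language_code="unknown",  # enables automatic language detection
    )
    # Assumption: the response object exposes the documented fields as attributes.
    return response.transcript, response.language_code
```

The second element of the returned tuple is the detected language code, which may be empty if the API could not identify a language.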

The input_audio_codec parameter is optional. The API detects the codec automatically, so you usually don’t need to pass it. The exception is PCM audio (pcm_s16le, pcm_l16, pcm_raw), for which you must pass this parameter. Note that PCM files are supported only at a 16 kHz sample rate.
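One way to apply the codec rule in client code is to add `input_audio_codec` to the request only when the input is raw PCM. The sketch below builds the keyword arguments for such a request. The helper `build_transcribe_kwargs` is hypothetical, and passing `input_audio_codec` as a keyword argument to the SDK's `transcribe` method is an assumption based on the parameter name above.

```python
from typing import Any, Dict, Optional

# Codec names listed in the documentation that must be declared explicitly
PCM_CODECS = {"pcm_s16le", "pcm_l16", "pcm_raw"}


def build_transcribe_kwargs(
    model: str, language_code: str, codec: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for a transcription request.

    input_audio_codec is included only for raw PCM input; for other
    formats the API is left to detect the codec on its own.
    """
    kwargs: Dict[str, Any] = {"model": model, "language_code": language_code}
    if codec in PCM_CODECS:
        kwargs["input_audio_codec"] = codec
    return kwargs
```

The result could then be unpacked into the request, for example `client.speech_to_text.transcribe(file=audio_file, **kwargs)`, again assuming the SDK accepts that keyword.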

Code Examples for Speech to Text Transcription

Python

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Open the audio in binary mode; the context manager closes the file after the request
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5",
        language_code="gu-IN",  # Or use "unknown" for automatic language detection
    )

print(response)

Check out our detailed API Reference to explore Speech To Text Transcription and all available options.

Saaras Model: SOTA Speech to Text Translation Model

Saaras is a domain-aware translation model with enhanced telephony support and intelligent entity preservation. It is designed to handle complex language variations and domain-specific content, making it ideal for call center and telephony applications.

Code Examples for Speech to Text Translation

Python

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_API_SUBSCRIPTION_KEY",
)

# Open the audio in binary mode; the context manager closes the file after the request
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v2.5",
    )

print(response)

Check out our detailed API Reference to explore Speech To Text Translation and all available options.

API Response Format

Speech to Text Response

Response Schema

request_id (string): Unique identifier for the request.

transcript (string, required): The transcribed text from the provided audio file. Example: "नमस्ते, आप कैसे हैं?"

language_code (string): The BCP-47 code of the language spoken in the input. If multiple languages are detected, returns the most predominant language code. Returns null if no language is detected. Example: "hi-IN"
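To show how these fields might be consumed, here is a small sketch that parses a JSON response body into a typed object. It assumes the body is a JSON object whose keys match the field names above. The `SpeechToTextResult` class and `parse_speech_to_text` function are illustrative and not part of the Sarvam SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechToTextResult:
    transcript: str                      # required field
    request_id: Optional[str] = None
    language_code: Optional[str] = None  # None when no language is detected


def parse_speech_to_text(body: str) -> SpeechToTextResult:
    """Parse a JSON response body into a SpeechToTextResult."""
    data = json.loads(body)
    return SpeechToTextResult(
        transcript=data["transcript"],  # raises KeyError if the required field is missing
        request_id=data.get("request_id"),
        language_code=data.get("language_code"),
    )
```

Treating `language_code` as optional keeps the caller from failing when the API reports no detected language.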

Speech to Text Translate Response

Response Schema

request_id (string): Unique identifier for the request.

transcript (string, required): Translated transcript of the provided speech in English.

language_code (string): The BCP-47 code of the language spoken in the input. If multiple languages are detected, returns the most predominant language code.

Supported Languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN

Next Steps

Get API Key

Test Integration

Try the API with sample audio files.

Go Live

Deploy your integration and monitor usage.

Need help? Contact us on Discord for guidance.