Speech-to-Text REST API

Synchronous Processing

Process short audio files and receive the result in the same response. Best for quick transcriptions and testing. Audio is limited to a maximum duration of 30 seconds.
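The 30-second limit can be checked locally before uploading. The sketch below is one possible way to do this for WAV files using Python's standard `wave` module. The names `wav_duration_seconds`, `fits_sync_limit`, and `MAX_SYNC_SECONDS` are illustrative and not part of the Sarvam SDK.

```python
import wave

MAX_SYNC_SECONDS = 30.0  # limit stated above for synchronous requests


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(path: str) -> bool:
    """Return True if the file is short enough for synchronous processing."""
    return wav_duration_seconds(path) <= MAX_SYNC_SECONDS
```

This only covers WAV input; other containers would need a different way to read the duration.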

Saarika: Speech to Text Transcription Model

Saarika is a speech-to-text transcription model that excels in handling multi-speaker content, mixed language content, and conference recordings. It offers automatic code-mixing and enhanced multilingual support, making it ideal for a wide range of applications.

Automatic Language Detection: Set language_code to "unknown" to enable automatic language detection. The API will identify the spoken language and return the transcript along with the detected language code.

The input_audio_codec is an optional parameter. Our API automatically detects all codec formats, so you don’t necessarily need to pass this parameter. However, for PCM files specifically (pcm_s16le, pcm_l16, pcm_raw), you must pass this parameter. Note that PCM files are supported only at 16kHz sample rate.
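As a rough illustration of the rule above, the sketch below assembles request parameters and includes input_audio_codec only when one is supplied, rejecting PCM input that is not sampled at 16 kHz. The helper `build_transcribe_kwargs` and its `sample_rate_hz` argument are hypothetical and used only for local validation; they are not part of the API.

```python
from typing import Any, Dict, Optional

# PCM codec names listed above; these must be passed explicitly.
PCM_CODECS = {"pcm_s16le", "pcm_l16", "pcm_raw"}
PCM_SAMPLE_RATE_HZ = 16_000  # the only PCM sample rate stated as supported


def build_transcribe_kwargs(
    model: str,
    language_code: str,
    input_audio_codec: Optional[str] = None,
    sample_rate_hz: Optional[int] = None,  # hypothetical, local check only
) -> Dict[str, Any]:
    """Assemble request parameters, adding input_audio_codec only when given."""
    kwargs: Dict[str, Any] = {"model": model, "language_code": language_code}
    if input_audio_codec is None:
        return kwargs  # non-PCM audio: rely on automatic codec detection
    if input_audio_codec in PCM_CODECS and sample_rate_hz != PCM_SAMPLE_RATE_HZ:
        raise ValueError("PCM audio must be sampled at 16 kHz")
    kwargs["input_audio_codec"] = input_audio_codec
    return kwargs
```

Assuming the SDK accepts input_audio_codec as a keyword argument under the same name as the API parameter, the resulting dictionary could be passed along with the file, for example `client.speech_to_text.transcribe(file=audio_file, **kwargs)`.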

Code Examples for Speech to Text Transcription

from sarvamai import SarvamAI

# Create a client authenticated with your API subscription key.
client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Open the audio in binary mode; the context manager closes the file afterwards.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5",
        language_code="gu-IN",  # Or use "unknown" for automatic language detection
    )

print(response)

Check out our detailed API Reference to explore Speech To Text Transcription and all available options.

Saaras Model: SOTA Speech to Text Translation Model

Saaras is a domain-aware translation model with enhanced telephony support and intelligent entity preservation. It is designed to handle complex language variations and domain-specific content, making it ideal for call center and telephony applications.

The input_audio_codec is an optional parameter. Our API automatically detects all codec formats, so you don’t necessarily need to pass this parameter. However, for PCM files specifically (pcm_s16le, pcm_l16, pcm_raw), you must pass this parameter. Note that PCM files are supported only at 16kHz sample rate.

Code Examples for Speech to Text Translation

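from sarvamai import SarvamAI

# Create a client authenticated with your API subscription key.
client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Open the audio in binary mode; the context manager closes the file afterwards.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v2.5",
    )

print(response)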

Check out our detailed API Reference to explore Speech To Text Translation and all available options.

API Response Format

Speech to Text Response

request_id
string

Unique identifier for the request

transcript
string (required)

The transcribed text from the provided audio file

Example: "नमस्ते, आप कैसे हैं?"

language_code
string

The BCP-47 code of the language spoken in the input. If multiple languages are detected, returns the most predominant language code. Returns null if no language is detected.

Example: "hi-IN"
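For callers working with the raw REST response rather than the SDK object, the fields above could be read roughly as follows. This is a minimal sketch that assumes the response body is a JSON object keyed by the field names listed above. `TranscriptionResult`, `parse_transcription`, and the sample `request_id` value are made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionResult:
    transcript: str                      # required field
    request_id: Optional[str] = None
    language_code: Optional[str] = None  # None when no language is detected


def parse_transcription(payload: str) -> TranscriptionResult:
    """Parse a JSON response body into a TranscriptionResult."""
    data = json.loads(payload)
    return TranscriptionResult(
        transcript=data["transcript"],   # raises KeyError if absent
        request_id=data.get("request_id"),
        language_code=data.get("language_code"),
    )


# Sample payload; the request_id value is a placeholder.
sample = (
    '{"request_id": "example-id", '
    '"transcript": "नमस्ते, आप कैसे हैं?", '
    '"language_code": "hi-IN"}'
)
result = parse_transcription(sample)
print(result.transcript, result.language_code)
```

Treating language_code as optional reflects the note above that it can be null when no language is detected.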

Speech to Text Translate Response

request_id
string

Unique identifier for the request

transcript
string (required)

Translated transcript of the provided speech in English

language_code
string

The BCP-47 code of the language spoken in the input. If multiple languages are detected, returns the most predominant language code.

Supported Languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN
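A returned language_code can be checked against the list above before downstream code branches on it. The sketch below is illustrative only; `SUPPORTED_LANGUAGES` and `is_supported_language` are hypothetical names, and the set simply copies the codes listed above.

```python
from typing import Optional

# Codes copied from the supported languages list above.
SUPPORTED_LANGUAGES = frozenset({
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN", "od-IN",
    "pa-IN", "ta-IN", "te-IN", "gu-IN", "en-IN",
})


def is_supported_language(code: Optional[str]) -> bool:
    """Return True if code is one of the listed BCP-47 language codes."""
    return code in SUPPORTED_LANGUAGES
```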

Next Steps

1. Get API Key: Sign up and get your API key from the dashboard.
2. Test Integration: Try the API with sample audio files.
3. Go Live: Deploy your integration and monitor usage.

Need help? Contact us on Discord for guidance.