Speech-to-Text REST API

Synchronous Processing

Process short audio files and receive the transcript in the same response. Best for quick transcriptions and testing. Features include:

  • Instant results
  • Simple integration
  • Support for multiple audio formats
  • Maximum duration: 30 seconds
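
Because synchronous requests are limited to 30 seconds of audio, it can help to check a file's length locally before sending it. The following is a minimal sketch that assumes the input is an uncompressed WAV file readable by Python's standard wave module; the function names and the constant are illustrative and are not part of the API or SDK.

```python
import wave

# Limit stated above for synchronous processing.
MAX_SYNC_DURATION_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        frame_count = wav_file.getnframes()
        frame_rate = wav_file.getframerate()
    return frame_count / float(frame_rate)


def fits_sync_limit(path, limit=MAX_SYNC_DURATION_SECONDS):
    """Return True if the file is short enough for a synchronous request."""
    return wav_duration_seconds(path) <= limit
```

Other formats (MP3, OGG, and so on) would need a different way of reading the duration; this sketch covers only WAV.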

Features

Audio Support
  • Multiple audio formats
  • High-accuracy transcription
  • Support for multiple Indian languages and English

Code Examples

Saarika: Our Speech to Text Transcription Model

Saarika is a speech-to-text transcription model that excels at handling multi-speaker content, mixed-language content, and conference recordings. It offers automatic code-mixing and enhanced multilingual support, making it suitable for a wide range of applications.

Speech to Text Features

Basic Speech to Text Transcription

Convert speech to text with high accuracy. Supports multiple Indian languages and accents. Features include:

  • Multi-language support
  • Automatic language detection
  • High-quality noise filtering
  • Support for various audio formats

input_audio_codec is an optional parameter. The API detects the codec automatically for most formats, so you can usually omit it. For PCM files (pcm_s16le, pcm_l16, pcm_raw), however, you must pass it explicitly. PCM files are supported only at a 16 kHz sample rate.

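from sarvamai import SarvamAI

# Create a client authenticated with your API subscription key.
client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Open the audio in binary mode; the with-block closes the file afterwards.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5",
        language_code="gu-IN",  # Gujarati (India)
    )

print(response)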
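
The codec rule described above (explicit codec for PCM, auto-detection otherwise) can be sketched as a small helper that builds the extra keyword arguments for a request. This is only an illustration: codec_kwargs is a hypothetical name, the caller is assumed to already know the file's codec and sample rate, and it assumes the SDK accepts input_audio_codec as a keyword argument, as the parameter name suggests.

```python
# Codec names the documentation lists as needing an explicit input_audio_codec.
PCM_CODECS = {"pcm_s16le", "pcm_l16", "pcm_raw"}
PCM_SAMPLE_RATE_HZ = 16_000  # PCM input is supported only at 16 kHz


def codec_kwargs(codec=None, sample_rate_hz=None):
    """Return extra keyword arguments for the given input codec.

    Both arguments describe the local file and are supplied by the caller;
    this helper does not inspect the audio itself.
    """
    if codec is None or codec not in PCM_CODECS:
        # Non-PCM formats are auto-detected, so nothing extra is needed.
        return {}
    if sample_rate_hz != PCM_SAMPLE_RATE_HZ:
        raise ValueError(
            f"PCM audio must be {PCM_SAMPLE_RATE_HZ} Hz, got {sample_rate_hz}"
        )
    return {"input_audio_codec": codec}
```

The result could then be unpacked into the call, for example transcribe(file=audio_file, model="saarika:v2.5", **codec_kwargs("pcm_s16le", 16000)).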

Check out our detailed API Reference to explore Speech To Text Transcription and all available options.

Saaras: Our State-of-the-Art Speech to Text Translation Model

Saaras is a domain-aware translation model with enhanced telephony support and intelligent entity preservation. It is designed to handle complex language variations and domain-specific content, making it ideal for call center and telephony applications.

Translation Features

Speech to Text Translation

Translate speech from any supported Indian language directly into English. Ideal for content localization and international communication. Features include:

  • Support for major Indian languages
  • High-quality translations
  • Preservation of context and tone
  • Real-time translation capability

input_audio_codec is an optional parameter. The API detects the codec automatically for most formats, so you can usually omit it. For PCM files (pcm_s16le, pcm_l16, pcm_raw), however, you must pass it explicitly. PCM files are supported only at a 16 kHz sample rate.

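from sarvamai import SarvamAI

# Create a client authenticated with your API subscription key.
client = SarvamAI(
    api_subscription_key="YOUR_API_SUBSCRIPTION_KEY",
)

# Open the audio in binary mode; the with-block closes the file afterwards.
with open("audio.wav", "rb") as audio_file:
    # No language_code is passed here; the output transcript is in English.
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v2.5",
    )

print(response)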

API Response Format

Speech to Text Response

request_id
string

Unique identifier for the request

transcript
string, required

The transcribed text from the provided audio file

Example: "नमस्ते, आप कैसे हैं?"

language_code
string

The BCP-47 code of the language spoken in the input. If multiple languages are detected, returns the most predominant language code. Returns null if no language is detected.

Example: "hi-IN"

timestamps
object

Contains timestamps for the transcribed text. Only included when with_timestamps is set to true.

Properties:

  • words (array of strings): List of words in the transcript
  • start_time_seconds (array of numbers): Start times of words in seconds
  • end_time_seconds (array of numbers): End times of words in seconds
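
Since the timestamps object holds three parallel arrays, a common first step is to pair each word with its start and end time. The sketch below assumes the response has already been converted to a plain dictionary with the field names listed above (for example, parsed from the JSON body); word_timings is a hypothetical helper, not an SDK function.

```python
def word_timings(response):
    """Return a list of (word, start_seconds, end_seconds) tuples.

    Returns an empty list when the response carries no timestamps,
    which is the case unless with_timestamps was set to true.
    """
    timestamps = response.get("timestamps")
    if not timestamps:
        return []
    words = timestamps["words"]
    starts = timestamps["start_time_seconds"]
    ends = timestamps["end_time_seconds"]
    # The three arrays are parallel, so their lengths must match.
    if not (len(words) == len(starts) == len(ends)):
        raise ValueError("timestamp arrays have different lengths")
    return list(zip(words, starts, ends))
```

Each tuple can then be used directly, for instance to build subtitle cues or to highlight words during playback.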

Speech to Text Translate Response

request_id
string

Unique identifier for the request

transcript
string, required

Translated transcript of the provided speech in English

language_code
string

The BCP-47 code of the language spoken in the input. If multiple languages are detected, returns the most predominant language code.

Supported Languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN
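
A client may want to check a language_code against the list above, for example before branching on the detected language. The following sketch hard-codes that list and normalizes letter case on the client side; both helper names are illustrative, and the case normalization is a local convenience rather than documented API behavior.

```python
# Language codes copied from the supported-languages list above.
SUPPORTED_LANGUAGE_CODES = frozenset({
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN", "od-IN",
    "pa-IN", "ta-IN", "te-IN", "gu-IN", "en-IN",
})


def normalize_language_code(code):
    """Lowercase the language part and uppercase the region part."""
    language, _, region = code.strip().partition("-")
    if region:
        return f"{language.lower()}-{region.upper()}"
    return language.lower()


def is_supported_language(code):
    """Return True if the code is in the supported list (None is not)."""
    if code is None:
        return False
    return normalize_language_code(code) in SUPPORTED_LANGUAGE_CODES
```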

Next Steps

1. Get API Key
   Sign up and get your API key from the dashboard.

2. Test Integration
   Try the API with sample audio files.

3. Go Live
   Deploy your integration and monitor usage.

Need help? Contact us on Discord for guidance.