Speech-to-Text REST API

Synchronous Processing

Process short audio files, up to a maximum duration of 30 seconds, and receive the result immediately. Best for quick transcriptions and testing.
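Because the synchronous endpoint accepts at most 30 seconds of audio, it can help to check a file's duration locally before sending it. The following is a minimal sketch using only Python's standard wave module; it applies to WAV files only, and the helper names (wav_duration_seconds, fits_sync_limit, make_silent_wav) are hypothetical, not part of the Sarvam SDK.

```python
import io
import wave

MAX_SYNC_SECONDS = 30.0  # limit stated above for synchronous processing


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file (path string or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(source, limit: float = MAX_SYNC_SECONDS) -> bool:
    """True if the WAV audio is short enough for the synchronous endpoint."""
    return wav_duration_seconds(source) <= limit


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (used only for the demo)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(fits_sync_limit(make_silent_wav(12.0)))  # short clip
    print(fits_sync_limit(make_silent_wav(45.0)))  # too long for sync processing
```

For compressed formats such as MP3 or AAC, the duration would have to come from another tool, since the wave module reads WAV only.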

Saarika: Speech to Text Transcription Model

Saarika is a speech-to-text transcription model that excels at multi-speaker content, mixed-language content, and conference recordings. It handles code-mixed speech automatically and offers enhanced multilingual support, which suits it to a wide range of applications.

Automatic Language Detection: Set language_code to "unknown" to enable automatic language detection. The API will identify the spoken language and return the transcript along with the detected language code.

The input_audio_codec parameter is optional. The API detects the codec automatically, so you can usually omit it. For PCM files (pcm_s16le, pcm_l16, pcm_raw), however, you must pass this parameter. PCM files are supported only at a 16 kHz sample rate.
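The codec rule above can be captured in a small guard that runs before the request is built. This is a sketch under stated assumptions: codec_kwargs is a hypothetical helper, and it assumes the caller already knows the sample rate of any raw PCM data, since headerless PCM does not carry it.

```python
from typing import Optional

PCM_CODECS = {"pcm_s16le", "pcm_l16", "pcm_raw"}  # PCM codec names listed above
PCM_SAMPLE_RATE_HZ = 16000  # the only sample rate supported for PCM input


def codec_kwargs(codec: Optional[str] = None,
                 sample_rate_hz: Optional[int] = None) -> dict:
    """Return the extra keyword arguments to send for a given codec.

    Hypothetical helper: returns an empty dict when no codec is given, so the
    API's automatic detection is used, and validates the PCM sample rate.
    """
    if codec is None:
        return {}  # rely on automatic codec detection
    if codec in PCM_CODECS and sample_rate_hz != PCM_SAMPLE_RATE_HZ:
        raise ValueError(
            f"{codec} input must be sampled at {PCM_SAMPLE_RATE_HZ} Hz, "
            f"got {sample_rate_hz}"
        )
    return {"input_audio_codec": codec}


if __name__ == "__main__":
    print(codec_kwargs())                    # non-PCM file: nothing extra to send
    print(codec_kwargs("pcm_s16le", 16000))  # PCM file: codec must be passed
```

The returned dict could then be unpacked into the transcription call, assuming the SDK accepts input_audio_codec as a keyword argument under that name.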

Code Examples for Speech to Text Transcription

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Open the audio in binary mode; the with-block closes the file after the call.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5",
        language_code="gu-IN",  # Or use "unknown" for automatic language detection
    )

print(response)

Check out our detailed API Reference to explore Speech To Text Transcription and all available options.

Saaras: Speech to Text Translation Model

Saaras is a domain-aware translation model with enhanced telephony support and intelligent entity preservation. It is designed to handle complex language variations and domain-specific content, making it ideal for call center and telephony applications.

The input_audio_codec parameter is optional. The API detects the codec automatically, so you can usually omit it. For PCM files (pcm_s16le, pcm_l16, pcm_raw), however, you must pass this parameter. PCM files are supported only at a 16 kHz sample rate.

Code Examples for Speech to Text Translation

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Open the audio in binary mode; the with-block closes the file after the call.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v2.5",
    )

print(response)

Check out our detailed API Reference to explore Speech To Text Translation and all available options.

API Response Format

Speech to Text Transcription Response

| Field | Type | Description |
| --- | --- | --- |
| request_id | string | Unique identifier for the request |
| transcript | string | The transcribed text from the audio file |
| language_code | string | BCP-47 language code of the detected language (e.g., hi-IN). Returns null if no language is detected |

{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "transcript": "नमस्ते, आप कैसे हैं?",
  "language_code": "hi-IN"
}
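Since language_code can come back as null, client code that reads the raw JSON should treat it as optional. The sketch below parses a response body with the standard library only; the Transcription dataclass and parse_transcription function are hypothetical names, and the sample body is adapted from the example above with language_code set to null.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    request_id: str
    transcript: str
    language_code: Optional[str]  # None when the API returns null


def parse_transcription(raw: str) -> Transcription:
    """Decode a transcription response body into a typed record."""
    data = json.loads(raw)
    return Transcription(
        request_id=data["request_id"],
        transcript=data["transcript"],
        language_code=data.get("language_code"),
    )


sample = (
    '{"request_id": "20241115_12345678-1234-5678-1234-567812345678", '
    '"transcript": "नमस्ते, आप कैसे हैं?", "language_code": null}'
)
result = parse_transcription(sample)

# Fall back to a placeholder label when no language was detected.
language_label = result.language_code or "undetected"
print(result.transcript, language_label)
```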

Speech to Text Translation Response

| Field | Type | Description |
| --- | --- | --- |
| request_id | string | Unique identifier for the request |
| transcript | string | Translated text in English |
| language_code | string | BCP-47 code of the detected source language |

Supported source languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN

{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "transcript": "Hello, how are you?",
  "language_code": "hi-IN"
}

Error Responses

All errors return a JSON object with an error field containing details about what went wrong.

Error Response Structure

{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
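When working with the raw HTTP response rather than the SDK, the error object can be unpacked into its three fields for logging or branching. This is a minimal sketch assuming the body follows the structure shown above; ApiErrorInfo, parse_error, and the "unknown_error" fallback are hypothetical and not defined by the API.

```python
import json
from typing import NamedTuple


class ApiErrorInfo(NamedTuple):
    message: str
    code: str
    request_id: str


def parse_error(body) -> ApiErrorInfo:
    """Extract the error fields from a JSON string or an already-decoded dict."""
    data = json.loads(body) if isinstance(body, (str, bytes)) else body
    err = data.get("error", {})
    return ApiErrorInfo(
        message=err.get("message", ""),
        code=err.get("code", "unknown_error"),  # hypothetical fallback value
        request_id=err.get("request_id", ""),
    )


body = """
{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
"""
info = parse_error(body)
print(f"[{info.code}] {info.message} (request {info.request_id})")
```

Branching on the code field rather than on the message text keeps the handling stable if the wording of messages changes.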

Error Codes Reference

| HTTP Status | Error Code | When This Happens | What To Do |
| --- | --- | --- | --- |
| 400 | invalid_request_error | Missing required parameters or malformed request | Check request format and required fields |
| 403 | invalid_api_key_error | API key is invalid, missing, or expired | Verify your API key in the dashboard |
| 422 | unprocessable_entity_error | Invalid audio format or file too large | Use supported formats: WAV, MP3, AAC, FLAC, OGG |
| 429 | insufficient_quota_error | API quota or rate limit exceeded | Wait for reset or upgrade your plan |
| 500 | internal_server_error | Unexpected server error | Retry the request; contact support if persistent |
| 503 | rate_limit_exceeded_error | Service temporarily overloaded | Retry with exponential backoff |

Example Error Response

{
  "error": {
    "message": "Unsupported audio format. Supported formats: WAV, MP3, AAC, FLAC, OGG",
    "code": "unprocessable_entity_error",
    "request_id": "20241115_abc12345"
  }
}

The following example catches these errors with the Python SDK:

from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

try:
    # The with-block closes the audio file even if the request fails.
    with open("audio.wav", "rb") as audio_file:
        response = client.speech_to_text.transcribe(
            file=audio_file,
            model="saarika:v2.5",
            language_code="hi-IN",
        )
    print(response.transcript)
except ApiError as e:
    # Branch on the HTTP status code listed in the error codes table.
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
    elif e.status_code == 403:
        print("Invalid API key. Check your credentials.")
    elif e.status_code == 429:
        print("Rate limit exceeded. Wait and retry.")
    elif e.status_code == 503:
        print("Service overloaded. Retry with backoff.")
    else:
        print(f"Error {e.status_code}: {e.body}")
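The error table recommends retrying with exponential backoff when the service is overloaded. One way to do that is sketched below with the standard library only. It is an illustration, not the SDK's own retry mechanism: call_with_backoff is a hypothetical helper, the choice of retryable statuses (429, 500, 503) is an assumption drawn from the table, and it only assumes the raised exception exposes a status_code attribute.

```python
import random
import time

# Statuses the error table suggests waiting on or retrying (an assumption).
RETRYABLE_STATUSES = {429, 500, 503}


def call_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying retryable failures with exponential backoff.

    Any exception without a retryable status_code, or raised on the final
    attempt, is re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            is_last_attempt = attempt == max_attempts - 1
            if status not in RETRYABLE_STATUSES or is_last_attempt:
                raise
            # The delay doubles on each attempt; random jitter spreads out
            # clients that would otherwise retry at the same moment.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

In use, the callable passed in would wrap the whole request, including reopening the audio file, so that each retry reads the file from the start. A 429 caused by an exhausted quota will not clear by waiting a few seconds, so a production version might treat that case separately.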

Next Steps

1. Get API Key: Sign up and get your API key from the dashboard.
2. Test Integration: Try the API with sample audio files.
3. Go Live: Deploy your integration and monitor usage.

Need help? Contact us on Discord for guidance.