Speech-to-Text REST API

Synchronous Processing

Process short audio files and receive the result in the same response. Best suited to quick transcriptions and testing; audio can be at most 30 seconds long.
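Because the synchronous endpoint is limited to 30 seconds of audio, it can help to check a file's duration locally before uploading. The following is a minimal sketch using Python's standard `wave` module. It assumes an uncompressed WAV file, and the helper names `wav_duration_seconds` and `fits_sync_limit` are illustrative, not part of the Sarvam SDK.

```python
import wave

MAX_SYNC_SECONDS = 30.0  # limit stated above for synchronous processing


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(path: str, limit: float = MAX_SYNC_SECONDS) -> bool:
    """Return True if the file is short enough for the synchronous endpoint."""
    return wav_duration_seconds(path) <= limit
```

A file that fails this check would need to be trimmed or split before being sent to the synchronous endpoint.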

Saaras v3: State-of-the-Art Speech Recognition (Recommended)

Saaras v3 is our latest state-of-the-art speech recognition model with flexible output formats. It supports five output modes for different use cases: `transcribe`, `translate`, `verbatim`, `translit`, and `codemix`.

Recommended for new integrations. Saaras v3 offers improved accuracy and flexible output modes. Learn more about Saaras v3.

Output Modes

Mode	Description
`transcribe` (default)	Standard transcription in the original language
`translate`	Translates speech to English
`verbatim`	Exact word-for-word transcription
`translit`	Romanization to Latin script
`codemix`	Code-mixed text output
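As an illustration, a small client-side check can catch a misspelled mode (for example `transliterate` instead of `translit`) before a request is sent. This sketch only encodes the mode names from the table above. `normalize_mode` is a hypothetical helper, not an SDK function.

```python
from typing import Optional

# Mode names copied from the Output Modes table.
VALID_MODES = ("transcribe", "translate", "verbatim", "translit", "codemix")
DEFAULT_MODE = "transcribe"


def normalize_mode(mode: Optional[str] = None) -> str:
    """Return a valid mode string, falling back to the default when omitted."""
    if mode is None:
        return DEFAULT_MODE
    cleaned = mode.strip().lower()
    if cleaned not in VALID_MODES:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of {', '.join(VALID_MODES)}"
        )
    return cleaned
```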

Code Examples for Saaras v3

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Transcribe mode (default). The file is opened in a context manager
# so the handle is closed once the request completes.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        mode="transcribe",  # or "translate", "verbatim", "translit", "codemix"
    )

print(response)

Check out our detailed API Reference to explore all available options.

Legacy Models (Deprecated Soon)

The following models will be deprecated soon. We recommend using Saaras v3 for new integrations and migrating existing ones to it.

Saarika v2.5: Speech to Text Transcription

Saarika is a speech-to-text transcription model that excels at handling multi-speaker content, mixed-language content, and conference recordings.

Deprecation Notice: Saarika v2.5 will be deprecated soon. Use Saaras v3 with mode="transcribe" instead.

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Uses the recommended replacement: Saaras v3 in transcribe mode.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        mode="transcribe",
        language_code="hi-IN",
    )

print(response)

Saaras v2.5: Speech to Text Translation

Saaras v2.5 is available in the Speech-to-Text Translate endpoint for translating speech directly to English.

Deprecation Notice: Saaras v2.5 will be deprecated soon. Use Saaras v3 with mode="translate" instead.

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Translate speech directly to English.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v3",
        mode="translate",
    )

print(response)

API Response Format

Speech to Text Transcription Response

Field	Type	Description
`request_id`	string	Unique identifier for the request
`transcript`	string	The transcribed text from the audio file
`language_code`	string	BCP-47 code of the detected language (e.g., `hi-IN`). Returns `null` if no language is detected

{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "transcript": "नमस्ते, आप कैसे हैं?",
  "language_code": "hi-IN"
}
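When the REST endpoint is called directly rather than through the SDK, the fields above can be mapped onto a small typed structure. This is a minimal sketch that assumes the response body is JSON with the three documented fields. `TranscriptionResult` and `parse_transcription` are illustrative names, not SDK types.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranscriptionResult:
    request_id: str
    transcript: str
    language_code: Optional[str]  # None when the API returns null


def parse_transcription(raw: str) -> TranscriptionResult:
    """Parse a raw JSON response body into a TranscriptionResult."""
    data = json.loads(raw)
    return TranscriptionResult(
        request_id=data["request_id"],
        transcript=data["transcript"],
        # JSON null becomes None; a missing key is treated the same way.
        language_code=data.get("language_code"),
    )
```

Treating `language_code` as optional keeps callers from assuming a language was always detected.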

Speech to Text Translation Response

Field	Type	Description
`request_id`	string	Unique identifier for the request
`transcript`	string	Translated text in English
`language_code`	string	BCP-47 code of the detected source language

Supported source languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN

{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "transcript": "Hello, how are you?",
  "language_code": "hi-IN"
}

Error Responses

All errors return a JSON object with an `error` field containing details about what went wrong.

Error Response Structure

{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
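For programmatic handling, the nested `error` object can be unpacked defensively so that a malformed or partial body does not raise a second exception while an error is being reported. This is a sketch under the assumption that the body has already been decoded into a dictionary. `ErrorInfo` and `parse_error` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ErrorInfo:
    message: str
    code: Optional[str]
    request_id: Optional[str]


def parse_error(body: Mapping[str, Any]) -> ErrorInfo:
    """Extract the documented error fields, tolerating missing ones."""
    err = body.get("error") or {}
    if not isinstance(err, Mapping):
        # Fall back gracefully if "error" is a plain string or other value.
        err = {"message": str(err)}
    return ErrorInfo(
        message=err.get("message", "Unknown error"),
        code=err.get("code"),
        request_id=err.get("request_id"),
    )
```

Logging the `request_id` alongside the message makes it easier to reference a specific failed request when contacting support.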

Error Codes Reference

HTTP Status	Error Code	When This Happens	What To Do
`400`	`invalid_request_error`	Missing required parameters or malformed request	Check request format and required fields
`403`	`invalid_api_key_error`	API key is invalid, missing, or expired	Verify your API key in the dashboard
`422`	`unprocessable_entity_error`	Invalid audio format or file too large	Use supported formats: WAV, MP3, AAC, FLAC, OGG
`429`	`insufficient_quota_error`	API quota or rate limit exceeded	Wait for reset or upgrade your plan
`500`	`internal_server_error`	Unexpected server error	Retry the request; contact support if persistent
`503`	`rate_limit_exceeded_error`	Service temporarily overloaded	Retry with exponential backoff
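The "retry with exponential backoff" advice in the table can be sketched as follows. This example is independent of the SDK: `HttpStatusError` is a hypothetical stand-in for whatever exception carries the HTTP status in your client, and treating `429` and `500` as retryable alongside `503` is an assumption based on the "What To Do" column.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: statuses for which the table suggests waiting or retrying.
RETRYABLE_STATUSES = frozenset({429, 500, 503})


class HttpStatusError(Exception):
    """Hypothetical stand-in for a client error that carries an HTTP status."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying retryable failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except HttpStatusError as exc:
            is_last = attempt == max_attempts - 1
            if exc.status_code not in RETRYABLE_STATUSES or is_last:
                raise
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

With the SDK, the same loop could catch the SDK's own error type and read its `status_code` attribute instead. Client errors such as `400`, `403`, and `422` are re-raised immediately, since retrying an unchanged request would not help.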

Example Error Response

{
  "error": {
    "message": "Unsupported audio format. Supported formats: WAV, MP3, AAC, FLAC, OGG",
    "code": "unprocessable_entity_error",
    "request_id": "20241115_abc12345"
  }
}

Error Handling Code Example

from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

try:
    # Open the file in a context manager so it is closed even on failure.
    with open("audio.wav", "rb") as audio_file:
        response = client.speech_to_text.transcribe(
            file=audio_file,
            model="saaras:v3",
            mode="transcribe",
        )
    print(response.transcript)
except ApiError as e:
    # Branch on the HTTP status codes listed in the Error Codes Reference.
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
    elif e.status_code == 403:
        print("Invalid API key. Check your credentials.")
    elif e.status_code == 422:
        print(f"Unprocessable audio: {e.body}")
    elif e.status_code == 429:
        print("Rate limit exceeded. Wait and retry.")
    elif e.status_code == 503:
        print("Service overloaded. Retry with backoff.")
    else:
        print(f"Error {e.status_code}: {e.body}")

Next Steps

Get API Key

Test Integration

Try the API with sample audio files.

Go Live

Deploy your integration and monitor usage.

Need help? Contact us on Discord for guidance.