Speech-to-Text REST API

Synchronous Processing

Process short audio files and receive the transcript in the same response. Best suited to quick transcriptions and testing; audio is limited to a maximum duration of 30 seconds.
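
Because of the 30-second limit, it can be useful to check a file's duration locally before uploading it. The following is a minimal sketch for WAV files that uses only Python's standard wave module; the names wav_duration_seconds, fits_sync_limit, and MAX_SYNC_SECONDS are illustrative and are not part of the SDK.

```python
import wave

MAX_SYNC_SECONDS = 30.0  # assumed constant mirroring the 30-second limit


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(path, limit=MAX_SYNC_SECONDS):
    """Return True if the WAV file is short enough for synchronous processing."""
    return wav_duration_seconds(path) <= limit
```

This only covers uncompressed WAV input; other formats such as MP3 or FLAC would need a separate decoder to measure duration.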

Saaras v3 is our latest state-of-the-art speech recognition model with flexible output formats. It supports multiple modes for different use cases: transcribe, translate, verbatim, translit, and codemix.

Recommended for new integrations. Saaras v3 offers improved accuracy and flexible output modes. Learn more about Saaras v3.

Output Modes

| Mode | Description |
| --- | --- |
| transcribe (default) | Standard transcription in the original language |
| translate | Translates speech to English |
| verbatim | Exact word-for-word transcription |
| translit | Romanization to Latin script |
| codemix | Code-mixed text output |
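
A mistyped mode string may only surface as an API error after the audio has been uploaded, so a small client-side check can catch it earlier. The sketch below mirrors the modes listed above; SUPPORTED_MODES, DEFAULT_MODE, and normalize_mode are hypothetical helpers, not SDK functions.

```python
SUPPORTED_MODES = ("transcribe", "translate", "verbatim", "translit", "codemix")
DEFAULT_MODE = "transcribe"


def normalize_mode(mode=None):
    """Return a valid mode string, falling back to the default when omitted."""
    if mode is None:
        return DEFAULT_MODE
    cleaned = mode.strip().lower()
    if cleaned not in SUPPORTED_MODES:
        raise ValueError(
            f"Unsupported mode {mode!r}; expected one of {', '.join(SUPPORTED_MODES)}"
        )
    return cleaned
```

The returned value can then be passed as the mode argument of the transcription call.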

Code Examples for Saaras v3

    from sarvamai import SarvamAI

    client = SarvamAI(
        api_subscription_key="YOUR_SARVAM_API_KEY",
    )

    # Transcribe mode (default)
    with open("audio.wav", "rb") as audio_file:
        response = client.speech_to_text.transcribe(
            file=audio_file,
            model="saaras:v3",
            mode="transcribe",  # or "translate", "verbatim", "translit", "codemix"
        )

    print(response)

Check out our detailed API Reference to explore all available options.


Legacy Models (Deprecated Soon)

The following models will be deprecated soon. We recommend migrating to Saaras v3 for new integrations.

Saarika v2.5: Speech to Text Transcription

Saarika is a speech-to-text transcription model that excels at handling multi-speaker content, mixed-language content, and conference recordings.

Deprecation Notice: Saarika v2.5 will be deprecated soon. Use Saaras v3 with mode="transcribe" instead.

    from sarvamai import SarvamAI

    client = SarvamAI(
        api_subscription_key="YOUR_SARVAM_API_KEY",
    )

    # Saaras v3 in transcribe mode, the recommended replacement for Saarika v2.5
    with open("audio.wav", "rb") as audio_file:
        response = client.speech_to_text.transcribe(
            file=audio_file,
            model="saaras:v3",
            mode="transcribe",
            language_code="hi-IN",
        )

    print(response)

Saaras v2.5: Speech to Text Translation

Saaras v2.5 is available in the Speech-to-Text Translate endpoint for translating speech directly to English.

Deprecation Notice: Saaras v2.5 will be deprecated soon. Use Saaras v3 with mode="translate" instead.

    from sarvamai import SarvamAI

    client = SarvamAI(
        api_subscription_key="YOUR_SARVAM_API_KEY",
    )

    # Saaras v3 in translate mode, the recommended replacement for Saaras v2.5
    with open("audio.wav", "rb") as audio_file:
        response = client.speech_to_text.translate(
            file=audio_file,
            model="saaras:v3",
            mode="translate",
        )

    print(response)

API Response Format

Speech to Text Transcription Response

| Field | Type | Description |
| --- | --- | --- |
| request_id | string | Unique identifier for the request |
| transcript | string | The transcribed text from the audio file |
| language_code | string | BCP-47 language code of the detected language (e.g., hi-IN). Returns null if no language is detected |

    {
      "request_id": "20241115_12345678-1234-5678-1234-567812345678",
      "transcript": "नमस्ते, आप कैसे हैं?",
      "language_code": "hi-IN"
    }
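
When the response body is handled as raw JSON rather than through the SDK's response object, it can be mapped onto a small typed structure. The sketch below assumes only the three fields documented above; TranscriptionResult and parse_transcription are illustrative names. Note that language_code is modelled as optional because it may be null.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionResult:
    request_id: str
    transcript: str
    language_code: Optional[str]  # None when no language was detected


def parse_transcription(raw_json):
    """Build a TranscriptionResult from a raw JSON response body."""
    data = json.loads(raw_json)
    return TranscriptionResult(
        request_id=data["request_id"],
        transcript=data["transcript"],
        language_code=data.get("language_code"),
    )
```

The translation response uses the same three fields, so the same structure could be reused for it.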

Speech to Text Translation Response

| Field | Type | Description |
| --- | --- | --- |
| request_id | string | Unique identifier for the request |
| transcript | string | Translated text in English |
| language_code | string | BCP-47 code of the detected source language |

Supported source languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN

    {
      "request_id": "20241115_12345678-1234-5678-1234-567812345678",
      "transcript": "Hello, how are you?",
      "language_code": "hi-IN"
    }

Error Responses

All errors return a JSON object with an error field containing details about what went wrong.

Error Response Structure

    {
      "error": {
        "message": "Human-readable error description",
        "code": "error_code_for_programmatic_handling",
        "request_id": "unique_request_identifier"
      }
    }
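
For code that reads the raw error body, the nested error object can be unpacked as sketched below. This is an illustration under the assumption that the body follows the structure shown above; ErrorDetails and parse_error are hypothetical names, not part of the SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class ErrorDetails:
    message: str
    code: str
    request_id: str


def parse_error(raw_json):
    """Extract the nested "error" object from an error response body."""
    error = json.loads(raw_json)["error"]
    return ErrorDetails(
        message=error.get("message", ""),
        code=error.get("code", ""),
        request_id=error.get("request_id", ""),
    )
```

Logging request_id alongside code makes it easier to refer to a specific failed request when contacting support.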

Error Codes Reference

| HTTP Status | Error Code | When This Happens | What To Do |
| --- | --- | --- | --- |
| 400 | invalid_request_error | Missing required parameters or malformed request | Check request format and required fields |
| 403 | invalid_api_key_error | API key is invalid, missing, or expired | Verify your API key in the dashboard |
| 422 | unprocessable_entity_error | Invalid audio format or file too large | Use supported formats: WAV, MP3, AAC, FLAC, OGG |
| 429 | insufficient_quota_error | API quota or rate limit exceeded | Wait for reset or upgrade your plan |
| 500 | internal_server_error | Unexpected server error | Retry the request; contact support if persistent |
| 503 | rate_limit_exceeded_error | Service temporarily overloaded | Retry with exponential backoff |
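
The retry guidance in the table can be sketched as a generic wrapper. This is one possible approach, not an SDK feature: call_with_backoff and RETRYABLE_STATUS are hypothetical names, and the wrapper assumes only that the raised exception exposes a status_code attribute. The sleep function is injectable so the logic can be tested without waiting.

```python
import random
import time

RETRYABLE_STATUS = {429, 500, 503}  # assumed set, based on the table's advice


def call_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `call()` and retry with exponential backoff on retryable errors.

    An exception whose `status_code` attribute is in RETRYABLE_STATUS is
    retried; any other exception propagates immediately.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            last_attempt = attempt == max_attempts - 1
            if status not in RETRYABLE_STATUS or last_attempt:
                raise
            # Delay doubles each attempt (base, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

When wrapping an upload, open the audio file inside the callable so that each retry reads the file from the start. Retrying will not help if a 429 is caused by an exhausted quota rather than a temporary rate limit.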

Example Error Response

    {
      "error": {
        "message": "Unsupported audio format. Supported formats: WAV, MP3, AAC, FLAC, OGG",
        "code": "unprocessable_entity_error",
        "request_id": "20241115_abc12345"
      }
    }

With the Python SDK, these errors can be handled by catching ApiError and branching on its status code:

    from sarvamai import SarvamAI
    from sarvamai.core.api_error import ApiError

    client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

    try:
        with open("audio.wav", "rb") as audio_file:
            response = client.speech_to_text.transcribe(
                file=audio_file,
                model="saaras:v3",
                mode="transcribe",
            )
        print(response.transcript)
    except ApiError as e:
        if e.status_code == 400:
            print(f"Bad request: {e.body}")
        elif e.status_code == 403:
            print("Invalid API key. Check your credentials.")
        elif e.status_code == 429:
            print("Rate limit exceeded. Wait and retry.")
        elif e.status_code == 503:
            print("Service overloaded. Retry with backoff.")
        else:
            print(f"Error {e.status_code}: {e.body}")

Next Steps

1. Get API Key: Sign up and get your API key from the dashboard.
2. Test Integration: Try the API with sample audio files.
3. Go Live: Deploy your integration and monitor usage.

Need help? Contact us on Discord for guidance.