Change Log
June 2025
Translation
Sarvam-Translate Launched: Released sarvam-translate:v1, an open-weights model supporting 22 Indian languages.
Text Quickstart Updated: Added sarvam-translate usage examples to the Python SDK Quickstart and Playground.
Speech to Text (STT)
Real-Time STT via WebSocket: Added WebSocket support for live transcription with ultra-low latency in both Python and JavaScript SDKs.
Audio Format Support Expanded: STT now accepts mp3, wav, aac, aiff, ogg/opus, flac, mp4/m4a, and amr input formats.
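As a rough illustration of the expanded format list, the following sketch checks a file name against the accepted formats before uploading. The mapping from format names to file extensions and the helper name is_accepted_stt_file are assumptions made for this example; they are not part of the SDK.

```python
from pathlib import Path

# Extensions derived from the formats listed above. The mapping from
# format name to file extension is an assumption made for this sketch.
ACCEPTED_STT_EXTENSIONS = {
    ".mp3", ".wav", ".aac", ".aiff", ".ogg",
    ".opus", ".flac", ".mp4", ".m4a", ".amr",
}


def is_accepted_stt_file(path: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    # Lowercase the suffix so "CALL.MP3" and "call.mp3" are treated alike.
    return Path(path).suffix.lower() in ACCEPTED_STT_EXTENSIONS


print(is_accepted_stt_file("call_recording.MP3"))
print(is_accepted_stt_file("notes.txt"))
```

An extension check only catches obvious mistakes early; the service itself remains the authority on whether a given file is accepted.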
Batch STT (Alpha): Introduced alpha support for batch transcription in the Python SDK. Install via pip install sarvamai==0.1.11a2.
Saaras & Saarika WebSocket Support: Both saaras and saarika models now support real-time streaming via WebSocket.
Text to Speech (TTS)
Real-Time TTS Streaming: Generate speech on the fly using WebSocket streaming. Available in Python and JavaScript SDKs.
Audio Format Support Expanded: TTS now outputs in mp3, linear16, mulaw, alaw, opus, flac, aac, and wav.
SDKs
Python SDK v4.23.2: Includes real-time streaming support, batch transcription (alpha), new translation APIs, and model updates.
JavaScript SDK Updated: Added real-time STT & TTS WebSocket support and updated documentation with streaming quickstarts.
Models
New Model Added: Introduced sarvam-m to the model family.
Model Updates: saarika and saaras updated to v2.5 with character limit improvements and stability enhancements.
Documentation
Cookbooks Expanded: Added examples for sarvam-translate, updated the LID and chat completion cookbooks to use the SDK package, and refreshed the starter notebooks.
AI-Powered Docs Assistant: Added an AI assistant to the documentation search bar for instant Q&A and developer support.
Dashboard
Usage Analytics Dashboard: Released real-time API usage and credit tracking at dashboard.sarvam.ai/usage.
May 2025
Models
Sarvam-M Released
Introduced sarvam-m, a 24B open-weights hybrid model based on Mistral Small.
- Available now via API Playground and detailed in the technical blog.
- Benchmarks added to the official documentation.
Saarika v2.5 Released
Improved voice quality and naturalness in saarika:v2.5, now available for use in the STT API.
TTS WebSocket (Beta)
Real-time streaming TTS now supported via WebSocket for beta users. Contact support to request access.
API & Dashboard
Upgraded Developer Dashboard
- Unified dashboard with no-code playground and API key management in one place.
- Easily test endpoints like LLM, TTS, STT, and Translate without writing code.
- Prebuilt examples include Resume Translate and Hinglish Code Debug.
API Playground Enhancements
- Added full support for live testing of Sarvam-M, Sarvam-Translate, and TTS/STT APIs.
- Playground features instant feedback and parameter tuning without leaving the dashboard.
SDKs
Official SDKs Released
- Python: pip install sarvamai
- JavaScript: npm install sarvamai
- Abstracts away HTTP and response parsing with clean, unified methods across APIs.
- Reduces integration time from hours to minutes.
Documentation
Revamped Developer Documentation
- New cookbooks added with real-world SDK use cases including chat completion, translation, and speech.
- Improved API navigation and content structure for faster discovery.
- Code snippets updated for clarity and consistency with latest SDKs.
Navigation Updates
- Exposed core API endpoints directly in sidebar navigation.
- Streamlined structure to help developers reach reference and guides faster.
April 2025
Text to Speech (TTS)
Bulbul v2 Released
Introduced bulbul:v2, the latest version of our Indian Text-to-Speech model.
- More natural and expressive voice output with better emotional tone
- Enhanced preprocessing for improved handling of mixed-language inputs
Bulbul v1 Deprecation Notice
bulbul:v1 will be officially deprecated on April 30, 2025. Users should migrate to bulbul:v2 to ensure uninterrupted service.
24kHz Audio Support
Speech generation now supports a 24kHz sample rate for higher quality output.
Speech to Text (STT)
Batch ASR API Released
New Batch ASR API allows uploading up to 20 audio files (up to 60 minutes each).
- Ideal for calls, meetings, and long-form media
- Available in both speech-to-text and speech-to-text-translate endpoints
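The batch limits above (up to 20 files, up to 60 minutes each) can be checked on the client before a job is submitted. The sketch below is a minimal pre-flight check under those stated limits; the function name validate_batch and the idea of passing durations as a list of minutes are assumptions for illustration, not part of the Batch ASR API.

```python
# Limits taken from the release note above.
MAX_FILES_PER_BATCH = 20
MAX_MINUTES_PER_FILE = 60


def validate_batch(durations_minutes: list[float]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the batch fits."""
    problems = []
    if len(durations_minutes) > MAX_FILES_PER_BATCH:
        problems.append(
            f"{len(durations_minutes)} files exceeds the limit of {MAX_FILES_PER_BATCH}"
        )
    for index, minutes in enumerate(durations_minutes):
        if minutes > MAX_MINUTES_PER_FILE:
            problems.append(
                f"file {index} is {minutes} min, over the {MAX_MINUTES_PER_FILE} min limit"
            )
    return problems


print(validate_batch([12.5, 59.0]))   # both files fit
print(validate_batch([61.0]))         # one file is too long
```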
Real-Time API Improvements
Improved transcription speed for real-time requests.
- 3× faster processing for up to 30s audio snippets
- Optimized for use cases like voice bots and instant assistants
Streaming WebSocket (Beta)
WebSocket-based real-time transcription now in beta. Early access available via the Sarvam Discord community.
Text
Language Detection Enhancements
Added source_language_code to API responses.
- Automatically detects input language
- Improves performance on multilingual and code-switched text
Indic Translation Upgrades
Improved support for two-way translation between Indic and English.
- Supports colloquial, modern, classical, and formal registers
- Ideal for education, localization, and content creation platforms
New APIs Introduced
- Transliterate API: Converts text between writing systems while preserving pronunciation
- Language Identification (LID) API: Detects both language and script from input
Documentation
Cookbook Updated
- Added new real-world examples for Batch ASR and Bulbul v2 usage
- Included advanced translation flows (e.g., tone-aware translation)
- Refreshed LID and transliteration notebooks
- Integrated with latest SDK structure
Cleanup and Refactoring
Removed the deprecated analytics and parse APIs from the SDK for better maintainability.
March 2025
Text
Translation & Transliteration Schema Update
Translation and Transliteration APIs will now automatically detect the input language.
- Responses now include source_language_code
- Improves handling of multilingual and code-switched inputs
- Enables simplified workflows with less manual language tagging
SarvamParse
SarvamParse “Small” Mode Introduced
A lightweight variant of SarvamParse is now available.
- Lower cost
- Faster response time
- Ideal for real-time or cost-sensitive applications
Language Identification
LID API Released
The new Language Identification (LID) endpoint detects both language and script from raw input text.
- Supports multiple Indian and international languages
- Detects script types like Latin, Devanagari, Kannada, and more
- Full API Reference: LID API Docs
- Cookbook: Language Identification Notebook
February 2025
APIs
Sarvam Parse API Released
A new API that transforms PDF documents into structured data.
- Accepts PDF input and returns base64-encoded XML
- Useful for information extraction, content indexing, and document analysis
- API Reference: Sarvam Parse Docs
- Cookbook: Parse PDF Notebook
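Because the output is base64-encoded XML, a client needs two steps to work with it: decode the base64 string, then parse the XML. The sketch below shows those two steps with the Python standard library. The sample payload and its element names (document, page) are stand-ins invented for this example; the actual schema returned by the API is described in the API reference.

```python
import base64
import xml.etree.ElementTree as ET


def decode_parse_output(encoded_xml: str) -> ET.Element:
    """Decode a base64 string and parse the result as XML."""
    xml_bytes = base64.b64decode(encoded_xml)
    return ET.fromstring(xml_bytes)


# Stand-in payload: the element names here are hypothetical, not the real schema.
sample = base64.b64encode(
    b"<document><page number='1'>Hello</page></document>"
).decode("ascii")

root = decode_parse_output(sample)
print(root.tag, [page.text for page in root.iter("page")])
```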
Doc Translate API Released
Translate full PDF documents and receive structured translated output.
- Returns translated content as base64-encoded XML
- Ideal for cross-lingual access to documents in enterprise, government, or education
- API Reference: Doc Translate Docs
- Cookbook: Doc Translate Notebook
Speech to Text (STT)
Max Duration Limit Update
To improve responsiveness and lower latency for all users, the maximum duration per STT request has been updated.
- New limit: 30 seconds per request (previously 8 minutes)
- Applies to both saaras and saarika models
- For longer audio requirements, contact the team to explore tailored solutions
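One way a client could adapt to the 30-second limit is to plan its requests as consecutive time windows of at most 30 seconds each. The sketch below only computes the window boundaries; the function name split_into_windows is hypothetical, and actually cutting the audio and sending each piece is left out. Fixed boundaries can split words mid-utterance, so this is an illustration of the arithmetic and not a recommended production approach.

```python
# Limit taken from the release note above.
MAX_SECONDS_PER_REQUEST = 30.0


def split_into_windows(
    total_seconds: float, window: float = MAX_SECONDS_PER_REQUEST
) -> list[tuple[float, float]]:
    """Return (start, end) pairs covering the audio, each at most `window` long."""
    windows = []
    start = 0.0
    while start < total_seconds:
        end = min(start + window, total_seconds)
        windows.append((start, end))
        start = end
    return windows


# A 75-second recording needs three requests.
print(split_into_windows(75.0))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```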
January 2025
Developer Experience
Meta-Prompt Introduced
Launched a universal meta-prompt to help guide any AI chat model in using Sarvam’s APIs effectively.
- Offers structured API context for accurate prompt engineering
- Compatible with models like Gemini, GPT, Claude, etc.
- API Reference: Meta-Prompt Docs
- Example: Meta-Prompt in Action
Sarvam AI Cookbook Launched
Released the official Sarvam AI Cookbook, an open-source repository with practical code examples and notebooks.
- Covers use cases for STT, TTS, Translation, Parse, and more
- Includes best practices, integration tips, and tutorials
- Repository: Sarvam AI Cookbook on GitHub