Change Log

June 2025

Translation

Sarvam-Translate Launched: Released sarvam-translate:v1, an open-weights model supporting 22 Indian languages.

Text Quickstart Updated: Added sarvam-translate usage examples to the Python SDK Quickstart and Playground.

Speech to Text (STT)

Real-Time STT via WebSocket: Added WebSocket support for live transcription with ultra-low latency in both Python and JavaScript SDKs.

Audio Format Support Expanded: STT now accepts mp3, wav, aac, aiff, ogg/opus, flac, mp4/m4a, and amr input formats.
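
As a rough client-side sketch, an application could check a file's extension against the formats listed above before uploading. The helper name and the extension set are assumptions for illustration and are not part of the SDK. In particular, mapping "ogg/opus" and "mp4/m4a" to the extensions shown is a guess.

```python
from pathlib import Path

# Extensions assumed to correspond to the STT input formats listed in this entry.
SUPPORTED_STT_EXTENSIONS = {
    ".mp3", ".wav", ".aac", ".aiff", ".ogg", ".opus",
    ".flac", ".mp4", ".m4a", ".amr",
}


def is_supported_stt_file(path: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_STT_EXTENSIONS
```

An extension check only catches obvious mistakes early. It does not verify the actual codec inside the file.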

Batch STT (Alpha): Introduced alpha support for batch transcription in the Python SDK. Install via pip install sarvamai==0.1.11a2.

Saaras & Saarika WebSocket Support: Both saaras and saarika models now support real-time streaming via WebSocket.

Text to Speech (TTS)

Real-Time TTS Streaming: Generate speech on the fly using WebSocket streaming. Available in Python and JavaScript SDKs.

Audio Format Support Expanded: TTS now outputs in mp3, linear16, mulaw, alaw, opus, flac, aac, and wav.

SDKs

Python SDK v4.23.2: Includes real-time streaming support, batch transcription (alpha), new translation APIs, and model updates.

JavaScript SDK Updated: Added real-time STT & TTS WebSocket support and updated documentation with streaming quickstarts.

Models

New Model Added: Introduced sarvam-m to the model family.

Model Updates: saarika and saaras updated to v2.5 with character limit improvements and stability enhancements.

Documentation

Cookbooks Expanded: Added examples for sarvam-translate, updated the LID and chat completion cookbooks to use the SDK package, and refreshed the starter notebooks.

AI-Powered Docs Assistant: Added an AI assistant to the documentation search bar for instant Q&A and developer support.

Dashboard

Usage Analytics Dashboard: Released real-time API usage and credit tracking at dashboard.sarvam.ai/usage.


May 2025

Models

Sarvam-M Released

Introduced sarvam-m, a 24B open-weights hybrid model based on Mistral Small.

Saarika v2.5 Released

Improved voice quality and naturalness in saarika:v2.5, now available for use in the STT API.

TTS WebSocket (Beta)

Real-time streaming TTS now supported via WebSocket for beta users. Contact support to request access.

API & Dashboard

Upgraded Developer Dashboard

  • Unified dashboard with no-code playground and API key management in one place.
  • Easily test endpoints like LLM, TTS, STT, and Translate without writing code.
  • Prebuilt examples include Resume Translate and Hinglish Code Debug.

API Playground Enhancements

  • Added full support for live testing of Sarvam-M, Sarvam-Translate, and TTS/STT APIs.
  • Playground features instant feedback and parameter tuning without leaving the dashboard.

SDKs

Official SDKs Released

  • Python: pip install sarvamai
  • JavaScript: npm install sarvamai
  • Both SDKs abstract away HTTP calls and response parsing behind unified methods across APIs.
  • Intended to reduce integration time from hours to minutes.

Documentation

Revamped Developer Documentation

  • New cookbooks added with real-world SDK use cases including chat completion, translation, and speech.
  • Improved API navigation and content structure for faster discovery.
  • Code snippets updated for clarity and consistency with latest SDKs.
  • Exposed core API endpoints directly in sidebar navigation.
  • Streamlined structure to help developers reach reference and guides faster.

April 2025

Text to Speech (TTS)

Bulbul v2 Released

Introduced bulbul:v2, the latest version of our Indian Text-to-Speech model.

  • More natural and expressive voice output with better emotional tone
  • Enhanced preprocessing for improved handling of mixed-language inputs

Bulbul v1 Deprecation Notice

bulbul:v1 will be officially deprecated on April 30, 2025. Users should migrate to bulbul:v2 to ensure uninterrupted service.
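
One way to ease the migration is to route every model identifier through a small lookup, so older configuration values resolve to the replacement. This is only a sketch: the mapping and the function name are hypothetical, and only the bulbul:v1 and bulbul:v2 identifiers come from this entry.

```python
# Hypothetical table of deprecated model identifiers and their replacements.
DEPRECATED_MODELS = {"bulbul:v1": "bulbul:v2"}


def resolve_model(model: str) -> str:
    """Return the replacement for a deprecated model, or the model unchanged."""
    return DEPRECATED_MODELS.get(model, model)
```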

24kHz Audio Support

Speech generation now supports 24kHz sample rate for higher quality output.
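
To get a feel for what a 24kHz sample rate means for payload size, the arithmetic below estimates the size of uncompressed audio. It assumes 16-bit mono PCM, which this entry does not specify. Compressed output formats will be smaller.

```python
def pcm_bytes(duration_s: float, sample_rate_hz: int = 24_000,
              bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Estimate raw PCM size: seconds x samples/second x bytes/sample x channels."""
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)
```

Under these assumptions, ten seconds of speech comes to 10 × 24,000 × 2 = 480,000 bytes.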

Speech to Text (STT)

Batch ASR API Released

New Batch ASR API allows uploading up to 20 audio files (up to 60 minutes each).

  • Ideal for calls, meetings, and long-form media
  • Available in both speech-to-text and speech-to-text-translate endpoints
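
The limits above can be enforced on the client before submitting work. The sketch below groups files into batches of at most 20 and rejects any file longer than 60 minutes. It assumes the 20-file limit applies per batch job, and the function and constant names are illustrative rather than SDK names.

```python
MAX_FILES_PER_BATCH = 20    # from the changelog entry
MAX_MINUTES_PER_FILE = 60   # from the changelog entry


def plan_batches(durations_min: dict[str, float]) -> list[list[str]]:
    """Split file names into batches of at most 20, validating durations.

    durations_min maps a file name to its length in minutes.
    """
    too_long = [name for name, minutes in durations_min.items()
                if minutes > MAX_MINUTES_PER_FILE]
    if too_long:
        raise ValueError(f"Files exceed {MAX_MINUTES_PER_FILE} minutes: {too_long}")

    names = list(durations_min)
    return [names[i:i + MAX_FILES_PER_BATCH]
            for i in range(0, len(names), MAX_FILES_PER_BATCH)]
```

For 45 five-minute recordings this yields three batches of 20, 20, and 5 files.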

Real-Time API Improvements

Improved transcription speed for real-time requests.

  • 3× faster processing for up to 30s audio snippets
  • Optimized for use cases like voice bots and instant assistants

Streaming WebSocket (Beta)

WebSocket-based real-time transcription now in beta. Early access available via the Sarvam Discord community.

Text

Language Detection Enhancements

Added source_language_code to API responses.

  • Automatically detects input language
  • Improves performance on multilingual and code-switched text

Indic Translation Upgrades

Improved support for two-way translation between Indic and English.

  • Supports colloquial, modern, classical, and formal registers
  • Ideal for education, localization, and content creation platforms

New APIs Introduced

  • Transliterate API
    Converts text between writing systems while preserving pronunciation

  • Language Identification (LID) API
    Detects both language and script from input

Documentation

Cookbook Updated

  • Added new real-world examples for Batch ASR and Bulbul v2 usage
  • Included advanced translation flows (e.g., tone-aware translation)
  • Refreshed LID and transliteration notebooks
  • Integrated with latest SDK structure

Cleanup and Refactoring

Removed deprecated analytics and parse APIs from the SDK for better maintainability.


March 2025

Text

Translation & Transliteration Schema Update

Translation and Transliteration APIs now automatically detect the input language.

  • Responses now include source_language_code
  • Improves handling of multilingual and code-switched inputs
  • Enables simplified workflows with less manual language tagging
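
A caller can read the detected language from the response body as sketched below. Only the source_language_code field name comes from this entry. The JSON shape, the fallback value, and the function name are assumptions for illustration.

```python
import json


def detected_language(response_body: str, default: str = "unknown") -> str:
    """Return source_language_code from a JSON response body, if present."""
    payload = json.loads(response_body)
    return payload.get("source_language_code", default)
```

Falling back to a default keeps older cached responses, which lack the field, from raising a KeyError.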

SarvamParse

SarvamParse “Small” Mode Introduced

A lightweight variant of SarvamParse is now available.

  • Lower cost
  • Faster response time
  • Ideal for real-time or cost-sensitive applications

Language Identification

LID API Released

The new Language Identification (LID) endpoint detects both language and script from raw input text.


February 2025

APIs

Sarvam Parse API Released

A new API that transforms PDF documents into structured data.

  • Accepts PDF input and returns base64-encoded XML
  • Useful for information extraction, content indexing, and document analysis
  • API Reference: Sarvam Parse Docs
  • Cookbook: Parse PDF Notebook
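
Since the output arrives as base64-encoded XML, a consumer has to decode it before inspecting the structure. The following is a minimal sketch using only the Python standard library. The function name is hypothetical, and it makes no assumption about the element names in the actual XML.

```python
import base64
import xml.etree.ElementTree as ET


def decode_parse_output(encoded: str) -> ET.Element:
    """Decode a base64 string and parse the result as XML, returning the root."""
    xml_bytes = base64.b64decode(encoded)
    return ET.fromstring(xml_bytes)
```

The returned root element can then be traversed with the usual ElementTree methods such as iter() and find().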

Doc Translate API Released

Translate full PDF documents and receive structured translated output.

  • Returns translated content as base64-encoded XML
  • Ideal for cross-lingual access to documents in enterprise, government, or education
  • API Reference: Doc Translate Docs
  • Cookbook: Doc Translate Notebook

Speech to Text (STT)

Max Duration Limit Update

To improve responsiveness and lower latency for all users, the maximum duration per STT request has been updated.

  • New limit: 30 seconds per request (previously 8 minutes)
  • Applies to both saaras and saarika models
  • For longer audio requirements, contact the team to explore tailored solutions
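
Clients holding longer recordings may need to cut them into pieces that fit the new limit. The sketch below computes only the segment boundaries, in seconds, and leaves the actual audio slicing to whatever audio library is in use. The names are illustrative, and fixed-length cuts like these may split words at the boundaries.

```python
MAX_STT_SECONDS = 30.0  # per-request limit from the changelog entry


def segment_bounds(total_seconds: float,
                   limit: float = MAX_STT_SECONDS) -> list[tuple[float, float]]:
    """Return (start, end) pairs covering the audio, each at most `limit` long."""
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + limit, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds
```

A 75-second clip, for example, maps to segments of 0 to 30, 30 to 60, and 60 to 75 seconds.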

January 2025

Developer Experience

Meta-Prompt Introduced

Launched a universal meta-prompt to help guide any AI chat model in using Sarvam’s APIs effectively.

Sarvam AI Cookbook Launched

Released the official Sarvam AI Cookbook, an open-source repository with practical code examples and notebooks.

  • Covers use cases for STT, TTS, Translation, Parse, and more
  • Includes best practices, integration tips, and tutorials
  • Repository: Sarvam AI Cookbook on GitHub