Credits & Rate Limits

Credits

Sarvam offers ₹100 worth of free credits for every user on signup. These credits can be used across any of our APIs — explore, prototype, and build without upfront cost.

Credits are universal and never expire. Once exhausted, add more credits or upgrade your plan from the Sarvam Dashboard.


How Rate Limits Work

Rate limits restrict the number of API requests your account can make within a given time window. Key points:

  • Per-account enforcement — limits apply to your account as a whole, not individual API keys. All keys share the same rate limit pool.
  • Continuous replenishment — capacity refills steadily over the window period rather than resetting all at once (token bucket model).
  • Per-API granularity — each API has its own independent rate limits. WebSocket, Vision, and LLM APIs have different limits from standard REST APIs — check your specific API below.
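The continuous-replenishment behaviour described above can be mirrored on the client side so that requests are paced before they reach the API. The following is a minimal sketch of a token bucket, not Sarvam's server-side implementation; the class name `TokenBucket`, its parameters, and the 60-second window are assumptions made for illustration.

```python
import time


class TokenBucket:
    """Client-side token bucket: `capacity` requests per `window_seconds`.

    Hypothetical helper for illustration; the server's exact algorithm
    and window length are not specified beyond "token bucket model".
    """

    def __init__(self, capacity, window_seconds=60.0, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_rate = self.capacity / window_seconds  # tokens per second
        self.tokens = self.capacity  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add tokens in proportion to elapsed time, capped at capacity.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now

    def try_acquire(self):
        """Take one token if available; return True on success."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def seconds_until_available(self):
        """Return how long to wait before the next token is available."""
        self._refill()
        if self.tokens >= 1.0:
            return 0.0
        return (1.0 - self.tokens) / self.refill_rate
```

With a capacity of 60 per minute, a burst of 60 calls succeeds immediately, after which roughly one further request becomes available each second. Because limits are shared across all keys on an account, a single bucket per API (not per key) is the closer model.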

Per-API Rate Limits by Plan

Rate limits vary significantly by API type and plan. Review the limits for each API below before building your integration.

Speech to Text

Real-time REST (stt-rt)

| | Starter | Pro | Business |
|---|---|---|---|
| Rate Limit | 60 req/min | 100 req/min | 4,000 req/min |

WebSocket Streaming (stt-ws)

| | Starter | Pro | Business |
|---|---|---|---|
| Rate Limit | 20 concurrent | 100 concurrent | 100 concurrent |
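WebSocket limits are expressed as concurrent connections, not requests per minute, so a client typically caps the number of streams it keeps open at once. The sketch below assumes that "concurrent" means simultaneously open connections; `run_with_connection_cap` and the `jobs` argument are hypothetical names, and each job stands in for whatever coroutine opens and drives one streaming session.

```python
import asyncio


async def run_with_connection_cap(jobs, max_concurrent):
    """Run coroutine factories with at most `max_concurrent` active at once.

    `jobs` is a list of zero-argument callables that each return a coroutine
    (hypothetical stand-ins for one WebSocket session each).
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        # A job starts only when a slot is free; the slot is released
        # automatically when the job finishes or raises.
        async with semaphore:
            return await job()

    # Results are returned in the same order as `jobs`.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

For a Starter account using stt-ws, this would be called with `max_concurrent=20`, ideally with some headroom if other processes share the same account.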

Batch (stt-batch)

| | Starter | Pro | Business |
|---|---|---|---|
| Rate Limit | 20 req/min | 100 req/min | 500 req/min |

For batch endpoints, implement a minimum 5ms delay between consecutive status polling requests to avoid hitting rate limits unnecessarily.
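A polling loop that respects this minimum delay might look like the sketch below. It is an illustration only: `get_status` is a caller-supplied function (hypothetical) that fetches the job status, and the terminal status names `"Completed"` and `"Failed"` are assumptions, not values taken from the API reference.

```python
import time


def poll_until_done(get_status, min_delay_seconds=0.005, timeout_seconds=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() repeatedly, pausing between consecutive calls.

    min_delay_seconds defaults to the 5 ms minimum stated above; a longer
    delay is usually kinder to the batch rate limit.
    """
    terminal = {"Completed", "Failed"}  # assumed status names
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(min_delay_seconds)
```

Each status check still counts against the per-minute batch limit, so in practice a delay of a second or more is a reasonable choice for long-running jobs.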


Text to Speech

Real-time REST (tts-rt)

| | Starter | Pro | Business |
|---|---|---|---|
| Rate Limit | 60 req/min | 200 req/min | 1,000 req/min |

For the bulbul:v3 model specifically, the Starter rate limit is 30 req/min. Pro and Business limits are the same as the defaults above.

WebSocket Streaming (tts-ws)

| | Starter | Pro | Business |
|---|---|---|---|
| Rate Limit | 60 concurrent | 200 concurrent | 1,000 concurrent |

For the bulbul:v3 model specifically, the Starter limit is 30 concurrent connections. Pro and Business limits are the same as the defaults above.


Translation & Text Services

Translate (ms-ts)

| | Starter | Pro | Business |
|---|---|---|---|
| Rate Limit | 60 req/min | 200 req/min | 1,000 req/min |

Chat Completion (LLM)

Default models (ms-llm)

| | Starter | Pro | Business |
|---|---|---|---|
| Rate Limit | 60 req/min | 200 req/min | 1,000 req/min |

Sarvam-30B & Sarvam-105B models

These large models have lower limits due to their compute requirements.

| | Starter | Pro | Business |
|---|---|---|---|
| Rate Limit | 40 req/min | 60 req/min | 120 req/min |

Applies to: sarvam-30b, sarvam-105b


Vision

Vision API limits are uniform across all plans (Starter, Pro, and Business). Upgrading your plan does not increase Vision limits.

Document Intelligence (vis-doc-dig)

| | Starter | Pro | Business |
|---|---|---|---|
| Rate Limit | 10 req/min | 10 req/min | 10 req/min |

Vision Real-time (vis-rt)

| | Starter | Pro | Business |
|---|---|---|---|
| Rate Limit | 30 req/min | 30 req/min | 30 req/min |
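When planning an integration, it can help to turn the per-minute figures above into a minimum spacing between requests. The sketch below copies the default REST limits from the tables in this section into a lookup; the dictionary name, the lowercase plan keys, and the helper `min_interval_seconds` are illustrative assumptions, and the bulbul:v3 and Sarvam-30B/105B overrides are left out for brevity.

```python
# Requests per minute by API code and plan, copied from the tables above.
# The structure and key spelling are illustrative, not an official schema.
RATE_LIMITS_PER_MIN = {
    "stt-rt": {"starter": 60, "pro": 100, "business": 4000},
    "stt-batch": {"starter": 20, "pro": 100, "business": 500},
    "tts-rt": {"starter": 60, "pro": 200, "business": 1000},
    "ms-ts": {"starter": 60, "pro": 200, "business": 1000},
    "ms-llm": {"starter": 60, "pro": 200, "business": 1000},
    "vis-doc-dig": {"starter": 10, "pro": 10, "business": 10},
    "vis-rt": {"starter": 30, "pro": 30, "business": 30},
}


def min_interval_seconds(api, plan):
    """Average spacing between requests that stays within the listed limit."""
    return 60.0 / RATE_LIMITS_PER_MIN[api][plan]
```

For example, `min_interval_seconds("vis-doc-dig", "pro")` gives 6.0 seconds. Even spacing is a conservative simplification: under a token bucket model short bursts are possible, but the long-run average cannot exceed the listed rate. The Dashboard → Rate Limits page remains the authoritative source for an account's current values.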

Plan Overview

| | Starter | Pro | Business | Enterprise |
|---|---|---|---|---|
| Price | Pay as you go | ₹10,000 | ₹50,000 | Custom |
| Bonus Credits | — | ₹100 | ₹7,500 | Custom |
| Support | Community | Email | Email | Dedicated |
| Best For | Prototyping & testing | Startups & POCs | Production workloads | Scale deployments |

Rate limits are measured per account, not per API key. All keys under an account share the same limit pool. Your current limits are visible on the Dashboard → Rate Limits page.


Upgrading Your Limits

  1. Check your current limits. Visit the Dashboard → Rate Limits page to see your exact per-API limits.
  2. Upgrade your plan. Purchase a higher plan directly from the dashboard. Your rate limits increase immediately after the upgrade, with no downtime.
  3. Need custom limits? For limits beyond the Business tier, contact our team for an Enterprise arrangement.

Upgrade Plan

View plans and upgrade directly from the dashboard. Rate limits update instantly.

Enterprise & Custom Limits

Need higher rate limits, dedicated infrastructure, or custom SLAs? Talk to our team.


Managing Your Credits

If your credits are exhausted, API requests will return errors. You can add credits at any time — adding credits does not change your plan or rate limits.

  1. Add Credits — Top up from the Billing page at any time. Credits never expire.
  2. Upgrade Your Plan — Higher plans include bonus credits and increased rate limits.
  3. Enterprise — For volume discounts and custom billing arrangements, email developer@sarvam.ai.