Credits & Rate Limits

Credits

Every user receives ₹1,000 in free credits on signup. These credits can be used across any of our APIs, so you can explore, prototype, and build without upfront cost.

Credits are universal and never expire. Once they are exhausted, add more credits or upgrade your plan from the Sarvam Dashboard.


How Rate Limits Work

Rate limits restrict the number of API requests your account can make within a given time window. Key points:

  • Per-account enforcement — limits apply to your account as a whole, not individual API keys. All keys share the same rate limit pool.
  • Continuous replenishment — capacity refills steadily over the window period rather than resetting all at once (token bucket model). Short bursts may still trigger limits.
  • Per-API granularity — each API has its own independent concurrency limits. WebSocket, Vision, and LLM APIs have different limits from standard REST APIs — check your specific API below.
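The continuous-replenishment behaviour can be approximated on the client side so that requests are paced before they reach the server. The sketch below is a generic token bucket, not Sarvam's actual implementation: the class name, its parameters, and the choice of capacity are illustrative assumptions, and the server's real bucket parameters are not documented here.

```python
import time


class TokenBucket:
    """Client-side token bucket: capacity refills steadily, not all at once.

    Hypothetical helper for pacing requests; the rate and capacity you pass
    in are your own estimates, not values reported by the API.
    """

    def __init__(self, rate_per_min: float, capacity: float, clock=time.monotonic):
        self.rate_per_sec = rate_per_min / 60.0
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens in proportion to the time elapsed, capped at capacity.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens and return True if available, else return False."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def wait_time(self, cost: float = 1.0) -> float:
        """Seconds until `cost` tokens are available (0.0 if available now)."""
        self._refill()
        missing = cost - self.tokens
        return max(0.0, missing / self.rate_per_sec)
```

Because limits apply per account, a single bucket instance would need to be shared by every worker and every API key in your application for the pacing to be meaningful.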

Concurrency Modes

Each API enforces limits across three concurrency modes:

| Mode | What it means |
| --- | --- |
| Provisioned | The number of requests you can run simultaneously, guaranteed. This capacity is always available to you regardless of platform load. |
| Burst | The peak number of simultaneous requests during a short traffic spike. Temporary extra capacity for sudden surges. |
| High Throughput | The number of simultaneous requests you can sustain when the system is under heavy overall load. During peak platform traffic, your capacity may scale down to this level. |

Per-API Rate Limits by Plan

Rate limits vary significantly by API type and plan. Review the limits for each API below before building your integration.

Speech to Text

Real-time REST (stt-rt)

| Mode | Starter | Pro | Business |
| --- | --- | --- | --- |
| Provisioned | 60 req/min | 100 req/min | 4,000 req/min |
| Burst | 100 req/min | 200 req/min | 5,000 req/min |
| High Throughput | 5 req/min | 60 req/min | 1,000 req/min |

WebSocket Streaming (stt-ws)

| Mode | Starter | Pro | Business |
| --- | --- | --- | --- |
| Provisioned | 20 concurrent | 100 concurrent | 100 concurrent |
| Burst | 40 concurrent | 150 concurrent | 150 concurrent |
| High Throughput | 5 concurrent | 60 concurrent | 100 concurrent |
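Since the streaming limits are expressed as concurrent connections rather than requests per minute, one way to stay under them is to cap the number of sessions in flight on the client. The following is a minimal sketch using `asyncio.Semaphore`. The function `run_with_cap` and the `fake_stream` stand-in are hypothetical names for illustration, and no real Sarvam client call is shown.

```python
import asyncio


async def run_with_cap(jobs, max_concurrent: int):
    """Run zero-argument coroutine functions with at most `max_concurrent` in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        # Each job waits here until a slot is free.
        async with sem:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo(n_jobs: int, cap: int):
    """Simulate `n_jobs` streaming sessions and record the peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_stream():
        # Stand-in for opening a streaming session (an assumption, not a real client).
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return "done"

    results = await run_with_cap([fake_stream] * n_jobs, cap)
    return results, peak


if __name__ == "__main__":
    # 20 matches the Starter provisioned value for stt-ws in the table above.
    results, peak = asyncio.run(demo(n_jobs=50, cap=20))
    print(len(results), "sessions finished; peak concurrency:", peak)
```

A semaphore only limits sessions within one process. If several processes or API keys share the account, the cap would have to be divided among them.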

Batch (stt-batch)

| Mode | Starter | Pro | Business |
| --- | --- | --- | --- |
| Provisioned | 20 req/min | 100 req/min | 500 req/min |
| Burst | 50 req/min | 200 req/min | 1,000 req/min |
| High Throughput | 5 req/min | 60 req/min | 300 req/min |

For batch endpoints, wait at least 5 ms between consecutive status polling requests to avoid hitting rate limits unnecessarily.
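A polling loop that enforces the minimum delay might look like the sketch below. Everything here is an assumption for illustration: `get_status` stands for whatever call you use to fetch job status, `is_done` is a predicate you supply to recognise a finished job, and the timeout value is arbitrary. In practice a longer interval than the minimum is usually kinder to your request budget.

```python
import time


def poll_status(get_status, is_done, min_delay_s=0.005, timeout_s=300.0,
                sleep=time.sleep, clock=time.monotonic):
    """Call `get_status()` until `is_done(status)` is true.

    Consecutive polls are spaced at least `min_delay_s` apart. Raises
    TimeoutError if the job has not finished after `timeout_s` seconds.
    """
    deadline = clock() + timeout_s
    last_poll = None
    while True:
        if last_poll is not None:
            # Sleep only for the part of the delay that has not already elapsed.
            remaining = min_delay_s - (clock() - last_poll)
            if remaining > 0:
                sleep(remaining)
        last_poll = clock()
        status = get_status()
        if is_done(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting; the defaults use the standard library.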


Text to Speech

Real-time REST (tts-rt)

| Mode | Starter | Pro | Business |
| --- | --- | --- | --- |
| Provisioned | 60 req/min | 200 req/min | 1,000 req/min |
| Burst | 100 req/min | 300 req/min | 1,200 req/min |
| High Throughput | 5 req/min | 60 req/min | 800 req/min |

For the bulbul:v3 model specifically, the Starter provisioned limit is 30 req/min (burst: 50 req/min). Pro and Business limits are the same as the defaults above.
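If your application chooses its pacing from these numbers, the model-specific exception can be encoded as an override on top of the plan defaults. This is only a sketch of one possible client-side configuration. The names `TTS_RT_LIMITS`, `MODEL_OVERRIDES`, and `tts_rt_limits` are hypothetical, and the values are copied by hand from this page rather than fetched from the API.

```python
from typing import Dict, Optional

# Default tts-rt limits in requests per minute, copied from the table above.
TTS_RT_LIMITS: Dict[str, Dict[str, int]] = {
    "Starter": {"provisioned": 60, "burst": 100, "high_throughput": 5},
    "Pro": {"provisioned": 200, "burst": 300, "high_throughput": 60},
    "Business": {"provisioned": 1000, "burst": 1200, "high_throughput": 800},
}

# Per-model overrides. Only the Starter provisioned and burst values are
# documented for bulbul:v3. The high-throughput value is not stated, so the
# plan default is carried over here as an assumption.
MODEL_OVERRIDES = {
    ("Starter", "bulbul:v3"): {"provisioned": 30, "burst": 50},
}


def tts_rt_limits(plan: str, model: Optional[str] = None) -> Dict[str, int]:
    """Return the tts-rt limits for a plan, applying any model override."""
    limits = dict(TTS_RT_LIMITS[plan])  # copy so the defaults stay untouched
    limits.update(MODEL_OVERRIDES.get((plan, model), {}))
    return limits
```

For example, `tts_rt_limits("Starter", "bulbul:v3")["provisioned"]` gives 30, while the same call for Pro falls back to the default of 200. The Dashboard → Rate Limits page remains the authoritative source for your account.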

WebSocket Streaming (tts-ws)

| Mode | Starter | Pro | Business |
| --- | --- | --- | --- |
| Provisioned | 60 concurrent | 200 concurrent | 1,000 concurrent |
| Burst | 100 concurrent | 300 concurrent | 1,200 concurrent |
| High Throughput | 5 concurrent | 60 concurrent | 800 concurrent |

For the bulbul:v3 model specifically, the Starter provisioned limit is 30 concurrent (burst: 50 concurrent). Pro and Business limits are the same as the defaults above.


Translation & Text Services

Translate (ms-ts)

| Mode | Starter | Pro | Business |
| --- | --- | --- | --- |
| Provisioned | 60 req/min | 200 req/min | 1,000 req/min |
| Burst | 100 req/min | 300 req/min | 2,000 req/min |
| High Throughput | 5 req/min | 60 req/min | 1,000 req/min |

Chat Completion (LLM)

Default models (ms-llm)

| Mode | Starter | Pro | Business |
| --- | --- | --- | --- |
| Provisioned | 60 req/min | 200 req/min | 1,000 req/min |
| Burst | 100 req/min | 300 req/min | 2,000 req/min |
| High Throughput | 5 req/min | 60 req/min | 1,000 req/min |

Sarvam-30B & Sarvam-105B models

These large models have lower limits due to their compute requirements.

| Mode | Starter | Pro | Business |
| --- | --- | --- | --- |
| Provisioned | 40 req/min | 60 req/min | 120 req/min |
| Burst | 60 req/min | 80 req/min | 200 req/min |
| High Throughput | 5 req/min | 20 req/min | 50 req/min |

Applies to: sarvam-30b, sarvam-30b-16k, sarvam-105b, sarvam-105b-32k


Vision

Vision API limits are uniform across all plans (Starter, Pro, and Business). Upgrading your plan does not increase Vision limits.

Document Intelligence (vis-doc-dig)

| Mode | Starter | Pro | Business |
| --- | --- | --- | --- |
| Provisioned | 10 req/min | 10 req/min | 10 req/min |
| Burst | 20 req/min | 20 req/min | 20 req/min |
| High Throughput | 5 req/min | 5 req/min | 5 req/min |

Vision Real-time (vis-rt)

| Mode | Starter | Pro | Business |
| --- | --- | --- | --- |
| Provisioned | 30 req/min | 30 req/min | 30 req/min |
| Burst | 50 req/min | 50 req/min | 50 req/min |
| High Throughput | 15 req/min | 15 req/min | 15 req/min |

Plan Overview

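| | Starter | Pro | Business | Enterprise |
| --- | --- | --- | --- | --- |
| Price | Pay as you go | ₹10,000 | ₹50,000 | Custom |
| Bonus Credits | - | ₹1,000 | ₹7,500 | Custom |
| Support | Community | Email | Email | Dedicated |
| Best For | Prototyping & testing | Startups & POCs | Production workloads | Scale deployments |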

Concurrency limits are measured per account, not per API key. All keys under an account share the same limit pool. Your current limits are visible on the Dashboard → Rate Limits page.


Upgrading Your Limits

  1. Check your current limits. Visit the Dashboard → Rate Limits to see your exact per-API limits.
  2. Upgrade your plan. Purchase a higher plan directly from the dashboard. Your rate limits increase immediately after the upgrade, with no downtime.
  3. Need custom limits? For limits beyond the Business tier, contact our team for an Enterprise arrangement.


Managing Your Credits

If your credits are exhausted, API requests will return errors. You can add credits at any time — adding credits does not change your plan or rate limits.

  1. Add Credits — Top up from the Billing page at any time. Credits never expire.
  2. Upgrade Your Plan — Higher plans include bonus credits and increased rate limits.
  3. Enterprise — For volume discounts and custom billing arrangements, email developer@sarvam.ai.