Credits & Rate Limits

Credits

Sarvam gives every user ₹1,000 in free credits on signup. These credits can be used across any of our APIs, so you can explore, prototype, and build without upfront cost.

Credits are universal and never expire. Once exhausted, add more credits or upgrade your plan from the Sarvam Dashboard.


How Rate Limits Work

Rate limits restrict the number of API requests your account can make within a given time window. Key points:

  • Per-account enforcement — limits apply to your account as a whole, not individual API keys. All keys share the same rate limit pool.
  • Continuous replenishment — capacity refills steadily over the window period rather than resetting all at once (token bucket model). Short bursts may still trigger limits.
  • Per-API granularity — each API has its own concurrency limits across three modes (provisioned, burst, and high throughput), configured based on your plan tier.
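To make the continuous-replenishment point concrete, here is a minimal client-side sketch of a token bucket. It is an illustration only: the class name, `capacity`, and `refill_per_sec` are hypothetical, and the server's actual bucket parameters are not stated here.

```python
import time


class TokenBucket:
    """Illustrative token bucket: capacity refills steadily, not in one reset."""

    def __init__(self, capacity: float, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity              # hypothetical maximum burst size
        self.refill_per_sec = refill_per_sec  # hypothetical steady refill rate
        self._clock = clock
        self._tokens = capacity               # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens in proportion to elapsed time, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens and return True if available, else return False."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A client could call `try_acquire()` before each request and wait briefly when it returns `False`. This also shows why a short burst can still be rejected: once the bucket is drained, new capacity arrives only at the refill rate.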

Rate Limit Tiers

Rate limits are applied per account based on your subscription plan. Your current tier is visible on the Dashboard → Rate Limits page.

| Plan | Starter | Pro | Business | Enterprise |
|---|---|---|---|---|
| Price | Pay as you go | ₹10,000 | ₹50,000 | Custom |
| IP Rate Limit | 10,000 req/min | 10,000 req/min | 10,000 req/min | Custom |
| Bonus Credits | - | ₹1,000 | ₹7,500 | Custom |
| Support | Community | Email | Slack + Solutions Engineer | Dedicated |
| Best For | Prototyping & testing | Startups & POCs | Production workloads | Scale deployments |

Concurrency limits are measured per account, not per API key. All keys under an account share the same limit pool. Each API has its own provisioned, burst, and high throughput limits visible on the dashboard.


Per-API Concurrency Limits

Each API has its own concurrency limits across three modes. These are configured per account and visible on the Dashboard → Limits page.

| Mode | What it means |
|---|---|
| Provisioned | The number of requests you can run at the same time, guaranteed. This capacity is always available to you, no matter what. |
| Burst | The peak number of simultaneous requests you can send during a short traffic spike. Think of it as temporary extra capacity for sudden surges. |
| High Throughput | The number of simultaneous requests you can sustain when the system is under heavy overall load. During peak platform traffic, your capacity may scale down to this level. |
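Because these limits count simultaneous requests, one way to stay within them is to cap in-flight calls on the client. The sketch below uses `asyncio.Semaphore` for that. The helper `run_with_limit`, the stand-in `fake_request`, and the value `3` are assumptions for illustration; substitute your own API call and the limit shown on your dashboard.

```python
import asyncio


async def run_with_limit(jobs, max_concurrent: int):
    """Run coroutine factories with at most `max_concurrent` in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        # Each job waits here until one of the slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo():
    """Track peak concurrency with a fake request (no real API is called)."""
    in_flight = 0
    peak = 0

    async def fake_request():
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for network latency
        in_flight -= 1
        return "ok"

    results = await run_with_limit([fake_request] * 10, max_concurrent=3)
    return results, peak
```

Running `asyncio.run(demo())` executes ten fake requests while never allowing more than three at a time. Setting `max_concurrent` to your provisioned limit is the conservative choice, since that is the capacity described as always available.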

The following APIs each have independent concurrency limits configured per account:

  • Speech to Text (Real-time)
  • Speech to Text (Streaming)
  • Speech to Text (Batch)
  • Text to Speech (Real-time)
  • Text to Speech (Streaming)
  • Translate & Text Services
  • Chat Completion

Your exact per-API limits (provisioned, burst, and high throughput) are shown on the Dashboard → Limits page. Limits vary by plan and update instantly when you upgrade.

For batch endpoints (Speech-to-Text, Speech-to-Text-Translate), implement a minimum 5ms delay between consecutive status polling requests to avoid hitting rate limits unnecessarily.
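A polling loop that respects this minimum delay might look like the sketch below. The function `poll_until_done`, the `get_status` callable, and the status strings `"Completed"` and `"Failed"` are hypothetical placeholders, not part of the Sarvam API; only the 5 ms floor comes from the text above.

```python
import time

MIN_POLL_INTERVAL_S = 0.005  # the 5 ms minimum delay stated above


def poll_until_done(get_status, interval_s=1.0, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() repeatedly, pausing between calls, until a terminal state."""
    # Never poll faster than the documented floor, even if asked to.
    interval_s = max(interval_s, MIN_POLL_INTERVAL_S)
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("Completed", "Failed"):  # hypothetical terminal states
            return status
        if clock() + interval_s > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Here `get_status` would wrap whatever call returns your batch job's state. The default interval is set well above the floor on purpose: batch jobs rarely finish within milliseconds, so a longer interval costs little and leaves more of the shared account limit for other requests.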


Upgrading Your Limits

1. Check your current limits

Visit the Dashboard → Rate Limits to see your exact per-API limits.

2. Upgrade your plan

Purchase a higher plan directly from the dashboard. Your rate limits increase immediately after the upgrade, with no downtime.

3. Need custom limits?

For limits beyond Business tier, contact our team for an Enterprise arrangement.


Managing Your Credits

If your credits are exhausted, API requests will return errors. You can add credits at any time — adding credits does not change your plan or rate limits.

  1. Add Credits — Top up from the Billing page at any time. Credits never expire.

  2. Upgrade Your Plan — Higher plans include bonus credits and increased rate limits.

  3. Enterprise — For volume discounts and custom billing arrangements, email developer@sarvam.ai.