Sarvam Vision

Sarvam Vision is a 3B-parameter state-space Vision Language Model (VLM) purpose-built for high-accuracy Document Intelligence. It powers our Document Intelligence pipeline.

At a Glance

  • Model ID: sarvam-vision
  • What it does: Document intelligence (text extraction, table conversion, and structure preservation from PDFs and scans)
  • Languages: 23 (22 Indian + English); see Supported Languages
  • APIs: Document Digitization Batch API
  • Input limits: 10 pages per PDF, 200 MB per file; see Limits
  • Pricing: see the Pricing page
  • Best for: Digitizing scanned archives, Indic OCR, table-heavy documents
  • Known limitations: 10-page cap per job; split larger PDFs before uploading

Why Sarvam Vision?

One of the most challenging problems in vision AI today is accurate document intelligence for Indian languages. Much of India’s knowledge—historical texts, government records, academic papers, and cultural archives—remains locked in libraries, scanned collections, and legacy documents. Unlocking this vast repository is essential for preserving cultural heritage and making knowledge accessible.

While frontier Vision Language Models have set a high bar for processing modern English documents, a significant gap remains: most global models treat Indian languages as secondary, often resulting in lower accuracy for regional scripts. Sarvam Vision bridges this gap with native support for 22 Indian languages, delivering world-class accuracy where others fall short.

Want to learn more about how we built Sarvam Vision? Check out our blog post.


What You Can Do

  • Text Extraction: Extract text from PDFs and scanned documents in 23 languages (22 Indian + English)
  • Tables: Convert complex tables to HTML or Markdown
  • Structure Preservation: Maintain document layout, reading order, and hierarchies

Supported Languages

All 22 official Indian languages plus English:

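| Language | Code | Language | Code | Language | Code |
|---|---|---|---|---|---|
| Hindi | hi-IN | Assamese | as-IN | Konkani | kok-IN |
| Bengali | bn-IN | Urdu | ur-IN | Maithili | mai-IN |
| Tamil | ta-IN | Sanskrit | sa-IN | Sindhi | sd-IN |
| Telugu | te-IN | Nepali | ne-IN | Kashmiri | ks-IN |
| Marathi | mr-IN | Dogri | doi-IN | Manipuri | mni-IN |
| Gujarati | gu-IN | Bodo | brx-IN | Santali | sat-IN |
| Kannada | kn-IN | Punjabi | pa-IN | English | en-IN |
| Malayalam | ml-IN | Odia | od-IN | | |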
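
A job is created with one of these codes as its `language` value, so it can help to validate the code on the client before making a request. The sketch below copies the codes from the table; the names `SUPPORTED_LANGUAGES` and `validate_language` are hypothetical helpers, not part of the Sarvam SDK.

```python
# Language codes copied from the Supported Languages table.
# SUPPORTED_LANGUAGES and validate_language are illustrative names,
# not part of the sarvamai package.
SUPPORTED_LANGUAGES = {
    "hi-IN": "Hindi", "bn-IN": "Bengali", "ta-IN": "Tamil", "te-IN": "Telugu",
    "mr-IN": "Marathi", "gu-IN": "Gujarati", "kn-IN": "Kannada", "ml-IN": "Malayalam",
    "as-IN": "Assamese", "ur-IN": "Urdu", "sa-IN": "Sanskrit", "ne-IN": "Nepali",
    "doi-IN": "Dogri", "brx-IN": "Bodo", "pa-IN": "Punjabi", "od-IN": "Odia",
    "kok-IN": "Konkani", "mai-IN": "Maithili", "sd-IN": "Sindhi", "ks-IN": "Kashmiri",
    "mni-IN": "Manipuri", "sat-IN": "Santali", "en-IN": "English",
}


def validate_language(code: str) -> str:
    """Return the code unchanged if it appears in the table, else raise ValueError."""
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return code
```

Calling `validate_language("hi-IN")` returns the code unchanged, while an unlisted code such as `"fr-FR"` raises `ValueError` before any request is sent.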

Capabilities

High-Fidelity Document Intelligence

Sarvam Vision extracts text from documents with exceptional accuracy, preserving the original structure and reading order across 23 languages (22 Indian + English).

Features:

  • High-accuracy text extraction from PDFs and scanned documents
  • Preserves document layout and reading order
  • Native support for the scripts of all 22 supported Indian languages
  • Outputs clean HTML or Markdown

Quick Start

Get started with Document Intelligence: the example below creates a job for a Hindi document, uploads a PDF, waits for processing to finish, and downloads the extracted output.

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY"
)

# Create a document intelligence job
job = client.document_intelligence.create_job(
    language="hi-IN",
    output_format="md"
)
print(f"Job created: {job.job_id}")

# Upload document
job.upload_file("document.pdf")
print("File uploaded")

# Start processing
job.start()
print("Job started")

# Wait for completion
status = job.wait_until_complete()
print(f"Job completed with state: {status.job_state}")

# Get processing metrics
metrics = job.get_page_metrics()
print(f"Page metrics: {metrics}")

# Download output (ZIP file containing the processed document)
job.download_output("./output.zip")
print("Output saved to ./output.zip")
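
Once the ZIP is downloaded, a first step is usually to see what it contains. This page does not document the file names inside the archive, so the sketch below assumes nothing about them and only groups members by extension, using the standard library. `list_output_files` is a hypothetical helper name.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def list_output_files(zip_source):
    """Group the archive's member names by lowercase file extension.

    zip_source may be a path or a binary file-like object.
    The layout of the archive is not assumed; names are reported as found.
    """
    by_extension = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            extension = PurePosixPath(name).suffix.lstrip(".").lower()
            by_extension[extension].append(name)
    return dict(by_extension)
```

For example, `list_output_files("./output.zip")` returns a dictionary such as `{"md": [...], "json": [...]}` when the archive holds Markdown and JSON files; the actual names depend on what the service writes.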

Model Specifications

Technical Specifications
  • Model Size: 3B parameters
  • Supported Input Formats: PDF, PNG, JPG, ZIP (flat archive with JPG/PNG document pages)
  • Output Formats: HTML or Markdown (md), delivered as a ZIP file. JSON with structured page-level data is always included, regardless of the chosen output format.
  • Languages: 23 languages (22 Indian + English)
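
The input-format rule above, including the requirement that a ZIP be a flat archive of JPG/PNG pages, can be checked locally before upload. The following is a minimal sketch under two assumptions: formats are recognized by file extension, and only the `.jpg` spelling is accepted because that is what this page lists (whether `.jpeg` is accepted is not stated). Both function names are hypothetical.

```python
from pathlib import PurePosixPath

# Extensions taken from the "Supported Input Formats" line.
# Assumption: ".jpeg" is left out because the page lists only JPG.
INPUT_EXTENSIONS = {".pdf", ".png", ".jpg", ".zip"}
PAGE_IMAGE_EXTENSIONS = {".png", ".jpg"}


def is_supported_input(filename):
    """True if the file extension matches one of the listed input formats."""
    return PurePosixPath(filename).suffix.lower() in INPUT_EXTENSIONS


def zip_member_problems(member_names):
    """Return human-readable problems for a ZIP's member names (empty if none)."""
    problems = []
    for name in member_names:
        if "/" in name or "\\" in name:
            # A flat archive has no folders, so any path separator is a problem.
            problems.append(f"not flat: {name}")
        elif PurePosixPath(name).suffix.lower() not in PAGE_IMAGE_EXTENSIONS:
            problems.append(f"not a JPG/PNG page: {name}")
    return problems
```

To check a real archive, pass `zipfile.ZipFile("pages.zip").namelist()` to `zip_member_problems` and upload only if the returned list is empty.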

Limits

| Limit | Value |
|---|---|
| Max pages per PDF | 10 (returns 422 max_page_limit_exceeded if exceeded) |
| Max images per ZIP | 10 |
| Max file size | 200 MB |
| Supported input formats | PDF, PNG, JPG, ZIP |
| Rate limit | 10 requests/minute (all plans); see Rate Limits |
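
Because a PDF longer than 10 pages is rejected, longer documents need to be split into chunks of at most 10 pages, each submitted as its own job. The sketch below shows one way to plan and write those chunks. `plan_chunks` uses only the standard library; `split_pdf` assumes the third-party `pypdf` package is installed. Both names are hypothetical helpers, not part of the Sarvam SDK.

```python
MAX_PAGES_PER_PDF = 10  # from the Limits table


def plan_chunks(total_pages, max_pages=MAX_PAGES_PER_PDF):
    """Return (start, end) page-index pairs, end exclusive, covering all pages."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (start, min(start + max_pages, total_pages))
        for start in range(0, total_pages, max_pages)
    ]


def split_pdf(src_path, out_prefix):
    """Write each chunk to '<out_prefix>_<n>.pdf' and return the output paths."""
    # Assumption: pypdf is installed (pip install pypdf); it is imported here
    # so that plan_chunks stays usable without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    out_paths = []
    for n, (start, end) in enumerate(plan_chunks(len(reader.pages)), start=1):
        writer = PdfWriter()
        for index in range(start, end):
            writer.add_page(reader.pages[index])
        out_path = f"{out_prefix}_{n}.pdf"
        with open(out_path, "wb") as handle:
            writer.write(handle)
        out_paths.append(out_path)
    return out_paths
```

A 25-page document yields three chunks of 10, 10, and 5 pages. Each resulting file can then be submitted as a separate job, keeping the rate limit of 10 requests per minute in mind.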

Next Steps