
# Sarvam Vision

> Sarvam Vision - A 3B parameter multimodal model delivering world-class Document Intelligence and visual understanding with unmatched accuracy for 23 languages (22 Indian + English).

**Sarvam Vision** is a 3B parameter state-space Vision Language Model (VLM) purpose-built for high-accuracy Document Intelligence. It powers our Document Intelligence pipeline.

## Why Sarvam Vision?

One of the most challenging problems in vision AI today is accurate document intelligence for Indian languages. Much of India's knowledge—historical texts, government records, academic papers, and cultural archives—remains locked in libraries, scanned collections, and legacy documents. Unlocking this vast repository is essential for preserving cultural heritage and making knowledge accessible.

While frontier Vision Language Models have set a high bar for processing modern English documents, a significant gap remains: most global models treat Indian languages as secondary, often resulting in lower accuracy for regional scripts. **Sarvam Vision bridges this gap** with native support for 22 Indian languages, delivering world-class accuracy where others fall short.

Want to learn more about how we built Sarvam Vision? Check out our [blog post](https://www.sarvam.ai/blogs).

***

## What You Can Do

* **Text Extraction**: Extract text from PDFs and scanned documents in 23 languages (22 Indian + English)
* **Tables**: Convert complex tables to HTML or Markdown
* **Structure Preservation**: Maintain document layout, reading order, and hierarchies

***

## Supported Languages

All 22 official Indian languages plus English:

| Language  | Code    | Language | Code     | Language | Code     |
| --------- | ------- | -------- | -------- | -------- | -------- |
| Hindi     | `hi-IN` | Assamese | `as-IN`  | Konkani  | `kok-IN` |
| Bengali   | `bn-IN` | Urdu     | `ur-IN`  | Maithili | `mai-IN` |
| Tamil     | `ta-IN` | Sanskrit | `sa-IN`  | Sindhi   | `sd-IN`  |
| Telugu    | `te-IN` | Nepali   | `ne-IN`  | Kashmiri | `ks-IN`  |
| Marathi   | `mr-IN` | Dogri    | `doi-IN` | Manipuri | `mni-IN` |
| Gujarati  | `gu-IN` | Bodo     | `brx-IN` | Santali  | `sat-IN` |
| Kannada   | `kn-IN` | Punjabi  | `pa-IN`  | English  | `en-IN`  |
| Malayalam | `ml-IN` | Odia     | `od-IN`  |          |          |
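
When a language code comes from user input or configuration, it can help to check it against the table before creating a job. The following is a minimal sketch: the codes are copied from the table above, while the names `SUPPORTED_LANGUAGES` and `validate_language` are illustrative and not part of the SDK.

```python
# Language codes copied from the Supported Languages table.
# SUPPORTED_LANGUAGES and validate_language are hypothetical helper names.
SUPPORTED_LANGUAGES = {
    "hi-IN": "Hindi", "bn-IN": "Bengali", "ta-IN": "Tamil", "te-IN": "Telugu",
    "mr-IN": "Marathi", "gu-IN": "Gujarati", "kn-IN": "Kannada", "ml-IN": "Malayalam",
    "as-IN": "Assamese", "ur-IN": "Urdu", "sa-IN": "Sanskrit", "ne-IN": "Nepali",
    "doi-IN": "Dogri", "brx-IN": "Bodo", "pa-IN": "Punjabi", "od-IN": "Odia",
    "kok-IN": "Konkani", "mai-IN": "Maithili", "sd-IN": "Sindhi", "ks-IN": "Kashmiri",
    "mni-IN": "Manipuri", "sat-IN": "Santali", "en-IN": "English",
}


def validate_language(code: str) -> str:
    """Return the code unchanged if it appears in the table, else raise ValueError."""
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return code
```

The validated code can then be passed as the `language` argument shown in the Quick Start.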

***

## Capabilities

<h3>
  High-Fidelity Document Intelligence
</h3>

<p>
  Sarvam Vision extracts text from documents with exceptional accuracy, preserving the original structure and reading order across 23 languages (22 Indian + English).
</p>

**Features:**

* High-accuracy text extraction from PDFs and scanned documents
* Preserves document layout and reading order
* Native support for the scripts of all 22 supported Indian languages
* Outputs clean HTML or Markdown

<h3>
  Mastering Complex Tables
</h3>

<p>
  Financial reports and scientific papers are notorious for complex tables—merged cells, multi-level headers, and invisible borders. Where traditional tools scramble this data into a jumbled mess, Sarvam Vision understands the spatial relationships.
</p>

**Features:**

* Preserves row and column structure
* Handles merged cells and multi-level headers
* Supports invisible borders and complex layouts
* Outputs clean HTML or Markdown tables
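
Once a table arrives as HTML, merged header cells usually need to be expanded before the data can be loaded into rows of equal width. The sketch below uses only Python's standard `html.parser` and repeats a cell's text across its `colspan`. The exact markup Sarvam Vision emits is not specified here, so the sample HTML in the usage note and the names `TableReader` and `read_table` are assumptions for illustration. `rowspan` is not handled.

```python
from html.parser import HTMLParser


class TableReader(HTMLParser):
    """Collect each <tr> as a list of cell strings, repeating a cell across its colspan."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the cell being read, or None
        self._span = 1      # colspan of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            self._span = int(dict(attrs).get("colspan") or 1)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            self._row.extend([text] * self._span)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def read_table(html: str) -> list[list[str]]:
    """Parse one HTML table into a list of rows (hypothetical helper)."""
    reader = TableReader()
    reader.feed(html)
    reader.close()
    return reader.rows
```

For example, calling `read_table` on a made-up table whose first row is a single header cell with `colspan="2"` yields a two-column first row with the header text repeated, so every row has the same width.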

<h3>
  End-to-End Indic Support
</h3>

<p>
  Unlike other models that force translation to English, Sarvam Vision supports both input and output in all 23 languages (22 Indian + English).
</p>

**Examples:**

* Marathi financial report → Structured Marathi content
* Tamil official document → Tamil structured output
* Bengali textbook → Full Bengali structured output

***

## Quick Start

Create a Document Intelligence job, upload a document, and download the extracted output. The same flow applies to every supported language; the examples below use Hindi (`hi-IN`) with Markdown output.

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY"
)

# Create a document intelligence job
job = client.document_intelligence.create_job(
    language="hi-IN",
    output_format="md"
)
print(f"Job created: {job.job_id}")

# Upload document
job.upload_file("document.pdf")
print("File uploaded")

# Start processing
job.start()
print("Job started")

# Wait for completion
status = job.wait_until_complete()
print(f"Job completed with state: {status.job_state}")

# Get processing metrics
metrics = job.get_page_metrics()
print(f"Page metrics: {metrics}")

# Download output (ZIP file containing the processed document)
job.download_output("./output.zip")
print("Output saved to ./output.zip")
```

```javascript
import { SarvamAIClient } from "sarvamai";

const client = new SarvamAIClient({
    apiSubscriptionKey: "YOUR_SARVAM_API_KEY"
});

async function main() {
    // Create a document intelligence job
    const job = await client.documentIntelligence.createJob({
        language: "hi-IN",
        outputFormat: "md"
    });
    console.log(`Job created: ${job.jobId}`);

    // Upload document
    await job.uploadFile("document.pdf");
    console.log("File uploaded");

    // Start processing
    await job.start();
    console.log("Job started");

    // Wait for completion
    const status = await job.waitUntilComplete();
    console.log(`Job completed with state: ${status.job_state}`);

    // Get processing metrics
    const metrics = job.getPageMetrics();
    console.log("Page metrics:", metrics);

    // Download output (ZIP file containing the processed document)
    await job.downloadOutput("./output.zip");
    console.log("Output saved to ./output.zip");
}

main();
```
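
The Quick Start saves the result as a ZIP file. A minimal sketch of unpacking it with Python's standard library follows. The helper name `extract_output` and the `./output` directory are illustrative, and the file names inside the archive are not specified here, so the code groups whatever it finds by extension instead of assuming a layout.

```python
import zipfile
from pathlib import Path


def extract_output(zip_path: str, dest_dir: str) -> dict[str, list[Path]]:
    """Extract the archive and return the extracted files grouped by extension."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)

    grouped: dict[str, list[Path]] = {}
    for path in sorted(dest.rglob("*")):
        if path.is_file():
            grouped.setdefault(path.suffix.lower(), []).append(path)
    return grouped


# Usage, after the Quick Start has written ./output.zip:
#   files = extract_output("./output.zip", "./output")
#   for extension, paths in files.items():
#       print(extension, [p.name for p in paths])
```

Because JSON with page-level data is included alongside the chosen format, grouping by extension makes it easy to pick up both the Markdown or HTML files and the JSON files in one pass.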

***

## Model Specifications

<ul>
  <li>
    <strong>Model Size</strong>: 3B parameters
  </li>

  <li>
    <strong>Supported Input Formats</strong>: PDF, PNG, JPG, ZIP (flat archive with JPG/PNG document pages)
  </li>

  <li>
    <strong>Output Formats</strong>: HTML or Markdown (md), delivered as a ZIP file. JSON with structured page-level data is always included, regardless of the chosen output format.
  </li>

  <li>
    <strong>Languages</strong>: 23 languages (22 Indian + English)
  </li>
</ul>
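
For scanned pages stored as separate images, the ZIP input must be a flat archive of JPG/PNG files. A minimal sketch of building one with Python's standard library is shown below. The helper name `build_flat_zip` is illustrative, and ordering pages by sorted file name is an assumption made here, not documented behavior.

```python
import zipfile
from pathlib import Path

# Extensions taken from the input formats listed above.
PAGE_EXTENSIONS = {".jpg", ".png"}


def build_flat_zip(image_dir: str, zip_path: str) -> list[str]:
    """Write every JPG/PNG directly under image_dir into a ZIP with no subfolders.

    Returns the archived file names in the order they were written.
    Assumption: sorted file names reflect the intended page order.
    """
    pages = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in PAGE_EXTENSIONS
    )
    if not pages:
        raise ValueError(f"No JPG/PNG pages found in {image_dir!r}")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for page in pages:
            # arcname=page.name drops the directory part, keeping the archive flat.
            archive.write(page, arcname=page.name)
    return [p.name for p in pages]
```

Zero-padded names such as `page_001.png` keep the sorted order aligned with the page order. The resulting archive can be passed to `upload_file` in the same way as the PDF in the Quick Start.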

***

## Next Steps

* Learn how to integrate Document Intelligence into your application.
* Read the complete API documentation for Document Intelligence endpoints.
* Get your API key and start processing documents.