Document Intelligence Overview

Sarvam’s Document Intelligence API provides enterprise-grade document processing powered by Sarvam Vision, our state-of-the-art multimodal model.

Transform any document into structured, searchable, and machine-readable data with world-class accuracy.


What is Document Intelligence?

Document Intelligence is a comprehensive document processing pipeline powered by Sarvam Vision that:

  1. Extracts Text: High-fidelity text extraction across 23 languages (22 Indian + English)
  2. Preserves Structure: Maintains document layout, reading order, and hierarchies
  3. Parses Tables: Transforms tables into structured HTML or Markdown formats
  4. Outputs Structured Data: Generates clean, machine-readable HTML or Markdown output

Key Features

23 Language Support

Native support for all 22 constitutionally recognized Indian languages plus English, with script-native accuracy.

Multiple Output Formats

Export to HTML or Markdown files, delivered as a ZIP archive with clean, structured formatting.

Table Extraction

Intelligent table detection and conversion to structured formats.

Batch Processing

Process multi-page documents and ZIP archives with automatic page handling.

Layout Preservation

Intelligent reading order detection and complex layout handling.

Enterprise-Ready

Scalable API with job management, progress tracking, and error handling.


Supported Languages

Document Intelligence supports 23 languages: the 22 constitutionally recognized Indian languages and English. Codes and scripts for the languages listed on this page:

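| Language  | Code  | Script     |
|-----------|-------|------------|
| Hindi     | hi-IN | Devanagari |
| Bengali   | bn-IN | Bengali    |
| Tamil     | ta-IN | Tamil      |
| Telugu    | te-IN | Telugu     |
| Marathi   | mr-IN | Devanagari |
| Gujarati  | gu-IN | Gujarati   |
| Kannada   | kn-IN | Kannada    |
| Malayalam | ml-IN | Malayalam  |
| Odia      | od-IN | Odia       |
| Punjabi   | pa-IN | Gurmukhi   |
| English   | en-IN | Latin      |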
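
A client can check a language code locally before creating a job. The following is a minimal sketch: `LISTED_LANGUAGES` and `resolve_language` are hypothetical names, not part of the Sarvam SDK, and the lookup contains only the codes shown in the table above, so it would need extending to cover the remaining supported languages.

```python
# Codes copied from the language table above. This is NOT the full set of
# 23 supported languages, only the ones listed on this page.
LISTED_LANGUAGES = {
    "hi-IN": ("Hindi", "Devanagari"),
    "bn-IN": ("Bengali", "Bengali"),
    "ta-IN": ("Tamil", "Tamil"),
    "te-IN": ("Telugu", "Telugu"),
    "mr-IN": ("Marathi", "Devanagari"),
    "gu-IN": ("Gujarati", "Gujarati"),
    "kn-IN": ("Kannada", "Kannada"),
    "ml-IN": ("Malayalam", "Malayalam"),
    "od-IN": ("Odia", "Odia"),
    "pa-IN": ("Punjabi", "Gurmukhi"),
    "en-IN": ("English", "Latin"),
}


def resolve_language(code: str) -> str:
    """Normalize casing (e.g. 'HI-in' to 'hi-IN') and reject unlisted codes."""
    language, _, region = code.strip().partition("-")
    normalized = f"{language.lower()}-{region.upper()}"
    if normalized not in LISTED_LANGUAGES:
        raise ValueError(f"Language code not in the listed set: {code!r}")
    return normalized


if __name__ == "__main__":
    code = resolve_language("HI-in")
    name, script = LISTED_LANGUAGES[code]
    print(f"{code}: {name} ({script})")
```

Failing fast on an unlisted code avoids creating a job that the API would reject or process with the wrong language.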

Supported Input Formats

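| Format | Extension   | Description                                              |
|--------|-------------|----------------------------------------------------------|
| PDF    | .pdf        | Multi-page PDF documents                                 |
| PNG    | .png        | Document page images                                     |
| JPEG   | .jpg, .jpeg | Document page images                                     |
| ZIP    | .zip        | Flat archive containing document page images (JPG/PNG)   |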

For ZIP files, include only JPG and PNG document pages in a flat structure (no nested folders). The API will process all pages in the archive and maintain page order based on filename.
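
The ZIP requirements above (flat structure, JPG/PNG only, page order taken from filenames) can be met with Python's standard `zipfile` module. This is a sketch under one stated assumption: the page only says order is "based on filename", so the helper renames pages with zero-padded numbers to make any reasonable filename sort match the intended order. `build_flat_zip` is a hypothetical helper, not an SDK function.

```python
import zipfile
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_flat_zip(page_paths, zip_path):
    """Pack page images into a flat ZIP; returns the archive member names.

    Pages are renamed page_0001.ext, page_0002.ext, ... in the order given,
    so filename order equals page order (assumed to be what the API sorts on).
    """
    pages = [Path(p) for p in page_paths]
    # Validate everything first so no partial archive is left behind.
    for page in pages:
        if page.suffix.lower() not in ALLOWED_EXTENSIONS:
            raise ValueError(f"Unsupported page type: {page.name}")

    width = max(4, len(str(len(pages))))
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, page in enumerate(pages, start=1):
            # arcname has no directory component, which keeps the archive flat.
            arcname = f"page_{index:0{width}d}{page.suffix.lower()}"
            archive.write(page, arcname=arcname)
            names.append(arcname)
    return names
```

The resulting archive can then be passed to `job.upload_file(...)` in the same way as a PDF.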


Quick Start

Get started with Document Intelligence in minutes:

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY"
)

# Create a Document Intelligence job
job = client.document_intelligence.create_job(
    language="hi-IN",    # Target language (BCP-47 format)
    output_format="md"   # Output format: "html" or "md" (delivered as ZIP)
)

# Upload your document
job.upload_file("document.pdf")

# Start processing
job.start()

# Wait for completion
status = job.wait_until_complete()
print(f"Job completed: {status.job_state}")

# Get processing metrics
metrics = job.get_page_metrics()
print(f"Pages processed: {metrics['pages_processed']}")

# Download the output (ZIP file containing the processed document)
job.download_output("./output.zip")
print("Output saved to ./output.zip")

Response Format

Job Status Response

{
  "job_id": "abc123-def456-ghi789",
  "job_state": "Completed",
  "created_at": "2026-02-04T10:30:00Z",
  "updated_at": "2026-02-04T10:35:00Z",
  "page_metrics": {
    "total_pages": 10,
    "pages_processed": 10,
    "pages_succeeded": 10,
    "pages_failed": 0
  }
}

Job States

| State              | Description                          |
|--------------------|--------------------------------------|
| Accepted           | Job created, awaiting file upload    |
| Pending            | File uploaded, waiting to start      |
| Running            | Job is being processed               |
| Completed          | All pages processed successfully     |
| PartiallyCompleted | Some pages succeeded, some failed    |
| Failed             | All pages failed or job-level error  |
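
The states above form a simple lifecycle, which client code can model explicitly instead of comparing raw strings. The sketch below is illustrative only: the `JobState` enum, `is_terminal`, and `next_action` are hypothetical names, and the suggested actions are one reading of the state descriptions rather than documented SDK behavior.

```python
from enum import Enum


class JobState(str, Enum):
    """State names exactly as they appear in the job_state field."""
    ACCEPTED = "Accepted"
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETED = "Completed"
    PARTIALLY_COMPLETED = "PartiallyCompleted"
    FAILED = "Failed"


# States after which the job will not change any further.
TERMINAL_STATES = {
    JobState.COMPLETED,
    JobState.PARTIALLY_COMPLETED,
    JobState.FAILED,
}

# Suggested client action per state (an interpretation of the table above).
NEXT_ACTION = {
    JobState.ACCEPTED: "upload_file",
    JobState.PENDING: "start",
    JobState.RUNNING: "wait",
    JobState.COMPLETED: "download_output",
    JobState.PARTIALLY_COMPLETED: "review_failed_pages",
    JobState.FAILED: "inspect_error",
}


def is_terminal(state: str) -> bool:
    """True once the job has finished; raises ValueError on unknown states."""
    return JobState(state) in TERMINAL_STATES


def next_action(state: str) -> str:
    """Map a job_state string to the suggested next client action."""
    return NEXT_ACTION[JobState(state)]
```

Raising on an unknown state, rather than treating it as "still running", keeps a polling loop from spinning forever if a new state is introduced.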

Error Handling

from sarvamai import SarvamAI
from sarvamai.core.api_error import ApiError

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

try:
    job = client.document_intelligence.create_job(
        language="hi-IN",
        output_format="md"
    )
    job.upload_file("document.pdf")
    job.start()
    status = job.wait_until_complete()

    if status.job_state == "Completed":
        job.download_output("./output.zip")
        print("Output saved to ./output.zip")
    else:
        # Reached for both PartiallyCompleted and Failed
        print(f"Job did not complete successfully: {status}")

except ApiError as e:
    if e.status_code == 400:
        print(f"Bad request: {e.body}")
    elif e.status_code == 403:
        print("Invalid or missing API key")
    elif e.status_code == 429:
        print("Rate limit or quota exceeded")
    else:
        print(f"Error {e.status_code}: {e.body}")
except FileNotFoundError:
    print("Document file not found")

Error Codes

| HTTP Status | Error Code                 | Description                                   |
|-------------|----------------------------|-----------------------------------------------|
| 400         | invalid_request_error      | Invalid parameters or missing required fields |
| 403         | invalid_api_key_error      | Invalid or missing API key                    |
| 404         | not_found_error            | Job not found                                 |
| 422         | unprocessable_entity_error | Invalid file format or corrupted file         |
| 429         | insufficient_quota_error   | Rate limit or quota exceeded                  |
| 500         | internal_server_error      | Server error, retry the request               |
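
The table marks 500 as retryable, and the other 4xx codes point to problems that a retry will not fix. A retry wrapper with exponential backoff might look like the sketch below. Two assumptions: treating 429 as retryable after a delay is our choice (the table does not say so, and a quota error may never clear), and `call_with_retries` is a hypothetical helper that only relies on the exception exposing a `status_code` attribute, as `ApiError` does in the example above.

```python
import time

# 500 is documented as "retry the request"; 429 is included as an assumption.
RETRYABLE_STATUS_CODES = {429, 500}


def backoff_delays(max_attempts, base_delay=1.0, cap=30.0):
    """Delays between attempts: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base_delay * 2 ** i) for i in range(max_attempts - 1)]


def call_with_retries(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `operation()`, retrying only on retryable HTTP status codes.

    Any exception without a retryable `status_code` attribute is re-raised
    immediately, as is the final failure once attempts are exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts, base_delay)
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            is_last_attempt = attempt == max_attempts - 1
            if status not in RETRYABLE_STATUS_CODES or is_last_attempt:
                raise
            sleep(delays[attempt])
```

A call such as `call_with_retries(lambda: job.start())` would then retry transient server errors while letting a 400 or 403 surface immediately.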

Best Practices

Choose the Right Format

Use Markdown for human-readable output and HTML for web rendering and rich formatting.

Specify Language

Always specify the correct language code for optimal text extraction accuracy, especially for Indian languages.

Handle Large Documents

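For large documents, monitor page_metrics to track progress: compare pages_processed with total_pages, and check pages_failed so that a PartiallyCompleted job is handled rather than treated as a full success.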
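
As an illustration, progress and partial failures can be derived from the `page_metrics` object shown in the Job Status Response. The field names come from that sample response; `summarize_page_metrics` itself is a hypothetical helper, not an SDK function.

```python
def summarize_page_metrics(page_metrics: dict) -> dict:
    """Reduce a page_metrics dict to progress and failure indicators."""
    total = page_metrics["total_pages"]
    processed = page_metrics["pages_processed"]
    failed = page_metrics["pages_failed"]

    # Guard against division by zero before the page count is known.
    percent_done = round(100.0 * processed / total, 1) if total else 0.0
    return {
        "percent_done": percent_done,
        "pages_remaining": total - processed,
        "has_failures": failed > 0,
    }


if __name__ == "__main__":
    sample = {
        "total_pages": 10,
        "pages_processed": 4,
        "pages_succeeded": 3,
        "pages_failed": 1,
    }
    print(summarize_page_metrics(sample))
```

Checking `has_failures` during a run lets a caller log or flag failed pages early instead of waiting for the final job state.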

Use HTML for Tables

Choose the HTML output format when you need to preserve table structure and rich formatting.


Next Steps