Document Intelligence Overview
Sarvam’s Document Intelligence API provides enterprise-grade document processing powered by Sarvam Vision, our state-of-the-art multimodal model.
Transform any document into structured, searchable, and machine-readable data with world-class accuracy.
What is Document Intelligence?
Document Intelligence is a comprehensive document processing pipeline powered by Sarvam Vision that:
- Extracts Text: High-fidelity text extraction across 23 languages (22 Indian + English)
- Preserves Structure: Maintains document layout, reading order, and hierarchies
- Parses Tables: Transforms tables into structured HTML or Markdown formats
- Outputs Structured Data: Generates clean, machine-readable HTML or Markdown output
Key Features
Native support for all 22 constitutionally recognized Indian languages and English, with script-native accuracy.
Export to HTML or Markdown files, delivered as a ZIP archive with clean, structured formatting.
Intelligent table detection and conversion to structured formats.
Process multi-page documents and ZIP archives with automatic page handling.
Intelligent reading order detection and complex layout handling.
Scalable API with job management, progress tracking, and error handling.
Supported Languages
Document Intelligence supports all 22 constitutionally recognized Indian languages, plus English:
Primary Languages
Additional Languages
Supported Input Formats
For ZIP files, include only JPG and PNG document pages in a flat structure (no nested folders). The API will process all pages in the archive and maintain page order based on filename.
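The ZIP requirements above can be met with a small packaging step. The following is a minimal sketch using only the Python standard library; the helper name `build_flat_zip` and the `page_NNN` naming scheme are illustrative assumptions, not part of the API. It also assumes that "page order based on filename" means a plain filename sort, which is why the page numbers are zero-padded.

```python
import zipfile
from pathlib import Path

# The text above names JPG and PNG only; extend this set if the API
# is confirmed to accept other extensions such as .jpeg.
ALLOWED_SUFFIXES = {".jpg", ".png"}


def build_flat_zip(page_paths, zip_path):
    """Pack page images into a ZIP with no folders, in the given page order.

    Hypothetical helper: entries are renamed page_001, page_002, ... so
    that sorting by filename reproduces the intended page order.
    """
    page_paths = [Path(p) for p in page_paths]
    for path in page_paths:
        if path.suffix.lower() not in ALLOWED_SUFFIXES:
            raise ValueError(f"unsupported page type: {path.name}")

    # Zero-pad so that page_010 sorts after page_009, not after page_001.
    width = max(3, len(str(len(page_paths))))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, path in enumerate(page_paths, start=1):
            # arcname has no directory part, so the archive stays flat
            # even when the source files live in nested folders.
            arcname = f"page_{index:0{width}d}{path.suffix.lower()}"
            archive.write(path, arcname=arcname)
    return Path(zip_path)
```

Pass the pages in reading order, for example `build_flat_zip(["scans/cover.png", "scans/body/p1.jpg"], "document.zip")`, and upload the resulting archive.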
Quick Start
Get started with Document Intelligence in minutes:
Response Format
Job Status Response
Job States
Error Handling
Error Handling Example
Error Codes
Best Practices
Use Markdown for human-readable output and HTML for web rendering and rich formatting.
Always specify the correct language code for optimal text extraction accuracy, especially for Indian languages.
For large documents, monitor page_metrics to track progress and handle partial failures gracefully.
Choose the HTML output format when table structure and rich formatting must be preserved.
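As an illustration of the `page_metrics` practice above, the sketch below turns a metrics object into a progress percentage and a partial-failure flag. The key names `total_pages`, `pages_succeeded`, and `pages_failed` are assumptions made for this example; check the job status response for the actual field names before relying on them.

```python
def summarize_page_metrics(page_metrics):
    """Summarize progress from a page_metrics dictionary.

    The keys read here (total_pages, pages_succeeded, pages_failed) are
    hypothetical placeholders for whatever the job status response returns.
    """
    total = int(page_metrics.get("total_pages", 0))
    succeeded = int(page_metrics.get("pages_succeeded", 0))
    failed = int(page_metrics.get("pages_failed", 0))

    finished = succeeded + failed
    percent = 100.0 * finished / total if total else 0.0
    return {
        "percent_complete": round(percent, 1),
        "pending_pages": max(total - finished, 0),
        # Some pages failed while others succeeded: keep the usable
        # output and retry or report only the failed pages.
        "partial_failure": failed > 0 and succeeded > 0,
    }
```

Calling this on each status poll gives a simple way to report progress on large documents and to decide whether a finished job needs a selective retry instead of a full resubmission.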