Start Document Intelligence Job

{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "job_state": "Accepted",
  "created_at": "2024-01-15T09:30:00Z",
  "updated_at": "2024-01-15T09:30:00Z",
  "storage_container_type": "Azure",
  "total_files": 1,
  "successful_files_count": 0,
  "failed_files_count": 0,
  "error_message": "",
  "job_details": [
    {
      "inputs": [
        {
          "file_name": "invoice_2024_01.pdf",
          "file_id": "file_1234567890abcdef"
        }
      ],
      "outputs": [
        {
          "file_name": "invoice_2024_01_output.json",
          "file_id": "file_abcdef1234567890"
        }
      ],
      "state": "Pending",
      "total_pages": 5,
      "pages_processed": 0,
      "pages_succeeded": 0,
      "pages_failed": 0,
      "error_message": "",
      "error_code": null,
      "page_errors": []
    }
  ]
}

Validates the uploaded file and starts processing.

Validation Checks:

File must be uploaded before starting
File size must not exceed 200 MB
PDF must be parseable by the PDF parser
ZIP must contain only JPEG/PNG images
ZIP must be flat (no nested folders beyond one level)
ZIP must contain at least one valid image
Page/image count must not exceed 10 (returns 422 with max_page_limit_exceeded if exceeded)
User must have sufficient credits
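
Some of the ZIP checks above can be approximated on the client before uploading, which avoids spending a request on an archive that would be rejected. The following is a minimal sketch, not the service's own validator: `precheck_zip` and its messages are hypothetical names, images are recognized by file extension only (the server presumably inspects the content), and "flat" is read here as allowing at most one folder level.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_FILE_BYTES = 200 * 1024 * 1024  # 200 MB limit from the checks above
MAX_PAGES = 10                      # page/image limit from the checks above
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: extension-based check


def precheck_zip(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if len(data) > MAX_FILE_BYTES:
        problems.append("file exceeds 200 MB")
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return problems + ["not a readable ZIP archive"]
    with archive:
        # Directory entries are skipped; only real files are inspected.
        names = [i.filename for i in archive.infolist() if not i.is_dir()]
    images = [n for n in names if PurePosixPath(n).suffix.lower() in IMAGE_SUFFIXES]
    if len(images) != len(names):
        problems.append("ZIP contains non-JPEG/PNG files")
    # Assumption: "folder/image.png" is allowed, "a/b/image.png" is not.
    if any(len(PurePosixPath(n).parts) > 2 for n in names):
        problems.append("ZIP has folders nested more than one level deep")
    if not images:
        problems.append("ZIP contains no images")
    if len(images) > MAX_PAGES:
        problems.append("more than 10 images (max_page_limit_exceeded)")
    return problems
```

A caller would read the archive with `Path(...).read_bytes()`, pass the bytes to `precheck_zip`, and upload only if the returned list is empty. The server-side checks remain authoritative.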

Processing: The job runs asynchronously. Poll the status endpoint, or use a webhook callback, to learn when it has completed.
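
Because the job is asynchronous, a client typically starts it and then polls until it reaches a final state. The sketch below shows one way to structure that loop. Several parts are assumptions: the status endpoint is not described in this section, so the caller supplies a hypothetical `fetch_status` callable that returns the job JSON as a dict; the terminal state names are guesses (only `Accepted` appears in the example response); and `start_url` follows the path shape of the curl example, with `job_id` treated as the path parameter.

```python
import time
from typing import Callable

BASE_URL = "https://api.sarvam.ai/doc-digitization/job/v1"

# Assumed terminal state names; this section only shows "Accepted".
ASSUMED_TERMINAL_STATES = frozenset({"Completed", "Failed"})


def start_url(job_id: str) -> str:
    """Build the start endpoint URL for a given job."""
    return f"{BASE_URL}/{job_id}/start"


def poll_until_terminal(
    fetch_status: Callable[[], dict],
    terminal_states=ASSUMED_TERMINAL_STATES,
    interval_s: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until job_state is terminal."""
    for attempt in range(max_polls):
        job = fetch_status()
        if job.get("job_state") in terminal_states:
            return job
        # Wait between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Where a webhook callback is configured, polling can serve as a fallback rather than the primary mechanism.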


Authentication

api-subscription-key string

API Key authentication via header

Path parameters

job_id string Required format: "uuid"

The unique identifier of the job

Response

Successful Response

job_id string format: "uuid"

Job identifier (UUID)

job_state enum

Current job state

created_at string format: "date-time"

Job creation timestamp (ISO 8601)

updated_at string format: "date-time"

Last update timestamp (ISO 8601)

storage_container_type enum

Storage backend type

total_files integer Defaults to 0

Total input files (always 1)

successful_files_count integer Defaults to 0

Files that completed successfully

failed_files_count integer Defaults to 0

Files that failed

error_message string Defaults to "" (empty string)

Job-level error message

job_details list of objects

Per-file processing details with page metrics
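
The per-file page metrics in `job_details` can be rolled up into a single progress figure. The sketch below is one way to do that, using only field names that appear in the example response; `summarize_job` and the keys of the dict it returns are hypothetical.

```python
def summarize_job(job: dict) -> dict:
    """Aggregate page counters across all entries in job_details."""
    details = job.get("job_details", [])
    total = sum(d.get("total_pages", 0) for d in details)
    processed = sum(d.get("pages_processed", 0) for d in details)
    failed = sum(d.get("pages_failed", 0) for d in details)
    return {
        "job_id": job["job_id"],
        "job_state": job["job_state"],
        "pages_total": total,
        "pages_processed": processed,
        "pages_failed": failed,
        # Avoid division by zero when no pages have been counted yet.
        "percent_done": round(100 * processed / total, 1) if total else 0.0,
        # Keep only non-empty per-file error messages.
        "errors": [d["error_message"] for d in details if d.get("error_message")],
    }
```

Applied to the example response at the top of this page, the summary would report 5 total pages, 0 processed, and an empty error list, since the job has only just been accepted.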

Errors

400

Bad Request Error

403

Forbidden Error

429

Too Many Requests Error

500

Internal Server Error

503

Service Unavailable Error
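
A client usually needs to decide which of these errors are worth retrying. The sketch below treats 429, 500, and 503 as transient and everything else (including 400, 403, and the 422 returned for `max_page_limit_exceeded`) as final. That split is a common convention and an assumption here; this page does not state a retry policy, and the function names and backoff parameters are hypothetical.

```python
# Assumption: these statuses are transient; the source does not say so.
ASSUMED_RETRYABLE = frozenset({429, 500, 503})


def is_retryable(status: int) -> bool:
    """Return True when a request with this HTTP status may be retried."""
    return status in ASSUMED_RETRYABLE


def backoff_delays(attempts: int, base_s: float = 1.0, cap_s: float = 30.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(cap_s, base_s * (2 ** i)) for i in range(attempts)]
```

For example, `backoff_delays(5)` yields waits of 1, 2, 4, 8, and 16 seconds. A production client would likely add random jitter and honor any rate-limit headers the service returns.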

curl -X POST "https://api.sarvam.ai/doc-digitization/job/v1/<job_id>/start" \
  -H "api-subscription-key: <apiSubscriptionKey>" \
  -H "Content-Type: application/json" \
  -d '{}'