# Create Document Intelligence Job

POST https://api.sarvam.ai/doc-digitization/job/v1
Content-Type: application/json

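Creates a new Document Intelligence job and, on success, returns a `202` response containing the `job_id` and initial `job_state`. Both `job_parameters` and `callback` are optional in the request body; if `job_parameters` is omitted, the job defaults to Hindi (`hi-IN`) and Markdown output.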

**Supported Languages (BCP-47 format):**
- `hi-IN`: Hindi (default)
- `en-IN`: English
- `bn-IN`: Bengali
- `gu-IN`: Gujarati
- `kn-IN`: Kannada
- `ml-IN`: Malayalam
- `mr-IN`: Marathi
- `or-IN`: Odia
- `pa-IN`: Punjabi
- `ta-IN`: Tamil
- `te-IN`: Telugu
- `ur-IN`: Urdu
- `as-IN`: Assamese
- `bodo-IN`: Bodo
- `doi-IN`: Dogri
- `ks-IN`: Kashmiri
- `kok-IN`: Konkani
- `mai-IN`: Maithili
- `mni-IN`: Manipuri
- `ne-IN`: Nepali
- `sa-IN`: Sanskrit
- `sat-IN`: Santali
- `sd-IN`: Sindhi

**Output Formats (delivered as ZIP file):**
- `html`: Structured HTML files with layout preservation
- `md`: Markdown files (default)
- `json`: Structured JSON files for programmatic processing

Reference: https://docs.sarvam.ai/api-reference-docs/document-intelligence/initialise
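
The language codes and output formats listed above can be checked on the client before a request is sent. The following is a minimal sketch and not part of the Sarvam SDK: `build_create_job_body` is a hypothetical helper, and the field names (`job_parameters`, `language`, `output_format`, `callback`, `url`, `auth_token`) are taken from the OpenAPI specification on this page.

```python
from typing import Optional
from urllib.parse import urlparse

# Values copied from the enums in the OpenAPI specification on this page.
SUPPORTED_LANGUAGES = {
    "hi-IN", "en-IN", "bn-IN", "gu-IN", "kn-IN", "ml-IN", "mr-IN", "or-IN",
    "pa-IN", "ta-IN", "te-IN", "ur-IN", "as-IN", "bodo-IN", "doi-IN", "ks-IN",
    "kok-IN", "mai-IN", "mni-IN", "ne-IN", "sa-IN", "sat-IN", "sd-IN",
}
OUTPUT_FORMATS = {"html", "md", "json"}


def build_create_job_body(
    language: str = "hi-IN",
    output_format: str = "md",
    callback_url: Optional[str] = None,
    callback_token: str = "",
) -> dict:
    """Hypothetical helper: validate inputs locally and return the JSON body."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if output_format not in OUTPUT_FORMATS:
        # Catches the common mistake of sending "markdown" instead of "md".
        raise ValueError(f"unsupported output_format: {output_format!r}")

    body = {"job_parameters": {"language": language, "output_format": output_format}}
    if callback_url is not None:
        # The specification only allows HTTPS webhook URLs.
        if urlparse(callback_url).scheme != "https":
            raise ValueError("callback url must use HTTPS")
        body["callback"] = {"url": callback_url, "auth_token": callback_token}
    return body
```

Validating locally is a design choice rather than a requirement: it surfaces mistakes such as `markdown` or an `http://` webhook before a request is spent on a `400` response.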

## OpenAPI Specification

```yaml
openapi: 3.1.0
info:
  title: ''
  version: 1.0.0
paths:
  /doc-digitization/job/v1:
    post:
      operationId: initialise
      summary: Create Document Intelligence Job
      description: |-
        Creates a new Document Intelligence job.

        **Supported Languages (BCP-47 format):**
        - `hi-IN`: Hindi (default)
        - `en-IN`: English
        - `bn-IN`: Bengali
        - `gu-IN`: Gujarati
        - `kn-IN`: Kannada
        - `ml-IN`: Malayalam
        - `mr-IN`: Marathi
        - `or-IN`: Odia
        - `pa-IN`: Punjabi
        - `ta-IN`: Tamil
        - `te-IN`: Telugu
        - `ur-IN`: Urdu
        - `as-IN`: Assamese
        - `bodo-IN`: Bodo
        - `doi-IN`: Dogri
        - `ks-IN`: Kashmiri
        - `kok-IN`: Konkani
        - `mai-IN`: Maithili
        - `mni-IN`: Manipuri
        - `ne-IN`: Nepali
        - `sa-IN`: Sanskrit
        - `sat-IN`: Santali
        - `sd-IN`: Sindhi

        **Output Formats (delivered as ZIP file):**
        - `html`: Structured HTML files with layout preservation
        - `md`: Markdown files (default)
        - `json`: Structured JSON files for programmatic processing
      tags:
        - subpackage_documentIntelligence
      parameters:
        - name: api-subscription-key
          in: header
          required: true
          schema:
            type: string
      responses:
        '202':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DocDigitizationCreateJobResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DocDigitizationErrorMessage'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DocDigitizationErrorMessage'
        '429':
          description: Quota Exceeded / Rate Limited
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DocDigitizationErrorMessage'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DocDigitizationErrorMessage'
        '503':
          description: Service Unavailable
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DocDigitizationErrorMessage'
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DocDigitizationCreateJobRequest'
servers:
  - url: https://api.sarvam.ai
components:
  schemas:
    DocDigitizationSupportedLanguage:
      type: string
      enum:
        - hi-IN
        - en-IN
        - bn-IN
        - gu-IN
        - kn-IN
        - ml-IN
        - mr-IN
        - or-IN
        - pa-IN
        - ta-IN
        - te-IN
        - ur-IN
        - as-IN
        - bodo-IN
        - doi-IN
        - ks-IN
        - kok-IN
        - mai-IN
        - mni-IN
        - ne-IN
        - sa-IN
        - sat-IN
        - sd-IN
      description: >-
        BCP-47 language code specifying the primary language of the document.
        Supports 23 languages: 22 Indian languages (Hindi, Bengali, Tamil,
        Telugu, Marathi, Gujarati, Kannada, Malayalam, Odia, Punjabi, Assamese,
        Urdu, Sanskrit, Nepali, Konkani, Maithili, Sindhi, Kashmiri, Dogri,
        Manipuri, Bodo, Santali) and English.
      title: DocDigitizationSupportedLanguage
    DocDigitizationOutputFormat:
      type: string
      enum:
        - html
        - md
        - json
      description: 'Output format for extracted document content, delivered as a ZIP file. ''html'' returns structured HTML files, ''md'' returns human-readable Markdown files, and ''json'' returns structured JSON files for programmatic processing.'
      title: DocDigitizationOutputFormat
    DocDigitizationJobParameters:
      type: object
      properties:
        language:
          $ref: '#/components/schemas/DocDigitizationSupportedLanguage'
          description: >-
            Primary language of the document in BCP-47 format (e.g. en-IN,
            hi-IN). The field must be named language, not language_code (the
            name other Sarvam APIs use). A language_code field is ignored and
            the job falls back to hi-IN.
        output_format:
          $ref: '#/components/schemas/DocDigitizationOutputFormat'
          description: >-
            Output format for the extracted content (delivered as a ZIP file).
            Accepted values: md, html, json. Use md for Markdown; the value
            markdown is rejected with a 400 error.
      description: >-
        Configuration parameters for Document Intelligence job. Specify the
        document language and desired output format.
      title: DocDigitizationJobParameters
    DocDigitizationWebhookCallback:
      type: object
      properties:
        url:
          type: string
          format: uri
          description: HTTPS webhook URL to call upon job completion (HTTP not allowed)
        auth_token:
          type: string
          default: ''
          description: Authorization token sent as X-SARVAM-JOB-CALLBACK-TOKEN header
      required:
        - url
      description: Webhook configuration for job completion notification
      title: DocDigitizationWebhookCallback
    DocDigitizationCreateJobRequest:
      type: object
      properties:
        job_parameters:
          $ref: '#/components/schemas/DocDigitizationJobParameters'
          description: >-
            Configuration parameters for the Document Intelligence job including
            language and output format. Defaults to Hindi (hi-IN) and Markdown
            output if omitted.
        callback:
          oneOf:
            - $ref: '#/components/schemas/DocDigitizationWebhookCallback'
            - type: 'null'
          description: Optional webhook for completion notification
      description: Request body for creating a new document intelligence job
      title: DocDigitizationCreateJobRequest
    StorageContainerType:
      type: string
      enum:
        - Azure
        - Local
        - Google
        - Azure_V1
      title: StorageContainerType
    DocDigitizationJobState:
      type: string
      enum:
        - Accepted
        - Pending
        - Running
        - Completed
        - PartiallyCompleted
        - Failed
      description: Current state of the document intelligence job
      title: DocDigitizationJobState
    DocDigitizationCreateJobResponse:
      type: object
      properties:
        job_id:
          type: string
          format: uuid
          description: Unique job identifier (UUID)
        storage_container_type:
          $ref: '#/components/schemas/StorageContainerType'
          description: Storage Container Type
        job_parameters:
          $ref: '#/components/schemas/DocDigitizationJobParameters'
          description: Job configuration parameters
        job_state:
          $ref: '#/components/schemas/DocDigitizationJobState'
      required:
        - job_id
        - storage_container_type
        - job_parameters
        - job_state
      title: DocDigitizationCreateJobResponse
    DocDigitizationErrorCode:
      type: string
      enum:
        - invalid_request_error
        - internal_server_error
        - insufficient_quota_error
        - invalid_api_key_error
        - rate_limit_exceeded_error
        - high_load_error
      title: DocDigitizationErrorCode
    DocDigitizationErrorDetails:
      type: object
      properties:
        message:
          type: string
          description: Message describing the error
        code:
          $ref: '#/components/schemas/DocDigitizationErrorCode'
          description: Error code for the specific error that has occurred.
        request_id:
          type: string
          default: ''
          description: 'Unique identifier for the request. Format: date_UUID4'
      required:
        - message
        - code
      title: DocDigitizationErrorDetails
    DocDigitizationErrorMessage:
      type: object
      properties:
        error:
          $ref: '#/components/schemas/DocDigitizationErrorDetails'
          description: Error details
      required:
        - error
      title: DocDigitizationErrorMessage
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: api-subscription-key

```
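
The specification above defines two response shapes: `DocDigitizationCreateJobResponse` for `202`, and the `DocDigitizationErrorMessage` envelope for every error status. The sketch below shows one way a client might decode them. `CreatedJob`, `DocDigitizationError`, and `parse_create_job_response` are hypothetical names chosen for illustration, not part of any Sarvam SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class CreatedJob:
    """Mirrors the required fields of DocDigitizationCreateJobResponse."""
    job_id: str
    job_state: str
    storage_container_type: str
    job_parameters: dict


class DocDigitizationError(Exception):
    """Hypothetical exception carrying the fields of DocDigitizationErrorDetails."""

    def __init__(self, status: int, code: str, message: str, request_id: str = ""):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.request_id = request_id


def parse_create_job_response(status: int, body: str) -> CreatedJob:
    """Return a CreatedJob for a 202 response, otherwise raise DocDigitizationError."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}  # e.g. a non-JSON body from an intermediary
    if not isinstance(data, dict):
        data = {}

    if status == 202:
        return CreatedJob(
            job_id=data["job_id"],
            job_state=data["job_state"],
            storage_container_type=data["storage_container_type"],
            job_parameters=data["job_parameters"],
        )

    err = data.get("error") or {}
    raise DocDigitizationError(
        status=status,
        code=err.get("code", "unknown"),
        message=err.get("message", ""),
        request_id=err.get("request_id", ""),
    )
```

Keeping `code` and `request_id` on the exception lets calling code branch on values such as `rate_limit_exceeded_error` and quote the request identifier when reporting a problem.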

## SDK Code Examples

```typescript
import { SarvamAIClient } from "sarvamai";

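async function main() {
    const client = new SarvamAIClient({
        apiSubscriptionKey: "YOUR_API_KEY_HERE",
    });
    // An empty request uses the defaults: Hindi (hi-IN) and Markdown output.
    const response = await client.documentIntelligence.initialise({});
    console.log(response);
}
// Surface rejected promises instead of leaving them unhandled.
main().catch(console.error);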

```

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_API_KEY_HERE",
)

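# Called with no arguments, the job uses the defaults: Hindi (hi-IN) and Markdown output.
response = client.document_intelligence.initialise()
print(response)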

```
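
For cases where the SDK is not available, the same request can be made with the Python standard library. This is a sketch under stated assumptions: `build_request` and `create_job` are hypothetical helper names, while the URL, header name, and body fields come from the OpenAPI specification on this page. Unlike the SDK example, it sets `job_parameters` explicitly instead of relying on the defaults.

```python
import json
import urllib.request

API_URL = "https://api.sarvam.ai/doc-digitization/job/v1"


def build_request(
    api_key: str, language: str = "hi-IN", output_format: str = "md"
) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = {"job_parameters": {"language": language, "output_format": output_format}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_job(api_key: str, language: str = "hi-IN", output_format: str = "md") -> dict:
    """Send the request and return the decoded JSON body.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    request = build_request(api_key, language, output_format)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call such as `create_job("YOUR_API_KEY_HERE", language="en-IN", output_format="json")` would then return the parsed response body on success.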

```go
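package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}

func run() error {
	url := "https://api.sarvam.ai/doc-digitization/job/v1"

	// An empty JSON object uses the defaults: Hindi (hi-IN) and Markdown output.
	payload := strings.NewReader("{}")

	req, err := http.NewRequest(http.MethodPost, url, payload)
	if err != nil {
		return fmt.Errorf("building request: %w", err)
	}
	req.Header.Add("api-subscription-key", "<apiSubscriptionKey>")
	req.Header.Add("Content-Type", "application/json")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return fmt.Errorf("sending request: %w", err)
	}
	// Only reached when res is non-nil, so closing the body is safe.
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return fmt.Errorf("reading response: %w", err)
	}

	// A 202 status means the job was accepted.
	fmt.Println(res.Status)
	fmt.Println(string(body))
	return nil
}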
```

```ruby
require 'uri'
require 'net/http'

url = URI("https://api.sarvam.ai/doc-digitization/job/v1")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Post.new(url)
request["api-subscription-key"] = '<apiSubscriptionKey>'
request["Content-Type"] = 'application/json'
request.body = "{}"

response = http.request(request)
puts response.read_body
```

```java
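import com.mashape.unirest.http.HttpResponse;
import com.mashape.unirest.http.Unirest;
import com.mashape.unirest.http.exceptions.UnirestException;

public class CreateJob {
  public static void main(String[] args) throws UnirestException {
    HttpResponse<String> response = Unirest.post("https://api.sarvam.ai/doc-digitization/job/v1")
      .header("api-subscription-key", "<apiSubscriptionKey>")
      .header("Content-Type", "application/json")
      .body("{}")
      .asString();
    // A 202 status means the job was accepted.
    System.out.println(response.getStatus());
    System.out.println(response.getBody());
  }
}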
```

```php
<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

$response = $client->request('POST', 'https://api.sarvam.ai/doc-digitization/job/v1', [
  'body' => '{}',
  'headers' => [
    'Content-Type' => 'application/json',
    'api-subscription-key' => '<apiSubscriptionKey>',
  ],
]);

echo $response->getBody();
```

```csharp
using RestSharp;

var client = new RestClient("https://api.sarvam.ai/doc-digitization/job/v1");
var request = new RestRequest(Method.POST);
request.AddHeader("api-subscription-key", "<apiSubscriptionKey>");
request.AddHeader("Content-Type", "application/json");
request.AddParameter("application/json", "{}", ParameterType.RequestBody);
IRestResponse response = client.Execute(request);
```

```swift
import Foundation

let headers = [
  "api-subscription-key": "<apiSubscriptionKey>",
  "Content-Type": "application/json"
]
// An empty JSON object; job_parameters and callback are both optional.
let parameters: [String: Any] = [:]

// JSONSerialization.data(withJSONObject:options:) throws, so it needs `try`.
let postData = try JSONSerialization.data(withJSONObject: parameters, options: [])

var request = URLRequest(url: URL(string: "https://api.sarvam.ai/doc-digitization/job/v1")!,
                         cachePolicy: .useProtocolCachePolicy,
                         timeoutInterval: 10.0)
request.httpMethod = "POST"
request.allHTTPHeaderFields = headers
request.httpBody = postData

let session = URLSession.shared
let dataTask = session.dataTask(with: request) { data, response, error in
  if let error = error {
    print(error)
  } else if let httpResponse = response as? HTTPURLResponse {
    // A 202 status means the job was accepted.
    print(httpResponse.statusCode)
    if let data = data, let body = String(data: data, encoding: .utf8) {
      print(body)
    }
  }
}

// In a command-line script, keep the process alive until the task completes.
dataTask.resume()
```
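
When a job is created with a `callback`, the specification on this page states that the configured `auth_token` is sent back in the `X-SARVAM-JOB-CALLBACK-TOKEN` header. A webhook receiver might check that header as sketched below. `is_trusted_callback` is a hypothetical helper, and the choice to reject callbacks when no token was configured is an assumption of this sketch, not documented API behaviour.

```python
import hmac
from typing import Mapping

CALLBACK_TOKEN_HEADER = "X-SARVAM-JOB-CALLBACK-TOKEN"


def is_trusted_callback(headers: Mapping[str, str], expected_token: str) -> bool:
    """Return True when the callback carries the token set at job creation."""
    if not expected_token:
        return False  # no token configured, so there is nothing to verify against
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    received = lowered.get(CALLBACK_TOKEN_HEADER.lower(), "")
    # compare_digest avoids leaking the token through timing differences.
    return hmac.compare_digest(
        received.encode("utf-8"), expected_token.encode("utf-8")
    )
```

The function takes a plain header mapping so it can be called from any web framework's request handler.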