# Batch - Initiate Job

POST https://api.sarvam.ai/speech-to-text/job/v1
Content-Type: application/json

Create a new speech to text bulk job and receive a job UUID and storage folder details for processing multiple audio files. Set `job_parameters.input_audio_codec` when uploads are raw PCM (`pcm_s16le`, `pcm_l16`, or `pcm_raw`); the API auto-detects other formats. PCM must be 16 kHz.

Reference: https://docs.sarvam.ai/api-reference-docs/speech-to-text/stt/job/initiate
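
As a minimal sketch of the request described above, the following Python snippet builds the JSON body and prepares the POST request using only the standard library. The helper names `build_initiate_payload` and `build_request` are hypothetical and not part of any Sarvam SDK; the field names, header, and codec values come from the specification on this page. The request is prepared but not sent.

```python
import json
import urllib.request

API_URL = "https://api.sarvam.ai/speech-to-text/job/v1"

# Codec values that the endpoint description says must be declared explicitly.
PCM_CODECS = {"pcm_s16le", "pcm_l16", "pcm_raw"}


def build_initiate_payload(is_raw_pcm, codec="pcm_s16le", callback_url=None):
    """Build the JSON body for the initiate-job call (hypothetical helper)."""
    job_parameters = {}
    if is_raw_pcm:
        if codec not in PCM_CODECS:
            raise ValueError(f"unsupported PCM codec: {codec}")
        # Raw PCM is not auto-detected, so the codec is stated explicitly.
        job_parameters["input_audio_codec"] = codec
    payload = {"job_parameters": job_parameters}
    if callback_url is not None:
        # The callback object requires only `url`; `auth_token` is optional.
        payload["callback"] = {"url": callback_url}
    return payload


def build_request(api_key, payload):
    """Prepare (but do not send) the POST request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_request("YOUR_API_KEY_HERE", build_initiate_payload(True))
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would send it; that step is left out here so the sketch can be run without an API key.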

## OpenAPI Specification

```yaml
openapi: 3.1.0
info:
  title: ''
  version: 1.0.0
paths:
  /speech-to-text/job/v1:
    post:
      operationId: initialise
      summary: Initiate Speech to Text Bulk Job V1
      description: >-
        Create a new speech to text bulk job and receive a job UUID and storage
        folder details for processing multiple audio files. Set
        `job_parameters.input_audio_codec` when uploads are raw PCM
        (`pcm_s16le`, `pcm_l16`, or `pcm_raw`); the API auto-detects other
        formats. PCM must be 16 kHz.
      tags:
        - subpackage_speechToTextJob
      parameters:
        - name: api-subscription-key
          in: header
          required: true
          schema:
            type: string
      responses:
        '202':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BulkJobInitResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorMessage'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorMessage'
        '422':
          description: Unprocessable Entity
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorMessage'
        '429':
          description: Quota Exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorMessage'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorMessage'
        '503':
          description: Service Overloaded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorMessage'
      requestBody:
        content:
          application/json:
            schema:
              $ref: >-
                #/components/schemas/BulkJobInitRequestV1_SpeechToTextJobParameters_
servers:
  - url: https://api.sarvam.ai
    description: Production
components:
  schemas:
    Sarvam_Model_API_SpeechToTextLanguage:
      type: string
      enum:
        - unknown
        - hi-IN
        - bn-IN
        - kn-IN
        - ml-IN
        - mr-IN
        - od-IN
        - pa-IN
        - ta-IN
        - te-IN
        - en-IN
        - gu-IN
        - as-IN
        - ur-IN
        - ne-IN
        - kok-IN
        - ks-IN
        - sd-IN
        - sa-IN
        - sat-IN
        - mni-IN
        - brx-IN
        - mai-IN
        - doi-IN
      description: >-
        Languages supported for Speech-to-Text.


        **saarika:v2.5 supports 12 values (11 languages plus `unknown`):**
        unknown, hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN,
        en-IN, gu-IN


        **saaras:v3 supports all 23 languages**, adding: as-IN, ur-IN, ne-IN,
        kok-IN, ks-IN, sd-IN, sa-IN, sat-IN, mni-IN, brx-IN, mai-IN, doi-IN
      title: Sarvam_Model_API_SpeechToTextLanguage
    Sarvam_Model_API_SpeechToTextModel:
      type: string
      enum:
        - saarika:v2.5
        - saaras:v3
      description: >-
        Model to be used for speech to text.


        - **saarika:v2.5** (default): Transcribes audio in the spoken language.


        - **saaras:v3**: State-of-the-art model with flexible output formats.
        Supports multiple modes via the `mode` parameter: transcribe, translate,
        verbatim, translit, codemix.
      title: Sarvam_Model_API_SpeechToTextModel
    Sarvam_Model_API_Mode:
      type: string
      enum:
        - transcribe
        - translate
        - verbatim
        - translit
        - codemix
      description: >-
        Mode of operation for saaras:v3 model.


        Example audio: 'मेरा फोन नंबर है 9840950950'


        - **transcribe** (default): Standard transcription in the original
        language with proper formatting and number normalization.
          - Output: `मेरा फोन नंबर है 9840950950`

        - **translate**: Translates speech from any supported Indic language to
        English.
          - Output: `My phone number is 9840950950`

        - **verbatim**: Exact word-for-word transcription without normalization,
        preserving filler words and spoken numbers as-is.
          - Output: `मेरा फोन नंबर है नौ आठ चार zero नौ पांच zero नौ पांच zero`

        - **translit**: Romanization - Transliterates speech to Latin/Roman
        script only.
          - Output: `mera phone number hai 9840950950`

        - **codemix**: Code-mixed text with English words in English and Indic
        words in native script.
          - Output: `मेरा phone number है 9840950950`
      title: Sarvam_Model_API_Mode
    Sarvam_Model_API_InputAudioCodec:
      type: string
      enum:
        - wav
        - x-wav
        - wave
        - mp3
        - mpeg
        - mpeg3
        - x-mp3
        - x-mpeg-3
        - aac
        - x-aac
        - aiff
        - x-aiff
        - ogg
        - opus
        - flac
        - x-flac
        - mp4
        - x-m4a
        - amr
        - x-ms-wma
        - webm
        - pcm_s16le
        - pcm_l16
        - pcm_raw
      description: >-
        Audio codec/format of the input file. The API automatically detects
        most formats; for raw PCM files (pcm_s16le, pcm_l16, pcm_raw), you must
        pass this parameter. PCM files are supported only at a 16 kHz sample
        rate.
      title: Sarvam_Model_API_InputAudioCodec
    SpeechToTextJobParameters:
      type: object
      properties:
        language_code:
          oneOf:
            - $ref: '#/components/schemas/Sarvam_Model_API_SpeechToTextLanguage'
            - type: 'null'
          default: unknown
          description: >-
            Specifies the language of the input audio in BCP-47 format.


            **Available Options:**

            - `unknown` (default): Use when the language is not known; the API
            will auto-detect.

            - `hi-IN`: Hindi

            - `bn-IN`: Bengali

            - `kn-IN`: Kannada

            - `ml-IN`: Malayalam

            - `mr-IN`: Marathi

            - `od-IN`: Odia

            - `pa-IN`: Punjabi

            - `ta-IN`: Tamil

            - `te-IN`: Telugu

            - `en-IN`: English

            - `gu-IN`: Gujarati


            **Additional Options (saaras:v3 only):**

            - `as-IN`: Assamese

            - `ur-IN`: Urdu

            - `ne-IN`: Nepali

            - `kok-IN`: Konkani

            - `ks-IN`: Kashmiri

            - `sd-IN`: Sindhi

            - `sa-IN`: Sanskrit

            - `sat-IN`: Santali

            - `mni-IN`: Manipuri

            - `brx-IN`: Bodo

            - `mai-IN`: Maithili

            - `doi-IN`: Dogri
        model:
          $ref: '#/components/schemas/Sarvam_Model_API_SpeechToTextModel'
          description: >-
            Model to be used for speech to text.


            - **saarika:v2.5** (default): Transcribes audio in the spoken
            language.


            - **saaras:v3**: State-of-the-art model with flexible output
            formats. Supports multiple modes via the `mode` parameter:
            transcribe, translate, verbatim, translit, codemix.
        mode:
          oneOf:
            - $ref: '#/components/schemas/Sarvam_Model_API_Mode'
            - type: 'null'
          default: transcribe
          description: >-
            Mode of operation. **Only applicable when using saaras:v3 model.**


            Example audio: 'मेरा फोन नंबर है 9840950950'


            - **transcribe** (default): Standard transcription in the original
            language with proper formatting and number normalization.
              - Output: `मेरा फोन नंबर है 9840950950`

            - **translate**: Translates speech from any supported Indic language
            to English.
              - Output: `My phone number is 9840950950`

            - **verbatim**: Exact word-for-word transcription without
            normalization, preserving filler words and spoken numbers as-is.
              - Output: `मेरा फोन नंबर है नौ आठ चार zero नौ पांच zero नौ पांच zero`

            - **translit**: Romanization - Transliterates speech to Latin/Roman
            script only.
              - Output: `mera phone number hai 9840950950`

            - **codemix**: Code-mixed text with English words in English and
            Indic words in native script.
              - Output: `मेरा phone number है 9840950950`
        with_timestamps:
          type: boolean
          default: false
          description: Whether to include timestamps in the response
        with_diarization:
          type: boolean
          default: false
          description: >-
            Enables speaker diarization, which identifies and separates
            different speakers in the audio. This feature is in beta.
        num_speakers:
          type:
            - integer
            - 'null'
          description: >-
            Number of speakers to be detected in the audio. This is used when
            with_diarization is true.
        input_audio_codec:
          $ref: '#/components/schemas/Sarvam_Model_API_InputAudioCodec'
          description: >-
            Audio codec/format of uploaded files. The API automatically detects
            most formats; for PCM files (pcm_s16le, pcm_l16, pcm_raw), you must
            specify this parameter. PCM files are supported only at 16kHz sample
            rate.
      title: SpeechToTextJobParameters
    BulkJobCallback:
      type: object
      properties:
        url:
          type: string
          description: Webhook URL to call upon job completion
        auth_token:
          type: string
          default: ''
          description: Authorization token required for the callback URL
      required:
        - url
      title: BulkJobCallback
    BulkJobInitRequestV1_SpeechToTextJobParameters_:
      type: object
      properties:
        job_parameters:
          $ref: '#/components/schemas/SpeechToTextJobParameters'
          description: Job Parameters for the bulk job
        callback:
          oneOf:
            - $ref: '#/components/schemas/BulkJobCallback'
            - type: 'null'
          description: Parameters for callback URL
      required:
        - job_parameters
      title: BulkJobInitRequestV1_SpeechToTextJobParameters_
    StorageContainerType:
      type: string
      enum:
        - Azure
        - Local
        - Google
        - Azure_V1
      title: StorageContainerType
    BaseJobParameters:
      type: object
      properties: {}
      title: BaseJobParameters
    JobState:
      type: string
      enum:
        - Accepted
        - Pending
        - Running
        - Completed
        - Failed
      title: JobState
    BulkJobInitResponse:
      type: object
      properties:
        job_id:
          type: string
          description: Job UUID.
        storage_container_type:
          $ref: '#/components/schemas/StorageContainerType'
          description: Storage Container Type
        job_parameters:
          $ref: '#/components/schemas/BaseJobParameters'
        job_state:
          $ref: '#/components/schemas/JobState'
      required:
        - job_id
        - storage_container_type
        - job_parameters
        - job_state
      title: BulkJobInitResponse
    ErrorCode:
      type: string
      enum:
        - invalid_request_error
        - internal_server_error
        - unprocessable_entity_error
        - insufficient_quota_error
        - invalid_api_key_error
        - authentication_error
        - rate_limit_exceeded_error
        - not_found_error
      title: ErrorCode
    ErrorDetails:
      type: object
      properties:
        message:
          type: string
          description: Message describing the error
        code:
          $ref: '#/components/schemas/ErrorCode'
          description: >-
            Error code for the specific error that has occurred. Refer to the
            error code documentation for more details.
        request_id:
          type: string
          default: ''
          description: 'Unique identifier for the request. Format: date_UUID4'
      required:
        - message
        - code
      title: ErrorDetails
    ErrorMessage:
      type: object
      properties:
        error:
          $ref: '#/components/schemas/ErrorDetails'
          description: Error details
      required:
        - error
      title: ErrorMessage
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: api-subscription-key

```
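
The schema descriptions above imply a few constraints between `model`, `language_code`, `mode`, and the diarization fields. The sketch below checks them on the client before a job is submitted. It is an illustration under stated assumptions: `check_job_parameters` is a hypothetical helper, and how the server itself reacts to these combinations (rejecting or ignoring them) is not stated in the specification.

```python
# Values copied from the enums in the specification above.
BASE_LANGUAGES = {
    "unknown", "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN",
    "od-IN", "pa-IN", "ta-IN", "te-IN", "en-IN", "gu-IN",
}
SAARAS_ONLY_LANGUAGES = {
    "as-IN", "ur-IN", "ne-IN", "kok-IN", "ks-IN", "sd-IN",
    "sa-IN", "sat-IN", "mni-IN", "brx-IN", "mai-IN", "doi-IN",
}
MODELS = {"saarika:v2.5", "saaras:v3"}
MODES = {"transcribe", "translate", "verbatim", "translit", "codemix"}


def check_job_parameters(params):
    """Return a list of problems; an empty list means no conflict was found.

    Hypothetical client-side check, not the server's validation logic.
    """
    problems = []
    model = params.get("model", "saarika:v2.5")  # documented default
    language = params.get("language_code") or "unknown"  # null means default
    mode = params.get("mode")

    if model not in MODELS:
        problems.append(f"unknown model: {model}")
    if language not in BASE_LANGUAGES | SAARAS_ONLY_LANGUAGES:
        problems.append(f"unknown language_code: {language}")
    elif language in SAARAS_ONLY_LANGUAGES and model != "saaras:v3":
        problems.append(f"{language} is listed for saaras:v3 only")

    if mode is not None:
        if mode not in MODES:
            problems.append(f"unknown mode: {mode}")
        elif model != "saaras:v3":
            problems.append("mode is only applicable to saaras:v3")

    # Assumption: num_speakers without diarization is treated as a mistake.
    if params.get("num_speakers") is not None and not params.get("with_diarization"):
        problems.append("num_speakers is used only when with_diarization is true")
    return problems


if __name__ == "__main__":
    print(check_job_parameters({"language_code": "ur-IN", "mode": "translate"}))
```

Returning a list rather than raising on the first issue lets a caller report every conflict at once.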

## Examples

**Request**

```json
{
  "job_parameters": {}
}
```

**Response**

```json
{
  "job_id": "job_9f8b7c6d5e4a3b2c1d0e",
  "storage_container_type": "Azure_V1",
  "job_parameters": {},
  "job_state": "Accepted"
}
```
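
A caller typically needs to tell the `202` success body apart from the `ErrorMessage` body returned with the other status codes. The following is a minimal sketch of that step, assuming the response shapes given in the specification; `parse_initiate_response` is a hypothetical helper, not an SDK function.

```python
import json

# Enum values and required fields taken from the specification on this page.
JOB_STATES = {"Accepted", "Pending", "Running", "Completed", "Failed"}
REQUIRED_FIELDS = ("job_id", "storage_container_type", "job_parameters", "job_state")


def parse_initiate_response(status_code, body_text):
    """Return the job dict for a 202 response, otherwise raise an error."""
    body = json.loads(body_text)
    if status_code != 202:
        # Error bodies wrap the details in an "error" object.
        error = body.get("error", {})
        code = error.get("code", "unknown")
        message = error.get("message", "")
        raise RuntimeError(f"{status_code} {code}: {message}")

    missing = [field for field in REQUIRED_FIELDS if field not in body]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    if body["job_state"] not in JOB_STATES:
        raise ValueError(f"unexpected job_state: {body['job_state']}")
    return body


if __name__ == "__main__":
    sample = json.dumps({
        "job_id": "job_9f8b7c6d5e4a3b2c1d0e",
        "storage_container_type": "Azure_V1",
        "job_parameters": {},
        "job_state": "Accepted",
    })
    job = parse_initiate_response(202, sample)
    print(job["job_id"], job["job_state"])
```

The `request_id` field in an error body is optional per the schema, so the sketch does not rely on it.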

**SDK Code**

```typescript
import { SarvamAIClient } from "sarvamai";

async function main() {
    const client = new SarvamAIClient({
        apiSubscriptionKey: "YOUR_API_KEY_HERE",
    });
    // job_parameters is required by the request schema; its fields are optional.
    await client.speechToTextJob.initialise({ job_parameters: {} });
}
main();

```

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_API_KEY_HERE",
)

# job_parameters is required by the request schema; its fields are optional.
client.speech_to_text_job.initialise(job_parameters={})

```

```go
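package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

func main() {
	url := "https://api.sarvam.ai/speech-to-text/job/v1"

	// job_parameters is required by the request schema; its fields are optional.
	payload := strings.NewReader(`{"job_parameters": {}}`)

	req, err := http.NewRequest(http.MethodPost, url, payload)
	if err != nil {
		log.Fatalf("building request: %v", err)
	}
	req.Header.Add("api-subscription-key", "<apiSubscriptionKey>")
	req.Header.Add("Content-Type", "application/json")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatalf("sending request: %v", err)
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		log.Fatalf("reading response: %v", err)
	}

	// A successfully created job returns 202.
	fmt.Println(res.Status)
	fmt.Println(string(body))
}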
```

```ruby
require 'uri'
require 'net/http'

url = URI("https://api.sarvam.ai/speech-to-text/job/v1")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Post.new(url)
request["api-subscription-key"] = '<apiSubscriptionKey>'
request["Content-Type"] = 'application/json'
request.body = '{"job_parameters": {}}'

response = http.request(request)
puts response.read_body
```

```java
import com.mashape.unirest.http.HttpResponse;
import com.mashape.unirest.http.Unirest;

HttpResponse<String> response = Unirest.post("https://api.sarvam.ai/speech-to-text/job/v1")
  .header("api-subscription-key", "<apiSubscriptionKey>")
  .header("Content-Type", "application/json")
  .body("{\"job_parameters\": {}}")
  .asString();
```

```php
<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

$response = $client->request('POST', 'https://api.sarvam.ai/speech-to-text/job/v1', [
  'body' => '{"job_parameters": {}}',
  'headers' => [
    'Content-Type' => 'application/json',
    'api-subscription-key' => '<apiSubscriptionKey>',
  ],
]);

echo $response->getBody();
```

```csharp
using RestSharp;

var client = new RestClient("https://api.sarvam.ai/speech-to-text/job/v1");
var request = new RestRequest(Method.POST);
request.AddHeader("api-subscription-key", "<apiSubscriptionKey>");
request.AddHeader("Content-Type", "application/json");
request.AddParameter("application/json", "{\"job_parameters\": {}}", ParameterType.RequestBody);
IRestResponse response = client.Execute(request);
```

```swift
import Foundation

let headers = [
  "api-subscription-key": "<apiSubscriptionKey>",
  "Content-Type": "application/json"
]
// job_parameters is required by the request schema; its fields are optional.
let parameters: [String: Any] = ["job_parameters": [String: Any]()]

// JSONSerialization.data(withJSONObject:options:) throws, so it needs `try`.
let postData = try JSONSerialization.data(withJSONObject: parameters, options: [])

var request = URLRequest(url: URL(string: "https://api.sarvam.ai/speech-to-text/job/v1")!,
                         cachePolicy: .useProtocolCachePolicy,
                         timeoutInterval: 10.0)
request.httpMethod = "POST"
request.allHTTPHeaderFields = headers
request.httpBody = postData

let session = URLSession.shared
let dataTask = session.dataTask(with: request) { data, response, error in
  if let error = error {
    print(error)
  } else if let httpResponse = response as? HTTPURLResponse {
    print(httpResponse.statusCode)
    if let data = data, let body = String(data: data, encoding: .utf8) {
      print(body)
    }
  }
}

dataTask.resume()
```