# REST

`POST https://api.sarvam.ai/speech-to-text-translate`

`Content-Type: multipart/form-data`

## Speech to Text Translation API

This API automatically detects the input language, transcribes the speech, and translates the text to English.

### Available Options:
- **REST API** (Current Endpoint): For quick responses under 30 seconds with immediate results
- **Batch API**: For longer audio files. See the [Batch API documentation](https://docs.sarvam.ai/api-reference-docs/api-guides-tutorials/speech-to-text/batch-api)
  - Supports diarization (speaker identification)

### Note:
- Pricing differs for REST and Batch APIs
- Diarization is available only in the Batch API, with separate pricing
- See the [pricing page](https://docs.sarvam.ai/api-reference-docs/pricing) for detailed pricing information

Reference: https://docs.sarvam.ai/api-reference-docs/speech-to-text-translate/translate
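
As a rough illustration of the request shape described above, the sketch below assembles a `multipart/form-data` request using only the Python standard library. The helper name `build_translate_request`, the random boundary, and the `application/octet-stream` part type are assumptions made for this example; only the URL, the `api-subscription-key` header, and the `file` field come from this page.

```python
import uuid
import urllib.request

API_URL = "https://api.sarvam.ai/speech-to-text-translate"


def build_translate_request(audio_bytes, filename, api_key, fields=None):
    """Return a urllib Request carrying the audio as multipart/form-data.

    Hypothetical helper for illustration; not part of any Sarvam SDK.
    """
    boundary = uuid.uuid4().hex  # random boundary, assumed not to occur in the audio
    parts = []
    # Optional text fields such as model, prompt, or input_audio_codec.
    for name, value in (fields or {}).items():
        text_part = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'
        )
        parts.append(text_part.encode("utf-8"))
    # The required "file" part: part headers, then the raw audio bytes.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(file_header.encode("utf-8") + audio_bytes + b"\r\n")
    # Closing delimiter.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "api-subscription-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; building the request separately keeps the body easy to inspect before any network call is made.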

## OpenAPI Specification

```yaml
openapi: 3.1.0
info:
  title: ''
  version: 1.0.0
paths:
  /speech-to-text-translate:
    post:
      operationId: translate
      summary: Speech To Text Translate
      description: >-
        ## Speech to Text Translation API


        This API automatically detects the input language, transcribes the
        speech, and translates the text to English.


        ### Available Options:

        - **REST API** (Current Endpoint): For quick responses under 30 seconds
        with immediate results

        - **Batch API**: For longer audio files [Follow this
        documentation](https://docs.sarvam.ai/api-reference-docs/api-guides-tutorials/speech-to-text/batch-api)
          - Supports diarization (speaker identification)

        ### Note:

        - Pricing differs for REST and Batch APIs

        - Diarization is only available in Batch API with separate pricing

        - Please refer to
        [here](https://docs.sarvam.ai/api-reference-docs/pricing) for detailed
        pricing information
      tags:
        - subpackage_speechToText
      parameters:
        - name: api-subscription-key
          in: header
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: >-
                  #/components/schemas/Sarvam_Model_API_SpeechToTextTranslateResponse
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
        '422':
          description: Unprocessable Entity
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
        '429':
          description: Quota Exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
        '503':
          description: Service Overloaded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sarvam_Model_API_ErrorMessage'
      requestBody:
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file:
                  type: string
                  format: binary
                  description: >-
                    The audio file to transcribe. Supported formats include WAV,
                    MP3, AAC, AIFF, OGG, OPUS, FLAC, MP4/M4A, AMR, WMA, WebM,
                    and PCM formats. The API automatically detects most codec
                    formats, but for PCM files (pcm_s16le, pcm_l16, pcm_raw),
                    you must specify the input_audio_codec parameter. PCM files
                    are supported only at 16kHz sample rate.

                    Works best at 16kHz. Multiple channels will be merged.
                prompt:
                  type:
                    - string
                    - 'null'
                  description: >-
                    Conversation context can be passed as a prompt to boost
                    model accuracy. However, the current system is at an
                    experimentation stage and doesn't match the prompt
                    performance of large language models.
                model:
                  $ref: >-
                    #/components/schemas/Sarvam_Model_API_SpeechToTextTranslateModel
                  description: >-
                    Model to be used for speech to text translation.


                    - **saaras:v2.5** (default): Translation model that
                    translates audio from any spoken Indic language to English.
                      - Example: Hindi audio → English text output
                input_audio_codec:
                  $ref: '#/components/schemas/Sarvam_Model_API_InputAudioCodec'
                  description: >-
                    Audio codec/format of the input file. Our API automatically
                    detects all codec formats, but for PCM files specifically
                    (pcm_s16le, pcm_l16, pcm_raw), you must pass this parameter.
                    PCM files are supported only at 16kHz sample rate.
              required:
                - file
servers:
  - url: https://api.sarvam.ai
    description: Production
components:
  schemas:
    Sarvam_Model_API_SpeechToTextTranslateModel:
      type: string
      enum:
        - saaras:v2.5
      description: >-
        Model to be used for speech to text translation.


        - **saaras:v2.5** (default): Translation model that translates audio
        from any spoken Indic language to English.
          - Example: Hindi audio → English text output
      title: Sarvam_Model_API_SpeechToTextTranslateModel
    Sarvam_Model_API_InputAudioCodec:
      type: string
      enum:
        - wav
        - x-wav
        - wave
        - mp3
        - mpeg
        - mpeg3
        - x-mp3
        - x-mpeg-3
        - aac
        - x-aac
        - aiff
        - x-aiff
        - ogg
        - opus
        - flac
        - x-flac
        - mp4
        - x-m4a
        - amr
        - x-ms-wma
        - webm
        - pcm_s16le
        - pcm_l16
        - pcm_raw
      description: >-
        Audio codec/format of the input file. Our API automatically detects all
        codec formats, but for PCM files specifically (pcm_s16le, pcm_l16,
        pcm_raw), you must pass this parameter. PCM files are supported only at
        16kHz sample rate.
      title: Sarvam_Model_API_InputAudioCodec
    Sarvam_Model_API_SpeechToTextTranslateLanguage:
      type: string
      enum:
        - hi-IN
        - bn-IN
        - kn-IN
        - ml-IN
        - mr-IN
        - od-IN
        - pa-IN
        - ta-IN
        - te-IN
        - gu-IN
        - en-IN
        - as-IN
        - ur-IN
        - ne-IN
        - kok-IN
        - ks-IN
        - sd-IN
        - sa-IN
        - sat-IN
        - mni-IN
        - brx-IN
        - mai-IN
        - doi-IN
      description: >-
        Languages supported for Speech-to-Text-Translate (detected source
        language).


        **saaras:v2.5 supports (11 languages):** hi-IN, bn-IN, kn-IN, ml-IN,
        mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN


        **All 22 languages available** including: as-IN, ur-IN, ne-IN, kok-IN,
        ks-IN, sd-IN, sa-IN, sat-IN, mni-IN, brx-IN, mai-IN, doi-IN
      title: Sarvam_Model_API_SpeechToTextTranslateLanguage
    Sarvam_Model_API_DiarizedEntry:
      type: object
      properties:
        transcript:
          type: string
          description: Transcript of this segment of the audio.
        start_time_seconds:
          type: number
          format: double
          description: Start time of the word in seconds.
        end_time_seconds:
          type: number
          format: double
          description: End time of the word in seconds.
        speaker_id:
          type: string
          description: Speaker ID for the word.
      required:
        - transcript
        - start_time_seconds
        - end_time_seconds
        - speaker_id
      title: Sarvam_Model_API_DiarizedEntry
    Sarvam_Model_API_DiarizedTranscript:
      type: object
      properties:
        entries:
          type: array
          items:
            $ref: '#/components/schemas/Sarvam_Model_API_DiarizedEntry'
          description: List of diarized transcript entries.
      required:
        - entries
      title: Sarvam_Model_API_DiarizedTranscript
    Sarvam_Model_API_SpeechToTextTranslateResponse:
      type: object
      properties:
        request_id:
          type:
            - string
            - 'null'
        transcript:
          type: string
          description: Transcript of the provided speech
        language_code:
          oneOf:
            - $ref: >-
                #/components/schemas/Sarvam_Model_API_SpeechToTextTranslateLanguage
            - type: 'null'
          description: >-
            The BCP-47 code of the language spoken in the input. If multiple
            languages are detected, this is the code of the most predominant
            spoken language. If no language is detected, this is null.
        diarized_transcript:
          oneOf:
            - $ref: '#/components/schemas/Sarvam_Model_API_DiarizedTranscript'
            - type: 'null'
          description: Diarized transcript of the provided speech
        language_probability:
          type:
            - number
            - 'null'
          format: double
          description: >-
            Float value (0.0 to 1.0) indicating the probability of the detected
            language being correct. Higher values indicate higher confidence.


            **When it returns a value:**

            - When `language_code` is not provided in the request

            - When `language_code` is set to `unknown`


            **When it returns null:**

            - When a specific `language_code` is provided (language detection is
            skipped)


            The parameter is always present in the response.
      required:
        - request_id
        - transcript
        - language_code
      title: Sarvam_Model_API_SpeechToTextTranslateResponse
    Sarvam_Model_API_ErrorCode:
      type: string
      enum:
        - invalid_request_error
        - internal_server_error
        - unprocessable_entity_error
        - insufficient_quota_error
        - invalid_api_key_error
        - authentication_error
        - not_found_error
        - rate_limit_exceeded_error
      title: Sarvam_Model_API_ErrorCode
    Sarvam_Model_API_ErrorDetails:
      type: object
      properties:
        request_id:
          type:
            - string
            - 'null'
        message:
          type: string
          description: Message describing the error
        code:
          $ref: '#/components/schemas/Sarvam_Model_API_ErrorCode'
          description: >-
            Error code for the specific error that has occurred. Refer to the
            error code documentation for more details.
      required:
        - request_id
        - message
        - code
      title: Sarvam_Model_API_ErrorDetails
    Sarvam_Model_API_ErrorMessage:
      type: object
      properties:
        error:
          $ref: '#/components/schemas/Sarvam_Model_API_ErrorDetails'
          description: Error details
      required:
        - error
      title: Sarvam_Model_API_ErrorMessage
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: api-subscription-key

```
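
The spec above gives every non-200 response the same `ErrorMessage` shape, so a client can decode failures in one place. The following is a minimal sketch of that idea; the `ApiError` class, the `parse_error` function, and the choice to treat 429, 500, and 503 as retryable are assumptions of this example rather than documented behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    status: int
    code: str
    message: str
    request_id: Optional[str]

    @property
    def retryable(self) -> bool:
        # Assumption: 429, 500, and 503 are transient; the spec does not say so.
        return self.status in (429, 500, 503)


def parse_error(status: int, body: str) -> ApiError:
    """Decode an ErrorMessage payload: {"error": {"request_id", "message", "code"}}."""
    details = json.loads(body)["error"]
    return ApiError(
        status=status,
        code=details["code"],
        message=details["message"],
        request_id=details.get("request_id"),  # nullable per the schema
    )


# Usage with a made-up payload that follows the ErrorMessage schema.
sample = '{"error": {"request_id": null, "message": "Too many requests", "code": "rate_limit_exceeded_error"}}'
err = parse_error(429, sample)
print(err.code, err.retryable)
```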

## Examples


**Request**

```json
{
  "file": "<file: meeting_recording.wav>"
}
```
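
The request above sends only `file`, which is enough for formats the API detects automatically. For raw PCM the spec requires `input_audio_codec` as well. A small sketch of choosing the extra form fields follows; the `form_fields` helper and its rule of treating `.pcm` and `.raw` extensions as raw PCM are assumptions of this example, not something the API defines.

```python
import os
from typing import Dict, Optional

# PCM codec values listed in the InputAudioCodec enum of the spec.
PCM_CODECS = {"pcm_s16le", "pcm_l16", "pcm_raw"}


def form_fields(filename: str, pcm_codec: Optional[str] = None,
                model: str = "saaras:v2.5") -> Dict[str, str]:
    """Return the text fields to send alongside the `file` part.

    Hypothetical helper: the extension check below is a convention of this
    sketch, since a raw PCM file carries no header to inspect.
    """
    fields = {"model": model}
    ext = os.path.splitext(filename)[1].lower()
    if ext in (".pcm", ".raw"):
        if pcm_codec not in PCM_CODECS:
            raise ValueError(
                "raw PCM uploads need input_audio_codec set to one of "
                + ", ".join(sorted(PCM_CODECS))
            )
        fields["input_audio_codec"] = pcm_codec
    return fields


print(form_fields("meeting_recording.wav"))
print(form_fields("call.pcm", pcm_codec="pcm_s16le"))
```

Failing early on a missing codec avoids a round trip that the server would presumably reject anyway.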

**Response**

```json
{
  "request_id": "20240615_1a2b3c4d-5678-90ab-cdef-1234567890ab",
  "transcript": "Good morning everyone, let's start the project update meeting.",
  "language_code": "en-IN",
  "diarized_transcript": {
    "entries": [
      {
        "transcript": "Good morning everyone,",
        "start_time_seconds": 0,
        "end_time_seconds": 3.2,
        "speaker_id": "speaker_1"
      },
      {
        "transcript": "let's start the project update meeting.",
        "start_time_seconds": 3.3,
        "end_time_seconds": 7.5,
        "speaker_id": "speaker_2"
      }
    ]
  },
  "language_probability": 0.98
}
```
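
To show how a client might consume a response like the one above, the sketch below turns it into speaker-labelled lines and falls back to the plain `transcript` when `diarized_transcript` is missing or null, which the schema allows. The function name `speaker_lines` and the output format are choices made for this example.

```python
import json
from typing import List


def speaker_lines(response_json: str) -> List[str]:
    """Return one line per diarized entry, or the plain transcript if not diarized."""
    data = json.loads(response_json)
    diarized = data.get("diarized_transcript")  # may be absent or null per the schema
    if not diarized:
        return [data["transcript"]]
    return [
        f'{e["speaker_id"]} [{e["start_time_seconds"]:.1f}-{e["end_time_seconds"]:.1f}s]: {e["transcript"]}'
        for e in diarized["entries"]
    ]


# Usage with a trimmed copy of the example response.
example = json.dumps({
    "request_id": "20240615_1a2b3c4d-5678-90ab-cdef-1234567890ab",
    "transcript": "Good morning everyone, let's start the project update meeting.",
    "language_code": "en-IN",
    "diarized_transcript": {"entries": [
        {"transcript": "Good morning everyone,", "start_time_seconds": 0,
         "end_time_seconds": 3.2, "speaker_id": "speaker_1"},
    ]},
})
for line in speaker_lines(example):
    print(line)  # speaker_1 [0.0-3.2s]: Good morning everyone,
```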

**SDK Code**

```typescript
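import * as fs from "fs";
import { SarvamAIClient } from "sarvamai";

async function main() {
    const client = new SarvamAIClient({
        apiSubscriptionKey: "YOUR_API_KEY_HERE",
    });
    // Stream the audio file as the first argument; optional fields go in the second.
    const response = await client.speechToText.translate(
        fs.createReadStream("meeting_recording.wav"),
        {},
    );
    console.log(response);
}
main();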

```

```python
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_API_KEY_HERE",
)

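# Open the audio in binary mode and pass the file object, not a path string.
with open("meeting_recording.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
    )

print(response)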

```

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"mime/multipart"
	"net/http"
	"os"
)

func main() {
	url := "https://api.sarvam.ai/speech-to-text-translate"

	// Open the audio file to upload.
	f, err := os.Open("meeting_recording.wav")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Build the multipart/form-data body; only "file" is required.
	var payload bytes.Buffer
	w := multipart.NewWriter(&payload)
	part, err := w.CreateFormFile("file", "meeting_recording.wav")
	if err != nil {
		log.Fatal(err)
	}
	if _, err := io.Copy(part, f); err != nil {
		log.Fatal(err)
	}
	// Close writes the final boundary; it must run before the request is sent.
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}

	req, err := http.NewRequest("POST", url, &payload)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Add("api-subscription-key", "<apiSubscriptionKey>")
	req.Header.Add("Content-Type", w.FormDataContentType())

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res.Status)
	fmt.Println(string(body))
}
```

```ruby
require 'uri'
require 'net/http'

url = URI("https://api.sarvam.ai/speech-to-text-translate")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Post.new(url)
request["api-subscription-key"] = '<apiSubscriptionKey>'
# set_form builds the multipart body and sets Content-Type with its boundary.
request.set_form([['file', File.open('meeting_recording.wav')]], 'multipart/form-data')

response = http.request(request)
puts response.read_body
```

```java
import java.io.File;

import com.mashape.unirest.http.HttpResponse;
import com.mashape.unirest.http.Unirest;

// Adding a File field makes Unirest build the multipart body and boundary.
HttpResponse<String> response = Unirest.post("https://api.sarvam.ai/speech-to-text-translate")
  .header("api-subscription-key", "<apiSubscriptionKey>")
  .field("file", new File("meeting_recording.wav"))
  .asString();
```

```php
<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

$response = $client->request('POST', 'https://api.sarvam.ai/speech-to-text-translate', [
  'multipart' => [
    [
        'name' => 'file',
        'filename' => 'meeting_recording.wav',
        // Stream the audio from disk; Guzzle requires non-null contents.
        'contents' => fopen('meeting_recording.wav', 'r')
    ]
  ],
  'headers' => [
    'api-subscription-key' => '<apiSubscriptionKey>',
  ],
]);

echo $response->getBody();
```

```csharp
using RestSharp;

var client = new RestClient("https://api.sarvam.ai/speech-to-text-translate");
var request = new RestRequest(Method.POST);
request.AddHeader("api-subscription-key", "<apiSubscriptionKey>");
// AddFile makes RestSharp send the request as multipart/form-data.
request.AddFile("file", "meeting_recording.wav");
IRestResponse response = client.Execute(request);
```

```swift
import Foundation

let boundary = "---011000010111000001101001"
let fileName = "meeting_recording.wav"

let headers = [
  "api-subscription-key": "<apiSubscriptionKey>",
  "Content-Type": "multipart/form-data; boundary=\(boundary)"
]

// Read the audio as raw bytes; audio data is binary, not UTF-8 text.
guard let fileData = FileManager.default.contents(atPath: fileName) else {
  fatalError("Could not read \(fileName)")
}

// Build the multipart/form-data body; only "file" is required.
var postData = Data()
postData.append(Data("--\(boundary)\r\n".utf8))
postData.append(Data("Content-Disposition: form-data; name=\"file\"; filename=\"\(fileName)\"\r\n".utf8))
postData.append(Data("Content-Type: application/octet-stream\r\n\r\n".utf8))
postData.append(fileData)
postData.append(Data("\r\n--\(boundary)--\r\n".utf8))

var request = URLRequest(url: URL(string: "https://api.sarvam.ai/speech-to-text-translate")!,
                         cachePolicy: .useProtocolCachePolicy,
                         timeoutInterval: 10.0)
request.httpMethod = "POST"
request.allHTTPHeaderFields = headers
request.httpBody = postData

let session = URLSession.shared
let dataTask = session.dataTask(with: request) { data, response, error in
  if let error = error {
    print(error)
  } else if let data = data {
    // Print the JSON response body returned by the API.
    print(String(data: data, encoding: .utf8) ?? "")
  }
}

dataTask.resume()
```