POST /speech-to-text

Headers

api-subscription-key
string
required

Your unique subscription key, used to authenticate requests to the Sarvam AI Speech-to-Text API.

Body

multipart/form-data
file
file
required

The audio file to transcribe. Supported formats are WAV (.wav) and MP3 (.mp3). The API works best with audio sampled at 16 kHz. If the audio contains multiple channels, they are merged into a single channel.
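
As a pre-upload check for the guidance above, a WAV file's sample rate and channel count can be read with Python's standard wave module. This is a minimal sketch: the helper names inspect_wav and upload_warnings are illustrative and not part of the API, and the check applies only to PCM WAV input, not MP3.

```python
import wave

# From the description above: the API works best with 16 kHz audio.
RECOMMENDED_SAMPLE_RATE_HZ = 16_000


def inspect_wav(path):
    """Return (sample_rate_hz, channel_count) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnchannels()


def upload_warnings(path):
    """Collect human-readable notes about a WAV file before uploading it."""
    sample_rate, channels = inspect_wav(path)
    notes = []
    if sample_rate != RECOMMENDED_SAMPLE_RATE_HZ:
        notes.append(f"sample rate is {sample_rate} Hz; 16000 Hz is recommended")
    if channels > 1:
        notes.append(f"{channels} channels will be merged into one by the API")
    return notes
```

Neither condition is an error according to this page, so the sketch returns notes rather than raising.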

language_code
enum<string>

Specifies the language of the input audio. Providing it helps ensure accurate transcription. For the saarika:v1 model, this parameter is mandatory; for the saarika:v2 model, it is optional. Use unknown when the language is not known, and the API will detect it automatically. Note that the saarika:v1 model does not support the unknown language code.

Available options:
unknown,
hi-IN,
bn-IN,
kn-IN,
ml-IN,
mr-IN,
od-IN,
pa-IN,
ta-IN,
te-IN,
en-IN,
gu-IN
model
enum<string>

Specifies the model to use for speech-to-text conversion. The default model is saarika:v2.

Available options:
saarika:v1,
saarika:v2
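
The rules that tie language_code to model can be checked on the client before sending a request. The sketch below encodes only what this page states (the option lists, the saarika:v2 default, and the saarika:v1 restrictions). The function name check_language_and_model is hypothetical, and how the server itself reports an invalid combination is not documented here.

```python
SUPPORTED_LANGUAGE_CODES = {
    "unknown", "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN",
    "od-IN", "pa-IN", "ta-IN", "te-IN", "en-IN", "gu-IN",
}
SUPPORTED_MODELS = {"saarika:v1", "saarika:v2"}
DEFAULT_MODEL = "saarika:v2"


def check_language_and_model(language_code=None, model=DEFAULT_MODEL):
    """Validate a (language_code, model) pair against the documented rules."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if language_code is not None and language_code not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"unsupported language_code: {language_code!r}")
    if model == "saarika:v1":
        # saarika:v1 needs an explicit language and cannot auto-detect it.
        if language_code is None:
            raise ValueError("language_code is mandatory for saarika:v1")
        if language_code == "unknown":
            raise ValueError("saarika:v1 does not support language_code 'unknown'")
    return language_code, model
```
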
with_diarization
boolean
default:
false

Enables speaker diarization, which identifies and separates different speakers in the audio. When set to true, the API will provide speaker-specific segments in the response.

with_timestamps
boolean
default:
false

Enables timestamps in the response. If set to true, the response will include timestamps in the transcript.
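
Putting the header and body parameters together, a request could be sketched as follows. Several details here are assumptions rather than facts from this page: the base URL in API_URL, the serialization of booleans as the strings "true" and "false", and the helper names build_form_fields and transcribe. The sketch uses the third-party requests library, which encodes the body as multipart/form-data when files is supplied.

```python
# Assumed endpoint URL; this page only gives the path /speech-to-text.
API_URL = "https://api.sarvam.ai/speech-to-text"


def build_form_fields(language_code=None, model=None,
                      with_diarization=False, with_timestamps=False):
    """Build the non-file form fields, omitting optional ones left unset."""
    fields = {
        # Assumption: booleans are sent as lowercase strings in the form body.
        "with_diarization": "true" if with_diarization else "false",
        "with_timestamps": "true" if with_timestamps else "false",
    }
    if language_code is not None:
        fields["language_code"] = language_code
    if model is not None:
        fields["model"] = model  # server default is saarika:v2 when omitted
    return fields


def transcribe(audio_path, api_key, **options):
    """POST an audio file and return the decoded JSON response body."""
    import requests  # third-party dependency, imported only when sending

    with open(audio_path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"api-subscription-key": api_key},
            data=build_form_fields(**options),
            files={"file": audio},
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

A call might look like transcribe("clip.wav", api_key, language_code="hi-IN", with_timestamps=True), where clip.wav and api_key are placeholders.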

Response

200 - application/json
request_id
string | null
required
transcript
string
required

The transcribed text from the provided audio file.

diarized_transcript
object | null

Diarized transcript of the provided speech

language_code
string | null

The BCP-47 code of the language spoken in the input audio. If multiple languages are detected, this is the code of the most predominant spoken language. If no language is detected, this is null.

timestamps
object | null

Contains timestamps for the transcribed text. This field is included only if with_timestamps is set to true.
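
The response fields above can be mapped onto a small typed structure that tolerates the nullable and optional ones. This is a sketch under assumptions: the class name TranscriptionResult and the function parse_response are illustrative, and because this page does not describe the inner layout of diarized_transcript or timestamps, both are kept as opaque dictionaries.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TranscriptionResult:
    request_id: Optional[str]          # required key, but its value may be null
    transcript: str                    # required
    diarized_transcript: Optional[Dict[str, Any]] = None
    language_code: Optional[str] = None
    timestamps: Optional[Dict[str, Any]] = None


def parse_response(payload: Dict[str, Any]) -> TranscriptionResult:
    """Convert a decoded 200 response body into a TranscriptionResult."""
    return TranscriptionResult(
        request_id=payload["request_id"],
        transcript=payload["transcript"],
        # Optional fields may be absent or null; both cases become None.
        diarized_transcript=payload.get("diarized_transcript"),
        language_code=payload.get("language_code"),
        timestamps=payload.get("timestamps"),
    )
```

Indexing the two required keys directly means a malformed body raises KeyError instead of silently producing an empty transcript.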