POST /speech-to-text

Headers

api-subscription-key
string

Body

multipart/form-data
file
file
required

The audio file to transcribe. Supported formats are WAV (.wav) and MP3 (.mp3). Works best at a 16 kHz sample rate. Multi-channel audio is merged into a single channel.
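Since the endpoint works best at 16 kHz and merges multiple channels, it can help to inspect a WAV file before uploading it. The following is a minimal sketch using only Python's standard `wave` module; the helper names `describe_wav` and `make_silent_wav` are hypothetical and not part of the API, and the check applies to .wav input only.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return the sample rate and channel count of WAV-encoded bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            # 16 kHz is the rate this page says works best.
            "is_16khz": rate == 16000,
        }


def make_silent_wav(sample_rate_hz: int, channels: int, n_frames: int = 160) -> bytes:
    """Build a short silent 16-bit PCM WAV in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * channels * n_frames)
    return buf.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silent_wav(16000, 2)))
```

A file that reports a different sample rate or more than one channel is still accepted according to the description above; the check only flags input that may transcribe less accurately.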

language_code
enum<string>
required

Language code of the audio to transcribe.

Available options:
hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN

model
enum<string>

Model to use for speech-to-text.

Available options:
saarika:v1

with_timestamps
boolean
default: false

Set to true to include word-level timestamps in the response.
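The header and body fields above can be assembled into a request roughly as follows. This is a sketch under stated assumptions, not an official client: the base URL is a placeholder (this page does not give the host), the function name `build_transcription_request` is hypothetical, and encoding the boolean as the strings "true"/"false" is an assumption about how the server reads form fields. It uses only the standard library.

```python
import urllib.request
import uuid

# Language codes listed under language_code on this page.
SUPPORTED_LANGUAGES = {
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN",
    "od-IN", "pa-IN", "ta-IN", "te-IN", "gu-IN",
}


def build_transcription_request(
    base_url: str,          # placeholder host; not specified on this page
    api_key: str,
    audio: bytes,
    filename: str,
    language_code: str,
    model: str = "saarika:v1",   # the only model option listed
    with_timestamps: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /speech-to-text."""
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language_code: {language_code}")

    boundary = uuid.uuid4().hex
    fields = {
        "language_code": language_code,
        "model": model,
        # Assumption: booleans are sent as lowercase strings.
        "with_timestamps": "true" if with_timestamps else "false",
    }

    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The audio goes in the required "file" part.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/speech-to-text",
        data=b"".join(parts),
        method="POST",
        headers={
            "api-subscription-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Building the request separately from sending it keeps the field encoding easy to inspect and test without network access.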

Response

200 - application/json
transcript
string
required

Transcript of the provided speech.

timestamps
object | null

Timestamps of words in the transcript.
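A 200 response could be read along these lines. The sketch below relies only on the two fields documented above; the helper name `parse_transcription` is hypothetical, and because this page does not describe the inner structure of `timestamps`, the code returns it unchanged as an opaque object.

```python
import json
from typing import Optional, Tuple


def parse_transcription(body: str) -> Tuple[str, Optional[dict]]:
    """Extract (transcript, timestamps) from a JSON response body."""
    data = json.loads(body)

    # "transcript" is marked required, so a missing key is treated as an error.
    transcript = data["transcript"]
    if not isinstance(transcript, str):
        raise ValueError("transcript must be a string")

    # "timestamps" is typed object | null; an absent key is treated like null.
    timestamps = data.get("timestamps")
    if timestamps is not None and not isinstance(timestamps, dict):
        raise ValueError("timestamps must be an object or null")

    return transcript, timestamps
```

Callers should handle a `None` value for the timestamps, since the field is nullable.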