POST /call-analytics

Headers

api-subscription-key
string
required

Your unique subscription key for authenticating requests to the Sarvam AI Speech-to-Text API. See the Sarvam AI documentation for the steps to get your API key.

Body

multipart/form-data
file
file
required

The audio file to be analyzed. Must be passed as a form input if using multipart/form-data. Supported formats are WAV (.wav) and MP3 (.mp3). The optimal sample rate is 16 kHz. Multi-channel audio will be merged to mono. File size must be less than 10 MB, and audio duration must not exceed 600 seconds (10 minutes).
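
The limits above can be checked locally before uploading. The following is a minimal sketch, not part of the API: the function name `check_audio_file` is hypothetical, the 10 MB limit is assumed to mean binary megabytes, and duration is only checked for PCM WAV files because the Python standard library cannot read MP3 duration.

```python
import os
import wave

MAX_BYTES = 10 * 1024 * 1024  # "less than 10MB"; binary megabytes are an assumption
MAX_SECONDS = 600             # 10 minutes
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def check_audio_file(path):
    """Return a list of problems found; an empty list means the file passed."""
    problems = []

    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")

    size = os.path.getsize(path)
    if size >= MAX_BYTES:
        problems.append(f"file is {size} bytes; limit is {MAX_BYTES}")

    if ext == ".wav":
        # wave.open raises wave.Error if the file is not a PCM WAV file.
        with wave.open(path, "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
        if duration > MAX_SECONDS:
            problems.append(f"duration is {duration:.1f}s; limit is {MAX_SECONDS}s")

    return problems
```

An MP3 file passes this check on extension and size alone; its duration would need a third-party decoder to verify.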

questions
string
required

List of questions to be answered based on the call content. Each question must be a valid JSON object with the following structure: {id: string, text: string, description: string (optional), type: string, properties: object}. The type field must be one of: boolean, enum, short answer, long answer, or number. For enum-type questions, include an 'options' list in properties.
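
As an illustration of the structure described above, here is a small sketch that builds question objects and serializes them. The helper names `make_question` and `encode_questions` are hypothetical, and sending the whole list as a single JSON array string is an assumption based on the field's `string` type; the example question texts are made up.

```python
import json

ALLOWED_TYPES = {"boolean", "enum", "short answer", "long answer", "number"}


def make_question(qid, text, qtype, description=None, properties=None):
    """Build one question object with the documented keys."""
    if qtype not in ALLOWED_TYPES:
        raise ValueError(f"unknown question type: {qtype!r}")
    properties = dict(properties or {})
    if qtype == "enum" and not isinstance(properties.get("options"), list):
        raise ValueError("enum questions need an 'options' list in properties")
    question = {"id": qid, "text": text, "type": qtype, "properties": properties}
    if description is not None:  # description is optional, so omit it when unset
        question["description"] = description
    return question


def encode_questions(questions):
    # Assumption: the form field carries the list as one JSON array string.
    return json.dumps(questions)


questions = [
    make_question("q1", "Did the agent greet the customer?", "boolean"),
    make_question(
        "q2",
        "What was the call outcome?",
        "enum",
        description="Pick the closest match.",
        properties={"options": ["resolved", "escalated", "dropped"]},
    ),
]
questions_field = encode_questions(questions)
```

Validating the type and the enum options on the client side surfaces malformed questions before any audio is uploaded.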

hotwords
string | null

Optional comma-separated string of keywords specific to your domain. These keywords will be preserved as-is in the transcript.

model
enum<string>

Model used to convert speech to text in the target language.

Available options: saaras:v1, saaras:v2

with_diarization
boolean
default: false

Whether to include diarization in the output.
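
Putting the header and body fields together, a request might be assembled as in the sketch below. Several details are assumptions rather than documented behavior: the helper names are hypothetical, `base_url` must be supplied by the caller because this page only gives the `/call-analytics` path, the boolean is encoded as the strings "true" and "false", and the third-party `requests` library is used for the upload.

```python
import json

KNOWN_MODELS = {"saaras:v1", "saaras:v2"}


def build_headers(api_key):
    return {"api-subscription-key": api_key}


def build_form_fields(questions, hotwords=None, model=None, with_diarization=False):
    """Return the non-file multipart fields as a dict of strings."""
    fields = {"questions": json.dumps(questions)}  # assumed: one JSON array string
    if hotwords:
        # hotwords is documented as a single comma-separated string
        fields["hotwords"] = ",".join(word.strip() for word in hotwords)
    if model is not None:
        if model not in KNOWN_MODELS:
            raise ValueError(f"unknown model: {model!r}")
        fields["model"] = model
    # Assumption: form booleans are sent as lowercase "true" / "false".
    fields["with_diarization"] = "true" if with_diarization else "false"
    return fields


def send_call_analytics(base_url, api_key, audio_path, fields, timeout=120):
    """Upload the audio and fields; base_url is supplied by the caller."""
    import requests  # third-party; imported here so the helpers above stay stdlib-only

    with open(audio_path, "rb") as audio:
        response = requests.post(
            base_url.rstrip("/") + "/call-analytics",
            headers=build_headers(api_key),
            data=fields,
            files={"file": audio},  # form field name "file", as documented
            timeout=timeout,
        )
    response.raise_for_status()
    return response.json()
```

The `Content-Type` header is deliberately not set by hand: `requests` generates the multipart boundary itself when `files` is passed.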

Response

200 - application/json
language_code
enum<string> | null
required

The BCP-47 code of the language spoken in the input. If multiple languages are detected, this is the code of the most predominant spoken language. If no language is detected, this is null.

Available options: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN

request_id
string | null
required
transcript
string
required

Full transcript of the call generated by Sarvam's in-house speech-to-text model.

answers
object[] | null

List of answers to predefined questions, derived from the call analysis. It can be null if no valid answers were generated.

diarized_transcript
object | null

Diarized transcript of the provided speech.

duration_in_seconds
number | null

Duration of the analyzed call in seconds.

file_name
string | null

Unique identifier for the analyzed audio file.
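
Because most response fields can be null or absent, client code may want to normalize the payload before using it. The sketch below is one way to do that; the `CallAnalyticsResult` class and `parse_result` function are hypothetical, the inner shapes of `answers` and `diarized_transcript` are not described on this page and are kept opaque, and the sample payload is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallAnalyticsResult:
    language_code: Optional[str]          # required key, value may be null
    request_id: Optional[str]             # required key, value may be null
    transcript: str                       # required
    answers: Optional[list]               # null if no valid answers were generated
    diarized_transcript: Optional[dict]
    duration_in_seconds: Optional[float]
    file_name: Optional[str]


def parse_result(payload):
    """Map a decoded 200 response onto the dataclass."""
    return CallAnalyticsResult(
        # Keys marked "required" are indexed directly, so a missing key raises KeyError.
        language_code=payload["language_code"],
        request_id=payload["request_id"],
        transcript=payload["transcript"],
        # Optional keys fall back to None when absent.
        answers=payload.get("answers"),
        diarized_transcript=payload.get("diarized_transcript"),
        duration_in_seconds=payload.get("duration_in_seconds"),
        file_name=payload.get("file_name"),
    )


# Invented sample payload, for illustration only.
sample = {
    "language_code": None,
    "request_id": "example-request-id",
    "transcript": "Hello, how can I help you?",
    "answers": None,
    "duration_in_seconds": 4.2,
}
result = parse_result(sample)
```

Callers should still check `result.language_code` and `result.answers` for `None` before using them, since both are documented as nullable.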