Call Analytics
Given an audio file of a call between two parties and a list of questions, this API analyzes the content and returns the transcript, along with responses to the questions. Each response is supported by reasoning and exact phrases extracted from the transcript.
Headers
Your unique subscription key for authenticating requests to the Sarvam AI Speech-to-Text API. Here are the steps to get your API key.
Body
The audio file to be analyzed. Must be passed as a form input if using multipart/form-data. Supported formats are WAV (.wav) and MP3 (.mp3). The optimal sample rate is 16 kHz. Multi-channel audio will be merged to mono. File size must be less than 10 MB, and audio duration must not exceed 600 seconds (10 minutes).
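The limits above can be checked locally before uploading. The following is a minimal sketch, not part of the API: the function name check_audio_file is hypothetical, "10 MB" is assumed to mean 10 × 1024 × 1024 bytes (the text does not say which definition applies), and duration is only measured for WAV files because Python's standard library cannot decode MP3.

```python
import os
import wave

# Assumption: "10 MB" is read as 10 MiB; the documentation does not specify.
MAX_BYTES = 10 * 1024 * 1024
MAX_SECONDS = 600
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def check_audio_file(path: str) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []

    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")

    size = os.path.getsize(path)
    if size >= MAX_BYTES:
        problems.append(f"file is {size} bytes; must be less than {MAX_BYTES}")

    # Duration is checked for WAV only; MP3 would need a third-party decoder.
    if ext == ".wav":
        with wave.open(path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if seconds > MAX_SECONDS:
            problems.append(f"audio is {seconds:.1f} s; must not exceed {MAX_SECONDS} s")

    return problems
```

Calling check_audio_file("call.wav") on a short, valid recording should return an empty list. A file that is not a readable WAV despite its extension raises wave.Error, which this sketch leaves to the caller.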
List of questions to be answered based on the call content. Each question should be a valid JSON object with the following structure: {id: string, text: string, description: string (optional), type: string, properties: object}. The type field must be one of: boolean, enum, short answer, long answer, or number. For enum type questions, include an 'options' list in the properties.
Optional comma-separated string of keywords specific to your domain. These keywords will be preserved as-is in the transcript.
Model to be used for converting speech to text in the target language. Available options: saaras:v1, saaras:v2.
Whether to include diarization in the output.
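The body parameters above could be assembled into multipart form fields along these lines. This is a sketch under stated assumptions: this section does not show the parameter names, so the keys questions, hotwords, model, and with_diarization are hypothetical placeholders, as are the helper name build_form_fields and the "true"/"false" encoding of the diarization flag. The audio file itself would be attached separately as the file part of the form.

```python
import json

ALLOWED_MODELS = {"saaras:v1", "saaras:v2"}


def build_form_fields(questions, model, hotwords=(), with_diarization=False):
    """Assemble the non-file form fields as strings.

    All key names below are hypothetical; check them against the API reference.
    """
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")

    fields = {
        "questions": json.dumps(questions),   # list of question objects as JSON
        "model": model,
        # Assumption: the boolean flag is sent as a lowercase string.
        "with_diarization": "true" if with_diarization else "false",
    }
    if hotwords:
        # Domain keywords are sent as one comma-separated string.
        fields["hotwords"] = ",".join(word.strip() for word in hotwords)
    return fields
```

For example, build_form_fields(questions, "saaras:v2", hotwords=["KYC", "EMI"]) produces a dictionary that an HTTP client can send alongside the audio file, with the subscription key supplied in the request headers.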
Response
The BCP-47 code of the language spoken in the input. If multiple languages are detected, this is the code of the most predominant spoken language. If no language is detected, this is null. Available options: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN.
Full transcript of the call generated by Sarvam's in-house speech-to-text model.
List of answers to predefined questions, derived from the call analysis. It can be null if no valid answers were generated.
Diarized transcript of the provided speech.
Duration of the analyzed call in seconds.
Unique identifier for the analyzed audio file.
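Because the language code and the list of answers can both be null, client code should handle missing values explicitly. The sketch below shows one way to do that on an already-decoded response body. The key names language_code, answers, and diarized_transcript are hypothetical, since this section does not show the response field names, and summarize_response is an illustrative helper rather than part of any SDK.

```python
KNOWN_LANGUAGE_CODES = {
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN", "od-IN",
    "pa-IN", "ta-IN", "te-IN", "gu-IN", "en-IN",
}


def summarize_response(body: dict) -> dict:
    """Summarize a decoded response, tolerating null language and answers."""
    language = body.get("language_code")   # hypothetical key; None if undetected
    answers = body.get("answers") or []    # hypothetical key; null becomes []
    return {
        "language": language,
        "language_recognized": language in KNOWN_LANGUAGE_CODES,
        "answer_count": len(answers),
        # Hypothetical key; absent or null when diarization was not requested.
        "has_diarization": body.get("diarized_transcript") is not None,
    }
```

Treating a null answers value as an empty list lets downstream code iterate without a separate null check.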