Call Analytics

Given an audio file of a call between two parties and a list of questions, this API analyzes the content and returns the transcript along with responses to the questions. Each response is supported by reasoning and exact phrases extracted from the transcript.

⚠️ Important: Use the Batch API Notebook for call recordings longer than 30 seconds.

Duration Guidelines:

  • Files under 30 seconds: Use this direct API endpoint
  • Files over 30 seconds: Use our Batch API (required)
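
The duration guideline above can be expressed as a small routing helper. This is a minimal sketch: the function name and return labels are hypothetical, and because the guideline does not say what happens at exactly 30 seconds, the sketch assumes that boundary case goes to the Batch API.

```python
DIRECT_API_MAX_SECONDS = 30  # threshold taken from the duration guidelines


def choose_api(duration_seconds: float) -> str:
    """Return "direct" for clips under 30 seconds, otherwise "batch"."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Assumption: a clip of exactly 30 seconds is routed to the Batch API,
    # since only "under 30 seconds" is stated for the direct endpoint.
    if duration_seconds < DIRECT_API_MAX_SECONDS:
        return "direct"
    return "batch"


route_short = choose_api(12.5)
route_long = choose_api(95)
```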

Resources:

  1. Interactive Demo: Try Call Analytics Playground
  2. Batch API Documentation: View Notebook

Headers

api-subscription-key (string, Required)

Request

This endpoint expects a multipart form containing a file.
file (file, Required)

The audio file to be analyzed. Must be passed as a form input if using multipart/form-data. Supported formats are WAV (.wav) and MP3 (.mp3). Optimal sample rate is 16kHz. Multi-channel audio will be merged to mono. File size must be less than 10MB and audio duration must not exceed 600 seconds (10 minutes).
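
A local pre-flight check can catch files that break these limits before they are uploaded. The sketch below is illustrative only: `check_audio_file` is a hypothetical helper, "10MB" is assumed to mean 10 × 1024 × 1024 bytes, and the duration is only measured for WAV files because the Python standard library cannot read MP3 durations.

```python
import os
import wave

MAX_BYTES = 10 * 1024 * 1024   # assumption: "10MB" means binary megabytes
MAX_SECONDS = 600              # 10 minutes, as stated above
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems found; an empty list means no issue was detected."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is 10MB or larger")
    if ext == ".wav":
        # Duration = number of frames / frames per second.
        with wave.open(path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if seconds > MAX_SECONDS:
            problems.append(f"audio is {seconds:.0f}s, over the 600s limit")
    return problems
```

The server remains the authority on what it accepts; a check like this only avoids obviously invalid uploads.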

questions (string, Required)

List of questions to be answered based on the call content. Each question should be a valid JSON object with the following structure: {id: string, text: string, description: string (optional), type: string, properties: object}. The type field must be one of: boolean, enum, short answer, long answer, or number. For enum type questions, include an ‘options’ list in the properties.
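
As an illustration of the structure described above, the following sketch builds two questions and serializes them for the `questions` form field. The helper `make_question`, the sample question text, and the choice to send the list as a single JSON array string are assumptions, not confirmed API behavior.

```python
import json

# The five question types listed in the field description.
ALLOWED_TYPES = {"boolean", "enum", "short answer", "long answer", "number"}


def make_question(qid, text, qtype, description=None, properties=None):
    """Build one question object, checking the rules stated in the docs."""
    if qtype not in ALLOWED_TYPES:
        raise ValueError(f"unknown question type: {qtype!r}")
    properties = dict(properties or {})
    if qtype == "enum" and not properties.get("options"):
        raise ValueError("enum questions need an 'options' list in properties")
    question = {"id": qid, "text": text, "type": qtype, "properties": properties}
    if description is not None:  # description is optional
        question["description"] = description
    return question


questions = [
    make_question("q1", "Did the agent greet the customer?", "boolean"),
    make_question(
        "q2",
        "What was the customer's mood?",
        "enum",
        properties={"options": ["positive", "neutral", "negative"]},
    ),
]

# Assumption: the form field carries the whole list as one JSON string.
questions_field = json.dumps(questions)
```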

hotwords (string, Optional)

Optional comma-separated string of keywords specific to your domain. These keywords will be preserved as-is in the transcript.
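
A keyword list kept in code can be turned into the comma-separated form with a short helper. This is a sketch under assumptions: `format_hotwords` is a hypothetical name, the sample keywords are made up, and joining without a space after each comma is a guess, since the exact separator handling is not documented here.

```python
def format_hotwords(words):
    """Join keywords into one comma-separated string, skipping blanks and repeats."""
    cleaned = []
    for word in words:
        word = word.strip()
        if not word:
            continue
        if "," in word:
            # A comma inside a keyword would be read as a separator.
            raise ValueError(f"keyword contains a comma: {word!r}")
        if word not in cleaned:
            cleaned.append(word)
    return ",".join(cleaned)


hotwords_field = format_hotwords([" Sarvam ", "KYC", "", "KYC", "EMI"])
```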

model (enum, Optional)
Model used to convert speech to text in the target language.
Allowed values:

Response

Successful Response
transcript (string)

Full transcript of the call generated by Sarvam’s in-house speech-to-text model.

request_id (string or null)
file_name (string or null)
Unique identifier for the analyzed audio file.
answers (list of objects or null)
List of answers to predefined questions, derived from the call analysis. It can be null if no valid answers were generated.
duration_in_seconds (double or null)
Duration of the analyzed call in seconds.
language_code (enum or null)

The BCP-47 code of the language spoken in the input. If multiple languages are detected, this is the code of the most predominant spoken language. If no language is detected, this is null.

diarized_transcript (object or null)
Diarized transcript of the provided speech
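
Because every field except `transcript` may be null, client code should handle missing values explicitly. The sketch below shows one way to do that on an already-decoded response body. `summarize_response` and the sample values are hypothetical, and the inner shape of `answers` and `diarized_transcript` is not assumed.

```python
def summarize_response(body: dict) -> dict:
    """Collect the top-level response fields, tolerating nulls."""
    answers = body.get("answers") or []  # null means no valid answers
    return {
        "transcript": body["transcript"],
        "language": body.get("language_code") or "unknown",
        "answer_count": len(answers),
        "duration_seconds": body.get("duration_in_seconds"),
        "has_diarization": body.get("diarized_transcript") is not None,
    }


# Hypothetical response body, used only to exercise the helper.
sample = {
    "transcript": "Hello, how can I help you today?",
    "request_id": None,
    "file_name": "call.wav",
    "answers": None,
    "duration_in_seconds": 12.4,
    "language_code": None,
    "diarized_transcript": None,
}
summary = summarize_response(sample)
```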

Errors