Call Analytics

Given an audio file of a call between two parties and a list of questions, this API analyzes the content and returns the transcript along with answers to the questions. Each answer is supported by reasoning and exact phrases extracted from the transcript.

⚠️ Important: Use the Batch API Notebook for call recordings longer than 30 seconds.

Duration Guidelines:

  • Files under 30 seconds: Use this direct API endpoint
  • Files over 30 seconds: Use our Batch API (required)
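A minimal sketch of applying the duration guidelines before uploading, assuming the recording is a WAV file already loaded into memory. It reads the duration with Python's standard `wave` module and routes the file; `choose_api` and its `"direct"`/`"batch"` return labels are hypothetical names for illustration. The guidelines do not say where a clip of exactly 30 seconds belongs, so this sketch conservatively sends it to the Batch API.

```python
import io
import wave

# Threshold taken from the duration guidelines above.
DIRECT_API_MAX_SECONDS = 30.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an in-memory WAV file, in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_api(duration_seconds: float) -> str:
    """Return the hypothetical label 'direct' for clips under 30 s, else 'batch'."""
    # Exactly 30 s is not covered by the guidelines; treat it as batch to be safe.
    return "direct" if duration_seconds < DIRECT_API_MAX_SECONDS else "batch"
```

MP3 files would need a third-party decoder to measure duration, which is why the sketch only handles WAV.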

Resources:

  1. Interactive Demo: Try Call Analytics Playground
  2. Batch API Documentation: View Notebook

Headers

api-subscription-key (string, required)

Request

This endpoint expects a multipart form containing a file.
file (file, required)

The audio file to be analyzed. It must be passed as a form input when using multipart/form-data. Supported formats are WAV (.wav) and MP3 (.mp3). The optimal sample rate is 16 kHz. Multi-channel audio is merged to mono. The file size must be less than 10 MB and the audio duration must not exceed 600 seconds (10 minutes).
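The file constraints above can be checked locally before an upload is attempted. This is a sketch, not part of the API: `preflight_errors` is a hypothetical helper, and it treats the 10 MB limit as 10,000,000 bytes, the more conservative reading, since the documentation does not say whether MB or MiB is meant. It checks the extension only, not the actual encoding.

```python
from pathlib import Path

# Limits as stated in the `file` parameter description.
SUPPORTED_EXTENSIONS = {".wav", ".mp3"}
MAX_FILE_BYTES = 10_000_000  # assumption: "10MB" read as 10,000,000 bytes
MAX_DURATION_SECONDS = 600.0


def preflight_errors(filename: str, size_bytes: int, duration_seconds: float) -> list:
    """Return a list of problems that would make the file unsuitable for upload."""
    errors = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        errors.append("unsupported format: use .wav or .mp3")
    if size_bytes >= MAX_FILE_BYTES:  # size must be strictly less than the limit
        errors.append("file must be smaller than 10 MB")
    if duration_seconds > MAX_DURATION_SECONDS:
        errors.append("duration must not exceed 600 seconds")
    return errors
```

An empty list means the file passes these checks.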

questions (string, required)

List of questions to be answered from the call content, passed as a string. Each question must be a valid JSON object with the following structure: {id: string, text: string, description: string (optional), type: string, properties: object}. The type field must be one of: boolean, enum, short answer, long answer, or number. For enum questions, include an 'options' list in properties.
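A sketch of building the `questions` value, assuming the list is sent as a single JSON-encoded string (the field's type is string). `build_questions` is a hypothetical helper: it checks the structure described above and serializes the list with `json.dumps`. The example questions are made up.

```python
import json

# Allowed values of the `type` field, as listed in the description above.
QUESTION_TYPES = {"boolean", "enum", "short answer", "long answer", "number"}


def build_questions(questions: list) -> str:
    """Validate question objects and serialize them for the `questions` field."""
    for q in questions:
        for key in ("id", "text", "type", "properties"):
            if key not in q:
                raise ValueError(f"question is missing {key!r}: {q}")
        if q["type"] not in QUESTION_TYPES:
            raise ValueError(f"unsupported type {q['type']!r}")
        # Enum questions must list their options in properties.
        if q["type"] == "enum" and not q["properties"].get("options"):
            raise ValueError(f"enum question {q['id']!r} needs properties.options")
    return json.dumps(questions)


questions = [
    {"id": "q1", "text": "Did the customer agree to a follow-up call?",
     "type": "boolean", "properties": {}},
    {"id": "q2", "text": "What was the customer's overall sentiment?",
     "type": "enum", "properties": {"options": ["positive", "neutral", "negative"]}},
]
payload = build_questions(questions)
```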

hotwords (string, optional)

Comma-separated string of domain-specific keywords. These keywords are preserved as-is in the transcript.
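A small sketch of producing the `hotwords` string from a keyword list. `build_hotwords` is a hypothetical helper; it joins with a bare comma (whether spaces after commas are tolerated is not documented) and rejects keywords that contain a comma, since that would split one keyword into two.

```python
def build_hotwords(keywords: list) -> str:
    """Join domain keywords into a comma-separated `hotwords` string."""
    cleaned = []
    for word in keywords:
        word = word.strip()
        if "," in word:
            # A comma inside a keyword would be read as a separator.
            raise ValueError(f"keyword contains a comma: {word!r}")
        if word:  # skip empty entries
            cleaned.append(word)
    return ",".join(cleaned)
```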

model (enum, optional)

Model used to convert the speech to text in the target language.

Allowed values:
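The request parameters above can be assembled into a multipart/form-data request with only the standard library. This is a sketch under stated assumptions: `API_URL` is a hypothetical placeholder (the endpoint URL is not given in this section), the audio MIME type is guessed from the file extension, and `build_request` only constructs the request without sending it. No `model` value is passed in the example because the allowed values are not listed here.

```python
import json
import urllib.request
import uuid
from typing import Optional

# Hypothetical placeholder: substitute the real endpoint URL.
API_URL = "https://example.invalid/call-analytics"


def build_request(api_key: str, filename: str, audio: bytes, questions: list,
                  hotwords: Optional[str] = None,
                  model: Optional[str] = None) -> urllib.request.Request:
    """Assemble (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    parts = []

    def add_field(name: str, value: str) -> None:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
            + value.encode() + b"\r\n"
        )

    add_field("questions", json.dumps(questions))
    if hotwords:
        add_field("hotwords", hotwords)
    if model:
        add_field("model", model)

    # Assumption: MIME type inferred from the extension.
    mime = "audio/mpeg" if filename.lower().endswith(".mp3") else "audio/wav"
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n".encode() + audio + b"\r\n"
    )
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "api-subscription-key": api_key,  # required header
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` and decoding the JSON body would yield the fields described in the Response section; a library such as `requests` can build the same form more concisely.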

Response

Successful Response

transcript (string)

Full transcript of the call, generated by Sarvam’s in-house speech-to-text model.

request_id (string, optional)
file_name (string, optional)

Unique identifier for the analyzed audio file.

answers (list of objects, optional)

List of answers to the submitted questions, derived from the call analysis. May be null if no valid answers were generated.

duration_in_seconds (double, optional)

Duration of the analyzed call in seconds.

language_code (enum, optional)

BCP-47 code of the language spoken in the input. If multiple languages are detected, this is the code of the predominant language. If no language is detected, this is null.

diarized_transcript (object, optional)

Diarized transcript of the provided speech, i.e. the transcript segmented by speaker.
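A sketch of reading a successful response into a typed object. `CallAnalyticsResult` and `parse_response` are hypothetical names; only `transcript` is treated as always present, and a null `answers` value is normalized to an empty list as described above. The structure of each answer object and of `diarized_transcript` is not documented in this section, so they are kept as raw JSON values.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallAnalyticsResult:
    transcript: str
    request_id: Optional[str] = None
    file_name: Optional[str] = None
    answers: list = field(default_factory=list)
    duration_in_seconds: Optional[float] = None
    language_code: Optional[str] = None  # BCP-47 code, or None if undetected
    diarized_transcript: Optional[dict] = None


def parse_response(body: str) -> CallAnalyticsResult:
    """Map the JSON response body onto CallAnalyticsResult."""
    data = json.loads(body)
    return CallAnalyticsResult(
        transcript=data["transcript"],
        request_id=data.get("request_id"),
        file_name=data.get("file_name"),
        # `answers` may be null when no valid answers were generated.
        answers=data.get("answers") or [],
        duration_in_seconds=data.get("duration_in_seconds"),
        language_code=data.get("language_code"),
        diarized_transcript=data.get("diarized_transcript"),
    )
```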

Errors