Call Analytics

Given an audio file of a call between two parties and a list of questions, this API analyzes the call and returns its transcript along with answers to the questions. Each answer is supported by reasoning and exact phrases extracted from the transcript.

⚠️ Important: Please use the Batch API Notebook for call recordings longer than 30 seconds.

Duration Guidelines:

Files under 30 seconds: Use this direct API endpoint
Files over 30 seconds: Use our Batch API (required)
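The routing rule above can be sketched as a small bash helper. `choose_endpoint` is a hypothetical name, not part of any Sarvam tooling, and treating a recording of exactly 30 seconds as "direct" is an assumption, since the guidelines do not cover that boundary.

```shell
#!/usr/bin/env bash
# choose_endpoint (hypothetical helper): prints "direct" or "batch" for a
# recording of the given duration in seconds (fractional values allowed).
# Assumption: exactly 30 seconds is routed to the direct endpoint.
choose_endpoint() {
  local duration="$1"
  # awk handles the floating-point comparison; exit status 0 means "> 30".
  if awk -v d="$duration" 'BEGIN { exit !(d > 30) }'; then
    echo "batch"
  else
    echo "direct"
  fi
}
```

To get the duration of a local file, `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 call.wav` prints it in seconds, assuming ffprobe (part of FFmpeg) is installed; that value can be passed straight to `choose_endpoint`.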

Resources:

Interactive Demo: Try Call Analytics Playground
Batch API Documentation: View Notebook

Request

This endpoint expects a multipart form containing a file.

file (file, required)

The audio file to be analyzed, passed as a form field in the multipart/form-data body. Supported formats are WAV (.wav) and MP3 (.mp3). The optimal sample rate is 16 kHz; multi-channel audio is merged to mono. The file must be smaller than 10 MB, and the audio duration must not exceed 600 seconds (10 minutes).
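Before uploading, the format and size limits can be checked locally. This is a minimal sketch: `check_audio_file` is a hypothetical helper, it inspects only the file extension (not the actual encoding), it does not check duration or sample rate, and it assumes "10 MB" means 10 × 1024 × 1024 bytes.

```shell
#!/usr/bin/env bash
# Assumption: "10 MB" is read as 10 MiB (10 * 1024 * 1024 bytes).
MAX_BYTES=$((10 * 1024 * 1024))

# check_audio_file (hypothetical helper): pre-flight check against the
# documented format and size limits. Looks only at the extension, not
# the real encoding, and does not check duration or sample rate.
check_audio_file() {
  local path="$1" size
  if [ ! -f "$path" ]; then
    echo "not a regular file: $path" >&2
    return 1
  fi
  case "$path" in
    *.wav|*.WAV|*.mp3|*.MP3) ;;
    *) echo "unsupported format (expected .wav or .mp3): $path" >&2
       return 1 ;;
  esac
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$path") ))
  if [ "$size" -ge "$MAX_BYTES" ]; then
    echo "file too large: $size bytes (must be under $MAX_BYTES)" >&2
    return 1
  fi
  return 0
}
```

A non-zero return from `check_audio_file call.wav` means the upload would be rejected on format or size, so the request can be skipped early.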

questions (string, required)

A JSON-encoded list of questions to be answered from the call content. Each question is a JSON object with the structure {id: string, text: string, description: string (optional), type: string, properties: object}. The type field must be one of: boolean, enum, short answer, long answer, or number. For enum questions, include an 'options' list in properties.
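Because questions is sent as a single string, one convenient approach is to write the JSON list in a quoted heredoc and pass the resulting variable to curl. The ids, texts, and enum options below are illustrative, and the empty properties object on the boolean question is an assumption about what the API accepts for non-enum types.

```shell
#!/usr/bin/env bash
# Build the `questions` form field as a JSON-encoded list.
# Ids, texts, and enum options are illustrative placeholders.
# Assumption: non-enum questions accept an empty "properties" object.
QUESTIONS=$(cat <<'EOF'
[
  {
    "id": "q001",
    "text": "What was the main issue discussed in the call?",
    "type": "enum",
    "properties": {
      "options": ["INTERNET_ISSUES", "BILLING", "OTHER"]
    }
  },
  {
    "id": "q002",
    "text": "Did the agent resolve the issue during the call?",
    "description": "Answer true only if the customer confirms the fix.",
    "type": "boolean",
    "properties": {}
  }
]
EOF
)

printf '%s\n' "$QUESTIONS"
```

The variable can then be sent with `-F "questions=$QUESTIONS"`. If question text may contain characters that curl's `-F` treats specially (a leading `@` or `<`, or `;type=`), `--form-string "questions=$QUESTIONS"` sends the value literally.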

hotwords (string, optional)

A comma-separated string of keywords specific to your domain. These keywords are preserved as-is in the transcript.
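A hotwords value can be assembled from a bash array. The terms are placeholders, and leaving no spaces after the commas is a conservative assumption rather than a documented requirement.

```shell
#!/usr/bin/env bash
# Placeholder domain terms; replace with your own vocabulary.
terms=("Sarvam" "broadband" "ONT")

# Join the array with commas. IFS is changed only inside the command
# substitution's subshell, so the caller's IFS is left untouched.
HOTWORDS=$(IFS=,; printf '%s' "${terms[*]}")

printf '%s\n' "$HOTWORDS"
```

The result is passed to the request as `-F "hotwords=$HOTWORDS"`.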

model (enum, optional)

Model used to convert speech to text in the target language.

Allowed values:

Response

Successful Response

transcript (string)

Full transcript of the call, generated by Sarvam’s in-house speech-to-text model.

request_id (string or null)

file_name (string or null)

Name of the analyzed audio file, which serves as its identifier.

answers (list of objects or null)

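List of answers to the questions supplied in the request, derived from the call analysis. Each answer contains the question id, the question text, the reasoning, the response, and the supporting utterance from the transcript. This field is null if no valid answers were generated.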

duration_in_seconds (double or null)

Duration of the analyzed call in seconds.

language_code (enum or null)

The BCP-47 code of the language spoken in the input. If multiple languages are detected, this is the code of the predominant language. If no language is detected, this is null.

diarized_transcript (object or null)

Diarized (speaker-separated) transcript of the provided speech.

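curl -X POST https://api.sarvam.ai/call-analytics \
  -H "api-subscription-key: <api-subscription-key>" \
  -H "Content-Type: multipart/form-data" \
  -F file=@call.wav \
  -F 'questions=[{"id":"q001","text":"What was the main issue discussed in the call?","type":"enum","properties":{"options":["INTERNET_ISSUES","OTHER"]}}]'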

{
  "transcript": "Agent: Thank you for calling customer support. How may I assist you today?\nCustomer: Hi, I'm having issues with my internet connection. It keeps cutting out.\nAgent: I'm sorry to hear that. Let's troubleshoot this issue...",
  "request_id": "foo",
  "file_name": "call_20230901_123456.mp3",
  "answers": [
    {
      "id": "q001",
      "question": "What was the main issue discussed in the call?",
      "reasoning": "The customer repeatedly mentioned issues with their internet connection.",
      "response": "INTERNET_ISSUES",
      "utterance": "My internet keeps cutting out every few minutes. It's really frustrating."
    }
  ],
  "duration_in_seconds": 180.5,
  "language_code": "hi-IN"
}
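To pull the answers out of a saved response, a jq filter is enough. This sketch assumes jq is installed and that the response body was saved to a file (for example with curl's `-o response.json`); `summarize_answers` is a hypothetical helper name. The `// []` fallback covers the case where answers is null.

```shell
#!/usr/bin/env bash
# summarize_answers (hypothetical helper): print one "id<TAB>response"
# line per answer from a saved Call Analytics response file.
# Prints nothing when "answers" is null or missing.
# Assumption: jq is installed and on PATH.
summarize_answers() {
  jq -r '(.answers // [])[] | "\(.id)\t\(.response)"' "$1"
}
```

Running `summarize_answers response.json` on the example response above prints `q001`, a tab, and `INTERNET_ISSUES`.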
