Call Analytics Cookbook

Overview

This cookbook demonstrates a robust, production-ready call analytics pipeline using Sarvam's SDK. It leverages Sarvam's Speech-to-Text Translate Batch API with diarization, parses speaker-wise transcripts, and uses Sarvam's LLM for deep analysis. All outputs are saved in structured files for further review.

Business Value of the Call Analytics Module

  • Improve agent effectiveness
  • Understand customer sentiment
  • Detect operational issues early
  • Spot upsell/cross-sell opportunities
  • Generate real-time dashboards

Where It Is Useful

  • E-commerce / D2C: Understand refund requests, delivery concerns, or dissatisfaction with product quality.
  • Contact Centers / BPOs: Automate call reviews to improve training and ensure compliance at scale.
  • Healthcare & Insurance: Analyze patient queries, support delays, and sentiment in sensitive service calls.

Why Diarization and Speaker-wise Parsing?

  • Diarization assigns speaker labels, linking each line of text to the speaker who said it. This enables:
    • Accurate agent/customer identification
    • Speaker-specific sentiment analysis
    • Monitoring agent talk-time vs. listening time
  • Speaker-wise parsing preserves the chronological flow, enabling deeper insights and more accurate LLM analysis.
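As a quick illustration of why this matters, diarized output (a list of entries with `speaker_id` and `transcript` fields, in the same shape the parser later in this cookbook consumes; the entries below are made up) can be flattened into a readable, chronological conversation:

```python
# Hypothetical diarized entries, shaped like the Batch API output parsed below.
entries = [
    {"speaker_id": "SPEAKER_00", "transcript": "Hi, I'd like to check my refund status."},
    {"speaker_id": "SPEAKER_01", "transcript": "Sure, may I have your order number?"},
]

# Flatten into "SPEAKER: text" lines, preserving chronological order.
conversation = "\n".join(f"{e['speaker_id']}: {e['transcript']}" for e in entries)
print(conversation)
```

Without the speaker labels, both lines would collapse into one undifferentiated transcript and speaker-level analysis would be impossible.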

You can find sample audio files in the GitHub cookbook.

1. Install the SDK and Dependencies

Before you begin, ensure you have the necessary Python libraries installed. Run the following command in your terminal or notebook:

pip install sarvamai

2. Authentication

To use the API, you need an API subscription key. Follow these steps to set up your API key:

  1. Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
  2. Replace the placeholder key: In the Full Workflow below, replace “YOUR_SARVAM_AI_API_KEY” with your actual API key.
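Hardcoding keys in notebooks is easy to leak; a common alternative is to read the key from an environment variable. This is a minimal sketch, and the variable name `SARVAM_API_KEY` is just an illustrative convention, not something the SDK requires:

```python
import os

def get_api_key() -> str:
    # Read the key from the environment instead of hardcoding it in source.
    # SARVAM_API_KEY is an illustrative variable name; any name works.
    key = os.environ.get("SARVAM_API_KEY")
    if not key:
        raise RuntimeError("Set the SARVAM_API_KEY environment variable first.")
    return key
```

You would then pass `get_api_key()` wherever the cookbook uses the placeholder key.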

3. Set Up Essential Modules and Output Directory

Set up your imports and create an output directory for all generated files (transcripts, analysis, summaries, etc.):

import os
import json
import hashlib
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Optional
from pydub import AudioSegment
from sarvamai import SarvamAI
import textwrap

OUTPUT_DIR = "outputs"
Path(OUTPUT_DIR).mkdir(exist_ok=True)

4. The Call Analytics Class

This class encapsulates the full workflow: splitting audio, batch transcription with diarization, parsing, analysis, Q&A, and summary generation.

split_audio splits a long audio file into smaller chunks when its duration exceeds 1 hour, since the Batch API can process at most 1 hour of audio per file.

def split_audio(audio_path: str, chunk_duration_ms: int = 60 * 60 * 1000) -> List[AudioSegment]:
    audio = AudioSegment.from_file(audio_path)
    return [audio[i:i + chunk_duration_ms] for i in range(0, len(audio), chunk_duration_ms)] if len(audio) > chunk_duration_ms else [audio]
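The chunking arithmetic can be checked without touching any audio files. This standalone sketch (the helper name `chunk_boundaries` is ours, not part of the cookbook's class) computes the millisecond boundaries `split_audio` would slice at:

```python
from typing import List, Tuple

def chunk_boundaries(total_ms: int, chunk_ms: int = 60 * 60 * 1000) -> List[Tuple[int, int]]:
    # Mirror split_audio's slicing: one (start, end) pair per chunk.
    if total_ms <= chunk_ms:
        return [(0, total_ms)]
    return [(i, min(i + chunk_ms, total_ms)) for i in range(0, total_ms, chunk_ms)]

# A 90-minute recording yields one full 60-minute chunk plus a 30-minute remainder.
print(chunk_boundaries(90 * 60 * 1000))
```

Note that `split_audio` returns in-memory `AudioSegment` objects; to feed chunks into the batch job below you would first export each one to a file with pydub's `export`.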

4.1 Class definition and initialization

class CallAnalytics:
    def __init__(self, client):
        self.client = client
        self.transcriptions = {}

4.1.1: process_audio_files

Creates a transcription job using Sarvam’s STT Batch API, waits for job completion, downloads and parses transcription output, and calls analysis on the parsed conversation.

For longer audio files, make sure to set the timeout parameter in upload_files to a sufficiently high value to allow the upload to complete successfully.

    def process_audio_files(self, audio_paths: List[str]) -> Dict[str, str]:
        if not audio_paths:
            print("No audio files provided")
            return {}

        print(f"Processing {len(audio_paths)} audio files...")

        try:
            job = self.client.speech_to_text_translate_job.create_job(
                model="saaras:v2.5",
                with_diarization=True,
            )

            job.upload_files(file_paths=audio_paths, timeout=300)
            job.start()

            print("Waiting for transcription to complete...")
            job.wait_until_complete()

            if job.is_failed():
                print("Transcription failed!")
                return {}

            output_dir = Path(f"{OUTPUT_DIR}/transcriptions_{job.job_id}")
            output_dir.mkdir(parents=True, exist_ok=True)
            job.download_outputs(output_dir=str(output_dir))
            json_files = list(output_dir.glob("*.json"))
            if not json_files:
                raise FileNotFoundError(f"No .json transcription files found in {output_dir}.")

            transcriptions = self._parse_transcriptions(output_dir)
            self.transcriptions.update(transcriptions)

            print(f"Successfully transcribed {len(transcriptions)} files!")

            for file_name, data in transcriptions.items():
                self.analyze_transcription(data["conversation_path"], output_dir, file_name)

            return transcriptions

        except Exception as e:
            print(f"Error processing audio files: {e}")
            return {}

4.1.2: _parse_transcriptions

Reads downloaded JSON transcription files, extracts speaker-wise lines, writes a .txt file with clean conversation format, and calculates total speaking time per speaker.

  • timing.json: Tracks the total speaking time per speaker in seconds. This helps identify the dominant speaker and supports monitoring agent talk-time vs. listening time.
    def _parse_transcriptions(self, output_dir: Path) -> Dict[str, dict]:
        transcriptions = {}
        for json_file in output_dir.glob("*.json"):
            try:
                with open(json_file, "r", encoding="utf-8") as f:
                    data = json.load(f)
                diarized = data.get("diarized_transcript", {}).get("entries")
                speaker_times = {}
                lines = []
                if diarized:
                    for entry in diarized:
                        speaker = entry["speaker_id"]
                        text = entry["transcript"]
                        lines.append(f"{speaker}: {text}")
                        start = entry.get("start_time_seconds")
                        end = entry.get("end_time_seconds")
                        if start is not None and end is not None:
                            duration = end - start
                            speaker_times[speaker] = speaker_times.get(speaker, 0.0) + duration
                else:
                    lines = [f"UNKNOWN: {data.get('transcript', '')}"]

                conversation_text = "\n".join(lines)
                txt_path = output_dir / f"{json_file.stem}_conversation.txt"
                with open(txt_path, "w", encoding="utf-8") as f:
                    f.write(conversation_text)

                timing_path = None
                if speaker_times:
                    timing_path = output_dir / f"{json_file.stem}_timing.json"
                    with open(timing_path, "w", encoding="utf-8") as f:
                        json.dump(speaker_times, f, indent=2)
                transcriptions[json_file.stem] = {
                    "entries": diarized or [],
                    "conversation_path": str(txt_path),
                    "timing_path": str(timing_path) if timing_path else None
                }
            except Exception as e:
                print(f"Error parsing {json_file}: {e}")
        return transcriptions
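The timing.json written above maps each speaker to seconds spoken, so talk-time share is a one-line computation. A minimal sketch, with made-up numbers in the shape `_parse_transcriptions` produces:

```python
# Hypothetical speaker_times dict, in the shape written to timing.json.
speaker_times = {"SPEAKER_00": 90.0, "SPEAKER_01": 210.0}

# Convert absolute seconds into each speaker's share of total talk time.
total = sum(speaker_times.values())
talk_share = {spk: round(100 * secs / total, 1) for spk, secs in speaker_times.items()}
print(talk_share)
```

A heavily skewed split (e.g. an agent speaking far more than the customer) is often a useful coaching signal.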

4.1.3: analyze_transcription

Reads the conversation file and sends it to Sarvam LLM with a detailed analysis prompt to extract structured insights.

ANALYSIS_PROMPT_TEMPLATE = """
Analyze this call transcription thoroughly from start to finish.

TRANSCRIPTION:
{transcription}

Please answer the following:

1. Identify which speaker is the **customer** and which one is the **agent**.
2. Determine if the customer is a **new/potential customer** or an **existing customer**.
3. What **problem, query, or doubt** did the customer raise at the beginning?
4. What **services/products** was the customer inquiring about or facing issues with?
5. How did the agent respond to and resolve the issue throughout the call?
6. Was the **customer satisfied** at the end of the call?
7. Did the customer express any **emotions or sentiments** (positive, negative, or neutral)?
8. Were there any mentions of **competitors**, or any opportunities for **upselling or cross-selling**?
9. Summarize the **resolution** and whether it was successful.

Provide your answer in a clear, structured format with section headings and bullet points.
"""
    def analyze_transcription(self, conversation_path: str, output_dir: Path, file_name: str) -> Dict:
        try:
            with open(conversation_path, "r", encoding="utf-8") as f:
                transcription = f.read()
            analysis_prompt = textwrap.dedent(ANALYSIS_PROMPT_TEMPLATE.format(transcription=transcription))
            messages = [
                {"role": "system", "content": "You are a call analytics expert working for a company's support operations team. Your job is to understand customer calls end-to-end and provide structured insights to improve customer experience and agent effectiveness."},
                {"role": "user", "content": analysis_prompt},
            ]
            response = self.client.chat.completions(messages=messages)
            analysis = response.choices[0].message.content
            analysis_path = output_dir / f"{file_name}_analysis.txt"
            with open(analysis_path, "w", encoding="utf-8") as f:
                f.write(analysis.strip())
            print(f"Analysis saved to {analysis_path}")
            return {"file_name": file_name, "analysis_path": str(analysis_path)}
        except Exception as e:
            error_msg = f"Error analyzing transcription: {str(e)}"
            print(error_msg)
            return {"file_name": file_name, "error": error_msg, "timestamp": datetime.now().isoformat()}

4.1.4: answer_question

Answers a user-defined question based on the parsed conversation transcript and saves the answer to a file.

    def answer_question(self, question: str, output_dir: Optional[Path] = None) -> None:
        for file_name, data in self.transcriptions.items():
            try:
                with open(data["conversation_path"], "r", encoding="utf-8") as f:
                    transcription = f.read()
                prompt = f"Based on this call transcription, answer the question below:\n\nTRANSCRIPTION:\n{transcription}\n\nQUESTION: {question}"
                messages = [
                    {"role": "system", "content": "You are a call analytics assistant. Answer strictly based on the provided transcription."},
                    {"role": "user", "content": prompt},
                ]
                response = self.client.chat.completions(messages=messages)
                answer = response.choices[0].message.content
                q_hash = hashlib.sha1(question.encode()).hexdigest()[:6]
                path = Path(data["conversation_path"]).parent / f"{file_name}_question_{q_hash}.txt"
                with open(path, "w", encoding="utf-8") as f:
                    f.write(f"Question: {question}\n\nAnswer:\n{answer}")
                print(f"Answer saved to {path}")
            except Exception as e:
                print(f"Error answering question for {file_name}: {e}")
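The short SHA-1 prefix above gives each question a stable, filesystem-safe filename: re-asking the same question overwrites the same file instead of creating duplicates. The helper below isolates that naming scheme (the function name `question_filename` is ours, for illustration only):

```python
import hashlib

def question_filename(file_name: str, question: str) -> str:
    # Deterministic 6-character suffix derived from the question text.
    q_hash = hashlib.sha1(question.encode()).hexdigest()[:6]
    return f"{file_name}_question_{q_hash}.txt"

print(question_filename("call01", "Was the customer satisfied?"))
```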

4.1.5: get_summary

Generates a concise summary for each call analysis and saves it to a summary file for easy review.

SUMMARY_PROMPT_TEMPLATE = """
Based on this call analysis, summarize each of the following in 2–3 words:

{analysis_text}

1. Customer & Agent
2. Customer Type
3. Main Issue
4. Service Discussed
5. Agent's Response
6. Customer Satisfaction
7. Sentiment
8. Competitor or Upsell
9. Resolution
"""
    def get_summary(self, output_dir: Optional[Path] = None) -> None:
        output_dir = output_dir or Path(OUTPUT_DIR)
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        summary_path = output_dir / f"summary_{timestamp}.txt"
        try:
            with open(summary_path, "w", encoding="utf-8") as f:
                f.write("CALL ANALYTICS SUMMARY REPORT\n")
                f.write("=" * 60 + "\n")
                f.write(f"Generated: {datetime.now()}\n")
                f.write(f"Total Calls: {len(self.transcriptions)}\n")
                f.write("=" * 60 + "\n\n")
                for file_name, data in self.transcriptions.items():
                    analysis_file = Path(data["conversation_path"]).parent / f"{file_name}_analysis.txt"
                    if not analysis_file.exists():
                        print(f"Analysis file not found for {file_name}, skipping.")
                        continue
                    with open(analysis_file, "r", encoding="utf-8") as af:
                        analysis_text = af.read()
                    summary_prompt = textwrap.dedent(SUMMARY_PROMPT_TEMPLATE.format(analysis_text=analysis_text))
                    messages = [
                        {"role": "system", "content": "You are a call analytics summarizing expert. Provide concise and clear answers to each point."},
                        {"role": "user", "content": summary_prompt},
                    ]
                    response = self.client.chat.completions(messages=messages)
                    concise_summary = response.choices[0].message.content.strip()
                    f.write(f"Call File: {file_name}\n")
                    f.write("-" * 30 + "\n")
                    f.write(f"{concise_summary}\n\n")
            print(f"Summary saved to {summary_path}")
        except Exception as e:
            print(f"Error writing summary: {e}")

5. Full Workflow Example

This code block runs the full call analytics workflow: transcribes the audio, analyzes the conversation, answers a specific user question, and generates a concise summary.

client = SarvamAI(api_subscription_key="YOUR_SARVAM_AI_API_KEY")
analytics = CallAnalytics(client=client)

audio_path = "/path/to/your/audio/file.mp3"
analytics.process_audio_files([audio_path])
analytics.answer_question("Add your question here")
analytics.get_summary()
Note: For long recordings, the resulting transcription may exceed the maximum token limit of the LLM.
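One simple guard is to cap the transcript length before building the prompt. This is a crude character-based sketch (the budget below is an illustrative stand-in for a real token count, not an SDK limit) that keeps the start and end of the call and drops the middle:

```python
def cap_transcript(text: str, max_chars: int = 20000) -> str:
    # Crude character cap; a tokenizer-based count would be more precise.
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    # Keep the opening and closing of the call, drop the middle.
    return text[:half] + "\n...[transcript truncated]...\n" + text[-half:]
```

You could apply `cap_transcript` to `transcription` inside analyze_transcription and answer_question before formatting the prompt.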

6. Sample Output

Below is the sample analysis output you will get if you process the file Sample_product_refund.mp3.

Here's a structured analysis of the call transcription:
### 1. Speaker Identification
* **Customer:** SPEAKER_00 (Adam Wilson)
* **Agent:** SPEAKER_01 (Sam from Coaching Downs)
### 2. Customer Type
* **Existing customer:** The customer has previously made a purchase (order number provided) and is now initiating a return and refund request.
### 3. Initial Problem/Query
* The customer called to:
  * Return an item due to incorrect size.
  * Inquire about the status of their refund, as it hasn't reflected in their account yet.
### 4. Services/Products Involved
* **Product:** Clothing item (implied by the return and size issue).
* **Services:** Return processing and refund issuance.
### 5. Agent's Response and Resolution Process
* **Initial Steps:**
  * Requested the order number, customer name, and contact details (phone number and email).
  * Verified the order details in the system.
* **Issue Identification:**
  * The order wasn't immediately found in the system, leading to further verification.
  * The customer lacked the return tracking number from the courier company.
* **Resolution Steps:**
  * Agent confirmed the return request date (15th of November) and noted it was outside the standard refund processing timeframe.
  * Agent escalated the case to the corporate office for review and promised to send an email update within 2-4 business days.
  * Agent reassured the customer about the refund timeline and confirmed the email address for communication.
### 6. Customer Satisfaction
* **Neutral to slightly positive:** The customer seemed somewhat reassured by the agent's explanation and the promise of a prompt update. However, there was initial frustration about the refund delay.
### 7. Customer Sentiments
* **Initial Frustration:** Expressing concern about the missing refund and the potential delay in processing.
* **Reassurance:** After the agent's explanation, the customer seemed more at ease, though still awaiting confirmation.
### 8. Competitors/Upselling Opportunities
* **No mention of competitors.**
* **No clear upselling/cross-selling opportunities identified during the call.** The focus was solely on resolving the return and refund issue.
### 9. Resolution Summary and Success
* **Resolution:** The agent escalated the case to the corporate office for review and promised an email update within 2-4 business days.

7. Additional Resources

For more details, refer to our official documentation and join our community for support.

8. Final Notes

  • Keep your API key secure.
  • Use clear audio for best results.
  • All outputs (transcripts, analysis, summaries) are saved in the outputs/ directory for easy review.

Keep Building!